Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Teradata Studio
Teradata Employee

Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Aim: To achive data movement between Teradata and Hortonworks Hadoop using TDCH with Teradata Studio Client

 

Steps followed to configure and run TDCH connector using Teradata Studio client tool are as described below:

 

Software used along with the versions:

Teradata Studio 15.00.01

Hortonworks Hadoop 2.1

TDCH connector 1.3.1

Java Runtime Environment (JRE) version 1.6 or above

I Managed to get the TDCH connectors installed properly using the below steps:

1. After unzipping Teradata Connector for Hadoop tar.gz package, copied the following files into Hadoop's lib directory:

   /usr/lib/hadoop/lib

    a) teradata-connector-1.0.6.jar

    b) tdgssconfig.jar

    c) terajdbc4.jar

2. The Teradata Connector should be installed on a node that can submit hadoop

           mapreduce jobs via the 'hadoop jar' command; in mosts cases TDCH should be

           installed on a client node in hadoop cluster

3. After moving the TDCH rpm onto the target node, install the TDCH rpm using

           the following command. The rpm will install the TDCH jar and all other

           relevant files into the /usr/lib/tdch/<version> directory

             rpm -ivh teradata-connector-<version>-<target distro>.noarch.rpm

4. Setup defaults via the Teradata Connector configuration files (optional)

           Copy the following files from the /usr/lib/tdch/<version/conf

           directory into Hadoop's conf directory, set the default

           values in the template, and rename the files to remove ".template"

           suffix. The parameters in the configuration file will be used as

           default settings for Teradata Connector jobs.

             a) teradata-export-properties.xml.template

             b) teradata-import-properties.xml.template

Note : I tried with both modifying the required properties in the above two xml's and with any modifications.

 

5. Modify mapred.child.java.opts property value in mapred-site.xml on

           client node and add the following option (optional)

             -Djava.security.egd=file:/dev/./urandom

           Alternatively, user can also include the following JVM option in

           Teradata Connector job command:

             -Dmapred.child.java.opts="-Djava.security.egd=file:/dev/./urandom"

6. Modify the following OS networking values on the name node.

           (recommended)

            /proc/sys/net/ipv4/tcp_max_syn_backlog to 2048

7. Install TDCH into HDFS such that Teradata DataMover and

           Teradata Studio can launch TDCH jobs remotely

         Navigate to the /usr/lib/tdch/<version>/scripts directory,

         and run the configureOozie.sh script, providing the hostname

         or IP address of the node running the NameNode process as an

         argument.

         ./configureOozie.sh nn=<NameNode hostname or IP address>

         The configureOozie.sh script supports many more command line

         arguments; run the script without arguments for details.

After having done all this I was able to create a Hadoop Connection Profile using Hive as User. But I am unable to execute the LOAD_TO_HCATALOG and LOAD_FROM_HCATALOG connectors.

Below are the syntax for the queries executed on Teradata Studio 15.00.01

1)

select * from SYSLIB.LOAD_TO_HCATALOG (

on

(select * from UDA.TEST)

using

server('153.65.161.221')

port('9083')

username('hive')

dbname('default')

tablename('HADOOP_TEST')

) as A;

2)

SELECT *

FROM LOAD_FROM_HCATALOG(

using

server('153.65.161.221')

port('9083')

username('hive')

dbname('default')

tablename('HADOOP_TEST')

columns('*')

) AS A;

 

Error:

For the above query I get an error saying "syntax error expecting something like a ; between LOAD_TO_HCATALOG and ‘(‘  "

I have referred to the Teradata Query Grid document and other internet sources to ensure the syntax is correct. Please guide on what is going wrong here. Am I missing some configurations needed to make the data movement possible ?

Regards,

Aastha

7 REPLIES
Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Aastha, You are confusing two different products. Teradata Studio does not use SQL-H when transfering data between Teradata and Hadoop. If you want to transfer data between Teradata and Hadoop with Studio you would use the Smart Loader for Hadoop, a feature within Teradata Studio. Please refer to the article (http://developer.teradata.com/tools/articles/smart-loader-for-hadoop ) on how to use this feature. 

If you want to run SQL-H to transfer data from Teradata and Hadoop, this is a different product. Please post your question to the Extensibility Forum.

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Yes I agree that I am confused. But if I check the post "SQL-H: Teradata to HDP: License Required?" on developer exchange I see the SQL-H queries being run from Teradata Studio itself.

I have installed TDCH on studio and what studio can do is do a data transfer using export, import and plugin command line. But at the same time I see load_from_hcatalog and load_to_hcatalog being run from Studio which is what confuses me.

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Aastha, load_from_hcatalog() and load_to_hcatalog() are SQL-H functions. From Teradata Studio, you can open a SQL Editor window and execute SQL commands that call database functions. These commands are being executed by Teradata Database. Teradata Studio passed the SQL statement, via the JDBC driver, to the Teradata Database to be executed. Like I mentioned above, if you want to transfer data between Teradata and Hadoop using Studio functionality than you would use the Smart Loader for Hadoop.

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

I have extracted the below files in my Windows machine but unable to copy them  to Hadoop's lib directory.

  a) teradata-connector-1.0.6.jar

    b) tdgssconfig.jar

    c) terajdbc4.jar

What is the command or procedure I should follow?

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Swati, Please follow the instructions in the Readme on our Teradata Studio download page. As for moving files from Windows machine to Hadoop, you can use PUTTY or WinSCP.

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Thanks for the reply. I have installed TDCH as mentioned in the Readme.

Now, I am trying to import the data from TD to Hadoop using command line. But I am getting the following error :

FIle -url does not exist.

I am giving the following url : -url jdbc:teradata://dbc/database=students.

Teradata Employee

Re: Error while trying for data movement between Teradata and Hadoop using TDCH installed on Teradata Studio

Swati, For command line usage of TDCH, please post issues to the Extensibility Forum (http://forums.teradata.com/forum/extensibility) or comment on the article (http://developer.teradata.com/connectivity/articles/teradata-connector-for-hadoop-now-available).