Aim: To achive data movement between Teradata and Hortonworks Hadoop using TDCH with Teradata Studio Client
Steps followed to configure and run TDCH connector using Teradata Studio client tool are as described below:
Software used along with the versions:
Teradata Studio 15.00.01
Hortonworks Hadoop 2.1
TDCH connector 1.3.1
Java Runtime Environment (JRE) version 1.6 or above
I Managed to get the TDCH connectors installed properly using the below steps:
1. After unzipping Teradata Connector for Hadoop tar.gz package, copied the following files into Hadoop's lib directory:
2. The Teradata Connector should be installed on a node that can submit hadoop
mapreduce jobs via the 'hadoop jar' command; in mosts cases TDCH should be
installed on a client node in hadoop cluster
3. After moving the TDCH rpm onto the target node, install the TDCH rpm using
the following command. The rpm will install the TDCH jar and all other
relevant files into the /usr/lib/tdch/<version> directory
rpm -ivh teradata-connector-<version>-<target distro>.noarch.rpm
4. Setup defaults via the Teradata Connector configuration files (optional)
Copy the following files from the /usr/lib/tdch/<version/conf
directory into Hadoop's conf directory, set the default
values in the template, and rename the files to remove ".template"
suffix. The parameters in the configuration file will be used as
default settings for Teradata Connector jobs.
Note : I tried with both modifying the required properties in the above two xml's and with any modifications.
5. Modify mapred.child.java.opts property value in mapred-site.xml on
client node and add the following option (optional)
Alternatively, user can also include the following JVM option in
Teradata Connector job command:
6. Modify the following OS networking values on the name node.
/proc/sys/net/ipv4/tcp_max_syn_backlog to 2048
7. Install TDCH into HDFS such that Teradata DataMover and
Teradata Studio can launch TDCH jobs remotely
Navigate to the /usr/lib/tdch/<version>/scripts directory,
and run the configureOozie.sh script, providing the hostname
or IP address of the node running the NameNode process as an
./configureOozie.sh nn=<NameNode hostname or IP address>
The configureOozie.sh script supports many more command line
arguments; run the script without arguments for details.
After having done all this I was able to create a Hadoop Connection Profile using Hive as User. But I am unable to execute the LOAD_TO_HCATALOG and LOAD_FROM_HCATALOG connectors.
Below are the syntax for the queries executed on Teradata Studio 15.00.01
select * from SYSLIB.LOAD_TO_HCATALOG (
(select * from UDA.TEST)
) as A;
) AS A;
For the above query I get an error saying "syntax error expecting something like a ; between LOAD_TO_HCATALOG and ‘(‘ "
I have referred to the Teradata Query Grid document and other internet sources to ensure the syntax is correct. Please guide on what is going wrong here. Am I missing some configurations needed to make the data movement possible ?
Aastha, You are confusing two different products. Teradata Studio does not use SQL-H when transfering data between Teradata and Hadoop. If you want to transfer data between Teradata and Hadoop with Studio you would use the Smart Loader for Hadoop, a feature within Teradata Studio. Please refer to the article (http://developer.teradata.com/tools/articles/smart
If you want to run SQL-H to transfer data from Teradata and Hadoop, this is a different product. Please post your question to the Extensibility Forum.
Yes I agree that I am confused. But if I check the post "SQL-H: Teradata to HDP: License Required?" on developer exchange I see the SQL-H queries being run from Teradata Studio itself.
I have installed TDCH on studio and what studio can do is do a data transfer using export, import and plugin command line. But at the same time I see load_from_hcatalog and load_to_hcatalog being run from Studio which is what confuses me.
Aastha, load_from_hcatalog() and load_to_hcatalog() are SQL-H functions. From Teradata Studio, you can open a SQL Editor window and execute SQL commands that call database functions. These commands are being executed by Teradata Database. Teradata Studio passed the SQL statement, via the JDBC driver, to the Teradata Database to be executed. Like I mentioned above, if you want to transfer data between Teradata and Hadoop using Studio functionality than you would use the Smart Loader for Hadoop.
I have extracted the below files in my Windows machine but unable to copy them to Hadoop's lib directory.
What is the command or procedure I should follow?
Swati, Please follow the instructions in the Readme on our Teradata Studio download page. As for moving files from Windows machine to Hadoop, you can use PUTTY or WinSCP.
Thanks for the reply. I have installed TDCH as mentioned in the Readme.
Now, I am trying to import the data from TD to Hadoop using command line. But I am getting the following error :
FIle -url does not exist.
I am giving the following url : -url jdbc:teradata://dbc/database=students.
Swati, For command line usage of TDCH, please post issues to the Extensibility Forum (http://forums.teradata.com/forum/extensibility) or comment on the article (http://developer.teradata.com/connectivity/article