The Teradata Connector for Hadoop (TDCH) is a MapReduce application that supports high-performance, parallel, bi-directional data movement between Teradata systems and various Hadoop ecosystem components.
The Teradata Connector for Hadoop (Command Line Edition) is freely available and provides the following capabilities:
For more detailed information on the Teradata Connector for Hadoop, see the attached Tutorial document and the README file in the appropriate TDCH download package. The download packages are for use on commodity hardware; on Teradata Hadoop Appliance hardware, TDCH is distributed with the appliance. Teradata CS supports TDCH in certain situations where the user is a Teradata customer.
For more information about Hadoop Product Management (PM), Teradata employees can go to Teradata Connections Hadoop PM.
Native Sqoop does not understand Teradata architecture or parallelism, so using native Sqoop to transfer data between Teradata and Hadoop would result in a point-to-point (single Teradata node to single Hadoop node) transfer with very poor performance.
TDCH enables parallel bi-directional data transfers between multiple Teradata nodes in a given Teradata system and multiple Hadoop nodes in a given Hadoop system and also provides access to some data transfer features unique to Teradata systems.
While TDCH does not need Sqoop to run, Cloudera, Hortonworks and MapR have provided Sqoop wrappers over TDCH for customers who need Sqoop-to-Teradata capability in their workflows.
I get this error when trying to run an import from Teradata to HDFS. Any idea how to fix this?
hadoop com.teradata.hadoop.tool.TeradataImportTool -libjars $LIB_JARS -url jdbc:teradata://myserver/database=mydb -username user -password password -jobtype hdfs -sourcetable example1_td -nummappers 1 -separator ',' -targetpaths /user/mapred/ex1_hdfs -method split.by.hash -splitbycolumn c1
I got the following error message when running the command to import data from Teradata to Hadoop.
The command is:
hadoop com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://pitd/DATABASE=hadoop_user1 -username hadoop_user1 -password hadoop_user1 -jobtype hdfs -fileformat textfile -method split.by.hash -separator "," -sourcetable sales_transaction -targetpaths /user/davidd/sales_transaction
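Both commands above pass the same core options. The following is a minimal bash sketch, an illustration rather than an official TDCH wrapper, that collects those options in an array so values such as the "," separator are quoted consistently. The function name `tdch_import_to_hdfs` and the `TD_*`, `LIB_JARS` and `HADOOP` variables are hypothetical; only flags that already appear in the two commands are used, and `HADOOP=echo` turns it into a dry run that prints the assembled command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TDCH): builds a TeradataImportTool call
# using only flags that appear in the two import commands above.
# Connection details come from hypothetical TD_* variables; set HADOOP=echo
# to print the command instead of running it.
tdch_import_to_hdfs() {
  local table="$1" target="$2" split_col="${3:-}"
  local args=(com.teradata.hadoop.tool.TeradataImportTool)

  # Generic Hadoop options such as -libjars go right after the class name.
  if [ -n "${LIB_JARS:-}" ]; then
    args+=(-libjars "$LIB_JARS")
  fi

  # Note: a password passed on the command line may be visible to other
  # users in the process list.
  args+=(
    -url "jdbc:teradata://${TD_HOST:?set TD_HOST}/database=${TD_DB:?set TD_DB}"
    -username "${TD_USER:?set TD_USER}"
    -password "${TD_PASS:?set TD_PASS}"
    -jobtype hdfs
    -fileformat textfile
    -sourcetable "$table"
    -targetpaths "$target"
    -separator ','
    -nummappers "${TD_MAPPERS:-1}"
    -method split.by.hash
  )

  # Only pass -splitbycolumn when a column name was supplied.
  if [ -n "$split_col" ]; then
    args+=(-splitbycolumn "$split_col")
  fi

  "${HADOOP:-hadoop}" "${args[@]}"
}
```

For example, `HADOOP=echo TD_HOST=myserver TD_DB=mydb TD_USER=user TD_PASS=password tdch_import_to_hdfs example1_td /user/mapred/ex1_hdfs c1` prints the command for inspection; dropping `HADOOP=echo` runs it.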
My requirement is to process huge volumes of data in Hadoop (HDFS), do transformations, and copy the transformed data to Teradata.
Can you please share the user manual for moving data from HDFS to Teradata using the Java API or Sqoop? Thanks in advance.
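As a hedged starting point for the HDFS-to-Teradata direction, here is a bash sketch modeled on the import commands on this page. The class name `com.teradata.hadoop.tool.TeradataExportTool`, the `-sourcepaths` and `-targettable` flags and the `batch.insert` method name are assumptions recalled from the TDCH tutorial, not taken from this page; check them against the README for your TDCH version. The function name and the `TD_*`, `LIB_JARS`, `TD_EXPORT_METHOD` and `HADOOP` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: HDFS text files -> Teradata table via TDCH.
# ASSUMPTIONS (verify in the TDCH README/Tutorial for your version): the class
# com.teradata.hadoop.tool.TeradataExportTool, the -sourcepaths and
# -targettable flags, and the batch.insert method name.
# Set HADOOP=echo to print the command instead of running it.
tdch_export_from_hdfs() {
  local source_dir="$1" table="$2"
  local args=(com.teradata.hadoop.tool.TeradataExportTool)

  # Generic Hadoop options such as -libjars go right after the class name.
  if [ -n "${LIB_JARS:-}" ]; then
    args+=(-libjars "$LIB_JARS")
  fi

  args+=(
    -url "jdbc:teradata://${TD_HOST:?set TD_HOST}/database=${TD_DB:?set TD_DB}"
    -username "${TD_USER:?set TD_USER}"
    -password "${TD_PASS:?set TD_PASS}"
    -jobtype hdfs
    -sourcepaths "$source_dir"
    -targettable "$table"
    -separator ','
    -nummappers "${TD_MAPPERS:-1}"
    -method "${TD_EXPORT_METHOD:-batch.insert}"
  )

  "${HADOOP:-hadoop}" "${args[@]}"
}
```

The source directory would typically be the output folder of the job that did the transformations, and the separator must match the delimiter those files were written with.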
I am trying to understand TDCH better and want to know whether I can use the utility to load a non-empty Teradata table from a Hive table, or whether I have to stage the data in an empty table first (because the method is FastLoad) and then load it into the final target.
Issue resolved by setting the internal flag of the Teradata 104 to true.
Flag value explained by Mark Li:
There is a flag “AcceptReplacementCharacters” in the database, and its meaning is:
AcceptReplacementCharacters - Allow invalid characters to be converted to the replacement character, rather than rejected.
FALSE (default): do not accept the replacement character; return an error.
TRUE: accept the replacement character.
A tpareset is required for the change to take effect.
Also, either run as "hdfs" (su hdfs) or make sure the mapred user has write access to the output target folder '/user/mapred', which must already exist. The Teradata Hadoop VM template did not seem to create this folder (there was a '/mapred' folder instead), so you have to create it yourself and grant the proper write permission. To list and check the folder contents, run:
hadoop fs -ls /user
hadoop fs -ls /
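A bash sketch of the folder setup described above, assuming `sudo` access, an HDFS superuser named `hdfs`, and Hadoop 2.x, where `hadoop fs -mkdir` needs `-p` to create missing parents (older releases may not accept `-p`). The function name and the `SUDO`/`HADOOP` override variables are hypothetical, added only so the commands can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: create an HDFS home folder for the job user and give that user
# ownership, running the HDFS commands as the "hdfs" superuser.
# Assumptions: sudo access, HDFS superuser named "hdfs", Hadoop 2.x
# ("-mkdir -p"). SUDO and HADOOP can be overridden for a dry run.
prepare_hdfs_user_dir() {
  local user="${1:-mapred}"
  local dir="/user/${user}"
  local hdfs_cmd=("${SUDO:-sudo}" -u hdfs "${HADOOP:-hadoop}" fs)

  # Create the folder, including any missing parents.
  "${hdfs_cmd[@]}" -mkdir -p "$dir" || return 1
  # Make the job user the owner so it can write output beneath it.
  "${hdfs_cmd[@]}" -chown "$user" "$dir" || return 1
  # List /user to confirm the folder exists with the new owner.
  "${hdfs_cmd[@]}" -ls /user
}
```

Running `prepare_hdfs_user_dir mapred` once should be enough for a target path such as /user/mapred/ex1_hdfs; the job itself then creates the final output folder.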
I was unable to find any documentation on the "-conf <conf file>" option below.
Hi, I'm Mark, from Teradata Connector Sustaining Team.
"-conf" option is used for hadoop command exclusively. User can set their hadoop options in a file, and use "-conf" to specify this file to pass all of his options to hadoop. You can refer to hadoop manual for more information.
I have been looking at ways to do a batch data export from Teradata to Hadoop.
We used the Sqoop connector, but the performance was not as good as native FastExport to a flat file.
Are there any performance benchmarks available for the Sqoop connector?
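No benchmark figures are given here. One way to gather your own is to time the TDCH or Sqoop job and the FastExport run against the same table and compute throughput. The bash sketch below uses hypothetical function names and measures wall-clock seconds only, so run each method more than once on a quiet system before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing transfer methods on your own system.

# Run a command, report wall-clock seconds and exit status on stderr,
# and return the command's exit status.
time_transfer() {
  local label="$1"; shift
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  printf '%s: %d s (exit %d)\n' "$label" "$((end - start))" "$status" >&2
  return "$status"
}

# Whole MB/s (rounded down, MB = 2^20 bytes) from a byte count and seconds.
throughput_mb_s() {
  local bytes="$1" secs="$2"
  if [ "$secs" -le 0 ]; then secs=1; fi   # avoid division by zero
  echo $(( bytes / (secs * 1048576) ))
}
```

For the byte count, the first field printed by `hadoop fs -du -s <target path>` gives the size of the transferred data in bytes.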