Teradata Employee

Teradata Connector for Hadoop Now Available

The Teradata Connector for Hadoop (TDCH) is a MapReduce application that supports high-performance, parallel, bi-directional data movement between Teradata systems and various Hadoop ecosystem components.

Overview

The Teradata Connector for Hadoop (Command Line Edition) is freely available and provides the following capabilities:

  • End-user tool with its own CLI (Command Line Interface); a minimal invocation sketch follows this list.
  • Designed and implemented for the Hadoop user audience.
  • Provides a Java API, enabling integration by third parties as part of an end-user tool. Hadoop vendors such as Hortonworks, Cloudera, IBM, and MapR use TDCH's Java API in their respective Sqoop implementations, which are distributed and supported by the Hadoop vendors themselves. A Java API document is available upon request.
  • Includes an installation script that sets up TDCH so it can be launched remotely by Teradata Studio's Smart Loader for Hadoop and by Teradata DataMover. See the Teradata Studio and Teradata DataMover documentation for more information about those products.
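
As a quick illustration of the CLI, a minimal import invocation looks like the following sketch (the host, database, table, and path names are placeholders, not from any particular install, and the jar location depends on where TDCH is installed):

hadoop jar teradata-hadoop-connector.jar com.teradata.hadoop.tool.TeradataImportTool -url jdbc:teradata://<dbs-host>/database=<db> -username <user> -password <password> -jobtype hdfs -sourcetable <source_table> -targetpaths /user/<hadoop-user>/<source_table>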

Need Help?

For more detailed information on the Teradata Connector for Hadoop, please see the attached Tutorial document as well as the README file in the appropriate TDCH download package. The download packages are for use on commodity hardware; for Teradata Hadoop Appliance hardware, TDCH is distributed with the appliance. TDCH is supported by Teradata CS in certain situations where the user is a Teradata customer.

Teradata Connector for Hadoop 1.5.3 is now available.

For more information about Hadoop Product Management (PM), Teradata employees can go to Teradata Connections Hadoop PM.


Additional Information

Native Sqoop has no understanding of Teradata's architecture or parallelism, so using it to transfer data between Teradata and Hadoop results in a point-to-point data transfer (a single Teradata node to a single Hadoop node) with very poor performance.


TDCH enables parallel, bi-directional data transfers between multiple Teradata nodes in a given Teradata system and multiple Hadoop nodes in a given Hadoop system, and it also provides access to some data transfer features unique to Teradata systems.
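
For instance, a parallel export from HDFS into a Teradata table might look like the following sketch (the flag names mirror the import flags used later in this thread; the placeholders and mapper count are illustrative, so check the README for the authoritative option list):

hadoop jar teradata-hadoop-connector.jar com.teradata.hadoop.tool.TeradataExportTool -url jdbc:teradata://<dbs-host>/database=<db> -username <user> -password <password> -jobtype hdfs -sourcepaths /user/<hadoop-user>/staged_data -targettable <target_table> -separator ',' -nummappers 8

Here -nummappers controls how many map tasks, and therefore how many parallel sessions against the Teradata system, the job uses.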


While TDCH does not need Sqoop to run, Cloudera, Hortonworks and MapR have provided Sqoop wrappers over TDCH for customers who need Sqoop-to-Teradata capability in their workflows.
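
As a rough sketch of what such a wrapped transfer looks like (this assumes the vendor's Teradata connector package is installed so that Sqoop routes the job through TDCH; the connection details are placeholders):

sqoop import --connect jdbc:teradata://<dbs-host>/DATABASE=<db> --username <user> --password <password> --table <td_table> --target-dir /user/<hadoop-user>/<td_table> --num-mappers 8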

89 REPLIES
Enthusiast

Re: Teradata Connector for Hadoop now available

I get this error when trying to run an import from Teradata to HDFS. Any idea how to fix this?

13/05/09 17:29:40 INFO tool.TeradataImportTool: TeradataImportTool starts at 1368120580654
java.io.FileNotFoundException: File -url does not exist.
        at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:379)
        at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:275)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:413)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:164)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:369)
13/05/09 17:29:41 INFO tool.TeradataImportTool: job completed with exit code 10000

Command:

hadoop com.teradata.hadoop.tool.TeradataImportTool -libjars $LIB_JARS -url jdbc:teradata://myserver/database=mydb -username user -password password -jobtype hdfs -sourcetable example1_td -nummappers 1 -separator ',' -targetpaths /user/mapred/ex1_hdfs -method split.by.hash -splitbycolumn c1

Enthusiast

Re: Teradata Connector for Hadoop now available

Please ignore the above; it is resolved.
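
For readers who hit the same "File -url does not exist" error: Hadoop's GenericOptionsParser validates the comma-separated file list passed to -libjars, so if the $LIB_JARS variable expands to nothing, the next token (-url) is consumed as the file list and rejected. A plausible fix (the jar paths here are illustrative; point them at your actual TDCH and Teradata JDBC driver jars):

export LIB_JARS=/path/to/terajdbc4.jar,/path/to/tdgssconfig.jar

and then rerun the original command unchanged.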

Teradata Employee

Re: Teradata Connector for Hadoop now available

I got the following error message when running the command to import data from Teradata to Hadoop.

mapred@sdll6060:/> hadoop com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://pitd/DATABASE=hadoop_user1 -username hadoop_user1 -password hadoop_user1 -jobtype hdfs -fileformat textfile -method split.by.hash -separator "," -sourcetable sales_transaction -targetpaths /user/davidd/sales_transaction

13/05/30 13:51:06 INFO tool.TeradataImportTool: TeradataImportTool starts at 1369947066274
13/05/30 13:51:06 INFO mapreduce.TeradataInputProcessor: job setup starts at 1369947066893
13/05/30 13:51:08 INFO mapreduce.TeradataInputProcessor: job setup ends at 1369947068339
13/05/30 13:51:08 INFO mapreduce.TeradataInputProcessor: job setup time is 1s
com.teradata.hadoop.exception.TeradataHadoopSQLException: com.teradata.jdbc.jdbc_4.util.JDBCException: [Teradata Database] [TeraJDBC 14.10.00.21] [Error 6706] [SQLState HY000] The string contains an untranslatable character.
        at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:307)
        at com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:108)
        at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:321)
        at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:202)
        at com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:123)
        at com.teradata.jdbc.jdbc_4.statemachine.StatementController.run(StatementController.java:114)
        at com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:381)
        at com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:323)
        at com.teradata.jdbc.jdbc_4.TDStatement.doNonPrepExecuteQuery(TDStatement.java:311)
        at com.teradata.jdbc.jdbc_4.TDStatement.executeQuery(TDStatement.java:1087)
        at com.teradata.hadoop.db.TeradataConnection.getColumnDescsForTable(TeradataConnection.java:966)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.setupSchemaMapping(TeradataInputProcessor.java:215)
        at com.teradata.hadoop.mapreduce.TeradataSplitByHashInputProcessor.setupSchemaMapping(TeradataSplitByHashInputProcessor.java:85)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.setup(TeradataInputProcessor.java:53)
        at com.teradata.hadoop.mapreduce.TeradataSplitByHashInputProcessor.setup(TeradataSplitByHashInputProcessor.java:51)
        at com.teradata.hadoop.job.TeradataImportJob.runJob(TeradataImportJob.java:86)
        at com.teradata.hadoop.tool.TeradataJobRunner.runImportJob(TeradataJobRunner.java:119)
        at com.teradata.hadoop.tool.TeradataImportTool.run(TeradataImportTool.java:39)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:369)

13/05/30 13:51:08 INFO tool.TeradataImportTool: job completed with exit code 10000

mapred@sdll6060:/>

The command is:

hadoop com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://pitd/DATABASE=hadoop_user1 -username hadoop_user1 -password hadoop_user1 -jobtype hdfs -fileformat textfile -method split.by.hash -separator "," -sourcetable sales_transaction -targetpaths /user/davidd/sales_transaction

Re: Teradata Connector for Hadoop now available

My requirement is to process huge volumes of data in Hadoop (HDFS), apply transformations, and copy the transformed data to Teradata.

Can you please share the user manual for moving data from HDFS to Teradata using the Java API or Sqoop? Thanks in advance.

Re: Teradata Connector for Hadoop now available

Hi,

I am trying to understand TDCH better and wanted to know: would I be able to load a non-empty Teradata table from a Hive table using the utility, or would I have to stage the data in an empty table first (because the method is FastLoad) and then load it into the final target?

Thanks,

Anand

Teradata Employee

Re: Teradata Connector for Hadoop now available

Problem: com.teradata.hadoop.exception.TeradataHadoopSQLException: com.teradata.jdbc.jdbc_4.util.JDBCException: [Teradata Database] [TeraJDBC 14.10.00.21] [Error 6706] [SQLState HY000] The string contains an untranslatable character.

The issue was resolved by setting Teradata internal flag 104 (AcceptReplacementCharacters) to TRUE.

Flag value explained by Mark Li:

There is a flag "AcceptReplacementCharacters" in the database, and its meaning is:

AcceptReplacementCharacters - Allow invalid characters to be converted to the replacement character rather than rejected.

     FALSE (default): do not accept the replacement character; return an error.

     TRUE: accept the replacement character.

A tpareset is required for the change to take effect.

Also, either run as "hdfs" (su hdfs) or make sure mapred has write access to the output target folder '/user/mapred', and note that the folder has to exist. The Teradata Hadoop VM template did not seem to create this folder (there was only a '/mapred' folder), so you have to create it yourself and grant the proper write permission. To list and check folder contents, run:

hadoop fs -ls /user

or 

hadoop fs -ls /
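
If '/user/mapred' is missing, a minimal sketch for creating it (run as the HDFS superuser; the ownership choice here is an assumption, so adjust to your site's policy):

hadoop fs -mkdir /user/mapred

hadoop fs -chown mapred /user/mapred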

Enthusiast

Re: Teradata Connector for Hadoop now available

I was unable to find any documentation for the "-conf <conf file>" option shown below.

13/06/10 19:51:26 INFO tool.TeradataExportTool: TeradataExportTool starts at 1370919086995

hadoop jar teradata-hadoop-connector.jar

    com.teradata.hadoop.tool.TeradataExportTool

    [-conf <conf file>] (optional)

Can you please point me to a place that has more information, or provide more details here?

Thanks !

Teradata Employee

Re: Teradata Connector for Hadoop now available

Hi, I'm Mark from the Teradata Connector Sustaining Team.

The "-conf" option is a generic hadoop command option, not a TDCH-specific one. You can set your Hadoop options in a file and use "-conf" to pass them all to Hadoop at once. Refer to the Hadoop manual for more information.

Thanks!

Re: Teradata Connector for Hadoop now available

Hi,

I have been looking at ways to do a batch data export from Teradata to Hadoop.

We used the Sqoop connector, but the performance was not as good as a native FastExport to a flat file.

Are there any performance benchmarks available for the Sqoop connector?

Thanks!