Teradata Connector for Hadoop Now Available

Connectivity
Connectivity covers the mechanisms for connecting to the Teradata Database, including driver connectivity via JDBC or ODBC.
Teradata Employee

Re: Teradata Connector for Hadoop now available

indra91,

Do you have a support contract with Teradata for your TD system?  If so, you should open up incidents for help instead of asking on this article.

Best regards,

Sean

Enthusiast

Re: Teradata Connector for Hadoop now available

Hi Sean,
I am not sure whether I have a support contract with Teradata. As far as I can see, this is a forum for discussion, and many queries have been asked and answered here. I could not find this information in the user guide and tutorial that ship with TDCH. It would be helpful if somebody could provide some insight.
Teradata Employee

Re: Teradata Connector for Hadoop now available

indra91,

Here are some answers to your questions.

1.  TDCH 1.4.4 (the latest release) supports many different methods for exporting data from Teradata.  You can see these in section 1.4.5 of the tutorial.  TDCH 1.5 (scheduled for release in Q3 of this year) will add FastExport support.

2.  TDCH is free to use.

3.  HBase is not supported as a target.

4.  Compression is not supported at the moment.

Hope this helps.

Best regards,

Sean

Enthusiast

Re: Teradata Connector for Hadoop now available

Hi Sean,
Thanks a lot for your input; it was really helpful. For a use case where we need to offload a huge volume of data (terabytes) between Teradata systems and Hadoop, could you suggest which would be the faster option for extracting the data: the native FastExport utility or TDCH?
Teradata Employee

Re: Teradata Connector for Hadoop now available

indra91,

I would suggest waiting for the TDCH 1.5 release and using FastExport within TDCH, which will give you the best results.

Best regards,

Sean

Not applicable

Re: Teradata Connector for Hadoop now available

I got the following error while importing a table from Teradata to Hadoop:

16/07/03 09:52:15 INFO mapreduce.TeradataInputProcessor: job setup ends at 1467564735483
16/07/03 09:52:15 INFO mapreduce.TeradataInputProcessor: job setup time is 10s
16/07/03 09:52:15 ERROR tool.TeradataImportTool: com.teradata.hadoop.exception.TeradataHadoopException: java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 15.00.00.20] [Error 1277] [SQLState 08S01] Login timeout for Connection to 153.65.248.183 Sun Jul 03 09:52:15 PDT 2016 socket orig=X.X.X.X cid=65efb4be sess=0 java.net.SocketTimeoutException: connect timed out  at java.net.PlainSocketImpl.socketConnect(Native Method)  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)  at java.net.Socket.connect(Socket.java:529)  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1216)
        at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:94)
        at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:69)
        at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeIoJDBCException(ErrorFactory.java:207)
        at com.teradata.jdbc.jdbc_4.util.ErrorAnalyzer.analyzeIoError(ErrorAnalyzer.java:59)
        at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF.createSocketConnection(TDNetworkIOIF.java:154)
        at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF.<init>(TDNetworkIOIF.java:133)
        at com.teradata.jdbc.jdbc.GenericTeradataConnection.getIO(GenericTeradataConnection.java:113)
        at com.teradata.jdbc.jdbc.GenericLogonController.run(GenericLogonController.java:98)
        at com.teradata.jdbc.jdbc_4.TDSession.<init>(TDSession.java:205)
        at com.teradata.jdbc.jdk6.JDK6_SQL_Connection.<init>(JDK6_SQL_Connection.java:35)
        at com.teradata.jdbc.jdk6.JDK6ConnectionFactory.constructSQLConnection(JDK6ConnectionFactory.java:25)
        at com.teradata.jdbc.jdbc.ConnectionFactory.createConnection(ConnectionFactory.java:179)
        at com.teradata.jdbc.jdbc.ConnectionFactory.createConnection(ConnectionFactory.java:169)
        at com.teradata.jdbc.TeraDriver.doConnect(TeraDriver.java:232)
        at com.teradata.jdbc.TeraDriver.connect(TeraDriver.java:158)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at com.teradata.hadoop.db.TeradataConnection.connect(TeradataConnection.java:336)
        at com.teradata.hadoop.mapreduce.TeradataProcessorBase.openConnection(TeradataProcessorBase.java:77)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.openConnection(TeradataInputProcessor.java:108)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.setup(TeradataInputProcessor.java:51)
        at com.teradata.hadoop.mapreduce.TeradataSplitByHashInputProcessor.setup(TeradataSplitByHashInputProcessor.java:51)
        at com.teradata.hadoop.job.TeradataHdfsFileImportJob.runJob(TeradataHdfsFileImportJob.java:207)
        at com.teradata.hadoop.tool.TeradataJobRunner.runImportJob(TeradataJobRunner.java:119)
        at com.teradata.hadoop.tool.TeradataImportTool.run(TeradataImportTool.java:41)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:464)
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1216)
        at com.teradata.hadoop.mapreduce.TeradataProcessorBase.openConnection(TeradataProcessorBase.java:80)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.openConnection(TeradataInputProcessor.java:108)
        at com.teradata.hadoop.mapreduce.TeradataInputProcessor.setup(TeradataInputProcessor.java:51)
        at com.teradata.hadoop.mapreduce.TeradataSplitByHashInputProcessor.setup(TeradataSplitByHashInputProcessor.java:51)
        at com.teradata.hadoop.job.TeradataHdfsFileImportJob.runJob(TeradataHdfsFileImportJob.java:207)
        at com.teradata.hadoop.tool.TeradataJobRunner.runImportJob(TeradataJobRunner.java:119)
        at com.teradata.hadoop.tool.TeradataImportTool.run(TeradataImportTool.java:41)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:464)

I think I am not able to connect to the Teradata database, although I checked that I am able to ping the host.
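A successful ping only proves ICMP reachability; a JDBC login timeout like the one above usually means the database's TCP port (1025 by default for Teradata) is blocked by a firewall or routing rule between the Hadoop nodes and the database. A minimal, self-contained check along these lines can rule that out (the hostname and port below are placeholders, not values from this thread):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures alike.
        return False

# Example: check the Teradata JDBC port on the target system.
# print(tcp_reachable("dbc.example.com", 1025))
```

Note that this must be run from the same machines that launch the TDCH mappers, since firewall rules often differ per host.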

Not applicable

Re: Teradata Connector for Hadoop now available

Hi, I've downloaded TDCH 1.4.4.  I'm trying to connect an IBM BI 3.1 (Hadoop 2.2) cluster to a TD 14.10 database.  I have read through the TDCH 1.4 tutorial included in this article, but I noticed it doesn't include any details on the Java API.  Can I get a copy of the Java API documentation?  Also, can you tell me whether leveraging the Java API will still allow me to use TD Wallet, or is TD Wallet only compatible with the command-line version of TDCH?

Thanks,

Jonathan

Re: Teradata Connector for Hadoop now available

Hi, I am trying to import/export between Teradata and Hive using TDCH, but I am hitting the issue below: the CHAR, VARCHAR, and DATE data types are not supported in TDCH. As a workaround, if we change those data types to STRING, the problem goes away.

INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1473252737445
16/09/07 14:52:17 INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6516039745100009834/teradata.connector.plugins.xml
16/09/07 14:52:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://el3207.bc:9083
16/09/07 14:52:18 INFO hive.metastore: Connected to metastore.
16/09/07 14:52:18 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at:  1473252738715
16/09/07 14:52:19 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at:  1473252738715
16/09/07 14:52:19 INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
16/09/07 14:52:19 INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: CHAR(6) Field data type is not supported
        at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
        at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at com.teradata.connector.common.tool.ConnectorExportTool.main(ConnectorExportTool.java:780)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/09/07 14:52:19 INFO tool.ConnectorExportTool: job completed with exit code 14006

If anyone has a solution for this, please share.
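For reference, the STRING workaround mentioned above can be sketched in Hive as a staging table built before running the TDCH export (all table and column names below are hypothetical, not from this job):

```sql
-- Hypothetical staging table: expose the unsupported CHAR/VARCHAR/DATE
-- columns as STRING, which TDCH accepts; Teradata can convert the
-- string values back to the target column types on load.
CREATE TABLE my_table_staging STORED AS ORC AS
SELECT
    CAST(char_col    AS STRING) AS char_col,
    CAST(varchar_col AS STRING) AS varchar_col,
    CAST(date_col    AS STRING) AS date_col,
    other_col
FROM my_table;
```

The TDCH export is then pointed at the staging table instead of the original one.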

Thanks

Re: Teradata Connector for Hadoop now available

Hello!

 

We are trying to use the Teradata connector for Hadoop. We are launching the job from a client host (let's call it the edge node) where there is just a Hadoop client, and it sends the job to a Hadoop cluster in the backend. This is a node in the frontend, which should not have a connection to Teradata.

 

But when we try to use the Teradata connector for Hadoop, the edge node tries to connect directly to Teradata. We assumed the nodes of the Hadoop cluster were the only ones that needed connectivity to Teradata, but to our surprise the client on the edge node tries to open a connection directly to Teradata before starting the MR jobs. We need help with this, please.

 

 

THANK YOU VERY MUCH!

Visitor

Re: Teradata Connector for Hadoop Now Available

Hi,

 

I am trying to export a Hive table (date and timestamp columns) to a Teradata table (date and timestamp(0) columns), but it is failing. Please let me know what the workaround is or what I am doing wrong. Your prompt reply will be highly appreciated.

 

Source Hive table detail

========================================================

 hive> desc account_profession;
OK
acc_number string 
acc_fname string 
acc_lname string 
acc_balances double 
acc_dates date 
acc_date_time timestamp 
acc_prof string 
acc_prof_type string 
Time taken: 0.327 seconds, Fetched: 8 row(s)
hive> select * from account_profession;
OK
0123456789 Pawan KUMAR 1.2345678E7 1999-05-13 1999-05-13 03:58:10 Engineer Software
0123456790 Sachin Tendulkar 1.2345678E7 2000-12-20 2000-12-20 03:58:10 Player Cricket
0123456792 Tom Alter 1.2345678E7 2006-06-03 2006-06-03 03:58:10 Actor Bollywood
Time taken: 0.137 seconds, Fetched: 3 row(s)

 

Target Teradata table schema

=========================================================

CREATE Table icdm.account_profession(
acc_number varchar(15),
acc_fname varchar(25),
acc_lname varchar(25),
acc_balances double precision,
acc_dates date format 'yyyy-mm-dd',
acc_date_time timestamp(0),
acc_prof varchar(50),
acc_prof_type varchar(50)) primary index(acc_number) ;

 

 

Export console output

========================================================

[cloudera@quickstart ~]$ hadoop jar /usr/lib/tdch/1.5/lib/teradata-connector-1.5.3.jar com.teradata.connector.common.tool.ConnectorExportTool -libjars $LIB_JARS -url jdbc:teradata://192.168.50.130/database=icdmuser -username icdmuser -password icdmuser -classname com.teradata.jdbc.TeraDriver -sourcedatabase poc -sourcetable account_profession -nummappers 1 -targettable account_profession -separator '\u0001' -jobtype hive -method internal.fastload -fileformat orc
17/07/27 11:56:32 INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1501181792476
17/07/27 11:56:35 INFO common.ConnectorPlugin: load plugins in jar:file:/usr/lib/hadoop/lib/teradata-connector-1.5.3.jar!/teradata.connector.plugins.xml
17/07/27 11:56:36 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
17/07/27 11:56:37 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
17/07/27 11:56:37 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/07/27 11:56:37 INFO hive.metastore: Connected to metastore.
17/07/27 11:56:40 INFO processor.TeradataOutputProcessor: output preprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor starts at: 1501181800618
17/07/27 11:56:42 INFO utils.TeradataUtils: the output database product is Teradata
17/07/27 11:56:42 INFO utils.TeradataUtils: the output database version is 15.0
17/07/27 11:56:42 INFO utils.TeradataUtils: the jdbc driver version is 16.0
17/07/27 11:56:42 INFO processor.TeradataOutputProcessor: the teradata connector for hadoop version is: 1.5.3
17/07/27 11:56:42 INFO processor.TeradataOutputProcessor: output jdbc properties are jdbc:teradata://192.168.50.130/database=icdmuser
17/07/27 11:56:43 INFO processor.TeradataInternalFastloadProcessor: output staging table is not needed
17/07/27 11:56:43 INFO processor.TeradataOutputProcessor: the number of mappers are 1
17/07/27 11:56:43 INFO processor.TeradataOutputProcessor: output preprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor ends at: 1501181803839
17/07/27 11:56:43 INFO processor.TeradataOutputProcessor: the total elapsed time of output preprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor is: 3s
17/07/27 11:56:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/07/27 11:56:45 INFO teradata.TeradataInternalFastloadOutputFormat: user provided number of Mappers is NOT overridden by [2] DBS.
17/07/27 11:56:45 INFO input.FileInputFormat: Total input paths to process : 1
17/07/27 11:56:45 INFO input.FileInputFormat: Total input paths to process : 1
17/07/27 11:56:45 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
17/07/27 11:56:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/07/27 11:56:45 WARN mapred.ResourceMgrDelegate: getBlacklistedTrackers - Not implemented yet
17/07/27 11:56:45 INFO Configuration.deprecation: mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
17/07/27 11:56:45 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/07/27 11:56:45 INFO teradata.TeradataInternalFastloadOutputFormat: started load task: 1
17/07/27 11:56:49 INFO input.FileInputFormat: Total input paths to process : 1
17/07/27 11:56:49 INFO input.FileInputFormat: Total input paths to process : 1
17/07/27 11:56:49 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
17/07/27 11:56:49 INFO mapreduce.JobSubmitter: number of splits:1
17/07/27 11:56:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501165924924_0011
17/07/27 11:56:50 INFO impl.YarnClientImpl: Submitted application application_1501165924924_0011
17/07/27 11:56:50 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1501165924924_0011/
17/07/27 11:56:50 INFO mapreduce.Job: Running job: job_1501165924924_0011
17/07/27 11:57:13 INFO mapreduce.Job: Job job_1501165924924_0011 running in uber mode : false
17/07/27 11:57:14 INFO mapreduce.Job: map 0% reduce 0%
17/07/27 11:57:36 INFO teradata.TeradataInternalFastloadOutputFormat$InternalFastloadCoordinator: USING "acc_number" (VARCHAR(15)), "acc_fname" (VARCHAR(25)), "acc_lname" (VARCHAR(25)), "acc_balances" (DOUBLE PRECISION), "acc_dates" (DATE), "acc_date_time" (CHAR(26)), "acc_prof" (VARCHAR(50)), "acc_prof_type" (VARCHAR(50)) INSERT INTO "account_profession" ( "acc_number", "acc_fname", "acc_lname", "acc_balances", "acc_dates", "acc_date_time", "acc_prof", "acc_prof_type" ) VALUES ( :"acc_number", :"acc_fname", :"acc_lname", :"acc_balances", :"acc_dates", :"acc_date_time", :"acc_prof", :"acc_prof_type" )
17/07/27 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
17/07/27 11:57:40 INFO mapreduce.Job: Job job_1501165924924_0011 completed successfully
17/07/27 11:57:40 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133496
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1564
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=22924
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=22924
Total vcore-seconds taken by all map tasks=22924
Total megabyte-seconds taken by all map tasks=23474176
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=178
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=207
CPU time spent (ms)=3610
Physical memory (bytes) snapshot=136704000
Virtual memory (bytes) snapshot=1503948800
Total committed heap usage (bytes)=60882944
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/07/27 11:57:40 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor starts at: 1501181860908
17/07/27 11:57:41 WARN processor.TeradataInternalFastloadProcessor: error table "account_profession_ERR_1" is not empty
java.lang.Throwable
at com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor.cleanupDatabaseEnvironment(TeradataInternalFastloadProcessor.java:365)
at com.teradata.connector.teradata.processor.TeradataOutputProcessor.outputPostProcessor(TeradataOutputProcessor.java:72)
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:172)
at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.teradata.connector.common.tool.ConnectorExportTool.main(ConnectorExportTool.java:853)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
17/07/27 11:57:41 INFO processor.TeradataInternalFastloadProcessor: error table "account_profession_ERR_2" was dropped
17/07/27 11:57:41 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor ends at: 1501181860908
17/07/27 11:57:41 INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataInternalFastloadProcessor is: 1s
17/07/27 11:57:41 INFO tool.ConnectorExportTool: ConnectorExportTool ends at 1501181861941
17/07/27 11:57:41 INFO tool.ConnectorExportTool: ConnectorExportTool time is 69s
17/07/27 11:57:41 INFO tool.ConnectorExportTool: job completed with exit code 0

 

Teradata error table detail

========================================================

BTEQ -- Enter your SQL request or BTEQ command:
sel ErrorCode, ErrorFieldName from ICDMUSER.account_profession_err_1;

sel ErrorCode, ErrorFieldName from ICDMUSER.account_profession_err_1;

*** Query completed. 3 rows found. 2 columns returned.
*** Total elapsed time was 1 second.

ErrorCode ErrorFieldName
--------- -----------------------------------------------------------------
2673 acc_prof
2673 acc_prof
2673 acc_prof

BTEQ -- Enter your SQL request or BTEQ command:
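(Error 2673 is Teradata's "the source parcel length does not match data that was defined," i.e. the loaded value could not be converted to the target column's definition. The FastLoad-style _ERR_1 table also keeps the rejected row image in its DataParcel column, which can help pinpoint the bad value; the query below reuses the error table name from this run:)

```sql
-- Inspect the rejected rows captured by the FastLoad error table.
-- DataParcel holds the raw row image that failed conversion.
SELECT ErrorCode, ErrorFieldName, DataParcel
FROM ICDMUSER.account_profession_err_1;
```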

 

 

Thanks in advance,

Pawan Kumar