Teradata Connector for Hadoop Now Available

Connectivity
Connectivity covers the mechanisms for connecting to the Teradata Database, including driver connectivity via JDBC or ODBC.
Teradata Employee

Re: Teradata Connector for Hadoop now available

Guys, I need help regarding my inquiry.

Teradata Employee

Re: Teradata Connector for Hadoop now available

Joseph - Talend and PowerCenter have their own methods for connecting to Hadoop and Teradata. Neither uses TDCH. You would need to work with each solution to figure out how they connect to Hadoop.

Since neither Talend nor PowerCenter uses TDCH, it should be possible to have TDCH on the same system as the Talend and PowerCenter connectors.

Teradata Employee

Re: Teradata Connector for Hadoop now available

Thanks Ariff

Re: Teradata Connector for Hadoop now available

Hi,

I get the following error when invoking TDCH from Oozie. However, the same command runs fine on the command line. What am I doing wrong?

Jun 19, 2014 4:54:38 PM com.teradata.hadoop.tool.TeradataImportTool main
INFO: TeradataImportTool starts at 1403211278263
Jun 19, 2014 4:54:38 PM com.teradata.hadoop.tool.TeradataImportTool main
SEVERE: java.lang.NoClassDefFoundError: org.apache.log4j.LogManager (initialization failure)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:140)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:270)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
at org.apache.hadoop.security.authentication.util.KerberosName.<clinit>(KerberosName.java:42)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:216)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:671)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:573)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:166)
at com.teradata.hadoop.tool.TeradataImportTool.run(TeradataImportTool.java:39)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:464)

Jun 19, 2014 4:54:38 PM com.teradata.hadoop.tool.TeradataImportTool main
INFO: job completed with exit code 10000
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Command that I used:

hadoop com.teradata.hadoop.tool.TeradataImportTool -url jdbc:teradata://TDDEV/DATABASE=XXXXXX -username XXXXXX -password XXXXXX -jobtype hive -fileformat textfile -sourcequery "select * from tablename" -targettable TMP_exp -method split.by.partition -hiveconf /biginsights/hive/conf/hive-site.xml -stagedatabase DB_WRK_HDOOP -stagetablename abcd

Re: Teradata Connector for Hadoop now available

I downloaded TDCH 1.3 and used it successfully to import from TD to Hive, and from TD to HDFS (CSV). However, when I try to import from TD to HDFS (Avro), I get the following exception in the MapReduce jobs:

Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

I'm using Hortonworks 2.1.  The log file is large, but there is no exception stack trace.  This is a portion of the log, which is repeated for each of the task attempts:

2014-06-24 15:29:05,324 FATAL [IPC Server handler 0 on 48577] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403546838254_0017_m_000000_1 - exited : Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-06-24 15:29:05,324 INFO [IPC Server handler 0 on 48577] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403546838254_0017_m_000000_1: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-06-24 15:29:05,326 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403546838254_0017_m_000000_1: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-06-24 15:29:05,329 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403546838254_0017_m_000000_1 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP

2014-06-24 15:29:05,330 INFO [ContainerLauncher #7] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1403546838254_0017_01_000006 taskAttempt attempt_1403546838254_0017_m_000000_1

2014-06-24 15:29:05,330 INFO [ContainerLauncher #7] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1403546838254_0017_m_000000_1

2014-06-24 15:29:05,333 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403546838254_0017_m_000000_1 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP

2014-06-24 15:29:05,334 INFO [CommitterEvent Processor #4] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT

2014-06-24 15:29:05,344 WARN [CommitterEvent Processor #4] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://hdp2.jri.revelytix.com:8020/user/jirwin/td_avro/_temporary/1/_temporary/attempt_1403546838254_0017_m_000000_1

2014-06-24 15:29:05,345 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403546838254_0017_m_000000_1 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED

2014-06-24 15:29:05,345 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved hdp2.jri.revelytix.com to /default-rack

2014-06-24 15:29:05,345 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403546838254_0017_m_000000_2 TaskAttempt Transitioned from NEW to UNASSIGNED

2014-06-24 15:29:05,346 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1403546838254_0017_m_000000_2 to list of failed maps

2014-06-24 15:29:06,236 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:2 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:6 ContRel:2 HostLocal:2 RackLocal:0

2014-06-24 15:29:06,248 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1403546838254_0017: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit=<memory:0, vCores:0> knownNMs=1

2014-06-24 15:29:06,249 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the blacklist for application_1403546838254_0017: blacklistAdditions=0 blacklistRemovals=1

2014-06-24 15:29:06,249 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1403546838254_0017_01_000005

2014-06-24 15:29:06,250 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403546838254_0017_m_000001_1: Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Re: Teradata Connector for Hadoop now available

With respect to my previous message, it appears that the problem is that HDP 2.1 is distributed with Hive 0.13, which does not ship the same Avro jars as previous versions of Hive.

To import/export Avro format using TDCH 1.3 on HDP 2.1, it is necessary to obtain avro-1.7.4.jar and avro-mapred-1.7.4-hadoop2.jar from a prior version of Hive, such as Hive 0.12. After obtaining those jar files and adding them to the HADOOP_CLASSPATH and the -libjars list, I was able to run TDCH 1.3 with Avro successfully.
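For reference, the classpath setup looked roughly like this. This is only a sketch: the jar paths below are assumptions, so point them at wherever your Hive 0.12 lib directory actually lives.

```shell
# Sketch of the workaround described above; the Hive 0.12 jar locations
# are assumptions -- adjust to your own install.
AVRO_JAR=/usr/lib/hive-0.12/lib/avro-1.7.4.jar
AVRO_MAPRED_JAR=/usr/lib/hive-0.12/lib/avro-mapred-1.7.4-hadoop2.jar

# HADOOP_CLASSPATH entries are colon-separated, while -libjars takes a
# comma-separated list, so build both forms.
export HADOOP_CLASSPATH="$AVRO_JAR:$AVRO_MAPRED_JAR:$HADOOP_CLASSPATH"
export LIB_JARS="$AVRO_JAR,$AVRO_MAPRED_JAR"
```

The TDCH command line then includes `-libjars $LIB_JARS`, so the map tasks load the Hive 0.12 Avro classes rather than the ones shipped with Hive 0.13.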

By contrast, using the jars from the HDP 2.1 Hive 0.13, TDCH produces the following exception on the task:

2014-06-24 15:48:16,384 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

    at org.apache.avro.mapreduce.AvroKeyOutputFormat.getRecordWriter(AvroKeyOutputFormat.java:87)

    at com.teradata.connector.hdfs.HdfsAvroOutputFormat.getRecordWriter(HdfsAvroOutputFormat.java:44)

    at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.<init>(ConnectorOutputFormat.java:84)

    at com.teradata.connector.common.ConnectorOutputFormat.getRecordWriter(ConnectorOutputFormat.java:33)

    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:624)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:744)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)

    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:415)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)

    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Re: Teradata Connector for Hadoop now available

I have a new issue with TDCH 1.3 command line edition and HCatalog.

Every time I attempt to import from TD into HCatalog, I get the following exception. I've tried several distributions (CDH 4.5, CDH 5.0 GA, and HDP 2.1), and they all produce the same exception. I previously used TDCH 1.1 on CDH 5.0 GA with no problems.

14/06/25 15:18:27 INFO tool.ConnectorImportTool: ConnectorImportTool starts at 1403723907881

14/06/25 15:18:28 INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-jirwin/hadoop-unjar3619433998773300551/teradata.connector.plugins.xml

14/06/25 15:18:28 INFO processor.TeradataInputProcessor: input preprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor starts at:  1403723908764

14/06/25 15:18:29 INFO utils.TeradataUtils: the input database product is Teradata

14/06/25 15:18:29 INFO utils.TeradataUtils: the input database version is 14.0

14/06/25 15:18:29 INFO utils.TeradataUtils: the jdbc driver version is 14.0

14/06/25 15:18:29 INFO processor.TeradataInputProcessor: the teradata connector for hadoop version is: 1.3

14/06/25 15:18:29 INFO processor.TeradataInputProcessor: input jdbc properties are jdbc:teradata://192.168.11.200/database=vmtest

14/06/25 15:18:29 INFO processor.TeradataInputProcessor: the number of mappers are 2

14/06/25 15:18:29 INFO processor.TeradataInputProcessor: input preprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor ends at:  1403723909868

14/06/25 15:18:29 INFO processor.TeradataInputProcessor: the total elapsed time of input preprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor is: 1s

14/06/25 15:18:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

14/06/25 15:18:30 INFO hive.metastore: Trying to connect to metastore with URI thrift://hdp2.jri.revelytix.com:9083

14/06/25 15:18:30 INFO hive.metastore: Connected to metastore.

14/06/25 15:18:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

14/06/25 15:18:30 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

14/06/25 15:18:31 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor starts at:  1403723911542

14/06/25 15:18:31 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor ends at:  1403723911542

14/06/25 15:18:31 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByAmpProcessor is: 0s

14/06/25 15:18:31 INFO tool.ConnectorImportTool: com.teradata.connector.common.exception.ConnectorException: java.lang.NullPointerException

    at org.apache.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:99)

    at com.teradata.connector.hcat.utils.HCatSchemaUtils.getTargetFieldsTypeName(HCatSchemaUtils.java:37)

    at com.teradata.connector.hcat.processor.HCatOutputProcessor.outputPreProcessor(HCatOutputProcessor.java:70)

    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:88)

    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:48)

    at com.teradata.connector.common.tool.ConnectorImportTool.run(ConnectorImportTool.java:57)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

    at com.teradata.connector.common.tool.ConnectorImportTool.main(ConnectorImportTool.java:694)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:606)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:103)

    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:48)

    at com.teradata.connector.common.tool.ConnectorImportTool.run(ConnectorImportTool.java:57)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

    at com.teradata.connector.common.tool.ConnectorImportTool.main(ConnectorImportTool.java:694)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:606)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Teradata Employee

Re: Teradata Connector for Hadoop now available

I am getting the following error while importing data from Teradata to Hadoop. Can anybody help, please?

java.io.FileNotFoundException: File -url does not exist.

at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:379)

at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:275)

at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:413)

at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:164)

at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:147)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)

at com.teradata.hadoop.tool.TeradataImportTool.main(TeradataImportTool.java:369)

13/05/09 17:29:41 INFO tool.TeradataImportTool: job completed with exit code 10000

Command:

hadoop com.teradata.hadoop.tool.TeradataImportTool -libjars $LIB_JARS -url jdbc:teradata://myserver/database=mydb -username user -password password -jobtype hdfs -sourcetable example1_td -nummappers 1 -separator ',' -targetpaths /user/mapred/ex1_hdfs -method split.by.hash -splitbycolumn c1

Teradata Employee

Re: Teradata Connector for Hadoop now available

@araghava

@hcnguyen

Yes, once TDCH is installed on Hadoop, it allows data transfers to and from TD. I used HDP 2.1 with TD 15 and Teradata Studio 15. First I installed TDCH in the HDP 2.1 /usr/lib directory and ran ./configureOozie there. More details here:

http://ahsannabi.wordpress.com/2014/09/16/free-hadoop-teradata-integration/

Enthusiast

Re: Teradata Connector for Hadoop now available

Hi All,

Can anybody help me with a link to the Java API documentation for TDCH? Please let me know.

Regards

Shekhar