TDCH Batch.Insert is faster than internal.fastload

Teradata Employee

Hello, I've been searching all over but can't seem to find an answer to this question. We are trying to tune a TDCH transfer job from Hive to Teradata 15. According to the documentation, internal.fastload should perform better than batch.insert, but in our case it is taking almost twice as long to finish. Any help would be appreciated. Thanks!

Here's the code we used for batch.insert.

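# USERLIBTDCH points at the TDCH connector jar; LIB_JARS and HADOOP_CLASSPATH list the Hive/CDH jars TDCH needs at run time.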
export USERLIBTDCH=/usr/lib/tdch/1.5/lib/teradata-connector-1.5.3.jar
export LIB_JARS=/etc/hive/conf.cloudera.hive,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/antlr-runtime-3.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/commons-dbcp-1.4.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/commons-pool-1.5.4.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-api-jdo-3.2.6.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-core-3.2.12.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-rdbms-3.2.12.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-cli-1.1.0-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-exec-1.1.0-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-jdbc-1.1.0-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-metastore-1.1.0-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/jdo-api-3.0.1.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/libfb303-0.9.2.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/libthrift-0.9.3.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-hcatalog-core-1.1.0-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/avro-tools-1.7.6-cdh5.7.5.jar,/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/avro-1.7.6-cdh5.7.5.jar
export HADOOP_CLASSPATH=/etc/hive/conf.cloudera.hive:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/antlr-runtime-3.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/commons-dbcp-1.4.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/commons-pool-1.5.4.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-api-jdo-3.2.6.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-core-3.2.12.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/datanucleus-rdbms-3.2.12.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-cli-1.1.0-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-exec-1.1.0-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-jdbc-1.1.0-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-metastore-1.1.0-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/jdo-api-3.0.1.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/libfb303-0.9.2.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/libthrift-0.9.3.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/hive-hcatalog-core-1.1.0-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/avro-tools-1.7.6-cdh5.7.5.jar:/opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/jars/avro-1.7.6-cdh5.7.5.jar
hadoop jar $USERLIBTDCH \
com.teradata.connector.common.tool.ConnectorExportTool \
-Dmapreduce.job.queuename=QUEUENAME \
-libjars $LIB_JARS \
-url jdbc:teradata://FULLHOSTNAME/database=DBNAME \
-username ${TDCH_USER} \
-password ${TDCH_PASS} \
-jobtype hive \
-sourcedatabase HIVESOURCEDB \
-sourcetable HIVESOURCETABLE \
-nummappers 150 \
-targettable TDTARGET \
-fileformat orcfile

For internal.fastload we used the same command as above, changing only the following parameters:

-method internal.fastload
-nummappers 42
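
For clarity, here is the complete internal.fastload invocation with those two changes folded into the batch.insert command above (same placeholder host, queue, and table names; to our understanding batch.insert is simply the default when -method is omitted, which is why the first command does not spell it out):

hadoop jar $USERLIBTDCH \
com.teradata.connector.common.tool.ConnectorExportTool \
-Dmapreduce.job.queuename=QUEUENAME \
-libjars $LIB_JARS \
-url jdbc:teradata://FULLHOSTNAME/database=DBNAME \
-username ${TDCH_USER} \
-password ${TDCH_PASS} \
-jobtype hive \
-method internal.fastload \
-sourcedatabase HIVESOURCEDB \
-sourcetable HIVESOURCETABLE \
-nummappers 42 \
-targettable TDTARGET \
-fileformat orcfile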

Not sure if it's relevant, but the internal.fastload job throws the following warnings in the log:

17/12/21 17:59:26 WARN mapred.ResourceMgrDelegate: getBlacklistedTrackers - Not implemented yet

17/12/21 17:59:28 WARN split.JobSplitWriter: Max block location exceeded for split: Paths:...;  splitsize: 13 maxsize: 10
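
As far as we can tell, the second warning is benign: JobSplitWriter emits it when a split has more block locations than mapreduce.job.max.split.locations allows (the Hadoop default of 10 matches the "maxsize: 10" in the message), which can happen when a split spans blocks on many nodes, for example with combined or well-replicated ORC files. If you want to suppress it, the limit can be raised with an extra -D option next to the queue-name one; 15 below is only an illustrative value, not a tuning recommendation.

-Dmapreduce.job.max.split.locations=15 \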

 

Target is a NOPI table on a 40-AMP system. When monitoring in Viewpoint, the internal.fastload job appears to run as a 1-AMP operation. I also tried different batch sizes, from the default up to 200,000, but the speed stays the same for both methods.
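
(For reference, we set the batch size with TDCH's -batchsize option, if we have the option name right; as far as we understand it only influences the JDBC-based batch.insert path, so it would not be expected to change internal.fastload throughput. The 200,000 case looked like this:)

-batchsize 200000 \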

 

Is there anything we're doing wrong?

2 REPLIES
Scout

Re: TDCH Batch.Insert is faster than internal.fastload

Hi, have you fixed this issue yet?
I have the same issue with TDCH 1.5.4.

Thanks,
Frank
Teradata Employee

Re: TDCH Batch.Insert is faster than internal.fastload

Did you get to the root cause of your issue? Was it because the loads were hitting data errors during the FastLoad process? That could explain why only a single session was showing up on the Teradata system.
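
One quick way to check is to count rows in the FastLoad error tables after a run. This is only a sketch: I believe TDCH derives the error-table names from the target table (e.g. TDTARGET_ERR_1 and TDTARGET_ERR_2) unless -errortablename overrides them, so take the exact names from your TDCH job log, and any SQL client will do if BTEQ is not on the edge node.

# Sketch: count rows in the (assumed) internal.fastload error tables via BTEQ.
bteq <<EOF
.LOGON FULLHOSTNAME/${TDCH_USER},${TDCH_PASS}
SELECT COUNT(*) FROM DBNAME.TDTARGET_ERR_1;
SELECT COUNT(*) FROM DBNAME.TDTARGET_ERR_2;
.LOGOFF
.QUIT
EOF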