tdch 1.5 export from TDW 15.10 to Hadoop

evening folk,

i am trying to export data from our TDW into our Hadoop instance. after going through our setup and install of tdch, i can issue this command and all works well to move the table into hdfs:

 

yarn jar $USERLIBTDCH \
   com.teradata.connector.common.tool.ConnectorImportTool \
   -libjars $LIB_JARS \
   -url jdbc:teradata://TDWSERVER/database=mydb,log=INFO \
   -username AAAA \
   -password BBBB \
   -sourcetable tableWithOneRow \
   -nummappers 1 \
   -separator ',' \
   -targetpath /user/hdfs/export \
   -method split.by.amp \
   -splitbycolumn c1

 

the environment variables are all set according to the readme in the tdch installation, and this works exactly as i would expect it to. i turned on the INFO log level to watch the steps the process goes through. the table is a trivial table of two columns: c1 is an int and c2 is a string. the table only has a single row in it.

if i then change this to a table containing 5.6 million rows, the process never spawns the hadoop jobs. in the debug info for the single-row table i see a 'select count(*) from tableWithOneRow' in the output, followed by a 'select * from tableWithOneRow', and then the hadoop jobs get fired off. when i switch to moving the 5.6 million row table into hadoop, i see the 'select count(*) from muchBiggerTable' followed by a 'select * from muchBiggerTable'. this is where the problem shows up: it never gets past that select statement to spawn the hadoop jobs.

i have waited over an hour on one of the test runs, but the TDW shows my session as being idle, and nothing is getting done on the hadoop side, as the data has not yet returned to the edge node where the tdch job is being run from.
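Rather than waiting an hour on each test run, the wait could be capped and the log kept for later inspection. The following is only a sketch under assumptions: it relies on the GNU coreutils `timeout` command being available on the edge node, and the function name `run_with_deadline` and the log file name are made up for illustration.

```shell
# run_with_deadline SECONDS LOGFILE COMMAND [ARGS...]
# Runs COMMAND under GNU coreutils `timeout`, writing stdout and stderr to LOGFILE.
# (hypothetical helper, not part of tdch)
run_with_deadline() {
  limit=$1
  logfile=$2
  shift 2
  timeout "$limit" "$@" >"$logfile" 2>&1
  status=$?
  if [ "$status" -eq 124 ]; then
    # 124 is the exit status `timeout` uses when the time limit was reached
    echo "gave up after ${limit}s; last log lines:" >&2
    tail -n 5 "$logfile" >&2
  fi
  return "$status"
}
```

It would be used by prefixing the existing command, for example `run_with_deadline 600 tdch.log yarn jar $USERLIBTDCH ...`. Note that `timeout` only stops the local client process; since the hang described here happens before any hadoop job is spawned, that should be enough for this test.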

 

i don't see this as being a network issue, as the TDW and hadoop (teradata appliances we have purchased) are in the same vlan. the data does need to traverse the customer lan, but this is a 10 gb connection between the systems. i can "accept" it if it is building up the result set from the select, but it seems like i am missing something here, or doing something incorrectly for the "larger" tables. i have tables 2 orders of magnitude larger that i had planned to migrate over, but i don't have that much time at this rate...

 

any help/advice would be appreciated

 

thanks

tom

 

2 REPLIES
SD2

Re: tdch 1.5 export from TDW 15.10 to Hadoop

You can try increasing the number of mappers, and I need to see the log of the execution.

Re: tdch 1.5 export from TDW 15.10 to Hadoop

i just realized i left off the '-jobtype hdfs \' parameter in the previous post.
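For reference, the invocation from the first post with the missing `-jobtype hdfs` parameter put back would look roughly like the sketch below. The function name `tdch_import_cmd` is made up for illustration, the position of `-jobtype` within the argument list is a guess, and the function only prints the command so it can be reviewed before anything is run. All values (AAAA, BBBB, TDWSERVER, mydb) are the placeholders from the post.

```shell
# Prints the TDCH import command from the first post, plus -jobtype hdfs.
# (hypothetical helper; it echoes the command instead of executing it)
tdch_import_cmd() {
  set -- \
    com.teradata.connector.common.tool.ConnectorImportTool \
    -libjars "${LIB_JARS:-}" \
    -url "jdbc:teradata://TDWSERVER/database=mydb,log=INFO" \
    -username AAAA \
    -password BBBB \
    -jobtype hdfs \
    -sourcetable tableWithOneRow \
    -nummappers 1 \
    -separator ',' \
    -targetpath /user/hdfs/export \
    -method split.by.amp \
    -splitbycolumn c1
  # replace "echo yarn" with "yarn" to actually launch the job
  echo yarn jar "${USERLIBTDCH:-}" "$@"
}

tdch_import_cmd
```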

  

if i understand the nummappers parameter, it only affects the number of map tasks (the parallelism) that do the hadoop populating. the hang is occurring before we get to that point, as (and i am only guessing here) the tdch connector first needs to determine how best to break up the workflow jobs. here is the log as i have it:

 

LIB_JARS=/usr/hdp/2.3.4.0-3485/sqoop/lib/avro-1.7.5.jar,/usr/hdp/2.3.4.0-3485/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar,/usr/hdp/2.3.4.0-3485/hive/conf,/usr/hdp/2.3.4.0-3485/hive/lib/antlr-2.7.7.jar,/usr/hdp/2.3.4.0-3485/hive/lib/antlr-runtime-3.4.jar,/usr/hdp/2.3.4.0-3485/hive/lib/commons-dbcp-1.4.jar,/usr/hdp/2.3.4.0-3485/hive/lib/commons-pool-1.5.4.jar,/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-cli.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-exec.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-jdbc.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-metastore.jar,/usr/hdp/2.3.4.0-3485/hive/lib/jdo-api-3.0.1.jar,/usr/hdp/2.3.4.0-3485/hive/lib/libfb303-0.9.2.jar,/usr/hdp/2.3.4.0-3485/hive/lib/libthrift-0.9.2.jar,/usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar

HADOOP_CLASSPATH=/usr/hdp/2.3.4.0-3485/sqoop/lib/avro-1.7.5.jar,/usr/hdp/2.3.4.0-3485/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar,/usr/hdp/2.3.4.0-3485/hive/conf:/usr/hdp/2.3.4.0-3485/hive/lib/antlr-2.7.7.jar:/usr/hdp/2.3.4.0-3485/hive/lib/antlr-runtime-3.4.jar:/usr/hdp/2.3.4.0-3485/hive/lib/commons-dbcp-1.4.jar:/usr/hdp/2.3.4.0-3485/hive/lib/commons-pool-1.5.4.jar:/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.3.4.0-3485/hive/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.3.4.0-3485/hive/lib/hive-cli.jar:/usr/hdp/2.3.4.0-3485/hive/lib/hive-exec.jar:/usr/hdp/2.3.4.0-3485/hive/lib/hive-jdbc.jar:/usr/hdp/2.3.4.0-3485/hive/lib/hive-metastore.jar:/usr/hdp/2.3.4.0-3485/hive/lib/jdo-api-3.0.1.jar:/usr/hdp/2.3.4.0-3485/hive/lib/libfb303-0.9.2.jar:/usr/hdp/2.3.4.0-3485/hive/lib/libthrift-0.9.2.jar:/usr/hdp/2.3.4.0-3485/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar

USERLIBTDCH=/usr/lib/tdch/1.5/lib/teradata-connector-1.5.0.jar
2016-11-04.07:41:21.477 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Teradata JDBC Driver 15.10.00.26
2016-11-04.07:41:21.656 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Hostname lookup for TDWSERVERcop1 took 2 ms and failed
2016-11-04.07:41:21.657 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Hostname lookup for TDWSERVER took 1 ms and found 4 address(es)
2016-11-04.07:41:21.659 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Attempting connection 1 to TDWSERVER/1.2.3.4:1025
2016-11-04.07:41:21.666 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Connection attempt to TDWSERVER/1.2.3.4:1025 with timeout 10000 ms took 6 ms and succeeded, waiting for thread took 0 ms
2016-11-04.07:41:21.669 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Config Request message, 111 bytes, time: 1 ms
2016-11-04.07:41:21.670 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Config Response message 1, 1186 bytes, time: 0 ms
2016-11-04.07:41:21.692 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Teradata Database 15.10.00.05
2016-11-04.07:41:21.693 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TdgssManager initialization took 0 ms, TdgssConfigApi initialization took 0 ms
2016-11-04.07:41:21.705 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 tdgss version: 16.0.0.0
2016-11-04.07:41:21.723 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TdgssContext.initSecContext took 7 ms
2016-11-04.07:41:21.724 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Assign Request message, 246 bytes, time: 0 ms
2016-11-04.07:41:21.729 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Assign Response message 1, 1110 bytes, time: 5 ms
2016-11-04.07:41:21.753 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TdgssContext.initSecContext took 24 ms
2016-11-04.07:41:21.754 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote SSO Request message, 332 bytes, time: 0 ms
2016-11-04.07:41:21.761 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read SSO Response message 1, 62 bytes, time: 7 ms
2016-11-04.07:41:21.775 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TeraEncrypt.encrypt: nQualityOfProtection=0 bPrivacy=true inBuf.length=693 offset=0 len=693 outToken.length=752 expansion=59
2016-11-04.07:41:21.775 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Connect Request message, 776 bytes, time: 0 ms
2016-11-04.07:41:21.784 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Connect Response message 1, 216 bytes, time: 9 ms
2016-11-04.07:41:21.785 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TeraEncrypt.decrypt: inBuf.length=192 offset=0 len=192 outToken.length=138 shrinkage=54
2016-11-04.07:41:21.789 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 TDSession constructor: DATABASE mydb
2016-11-04.07:41:21.799 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Start Request message, 119 bytes, time: 0 ms
2016-11-04.07:41:21.800 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 StatementReceiveState.action getState=3 nRemainingTime=0 nTimeoutInMs=0 this=com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState@5d9b7a8a(req#=0 stmt#=0 atype=0 acnt=0 currs=null ctlr=com.teradata.jdbc.jdbc_4.statemachine.StatementController@1e8ce150(sql=DATABASEmydb stmt=com.teradata.jdbc.jdk6.JDK6_SQL_Statement@604f2bd2(statecode=3 sess=com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788)))
2016-11-04.07:41:21.829 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Start Response message 1, 106 bytes, time: 28 ms
2016-11-04.07:41:21.854 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 executeStatement queryTimeout=0
2016-11-04.07:41:21.854 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Start Request message, 154 bytes, time: 0 ms
2016-11-04.07:41:21.854 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 StatementReceiveState.action getState=3 nRemainingTime=0 nTimeoutInMs=0 this=com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState@52066604(req#=0 stmt#=0 atype=0 acnt=0 currs=null ctlr=com.teradata.jdbc.jdbc_4.statemachine.StatementController@340b9973(sql=SELECTCAST(COUNT(*) A stmt=com.teradata.jdbc.jdk6.JDK6_SQL_Statement@4a194c39(statecode=3 sess=com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788)))
2016-11-04.07:41:21.904 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Start Response message 1, 262 bytes, time: 50 ms
2016-11-04.07:41:21.937 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Continue Request message, 56 bytes, time: 0 ms
2016-11-04.07:41:21.937 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 StatementReceiveState.action getState=5 nRemainingTime=0 nTimeoutInMs=0 this=com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState@35a9782c(req#=0 stmt#=0 atype=0 acnt=0 currs=null ctlr=com.teradata.jdbc.jdbc_4.statemachine.StatementController@70a36a66(sql=null stmt=com.teradata.jdbc.jdk6.JDK6_SQL_Statement@4a194c39(statecode=5 sess=com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788)))
2016-11-04.07:41:21.937 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Read Continue Response message 1, 56 bytes, time: 0 ms
2016-11-04.07:41:21.943 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 executeStatement queryTimeout=0
2016-11-04.07:41:21.943 TERAJDBC4 TIMING [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 Wrote Start Request message, 131 bytes, time: 0 ms
2016-11-04.07:41:21.943 TERAJDBC4 INFO [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788 StatementReceiveState.action getState=3 nRemainingTime=0 nTimeoutInMs=0 this=com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState@25a6944c(req#=0 stmt#=0 atype=0 acnt=0 currs=null ctlr=com.teradata.jdbc.jdbc_4.statemachine.StatementController@5e1fa5b1(sql=SELECT*FROM muchBiggerTable stmt=com.teradata.jdbc.jdk6.JDK6_SQL_PreparedStatement@6b00f608(statecode=3 sess=com.teradata.jdbc.jdk6.JDK6_SQL_Connection@4b7dc788)))

and it now waits here ad nauseam...
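To see at a glance which statement the driver sent last, and when, the `sql=` fields can be pulled out of a saved copy of the log. This is a rough sketch that assumes the log lines keep the exact layout shown above (timestamp, then `TERAJDBC4`, then a `(sql=... stmt=com.teradata...` fragment); the function name `sent_sql` and the file name are made up for illustration.

```shell
# sent_sql LOGFILE
# Prints "timestamp  sql" for each statement found in the JDBC log,
# skipping entries logged with sql=null.
# (hypothetical helper; depends on the log layout shown in this thread)
sent_sql() {
  sed -n 's/^\([^ ]*\) TERAJDBC4 .*(sql=\(.*\) stmt=com\.teradata.*/\1  \2/p' "$1" |
    grep -v '  null$'
}
```

With the log saved as tdch.log, `sent_sql tdch.log | tail -n 1` would show the last statement sent together with its timestamp, which can then be compared against the current time to see how long the client has been waiting on it.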

 

thanks