Export from TD to HDFS using HDFS API from a node other than the Hadoop cluster

Enthusiast

Hi All,

We are trying to export data from TD to Hadoop using HDFS API.

Below are the steps we followed:

1. We installed TPT on a node that is not an edge node of the Hadoop cluster (TTU version: 15.10.01.00, 64-bit).

2. We modified .profile to include the HADOOP_HOME environment variable. It points to a location other than "/usr/lib/hadoop" (a sketch of the environment setup follows these steps).

3. We copied the "hadoop-client" directory and all of its contents to the node where TTU is installed. The Hadoop distribution is Hortonworks.

4. In the tbuild command, we specified HadoopHost=<NameNodeHost>:<NameNodePort (8020)> and FileName=hdfs://<FullPathOfHDFSDirectory>.
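
For reference, a minimal sketch of the kind of .profile additions this setup implies. All paths are placeholders rather than our actual locations, and the classpath step assumes the copied client's bin/hadoop command runs on this node:

# Sketch of .profile additions (illustrative; paths are placeholders)
export HADOOP_HOME=/opt/hadoop-client            # assumed copy location, not /usr/lib/hadoop
export JAVA_HOME=/usr/java/default               # assumed JDK location
export PATH=$PATH:$HADOOP_HOME/bin

# libhdfs (used by the TPT HDFS API) resolves the hdfs:// scheme through the
# Hadoop jars on CLASSPATH; the JNI loader does not expand wildcard entries,
# so the jars need to be listed explicitly. "hadoop classpath --glob" does
# that expansion on Hadoop releases that support the --glob option.
export CLASSPATH=$(hadoop classpath --glob)

# libjvm must be loadable for libhdfs (exact path depends on the JDK layout).
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server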

Below is the TPT Control File:

DEFINE JOB ACCOUNT_PRODUCT_V
DESCRIPTION 'Export script for ACCOUNT_PRODUCT_V from TD Table'
(
DEFINE SCHEMA SCHEMA_ACCOUNT_PRODUCT_V
(
"C1"  INTEGER,
"C2"  INTEGER,
"C3"  VARCHAR(10),
"C4"  CHAR(1),
"C5"  CHAR(1),
"C6"  VARCHAR(10),
"C7"  VARCHAR(25),
"C8"  VARCHAR(26),
"C9"  VARCHAR(25),
"C10"  VARCHAR(26)
);


DEFINE OPERATOR o_ExportOper
TYPE EXPORT
SCHEMA SCHEMA_ACCOUNT_PRODUCT_V
ATTRIBUTES
(
VARCHAR UserName                        = @UserName
,VARCHAR UserPassword                   = @UserPassword
,VARCHAR TdpId                = @TdpId
,INTEGER MaxDecimalDigits     = 38
,INTEGER MaxSessions                    = @MaxSessions
,INTEGER MinSessions                    = @MinSessions
,VARCHAR PrivateLogName                 = 'Export'
,VARCHAR SpoolMode              = 'NoSpool'
,VARCHAR WorkingDatabase      = @WorkingDatabase
,VARCHAR SourceTable            = @SourceTable
,VARCHAR SelectStmt             = @SelectStmt
);

DEFINE OPERATOR o_FileWritter
TYPE DATACONNECTOR CONSUMER
SCHEMA SCHEMA_ACCOUNT_PRODUCT_V
ATTRIBUTES
(
VARCHAR FileName                = @FileName
,VARCHAR Format                 = @Format
,VARCHAR TextDelimiter          = @TextDelimiter
,VARCHAR IndicatorMode          = 'N'
,VARCHAR OpenMode               = 'Write'
,VARCHAR PrivateLogName         = 'DataConnector'
,VARCHAR HadoopHost             = @HadoopHost
);


SET FileWriterHadoopHost = @HadoopHost;


APPLY TO OPERATOR (o_FileWritter[@LoadInst])
SELECT * FROM OPERATOR (o_ExportOper[@ReadInst]);
)
;

Below is the tbuild command we are using:

tbuild -f <PathToTheControlFile>/ACCOUNT_PRODUCT_V.tpt.ctl -v <PathToTheControlFile>/logon_file_tpt -u " WorkingDatabase='EXTRACT_VIEWS' , SourceTable='ACCOUNT_PRODUCT_V' , load_op=o_ExportOper , LoadInst=1 , ReadInst=1 , MaxSessions=10 , MinSessions=5 , HadoopHost='<NameNodeHost>:8020' , HadoopUser='User' , FileName='hdfs://<ClusterName>/apps/hive/warehouse/account_product/ACCOUNT_PRODUCT_2017-05-25.txt' , LOAD_DTS='2017-05-25-043105' , Format='DELIMITED' , TextDelimiter='|' , SkipRows=0 , SelectStmt='SELECT CAST( "C1" AS INTEGER ), CAST( "C2" AS INTEGER ), CAST( "C3" AS VARCHAR(10) ), CAST( "C4" AS CHAR(1) ), CAST( "C5" AS CHAR(1) ), CAST( "C6" AS VARCHAR(10) ), CAST( "C7" AS VARCHAR(25) ), CAST( "C8" AS VARCHAR(26) ), CAST( "C9" AS VARCHAR(25) ), CAST( "C10" AS VARCHAR(26) ) FROM EXTRACT_VIEWS.ACCOUNT_PRODUCT_V; ' " ACCOUNT_PRODUCT_V -e UTF8
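
The logon file passed with -v holds the connection job variables referenced in the script (@TdpId, @UserName, @UserPassword). A minimal sketch of what it typically contains, with placeholder values only:

# Illustrative only -- values are placeholders, not real credentials
cat > logon_file_tpt <<'EOF'
TdpId        = 'tdpid_placeholder',
UserName     = 'username_placeholder',
UserPassword = 'password_placeholder'
EOF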

Below is the error we are getting:

Teradata Parallel Transporter Version 15.10.01.00 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/ACCOUNT_PRODUCT_V-35.out
Job id is ACCOUNT_PRODUCT_V-35, running on <NodeOtherThanEdgeNode>
Teradata Parallel Transporter DataConnector Operator Version 15.10.01.00
o_FileWritter[1]: Instance 1 directing private log report to 'DataConnector-1'.
Teradata Parallel Transporter Export Operator Version 15.10.01.00
o_ExportOper: private log specified: Export
o_FileWritter[1]: DataConnector Consumer operator Instances: 1
o_ExportOper: connecting sessions
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
hdfsBuilderConnect(forceNewInstance=0, nn=<NameNodeHost>:8020, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2713)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2720)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:95)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2756)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2738)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:376)
        at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:165)
        at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
Oops! Failed to connect to hdfs!
o_FileWritter[1]: TPT19015 TPT Exit code set to 12.
o_FileWritter[1]: Total files processed: 0.
o_ExportOper: disconnecting sessions
o_ExportOper: Total processor time used = '0.47 Second(s)'
o_ExportOper: Start : Thu May 25 10:39:18 2017
o_ExportOper: End   : Thu May 25 10:39:23 2017
Job step MAIN_STEP terminated (status 12)
Job ACCOUNT_PRODUCT_V terminated (status 12)
Job start: Thu May 25 10:39:15 2017
Job end:   Thu May 25 10:39:23 2017

Can anyone please help us find the resolution for this issue?

Also, which JARs from the Hadoop client need to be copied to the node where TPT is running?
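
For context, a sketch of the kind of check we can run on the TTU node (the path is a placeholder and the grep assumes standard Hortonworks jar naming):

# "No FileSystem for scheme: hdfs" generally means hadoop-hdfs-*.jar, which
# registers the hdfs:// scheme, is not on the CLASSPATH that libhdfs sees.
echo "$CLASSPATH" | tr ':' '\n' | grep -c 'hadoop-hdfs'   # expect a count > 0

# Confirm the jar exists in the copied hadoop-client directory (path assumed):
ls /opt/hadoop-client/hadoop-hdfs*.jar 2>/dev/null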

 

Thanks & Regards,

Arpan.

 

3 REPLIES
Teradata Employee

Re: Export from TD to HDFS using HDFS API from a node other than the Hadoop cluster

The fact that you specified just HadoopHost indicates to me that you are trying to use the HDFS API.

Therefore, I am not sure why you are getting Java-related messages (as if you are trying to use the TDCH API).

I would need you to email me the job's binary log file (the .out file).

If I cannot get the information from that, I may need you to turn on the tracing for the DC Consumer.

 

-- SteveF
Enthusiast

Re: Export from TD to HDFS using HDFS API from a node other than the Hadoop cluster

Hi Steve,

Thanks for your reply.

Yes, I'm using the HDFS API. Please let me know the email address to which I should send the binary log file.

Also, please let me know when I should turn on tracing for the DC Consumer.

 

Thanks & Regards,

Arpan.

Teradata Employee

Re: Export from TD to HDFS using HDFS API from a node other than the Hadoop cluster

steven.feinholz@teradata.com

-- SteveF