Read from hive failed using Data connector

Enthusiast

I am trying to test the TPT 15.10 sample script PTS00030 using the Hortonworks HDP 2.4 sandbox, but I am receiving the error below.

(Note: PTS00029 was successful using the same sandbox.)

 

Hive_table_reader[1]: TPT19609 After 60 seconds, all clients have not connected; a timeout has occurred.

Hive_table_reader[1]: TPT19434 pmRead failed. General failure (34): 'Unknown Access Module failure'

Hive_table_reader[1]: TPT19305 Fatal error reading data.

Hive_table_reader[1]: TPT19015 TPT Exit code set to 12.

$LOAD: disconnecting sessions

 

Could someone help resolve this?
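For reference, I launch the job along these lines (the trailing job name is arbitrary):

# -f names the TPT script, -v the job variable file; the last
# argument is an arbitrary job name.
tbuild -f PTS00030 -v jobvars2.txt myjob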

 

ctl file:

====

DEFINE JOB Load_TD_table_with_records_from_Hive_table
DESCRIPTION 'Load a TD table with records from a Hive table'
(

DEFINE SCHEMA PTS00030_schema(
COL1 INTEGER,
COL2 VARCHAR(50),
COL3 DECIMAL(10,4)
);
DEFINE OPERATOR Hive_table_reader()
TYPE DATACONNECTOR PRODUCER
SCHEMA PTS00030_schema
ATTR(
HadoopHost = @HadoopHost,
HadoopJobType = @HadoopJobType,
HadoopSourceTable = @HadoopTable
);
STEP Setup_TD_table
(
APPLY
('DROP TABLE ' || @TargetTable || ';'),
('DROP TABLE ' || @TargetTable || '_WT;'),
('DROP TABLE ' || @TargetTable || '_ET;'),
('DROP TABLE ' || @TargetTable || '_LT;'),
('DROP TABLE ' || @TargetTable || '_UV;'),
('CREATE TABLE ' || @TargetTable || ' ( COL1 INTEGER,
COL2 VARCHAR(50),
COL3 DECIMAL(10,4)
); ')
TO OPERATOR ( $DDL );
);
STEP Load_TD_table_with_records_from_Hive_table
(
APPLY
('INSERT INTO ' || @TargetTable || '(:COL1, :COL2, :COL3);')
TO OPERATOR ( $LOAD[@LoadInstances] )
SELECT * FROM OPERATOR ( Hive_table_reader() );
);
/*********************************************************
Uncomment this step if you want to clean up the tables
after the job completion.
*********************************************************/
/********************************************************
STEP CLEANUP
(
APPLY
('DROP TABLE ' || @TargetTable || '_WT ; '),
('DROP TABLE ' || @TargetTable || '_ET ; '),
('DROP TABLE ' || @TargetTable || '_UV ; '),
('DROP TABLE ' || @TargetTable || '_LOG ; '),
('DROP TABLE ' || @TargetTable || ' ; ')
TO OPERATOR ($DDL);

);
********************************************************/
);

 

jobvars2.txt

===========

/********************************************************/
/* TPT LOAD Operator attributes */
/********************************************************/
TargetTdpId = '127.0.0.1'
,TargetUserName = 'dbc'
,TargetUserPassword = 'dbc'
,TargetTable = 'PTS00030_TBL'
/********************************************************/
/* TPT Export Operator attributes */
/********************************************************/
,SourceTdpId = '127.0.0.1'
,SourceUserName = 'dbc'
,SourceUserPassword = 'dbc'
,SelectStmt = 'SELECT * FROM PTS00030_TBL;'
/********************************************************/
/* TPT LOAD Operator attributes */
/********************************************************/
,DDLErrorList = '3807'
/********************************************************/
/* TPT DataConnector Hadoop specific attributes */
/********************************************************/
,HadoopHost = '172.xx.xx.xx'
,HadoopJobType = 'hive'
,HadoopFileFormat = 'rcfile'
,HadoopTable = 'PTS00030_TBL'
,HadoopTableSchema = 'COL1 INT, COL2 STRING, COL3 STRING'
/********************************************************/
/* APPLY STATEMENT parameters */
/********************************************************/
,LoadInstances = 1
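Since TPT19609 says the clients did not connect within 60 seconds, a basic connectivity check between the TPT machine and the sandbox may be a useful first step. The IP below is masked as above, and port 10000 (HiveServer2's HDP default) is only an example:

# Substitute the real sandbox address; 10000 is just an example port.
ping -c 3 172.xx.xx.xx
nc -vz 172.xx.xx.xx 10000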

5 REPLIES
Teradata Employee

Re: Read from hive failed using Data connector

Does PTS00029 always succeed and PTS00030 always fail?

 

-- SteveF
Enthusiast

Re: Read from hive failed using Data connector

Well, actually PTS00029 is not working either. I have a local file, datafiles/accounts1.txt, on the TD 15 machine, and the job was loading the TD table from it, so I had assumed it was actually reading from HDFS and loading the TD table.

Could you help resolve the issue below for PTS00029 first? The job variables and script details are given below.

TD-EXPRESS:~ # cat PTS00029

DEFINE JOB File_Load
DESCRIPTION 'Load a Teradata table from a file'
(

SET FileReaderHadoopHost = @HadoopHost;
SET TargetTable = 'PTS00029_TBL';
SET LogTable = @TargetTable || '_LOG';

STEP MAIN_STEP
(
APPLY $INSERT TO OPERATOR ( $LOAD [@LoadInstances] )
SELECT * FROM OPERATOR ( $FILE_READER [@ReaderInstances] );
);

/*********************************************************
Uncomment this step if you want to clean up the tables
after the job completion.
*********************************************************/

/********************************************************
STEP CLEANUP
(
APPLY
('DROP TABLE ' || @TargetTable || '_WT ; '),
('DROP TABLE ' || @TargetTable || '_ET ; '),
('DROP TABLE ' || @TargetTable || '_UV ; '),
('DROP TABLE ' || @TargetTable || '_LOG ; '),
('DROP TABLE ' || @TargetTable || ' ; ')
TO OPERATOR ($DDL);

);
********************************************************/

);

 

TD-EXPRESS:~ # cat jobvars.txt
/********************************************************/
/* TPT attributes - Common for all Samples */
/********************************************************/
TargetTdpId = '127.0.0.1'
,TargetUserName = 'dbc'
,TargetUserPassword = 'dbc'
,TargetErrorList = '3807'
,DDLPrivateLogName = 'ddl_log'

/*********************************************************/
/* TPT LOAD Operator attributes */
/********************************************************/
,LoadPrivateLogName = 'LOAD_OPERATOR_LOG'
,LoadTraceLevel = 'None'
,SourceTdpId = '127.0.0.1'
,SourceUserName = 'dbc'
,SourceUserPassword = 'dbc'
/********************************************************/
/* TPT DataConnector Producer Operator attributes */
/********************************************************/
,SourceFileName = 'accounts1.txt'
,SourceFormat = 'Delimited'
,SourceOpenmode = 'Read'
,SourceDirectoryPath = 'datafiles'
,SourceTextDelimiter = '|'
,ProducerPrivateLogName = 'FILE_READER_LOG'
,ProducerTraceLevel = 'Milestones'
,TargetFileName = 'accounts3.txt'
,TargetFormat = 'Delimited'
,TargetDirectoryPath = 'datafiles'
,TargetOpenMode = 'Write'
,HadoopHost = '172.16.88.229'
/*******************************************************/
/* MODIFY THE PIPE NAME as \\.\pipe\datapipe when this */
/* script is executed in windows platform */
/*******************************************************/
,PipeName = 'datapipe'

/*******************************************************/
/* TPT ODBC operator attributes */
/********************************************************/
,ODBCDSNname = '< ODBC DSN Name>'
,ODBCUserName = '< ODBC User Name>'
,ODBCUserPassword = '< ODBC User Password>'
,ODBCTraceLevel = 'None'
/********************************************************/
/* APPLY STATEMENT parameters */
/********************************************************/
,LoadInstances = 1
,UpdateInstances = 1
,ExportInstances = 1
,StreamInstances = 1
,InserterInstances = 1
,SelectorInstances = 1
,ReaderInstances = 1
,WriterInstances = 1
,OutmodInstances = 1
,ODBCOprInstances = 1
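The invocation looks roughly like this (the job name matches the log below; exact flags assumed):

tbuild -f PTS00029 -v jobvars.txt kl11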

 

 

Teradata Parallel Transporter Version 15.10.00.03 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/kl11-117.out
Job id is kl11-117, running on TD-EXPRESS
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.03
Teradata Parallel Transporter Load Operator Version 15.10.00.03
$LOAD: private log specified: LOAD_OPERATOR_LOG
$FILE_READER[1]: Instance 1 directing private log report to 'dtacop-root-12549-1'.
$FILE_READER[1]: DataConnector Producer operator Instances: 1
$FILE_READER[1]: ECI operator ID: '$FILE_READER-12549'
$FILE_READER[1]: TPT19231 file 'datafiles/accounts1.txt' not found.
$FILE_READER[1]: TPT19015 TPT Exit code set to 12.
$FILE_READER[1]: Total files processed: 0.
$LOAD: connecting sessions
$LOAD: preparing target table
$LOAD: entering Acquisition Phase
$LOAD: disconnecting sessions
$LOAD: Total processor time used = '0.05 Second(s)'
$LOAD: Start : Wed Jan 18 21:07:37 2017
$LOAD: End : Wed Jan 18 21:07:37 2017
Job step MAIN_STEP terminated (status 12)
Job kl11 terminated (status 12)
Job start: Wed Jan 18 21:07:33 2017
Job end: Wed Jan 18 21:07:37 2017

 

I am using the HDP VM.

[root@sandbox ~]# hadoop fs -cat /datafiles/accounts1.txt

101|10001|01/01/|00001|100.01

105|10005|01/05/|00005|500.01

104|10004|01/04/|00004|400.01

103|10003|01/03/|00003|300.01

102|10002|01/02/|00002|200.01

 

Thanks

 

 

Teradata Employee

Re: Read from hive failed using Data connector

It looks to me as if you *might* be looking at 2 different locations (but I am not a Hadoop expert and so could be wrong here).

In the job variable file you list the source directory name as "datafiles".

When you issued the hadoop fs -cat command, you provided the directory name as "/datafiles".

Is it possible that your hadoop command and the TPT DC operator are looking at 2 different locations?

Have you tried to supply "/datafiles" to the DC operator in the SourceDirectoryPath job variable?
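As far as I understand it, a relative HDFS path resolves under the issuing user's home directory, so the two forms below can point at different files (listing commands shown only for illustration):

# Relative: resolves to /user/<current-user>/datafiles/accounts1.txt
hadoop fs -ls datafiles/accounts1.txt
# Absolute: the location your -cat command actually checked
hadoop fs -ls /datafiles/accounts1.txt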

 

And in your job variable file, you list "source" and "target" job variables in the same section titled "TPT DataConnector Producer", but that is just something I noticed and not really that important.

 

 

-- SteveF
Enthusiast

Re: Read from hive failed using Data connector

Thanks for the response, Feinholz. Yes, I have tried "/datafiles" as well, but I am hitting the same error.

Teradata Parallel Transporter Version 15.10.00.03 64-Bit

Job log: /opt/teradata/client/15.10/tbuild/logs/kl13-119.out
Job id is kl13-119, running on TD-EXPRESS
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.03
$FILE_READER[1]: Instance 1 directing private log report to 'dtacop-root-25927-1'.
$FILE_READER[1]: DataConnector Producer operator Instances: 1
Teradata Parallel Transporter Load Operator Version 15.10.00.03
$LOAD: private log specified: LOAD_OPERATOR_LOG
$FILE_READER[1]: ECI operator ID: '$FILE_READER-25927'
$FILE_READER[1]: TPT19231 file '/datafiles/accounts1.txt' not found.
$FILE_READER[1]: TPT19015 TPT Exit code set to 12.
$FILE_READER[1]: Total files processed: 0.
$LOAD: connecting sessions
$LOAD: preparing target table
$LOAD: entering Acquisition Phase
$LOAD: disconnecting sessions
$LOAD: Total processor time used = '0.07 Second(s)'
$LOAD: Start : Thu Jan 19 09:40:25 2017
$LOAD: End : Thu Jan 19 09:40:26 2017
Job step MAIN_STEP terminated (status 12)
Job kl13 terminated (status 12)
Job start: Thu Jan 19 09:40:21 2017
Job end: Thu Jan 19 09:40:26 2017

 

The PTS00029 documentation says:

/* Move the file 'accounts1.txt', generated by the PTSTP002 setup    */
/* script, into the Hadoop cluster defined by the HadoopHost         */
/* attribute using the 'hadoop fs -put' command. The target of the   */
/* put operation should be 'datafiles/accounts1.txt'.                */
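Following that note, the upload would look something like this (the local source path is a placeholder):

# '/tmp/accounts1.txt' stands in for wherever PTSTP002 wrote the file.
hadoop fs -mkdir -p datafiles
hadoop fs -put /tmp/accounts1.txt datafiles/accounts1.txt

Note that the relative target 'datafiles' lands under the current user's HDFS home directory, which may be exactly the relative-versus-absolute mismatch discussed above.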

 

And regarding the source and target job variables, that is how they were positioned when I first found the file in the userguide folder. The Hadoop community is also unable to help, since they are not sure about the TPT mechanism. Could you help dig deeper into this issue?

Teradata Employee

Re: Read from hive failed using Data connector

Hi,
The script PTS00029 executes fine for me.
Teradata Parallel Transporter Version 15.10.00.03 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/ok5-16.out
Job id is ok5-16, running on hdp267m1
Teradata Parallel Transporter Load Operator Version 15.10.00.03
$LOAD: private log specified: LOAD_OPERATOR_LOG
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.03
$FILE_READER[1]: Instance 1 directing private log report to 'dtacop-hdfs-24675-1'.
$FILE_READER[1]: DataConnector Producer operator Instances: 1
$FILE_READER[1]: ECI operator ID: '$FILE_READER-24675'
$FILE_READER[1]: Operator instance 1 processing file 'datafiles/accounts1.txt'.
$LOAD: connecting sessions
$LOAD: preparing target table
$LOAD: entering Acquisition Phase
$LOAD: entering Application Phase
$LOAD: Statistics for Target Table: 'PTS00029_TBL'
$LOAD: Total Rows Sent To RDBMS: 5
$LOAD: Total Rows Applied: 5
$LOAD: Total Rows in Error Table 1: 0
$LOAD: Total Rows in Error Table 2: 0
$LOAD: Total Duplicate Rows: 0
$LOAD: disconnecting sessions
$FILE_READER[1]: Total files processed: 1.
$LOAD: Total processor time used = '0.08 Second(s)'
$LOAD: Start : Wed Feb 15 03:58:18 2017
$LOAD: End : Wed Feb 15 03:58:42 2017
Job step MAIN_STEP completed successfully
Job ok5 completed successfully
Job start: Wed Feb 15 03:58:15 2017
Job end: Wed Feb 15 03:58:42 2017

 

Can you give it another try after exporting the Hadoop paths correctly?
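In case it helps, on our side the Hadoop environment is exported roughly like this before running tbuild (install paths assumed; adjust them for your sandbox):

# Assumed HDP client locations; verify against your installation.
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export CLASSPATH=$(hadoop classpath)
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH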

 

As far as PTS00030 is concerned, it works fine with HDP 2.5 and TPT version 15.10.01.06.
Teradata Parallel Transporter Version 15.10.01.06 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/wqe-9.out
Job id is wqe-9, running on hdp267m1
Teradata Parallel Transporter SQL DDL Operator Version 15.10.01.06
$DDL: private log not specified
$DDL: connecting sessions
$DDL: sending SQL requests
$DDL: TPT10508: RDBMS error 3807: Object 'PTS00030_TBL_WT' does not exist.
$DDL: TPT18046: Error is ignored as requested in ErrorList
$DDL: TPT10508: RDBMS error 3807: Object 'PTS00030_TBL_LT' does not exist.
$DDL: TPT18046: Error is ignored as requested in ErrorList
$DDL: disconnecting sessions
$DDL: Total processor time used = '0.020001 Second(s)'
$DDL: Start : Wed Feb 15 02:19:22 2017
$DDL: End : Wed Feb 15 02:19:27 2017
Job step Setup_TD_table completed successfully
Teradata Parallel Transporter Load Operator Version 15.10.01.06
$LOAD: private log not specified
Teradata Parallel Transporter DataConnector Operator Version 15.10.01.06
Hive_table_reader[1]: Instance 1 directing private log report to 'dtacop-hdfs-10376-1'.
Hive_table_reader[1]: DataConnector Producer operator Instances: 1
Hive_table_reader[1]: ECI operator ID: 'Hive_table_reader-10376'
Hive_table_reader[1]: Operator instance 1 processing file 'PTS00030_TBL'.
$LOAD: connecting sessions
$LOAD: preparing target table
$LOAD: entering Acquisition Phase
Hive_table_reader[1]: TPT19601 All 2 clients connected.
$LOAD: entering Application Phase
$LOAD: Statistics for Target Table: 'PTS00030_TBL'
$LOAD: Total Rows Sent To RDBMS: 5
$LOAD: Total Rows Applied: 5
$LOAD: Total Rows in Error Table 1: 0
$LOAD: Total Rows in Error Table 2: 0
$LOAD: Total Duplicate Rows: 0
Hive_table_reader[1]: Total files processed: 1.
$LOAD: disconnecting sessions
$LOAD: Total processor time used = '0.088004 Second(s)'
$LOAD: Start : Wed Feb 15 02:19:29 2017
$LOAD: End : Wed Feb 15 02:20:13 2017
Job step Load_TD_table_with_records_from_Hive_table completed successfully
Job wqe completed successfully
Job start: Wed Feb 15 02:19:19 2017
Job end: Wed Feb 15 02:20:13 2017

 

However, with HDP 2.4 it fails with a "connection refused" error in the TDCH log.
We will be looking into it further.

 

--Sai Divya