bugs in TPT 15.00 for auto schema generation

ywu
Fan

bugs in TPT 15.00 for auto schema generation

tbuild -v
Teradata Parallel Transporter Version 15.10.00.00 64-Bit

 

Script:

DEFINE JOB TD2HDFS

(

SET FileWriterHadoopHost = @HadoopHost;

APPLY

TO OPERATOR ($FILE_WRITER
        ATTRIBUTES(
         FileName=@FileName
        )
        )

SELECT * FROM OPERATOR ($EXPORT
        ATTRIBUTES(
        SelectStmt = @SelectStmt
        )
) ;
);

tbuild launch command: 

tbuild -f td2hdfs.ctl -v td2hdfs.jobvar.txt -u "FileName = 'job_exec.txt', SelectStmt = 'select * from job_status'" 
Teradata Parallel Transporter Version 15.10.00.00 64-Bit
Job log: /home/xxxx/tptscript/tptlog/xxxx-49.out
Job id is xxxxx9, running on xxxxx
Found CheckPoint file: /opt/teradata/client/15.10/tbuild/checkpoint/xxxxxLVCP
This is a restart job; it restarts at step MAIN_STEP.
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.00
$FILE_WRITER[1]: Instance 1 directing private log report to 'dtacop-xxxxxx76004-1'.
$FILE_WRITER[1]: DataConnector Consumer operator Instances: 1
Teradata Parallel Transporter Export Operator Version 15.10.00.00
$EXPORT: private log not specified
$EXPORT: connecting sessions
TPT_INFRA: TPT02639: Error: Conflicting data type for column(1) - "JOB_EXEC_ID". Source column's data type (VARCHAR) Target column's data type (INTEGER).
$EXPORT: TPT12108: Output Schema does not match data from SELECT statement
$EXPORT: disconnecting sessions
$EXPORT: Total processor time used = '0.02 Second(s)'
$EXPORT: Start : Fri Jan 6 10:22:31 2017
$EXPORT: End : Fri Jan 6 10:22:31 2017
17/01/06 10:22:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$FILE_WRITER[1]: ECI operator ID: '$FILE_WRITER-76004'
$FILE_WRITER[1]: Operator instance 1 processing HDFS file '/user/xxxx/job_exec.txt'.
$FILE_WRITER[1]: Total files processed: 0.
Job step MAIN_STEP terminated (status 12)
Job e_ywu terminated (status 12)
Job start: Fri Jan 6 10:22:27 2017
Job end: Fri Jan 6 10:22:32 2017

 

I also tried another script that uses auto schema generation; the same error is thrown. Job script below.

DEFINE JOB TD2HDFS

(

SET FileWriterHadoopHost = @HadoopHost;
DEFINE SCHEMA TBL FROM TABLE @EXP_TBL;

APPLY

TO OPERATOR ($FILE_WRITER
        ATTRIBUTES(
         FileName=@FileName
        )
        )

SELECT * FROM OPERATOR ($EXPORT(TBL)
        ATTRIBUTES(
        SelectStmt = 'SELECT * FROM '||@EXP_TBL
        )
) ;
);
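
Launched roughly the same way as before, with the table name passed on the command line (the script and output file names below are just placeholders):

tbuild -f td2hdfs_auto.ctl -v td2hdfs.jobvar.txt -u "EXP_TBL = 'edw_abc_db.job_status', FileName = 'job_status.txt'"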

BTW, this has nothing to do with Hadoop. I tried switching to the plain text file writer and the same error occurs. This looks like a bug in how the TPT infrastructure manages schema generation. I have to hard-code all schemas for now.

PS: I also found that large DECIMAL(x, y) columns are trouble. When precision > 18, TPT cannot map the source and target schemas; a similar error message is thrown, except it reports that the precision does not match.
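
(Side note: for wide decimals the relevant knob appears to be the Export operator's MaxDecimalDigits attribute; the working job variable file at the end of this thread sets it, roughly like this:)

/* allow DECIMAL precision above 18 digits; 38 is the Teradata maximum */
,MaxDecimalDigits         = 38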


Enthusiast

Re: bugs in TPT 15.00 for auto schema generation

Please remove HadoopHost and auto schema generation should work. I'm facing the same problem.

DEFINE SCHEMA FROM TABLE is working as expected when I'm writing to a local file, but when I'm trying to load data into HDFS, I get the same error.
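
Roughly, the local-file test looks like this (a sketch based on the script in the original post; the only change is that the Hadoop-specific pieces are left out, so $FILE_WRITER writes to the local file system and HadoopHost must not be set in the job variable file):

DEFINE JOB TD2FILE
(
DEFINE SCHEMA TBL FROM TABLE @EXP_TBL;

APPLY
TO OPERATOR ($FILE_WRITER
        ATTRIBUTES(
         FileName = @FileName
        )
        )
SELECT * FROM OPERATOR ($EXPORT(TBL)
        ATTRIBUTES(
        SelectStmt = 'SELECT * FROM '||@EXP_TBL
        )
) ;
);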

 https://community.teradata.com/t5/Tools/Export-to-Hadoop-hdfs-using-TPT/m-p/70439#M10259

 

Thanks & Regards,

Arpan.

Fan

Re: bugs in TPT 15.00 for auto schema generation

Thanks for replying. In my case, switching to the file writer and dropping the Hadoop host does not yield a correct result; same error message.

 

I have opened an incident with support to take a look at it. Let's see what happens; I will keep you posted.

 

Teradata Employee

Re: bugs in TPT 15.00 for auto schema generation

Can you please provide the DDL of the source table?

 

-- SteveF
ywu
Fan

Re: bugs in TPT 15.00 for auto schema generation

show table edw_abc_db.job_status;

CREATE SET TABLE edw_abc_db.job_status ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      JOB_EXEC_ID INTEGER GENERATED BY DEFAULT AS IDENTITY
           (START WITH 1 
            INCREMENT BY 1 
            MINVALUE -2147483647 
            MAXVALUE 2147483647 
            NO CYCLE),
      JOB_ID INTEGER,
      JOB_START_TS TIMESTAMP(0),
      JOB_END_TS TIMESTAMP(0),
      JOB_STATUS_CD SMALLINT,
      JOB_STATUS_TXT VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      JOB_UPSERT_ROW_CNT BIGINT,
      CDC_BEG_VNUM BIGINT DEFAULT NULL ,
      CDC_END_VNUM BIGINT DEFAULT NULL ,
      JOB_DEL_ROW_CNT BIGINT DEFAULT 0 )
UNIQUE PRIMARY INDEX ( JOB_EXEC_ID );
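
For reference, the hard-coded schema I mentioned falling back to simply mirrors these column definitions, roughly:

DEFINE SCHEMA JOB_STATUS_SCHEMA
(
  JOB_EXEC_ID        INTEGER,
  JOB_ID             INTEGER,
  JOB_START_TS       TIMESTAMP(0),
  JOB_END_TS         TIMESTAMP(0),
  JOB_STATUS_CD      SMALLINT,
  JOB_STATUS_TXT     VARCHAR(100),
  JOB_UPSERT_ROW_CNT BIGINT,
  CDC_BEG_VNUM       BIGINT,
  CDC_END_VNUM       BIGINT,
  JOB_DEL_ROW_CNT    BIGINT
);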
Teradata Employee

Re: bugs in TPT 15.00 for auto schema generation

The subject line says TPT 15.00 is failing, but one of the outputs shows TPT 15.10.00.00 being used.

If you are using 15.10, then please upgrade to at least 15.10.00.04.

If you are using 15.00, then please upgrade to at least 15.00.00.010.

-- SteveF
ywu
Fan

Re: bugs in TPT 15.00 for auto schema generation

Well, I cannot find the patches for tptbase on the patch server for 15.10.

I took the liberty of upgrading TTU to ttu-15.10.01.06.

Same problem.

 

rpm -qa|grep tpt
tptstream1510-15.10.01.06-1
tptbase1510-15.10.01.06-1

Teradata Employee

Re: bugs in TPT 15.00 for auto schema generation

Please post the job run showing the error, as well as the script (if it is different than what you originally posted), and the contents of the job variable file.

Thanks!

-- SteveF
ywu
Fan

Re: bugs in TPT 15.00 for auto schema generation

Got a solution from Teradata support. Apparently the issue arises when the job variable file contains conflicting information.

 

The job variable file that I used was copied from the TPT samples folder, and I applied some edits to get rid of most of the unused variables.

However, for some reason, if the job variable file contains a variable called SourceFormat (which is not used anywhere in the script), then this issue appears.

**** edit ****** After checking the documentation again (this time the TPT User Guide instead of the TPT Reference), I found this variable. From chapter 13:

SourceFormat
Before generating the DEFINE SCHEMA statement, Teradata PT queries the special job variable SourceFormat. If it has the value Delimited, then the generated schema will be in delimited-file format; otherwise, it will be in the normal format in which the schema column definitions closely match the Teradata Database table's column descriptions.

Great, now I know why this happens.
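
With SourceFormat = 'Delimited' left in the variable file, the generated schema is presumably the delimited-file form, in which every column is a VARCHAR, while the SELECT still returns the real column types. That matches the error above exactly (the types are the point here; the generated VARCHAR length is a guess):

/* schema generated with SourceFormat = 'Delimited' (delimited form; length is a guess): */
JOB_EXEC_ID   VARCHAR(11),
/* schema the SELECT actually delivers (normal form): */
JOB_EXEC_ID   INTEGER,
/* -> TPT02639: Conflicting data type for column(1) - "JOB_EXEC_ID" (VARCHAR vs INTEGER) */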

After I commented out this field, auto schema generation works as expected. Here is the final working version of the job variable file:

/********************************************************/
/* TPT Export Operator attributes                       */
/********************************************************/
SourceTdpId              = 'tdpid'
,SourceUserName           = 'user'
,SourceUserPassword       = 'secret'

/********************************************************/
/* TPT DataConnector Producer Operator attributes       */
/********************************************************/
/*,SourceFormat           = 'Delimited'     */


,SourceOpenmode           = 'Read'
,FileWriterTextDelimiter  = '@'
,DCProducerPrivateLogName = 'FILE_READER_LOG'
,DCProducerTraceLevel     = 'Milestones'
,TargetFormat             = 'Delimited'
,HadoopHost               = 'namenode:8020'
,MaxDecimalDigits         = 38
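
With that file in place, the job is launched exactly as before, e.g.:

tbuild -f td2hdfs.ctl -v td2hdfs.jobvar.txt -u "FileName = 'job_exec.txt', SelectStmt = 'select * from job_status'"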