Handling Unicode Characters for TPT Export Operator

Enthusiast

Hi All,

We are trying to export Unicode (UTF-8) data from a Teradata table to a flat file using the TPT Export Operator. As suggested in other posts on this forum, we have tripled the CHAR/VARCHAR lengths in the schema definition of the TPT control file, while keeping the lengths in the SELECT statement as they are in the source table. We have also used "-e UTF8" in the tbuild command and "USING CHARACTER SET UTF8" in the TPT control file.
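For example, the CHAR/VARCHAR lengths were tripled like this (worst case of 3 bytes per character under UTF8):

DELQ_FAMILY_COL        CHAR(3)      ->  CHAR(9)      /* 3  x 3 */
DELQ_FAMILY_DESCR_COL  VARCHAR(25)  ->  VARCHAR(75)  /* 25 x 3 */
SUB_ACCT_NO_COL        CHAR(16)     ->  CHAR(48)     /* 16 x 3 */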

Below is the table DDL:

CREATE SET TABLE YYYYYYY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (SYS_COL INTEGER,
      PRIN_COL INTEGER,
      AGNT_COL INTEGER,
      COLL_CODE_COL INTEGER,
      DELQ_FAMILY_COL CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
      DELQ_FAMILY_DESCR_COL VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,
      DROP_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      LS_WORK_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      LS_TRAN_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      NO_ACTS_COL INTEGER,
      NO_MEMOS_COL INTEGER,
      REACTIVE_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      SUB_ACCT_NO_COL CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      START_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      WORK_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      AUDIT_PROCESS_DT DATE FORMAT 'YYYY-MM-DD',
      AUDIT_BATCH_ID INTEGER,
      AUDIT_JOB_ID INTEGER,
      AUDIT_DML_ACTION_CD CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      AUDIT_INSERTED_TS TIMESTAMP(6),
      AUDIT_UPDATED_TS TIMESTAMP(6),
      RECORD_START_TS TIMESTAMP(0),
      RECORD_END_TS TIMESTAMP(0))
PRIMARY INDEX COL_BASE_NPI ( SUB_ACCT_NO_COL );

Below is the tpt control file:

USING CHARACTER SET UTF8
DEFINE JOB XXXXXXX
DESCRIPTION 'Export script for XXXXXXX from TD Table'
(
DEFINE SCHEMA SCHEMA_XXXXXXX
(
"SYS_COL"  INTEGER,
"PRIN_COL"  INTEGER,
"AGNT_COL"  INTEGER,
"COLL_CODE_COL"  INTEGER,
"DELQ_FAMILY_COL"  CHAR(9),
"DELQ_FAMILY_DESCR_COL"  VARCHAR(75),
"DROP_DTE_COL"  VARCHAR(30),
"LS_WORK_DTE_COL"  VARCHAR(30),
"LS_TRAN_DTE_COL"  VARCHAR(30),
"NO_ACTS_COL"  INTEGER,
"NO_MEMOS_COL"  INTEGER,
"REACTIVE_DTE_COL"  VARCHAR(30),
"SUB_ACCT_NO_COL"  CHAR(48),
"START_DTE_COL"  VARCHAR(30),
"WORK_DTE_COL"  VARCHAR(30),
"AUDIT_PROCESS_DT"  VARCHAR(30),
"AUDIT_BATCH_ID"  INTEGER,
"AUDIT_JOB_ID"  INTEGER,
"AUDIT_DML_ACTION_CD"  CHAR(3),
"AUDIT_INSERTED_TS"  VARCHAR(78),
"AUDIT_UPDATED_TS"  VARCHAR(78),
"RECORD_START_TS"  VARCHAR(78),
"RECORD_END_TS"  VARCHAR(78),
"CURRENT_FLAG"  VARCHAR(6),
"REPORTING_START_TS"  VARCHAR(78),
"REPORTING_END_TS"  VARCHAR(78)
);


DEFINE OPERATOR o_ExportOper
TYPE EXPORT
SCHEMA SCHEMA_XXXXXXX
ATTRIBUTES
(
VARCHAR UserName                        = @UserName
,VARCHAR UserPassword                   = @UserPassword
,VARCHAR TdpId                = @TdpId
,INTEGER MaxDecimalDigits     = 38
,INTEGER MaxSessions                    = @MaxSessions
,INTEGER MinSessions                    = @MinSessions
,VARCHAR PrivateLogName                 = 'Export'
,VARCHAR SpoolMode              = 'NoSpool'
,VARCHAR WorkingDatabase      = @WorkingDatabase
,VARCHAR SourceTable            = @SourceTable
,VARCHAR SelectStmt             = @SelectStmt
);


DEFINE OPERATOR o_FileWritter
TYPE DATACONNECTOR CONSUMER
SCHEMA SCHEMA_XXXXXXX
ATTRIBUTES
(
VARCHAR FileName                = @FileName
,VARCHAR Format                 = @Format
,VARCHAR TextDelimiterHex       = @TextDelimiterHex
,VARCHAR IndicatorMode          = 'N'
,VARCHAR OpenMode               = 'Write'
,VARCHAR PrivateLogName         = 'DataConnector'
);

APPLY TO OPERATOR (o_FileWritter[@LoadInst])
SELECT * FROM OPERATOR (o_ExportOper[@ReadInst]);
)
;

Below is the tbuild command:

tbuild -f /data/infa_shared/NDW/Scripts/extracts/scripts/extracts/XXXXXXX.tpt.ctl -v /data/infa_shared/NDW/common/.tdlogon/logon_ndw_extracts_file_tpt -u " WorkingDatabase='NDW_EXTRACT_VIEWS' , SourceTable='XXXXXXX' , load_op=o_ExportOper , LoadInst=1 , ReadInst=1 , MaxSessions=10 , MinSessions=5 , FileName='/data/infa_shared/NDW/extracts/full_extracts/COL_BASE.txt' , LOAD_DTS='2016-11-15-103021' , Format='DELIMITED' , TextDelimiterHex='' , SkipRows=0 , SelectStmt='SELECT CAST( "SYS_COL" AS INTEGER ), CAST( "PRIN_COL" AS INTEGER ), CAST( "AGNT_COL" AS INTEGER ), CAST( "COLL_CODE_COL" AS INTEGER ), CAST( "DELQ_FAMILY_COL" AS CHAR(3) ), CAST( "DELQ_FAMILY_DESCR_COL" AS VARCHAR(25) ), CAST( "DROP_DTE_COL" AS VARCHAR(10) ), CAST( "LS_WORK_DTE_COL" AS VARCHAR(10) ), CAST( "LS_TRAN_DTE_COL" AS VARCHAR(10) ), CAST( "NO_ACTS_COL" AS INTEGER ), CAST( "NO_MEMOS_COL" AS INTEGER ), CAST( "REACTIVE_DTE_COL" AS VARCHAR(10) ), CAST( "SUB_ACCT_NO_COL" AS CHAR(16) ), CAST( "START_DTE_COL" AS VARCHAR(10) ), CAST( "WORK_DTE_COL" AS VARCHAR(10) ), CAST( "AUDIT_PROCESS_DT" AS VARCHAR(10) ), CAST( "AUDIT_BATCH_ID" AS INTEGER ), CAST( "AUDIT_JOB_ID" AS INTEGER ), CAST( "AUDIT_DML_ACTION_CD" AS CHAR(1) ), CAST( "AUDIT_INSERTED_TS" AS VARCHAR(26) ), CAST( "AUDIT_UPDATED_TS" AS VARCHAR(26) ), CAST( "RECORD_START_TS" AS VARCHAR(26) ), CAST( "RECORD_END_TS" AS VARCHAR(26) ), CAST( "CURRENT_FLAG" AS VARCHAR(2) ), CAST( "REPORTING_START_TS" AS VARCHAR(26) ), CAST( "REPORTING_END_TS" AS VARCHAR(26) ) FROM NDW_EXTRACT_VIEWS.XXXXXXX; ' " XXXXXXX -e UTF8

But we are getting the following error:

Teradata Parallel Transporter Version 14.10.00.08
Job log: /opt/teradata/client/14.10/tbuild/logs/XXXXXXX-511.out
Job id is XXXXXXX-511, running on pacdcpaprdetl1.cable.comcast.com
Teradata Parallel Transporter DataConnector Version 14.10.00.08
o_FileWritter: Instance 1 directing private log report to 'DataConnector-1'.
Teradata Parallel Transporter Export Operator Version 14.10.00.08
o_ExportOper: private log specified: Export
o_FileWritter: DataConnector Consumer operator Instances: 1
o_FileWritter: ECI operator ID: 'o_FileWritter-13210'
o_FileWritter: Operator instance 1 processing file '/data/infa_shared/NDW/extracts/full_extracts/COL_BASE.txt'.
o_ExportOper: connecting sessions
TPT_INFRA: TPT02638: Error: Conflicting data length for column(5) - "DELQ_FAMILY_COL". Source column's data length (9) Target column's data length (6).
o_ExportOper: TPT12108: Output Schema does not match data from SELECT statement
o_ExportOper: disconnecting sessions
o_ExportOper: Total processor time used = '0.14 Second(s)'
o_ExportOper: Start : Tue Nov 15 10:31:30 2016
o_ExportOper: End   : Tue Nov 15 10:31:30 2016
o_FileWritter: Total files processed: 0.
Job step MAIN_STEP terminated (status 12)
Job XXXXXXX terminated (status 12)
Job start: Tue Nov 15 10:31:27 2016
Job end:   Tue Nov 15 10:31:30 2016

Can someone please help us get this issue resolved?

 

Thanks & Regards,

Arpan.

Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Hi All,

Can you please provide your valuable advice on this issue?

 

Thanks & Regards,

Arpan.

Teradata Employee

Re: Handling Unicode Characters for TPT Export Operator

Here is one thing to keep in mind:

DATE fields and TIMESTAMP fields do not need to be tripled.

Only CHAR and VARCHAR fields are impacted by the bytes-per-character calculations.

Thus, there is no need to define DATE/TIMESTAMP fields as VARCHAR (they are fixed-length fields anyway) in the schema.

 

 

However, in your case, the error is being reported on an earlier CHAR field.

Since it is defined as CHARACTER SET LATIN in the target table (and not CHARACTER SET UNICODE), it looks like the maximum number of bytes that can be returned for that field is 6, not 9 (2 bytes per character).

 

(I am not a Unicode expert, I can only go by what I see, and what Teradata reports back to us as far as how many bytes are returned for that column.)

 

Try changing the definition in the schema for that column to CHAR(6) and see if that helps.

Of course, you will most likely encounter that same type of error on other fields as well.

So you may want to adjust those columns as well.
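For example, the LATIN columns in the schema would then be sized at 2 bytes per character instead of 3 (a sketch, not tested against your job):

"DELQ_FAMILY_COL"        CHAR(6),      /* CHAR(3)     LATIN x 2 */
"DELQ_FAMILY_DESCR_COL"  VARCHAR(50),  /* VARCHAR(25) LATIN x 2 */
"SUB_ACCT_NO_COL"        CHAR(32),     /* CHAR(16)    LATIN x 2 */
"AUDIT_DML_ACTION_CD"    CHAR(2),      /* CHAR(1)     LATIN x 2 */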

 

-- SteveF
Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Thanks a lot Steve for your response. Basically we have a framework which exports data using TPT Export. The framework itself generates the TPT control file and executes it.
Now, to avoid having to double or triple the data type lengths ourselves, I used DEFINE SCHEMA FROM TABLE (say X1). I have three questions:
Q1) Is there any case where the data types of the DEFINE SCHEMA FROM TABLE schema (X1) will differ from a hand-written DEFINE SCHEMA in the TPT control file?
Q2) Does DEFINE SCHEMA FROM TABLE resolve the problem of adjusting the data type length depending upon the CHARACTER SET? I.e., if the CHARACTER SET is LATIN, will the required data type length automatically be doubled, and whenever it's UNICODE, tripled?
Q3) When I'm using DEFINE SCHEMA FROM TABLE/SELECT OF OPERATOR, a DATE column with format 'YY/MM/DD' is not getting exported properly. Below is an example:
In the table (at the BTEQ prompt): 0001-01-01
In the exported file: 1/01/

Below is my modified tpt control file:

USING CHARACTER SET UTF8
DEFINE JOB EVEREST_COL_BASE
DESCRIPTION 'Export script for EVEREST_COL_BASE from TD View'
(
DEFINE SCHEMA SCHEMA_EVEREST_COL_BASE FROM TABLE '<table/view created based on original view>' 
/*[[DEFINE SCHEMA SCHEMA_EVEREST_COL_BASE FROM SELECT OF OPERATOR o_ExportOper;]] Used for SELECT FROM sqlQuery */

DEFINE OPERATOR o_ExportOper
TYPE EXPORT
SCHEMA SCHEMA_EVEREST_COL_BASE
ATTRIBUTES
(
VARCHAR UserName              = @UserName
,VARCHAR UserPassword         = @UserPassword
,VARCHAR TdpId                = @TdpId
,INTEGER MaxDecimalDigits     = 38
,INTEGER MaxSessions          = @MaxSessions
,INTEGER MinSessions          = @MinSessions
,VARCHAR PrivateLogName       = 'Export'
,VARCHAR SpoolMode            = 'NoSpool'
,VARCHAR WorkingDatabase      = @WorkingDatabase
,VARCHAR SourceTable          = @SourceTable
,VARCHAR SelectStmt           = @SelectStmt
);

DEFINE OPERATOR o_FileWritter
TYPE DATACONNECTOR CONSUMER
SCHEMA SCHEMA_EVEREST_COL_BASE
ATTRIBUTES
(
VARCHAR FileName                = @FileName
,VARCHAR Format                 = @Format
,VARCHAR TextDelimiterHex       = @TextDelimiterHex
,VARCHAR IndicatorMode          = 'N'
,VARCHAR OpenMode               = 'Write'
,VARCHAR PrivateLogName         = 'DataConnector'
);

APPLY TO OPERATOR (o_FileWritter[@LoadInst])
SELECT * FROM OPERATOR (o_ExportOper[@ReadInst]);
)
;

Below is the tbuild command:

tbuild -f /data/infa_shared/NDW/Scripts/extracts/scripts/extracts/EVEREST_COL_BASE_mod.tpt.ctl -v /data/infa_shared/NDW/common/.tdlogon/logon_ndw_extracts_file_tpt -u " WorkingDatabase='NDW_EXTRACT_VIEWS' , SourceTable='EVEREST_COL_BASE' , load_op=o_ExportOper , LoadInst=1 , ReadInst=1 , MaxSessions=10 , MinSessions=5 , FileName='/data/infa_shared/NDW/extracts/full_extracts/COL_BASE.txt' , LOAD_DTS='2016-11-15-103021' , Format='DELIMITED' , TextDelimiterHex='' , SkipRows=0 , SelectStmt='SELECT "SYS_COL","PRIN_COL","AGNT_COL","COLL_CODE_COL","DELQ_FAMILY_COL","DELQ_FAMILY_DESCR_COL","DROP_DTE_COL","LS_WORK_DTE_COL","LS_TRAN_DTE_COL","NO_ACTS_COL","NO_MEMOS_COL","REACTIVE_DTE_COL","SUB_ACCT_NO_COL","START_DTE_COL","WORK_DTE_COL","AUDIT_PROCESS_DT","AUDIT_BATCH_ID","AUDIT_JOB_ID","AUDIT_DML_ACTION_CD","AUDIT_INSERTED_TS","AUDIT_UPDATED_TS","RECORD_START_TS","RECORD_END_TS","CURRENT_FLAG","REPORTING_START_TS","REPORTING_END_TS" FROM NDW_EXTRACT_VIEWS.EVEREST_COL_BASE; ' " EVEREST_COL_BASE -e UTF8

Below is the original table structure.

CREATE SET TABLE XXXX ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      SYS_COL INTEGER,
      PRIN_COL INTEGER,
      AGNT_COL INTEGER,
      COLL_CODE_COL INTEGER,
      DELQ_FAMILY_COL CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
      DELQ_FAMILY_DESCR_COL VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,
      DROP_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      LS_WORK_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      LS_TRAN_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      NO_ACTS_COL INTEGER,
      NO_MEMOS_COL INTEGER,
      REACTIVE_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      SUB_ACCT_NO_COL CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      START_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      WORK_DTE_COL DATE FORMAT 'YYYY-MM-DD',
      AUDIT_PROCESS_DT DATE FORMAT 'YYYY-MM-DD',
      AUDIT_BATCH_ID INTEGER,
      AUDIT_JOB_ID INTEGER,
      AUDIT_DML_ACTION_CD CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      AUDIT_INSERTED_TS TIMESTAMP(6),
      AUDIT_UPDATED_TS TIMESTAMP(6),
      RECORD_START_TS TIMESTAMP(0),
      RECORD_END_TS TIMESTAMP(0))
PRIMARY INDEX COL_BASE_NPI ( SUB_ACCT_NO_COL );

I'm using CREATE SET TABLE _tmp_XXXXX AS (SELECT * FROM <View based on Original Table>) WITH NO DATA;
But while creating the table, the DATE columns get FORMAT 'YY/MM/DD' instead of 'YYYY-MM-DD'. If I change SESSION DATEFORM to ANSIDATE (the setting is shown below), the DATE columns are created with FORMAT 'YYYY-MM-DD'. Is there any way to create the table with the DATE formats exactly as they are in the original view?
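The session setting referred to above:

SET SESSION DATEFORM = ANSIDATE;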

 

Thanks & Regards,

Arpan.

Teradata Employee

Re: Handling Unicode Characters for TPT Export Operator

What version of TPT are you running?

For the date field, we did have a bug in the file writer related to dropping the leading 0's; that has since been fixed.

If you would like TPT to generate the schema for you, that is the best method.

It gets around a lot of these nuances.

We will connect to Teradata using the specified session character set and Teradata will provide to us the proper size of each column in bytes (doubled and tripled as necessary). 

 

-- SteveF
Teradata Employee

Re: Handling Unicode Characters for TPT Export Operator

CREATE TABLE x AS (SELECT * FROM y) derives its data types from the result of the SELECT, which in turn can be affected by session settings.

CREATE TABLE x AS y copies the source table's data type definitions (as well as index definitions).
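For example (the second form applies when y is a table, since it copies the table definition directly):

CREATE TABLE x AS (SELECT * FROM y) WITH NO DATA;  /* types come from the SELECT result */
CREATE TABLE x AS y WITH NO DATA;                  /* types copied from y's definition  */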

Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Thanks a lot Steve for your response. Our Teradata Parallel Transporter version is 14.10.00.08. Currently we are generating the TPT control file using a complex process, and if we can use this feature properly, we will be able to generate the TPT control file very easily.

Just eager to know: DEFINE SCHEMA SCHEMA_EVEREST_COL_BASE FROM TABLE should work with a view as well, right (e.g. DEFINE SCHEMA SCHEMA_EVEREST_COL_BASE FROM TABLE '<view created based on original view>')?

 

I added "VARCHAR DateForm = 'ansiDate' in tpt control file so that the data will always be generated in ANSI DATE format. But while running the tbuild command, getting below error:

Job log: /opt/teradata/client/14.10/tbuild/logs/EVEREST_COL_BASE-538.out
Job id is EVEREST_COL_BASE-538, running on pacdcpaprdetl1.cable.comcast.com
Teradata Parallel Transporter DataConnector Version 14.10.00.08
o_FileWritter: Instance 1 directing private log report to 'DataConnector-1'.
o_FileWritter: DataConnector Consumer operator Instances: 1
o_FileWritter: ECI operator ID: 'o_FileWritter-419'
o_FileWritter: Operator instance 1 processing file '/data/infa_shared/NDW/extracts/full_extracts/COL_BASE.txt'.
Teradata Parallel Transporter Export Operator Version 14.10.00.08
o_ExportOper: private log specified: Export
o_ExportOper: TPT10545: DateForm set to ANSIDATE but schema has INTDATE column
o_ExportOper: Total processor time used = '0.01 Second(s)'
o_ExportOper: Start : Mon Nov 21 03:02:12 2016
o_ExportOper: End   : Mon Nov 21 03:02:12 2016
o_FileWritter: Total files processed: 0.
Job step MAIN_STEP terminated (status 8)
Job EVEREST_COL_BASE terminated (status 8)
Job start: Mon Nov 21 03:02:09 2016
Job end:   Mon Nov 21 03:02:12 2016

Is there any way I can get rid of this error?

Note: In my tbuild command I'm using a SELECT statement, shown below.

SELECT "SYS_COL","PRIN_COL","AGNT_COL","COLL_CODE_COL","DELQ_FAMILY_COL","DELQ_FAMILY_DESCR_COL",CAST("DROP_DTE_COL" AS DATE),"LS_WORK_DTE_COL","LS_TRAN_DTE_COL","NO_ACTS_COL","NO_MEMOS_COL","REACTIVE_DTE_COL","SUB_ACCT_NO_COL","START_DTE_COL","WORK_DTE_COL","AUDIT_PROCESS_DT","AUDIT_BATCH_ID","AUDIT_JOB_ID","AUDIT_DML_ACTION_CD","AUDIT_INSERTED_TS","AUDIT_UPDATED_TS","RECORD_START_TS","RECORD_END_TS","CURRENT_FLAG","REPORTING_START_TS","REPORTING_END_TS" FROM <Database_Name>.<View_Name>; 

 

Thanks & Regards,

Arpan.

 

Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Thank you very much Fred for the reply. We are trying to create a table from a view because the data we are trying to export comes from a view, and the view may be a complex one. We use this temporary table to generate the TPT control file, as we need the data types to build the schema.

Is there any way I can create the table with the same data types and precision as the view (I'm not bothered about indexes at all)?

 

Thanks & Regards,

Arpan.

Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Hi Steve,

Just saw your response on the below thread and got the answer. I used your suggestion and the values are coming out as expected.

unloaded-Date-value-issue-by-tpt-scrip

 

A sample value is given below:

0001-01-01|0001-01-01

Now as per your suggestion, this is going to be fixed (correct me if I'm wrong) when using "DEFINE SCHEMA FROM TABLE" with the DC operator. Our current TPT version is 14.10.00.08 and the platform is Red Hat Enterprise Linux Server release 5.9 (Tikanga).

 

In which version of TPT is this issue going to be fixed, and what is the expected release date of that fix? We are planning to upgrade TTU, so we'll move to a version that lets us use the "DEFINE SCHEMA FROM TABLE" feature as well as write directly to HDFS (which is available from 15.00.xx.xx, I believe).

 

Thanks again for your valuable suggestion.

 

Thanks & Regards,

Arpan.

Enthusiast

Re: Handling Unicode Characters for TPT Export Operator

Hi Steve,

Can you please advise in which version of TPT the issue with DEFINE SCHEMA FROM TABLE with the DC Operator has been fixed? If that fix is available, we'll upgrade TTU to that version or later.

 

Thanks & Regards,

Arpan.