Unicode Export through TPT

Tools
Enthusiast

Unicode Export through TPT

Hi, 

I need to export few chinese and japanese characters stored in teradata to a file.

I have tried different options to export data through TPT but i am either not able to see the unicode characters in the generated file or i face the conflicting data length error.

Please find the details below.

Requirement : 

1. Export chinese / japanese characters out of teradata to a file.

2. The table has japanese characters stored in ClassNm and DeptNm field 

Eg      : 本菊地 , 竹益下

TPT Options tried :

1. Generated TPT with the same schema and column length as in TD   

        Output : Exports data , but unicode data not displayed in file.

2. Generated TPT with column length 2 times higher for unicode columns  

        Output : Exports data, but unicode data not displayed in file.

3. Generated TPT with column length 2 times higher for unicode columns and mentioned USING CHARACTER SET UTF8 as the first line of the TPT file.

        Output : Fails with Error as FILE_WRITER[1]: Operator instance 1 processing file 'unitest.out'.

                                     EXPORT_OPERATOR: connecting sessions

                                     TPT_INFRA: TPT02638: Error: Conflicting data length for column(1) - EmpCatNumber. Source column's data length (20) Target column's data length (60).

                                     EXPORT_OPERATOR: TPT12108: Output Schema does not match data from SELECT statement

                                     FILE_WRITER[1]: Total files processed: 0.

                                     EXPORT_OPERATOR: disconnecting sessions

                                     EXPORT_OPERATOR: Total processor time used = '0.020462 Second(s)'

TPT Execution command :

tbuild -f unitest.ctl -v TPTParameter.param -e UTF8 -s 1

Table Structure :

CREATE SET TABLE master_t.unitest ,NO FALLBACK ,

     NO BEFORE JOURNAL,

     NO AFTER JOURNAL,

     CHECKSUM = DEFAULT,

     DEFAULT MERGEBLOCKRATIO

     (

      EmpCatNumber VARCHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,

      EmpType CHAR(5) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,

      EmpCd VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,

      ClassNm VARCHAR(180) CHARACTER SET UNICODE NOT CASESPECIFIC,

      DeptNm VARCHAR(180) CHARACTER SET UNICODE NOT CASESPECIFIC,

      CrtdBy VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT USER ,

      CrtTmstmp TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))

PRIMARY INDEX Emp_Nm_NUPI ( EmpCatNumber ,EmpType );

TPT Script :

USING CHARACTER SET UTF8

DEFINE JOB EXPORT_unitest_TABLE_TO_FILE

DESCRIPTION 'EXPORT unitest TABLE TO A FILE'

(

/*****************************/

DEFINE SCHEMA unitest_SCHEMA

DESCRIPTION 'SAMPLE unitest SCHEMA'

(

EmpCatNumber Varchar(20) , EmpType Varchar(12) , EmpCd Varchar(10) , ClassNm Varchar(360) , DeptNm Varchar(360) , CrtdBy Varchar(25) , CrtTmstmp Varchar(30)

);

/*****************************/

/*****************************/

DEFINE OPERATOR FILE_WRITER()

DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'

TYPE DATACONNECTOR CONSUMER

SCHEMA *

ATTRIBUTES

(

VARCHAR PrivateLogName    = 'file_writer_privatelog',

VARCHAR FileName          = 'unitest.out',

VARCHAR IndicatorMode     = 'N',

VARCHAR OpenMode          = 'Write',

VARCHAR Format = 'DELIMITED',

VARCHAR TextDelimiter = @MyDelimiter,

VARCHAR TRACELEVEL='ALL'

);

/*****************************/

/*****************************/

DEFINE OPERATOR EXPORT_OPERATOR()

DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'

TYPE EXPORT

SCHEMA unitest_SCHEMA

ATTRIBUTES

(

VARCHAR PrivateLogName    = 'export_privatelog',

INTEGER MaxSessions       = @MaxSessions ,

INTEGER MinSessions       = @MinSessions,

VARCHAR TdpId             = @MyTdPid,

VARCHAR UserName          = @MyUserName,

VARCHAR UserPassword      = @MyPassword,

VARCHAR AccountId,

VARCHAR SelectStmt        = 'select    

  cast (EmpCatNumber as Varchar(20)), cast (EmpType as Varchar(12)), cast (EmpCd as Varchar(10)), cast (ClassNm as Varchar(360)), cast (DeptNm as Varchar(360)), cast (CrtdBy as Varchar(25)), cast (CrtTmstmp as Varchar(30))

from eis_t.unitest  where  1=1 ; '

);

/*****************************/

STEP export_to_file

(

APPLY TO OPERATOR (FILE_WRITER() )

SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );

);

);

Request your help on the same.

Thanks & Regards,

Srivignesh KN

6 REPLIES
Teradata Employee

Re: Unicode Export through TPT

In order to see the chinese/japanese characters, you MUST specify either UTF8 or UTF16 as the character set in the script.

If you specify UTF8, you must triple the size of the character fields (CHAR, VARCHAR).

If you specify UTF16, you must double the size of the character fields (CHAR,VARCHAR).

-- SteveF
Enthusiast

Re: Unicode Export through TPT

Hi Steve, 

I have mentioned UTF8 and tripled the size of the unicode fields, i still get the following error.

EXPORT_OPERATOR: connecting sessions

TPT_INFRA: TPT02638: Error: Conflicting data length for column(1) - EmpCatNumber. Source column's data length (30) Target column's data length (90).

EXPORT_OPERATOR: TPT12108: Output Schema does not match data from SELECT statement

Thanks

Teradata Employee

Re: Unicode Export through TPT

In the example script you sent, EmpCatNumber was defined in the table as VARCHAR(10).

If that is the case, then the script must specify VARCHAR(30).

If this is done correctly, I am not sure how TPT could think that the target column's data length would be 90.

What version of TPT are you running?

-- SteveF
Enthusiast

Re: Unicode Export through TPT

I have changed the empcatnumber to 30. I even tried increasing the size to 60/90 but the error meesage remains the same.

My TPT version is Teradata Parallel Transporter Version 15.10.01.02 64-Bit.

Can you please help me understand if we need to make any system wide property defined for having handling unicode exports.

$ tbuild -f unitest.ctl -v TPTParameter.param -e UTF8 -s 1

Teradata Parallel Transporter Version 15.10.01.02 64-Bit

TPT_INFRA: TPT03624: Warning: tbuild -s option argument specifies the first job step;

  no job steps will be skipped (unless this is a restarted job).

Job log: /opt/teradata/client/15.10/tbuild/logs/edwexp-97.out

Job id is edwexp-97, running on ushexpd1ltdexp1

Teradata Parallel Transporter Export Operator Version 15.10.01.02

EXPORT_OPERATOR: private log specified: export_privatelog

Teradata Parallel Transporter DataConnector Operator Version 15.10.01.02

FILE_WRITER[1]: Instance 1 directing private log report to 'file_writer_privatelog-1'.

FILE_WRITER[1]: DataConnector Consumer operator Instances: 1

FILE_WRITER[1]: ECI operator ID: 'FILE_WRITER-63622'

FILE_WRITER[1]: Operator instance 1 processing file 'unitest.out'.

EXPORT_OPERATOR: connecting sessions

TPT_INFRA: TPT02638: Error: Conflicting data length for column(1) - EmpCatNumber. Source column's data length (30) Target column's data length (90).

EXPORT_OPERATOR: TPT12108: Output Schema does not match data from SELECT statement

FILE_WRITER[1]: Total files processed: 0.

EXPORT_OPERATOR: disconnecting sessions

EXPORT_OPERATOR: Total processor time used = '0.020582 Second(s)'

EXPORT_OPERATOR: Start : Fri Aug 19 14:00:09 2016

EXPORT_OPERATOR: End   : Fri Aug 19 14:00:10 2016

Job step export_to_file terminated (status 8)

Job edwexp terminated (status 8)

Job start: Fri Aug 19 14:00:05 2016

Job end:   Fri Aug 19 14:00:10 2016

Thanks,

Srivignesh KN

Teradata Employee

Re: Unicode Export through TPT

Please send me your script.

(I may make changes to it.)

-- SteveF
Junior Contributor

Re: Unicode Export through TPT

You must triple the VarChar size in your Export Schema, not in your Select. 

Additionally you don't need to specify a schema at all, when the Format is DELIMITED, and your Select can be a simple SELECT *  (unless you want to use a specific Format for the cast), as typecasts to varchar are done automatically.