TPT - Delimited Data Parsing error: Invalid multi-byte character

Enthusiast


I am developing a TPT load (Unix environment) for a UTF-8 encoded data file, to populate a Teradata table whose columns are defined as VARCHAR() CHARACTER SET UNICODE. One character in the data file is causing my load to fail. If I remove this character, the load completes successfully.

When the data file is viewed in WinSCP, the problem character appears as a square box. When I copy and paste the character into a Word document, it appears as a "smiley face" emoji. WinSCP reports the following attributes for the character: character 55357 (0xD83D), encoding UTF-8.

A bit of googling suggests the following: character 55357, Unicode code point U+D83D, UTF-8 (hex) ed a0 bd.

I'm afraid this means nothing to me. What do I need to do to ensure that the TPT load job doesn't fail on these spurious UTF-8 characters, which appear not to be supported by the Teradata Unicode character set? I don't want to pre-process the file to remove this specific character, as tomorrow I could easily receive a file with a different problem character.

Thanks for any assistance
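
For readers following along, here is a minimal Python sketch (not part of the original post) of what the bytes quoted above mean. U+D83D is a UTF-16 high surrogate, not a character in its own right, so the byte sequence ed a0 bd is not valid UTF-8 and a strict decoder rejects it. The helper names are hypothetical, and U+1F600 is only an example emoji chosen because its UTF-16 form starts with D83D; the actual character in the file is not known.

```python
# Bytes quoted in the post: ED A0 BD
raw = bytes.fromhex("eda0bd")

def try_strict_utf8(data: bytes) -> str:
    """Decode as strict UTF-8, or describe why decoding failed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid: {exc.reason}"

def surrogate_code_point(data: bytes) -> int:
    """Decode with 'surrogatepass' so the lone surrogate can be inspected."""
    return ord(data.decode("utf-8", errors="surrogatepass"))

# Example emoji (an assumption, for illustration only): U+1F600.
emoji = "\U0001F600"
utf16 = emoji.encode("utf-16-be")
utf16_units = [hex(int.from_bytes(utf16[i:i + 2], "big")) for i in (0, 2)]

print(try_strict_utf8(raw))            # strict UTF-8 rejects the bytes
print(hex(surrogate_code_point(raw)))  # the surrogate code point they encode
print(utf16_units)                     # UTF-16 code units of the example emoji
print(emoji.encode("utf-8").hex())     # the valid 4-byte UTF-8 form
```

If the file really contains ed a0 bd, it was probably produced by encoding UTF-16 surrogates one at a time rather than encoding the full code point, which would explain the "Invalid multi-byte character" message.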

12 REPLIES
Teradata Employee

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

When you want assistance, it is always a good idea to provide:

1. the version of TPT you are using

2. the actual failure (is it a DBS failure? a TPT failure)

The word "fail" can mean many things.

Did the job complete but the row(s) with the aforementioned character end up in the error table?

If so, that would indicate the character is not supported by Teradata.

Did TPT fail?

If so, what was the error message?

-- SteveF

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Hi,

I am also getting a similar error due to emoji/emoticons in the data.

FILE_READER: TPT19134 !ERROR! Fatal data error processing file 'users/data/tgtfiles/rep_t_hit.out'. Delimited Data Parsing error: Column length overflow(s) in row 230.

FILE_READER: TPT19003 TPT Exit code set to 12.

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 0, Total Rows Received = 0, Total Rows Sent = 0

FILE_READER: Total files processed: 0.

LOAD_OPERATOR: Total processor time used = '0.337322 Second(s)'

LOAD_OPERATOR: Start : Mon Jul 20 13:41:55 2015

LOAD_OPERATOR: End   : Mon Jul 20 13:42:04 2015

Job step insert_data terminated (status 12)

Job rep_t_hit_85912063 terminated (status 12)

Job start: Mon Jul 20 13:30:04 2015

Job end:   Mon Jul 20 13:42:04 2015

Total available memory:          20000676

Largest allocable area:          20000676

Memory use high water mark:       3490272

Free map size:                       1024

Free map use high water mark:          19

Free list use high water mark:          0

Thanks in advance,

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Please ignore my previous post; here is the correct error.

Hi,

I am also getting a similar error due to emoji/emoticons in the data.

DATACONN: TPT19003 Warning: EscapeTextDelimiter has been encountered as the last character of column data

LOAD_OPERATOR: disconnecting sessions

DATACONN: TPT19134 !ERROR! Fatal data error processing file '/home/HIT_TAB.txt'. Delimited Data Parsing error: Invalid multi-byte character in row 158989, col 97.

DATACONN: TPT19003 TPT Exit code set to 12.

DATACONN: Total files processed: 0.

DATACONN: TPT19003 11 occurances of EscapeTextDelimiter encountered as the last character of column data.

DATACONN: TPT19003 Warning: The use of the same EscapeTextDelimiter value ('\') to export this data in DELIMITED format will result in an error.

LOAD_OPERATOR: Total processor time used = '0.579601 Second(s)'

LOAD_OPERATOR: Start : Thu Aug 13 09:16:36 2015

LOAD_OPERATOR: End   : Thu Aug 13 09:16:56 2015

Job step MAIN_STEP terminated (status 12)

Job c1030983 terminated (status 12)

Job start: Thu Aug 13 09:16:34 2015

Job end:   Thu Aug 13 09:16:56 2015

Thanks in advance,

Teradata Employee

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Most emoji/emoticon characters are not supported by Teradata.

If your data contains invalid multi-byte characters, you can try to set RecordErrorFileName to a valid file and the DataConnector operator will put the error rows into that file and continue processing.

You did not tell me on what platform you are running and what version of TPT you are using.

-- SteveF
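
As an illustration of the "set bad rows aside and keep going" idea suggested above, here is a minimal Python sketch of a pre-load check. It is not how the DataConnector operator works internally. It assumes (hypothetically) that the unsupported characters are those outside the Basic Multilingual Plane, i.e. code points above U+FFFF, plus any bytes that are not valid UTF-8. The function names and sample rows are made up for the example.

```python
from typing import Iterable, List, Optional, Tuple

def find_problem(raw_line: bytes) -> Optional[str]:
    """Describe the first problem found in the line, or return None if clean."""
    try:
        # Strict decoding rejects lone surrogates such as ED A0 BD.
        text = raw_line.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte offset {exc.start}"
    for pos, ch in enumerate(text, start=1):
        # Assumption: supplementary-plane characters are the unsupported ones.
        if ord(ch) > 0xFFFF:
            return f"U+{ord(ch):04X} at character {pos}"
    return None

def split_rows(lines: Iterable[bytes]) -> Tuple[List[bytes], List[Tuple[int, str]]]:
    """Separate clean rows from rejected ones, recording row number and reason."""
    clean: List[bytes] = []
    rejected: List[Tuple[int, str]] = []
    for row_no, line in enumerate(lines, start=1):
        problem = find_problem(line)
        if problem is None:
            clean.append(line)
        else:
            rejected.append((row_no, problem))
    return clean, rejected

# Hypothetical sample rows: one clean, one with an emoji, one with bad bytes.
sample = [
    "1|plain text\n".encode("utf-8"),
    "2|runner \U0001F3C3\n".encode("utf-8"),
    b"3|bad \xed\xa0\xbd\n",
]
clean, rejected = split_rows(sample)
for row_no, reason in rejected:
    print(f"row {row_no}: {reason}")
```

Reading the file in binary mode and passing its lines to `split_rows` would give a list of rows that are safe to load under these assumptions, plus a report of the rows to investigate.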

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Hi Steve,

We are on 14.10. When using RowErrFileName, my TPT load puts the record into the error file but also terminates the load with the error below:

FILE_READER: TPT19134 !ERROR! Fatal data error processing file 'new_gen.out'. Delimited Data Parsing error: Invalid multi-byte character in row 276, col 3.

The multi-byte character encountered (code point U+1F3C3) is not currently supported in TD 14 or 15.

I just want to make sure the load doesn't terminate if it encounters any such unsupported multi-byte character.

Please guide.

Teradata Employee

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Some errors will cause the DataConnector operator to terminate if it feels it cannot continue because the parsing would not allow it to find the end-of-record character reliably.

However, I will have someone look into this specific case.

-- SteveF
Teradata Employee

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Was record number 276 the first record with an invalid character?

Can you provide me with all of the attribute values you set up in the script (or job variable file) for the DataConnector operator?

-- SteveF
Enthusiast

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

U+00C7   Ç   capital C, cedilla

I am using TPT version 14.10.00.05. The load file is a UTF-8 file that uses the above character as its delimiter.

When I try to load this file with CHARACTER SET UTF8 and TextDelimiter set to the cedilla character, I get the error:

File_Reader: TPT19134 !Error! Fatal data error processing file X

Delimited data parsing Error : Column length overflow(s) in row1.

When I view the file content in UTF-8 format, I can see the cedilla character, but when I change the format to ISO in PuTTY, I see a different character.

When I try to get the hex value for the delimiter, I get c3 87, which is not cedilla.

Any input on this error?
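
A note on the hex value above: c3 87 is in fact the two-byte UTF-8 encoding of U+00C7 (Ç); the single byte c7 is its ISO-8859-1 (Latin-1) encoding. The short Python sketch below shows both, and also what the UTF-8 bytes look like when misread as Latin-1, which is consistent with the terminal showing something different in ISO mode. It does not by itself explain the TPT error.

```python
delimiter = "\u00C7"  # Ç, capital C with cedilla

utf8_bytes = delimiter.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = delimiter.encode("latin-1")  # one byte in ISO-8859-1

# Reading the UTF-8 bytes as if they were Latin-1 yields two characters.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex())                      # UTF-8 form of the delimiter
print(latin1_bytes.hex())                    # Latin-1 form of the delimiter
print([f"U+{ord(c):04X}" for c in misread])  # code points seen when misread
```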

Teradata Employee

Re: TPT - Delimited Data Parsing error: Invalid multi-byte character

Please provide the script and first few rows of data.

-- SteveF