I am developing a TPT load (Unix environment) for a UTF-8 encoded data file to populate a Teradata table with columns defined as VARCHAR() CHARACTER SET UNICODE. One character in the data file is causing my load to fail; if I remove this character the load completes successfully.
When the data file is viewed via WinSCP the problem character appears as a square box; when I copy and paste the character into a Word document it appears as a "smiley face" emoji/emoticon type thing. WinSCP shows the following attributes for the character: character 55357 (0xD83D, encoding UTF-8).
A bit of googling suggests the following: character 55357 is Unicode code point U+D83D (a UTF-16 high surrogate, i.e. the first half of a two-unit pair), and its raw UTF-8-style encoding would be the hex bytes ED A0 BD, which is not a valid UTF-8 sequence.
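For anyone hitting the same thing, a small Python 3 sketch shows how those numbers fit together: 55357 (0xD83D) is a UTF-16 high surrogate rather than a complete character, ED A0 BD is what you get if you force-encode that lone half (strict UTF-8 forbids it), and the real character is the surrogate pair decoded as a unit. The low surrogate used below (DE0A) is only a guess for illustration, since the file's actual second half wasn't reported:

```python
# 55357 (0xD83D) is a UTF-16 high surrogate, not a complete character.
code_unit = 55357
assert hex(code_unit) == "0xd83d"

# A lone surrogate is not valid UTF-8; Python refuses to encode it:
try:
    chr(code_unit).encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate is invalid in strict UTF-8")

# Forcing the encoding reproduces the bytes WinSCP showed:
print(chr(code_unit).encode("utf-8", "surrogatepass").hex(" "))  # ed a0 bd

# Decoded together with a (hypothetical) low surrogate DE0A, the pair
# yields a real emoji code point outside the Basic Multilingual Plane:
smiley = b"\xd8\x3d\xde\x0a".decode("utf-16-be")
print(hex(ord(smiley)))  # 0x1f60a
```

So the file is carrying surrogate halves encoded individually (CESU-8 style), which is why a strict UTF-8 consumer chokes on it.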
I'm afraid this means nothing to me. What do I need to do to ensure that the TPT load job doesn't fail for these spurious UTF-8 characters, which appear not to be supported by the Teradata UTF-8 Unicode character set? I don't want to pre-process the file to remove this specific character, as tomorrow I could easily receive a file with a different problem character.
Thanks for any assistance
When you want assistance, it is always a good idea to provide:
1. the version of TPT you are using
2. the actual failure (is it a DBS failure? a TPT failure?)
The word "fail" can mean many things.
Did the job complete, but the row(s) with the aforementioned character end up in the error table?
If so, that would indicate the character is not supported by Teradata.
Did TPT fail?
If so, what was the error message?
Please ignore the previous post; here is the correct error.
Most emoji/emoticon characters are not supported by Teradata.
If your data contains invalid multi-byte characters, you can try to set RecordErrorFileName to a valid file and the DataConnector operator will put the error rows into that file and continue processing.
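If pre-processing does turn out to be necessary, a generic filter that neutralises anything outside the Basic Multilingual Plane (rather than hard-coding one emoji) would also catch tomorrow's different problem character. A rough Python 3 sketch of the idea; the replacement character and function name are my own choices, not anything TPT-specific:

```python
import re

# Any code point above U+FFFF (the supplementary planes, where most
# emoji live) needs a 4-byte UTF-8 sequence and can be filtered out.
NON_BMP = re.compile(r"[\U00010000-\U0010FFFF]")

def scrub(line: str, replacement: str = "?") -> str:
    """Replace characters the UTF8 session charset cannot store."""
    return NON_BMP.sub(replacement, line)

# Emoji are neutralised; ordinary accented characters pass through:
print(scrub("col1|col2|run\U0001F3C3ner"))  # col1|col2|run?ner
print(scrub("plain Ç text"))                # plain Ç text
```

Run over the input file (reading and writing with `encoding="utf-8"`) before handing it to TPT, this removes the whole class of characters rather than one instance.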
You did not tell me what platform you are running on or what version of TPT you are using.
We are on 14.10, and while using RowErrFileName, my TPT load puts the record into the error file but also terminates the load with the error below:
FILE_READER: TPT19134 !ERROR! Fatal data error processing file 'new_gen.out'. Delimited Data Parsing error: Invalid multi-byte character in row 276, col 3.
The multi-byte character encountered, code point U+1F3C3, is not supported as of now in TD 14 or 15.
I just want to make sure the load doesn't terminate if it encounters any such unsupported multi-byte character.
Some errors will cause the DataConnector operator to terminate when it determines it cannot continue, because the parsing would no longer be able to find the end-of-record character reliably.
However, I will have someone look into this specific case.
Was record number 276 the first record with an invalid character?
Can you provide me with all of the attribute values you set up in the script (or job variable file) for the DataConnector operator?
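As a side note on why that particular character is rejected (my reading of the situation, not an official statement): U+1F3C3 sits outside the Basic Multilingual Plane, so it requires a 4-byte UTF-8 sequence, whereas every BMP character fits in at most 3 bytes. Easy to confirm in Python 3:

```python
# U+1F3C3 is the RUNNER emoji from the TPT19134 message above.
runner = "\U0001F3C3"
encoded = runner.encode("utf-8")
print(encoded.hex(" "))           # f0 9f 8f 83 -- a 4-byte sequence
print(len(encoded))               # 4

# By contrast, a BMP character such as Ç needs only 2 bytes:
print(len("Ç".encode("utf-8")))   # 2
```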
U+00C7 Ç capital C with cedilla
I am using TPT version 14.10.00.05; the load file is a UTF-8 file with the above character as delimiter.
When I try to load this file with CHARACTER SET UTF8 and TextDelimiter = cedilla, I am getting the error:
FILE_READER: TPT19134 !ERROR! Fatal data error processing file X. Delimited Data Parsing error: Column length overflow(s) in row 1.
When I view the file content as UTF-8 I can see the cedilla character, but when I change the format to ISO in PuTTY the file displays differently.
When I try to get the hex value for the delimiter I get C3 87, which didn't look like cedilla to me.
Any inputs on this error?
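One observation that may help here: C3 87 is exactly the two-byte UTF-8 encoding of Ç (U+00C7), so the delimiter bytes in the file look correct; the apparent mismatch comes from viewing those same bytes under a single-byte encoding, where they render as two separate characters. A quick Python 3 check:

```python
# C3 87 is the correct UTF-8 encoding of U+00C7 (Ç):
assert "\u00C7".encode("utf-8") == b"\xc3\x87"
print(b"\xc3\x87".decode("utf-8"))    # Ç

# The same two bytes viewed under a single-byte Windows-1252/ISO view
# show up as two characters, which is why the file "changes" when the
# terminal is switched away from UTF-8:
print(b"\xc3\x87".decode("cp1252"))   # Ã‡
```

So the column length overflow is likely a separate issue (e.g. delimiter declaration or column sizing) rather than wrong delimiter bytes.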