TPT Load Error : Untranslate Characters

Enthusiast


Hi, everyone. I ran a TPT FastLoad job and got some error codes in error table 1. The raw data contains some characters like "�����" that are untranslatable. The error message from the TPT job is: "loader: TPT10510: Instance 1: Error Limit has been reached or exceeded. Of 404231659 row(s) sent to the RDBMS, 10013 row(s) were recorded as errors." I do not care about the untranslatable characters, but I want to load as much of the data as possible. What should I do?

Teradata Employee

Re: TPT Load Error : Untranslate Characters

Hi.

This is probably because you have a file which is supposed to be UTF8 but it has characters that don't follow the UTF8 format. Hence the errors.

If you want to load the data anyway, you might want to replace the "�����" with a simple "?":

$ sed 's/\xEF\xBF\xBD/\x3F/g' originalfile > amendedfile

and use the amended file for the FastLoad.
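
For anyone without a sed that understands `\xHH` escapes, the same byte-level replacement can be sketched in Python. This is only an illustration of the idea above, not part of TPT; the function names `replace_fffd` and `clean_file` are made up for this example.

```python
# UTF-8 encoding of U+FFFD, the Unicode replacement character.
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"


def replace_fffd(data: bytes, substitute: bytes = b"?") -> bytes:
    """Replace every UTF-8 encoded U+FFFD in data with substitute."""
    return data.replace(REPLACEMENT_UTF8, substitute)


def clean_file(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path, replacing U+FFFD on the way.

    Works on raw bytes, so ill-formed UTF-8 elsewhere in the file is
    passed through untouched. Returns the number of lines changed.
    """
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Binary mode splits on b"\n" only; the three-byte sequence
        # contains no newline, so it is never split across lines.
        for line in src:
            cleaned = replace_fffd(line)
            if cleaned != line:
                changed += 1
            dst.write(cleaned)
    return changed
```

Usage would be along the lines of `clean_file("originalfile", "amendedfile")`. Note that each replacement shortens the record by two bytes, which matters if the file is fixed-width rather than delimited.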

 

HTH.

Cheers.

Carlos.

Teradata Employee

Re: TPT Load Error : Untranslate Characters

Those appear to be "error substitution characters" which generally indicate some prior ETL step failed to translate data correctly.

And of course you could increase the "error limit" value to allow the load to continue even if there are more errors.

Enthusiast

Re: TPT Load Error : Untranslate Characters

Hi, Fred. It worked really well. Thank you for your help. Another question: could I load the untranslatable characters? I do not care about them (they are all in the same comment column, which is not important), but I want to load as much of the data as possible.
Teradata Employee

Re: TPT Load Error : Untranslate Characters

That depends on how the source data is encoded and what server character set is in use for the target column.

When the session character set is UTF8 or UTF16 and the target column is CHARACTER SET UNICODE, then in TD16+ with TTU16+ you can use the Unicode Pass Through feature.

UnicodePassThrough='On'

 

Enthusiast

Re: TPT Load Error : Untranslate Characters

Hi, Fred. My session character set is UTF8, and the target table is CHARACTER SET UNICODE. The Teradata version is 16.20. I have already set UnicodePassThrough='On', but there are still a lot of records in the error table, with ErrorCode 6706. The DataParcel data from the file: 'FINAL Comments Tü XYXY31 ORDER �FAXED XYXY21 XYXY22 SCHEDULING (P) 293-4333 �PER THEIR POLICY THE CLINIC WILL �REVIEW THE GIVEN INFORMATION THEN CALL PT TO SCHEDULE XYXY12. XYXY13 GRAY, REFERRAL SPECIALIST XYXY31 -RESULTS RECEIVED FROM XYXY21. XYXY22 GIVEN TO PROV '. I opened the file in the suggested encoding, which is named something like 'Spanish European'. I am sorry that I do not know its real English name. The character set in my TPT script is UTF8. Thank you for your help. Waiting for your reply.
Enthusiast

Re: TPT Load Error : Untranslate Characters

Hi, Fred. I changed the character set from UTF8 to ASCII in the TPT script. Some records were loaded with the untranslatable characters and without any warning, but there are still some records that cannot be loaded. I have contacted the raw data supplier. They do not want to change their raw data file (we had asked them to change the encoding of the data file in order to reduce the number of untranslatable characters). Is there any other way to load the untranslatable characters? Thank you for your help. By the way, you have not replied for a while. That has never happened before, so I was worried about you. Best wishes.
Teradata Employee

Re: TPT Load Error : Untranslate Characters

The sequence that is rendered as ï¿½ is x'EFBFBD' which is the UTF8 encoding for U+FFFD, the Unicode error replacement character (indicating some translation failure "upstream" since it's present in the input data) - which at least implies the data is intended to be UTF8. This character should be accepted as-is with Unicode Pass Through enabled.

 

On the other hand, in UTF8 the ü (x'FC') is not a valid byte at all (in the original UTF-8 design it would have been the first byte of a six-byte sequence, which is no longer permitted), while the next byte appears to be a simple space and not a continuation byte - so that would be an ill-formed sequence. But with UPT I would just expect to see that one byte replaced with U+FFFD and the load continue.
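
The byte-level reasoning above can be checked outside Teradata with a short Python sketch. The `sample` bytes are an assumption modeled on the DataParcel excerpt (the real file may differ), and `first_invalid_utf8` is a helper name invented for this example.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first ill-formed byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# x'EFBFBD' is well-formed UTF-8: it decodes to U+FFFD.
assert b"\xef\xbf\xbd".decode("utf-8") == "\ufffd"

# x'FC' is 'ü' only in a single-byte character set such as Latin-1.
assert b"\xfc".decode("latin-1") == "ü"

# Hypothetical bytes: x'FC' followed by a plain space.
sample = b"Comments T\xfc XYXY31"

# Strict UTF-8 decoding fails at the x'FC' byte (offset 10).
assert first_invalid_utf8(sample) == (10, 0xFC)

# A lenient decoder swaps just that one byte for U+FFFD and carries on,
# which is the behavior described above for Unicode Pass Through.
assert sample.decode("utf-8", errors="replace") == "Comments T\ufffd XYXY31"
```

Running a check like this over the rejected records can show whether the input is a mix of UTF-8 and single-byte data, which is useful evidence to attach to a support incident.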

 

Bottom line: With session character set UTF8, UNICODE column, and UPT enabled, I would expect you to be able to load this data. You may need to open an incident with Teradata support for assistance, and provide DDL, TPT script / logs, and some sample data to reproduce the problem.

 

I suppose you could revert to the approach used prior to having UPT: Define a staging table with CHARACTER SET LATIN, load using ASCII session character set, then use the UDFs from the "Unicode Tool Kit" to translate the input bytes to Unicode while inserting to the target table. Alternatively, the tool kit also contains an Access Module that can be used to "cleanse" delimited text input while loading to UNICODE columns via UTF8 session.
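
As a rough illustration of what "cleansing" the input before a UTF8-session load could look like, here is a Python sketch that keeps well-formed UTF-8 as-is and reinterprets any ill-formed byte as Latin-1. This is not the Unicode Tool Kit or its Access Module; the handler name `latin1_fallback`, the function `cleanse`, and the assumption that the stray bytes are Latin-1 are all hypothetical.

```python
import codecs


def _latin1_fallback(exc):
    """Codec error handler: reinterpret ill-formed bytes as Latin-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position to resume decoding at.
    return bad.decode("latin-1"), exc.end


codecs.register_error("latin1_fallback", _latin1_fallback)


def cleanse(raw: bytes) -> bytes:
    """Return raw re-encoded as well-formed UTF-8.

    Valid UTF-8 sequences (including U+FFFD) pass through unchanged;
    bytes that are not valid UTF-8 are treated as Latin-1 characters.
    """
    text = raw.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")
```

With this, the hypothetical input `b"T\xfc ORDER \xef\xbf\xbdFAXED"` becomes `b"T\xc3\xbc ORDER \xef\xbf\xbdFAXED"`: the lone x'FC' turns into a proper two-byte ü and the existing U+FFFD is left alone. Whether Latin-1 is the right guess for the stray bytes depends on how the supplier actually produced the file.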