TPT UTF8 Import of "no valid unicode character" - need help

Tools
Enthusiast

Hi,

I need help importing a .csv file into a table.

We have a .csv file with a few columns. One of them contains Unicode data. The import fails on rows that contain the character "U+10002A". Without these characters the import works fine.

TPT19350 I/O error on file xxx

TPT19003 read

The Teradata TPT version is 13.10.00.08.

If you need more information, please let me know.

Thanks!

brian

8 REPLIES
Teradata Employee

Re: TPT UTF8 Import of "no valid unicode character" - need help

Please provide the entire output from the console.

-- SteveF
Enthusiast

Re: TPT UTF8 Import of "no valid unicode character" - need help

Teradata Parallel Transporter Load Operator Version 13.10.00.04
LOAD_OPERATOR: private log specified: best_log_name
FILE_READER: TPT19008 DataConnector Producer operator Instances: 1
FILE_READER: TPT19003 ECI operator ID: FILE_READER-10650
FILE_READER: TPT19222 Operator instance 1 processing file '/text1.TXT'.
LOAD_OPERATOR: connecting sessions
LOAD_OPERATOR: preparing target table
LOAD_OPERATOR: entering Acquisition Phase
FILE_READER: TPT19350 I/O error on file '/text1.TXT'.
FILE_READER: TPT19003 Read
FILE_READER: TPT19350 I/O error on file '/text1.TXT'.
LOAD_OPERATOR: disconnecting sessions
FILE_READER: TPT19221 Total files processed: 0.
LOAD_OPERATOR: Total processor time used = '0.23 Second(s)'
LOAD_OPERATOR: Start : Mon Jun 29 07:31:19 2015

Hi,

Here is the output.

Teradata Employee

Re: TPT UTF8 Import of "no valid unicode character" - need help

Thank you.

Just checking: does your script specify "USING CHARACTER SET UTF8" before the DEFINE JOB statement?

-- SteveF
Teradata Employee

Re: TPT UTF8 Import of "no valid unicode character" - need help

It appears as though "U+10002A" is a character that requires a 4-byte UTF-8 encoding.

If so, Teradata load/unload products do not support 4-byte UTF-8 data.
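To check whether a file really contains such characters before loading, a small script can help. The following is only a sketch, not a Teradata tool: the function name `find_supplementary` is made up for illustration, and it assumes the input has already been decoded as UTF-8. Any code point above U+FFFF needs four bytes in UTF-8.

```python
def find_supplementary(text):
    """Return (index, 'U+XXXXXX') for every code point above U+FFFF.

    Code points above U+FFFF are exactly the ones that take 4 bytes in UTF-8.
    """
    return [
        (i, f"U+{ord(ch):06X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# Hypothetical sample record containing the problem character.
sample = "abc;\U0010002A;def"
print(find_supplementary(sample))           # [(4, 'U+10002A')]
print(len("\U0010002A".encode("utf-8")))    # 4
```

Running this over each line of the input file shows which records would trip the reader.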

-- SteveF
Enthusiast

Re: TPT UTF8 Import of "no valid unicode character" - need help

Thank you.

So I think we cannot load the data without preprocessing the file first.
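One possible way to prepare the file is to replace every 4-byte character with a placeholder before the load. This is a minimal sketch under assumptions: the names `replace_supplementary` and `clean_file` are hypothetical, the file is assumed to be valid UTF-8, and "?" is just one choice of replacement character.

```python
import re

# Matches any code point outside the Basic Multilingual Plane,
# i.e. every character that needs 4 bytes in UTF-8.
_SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")


def replace_supplementary(text, replacement="?"):
    """Replace each 4-byte UTF-8 character in text with the replacement."""
    return _SUPPLEMENTARY.sub(replacement, text)


def clean_file(src_path, dst_path, replacement="?"):
    """Copy src_path to dst_path line by line, replacing 4-byte characters.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(replace_supplementary(line, replacement))
```

The cleaned copy then contains only characters of at most 3 bytes, at the cost of losing the original private-use characters.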

Enthusiast

Re: TPT UTF8 Import of "no valid unicode character" - need help

BEGIN LOADING
   $DBX_LOAD....
   ERRORFILES
     $DBX_LOAD...._ERR1,
     $DBX_LOAD...._ERR2
     CHECKPOINT 3000000;
     SET RECORD VARTEXT "§" NOSTOP DISPLAY_ERRORS;

     axsmod /../.../work/cp2uni_axm.so "CodePage=UTF8, ErrorChar=U+003F";

Hi,

We found a solution for this Unicode import problem: use the access module ("AXSMOD") file from the Unicode Toolkit.

The untranslatable character is now replaced with "?" (defined in ErrorChar).

You can also use the axsmod in a TPT script:

Varchar AccessModuleInitStr = 'CodePage=UTF8, ErrorChar=U+003F, EOR=0A',

Varchar AccessModuleName = '/.../.../work/cp2uni_axm.so'

Simple, once you know how...

greets,

brian

Enthusiast

Re: TPT UTF8 Import of "no valid unicode character" - need help

Info: the first code block in my last post is a FastLoad script.

But you can use the axsmod in FastLoad, MLoad, or TPT.

There are different axsmod files for AIX, SUSE, ...

Please refer to the documentation "Teradata Unicode Toolkit".

Teradata Employee

Re: TPT UTF8 Import of "no valid unicode character" - need help

An import of U+10002A is uncommon, as it is a user-defined character in Supplementary Private Use Area-B. It could also be a corrupted encoding. Private-use code points have been used by Japanese communications companies to encode emoji.
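For anyone who wants to verify this, here is a short sketch using Python's standard `unicodedata` module (the helper name `describe` is made up for illustration, and `bytes.hex` with a separator needs Python 3.8 or later). It shows the UTF-8 bytes, the general category ("Co" means private use), and whether the code point falls in the range U+100000 to U+10FFFD of Supplementary Private Use Area-B.

```python
import unicodedata


def describe(code_point):
    """Summarize how a code point is encoded and classified."""
    ch = chr(code_point)
    return {
        "utf8_hex": ch.encode("utf-8").hex(" "),
        "category": unicodedata.category(ch),
        # Supplementary Private Use Area-B spans U+100000..U+10FFFD.
        "in_pua_b": 0x100000 <= code_point <= 0x10FFFD,
    }


print(describe(0x10002A))
# {'utf8_hex': 'f4 80 80 aa', 'category': 'Co', 'in_pua_b': True}
```

If the bytes in the file do not decode cleanly at that position, a corrupted encoding is the more likely explanation.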