Problem Loading Multibyte Characters In Teradata Using Fastload

Enthusiast

Hi All,

I am finding it difficult to load multibyte data using FastLoad.

Some of the columns in our source contain graphic characters such as emoji, along with other multibyte characters like 千春.

The data is extracted from Oracle using Informatica, and in the generated output file (UTF-8 code page, delimited) each emoji is converted into its hexadecimal value, d83d dd30, so the data looks something like this: <d83d><dd30>千春

We then FastLoad the file (character set: ASCII) into a staging table whose columns are defined as LATIN, and finally load it into the target, where the column is defined as UNICODE, using a UDF to convert Latin to Unicode.

In the end the Japanese characters come through fine, but the emoji are not converted properly.

I doubt there is anything wrong with the output file, since 千春 comes through fine and those characters are Unicode as well.

Please help me load the emoji characters.

Looking forward to your responses

Thanks

R.Rajeev

10 REPLIES
Enthusiast

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

Hi,

The emoji characters look like the one at the following link:

http://www.charbase.com/1f530-unicode-japanese-symbol-for-beginner

I am not able to paste the character into the forum, hence the link.

Thanks

R.Rajeev

Enthusiast

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

Hi All,

Can anyone please help me with the above?

Thanks

R.Rajeev

Teradata Employee

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

First, why are you trying to load UTF8 data using a character set of ASCII? And why would you want a staging table with Latin columns?

The UTF8 code points will most likely all be out of the range of ASCII.

Second, please give an example of your delimited input file. The character sequence <d83d> will not result in the data being converted to a character to be loaded. Delimited data represented in hex format will not be interpreted as an emoji character. The data is VARCHAR and all of the characters are just a sequence of bytes.
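For reference, the two values quoted in the question, d83d and dd30, have the shape of a UTF-16 surrogate pair. The following is a minimal Python sketch of how such a pair combines into a single code point. The function name is purely illustrative, and the sketch says nothing about how Informatica or FastLoad treat the data.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDD30)
print(f"U+{code_point:X}")  # U+1F530
```

U+1F530 is the code point named in the charbase link posted earlier in the thread.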

-- SteveF
Enthusiast

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

Hi Feinholz,

Thanks for your response. Here is a sample of how the data looks when you open the Informatica output file in vi on Unix (Mac):

879987576^A600^A487348983^A6585172061^A10^A^A<d83d><dd30><d83d><dd30><d83d><dd30>千春^A^A<d83d><dd30>Raju<d83d><dd30><d83d><dd30>^A^A林田<

^A is the delimiter. I can attach a sample file if required.
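Some vi builds display characters they cannot print in a `<xxxx>` form, so the view above may not show whether the file holds the literal text `<d83d>` or the surrogate values stored as raw bytes. The Python sketch below is one hedged way to tell the two apart from a field's raw bytes. The function names and return labels are hypothetical. The second case, where each surrogate is stored as three bytes (ED A0 BD and ED B4 B0), is an assumption that has not been confirmed for this file.

```python
import re

# Literal text such as "<d83d>": an angle-bracketed group of four hex digits.
LITERAL_TOKEN = re.compile(rb"<[0-9a-fA-F]{4}>")


def classify_field(raw: bytes) -> str:
    """Guess how an emoji is represented in the raw bytes of one field."""
    if LITERAL_TOKEN.search(raw):
        return "literal text"
    try:
        raw.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError:
        pass
    try:
        # Strict UTF-8 rejects surrogates; surrogatepass lets them through.
        raw.decode("utf-8", errors="surrogatepass")
        return "encoded surrogates"
    except UnicodeDecodeError:
        return "unknown"


def merge_encoded_surrogates(raw: bytes) -> str:
    """Turn separately encoded surrogate halves into real characters."""
    text = raw.decode("utf-8", errors="surrogatepass")
    # A round trip through UTF-16 pairs the two halves up again.
    return text.encode("utf-16", errors="surrogatepass").decode("utf-16")


print(classify_field("<d83d><dd30>千春".encode("utf-8")))
print(classify_field(b"\xed\xa0\xbd\xed\xb4\xb0"))
```

If a field turns out to hold encoded surrogates, `merge_encoded_surrogates` returns a string whose UTF-8 encoding is the four-byte form of the emoji.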

We are using FastLoad to load the staging table, and I was not sure whether it can be used to load tables with Unicode columns (I may be wrong, please correct me if so).

So in our current setup we load the data into a table with Latin columns of double the width, and then use a UDF to convert them back to UNICODE before loading the target.

I figured out that 'F09F94B0'XC is the exact hexadecimal literal for the emoji. As the session output below shows, the emoji is displayed when I use ASCII. However, it fails when I change the character set to UTF-8. I am not sure why, but there could be something wrong in my settings.

 *** Logon successfully completed.

 *** Teradata Database Release is 13.10.07.15                   

 *** Teradata Database Version is 13.10.07.15                     

 *** Transaction Semantics are BTET.

 *** Session Character Set Name is 'ASCII'.

 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command: 

sel 'F09F94B0'XC;

sel 'F09F94B0'XC;

 *** Query completed. One row found. One column returned. 

 *** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command: 

.set session charset 'utf-8';

.set session charset 'utf-8';

 BTEQ -- Enter your SQL request or BTEQ command: 

sel 'F09F94B0'XC;

sel 'F09F94B0'XC;

 *** CLI error: MTDP: EM_CHARNAME(227): invalid character set name specified.

 *** Return code from CLI is: 227

 *** CLI error: MTDP: EM_CHARNAME(227): invalid character set name specified.

 *** Return code from CLI is: 311

Moreover, I was able to load the data manually into a table using the ASCII charset.

Now I am not sure whether the output file generated by Informatica is one that Teradata can interpret correctly, so that it loads and displays the emoji.

Maybe the file should look a bit different (something other than d83d), but I have used UTF-8 in Informatica as well, so I don't know what to do.

Looking forward to your inputs, as getting the graphic characters across to TD has become quite a problem :)

Thanks

R.Rajeev


Teradata Employee

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

When you use FastLoad to load a delimited file, the schema is made up of all VARCHAR fields. Thus, the characters <dd30> just represent a character string. There is no way for FastLoad to convert them.

Yes, FastLoad can load into a table with a character set of UTF8. But again, these are just a sequence of characters. They are character strings.

So, if you have a way to convert (via UDF) <dd30> to something that is in the format you desire, that would be the correct approach.
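As an illustration of that kind of conversion, here is a minimal Python sketch of the same idea done as a preprocessing step on the file rather than as a Teradata UDF. It assumes the file really contains the literal text `<d83d><dd30>`, treats each run of such tokens as big-endian UTF-16 code units, and replaces the run with the characters they encode. All names are hypothetical, and any genuine text that happens to look like `<0041>` would be converted too.

```python
import re

# One or more consecutive <xxxx> tokens, each holding four hex digits.
TOKEN_RUN = re.compile(r"(?:<[0-9a-fA-F]{4}>)+")
TOKEN = re.compile(r"<([0-9a-fA-F]{4})>")


def decode_token_run(match):
    """Decode a run of <xxxx> tokens as big-endian UTF-16 code units."""
    units = TOKEN.findall(match.group(0))
    try:
        return bytes.fromhex("".join(units)).decode("utf-16-be")
    except UnicodeDecodeError:
        # Unpaired surrogate: leave the original text untouched.
        return match.group(0)


def restore_characters(line: str) -> str:
    """Replace every run of <xxxx> tokens in a line with real characters."""
    return TOKEN_RUN.sub(decode_token_run, line)


sample = "<d83d><dd30>Raju<d83d><dd30><d83d><dd30>"
restored = restore_characters(sample)
print(restored.encode("utf-8").hex())
```

Writing the restored lines back out as UTF-8 would give a file in which each emoji is the byte sequence F0 9F 94 B0 instead of a pair of text tokens.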

In BTEQ, have you tried "UTF8" as the character set name, and not "UTF-8"? (I do not know the correct syntax for BTEQ. In TPT and the load utilities we use "UTF8", not "UTF-8".)

-- SteveF
Enthusiast

Re: Problem Loading Multibyte Characters In Teradata Using Fastload

Hi Feinholz,

Thanks a lot for the insight.

Now, if I were to get 'F09F94B0'XC in my file instead of <d83d><dd30>, would I then be able to use FastLoad to load the table and avoid the UDF?

I have tried loading 'F09F94B0'XC manually into a Latin column, and when I select that column it displays the graphic character :)
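To relate the two notations, this short Python sketch checks that the bytes F0 9F 94 B0 are the UTF-8 form of the same character that d83d dd30 represents in UTF-16. It also shows that the same four bytes, read in a single-byte encoding (Latin-1 is used here as a stand-in), count as four separate characters. It only illustrates the encodings. Whether FastLoad with a UTF8 session would then load the data without the UDF is not answered by it.

```python
utf8_bytes = bytes.fromhex("F09F94B0")

# Decoded as UTF-8, the four bytes are a single character.
emoji = utf8_bytes.decode("utf-8")
print(len(emoji), f"U+{ord(emoji):X}")

# The same character expressed as UTF-16 code units.
utf16_hex = emoji.encode("utf-16-be").hex()
print(utf16_hex)

# Decoded as a single-byte encoding, each byte is its own character.
as_latin1 = utf8_bytes.decode("latin-1")
print(len(as_latin1))
```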

Looking for your inputs

Thanks

R.Rajeev