Multibyte character loading issue with TDCH

Database
Enthusiast

Re: Multibyte character loading issue with TDCH

Hi David,   Got Error 3798: A column or character expression is larger than the max size.

 

I downloaded Product ID: B035-1056-111K @ www.info.teradata.com for supported characters in Unicode & i see supported characters are only until U+FFFC under Unicode 6.0 while we have U+11000 to U+2B81D in Unicode 6.0. So are these not supported? I suppose we are using udf for the same reason. Please confirm if my understanding is right. 

 

That said, you think U+1F451 can be converted via UDF i am currently working? 

I just recapped to be sure i get it right.

 

Thank You

Highlighted
Teradata Employee

Re: Multibyte character loading issue with TDCH

Yes U+1F451 is not a supported character in td15.10. In td16.00 it is a pass through character. For more details, see chapter 9 of the currently available 16.00 documentation:

Teradata Database

International Character Set Support

Release 16.00

B035-1125-160K

December 2016

 

The latin''XC hex literal will not work on TD15.10 in this case as the SQL parser first converts the literal to Unicode, which is not what you want. You will need to load the UTF-8 characters with a load utility, or TDCH, using an ASCII client character set into a Latin column.

 

-David

 

Enthusiast

Re: Multibyte character loading issue with TDCH

Thank You David.

 

I loaded data into a table with LATIN character set using mload utility from UNIX terminal with ASCII terminal session.

mload -c ASCII < "scriptGreeting.txt" > "scriptGreeting.log"

 

I see the hex code stored fine in the table

SyntaxEditor Code Snippet

sel char2hexint(substr(greeting,25,8)) ,substr(greeting,25,4),greeting from "schema".Unicode_greeting_mload

Answer:

hexvalue(substring(greeting) - substring(greeting) - greeting

4E20F09F9191 - N П - THE HEIRS OF THE MOUNTAIN П‘‘

 

Now, if i use udf, i still get replacement character on screen.( I tried with UTF16 /UTF-8 session in SQLA )

SyntaxEditor Code Snippet

sel "schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from  "schema".Unicode_greeting_mload

 My goal is to see the ouput in answersheet a CROWN or atleast weired character but not replacement character when SELECTED from UNICODE column (assuming we will succesfully load these hex values to UNICODE column from LATIN column).

Did i miss something?

Teradata Employee

Re: Multibyte character loading issue with TDCH

You are going to run into problems selecting, exporting, or seaching for SMP emoji in any pre-16.00 release. See the UTK release 1.5.5.0, page 14 for a description of the limitations. Note that the these operations may work with UTF16. 

Enthusiast

Re: Multibyte character loading issue with TDCH

Thanks David. I was able to load emoji into latin and convert that using UDF and i see replacement character in select result.

I was looking at getting the data into table atleast with replacement character but our cureent setup doesnot allow replacement character.

Can we search and replace EFBFBD with say space?

How do i alter my below query to do so

SyntaxEditor Code Snippet

insert into Unicode_greeting_mload sel ID, CodePage, LCID, Description, "Schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from LATIN_greeting_mload

 

Please let me know if you think we can replace any character like SMP emoji that Teradata do not support into a space while loading using TDCH Fast load itself or i have to do TDCH -->Latin Table -->.Unicode table with conversion of replacement character to space

 

Thank You

 

Enthusiast

Re: Multibyte character loading issue with TDCH

If teradata cannot handle certain characters then we should be able to someway eliminate it. We are loosing records and ultimately end user will be hesitent to choose teradata. Any suggestion from anyone for alternate solution to get atleast valid characters into table?

 

Thank You

Teradata Employee

Re: Multibyte character loading issue with TDCH

Do you have a pre-sales technical contact, or the ability contact the Global Support Center?

 

Enthusiast

Re: Multibyte character loading issue with TDCH

Thanks for your support. 

Yes we reached GSC. Was trying to find answer interim :-)

 

They confirmed this as a drawback and they said everything you said here and unfortunately this is a bad news and we are upset as this was not known to us.

 

Strange is that TDCH fast load rejects the data while the same data if we use Fastload Utility is getting into the database with modified hex value for problamatic characters which are 4-byte in lenght. If Fastload utility can do then TDCH fastload plugin should also must have.

 

On a different note, can you help how to insert default timestamp value in a timestampe(0) data type field using fastload script. I tried to google but dint find any help. I saw one post where it say to use :DEFAULT in create table but i dont have requirement to create table everytime when fastloading.

i have a column called TDCH_DTTM TIMESTAMP (0) for audit purpose. How do i load this now? help me with Define & insert value part please.

 

Thank You

 

Teradata Employee

Re: Multibyte character loading issue with TDCH

Please start a new thread for 'how to insert default timestamp value ....'

 

Thanks,

-Dave

Teradata Employee

Re: Multibyte character loading issue with TDCH

According to your Select example and hex values 0xF09F9191, you have successfully loaded Emoji characters in UTF8 into Teradata Latin columns via ASCII session. Now, you can convert them to Unicode using pass-thorugh UDF, as your example below. 

sel "schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from  "schema".Unicode_greeting_mload ; 

 Because you are running TD 15.x, the UTF8 session character set only supports (3) bytes in UTF8. Instead of UTF8 session, you will need to switch to UTF16, so that the client can get the right UTF16 values on your client. Then, it is a matter of display Emoji characters on your client. With SQLA, you need to set the font "Segoe UI Symbol" on your Answer set pane. Highlight the Answer set pane, right click and go to Set Font. Then, you can display Emoiji characters on your SQL screen as shown below. 


Emoji_type EPISODE_TITLE

----------------------------------------------
1 Unicode 5.0 💻
2 Unicode 5.0 👑