Hi David, Got Error 3798: A column or character expression is larger than the max size.
I downloaded Product ID: B035-1056-111K @ www.info.teradata.com for supported characters in Unicode & i see supported characters are only until U+FFFC under Unicode 6.0 while we have U+11000 to U+2B81D in Unicode 6.0. So are these not supported? I suppose we are using udf for the same reason. Please confirm if my understanding is right.
That said, you think U+1F451 can be converted via UDF i am currently working?
I just recapped to be sure i get it right.
Yes U+1F451 is not a supported character in td15.10. In td16.00 it is a pass through character. For more details, see chapter 9 of the currently available 16.00 documentation:
International Character Set Support
The latin''XC hex literal will not work on TD15.10 in this case as the SQL parser first converts the literal to Unicode, which is not what you want. You will need to load the UTF-8 characters with a load utility, or TDCH, using an ASCII client character set into a Latin column.
Thank You David.
I loaded data into a table with LATIN character set using mload utility from UNIX terminal with ASCII terminal session.
mload -c ASCII < "scriptGreeting.txt" > "scriptGreeting.log"
I see the hex code stored fine in the table
SyntaxEditor Code Snippet
sel char2hexint(substr(greeting,25,8)) ,substr(greeting,25,4),greeting from "schema".Unicode_greeting_mload
hexvalue(substring(greeting) - substring(greeting) - greeting
4E20F09F9191 - N ÐŸ - THE HEIRS OF THE MOUNTAIN ÐŸ‘‘
Now, if i use udf, i still get replacement character on screen.( I tried with UTF16 /UTF-8 session in SQLA )
SyntaxEditor Code Snippet
sel "schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from "schema".Unicode_greeting_mload
My goal is to see the ouput in answersheet a CROWN or atleast weired character but not replacement character when SELECTED from UNICODE column (assuming we will succesfully load these hex values to UNICODE column from LATIN column).
Did i miss something?
You are going to run into problems selecting, exporting, or seaching for SMP emoji in any pre-16.00 release. See the UTK release 126.96.36.199, page 14 for a description of the limitations. Note that the these operations may work with UTF16.
Thanks David. I was able to load emoji into latin and convert that using UDF and i see replacement character in select result.
I was looking at getting the data into table atleast with replacement character but our cureent setup doesnot allow replacement character.
Can we search and replace EFBFBD with say space?
How do i alter my below query to do so
SyntaxEditor Code Snippet
insert into Unicode_greeting_mload sel ID, CodePage, LCID, Description, "Schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from LATIN_greeting_mload
Please let me know if you think we can replace any character like SMP emoji that Teradata do not support into a space while loading using TDCH Fast load itself or i have to do TDCH -->Latin Table -->.Unicode table with conversion of replacement character to space
If teradata cannot handle certain characters then we should be able to someway eliminate it. We are loosing records and ultimately end user will be hesitent to choose teradata. Any suggestion from anyone for alternate solution to get atleast valid characters into table?
Thanks for your support.
Yes we reached GSC. Was trying to find answer interim :-)
They confirmed this as a drawback and they said everything you said here and unfortunately this is a bad news and we are upset as this was not known to us.
Strange is that TDCH fast load rejects the data while the same data if we use Fastload Utility is getting into the database with modified hex value for problamatic characters which are 4-byte in lenght. If Fastload utility can do then TDCH fastload plugin should also must have.
On a different note, can you help how to insert default timestamp value in a timestampe(0) data type field using fastload script. I tried to google but dint find any help. I saw one post where it say to use :DEFAULT in create table but i dont have requirement to create table everytime when fastloading.
i have a column called TDCH_DTTM TIMESTAMP (0) for audit purpose. How do i load this now? help me with Define & insert value part please.
According to your Select example and hex values 0xF09F9191, you have successfully loaded Emoji characters in UTF8 into Teradata Latin columns via ASCII session. Now, you can convert them to Unicode using pass-thorugh UDF, as your example below.
sel "schema".pt_utf8to16(Greeting,_UNICODE'003F'XC) from "schema".Unicode_greeting_mload ;
Because you are running TD 15.x, the UTF8 session character set only supports (3) bytes in UTF8. Instead of UTF8 session, you will need to switch to UTF16, so that the client can get the right UTF16 values on your client. Then, it is a matter of display Emoji characters on your client. With SQLA, you need to set the font "Segoe UI Symbol" on your Answer set pane. Highlight the Answer set pane, right click and go to Set Font. Then, you can display Emoiji characters on your SQL screen as shown below.
1 Unicode 5.0 💻
2 Unicode 5.0 👑