How to handle emoji characters in Teradata? We are extracting data using Java JDBC in UTF-8 mode. In the file, the data was written as "King▒<9F><99><87> Pearl▒<9F><92><8B>", while the actual data was "King🙇 Pearl💋".
When loading this content into Teradata using TPT Load with the UTF-8 charset, we get the following error: "TPT19003 Delimited Data Parsing error: Invalid multi-byte character".
Please suggest a solution.
Please start by reading the documentation on the Teradata Database 16.00 Unicode Pass Through feature in chapter 9 of the International Character Set Support reference: http://www.info.teradata.com/download.cfm?ItemID=1007186.
Also, the 3-byte UTF-8 code sequences are invalid. The first byte is probably missing. Are you referring to the following Unicode characters?
Please supply the release versions for the client and database software.
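To check the "first byte is missing" point, here is a minimal Java sketch that prints the UTF-8 bytes of the two emoji from the question. The class and method names are only illustrative, and the emoji are written as \u escapes so the result does not depend on the source file encoding. Each emoji encodes to four bytes starting with F0, and the remaining three bytes match the <9F><99><87> and <9F><92><8B> sequences seen in the file.

```java
import java.nio.charset.StandardCharsets;

public class EmojiUtf8Bytes {
    /** Returns the UTF-8 encoding of s as space-separated uppercase hex bytes. */
    public static String utf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F647, the first emoji in the question
        System.out.println(utf8Hex("\uD83D\uDE47")); // F0 9F 99 87
        // U+1F48B, the second emoji in the question
        System.out.println(utf8Hex("\uD83D\uDC8B")); // F0 9F 92 8B
    }
}
```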
The emoji characters in your example are supplementary characters (outside the Basic Multilingual Plane), which UTF-16 encodes as surrogate pairs occupying 4 bytes (e.g. 🙇 = U+D83D, U+DE47). Those characters are not supported by Teradata 15.x. With Teradata 16.0 Unicode Pass Through (UPT), you can store and retrieve any Unicode characters, including those emoji characters. Please refer to the new version of the orange book, which discusses UPT in Section 6.15.
Getting Started: International Character Sets and the Teradata Database (version G01, 6/5/2017).
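If it helps to find the affected values on the Java side before loading into a 15.x system, a pre-check along these lines could flag strings that contain supplementary characters. This is only a sketch: the class and method names are hypothetical and not part of any Teradata tooling, and it relies only on standard JDK calls (`String.codePoints` and `Character.isSupplementaryCodePoint`, Java 8 or later).

```java
public class SupplementaryCheck {
    /** True if s contains any code point above U+FFFF (i.e. needs a surrogate pair in UTF-16). */
    public static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /** Lists the UTF-16 code units of s, e.g. "U+D83D U+DE47". */
    public static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "King\uD83D\uDE47"; // "King" followed by U+1F647
        System.out.println(hasSupplementary(value));       // true
        System.out.println(hasSupplementary("King"));      // false
        System.out.println(utf16Units("\uD83D\uDE47"));    // U+D83D U+DE47
    }
}
```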
With the Unicode Tool Kit (UTK), you can also insert and retrieve any Unicode characters, even on Teradata Database 15.x/14.x. UTK can be downloaded from the developer exchange.
After download, go to:
..\utk_release126.96.36.199\04 TranslationUDFs\01 Teradata UDFs\suselinux-x8664\udf_installation\pass-through UDFs
In the latest version of UTK, 1.6.0.x, you can also find the Internationalization orange book.
Thank You TT ...
So can't we handle these emoji characters in FastLoad on 15.x? Is there any way we can load these values through TPT using Load?
Q: So can't we handle these emoji characters in FastLoad on 15.x?
A: Not without staging tables. With a two-step loading process, yes. First, load everything into Latin staging tables using an ASCII session. Then use insert-select with UDF calls to convert the data and move it into Unicode target tables.
Q: Is there any way we can load these values through TPT using Load?
A: No - with DBS 15.x and TTU 15.x. Yes - with DBS 16.0 and TTU 16.0.