Multibyte character loading issue with TDCH

Enthusiast

Issue:

Unable to load the UTF8-encoded Unicode character 'CROWN' (U+1F451), hex value 0xF0 0x9F 0x91 0x91 (f09f9191).

For details on the character, see: http://www.fileformat.info/info/unicode/char/1f451/index.htm
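
As a quick sanity check on the source side, the hex value can be decoded outside of Hive. The snippet below is only a sketch in Python (not part of TDCH or Teradata). It confirms that f09f9191 is well-formed UTF-8 for U+1F451 and that the code point lies above U+FFFF, i.e. outside the Basic Multilingual Plane.

```python
# Decode the hex bytes seen in the Hive source and inspect the resulting code point.
raw = bytes.fromhex("f09f9191")
text = raw.decode("utf-8")      # raises UnicodeDecodeError if the bytes are malformed
code_point = ord(text)          # valid because the bytes decode to exactly one character

print(f"code point: U+{code_point:04X}")
print(f"UTF-8 length in bytes: {len(raw)}")
print(f"outside the BMP: {code_point > 0xFFFF}")
```

If the decode step raises an error for a given row, the problem is in the source data rather than in the load.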

We are using TDCH (Teradata Connector for Hadoop) with the internal.fastload plugin and UTF8 as the setting.

 

The error table is loaded with error 6705 (An illegally formed character string was encountered during translation).

 

I also tried the batch.insert target plugin for TDCH and got the same 6705 error.

I confirm that the source (Hive database on Hadoop) is UTF8, and I see the proper hex value in the source: f09f9191

 

Please let me know what can be done to get this character into Teradata.

 

Thank You

 

Accepted Solutions
Enthusiast

Re: Multibyte character loading issue with TDCH

I understand. Unfortunately, TDCH only uses FastLoad. Anyway, we are good.


tt122509 wrote:

Fastload itself does not support complex insert statements such as calling UDFs. If TDCH can create a script for mload/TPT, then it works. 



Thank You :-)

Teradata Employee

Re: Multibyte character loading issue with TDCH

If you are using TD16.00, the Unicode Pass Through feature will allow this character.

 

-Dave 

Teradata Employee

Re: Multibyte character loading issue with TDCH

TD16.0 / TTU16.0 support the "Unicode pass-through" feature, which allows storage of Unicode characters that are not included in the Teradata Unicode server character set repertoire.

 

Earlier versions require you to treat such data as a string of bytes and use special processing (e.g. UDFs / UDTs, client access modules) to work with it.

Enthusiast

Re: Multibyte character loading issue with TDCH

How do I enable that feature? Is it something a DBA does, meaning I have to ask my DBA to check for this feature and install it if it is not available? Please specify details. Thank you for the support.

Enthusiast

Re: Multibyte character loading issue with TDCH

Is it a Unicode tool kit that has to be in place on the server where TD is installed? We have set the Unicode option for columns in TD; does that mean we already have the Unicode tool kit, and do we probably need a version upgrade? I see the entry below from Teradata, which suggests we may need this version:

Date: 2015-7-9
Version 1.5.3.2
Add udf_find16() for pass-through Unicode characters
Add more examples for pass-through UDFs including fexp and tpt (update)
 
Please let me know if my understanding is right, or correct me. Thank you.
Enthusiast

Re: Multibyte character loading issue with TDCH

We are using TD15. That said, should we install the latest version of the tool kit from here: https://downloads.teradata.com/download/tools/unicode-tool-kit

Is it open source and not part of the Teradata certified tool kit?

Also, we couldn't find exactly which UDF to install, or clear instructions to follow after downloading the latest version from the above path. Can you tell us if there are any instructions?

 

Thank You

Teradata Employee

Re: Multibyte character loading issue with TDCH

For TD15, you will need to import the UTF-8 character data into Latin columns in a staging table in the DBS using the ASCII character set. The UTK has a pass-through UTF8->UTF16 UDF, which can be used to convert the emoji without error and store them in the final table, which has Unicode columns. Note that the UTK is supported by the Global Support Center only.
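
To illustrate the idea behind this two-step approach, here is a rough Python model of it. This is a sketch only: the function names are made up for illustration, Python's latin-1 codec stands in for "one character per byte" staging, and it is not how the UTK UDF is actually implemented.

```python
def stage_as_latin(utf8_bytes: bytes) -> str:
    """Model the staging step: every UTF-8 byte is kept as one single-byte character."""
    return utf8_bytes.decode("latin-1")


def passthrough_utf8_to_utf16(staged: str) -> bytes:
    """Model the conversion step: recover the raw bytes, read them as UTF-8, emit UTF-16."""
    original_bytes = staged.encode("latin-1")
    return original_bytes.decode("utf-8").encode("utf-16-be")


staged = stage_as_latin(bytes.fromhex("f09f9191"))
utf16 = passthrough_utf8_to_utf16(staged)

print(f"staged length in characters: {len(staged)}")
print(f"UTF-16 length in bytes: {len(utf16)}")
```

The point of the model is that no translation of the emoji happens during the load itself; the bytes are carried through unchanged and are interpreted as UTF-8 only in the second step.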

 

Although Unicode Pass Through is the best practice going forward, TD16.00 is not yet GCA.

 

-David Craig

Enthusiast

Re: Multibyte character loading issue with TDCH

Thanks David.

Installed "udf_utf8to16" on TD 15.00.04.04

Successfully ran a test case:

select cast(char2hexint("schema".pt_utf8to16(_LATIN'414200E38182414243'XC,_UNICODE'003F'XC)) AS VARCHAR(100));

Ran the mload script from the pass-through UDF document (page 4) exactly as given.

Loaded a record containing the CROWN emoji (UTF8 hex f09f9191) successfully, but in the end the character is stored as an UNTRANSLATABLE character :-(

Please suggest what to do to get this character stored exactly as CROWN.

 

Thank You

Teradata Employee

Re: Multibyte character loading issue with TDCH

The UDF input should be the UTF8 hex (f09f9191) for the Unicode character 'CROWN' (U+1F451):

 

select char2hexint("schema".pt_utf8to16(_LATIN'f09f9191'XC,_UNICODE'003F'XC));

 

What is the result?
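
For comparison, the standard UTF-16 encoding of U+1F451 can be worked out independently of the database. The Python sketch below applies the usual surrogate-pair arithmetic; `utf16_surrogates` is a hypothetical helper name, and this is not the UDF's code, just one reference value to hold the query output against.

```python
from typing import Tuple


def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the 20-bit offset
    return high, low


high, low = utf16_surrogates(0x1F451)
print(f"{high:04X}{low:04X}")  # prints D83DDC51
```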