Inserting and Exporting Chinese Data from Teradata

Enthusiast


Hi All,

 

We need to store data containing Chinese characters in Teradata and export it. The data will be exported with an export utility such as FastExport or TPT. For testing purposes, I created the table in SQL Assistant as shown below:

 

CREATE SET TABLE charsettest ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
id INTEGER,
address VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL)
PRIMARY INDEX ( id );
INSERT INTO charsettest VALUES(1,'李白《静夜思》');

The data was inserted successfully. I then set the install flag to Y in DBC.charTranslationsV for the following client character sets:

  1. SCHGB2312_1T0
  2. SCHINESE9360_6R0
  3. TCHINESE9360_6R0
  4. TCHBIG5_1R0

Now coming to the problems I am facing currently:

  1. If I set the client character set for the session in BTEQ using .SET SESSION CHARSET with any of the Chinese character sets listed above and then select data from the table, I get the error below:
     *** CLI error: MTDP: EM_CHARNAME(227): invalid character set name specified.
     *** Return code from CLI is: 227
  2. If I keep the default client character set (ASCII/UTF8) and do not set anything, the data comes back as shown below:
     ??<>??<>
  3. When I export the data using FastExport and set the client character set with the -c option at script invocation, using the command fexp -c SCHINESE9360_6R0 <ScriptName, I get the error below for any character set other than UTF8:
     19:32:37 UTY1006 CLI error: 227, MTDP: EM_CHARNAME(227): invalid character set name specified.

Is there anything I am missing here? Also, is it possible to export data in a Chinese character set with FastExport or TPT?

 

Regards,

Indranil Roy

 

5 REPLIES
Teradata Employee

Re: Inserting and Exporting Chinese Data from Teradata

This CLI error, "*** CLI error: MTDP: EM_CHARNAME(227): invalid character set name specified. *** Return code from CLI is: 227", is typically returned when the character set is not installed properly. Did you perform a full restart of the Teradata Database using the tpareset utility? If so, check the system event log for any 2900 errors logged while the character sets were being installed during the restart.

 

Thanks,

 

-Dave

Enthusiast

Re: Inserting and Exporting Chinese Data from Teradata

The problem was resolved by setting the session character set of the ODBC driver to UTF8.

The data, including all the Chinese characters, is also exported properly with FastExport when the "UTF8" character set is passed with the -c option at invocation of the FastExport command.
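
As a quick sanity check on an export like this, the exported bytes can be decoded strictly as UTF-8 outside Teradata. The Python sketch below assumes a text-format export. The sample row, the `|` delimiter, and the helper names `decode_utf8_export` and `contains_cjk` are made up for illustration and are not part of FastExport.

```python
def decode_utf8_export(raw: bytes) -> str:
    """Decode exported bytes strictly as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8,
    for example because a different client character set was used.
    """
    return raw.decode("utf-8", errors="strict")


def contains_cjk(text: str) -> bool:
    """Return True if any character lies in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


# Hypothetical exported row: the id and address from the test table,
# joined with an assumed '|' delimiter.
sample_export = "1|李白《静夜思》\n".encode("utf-8")

decoded = decode_utf8_export(sample_export)
print(decoded.strip())
print("contains Chinese ideographs:", contains_cjk(decoded))
```

In practice the bytes would come from the exported file, for example by opening it in binary mode and passing the contents to `decode_utf8_export`.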

Teradata Employee

Re: Inserting and Exporting Chinese Data from Teradata

Yes, UTF8 is in wide use on the web. It is also a better choice because the TD16 Unicode Pass Through (UPT) feature will allow any Chinese character to be loaded as a pass-through, or supported, character. This is even true for the soon-to-be-released Unicode version 10.0. Note that the stand-alone FastExport utility does not currently support UPT; use TPT instead.

 

The following Chinese character sets support only the 2-byte ideographs from the Unicode 6.0 BMP (Basic Multilingual Plane):

  1. SCHGB2312_1T0
  2. SCHINESE9360_6R0
  3. TCHINESE9360_6R0
  4. TCHBIG5_1R0
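
To illustrate the BMP limitation, the Python sketch below uses Python's built-in `gbk` codec as a rough stand-in for a code page 936 style character set. This is an assumption made for illustration: it is not Teradata's own translation table. A BMP ideograph encodes in both `gbk` and UTF-8, while a supplementary-plane ideograph such as U+20000 encodes only in UTF-8.

```python
def in_bmp(ch: str) -> bool:
    """Return True if the character is in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def encodable(ch: str, codec: str) -> bool:
    """Return True if the character can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


bmp_char = "静"                  # U+9759, inside the BMP
supplementary_char = "\U00020000"  # U+20000, CJK Extension B, outside the BMP

for ch in (bmp_char, supplementary_char):
    print(
        f"U+{ord(ch):04X}",
        "BMP" if in_bmp(ch) else "supplementary",
        "gbk ok" if encodable(ch, "gbk") else "gbk fails",
        f"utf-8 bytes: {len(ch.encode('utf-8'))}",
    )
```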

 

-Dave

Enthusiast

Re: Inserting and Exporting Chinese Data from Teradata

Hi Dave,

I was wondering why we need a separate Chinese client character set (e.g. SCHINESE9360_6R0), such as those listed above, if we are able to support all Chinese characters by using "UTF8" itself?

Can you please mention a use case where we need this Chinese character set and UTF8 won't serve the purpose?

Regards,

Indranil Roy

Teradata Employee

Re: Inserting and Exporting Chinese Data from Teradata

It depends on the Windows code page in use. SCHINESE9360_6R0 is used for code page 936. If the Windows platform supports UTF8 (which all should today), then use it.
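
The difference between code page 936 and UTF8 shows up at the byte level. The Python sketch below encodes the test string from the original post with Python's `cp936` codec (used here only as an approximation of what a code page 936 client produces, not as Teradata's SCHINESE9360_6R0 translation) and with UTF-8. The same seven characters become different byte sequences, so the client and the session have to agree on one character set.

```python
text = "李白《静夜思》"  # the value inserted into charsettest in the original post

cp936_bytes = text.encode("cp936")  # 2 bytes per character for this string
utf8_bytes = text.encode("utf-8")   # 3 bytes per character for this string

print("characters:", len(text))
print("cp936 bytes:", len(cp936_bytes))
print("utf-8 bytes:", len(utf8_bytes))

# Decoding with the matching codec restores the text.
round_trip_ok = cp936_bytes.decode("cp936") == text and utf8_bytes.decode("utf-8") == text
print("round trip with matching codec:", round_trip_ok)

# Decoding UTF-8 bytes as cp936 does not restore the text.
mismatched = utf8_bytes.decode("cp936", errors="replace")
print("mismatched decode equals original:", mismatched == text)
```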