Hi All,
We are getting some Chinese characters in our source files, which Teradata is displaying in a different format. We have declared the particular column that contains these Chinese characters with character set UNICODE. As a result, Teradata is able to store the records, but it displays them in a different format. Based on what we learned from different posts, we also tried declaring the character set as KANJI, but in that case all the records containing Chinese characters are rejected.
From some posts we got to know that we have to set the installation flag in the character translation table of DBC to Y for some specific session character sets for Chinese (Traditional or Simplified).
Please give your inputs.
To be honest, I don't know Chinese, but just to pitch in: I heard that there are Chinese Simplified and Chinese Traditional, so those different geometrical boxes may be different.
Thanks Raja for the response. Yes, you are correct. As I posted earlier, SCHGB2312_1T0 and SCHINESE9360_6R0 are for Simplified Chinese, and the other two, TCHINESE9360_6R0 and TCHBIG5_1R0, are for Traditional Chinese. I have tried using all of these as the session character set while loading, and I have kept the server character set as UNICODE. But the output is boxes instead of the characters. What I suspect is that Teradata is able to read the Chinese characters but is converting and displaying them in a different format.
Please suggest if anyone has faced a similar problem.
Note that both the load and the query sessions must specify an appropriate client character set. Often it is best to use UTF16 (or UTF8 if data is mostly LATIN numbers and letters) for queries.
The character sets noted above are intended primarily for loading / exporting data encoded in a specific way.
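To illustrate why the client character set matters, here is a small Python sketch (not Teradata-specific, just an assumption-free encoding demo) showing that the same UNICODE code point travels as different byte sequences under UTF8 and UTF16. If the client and session character sets disagree, those bytes get misinterpreted:

```python
# U+4F8B, the first character of the sample string discussed in this thread.
ch = "\u4f8b"

# UTF-8 encodes this code point as three bytes...
utf8_bytes = ch.encode("utf-8")
# ...while UTF-16 (big-endian, no BOM) encodes it as two bytes.
utf16_bytes = ch.encode("utf-16-be")

print(utf8_bytes.hex().upper())   # E4BE8B
print(utf16_bytes.hex().upper())  # 4F8B
```

So a session declared as UTF8 must actually receive UTF-8 bytes; feeding it the two-byte form (or a legacy GB2312/Big5 encoding) is exactly the kind of mismatch that ends up displayed as boxes.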
Hi Fred ... Thanks for the reply ...
The requirement we have here is: Teradata has to store the Asian/Chinese characters exactly as they appear in the source file.
I followed various posts on this and, as required, I changed the installation flag to Y for the above-mentioned client character sets.
I even tried UTF8 as the character set in my BTEQ script, but Teradata is still storing and displaying the data in a different format (as boxes).
Could you please provide some more insights on this.
You can use the CHAR2HEXINT() function to determine what is actually being stored in the column. Verify that the expected Teradata UNICODE character values are present. If not, you will need to correct the load.
For example, the string of characters given above should be stored as the following six UNICODE values (spaces added for readability):
4F8B 5B50 6CE8 97F3 7B26 865F
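As a quick cross-check outside Teradata, a short Python sketch (a hypothetical helper, not CHAR2HEXINT itself) can compute the expected UNICODE code units for a string, which you can then compare against the CHAR2HEXINT output:

```python
def to_hex_units(s: str) -> str:
    """Return the UTF-16 code units of s as space-separated hex,
    analogous to what CHAR2HEXINT shows for a UNICODE column."""
    return " ".join(f"{ord(c):04X}" for c in s)

# The six characters whose expected values are listed above.
print(to_hex_units("\u4f8b\u5b50\u6ce8\u97f3\u7b26\u865f"))
# 4F8B 5B50 6CE8 97F3 7B26 865F
```

If CHAR2HEXINT shows anything other than these values, the bytes were already mangled at load time rather than at display time.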
First, change the character set of the current Teradata session where the query is fired:
.set session charset "utf8"
Then, using the TRANSLATE_CHK function, you can check what is actually stored in the table:
select top 10 column_name from table_name where translate_chk(column_name using Unicode_to_latin)>0;
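As a rough illustration of what that query reports, here is a Python sketch that flags strings containing characters outside a single-byte Latin repertoire, assuming Teradata's LATIN set is close to Latin-1 (a simplification, not Teradata's exact translation rules):

```python
def has_non_latin(s: str) -> bool:
    # Rough analogue of translate_chk(col using Unicode_to_latin) > 0:
    # True when at least one character cannot be represented in Latin-1.
    try:
        s.encode("latin-1")
        return False
    except UnicodeEncodeError:
        return True

print(has_non_latin("hello"))         # False: pure Latin data
print(has_non_latin("\u4f8b\u5b50"))  # True: Chinese characters present
```

Rows returned by the TRANSLATE_CHK query are the ones holding genuine non-Latin data, so they are the ones to inspect with CHAR2HEXINT.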