I have an issue: an existing Teradata table is defined with its columns as LATIN. The table has been in the data warehouse for the last 3 years. Now new data containing Chinese characters is flowing in. I know that defining the columns as UNICODE will let the Chinese characters load. Is there any different approach?
Not really, unless you want to design, create, and manage your own character-set translation tables - really not worth it.
Also remember that not only do your table columns have to change to be UNICODE, but:
- any processes that load or retrieve data will also need to use UTF8 or UTF16 as their session character set,
- (because of the above) you might find exported files containing these columns have longer records,
- any columns in other tables that these UNICODE columns are joined to (or otherwise used with - e.g. COALESCE) should also be converted.
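To make the first point concrete, here is a minimal sketch assuming BTEQ as the load tool; the database, table, and column names are hypothetical. The session character set belongs to the client session, while CHARACTER SET UNICODE belongs to the column definition - you need both.

```sql
-- In BTEQ, set the session character set before logging on:
.SET SESSION CHARSET 'UTF8';

-- The column itself must also be defined as UNICODE (hypothetical table):
CREATE TABLE mydb.customer (
    cust_id   INTEGER,
    cust_name VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC
) PRIMARY INDEX (cust_id);
```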
Thank you Dave for the quick response. I don't have much technical knowledge about Teradata as I am more into ETL. I have a couple of questions.
1) Is there any way the Teradata DBAs can do something on the Teradata end to support these characters? In other words, without touching the existing tables?
2) If we need to change the existing tables, do we need to drop and create new tables with all VARCHAR columns as UNICODE?
3) Also, is there any disadvantage to creating tables with all VARCHAR columns as UNICODE, since we are not sure what the data will look like in the future? Or is there any way to accept foreign characters without setting UNICODE on individual columns?
Appreciate your help on this.
Answers to your questions:
1) Is there any way the Teradata DBAs can do something on the Teradata end to support these characters? DW: Yes, define the columns as UNICODE (sorry, couldn't resist that).
2) If we need to change the existing tables, do we need to drop and create new tables with all VARCHAR columns as UNICODE? DW: Basically yes; you cannot use ALTER TABLE to change a column from LATIN to UNICODE. In practice you'd create a new table, INSERT/SELECT all data from 'old' to 'new', drop 'old' and rename 'new' to 'old'.
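The create/copy/swap approach described above can be sketched as follows; the table and column names are hypothetical, and you would repeat the existing primary index and any other table attributes in the new DDL:

```sql
-- 1. Create the new table with the VARCHAR column redefined as UNICODE
CREATE TABLE mydb.sales_new (
    order_id  INTEGER,
    cust_name VARCHAR(100) CHARACTER SET UNICODE
) PRIMARY INDEX (order_id);

-- 2. Copy the data; LATIN values translate implicitly to UNICODE on insert
INSERT INTO mydb.sales_new
SELECT order_id, cust_name
FROM   mydb.sales;

-- 3. Swap the tables
DROP TABLE mydb.sales;
RENAME TABLE mydb.sales_new TO mydb.sales;
```

Note the DROP/RENAME step leaves a brief window with no table of the old name, so schedule it outside load and query windows.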
3) Also, is there any disadvantage to creating tables with all VARCHAR columns as UNICODE, since we are not sure what the data will look like in the future? DW: disk space - UNICODE uses twice as many bytes on disk as LATIN (ignoring compression).
IMHO - if a new business requirement is to support Chinese characters then part of meeting this new requirement is to make these changes.
I created a dummy table with the column as UNICODE but am unable to insert the record shown below.
This data shows up in Oracle as a box, but when loading it to Teradata I get the error "The string contains an untranslatable character". Is there a way to fix it?
In the tool that you are using for this test, what is your character set?
If using TD Studio then check the connection properties.
In SQLA, look in the 'Advanced' tab for .NET connections, or in the ODBC DSN.
For CLI tools what have you specified?
We are using the Informatica ETL tool to load data. We have set UTF-8 on the ODBC connection. Below is the ODBC entry
Now all Chinese characters are loaded correctly, but the one I mentioned previously is getting rejected.
To be honest I'm not sure. If other characters are loading OK then it looks like your setup is correct.
I did a quick Google search for 'informatica utf8 data not loading' and came up with many hits. You might try:
This emoji is a Unicode "supplemental plane" character, U+1F383. You can load it in TD16 with the Unicode pass-through feature. For earlier releases, Teradata does not support characters that require more than 2 bytes; see the "Unicode toolkit" available from http://downloads.teradata.com for some ideas regarding workarounds.
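For reference, on TD 16.0 and later the pass-through feature is enabled per session; this sketch assumes the session is already connected with a UTF8 or UTF16 character set:

```sql
-- Allow supplemental-plane characters such as U+1F383 to pass through
SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
```

With pass-through off (the default), any character outside Teradata's translatable repertoire is rejected with the "untranslatable character" error seen above.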