To my knowledge, Teradata's LATIN character set supports ASCII and Western European characters. The TD manual says LATIN and ASCII are identical at every code point except the 0x80-0xFF range, where Teradata LATIN defines additional Western European letters, and that code points outside the seven-bit ASCII range may result in data that does not behave as intended.
I am running into a situation where a LATIN character column contains hexadecimal values such as 0x81, 0x91, and 0x94; these values showed up when the CHAR2HEXINT() function was applied to the column.
The TD manual (ASCII -> LATIN mappings) says 0x81 is DIAERESIS, 0x91 is <control> PRIVATE USE ONE, and 0x94 is <control> CANCEL CHARACTER. I'm not sure what these characters represent.
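For what it's worth, those byte values look like Windows-1252 "smart" punctuation stored as raw bytes. A quick Python check outside the database (purely illustrative, not Teradata's own mapping) shows how the same bytes read under ISO-8859-1/Latin-1 versus Windows-1252:

```python
# How the observed bytes decode under two common single-byte encodings.
for b in (0x81, 0x91, 0x94):
    raw = bytes([b])
    # ISO-8859-1 maps 0x80-0x9F straight to the Unicode C1 control block,
    # which matches the "<control>" entries in the mapping chart.
    latin1 = raw.decode("latin-1")
    # Windows-1252 reuses most of that range for printable characters;
    # 0x81 is one of its few undefined slots.
    try:
        cp1252 = raw.decode("cp1252")
        print(f"0x{b:02X}: latin-1 U+{ord(latin1):04X}, cp1252 {cp1252!r}")
    except UnicodeDecodeError:
        print(f"0x{b:02X}: latin-1 U+{ord(latin1):04X}, cp1252 undefined")
```

Under Windows-1252, 0x91 and 0x94 are the left single and right double quotation marks, which is consistent with text that was pasted from a Windows application.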
Is it good practice to store the extended ASCII values mentioned above in a LATIN character column? Will LATIN cause character translation issues, especially when the data is exported from the database to a file? Or should such values be stored in a UNICODE character column to get accurate results?
I am assuming the data should be exported using a UTF8/UTF16 character set in this case to get accurate results. Any downsides? Any expert opinion?
Using ASCII as the session character set tells the database that the input is either 7-bit ASCII (where the characters match) or that the client is responsible for the translation to Teradata LATIN. If you have extended characters, the preferred practice is to set the session character set to Unicode (UTF8 or UTF16) so that the external characters are translated to the proper internal code points (which might then require storage as UNICODE). The second choice is to use a special or custom session character set translation such as LATIN1252_0A (Windows), if your client application can't support Unicode.
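To illustrate why the declared translation matters, here is a sketch outside Teradata (plain Python, hypothetical bytes): if UTF-8 input arrives under a session that treats it as single-byte data, each byte lands as a separate character:

```python
# Hypothetical mis-load: UTF-8 input treated as single-byte Latin data.
utf8_bytes = "é".encode("utf-8")          # two bytes: b'\xc3\xa9'
misloaded = utf8_bytes.decode("latin-1")  # stored as two Latin characters
print(misloaded)  # 'Ã©', the classic mojibake
```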
Or you can choose to mislead the driver into passing single-byte characters in the x'80'-x'FF' range unchanged, and (particularly if all your clients use the same single-byte client character set) it will appear to mostly work. The collation sequence (sort order) might not be quite what you expect for the extended characters; but if you don't care how accented characters sort, that may be acceptable.
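The collation caveat is easy to see with a byte-order sort in Python (a sketch of raw code-point ordering, not Teradata's actual collation rules):

```python
# Byte-value ordering vs. what a reader expects for accented letters.
words = ["Apfel", "Zebra", "Äpfel"]
# Sorting by raw Latin-1 byte values puts Ä (0xC4) after Z (0x5A):
print(sorted(words, key=lambda w: w.encode("latin-1")))
# ['Apfel', 'Zebra', 'Äpfel'] rather than ['Apfel', 'Äpfel', 'Zebra']
```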
I'm trying to solve an issue when loading data containing odd characters, presumably from text copied out of Windows ...
Is defining the character column as UNICODE the universal answer?
I wonder whether enabling a "LATIN 1252" character set for the database would still be of some help?
There's certainly not one universal answer.
Note that the choice of column type (LATIN or UNICODE) is separate from the session character set translation (e.g. UTF8 or LATIN1252_0A) used when loading. Also note that you will need to make an appropriate choice of translation when you query the data (and that choice may differ).
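As a rough analogy in Python (illustrative only, not Teradata itself): the Windows "smart quote" characters have Unicode code points that a strict Latin-1 target cannot represent, while UTF-8/UTF-16 and Windows-1252 each can, provided the load side and the query side agree on the same translation:

```python
s = "\u2018quoted\u201d"  # left single / right double quotation marks

# Unicode encodings carry these fine as multi-byte sequences:
print(s.encode("utf-8"))
print(s.encode("utf-16-le"))

# Strict Latin-1 (ISO-8859-1) has no code point for them:
try:
    s.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot represent U+%04X" % ord(s[exc.start]))

# Windows-1252 round-trips them as single bytes (0x91/0x94), but only
# if every client that loads or queries uses that same translation:
assert s.encode("cp1252").decode("cp1252") == s
```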
For other ideas, review the Unicode Tool Kit, and for TD16 consider the "Unicode pass-through" feature.
We still use TD 15.00.
What about setting DBS Control field 104, AcceptReplacementCharacters, to TRUE?
I couldn't find any comments on that point.