Extended ASCII characters in LATIN character column

To my knowledge, Teradata's LATIN server character set supports ASCII and Western European characters. The Teradata manual says LATIN and ASCII are identical on all code points except the 0x80-0xFF range, where Teradata LATIN defines additional Western European letters. Code points outside the seven-bit ASCII range can result in data that may not behave as intended.

I am running into a situation where values in a LATIN character column contain the hexadecimal codes 0x81, 0x91, and 0x94. These codes showed up when I applied the CHAR2HEXINT() function to the column.
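
To see which characters fall outside the seven-bit ASCII range, the CHAR2HEXINT() output can be scanned for bytes above 0x7F. A minimal Python sketch, assuming the column is LATIN so that each character appears as exactly two hex digits; the function name `non_ascii_bytes` and the sample value are hypothetical.

```python
from typing import List


def non_ascii_bytes(hex_string: str) -> List[int]:
    """Return the byte values above 0x7F in a CHAR2HEXINT-style hex dump.

    Assumption: LATIN column, so two hex digits per character.
    bytes.fromhex raises ValueError on an odd number of digits.
    """
    raw = bytes.fromhex(hex_string)
    return [b for b in raw if b > 0x7F]


if __name__ == "__main__":
    sample = "41819194"  # hypothetical value: 'A' followed by 0x81, 0x91, 0x94
    print([hex(b) for b in non_ascii_bytes(sample)])  # ['0x81', '0x91', '0x94']
```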

According to the Teradata manual (ASCII-to-LATIN mappings), 0x81 is DIAERESIS, 0x91 is <control> PRIVATE USE ONE, and 0x94 is <control> CANCEL CHARACTER. I am not sure what these characters represent.
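
One way to make sense of these codes is to compare two common single-byte interpretations of the same byte. In Unicode, U+0091 and U+0094 are the C1 controls PRIVATE USE ONE and CANCEL CHARACTER, which matches the names in the manual, while Windows-1252 maps 0x91 and 0x94 to curly quotation marks and leaves 0x81 unassigned. The sketch uses Python's `latin-1` and `cp1252` codecs as stand-ins; it is an assumption, not something the Teradata documentation confirms, that either one matches how the stored bytes were produced. `interpret` is a hypothetical helper.

```python
from typing import Optional, Tuple


def interpret(byte_value: int) -> Tuple[str, Optional[str]]:
    """Return (ISO-8859-1 code point, Windows-1252 code point or None)."""
    raw = bytes([byte_value])
    # ISO-8859-1 maps every byte N to code point U+00NN.
    latin1 = raw.decode("latin-1")
    try:
        cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined.
        cp1252_cp = None
    else:
        cp1252_cp = f"U+{ord(cp1252):04X}"
    return f"U+{ord(latin1):04X}", cp1252_cp


if __name__ == "__main__":
    for b in (0x81, 0x91, 0x94):
        print(hex(b), interpret(b))
```

If the data was originally typed on Windows clients, the curly-quote reading of 0x91 and 0x94 is plausible, but that is a guess about the source system rather than something the bytes prove.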

Is it good practice to store these extended ASCII values in a LATIN character column? Will LATIN cause character translation issues, especially when the data is exported from the database to a file? Or should such values be stored in a UNICODE character column to get accurate results?

I am assuming the data should be exported using the UTF8 or UTF16 character set in this case to get accurate results. Are there any downsides? Any expert opinions?
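
One downside worth checking before switching the export to UTF8 or UTF16 is that the byte length of each value changes. A rough Python sketch, assuming each stored LATIN byte is exported as the ISO-8859-1 code point with the same value (a simplifying assumption); `export_lengths` is a hypothetical helper.

```python
from typing import Dict


def export_lengths(latin_bytes: bytes) -> Dict[str, int]:
    """Byte counts of one stored value under different export encodings."""
    # Assumption: byte N in the LATIN column maps to code point U+00NN.
    text = latin_bytes.decode("latin-1")
    return {
        "latin": len(latin_bytes),
        "utf8": len(text.encode("utf-8")),       # 0x80-0xFF become 2 bytes each
        "utf16": len(text.encode("utf-16-le")),  # every character here is 2 bytes
    }


if __name__ == "__main__":
    print(export_lengths(b"A\x81\x91\x94"))  # {'latin': 4, 'utf8': 7, 'utf16': 8}
```

Fixed-width file layouts, or downstream readers that assume one byte per character, would need to account for this. Under the same assumption the export still contains the control characters U+0081, U+0091 and U+0094: a Unicode export preserves them faithfully but does not by itself recover what the original client meant.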

Teradata Employee

Re: Extended ASCII characters in LATIN character column

Using ASCII as the session character set tells the database that the input is either 7-bit ASCII (where the characters match) or that the client is responsible for the translation to Teradata LATIN. If you have extended characters, the preferred practice is to set the session character set to Unicode (UTF8 or UTF16) so that the external characters are translated to the proper internal code points (which might then require storage as UNICODE). The second choice, if your client application can't support Unicode, is to use a special or custom session character set translation such as LATIN1252_0A (Windows).
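
The difference between the two paths can be modelled in a few lines of Python. This is a simulation, not Teradata driver code: it assumes a Windows client whose native encoding is Windows-1252, and it treats ISO-8859-1 as a stand-in for Teradata LATIN when checking whether a character fits. The helper names `passthrough_bytes` and `needs_unicode_column` are hypothetical.

```python
def passthrough_bytes(text: str) -> bytes:
    """ASCII session: the client ships its Windows-1252 bytes untranslated."""
    return text.encode("cp1252")


def needs_unicode_column(text: str) -> bool:
    """Unicode session: True if some character has no ISO-8859-1 code point.

    ISO-8859-1 is only a stand-in for Teradata LATIN here (assumption).
    """
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    sample = "\u2018quoted\u201d"  # hypothetical value with curly quotes
    print(passthrough_bytes(sample).hex())  # 9171756f74656494
    print(needs_unicode_column(sample))     # True
```

Under these assumptions the pass-through path produces exactly the kind of 0x91/0x94 bytes seen in the question, while the Unicode path keeps the intended characters but needs a UNICODE column to store them.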

Or you can choose to mislead the driver into passing single-byte characters in the x'80'-x'FF' range through unchanged, and (particularly if all your clients use the same single-byte client character set) it will appear to mostly work. The collation sequence (sort order) might not be quite what you expect for the extended characters, but if you don't care how accented characters sort, it may be acceptable.
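
To illustrate why the sort order can surprise, the sketch below compares plain code-point order with a simple accent-insensitive order. Neither is Teradata's actual collation; it only shows that if extended characters are compared by raw value, 'é' (0xE9) sorts after 'z' (0x7A). The helper names are hypothetical.

```python
import unicodedata
from typing import List


def code_point_order(words: List[str]) -> List[str]:
    """Sort by raw code point, roughly what a binary comparison does."""
    return sorted(words)


def accent_folded_order(words: List[str]) -> List[str]:
    """Sort ignoring accents and case, falling back to the original string."""
    def key(word: str):
        decomposed = unicodedata.normalize("NFD", word)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        return (base.casefold(), word)
    return sorted(words, key=key)


if __name__ == "__main__":
    words = ["zebra", "\u00e9clair", "apple", "eagle"]
    print(code_point_order(words))     # ['apple', 'eagle', 'zebra', 'éclair']
    print(accent_folded_order(words))  # ['apple', 'eagle', 'éclair', 'zebra']
```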