I know in Teradata 13.x, the following syntax is provided to implement compression for Unicode:
CREATE TABLE Customer (
Customer_Address CHAR(200) CHARACTER SET UNICODE
COMPRESS USING TransUnicodeToUTF8
DECOMPRESS USING TransUTF8ToUnicode);
But I read somewhere that in 14.0, Teradata stores Unicode in UTF8, so this compression/decompression should not be required? I am creating brand new tables in 14.0.
Appreciate your response.
AFAIK, TRANSUNICODETOUTF8 is a TD 13.10 enhancement. These functions are also present in TD 14.0 and are used to compress and decompress Unicode. I have found the reference below on how Unicode is stored within TD, and this documentation is for TD 14.0.
Can you please share your source of information that TD 14.0 stores unicode as UTF8?
Only version 14 can store UTF8 on disk
This is obviously wrong.
But the first two sentences are correct :-)
I was going through some material and I came to know that TRANSUNICODETOUTF8 can only be used to compress UNICODE columns which contain ASCII LATIN 7 Bit data. So I guess if TD 14 is storing Unicode in UTF8 then it will require the data in ASCII LATIN, else it will store it as UTF16.
TransUnicodeToUTF8 works for any UTF16 character, but if there's a lot of Latin chars it simply compresses better:
Most of the Latin chars are stored in one byte in UTF8 while some of the more exotic chars might need more than 2 bytes.
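The size difference is easy to see outside the database. A small Python sketch (illustrative only, not Teradata code) comparing the encoded byte lengths of Latin vs. more exotic text:

```python
# Compare storage sizes of UTF-16 vs UTF-8 for different kinds of text.
# TD UNICODE columns store 2 bytes per (BMP) character, like UTF-16;
# UTF-8 uses 1 byte for ASCII but 3+ bytes for e.g. CJK characters.
latin = "Customer Address 123"   # plain 7-bit Latin text
exotic = "漢字テスト"             # CJK characters

for label, s in [("latin", latin), ("exotic", exotic)]:
    utf16 = len(s.encode("utf-16-le"))  # 2 bytes per char here
    utf8 = len(s.encode("utf-8"))       # 1 byte ASCII, 3 bytes CJK
    print(label, "utf16:", utf16, "utf8:", utf8)
```

For the Latin string, UTF-8 halves the storage (20 vs. 40 bytes); for the CJK string it is actually larger (15 vs. 10 bytes), which is why the compression pays off mainly on mostly-Latin data.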
In the context of table joins, I feel we need to be careful that both joining columns are of the same character set, else there will be performance degradation. I have heard of quite a number of cases.
This only relates to LATIN vs. UNICODE; of course they hash differently and thus you can't get PI-to-PI joins. But algorithmic compression doesn't change the charset, only the storage (btw, you can't compress a PI column).
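Why the same logical value hashes differently: the row hash is computed over the stored bytes, and LATIN and UNICODE store the same string as different byte sequences. A sketch of the principle (md5 stands in here; Teradata's actual row-hash algorithm is different, this is just an illustration):

```python
import hashlib

# Same logical value, two different stored byte representations:
value = "SMITH"
latin_bytes = value.encode("latin-1")      # LATIN: 1 byte per char
unicode_bytes = value.encode("utf-16-le")  # UNICODE: 2 bytes per char

# Hashing the stored bytes gives different results, so the rows
# would land on different AMPs -> no direct PI-to-PI join.
h_latin = hashlib.md5(latin_bytes).hexdigest()
h_unicode = hashlib.md5(unicode_bytes).hexdigest()
print(h_latin == h_unicode)  # False
```

So even for values both charsets can represent, joining a LATIN PI to a UNICODE PI forces a redistribution or conversion step.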
Joining on columns with different character sets is a sign of bad database design :-)
What I could understand by reading some manuals is that the space requirement for Unicode is double that of Latin. For joins, why do we say they hash differently? Because the values in both will be different (Latin might not be able to store some special characters whereas Unicode can). Could you give an example here?
Can we not use MVC on Unicode columns? Can we use Unicode columns in a WHERE condition, and does that perform well? Are there any other issues/considerations while using Unicode columns that we should be aware of?
Unfortunately, there are not many details about these in the manuals; could you direct me to one if you have it?