Does one need Unicode compression in Teradata 14.0

Enthusiast

I know that in Teradata 13.x the following syntax is provided to implement compression for Unicode:

CREATE TABLE Customer
(Customer_Account_Number INTEGER,
 Customer_Name VARCHAR(50),
 Customer_Address CHAR(200) CHARACTER SET UNICODE
   COMPRESS USING TransUnicodeToUTF8
   DECOMPRESS USING TransUTF8ToUnicode);

But I read somewhere that in 14.0 Teradata stores Unicode as UTF8, so this compression/decompression should not be required? I am creating brand new tables in 14.0.

Appreciate your response.

10 REPLIES
Enthusiast

Re: Does one need Unicode compression in Teradata 14.0

Hi,

AFAIK, TRANSUNICODETOUTF8 is a TD 13.10 enhancement. These functions are also present in TD 14.0 and are used to compress and decompress Unicode. I found the reference below on how Unicode is stored within TD; this documentation is for TD 14.0.

http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1143_111A/ch05...

Can you please share your source of information that TD 14.0 stores unicode as UTF8?

Khurram
Enthusiast

Re: Does one need Unicode compression in Teradata 14.0

http://goldenorbit.wordpress.com/2013/03/09/latin-utf8-and-utf16-with-teradata/

  • Unicode strings are stored as UTF16 on disk anyway. Yes, space is wasted; that’s why there is an algorithm compression function to just compress UTF16 to UTF8 in version 13. Only version 14 can store UTF8 on disk.
Senior Apprentice

Re: Does one need Unicode compression in Teradata 14.0

"Only version 14 can store UTF8 on disk"

This is obviously wrong.

But the first two sentences are correct :-)

Enthusiast

Re: Does one need Unicode compression in Teradata 14.0

Hi,

I was going through some material and learned that TRANSUNICODETOUTF8 can only be used to compress UNICODE columns that contain 7-bit ASCII (Latin) data. So I guess that if TD 14 stores Unicode as UTF8, it will require the data to be ASCII Latin; otherwise it will store it as UTF16.

Khurram
Senior Apprentice

Re: Does one need Unicode compression in Teradata 14.0

Hi Khurram,

TransUnicodeToUTF8 works for any UTF16 character, but if there are a lot of Latin chars it simply compresses better:

Most of the Latin chars are stored in one byte in UTF8 while some of the more exotic chars might need more than 2 bytes.
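
To make the byte counts concrete, here is a rough sketch in Python (not Teradata code) that compares the size of a few strings in UTF16 and UTF8. The helper name stored_sizes and the sample strings are made up for illustration, and the little-endian byte order is an assumption; only the byte counts matter here.

```python
def stored_sizes(text):
    """Return (utf16_bytes, utf8_bytes) for a string."""
    utf16 = len(text.encode("utf-16-le"))  # no BOM; 2 bytes per BMP character
    utf8 = len(text.encode("utf-8"))       # 1 to 4 bytes per character
    return utf16, utf8


# Hypothetical sample values: plain ASCII, Latin with an umlaut, and CJK.
samples = ["12 Main Street", "M\u00fcller", "\u6771\u4eac\u90fd"]

for s in samples:
    utf16, utf8 = stored_sizes(s)
    print(f"{s!r}: UTF-16 = {utf16} bytes, UTF-8 = {utf8} bytes")
```

The ASCII address shrinks to half its size, the name with an umlaut shrinks a little less, and the CJK string actually grows from 6 to 9 bytes, so how much the compression helps depends on the data in the column.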

Enthusiast

Re: Does one need Unicode compression in Teradata 14.0

In the context of table joins, I feel that we need to be careful that both joining fields use the same character set, otherwise performance will degrade. I have heard of quite a number of such cases.

Raja

Senior Apprentice

Re: Does one need Unicode compression in Teradata 14.0

Hi Raja,

This only relates to LATIN vs. UNICODE: they hash differently, so you can't get PI-to-PI joins. But algorithmic compression doesn't change the character set, only the storage (btw, you can't compress a PI column).

Joining on columns with different character sets is a sign of bad database design :-)
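
As an analogy only, the Python sketch below shows why the same value can hash differently under two character sets: the stored bytes differ, so any hash computed over those bytes differs too. It uses MD5 as a stand-in and is not Teradata's row hash algorithm; the helper byte_hash, the sample value, and the little-endian UTF16 encoding are assumptions made for illustration.

```python
import hashlib


def byte_hash(raw):
    """Hash raw bytes with MD5 (a stand-in, not Teradata's row hash)."""
    return hashlib.md5(raw).hexdigest()


name = "Smith"  # hypothetical join value

latin_bytes = name.encode("latin-1")      # 1 byte per character
unicode_bytes = name.encode("utf-16-le")  # 2 bytes per character

print(len(latin_bytes), byte_hash(latin_bytes))
print(len(unicode_bytes), byte_hash(unicode_bytes))
print("same hash:", byte_hash(latin_bytes) == byte_hash(unicode_bytes))
```

The text is identical, but the two byte sequences are not, so the two hash values do not match. In this picture a join between such columns needs one side to be converted first.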

Enthusiast

Re: Does one need Unicode compression in Teradata 14.0

Hi Diether,

What I could understand from reading some manuals is that the space requirement for Unicode is double that of Latin. For joins, why do we say that they hash differently? Is it because the values in the two will be different (Latin might not be able to store some special characters, whereas Unicode can)? Could you give an example here?

Can we not use MVC on Unicode columns? Can we use Unicode columns in a WHERE condition, and does that perform well? Are there any other issues or considerations to keep in mind when using Unicode columns?

Unfortunately, there are not many details about these in the manuals. Could you direct me to one if you have it?