You can select up to 2048 columns with a combined size of a bit less than 64 KB.
This limit is based on each column's defined maximum size, not the actual data size.
If your session character set is UTF8 you need to multiply the defined size by 3; for UTF16, by 2.
As per your post, the row size is determined by the sum of the user-defined column sizes rather than by the actual data size (which can be much smaller than the defined size). Say two columns are defined as VARCHAR(10) and CHAR(20). Even if the data in those two columns is just "A" and "B", the row size will be calculated as 30 bytes (Latin), 60 bytes (UTF-16), or 90 bytes (UTF-8).
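A minimal Python sketch of that arithmetic (the function name and factor table are my own, mirroring the multipliers quoted above; nothing here is a Teradata API):

```python
# Hypothetical sketch: Teradata sizes a row from the *defined* column
# widths, not the actual data, scaled by the session character set.
# Factors per the posts above: Latin = 1, UTF-16 = 2, UTF-8 = 3 bytes/char.

BYTES_PER_CHAR = {"LATIN": 1, "UTF16": 2, "UTF8": 3}

def defined_row_size(defined_char_widths, session_charset):
    """Sum the declared character widths times the per-charset factor."""
    factor = BYTES_PER_CHAR[session_charset]
    return sum(defined_char_widths) * factor

# Two columns, VARCHAR(10) and CHAR(20), holding only "A" and "B":
widths = [10, 20]
print(defined_row_size(widths, "LATIN"))   # 30
print(defined_row_size(widths, "UTF16"))   # 60
print(defined_row_size(widths, "UTF8"))    # 90
```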
From a performance perspective, if I have to join a Latin column to a Unicode column, is it advisable to join them directly, or should I convert one of the columns via UNICODE_TO_LATIN or LATIN_TO_UNICODE? As far as I am aware, LATIN_TO_UNICODE always works, whereas UNICODE_TO_LATIN is not reliable, since not every Unicode character has a Latin equivalent.
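The asymmetry is easy to see with Python's standard codecs (using `latin-1` as a stand-in for Teradata's LATIN repertoire; this is an illustration, not Teradata behavior):

```python
# Every Latin-1 character round-trips into Unicode, but many Unicode
# characters have no Latin-1 representation at all -- which is why a
# Unicode-to-Latin conversion can fail while Latin-to-Unicode cannot.

latin_text = "café"              # all code points exist in Latin-1
assert latin_text.encode("latin-1") == b"caf\xe9"   # safe direction

unicode_text = "日本語"           # BMP characters outside Latin-1
try:
    unicode_text.encode("latin-1")
except UnicodeEncodeError as e:
    print("no Latin equivalent:", e.reason)
```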
Also, an earlier post says: "Thanks Dieter!! After changing the CHARSET to ASCII from UTF8, I was able to run the SQL".
Is "ASCII" the same as "LATIN" here? From what I gather, there are two server character sets (Latin and Unicode), with UTF-8 and UTF-16 being the two encodings for the Unicode character set. Kindly correct me if I am wrong.
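On the encoding side at least, ASCII is a strict subset of Latin-1, so pure-ASCII data is always valid Latin data. A quick check with Python codecs (stand-ins for the Teradata character sets, not Teradata itself):

```python
# The first 128 code points are byte-for-byte identical in ASCII and
# Latin-1, so any pure-ASCII string encodes the same way in both.
ascii_text = "SELECT 1;"
assert ascii_text.encode("ascii") == ascii_text.encode("latin-1")

# Latin-1 goes further, also covering code points 128-255:
print("é".encode("latin-1"))   # b'\xe9' -- valid Latin-1, not valid ASCII
```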
Data modelers should be far-sighted from the beginning and ensure that, for join scenarios, UNICODE_TO_LATIN or LATIN_TO_UNICODE conversions are avoided owing to performance issues.
Same issue with TD14; the character set was LATIN.
I had a column that was VARCHAR(3000). Using the function below gave error 3577: row size or sort key size overflow.
Solution: cast to the appropriate size of the data value.
@noeth: "If your session uses UTF8 you need to multiple the defined size times 3, for UTF16 times 2."
UTF-8 uses 1-4 bytes per glyph, so shouldn't Teradata allocate 4 bytes (not 3) per defined char for UTF-8?
UTF-16 uses (I believe) 2 or 4 bytes per glyph, so shouldn't Teradata allocate 4 bytes (not 2) per defined char for UTF-16?
In practice, I see evidence that TD is using the factors Dieter mentioned, but they don't seem correct.
I suspect the answer is that TD's UNICODE, which I've understood to be a proprietary DBCS, is actually the set of 2-byte characters from UTF-16 (i.e., the Basic Multilingual Plane). If that's the case, no expansion is needed to represent it as true UTF-16, and only 3 bytes per character are needed to expand to true UTF-8.
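If that suspicion is right, the factors fall out of the encodings themselves; a quick check with Python's standard codecs (nothing Teradata-specific here):

```python
# Any code point in the BMP (<= U+FFFF) takes exactly 2 bytes in UTF-16
# and at most 3 bytes in UTF-8. Only supplementary code points
# (> U+FFFF, added in Unicode 3.1+) need 4 bytes in either encoding --
# which matches the x2 (UTF-16) and x3 (UTF-8) allocation factors.

bmp_chars = ["A", "é", "日", "\uffff"]          # all within the BMP
for ch in bmp_chars:
    assert len(ch.encode("utf-16-be")) == 2
    assert len(ch.encode("utf-8")) <= 3

supplementary = "\U0001F600"                    # outside the BMP
print(len(supplementary.encode("utf-16-be")))   # 4 (surrogate pair)
print(len(supplementary.encode("utf-8")))       # 4
```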
Confirmation or clarification of this would be hugely appreciated!
Teradata currently does not support surrogate code points (4-byte UTF16) defined in Unicode 3.1 or above.
See the "Unicode Server Character Set" topic in the "International Character Set Support" manual.
You can download or browse at www.info.teradata.com.
The error can be resolved using TRANSLATE:
CAST(TRANSLATE(s.sqltextinfo USING UNICODE_TO_LATIN) AS VARCHAR(31000)) AS SQLTEXTINFO
Thanks, Fred. I found the UTF-16 warning! I don't see a similar warning explaining that 4-byte UTF-8 encodings are not supported, but the "UTF8 Multibyte Sequences" doc only shows 1-, 2-, and 3-byte sequences, implying TD does not support the 4-byte sequence!
Given that I think TD is using UCS-2 as its server character set, that makes sense.