I would like to ask about PDM on small (var)char() columns.
Is it better to define these columns as char() or varchar() if the length is until 5-10 bytes or not?
We will gain something to tablespace or on performance issues if these are used on Multiload/Fastload / group by /union/ order by ?
What about compression factor?
Thank you very much.
I think the important factor would be how much variance you are expecting in that specific data field. VARCHARs takes 2 extra bytes in the table header, so if you are expecitng the data to be of variable length and the frequency is bit high then VARCHAR will be more suitable than CHAR.
VARCHAR is better than fixed-width CHAR with respect to the Teradata wire protocol.
(1) Null CHAR values will consume the entire width of the fixed-width CHAR value when they are sent over the wire. In contrast, Null VARCHAR values will only consume two bytes when they are sent over the wire.
(2) If you use fixed-width CHAR columns with the UTF8 session character set, then the Teradata Database's Export Width feature will pad the character values with more space characters than you expect.
I believe Qaisar meant that the 2 extra bytes are in each row. If you have more than 2 trailing bytes of pad (1 Unicode space, or 2 Latin spaces), then varchar should take up less space. Both types should compress the same, but that depends on the technique.
There are also considerations for aligned row formats. See the database design reference for details.
Can i ask something more specific. (TD version 14)
I made one test on a table which has almost 500 columns and 1 M rows. (it as a source table).
Almost 200 of them are varchar(1) and compressed.
1)I changed to char(1) the varchar(1) columns and compressed , i have on perm space 2,186,168,832 bytes instead of 2,127,772,672 , more space? Is this logical?
2)Due to this is a source table which is loading by fastload , will i gain anything if i change to char(1)
. Will i manage to transfer less amount of data?So, less time.
Thank you for your time.
CHAR(n) always need exactly n bytes for LATIN or n*2 bytes for UNICODE while VARCHARs need 1 or two bytes for each character stored. But VARCHARs also need two byte additional overhead (a word indicating the length).
Assuming your columns are mainly Y/N or similar in LATIN a single character needs 3 bytes as a VARCHAR, but only 1 byte as CHAR. You never ever do a VARCHAR(1) on Teradata :-)
When the actual length of a string greatly varies and the average is at least 3 characters than the maximum you start using VARCHARS.
Q2: FastLoad calculates the number of records per block based on the maximum defined length.
If you load VARTEXT there should be no difference, but if you load DATA each VARCHAR will add 2 byes more than CHAR, so in your case saving 400 bytes per row will result in a higer number of rows per block. To check look for Number of recs/msg in the Insert Phase.
There are many factors to consider when sizing a row. For character types the factors include, server character set, nullability, compression, variable length, alignment, etc. If you have not done so already, perform the the sizing procedure explained in the 'Database Design' reference under the 'Row size Calculation' section.
Thank you for your answers.
Noeth , one question about Q1 , i agree about char(1) and varchar(1) . But as the example that i provided you , when i change from varchar(1) to char(1) , the size of the table increased.
This is why i asked. I was expecting less space since i change almost the half of the table. ( i beleive that compression has nothing to do ,correct ? ) Any other thoughts about this behaviour?
About Q2 , the same happens on TPT?
Thank you once more.