We have a table that is about 690MB - 31M rows on a Teradata system on AWS. When we create a PPI version of this table (Same DDL, only a PARTITION BY clause is added) and insert the data, the size of the new PPI table is now ~1.8GB. No explicit MVC or BLC on the table. Teradata system pretty much has default settings from the AWS marketplace.
Partition is 1 day interval over a 5-year range on a DATE column and the PI is well distributed (Skew=1%).
Tried this with another table and got similar results (a 6GB non-PPI table became 12GB when the rows were inserted into the PPI version). Anything else I can check to see why the table size grows so much with PPI?
That seems 'unusual'. You may find this is due to re-blocking the data as it moves from NPPI to PPI - but I wouldn't have expected this much of a change.
Use the 'file system info' macros to do some investigation. Run the following on each table:
Some key columns to look at in the two outputs:
Comp Method, Comp State, Est Comp Ratio(%), Est Uncomp DBs(%) (columns 7-10)
Min DB Size, Avg DB Size, Max DB Size, Total DBs, Lrg Cyls, Sml Cyls (on the very right-hand side of the output)
690MB for 31M rows works out to less than 23 bytes per row; subtracting the ~14 bytes of row overhead leaves about 9 bytes of data per row. Are you sure there's no BLC?
PPI in your case (a 2-byte partition number per row) should add a bit more: roughly 31M × 2 bytes ≈ 62 MB.
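As a quick back-of-the-envelope check, the per-row arithmetic above can be replayed like this (a sketch, assuming decimal MB and the ~14-byte row overhead quoted above):

```python
# Rough per-row arithmetic for the original (non-PPI) table.
table_bytes = 690 * 1000 * 1000          # ~690 MB
rows = 31_000_000

bytes_per_row = table_bytes / rows       # less than 23 bytes/row on disk
data_per_row = bytes_per_row - 14        # minus ~14 bytes row overhead -> single-digit bytes of data

# Extra space expected from the 2-byte partition number added to each row:
ppi_overhead_mb = rows * 2 / 1_000_000   # 62 MB

print(round(bytes_per_row, 1), round(data_per_row, 1), ppi_overhead_mb)
# -> 22.3 8.3 62.0
```

So roughly 9 bytes of actual data per row is suspiciously little, hence the question about BLC, and the partition number alone should only account for ~62 MB of growth.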
Regarding the doubling in size in your 2nd example, double-check that the definition is actually the same; it might be Fallback vs. non-Fallback.
You were right, it looks like BLC is enabled by default for TD on AWS. Also, Fallback is automatic and can't be turned off in AWS. All tables have ProtectionFlag='Y' in dbc.tablesv.
Ran the macro on the 4 tables and it returned the following...Not quite sure what I am looking for
| Comp Method | Comp State | Est Comp Ratio(%) | Est Uncomp DBs(%) | Min DB Size | Avg DB Size | Max DB Size | Total DBs | Lrg Cyls | Sml Cyls |

TABLE2:
| Comp Method | Comp State | Est Comp Ratio(%) | Est Uncomp DBs(%) | Min DB Size | Avg DB Size | Max DB Size | Total DBs | Lrg Cyls | Sml Cyls |
From the macro output, the initial reason for the difference in table sizes appears to be the average data block size:
Table1: the larger table has an avg block size about 2.6× that of the smaller table.
Table2: the larger table has an avg block size about 1.8× that of the smaller table.
Sorry to ask this but can I check:
For each table, do the two copies of the table contain the same data - not only same row count but the same data values?
To break down the macro output a bit further, try the following:
This time, as well as the columns that you show below, can you also include column TID_TAI (column #6)? This will give the breakdown between Primary data and Fallback data.
When you look at the compression ratio you will notice that the old table had a much higher one:
e.g. 95.04% vs. 87.6% means roughly 1:20 vs. 1:8.
Now applying some math (Avg DB Size × Total DBs, in sectors, at 2048 sectors per MB) results in:
SELECT
  100 * 6743 / 2048 AS old_MB,  -- size in MB
  261 * 6822 / 2048 AS new_MB,
  Cast(old_MB / (1 - 0.9504) AS INT) AS old_uncomp,  -- uncompressed size in MB
  Cast(new_MB / (1 - 0.876)  AS INT) AS new_uncomp;

 old_MB  new_MB  old_uncomp  new_uncomp
    329     869       6,633       7,008

SELECT
  175 * 34769 / 2048 AS old_MB,
  306 * 35043 / 2048 AS new_MB,
  Cast(old_MB / (1 - 0.9158) AS INT) AS old_uncomp,
  Cast(new_MB / (1 - 0.8494) AS INT) AS new_uncomp;

 old_MB  new_MB  old_uncomp  new_uncomp
  2,970   5,235      35,273      34,760
The uncompressed size of the 1st table is just a bit larger than before (due to partitioning) and the copy of 2nd table is a bit smaller due to larger datablocks (and thus a bit less overhead).
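For reference, that arithmetic can be replayed outside the database (a sketch; it assumes the macro's DB sizes are in 512-byte sectors, so 2048 sectors = 1 MB, and uses the figures quoted above):

```python
def table_mb(avg_db_size_sectors, total_dbs):
    # Avg DB Size x Total DBs, converted from sectors to MB (2048 sectors per MB).
    return avg_db_size_sectors * total_dbs // 2048

def uncompressed_mb(compressed_mb, est_comp_ratio_pct):
    # Back out the uncompressed size from the estimated compression ratio.
    return int(compressed_mb / (1 - est_comp_ratio_pct / 100))

# Table 1: old (NPPI) copy vs. new (PPI) copy
old_mb, new_mb = table_mb(100, 6743), table_mb(261, 6822)
print(old_mb, new_mb, uncompressed_mb(old_mb, 95.04), uncompressed_mb(new_mb, 87.6))
# -> 329 869 6633 7008

# Table 2
old_mb, new_mb = table_mb(175, 34769), table_mb(306, 35043)
print(old_mb, new_mb, uncompressed_mb(old_mb, 91.58), uncompressed_mb(new_mb, 84.94))
# -> 2970 5235 35273 34760
```

The uncompressed sizes come out nearly identical for each pair, which is what points the finger at the compression ratio rather than the data volume.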
Now the main question is: why is compression so different?
Looks like there was a lower compression level for the new copies; maybe there was a change in DBS Control compression settings...
Sorry for the late reply. The customer took down the Teradata instance that was showing a huge difference in current perm between PPI and non-PPI and brought up a new Teradata instance (Same size, changed from Enterprise to Advanced Edition). Unfortunately, I was not able to execute the populatefsysinfotable macro before they took it down.
On the new database, we are now seeing what we usually expect: the PPI table is about 1.2× larger. So the difference in table size must have been due to compression??
There were no changes to DBS Control and the data in the table was exactly the same. Here's the macro output now for the larger of the 2 tables:
| TID_TAI | Comp Method | Comp State | Est Comp Ratio(%) | Est Uncomp DBs(%) | DBs In 1 to 8 | DBs In 9 to 24 | DBs In 25 to 64 | DBs In 65 to 120 | DBs In 121 to 168 | DBs In 169 to 216 | DBs In 217 to 256 | DBs In 257 to 360 | DBs In 361 to 456 | DBs In 457 to 512 | DBs In 513 to 760 | DBs In 761 to 1024 | DBs In 1025 to 1304 | DBs In 1305 to 1632 | DBs In 1633 to 2048 | Min DB Size | Avg DB Size | Max DB Size | Total DBs | Lrg Cyls | Sml Cyls |
Thank you both for all your help!
No problem, thanks for the update.
Just a thought (and unfortunately there's no way to check it now), but I wonder if the maximum block size on your previous tables was always the same? Looking back at the earlier filesysinfo output, the 'maximum block size' values for the two tables (small vs. large in each pair) were vastly different.
Typically the max block size is 255 sectors (127.5KB), although you can increase that to 2047 sectors (~1MB) on current releases.
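In plain numbers, assuming the standard 512-byte Teradata sector:

```python
SECTOR_BYTES = 512  # standard Teradata sector size

def block_size_kb(sectors):
    # Convert a block size in sectors to KB.
    return sectors * SECTOR_BYTES / 1024

print(block_size_kb(255))   # -> 127.5 (KB, the traditional default maximum)
print(block_size_kb(2047))  # -> 1023.5 (KB, i.e. ~1 MB on current releases)
```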
Interestingly, even on your new tables there is still a noticeable difference in compression ratio between the two tables (83% vs. 93%). To be honest, I haven't spent time comparing two tables like this, but that seems quite a big difference given that the bulk of the data in the two tables is the same. Yes, there will be a difference in the byte patterns of the corresponding rows in each table because of the partition number in the row header, but my gut reaction (nothing more technical than that!) is that I wouldn't have expected that to make a 10% difference.
Not sure there's much more we can do here.
Block compression ratios are quite sensitive to what is grouped together in a block. If no other parameter changes were introduced, then it is possible that the PI ordering happened to bring together data with more commonality within blocks. When PPI is introduced, the data in the table is sorted by partition number followed by rowhash, rather than by rowhash alone. This sort order, especially with small-granularity partitions, may separate common data in a way that results in a lower compression ratio.
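The sensitivity described above is easy to demonstrate with any block-at-a-time compressor. This sketch (made-up data, with zlib standing in for BLC) compresses the same rows block by block, once in sorted order and once shuffled:

```python
import random
import zlib

random.seed(0)

# Fake "rows": 26 distinct values, 1000 identical rows of each.
rows = [bytes([65 + i]) * 20 for i in range(26) for _ in range(1000)]
shuffled = rows[:]
random.shuffle(shuffled)          # same rows, different sort order

BLOCK = 200                       # rows per "data block", compressed independently

def compressed_size(ordered_rows):
    # Total compressed size when each block is compressed on its own.
    return sum(
        len(zlib.compress(b"".join(ordered_rows[i:i + BLOCK])))
        for i in range(0, len(ordered_rows), BLOCK)
    )

# Sorted order packs identical rows into the same blocks and compresses better.
print(compressed_size(rows) < compressed_size(shuffled))  # -> True
```

Exactly the same rows, yet the total compressed size differs purely because of how the sort order distributes them across blocks, which is the suspected PPI effect here.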