PPI table size is about 2 times greater than non-PPI table

Teradata Database on AWS
Teradata Employee

PPI table size is about 2 times greater than non-PPI table

We have a table that is about 690MB - 31M rows on a Teradata system on AWS.  When we create a PPI version of this table (Same DDL, only a PARTITION BY clause is added) and insert the data, the size of the new PPI table is now ~1.8GB.  No explicit MVC or BLC on the table.  Teradata system pretty much has default settings from the AWS marketplace.

    Partition is 1 day interval over a 5-year range on a DATE column and the PI is well distributed (Skew=1%).

Tried this with another table and got similar results (6GB non-PPI table became 12GB when rows were inserted into the PPI version).  Any thing else I can check to see why the table size is growing so large with PPI?

 

 

9 REPLIES
Junior Apprentice

Re: PPI table size is about 2 times greater than non-PPI table

Hi,

 

That seems 'unusual'. You may find this is due to re-blocking the data as it moves from NPPI to PPI - but I wouldn't have expected this much of a change.

 

Use the 'file system info' macros to do some investigation. Run the following on each table:

EXEC dbc.populatefsysinfotable('db-name','table-name','showblocks','s','','');

Some key columns to look at in the two outputs:

Comp Method    Comp State    Est Comp Ratio(%)    Est Uncomp DBs(%) (@cols 7-10)

Min DB Size    Avg DB Size    Max DB Size    Total DBs    Lrg Cyls    Sml Cyls (on the very right-hand side of the output)

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Junior Contributor

Re: PPI table size is about 2 times greater than non-PPI table

690MB and 31M rows calculate to less than 23 bytes per row, subtracting the overhead of 14 bytes it's 9 bytes data, are you sure there's no BLC? 

PPI in your case (2 byte partition number) should add a bit more 31*2 MB.

 

Regarding doubling the size in your 2nd example double check if the definition is actually the same, might be Fallback vs. Non Fallback

Teradata Employee

Re: PPI table size is about 2 times greater than non-PPI table

You were right, it looks like BLC is enabled by default for TD on AWS. Also, Fallback is automatic and can't be turned off in AWS.  All tables have ProtectionFlag='Y' in dbc.tablesv.

 

Teradata Employee

Re: PPI table size is about 2 times greater than non-PPI table

Dave,

Ran the macro on the 4 tables and it returned the following...Not quite sure what I am looking for

  Comp MethodComp StateEst Comp Ratio(%)Est Uncomp DBs(%)Min DB SizeAvg DB SizeMax DB SizeTotal DBsLrg CylsSml Cyls
TABLE1NON-PPI (690MB)MANUALC95.040.0581001126743760
 PPI (1.8GB)MANUALC87.60.021626132868221230
            
TABLE2 Comp MethodComp StateEst Comp Ratio(%)Est Uncomp DBs(%)Min DB SizeAvg DB SizeMax DB SizeTotal DBsLrg CylsSml Cyls
 NON-PPI (6GB)MANUALC91.5808175184347693320
 PPI (11GB)MANUALC84.940.018306344350435520
Junior Apprentice

Re: PPI table size is about 2 times greater than non-PPI table

Hi,

 

From the macro output the initial reason for the difference in table sizes is because of the average block size.

Table1: the larger table has avg block size of about 2.6 times the smaller table

Table2: the larger table has avg block size of about 1.8 times the smaller table

 

Sorry to ask this but can I check:

For each table, do the two copies of the table contain the same data - not only same row count but the same data values?

 

To break down the macro output a bit further, try the following:

EXEC dbc.populatefsysinfotable('db-name','table-name','showblocks','m','','');

This time, as well as the columns that you show below can you also include column TID_TAI (column#6). This will give the break down between Primary data and Fallback data.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Junior Contributor

Re: PPI table size is about 2 times greater than non-PPI table

When you look at the compression ratio you will notice that the old table had a much higher compression ratio,

e.g. 95.04 vs. 87.6 means 1:20 vs. 1:8

 

Now applying some math results in

 

SELECT 
   100 * 6743 / 2048 AS old_MB, -- size in MB 
   261 * 6822 / 2048 AS new_MB,
   Cast(old_MB / (1-0.9504) AS INT) AS old_uncomp, -- uncompressed size in MB
   Cast(new_MB / (1-0.876 ) AS INT) AS new_uncomp 

old_MB  new_MB  old_uncomp  new_uncomp
   329     869       6,633       7,008


SELECT
   175 * 34769 / 2048 AS old_MB,
   306 * 35043 / 2048 AS new_MB,
   Cast(old_MB / (1-0.9158) AS INT) AS old_uncomp,
   Cast(new_MB / (1-0.8494) AS INT) AS new_uncomp 

old_MB  new_MB   old_uncomp  new_uncomp
 2,970   5,235       35,273      34,760

 

 

The uncompressed size of the 1st table is just a bit larger than before (due to partitioning) and the copy of 2nd table is a bit smaller due to larger datablocks (and thus a bit less overhead).

 

Now the main question is: Why is compressin so different?

Looks like there was a lower compression level for the new copies, maybe there was a change in dbscontrol compression settings ...

Teradata Employee

Re: PPI table size is about 2 times greater than non-PPI table

 

Sorry for the late reply.  The customer took down the Teradata instance that was showing a huge difference in current perm between PPI and non-PPI and brought up a new Teradata instance (Same size, changed from Enterprise to Advanced Edition).  Unfortunately, I was not able to execute the populatefsysinfotable macro before they took it down.

On the new database, we are now seeing what we usually expect, PPI table size is about 1.2x larger. So, the difference in table size must have been due to compression??

 

There were no changes to dbscontrols and data in table was exactly the same. Here's the macro output now for the larger of the 2 tables:

 

 TID_TAIComp MethodComp StateEst Comp Ratio(%)Est Uncomp DBs(%)DBs In 1 to 8DBs In 9 to 24DBs In 25 to 64DBs In 65 to 120DBs In 121 to 168DBs In 169 to 216DBs In 217 to 256DBs In 257 to 360DBs In 361 to 456DBs In 457 to 512DBs In 513 to 760DBs In 761 to 1024DBs In 1025 to 1304DBs In 1305 to 1632DBs In 1633 to 2048Min DB SizeAvg DB SizeMax DB SizeTotal DBsLrg CylsSml Cyls
ppi0MANUALN001000000000000000088840400
ppi1024MANUALC84.26000100000000000000837483044865990
ppi2048MANUALC83.674.3944947000000000000829484293346650
nonppi0MANUALN001000000000000000088840400
nonppi1024MANUALC93.55001000000000000000816243018884470
nonppi2048MANUALC93.8311.353466000000000000081324399269620

0

 

Thank you both for all your help!

 

 

Junior Apprentice

Re: PPI table size is about 2 times greater than non-PPI table

No problem, thanks for the update.

 

Just a thought (and unfortunately there's no way to check it now, but) I wonder if the maximum block size on your previous tables was always the same? Just looking back at the earlier filesysinfo output the 'maximum block size' for the two tables (small vs. large in each pair) were vastly different.

 

Typically the max block size is 255 sectors (127.5KB) although you can increase that to 2047 (@1MB) on current releases.

 

Interestingly, even on your new tables there is still a noticeable difference in compression ratio between the two tables (83% vs. 93%). To be honest I haven't spent time comparing two tables like this but that would seem to be quite a big difference given that the bulk of the data in the two tables is the same. yes there will be a difference in the byte patterns of the corresponding row in each table, because of the partition number in the row header, but my gut reaction (nothing more technical than that!) says that I wouldn't have expected that to make a 10% difference.

 

Not sure there's much more we can do here.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Teradata Employee

Re: PPI table size is about 2 times greater than non-PPI table

Block compression ratios are quite sensitive to what is grouped together in the block. If no other parameter changes were introduced, then it is possible that the ordering of the PI in some way resulted in bringing together data with more commonality in the blocks. When the PPI is introduced, the sort order of data in the table is the partition number followed by the rowhash rather than just the rowhash. This sort order especially with small granularity partitions may result in separating common data in some way that results in a lower compression ratio.