Compress on Cold: Four Easy Steps to Set up Temperature-Based BLC

Blog
The best minds from Teradata, our partners, and customers blog about relevant topics and features.
Teradata Employee

Temperature-based block-level compression, also known as "Compress on Cold" was introduced in Teradata Database 14.0. Temperature-based block-level compression (TBBLC) automatically compresses infrequently accessed data and decompress frequently accessed data.  The access frequency is determined by statistics collected by TVS. TBBLC may be applied on a table-by-table basis or by default on all user tables.

 

This blog posting provides a little background and describes the easiest way to enable TBBLC on just a few selected tables in your database.

 

Background  

Teradata Virtual Storage (TVS) provisions free space for the file system and keeps track of cylinder locations. TVS serves as an abstraction layer between each AMP’s file system and the actual physical location of the data on physical storage.

 

When TBBLC is enabled, TVS is queried for the temperature of the data being held in disk storage. Temperature is an indication of how often the data is accessed. A cold temperature indicates the data is accessed less often than other data on the system, whereas a hot temperature indicates the data is accessed more often. This temperature determination is used by the AutoTempComp, a TBBLC background task, to decide which cylinders to compress or to uncompress.  

 

TBBLC_TVS2.jpg

 

 

The AutoTempComp task, as is true with all file system activity, runs independently on each AMP.  This means that until a steady state is reached across the system, different portions of the same table will be compressed or not compressed on different AMPs.

 

This blog posting only offers a brief explanation of TBBLC. To fully understand how TBBLC works, and to get familiar with the variety of options and settings involved, please read the orange book titled:  Block-Level Compression, With Selected Enhancements Through Teradata Database 16.0, dated April 2017.  Chapters 5, 6, and 7 covers most of the relevant detail about TBBLC.

 

Simple Steps to Enable TBBLC

Below are the required changes to DBS Control parameters you will want to make if you wish to apply TBBLC on selected tables, with a brief explanation of each.  These parameters are all part of DBS Control Compression Fields. Compression takes place after making the appropriate changes to these parameters without the need of a system restart.

 

BlockLevelCompression = ON

This is setting # 1 and is the big switch for enabling block level compression on the system. This parameter mandates whether BLC of any type is allowed to take place. The default is OFF.

 

EnableTempBLC = TRUE

This is setting #32 and must be turned on in order for temperature data to be collected by TVS for use by TBBLC. Combined with the BlockLevelCompression setting, EnableTempBLC will allow temperature-based compression to take place either on selected tables or the entire system.

 

DefaultTableMode = MANUAL

This setting #33 specifies what the default approach to compression for all tables will be. Setting the mode to MANUAL means that each table will use the system defaults, TLA, or Query Band overrides. 

 

BLOCKLEVELCOMPRESSION = AUTOTEMP on selected tables

This is not a DBS Control parameter, but rather the table-level attribute that you will need to add onto each table you wish use temperature-based compression. Compression table level attributes allow you to override what the DefaultTableMode specifies, when different tables are to be treated differently.

 

The following graphic illustrates the relationship among these parameters and settings.

 

TBBLC_Modes2.jpg

 

Additional Information

Don't expect immediate changes when you begin to use TBBLC. Temperature changes are driven by access frequency or lack thereof and takes time depending on access patterns.  For that reason, the system may need to be running for several days or a week for the TVS statistics to see the results of TBBLC. 

7 Comments
Enthusiast

Hi Carrie,

 

Is it possible that due to TBBLC, table may be skew ( more  space on one amp ) even table has same row count on each amp? As per usage If data was compressed on few AMPs not for other AMPs

Teradata Employee

Hi SandeepY,

 

Each TVS task that is supporting TBBLC is running independently on each AMP, as is true of all file system routines. So each AMP will end up compressing a similar amount of data because it has its own threshold of "cold" independent of other AMPs. However, if a certain workload accesses data from a specific AMP more than other AMPs, the cylinders of a table on that AMP will be hotter than cylinders of same table on a different AMP… if so, the hotter AMP will have more uncompressed data FOR THAT TABLE on that AMP relative to that table on other AMPs, and so we have skew.

 

Because of that, you could have space skew for a given table across AMPs if certain AMPs are accessed more often than others for that table. The COLD 20% of cylinders will be compressed on each AMP, but the cylinders belong to various tables and not necessarily exactly the same across all AMPs.

 

Thanks, -Carrie

Teradata Employee

Hi Carrie,

1) Is there any way (apart from FERRET commands) to check the compression status of table for which TBBLC is applied at table level? We would like to assess the benefit of TBBLC by monitoring the table compression post implemention. 

2) Parameter TempBLCRescanPeriod = 7 days by default in DBS control.

If stable state is reached, does autotempcomp task stopped completely for 7 days? doesn't COMPRESSION/UNCOMPRESSION occur at all in this period?

 

Thanks,

Rushi Gadre

Teradata Employee

Hi Rushi,

 

Yes, you can use SQL to get table compression information.

 

There is an SQL SHOWBLOCKS command available, as described in the orange book titled: SQL SHOWBLOCKS and SQL

SHOWWHERE in Teradata® Database.

 

When used appropriately, you will get back fields such as EstCompRatio and EstPctOfUncompDBs without having to use ferret.

 

In terms of TempBLCRescanPeriod, here’s how that parameter works:

 

That parameter indicates how many days in between one full scan of the File System.  During each scan, the file system will compress/decompress accordingly depending on the temperature of each cylinder. Cylinders are scanned in succession.  Equilibrium/stability is achieved when 20% of storage is occupied by compressed data and it may take several scans to achieve.  This usually happens when all the compressed data are no longer accessed.  So once a full scan of the File System is completed, it sleeps for the remainder of the 7 days duration.

 

Thanks, -Carrie

Teradata Employee

Thank you, Carrie!

 

I checked SHOWBLOCKS and SHOWWHERE data.

 

We want to set TBBLC threshold to 'WARM' (80%) and apply TBBLC only for the selected big tables.

Is it a good idea to check the % of HOT/WARM/COLD cylinders of a particular table using SHOWWHERE data and use that as a consideration before applying TBBLC? Meaning, should we select only those tables for TBBLC for which 80% of cylinders are in MEDIUM/SLOW grade?

Teradata Employee

Question:  We want to set TBBLC threshold to 'WARM' (80%) and apply TBBLC only for the selected big tables.

 

Response:   Assuming no other compressed tables exist besides those big tables designated as TBBLC tables, doing that will cause 80% of the that table’s data to compress, leaving 20% uncompressed; if you know for sure that the workload only access a specific 20% of the cylinders all the time then that might be alright, otherwise that may not be the best approach..

 

Question:  Is it a good idea to check the % of HOT/WARM/COLD cylinders of a particular table using SHOWWHERE data and use that as a consideration before applying TBBLC? Meaning, should we select only those tables for TBBLC for which 80% of cylinders are in MEDIUM/SLOW grade?

 

Response: Temperature (HOT/WARM/COLD) are different than storage class/grade (FAST/ MEDIUM/SLOW)… temperature influences migration of data across storage grade (i.e. HOT data gravitate toward FAST storage, COLD toward SLOW…)… faster grade results in faster I/O time.  TBBLC uses temperature to influence compression or not (i.e. hotter/frequently accessed data should be left uncompressed for fast access and colder/seldom accessed data should be compressed to save space)… uncompressed/hot data avoid decompression cpu overhead for best performance. 

 

For TBBLC purposes, you do not want to use grade as a criteria.  Although temperature makes more sense, you still may not want to use that.  The reason is the temperature is all relative from one cylinder to another.  So if you specify WARM (80%) for TBBLC to compress because 80% of cylinders occupy WARM and COLD, you will incur the decompression CPU overhead on each access to WARM data and if your workload access lots of data that is WARM along with HOT, their temperature won’t change and will encounter performance degradation.

 

The best criteria to use is what percent of a table does a workload seldom access. That determines the compression threshold for TBBLC.

Visitor

Thank you, Carrie!