Block Level Compression evaluation with the BLC utility


One of the new compression features in Teradata 13.10 is Block Level Compression (BLC), which provides the capability to perform compression on whole data blocks at the file system level before the data blocks are actually written to storage. Like any compression feature, BLC helps save space and reduce I/O.

There is a CPU cost to perform compression when inserting data, and a CPU cost to perform decompression whenever the compressed data blocks are accessed. Even when only one column of a single row is needed, the whole data block must be decompressed. For updates, the compressed data blocks have to be decompressed first and then recompressed. Careful evaluation should be done before applying BLC in your production systems.

Please refer to the Orange Book "Compression in Teradata 13.10" for detailed guidelines. Basically, large tables (whether frequently accessed/updated or not), especially those accessed mostly or only by lower-priority jobs, are potential candidates for BLC if the system has a moderate or low level of CPU utilization; large tables that are infrequently accessed or updated are always good candidates for BLC on any system (CPU bound or not).

Download the BLC evaluation utility.

This evaluation package is provided to help with the BLC evaluation process. The test package contains instructions and scripts for selecting BLC candidate tables based on the criteria below; each level uses more complete information and produces more accurate output.

  • Level 1: table size only
  • Level 2: object use count and table size
  • Level 3: update frequency and volume, object use count and table size
In this article, we guide you through the test process step by step, including how to select potential candidate tables for the BLC testing, and how to create a test script for a specific table of interest. Test scripts are created by copying and modifying the provided test script templates in the package.

The samples folder contains example bteq files that were modified from the templates for testing on a sample system and a sample table. The output files are also included in the samples folder; they serve as references that show what the expected output may look like. The sample files include:
  • sel_blc_candidates_misty.bteq - script for selecting candidate tables in a sample system "misty"
  • sel_blc_candidates_misty_out.txt - output candidate tables in the sample system "misty"
  • blc_test_gw1_baseaddr.bteq - BLC impact test script for a sample table gw1.baseaddr
  • blc_test_gw1_baseaddr_out.txt - test output for the sample table

Please note that the instructions and scripts are mainly provided for 13.10 systems that do not have BLC applied yet. However, with minor tweaks, they also work for 13.10 systems with BLC already applied (including 2690 systems with BLC on by default) to evaluate the BLC impact on tables that have already been compressed with BLC. Read further for details.

Requirements

This package requires a Teradata Database Server release 13.10 system that has the BLC feature. It should work on any operating system supported for the TD 13.10 database server, Windows or Linux, and on any Teradata platform supported for the TD 13.10 database server.

However, the client machine must be Windows in order to test with the Level 3 criteria for selecting BLC candidate tables in Step 1.3, because the Teradata XML Services package used in that optional (but recommended) step can only be downloaded to and installed from a Windows machine.

Procedure

Step 0: Enable and configure BLC


This test works on TD 13.10 systems only. Please note that once BLC is enabled, you cannot go back to an earlier release without a SYSINIT. For systems where BLC is already enabled but with settings different from the following, for example 2690 systems with BLC on by default (BlockLevelCompression = ON, CompressPermDBs = UNLESSQBNO), please skip this step and go directly to Step 1.

From DBSControl, run the following commands:

mod compress 1 = on
mod compress 2 = ONLYIFQBYES
write


Please make sure the second compress field is set to 'ONLYIFQBYES', which is assumed by the test logic; otherwise the test will not work as designed and the results will be misleading.
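
If you want to verify the settings before running any tests, the same utility can display the compress group (a hedged example of standard DBS Control usage; the exact group name, field numbering and output format can vary by release):

display compress

After Step 0 the output should show, among other fields, BlockLevelCompression = ON and CompressPermDBs = ONLYIFQBYES.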

This step is needed only once for testing all tables of interest on the same system.

Step 1: Search for candidate tables for the BLC test

Step 1.1: Search for candidate tables for the BLC test based on table size only

If you would like to search for BLC test candidates in your system based not only on table size but also on access pattern, you may skip this step and move on to Step 1.2 or Step 1.3.

The following query selects candidate tables with at least 10 primary table data blocks per AMP. The average data block size is assumed to be 96 KB. The min_sample_fraction for each table indicates the minimal sample fraction value to use for sampling in order to get at least 10 primary table data blocks per AMP in the sample data for the following BLC testing.
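
For example (purely hypothetical numbers): on a 100-AMP system, a non-fallback table with 9,600,000,000 bytes of CurrentPermSpace has about 9,600,000,000 / 100 / 96,000 = 1,000 primary data blocks per AMP, so its min_sample_fraction is 10 / 1,000 = 0.01, i.e., a 1% sample is enough.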

The returned list of tables is ordered with the biggest table (hence with the least minimal sample fraction) appearing first. 

sel databasename (varchar(30)), tablename (varchar(30)), min_sample_fraction
from (
sel databasename, TVMName, 10 / ( sum(case when tv.protectiontype = 'F'
then currentpermspace / 2
else currentpermspace
end) / (hashamp() + 1) / 96000 )
as Minimum_Sample_Fraction
from dbc.dbase db inner join dbc.tvm tv
on db.databaseid = tv.databaseid
inner join dbc.databasespace ds
on ds.tableid = tv.tvmid
where databasename not in ('dbc', 'syslib', 'systemfe', 'sysudtlib',
'sysadmin', 'tdwm', 'SQLJ', 'dbcmngr', 'dbcmanager', 'sysspatial',
'sysxml')
group by databasename, TVMName
) as t1 (databasename, tablename, min_sample_fraction)
where min_sample_fraction <= 1.0
order by min_sample_fraction;


The following shows the output from a test system:

databasename                   tablename                         min_sample_fraction
------------------------------ ------------------------------ ----------------------
gw1                            baseaddr                         6.52173459357593E-004
td_blc_test_user               td_blc_test_tb_noblc             1.86335033371362E-003
td_blc_test_user               td_blc_test_sample_tb            1.86335033371362E-003
gw1                            addr_alc_all                     2.06895752678472E-003
gw1                            baseaddr0_lz                     2.03187056713571E-002
td_blc_test_user               td_blc_test_tb_blc               8.57064496960278E-002
gw1                            addr                             1.02265522201845E-001


You may look into Step 1.2 or Step 1.3 for more accurate selection of candidate tables, or move directly to Step 2 to prepare for measuring the BLC impact on the selected candidate tables.

For systems with BLC on by default, such as 2690 systems, or systems with BLC already enabled and applied, please note the following:

1.  Some or all tables are already compressed, with different compression ratios achieved on specific tables, so the ordering of the above query results does not truly reflect the ordering of the uncompressed table sizes. The calculated min_sample_fraction values are not accurate for compressed tables, most likely overestimated, and will need to be adjusted later in Step 3.2.

2. Steps 1.2 and 1.3 are not really applicable; jump directly to Step 2.

Step 1.2: Search for candidate tables for the BLC test based on table size and access count

If you would like to search for BLC test candidates in your system based on access count and table size, please make sure object use counting has been enabled for 1 or 2 business cycles.

Enable object use counting by setting ObjectUseCountCollectRate in DBS Control to a positive value, which specifies how often (in minutes) the use counts are updated. The recommended minimum value is 10 minutes. From DBSControl, run the following commands:

mod general 30 = 10
write


If object use counting was enabled before, and you decide to reset the AccessCount and LastAccessTimeStamp fields for all databases and objects in the system in order to get accurate access counts for comparison, you can run the following command as DBC, provided the user DBC has been granted the UPDATE privilege on the AccessCount and LastAccessTimeStamp columns of the Dbase, TVM, TVFields, and Indexes tables. Resetting is not required, especially if no reset for individual tables or databases has ever been done. Please consider carefully before you reset.

Exec DBC.ClearAllDatabaseUseCount;
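
If you later want to spot-check that use counts are accumulating for a particular table before running the candidate query below, a simple lookup against DBC.Tables is enough (a minimal sketch; the database and table names are just the sample names used elsewhere in this article):

sel databasename (varchar(30)), tablename (varchar(30)), accesscount, lastaccesstimestamp
from dbc.tables
where databasename = 'gw1' and tablename = 'baseaddr';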


With object use counting enabled for one or two business cycles, you can use the following query to select candidate tables.

sel dt.databasename, dt.tablename, accesscount, min_sample_fraction 
from (sel /* Minimum sample fraction per table */
databasename, TVMName, 10 / ( sum(case when tv.protectiontype = 'F'
then currentpermspace / 2
else currentpermspace
end) / (hashamp() + 1) / 96000 )
as Minimum_Sample_Fraction
from dbc.dbase db inner join dbc.tvm tv
on db.databaseid = tv.databaseid
inner join dbc.databasespace ds
on ds.tableid = tv.tvmid
where databasename not in ('dbc', 'syslib', 'systemfe', 'sysudtlib', 'sysadmin',
'tdwm', 'SQLJ', 'dbcmngr', 'dbcmanager', 'sysspatial', 'sysxml')
group by databasename, TVMName /* end of Minimum sample fraction per table */

) as minsamplefr (databasename, tablename, Min_Sample_Fraction),
dbc.tables dt
where min_sample_fraction <= 1.0 and
minsamplefr.databasename = dt.databasename and
minsamplefr.tablename = dt.tablename
order by accesscount, min_sample_fraction;


The returned list of tables is ordered with the coldest and the biggest table on top. The following shows the output from a test system:

DatabaseName      TableName                  AccessCount     Min_Sample_Fraction
----------------- -------------------------- -------------- ----------------------
gw1               addr_alc_all                            ?  2.06895752678472E-003
gw1               baseaddr0_lz                            ?  2.03187056713571E-002
td_blc_test_user  td_blc_test_tb_noblc                    2  1.86335033371362E-003
td_blc_test_user  td_blc_test_tb_blc                      2  8.57064496960278E-002
td_blc_test_user  td_blc_test_sample_tb                   4  1.86335033371362E-003
gw1               addr                                    5  1.02265522201845E-001
gw1               baseaddr                                7  6.52173459357593E-004


Step 1.3: Search for candidate tables for the BLC test based on table size, access count and update frequency

If you would like to search for BLC test candidates in your system based on update frequency, access count and table size, please make sure object use counting and query logging with XMLPLAN have been enabled on all users for 1 or 2 business cycles, then use the query in sel_blc_candidates_template.bteq with the start time and end time modified to match the period that you would like to examine.

Please be aware of the CPU overhead of object use counting and query logging, and also of the additional space needed for storing query logging data.

Step 1.3.1: Enable query logging (DBQL) with XML Plan

To start DBQL with XML Plan on all users and all queries:

BEGIN QUERY LOGGING WITH XMLPLAN ON ALL;

To end it once the collection period is over:

END QUERY LOGGING WITH XMLPLAN ON ALL;
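
To confirm that XML plans are actually being captured, you can do a quick row count against the DBQL XML table (a hedged check; on 13.10 the XMLPLAN output normally goes to DBC.DBQLXMLTbl, and rows may only show up after the DBQL cache has been flushed):

sel count(*) from dbc.dbqlxmltbl;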

Step 1.3.2: Enable object use counting

Please refer to Step 1.2 for how to enable object use counting.

Step 1.3.3: Download and install Teradata XML Services Package

After object use counting and DBQL logging with XMLPLAN have been enabled for 1 or 2 business cycles, you have the information needed to search for candidate tables for the BLC test based on table size, access count and update frequency.

Before you use the query in the template sel_blc_candidates_template.bteq in BTEQ, the Teradata XML Services package needs to be downloaded and installed on the client system from which you run BTEQ.


Choose the package appropriate for your database server. You will need to unpack it on a Windows client that has ActivePerl and BTEQ, and run the install Perl script. The Orange Book included in the download package contains detailed installation instructions. Please make sure you choose yes when prompted "Do you want to install the function signatures implemented for older database versions for backward compatibility?" during installation, in order to have the function sysxml.xml_shred1 installed, which is used in sel_blc_candidates_template.bteq. Don't worry about any message indicating the package is for the TD 13.0 or TD 12.0 release; it works for TD 13.10 as well.

Now you can use the query in sel_blc_candidates_template.bteq directly, or copy the bteq template script, name it with your database system name, modify the login info in the copied file, and then run the test script from the command line in the directory where the bteq file is located. Either way, please don't forget to modify the start time and end time to match the period that you would like to examine.

bteq < sel_blc_candidates_dbsystem.bteq > sel_blc_candidates_dbsystem_out.txt 2>&1


The returned BLC candidate tables are sorted with the best fit (least frequently updated, least accessed, biggest size) on top. The following shows the output from a test system:

DatabaseName      TableName             WriteLockCount  WriteIOCount  AccessCount  Min_Sample_Fraction
----------------- --------------------- --------------  ------------  -----------  ---------------------
td_blc_test_user  td_blc_test_tb_noblc               1             6            3  9.31667759823505E-003
gw1               addr                               2           161            5  1.02265522201845E-001


Note the following columns:
  • WriteLockCount: how often the table was updated
  • WriteIOCount: how much was updated
  • AccessCount: total read and write access count
  • Min_Sample_Fraction: the minimal sample fraction value to use for sampling in order to get at least 10 primary table data blocks per AMP in the following BLC test for the specific table.

Step 2: Create test user with appropriate space


In order to get significant timing measurements for estimating BLC performance impact, the actual sample fraction to use is 500 * Minimum_Sample_Fraction, i.e., about 500 MB per AMP. The test uses 3 copies of the sample data and has tables for reporting. To estimate how many MB of perm space are needed for the test, please run the following query:

sel (hashamp() + 1) * 500 * 4;


Then try to create the test user with the desired amount of space:

create user td_blc_test_user AS PERM = 'ABOVE_QUERY_OUTPUT'e6 PASSWORD = td_blc_test_user;
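
For example, on a hypothetical 100-AMP system the space query above returns 200000 (100 * 500 * 4 MB), so, keeping the template's e6 notation, the statement becomes:

create user td_blc_test_user AS PERM = 200000e6 PASSWORD = td_blc_test_user;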


If the above CREATE USER statement fails, you can retry with successively less perm space per AMP. For example, run the TEST_USER_PERM query again:

sel (hashamp() + 1) * TEST_USER_PERM_PER_AMP * 4;

with TEST_USER_PERM_PER_AMP = 400, which will return a smaller number, and run the above CREATE USER statement with 'ABOVE_QUERY_OUTPUT' replaced by the smaller number. If it succeeds, TEST_USER_PERM_PER_AMP = 400 and you can move forward; otherwise continue trying with TEST_USER_PERM_PER_AMP = 300, 200, …, until you have successfully created the test user.

Step 3: Estimate BLC compression ratio and basic performance impact


This step will create test tables by sampling from the table of interest and use them to estimate BLC compression ratio and compression and decompression CPU time overhead. The query used to measure the compression cost is an INS-SEL query. The query used to measure the decompression cost is a full table scan query.

Test statements for reporting changes in logical I/O count and elapsed time are included but commented out in the current test template, because BLC does not change logical I/O count much, and elapsed time measurements could be misleading given that a customer production or test system is heavily affected by other activities. If you are interested in reporting them, you can simply uncomment them.
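
Conceptually, the measurement statements in the template are along the following lines (a simplified sketch only, not the template's actual content; on 13.10 with CompressPermDBs = ONLYIFQBYES, a query band controls whether newly written data blocks are compressed — the exact query band name below is an assumption, so check the Compression Orange Book for the spelling used on your release):

/* load a block-level compressed copy of the sample data (compression CPU cost) */
set query_band = 'BLOCKCOMPRESSION=YES;' for session;
insert into td_blc_test_user.td_blc_test_tb_blc
select * from td_blc_test_user.td_blc_test_sample_tb;

/* load an uncompressed copy for comparison */
set query_band = none for session;
insert into td_blc_test_user.td_blc_test_tb_noblc
select * from td_blc_test_user.td_blc_test_sample_tb;

/* full table scan of the compressed copy (decompression CPU cost) */
sel count(*) from td_blc_test_user.td_blc_test_tb_blc;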

Step 3.1: Copy blc_test_template.bteq

Copy blc_test_template.bteq and name the copied file so that it includes your database name and table name, e.g., blc_test_dbx_tablex.bteq.

Step 3.2: Adjust the sample fraction in the copied file based on the following recommendations.

In order to get significant timing measurements for estimating BLC performance impact, the actual sample fraction to use is 500 times the Min_Sample_Fraction. However, in Step 2 you may have found that your system does not have that much spare space, and you created the test user with a smaller perm space, i.e., TEST_USER_PERM_PER_AMP smaller than 500 MB.

The recommended actual sample fraction is:

TEST_USER_PERM_PER_AMP * Min_Sample_Fraction


As noted earlier, for systems with BLC on by default (like 2690 systems) or systems with BLC already enabled and applied, some or all tables are already compressed, with different compression ratios achieved on specific tables, so the min_sample_fraction values are not accurate for compressed tables and are most likely overestimated; here they need to be adjusted. To reduce the possibility of running out of space during the test, please divide the calculated sample fraction above by 10 (considering compression ratios could be as high as 90% for some tables).

If the recommended actual sample fraction is equal to or greater than 1.0, please remove the phrase 'sample sample_fraction' from the sampling query as noted in the template.
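
For a concrete illustration using the sample table from Step 1.1 (hypothetical values; the real statement lives in the copied template script): with TEST_USER_PERM_PER_AMP = 400 and Min_Sample_Fraction = 6.52E-4, the actual sample fraction is about 400 * 6.52E-4 ≈ 0.26, and the sampling INSERT-SELECT would look roughly like:

insert into td_blc_test_user.td_blc_test_sample_tb
select * from gw1.baseaddr sample 0.26;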

Step 3.3: Modify the copied test file

Modify the copied test file to use your system name, your dbc password, your database name and table name in all statements that are commented with 'Please modify' inside the file.

You may want to use a replace command instead of finding and modifying manually. Here are the specific items to be replaced:
  1. Replace all 'dbsystem' with your_system_name;
  2. Replace all 'dbc_pwd' with your_dbcpwd;
  3. Replace all 'dbx' with your_db_name;
  4. Replace all 'tablex' with your_table_name.
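
If you prefer not to edit the file manually, the four replacements can be done in one stream-edit command on a Linux client (hypothetical values shown; adjust to your environment and to the file name you chose in Step 3.1):

sed -i 's/dbsystem/mysystem/g; s/dbc_pwd/MyDbcPwd/g; s/dbx/mydb/g; s/tablex/mytable/g' blc_test_mydb_mytable.bteq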

Step 3.4: Run the test script from the command line in the directory where the bteq file is located.

bteq < blc_test_dbx_tablex.bteq > blc_test_dbx_tablex_out.txt 2>&1

Step 3.5: Check the output file.

Make sure there are no unexpected failures or errors. See the last query output as the final report. 

Sample output:

Measurement                               BLC_Change  BLC_CPUcost_SECperGB
----------------------------------------  ----------  --------------------
Compression_Ratio                              97.8%
INS_CPUTime                                   283.7%                  17.6
SEL_CPUTime                                    74.3%                   8.7


Compression Ratio

(tabSpace_noblc – tabSpace_blc) / tabSpace_noblc

E.g., (4,121,608,192 - 89,608,192) / 4,121,608,192

INS_CPUTime BLC_Change (Compression cost, relative)

(ins_blc_CPUTime - ins_noblc_CPUTime) / ins_noblc_CPUTime

E.g., (98.032 – 25.548) / 25.548

SEL_CPUTime BLC_Change (Decompression cost, relative)

(sel_blc_CPUTime - sel_noblc_CPUTime) / sel_noblc_CPUTime

E.g., (83.888 – 48.116) / 48.116

BLC_CPUcost_SECperGB (Absolute cost)

(CPUcost_blc – CPUcost_noblc) / total GB uncompressed data being compressed

E.g., for INSERT (Compression): (98.032 – 25.548) / (4,121,608,192 / 10^9)

        for SELECT (Decompression): (83.888 – 48.116) / (4,121,608,192 / 10^9)
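
If you want to double-check the compression ratio outside the report, the same figure can be derived from the perm space of the two test copies (a hedged sketch that assumes the test table names shown in the Step 1.1 sample output; the template's own report query may differ):

sel (sum(case when tablename = 'td_blc_test_tb_noblc' then currentperm else 0 end)
   - sum(case when tablename = 'td_blc_test_tb_blc' then currentperm else 0 end))
  / nullifzero(sum(case when tablename = 'td_blc_test_tb_noblc' then currentperm else 0 end))
  as compression_ratio
from dbc.tablesize
where databasename = 'td_blc_test_user';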

Before you decide to apply BLC on this table in your production system, you should evaluate the BLC performance impact on your specific queries in a test environment that is similar to the production environment. Please see Appendix A for BLC Evaluation Utility considerations and Appendix B for a list of queries to be evaluated in your test system.

Step 4: Repeat Step 3 for other tables of interest

Step 5: Clean up


Step 5.1: Modify blc_test_cleanup.bteq

Replace 'dbsystem' with your_system_name and 'dbc_pwd' with your_dbcpwd.

Step 5.2: Run the cleanup script

Execute the following command:

bteq < blc_test_cleanup.bteq > blc_test_cleanup_out.txt 2>&1

Appendix A


Estimates are obtained from sample data. The sample data might not represent the full table's data characteristics well.

Compression and decompression overhead estimates could be rough:
  1. Compression overhead is measured from an INS-SEL into an empty table, which does not represent updates.
  2. Decompression overhead is measured from a full table scan, which represents neither single-AMP / primary index access nor complex queries.
  3. Only the specific table of interest is sampled into a test table with the same DDL; cardinality, non-primary indexes and statistics are not preserved.

    Even if you run complex queries over the sample test data, the measurements won't provide accurate estimates.
Before you apply BLC in production, you should evaluate the BLC performance impact of your specific queries in a test environment similar to the production environment; see Appendix B for a list of suggested queries.

Appendix B: Suggested Queries for BLC Evaluation


The following is a list of queries that are suggested to be evaluated in your test system before BLC is applied on your production system.
  1. Primary index queries and modifications on a compressed table.
  2. NUSI criteria queries and modifications on a compressed table where multiple rows are targeted (a NUSI with a reasonable amount of duplicates, but not a large amount).
  3. Large table and small table join with the large table compressed.
  4. Full table modifications on a compressed table.
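For illustration, the first two categories might look like the following against the sample table (the column names are purely hypothetical; substitute your own primary index and NUSI columns):

/* 1. primary index access on the compressed table (addr_id = hypothetical PI column) */
sel * from gw1.baseaddr where addr_id = 12345;

/* 2. NUSI criteria query targeting a moderate number of rows (zip_code = hypothetical NUSI column) */
sel * from gw1.baseaddr where zip_code = '92127';
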
You may also want to experiment with different data sets and CLOBs (note: BLC does not support CLOBs in TD 13.10 yet):
  • Latin data - both directly in the row and in an external CLOB.
  • Unicode data - both directly in the row and in an external CLOB.






7 REPLIES

Re: Block Level Compression evaluation with the BLC utility

Hi,

How can I find out which tables have BLC applied from the database information?

GS

Re: Block Level Compression evaluation with the BLC utility


You can find BLC tables using the Ferret SHOWBLOCKS output.

If you see a '#' sign after 'The total number of Data blocks', then that table is block-level compressed; otherwise it is not.

Re: Block Level Compression evaluation with the BLC utility

In Teradata 14 there is a new Show command to list Compressed tables i.e.

Ferret > SHOWCOMPRESS

Re: Block Level Compression evaluation with the BLC utility

Hi Jenny,

I read the article about Block Level Compression in which one engineering recommendation is that only tables that are more than 5 times as large as total system-wide memory be considered for BLC. My question here is about the system-wide memory: is this something where I need to consider the FSG cache (which is set to 80 on our prod), or do I need to check the memory via the 'free' command on a production Linux node? We have tables with billions of rows, so I would like to calculate the system-wide memory and compare it with the sizes of our biggest tables on the system to see whether BLC is really going to help or not.

I know BLC is the best fit for cold data (where the data gets updated/deleted/inserted once in a while, or not at all). If actively accessed tables are being considered for compression, then BLC will be a better fit for platforms that have plenty of excess CPU. Our prod is almost at 100% CPU utilization. I just wanted to understand how much headroom we should maintain when it says plenty of CPU.

We recently ran into a huge space crunch and are searching for all the options apart from MVC.

Thanks,

Geeta


Re: Block Level Compression evaluation with the BLC utility

Hi Geeta, thanks for the message! I searched but did not find the BLC article you mentioned. Would you email me the link so I may try to help get an answer for you or directly ask that author? My email address is Guilian.Wang@teradata.com

Re: Block Level Compression evaluation with the BLC utility

Sure, Just sent an email.

Re: Block Level Compression evaluation with the BLC utility

Why divide by 96000 when calculating Minimum_Sample_Fraction?