compression calculations

Database
Enthusiast

compression calculations

Hi,

 I've a requirement of finding the columns which are best suited candidates for compression. I've around 80 tables in my database. Can somebody suggest if we have any SQL's which give us calculations related to compression so that columns for compressin can be decided.

Thanks in advance 

5 REPLIES
Enthusiast

Re: compression calculations

Please refer below link for more information on compression and identifying and evaluating candidate tables for Compression:

http://developer.teradata.com/extensibility/articles/block-level-compression-evaluation-with-the-blc...

Enthusiast

Re: compression calculations

Kishore, I believe Shashi is looking for MVC of the columns in a table, but not BLC of the table.

@Shashi, Please let me know if otherwise.

The best columns that would be candidates for MVC are those which are having a high frequency of the same value.  MVC is primarily used in a good chunk for space saving  and there are additional savings like reduction in I/O.There are many things to be considered while implementing MVC of which the following will help:

1. How many of the values in a column are repeating.

2. How frequently that value exists in the column.

3. How frequently that column is updated. the more the column value is update, it is not ideal for MVC.

4. if we have a column which is char(100) and the length maximum value in that column is not more than 20, can we consider changing the column to char(20) or even Varchar(20)~ This will also help in space redution of the table.

5. The number of values compressed in a column cannot be more than 255(try to use the values for compression in multiples).

6. Also the tableheader could not be crossing a certain limit depending on the version you are in (TD12 supports 128KB of tableheader).

There is no  official tool from Teradata to identify the columns that qualify for MVC. An extensive analysis on the tables and the data is the way out. There are other tools in the market which can be looked for

Enthusiast

Re: compression calculations

Bteq script can be used to calculate best options for MVC by considering datatypes, frequency of repeating values, distinct value count. This MVC list can then be reviewed for actual perm space saving and used.

Regarding 4 point from Krishaneesh, varchar(100) will be able to support what char(100) was doing and also save spaces on mostly 20 character data.

Enthusiast

Re: compression calculations

Thanks for the response.. I was referring to MVC and I got a tool called PRISE.

Enthusiast

Re: compression calculations

Try this not sure whether it is useful. Please modify condition as you wish


--*****************************************************************

--**  Create Table Column compression candidate 

--**  Through the following selection Criterea

--**     1- Table Size

--**     2- Ratio; No of occurences/row count

--**     3- Column Length

--**     4- Column Compressible

--**     5- Not part of P, S or any other indices type

--**     6- Varchar is compressible from Rel 13.10 but can be excluded

--**

--**

--** A Ekladios

--** V2.0

--** 14/7/2014

--**

--*****************************************************************

SELECT       AE1.R1 ||  CASE WHEN ColumnID < CNT THEN ' UNION ' ELSE ' ; ' END       

 FROM 

             (    

            SELECT        ' SELECT  ' || '''' ||      'ALTER TABLE  ?DB.?TB   ADD   ' || TRIM ( C.ColumnName) || '  COMPRESS  (  ' ||  '''' ||  '||' || 

                                       '  CASE WHEN COL_VAL IS NULL THEN ' || '''' || 'NULL' || '''' || ' ELSE  Col_Val  END ' || '||'   || '''' || 

                                   '  ) ; ' || '''' || 

                               '    FROM (  

                                     SELECT ' || TRIM ( C.ColumnName ) || '  AS Col_Val   , ' || '''' || TRIM ( C.ColumnName ) || ''''  || '  AS Col_Name  ,  COUNT ( * ) AS  CNT  FROM '

                                              || TRIM ( C.DatabaseName ) || '.' || TRIM (C.TableName ) || '  GROUP BY 1 , 2       HAVING CNT >  ( SELECT  ?Ratio 

                                               *  COUNT (*) FROM ' 

                                              || TRIM ( C.DatabaseName) || '.' || TRIM (C.TableName ) || '   )   '  

                                                  || ' ) AA   '   AS R1

                                      , ColumnId  

                                      , MAX (ColumnID) OVER (  ORDER BY ColumnID ) AS CNT                      

              FROM               DBC.COLUMNS  C

                                          INNER JOIN

                                        DBC.TableSize       S

                                          ON      C.DatabaseName =  S.DatabaseName 

                                          AND   C.TableName        =  S.TableName

                                          INNER JOIN

                                        DBC.Indices           I

                                           ON      C.DatabaseName =  I.DatabaseName 

                                           AND   C.TableName        =  I.TableName

              WHERE            I.ColumnName <>  C.ColumnName

              AND                  C.ColumnLength > 10

                  /**

                      DO NOT Compress If table was already comressed

                      If this is not the case then comment out CompressValueList

                  **/

              AND                  C.CompressValueList IS NULL  

              AND                  C.DatabaseName = '?DB'

              AND                  C.TableName = '?TB'             

              HAVING           SUM ( S.CurrentPerm) /1e9 > 10

              GROUP BY     1 , 2

              ) AE1 (R1 , ColumnId , CNT  ) 

  ORDER BY        ColumnID

;