How to decide compression?


How to decide compression?

Hi friends,

I'm new to this forum and only using Teradata for a few months. My team has been given task to reduce the size of 40 most critical tables. The design of the tables was poor, so a lot of data types are unnecesarily big with no COMPRESS or only 1-2 values. Also indexes are poor, but I've got this part covered by another thread i found here.

Is there any trick/query to quickly review data types and check compression? I know I need to look at the frequency of the values but not sure how many to put there. Top 10? Also, when I'm getting the frequencies the query runsh awfully long time...

Glad for any help.

Thanks, Yurij

Tags (1)

Re: How to decide compression?

Hi Yurij, 

The quick answer on compression would be: use sampling and 1% rule.

Now if you continued reading, I can give you a bit more insight into your task. If you want to get optimal multi-value compression list, you would need to understand how presence bits work in TD. It might be beneficial to add less frequent values than 1% also or remove a few. The optimal number of values in compression list also depends on given the datatype (you get way more savings from compressing CHAR(20) than CHAR(2)). I've added some coments on presence bits here, check it out:

Also consider, whether the value could be subject to change (the optimum needs to be reviewed on a regular basis for critical tables) plus there are obviously wrong columns to consider compression (i.e. BUSINESS_DATE on a daily refreshed table).

Last but not least, there is procedure/macro available to do this semi-automatically table by table. Most importantly, if you want to check your DWH fully and do this task automatically, there are few great tools. We've been using tool named ATO on our project, it worked as a charm and it reviews also data-types, 1-click deployment, check out