Workload Toolkit – Part 4 – Compression Analysis

Teradata Employee

Ok, so I shouldn’t even need to broach this topic, as I’m sure you have all heard it before: compress, compress, and compress.

However, for some reason, some sites are still not getting it, or they have not adequately incorporated this step into their development process, leading to tables still being implemented without adequate compression. Perhaps the tables were compressed a long time ago but are not taking advantage of advances in compression, such as multi-value compression.

Compression not only decreases your storage space (anyone billing for that?) but also decreases the amount of I/O that needs to be performed when retrieving and spooling queried data. For smaller tables, it can also increase the number of tables cached in memory, eliminating subsequent I/Os for those tables.
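If it has been a while since you have looked at the syntax, here is a minimal sketch of the two flavors on a made-up table (the table, columns, and value lists are illustrative only, not a recommendation for any real schema):

-- Hypothetical demo table; the compress value lists are illustrative.
CREATE SET TABLE sandbox.orders_demo
,NO FALLBACK
(
order_id INTEGER NOT NULL,
region   CHAR(4)  COMPRESS ('EAST'),                    -- single-value (SVC)
status   CHAR(10) COMPRESS ('OPEN','SHIPPED','CLOSED')  -- multi-value (MVC)
)
PRIMARY INDEX ( order_id );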

This query takes a quick look at the type of compression applied to each table, pointing to opportunities for further compression. I am looking at tables 10 GB in size or greater; adjust the threshold to your site's needs:

SELECT dbt.DatabaseName,
       dbt.TableName,
       -- Classify each table's best compression: a comma in the
       -- compress value list means more than one value (MVC).
       MAX(CASE WHEN compressvaluelist IS NOT NULL
                THEN CASE WHEN INDEX(compressvaluelist, ',') > 0
                          THEN '3. MVC '
                          ELSE '2. SVC '
                     END
                ELSE '1. NONE'
           END) AS COMPRESS_TYPE,
       MIN(pds.Current_Perm) AS CURRENT_PERM

FROM dbc.Columns dbt,

     -- Derived table: total current perm per table, summed across AMPs.
     (SELECT t.DatabaseName,
             t.TableName,
             SUM(ts.CurrentPerm) AS Current_Perm
      FROM DBC.Tables t,
           DBC.TableSize ts
      WHERE t.DatabaseName = ts.DatabaseName
        AND t.TableName = ts.TableName
        AND ts.TableName <> 'ALL'
      GROUP BY 1, 2
      HAVING Current_Perm > 10000000000  -- tables 10 GB or larger
     ) pds

WHERE dbt.DatabaseName IN ('AAAA','BBBB','CCCC')
  AND dbt.DatabaseName = pds.DatabaseName
  AND dbt.TableName = pds.TableName

GROUP BY 1, 2

-- HAVING COMPRESS_TYPE = '1. NONE'  -- uncomment to see only uncompressed tables

ORDER BY 1, 3, 4 DESC, 2
;

The query results simply give me a feel for what level of compression has been applied: NONE is bad, SVC (Single-Value Compression) is better, and MVC (Multi-Value Compression) is where we really want to be, if possible.

The results are ordered, within each database, by type of compression and table size: the largest tables with NONE (no compression) first, followed by the largest tables with SVC, and so on. This should help you quickly target the large outliers.

BTW, if you are just starting to compress, or redoing your compression, I propose this methodology:

1. Identify the largest tables in each database, as well as all of the frequently used smaller common reference tables.

2. Work down each database, hitting the largest tables first. For example, take the top 5 from databases A, B, and C (see the sketch after this list); compress and implement them, then move on to the next largest tables.

3. Simultaneously, work through as many of the common reference tables as you can, as quickly as you can, since most of these will be your most widely used tables across applications and will therefore benefit as wide a user base as possible. (Remember, frequency of use equates to more I/O, so making these tables smaller, and potentially cacheable, can have a large impact on overall throughput.)
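And here is one way to pull the top five tables by size for step 2. This is a sketch only; the database list and the cutoff of five are placeholders to adjust for your site:

SELECT ts.DatabaseName,
       ts.TableName,
       SUM(ts.CurrentPerm) AS Current_Perm
FROM DBC.TableSize ts
WHERE ts.DatabaseName IN ('AAAA','BBBB','CCCC')  -- placeholder database list
  AND ts.TableName <> 'ALL'
GROUP BY 1, 2
-- Keep only the five largest tables per database.
QUALIFY ROW_NUMBER() OVER (PARTITION BY ts.DatabaseName
                           ORDER BY SUM(ts.CurrentPerm) DESC) <= 5
ORDER BY 1, 3 DESC;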

Try it at your site and see what you get. If you can, please share some feedback on your results with the rest of us.

As always, comments or questions are welcome.

Good Luck!

Dave

26 REPLIES
Enthusiast

Re: Workload Toolkit – Part 4 – Compression Analysis

Hello!

What are the best practices for adding compression to a table that was created without it?

Example:

Table uncompressed:

CREATE SET TABLE AAAA.TABLE_A,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
COLUMN0 BYTEINT NOT NULL,
COLUMN1 DATE FORMAT 'YYYYMMDD',
COLUMN2 CHAR(20) NOT NULL
)
PRIMARY INDEX ( COLUMN0 )
;

RENAME TABLE AAAA.TABLE_A TO AAAA.TABLE_A_old
;

New Table compressed:

CREATE SET TABLE AAAA.TABLE_A,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
COLUMN0 BYTEINT NOT NULL,
COLUMN1 DATE FORMAT 'YYYYMMDD',
COLUMN2 CHAR(20) NOT NULL COMPRESS ('CONSTANT0', 'CONSTANT1', 'CONSTANT2', 'CONSTANT3')
)
PRIMARY INDEX ( COLUMN0 )
;

INSERT INTO AAAA.TABLE_A
SELECT * FROM AAAA.TABLE_A_old
;

DROP TABLE AAAA.TABLE_A_old
;

Finally, collect statistics on the table AAAA.TABLE_A.
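For example (the column choices are just illustrative):

COLLECT STATISTICS ON AAAA.TABLE_A COLUMN (COLUMN0);
COLLECT STATISTICS ON AAAA.TABLE_A COLUMN (COLUMN1);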

Is there a better solution for it?

Regards,
Teradata Employee

Re: Workload Toolkit – Part 4 – Compression Analysis

This is the process I generally see used. ALTER TABLE options are limited, especially if there is any volume in the table.

The other thing to remember is that when the new table is created, any permissions on the old table need to be applied to it as well.
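One way to see what has to be re-granted is to check the rights view before dropping the old table. Just a sketch, using the names from your example:

-- List the users and rights held on the old table so the same
-- GRANTs can be applied to the new one.
SELECT UserName,
       AccessRight,
       GrantAuthority
FROM DBC.AllRights
WHERE DatabaseName = 'AAAA'
  AND TableName = 'TABLE_A_old'
ORDER BY 1, 2;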

Dave
Enthusiast

Re: Workload Toolkit – Part 4 – Compression Analysis

Thank you very much for your reply Dave!
Teradata Employee

Re: Workload Toolkit – Part 4 – Compression Analysis

Dave,

Compression is usually something everyone would LOVE to do - BUT the tools available are either limited, do not make the right decisions (e.g. they compress all values up to and including the first 256 values), or are cumbersome to use. What would you recommend (that you know of) to address these extremely time-consuming issues? Provided there is a good tool (or tool set) to better automate the analysis and implementation, everyone would be 'compressing' and life in your world would be good ;-) Your approach is good; it's the execution of the analysis on 'what to do' that is the laborious part, and where a better tool would help.

JK
Fan

Re: Workload Toolkit – Part 4 – Compression Analysis

Hi Dave,
I am a newbie to Teradata.
I just wanted to know what a Teradata DBA does every morning -
checking things like nightly backups, permanent space, spool space, etc.
What else does he do?
Thanks in advance
Enthusiast

Re: Workload Toolkit – Part 4 – Compression Analysis

AtanaSoft has a pretty good tool called CompressTool (there is a pretty hefty price tag on it, though). Dieter also has a pretty good compression algorithm that I have used quite a bit in the past.

Teradata Employee

Re: Workload Toolkit – Part 4 – Compression Analysis

Goldminer - I know Compress Tool well - I was a BETA tester for it. For one customer, by compressing only 11 of their big tables with Compress Tool (a single run), they were able to save more than the price of the tool! So I think it's a matter of perception - OR, as Teradactyls, we got used to getting 'free/cheap tools' from TD (e.g. Queryman and WinDDI) - not really free, but with thousands of copies all over the world, even at TD sites, they give that impression. AND, SQL Assistant can't hold a candle to AtanaSoft's Query Tool! Horses for courses, I guess. For me, I purchased AtanaSuite for myself while I was still working for TD. Now that I've left and am a DBA at a small TD customer, I still have it. If you've ever used Compress Tool in earnest, you'd never want anything else for doing compression. Haven't seen anything close to it - yet. But then again, that's just me, for what it's worth on today's market ;-) Thanks for your input.
Teradata Employee

Re: Workload Toolkit – Part 4 – Compression Analysis

Dear DN. As a DBA on a small site (small in data, not complexity), I use the PCMP suite of data collection facilities. Scripts run each night to scrape the DBC tables (including the DBQL tables) and put them off in history tables. From there, there are a lot of reports you can run.

Also, I have TD Manager Alerts set up to monitor my 'data databases', and if any of the databases get within X% of their limit, I get an email message and then go in and take care of it. You can also create a BTEQ script or (I prefer) a stored procedure to run daily, check DB sizes, and if it finds any over X%, allocate more space using dynamic SQL. So you can automate a lot of the monitoring activities, in several ways. Hope this helps.
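For example, the daily size check at the heart of that stored procedure can be as simple as this (a sketch; the 90% threshold is an assumption to adjust for your site):

-- Flag any database more than 90 percent full (threshold is an
-- assumption; feed the result into an alert or dynamic-SQL MODIFY).
SELECT DatabaseName,
       SUM(CurrentPerm) AS UsedPerm,
       SUM(MaxPerm)     AS MaxPerm,
       100.0 * SUM(CurrentPerm) / NULLIFZERO(SUM(MaxPerm)) AS PctUsed
FROM DBC.DiskSpace
GROUP BY 1
HAVING PctUsed > 90
ORDER BY PctUsed DESC;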