## Workload Toolkit - Part 3 - Statistics Analysis

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.

## Workload Toolkit - Part 3 - Statistics Analysis

In Part 3 of this series, we will take a quick look at how statistics are implemented and maintained at your site.

Statistics Collection can be a complicated and very deep-dive topic, with discussions on the frequency of collection, whether to use sampled stats, automation strategies, etc. This analysis is not going to go that deep, it is a high-level look at the statistics on the tables, and I am looking for just two things:

• Are statistics applied to the tables or missing?
• For those that are applied, is there consistency in the application and collection process?

Here is the analysis query that I use:

Note: this query leverages a version of Dieter Noeth's Stats_Info view (if you do not already have this, I am sure a web search will turn it up: try www.teradataforum.com/attach.htm, or ask your friendly on-site Teradata PS Consultant).

`SELECT S.DATABASENAME,       S.TABLENAME,       S.COLLECTDATE (FORMAT 'YYYY-MM-DD'),       CASE WHEN D.VALUES_SUM = 0            THEN 'ZERO OR NO STATS'            ELSE 'DIFF COLL DATES '            END CHECK_STATS  FROM TOOLSDB.Stats_Info_64bit S,( SELECT DATABASENAME,       TABLENAME,       COUNT(DISTINCT COLLECTDATE) NUM_DATES,       SUM(COALESCE(NumOfValues,0)) VALUES_SUM  FROM TOOLSDB.Stats_Info_64bit WHERE DATABASENAME IN ('AAAA','BBBB')GROUP BY 1,2HAVING NUM_DATES > 1    OR VALUES_SUM = 0) DWHERE S.DATABASENAME = D.DATABASENAME  AND S.TABLENAME    = D.TABLENAMEGROUP BY 1,2,3,4ORDER BY 1,2,3;`

And here is sample output from the query:

`DatabaseName                     TableName                        CollectDate   CHECK_STATS     ==============================   ==============================   ===========   ================AAAA                             TABLE-A                           2010-01-29   ZERO OR NO STATSAAAA                             TABLE-B                           2010-01-29   ZERO OR NO STATSAAAA                             TABLE-C                           2010-01-29   DIFF COLL DATES AAAA                             TABLE-C                           2010-02-16   DIFF COLL DATES AAAA                             TABLE-C                           2010-03-01   DIFF COLL DATES AAAA                             TABLE-C                           2010-05-16   DIFF COLL DATES AAAA                             TABLE-C                           2010-05-30   DIFF COLL DATES AAAA                             TABLE-D                           2010-01-29   DIFF COLL DATES AAAA                             TABLE-D                           2010-04-29   DIFF COLL DATES AAAA                             TABLE-D                           2010-05-30   DIFF COLL DATES AAAA                             TABLE-E                           2010-03-17   DIFF COLL DATES AAAA                             TABLE-E                           2010-05-30   DIFF COLL DATES BBBB                             TABLE-F                           2010-01-29   ZERO OR NO STATSBBBB                             TABLE-G                           2010-03-08   DIFF COLL DATES BBBB                             TABLE-G                           2010-05-31   DIFF COLL DATES BBBB                             TABLE-H                           2010-03-08   DIFF COLL DATES BBBB                             TABLE-H                           2010-05-31   DIFF COLL DATES `

So, what does this output tell me?

First, the zero or no stats designation tells me that either this is a brand new table - not yet populated - and hence no statistics. Or, the process for statistics collection is missing this table. A quick select count(*) from <table> will tell you which case it is.

Second, for those rows that have multiple collection dates, it tells me that whatever process is being used to collect these statistics is not collecting ALL of the statistics on the table at the same time. This means that the demographics are potentially "out of sync" and could be leading to inefficient query plans.

There may be valid reasons for different collection dates; one that comes to mind would be collection of statistics more frequently on the partition and partioning column. If this is the case, you should see it as a pattern, i.e. two dates only a week apart, etc.

If you are seeing either of these issues, it tells you that you need to investigate and fix the process you are using for collection of statistics, because you are not giving the optimizer the best information you can, which will lead to less-efficient plans and non-optimal performance.

Try it at your site and see what you get. If you can, please share some feedback on your results with the rest of us.

As always, comments or questions are welcome. Good Luck!

2 REPLIES
Enthusiast

## Re: Workload Toolkit - Part 3 - Statistics Analysis

Hi David - Where to locate Table/view - TOOLSDB.Stats_Info_64bit ?