Workload Toolkit - Part 3 - Statistics Analysis

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Teradata Employee

Workload Toolkit - Part 3 - Statistics Analysis

In Part 3 of this series, we will take a quick look at how statistics are implemented and maintained at your site.

Statistics Collection can be a complicated and very deep-dive topic, with discussions on the frequency of collection, whether to use sampled stats, automation strategies, etc. This analysis is not going to go that deep, it is a high-level look at the statistics on the tables, and I am looking for just two things:

  • Are statistics applied to the tables or missing?
  • For those that are applied, is there consistency in the application and collection process?

Here is the analysis query that I use:

Note: this query leverages a version of Dieter Noeth's Stats_Info view (if you do not already have this, I am sure a web search will turn it up: try www.teradataforum.com/attach.htm, or ask your friendly on-site Teradata PS Consultant).

SELECT S.DATABASENAME,
S.TABLENAME,
S.COLLECTDATE (FORMAT 'YYYY-MM-DD'),
CASE WHEN D.VALUES_SUM = 0
THEN 'ZERO OR NO STATS'
ELSE 'DIFF COLL DATES '
END CHECK_STATS

FROM TOOLSDB.Stats_Info_64bit S,

(
SELECT DATABASENAME,
TABLENAME,
COUNT(DISTINCT COLLECTDATE) NUM_DATES,
SUM(COALESCE(NumOfValues,0)) VALUES_SUM

FROM TOOLSDB.Stats_Info_64bit

WHERE DATABASENAME IN ('AAAA','BBBB')

GROUP BY 1,2

HAVING NUM_DATES > 1
OR VALUES_SUM = 0

) D

WHERE S.DATABASENAME = D.DATABASENAME
AND S.TABLENAME = D.TABLENAME

GROUP BY 1,2,3,4
ORDER BY 1,2,3
;

 And here is sample output from the query:

DatabaseName                     TableName                        CollectDate   CHECK_STATS     
============================== ============================== =========== ================

AAAA TABLE-A 2010-01-29 ZERO OR NO STATS

AAAA TABLE-B 2010-01-29 ZERO OR NO STATS

AAAA TABLE-C 2010-01-29 DIFF COLL DATES
AAAA TABLE-C 2010-02-16 DIFF COLL DATES
AAAA TABLE-C 2010-03-01 DIFF COLL DATES
AAAA TABLE-C 2010-05-16 DIFF COLL DATES
AAAA TABLE-C 2010-05-30 DIFF COLL DATES

AAAA TABLE-D 2010-01-29 DIFF COLL DATES
AAAA TABLE-D 2010-04-29 DIFF COLL DATES
AAAA TABLE-D 2010-05-30 DIFF COLL DATES

AAAA TABLE-E 2010-03-17 DIFF COLL DATES
AAAA TABLE-E 2010-05-30 DIFF COLL DATES

BBBB TABLE-F 2010-01-29 ZERO OR NO STATS

BBBB TABLE-G 2010-03-08 DIFF COLL DATES
BBBB TABLE-G 2010-05-31 DIFF COLL DATES

BBBB TABLE-H 2010-03-08 DIFF COLL DATES
BBBB TABLE-H 2010-05-31 DIFF COLL DATES

So, what does this output tell me?

First, the zero or no stats designation tells me that either this is a brand new table - not yet populated - and hence no statistics. Or, the process for statistics collection is missing this table. A quick select count(*) from <table> will tell you which case it is.

Second, for those rows that have multiple collection dates, it tells me that whatever process is being used to collect these statistics is not collecting ALL of the statistics on the table at the same time. This means that the demographics are potentially "out of sync" and could be leading to inefficient query plans.

There may be valid reasons for different collection dates; one that comes to mind would be collection of statistics more frequently on the partition and partioning column. If this is the case, you should see it as a pattern, i.e. two dates only a week apart, etc.

If you are seeing either of these issues, it tells you that you need to investigate and fix the process you are using for collection of statistics, because you are not giving the optimizer the best information you can, which will lead to less-efficient plans and non-optimal performance.

Try it at your site and see what you get. If you can, please share some feedback on your results with the rest of us.

As always, comments or questions are welcome. Good Luck!

2 REPLIES
Enthusiast

Re: Workload Toolkit - Part 3 - Statistics Analysis

Hi David - Where to locate Table/view - TOOLSDB.Stats_Info_64bit ?
Teradata Employee

Re: Workload Toolkit - Part 3 - Statistics Analysis

Sorry for the late response. Look at the web link just before the query text for the stats queries. View name is our own, but logic is on web site.