In Part 3 of this series, we will take a quick look at how statistics are implemented and maintained at your site.
Statistics Collection can be a complicated and very deep-dive topic, with discussions on the frequency of collection, whether to use sampled stats, automation strategies, etc. This analysis is not going to go that deep, it is a high-level look at the statistics on the tables, and I am looking for just two things:
Here is the analysis query that I use:
Note: this query leverages a version of Dieter Noeth's Stats_Info view (if you do not already have this, I am sure a web search will turn it up: try www.teradataforum.com/attach.htm, or ask your friendly on-site Teradata PS Consultant).
S.COLLECTDATE (FORMAT 'YYYY-MM-DD'),
CASE WHEN D.VALUES_SUM = 0
THEN 'ZERO OR NO STATS'
ELSE 'DIFF COLL DATES '
FROM TOOLSDB.Stats_Info_64bit S,
COUNT(DISTINCT COLLECTDATE) NUM_DATES,
WHERE DATABASENAME IN ('AAAA','BBBB')
GROUP BY 1,2
HAVING NUM_DATES > 1
OR VALUES_SUM = 0
WHERE S.DATABASENAME = D.DATABASENAME
AND S.TABLENAME = D.TABLENAME
GROUP BY 1,2,3,4
ORDER BY 1,2,3
And here is sample output from the query:
DatabaseName TableName CollectDate CHECK_STATS
============================== ============================== =========== ================
AAAA TABLE-A 2010-01-29 ZERO OR NO STATS
AAAA TABLE-B 2010-01-29 ZERO OR NO STATS
AAAA TABLE-C 2010-01-29 DIFF COLL DATES
AAAA TABLE-C 2010-02-16 DIFF COLL DATES
AAAA TABLE-C 2010-03-01 DIFF COLL DATES
AAAA TABLE-C 2010-05-16 DIFF COLL DATES
AAAA TABLE-C 2010-05-30 DIFF COLL DATES
AAAA TABLE-D 2010-01-29 DIFF COLL DATES
AAAA TABLE-D 2010-04-29 DIFF COLL DATES
AAAA TABLE-D 2010-05-30 DIFF COLL DATES
AAAA TABLE-E 2010-03-17 DIFF COLL DATES
AAAA TABLE-E 2010-05-30 DIFF COLL DATES
BBBB TABLE-F 2010-01-29 ZERO OR NO STATS
BBBB TABLE-G 2010-03-08 DIFF COLL DATES
BBBB TABLE-G 2010-05-31 DIFF COLL DATES
BBBB TABLE-H 2010-03-08 DIFF COLL DATES
BBBB TABLE-H 2010-05-31 DIFF COLL DATES
So, what does this output tell me?
First, the zero or no stats designation tells me that either this is a brand new table - not yet populated - and hence no statistics. Or, the process for statistics collection is missing this table. A quick select count(*) from <table> will tell you which case it is.
Second, for those rows that have multiple collection dates, it tells me that whatever process is being used to collect these statistics is not collecting ALL of the statistics on the table at the same time. This means that the demographics are potentially "out of sync" and could be leading to inefficient query plans.
There may be valid reasons for different collection dates; one that comes to mind would be collection of statistics more frequently on the partition and partioning column. If this is the case, you should see it as a pattern, i.e. two dates only a week apart, etc.
If you are seeing either of these issues, it tells you that you need to investigate and fix the process you are using for collection of statistics, because you are not giving the optimizer the best information you can, which will lead to less-efficient plans and non-optimal performance.
Try it at your site and see what you get. If you can, please share some feedback on your results with the rest of us.
As always, comments or questions are welcome. Good Luck!