Statistics Collection Recommendations for Teradata 12

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Hi Carie,
Thanks for such an informative post.
We discussed what to collect and when to collect it, the various forms of it as well. I would be intrested to know which ones should NOT be collected. One of my frirnds recently suggested not to collect stats on a highly non unique column. Are there any such recommendations as well?

Thanks.
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Hi Carie,

Its a great Article.

I am in process of creating Stats Collection Process. I am targetting below thresholds for any table if require stats collection i.e.

1) Small Tables (if any growth ) than Collect Statistics
2) Large Tables (if growth >10) than collect Statistics
3) Otherwise Collect Stats on the Partition

No my question if my thresholds values are correct than incase of Large Partitioned table will this still be valid. Since there can a scenario where 2% increase in growth is recorded but in new partition. In that case we need to collect stats on the parition only or also on partitioning columns ?

Also a stupid question, how can we find partitioning column information from dbc without parsing DDL.

Thanks
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Laeeq,

You are correct about the re-collection rules being somewhat different for partitioned tables (PPI). If growth is greater than 10% in any one partition, it is advisable to recollect statistics on both PARTITION and partitioning column.

Prior to Teradata 14.0, the only way to get partitioning column information is by parsing DDL. In Teradata 14.0 however, there a column "PartitioningColumn" carried in TVFields (and can be seen in the Columns view). It has a value of Y if the column is a partitioning column. If a second column "ColumnPartitionNumber" is zero, that indicates the column is the partitioning column for a row-partitioned table (as opposed to a column-partitioned table partition).

Thanks, -Carrie
Junior Contributor

Re: Statistics Collection Recommendations for Teradata 12

Hi Carrie/Laeeq,
instead of parsing table DDL partitioning columns can be extracted from dbc.TableConstraints, too.

See "PartitioningColumn" in my new stats query:
http://developer.teradata.com/blog/dnoeth/2011/12/how-to-decode-the-binary-statistics-stored-in-dbc-tables

The number of partitions is also returned: "NumValues" for PARTITION stats ("StatsType"='Part')

Dieter
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Thanks, Dieter!
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Thanks Dieter/Carrie!!

Re: Statistics Collection Recommendations for Teradata 12

Hello Carrie,
In the case of a large partitioned table, after have a change of 10%, I need re-collect stats on PARTITION and partition column.
My question: at this point (after this 10% change) I need to re-collect in other columns of this table that have statistics? Or it depends on other factors of each column. Like any individual characteristic for example?
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

You do not necessarily need to recollect statistics on columns unrelated to partitioning at the same time as you recollect on PARTITON and the partitioning columns. However, if the entire table has changed by 10%, it is usually advisable to do so. It depends on how growth has impacted the current statistics for those columns.

Generally, with PARTITION and partitioning column(s) you want to recollect if any INDIVIDUAL PARTITION has changed by 10%, rather than waiting until the entire table has changed by 10%. In that case you could probably wait until the table has changed by 10% to do the other recollections.

Just a reminder, If you have multicolumn statistics that include either PARTITION or the partitioning column, they need to be recollected when the number of rows in any one partition changes by 10%.

Thanks, -Carrie
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Hello Carrie,

Thanks a lot for providing a quick summary for statistics collection recommendations.

I have following doubts in statistics collection using Teradata Statistics Wizard:-

1. What are stale statistics?

2. Collection of statistics is advisable to be collected as per recommendation for a particular workload or for all tables which are part of the workload?

3. If I define a workload using "Create from SQL statement" and then get recommendations for collecting statistics on that particular workload, recommendation would include statistics collection on columns/ indexes for the query mentioned in workload. Here it might happen that statistics are collected on some columns/ indexes which may be not required for other queries. In other words, would the statistics collected for a particular query be stale for other queries?

4. Is it advisable to collect statistics for all tables that are frequently used regularly?

Thanks,
Shrey
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Shrey, My responses are below your questions:

1. What are stale statistics?

Stale statistics come about when statistics have been collected in the past, but the table has grown since the stats were collected, so the statistics are no longer accurate. Only tables that undergo growth without statistics recollections will have stale statistics.

2. Collection of statistics is advisable to be collected as per recommendation for a particular workload or for all tables which are part of the workload?

These are general recommendations to be applied depending on need. They are recommendations at the table level.

3. If I define a workload using "Create from SQL statement" and then get recommendations for collecting statistics on that particular workload, recommendation would include statistics collection on columns/ indexes for the query mentioned in workload. Here it might happen that statistics are collected on some columns/ indexes which may be not required for other queries. In other words, would the statistics collected for a particular query be stale for other queries?

No. Statistics are on the table. All queries that access that table will use those statistics, if they need them. If the table has increased its row count, then its statistics may be stale. The optimizer will recognize that the statistics are stale for all queries that access the table and require that statistics. The concept of "staleness" is not query by query. A particular stale statistic may not be used for a given query against that table, in which it case it will not matter that it is stale. It only matters if a statistic is stale if a given query is making use of the column(s) the statistic represents.

4. Is it advisable to collect statistics for all tables that are frequently used regularly?

Yes. But decisions about which statsitics are appropriate should be based on query access patterns.

There are several orange books on the topic of statistics collection that are available in the orange book repository if you require additional information.

Thanks, -Carrie