Accurately identifying how much a table has grown since statistics on a column were last collected is an important prerequisite for the statistics extrapolation process. Once table growth has been detected, then stale statistics can be adjusted accordingly. Extrapolations based on table growth will be be more accurate if statistics are being regularly collected on the system-derived column PARTITON, whether or not a table is partitioned.
As mentioned in the earlier posting of mine on this topic, a non-partitioned primary index (NPPI) table is treated as if it were just one big partition with a partition number of 0. When PARTITION statistics are collected, whether on a PPI or a NPPI table, the cylinder index is scanned in order to establish the number of rows per partition. In the case of a NPPI table, the number of rows in partition 0 will the same as the table row count. Collecting statistics on PARTITON is a very fast operation because base table rows are not touched.
When the data distribution of a table is relatively uniform, extrapolation can offer reasonably accurate row count estimates when a histogram is outdated. If stale data along with table growth is detected, extrapolation will make an upward adjustment to these values:
The enhanced table growth detection mechanism where PARTITION statistics can make a difference requires that version 4 type histograms are being used. Version 4 of the statistics histogram has been in use from Teradata 126.96.36.199 and Teradata 188.8.131.52 releases, and in all later releases.
With Version 4 histograms, if a statistic being collected is on the PI or system-derived column PARTITION, then a special random AMP sample (RAS) is collected at the same time and is stuffed into the histogram. This RAS is saved for future use with extrapolation--extrapolation for any index or column within that table that has had statistics aleady collected.
When a query is being optimized, a new random AMP sample (RAS) is always collected for each table in the request. Additionally, PI or the PARTITION histograms for a given table will be looked for, and read if they exist. If both PI and PARTITON histograms exist, the most recent will be used to determine the staleness of whatever other index or column statistic is needed on the same table.
The old RAS value number of rows (taken at the time either the PI or the PARTITION statistic was collected) will be compared to the current RAS value number of rows that the optimizer just collected, as a means of identifying table growth. The delta between those two row counts will indicate how much growth that table has undergone since the PI or PARTITION stats were collected.
If neither PI nor PARTITION statistics are available, then a row count estimate from the current RAS is used to estimate the current table row count. This current RAS estimate is compared against that statistic’s histogram row count. Comparing actual rows counts in an old histogram against a current RAS estimate is a less accurate indicator of table growth than you would get if PARTITION or PI stats existed. When the old and current RAS comparison is made, the determination of table growth is drawn from row count estimates originating from the same single AMP. Unless there is skewing in how the table grows (some AMPs have added a lot more rows, other AMPs none), this approach has proven to be more accurate, particularly when rows are distributed unevenly across all AMPs.
The algorithm that compares old RAS with new RAS (where PARTITION statistics play a role) provides reasonable results in most cases. But there are a couple of things to keep in mind with table growth detection:
Be aware that if you attempt to use a sampling (USING SAMPLE) when collecting statistics on PARTITION, it will be ignored. Internally the sample percent will be automatically raised to 100%, so you will always get full statistics when you collect on PARTITION. The same recommendation applies (collect only full statistics) for multicolumn statistics that include PARTITION.
A greater level of detail on how statistics extrapolation works is available in the Orange Book titled “Optimizer Statistics Extrapolations” by Rama Korlapati, September 2010.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.