Blog

The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

02-17-2012
03:11 PM

- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Email to a Friend
- Printer Friendly Page
- Report Inappropriate Content

02-17-2012
03:11 PM

Accurately identifying how much a table has grown since statistics on a column were last collected is an important prerequisite for the statistics extrapolation process. Once table growth has been detected, then stale statistics can be adjusted accordingly. Extrapolations based on table growth will be be more accurate if statistics are being regularly collected on the system-derived column PARTITON, whether or not a table is partitioned.

As mentioned in the earlier posting of mine on this topic, a non-partitioned primary index (NPPI) table is treated as if it were just one big partition with a partition number of 0. When PARTITION statistics are collected, whether on a PPI or a NPPI table, the cylinder index is scanned in order to establish the number of rows per partition. In the case of a NPPI table, the number of rows in partition 0 will the same as the table row count. Collecting statistics on PARTITON is a very fast operation because base table rows are not touched.

When the data distribution of a table is relatively uniform, extrapolation can offer reasonably accurate row count estimates when a histogram is outdated. If stale data along with table growth is detected, extrapolation will make an upward adjustment to these values:

- Table row count
- Number of distinct values
- Now of rows estimated for single table selection

The enhanced table growth detection mechanism where PARTITION statistics can make a difference requires that version 4 type histograms are being used. Version 4 of the statistics histogram has been in use from Teradata 12.0.3.1 and Teradata 13.0.0.21 releases, and in all later releases.

With Version 4 histograms, if a statistic being collected is on the PI or system-derived column PARTITION, then a special random AMP sample (RAS) is collected at the same time and is stuffed into the histogram. This RAS is saved for future use with extrapolation--extrapolation for any index or column within that table that has had statistics aleady collected.

When a query is being optimized, a new random AMP sample (RAS) is always collected for each table in the request. Additionally, PI or the PARTITION histograms for a given table will be looked for, and read if they exist. If both PI and PARTITON histograms exist, the most recent will be used to determine the staleness of whatever other index or column statistic is needed on the same table.

The old RAS value number of rows (taken at the time either the PI or the PARTITION statistic was collected) will be compared to the current RAS value number of rows that the optimizer just collected, as a means of identifying table growth. The delta between those two row counts will indicate how much growth that table has undergone since the PI or PARTITION stats were collected.

If neither PI nor PARTITION statistics are available, then a row count estimate from the current RAS is used to estimate the current table row count. This current RAS estimate is compared against that statistic’s histogram row count. Comparing actual rows counts in an old histogram against a current RAS estimate is a less accurate indicator of table growth than you would get if PARTITION or PI stats existed. When the old and current RAS comparison is made, the determination of table growth is drawn from row count estimates originating from the same single AMP. Unless there is skewing in how the table grows (some AMPs have added a lot more rows, other AMPs none), this approach has proven to be more accurate, particularly when rows are distributed unevenly across all AMPs.

The algorithm that compares old RAS with new RAS (where PARTITION statistics play a role) provides reasonable results in most cases. But there are a couple of things to keep in mind with table growth detection:

- If tables have a lot of variable data, or are using compression, or have uneven data growth across AMPs, the estimate based on old RAS vs. new RAS may not be accurate.
- Only upward growth is detected, so if PPI table has dropped a partition and then added a partition of a similar size since statistics were last collected, the table would appear to have not grown and no extrapolation will take place.

Be aware that if you attempt to use a sampling (USING SAMPLE) when collecting statistics on PARTITION, it will be ignored. Internally the sample percent will be automatically raised to 100%, so you will always get full statistics when you collect on PARTITION. The same recommendation applies (collect only full statistics) for multicolumn statistics that include PARTITION.

A greater level of detail on how statistics extrapolation works is available in the Orange Book titled “Optimizer Statistics Extrapolations” by Rama Korlapati, September 2010.

7 Comments

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.