You may have already heard the recommendation on collecting statistics on the system-derived column PARTITION on partitioned primary index (PPI) tables. What I want to address in this blog may sound strange, but there are times when collecting statistics on PARTITION makes sense for a NON-PARTITIONED primary index (NPPI) table as well.
I’ve supplied some examples below of why you might want to do that, and what you’ll see if you do. By the way, the NPPI table is treated as if it is just one big partition with a partition number of 0.
Here’s a couple of reasons to consider collecting PARTITION stats on NPPI tables:
Prior to Teradata 13.0 (which improves the SELECT COUNT(*) efficiency), collecting stats on PARTITION and then accessing the histogram detail is often faster than issuing a SELECT COUNT(*) from the non-PPI table. This could be helpful if you are in a hurry and want to determine the current number of rows in a table, and it’s a big table. Here’s an example that illustrates this point from my Teradata 12 test system:
As shown above, depending on the number of rows in the table, collecting statistics on PARTITION can be quite a bit faster than doing the SELECT COUNT(*).
Of course, just collecting statistics does not in and of itself give you the table’s row count. You have to look at the actual histogram counts, by issuing this statement:
HELP STATS item COLUMN PARTITION;
You will get two rows back in the histogram. Here are those two rows (with just a subset of the columns illustrated) that you will get back after collecting statistics on PARTITION on the Item table:
The field labeled “Number of Rows” provides an accurate row count of this table at the time the statistics were collected.
Here is the SELECT COUNT(*) result:
select count(*) from item;
Starting in Teradata 12.0.3, Teradata 13.10 (and possibly Teradata 13.0.1) there are some new enhancements in the optimizer around the determination of stale statistics. As in previous releases, statistics extrapolation will happen only if stale statistics have been determined, plus the table is not a small table, plus there is no significant skew present.
With this enhancement, better approaches to table growth detection logic have evolved. As part of that enhancement, the table row count in the PARTITION statistic histogram can be conditionally used to help determine if the table has grown, and by how much. PARTITION stats come into play only if PARTITION statistics have actually been collected on the table and only if they offer the most recent row count demographic. This means that frequent recollection on system-derived column PARTITION for growing tables can help the database provide better statistics extrapolation, while still relying on single-AMP random AMP sampling.
Since collecting statistics on PARTITION is so quick, even for larger tables, collecting on PARTITION is a good technique to provide the optimizer with additional information that can be helpful in comparing against random AMP samples to determine if statistics are stale. This collection could be done several times a day, if that is helpful.
And in case you’ve forgotten why collecting stats on PARTITION is so fast, let me remind you. The cylinder index of the PPI table is scanned, instead of the table's data blocks and rows. This process of reading the cylinder indexes may involve accessing a few data blocks when partitions span blocks, but otherwise only cylinder index structures are examined.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.