Have you ever had what looks like a great query plan turn into life in the slow lane at run time?
You may have blamed it on your table having very skewed values in the join column. And you were probably right. But the optimizer is getting better at recognizing skew, starting in Teradata 12. This skew detection happens WHILE it’s building up the query plan, not after the query begins to execute. One initial step towards this but-what-will-it-be-like-at-run-time smarts happens when the optimizer is pondering a row redistribution step for a large table in preparation for a join. I thought you might like to see an illustration of how this works.
By identifying skew before the row redistribution is already nailed into the plan, the optimizer has a chance to consider other join options. When skew is detected in the join column of a table-to-be-redistributed, the estimated cost of row redistribution will be adjusted upwards, causing the optimizer to see it as a more expensive option (and let’s face it, with skew present, it may be).
I’ve played around with this on my Teradata 12 system, and like how it works. So if you want to peek over my shoulder for a moment, I’ll share my findings with you.
Each detailed interval in the statistics histogram has a field called Mode Frequency. Mode frequency is a count of the most popular value in that interval. In interval zero (which contains mostly summary information), the mode frequency field represents a count of the most popular value across all the detail intervals. It is the most highly-skewed value in the table.
Here’s a subset of columns and a few intervals from a histogram for a column. This histogram was built after I updated the table in a way that resulted in about 5 million rows sharing the same value in this column. In other words, I consciously introduced exaggerated skew into the mix.
Interval 1 (the 2nd row) contains the most frequent value for this column, 5. There are 5,001,159 rows in this table that carry the value 5 in this column, out of about 45 million rows in the table. The non-skewed values all have 19 or 20 rows per value.
Because 5 is the most frequent value for this column in the table, it ends up in the Mode Value and its row count in the Mode Frequency column of Interval 0 (the 1st row). Interval 0’s mode frequency is referred to as the “high mode frequency” because it represents the most frequent value across the entire table.
The histogram information is used to calculate the highest number of rows that are likely to go to one AMP for the redistribution and join activity. The optimizer uses the number of AMPs (40 in this case) and this high mode frequency to make a worst-case assessment of the cost of row redistribution.
Without this new skew awareness, the optimizer in this case would have assumed that each AMP will do the work of joining about 1.1 million rows (44,991,725 rows / 40 AMPs). However, with this new intelligence, the optimizer understands that all the rows that carry the high mode frequency value will be redistributed to one AMP, and that one AMP will be doing a disproportionate amount of the work.
Once it assesses what the worst-case AMP is likely to be doing, then the optimizer further assumes that the remaining rows will be split evenly across all AMPs. Let’s break that math apart.
Considering skew, the worst-case AMP will get 5,001,159 rows (the high mode frequency count). That same AMP will also get an additional 999,764 rows (the number of rows that carry the non-skewed values divided by the number of AMPs – 39,990,566 / 40 = 999,764). This results in an estimate of slightly over 6 million rows targeting the worst-case AMP, more than 5 times larger than what the optimizer estimates would have been in the blind-to-skew case.
Instead of assuming each AMP will receive 1.1 millions rows during row redistribution, the Teradata 12 optimizer assumes a worst-case redistribution of 6 million rows for at least one AMP. It uses that worst-case number of rows to adjust the estimated cost of the row redistribution choice.
So here’s what I did, graphically shown below. I started with a baseline join between the Lineitem and the Parttbl tables where there was no skew on L_Partkey (the larger table’s join column). This particular join required that both the Lineitem and the Parttbl be redistributed, as in my database neither side of the join was the table’s primary index column.
Next, I changed the underlying values contained in the Lineitem table join column, L_Partkey, to gradually introduce skew. After each such skew-increasing effort, I made sure to recollect statistics on L_Partkey, so the optimizer would have the information needed to detect the creeping skew. Then I ran an explain to see what plan I got after increasing skew and collecting stats, hoping to see a change in join geography, that would avoid a redistribution of the Lineitem table.
The following table shows my results from this experiment, where I went from no skew (high mode frequency of 118 rows) up to moderately strong skew (high mode frequency of over 100,000 rows). In this table I capture the high mode frequency values taken from the histogram for each test (remember, I recollected stats after each change in the skew). I also capture the estimated processing time that showed up for the row redistribution step in each query plan.
As can be seen in the table above, the join geography changed when the high mode frequency reached over 100,000 rows for L_Partkey. Seeing those changed demographics, the optimizer chose to leave the large table local, and duplicate the smaller table.
Notice that the estimated processing time taken from the explain text increases a small amount as skew is gradually introduced, even though the join geography in the plan does not change for the small skew and the smallish skew cases. That increase in the estimated processing time is evidence that skew detection is working. The estimated cost of performing the row redistribution increases gradually, until such time as the duplication of the smaller table becomes a more cost-effective choice.
Of course, optimizer decisions such as changing join geography depend on many factors, including processing power of the platform, number of rows in the tables on each side of the join, and other demographic information available to the optimizer. And the costing performed by the optimizer is always taken down to the AMP level, so the number of AMPs in the configuration can make a difference in the plan you get. This is a simple example to step you through the process.
Just a reminder, if there are multiple join predicates, multi-column statistics may be required.
What I have demonstrated to you is that skew-awareness is alive and well in the Teradata optimizer, and it’s here to stay. So your life as a Teradata DBA just got a little more boring.