Penalty boxes have been around for years. They come in all shapes and sizes. Their single, focused purpose is to lock away bad queries and thereby protect good queries, while not condemning the bad ones to the ultimate penalty of being aborted. If you’re using penalty boxes, I want to encourage you to look inside them once in a while, and open your eyes to some of the side-effects you might not have noticed in the past.
First, let me emphasize that only a small subset of Teradata sites feel they need to have a penalty box. This is not a widespread approach, and you are not missing out if you don’t have one defined.
That said, a penalty box is a priority scheduler allocation group with a low relative weight, where poorly-written queries (such as those with missing join constraints) can be relocated. The real force behind the penalty box is not the low relative weight (relative weights can be yielding) but rather a very tight CPU limit that is placed on that allocation group, usually (but not always) set at 5% or less. This keeps CPU usage for penalized queries capped, no matter what else is running on the platform. As a result, a query moved into the penalty box is slowed down, reined in, caged.
Before TASM became popular, or even existed, DBAs at some sites would be on the lookout for rogue queries that were exhibiting odd behavior, like maintaining very high CPU to IO ratios, or performing in an outrageously skewed manner. Once the DBA’s attention was lasered onto one of these queries, he could manually evict the query from whatever priority it was running at and force it into this penalty box, if he decided that was the correct thing to do. From that point on the query faced a life sentence of slowly creeping to completion.
If more than one query at a time was demoted by the DBA, the penalty box might get crowded. CPU-per-query becomes scarcer when multiple queries share the resources of an allocation group with a low CPU limit. A crowded penalty box results in lengthier elapsed times for the imprisoned queries, and those queries hold resources for as long as they are alive, such as AMP worker tasks, locks, and spool files. But because the pre-TASM DBA was reaching a verdict on each demoted query one at a time, he usually could tell about how full the penalty box was getting at any point.
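To make the crowding effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the allocation group's CPU limit is split evenly among the queries inside it, which is a simplification and not a description of how the priority scheduler actually divides CPU. The function names and the 20% figure are hypothetical, chosen only for illustration.

```python
def cpu_share_per_query(cpu_limit_pct: float, active_queries: int) -> float:
    """Approximate CPU percentage available to each query in the penalty box.

    Assumes an even split of the group's CPU limit (a simplification).
    """
    if active_queries < 1:
        raise ValueError("active_queries must be at least 1")
    return cpu_limit_pct / active_queries


def slowdown_factor(unconstrained_pct: float, cpu_limit_pct: float,
                    active_queries: int) -> float:
    """Rough factor by which a CPU-bound query's elapsed time stretches.

    unconstrained_pct is the CPU percentage the query would consume if it
    were not capped (a hypothetical input). Never returns less than 1.0.
    """
    share = cpu_share_per_query(cpu_limit_pct, active_queries)
    return max(1.0, unconstrained_pct / share)


if __name__ == "__main__":
    # One query alone under a 5% limit versus four queries sharing it.
    for n in (1, 4):
        print(n, cpu_share_per_query(5.0, n), slowdown_factor(20.0, 5.0, n))
```

Under these assumptions, a query that would otherwise use 20% of the CPU runs roughly 4 times longer when it is alone under a 5% limit, and roughly 16 times longer when it shares the box with three others. All that time it continues to hold its AMP worker tasks, locks, and spool.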
With the introduction of TASM, a piece of code called exception handling often replaced the DBA and became the judge and jury when it comes to moving queries into a penalty box. This automation is good in many ways, and has greatly simplified the life of the DBA. But at the same time it has taken away the direct knowledge that the DBA used to have about what’s going on in the penalty box. With TASM automating query demotions, it is possible for the penalty box to become over-populated, unbeknownst to the DBA.
There are several ways that over-population of the penalty box can happen:
I want you to understand all of this because I have seen cases where AMP worker task exhaustion is directly attributable to over-crowded, overly-restrictive penalty boxes. This can more easily happen with TASM because the DBA no longer has to be involved in the process. What can aggravate this tendency is that complex queries that are candidates for demotion often have parallel steps that do things like row redistributions or duplication or global aggregations, all of which require multiple AMP worker tasks at a time. Even having just 3 such queries in the penalty box could tie up 10 or more AMP worker tasks for a long time. This could make it hard to get other, new work started up.
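The arithmetic behind that "3 queries, 10 or more AMP worker tasks" observation can be sketched as follows. The per-query counts (4, 3, and 3) and the pool size of 80 AMP worker tasks per AMP are illustrative assumptions, not measurements. Substitute the values from your own platform; the function names are hypothetical.

```python
from typing import Sequence


def awts_held(per_query_awts: Sequence[int]) -> int:
    """Total AMP worker tasks held on an AMP by the penalized queries."""
    return sum(per_query_awts)


def fraction_of_pool(per_query_awts: Sequence[int], pool_size: int) -> float:
    """Share of the per-AMP AWT pool tied up by the penalty box."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return awts_held(per_query_awts) / pool_size


if __name__ == "__main__":
    # Three demoted queries, each running a multi-AWT step (assumed counts).
    penalized = [4, 3, 3]
    print(awts_held(penalized), fraction_of_pool(penalized, pool_size=80))
```

With these assumed numbers, three slow-moving queries hold 10 of the 80 tasks, or one eighth of the pool, for as long as their capped steps take to finish. A handful more demotions would scale that figure up quickly.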
All workload management decisions come with tradeoffs. I know that you understand that. But sometimes out of sight is out of mind. So when you do your normal workload management tuning, and when you review your TASM settings, as I’m sure you do regularly, put penalty box health and balance high on your list of things to review.