Yet ANOTHER List…Post-Expansion Workload Management Checklist


Yeah, I know. All you need is another “thing to remember to do” following your upcoming hardware upgrade. As if your “must do” list wasn’t long enough already.

Well, consider what I’m giving you as a “soft” list…it’s a few things worth checking after an upgrade, but the sky won’t fall in if you don’t get to them right away. So relax, take a deep breath, and ponder my post-expansion soft to-do list at your leisure.

The Back-Story

The things I’m about to prod you about will make a lot more sense if you start off embracing my basic premise: a more powerful platform will require less time to accomplish the same work. Make sure you really get that: more CPU power, less time to accomplish the same work.

From there, it’s not such a big leap to these conclusions:

  • The optimizer will produce query plans with lower estimated processing times on more powerful hardware.
  • The same query will complete having accumulated fewer total CPU seconds on the new, more powerful hardware than it did on the old.

Here’s That Checklist

I'm not saying that all of these settings will need to undergo change. Maybe none of them will. But since there's a possibility they will behave differently after a hardware expansion, it's a good idea to check them out.

Step Time Thresholds on System Throttles:

A step time threshold is an escape mechanism for short queries. It determines which queries will be under the control of a system throttle, and which queries are allowed to execute immediately. If a step time threshold is present (it’s an option), a query whose step estimates are less than that number of seconds will skip the throttle rule.

You may want to lower your step time thresholds based on seeing lower estimated processing times on the new hardware. This is easy to check. Just run a few explains before and after the upgrade, focusing on queries where estimates are slightly above or below the step time threshold. Then see how those explain estimates have changed, and lower the step time threshold to correlate with that change. I’ve seen DBAs simply use the factor of increased processing power for the platform and divide the threshold number of seconds by that factor. For example, if the new platform has twice the power of the old platform, cut the step time threshold number of seconds in half.
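The divide-by-the-power-factor rule of thumb above can be sketched in a few lines. This is only an illustration of the arithmetic, not a Teradata utility; the power figures are numbers you supply yourself (for example, a relative performance ratio for the old and new platforms).

```python
def scaled_step_time_threshold(old_threshold_secs, old_power, new_power):
    """Scale a step time threshold by the ratio of platform power.

    A platform twice as powerful produces step estimates roughly half
    as large, so the threshold is divided by the power factor. The
    old_power/new_power inputs are illustrative relative-performance
    numbers you provide, not values the database reports.
    """
    factor = new_power / old_power
    return old_threshold_secs / factor

# Example: the new platform has twice the power of the old one,
# and the old step time threshold was 10 seconds.
print(scaled_step_time_threshold(10, old_power=1.0, new_power=2.0))
```

Treat the result as a starting point, then confirm it against actual before-and-after explain estimates as described above.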

TASM Classification Criteria:

With TASM, many sites use secondary classification criteria of estimated processing time. This practice allows queries with short estimates to be mapped to a workload associated with a higher priority, and longer queries to map to a workload associated with a lower priority. With a more powerful processing platform, classification criteria may need to be lowered, so queries of the same approximate magnitude continue to classify as they did before the upgrade.

TASM Exception Criteria:

If you have exception criteria defined that includes CPU seconds consumed, you may want to think about lowering this parameter. With faster CPUs, more work can be accomplished within the same number of CPU seconds. To have the same exception experiences as before the upgrade, use DBQL to compare CPU seconds consumed before and after, then make your adjustments accordingly.
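One way to turn that before-and-after DBQL comparison into a concrete adjustment is to compute the ratio of CPU seconds consumed by a matched set of queries and scale the exception threshold by it. The sketch below is an assumption about how you might do that offline; the sample values are hypothetical, and pulling the real numbers from DBQL (for example, CPU time columns in DBQLogTbl) is left to you.

```python
from statistics import median

def adjusted_exception_threshold(old_threshold, cpu_before, cpu_after):
    """Scale a CPU-seconds exception threshold by the observed change
    in CPU consumption for the same queries before and after an upgrade.

    cpu_before / cpu_after are CPU-second samples for a matched set of
    queries (e.g., extracted from DBQL). Using the median keeps a few
    outlier queries from skewing the ratio.
    """
    ratio = median(cpu_after) / median(cpu_before)
    return old_threshold * ratio

# Hypothetical samples: the same queries consume about 60% of the
# CPU seconds on the new hardware, so a 500-second exception
# threshold scales down proportionally.
before = [100.0, 200.0, 150.0]
after = [60.0, 120.0, 90.0]
print(adjusted_exception_threshold(500.0, before, after))
```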

Query Milestone Thresholds:

Query milestones are, for the most part, a pre-TASM option that allows demotion of a running query from one allocation group to another. Whatever number of seconds you have defined as your milestone threshold will be enforced on each node separately. In most cases, you will want to consider lowering this threshold after an upgrade, if you want the demotion to take place at about the same point in query execution as before the upgrade.

On the other hand, if you have more AMPs per node, and fewer nodes, after an upgrade, you may want to leave the query milestone threshold where it is. Remember, a query milestone threshold is enforced at the node level. With more AMPs per node, the new node will have more work to do. Check out the following graphic.
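The two opposing pulls described above, faster CPUs arguing for a lower per-node threshold and more AMPs per node arguing for leaving it alone, can be reasoned about with a simple proportional model. This model is my own assumption for illustration, not a Teradata formula; use it only to get a sense of direction and rough magnitude.

```python
def milestone_threshold_after_upgrade(old_threshold, power_factor,
                                      old_amps_per_node, new_amps_per_node):
    """Toy model of a node-level milestone threshold adjustment.

    - Faster CPUs: divide by the power factor, since the same work
      accumulates CPU seconds faster relative to real work done.
    - More AMPs per node: multiply by the AMP-density ratio, since
      each node now carries more of the query's work.
    """
    amp_ratio = new_amps_per_node / old_amps_per_node
    return old_threshold / power_factor * amp_ratio

# Twice the CPU power, but also twice the AMPs per node: the two
# effects roughly cancel, leaving the threshold about where it was.
print(milestone_threshold_after_upgrade(600, 2.0, 20, 40))
```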

Other Things to Think About

If you are alerting on any resource usage level, you might want to review the point at which the alert is issued. Additionally, while throttle limits that control concurrency usually don’t need to change after a hardware expansion, it could be that on a more powerful platform you can tolerate a higher level of concurrency for some workloads. So you might (key word is might) want to relax (raise) some of your query limits.

Hi Carrie. Even though we have set a MAX CPU limit of 50, when a user runs COLLECT STATS statements it is still exceeding the 50% cap. Is this something expected on our RDBMS version? Please advise.
Teradata Employee
A CPU limit works like a thermostat. The defined limit has to be exceeded before the logic kicks in to begin slowing down the work it controls, so that CPU usage falls back under the CPU limit. So you will rarely see a CPU limit holding exactly where you have set it. Usually this variation is 1 or 2 percentage points either above or below, but it could be a little more.
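The thermostat behavior can be sketched as a toy feedback loop: nothing is done while usage is at or under the limit, and controlled work is slowed once usage exceeds it. This is a conceptual model only, not Priority Scheduler code, and the 10% back-off step is an arbitrary assumption for illustration.

```python
def throttle_step(current_usage_pct, limit_pct, work_rate):
    """One tick of a thermostat-style CPU limit (toy model).

    While usage is at or below the limit, controlled work runs at
    full rate. Once the limit is exceeded, the rate is reduced so
    that usage drifts back under the limit. Because intervention
    only starts *after* the limit is crossed, observed usage
    oscillates slightly around the configured value.
    """
    if current_usage_pct > limit_pct:
        return work_rate * 0.9   # over the limit: slow controlled work
    return work_rate             # at or under the limit: leave it alone

# A system running at 55% against a 50% limit gets backed off;
# one running at 45% does not.
print(throttle_step(55, 50, 1.0))
print(throttle_step(45, 50, 1.0))
```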

If you are seeing a 50% CPU limit exceeded by more than a few percentage points for a prolonged period, then I would suggest you open an incident with the support center.

Thanks, -Carrie
Hi Carrie,
We are going to expand our systems as described below. I really love this article because it is a good checklist for my site as we are expanding our systems.
But I need help understanding whether all of the above checklist items will change or not. Actually, I am confused by your point. How can we tell whether none of them will change or whether they will behave differently? My only concern is to have proper resource allocation even after the expansion.

"I'm not saying that all of these settings will need to undergo change. Maybe none of them will. But since there's a possibility they will behave differently after a hardware expansion..."

Expansion Details:
PROD -- We are not adding any nodes but increasing PMCOD. It is currently at 50% PMCOD, and we are going to upgrade it to 62.5%. (Currently we are on SLES10, DBS
PREPROD-- Merge 1650 (1+1) with 1650 (9+9) (Currently we are on SLES10, DBS
TEST--Merge 2690 (2N) with 2690 (4N) (Currently we are on SLES10, DBS
DEV--Merge 2690 (2N) with 2690 (4N) (Currently we are on SLES10, DBS

One more quick question. What is the difference between PMCOD and CDS COD? I saw your articles and understood PMCOD means Platform Metering Capacity On Demand. Can you please explain the exact difference?

Thank you.
Teradata Employee

The above posting was assuming a change in hardware processing power, not a change in PM COD. For that reason you cannot apply this checklist directly to PM COD changes.

Although it is possible, I have not seen evidence or heard any arguments that explain estimates will change when PM COD settings change. So I don't think you need to be concerned with any workload management settings that use estimated processing time. However, it would be good to check before and after the PM COD change to see whether query plan estimates have changed, just to make sure. But I wouldn't expect them to change much at all.

However, workload exception criteria based on CPU usage WILL be different when you alter the PM COD percent. So if you have those types of exceptions, you will want to reduce the exception thresholds so they demote at fewer CPU seconds. This is because each CPU second will be more powerful after the increase from 50% to 62.5% PM COD and will get somewhat more work done per second consumed. So queries will complete with fewer CPU seconds. This will be apparent in DBQLogTbl CPUTime metrics.

In an easy example, if you go from 50% PM COD to no COD (100% PM COD), you could expect CPU used by the same query to be cut in half:

50/100 = .50

With no PM COD your query would consume 50% of the CPU seconds it required at 50% PM COD.

You can apply the same approach for getting a sense of what the expected change in CPU consumption would be in your case. But you should also validate that by looking at the CPU usage of a given query before the change and then after the change.
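The proportional model in the example above can be written out as a small helper. The scaling ratio is the one from the source (old COD percent over new COD percent); as noted above, treat the result as a first approximation to be validated against actual DBQL numbers, not a guarantee.

```python
def expected_cpu_seconds(cpu_secs_at_old_cod, old_cod_pct, new_cod_pct):
    """Estimate CPU seconds for the same query after a PM COD change,
    using the simple proportional model from the example above.

    Going from 50% to 100% PM COD, each CPU second gets twice the
    work done, so reported CPU time is expected to be cut in half
    (ratio = 50/100 = 0.50).
    """
    return cpu_secs_at_old_cod * (old_cod_pct / new_cod_pct)

# A query that used 400 CPU seconds at 50% PM COD, after moving
# to no COD (100%):
print(expected_cpu_seconds(400.0, 50, 100))
```

For the 50% to 62.5% change discussed in this thread, the same function gives a ratio of 50/62.5 = 0.80, i.e., roughly 80% of the previous CPU seconds.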

I am not familiar with the term "CDS COD". The only other COD I know about is priority scheduler system level CPU limits (sometimes referred to as COD). Priority Scheduler COD only applies to CPU. Differences are that it can be set at 1% granularities, and can be easily changed by the DBA as it is just another priority scheduler parameter. It doesn't have any impact on running work until the COD limit is actually hit. And it doesn't cause any difference in what is reported in DBQL, because each CPU second is still a full CPU second.

I have a different blog posting that describes priority scheduler COD in more detail, called "How to set a system-level CPU limit for Capacity on Demand", if you want to take a look at that.

Thanks, -Carrie