System going critical (CPU > 95) with a handful of users

Enthusiast

System going critical (CPU > 95) with a handful of users

Hi,

Our system goes into a critical state (as shown in Viewpoint) every now and then with very few users. Here is one such situation: the system went to 99.99% CPU utilization.

Here are some key columns from Viewpoint for the active queries:

SESSION ID | REQ CPU | CPU Skew Overhead | DURATION | USERNAME | WORKLOAD | IMPACT CPU | SNAPSHOT CPU SKEW | ΔCPU | REQ I/O
17339058 | 41378.03 | -38935.79 | 1211 | WILSOB4 | WD-NI-LOW | 2442.24 | 12.323928 | 2141.258 | 1982773
17339122 | 40114.26 | -37784.02 | 1211 | WILSOB4 | WD-NI-LOW | 2330.24 | 11.393505 | 2064.746 | 1984538
17328459 | 31789.3 | -29553.14 | 1091 | WILSOB4 | WD-NI-LOW | 2236.16 | 10.258121 | 2006.766 | 1987297
17339183 | 31164.764 | -28979.164 | 1091 | WILSOB4 | WD-NI-LOW | 2185.6 | 9.689421 | 1973.824 | 1983350
17339188 | 29193.932 | -26893.7718 | 1016 | WILSOB4 | WD-NI-LOW | 2300.1602 | 14.845401 | 1958.688 | 1985451
17339303 | 23114.896 | -18940.176 | 3600 | MARTIER | WD-ADHOC | 4174.719 | 76.2196 | 1983915.066 | 9259703
17339461 | 1141.016 | 148.584 | 10 | DAPPAJA | WD-ADHOC | 1289.6001 | 11.5217 | 150 | 589183
17339361 | 3.496 | 732.504 | 0 | WISNIJ2 | WD-ADHOC | 736 | 4.7744 | 5705 | 2666
17339388 | 0.716 | 9.524 | 0010 | SSACOGID | WD-RPT-SHORT | 10.240001 | 45.9375 | 0 | 402797
17339426 | 0.208 | 438.83 | 1980 | SSACOGID | WD-AWS | 439.0399 | 86.79 | 2086 | 60675
17339462 |  |  | 0 | VIEWPOINT |  | 0 | 0 | 0 |

 

Result from ResUsageSpma at 9:50 when the system went critical:

TheDate | TheTime | hrmn | NodeID | NodeType | NCPUs | Busy% | OS% | IOWait% | Idle%
12/15/2017 | 9:50:00 | 950 | 102 | 2800 | 56 | 99.98 | 4.6 | 0.02 | 0
12/15/2017 | 9:50:00 | 950 | 109 | 2800 | 56 | 99.97 | 4.64 | 0.02 | 0.01
12/15/2017 | 9:50:00 | 950 | 106 | 2800 | 56 | 100 | 5.02 | 0 | 0
12/15/2017 | 9:50:00 | 950 | 105 | 2800 | 56 | 100 | 5.13 | 0 |


 

 

Can somebody please explain the math behind the CPU calculation?

At 9:50, Teradata Viewpoint System Health showed:

99.99% CPU Utilization
97.95% User
 
Here is our CPU capacity:

So the estimated total available CPU seconds per day is #Nodes * #CPUs * seconds per day:

4 * 56 * 86400 = 19,353,600

Per hour: 806,400 (19,353,600 / 24)
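
To show where I am with the math, here is a small sketch in plain Python of how I think the numbers line up. The assumption that Viewpoint's system-level CPU figure is roughly the average of the per-node Busy% values from ResUsageSpma is mine, so please correct me if that is the wrong way to read it:

    # Back-of-the-envelope CPU math for our 4-node, 56-CPUs-per-node system.
    NODES = 4
    CPUS_PER_NODE = 56
    SECONDS_PER_DAY = 86400

    # Estimated total available CPU seconds per day and per hour.
    cpu_seconds_per_day = NODES * CPUS_PER_NODE * SECONDS_PER_DAY
    cpu_seconds_per_hour = cpu_seconds_per_day // 24
    print(cpu_seconds_per_day, cpu_seconds_per_hour)   # 19353600 806400

    # Per-node Busy% from the ResUsageSpma rows at 9:50 (nodes 102, 109, 106, 105).
    # Assumption (mine): the Viewpoint System Health CPU figure is roughly the
    # average of these per-node values.
    busy_pct = [99.98, 99.97, 100.0, 100.0]
    print(sum(busy_pct) / len(busy_pct))                # 99.9875 -> shown as ~99.99

If that reading is right, 99.99% just means that during that sample essentially all 224 CPUs (4 nodes x 56) were busy.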

5 REPLIES
Enthusiast

Re: System going critical (CPU > 95) with a handful of users

Can somebody please reply?

Teradata Employee

Re: System going critical (CPU > 95) with a handful of users

Teradata is designed to fully consume the resources of the platform. The goal is to use the resources as fully as possible in order to minimize response time. There are many AMPs on each node, each running a portion of each request, and all of them can consume resources simultaneously. This means that when even a small number of CPU-intensive pieces of work run simultaneously, the CPU available on the platform will be fully utilized. With a typical mixed workload, some work is CPU-intensive and some is I/O-intensive at any particular time, so resource utilization is more balanced. But if several (especially long-running) CPU-intensive pieces of work are running at the same time, then the CPU can easily be fully utilized. On modern platforms with lots of memory and SSD, this can be even more pronounced if the data being worked on is in memory or on fast storage.
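
As a rough illustration using the numbers posted above: if the largest session is read as roughly 41,378 CPU seconds of REQ CPU over a duration of about 1,211 seconds, that single request was averaging on the order of 34 of the 224 CPUs (4 nodes x 56), so five or six requests of that shape running concurrently are enough to keep essentially every CPU busy.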

 

Full utilization of the CPU is not a problem by itself. However, if the CPU-intensive work is impacting the response time of higher-priority work, then some workload management rules to allocate CPU priority to the higher-priority work may be indicated.

 

And of course it is always worthwhile to take a close look at the high-consuming queries to see if there are opportunities to optimize the work.

Enthusiast

Re: System going critical (CPU > 95) with a handful of users

Thanks Todd!!

 

Can you also help me understand workload distribution? We have simple Timeshare (no tactical/SLG workloads, no active TASM), just Top, High, Medium and Low.

If there is nothing running on the system and a Top-priority workload comes in, then obviously all resources will go to it (no 8:4:2:1 split). Now if High and Medium workloads arrive at the same time, how will the resources be divided? And what if they are followed by a Low-priority workload?

 

If only a Low WD is running, it will take all the resources; then a Top WD comes in and it should take 8 times as much as everything else. I am confused.

 

When a query runs, how does Viewpoint (in Query Monitor) show the resource consumption details? Is it the resources consumed up to the current step, or only for the current step? I get confused because at one point it shows 500 GB of spool and a few minutes later it shows 50 GB. For the total resources consumed by a query, do we need to use DBQL?

Teradata Employee

Re: System going critical (CPU > 95) with a handful of users

On a current system (SLES 11), a query in Top is given the chance to consume 8 times as much CPU over an elapsed time interval as a query running concurrently in Low. It's all about taking turns and prioritizing who gets to go next.
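
As a back-of-the-envelope illustration only (this is just the 8:4:2:1 access-rate arithmetic, not how the SLES 11 scheduler is actually implemented, and it ignores decay, I/O waits and virtual partitions; the cpu_shares helper below is purely illustrative):

    # Rough model only: assume CPU divides in proportion to the Timeshare
    # access rates (Top:High:Medium:Low = 8:4:2:1) among the queries that
    # want CPU at a given moment.
    ACCESS_RATE = {"Top": 8, "High": 4, "Medium": 2, "Low": 1}

    def cpu_shares(active):
        """Relative CPU share of each concurrently active query."""
        total = sum(ACCESS_RATE[level] for level in active)
        return {f"{level} #{i}": round(ACCESS_RATE[level] / total, 3)
                for i, level in enumerate(active, start=1)}

    print(cpu_shares(["Low"]))                           # a lone Low query gets ~100%
    print(cpu_shares(["Low", "Top"]))                    # ~0.111 vs ~0.889
    print(cpu_shares(["Top", "High", "Medium", "Low"]))  # 8/15, 4/15, 2/15, 1/15

So for the earlier example: High and Medium arriving together split the CPU roughly 4:2, and a Low query joining that mix gets about 1/7 while the others keep roughly 4/7 and 2/7.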

 

Yes, many of the metrics in query monitor are "snapshot" values (as of last sample / current step) rather than cumulative. DBQL is a better place to get total / peak values.

Teradata Employee

Re: System going critical (CPU > 95) with a handful of users

A key thing to understanding CPU prioritization is that the CPU allocation is continuously re-evaluated at very short intervals. In each interval the relative CPU allocation is granted and enforced. So when a new piece of work enters the system, the allocations to all work are very quickly changed to reflect the new mix. Thus, a low-priority-only workload can consume the entire CPU capacity, but when a high-priority piece of work arrives, the appropriate allocation will be given to the high-priority work at the expense of the low-priority work.
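
As a rough worked example using the same 8:4:2:1 ratios: while only Low work is active it can use close to 100% of the CPU; the moment a Top query becomes active, the very next scheduling intervals give the Top query roughly 8/(8+1), about 89%, and the Low work about 11%; and when the Top query finishes, the Low work goes straight back toward 100%.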

 

For more details about scheduling and WLM, I suggest looking for Carrie Ballinger's posts and blogs on the topic.