How Resources are Shared in the SLES 11 Priority Scheduler

Teradata Employee

The SLES 11 priority scheduler implements priorities and assigns resources to workloads based on a tree structure.  The priority administrator defines workloads in Viewpoint Workload Designer and places each workload on one of several available levels in this hierarchy.  On some levels the administrator assigns an allocation percent to the workloads; on other levels no allocation is assigned.

How does the administrator influence who gets what?  How do a workload's tier level and the presence of other workloads on the same tier impact the resources that are actually allocated?  What happens when some workloads are idle and others are not?

This posting gives you a simple explanation of how resources are shared in SLES 11 priority scheduler and what happens when one or more workloads are unable to consume what they have been allocated.

The Flow of Resources within the Hierarchy

Conceptually, resources flow from the top of the priority hierarchy through to the bottom.  Workloads near the top of the hierarchy will be offered all the resources they are entitled to receive first.  What they cannot use, or what they are not entitled to, will flow to the next level in the tree.  Workloads at the bottom of the hierarchy will receive resources that either cannot be used by workloads above them, or resources that workloads above them are not entitled to.
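
As a rough illustration of that cascade, here is a minimal sketch in Python (not priority scheduler code; the tier names come from this post, but the entitlement and demand numbers are hypothetical).  Each tier keeps what it is entitled to and can actually consume, and passes the rest down:

    def cascade(node_resources, tiers):
        # tiers: list of (name, fraction of the inflow the tier's workloads are entitled to,
        #                 fraction of the node the tier's workloads can actually consume)
        inflow = node_resources
        for name, entitled, demand in tiers:
            consumed = min(inflow * entitled, demand)
            print(f"{name}: consumes {consumed:.0%}, passes {inflow - consumed:.0%} down")
            inflow -= consumed
        return inflow   # whatever reaches the base of the hierarchy (Timeshare)

    # Tactical is entitled to nearly everything but uses little; the SLG Tier 1 numbers are made up.
    cascade(1.0, [("Tactical", 1.00, 0.05), ("SLG Tier 1", 0.30, 0.20)])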

What does “resources a workload is entitled to” mean? 

Tactical is the highest level where workloads can be placed in the priority hierarchy.  A workload in tactical is entitled to a lot of resources, practically all of the resources on the node if it is able to consume that much.  However, tactical workloads are intended to support very short, very highly-tuned requests, such as single-AMP queries, or few-AMP queries.  Tactical is automatically given a very large allocation of resources to boost its priority, so that work running there can enjoy a high level of consistency.  Tactical work is expected to use only a small fraction of what it is entitled to.

If recommended design approaches have been followed, the majority of the resources that flow into the tactical level will flow down to the level below.  If you are on an Active EDW platform, the next level down will be SLG Tier 1.   If you are on an Appliance platform, it will be Timeshare.

The flow of resources to Service Level Goal (SLG) Tiers

SLG Tiers are intended for workloads where there is a service level goal, whose requests have an expected elapsed time and where their elapsed time is critical to the business.  Up to five SLG Tiers may be defined, although one, or maybe two, are likely to be adequate for most sites.  Multiple workloads may be placed on each SLG Tier.  The figure below shows an example of what SLG Tier 1 might look like.

In looking back at the priority hierarchy figure, shown first, note that the tactical tier and each SLG Tier include a workload labeled “Remaining”.  That workload is created internally by priority scheduler.  It doesn’t have any tasks or use any resources.  Its purpose is to connect to and act as a parent to the children in the tier below.  The Remaining workload passes unused or unallocated resources from one tier to another.

The administrator assigns an allocation percent to each user-defined workload on an SLG Tier.  This allocation represents a percent of resources the workload is entitled to from among the resources that flow into the tier.  If 80% of the node resources flow into SLG Tier 1, the Dashboard workload (which has been assigned an allocation of 15%) is entitled to 12% of the node resources (80% of 15% = 12%).

The Remaining workload on an SLG tier is automatically assigned an allocation that is derived by summing all the user-defined workload allocations on that tier and subtracting that sum from 100%.  Remaining in the figure above gets an allocation of 70% because 100% - (15% + 10% + 5%) = 70%.  Remaining’s allocation of 70% represents the percent of the resources that flow into SLG Tier 1 that the tiers below are entitled to.  You will be forced by Workload Designer to always leave some small percent to Remaining on an SLG Tier so work below will  never be in danger of starving.
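
Here is that arithmetic as a minimal Python sketch, using the example values above (80% of node resources flowing into SLG Tier 1, and the Dashboard, WebApp1 and WebApp2 allocations of 15%, 10% and 5%):

    tier_inflow = 0.80    # share of node resources flowing into SLG Tier 1 (example value)
    allocations = {"Dashboard": 0.15, "WebApp1": 0.10, "WebApp2": 0.05}

    # A workload's entitlement at the node level is its allocation applied to the tier inflow.
    for name, alloc in allocations.items():
        print(f"{name}: entitled to {alloc * tier_inflow:.0%} of the node")   # Dashboard -> 12%

    # Remaining is derived automatically: 100% minus the sum of the user-defined allocations.
    remaining = 1.0 - sum(allocations.values())    # 0.70
    print(f"Remaining: {remaining:.0%} of the tier inflow flows to the tiers below")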

Sharing unused resources within the SLG Tier

An assigned allocation percent could end up providing a larger level of node resources than a workload ever needs.   Dashboard may only ever consume 10% of node resources at peak processing times.   Or there may be times of day when Dashboard is not active.  In either of those cases, unused resources that were allocated to one workload will be shared by the other user-defined workloads on that tier, based on their percentages.  This is illustrated in the figure below.  

Note that what the Remaining workload is entitled to remains the same.   The result of Dashboard being idle is that WebApp1 and WebApp2 receive higher run-time allocations.  Only if the two of them are not able to use that spare resource will it go to Remaining and flow down to the tiers below.  

Unused resources on a tier are offered to sibling workloads (workloads on the same tier) first.  What is offered to each is based on the ratio of their individual workload allocations.  WebApp1 gets offered twice as much unused resource originally intended for Dashboard as WebApp2, because WebApp1 has twice as large a defined allocation.
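
A small sketch of that proportional sharing (the helper function is hypothetical, not scheduler code; the allocations are the example values above):

    def runtime_allocations(defined, active):
        # Redistribute idle siblings' allocations to the active workloads,
        # in proportion to the active workloads' defined allocations.
        idle_total = sum(a for wd, a in defined.items() if wd not in active)
        live = {wd: a for wd, a in defined.items() if wd in active}
        live_total = sum(live.values())
        return {wd: a + idle_total * a / live_total for wd, a in live.items()}

    defined = {"Dashboard": 0.15, "WebApp1": 0.10, "WebApp2": 0.05}
    print(runtime_allocations(defined, active={"WebApp1", "WebApp2"}))
    # WebApp1 -> ~0.20 and WebApp2 -> ~0.10: Dashboard's idle 15% is split 2:1.
    # Remaining's 70% is untouched; only the active siblings share the spare resource.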

Priority scheduler uses the same approach to sharing unused resources if the tiers below cannot use what flows to them.  The backflow that comes to an SLG tier from the tier below will be offered to all active workloads on the tier, proportional to their allocations. However, this situation would only occur if Timeshare workloads were not able to consume the resources that flowed down to them.  All resources flow down to the base of the hierarchy first.  Only if they cannot be used by the workloads at the base will they be available to other workloads to consume.   Just as in SLES 10 priority scheduler, no resource is wasted as long as someone is able to use it.

Sharing resources within Timeshare

Timeshare is a single level in the hierarchy that is expected to support the majority of the work running on a Teradata platform.  The administrator selects one of four access levels when a workload is assigned to Timeshare:  Top, High, Medium and Low.  The access level determines the level of resources that will be assigned to work running in that access level's workloads.  Each access level comes with an access rate that determines the actual contrast in priority among work running in Timeshare.  Top has an access rate of 8, High 4, Medium 2, and Low 1.  Access rates cannot be altered.

Priority Scheduler tells the operating system to allocate resources to the different Timeshare requests based on the access rates of the workloads they have been classified to.  This happens in such a way that any Top query will always receive eight times the resources of any Low query, four times the resources of any Medium query, and two times the resources of any High query.

This contrast in resource allocation is maintained among queries within Timeshare no matter how many are running in each access level.  If there are four queries running in Top, each will get 8 times the resource of a single query in Low. If there are 20 queries in Top, each will get 8 times the resource of a single query in Low.  In this way, high concurrency in one access level will not dilute the priority differences among queries active in different access levels at the same time.
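
A minimal sketch of that behavior (illustrative only; the shares are fractions of whatever resource flows into Timeshare):

    ACCESS_RATE = {"Top": 8, "High": 4, "Medium": 2, "Low": 1}   # fixed rates, cannot be altered

    def per_query_share(query_counts):
        # query_counts: e.g. {"Top": 20, "Low": 1}; returns each level's per-query fraction.
        total = sum(ACCESS_RATE[lvl] * n for lvl, n in query_counts.items())
        return {lvl: ACCESS_RATE[lvl] / total for lvl in query_counts}

    few  = per_query_share({"Top": 4,  "Low": 1})
    many = per_query_share({"Top": 20, "Low": 1})
    # The Top-to-Low ratio stays 8 regardless of concurrency:
    print(round(few["Top"] / few["Low"], 6), round(many["Top"] / many["Low"], 6))   # 8.0 8.0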

Conclusions

When using the SLES 11 priority scheduler, the administrator can influence the level of resources assigned to various workloads by several means.  The tier (or level) in the priority hierarchy where a workload is placed establishes its general priority.  If a workload is placed on an SLG Tier, the highest SLG Tier will offer a more predictable level of resources than the lowest SLG Tier.

The allocation percent given to SLG Tier workloads will determine the minimum percent those workloads will be offered.   How many other workloads are defined on the same SLG Tier and their patterns of activity and inactivity can tell you whether sibling sharing will enable a workload to receive more than its defined allocation.

Workloads placed in the Timeshare level may end up with the least predictable stream of resources, especially on a platform that supports SLG Tiers that use more at some times and less at others.  This is by design, because Timeshare work is intended to be less critical and not generally associated with service levels.  When there is low activity above Timeshare in the hierarchy, more unused resources will flow into Timeshare workloads.  But if all workloads above Timeshare are consuming 100% of their allocations, Timeshare will get less. 

However, there is always an expected minimum amount of resources you can count on Timeshare receiving.  This can be determined by looking at the allocation percent of the Remaining workload in the tier just above.  That Remaining workload is the parent of all activity that runs in Timeshare, so whatever is allocated to that Remaining will be shared across Timeshare requests.   

You can route more resources to Timeshare, should you need to do that, by ensuring that the SLG Tier Remaining workloads that are in the parent chain above Timeshare in the tree have adequate allocations associated with them.  (To accomplish this you may need to reduce some of the allocation percentages of the user-defined workloads on the various SLG Tiers.)  What Timeshare is entitled to, based on the Remaining workloads above it, is honored by the SLES 11 priority scheduler in the same way as the allocation of any other component higher up in the tree is honored.  But since Timeshare is able to get all the unused resources that no one else can use, it is likely that Timeshare workloads will receive much more than they are entitled to most of the time.
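
As a back-of-the-envelope sketch of that minimum entitlement (the Remaining percentages below are hypothetical; substitute the ones from your own SLG Tiers and virtual partition):

    # Timeshare's minimum entitlement is the product of the Remaining allocations
    # in the parent chain above it.
    remaining_chain = [0.70, 0.60]    # e.g. SLG Tier 1 Remaining, SLG Tier 2 Remaining
    timeshare_minimum = 1.0
    for r in remaining_chain:
        timeshare_minimum *= r
    print(f"Timeshare is entitled to at least {timeshare_minimum:.0%} "
          f"of what flows out of the tactical level")   # 42%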

31 Comments
Enthusiast

Thank you for this article Carrie. Some questions:

1. As per the new architecture, at what level should we be placing our utility workloads? I know it shouldn't be tactical.

2. We have 25 workloads on our current system and we shall shortly be migrating to 14.10 nodes with SLES 11. Do you think it would be reasonable to start with the following TASM architecture once we migrate to SLES 11: Critical (single/group amp) tactical in the tactical tier, all-amp tactical in SLG Tier 1, and the rest in timeshare with allocations in order of concurrency (i.e. very high concurrency workloads in top, high concurrency wd's in high, medium concurrency wd's in medium and low concurrency wd's in low).

3. Based on what I've read about the new SLES 11 architecture, I've concluded that it would make sense to have penalty box either in "low" access level or in slg tier 1 with an absolute "hard" limit of 1%. Which of these 2 solutions would you recommend and why?

4. It is mentioned that, on a busy system, the higher tier workloads will always be able to preempt the cpu from the lower tier workloads. At the same time, it is also mentioned that unused resources (that fail to get utilized anywhere in the tree) use a "bottom-up" approach and hence run the risk of getting allocated to the unimportant lower tiers before the more important higher tiers. To understand these 2 statements, I've built a hypothetical example, after which I have a few questions. Please help me answer them:

Example:

SLG tier 1 has 2 wd's: wdt1.1 - 10% and wdt1.2 - 30%. There is SLG tier 2 that has 2 wd's too: wdt2.1 - 10% and wdt2.2 - 30%, and then there are some workloads assigned to timeshare (assume one workload each in top, high, medium and low).

Now if wd's at tier 1 are consuming up to their percentages, 60% (100% - 40%) will flow to tier 2, and if tier 2 wd's are consuming up to their allocated percentages, 36% (60% - 10% of 60% - 30% of 60%) will flow to timeshare. Now assume that nothing is running in timeshare. This will cause the 36% of resources to now follow the "bottom-up" approach. Now this 36% is given to tier 2 wd's first. It is important to note that wdt1.1, wdt1.2, wdt2.1 and wdt2.2 continue to consume resources up to their maximum shared percentages. Assume that at the same point in time, 2 queries entered the system. Query#1 got classified to wd 1.1 and query#2 got classified to wd 2.1.

Question 1: The bottom-up approach will ensure that query#2 gets serviced before query#1 right and thereby make use of the 36% of resources that were unutilized?

Question 2: The reason query#1 wasn't able to preempt cpu from tasks running in the 2nd tier is because tier 1 was already consuming resources upto its shared percent limit. Is this correct?

Question 3: At the time query#1 and 2 came into the system, if the task already running in wd1.1 completed, then will query#1 be able to grab up to 10% of the CPU from the 36% of the resources?

Regards,

Suhail

Teradata Employee

#1:  There is no one answer to where you should place your utility workload.  It depends on its priority and the priority of other work running in the system at the same time.   But you are correct that it should not be in tactical, and not likely to fit in SLG Tier 1 either.

#2:  The SLG Tiers are intended for any work that has a service level goal (or agreement) associated with it.   It could be for only tactical-like work, or it could also include business critical reporting or web applications where someone expects a short answer back.   In terms of Timeshare, while concurrency is important to keep in mind, priority of the work is a more important determinant of what workloads to put in Top vs. High, Medium or Low. 

#3:  I don't believe sites will require penalty boxes when they move to SLES 11, as SLES 11 is pretty good about not allowing any work to overconsume and therefore you no longer need a structure to hold it back.  Such work might fit in Timeshare Low.  I do not recommend use of hard limits in SLES 11, until and unless it is proven in practice that you absolutely require one.    See how well SLES 11 enforces allocations first, get some experience, then make that decision.

#4:  Tier position has nothing to do with pre-empting CPU from other active tasks.  Priority does.   A workload on SLG Tier 1 could have an allocation of 0.1%, and not be entitled to very much CPU, and a workload on SLG Tier 5 could have an allocation of 90% and end up with a higher practical priority than the other, higher-tier workload.    Tier position is only one factor in determining the actual operating system shares that are used behind the scenes.

Resources generally flow downwards in the SLES 11 priority hierarchy.  Resources only flow upwards from Timeshare in the case where Timeshare cannot use all of the resources that flow into it, AND there are other SLG Tier workloads above Timeshare that need and can use more than their allocation at that point in time.   That condition is not likely to ever happen if you have defined the SLG Tier workload allocations appropriately, so that they are slightly above peak processing levels.  If this becomes a problem for you, then re-adjust the allocations of the higher priority workloads to make sure they are always entitled to the maximum they will ever need.   Your bottom three assumptions are correct, but as I said, not likely to happen with correctly tuned setup.  Plus they rest on the assumption that query2 can actually consume more than its allocation at that exact point in time.  But if it can, it will, as long as the WDs above it have maxed out their allocations.

Thanks, -Carrie

Enthusiast

Hi Carrie,

Thank you for your elaborate response. I have some more questions:

1. In sles 10 tasm, percentages were allocated to allocation groups that may contain 1 to many workloads. For eg: AG1 may have 5 wd's in it and it carries an allocation% of say 20%. This will ensure that queries running in these 5 wd's will consume a maximum of 20% on a busy box and their consumption can go beyond 20% if the box has idle CPU cycles. As I migrate the ruleset to sles 11, and if I choose to put all these workloads in slg tier 2 (slg tier 1 will have all amp tactical and tactical tier will have single/group amp tactical), I would need to allocate each of these 5 wd's a shared percent value. Should I achieve this by monitoring each wd's peak usage level and then assigning a shared percent value? If yes, then a simple hourly CPU consumption by workload over a historical range of time should get me that result, right?

2. Regarding the SLG tier 1 workloads and expediting them, it is recommended to ensure that the concurrency limit of the slg tier 1 workloads and the tactical workloads does not cross the upper limit of 20 reserved AWTs. Can you elaborate on this limit of 20? I'm not sure how this number was calculated.

3. What if we have some unimportant higher concurrency wd's in LOW and some important wd's with lower concurrency levels in TOP? Is this necessarily an inefficient practice (the reason being that the queries classified in low will spend more time on the system, thereby causing contention, because their access rate will be 1/8th of a query in a TOP wd)? Would it be a wise decision to reclassify the high concurrency wd (even though it holds less importance to us) to TOP or HIGH so that they can quickly get in and out with the high cpu access rate and later free up cpu cycles for other important wd's?

4. what is the advantage of choosing to expedite tactical and slg tier 1 all amp tactical work without actually reserving any AWTs? just the expedited IO queues right?

Many thanks in advance for your responses.

Teradata Employee

1.   Just a point of clarity, in SLES 11 the allocation is to the workload, not a group of workloads as in SLES 10.   To decide the appropriate allocation (aka share percent), look at peak CPU usage for the workload in SLES 10.  Then make sure the allocation you give for the workload will allow it to get that level of resources  in SLES 11. 

To understand what level of node resources a workload will be offered when the workload is on SLG Tier 2, multiply the workload's allocation percent by the allocation of all parents in the hierarchy above that workload.  In this case that would be the Remaining workload on SLG Tier 1 and the virtual partition allocation percent.

2.  The limit of 20 on reserved AWTs is a Viewpoint convention, to prevent a user from over-allocating reserve pools for tactical at the expense of other active work.   AWTs for the new reserve pools come from the general pool of unassigned AWTs.   The reasoning is that if you require more than 20 AWTs to be reserved, you are likely expediting non-tactical work, and the recommendation would be to un-expedite some of those workloads that may be of a less tactical nature.

3. Each query in Timeshare Top will get 8 times the resource of any query running in Timeshare Low, it doesn't matter what the concurrency levels are.  No matter how few the queries in Top, they will always run more efficiently than the queries in Low.  We recommend that you place workloads in Timeshare based on priority.

Why don't you wait and see how it actually works for you when you get on SLES 11.  I would prefer not to get too deeply into hypothetical scenarios.

4.  See the blog posting titled:  Expedite Your Tactical Queries, Whether You Think They Need It or Not.  

It would be a little easier for me to respond to comments if they could be kept down to 1 or 2 questions at a time.  If you are seeking in depth knowledge on a topic, Teradata training classes, Teradata Education Network webinars, orange books, and official publications might be more suitable sources to turn to.

Thanks, -Carrie

Enthusiast

Thanks Carrie for your responses.

-Suhail

Teradata Employee

Hi Carrie,

Thank you for this article. I have a question about workload hard limits:

per KAP314ACE6,

<http://pc02.td.teradata.com/__8525621800464274.nsf/0/98724B709BDEF31685257C07006E9184>

we can see that the WM COD and Virtual Partition Hard Limits functions for SLES11 will be available from 14.10.03.XX.

I need to know if workload hard limits will also be available from 14.10.03.XX. Could you please tell me if you know?

Best Regards,

Yanmei

Teradata Employee

Virtual partition hard limits and workload level hard limits in SLES11 will be available at the same time a WM COD is available.  All three levels of hard limits are bundled into a single feature in SLES11.

Thanks, -Carrie

Teradata Employee

Hi Carrie,

Thank you very much.

Best Regards,

Yanmei

kvz
Enthusiast

Hi Carrie,

On what basis are queries classified to the Top, High, Medium and Low workloads in the case of Timeshare?

Teradata Employee

A query is mapped to a workload, whether the workload is in Timeshare or not, based on the classification criteria that the workload has been given.   If a query matches the classification criteria of a workload that is in Timeshare Top, that is where the query will run.  If it matches the classification criteria of a workload in Timeshare Low, that is where it will run.  So the characteristics of the query, as well as logon information for the session that the query belongs to, are used by TASM to determine which workload to use.

There is an entire chapter in the TASM orange book that discusses workloads and how workload classification works.  That would be a good source for background information on this topic.

Workload classification works the same whether you are in SLES 10 or in SLES 11, and whether you are using TASM or TIWM.

Thanks, -Carrie

kvz
Enthusiast

Thanks for the response, Carrie.

I don't have a link for the orange books.

What are classification criteria and which factors do they consider?

Teradata Employee

Orange books are available several ways:

If you are a Teradata associate you can find them in the SharePoint orange book repository.  There are several tools available in Teradata to search for orange books, or you can ask your co-workers or manager how to obtain them.

If you are a Teradata customer or are working with a Teradata customer, orange books are available in the Teradata At Your Service orange book repository.

There is some discussion about TASM classification in the Teradata Viewpoint User Guide manual in the chapter on Workload Designer, the section titled "About Classification Settings".

There are also several courses in Teradata Education Network that describe the basics of TASM, including classification criteria.   This would be a good place to get more information about the basics.

Thanks, -Carrie

kvz
Enthusiast

Thank you very much Carrie !

Enthusiast
Hi Carrie,
very interesting and clear!  Can you help me understand the relationship between workload assignment to Timeshare levels and the priority inherited from a PROFILE definition of the account string?
Let us consider two queries in two sessions for distinct users, both being classified into a LOW Timeshare group according to the user name:
one with account $H and the other with account $L: how is the "accounting" priority treated?  Inside the LOW Timeshare group?  Or ignored altogether?
Thanks,

Pierre
Teradata Employee

Pierre,

The mapping of a query to a workload will be determined by two things:

1.  The classification criteria defined on the workload

2.  The session and query characteristics

If the Timeshare Low workload contains only user classification criteria, then only the user information of the session that submits the query is considered, and the account string is ignored (at least for that workload).  A workload definition would have to include classification criteria on account in order for the account of a session to make any difference for which workload the query uses.  

If you have two workloads that a given query could map to, one with classification on user and another with classification on account, the Workload Evaluation Order in TASM will determine which of those workloads the query will use.   When it attempts to find a workload for a query, workload management code examines all possible workloads in workload evaluation order, and matches the query to the first workload it finds where the classification criteria are satisfied.

This is explained in more detail in the TASM orange book in Chapter 3 on Workload Basics.  

Best regards, -Carrie

Hi Carrie,

Nice and detailed explanation, thanks.

Does SLES 11 consume more system CPU?  We recently moved to the SLES 11 priority scheduler.  After the migration, we see a good amount of increase in CPUUSERV (system CPU).  Earlier it was around 9 to 12% of total CPU and now it is around 20 to 32%.  Because of the increase, the CPU utilization of the system is always above 95% even though the execution CPU is well under 65%.  I went through one year of history data and I haven't seen more than 15% system CPU at any time on my system.  What could be the possible reason for this?  Does it run any additional services in the backend or kernel?  Has anyone else experienced the same?

Teradata Employee

Amir,

It is difficult to quantify CPU consumption changes between SLES 10 and SLES 11 because the accounting is completely different in SLES 11.   

But in general, most sites experience some increase in CPU usage, much of it in "system".  As with all new versions of software that add improved features and greater value, you will see a slight uptick in CPU in SLES 11 just to account for the extra work being performed.   There is, for example, an additional cost to the improved accuracy and granularity in keeping track of resource usage, and to the stronger enforcement of priorities.   The new operating system accounting and scheduling is orders of magnitude more precise.

In addition,  many of the small things that get done in and by the operating system have been moved to different places in the code in SLES11.  SLES11 is a complete re-architecture, and many things that ran somewhere else, now get bucketed into "system".  And there are a number of file system background tasks that run to a greater degree in system in SLES 11 compared to SLES 10.  So a comparison of that metric before and after is unfortunately apples to oranges.     

Thanks, - Carrie

Teradata Employee

Hi Carrie,

I have one doubt.  We are on TD 14.10 and SLES11.  Currently, if a query is running on step 7, which usually takes 30 min to complete, and the DBA team manually changes the workload of that query from Batch Long (low priority Timeshare) to Batch Short (SLG Tier 1) to complete it faster, will the change take effect immediately during the long-running step 7, so that the query starts receiving CPU & IO at high priority for that ongoing step in accordance with the newly assigned workload?  I do see the change in workload from Batch Long to Batch Short in ViewPoint for that particular step, but execution-wise the step took the same time.  (The system was busy during that time, but queries with low priority workloads were running on it.)

Can you please explain?

Teradata Employee

Sachin,

If you move a query to a different workload in the middle of a step, the new priority will take effect immediately.   If you don’t think giving a higher priority to the query is as effective as you would like, there are a couple of things you can check:

1.  Check the global weight of both the Timeshare workload the query came from and the SLG Tier workload it was moved to.  I have seen several cases where the SLG Tier workload was not given a high enough allocation percent, and actually ended up with a lower run-time priority than a workload in Timeshare.  Although it is unlikely for an SLG Tier workload to have a lower global weight than a Timeshare Low workload, if Batch Long were the only workload active in Timeshare, it would get all of the resources that flow into Timeshare from the Remaining above, and so Batch Long could be running at a higher priority than the SLG Tier workload.  The orange book explains how to calculate global weight, and there is another posting on my blog explaining global weight.

2.  Check concurrency within the SLG Tier workload.  SLG Tier workload allocations are shared among all active requests within the workload.  So expect a query to run much faster if it is the only request active there, and slower if there are 10 or 20 other active queries in the workload (see the small sketch below).

If this is a chronic problem, you could increase the allocation percent of the SLG Tier workload so it has a higher priority.  Or reduce the concurrency of work running there, by adding a workload throttle.
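
To make point 2 concrete, here is a small, hypothetical sketch (the allocation value and the even split are illustrative; actual consumption also depends on demand and on other resources):

    workload_allocation = 0.25    # hypothetical node-level entitlement of the SLG Tier workload
    for concurrent_queries in (1, 10, 20):
        per_query = workload_allocation / concurrent_queries   # roughly even split among active requests
        print(f"{concurrent_queries:>2} active queries -> about {per_query:.1%} of the node each")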

Thanks, -Carrie

Teradata Employee

Thank You for your explanation, really appreciate it.

During the batch window the SLG Tier has 85% global weight, with all 4 workloads in SLG Tier 1 and no other levels present; there are no Tactical workloads defined in the system, so everything goes to SLG and below.   We suspect that whenever a large volume of reporting queries classified under Timeshare runs on the system along with batch jobs, the batch jobs get delayed.  Queries running in Timeshare are short reporting queries and some of them take 1-2 min to complete, though some of them are skewed.

Can concurrency or skewness in low priority Timeshare impact higher priority workloads?  We verified that the high priority workloads are not waiting on AWTs for long periods.

Teradata Employee

Sachin,

Are you saying that all 4 workloads on SLG Tier 1 in combination have a global weight of 85%?  And that Timeshare workloads will share a global weight of 15%?

And can I assume that all the batch work (which is high priority at this time) is running in SLG Tier 1?

Assuming that is the case, then CPU will be allocated (offered) to the SLG Tier workloads based on each workload's individual allocation percent.  CPU allocations align strictly with the setup that has been defined.

But being offered CPU and having that CPU be consumed are two different things.    Contention for other resources could hold back SLG Tier workloads from consuming all the CPU they are offered.   And while I/O is prioritized, if any of the work running in a workload is I/O-intensive, its CPU usage patterns will not align as closely with what its allocation percent entitles it to.   That is because I/O is issued with system throughput as a priority, so things like merging I/O requests (that are close to each other on disk) that come from high and low priority workloads take place, diluting priority differentiation.   In addition, if there are different speeds of disk in the system, an I/O against SSD or hot storage will get serviced faster than one against cold storage.  And in order to exhibit any I/O priority difference, I/O requests must be targeting the same disk; otherwise I/Os can be issued simultaneously.

See Section 5.4 in the I/O chapter of the priority scheduler orange book for more detail on how I/O contention may make a workload look like it is not adhering to the prioritization scheme.

I would guess most of the batch work is I/O-intensive.  If so, that will make consumption of CPU as a measure of priority effectiveness less useful.  

With more concurrency in Timeshare, there could be greater contention for I/O, and that could be a factor.   If skew is very great, that can also make clear prioritization a little more uneven.   If one AMP is congested, and others are not, then those other AMPs will slow down in their CPU consumption, and priority differentiation will be more difficult to detect.

Thanks, -Carrie

Teradata Employee

Thank you Carrie, I will go through the orange book's I/O section.

N/A

Hi Carrie,

Can you please briefly explain the purpose of SLG in Timeshare workloads on 2800 appliances?

Thanks

Teradata Employee

Setting a service level goal (SLG) is an option within workload management.   You are not required to set a service level goal for a workload, and it may not make much sense for most appliance workloads, especially for Timeshare workloads.  Many sites do not set service level goals on any workloads.

Below is some text from the TASM orange book section on Service Level Goals.  The same points apply to the Appliance using TIWM.  

Thanks, -Carrie

==========================    

Some workloads require a goal to reach critical performance objectives, whereas other workloads require no goal because their performance levels are mostly irrelevant.

In general, it is good practice to establish SLGs for the important workloads, but especially the tactical workloads. TASM helps to establish a goal-based-orientation not only by encouraging you to set goals, but also by helping you establish and evolve those goals so that they reflect the needs of the business. SLGs, and how actual performance compares to those goals, are communicated clearly in the Workload Monitor portlet and can be a subject of or a column in data mining exercises on the DBQL and TDWMSummaryLog tables.

SLGs are measurable. They can be set on either of the following:

•             Response time at a particular service percent (e.g., 2 seconds or less 99% of the time)

•             Throughput (e.g., 1000 queries per hour)

To maximize the effectiveness of SLGs, they should be realistic and attainable, as well as supporting the business and technical needs of the system. But when SLGs have never been set for a workload, it is difficult to know the value that best represents the business and technical needs of the system. So how do you determine the right value for the SLG? Several approaches can be taken, as follows, but keep in mind that the SLG may evolve over time as needs change or knowledge increases.

•             Known business need – For example, a web application is used by many demanding but inexperienced users. Experience has shown that users kill and restart a request if it does not respond within 5 seconds, further aggravating the peak load situation that is causing their slow response times in the first place. This customer established a SLG of 4 seconds to avoid the aggravated demand.

•             Unknown need – For example, an important application currently has no established response time goal, and therefore user satisfaction has been difficult to measure. They know when things are bad based on an increase in user complaints to IT, but they do not necessarily know what response time point triggers the dissatisfaction. Consider drawing an initial “line in the sand” based on typical actual response times obtained (either equal to or, for example, up to twice the typical actual). Once that initial goal is set, measure and monitor SLGs, adjusting as necessary and as determined by cross-comparing the SLG vs. complaints or business targets missed.

N/A

Thanks  Carrie :)

I have already referred to this article, but my understanding is not clear.

So when we set an SLG on a workload, especially on Timeshare workloads, how will the resources be allocated to Timeshare?

There are 4 Timeshare categories T, H, M, L which share the resources 8:4:2:1 (I referred to your other article on how the resources are allocated to each request in each Timeshare workload:

https://developer.teradata.com/blog/carrie/2014/03/how-resources-are-shared-in-the-sles-11-priority-...)

For example, a Top Timeshare WL has 1 request which gets 8 times that of 1 request in Low Timeshare. If an SLG is applied to the Top Timeshare workload, will extra resources be allocated to reach the goal, which could be greater than 8 times?

excuse my typos

Thanks,

Naga

Teradata Employee

Hi Naga,

A service level goal (SLG) is only for reporting purposes, so you can compare actual performance against your goals.  It is optional.  It sounds like you were under the impression that assigning an SLG to a workload would result in a change to that workload's resource allocation.  But that is not the case. 

If you choose to assign an SLG to a workload in Timeshare, that will not change how resources are allocated to the workload.  A request running in a Timeshare Top workload will always get offered 8 times the resource of any request running in Timeshare Low, and 4 times what any request in Timeshare Medium receives, whether or not it has an SLG defined.   It doesn't matter to priority scheduler if a Timeshare Top workload is not meeting its SLG.  The priority scheduler is not even aware that an SLG has been set for that workload.

From the TASM orange book:

SLGs, and how actual performance compares to those goals, are communicated clearly in the Workload Monitor portlet and can be a subject of or a column in data mining exercises on the DBQL and TDWMSummaryLog tables.

Thanks, -Carrie

N/A

Thank you Carrie

Teradata Employee

Hi Carrie,

I am trying to understand the pros and cons of Virtual Partitioning (VP), specifically Dynamic Virtual Partitioning.  We have a shared system where two sister companies share the cost with a split of 30% and 70%.  We are at 15.10 and have TASM.  I understand that a VP is the top level in the hierarchy under the internal control group 'users'.  I want to understand how VP can help us manage resources better within a multi-tenant system.

We want to be able to assure that Company S, which should have a 30% share, is guaranteed at least 30%.  We don't want to use fixed VPs since that would impose a hard limit on each company even when the other company does not have a heavy workload at some point in time.

1. Let's say we have two dynamic VPs with a 30:70 split (for company Sm and Bg respectively).  Assume that Company Bg is using 10% of the system, and Company Sm is using 30% of the system.  My understanding is that in this case Company Sm can exceed the 30% limit and consume more resources because Company Bg is not using its full partition share.  Let's say that while Company Sm is running at 30% of the system, it issues a query that ends up needing an additional 20% of the system, so now Company Sm is using 50% of the system CPU.

2. Further, while Company Sm is running this long-running, high-cost query, someone from Company Bg wants to run some expensive queries.  What happens in this case?  If Company Bg's query is a high CPU query (say it needs an additional 40% of CPU), then does Company Sm's query immediately get fewer access cycles and thus free up resources above 30% for Company Bg?  I'm not sure whether TASM demotes the Sm query and whether the effects are immediate (as in a few seconds) or not.

Or does Company Bg have to wait until Company Sm's query is finished before getting CPU cycles to consume ?

Thanks,

Amit

Teradata Employee

Amit,

You are correct in your basic understanding of how dynamic virtual partitions work.   I also agree with you that the dynamic sharing of resources between virtual partitions is preferable to the fixed approach.

With dynamic virtual partitions, Company Sm can use more CPU than its allocation percent of 30% entitles it to at any point in time, if Company Bg doesn't have enough work active to consume its allocated 70%.  However, as soon as Company Bg has enough active work to use its allocated 70%, it will get that CPU immediately.   It won't have to wait for a Company Sm query to complete or even wait a couple of seconds for resources to be moved over.

The SLES 11 priority scheduler is built on the SLES 11 operating system, which immediately adjusts its resource allocations when new work enters the system that is of a higher priority, or that belongs to a previously under-utilized virtual partition or workload.   SLES 11 does accounting and makes resource allocation decisions in the nanosecond range, so you can expect that the CPU cycles will become available to the arriving Bg query very close to immediately.
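
For illustration only, here is a small, hypothetical sketch of that dynamic sharing (the helper and the demand values are made up, and leftover capacity is handed out greedily here for simplicity):

    def vp_usage(allocations, demands):
        # Each VP first consumes up to its own entitlement; demand beyond that is
        # satisfied only from capacity the other VPs are not using at that moment.
        used = {vp: min(allocations[vp], demands[vp]) for vp in allocations}
        spare = 1.0 - sum(used.values())
        for vp in used:
            extra = min(max(0.0, demands[vp] - used[vp]), spare)
            used[vp] += extra
            spare -= extra
        return used

    # Company Bg only needs 10% of the node, so Company Sm can run at about 50%:
    print(vp_usage({"Sm": 0.30, "Bg": 0.70}, {"Sm": 0.50, "Bg": 0.10}))
    # When Bg's demand rises to its full 70%, Sm falls back to its entitled 30%:
    print(vp_usage({"Sm": 0.30, "Bg": 0.70}, {"Sm": 0.50, "Bg": 0.70}))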

Thanks, -Carrie

Junior Supporter

Hi Carrie,

We have two virtual partitions - prod and non-prod.  Prod has all prod IDs and non-prod has all UAT/Dev IDs.  In the resource partition, I see that 85% is given to prod and 15% to non-prod.  In the workload distribution, I see that we have only tactical and timeshare defined.  In timeshare I see workloads classified into top, high, medium and low.  But I don't see the UAT/Dev workloads mentioned there.  Is this possible, or am I missing anything here?  So, how is it defined how they get resources?  Moreover, if 15% is allocated to them, does it mean that if they are not running any queries, that percentage can be used by the prod user IDs?

--Samir Singh

Teradata Employee

Samir,

Just to be sure that I understand what you mean when you say "In workload distribution, i see that we have only tactical and timeshare defined":  are you talking about the Prod or Dev virtual partition?

The Workload Distribution screen only shows one virtual partition's workloads at a time.   If your question is how to see the workloads from the other virtual partition:  there is a down arrow next to the virtual partition name near the top of the Workload Distribution screen which will allow you to select the other virtual partition's workloads.

In addition, if you click on the "System Workload Report" button on the top right side of the Workload Distribution screen, you can see the global weights that will be distributed to each workload within each virtual partition.   There is a blog posting that explains what global weight is and how to find it in Viewpoint:

http://developer.teradata.com/blog/carrie/2015/04/global-weights-in-sles-11-priority-scheduler

Sharing of resources among multiple virtual partitions works the same as sharing resources among multiple workloads.  If the queries that run in one virtual partition are not using their allocated resources, those resources will be available to be used by work running in the other virtual partition.  The orange book titled "Priority Scheduler for Linux SLES 11" provides more information on virtual partitions.

Regards, -Carrie