New, Simplified Approach to Parsing Workload Selection

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

As of Teradata Database 14.10.05 and 15.00.03, a simplified approach to determining which workload will support session management and parsing has been adopted  This posting describes this more straightforward approach which will be used starting in these 14.10 and 15.0 releases, and going forward in all future releases.

 

An earlier posting titled:  “A Closer Look at How to Setup a Parsing-Only Workload”  explains how session and parsing workload assignment takes place prior to this more simplified approach.  If you are on an earlier release, please see that posting from December 2014.

 

Background:  Session Classification

 

“Session classification” happens at session logon time.  Session classification determines which TASM or TIWM workload will be used by tasks on the PE that get the session established before any queries are issued.  These parsing engine tasks do things such as validate the user who is logging in, and they execute under the control of a workload, just like AMP work does.  Once a session begins to submit queries, all parsing and optimizing work on behalf of the queries will run at the session priority that has already been established for this session at logon time.

 

The workload that a session classifies to can be identified in DBQLogTbl in field SessionWDID.  The workload used for session handling is the same workload that will support all parsing and optimizing activity for the queries that are executed within that session.

 

How a Session Classifies to a Workload

 

A session classifies to a single workload, based strictly on the “Who” classification criteria (referred to as Request Source criteria in Viewpoint) of the workload.  Information about a session is available at session logon time and includes Who classification criteria such as Account string, Application, Username and Profile.

 

 A workload often has other classification criteria in addition to Who criteria.  All other non-Who classification criteria that the workload might have (such as estimated processing time) are ignored at session classification time. That non-Who criteria will be used only to determine which workload is chosen when a request executes on the AMPs.

 

TASM and TIWM determine the workload that will support session management (and secondarily, request parsing for the session’s requests) by taking the following steps:

  1. Sort all workloads by priority (highest priority workloads first).   In SLES 10 priority is expressed by relative weight.  In SLES 11 priority is based on tier first, and within each tier the highest global weight.  If there are multiple virtual partitions, all Tier 1 workloads will sort above all Tier 2 workloads, etc.  In Timeshare, all Timeshare Top workloads will sort above all Timeshare High workloads, etc.
  2. Select the first workload encountered from within the sorted list where there is a match between the session logon information and the Who criteria of the workload.   This selection process does not take into consideration the number of matches between a session and a workload’s Who criteria. The first workload that has its Who criteria matched by the session information will be selected.  Note that if there are multiple Who criteria on the workload being considered, the session must match all of those criteria for the workload to be considered for session management and parsing.

Below are two examples that illustrate how this selection takes place:

         

In the example above, all sessions from User A will classify to WD-Quick.  WD-Quick is the highest priority workload among all workloads that User A will match to.  In addition, WD-Quick has only a single classification criteria—on User.   WD-Quick won’t be tested against a session’s Account because parsing workload selection only needs to match to the workload’s defined Who classification criteria, in this case User.

 

In this second example, all sessions from User A and Account YY will match to WD-Slow.  Such sessions will not match to WD-Medium because WD-Medium requires both User = A and Account = XX to be satisfied.

 

In summary, the highest priority workload is selected for parsing where there is a match on the workload’s defined Who criteria, as long as there are no mismatches between other session characteristics and other Who criteria on the same workload. 

 

Setting up Special Parsing-Only Workloads

 

With this background, you can set up one, several or even many workloads specifically for parsing if you wish.  Or you can just use the existing workloads and observe the information in the SessionWDID field in DBQLogTbl output using this new understanding to make better sense of the priority at which parsing is taking place.

 

If you decide to set up a single higher-priority workload for all parsing activities for all requests, then you will want to create a new workload and give it a very high priority.

 

To make sure this workload is chosen to perform parsing for all requests, consider including a Who criteria (such as Username, or Account) using the broadest possible matching scope (Username = * or Account = *).  Only a single wild card Who classification criteria is required because the selection is not based on the number of criteria matches.  

   

It is recommended that Profile classification criterion not be used in defining a parsing-only workload, unless all users accessing the platform have an actual Profile assigned.   When there is wild card classification on Profile (Profile = *), any session that logs on without providing a Profile will be disqualified from the workload.

 

A Second Important Step

 

Because you only want parsing work to run in the high-priority parsing-only workload, and you are using wild card Who criteria, an important second step needs to be taken so you can prevent AMP work from running there (it is a parsing-only workload after all).  You must put in dummy secondary classification for that parsing-only workload so that no query will ever successfully map to the parsing-only workload for its AMP work.  

 

For example, you could create a special dummy table just for this purpose that you are confident no one actually accesses. Then specify that table in the parsing workload’s secondary classification criteria.  Or you could setup multiple criteria that are contradictory and therefore highly unlikely to ever be represented together in a query.  For example, you could add query characteristic criteria that mandates that only single/few AMP queries will run in the workload, and at the same time include very large plan estimates, as shown in the following example.

 

Parsing_Only_WD_New2.jpg

Establishing a parsing-only workload is not going to reduce the resources required to do parsing.  If the plan is complex or many decisions have to be made by the optimizer in producing the plan, the same number of CPU seconds will be required to accomplish parsing. However, a high priority parsing-only workload may speed up the time to do that parsing when the system is busy, because parsing may now be running at a higher priority.

 

You could consider placing this parsing workload on the SLES 11 Tactical Tier, but be very careful in doing that.   If some of the parsing activity consumes more than 2 CPU seconds per node, the parsing task risks being demoted to a lower priority workload due to the automatic tactical exception that all tactical workloads come with.  If you wish parsing to take place at a tactical priority, consider increasing the CPU threshold for demotion in the parsing-only workload’s tactical exception definition, so that it is greater than any expected parser CPU times.

 

Of course, as is true with all workload management decisions, you need to examine the tradeoffs involved.   When you increase the priority of one type of work, by definition you reduce the priority of some other work.  So make sure you are keeping an eye on the overall balance of priorities on your platform.  As a result, you may want to only apply this parsing-only workload technique to just a critical set of queries, rather than all active queries.

13 Comments
Teradata Employee

Hi Carrie,

Excellent article quick and useful.

Two questions how does Teradata normally do the same operation absent this technique?

And how can I measure the benifit I get after implementing this, will it it be purely in terms of shortened elapsed times or are there other metrics to consider as well?

Thanks

Teradata Employee

There's an earlier posting that explains the earlier approach to how the parsing workload is selected:

http://developer.teradata.com/blog/carrie/2014/12/a-closer-look-at-how-to-set-up-a-parsing-only-work...

If you haven't set up a parsing-only workload, parsing will still take place based on the highest priority WD that matches on Who criteria as described in the section "How a Session Classifies to a Workload" in the text above.  In the second example illustrated in that section, it could be that the WD-Medium does not have a very high priority compared to other workloads.  So if you have not set up a parsing-only workload (which you may be just fine not doing), the priority at which parsing runs may be high sometimes, low sometimes, depending on your workload management setup. 

The main benefit of using this approach over the previous approach is that if you do set up a workload solely for parsing priority, you don't require multiple wild card Who criteria on that workload, as you had to do with the earlier approach.   Previously, you had to keep adding wild card Who classificationcritieria to a parsing-only workload if you had other workloads of a lower priority that had more Who criteria than the parsing workload had.  That is because the old approach selected the highest priority workload that had the greatest number of Who criteria among the workloads that were matched.  If you did not do this, then some queries did not pick the  intended workload for parsing.  So the old way, you had to make your parsing workload's scope artificially high.  Not everyone did this, so parsing prioritization was not consistent.

The performance advantage of this new approach is likely to be small, subtle, for most sites,  as parsing generally uses very little resources.  But it could be more noticeable in cases where you had many queries that required multiple seconds of CPU to do the parsing, and the contention for CPU was high, and the parsing had previously been running at a low priority.  You could experience faster parsing in cases such as that, and under some circumstances that could reduce the total execution time of a query, especially short-running queries.   Especially since the parsing work is run on a single task and cannot be parallelized as can AMP work, and is competing against an increasing number of AMPs on the node for resources.

Thanks, -Carrie

Enthusiast

Hi Carrie,

assuming that DBQLOGTBL captures the whole logging of the activity, what is the proven rule to distinguish session rows with a specific parsing WDID ?

everytime SessionWDID is different from WDID (or Final WDID) ?

and with parserCPUtime > 0 ?

Thanks,

Pierre

Teradata Employee

Pierre,

Each DBQLogTbl row represents an entire request, parsing and AMP work combined.   There are no "session-only rows" in DBQLogTbl, there are only "request rows" (assuming I am understanding your terminology correctly).  Within each request row in the DBQLogTbl you can identify which workload supported the parsing activity for that request by looking at SessionWDID.   You are correct that SessionWDID is often a different (but sometimes is the same) workload than WDID or FinalWDID fields.  They can be different bbecause the AMP work only is represented in the WDID and FinalWDID fields and that workload is selected based on the request matching all of the classification criteria of the workload.   As this above posting describes, the SessionWDID is selected differently than the WDID.  It is selected only based on Who criteria, and then the highest priority among all workloads that match the request's Who criteria.

Please get back in touch if I have mis-understood your question.

Regards, -Carrie

Enthusiast

Carrie,

I do agree on everything about rows/requests considerations. About Session WDID and WDID of the request i can't say that they often differ,

just it happens on some users.

About identification of specific parsing workload  i have seen cases where SessionWDID  and WDID are the same and ParserCPUtime is very much higher than AMPCPUtime ! (for instance a ParserCPUtime of 5 seconds compared to an AMPCPUtime of 0,13 seconds).

An other case i can not explain (i do not know): requests with  SessionWDID identified, WDID is null, numsteps = 0 , ParserCPUtime > 0 and AMPCPUtime = 0.

Should those lines be included in a total parsing CPU time for the session they belong to ?

Thanks,

Pierre

Teradata Employee

Pierre,

The number of CPU seconds used for parsing compared to the number of CPU seconds used for AMP work will depend on the type of query that is executing.  I have seen examples similar to what you have described below, but that is not necessarily a problem.  Sometimes there is a very complex query that requires a lot of CPU for the plan to be built, but only executes on a very small amount of data so uses very little CPU on the AMPs.  But that is not related to the priority of the workoad where the parsing takes place.  The request will used 5 CPU seconds for parsing whether it runs in a high priority workload or a low priority workload.   The reason a parsing-only workload can be useful is that it offers a potential way to reduce the time it takes to do all that CPU work, but running parsing under a higher priority workload.

I have also seen cases where there is SessionWDID but no WDID, along with the other conditons that you describe.  The most common situations where this will happen if the request is a HELP (HELP DATABASE, HELP STATS, etc.) or a SHOW (SHOW TABLE, SHOW JOIN INDEX, etc.) or any other similar command that runs primarily on the parsing engine.

Most of those types of commands never go to the AMPs, so do not accumulate AMP CPU time and they do not classify to a WDID.    Maybe you can check the QueryText column in DBQLogTbl to see if that is the case for your examples.

DBQL reports parsing CPU time and AMP CPU time for a request, each line representing one request.  So when you see a line of output with no WDID it does not mean that CPU usage is not coming from a request, it just means the request didn't go to the AMPs.  

Thanks, -Carrie

Teradata Employee

Hi Carrire,

Thanks for this Blog.

My DB version is 14.10.05.03 and we have TASM on this system. We created one workload for Batch in SLG layer. When i check the fastload session in viewpoint on workload monitor/Query monitor, it shows worload information blank. I check DBQLogTBL for WDID,FinalWDID and SessonWDID and found some of the sessions (For APPID TPTLOAD) have WDID and FinalWDID as blank ( But thes query text for these sessions are like help database/help session etc. Some Aborted Select Queryies with error 'table does not exist' also have the same issue)

My Question :

Fastload in viewpoint shows workload information blank but it will not execute these queries (help session/help database etc.) all the time. It should show fastload  information in viewpoint under which workload this fastload session will execute but we are getting workload information as blank through fastload session life then What will be th priority of this fastload session ?

Thanks in Advance.

Teradata Employee

Sandeep,

I am not sure that I have understood your question completely.  If you question has to do with commands like SHOW and HELP that do not show a WDID in DBQL, that is because those type of commands are usually issued as express requests from the parsing engine.  Since they do not go to the AMPs, they do not require classification to a workload and do not have a workload assigned.

There's an explanation and an example of this in my Teradata Magazine column from 2nd quarter 2015 that illustrates this behavior of HELP and SHOW commands:

http://www.teradatamagazine.com/v15n02/Tech2Tech/Technical-Insights-About-the-Teradata-Database/

If your question is mainly about what Viewpoint does or does not display, I am not able to help you with that.  You will need to ask someone who is knowledgeable about Viewpoint.  But if there is no WDID assigned to the requests, because of the explanation provided above, there would be nothing for Viewpoint to display.

Thanks, -Carrie

Teradata Employee

Thanks Carrire.

I think i should raise this concer in Viewpoint Area. Well My question is as:

Workload Information is blank for fastload session in Viewpoint. When we check session in viewpoint query monitor, its shows Workload info also in overview tab like under which workload this session is running. But for fastload session this information is blank.

Thanks for your response :)

Teradata Employee

I agree with Sandeep, i too see similar behaviour.

Question: Why there is no workload information present for Fastlaod sessions in ViewPoint.

 Workload Info section is Blank for all the session in the same LSN of Fastload Job, i do see Delta CPU and IO values changing.

Teradata Employee

Sachin,

I am not familar with that situation.  You may want to open an incident with the support center, or ask your question of someone with expertise in Viewpoint.

Thanks, -Carrie

Enthusiast

Carrie,

Thanks for greate articles / blogs on DBQL. I have a situation that is somewaht disscussed above but not matching with my case.

Here is my example

wdid=? sessionwdid=41 and statement=SELECT. I do see delay times in these conditions. By readig above, I understand that if WDID is null and sessionWDID is not null then they are not AMP operations (like HELP, SHOW) but in my situation it is a SELECT. I am trying to fix this issue to avoid delay times.

Any help appreciated..

Thanks

Suresh

Teradata Employee

Suresh,

Could this be a PREPARE statement, if it is coming from ODBC.

Based on ODBC settings, queries will get submitted twice. The first time for parsing and second rime for actual execution. You can recognize Prepare queries because they have a WDID as NULL and AMPCPUTime with zero, but will have corresponding ParserCPUTime.  Queries which represent the actual execution will have a non-NULL WDID and will show AMPCPUTime.

If the request doesn't go the AMPs (as evidenced by having NULL in WDID), then a throttle could not delay that request.  Throttles are enforced at the dispatcher level before the first step in the request is sent to the AMPs, but requests with WDID of NULL never pass through the dispatcher module.

There is more information on this and the possibilites of unexpected delays on the blog posting called:   Things That Run In Session Workload.

http://developer.teradata.com/blog/carrie/2015/11/things-that-run-in-the-session-workload

Thanks, -Carrie