Workload Management with User Defined Functions and Table Operators


How does TASM enforce rules such as classification and workload exceptions against User Defined Functions?  What about table functions and table operators, some of which do their work outside of Teradata?  How far can you rely on TASM classifications using estimated processing time in these cases?  Will there be accurate resource usage numbers reported to support workload exceptions on CPU or I/O?

These are some of the questions that need answering if you are extending the use of simple or more complex user-defined functions on your platform.

Simple user-defined functions (UDFs) are database objects that can extend the capability of normal SQL within the database, one row at a time. Table functions and table operators are special, more complex types of UDFs that manipulate multiple rows and can extend outside the Teradata Database.   Once compiled, a UDF can be referenced in SQL statements for activities such as enforcing business rules, reading a queue file, aiding in the transformation of data, or accessing foreign servers such as Hadoop.

Let’s consider straightforward (scalar) UDFs first, then table functions and table operators.

Scalar UDFs and Workload Management

A scalar UDF is the simplest form of a UDF.  It is referenced like a column and operates on the values of a single row. Scalar UDFs are ideal for managing events and standardizing operations inside of the Teradata Database. Here are a few characteristics of scalar UDFs as they relate to workload management.
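To make that concrete, here is a minimal sketch of how a scalar UDF is defined and invoked; the function, table, and source-file names are hypothetical:

-- Register the UDF (names hypothetical; strip_spaces.c holds the C source).
CREATE FUNCTION mydb.strip_spaces (InStr VARCHAR(100))
RETURNS VARCHAR(100)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'CS!strip_spaces!strip_spaces.c';

-- Referenced like a column, and executed once for each qualifying row.
SELECT cust_id, mydb.strip_spaces(cust_name)
FROM mydb.customer;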

Use of AMP Worker Task

AMP worker tasks (AWT) are required to accomplish work that is sent to the AMPs.  However, whether or not AWTs will be required by a UDF will depend on which mode it is operating in (the mode can be switched with ALTER FUNCTION, as sketched after this list):

  • Protected mode:  If the UDF is running in protected mode, it uses a separate server process that is set up for the purpose of executing the UDF.  When the step that contains the UDF is executed, the UDF running in the AWT will grab the UDF server process and execute the UDF within that context.  When the UDF completes its processing for that row, it releases the protected mode server process to be used by other transactions that have a UDF.
  • Non-protected mode:  When not in protected mode, the UDF is running in the context of the AWT being used by the query step.  There is no additional AWT involved. 
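Once a UDF has been thoroughly debugged, the mode can be switched with ALTER FUNCTION; a quick sketch, reusing the hypothetical function from above:

-- Run a debugged UDF in the context of the query step's own AWT.
ALTER FUNCTION mydb.strip_spaces EXECUTE NOT PROTECTED;

-- Revert to the isolated protected-mode server process.
ALTER FUNCTION mydb.strip_spaces EXECUTE PROTECTED;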

Optimizer Estimates for Classification

For scalar UDFs, the optimizer assigns a fixed default cost, representing the cost to run the UDF against each row of the table.  This cost is then adjusted for the number of rows in the table and any other predicates.  So you will see an estimated processing time for steps that include non-table-function UDFs, based on these fixed values.

The optimizer will use a different default cost for UDFs running in protected mode vs. unprotected mode.  
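To see what the optimizer produced, you can simply EXPLAIN a request that references the UDF and read the estimated processing time off each step; continuing with the hypothetical function from above:

EXPLAIN SELECT cust_id, mydb.strip_spaces(cust_name)
FROM mydb.customer;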

Workload Exception Handling Based on Resource Usage

TASM looks at the transaction-in-process (TIP) table when it checks whether or not a workload exception based on resource usage has taken place.

A scalar UDF will update the TIP table on each AMP after every row it processes.  Each time it finishes its work for a given row, the UDF makes an internal API call in order to keep a running tally of CPU and I/O. This means that if each row in an answer set causes a second of CPU time to be used executing a UDF, that second of time will be attributed to the query and will be seen by workload management when the UDF relinquishes control for that row.  TASM will see an increase in CPU for the transaction over time, as the UDF processes one row, then another, then another.

If for some reason the UDF is spending a lot of CPU on one row, the CPU used against that one row will not be visible until the UDF relinquishes control of the row and updates the internal database structures.   So if a UDF were to spend a minute processing one row, or if it were caught in an internal loop, workload management would not be able to detect its CPU usage until the base SQL query gets control again.

========================== What is the TIP Table? ==========================

The transaction-in-process (TIP) table holds information about all the current transactions (or requests) that are in process on each AMP.  The TIP table keeps track of things like Host Number, spool usage, CPU and I/O usage, transient journal information, user ID, the start and end times of the transaction, and other detail.  Database Query Log, PMON and TASM get some of their information directly from the memory-resident TIP table, which can only be accessed using special internal APIs.   

============================================================================

Using Workload Management with Table Functions and Table Operators

While simple UDFs appear in the SELECT list and return a single value, table functions and table operators appear in the FROM clause and process a set of rows.  The term “table function” in this discussion will be used to refer to UDFs that access multiple rows inside or outside the Teradata Database and that must be called once for each row processed.   The term “table operator” will be used to refer to UDFs that can access data inside Teradata or via a foreign server, but that are called only once for an entire data set.   The table operator offers a level of flexibility in converting input into output, and vice versa, that table functions lack.
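As a sketch of the difference in how the two are invoked (all object names hypothetical):

-- Table function: appears in the FROM clause and is called once
-- for each row it processes.
SELECT *
FROM TABLE (mydb.read_queue('events')) AS t;

-- Table operator: also in the FROM clause, but called only once for
-- the entire input stream delivered by the ON clause.
SELECT *
FROM mydb.score_rows (ON (SELECT * FROM mydb.sales)) AS d;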

Use of AMP Worker Tasks

No additional AMP worker tasks are required for the execution of either a table function or a table operator.   The AWT already acquired to execute the query step invokes the function and performs any required database work.  Any work performed outside of the database is outside the scope of what an AWT can do.  However, AMP worker tasks that invoke the function will be held for the period of time that a table operator executes externally.

The concepts of protected mode and unprotected mode discussed earlier work the same for table functions and table operators.  Neither mode requires additional AMP worker tasks.  When external data access is taking place, it is recommended that only protected mode be used, in order to insulate the AMP from external connections.

Optimizer Estimates for Classification

No cost or cardinality estimates are currently produced by the optimizer for table functions or for table operators.  Consequently, no estimated processing time is provided that TASM can use for classification purposes.  It is unknown to the optimizer how many rows are being processed by a table function/operator or the effort involved in such processing.

Here is an Explain of a query that includes a table operator that accesses a remote server.  The Explain was taken from a Teradata Database 15.0 system.  Note that each step in the Explain includes an estimated processing time, except for the part of the plan that is executed on the remote server.  That part of the Explain text is delimited by the text “BEGIN/END EXPLAIN FOR REMOTE QUERY”.

EXPLAIN SELECT
    CAST(Price AS DECIMAL(8,2))
  , mileage
  , CAST(make AS VARCHAR(20))
  , CAST(model AS VARCHAR(20))
FROM vim.cardata@sdll7940 WHERE brand='Buick';

Explanation
---------------------------------------------------------------------------
  1) First, we do an all-AMPs RETRIEVE step executing table operator
     SYSLIB.load_from_hcatalog with a condition of ("(1=1)"). The size
     of Spool 2 is estimated with low confidence to be 4 rows (85,656
     bytes). The estimated time for this step is 0.07 seconds.
  2) Next, we do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by
     way of an all-rows scan into Spool 3 (used to materialize view,
     derived table, table function or table operator TblOpInputSpool)
     (all_amps), which is redistributed by hash code to all AMPs. The
     size of Spool 3 is estimated with low confidence to be 4 rows
     (85,656 bytes). The estimated time for this step is 0.08 seconds.
  3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan executing table operator
     SYSLIB.load_from_hcatalog with a condition of ("cardata.BRAND =
     'Buick'") into Spool 4 (used to materialize view, derived table,
     table function or table operator cardata) (all_amps), which is
     built locally on the AMPs.
     < BEGIN EXPLAIN FOR REMOTE QUERY -->
       We use tdsqlh_td 15.00.00.00dev to connect to tdsqlh_hdp
       01.03.02.01dev Hive Metastore server (sdll7940.labs.teradata.com)
       on port 9083, then we retrieve and process 0 hadoop splits for
       partitions brand = "Buick" that is about 0 bytes worth of rowdata
       from remote table vim.cardata for the qualifying columns
       (price,mileage,make,model,brand) and map them to the following
       Teradata output columns.
       price DOUBLE => REAL_DT, mileage BIGINT => BIGINT_DT,
       make STRING => VARCHAR_DT, model STRING => VARCHAR_DT,
       brand STRING => VARCHAR_DT
     <-- END EXPLAIN FOR REMOTE QUERY >
     The size of Spool 4 is estimated with low confidence to be 4 rows
     (964 bytes). The estimated time for this step is 0.08 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
     an all-rows scan with a condition of ("cardata.BRAND = 'Buick'")
     into Spool 5 (group_amps), which is built locally on the AMPs.
     The size of Spool 5 is estimated with low confidence to be 4 rows
     (196 bytes). The estimated time for this step is 0.08 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 5 are sent back to the user as the result of
     statement 1. The total estimated time is 0.31 seconds.

For more information about this Explain and to see other similar examples, see the orange book titled Teradata QueryGrid: Teradata Database-to-Hadoop.

If you base workload or throttle classification on estimated processing time, this query could classify to a workload inappropriately, because the time and effort consumed by the table operator cannot be represented within the query characteristics.

Workload Exception Handling Based on Resource Usage

Table functions do not update the TIP table with every row processed, as scalar UDFs do.  Rather, a buffer of 64K rows is constructed, and when the entire buffer is complete the combined resource usage is sent back to the TIP table.

TASM workload exception handling looks at the TIP table to identify when a workload exception threshold, such as CPU usage, has been met.  This approach may cause a slight delay in the identification of resource usage, and the exception action may not be performed as quickly as it would be with a scalar UDF.  The degree of the delay will depend on how much processing the table function performs on each row within the external data source.

In the case of table operators, which are invoked once per data source, the TIP table on the AMPs will not get any resource usage information until a buffer of 64K rows has been accessed and spooled on the AMPs.  Once the table operator has completed, all resource usage can be tracked, and the TIP table on each AMP will be updated a final time.  The resource usage reported covers only the Teradata Database activity.   Any activity resulting from a table operator that is external to Teradata cannot be reported in the TIP table and will not be visible to TASM.

Use workload exceptions based on resource usage with care when the workload is supporting queries that contain table functions or table operators.

New TASM Features Related to UDFs

A new TASM feature for UDFs in Teradata Database 14.10 allows target-type classification to specify selected functions, similar to how other target classifications on objects such as tables or views work.  This includes scalar UDFs as well as table functions and table operators.

Classification by UDF name allows you to control the workload (and thus the priority) that requests containing specific UDFs will map to.  It also means you can now control concurrency for specific functions or groups of functions, by means of either a system throttle or a workload throttle. You could also create a filter rule with function classification that disallows certain functions from being executed at specific times of the day.

A second related TASM enhancement, appearing in Teradata Database 15.0, adds classification by server object.  A new database object type of “server” can be defined, allowing TASM to use the server name explicitly for classification purposes.  A server object represents an external system that a table operator accesses, and acts much like a view.  This enhancement allows you to control the concurrency of requests that access a specific external server.
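A server object is created with DDL along these lines; this is a sketch only, reusing the server name from the earlier Explain, with connection values that are assumptions:

-- Connection values hypothetical; server name matches the one referenced
-- as vim.cardata@sdll7940 in the earlier Explain.
CREATE FOREIGN SERVER sdll7940
USING hosttype('hadoop')
      server('sdll7940.labs.teradata.com')
      port('9083')
DO IMPORT WITH SYSLIB.load_from_hcatalog;

With the server object in place, a TASM classification rule on server sdll7940 would cover every request that reaches tables through it, such as the query shown earlier.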

Comparing these two TASM enhancements, classification by function (in 14.10) is the more granular, as it allows you to exert control at the level of a single UDF if you wish. Classification by server object (in 15.0) covers all UDFs that reference a specific foreign server.

For additional information on UDFs, table functions and table operators, see the following orange books:

  • User Defined Functions, Mike Watzke
  • Teradata QueryGrid: Teradata Database-to-Hadoop, Doug Fraser and Vimalraj Panneerselvam
7 Comments
Enthusiast

Hi Carrie,

While reading this article, a few things prompted me to seek clarification on 'delay issues on the system'. Sometimes we see too many delayed sessions in the delay queue in Viewpoint. From your previous articles, I understood that there are 'send delays' (in milliseconds) and 'receive delays' (in milliseconds) for a request. Are these milliseconds wall clock ticks or CPU seconds? Another question: when we see the delay queue climb to a 3- or 4-digit number, does the system experience any disadvantage in terms of resource wastage (like CPU/I/O)?

Just trying to understand the worst impact of the delay queue on the system, beyond the delay in processing of the concerned application jobs.

Teradata Employee

Geeta,

There are many different delay queues in the database.   When you say "delayed sessions in the delay queue" that you say you can see in Viewpoint, I believe you are speaking about the  throttle delay queue.  The throttle delay queue is on the parsing engine and is delaying requests before they begin execution. 

The throttle delay queue is entirely different from the ResUsageSPS metrics WorkMsgSendDelay and WorkMsgReceiveDelay.  I am assuming that those are the two metrics you are referring to when you reference "send delays" and "receive delays".

WorkMsgSendDelay and WorkMsgReceiveDelay have nothing to do with requests that are delayed by a throttle.   WorkMsgSendDelay and WorkMsgReceiveDelay represent the time in milliseconds that a work message was delayed in being sent from one vproc to another, such as from the PE to the AMP, or being received on a vproc.   Those are milliseconds of wall-clock time that the work message was either waiting to be sent, or waiting to get an AWT once it arrives on the AMP.

When you ask about the "delay queue" becoming a 3- or 4-digit number, it is not clear if you are asking about the throttle delay queue, or the time in milliseconds that a work message was delayed in getting an AWT (WorkMsgReceiveDelay).  In either case, whether the object being delayed is a request (in the first case) or a work message that represents a step in a request (in the second), no CPU or I/O is being consumed during the waiting.

If you are asking whether it is a problem if WorkMsgSendDelay and WorkMsgReceiveDelay report 3- or 4-digit numbers in ResUsageSPS, it is not necessarily a problem.   However, it usually is an indication of running out of AMP worker tasks or being in flow control.  It would be better to monitor MaxInuse and MailBoxDepth in the ResUsageSAWT table to understand the severity of AWT exhaustion.   What WorkMsgReceiveDelay data is useful for is finding out which workload is being impacted by a shortage of AWTs.   The workloads with the higher number of milliseconds in that field in ResUsageSPS are the ones being impacted the most by AWT shortages.   Whether or not that is a problem for you depends on how you interpret it in your environment.  If it is low-priority work, you may not care that there is some waiting to get an AWT.
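As a rough sketch of that kind of check, assuming ResUsageSPS logging is enabled (and noting that the workload identifier column, shown here as WDId, is an assumption to verify against your release's ResUsageSPS layout):

SELECT TheDate, WDId,
       SUM(WorkMsgSendDelay)    AS SendDelayMs,    -- wall-clock ms waiting to be sent between vprocs
       SUM(WorkMsgReceiveDelay) AS ReceiveDelayMs  -- wall-clock ms waiting for an AWT after arrival
FROM DBC.ResUsageSps
WHERE TheDate = CURRENT_DATE
GROUP BY TheDate, WDId
ORDER BY ReceiveDelayMs DESC;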

To prevent AMP worker task exhaustion, and reduce the time reported in WorkMsgReceiveDelay, we recommend you use stronger throttles so there is less concurrency on the platform. That will lower the demand for AWTs.

If you were asking about the throttle delay queue increasing to 3- or 4-digit numbers, that is considered rather lengthy for a throttle delay queue.  A lot of users will be waiting a long time for their queries to run.  But no additional I/O or CPU is being used in that case either.  The requests are just being held at the parsing engine until each throttle's counter falls below its limit.   It might be a good idea to review your workload management setup if your throttle delay queues are often of that length.

Thanks, -Carrie

Enthusiast

As always, my question is perfectly answered; thank you very much for the detailed explanation.

And sorry, I should have clearly mentioned the throttle delay queue.

It is good to know that the throttle delay queue (in Viewpoint, under delayed sessions in the query monitor) is different from the send and receive delays.

While focusing on resolution, this is what I am trying to implement.

To get the exact effect of flow control, I think it is a good idea to focus on FlowControlTime (FCT) instead of FlowControlCount (FCC), based on the FlowControlled (FCD) field in the ResUsageSAWT table. Just using shortcut names for ease of explanation.

So before going on to analyze FCT, I will check whether FCD is recorded as 1 or 0. If FCD is 0, the AMP is not in flow control. My question here is:

For one logging period of 600 seconds (10 minutes), if FlowControlCnt shows 20, FlowControlTime = 6000 milliseconds, and FlowControlled = 0, can I ignore this even though FCT shows 6 seconds?

Teradata Employee

Geeta,

FlowControlled:  This metric tells you if the AMP is in flow control at a point in time. It is a snapshot taken at the end of the logging interval.  If it is zero, it indicates the AMP was not in flow control at that time.  If it is non-zero, it means that the AMP was in flow control at interval end.  But a zero value doesn't mean that flow control had not been happening during the logging interval.

Because this metric is a snapshot taken at the end of the logging interval, it only tells you if there was flow control at that point in time.  You could have been in and out of flow control many times during the interval, and FlowControlled will not reflect that.  Therefore, when this metric is 0, you should not assume there has been no flow control.

If you are on 14.0 or above, it is best to rely on FlowCtlTime to detect flow control, no matter what the flow control count reports.  On earlier releases, FlowCtlTime was not always accurate, so in that case it is best to rely on FlowCtlCnt.
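A minimal sketch of such a check, assuming ResUsageSAWT logging is enabled and using the FlowCtlCnt and FlowCtlTime columns discussed here:

SELECT TheDate, TheTime, NodeID, VprId,
       FlowCtlCnt,   -- number of times the AMP entered flow control in the interval
       FlowCtlTime   -- milliseconds spent in flow control during the interval
FROM DBC.ResUsageSawt
WHERE TheDate = CURRENT_DATE
  AND FlowCtlTime > 0
ORDER BY TheDate, TheTime, NodeID, VprId;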

See pages 28-29 of the orange book AMP Worker Tasks, Teradata Database 14.0, for a good explanation of how the flow control metrics in ResUsageSAWT interact, and which ones to use for what.

Thanks, -Carrie

Enthusiast

Thank you for the clarification on flow control analysis before and after Version 14.

And in terms of setting up an alert on flow control in Viewpoint, as far as I know, there is no alert type available. Rather, I think we can create a 'System Event Type' for 'Flow Control'.

First we would like to see how many times the system really goes into flow control before working with the application teams. So I think that for getting an email alert to a DBA, setting a 'System Event Type' with "# of AMPs = 1" and "Qualification Time = 1 minute" can send an email. That way, we can analyze the frequency of flow control on one AMP or many AMPs (I set # of AMPs = 1 just to see every trace of flow control on the system).

Please let me know if I am missing anything in configuring this system event type.

Teradata Employee

Geeta,

The flow control event in 13.10 is not a very reliable way to assess the frequency, or even the presence, of flow control on the AMPs.  That is because it does not use the flow control time metric from the SAWT table, as the flow control event in 14.10 does.  It uses the FlowControlled metric, which only reflects the state of the AMP at the end of the logging interval.  It will not tell you how often AMPs go into flow control.   So there will be many episodes of flow control that are missed by FlowControlled.

I would suggest you think about using the AMP worker task event instead of the flow control event.  If you pay attention to when you run out of AMP worker tasks and take action when you do, you will never get into flow control conditions, or rarely get into them.  Therefore you would not need the flow control event.

Thanks, -Carrie

Enthusiast

Good to know the event is not so reliable. Just curious to know: was there any bug or limitation in 13.10 for that specific event in the DBMS/Viewpoint?