When you push too much paper into your shredder, it jams; when you pour too much coffee into your cup, it overflows; when you eat too much food, you…well, you know. Everything’s got limits.
One thing that grabbed my attention when I first joined Teradata back in '88, and that I still find striking today, is how the database manages the level of work that enters the system. Managing the flow of work inside Teradata is decentralized, low-effort, and scalable, exhibiting control at the lowest possible level—at the AMP.
Each AMP, Teradata’s parallel unit, keeps track of how much work it has accepted, how much work it has active, and how much work is queued up waiting to run. When specific internal limits are reached on a given AMP, that AMP will close the door to accepting new work, giving the AMP time to catch its breath and work off the work it already has accepted.
When an AMP temporarily closes the door to new work, that AMP is in a state that we call “flow control.” When in a state of flow control, which often lasts a fraction of second, that AMP will turn away newly-arriving messages. Arriving work can be user work that the dispatcher has sent to the AMP for processing. Arriving work could also be rows being redistributed from other AMPs on behalf of a user query, or internal database work that has to be done.
Consider the example of user work arriving from the dispatcher. This work-to-be-done is in the form of a message that usually represents a single optimized query step. If the step is an all-AMP step then the message is sent to all AMPs. When the message arrives on an AMP it attempts to acquire an AMP worker task (AWT), in order to begin executing on the AMP. The work message contains all the context of what needs to be done in the AMP, and information about what the data sources are, but the message needs an AMP worker task, which acts as the engine to do the actual work.
Some AMPs may have available AWTs and when the query step arrives on those AMPs it can begin executing immediately. Other AMPs may not have an AWT available because all of their AWTs are busy servicing other work. It is acceptable in Teradata for some AMPs to begin working on a step while other AMPs queue up the message so that it can be worked on when an AWT on the that AMP becomes available. Unnecessary message-passing and coordination between AMPs is eliminated using this approach.
The following graphic illustrates that each AMP is working on its portion of a query step independently from the other AMPs. AMP 0 and AMP 2 have AWTs available and begin work immediately at Time 1. At Time 2, AMP 1 is able to get an AWT for this work and starts working, even though by that time AMP 2 has completed its work and released its AWT.
In the case where the message has to be queued up and waits for an AWT, this wait can be very short or not so short, based on how long the work currently running (and tying up AWTs) takes to complete. Each AMP has its own message queue to hold these waiting messages during the times when all their AWTs are busy.
The message queue on each AMP has a limit of how many messages it can hold for each work type. Having such a limit helps to keep the memory requirements of an AMP at a reasonable level. A “work type” is a category of work message having to do with the importance of the arriving work in completing work already underway. One work type, for example, is WorkNew, which represents work messages that come directly from the dispatcher, usually representing new user-initiated work. New work coming into an AMP is considered the less important than other work types because it does not contribute to completing already-started work, rather it starts something new.
Each work type has its own flow control gates that keep track of how many AMP worker tasks are in-use supporting that category of work. These flow control gates also keep a count of how many messages of that work type are waiting in the message queue on that AMP.
When the specified limit of messages on the message queue for an AMP is reached for a given work type, it’s flow control gate temporarily closes and any additional messages of that work type arriving on the AMP are not accepted.
The dispatcher knows when messages are no longer able to be fully accepted by all AMPs, and will retry such a message, multiple times if necessary, until there is an open place on the message queue of each AMP to receive it. If even one AMP is unable to accept an all-AMP message (either by providing an AWT or by queuing the message), then no AMP is allowed to accept that message.
The graphic below shows a situation where a single AMP has its flow control gate closed for WorkNew messages, causing a new incoming message to be turned away. That incoming message will be retried periodically by the dispatcher.
Having one or more AMPs in the state of flow control and having to retry messages is usually considered a sign of congestion, and there are recommended actions that can be taken to get AMPs back to a more normal state, and keep them there. However, being in this state of flow control is just the high-end of a graceful, efficient, highly-integrated approach to managing the peaks and valleys of work that gets thrown at a typical Teradata data warehouse. Managing flow control in this manner is certainly a better approach than simply overflowing like my coffee cup, or jamming like my shredder.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.