Teradata Parallel Transporter (TPT) is a flexible, high-performance Data Warehouse loading tool, specifically optimized for the Teradata Database, which enables data extraction, transformation and loading. TPT incorporates an infrastructure that provides a parallel execution environment for product components called “operators.” These integrate with the infrastructure in a "plug-in" fashion and are thus interoperable.
TPT operators provide access to such external resources as files, DBMS tables, and Messaging Middleware products, and perform various filtering and transformation functions. The TPT infrastructure includes a high performance data transfer mechanism called the data stream, used for interchanging data between the operators.
As shown in Figure 1, a typical data stream connects two types of operators: a producer operator, which writes data to the stream, and a consumer operator, which reads data from it.
Each pair of operators connected by the data stream must share the same schema (that is, data layout and format) for the data to be interchanged. If the consumer operator expects a schema that is different from the schema defined in the DEFINE SCHEMA statement (in a TPT job script), an error will be posted in the public and private logs.
Figure 1: Operators and Data Streams
Prior to TPT 13.0, the standard mode of data transfer between producer and consumer operators entailed the producer operator passing individual data rows to the data stream, where they were buffered and then sent to the consumer operator, which retrieved them one at a time. However, some consumer operators, such as the Load operator and the Data Connector operator (when not writing data that required the insertion of delimiters), could process an entire buffer of data without any movement or modification of the data rows in the buffer, provided the rows were delivered in a buffered format. Thus, if producer operators could transfer an entire correctly formatted buffer to the data stream, and consumer operators could then receive the entire buffer, considerable savings in code path length, and hence CPU time, could be achieved. TPT Buffer Mode, introduced in version 13.0, implements this capability.
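As a rough illustration of the saving, the number of data-stream handoffs drops from one per row to one per buffer. The row and buffer counts below are invented for illustration; they are not actual TPT internals:

```python
# Contrast row-at-a-time transfer with Buffer Mode transfer.
# All figures here are assumed, purely for illustration.

rows = 100_000          # rows the producer emits (assumed)
rows_per_buffer = 500   # rows packed into one buffer (assumed)

# Row mode: every row is a separate data-stream handoff.
row_mode_handoffs = rows

# Buffer Mode: one handoff per (possibly partial) buffer.
buffer_mode_handoffs = -(-rows // rows_per_buffer)  # ceiling division

print(row_mode_handoffs)     # 100000
print(buffer_mode_handoffs)  # 200
```

The 500x reduction in handoffs is where the shorter code path, and hence the CPU saving, comes from.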
Buffer Mode Processing
Buffer Mode processing is an internal function; the user does not need to request it. For a TPT producer/consumer job to be eligible for Buffer Mode, the TPT job script cannot contain any filter/projector operators or any filtering "CASE/WHEN" or "WHERE" clauses in the TPT SELECT statement. TPT itself decides whether a given job runs in Buffer Mode.
Not all jobs run with Buffer Mode because not all operators support Buffer Mode. Currently, the Export, Select, ODBC, and Data Connector producer operators and the Load and Data Connector consumer operators support Buffer Mode. LOB importing and exporting are not Buffer-Mode eligible.
The following are the typical scenarios that are Buffer-Mode eligible:
Exporting rows from a Teradata Database table and:
The use of Buffer Mode has produced substantial performance improvements in TPT.
Figures 2 and 3 show parallel loads from multiple files:
Figure 2: Parallel Loads from Multiple Files – TPT 12.0
Figure 3: Parallel Loads from Multiple Files – TPT 13.0
Comparison tests between TPT 12.0 and TPT 13.0 were performed with one cabinet and one TMS with RAID0 and 4 LUNs on a 2550 machine. As the numbers indicate, both TPT 12.0 and TPT 13.0 show close to linear performance scaling as the number of file streams (one stream per disk) increases. With TPT 12.0, the acquisition phase ran at 157 MB/second when both the load server and the DBS approached 83% CPU usage with 6 parallel file streams. With TPT 13.0, the acquisition phase ran at 178 MB/second with only 4 parallel file streams, with the load server at about 55% CPU usage but the DBS CPU usage close to 100%.
However, as the size of an individual data row approaches the buffer size, the performance improvement diminishes or vanishes entirely, due to the excessive amount of buffers being transferred in data streams. To address this issue, Buffer Mode was enhanced to block multiple buffers into a single data stream message. While the original Buffer Mode requires the detailed involvement of the operators, the blocking enhancement is entirely transparent to the operators, that is, any change to the “blocking factor” (either by the user or by TPT) doesn’t require any change to operators.
Buffer Mode Blocking Factor
The central problem with blocked Buffer Mode is determining the blocking factor, that is, the number of buffers in a message. The blocking factor is determined from the producer and consumer counts, the buffer size, the total shared memory, the data stream memory percentage, and the data stream queue depth.
The Buffer Mode blocking factor formula is shown below:
Buffers/Block = (MemoryPercent * TotalSharedMemory) / ((ProducerCount + (QueueDepth * ProducerCount + 1) * ConsumerCount) * BufferSize).
TPT gets the producer and consumer counts from the job script, while the consumer operator sets the buffer size. The other three parameters (total shared memory, data stream memory percentage, and data stream queue depth) have defaults of 10 megabytes, 80%, and 2, respectively.
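Plugging the defaults into the formula above gives a concrete blocking factor. The producer/consumer counts and the 64 KB buffer size below are assumed values for illustration; in a real job they come from the script and the consumer operator:

```python
# Worked example of the Buffer Mode blocking-factor formula.
total_shared_memory = 10 * 1024 * 1024   # 10 MB default
memory_percent = 0.80                    # 80% default
queue_depth = 2                          # default
producer_count = 2                       # assumed, from the job script
consumer_count = 1                       # assumed, from the job script
buffer_size = 64 * 1024                  # assumed; set by the consumer operator

buffers_per_block = (memory_percent * total_shared_memory) // (
    (producer_count + (queue_depth * producer_count + 1) * consumer_count)
    * buffer_size
)
print(int(buffers_per_block))  # 18
```

With these inputs, 8 MB of usable data-stream memory divided among 7 buffer slots of 64 KB each yields 18 buffers per block.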
TPT provides a default "blocking factor," but it may not be optimal because it is computed using only the default values of MemoryPercent, TotalSharedMemory, and QueueDepth. Users who want a larger "blocking factor," to minimize the number of buffers transferred through data streams, need to adjust those values accordingly.
Very interesting, this explanation of Buffer Mode.
Hope that I'm in the right forum for my case. I need help please.
Before opening up TPT to developers, I need to ensure that our load server is adequately sized (CPU and memory) for TPT activities.
Load server specs: 65 GB RAM and 16 cores.
We limited utility activity to 30 in TASM. This includes TPT Load, TPT MultiLoad, and TPT Export.
I'm worried about CPU sizing:
Potentially I can have 30 simultaneous load jobs (30 Unix processes).
A TPT job can have multiple instances. Suppose N is the average number of instances per job.
So I can have N * 30 Unix processes in parallel:
If N = 2, 60 simultaneous Unix processes.
If N = 3, 90 simultaneous Unix processes.
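The arithmetic above, using only the figures from this post, also gives the process-to-core ratio, which is the number that matters for CPU sizing:

```python
# Process-count estimate from the figures in this post.
jobs = 30    # simultaneous load jobs allowed by TASM
cores = 16   # load-server cores

processes = {n: jobs * n for n in (2, 3)}          # N instances per job
per_core = {n: p / cores for n, p in processes.items()}

print(processes)  # {2: 60, 3: 90}
print(per_core)   # {2: 3.75, 3: 5.625}
```

So at N = 3 each core would be shared by more than 5 load processes, before counting the operating system and any other work on the server.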
With 16 cores, can I remain confident?
What is the right approach to sizing my server for my workload?
In terms of memory, I think I'm in good shape, because with 30 simultaneous loads I need at most
128 MB * 30 = 3.75 GB of shared memory.
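A quick check of that memory estimate, assuming each job is configured with 128 MB of data-stream shared memory (the article above gives 10 MB as the default, so 128 MB here is the poster's own setting):

```python
# Shared-memory estimate for the worst case described in the post.
per_job_mb = 128   # assumed shared memory per job, as stated above
jobs = 30          # simultaneous load jobs

total_gb = per_job_mb * jobs / 1024
print(total_gb)  # 3.75
```

3.75 GB against 65 GB of RAM does leave a comfortable margin for the jobs themselves and the operating system.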
Thanks for your advice
Apparently no one has any thoughts on this subject.
I would very much like to exchange points of view on these concerns about TPT resource consumption.