Teradata TPump dynamic buffer filling

Teradata Employee

TPump has been enhanced to dynamically determine the PACK factor and fill the data buffer when the input contains variable-length data. This feature is available in Teradata TPump 13.00.00.009, 13.10.00.007, 14.00.00.000 and later releases.

TPump’s behavior prior to this feature

TPump used the "defined" row size, rather than the "actual" row size, to determine how many rows would fit into its data buffer. For example, a row with a VARCHAR(35000) column would be assigned a PACK factor of 27, determined by an up-front test, even if 99.9% of the incoming rows had only 50 bytes of data in that column. Such an up-front test makes sense only when the input data has no variable-length fields; with variable-length fields it was highly inefficient.
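To make the contrast concrete, here is a deliberately simplified model in Python (a sketch, not TPump's actual calculation). It assumes a 1MB buffer of 1,048,576 bytes and ignores the request-text and CLIv2 overhead that TPump adds, so it yields 29 rather than TPump's 27. The function `rows_per_buffer` is hypothetical.

```python
# Simplified model (hypothetical): how many rows fit in a buffer when every
# row is counted at a fixed size. The overheads TPump really adds (request
# text, CLIv2 Array Support overhead) are deliberately ignored here.
BUFFER_BYTES = 1_048_576  # assumption: 1MB taken as 2**20 bytes


def rows_per_buffer(row_bytes: int, buffer_bytes: int = BUFFER_BYTES) -> int:
    """Number of whole rows of size row_bytes that fit into buffer_bytes."""
    return buffer_bytes // row_bytes


# Pre-feature behavior: every row is sized by its *defined* length.
defined_pack = rows_per_buffer(35_000)
# What the buffer could hold if rows were sized by their *actual* length.
actual_pack = rows_per_buffer(50)

print(defined_pack)  # 29 in this overhead-free model
print(actual_pack)   # 20971
```

In practice the actual-size figure would still be limited by the PACK setting (2430 with PACKMAXIMUM in this article's examples), but even that limit is far above a PACK factor sized for 35000-byte rows.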

TPump’s current behavior with this feature implemented

TPump now dynamically determines the optimal PACK factor for input data with variable-length fields when Array Support is enabled. The user either specifies the PACKMAXIMUM option or explicitly sets PACK 2430; TPump then packs rows into each request, on a request-by-request basis, until it reaches that limit or the buffer is full. This does not cause problems in the statement cache; it is the PA (Parameter Array) that benefits most from the higher PACK factor. Loads into NOPI (No Primary Index) tables likewise benefit from the higher PACK factor. The optimal PACK factor is determined by the following factors, subject to the restriction that the total request size must not exceed 1MB:

  • Actual size of data rows
  • Size of the multi-statement request (doubled if the client session character set is UTF16)
  • Extra Teradata CLIv2 overhead for jobs that use TPump Array Support
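The buffer-filling rule described above can be sketched as a greedy loop: keep adding rows, sized by their actual length, until the next row would push the request past the byte limit or the PACK maximum is reached. The Python sketch below is illustrative only. The overhead constants (`REQUEST_TEXT_BYTES`, `PER_ROW_OVERHEAD`) and the functions `fill_requests` and `pack_range` are hypothetical placeholders, not TPump's real values or code, and treating 1MB as 1,048,576 bytes is an assumption.

```python
from typing import Iterable, List, Tuple

# Hypothetical parameters for illustration only; NOT taken from TPump.
MAX_REQUEST_BYTES = 1_048_576  # assumed 1MB request limit
PACK_MAX = 2430                # PACKMAXIMUM / PACK 2430
REQUEST_TEXT_BYTES = 200       # assumed size of the multi-statement request
PER_ROW_OVERHEAD = 4           # assumed per-row Array Support (CLIv2) overhead


def fill_requests(row_sizes: Iterable[int], utf16: bool = False) -> List[int]:
    """Greedily pack rows (given by actual byte size) into requests.

    Returns the number of rows placed in each request. In this sketch a
    single row larger than the limit still gets a request of its own.
    """
    # The request text is doubled for a UTF16 client session character set.
    text_bytes = REQUEST_TEXT_BYTES * (2 if utf16 else 1)
    packs: List[int] = []
    used, count = text_bytes, 0
    for size in row_sizes:
        cost = size + PER_ROW_OVERHEAD
        if count == PACK_MAX or (count > 0 and used + cost > MAX_REQUEST_BYTES):
            packs.append(count)  # close the current request
            used, count = text_bytes, 0
        used += cost
        count += 1
    if count:
        packs.append(count)
    return packs


def pack_range(packs: List[int]) -> Tuple[int, int]:
    """Minimum and maximum PACK factor across requests.

    Assumption: the last request only holds leftover rows, so it is
    excluded whenever there is more than one request.
    """
    full = packs[:-1] if len(packs) > 1 else packs
    return min(full), max(full)


# 3000 short rows followed by 600 long rows.
packs = fill_requests([50] * 3000 + [4000] * 600)
print(packs)              # [2430, 824, 261, 85]
print(pack_range(packs))  # (261, 2430)
```

The short rows fill the first request up to the PACK maximum, while requests dominated by long rows are cut off by the byte budget instead, which is what makes the PACK factor "float" from request to request.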

Because the PACK factor can therefore float (vary from request to request), TPump reports the floating PACK factor range in the new message UTY6679:

**** 12:30:44 UTY6679 WARNING: PACK factor has changed. The minimum PACK factor is <n>
data records per request. The maximum PACK factor is <m> data records per request.

For example, take a target table defined as:

CREATE MULTISET TABLE testtbl, FALLBACK (
c1 integer,
c2 varchar(4),
c3 decimal(10,2),
c4 integer,
c5 varchar(500),
c6 varchar(4000)
)
NO PRIMARY INDEX;

With the VARTEXT layout shown below, the defined field lengths add up to 4525 bytes (4 + 4 + 13 + 4 + 500 + 4000), so the PACK factor is 230 if the defined row size is used.

TPump triggers the dynamic buffer filling feature when Array Support is turned on and either PACKMAXIMUM is set or PACK 2430 is explicitly defined. A sample “BEGIN LOAD” command follows:

.BEGIN LOAD
  SESSIONS 4 1
  ERRORTABLE <my_error_table>
  PACKMAXIMUM /* or PACK 2430 */
  ARRAYSUPPORT ON
;

Here is the layout to load data into the target table using VARTEXT format:

.LAYOUT LAY1A ;

.FIELD c1  *  varchar(4) ;
.FIELD c2  *  varchar(4) ;
.FIELD c3  *  varchar(13) ;
.FIELD c4  *  varchar(4) ;
.FIELD c5  *  varchar(500) ;
.FIELD c6  *  varchar(4000) ;
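The 4525-byte defined length comes from adding up the field lengths in layout LAY1A. The short Python check below does that sum and a naive rows-per-buffer bound. It assumes 1MB = 1,048,576 bytes and ignores TPump's overheads, which is presumably why it lands at 231 rather than the 230 quoted above; the name `LAY1A_FIELDS` is just an illustrative variable.

```python
# Defined lengths of the VARTEXT fields in layout LAY1A (from the .FIELD lines).
LAY1A_FIELDS = {"c1": 4, "c2": 4, "c3": 13, "c4": 4, "c5": 500, "c6": 4000}

defined_record_bytes = sum(LAY1A_FIELDS.values())
print(defined_record_bytes)  # 4525

# Naive upper bound on rows per 1MB buffer at the defined length, ignoring
# all TPump overheads (assumption: 1MB = 1,048,576 bytes).
naive_pack = 1_048_576 // defined_record_bytes
print(naive_pack)  # 231
```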

In the TPump output, a UTY6679 message reports the “floating” PACK factor:

**** 15:58:36 UTY6679 WARNING: PACK factor has changed. The minimum PACK factor is 471
data records per request. The maximum PACK factor is 2430 data records per request.
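Because the PACK factor now varies, it can be useful to pull the reported range out of a TPump log when comparing runs. A minimal Python sketch, assuming the message text matches the two UTY6679 examples in this article (it may differ in other releases); `parse_pack_range` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Pattern built from the UTY6679 text shown in this article; the message
# wraps across lines in the output, so whitespace is matched with \s+.
UTY6679 = re.compile(
    r"UTY6679 WARNING: PACK factor has changed\.\s+"
    r"The minimum PACK factor is (\d+)\s+data records per request\.\s+"
    r"The maximum PACK factor is (\d+)\s+data records per request\."
)


def parse_pack_range(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (minimum, maximum) PACK factor from a UTY6679 message, if any."""
    m = UTY6679.search(log_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


sample = """**** 15:58:36 UTY6679 WARNING: PACK factor has changed. The minimum PACK factor is 471
data records per request. The maximum PACK factor is 2430 data records per request."""

print(parse_pack_range(sample))  # (471, 2430)
```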

Performance Improvement

The following TPump performance figures were measured for a load of 73,072 data rows:

Before

  • PACK factor = 230 (TPump PACK factor is fixed, based on the defined data length)

        Elapsed time:   00:00:00:23 (dd:hh:mm:ss)
        CPU time:       12.6875 seconds
        MB/sec:         0.320265
        MB/cpusec:      0.580578

  • PACK factor = 20 (TPump default PACK factor)

        Elapsed time:   00:00:02:16 (dd:hh:mm:ss)
        CPU time:       8.5625 seconds
        MB/sec:         0.0541624
        MB/cpusec:      0.396826

After

  • Floating PACK factor ranging from 471 to 2430

        Elapsed time:   00:00:00:13 (dd:hh:mm:ss)
        CPU time:       12.0781 seconds
        MB/sec:         0.566622
        MB/cpusec:      0.60987

Observation

With dynamic buffer filling, TPump does a better job of determining the optimal PACK factor, and it runs faster. In terms of elapsed time, it is roughly 1.8 times faster (13 seconds vs. 23 seconds) than TPump with a PACK factor fixed by the defined data length, and more than 10 times faster (13 seconds vs. 2 minutes 16 seconds) than TPump with the default PACK factor of 20. The total data TPump sends per elapsed second (MB/sec) improves accordingly, from 0.320 MB/sec and 0.054 MB/sec to 0.567 MB/sec.
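The speedups can be recomputed from the elapsed times reported above. The Python sketch below converts the dd:hh:mm:ss values to seconds, divides by the 13-second dynamic run, and also counts how many requests each fixed PACK factor needs for 73,072 rows, assuming every request except the last is filled to that fixed PACK factor. The helper `to_seconds` is hypothetical.

```python
def to_seconds(dd_hh_mm_ss: str) -> int:
    """Convert TPump's dd:hh:mm:ss elapsed-time string to seconds."""
    dd, hh, mm, ss = (int(part) for part in dd_hh_mm_ss.split(":"))
    return ((dd * 24 + hh) * 60 + mm) * 60 + ss


ROWS = 73072
runs = {
    "PACK 230 (defined length)": ("00:00:00:23", 230),
    "PACK 20 (default)": ("00:00:02:16", 20),
}
dynamic = to_seconds("00:00:00:13")  # floating PACK factor run

for name, (elapsed, pack) in runs.items():
    secs = to_seconds(elapsed)
    requests = -(-ROWS // pack)  # ceiling division: rows / PACK, rounded up
    print(f"{name}: {secs} s, {requests} requests, speedup {secs / dynamic:.2f}x")

# PACK 230 (defined length): 23 s, 318 requests, speedup 1.77x
# PACK 20 (default): 136 s, 3654 requests, speedup 10.46x
```

The request counts suggest one likely reason the default PACK 20 run is so much slower: it needs more than ten times as many requests as the PACK 230 run.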

Teradata Employee

Re: Teradata TPump dynamic buffer filling

Thanks Ivyuan for the great article.

Can you indicate the release in TPT Stream where this became available?