Determining the Optimal Pack Factor for a TPT Stream Operator Job

Tools
Tools covers the tools and utilities you use to work with Teradata and its supporting ecosystem. You'll find information on everything from the Teradata Eclipse plug-in to load/extract tools.
Teradata Employee

Determining the Optimal Pack Factor for a TPT Stream Operator Job

Do you want to have your Teradata Parallel Transporter (Teradata PT) Stream operator jobs run faster? Are you having difficulty determining the optimal pack factor for your Stream operator jobs? Knowing how to use the Stream operator’s PackMaximum attribute enables you to determine the optimal pack factor and thus improve the performance of your Stream operator job.

The PackMaximum attribute directs the Stream operator to determine dynamically the optimal pack factor for a Stream operator job. When the PackMaximum attribute is set to ‘Yes’, the Stream operator determines the optimal pack factor by sending iterative requests to the Teradata Database. After the optimal pack factor has been determined, the job uses the optimal pack factor to quickly load the data into the Teradata Database table(s). The value for the optimal pack factor is displayed in the Stream operator’s private log when the job completes.

Before running the job again, you can explicitly specify the calculated optimal pack factor in the Stream operator’s Pack attribute and set the PackMaximum attribute to ‘No’. By doing this, the job can run faster, because the Stream operator does not have to recalculate the optimal pack factor.

The default pack factor is 20 statements per request. The maximum pack factor is 2400. Different jobs can have different optimal pack factors. Optimal pack factors depend on the following variables:

  • Size of the multi-statement request
  • Size of the data record
  • Extra Teradata CLIv2 overhead for jobs that use the Array Support feature. For jobs that do not use the Array Support feature, there is no extra Teradata CLIv2 overhead.

The total bytes for the above variables cannot exceed 1MB.

For information on how to retrieve a Teradata PT private log, see Teradata Parallel Transporter User Guide

PERFORMANCE BENEFIT

The optimal pack factor improves the job’s performance by reducing the number of sends and receives between the Stream operator and the Teradata Database. There are fewer sends and receives because the Stream operator packs more statements into the buffer.

TWO SAMPLE SCRIPTS

The first sample script defines the Stream operator’s PackMaximum attribute to ‘Yes’. When the job for the first sample script completes, the optimal pack factor value is stored in the Stream operator’s private log. Here is a sample message that provides the optimal pack factor value:

Packing set to 2400 statement(s) per request

Note: In this example, the maximum pack factor is the same as the optimal pack factor.

The second sample script specifies the Stream operator’s Pack attribute as 2400 and sets the PackMaximum attribute to ‘No’.

In each script, data is read from a file and loaded into the Teradata table named TARGET_EMP_TABLE. Both scripts use job variables.

Remember: To get an optimal pack factor value, execute the first sample script before the second.

SAMPLE SCRIPT 1

DEFINE JOB USE_PACKMAXIMUM_TO_DETERMINE_THE_OPTIMAL_PACK_FACTOR
DESCRIPTION 'USE PACKMAXIMUM TO DETERMINE THE OPTIMAL PACK FACTOR'
(
   /*** Schema Definition ***/

   DEFINE SCHEMA EMPLOYEE_SCHEMA
   DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
   (
      EMP_ID   INTEGER,
      EMP_NAME CHAR(10)
   );

   /*** Data Connector Operator Definition as a File Reader ***/

   DEFINE OPERATOR FILE_READER
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
   TYPE DATACONNECTOR PRODUCER
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName = 'dcoper_file_reader_privatelog',
      VARCHAR FileName       = @SourceFileName,
      VARCHAR IndicatorMode  = 'Y',
      VARCHAR OpenMode       = 'Read',
      VARCHAR Format         = 'Formatted'
   );

   /*** Stream Operator Definition ***/

   DEFINE OPERATOR STREAM_OPERATOR
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER STREAM OPERATOR'
   TYPE STREAM
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName = 'streamoper_privatelog',
      VARCHAR PackMaximum    = 'Yes',
      INTEGER Pack,
      INTEGER MaxSessions    = 16,
      INTEGER MinSessions,
      VARCHAR TdpId          = @TargetTdpId,
      VARCHAR UserName       = @TargetUserName,
      VARCHAR UserPassword   = @TargetUserPassword,
      VARCHAR ErrorTable     = 'STREAMOPER_ERRTABLE',
      VARCHAR LogTable       = 'STREAMOPER_LOGTABLE'
   );

   /*** Apply Statement ***/

   APPLY
   ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
   TO OPERATOR (STREAM_OPERATOR [2])

   SELECT * FROM OPERATOR (FILE_READER [1]);
); 

SAMPLE JOB VARIABLE FILE FOR SAMPLE SCRIPT 1 

 SourceFileName       = 'MySourceFile1'
,TargetTdpId          = 'MyTargetTdpId'
,TargetUserName       = 'MyTargetUserName'
,TargetUserPassword   = 'MyTargetUserPassword'

SAMPLE SCRIPT 2

DEFINE JOB SET_THE_PACK_FACTOR_TO_THE_OPTIMAL_VALUE
DESCRIPTION 'SET THE PACK FACTOR TO THE OPTIMAL VALUE'
(
   /*** Schema Definition ***/

   DEFINE SCHEMA EMPLOYEE_SCHEMA
   DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
   (
      EMP_ID   INTEGER,
      EMP_NAME CHAR(10)
   );

   /*** Data Connector Operator Definition as a File Reader ***/

   DEFINE OPERATOR FILE_READER
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
   TYPE DATACONNECTOR PRODUCER
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName = 'dcoper_file_reader_privatelog',
      VARCHAR FileName       = @SourceFileName,
      VARCHAR IndicatorMode  = 'Y',
      VARCHAR OpenMode       = 'Read',
      VARCHAR Format         = 'Formatted'
   );

   /*** Stream Operator Definition ***/

   DEFINE OPERATOR STREAM_OPERATOR
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER STREAM OPERATOR'
   TYPE STREAM
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName = 'streamoper_privatelog',
      VARCHAR PackMaximum    = 'No',
      INTEGER Pack           = 2400,
      INTEGER MaxSessions    = 16,
      INTEGER MinSessions,
      VARCHAR TdpId          = @TargetTdpId,
      VARCHAR UserName       = @TargetUserName,
      VARCHAR UserPassword   = @TargetUserPassword,
      VARCHAR ErrorTable     = 'STREAMOPER_ERRTABLE',
      VARCHAR LogTable       = 'STREAMOPER_LOGTABLE'
   );

   /*** Apply Statement ***/

   APPLY
   ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
   TO OPERATOR (STREAM_OPERATOR [2])

   SELECT * FROM OPERATOR (FILE_READER [1]);
);

SAMPLE JOB VARIABLE FILE FOR SAMPLE SCRIPT 2

 SourceFileName       = 'MySourceFile2'
,TargetTdpId          = 'MyTargetTdpId'
,TargetUserName       = 'MyTargetUserName'
,TargetUserPassword   = 'MyTargetUserPassword'

In Teradata PT Application Programming Interface (API) Mode

To find an optimal pack factor value in Teradata PT API mode:

  1. Set the Stream driver’s TD_PACKMAXIMUM attribute to ‘Yes’ in your Teradata PT API application. Here is a sample C++ code to define the TD_PACKMAXIMUM attribute:
Connection *conn = new Connection();

conn->AddAttribute(TD_PACKMAXIMUM, "Yes");

 

  1. Query the Stream driver’s TD_Evt_PackFactor event to retrieve the pack factor. This event can only be queried after initiating, but before terminating a connection. Here is a sample C++ code to query the TD_Evt_PackFactor event:
char     *dataptr;

TD_Length datalen;

int       querystatus = 0;

queryStatus = conn->GetEvent(TD_Evt_PackFactor, &dataptr, &datalen);

 

For information on Teradata PT API, see Teradata Parallel Transporter Application Programming Interface Programmer Guide.