TPT - Instances Vs Sessions

Enthusiast

Greetings Experts,

What is the basic difference between instances and sessions in TPT?

Suppose I set the MaxSessions attribute of an operator to 10 on a 10-AMP system, and the consumer operator uses all 10 sessions (because the producer is able to keep the consumer busy).

Given those 10 sessions, if I increase the number of consumer instances from 1 to 2, each using 5 sessions, what advantage is gained? In other words, why would the work done by 2 instances with 5 sessions each be >= the work done by 1 instance with 10 sessions?

I have read somewhere that multiple instances are used when reading from multiple sources. Are multiple instances used only when there are multiple data sources? I have seen some scripts that use multiple instances for a single data source.

Does TPT allow parallel extraction from different data sources within a single script? (I know the data is not landed to files but passed through data streams.) What performance advantage does TPT offer over achieving the same thing with the load utilities or scripting? Say I have to extract data from 3 sources with 3 producers: I would guess these must run in sequence as multiple job steps in TPT, which could equally be done with multiple load utility jobs where applicable. Can the extraction be done in parallel for 3 data sources in a single TPT script?

13 REPLIES
Teradata Employee

Re: TPT - Instances Vs Sessions

Performance is always dependent upon where your normal bottleneck would be.

If the bottleneck is on the file I/O, then TPT may not be able to help much.

If the bottleneck is on the network side, TPT scaling may not be as efficient as you would like.

However, if you have enough network bandwidth and I/O is not an issue, then TPT's scaling abilities will be able to improve performance.

The number of instances will always depend on the type of job and amount of data. Depending on the operator, the number of consumer instances may not need to expand beyond 1. One instance of the Load operator is very fast. I have rarely seen a need for more than 1 instance of the Load operator. Same applies to the Update operator in most cases. But YMMV.

Data is not processed by the consumer instance in a round-robin fashion. Instance 1 will always try to get all of the work, as long as it can keep up with the rate at which data is flowing through the data streams. If instance 1's buffers begin to back up a bit then instance 2 will begin to get a little work.
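
To make the "instance 1 first, overflow to instance 2" behavior concrete, here is a small Python toy model. It is only a sketch under stated assumptions: the per-tick row counts, the per-instance capacities, and the function name `dispatch` are all hypothetical, and the model says nothing about how TPT implements its data streams internally.

```python
def dispatch(rows_per_tick, capacities, ticks):
    """Toy model (assumption, not TPT internals): each tick's rows are
    offered to instance 1 first; only what an instance cannot absorb
    spills over to the next instance."""
    processed = [0] * len(capacities)
    backlog = 0  # rows that no instance could take during the previous tick
    for _ in range(ticks):
        remaining = rows_per_tick + backlog
        for i, capacity in enumerate(capacities):
            taken = min(remaining, capacity)
            processed[i] += taken
            remaining -= taken
        backlog = remaining
    return processed


if __name__ == "__main__":
    # Instance 1 keeps up with the producer, so instance 2 stays idle.
    print(dispatch(rows_per_tick=80, capacities=[100, 100], ticks=10))
    # Instance 1 falls slightly behind, so instance 2 picks up the overflow.
    print(dispatch(rows_per_tick=120, capacities=[100, 100], ticks=10))
```

In this model the second instance only ever sees the rows the first one could not absorb, which is why its share stays small unless the first instance is genuinely saturated.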

Sessions, as you correctly noted, are distributed evenly across the instances. Therefore, only use the minimum number of instances you really need. Otherwise you are wasting resources, including sessions.
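
The even split of sessions across instances can be sketched in a few lines of Python. This is illustrative arithmetic only: the function name `split_sessions` is made up, and the way a remainder is handed out when the session count does not divide evenly is an assumption, not documented TPT behavior.

```python
def split_sessions(max_sessions, instances):
    """Divide max_sessions as evenly as possible across instances.

    Assumption for illustration: leftover sessions go to the
    lowest-numbered instances, one each.
    """
    if instances < 1:
        raise ValueError("at least one instance is required")
    base, extra = divmod(max_sessions, instances)
    return [base + 1 if i < extra else base for i in range(instances)]


if __name__ == "__main__":
    print(split_sessions(10, 1))  # one instance holds all 10 sessions
    print(split_sessions(10, 2))  # two instances hold 5 sessions each
```

The total never grows with the instance count, so an instance that receives little or no data is holding sessions that a single busy instance could have used.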

You will need to run a job with 1 instance (10 sessions) and then 2 instances (5 sessions each) and look at the operator output. It will show the rows processed by each instance. If instance #1 got a vast majority of the work, then you know you really do not need the 2nd instance and one instance with more sessions would still be best.

For file reading, we support the wildcard syntax. So if you have multiple files to be processed, you can specify more than one instance of the DataConnector operator and the files will be processed in parallel. However, realize that it will create a load on the disk head.
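
As a rough picture of how a wildcard pattern and several reader instances interact, the Python sketch below matches file names against a pattern and hands whole files to instances. Everything in it is hypothetical: the file names, the function name `assign_files`, and in particular the rule of dealing files out in turn, which is only an assumption chosen for illustration and not a description of how the DataConnector operator actually assigns files.

```python
from fnmatch import fnmatch


def assign_files(file_names, pattern, instances):
    """Assign whole files that match the wildcard pattern to reader instances.

    Assumption for illustration: matching files are dealt out in turn,
    starting with instance 1.
    """
    matched = sorted(name for name in file_names if fnmatch(name, pattern))
    assignment = {i: [] for i in range(1, instances + 1)}
    for index, name in enumerate(matched):
        assignment[index % instances + 1].append(name)
    return assignment


if __name__ == "__main__":
    files = ["orders_1.dat", "orders_2.dat", "orders_3.dat", "readme.txt"]
    print(assign_files(files, "orders_*.dat", 3))
    # With a single matching file, the extra instances have nothing to read.
    print(assign_files(["orders.dat"], "orders*.dat", 3))
```

This whole-file model deliberately does not cover several instances sharing one file, which is a separate case.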

We also support having multiple instances of the DataConnector operator reading from a single file and have seen some positive performance results with that as well.

-- SteveF
Enthusiast

Re: TPT - Instances Vs Sessions

Thank you, Steve! Can you please explain the following?

"By the time Teradata PT processes the message indicating that the error limit has been exceeded, it may have loaded more rows into the error table than the actual number specified in the Error Limit. The ErrorLimit specification is not cumulative, but applies to each instance of the Stream"

Do the Load and Update operators, which load data 64K at a time, never load more rows into the error table than the number specified in the error limit, even though they process the data in parallel?

If so, how is the Stream operator different from the Load and Update operators in this respect?

For the Update and Load operators:

"If instance #1 processes 500 error rows and instance #2 processes 500 error rows the job will do the following:

• If the job has already passed the final checkpoint (the transaction is fully committed), the job will complete. In this case, the error limit is calculated per instance"

Can you please explain what "final checkpoint (the transaction is fully committed)" means?

Say 2 instances are extracting data from a source that has 10000 rows, of which 1000 are error rows, with a checkpoint defined every 1000 rows, and a checkpoint has been taken at 9000 rows.

Now, if instance #1 and instance #2 each reach 500 error rows, will the job be terminated? If not, how can there be any errors after the final checkpoint? (I presume the final checkpoint refers to the point at which all 10000 rows from the source have been processed.)

Enthusiast

Re: TPT - Instances Vs Sessions

Can we define the number of sessions to be >= the number of AMPs, say as a multiple of it?

Defining more instances (if the consumer operator uses them effectively) yields better performance, since the work is handled by multiple processes, unlike the standalone utilities, where a single process handles all operations and a CPU limit might be imposed on that single process.

If we terminate a job that has processed, say, 800 rows from a source of 1000 rows using twbcmd (which takes a checkpoint and then terminates), and then restart the job using a different supported consumer operator to load into the target, will it start processing from row 801?

Teradata Employee

Re: TPT - Instances Vs Sessions

Ok, will try to answer the questions in order from the 2 posts:

The Stream operator is different from the Load and Update operators. Each instance of the Stream operator sends its own requests. With the Load and Update operators, all instances send the data as a single transaction. The master instance opens the transaction with a "BT", and the "ET" is not sent until all data has been loaded. Different protocols. Different behavior.

The Load and Update operators cannot connect more sessions than there are AMPs.

Yes, if you stop a job that has loaded (and checkpointed) 800 records, then the subsequent (resumed) job will start at 801. It does not matter if the resumed job uses a different number of instances or sessions.
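
The restart arithmetic can be pictured with a tiny Python model. It is a sketch only: `run_job` and its parameters are hypothetical names, rows are represented by their row numbers, and the model simply treats the checkpoint as "last row safely loaded". Note that no instance or session count appears anywhere in it, which mirrors the point that the resume position does not depend on them.

```python
def run_job(total_rows, checkpoint, stop_at=None):
    """Load rows checkpoint+1 .. stop_at (or to the end of the source).

    Returns the new checkpoint (last row safely loaded) and the list of
    row numbers loaded during this run. Toy model, not TPT internals.
    """
    first = checkpoint + 1
    last = total_rows if stop_at is None else min(stop_at, total_rows)
    loaded = list(range(first, last + 1))
    return max(checkpoint, last), loaded


if __name__ == "__main__":
    # First run is stopped after a checkpoint at row 800 of 1000.
    checkpoint, first_run = run_job(1000, checkpoint=0, stop_at=800)
    # The resumed run picks up at the row after the checkpoint.
    checkpoint, second_run = run_job(1000, checkpoint=checkpoint)
    print(first_run[-1], second_run[0], second_run[-1])
```

Between the two runs every row is loaded exactly once, because the resumed run starts at the row immediately after the checkpoint.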

-- SteveF
Enthusiast

Re: TPT - Instances Vs Sessions

Hi,

Does each instance use a separate task, or is it one task per operator?

Teradata Employee

Re: TPT - Instances Vs Sessions

If, by "task", you mean "load slot", then it is one load slot per operator.

Not per instance.

-- SteveF
Enthusiast

Re: TPT - Instances Vs Sessions

Thanks Steven.

Enthusiast

Re: TPT - Instances Vs Sessions

Hello,

What are the ways to measure I/O and network bandwidth?

Regards

Enthusiast

Re: TPT - Instances Vs Sessions

Hello,

I have seen a few posts that indicate multiple instances can be used to read from a SINGLE file. I have some rather large files, but the log (below) shows that the 2nd and 3rd instances are inactive because no data files are assigned to them. What might I be doing wrong? Thanks.

My file loader script looks like this:

DEFINE JOB File_Load
DESCRIPTION 'Load a Teradata table from a file'
(
  STEP MAIN_STEP
  (
    APPLY $INSERT TO OPERATOR ( $LOAD [@LoadInstances] )
    SELECT * FROM OPERATOR ( $FILE_READER [@ReaderInstances] );
  );
);

My job variables file has many variables, but the last ones are:

,LoadInstances     = 3
,UpdateInstances   = 1
,ExportInstances   = 1
,StreamInstances   = 1
,InserterInstances = 1
,SelectorInstances = 1
,ReaderInstances   = 3
,WriterInstances   = 1
,OutmodInstances   = 1

--------------------------

 

TLOGVIEW shows only a single active instance, because no files were assigned to the others:

...
$FILE_READER[1]: DataConnector Producer operator Instances: 3
$FILE_READER[3]: TPT19012 No files assigned to instance 3.  This instance will be inactive.
$FILE_READER[2]: TPT19012 No files assigned to instance 2.  This instance will be inactive.
...
$LOAD: Statistics for Target Table:  'ORDERS'
$LOAD: Total Rows Sent To RDBMS:      318671360
$LOAD: Total Rows Applied:            318671360
$LOAD: Total Rows in Error Table 1:   0
$LOAD: Total Rows in Error Table 2:   0
$LOAD: Total Duplicate Rows:          0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 476805, Total Rows Received = 0, Total Rows Sent = 0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 0, Total Rows Received = 0, Total Rows Sent = 0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 0, Total Rows Received = 0, Total Rows Sent = 0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 0, Total Rows Received = 0, Total Rows Sent = 0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 13, Total Rows Received = 0, Total Rows Sent = 0
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 476792, Total Rows Received = 0, Total Rows Sent = 0
$LOAD: disconnecting sessions
$FILE_READER[1]: Total files processed: 1.