TPT Export job performing tuning

Tools

TPT Export job performing tuning

We are using TPT to export data from a table having 50 million rows using export operator.
We are planning to use either of the two approach.
1)Create 10 TPT scripts each working on the subset of the data by giving where condition
Ex:1 st script working on 1 to 5 million,2 and script working to 5.1 to 10 million and so on..
2)Create multiple instance of export operator and set the max number of session to 10.
Following are my concern:
1)Does both approach work same internally?Does the second approach also work on a subset of the data for each session?
2)If the number of AMP's are less than 10 how many parallel read threads can be executed by both the approaches?
6 REPLIES
Teradata Employee

Re: TPT Export job performing tuning

First of all, you do not need to run 10 different TPT scripts in order to run 10 different Export operators (with the WHERE clause).

You can use a single TPT script, and use the UNION ALL construct.

Each Export operator in each SELECT statemet of the UNION ALL will be a unique separate copy of the Export operator.

 

If you only have 10 AMPs (for example) and you use 1 Export operator with 10 instances (way too many. BTW) then each instance can only use 1 data session.

If you only have 10 AMPs (for example) and use use 10 Export operators with the UNION ALL construct, each Export operator can use up to 10 data sessions.

 

The UNION ALL approach should give you better performance because you can use each copy of the Export operator to export a subset of the rows.

 

When you use 'n' instances of the Export operator (and 50 million rows is not that much, so 'n' can be as few as 2), the Export operator will knwo ahead of time

how many blocks of data will be spooled and coming back to the operator. The blocks are then distributed evenly across the instances, so each instance is

processing a subset of the rows.

 

You will also want to consider exporting the data in NoSpool mode. This will speed up the export process.

 

Having said all of that, the destination of the data could very well be the bottleneck in all of this. If you are writing the data to disk, that will be the bottleneck

(file I/O is always slower than exporting from Teradata) and thus adding more instances of the Export operator, or using a lot of Export operator copies in

the UNION ALL construct will not necessarily make the job go faster.

 

Also keep in mind that the number of data streams being used in the job is a multiple of the producer instances and the consumer instances, and using

too many of each will use a lot of system resources.

 

For 50 million rows, you do not need 10 instances (or 10 copies) of the Export operator to achieve optimal performance.

 

-- SteveF

Re: TPT Export job performing tuning

Thanks for the detailed explanation.A couple of things I would like to get clarified:

 

  1. When you say               

    If you only have 10 AMPs (for example) and you use 1 Export operator with 10 instances (way too many. BTW) then each instance can only use 1 data session.

    If you only have 10 AMPs (for example) and use use 10 Export operators with the UNION ALL construct, each Export operator can use up to 10 data sessions.

    Does this mean each instance of an operator can use as many sessions as the number of AMP's but if we use multiple export operator with 1 instance each it can use 1 session in each AMP's?      
  2. Creating 10 TPT scripts with 10 export operator or creating 1 script with UNION ALL(as you suggested) does both work the same way internally?
  3. If I create a TPT script with 5 export operators (using UNION ALL) but there are only 4 AMP's available,does it mean each operator can use 4 sessions?                                                                                                                                                                                                                                  
Teradata Employee

Re: TPT Export job performing tuning

All good questions.

It is important to understand what we mean when we say "instances" versus "copies".

When you specify a single Export operator and put [10] after it, those are "instances" of a single Export operator.

Those instances together appear to Teradata as one Export operator job.

Instances of an operator share the same metadata (logon information, SELECT statement, etc.).

And thus, the max number of sessions you can connect (total) is equal to the total number of available AMPs.

And that total is divided as evenly as possible across the instances.

Thus, with 10 instances and 10 AMPs, each instance can only connect 1 session each.

 

When you are using the UNION ALL construct, each Export in the SELECT statement is a completely different "copy" of the operator.

Each copy has its own set of metadata.

And each copy can have its own set of instances.

 

So, yes, using UNION ALL is similar to running multiple TPT jobs in parallel.

With UNION ALL, it is all treated as one job and all of the output log information will reside in a single log file.

And if one operator incurs an error, everything stops.

With multiple distinct TPT jobs being run in parallel, an error in one job will not stop the other jobs from running.

 

Hope this helps.

-- SteveF

Re: TPT Export job performing tuning

Hi @feinholz

 

Excellent explaination it really helped me in clearing all my doubts.Just wanted to clarify the scenario below:

 

  1. How do I divide all the rows into multiple export operator using the UNION ALL clause if there is no numeric column present in the table which distributes the rows evenly.I was thinking of using ROWNUM but it comes at the expense of an order by clause which can prove to be very costly if the table is huge.Is there any better way to achieve it?
  2. If I use 5 copies of the export operator and there are only say 3 AMP's avaliable does it mean that each operator copy can always use 3 sessions or it is limited based on the availability of the AMP's and system resources?
Teradata Employee

Re: TPT Export job performing tuning

1. I do not know your data and so I cannot comment on how you would partition up the data in a WHERE clause.

2. as indicated in my previous reply; "copies" are individual distinct operators and thus have the capability to connect as many sessions as there are available AMPs.

-- SteveF

Re: TPT Export job performing tuning

Hi @feinholz

 

The table has around 20 columns(Mostly Numeric/Character/DateTime types)  but there is no unique identifier column present in the table.

My ultimate goal is to create multiple copies of the export operator each working on a subset of rows specified by some condition in the where clause like

 

Export1: Select * from tablename where rownum>1 and rownum<=1000000

Export2:Select * from tablename where rownum>1000000 and rownum<=2000000

Export3:..... so on

There is an option to generate sequence number in Teradata for each row based on ROW_NUMBER() and then using it in the where clause but that requires a order by clause which is costly since the number of rows is over 50 million to 1 Billion.So is there any other better solution to a achieve this?We need to follow a general approach which should not be dependent on the structure of the table.