We have a table having huge number of rows in billions with around 180 columns which we need to export using TPT.The DBA would be allow to use more than 1 session.
Is it advisable to achieve this using a export using a single TPT script?
Is it better to use multiple TPT scripts with each TPT scripts exporting subset of the data?In this case what should be the number of rows to be exported by each TPT script?Though we have a unique primary index but it is not a numeric sequence.We have a other column which is increasing numeric sequence but not part of any index.So which column should we use to distribute the records in each TPT script if we need to use multiple TPT script?
Given that you are exporting a range of data values, the primary index (unique or otherwise) is basically irrelevant. Ignore it for this discussion.
"Is it better to use multiple TPT scripts with each TPT scripts exporting subset of the data?" - well that really depends on your definition of 'better'. What are your goals?
- ease of operation (run one script)
- speed of extract? (run multiple)
If you are using multiple scripts (which I suspect will be the answer that you come to) then for speed and efficiency you're trying to avoid a full table scan for each script on your large table.
For selection you are probably best using a PPI column(s).
If PPI is not an option then look at using a NUSI column(s).
If none of those then are there any hash indexes or single table join indexes on the table?
If none of the above are available then I suspect that every request (TPT script) will be a full table scan. Good luck.
Once you've exported the data, what are you going to do with it?
If subsequent processing requires effectively merging the data. Does that processing suggest a good order in which to export the data?
Some thoughts for you.
Adding to Dave Wellman:
Do you have enough spool to run a single Select?
Then a single export using multiple instances of both Export & FileWriter might be the best.
Do you want to export all rows without WHERE?
Then you might try the NO SPOOL option to directly export (but this will keep the table locked until the end of the export)
We cannot use NO SPOOL option in the TPT script and the DBA would not allow a single TPT script with multiple instance of each operator.We will however be allowed to execute 2-3 TPT scripts simultaneously but the issue here is that how do we formulate the where condition so that nearly equal number of rows are fetched by each TPT script.We do not have any PPI or NUSI in the table.We have a composite Primary Index with a combination of integer and a varchar column.Further the integer column is not sequential number.Is there any way to write the select query in TPT avoiding a full table scan?
If the only index you have on the table is the PI then being practical you cannot avoid the full table scan.
If this process is not going to be done very often, can you add a NUSI for this processing?
Trying to get 'nearly equal number of rows' may not be straight forward. The obvious choice is using the ROW_NUMBER function, but that will require movement of the data - in your case I suspect the entire table. Not good.
I think you'll need to do some investigation of the data values and code the WHERE clause accordingly.
A possible idea (not nice but it technically works):
- build a perm table which contains ONLY the Primary Key column(s) and a sequential value (use the ROW_NUMBER function to generate it). Something like:
CREATE SET TABLE t1 AS (SELECT primary key column(s) ,ROW_NUMBER() OVER(ORDER BY existing-primary-index-columns) AS seqnbr FROM big-table) WITH DATA PRIMARY INDEX(existing-primary-index-columns);
- in your TPT export, join this table to your big table, on the PI and PK columns
- add in a WHERE clause against the 't1' built above using MOD function against the 'seqnbr' column
Assume that you want 3 TPT scripts to run concurrently.
- TPT#1 contains WHERE t1.seqnbr MOD 3 = 0
- TPT#1 contains WHERE t1.seqnbr MOD 3 = 1
- TPT#1 contains WHERE t1.seqnbr MOD 3 = 2
Yes the build of the table 't1' will do a full table scan of the 'big-table', but the TPT scripts themselves should do a merge join to it.
(if you follow this approach, don't bother adding a NUSI)
NOTE: the above instructions include both PK 'primary key' and PI 'primary index' - make sure you use the correct ones.
DBA would not allow a single TPT script with multiple instance of each operator.
We will however be allowed to execute 2-3 TPT scripts simultaneously
Ask your DBA why multiple scripts (e.g. 3) with a single instance each should be more efficient than a single script with multiple instances (e.g. 3)?
Thanks for your valuable inputs.After lots of discussion it seems we can use a single TPT script with multiple sessions and multiple instance of the operators in TPT.We can also use NOSPOOL option.
The only thing I wanted to verify here is:
How to distribute multiple instance vs multiple session.For example,
Is it better advised to use 10 sessions with 1 instance or 5 sessions with 2 instance.The DBA has mentioned we can use upto 10 sessions but did not mention anything regarding the use of instance.
The use of multiple session vs instance is what I not very clear about?