Any invocation of a TPT script is an independent job and has no relation to any other TPT job that is running.
Thus, you can run all 4 at the same time.
I am attempting to run some TPT scripts to export data from a table to a delimited file. I have been successful exporting many tables and can even get them to produce up to 4 output files. For some reason, the export is only populating the first 2 files, as you can see below, and predominantly the first file. The table being exported is 365GB and has a skew factor of 0. Any idea why this would not utilize all four files?
394856946306 Oct 11 13:37 TKT_FARE_CALC_4_delimited.dat-1
2204772955 Oct 11 13:37 TKT_FARE_CALC_4_delimited.dat-2
0 Oct 11 11:40 TKT_FARE_CALC_4_delimited.dat-3
0 Oct 11 14:06 TKT_FARE_CALC_4_delimited.dat-4
TPT Script I am running:
USING CHARACTER SET ASCII
DEFINE JOB export_delimited_file
DESCRIPTION 'Exports the EDW_DATA.TKT_FARE_CALC table data to a pipe delimited file.'
(
  DEFINE SCHEMA SOURCE_SCHEMA
  (
    TKT_ID VARCHAR(30),
    /* ... remaining columns abridged ... */
  );

  DEFINE OPERATOR EXPORT_OPERATOR
  TYPE EXPORT
  SCHEMA SOURCE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR UserName = 'tpt_test_1',
    VARCHAR UserPassword = '32165469',
    VARCHAR TdpId = '10.456.254.130',
    VARCHAR PrivateLogName = 'tpt_test_log',
    VARCHAR SelectStmt = '
      LOCKING EDW_DATA.TKT_FARE_CALC FOR ACCESS MODE
      SELECT TRIM(CAST(TKT_ID AS VARCHAR(30))),
             TRIM(CAST(FARE_CALC_ID_NBR AS VARCHAR(1))),
             /* ... remaining columns abridged ... */
    '
  );

  DEFINE OPERATOR FILE_WRITER_REQ1
  TYPE DATACONNECTOR CONSUMER
  SCHEMA SOURCE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR FileName = '/opt/teradata/tpt-output/TKT_FARE_CALC_4_delimited.dat',
    VARCHAR Format = 'DELIMITED',
    VARCHAR TextDelimiter = '|',
    VARCHAR IndicatorMode = 'N',
    VARCHAR OpenMode = 'Write',
    VARCHAR PrivateLogName = 'my_log'
  );

  APPLY TO OPERATOR (FILE_WRITER_REQ1)
  SELECT * FROM OPERATOR (EXPORT_OPERATOR);
);
By default, the consumer operator instances do not work in a round-robin fashion.
When loading data to the database, the context switching between processes is too expensive.
For file writing, however, to get an even distribution of data blocks to the files, you do need a round-robin type of process.
Thus, use the -C command line option (uppercase).
This forces the data to be distributed to the consumer instances in round-robin fashion.
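For example (a sketch; the script filename and job name here are placeholders, not taken from the thread):

```
# Run the TPT job with -C (uppercase) so data blocks are handed to the
# file-writer instances in round-robin fashion.
tbuild -f export_job.txt -C my_export_job
```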
Hi SteveF! Do you mind providing an example of a job script that specifies multiple instances of a file writer?
I am trying to solve a similar problem: speeding up the export of a large table by specifying multiple Export operators and running them in parallel, with (probably multiple) DATACONNECTOR CONSUMER operators writing out the output in parallel.
I.e., table X, partitioned by load_date column.
Export operator 1 - select where load_date between 2016 and 2017
Export operator 2 - select where load_date between 2017 and 2018
Export operator 3 - select where load_date less than 2017, etc
Can this be done in a single job script?
Are you expecting that each partition's data be written to a different file?
Or were you just partitioning the Exports to try to speed up the export process?
Ideally I want to speed up the export process. I was thinking that partitioning by utilizing several exports applied to their own file writers would speed things up.
Am I on the wrong track?
Probably overkill for what you want to do. The Export processing is not going to be the bottleneck. The file writing will be.
Just leave it at a single Export, increase the session count (and the instance count), and then in the APPLY clause, just provide an instance count for the number of output files.
Then, on the command line, use the -C command line option (uppercase).
That will create the output files of roughly the same size.
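A sketch of the APPLY clause for that single-Export approach, using the operator names from the script above (the instance counts of 2 and 4 are illustrative assumptions; tune them for your system):

```
APPLY TO OPERATOR (FILE_WRITER_REQ1[4])       /* 4 file-writer instances -> 4 output files */
SELECT * FROM OPERATOR (EXPORT_OPERATOR[2]); /* single Export operator, 2 instances */
```

Combined with the -C command line option, the four output files should come out roughly equal in size.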
If you are up for the challenge, try that way first. Obtain your throughput numbers.
Then, try your way, using separate Export operators, each one accessing a different partition of data (use the UNION ALL syntax), using the same number of file writer instances.
Then compare the numbers.
So, basically like this:
APPLY TO OPERATOR ($FILE_WRITER[n]) SELECT * FROM OPERATOR ($EXPORT[m]);
APPLY TO OPERATOR ($FILE_WRITER[n]) SELECT * FROM OPERATOR ($EXPORT[m]) UNION ALL SELECT * FROM OPERATOR ($EXPORT[x]) UNION ALL SELECT * FROM OPERATOR ($EXPORT[y]);
You will have to add in code after the $EXPORT part to provide the unique SELECT statement that is sent to the DBS.
'n', 'm', 'x' and 'y' are the instance numbers. They do not have to be the same, but they can be.
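One way to supply those unique SELECT statements is the ATTR clause in the APPLY statement, which overrides the SelectStmt attribute per operator reference (a sketch; the table name and WHERE clauses are illustrative, and the column list is abridged):

```
APPLY TO OPERATOR ($FILE_WRITER[4])
SELECT * FROM OPERATOR ($EXPORT[1] ATTR
  (SelectStmt = 'SELECT ... FROM X WHERE load_date BETWEEN DATE ''2016-01-01'' AND DATE ''2016-12-31'';'))
UNION ALL
SELECT * FROM OPERATOR ($EXPORT[1] ATTR
  (SelectStmt = 'SELECT ... FROM X WHERE load_date BETWEEN DATE ''2017-01-01'' AND DATE ''2017-12-31'';'));
```

Note the doubled single quotes needed to embed quotes inside the TPT string literal.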
Note also that the number of parallel sessions is probably controlled indirectly via TASM (e.g. set QueryBandSessInfo='UtilityDataSize=Large;') and not by the legacy MaxSessions attribute.
You will want to keep the number of EXPORT instances small; a single EXPORT instance may actually be faster.
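If TASM governs session allocation on your system, the query band can be set via the Export operator's QueryBandSessInfo attribute (a sketch; only the relevant attribute is shown, other attributes abridged):

```
DEFINE OPERATOR EXPORT_OPERATOR
TYPE EXPORT
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
  VARCHAR QueryBandSessInfo = 'UtilityDataSize=Large;',
  /* ... remaining attributes abridged ... */
);
```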