Create multiple output files during export via TPT

Tools & Utilities
Enthusiast

Re: Create multiple output files during export via TPT

But let's say we execute 4 TPT scripts in parallel each having a select statement as part of the export operator and the number of AMP's of the system is 2.In that case will it be possible to execute all the 4 TPT scripts in paralled or 2 will be queued up or get aborted?Let's assume the number of instance in each export operator is set to 1.
Teradata Employee

Re: Create multiple output files during export via TPT

Any invocation of a TPT script is an independent job as has no relation to any other TPT job that is running.

Thus, you can run all 4 at the same time.

 

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

I am attempting to run some TPT scripts to export data from a table to a delimited file.  I have been successful exporting many tables and can even get them to produce up to 4 output files.  For some reason, the export is only populating the first 2 files as you can see below and prodominently the first file.  The table being exported is 365GB and has a skew factor of 0.  Any idea why this would not utilize all four files?

Thanks,
          Mike

 

 394856946306 Oct 11 13:37 TKT_FARE_CALC_4_delimited.dat-1
     2204772955 Oct 11 13:37 TKT_FARE_CALC_4_delimited.dat-2
                       0 Oct 11 11:40 TKT_FARE_CALC_4_delimited.dat-3
                       0 Oct 11 14:06 TKT_FARE_CALC_4_delimited.dat-4


TPT Script I am running:

USING CHARACTER SET ASCII
DEFINE JOB export_delimited_file
DESCRIPTION 'Exports the EDW_DATA.TKT_FARE_CALC table data to a pipe delimited file.'
(
DEFINE SCHEMA SOURCE_SCHEMA
( TKT_ID VARCHAR(30),
FARE_CALC_ID_NBR VARCHAR(1),
FARE_CALC_AREA VARCHAR(100)
);

DEFINE OPERATOR EXPORT_OPERATOR
TYPE EXPORT
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
UserName = 'tpt_test_1',
UserPassword = '32165469',
TdpId = '10.456.254.130'
,MaxSessions= 2
,MinSessions= 2
,SpoolMode='NoSpool'
,PrivateLogName = 'tpt_test_log'
,SelectStmt = '
LOCKING EDW_DATA.TKT_FARE_CALC FOR ACCESS MODE
SELECT TRIM(CAST(TKT_ID AS VARCHAR(30))),
TRIM(CAST(FARE_CALC_ID_NBR AS VARCHAR(1))),
TRIM(FARE_CALC_AREA)
FROM IDW_DATA.TKT_FARE_CALC;'
);

DEFINE OPERATOR FILE_WRITER_REQ1
TYPE DATACONNECTOR CONSUMER
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
Filename = '/opt/teradata/tpt-output/TKT_FARE_CALC_4_delimited.dat',
Format = 'DELIMITED',
TextDelimiter = '|',
IndicatorMode = 'N',
OpenMode = 'Write',
PrivateLogName = 'my_log'
);

 APPLY TO OPERATOR (FILE_WRITER_REQ1[4])
SELECT TKT_ID,
FARE_CALC_ID_NBR,
FARE_CALC_AREA
FROM OPERATOR (EXPORT_OPERATOR[4]);
);

 

Teradata Employee

Re: Create multiple output files during export via TPT

By default, the consumer operator instances do not work in a round-robin fashion.

When loading data to the database, the context switching between processes is too expensive.

For file writing, however, to get an even distribution of data blocks to the files, you do need a round-robin type of process.

Thus, use the -C command line option (uppercase).

This will force the data going to the consumer instances in a round-robin fashion.

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

Hi SteveF!  Do you mind providing an example of a job script that specifies multiple instances of a file writer?

 

I am trying to solve a similiar problem - trying to speed up export of a large table by specifying multiple export operators and attempting to run them in parallel with(probably multiple) DATACONNECTOR CONSUMER operators writing out the output in parallel? 

 

Ie table X, partitioned by load_date column. 

Export operator 1 - select where load_date between 2016 and 2017

Export operator 2 - select where load_date between 2017 and 2018

Export operator 3 - select where load_date less than 2017, etc

 

Can this be done in a single job script?

 

Thanks!

Teradata Employee

Re: Create multiple output files during export via TPT

Are you expecting that each partition's data be written to a different file?

Or were you just partitioning the Exports to try to speed up the export process?

 

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

Hi Steve!

 

Ideally I want to speed up the export process. I was thinking that partitioning by utilizing several exports applied to their own file writers would speed things up. 

 

Am I on the wrong track?

 

Thanks again!

Teradata Employee

Re: Create multiple output files during export via TPT

Probably overkill for what you want to do. The Export processing is not going to be the bottleneck. The file writing will be.

Just leave it at a single Export, increase the session count (and the instance count), and then in the APPLY clause, just provide an instance count for the number of output files.

Then, on the command line, use the -C command line option (uppercase).

That will create the output files of roughly the same size.

If you feel like the challenge, try that way first. Obtain your throughput numbers.

Then, try your way, using separate Export operators, each one accessing a different partition of data (use the UNION ALL syntax), using the same number of file writer instances.

Then compare the numbers.

 

So, basically like this:

APPLY TO OPERATOR ($FILE_WRITER[n]
SELECT * FROM OPERATOR ($EXPORT[m]);

and then:

APPLY TO OPERATOR ($FILE_WRITER[n])
SELECT * FROM OPERATOR ($EXPORT[m])
UNION ALL
SELECT * FROM OPERATOR ($EXPORT[x])
UNION ALL
SELECT * FROM OPERATOR ($EXPORT[y]);

You will have to add in code after the $EXPORT part to provide the unique SELECT statement that is sent to the DBS.

'n', 'm', 'x' and 'y' are the instance numbers. They do not have to be the same, but they can be.

 

-- SteveF
Highlighted
Enthusiast

Re: Create multiple output files during export via TPT

That's awesome - let me try it. Will report back shortly!

 

Have a great weekend!

Teradata Employee

Re: Create multiple output files during export via TPT

Note also that number of parallel sessions is probably controlled indirectly via TASM (e.g. set QueryBandSessInfo='UtilityDataSize=Large;') and not by legacy MaxSessions attribute.

 

You will want to keep the number of EXPORT instances small; a single EXPORT instance may actually be faster.