Create multiple output files during export via TPT

Enthusiast

We have a table in Teradata with a large number of records (millions), which we are exporting using TPT. Because the record count is so large, we want to create several smaller files instead of a single large one. Going through previous posts in this forum and the TPT reference manual, I saw that we can achieve this by using multiple instances of the file writer together with the -C option at invocation.

  1. I tried this on a table with a small number of records (around 100) using 2 instances of the file writer ([2]). Two output files were generated, but all the records went to the first file. Is there a particular reason for that? Can we control the number of records in each file?
  2. If we use a single reader and multiple writers, are the multiple files created in parallel or sequentially?

--Indranil Roy

 

 


Teradata Employee

Re: Create multiple output files during export via TPT

The rows are sent from the producer operator (e.g. Export) to the consumer operator (e.g. file writer) in blocks, not row-by-row.

It is more efficient that way.

And those blocks are written out to the file.

Thus, with a small number of rows, you might see them all written by instance 1 to its file. It depends on how many total blocks there are in the job.

I believe, pre-16.00, the block size is 64K.

 

That is why we say the distribution of rows to the files is "approximately" equal because the number of rows per block will vary based on the row size. And the number of blocks might not be evenly divisible by the number of file writer instances.
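To make the block arithmetic concrete, here is a minimal Python sketch (not TPT code) that packs rows into fixed-size blocks. It assumes the 64 KB block size mentioned above, assumes a row is never split across blocks, and ignores any per-block or per-row overhead that the real data stream may carry. The function name and the 200-byte row size are hypothetical.

```python
def pack_rows_into_blocks(row_sizes, block_size=64 * 1024):
    """Greedily pack rows (given as byte sizes) into fixed-capacity blocks.

    Simplification: ignores any per-block or per-row overhead.
    """
    blocks = []
    current, used = [], 0
    for size in row_sizes:
        if size > block_size:
            raise ValueError("row is larger than the block size")
        # Start a new block when the next row would not fit whole;
        # in this model a row is never split across two blocks.
        if used + size > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        blocks.append(current)
    return blocks


small_job = pack_rows_into_blocks([200] * 100)     # 100 rows of 200 bytes
larger_job = pack_rows_into_blocks([200] * 1000)   # 1,000 rows of 200 bytes

print(len(small_job))                   # 1
print([len(b) for b in larger_job])     # [327, 327, 327, 19]
```

Under these assumptions, 100 small rows fit in a single block, so only one writer instance ever receives data. The larger job yields four blocks, the last of which is only partly full, which is why the split across files is only approximately equal.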

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

I was curious: since the output is distributed by block and the row size may vary, as you pointed out, how is it ensured that a single row always goes to a single writer instance? I mean, the block size might not always be perfectly divisible by the row size.
Also, do the various file writer instances write in parallel?
Teradata Employee

Re: Create multiple output files during export via TPT

I am not sure I understand the concern over whether a single row always goes to a single writer instance.

Are you concerned that there would be duplicates being written out?

That cannot happen.

When you use the -C command line option, the writing of the blocks is done in a round-robin fashion.

Block 1 would go to instance 1, block 2 to instance 2, etc.

As long as the blocks are full, then the number of bytes written to the files will be somewhat equal.

The number of rows may not be, however, if your data contains VAR fields and the row sizes vary.
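The round-robin behavior described above can be illustrated with a small Python sketch (again, not TPT code). It assumes block i simply goes to instance i modulo the number of instances. The function name and the block contents are hypothetical, chosen so that every block holds the same number of bytes but a different number of rows.

```python
def distribute_round_robin(blocks, n_instances):
    """Assign block i to instance i % n_instances; return the blocks per instance."""
    per_instance = [[] for _ in range(n_instances)]
    for index, block in enumerate(blocks):
        per_instance[index % n_instances].append(block)
    return per_instance


# Hypothetical blocks: each inner list holds the byte sizes of the rows in one block.
blocks = [
    [1000] * 64,   # 64 rows of 1,000 bytes
    [4000] * 16,   # 16 rows of 4,000 bytes
    [2000] * 32,   # 32 rows of 2,000 bytes
    [8000] * 8,    #  8 rows of 8,000 bytes
]

files = distribute_round_robin(blocks, 2)
summary = []
for number, assigned in enumerate(files, start=1):
    rows = sum(len(block) for block in assigned)
    size = sum(sum(block) for block in assigned)
    summary.append((number, rows, size))
    print(f"instance {number}: {rows} rows, {size} bytes")
# instance 1: 96 rows, 128000 bytes
# instance 2: 24 rows, 128000 bytes
```

Both instances receive the same number of bytes, yet instance 1 ends up with four times as many rows, because its blocks happen to contain shorter rows. Each block, and therefore each row, lands in exactly one instance's list.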

 

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

My only concern is whether a single row will always go to a single writer instance since the division is based on block size.
Teradata Employee

Re: Create multiple output files during export via TPT

Again, I am not clear on your concern.

When you say, "whether a single row will always go to a single writer instance", are you concerned that a single row would go to multiple writer instances?

That cannot happen.

And there is no guarantee which row will go to which writer.

That is the nature of parallelism.

Parallelism means you cannot control the order.

 

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

Yes, the only concern was whether a single row can get written to multiple writer instances. Thanks for clearing that doubt.

Teradata Employee

Re: Create multiple output files during export via TPT

A single row (and any single block of rows) is only sent to a single writer instance.

 

-- SteveF
Enthusiast

Re: Create multiple output files during export via TPT

@feinholz If we use multiple writer instances, say 10, how many sessions will be used? Do we need to specify that? Further, does it depend on the number of AMPs in the database?

Teradata Employee

Re: Create multiple output files during export via TPT

The file writer operator and the Export operator have nothing to do with each other.

They run independently of each other.

The file writer does not know anything about session connections to the database.

The file writer just writes blocks of data to the files.

If you specify 10 instances of the file writer and use the -C command line option, you will get 10 output files.
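As a rough illustration of the invocation side, the Python sketch below only assembles a `tbuild` command line with the `-C` option; it does not run anything. The script name `export_job.tpt` and the helper function are hypothetical. The instance count itself is not a command-line argument here: it is assumed to be declared inside the job script, in the bracketed form the original question refers to (for example `[10]` on the file writer).

```python
import shlex


def build_tbuild_command(job_script, round_robin=True):
    """Return the argument list for a tbuild run of the given job script."""
    command = ["tbuild", "-f", job_script]
    if round_robin:
        # -C asks for blocks to be distributed round-robin to the consumer instances.
        command.append("-C")
    return command


command = build_tbuild_command("export_job.tpt")  # hypothetical script name
print(shlex.join(command))   # tbuild -f export_job.tpt -C
```

The resulting list can be passed to `subprocess.run` on a machine where TPT is installed. Session settings for the Export operator are configured separately and do not affect how many files the writer instances produce.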

-- SteveF