Hi Dieter/Standalone, I have been following this thread and really appreciate the knowledge being shared.
I have one question about the FastLoad log that was posted. It contains the statement axsmod np_axsmod.sl "";
What does it mean? The FastLoad/MultiLoad manual suggests it is the name of the access module file to be used to import data. But does this mean that the file specified in the DEFINE statement, i.e. file=/data2/TERADATA/IN/FACT_IN_CALL/LOAD_FIFO_08;, is not used, and that the output of np_axsmod.sl is used as the input stream instead?
Does running gunzip -c per 15 files with the output redirected (>) to the "special" mkfifo create a stream of data, instead of creating a merged file that eats more space, and does axsmod np_axsmod.sl then read from this stream?
If yes, how does axsmod np_axsmod.sl know that it has to take data from the "special" stream and not from another stream?
Teradata utilities can read from and write to a FIFO (aka named pipe). This reduces the need for filesystem storage, especially when transferring data from one system to another.
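To make the streaming idea concrete, here is a minimal Python sketch (Unix only) of what gunzip -c redirected into a FIFO amounts to: several gzip files are decompressed one after another into a named pipe while a reader consumes the bytes, so no merged file is ever written to disk. The function name, the file names and the sample rows are made up for illustration; the reader merely stands in for the load utility and is not how FastLoad itself is implemented.

```python
import gzip
import os
import tempfile
import threading


def stream_gzip_files_through_fifo(gz_paths, fifo_path):
    """Decompress each file into a FIFO in turn; return the bytes the reader saw."""
    os.mkfifo(fifo_path)  # same effect as the `mkfifo` command

    def writer():
        # Opening a FIFO for writing blocks until a reader has opened it too.
        with open(fifo_path, "wb") as fifo:
            for path in gz_paths:
                with gzip.open(path, "rb") as src:
                    for chunk in iter(lambda: src.read(64 * 1024), b""):
                        fifo.write(chunk)

    producer = threading.Thread(target=writer)
    producer.start()

    received = bytearray()
    # The reader (standing in for the load utility) sees one continuous stream.
    with open(fifo_path, "rb") as fifo:
        for chunk in iter(lambda: fifo.read(64 * 1024), b""):
            received.extend(chunk)

    producer.join()
    os.unlink(fifo_path)
    return bytes(received)


def demo():
    """Build two small hypothetical input files and stream them through a FIFO."""
    workdir = tempfile.mkdtemp()
    paths = []
    for index, rows in enumerate([b"1|alpha\n", b"2|beta\n"]):
        path = os.path.join(workdir, "part%d.gz" % index)
        with gzip.open(path, "wb") as out:
            out.write(rows)
        paths.append(path)
    return stream_gzip_files_through_fifo(paths, os.path.join(workdir, "load_fifo"))


if __name__ == "__main__":
    print(demo())
```

The FIFO only has a name in the filesystem; the data itself passes through a kernel buffer, which is why no extra disk space is used.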
There is a catch, though. Traditional FIFOs don't support the "seek" operation, i.e. you can't reposition to a particular point in the data. Seeking is essential to the restart capability of the Teradata utilities.
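The seek limitation is easy to verify. The sketch below (Unix only, helper names are my own) tries to reposition on a named pipe and on a regular file; on the pipe the operating system rejects the call with ESPIPE ("illegal seek").

```python
import errno
import os
import tempfile


def seek_is_supported(fd):
    """Return True if the descriptor can be repositioned, False for a pipe/FIFO."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # raised for pipes, FIFOs and sockets
            return False
        raise


def demo():
    """Return (fifo_seekable, regular_file_seekable)."""
    workdir = tempfile.mkdtemp()

    fifo_path = os.path.join(workdir, "load_fifo")
    os.mkfifo(fifo_path)
    # O_NONBLOCK lets the read side open without waiting for a writer.
    fifo_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    fifo_result = seek_is_supported(fifo_fd)
    os.close(fifo_fd)

    file_path = os.path.join(workdir, "plain_file")
    with open(file_path, "wb") as out:
        out.write(b"some data\n")
    file_fd = os.open(file_path, os.O_RDONLY)
    file_result = seek_is_supported(file_fd)
    os.close(file_fd)

    return fifo_result, file_result


if __name__ == "__main__":
    print(demo())
```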
The named pipe access module solves this problem by acting as an intermediate link between the FIFO and the utility. It reads data from the FIFO and keeps track of its progress (i.e. checkpoints etc.) in a data file of its own. This lets the job recover when it needs to be restarted (or, say, after a DB restart) without doing a seek operation against the FIFO (which would fail if attempted).
The access module knows it needs to read from that FIFO because FastLoad knows it and tells the access module.
You can find more info on this module in the Access Module reference manual.
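The general idea can be sketched as follows. This is only a conceptual illustration, not the actual np_axsmod implementation: the class name, the spool file and the in-process restart() call are all assumptions of mine. Everything read from the pipe is also copied to a regular file, so that after a restart the data since the last checkpoint can be replayed from that file instead of seeking on the pipe.

```python
import os
import tempfile


class CheckpointedReader:
    """Hypothetical wrapper: spools pipe data to a file so reads can be replayed."""

    def __init__(self, stream, spool_path):
        self._stream = stream                  # non-seekable source (pipe/FIFO)
        self._spool = open(spool_path, "w+b")  # regular file, so it is seekable
        self._spooled = 0      # bytes copied from the pipe into the spool so far
        self._position = 0     # offset of the next byte handed to the consumer
        self._checkpoint = 0   # offset recorded at the last checkpoint

    def read(self, size):
        if self._position < self._spooled:
            # Replaying after a restart: serve the data from the spool file.
            self._spool.seek(self._position)
            data = self._spool.read(min(size, self._spooled - self._position))
        else:
            # Normal case: pull fresh data from the pipe and remember it.
            data = self._stream.read(size)
            self._spool.seek(0, os.SEEK_END)
            self._spool.write(data)
            self._spooled += len(data)
        self._position += len(data)
        return data

    def checkpoint(self):
        """Mark everything read so far as safely loaded."""
        self._checkpoint = self._position

    def restart(self):
        """Go back to the last checkpoint without touching the pipe."""
        self._position = self._checkpoint

    def close(self):
        self._spool.close()


def demo():
    """Read three 5-byte rows, with a simulated restart after the second one."""
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "wb") as producer:
        producer.write(b"row1\nrow2\nrow3\n")

    spool_path = os.path.join(tempfile.mkdtemp(), "spool.dat")
    seen = []
    with os.fdopen(read_fd, "rb") as pipe:
        reader = CheckpointedReader(pipe, spool_path)
        seen.append(reader.read(5))
        reader.checkpoint()
        seen.append(reader.read(5))
        reader.restart()             # the second row has to be delivered again
        seen.append(reader.read(5))  # served from the spool file
        seen.append(reader.read(5))  # fresh data from the pipe
        reader.close()
    return seen


if __name__ == "__main__":
    print(demo())
```

A real restart happens in a new process, so the checkpoint offset would also have to be persisted somewhere; the sketch keeps it in memory to stay short.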
What was not clear (and I did not read every reply) is whether the data from these 40,000 files must be loaded into a single table.
If so, has any thought been given to using TPT (Teradata Parallel Transporter)? TPT has the ability to read multiple files at one time. In fact, TPT can be configured to read an entire directory of files in one job.
Depending on where the bottleneck is, TPT could still produce faster results (unless you are totally I/O bound or CPU bound).
TPT supports the MultiLoad protocol as well.
If you have the CPU and I/O bandwidth, then TPT will improve performance because it can read multiple files in parallel within a single job (much easier to script and manage than running parallel utility jobs yourself).