Improving Loading Mechanism using Fastload

Tools & Utilities
Enthusiast

Re: Improving Loading Mechanism using Fastload

this is log if i run in single fastload (not parallel fastload)

**** 11:53:59 Processing starting at: Wed Oct 22 11:53:58 2008
**** 11:53:59 FDL4866 SESSIONS command accepted
**** 11:53:59 Error limit set to: 1
**** 11:53:59 Teradata Database Release: 12.00.01.13
**** 11:53:59 Teradata Database Version: 12.00.01.13
**** 11:53:59 Current CLI or RDBMS allows maximum row size: 64K
**** 11:53:59 Character set for this job: ASCII
**** 11:54:00 Number of FastLoad sessions requested = 8
**** 11:54:00 Number of FastLoad sessions connected = 8
**** 11:54:00 FDL4808 LOGON successful
**** 11:54:01 Command completed successfully
**** 11:54:01 RDBMS error 3807: Object 'IN_CAT_LOAD_PAR_BENCH_ERR_1'
**** 11:54:01 RDBMS error 3807: Object 'IN_CAT_LOAD_PAR_BENCH_ERR_2'
**** 11:54:01 Command completed successfully
**** 11:54:02 Number of AMPs available: 12
**** 11:54:02 BEGIN LOADING COMPLETE
**** 11:54:02 Now set to read 'Variable-Length Text' records
**** 11:54:02 Delimiter character(s) is set to '|'
**** 11:54:02 Command completed successfully
**** 11:54:02 FDL4803 DEFINE statement processed
**** 11:54:02 Number of recs/msg: 1
**** 11:54:02 Starting to send to RDBMS with record 1
**** 11:54:17 Sending row 100000
**** 11:54:32 Sending row 200000
**** 11:54:42 Sending row 272375
**** 11:54:42 Finished sending rows to the RDBMS
**** 11:54:46 END LOADING COMPLETE
**** 11:54:47 Logging off all sessions
**** 11:54:47 Total processor time used = '41.28 Seconds'
**** 11:54:47 FDL4818 FastLoad Terminated

the weird thing is it took less than a minute.
Ambassador

Re: Improving Loading Mechanism using Fastload

The bottleneck is FastLoad's insert phase, if you run several FastLoads in parallel they will compete for the same resources.

Dieter
Enthusiast

Re: Improving Loading Mechanism using Fastload

Hi Dieter/Standalone,
I have been following this thread and really appreciates the knowledge being shared.

I have one question related to the fastload log being posted we have statement axsmod np_axsmod.sl "";

What does it mean? Looking through the fastload/mload manual suggests its the Name of the access module file to be used to import data. but what does this means that the file specified in Define statement i.e. file=/data2/TERADATA/IN/FACT_IN_CALL/LOAD_FIFO_08; is not used and instead the output of np_axsmod.sl is used as input stream.

does gunzip -c per 15 file > to "special" mkfifo creates a Stream of data instead of creating a merged file and eating more space and then axsmod np_axsmod.sl looks into this stream?

if yes how does axsmod np_axsmod.sl comes to know that it has to take data from "special" stream and not another stream?

Enthusiast

Re: Improving Loading Mechanism using Fastload

That's the named pipe access module.

Teradata utilities are capable of reading/writing into a fifo (aka named pipe). This reduces the necessity for filesystem storage etc, especially when transferring data from one system to another.

But there's a catch though. The traditional FIFOs, don't support the "seek" operation i.e. you can't point to a particular position in data. This is much essential for having restart capabilities for Terdata utilities.

The named pipe access module solves this problem by being an intermediate link between the fifo and the utility, it reads data from the fifo and keeps track of it's progress (ie check point etc) in a data file of it's own. This helps it to recover in the event the job needs to be restarted (or say there was a DB restart) without doing a seek operation against the fifo (which will error if attempted).

the accessmodule knows it needs to read from that fifo, because fastload knows it, and it tells the accessmodule.

You can find more info on this module in the Access module reference manual.
Teradata Employee

Re: Improving Loading Mechanism using Fastload

What was not clear (and I did not read every reply) is whether the data from these 40,000 files must be loaded into a single table.

If so, has any thought been given to using TPT (Teradata Parallel Transporter).
TPT has the ability to read multiple files at one time.
In fact, TPT can be configured to read an entire directory of files in one job.

-- SteveF
Enthusiast

Re: Improving Loading Mechanism using Fastload

Hi everyone,

actually, i had already implement 10 parallel fastload to load these 40.000 files.

with these configuration we managed to load 60.000 files per day.

but,...

with that kind of configuration, it over burden the box with to many I/O process.

after 3 month, we decide to change the architecture.

instead load the raw data and process it inside TERADATA, we decide to pre-process it out side TERADATA first.

so, TERADATA will only recieve the final data.

and we change the fastload to multiload, because the intensive I/O process.

after a week or so, it seems the multiload become slower and slower.

so, today we decide to use TPT.

has anyone use TPT in parallel mode and insert directly to 1 table?
Ambassador

Re: Improving Loading Mechanism using Fastload

TPT uses the same load protocols as FastLoad or MLoad, if TPT will be faster depends where's the bottleneck.

Could you post a log output again?

Dieter
Enthusiast

Re: Improving Loading Mechanism using Fastload

Hi dieter,

actually, yesterday we found out that TPT is using MLOAD also.
so, we postpone the development for TPT, instead we focus on finding the bottle neck.

we found out that in our target table, we use NUSI.
after we drop the NUSI, MLOAD performance drastically increase.
right now we still monitoring the this changes, wheter the performance is stable.
Teradata Employee

Re: Improving Loading Mechanism using Fastload

Depending on where the bottleneck is, TPT could still product faster results (unless you are totally I/O bound or CPU bound).

TPT supports the MultiLoad protocol as well.

If you have the CPU and I/O bandwidth, then TPT will improve on performance due to the ability to read multiple files in parallel, with a single job (much easier to script and manage over running parallel utility jobs by yourself).

-- SteveF
Enthusiast

Re: Improving Loading Mechanism using Fastload

Hi Feinholz,

I must have load multiple input data files as they appear in a Unix file directory using asingle load operation. Is the TPT supports this?

Thanks in advance.