Teradata utilities are specialised tools for loading or exporting huge volumes of data, compared to conventional SQL tools such as SQL Assistant and BTEQ. I would like to understand the fundamental reasons for the performance difference between them. I have set out my understanding below and would appreciate any additions or corrections.
(a) FastExport vs BTEQ Export: BTEQ processes each row individually while exporting, whereas FastExport exports in blocks (63 or 64 KB). The block size indicates the size of the message that carries the rows: the larger the block, the larger the message and hence the more rows exported per message. The block size (63 or 64 KB) is not user-modifiable in FastExport.
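To make the comparison concrete, here is a hedged sketch of the two export approaches side by side (the logon string, object names, file names and session count are placeholders, not anything from this thread):

```
/* BTEQ export -- rows fetched back over a single SQL session */
.LOGON tdpid/user,password;
.EXPORT DATA FILE = out_bteq.dat;
SELECT * FROM mydb.mytable;
.EXPORT RESET;
.LOGOFF;

/* FastExport -- blocks shipped over multiple parallel sessions */
.LOGTABLE mydb.fexp_restartlog;
.LOGON tdpid/user,password;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE out_fexp.dat;
SELECT * FROM mydb.mytable;
.END EXPORT;
.LOGOFF;
```

Note the two visible structural differences: FastExport needs a restart log table, and SESSIONS controls how many parallel connections carry the blocks.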
(b) FastLoad vs BTEQ Import: Same as above. BTEQ Import processes one row at a time, meaning it imports one row from the source file into the table per request. FastLoad uses a buffer (63 or 64 KB, as indicated by the Buffer Size). This buffer size defaults to the maximum, and the user is allowed to modify it. No transient journal is maintained for FastLoad, whereas BTEQ uses transient journaling.
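For reference, a minimal FastLoad script might look like the sketch below (all names, the tdpid and the input file are placeholders; note there is no journaling and the target table must be empty):

```
/* FastLoad sketch -- buffered, multi-session insert into an EMPTY table */
SESSIONS 8;                      /* parallel load sessions              */
ERRLIMIT 25;
LOGON tdpid/user,password;
BEGIN LOADING mydb.mytable
   ERRORFILES mydb.mytable_e1, mydb.mytable_e2;
SET RECORD VARTEXT ",";
DEFINE in_c1 (VARCHAR(20)),
       in_c2 (VARCHAR(20))
   FILE = data.txt;
INSERT INTO mydb.mytable VALUES (:in_c1, :in_c2);
END LOADING;
LOGOFF;
```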
(c) FastLoad vs MultiLoad: MLoad allows journaling, NUSIs and fallback, whereas FLoad allows none of them; this is point 1 for the performance gain. To cater for duplicate records, MLoad uses a SEQUENCE NUMBER, which is used to avoid loading the same row twice when a restart happens. As FLoad doesn't allow duplicate records, no sequence number is maintained in FLoad; this is point 2. The 5 phases of MLoad versus the 2 phases of FLoad is point 3. MLoad works via a work table whereas FLoad works directly on the target table, which is point 4. The block size (as indicated by the buffer size, i.e. the size of the message carrying the rows) is the SAME for FastLoad and MultiLoad, so there is no difference in this respect.
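A comparable MultiLoad sketch shows the extra machinery (restart log table, layout, DML label, work/error tables created behind the scenes) that FastLoad avoids; again, every name and file here is a placeholder:

```
/* MultiLoad sketch -- restart log + work tables, populated target allowed */
.LOGTABLE mydb.ml_restartlog;
.LOGON tdpid/user,password;
.BEGIN MLOAD TABLES mydb.mytable;
.LAYOUT in_layout;
.FIELD in_c1 * VARCHAR(20);
.FIELD in_c2 * VARCHAR(20);
.DML LABEL ins_dml;
INSERT INTO mydb.mytable VALUES (:in_c1, :in_c2);
.IMPORT INFILE data.txt
   FORMAT VARTEXT ','
   LAYOUT in_layout
   APPLY ins_dml;
.END MLOAD;
.LOGOFF;
```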
A few clarifications (as you read this, all references to Multiload can be read as TPT Update, Fastload as TPT Insert, TPump as TPT Streams - TPT should be being leveraged at this point for any new development rather than the legacy utilities):
a1) BTEQ also returns rows from the database to the underlying CLI in blocks. Depending on your setup, 64K may be the default, but I believe this can be adjusted upward to 1 MB (a CLI parameter, not a BTEQ parameter). This should be changed if exporting medium to large result sets.
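If memory serves, the CLI parameter in question lives in the CLIv2 System Parameter Block file (clispb.dat); the entry name and syntax below are from recollection, not from this thread, so verify them against the CLI documentation for your release:

```
/* clispb.dat fragment (assumed entry name and format) */
resp_buf_len=1048500    /* response buffer size in bytes, up from the default */
```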
a2) The primary difference between Fastexport and BTEQ export is the ability to ship data over multiple session connections simultaneously, thereby leveraging the total connectivity available between the client platform and the database engine. In order to do this, Fastexport spends more resources on executing the query so as to prepare the blocks in such a way that, when they are exported over multiple sessions, they can easily be reassembled in the right order by the client without additional sorting or processing of the rows.
b1) BTEQ import does process a row at a time. Thus it is generally appropriate only for very small imports.
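As a hedged illustration of that row-at-a-time behaviour, a typical BTEQ import script issues one INSERT per input record (names and the input file are placeholders; some BTEQ releases also offer a PACK option on .REPEAT to batch rows, which is worth checking in your documentation):

```
/* BTEQ import sketch -- one row read and inserted per request */
.LOGON tdpid/user,password;
.IMPORT VARTEXT ',' FILE = data.txt;
.QUIET ON;
.REPEAT *
USING (c1 VARCHAR(20), c2 VARCHAR(20))
INSERT INTO mydb.mytable VALUES (:c1, :c2);
.LOGOFF;
.QUIT;
```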
b2) Fastload does use a buffered approach. But its real differentiation comes from its use of multiple simultaneous sessions, which allows it to leverage all of the parallelism of the parallel database engine. Multiple AMPs are working on receiving, deblocking, data type transformation, hashing and redistribution of the data.
b3) Fastload is only capable of inserting data into an empty table. This obviates the need for journaling, because if it doesn't finish properly the proper recovery is to simply delete the table and start over. BTEQ import uses journaling because it can load into an existing table and needs to be able to roll back an unfinished import.
c1) Fastload only allows insert into empty tables. Thus the thought is that it is easy to add indexes and other table attributes after the table has been loaded with the data. No need for journals because the data is either loaded or it is not. No need to roll forward or back.
c2) The primary reason Fastload does not allow duplicate records is that at the time it was designed and built, multiset tables did not exist. Teradata's early days allowed only SET tables, so Fastload did not have to be designed to load duplicate records. It was deemed a user-friendly feature to automatically eliminate them rather than report them as errors. When Multiload was designed it had a very different set of requirements, including multiset support and the ability to insert, update and delete in existing populated tables. This in turn led to a very different design, including the sequence numbers in the incoming rows to allow ordered apply, insertion of duplicate rows, and the checkpoint/restart case while still preserving the duplicate rows. It was decided that we would not go back and redesign Fastload to cover all of these cases, but rather leave it for the simple case of insert into empty tables and have Multiload handle any other case not supported by Fastload.
Choosing between the bulk tools should be less about relative performance and more about matching the required functionality to the use case. If inserting data into an empty table (without dups), then Fastload; else Multiload. BTEQ import and TPump are chosen if the import data is small or medium respectively, and it is desirable to avoid the overhead of occupying a bulk-utility slot and of starting up and shutting down a bulk utility for small data sets. BTEQ export likewise is appropriate for small to medium result sets. FastExport is designed to handle high-volume exports in an expeditious way. And of course TPump can be used when the requirement is to load continuously.
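For anyone mapping this advice onto TPT, a job using the LOAD operator (the FastLoad protocol) has roughly the shape below; every name, credential and attribute value is a placeholder, and swapping TYPE LOAD for TYPE UPDATE or TYPE STREAM selects the Multiload- or TPump-style protocol instead:

```
/* TPT sketch -- file reader feeding the LOAD operator */
DEFINE JOB load_mytable
(
  DEFINE SCHEMA src_schema
  ( c1 VARCHAR(20), c2 VARCHAR(20) );

  DEFINE OPERATOR file_reader
  TYPE DATACONNECTOR PRODUCER
  SCHEMA src_schema
  ATTRIBUTES
  ( VARCHAR FileName      = 'data.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = ',' );

  DEFINE OPERATOR load_op
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  ( VARCHAR TdpId        = 'tdpid',
    VARCHAR UserName     = 'user',
    VARCHAR UserPassword = 'password',
    VARCHAR TargetTable  = 'mydb.mytable',
    VARCHAR ErrorTable1  = 'mydb.mytable_e1',
    VARCHAR ErrorTable2  = 'mydb.mytable_e2' );

  APPLY ('INSERT INTO mydb.mytable ( :c1, :c2 );')
  TO OPERATOR (load_op)
  SELECT * FROM OPERATOR (file_reader);
);
```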
Thanks Todd for explaining the differences, in a way rectifying my points and adding to them.
Also, Thanks for your help on the Skew + NUSI discussion [http://forums.teradata.com/forum/database/how-to-calulate-skew-on-join-index#comment-133308].
One small correction (sorry Todd!):
The TPT equivalent of FastLoad is "TPT Load".
TPT Insert is equivalent to BTEQ inserts (uses SQL sessions).