Mload vs Fastload in rejecting the duplicates

General
Enthusiast

Mload vs Fastload in rejecting the duplicates

Hi All,

I am bit confused about why Fastload does nt load duplicates even in Multiset table.I got a reason from my colleague that ,its because of its restartability feature.The example which he gave was:

If u want to load 50 rows using Fload and its checkpoint is 5 rows.Then if a job fails while loading 7 th row ,fastload will try to reload the data from 6th row.So 6th row will be a duplicate .Hence it will be difficult to descriminate the actual dupliactes with these dupliactes.So it simply discards them duplicates to error table.

But I was not convinced with this because even mload uses restart logic.But Mload loads the duplicates in a mulitset table.Can someone explain why?

Thanks,
Ashok.
6 REPLIES
Enthusiast

Re: Mload vs Fastload in rejecting the duplicates

I believe Fastload silently discards duplicate rows. They don't go to the error table.

Many of the differences between FastLoad and MultiLoad are due to the fact that MultiLoad loads by way of an intermediate work table, whereas Fastload loads directly into the target table, which is sorted later. The sorting process automatically removes duplicate rows. The records that MultiLoad loads the work table during the acquisition phase are not exact images of the target rows. They contain sequence numbers that are generated by the client module. If the acquisition phase is restarted, the client may resend rows that were previously sent after the last checkpoint, as is the case with Fastload. However, only duplicate records with matching sequence numbers are discarded when the work table is sorted at the end of the acquisition phase. Duplicate records with non-matching sequence numbers are retained.
Enthusiast

Re: Mload vs Fastload in rejecting the duplicates

Thanks Jim.

So you mean to say becuase of this sequence numbers in acquistion phase,Mload will be in a position to identify the duplicates in work table loaded due to restart and discard them.However in Fastload case,as we dont have a work table it will not have a sequence number because of which Fastload cannot identify the duplicates.Hence Fastload rejects the duplicates?Please correct me If I am wrong.

Thanks,
Ashok.
Teradata Employee

Re: Mload vs Fastload in rejecting the duplicates

Hello Jim,

I was looking for the answer concerning why & how FastLoad rejects the Duplicate rows and MultiLoad doesn't do the same. Having said that, can you explain as to how exactly Fastload works. From what I understand, in Phase 01, Fastload brings the data from the source file to the Database (This phase populates the Error 1 table as in Conversion error, constraint error, unavailable AMP error). In Phase 2, these data are sent to the actual AMP on the basis of hashing the PI Values and then, sorting the same. Post sorting, the Duplicates are removed and the Error 2 table (UPI Violation error) are captured.

My question is: When the data is brought into the Database from the source file, is it stored in some temporary memory and in some random AMP. If Random AMP, how Teradata ensures that a few AMPs are getting most of the values, even though till the completion of Phase 1 ?

Thanks in Advance,

Smarak

Teradata Employee

Re: Mload vs Fastload in rejecting the duplicates

Phase 1 - data is supplied from the client/load server side of Fastload/TPT Load Operator in blocks of data sent down multiple sessions each of which is connected to a different AMP (not all AMPs have to have sessions, data is not organized by the client to be sent to its eventual target AMP). The receiver thread in the AMP gets the block, deblocks the records, does data type translation from the source types to the target table types, hashes the fields destined for the PI of the target table, then forwards (redistributes) the row to the AMP that will own the row. A second receiver thread gets the redistributed row and writes it into the target table. This process continues until the client signals that it has sent all data and until the internal deblocking and redistribution has all completed.

Phase 2 - A sort is performed on the target table to organize the rows according to the sort order of the PI. One mode of our sort engine is to be able to eliminate duplicates while sorting (eg for DISTINCT operation). We utilize this mode for the sort. The result of the sort is written into it's permanent home in the target table. Then the load does its completion operations and reports completion of the load. As part of the completion, the unsorted version of the data is removed from the target table.

Teradata Employee

Re: Mload vs Fastload in rejecting the duplicates

Thanks Todd for your answers.

I would appreciate your feedback on this forum, which is yet to receive any feedback:

http://forums.teradata.com/forum/tools/difference-between-load-export-utilities

My purpose with this forum is to understand these utilities at a fundamental level, rather than simply knowing how to use them.

Thanks in advance.

Enthusiast

Re: Mload vs Fastload in rejecting the duplicates

Thanks Todd for your very informative answer regarding both phases of FL.

Could you please explain in same way all the phases of Multiload.

Thanks