I was looking for an explanation of why and how FastLoad rejects duplicate rows while MultiLoad does not. With that in mind, can you explain exactly how FastLoad works? From what I understand, in Phase 1 FastLoad brings the data from the source file into the database (this phase populates the Error 1 table with conversion errors, constraint errors, and unavailable-AMP errors). In Phase 2 the data is sent to the owning AMP by hashing the PI values and is then sorted. After sorting, the duplicates are removed and UPI violations are captured in the Error 2 table.
My question is: when the data is brought into the database from the source file, is it stored in some temporary memory, and on some random AMP? If it goes to a random AMP, how does Teradata ensure that a few AMPs do not end up with most of the rows before Phase 1 completes?
Thanks in Advance,
Phase 1 - The client/load server side of FastLoad (or the TPT Load Operator) supplies the data in blocks, sent down multiple sessions, each of which is connected to a different AMP. Not all AMPs have to have a session, and the client does not organize the data so that it is sent to its eventual target AMP. The receiver thread in the AMP gets the block, deblocks the records, translates the data types from the source types to the target table types, hashes the fields destined for the PI of the target table, and then forwards (redistributes) each row to the AMP that will own it. A second receiver thread on the owning AMP gets the redistributed row and writes it into the target table. This continues until the client signals that it has sent all the data and the internal deblocking and redistribution have completed.
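To make the Phase 1 flow concrete, here is a toy Python simulation. It is only a sketch and not Teradata's implementation: `NUM_AMPS`, `owning_amp`, and `phase1` are made-up names, MD5 modulo the AMP count stands in for the real row hash and hash map, and data type translation and error-table handling are left out. The point it illustrates is that the AMP a session happens to be connected to does not decide where a row ends up; the hash of the PI value does.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size, for illustration only


def owning_amp(pi_value, num_amps=NUM_AMPS):
    """Toy stand-in for the row hash: map a PI value to an AMP number."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def phase1(blocks_by_receiving_amp, num_amps=NUM_AMPS):
    """Deblock each received block and forward every row to its owning AMP.

    blocks_by_receiving_amp maps the AMP a session is connected to onto the
    list of blocks it received; each block is a list of (pi, payload) rows.
    Returns {owning_amp: [rows in arrival order, still unsorted]}.
    """
    unsorted_rows = defaultdict(list)
    for _receiving_amp, blocks in blocks_by_receiving_amp.items():
        for block in blocks:
            for pi, payload in block:  # deblock the records
                target = owning_amp(pi, num_amps)  # hash the PI fields
                unsorted_rows[target].append((pi, payload))  # redistribute
    return dict(unsorted_rows)


# Sessions exist only on AMPs 0 and 2; PI value 101 arrives on both of them.
blocks = {
    0: [[(101, "a"), (102, "b")], [(103, "c")]],
    2: [[(101, "a"), (104, "d"), (105, "e")]],
}
placed = phase1(blocks)
for amp in sorted(placed):
    print(amp, placed[amp])
```

Both copies of the row with PI value 101 land on the same AMP even though they arrived over different sessions, and both are still present at the end of Phase 1. Nothing has been sorted or de-duplicated yet.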
Phase 2 - A sort is performed on the target table to organize the rows according to the sort order of the PI. One mode of our sort engine can eliminate duplicates while sorting (e.g. for a DISTINCT operation), and we use this mode for the sort. The result of the sort is written into its permanent home in the target table. Then the load does its completion operations and reports completion of the load. As part of the completion, the unsorted version of the data is removed from the target table.
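The Phase 2 idea of dropping duplicates as a by-product of the sort can be sketched the same way. Again this is an illustration under assumptions, not the actual sort engine: `phase2` is a hypothetical name, rows are plain tuples with the PI value first, and Python's built-in sort stands in for ordering by the PI. Once the rows are sorted, identical rows sit next to each other, so comparing each row with the last one kept is enough to discard the duplicates.

```python
def phase2(unsorted_rows):
    """Sort one AMP's rows and drop exact duplicate rows while doing so.

    unsorted_rows is a list of tuples whose first element is the PI value.
    After sorting, identical rows are adjacent, so each row only needs to be
    compared with the last row that was kept.
    """
    result = []
    for row in sorted(unsorted_rows):
        if not result or row != result[-1]:
            result.append(row)
    return result


amp_rows = [(2, "b"), (1, "a"), (2, "b"), (1, "c")]
final_rows = phase2(amp_rows)
print(final_rows)  # [(1, 'a'), (1, 'c'), (2, 'b')]
```

Note that `(1, 'a')` and `(1, 'c')` both survive: they share a PI value but are not duplicate rows. Reporting rows that violate a unique PI is a separate check and is not modelled here.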
Thanks Todd for your answers.
I would also appreciate your feedback on this forum post, which has not yet received any replies:
My purpose with this forum is to understand these utilities at a fundamental level, rather than simply knowing how to use them.
Thanks in advance.
Thanks, Todd, for your very informative answer regarding both phases of FastLoad.
Could you please explain all the phases of MultiLoad in the same way?