I have a question on Mload & FastLoad.
Let's assume that I have an empty table and I am trying to load a file into it using MLoad or FastLoad. Based on my understanding below, I want to work out which of these utilities will perform better in the application phase.
To my knowledge, in MLoad, data gets loaded to a worktable in the acquisition phase and is then moved to the actual table in the application phase. My question is this:
In FastLoad, data first gets pushed onto the AMPs in the acquisition phase and then, in the application phase, gets redistributed across the AMPs. Does this mean that hashing in FastLoad happens in the application phase, and that the data gets moved across AMPs over the BYNET? If this is true, the application phase of FastLoad would take more time than that of MLoad, as hashing is involved here.
Please help, and correct me if my understanding is wrong.
In FastLoad Phase 1, blocks of data are sent to arbitrary AMPs which deblock the data, compute the rowhash, and send each row to the proper destination AMP. So "redistribution" is part of Phase 1. Phase 2 is AMP-local (sort by ROWID).
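To make the two FastLoad phases concrete, here is a toy Python sketch of the flow described above. It is only an illustration under loud assumptions: the AMP count, the CRC32 stand-in hash, the modulo mapping to AMPs, the round-robin choice of receiving AMP, and all function names are hypothetical, not Teradata internals.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical AMP count, for illustration only


def row_hash(pi_value):
    # Stand-in hash (CRC32); NOT Teradata's actual row-hash algorithm.
    return zlib.crc32(str(pi_value).encode("utf-8"))


def owning_amp(pi_value):
    # Simplified assumption: hash modulo AMP count picks the owning AMP.
    return row_hash(pi_value) % NUM_AMPS


def phase1(blocks):
    """Each block lands on an arbitrary AMP, which deblocks it, hashes
    every row, and forwards the row to its owning AMP."""
    amp_rows = defaultdict(list)
    moved = 0  # rows that had to travel to a different AMP
    for i, block in enumerate(blocks):
        receiving_amp = i % NUM_AMPS  # arbitrary receiver (round robin here)
        for row in block:
            dest = owning_amp(row[0])  # row[0] plays the role of the PI value
            if dest != receiving_amp:
                moved += 1
            amp_rows[dest].append((row_hash(row[0]), row))
    return amp_rows, moved


def phase2(amp_rows):
    """AMP-local work only: each AMP sorts the rows it already holds."""
    return {amp: sorted(rows, key=lambda r: r[0])
            for amp, rows in amp_rows.items()}
```

In this sketch all cross-AMP movement is counted inside phase1, and phase2 never changes which AMP holds a row, which mirrors the point that redistribution belongs to Phase 1.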
The MultiLoad worktable has the same PI as the target table, and is always Fallback protected (so the rows are actually written twice). Again, the hash computation and row (re)distribution happens in the first (Acquisition) phase. The Application phase is AMP-local (with no NUSIs, essentially merges worktable to target table).
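The MultiLoad side can be sketched the same way. Again this is a toy model with assumptions: the AMP count, the CRC32 stand-in hash, and the function names are hypothetical, the Fallback copy is modeled only as a second write in a counter, and the "merge" is simplified to an insert into an empty target table.

```python
import zlib

NUM_AMPS = 4  # hypothetical AMP count, for illustration only


def owning_amp(pi_value):
    # Stand-in hash (CRC32 modulo AMP count); NOT Teradata's real hashing.
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_AMPS


def acquisition(rows):
    """Hash each input row on the target table's PI and store it in the
    owning AMP's worktable. Returns the worktables and the write count."""
    worktables = {amp: [] for amp in range(NUM_AMPS)}
    writes = 0
    for row in rows:
        worktables[owning_amp(row[0])].append(row)
        writes += 2  # primary copy plus Fallback copy of the worktable row
    return worktables, writes


def application(worktables, target):
    """AMP-local step: each AMP applies its own worktable rows to its own
    portion of the target table; no row changes AMP here."""
    for amp, rows in worktables.items():
        target[amp].extend(rows)
    return target
```

Because the worktable shares the target table's PI, a row that lands on an AMP's worktable already sits on the AMP that owns it in the target table, so the application step in this sketch needs no further redistribution.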
Thanks, Fred. So, if I understood your point correctly, in both Phase 2 of FastLoad and the application phase of MLoad, data is just committed to the actual target table. The hashing and redistribution of data to the corresponding AMP happens in Phase 1 (the acquisition phase) itself.
Also, could you please clarify the below?
In both cases, an entire data block from the client is sent to an AMP, which immediately de-blocks the data, builds individual rows, and sends each row to the appropriate target AMP, which then appends the row to the table (FastLoad) or inserts it to the worktable (MultiLoad).