Can anyone explain in detail how the TPUMP utility works? On a PI table, the row distribution is fairly even. But in the case of a NoPI table, how is even distribution achieved?
In the case of FastLoad on NoPI tables, if the number of FastLoad sessions is less than the number of AMPs in the system, then only those AMPs are used to receive the data, and the deblocker task then uses a round-robin technique to distribute the rows evenly to the other AMPs.
In the case of a TPUMP load on a NoPI table, I read that the hashing is done on the query ID and that all the rows TPUMP sends with that query are loaded to the same AMP. Let's say we have written a query to load data into a NoPI table using TPUMP. The query may carry one row or multiple rows (say 100 rows). Since TPUMP hashes on the query ID, the output is a 32-bit row hash. If we take 16 or 20 bits of it to map to an AMP, all 100 rows go to the same AMP. If this is the case, does it not lead to skew?
Does the deblocker task use a round-robin technique, as in the case of FastLoad, to accomplish even distribution?
Please help me understand how the TPUMP utility works.
Thanks in advance,
All rows of one pack go to the same AMP and are appended at the 'end' of the table, probably in a single data block. That's why it's much more efficient than distributing them to multiple AMPs and storing them in multiple blocks (in the worst case, reading and writing one data block for each input row).
The next pack of rows is hashed on the next query ID, so it will usually be stored on a different AMP. If you've got lots of rows, randomly distributing those packs normally results in a good distribution.
And if it's a small number of rows you simply don't care :-)
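To make the "lots of packs even out" argument concrete, here is a rough Python simulation. It is only a sketch under assumptions: MD5 stands in for Teradata's real hashing algorithm, the top 20 bits of the 32-bit hash are used as the hash bucket, and the bucket-to-AMP map is a simple modulo. The names amp_for_query, simulate and skew_factor are made up for this illustration and are not part of any Teradata tool.

```python
import hashlib
from collections import Counter


def amp_for_query(query_id: int, num_amps: int) -> int:
    """Pick an AMP for one pack of rows based on its query ID.

    Assumption: MD5 is only a stand-in for the real hash function.
    """
    digest = hashlib.md5(str(query_id).encode()).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # 32-bit value, like a row hash
    bucket = row_hash >> 12                       # keep the top 20 bits as hash bucket
    return bucket % num_amps                      # hypothetical bucket-to-AMP map


def simulate(num_packs: int, pack_size: int, num_amps: int) -> Counter:
    """Load num_packs packs; every row of a pack lands on the same AMP."""
    rows_per_amp = Counter({amp: 0 for amp in range(num_amps)})
    for query_id in range(num_packs):
        rows_per_amp[amp_for_query(query_id, num_amps)] += pack_size
    return rows_per_amp


def skew_factor(rows_per_amp: Counter) -> float:
    """Rows on the fullest AMP divided by the average rows per AMP."""
    average = sum(rows_per_amp.values()) / len(rows_per_amp)
    return max(rows_per_amp.values()) / average


if __name__ == "__main__":
    for packs in (1, 100, 20000):
        result = simulate(num_packs=packs, pack_size=100, num_amps=20)
        print(packs, "packs -> skew factor", round(skew_factor(result), 2))
```

With a single pack the skew factor equals the number of AMPs (everything sits on one AMP), while with thousands of packs it moves close to 1. That is the same trade-off as described above: each pack is written cheaply to one AMP, and the distribution only evens out across many packs.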
everything's in the manuals, but not always easy to find :-)