I have a MULTISET table with a PI column which is of INTEGER datatype (single column PI only). This column has system generated values that are unique (like a surrogate key) (using ROW_NUMBER() OVER (ORDER bY 1)). With the PI column value being unique, I still note the table to have high skew value of 74% with the table having a row count of 20 million rows .
I understand that having unique PI will distribute the tables rows evenly across all AMPs and the more unique the PI value is the skew will be better ( < 1) . With generated surrogate key also being unique, why is it that the table skew is still high. Please share your thoughts around this.
To be honest you manage to confused me .
You mentioned system generated values but then you say that you use row_number()......
You mentioned that the table is Multiset but you have not mentioned if you have a UPI or PI ?
Can you provide us the result of
sel pi_clm,count(*) from table
group by 1
having count(*) >1;
Then , can you post the number of the Amp's ? SELECT HASHAMP()+1 ;
and the sel count(*) from table;
It is a MULTISET table with PI. We generated the surrogate key with row_number() OVER (ORDER BY 1).
Also the select pi_column, count(*) always gives 1 as the count for each PI value and there were none with count(*) > 1.
How did you calculate the skew, based on HASHAMP(HASHBUCKET(HASHROW(PI))) or dbc.TableSizeV?
Can you show the actual query?