data distribution in amps

General

data distribution in amps

hai
if iam having 5 amps and i have 5 rows it will distribute rows evenly in 5 amps due to primary index , if iam having 3 rows how it ill distribute in amps ?

thanq
Tags (1)
2 REPLIES
WAQ
Enthusiast

Re: data distribution in amps

It depends on your index.

If the index is unique then 3 rows will distribute on 3 AMPs leaving two AMPs empty (without rows). This is called skewness. In this case you will not be benefiting from the full parallel performance of the system as 2 AMPs will be sitting idle.

If the index is non-unique and all the 3 values are same then they end up on the same AMP which again result in highly skewed data.
Enthusiast

Re: data distribution in amps

Data is distributed to amps based on the hash value of the primary index columns. You can use the HASH functions to see which AMP a row would be distributed to. The function would look like this: HASHAMP(HASHBUCKET(HASHROW())). This can be placed in a SELECT statement and can be used as a grouping field in an aggregate query. I've used this statement quite a bit to estimate skewing for a potential primary index.

But to answer your question, it really depends on your system and the PI values. Even unique values can potentially have hash collisions where two different values will hash to the same thing. It's also possible that (for a small enough system) you will have multiple hash values that will land on the same amp. In my experience, the more rows you have the less this becomes a factor.