General

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-06-2011
11:46 PM

02-06-2011
11:46 PM

hai

if iam having 5 amps and i have 5 rows it will distribute rows evenly in 5 amps due to primary index , if iam having 3 rows how it ill distribute in amps ?

thanq

if iam having 5 amps and i have 5 rows it will distribute rows evenly in 5 amps due to primary index , if iam having 3 rows how it ill distribute in amps ?

thanq

2 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-07-2011
05:39 AM

02-07-2011
05:39 AM

It depends on your index.

If the index is unique then 3 rows will distribute on 3 AMPs leaving two AMPs empty (without rows). This is called skewness. In this case you will not be benefiting from the full parallel performance of the system as 2 AMPs will be sitting idle.

If the index is non-unique and all the 3 values are same then they end up on the same AMP which again result in highly skewed data.

If the index is unique then 3 rows will distribute on 3 AMPs leaving two AMPs empty (without rows). This is called skewness. In this case you will not be benefiting from the full parallel performance of the system as 2 AMPs will be sitting idle.

If the index is non-unique and all the 3 values are same then they end up on the same AMP which again result in highly skewed data.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-07-2011
09:57 AM

02-07-2011
09:57 AM

Data is distributed to amps based on the hash value of the primary index columns. You can use the HASH functions to see which AMP a row would be distributed to. The function would look like this: HASHAMP(HASHBUCKET(HASHROW())). This can be placed in a SELECT statement and can be used as a grouping field in an aggregate query. I've used this statement quite a bit to estimate skewing for a potential primary index.

But to answer your question, it really depends on your system and the PI values. Even unique values can potentially have hash collisions where two different values will hash to the same thing. It's also possible that (for a small enough system) you will have multiple hash values that will land on the same amp. In my experience, the more rows you have the less this becomes a factor.

But to answer your question, it really depends on your system and the PI values. Even unique values can potentially have hash collisions where two different values will hash to the same thing. It's also possible that (for a small enough system) you will have multiple hash values that will land on the same amp. In my experience, the more rows you have the less this becomes a factor.