Redistribution to all AMPs


Redistribution to all AMPs

Hi, everyone!

As I have understood, generally, when performing a query the result goes to the Spool which can be
1. built locally on the AMP
2. duplicated to all AMPs
3. redistributed by hash code to all AMPs

Could you tell me, how the Optimizer decides by which field the result is redistributed to all AMPs?

Let's assume I have two tables T1 (fields p1, p2), T2 (fields q1,q2,q3) and a query like this:

SELECT T1.p2, T2.q2, T2.q3
ON (T1.p1=T2.q1)

The result contains fields p2,q2,q3. So, by which field it will be redistributed to all AMPs?

Are there any ways to influence on this process by modifying the query?

Thank you in advance!

Tags (2)
Senior Apprentice

Re: Redistribution to all AMPs

Hi Andrey,
the redistribution is usually based on the join columns, you can see the actual columns in EXPLAIN.

Rewriting the query might change the plan in some rare cases, e.g. if you join on a calculated column and put the calculation in a Derived Table.
Usually collecting stats has more influence on the optimizer.


Re: Redistribution to all AMPs

Hi, Dieter!

Thank you for help!
Great, I can see the actual columns in EXPLAIN!

But the problem is that a query executes very slowly (exactly on this step, when JOIN and redistribution to all AMPs are performed) with a SKEW of 99%...
Could you please tell me, what it means? And are there any ways to avoid such situations?

Thank you in advance!

Re: Redistribution to all AMPs

Most likely it's redistributing on a poor choice of columns. This could be because your data on some join columns isn't very unique or that the optimizer doesn't have good information on your data.

Make sure you have statistics collected on your join columns. Better demographics information may alter the plan the optimizer uses.

If this is a commonly executed query, you could look at the indexes and see if changing the PI would help (or creating a join index with the other PI).