all-AMPs STATS FUNCTION

I am trying to understand the following steps from the EXPLAIN of a massively skewed query.

The "no confidence" estimates are due to the calculations and functions applied to the columns.

I would appreciate it if someone could correct my understanding.

It's a 2650 system with 219 AMPs.

 17) We do an all-AMPs STAT FUNCTION step from Spool 6 (Last Use) by
     way of an all-rows scan into Spool 33 (Last Use), which is assumed
     to be redistributed by value to all AMPs.  The result rows are put
     into Spool 31 (all_amps), which is built locally on the AMPs.  The
     size is estimated with no confidence to be 912,238,561 rows (
     643,128,185,505 bytes).

 18) We do an all-AMPs STAT FUNCTION step from Spool 31 (Last Use) by
     way of an all-rows scan into Spool 36 (Last Use), which is assumed
     to be redistributed by value to all AMPs.  The result rows are put
     into Spool 35 (all_amps), which is built locally on the AMPs.  The
     size is estimated with no confidence to be 912,238,561 rows (
     644,040,424,066 bytes).

 19) We do an all-AMPs STAT FUNCTION step from Spool 35 (Last Use) by
     way of an all-rows scan into Spool 39 (Last Use), which is assumed
     to be redistributed by value to all AMPs.  The result rows are put
     into Spool 38 (all_amps), which is built locally on the AMPs.  The
     size is estimated with no confidence to be 912,238,561 rows (
     648,601,616,871 bytes).

 20) We do an all-AMPs STAT FUNCTION step from Spool 38 (Last Use) by
     way of an all-rows scan into Spool 42 (Last Use), which is assumed
     to be redistributed by value to all AMPs.  The result rows are put
     into Spool 5 (group_amps), which is built locally on the AMPs.
     Then we do a SORT to order Spool 5 by the sort key in spool field1.
     The size is estimated with no confidence to be 912,238,561 rows (
     654,987,286,798 bytes).
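The byte estimates in these four steps can be sanity-checked with a little arithmetic. The Python sketch below uses only the row and byte figures quoted in the EXPLAIN above and the 219-AMP count from the post. It divides each step's estimated bytes by the row count to get an average row width, then compares the per-AMP spool volume under a perfectly even spread with the worst case where one AMP receives everything (that worst case is an illustrative assumption, not something the EXPLAIN states).

```python
# Estimates copied from steps 17-20 of the EXPLAIN in the post.
ROWS = 912_238_561
STEP_BYTES = {
    17: 643_128_185_505,
    18: 644_040_424_066,
    19: 648_601_616_871,
    20: 654_987_286_798,
}
AMPS = 219  # AMP count stated in the post


def bytes_per_row(total_bytes: int, rows: int = ROWS) -> float:
    """Average estimated row width for one step."""
    return total_bytes / rows


if __name__ == "__main__":
    previous = None
    for step, total in STEP_BYTES.items():
        width = bytes_per_row(total)
        growth = "" if previous is None else f" (+{width - previous:.0f} bytes/row)"
        print(f"step {step}: {width:.0f} bytes/row{growth}")
        previous = width
    # Spool volume for step 17 if spread evenly vs. piled onto a single AMP.
    print(f"even spread: ~{STEP_BYTES[17] / AMPS / 1e9:.1f} GB per AMP")
    print(f"all on one AMP: ~{STEP_BYTES[17] / 1e9:.0f} GB on that AMP")
```

The widths come out at exactly 705, 706, 711 and 718 bytes per row, so each step adds a few bytes per row, which is consistent with (though not proof of) each step appending computed columns. Spread evenly, step 17's spool would be roughly 2.9 GB per AMP; concentrated on one AMP it would be around 643 GB.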

Step 17:

The optimizer performs an all-rows scan of Spool 6 and creates Spool 33 from this data.

Spool 33 is redistributed to all AMPs.

It seems likely that the data from Spool 33 is being redistributed to only a "few" AMPs. I'm contemplating this because of the 99% CPU and disk skew incurred in this step.
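To illustrate how "redistributed by value to all AMPs" can still land almost everything on a few AMPs, here is a small Python simulation. Each row goes to an AMP chosen from a hash of its redistribution key, so every row sharing a value ends up on the same AMP however many AMPs exist. The hash used here (MD5 modulo the AMP count) is a stand-in, not Teradata's row-hash or hash map, and both key distributions are made up. Skew is computed as 100 * (1 - average / maximum), which is one common way of expressing per-AMP skew.

```python
import hashlib
from collections import Counter

NUM_AMPS = 219  # AMP count from the post


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Stand-in for the row-hash -> AMP mapping (NOT Teradata's actual algorithm)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def redistribute(keys, num_amps: int = NUM_AMPS) -> list[int]:
    """Return how many rows each AMP receives when rows are hashed by key."""
    per_amp = Counter(amp_for(k, num_amps) for k in keys)
    return [per_amp.get(amp, 0) for amp in range(num_amps)]


def skew_pct(per_amp: list[int]) -> float:
    """Skew as 100 * (1 - average / maximum); 0 means perfectly even."""
    peak = max(per_amp)
    if peak == 0:
        return 0.0
    avg = sum(per_amp) / len(per_amp)
    return 100.0 * (1.0 - avg / peak)


if __name__ == "__main__":
    # Hypothetical data: 100,000 rows spread over many distinct key values...
    even_keys = [f"cust{i}" for i in range(100_000)]
    # ...versus the same row count where one placeholder value dominates the key.
    skewed_keys = ["UNKNOWN"] * 99_000 + [f"cust{i}" for i in range(1_000)]
    for label, keys in (("many values", even_keys), ("one dominant value", skewed_keys)):
        print(f"{label}: skew {skew_pct(redistribute(keys)):.1f}%")
```

In the dominant-value case at least 99,000 of the 100,000 rows hit a single AMP, so the skew works out to roughly 99.5%, the same order as the figure reported for step 17, even though the step nominally involves all 219 AMPs.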

Once the data is redistributed to these "few" AMPs, the STAT FUNCTION (analytics) is applied to the redistributed data and the result goes into Spool 31.
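Once all rows with the same partitioning value sit on one AMP, that AMP can evaluate the analytic function on its own, without involving other AMPs. The post does not show the SQL, so the Python sketch below assumes a hypothetical window function of the form SUM(amount) OVER (PARTITION BY key ORDER BY ts ROWS UNBOUNDED PRECEDING), with made-up column names. It groups one AMP's rows by key, sorts each group and emits a running total. The work each AMP does therefore grows with the number of rows it received, which is why an AMP holding most of the rows also carries most of the CPU and spool.

```python
from collections import defaultdict
from itertools import accumulate
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    key: str     # hypothetical PARTITION BY column
    ts: int      # hypothetical ORDER BY column
    amount: int  # hypothetical column being summed


def running_sum_local(rows: Iterable[Row]) -> list[tuple[Row, int]]:
    """Evaluate SUM(amount) OVER (PARTITION BY key ORDER BY ts ROWS UNBOUNDED PRECEDING)
    over the rows held by a single AMP. Assumes every row for a given key is already
    on this AMP, which is what redistribution by value provides."""
    partitions: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        partitions[row.key].append(row)
    result = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=lambda r: r.ts)
        totals = accumulate(r.amount for r in ordered)
        result.extend(zip(ordered, totals))
    return result


if __name__ == "__main__":
    amp_rows = [Row("a", 2, 10), Row("b", 1, 5), Row("a", 1, 3), Row("a", 3, 4)]
    for row, total in running_sum_local(amp_rows):
        print(row.key, row.ts, row.amount, total)
```

For the sample rows this prints running totals of 3, 13 and 17 for key "a" and 5 for key "b"; the output stays on the same AMP, matching "built locally on the AMPs" in the EXPLAIN.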

Spool 31 is built locally on the AMPs.

Would steps 18, 19 and 20 work along pretty much the same lines as step 17?
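Whether steps 18 to 20 behave like step 17 depends on what each of them redistributes on. One plausible reason for a chain of STAT FUNCTION steps is a query containing window functions with different PARTITION BY / ORDER BY specifications. As an assumption (the SQL is not shown, and this is not the optimizer's documented planning logic), the Python sketch below groups hypothetical window functions by partitioning and ordering and treats each distinct group as one redistribute-and-evaluate step. All function and column names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowFn:
    name: str                      # hypothetical function text, e.g. "RANK()"
    partition_by: tuple[str, ...]  # hypothetical PARTITION BY columns
    order_by: tuple[str, ...]      # hypothetical ORDER BY columns


Spec = tuple[tuple[str, ...], tuple[str, ...]]


def plan_stat_steps(functions: list[WindowFn]) -> list[tuple[Spec, list[str]]]:
    """Group window functions sharing PARTITION BY and ORDER BY, in first-seen order.

    ASSUMPTION: each group stands for one redistribute-by-value + evaluate step.
    This illustrates the idea only; it is not the optimizer's actual logic.
    """
    groups: dict[Spec, list[str]] = {}
    for fn in functions:
        groups.setdefault((fn.partition_by, fn.order_by), []).append(fn.name)
    return list(groups.items())


if __name__ == "__main__":
    fns = [
        WindowFn("RANK()", ("acct_id",), ("txn_ts",)),
        WindowFn("SUM(amt)", ("acct_id",), ("txn_ts",)),
        WindowFn("ROW_NUMBER()", ("region",), ("txn_ts",)),
        WindowFn("AVG(amt)", ("branch_id",), ()),
    ]
    for i, ((part, order), names) in enumerate(plan_stat_steps(fns), start=1):
        print(f"step {i}: redistribute on {part}, order by {order}: {names}")
```

With these four hypothetical functions the sketch reports three steps. Under this model each step's skew depends only on its own partitioning column: a column with a few very frequent values (or many NULLs, which all hash alike) would skew that step, while steps partitioned on other columns might spread evenly.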

Any insight is appreciated...

Thanks

Sanji