I have a set of SQLs that run a long amount of time and CPU all all of these are taking time at the aggregation step that explains something like below.Stats are collected on obvious joining columns but not sure if they are sufficient.
Can someone please let me know what happens in an aggregations step and what conclusions we can take when this happens to try and resolve this.
Any ideas/thoughts are highly appreciated.
We do an all-AMPs SUM step to aggregate from Spool 17 (Last Use)
by way of an all-rows scan , grouping by field1 (
Intermediate Results are computed globally, then placed in Spool
28. The aggregate spool file will not be cached in memory. The
size of Spool 28 is estimated with low confidence to be
762,227,092 rows (645,606,346,924 bytes). The estimated time for
this step is 31 minutes and 4 seconds.
If there's no GROUP BY in your SELECT it's probably a DISTINCT rewritten by the optimier as an aggragtion.
If the estimated numbers are correct, this step might run for a long time, did you check dbc.QryLogStepsV if this step actually uses lots of resources?
The SQL for which i pasted the aggregation explain had a GROUP BY clause involved in it.
I have checked and confirmed that this is not only taking more time but also using more than 70K CPU which, as you know, is a lot.Let me know if any other information from DBC.QryLogStepsV can support/help.
Coming to the estimates in this case,the expected rows are greater than the actual rows.
In some cases it might be more efficient to materialize the result of the query without the GROUP BY first as a MultiSet Volatile Table with a Primary Index on the GROUP BY columns. Then do the aggregation in another SELECT.