I wanted to know how the GROUP BY is processed in TD14. I've read some posts and blogs about GROUP BY vs DISTINCT in pre-TD13 like:
But I couldn't find any discussion on how it is done in TD14. I wanted to know if there was a way to estimate or know the spool usage for the aggregate step like how Sir Dnoeth explained it.
Would appreciate any feedback or even just a link or document which could provide explanation.
GROUP BY for TD14 is explained on following link:
Thank you so much for your reply (^_^)
I didn't find the spool info I needed but will still try searching for it using the link you gave (^_^)
Since TD13 there are multiple algorithms to do aggregation, some are indicated in Explain by "skippiong local/global cache", etc.
Plus the optimizer automatically decides if DISTINCT or GROUP BY is more efficient.
And as there's no way to force the optimizer to use one specific algorithm, you can only check estimated spool size in Explain and/or DBQL steps.
One quick question:
What is the difference between select distinct and group by?
i.e. select distinct field from table vs. select field from table group by 1??
Will these two queries pull the same result? Is the goup by more efficient?
The queries will pull the same result but there is processing difference between these two queries. Distinct will first sort and then eliminate duplicates, while group by do not sort. So group by is more efficient.
Depending on the algorithm GROUP BY also sorts: ARSA - Aggregate-Redistribute-Sort-Aggregate.
And GROUP BY is not always more efficient, there are cases where DISTINCT is faster, if the rows per value is lower than number of AMPs.
That's why the optimizer decides since TD13 which is more effcient based on statistics.
I posted some comparison in the thread mentioned in the first post: