I have many users issuing queries that include multiple COUNT(DISTINCT) clauses in the select list. When 0.5 to 2 billion rows are being accessed, the queries sometimes run 10 hours or so.
Has anyone found a way to optimize such queries?
COUNT ( DISTINCT household_num ),
SUM ( Ext_Paid_Price_Amt ),
COUNT ( DISTINCT Unique_Trans ),
SUM ( Ext_Item_Qty )
WHERE crm_view.Customer_Transaction_Item.upc_num =
AND ( trans_dt BETWEEN '2010-06-30' AND '2010-09-21'
OR trans_dt BETWEEN '2011-06-29' AND '2011-09-20' )
GROUP BY upc_num,
ORDER BY upc_num,
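One rewrite worth testing is splitting each COUNT(DISTINCT) into its own pre-aggregated derived table and joining the results on upc_num, so each pass is a plain GROUP BY rather than a per-expression distinct sort over the full row set. A sketch against the same view (I've left out the upc_num predicate since its right-hand side isn't shown above, and the alias names are mine):

```sql
SELECT hh.upc_num,
       hh.distinct_households,   -- was COUNT(DISTINCT household_num)
       tr.distinct_trans,        -- was COUNT(DISTINCT Unique_Trans)
       sm.total_paid,
       sm.total_qty
FROM (
    SELECT upc_num, COUNT(*) AS distinct_households
    FROM (SELECT upc_num, household_num
          FROM crm_view.Customer_Transaction_Item
          WHERE trans_dt BETWEEN DATE '2010-06-30' AND DATE '2010-09-21'
             OR trans_dt BETWEEN DATE '2011-06-29' AND DATE '2011-09-20'
          GROUP BY 1, 2) AS d1
    GROUP BY upc_num
) AS hh
JOIN (
    SELECT upc_num, COUNT(*) AS distinct_trans
    FROM (SELECT upc_num, Unique_Trans
          FROM crm_view.Customer_Transaction_Item
          WHERE trans_dt BETWEEN DATE '2010-06-30' AND DATE '2010-09-21'
             OR trans_dt BETWEEN DATE '2011-06-29' AND DATE '2011-09-20'
          GROUP BY 1, 2) AS d2
    GROUP BY upc_num
) AS tr ON tr.upc_num = hh.upc_num
JOIN (
    SELECT upc_num,
           SUM(Ext_Paid_Price_Amt) AS total_paid,
           SUM(Ext_Item_Qty)       AS total_qty
    FROM crm_view.Customer_Transaction_Item
    WHERE trans_dt BETWEEN DATE '2010-06-30' AND DATE '2010-09-21'
       OR trans_dt BETWEEN DATE '2011-06-29' AND DATE '2011-09-20'
    GROUP BY upc_num
) AS sm ON sm.upc_num = hh.upc_num
ORDER BY hh.upc_num;
```

It reads the base table three times, but each read is a straightforward aggregation the optimizer can spool and redistribute cheaply, which in my experience often beats one pass carrying two distinct aggregations. Worth an EXPLAIN comparison either way.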
A couple of thoughts come to mind:
The main fact table cannot easily have an AJI (aggregate join index) due to near-constant usage and its size (1 TB).
I have created a denormalized table to test, but the results were not much better considering the overhead that would be required (the fact table is continuously loaded, and the denormalized version would need to be kept nearly as current).
The fact table is partitioned on trans_dt as is the denormalized table.
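For reference, the kind of row partitioning I mean; the database name, column types, and covered date range here are my assumptions, not the actual DDL:

```sql
-- Hypothetical PPI definition; only the column names come from the query above.
CREATE MULTISET TABLE crm_base.Customer_Transaction_Item (
    upc_num            BIGINT,
    household_num      BIGINT,
    Unique_Trans       BIGINT,
    trans_dt           DATE,
    Ext_Paid_Price_Amt DECIMAL(18,2),
    Ext_Item_Qty       INTEGER
)
PRIMARY INDEX (upc_num)
PARTITION BY RANGE_N(
    trans_dt BETWEEN DATE '2010-01-01' AND DATE '2012-12-31'
             EACH INTERVAL '1' DAY);
```

With date partitioning like this, the two BETWEEN ranges in the predicate should each get partition elimination, so the scan itself is already about as cheap as it can be; the cost is concentrated in the distinct aggregation.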
The users submitting this type of query are mainly the Marketing folks, and I suspect they are pulling more data than is actually used, but there's little hope of convincing anyone they don't need it all.
I think I will test an AJI on the denormalized version.
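If it helps anyone testing the same thing: as far as I know an AJI cannot contain COUNT(DISTINCT) itself, so it has to be defined at a grain that still carries the distinct-counted column. A sketch of what I mean (index and table names are hypothetical, assuming the denormalized table keeps the same columns):

```sql
-- Sketch only. Keeping household_num in the grain lets COUNT(DISTINCT
-- household_num) be resolved from the much smaller AJI; a
-- COUNT(DISTINCT Unique_Trans) would still have to hit the base table
-- unless Unique_Trans is added to the GROUP BY, which would blow up the grain.
CREATE JOIN INDEX crm_base.cti_upc_hh_aji AS
SELECT upc_num,
       trans_dt,
       household_num,
       SUM(Ext_Paid_Price_Amt) AS paid_amt,
       SUM(Ext_Item_Qty)       AS item_qty,
       COUNT(*)                AS row_cnt
FROM crm_base.Customer_Transaction_Item_Denorm
GROUP BY upc_num, trans_dt, household_num
PRIMARY INDEX (upc_num);
```

Keeping trans_dt in the grain is what lets the date-range predicates still be applied when the optimizer covers the query from the AJI.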