Performace of Count(Distinct)

Database
Enthusiast

Performace of Count(Distinct)

I have many users issuing queries that include multiple count(distinct) clauses in the select list. when .5 to 2 B rows are being accessed the queries sometimes run 10 hours or so.

Has anyone found a way to optimize such queries?

6 REPLIES
Enthusiast

Re: Performace of Count(Distinct)

Are these simple SELECT statements or more complex queries with multiple joins? 

Enthusiast

Re: Performace of Count(Distinct)

More complex with two or 3 joins usually.

Enthusiast

Re: Performace of Count(Distinct)

Example:

SELECT    upc_num,

    wd_cd,

    COUNT ( DISTINCT household_num ),

    SUM ( Ext_Paid_Price_Amt ),

    COUNT ( DISTINCT Unique_Trans ),

    SUM ( Ext_Item_Qty )

    FROM crm_view.Customer_Transaction_Item,

        user_view.INVRY_PROD_UPC_XREF_CRNT

    WHERE crm_view.Customer_Transaction_Item.upc_num =

    user_view.INVRY_PROD_UPC_XREF_CRNT.scn_id

        AND  ( trans_dt BETWEEN '2010-06-30' AND '2010-09-21'

    OR    trans_dt BETWEEN '2011-06-29' AND '2011-09-20' )

    GROUP BY upc_num,

        wd_cd

    ORDER BY upc_num,

        wd_cd;

Enthusiast

Re: Performace of Count(Distinct)

I would suggest to create a sparse aggregate join index for the query, if possible. That will help a lot

Enthusiast

Re: Performace of Count(Distinct)

Couple of thoughts come to mind:

  1. AJI would be one option, as mentioned, depending on your load strategy for the tables. Some load utilities require that AJIs be dropped and recreated after the load has completed.
  2. Have you considered a denormalized, static fact table that is maintained as part of the daily ETL process that would satisfy this type of user access? The query appears to be quarterly in nature. Your denormalized fact table(s) could be produced at different calendar levels to facilitate easy access. (e.g. Weekly, Monthly, Quarterly, Annual, etc.) Then maximize the use of compression to reduce the storage requirements.
  3. What partitioning has been implemented on the underlying tables, if any?
  4. Do the users always run this report for entire corporation and all products sold? How much of the data is ignored by the users because they are only interested in how well fishing poles and tackle vs. swimming pools and goggles sold during the Summer season?
Enthusiast

Re: Performace of Count(Distinct)

  1. Thank you all for the responses:

The main fact Table cannot easily have an AJI due to near constant usage and size(1 TB).

I have created a denormalize table to test but the rusults where not a lot better, considering the overhead that would be required (the fact table is continous load and the denormalized version would need to be close to that)

The fact table is partitioned on trans_dt as is the denormalized table.

The users submitting this type of query are mainly the Marketing folks and I'm suspect they are pulling more data than is actually used but theres little hope of convincing anyone they don't need it all.

I think I will test an AJI on the denormalized version.

Thanks,