Aggregation is so slow. Can anyone help me with what I have to do?

Fan

Hi,

Please help me optimize the query below.

Query:

SELECT
IECMD.MONTH_END_DATE_KEY AS DATE_KEY,
AWSP.STORE_KEY AS STORE_KEY,
AWSP.PRODUCT_KEY AS PRODUCT_KEY,
AWSP.STORE_CODE AS STORE_CODE,
AWSP.PRODUCT_CODE AS PRODUCT_CODE,
SUM(SALE_TOT_QTY) AS SALE_TOT_QTY,
SUM(SALE_TOT_QTY_LY) AS SALE_TOT_QTY_LY,
SUM(SALE_NET_VAL) AS SALE_NET_VAL,
SUM(SALE_NET_VAL_LY) AS SALE_NET_VAL_LY,
SUM(SALE_RETURN_QTY) AS SALE_RETURN_QTY,
SUM(SALE_RETURN_QTY_LY) AS SALE_RETURN_QTY_LY,
SUM(SALE_RETURN_VAL) AS SALE_RETURN_VAL,
SUM(SALE_RETURN_VAL_LY) AS SALE_RETURN_VAL_LY,
SUM(SALE_GLOBAL_DISC_QTY) AS SALE_GLOBAL_DISC_QTY,
SUM(SALE_GLOBAL_DISC_QTY_LY) AS SALE_GLOBAL_DISC_QTY_LY,
SUM(SALE_GLOBAL_DISC_VAL) AS SALE_GLOBAL_DISC_VAL,
SUM(SALE_GLOBAL_DISC_VAL_LY) AS SALE_GLOBAL_DISC_VAL_LY,
SUM(SALE_EMP_DISC_QTY) AS SALE_EMP_DISC_QTY,
SUM(SALE_EMP_DISC_QTY_LY) AS SALE_EMP_DISC_QTY_LY,
SUM(SALE_EMP_DISC_VAL) AS SALE_EMP_DISC_VAL,
SUM(SALE_EMP_DISC_VAL_LY) AS SALE_EMP_DISC_VAL_LY,
SUM(SALE_MARKDOWN_QTY) AS SALE_MARKDOWN_QTY,
SUM(SALE_MARKDOWN_QTY_LY) AS SALE_MARKDOWN_QTY_LY,
SUM(SALE_MARKDOWN_VAL) AS SALE_MARKDOWN_VAL,
SUM(SALE_MARKDOWN_VAL_LY) AS SALE_MARKDOWN_VAL_LY,
SUM(SALE_ADJ_QTY) AS SALE_ADJ_QTY,
SUM(SALE_ADJ_QTY_LY) AS SALE_ADJ_QTY_LY,
SUM(SALE_ADJ_VAL) AS SALE_ADJ_VAL,
SUM(SALE_ADJ_VAL_LY) AS SALE_ADJ_VAL_LY,
SUM(SALE_TOT_TAX_VAL) AS SALE_TOT_TAX_VAL,
SUM(SALE_TOT_TAX_VAL_LY) AS SALE_TOT_TAX_VAL_LY,
SUM(SALE_TOT_VAL_AT_PP) AS SALE_TOT_VAL_AT_PP,
SUM(SALE_TOT_VAL_AT_PP_LY) AS SALE_TOT_VAL_AT_PP_LY,
SUM(SALE_TOT_VAL_AT_WAP) AS SALE_TOT_VAL_AT_WAP,
SUM(SALE_TOT_VAL_AT_WAP_LY) AS SALE_TOT_VAL_AT_WAP_LY,
SUM(SALE_TOT_VAL_AT_HOSP) AS SALE_TOT_VAL_AT_HOSP,
SUM(SALE_TOT_VAL_AT_HOSP_LY) AS SALE_TOT_VAL_AT_HOSP_LY,
SUM(SALE_TOT_VAL_AT_SP) AS SALE_TOT_VAL_AT_SP,
SUM(SALE_TOT_VAL_AT_SP_LY) AS SALE_TOT_VAL_AT_SP_LY,
SUM(ACCRUED_REBATE_VAL) AS ACCRUED_REBATE_VAL,
SUM(ACCRUED_REBATE_VAL_LY) AS ACCRUED_REBATE_VAL_LY,
SUM(VO_GLOBAL_DISC_QTY) AS VO_GLOBAL_DISC_QTY,
SUM(VO_GLOBAL_DISC_QTY_LY) AS VO_GLOBAL_DISC_QTY_LY,
SUM(VO_GLOBAL_DISC_VAL) AS VO_GLOBAL_DISC_VAL,
SUM(VO_GLOBAL_DISC_VAL_LY) AS VO_GLOBAL_DISC_VAL_LY,
SUM(VO_TOT_RECD_QTY) AS VO_TOT_RECD_QTY,
SUM(VO_TOT_RECD_QTY_LY) AS VO_TOT_RECD_QTY_LY,
SUM(VO_NET_RECD_VAL) AS VO_NET_RECD_VAL,
SUM(VO_NET_RECD_VAL_LY) AS VO_NET_RECD_VAL_LY,
SUM(VO_TOT_RECD_VAL_AT_PP) AS VO_TOT_RECD_VAL_AT_PP,
SUM(VO_TOT_RECD_VAL_AT_PP_LY) AS VO_TOT_RECD_VAL_AT_PP_LY,
SUM(VO_TOT_RECD_VAL_AT_WAP) AS VO_TOT_RECD_VAL_AT_WAP,
SUM(VO_TOT_RECD_VAL_AT_WAP_LY) AS VO_TOT_RECD_VAL_AT_WAP_LY,
SUM(VO_TOT_RECD_VAL_AT_HOSP) AS VO_TOT_RECD_VAL_AT_HOSP,
SUM(VO_TOT_RECD_VAL_AT_HOSP_LY) AS VO_TOT_RECD_VAL_AT_HOSP_LY,
SUM(VO_TOT_RECD_VAL_AT_SP) AS VO_TOT_RECD_VAL_AT_SP,
SUM(VO_TOT_RECD_VAL_AT_SP_LY) AS VO_TOT_RECD_VAL_AT_SP_LY,
SUM(INV_TOT_TXN_QTY) AS INV_TOT_TXN_QTY,
SUM(INV_TOT_TXN_QTY_LY) AS INV_TOT_TXN_QTY_LY,
SUM(INV_TOT_TXN_VAL_AT_PP) AS INV_TOT_TXN_VAL_AT_PP,
SUM(INV_TOT_TXN_VAL_AT_PP_LY) AS INV_TOT_TXN_VAL_AT_PP_LY,
SUM(INV_TOT_TXN_VAL_AT_WAP) AS INV_TOT_TXN_VAL_AT_WAP,
SUM(INV_TOT_TXN_VAL_AT_WAP_LY) AS INV_TOT_TXN_VAL_AT_WAP_LY,
SUM(INV_TOT_TXN_VAL_AT_HOSP) AS INV_TOT_TXN_VAL_AT_HOSP,
SUM(INV_TOT_TXN_VAL_AT_HOSP_LY) AS INV_TOT_TXN_VAL_AT_HOSP_LY,
SUM(INV_TOT_TXN_VAL_AT_SP) AS INV_TOT_TXN_VAL_AT_SP,
SUM(INV_TOT_TXN_VAL_AT_SP_LY) AS INV_TOT_TXN_VAL_AT_SP_LY,
SUM(SALE_TOT_ITEM_COUNT) AS SALE_TOT_ITEM_COUNT,
SUM(SALE_TOT_ITEM_COUNT_LY) AS SALE_TOT_ITEM_COUNT_LY,
IECMD.ARC_DATE AS ARC_DATE
FROM
ARCPROD.AGG_WLY_STR_PROD AWSP
JOIN INC_ETL_CUR_MONTH_DATES IECMD ON IECMD.DATE_KEY = AWSP.DATE_KEY
JOIN ARCPROD.DIM_STORE DS ON AWSP.STORE_CODE = DS.STORE_CODE
JOIN ARCPROD.DIM_PRODUCT DP ON AWSP.PRODUCT_CODE = DP.PRODUCT_CODE
WHERE
IECMD.DATE_FLD >= DS.VALID_FROM AND IECMD.DATE_FLD < DS.VALID_TO AND
IECMD.DATE_FLD >= DP.VALID_FROM AND IECMD.DATE_FLD < DP.VALID_TO
GROUP BY
AWSP.STORE_KEY,
AWSP.PRODUCT_KEY,
AWSP.STORE_CODE,
AWSP.PRODUCT_CODE,
IECMD.MONTH_END_DATE_KEY,
IECMD.ARC_DATE;

Regarding the WHERE clause: DATE_KEY on the AGG_WLY_STR_PROD table is the PPI column, and STORE_CODE and PRODUCT_CODE are NUSIs. VALID_FROM, VALID_TO, and DATE_FLD are NUSIs as well. Stats have already been collected on all of these columns, but the query is still spending most of its time in the aggregation step.
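
Roughly, the physical design looks like this (an illustrative sketch only: the PI choice, data types, and RANGE_N boundaries below are placeholders, not the real DDL):

CREATE TABLE ARCPROD.AGG_WLY_STR_PROD (
    DATE_KEY     INTEGER NOT NULL,
    STORE_KEY    INTEGER,
    PRODUCT_KEY  INTEGER,
    STORE_CODE   VARCHAR(20),
    PRODUCT_CODE VARCHAR(20)
    -- ...plus all the SALE_/VO_/INV_ measure columns
)
PRIMARY INDEX (STORE_KEY, PRODUCT_KEY)                        -- placeholder PI
PARTITION BY RANGE_N(DATE_KEY BETWEEN 1 AND 100000 EACH 7);   -- PPI on DATE_KEY

CREATE INDEX (STORE_CODE)   ON ARCPROD.AGG_WLY_STR_PROD;  -- NUSI
CREATE INDEX (PRODUCT_CODE) ON ARCPROD.AGG_WLY_STR_PROD;  -- NUSI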

Explain:
1) First, we lock a distinct ARCPROD."pseudo table" for read on a
RowHash to prevent global deadlock for ARCPROD.AWSP.
2) Next, we lock a distinct ARCPROD."pseudo table" for read on a
RowHash to prevent global deadlock for ARCPROD.IECMD.
3) We lock a distinct ARCPROD."pseudo table" for read on a RowHash to
prevent global deadlock for ARCPROD.DP.
4) We lock a distinct ARCPROD."pseudo table" for read on a RowHash to
prevent global deadlock for ARCPROD.DS.
5) We lock ARCPROD.AWSP for read, we lock ARCPROD.IECMD for read, we
lock ARCPROD.DP for read, and we lock ARCPROD.DS for read.
6) We do an all-AMPs RETRIEVE step from ARCPROD.IECMD by way of an
all-rows scan with no residual conditions into Spool 4 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
The size of Spool 4 is estimated with high confidence to be 56
rows (1,624 bytes). The estimated time for this step is 0.01
seconds.
7) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to ARCPROD.DS by way of an all-rows
scan with a condition of ("NOT (ARCPROD.DS.STORE_CODE IS NULL)").
Spool 4 and ARCPROD.DS are joined using a product join, with a
join condition of ("(DATE_FLD >= ARCPROD.DS.VALID_FROM) AND
(DATE_FLD < ARCPROD.DS.VALID_TO)"). The result goes into Spool 5
(all_amps) (compressed columns allowed), which is duplicated on
all AMPs. Then we do a SORT to partition by rowkey. The size of
Spool 5 is estimated with low confidence to be 2,408 rows (81,872
bytes). The estimated time for this step is 0.02 seconds.
8) We do an all-AMPs RETRIEVE step from ARCPROD.DP by way of an
all-rows scan with a condition of ("NOT (ARCPROD.DP.PRODUCT_CODE
IS NULL)") into Spool 6 (all_amps) (compressed columns allowed)
fanned out into 9 hash join partitions, which is duplicated on all
AMPs. The size of Spool 6 is estimated with high confidence to be
8,958,600 rows (259,799,400 bytes). The estimated time for this
step is 3.65 seconds.
9) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to ARCPROD.AWSP with a condition of
("NOT (ARCPROD.AWSP.PRODUCT_CODE IS NULL)"). Spool 5 and
ARCPROD.AWSP are joined using a product join, with a join
condition of ("(DATE_KEY = ARCPROD.AWSP.DATE_KEY) AND
(ARCPROD.AWSP.STORE_CODE = STORE_CODE)") enhanced by dynamic
partition elimination. The input table ARCPROD.AWSP will not be
cached in memory, but it is eligible for synchronized scanning.
The result goes into Spool 7 (all_amps) (compressed columns
allowed), which is built locally on the AMPs into 9 hash join
partitions. The result spool file will not be cached in memory.
The size of Spool 7 is estimated with low confidence to be
3,673,861 rows (2,031,645,133 bytes). The estimated time for this
step is 29.99 seconds.
10) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using a hash join
of 9 partitions, with a join condition of ("(DATE_FLD < VALID_TO)
AND ((DATE_FLD >= VALID_FROM) AND (PRODUCT_CODE = PRODUCT_CODE ))").
The result goes into Spool 3 (all_amps) (compressed columns
allowed), which is built locally on the AMPs. The result spool
file will not be cached in memory. The size of Spool 3 is
estimated with low confidence to be 10,876,059 rows (
5,992,708,509 bytes). The estimated time for this step is 1
minute and 13 seconds.
11) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
ARCPROD.AWSP.STORE_KEY ,ARCPROD.AWSP.PRODUCT_KEY
,ARCPROD.AWSP.STORE_CODE ,ARCPROD.AWSP.PRODUCT_CODE
,ARCPROD.IECMD.MONTH_END_DATE_KEY ,ARCPROD.IECMD.ARC_DATE).
Aggregate Intermediate Results are computed globally, then placed
in Spool 8. The aggregate spool file will not be cached in memory.
The size of Spool 8 is estimated with no confidence to be
8,157,045 rows (5,098,153,125 bytes). The estimated time for this
step is 8 minutes and 28 seconds.
12) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with no confidence to be
8,157,045 rows (4,543,474,065 bytes). The estimated time for this
step is 33.63 seconds.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 10 minutes and 48
seconds.

Can you please help me with this ASAP? Should I create an aggregate join index (AJI) for this, and if so, how do I create one for the above query? It is very urgent.

Thanks

Lavakumar
6 REPLIES
Enthusiast

Re: Aggregation is so slow. Can anyone help me with what I have to do?

Weigh the benefits of having an AJI (Aggregate Join Index). In a few of our use cases an AJI brought query times down from hours to seconds. Note that an AJI supports SUM and COUNT; I am not sure about all the other aggregate functions.
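
For example, a minimal sketch of a single-table AJI that pre-aggregates AGG_WLY_STR_PROD (the index name and PRIMARY INDEX are placeholders, and only two of the measures are shown; repeat the SUMs for the remaining columns):

CREATE JOIN INDEX ARCPROD.AJI_WLY_STR_PROD AS
SELECT
    AWSP.DATE_KEY,
    AWSP.STORE_KEY,
    AWSP.PRODUCT_KEY,
    AWSP.STORE_CODE,
    AWSP.PRODUCT_CODE,
    SUM(AWSP.SALE_TOT_QTY) AS SALE_TOT_QTY,
    SUM(AWSP.SALE_NET_VAL) AS SALE_NET_VAL
FROM ARCPROD.AGG_WLY_STR_PROD AWSP
GROUP BY
    AWSP.DATE_KEY,
    AWSP.STORE_KEY,
    AWSP.PRODUCT_KEY,
    AWSP.STORE_CODE,
    AWSP.PRODUCT_CODE
PRIMARY INDEX (STORE_KEY, PRODUCT_KEY);

After creating it, collect stats on the join index and re-run the EXPLAIN; the optimizer will only use the AJI if it can rewrite your query to be covered by it.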
l_k
Enthusiast

Re: Aggregation is so slow. Can anyone help me with what I have to do?

The above SELECT query runs for more than 10 minutes every day.

May I know how to apply an AJI to the above SELECT query?
Enthusiast

Re: Aggregation is so slow. Can anyone help me with what I have to do?

Hi Lavakumar,

Try collecting stats on the join columns of all the tables involved, then run the EXPLAIN again.
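
For example (single-column stats on the join and range-predicate columns; adjust the database and table names to your environment):

COLLECT STATISTICS ON ARCPROD.AGG_WLY_STR_PROD COLUMN (DATE_KEY);
COLLECT STATISTICS ON ARCPROD.AGG_WLY_STR_PROD COLUMN (STORE_CODE);
COLLECT STATISTICS ON ARCPROD.AGG_WLY_STR_PROD COLUMN (PRODUCT_CODE);
COLLECT STATISTICS ON ARCPROD.DIM_STORE COLUMN (STORE_CODE);
COLLECT STATISTICS ON ARCPROD.DIM_STORE COLUMN (VALID_FROM);
COLLECT STATISTICS ON ARCPROD.DIM_STORE COLUMN (VALID_TO);
COLLECT STATISTICS ON ARCPROD.DIM_PRODUCT COLUMN (PRODUCT_CODE);
COLLECT STATISTICS ON ARCPROD.DIM_PRODUCT COLUMN (VALID_FROM);
COLLECT STATISTICS ON ARCPROD.DIM_PRODUCT COLUMN (VALID_TO);
COLLECT STATISTICS ON INC_ETL_CUR_MONTH_DATES COLUMN (DATE_KEY);
COLLECT STATISTICS ON INC_ETL_CUR_MONTH_DATES COLUMN (DATE_FLD);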

thanks
N/A

Re: Aggregation is so slow. Can anyone help me with what I have to do?

I am facing the same issue. Stats are in place on all of the columns used, yet my SELECT query still takes 2 hours. So building the AJI would also take 2 hours; please correct me if I am wrong.
Enthusiast

Re: Aggregation is so slow. Can anyone help me with what I have to do?

In these sorts of scenarios I have broken the requirement down into smaller intermediate tables with a proper PI, and then joined those tables. That helped me a lot. You can also try that.
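
Something like this, for example (a sketch only: VT_STR_PROD and the PI choice are placeholders, and only one measure column is listed):

CREATE VOLATILE TABLE VT_STR_PROD AS (
    SELECT
        IECMD.MONTH_END_DATE_KEY,
        IECMD.ARC_DATE,
        IECMD.DATE_FLD,
        AWSP.STORE_KEY,
        AWSP.PRODUCT_KEY,
        AWSP.STORE_CODE,
        AWSP.PRODUCT_CODE,
        AWSP.SALE_TOT_QTY  -- ...plus the remaining measure columns
    FROM ARCPROD.AGG_WLY_STR_PROD AWSP
    JOIN INC_ETL_CUR_MONTH_DATES IECMD
        ON IECMD.DATE_KEY = AWSP.DATE_KEY
) WITH DATA
PRIMARY INDEX (STORE_CODE, PRODUCT_CODE)
ON COMMIT PRESERVE ROWS;

Then join VT_STR_PROD to DIM_STORE and DIM_PRODUCT and do the GROUP BY on that, instead of aggregating over the big joined spool.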
Enthusiast

Re: Aggregation is so slow. Can anyone help me with what I have to do?

You may try this: insert all the dates that satisfy the condition below from the IECMD table into another table with two columns, VALID_FROM and VALID_TO.

IECMD.DATE_FLD >= DS.VALID_FROM AND IECMD.DATE_FLD < DS.VALID_TO AND
IECMD.DATE_FLD >= DP.VALID_FROM AND IECMD.DATE_FLD < DP.VALID_TO

Collect stats on the new table's columns, then join the IECMD table with the new table on the date columns and try running the query again.
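
A rough sketch of that idea (ETL_DATE_RANGES and its PI are placeholders; repeat or UNION the same INSERT for the DIM_PRODUCT ranges):

CREATE TABLE ETL_DATE_RANGES (
    VALID_FROM DATE,
    VALID_TO   DATE
) PRIMARY INDEX (VALID_FROM);

INSERT INTO ETL_DATE_RANGES
SELECT DISTINCT DS.VALID_FROM, DS.VALID_TO
FROM INC_ETL_CUR_MONTH_DATES IECMD
JOIN ARCPROD.DIM_STORE DS
    ON IECMD.DATE_FLD >= DS.VALID_FROM
   AND IECMD.DATE_FLD <  DS.VALID_TO;

COLLECT STATISTICS ON ETL_DATE_RANGES COLUMN (VALID_FROM);
COLLECT STATISTICS ON ETL_DATE_RANGES COLUMN (VALID_TO);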