Cumulative Sum Performance Improvement Question

Database
Enthusiast

Cumulative Sum Performance Improvement Question

Hello Experts,

 

My target table (application requirement):

CREATE VOLATILE TABLE SOM_TEST_TARGET
(
WK VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, PRD VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, QTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, YR VARCHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC
, CUST_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, GEO_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, PROD_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, MEAS1_ACTL_AMT DECIMAL(24,6)
, MEAS2_ACTL_AMT DECIMAL(24,6)
, MEAS3_ACTL_AMT DECIMAL(24,6)
, MEAS4_ACTL_AMT DECIMAL(24,6)
, MEAS1_ACTL_AMT_PTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_PTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_PTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_PTD DECIMAL(38,6)
, MEAS1_ACTL_AMT_QTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_QTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_QTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_QTD DECIMAL(38,6)
, MEAS1_ACTL_AMT_YTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_YTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_YTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_YTD DECIMAL(38,6)
, MEAS1_ACTL_AMT_YAG DECIMAL(24,6) -- YAG stands for Year AGo
, MEAS2_ACTL_AMT_YAG DECIMAL(24,6)
, MEAS3_ACTL_AMT_YAG DECIMAL(24,6)
, MEAS4_ACTL_AMT_YAG DECIMAL(24,6)
, MEAS1_ACTL_AMT_YAG_PTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_YAG_PTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_YAG_PTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_YAG_PTD DECIMAL(38,6)
, MEAS1_ACTL_AMT_YAG_QTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_YAG_QTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_YAG_QTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_YAG_QTD DECIMAL(38,6)
, MEAS1_ACTL_AMT_YAG_YTD DECIMAL(38,6)
, MEAS2_ACTL_AMT_YAG_YTD DECIMAL(38,6)
, MEAS3_ACTL_AMT_YAG_YTD DECIMAL(38,6)
, MEAS4_ACTL_AMT_YAG_YTD DECIMAL(38,6)
)
PRIMARY INDEX (WK, PRD, YR, CUST_ATTR, GEO_ATTR, PROD_ATTR)
ON COMMIT PRESERVE ROWS;

My Source table:

 

CREATE VOLATILE TABLE ACTL_YAG_MSRS
(
WK VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, PRD VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, QTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, YR VARCHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC
, CUST_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, GEO_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, PROD_ATTR VARCHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC
, MEAS1_ACTL_AMT DECIMAL(24,6)
, MEAS2_ACTL_AMT DECIMAL(24,6)
, MEAS3_ACTL_AMT DECIMAL(24,6)
, MEAS4_ACTL_AMT DECIMAL(24,6)
)
PRIMARY INDEX (WK, PRD, YR, CUST_ATTR, GEO_ATTR, PROD_ATTR)
ON COMMIT PRESERVE ROWS;

Target table contains only last 52 weeks data. Source table contains last 7 years data. For every week, my source has about 5 million records.

 

Now, my objective is to run a job every week to refresh the target table for last 52 weeks (truncate and load every time).

 

I am calculating:

PTD as:

SUM(MEAS1_ACTL_AMT) OVER (PARTITION BY PRD, QTR, YR, CUST_ATTR, GEO_ATTR, PROD_ATTR) AS MEAS1_ACTL_AMT_PTD

QTD as:

SUM(MEAS1_ACTL_AMT) OVER (PARTITION BY QTR, YR, CUST_ATTR, GEO_ATTR, PROD_ATTR) AS MEAS1_ACTL_AMT_QTD

YTD as:

SUM(MEAS1_ACTL_AMT) OVER (PARTITION BY YR, CUST_ATTR, GEO_ATTR, PROD_ATTR) AS MEAS1_ACTL_AMT_YTD

 

I created multiple volatile tables in between to load PTD data, then QTD data, then YTD data for one measure at a time. While checking the performance of individual queries; spool, CPU SKEW, PJI are all in control.

 

However, each week data takes me about 30 mins to load. Hence, for 52 weeks, as you can imagine it would take about 26 hours which isn't great.

If I combine all the sums at once, it exceeds all performance metric thresholds and runs out of spool.

Any ideas how to improve the timings without affecting other performance metrics? Can I do the sum in a different way? Thanks in advance!

SR

Accepted Solutions
Enthusiast

Re: Cumulative Sum Performance Improvement Question

The end report has 2 parts (at this time):

  • A summary view page that contains only data at a summary level with WTD, PTD, QTD, YTD for latest week alone (for execs)
  • A detailed view page for each of the dimensions that contains data only at WTD and PTD (for data analysts)

The main concern and most time consuming parts are QTD and YTD calculations. Moreover, QTD and YTD calculations were being used in the report while displaying report only at a summary level. This is what we did:

 

Created an intermediate table to contain only WTD and PTD values. So, the job runtime went down from 30 minutes to about 2-3 minutes for each week. So, in total, for 52 weeks, it took less than 2 hours to load the entire table. CPU usage, PJI, Spool are all in control. The end report still uses this table to show the detailed section when users want to drill down to a more detailed resultset.

 

Created a final table containing WTD, PTD, QTD and YTD. However, this time, the source of this table is the intermediate table. Also, we could eliminate some dimensional attributes from this table to make the table data set a little smaller. Run time of this table is about 2-3 minutes per week. Users wanted to see this view only for the latest week, so, we restricted the table to have only the latest week data.

 

So, in total, data load for 2 tables take less than 2 hours and CPU usage/PJI/Spool are all in control. The end report runs live queries to the database. We analyzed the report queries and added JIs in place to have them run really quick.

SR
1 ACCEPTED SOLUTION
2 REPLIES

Re: Cumulative Sum Performance Improvement Question

As your source table is huge, range-partitioning should have improved the performance.

Long time since you've posted this question, so you can share the approach you'd have picked then.

Enthusiast

Re: Cumulative Sum Performance Improvement Question

The end report has 2 parts (at this time):

  • A summary view page that contains only data at a summary level with WTD, PTD, QTD, YTD for latest week alone (for execs)
  • A detailed view page for each of the dimensions that contains data only at WTD and PTD (for data analysts)

The main concern and most time consuming parts are QTD and YTD calculations. Moreover, QTD and YTD calculations were being used in the report while displaying report only at a summary level. This is what we did:

 

Created an intermediate table to contain only WTD and PTD values. So, the job runtime went down from 30 minutes to about 2-3 minutes for each week. So, in total, for 52 weeks, it took less than 2 hours to load the entire table. CPU usage, PJI, Spool are all in control. The end report still uses this table to show the detailed section when users want to drill down to a more detailed resultset.

 

Created a final table containing WTD, PTD, QTD and YTD. However, this time, the source of this table is the intermediate table. Also, we could eliminate some dimensional attributes from this table to make the table data set a little smaller. Run time of this table is about 2-3 minutes per week. Users wanted to see this view only for the latest week, so, we restricted the table to have only the latest week data.

 

So, in total, data load for 2 tables take less than 2 hours and CPU usage/PJI/Spool are all in control. The end report runs live queries to the database. We analyzed the report queries and added JIs in place to have them run really quick.

SR