Database
Enthusiast

## Cumulative calculation problem

I have this situation. The base table here stores counts(number of units) flowing in for a product on a particular date and the Cumulative unit count.

Base Table structure

PRODUCT      DATE     DAILY_CNT      CUM_UNIT_CNT

An incoming file will have number of units for a particular product and date combination, using which the cumulative value needs to be calculated and updated in the base table on a daily basis.

Incoming File/Intermediate table structure

PRODUCT      DATE    DAILY_CNT

We move the data from the incoming file to an intermediate table and using  that table we insert the new records in the base table. What we need to do is, for every new product date combination, we need to insert a new record in the base table, and calculate the Cumulative count using the previous Cumulative unit count calculated on NOT the greatest date for that product, BUT on the date which is less than the date in the intermediate table which we are processing. This is because for any product, it is possible to get dates in the incoming file, which will lesser than the max date for that product that already exists in the base table.

Therefore what complicates the process is that, we not only we need to calculate the CUM value for that date, but we will also have to update the already calculated CUM values for each date greater than the date in the intermediate table.

We can handle it as a two step process, one will update the CUM values for a particular date. Step two will roll forward the CUM values if an intermediate date was received for a product.

If somebody can help me design this in an efficient way, i would really appreciate it.

Thanks a ton!

PT

Tags (1)
2 REPLIES
Senior Supporter

## Re: Cumulative calculation problem

I think you need to follow a two step approach.

The following might work.

Set up a STG table with some example data

`create table csum_stgas(Select cast(random(1,10) as integer) as prod_id, calendar_date as product_usage _date, cast(random(1,1000) as integer) as quantity, cast(0 as integer) as csum_quantityfrom sys_calendar.calendarwhere calendar_date between date-1000 and date-10 ) with data unique primary index (prod_id, product_usage _date);`

TGT table with same layout

`create table csum_tgtas csum_stg with no data;`

`insert into csum_tgt select prod_id, product_usage _date, quantity, sum(quantity) over (partition by prod_id order by product_usage _date rows between unbounded preceding and current row)from csum_stg ;`

Clean STG and add generate some delta data

`delete from csum_stg;insert into csum_stgSelect cast(random(1,10) as integer) as prod_id, calendar_date as product_usage _date, cast(random(1,1000) as integer) as quantity, cast(0 as integer) as csum_quantityfrom sys_calendar.calendarwhere calendar_date between date-100 and date;`

Step 1 of delta process Merge into TGT with csum_quantity = 0

`Merge into csum_tgt using csum_STG         on csum_STG.prod_id = csum_tgt.prod_id            and csum_STG.product_usage_date = csum_tgt.product_usage_datewhen matched then update set quantity = csum_tgt.quantity + csum_STG.quantity, csum_quantity = 0when not matched then insert (prod_id, product_usage _date, quantity, csum_quantity) values (csum_STG.prod_id, csum_STG.product_usage_date, csum_STG.quantity, csum_STG.csum_quantity);`

Step 2 of delta process Update with fully recalculated csum but only update affected rows

`update tfrom csum_tgt t,(select t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantityfrom csum_tgt t     join     (select prod_id, min(product_usage _date) as min_date from csum_stg group by prod_id) as s          on s.prod_id = t.prod_idqualify product_usage _date >= min_date) sset csum_quantity = s.csum_quantitywhere s.prod_id = t.prod_id      and s.product_usage_date = t.product_usage_date;`

Cross check that current values and fully recalculated values are the same

`select t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantityfrom csum_tgt tminusselect *from csum_tgt;select *from csum_tgt minusselect t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantityfrom csum_tgt t;`

Clean up STG table

`delete from csum_stg;`

Enthusiast

## Re: Cumulative calculation problem

Thanks a ton for taking out time for this. I will test this out.

-PT