Cumulative calculation problem

Database
Enthusiast

Cumulative calculation problem

I have this situation. The base table here stores counts(number of units) flowing in for a product on a particular date and the Cumulative unit count. 

Base Table structure

PRODUCT      DATE     DAILY_CNT      CUM_UNIT_CNT

An incoming file will have number of units for a particular product and date combination, using which the cumulative value needs to be calculated and updated in the base table on a daily basis.

Incoming File/Intermediate table structure

PRODUCT      DATE    DAILY_CNT

We move the data from the incoming file to an intermediate table and using  that table we insert the new records in the base table. What we need to do is, for every new product date combination, we need to insert a new record in the base table, and calculate the Cumulative count using the previous Cumulative unit count calculated on NOT the greatest date for that product, BUT on the date which is less than the date in the intermediate table which we are processing. This is because for any product, it is possible to get dates in the incoming file, which will lesser than the max date for that product that already exists in the base table.

Therefore what complicates the process is that, we not only we need to calculate the CUM value for that date, but we will also have to update the already calculated CUM values for each date greater than the date in the intermediate table.

We can handle it as a two step process, one will update the CUM values for a particular date. Step two will roll forward the CUM values if an intermediate date was received for a product.

If somebody can help me design this in an efficient way, i would really appreciate it.

Thanks a ton!

PT

Tags (1)
2 REPLIES
Supporter

Re: Cumulative calculation problem

I think you need to follow a two step approach.

The following might work.

Set up a STG table with some example data

create table csum_stg
as
(
Select cast(random(1,10) as integer) as prod_id, calendar_date as product_usage _date, cast(random(1,1000) as integer) as quantity, cast(0 as integer) as csum_quantity
from sys_calendar.calendar
where calendar_date between date-1000 and date-10
) with data
unique primary index (prod_id, product_usage _date);

TGT table with same layout

create table csum_tgt
as csum_stg with no data;

Initial load of TGT

insert into csum_tgt 
select prod_id, product_usage _date, quantity, sum(quantity) over (partition by prod_id order by product_usage _date rows between unbounded preceding and current row)
from csum_stg ;

Clean STG and add generate some delta data

delete from csum_stg;

insert into csum_stg
Select cast(random(1,10) as integer) as prod_id, calendar_date as product_usage _date, cast(random(1,1000) as integer) as quantity, cast(0 as integer) as csum_quantity
from sys_calendar.calendar
where calendar_date between date-100 and date;

Step 1 of delta process Merge into TGT with csum_quantity = 0

Merge into csum_tgt 
using csum_STG
on csum_STG.prod_id = csum_tgt.prod_id
and csum_STG.product_usage_date = csum_tgt.product_usage_date
when matched then update set quantity = csum_tgt.quantity + csum_STG.quantity, csum_quantity = 0
when not matched then insert (prod_id, product_usage _date, quantity, csum_quantity) values (csum_STG.prod_id, csum_STG.product_usage_date, csum_STG.quantity, csum_STG.csum_quantity);

Step 2 of delta process Update with fully recalculated csum but only update affected rows

update t
from csum_tgt t,
(
select t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantity
from csum_tgt t
join
(select prod_id, min(product_usage _date) as min_date from csum_stg group by prod_id) as s
on s.prod_id = t.prod_id
qualify product_usage _date >= min_date
) s
set csum_quantity = s.csum_quantity
where s.prod_id = t.prod_id
and s.product_usage_date = t.product_usage_date
;

Cross check that current values and fully recalculated values are the same

select t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantity
from csum_tgt t
minus
select *
from csum_tgt;

select *
from csum_tgt
minus
select t.prod_id, t.product_usage_date, t.quantity, sum(t.quantity) over (partition by t.prod_id order by t.product_usage_date rows between unbounded preceding and current row) as csum_quantity
from csum_tgt t;

Clean up STG table

delete from csum_stg;

Enthusiast

Re: Cumulative calculation problem

Thanks a ton for taking out time for this. I will test this out.

-PT