Running sum

Database
Enthusiast

Running sum

Hi there,

I have been playing a bit with the running sum over partition and I found out some behaviour I sincerely was not expecting.

Let's make it with a simple example:

sum(coalesce(sales,0)) over (partition by Account_manager order by month ROWS UNBOUNDED PRECEDING) as cummulative

There are 4 account managers that make some sales on month 1.

    Mth1

AM1    250

AM2    250

AM3    250

AM4    250

-------

total    1000

If the AM1 makes no sales on Month 2 then we have on that month

    Mth2

AM2    250

AM3    250

AM4    250

-------

total    750

The cummulative sums happen to be

    Mth2

AM2    500

AM3    500

AM4    500

-------

total    1500

instead of

              Mth1      Mth2

AM1       250       250

AM2        250        500

AM3        250        500

AM4        250        500

-----------------------------

total      1000     1750

What happens is that the partition defined has a combination occuring on month 1 but not on month 2. There is no record for that AM1 and therefore is not included in the sum.

I would like to know if there is any way to keep all the combinations as therefore the overall sums are not correct due to the lack of inclusion of the AM1. Something like scanning all the partitions and have the combinations stable across all the partitions.

I know I could use a partition on year level to get all the combinations and then make it as an LOV doing an outer join to the fact table but I don't think is the optimal soultion nor the cleanest one.

I am sure I am not the first person who realises this behaviour on the 'over partition' when the partition combinations are not available in all the partitions.

8 REPLIES
Junior Contributor

Re: Running sum

It's hard to follow your explanation, but you need to include a missing month in your result set?

Are you on TD13.10?

There's the new EXPAND ON clause to accomplish that.

Dieter

Enthusiast

Re: Running sum

Hi Dieter,

The current issue is that there is no record with the combination of AM1 and Month2. Therefore, when executing the sum over partition, the Month2 cummulative sum is just summing AM2, AM3 and AM4.

Take into account that for the Month2 the values should be the sum for Month1 and Month2. What I would like is to have the value for the AM1 for the previous month to be incorporated as eventhough he hasn't done any sale on Month2, he did some for Month1 and therefore to be accounted on the overall running sum on Month2.

I don't think the 'EXPAND ON' is what would solve my current issue. I heard something like 'filling the gaps' which potentially would be an option but I haven't read anything about that.

Indeed we are on TD 13.10.

Thanks for your support.

Regards,

Esteve

Enthusiast

Re: Running sum

Hi again,

I read a bit about 'EXPAND ON' and I think it does not solve my current issue because:

  • I have records per month and not period ranges
  • I don't expand value amongst a time period but want to have a 'fake' record for the months without values

Where can I find more information about the proMISS application?

Regards,

Esteve

Enthusiast

Re: Running sum

Could not get you exactly

SEL

SUM(sales) OVER (PARTITION BY AM ORDER BY DT1 ROWS UNBOUNDED PRECEDING) AS CUMMULATIVE

,AM

,VAL1,

VAL2

FROM TEMP

;

But Above Query gives desired result.

Enthusiast

Re: Running sum

Hi Jigar,

Thanks for your comment.

The table has the following values

sales_rep    month    value

AM1            1        250

AM2            1        250

AM3            1        250

AM4            1        250

AM2            2        250

AM3            2        250

AM4            2        250

So when doing a running sum, there will be no value for the AM1 for the month 2. Therefore the overall running sum for month 2 will NOT include the value for AM1 for the month 1.

Therefore, the running sum is not 'filling the gap' of unexisting combinations.

Hope this explanation helped a bit your understanding of my current problem.

Regards,

Esteve

Junior Contributor

Re: Running sum

What's the datatype of "month"?

If you can cast it to a date (preferably 1st of month), you can create a period on-the-fly using

period(month_as_date,
coalesce(min(month_as_date)
over (partition by account_manager
order by month
rows between 1 following and 1 following)
,date '9999-12-31')
) as p

This should not need an extra STATS step in explain, because it's the same PARTITION/ORDER as the cummulative sum.

And then you use an

EXPAND ON p AS p2 BY ANCHOR MONTH_BEGIN

FOR PERIOD (DATE '2011-01-01', DATE '2011-05-01').

This will create any missing rows after the first month with data (showing the previous cummulative sum), but none before, e.g. if you got data for february only this will result in:

feb/mar/april, but no jan row.

Dieter

Enthusiast

Re: Running sum

Hi Dieter,

Thanks indeed for your answer.

It's quite interesting. I was thinking of something similar which would be doing a partition on year level then to know which AM are involved for that year and then expand all the months with zero values. That approach, though, would mean to run twice the scan of the table: one for the year partition with months expanded with zero values and the second one to read the month level records to, later on, sum both of them (zero + value=value).

I would need to check if the situation of not being able to report values for the same year before the 'first' record month is a problem or not from the report perspective.

Again, thanks a lot for sharing your expertise on this problem.

Regards,

Esteve

Enthusiast

Re: Running sum

Hi Dieter,

Which would be the approach for another database which is not on 13.10 but 13.01 as the 'EXPAND ON' functionality is not there?

Thanks again and regards,

Esteve