Cumulative Sum by distinct customer ID by year

Database

Cumulative Sum by distinct customer ID by year

Hi - I've seen some posts here regarding the SUM OVER function to calculate the cumulative sum.  My problem at hand is slightly different, where I need to keep a running total of distinct purchasers by year:

Year        #Distinct Customers         #Cumulative (Distinct Customers)

2010                  150                                         150

2011                  180                                         310  <-- Note that the count is smaller than just (150+180), because some may be repeat purchasers

2012                  200                                         500  <-- same thing here as above, would like to exclude repeat purchasers

2013                  250                                         750  <-- just so happens that all 250 customers in 2013 were all new

2014                  190                                         805 

Could anyone please let me know if this can somehow work with SUM OVER Partition?  Right now I have, but it only gives a simple running total.

SELECT year, count(distinct cust_id) as cust_count,

SUM(cust_count) OVER (PARTITION BY year ORDER BY year ROWS UNBOUNDED PRECEDING ) AS cumul_sum

Thanks!

Carey

20 REPLIES
Junior Contributor

Re: Cumulative Sum by distinct customer ID by year

Hi Carey,

I've done a similar query, new vs. existing customers for every period :-)

There's no way to do it with a simple SUM OVER as you can't use DISTINCT within an OLAP function.

SELECT yr,
COUNT(*) AS distinct_customers
,SUM(flag) AS new_customers
,SUM(SUM(flag))
OVER (ORDER BY yr
ROWS UNBOUNDED PRECEDING) AS cumulative_new_customers
FROM
(
SELECT
cust_id
,EXTRACT(YEAR FROM order_dt) AS yr
,CASE -- find the first year when a customer placed an order
WHEN EXTRACT(YEAR FROM order_dt)
= MIN(EXTRACT(YEAR FROM order_dt))
OVER (PARTITION BY cust_id)
THEN 1
ELSE 0
END AS flag
FROM orders
GROUP BY yr,cust_id
) AS dt
GROUP BY 1

Re: Cumulative Sum by distinct customer ID by year

Thank you so much, Dieter.  This is exactly what I needed!!

Junior Supporter

Re: Cumulative Sum by distinct customer ID by year

Hi all:

Sorry for jumping in (and jumping in late)

I'm not sure about requirements but doesn't a scalar subquery do the trick?

SELECT a.year,

       count(distinct a.cust_id) as cust_count,

       (select count(distinct b.cust_id) from mytable b where b.year <= a.year)

from mytable a

group by a.year

order by a.year;

Cheers.

Carlos.

Junior Contributor

Re: Cumulative Sum by distinct customer ID by year

Hi Carlos,

of course this should work, too.

But the Explain wil be horrible due to the non-equi-Join, including a bad Product Join with a huge intermediate spool. I always try to avoid Scalar Subqueries like this, most of them are easily rewritten using OLAP functions, etc. :-)

Junior Supporter

Re: Cumulative Sum by distinct customer ID by year

Hi Dieter:

Agreed with the horrible query plan. Agreed with the scalar subqueries (I don't like them either).

BUT:

Sometimes the simplicity of the query can be a adequate solution for many users (Not everybody is able to develop a query like yours. "Easily rewritten" is a fuzzy concept here ;-) ).

Cheers.

Carlos.

Re: Cumulative Sum by distinct customer ID by year

Hi - I am new to Teradata SQL and I have a problem like this. I need to get the LATEST record from each customer.

To get this, I created a derived table, but it doesn't work. Could anyone has the idea please?

Here is my codes:

select top 100*

from CUST_CST_CUAD as CUST

where PERSON_ID=tempo.PERSON_ID and

CUST.DOE=tempo.take

order by PERSON_ID, DOE,

(select

PERSON_ID

,max(DOE) as take

from CUST_CST_CUAD

group by PERSON_ID) as tempo;

Cheers,

Yusuf

Junior Contributor

Re: Cumulative Sum by distinct customer ID by year

@Carlos:

you're right, this specific query was not "easily rewritten", but it's also not an easy question :-)

@Yusuf:

The Deried Table must be in the FROM clause:

select top 100*
from CUST_CST_CUAD as CUST,
(select
PERSON_ID
,max(DOE) as take
from CUST_CST_CUAD
group by PERSON_ID) as tempo
where PERSON_ID=tempo.PERSON_ID and
CUST.DOE=tempo.take
order by PERSON_ID, DOE

But you can simply use an OLAP function:

select *
from CUST_CST_CUAD as CUST
qualify
MAX(DOE)
OVER (PARTITION BY PERSON_ID) = DOE;

select *
from CUST_CST_CUAD as CUST
qualify
RANK()
OVER (PARTITION BY PERSON_ID
ORDER BY DOE DESC) = 1;

Both return exactly the same result, there might be more than one row per customer if there are multiole rows for the same max date (If you want only a single row, you must add another column to ORDER BY or switch to ROW_NUMBER instead of RANK)

Re: Cumulative Sum by distinct customer ID by year

I have below record set

Customer col1  col2  col3  col4  col5
123 2920292109800000000 352008063830068 6147439366 7/2/2014 VZ
123 2920292109800000000 352008063988684 6147438242 7/2/2014 VZ
123 2920292109800000000 352005063348786 6145379130 7/2/2014 VZ
123 2920292109800000000 352005063125267 6147435989 7/2/2014 VZ
123 2920292109800000000 352008063805045 6147437954 7/2/2014 VZ
123 2920029207100030000 35200006841935 6147437954 6/29/2014 SP
123 2920029207100030000 35200406638684 6147435989 6/29/2014 SP
123 2920029207100030000 35200406629904 6147439979 6/29/2014 SP
123 2920029207100030000 35200006817715 6147438242 6/29/2014 SP
123 2920029207100030000 35200006840965 6147439366 6/29/2014 SP
123 2920029206700020000 352006063253877 6145512232 6/24/2014 VZ
123 2920029206700020000 352006063184395 6145511878 6/24/2014 VZ
123 2920029206700020000 352008064038380 6145511808 6/24/2014 VZ
123 2920029206700020000 352006063342274 6145512162 6/24/2014 VZ
123 2920029206700020000 352006063493580 6145510997 6/24/2014 VZ
123 2920002926800000000 35200006843597 6143524054 6/23/2014 SP
123 2920002926800000000 35200006858820 6143520916 6/23/2014 SP
123 2920002926800000000 35200006776180 6143521670 6/23/2014 SP
123 2920002926800000000 35200006802267 6143520510 6/23/2014 SP
123 2920002926800000000 35200406639117 6143523467 6/23/2014 SP

Desired result set:

cus col4  col5 Sum(distinct count) order by col4
123 7/2/2014 VZ 2
123 6/29/2014 SP 2
123 6/24/2014 VZ 2
123 6/23/2014 SP 1
Junior Contributor

Re: Cumulative Sum by distinct customer ID by year

What are the rules to get the desired result?

What is "Sum(distinct count) order by col4"?