I have a table like this:
CUST_ID purchase_dt prev_purch_dt gap avg_gap reset_flag
1 543,278,201 2/25/2016 0 0 0
2 543,278,201 3/24/2016 2/25/2016 27 13.5 0
3 543,278,201 4/28/2016 3/24/2016 34 20.3 0
4 543,278,201 7/27/2016 4/28/2016 87 37 1
5 543,278,201 9/15/2016 7/27/2016 0 0 0
6 543,278,201 9/21/2016 9/15/2016 6 3 0
The column 'gap' is the time in days between the customer's purchases. Avg_gap is the cumulative or running average across all rentals made by the customer until they encounter the reset flag = 1. Once the reset flag = 1, I want to set the 'gap' and 'avg_gap' to 0 so basically restart the counter. Is there a way to do that?
So this is the SQL I have,
AVG(gap) OVER(PARTITION BY r.cust_id ORDER BY r.purchase_dt ASC ROWS UNBOUNDED PRECEDING ) AS avg_gap,
case when gap > avg_gap then reset_flag = 1 else 0
from table a.
So two questions -
1. where would the RESET WHEN clause go?
2. is it an issue that the reset_flag is getting defined AFTER it has been referenced in Avg window function?
This is not going to work with RESET because there's a recursive definition, reset_flag is based on avg_gap and vice versa.
OLAP functions are probably not going to work, you'lly need a recursive query to get the expected result.
You must code a recursive Common Table Expression (CTE) using WITH RECURSIVE which iterates over the rows per customer and applies your logic. The first problem is finding the next row, it must be easily calculated based on the current row, if both purchase_dt and prev_purch_dt are columns in a table it's simple, but I assume it's calculated. In that case you better materialize the data first and add a ROW_NUMBER.
Which columns in your example are from a base table and which are calculated (and how)?
My table has multiple customers.
Taking a snapshot for a single customer, this is what it looks like:
CUST_ID purchase_dt prev_purch_dt gap avg_gap Std_dev reset_flag
1 543,278,201 10/22/2009 0 0 0 0
2 543,278,201 02/11/2010 10/22/2009 112 56 79 0
3 543,278,201 03/11/2010 02/11/2010 28 46 58 1
4 543,278,201 04/08/2010 03/11/2010 28 42 48 0
5 543,278,201 05/15/2010 04/08/2010 7 35 44 0
6 543,278,201 9/21/2016 9/15/2016 77 42 43 0
Only cust_id and purchase_dt are from a base table. Rest all are calculated. Here's the definition
I have a row_number in the beginning that lists their purchases.
Prev_purchase date - lists the previous purchase date.
gap is the time between two purchases.
reset flag = 1 if gap > avg_gap + std_dev.
When reset flag = 1 the gap column in next row should be reset and the calculations should be restarted for avg_gap and std_dev until they reach a reset_flag = 1 again.
Does that make sense?