I have a table (status_moves) which logs each time a record changes status. It looks like this:
I need to write a query to determine what stage a a particular account_id was in on a particular date. For example,
For the question :"In what status was Account_id 400 on June 21, 2014?"
The Answer should be "C", as the most recent "new_value" prior to June 21 was "C".
How can I write a query to get the end result to look like this:
I believe I need to use max or min function but am unsure of where to start.
This is the format of the [db.stage_moves] table:
_date account_id old_value new_value
5/21/2014 400 B C
5/31/2014 400 C A
6/01/2014 300 C B
6/04/2014 400 A C
7/01/2014 300 B A
And this is how I would want to see the result to determine status on June 21, 2014:
_date account_id account_status_on_date
6/21/2014 400 C
I used the following to recreate your example:
CREATE VOLATILE TABLE vt_stage_moves
, NO FALLBACK
, NO JOURNAL
) PRIMARY INDEX (_date,account_id)
ON COMMIT PRESERVE ROWS;
insert into vt_stage_moves values('2014-05-21',400,'B','C');
insert into vt_stage_moves values('2014-05-31',400,'C','A');
insert into vt_stage_moves values('2014-06-01',300,'C','B');
insert into vt_stage_moves values('2014-06-04',400,'A','C');
insert into vt_stage_moves values('2014-07-01',300,'B','A');
I would personally feel most comfortable doing this with Ordered Analytic Functions.
SELECT '2014-06-21' AS _date,account_id,newValue AS account_status_on_date
FROM vt_stage_moves st
WHERE st._date <= '2014-06-21'
AND st.account_id = 400
QUALIFY ROW_NUMBER() OVER (ORDER BY st._date DESC) = 1;
Now... you'll note that I had to manually hard code the date... this query would work best by creating a macro for it, where you can just use variables.
The basic idea of the query is:
1. Pick all of the dates less than the specified date.
2. Pick the highest date out of the dates from (1).
ROW_NUMBER() simply assigns a number to the row based on the ORDER BY in the OVER clause. QUALIFY is the clause for filtering based on ordered analytic functions (much like you use the HAVING clause for aggregation functions). There are MANY other ways to use ordered analytic functions... this is just one use case.
Thank you cmedved! This makes a ton of sense. I also agree with your logic regarding the macro.
Moreover, my bigger goal is to create a table which has all dates up to today, the account_id, and the stage on that day.
I'm going to combine the technique you've suggested in conjunction with a join I am doing to a dates table and will post my findings/results.
Appreciate your help!
Instead of a join you might use EXPAND on to create the missing dates, see
Thanks Dieter -
I am interested in using the EXPAND method but am not sure exactly how to apply this as the prior example seems to be focused on dates, and I am trying to populate the most recent value of a VARCHAR(255) field for 'account_status_on_date' .
Tried doing this but got nowhere so far as it gave me a "Expected something between the Select keyword and the 'Key' Keyword.
Any guidance on how I might change the query?
The error message tells exactly what's wrong:
KEY is a keyword, if there's actually a column named key in your table you must use double quotes: "key"
I don't get what you want to achieve, could you add some data and expected result?