Classification of dates in different columns

Database
Enthusiast

Classification of dates in different columns

Hello,

I have 4 different products (1=having the product, 0 = not having the product) with each, a corresponding start date, in 4 different columns:

For ex.

Customer_id  Prod1     Prod2     Prod 3     Prod4      Start_dt_prod1    Start_dt_prod2     Start_dt_prod3   Start_dt_prod4

100              1            0           1             1           04/02/2002                                         01/02/2015       10/03/2015

200              1             1          1            1             02/03/1978           10/11/1999             05/08/1962         10/10/2004

300              1             1          0             0            10/10/2014            10/10/2014

I have to create a new variable, with 4 positions, each corresponding to a product (1st position=prod1, 2nd position=prod2, etc...) and the value from 0 to 4 would be defined according to the start date.

For ex,

for customer '100': it would be '1023'  (because prod 1 has the first start_dt, there is no prod 2 for this customer, prod 3 has the second start_dt and prod 4 was the last to be started)

for customer '200', it would be '2314'

for customer '300', it would be '1100' because the 2 products have the same start_dt

I am trying to solve this with an endless number of 'case when' clauses.  However, I am wondering if there would be an easier and more effecient way to tackle this issue.

Thank you very much for your help !

11 REPLIES
Senior Apprentice

Re: Classification of dates in different columns

This is a horrible data model, as a result you need horrible SQL :-)

Following approach normalizes the columns into rows and the RANKs the data. Maybe you need a DENSE_RANK instead of a RANK, this is supported in TD14.10: 

SELECT dt.*
, CASE WHEN Prod1 = '0' THEN '0' ELSE TRIM(rnk) END ||
CASE WHEN Prod2 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)) end ||
CASE WHEN Prod3 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n
ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)) end ||
CASE WHEN Prod4 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n
ROWS BETWEEN 3 FOLLOWING AND 3 FOLLOWING)) end
FROM
(
SELECT t.*,
n,
RANK() OVER (PARTITION BY customer_id ORDER BY COALESCE( CASE n
WHEN 1 THEN Start_dt_prod1
WHEN 2 THEN Start_dt_prod2
WHEN 3 THEN Start_dt_prod3
WHEN 4 THEN Start_dt_prod4
END, DATE '9999-12-31')) AS rnk
FROM yourtable AS t
CROSS JOIN nums -- this is a table with a single column and four rows in it: 1,2,3,4
) AS dt
QUALIFY n = 1

I don't know if this is actually easier/more efficient, at least it could be easily extended to more than 4 columns...

Enthusiast

Re: Classification of dates in different columns

Thank you soooooo much, it works !!!  I am really impressed :-)

Enthusiast

Re: Classification of dates in different columns

oups.... Answered too fast...

I found such results:

Customer_id    Prod_1    Prod_2    Prod_3   Prod_4    Start_dt_prod_1  Start_dt_prod_2   Start_dt_prod_3   Start_dt_prod_4

888                      1           1            1           1          05/11/1985         02/12/2009        15/09/2009             05/11/1985

and this results in '1431' instead of '1321'...

Any idea ?

Thanks a lot !!!!

Senior Apprentice

Re: Classification of dates in different columns

That's why I wrote "maybe you need a DENSE_RANK instead of a RANK" :-)

Are you on TD14.10?

Enthusiast

Re: Classification of dates in different columns

No, I have version TD13.00 and DENSE_RANK is not recognized...

Senior Apprentice

Re: Classification of dates in different columns

For DENSE_RANK it's a bit more complicated, see Missing Functions: DENSE_RANK

SELECT dt.*
, CASE WHEN Prod1 = '0' THEN '0' ELSE TRIM(rnk) END ||
CASE WHEN Prod2 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n NULLS LAST
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)) end ||
CASE WHEN Prod3 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n NULLS LAST
ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)) end ||
CASE WHEN Prod4 = '0' THEN '0' ELSE TRIM(MIN(rnk) OVER (PARTITION BY customer_id ORDER BY n NULLS LAST
ROWS BETWEEN 3 FOLLOWING AND 3 FOLLOWING)) end
FROM
(
SELECT dt.*,
SUM(flag) OVER (PARTITION BY customer_id ORDER BY dat, n ROWS UNBOUNDED PRECEDING) AS rnk
FROM
(
SELECT t.*, n
,COALESCE( CASE n
WHEN 1 THEN Start_dt_prod1
WHEN 2 THEN Start_dt_prod2
WHEN 3 THEN Start_dt_prod3
WHEN 4 THEN Start_dt_prod4
END, DATE '9999-12-31') AS dat,
CASE WHEN ROW_NUMBER()
OVER (PARTITION BY customer_id, dat
ORDER BY n) = 1
THEN 1
ELSE 0
END AS flag
FROM your table as t
CROSS JOIN nums -- this is a table with a single column and four rows in it: 1,2,3,4
) dt
) AS dt
QUALIFY n = 1
Enthusiast

Re: Classification of dates in different columns

Thanks, Dieter.

I tried this but get the error message 3706 'expected the word RESET or ')' after ORDER BY clause.

Could sth be missing ? 

Senior Apprentice

Re: Classification of dates in different columns

My syntax was ok, maybe you removed a ")"?

Enthusiast

Re: Classification of dates in different columns

Triple checked the SQL ...and can not find the issue (the only thing I know is that the issue comes from the top 4 case when lines....because the rest of the sql runs fine... maybe due to my sql editor ???).  Anyway, thanks a lot, you helped me a great deal.