Clustering Orders/Transactions with close order dates

Database
Enthusiast

Clustering Orders/Transactions with close order dates

Hi,

The task is to cluster all the orders or transactions under a customer with close order dates, e.g. order dates apart from each other for less than 60 days will be grouped together.  Below are the before and after table as well as the SQL code to generate the before table. Please advise how to code it.

 

Thanks!

 

Here is the before table:

 

Customer

Order_Date

John

2015-02-01

John

2015-02-18

John

2016-02-02

John

2016-02-19

Mike

2015-01-01

Mike

2015-01-05

Mike

2015-01-31

Mike

2016-01-02

Mike

2016-01-06

 

 

And here is the after table with one addition table named “cluster”.

 

Customer

Order_Date

Cluster

John

2015-02-01

1

John

2015-02-18

1

John

2016-02-02

2

John

2016-02-19

2

Mike

2015-01-01

1

Mike

2015-01-05

1

Mike

2015-01-31

1

Mike

2016-01-02

2

Mike

2016-01-06

2

 

 

For your convenience, here is the code to create the original toy table.

 

CREATE VOLATILE TABLE TableD

(

   CUSTOMER     VARCHAR(255)  NOT NULL

  ,ORDER_DATE   DATE                     NOT NULL  

)

PRIMARY INDEX

(

     CUSTOMER

)

ON COMMIT PRESERVE ROWS;

 

INSERT INTO TableD VALUES('Mike', '2015-01-01');

INSERT INTO TableD VALUES('Mike', '2015-01-05');

INSERT INTO TableD VALUES('Mike', '2015-01-31');

INSERT INTO TableD VALUES('Mike', '2016-01-02');

INSERT INTO TableD VALUES('Mike', '2016-01-06');

INSERT INTO TableD VALUES('John', '2015-02-01');

INSERT INTO TableD VALUES('John', '2015-02-18');

INSERT INTO TableD VALUES('John', '2016-02-02');

INSERT INTO TableD VALUES('John', '2016-02-19');

 

SELECT *

FROM TableD

ORDER BY CUSTOMER, ORDER_DATE ASC

 

 

3 REPLIES
Junior Supporter

Re: Clustering Orders/Transactions with close order dates

There could be much easier and cleaner solution to this. But below query got me the result you wanted.

select y.customer, y.order_date, z."cluster" from
(select customer, order_date, extract(year from order_date) as yr,
extract(month from order_date) as mo
from tableD)y
inner join

(select customer, yr, mo,
sum("cluster1") over(partition by customer order by yr, mo rows 1 preceding) as "cluster"
from ( select customer,
extract(year from order_date) as yr,
extract(month from order_date) as mo,
rank()over(partition by customer, yr, mo order by yr, mo) as "cluster1"
from tableD a
group by 1, 2, 3) x)z
on y.customer = z.customer
and y.yr =z.yr
and y.mo = z.mo
order by y.customer, order_date

Enthusiast

Re: Clustering Orders/Transactions with close order dates

Thanks nealvenna, that's a nice try!

 

But your code does not consider the 60 days time window. I would like to group any two orders if their corresponding order dates are less than 60 days apart. So if I change the toy data a little bit, the results will be wrong. Could you please further advise? 

 

Thanks!

 

New before table where on the second row I intentionally changed order date to 2015-03-18.

 

CustomerOrder_Date
John2015-02-01
John2015-03-18
John2016-02-02
John2016-03-19
Mike2015-01-01
Mike2015-01-05
Mike2015-02-28
Mike2016-01-02
Mike2016-01-06

 

 

New after table results, for John, the 2015-03-18 order is clusterred incorrectly as the order date is less than 60 days apart from 2015-02-01.

 

CustomerOrder_DateCluster
John2015-02-011
John2015-03-182
John2016-02-022
John2016-03-192
Mike2015-01-011
Mike2015-01-051
Mike2015-02-282
Mike2016-01-022
Mike2016-01-062

 

 

 

 

 

 

 

Highlighted
Enthusiast

Re: Clustering Orders/Transactions with close order dates

Finally, I had one solution. But please share it if you have a better or more efficient solution.

 

Thanks!

 

CREATE MULTISET VOLATILE TABLE TableD

(

   CUSTOMER     VARCHAR(255)  NOT NULL

  ,ORDER_DATE   DATE          NOT NULL  

)

PRIMARY INDEX

(

     CUSTOMER

)

ON COMMIT PRESERVE ROWS;

 

INSERT INTO TableD VALUES('Mike', '2015-01-01');

INSERT INTO TableD VALUES('Mike', '2015-01-05');

INSERT INTO TableD VALUES('Mike', '2015-01-31');

INSERT INTO TableD VALUES('Mike', '2016-01-02');

INSERT INTO TableD VALUES('Mike', '2016-01-06');

INSERT INTO TableD VALUES('John', '2015-02-01');

INSERT INTO TableD VALUES('John', '2015-02-18');

INSERT INTO TableD VALUES('John', '2016-02-02');

INSERT INTO TableD VALUES('John', '2016-02-19');

 

SELECT

B.CUSTOMER

,B.ORDER_DATE

,SUM(B.ind) OVER (PARTITION BY CUSTOMER ORDER BY ORDER_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS "CLUSTER"

FROM

(

SELECT A.*

      ,CASE WHEN (A.ORDER_DATE - MIN(ORDER_DATE) OVER (PARTITION BY CUSTOMER ORDER BY ORDER_DATE ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) <61 THEN 0 ELSE 1 END  ind

FROM TableD A

) B

ORDER BY

1,2