Problem merging historical data

Analytics

Problem merging historical data

Hi,

I have the flowing problem: In my customer_product table there are 3 groups of products defined each group has it's own priority (1=highest,2=middle,3=lowest). For example (a certain customer):

start_date end_date product priority
2008-01-01 2008-06-12 prodx1 3
2008-06-13 2008-12-31 prodx2 3
2008-02-15 2008-04-15 prody1 2
2008-03-15 2008-08-31 prodz1 1
2008-07-01 2008-07-05 prodz2 1

I need to build a history considering the priority. This means, whenever more than one product is active at the same time only the product with the highest priority will be considered (but there could as well be more than one products with the same "highest" priority active). The desired result for the above example would be:

2008-01-01 2008-02-14 prodx1 3
2008-02-15 2008-03-14 prody1 2
2008-03-15 2008-08-31 prodz1 1
2008-07-01 2008-07-05 prodz2 1
2008-09-01 2008-12-31 prodx2 3

As you see, the two priority "1" products are active at the same time because both of them have the highest priority.

Has anybody an idea how to solve this problem. I tried a solution joining the sys_calendar.calendar table, expanding the historical records to have one record for each day, afterwards considering the highest priority per day and selecting for each day only the records with the highest priority. And packing the history afterwards again with OLAP functions but it was too slow (the table has 200 Mio records).

Any idea how to solve this (maybe with OLAP functions)?

Thanks in advance

Roland Wenzlofsky

3 REPLIES
N/A

Re: Problem merging historical data

Hi Roland,

this is a tough one.

At the first glance it looked like a typical OLAP query based on a

"SELECT start_date, ... UNION ALL SELECT end_date, ..." Derived Table.

But i couldn't think of any way to get the expected result set (maybe nesting OLAP functions several times), so i switched over to traditional SQL.

There are two non-equi-joins, but the query should perform well if the number of products per customer is not too high (for Telco customers this is hopefully true).

The first part adjusts the date range if there's a concurrent product with a lower priority and excludes that range, e.g.

xxxxx lower prio
___xxxxx higher prio

xxx result range

There's a special case:
a higher priority product date range is completely within a lower priority product date range,
thus that single row is split into two rows:

xxxxxxxxxxxx lower prio
____xxxx____ higher prio

xxxx____xxxx result ranges

The NOT EXISTS checks if there's already a product with a higher priority in the same date range.

[font=Courier New]

SELECT DISTINCT

cust, sd, ed, prio, prod

FROM

(

SELECT

CASE

WHEN c1.priority < c2.priority

AND

(

c1.start_date >= c2.start_date

AND c1.end_date <= c2.end_date

)

THEN 1

ELSE 0

END AS flag,

c1.cust,

CASE

WHEN flag = 1

THEN c2.start_date

WHEN c1.priority > c2.priority

AND

(

c1.start_date BETWEEN c2.start_date AND c2.end_date

OR

(

c1.start_date <= c2.start_date

AND c1.end_date >= c2.end_date

)

)

THEN c2.END_date + 1

ELSE c1.start_date

END AS sd,

CASE

WHEN flag = 1

THEN c1.start_date - 1

WHEN c1.priority > c2.priority

AND

(

c1.end_date BETWEEN c2.start_date AND c2.end_date

OR

(

c1.start_date <= c2.start_date

AND c1.end_date >= c2.end_date

)

)

THEN c2.start_date - 1

ELSE c1.end_date

END AS ed,

CASE

WHEN flag = 1

THEN c2.product

ELSE c1.product

END AS prod,

CASE

WHEN flag = 1

THEN c2.priority

ELSE c1.priority

END AS prio

FROM

customer_product AS c1

JOIN customer_product AS c2

ON

c1.cust = c2.cust

WHERE

sd <= ed

) AS dt

WHERE

NOT EXISTS

(

SELECT

*

FROM

customer_product AS c3

WHERE

dt.cust = c3.cust

AND (dt.sd, dt.ed) OVERLAPS (c3.start_date, c3.end_date)

AND c3.priority < dt.prio

)[/font]

Argh, i hate it when all leading blanks are removed by that $%&* forum software

Dieter
N/A

Re: Problem merging historical data

Even worse, when i edit the post, each linebreak is doubled (as a punishment for working on a mac?)

Dieter

Re: Problem merging historical data

Thank you for supporting me. I tried to use your solution and it worked quite well but in one special case there seems to be a problem:

If the records look like this:

xxxxxxxxxxxxxxxxxxxxxxx
_____xxxxxx--xxxxxxx__

i lost the inner part I marked with '--'

Here is a solution i found using OLAP functions. The outer part of the query does nothing else than packing together consecutive records with same priority (having been created by the inner algorithm):

select t01.customer_id
,t01.product
,min(t01.start_date)
,max(t01.end_date)
,t01.priority
,t01.product_id

from
(
select customer_id,product,start_date,end_date,priority,product_id
,sum(changed) over (partition by t01.customer_id,product_id order by t01.start_date ROWS UNBOUNDED PRECEDING) - changed as group_cd

from
(
select
customer_id,product,start_date,end_date,priority,product_id
,max(t01.start_date) over ( partition by t01.customer_id,t01.product_id order by t01.product,t01.start_date ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) as start_date_next
,case
when ((end_date+1) <> start_date_next) then 1
else
0
end Changed

from
(

select
customer_id,product,start_date,end_date,priority,product_id

from(
select customer_id,product,priority,date_x+1-is_start_date as start_date,
max(is_start_date) over (partition by customer_id,product_id order by date_x rows between 1 following and 1 following) as next_is_start_date,
max(date_x) over (partition by customer_id,product_id order by date_x rows between 1 following and 1 following)
- next_is_start_date as end_date
,product_id

from
(

select t01.customer_id,date_x,t02.product,t02.priority,is_start_date,t02.product_id
from
(select customer_id,start_date as date_x ,1 as is_start_date,product_id
FROM twh_rpr.roaming_product_base4

union

select customer_id,end_date,0 as is_start_date,product_id
FROM twh_rpr.roaming_product_base4

) t01

inner join
twh_rpr.roaming_product_base4 t02
on
t01.customer_id = t02.customer_id and
date_x between t02.start_date and t02.end_date
) t01
t01a
qualify priority = min(priority) over (partition by customer_id,start_date)
WHERE end_date IS NOT NULL
AND start_date <= end_date

) t01
) t01
) t01
group by t01.customer_id,t01.product,t01.priority,t01.product_id,group_cd