**Urgent - Query to club the records comparing 2 record's date columns

Database
Highlighted

**Urgent - Query to club the records comparing 2 record's date columns

 

Considering the below data in a Teradata table say Test

iddata 1start_dateend_date
1x1/8/20196/8/2019
1x7/8/20198/8/2019
1x9/8/201910/8/2019
1x11/8/201915/8/2019
1x17/8/201919/8/2019
1x20/8/201925/8/2019
1x28/8/201930/8/2019
1x1/10/201912/10/2019
    
The requirement is to write a query for a given id & data in sorted order of dates to club the rows in such a way if there is a gap in between end date of the previous row and start date of current row split the data into separate rows. Gap meaning its not continues 
    
Expected result 
    
iddata 1start_dateend_date
1x1/8/201915/8/2019
1x17/8/201925/8/2019
1x28/8/201930/8/2019
1x1/10/201912/10/2019

 

Not able to create a query for the above requirement, thinking if it is even possible to write this kind of query in Teradata. Not understanding how lead and lag works in clubbing of the data  

 

Please help me if anyone has any inputs or a sample query its a bit urgent for an application delivery..

Tags (1)
4 REPLIES 4
Ambassador

Re: **Urgent - Query to club the records comparing 2 record's date columns

This has been covered several times in this forum, search for "Merge overlapping periods".

 

For your example data the shortest version would apply NORMALIZE:

select 
   id
  ,data1
  ,begin(pd) as start_date, -- split the period into begin/end
  ,last(pd) as end_date
from
 (
   select normalize
      id
     ,data1
     ,period(start_date, end_date+1) as pd -- normalize works on periods only
   from mytable
 ) as dt

 

Re: **Urgent - Query to club the records comparing 2 record's date columns

Thanks dnoeth.

I am in the process of trying the same.

 

1 more special case is there If there is a change in a year but no gap in dates(e.g end date of the previous record is 31 Dec 2018 & start date of the current record is 1 Jan 2019), we still have to separate those into 2 rows. How can we include that in the query. I missed this scenario while asking the question.

 

 

 

Teradata Employee

Re: **Urgent - Query to club the records comparing 2 record's date columns

Hi nikhilc,

 

You can use EXPAND ON BY ANCHOR PERIOD YEAR_BEGIN after the normalize.

Data

create multiset volatile table mvt_data, no log
( Id            byteint                  not null
, Data1         char(1)                  not null
, start_date    date format 'yyyy-mm-dd' not null
, end_date      date format 'yyyy-mm-dd' not null
)
primary index (Id)
on commit preserve rows;

insert into mvt_data values (1, 'X', date '2019-08-01', date '2019-08-06');
insert into mvt_data values (1, 'X', date '2019-08-07', date '2019-08-08');
insert into mvt_data values (1, 'X', date '2019-08-09', date '2019-08-10');
insert into mvt_data values (1, 'X', date '2019-08-11', date '2019-08-15');
insert into mvt_data values (1, 'X', date '2019-08-17', date '2019-08-19');
insert into mvt_data values (1, 'X', date '2019-08-20', date '2019-08-25');
insert into mvt_data values (1, 'X', date '2019-08-28', date '2019-08-30');
insert into mvt_data values (1, 'X', date '2019-10-01', date '2019-10-12');
insert into mvt_data values (1, 'X', date '2019-10-13', date '2020-01-10'); -- I've added this row, not sure it's your real case

collect statistics column (Id) on mvt_data;

Query

with cte_expand as
(
select Id
     , Data1
     , pd
     , expd
  from cte_normalize
expand on pd as expd by anchor period year_begin
)
 , cte_normalize (Id, Data1, pd) as
(
select normalize
       Id
     , Data1
     , period(start_date, end_date + 1)
  from mvt_data
 where Id    = 1
   and Data1 = 'X'
)
select Id
     , Data1
     , greatest(begin(pd) (int), begin(expd) (int)) (date) as start_date
     , (least(end(pd) (int), end(expd) (int)) (date)) - 1  as end_date
  from cte_expand;

Id  Data1  start_date  end_date
--  -----  ----------  ----------
 1  X      2019-08-01  2019-08-15
 1  X      2019-08-17  2019-08-25
 1  X      2019-08-28  2019-08-30
 1  X      2019-10-01  2019-12-31
 1  X      2020-01-01  2020-01-10

If you're in 16.xx+, you don't have to cast dates to integers in greatest / least functions.

 

Ambassador

Re: **Urgent - Query to club the records comparing 2 record's date columns

No need for GREATEST/LEAST, there's P_INTERSECT :-)

 

...
SELECT 
       Id
     , Data1
     , Begin(pd P_INTERSECT expd) AS start_date
     , Last(pd P_INTERSECT expd) AS end_date
  FROM cte_expand;