Date Gap Analysis

Analytics
Enthusiast

Date Gap Analysis

Here are the details I have a table that contains effective start/end dates. What I want to do is find where the gap is and only pull data on either side of that gap. for example:

UPC effect_start effect_end
123 2-7-2007 8-2-2007
123 8-2-2007 11-2-2007
123 11-2-2007 12-4-2008
123 8-1-2009 8-4-2009
123 8-4-2009 NULL

If I wanted to pull the most current items history information, I would only return the records from 8-1-2009. The indication is the mismatch between the effect_start and the effect_end fields. I have tried all sorts of ways to make this happen. My understanding of SQL deals with databases other than Teradata, and I can't recall ever having this much trouble doing this. I would think that there is a way to compare the data within a order recordset/answerset. Am I completely off base here? Can this and/or how do you perform this operation. Please understand that I would prefer to not build temp tables, but I will if I have to. All of my attempts have been without the use of temp tables.
7 REPLIES
Enthusiast

Re: Date Gap Analysis

Try this:

SELECT
t.upc,
t.effect_start,
t.effect_end,
MIN(t.effect_end) OVER (
PARTITION BY t.upc
ORDER BY t.effect_start
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
) AS effect_end_prev
FROM
MyTable AS t
QUALIFY
t.effect_start <> effect_end_prev;

Please note that this code will select any invalid dates i.e. both gaps and overlaps. To select gaps only, change the '<>' operator to '>'.
Enthusiast

Re: Date Gap Analysis

Thanks for the reply. I'll give it a shot. What I finally came up with was a procedure that processes the data recursively. Here is what I ended up doing.

REPLACE PROCEDURE ITEM.getcurrentitemhistory()

BEGIN

DECLARE items SCROLL CURSOR FOR

SEL *

FROM

(

SEL a.*,ROW_NUMBER() OVER(PARTITION BY upc,sku ORDER BY effective_start_date DESC) arn1

FROM item.chris_test_table a

) a;

DEL FROM item.chris_test_table2;

OPEN items;

label: FOR myitems AS items CURSOR FOR

SEL *

FROM

(

SEL a.*,ROW_NUMBER() OVER(PARTITION BY upc,sku ORDER BY effective_start_date DESC) arn1

FROM item.chris_test_table a

) a

DO

INSERT INTO item.chris_test_table2

SEL upc,sku ,effective_start_date,effective_end_date,curr_rec_ind ,primary_sku_ind

FROM

(

SEL a.*,ROW_NUMBER() OVER(PARTITION BY upc,sku ORDER BY effective_start_date DESC) arn1

FROM item.chris_test_table a

) a

WHERE a.effective_end_date=myitems.effective_start_date

OR a.effective_end_date IS NULL;

END FOR;

CLOSE items;

END;

This is definitely not the prettiest or the cleanest probably, but it does work. It processes and handles 21 million records in about 1.5 minutes. Since this is a proof of concept, it will do for now I guess, but I'd love to know how to make it better. I guess I was more hoping to not have to use a procedure and cursor. I was hoping to be able to do it with a simple query off the base table instead of building a separate table.
Enthusiast

Re: Date Gap Analysis

Ok... I ran the SQL that you gave, and it produced only part of my sample set.... What I mean is that it produced 15 rows out of 21 million..... Is there something I'm missing? Sorry if these are beginner questions. I'm not a Teradata guru by any means, but I like to think I'm ok at it. :P Anyways it appears that the SQL gathers 1 maybe 2 dates on either side of a central point, which is what I would expect it to do for each record. Do I have to add with recursive to make this work or is there another way to do it?
Enthusiast

Re: Date Gap Analysis

We are using V2R6.
Enthusiast

Re: Date Gap Analysis

Thanks for all of your replies. It appears that (according to the responses) that the only true way to gather the results I expect is through recursive functionality. I appreciate everyone's help on this.
Enthusiast

Re: Date Gap Analysis

OK, I misunderstood the task - my code simply identifies records that have a wrong date i.e. date intervals overlap or there's a gap between them.

How about that:

WITH RECURSIVE tmp (
upc,
effect_start
) AS (
-- latest records
SELECT
t.upc,
MAX(t.effect_start) AS effect_start
FROM
MyTable AS t
GROUP BY
t.upc
UNION ALL
-- previous records
SELECT
t.upc,
t.effect_start
FROM
MyTable AS t,
tmp
WHERE
t.upc = tmp.upc
AND
t.effect_end = tmp.effect_start
AND
t.effect_start <> t.effect_end -- condition to avoid infinite recursive loop (records with effect_start = effect_end will be selected anyway)
)
SELECT
t.*
FROM
MyTable AS t
JOIN
tmp
ON
t.upc = tmp.upc
AND
t.effect_start = tmp.effect_start
ORDER BY
t.upc,
t.effect_start,
t.effect_end DESC;

The idea behind that code is to find the latest records (that is done in the seed part of the recursive query) and then recursively "go back" in the history of each upc which is done by using dates to look for the previous record.

Re: Date Gap Analysis

I think this query show the mistaches between effect_star and effect_end.

select effect_start
from test a
where not exists (
select 'x'
from test b
where a.cod=b.cod
and a.effect_start=b.effect_end)
and a.effect_start > (
select min(c.effect_start)
from test c
where a.cod=c.cod)