Handling Gaps in Date Ranges.

General
Enthusiast

Handling Gaps in Date Ranges.

Most of my SQL experience has been in writing DB2 Stored procedures which processed cursors.

I am working on a TD BTEQ script, and have ran into a problem doing something through set processing, which was not so hard to do when processing one row at a time while looping through cursors within a stored procedure or a COBOL program.

I have a table which lists a series of events associated to an individual employee. These events have a begin date, and an end date. Separate events may have overlaping dates, but even so, the days should only be counted once.

To calculate a combined total of days for these events, the process needs to deterimine a series of one year date ranges based on selecting the earliest begin date as the starting date and extending for a one year period inclusive of any subsequent events occuring in that same one year period.

If that one year calculation period end date falls on a day which is between an event begin and end date, in other words it falls within an active event, the next calculation range begins on the day after the previous calculation range end date.

I have run into a problem when the end date of a calculation period falls on a day between active events, or in other words falls within a gap between events. In this case, the next calculation date range needs to "roll forward" and begin on the date of the next event period.

The table columns are

I_LVE_EVT (Numerical Event ID),
I_LVE_PRD (Numerical Period ID),
D_BGN (Date this period began),
D_END (Date this period ended)

As an example of what I need to do, these are two events, each with several periods, for the same employee. There are some gaps in the period date ranges which are causing me some difficulty.

I_LVE_EVT I_LVE_PRD D_BGN D_END
81619 1 4/13/2004 3/7/2005
81621 1 6/15/2004 3/7/2005
81621 2 3/8/2005 3/7/2006
81619 2 3/17/2005 3/7/2006
81619 4 4/17/2006 3/1/2007
81621 4 4/18/2006 3/1/2007
81621 5 3/12/2007 4/12/2007
81619 5 4/2/2007 4/12/2007
81619 6 4/25/2007 3/1/2008
81621 6 4/27/2007 3/1/2008
81621 7 3/2/2008 4/26/2008
81619 8 5/5/2008 7/29/2008
81621 8 6/4/2008 9/24/2008
81619 9 9/3/2008 3/1/2009
81621 9 1/12/2009 2/28/2009
81621 10 3/9/2009 8/31/2009
81619 10 5/18/2009 9/1/2009

The first calculation range is based on the begining of the earliest period, and has a duration of one year. The end of that range falls during an active event so the next calculation period begins the day after the first ended, and has a duration of one year.

The end of the second calculation range falls between active periods, so the begin date of the third range "rolls forward" to the begin date of the next event period, and there will also be a gap at the end of the third period resulting in the same sort of adjustment as the begin date of the forth range again has to roll forward to the the next period begin date.

The forth and fifth calculation range end dates fall during active events, so these both begin the day after the end date of the previous range.

The example above should result in these ranges.

4/13/2004 - 4/12/2005
4/13/2005 - 4/12/2006

4/17/2006 - 4/16/2007

4/25/2007 - 4/24/2008
4/25/2008 - 4/24/2009
4/25/2009 - 4/24/2010

I suspect others have encountered similar needs, and hope someone out there has a good solution to this problem.

Any suggestions ?
7 REPLIES
Junior Contributor

Re: Handling Gaps in Date Ranges.

Uh oh, this is a hard one.
If i hear "cursor", i usually think about OLAP-functions. But this is probably not possible using those (at least not without nesting several levels).
Then i thought, "hey this is task for recursion", but Teradata's WITH RECURSIVE is too limited.

So IMHO there's only an SP with a loop doing INSERT/SELECTs into a temp table or (in worst case) a cursor with INSERTs.
Which is more efficient depends on the table size, the number of rows processed per employee and any additional columns required in the answer set.

Dieter
Enthusiast

Re: Handling Gaps in Date Ranges.

We have a strong desire to avoid cursors, or loops, if at all possilbe.

The current SP, which the new process will replace, rolls a cursor on this data.

It is one of a half dozen cursors in this SP, and for each one of them, there are several million rows fetched.

This is the only one we haven't yet resolved with set processing.

The currrent SP runs 36 hours on average, and the BTEQ script we are developing runs in less that three minutes.
Junior Contributor

Re: Handling Gaps in Date Ranges.

Cursor in Teradata are really worst case, sequential processing in a parallel database, so it's a good ideas to avoid them :-)

Out of curiosity, why do you use a BTEQ script instead of an SP?
Without SP it's hard to write loops.
And is that 3 minutes with or without that last cursor?

This would much easier in a WITH RECURSIVE, but aggregates are not allowed in there:

CREATE VOLATILE TABLE temp, NO LOG (emp_no INT, d_bgn DATE, d_end DATE, lvl INT)
ON COMMIT PRESERVE ROWS;

REPLACE PROCEDURE sp ()
BEGIN
DECLARE lvl INT DEFAULT 1;

INSERT INTO temp
SELECT emp_no, MIN(d_bgn) AS x, ADD_MONTHS(MIN(d_bgn),12) -1,:lvl
FROM dropme
GROUP BY 1;

WHILE ACTIVITY_COUNT > 0 DO
INSERT INTO temp
SELECT
t1.emp_no,
CASE WHEN MIN(t2.d_bgn) <= ADD_MONTHS(t1.d_bgn,12) THEN ADD_MONTHS(t1.d_bgn,12) ELSE MIN(t2.d_bgn) END AS x,
ADD_MONTHS(x, 12) -1,
lvl + 1
FROM temp AS t1 JOIN dropme AS t2
ON t1.emp_no = t2.emp_no AND
t2.d_end >= ADD_MONTHS(t1.d_bgn,12)
WHERE t1.lvl = :lvl
GROUP BY 1,4, t1.d_bgn;

SET lvl = lvl + 1;

END WHILE;
END;

CALL sp();

SELECT * FROM temp
ORDER BY 2;

Dieter
Enthusiast

Re: Handling Gaps in Date Ranges.

The prototype BTEQ executes in three minutes and it does indeed include processing all the same data as is being processed by the related cursor in the SP.

We may need to combine a call to a SP to do this, but are still looking at "With Recursive".

I agree that not being able to perform the aggregates is a big problem, but I keep thinking maybe this can occur outside of the box, or temp table.

Your idea may be the only way to do this, and I certainly appreciate the example.

Enthusiast

Re: Handling Gaps in Date Ranges.

Dieter,
I am thinking that If I can just come up with the begin dates of the periods, I could do the math outside the recursive SQL to determine ranges.
Enthusiast

Re: Handling Gaps in Date Ranges.

I believe below approach should be more efficient.

1. get all valid begin dates

create volatile table valid_begin_dates as
(
sel calendar_date d_bgn from sys_calendar.calendar, temp_source t
where calendar_date between t.d_bgn and t.d_end
group by 1
) with data on commit preserve rows;

2. create volatile table final_date_ranges as valid_begin_dates with no data on commit preserve rows;

3. use cursor here loop (sel (max(d_end)- min(d_bgn))/365 +1) times

insert into final_date_ranges
sel min(d_bgn) from valid_begin_dates t1
where t1.d_bgn > (sel add_months(coalesce( max(d_bgn) , date'1900-01-01' ) ,12) -1 from final_date_ranges)

4. sel d_bgn, add_months(d_bgn,12) -1 from final_date_ranges
order by 1;

George
Enthusiast

Re: Handling Gaps in Date Ranges.

Thank you George, you are indeed correct.

Several people were looking at this, and one of our DB2 developers provided a recursive example that worked fine in DB2. The Teradata DBA working on this project was able to translate this approach into a Teradate recursive solution, very much like the example you have provided.

I have installed that in the BTEQ, and have been testing it for a number of days. It works like a champ, and we have avoided any sequential processing. The BTEQ script now executes in about five minutes. Comparing that to the average 36 + hour execution of the stored procedure it replaces, is absolutely amazing.