Most of my SQL experience has been in writing DB2 Stored procedures which processed cursors.
I am working on a TD BTEQ script, and have ran into a problem doing something through set processing, which was not so hard to do when processing one row at a time while looping through cursors within a stored procedure or a COBOL program.
I have a table which lists a series of events associated to an individual employee. These events have a begin date, and an end date. Separate events may have overlaping dates, but even so, the days should only be counted once.
To calculate a combined total of days for these events, the process needs to deterimine a series of one year date ranges based on selecting the earliest begin date as the starting date and extending for a one year period inclusive of any subsequent events occuring in that same one year period.
If that one year calculation period end date falls on a day which is between an event begin and end date, in other words it falls within an active event, the next calculation range begins on the day after the previous calculation range end date.
I have run into a problem when the end date of a calculation period falls on a day between active events, or in other words falls within a gap between events. In this case, the next calculation date range needs to "roll forward" and begin on the date of the next event period.
The table columns are
I_LVE_EVT (Numerical Event ID), I_LVE_PRD (Numerical Period ID), D_BGN (Date this period began), D_END (Date this period ended)
As an example of what I need to do, these are two events, each with several periods, for the same employee. There are some gaps in the period date ranges which are causing me some difficulty.
The first calculation range is based on the begining of the earliest period, and has a duration of one year. The end of that range falls during an active event so the next calculation period begins the day after the first ended, and has a duration of one year.
The end of the second calculation range falls between active periods, so the begin date of the third range "rolls forward" to the begin date of the next event period, and there will also be a gap at the end of the third period resulting in the same sort of adjustment as the begin date of the forth range again has to roll forward to the the next period begin date.
The forth and fifth calculation range end dates fall during active events, so these both begin the day after the end date of the previous range.
Uh oh, this is a hard one. If i hear "cursor", i usually think about OLAP-functions. But this is probably not possible using those (at least not without nesting several levels). Then i thought, "hey this is task for recursion", but Teradata's WITH RECURSIVE is too limited.
So IMHO there's only an SP with a loop doing INSERT/SELECTs into a temp table or (in worst case) a cursor with INSERTs. Which is more efficient depends on the table size, the number of rows processed per employee and any additional columns required in the answer set.
Cursor in Teradata are really worst case, sequential processing in a parallel database, so it's a good ideas to avoid them :-)
Out of curiosity, why do you use a BTEQ script instead of an SP? Without SP it's hard to write loops. And is that 3 minutes with or without that last cursor?
This would much easier in a WITH RECURSIVE, but aggregates are not allowed in there:
CREATE VOLATILE TABLE temp, NO LOG (emp_no INT, d_bgn DATE, d_end DATE, lvl INT) ON COMMIT PRESERVE ROWS;
REPLACE PROCEDURE sp () BEGIN DECLARE lvl INT DEFAULT 1;
INSERT INTO temp SELECT emp_no, MIN(d_bgn) AS x, ADD_MONTHS(MIN(d_bgn),12) -1,:lvl FROM dropme GROUP BY 1;
WHILE ACTIVITY_COUNT > 0 DO INSERT INTO temp SELECT t1.emp_no, CASE WHEN MIN(t2.d_bgn) <= ADD_MONTHS(t1.d_bgn,12) THEN ADD_MONTHS(t1.d_bgn,12) ELSE MIN(t2.d_bgn) END AS x, ADD_MONTHS(x, 12) -1, lvl + 1 FROM temp AS t1 JOIN dropme AS t2 ON t1.emp_no = t2.emp_no AND t2.d_end >= ADD_MONTHS(t1.d_bgn,12) WHERE t1.lvl = :lvl GROUP BY 1,4, t1.d_bgn;
I believe below approach should be more efficient.
1. get all valid begin dates
create volatile table valid_begin_dates as ( sel calendar_date d_bgn from sys_calendar.calendar, temp_source t where calendar_date between t.d_bgn and t.d_end group by 1 ) with data on commit preserve rows;
2. create volatile table final_date_ranges as valid_begin_dates with no data on commit preserve rows;
3. use cursor here loop (sel (max(d_end)- min(d_bgn))/365 +1) times
insert into final_date_ranges sel min(d_bgn) from valid_begin_dates t1 where t1.d_bgn > (sel add_months(coalesce( max(d_bgn) , date'1900-01-01' ) ,12) -1 from final_date_ranges)
4. sel d_bgn, add_months(d_bgn,12) -1 from final_date_ranges order by 1;
Several people were looking at this, and one of our DB2 developers provided a recursive example that worked fine in DB2. The Teradata DBA working on this project was able to translate this approach into a Teradate recursive solution, very much like the example you have provided.
I have installed that in the BTEQ, and have been testing it for a number of days. It works like a champ, and we have avoided any sequential processing. The BTEQ script now executes in about five minutes. Comparing that to the average 36 + hour execution of the stored procedure it replaces, is absolutely amazing.