Emp Records EmpNo Name ActiveInd StartDt EndDate 10 Sam Y 2007-01-01 2007-12-31 10 Sam N 2008-01-01 2008-09-30
20 Pam Y 2006-01-01 2006-12-31 20 Pam Y 2007-01-01 2007-10-31 20 Pam N 2008-01-01 2008-12-31
30 Ram Y 2007-01-01 2007-12-31 30 Ram Y 2008-01-01 2008-12-31
Result set should be something like this PayDt Payno PayAmt EmpNo Name ActiveInd StartDt EndDate 1)2007-10-01 100 10000 10 Sam Y 2007-01-01 2007-12-31 2)2007-11-01 200 15000 10 Sam Y 2007-01-01 2007-12-31 3)2008-10-10 300 10000 10 Sam N 2008-01-01 2008-09-30
4)2007-12-10 400 25000 20 Pam Y 2007-01-01 2007-10-31 5)2008-01-01 500 20000 20 Pam N 2008-01-01 2008-12-31
6)2006-12-05 600 30000 30 Ram Y 2007-01-01 2007-12-31
In Example 1,2,5 the EmpNo matches between Pay Records and Emp Records and PayDt falls between StartDt and EndDt
In Example 3 the employee was paid on 2008-10-10 although he was fired on 2008-09-30.The above condition would therefore won't work
In Example 4 the employee was paid on 2007-12-10 although he was fired on
In Example 6, the employee was paid a advance on 2006-12-05 and he joined on 2007-01-01.The above condition would therefore won't work.
In short find the span that is a exact match , if one does not exist find the closest min span , if one does not exist look out for closest future span.
Hi Sam, i usually use following apporach for joining on "closest date":
UNION dates from both tables, add a marker to remember from which table is a specific date. Use an OLAP function to look for the latest date from the "second" table. Filter rows from the "first table" using QUALIFY. This results in first.date and exact second.date which are used in a three-way join.
SELECT pay.*, emp.* FROM pay JOIN ( SELECT EmpNo, StartDt, COALESCE ( MAX(CASE WHEN enddate IS NOT NULL THEN startdt END) OVER (PARTITION BY empno ORDER BY startdt, enddate DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) ,MIN(CASE WHEN enddate IS NOT NULL THEN startdt END) OVER (PARTITION BY empno ORDER BY startdt, enddate DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) ) AS MatchedStartDt FROM ( SELECT empno,startdt, enddate FROM emp UNION ALL SELECT empno, paydt, NULL FROM pay ) dt QUALIFY enddate IS NULL ) dt ON pay.empno = dt.empno AND pay.paydt = dt.startdt JOIN emp ON dt.empno = emp.empno AND dt.MatchedStartDt = emp.startdt ORDER BY emp.empno, dt.startdt
If the number of rows per empno is large (or if you got more than one condition) this is much faster than using non-equi joins (resulting in a huge intermediate spool) followed by a QUALIFY on ROW_NUMBER.