Loading changed data

Database
Enthusiast

Loading changed data

Hi,

I am loading history data. The requirement is to load only changed data for some column. In below e.g. the comparing column is “COL1”. Col1 value is not changed from source start date 01/02/2007 to 01/05/2007. So only one value is inserted into target table also SOURCE_END_DT is also changed accordingly. (please see below).

I have design a solution. The approach is below –
1) Using csum function first generate the sequence.
2) Inserted first row into a temp table.
3) updating the end date of this record from source table till the value of col1 is same. I am doing this through loop. I have two bteq scripts. One is for insert another one is for update.

The solution is working fine. But the volume of data is very huge (150) and time is very less.

So, I am looking for a solution where the above functionality can be achieved with loops. I am able to get the changed records but I am not able to achieve the right sequence of dates.

Any help will be highly appreciated. Thanks in advance.

Source data

AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT
43872823 0 01/02/2007 01/03/2007
43872823 0 01/03/2007 01/04/2007
43872823 0 01/04/2007 01/05/2007
43872823 0 01/05/2007 01/06/2007
43872823 2.0 01/06/2007 01/07/2007
43872823 0 01/07/2007 01/08/2007
43872823 0 01/08/2007 01/09/2007
43872823 1.75 01/09/2007 01/10/2007
43872823 0 01/10/2007 01/11/2007
43872823 0 01/11/2007 01/12/2007
43872823 0 01/12/2007 01/01/2008

Target data should be as below.

AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT
43872823 0 01/02/2007 01/06/2007
43872823 2.0 01/06/2007 01/09/2007
43872823 1.75 01/10/2007 01/10/2007
43872823 0 01/12/2007 01/01/2008
2 REPLIES
Enthusiast

Re: Loading changed data

Hi,

Alternatively the same thing can be achieved by below four steps. it doesnot use any updates but use 3 temp tables. it will be faster if the index of intermediate tables are chosen wisely.

Step 1. Give a sequence number to each row. For each AGRMNT_ID, the lowest SOURCE_START_DT is given 1 and next row is given 2 and so on. use Rank to achieve this.
RANK () OVER (PARTITION BY AGRMNT_ID ORDER BY SOURCE_START_DT asc ) AS SEQ_NO

You can store this data into a separate table say Table_Rank. The output of this step would be
AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT SEQ_NO

43872823 0 01/02/2007 01/03/2007 1

43872823 0 01/03/2007 01/04/2007 2

43872823 0 01/04/2007 01/05/2007 3

43872823 0 01/05/2007 01/06/2007 4

43872823 2.0 01/06/2007 01/07/2007 5

43872823 0 01/07/2007 01/08/2007 6

43872823 0 01/08/2007 01/09/2007 7

43872823 1.75 01/09/2007 01/10/2007 8

43872823 0 01/10/2007 01/11/2007 9

43872823 0 01/11/2007 01/12/2007 10

43872823 0 01/12/2007 01/01/2008 11

Step 2. In Step 2 you compare the two cosecutive seq_nos from Table_Rank and check if there is any change between consecutive records if there is a change then insert into the temp table else discard rows.

Insert into Table_Temp
(
AGRMNT_ID,
COL1,
SOURCE_START_DT,
SOURCE_END_DT,
SEQ_NO
)
SELECT
T1.AGRMNT_ID,
T1.COL1,
T1.SOURCE_START_DT,
T1.SOURCE_END_DT,
T1.SEQ_NO
FROM
Table_Rank T1
WHERE SEq_No = 1

UNION ALL

SELECT
T1_A.AGRMNT_ID,
T1_A.COL1,
T1_A.SOURCE_START_DT,
T1_A.SOURCE_END_DT,
T1_A.SEQ_NO
FROM
Table_Rank T1_A,
Table_Rank T1_B
WHERE
T1_A.AGRMNT_ID = T1_B.AGRMNT_ID
AND T1_A.SEq_No = T1_B.Seq_No+1
AND COALESCE(T1_A.COL1,-1) <> COALESCE(T1_B.COL1,-1);

The output of this step would be
AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT SEQ_NO

43872823 0 01/02/2007 01/03/2007 1

43872823 2.0 01/06/2007 01/07/2007 5

43872823 0 01/07/2007 01/08/2007 6

43872823 1.75 01/09/2007 01/10/2007 8

43872823 0 01/10/2007 01/11/2007 9

Step 3. As you can see the Seq_No has become non-sequential and we need to make Seq_no again Sequential .

Insert into Table_Rank2
(
AGRMNT_ID,
COL1,
SOURCE_START_DT,
SOURCE_END_DT,
SEQ_NO
)
SELECT
T1.AGRMNT_ID,
T1.COL1,
T1.SOURCE_START_DT,
T1.SOURCE_END_DT,
RANK () OVER (PARTITION BY T1.AGRMNT_ID ORDER BY T1.SOURCE_START_DT asc ) AS SEQ_NO
FROM
Table_Temp T1

The output of this step would be
AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT SEQ_NO

43872823 0 01/02/2007 01/03/2007 1

43872823 2.0 01/06/2007 01/07/2007 2

43872823 0 01/07/2007 01/08/2007 3

43872823 1.75 01/09/2007 01/10/2007 4

43872823 0 01/10/2007 01/11/2007 5

Step 4. In last step we set the Dates correctly by self join and comparing consecutive Seq_No's

Del from Target_table;

Insert into Target_table
(
AGRMNT_ID,
COL1,
SOURCE_START_DT,
SOURCE_END_DT,
SEQ_NO
)
SELECT
T1.AGRMNT_ID,
T1.COL1,
T1.SOURCE_START_DT,
Coalesce(T2.SOURCE_START_DT,T1.SOURCE_END_DT) // Coalesce is used set SOURCE_END_DT for the last record.
FROM
Table_Temp T1
left outer join
Table_Temp T2
on t1.AGRMNT_ID = t2.AGRMNT_ID
and t1.seq_no = t2.Seq_no-1;

The output of this step would be
AGRMNT_ID COL1 SOURCE_START_DT SOURCE_END_DT

43872823 0 01/02/2007 01/06/2007

43872823 2.0 01/06/2007 01/07/2007

43872823 0 01/07/2007 01/09/2007

43872823 1.75 01/09/2007 01/10/2007

43872823 0 01/10/2007 01/11/2007

Hope this helps

Ignore if there are some Typo errors

Cheers,
Novice
Enthusiast

Re: Loading changed data

Thanks for the quick response. Thanks for the solution. This solution gives the first record if the comparing value is same in following months. So in the above example the last rows is-

43872823 0 01/10/2007 01/11/2007

but it should be 43872823 0 01/10/2007 01/01/2008 .

I have developed a similar solution where I am able to get the last row if the comparing value is same in following months (using CSUM). but it was failed in few scenarios.

But this solution is good. End date of last record can be update.

Thanks again.