Select rows which have at least one relevant change compared to previous row

Database
Fan

Select rows which have at least one relevant change compared to previous row

Hi all,

 

I am working with a table that contains versions of loan contracts. Any change in an attribute leads to a new row/version. However, I am only interested in a part of the attributes. Therefore I would like to compare those interesting attributes with the previous row to see if there is a change. If not, I would like to ignore that version.

 

In the example below, I am only interested in changes to the limit so I want to select rows 2,3 and 4, but not 5 and 6.

I tried to achieve this with QUALIFY ROW_NUMBER() OVER (PARTITION BY Loan_ID, Limit ORDER BY Start_date) = 1,

but because row 2 and 4 are the same, it did not select row 4.

 

Any ideas?

 example loan versions.PNG

 


Accepted Solutions
Apprentice

Re: Select rows which have at least one relevant change compared to previous row

If you also want to see the first row from your data set (where there is no previous row), try:

SELECT a.*
FROM vt1 AS a
QUALIFY contract_limit <> FIRST_VALUE(contract_limit) OVER(ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
  OR FIRST_VALUE(contract_limit) OVER(ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL
ORDER BY 1,2;

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
1 ACCEPTED SOLUTION
8 REPLIES
Apprentice

Re: Select rows which have at least one relevant change compared to previous row

Hi,

Try this:

SELECT a.*
FROM vt1 AS a
QUALIFY contract_limit <> FIRST_VALUE(contract_limit) OVER(ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
ORDER BY 1,2;

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Highlighted
Apprentice

Re: Select rows which have at least one relevant change compared to previous row

The answer above is only looking at the 'limit' column, extend it to look at other columns.

HTH
Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Apprentice

Re: Select rows which have at least one relevant change compared to previous row

If you also want to see the first row from your data set (where there is no previous row), try:

SELECT a.*
FROM vt1 AS a
QUALIFY contract_limit <> FIRST_VALUE(contract_limit) OVER(ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
  OR FIRST_VALUE(contract_limit) OVER(ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL
ORDER BY 1,2;

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Select rows which have at least one relevant change compared to previous row

In your SQL, you need to use your version begin and end dates to insure you are only comparing a new version to the previous version, and not all other versions.  

Apprentice

Re: Select rows which have at least one relevant change compared to previous row

I don't know about 'previous version' the original question was about the previous row (when sorted by start_date) and that is what the "order by start_date" is intended to do.

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Select rows which have at least one relevant change compared to previous row

Well, how do you apply versioningso it will join to other tables?  If you use a version begin and end date range, the previous version should end just prior to the new version beginning (see below). Good luck 

 

acct_id       dw_beg_dt   dw_end_dt

TM6525395   1111-01-01 2007-03-31
TM6525395   2007-04-01 2007-06-30
TM6525395   2007-07-01 2008-07-31
TM6525395   2008-08-01 5555-12-31   

Apprentice

Re: Select rows which have at least one relevant change compared to previous row

Hi,

I quite agree with you, applying a 'version' to each row is common and probably necessary for joins - as you mention.

I was simply trying to respond to the original question without making assumptions about what other columns there might be in the table.

Let's see whether the original problem is solved, and if not we can take it from there.

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Fan

Re: Select rows which have at least one relevant change compared to previous row

Thank you! This answer was the clue I needed.

I had never heard about FIRST_VALUE before. It took me all day and the discovery of LAST_VALUE and IGNORE NULLS to piece together the complete solution but I got it.

Albeit an inefficient one I suspect, using two intermediate tables. Any suggestion how to improve the select below?

CREATE TABLE vcontract_status,FALLBACK,NO BEFORE JOURNAL,NO AFTER JOURNAL(
   Loan_ID     INTEGER NOT NULL  ,Contract_status     INTEGER 
  ,Irrelevant_attribute         CHAR(1)
  ,Contract_status_start_date         DATE  ,Contract_status_end_date            DATE)PRIMARY INDEX( Loan_ID );

INSERT INTO vcontract_status
VALUES (1,10,'A',1170101,1170131);
INSERT INTO vcontract_status
VALUES (1,20,'B',1170201,1170228);
INSERT INTO vcontract_status
VALUES (1,10,'C',1170301,1170331);
INSERT INTO vcontract_status
VALUES (1,10,'D',1170401,NULL);
INSERT INTO vcontract_status
VALUES (22,10,'A',1170101,1170331);
INSERT INTO vcontract_status
VALUES (22,20,'B',1170401,NULL);

CREATE TABLE vcontract_limit,FALLBACK,NO BEFORE JOURNAL,NO AFTER JOURNAL(
   Loan_ID     INTEGER NOT NULL  ,Contract_limit     INTEGER 
  ,Irrelevant_attribute         CHAR(1)
  ,Contract_limit_start_date         DATE  ,Contract_limit_end_date        DATE)PRIMARY INDEX( Loan_ID );

INSERT INTO vcontract_limit
VALUES (1,1000,'X',1170101,1170214);
INSERT INTO vcontract_limit
VALUES (1,2000,'X',1170215,1170218);
INSERT INTO vcontract_limit
VALUES (1,1500,'X',1170219,1170408);
INSERT INTO vcontract_limit
VALUES (1,1500,'Y',1170409,NULL);
INSERT INTO vcontract_limit
VALUES (22,500,'Z',1170409,NULL);

WITH 
cte2 AS(
        SELECT 
        *
        FROM cte1 
        QUALIFY contract_status <> FIRST_VALUE(contract_status) OVER(PARTITION BY  loan_id ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                     OR contract_limit <> FIRST_VALUE(contract_limit) OVER(PARTITION BY  loan_id ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                        OR ROW_NUMBER() OVER(PARTITION BY loan_id ORDER BY start_date) = 1
                     OR (contract_status IS NOT NULL AND FIRST_VALUE(contract_status) OVER(PARTITION BY  loan_id ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL)
                     OR (contract_limit IS NOT NULL AND FIRST_VALUE(contract_limit) OVER(PARTITION BY  loan_id ORDER BY start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL)),cte1 AS(
        SELECT 
                COALESCE(t1.loan_id,t2.loan_id) AS Loan_ID                ,COALESCE(t1.contract_status_start_date,t2.contract_limit_start_date) AS Start_date                ,COALESCE(contract_status,LAST_VALUE(contract_status ignore NULLS) OVER(PARTITION BY COALESCE(t1.loan_id,t2.loan_id) ORDER BY COALESCE(t1.contract_status_start_date,t2.contract_limit_start_date) ROWS BETWEEN 99 PRECEDING AND 1 PRECEDING)) AS contract_status                ,COALESCE(contract_limit,LAST_VALUE(contract_limit ignore NULLS) OVER(PARTITION BY COALESCE(t1.loan_id,t2.loan_id) ORDER BY COALESCE(t1.contract_status_start_date,t2.contract_limit_start_date) ROWS BETWEEN 99 PRECEDING AND 1 PRECEDING)) AS contract_limit
        FROM vcontract_status t1
        FULL OUTER JOIN vcontract_limit t2
                ON t1.loan_id= t2.loan_id AND contract_status_start_date = contract_limit_start_date)SELECT
loan_id,contract_status,contract_limit,start_date        
        ,MAX(start_date) OVER (PARTITION BY Loan_ID ORDER BY  start_date ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)-1  AS end_date
        FROM cte2
ORDER BY 1,4