Database
Enthusiast

## Need help in Optimizing Row_Number/RANK() operations

Hello

I have a source of transactional data. For a given primary key I need to compare the current row with the previous row to see whether the data received in the current row is an exact duplicate. If so, I can drop the current row (intra-SCD).

To do this I'm using the ordered analytical function ROW_NUMBER(). My table is huge, around 50 million rows, so the ROW_NUMBER() step takes a very long time and a lot of CPU. I can see in the explain plan that it always does a full table scan and redistributes the data again. How can I improve this?

Table structure :

```sql
CREATE TABLE SOURCE_DATA
( KEY1             VARCHAR(50)
, KEY2             VARCHAR(50)
, KEY3             VARCHAR(50)
, Data_Value       VARCHAR(500)
, Transaction_dttm TIMESTAMP(6)
) PRIMARY INDEX (KEY1, KEY2, KEY3);
```

Row_Number syntax:

```sql
SELECT a.*
     , ROW_NUMBER() OVER (PARTITION BY key1, key2, key3
                          ORDER BY Transaction_dttm ASC) AS rn
FROM SOURCE_DATA a;
```
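(A sketch of one common single-pass pattern for this kind of de-duplication, assuming the goal is simply to drop rows whose Data_Value equals the previous row's: compare each row with its predecessor via a `ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING` window and filter with QUALIFY, so no intermediate table or self-join is needed. Column names are taken from the table above; the sentinel values are assumptions.)

```sql
-- Sketch: keep a row only if its Data_Value differs from the previous
-- row's Data_Value within the same key; the first row per key is kept
-- because the window returns NULL there. The 'ZZ'/'##' sentinels guard
-- against NULLs and are assumed placeholders, not values from the data.
SELECT a.*
FROM SOURCE_DATA a
QUALIFY COALESCE(Data_Value, 'ZZ')
     <> COALESCE(MIN(Data_Value)
                   OVER (PARTITION BY key1, key2, key3
                         ORDER BY Transaction_dttm
                         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                ,'##');
```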

7 REPLIES
Enthusiast

## Re: Need help in Optimizing Row_Number/RANK() operations

You can try something like this:

```sql
SELECT ....., your_field
     , MAX(your_field) OVER (ORDER BY your_field
                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS m
QUALIFY m < your_field  -- <for test>
```

Enthusiast

## Re: Need help in Optimizing Row_Number/RANK() operations

I can't use that, because the same kind of logic is present in other tables where I have a lot of data-value columns (around 20), so I would need 20 analytical functions, which makes things much worse.

Let me explain my case with a simple example.

Source Data

```
Key1  Key2  Data_value1  Data_value2  Transaction_Dttm
1     a     x            y            2014-01-01 01:00:00
1     a     x            b            2014-01-01 02:00:00
1     a     x            b            2014-01-01 03:00:00
1     a     x            y            2014-01-01 04:00:00
2     a     z            z            2014-01-01 05:00:00
2     a     z            z            2014-01-01 06:00:00
```

Expected Output

```
Key1  Key2  Data_value1  Data_value2  Valid_From_Dttm      Valid_To
1     a     x            y            2014-01-01 01:00:00  2014-01-01 01:59:59
1     a     x            b            2014-01-01 02:00:00  2014-01-01 03:59:59
1     a     x            y            2014-01-01 04:00:00  9999-12-31 23:59:59
2     a     z            z            2014-01-01 05:00:00  9999-12-31 23:59:59
```

How can I efficiently transform the data like this?

Enthusiast

## Re: Need help in Optimizing Row_Number/RANK() operations

What is the logic by which you arrive at Valid_To from the 1st, 2nd, 3rd and 4th rows of the output?

Junior Contributor

## Re: Need help in Optimizing Row_Number/RANK() operations

Can you show your current process?

Enthusiast

## Re: Need help in Optimizing Row_Number/RANK() operations

Hi Raja

I compare the current row with the next row and drop the next row if the data values are the same.

If the data values are different, then the Valid_To_dttm of the current row will be the next row's Transaction_dttm minus 1 second. This is to avoid duplication of data.
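(A sketch of how that rule can be expressed in a single pass, assuming the duplicate rows have already been dropped: take the next row's Transaction_dttm within the key, subtract one second, and fall back to an open-end timestamp when there is no next row. Table and column names follow the examples above; `deduped_data` is an assumed intermediate.)

```sql
-- Sketch: Valid_To = next row's Transaction_Dttm - 1 second,
-- or 9999-12-31 23:59:59 for the last row of each key.
SELECT Key1
     , Key2
     , Data_value1
     , Data_value2
     , Transaction_Dttm AS Valid_From_Dttm
     , COALESCE(MIN(Transaction_Dttm)
                  OVER (PARTITION BY Key1, Key2
                        ORDER BY Transaction_Dttm
                        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                - INTERVAL '1' SECOND
               ,TIMESTAMP '9999-12-31 23:59:59') AS Valid_To_Dttm
FROM deduped_data;  -- assumed: exact duplicates already removed
```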

Enthusiast

## Re: Need help in Optimizing Row_Number/RANK() operations

Hi Dieter

Currently I'm creating a row number for each record of the same primary key and joining the table to itself on primary key + row number, something like this.

Consider this is the source data : Table_01

```
Key1  Key2  Data_value1  Data_value2  Transaction_Dttm
1     a     x            y            2014-01-01 01:00:00
1     a     x            b            2014-01-01 02:00:00
1     a     x            b            2014-01-01 03:00:00
1     a     x            y            2014-01-01 04:00:00
2     a     z            z            2014-01-01 05:00:00
2     a     z            z            2014-01-01 06:00:00
```

Step 2 : Adding row number : table_02

```sql
INSERT INTO table_02
SELECT a.*
     , ROW_NUMBER() OVER (PARTITION BY key1, key2
                          ORDER BY Transaction_dttm ASC) AS rownumber
FROM table_01 a;
```

```
Key1  Key2  Data_value1  Data_value2  Transaction_Dttm     Rownumber
1     a     x            y            2014-01-01 01:00:00  1
1     a     x            b            2014-01-01 02:00:00  2
1     a     x            b            2014-01-01 03:00:00  3
1     a     x            y            2014-01-01 04:00:00  4
2     a     z            z            2014-01-01 05:00:00  1
2     a     z            z            2014-01-01 06:00:00  2
```

Step 03:

```sql
INSERT INTO table_03
SELECT key1
     , key2
     , data_value1
     , data_value2
     , new_valid_from_dttm
     , new_valid_to_dttm
FROM
 ( SELECT a.key1
        , a.key2
        , a.data_value1
        , a.data_value2
        , a.transaction_dttm AS new_valid_from_dttm
        , COALESCE(MAX(a.transaction_dttm)
                     OVER (PARTITION BY a.key1, a.key2
                           ORDER BY a.transaction_dttm
                           ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                  ,TIMESTAMP '9999-12-31 23:59:59') AS new_valid_to_dttm
        , CASE
            WHEN COALESCE(a.data_value1, 'ZZ') = COALESCE(b.data_value1, 'ZZ')
             AND COALESCE(a.data_value2, 'ZZ') = COALESCE(b.data_value2, 'ZZ')
             AND a.rownumber > 1 THEN 0
            ELSE 1
          END AS data_chk
   FROM table_02 a
   LEFT JOIN table_02 b
     ON  a.key1 = b.key1
     AND a.key2 = b.key2
     AND a.rownumber = b.rownumber + 1
 ) dt
WHERE data_chk = 1;
```

Junior Contributor

## Re: Need help in Optimizing Row_Number/RANK() operations

Are you sure that this is returning the expected result?

Seems like you want to combine multiple consecutive rows with the same values into one, a slowly changing dimension? This is what Temporal Tables are for :)

You might try the TD_NORMALIZE_MEET table function, which works on Periods, so you must create a Period first:

```sql
INSERT INTO table_02
WITH cte AS
 (
   SELECT
      Key1
     ,Key2
     ,Data_value1
     ,Data_value2
     ,PERIOD(Transaction_Dttm
            ,COALESCE(MAX(Transaction_Dttm)
                      OVER (PARTITION BY Key1, Key2
                            ORDER BY Transaction_Dttm
                            ROWS BETWEEN 1 FOLLOWING
                                      AND 1 FOLLOWING)
                     ,TIMESTAMP '9999-12-31 23:59:59.999999')
            ) AS pd
   FROM table_01
 )
SELECT
   Key1
  ,Key2
  ,Data_value1
  ,Data_value2
  ,BEGIN(pd2)
  ,LAST(pd2)
FROM TABLE (TD_NORMALIZE_MEET
             (NEW VARIANT_TYPE(cte.Key1
                              ,cte.Key2
                              ,cte.Data_value1
                              ,cte.Data_value2)
             ,cte.pd)
     RETURNS (Key1 INT
             ,Key2 CHAR
             ,Data_value1 CHAR
             ,Data_value2 CHAR
             ,pd2 PERIOD(TIMESTAMP(6))
             ,Nrm_Count INTEGER)
     HASH BY Key1, Key2, Data_value1, Data_value2
     LOCAL ORDER BY Key1, Key2, Data_value1, Data_value2, pd
           ) dt;
```
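(A possibly simpler alternative, sketched under the assumption that you are on a release where the NORMALIZE select modifier is available, roughly Teradata 14.10 and later: NORMALIZE coalesces rows whose Periods meet or overlap and whose other selected column values are identical, without the table-function plumbing.)

```sql
-- Sketch: build the Period as above, then let NORMALIZE merge
-- consecutive rows with identical Key1/Key2/Data_value1/Data_value2.
SELECT NORMALIZE
    Key1
   ,Key2
   ,Data_value1
   ,Data_value2
   ,PERIOD(Transaction_Dttm
          ,COALESCE(MIN(Transaction_Dttm)
                    OVER (PARTITION BY Key1, Key2
                          ORDER BY Transaction_Dttm
                          ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                   ,TIMESTAMP '9999-12-31 23:59:59.999999')) AS pd
FROM table_01;
```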