Need help in Optimizing Row_Number/RANK() operations

Enthusiast

Hello

I have a source of transactional data. For a given primary key I need to compare the current row with the previous row to see whether the data received in the current row is an exact duplicate. If so, I can drop the current row (intra-SCD).

To do this I'm using the ordered analytical function ROW_NUMBER(). My table is huge, around 50 million rows, so the ROW_NUMBER() step takes a very long time and a lot of CPU. I can see in the explain plan that it always does a full table scan and redistributes the rows. How can I improve this?

Table structure:

CREATE TABLE SOURCE_DATA
(
  KEY1 VARCHAR(50),
  KEY2 VARCHAR(50),
  KEY3 VARCHAR(50),
  Data_Value VARCHAR(500),
  Transaction_dttm TIMESTAMP(6)
)
PRIMARY INDEX (KEY1, KEY2, KEY3);


ROW_NUMBER() syntax:

SELECT
  a.*,
  ROW_NUMBER() OVER (PARTITION BY key1, key2, key3
                     ORDER BY Transaction_dttm ASC) AS rn
FROM SOURCE_DATA a;

Please suggest...

7 REPLIES
Enthusiast

Re: Need help in Optimizing Row_Number/RANK() operations

You can try something like this:

SELECT ....., your_field,
       MAX(your_field) OVER (ORDER BY your_field
                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS m
FROM your_table
QUALIFY m < your_field;   -- for test
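
Fleshing that idea out against the SOURCE_DATA table from the question, a minimal sketch could look like the following; the '<first>' sentinel for rows that have no predecessor is an assumption:

-- Sketch: keep a row only when its Data_Value differs from the previous
-- row's value within the same key ('<first>' marks rows with no predecessor).
SELECT key1, key2, key3, Data_Value, Transaction_dttm
FROM SOURCE_DATA
QUALIFY COALESCE(MAX(Data_Value)
                   OVER (PARTITION BY key1, key2, key3
                         ORDER BY Transaction_dttm
                         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                 '<first>') <> Data_Value;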

Enthusiast

Re: Need help in Optimizing Row_Number/RANK() operations

I can't use that because the same kind of logic exists in other tables where I have a lot of data-value columns (around 20), so I would need 20 analytical functions, which makes things much worse.
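
As an aside, one possible way around needing 20 separate analytic calls (a sketch, not something suggested in this thread) is to compare a single concatenated fingerprint of all the value columns with its previous-row value. The '~' delimiter and the '<null>'/'<first>' sentinels are assumptions, and the table and column names follow the sample data just below (the table is called table_01 later in the thread):

-- One analytic call over a concatenated fingerprint instead of 20;
-- assumes '~' never occurs inside the data values themselves.
SELECT key1, key2, data_value1, data_value2, transaction_dttm
FROM
 (
   SELECT t.*,
          COALESCE(data_value1, '<null>') || '~' ||
          COALESCE(data_value2, '<null>') AS fingerprint
   FROM table_01 t
 ) f
QUALIFY fingerprint
        <> COALESCE(MAX(fingerprint)
                      OVER (PARTITION BY key1, key2
                            ORDER BY transaction_dttm
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                    '<first>');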

Let me explain my case with a simple example.

Source data:

Key1  Key2  Data_value1  Data_value2  Transaction_Dttm
1     a     x            y            2014-01-01 01:00:00
1     a     x            b            2014-01-01 02:00:00
1     a     x            b            2014-01-01 03:00:00
1     a     x            y            2014-01-01 04:00:00
2     a     z            z            2014-01-01 05:00:00
2     a     z            z            2014-01-01 06:00:00


 

Expected output:

Key1  Key2  Data_value1  Data_value2  Valid_From_Dttm      Valid_To
1     a     x            y            2014-01-01 01:00:00  2014-01-01 01:59:59
1     a     x            b            2014-01-01 02:00:00  2014-01-01 03:59:59
1     a     x            y            2014-01-01 04:00:00  9999-12-31 23:59:59
2     a     z            z            2014-01-01 05:00:00  9999-12-31 23:59:59


How can I transform the data like this efficiently?

Enthusiast

Re: Need help in Optimizing Row_Number/RANK() operations

What is the logic by which you arrive at the Valid_To values in the 1st, 2nd, 3rd, and 4th rows of the output?

Junior Contributor

Re: Need help in Optimizing Row_Number/RANK() operations

Can you show your current process?

And what's your TD release?

Enthusiast

Re: Need help in Optimizing Row_Number/RANK() operations

Hi Raja

I compare the current row with the next row and drop the next row if the data values are the same.

If the data values are different, then the Valid_To_dttm of the current row becomes the next row's Transaction_dttm minus 1 second. This is to avoid overlapping validity periods.
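
Expressed as a window expression, that rule might look like the sketch below; Teradata 13.10 has no LEAD as far as I know, so MIN over the single following row stands in for it:

-- Valid_To sketch: the next row's Transaction_dttm minus one second,
-- or the high date when there is no next row; drop into a SELECT list.
COALESCE(MIN(transaction_dttm)
           OVER (PARTITION BY key1, key2
                 ORDER BY transaction_dttm
                 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
         - INTERVAL '1' SECOND,
         TIMESTAMP '9999-12-31 23:59:59') AS valid_to_dttm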

Enthusiast

Re: Need help in Optimizing Row_Number/RANK() operations

Hi Dieter

We are on Teradata 13.10.

Currently I'm generating a row number for each record within the same primary key and joining the table to itself on primary key + row number, something like this.

Consider this as the source data: table_01

Key1  Key2  Data_value1  Data_value2  Transaction_Dttm
1     a     x            y            2014-01-01 01:00:00
1     a     x            b            2014-01-01 02:00:00
1     a     x            b            2014-01-01 03:00:00
1     a     x            y            2014-01-01 04:00:00
2     a     z            z            2014-01-01 05:00:00
2     a     z            z            2014-01-01 06:00:00

 

Step 2: Adding a row number: table_02

INSERT INTO table_02
SELECT
  a.*,
  ROW_NUMBER() OVER (PARTITION BY key1, key2
                     ORDER BY Transaction_dttm ASC) AS rownumber
FROM table_01 a;

Key1  Key2  Data_value1  Data_value2  Transaction_Dttm     Rownumber
1     a     x            y            2014-01-01 01:00:00  1
1     a     x            b            2014-01-01 02:00:00  2
1     a     x            b            2014-01-01 03:00:00  3
1     a     x            y            2014-01-01 04:00:00  4
2     a     z            z            2014-01-01 05:00:00  1
2     a     z            z            2014-01-01 06:00:00  2

Step 3:

INSERT INTO table_03
SELECT
  a.key1,
  a.key2,
  a.data_value1,
  a.data_value2,
  a.transaction_dttm AS new_valid_from_dttm,
  COALESCE(MAX(a.transaction_dttm)
             OVER (PARTITION BY a.key1, a.key2
                   ORDER BY a.transaction_dttm
                   ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING),
           TIMESTAMP '9999-12-31 23:59:59') AS new_valid_to_dttm,
  CASE
    WHEN COALESCE(a.data_value1, 'ZZ') = COALESCE(b.data_value1, 'ZZ')
     AND COALESCE(a.data_value2, 'ZZ') = COALESCE(b.data_value2, 'ZZ')
     AND a.rownumber > 1 THEN 0
    ELSE 1
  END AS data_chk
FROM table_02 a
LEFT JOIN table_02 b
  ON  a.key1 = b.key1
  AND a.key2 = b.key2
  AND a.rownumber = b.rownumber + 1
WHERE data_chk = 1;
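
For comparison, combining the previous-row fingerprint test and the Valid_To expression sketched earlier in the thread gives a possible single-statement version that avoids materializing table_02 and the self-join entirely. This is only a sketch, not the thread's accepted answer; the '~' delimiter and the '<first>' sentinel are assumptions:

-- Sketch: dedup consecutive rows and derive the validity window in one pass.
SELECT
  key1,
  key2,
  data_value1,
  data_value2,
  transaction_dttm AS new_valid_from_dttm,
  COALESCE(MIN(transaction_dttm)
             OVER (PARTITION BY key1, key2
                   ORDER BY transaction_dttm
                   ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
           - INTERVAL '1' SECOND,
           TIMESTAMP '9999-12-31 23:59:59') AS new_valid_to_dttm
FROM
 (
   -- keep a row only when it differs from the immediately preceding row
   SELECT key1, key2, data_value1, data_value2, transaction_dttm
   FROM table_01
   QUALIFY COALESCE(data_value1, 'ZZ') || '~' || COALESCE(data_value2, 'ZZ')
           <> COALESCE(MAX(COALESCE(data_value1, 'ZZ') || '~' ||
                           COALESCE(data_value2, 'ZZ'))
                         OVER (PARTITION BY key1, key2
                               ORDER BY transaction_dttm
                               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                       '<first>')
 ) changed_rows;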
Junior Contributor

Re: Need help in Optimizing Row_Number/RANK() operations

Are you sure that this is returning the expected result?

Seems like you want to combine multiple consecutive rows with the same values into one, a slowly changing dimension? This is why there are Temporal Tables :)

You might try to utilize the TD_NORMALIZE_MEET function which works on Periods, so you must create a Period first:

INSERT INTO table_02
WITH cte AS
 (
   SELECT
     Key1,
     Key2,
     Data_value1,
     Data_value2,
     PERIOD(Transaction_Dttm,
            COALESCE(MAX(Transaction_Dttm)
                       OVER (PARTITION BY Key1, Key2
                             ORDER BY Transaction_Dttm
                             ROWS BETWEEN 1 FOLLOWING
                                      AND 1 FOLLOWING),
                     TIMESTAMP '9999-12-31 23:59:59.999999')
           ) AS pd
   FROM table_01
 )
SELECT
  Key1,
  Key2,
  Data_value1,
  Data_value2,
  BEGIN(pd2),
  LAST(pd2)
FROM TABLE (TD_NORMALIZE_MEET
             (NEW VARIANT_TYPE(cte.Key1,
                               cte.Key2,
                               cte.Data_value1,
                               cte.Data_value2),
              cte.pd)
            RETURNS (Key1 INT,
                     Key2 CHAR,
                     Data_value1 CHAR,
                     Data_value2 CHAR,
                     pd2 PERIOD(TIMESTAMP(6)),
                     Nrm_Count INTEGER)
            HASH BY Key1, Key2, Data_value1, Data_value2
            LOCAL ORDER BY Key1, Key2, Data_value1, Data_value2, pd
           ) AS dt;
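
If I read the documentation right, TD_NORMALIZE_MEET merges rows whose periods meet (are directly adjacent) and whose VARIANT_TYPE columns are identical, so the consecutive duplicates collapse into a single row per validity period; the trailing Nrm_Count column reports how many input rows were combined into each output row.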