Database


10-07-2014 06:05 AM

Hello,

I have a source of transactional data. For a particular primary key I need to compare the current row with the previous row to see whether the current row is an exact duplicate of the data already received. If it is, I can drop the current row (intra-SCD de-duplication).

To do this I'm using the ordered analytical function ROW_NUMBER(). My table is huge, around 50 million rows, so the ROW_NUMBER step takes a very long time and a lot of CPU. The explain plan shows it always does a full-table scan and then redistributes the rows. How can I improve this?

Table structure:

CREATE TABLE SOURCE_DATA
(
  KEY1 VARCHAR(50),
  KEY2 VARCHAR(50),
  KEY3 VARCHAR(50),
  Data_Value VARCHAR(500),
  Transaction_dttm TIMESTAMP(6)
)
PRIMARY INDEX (KEY1, KEY2, KEY3);

ROW_NUMBER syntax:

SELECT
  a.*
 ,ROW_NUMBER() OVER (PARTITION BY key1, key2, key3
                     ORDER BY Transaction_dttm ASC) AS rn
FROM SOURCE_DATA a;

Please suggest...
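ROW_NUMBER alone only numbers the rows; the duplicate drop still needs a comparison against the previous row. As an untested sketch (assuming the duplicate test is on Data_Value alone, and that NULL values need separate handling), the whole intra-SCD drop could be done in one pass with QUALIFY, avoiding a second scan and a self-join:

```sql
-- Sketch: keep a row only when it is the first row of its key, or its
-- Data_Value differs from the immediately preceding row of the same key.
SELECT a.*
FROM SOURCE_DATA a
QUALIFY MIN(Data_Value)
        OVER (PARTITION BY KEY1, KEY2, KEY3
              ORDER BY Transaction_dttm
              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL  -- first row
     OR MIN(Data_Value)
        OVER (PARTITION BY KEY1, KEY2, KEY3
              ORDER BY Transaction_dttm
              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) <> Data_Value;  -- changed
```

This still costs one full scan plus a redistribution by the PARTITION BY columns, but that is hard to avoid when the comparison is ordered by Transaction_dttm.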

7 REPLIES

10-07-2014 07:11 AM

You can try something like this:

SELECT ....., your_field
      ,MAX(your_field) OVER (ORDER BY your_field
                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS m
QUALIFY m < your_field  -- <for test>

10-07-2014 08:34 AM

I can't use that, because the same kind of logic is needed on other tables where I have many data-value columns (around 20), so I would need 20 analytical functions, which makes things much worse.

Let me explain my case with a simple example.

Source Data

Key1  Key2  Data_value1  Data_value2  Transaction_Dttm
1     a     x            y            2014-01-01 01:00:00
1     a     x            b            2014-01-01 02:00:00
1     a     x            b            2014-01-01 03:00:00
1     a     x            y            2014-01-01 04:00:00
2     a     z            z            2014-01-01 05:00:00
2     a     z            z            2014-01-01 06:00:00

Expected Output

Key1  Key2  Data_value1  Data_value2  Valid_From_Dttm      Valid_To
1     a     x            y            2014-01-01 01:00:00  2014-01-01 01:59:59
1     a     x            b            2014-01-01 02:00:00  2014-01-01 03:59:59
1     a     x            y            2014-01-01 04:00:00  9999-12-31 23:59:59
2     a     z            z            2014-01-01 05:00:00  9999-12-31 23:59:59

How can I transform the data like this efficiently?
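One way to sketch this (untested, for TD 13.10 which has no LAG; the `'|'` separator and `'~'` NULL sentinel are assumptions and must not occur in the real data): concatenate all data-value columns into one comparison string so a single pair of window functions handles any number of columns, drop rows whose string matches the previous row, then derive Valid_To from the next surviving row.

```sql
-- Step 1: build one comparison string per row instead of 20 analytic calls.
-- Step 2: QUALIFY keeps the first row per key plus every change.
-- Step 3: Valid_To = next change's Transaction_Dttm - 1 second,
--         or the high date for the currently open row.
SELECT Key1, Key2, Data_value1, Data_value2,
       Transaction_Dttm AS Valid_From_Dttm,
       COALESCE(MIN(Transaction_Dttm)
                OVER (PARTITION BY Key1, Key2
                      ORDER BY Transaction_Dttm
                      ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                - INTERVAL '1' SECOND
               ,TIMESTAMP '9999-12-31 23:59:59') AS Valid_To
FROM
 ( SELECT s.*
         ,COALESCE(Data_value1,'~') || '|' || COALESCE(Data_value2,'~') AS vals
         ,MIN(COALESCE(Data_value1,'~') || '|' || COALESCE(Data_value2,'~'))
          OVER (PARTITION BY Key1, Key2
                ORDER BY Transaction_Dttm
                ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_vals
   FROM table_01 s
   QUALIFY prev_vals IS NULL OR prev_vals <> vals
 ) changed;
```

The point of concatenating is that adding more data-value columns only lengthens the string expression; the number of window functions stays constant.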

10-07-2014 11:48 AM

What is the logic by which you arrive at the Valid_To of the 1st, 2nd, 3rd and 4th output rows?

10-08-2014 09:24 AM

Can you show your current process?

And what's your TD release?

10-15-2014 09:10 AM

Hi Raja,

I compare the current row with the next row and drop the next row if its data values are the same.

If the data values are different, the Valid_To_dttm of the current row is set to the next row's Transaction_dttm minus one second. This is to avoid duplication of data.
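That rule can be expressed directly as a window expression (a sketch; it assumes the duplicate rows have already been dropped, and uses 9999-12-31 23:59:59 as an assumed high date for the open row):

```sql
-- Valid_To = next row's Transaction_dttm - 1 second, else the high date.
COALESCE(MIN(Transaction_dttm)
         OVER (PARTITION BY Key1, Key2
               ORDER BY Transaction_dttm
               ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
         - INTERVAL '1' SECOND
        ,TIMESTAMP '9999-12-31 23:59:59') AS Valid_To_dttm
```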

10-15-2014 09:24 AM

Hi Dieter,

We are on Teradata 13.10.

Currently I create a row number for each record within the same primary key and join the table to itself on primary key + row number, something like this.

Consider this as the source data: Table_01

1  a  x  y  2014-01-01 01:00:00
1  a  x  b  2014-01-01 02:00:00
1  a  x  b  2014-01-01 03:00:00
1  a  x  y  2014-01-01 04:00:00
2  a  z  z  2014-01-01 05:00:00
2  a  z  z  2014-01-01 06:00:00

Step 2: add a row number: table_02

INSERT INTO table_02
SELECT
  a.*
 ,ROW_NUMBER() OVER (PARTITION BY key1, key2
                     ORDER BY Transaction_dttm ASC) AS rownumber
FROM table_01 a;

Key1  Key2  Data_value1  Data_value2  Transaction_Dttm     Rownumber
1     a     x            y            2014-01-01 01:00:00  1
1     a     x            b            2014-01-01 02:00:00  2
1     a     x            b            2014-01-01 03:00:00  3
1     a     x            y            2014-01-01 04:00:00  4
2     a     z            z            2014-01-01 05:00:00  1
2     a     z            z            2014-01-01 06:00:00  2

Step 3: (a LEFT JOIN is needed so the first row of each key survives, and the data_chk filter has to sit outside the derived table)

INSERT INTO table_03
SELECT key1, key2, data_value1, data_value2,
       new_valid_from_dttm, new_valid_to_dttm
FROM
 ( SELECT
     a.key1
    ,a.key2
    ,a.data_value1
    ,a.data_value2
    ,a.transaction_dttm AS new_valid_from_dttm
    ,COALESCE(MAX(a.transaction_dttm)
              OVER (PARTITION BY a.key1, a.key2
                    ORDER BY a.transaction_dttm
                    ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
             ,TIMESTAMP '9999-12-31 23:59:59') AS new_valid_to_dttm
    ,CASE
       WHEN COALESCE(a.data_value1, 'ZZ') = COALESCE(b.data_value1, 'ZZ')
        AND COALESCE(a.data_value2, 'ZZ') = COALESCE(b.data_value2, 'ZZ')
        AND a.rownumber > 1 THEN 0
       ELSE 1
     END AS data_chk
   FROM table_02 a LEFT JOIN table_02 b
     ON  a.key1 = b.key1
     AND a.key2 = b.key2
     AND a.rownumber = b.rownumber + 1
 ) dt
WHERE data_chk = 1;

10-16-2014 11:50 AM

Are you sure this is returning the expected result?

It seems you want to combine multiple consecutive rows with the same values into one, i.e. a slowly changing dimension? This is why there are Temporal Tables :)

You might try the TD_NORMALIZE_MEET table function, which works on Periods, so you must create a Period first:

INSERT INTO table_02
WITH cte AS
(
  SELECT
    Key1
   ,Key2
   ,Data_value1
   ,Data_value2
   ,PERIOD(Transaction_Dttm
          ,COALESCE(MAX(Transaction_Dttm)
                    OVER (PARTITION BY Key1, Key2
                          ORDER BY Transaction_Dttm
                          ROWS BETWEEN 1 FOLLOWING
                                   AND 1 FOLLOWING)
                   ,TIMESTAMP '9999-12-31 23:59:59.999999')
          ) AS pd
  FROM table_01
)
SELECT
  Key1
 ,Key2
 ,Data_value1
 ,Data_value2
 ,BEGIN(pd2)
 ,LAST(pd2)
FROM TABLE (TD_NORMALIZE_MEET
             (NEW VARIANT_TYPE(cte.Key1
                              ,cte.Key2
                              ,cte.Data_value1
                              ,cte.Data_value2)
             ,cte.pd)
            RETURNS (Key1 INT
                    ,Key2 CHAR
                    ,Data_value1 CHAR
                    ,Data_value2 CHAR
                    ,pd2 PERIOD(TIMESTAMP(6))
                    ,Nrm_Count INTEGER)
            HASH BY Key1, Key2, Data_value1, Data_value2
            LOCAL ORDER BY Key1, Key2, Data_value1, Data_value2, pd
           ) dt;
