Database


01-25-2014 07:13 AM

Hello everybody,

I have a table with over 20 billion records for now, and it gets bigger every day.

So I have a problem generating the surrogate key (skey), because it takes too much time.

I use ROW_NUMBER() + SEL MAX_ID logic, but even the SEL MAX_ID statement takes over 20 minutes.

For such a big table, should I use some other logic to create the skey?

Thanks.
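The ROW_NUMBER() + SEL MAX_ID pattern referred to above presumably looks something like the following sketch; the table and column names are illustrative, not from the original post:

```sql
-- Hypothetical sketch of the ROW_NUMBER() + MAX(skey) pattern described
-- above; target_table, staging_table and natural_key are illustrative names.
INSERT INTO target_table (skey, natural_key)
SELECT
    -- Scanning the whole target for MAX(skey) is the slow step on a
    -- 20-billion-row table.
    (SELECT COALESCE(MAX(skey), 0) FROM target_table)
        + ROW_NUMBER() OVER (ORDER BY s.natural_key) AS skey,
    s.natural_key
FROM staging_table s;
```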

13 REPLIES


01-25-2014 07:56 AM

I used the same method when loading from staging to target. I think you use staging too. Do you? Please let me know.

I feel that one way or the other, you have to pay that cost. The one thing I can think of is running the job during a low-traffic time (maybe at night).

Cheers,


01-25-2014 08:05 AM

First I create the skey in staging, then I load the data to the target using that skey.

What do you mean by "one way or the other"? Can you tell me more, please?

Thanks.


01-26-2014 10:52 PM

So yes, it is the same concept we used. I think doing it this way is better than using a sequence or identity number. There may be smarter ways of doing it, if someone can share.

Cheers,


01-27-2014 02:52 AM

Hi,

To my knowledge, there are three ways to generate an skey within Teradata:

1. Identity column, 2. Hashing algorithm, and 3. Analytical function.

The third option is the most effective. We can use either ROW_NUMBER() or CSUM() to generate the skey, but as the table grows, so will the processing time.

Hopefully a new TD version will provide a sequence generator option like Oracle's.

However, you can also try generating the skey through an ETL/ELT tool.

I agree with Raja: if the experts here can share their experience of a smarter way, that would be great.

Thanking you,

Santanu
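For reference, option 1 (an identity column) might look like the following sketch; the table and column names are illustrative, not from the thread:

```sql
-- Hypothetical Teradata DDL sketch for option 1 (identity column);
-- table and column names are illustrative.
CREATE TABLE target_table (
    skey BIGINT GENERATED ALWAYS AS IDENTITY
        (START WITH 1 INCREMENT BY 1 MINVALUE 1 NO CYCLE),
    natural_key VARCHAR(100) NOT NULL
) PRIMARY INDEX (natural_key);
```

Note that Teradata hands out identity values to each AMP in batches, so the generated skeys are unique but not guaranteed to be gap-free or in insert order.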


01-28-2014 02:38 AM

Hi,

The third one is discussed in detail with an explanation from an expert:

Regarding a sequence generator option like Oracle's: it has loopholes too. If you use currval, nextval, etc., a rerun can mess up your data and force you to go in for corrections.

Yes, in ETL tools: Ab Initio (next_in_sequence etc.), Informatica (sequence generator), SAP DS (key_generation), etc.

Cheers,


01-28-2014 09:19 AM

How about storing the max value in a separate table instead of calculating it every time?

For example, let table sk_seed have two columns (tablename varchar(128), next_sk bigint), with tablename as the PI. Pseudocode for generating the surrogate keys would then be something like:

```sql
BT;

-- Lock the seed row for write in case of concurrent usage
UPDATE sk_seed SET next_sk = next_sk + 0 WHERE tablename = 'staging_table';

-- Generate and use surrogate keys
SELECT (SELECT next_sk FROM sk_seed WHERE tablename = 'staging_table')
       + ROW_NUMBER() OVER (ORDER BY natural_key) AS surrogate_key
FROM staging_table;

-- Update next_sk for the next iteration
UPDATE sk_seed
SET next_sk = next_sk + (SELECT COUNT(*) FROM staging_table)
WHERE tablename = 'staging_table';

ET;
```

One downside of this approach: if the surrogate-key generation logic isn't centralized to use the logic above, the sk_seed table may get out of sync and you'll get duplicate surrogate keys.


01-31-2014 08:47 PM

Is there any compulsion to use ROW_NUMBER?

In my experience, CSUM and ROW_NUMBER can slow the response time down considerably.

For smaller tables they work fine, but for larger tables I prefer using the following logic (and am getting much better results):

```sql
SELECT
    SUM(1) OVER (ROWS UNBOUNDED PRECEDING) + AliasMAX.MAX_VALUE AS SK
    ,Col1
    ,Col2
FROM LRD.Table1 LRD
LEFT OUTER JOIN
    EDW.Table1 EDW
ON
    EDW.ID = LRD.ID
,
(
    SELECT COALESCE(MAX(ID), 0) AS MAX_VALUE
    FROM EDW.Table1
) AliasMAX
```

HTH!


02-10-2014 07:10 AM

Hi Adeel,

did you check DBQL for resources used?

SUM(1) or COUNT(*) OVER (ROWS UNBOUNDED PRECEDING), without a PARTITION BY, should be worse than ROW_NUMBER: there will be an implicit ORDER BY over **all** columns, while ROW_NUMBER can sort by fewer than all columns, so SUM should consume more spool and more CPU.

If there is a PARTITION BY, then SUM actually doesn't sort, so it is faster than ROW_NUMBER.


02-18-2014 09:37 AM

Indeed, Dieter. I wrote the SQL on the fly and missed the PARTITION BY in there. Thanks! :)

Copyright © 2004-2015 Teradata Corporation. Your use of this Teradata website is governed by the Privacy Policy and the Terms of Use, including your rights to materials on this website, the rights you grant to your submissions to this website, and your responsibilities regarding your conduct on this website.
