UDA

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-31-2008
07:08 PM

03-31-2008
07:08 PM

We've a SET table defined with a UPI that has 9.3 Million rows on a 1400 AMP, V2R6 Teradata system. Please look at the numbers below. The AMP that has the MAX rows has 3.47 times more rows than the AMP with the MIN rows. This does not make sense to me given the fact that we've a UPI and a high volume of rows in the table for an even distribution.

xTimesGreaterMaxAvg - 1.56

xTimesGreaterMaxMin - 3.47

Mx - 1393664

Avrg - 890657.07

Mn - 401920

CurPerm - 1282546176

Can someone please shed some light on this? Is there something we can look at to improve data distribution? Any help would be greatly appreciated.

PS: The table also has a PPI defined on a date field which is part of the UPI (if this matters).

Thanks,

Sayee.

xTimesGreaterMaxAvg - 1.56

xTimesGreaterMaxMin - 3.47

Mx - 1393664

Avrg - 890657.07

Mn - 401920

CurPerm - 1282546176

Can someone please shed some light on this? Is there something we can look at to improve data distribution? Any help would be greatly appreciated.

PS: The table also has a PPI defined on a date field which is part of the UPI (if this matters).

Thanks,

Sayee.

3 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-01-2008
08:49 AM

04-01-2008
08:49 AM

Hi Sayee,

most of the skew is due to the high number of AMPs:

There are 65536 entries within the hashmap, so the average number of entries per AMP is:

SELECT 2**16/1400 -> 46,81

Of course there's no .81 entry, so some AMPs will get 46 and some 47:

SELECT 47/46.0000 -> 1,0217 -> 2.17 percent more data even if there's a UPI.

Now the table is still small (for a 1400 AMP system), 6642 rows per AMP in average, this is probably not enough for a perfectly even distribution. Now add this small skew to the 2.17 percent...

Dieter

most of the skew is due to the high number of AMPs:

There are 65536 entries within the hashmap, so the average number of entries per AMP is:

SELECT 2**16/1400 -> 46,81

Of course there's no .81 entry, so some AMPs will get 46 and some 47:

SELECT 47/46.0000 -> 1,0217 -> 2.17 percent more data even if there's a UPI.

Now the table is still small (for a 1400 AMP system), 6642 rows per AMP in average, this is probably not enough for a perfectly even distribution. Now add this small skew to the 2.17 percent...

Dieter

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-08-2008
02:15 PM

04-08-2008
02:15 PM

Thank you, Dieter! That was pretty good information. I never would have thought 9MM rows was not enough volume. :-)

We've more than 1 day's data in those tables now and we're seeing much better numbers (distribution).

We've more than 1 day's data in those tables now and we're seeing much better numbers (distribution).

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-09-2016
06:18 AM

04-09-2016
06:18 AM

Hi,

You can get into troubles when values of your UPI are not distributed uniformly. Following example generates rows with such UPI values. All 100 rows of SKEWED_UPI_TABLE generated by SKEWED_UPI_VIEW belong to AMP number 1.

It is clear that this is very unlikely in the real world, but one cannot rule it out. ;-)

Martin

CREATE RECURSIVE VIEW SKEWED_UPI_VIEW (N, AMP_NO, AMP_1_CNT)

AS

(

SELECT t.*

FROM

(

SELECT

CAST(0 AS BIGINT) AS n_

, HASHAMP(HASHBUCKET(HASHROW(n_))) AS amp_no_

, CASE amp_no_ WHEN 1 THEN 1 ELSE 0 END AS amp_1_cnt_

) AS t

UNION ALL

SELECT

X.N + 1 AS N_

, HASHAMP(HASHBUCKET(HASHROW(N_))) AS AMP_NO_

, X.AMP_1_CNT + CASE AMP_NO_ WHEN 1 THEN 1 ELSE 0 END

FROM SKEWED_UPI_VIEW AS X

WHERE X.AMP_1_CNT < 100

)

;

CREATE MULTISET TABLE SKEWED_UPI_TABLE (N BIGINT)

UNIQUE PRIMARY INDEX (N)

;

INSERT INTO SKEWED_UPI_TABLE (N)

SELECT N

FROM SKEWED_UPI_VIEW

WHERE AMP_NO = 1

;

SELECT n, HASHAMP(HASHBUCKET(HASHROW(n))) AS amp_no

FROM SKEWED_UPI_TABLE

;

Copyright © 2004-2015 Teradata Corporation. Your use of this Teradata website is governed by the Privacy Policy and the Terms of Use, including your rights to materials on this website, the rights you grant to your submissions to this website, and your responsibilities regarding your conduct on this website.

The Privacy Policy and Terms of Use for this Teradata website changed effective September 8, 2016.