Database

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-14-2014
10:02 PM

08-14-2014
10:02 PM

Hi

I was wondering if there was a way to calculate the skew of an existing table with a different choice of PI. I am aware of the hashing functions, such that the following query will show the distribution of rows across the amps based on the new PI, however with a a system with hundreds of AMPs, it would be nice to determine the skew factor value:

SELECT

hashamp(hashbucket(hashrow( new_PI_column_list ))) as ampnum

,count(*)

from <database>.<tablename>

group by 1

order by 2

Cheers

Steven

6 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-15-2014
12:15 AM

08-15-2014
12:15 AM

I am not aware of any dictionary table. However, if it is me, I will write a script which will loop through a table, reading fields and (also composite if need be) and then redirect the output to a file preferably and not table. In this way, an automation script can read for all databases and tables required and provide outputs for all fields or composite fields if required.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-15-2014
05:57 AM

08-15-2014
05:57 AM

Hi Steven,

how do you define the skew factor?

I use this for calculating the percent deviation from average:

SELECT

HASHAMP(HASHBUCKET(HASHROW(col))) AS vproc,

COUNT(*) AS cnt,

100 * (cnt - AVG(cnt) OVER ()) / AVG(cnt) OVER () (DEC(8,2)) AS deviation

FROM tab

GROUP BY 1

And based on the count per AMP you can do the skew calclation:

SELECT

SUM(cnt) AS RowCount

,MAX(SkewedAMP) AS SkewedAMP

-- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP

,MAX(cnt) / NULLIF(AVG(cnt),0) (DEC(5,2)) AS SkewFactor

-- skew factor, between 0 and 99. Same calculation as WinDDI/ TD Administrator

,(100 - (AVG(cnt) / NULLIF(MAX(cnt),0) * 100)) (DEC(3,0)) AS SkewFactor_WINDDI

FROM

(

SELECT

HASHAMP(HASHBUCKET(HASHROW(col))) AS vproc,

COUNT(*) AS cnt,

100 * (cnt - AVG(cnt) OVER ()) / AVG(cnt) OVER () (DEC(8,2)) AS deviation,

CASE WHEN cnt = MAX(cnt) OVER () THEN vproc END AS SkewedAMP

FROM tab

GROUP BY 1

) AS dt

And for big tables you might better use a SAMPLE of a few percent instead of aggregating all rows

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-15-2014
06:28 AM

08-15-2014
06:28 AM

and this works too...:)

select

sum(tallyset.rowtally)

,min(tallyset.rowtally)

,max(tallyset.rowtally)

,avg(tallyset.rowtally)

,100 - (avg(tallyset.rowtally)/max(tallyset.rowtally) *100) as skewfactor

from

(select

hashamp(hashbucket(hashrow(<candidate columns>))) as hashedamp

,count(*) as rowtally

from

<dbname.tablename>

group by 1) as tallyset

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-15-2014
08:41 AM

08-15-2014
08:41 AM

Here is a different approach in case this helps you.

For a given PI of a table the skew factor is based on the count of the values of the records across the amps. Just determine it based on the unique value. SEL PI COLUMNS, count(*) from <databasename>.<tablename> group by PI columns. usually if the count(*) is not distributed evenly then the table is skewed i.e., if the count of a particular value is in the order of some thousands and the minimum of another PI value is in the order of few, then the table is heavily skewed.

Take the new PI columns you wanted to check for, and check the distribution as explained above. if it is even distribution of the counts across the various values, then you can consider it for changing it to the new PI.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

08-17-2014
05:06 PM

08-17-2014
05:06 PM

Thanks everybody for your replies.

Steven

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

06-10-2015
03:03 PM

06-10-2015
03:03 PM

Hi Dieter,

using your 2nd query above, how much of skewfactor and SkewFactor_WINDDI is accpetable ? i have it for a table as 1.50 and 33.

Thanks !

Samir

Copyright © 2004-2015 Teradata Corporation. Your use of this Teradata website is governed by the Privacy Policy and the Terms of Use, including your rights to materials on this website, the rights you grant to your submissions to this website, and your responsibilities regarding your conduct on this website.

The Privacy Policy and Terms of Use for this Teradata website changed effective September 8, 2016.