Database

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-29-2007
11:16 PM

10-29-2007
11:16 PM

Hi,

There is one data demographic - Typical rows per value. At the time of PI selection we consider this demographic.

Is it the number of duplicate value in a column ?.

Please explain .

Regards,

Ayes

There is one data demographic - Typical rows per value. At the time of PI selection we consider this demographic.

Is it the number of duplicate value in a column ?.

Please explain .

Regards,

Ayes

5 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-30-2007
12:15 AM

10-30-2007
12:15 AM

Typical row/value for a column col1, can be explained like this ...

do a

sel col1, count(*) cnt

from mytable

group by col1

;

now check what is the maximum repeated value for cnt (ie the most recurring count value) that's the typical row/value for col1.

for example if the above query gave the following result set,

col1 cnt

12 3

11 2

5 5

30 3

27 3

22 2

21 3

you can see that cnt=3 has 4 records, cnt=2 has 2 records and cnt=5 has only 1 record. So since cnt=3 has maximum number of records (4), the typical rows per value for col1 is 3.

Note that this can get complicated at times... for example what if there were 4 records with cnt=2 ? ... in this case we would stick to the maximum value of cnt, ie it would be still cnt=3 that would be chosen. (worst case scenario).

You can read more about this in detail in the Database design manual, "maximum and typical value frequencies"... there are also case studies provided with examples...

do a

sel col1, count(*) cnt

from mytable

group by col1

;

now check what is the maximum repeated value for cnt (ie the most recurring count value) that's the typical row/value for col1.

for example if the above query gave the following result set,

col1 cnt

12 3

11 2

5 5

30 3

27 3

22 2

21 3

you can see that cnt=3 has 4 records, cnt=2 has 2 records and cnt=5 has only 1 record. So since cnt=3 has maximum number of records (4), the typical rows per value for col1 is 3.

Note that this can get complicated at times... for example what if there were 4 records with cnt=2 ? ... in this case we would stick to the maximum value of cnt, ie it would be still cnt=3 that would be chosen. (worst case scenario).

You can read more about this in detail in the Database design manual, "maximum and typical value frequencies"... there are also case studies provided with examples...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-30-2007
12:41 AM

10-30-2007
12:41 AM

Thanks you so much. It's clear to me now

Regards,

Ayes

Regards,

Ayes

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-30-2007
01:47 AM

10-30-2007
01:47 AM

You can use this value along with "Max rows per value" to see if there are chances of data skew.

If those two value differs by a big margin then there are chances of high data skew which might lead to some issues!!(Hot Amp, Spool space issues etc)

Regards,

Annal T

If those two value differs by a big margin then there are chances of high data skew which might lead to some issues!!(Hot Amp, Spool space issues etc)

Regards,

Annal T

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-30-2007
06:37 AM

10-30-2007
06:37 AM

Thanks for further clarification. Can you please let me know how to calculate maximum row/value?.

Regards,

Ayes

Regards,

Ayes

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

10-30-2007
06:51 AM

10-30-2007
06:51 AM

Its done in the same way.

In the count(*) option select the maximum one.

In the above example you can see that the max rows per value is 5.

Here the typical rows per value and max rows per value dont differ much, so this column might give good distribution.

But suppose there is a value "99" for which we have 1000 records, then it indicates that there will be a a data skew.

Regards,

Annal T

In the count(*) option select the maximum one.

In the above example you can see that the max rows per value is 5.

Here the typical rows per value and max rows per value dont differ much, so this column might give good distribution.

But suppose there is a value "99" for which we have 1000 records, then it indicates that there will be a a data skew.

Regards,

Annal T