What is Typical rows per value?

Database
Enthusiast

What is Typical rows per value?

Hi,

There is one data demographic - Typical rows per value. At the time of PI selection we consider this demographic.

Is it the number of duplicate value in a column ?.

Please explain .

Regards,
Ayes
5 REPLIES
Enthusiast

Re: What is Typical rows per value?

Typical row/value for a column col1, can be explained like this ...

do a

sel col1, count(*) cnt
from mytable
group by col1
;

now check what is the maximum repeated value for cnt (ie the most recurring count value) that's the typical row/value for col1.

for example if the above query gave the following result set,

col1 cnt
12 3
11 2
5 5
30 3
27 3
22 2
21 3

you can see that cnt=3 has 4 records, cnt=2 has 2 records and cnt=5 has only 1 record. So since cnt=3 has maximum number of records (4), the typical rows per value for col1 is 3.

Note that this can get complicated at times... for example what if there were 4 records with cnt=2 ? ... in this case we would stick to the maximum value of cnt, ie it would be still cnt=3 that would be chosen. (worst case scenario).

You can read more about this in detail in the Database design manual, "maximum and typical value frequencies"... there are also case studies provided with examples...
Enthusiast

Re: What is Typical rows per value?

Thanks you so much. It's clear to me now

Regards,
Ayes
Enthusiast

Re: What is Typical rows per value?

You can use this value along with "Max rows per value" to see if there are chances of data skew.

If those two value differs by a big margin then there are chances of high data skew which might lead to some issues!!(Hot Amp, Spool space issues etc)

Regards,
Annal T
Enthusiast

Re: What is Typical rows per value?

Thanks for further clarification. Can you please let me know how to calculate maximum row/value?.

Regards,
Ayes
Enthusiast

Re: What is Typical rows per value?

Its done in the same way.

In the count(*) option select the maximum one.

In the above example you can see that the max rows per value is 5.

Here the typical rows per value and max rows per value dont differ much, so this column might give good distribution.

But suppose there is a value "99" for which we have 1000 records, then it indicates that there will be a a data skew.

Regards,
Annal T