skew factor

Database
Enthusiast

Re: skew factor

Hi,

The data distribution of table among AMPs is called Skew Factor.....

Generally For Non-Unique PI we get duplicate values so the

more duplicate vales we get more the data have same rowhash

so all the same data will come to same amp, it makes data

distribution inequality,

One amp will store more data and other amp stores less

amount of data, when we are accessing full table, The amp

which is having more data will take longer time and makes

other amps waiting which leads processing wastage

A skew factor of 0 indicates that the data is perfectly

distributed across all the AMPS

In this situation (unequal distribution of data)we get Skew

Factor High........

For this type of tables we should avoid full table scans

Skew factor is distribution of rows of a table among the

available no.of AMP's.

If your table has a chance of using unique primary index,it

is always better to use UPI which ensures the skew factor

around 0%.

If there is no chance of having unique values column in a

table choose a column as PI(primary index) which has less

duplicate values which inturn results in less skew factor.

That is the data will be distributed almost(not exactly

equal percentage) equally to all AMP's.....

ex:

AMP0 AMP1

10000(10%) 9000000(90%)

in this situation skew factor is very high 90%

Re: skew factor

could any one please tell how we choose our primary indexes

Fan

Re: skew factor

Hello,

if the table and views contains the same values in all record in 3 out of 7 columns;  does it impact the performance?

I was told that having all values the same does tend to lead to skewing. Is that correct statment? Does it make sense to remove those 3 columns with the same values for performance and space reazon?

thank you.  

Ambassador

Re: skew factor

The table will only be skewed if the Primary Index is based on those columns.

Of course it's strange when there's only a single value in a column, what's the business meaning?

If you drop the column you can add it as a literal to the view-definition. But if there might be other values in future, better apply COMPRESS 'thisvalue' and it's using almost no space.

Fan

Re: skew factor

Thank you Dieter for your reply!

the columns are - effective_dt, expiration-dt, load_ts and the values in effective_dt and load_ts are identical in all rows. The expiration_dt has the same value in all rows as well. And there is not much of a business value at all. It was design that way before. The question is will it improve performance if those columns are dropped. The entire table is only 1.8 gb. So maybe just leave it alone? 

efct_dt       Expr_dt         load_ts

5/16/2016  12/21/9999   5/16/2016

Ambassador

Re: skew factor

The table will be smaller when you drop those columns (or COMPRESS them), at least a Full Table Scan will be faster.

Fan

Re: skew factor

Thank you Dieter 

Enthusiast

Re: skew factor

A NUPI column with high number of NULL result in high skew factor.

if yes then  why ? as NULL cannot be compared with each other.

Enthusiast

Re: skew factor

This is because of the location of all NULLs on a same (and single) AMP, according to the hash process

Pierre

Enthusiast

Re: skew factor

BTW,

i just notice a case where a small table is displayed with a high skew by TD Administartor (15.00):

Space by AMP gives 8 lines with almost the whole of 1.3 M lines on one AMP, although a query shows a good distribution on all AMPs (60).

after copy on a new temp table TD Administrator display is correct, i mean identical to the query result: how to explain that ?

It looks like TD Admin had lost information on the first table.

Pierre