The data distribution of table among AMPs is called Skew Factor.....
Generally For Non-Unique PI we get duplicate values so the
more duplicate vales we get more the data have same rowhash
so all the same data will come to same amp, it makes data
One amp will store more data and other amp stores less
amount of data, when we are accessing full table, The amp
which is having more data will take longer time and makes
other amps waiting which leads processing wastage
A skew factor of 0 indicates that the data is perfectly
distributed across all the AMPS
In this situation (unequal distribution of data)we get Skew
For this type of tables we should avoid full table scans
Skew factor is distribution of rows of a table among the
available no.of AMP's.
If your table has a chance of using unique primary index,it
is always better to use UPI which ensures the skew factor
If there is no chance of having unique values column in a
table choose a column as PI(primary index) which has less
duplicate values which inturn results in less skew factor.
That is the data will be distributed almost(not exactly
equal percentage) equally to all AMP's.....
in this situation skew factor is very high 90%
if the table and views contains the same values in all record in 3 out of 7 columns; does it impact the performance?
I was told that having all values the same does tend to lead to skewing. Is that correct statment? Does it make sense to remove those 3 columns with the same values for performance and space reazon?
The table will only be skewed if the Primary Index is based on those columns.
Of course it's strange when there's only a single value in a column, what's the business meaning?
If you drop the column you can add it as a literal to the view-definition. But if there might be other values in future, better apply
COMPRESS 'thisvalue' and it's using almost no space.
Thank you Dieter for your reply!
the columns are - effective_dt, expiration-dt, load_ts and the values in effective_dt and load_ts are identical in all rows. The expiration_dt has the same value in all rows as well. And there is not much of a business value at all. It was design that way before. The question is will it improve performance if those columns are dropped. The entire table is only 1.8 gb. So maybe just leave it alone?
efct_dt Expr_dt load_ts
5/16/2016 12/21/9999 5/16/2016
i just notice a case where a small table is displayed with a high skew by TD Administartor (15.00):
Space by AMP gives 8 lines with almost the whole of 1.3 M lines on one AMP, although a query shows a good distribution on all AMPs (60).
after copy on a new temp table TD Administrator display is correct, i mean identical to the query result: how to explain that ?
It looks like TD Admin had lost information on the first table.