Data Skew


Data Skew

I noticed that last night during our data warehouse run that some log entries were not occurring towards the end of the run.

The log generated from our controlling linux problem suggested that the load completed to the end but our teradata log table stopped getting entries about 4 stored procedures from the end.

I ran a ske indicator piece of sql that showed our two main log tables with very high skew approx 82%.

Would this be a reason for the log entries not going in?

Re: Data Skew

I think the fix is to backup the current table, redefine it with a more suitable primary index and copy back in the data to the new table.

I think this will redistribute the data across the amps again based on the new primary key. Correct?

Re: Data Skew

Although from what you have mentioned recreating the table with a new PI ( mind it not PK they are not the same in Teradata) should be a viable option, but I would suggest you do the below :

a) Do a data demographics analysis. You can use the Help Stats functionality.
b) Based on the above see if the skew is not because of lots of NULLs in the current PI. I had a similar experience when a table was very much skewed, The problem was a large number of Nulls in the PI columns. This was resulting in a Hot AMP. I filled the Nulls with Dummy value and that promtly removed the skew.
c) Use Index Wizard if you want.

Let me know if these worked.

Re: Data Skew

I've checked the stats.

select hashamp(hashbucket(hashrow(job_id))) ,count(*)
from log_table
group by 1;

11 14
0 32
7 4
13 13122 this is the problem
1 30
5 32
9 236
4 6
2 10
12 12
3 883
10 8
8 30
6 28

I know exactly why this happened. There's a very valid if very stupid explanation.
There is a date field i could use to give a very even partition.
There is also one other column I could use(a description column).

What I really want to know is:
If i drop the table and re-create with a new PI will it automatically rebalance the data across the amps?


Re: Data Skew

dropping the table and recreating with a new PI will result in automatic reallocation of the data across AMPs.


Re: Data Skew