Using a PPI column in PI

Data Modeling
Enthusiast

Using a PPI column in PI

Hi All,

I have a table that has PI consisting of Col_A and Col_B. Col_C is a timestamp(6) field which is a PPI.

As per the current data, the table has a skew factor of 75%.

Therefore, I created an identical table, this time with PI consisting of Col_A, Col_B and Col_C, with Col_C again as PPI.

Now, when I loaded the records that are in the existing table to this new one, the skew factor is around 5%.

However, I am not sure if this has any other implications in terms of data processing times or anything else.

Please advise.

Thanks,

Aarsh

Tags (4)
4 REPLIES
Teradata Employee

Re: Using a PPI column in PI

Enthusiast

Re: Using a PPI column in PI

hi

i have  PI as colA and now the skew is 80%. i added one more col to that pi i.e my proposed pi is (colA,colB) AND the skew is 7%. is there any impact on downstream. like my table is  A->B->C->D. I am working on B if changed PI at B is it impact on C & D. How resolve this?

Enthusiast

Re: Using a PPI column in PI

if your downstream queries don't access the new composite PI you'll have full table scans.  what defines the most common access path to the table with colA and colB?  You've got an extreme case of 80% skew and 7% skew.  I'd suspect that your initial PI doesn't define the most common access path because of the data clustering in colA.

build some test cases and see what the explains look like

-cheers

Teradata Employee

Re: Using a PPI column in PI

So table B is second in the join path.  If the joins to Table B do not already include the new index column then there will definitely be an impact - Table B will need to undergo a full table scan if additional WHERE clause conditions exist, and then either it may need to be redistributed to complete the joins to the other tables.  Whether or not the index change impacts the other joins depends on the PI of the other tables, and if they join only on colA, or on colA & colB.

Let's assume that Table C and Table D have different PI's than Table B, both before and after the change.  This will require the row redistribution of one table to complete the join provided the join is on both colA & colB.  If the join is only on colA, then it is possible that both tables in each join will need to be redistributed (if Table B was being redistributed in the old join plan then it will likely once again be redistributed post-change, leaving the rest of the join plan unchanged).  The problem with this is that the redistribution of Table B will be by the old PI column, colA, which will likely cause the spool file created to have that 80% skew factor you're seeing currently.  This can be even more problematic than a skewed table because you will incur the performance hit of the skewed table redistribution every time this join is performed. To make matters worse, if there are no residual WHERE clause conditions that means you could be building your original table with the old PI in spool over and over again.  In this case it would have been better for you to leave the original PI alone and incur the space hit of the skewed index - all other things being equal.

If you do change the PI, and colB is available in the other tables then the joins should be changed to include both columns if they do not already do so.