Please suggest on an approach to use a STJI

Database
Enthusiast

Please suggest on an approach to use a STJI

I have a situation here. I have a very big table (> 50 mn records and will be loaded with 1 mn everyday ) and that

Table is one of the transaction tables we are maintaining in our Data Warehouse. We have 10 such transaction tables

and all share the same PI (MESSAGE_ID). These tables are joined frequently and hence share the same PI to avoid Data

Re-distribution of such huge data volumes across the AMPS.

All the Transaction tables are well distributed apart from this which is the biggest of them all. This tables has

lot of rows stitting on one AMP hence lot of skew.

After the initial analysis I have found that if we add a date column (LOAD_DT) along with the existing PI then it ends

in a close to even distribution but that date column would not be used in the joins with other transaction tables hence resulting in re-distribution of

either of the transaction tables.

I am planning to change the PI of the TABLE to (MESSAGE_ID, LOAD_DT) and create a STJI on the same table with all

the required columns used in the reports and then set the PI of that join index as (MESSAGE_ID) so that they could

still benefit from PI to PI joins with Other TRAN tables. I am still to test this approach and see if the STJI is used

or not but is that right way to go or anybody else has any better suggestion.

Please advice !!

Manik

1 REPLY
Teradata Employee

Re: Please suggest on an approach to use a STJI

First is to figure out why it is skewed. The name MessageId implies something that is faily unique but a fairly unique field should hash and distribute nicely. Since adding Load_Dt to the PI makes it distribute better, it suggests that there are duplicate MessageIds across different load dates. I would look for data quality errors where there are zeros, nulls or something like that coming in every day in the MessagId field.

re you proposed solution: the JI on messageid will distribute exactly as badly as the PI on message ID. The JI may be smaller if it has only a subset of columns but it will be just as skewed. And of course it will be a load time cost to maintain the JI. Probably not worthwhile in this case unless the table has a large number of columns that can be left out of the JI.