Hi everyone,
I have a performance related query.
Lets say I have to create a table for future use which will have some 2 billion records.
tables structure
----------------
table_1
(col_a, col_b, col_c) {all columns may carry duplicates}
now in the future two different join operation is going to happen with table_2 and table_3
lets say
table_1 A inner join table_2 B
on(A.col_a = B.col_a)
table_1 A inner join table_2 B
on (A.col_b = B.col_b)
Now my question is - to get the best performance should we declare composite primry index on table_1
like primary_index(col_a,col_b) or should we go like primary_index(col_a) and secondary_index(col_b).
which will give better performance?
Hi,
For the joins described, the composite primary index (col_a, col_b) is useless. It will never be used. A PI is only used in the join when all PI columns are included in the join criteria with equality conditions and those conditions are AND'd together.
But a PI controls distribution as well as helping joins and/or selection. IMHO, for a 2 billion row table you have to get the distribution good enough (not necessarily perfect). The Teradata dbms has other index types (nusi, hash, join) which can then be used for joins and/or selection.
Have a look at Physical Design - Primary Index and Join Planning.
HTH
Dave
Thanks Dave.