scenario: We have two tables: BT for base, and HT for history of the base table. The primary index of BT consist of one column bt_id, HT has all the columns of BT plus a few extra columns: ht_id (which is sequence generated numeric column and is unique for HT), change_type, change_date, etc..There are 3 candidate PI designs for HT:
opt1. HT.PI consists of ht_id only;
opt2. HT.PI consists of bt_id only;
opt3. HT.PI consists of bt_id plus something else e.g. change_type and change_date.
Question, which PI design for HT is better for queries of the pattern select * from BT, HT where BT.bt_id=HT.bt_id, or there are other better options?
Assumptions: both BT and HT are fact+ tables of very large data size; HT can be viewed as a child table of BT.
More theoretically, in teradata, if two tables A, B have similar but not exactly the same PI designs: A.PI=(col1), B.PI=(col1,col2); A.col1 is PK of A; and B.col1 is a FK of A. Then how the data of A,B would be distributed over all AMPs, would the rows of A and B with the same col1 values be hashed into the same AMPs? and as related what kind of PI design of B would best support the frequent query pattern is select * from A,B where A.col1=B.col1 and A.col1 in (...)?
To add, in order for data in A and B to have the same data distribution with respect to col1, is it necessary to let both A and B to have exactly the same PI or it is ok for A.PI=(col1) and B.PI=(col1, something else)?
opt2. HT.PI consists of bt_id only will be best for your case.
When Two tables are joined on a column,which is primary index of each of the joined table then there is no need to Redistribute / copy the rows from either of the table. This way the Join is faster as it is AMP local.
You need to have exactly the same PI is both tables in order to have same data distribution.
Thanks for your quick and clear reply!. But I still have two further questions: 1. Does teradata doc say explicitly or imply that for two identical tables A,B with only name and PI differences, such that if A.PI=(col1), and B.PI=(col1, col2), or A.PI=(col1, col3), and B.PI=(col1, col2). Then the data distribution for A and B will be different, and how different (near random or not)? 2. Given tables A, B as specified above, does teradata have any to data hashing distribution so that records with the same values of col1 for both A and B to into the same AMPs?