bahaviour of primary index with two columns


bahaviour of primary index with two columns

scenario: We have two tables: BT for base, and HT for history of the base table. The primary index of BT consist of one column bt_id, HT has all the columns of BT plus a few extra columns: ht_id (which is sequence generated numeric column and is unique for HT), change_type, change_date, etc..There are 3 candidate PI designs for HT:

opt1. HT.PI consists of ht_id only;

opt2. HT.PI consists of bt_id only;

opt3. HT.PI consists of bt_id plus something else e.g. change_type and change_date.

Question, which PI design for HT is better for queries of the pattern select * from BT, HT where BT.bt_id=HT.bt_id, or there are other better options?

Assumptions: both BT and HT are fact+ tables of very large data size; HT can be viewed as a child table of BT.

More theoretically, in teradata, if two tables A, B have similar but not exactly the same PI designs: A.PI=(col1), B.PI=(col1,col2); A.col1 is PK of A; and B.col1 is a FK of A. Then how the data of A,B would be distributed over all AMPs, would the rows of A and B with the same col1 values be hashed into the same AMPs? and as related what kind of PI design of B would best support the frequent query pattern is select * from A,B where A.col1=B.col1 and A.col1 in (...)?

To add, in order for data in A and B to have the same data distribution with respect to col1, is it necessary to let both A and B to have exactly the same PI or it is ok for A.PI=(col1) and B.PI=(col1, something else)?

Thanks in advance

Re: bahaviour of primary index with two columns

opt2. HT.PI consists of bt_id only will be best for your case.

When Two tables are joined on a column,which is primary index of each of the joined table then there is no need to Redistribute / copy the rows from either of the table. This way the Join is faster as it is AMP local.

You need to have exactly the same PI is both tables in order to have same data distribution.


Re: bahaviour of primary index with two columns

Thanks for your quick and clear reply!.
But I still have two further questions:
1. Does teradata doc say explicitly or imply that for two identical tables A,B with only name and PI differences, such that if A.PI=(col1), and B.PI=(col1, col2), or A.PI=(col1, col3), and B.PI=(col1, col2). Then the data distribution for A and B will be different, and how different (near random or not)?
2. Given tables A, B as specified above, does teradata have any to data hashing distribution so that records with the same values of col1 for both A and B to into the same AMPs?

Teradata Employee

Re: bahaviour of primary index with two columns

Teradata calculates a RowHash based on the values of the Primary Index fields, and uses that to determine the AMPs (via hash map "buckets"). The same set of values always has the same RowHash.

Investigate for yourself using Teradata SQL functions: