I am doing physical Database design in my project. LDM(3rd normal form relation model) is done and I am in the process of selecting primary index for the tables.
According to my understanding we must consider both Access path(Join processing) and Data distribution while selecting the primary index. I have three types of tables in the LDM.
Reference Tables which store only code values and thier description. Examples are Account type table which contain code value as SA and description as Savings Account. In this scenario primary index selection is straight forward because there is only one column(Account_cd) which will be used to join with other tables and it also satisfies the condition for Data distribution. I can choose the Code value column as primary index
Parent tables like Customer
In this case too the primary index selection is straight forward because Customer_id is ideal for both Data distribution and Access path(Join processing). customer_id will be used as Foreign key in other child tables and the foreign key will be used to join with primary key of the parent table and so it helps in join processing. Customer_cd cannot be considered because as primary index candidate because of the skew factor.
Even though Customer_Addr_id is unique and will ensure Data distribution it wont be used while joining with other tables so we must go for Customer_id as primary index because Customer_Id which is the foreign key will be used to join with its parent table often.
I have below mentioned questions:
1.There will be some skew if we choose Customer_id as primary index because it is not unique. Please let me know how much skew is acceptable.
2.Are primary indexes selected only after source data has been loaded into Teradata staging tables because in most cases source won't be Teradata and we cannot calculate skew factor using dbc.tablesize if data is not in Teradata .Our source data happens to be in SQL Server. I can calculate the skew factor using dbc.tablesize only after the data has been loaded into Terdata staging tables.Please let me know if there is a way to calculate skew factor using number of rows in source(SQL Server).
3.I have seen few tables in which Audit columns(like Batch_id) being included in Primary index. My understanding is that this should not be done because it may gurantee unique distribution but it will create performance issues while joining with other tables . Please let me know if my understanding its correct.
4.Please let me know if there is any kind of analysis to be done during Data profiling of Source data which is necessary for primary index selection.
Please point if there is any gap in my understanding and Thanks in advance.
Thanks and Regards,