We have a 3NF model implemented in Teradata and I have a table of the following format :
Surrogate Key Cust Id Cust Name X1 X2 X3 X4 Effective Start Date Effective End Date
1 X-Y23 John Smith 11 22 AB CD 05-10-2009 27-02-2010
2 X-Y41 Jane Doe 12 24 VC BS 24-12-2006 31-12-9999
3 X-Y23 John Smith 13 39 XZ SE 28-02-2010 31-12-9999
This is a huge table with millions of records and as you can see stores changes to customer records over a period of time - it is a historical data store.
What should be the PI for this table?
Should it be a NUPI on Cust Id OR
Should it be a UPI on Cust Id and Start Date, End Date with a NUSI on Cust Id
Can I have a PPI here?
if the number of rows per customer is not too high you should keep cust_id as PI.
When most of the queries access the current data (= where end_date = date '9999-12-31') you might partition by
case_n(end_date = date '9999-12-31', no case) to keep all current rows in a single partition. Of course there's overhead when you change end_date.
Btw, in TD13.10 there's the temporal feature to load & query slowly changing dimensions automatically.