Should a SCD table contain a UPI including date fields or use a natural key?

Database
Enthusiast

Should a SCD table contain a UPI including date fields or use a natural key?

At my new company we rarely use UPI and instead generally use a PI on 1 column and use a "Natural Key".  IE:

TABLE: acct_id, acct_typ, start_dt, end_dt, cur_in, rcv_in, metric1, metric2, metric3

With a primary index of acct_id

I would use a UPI on (acct_id, acct_typ, start_dt, end_dt) and my join would look something like:

JOIN ON a.acct_id = b.acct_id and a.actvy_dt between b.start_dt and b.end_dt

The reason that I was told we do not use UPI is that it is a SCD table and that you should not be joining on the date.  My thought is why would you have an SCD table if your not going to use historical records which would require you to use the dates in the table.  This is also my first experience with using a Natural Key.  Is that a common pratice?

Is there something I am missing here or should we be using an UPI instead of a PI on just acct_id?

In the example above curr_in flags the active record and rcv_in flags that we recieved the data from the source system.  It also seems odd that we are using rcv_in because if the record is deleted from the source wouldn't we wanted it gone from our DW?

Thank you for the opinions!

Don

2 REPLIES
Enthusiast

Re: Should a SCD table contain a UPI including date fields or use a natural key?

Tables that have historical records may always not be used with condition on Date columns. In most cases we would be interested only in the latest active record; While rarely a retrofit run (back dated run) may need to use the history data.

The issue with using UPI which includes Date fields also, is while Joining with other tables you need to use all the columns including the Date fields in the JOIN condition to get optimum performance. So PI on acct_id will be better, since we expect multiple joins in the Fact-Dimension model.

Having one of the fields of PI that is NULL-able (end_dt) is also not optimal. If possible have all columns as NOT NULL in PI

Enthusiast

Re: Should a SCD table contain a UPI including date fields or use a natural key?

I understand that logic but if the table is only going to be used to pull current information what is the intention of leaving expired records?  It seems to me that if you have a SCD table that it should be designed in a waythat allows for fast querying of historical data as well.

Is there any way to account for both?