Do we need an intelligent key for each subject area

Data Modeling

Do we need an intelligent key for each subject area

Hi,

Can anyone please suggest if I need an intelligent key for each subject area? What are the pros and cons of  implementing for each and every subject area?

Thanks and regards,

Raja

6 REPLIES
Teradata Employee

Re: Do we need an intelligent key for each subject area

Raja, Help me understand your question before I attempt a response.  When you say "subject area" are you refering to a Teradata industry model subject?  If not, please explain.  When you say intelligent key do you mean a natural key?

Teradata Employee

Re: Do we need an intelligent key for each subject area

Hi Raja, I going to risk an assumption and suppose that by "subject area" you mean each major subject of the model.  For example, in the Teradata Healthcare model we have 13 major subject areas.  Each major subject has a central theme that serves well as a Primary Index for most entities in the subject area.  e.g. In the Clinical subject the central theme is a clinical encounter, thus a good PK on most entities is Encounter Id.  I will also assume the meaning of your use of "intelligent key" corresponds to my use of "natural key" as opposed to a "surrogate key".  And also, any performance aspects of this answer are from a Teradata database viewpoint.

So to rephrase your question into an example... Would I use an natural key for Encounter Id?  Yes I would.  Here are some conditions:

1. Among the candidate keys there is a suitable numeric encounter number available.  Numeric data types provide the most compact and fast keys.  Our logical models use data type of INTEGER.

2. The encounter number would often be used for retrieval.

3. That encounter number provides a good distribution and uniqueness.

4. You don't foresee a situation where another source system will generate colliding keys.  These situations are often unforeseen and can be due to mergers and acquisitions.

To further address your question... Would I use a natural key for all subject areas?  The short answer "Yes".  The careful answer is that it would depend on the candidate keys available (see above).  The answer by subject area depends on the candidate keys available for each subject area.  Teradata Solution Methodology (TSM) provides excellent guidance for selecting a good PI from candidate keys.  Most consultants I know at Teradata have no problem using a natural key when a good one is available.  Many prefer a natural key... as do I.

The following are some links that address the pro and con in detail:

http://www.tdan.com/view-articles/5178

http://www.agiledata.org/essays/keys.html

Points for Surrogate keys

http://www.teradataforum.com/teradata/20111201_103612.htm

About surrogate key generation techniques

http://developer.teradata.com/blog/mtmoura/2011/12/lets-talk-about-surrogate-key-generation

Re: Do we need an intelligent key for each subject area

Hi Arnett,

Yes I am referring to the subject areas of Teradata if there are areas where we need Intelligent key. Also, when I am referring to the intelligent key it is general.

I know natural keys, business keys, Surrogate key etc. I see the importance of intelligent key. However, I am looking for cases, where intelligent key is indisputably unavoidable.

Intelligent key is not natural key.

Thanks and regards,

Raja

Teradata Employee

Re: Do we need an intelligent key for each subject area

Raja,

I cannot offer any indisputable case for an intelligent key.  If a natural key is not suitable I would use a surrogate key.  However, a natural key might be also an intelligent key.  For example, the USA issued Social Security Number (SSN) is an intelligent key made of three pieces (area number, group number, serial number).  It could be used for a natural key if one was certain there would be no duplicates.  Alas, even the SSN is vulnerable to duplicates.  Regardless, for one to make use of the intelligence in the SSN one would have to break it into three fields, and in that form it would not necessarily make a good key.

Teradata Employee

Re: Do we need an intelligent key for each subject area

An "intelligent key", as I understand it, serves as a unique identifier, but also contains within itself information that can be parsed out.  For example, if a company has account numbers that originated from several formerly independent companies that are now merged, there is the possibility that there is duplication of numbers across the former companies, so a unique account number is now a combination of the old account number and a source code of some type, and potentially other columns like an origination date since account numbers might have been reused at some point.  All columns must be specified in all joins in order to ensure you are getting the right account, and the more columns you have the more cumbersome it is to write all those joins.  So, I've seen cases where an Intelligent Key is specified such that it is composed of all the data elements strung together.  For example...

Normalized Account Number (add leading/trailing zeros to create a uniform number length) || Source Code || Origination Date (as 'YYYYMMDD') || Sequence Number (2 digits, just in case there's still a duplicate)

This works, but it is often very large, and there is always, always, the temptation to parse out the information contained within the key, and sometimes even perform joins with it.  This is the primary reason I am never a fan of Intelligent Keys.

Surrogate Keys, on the other hand, can be a great way to unify disparate natural key structures while also allowing you to specify single column indexes on tables to improve performance and reduce fatigue when having to write all those join statements.  A surrogate key, using the example above, would be initiated in one table that would house the natural key columns (not all of which would have to be specified - for example, one source may have a single account number, another an account number and a customer code, so for the first source customer code could be NULL without violating PK rules), and map the natural key elements to a single surrogate that would then be used throughout the data warehouse.  This requires that the first and last step to most import and export queries would be to translate natural keys to the surrogate, and then back from surrogate to natural when going out the door.  This is not necessarily problematic when compared to all the work required to maintain all columns on all tables. 

Surrogate keys are not a requirement, but if you are dealing with even the possibility of natural keys existing in multiple formats, even in the future, implementing a surrogate key will allow you to preserve the table and index structure on almost all of your warehouse tables even as you adapt to new natural keys since you will only need to change the "home" table.  Otherwise this kind of change can cause literally months of rework when a new natural key structure must be adapted since it will require changes to every table that uses the natural key, and potentially any intelligent key that is derived from existing natural keys.  So if using natural keys is not possible for you, go with a surrogate, but avoid intelligent keys at all costs.

Hint: I often specify a single table join index on this translation table that has a PI of the natural key columns so that translation of the input data is as efficient as possible.  This tends to work better than a USI on the table.

Re: Do we need an intelligent key for each subject area

Hi Jake,

I totally agree with you .I hope I can implement this if I have heteregenous sources to answer business questions in a proper way. But I will never use this for join :).

Thanks and regards,

Raja