Can anyone let me know, with justification and steps, which of the two evils :) to choose for an SCD Type 2 implementation: the Kimball methodology or the Inmon methodology?
How do we handle this with the Inmon method, or in Teradata, given that a semantic layer sits on top of the 3NF model?
Thanks and regards,
Raja, I am not aware that Inmon has a methodology for SCD Type 2 dimensions. That term was coined by Kimball, and he has thoroughly expounded on it. Are you referring to handling SCD Type 2 in an Inmon 3NF warehouse with a semantic (database view) layer to create the dimensions? If so, the question is really about the justification and steps to create a Type 2 dimension in a semantic layer.
Your statement indicates you know the Teradata methodology advocates a 3NF model with a view layer for dimensions. If performance considerations make this technique unworkable, the fallback technique is Join Indexes. The third and least palatable option is a dependent data mart dimension. Other options include "vertical partitioning" of a dimension based on how rapidly its attributes change and how large the dimension is. In "The Data Warehouse Lifecycle Toolkit" Kimball covers many aspects of SCD Type 2 dimensions. His treatment is based on creating dependent data marts with surrogate dimension keys (1st edition, p. 180). His work is foundational and profound, and I am a big fan... but it is not the only way.
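To make the Kimball option concrete, here is a minimal sketch of what a dependent data mart Type 2 dimension with a surrogate key might look like. All table and column names here are hypothetical, not from any specific methodology document:

```sql
-- Hypothetical Kimball-style Type 2 dimension in a dependent data mart.
-- Each change to a tracked attribute produces a new row with a new
-- surrogate key; the natural key repeats across versions.
CREATE TABLE dm.customer_dim (
    customer_sk     INTEGER NOT NULL,   -- surrogate key, one per version
    customer_id     INTEGER NOT NULL,   -- natural/business key from the 3NF EDW
    customer_name   VARCHAR(100),
    credit_rating   VARCHAR(10),        -- slowly changing attribute
    eff_start_date  DATE NOT NULL,
    eff_end_date    DATE NOT NULL,      -- e.g. DATE '9999-12-31' on the open row
    current_flag    CHAR(1) NOT NULL    -- 'Y' on the current version
)
UNIQUE PRIMARY INDEX (customer_sk);
```

Fact tables in the mart would then carry `customer_sk`, so each fact row is permanently bound to the dimension version that was current when the fact occurred.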
I am aware of two methods for creating Type 2 dimensions with database views. Both require the underlying 3NF data warehouse to have some form of "temporal" tracking. Here are the two situations:
Situation 1: The data warehouse keeps effective dates for each row in the pertinent tables. This means there are one or more pairs of dates maintained by ETL processes, and perhaps a flag to designate the current row. In this situation the generated dimension contains effective dates, and there may be multiple rows with the same key. This requires a "non-unique" Primary Index. Unless the query constrains on the dimension effective dates, it may return multiple rows per fact row. To quote Kimball in the above book: "This is what we want to happen." (1st edition, p. 181.) This can be overcome by using multiple database view sets, some of which contain only "current" rows.
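A sketch of Situation 1, with hypothetical names throughout: the 3NF table carries ETL-maintained effective dates, the semantic layer exposes it as views, and point-in-time queries constrain on those dates.

```sql
-- Full-history Type 2 view: one row per version of each customer.
-- The underlying edw.customer_hist table would have a NON-unique
-- primary index on customer_id, since the key repeats across versions.
REPLACE VIEW sem.customer_dim AS
SELECT customer_id, customer_name, credit_rating,
       eff_start_date, eff_end_date, current_flag
FROM edw.customer_hist;

-- Companion "current only" view, for queries that should not see history.
REPLACE VIEW sem.customer_dim_current AS
SELECT customer_id, customer_name, credit_rating
FROM edw.customer_hist
WHERE current_flag = 'Y';

-- Point-in-time join: the date constraint ensures each fact row
-- matches exactly one dimension version.
SELECT f.sale_date, f.amount, d.credit_rating
FROM mart.sales_fact f
JOIN sem.customer_dim d
  ON  d.customer_id = f.customer_id
  AND f.sale_date BETWEEN d.eff_start_date AND d.eff_end_date;
```

Without the `BETWEEN` constraint, the join fans out to every historical version of the customer, which is the multi-row behavior described above.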
Situation 2: The data warehouse uses Teradata "temporal" features. Using temporal features, one can designate that the database itself insert and maintain the effective dates. This frees the ETL from extensive processing. When writing SQL, it allows the coder to control the time period in the scope of the query with relative ease. It also frees the query from extensive date logic, which reduces its size and complexity. This can be specified on a table-by-table basis.
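A minimal sketch of Situation 2, assuming a Teradata release with temporal support and using hypothetical table and column names. The `VALIDTIME` column and the query-level temporal qualifiers are the key pieces:

```sql
-- The database, not the ETL, maintains the valid-time period on
-- UPDATE and DELETE against this table.
CREATE MULTISET TABLE edw.customer_t (
    customer_id    INTEGER NOT NULL,
    customer_name  VARCHAR(100),
    credit_rating  VARCHAR(10),
    validity       PERIOD(DATE) NOT NULL AS VALIDTIME
)
PRIMARY INDEX (customer_id);

-- Temporal qualifiers set the time scope of the whole query,
-- with no explicit date predicates in the WHERE clause:

CURRENT VALIDTIME          -- only rows valid as of today
SELECT customer_id, credit_rating FROM edw.customer_t;

SEQUENCED VALIDTIME        -- the full history, period by period
SELECT customer_id, credit_rating FROM edw.customer_t;
```

This is what makes the views in the semantic layer simpler: the date logic moves out of each query and into a one-word qualifier.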
Is "temporal" logic easy? No. Any comprehensive, enduring solution isn't easy... but it is our job to understand it, and we must be skilled at understanding and implementing it.
If the data volume is quite high, performance is always going to be an issue with views on top of 3NF. A Join Index also creates a storage issue, and a dependent data mart dimension is again space-consuming.
On temporal tracking, I sort of agree, since it reduces the work in ETL processing. I have not implemented the temporal features (VALIDTIME, TRANSACTIONTIME, or bitemporal), though I know their historical tracking capabilities.
I am worried by your mention of the logic not being easy. I hope it is as straightforward as writing any DDL :).
Thanks and regards,