Scd type2

Tools & Utilities
Enthusiast

Scd type2

Hi all,

How do we implement Scd type 2 in Teradata.Please let me know the process.

Thanks,
kiran
3 REPLIES
Enthusiast

Re: Scd type2

SCD-Slowly Changing Dimension

It is capturing the slowly changing data which changes very slowly with respect to the time.

Example:
The address of a custumer may change in rare case. The address of a custumer never changes frequently.

There are 3 types of SCD.

Type1- Here the most recent changed data is stored
Type2- Here the recent data as well as all past data (historical data) is stored
Type3- Here partially historical data and recent data are stored. it means, it stores most recent update and most recent history.

As datawarehouse is a historical data, so type2 is more useful for it.
Apprentice

Re: Scd type2

Hi,

 

Not sure if you've sorted this by now, but if not try the following:

Start with some sample tables:

CREATE SET TABLE target_t
(key_col1 INTEGER NOT NULL
,key_col2 INTEGER NOT NULL
,data_col1 CHAR(10) NOT NULL
,data_col2 CHAR(10)
,start_dt DATE 
,end_dt DATE
)
PRIMARY INDEX(key_col1, key_col2);

CREATE SET TABLE staging_t
(key_col1 INTEGER NOT NULL
,key_col2 INTEGER NOT NULL
,data_col1 CHAR(10) NOT NULL
,data_col2 CHAR(10)
)
PRIMARY INDEX(key_col1, key_col2);

Now we set up some test data

DELETE FROM target_t;
INSERT INTO target_t VALUES(1,1,'aaa','aaa',DATE-10,DATE-7);
INSERT INTO target_t VALUES(1,1,'aaa','bbb',DATE-6,NULL);
INSERT INTO target_t VALUES(3,3,'aaa','ccc',DATE-20,NULL);
INSERT INTO target_t VALUES(4,4,'ddd',NULL,DATE-5,NULL);
DELETE FROM staging_t;
INSERT INTO staging_t VALUES(1,1,'aaa','bbb'); -- no changes
INSERT INTO staging_t VALUES(2,2,'aaa','aaa'); -- new key value
INSERT INTO staging_t VALUES(3,3,'ccc','aaa'); -- changes
INSERT INTO staging_t VALUES(4,4,'ddd',NULL); -- no changes includes NULL data values

The following SQL handles NEW, CHANGED and UNCHANGED rows (based on matching on key columns and comparing values in data columns).

-- This 'terminates' all current rows where we have a change
UPDATE target_t FROM (SELECT stg.key_col1 AS key_col1_end
                            ,stg.key_col2 AS key_col2_end
                      FROM staging_t AS stg
                      INNER JOIN target_t AS tgt
                         ON stg.key_col1 = tgt.key_col1
                         AND stg.key_col2 = tgt.key_col2
                         AND tgt.end_dt IS NULL -- only looks at 'current row'
                         AND stg.data_col1 <> tgt.data_col1 -- handles NOT NULL columns
                         AND (stg.data_col2 IS NULL AND tgt.data_col2 IS NOT NULL
                              OR stg.data_col2 IS NOT NULL AND tgt.data_col2 IS NULL
                              OR stg.data_col2 <> tgt.data_col2
                              ) -- handles NULLable columns
                      ) AS dt1
    SET end_dt = DATE-1
WHERE key_col1 = dt1.key_col1_end
  AND key_col2 = dt1.key_col2_end
  AND end_dt IS NULL
;INSERT INTO target_t -- this insert new rows and copies across new versions of changed rows
SELECT stg.key_col1
      ,stg.key_col2
      ,stg.data_col1
      ,stg.data_col2
      ,DATE AS start_dt
      ,NULL AS end_dt
FROM staging_t AS stg
LEFT OUTER JOIN target_t AS tgt
 ON stg.key_col1 = tgt.key_col1
 AND stg.key_col2 = tgt.key_col2
 AND stg.data_col1 = tgt.data_col1 -- handle NOT NULL data columns
 AND (stg.data_col2 IS NULL AND tgt.data_col2 IS NULL
	  OR stg.data_col2 IS NOT NULL AND tgt.data_col2 IS NOT NULL
	  OR stg.data_col2 = tgt.data_col2
	  ) -- handles NULLable columns
 AND (tgt.end_dt = DATE - 1 OR tgt.end_dt IS NULL) -- only process 'current row' or the row 'terminated' by previous update
WHERE (tgt.key_col1 IS NULL
                 OR tgt.key_col2 IS NULL
                 );

Yes the above makes a few assumptions:

- you are using separate DATE columns to identify the start and end period for each row

- the 'current row' is identified by END_DT having a value of NULL

 

There are other ways to do these things, but this approach is very common. Change as required if you have a different design.

 

Your post talked about using TPT. What I've provided above is the logic, how you execute that is (largely irrelevant). if you are using TPT then your job probably consists of:

TPT DDL Operator - empty the staging table

TPT Load operator - load the staging table from your source

TPT DDL Operator - run the MSR shown above

 

Let me know what you think.

 

Cheers,

Dave

 

 

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Apprentice

Re: Scd type2

Hi,

 

Just realsied that you didn't specifically mention TPT (sorry, I got confused with anotehr post) however the logic above is probably still useable.

 

Also, if your table is defined as a 'temporal' table then you can get the dbms to do som eof the work for you.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com