Design Issue

Database
Enthusiast

Design Issue

I have surrogate key column and non unique key column.I have to assign primary index. Which column i have to choose (either surrogate key column or non-unique column) in order to make effcient design?
4 REPLIES
Enthusiast

Re: Design Issue

The thumb rule is PI: It is used for data distribution.

PI can be decided based on Frequency of usage in Where Clause columns (either it could be surrogate key / natural key)

If the non-unique column is having low cardinality (say M/F) then your NUPI can cause skewness in AMPS.

Thanks
Toad
Enthusiast

Re: Design Issue

Hi,

To add more on choosing PI first do the data profiling:
Consider both the column as candidate columns for PI selection and identify the following stats for each of them:
1. Number of rows
2. Number of distinct values
3. Typical rows per value
4. Number of NULL values
5. Value Access Frequency - Frequency of usage of this column in Where Clause
6. Join Access Frequency - Frequency of usage of this column in Join

Based on the above stats give priority to the 6 and 7 items, and you can have the data skewness upto 20% if that column is going to be mostly used in Join/Where.

Regards,
Balamurugan
Enthusiast

Re: Design Issue

Hi

Its a trade off between data distribution and retrival. Surrogate key can be a good PI candidate for data distribution but wont be effective in queries on the table. Thus if the table PI doesnt exist in queries, then the retrival would be slow because of higher number of AMPs involvement,data redistribution or dublication.
Enthusiast

Re: Design Issue

I believe you are in physical database design stage. Now in order to choose a good PI here you should have complete ELDM ( extended logical datamodel design ) The ELDM components are "join clause and Data demographics"

Join Clause inculdes :
1)join access frequency
2)Value access frequency
3)Value access rows

Data Demographics include:
1)Distinct rows
2)Max rows per value
3)max rows null
4)typical rows per value
5)change rating

Note: A great PI will have : high value access frequency ,join access frequency, reasonable distribution and a change rating below 2 (out of 10).