This has been a question which has been bugging me for a while.
When we are creating a new table in my project I am asked by my leads to use a column which has most/max number of unique values which makes a perfect sense. But the problem is to achieve this they are also asking me to create a Record_id/ETL transaction id - which would be a unique number generated for each row and to use this as unique primary index.
Does this make sense?Is this a good practise? What are the benefits of this approach? I read somewhere this would cause unnecessary overhead on the DB.
I was in an understanding that we should be using a data column/columns combination which would make the primary index for most part to be unique. Someone please help me understand?
Since your primary index will be used to access the data as well as distribute it using the suggestion below would address only distribution.
If there is not a combination of columns on the table that will be unique then a data modeler may need to be involved.
If you end up generating a dumb key, consider generating a sequential numeric column possibly using Identy column.
Thanks rjg.. but if i am creating a sequential number for the purpose of primary index how will this help in the performance of the queries as we are no where using those sequential numbers in the joins or where conditions?
It won't help with joins unless you propagate those IDs to the related tables. (For example, when creating an order in the Order table the application generates an Order_ID and then uses that value to create rows in the Order_Item table.) Likewise, it won't enable single-row access unless the query knows the generated ID. (However, single-row access can be helped in this case by a secondary index on something that is meaningful to the person issuing the query.)
In relational theory, a primary key is sort of a "natural" unique identifier of an entity instance. To continue the order example, this might be considered the order number, or it might be thought of as the CustomerName-Timestamp-Destination. Employee Number (usually sequentially assigned) might identify an Employee instead of some other combination of values which might not be guaranteed to be unique. But sometimes (often?) sequential numbers are entirely arbitrary and mean nothing to anyone, and you can see this in data models that include these in every single table, no matter what. As far as I know, the philosophical debate on whether this is good practice continues unresolved.