What are the reason(s) why a PK will be implemented as USI instead of UPI? This question was formulated based on the following statement:
"Primary keys are implemented as unique secondary indexes for non-temporal tables..."
Reference: "Database Design" (Release 14.0); page 296
Thank you in advance for any responses.
Providing added description:
I think that PK as UPI is better since it will uniformly distribute the rows across all AMPs since PK is always unique by definition.
So why use USI for a PK?
Thanks for the response; but why are there instances when a designer is implementing a PK as USI and not as UPI to evenly distribute rows in AMPs.
I am seeing some table definitions that have a UPI/NUPI defined but the column used is not the actual PK in LDM. The PK was implemented as a USI instead?
Thanks, Carlos; but why not implement PK as UPI at first? Why chose a different Primary Index other than the PK in LDM?
I can't tell. Ask your designer. But Primary Indexes and Primary Keys are two different things. The data distribution and the access to them may advice to choose a PI which is not the PK.
In my previous post I was trying to explain the mechanics of how Teradata implements PKs, not the decissions taken at design phase.
In my opinion, for large transaction tables the primary key is rarely the best option for a primary index. The primary index should be defined based on three things: distribution, access and volatility. The data should be well distributed, the primary index should be used to access the data, and the primary index values should very rarely (if ever) change.
If you think of a banking example, with 3 tables:
BANK_ACCOUNT - PK = ACCOUNT_NUMBER, SORT_CODE
BANK_ACCOUNT_HISTORY - PK = ACCOUNT_NUMBER, SORT_CODE, EFFECTIVE_PERIOD (period, timestamp(6))
BANK_ACCOUNT_TRANSACTION - PK = ACCOUNT_NUMBER, SORT_CODE, TRANSACTION_TIMESTAMP (timestamp(6))
If you create all the tables with the primary index of all of the columns in the primary key then the tables will all be well distributed. However when you join them together the rows will all be on different amps, so will need to be redistributed. If you run a query to get all the transactions for a single account for a date range then it would need to read the whole table. The BANK_ACCOUNT_TRANSACTION table would only be accessed using the primary index if you knew the transaction timestamp down to the nanosecond.
If the tables all had the same primary index of ACCOUNT_NUMBER, SORT_CODE the data would be very well distributed, though possibly not perfectly as you would have a few accounts with very large numbers of transactions. However all joins would be amp local. Any queries that wanted all the details for a single account and sort code would be single amp queries and be extremely fast.