I am a beginner in Teradata and would like to know the below
You might pick a NUPI over an UPI because the NUPI column may be more effective for query access and joins.
Can you please elucidate this.
It's all about the access. Since you've proclaimed yourself a newbie, remember Primary Index <> Primary Key. PK's are logical constructs in the TD world.
TD uses the PI to distribute data across the amps/vprocs. PI's aren't necessarily used to maintain uniqueness, but the can be.
NUPI's can be chosen over UPI's to enable 'co-location' of frequently joined rows on the same amp, reducing or eliminating the need for spool or system overhead to fulfill the request.
Typical example, take an Order Header table with a PI of HeaderNumber, which can reasonably expected to be unique across the domain of Orders. The associated OrderLineItem table would require a composite Primary Index in order to maintain uniqueness, HeaderNumber and LineNumber.
If you use the PI to maintain uniqueness, you're going to end up with the line item records and the header records on different amps/vprocs. Since the records have to be on the same amp to be joined, you're forcing the spool and redistribution of the smaller of the two table, probably the OrderHeader table to co-locate the rows onto the same amps/vprocs as the larger, OrderLineItem table.
If we build the Header table with a UPI on HeaderNumber and the OrderLineItem table with a NUPI on HeaderNumber we can skip a lot of the overhead as the rows will now reside on the same amp/vproc. The HeaderNumber column, assuming they've been modeled correctly and are the same data type, will hash to the same amp, regardless of which table it's from.
If you need to maintain uniqueness on the OrderLineItem table (with the NUPI), this can be done via a USI on the table. Secondary indices don't affect data location, but they do require an index subtable and additional I/O to maintain that subtable.
But what about distribution you ask? You're going to introduce some skew to the larger table (with the NUPI) in most cases, perfect distribution of all tables is one of those dreams that crumbles in performance tuning exercises.
There are probably more than enough opinions about the threshold of acceptable skew to keep this thread alive until the next Mayan calender debacle or the Y10K panic, but I've always tried to keep it at 20-25% or below.
Have a great day and welcome.
If Both tables are joined on HeaderNumber then why line item records and the header records will end up on different amps/vprocs?
Reply greatly appreciated.
Consideration while choosing a PI:
Remember the UPI does not allow duplicates and the NUPI does.
The HeaderNumber will repeat in the Order Line Table so we cannot have the HeaderNumber as the UPI in that table to maintain uniqueness. So, we have to use a composit key (mentioned in para 3 of VandeBergB’s thread).
If we have composite UPI in on table and a single column UPI the row hash will be different hence the rows will end up on different AMP’s.