Why can't we create Partitioned Primary Index (PPI) on a column that is not a part of Unique Primary Index UPI eventhough there won't be any difference in the data distribution while including PPI? But we were allowed to create PPI on columns other than NUPI.
"Ok so we have an agreement that it takes the same amount of effort to decide to which AMP a row should go whether the table is PPI or non PPI.
Ok given that I was inserting record (409, 33).
So let's say AMP X is the target for hashing on 409.
So the AMP is decided. Everything is cool so far.
for the non PPI table AMP will use that hash, search on that hash and check if 409 is already there ( remember that the table is sorted on row hash, so search is faster )
Now for the PPI table let' say we had 40 depts, then AMP has to probe in all the partitions (within a partition it's sorted on rowhash, but you still don't know the parition to look for a duplicate value) to check for the row hash of 409 that's because there could be a (409, 12) already there or may be a (409, 39) is already there ... and so on .... which means that AMP has to go to each of the 40 partitions and search for this empno to ensure that it's not already there. But the catch is it doesn't know which partitions can contain 409, so it has to search all the partitions.
So what's the big deal ?
Big deal is that searching over multiple partitions are highly time consuming that searching one giant partition (I mean non - PPI table).
It's like the above Tree example I gave you. it's faster to search in a single tree of 100 nodes than in 10 trees of 10 nodes each. ( Seriously if you do that exercise you would understand the difference ). It's a common error to think that both operations take same time, that's why I asked you to do that exercise, in reality the amount of search time required is quite a few many times more.
And if you remember in TD tables can have thousands of partitions, which makes the amount of time quite HUGE.
Due to this time consuming effort in searching for duplicate PI values in all partitions, it was decided not allow a UPI.
BTW I mean to say "it's inefficient to search in all the partitions" in my previous post. I am sorry if I miscommunicated that.
Now let me put another question. Let's say you need to find a book in a library. you know only the name of the book, not whether it's a Chemistry book or a Physics book or so.
All the books irrespective of the Dept are arranged in the order of the alphabets.
All the books are divided into different departments. and within department the are arranged in the alphabetical order.
Which would help you find the book faster given you don't know the dept to which the book belong ? P.S. Imagine you are in library of congress and you have tones of depts, for this question !"
Needs some more clarification on the resolution provided. The above justification is fine when a single column is made as UPI, but consider a scenario of composite UPI...
Eg. Table has 5 columns (col1, col2, col3, col4, col5), the UPI is derived from combination of (col1, col2, col3) and col2 is made as PPI. This snipet is allowed in Teradata and never throws an error. As per the table definition the ROWID will be created on the combination of the columns (col1, col2, col3) and the partitions are made on col2 so the data within a AMP is partitioned based on col2 alone.
Hence the above DDL leads to the very same issue mentioned in the solution provided, i.e. "searching over multiple partitions are highly time consuming". Can any one provide a resolution which may help me in understanding the issue better...???
Multiple partitions are to be probed only when the partitioning expression is not based on PI columns. Only for that case it's not allowed to create a UPI: the row has to be inserted into one partition, but it's unknown if it already exists within another partition.
It's similar to a where-condition based on PI without partitioning column: The AMP is known (RowHash can be calculated), but partition is unknown.