Reg: Partitioned Primary Index

Database

Reg: Partitioned Primary Index

Hi All,

Why can't we create Partitioned Primary Index (PPI) on a column that is not a part of Unique Primary Index UPI eventhough there won't be any difference in the data distribution while including PPI?
But we were allowed to create PPI on columns other than NUPI.

For Illustration:

CREATE TABLE SAMPLE_TBL
(
COL1 INTEGER,
COL2 INTEGER,
COL3 VARCHAR(20),
COL4 INTEGER,
COL5 CHAR(8)
)
UNIQUE PRIMARY INDEX (COL1,COL2)
PARTITION BY COL4;

This SQL statement will through an error like "We CAN NOT have a UNIQUE PRIMARY INDEX on a table that is partitioned by something not included in the Primary Index."

But the below mentioned SQL works fine

CREATE TABLE SAMPLE_TBL
(
COL1 INTEGER,
COL2 INTEGER,
COL3 VARCHAR(20),
COL4 INTEGER,
COL5 CHAR(8)
)
UNIQUE PRIMARY INDEX (COL1,COL2)
PARTITION BY COL2;

Kindly provide me an explation why the Teradata Design never accepts to create PPI on a column that is not a part of UPI.

4 REPLIES
Enthusiast

Re: Reg: Partitioned Primary Index

Hi Magesh,

Do refer the previous discussion on the same topic...

http://www.teradata.com/teradataForum/Topic6553-9-1.aspx?Highlight=PPI+UPI

Regards,
Balamurugan

Re: Reg: Partitioned Primary Index

Hi Balamurugan,

I got this resolution from the previous discussion on the same topic...

http://www.teradata.com/teradataForum/Topic6553-9-1.aspx?Highlight=PPI+UPI

"Ok so we have an agreement that it takes the same amount of effort to decide to which AMP a row should go whether the table is PPI or non PPI.

Ok given that I was inserting record (409, 33).

So let's say AMP X is the target for hashing on 409.

So the AMP is decided. Everything is cool so far.

for the non PPI table AMP will use that hash, search on that hash and check if 409 is already there ( remember that the table is sorted on row hash, so search is faster )

Now for the PPI table let' say we had 40 depts, then AMP has to probe in all the partitions (within a partition it's sorted on rowhash, but you still don't know the parition to look for a duplicate value) to check for the row hash of 409 that's because there could be a (409, 12) already there or may be a (409, 39) is already there ... and so on .... which means that AMP has to go to each of the 40 partitions and search for this empno to ensure that it's not already there. But the catch is it doesn't know which partitions can contain 409, so it has to search all the partitions.

So what's the big deal ?

Big deal is that searching over multiple partitions are highly time consuming that searching one giant partition (I mean non - PPI table).

It's like the above Tree example I gave you. it's faster to search in a single tree of 100 nodes than in 10 trees of 10 nodes each. ( Seriously if you do that exercise you would understand the difference ). It's a common error to think that both operations take same time, that's why I asked you to do that exercise, in reality the amount of search time required is quite a few many times more.

And if you remember in TD tables can have thousands of partitions, which makes the amount of time quite HUGE.

Due to this time consuming effort in searching for duplicate PI values in all partitions, it was decided not allow a UPI.

BTW I mean to say "it's inefficient to search in all the partitions" in my previous post. I am sorry if I miscommunicated that.

Now let me put another question. Let's say you need to find a book in a library. you know only the name of the book, not whether it's a Chemistry book or a Physics book or so.

Scenario (1)

All the books irrespective of the Dept are arranged in the order of the alphabets.

Scenario (2)

All the books are divided into different departments. and within department the are arranged in the alphabetical order.

Which would help you find the book faster given you don't know the dept to which the book belong ?
P.S. Imagine you are in library of congress and you have tones of depts, for this question !"

Needs some more clarification on the resolution provided. The above justification is fine when a single column is made as UPI, but consider a scenario of composite UPI...

Eg.
Table has 5 columns (col1, col2, col3, col4, col5), the UPI is derived from combination of (col1, col2, col3) and col2 is made as PPI. This snipet is allowed in Teradata and never throws an error. As per the table definition the ROWID will be created on the combination of the columns (col1, col2, col3) and the partitions are made on col2 so the data within a AMP is partitioned based on col2 alone.

Hence the above DDL leads to the very same issue mentioned in the solution provided, i.e. "searching over multiple partitions are highly time consuming". Can any one provide a resolution which may help me in understanding the issue better...???

With Regards,
Magesh Vasan P.

Enthusiast

Re: Reg: Partitioned Primary Index

In your example when we try to insert a row, TD uses c1,c2,c3 to find the AMP
and C2 to find the partition in that AMP and scans just that partition alone to check for duplicates.

Why would we go for scan across all partitions?

Regards,
Annal T
Junior Contributor

Re: Reg: Partitioned Primary Index

Multiple partitions are to be probed only when the partitioning expression is not based on PI columns.
Only for that case it's not allowed to create a UPI:
the row has to be inserted into one partition, but it's unknown if it already exists within another partition.

It's similar to a where-condition based on PI without partitioning column:
The AMP is known (RowHash can be calculated), but partition is unknown.

Dieter