How does Teradata store/retrieve Partition ID?

Database
Enthusiast

How does Teradata store/retrieve Partition ID?

Hi gurus,

How does Teradata store/retrieve Partition ID/number information?

Consider the scenario where a query/insert statement specifies the PI and Partition information (PI contains the column used in partition - simply say a Date column)

Based on TD manuals:

STEP 1: PE creates four-part message composed of Table Id, Partition information, Row Hash and PI value

- 48 bit Table Id is located via Data Dictionary

- 32-bit Row Hash value is generated by the Hashing algorithm

- Partition information and PI value come from the SQL request

- THE PE (via Data Dict) KNOWS IF A TABLE HAS A PPI AND DETERMINES THE PARTITIONS TO INCLUDE IN THE REQUEST BASED ON THE SQL REQUEST

STEP 2: The Message Passing Layer uses a portion of the Row Hash to determine to which AMP to send the request

- It uses the DSW portion of the Row Hash (first 16 or 20 bits) to locate a bucket in the Hash Map(s).

- This bucket identifies to which AMP the PE will send the request

STEP 3: The AMP uses the Table Id, Partition Number(s), and Row Hash to identify and locate the proper data block(s).

STEP 4: The AMP then uses the Row Hash and PI value to locate the specific row(s).


- The PI value is required to distinguish between Hash Synonyms

- Each data row will have the Partition number stored within it

Question:

I understand that in STEP 1, query contains the Partition information, say you included the partition value in the WHERE clause

When you jump to STEP 3, it already mentioned that the AMP uses the "Partition Number"... now, the way I understand it is that "Partition Information" is the value you specified in the query and "Partition Number" is the actual Partition Number that will be stored in the data row.

Question is, in what part isthe "Partition information" converted into actual "Partition Number"? I initially thought that Hash Maps will contain the Partition Number if it's a PPI. But I keep on reading that the bucket you get from the Hash map will only tell you to which AMP the PE will send the request, it never mentioned that you can get the Partition number from that as well - which is what STEP 2 is saying.

So the only place where it seemed that this Partition information to actual Partition number was converted was in STEP 1 where it says:

- THE PE (via Data Dict) KNOWS IF A TABLE HAS A PPI AND DETERMINES THE PARTITIONS TO INCLUDE IN THE REQUEST BASED ON THE SQL REQUEST

Does this sentence say that in this step, it determintes the "actual partition number" based on the "partition information" you get from the SQL Request?

This has been bothering me since last Thursday guys, I really wanted to think that you get the partition number from the Hash Map but the manuals don't say it explicitly.

Thanks!

Pat

3 REPLIES
Senior Apprentice

Re: How does Teradata store/retrieve Partition ID?

Hi Pat,

the Hashmap is used to ditribute the rows across AMPs, partitioning simply changes the sort order within in each AMP from "sorted by hash" to "sorted by partition, then by hash".

Let's assume 5 years from 2008 to 2012 each month -> 60 partitions:

A row from 2011-11 calculates to the external partition number 47, shown by the PARTITION keyword. The internally used partition number within this record will be 49, because there's a always a predefined number 1 and 2 for the NO RANGE and UNKNOWN partitions whereas they're always the last partitions externally, e.g. 61 and 62.

The mapping between internal and external partition number is stored within the table header and allows for a change of the covered range, e.g. when the 2008 partitions are dropped, the external number will change to 35, but the internal is still 49.

If the partitioning definitionn doesn't use RANGE_N or CASE_N then the result of the partitioning calculation is directly used a parition number.

Dieter

Enthusiast

Re: How does Teradata store/retrieve Partition ID?

Thanks for the quick reply Dieter.

So I get one thing cleared out - Hash Map is used to distribute rows across AMPs, it doesn't do anything about the Partitions.

As for the partition numbers:

I actually deduced earlier that Partition numbers must be understood straightforward by the PE when it said:

- THE PE (via Data Dict) KNOWS IF A TABLE HAS A PPI AND DETERMINES THE PARTITIONS TO INCLUDE IN THE REQUEST BASED ON THE SQL REQUEST

I just missed out to understand properly the test query I ran

select partition from table1 where... etc

when it gave a result of "1062", I immediately thought that this must be some internal number because I was expecting "1" because I only inserted one row. But after you explained the external and internal partition numbers and the sample you gave, that made me remember that even though it's just one record, it will be inserted to the proper partition number depending on the range that I gave. Thanks a lot for this!

This my remaining question now and I think I'll get it all cleared out.

When you mentioned that the mapping of external and internal partition numbers are stored in the table header, in what STEP does it actually does a lookup on the Table Header?

Is it right before STEP 3: The AMP uses the Table Id, Partition Number(s), and Row Hash to identify and locate the proper data block(s).

It reads the table header and gets the equivalent internal partition number then locates the proper data block?

Or does it locate the proper data block before it's able to know the corresponding Table Header?

Where is this table header stored? Is it in the AMP?

I remember MVC also keeps the mapping in the table header..

Thanks!

Pat

Senior Apprentice

Re: How does Teradata store/retrieve Partition ID?

Hi Pat,

the table header is stored on each AMP as a special subtable, When you create a new empty table and query dbc.TableSizeV you'll see the same permspace used on each AMP, it's the table header.

Everything an AMP needs to work with a table is stored there, column/index/partition/MVC info.

The optimizer probably calculates the internal partition number in step 1 if it's possible, i.e. it's found in a WHERE-condition using hardcoded values.

But otherwise, e.g. for an Insert/Select it must be calculated by the target AMP.

Dieter