Order of partition columns in MLPPI

Database
Enthusiast

Order of partition columns in MLPPI

Hi All

May be I have posted this question before in this forum, but I am reposting it because that time I did not get any reply. Now I have been asked in a technical meeting to suggest a solution. But I am still not sure how to decide this, as my understanding is still in grey area.

"What is best to consider while deciding the order of partition columns in MLPPI?"

So far what I have read very frequesntly in PDFs and web links is;

a) There are more contiguous partitions to scan and skip at the highest level.

b) You should consider specifying partitioning expression with greatest number of partitions at lowest level.

I am little confused and not able to figure out the gist from above 2 statements.

1. 

If in RANGE partition COL1 is creating 50 partitions and COL2 is creating 10 partitions, 

Shall I use COL1 at the higher level of MLPPI because it is getting more number of partitions than COL2?

Or this will not at all impact the order of partition hierarchy?

2.

If a data demographic says, COL1 has 50000 typical rows/value and COL2 has 1000 distinct values then,

Shall I use COL1 at higher level and COL2 at lower level or it should be the opposite?

What is the logic behind this decision?

Actually, my understanding is not clear about the order of partition columns in MLPPI.

If someone kindly explains this or shares any document or web link where it is depicted, then it will be really helpful. I need to know this answer very urgently.

Thanking You

Santanu

4 REPLIES
Enthusiast

Re: Order of partition columns in MLPPI

Hi

Anyone, any help or any response, please share.

Santanu

Enthusiast

Re: Order of partition columns in MLPPI

Hi San,

The base of decisions in defining MLPPI is to improve access, so you need to consider access and then decide about which col should be at the top and which at the lowest level.

The most fundamental thing in MLPPI is the number of data blocks you are going to touch. The more the number of data blocks are read, the worst the performance. 

So first of you need to check the data blocks that will be constructed against your combined partition. If the number of datablock for the combined partitions is large, then the order of partitions does not matter.

The second thing is to put the partition expression at the top that will be mostly access and have chances of partition elimination. Third you must define expression with max partitions at the lowest level, but if it is being accessed most then you can move it to top.

So in all cases, you need to consider the number of data blocks that will be accessed. try to design your expression that ultimately lead to minimum data blocks to be accessed.

for your scenario, Col1 with 50000 row/val will create 50 partitions and col2 with 1000 distinct values will create 10 partition, so I suggest you can define Col2 partitions at level 1, say in a queary 2 of these 10 partitions are being probed, and then each of these two partitions will lead to second level, second level contains more values per lock so the number of data blocks against these 2 level 1 partitions will be less. 

On the contrary, if you define col1 at level 1, out of the 50 partitions say 5 partitions are left after elmination, against each of these partitions, more second level partitions will be probed, and as the number of rows per block are less, there are chances that more blocks will be access and it will degrade the performance.

Hope I am able to explain what I wanted to say. Please let me know if you have any confusions.

Khurram
Enthusiast

Re: Order of partition columns in MLPPI

Page 301 of "SQL Request and Transaction Processing" describes with examples with MLPPI very nicely.They have taken a case of 65,535 partitions.

Enthusiast

Re: Order of partition columns in MLPPI

Thanks guys for your responses. I will consider them as my baseline and try to evaluate more. Probably I will post more questions if I need clarifications. :-)

Thanks

Santanu