Size of partitions and PPI performance

Database
Fan

Size of partitions and PPI performance

Hi All,
We have a big table (~ 6 Bil rows) on which we are planning to create a PPI. From our initial analysis the data distribution within the partitions is very skewed (This column will be used in joins and qualifying data). The PI (PI is not part of PPI) for this table is unique (defined as non unique), so data distribution is fine on Amps. I would like to know if you have faced any performance issues due to uneven partitions (storage or access).
Thanks in advance.
-S
5 REPLIES
Enthusiast

Re: Size of partitions and PPI performance

If there are too many partitions and most of them dont have data, your performance would suffer.

But since you have used partitioning, why dont you use the new V2R6 feature of collecting stats on the system column PartitionID?

The Collect stats operation on this column is fast compared to regular collect stats and also the empty paritions can be identified and left out when reading data.

This will impact your performance positively!!!

Regards,
Annal T
Enthusiast

Re: Size of partitions and PPI performance

Hi Annal,

That was new to hear. Can you explain in brief about collect stats on system column partition id with example.

Is it useful only in PPI usage?

In what ways its different from column/index level collect stats.

Regards,
Sakthi
Enthusiast

Re: Size of partitions and PPI performance

Collecting statistics on the System column PARTITION(An ID/Number which identifies a Partition) will help the optimiser in coming up with a better plan when the read table has partitions.

If a partition is empty it would be left out from the read process even if its not eliminated directly by the query's Where Clause, with the help of the stats collected.

Assume we have 100 partitions and through our query we eliminate 50 paritions directly.

Now one needs to access and read 50 partitions. By collecting stats on the column PARTITION we get information like if the Partition is empty or not etc.

So if 25 partitions out of 50 are empty in reality , Stats would give this info to optimiser
and actually we would end up scanning just 25 Partitions instead of 50.

But this facility is available only from V2R6 and i guess its not available in V2R5.

Collecting stats on the system derived column Partition is faster because rather than reading all the base table rows for collecting information, it usually just scans the cylinder index for that PPI table
and collects information from that which is not the case with regular stats collection on user defined Columns.

Hope this info helps!!

Regards,
Annal T
Enthusiast

Re: Size of partitions and PPI performance

Hi,

Need help with partitions. Will the data in the NORANGE partitions automatically be moved to a new Partition that is created with an Alter table command ?

Thanks in advance

Anita

Enthusiast

Re: Size of partitions and PPI performance

Adding the new partition is done through MODIFY PRIMARY INDEX(...) statement, and once the primary idex is changed the data will be re-distributed accordingly. So yes the data present in NORANGE partition should be redistributed to the new related partitions.

I guess you are aware of the fact that only the RANGE partitions at the 'ends' of you partition scheme can be altered, dropped/added.