Let's start by understanding what a primary index does. A primary index (PI) is used to distribute the rows of a table to the AMPs and on each AMP to order the rows by hash value within the combined partitions defined by a PARTITION BY clause (if this clause is not specified, there is just one partition, that is, the entire table).
One purpose of column-storage (implemented using physical rows called containers) for a column partition is to provide row header compression by having sequential values with increasing uniqueness without having an explicit row header (or rowid) for each value. This means that, for a column-partitioned table with a PI, a container would only be able to contain values with the same internal partition number and hash value. This is likely to cause over-partitioning (partitioning to too fine of a granularity). That is, each container would have one or only a few values. The row header overhead for these containers would cause a large increase (up to 23 times) in space usage. The maintenance overhead could be very expensive.
Some of the complications of having a PI for a CP table could be avoided if the PI was only used for distribution of rows to AMPs but not to use it to order rows within combined partitions on the AMPs. This would allow single-AMP access improving tactical queries against a CP table. Also, this would allow for local-based joins on the PI columns. However, merge joins where the rows are ordered by the hash of the PI columns would not be applicable and other local join methods would need to be modified or implemented.
While there are some use cases that might benefit when rows are distributed by a hash value (instead of randomly or locally) and either ordered by the hash value within a combined partition or not (in the case the column partitions are wide), they also require further implementation effort. As the feature matures, enhancement opportunities will be explored.
Note: all join methods are supported with a CP table. However for some cases, selected rows from the CP table may need to be distributed/spooled/sorted prior to a join step (such as a merge join). Since only the needed columns are scanned, a scan of the CP table can be much more efficient than scanning a non-CP table where entire rows must be read.
If a PI or partitioned PI (PPI) is needed for some queries (tactical queries, primary index joins, etc.), a CP table can have a join index on it where the join index does have a PI, PPI, or MLPPI (but not column partitioning). Alternatively, a PI, PPI, or MLPPI table could have a join index on it where the join index (possibly sparse) has column partitioning (but not a primary index) plus optionally one or more levels of row partitioning. Either of these would allow the benefits of both PI/PPI/MLPPI and column partitioning, albeit with the extra space usage and maintenance of the join index. The optimizer would pick the table or the join index depending on which one was better suited for the query.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.