When are secondary indexes required?

Tools
Enthusiast

When are secondary indexes required?

Hi All,

I understand that a Secondary Index is a 2-AMP or an all-AMP operation, and also the subtable that is created and its contents.

However, I am not sure in what scenarios should a secondary index (unique or non-unique) be defined.

I would really appreciate some help here.

Thanks,

Aarsh

Tags (1)
19 REPLIES
Enthusiast

Re: When are secondary indexes required?

greetings,

Secondary indexes are typically defined to improve performance with regard to residual conditions in the where clause.  There are multiple ways to improve performance of these conditions, including but not limited to row level paritioning, secondary indexes etc...

Secondary indexes will ONLY be used by the opitmizer if they are very selective, nearly (>95%) selective.  You can find out if the SI is being used by examining the explain plan of the query (ies) that you are tuning.  The first secondary index created on a table will be identified as "index#4", with each subsequent index as a higher mutliple of 4.

One of the more common uses for a Unique Secondary Index arises with the need to maintain uniqueness on a column or columns that are NOT the primary index.  The primary index is NOT a primary key.  The key function of the PI is to define the most common access path while maintaining acceptable data skew, NOT maintaining uniqueness.

cheers!

Enthusiast

Re: When are secondary indexes required?

There may be cases where queries may not use PI. Then SI comes into the picture to enhance performance and chance of avoiding FTS.Value ordered NUSI is recommended for range queries.

They can be created and dropped anytime.

Understanding the business requirement and design, drives us to create SIs.

Cheers,

Raja

Enthusiast

Re: When are secondary indexes required?

Hi Arash,

Secondary indexes are mostly used to improve acces on the queries that use a non PI column in search conditions.

Following are some common uses of SI:

  1. If a non PI column is being used in where clause often, define SI on it
  2. you can use USI to enforce uniqueness in a PPI table where partition columns are not part of PI
  3. You can define NUSI in a PPI table to make a PI access single amp.
Khurram
Enthusiast

Re: When are secondary indexes required?

Hi Saeed

I am just curious to know more about your last comment (may be what I have interpreted is wrong).

"NUSI in a PPI table to make a PI access single amp" - would you kindly elaborate more.

As per my understanding, if it is a NUPI table with PPI and we run SQL only with PI column equality check in WHERE predicate it will end up in 1 AMP operation but will scan all partitions. Can NUSI improve this?

Thanks

Santanu

Enthusiast

Re: When are secondary indexes required?

Hi San,

Yes, PI will lead you to single AMP but will probe all the partitions. Thats why it is recomended to use NUSI consisting of same solumns as NUPI on top of PPI. and when this NUSI is access via where clause it will avoid scanning all the partitions and will lead you directly to the desired row.

Some considerations for NUSI over PPI are:

  1. Eliminate partition probing with PI access
  2. uses row hash locks
  3. 1 amp operation
  4. Can be used with NUPI or UPI
  5. Must be in Where condition
  6. NUSI single amp operation is only supported on PPI
  7. You can use multiload to load table

Please let me know if you need further guidance :)

Khurram
Enthusiast

Re: When are secondary indexes required?

Hi Saeed

Thanks for your quick reply. But when I am trying the same in my TD DB I am getting a different result. I am not able to understand why is it different than the desired result.

CREATE MULTISET TABLE SCPLN_W.SAMPLE1, NO FALLBACK

(

TRANS_ID INTEGER,

REGION_CD INTEGER,

TRANSACTION_DT DATE

)

PRIMARY INDEX(TRANS_ID)

PARTITION BY RANGE_N(TRANSACTION_DT BETWEEN  DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY, NO RANGE)

;

CREATE INDEX(TRANS_ID) ON SCPLN_W.SAMPLE1;

COLLECT STATS ON SCPLN_W.SAMPLE1 INDEX(TRANS_ID);

SELECT *

FROM SCPLN_W.SAMPLE1

WHERE TRANS_ID = 1001

;

Explain SELECT *

FROM SCPLN_W.SAMPLE1

WHERE TRANS_ID = 1001

;

  1) First, we do a single-AMP RETRIEVE step from all partitions of

     SCPLN_W.SAMPLE1 by way of the primary index

     "SCPLN_W.SAMPLE1.TRANS_ID = 1001" with a residual condition of (

     "SCPLN_W.SAMPLE1.TRANS_ID = 1001") into Spool 1 (one-amp), which

     is built locally on that AMP.  The size of Spool 1 is estimated

     with high confidence to be 1 row (33 bytes).  The estimated time

     for this step is 0.00 seconds. 

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.00 seconds. 

Please let me konw your response.

Thanks

Santanu


Enthusiast

Re: When are secondary indexes required?

San,

Can you please confirm how much data do you have in this table?

The optimizer will always go for the most optimal plan, if it finds that scanning all partitions is more efficient than using NUSI, it will go for PI access scanning all partitions.

The final decision is made by the optimizer :)

Can you please share the demographics of this table?

Khurram
Enthusiast

Re: When are secondary indexes required?

Hi Saeed

Here is the detail.

1. There are 12499 data in the table.

2. The typical rows / value for TRANSACTION_DT is 3000

3. Even though it is defined as MULTISET NUPI but the TRANS_ID field contains all unique value

Do you need any other demographic information?

I think you are right, optimizer is deciding to go for 1-AMP all partition scan for this data volume, if not any other reason behind this.

Thanks

Santanu

Enthusiast

Re: When are secondary indexes required?

Saeed,

You've partitioned the table on Transaction_Dt, but the query is asking for Trans_id.  Regardless of the demographics, it will never give you DPE with that configuration.

As a test, repartition the table by Trans_id.  You've only got 12499 rows according to your most recent post, so you could partition the table for each '1' Trans_id.  If you run the same query you will see the explain plan pull from one partition.

If you want to pull the transactions by date, you'll need add the Transaction_Dt column in the partitioning.