When are join Indexes being used by the optimizer?

Database
Enthusiast

When are join Indexes being used by the optimizer?

Hi gurus,

Generally speaking, when are join indexes being used by the optimizer?

i.e. SELECT COUNT(*) query with date filter..

SPARSE JOIN INDEX 3 columns and a date filter on the base table 

Will the optimizer utilize the sparse join index?

Generally speaking, the query should specify the columns defined in the index? else it wouldn't use that index?

Secondly, even your query used all the columns defined in the index sometimes the optimizer may not choose that index if it sees that it is slower?

Need your thoughts on this

Thanks!

3 REPLIES
Senior Apprentice

Re: When are join Indexes being used by the optimizer?

Q3: Yes, if the query can be processed using only the JI the optimizer will probably use it (if it's more efficient than accessing the base table).

Q1: Based on Q3, should be Yes.

Q2: If additional columns are accessed the optimizer might still use the JI (if a unique column or the ROWID is included in the JI enabling a join back to the base table)

Only an Explain can reveal the actual usage :-)

Dieter

Enthusiast

Re: When are join Indexes being used by the optimizer?

CREATE SET TABLE Test2 ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT

(

Col1 INTEGER

,col2 DATE FORMAT 'YYYY-MM-DD'

, col3 VARCHAR(10)

)

UNIQUE PRIMARY INDEX ( Col1 );

create join index test2_join AS

select col1,col3

from test2

where col2 = CAST('2012-03-01' AS DATE FORMAT 'YYYY-MM-DD')

primary index(col1)


insert into test2 values

(1, CAST('2012-03-01' AS DATE FORMAT 'YYYY-MM-DD'),'pat')

Queries

select * from test2

where col3 = 'pat'

select * from test2

where col2 = CAST('2012-03-01' AS DATE FORMAT 'YYYY-MM-DD')



Below explain text for both queries:

  1) First, we lock a distinct TEST_DB."pseudo table" for read on a

     RowHash to prevent global deadlock for TEST_DB.test2. 

  2) Next, we lock TEST_DB.test2 for read. 

  3) We do an all-AMPs RETRIEVE step from TEST_DB.test2 by way of an

     all-rows scan with a condition of ("TEST_DB.test2.col3 = 'pat'")

     into Spool 1 (group_amps), which is built locally on the AMPs. 

     The size of Spool 1 is estimated with no confidence to be 1 row (

     34 bytes).  The estimated time for this step is 0.03 seconds. 

  4) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.03 seconds. 

why didn't it use the join index I created?

Senior Apprentice

Re: When are join Indexes being used by the optimizer?

There's only a single row in that table and an "all-AMPs RETRIEVE step":

FTS -> One IO per AMP to access that datablock

Now assume using the JI, which needs a join back to the base row:

One IO per AMP to access the JI datablock plus another IO to read the base row datablock.

You should always test with some larger datasets and collected stats:

DELETE FROM test2;

INSERT INTO test2

SELECT day_of_calendar, DATE '2012-01-01' + RANDOM(1,100), 'pat'

FROM sys_calendar.calendar;

COLLECT STATS ON test2_join COLUMN col1;

COLLECT STATS ON test2 COLUMN col1;

COLLECT STATS ON test2 COLUMN col2;

Btw, instead of "CAST('2012-03-01' AS DATE FORMAT 'YYYY-MM-DD')" better use "DATE '2012-03-01'"

Dieter