I have a 100 GB table with 80 columns and around 300 million rows. I'm using the LIKE operator to fetch rows matching a string entered in a front-end tool. Currently it takes around 10-20 seconds to fetch the rows; I need to bring that down to 2-5 seconds.
Stats are collected on those columns.
I've tried creating a join index, a hash index, and a secondary index, but none of them seems to be used when I look at the EXPLAIN plan. It still does a full-table scan.
Following is the table structure:
CREATE SET TABLE t1 (
  id       INTEGER NOT NULL,
  start_dt DATE NOT NULL,
  end_dt   DATE NOT NULL,
  ...                        -- col1, col2, and the rest of the 80 columns
) UNIQUE PRIMARY INDEX (id, end_dt);
I use the following SELECT queries:
SELECT * FROM t1 WHERE col1 LIKE '%test%';
SELECT * FROM t1 WHERE col2 LIKE '%test%';
Please suggest how to create a proper hash or join index that will improve the SELECT performance.
Did you check whether the optimizer is missing a statistic?
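One way to check this is to turn on the HELPSTATS diagnostic for your session; the EXPLAIN output then ends with the statistics the optimizer would like to have (a sketch, assuming a standard Teradata session):

```sql
-- Ask the optimizer to report missing/recommended statistics
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- The EXPLAIN output now includes COLLECT STATISTICS recommendations
EXPLAIN SELECT * FROM t1 WHERE col1 LIKE '%test%';
```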
You can try a join index with the same PI as the base table plus the column searched by LIKE:
CREATE JOIN INDEX ... AS
SELECT id, end_dt, col1
FROM t1
PRIMARY INDEX (id, end_dt);
The optimizer might then apply the LIKE on the join index and do a PI join back to the base table.
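Spelled out as a sketch (the index name ji_t1_col1 is just a placeholder, and col1's type is taken from the base table), including the EXPLAIN check to see whether the optimizer picks it up:

```sql
-- Narrow join index covering only the PI columns and the searched column;
-- same PI as the base table, so the join back to t1 is cheap
CREATE JOIN INDEX ji_t1_col1 AS
SELECT id, end_dt, col1
FROM t1
PRIMARY INDEX (id, end_dt);

-- If the index is used, the EXPLAIN should mention ji_t1_col1
-- instead of an all-rows scan on t1
EXPLAIN SELECT * FROM t1 WHERE col1 LIKE '%test%';
```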
For any LIKE match that is not a left-substring match (i.e., anything other than LIKE 'test%'), the optimizer has no way to match the requested value against any statistics. Therefore, regardless of any indexes that might exist, the optimizer will assume a pessimistic selectivity for the LIKE clause, which in turn makes the table scan look like the lower-cost alternative. You will see this in the EXPLAIN's estimate of the number of rows returned. For an arbitrary LIKE expression with no other qualifiers on other fields, you will not be able to make the optimizer do anything other than the scan.
If the string search is not completely generalized, you may be able to create some form of word or string index, as a search engine would, that lets you look up the string there and link back to the records you want to retrieve.
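A minimal sketch of that idea (the table t1_words, its column sizes, and the tokenization done by the load process are all assumptions, not something from the original post):

```sql
-- Hypothetical word index: one row per (token, base-table key),
-- maintained by the load process that tokenizes col1
CREATE TABLE t1_words (
  word   VARCHAR(50) NOT NULL,
  id     INTEGER NOT NULL,
  end_dt DATE NOT NULL
) PRIMARY INDEX (word);

-- Equality lookup on the token (hash access, no scan),
-- then a PI join back to the base table
SELECT t1.*
FROM t1
JOIN t1_words w
  ON  t1.id = w.id
  AND t1.end_dt = w.end_dt
WHERE w.word = 'test';
```

This only works if users search for whole words (or some other enumerable unit) rather than arbitrary substrings.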
What's the selectivity of your LIKE condition?
I just tried it on a TD15 VM, and it uses a NUSI (or JI) if the number of distinct values is high (in my case 150,000 within 9,000,000 rows).
As Todd said, it still has to do an FTS on the NUSI/JI, but if this is small compared to the base table and the estimated number of rows is low, it's still less work than an FTS on the base table.
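For reference, a plain NUSI on the searched column with stats collected would look like this (a sketch; whether it gets used depends on the distinct-value count as described above):

```sql
-- Non-unique secondary index on the column searched by LIKE
CREATE INDEX (col1) ON t1;

-- Stats on the index help the optimizer cost the NUSI scan
COLLECT STATISTICS ON t1 INDEX (col1);
```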
Create another table (and maintain it when loading t1):
CREATE TABLE aux_t1 (
  id     INTEGER NOT NULL,
  end_dt DATE NOT NULL,
  col1   ...,               -- same type as in t1
  col2   ...                -- same type as in t1
) UNIQUE PRIMARY INDEX (id, end_dt);

INSERT INTO aux_t1 SELECT id, end_dt, col1, col2 FROM t1;
COLLECT STATISTICS ON aux_t1 INDEX (id, end_dt);
(This new table is much smaller.)
Then rewrite the SQL as:
SELECT t1.*
FROM t1, aux_t1
WHERE t1.id = aux_t1.id
  AND t1.end_dt = aux_t1.end_dt
  AND aux_t1.col1 LIKE '%test%';
If it works, TD will do a full-table scan of the smaller table (much faster than scanning the big one) and then do a merge join back to the big table.
This is what we do to avoid a full-table scan of call data records when we want to fetch a few records through a LIKE condition, e.g.:
SELECT * FROM cdr WHERE IMEI LIKE '%xxxxxxx%';
Let me know if it works.