EXISTS vs. IN

UDA

EXISTS vs. IN

Is EXISTS faster than IN? For example, table B has millions of records.

Select A.student_id from A
where EXISTS
(select student_id from B where A.student_id=B.student_id)

vs.

Select student_id from A
where A.student_id IN (select B.student_id from B)

student_id is NOT NULL
8 REPLIES

Re: EXISTS vs. IN

Hi,
u r query
Select A.student_id from A
where EXISTS
(select student_id from B where A.student_id=B.student_id)

is equivalent to

Select student_id from A
where A.student_id IN (select B.student_id from B where B.student_id is not null)

also the performance wise first query is better.

regds,
sachin

Re: EXISTS vs. IN

Thanks Sachin.
N/A

Re: EXISTS vs. IN


Hi,

Can u pls explain why 'Exists' performance is better than 'in'?

Regards,
abc

Re: EXISTS vs. IN

Hi,

The performance of ‘EXISTS’ and ‘IN’ will be same for queries having fewer members in their expression list.
But if the members in the expression list is more, then the performance of ‘EXISTS’ is better than ‘IN’.
This is because, while using ‘IN’ if the set of constants consists of 70 or fewer members, the system uses hashing to retrieve the rows.
If the set consists of more than 70 members, the system does a full-table scan.
Note that full-table scans are much more time consuming than row hashed accesses.

Regards,
Balamurugan
N/A

Re: EXISTS vs. IN

The SQL that uses EXISTS may be processed as a correlated subquery.

Experience shows that certain correlated subqueries take longer to process than similar SQL's which use a JOIN to achieve the same result.

I'd recommend the IN form of the SQL over the EXISTS form of the SQL.

--Foxbat
N/A

Re: EXISTS vs. IN

In almost any case there will be no difference between an IN-subquery and an EXISTS-correlated subquery, the optimizer will rewrite both to a join.

Compared to a JOIN there's also hardly any difference, if the subquery is the unique part of the 1:m relation.
If the subquery is the m-part, there's an automatic DISTINCT added to the subquery (in Explain that step will be executed before the join), but you have to add it manually to the JOIN (after the join within Explain).
If that subquery column is very non-unique that automatic DISTINCT might be overridden by a GROUP BY to enhance performance.

If it's NOT IN vs. NOT EXISTS there might be a huge difference, because NOT IN has to deal with NULLs via three-way-logic, whereas NOT EXISTS simply ignores NULL. If there's any NULLable column (inner or outer) used by the subquery, it is recommended to use NOT EXISTS instead of NOT IN. And NOT EXISTS is always more efficient than an Outer Join solution filtering for NULLs.

Dieter

Re: EXISTS vs. IN

I also read a good explaination on the following site. Noting this is an Oracle site I believe logically teradata adhears to this method however functionally it will "create joins" like dieter said:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:953229842074

You Asked Tom:

can you give me some example at which situation

IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *

  from t1, ( select distinct y from t2 ) t2

 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to

the original table -- typically.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

   for x in ( select * from t1 )

   loop

      if ( exists ( select null from t2 where y = x.x )

      then

         OUTPUT THE RECORD

      end if

   end loop

It always results in a full scan of T1 whereas the first query can make use of an index

on T1(x).

So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery

    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and executing (

select null from t2 where y = x.x ) is very very fast (nice index on t2(y)).  Then the

exists will be faster as the time to full scan T1 and do the index probe into T2 could be

less then the time to simply full scan T2 to build the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more appropriate.

If both the subquery and the outer table are huge -- either might work as well as the

other -- depends on the indexes and other factors.

N/A

Re: EXISTS vs. IN

@dnoeth

Can you please elaborate on

"And NOT EXISTS is always more efficient than an Outer Join solution filtering for NULLs.

"

@Topic

Yes, typically In ,Exists and Join would give similar results. However as dnoeth mentioned in case join is creating dups .Eilimination in join has to be done exclusively ;In which de dup logic will be applicable for more columns and possibly more rows making it more expensive.

(Unless you write a sub query and use in join )