Which is efficient?

Database
Enthusiast

Which is efficient?

I have written a query as follows :

select

a.x1,

a.x2,

b.x3

from

(select x1,x2,x4 from table1 where 'some conditions') as a

inner join

(select x3,x4 from table2) as b

on

a.x3=b.x3

I have another version as :

select

a.x1,

a.x2,

b.x3

from

(select x1,x2,x4 from table1 where 'some conditions') as a

inner join

 table2 as b

on

a.x3=b.x3

table1 has 12 columns and table2 has 20 columns.

Can someone tell me which version is better and why?

Tags (2)
11 REPLIES
Enthusiast

Re: Which is efficient?

Hi Prateek

Looking at your query, my first response is, this query will give error. You are joining table b with table a on x3 column but you are not selecting x3 column in your sub-query. Instead you have slected x4.

Assuming it is a type-o, the joining will not depend on the number of columns. It is optimizer's decision.

If x3 is not the indexing column for both tables, in your subquery for table1 you have given some conditions. If this makes selection of very few records compared to table2, then optimizer can decide to duplicate "a" into all-AMPs and go for product join for both the queries.

You can always run Explain to see which one is performing better.

Other experts in this forum may have extra points to add.

Thanking You

Santanu

Enthusiast

Re: Which is efficient?

Thanks Sanatanu. Yes it was a typo, i intended to join these tables on x4 only. :)

Enthusiast

Re: Which is efficient?

Hi Prateek,

In my opinion the first option is good. As you have restricted the number required columns from table b. The spool and the other resources consumed will be less.

Can you check the spool and number of steps required in both the ways and let us know.

Regards,

Sri

Enthusiast

Re: Which is efficient?

I mean in the explain plan - the spool and number of steps required in both the ways.

Enthusiast

Re: Which is efficient?

My statement is subject to correction :).

You can check the explain plan for both.I feel that for small amount of data , they are almost the same. ResTree to ResTree' conversion,in Query Rewrite for the first case, I feel that it does not need to push projections into spooled views, since the columns are specified. For the second case, since the table2 has 20 columns, so Query Rewrite will take time.

Again, Query Rewrite may or will go for folding the derived table. So it may or may not take time here . It all boils down to the optimizer.

Cheers,
Enthusiast

Re: Which is efficient?

Parteek,

The first version will be better. The reason is to get performance you need to reduce IO as much as you can. While selecting from tables, if you select only columns requied and restrict rows as well, then the IO will be termnedously reduced and less data block will be accessed. 

In first query you are selecting only two columns from table2, but in second 20 column will be read. so second one will be more IO intensive.

You can check explain plan to confirm the difference. or you ca paste the explain plan for discussion. 

Khurram
Senior Apprentice

Re: Which is efficient?

Derived Tables are only materialzed when there's something like DISTINCT/GROUP BY/OLAP. 

Otherwise they are already resolved by the parser. Also any unused columns are removed from the parse tree, simply compare the estimated spool sizes.

When you explain both queries they should be exactly the same as the optimizer should get the same parse tree.


Enthusiast

Re: Which is efficient?

If you are just selecting records then it does not matter whether you use Derived table or the table directly.

Teradata PE is enough smart to include only those columns which are required by the query.

You can check the explain plan and run both the queries.

I believe you will see same spool usage too in both cases.

Raj

Enthusiast

Re: Which is efficient?

Does it mean that Tables can have any number of columns and this will not impact performance as long as select is retrieving few columns?

select a1,a2,b3 from A join B on A.a1=B.b1

--A has 200 columns and B has 10 columns.  50 mil rows in A and 200 mil in B ---same performance ?

--A has 10 columns and B has 10 columns.  50 mil rows in A and 200 mil in B -----same performace?