UNION Vs MSR

Database
N/A

UNION Vs MSR

Hi,

I have a quesion.

In case of huge data volume which one from below wil be better?

Insert into A

(

aa,bb,cc

)

sel

aa,bb,cc

from B

UNION ALL

sel

aa,bb,cc

from C;

                                                                    OR

Insert into A

(

aa,bb,cc

)

sel

aa,bb,cc

from B

;Insert into A

(

aa,bb,cc

)

sel

aa,bb,cc

from C;

Thanks,

Abhishek

Tags (3)
14 REPLIES
N/A

Re: UNION Vs MSR

I assume we are talking about an empty target table A. Correct?

You should check and compare the explains of both queries.

I would expect that the multi statement request is faster. 

Both are seems to be very similar but the MSR us only one spool file.

But again - check and share your explain.

Example

create table tab_s1 (a integer, b integer)unique primary index (a);
create table tab_s2 (a integer, b integer)unique primary index (a);
create table tab_t (a integer, b integer) unique primary index (a);

insert into tab_s1 (1,2);
insert into tab_s1 (3,2);
insert into tab_s2 (2,2);

collect stats tab_s1 column (a);
collect stats tab_s2 column (a);

explain
insert into tab_t
select *
from tab_s1
union all
select *
from tab_s2
;

explain
insert into tab_t
select *
from tab_s1
;insert into tab_t
select *
from tab_s2
;


explain

insert into tab_t

select *

from tab_s1

union all

select *

from tab_s2

;

 *** Help information returned. 31 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct xxx."pseudo table" for write on

     a RowHash to prevent global deadlock for xxx.tab_t.

  2) Next, we lock a distinct xxx."pseudo table" for read on a

     RowHash to prevent global deadlock for xxx.tab_s2.

  3) We lock a distinct xxx."pseudo table" for read on a

     RowHash to prevent global deadlock for xxx.tab_s1.

  4) We lock xxx.tab_t for write, we lock xxx.tab_s2

     for read, and we lock xxx.tab_s1 for read.

  5) We do an all-AMPs RETRIEVE step from xxx.tab_s1 by way of

     an all-rows scan with no residual conditions into Spool 1

     (all_amps), which is built locally on the AMPs.  The size of Spool

     1 is estimated with high confidence to be 2 rows (42 bytes).  The

     estimated time for this step is 0.13 seconds.

  6) We do an all-AMPs RETRIEVE step from xxx.tab_s2 by way of

     an all-rows scan with no residual conditions into Spool 1

     (all_amps), which is built locally on the AMPs.  The size of Spool

     1 is estimated with high confidence to be 3 rows (63 bytes).  The

     estimated time for this step is 0.13 seconds.

  7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of

     an all-rows scan into Spool 2 (all_amps), which is built locally

     on the AMPs.  Then we do a SORT to order Spool 2 by the hash code

     of (A).  The size of Spool 2 is estimated with high confidence to

     be 3 rows (63 bytes).  The estimated time for this step is 0.07

     seconds.

  8) We do an all-AMPs MERGE into xxx.tab_t from Spool 2 (Last

     Use).  The size is estimated with high confidence to be 3 rows.

     The estimated time for this step is 0.32 seconds.

  9) We spoil the parser's dictionary cache for the table.

 10) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> No rows are returned to the user as the result of statement 1.

explain

insert into tab_t

select *

from tab_s1

;insert into tab_t

select *

from tab_s2

;

 *** Help information returned. 26 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct xxx."pseudo table" for write on

     a RowHash to prevent global deadlock for xxx.tab_t.

  2) Next, we lock a distinct xxx."pseudo table" for read on a

     RowHash to prevent global deadlock for xxx.tab_s2.

  3) We lock a distinct xxx."pseudo table" for read on a

     RowHash to prevent global deadlock for xxx.tab_s1.

  4) We lock xxx.tab_t for write, we lock xxx.tab_s2

     for read, and we lock xxx.tab_s1 for read.

  5) We do an all-AMPs RETRIEVE step from xxx.tab_s1 by way of

     an all-rows scan with no residual conditions into Spool 1

     (all_amps), which is built locally on the AMPs.  The size of Spool

     1 is estimated with high confidence to be 2 rows (42 bytes).  The

     estimated time for this step is 0.13 seconds.

  6) We do an all-AMPs RETRIEVE step from xxx.tab_s2 by way of

     an all-rows scan with no residual conditions into Spool 1

     (all_amps), which is built locally on the AMPs.  The size of Spool

     1 is estimated with high confidence to be 1 row (21 bytes).  The

     estimated time for this step is 0.13 seconds.

  7) We do an all-AMPs SORT to order Spool 1 (all_amps) by row hash.

  8) We do an all-AMPs MERGE into xxx.tab_t from Spool 1 (Last

     Use).  The size is estimated with high confidence to be 1 row.

     The estimated time for this step is 0.32 seconds.

  9) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> No rows are returned to the user as the result of statement 1.

     No rows are returned to the user as the result of statement 2.

N/A

Re: UNION Vs MSR

Hi Ulrich,

Thanks.

I want to know one more thing. What will be the impact if the table is not empty?

We can assume that the table is MULTISET with PI.

Re: UNION Vs MSR

In general, inserting into a populated target table will be slower as INSERT SELECT operates on a row-by-row/block-by-block basis. The performance is optimised when the target table is empty and has the same PI as the source table.Also incase of populated table, a WRITE of the inserted row into the transient journal table is required.I believe that MSR performance will be better incase of empty target table.

N/A

Re: UNION Vs MSR

again check your explain.

I added one row to the target table and checked the explains afterwards.


They didn't change in this test case. So also here the MSR would use only one spool file where the union would use two. Which should give the MSR some benefits.


The optimiser estimates 0.58 sec for MSR and 0.65 sec for the union. So optimizer expects about 10% less time for MSR.


As Stefans stated - if the target is not empty you loss the benefit of the fast path insert - which means no journal.


you can also check the MERGE statement to load into the populated table

explain
merge into tab_t as t
using (
select *
from tab_s1
union all
select *
from tab_s2
) as s on t.a = s.a
when not matched then insert (s.a,s.b)
;

The explain shows the usage of two spool files again due to the union. But the final merge might be faster - unfortunatly the explain is not giving any guesses on the timings this time.

Re: UNION Vs MSR

Stalin:

>>"In general, inserting into a populated target table will be slower as INSERT SELECT operates on a row-by-row/block-by-block basis."

INSERT...SELECT will operate block-by-block ONLY if THE TABLE IS EMPTY, otherwise it will operate row-by-row with all the transient journal stuff. This is where the performance optimisation (block-by-block, no tj) takes part.

Cheers.

Carlos.

Re: UNION Vs MSR

Thanks for the correction Carlos:)

N/A

Re: UNION Vs MSR

Thanks everyone for the comments.

Re: UNION Vs MSR

Hi,

Consider this MSR statements. This involves 2 tables customer and orders

Locking table customer for access

Locking table order for access

Sel c.*,o.*

from customer inner join orders on <conditions>

;update orders

set = <Value> where conditions

In this MSR, statement 1 will use access lock on the customer and orders table. Statement 2 uses write lock on orders table. So, Customer table is locked for access and order table is loacked for write.

Please correct me if i am wrong.

N/A

Re: UNION Vs MSR

When you read the Explain you'll easily see that your assumption is correct.

Dieter