SQL statements in UNION executes serial or parallel

Database
Highlighted
KVB
Enthusiast

SQL statements in UNION executes serial or parallel

Hi,

Does SQL statements in UNION executes serial or parallel?

7 REPLIES
Enthusiast

Re: SQL statements in UNION executes serial or parallel

Hi bikky,

Not sure from which context you are looking. However, I ran an explain test:


select databasename,tablename from dbc.tables where databasename <> 'dbc'

union 

select databasename,tablename from dbc.tables where databasename <> 'test'

The result is parallel:

 1) First, we lock DBC.dbase in view tables for access, and we lock

     DBC.tvm in view tables for access.

  2) Next, we execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step from DBC.tvm in view tables

          by way of an all-rows scan with a condition of ("((DBC.tvm in

          view tables.TVMId > '00C001000000'XB) AND (DBC.tvm in view

          tables.TVMId < '00C002000000'XB)) OR ((DBC.tvm in view

          tables.TVMId < '00C001000000'XB) OR (((DBC.tvm in view

          tables.TVMId > '00C002000000'XB) AND (DBC.tvm in view

          tables.TVMId < '00C009000000'XB)) OR (((DBC.tvm in view

          tables.TVMId > '00C009000000'XB) AND (DBC.tvm in view

          tables.TVMId < '00C010000000'XB)) OR (((DBC.tvm in view

          tables.TVMId > '00C010000000'XB) AND (DBC.tvm in view

          tables.TVMId < '00C017000000'XB)) OR (DBC.tvm in view

          tables.TVMId > '00C017000000'XB)))))") into Spool 2

          (all_amps), which is redistributed by the hash code of (

          DBC.tvm.DatabaseId) to all AMPs.  Then we do a SORT to order

          Spool 2 by row hash.  The size of Spool 2 is estimated with

          low confidence to be 186,192 rows (19,363,968 bytes).  The

          estimated time for this step is 0.22 seconds.

       2) We do an all-AMPs RETRIEVE step from DBC.dbase in view tables

          by way of an all-rows scan with a condition of (

          "(TRANSLATE((DBC.dbase in view tables.DatabaseName )USING

          UNICODE_TO_LOCALE WITH ERROR )(CHAR(30), CHARACTER SET LATIN,

          NOT CASESPECIFIC))<> 'dbc '") into Spool 3 (all_amps), which

          is redistributed by the hash code of (DBC.dbase.DatabaseId)

          to all AMPs.  Then we do a SORT to order Spool 3 by row hash.

          The size of Spool 3 is estimated with no confidence to be

          4,277 rows (444,808 bytes).  The estimated time for this step

          is 0.02 seconds.

  3) We execute the following steps in parallel.

       1) We do an all-AMPs JOIN step from Spool 2 by way of a RowHash

          match scan, which is joined to Spool 3 (Last Use) by way of a

          RowHash match scan.  Spool 2 and Spool 3 are joined using a

          merge join, with a join condition of ("DatabaseId =

          DatabaseId").  The result goes into Spool 1 (group_amps),

          which is redistributed by the hash code of (

          TRANSLATE((DBC.tvm.TVMName )USING UNICODE_TO_LOCALE WITH

          ERROR )(CHAR(30), CHARACTER SET LATIN, NOT CASESPECIFIC),

          TRANSLATE((DBC.dbase.DatabaseName )USING UNICODE_TO_LOCALE

          WITH ERROR )(CHAR(30), CHARACTER SET LATIN, NOT CASESPECIFIC))

          to all AMPs.  The size of Spool 1 is estimated with no

          confidence to be 167,581 rows (33,013,457 bytes).  The

          estimated time for this step is 0.13 seconds.

       2) We do an all-AMPs RETRIEVE step from DBC.dbase in view tables

          by way of an all-rows scan with a condition of (

          "(TRANSLATE((DBC.dbase in view tables.DatabaseName )USING

          UNICODE_TO_LOCALE WITH ERROR )(CHAR(30), CHARACTER SET LATIN,

          NOT CASESPECIFIC))<> 'test '") into Spool 5 (all_amps), which

          is redistributed by the hash code of (DBC.dbase.DatabaseId)

          to all AMPs.  Then we do a SORT to order Spool 5 by row hash.

          The size of Spool 5 is estimated with no confidence to be

          4,277 rows (444,808 bytes).  The estimated time for this step

          is 0.02 seconds.

  4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a

     RowHash match scan, which is joined to Spool 5 (Last Use) by way

     of a RowHash match scan.  Spool 2 and Spool 5 are joined using a

     merge join, with a join condition of ("DatabaseId = DatabaseId").

     The result goes into Spool 1 (group_amps), which is redistributed

     by the hash code of (TRANSLATE((DBC.tvm.TVMName )USING

     UNICODE_TO_LOCALE WITH ERROR )(CHAR(30), CHARACTER SET LATIN, NOT

     CASESPECIFIC), TRANSLATE((DBC.dbase.DatabaseName )USING

     UNICODE_TO_LOCALE WITH ERROR )(CHAR(30), CHARACTER SET LATIN, NOT

     CASESPECIFIC)) to all AMPs.  Then we do a SORT to order Spool 1 by

     the sort key in spool field1 eliminating duplicate rows.  The size

     of Spool 1 is estimated with no confidence to be 204,240 rows (

     40,235,280 bytes).  The estimated time for this step is 0.15

     seconds.

  5) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.51 seconds.

Is this what you want to confirm?

Cheers,

Enthusiast

Re: SQL statements in UNION executes serial or parallel

Bikky,

It is a decision made by the optimizer based on several factors. Optimizer decides whether to execute statements sequentailly or parallel. 

the statements independent of each other are mostly executed in parallel, and the statements which use common objects will be executed in serial.

For example if your union statements are selecting from different tables, then optimizer can go for parallel, but if the same table is being used in multiple slects then there is chance for sequential execution.

Khurram
KVB
Enthusiast

Re: SQL statements in UNION executes serial or parallel

Hi Raja,

I understood the explanation.Thanks for that.

 Just an analogy explained below.

Suppose I want to extract the rows based on 1 to 10000 id's.If a write a single sql statement then it's going to take a long time.So Ihave written multiple SQL stmts using UNION like

select * from table where id between 1 and 1000

union

select * from table where id between 1001 and 2000

union

...so on

Here My question is all the SQL statements submit to the server in parallel and get the results set in parallel.

Will it save time better than the first SQL stmt?

Bikky.

Junior Contributor

Re: SQL statements in UNION executes serial or parallel

Hi Bikky,

a UNION needs to insert both results into the same spool -> the individual SELECT might run in parallel (if they need multiple steps due to joins, etc), but the final inserts are always processed serially.

Splitting this query returning 10000 ids into multiple queries plus UNION (or better UNION ALL to avoid the DISTINCT processing) will never be faster.

How many rows are in this table, what's the PI and partitioning and what's your actual query and why do you think it's slow?

KVB
Enthusiast

Re: SQL statements in UNION executes serial or parallel

Thanks Dieter.I just want to know the background of this UNION and it's got clarified.

Enthusiast

Re: SQL statements in UNION executes serial or parallel

Here's a variation where it would be great all if the select's of a union all executed in parallel and the resulting spools were combined sequentially at the end... 

    select count(*) as Table1_Count from Table 1 where key = 123 union all

    select count(*) as Table2_Count from Table 2 where key = 123 union all

    select count(*)  as Table3_Count from Table 3 where key = 123

Instead we (only) see parallel behavior for the pairs of: the spool from the previous select getting combined to a common spool while the current select runs.  Is there a way to indicate to the query optimizer that union all select's should be run in parallel?

The EXPLAIN...

3) We execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by

          way of an all-rows scan into Spool 1 (all_amps), which is

          built locally on the AMPs.  The size of Spool 1 is estimated

          with high confidence to be 1 row (83 bytes).  The estimated

          time for this step is 0.00 seconds.

       2) We do an all-AMPs SUM step to aggregate from a single

          partition of db.table with a condition

          of ("db.table2.key = 6140") with a

          residual condition of ("db.table2.key = 6140") , grouping by field1 (

          db.table2).  Aggregate

          Intermediate Results are computed globally, then placed in

          Spool 6.  The size of Spool 6 is estimated with high

          confidence to be 1 row (29 bytes).  The estimated time for

          this step is 0.02 seconds.

Teradata Employee

Re: SQL statements in UNION executes serial or parallel

There is not such an option. It would not work for three reasons under current optimizer/execution engine rules.

- Currently the engine is not able to write to the same spool file from multiple concurrent steps. This means that the write to the union spool steps cannot be executed in parallel.

- Currently we do not dispatch more than 2 all-AMP steps concurrently regardless of how many concurrent steps are shown in the explain (more single/few AMP steps can be run in parallel). Even if we grouped all the select steps together we would only execute two of them at a time, and then we would have all the steps for writing into the final spool to do sequentially. Running more steps sequentially would allow a single query to significantly over use resources relative to other queries on the system.

- The algorithm for parallel steps currently does not allow for reordering of steps in the plan. And as soon as there is a dependency like the use of the spool produced from the previous step, then there has to be a parallel step break. Without reordering, the step for writing into the union spool is going to be a parallel step break from the previous steep tha makes the spool.