Query from view optimizations

I have a table SRC partitioned by PART. On top of it is a view WIDE that does a simple transpose.

I query the WIDE view for selected columns. Everything TD needs is in PART='00' (alias a0), and I expected TD would optimize the query, remove the irrelevant right sides of the joins (aliases a1 and a2), and do a simple select without any joins.

But it doesn't. Is there a way to modify this behavior? Maybe the WIDE view must be defined in another way?

/* SRC table */

PART    KEY    DATA
00      001    10
01      001    15
02      001    20

/* WIDE view */

Create view WIDE as
Select
    a0.KEY  as KEY,
    a0.DATA as DATA0,
    a1.DATA as DATA1,
    a2.DATA as DATA2
from SRC a0
    left join SRC a1 on a0.KEY = a1.KEY
    left join SRC a2 on a0.KEY = a2.KEY
where
    a0.PART = '00' and a1.PART = '01' and a2.PART = '02';

/* queries */

EXPLAIN select * from WIDE;
EXPLAIN select KEY, DATA0 from WIDE;

result 1 (as expected):

EXPLAIN select * from WIDE

Explanation

---------------------------------------------------------------------------

1) First, we execute the following steps in parallel.

a. We do a single-AMP RETRIEVE step from TD.a0 in view WIDE by way of the primary index "TD.a0 in view WIDE.PART = '00'" with a residual condition of ("(TD.a0 in view WIDE.PART = '00') AND (NOT (TD.a0 in view WIDE.KEY IS NULL ))") into Spool 3 (one-amp), which is redistributed by the hash code of ('01') to all AMPs.  The size of Spool 3 is estimated with low confidence to be 2 rows (52 bytes). The estimated time for this step is 0.02 seconds.

b. We do a single-AMP RETRIEVE step from TD.a2 in view WIDE by way of the primary index "TD.a2 in view WIDE.PART = '02'" with a residual condition of ("(TD.a2 in view WIDE.PART = '02') AND (NOT (TD.a2 in view WIDE.KEY IS NULL ))") into Spool 4 (all_amps), which is redistributed by the hash code of (TD.a2.KEY, TD.a2.KEY) to all  AMPs.  Then we do a SORT to order Spool 4 by row hash.  The size of Spool 4 is estimated with low confidence to be 2 rows (50 bytes).  The estimated time for this step is 0.00 seconds.

2) Next, we do a single-AMP JOIN step from Spool 3 (Last Use) by way of an all-rows scan, which is joined to TD.a1 in view WIDE by way of the primary index "TD.a1 in view WIDE.PART = '01'" with a residual condition of ("TD.a1 in view WIDE.PART = '01'").  Spool 3 and TD.a1 are joined using a product join, with a join condition of ("KEY = TD.a1.KEY").  The result goes into Spool 5 (all_amps), which is redistributed by the hash code of (TD.a0.KEY, TD.a1.KEY) to all AMPs.  Then we do a SORT to order Spool 5 by row hash.  The size of Spool 5 is estimated with no confidence to be 3 rows (105 bytes).  The estimated time for this step is 0.01 seconds.

3) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to Spool 5 (Last Use) by way of a RowHash match scan.  Spool 4 and Spool 5 are joined using a merge join, with a join condition of ("(KEY = KEY) AND (KEY = KEY)").  The result goes into Spool 2 (all_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with no confidence to be 4 rows (360 bytes).  The estimated time for this step is 0.04 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

 -> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 0.06 seconds.

result 2 (not optimized in my opinion):

EXPLAIN select KEY, DATA0 from WIDE

Explanation

---------------------------------------------------------------------------

1) First, we execute the following steps in parallel.

a. We do a single-AMP RETRIEVE step from TD.a1 in view WIDE by way of the primary index "TD.a1 in view WIDE.PART = '01'" with a residual condition of ("(TD.a1 in view WIDE.PART = '01') AND (NOT (TD.a1 in view WIDE.KEY IS NULL ))") into Spool 3 (one-amp), which is redistributed by the hash code of ('02') to all AMPs.  The size of Spool 3 is estimated with low confidence to be 2 rows (36 bytes). The estimated time for this step is 0.02 seconds.

b. We do a single-AMP RETRIEVE step from TD.a0 in view WIDE by way of the primary index "TD.a0 in view WIDE.PART = '00'" with a residual condition of ("(TD.a0 in view WIDE.PART = '00') AND (NOT (TD.a0 in view WIDE.KEY IS NULL ))") into Spool 4 (all_amps), which is redistributed by the hash code of (TD.a0.KEY, TD.a0.KEY) to all AMPs.  Then we do a SORT to order Spool 4 by row hash.  The size of Spool 4 is estimated with low confidence to be 2 rows (50 bytes).  The estimated time for this step is 0.00 seconds.

2) Next, we do a single-AMP JOIN step from Spool 3 (Last Use) by way of an all-rows scan, which is joined to TD.a2 in view WIDE by way of the primary index "TD.a2 in view WIDE.PART = '02'" with a residual condition of ("TD.a2 in view WIDE.PART = '02'").  Spool 3 and TD.a2 are joined using a product join, with a join condition of ("KEY = TD.a2.KEY").  The result goes into Spool 5 (all_amps), which is redistributed by the hash code of (TD.a1.KEY, TD.a2.KEY) to all AMPs.  Then we do a SORT to order Spool 5 by row hash.  The size of Spool 5 is estimated with no confidence to be 3 rows (57 bytes).  The estimated time for this step is 0.01 seconds.

3) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to Spool 5 (Last Use) by way of a RowHash match scan.  Spool 4 and Spool 5 are joined using a merge join, with a join condition of ("(KEY = KEY) AND (KEY = KEY)").  The result goes into Spool 2 (all_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with no confidence to be 4 rows (184 bytes).  The estimated time for this step is 0.04 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

-> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 0.06 seconds.


Re: Query from view optimizations

The optimizer might do a join elimination when it doesn't change the result set.

There are two issues with your approach:

#1: It must be a join to a UNIQUE/PK column.

In your case this is not known by the optimizer (indicated by the increasing estimated number of rows in each step).

#2: There must be an Outer Join or a Foreign Key.

Your view results in an Inner Join due to the WHERE conditions on a1.PART and a2.PART: any NULL-extended row from the outer joins fails those predicates, so the optimizer treats them as inner joins.

To fix it you should define (PART, KEY) as UNIQUE and move a1.PART='01' and a2.PART='02' into the ON condition.
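For example, a minimal sketch combining both changes (untested, using the names from your view):

create table SRC (
    PART varchar(2) not null,
    KEY  varchar(3) not null,
    DATA float
) unique primary index (PART, KEY);  -- (PART, KEY) is now known to be unique

replace view WIDE as
select
    a0.KEY  as KEY,
    a0.DATA as DATA0,
    a1.DATA as DATA1,
    a2.DATA as DATA2
from SRC a0
    left join SRC a1 on a0.KEY = a1.KEY and a1.PART = '01'  -- moved into ON: the join stays outer
    left join SRC a2 on a0.KEY = a2.KEY and a2.PART = '02'
where a0.PART = '00';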

Dieter

Re: Query from view optimizations

Thanks for the suggestions.

I made some modifications, but TD still tries to do the joins. Any ideas?

I created SRC table using:

  create table SRC (PART varchar(2) not null, ID varchar(3) not null, DATA float) primary index (PART, ID);

view:

Create view WIDE as
Select
    a0.ID   as ID,
    a0.DATA as DATA0,
    a1.DATA as DATA1,
    a2.DATA as DATA2
from SRC a0
    left join SRC a1 on a0.ID = a1.ID and a0.PART = '00' and a1.PART = '01'
    left join SRC a2 on a0.ID = a2.ID and a0.PART = '00' and a2.PART = '02';


explain select id,data0 from wide;

 *** Help information returned. 34 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct SAS."pseudo table" for read on a RowHash to prevent global deadlock for SAS.a0.

  2) Next, we lock SAS.a0 in view wide for read.

  3) We do an all-AMPs RETRIEVE step from SAS.a0 in view wide by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '01') to all AMPs.  Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with low confidence to be 2 rows (54 bytes).  The estimated time for this step is 0.01 seconds.

  4) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a RowHash match scan, which is joined to SAS.a1 in view wide by way of a RowHash match scan with a condition of ("SAS.a1 in view wide.PART = '01'").  Spool 3 and SAS.a1 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '00'"), with a join condition of ("ID = SAS.a1.ID").  The result goes into Spool 4 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '02') to all AMPs. Then we do a SORT to order Spool 4 by row hash.  The size of Spool 4 is estimated with no confidence to be 2 rows (54 bytes).  The estimated time for this step is 0.03 seconds.

  5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to SAS.a2 in view wide by way of a RowHash match scan with a condition of ("SAS.a2 in view wide.PART = '02'").  Spool 4 and SAS.a2 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '00'"), with a join condition of ("ID = SAS.a2.ID").  The result goes into Spool 2 (group_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with no confidence to be 3 rows (138 bytes).  The estimated time for this step is 0.05 seconds.

  6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

  -> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 0.09 seconds.


Re: Query from view optimizations

Generally, in real life, tables like SRC have hundreds of columns and millions of rows per month (the PART column). The data must be prepared (transposed, the same way the WIDE view does it) before being processed by another procedure, which needs values from many months in one row. Typically we touch 6, sometimes 12 months. Sometimes there is more than one SRC table, so if there are eight SRC tables with 12 months each, the WIDE view will join 96 partitions. If the DB fetches the first part, joins the second, then joins the third to the result, then another, and so on, it will be very inefficient. The best way is to sort and merge all the tables in one pass.

Maybe a view is not a good option; what do you think?


Re: Query from view optimizations

You only applied 50% of my suggestion :-)

UNIQUE primary index (PART, ID)
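I.e. the table has to be created like this (minimal sketch):

create table SRC (
    PART varchar(2) not null,
    ID   varchar(3) not null,
    DATA float
) unique primary index (PART, ID);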

Dieter

Re: Query from view optimizations

My fault ;)

It looks much better :)

But...

As I wrote in my other comment, typically I will join 6 or more partitions. Here is an example view:

Create view WIDE as
Select
    a0.ID    as ID,
    a0.DATA  as DATA0,
    a1.DATA  as DATA1,
    a2.DATA  as DATA2,
    a3.DATA  as DATA3,
    a4.DATA  as DATA4,
    a5.DATA  as DATA5,
    a6.DATA  as DATA6,
    a7.DATA  as DATA7,
    a8.DATA  as DATA8,
    a9.DATA  as DATA9,
    a10.DATA as DATA10,
    a11.DATA as DATA11
from SRC a0
    left join SRC a1  on a0.ID = a1.ID  and a0.PART = '01' and a1.PART  = '02'
    left join SRC a2  on a0.ID = a2.ID  and a0.PART = '01' and a2.PART  = '03'
    left join SRC a3  on a0.ID = a3.ID  and a0.PART = '01' and a3.PART  = '04'
    left join SRC a4  on a0.ID = a4.ID  and a0.PART = '01' and a4.PART  = '05'
    left join SRC a5  on a0.ID = a5.ID  and a0.PART = '01' and a5.PART  = '06'
    left join SRC a6  on a0.ID = a6.ID  and a0.PART = '01' and a6.PART  = '07'
    left join SRC a7  on a0.ID = a7.ID  and a0.PART = '01' and a7.PART  = '08'
    left join SRC a8  on a0.ID = a8.ID  and a0.PART = '01' and a8.PART  = '09'
    left join SRC a9  on a0.ID = a9.ID  and a0.PART = '01' and a9.PART  = '10'
    left join SRC a10 on a0.ID = a10.ID and a0.PART = '01' and a10.PART = '11'
    left join SRC a11 on a0.ID = a11.ID and a0.PART = '01' and a11.PART = '12'
where a0.PART = '01';

commit;

TD can efficiently remove the unneeded references, but in my opinion the join method is still not efficient. TD wants to join table by table. It is a very simple star schema, and I think TD should join everything in one step (parallel sort, then merge).

I tried this scenario in Oracle and had similar problems. It had a significant impact (ten or more times slower than expected).

Explanation (1)

---------------------------------------------------------------------------

  1) First, we lock a distinct SAS."pseudo table" for read on a RowHash to prevent global deadlock for SAS.a0.

  2) Next, we lock SAS.a0 in view wide for read.

  3) We do an all-AMPs RETRIEVE step from SAS.a0 in view wide by way of an all-rows scan with no residual conditions into Spool 2 (group_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with low confidence to be 2 rows (92 bytes). The estimated time for this step is 0.03 seconds.

  4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

-> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 0.03 seconds.

explain select * from wide;

 *** Help information returned. 134 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct SAS."pseudo table" for read on a RowHash to prevent global deadlock for SAS.a0.

  2) Next, we lock SAS.a0 in view wide for read.

  3) We do an all-AMPs RETRIEVE step from SAS.a0 in view wide by way of an all-rows scan with a condition of ("SAS.a0 in view wide.PART = '01'") into Spool 3 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '02') to all AMPs.  Then we do a SORT to order Spool 3 by row hash.  The size of Spool 3 is estimated with high confidence to be 2,000 rows (56,000 bytes).  The estimated time for this step is 0.07 seconds.

  4) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a RowHash match scan, which is joined to SAS.a1 in view wide by way of a RowHash match scan with a condition of ("SAS.a1 in view wide.PART = '02'").  Spool 3 and SAS.a1 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a1.ID").  The result goes into Spool 4 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '05') to all AMPs. Then we do a SORT to order Spool 4 by row hash.  The size of Spool 4 is estimated with low confidence to be 2,000 rows (72,000 bytes). The estimated time for this step is 0.09 seconds.

  5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to SAS.a4 in view wide by way of a RowHash match scan with a condition of ("SAS.a4 in view wide.PART = '05'").  Spool 4 and SAS.a4 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a4.ID").  The result goes into Spool 5 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '06') to all AMPs. Then we do a SORT to order Spool 5 by row hash.  The size of Spool 5 is estimated with low confidence to be 2,000 rows (88,000 bytes). The estimated time for this step is 0.09 seconds.

  6) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a RowHash match scan, which is joined to SAS.a5 in view wide by way of a RowHash match scan with a condition of ("SAS.a5 in view wide.PART = '06'").  Spool 5 and SAS.a5 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a5.ID").  The result goes into Spool 6 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '07') to all AMPs. Then we do a SORT to order Spool 6 by row hash.  The size of Spool 6 is estimated with low confidence to be 2,000 rows (104,000 bytes).  The estimated time for this step is 0.09 seconds.

  7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a RowHash match scan, which is joined to SAS.a6 in view wide by way of a RowHash match scan with a condition of ("SAS.a6 in view wide.PART = '07'").  Spool 6 and SAS.a6 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a6.ID").  The result goes into Spool 7 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '08') to all AMPs. Then we do a SORT to order Spool 7 by row hash.  The size of Spool 7 is estimated with low confidence to be 2,000 rows (120,000 bytes).  The estimated time for this step is 0.09 seconds.

  8) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of a RowHash match scan, which is joined to SAS.a7 in view wide by way of a RowHash match scan with a condition of ("SAS.a7 in view wide.PART = '08'").  Spool 7 and SAS.a7 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a7.ID").  The result goes into Spool 8 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '09') to all AMPs. Then we do a SORT to order Spool 8 by row hash.  The size of Spool 8 is estimated with low confidence to be 2,000 rows (136,000 bytes).  The estimated time for this step is 0.09 seconds.

  9) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of a RowHash match scan, which is joined to SAS.a8 in view wide by way of a RowHash match scan with a condition of ("SAS.a8 in view wide.PART = '09'").  Spool 8 and SAS.a8 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a8.ID").  The result goes into Spool 9 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '11') to all AMPs. Then we do a SORT to order Spool 9 by row hash.  The size of Spool 9 is estimated with low confidence to be 2,000 rows (152,000 bytes).  The estimated time for this step is 0.09 seconds.

  10) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of a RowHash match scan, which is joined to SAS.a10 in view wide by way of a RowHash match scan with a condition of ("SAS.a10 in view wide.PART = '11'").  Spool 9 and SAS.a10 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a10.ID").  The result goes into Spool 10 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '03') to all AMPs. Then we do a SORT to order Spool 10 by row hash.  The size of Spool 10 is estimated with low confidence to be 2,000 rows ( 168,000 bytes).  The estimated time for this step is 0.09 seconds.

  11) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a RowHash match scan, which is joined to SAS.a2 in view wide by way of a RowHash match scan with a condition of ("SAS.a2 in view wide.PART = '03'").  Spool 10 and SAS.a2 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a2.ID").  The result goes into Spool 11 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '04') to all AMPs. Then we do a SORT to order Spool 11 by row hash.  The size of Spool 11 is estimated with low confidence to be 2,009 rows ( 184,828 bytes).  The estimated time for this step is 0.09 seconds.

  12) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of a RowHash match scan, which is joined to SAS.a3 in view wide by way of a RowHash match scan with a condition of ("SAS.a3 in view wide.PART = '04'").  Spool 11 and SAS.a3 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a3.ID").  The result goes into Spool 12 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '10') to all AMPs. Then we do a SORT to order Spool 12 by row hash.  The size of Spool 12 is estimated with low confidence to be 2,053 rows ( 205,300 bytes).  The estimated time for this step is 0.10 seconds.

  13) We do an all-AMPs JOIN step from Spool 12 (Last Use) by way of a RowHash match scan, which is joined to SAS.a9 in view wide by way of a RowHash match scan with a condition of ("SAS.a9 in view wide.PART = '10'").  Spool 12 and SAS.a9 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a9.ID").  The result goes into Spool 13 (all_amps), which is redistributed by the hash code of (SAS.a0.ID, '12') to all AMPs. Then we do a SORT to order Spool 13 by row hash.  The size of Spool 13 is estimated with low confidence to be 2,125 rows ( 229,500 bytes).  The estimated time for this step is 0.10 seconds.

  14) We do an all-AMPs JOIN step from Spool 13 (Last Use) by way of a RowHash match scan, which is joined to SAS.a11 in view wide by way of a RowHash match scan with a condition of ("SAS.a11 in view wide.PART = '12'").  Spool 13 and SAS.a11 are left outer joined using a merge join, with condition(s) used for non-matching on left table ("PART = '01'"), with a join condition of ("ID = SAS.a11.ID").  The result goes into Spool 2 (group_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with low confidence to be 2,211 rows (643,401 bytes).  The estimated time for this step is 0.11 seconds.

  15) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

  -> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 1.09 seconds.



Re: Query from view optimizations

I doubt you'll find any RDBMS which is capable of joining more than two sets of data at the same time.

Getting both join elimination and fast processing at the same time is probably not possible. If you change the PI to ID, all the joins should be AMP-local, but then you can't define (PART, ID) as unique (unless you add a USI).
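For illustration, a sketch of that variant (untested): the NUPI on ID keeps the joins AMP-local, and a separate USI still tells the optimizer that (PART, ID) is unique:

create table SRC (
    PART varchar(2) not null,
    ID   varchar(3) not null,
    DATA float
) primary index (ID);                    -- NUPI: rows with the same ID land on the same AMP

create unique index (PART, ID) on SRC;   -- USI: declares (PART, ID) unique to the optimizer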

Instead of joining the same table multiple times you could also use some CASE/GROUP BY logic (again based on ID as NUPI leading to AMP-local aggregation):

select
    ID,
    max(case when PART = '01' then Data end) as Data0,
    max(case when PART = '02' then Data end) as Data1,
    max(case when PART = '03' then Data end) as Data2,
    max(case when PART = '04' then Data end) as Data3,
    ...
from SRC
group by 1

Dieter

Re: Query from view optimizations

Your solution looks nice :)

Extending the question: at this time I can't run tests, so what do you think: if there are SRC0-SRC9 tables (all with 12 parts), which option will be better?

first:

select ID,
    max(case when src='0' and PART='01' then DATA end) as SRC0_DATA0,
    max(case when src='0' and PART='02' then DATA end) as SRC0_DATA1,
    max(case when src='0' and PART='03' then DATA end) as SRC0_DATA2,
    max(case when src='1' and PART='01' then DATA end) as SRC1_DATA0,
    max(case when src='1' and PART='02' then DATA end) as SRC1_DATA1,
    max(case when src='1' and PART='03' then DATA end) as SRC1_DATA2,
    max(case when src='2' and PART='01' then DATA end) as SRC2_DATA0,
    max(case when src='2' and PART='02' then DATA end) as SRC2_DATA1,
    max(case when src='2' and PART='03' then DATA end) as SRC2_DATA2
from (
    select '0' as SRC, SRC0.* from SRC0
    union all
    select '1' as SRC, SRC1.* from SRC1
    union all
    select '2' as SRC, SRC2.* from SRC2
) t
group by 1

second:

select ID,
    max(case when src='0' then DATA0 end) as SRC0_DATA0,
    max(case when src='0' then DATA1 end) as SRC0_DATA1,
    max(case when src='0' then DATA2 end) as SRC0_DATA2,
    max(case when src='1' then DATA0 end) as SRC1_DATA0,
    max(case when src='1' then DATA1 end) as SRC1_DATA1,
    max(case when src='1' then DATA2 end) as SRC1_DATA2,
    max(case when src='2' then DATA0 end) as SRC2_DATA0,
    max(case when src='2' then DATA1 end) as SRC2_DATA1,
    max(case when src='2' then DATA2 end) as SRC2_DATA2
from (
    select '0' as SRC, ID,
        max(case when PART='01' then DATA end) as DATA0,
        max(case when PART='02' then DATA end) as DATA1,
        max(case when PART='03' then DATA end) as DATA2
    from SRC0
    group by 1,2
    union all
    select '1' as SRC, ID,
        max(case when PART='01' then DATA end) as DATA0,
        max(case when PART='02' then DATA end) as DATA1,
        max(case when PART='03' then DATA end) as DATA2
    from SRC1
    group by 1,2
    union all
    select '2' as SRC, ID,
        max(case when PART='01' then DATA end) as DATA0,
        max(case when PART='02' then DATA end) as DATA1,
        max(case when PART='03' then DATA end) as DATA2
    from SRC2
    group by 1,2
) t
group by 1

A third option is to left join the per-table CASE/GROUP BY aggregations; see the sketch below.
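A minimal sketch of that third option (hypothetical tables SRC0/SRC1; untested):

select
    a.ID,
    a.DATA0 as SRC0_DATA0, a.DATA1 as SRC0_DATA1, a.DATA2 as SRC0_DATA2,
    b.DATA0 as SRC1_DATA0, b.DATA1 as SRC1_DATA1, b.DATA2 as SRC1_DATA2
from (
    select ID,
        max(case when PART='01' then DATA end) as DATA0,
        max(case when PART='02' then DATA end) as DATA1,
        max(case when PART='03' then DATA end) as DATA2
    from SRC0
    group by 1
) a
left join (
    select ID,
        max(case when PART='01' then DATA end) as DATA0,
        max(case when PART='02' then DATA end) as DATA1,
        max(case when PART='03' then DATA end) as DATA2
    from SRC1
    group by 1
) b
on a.ID = b.ID;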

--

The view mentioned at the beginning, rewritten with this approach:

Create view WIDE2 as
Select
    ID,
    max(case when PART='01' then Data end) as DATA0,
    max(case when PART='02' then Data end) as DATA1,
    max(case when PART='03' then Data end) as DATA2,
    max(case when PART='04' then Data end) as DATA3,
    max(case when PART='05' then Data end) as DATA4,
    max(case when PART='06' then Data end) as DATA5,
    max(case when PART='07' then Data end) as DATA6,
    max(case when PART='08' then Data end) as DATA7,
    max(case when PART='09' then Data end) as DATA8,
    max(case when PART='10' then Data end) as DATA9,
    max(case when PART='11' then Data end) as DATA10,
    max(case when PART='12' then Data end) as DATA11
from SRC
group by ID;

explain select * from wide2;

 *** Help information returned. 18 rows.

 *** Total elapsed time was 1 second.

Explanation

---------------------------------------------------------------------------

  1) First, we lock a distinct SAS."pseudo table" for read on a RowHash to prevent global deadlock for SAS.a0.

  2) Next, we lock SAS.a0 in view wide2 for read.

  3) We do an all-AMPs SUM step to aggregate from SAS.a0 in view wide2 by way of an all-rows scan with no residual conditions , grouping by field1 ( SAS.SRC.ID).  Aggregate Intermediate Results are computed globally, then placed in Spool 4.  The size of Spool 4 is estimated with high confidence to be 2,000 rows (234,000 bytes).  The estimated time for this step is 0.17 seconds.

  4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 2 (group_amps), which is built locally on the AMPs.  The size of Spool 2 is estimated with high confidence to be 2,000 rows (582,000 bytes).  The estimated time for this step is 0.04 seconds.

  5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

  -> The contents of Spool 2 are sent back to the user as the result of statement 1.  The total estimated time is 0.22 seconds.


Re: Query from view optimizations

This is hard to tell without testing. If it was a smaller table I'd go with #1 (as long as everything is AMP-local).

It will mainly depend on the number of columns returned (you were talking about "hundreds") and the data size. You need to run some tests with your actual data and check Explain/DBQL.

You might also hit some other limits, like the maximum number of columns in a table or the maximum row size; I usually don't care about those, for me they're large enough. But SAS guys try to run strange things on a DBMS :-)

Dieter

Re: Query from view optimizations

Oracle has a limit of up to 1k columns; what is the TD limit?

"But SAS guys try to run strange things on a DBMS :-)"

Heh, if you say so. Do you have any experience with SAS? :)