The optimizer might perform join elimination when it doesn't change the result set.
There are two issues with your approach:
#1: It must be a join to a UNIQUE/PK column.
In your case this is not known to the optimizer (indicated by the increasing estimated number of rows in each step).
#2: There must be an Outer Join or a Foreign Key.
Your view results in an Inner Join due to the WHERE condition on a1.PART and a2.PART.
To fix it you should define (PART, KEY) as UNIQUE and move a1.PART='01' and a2.PART='02' into the ON condition.
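A minimal sketch of that rewrite; the DDL isn't shown in this thread, so SRC, PART, KEY and Data come from the discussion above and the details are assumptions:

-- a USI tells the optimizer each partition has at most one row per KEY
-- ("KEY" is double-quoted because KEY is a Teradata keyword):
create unique index (PART, "KEY") on SRC;

-- outer join with the partition predicate moved into the ON clause,
-- so the join to a2 can be eliminated when Data1 is not referenced:
replace view WIDE as
select a1."KEY", a1.Data as Data0, a2.Data as Data1
from SRC a1
left join SRC a2
  on a2."KEY" = a1."KEY"
 and a2.PART = '02'
where a1.PART = '01';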
Thanks for the suggestions.
I made some modifications, but TD still tries to do the joins. Any ideas?
I created the SRC table using:
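(simplified sketch; the real table has many more columns, and the types and primary index here are assumptions:)

create table SRC (
   PART char(2) not null,  -- month indicator: '01', '02', ...
   ID   integer not null,  -- join key
   Data integer            -- the real table carries hundreds of such columns
)
primary index (PART, ID);  -- assumed PI; see below about moving it to ID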
Generally, in real life, tables called SRC have hundreds of columns and millions of rows per month (the PART column). They must be prepared (transposed, the same way the WIDE view does it) to be processed by another procedure, which needs values from many months in one row. Typically we touch 6, sometimes 12 months. Sometimes there is more than one SRC table, so if there are eight SRC tables with 12 months each, the WIDE view will join 96 partitions. If the DB gets the first part, joins the second, then joins the third to the result, then another, again and again, it will be very inefficient. The best way is to sort and merge all the tables in one pass.
Maybe a view is not a good option, what do you think?
my fault ;)
looks much better :)
In my next comment I wrote that typically I will join 6 or more partitions. Here is an example view:
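(a sketch along the lines discussed above; one self-join per month, column names as before:)

replace view WIDE as
select a1.ID,
       a1.Data as Data0, a2.Data as Data1, a3.Data as Data2,
       a4.Data as Data3, a5.Data as Data4, a6.Data as Data5
from SRC a1
left join SRC a2 on a2.ID = a1.ID and a2.PART = '02'
left join SRC a3 on a3.ID = a1.ID and a3.PART = '03'
left join SRC a4 on a4.ID = a1.ID and a4.PART = '04'
left join SRC a5 on a5.ID = a1.ID and a5.PART = '05'
left join SRC a6 on a6.ID = a1.ID and a6.PART = '06'
where a1.PART = '01';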
TD can efficiently remove the unimportant references, but in my opinion the join method is still not efficient. TD wants to join table by table. It is a very simple star schema, and I think TD should join everything in one step (parallel sort, then merge).
I tried this scenario in Oracle and had similar problems. It had a significant impact (ten or more times slower than expected).
I doubt you'll find any RDBMS which is capable of joining more than two sets of data at the same time.
Getting both join elimination and fast processing at the same time is probably not possible. If you change the PI to ID, all the joins should be AMP-local, but then you can't define (PART, ID) as unique (unless you add a USI).
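A sketch of that layout (hypothetical DDL, same assumed columns as before):

create table SRC (
   PART char(2) not null,
   ID   integer not null,
   Data integer
)
primary index (ID),       -- NUPI: the same ID always hashes to the same AMP
unique index (PART, ID);  -- USI: restores the uniqueness needed for join elimination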
Instead of joining the same table multiple times you could also use some CASE/GROUP BY logic (again based on ID as NUPI leading to AMP-local aggregation):
select
   ID,
   max(case when PART = '01' then Data end) as Data0,
   max(case when PART = '02' then Data end) as Data1,
   max(case when PART = '03' then Data end) as Data2,
   max(case when PART = '04' then Data end) as Data3
from SRC   -- table name as used earlier in the thread
group by 1;
Your solution looks nice :)
Extending the question: at this time I can't run tests, so I can only ask. What do you think: if there are SRC0-SRC9 tables (all with 12 parts each), which option will be better? #1 is the view mentioned at the beginning (the multi-way join); #2 is the CASE/GROUP BY query.
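For #2 with multiple tables I imagine something like this (a sketch with only two tables and two parts; the real query would extend the pattern up to SRC9 and part '12'):

select ID,
       max(case when src = 0 and PART = '01' then Data end) as T0_Data0,
       max(case when src = 0 and PART = '02' then Data end) as T0_Data1,
       max(case when src = 1 and PART = '01' then Data end) as T1_Data0,
       max(case when src = 1 and PART = '02' then Data end) as T1_Data1
from (
   select 0 as src, ID, PART, Data from SRC0
   union all
   select 1, ID, PART, Data from SRC1
) t
group by 1;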
This is hard to tell without testing. If it were a smaller table I'd go with #1 (as long as everything is AMP-local).
It will mainly depend on the number of columns returned (you were talking about "hundreds") and the data size. You need to run some tests with your actual data and check EXPLAIN/DBQL.
You might also hit some other limits like the maximum number of columns in a table or the maximum row size. I usually don't care about those; for me they're large enough. But SAS guys try to run strange things on a DBMS :-)
Oracle has a limit of up to 1k columns; what is the TD limit?
> But SAS guys try to run strange things on a DBMS :-)
If you say so, do you have any experience with SAS? :)