FULL OUTER JOIN syntax and problems with correct output

Database
Enthusiast

FULL OUTER JOIN syntax and problems with correct output

I've been having difficulty with the position of the WHERE/AND clauses within a FULL OUTER JOIN and filtering of the the data on either side of the full outer join. It seems if i'm doing a self join filtering by column for each sub-set the only bullet proof way to do it is to create derived tables of the data first, because all other data does not return the right results. Can somebody lead me to an explanation of how teradata creates the plans?

CREATE VOLATILE TABLE a (
aa INTEGER
,ab CHAR(2)
)PRIMARY INDEX(aa)
ON COMMIT PRESERVE ROWS;

CREATE VOLATILE TABLE b (
ba INTEGER
,bb CHAR(2)
)PRIMARY INDEX(ba)
ON COMMIT PRESERVE ROWS;

INSERT INTO a VALUES (1,'a');
INSERT INTO a VALUES (2,'a');
INSERT INTO a VALUES (3,'a');
INSERT INTO a VALUES (5,'a');
INSERT INTO a VALUES (6,NULL);
INSERT INTO a VALUES (7,NULL);

INSERT INTO b VALUES (1,'b');
INSERT INTO b VALUES (2,'b');
INSERT INTO b VALUES (3,'b');
INSERT INTO b VALUES (4,'b');
INSERT INTO b VALUES (6,NULL);
INSERT INTO b VALUES (8,NULL);

SEL * FROM a;

SEL * FROM b;

SEL * FROM (SEL * FROM a WHERE a.ab='a')a2
FULL OUTER JOIN (SEL * FROM b WHERE b.bb='b')b2
ON a2.aa=b2.ba
ORDER BY 1,3; --great 5 rows
/*
aa ab ba bb
1 ? ? 4 b
2 1 a 1 b
3 2 a 2 b
4 3 a 3 b
5 5 a ? ?
*/

SEL *
FROM a
FULL OUTER JOIN b
ON a.aa=b.ba
WHERE a.ab = 'a'
AND b.bb = 'b'; --effectivly and inner join 3 wrows
/*
aa ab ba bb
1 2 a 2 b
2 3 a 3 b
3 1 a 1 b
*/

SEL *
FROM a
FULL OUTER JOIN b
ON a.aa=b.ba
AND a.ab = 'a'
AND b.bb = 'b'; --rubbish 9 rows and doesn't match #6
/*
aa ab ba bb
1 2 a 2 b
2 3 a 3 b
3 ? ? 8 ?
4 7 ? ? ?
5 ? ? 4 b
6 5 a ? ?
7 1 a 1 b
8 6 ? ? ?
9 ? ? 6 ?
*/

SEL *
FROM a
FULL OUTER JOIN b
ON a.aa=b.ba
AND b.bb = 'b'; --rubbish 9 rows and doesn't match #6
WHERE a.ab = 'a'
/*
aa ab ba bb
1 2 a 2 b
2 3 a 3 b
3 ? ? 8 ?
4 7 ? ? ?
5 ? ? 4 b
6 5 a ? ?
7 1 a 1 b
8 ? ? 6 ?
9 6 ? ? ?
*/
4 REPLIES
Enthusiast

Re: FULL OUTER JOIN syntax and problems with correct output

Addendum:
SEL * FROM a
FULL OUTER JOIN b
ON (b.bb='b')
AND (a.ab='a')
AND (a.aa=b.ba)
WHERE NOT (a.ab IS NULL AND b.bb IS NULL )
ORDER BY 1,3; --great 5 rows

is the only way I can seem to get the right answer without using derived tables...
Junior Contributor

Re: FULL OUTER JOIN syntax and problems with correct output

For Inner Joins there's no difference where you put the search conditions, but for Outer Joins there is. According to Standard SQL the order of evaluation is:
First there's the JOIN based on the ON conditions and this result set is then filtered by the WHERE.

If you check the explain sof your different queries you'll see that the optmizer is quite smart:
#2 is actually rewritten as an Inner Join
#4 will be a Left Join (if you move the semicolon after the Where condition)

I usually tell my students:
Most people don't like OJs.
If anybody claims to like writing OJs there are two possible reasons for that.
#1 He is lying
#2 He doesn't have a clue about OJs

Then i recommend to read the "Outer Join Case Study" in the SQL DML manual, this is a rewrite of an article from the old "Teradata Magazine", it's original name was "A Lesson on Outer Joins Learned the Hard Way".

Left/Right is already hard enough, but Full Outer is really horrible.
And you found both possible solutions, congrats :-)

I would prefer the first version, especially it the query gets more complicated.
But as the plans will be different you can choose the more efficient one based on the number of rows joined/returned.

Dieter
Enthusiast

Re: FULL OUTER JOIN syntax and problems with correct output

Thanks so much for the article reference. It brings up two points I came across:

1) If a search condition on the inner table is placed in the WHERE clause, the JOIN is logically equivalent to an INNER JOIN, even if you code OUTER JOIN in the query.

2) When a search condition (used for row selection) is required on the inner table, it should be put in the ON clause as well. A search condition in the ON clause of the inner table will not limit the number of rows in the answer set. It only defines the rows eligible to take part in the match to the outer table.

Thanks again for assistance.
Enthusiast

Re: FULL OUTER JOIN syntax and problems with correct output

Addendum:
1) Restrict inner table within the ON condition
2) Restrict outer table within the WHERE condition

A full outer join without derived tables needs to be treated as a Hybrid,
a.ab and b.bb are restricted in both the "inner table" way (after the ON) and the "outer table" way (after the WHERE)
as only the following are equivalent:

With the following difference:
UPDATE a
SET ab = 'c'
WHERE aa = 2;

UPDATE b
SET bb = 'c'
WHERE ba = 3;

-------------------------------------------
SEL * FROM a
FULL OUTER JOIN b
ON (
a.aa=b.ba
AND a.ab = 'a' -- ***
AND b.bb = 'b' -- ***
-- *** These stops the accidental association with the inner join part when it should be outer join (or in SET theory Left NOT Right or Right NOT Left). The problem being it will accidentally be matched with the INNER JOIN part and be filtered out within the "Second Process", the WHERE clause, when it shouldn't be.
)
WHERE (a.ab = 'a' OR a.ab IS NULL) -- Main driving conditions
AND (b.bb = 'b' OR b.bb IS NULL) -- Main driving conditions
AND NOT (a.ab IS NULL AND b.bb IS NULL ) --cleans up rows that have
--NULLS in both a.ab & b.bb at the same time
ORDER BY 1,3

-------------------------------------------
SEL * FROM (SEL * FROM a WHERE a.ab = 'a')a2
FULL OUTER JOIN (SEL * FROM b WHERE b.bb = 'b')b2
ON a2.aa=b2.ba
ORDER BY 1,3

As you can see, the safest way to do it is have derived tables.