Issue with Joins

Hello All,

While testing a couple of joins on TD 15.10, I came across a weird scenario.

create volatile table test1
(
a integer not null,
b integer not null
) primary index (a) on commit preserve rows;

create volatile table test2
(
b integer not null,
c varchar(10)
) primary index (b) on commit preserve rows;

create volatile table test3
(
a integer not null,
b integer not null,
c varchar(10)
) primary index (a) on commit preserve rows;


insert into test1 values (1,1);
insert into test1 values (2,2);
insert into test1 values (4,5);

insert into test2 values (1,'abc');
insert into test2 values (2,'xyz');

 

insert into test3(a,b,c)
select t1.a,t2.b,t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((
CASE WHEN (t2.b IS NULL) THEN 'Y'
ELSE 'N'
END ) = 'N');

 

Running the above INSERT gives a 3604 error: "Cannot place a null value in a NOT NULL field."
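For anyone trying to reproduce this, the plan for the failing request can be checked up front with EXPLAIN (a sketch; the plan wording varies by release):

EXPLAIN
insert into test3(a,b,c)
select t1.a,t2.b,t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((
CASE WHEN (t2.b IS NULL) THEN 'Y'
ELSE 'N'
END ) = 'N');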

I observed a couple of interesting points:

Changing the DDL of the target table as below:

create volatile table test3
(
a integer not null,
b bigint not null,
c varchar(10)
) primary index (a) on commit preserve rows;

 

OR

create volatile table test3
(
a integer not null,
b integer not null,
c varchar(10)
) primary index (b) on commit preserve rows;

 

resolves the issue. I am trying to understand how changing the data type from INTEGER to BIGINT, or making b the primary index, avoids the 3604 error.
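As a side note, a defensive rewrite that keeps the original target DDL could look like the sketch below (untested on my side, just an idea): because the filter only keeps rows where t2.b is not null, COALESCE(t2.b, t1.b) returns the same values, but it is built from a NOT NULL column, which may change how the spool column is typed.

insert into test3(a,b,c)
select t1.a,
       coalesce(t2.b, t1.b),  /* no-op for the surviving rows, since t1.b is NOT NULL */
       t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((
CASE WHEN (t2.b IS NULL) THEN 'Y'
ELSE 'N'
END ) = 'N');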

 

I also rewrote the above query as below:

insert into test3(a,b,c)
select t1.a,t2.b,t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((
CASE WHEN (t2.b IS NULL) THEN 'Y'
ELSE 'N'
END ) = 'N') and t2.b IS NULL ;

The DDL of the target is unchanged from the original:

create volatile table test3
(
a integer not null,
b integer not null,
c varchar(10)
) primary index (a) on commit preserve rows;

 

This works!

 

Can someone please explain what is going on internally here? Is it related to the way the WHERE clause and the joins are being parsed?

 

Thanks in advance

Srilatha

4 REPLIES
Supporter

Re: Issue with Joins

This is interesting. I checked it on release 16.0 (Express version).

select t1.a, t2.b, t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((CASE WHEN (t2.b IS NULL) THEN 'Y' ELSE 'N' END) = 'N');

results in 2 rows, as expected:

 

a    b    c
1    1    abc
2    2    xyz

 

insert into test3(a,b,c)
select t1.a, t2.b, t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((CASE WHEN (t2.b IS NULL) THEN 'Y' ELSE 'N' END) = 'N');

gives the same error message:

 

JDBC::error: [Teradata Database] [TeraJDBC 16.00.00.24] [Error 3604] [SQLState 23000] Cannot place a null value in a NOT NULL field.

 

If the query is rewritten with a NOT NULL condition like

 

insert into test3(a,b,c)
select t1.a,t2.b,t2.c
from test1 t1 left join test2 t2
on t2.b=t1.b
WHERE t2.b IS NOT NULL;

the two rows are inserted.

 

 

The explains are quite different.

Case condition:

Explanation
  1) First, we do an all-AMPs RETRIEVE step from ULI.t1 by way of an
     all-rows scan with no residual conditions into Spool 2 (all_amps),
     which is redistributed by the hash code of (ULI.t1.b) to all AMPs.
     Then we do a SORT to order Spool 2 by row hash.  The size of Spool
     2 is estimated with high confidence to be 3 rows (63 bytes).  The
     estimated time for this step is 0.00 seconds.
  2) Next, we do an all-AMPs JOIN step from Spool 2 (Last Use) by way
     of a RowHash match scan, which is joined to ULI.t2 by way of a
     RowHash match scan with no residual conditions.  Spool 2 and
     ULI.t2 are left outer joined using a merge join, with a join
     condition of ("ULI.t2.b = b"), and with a post join condition of (
     "(( CASE WHEN (b IS NULL) THEN ('Y') ELSE ('N') END ))= 'N'").
     The result goes into Spool 1 (all_amps), which is redistributed by
     the hash code of (ULI.t1.a) to all AMPs.  Then we do a SORT to
     order Spool 1 by row hash.  The size of Spool 1 is estimated with
     index join confidence to be 4 rows (104 bytes).  The estimated
     time for this step is 0.01 seconds.
  3) We do an all-AMPs MERGE step into ULI.test3 from Spool 1 (Last
     Use).  The size is estimated with index join confidence to be 4
     rows.  The estimated time for this step is 1.00 seconds.
  4) We spoil the parser's dictionary cache for the table.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

not null condition:

 

Explanation
  1) First, we do an all-AMPs RETRIEVE step from ULI.t1 by way of an
     all-rows scan with no residual conditions into Spool 2 (all_amps),
     which is redistributed by the hash code of (ULI.t1.b) to all AMPs.
     Then we do a SORT to order Spool 2 by row hash.  The size of Spool
     2 is estimated with high confidence to be 3 rows (63 bytes).  The
     estimated time for this step is 0.00 seconds.
  2) Next, we do an all-AMPs JOIN step from ULI.t2 by way of a RowHash
     match scan with no residual conditions, which is joined to Spool 2
     (Last Use) by way of a RowHash match scan.  ULI.t2 and Spool 2 are
     joined using a merge join, with a join condition of ("ULI.t2.b = b").
     The result goes into Spool 1 (all_amps), which is redistributed by
     the hash code of (ULI.t1.a) to all AMPs.  Then we do a SORT to
     order Spool 1 by row hash.  The size of Spool 1 is estimated with
     index join confidence to be 3 rows (78 bytes).  The estimated time
     for this step is 0.01 seconds.
  3) We do an all-AMPs MERGE step into ULI.test3 from Spool 1 (Last
     Use).  The size is estimated with index join confidence to be 3
     rows.  The estimated time for this step is 1.00 seconds.
  4) We spoil the parser's dictionary cache for the table.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

I would consider this a bug and would open an incident with Teradata.

 

Re: Issue with Joins

Hi,

Thanks for checking this out..

As I mentioned earlier, changing INTEGER to BIGINT also solved the problem, and in that case I did not see any difference in the explain plan.

An incident has already been raised with Teradata; waiting to hear back from them.

 

Regards

Srilatha

Supporter

Re: Issue with Joins

Hi,

Please share the feedback once you get it from TD ;)

Ulrich

Senior Apprentice

Re: Issue with Joins

Hi Ulrich,

The strange part is that it fails in the join step (step 2), not in the MERGE step.

I used DIAGNOSTIC VerboseExplain and this showed some additional info:
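(Roughly this sequence, in case anyone wants to reproduce it; the diagnostic is session-scoped:)

DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION;

EXPLAIN
insert into test3(a,b,c)
select t1.a,t2.b,t2.c
from test1 t1 left join test2 t2
on t2.b = t1.b
WHERE ((CASE WHEN (t2.b IS NULL) THEN 'Y' ELSE 'N' END) = 'N');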

 

     left outer joined using a merge join, with a join condition of (
     "t2.b = Spool_2.b"), and with a post join
     condition of ("(( CASE WHEN (b IS NULL) THEN ('Y') ELSE ('N') END
     ))= 'N'").  The result goes into Spool 1 (all_amps) (Spool 3 is
     pruned), which is redistributed by hash code to all AMPs to all
     AMPs with hash fields ("Spool_2.a").  Then we do a SORT to order
     Spool 1 by row hash.  The result spool is built with the target
     table's structure.  The size of Spool 1 is estimated with index

It seems like 15.10's new "post join condition" is applied only after the spool row has been built :-)

 

Switching to BIGINT or a different PI results in an additional step.

 

Btw, WHERE t2.b IS NOT NULL changes the Outer Join to Inner, thus avoiding NULLs.
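In other words, the filtered query is logically equivalent to a plain inner join and could be written directly as (same rows, stated explicitly):

insert into test3(a,b,c)
select t1.a, t2.b, t2.c
from test1 t1 inner join test2 t2
on t2.b = t1.b;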

It seems the optimizer is not smart enough to recognize that the original query can be converted to an Inner Join, too. Why outer join first when the NULLs are filtered out afterwards?