Minus Vs Exists

Database

Minus Vs Exists

Hi,

I have two large multiset tables and I need to check if they are identical. In that case which one will give better performance: MINUS or NOT EXISTS?

Tags (1)
3 REPLIES
N/A

Re: Minus Vs Exists

Multiset - do you have duplicates?

If yes: both scenarios might not give you what you need.

Check:

create volatile multiset table vt_a (a integer) no primary index;
create volatile multiset table vt_b (a integer) no primary index;
insert into vt_a values (1);
insert into vt_a values (1);
insert into vt_a values (2);
insert into vt_b values (1);
insert into vt_b values (2);
insert into vt_b values (2);
select * from vt_a
minus
select * from vt_b;
select * from vt_a a
where not exists
( select * from vt_b b where a.a = b.a)
;

Both queries give you an empty result set. But I think we agree that both tables are not identical!

You can export the ordered tables via tpt and run a SHA256 checksum and compare the checksum.

Check Example Java UDF for Table Hash Calculations

and

Calculation of Table Hash Values to Compare Table Content

The provided Java UDF will not work in your case as the Java UDF has not a good performance.

My impression is that the problem of "how to compare table content" is not so far taken serioisly in the past. Espacially in case of multi systems...

Ulrich

Re: Minus Vs Exists

Thanks Ulrich for your reply.

Those two tables are multi set but will not have duplicate rows. We are currenlt using hashrow function for CDC but in past eperienced hash collision.

N/A

Re: Minus Vs Exists

In case of multiset tables your assumption should be that duplicates can exists. The DB will not prevent it...

You need real strong hash functions like SHA-256 to avoid the hash collisions. As stated in the presentation I would not use any hashes below md5 bit length...

A Teradata internal table hash function would be a good feature for future releases ;->