Comparing two tables

Database
Enthusiast

Comparing two tables

Hi Gurus
I have two huge tables TABLE-A and TABLE-B (200 Million rows in each table)
Now I am checking to see if both the table are identical or not.

one way of testing it is
sel * from table-A
minus
sel * from table-B

and then

sel * from table-B
minus
sel * from table-A

is there a better/faster way of doing this. Each table has more than 50 columns.

Thanks

4 REPLIES
Senior Apprentice

Re: Comparing two tables

MINUS: spool both tables & redistribute -> sort distinct -> exclusion merge join

Is the table SET or MULTISET? If it's MULTISET you'll need MINUS ALL, if it's SET, then MINUS ALL is more efficient, too.

MINUS ALL: spool both tables & redistribute -> sort -> minus all join

But spooling & redistributing is a large overhead, so if it's a SET table the most efficient way is probably using
NOT EXISTS: just a direct exclusion merge join

Of course it's easier to code a
SEL * FROM T1 MINUS ALL SEL * FROM T2
instead of a
SEL * FROM T1
WHERE NOT EXISTS
(SEL * FROM T2 WHERE T1.col1 = T2.col2 AND ... T1.col50 = T2.col50)
but it's probably worth the effort (and you can create the code using dbc.columns)

Dieter
Enthusiast

Re: Comparing two tables

Thank you very much!
I really appreciate it.

Enthusiast

Re: Comparing two tables

Hello,

Please i just want to verify that i well understood :  you are saying that NOT EXISTS is better  than MINUS / MINUS ALL , that's it ?

Can you give me more details please ?

Many thanks.

Ghalia

Teradata Employee

Re: Comparing two tables

As Dieter said (years ago): MINUS / MINUS ALL will always spool / redistribute / sort both tables.

If you code a NOT EXISTS, the optimizer will take advantage of the PI being the same and just do a direct AMP-local join.

Note that the results for NOT EXISTS are equivalent to MINUS, not MINUS ALL. That's fine for SET tables, but not for a MULTISET table with duplicate rows.