Hoping that someone can help with a situation I'm in. We have a source table that we're pulling in from SQL Server where the source is 3 million rows, and the table has about 100 columns in it. Once we pull it in, we have to do a full table comparison to see if any of the columns changed since the last load (CDC). I've tried using a MINUS, and also a LEFT OUTER JOIN, but in both cases, I run out of spool space because of having to coalesce all of the columns...many of which are null. I'm looking for suggestions on the best way to do this.
both joining on COALESCE and MINUS will result in redistribution steps, you better switch to NOT EXISTS instead:
WHERE NOT EXISTS
(SELECT * FROM t2
WHERE (t1.col1 = t2.col2 OR (t1.col1 IS NULL AND t2.col1 IS NULL))
AND (t1.col2 = t2.col2 OR (t1.col2 IS NULL AND t2.col2 IS NULL))
AND (t1.col3 = t2.col3 OR (t1.col3 IS NULL AND t2.col3 IS NULL))
AND (t1.col100 = t2.col100 OR (t1.col100 IS NULL AND t2.col100 IS NULL))
Assuming both table share the same PI this will result in an AMP-local join without any preparation step. No need to think about which value to use in COALESCE and the code is easily created automatically.