Explain INSERT INTO DB1.Table1
LEFT OUTER JOIN DB1.STAGING_PRIOR P
ON S.DT= P.DT
AND S.SID = P.SID
AND S.OID = P.OID
AND S.FID = P.FID
AND COALESCE(S.B_NUM, 'Z') = COALESCE(P.B_NUM, 'Z')
WHERE P.DT IS NULL;
1) First, we lock a distinct DB1."pseudo table" for read on a
RowHash to prevent global deadlock for DB1.P.
2) Next, we lock a distinct DB1."pseudo table" for write on a
RowHash to prevent global deadlock for
3) We lock a distinct DB1."pseudo table" for read on a
RowHash to prevent global deadlock for DB1.S.
4) We lock DB1.P for read, we lock
DB1.Table1for write, and we lock
DB1.S for read.
5) We do an all-AMPs JOIN step from DB1.S by way of a RowHash
match scan with no residual conditions, which is joined to
DB1.P by way of a RowHash match scan with a condition of (
"(NOT (DB1.P.OID IS NULL )) AND ((NOT
(DB1.P.SID IS NULL )) AND (NOT (DB1.P.DT
IS NULL )))"). DB1.S and DB1.P are left outer
joined using a merge join, with condition(s) used for non-matching
on left table ("(NOT (DB1.S.FID IS NULL )) AND
((NOT (DB1.S.OID IS NULL )) AND ((NOT
(DB1.S.SID IS NULL )) AND (NOT (DB1.S.DT
IS NULL ))))"), with a join condition of ("((( CASE WHEN (NOT
(DB1.S.B_NUM IS NULL )) THEN (DB1.S.B_NUM)
ELSE ('Z') END ))= (( CASE WHEN (NOT (DB1.P.B_NUM IS
NULL )) THEN (DB1.P.B_NUM) ELSE ('Z') END ))) AND
((DB1.S.FID = DB1.P.FID) AND
((DB1.S.OID = DB1.P.OID) AND
((DB1.S.SID = DB1.P.SID) AND
(DB1.S.DT = DB1.P.DT ))))"). The
result goes into Spool 2 (all_amps), which is built locally on the
AMPs. The size of Spool 2 is estimated with low confidence to be
887,490 rows (62,124,300 bytes). The estimated time for this step
is 0.27 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan with a condition of ("DT IS NULL") into
Spool 1 (all_amps), which is redistributed by the hash code of (
DB1.S.DT, ‘STG’, 'NEW
FACILITY', DB1.S.SID (VARCHAR(10), CHARACTER SET LATIN,
NOT CASESPECIFIC, NAMED SID, FORMAT 'X(10)', NULL),
DB1.S.B_NUM, NULL (VARCHAR(30), CHARACTER SET LATIN,
NOT CASESPECIFIC), CAST((CURRENT_TIMESTAMP(6)) AS TIMESTAMP(6)))
to all AMPs. Then we do a SORT to order Spool 1 by row hash. The
size of Spool 1 is estimated with low confidence to be 887,490
rows (717,979,410 bytes). The estimated time for this step is
7) We do an all-AMPs MERGE into DB1.Table1from
Spool 1 (Last Use). The size is estimated with low confidence to
be 887,490 rows. The estimated time for this step is 3 minutes
and 33 seconds.
8) We spoil the parser's dictionary cache for the table.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
Need to tune this query so that estimated time comes below 3 minutes. The PI of the 2 joining tables are the first 3 columns which are being joined. The last join column B_num , however has some skew for B_num = 01 and nulls. I tried converting the Coalesce to a derived table, so that we do not have to using coalese fucntion in join. I tried using union all with the 1 query handling nulls and other not handling it.The table being inserted is multiset with a UPI.
Any suggestions are appreciated.
Why do you care about the estimated time?
Did you check the actual numbers in the QueryLog?
Based on the explain everything is fine.
The join is on the PI-columns, followed by a redistribution, both estimated to run in half a second.
And the final Merge depends on your target taget table, unless you change the PI to match the source PI there's no way to speed this up.