Merge Join with OVERLAPS producing huge spool file

Enthusiast

Calling all OVERLAPS function experts... Does anyone understand why the OVERLAPS condition below is blowing a merge join between a roughly 1 million-row spool file and a 57 million-row spool file out to 23 billion rows?
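For reference (my own gloss, not part of the original question): (start1, end1) OVERLAPS (start2, end2) is true when the two periods share some point in time, and since it is a range condition rather than an equality it typically acts only as a filter on top of the TRUCK_EQUIP_KEY = EQUIPMENT_KEY matches. A minimal standalone sketch with made-up timestamps:

SELECT CASE
         WHEN (TIMESTAMP '2007-08-24 08:00:00', TIMESTAMP '2007-08-24 10:00:00')
              OVERLAPS (TIMESTAMP '2007-08-24 09:00:00', TIMESTAMP '2007-08-24 11:00:00')
         THEN 'periods overlap'
         ELSE 'no overlap'
       END AS overlap_check;

As far as I know, two periods that merely touch at an endpoint do not count as overlapping.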

The problem lies in step 14 of the explain plan below:

EXPLAIN SELECT LOADS.LOADS_KEY , LOADS.SHOVEL_EQUIP_KEY , LOADS.LOAD_FULL_DTTM ,
DUMPS.DUMPS_KEY , DUMPS.TRUCK_EQUIP_KEY , DUMPS.DUMP_EMPTY_DTTM ,
DUMPS.MANUAL_ENTRY_IND , LH.HAULS_KEY , LH.HAUL_START_DTTM ,
LH.HAUL_END_DTTM , LH.SPOTTING_DTTM , LH.SPOTTED_DTTM , DH.HAULS_KEY ,
DH.HAUL_START_DTTM , DH.HAUL_END_DTTM , DH.SPOTTING_DTTM , DH.SPOTTED_DTTM ,
OES.EQUIP_STATUS_KEY , OES.STATE_START_DTTM , OES.STATE_END_DTTM ,
OES.ACQ_SITE_ABBR , OES.ACQ_SRC_SYS_ABBR
FROM DEV_CORE_T.DUMPS , DEV_CORE_T.LOADS , DEV_CORE_T.HAULS LH , DEV_CORE_T.HAULS DH , DEV_CORE_T.OPS_EQUIP_STATE OES
WHERE ( LOADS.LOADS_KEY = LH.LOADS_KEY
AND LOADS.DELETED_AT_DT IS NULL
AND LH.DELETED_AT_DT IS NULL )
AND ( DUMPS.DUMPS_KEY = LOADS.DUMPS_KEY
AND DUMPS.DELETED_AT_DT IS NULL )
AND ( DUMPS.DUMPS_KEY = DH.DUMPS_KEY
AND DH.DELETED_AT_DT IS NULL )
AND ( LH.DUMPS_KEY = 0
AND LH.LOADS_KEY <> 0 )
AND ( DH.LOADS_KEY = 0
AND DH.DUMPS_KEY <> 0 )
AND DUMPS.DUMP_EMPTY_DTTM >= CAST( '2007-08-24 08:00:00' AS TIMESTAMP( 0 ) )
AND DUMPS.TRUCK_EQUIP_KEY = OES.EQUIPMENT_KEY
AND (OES.STATE_START_DTTM , OES.STATE_END_DTTM ) OVERLAPS (LH.HAUL_START_DTTM ,
DUMPS.DUMP_EMPTY_DTTM )
ORDER BY DUMPS.DUMP_EMPTY_DTTM DESC ;

Explain Plan
-------------------------------------------------------------------------------------------------------------------------------------------------------
Explanation
1) First, we lock a distinct DEV_CORE_T."pseudo table" for read on a
RowHash to prevent global deadlock for DEV_CORE_T.DH.
2) Next, we lock a distinct DEV_CORE_T."pseudo table" for read on a
RowHash to prevent global deadlock for DEV_CORE_T.LOADS.
3) We lock a distinct DEV_CORE_T."pseudo table" for read on a RowHash
to prevent global deadlock for DEV_CORE_T.DUMPS.
4) We lock a distinct DEV_CORE_T."pseudo table" for read on a RowHash
to prevent global deadlock for DEV_CORE_T.OES.
5) We lock DEV_CORE_T.DH for read, we lock DEV_CORE_T.LOADS for read,
we lock DEV_CORE_T.DUMPS for read, and we lock DEV_CORE_T.OES for
read.
6) We do an all-AMPs RETRIEVE step from DEV_CORE_T.DUMPS by way of an
all-rows scan with a condition of (
"(DEV_CORE_T.DUMPS.DUMP_EMPTY_DTTM >= TIMESTAMP '2007-08-24
08:00:00') AND (DEV_CORE_T.DUMPS.DELETED_AT_DT IS NULL)") into
Spool 2 (all_amps) (compressed columns allowed) fanned out into 5
hash join partitions, which is built locally on the AMPs. The
size of Spool 2 is estimated with low confidence to be 2,829,154
rows (113,166,160 bytes). The estimated time for this step is
1.83 seconds.
7) We do an all-AMPs RETRIEVE step from DEV_CORE_T.DH by way of an
all-rows scan with a condition of ("(DEV_CORE_T.DH.DELETED_AT_DT
IS NULL) AND ((DEV_CORE_T.DH.DUMPS_KEY <> 0.) AND
(DEV_CORE_T.DH.LOADS_KEY = 0. ))") into Spool 3 (all_amps)
(compressed columns allowed) fanned out into 5 hash join
partitions, which is redistributed by the hash code of (
DEV_CORE_T.DH.DUMPS_KEY) to all AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 3 is estimated with low confidence to be
2,731,470 rows (210,323,190 bytes). The estimated time for this
step is 8.03 seconds.
8) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a hash join
of 5 partitions, with a join condition of ("DUMPS_KEY = DUMPS_KEY").
The result goes into Spool 4 (all_amps) (compressed columns
allowed), which is built locally on the AMPs into 34 hash join
partitions. The size of Spool 4 is estimated with low confidence
to be 2,731,470 rows (284,072,880 bytes). The estimated time for
this step is 3.41 seconds.
9) We do an all-AMPs RETRIEVE step from DEV_CORE_T.LOADS by way of an
all-rows scan with a condition of (
"DEV_CORE_T.LOADS.DELETED_AT_DT IS NULL") into Spool 5 (all_amps)
(compressed columns allowed) fanned out into 34 hash join
partitions, which is redistributed by the hash code of (
DEV_CORE_T.LOADS.DUMPS_KEY) to all AMPs. The size of Spool 5 is
estimated with high confidence to be 3,553,915 rows (167,034,005
bytes). The estimated time for this step is 6.86 seconds.
10) We do an all-AMPs RETRIEVE step from DEV_CORE_T.LH by way of an
all-rows scan with a condition of ("(DEV_CORE_T.LH.DELETED_AT_DT
IS NULL) AND ((DEV_CORE_T.LH.LOADS_KEY <> 0.) AND
(DEV_CORE_T.LH.DUMPS_KEY = 0. ))") into Spool 6 (all_amps)
(compressed columns allowed) fanned out into 26 hash join
partitions, which is redistributed by the hash code of (
DEV_CORE_T.LH.LOADS_KEY) to all AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 6 is estimated with low confidence to be
2,760,455 rows (212,555,035 bytes). The estimated time for this
step is 8.07 seconds.
11) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to Spool 5 (Last Use) by way of an
all-rows scan. Spool 4 and Spool 5 are joined using a hash join
of 34 partitions, with a join condition of ("(DUMPS_KEY =
DUMPS_KEY) AND (DUMPS_KEY = DUMPS_KEY)"). The result goes into
Spool 7 (all_amps) (compressed columns allowed), which is
redistributed by the hash code of (DEV_CORE_T.LOADS.LOADS_KEY) to
all AMPs into 26 hash join partitions. The size of Spool 7 is
estimated with low confidence to be 2,700,165 rows (329,420,130
bytes). The estimated time for this step is 8.65 seconds.
12) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using a hash join
of 26 partitions, with a join condition of ("(LOADS_KEY =
LOADS_KEY) AND (LOADS_KEY = DUMPS_KEY)"). The result goes into
Spool 8 (all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 8 by the
hash code of (DEV_CORE_T.DUMPS.TRUCK_EQUIP_KEY). The size of
Spool 8 is estimated with low confidence to be 1,421,949 rows (
230,355,738 bytes). The estimated time for this step is 5.17
seconds.
13) We do an all-AMPs RETRIEVE step from DEV_CORE_T.OES by way of an
all-rows scan with no residual conditions into Spool 9 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
Then we do a SORT to order Spool 9 by the hash code of (
DEV_CORE_T.OES.EQUIPMENT_KEY). The result spool file will not be
cached in memory. The size of Spool 9 is estimated with high
confidence to be 57,468,192 rows (3,965,305,248 bytes). The
estimated time for this step is 1 minute and 15 seconds.
14) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of a
RowHash match scan, which is joined to Spool 9 (Last Use) by way
of a RowHash match scan. Spool 8 and Spool 9 are joined using a
merge join, with a join condition of ("((STATE_START_DTTM
,STATE_END_DTTM) OVERLAPS (HAUL_START_DTTM ,DUMP_EMPTY_DTTM)) AND
(TRUCK_EQUIP_KEY = EQUIPMENT_KEY)"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 1 by the sort key in spool field1 (
DEV_CORE_T.DUMPS.DUMP_EMPTY_DTTM). The result spool file will not
be cached in memory. The size of Spool 1 is estimated with low
confidence to be 23,977,945,466 rows (7,912,722,003,780 bytes).
The estimated time for this step is 41 hours and 1 minute.
15) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 41 hours and 3 minutes.

If you need more info, please let me know.

Thanks!
2 REPLIES
Enthusiast

Re: Merge Join with OVERLAPS producing huge spool file

The main thrust of this query is to join three tables together (one of them, HAULS, is aliased twice, so there are four table references):

LOADS

DUMPS

HAULS (as LH and DH)

and then to match the date range coming out of that LOADS-DUMPS-HAULS join against a fourth table, OPS_EQUIP_STATE, which holds the state changes of a piece of equipment.
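Stripped down to the join skeleton (my reading of the query in the original post; the DELETED_AT_DT filters and the DUMP_EMPTY_DTTM cut-off are left out here):

SELECT LOADS.LOADS_KEY , DUMPS.DUMPS_KEY , OES.EQUIP_STATUS_KEY
FROM DEV_CORE_T.LOADS
JOIN DEV_CORE_T.HAULS LH                    -- the haul tied to the load (LH.DUMPS_KEY = 0)
  ON LH.LOADS_KEY = LOADS.LOADS_KEY
 AND LH.DUMPS_KEY = 0 AND LH.LOADS_KEY <> 0
JOIN DEV_CORE_T.DUMPS
  ON DUMPS.DUMPS_KEY = LOADS.DUMPS_KEY
JOIN DEV_CORE_T.HAULS DH                    -- the haul tied to the dump (DH.LOADS_KEY = 0)
  ON DH.DUMPS_KEY = DUMPS.DUMPS_KEY
 AND DH.LOADS_KEY = 0 AND DH.DUMPS_KEY <> 0
JOIN DEV_CORE_T.OPS_EQUIP_STATE OES         -- state history of the truck that did the dump
  ON OES.EQUIPMENT_KEY = DUMPS.TRUCK_EQUIP_KEY
 AND (OES.STATE_START_DTTM , OES.STATE_END_DTTM)
     OVERLAPS (LH.HAUL_START_DTTM , DUMPS.DUMP_EMPTY_DTTM);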

Senior Apprentice

Re: Merge Join with OVERLAPS producing huge spool file

DUMPS.TRUCK_EQUIP_KEY = OES.EQUIPMENT_KEY
The optimizer assumes it is a many-to-many join (which results in a kind of product join per equipment key) and can't calculate the actual numbers correctly.
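To read that back into the step 14 estimate (rough arithmetic on the numbers in the explain, my own gloss):

    23,977,945,466 estimated result rows / 1,421,949 rows in Spool 8 ≈ 16,900 OES rows per left-hand row

In other words, the optimizer is assuming each qualifying LOADS/DUMPS/HAULS row matches on the order of 17 thousand of the 57,468,192 OPS_EQUIP_STATE rows, which suggests the OVERLAPS condition is getting little or no extra selectivity on top of the EQUIPMENT_KEY equality.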

Is the spool actually that large, and is the query actually running that long?
Probably not.
It's the last step, so just forget about it :-)
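If you do want to check the real number rather than trust the estimate, one option (a sketch, untested, reusing the same FROM and WHERE as the original query) is to ask only for the row count; at minimum that skips the final ORDER BY sort and the multi-terabyte result spool being sent back to the client, and it tells you how far off the estimate really is:

SELECT COUNT(*)
FROM DEV_CORE_T.DUMPS , DEV_CORE_T.LOADS , DEV_CORE_T.HAULS LH ,
     DEV_CORE_T.HAULS DH , DEV_CORE_T.OPS_EQUIP_STATE OES
WHERE LOADS.LOADS_KEY = LH.LOADS_KEY
  AND LOADS.DELETED_AT_DT IS NULL AND LH.DELETED_AT_DT IS NULL
  AND DUMPS.DUMPS_KEY = LOADS.DUMPS_KEY AND DUMPS.DELETED_AT_DT IS NULL
  AND DUMPS.DUMPS_KEY = DH.DUMPS_KEY AND DH.DELETED_AT_DT IS NULL
  AND LH.DUMPS_KEY = 0 AND LH.LOADS_KEY <> 0
  AND DH.LOADS_KEY = 0 AND DH.DUMPS_KEY <> 0
  AND DUMPS.DUMP_EMPTY_DTTM >= CAST( '2007-08-24 08:00:00' AS TIMESTAMP(0) )
  AND DUMPS.TRUCK_EQUIP_KEY = OES.EQUIPMENT_KEY
  AND (OES.STATE_START_DTTM , OES.STATE_END_DTTM)
      OVERLAPS (LH.HAUL_START_DTTM , DUMPS.DUMP_EMPTY_DTTM);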

Dieter