Excessive row estimates

I have been looking at an EXPLAIN that is giving me some unbelievably high row estimates when joining two spool files. The SQL is ETL pushdown generated by DataServices. I'm getting a row hash match scan merge join between the two spool files in steps 9, 10, 11, and 12 below, with index join confidence. All stats are current. The underlying tables are not ideally defined for join access, but the plan does OK until Spool 26 is joined to Spools 25, 27, 29, and 31, in that order. Spool 25 is estimated at 52,274 rows, which is almost spot on, and the final actual result set is also around 52,000 records. The estimates at each step are:

Step 9:  324,579
Step 10: 203,116,012
Step 11: 311,108,767,248
Step 12: 479,614,004,187,648
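For scale, those step-over-step ratios look like what you get when a cost model multiplies per-join selectivities as if the equality and the week-band predicates were independent. A toy sketch of that failure mode, with every number hypothetical (assumed ~600 distinct WEEK_CODE_KEY values and a 0.25 default band selectivity, not values from this plan):

```sql
-- Hypothetical illustration only. If each calendar join is costed as
--   new_rows = rows * calendar_spool_rows * eq_sel * range_sel
-- with the WEEK_CODE_KEY equality (assumed 1/600) and the week date band
-- (assumed 0.25) treated as independent, every join multiplies the
-- running estimate by 871504 / 600.0 * 0.25, i.e. roughly 363:
SELECT 52274 * 363.13          AS after_join_1   -- ~1.9e7
     , 52274 * 363.13 ** 2     AS after_join_2   -- ~6.9e9
     , 52274 * 363.13 ** 3     AS after_join_3   -- ~2.5e12
     , 52274 * 363.13 ** 4     AS after_join_4;  -- ~9.1e14
```

Four compounding joins take an accurate 52,274 into the 10^14 range, which is the same order of explosion the plan below shows.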

Here is the explain if anyone wants to take a look and come up with an explanation and suggestions for improvement. I have tried an array of join indexes to no avail. If you take a look and need more info, please let me know. I've seen these excessive spool join record count estimates in other queries too, so I want to make sure it is not an optimizer issue (which the developers seem to believe).

EXPLAIN SELECT WORK_ORDER_F_P1_2.WORK_ORDER_KEY , WORK_ORDER_F_P1_2.SRC_AUFNR ,
WORK_ORDER_F_P1_2.ACQ_SITE_ABBR , WORK_ORDER_F_P1_2.ACQ_SRC_SYS_ABBR ,
COALESCE( WORK_ORDER.WORK_ORDER_TYPE_KEY , 0) , COALESCE( WORK_ORDER.MAINT_PLAN_HIST_KEY ,
0) , COALESCE( WORK_ORDER.FUNCTIONAL_LOC_KEY , 0) , COALESCE( WORK_ORDER.WORK_ORDER_PRIORITY_KEY ,
0) , COALESCE( WORK_ORDER.WORK_CENTER_KEY , 0) , COALESCE( WORK_ORDER.MAINT_ACTIVITY_TYPE_KEY ,
0) , COALESCE( WORK_ORDER.TASK_LIST_KEY , 0) , COALESCE( WORK_ORDER.WEEK_CODE_KEY ,
0) , COALESCE( WORK_ORDER.WORK_ORDER_NBR , '?') , COALESCE( WORK_ORDER.WORK_ORDER_DESC ,
'?') , WORK_ORDER.INIT_WORK_ORDER_START_DT , WORK_ORDER.WORK_ORDER_CLOSE_DT ,
WORK_ORDER.WORK_ORDER_CREATE_DT , WORK_ORDER.WORK_ORDER_REF_DT ,
WORK_ORDER.SCHED_RPT_START_DT , WORK_ORDER.SCHED_START_DTTM ,
WORK_ORDER.SCHED_RPT_END_DT , WORK_ORDER.SCHED_END_DTTM ,
WORK_ORDER.ACTUAL_RPT_START_DT , WORK_ORDER.ACTUAL_START_DTTM ,
WORK_ORDER.ACTUAL_RPT_END_DT , WORK_ORDER.ACTUAL_END_DTTM ,
COALESCE( WORK_ORDER.ESTIMATED_COST_LOCAL , 0) , COALESCE( WORK_ORDER.LOCAL_CURR_CODE ,
'?') , COALESCE( WORK_ORDER.ESTIMATED_COST_USD , 0) , WORK_ORDER.ACQ_ROW_SEQ ,
WORK_ORDER.ACQ_ROW_STATUS_CODE , WORK_ORDER.BASE_REGION_CODE ,
WORK_ORDER.BASE_INSERT_DTTM , WORK_ORDER.BASE_UPDATE_DTTM ,
WORK_ORDER.ACQ_SRC_TABLE , MAINT_PLAN_HIST.MAINT_PLAN_DT ,
COALESCE( MAINT_PLAN_HIST.PLAN_COUNTER_READING , 0) , COALESCE( MAINT_PLAN_HIST.ACTUAL_COUNTER_READING ,
0) , COALESCE( MAINT_PLAN_HIST.COUNTER_READING_UOM , '?') ,
COALESCE( MAINT_PLAN_HIST.PREV_PLAN_COUNTER_READING , 0) , COALESCE( MAINT_PLAN_HIST.PREV_ACTUAL_COUNTER_READING ,
0) , COALESCE( WO_SCHED_STATUS.WO_SCHED_STATUS_KEY , 0) , COALESCE( WO_SCHED_STATUS.SCHED_STATUS_KEY ,
0) , COALESCE( EDW_DATE.DATE_KEY , 0) , COALESCE( EDW_DATE.DATE_KEY ,
0) , COALESCE( WEEK_CALENDAR.WEEK_CALENDAR_KEY , 0) , COALESCE( WEEK_CALENDAR_1.WEEK_CALENDAR_KEY ,
0) , COALESCE( WEEK_CALENDAR_2.WEEK_CALENDAR_KEY , 0) , COALESCE( WEEK_CALENDAR_3.WEEK_CALENDAR_KEY ,
0) , WORK_ORDER.EQUIPMENT_KEY , WORK_ORDER.NOTIFICATION_KEY
FROM ((((((((STAGE_CORP_SAP.WORK_ORDER_F_P1 WORK_ORDER_F_P1_2 LEFT OUTER JOIN BASE.WORK_ORDER WORK_ORDER
ON ( WORK_ORDER_F_P1_2.SRC_AUFNR = WORK_ORDER.SRC_AUFNR )
AND ( WORK_ORDER_F_P1_2.ACQ_SITE_ABBR = WORK_ORDER.ACQ_SITE_ABBR )
AND ( WORK_ORDER_F_P1_2.ACQ_SRC_SYS_ABBR = WORK_ORDER.ACQ_SRC_SYS_ABBR )
AND ( WORK_ORDER.ACQ_ROW_STATUS_CODE <> 'D')) LEFT OUTER JOIN BASE.MAINT_PLAN_HIST MAINT_PLAN_HIST
ON ( MAINT_PLAN_HIST.MAINT_PLAN_HIST_KEY = WORK_ORDER.MAINT_PLAN_HIST_KEY )
AND ( WORK_ORDER_F_P1_2.ACQ_SITE_ABBR = MAINT_PLAN_HIST.ACQ_SITE_ABBR )
AND ( WORK_ORDER_F_P1_2.ACQ_SRC_SYS_ABBR = MAINT_PLAN_HIST.ACQ_SRC_SYS_ABBR )
AND ( MAINT_PLAN_HIST.ACQ_ROW_STATUS_CODE <> 'D')) LEFT OUTER JOIN BASE.WO_SCHED_STATUS WO_SCHED_STATUS
ON ( WORK_ORDER_F_P1_2.WORK_ORDER_KEY = WO_SCHED_STATUS.WORK_ORDER_KEY )
AND ( WO_SCHED_STATUS.STATUS_ACTIVE_IND = 'Y')
AND ( WO_SCHED_STATUS.ACQ_ROW_STATUS_CODE <> 'D')) LEFT OUTER JOIN BASE.EDW_DATE EDW_DATE
ON ( EDW_DATE.CALENDAR_DATE = WORK_ORDER.SCHED_RPT_START_DT )) LEFT OUTER JOIN BASE.EDW_DATE EDW_DATE_1
ON ( EDW_DATE_1.CALENDAR_DATE = WORK_ORDER.ACTUAL_RPT_START_DT )) LEFT OUTER JOIN BASE.WEEK_CALENDAR WEEK_CALENDAR
ON ( WORK_ORDER.WEEK_CODE_KEY = WEEK_CALENDAR.WEEK_CODE_KEY )
AND ( WORK_ORDER.SCHED_RPT_START_DT >= WEEK_CALENDAR.WEEK_START_DT )
AND ( WORK_ORDER.SCHED_RPT_START_DT <= WEEK_CALENDAR.WEEK_END_DT )) LEFT OUTER JOIN BASE.WEEK_CALENDAR WEEK_CALENDAR_1
ON ( WORK_ORDER.WEEK_CODE_KEY = WEEK_CALENDAR_1.WEEK_CODE_KEY )
AND ( WORK_ORDER.ACTUAL_RPT_START_DT >= WEEK_CALENDAR_1.WEEK_START_DT )
AND ( WORK_ORDER.ACTUAL_RPT_START_DT <= WEEK_CALENDAR_1.WEEK_END_DT )) LEFT OUTER JOIN BASE.WEEK_CALENDAR WEEK_CALENDAR_2
ON ( WORK_ORDER.WEEK_CODE_KEY = WEEK_CALENDAR_2.WEEK_CODE_KEY )
AND ( WORK_ORDER.WORK_ORDER_CREATE_DT >= WEEK_CALENDAR_2.WEEK_START_DT )
AND ( WORK_ORDER.WORK_ORDER_CREATE_DT <= WEEK_CALENDAR_2.WEEK_END_DT )) LEFT OUTER JOIN BASE.WEEK_CALENDAR WEEK_CALENDAR_3
ON ( WORK_ORDER.WEEK_CODE_KEY = WEEK_CALENDAR_3.WEEK_CODE_KEY )
AND ( WORK_ORDER.WORK_ORDER_CLOSE_DT >= WEEK_CALENDAR_3.WEEK_START_DT )
AND ( WORK_ORDER.WORK_ORDER_CLOSE_DT <= WEEK_CALENDAR_3.WEEK_END_DT )

1) First, we lock STAGE_CORP_SAP_T.WORK_ORDER_F_P1 in view
   STAGE_CORP_SAP.WORK_ORDER_F_P1 for access, we lock
   BASE_T.MAINT_PLAN_HIST in view BASE.MAINT_PLAN_HIST for access, we
   lock BASE_T.WORK_ORDER in view BASE.WORK_ORDER for access, we lock
   base_t.edw_date in view BASE.EDW_DATE for access, we lock
   BASE_T.WO_SCHED_STATUS in view BASE.WO_SCHED_STATUS for access,
   and we lock BASE_T.W in view BASE.WEEK_CALENDAR for access.
2) Next, we do an all-AMPs RETRIEVE step from
   STAGE_CORP_SAP_T.WORK_ORDER_F_P1 in view
   STAGE_CORP_SAP.WORK_ORDER_F_P1 by way of an all-rows scan with no
   residual conditions into Spool 12 (all_amps), which is
   redistributed by the hash code of (
   STAGE_CORP_SAP_T.WORK_ORDER_F_P1.WORK_ORDER_KEY) to all AMPs.
   Then we do a SORT to order Spool 12 by row hash. The size of
   Spool 12 is estimated with low confidence to be 51,648 rows (
   2,427,456 bytes). The estimated time for this step is 0.09
   seconds.

3) We do an all-AMPs JOIN step from Spool 12 (Last Use) by way of a
   RowHash match scan, which is joined to BASE_T.WO_SCHED_STATUS in
   view BASE.WO_SCHED_STATUS by way of a RowHash match scan with a
   condition of ("(BASE_T.WO_SCHED_STATUS in view
   BASE.WO_SCHED_STATUS.STATUS_ACTIVE_IND = 'Y') AND
   ((BASE_T.WO_SCHED_STATUS in view
   BASE.WO_SCHED_STATUS.ACQ_ROW_STATUS_CODE > 'D') OR
   (BASE_T.WO_SCHED_STATUS in view
   BASE.WO_SCHED_STATUS.ACQ_ROW_STATUS_CODE < 'D'))"). Spool 12 and
   BASE_T.WO_SCHED_STATUS are left outer joined using a merge join,
   with a join condition of ("WORK_ORDER_KEY =
   BASE_T.WO_SCHED_STATUS.WORK_ORDER_KEY"). The result goes into
   Spool 13 (all_amps), which is redistributed by the hash code of (
   STAGE_CORP_SAP_T.WORK_ORDER_F_P1.ACQ_SRC_SYS_ABBR,
   STAGE_CORP_SAP_T.WORK_ORDER_F_P1.ACQ_SITE_ABBR,
   STAGE_CORP_SAP_T.WORK_ORDER_F_P1.SRC_AUFNR) to all AMPs. Then we
   do a SORT to order Spool 13 by row hash. The size of Spool 13 is
   estimated with index join confidence to be 52,272 rows (3,293,136
   bytes). The estimated time for this step is 0.13 seconds.

4) We execute the following steps in parallel.
   1) We do an all-AMPs JOIN step from Spool 13 (Last Use) by way
      of an all-rows scan, which is joined to BASE_T.WORK_ORDER in
      view BASE.WORK_ORDER by way of a traversal of index # 4
      without accessing the base table extracting row ids only.
      Spool 13 and BASE_T.WORK_ORDER are left outer joined using a
      nested join, with a join condition of ("(SRC_AUFNR =
      BASE_T.WORK_ORDER.SRC_AUFNR) AND ((ACQ_SITE_ABBR =
      BASE_T.WORK_ORDER.ACQ_SITE_ABBR) AND (ACQ_SRC_SYS_ABBR =
      BASE_T.WORK_ORDER.ACQ_SRC_SYS_ABBR ))"). The result goes
      into Spool 14 (all_amps), which is redistributed by hash code
      to all AMPs. Then we do a SORT to order Spool 14 by field Id
      1. The size of Spool 14 is estimated with index join
      confidence to be 52,273 rows (3,815,929 bytes). The
      estimated time for this step is 0.15 seconds.
   2) We do an all-AMPs RETRIEVE step from base_t.edw_date in view
      BASE.EDW_DATE by way of a traversal of index # 8 without
      accessing the base table with a residual condition of ("NOT
      (base_t.edw_date in view BASE.EDW_DATE.CALENDAR_DATE IS NULL)")
      into Spool 15 (all_amps), which is duplicated on all AMPs.
      Then we do a SORT to order Spool 15 by the hash code of (
      base_t.edw_date.CALENDAR_DATE). The size of Spool 15 is
      estimated with high confidence to be 876,592 rows (
      18,408,432 bytes). The estimated time for this step is 0.65
      seconds.

5) We do an all-AMPs JOIN step from Spool 14 (Last Use) by way of an
   all-rows scan, which is joined to BASE_T.WORK_ORDER in view
   BASE.WORK_ORDER by way of an all-rows scan with a condition of (
   "(BASE_T.WORK_ORDER in view BASE.WORK_ORDER.ACQ_ROW_STATUS_CODE >
   'D') OR (BASE_T.WORK_ORDER in view
   BASE.WORK_ORDER.ACQ_ROW_STATUS_CODE < 'D')"). Spool 14 and
   BASE_T.WORK_ORDER are left outer joined using a row id join, with
   a join condition of ("(1=1)"). The result goes into Spool 16
   (all_amps), which is built locally on the AMPs. Then we do a SORT
   to order Spool 16 by the hash code of (
   BASE_T.WORK_ORDER.SCHED_RPT_START_DT). The size of Spool 16 is
   estimated with index join confidence to be 52,273 rows (
   17,406,909 bytes). The estimated time for this step is 0.28
   seconds.

6) We execute the following steps in parallel.
   1) We do an all-AMPs JOIN step from Spool 15 (Last Use) by way
      of a RowHash match scan, which is joined to Spool 16 (Last
      Use) by way of a RowHash match scan. Spool 15 and Spool 16
      are right outer joined using a merge join, with condition(s)
      used for non-matching on right table ("NOT
      (SCHED_RPT_START_DT IS NULL)"), with a join condition of (
      "CALENDAR_DATE = SCHED_RPT_START_DT"). The result goes into
      Spool 17 (all_amps), which is redistributed by the hash code
      of (STAGE_CORP_SAP_T.WORK_ORDER_F_P1.ACQ_SRC_SYS_ABBR,
      STAGE_CORP_SAP_T.WORK_ORDER_F_P1.ACQ_SITE_ABBR,
      BASE_T.WORK_ORDER.MAINT_PLAN_HIST_KEY) to all AMPs. Then we
      do a SORT to order Spool 17 by row hash. The size of Spool
      17 is estimated with index join confidence to be 52,274 rows
      (17,616,338 bytes). The estimated time for this step is 0.50
      seconds.
   2) We do an all-AMPs RETRIEVE step from BASE_T.MAINT_PLAN_HIST
      in view BASE.MAINT_PLAN_HIST by way of an all-rows scan with
      a condition of ("(BASE_T.MAINT_PLAN_HIST in view
      BASE.MAINT_PLAN_HIST.ACQ_ROW_STATUS_CODE > 'D') OR
      (BASE_T.MAINT_PLAN_HIST in view
      BASE.MAINT_PLAN_HIST.ACQ_ROW_STATUS_CODE < 'D')") into Spool
      20 (all_amps), which is redistributed by the hash code of (
      BASE_T.MAINT_PLAN_HIST.MAINT_PLAN_HIST_KEY,
      BASE_T.MAINT_PLAN_HIST.ACQ_SITE_ABBR,
      BASE_T.MAINT_PLAN_HIST.ACQ_SRC_SYS_ABBR) to all AMPs. Then
      we do a SORT to order Spool 20 by row hash. The size of
      Spool 20 is estimated with low confidence to be 139,804 rows
      (11,463,928 bytes). The estimated time for this step is 0.36
      seconds.
   3) We do an all-AMPs RETRIEVE step from BASE_T.W in view
      BASE.WEEK_CALENDAR by way of an all-rows scan with no
      residual conditions into Spool 21 (all_amps), which is
      duplicated on all AMPs. The size of Spool 21 is estimated
      with low confidence to be 144 rows (3,456 bytes). The
      estimated time for this step is 0.01 seconds.

7) We do an all-AMPs JOIN step from Spool 17 (Last Use) by way of a
   RowHash match scan, which is joined to Spool 20 (Last Use) by way
   of a RowHash match scan. Spool 17 and Spool 20 are left outer
   joined using a merge join, with condition(s) used for non-matching
   on left table ("NOT (MAINT_PLAN_HIST_KEY IS NULL)"), with a join
   condition of ("(ACQ_SRC_SYS_ABBR = ACQ_SRC_SYS_ABBR) AND
   ((ACQ_SITE_ABBR = ACQ_SITE_ABBR) AND (MAINT_PLAN_HIST_KEY =
   MAINT_PLAN_HIST_KEY ))"). The result goes into Spool 25
   (all_amps), which is built locally on the AMPs. Then we do a SORT
   to order Spool 25 by the hash code of (
   BASE_T.WORK_ORDER.WEEK_CODE_KEY). The size of Spool 25 is
   estimated with index join confidence to be 52,274 rows (
   19,759,572 bytes). The estimated time for this step is 0.32
   seconds.
8) We do an all-AMPs JOIN step from Spool 21 (Last Use) by way of an
   all-rows scan, which is joined to BASE_T.D in view
   BASE.WEEK_CALENDAR by way of an all-rows scan locking BASE_T.D for
   access. Spool 21 and BASE_T.D are joined using a single partition
   hash join, with a join condition of ("BASE_T.D.MAINT_WEEK_CODE =
   WEEK_CODE"). The result goes into Spool 26 (all_amps), which is
   duplicated on all AMPs. Then we do a SORT to order Spool 26 by
   the hash code of (BASE_T.W.WEEK_CODE_KEY). The size of Spool 26
   is estimated with low confidence to be 871,504 rows (32,245,648
   bytes). The estimated time for this step is 0.77 seconds.

9) We do an all-AMPs JOIN step from Spool 25 (Last Use) by way of a
   RowHash match scan, which is joined to Spool 26 by way of a
   RowHash match scan. Spool 25 and Spool 26 are left outer joined
   using a merge join, with a join condition of (
   "(ACTUAL_RPT_START_DT <= MAINT_WEEK_END_DATE) AND
   ((ACTUAL_RPT_START_DT >= MAINT_WEEK_START_DATE) AND (WEEK_CODE_KEY
   = WEEK_CODE_KEY ))"). The result goes into Spool 27 (all_amps),
   which is built locally on the AMPs. Then we do a SORT to order
   Spool 27 by the hash code of (BASE_T.WORK_ORDER.WEEK_CODE_KEY).
   The result spool file will not be cached in memory. The size of
   Spool 27 is estimated with index join confidence to be 324,579
   rows (123,989,178 bytes). The estimated time for this step is
   18.35 seconds.

10) We do an all-AMPs JOIN step from Spool 27 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 26 by way of a
    RowHash match scan. Spool 27 and Spool 26 are left outer joined
    using a merge join, with a join condition of (
    "(WORK_ORDER_CLOSE_DT <= MAINT_WEEK_END_DATE) AND
    ((WORK_ORDER_CLOSE_DT >= MAINT_WEEK_START_DATE) AND (WEEK_CODE_KEY
    = WEEK_CODE_KEY ))"). The result goes into Spool 29 (all_amps),
    which is built locally on the AMPs. Then we do a SORT to order
    Spool 29 by the hash code of (BASE_T.WORK_ORDER.WEEK_CODE_KEY).
    The result spool file will not be cached in memory. The size of
    Spool 29 is estimated with index join confidence to be 203,116,012
    rows (78,402,780,632 bytes). The estimated time for this step is
    29 minutes and 12 seconds.

11) We do an all-AMPs JOIN step from Spool 29 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 26 by way of a
    RowHash match scan. Spool 29 and Spool 26 are left outer joined
    using a merge join, with a join condition of (
    "(SCHED_RPT_START_DT <= MAINT_WEEK_END_DATE) AND
    ((SCHED_RPT_START_DT >= MAINT_WEEK_START_DATE) AND (WEEK_CODE_KEY
    = WEEK_CODE_KEY ))"). The result goes into Spool 31 (all_amps),
    which is built locally on the AMPs. Then we do a SORT to order
    Spool 31 by the hash code of (BASE_T.WORK_ORDER.WEEK_CODE_KEY).
    The result spool file will not be cached in memory. The size of
    Spool 31 is estimated with index join confidence to be
    311,108,767,248 rows (121,332,419,226,720 bytes). The estimated
    time for this step is 1,005 hours and 3 minutes.

12) We do an all-AMPs JOIN step from Spool 31 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 26 (Last Use) by way
    of a RowHash match scan. Spool 31 and Spool 26 are left outer
    joined using a merge join, with a join condition of (
    "(WORK_ORDER_CREATE_DT <= MAINT_WEEK_END_DATE) AND
    ((WORK_ORDER_CREATE_DT >= MAINT_WEEK_START_DATE) AND
    (WEEK_CODE_KEY = WEEK_CODE_KEY ))"). The result goes into Spool
    11 (all_amps), which is built locally on the AMPs. The result
    spool file will not be cached in memory. The size of Spool 11 is
    estimated with index join confidence to be 479,617,004,187,648
    rows (*** bytes). The estimated time for this step is 419,538
    hours and 16 minutes.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
-> The contents of Spool 11 are sent back to the user as the result
   of statement 1. The total estimated time is 420,543 hours and 48
   minutes.
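One thing worth trying before concluding it is an optimizer defect: ask the optimizer which statistics it wants, and give it multi-column stats that tie the WEEK_CODE_KEY equality to the week date band. A sketch only, assuming the truncated BASE_T.W / BASE_T.D objects are week-calendar tables and that these table and column names match the underlying DDL:

```sql
-- Have the optimizer list the statistics it would like for this query;
-- its recommendations then appear at the end of the EXPLAIN output.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Multi-column stats tying the week key to its date band, so the
-- optimizer can see that each WEEK_CODE_KEY maps to a single week's
-- range. Table and column names are assumptions based on the explain.
COLLECT STATISTICS ON BASE_T.WEEK_CALENDAR
    COLUMN (WEEK_CODE_KEY, WEEK_START_DT, WEEK_END_DT);

-- Stats on each work-order date column used in the band predicates.
COLLECT STATISTICS ON BASE_T.WORK_ORDER COLUMN (SCHED_RPT_START_DT);
COLLECT STATISTICS ON BASE_T.WORK_ORDER COLUMN (ACTUAL_RPT_START_DT);
COLLECT STATISTICS ON BASE_T.WORK_ORDER COLUMN (WORK_ORDER_CREATE_DT);
COLLECT STATISTICS ON BASE_T.WORK_ORDER COLUMN (WORK_ORDER_CLOSE_DT);
```

After collecting, re-run the EXPLAIN and compare the step 9 through 12 estimates; if they still explode with the recommended stats in place, that is much stronger evidence for an optimizer issue worth escalating.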