PERFORMANCE PROBLEM QUERY USES TOO MUCH SPOOL

UPDATE STG
FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG STG,
(
  /* Derived table DT: looks up the current ('9999-12-31') GCI number
     for the family, super family and HQ party of each staging row */
  SELECT
    S.PTY_ID
    ,ID1.PTY_ID_NO AS FMLY_GCI_NO
    ,ID2.PTY_ID_NO AS SUP_FMLY_GCI_NO
    ,ID3.PTY_ID_NO AS HQ_GCI_NO
  FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG S
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID1
    ON (S.FMLY_PTY_ID = ID1.PTY_ID
        AND ID1.PTY_ID_TYP_CD = 'GCI'
        AND ID1.EFECT_END_DT = '9999-12-31')
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID2
    ON (S.SUP_FMLY_PTY_ID = ID2.PTY_ID
        AND ID2.PTY_ID_TYP_CD = 'GCI'
        AND ID2.EFECT_END_DT = '9999-12-31')
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID3
    ON (S.HQ_PTY_ID = ID3.PTY_ID
        AND ID3.PTY_ID_TYP_CD = 'GCI'
        AND ID3.EFECT_END_DT = '9999-12-31')
) DT
SET FMLY_GCI_NO = DT.FMLY_GCI_NO
    ,SUP_FMLY_GCI_NO = DT.SUP_FMLY_GCI_NO
    ,HQ_GCI_NO = DT.HQ_GCI_NO
WHERE STG.PTY_ID = DT.PTY_ID
  AND DT.PTY_ID <> '90004653704';

Explanation
1) First, we lock a distinct MRD_ETL_SIT."pseudo table" for write on
a RowHash to prevent global deadlock for
MRD_ETL_SIT.RPT_GCI_FMLY_STG.
2) Next, we lock a distinct MRD_SIT."pseudo table" for read on a
RowHash to prevent global deadlock for MRD_SIT.ID3.
3) We lock MRD_ETL_SIT.RPT_GCI_FMLY_STG for write, and we lock
MRD_SIT.ID3 for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from a single partition of
MRD_SIT.ID3 with a condition of ("MRD_SIT.ID3.EFECT_END_DT =
DATE '9999-12-31'") with a residual condition of (
"(MRD_SIT.ID3.PTY_ID_TYP_CD = 'GCI') AND
(MRD_SIT.ID3.EFECT_END_DT = DATE '9999-12-31')") into Spool 2
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 2 by
row hash. The size of Spool 2 is estimated with low
confidence to be 2,366,656 rows. The estimated time for this
step is 0.28 seconds.
2) We do an all-AMPs RETRIEVE step from MRD_ETL_SIT.S by way of
an all-rows scan with a condition of ("MRD_ETL_SIT.S.PTY_ID
<> 90004653704.") into Spool 3 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 3 by row hash. The size of
Spool 3 is estimated with high confidence to be 2,364,187
rows. The estimated time for this step is 1.77 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use) by way
of a RowHash match scan. Spool 2 and Spool 3 are right outer
joined using a merge join, with a join condition of ("HQ_PTY_ID =
PTY_ID"). The result goes into Spool 4 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 4 by row hash. The size of Spool
4 is estimated with low confidence to be 792,803 rows. The
estimated time for this step is 0.17 seconds.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 4 by way of an
all-rows scan into Spool 7 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs. Then we do a SORT
to order Spool 7 by row hash. The size of Spool 7 is
estimated with low confidence to be 185,515,902 rows. The
estimated time for this step is 1 minute and 12 seconds.
2) We do an all-AMPs RETRIEVE step from a single partition of
MRD_SIT.ID2 with a condition of ("MRD_SIT.ID2.EFECT_END_DT =
DATE '9999-12-31'") with a residual condition of (
"(MRD_SIT.ID2.PTY_ID_TYP_CD = 'GCI') AND
(MRD_SIT.ID2.EFECT_END_DT = DATE '9999-12-31')") into Spool 8
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 8 by
row hash. The size of Spool 8 is estimated with low
confidence to be 2,366,656 rows. The estimated time for this
step is 0.28 seconds.
7) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 8 and Spool 7 are joined using a
merge join, with a join condition of ("SUP_FMLY_PTY_ID = PTY_ID").
The result goes into Spool 9 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 9 by row hash. The size of Spool 9 is
estimated with low confidence to be 792,803 rows. The estimated
time for this step is 14.99 seconds.
8) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of
a RowHash match scan, which is joined to Spool 4 (Last Use)
by way of a RowHash match scan. Spool 9 and Spool 4 are
right outer joined using a merge join, with a join condition
of ("Field_1 = Field_1"). The result goes into Spool 10
(all_amps) (compressed columns allowed), which is
redistributed by hash code to all AMPs. Then we do a SORT to
order Spool 10 by row hash. The size of Spool 10 is
estimated with low confidence to be 792,803 rows. The
estimated time for this step is 0.67 seconds.
2) We do an all-AMPs RETRIEVE step from a single partition of
MRD_SIT.ID1 with a condition of ("MRD_SIT.ID1.EFECT_END_DT =
DATE '9999-12-31'") with a residual condition of (
"(MRD_SIT.ID1.PTY_ID_TYP_CD = 'GCI') AND
(MRD_SIT.ID1.EFECT_END_DT = DATE '9999-12-31')") into Spool
13 (all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 13 by
row hash. The size of Spool 13 is estimated with low
confidence to be 2,366,656 rows. The estimated time for this
step is 0.28 seconds.
9) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a
RowHash match scan, which is joined to Spool 13 (Last Use) by way
of a RowHash match scan. Spool 10 and Spool 13 are left outer
joined using a merge join, with a join condition of ("FMLY_PTY_ID
= PTY_ID"). The result goes into Spool 1 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all AMPs.
The size of Spool 1 is estimated with low confidence to be 792,803
rows. The estimated time for this step is 0.57 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan with a condition of ("PTY_ID <> 90004653704.
") into Spool 17 (all_amps) (compressed columns allowed), which is
redistributed by hash code to all AMPs. The size of Spool 17 is
estimated with low confidence to be 792,803 rows. The estimated
time for this step is 0.50 seconds.
11) We do an all-AMPs JOIN step from MRD_ETL_SIT.RPT_GCI_FMLY_STG by
way of an all-rows scan with a condition of (
"MRD_ETL_SIT.RPT_GCI_FMLY_STG.PTY_ID <> 90004653704."), which is
joined to Spool 17 (Last Use) by way of an all-rows scan.
MRD_ETL_SIT.RPT_GCI_FMLY_STG and Spool 17 are joined using a
single partition hash join, with a join condition of (
"MRD_ETL_SIT.RPT_GCI_FMLY_STG.PTY_ID = PTY_ID"). The result goes
into Spool 16 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 16 by the sort key in
spool field1. The size of Spool 16 is estimated with index join
confidence to be 792,803 rows. The estimated time for this step
is 1.10 seconds.
12) We do a MERGE Update to MRD_ETL_SIT.RPT_GCI_FMLY_STG from Spool 16
(Last Use) via ROWID.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.

4 REPLIES

Re: PERFORMANCE PROBLEM QUERY USES TOO MUCH SPOOL

Hi Richard,
If you break the query into two parts, i.e. create a temporary/volatile table instead of the derived table DT and then run the update from it, I think you will not get the spool space issue. (Ensure the primary index for this table is PTY_ID.) A minimal sketch of that two-step approach is below.
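
Something along these lines (GCI_LKP is just a placeholder name; this is a sketch, so verify the data types and temp space before using it):

/* Step 1: materialize DT once, distributed on PTY_ID.
   GCI_LKP is a placeholder name for illustration. */
CREATE VOLATILE TABLE GCI_LKP AS
(
  SELECT
    S.PTY_ID
    ,ID1.PTY_ID_NO AS FMLY_GCI_NO
    ,ID2.PTY_ID_NO AS SUP_FMLY_GCI_NO
    ,ID3.PTY_ID_NO AS HQ_GCI_NO
  FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG S
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID1
    ON (S.FMLY_PTY_ID = ID1.PTY_ID
        AND ID1.PTY_ID_TYP_CD = 'GCI'
        AND ID1.EFECT_END_DT = '9999-12-31')
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID2
    ON (S.SUP_FMLY_PTY_ID = ID2.PTY_ID
        AND ID2.PTY_ID_TYP_CD = 'GCI'
        AND ID2.EFECT_END_DT = '9999-12-31')
  LEFT OUTER JOIN MRD_SIT.PTY_ID ID3
    ON (S.HQ_PTY_ID = ID3.PTY_ID
        AND ID3.PTY_ID_TYP_CD = 'GCI'
        AND ID3.EFECT_END_DT = '9999-12-31')
) WITH DATA
PRIMARY INDEX (PTY_ID)
ON COMMIT PRESERVE ROWS;

/* Step 2: update from the volatile table instead of the derived table */
UPDATE STG
FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG STG, GCI_LKP DT
SET FMLY_GCI_NO = DT.FMLY_GCI_NO
    ,SUP_FMLY_GCI_NO = DT.SUP_FMLY_GCI_NO
    ,HQ_GCI_NO = DT.HQ_GCI_NO
WHERE STG.PTY_ID = DT.PTY_ID
  AND DT.PTY_ID <> '90004653704';
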
Also check for stats on the columns FMLY_PTY_ID, SUP_FMLY_PTY_ID and HQ_PTY_ID.
Also, to filter more records out of STG, the DT query can be reframed to add IS NOT NULL conditions:

SELECT
S.PTY_ID
,ID1.PTY_ID_NO AS FMLY_GCI_NO
,ID2.PTY_ID_NO AS SUP_FMLY_GCI_NO
,ID3.PTY_ID_NO AS HQ_GCI_NO
FROM
MRD_ETL_SIT.RPT_GCI_FMLY_STG S
LEFT OUTER JOIN MRD_SIT.PTY_ID ID1
ON (S.FMLY_PTY_ID=ID1.PTY_ID
AND ID1.PTY_ID_TYP_CD='GCI'
AND ID1.EFECT_END_DT='9999-12-31'
AND S.FMLY_PTY_ID is not null )
LEFT OUTER JOIN MRD_SIT.PTY_ID ID2
ON (S.SUP_FMLY_PTY_ID=ID2.PTY_ID
AND ID2.PTY_ID_TYP_CD='GCI'
AND ID2.EFECT_END_DT='9999-12-31'
AND S.SUP_FMLY_PTY_ID is not null )
LEFT OUTER JOIN MRD_SIT.PTY_ID ID3
ON (S.HQ_PTY_ID=ID3.PTY_ID
AND ID3.PTY_ID_TYP_CD='GCI'
AND ID3.EFECT_END_DT='9999-12-31'
AND S.HQ_PTY_ID is not null )

Let's hope this works out.

Re: PERFORMANCE PROBLEM QUERY USES TOO MUCH SPOOL

Thank you very much for the recommendations. So far I have tested the NOT NULL predicate constraints, and the query went from 1.19 min to 18 sec... pretty impressive! This, of course, did not help the spool issue, so next we will test your other recommendation: break the query in two parts, i.e. create a temporary/volatile table instead of the derived table DT, and then update from it. I am confident this will have the effect we hope for.

Thank you again.
Best Regards,
Richard

Re: PERFORMANCE PROBLEM QUERY USES TOO MUCH SPOOL

Don't forget to check that current statistics are collected on the columns you are joining. That can also affect spool.
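
For example, a sketch of the stats that look relevant here (adjust the column list to the joins and filters you actually use):

-- sketch: join and filter columns from the posted query
COLLECT STATISTICS ON MRD_SIT.PTY_ID COLUMN (PTY_ID);
COLLECT STATISTICS ON MRD_SIT.PTY_ID COLUMN (PTY_ID_TYP_CD);
COLLECT STATISTICS ON MRD_SIT.PTY_ID COLUMN (EFECT_END_DT);
COLLECT STATISTICS ON MRD_ETL_SIT.RPT_GCI_FMLY_STG COLUMN (PTY_ID);
COLLECT STATISTICS ON MRD_ETL_SIT.RPT_GCI_FMLY_STG COLUMN (FMLY_PTY_ID);
COLLECT STATISTICS ON MRD_ETL_SIT.RPT_GCI_FMLY_STG COLUMN (SUP_FMLY_PTY_ID);
COLLECT STATISTICS ON MRD_ETL_SIT.RPT_GCI_FMLY_STG COLUMN (HQ_PTY_ID);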

Re: PERFORMANCE PROBLEM QUERY USES TOO MUCH SPOOL

Thank you so much for the recommendations. They put us on the right track.

We made sure to collect stats.

We made the changes below and all queries are compliant now.

1. Split the single UPDATE statement into three separate UPDATE statements to update the GCI_NO for family, super family and HQ.
2. Used a volatile table to derive and populate the GCI_NO from the PTY table, then used this volatile table to update HQ_GCI_NO in the reporting stage table. A sketch of the pattern is below.
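
Roughly, the HQ piece followed this pattern (HQ_LKP is a placeholder name for illustration, not our actual table; the family and super family updates have the same shape):

/* sketch: one of the three split updates */
CREATE VOLATILE TABLE HQ_LKP AS
(
  SELECT S.PTY_ID, ID3.PTY_ID_NO AS HQ_GCI_NO
  FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG S
  INNER JOIN MRD_SIT.PTY_ID ID3
    ON (S.HQ_PTY_ID = ID3.PTY_ID
        AND ID3.PTY_ID_TYP_CD = 'GCI'
        AND ID3.EFECT_END_DT = '9999-12-31')
  /* the inner join already excludes rows with NULL HQ_PTY_ID,
     so only rows that actually have a matching GCI are carried */
) WITH DATA
PRIMARY INDEX (PTY_ID)
ON COMMIT PRESERVE ROWS;

UPDATE STG
FROM MRD_ETL_SIT.RPT_GCI_FMLY_STG STG, HQ_LKP DT
SET HQ_GCI_NO = DT.HQ_GCI_NO
WHERE STG.PTY_ID = DT.PTY_ID
  AND DT.PTY_ID <> '90004653704';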

Thanks again.
Best Regards,
Richard