Xplain

Enthusiast
Hi all!

I have a simple query to tune. Below are my query and its EXPLAIN. My first issue: I see 5 tables but only a single ON clause. Won't all the other joins become product joins? Are left outer joins bad for performance, and if so, how do I avoid them?
Also, how do I get rid of the 'duplicated on all AMPs' steps in the EXPLAIN plan? What steps should I take to tune this query?
SELECT D1.c1 AS c1 , D1.c2 AS c2 , D1.c3 AS c3 , D1.c4 AS c4 , D1.c5 AS c5 , D1.c6 AS c6 , D1.c7 AS c7 , D1.c8 AS c8
FROM
( SELECT D1.c1 AS c1 , D1.c2 AS c2 , D1.c3 AS c3 , D1.c4 AS c4 , D1.c5 AS c5 , D1.c6 AS c6 , D1.c7 AS c7 , D1.c8 AS c8
FROM
( SELECT D.CMP_NAME AS c1 , D.CMP_SRC_NUM AS c2 , A.ACTN_TYP AS c3 , A.ACTN_CODE AS c4 , A.ACTN_SUB_TYP AS c5 ,B.OBJ_TYP AS c6 ,
COUNT ( CASE WHEN B.OBJ_TYP = 'Email' THEN C.ROW END ) AS c7 ,
COUNT ( DISTINCT CASE WHEN B.OBJ_TYP = 'Email' THEN C.CONTACT END ) AS c8 ,
D.CMP_ID AS c9
FROM
MKT_ACTN_TYP_D A ,
MKT_OBJ_TYP_D B ,
MKT_ACTN_F C ,
W_SRC_D D
LEFT OUTER JOIN
W_SRC_DX TE
ON TE.ROW = D.ROW
WHERE
(
A.ROW = C.MKT_ACTN_TYP
AND B.ROW = C.MKT_OBJ_TYP
AND B.OBJ_TYP = 'Email'
AND C.src=D.ROW
AND TE.X_SUB_TYP IS NULL
)
GROUP BY A.ACTN_TYP ,A.ACTN_CODE ,A.ACTN_SUB_TYP,B.OBJ_TYP ,D.CMP_ID ,D.CMP_NAME ,
D.CMP_SRC_NUM
) D1
) D1

After COLLECT STATISTICS MKT_OBJ_TYP_D COLUMN OBJ_TYP, the EXPLAIN is:
1) First, we lock a distinct "pseudo table" for read on a
RowHash to prevent global deadlock for B.
2) Next, we lock a distinct "pseudo table" for read on a
RowHash to prevent global deadlock for A.
3) We lock a distinct "pseudo table" for read on a RowHash to
prevent global deadlock for C.
4) We lock a distinct "pseudo table" for read on a RowHash to
prevent global deadlock for TE.
5) We lock a distinct "pseudo table" for read on a RowHash to
prevent global deadlock for D.
6) We lock B for read, we lock A for read, we
lock C for read, we lock TE for read, and
we lock D for read.
7) We do an all-AMPs RETRIEVE step from A by way of an
all-rows scan with no residual conditions into Spool 4 (all_amps),
which is duplicated on all AMPs. The size of Spool 4 is estimated
with high confidence to be 300 rows. The estimated time for this
step is 0.01 seconds.
8) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of
an all-rows scan, which is joined to B by way of
an all-rows scan with a condition of (
"B.OBJ_TYP = 'Email'"). Spool 4 and
B are joined using a product join, with a join
condition of ("(1=1)"). The result goes into Spool 5
(all_amps), which is duplicated on all AMPs. The size of
Spool 5 is estimated with high confidence to be 45,600 rows.
The estimated time for this step is 0.04 seconds.
2) We do an all-AMPs RETRIEVE step from C by way of
an all-rows scan with a condition of ("(NOT
(C.SRC IS NULL )) AND ((NOT
(C.MKT_OBJ_TYP IS NULL )) AND (NOT
(C.MKT_ACTN_TYP IS NULL )))") into Spool 6
(all_amps), which is built locally on the AMPs. The size of
Spool 6 is estimated with high confidence to be 47,506,614
rows. The estimated time for this step is 9.57 seconds.
9) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to Spool 6 (Last Use) by way of an
all-rows scan. Spool 5 and Spool 6 are joined using a single
partition hash join, with a join condition of ("(ROW =
MKT_OBJ_TYP) AND (ROW = MKT_ACTN_TYP)"). The
result goes into Spool 7 (all_amps), which is built locally on the
AMPs into 14 hash join partitions. The size of Spool 7 is
estimated with low confidence to be 14,158,834 rows. The
estimated time for this step is 15.07 seconds.
10) We do an all-AMPs JOIN step from D by way of a RowHash
match scan with no residual conditions, which is joined to
TE by way of a RowHash match scan with no residual
conditions. D and TE are left outer joined
using a merge join, with a join condition of (
"TE.ROW = D.ROW"). The result goes
into Spool 8 (all_amps), which is duplicated on all AMPs into 14
hash join partitions. The size of Spool 8 is estimated with low
confidence to be 67,013,550 rows. The estimated time for this
step is 16.60 seconds.
11) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
all-rows scan, which is joined to Spool 8 (Last Use) by way of an
all-rows scan. Spool 7 and Spool 8 are joined using a hash join
of 14 partitions, with a join condition of ("(src=ROW)
AND (X_SUB_TYP IS NULL)"). The result goes into Spool 3
(all_amps), which is built locally on the AMPs. The size of Spool
3 is estimated with low confidence to be 14,158,834 rows. The
estimated time for this step is 3.80 seconds.
12) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 10. The size of Spool 10 is estimated with no confidence
to be 10,619,126 rows. The estimated time for this step is 37.70
seconds.
13) We do an all-AMPs SUM step to aggregate from Spool 10 (Last Use)
by way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 12. The size of Spool 12 is estimated with no confidence
to be 7,964,345 rows. The estimated time for this step is 27.72
seconds.
14) We do an all-AMPs RETRIEVE step from Spool 12 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 7,964,345 rows. The estimated time for this step is 1.00
seconds.
15) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 14 (group_amps), which is built
locally on the AMPs. The size of Spool 14 is estimated with no
confidence to be 7,964,345 rows. The estimated time for this step
is 0.86 seconds.
16) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 14 are sent back to the user as the result
of statement 1. The total estimated time is 1 minute and 52
seconds.
BEGIN RECOMMENDED STATS ->
17) "COLLECT STATISTICS MKT_ACTN_F COLUMN
(MKT_OBJ_TYP ,
MKT_ACTN_TYP)". (HighConf)
<- END RECOMMENDED STATS

Thank you.
6 REPLIES
Junior Contributor

Re: Xplain

There are 4 joins, but it's a mixture of old and new syntax:

FROM MKT_ACTN_F C
join MKT_ACTN_TYP_D A on A.ROW = C.MKT_ACTN_TYP
join MKT_OBJ_TYP_D B on B.ROW = C.MKT_OBJ_TYP
join W_SRC_D D on C.SRC = D.ROW
LEFT OUTER JOIN W_SRC_DX TE
ON TE.ROW = D.ROW
WHERE B.OBJ_TYP = 'Email'
AND TE.X_SUB_TYP IS NULL

Is this query looking for non-existing data in TE?
Then you might rewrite the left join using a NOT EXISTS.
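A sketch of such a rewrite, using the column names from your query and assuming TE is only referenced for this existence check:

```sql
SELECT ...
FROM MKT_ACTN_F C
JOIN MKT_ACTN_TYP_D A ON A.ROW = C.MKT_ACTN_TYP
JOIN MKT_OBJ_TYP_D B ON B.ROW = C.MKT_OBJ_TYP
JOIN W_SRC_D D ON C.SRC = D.ROW
WHERE B.OBJ_TYP = 'Email'
  AND NOT EXISTS (SELECT 1 FROM W_SRC_DX TE WHERE TE.ROW = D.ROW)
```

Note: this is only equivalent to your LEFT JOIN plus "TE.X_SUB_TYP IS NULL" filter if a matching TE row can never have a NULL X_SUB_TYP.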

The product join in the explain is chosen by the optimizer because it's the most efficient way here:
B.OBJ_TYP = 'Email' probably returns just a single row.
This looks like an LT/ST (large table/small table) optimization: fact table C plus dimensions A and B.

Finally, you can get rid of the "CASE WHEN B.OBJ_TYP = 'Email' THEN", because this condition is already in the WHERE clause, so there will be no other value.
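The aggregates then simplify to something like this (a sketch, keeping your original aliases; joins and WHERE clause unchanged):

```sql
SELECT D.CMP_NAME AS c1 , D.CMP_SRC_NUM AS c2 , A.ACTN_TYP AS c3 , A.ACTN_CODE AS c4 ,
       A.ACTN_SUB_TYP AS c5 , B.OBJ_TYP AS c6 ,
       COUNT ( C.ROW ) AS c7 ,                 -- no CASE needed: WHERE already restricts OBJ_TYP to 'Email'
       COUNT ( DISTINCT C.CONTACT ) AS c8
FROM ...                                       -- same joins and WHERE B.OBJ_TYP = 'Email' as before
GROUP BY A.ACTN_TYP , A.ACTN_CODE , A.ACTN_SUB_TYP , B.OBJ_TYP , D.CMP_ID , D.CMP_NAME , D.CMP_SRC_NUM
```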

Dieter
Enthusiast

Re: Xplain

Thanks, Dieter! I changed the left join to NOT EXISTS and it saved 2 minutes. I tried to filter out the records as below, but it did not help much. This table is small, around 850 records, but around 500 of them are 'Email'.
(sel * from MKT_OBJ_TYP where OBJ_TYP = 'Email') ,

How do I find out whether the 'duplicated on all AMPs' or the 'product join' is actually the most efficient way?
Is there any document I can refer to for the steps on tuning?
Enthusiast

Re: Xplain

I take my statement back: when I change the left join to NOT EXISTS, the explain shows me 2,503 hours as the estimated time. Please guide me.
FROM
MKT_ACTN_TYP_D A ,
MKT_OBJ_TYP_D B ,
MKT_ACTN_F C ,
W_SRC_D D
/* LEFT OUTER JOIN
W_SRC_DX TE
ON TE.ROW = D.ROW */
WHERE
(
not exists (sel 1 from SRC_DX where SRC_DX.row_wid = D.ROW_WID )
and A.ROW = C.MKT_ACTN_TYP
AND B.ROW = C.MKT_OBJ_TYP
AND B.OBJ_TYP = 'Email'
AND C.SRC = D.ROW
AND TE.X_SUB_TYP IS NULL
)
Junior Contributor

Re: Xplain

If X_SUB_TYP is defined as NOT NULL, then your NOT EXISTS is correct; otherwise there might be NULLs both because of the outer join *and* because of NULLs within X_SUB_TYP itself.
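If X_SUB_TYP is nullable, a NOT EXISTS that matches the original LEFT JOIN semantics would have to ignore TE rows whose X_SUB_TYP is NULL, e.g. (a sketch, assuming TE.ROW is unique so there is at most one matching TE row per D row):

```sql
WHERE NOT EXISTS ( SELECT 1
                   FROM W_SRC_DX TE
                   WHERE TE.ROW = D.ROW
                     AND TE.X_SUB_TYP IS NOT NULL )
```

This keeps a D row both when there is no matching TE row and when the matching TE row has a NULL X_SUB_TYP, which is what the LEFT JOIN plus "TE.X_SUB_TYP IS NULL" filter does.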

I don't know of any specific SQL tuning documentation; you might search for "ST/LT" in the manuals to get some more details about this specific topic.

Unless the optimizer is forced to use a product join (due to non-equi-joins or ORed join conditions), or it is underestimating the actual number of rows, you will only see some PJs in complex queries involving smaller row sets, where they are cheap.
So it's not your task to find out whether it's the most efficient way; that's supposed to be the optimizer's job :-)

Dieter
Junior Contributor

Re: Xplain

In the previous post you wrote "it saved 2 minutes", did you actually run it?

It's hard to tell whether those 2,500 hours are just a bad estimate without knowledge of your data/stats/DDL/PKs/FKs.

Dieter
Enthusiast

Re: Xplain

On the contrary, X_SUB_TYP is 99% NULLs. So if I understand you correctly, NOT EXISTS will not work here. I didn't actually run the query, just an EXPLAIN, and even that was against a query with an error on my side.