product Join

Enthusiast

Please help me eliminate the product join on this query, as it's driving me mad.
I'd prefer not to change the structure of the query at this stage, though.
When I remove the lines from the OR statement onwards, there is no product join.
The statistics have been updated.

Should I apply indexes? What are your thoughts on indexing?
We run a data warehouse load followed by a mammoth number of MIS reports.
Any ideas?

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE ((a.claim_ref_id IN
          (SELECT c.claim_ref_id
           FROM Tmp_Reported_Claims_YTD c)
        AND b.RRtv_year_id IN (2007))
    OR (a.claim_ref_id IN
          (SELECT d.claim_ref_id
           FROM Tmp_Reported_Claims_YTD1 d)
        AND b.RRtv_year_id IN (2007)));

The explain plan looks like this:

Explanation
1) First, we lock a distinct PROD_DATA_V."pseudo table" for read on a
RowHash to prevent global deadlock for PROD_DATA_V.d.
2) Next, we lock a distinct PROD_DATA_V."pseudo table" for read on a
RowHash to prevent global deadlock for PROD_DATA_V.c.
3) We lock a distinct prod_data_t."pseudo table" for read on a
RowHash to prevent global deadlock for
prod_data_t.fat_bse_cl_pay_cre_trans.
4) We lock a distinct PROD_DATA_T."pseudo table" for read on a
RowHash to prevent global deadlock for PROD_DATA_T.TIt_lu_day.
5) We lock PROD_DATA_V.d for read, we lock PROD_DATA_V.c for read, we
lock prod_data_t.fat_bse_cl_pay_cre_trans for read, and we lock
PROD_DATA_T.TIt_lu_day for read.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from PROD_DATA_V.c by way of
an all-rows scan with no residual conditions into Spool 4
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 4 by row hash and the sort key in spool
field1 eliminating duplicate rows. The size of Spool 4 is
estimated with high confidence to be 1 row. The estimated
time for this step is 0.01 seconds.
2) We do an all-AMPs RETRIEVE step from PROD_DATA_V.d by way of
an all-rows scan with no residual conditions into Spool 5
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 5 by row hash and the sort key in spool
field1 eliminating duplicate rows. The size of Spool 5 is
estimated with high confidence to be 1 row. The estimated
time for this step is 0.01 seconds.
7) We do an all-AMPs RETRIEVE step from PROD_DATA_T.TIt_lu_day by way
of an all-rows scan with no residual conditions into Spool 6
(all_amps) fanned out into 3 hash join partitions, which is
duplicated on all AMPs. The size of Spool 6 is estimated with
high confidence to be 79,226 rows. The estimated time for this
step is 0.08 seconds.
8) We do an all-AMPs RETRIEVE step from
prod_data_t.fat_bse_cl_pay_cre_trans by way of an all-rows scan
with a condition of ("NOT
(prod_data_t.fat_bse_cl_pay_cre_trans.Rep_Reported_date_id IS NULL)")
into Spool 7 (all_amps) fanned out into 3 hash join partitions,
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 7 is estimated with high confidence to be
16,238,400 rows. The estimated time for this step is 45.92
seconds.
9) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using a hash join
of 3 partitions, with a join condition of ("Rep_Reported_date_id =
cur_trn_dt"). The result goes into Spool 8 (all_amps), which is
built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 8 is estimated with low
confidence to be 16,238,400 rows. The estimated time for this
step is 7.35 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 10 (all_amps), which is duplicated on
all AMPs. The size of Spool 10 is estimated with high confidence
to be 14 rows. The estimated time for this step is 0.02 seconds.
11) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of
an all-rows scan, which is joined to Spool 8 by way of an
all-rows scan with a condition of ("year_id = 2007"). Spool
10 and Spool 8 are joined using a product join, with a join
condition of ("Claim_ref_id = claim_ref_id"). The result goes
into Spool 9 (all_amps), which is redistributed by hash code
to all AMPs. The size of Spool 9 is estimated with low
confidence to be 88 rows. The estimated time for this step is
2.81 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way
of an all-rows scan into Spool 11 (all_amps), which is
duplicated on all AMPs. The size of Spool 11 is estimated
with high confidence to be 14 rows. The estimated time for
this step is 0.02 seconds.
12) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an
all-rows scan, which is joined to Spool 8 (Last Use) by way of an
all-rows scan with a condition of ("year_id = 2007"). Spool 11
and Spool 8 are joined using a product join, with a join condition
of ("Claim_ref_id = claim_ref_id"). The result goes into Spool 9
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 9 by the sort key in spool field1
eliminating duplicate rows. The size of Spool 9 is estimated with
low confidence to be 88 rows. The estimated time for this step is
2.81 seconds.
13) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way of
an all-rows scan into Spool 3 (all_amps), which is built locally
on the AMPs. The size of Spool 3 is estimated with low confidence
to be 132 rows. The estimated time for this step is 0.02 seconds.
14) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 2.
Aggregate Intermediate Results are computed globally, then placed
in Spool 13. The size of Spool 13 is estimated with high
confidence to be 1 row. The estimated time for this step is 0.02
seconds.
15) We do an all-AMPs SUM step to aggregate from Spool 13 (Last Use)
by way of an all-rows scan. Aggregate Intermediate Results are
computed globally, then placed in Spool 15. The size of Spool 15
is estimated with high confidence to be 1 row. The estimated time
for this step is 0.02 seconds.
16) We do an all-AMPs RETRIEVE step from Spool 15 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with high
confidence to be 1 row. The estimated time for this step is 0.00
seconds.
17) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 59.05 seconds.
4 REPLIES
Enthusiast

Re: product Join

I think that this will do the equivalent:

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE (a.claim_ref_id)
IN (SELECT c.claim_ref_id
FROM Tmp_Reported_Claims_YTD c
UNION
SELECT d.claim_ref_id
FROM Tmp_Reported_Claims_YTD1 d)
AND b.RRtv_year_id in (2007)
;

If the claim_ref_id values in YTD and YTD1 are mutually exclusive, change the "UNION" to a "UNION ALL" to avoid the duplicate-elimination sort.

Let me know if this works for you.

Regards,
Barry
Senior Supporter

Re: product Join

Hi marcmc,

"The statistics have been updated."

Are you sure about that?
According to the explain, both Tmp_Reported_Claims_YTD and Tmp_Reported_Claims_YTD1 have just a single row.
Of course the optimizer is then doing a product join with that single row.

And although you prefer not to change the query, I would suggest rewriting the WHERE ;-)
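If the stats on the two temporary tables really are missing or stale, refreshing them would look something like this (a sketch; the table and column names are taken from the query above):

```sql
-- Refresh statistics on the IN-list column of both temp tables so the
-- optimizer sees realistic row counts instead of a 1-row estimate.
COLLECT STATISTICS ON Tmp_Reported_Claims_YTD COLUMN (claim_ref_id);
COLLECT STATISTICS ON Tmp_Reported_Claims_YTD1 COLUMN (claim_ref_id);
```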

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE
b.RRtv_year_id in (2007)
AND
(
a.claim_ref_id IN
(SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c)
OR
a.claim_ref_id IN
(SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d)
)


And are you sure about "count(distinct a.claim_ref_id)"?
It would be more efficient without the DISTINCT...
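A sketch of that variant, assuming the join cannot return the same claim_ref_id twice (that depends on the grain of the fact table, so check before using it):

```sql
-- Only valid if each claim_ref_id qualifies at most once;
-- otherwise COUNT(DISTINCT ...) is required for a correct answer.
SELECT count(a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON a.Rep_reported_date_id = b.Rep_reported_date_id
WHERE b.RRtv_year_id IN (2007)
  AND (a.claim_ref_id IN
         (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c)
       OR a.claim_ref_id IN
         (SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d));
```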

Dieter
Enthusiast

Re: product Join

I am also wondering whether the condition

AND b.RRtv_year_id in (2007)

had an impact on performance: it was being evaluated separately for each of the OR conditions, which was overkill.

It would have been better placed outside the OR conditions (well, ideally I would put it in the join condition itself),
like this:

fat_bse_cl_pay_cre_trans a INNER JOIN RRtv_lu_day b
ON a.Rep_reported_date_id = b.Rep_reported_date_id
AND b.RRtv_year_id = 2007


Or, as Dieter has done, as part of the WHERE clause but independent of the OR, depending on your coding style.

I think that would have resulted in only the matching records being spooled from RRtv_lu_day.

Also turn on diagnostic stats and see if the explain says anything about stats recommendations.
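For reference, that diagnostic is switched on per session; a sketch (the query after EXPLAIN is just an example, use your real one):

```sql
-- Ask the optimizer to append recommended COLLECT STATISTICS
-- statements to the EXPLAIN output for this session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON a.Rep_reported_date_id = b.Rep_reported_date_id
WHERE b.RRtv_year_id IN (2007);
```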
Enthusiast

Re: product Join

Thanks guys, I will look at both approaches.