Where and AND Clause Usage In left Outer Join

Teradata Applications
Teradata Employee

Where and AND Clause Usage In left Outer Join

Hi All,

 

I have a situation about a select query whose performance is getting changed due to interchange of the AND and Where Clause.

PFB the Original Query Using the Where clause:

SyntaxEditor Code Snippet

SELECT 
            CAST(L_POFFR.LYLTY_TRANS_ID AS DECIMAL(18,0)) AS LYLTY_TRANS_ID            ,L_POFFR.PMT_ACCT_ID            ,L_POFFR.DVC_ID            ,L_POFFR.EVNT_TMS            ,L_POFFR.STR_SITE_NUM 
            , L_POFFR.OFFR_ID            , L_POFFR.CAL_DAY_OF_WK_NUM            , L_POFFR.SL_REC_TY_CD            , L_POFFR.SCAN_CD  
            , L_POFFR.RTRN_IND            , L_POFFR.OTH_INC_IND            , L_POFFR.TENDR_TY_CD            , L_POFFR.PCF_TENDR_IND            , COALESCE(L_POFFR.TTL_PROD_SL_AMT , 0) AS TTL_PROD_SL_AMT            , COALESCE(N_WG.RTL_ITEM_GP_COST_AMT, NULL) AS RTL_ITEM_GP_COST_AMT             ,COALESCE(L_POFFR.TTL_PROD_QTY , 0) AS TTL_PROD_QTY             , L_POFFR.SL_UOM_CD
    FROM 
     RLDMPROD_W.LYLTY_EVNT_PROD_OFFR_TEMP    L_POFFR
    INNER JOIN RLDMPROD_V.CAL_DT CAL
                ON         CAL.CAL_DT = L_POFFR.EVNT_DT
    LEFT JOIN NDW_V.WF_GP_COSTS N_WG
    ON      L_POFFR.LGCY_SCAN_CD=N_WG.RTL_UPC_CD
    AND L_POFFR.LGCY_STR_NUM=N_WG.STR_NUM
    AND CAL.CAL_WK_END_DT= N_WG.COST_DT
     LEFT OUTER  JOIN 
 (   SELECT * FROM RLDMPROD_V.LYLTY_FICO_TRANS_MAP
     QUALIFY ROW_NUMBER () OVER (  PARTITION BY   SL_TRANS_ID , STR_SITE_NUM ,LANE_NUM ,TRANS_NUM , SCAN_CD  , trans_seq_num order by  LYLTY_TRANS_ID desc )  =1  )   FICO_MAP
     --ON FICO_MAP.LYLTY_TRANS_ID=L_POFFR.LYLTY_TRANS_ID     ON  FICO_MAP.SL_TRANS_ID =  L_POFFR.SL_tRANS_ID
     AND FICO_MAP.TRANS_NUM = L_POFFR.TRANS_NUM
     AND FICO_MAP.TRANS_SEQ_NUM =  L_POFFR.TRANS_SEQ_NUM
     AND FICO_MAP.STR_SITE_NUM = L_POFFR.STR_SITE_NUM  
     AND FICO_MAP.LANE_NUM = L_POFFR.LANE_NUM
     AND FICO_MAP.EVNT_DT=L_POFFR.EVNT_DT
 WHERE    FICO_MAP.SL_TRANS_ID IS   NULL 

PFB the Expplain plan of the above query:

1) First, we lock RLDMPROD_W.L_POFFR for access, we lock
RLDMPROD_T.CAL_DT in view RLDMPROD_V.CAL_DT for access, we lock
RLDMPROD_T.LYLTY_FICO_TRANS_MAP for access, and we lock
NDW.WF_GP_COSTS in view NDW_V.WF_GP_COSTS for access.
2) Next, we do an all-AMPs RETRIEVE step from
RLDMPROD_T.LYLTY_FICO_TRANS_MAP by way of an all-rows scan with a
condition of ("(NOT (RLDMPROD_T.LYLTY_FICO_TRANS_MAP.LANE_NUM IS
NULL )) AND ((NOT (RLDMPROD_T.LYLTY_FICO_TRANS_MAP.STR_SITE_NUM IS
NULL )) AND ((NOT (RLDMPROD_T.LYLTY_FICO_TRANS_MAP.TRANS_SEQ_NUM
IS NULL )) AND (NOT (RLDMPROD_T.LYLTY_FICO_TRANS_MAP.TRANS_NUM IS
NULL ))))") into Spool 2 (all_amps) (compressed columns allowed),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 2 is estimated with high confidence to be
10,011,589,927 rows (610,706,985,547 bytes). The estimated time
for this step is 7 minutes and 38 seconds.
3) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 3 (all_amps) (compressed columns allowed), which is
built locally on the AMPs. The size is estimated with high
confidence to be 10,011,589,927 rows (1,051,216,942,335 bytes).
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan with a condition of ("((NOT (LANE_NUM
IS NULL )) AND ((NOT (STR_SITE_NUM IS NULL )) AND ((NOT
(TRANS_SEQ_NUM IS NULL )) AND (NOT (TRANS_NUM IS NULL )))))
AND (Field_8 = 1)") into Spool 11 (all_amps) (compressed
columns allowed) fanned out into 6 hash join partitions,
which is redistributed by the hash code of (
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.SL_TRANS_ID,
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.TRANS_SEQ_NUM,
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.STR_SITE_NUM,
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.LANE_NUM,
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.TRANS_NUM,
RLDMPROD_T.LYLTY_FICO_TRANS_MAP.EVNT_DT) to all AMPs. The
size of Spool 11 is estimated with high confidence to be
10,011,589,927 rows (430,498,366,861 bytes). The estimated
time for this step is 2 minutes and 25 seconds.
2) We do an all-AMPs RETRIEVE step from RLDMPROD_T.CAL_DT in
view RLDMPROD_V.CAL_DT by way of an all-rows scan with no
residual conditions into Spool 9 (all_amps) (compressed
columns allowed), which is duplicated on all AMPs. The size
of Spool 9 is estimated with high confidence to be 4,613,544
rows (96,884,424 bytes). The estimated time for this step is
0.10 seconds.
5) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
all-rows scan, which is joined to RLDMPROD_W.L_POFFR by way of an
all-rows scan with no residual conditions. Spool 9 and
RLDMPROD_W.L_POFFR are joined using a dynamic hash join, with a
join condition of ("CAL_DT = RLDMPROD_W.L_POFFR.EVNT_DT"). The
result goes into Spool 10 (all_amps) (compressed columns allowed)
fanned out into 6 hash join partitions, which is redistributed by
the hash code of (RLDMPROD_W.L_POFFR.EVNT_DT,
RLDMPROD_W.L_POFFR.TRANS_SEQ_NUM, RLDMPROD_W.L_POFFR.SL_TRANS_ID,
RLDMPROD_W.L_POFFR.TRANS_NUM, RLDMPROD_W.L_POFFR.STR_SITE_NUM,
RLDMPROD_W.L_POFFR.LANE_NUM) to all AMPs. The size of Spool 10 is
estimated with low confidence to be 14,542,370 rows (
2,035,931,800 bytes). The estimated time for this step is 1.55
seconds.
6) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an
all-rows scan, which is joined to Spool 11 (Last Use) by way of an
all-rows scan. Spool 10 and Spool 11 are left outer joined using
a hash join of 6 partitions, with condition(s) used for
non-matching on left table ("(NOT (EVNT_DT IS NULL )) AND (NOT
(TRANS_SEQ_NUM IS NULL ))"), with a join condition of (
"(SL_TRANS_ID = SL_TRANS_ID) AND ((TRANS_NUM = TRANS_NUM) AND
((TRANS_SEQ_NUM = TRANS_SEQ_NUM) AND ((STR_SITE_NUM = STR_SITE_NUM)
AND ((LANE_NUM = LANE_NUM) AND (EVNT_DT = EVNT_DT )))))"), and
with a post join condition of ("SL_TRANS_ID IS NULL"). The result
goes into Spool 12 (all_amps) (compressed columns allowed) fanned
out into 5 hash join partitions, which is redistributed by the
hash code of (RLDMPROD_T.CAL_DT.CAL_WK_END_DT,
RLDMPROD_W.L_POFFR.LGCY_SCAN_CD, RLDMPROD_W.L_POFFR.LGCY_STR_NUM)
to all AMPs. The size of Spool 12 is estimated with low
confidence to be 14,556,838 rows (1,775,934,236 bytes). The
estimated time for this step is 50.72 seconds.
7) We do an all-AMPs RETRIEVE step from NDW.WF_GP_COSTS in view
NDW_V.WF_GP_COSTS by way of an all-rows scan with a condition of (
"NOT (NDW.WF_GP_COSTS in view NDW_V.WF_GP_COSTS.COST_DT IS NULL)")
into Spool 13 (all_amps) (compressed columns allowed) fanned out
into 5 hash join partitions, which is redistributed by the hash
code of (NDW.WF_GP_COSTS.STR_NUM, NDW.WF_GP_COSTS.RTL_UPC_CD,
NDW.WF_GP_COSTS.COST_DT) to all AMPs. The size of Spool 13 is
estimated with high confidence to be 672,005,818 rows (
25,536,221,084 bytes). The estimated time for this step is 15.14
seconds.
8) We do an all-AMPs JOIN step from Spool 12 (Last Use) by way of an
all-rows scan, which is joined to Spool 13 (Last Use) by way of an
all-rows scan. Spool 12 and Spool 13 are left outer joined using
a hash join of 5 partitions, with condition(s) used for
non-matching on left table ("(NOT (CAL_WK_END_DT IS NULL )) AND
((NOT (LGCY_SCAN_CD IS NULL )) AND (NOT (LGCY_STR_NUM IS NULL )))"),
with a join condition of ("(LGCY_SCAN_CD = RTL_UPC_CD) AND
((LGCY_STR_NUM = STR_NUM) AND (CAL_WK_END_DT = COST_DT ))"). The
result goes into Spool 8 (group_amps), which is built locally on
the AMPs. The size of Spool 8 is estimated with low confidence to
be 14,556,838 rows (1,877,832,102 bytes). The estimated time for
this step is 3.51 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 8 are sent back to the user as the result of
statement 1.

As I have Highligted the costly steps in Bold Letters which is going to scan 10Bilions rows because the FICO table is very large table and ALl row scan is going on for this table.
Modified Query where Where Clause Is replaced by AND:
SELECT
CAST(L_POFFR.LYLTY_TRANS_ID AS DECIMAL(18,0)) AS LYLTY_TRANS_ID
,L_POFFR.PMT_ACCT_ID
,L_POFFR.DVC_ID
,L_POFFR.EVNT_TMS
,L_POFFR.STR_SITE_NUM
, L_POFFR.OFFR_ID
, L_POFFR.CAL_DAY_OF_WK_NUM
, L_POFFR.SL_REC_TY_CD
, L_POFFR.SCAN_CD
, L_POFFR.RTRN_IND
, L_POFFR.OTH_INC_IND
, L_POFFR.TENDR_TY_CD
, L_POFFR.PCF_TENDR_IND
, COALESCE(L_POFFR.TTL_PROD_SL_AMT , 0) AS TTL_PROD_SL_AMT
, COALESCE(N_WG.RTL_ITEM_GP_COST_AMT, NULL) AS RTL_ITEM_GP_COST_AMT
,COALESCE(L_POFFR.TTL_PROD_QTY , 0) AS TTL_PROD_QTY
, L_POFFR.SL_UOM_CD
FROM
RLDMPROD_W.LYLTY_EVNT_PROD_OFFR_TEMP L_POFFR
INNER JOIN RLDMPROD_V.CAL_DT CAL
ON CAL.CAL_DT = L_POFFR.EVNT_DT
LEFT JOIN NDW_V.WF_GP_COSTS N_WG
ON L_POFFR.LGCY_SCAN_CD=N_WG.RTL_UPC_CD
AND L_POFFR.LGCY_STR_NUM=N_WG.STR_NUM
AND CAL.CAL_WK_END_DT= N_WG.COST_DT
LEFT OUTER JOIN
( SELECT * FROM RLDMPROD_V.LYLTY_FICO_TRANS_MAP
QUALIFY ROW_NUMBER () OVER ( PARTITION BY SL_TRANS_ID , STR_SITE_NUM ,LANE_NUM ,TRANS_NUM , SCAN_CD , trans_seq_num order by LYLTY_TRANS_ID desc ) =1 ) FICO_MAP
--ON FICO_MAP.LYLTY_TRANS_ID=L_POFFR.LYLTY_TRANS_ID
ON FICO_MAP.SL_TRANS_ID = L_POFFR.SL_tRANS_ID
AND FICO_MAP.TRANS_NUM = L_POFFR.TRANS_NUM
AND FICO_MAP.TRANS_SEQ_NUM = L_POFFR.TRANS_SEQ_NUM
AND FICO_MAP.STR_SITE_NUM = L_POFFR.STR_SITE_NUM
AND FICO_MAP.LANE_NUM = L_POFFR.LANE_NUM
AND FICO_MAP.EVNT_DT=L_POFFR.EVNT_DT
AND FICO_MAP.SL_TRANS_ID IS NULL;--------------->Where Clause replaced with AND

eXPLAIN pLAN FOR THE same:
1) First, we lock RLDMPROD_W.L_POFFR for access, we lock
RLDMPROD_T.CAL_DT in view RLDMPROD_V.CAL_DT for access, and we
lock NDW.WF_GP_COSTS in view NDW_V.WF_GP_COSTS for access.
2) Next, we do an all-AMPs RETRIEVE step from RLDMPROD_T.CAL_DT in
view RLDMPROD_V.CAL_DT by way of an all-rows scan with no residual
conditions into Spool 2 (all_amps) (compressed columns allowed),
which is duplicated on all AMPs. The size of Spool 2 is estimated
with high confidence to be 4,613,544 rows (96,884,424 bytes). The
estimated time for this step is 0.10 seconds.
3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to RLDMPROD_W.L_POFFR by way of an
all-rows scan with no residual conditions. Spool 2 and
RLDMPROD_W.L_POFFR are joined using a dynamic hash join, with a
join condition of ("CAL_DT = RLDMPROD_W.L_POFFR.EVNT_DT"). The
result goes into Spool 3 (all_amps) (compressed columns allowed)
fanned out into 5 hash join partitions, which is redistributed by
the hash code of (RLDMPROD_T.CAL_DT.CAL_WK_END_DT,
RLDMPROD_W.L_POFFR.LGCY_SCAN_CD, RLDMPROD_W.L_POFFR.LGCY_STR_NUM)
to all AMPs. The size of Spool 3 is estimated with low confidence
to be 14,542,370 rows (1,657,830,180 bytes). The estimated time
for this step is 1.43 seconds.
4) We do an all-AMPs RETRIEVE step from NDW.WF_GP_COSTS in view
NDW_V.WF_GP_COSTS by way of an all-rows scan with a condition of (
"NOT (NDW.WF_GP_COSTS in view NDW_V.WF_GP_COSTS.COST_DT IS NULL)")
into Spool 4 (all_amps) (compressed columns allowed) fanned out
into 5 hash join partitions, which is redistributed by the hash
code of (NDW.WF_GP_COSTS.STR_NUM, NDW.WF_GP_COSTS.RTL_UPC_CD,
NDW.WF_GP_COSTS.COST_DT) to all AMPs. The size of Spool 4 is
estimated with high confidence to be 672,005,818 rows (
25,536,221,084 bytes). The estimated time for this step is 15.14
seconds.
5) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to Spool 4 (Last Use) by way of an
all-rows scan. Spool 3 and Spool 4 are left outer joined using a
hash join of 5 partitions, with condition(s) used for non-matching
on left table ("(NOT (CAL_WK_END_DT IS NULL )) AND ((NOT
(LGCY_SCAN_CD IS NULL )) AND (NOT (LGCY_STR_NUM IS NULL )))"),
with a join condition of ("(LGCY_STR_NUM = STR_NUM) AND
((LGCY_SCAN_CD = RTL_UPC_CD) AND (CAL_WK_END_DT = COST_DT ))").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with low confidence
to be 14,542,370 rows (1,875,965,730 bytes). The estimated time
for this step is 3.49 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 20.16 seconds.

In this explain Plan we cann't see the Bigger table getting used anywhere.
Can anyone suggest the reason for the same and also any approach to tune this.


 

3 REPLIES
Junior Contributor

Re: Where and AND Clause Usage In left Outer Join

Both queries return a logically different result, #1 filters for rows which couldn't be joined (similar to a NOT EXISTS) while #2 returns all rows and the optimizer seems to be able to do join elimination, i.e. remove the join.

Teradata Employee

Re: Where and AND Clause Usage In left Outer Join

Thanks for the response D.

But here what i am seeing is the results of both the queries are same.Is that a Possibility?

also query using AND is optimized by a heavy Margin.the resource consumption shoots down to a very low value.

Can you share your thoughts on this.

Highlighted
Junior Contributor

Re: Where and AND Clause Usage In left Outer Join

Query #2 will return the same result when there's no matching data.

 

What is your query supposed to return?

IMHO the QUALIFY can be removed without changing the result.