Query optimization

Enthusiast

Hi,

I am trying to optimize the following query. The EXPLAIN plan estimates a processing time of about 13 seconds, but the actual query takes much longer. Stats have been collected on the tables. Is there any way the performance of this query can be improved?

Thanks in advance

Sreedhr

SELECT a.acct_id AS mkt_id, a.sor_id AS mkt_sor_id, sol_id, sol_dt, a.cust_id,
       a.adr_line_1_txt AS address, a.city_nm, b.acct_id, b.sor_id,
       testcell, prodcell, testcontrol, 'card' AS Segment
FROM userdefined.abc_tera_card_xs a
LEFT JOIN container.cust_acct b
  ON  a.cust_id = b.cust_id
  AND data_dt IN ('2011-09-23','2011-11-02','2011-12-01')
  AND sol_dt BETWEEN '2011-10-01' AND '2011-12-31'
  AND cust_grp_cd = 'ID'
  AND b.sor_id IN ('2', '7', '38', '40', '39', '22', '56')
WHERE testcontrol = 'T'
  AND ode_chnl = 1
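
For reference, the stats were collected roughly along these lines (an illustrative sketch only; base_card and base_acct stand in for the real base tables behind the two views, and the exact column list may differ):

COLLECT STATISTICS ON base_card COLUMN (cust_id);      -- join column
COLLECT STATISTICS ON base_card COLUMN (testcontrol);  -- WHERE filter
COLLECT STATISTICS ON base_card COLUMN (ode_chnl);     -- WHERE filter
COLLECT STATISTICS ON base_acct COLUMN (cust_id);      -- join column
COLLECT STATISTICS ON base_acct COLUMN (data_dt);
COLLECT STATISTICS ON base_acct COLUMN (sor_id);
COLLECT STATISTICS ON base_acct COLUMN (cust_grp_cd);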

The EXPLAIN plan is as follows:

Next, we execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step from userdefined.a in view
          userdefined.abc_tera_card_xs by way of an all-rows scan with a
          condition of ("(userdefined.a in view abc_tera_card_xs.ode_chnl
          = 1) AND (userdefined.a in view abc_tera_card_xs.TestControl =
          'T')") locking for access into Spool 3 (all_amps) (compressed
          columns allowed), which is built locally on the AMPs.  Then
          we do a SORT to order Spool 3 by the hash code of (
          userdefined.a.SOL_ID, userdefined.a in view abc_tera_card_xs.testcell
          (FLOAT, FORMAT '-9.99999999999999E-999')(FLOAT), userdefined.a in
          view abc_tera_card_xs.prodcell (FLOAT, FORMAT
          '-9.99999999999999E-999')(FLOAT)).  The size of Spool 3 is
          estimated with no confidence to be 3,165,367 rows (
          237,402,525 bytes).  The estimated time for this step is 0.58
          seconds.
       2) We do an all-AMPs RETRIEVE step from userdefined.b in view
          abc_tera_card_xs by way of an all-rows scan with a
          condition of ("(NOT (userdefined.b in view
          abc_tera_card_xs.test_cell IS NULL )) AND ((NOT (userdefined.b
          in view abc_tera_card_xs.prod_cell IS NULL )) AND (NOT
          (userdefined.b in view abc_tera_card_xs.sol_id IS NULL )))")
          locking for access into Spool 4 (all_amps) (compressed
          columns allowed), which is duplicated on all AMPs.  Then we
          do a SORT to order Spool 4 by the hash code of (
          userdefined.b.sol_id, userdefined.b in view
          abc_tera_card_xs.test_cell (FLOAT), userdefined.b in view
          abc_tera_card_xs.prod_cell (FLOAT)).  The size of Spool 4
          is estimated with high confidence to be 97,686 rows (
          2,539,836 bytes).  The estimated time for this step is 0.04
          seconds.
       3) We do a two-AMP RETRIEVE step from container_CORE_TB.BUS_LOAD_DT
          in view container.cust_acct_pt by way of unique index
          # 4 "container_CORE_TB.BUS_LOAD_DT in view
          container.cust_acct_pt.DB_NM = 'container_PARTY_TB',
          container_CORE_TB.BUS_LOAD_DT in view
          container.cust_acct_pt.TBL_NM = 'cust_acct',
          container_CORE_TB.BUS_LOAD_DT in view
          container.cust_acct_pt.SOR_ID = -2" with a residual
          condition of ("container_CORE_TB.BUS_LOAD_DT in view
          container.cust_acct_pt.LAST_CMPLD_LOAD_DT >= DATE
          '2007-10-03'") locking row for access into Spool 5 (all_amps)
          (compressed columns allowed), which is duplicated on all AMPs.
          The size of Spool 5 is estimated with high confidence to be
          1,458 rows (24,786 bytes).  The estimated time for this step
          is 0.06 seconds.
       4) We do a single-AMP RETRIEVE step from
          container_CORE_TB.HISTCL_CLNDR in view
          container.cust_acct_pt by way of the unique primary
          index "container_CORE_TB.HISTCL_CLNDR in view
          container.cust_acct_pt.SNAP_DT = DATE '2011-09-23'"
          extracting row ids only with no residual conditions into
          Spool 2 (all_amps), which is built locally on that AMP.  The
          size of Spool 2 is estimated with high confidence to be 1 row.
          The estimated time for this step is 0.00 seconds.
  3) We do a single-AMP RETRIEVE step from container_CORE_TB.HISTCL_CLNDR in
     view container.cust_acct_pt by way of the unique primary
     index "container_CORE_TB.HISTCL_CLNDR in view
     container.cust_acct_pt.SNAP_DT = DATE '2011-11-02'"
     extracting row ids only with no residual conditions into Spool 2
     (all_amps), which is built locally on that AMP.  The size of Spool
     2 is estimated with high confidence to be 2 rows.  The estimated
     time for this step is 0.00 seconds.
  4) We do a single-AMP RETRIEVE step from container_CORE_TB.HISTCL_CLNDR in
     view container.cust_acct_pt by way of the unique primary
     index "container_CORE_TB.HISTCL_CLNDR in view
     container.cust_acct_pt.SNAP_DT = DATE '2011-12-01'"
     extracting row ids only with no residual conditions into Spool 2
     (all_amps), which is built locally on that AMP.  The size of Spool
     2 is estimated with high confidence to be 3 rows.  The estimated
     time for this step is 0.00 seconds.
  5) We do a SORT to order Spool 2 by row id eliminating duplicate rows.
     The estimated time for this step is 0.00 seconds.
  6) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
     all-rows scan, which is joined to container_CORE_TB.HISTCL_CLNDR in
     view container.cust_acct_pt by way of row ids from Spool 2
     (Last Use) with a residual condition of (
     "(container_CORE_TB.HISTCL_CLNDR in view
     container.cust_acct_pt.SNAP_DT <= DATE '2012-01-23') AND
     (container_CORE_TB.HISTCL_CLNDR in view
     container.cust_acct_pt.SNAP_DT >= DATE '2007-10-03')").
     Spool 5 and container_CORE_TB.HISTCL_CLNDR are joined using a product
     join, with a join condition of (
     "container_CORE_TB.HISTCL_CLNDR.SNAP_DT <= LAST_CMPLD_LOAD_DT").  The
     result goes into Spool 6 (all_amps) (compressed columns allowed),
     which is duplicated on all AMPs.  The size of Spool 6 is estimated
     with low confidence to be 2,916 rows (49,572 bytes).  The
     estimated time for this step is 0.10 seconds.
  7) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of
          an all-rows scan, which is joined to
          container_PARTY_TB.cust_acct in view
          container.cust_acct_pt by way of an all-rows scan with
          a condition of ("(container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SNAP_END_DT >= DATE
          '2007-10-04') AND (((container_PARTY_TB.cust_acct in
          view container.cust_acct_pt.SOR_ID = 2) OR
          ((container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 7) OR
          ((container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 38) OR
          ((container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 40) OR
          ((container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 39) OR
          ((container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 22) OR
          (container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.SOR_ID = 56 ))))))) AND
          (container_PARTY_TB.cust_acct in view
          container.cust_acct_pt.CUST_GRP_CD = 'ID '))").  Spool
          6 and container_PARTY_TB.cust_acct are joined using a
          product join, with a join condition of (
          "(container_PARTY_TB.cust_acct.SNAP_DT <= SNAP_DT) AND
          (container_PARTY_TB.cust_acct.SNAP_END_DT > SNAP_DT)").
          The result goes into Spool 7 (all_amps) (compressed columns
          allowed), which is redistributed by the hash code of (
          container_PARTY_TB.cust_acct.cust_id) to all
          AMPs.  Then we do a SORT to order Spool 7 by row hash.  The
          size of Spool 7 is estimated with low confidence to be
          141,208,336 rows (4,377,458,416 bytes).  The estimated time
          for this step is 11.80 seconds.
       2) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of
          a RowHash match scan, which is joined to Spool 4 (Last Use)
          by way of a RowHash match scan.  Spool 3 and Spool 4 are
          left outer joined using a merge join, with a join condition
          of ("((testcell (FLOAT, FORMAT '-9.99999999999999E-999'))=
          (test_cell )) AND (((prodcell (FLOAT, FORMAT
          '-9.99999999999999E-999'))= (prod_cell )) AND (SOL_ID =
          sol_id ))").  The result goes into Spool 8 (all_amps)
          (compressed columns allowed), which is redistributed by the
          hash code of (userdefined.a.cust_id) to all AMPs.  Then we
          do a SORT to order Spool 8 by row hash.  The size of Spool 8
          is estimated with no confidence to be 3,165,368 rows (
          246,898,704 bytes).  The estimated time for this step is 0.79
          seconds.
  8) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of a
     RowHash match scan, which is joined to Spool 8 (Last Use) by way
     of a RowHash match scan.  Spool 7 and Spool 8 are right outer
     joined using a merge join, with condition(s) used for non-matching
     on right table ("(sol_dt <= DATE '2011-12-31') AND (sol_dt >= DATE
     '2011-10-01')"), with a join condition of ("cust_id =
     cust_id").  The result goes into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with no confidence to be 3,165,368 rows (316,536,800
     bytes).  The estimated time for this step is 0.93 seconds.
  9) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 13.41 seconds.

2 REPLIES
Senior Supporter

Re: Query optimization

Hi,

it looks like your statement is much more complicated than just a two-table join.

Can you share all the DDLs?

Run a SHOW in front of your SQL and you will get all the DDLs related to this statement.
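
For example (a trimmed-down version of your statement is enough, as long as it touches the same objects):

SHOW SELECT a.acct_id, b.acct_id
FROM userdefined.abc_tera_card_xs a
LEFT JOIN container.cust_acct b
  ON a.cust_id = b.cust_id;

This returns the CREATE TABLE / CREATE VIEW text of every object the request references.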

The (FLOAT, FORMAT '-9.99999999999999E-999')(FLOAT) parts in the Explain indicate implicit data type conversions, which make the statistics on these columns useless to the optimizer. So you might check the view definitions for mismatched data types on the join columns.
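
Something along these lines would show where the mismatch is (a sketch; base_a and base_b stand in for the two tables joined inside the abc_tera_card_xs view):

HELP COLUMN base_a.testcell;    -- compare the declared types on
HELP COLUMN base_b.test_cell;   -- both sides of the join columns
HELP COLUMN base_a.prodcell;
HELP COLUMN base_b.prod_cell;

-- If one side is FLOAT and the other is not, the non-FLOAT side gets converted
-- at run time and its statistics can no longer be used.  The clean fix is to give
-- both columns the same data type in the table/view DDL so no conversion is needed.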

Ulrich

Enthusiast

Re: Query optimization

Hello Ulrich,

Thank you for your input, and my apologies for not responding sooner. We came up with new logic to limit the list of values in the date condition (roughly along the lines of the sketch below), and the query is now performing a lot better. I appreciate your help.
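
The idea was roughly this (an illustrative sketch only, not our exact production code; dt_ctl is a made-up staging table):

-- stage only the snapshot dates that are actually needed
CREATE VOLATILE TABLE dt_ctl (snap_dt DATE) ON COMMIT PRESERVE ROWS;
INSERT INTO dt_ctl VALUES (DATE '2011-09-23');
INSERT INTO dt_ctl VALUES (DATE '2011-11-02');
INSERT INTO dt_ctl VALUES (DATE '2011-12-01');

-- then drive cust_acct off that short date list instead of the long IN list
SELECT a.acct_id AS mkt_id, a.cust_id, b.acct_id, b.sor_id
FROM userdefined.abc_tera_card_xs a
LEFT JOIN (
    SELECT c.cust_id, c.acct_id, c.sor_id
    FROM container.cust_acct c
    JOIN dt_ctl d
      ON c.data_dt = d.snap_dt
    WHERE c.cust_grp_cd = 'ID'
      AND c.sor_id IN ('2','7','38','40','39','22','56')
) b
  ON a.cust_id = b.cust_id
WHERE a.testcontrol = 'T'
  AND a.ode_chnl = 1;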

Thanks again!

Sincerely,

Naren