Optimizing a qry

Enthusiast

Optimizing a qry

Hi all!

I am trying to optimize this query, which runs for a long time. Any pointers on improving the performance would be helpful.

SELECT Clt.ou_num (NAMED "Clt ID" ) , Clt.ALIAS_NAME (NAMED "Clt Name" ) ,
Emp.ou_num (NAMED "Emp ID" ) , Emp.ALIAS_NAME (NAMED "Emp Name" ) ,
Pln.ASSET_TAG (NAMED "Pln Number" ) , Pln.ALIAS_NAME (NAMED "Pln Name" ) ,
Pln.pln_cd (NAMED "Pln Type" ) , prsn.pin , prsn.LAST_NAME ,
prsn.FST_NAME , prsn.SEMF , prsn.AGE , b.issue_dt , b.pcy_nbr ,
issue_type , premium
FROM PCY_PRSN_BRDG a , pcy_mthly_cust b , org_a A ,
ORG_B pln , ORG_B Clt , org_b Emp , tc_prsn_d_cust prsn
WHERE b.accnt = A.LVL6
AND A.LVL6 = pln.row
AND A.LVL8 = Clt.row
AND A.LVL7 = Emp.row
AND a.pcy_nbr = b.pcy_nbr
AND a.ctct = b.ctct
AND prsn.ROW = a.ctct
AND prsn.ROW = b.ctct
AND b.ctct <> 0
AND b.accnt <> 0
AND Clt.ou_num IN ( '00111' … )
AND A.HIER_TYPe = 'MKTG'
AND CAST( a.ISSUE_DT AS DATE ) BETWEEN '2011-01-01' AND '2011-01-30'
AND NOT EXISTS (
SELECT 'x'
FROM PCY_PRSN_BRDG
WHERE a.ctct = ctct
AND CAST( ISSUE_DT AS DATE ) BETWEEN '1900-01-01' AND '2011-03-31' )

Tables and indexes---
PCY_PRSN_BRDG
UNIQUE PRIMARY INDEX PCY_PRSN_BRDG_UPI ( PCY_NBR ,CTCT );

pcy_mthly_cust
UNIQUE PRIMARY INDEX PCY_MTHLY_CUST_UPI ( PCY_NBR ,PRD_YR_QTR ,"MONTH" )
PARTITION BY RANGE_N(CAST(((SUBSTR(PRD_YR_QTR ,1 ,4 ))||"month" ) AS INTEGER)
BETWEEN 200201 AND 209912 EACH 1 );

org_a
UNIQUE PRIMARY INDEX ORG_A_UPI ( ROW ,HIER_TYPE );

ORG_B
UNIQUE PRIMARY INDEX ORG_DUPI ( ROW );

tc_prsn_d_cust
UNIQUE PRIMARY INDEX ( ROW )
INDEX ( SSN )
INDEX ( PRSN_UID )
INDEX ( PIN );

Explain SELECT Clt.ou_num (NAMED "Clt ID" ) , Clt.ALIAS_NAME (NAMED "Clt Name" ) ,
Emp.ou_num (NAMED "Emp ID" ) , Emp.ALIAS_NAME (NAMED "Emp Name" ) ,
Pln.ASSET_TAG (NAMED "Pln Number" ) , Pln.ALIAS_NAME (NAMED "Pln Name" ) ,
Pln.pln_cd (NAMED "Pln Type" ) , prsn.pin , prsn.LAST_NAME ,
prsn.FST_NAME , prsn.SEMF , prsn.AGE , b.issue_dt , b.pcy_nbr ,
issue_type , premium
FROM PCY_PRSN_BRDG a , pcy_mthly_cust b , org_a A ,
ORG_B pln , ORG_B Clt , org_b Emp , tc_prsn_d_cust prsn
WHERE b.accnt = A.LVL6
AND A.LVL6 = pln.row
AND A.LVL8 = Clt.row
AND A.LVL7 = Emp.row
AND a.pcy_nbr = b.pcy_nbr
AND a.ctct = b.ctct
AND prsn.ROW = a.ctct
AND prsn.ROW = b.ctct
AND b.ctct <> 0
AND b.accnt <> 0
AND Clt.ou_num IN ( '003879' , '013763' , '059708' , '010192' ,
'014057' , '008410' , '011490' , '065029' )
AND A.HIER_TYPe = 'MARKETING'
AND CAST( a.ISSUE_DT AS DATE ) BETWEEN '2011-04-01' AND '2011-04-30'
AND NOT EXISTS (
SELECT 'x'
FROM PCY_PRSN_BRDG
WHERE a.ctct = ctct
AND CAST( ISSUE_DT AS DATE ) BETWEEN '1900-01-01' AND '2011-03-31' )

1) First, we lock .TC_PRSN_D_CUST for access, we lock
.PCY_PRSN_BRDG for access, we lock
.PCY_MTHLY_CUST for access, we lock .ORG_B for
access, and we lock .ORG_A for access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from .ORG_B by way of
an all-rows scan with a condition of (
"(.ORG_B.OU_NUM = '001111') OR
((.ORG_B.OU_NUM = '022222')))") into Spool 2
(all_amps) (compressed columns allowed), which is duplicated
on all AMPs. The size of Spool 2 is estimated with high
confidence to be 2,100 rows. The estimated time for this
step is 0.27 seconds.
2) We do an all-AMPs RETRIEVE step from .ORG_A by way of
an all-rows scan with a condition of (
"(.ORG_A.LVL6 <> 0) AND
(.ORG_A.HIER_TYPE = 'MKTG')") into Spool 3
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. The size of Spool 3 is estimated with
no confidence to be 39,083 rows. The estimated time for this
step is 0.10 seconds.
3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a single
partition hash join, with a join condition of ("LVL8 =
ROW"). The result goes into Spool 4 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 4 by row hash. The size of Spool
4 is estimated with no confidence to be 2 rows. The estimated
time for this step is 0.01 seconds.
4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a
RowHash match scan, which is joined to .ORG_B by way of a
RowHash match scan with no residual conditions locking
.ORG_B for access. Spool 4 and .ORG_B are joined
using a merge join, with a join condition of ("LVL7 =
.ORG_B.ROW"). The result goes into Spool 5 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
The size of Spool 5 is estimated with no confidence to be 300 rows.
The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to .PCY_MTHLY_CUST by
way of an all-rows scan with a condition of (
"(.PCY_MTHLY_CUST.ACCNT <> 0) AND
((.PCY_MTHLY_CUST.CTCT <> 0) AND ((NOT
(.PCY_MTHLY_CUST.CTCT IS NULL )) AND (NOT
(.PCY_MTHLY_CUST.PCY_NBR IS NULL ))))"). Spool 5
and .PCY_MTHLY_CUST are joined using a product join,
with a join condition of (".PCY_MTHLY_CUST.ACCNT =
LVL6"). The input table .PCY_MTHLY_CUST will
not be cached in memory, but it is eligible for synchronized
scanning. The result goes into Spool 6 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 6 by row hash. The size of Spool
6 is estimated with no confidence to be 45,597 rows. The
estimated time for this step is 3 minutes and 17 seconds.
6) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a
RowHash match scan, which is joined to .PCY_PRSN_BRDG
by way of a RowHash match scan with a condition of (
"(.PCY_PRSN_BRDG.CTCT <> 0) AND
(((CAST((ISSUE_DT) AS DATE))<= DATE '2011-04-30') AND
((CAST((ISSUE_DT) AS DATE))>= DATE '2011-04-01'))"). Spool 6 and
.PCY_PRSN_BRDG are joined using a merge join, with a
join condition of ("(.PCY_PRSN_BRDG.CTCT =
CTCT) AND (.PCY_PRSN_BRDG.PCY_NBR =
PCY_NBR)"). The result goes into Spool 7 (all_amps)
(compressed columns allowed), which is redistributed by hash code
to all AMPs. Then we do a SORT to order Spool 7 by row hash. The
size of Spool 7 is estimated with no confidence to be 45,597 rows.
The estimated time for this step is 0.36 seconds.
7) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of
a RowHash match scan, which is joined to .ORG_B by
way of a RowHash match scan with a condition of (
".ORG_B.ROW <> 0") locking .ORG_B for
access. Spool 7 and .ORG_B are joined using a merge
join, with a join condition of ("(.ORG_B.ROW =
ACCNT) AND (LVL6 = .ORG_B.ROW)"). The
result goes into Spool 8 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 8 by row hash. The size of
Spool 8 is estimated with no confidence to be 45,597 rows.
The estimated time for this step is 0.35 seconds.
2) We do an all-AMPs RETRIEVE step from
.PCY_PRSN_BRDG by way of an all-rows scan with a
condition of ("((CAST((ISSUE_DT) AS DATE))<= DATE
'2011-03-31') AND ((CAST((ISSUE_DT) AS DATE))>= DATE
'1900-01-01')") locking for access into Spool 9 (all_amps),
which is redistributed by hash code to all AMPs. Then we do
a SORT to order Spool 9 by row hash and the sort key in spool
field1 eliminating duplicate rows. The size of Spool 9 is
estimated with no confidence to be 3,321,123 rows. The
estimated time for this step is 0.81 seconds.
8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 9 (Last Use) by way of an
all-rows scan. Spool 8 and Spool 9 are joined using an exclusion
merge join, with a join condition of ("CTCT = CTCT")
where unknown comparison will be ignored. The result goes into
Spool 10 (all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 10 by row
hash. The size of Spool 10 is estimated with no confidence to be
45,597 rows. The estimated time for this step is 0.00 seconds.
9) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a
RowHash match scan, which is joined to .TC_PRSN_D_CUST by
way of a RowHash match scan with no residual conditions. Spool 10
and .TC_PRSN_D_CUST are joined using a merge join, with a
join condition of ("(.TC_PRSN_D_CUST.ROW =
CTCT) AND (.TC_PRSN_D_CUST.ROW = CTCT)").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 45,597 rows. The estimated time for this step is 0.84
seconds.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 3 minutes and 20 seconds.

Thanks!!!!
6 REPLIES
Enthusiast

Re: Optimizing a qry

I also noticed that the product join is taking the most time. Any suggestions on how to avoid the product join?
Senior Apprentice

Re: Optimizing a qry

All but the first step show "no confidence", which means either there are no stats or a column is involved in a calculation.
But there's only one CAST, on ISSUE_DT (you can probably avoid it), and in your other posts you usually had lots of stats already :-)
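
If ISSUE_DT is actually stored as a DATE (an assumption on my side, the DDL for that column isn't shown), the CAST can simply be dropped and stats collected on the filter and join columns, roughly like this:

-- predicate without the CAST, so stats on ISSUE_DT can be used directly
AND a.ISSUE_DT BETWEEN DATE '2011-04-01' AND DATE '2011-04-30'

-- stats on the columns the optimizer has to estimate
COLLECT STATISTICS ON PCY_PRSN_BRDG COLUMN (ISSUE_DT);
COLLECT STATISTICS ON PCY_PRSN_BRDG COLUMN (CTCT);
COLLECT STATISTICS ON PCY_MTHLY_CUST COLUMN (ACCNT);
COLLECT STATISTICS ON ORG_A COLUMN (LVL6);
COLLECT STATISTICS ON ORG_A COLUMN (HIER_TYPE);
COLLECT STATISTICS ON ORG_B COLUMN (OU_NUM);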

What's the actual number of rows returned in step 4?
The optimizer estimates 300 rows for the duplicated spool, so if your system has e.g. 150 AMPs, that's really only two rows before duplication.

Always check DBQL steps where actual costs are much higher than estimated.

Dieter
Enthusiast

Re: Optimizing a qry

Thanks Dieter..I have a few more questions here:
So if I understand you correctly, whenever the first step has no confidence it means the stats are not collected on the table used in that step?

How do we check that the actual costs are higher? I do not see any column in the DBQLogTbl table that has anything similar to a cost.

So when I see a 'product join' / 'duplicated on all AMPs' in the explain plan, when should I worry about them and when should I ignore them?
Senior Apprentice

Re: Optimizing a qry

Whenever any step accesses a table and shows "no confidence", either there are no stats or the column is involved in a calculation.
Costs are CPU and I/O; in dbc.QryLogV there's only the actual AMPCPUTime and TotalIOCount.
But dbc.QryLogStepsV has EstCPUCost, EstProcTime and EstRowCount, too.
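
For example, something along these lines (just a sketch; the estimate columns are the ones named above, while the actual RowCount/CPUTime column names may differ slightly by release):

SELECT QueryID,
       StepLev1Num,
       StepName,
       EstRowCount,
       RowCount,        -- actual rows produced by the step
       EstProcTime,
       EstCPUCost,
       CPUTime          -- actual CPU seconds for the step
FROM dbc.QryLogStepsV
WHERE QueryID = 123456789012345   -- your query's QueryID from dbc.QryLogV
ORDER BY StepLev1Num;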

When the duplicated spool is small (maybe up to a few hundred rows multiplied by the number of AMPs in the system) and the estimate is correct (check QryLogStepsV), a product join is probably chosen by the optimizer because it's the most efficient way.

Dieter
Enthusiast

Re: Optimizing a qry

Thanks for the info! I am using TD13. I tried looking up QryLogSteps by the queryid found in the dbqlogtbl table but could not find it. So I tried the following query and it did return some rows, but the queryid in the two tables looks different. I was expecting the queryid to be the same across all dbc tables, shouldn't it be?
sel * from dbc.QryLogSteps a
join dbc.dbqlogtbl b
on a.ProcId = b.ProcId
where a.procid = 163
and b.queryid in (1234,12345)

Thanks!
Senior Apprentice

Re: Optimizing a qry

Your query doesn't look correct; you should join on QueryID only.
Maybe step info is not enabled for your user.

Or the QueryID is not displayed correctly in SQL Assistant.
Compare the QueryID and a TRIM(QueryID) to see if they are equal.
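
Something like this (just a sketch, the IDs are the placeholders from your post):

-- join the steps to the query log on QueryID only
SELECT s.*
FROM dbc.QryLogStepsV s
JOIN dbc.QryLogV q
  ON s.QueryID = q.QueryID
WHERE q.QueryID IN (1234, 12345);

-- and to check the display issue in SQL Assistant
SELECT QueryID, TRIM(QueryID) AS trimmed_id
FROM dbc.QryLogV
WHERE QueryID IN (1234, 12345);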

Dieter