OPTIMIZED QUERY


Hi!

Can this query be optimized?
SELECT DISTINCT P.ID, 'VET', P.SID, 'L118', 6.0000, 'NONE', '42', 256, 7
FROM (
    SELECT DISTINCT T.ID, T.SID
    FROM (
        SELECT T.ID, CAST(T.EID AS VARCHAR(40)) AS SID
        FROM table1 T
        INNER JOIN table2 T0
            ON T.EID = T0.EID AND T0.STATUS_CD <> 'RET'
        INNER JOIN table3 T1
            ON T.ID = T1.ID AND T1.RID = 92
    ) T
    LEFT JOIN (
        SELECT DISTINCT ID, FIELD_TXT, SID
        FROM table4
        WHERE FLD_NAME = 'RSN' AND TYPE = 'VET'
    ) A1
        ON T.ID = A1.ID AND T.SID = A1.SID
        AND A1.FIELD_TXT IN ('ABCD', 'DEF')
    LEFT JOIN (
        SELECT DISTINCT ID, FIELD_TXT, SID
        FROM table4
        WHERE FLD_NAME = 'FROM' AND TYPE = 'VET'
    ) A2
        ON T.ID = A2.ID AND T.SID = A2.SID
        AND A2.FIELD_TXT = 'B'
    LEFT JOIN (
        SELECT DISTINCT ID, FIELD_TXT, SID
        FROM table4
        WHERE FLD_NAME = 'AMT' AND TYPE = 'VET'
    ) A3
        ON T.ID = A3.ID AND T.SID = A3.SID
        AND CAST(A3.FIELD_TXT AS DECIMAL(15,2)) >= 50000
    LEFT JOIN (
        SELECT DISTINCT ID, FIELD_TXT, SID
        FROM table4
        WHERE FLD_NAME = 'AMT' AND TYPE = 'VET'
    ) A4
        ON T.ID = A4.ID AND T.SID = A4.SID
        AND CAST(A4.FIELD_TXT AS DECIMAL(15,2)) <= 30000
    WHERE A1.ID IS NOT NULL
      AND A2.ID IS NOT NULL
      AND A3.ID IS NOT NULL
      AND A4.ID IS NOT NULL
) P

Thanks!
5 REPLIES

Re: OPTIMIZED QUERY

Of course it can be optimized.

I don't know what you actually want to achieve, but those LEFT JOINs are pointless: your WHERE clause rejects NULLs on all four of them, so they behave exactly like INNER JOINs.
If you check EXPLAIN you'll see that the optimizer replaces them with INNER JOINs.

You access table4 four times in your query; this might be replaced by a single access, but performance depends on the PI of table4 and the number/percentage of rows returned.
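A single pass could, for instance, use conditional aggregation. This is just a sketch built from the column names and thresholds in the posted query; whether it actually wins depends on the PI and row counts, as noted above:

```sql
-- One scan of table4 instead of four: flag each condition per (ID, SID)
-- group and keep only the groups that satisfy all of them.
-- Assumes FIELD_TXT is numeric for FLD_NAME = 'AMT' rows.
SELECT ID, SID
FROM table4
WHERE TYPE = 'VET'
  AND FLD_NAME IN ('RSN', 'FROM', 'AMT')
GROUP BY ID, SID
HAVING MAX(CASE WHEN FLD_NAME = 'RSN'
                 AND FIELD_TXT IN ('ABCD','DEF') THEN 1 ELSE 0 END) = 1
   AND MAX(CASE WHEN FLD_NAME = 'FROM'
                 AND FIELD_TXT = 'B' THEN 1 ELSE 0 END) = 1
   AND MAX(CASE WHEN FLD_NAME = 'AMT'
                 AND CAST(FIELD_TXT AS DECIMAL(15,2)) >= 50000 THEN 1 ELSE 0 END) = 1
   AND MAX(CASE WHEN FLD_NAME = 'AMT'
                 AND CAST(FIELD_TXT AS DECIMAL(15,2)) <= 30000 THEN 1 ELSE 0 END) = 1
```

This result could then be joined once to the table1/table2/table3 derived table instead of the four separate table4 accesses.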

Are all those DISTINCTs actually necessary?

Changing joins to EXISTS might help, too.
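For example, keeping the inner joins and moving the table4 checks into EXISTS predicates might look like this. A sketch, assuming the WHERE clause really is meant to keep only rows matching all four conditions; only one predicate is written out, the other three are analogous:

```sql
SELECT DISTINCT T.ID, CAST(T.EID AS VARCHAR(40)) AS SID
FROM table1 T
INNER JOIN table2 T0
    ON T.EID = T0.EID AND T0.STATUS_CD <> 'RET'
INNER JOIN table3 T1
    ON T.ID = T1.ID AND T1.RID = 92
WHERE EXISTS (
    SELECT 1
    FROM table4 A1
    WHERE A1.ID = T.ID
      AND A1.SID = CAST(T.EID AS VARCHAR(40))
      AND A1.FLD_NAME = 'RSN'
      AND A1.TYPE = 'VET'
      AND A1.FIELD_TXT IN ('ABCD','DEF')
)
-- plus three analogous EXISTS predicates for FLD_NAME = 'FROM'
-- (FIELD_TXT = 'B') and the two 'AMT' threshold checks
```

EXISTS also removes the need for the DISTINCTs in the table4 derived tables, since duplicate attribute rows no longer multiply the result.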

Could you post the DDL (at least the PIs) and the Explain?

Dieter

Re: OPTIMIZED QUERY

These are the indexes and the Explain.
table1 - CONSTRAINT PK_E PRIMARY KEY (ID, ID)
table2 - CONSTRAINT PK_E PRIMARY KEY (ID)
table3 - PRIMARY INDEX (ID, RID);
table4 - PRIMARY INDEX (ID, FLD_NAME, SID, TYPE);

Explanation
1) First, we lock a distinct ."pseudo table" for read on a
RowHash to prevent global deadlock for .TABLE4.
2) Next, we lock a distinct ."pseudo table" for read on a
RowHash to prevent global deadlock for .T1.
3) We lock a distinct ."pseudo table" for read on a RowHash
to prevent global deadlock for .T.
4) We lock a distinct ."pseudo table" for read on a RowHash
to prevent global deadlock for .T0.
5) We lock .TABLE4 for read, we lock .T1 for
read, we lock .T for read, and we lock .T0 for
read.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from .TABLE4
by way of an all-rows scan with a condition of (
"(.TABLE4.FLD_NAME = 'RSN') AND
((.TABLE4.TYPE = 'VET') AND
(((.TABLE4.FIELD_TXT = 'ABCD') OR
((.TABLE4.FIELD_TXT = 'DEF'))) AND
(NOT (.TABLE4.ID IS NULL ))))") into Spool
4 (used to materialize view, derived table or table function
A1) (all_amps), which is redistributed by the hash code of (
.TABLE4.SID,
.TABLE4.FIELD_TXT, .TABLE4.ID)
to all AMPs. Then we do a SORT to order Spool 4 by the sort
key in spool field1 eliminating duplicate rows. The size of
Spool 4 is estimated with low confidence to be 134,855 rows (
198,506,560 bytes). The estimated time for this step is 0.13
seconds.
2) We do an all-AMPs RETRIEVE step from .TABLE4
by way of an all-rows scan with a condition of (
"(.TABLE4.FLD_NAME = 'FROM') AND ((NOT
(.TABLE4.ID IS NULL )) AND
((.TABLE4.TYPE = 'VET') AND
(.TABLE4.FIELD_TXT = 'B')))") into Spool 3
(used to materialize view, derived table or table function A2)
(all_amps), which is redistributed by the hash code of (
.TABLE4.SID,
.TABLE4.FIELD_TXT, .TABLE4.ID)
to all AMPs. Then we do a SORT to order Spool 3 by the sort
key in spool field1 eliminating duplicate rows. The size of
Spool 3 is estimated with low confidence to be 383,129 rows (
563,965,888 bytes). The estimated time for this step is 0.17
seconds.
3) We do an all-AMPs RETRIEVE step from .TABLE4
by way of an all-rows scan with a condition of ("(NOT
(.TABLE4.ID IS NULL )) AND
((.TABLE4.FLD_NAME = 'AMT') AND
(.TABLE4.TYPE = 'VET'))") into Spool 2
(used to materialize view, derived table or table function A3)
(all_amps), which is redistributed by the hash code of (
.TABLE4.SID,
.TABLE4.FIELD_TXT, .TABLE4.ID)
to all AMPs. Then we do a SORT to order Spool 2 by the sort
key in spool field1 eliminating duplicate rows. The size of
Spool 2 is estimated with low confidence to be 305,776 rows (
450,102,272 bytes). The estimated time for this step is 0.15
seconds.
7) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
way of an all-rows scan with a condition of ("((A1.FIELD_TXT =
'ABCD') OR ((A1.FIELD_TXT = 'DEF'))) AND (NOT (A1.ID IS NULL ))") into
Spool 6 (all_amps), which is redistributed by the hash code
of (.TABLE4.ID, 92) to all AMPs. Then we
do a SORT to order Spool 6 by row hash. The size of Spool 6
is estimated with low confidence to be 134,855 rows (
6,203,330 bytes). The estimated time for this step is 0.14
seconds.
2) We do an all-AMPs RETRIEVE step from .T by way of an
all-rows scan with no residual conditions into Spool 7
(all_amps), which is redistributed by the hash code of (
.T.ID, .T.ID, TRIM(BOTH FROM
.T.EID (VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40), CHARACTER SET
LATIN, NOT CASESPECIFIC)(VARCHAR(80), CHARACTER SET LATIN,
NOT CASESPECIFIC)) to all AMPs. Then we do a SORT to order
Spool 7 by row hash. The size of Spool 7 is estimated with
high confidence to be 3,675,051 rows (80,851,122 bytes). The
estimated time for this step is 0.43 seconds.
8) We do an all-AMPs JOIN step from .T1 by way of a RowHash
match scan with a condition of (".T1.RID = 92"), which
is joined to Spool 6 (Last Use) by way of a RowHash match scan.
.T1 and Spool 6 are joined using a merge join, with a join
condition of (".T1.ID = ID"). The result goes into
Spool 8 (all_amps), which is redistributed by the hash code of (
.T1.ID, .TABLE4.ID,
.TABLE4.SID) to all AMPs. Then we do a SORT to
order Spool 8 by row hash. The size of Spool 8 is estimated with
low confidence to be 4,907 rows (240,443 bytes). The estimated
time for this step is 0.03 seconds.
9) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of
a RowHash match scan, which is joined to Spool 8 (Last Use)
by way of a RowHash match scan. Spool 7 and Spool 8 are
joined using a merge join, with a join condition of ("(ID
= ID) AND (((TRIM(BOTH FROM {LeftTable}.EID
(VARCHAR(40), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'-(10)9'))(VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC))= SID) AND (ID = ID ))"). The result
goes into Spool 9 (all_amps), which is redistributed by the
hash code of (.T.VET_ID) to all AMPs. Then we do a
SORT to order Spool 9 by row hash. The size of Spool 9 is
estimated with low confidence to be 4,907 rows (279,699
bytes). The estimated time for this step is 0.03 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 2 by way of an
all-rows scan with a condition of ("((TRANSLATE((A4.FIELD_TXT
)USING LATIN_TO_UNICODE)(DECIMAL(15,2), FORMAT
'--------------.99'))<= 30000.00) AND (NOT (A4.ID IS
NULL ))") into Spool 10 (all_amps), which is redistributed by
the hash code of (.TABLE4.ID,
.TABLE4.SID, .TABLE4.ID,
.TABLE4.ID) to all AMPs. Then we do a SORT
to order Spool 10 by row hash. The size of Spool 10 is
estimated with low confidence to be 305,776 rows (14,065,696
bytes). The estimated time for this step is 0.05 seconds.
10) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from .T0 by way of a
RowHash match scan with a condition of (
".T0.STATUS_CD <> 'RET'"), which is joined to
Spool 9 (Last Use) by way of a RowHash match scan.
.T0 and Spool 9 are joined using a merge join, with a
join condition of ("EID = .T0.EID"). The
result goes into Spool 11 (all_amps), which is redistributed
by the hash code of (.T1.ID,
.TABLE4.ID, .T.ID, TRIM(BOTH FROM
.T.EID (VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40), CHARACTER SET
LATIN, NOT CASESPECIFIC)(VARCHAR(80), CHARACTER SET LATIN, NOT
CASESPECIFIC)) to all AMPs. Then we do a SORT to order Spool
11 by row hash. The size of Spool 11 is estimated with low
confidence to be 4,907 rows (147,210 bytes). The estimated
time for this step is 0.04 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way
of an all-rows scan with a condition of ("(A2.FIELD_TXT = 'B')
AND (NOT (A2.ID IS NULL ))") into Spool 12 (all_amps),
which is redistributed by the hash code of (
.TABLE4.ID, .TABLE4.SID,
.TABLE4.ID, .TABLE4.ID,
.TABLE4.ID) to all AMPs. Then we do a SORT
to order Spool 12 by row hash. The size of Spool 12 is
estimated with low confidence to be 383,129 rows (17,623,934
bytes). The estimated time for this step is 0.06 seconds.
11) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of
a RowHash match scan, which is joined to Spool 11 (Last Use)
by way of a RowHash match scan. Spool 10 and Spool 11 are
joined using a merge join, with a join condition of (
"((TRIM(BOTH FROM {RightTable}.EID (VARCHAR(40),
CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'-(10)9'))(VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC))= SID) AND ((ID = ID) AND ((ID =
ID) AND (ID = ID )))"). The result goes into Spool
13 (all_amps), which is redistributed by the hash code of (
.TABLE4.ID, .T.ID,
.TABLE4.ID, .T1.ID, TRIM(BOTH
FROM .T.EID (VARCHAR(40), CHARACTER SET LATIN,
NOT CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40), CHARACTER SET
LATIN, NOT CASESPECIFIC)(VARCHAR(80), CHARACTER SET LATIN, NOT
CASESPECIFIC)) to all AMPs. Then we do a SORT to order Spool
13 by row hash. The size of Spool 13 is estimated with low
confidence to be 4,910 rows (166,940 bytes). The estimated
time for this step is 0.03 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way
of an all-rows scan with a condition of (
"((TRANSLATE((A3.FIELD_TXT )USING
LATIN_TO_UNICODE)(DECIMAL(15,2), FORMAT
'--------------.99'))>= 50000.00) AND (NOT (A3.ID IS NULL
))") into Spool 14 (all_amps), which is redistributed by the
hash code of (.TABLE4.ID,
.TABLE4.SID, .TABLE4.ID,
.TABLE4.ID, .TABLE4.ID,
.TABLE4.ID) to all AMPs. Then we do a SORT
to order Spool 14 by row hash. The size of Spool 14 is
estimated with low confidence to be 305,776 rows (14,065,696
bytes). The estimated time for this step is 0.05 seconds.
12) We do an all-AMPs JOIN step from Spool 12 (Last Use) by way of a
RowHash match scan, which is joined to Spool 13 (Last Use) by way
of a RowHash match scan. Spool 12 and Spool 13 are joined using a
merge join, with a join condition of ("(ID = ID) AND
(((TRIM(BOTH FROM {RightTable}.EID (VARCHAR(40), CHARACTER
SET LATIN, NOT CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40),
CHARACTER SET LATIN, NOT CASESPECIFIC))= SID) AND ((ID =
ID) AND ((ID = ID) AND(ID = ID ))))"). The result
goes into Spool 15 (all_amps), which is redistributed by the hash
code of (.TABLE4.ID, .T1.ID,
.TABLE4.ID, .T.ID,
.TABLE4.ID, TRIM(BOTH FROM .T.EID
(VARCHAR(40), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'-(10)9'))(VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC)(VARCHAR(80), CHARACTER SET LATIN, NOT CASESPECIFIC))
to all AMPs. Then we do a SORT to order Spool 15 by row hash.
The size of Spool 15 is estimated with low confidence to be 4,910
rows (186,580 bytes). The estimated time for this step is 0.03
seconds.
13) We do an all-AMPs JOIN step from Spool 14 (Last Use) by way of a
RowHash match scan, which is joined to Spool 15 (Last Use) by way
of a RowHash match scan. Spool 14 and Spool 15 are joined using a
merge join, with a join condition of ("(ID = ID) AND ((ID
= ID) AND ((ID = ID) AND ((ID = ID) AND
(((TRIM(BOTH FROM {RightTable}.EID (VARCHAR(40), CHARACTER
SET LATIN, NOT CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40),
CHARACTER SET LATIN, NOT CASESPECIFIC))= SID) AND (ID =
ID )))))"). The result goes into Spool 5 (used to materialize
view, derived table or table function P) (all_amps), which is
redistributed by the hash code of (TRIM(BOTH FROM
.T.EID (VARCHAR(40), CHARACTER SET LATIN, NOT
CASESPECIFIC, FORMAT '-(10)9'))(VARCHAR(40), CHARACTER SET LATIN,
NOT CASESPECIFIC), .T.ID) to all AMPs. Then we do a
SORT to order Spool 5 by the sort key in spool field1 eliminating
duplicate rows. The size of Spool 5 is estimated with low
confidence to be 4,907 rows (402,374 bytes). The estimated time
for this step is 0.04 seconds.
14) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 16 (group_amps), which is
redistributed by the hash code of (4, 357, '92', 'NONE', 5.0000,
'CMB118', SID, 'VET', .T.ID) to all AMPs. Then
we do a SORT to order Spool 16 by the sort key in spool field1
eliminating duplicate rows. The size of Spool 16 is estimated
with low confidence to be 4,907 rows (819,469 bytes). The
estimated time for this step is 0.03 seconds.
15) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 16 are sent back to the user as the result
of statement 1. The total estimated time is 0.89 seconds.

Re: OPTIMIZED QUERY

Dieter, can you help me here, please?

Re: OPTIMIZED QUERY

Dieter is right - you can change the table4 accesses to EXISTS, but I'm not sure this helps your performance.
Does this query skew a lot?
Anyway, it should make sense to collect stats on all join columns, especially on table1.EID and table2.EID.
As already mentioned, try GROUP BY instead of DISTINCT.
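The DISTINCT-to-GROUP-BY swap itself is mechanical; for instance, on one of the table4 derived tables (a sketch, column names taken from the posted query):

```sql
-- Returns the same rows as
--   SELECT DISTINCT ID, FIELD_TXT, SID
--   FROM table4 WHERE FLD_NAME = 'RSN' AND TYPE = 'VET'
-- but lets the optimizer use aggregation instead of a DISTINCT sort,
-- which can spool less on skewed data.
SELECT ID, FIELD_TXT, SID
FROM table4
WHERE FLD_NAME = 'RSN' AND TYPE = 'VET'
GROUP BY ID, FIELD_TXT, SID
```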

Re: OPTIMIZED QUERY

Why did you write those Left Joins?
Does this query actually return the correct answer?

Most DISTINCTs are probably not necessary; instead of rewriting them, you should simply remove them.

What's the row count for each table?
How many rows are returned by that query?
What stats exist?

Did you check DBQL data for your query?

CAST(T.EID AS VARCHAR(40) ) SID:
why do you join on different datatypes?

Changing the PI of table4 to (ID,SID) would result in AMP-local joins.
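If the table can be rebuilt, that PI change might be sketched as follows (hypothetical table name; CREATE TABLE ... AS ... WITH NO DATA syntax assumed to be available on your release):

```sql
-- Copy table4's definition with a PI matching the join columns,
-- so joins on (ID, SID) become AMP-local instead of redistributing spools.
CREATE TABLE table4_new AS table4 WITH NO DATA
PRIMARY INDEX (ID, SID);

INSERT INTO table4_new
SELECT * FROM table4;
```

Afterwards the old table would be dropped and the new one renamed; remember to re-collect statistics on the new table.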

And table4 seems to store different data types; it looks like an Entity-Attribute-Value table. Of course you'll get complex queries with a model like that.

Dieter