Need help in Optimizing query!

Enthusiast

Hi all!

I want to optimize the query given below. There are 50 such queries. They are auto-generated and depend on the user interface,
so if the user logic changes these queries can be updated, deleted, or added. As of today there are 50 of them. Each query is joined
with the 'BIG' table, which has 2 million records. Also, this part of the query (SELECT DISTINCT FLD1, NEWFLD, FLD3 FROM DB.RUN2 WHERE
FIELD_NM='ABD' AND FLD2='EFG') is repeated in all 50 queries with just the value of FIELD_NM changing. I tried collecting all the FIELD_NM
values from the 50 queries and running this query once, but it takes almost the same time as running them separately.
Can you please help me optimize it?

INSERT INTO DB.MAIN
(FLD1,FLD2,FLD3,FLD4,FLD5,FLD6,FLD7,FLD8,FLD9)
SELECT DISTINCT P.FLD1, 'HARDCODE1', P.FLD3, 'HARDCODE2', '1.0000', 'No', '1', 123, 4
FROM (
SELECT DISTINCT T.FLD1, T.FLD3
FROM (SELECT T.FLD1, FLD3
FROM db.big T -----> BIG TABLE WITH 2 MILLION RECORDS
INNER JOIN DB.RUN T1 -----> Table which has the scope of records for today's run.
ON T.FLD1 = T1.FLD1 AND T1.LGC = 1 ) T ------> Lgc will change from 1 to 50 depending on the no. of logics.
LEFT JOIN (SELECT DISTINCT FLD1, NEWFLD, FLD3
FROM DB.RUN2
WHERE FIELD_NM='ABD' AND FLD2='EFG' ---> FIELD_NM subject to change but fld2 has only 2 values.
) A1
ON T.FLD1 = A1.FLD1 AND T.FLD3 = A1.FLD3 ----> Join condition for all 50 queries.
AND A1.NEWFLD IN ('CRU1','CRU2') ----> NEWFLD values subject to change in each query.
LEFT JOIN (SELECT DISTINCT FLD1, NEWFLD, FLD3
FROM DB.RUN2
WHERE FIELD_NM='DEF' AND FLD2='EFG' ---> FIELD_NM subject to change but fld2 has only 2 values.
) A2
ON T.FLD1 = A2.FLD1 AND T.FLD3 = A2.FLD3 ---> Join condition for all 50 queries.
AND CAST(A2.NEWFLD AS INT) > 0
WHERE (A1.FLD1 IS NOT NULL AND A2.FLD1 IS NOT NULL) ----> where condition can change for each of the 50 queries.
) P ;
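Since the DB.RUN2 subquery is repeated across all 50 queries, one option is to materialize it once per run into a volatile table and have every generated query join to that instead of re-scanning DB.RUN2. A minimal sketch, assuming a Teradata session and using the table/column names from the query above (the table name RUN2_SLICE is made up for illustration):

```sql
-- Sketch: materialize the repeated DB.RUN2 subquery once per session.
-- Keep FIELD_NM as a column so each of the 50 generated queries can
-- filter on its own value instead of re-scanning DB.RUN2.
CREATE VOLATILE TABLE RUN2_SLICE AS (
    SELECT DISTINCT FIELD_NM, FLD1, NEWFLD, FLD3
    FROM DB.RUN2
    WHERE FIELD_NM IN ('ABD', 'DEF' /* ...the other FIELD_NM values... */)
      AND FLD2 = 'EFG'
) WITH DATA
PRIMARY INDEX (FLD1)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON RUN2_SLICE COLUMN (FLD1, FIELD_NM);
```

Each generated query would then use e.g. `LEFT JOIN RUN2_SLICE A1 ON T.FLD1 = A1.FLD1 AND T.FLD3 = A1.FLD3 AND A1.FIELD_NM = 'ABD'`. Whether this actually beats 50 separate scans depends on how selective the FIELD_NM filters are, so treat it as something to test, not a guaranteed win.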

Thanks!
4 REPLIES
Enthusiast

Re: Need help in Optimizing query!

Which tool generated this?

Can you run an explain so we can see what the optimiser does with it?

Also which version of Teradata are you running?
Enthusiast

Re: Need help in Optimizing query!

Also, you may find that join indexes are required to support this query; it may be worth reading up on them.
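For example, a single-table join index covering the repeated DB.RUN2 access might look like the sketch below. The column choices are guesses based on the join conditions in the original post, and a join index costs PERM space and maintenance overhead on updates to DB.RUN2, so check with your DBA:

```sql
-- Sketch: cover the repeated (FIELD_NM, FLD2) filter and the
-- (FLD1, FLD3) join columns so the optimizer can answer the
-- subquery from the index instead of the base table.
CREATE JOIN INDEX DB.RUN2_JI AS
    SELECT FIELD_NM, FLD2, FLD1, NEWFLD, FLD3
    FROM DB.RUN2
PRIMARY INDEX (FLD1);
```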

Random
Enthusiast

Re: Need help in Optimizing query!

This is generated by a user interface. I am not aware of the Teradata version.

I had to change the query a bit. I just found out that this part of the query is constant throughout all the 50 queries:
'SELECT DISTINCT T.FLD2
FROM (SELECT T.FLD2
FROM DB.TABLE1 T
INNER JOIN DB.TABLE2 T0
ON T.EVT_ID = T0.EVT_ID AND T0.STATUS_CD <> 'RETIRED'
INNER JOIN DB.TABLE3 T1
ON T.FLD2 = T1.FLD2 AND'

Given below are the actual query and the EXPLAIN.

INSERT INTO DB.MAIN
(FLD1,FLD2,FLD3,FLD4,FLD5,FLD6 ,FLD7)
SELECT DISTINCT 'ta01', P.FLD2, 'Evt',1,2 ,1,239
FROM (
SELECT DISTINCT T.FLD2
FROM (SELECT T.FLD2
FROM DB.TABLE1 T
INNER JOIN DB.TABLE2 T0
ON T.EVT_ID = T0.EVT_ID AND T0.STATUS_CD <> 'RETIRED'
INNER JOIN DB.TABLE3 T1
ON T.FLD2 = T1.FLD2 AND T1.FLD6 = 1 ) T
LEFT JOIN (SELECT DISTINCT FLD2, FIELD_TX ---------------> This is for selecting the attribute 'def'.
FROM DB.TABLE4
WHERE FIELD_NM='def' AND FLD3='Event'
) A1
ON T.FLD2 = A1.FLD2
AND A1.FIELD_TX IN ('abc','abcd') -----------> This is for selecting the values of attribute 'def'.
WHERE A1.FLD2 IS NOT NULL) P;

There can be multiple such left joins depending on how many attributes needs to be checked in that particular query.

1) First, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.TABLE4.
2) Next, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.T.
3) We lock a distinct DB."pseudo table" for read on a RowHash
to prevent global deadlock for DB.T0.
4) We lock a distinct DB."pseudo table" for write on a RowHash
to prevent global deadlock for DB.MAIN.
5) We lock a distinct DB."pseudo table" for read on a RowHash
to prevent global deadlock for DB.T1.
6) We lock DB.TABLE4 for read, we lock DB.T for
read, we lock DB.T0 for read, we lock
DB.MAIN for write, and we lock DB.T1
for read.
7) We do an all-AMPs RETRIEVE step from DB.T by way of an
all-rows scan with no residual conditions into Spool 4 (all_amps),
which is redistributed by hash code to all AMPs. The size of
Spool 4 is estimated with high confidence to be 1,724,839 rows.
The estimated time for this step is 0.90 seconds.
8) We do an all-AMPs JOIN step from DB.T0 by way of an all-rows
scan with a condition of ("DB.T0.STATUS_CD <> 'RETIRED'"),
which is joined to Spool 4 (Last Use) by way of an all-rows scan.
DB.T0 and Spool 4 are joined using a single partition hash
join, with a join condition of ("EVT_ID = DB.T0.EVT_ID").
The result goes into Spool 5 (all_amps), which is redistributed by
hash code to all AMPs. Then we do a SORT to order Spool 5 by row
hash. The size of Spool 5 is estimated with no confidence to be
1,552,356 rows. The estimated time for this step is 0.96 seconds.
9) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from DB.T1 by way of a
RowHash match scan with a condition of ("DB.T1.FLD6 =
1"), which is joined to Spool 5 (Last Use) by way of a
RowHash match scan. DB.T1 and Spool 5 are joined
using a merge join, with a join condition of ("FLD2 =
DB.T1.FLD2"). The result goes into Spool 2
(all_amps), which is built locally on the AMPs. The size of
Spool 2 is estimated with no confidence to be 3,911,980 rows.
The estimated time for this step is 0.07 seconds.
2) We do an all-AMPs RETRIEVE step from DB.TABLE4
by way of an all-rows scan with a condition of (
"(DB.TABLE4.FLD3 = 'Event') AND
(DB.TABLE4.FIELD_NM = 'def')") into
Spool 1 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 1 by the sort key
in spool field1 eliminating duplicate rows. The size of
Spool 1 is estimated with no confidence to be 583,709 rows.
The estimated time for this step is 0.40 seconds.
10) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way
of an all-rows scan with a condition of ("(NOT (FLD2 IS NULL
)) AND ((NOT (SUBJ_ID IS NULL )) AND (FIELD_TX =
'abc'))") into Spool 6 (all_amps), which is
redistributed by hash code to all AMPs. Then we do a SORT to
order Spool 6 by row hash. The size of Spool 6 is estimated
with no confidence to be 583,709 rows. The estimated time for
this step is 0.32 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way
of an all-rows scan into Spool 7 (all_amps), which is
redistributed by hash code to all AMPs. Then we do a SORT to
order Spool 7 by row hash. The size of Spool 7 is estimated
with no confidence to be 3,911,980 rows. The estimated time
for this step is 2.11 seconds.
11) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 6 and Spool 7 are joined using a
merge join, with a join condition of ("(FLD2 = FLD2) AND
(SUBJ_ID = SUBJ_ID)"). The result goes into Spool 3 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 3 by the sort key in spool field1 eliminating
duplicate rows. The size of Spool 3 is estimated with no
confidence to be 1,154,502,035 rows. The estimated time for this
step is 1 minute and 29 seconds.
12) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 8 (all_amps), which is redistributed
by hash code to all AMPs. The size of Spool 8 is estimated with
no confidence to be 1,154,502,035 rows. The estimated time for
this step is 19.61 seconds.
13) We do a SORT to order Spool 8 by row hash and eliminate duplicate
rows.
14) We do an all-AMPs MERGE into DB.MAIN from
Spool 8 (Last Use).
15) We spoil the parser's dictionary cache for the table.
16) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
Junior Contributor

Re: Need help in Optimizing query!

If this is the actual query, it's quite stupid. What kind of "user interface" created it?

- a DISTINCT on an already DISTINCT result set
- an OUTER join which is actually an INNER join, but at least the optimizer is smart enough to recognize that :-)

This should return the same result set:
SELECT DISTINCT 'ta01', T.FLD2, 'Evt',1,2 ,1,239
FROM DB.TABLE1 T
INNER JOIN DB.TABLE2 T0
ON T.EVT_ID = T0.EVT_ID AND T0.STATUS_CD <> 'RETIRED'
INNER JOIN DB.TABLE3 T1
ON T.FLD2 = T1.FLD2 AND T1.FLD6 = 1
INNER JOIN DB.TABLE4 A1
ON T.FLD2 = A1.FLD2
WHERE A1.FIELD_NM='def' AND A1.FLD3='Event'
AND A1.FIELD_TX IN ('abc','abcd')

Are those DISTINCTs really necessary?

Depending on the number of rows per value, changing DISTINCT to GROUP BY might help.
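A sketch of that rewrite, assuming the LEFT JOIN can be flattened to an INNER JOIN as described above (same table/column names as in the post):

```sql
-- Sketch: DISTINCT replaced by GROUP BY. On Teradata, GROUP BY can
-- aggregate locally on each AMP before redistribution, which is often
-- cheaper than DISTINCT when there are many duplicate rows per value.
SELECT 'ta01', T.FLD2, 'Evt', 1, 2, 1, 239
FROM DB.TABLE1 T
INNER JOIN DB.TABLE2 T0
  ON T.EVT_ID = T0.EVT_ID AND T0.STATUS_CD <> 'RETIRED'
INNER JOIN DB.TABLE3 T1
  ON T.FLD2 = T1.FLD2 AND T1.FLD6 = 1
INNER JOIN DB.TABLE4 A1
  ON T.FLD2 = A1.FLD2
WHERE A1.FIELD_NM = 'def' AND A1.FLD3 = 'Event'
  AND A1.FIELD_TX IN ('abc','abcd')
GROUP BY T.FLD2;
```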

There's a lot of NO CONFIDENCE in that EXPLAIN; did you check if all the necessary stats have been collected?
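As a starting point, something like the following; the column choices are assumptions taken from the join and WHERE conditions visible in the EXPLAIN, so adjust them to your actual tables:

```sql
-- Sketch: stats on the columns the optimizer estimated with
-- "no confidence" in the EXPLAIN above.
COLLECT STATISTICS ON DB.TABLE4 COLUMN (FIELD_NM, FLD3);
COLLECT STATISTICS ON DB.TABLE4 COLUMN (FLD2);
COLLECT STATISTICS ON DB.TABLE3 COLUMN (FLD6);
COLLECT STATISTICS ON DB.TABLE2 COLUMN (STATUS_CD);
COLLECT STATISTICS ON DB.TABLE1 COLUMN (FLD2);
```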

NUSIs might help for the base-table access; can you post some info about the cardinality of the columns used in the WHERE conditions?
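For example (a sketch only; whether the optimizer actually uses a NUSI depends on the cardinality info requested above):

```sql
-- Sketch: a NUSI on the two WHERE columns of DB.TABLE4, so the
-- FIELD_NM/FLD3 filter can be satisfied via the index.
CREATE INDEX (FIELD_NM, FLD3) ON DB.TABLE4;
```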

Dieter