SQL Problem

Enthusiast

SQL Problem

I execute the following SQL-1 statement. The execution time is 55 minutes, even though the result is only about 70,000 records. I then modified SQL-1 into SQL-2; the only change is that SQL-2 first excludes the null values in RD_EXCHANGE_AVG_RATE_M200610 (a 40-record table). The execution time drops to 5 seconds.
Why? I think the V2R5 p**** engine may have some problem. Can anyone help? Although I can work around it by modifying the SQL, I still think this is a serious bug.

SQL-1:
SELECT TXN.ACCT_NBR || TXN.CURRENCY_CODE AS ACCT_NBR
, SUM(CASE
WHEN SUBSTR( TXN.TXN_CODE , 1, 2 ) IN ( '17' , '07 ') THEN
(CASE
WHEN TXN.REVERSE_TXN_IND = 'Y' THEN -1
ELSE 1
END
)
ELSE 0 END) AS CREDIT_CNT
FROM DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 TXN
LEFT JOIN DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 DEP
ON TXN.ACCT_NBR = DEP.ACCT_NBR
AND TXN.CURRENCY_CODE = DEP.CURRENCY_CODE
LEFT JOIN DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 RD
ON TXN.CURRENCY_CODE = RD.CURRENCY_CODE
GROUP BY TXN.ACCT_NBR || TXN.CURRENCY_CODE;

SQL-2:
SELECT TXN.ACCT_NBR || TXN.CURRENCY_CODE AS ACCT_NBR
, SUM(CASE
WHEN SUBSTR( TXN.TXN_CODE , 1, 2 ) IN ( '17' , '07 ') THEN
(CASE
WHEN TXN.REVERSE_TXN_IND = 'Y' THEN -1
ELSE 1
END
)
ELSE 0 END) AS CREDIT_CNT
FROM DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 TXN
LEFT JOIN DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 DEP
ON TXN.ACCT_NBR = DEP.ACCT_NBR
AND TXN.CURRENCY_CODE = DEP.CURRENCY_CODE
LEFT JOIN
(sel * from DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 where currency is not null)RD
ON TXN.CURRENCY_CODE = RD.CURRENCY_CODE
GROUP BY TXN.ACCT_NBR || TXN.CURRENCY_CODE;

8 REPLIES
Enthusiast

Re: SQL Problem

Hi Chung,

Can you please check the EXPLAIN of both queries and verify whether the plan chosen is similar in both cases?

I believe the second query might have chosen a different plan with a shorter runtime.
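If it helps, you can get the plan without actually running the statement by prefixing it with EXPLAIN. A minimal sketch using the tables from SQL-1 (the full SQL-1 and SQL-2 would be prefixed in exactly the same way):

EXPLAIN
SELECT TXN.ACCT_NBR, TXN.CURRENCY_CODE
FROM DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 TXN
LEFT JOIN DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 RD
ON TXN.CURRENCY_CODE = RD.CURRENCY_CODE;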

Enthusiast

Re: SQL Problem

SQL-1:
SELECT TXN.ACCT_NBR || TXN.CURRENCY_CODE AS ACCT_NBR
, SUM(CASE
WHEN SUBSTR( TXN.TXN_CODE , 1, 2 ) IN ( '17' , '07 ') THEN
(CASE
WHEN TXN.REVERSE_TXN_IND = 'Y' THEN -1
ELSE 1
END
)
ELSE 0 END) AS CREDIT_CNT
FROM EVENT_FORGN_SAV_TXN200610 TXN
LEFT JOIN ACCT_FORGN_SAV_CUR200610 DEP
ON TXN.ACCT_NBR = DEP.ACCT_NBR
AND TXN.CURRENCY_CODE = DEP.CURRENCY_CODE
LEFT JOIN RD_EXCHANGE_AVG_RATE_M200610 RD
ON TXN.CURRENCY_CODE = RD.CURRENCY_CODE
GROUP BY TXN.ACCT_NBR || TXN.CURRENCY_CODE;

1) First, we lock DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 for
access, we lock DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 for
access, and we lock DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 for
access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 by way of an
all-rows scan with no residual conditions into Spool 4
(all_amps), which is redistributed by hash code to all AMPs.
The size of Spool 4 is estimated with low confidence to be
68,280 rows. The estimated time for this step is 0.14
seconds.
2) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 by way of an
all-rows scan with a condition of ("NOT
(DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610.CURRENCY_CODE
IS NULL)") into Spool 5 (all_amps), which is redistributed by
hash code to all AMPs. The size of Spool 5 is estimated with
no confidence to be 36 rows. The estimated time for this
step is 0.03 seconds.
3) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of
an all-rows scan, which is joined to Spool 5 (Last Use) by
way of an all-rows scan. Spool 4 and Spool 5 are left outer
joined using a product join, with a join condition of (
"CURRENCY_CODE = (CURRENCY_CODE)"). The result goes into
Spool 6 (all_amps), which is built locally on the AMPs. Then
we do a SORT to order Spool 6 by row hash. The size of Spool
6 is estimated with no confidence to be 69,897 rows. The
estimated time for this step is 0.12 seconds.
2) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 by way of an
all-rows scan with no residual conditions into Spool 7
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 7 by row hash. The size of
Spool 7 is estimated with low confidence to be 103,520 rows.
The estimated time for this step is 0.25 seconds.
4) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 6 and Spool 7 are left outer
joined using a merge join, with a join condition of (
"(CURRENCY_CODE = CURRENCY_CODE) AND (ACCT_NBR = ACCT_NBR)"). The
result goes into Spool 3 (all_amps), which is built locally on the
AMPs. The size of Spool 3 is estimated with no confidence to be
3,556,998 rows. The estimated time for this step is 2.18 seconds.
5) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 8. The size of Spool 8 is estimated with no confidence
to be 3,201,299 rows. The estimated time for this step is 11.05
seconds.
6) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 3,201,299 rows. The estimated time for this step is 1.91
seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

SQL-2:

SELECT TXN.ACCT_NBR || TXN.CURRENCY_CODE AS ACCT_NBR
, SUM(CASE
WHEN SUBSTR( TXN.TXN_CODE , 1, 2 ) IN ( '17' , '07 ') THEN
(CASE
WHEN TXN.REVERSE_TXN_IND = 'Y' THEN -1
ELSE 1
END
)
ELSE 0 END) AS CREDIT_CNT
FROM EVENT_FORGN_SAV_TXN200610 TXN
LEFT JOIN ACCT_FORGN_SAV_CUR200610 DEP
ON TXN.ACCT_NBR = DEP.ACCT_NBR
AND TXN.CURRENCY_CODE = DEP.CURRENCY_CODE
LEFT JOIN
(sel unique (case when currency_code is null then '' else currency_code end) real_currency_code,convert_twd_rate
from RD_EXCHANGE_AVG_RATE_M200610
) RD
ON TXN.CURRENCY_CODE = RD.real_currency_code
GROUP BY TXN.ACCT_NBR || TXN.CURRENCY_CODE;

1) First, we lock DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 for
access, we lock DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 for
access, and we lock DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 for
access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 by way of an
all-rows scan with no residual conditions into Spool 1
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 1 by the sort key in spool
field1 eliminating duplicate rows. The size of Spool 1 is
estimated with low confidence to be 40 rows. The estimated
time for this step is 0.03 seconds.
2) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 by way of an
all-rows scan with no residual conditions into Spool 5
(all_amps), which is redistributed by hash code to all AMPs.
The size of Spool 5 is estimated with low confidence to be
68,280 rows. The estimated time for this step is 0.14
seconds.
3) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan with a condition of ("NOT (REAL_CURRENCY_CODE IS
NULL)") into Spool 6 (all_amps), which is redistributed by hash
code to all AMPs. The size of Spool 6 is estimated with low
confidence to be 40 rows. The estimated time for this step is
0.03 seconds.
4) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of
an all-rows scan, which is joined to Spool 6 (Last Use) by
way of an all-rows scan. Spool 5 and Spool 6 are left outer
joined using a product join, with a join condition of (
"(TRANSLATE((CURRENCY_CODE )USING LATIN_TO_UNICODE))=
(REAL_CURRENCY_CODE)"). The result goes into Spool 7
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 7 by row hash. The size of
Spool 7 is estimated with no confidence to be 69,893 rows.
The estimated time for this step is 0.15 seconds.
2) We do an all-AMPs RETRIEVE step from
DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 by way of an
all-rows scan with no residual conditions into Spool 8
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 8 by row hash. The size of
Spool 8 is estimated with low confidence to be 103,520 rows.
The estimated time for this step is 0.25 seconds.
5) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of a
RowHash match scan, which is joined to Spool 8 (Last Use) by way
of a RowHash match scan. Spool 7 and Spool 8 are left outer
joined using a merge join, with a join condition of (
"(CURRENCY_CODE = CURRENCY_CODE) AND (ACCT_NBR = ACCT_NBR)"). The
result goes into Spool 4 (all_amps), which is built locally on the
AMPs. The size of Spool 4 is estimated with no confidence to be
3,556,998 rows. The estimated time for this step is 2.18 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 9. The size of Spool 9 is estimated with no confidence
to be 3,201,299 rows. The estimated time for this step is 11.05
seconds.
7) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way of
an all-rows scan into Spool 2 (group_amps), which is built locally
on the AMPs. The size of Spool 2 is estimated with no confidence
to be 3,201,299 rows. The estimated time for this step is 1.91
seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1.

Junior Contributor

Re: SQL Problem

Hi Chung-Te,
both plans are the same (just the order of the preparation steps for the joins is slightly different), so the performance should be the same.

I still think the queries you posted are not your real queries, because there's no need for any joins here: neither DEP nor RD contributes anything to the select list. The following query will retrieve exactly the same data:

SELECT TXN.ACCT_NBR || TXN.CURRENCY_CODE AS ACCT_NBR
, SUM(CASE
WHEN SUBSTR( TXN.TXN_CODE , 1, 2 ) IN ( '17' , '07 ') THEN
(CASE
WHEN TXN.REVERSE_TXN_IND = 'Y' THEN -1
ELSE 1
END
)
ELSE 0 END) AS CREDIT_CNT
FROM EVENT_FORGN_SAV_TXN200610 TXN
GROUP BY TXN.ACCT_NBR || TXN.CURRENCY_CODE;

The main problem is probably missing statistics; it looks like there are no stats at all on those tables.
Teradata's optimizer is cost-based, and without stats the plan might be quite bad.

Please post the output of "help stats ..." and the PI for each table.
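For example, something like the following for each of the three tables (shown here for the transaction table only) returns the collected statistics and the DDL, which includes the primary index definition:

HELP STATISTICS DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610;
SHOW TABLE DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610;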

Dieter
Enthusiast

Re: SQL Problem

Hi Dieter,
You are right: we have no stats and we don't need to join the other tables. But from the explain output you can see that the record counts are very small and the plans are the same. So why does SQL-1 take 40-50 minutes while SQL-2 takes only 5-10 seconds? I still can't understand it; I would expect SQL-1 to complete within 1 minute.

Ted
Junior Contributor

Re: SQL Problem

Hi Chung-Te,
did you run the queries several times or just once?
Maybe it was just blocked.

If you use DBQL (including stepinfo) you can submit both queries and then check the query log for details.
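A rough sketch of that kind of check, assuming DBQL step logging is enabled and the standard DBC query-log views are installed (view and column names are from memory and may differ by release, so treat this only as a starting point):

SELECT l.QueryID
, l.StartTime
, l.FirstRespTime
, s.StepLev1Num
, s.StepName
, s.EstRowCount
, s.RowCount
, s.CPUTime
FROM DBC.QryLog l
JOIN DBC.QryLogSteps s
ON s.ProcID = l.ProcID
AND s.QueryID = l.QueryID
WHERE l.UserName = USER
ORDER BY l.StartTime, s.StepLev1Num;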

Dieter
Enthusiast

Re: SQL Problem

Hi Dieter,
I ran the queries many times and made sure they were not blocked.
I suspect the Teradata V2R5 P**** Engine has some problem. I have called Taiwan NCR for support.
Enthusiast

Re: SQL Problem

Hi Chung-Te,
Try executing the same queries 3 or 4 times and check them with the PMON utility.

Regds,
Sachin
Enthusiast

Re: SQL Problem

Hi,
I just took a look at the explain output. In the first explain, the two tables (a large one with approx. 68,280 rows and a relatively small one with 40 rows) are redistributed to all AMPs and then joined in full. In the second explain, the small table is first redistributed, filtered, and sorted with duplicates eliminated, and only then is the large table joined to this prepared spool. So the row estimates for the small table should probably be more precise in the second case, where it has been filtered and sorted before the join, and that may give the optimizer a better cost estimate and let it pick a faster join procedure. But I am not very sure about this; it is the only possibility I can see at the moment.
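Either way, collecting statistics on the join columns, as Dieter suggested, should at least replace those "low confidence" / "no confidence" estimates with real numbers. A minimal sketch, assuming the older COLLECT STATISTICS syntax and using the table and column names from the queries above:

COLLECT STATISTICS ON DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 COLUMN ACCT_NBR;
COLLECT STATISTICS ON DP_MCIF_HISTORY.EVENT_FORGN_SAV_TXN200610 COLUMN CURRENCY_CODE;
COLLECT STATISTICS ON DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 COLUMN ACCT_NBR;
COLLECT STATISTICS ON DP_MCIF_HISTORY.ACCT_FORGN_SAV_CUR200610 COLUMN CURRENCY_CODE;
COLLECT STATISTICS ON DP_MCIF_HISTORY.RD_EXCHANGE_AVG_RATE_M200610 COLUMN CURRENCY_CODE;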