Need help in Optimizing query to fix spool space issue

Database

Need help in Optimizing query to fix spool space issue

Hi All,

i have a query which need to be tuned, currently the query is running out of spool. I tried to split the query, but not able to achive the expected result.

INSERT INTO TEST.Tmp_Table
(
Acct_id,
Bill_Addr_Cntct_Tele_No,
Bill_Addr_Cntct_Tele_Extns_No
)

SELECT
T07.Acct_id,
T07.Telephone,
T07.Extns_No

FROM (
SELECT
TAddr.Acct_id Acct_id,
T15.Area_Cd || T15.Exchng_Prefix || T15.Tele_Line_No Telephone,
T15.Extns_No Extns_No,

RANK() OVER (PARTITION BY APLH. Acct_Id
ORDER BY APLH.Acct_Prty_Loc_Eff_Dt DESC) K

FROM
PRODUCTION.TMP_Addr TAddr
INNER JOIN
PRODUCTION.Acct_Prty_Loc_Hist APLH
ON (TAddr.Acct_id = APLH.Acct_Id)
INNER JOIN
PRODUCTION.Tmp_T801515 T15
ON (APLH.Addr_Id = T15.Addr_Id)

WHERE
T15.Area_Cd <> ' '
and
APLH.loc_rsn_id= 3
and
(TAddr.Acct_Tp_Ds = 'Fixed'
OR
(TAddr.Acct_Integrated_Invoice_Ind = 'Yes'
AND TAddr.Acct_Tp_Ds = 'Mobile'))
and
TAddr.Acct_Stat_Id=2
QUALIFY RANK() OVER (PARTITION BY APLH. Acct_Id
ORDER BY APLH.Acct_Prty_Loc_Eff_Dt DESC) = 1
) T07
group by 1,2,3
;

This is the explain plan for the query. guess it has some low confidence statments. other suggestion/inputs are highly appreciated.

Explanation
1) First, we lock a distinct TEST."pseudo table" for write
on a RowHash to prevent global deadlock for
TEST.Temptab1.
2) Next, we lock TEST.Temptab1 for write, we lock
PRODUCTION.Acct_Prty_Loc_Hist for access, we lock
PRODUCTION.Temptab2 for access, and we lock PRODUCTION.TMP_ADDR
for access.
3) We do an all-AMPs JOIN step from PRODUCTION.TMP_ADDR by way of a
RowHash match scan with a condition of (
"(PRODUCTION.TMP_ADDR.Acct_Stat_Id = 2) AND
((PRODUCTION.TMP_ADDR.Acct_Tp_Ds = 'fixedline') OR
((PRODUCTION.TMP_ADDR.Acct_Integrated_Invoice_Ind = 'Yes') AND
(PRODUCTION.TMP_ADDR.Acct_Tp_Ds = 'mobile')))"), which is joined to
PRODUCTION.Acct_Prty_Loc_Hist by way of a RowHash match
scan with a condition of (
"(PRODUCTION.Acct_Prty_Loc_Hist.Loc_Rsn_Id = 3) AND (NOT
(PRODUCTION.Acct_Prty_Loc_Hist.Addr_Id IS NULL ))").
PRODUCTION.TMP_ADDR and PRODUCTION.Acct_Prty_Loc_Hist are
joined using a merge join, with a join condition of (
"PRODUCTION.TMP_ADDR.Acct_Id =
PRODUCTION.Acct_Prty_Loc_Hist.Acct_Id"). The input table
PRODUCTION.Acct_Prty_Loc_Hist will not be cached in memory.
The result goes into Spool 3 (all_amps), which is redistributed by
the hash code of (PRODUCTION.Acct_Prty_Loc_Hist.Addr_Id)
to all AMPs. Then we do a SORT to order Spool 3 by row hash. The
size of Spool 3 is estimated with low confidence to be 142,632,566
rows (4,136,344,414 bytes). The estimated time for this step is 9
minutes and 1 second.
4) We do an all-AMPs JOIN step from PRODUCTION.Temptab2 by way of
a RowHash match scan with a condition of (
"PRODUCTION.Temptab2.Area_Cd <> ' '"), which is joined to Spool
3 (Last Use) by way of a RowHash match scan.
PRODUCTION.Temptab2 and Spool 3 are joined using a merge join,
with a join condition of ("Addr_Id =
PRODUCTION.Temptab2.Addr_Id"). The input table
PRODUCTION.Temptab2 will not be cached in memory, but it is
eligible for synchronized scanning. The result goes into Spool 2
(all_amps), which is built locally on the AMPs. The size of Spool
2 is estimated with low confidence to be 142,632,566 rows (
5,990,567,772 bytes). The estimated time for this step is 14.11
seconds.
5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 6 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 4 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 142,632,566 rows (
7,416,893,432 bytes).
6) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
an all-rows scan with a condition of ("Field_5 = 1") into Spool 1
(used to materialize view, derived table or table function T07)
(all_amps), which is built locally on the AMPs. The size of Spool
1 is estimated with low confidence to be 142,632,566 rows (
6,275,832,904 bytes). The estimated time for this step is 3.59
seconds.
7) We do an all-AMPs SUM step to aggregate from Spool 1 (Last Use) by
way of an all-rows scan , grouping by field1 (
PRODUCTION.Temptab2.Area_Cd
,PRODUCTION.Temptab2.Exchng_Prefix
,PRODUCTION.Temptab2.Tele_Line_No). Aggregate Intermediate
Results are computed globally, then placed in Spool 11. The size
of Spool 11 is estimated with no confidence to be 106,974,425 rows
(6,204,516,650 bytes). The estimated time for this step is 39.21
seconds.
8) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan into Spool 9 (all_amps), which is redistributed
by the hash code of (PRODUCTION.Temptab2.Area_Cd) to all AMPs.
Then we do a SORT to order Spool 9 by row hash. The size of Spool
9 is estimated with no confidence to be 106,974,425 rows (
3,423,181,600 bytes). The estimated time for this step is 8.48
seconds.
9) We do an all-AMPs MERGE into TEST.Temptab1 from Spool
9 (Last Use). The size is estimated with no confidence to be
106,974,425 rows. The estimated time for this step is 6 minutes
and 33 seconds.
10) We spoil the parser's dictionary cache for the table.
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.

This is my first post,so please comment if the provided info is enough or any more info need on this. Any sort of help will be very help ful.

Thanks in Advance

Regards
John
7 REPLIES
Enthusiast

Re: Need help in Optimizing query to fix spool space issue

Hi John,
What is the primary index for table PRODUCTION.Acct_Prty_Loc_Hist .
Please paste the DDL for the tables.Also the approx rowcount.
Also If you can create the temp table then
TMP_addr1 can be crated as below so the OR condition from the join can be removed then you can directly join the tables.After this hope there will not be spool space issue.Ensure the stats are updated.

create table PRODUCTION.TMP_Addr1
as sel Acct_id from PRODUCTION.TMP_Addr TAddr
where (TAddr.Acct_Tp_Ds = 'Fixed' OR
(TAddr.Acct_Integrated_Invoice_Ind = 'Yes'
AND TAddr.Acct_Tp_Ds = 'Mobile'))
and TAddr.Acct_Stat_Id=2 ) with data ;

regards,
Rupesh

Re: Need help in Optimizing query to fix spool space issue

Hi John,
The Explain looks fine. I think you are joining the Tables on Index column, since I can't see any redistribution or duplication happening.
The possible causes for out of spool space might be
1) with the OR operators in where clause. u can try to convert this OR condition into case statements.
2) Check whether stats are Up to date.
3) Check the count of duplicate values u have for acct_id & addr_id in each tables. Huge no. of dups might result in spool space.
4) Or else the Tables you are operating might be very Huge where the actual data in it occupies more space than your alloacted spool space. Raise a req to DBA for increasing ur spool space if this is the cause.

Regards,
Jason

Re: Need help in Optimizing query to fix spool space issue

Thanks Rupesh and Jason.

That was an excellent suggestion. Let me check and get back to you with the result.

Re: Need help in Optimizing query to fix spool space issue

Still this issue persist, can any one help me to tune this query.

SELECT
T07.Acct_id,
T07.Telephone,
T07.Extns_No

FROM (
SELECT
TAddr.Acct_id Acct_id,
T15.Area_Cd || T15.Exchng_Prefix || T15.Tele_Line_No Telephone,
T15.Extns_No Extns_No,

RANK() OVER (PARTITION BY APLH. Acct_Id
ORDER BY APLH.Acct_Prty_Loc_Eff_Dt DESC) K

FROM
PRODUCTION.TMP_Addr TAddr
INNER JOIN
PRODUCTION.Acct_Prty_Loc_Hist APLH
ON (TAddr.Acct_id = APLH.Acct_Id)
INNER JOIN
PRODUCTION.Tmp_T801515 T15
ON (APLH.Addr_Id = T15.Addr_Id)

WHERE
T15.Area_Cd <> ' '
and
APLH.loc_rsn_id= 3
and
(TAddr.Acct_Tp_Ds = 'Fixed'
OR
(TAddr.Acct_Integrated_Invoice_Ind = 'Yes'
AND TAddr.Acct_Tp_Ds = 'Mobile'))
and
TAddr.Acct_Stat_Id=2
QUALIFY RANK() OVER (PARTITION BY APLH. Acct_Id
ORDER BY APLH.Acct_Prty_Loc_Eff_Dt DESC) = 1
) T07
group by 1,2,3
;

believe use of rank funtion to get the most recent row runs the job out of spool, so tried with replacing the above query with subquery..but still am getting differnent in record count.. please suggest a solution.

SELECT
T07.Acct_id,
T07.Telephone,
T07.Extns_No

FROM (
SELECT
TAddr.Acct_id Acct_id,
T15.Area_Cd || T15.Exchng_Prefix || T15.Tele_Line_No Telephone,
T15.Extns_No Extns_No

FROM
PRODUCTION.TMP_Addr TAddr
INNER JOIN
PRODUCTION.Acct_Prty_Loc_Hist APLH
ON (TAddr.Acct_id = APLH.Acct_Id)
INNER JOIN
PRODUCTION.Tmp_T801515 T15
ON (APLH.Addr_Id = T15.Addr_Id)

and
T15.Area_Cd <> ' '
and
APLH.loc_rsn_id= 3
and
(TAddr.Acct_Tp_Ds = 'FIXED'
OR
(TAddr.Acct_Integrated_Invoice_Ind = 'Yes'
AND TAddr.Acct_Tp_Ds = 'MOBILE'))
and
TAddr.Acct_Stat_Id=2
WHERE
(APLH.acct_id, APLH.Acct_Prty_Loc_Eff_Dt ) IN
(select acct_id, max(Acct_Prty_Loc_Eff_Dt) from PRODUCTION.Acct_Prty_Loc_Hist group by 1)
) T07
group by 1,2,3 ;

Let me know if you required any more info.. thanks in advance

Regards
John

Re: Need help in Optimizing query to fix spool space issue

Looking for expert guidance...
Enthusiast

Re: Need help in Optimizing query to fix spool space issue

I am no expert in Teradata, but I have a question:

Why do you need
RANK() OVER (PARTITION BY APLH. Acct_Id
ORDER BY APLH.Acct_Prty_Loc_Eff_Dt DESC) K
inside the inline view? You are not using/selecting it outside.

How well your Acct_Id records can be spread?

SELECT Acct_Id, count(1)
FROM PRODUCTION.Acct_Prty_Loc_Hist
GROUP BY Acct_Id
ORDER BY 2 DESC

If you see the counts evenly close to each other, then you should be in good shape otherwise it is the data skewness blowing up the spool space on one or more AMP.

Hope it helps
Enthusiast

Re: Need help in Optimizing query to fix spool space issue

try this:
DIAGNOSTIC HASHSTATSPART ON FOR SESSION;

then run your query within the same session