Too many resources

Hi!
This query of ours is using too many resources. The FACT_STG table has DIM_ID as its primary index (PI) and is partitioned by END_DT.
What can be done to reduce the consumption?
INSERT INTO .FACT_STG
    (DIM_ID, FND_DIM_ID, CTG_DIM_ID, END_DT, BAL_AMT, CREAT_DT)
SELECT
    DIM_ID, FND_DIM_ID, CTG_DIM_ID, END_DT, SUM(BAL_AMT) AS BAL_AMT, CURRENT_DATE
FROM
 ( -- DT8
   SELECT DIM_ID, FND_DIM_ID, CTG_DIM_ID, END_DT, BAL_AMT
   FROM
    ( -- DT6 = 0-dollar rows
      SELECT DT1.DIM_ID, DT1.FND_DIM_ID, DIM1.CTG_DIM_ID, DT1.END_DT,
             CAST(0 AS DECIMAL(15,2)) AS BAL_AMT
      FROM
       ( -- DT1 = 0-dollar list of 2 IDs
         SELECT F.DIM_ID, F.FND_DIM_ID,
                ADD_MONTHS(CAST((CAST(EXTRACT(YEAR FROM F.VAL_DT) AS CHAR(4))
                || CAST(CAST(EXTRACT(MONTH FROM F.VAL_DT) AS BYTEINT FORMAT '9(2)') AS CHAR(2))
                || '01') AS DATE FORMAT 'YYYYMMDD'), 1) - 1 AS END_DT
         FROM .DLY F
             ,.RPT_MO_AN R
         WHERE F.VAL_DT BETWEEN R.STRT_DT AND R.MO_END_DT
           AND R.RPT_MO_ID = 1
         GROUP BY 1,2,3
       ) DT1
      ,.SUB_CTG_DIM DIM1
      WHERE DT1.END_DT BETWEEN DIM1.EFF_DT AND DIM1.END_DT
    ) DT6
   UNION ALL
   SELECT DIM_ID, FND_DIM_ID, CTG_DIM_ID, END_DT, BAL_AMT
   FROM
    ( -- DT7 = aggregation from the fact table in the processing date range
      SELECT F.DIM_ID, F.FND_DIM_ID, F.CTG_DIM_ID,
             ADD_MONTHS(CAST((CAST(EXTRACT(YEAR FROM F.END_DT) AS CHAR(4))
             || CAST(CAST(EXTRACT(MONTH FROM F.END_DT) AS BYTEINT FORMAT '9(2)') AS CHAR(2))
             || '01') AS DATE FORMAT 'YYYYMMDD'), 1) - 1 AS END_DT
            ,SUM(TRNS_AMT) AS BAL_AMT
      FROM .TRNS_HIST_FACT F
          ,.RPT_MO_AN R
      WHERE F.END_DT BETWEEN R.STRT_DT AND R.MO_END_DT
        AND R.RPT_MO_ID = 1
      GROUP BY 1,2,3,4
    ) DT7
 ) DT8
GROUP BY 1,2,3,4
;

Here is the EXPLAIN output:

1) First, we lock .TRNS_HIST_FACT for access, we lock
.SUB_CTG_DIM for access, and we lock
.DLY for access.
2) Next, we do a single-AMP RETRIEVE step from
.RPT_MO_AN by way of the unique primary index
".RPT_MO_AN.RPT_MO_ID = 1" with no residual
conditions locking row for access into Spool 7 (all_amps), which
is duplicated on all AMPs. The size of Spool 7 is estimated with
high confidence to be 150 rows. The estimated time for this step
is 0.01 seconds.
3) We do an all-AMPs JOIN step from Spool 7 by way of an all-rows
scan, which is joined to .DLY by way of an all-rows
scan with no residual conditions. Spool 7 and .DLY
are joined using a product join, with a join condition of (
"(.DLY.VAL_DT >= STRT_DT) AND
(.DLY.VAL_DT <= MO_END_DT)"). The input
table .DLY will not be cached in memory, but it is
eligible for synchronized scanning. The result goes into Spool 6
(all_amps), which is built locally on the AMPs. The result spool
file will not be cached in memory. The size of Spool 6 is
estimated with no confidence to be 622,524,249 rows. The
estimated time for this step is 3 minutes and 19 seconds.
4) We do an all-AMPs SUM step to aggregate from Spool 6 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 8. The aggregate spool file will not be cached in memory.
The size of Spool 8 is estimated with no confidence to be
466,893,187 rows. The estimated time for this step is 7 minutes
and 58 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by
way of an all-rows scan into Spool 2 (all_amps), which is
built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with no
confidence to be 466,893,187 rows. The estimated time for
this step is 25.65 seconds.
2) We do an all-AMPs RETRIEVE step from
.SUB_CTG_DIM by way of an all-rows scan with no
residual conditions into Spool 10 (all_amps), which is
duplicated on all AMPs. The size of Spool 10 is estimated
with high confidence to be 19,050 rows. The estimated time
for this step is 0.04 seconds.
6) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an
all-rows scan, which is joined to Spool 2 (Last Use) by way of an
all-rows scan. Spool 10 and Spool 2 are joined using a product
join, with a join condition of ("(END_DT >= EFF_DT)
AND (END_DT <= END_DT)"). The result goes into
Spool 3 (all_amps), which is redistributed by hash code to all
AMPs. The result spool file will not be cached in memory. The
size of Spool 3 is estimated with no confidence to be
5,261,619,001 rows. The estimated time for this step is 23
minutes and 33 seconds.
7) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 4 (all_amps), which is
built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 4 is estimated with no
confidence to be 5,261,619,001 rows. The estimated time for
this step is 6 minutes and 7 seconds.
2) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of
an all-rows scan, which is joined to .TRNS_HIST_FACT by
way of an all-rows scan with no residual conditions. Spool 7
and .TRNS_HIST_FACT are joined using a product join,
with a join condition of (
"(.TRNS_HIST_FACT.END_DT >= STRT_DT)
AND (.TRNS_HIST_FACT.END_DT <=
MO_END_DT)"). The input table .TRNS_HIST_FACT will
not be cached in memory, but it is eligible for synchronized
scanning. The result goes into Spool 12 (all_amps), which is
built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 12 is estimated with no
confidence to be 334,339,993 rows. The estimated time for
this step is 1 minute and 42 seconds.
8) We do an all-AMPs SUM step to aggregate from Spool 12 (Last Use)
by way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 14. The aggregate spool file will not be cached in
memory. The size of Spool 14 is estimated with no confidence to
be 250,754,995 rows. The estimated time for this step is 5
minutes and 25 seconds.
9) We do an all-AMPs RETRIEVE step from Spool 14 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with no confidence to be
250,754,995 rows. The estimated time for this step is 18.48
seconds.
10) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 4 is estimated with no confidence to be
5,512,373,996 rows. The estimated time for this step is 18.48
seconds.
11) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 18. The aggregate spool file will not be cached in
memory. The size of Spool 18 is estimated with no confidence to
be 4,961,136,597 rows. The estimated time for this step is 1 hour
and 48 minutes.
12) We do an all-AMPs RETRIEVE step from Spool 18 (Last Use) by way of
an all-rows scan into Spool 16 (group_amps), which is built
locally on the AMPs. The result spool file will not be cached in
memory. The size of Spool 16 is estimated with no confidence to
be 4,961,136,597 rows. The estimated time for this step is 6
minutes and 6 seconds.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 16 are sent back to the user as the result
of statement 1. The total estimated time is 2 hours and 41
minutes.

5 REPLIES

Re: Too many resources

Dieter, I need your help!

Re: Too many resources

Too much CPU?

There are three product joins, caused by the BETWEENs.
In TD13 you might get rid of the two that join to a single row by using a scalar subquery:
http://developer.teradata.com/blog/dnoeth/2011/03/global-and-session-level-parameters-in-sql

But the third one cross joins to an estimated 127 rows.
What's the actual cardinality of the SUB_CTG_DIM table?

You should avoid the date -> char -> date typecast; to find the last day of the month, better use:
ADD_MONTHS(VAL_DT - (EXTRACT(DAY FROM VAL_DT)-1),1)-1
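
A quick check with an arbitrary sample date (any day in April returns 2011-04-30):

SELECT ADD_MONTHS(DATE '2011-04-15' - (EXTRACT(DAY FROM DATE '2011-04-15') - 1), 1) - 1;
-- 2011-04-15 minus 14 days = 2011-04-01, plus one month = 2011-05-01,
-- minus 1 day = 2011-04-30, the last day of the month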

Do you actually need the BETWEEN when you have already calculated the end date?
Maybe it's enough to join on
DT1.END_DT = DIM1.END_DT?
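
If that works, the join in DT6 might look like this minimal sketch (DT1 is stubbed with a made-up sample row here, and it assumes SUB_CTG_DIM.END_DT stores month-end dates):

SELECT DT1.DIM_ID, DT1.FND_DIM_ID, DIM1.CTG_DIM_ID, DT1.END_DT,
       CAST(0 AS DECIMAL(15,2)) AS BAL_AMT
FROM ( SELECT 1 AS DIM_ID, 1 AS FND_DIM_ID, DATE '2011-03-31' AS END_DT ) DT1
JOIN SUB_CTG_DIM DIM1
  ON DT1.END_DT = DIM1.END_DT  -- equality join instead of the BETWEEN range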

Dieter

Re: Too many resources

Thanks much for your suggestions!
1. I read your post on scalar subqueries; very informative. However, I am not sure how to apply it to the first two product joins, because both of them have a WHERE clause based on both tables. Can you show me an example, please?
2. HELP STATS shows that the SUB_CTG_DIM table has 127 unique values for DIM_ID and 2 each for EFF_DT and END_DT.
I looked at the explain but it did not show me 127 rows; how did you come up with this number? :)
3. I agree with removing the multiple casts, and I ran ADD_MONTHS(VAL_DT - (EXTRACT(DAY FROM VAL_DT)-1),1)-1, but PMON still shows me the same estimated time of 4 minutes. Do you think it will still help?
4. This select actually returns zero rows, so I am not able to test the last condition, DT1.END_DT = DIM1.END_DT. But I will give it a try.

I really want to implement these suggestions, so I will wait for your response.


Re: Too many resources

#1:
You need to resolve the BETWEEN:
WHERE F.VAL_DT BETWEEN R.STRT_DT AND R.MO_END_DT
-->
(SELECT STRT_DT FROM RPT_MO_AN) <= F.VAL_DT
AND
(SELECT MO_END_DT FROM RPT_MO_AN) >= F.VAL_DT
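
For example, DT1 might then look like this. A minimal sketch, assuming RPT_MO_ID = 1 returns exactly one row (as in your query) and using the simplified month-end calculation from above:

SELECT F.DIM_ID, F.FND_DIM_ID,
       ADD_MONTHS(F.VAL_DT - (EXTRACT(DAY FROM F.VAL_DT) - 1), 1) - 1 AS END_DT
FROM DLY F
WHERE F.VAL_DT >= (SELECT STRT_DT   FROM RPT_MO_AN WHERE RPT_MO_ID = 1)
  AND F.VAL_DT <= (SELECT MO_END_DT FROM RPT_MO_AN WHERE RPT_MO_ID = 1)
GROUP BY 1,2,3

The join to RPT_MO_AN disappears; both dates act as single-row parameters, so two of the product joins go away.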

#2:
You have a 150-AMP system, since the single qualifying row resulted in 150 rows after duplication (Spool 7).

"duplicated on all AMPs. The size of Spool 10 is estimated
with high confidence to be 19,050 rows."

--> 19050/150 = 127

But if there are only 2 values each for EFF_DT and END_DT, the join to SUB_CTG_DIM is probably not a one-to-many (1:n) join but a many-to-many (m:n) join, and that will result in n*m intermediate rows.
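
A hypothetical illustration with made-up dates and IDs: if two SUB_CTG_DIM rows have ranges covering the same month-end, every matching row on the fact side comes out twice:

SELECT f.end_dt, d.ctg_dim_id
FROM (SELECT DATE '2011-03-31' AS end_dt) f
    ,(SELECT 10 AS ctg_dim_id, DATE '2011-01-01' AS eff_dt, DATE '2011-12-31' AS end_dt
      UNION ALL
      SELECT 20, DATE '2011-01-01', DATE '2011-12-31') d
WHERE f.end_dt BETWEEN d.eff_dt AND d.end_dt;
-- returns 2 rows for the single row in f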

#3:
The estimated time will not change, because the optimizer doesn't know or consider the actual CPU used for a calculation. The simpler expression will still use less CPU at run time, just not in the estimate.

Dieter

Re: Too many resources

Thanks, Dieter, for responding so quickly and for the detailed explanation.
We have not yet upgraded to TD13, but we are planning to, so this will be helpful. Once there are some data changes I will test this with your suggestions and post the results.
Thanks much for your help!