I have two versions of some code (the old one is being rewritten to new standards), both claiming to accomplish the same thing; with the subset of data we tested, both seem to give the same results. If you see any issues in the new code, please let me know.
Old version:

FROM Rel_Detail A,
     (SELECT Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
      FROM Rel_Detail
      WHERE Ten_Dig_Val IN (SELECT Ten_Dig_Val FROM Ten_Dig_Val_list)
      GROUP BY Ten_Dig_Val) B
WHERE A.Ten_Dig_Val = B.Ten_Dig_Val
  AND A.HIER_Rel_Detail = B.EFF_DT
  AND Lvl_Nbr = (SELECT MIN(Lvl_Nbr)
                 FROM Rel_Detail
                 WHERE A.Ten_Dig_Val = Ten_Dig_Val
                   AND HIER_Rel_Detail = B.EFF_DT)
New version:

FROM Rel_Detail A
inner join
  (SELECT Temp1.Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
   FROM Rel_Detail Temp1
   inner join Ten_Dig_Val_list Temp2
     on Temp1.Ten_Dig_Val = Temp2.Ten_Dig_Val
   GROUP BY Temp1.Ten_Dig_Val) B
  on A.Ten_Dig_Val = B.Ten_Dig_Val
  and A.HIER_Rel_Detail = B.EFF_DT
inner join temp3 t3
  on t3.Lvl_Nbr = A.Lvl_Nbr
  and t3.Ten_Dig_Val = A.Ten_Dig_Val
  and t3.HIER_Rel_Detail = B.EFF_DT
create volatile table temp3
  (Ten_Dig_Val BIGINT,
   Lvl_Nbr BYTEINT,
   HIER_Rel_Detail DATE FORMAT 'YYYY-MM-DD')
on commit preserve rows;

insert into temp3
SELECT Ten_Dig_Val, Lvl_Nbr, HIER_Rel_Detail
FROM Rel_Detail
qualify row_number() over(partition by Ten_Dig_Val, HIER_Rel_Detail
                          order by Lvl_Nbr) = 1;

collect stats on temp3 column (Ten_Dig_Val);
collect stats on temp3 column (Lvl_Nbr);
collect stats on temp3 column (HIER_Rel_Detail);
Thanks in advance :)
Yes, both seem to do the same, but why do you create a Volatile Table instead of directly using it in a Derived Table?
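For illustration, a sketch of that derived-table form, folding the temp3 QUALIFY logic straight into the new query (the select list is elided here, as in the original post):

```sql
-- Sketch: temp3 inlined as a derived table instead of a volatile table
-- (select list elided, as in the original post).
SELECT ...
FROM Rel_Detail A
INNER JOIN
  ( SELECT Ten_Dig_Val, Lvl_Nbr, HIER_Rel_Detail
    FROM Rel_Detail
    QUALIFY ROW_NUMBER() OVER (PARTITION BY Ten_Dig_Val, HIER_Rel_Detail
                               ORDER BY Lvl_Nbr) = 1
  ) t3
  ON  t3.Ten_Dig_Val = A.Ten_Dig_Val
  AND t3.Lvl_Nbr     = A.Lvl_Nbr
```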
I'm not sure, but you probably want the minimum Lvl_Nbr for the maximum HIER_Rel_Detail, which seems to be much simpler:
FROM Rel_Detail A join Ten_Dig_Val_list Temp2
  on A.Ten_Dig_Val = Temp2.Ten_Dig_Val
qualify row_number()
  over(partition by Ten_Dig_Val
       order by HIER_Rel_Detail desc, Lvl_Nbr) = 1
Thanks Dieter for your help & time :)
The data volume was around 30 to 40 million rows, so I thought that inserting into a volatile table and collecting stats might help reduce the execution time.
Yes Dieter, that's what the query does. Thanks again for this elegant and simple solution; I will try this method as well.
Could you please look into my problem? I am new to this site and need your help.
We have a problem with a query in which the aggregation step takes a long time to complete, as shown in the explain plan below:
We do an all-AMPs SUM step to aggregate from Spool 6 (Last Use) by way of an all-rows scan, grouping by field1 (PDWRKCDR.A.TELP_TYPE). Aggregate Intermediate Results are computed globally, then placed in Spool 13. The aggregate spool file will not be cached in memory. The size of Spool 13 is estimated with low confidence to be 25,734,269 rows (41,226,298,938 bytes). The estimated time for this step is 1 hour and 1 minute.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 4 are sent back to the user as the result of statement 1. The total estimated time is 1 hour and 3 minutes.
The stats were stale, but the data is not changing, so I recollected them; there was no difference in response time.
The problem seems to relate to the SUM aggregates in the SELECT, because this query has many SUM aggregates and CASE WHEN conditions (around 320 conditions and 2000+ lines of SQL) and very large data (VOL_CDR_PRE is around 740 million records).
How would I optimize the aggregation in order to optimize the query?
Did you compare the actual runtime and CPU/IO from DBQL (preferably QryLogSteps)?
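For example, something along these lines can pull the per-step actuals to compare against the plan's estimates (a sketch only; it assumes DBQL step logging is enabled and uses the usual DBC view/column names, which may vary by Teradata release — check your site's DBC documentation):

```sql
-- Sketch: estimated vs. actual per-step resource usage from DBQL.
-- Assumes DBQL step logging is enabled and SELECT access on the DBC views;
-- 'your_user' is a placeholder.
SELECT s.StepLev1Num,
       s.StepName,
       s.CPUTime,          -- actual CPU seconds for the step
       s.IOcount,          -- actual logical I/Os
       s.EstRowCount,      -- optimizer's estimate
       s.RowCount          -- actual rows produced
FROM DBC.QryLogStepsV s
JOIN DBC.QryLogV q
  ON q.QueryID = s.QueryID
WHERE q.UserName = 'your_user'
  AND q.StartTime > CURRENT_TIMESTAMP - INTERVAL '1' DAY
ORDER BY s.StepLev1Num;
```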
You didn't show the full SQL, but I assume there are some joins in the previous steps. Depending on the actual data/PK/FK, you might try to aggregate the large table(s) before the join. Otherwise, materialize the data without aggregation in a Volatile Table with a PI on the GROUP BY columns and then aggregate on this VT; that way you'll get "computed locally" instead of "computed globally".
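A rough sketch of the volatile-table variant (pre_agg and amount are placeholder names; field1 and TELP_TYPE are taken from the explain plan above):

```sql
-- Sketch: materialize the un-aggregated rows in a volatile table whose
-- primary index matches the GROUP BY columns, so the subsequent aggregation
-- is AMP-local ("computed locally") instead of global.
-- Table/column names other than field1/TELP_TYPE are placeholders.
CREATE VOLATILE TABLE pre_agg AS
( SELECT field1, TELP_TYPE, amount
  FROM VOL_CDR_PRE
) WITH DATA
PRIMARY INDEX (field1, TELP_TYPE)
ON COMMIT PRESERVE ROWS;

SELECT field1, TELP_TYPE, SUM(amount)
FROM pre_agg
GROUP BY field1, TELP_TYPE;
```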
There is a spool issue with this query; there are no NULL values in the join conditions.
Stats are up to date, but we still have the issue.
FROM VP_SEM.CNL_DIM AL_EFAT_CNL_DIM
RIGHT JOIN VP_SEM.EVNT_CNL ECNL ON AL_EFAT_CNL_DIM.CNL_ID=ECNL.CNL_ID
RIGHT JOIN VP_SEM.ETIYA_FATURA_TALEP_DIM TDIM ON ECNL.CNL_ID = TDIM.STRT_CNL_ID
1) First, we lock DP_ETL_SKEY.CHANNEL_KEY in view VP_SEM.CNL_DIM for access, we lock DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view VP_SEM.ETIYA_FATURA_TALEP_DIM for access, we lock DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL for access, and we lock DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM for access.
2) Next, we do an all-AMPs RETRIEVE step from DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM by way of an all-rows scan with a condition of ("DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM.DWH_STATUS = 'AC'") into Spool 4 (all_amps), which is duplicated on all AMPs. The size of Spool 4 is estimated with low confidence to be 230,688 rows (11,995,776 bytes). The estimated time for this step is 0.04 seconds.
3) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to 10 partitions of DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL with a condition of ("DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL.DWH_CLOSED_LOAD IS ..."). Spool 4 and DP_CORE_EDW.EVNT_CNL are right outer joined using a dynamic hash join, with condition(s) used for non-matching on right table ("NOT (DP_CORE_EDW.EVNT_CNL.CNL_ID IS NULL)"), with a join condition of ("CNL_ID = DP_CORE_EDW.EVNT_CNL.CNL_ID"). The result goes into Spool 5 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 5 by the hash code of (DP_CORE_EDW.EVNT_CNL.CNL_ID). The result spool file will not be cached in memory. The size of Spool 5 is estimated with low confidence to be 458,469,681,120 rows (25,674,302,142,720 bytes). The estimated time for this step is 19 hours and 43 minutes.
4) We do an all-AMPs RETRIEVE step from DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view VP_SEM.ETIYA_FATURA_TALEP_DIM by way of an all-rows scan with no residual conditions into Spool 7 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 7 by the hash code of (...). The size of Spool 7 is estimated with high confidence to be 12,691,911 rows (469,600,707 bytes). The estimated time for this step is 0.86 seconds.
5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a RowHash match scan, which is joined to Spool 7 (Last Use) by way of a RowHash match scan. Spool 5 and Spool 7 are right outer joined using a merge join, with condition(s) used for non-matching on right table ("NOT (STRT_CNL_ID IS NULL)"), with a join condition of ("CNL_ID = STRT_CNL_ID"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 3 is estimated with low confidence to be 128,621,109,816,881 rows (*** bytes). The estimated time for this step is 1,111 hours and 49 minutes.
6) We do an all-AMPs SAMPLING step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. Samples are specified as a number of rows.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1.
Stats might be up to date, but at least one, on VP_SEM.CNL_DIM.DWH_STATUS, is missing; DIAGNOSTIC HELPSTATS ON FOR SESSION; will reveal this.
The estimates are horrible. What are the actual row counts for those tables?
It seems the optimizer has no information about uniqueness, or the join is not correctly using the logical PKs/FKs.
Can you provide more details: DDL/PI, stats?
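For reference, a minimal sketch of using that diagnostic (the EXPLAIN output then ends with a list of recommended COLLECT STATISTICS statements; the SELECT here is abbreviated to the join fragment shown above):

```sql
-- Turn on stats recommendations for this session, then EXPLAIN the query;
-- the plan text will close with suggested COLLECT STATISTICS calls.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT ...
FROM VP_SEM.CNL_DIM AL_EFAT_CNL_DIM
RIGHT JOIN VP_SEM.EVNT_CNL ECNL
  ON AL_EFAT_CNL_DIM.CNL_ID = ECNL.CNL_ID
RIGHT JOIN VP_SEM.ETIYA_FATURA_TALEP_DIM TDIM
  ON ECNL.CNL_ID = TDIM.STRT_CNL_ID;

-- Example of collecting the stat flagged as missing above
-- (on the base table underlying the view, per step 2 of the plan):
COLLECT STATISTICS COLUMN (DWH_STATUS) ON DP_CORE_EDW.CNL;
```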