Query Rewriting.

Enthusiast

Hi All,

I have two versions of some code (the old one is being rewritten using some new standards), both claiming to accomplish the same thing; with the subset of data we tested, both seem to give the same results. If you see any issues in the new code, please let me know.

Old Code:

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM    Rel_Detail A,
        (SELECT Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
         FROM   Rel_Detail
         WHERE  Ten_Dig_Val IN (SELECT Ten_Dig_Val FROM Ten_Dig_Val_list)
         GROUP BY Ten_Dig_Val) B
WHERE   A.Ten_Dig_Val = B.Ten_Dig_Val AND
        A.HIER_Rel_Detail = B.EFF_DT AND
        A.Lvl_Nbr = (SELECT MIN(Lvl_Nbr) FROM Rel_Detail
                     WHERE Ten_Dig_Val = A.Ten_Dig_Val AND HIER_Rel_Detail = B.EFF_DT)

New Approach:

-- Set up the volatile table first, then run the rewritten query.
CREATE VOLATILE TABLE temp3
(Ten_Dig_Val BIGINT, Lvl_Nbr BYTEINT, HIER_Rel_Detail DATE FORMAT 'YYYY-MM-DD')
ON COMMIT PRESERVE ROWS;

INSERT INTO temp3
SELECT Ten_Dig_Val, Lvl_Nbr, HIER_Rel_Detail
FROM   Rel_Detail
QUALIFY ROW_NUMBER() OVER (PARTITION BY Ten_Dig_Val, HIER_Rel_Detail ORDER BY Lvl_Nbr) = 1;

COLLECT STATS ON temp3 COLUMN (Ten_Dig_Val);
COLLECT STATS ON temp3 COLUMN (Lvl_Nbr);
COLLECT STATS ON temp3 COLUMN (HIER_Rel_Detail);

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM    Rel_Detail A
INNER JOIN
(
    SELECT Temp1.Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
    FROM   Rel_Detail Temp1
    INNER JOIN Ten_Dig_Val_list Temp2
    ON     Temp1.Ten_Dig_Val = Temp2.Ten_Dig_Val
    GROUP BY Temp1.Ten_Dig_Val
) B
ON  A.Ten_Dig_Val = B.Ten_Dig_Val
AND A.HIER_Rel_Detail = B.EFF_DT
INNER JOIN temp3 t3
ON  t3.Lvl_Nbr = A.Lvl_Nbr
AND t3.Ten_Dig_Val = A.Ten_Dig_Val
AND t3.HIER_Rel_Detail = B.EFF_DT;
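One thing worth noting about the volatile table DDL above: it declares no explicit PRIMARY INDEX, so Teradata defaults to the first column. Since the query joins temp3 on Ten_Dig_Val anyway that happens to work out, but declaring it makes the distribution intentional. A sketch, using the same names as above:

```sql
-- Same DDL as above, with the PI made explicit instead of relying on
-- Teradata's default (first column becomes the PI when none is declared).
CREATE VOLATILE TABLE temp3
(Ten_Dig_Val BIGINT, Lvl_Nbr BYTEINT, HIER_Rel_Detail DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX (Ten_Dig_Val)
ON COMMIT PRESERVE ROWS;
```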

Thanks in advance :)

7 REPLIES
Enthusiast

Re: Query Rewriting.

Hi,

Can anyone please confirm whether both snippets/logics are doing the same thing?

Junior Contributor

Re: Query Rewriting.

Yes, both seem to do the same, but why do you create a Volatile Table instead of directly using it in a Derived Table?

I'm not sure, but you probably want the minimum Lvl_Nbr for the maximum HIER_Rel_Detail, which is much simpler:

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM Rel_Detail A JOIN Ten_Dig_Val_list Temp2
ON A.Ten_Dig_Val = Temp2.Ten_Dig_Val
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY A.Ten_Dig_Val
      ORDER BY A.HIER_Rel_Detail DESC, A.Lvl_Nbr) = 1

Dieter

Enthusiast

Re: Query Rewriting.

Thanks Dieter for your help & time :)

The data volume was around 30 to 40 million rows, so I thought that inserting into a volatile table and collecting stats might help reduce the execution time.

Yes Dieter, that's exactly what the query does. Thanks again for this elegant and simple solution; I will try this method as well.

Teradata Employee

Re: Query Rewriting.

Hi Dieter,

Could you please look into my problem? I am new to this site and need your help.

We have a problem with a query in which the aggregation step takes a long time to complete, as shown in this excerpt from the explain plan:

We do an all-AMPs SUM step to aggregate from Spool 6 (Last Use) by
way of an all-rows scan, grouping by field1 (
PDWRKCDR.A.CALL_START_DT, PDWRKCDR.A.ACCS_METH_ID,
PDWRKCDR.A.ACCS_METH_VAL, PDWRKCDR.A.CCB_SBSCRPN_ID,
PDWRKCDR.A.TELP_TYPE).  Aggregate Intermediate Results are
computed globally, then placed in Spool 13.  The aggregate spool
file will not be cached in memory.  The size of Spool 13 is
estimated with low confidence to be 25,734,269 rows (
41,226,298,938 bytes).  The estimated time for this step is 1 hour
and 1 minute.
8) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 4 are sent back to the user as the result of
statement 1.  The total estimated time is 1 hour and 3 minutes.

The stats were stale, but since the data is not changing I recollected them; it made no difference to the response time.

I think the issue relates to the SUM aggregates in the SELECT, because this query has many SUM aggregates and CASE WHEN conditions (around 320 conditions, with the SQL running to 2000+ lines) over very big data (VOL_CDR_PRE is around 740 million records).

How would I optimize the aggregation in order to optimize the query?

Thanks

Junior Contributor

Re: Query Rewriting.

Did you compare the actual runtime and CPU/IO from DBQL (preferably QryLogSteps)?

You didn't show the full SQL, but I assume there are some joins in the previous steps. Depending on the actual data/PK/FK you might try to aggregate the large table(s) before the join. Otherwise, materialize the data without aggregation in a Volatile Table with a PI on the GROUP BY columns and then aggregate on this VT; that way the aggregation will be "computed locally".
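A minimal sketch of the second option Dieter describes, with hypothetical names (src_table stands in for the real joined detail rows, grp1/grp2 for the GROUP BY columns, measure for the summed column). Because the PI matches the GROUP BY columns, every row of a group lands on the same AMP, so the subsequent SUM step can be computed locally instead of globally:

```sql
-- Sketch only: src_table, grp1, grp2 and measure are placeholder names.
CREATE VOLATILE TABLE agg_stage AS
( SELECT grp1, grp2, measure
  FROM   src_table            -- materialize the detail rows without aggregation
) WITH DATA
PRIMARY INDEX (grp1, grp2)    -- PI on the GROUP BY columns -> local aggregation
ON COMMIT PRESERVE ROWS;

COLLECT STATS ON agg_stage COLUMN (grp1, grp2);

SELECT grp1, grp2, SUM(measure) AS total
FROM   agg_stage
GROUP BY grp1, grp2;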

Dieter

Teradata Employee

Re: Query Rewriting.

There is a spool issue with the query; there are no NULL values in the join columns.

Stats are up to date, but the issue remains.

sel AL_EFAT_CNL_DIM.CNL_NAME,TDIM.EMAIL_TALEP_F,TDIM.BASILI_TALEP_F,TDIM.SMS_BILGI_TALEP_F,TDIM.EFATURA_POSTA_KUTUSU_F

FROM VP_SEM.CNL_DIM AL_EFAT_CNL_DIM

RIGHT JOIN VP_SEM.EVNT_CNL ECNL ON AL_EFAT_CNL_DIM.CNL_ID=ECNL.CNL_ID

RIGHT JOIN VP_SEM.ETIYA_FATURA_TALEP_DIM TDIM ON ECNL.CNL_ID = TDIM.STRT_CNL_ID

Explain plan:

1) First, we lock DP_ETL_SKEY.CHANNEL_KEY in view VP_SEM.CNL_DIM for
access, we lock DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view
VP_SEM.ETIYA_FATURA_TALEP_DIM for access, we lock
DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL for access, and we
lock DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM for access.
2) Next, we do an all-AMPs RETRIEVE step from DP_CORE_EDW.CNL in view
VP_SEM.CNL_DIM by way of an all-rows scan with a condition of (
"DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM.DWH_STATUS = 'AC'") into
Spool 4 (all_amps), which is duplicated on all AMPs. The size of
Spool 4 is estimated with low confidence to be 230,688 rows (
11,995,776 bytes). The estimated time for this step is 0.04
seconds.
3) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to 10 partitions of
DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL with a condition of (
"DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL.DWH_CLOSED_LOAD IS
NULL"). Spool 4 and DP_CORE_EDW.EVNT_CNL are right outer joined
using a dynamic hash join, with condition(s) used for non-matching
on right table ("NOT (DP_CORE_EDW.EVNT_CNL.CNL_ID IS NULL)"), with
a join condition of ("CNL_ID = DP_CORE_EDW.EVNT_CNL.CNL_ID"). The
result goes into Spool 5 (all_amps), which is duplicated on all
AMPs. Then we do a SORT to order Spool 5 by the hash code of (
DP_CORE_EDW.EVNT_CNL.CNL_ID). The result spool file will not be
cached in memory. The size of Spool 5 is estimated with low
confidence to be 458,469,681,120 rows (25,674,302,142,720 bytes).
The estimated time for this step is 19 hours and 43 minutes.
4) We do an all-AMPs RETRIEVE step from
DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view
VP_SEM.ETIYA_FATURA_TALEP_DIM by way of an all-rows scan with no
residual conditions into Spool 7 (all_amps), which is built
locally on the AMPs. Then we do a SORT to order Spool 7 by the
hash code of (DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM.STRT_CNL_ID).
The size of Spool 7 is estimated with high confidence to be
12,691,911 rows (469,600,707 bytes). The estimated time for this
step is 0.86 seconds.
5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 5 and Spool 7 are right outer
joined using a merge join, with condition(s) used for non-matching
on right table ("NOT (STRT_CNL_ID IS NULL)"), with a join
condition of ("CNL_ID = STRT_CNL_ID"). The result goes into Spool
3 (all_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 3 is
estimated with low confidence to be 128,621,109,816,881 rows (***
bytes). The estimated time for this step is 1,111 hours and 49
minutes.
6) We do an all-AMPs SAMPLING step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. Samples are specified as a number of rows.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Junior Contributor

Re: Query Rewriting.

Stats might be up to date, but at least one stat, on VP_SEM.CNL_DIM.DWH_STATUS, is missing; DIAGNOSTIC HELPSTATS ON FOR SESSION; will reveal this.

The estimates are horrible; what are the actual row counts for those tables?

It seems the optimizer has no information about uniqueness, or the join is not correctly using the logical PKs/FKs.

Can you provide more details, DDL/PI, stats?
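For reference, a sketch of the diagnostic workflow Dieter suggests (the COLLECT STATS target is taken from the condition column in step 2 of the explain plan above; verify against your actual DDL):

```sql
-- Make EXPLAIN list the statistics the optimizer would like to have.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT ...;  -- the problem query; recommended stats appear at the end of the plan

-- Then collect the missing ones, e.g. the filter column from step 2:
COLLECT STATS ON DP_CORE_EDW.CNL COLUMN (DWH_STATUS);
```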