Enthusiast

Explain

Hi Dieter!
I have shown the query and the explain plan below. I collected stats on the LOV_DX column (SEC_TSK) and the PRNTACT column (CRT_TM). After this the explain shows high confidence wherever these columns are used, yet the query now runs a few minutes longer. Should we still consider this an improvement in performance?
I have also noticed that there are some NUSIs on which stats have not been collected. Will collecting stats on these help?
Also, I have read that recommended stats may not always be right, so which of the recommended stats should be used? Is there any other way to tune this?
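
For reference, the single-column stats I collected were basically these (standard syntax, database name as in the explain below):

COLLECT STATISTICS TF001.LOV_DX COLUMN (SEC_TSK);
COLLECT STATISTICS TF001.PRNTACT COLUMN (CRT_TM);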

Thanks a lot!

explain
SELECT X.c1 AS c1 , X.c2 AS c2 , X.c3 AS c3 , X.c4 AS c4 ,
X.c5 AS c5 , X.c6 AS c6 , X.c7 AS c7 , X.c8 AS c8 , X.c9 AS c9
FROM (
SELECT X.c1 AS c1 , X.c2 AS c2 , X.c3 AS c3 , X.c4 AS c4 , X.c5 AS c5 , X.c6 AS c6 , X.c7 AS c7 , X.c8 AS c8 ,
SUM ( X.c9 ) OVER ( partition BY X.c8 , X.c2 , X.c4 ) AS c9
FROM (
SELECT A."PER_NAME_MONTH" AS c1 , A."DAY_DT" AS c2 ,
CASE
WHEN A."DAY" = 'Monday' THEN '1. Monday'
WHEN A."DAY" = 'Tuesday' THEN '2. Tuesday'
WHEN A."DAY" = 'Wednesday' THEN '3. Wednesday'
WHEN A."DAY" = 'Thursday' THEN '4. Thursday'
WHEN A."DAY" = 'Friday' THEN '5. Friday'
WHEN A."DAY" = 'Saturday' THEN '6. Saturday'
WHEN A."DAY" = 'Sunday' THEN '7. Sunday'
ELSE A."DAY"
END AS c3 ,
CASE
WHEN E."CRT_TM" < 6000000 THEN 'a. 12am-6am'
WHEN E."CRT_TM" BETWEEN 6000000 AND 7000000 THEN 'b. 6am-7am'
WHEN E."CRT_TM" BETWEEN 7000001 AND 8000000 THEN 'c. 7am-8am'
WHEN E."CRT_TM" BETWEEN 8000001 AND 9000000 THEN 'd. 8am-9am'
WHEN E."CRT_TM" BETWEEN 9000001 AND 10000000 THEN 'e. 9am-10am'
WHEN E."CRT_TM" BETWEEN 10000001 AND 11000000 THEN 'f. 10am-11am'
WHEN E."CRT_TM" BETWEEN 11000001 AND 12000000 THEN 'g. 11am-12pm'
WHEN E."CRT_TM" BETWEEN 12000001 AND 13000000 THEN 'h. 12pm-1pm'
WHEN E."CRT_TM" BETWEEN 13000001 AND 14000000 THEN 'i. 1pm-2pm'
WHEN E."CRT_TM" BETWEEN 14000001 AND 15000000 THEN 'j. 2pm-3pm'
WHEN E."CRT_TM" BETWEEN 15000001 AND 16000000 THEN 'k. 3pm-4pm'
WHEN E."CRT_TM" >= 16000001 THEN 'l. After 4pm'
ELSE ''
END AS c4 ,
COUNT ( E."X_TSKID" ) AS c5 , D."ATR_03" AS c6 ,
D."ATR_04" AS c7 , D."SEC_TSK" AS c8 ,
COUNT ( E."ROW" ) AS c9
FROM "DAY_D" A , "LOV_D" B , "LOV_D" C ,
"LOV_DX" D , "PRNTACT" E
WHERE ( B."ROW" = D."ROW"
AND B."ROW" = E."ACT_TYPE"
AND A."ROW" = E."ACTUAL_START"
AND C."NAME" = D."ATR_01"
AND B."TYPE" = 'PI_TSK'
AND C."TYPE" = 'PI_DPT'
AND C."ROW" = E."X_DPT"
AND D."SEC_TSK" = 'Single Sum'
AND C."LONG_DESC" <> 'INC OC'
AND C."LONG_DESC" <> 'OUT OC'
AND A."DAY_DT" >= tImEsTaMp'2011-01-01 00:00:00'
AND E."CRT_TM" <= 16000000 )
GROUP BY c2 , c1 , c6 , c7, c8, c3 , c4
) X
QUALIFY ROW_NUMBER ( ) OVER ( PARTITION BY X.c1 , X.c2 , X.c4 ,
X.c6 , X.c7 , X.c8
ORDER BY c1 , c2 , c4 , c6 ,c7 , c8 ) = 1 ) X
ORDER BY 6

1) First, we lock a distinct TF001."pseudo table" for read on a
RowHash to prevent global deadlock for TF001.E.
2) Next, we lock a distinct TF001."pseudo table" for read on a
RowHash to prevent global deadlock for TF001.D.
3) We lock a distinct TF001."pseudo table" for read on a RowHash to
prevent global deadlock for TF001.C.
4) We lock a distinct TF001."pseudo table" for read on a RowHash to
prevent global deadlock for TF001.A.
5) We lock TF001.E for read, we lock TF001.D for read, we
lock TF001.C for read, and we lock TF001.A for read.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from TF001.D by way of
an all-rows scan with a condition of (
"(TF001.D.SEC_TSK = 'Single Sum') AND ((NOT
(TF001.D.ATR_01 IS NULL )) AND (NOT
(TF001.D.ROW IS NULL )))") into Spool 5 (all_amps)
(compressed columns allowed), which is built locally on the
AMPs. The size of Spool 5 is estimated with high confidence
to be 1,295 rows. The estimated time for this step is 0.00
seconds.
2) We do an all-AMPs RETRIEVE step from TF001.C by way of
index # 8 ""TYPE" = 'PI_DPT'" with a residual condition of (
"(TF001.C.LONG_DESC <> 'OUT OCC') AND
((TF001.C.LONG_DESC <> 'INC OCC') AND
(TF001.C."TYPE" = 'PI_DPT'))") into Spool 6 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
The size of Spool 6 is estimated with high confidence to be
31,500 rows. The estimated time for this step is 0.04
seconds.
3) We do an all-AMPs RETRIEVE step from TF001.B by way of
an all-rows scan with a condition of ("TF001.B."TYPE" =
'PI_TSK'") into Spool 7 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs.
The size of Spool 7 is estimated with high confidence to be
2,436 rows. The estimated time for this step is 0.02 seconds.
7) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to Spool 6 (Last Use) by way of an
all-rows scan. Spool 5 and Spool 6 are joined using a single
partition hash join, with a join condition of ("NAME = ATR_01").
The result goes into Spool 8 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs. The
size of Spool 8 is estimated with low confidence to be 1,283 rows.
The estimated time for this step is 0.01 seconds.
8) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
all-rows scan, which is joined to Spool 8 (Last Use) by way of an
all-rows scan. Spool 7 and Spool 8 are joined using a single
partition hash join, with a join condition of ("ROW = ROW").
The result goes into Spool 9 (all_amps) (compressed columns
allowed), which is built locally on the AMPs. The size of Spool 9
is estimated with low confidence to be 1,283 rows. The estimated
time for this step is 0.01 seconds.
9) We do an all-AMPs RETRIEVE step from TF001.E by way of an
all-rows scan with a condition of ("(TF001.E.CRT_TM
<= 16000000.) AND (NOT (TF001.E.X_DPT IS NULL ))") into
Spool 10 (all_amps) (compressed columns allowed), which is
redistributed by hash code to all AMPs. The input table will not
be cached in memory, but it is eligible for synchronized scanning.
The size of Spool 10 is estimated with high confidence to be
40,589,897 rows. The estimated time for this step is 17.10
seconds.
10) We do an all-AMPs RETRIEVE step from TF001.A by way of an
all-rows scan with a condition of ("TF001.A.DAY_DT >=
TIMESTAMP '2011-01-01 00:00:00'") into Spool 11 (all_amps)
(compressed columns allowed) fanned out into 2 hash join
partitions, which is duplicated on all AMPs. The size of Spool 11
is estimated with high confidence to be 2,219,400 rows. The
estimated time for this step is 0.27 seconds.
11) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
all-rows scan, which is joined to Spool 10 (Last Use) by way of an
all-rows scan. Spool 9 and Spool 10 are joined using a product
join, with a join condition of ("(ACT_TYPE = ROW) AND
((ROW = ACT_TYPE) AND (ROW = X_DPT ))"). The
result goes into Spool 12 (all_amps) (compressed columns allowed),
which is built locally on the AMPs into 2 hash join partitions.
The size of Spool 12 is estimated with low confidence to be
8,297,776 rows. The estimated time for this step is 3.75 seconds.
12) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an
all-rows scan, which is joined to Spool 12 (Last Use) by way of an
all-rows scan. Spool 11 and Spool 12 are joined using a hash join
of 2 partitions, with a join condition of ("ROW =
ACTUAL_START"). The result goes into Spool 4 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The size of Spool 4 is estimated with low confidence to be
8,297,776 rows. The estimated time for this step is 0.80 seconds.
13) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
way of an all-rows scan, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 13. The size of Spool 13 is estimated with low
confidence to be 6,223,332 rows. The estimated time for this step
is 22.39 seconds.
14) We do an all-AMPs RETRIEVE step from Spool 13 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps) (compressed columns
allowed), which is built locally on the AMPs. The size of Spool 1
is estimated with low confidence to be 6,223,332 rows. The
estimated time for this step is 0.86 seconds.
15) We do an all-AMPs STAT FUNCTION step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 18 (Last Use), which is assumed
to be redistributed by value to all AMPs. The result rows are put
into Spool 16 (all_amps) (compressed columns allowed), which is
built locally on the AMPs.
16) We do an all-AMPs STAT FUNCTION step from Spool 16 (Last Use) by
way of an all-rows scan into Spool 21 (Last Use), which is assumed
to be redistributed by value to all AMPs. The result rows are put
into Spool 20 (all_amps) (compressed columns allowed), which is
built locally on the AMPs.
17) We do an all-AMPs RETRIEVE step from Spool 20 (Last Use) by way of
an all-rows scan with a condition of ("Field_11 = 1") into Spool 2
(all_amps) (compressed columns allowed), which is built locally on
the AMPs. The size of Spool 2 is estimated with no confidence to
be 6,223,332 rows. The estimated time for this step is 0.75
seconds.
18) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 24 (group_amps), which is built
locally on the AMPs. Then we do a SORT to order Spool 24 by the
sort key in spool field1. The size of Spool 24 is estimated with
no confidence to be 6,223,332 rows. The estimated time for this
step is 0.75 seconds.
19) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 24 are sent back to the user as the result
of statement 1.
BEGIN RECOMMENDED STATS ->
20) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACT_TYPE ,
ACTUAL_START ,X_DPT ,CRT_TM)".
(HighConf)
21) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACTUAL_START ,
X_DPT)". (HighConf)
22) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACTUAL_START ,
X_DPT ,CRT_TM)". (HighConf)
23) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(X_DPT ,
ACT_TYPE ,ACTUAL_START)". (HighConf)
24) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACT_TYPE ,
CRT_TM)". (LowConf)
25) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACT_TYPE ,
X_DPT ,CRT_TM)". (HighConf)
26) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACT_TYPE ,
ACTUAL_START ,CRT_TM)". (HighConf)
27) "COLLECT STATISTICS TF001.PRNTACT COLUMN
(ACTUAL_START ,
CRT_TM)". (HighConf)
28) "COLLECT STATISTICS TF001.LOV_DX COLUMN (SEC_TSK
,
ROW)". (HighConf)
29) "COLLECT STATISTICS TF001.LOV_DX COLUMN (ATR_01
,SEC_TSK ,
ATR_03 ,ATR_04)". (HighConf)
30) "COLLECT STATISTICS TF001.LOV_DX COLUMN (SEC_TSK
,
ATR_03 ,ATR_04)". (HighConf)
31) "COLLECT STATISTICS TF001.LOV_DX COLUMN (ATR_04)".
(LowConf)
32) "COLLECT STATISTICS TF001.LOV_DX COLUMN (ROW
,SEC_TSK ,
ATR_03 ,ATR_04)". (HighConf)
33) "COLLECT STATISTICS TF001.LOV_DX COLUMN (SEC_TSK
,
ATR_01)". (HighConf)
34) "COLLECT STATISTICS TF001.LOV_DX COLUMN (ROW ,ATR_04)".
(LowConf)
35) "COLLECT STATISTICS TF001.LOV_DX COLUMN (ROW ,ATR_01
,
SEC_TSK ,ATR_03 ,ATR_04)". (HighConf)
36) "COLLECT STATISTICS TF001.LOV_D COLUMN (NAME)". (HighConf)
<- END RECOMMENDED STATS

show table LOV_DX -- no stats on the NUSIs
INDEX LOV_DX_NUSI1 ( OWNR )
INDEX LOV_DX_NUSI2 ( TER_TSK );

show table LOV_D
UNIQUE PRIMARY INDEX LOV_D_UPI ( "TYPE" ,VAL )
INDEX LOV_D_NUSI2 ( ROW )
INDEX LOV_D_NUSI1 ( "TYPE" );

show table PRNTACT -- no stats on the NUSIs
UNIQUE PRIMARY INDEX ( ROW )
INDEX PRNT_ACT_NUSI1 ( ACTL_END )
INDEX PRNT_ACT_NUSI2 ( ACT )
INDEX PRNT_ACT_NUSI3 (DPT )
INDEX PRNT_ACT_NUSI4 (DT_RCVD );

show table DAY_D
UNIQUE PRIMARY INDEX DAY_D_UPI ( ROW );

3 REPLIES
Senior Apprentice

Re: Explain

Without stats, the optimizer will use a NUSI only when access is based on an equality condition, "NUSIcol = value".

So stats on NUSIs are usually recommended. Read Carrie's articles on stats, especially
http://developer.teradata.com/database/articles/statistics-collection-recommendations-for-teradata-12
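
For the NUSIs you listed, that would be something along these lines (a sketch based on the single index columns from your SHOW TABLE output, so adjust names as needed):

COLLECT STATISTICS TF001.LOV_DX COLUMN (OWNR);
COLLECT STATISTICS TF001.LOV_DX COLUMN (TER_TSK);
COLLECT STATISTICS TF001.PRNTACT COLUMN (ACTL_END);
COLLECT STATISTICS TF001.PRNTACT COLUMN (ACT);
COLLECT STATISTICS TF001.PRNTACT COLUMN (DPT);
COLLECT STATISTICS TF001.PRNTACT COLUMN (DT_RCVD);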

A recommended stat might be useful if the column is also used a lot in other queries.

Regarding optimization, I don't think you need the QUALIFY, as it's on the same columns used in the previous GROUP BY.

Dieter
Enthusiast

Re: Explain

Thank you so much for responding.
Regarding the QUALIFY, the columns in the GROUP BY are different from those in the PARTITION BY. So will it give the same result if I remove the QUALIFY?
Also, as far as I understand, the QUALIFY ... = 1 will remove the dupes based on the columns in the PARTITION BY; please correct me if I am wrong.

I also tried applying the filter conditions inside derived tables as below, but this just increased the CPU skew to 98% in PMON.
Doesn't filtering like this improve the performance?
FROM ( sel * from "DAY_D" where "DAY_DT" >= TIMESTAMP '2011-01-01 00:00:00' ) A ,
( sel * from "LOV_D" where "TYPE" = 'PI_TSK' ) B ,
(sel * from "LOV_D" where "LONG_DESC" <> 'C OC' and "LONG_DESC" <> 'T OC' and "TYPE" = 'PI_DPT' ) C ,
(sel * from "LOV_DX" where "SEC_TSK" = 'Single' ) D ,
( sel * from "PRNTACT" where "CRT_TM" <= 16000000 ) E
WHERE ( ....

Also, generally I collect stats on all the columns in the WHERE condition. What would you suggest regarding collecting stats?
What are the things in the explain that I should look for, like "no confidence"?
Sorry, too many questions, but in my experience you generally follow up until the query gets resolved, and I really hope for that this time as well, since it will help me clear all my doubts.

Senior Apprentice

Re: Explain

The only additional column in the GROUP BY is c3, which returns the day of week.
A."DAY_DT" sounds like a date, i don't think there are different weekdays for any given date.
When there are no dups then why try to remove them?

The PARTITION BY and the ORDER BY in your ROW_NUMBER are exactly the same; if there are multiple rows per partition, this will return one of them arbitrarily. You could order by a single column instead of six and get the same answer.
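
For example (just a sketch of the QUALIFY, everything else unchanged):

QUALIFY ROW_NUMBER() OVER ( PARTITION BY X.c1 , X.c2 , X.c4 , X.c6 , X.c7 , X.c8
                            ORDER BY X.c1 ) = 1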

You should check whether the explain changed after using those derived tables.
As there's nothing complex like DISTINCT/aggregation/OLAP in those DTs, I would not expect a different plan, but you never know.

You can't collect stats on every column found in any WHERE condition, because all those stats have to be recollected regularly as the data changes. So only collect on columns that are used in multiple queries, and only if they actually change the plan.
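
Recollecting is simply re-running the collection; if I remember the short form correctly, it can also be done per table for everything already defined:

COLLECT STATISTICS TF001.PRNTACT COLUMN (CRT_TM);   -- refreshes this single stat
COLLECT STATISTICS TF001.PRNTACT;                   -- refreshes all stats previously collected on the table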

Since you already use DIAGNOSTIC HELPSTATS you don't have to search for missing stats anymore, but you should check whether an existing stat actually results in high (or at least low) confidence in the explain. If not, it's useless.
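
To see what has already been collected (and how old it is), you can run HELP STATISTICS and compare it against the confidence levels in the explain:

HELP STATISTICS TF001.PRNTACT;   -- lists the collected stats with collection date and unique value counts
HELP STATISTICS TF001.LOV_DX;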

Dieter