Optimizer choses different path when a meaningless group by is added to a dreived table

Database

Optimizer choses different path when a meaningless group by is added to a dreived table

We stumbled upon this by accident. We have a query that runs much faster when a group by is added to a derived table when one of the columns in the derived table is the result of a cast/concatenation operation.

The first query/explain posted below has the group by in the derived table. It shows the cast/concatenation being resolved when the derived table is materialized in step 2 in the EXPLAIN. Result is 761 rows and I assume 761 resolutions of the cast/concatenation. The group by does not eliminate any dups.

The second query/explain posted below shows the cast/concatenation taking place In step 3 and 4. Step 3 is a join to to a 9 million row table and step 4 is a join to a 3 million row table. I think this means that the cast/concatenation resolution is happening 9 and 3 million times as opposed to 761 times.

Does anyone have an expaination why the group by in the derived table would casue the resolution of the cast/concatenation to take place when the derived table is materialized?

Query with group by in derived table

Explain select
p.repl_part_no,
i.fac_cd,
t.calendar_date,
t.calendar_ts,
i.prim_vndr_cd,
i.plan_unt_cd,
sum(i.onhd_qty) CHN_ONHD_QTY,
sum(i.in_trnst_qty) CHN_IN_TRNST_QTY,
sum(i.rcpt_in_prcs_qty) CHN_RCPT_IN_PRCS_QTY,
sum(i.bkord_qty) CHN_BKORD_QTY
from
wwvw101.inve_inv_a i
join
(select
calendar_date,
cast( cast ( (calendar_date (format 'yyyy-mm-dd') ) as char(10)) || ' 12:00:00' as timestamp) calendar_ts
--cast(calendar_date as timestamp) + interval '12' hour "calendar_ts",
from
sys_calendar.calendar
where
calendar_date <= date
and
calendar_date >= date - interval '25' month
group by
calendar_date, calendar_ts ) t
on
(t.calendar_ts between i.data_vald_bgn_ts and i.data_vald_end_ts
and
(i.bkord_qty > 0 or i.onhd_qty > 0 or i.in_trnst_qty > 0 or i.rcpt_in_prcs_qty > 0 ))
left join
wwvw101.item_part_no_a p
on
(i.part_25_no = p.part_25_no
and
t.calendar_ts between p.data_vald_bgn_ts and p.data_vald_end_ts)
group by
p.repl_part_no, i.fac_cd, t.calendar_date,t.calendar_ts,i.prim_vndr_cd,i.plan_unt_cd

1) First, we lock WWTB101.ITEM_PART_NO in view wwvw101.item_part_no_a
for access, we lock WWTB101.INVE_INV in view wwvw101.inve_inv_a
for access, and we lock sys_calendar.CALTABLE for access.
2) Next, we do an all-AMPs SUM step to aggregate from
sys_calendar.CALTABLE by way of an all-rows scan with a condition
of ("(sys_calendar.CALTABLE.calendar_date <= DATE '2011-05-03')
AND (sys_calendar.CALTABLE.calendar_date >= DATE '2009-04-03')")
, grouping by field1 ( sys_calendar.CALTABLE.calendar_date
,(TRANSLATE(((TRIM(BOTH FROM sys_calendar.CALTABLE.calendar_date
(CHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'yyyy-mm-dd'))(CHAR(10), CHARACTER SET LATIN, NOT
CASESPECIFIC))||' 12:00:00')USING LATIN_TO_UNICODE))(TIMESTAMP(6))).
Aggregate Intermediate Results are computed locally, then placed
in Spool 3. The size of Spool 3 is estimated with no confidence
to be 761 rows (34,245 bytes). The estimated time for this step
is 0.02 seconds.
3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (used to materialize view, derived
table or table function t) (all_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with no confidence to
be 761 rows (26,635 bytes). The estimated time for this step is
0.02 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 8 (all_amps), which is duplicated on
all AMPs. The size of Spool 8 is estimated with no confidence to
be 219,168 rows (5,917,536 bytes). The estimated time for this
step is 0.04 seconds.
5) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to WWTB101.INVE_INV in view
wwvw101.inve_inv_a by way of an all-rows scan with a condition of
("(WWTB101.INVE_INV in view wwvw101.inve_inv_a.BKORD_QTY > 0) OR
((WWTB101.INVE_INV in view wwvw101.inve_inv_a.ONHD_QTY > 0) OR
((WWTB101.INVE_INV in view wwvw101.inve_inv_a.IN_TRNST_QTY > 0) OR
(WWTB101.INVE_INV in view wwvw101.inve_inv_a.RCPT_IN_PRCS_QTY > 0
)))"). Spool 8 and WWTB101.INVE_INV are joined using a product
join, with a join condition of ("(CALENDAR_TS >=
WWTB101.INVE_INV.DATA_VALD_BGN_TS) AND (CALENDAR_TS <=
WWTB101.INVE_INV.DATA_VALD_END_TS)"). The result goes into Spool
9 (all_amps), which is redistributed by the hash code of (
WWTB101.INVE_INV.PART_25_NO) to all AMPs. Then we do a SORT to
order Spool 9 by row hash. The result spool file will not be
cached in memory. The size of Spool 9 is estimated with no
confidence to be 194,375,187 rows (15,550,014,960 bytes). The
estimated time for this step is 57.85 seconds.
6) We do an all-AMPs JOIN step from WWTB101.ITEM_PART_NO in view
wwvw101.item_part_no_a by way of a RowHash match scan with no
residual conditions, which is joined to Spool 9 (Last Use) by way
of a RowHash match scan. WWTB101.ITEM_PART_NO and Spool 9 are
right outer joined using a merge join, with a join condition of (
"(CALENDAR_TS <= WWTB101.ITEM_PART_NO.DATA_VALD_END_TS) AND
((CALENDAR_TS >= WWTB101.ITEM_PART_NO.DATA_VALD_BGN_TS) AND
(PART_25_NO = WWTB101.ITEM_PART_NO.PART_25_NO ))"). The result
goes into Spool 7 (all_amps), which is built locally on the AMPs.
The result spool file will not be cached in memory. The size of
Spool 7 is estimated with low confidence to be 206,770,166 rows (
16,955,153,612 bytes). The estimated time for this step is 10.69
seconds.
7) We do an all-AMPs SUM step to aggregate from Spool 7 (Last Use) by
way of an all-rows scan , grouping by field1 (
WWTB101.ITEM_PART_NO.REPL_PART_NO ,WWTB101.INVE_INV.FAC_CD
,sys_calendar.CALTABLE.CALENDAR_DATE
,(TRANSLATE((sys_calendar.CALTABLE.Field_3 )USING
LATIN_TO_UNICODE))(TIMESTAMP(6)),WWTB101). Aggregate Intermediate
Results are computed globally, then placed in Spool 12. The
aggregate spool file will not be cached in memory. The size of
Spool 12 is estimated with no confidence to be 155,077,625 rows (
22,021,022,750 bytes). The estimated time for this step is 2
minutes and 27 seconds.
8) We do an all-AMPs RETRIEVE step from Spool 12 (Last Use) by way of
an all-rows scan into Spool 5 (group_amps), which is built locally
on the AMPs. The size of Spool 5 is estimated with no confidence
to be 155,077,625 rows (16,128,073,000 bytes). The estimated time
for this step is 7.37 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
statement 1. The total estimated time is 3 minutes and 43 seconds.
Query without group by in derived table
Explain select
p.repl_part_no,
i.fac_cd,
t.calendar_date,
t.calendar_ts,
i.prim_vndr_cd,
i.plan_unt_cd,
sum(i.onhd_qty) CHN_ONHD_QTY,
sum(i.in_trnst_qty) CHN_IN_TRNST_QTY,
sum(i.rcpt_in_prcs_qty) CHN_RCPT_IN_PRCS_QTY,
sum(i.bkord_qty) CHN_BKORD_QTY
from
wwvw101.inve_inv_a i
join
(select
calendar_date,
cast( cast ( (calendar_date (format 'yyyy-mm-dd') ) as char(10)) || ' 12:00:00' as timestamp) calendar_ts
--cast(calendar_date as timestamp) + interval '12' hour "calendar_ts",
from
sys_calendar.calendar
where
calendar_date <= date
and
calendar_date >= date - interval '25' month
/* group by
calendar_date, calendar_ts */ ) t
on
(t.calendar_ts between i.data_vald_bgn_ts and i.data_vald_end_ts
and
(i.bkord_qty > 0 or i.onhd_qty > 0 or i.in_trnst_qty > 0 or i.rcpt_in_prcs_qty > 0 ))
left join
wwvw101.item_part_no_a p
on
(i.part_25_no = p.part_25_no
and
t.calendar_ts between p.data_vald_bgn_ts and p.data_vald_end_ts)
group by
p.repl_part_no, i.fac_cd, t.calendar_date,t.calendar_ts,i.prim_vndr_cd,i.plan_unt_cd

1) First, we lock WWTB101.ITEM_PART_NO in view wwvw101.item_part_no_a
for access, we lock WWTB101.INVE_INV in view wwvw101.inve_inv_a
for access, and we lock sys_calendar.CALTABLE for access.
2) Next, we do an all-AMPs RETRIEVE step from sys_calendar.CALTABLE
by way of an all-rows scan with a condition of (
"(sys_calendar.CALTABLE.calendar_date <= DATE '2011-05-03') AND
(sys_calendar.CALTABLE.calendar_date >= DATE '2009-04-03')") into
Spool 8 (all_amps), which is duplicated on all AMPs. The size of
Spool 8 is estimated with high confidence to be 219,168 rows (
3,725,856 bytes). The estimated time for this step is 0.04
seconds.
3) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to WWTB101.INVE_INV in view
wwvw101.inve_inv_a by way of an all-rows scan with a condition of
("(WWTB101.INVE_INV in view wwvw101.inve_inv_a.BKORD_QTY > 0) OR
((WWTB101.INVE_INV in view wwvw101.inve_inv_a.ONHD_QTY > 0) OR
((WWTB101.INVE_INV in view wwvw101.inve_inv_a.IN_TRNST_QTY > 0) OR
(WWTB101.INVE_INV in view wwvw101.inve_inv_a.RCPT_IN_PRCS_QTY > 0
)))"). Spool 8 and WWTB101.INVE_INV are joined using a product
join, with a join condition of ("(((TRANSLATE(((TRIM(BOTH FROM
{LeftTable}.calendar_date (CHAR(10), CHARACTER SET LATIN, NOT
CASESPECIFIC, FORMAT 'yyyy-mm-dd'))(CHAR(10), CHARACTER SET LATIN,
NOT CASESPECIFIC))||' 12:00:00')USING
LATIN_TO_UNICODE))(TIMESTAMP(6)))>=
WWTB101.INVE_INV.DATA_VALD_BGN_TS) AND (((TRANSLATE(((TRIM(BOTH
FROM {LeftTable}.calendar_date (CHAR(10), CHARACTER SET LATIN, NOT
CASESPECIFIC, FORMAT 'yyyy-mm-dd'))(CHAR(10), CHARACTER SET LATIN,
NOT CASESPECIFIC))||' 12:00:00')USING
LATIN_TO_UNICODE))(TIMESTAMP(6)))<=
WWTB101.INVE_INV.DATA_VALD_END_TS)"). The result goes into Spool
9 (all_amps), which is redistributed by the hash code of (
WWTB101.INVE_INV.PART_25_NO) to all AMPs. Then we do a SORT to
order Spool 9 by row hash. The size of Spool 9 is estimated with
no confidence to be 194,375,187 rows (13,606,263,090 bytes). The
estimated time for this step is 52.28 seconds.
4) We do an all-AMPs JOIN step from WWTB101.ITEM_PART_NO in view
wwvw101.item_part_no_a by way of a RowHash match scan with no
residual conditions, which is joined to Spool 9 (Last Use) by way
of a RowHash match scan. WWTB101.ITEM_PART_NO and Spool 9 are
right outer joined using a merge join, with a join condition of (
"(((TRANSLATE(((TRIM(BOTH FROM {RightTable}.calendar_date
(CHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'yyyy-mm-dd'))(CHAR(10), CHARACTER SET LATIN, NOT
CASESPECIFIC))||' 12:00:00')USING
LATIN_TO_UNICODE))(TIMESTAMP(6)))<=
WWTB101.ITEM_PART_NO.DATA_VALD_END_TS) AND
((((TRANSLATE(((TRIM(BOTH FROM {RightTable}.calendar_date
(CHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'yyyy-mm-dd'))(CHAR(10), CHARACTER SET LATIN, NOT
CASESPECIFIC))||' 12:00:00')USING
LATIN_TO_UNICODE))(TIMESTAMP(6)))>=
WWTB101.ITEM_PART_NO.DATA_VALD_BGN_TS) AND (PART_25_NO =
WWTB101.ITEM_PART_NO.PART_25_NO ))"). The result goes into Spool
7 (all_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 7 is
estimated with low confidence to be 206,770,166 rows (
14,887,451,952 bytes). The estimated time for this step is 9.71
seconds.
5) We do an all-AMPs SUM step to aggregate from Spool 7 (Last Use) by
way of an all-rows scan , grouping by field1 (
WWTB101.ITEM_PART_NO.REPL_PART_NO ,WWTB101.INVE_INV.FAC_CD
,sys_calendar.CALTABLE.calendar_date ,(TRANSLATE(((TRIM(BOTH FROM
sys_calendar.CALTABLE.calendar_date (CHAR(10), CHARACTER SET
LATIN, NOT CASESPECIFIC, FORMAT 'yyyy-mm-dd'))(CHAR(10), CHARACTER
SET LATIN, NOT CASESPECIFIC))||' 12:00:00')USING
LATIN_TO_UNICODE))(TIMESTAMP(6)),WWTB101.INVE_INV.PRIM_VNDR_CD
,WWTB101.INVE_INV.PLAN_UNT_CD). Aggregate Intermediate Results
are computed globally, then placed in Spool 12. The aggregate
spool file will not be cached in memory. The size of Spool 12 is
estimated with no confidence to be 155,077,625 rows (
21,710,867,500 bytes). The estimated time for this step is 2
minutes and 26 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 12 (Last Use) by way of
an all-rows scan into Spool 5 (group_amps), which is built locally
on the AMPs. The size of Spool 5 is estimated with no confidence
to be 155,077,625 rows (16,128,073,000 bytes). The estimated time
for this step is 7.24 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
statement 1. The total estimated time is 3 minutes and 35 seconds.
BEGIN RECOMMENDED STATS ->
8) "COLLECT STATISTICS WWTB101.INVE_INV COLUMN ONHD_QTY". (HighConf)
9) "COLLECT STATISTICS WWTB101.INVE_INV COLUMN RCPT_IN_PRCS_QTY".
(HighConf)
10) "COLLECT STATISTICS WWTB101.INVE_INV COLUMN IN_TRNST_QTY".
(HighConf)
11) "COLLECT STATISTICS WWTB101.INVE_INV COLUMN BKORD_QTY".
(HighConf)
<- END RECOMMENDED STATS

3 REPLIES

Re: Optimizer choses different path when a meaningless group by is added to a dreived table

Is anyone able to show me the syntax for a group by function in Teradata?

-New User.  One Teradata step at a time.

Re: Optimizer choses different path when a meaningless group by is added to a dreived table

Group by <<comma separated list of column names>>

Re: Optimizer choses different path when a meaningless group by is added to a dreived table

Thanks!