Optimizer's inability to resolve variables for partitioning

Enthusiast

DB VERSION 13.10.05.02

I'm attempting to understand how the optimizer resolves (if at all) dynamic variables (subqueries) for partition elimination.
Here is an example:

CREATE TABLE SSURI.CDC_BATCH(
SINGLEAMP SMALLINT,
BATCH_START DATE,
BATCH_END DATE);

INSERT INTO SSURI.CDC_BATCH VALUES (1, CURRENT_DATE - 10, CURRENT_DATE);

--collect stats ssuri.cdc_batch....

CREATE TABLE SSURI.CUSLDSFL_SI(
TRACKINGNUM INTEGER,
AUD_TIME TIMESTAMP)
PRIMARY INDEX (TRACKINGNUM)
PARTITION BY RANGE_N(CAST((AUD_TIME) AS DATE AT LOCAL) BETWEEN '2009-12-31' AND '2013-12-31' EACH INTERVAL '1' DAY);

SELECT TOP 10 PARTITION, CAST(AUD_TIME AS DATE), COUNT(*)
FROM SSURI.CUSLDSFL_SI
GROUP BY 1,2 ORDER BY 1

PARTITION AUD_TIME COUNT(*)
----------------------------------------
3 1/2/2010 596
4 1/3/2010 3072
5 1/4/2010 45956
6 1/5/2010 32130
7 1/6/2010 32571
8 1/7/2010 28969
9 1/8/2010 26795
10 1/9/2010 2275
11 1/10/2010 1734
12 1/11/2010 38274

SELECT * FROM SSURI.CUSLDSFL_SI
WHERE AUD_TIME BETWEEN CAST((CURRENT_DATE - 10) AS TIMESTAMP) AND CAST(CURRENT_DATE AS TIMESTAMP)

3) We do an all-AMPs RETRIEVE step from 12 partitions of
SSURI.CUSLDSFL_SI with a condition of (
"(SSURI.CUSLDSFL_SI.Aud_Time >= TIMESTAMP '2013-02-22
00:00:00.000000') AND (SSURI.CUSLDSFL_SI.Aud_Time <= TIMESTAMP
'2013-03-04 00:00:00.000000')") into Spool 1 (group_amps), which
is built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 644,784 rows (620,926,992 bytes). The
estimated time for this step is 0.37 seconds.

SELECT * FROM SSURI.CUSLDSFL_SI
WHERE AUD_TIME >= (SELECT CAST(BATCH_START AS TIMESTAMP) FROM SSURI.CDC_BATCH)
AND AUD_TIME <= (SELECT CAST(BATCH_END AS TIMESTAMP) FROM SSURI.CDC_BATCH)

8) We do an all-AMPs RETRIEVE step from SSURI.CUSLDSFL_SI by way of
an all-rows scan with a condition of (
"(SSURI.CUSLDSFL_SI.Aud_Time <= :%SSQ18) AND
(SSURI.CUSLDSFL_SI.Aud_Time >= :%SSQ17)") into Spool 3
(group_amps), which is built locally on the AMPs. The size of
Spool 3 is estimated with no confidence to be 122,089,337 rows (
117,572,031,531 bytes). The estimated time for this step is 2
minutes and 54 seconds.

The subquery results are passed as variables (:%SSQ17, :%SSQ18), and the optimizer appears to skip partition elimination in this case.
The "no confidence" estimate stems from this.

Any insight is deeply appreciated...

Thanks
Sanjeev
8 REPLIES
Senior Apprentice

Re: Optimizer's inability to resolve variables for partitioning

Hi Sanjeev,
At the time the optimizer creates the execution plan, the actual rows in the CDC_BATCH table are unknown.

As this seems to be a single-row table, I'd suggest replacing the CDC_BATCH table with a view:
http://developer.teradata.com/node/7579

Dieter
Enthusiast

Re: Optimizer's inability to resolve variables for partitioning

Thanks for the response Dieter.

I tried creating the view, but that doesn't seem to change the plan.

CREATE VIEW SSURI.V_CDC1 (BATCH_START, BATCH_END)
AS
SELECT  CAST(BATCH_START AS TIMESTAMP(6)),
        CAST(BATCH_END AS TIMESTAMP(6))
FROM    SSURI.CDC_BATCH1;

EXPLAIN SELECT * FROM SSURI.CUSLDSFL_SI
WHERE AUD_TIME >= (SELECT BATCH_START FROM SSURI.V_CDC1)
AND   AUD_TIME <= (SELECT BATCH_END   FROM SSURI.V_CDC1)

 4) We do an all-AMPs RETRIEVE step from SSURI.CDC_BATCH1 by way of an
    all-rows scan with no residual conditions into Spool 1 (all_amps),
    which is built locally on the AMPs.  The size of Spool 1 is
    estimated with high confidence to be 1 row (35 bytes).  The
    estimated time for this step is 0.03 seconds.
 5) We do an all-AMPs DISPATCHER RETRIEVE step from Spool 1 (Last Use)
    by way of an all-rows scan and send the rows back to the
    Dispatcher.
 6) We do an all-AMPs RETRIEVE step from SSURI.CDC_BATCH1 by way of an
    all-rows scan with no residual conditions into Spool 2 (all_amps),
    which is built locally on the AMPs.  The size of Spool 2 is
    estimated with high confidence to be 1 row (35 bytes).  The
    estimated time for this step is 0.03 seconds.
 7) We do an all-AMPs DISPATCHER RETRIEVE step from Spool 2 (Last Use)
    by way of an all-rows scan and send the rows back to the
    Dispatcher.
 8) We do an all-AMPs RETRIEVE step from SSURI.CUSLDSFL_SI by way of
    an all-rows scan with a condition of (
    "(SSURI.CUSLDSFL_SI.Aud_Time <= :%SSQ18) AND
    (SSURI.CUSLDSFL_SI.Aud_Time >= :%SSQ17)") into Spool 3
    (group_amps), which is built locally on the AMPs.  The size of
    Spool 3 is estimated with no confidence to be 122,089,337 rows (
    117,572,031,531 bytes).  The estimated time for this step is 2
    minutes and 54 seconds.

Senior Apprentice

Re: Optimizer's inability to resolve variables for partitioning

Hi Sanjeev,

Don't create a view on top of the table; replace the table with the view:

CREATE VIEW SSURI.CDC_BATCH1 (BATCH_START, BATCH_END)
AS
SELECT CURRENT_DATE - 10, CURRENT_DATE;

SELECT * FROM SSURI.CUSLDSFL_SI CROSS JOIN CDC_BATCH1 AS x
WHERE AUD_TIME BETWEEN x.BATCH_START AND x.BATCH_END

Dieter

Enthusiast

Re: Optimizer's inability to resolve variables for partitioning

:) Thanks Dieter... but I should have mentioned that the dates are passed as parameters into a table, and we create a view on that table.

The view cannot be created on hard coded dates.

I used those dates to create a test case.

Here is the scenario:

We use Wherescape RED for our ETL builds and to store our environment's metadata.

There is a table called DSS_PARAMETER which stores parameter values defined in the tool.

We have defined parameters like 'Batch_Start_Window' and 'Batch_End_Window'.

Values for these parameters are incremented daily through a script and that's how we do our daily build.

CREATE SET TABLE D_BI_USRDB.DSS_PARAMETER
(dss_parameter_name VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
 dss_parameter_value VARCHAR(4000) CHARACTER SET LATIN NOT CASESPECIFIC,
 dss_parameter_comments VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC,
 dss_parameter_type VARCHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( dss_parameter_name );

SELECT  DSS_PARAMETER_NAME, CAST(DSS_PARAMETER_VALUE AS TIMESTAMP)
FROM    D_BI_USRDB.DSS_PARAMETER
WHERE   DSS_PARAMETER_NAME IN ('BATCH_BEGIN_WINDOW', 'BATCH_END_WINDOW');

Batch_End_Window      3/13/2013 00:00:00.000000
Batch_Begin_Window    3/11/2013 00:00:00.000000

So, if I create a view like

REPLACE VIEW SSURI.CDC_BATCH AS
SELECT  MAX(CASE WHEN DSS_PARAMETER_NAME = 'BATCH_BEGIN_WINDOW' THEN CAST(DSS_PARAMETER_VALUE AS TIMESTAMP) END) AS BATCH_START,
        MAX(CASE WHEN DSS_PARAMETER_NAME = 'BATCH_END_WINDOW'   THEN CAST(DSS_PARAMETER_VALUE AS TIMESTAMP) END) AS BATCH_END
FROM    D_BI_USRDB.DSS_PARAMETER
WHERE   DSS_PARAMETER_NAME IN ('BATCH_BEGIN_WINDOW', 'BATCH_END_WINDOW')

and use this view with a table to extract the last 2 days' data, the optimizer ignores partition elimination.

SELECT * FROM SSURI.CUSLDSFL_SI
CROSS JOIN SSURI.CDC_BATCH X
WHERE AUD_TIME BETWEEN X.BATCH_START AND X.BATCH_END

 10) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
    all-rows scan, which is joined to SSURI.CUSLDSFL_SI by way of an
    all-rows scan with no residual conditions.  Spool 7 and
    SSURI.CUSLDSFL_SI are joined using a product join, with a join
    condition of ("(SSURI.CUSLDSFL_SI.Aud_Time >= BATCH_START) AND
    (SSURI.CUSLDSFL_SI.Aud_Time <= BATCH_END)").  The result goes into
    Spool 6 (group_amps), which is built locally on the AMPs.  The
    size of Spool 6 is estimated with no confidence to be 203,482,229
    rows (206,534,462,435 bytes).  The estimated time for this step is
    3 minutes and 22 seconds.

Sanjeev

Senior Apprentice

Re: Optimizer's inability to resolve variables for partitioning

Hi Sanjeev,

Yes, you should have mentioned that it's more complicated.

It seems the explain in your second post is the best you can get.

In a case like that I'd prefer to put the query in a stored procedure, select the values into parameters, and plug them into the SQL.

Dieter
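
For reference, a sketch of how the stored-procedure approach above might look. This is an assumption, not tested code: the procedure name GET_CDC_ROWS and the VARCHAR sizes are invented, and the CAST of the timestamps to character may need a FORMAT adjusted to the session settings so that the literals parse as 'YYYY-MM-DD HH:MI:SS.FFFFFF'. The idea is that by the time the dynamic statement is prepared, the optimizer sees constant TIMESTAMP literals rather than :%SSQ variables, so partition elimination is possible:

```sql
-- Hypothetical sketch: read the batch window into local variables,
-- then build the SQL with literal timestamps.
REPLACE PROCEDURE SSURI.GET_CDC_ROWS()
DYNAMIC RESULT SETS 1
BEGIN
   DECLARE v_start, v_end TIMESTAMP(6);
   DECLARE sql_stmt VARCHAR(1000);
   DECLARE c1 CURSOR WITH RETURN ONLY FOR stmt1;

   -- CDC_BATCH is a single-row table, so SELECT ... INTO is safe here
   SELECT CAST(BATCH_START AS TIMESTAMP(6)), CAST(BATCH_END AS TIMESTAMP(6))
   INTO   :v_start, :v_end
   FROM   SSURI.CDC_BATCH;

   -- Plug the values in as TIMESTAMP literals; the CAST to VARCHAR may
   -- need an explicit FORMAT depending on session settings.
   SET sql_stmt = 'SELECT * FROM SSURI.CUSLDSFL_SI WHERE AUD_TIME BETWEEN TIMESTAMP '''
               || CAST(v_start AS VARCHAR(32)) || ''' AND TIMESTAMP '''
               || CAST(v_end   AS VARCHAR(32)) || '''';

   PREPARE stmt1 FROM sql_stmt;
   OPEN c1;
END;
```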

Enthusiast

Re: Optimizer's inability to resolve variables for partitioning

Thanks Dieter, much appreciated.

Teradata Employee

Re: Optimizer's inability to resolve variables for partitioning

Sanji,

This might be a little late, but have you tried creating a sparse join index for a scenario like this? I believe Teradata 13 and 14 both support this.
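
A rough sketch of what such a sparse join index might look like; the index name and the cutoff date here are invented for illustration. Note the limitation for this thread's scenario: the WHERE condition must be a constant, so the index cannot reference DSS_PARAMETER and would have to be recreated (or the window widened) as the batch dates advance:

```sql
-- Hypothetical sketch: a sparse join index covering only recent rows.
-- The cutoff date is hard-coded, so the definition needs periodic
-- maintenance as the batch window moves forward.
CREATE JOIN INDEX SSURI.CUSLDSFL_RECENT_JI AS
SELECT TRACKINGNUM, AUD_TIME
FROM   SSURI.CUSLDSFL_SI
WHERE  CAST(AUD_TIME AS DATE) >= DATE '2013-03-01'
PRIMARY INDEX (TRACKINGNUM);
```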