Why Predicate Push Down mechanism doesn`t happen?

Database

Why Predicate Push Down mechanism doesn`t happen?

Hello!

I`m interested in Predicate Push Down mechanism in Teradata.

Could you possibly explain why it doesn`t happen in example below and give a piece of advice how to fix it?

Thanks in advance.

Description of the example:

UAT_DM_CF.DM_CF_CARD_TURN - table with customer's salary information month-by-month. Primary Index: AGREEMENT_RK. Primary Key: AGREEMENT_RK, YEAR_MONTH. CUSTOMER_MDM_ID - customer identifier, that changes over time.

UAT_DM_CF.TECH_MDM_RELATIONSHIP - table with history of all relationships between "old" and "new" customer identifiers. This table are used to sync customer identifiers in UAT_DM_CF.DM_CF_CARD_TURN. Primary Index: PREVIOUS_CUSTOMER_ID. Primary Key: PREVIOUS_CUSTOMER_ID, EFFECTIVE_TO_DTTM.

UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT - view with actual relationship of "old" and "new" customer identifiers (view).

It has the following structure:

REPLACE VIEW UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT AS LOCKING ROW FOR ACCESS
SELECT
PREVIOUS_CUSTOMER_RK,
PREVIOUS_CUSTOMER_ID,
CUSTOMER_RK,
CUSTOMER_ID
FROM
UAT_DM_CF.TECH_MDM_RELATIONSHIP
WHERE
EFFECTIVE_TO_DTTM = CAST('5999-12-31 00:00:00' AS TIMESTAMP(0))
AND DELETED_FLG = '0';

UAT_VDM_CF.DM_CF_CARD_TURN - view with synchronized customer identifiers.

It has the following structure:

REPLACE VIEW UAT_VDM_CF.DM_CF_CARD_TURN AS LOCKING ROW FOR ACCESS
SELECT
crdt.AGREEMENT_RK,
crdt.CONTRACT_ID,
crdt.YEAR_MONTH,
COALESCE(rel.CUSTOMER_ID, crdt.CUSTOMER_MDM_ID) as CUSTOMER_MDM_ID,
crdt.CUSTOMER_MDM_ID as OLD_CUSTOMER_MDM_ID,
crdt.CURRENCY_ISO_ID,
crdt.SALARY_FLG,
crdt.DEBET_TURN_AMT,
crdt.DEBET_TURN_SALARY_AMT,
crdt.CREDIT_TURN_AMT,
crdt.BALANCE_AMT,
crdt.EMPLOYER_TAX_PAYER_NUM,
crdt.SOURCE_SYSTEM_CD,
crdt.BUSINESS_DTTM,
crdt.PROCESSED_DTTM,
crdt.LAYER_ID,
crdt.LOAD_ID
FROM
UAT_DM_CF.DM_CF_CARD_TURN as crdt
LEFT JOIN
UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT as rel
ON
crdt.CUSTOMER_MDM_ID = rel.PREVIOUS_CUSTOMER_ID;

After executing two following queries we surprisingly got dramatic difference in response time:

select * from UAT_DM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123'; --0.02 seconds
select * from UAT_VDM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123'; --17 minutes and 42 seconds

The question is why in the second query the condition CUSTOMER_MDM_ID = '123' applies after dynamic hash join ?

Explain Texts:

Explain select * from UAT_DM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123';

1) First, we lock a distinct UAT_DM_CF."pseudo table" for read on a
RowHash to prevent global deadlock for UAT_DM_CF.DM_CF_CARD_TURN.
2) Next, we lock UAT_DM_CF.DM_CF_CARD_TURN for read.
3) We do an all-AMPs RETRIEVE step from UAT_DM_CF.DM_CF_CARD_TURN by
way of index # 4 "UAT_DM_CF.DM_CF_CARD_TURN.CUSTOMER_MDM_ID =
'123'" with a residual condition of (
"UAT_DM_CF.DM_CF_CARD_TURN.CUSTOMER_MDM_ID = '123'") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with high confidence to be 43 rows (19,307
bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.
No rows are returned to the user as the result of statement 2.
The total estimated time is 0.02 seconds.

Explain select * from UAT_VDM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123';

1) First, we lock UAT_DM_CF.TECH_MDM_RELATIONSHIP in view
UAT_VDM_CF.DM_CF_CARD_TURN for access, and we lock UAT_DM_CF.crdt
in view UAT_VDM_CF.DM_CF_CARD_TURN for access.
2) Next, we do an all-AMPs RETRIEVE step from a single partition of
UAT_DM_CF.TECH_MDM_RELATIONSHIP in view UAT_VDM_CF.DM_CF_CARD_TURN
with a condition of ("UAT_DM_CF.TECH_MDM_RELATIONSHIP in view
UAT_VDM_CF.DM_CF_CARD_TURN.EFFECTIVE_TO_DTTM = TIMESTAMP
'5999-12-31 00:00:00'") with a residual condition of (
"(UAT_DM_CF.TECH_MDM_RELATIONSHIP.EFFECTIVE_TO_DTTM = TIMESTAMP
'5999-12-31 00:00:00') AND (UAT_DM_CF.TECH_MDM_RELATIONSHIP in
view UAT_VDM_CF.DM_CF_CARD_TURN.DELETED_FLG = '0')") into Spool 2
(all_amps) (compressed columns allowed), which is duplicated on
all AMPs. The size of Spool 2 is estimated with high confidence
to be 1,256,112 rows (199,721,808 bytes). The estimated time for
this step is 0.15 seconds.
3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to UAT_DM_CF.crdt in view
UAT_VDM_CF.DM_CF_CARD_TURN by way of an all-rows scan with no
residual conditions. Spool 2 and UAT_DM_CF.crdt are right outer
joined using a dynamic hash join, with condition(s) used for
non-matching on right table ("NOT (UAT_DM_CF.crdt.CUSTOMER_MDM_ID
IS NULL)"), with a join condition of (
"UAT_DM_CF.crdt.CUSTOMER_MDM_ID = PREVIOUS_CUSTOMER_ID"). The
input table UAT_DM_CF.crdt will not be cached in memory. The
result goes into Spool 3 (all_amps) (compressed columns allowed),
which is built locally on the AMPs. The result spool file will
not be cached in memory. The size of Spool 3 is estimated with
low confidence to be 1,207,507,023 rows (527,680,569,051 bytes).
The estimated time for this step is 7 minutes and 20 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan with a condition of ("(( CASE WHEN (NOT
(CUSTOMER_ID IS NULL )) THEN (CUSTOMER_ID) ELSE (CUSTOMER_MDM_ID)
END ))= '123'") into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be
1,207,507,023 rows (624,281,130,891 bytes). The estimated time
for this step is 10 minutes and 21 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.
No rows are returned to the user as the result of statement 2.
The total estimated time is 17 minutes and 42 seconds.

The tables have the following structure:

show table UAT_DM_CF.DM_CF_CARD_TURN;

CREATE MULTISET TABLE UAT_DM_CF.DM_CF_CARD_TURN ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
CUSTOMER_MDM_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC,
CONTRACT_ID VARCHAR(50) CHARACTER SET UNICODE CASESPECIFIC,
SOURCE_SYSTEM_CD VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC COMPRESS ('00006','00040','00018','00051','00000'),
YEAR_MONTH DATE FORMAT 'yyyy-mm-dd',
SALARY_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC COMPRESS ('0','1'),
CURRENCY_ISO_ID VARCHAR(3) CHARACTER SET UNICODE CASESPECIFIC COMPRESS '810',
DEBET_TURN_SALARY_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
CREDIT_TURN_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
DEBET_TURN_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
BALANCE_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
BUSINESS_DTTM TIMESTAMP(0) NOT NULL,
PROCESSED_DTTM TIMESTAMP(0) NOT NULL,
AGREEMENT_RK DECIMAL(18,0) NOT NULL,
LAYER_ID INTEGER NOT NULL,
LOAD_ID INTEGER NOT NULL,
EMPLOYER_TAX_PAYER_NUM VARCHAR(200) CHARACTER SET UNICODE CASESPECIFIC COMPRESS )
PRIMARY INDEX ( AGREEMENT_RK )
PARTITION BY RANGE_N(YEAR_MONTH BETWEEN DATE '2010-11-01' AND DATE '2020-12-01' EACH INTERVAL '1' MONTH ,
NO RANGE)
INDEX ( CUSTOMER_MDM_ID );

show table UAT_DM_CF.TECH_MDM_RELATIONSHIP;

CREATE MULTISET TABLE UAT_DM_CF.TECH_MDM_RELATIONSHIP ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
PREVIOUS_CUSTOMER_RK INTEGER NOT NULL,
PREVIOUS_CUSTOMER_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
PREVIOUS_CUSTOMER_TYPE_CD CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
CUSTOMER_RK INTEGER NOT NULL,
CUSTOMER_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
CUSTOMER_TYPE_CD CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
SOURCE_SYSTEM_CD VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
FILE_ID INTEGER NOT NULL,
PROCESSED_DTTM TIMESTAMP(0) NOT NULL,
EFFECTIVE_FROM_DTTM TIMESTAMP(0) NOT NULL,
EFFECTIVE_TO_DTTM TIMESTAMP(0) NOT NULL,
LOAD_ID INTEGER NOT NULL,
DELETED_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
IS_ACTIVE_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL)
PRIMARY INDEX ( PREVIOUS_CUSTOMER_ID )
PARTITION BY RANGE_N(CAST((EFFECTIVE_TO_DTTM ) AS DATE AT TIME ZONE INTERVAL '3:00' HOUR TO MINUTE ) BETWEEN DATE '2010-01-01' AND DATE '2017-12-31' EACH INTERVAL '3' DAY ,
DATE '5999-12-31' AND DATE '5999-12-31' EACH INTERVAL '1' DAY ,
NO RANGE)
INDEX ( PREVIOUS_CUSTOMER_RK );

The tables have the following statistics:

show statistics on UAT_DM_CF.DM_CF_CARD_TURN;

COLLECT STATISTICS
-- default SYSTEM SAMPLE PERCENT
-- default THRESHOLD 10 DAYS
-- default THRESHOLD 10.00 PERCENT
COLUMN ( AGREEMENT_RK ) ,
COLUMN ( CUSTOMER_MDM_ID )
ON UAT_DM_CF.DM_CF_CARD_TURN ;

Date Time Unique Value Column Names Column Dictionary Name Column SQL Names Column Names UEscape
1 16/03/01 14:54:56 1,207,507,022 * * "*" NULL
2 16/02/19 11:27:39 37,533,183 AGREEMENT_RK AGREEMENT_RK AGREEMENT_RK NULL
3 16/03/01 14:54:56 18,000,808 CUSTOMER_MDM_ID CUSTOMER_MDM_ID CUSTOMER_MDM_ID NULL

show statistics on UAT_DM_CF.TECH_MDM_RELATIONSHIP;

COLLECT STATISTICS
-- default SYSTEM SAMPLE PERCENT
-- default THRESHOLD 10 DAYS
-- default THRESHOLD 10.00 PERCENT
COLUMN ( PREVIOUS_CUSTOMER_ID ) ,
COLUMN ( CUSTOMER_ID ) ,
COLUMN ( CUSTOMER_RK ) ,
COLUMN ( PREVIOUS_CUSTOMER_RK ) ,
COLUMN ( EFFECTIVE_TO_DTTM,DELETED_FLG ) ,
COLUMN ( EFFECTIVE_TO_DTTM ) ,
COLUMN ( DELETED_FLG )
ON UAT_DM_CF.TECH_MDM_RELATIONSHIP ;

Date Time Unique Value Column Names Column Dictionary Name Column SQL Names Column Names UEscape
1 16/03/01 14:58:19 16,250 * * "*" NULL
2 16/02/18 20:05:36 16,143 PREVIOUS_CUSTOMER_ID PREVIOUS_CUSTOMER_ID PREVIOUS_CUSTOMER_ID NULL
3 16/02/18 20:04:54 8,001 CUSTOMER_ID CUSTOMER_ID CUSTOMER_ID NULL
4 16/02/18 20:04:55 8,001 CUSTOMER_RK CUSTOMER_RK CUSTOMER_RK NULL
5 16/02/18 20:05:38 16,143 PREVIOUS_CUSTOMER_RK PREVIOUS_CUSTOMER_RK PREVIOUS_CUSTOMER_RK NULL
6 16/03/01 14:58:17 13 EFFECTIVE_TO_DTTM,DELETED_FLG EFFECTIVE_TO_DTTM,DELETED_FLG EFFECTIVE_TO_DTTM,DELETED_FLG NULL
7 16/03/01 14:58:18 12 EFFECTIVE_TO_DTTM EFFECTIVE_TO_DTTM EFFECTIVE_TO_DTTM NULL
8 16/03/01 14:58:19 2 DELETED_FLG DELETED_FLG DELETED_FLG NULL

1 REPLY
Junior Contributor

Re: Why Predicate Push Down mechanism doesn`t happen?

The optmizer can't push this, because CUSTOMER_MDM_ID is a column in query #1, but the result of a calculation in #2:  COALESCE(rel.CUSTOMER_ID, crdt.CUSTOMER_MDM_ID)