Difference in output

UDA

Difference in output

Hi All,

Mentioned below are 2 queries .Can anyone have a look & tell me whether the output of both the queries would be same or not.

Query 1

SELECT
S.ACCT_EVENT_TYPE_CD
,S.CUST_ACCT_EVENT_ID
,S.EVENT_USED_BY_CD
,S.EVENT_SEQUENCE_CD
,S.DW_AS_OF_DT
,S.CUST_ACCT_EVENT_CD
,S.EXT_EVENT_REF_CD
,S.EVENT_START_DT
,S.EVENT_END_DT
,S.AUDIT_USER_ID
,S.AUDIT_UPDT_TS
,S.DW_ACTN_IN
,S.DW_DATA_CHG_TS
,S.DW_LOAD_TS
FROM
IDW_STAGE.CUST_ACCT_EVENT_S S
LEFT OUTER JOIN
(SELECT ACCT_ID,
CUST_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) T
ON
((T.ACCT_ID=0 AND T.CUST_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='C') OR
(T.CUST_ID=0 AND T.ACCT_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='A') )
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,date '0001-01-01')
AND
S.DW_ACTN_IN in ('I','U')

============================================================================

Query 2

SELECT
S.ACCT_EVENT_TYPE_CD
,S.CUST_ACCT_EVENT_ID
,S.EVENT_USED_BY_CD
,S.EVENT_SEQUENCE_CD
,S.DW_AS_OF_DT
,S.CUST_ACCT_EVENT_CD
,S.EXT_EVENT_REF_CD
,S.EVENT_START_DT
,S.EVENT_END_DT
,S.AUDIT_USER_ID
,S.AUDIT_UPDT_TS
,S.DW_ACTN_IN
,S.DW_DATA_CHG_TS
,S.DW_LOAD_TS
FROM
IDW_STAGE.CUST_ACCT_EVENT_S S
LEFT OUTER JOIN
(SELECT ACCT_ID,CUST_ID,EVNT_USE_BY_CD,EVNT_SEQ_CD,MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) T
ON
T.ACCT_ID=0 AND T.CUST_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='C'
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,date '0001-01-01')
AND
S.DW_ACTN_IN in ('I','U')
UNION
SELECT
S.ACCT_EVENT_TYPE_CD
,S.CUST_ACCT_EVENT_ID
,S.EVENT_USED_BY_CD
,S.EVENT_SEQUENCE_CD
,S.DW_AS_OF_DT
,S.CUST_ACCT_EVENT_CD
,S.EXT_EVENT_REF_CD
,S.EVENT_START_DT
,S.EVENT_END_DT
,S.AUDIT_USER_ID
,S.AUDIT_UPDT_TS
,S.DW_ACTN_IN
,S.DW_DATA_CHG_TS
,S.DW_LOAD_TS
FROM
IDW_STAGE.CUST_ACCT_EVENT_S S
LEFT OUTER JOIN
(SELECT ACCT_ID,CUST_ID,EVNT_USE_BY_CD,EVNT_SEQ_CD,MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) T
ON
T.CUST_ID=0 AND T.ACCT_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='A'
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,date '0001-01-01')
AND
S.DW_ACTN_IN in ('I','U')
=========================================================

Is there going to be some difference in query due to Left Outer join.Probably the output would have been same I there would have been Inner join.

Regards
12 REPLIES
Teradata Employee

Re: Difference in output

Not the same, because the term COALESCE(T.DW_EFF_DT,date '0001-01-01') in each subquery evaluates to date'0001-01-01' for all the 'non-matching' rows. In effect your first query is returning rows that satisfy a "NOT (A OR B)" type condition but your second query is "(NOT A) OR (NOT B)" which is not logically equivalent. Alternative "(NOT A) AND (NOT B)" (which would be logically equivalent) would use INTERSECT instead of UNION.

Re: Difference in output

Thanxs fred for the reply.
I think the problem with the query is because of COALESCE & left outer join.COALESCE is making the query 1 to behave as Inner join.
what's ur views on the same
Teradata Employee

Re: Difference in output

Not inner join, exactly; in effect it's being used as a "NOT EXISTS" condition:

SELECT
S.ACCT_EVENT_TYPE_CD
,S.CUST_ACCT_EVENT_ID
,S.EVENT_USED_BY_CD
,S.EVENT_SEQUENCE_CD
,S.DW_AS_OF_DT
,S.CUST_ACCT_EVENT_CD
,S.EXT_EVENT_REF_CD
,S.EVENT_START_DT
,S.EVENT_END_DT
,S.AUDIT_USER_ID
,S.AUDIT_UPDT_TS
,S.DW_ACTN_IN
,S.DW_DATA_CHG_TS
,S.DW_LOAD_TS
FROM
IDW_STAGE.CUST_ACCT_EVENT_S S
WHERE
S.DW_ACTN_IN in ('I','U')
AND NOT EXISTS
(SELECT 1 FROM
(SELECT ACCT_ID,
CUST_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) T
WHERE
((T.ACCT_ID=0 AND T.CUST_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='C') OR
(T.CUST_ID=0 AND T.ACCT_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='A') )
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
AND T.DW_EFF_DT >= S.DW_AS_OF_DT
);

Re: Difference in output

Thanxs a lot Fred.
Saw your reply today & i have got the logic mentioned by you.
Actually I was looking to tune query 1 as it was taking a lot of time to fetch records.Query1 was using the index but due to OR condition the usage of Index was nullified.But in Query2 Index was used & OR condition was not present which made the query to run faster.
Looks like I have no alternative & will have to use the query1.
Or is there any way to avoid OR condition ,with no difference in output?

Teradata Employee

Re: Difference in output

As I said before, you could probably just change UNION to INTERSECT but that would not be my first choice. It's hard to say exactly what would be best without a lot more detail, but I'd consider splitting the conditions into multiple JOINs, e.g.

...
LEFT OUTER JOIN
(SELECT ACCT_ID,
CUST_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) T
ON
(T.ACCT_ID=0 AND T.CUST_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='C')
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
LEFT OUTER JOIN
(SELECT ACCT_ID,
CUST_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4) U
ON
(U.CUST_ID=0 AND U.ACCT_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='A')
AND U.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND U.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(U.DW_EFF_DT,date '0001-01-01')
AND
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,date '0001-01-01')
...

The optimizer might recognize derived table expressions T and U are the same and reuse the same spool. I'd also consider pushing the ACCT_ID=0 and CUST_ID=0 conditions inside the aggregate derived tables, which would make them different but might be more efficient overall.

...
LEFT OUTER JOIN
(SELECT CUST_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
WHERE ACCT_ID=0
GROUP BY 1,2,3) T
ON
(T.CUST_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='C')
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
LEFT OUTER JOIN
(SELECT ACCT_ID,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM IDW_DATA.CUST_ACCT_EVENT_T
GROUP BY 1,2,3) U
ON
(U.ACCT_ID=S.CUST_ACCT_EVENT_ID AND S.ACCT_EVENT_TYPE_CD='A')
AND U.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND U.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(U.DW_EFF_DT,date '0001-01-01')
AND
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,date '0001-01-01')
...
N/A

Re: Difference in output

The main problem within that query is the OR, this is always bad for the optimizer, because it can't calculate a rowhash using that column.

But the OR is not an or, in fact it's an XOR:
Join on ACCT_ID XOR CUST_ID, only one of both conditions can be true.

An XOR can be replaced by a CASE and you can calculate if it's an account or a customer information.

Hopefully this is correct, of course you should compare the result sets :-)

FROM
CUST_ACCT_EVENT_S S
LEFT OUTER JOIN
(
SELECT
CASE WHEN ACCT_ID = 0 THEN CUST_ID ELSE ACCT_ID END AS id,
CASE WHEN ACCT_ID = 0 THEN 'A' ELSE 'C' END AS type_cd,
EVNT_USE_BY_CD,
EVNT_SEQ_CD,
MAX(DW_EFF_DT) DW_EFF_DT
FROM CUST_ACCT_EVENT_T
GROUP BY 1,2,3,4
) T
ON T.ID=S.CUST_ACCT_EVENT_ID
AND T.TYPE_CD = S.ACCT_EVENT_TYPE_CD
AND T.EVNT_USE_BY_CD = S.EVENT_USED_BY_CD
AND T.EVNT_SEQ_CD = S.EVENT_SEQUENCE_CD
WHERE
S.DW_AS_OF_DT > COALESCE(T.DW_EFF_DT,DATE '0001-01-01')
AND
S.DW_ACTN_IN IN ('I','U')

If you use

AND TRANSLATE(T.TYPE_CD USING UNICODE_TO_LATIN) = S.ACCT_EVENT_TYPE_CD

instead of

AND T.TYPE_CD = S.ACCT_EVENT_TYPE_CD

this will enable the optimizer to join directly to the UPI of CUST_ACCT_EVENT_S.

Dieter

Re: Difference in output

Hello,

My table has col a which is of type unicode and has some chinese and english characters in it. It would show translation errors if I try to load them to column with Latin type.I didnt want to used TRANSLATE WITH ERROR so I am fine to change my target table's datatype to unicode to avoid loss of data.However, I have a problem:-

Sel COALESCE(COLA,' ') (VARCHAR(2000)) FROM Tbl; <--runs fine

Sel COALESCE(COLA,' ') (VARCHAR(2000)) FROM Tbl
UNION ALL
Sel NULL (VARCHAR(2000)) FROM Tbl; <--- runs fine

Sel NULL (VARCHAR(2000)) FROM Tbl
UNION ALL
Sel COALESCE(COLA,' ') (VARCHAR(2000)) FROM Tbl; <-- Error: 6706.

I am confused why it occured. I thought translation error could occur if I am trying to insert a untranslatable unicode char to latin.However, this is no insert here, just a select.

To keep the query short & simple, I have shown only the column where I am facing this problem. There are other columns as well to the query which do not need to be translated as they are both unicode and latin compatible.The reason why I had to use UNION with NULL is because I have to transpose source tbl colmns to rows between 2 columns(1 varchar and other date type) of my target table.

P.S. This may not be the correct feed to post this question.However, I am not sure how can I post a new thread.
N/A

Re: Difference in output

The first select within a set operation determines a column's data type.
The "NULL (VARCHAR(2000))" results in a LATIN character set:

SELECT TYPE (NULL (VARCHAR(2000))), TYPE('' (VARCHAR(2000)));

Type(Null) Type('')
VARCHAR(2000) VARCHAR(2000) CHARACTER SET UNICODE

So avoid the NULL in the first select or force it to UNICODE:
TRANSLATE((NULL (VARCHAR(2000))) USING latin_to_unicode)

Btw, if you go to one of the forums there's a "Create a new forum topic" link at the end of the forum description.

Dieter

Re: Difference in output

Thanks Dieter, that was really informative.
How should I decide whether the character should be set as LATIN or UNICODE for a column while defining its DDL? I know some UNICODE Chars arent translatable to LATIN.Why dont we always go with UNICODE.Does LATIN provide any benefit over UNICODE?

- I got that link yesterday and posted a few other questions as well. :)