Need help in tuning a query

Database
Fan

Re: Need help in tuning a query

I have re-written the query below.  I added numbers as comments that refer to footnotes explaining my reasoning. 

select


b.COOKIE_ID,


max( b.VISIT_DATE ) as latest_visit_date, -- 7, 8


max(case when b.VISIT_DATE between '2009-05-01' and '2010-04-30' -- 7


then 'Y' else 'N' end) as ACTIVITY_2yearsbefore,


max(case when b.VISIT_DATE between '2010-05-01' and '2011-04-30' -- 7


then 'Y' else 'N' end) as ACTIVITY_prevyear,


max(case when b.VISIT_DATE between '2011-05-01' and '2012-04-30' -- 7


then 'Y' else 'N' end) as ACTIVITY_currentyear

from -- 1

(


select a.COOKIE_ID,


cast( a.VISIT_START_DTTM as date ) as VISIT_DATE


from CDW.VISIT as a


where -- 2


a.LOCATION_ID = 2 and


VISIT_DATE < date '2012-05-01' -- 3, 4, 5, 6

) as b

group by b.COOKIE_ID

1. Use a derived table to convert VISIT_START_DTTM to DATE format only once.

2. Put as many of the WHERE conditions inside the derived table as possible.

3. Use the alias name for the new date column wherever possible.  This avoids the potential additonal, unnecessary re-casting of the same column and makes the query easier to read.

4. This might not be an issue with Teradata, but it is best practice for Oracle: Use a single comparison when possible rather than a combination, because a combo like "<=" forces the engine to re-write the whole query as "<" OR "=", which is like running 2 separate UNION queries.

5. Here, because the "=" was dropped, the date constant had to be adjusted by one day.

6. It's best practice to convert constants explicitly to the DATE data type.

7. After being converted from a DATETIME column to a DATE column once, it can be referred to by its alias name - again, avoids potential re-casting and makes the query easier to read.

8. The original query actually has the VISIT_START_DTTM treated in 2 different ways, and I'm not sure whether that was on purpose or by accident. The original 2nd column FIRST took the maximum DATETIME value and THEN took the MAX; all the remaining original columns FIRST cast from DATETIME to DATE and THEN perform the MAX aggregation. I re-wrote the query to deal with them all in the same way; if that doesn't accomplish what you want, feel free to re-write.

Junior Contributor

Re: Need help in tuning a query

Some remarks on the comments:

1. ok, but this only avoids cut&pasting the same calculation several times, it will not influence the optimizer. Unless you do a DITINCT/GROUP BY/OLAP within the DT the optimizer will remove it.

2. The optmizer will automatically push conditions into the Derived Table, again this should not change the plan

3. re-using the alias is only conveniant for the programmer, the resolver will replace it with the original calculation before it's passed to the optimizer

4. I doubt this is true in Oracle (maybe it was decades ago), it's definitely not true in TD, simply compare Explains

6. ok, but then you should do it within the CASEs, too :-)

And best pratice would be to completely avoid the cast to a date, as it's not needed at all (and as a side effect existing statistics will be usable again). Instead of casting the column change the literals, e.g.

VISIT_START_DTTM < TIMESTAMP '2012-05-01 00:00:00'

or

VISIT_START_DTTM < CAST(DATE '2012-05-01' AS TIMESTAMP)

The optimizer will treat both exactly the same.

DIeter

Fan

Re: Need help in tuning a query

I misstated #8.  I should have written, "The original 2nd column was created by first taking a MAX then doing a CAST, while the other columns first did the CAST then took the MAX value."