Need help in tuning a query

Database

Need help in tuning a query

Here is the explain that is straight forward but sometimes the same query runs in 10-15mins and sometimes it takes longer than 4-5hours for the same data volume.

Explain SELECT COOKIE_ID, CAST(MAX(VISIT_START_DTTM) AS DATE) AS LATEST_VISIT_DATE,

MAX(CASE WHEN CAST(VISIT_START_DTTM AS DATE) BETWEEN '2009-05-01' AND '2010-04-30' THEN 'Y'

ELSE 'N' END) AS ACTIVITY_2yearsbefore,

MAX(CASE WHEN CAST(VISIT_START_DTTM AS DATE) BETWEEN '2010-05-01' AND '2011-04-30' THEN 'Y'

ELSE 'N' END) AS ACTIVITY_prevyear,

MAX(CASE WHEN CAST(VISIT_START_DTTM AS DATE) BETWEEN '2011-05-01' AND '2012-04-30' THEN 'Y'

ELSE 'N' END) AS ACTIVITY_currentyear

FROM CDW.VISIT A

WHERE A.LOCATION_ID = 2

AND CAST(VISIT_START_DTTM AS DATE) <= '2012-04-30'

GROUP BY COOKIE_ID

1) First, we lock CDW_Tables.VISIT in view CDW.VISIT for access.

2) Next, we do an all-AMPs SUM step to aggregate from

CDW_Tables.VISIT in view CDW.VISIT by way of an all-rows scan with

a condition of ("(CDW_Tables.VISIT in view CDW.VISIT.Location_Id =

2) AND ((CAST((CDW_Tables.VISIT in view CDW.VISIT.Visit_Start_Dttm)

AS DATE))<= DATE '2012-04-30')") , grouping by field1 (

CDW_Tables.VISIT.Cookie_Id). Aggregate Intermediate Results are

computed globally, then placed in Spool 4. The input table will

not be cached in memory, but it is eligible for synchronized

scanning. The aggregate spool file will not be cached in memory.

The size of Spool 4 is estimated with no confidence to be

141,283,585 rows (7,911,880,760 bytes). The estimated time for

this step is 2 minutes and 23 seconds.

3) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of

an all-rows scan into Spool 2 (group_amps), which is built locally

on the AMPs. The result spool file will not be cached in memory.

The size of Spool 2 is estimated with no confidence to be

141,283,585 rows (6,075,194,155 bytes). The estimated time for

this step is 6.21 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 2 are sent back to the user as the result of

statement 1. The total estimated time is 2 minutes and 29 seconds.

The UPI on the visit table is on VISIT_ID, not sure why it would perform so differently under different cpu loads.

Any ways to tune this query? The table does not have any PPIs and would not make any difference to this query since the CAST(VISIT_START_DTTM AS DATE) <= '2012-04-30' is actually pulling all the data because the min(visit_Start_dttm) in the table itself is 2009-05-01.

Thanks

Ashwini

12 REPLIES
N/A

Re: Need help in tuning a query

Are these queries logged in DBQL?

If yes, check if the resource consumption is really different. You might be faceing system workload issues.

Re: Need help in tuning a query

Yes they are in DBQL, what do i really need to look at to understand resource consumption? Do I need to look at HotAmpCPU, CPU Skew, I/O Sku etc? Appreciate your response

Re: Need help in tuning a query

Check the AMPCPU Time.. Is there a database patch upgrade done recently ??

Re: Need help in tuning a query

No upgrade as far as I know. I looked at the DBQLogTbl and the AMP CPU Time was 18,367. Is this too high?









AMP CPU Time MaxAmpCPUTime parsercputime totalIOCount MaxAmpIO TDWMEstTotalTime
18,367.83 123.047 0.225 10,865,418.00 66,936.00 350,529.14

Let me know your thoughts on this!!

Thanks

Ashwini

Re: Need help in tuning a query

Looks like it could be a statistics problem to me (no confidence everywhere).  Collect the necessary stats, rerun the query, then see if your plan is much better.

N/A

Re: Need help in tuning a query

Hi Ashwini,

you posted only one query log entry.

My understanding is that different runs differ in execution time / ellapsedtimes. Do they differ in AMPCPUtime, totalIOCount etc.?

N/A

Re: Need help in tuning a query

Hi goldmine,

how could this plan be improved? Its doning a plain aggregation, no joins etc. Stats might give better estimates but will not change the plan.

Re: Need help in tuning a query

I would say it is just generally a good idea to do it anyway.  Whether it improves the query... who knows... pretty low hanging fruit though.

Re: Need help in tuning a query

It is evident from the plan that the query is unable to use the PI of the table and is pulling out the entire table into spool which is global, meaning the data will first be pulled out into a spool and then it will be processed. I would try out following options:

(a) see if cookie id can be made the PI of the table without skewing it too much, this will make all calculations happen locally in the AMP and run very fast even with high data volumes owing to MPP. This may be the only real option to try, rest are all guesses with a very slim chance of working

(b) if PI cannot be changed, a PPI with date ranges may work here if it is aligned to your case statement filters, although I have a strong feeling it won't work

(c) lastly, try converting your case statements into a small control table that stores the boundary values for each case and is joined with your master table. With correct stats the optimizer should replicate this table across all AMPs and partially restrict the data that goes into the spool. Again, the issue here is if the filter conditions will pull out the entire table - this would make no difference

I'm curious to hear more tuning options from others based on only the explain plan.