I am trying to determine the length of time a query takes to run under Teradata. We run TD V12.
The view DBC.QryLog contains metrics showing the time a query was submitted and the time it first returned data to the client, but not the time it returned the last byte of data to the client. It shows an "ElapsedTime", but this seems to be the time from submitting the query to the first byte returned, which is no use to us.
Can anyone tell me how to measure the total start-to-finish time of a query, or give me a pointer?
In my experience, the duration between StartTime and FirstRespTime (aka ElapsedTime) is more valuable in terms of the actual query performance than the last response time. The ElapsedTime is the time it took Teradata to evaluate the query given the current workload of the environment. This takes into account other queries and workload re-classification by Teradata Dynamic Workload Management.
Including the LastRespTime into the equation begins to include factors external to the Teradata environment such as the LAN/WAN/VPN and ability of the client to render the data as it is being returned by Teradata. In other words, factors beyond your control to tune and improve upon.
I have found that the metrics AMPCPUTime, TotalIOCount, the ratio between those two values, and (AMPCPUTime / <number of AMPs>) / MaxAMPCPUTime (CPU efficiency) are more important to me than ElapsedTime.
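The two derived figures above can be sketched in Python; the function names and the sample numbers are my own illustrations, only the formulas come from the metrics named above.

```python
# Sketch of the DBQL-derived metrics described above. The argument names
# mirror DBC.QryLog columns, but the example values are made up.

def cpu_io_ratio(amp_cpu_time, total_io_count):
    """AMPCPUTime per I/O: a high value suggests CPU-bound work."""
    return amp_cpu_time / total_io_count

def cpu_efficiency(amp_cpu_time, num_amps, max_amp_cpu_time):
    """(AMPCPUTime / #AMPs) / MaxAMPCPUTime: 1.0 means the busiest AMP
    did no more than the average AMP, i.e. perfectly even work."""
    return (amp_cpu_time / num_amps) / max_amp_cpu_time

# Example: 10 AMPs, 40 CPU-seconds in total, busiest AMP used 8 seconds.
print(cpu_efficiency(40.0, 10, 8.0))   # 0.5 -> noticeably skewed
print(cpu_io_ratio(40.0, 100000))
```

Values well below 1.0 point at skew: one AMP is doing far more than its share of the work.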
I hope this helps. If it does not, any further elaboration on what you are trying to accomplish may help me (or others) improve our responses/suggestions.
I am not worried about performance at the minute; I do not need to tune queries. Our system is going live with a huge SAS and data-mining community, and these users run in the region of 50,000 queries a month. I currently total the queries into "buckets" depending on their elapsed time, i.e. it could break down that for a particular month we have:
20,000 less than 30 seconds
8,000 less than a minute
9,000 less than 5 minutes
...
50 over 7 hours
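The bucketing above can be sketched in Python; the exact thresholds are assumptions modelled on the figures quoted (30 s, 1 min, 5 min, ... 7 h), and the sample elapsed times are invented.

```python
# Minimal sketch of elapsed-time bucketing for SLA reporting.
import bisect

BOUNDS = [30, 60, 300, 1800, 3600, 25200]   # bucket upper edges, in seconds
LABELS = ["<30s", "<1m", "<5m", "<30m", "<1h", "<7h", ">=7h"]

def bucket(elapsed_seconds):
    """Return the label of the first bucket the query fits into."""
    return LABELS[bisect.bisect_right(BOUNDS, elapsed_seconds)]

counts = {}
for t in [12, 45, 200, 9000, 30000]:        # sample elapsed times, seconds
    counts[bucket(t)] = counts.get(bucket(t), 0) + 1
print(counts)
```

In practice the same grouping would be a CASE expression over the DBQL elapsed-time column, but the threshold logic is the same.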
We can then have realistic, workload-based Service Level Agreements based on the percentages. I can also see when the average query completion times start to drift from their norm. I very rarely look at individual queries; you can spend your entire life doing that. I prefer to take a higher-level view: look at table and index usage, look for better ways to optimise the indexes, ensure stats are up to date, etc. (We will look at queries when requested by a user, and normally they can be rewritten or debugged.)
If we cannot get the elapsed time of a query, we are missing a fundamental metric. The first response time is pretty useless for our requirements, and, I suspect, for most managers. I am never really interested in when something starts after I initiate it, just when it finishes, be that queries in a database or units of work set for my staff.
I understand the concepts you mention: AMPCPUTime and TotalIOCount are very useful for monitoring the performance of Teradata itself, and including LastRespTime would bring into focus factors beyond a DBA's ability to control (network bandwidth, client performance, etc.), but it does include the time taken to recover the data from the disk.
From the customer's point of view, though, "the database" is everything from their keyboard back to the actual database.
To successfully implement Teradata on our site, we need to manage user and management expectations. If the users see 45-minute response times on their queries while the DBAs are reporting 3-second response times, then Teradata has an issue.
Forget about LastRespTime; it never actually worked, which is why it was removed in TD12.
After ElapsedTime the first rows are sent back to the client, and this is when users start to look at the result set. Unless there's a huge answer set, the time to send the data across the network will be negligible.
But I don't know how SAS handles the answer set: does it start to process it only after the last row has been received? If you really need the end-to-end time you should gather that info from SAS; there will probably be a query history with timings.
LastRespTime never did work; perhaps I was too optimistic, but I hoped that it would be fixed rather than removed in TD12.
I have an example query: simply select * from dbc.qrylog.
The query takes over 50 minutes to run, extracting all the data, yet it reports 1.5 seconds elapsed time. When this query runs, according to the QryLog table it's finished, but the effect of the full table scans will be felt while it returns millions of rows over the next 50 minutes.
If Teradata does not "know" this query is still running and accessing the disk subsystem, how does the workload management system see it?
The full table scan is finished after 1.5 seconds, and all the data is in spool by then: Teradata starts sending back the first row *after* the whole answer set has been created, so for the PE (which maintains the Query Log cache) the query really is finished.
Tools like PMon will show the PEState as "Response" while the AMPState is "Idle", and you can no longer use the SQL button to see the query details.
The Query Log is a performance log, but it's about Teradata performance, not the network etc. What if a user submits a query using SQL Assistant and after 2000 rows gets the "cancel according to settings" message? Teradata will wait forever until the user presses Yes or No.
Better try to get that info from SAS query history.
If your focus is elapsed time, DBQL is not the place to find or calculate that metric. SAS, Cognos, Business Objects, etc. should provide you with log tables that detail the time it took to produce a report or result set.
Thanks, guys, for the responses. I have also spoken to Teradata Support; they confirm that they cannot capture the end of the transaction. They suggested we put in an RFC, but said not to expect any improvement :-(
Our issue now is how to compare the Teradata system with DB2/Oracle/Sybase, which all capture the last response time and which we are used to measuring. We have hundreds of SAS analysts spread over the entire world; we don't have access to their SAS servers, so we cannot get any logs from there.
I have decided to run a few of the queries I see, time them from a desktop, and use the metrics available to manufacture an estimated total run time. I think something like:
total_time = (FirstRespTime - StartTime) + NumResultRows / 550 (550 rows per second being the observed fetch rate)
My example query, select * from DBC.QryLog, returned 1,418,558 rows with an elapsed time of 7.78 seconds, which gives a fetch time of about 2580 seconds and a total of roughly 2588 seconds, or 43 minutes 8 seconds.
Obviously the row width will vary across queries, so I need to investigate this further.
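The estimate above can be sketched in Python; the function name is mine, and the 550 rows/second fetch rate is the informal desktop measurement quoted above, not a Teradata-supplied figure.

```python
# Sketch of the estimated-total-time formula:
#   total = (FirstRespTime - StartTime) + NumResultRows / fetch_rate

FETCH_RATE_ROWS_PER_SEC = 550   # informally measured; varies with row width

def estimated_total_seconds(first_resp_elapsed, num_result_rows):
    """Elapsed time to first response plus estimated fetch time."""
    return first_resp_elapsed + num_result_rows / FETCH_RATE_ROWS_PER_SEC

# The DBC.QryLog example: 1,418,558 rows, 7.78 s to first response.
total = estimated_total_seconds(7.78, 1_418_558)
print(round(total))              # 2587 seconds
print(divmod(round(total), 60))  # (43, 7) -> about 43 minutes
```

Because the fetch rate really depends on row width and the network path, the constant would need recalibrating per query shape before the estimate is trustworthy.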
Only metrics gathered from the TD server are consistently available, so if anyone has any further ideas, I would be grateful.