Killer queries: Track them, find them, fix them

Viewpoint
Teradata Viewpoint is Teradata's strategic and innovative SOV (single operational view) for Teradata DB, Aster, and HDP Hadoop systems management and monitoring that enables Teradata's Unified Data Architecture (UDA).
Enthusiast

Re: Killer queries: Track them, find them, fix them

I do not have access to viewpoint and in order to tune our queries I have the following questions for you:

1. Is there a way to find this info with the help of queries manually?
2. Is there a way to find the exact threshold ratio set with the help of queries manually?

Thanks,
Vipen.
Teradata Employee

Re: Killer queries: Track them, find them, fix them

Vipen,

I am not aware of any way via the query to do live monitoring of Teradata systems. Viewpoint does live monitoring of Teradata system, so you can know at that point in time the step that is being executed by the query or how much resource is being used by the query etc.. It can also simulate the live monitoring by taking you back in time using rewind feature.

Other thing what Viewpoint does is trending analysis using Metric Analysis/Metric graph portlet. This can be done querying DBQL, resusage data because this is looking at historical data.

If you are looking for all the completed queries to identify bad running queries you can query DBQL tables.

However the amount of information you can get using the DBQL/Resusage table depends on the logging that are enabled. If you do not have logging for explain enabled you will not see the explain.

Can you elaborate more on #2. What are the threshold ratio you are looking for ?.

Thanks,
Shrity
Enthusiast

Re: Killer queries: Track them, find them, fix them

For #2, I am looking for a ratio CPU to Disk IO, because of which our queries are becoming a killing target by TDWM.
Teradata Employee

Re: Killer queries: Track them, find them, fix them

Hello Shrity...Thanks for the article. Very nice elaboration.
I would like to clear a few things:
(a) Amongst CPU Use and Impact CPU, which one is a more dominant factor for identifying Bad Queries to be Tuned ?
(b) It's a difficult/misguided task zeroing on CPU Use alone for identifying Queries to be tuned. So, amongst other metrics like Spool Usage, Impact CPU, CPU Skew, CPU Use, ReqCPU, Total IO Count, PJI; which 3-4 metrics are very critical for identifying Bad Queries ?

My Opinion is (In Decreasing Preference) : ReqCPU (As mentioned by you) | CPU Skew/CPU Use | Total IO Count/Impact CPU | Spool Usage/PJI

I understand that such questions are best answered via experience, but I am looking for a near-ideal picture for identifying Bad Queries.
Teradata Employee

Re: Killer queries: Track them, find them, fix them

Vipen,

You are talking about TASM rules. You may have to work with your DBA to identify what are those..

Thanks
Shrity
Teradata Employee

Re: Killer queries: Track them, find them, fix them

Samark,
You are welcome and thanks for the comment,

In my opinion, there is no single definition of bad queries and the definition may change based on workload/jobs running. If my system is I/O bound I would not care much about CPU usage, this is the reason we don't have a bad query indicator in Viewpoint. For live monitoring if I encounter a situation that system is running slow, I usually see system health to see which metric is going beyond the threshold and identify the query that has maximum resource consumption for those metrics in query monitor portlet and label it saying these are bad queries at this point in time.

(a) Amongst CPU Use and Impact CPU, which one is a more dominant factor for identifying Bad Queries to be Tuned ?
>> CPU Use and Impact CPU have different Usage. Impact CPU shows "CPU time in seconds of the highest CPU utilized AMP" so it is better used along with CPU Skew. CPU USE is only available when query is running as that shows CPU Utilization by a query at that point in time. If my system has high CPU Utilization I would use CPU USE to identify which query is using more CPU at that point in time. If it has high AMP CPU Skew I would use CPU Skew and Impact CPU.

(b) It's a difficult/misguided task zeroing on CPU Use alone for identifying Queries to be tuned. So, amongst other metrics like Spool Usage, Impact CPU, CPU Skew, CPU Use, ReqCPU, Total IO Count, PJI; which 3-4 metrics are very critical for identifying Bad Queries ?
>> To Identify bad queries, I usually use Query Spotligtht portlet (If I don’t know what metric I am targeting at) and go by Duration as long running queries has more tuning opportunities than a short tactical queries. Then I look at steps that took long to complete. Then I start dissecting the steps to see identify resource that is consumed more in that step using the Snapshot. If I don't find anything I enable "control rewind" to see what was going on with system at that point in time, especially system health portlet to see resource consumption and correlate that to query resource consumption..

Yes you are right we cannot use only one metric to say query is good/bad. Instead for completed queries I think we need to first identify which metric is making that query run slow. For running queries we need to look at the metrics that is impacting my system performance at that point in time.

This is purely how I look at things... :-)

Thanks, Shrity
Enthusiast

Re: Killer queries: Track them, find them, fix them

So it means that there is no way to find the TASM rules from the database views/tables related to TDWM?
Teradata Employee

Re: Killer queries: Track them, find them, fix them

You can know by quering TDWM tables but not everyone have access to it.
Enthusiast

Re: Killer queries: Track them, find them, fix them

Thanks for the information.

Re: Killer queries: Track them, find them, fix them

Nice article Shrity.