Can we compare the CPU calculated from the ResUsage tables with the CPU shown in DBQLogTbl?
The discussion above talks about system-wide CPU utilisation on a day and hour basis. DBQLogTbl also shows us TotalCPU per vproc and other details; can we correlate that CPU calculation with the calculations we make from ResUsage data?
Which is more precise, DBQLogTbl or ResUsage?
Does DBQL account for wait time and idle time w.r.t. CPU utilisation?
That is an interesting question. I believe that AMPCPUTime is the total user CPU consumed by the statement, ignoring system, wait and idle time.
DBQL records one row for each statement, so it's difficult to use just that information to generate a usage profile. I think it also only records the CPU used in running the query up to first response time; I don't believe it includes the CPU involved in returning data to the client.
I use AMPCPUTime for recharging to departments and ResUsage for capacity planning.
Which is more precise? I would imagine that both are as accurate as each other, but I use them for different purposes. ResUsageSpma is good for providing general graphs on AWT and CPU utilisation, monitoring the relationships between AWT usage and CPU usage to provide best-guess limits for restricting work, etc. I also use it to manage BYNET contention and physical/logical IO.
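For reference, here is a minimal sketch of the kind of ResUsage rollup I mean. Column names assume the node-level ResUsageSpma table (CPUUServ, CPUUExec, CPUIdle, CPUIoWait) and may differ by release, so treat this as a starting point rather than a drop-in query:

```sql
-- Daily CPU busy % per node from ResUsageSpma
-- (column names are an assumption; check your release's ResUsage documentation)
SELECT TheDate,
       NodeID,
       SUM(CPUUServ + CPUUExec) * 100.0
         / NULLIFZERO(SUM(CPUUServ + CPUUExec + CPUIdle + CPUIoWait)) AS BusyPct
FROM DBC.ResUsageSpma
GROUP BY TheDate, NodeID
ORDER BY TheDate, NodeID;
```

Busy time here is user plus exec CPU; idle and IO wait sit in the denominator only, which is exactly the system-wide view that DBQL's per-statement AMPCPUTime does not give you.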
DBQL is better for identifying the rogue queries, grouping CPU by User/Department etc.
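As a sketch of the grouping I mean, something along these lines works against the default DBQL log table (UserName, AMPCPUTime and StartTime are standard DBQLogTbl columns, but verify against your site's logging setup):

```sql
-- Daily AMP CPU seconds and query counts per user from DBQL
-- (assumes query logging is enabled and writing to DBC.DBQLogTbl)
SELECT CAST(StartTime AS DATE) AS LogDate,
       UserName,
       SUM(AMPCPUTime) AS TotalAMPCPU,
       COUNT(*)        AS QueryCount
FROM DBC.DBQLogTbl
GROUP BY 1, 2
ORDER BY TotalAMPCPU DESC;
```

Rolling UserName up to department is then just a join to whatever user-to-department mapping your site keeps.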
Thanks for the information Random_thoughts! This is very useful.
I would like to ask a few more questions regarding the CPU usage of a query per user. I hope you can help me understand.
(1) Will the CPU usage of a query remain the same (excluding wait time and idle time) across all types of load on the system? For example: I have a query that is run by a userid, and an account string is assigned to this userid. The allocated weight for this user is different during different performance periods. Will I see the same CPU usage for this query during different performance periods? Where should I be checking the CPU usage of this query, DBQL or ResUsage?
(2) I have a SQL statement that performs aggregation based on keys. I have 3 combinations of keys, and my SQL can be run in a single shot or in multiple phases (3 different SQL statements). When the allocated weight for my performance group is high, which is the best way to run the SQL: (a) run it in a single shot, or (b) run it in 3 parallel sessions? What is the impact on CPU usage in (a) and in (b)?
(1) You should check the CPU usage of queries in DBQL, one statement per record; it's easier, I think.
(2) Whether (a) or (b) is better depends on the query, the intensity of reading data from disk, and the concurrency of other jobs in the system. Running them together, you could find you have issues with cache pollution, as one query's rows are discarded from memory only to be reread a couple of seconds later; or it could benefit the other queries, if the data is already in cache somewhere and no physical read is required. You should run the queries and monitor the cache. Post if you need advice on this; I don't know if it's covered elsewhere.
I don't know if this will help answer your question but ...
The CPU usage reported by DBQL should be the same each time your query is run. Assuming the data is the same and your statistics are up to date, you may notice some slight variation, but in general we can assume it to be constant.
What can vary is the elapsed time for the query: from when the query is submitted, to when the data starts to return, to when the data finishes returning are all measures you should be concerned about.
When a query is submitted, DBQL records a timestamp in StartTime. When it starts to run, a timestamp is written into FirstStepTime. The difference is the delay time. This is governed by the TASM group the query is put in: if only two queries from that group can run at once, this query will wait.
When a query completes, as far as Teradata is concerned, a timestamp is written into FirstRespTime. The difference between FirstStepTime and FirstRespTime is the elapsed time of the query; the AMPCPUTime CPU seconds are consumed here. For example, you may have consumed 5 seconds of CPU time but taken 15 seconds to execute.

The factors that can affect the elapsed time are CPU weightings; this is where your allocated weight comes into play. Your weightings may not work in the way you would expect. In this example I have a batch group and an ad hoc group, with the weightings set to give an 80/20 split between them. This split does not mean that one gets 80% of the CPU and the other 20%. All it means is that the underlying Linux scheduler under the Teradata covers will send 80% of the time slices to batch and 20% to ad hoc. In this case, one query running in batch can be swamped by several running in ad hoc. Why? Because one query cannot usually consume the whole allocation by itself: the process has to wait for IO (disk or network), so it does not require CPU and relinquishes it. If ad hoc has more queries, there is a greater chance that one of them will require CPU, and they will use their whole timeslice. Then, if you have several large ad hoc queries reading loads of data, the caches become filled with their data rather than the data the batch process needs, thus exaggerating the IO wait for the batch query.
The last part of the query is the data return. This runs from when FirstRespTime is recorded to when the last byte is returned to your invoking program. It can be affected by network traffic and disk load (remember the data is coming from spool, which is on physical disk). This will still need CPU, but I am not sure where that is recorded. If your query is an aggregation, then you may not be returning millions of rows anyway. Note that Teradata as standard does not record the last response time, so measuring this is not as easy as on other platforms.
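Putting the timestamps above together, the delay and elapsed times can be derived per query with something like this. It is a sketch using the standard DBQLogTbl column names (StartTime, FirstStepTime, FirstRespTime); the interval qualifier assumes queries run for less than a day:

```sql
-- Delay time (queued in TASM) and elapsed time (running) per query,
-- derived from the DBQL timestamps discussed above
SELECT QueryID,
       UserName,
       (FirstStepTime - StartTime) HOUR(2) TO SECOND  AS DelayTime,
       (FirstRespTime - FirstStepTime) HOUR(2) TO SECOND AS ElapsedTime,
       AMPCPUTime
FROM DBC.DBQLogTbl
ORDER BY DelayTime DESC;
```

Comparing ElapsedTime against AMPCPUTime is a quick way to spot the 5-seconds-of-CPU-in-15-seconds-of-wall-clock situation described above.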
Does this help, or confuse? Please ask if you require more information or clarification. For every person asking the question, there are often loads of people wanting to know who won't post!
In answer to another question earlier in the thread that I overlooked, from Monis: different node sizes are catered for because you run more AMPs on the more powerful nodes and fewer on the less powerful ones, to even things out. In general a parallel system runs at the speed of its slowest component, so doing it this way ensures the more powerful nodes get more work. Having said that, all our systems have identical nodes in their configurations, so I do not have direct experience of mixed node sizes.