High Impact / Worst Performing / Long Running Queries (Teradata)

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Visitor

High Impact / Worst Performing / Long Running Queries (Teradata)

Hi Folks,

First off, let me start by saying that this question has probably been around for long. However, since I am new to Teradata I am requesting some help from you to understand and achieve my goal.

That been said, I am trying to build a query that will show the top 10 worst performing /long-running queries for a given time-frame in the past (e.g last 7 days etc). I already did some reading online and came up with few metrics that might help identify such queries. Needless to mention its all very confusing to me at this point and hence request your help in sorting this out.

 

SELECT    
UserName,
LogDate,
QueryID,
StartTime,
FirstRespTime,
((FirstRespTime - StartTime) HOUR(4) TO SECOND(2)) AS ElapsedTime,
((FirstRespTime - FirstStepTime) HOUR(4) TO SECOND(2)) AS EexecutionTime,
(FirstRespTime - StartTime) HOUR to SECOND(4) AS FirstRespElapsedTime,
ParserCPUTime,
AMPCPUTime,
AMPCPUTime + ParserCPUTime AS TotalCPUTime,
SpoolUsage/(1024*1024*1024) AS Spool_GB,
(MaxAMPCPUTime) * (HASHAMP() + 1) AS ImpactCPU
CAST(100-(nullifzero(AMPCPUTime/HASHAMP() + 1) * 100 /nullifzero(MaxAMPCPUTime)) AS INTEGER ) AS "CPUSkew%",
TotalIOCount,
MaxAMPIO * (HASHAMP() + 1) AS ImpactIO,
CAST(100-((TotalIOCount/HASHAMP() + 1) * 100 /nullifzero(MaxAMPIO)) AS INTEGER ) AS "IOSkew%",
QueryText
FROM pdcrinfo.<tables>
.....
.....
WHERE
logdate BETWEEN <input start-date> AND <input end-date>
AND
AMPCPUTime > 0

However, I am still struggling with the following questions -

  • Did I get the calculations shown above accurate ?
  • Does the above list of metrics suffice ? Or, are there any additional metric(s) that need to be included in the above list as well ?
  • Can I use PJI > 6 as one of the conditions ? 
    PJI = (TotalCPUTime *1000)/TotalIOCount > 6
  • Can the condition for unnecessary IO be used : 
    TotalIOCount/(TotalCPUTime*1000) > 6
  • I am bit confused with the ImpactCPU metric calculation logic. Apart from the one mentioned above, I found another logic as
    ImpactCPU = (max_vproc_CPU * number of vprocs). Please indicate the correct one.
  • Kindly let me know the history tables to use (in pdcrinfo etc)
  • Finally, what logic should I apply to combine all these metrics into one to identify the top 10 ?

We are using Teradata v14.

Please help me figure out the actual logic for the same.

Cheers !

 

 

2 REPLIES
Teradata Employee

Re: High Impact / Worst Performing / Long Running Queries (Teradata)

It's a good start. There are other metrics that can be useful in certain situations, e.g. NumSteps may give a rough indication of complexity.

Note that the Viewpoint "Query Spotlight" portlet supports this same sort of analysis. 

All the listed information except the full query text can be found in DBQLogTbl.

The "HASHAMP()+1" returns the total number of AMPs in the system. It's generally preferred to use NumOfActiveAMPs in this calculation instead

Junior Supporter

Re: High Impact / Worst Performing / Long Running Queries (Teradata)

Mave18,

 

I select many more columns as in my example, but there is even more in DBQLogTbl.

To find a top-10 can be done on many dimensions. I don't know of a golden condition.

I usually start with high CPU using queries, then high I/O using queries. The payback of optimizing these has high potential.

Don't forget that it is sometimes small queries with high impact, due to high frequency. You have to do some smart grouping to find those.

 

LOCKING ROW FOR ACCESS
SEL CPUrank "CPU rank",IOrank "I/O rank",TRIM(UserName) "User"
,SUBSTR(CAST(ElapTime AS CHAR(25)),3,12) "Elapse" ,ElapSecs "Elap Secs"
,QueryID ,AcctString --,SessionID ,LogicalHostID ,RequestNum ,InternalRequestNum ,LogonDateTime
,QueryBand "Qry band" --,AppID "Appl",ClientID ,ClientAddr
,StartTime "Start",FirstStepTime "First Step" ,FirstRespTime "First response" -- ,LastStateChange
,NumSteps "Num steps" ,NumStepswPar ,MaxStepsInPar ,NumResultRows "Result rows"
,TotalIOCount "Total I/O" ,AMPCPUTime "Total CPU" ,ParserCPUTime "parser CPU" ,UtilityByteCount
,UtilityRowCount ,ErrorCode "Err code" ,WarningOnly "warn only" ,DelayTime "delay time"
,AbortFlag "ab rt" ,CacheFlag "cache"
,NumOfActiveAMPs "AMP active" ,MaxAMPCPUTime ,MaxCPUAmpNumber ,MinAmpCPUTime
,ZEROIFNULL((MaxAMPCPUTime-MinAmpCPUTime)/NULLIFZERO(MaxAMPCPUTime)) "CPU skew %"
,MaxAmpIO ,MaxIOAmpNumber ,MinAmpIO
,ZEROIFNULL((MaxAmpIO-MinAmpIO)/NULLIFZERO(MaxAmpIO)) "I/O skew %"
,SpoolUsage "Spool usage" ,WDID "WD id" ,OpEnvID ,SysConID ,LSN ,NoClassification
,WDOverride ,ExceptionValue ,FinalWDID -- ,SLGMet
,TDWMEstMaxRows ,TDWMEstLastRows ,TDWMEstTotalTime ,TDWMAllAmpFlag ,TDWMConfLevelUsed ,TDWMRuleID
,DefaultDatabase ,AMPCPUTimeNorm ,ParserCPUTimeNorm
,MaxAMPCPUTimeNorm ,MaxCPUAmpNumberNorm ,MinAmpCPUTimeNorm
,EstResultRows "Estim rows" ,EstProcTime "Estim time" ,EstMaxRowCount "Estim Max Rows"
,StatementType "Type"
,SUBSTR(QueryText,200) "QueryText"
FROM (SEL A.* ,CAST(StartTime AS DATE) StartDate
,(CAST(FirstRespTime AS TIMESTAMP(6)) - CAST(StartTime AS TIMESTAMP(6))) HOUR(4) TO SECOND(6) AS Elaptime
,EXTRACT(HOUR FROM ElapTime)*3600 + EXTRACT(MINUTE FROM ElapTime)*60 + EXTRACT(SECOND FROM ElapTime) AS ElapSecs
,RANK () OVER( PARTITION BY StartDate ORDER BY AMPcpuTime DESC) CPUrank
,RANK () OVER( PARTITION BY StartDate ORDER BY TotalIOcount DESC) IOrank
FROM PDCRinfo.DBQLogTbl_Hst A
WHERE StartTime BETWEEN DATE-8 AND DATE-1
AND (AMPcpuTime > 1000 OR TotalIOcount > 10000)
) AA
WHERE CPUrank < 31 OR IOrank <31
ORDER BY 1;

 

Teradata Frank, Certified Master