System health status and heartbeat query threshold response time

Enthusiast

Per this article:

http://developer.teradata.com/blog/stever/2011/08/understanding-the-system-health-equation

"A Teradata Database system is considered DOWN in the health equation if any of the following conditions occurs:

  • A canary query is enabled in the health equation and returns an error (such as a SQL parse exception or login failure).
  • The System Heartbeat canary query is enabled in the health equation and fails to complete in under 60 seconds.
  • Any other canary query is enabled in the health equation and fails to complete in under 30 minutes."
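For anyone skimming, the three quoted conditions can be sketched roughly as below. This is only an illustration of the rule as the article states it, not Viewpoint's actual implementation; `CanaryResult`, `is_down`, and the field names are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits as quoted from the article (in seconds).
HEARTBEAT_LIMIT_SECONDS = 60
OTHER_CANARY_LIMIT_SECONDS = 30 * 60


@dataclass
class CanaryResult:
    """Hypothetical record of one canary query run."""
    name: str
    is_heartbeat: bool           # True for the System Heartbeat canary
    enabled_in_health: bool      # only canaries enabled in the health equation count
    elapsed_seconds: float       # response time, or time elapsed so far
    error: Optional[str] = None  # e.g. SQL parse exception or login failure


def is_down(results: List[CanaryResult]) -> bool:
    """Return True if any enabled canary meets one of the quoted DOWN conditions."""
    for r in results:
        if not r.enabled_in_health:
            continue  # canaries outside the health equation are ignored
        if r.error is not None:
            return True  # condition 1: the canary returned an error
        limit = HEARTBEAT_LIMIT_SECONDS if r.is_heartbeat else OTHER_CANARY_LIMIT_SECONDS
        if r.elapsed_seconds >= limit:
            return True  # conditions 2 and 3: did not complete under the limit
    return False
```

With this sketch, a heartbeat run that takes 75 seconds marks the system DOWN even though the database is still answering queries, which is the false alarm described below.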

Is there any way to change the maximum allowed response time for the System Heartbeat canary query (sel * from dbcinfo)?  We've seen a few instances where we received a "health down" alert for the heartbeat query when the system wasn't actually down.  Sometimes other system conditions (which we'd like to alert on separately) push the heartbeat response time past 60 seconds, which triggers the "health down" alert.  We want to raise the threshold so that an alert is more likely to mean the system is actually down.  For the sake of example, let's say 120 seconds.  So: can we do this? And if so, where?

We're running Viewpoint 14.10.

Thanks,

Dan

3 REPLIES
Teradata Employee

Re: System health status and heartbeat query threshold response time

Dan,

The canary query response time threshold cannot be modified today.  The query itself, however, can be changed.  Surely there's some query that can complete in under 60 seconds on your system, right?  It might be beneficial to switch to a single-AMP query and make sure it isn't delayed by TASM.

Thanks, Steve

Teradata Employee

Re: System health status and heartbeat query threshold response time

The canary query setting is in the Monitored Systems portlet (or the Teradata Systems portlet if you are running Viewpoint 14.10 or earlier).

Enthusiast

Re: System health status and heartbeat query threshold response time

That was easy enough, thanks Steve.  The new query is basically: sel * from dbc.dbase where databasenamei = 'dbc' (with a row-level access lock).  It will always go to the same AMP, but at least it's a single-AMP query.