TPT sessions are left behind?

A user test scenario uses the TPT STREAM operator.

The test examines sensitivity to a network outage / broken connection.

The following query is issued repeatedly to track our testing "SUPPORT" team's sessions:

SELECT Username, DefaultDatabase, LogonSequenceNo, PARTITION AS Utility_Type,
       MAXIMUM(CURRENT_TIMESTAMP - CAST(CAST(LogonDate AS DATE FORMAT 'YYYY-MM-DD') (CHAR(10)) || ' ' || LogonTime AS TIMESTAMP) HOUR TO SECOND) AS TimeLoggedIn,
       COUNT(*) AS num_of_sessions
FROM DBC.SessionInfo
GROUP BY 1, 2, 3, 4;

Initially, the following is reported:

Username, DefaultDatabase, LogonSequenceNo, Utility_Type, TimeLoggedIn, num_of_sessions

DBC,DBC    ,0x00000000,DBC/SQL                         ,  0:00:00.060000,1

DBC,SUPPORT,0x00000000,DBC/SQL                         ,  0:03:27.590000,2

So the number of “SUPPORT” sessions is 2.

The TPT program runs.

Now our SUPPORT engineers cause a connection error by blocking the IP address of the Teradata database.

The C/API based TPT program senses this exception, issues conn->Terminate() and exits.

Repeating the above query now shows:

DBC,DBC    ,0x00000000,DBC/SQL                         ,  0:00:00.060000,1

DBC,SUPPORT,0x00000000,DBC/SQL                         ,  0:06:03.920000,3

The number of “SUPPORT” sessions is now 3.

Repeating this sequence again, we see:

DBC,DBC    ,0x00000000,DBC/SQL                         ,  0:00:00.130000,1

DBC,SUPPORT,0x00000000,DBC/SQL                         ,  0:08:32.410000,4

The number of “SUPPORT” sessions is now 4.

These sessions were initially opened with conn->Initiate() as they started and were cleaned up by a corresponding conn->Terminate() thereafter.

But we see that the number of sessions keeps growing, even though the TPT client terminates cleanly.

What is happening here?

Why are these TPT sessions left behind?

Teradata Employee

Re: TPT sessions are left behind?

The Terminate() call cleaned up the sessions on the client, but the database never saw that request because you were blocking network traffic.

By default, the Teradata gateway should start sending keepalive probes back to the client after about 10 minutes of inactivity, and the session should be cleaned up within another 10 minutes or so after that.


Re: TPT sessions are left behind?


But a further clarification is still required.

Both participants in that TPT dialog are loosely coupled, in a sense, in a non-symmetric fashion.

Our C/API based client senses right away that the connection was lost.

It responds with a conn->Terminate(), then exits and attempts to reconnect after a short while.

By issuing conn->Terminate() it releases all the resources it had allocated for the lost connection.

However, the backend counterpart of this pair remains active for an additional 20 minutes at most, right?

A successful reconnect from the client side will establish a brand-new pair of participants.

As far as I know, the TPT/API protocol is not capable of identifying an existing hung backend connection and renewing a client link to it directly, thus bypassing the common client initialization sequence - is it?

So - I wonder what is the point in keeping that 'orphaned' backend counterpart active for that long period of time?

Who can reuse it? How can a common TPT client re-attach to it?

Did I miss something?



Teradata Employee

Re: TPT sessions are left behind?

When the TPTAPI operators issue the "disconnect" (in conn->Terminate), it is an asynchronous process. By that I mean that we issue the disconnect requests but do not wait to find out if the disconnect is successful or not (at that point it does not matter because we are terminating and our processes will be exiting).

So, we have no idea whether the disconnect really took place.

On a reconnect (on a restart) we always connect new sessions; we never try to reconnect to previously connected sessions.

Whether the old sessions hang around for the 20 minutes is not known to us, and we have no control over that. That is a Gateway/DBS issue.

-- SteveF