Unable to use twbcmd to terminate a job if a FileReader is using Vigil to wait on new files

Hello,

I have a process that uses TPT Stream to load from a FileReader producer that is using the Vigil properties to continuously scan a directory.  The Vigil time is set to many hours to reduce any load delay caused by TPT startup, as new files are constantly being written.

I am having difficulty finding a way to cleanly end the process while it is running.

I was trying to use twbcmd, but I think I may be encountering a bug. `twbcmd ${JOB_ID} JOB TERMINATE` successfully commits the buffers and performs the necessary checkpoints when there are files remaining that match the FileReader's FileName pattern (with wildcard).

However, when there are no files in the directory that match the FileName pattern, the JOB TERMINATE command stalls indefinitely. These are the last lines in the tlogview output:

Processing "JOB TERMINATE" user command

Performing checkpoint prior to terminating job

The process waits in this state until new files (one per instance) matching the FileName pattern arrive, or until the Vigil time expires. In the former case (for example, if I copy files matching the pattern into the directory), the FileReader operators wake up and start processing the files, but then immediately acknowledge the checkpoint signal:

Processing "JOB TERMINATE" user command

Performing checkpoint prior to terminating job

FileReader: TPT19222 Operator instance 1 processing file '[Filename1, removed]'.

FileReader: TPT19222 Operator instance 2 processing file '[Filename2, removed]'.

FileReader: TPT19222 Operator instance 3 processing file '[Filename3, removed]'.

FileReader: TPT19222 Operator instance 4 processing file '[Filename4, removed]'.

Task(SELECT_2[0002]) ready to take internal checkpoint

Task(SELECT_2[0003]) ready to take internal checkpoint

Task(SELECT_2[0004]) ready to take internal checkpoint

Task(SELECT_2[0001]) ready to checkpoint

Task(SELECT_2[0002]): checkpoint completed, status = Success

Task(SELECT_2[0004]): checkpoint completed, status = Success

Task(SELECT_2[0001]): checkpoint completed, status = Success

Task(SELECT_2[0003]): checkpoint completed, status = Success

However, this is always followed by a checkpoint error that triggers a TPT restart.

TPT_INFRA: TPT02258: Error: Operator checkpointing error, status = Retry Error

Task(APPLY_1[0001]): checkpoint completed, status = Retry Error

TPT_INFRA: TPT03720: Error: Checkpoint command failed with 47

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 89752, Total Rows Received = 1822251, Total Rows Sent = 0

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 22422, Total Rows Received = 0, Total Rows Sent = 455158

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 22429, Total Rows Received = 0, Total Rows Sent = 455226

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 22537, Total Rows Received = 0, Total Rows Sent = 458020

TPT_INFRA: TPT02255: Message Buffers Sent/Received = 22368, Total Rows Received = 0, Total Rows Sent = 453847

**** 12:00:16 RDBMS CRASHED OR RETRYABLE CONDITION OCCURRED.

              JOB WILL BE RESTARTED.

After the restart, the TPT job continues rather than terminating.

Is there anything that I am doing wrong, with respect to either TPT or twbcmd? Is there any other way to cleanly terminate a TPT process that is using a FileReader with Vigil, even when the FileReader has no files to read?

TPT version: 14.00.00.08

OS: Sun OS 5.10 / Solaris 10

Thanks!

2 REPLIES
Teradata Employee

Re: Unable to use twbcmd to terminate a job if a FileReader is using Vigil to wait on new files

We are aware of this issue, but not much can be done about it right now.

When you use Vigil, you are effectively telling the operating system to "sleep" for a particular duration of time.

During that "sleep", we cannot interrupt the process, because the sleep is controlled by the operating system.

The only way to terminate the job is to obtain the job id and then use the "twbkill" command. Do not try to kill the processes manually yourself.

-- SteveF
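
For anyone scripting this, here is a rough sketch of the "ask nicely, then twbkill" sequence described above. It is only an illustration, not a Teradata-supplied procedure. Assumptions: `twbkill` is given the job id as its only argument (my reading of the reply), and `is_running` is a hypothetical callable you write yourself to decide whether the job is still alive. The grace and poll intervals are arbitrary.

```python
import subprocess
import time


def terminate_tpt_job(job_id, is_running, grace_seconds=300, poll_seconds=10,
                      run=subprocess.run, sleep=time.sleep):
    """Request a clean termination, then fall back to twbkill.

    is_running is a caller-supplied (hypothetical) check that returns True
    while the job is still alive. Returns "terminated" or "killed".
    """
    # Clean request: per the original post, this checkpoints and ends the
    # job when the FileReader instances are awake and processing files.
    run(["twbcmd", job_id, "JOB", "TERMINATE"], check=False)

    # Give the job a bounded amount of time to act on the request.
    waited = 0
    while waited < grace_seconds:
        if not is_running(job_id):
            return "terminated"
        sleep(poll_seconds)
        waited += poll_seconds

    if not is_running(job_id):
        return "terminated"

    # Still up: the readers are probably asleep in the Vigil wait.
    # Assumed invocation: twbkill followed by the job id.
    run(["twbkill", job_id], check=False)
    return "killed"
```

The `run` and `sleep` parameters exist only so the function can be exercised without TPT installed; in normal use you would call `terminate_tpt_job(job_id, my_check)` and leave them at their defaults.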

Re: Unable to use twbcmd to terminate a job if a FileReader is using Vigil to wait on new files

Thank you for the fast reply feinholz, I will use twbkill.  If it causes any issues with checkpointing or data loss, I'll look into using shorter vigil periods with future-dated VigilStartTimes, so any startup overhead occurs before the previous process has ended and does not cause load downtime.
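
In case it helps anyone else, this is roughly how I would compute the staggered windows for the shorter-vigil approach. It is a sketch under assumptions: the timestamp layout 'yyyymmdd hh:mm:ss' and the use of VigilStopTime alongside VigilStartTime are my understanding of the FileReader attributes and should be checked against the reference for your TPT version. The function name, the `lead` parameter, and the five-minute default are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed layout for the Vigil timestamps: 'yyyymmdd hh:mm:ss'.
VIGIL_FORMAT = "%Y%m%d %H:%M:%S"


def vigil_windows(first_start, window, count, lead=timedelta(minutes=5)):
    """Plan back-to-back vigil windows with an early launch for each job.

    Returns a list of (launch_time, vigil_start, vigil_stop) tuples.
    launch_time is when to submit the job so that its startup overhead
    finishes before the previous window closes; the two strings are the
    values to substitute for VigilStartTime and VigilStopTime.
    """
    windows = []
    for i in range(count):
        start = first_start + i * window   # next window begins where the last ends
        stop = start + window
        launch = start - lead              # submit early to absorb startup time
        windows.append((launch,
                        start.strftime(VIGIL_FORMAT),
                        stop.strftime(VIGIL_FORMAT)))
    return windows
```

Each job would be submitted at its launch time with a future-dated start, so the next job is already up and waiting when the previous one reaches the end of its window.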