TD 14 on EC2 hangs at 100% CPU requiring reboot

Teradata Express Software


  • Used the Teradata 14 image supplied for Amazon EC2 and followed the Teradata installation guide.
  • No users, no load, no jobs: nothing is running from the user side, so the server is idle.
  • Every 6 to 24 hours, CPU usage jumps from about 2% to 100%, all SSH connections drop, and the server has to be rebooted.
  • Sent Teradata the server event logs and the other Teradata-generated log files, but got no resolution.

Any help is appreciated.

Thanks!

These are the typical server event log entries preceding the 100% CPU state.




INFO: Teradata: 13018 #Event number 33-13018-00 (severity 0, category 11) 1 node online.
INFO: Teradata: 14081 #Event number 33-14081-00 (severity 0, category 11), occurred on Wed Oct  1 07:38:56 2014 at 001-01 (Vproc 10237, partition 32, task 4967) in system tdexpress in Module tvsa_agent, version PDE:14.00.00.01,TDBMS:14.00.00.01,PDEGPL:14.00.00.01,RSG:14.00.00.00,TGTW:14.00.00.00,TDGSS:14.00.00.00
 PUT-generated perferred mapping file not found.
INFO: Teradata: 14081 #Event number 33-14081-00 (severity 0, category 11), occurred on Wed Oct  1 07:38:56 2014 at 001-01 (Vproc 10237, partition 32, task 4967) in system tdexpress in Module tvsa_agent, version PDE:14.00.00.01,TDBMS:14.00.00.01,PDEGPL:14.00.00.01,RSG:14.00.00.00,TGTW:14.00.00.00,TDGSS:14.00.00.00
 PUT-generated perferred mapping file not found.
INFO: Teradata: 13896 #COD: DiskPerformance scaled to 100%
INFO: Teradata: 13896 #COD: CpuPerformance scaled to 100.0%
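When comparing several hang episodes, it can help to count which event numbers show up before each one. The following is a minimal Python sketch, assuming only the "#Event number NN-NNNNN-NN (severity N, category N)" layout visible in the excerpt above; it is not based on any Teradata specification of the event log format, and the names `parse_event` and `summarize` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Pattern inferred from the excerpt above only (an assumption, not a documented format).
EVENT_RE = re.compile(
    r"#Event number (?P<event>\d+-\d+-\d+) "
    r"\(severity (?P<severity>\d+), category (?P<category>\d+)\)"
)


class Event(NamedTuple):
    event: str
    severity: int
    category: int


def parse_event(line: str) -> Optional[Event]:
    """Return the event id, severity and category, or None if the line has no event."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return Event(m["event"], int(m["severity"]), int(m["category"]))


def summarize(lines: Iterable[str]) -> dict:
    """Count how often each event id appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        ev = parse_event(line)
        if ev is not None:
            counts[ev.event] += 1
    return dict(counts)
```

Run against the excerpt above, the "#COD:" lines and the "PUT-generated ..." continuation lines are skipped, and only the two event ids are counted.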

Re: TD 14 on EC2 hangs at 100% CPU requiring reboot

I am having problems using TD 14 on EC2 and am wondering whether it is related to this post or is a different issue.

Every time I try to log on with BTEQ, I get an error message like this one:

 *** Warning: RDBMS CRASHED OR SESSIONS RESET.  RECOVERY IN PROGRESS.

I have tried rebooting several times. There are many processes running on the EC2 host.

I manually edited /etc/hosts to include a line like this one:

NNN.MM.RR.81 ip-NNN-MM-RR-81 ip-NNN-MM-RR-81.td.teradata dbccop1091 dbccop1

(I replaced the actual IP address numbers with letters for this post.)

With that entry in place, I can SSH to those hostnames, although it requires entering a password manually.
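The /etc/hosts line above follows the usual hosts-file layout: an address, then a canonical hostname, then any aliases, with "#" starting a comment. As a rough sketch for checking which address a name such as dbccop1 maps to in a hosts file, the hypothetical helpers below split entries that way; they read only the text you pass in and do not query DNS or the system resolver.

```python
from typing import List, Optional, Tuple


def parse_hosts_line(line: str) -> Optional[Tuple[str, str, List[str]]]:
    """Split a hosts entry into (address, canonical_name, aliases).

    Returns None for blank or comment-only lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"hosts entry needs an address and a name: {line!r}")
    return fields[0], fields[1], fields[2:]


def address_for(hosts_text: str, name: str) -> Optional[str]:
    """Return the address of the first entry listing `name`, or None if absent."""
    for raw in hosts_text.splitlines():
        entry = parse_hosts_line(raw)
        if entry is None:
            continue
        address, canonical, aliases = entry
        if name == canonical or name in aliases:
            return address
    return None
```

For example, `address_for(open("/etc/hosts").read(), "dbccop1")` would show whether the alias is present and which address it points to.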

I don't know whether this is relevant, but I am getting hourly email messages like this one:

From: root@ip-10-17-1-125.ec2.internal (root)

running hourly cronjob scripts

SCRIPT: output (stdout && stderr) follows

Unknown Intel CPU type family 6 model 3e

SCRIPT: mcelog

------- END OF OUTPUT
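The mcelog message gives the CPU model in hexadecimal (3e is 62 in decimal). As a sketch, the same family/model pair can be read from /proc/cpuinfo on the instance to confirm what the hourly cron job is seeing. This shows only which CPU signature mcelog did not recognize; on its own it says nothing about whether that is related to the hang. The function name `cpu_signature` is hypothetical.

```python
import os
from typing import Optional, Tuple


def cpu_signature(cpuinfo_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (cpu family, model) from the first processor block of /proc/cpuinfo text."""
    family = model = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()  # exact match, so "model name" is not mistaken for "model"
        if key == "cpu family" and family is None:
            family = int(value)
        elif key == "model" and model is None:
            model = int(value)
        if family is not None and model is not None:
            break
    return family, model


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        fam, mod = cpu_signature(f.read())
    if fam is not None and mod is not None:
        # Same notation as the mcelog message: decimal family, hex model.
        print(f"family {fam} model {mod:x}")
```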

Is this the right forum, and if so, does anybody have any ideas? Is there a log file that might have more details?