Issue on post-deploy scripts

Teradata Database on VMWare
Enthusiast

Re: Issue on post-deploy scripts

Hi,

 

I found out the following issues:

 

1) The internal DNS configuration is broken, and it seems that some scripts need working DNS resolution to return data to the vCenter host. YaST shows the correct DNS entries from common.IT.properties.json (dns1, dns2), but the /etc/resolv.conf file is in fact completely different. We worked around this by booting into single-user mode (root login allowed) and editing /etc/resolv.conf directly. According to some forum posts I found, this is directly related to VIX error code 3016.

 

2) After a reboot, the copyfiles step went through without trouble and was quieter, with fewer messages showing up on stdout.

 

3) Then we tried repeating the deploy step and got the following:

 

***********************************************************************************************

*

*   19/04/2018 18:19:29   - Configure Teradata DBS...

*

***********************************************************************************************

19/04/2018 18:19:29  - TDput Configure Teradata Operation on ( SBCDF25F ) and will take approximately 20 minutes...

( SBCDF25F ) SystemName( BACEN04) SystemFamily(000CLV) TotalAmps(4) AmpsPerNode(4)

WARNING: The version of VMware Tools on VM 'SBCDF25F' is out of date and may cause Invoke-VMScript to work improperly.

Stopping PUT services...

Killing Processes with SIG: -TERM:

        3498    /opt/teradata/TDput/bin/putservices

        3502    /opt/teradata/TDput/bin/portmgmt

        3503    /opt/teradata/TDput/bin/portmgmt

        3504    /opt/teradata/TDput/bin/portmgmt

Done killing services, moving on to anything that is left

No processes to kill.

Starting PUT services...

/opt/teradata/bin/autoput -l *********** -w *********** -q SUPPORT0001 -o Configure Teradata

Parallel Upgrade Tool (PUT)

Autoput terminated with ERROR(s).

Operation: Configure Teradata

Error message: The current step has terminated abnormally.

Plugin name: Clique/Amp Modeling

Log file name: /var/opt/teradata/TDput/fileservice/logs/sequencer_pdeconfig.log

Action: Please bring up PUT via the browser, click 'continue the operation' and correct the problem.

  4934 ERROR: system is cloud platform but clique 0 has multiple disk sizes - this is not currently supported for cloud/tvme systems (CliqueAmpMod.cpp+18098)

  4934 ERROR: calc_new_amps_for_foggy() failed for clique 0 (CliqueAmpMod.cpp+30481)

  4934 ERROR: Plug-in aborted with message The current step has terminated abnormally. (runner.cpp+1106)

  4934 ERROR: Analysis of error follows: (runner.cpp+1109)

  4934 ERROR:      (runner.cpp+1111)

  4934 ERROR: ********** Error Log Analysis ********** (runner.cpp+1111)

  4934 ERROR: This error was detected in step "Clique/Amp Modeling" state "Auto_DetermineOptimalAmpsState".  (runner.cpp+1111)

 

 

19/04/2018 18:20:08 Status of step:

Success: Sysinit and DIP complete.

WARNING: The version of VMware Tools on VM 'SBCDF25F' is out of date and may cause Copy-VMGuestFile to work improperly.

WARNING: The version of VMware Tools on VM 'SBCDF25F' is out of date and may cause Invoke-VMScript to work improperly.

False

 

19/04/2018 18:20:18 Status of step:

Error: Error Teradata could not be properly configured. It should have started.

 

Deployment started  - 19/04/2018 18:16:27

Deployment finished - 19/04/2018 18:20:18

 

Looking into the /var/opt/teradata/TDput/fileservice/logs/sequencer_pdeconfig.log file inside the VM, I found what is shown in the following screenshots.

 

 

[Attached screenshots: Log2.png, Log1.jpg]

 

 

Teradata Employee

Re: Issue on post-deploy scripts

Hi Henrique,

 

First, check whether vSphere/vCenter shows different disk sizes for this VM. The first disk should be 187GB, and the second and third disks should be 100GB.

 

Based on the community forum, the first run of "deploy" actually kicks off AutoPut and might have generated some gdos or temp files on the system before it failed. I suspect the second "deploy" failed because of these stale gdos and temp files. You could try cleaning up the gdos and rerunning "deploy" like this:

 

cp -p /etc/opt/teradata/tdconfig/tdgssconfig.gdo   /etc/opt/teradata/tdconfig/tdgssconfig.gdo.save

rm /etc/opt/teradata/tdconfig/*.gdo

cp -p /etc/opt/teradata/tdconfig/tdgssconfig.gdo.save /etc/opt/teradata/tdconfig/tdgssconfig.gdo

rm -f /dev/pdisk/*

rm -f /opt/teradata/TDput/data/permanent/.linkinfo

rm -f /opt/teradata/TDput/data/permanent/pnmgr.txt

rm -f /lib/udev/devices/pdisk/*

rm -f /opt/teradata/TDput/data/permanent/excluded_storage.txt

 

If this doesn't fix the problem, I would recommend starting again from the beginning. I don't think the vix:3016 error would show up again if you specify the correct DNS server in the common properties file. Your site people sent me your property files, and as best I can tell everything looks OK except for one issue, which is probably not related to the error you are getting: your value for ntp1 is listed as "timeserver". I would fully qualify the name of your time server and make sure you can ping it. For example, in-house I use "time00.teradata.com" as my ntp1 time server (yours will be different from mine). Let me know if those suggestions help.
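The two checks mentioned above (which DNS servers the guest really uses, and whether the ntp1 name is fully qualified) could be sketched roughly as follows. This is only an illustration for a POSIX shell in the guest: the function names `list_nameservers` and `is_fqdn` are made up here and are not part of the Teradata deployment scripts, and the "fully qualified" test is a simple heuristic (at least one dot, no empty labels).

```shell
# Hypothetical helpers, not part of the Teradata deployment scripts.

# Print the nameserver addresses found in a resolv.conf-style file
# (defaults to /etc/resolv.conf), one per line, so they can be compared
# by eye with dns1/dns2 from the common properties file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if the given host name looks fully qualified: it contains at
# least one dot and has no empty labels. Rough heuristic only; a name
# with a trailing dot is rejected too, for simplicity.
is_fqdn() {
    case "$1" in
        .*|*.|*..*) return 1 ;;   # empty label somewhere
        *.*)        return 0 ;;   # at least one dot
        *)          return 1 ;;   # bare name such as "timeserver"
    esac
}

# Example usage (commented out so that sourcing this file has no effect):
#   list_nameservers
#   is_fqdn timeserver || echo "ntp1 is not fully qualified"
#   ping -c 1 time00.teradata.com
```

The name check only looks at the string itself; whether the time server is actually reachable still has to be confirmed with ping, as suggested above.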

 

Teradata Employee

Re: Issue on post-deploy scripts

One other suggestion from one of my peers:

 

I would look into seeing what size the Pdisks are in:

 

  1. The Guest OS itself
    # /usr/pde/bin/tvam -display -config
    << See if the "Numblocks" is the same on each device  >>
  2. The vSphere level
    << Take a look at the size of the datastore(s) you are using. Also check whether you have enough available space in your datastore(s) >>
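The guest-side comparison in step 1 could be automated along these lines. This is a sketch under assumptions: the exact output format of `tvam -display -config` is not shown in this thread, so the filter below simply assumes that each device's block count appears on the same line as the word "Numblocks" (in any letter case), and it relies on the `-o` option of GNU grep. The function name `distinct_numblocks` is made up for illustration.

```shell
# Hypothetical helper, not part of the Teradata tooling.
# Reads text on stdin and prints each distinct number that follows the
# word "Numblocks" (any letter case) on the same line, one per line.
# ASSUMPTION: the count sits on the same line as the label; adjust the
# pattern if the real tvam output is laid out differently.
distinct_numblocks() {
    grep -io 'numblocks[^0-9]*[0-9][0-9]*' | grep -o '[0-9][0-9]*$' | sort -u
}

# Example usage (commented out so that sourcing this file has no effect):
#   /usr/pde/bin/tvam -display -config | distinct_numblocks
```

If the pipeline prints more than one value, the devices do not all report the same number of blocks.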
Teradata Employee

Re: Issue on post-deploy scripts

Henrique,

 

I tried deploying the free Developer Tier with settings almost the same as yours (except for things I needed to change, like name, IP address, DNS, time zone, etc.) and it deployed just fine.

 

Arnie

Enthusiast

Re: Issue on post-deploy scripts

Thanks for the feedback. I will perform the steps you suggested as soon as I get some free time here.


 

Enthusiast

Re: Issue on post-deploy scripts

Sorry to hijack this thread, but has there been a resolution to the second part, the Configure step?

 

I'm having the same issue, but I never experienced any of the other problems from the original post. Deploy works until we get to Configure, and then I get the same error as the OP. Even the sequencer_pdeconfig.log contents are similar. I've run this and deployed the VM multiple times, but it's always the Configure step that gets stuck, and I can't start up Teradata.

 

I also ran TVAM and got the same NumBlocks value, 209712384, on both disks (the default setting of 100GB in the configs). Cleaning up the gdos/temp files didn't help (and it shouldn't matter anyway, since this was the first deployment of a new VM).
We're somewhat stuck deploying this to test Teradata, so any help or pointers would be appreciated.
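
For reference, the NumBlocks value above can be converted to a size with a small helper. This is a sketch that assumes 512-byte blocks; the thread does not state the block size TVAM uses, so treat that as an assumption. The function name `blocks_to_mib` is made up for illustration.

```shell
# Hypothetical helper: convert a block count to whole MiB.
# ASSUMPTION: one block is 512 bytes (not confirmed anywhere in this thread).
blocks_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

# Example usage (commented out so that sourcing this file has no effect):
#   blocks_to_mib 209712384
```

Under that assumption, 209712384 blocks comes to 102398 MiB, just under 100 GiB (102400 MiB), which would be consistent with the 100GB default disk size.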

 

Thanks

Teradata Employee

Re: Issue on post-deploy scripts

I am checking with engineering for you. Can you attach some pictures of your VM environment, such as the datastores you are using along with the available space on each datastore? A picture of which VMs are running on the node where you are deploying this system would also help. Did you remember to delete the bad VM before redeploying a new VM?

Enthusiast

Re: Issue on post-deploy scripts

Hi,

 

I can't access the info right now, but we have an NFS datastore with 30TB free. The VM and its disks are hosted on the same datastore. We have around 6 VMs on the ESX host, but we stay below its 40 logical processors (36 when deploying).

 

This is just a single-node Teradata system. I never used the script to remove the VM; I just delete it through VMware's Delete VM option (disks included).

 

Thanks for replying! I will try to attach my configuration files and more.

Enthusiast

Re: Issue on post-deploy scripts

I am having the same (or substantially similar) issue as in the original post as well, but I had been posting about it in the wrong forum (Database forum).  However, I'm trying to deploy a two-node system.

 

Part of the problem seems to be that after sysinit completes and the DBS restarts, CPU utilization goes up to 100% on both systems, they become unresponsive, and DIP never gets kicked off.

 

Happy to provide any info that might be helpful...

 

Thanks!

Teradata Employee

Re: Issue on post-deploy scripts

The original post has to do with an error being detected in the "Clique/Amp Modeling" step, which is likely a storage issue related to the datastore or the pdisks. The issue you are describing with the high CPU utilization sounds like it might be caused by the FSG cache being set too high for the amount of memory you are deploying. One thing that might help is to increase the amount of memory for the initial deployment or, if the system is accessible from Linux, to try lowering the FSG cache to maybe 80% to 85% if it is higher than that. If you can post your errors, logs, and screenshots, we can take a look and see if we can help you. If you can also provide the contents of your two property files, we can look at those settings as well. If the error message is different, please start a new post.