Tuesday, May 22, 2012

vC Ops - Basic vApp Troubleshooting

vCenter Operations Manager 5 has generated a lot of interest, and many people are deploying a trial in their own environments to give it a test run. I highly recommend this: there's nothing like using vC Ops with your own data to see what it can do and how it can help you.

However, I am seeing some common issues that have caused some trials to come to a halt.  For the most part, these are easily addressed and prevented.  I've put together a basic troubleshooting list to help make your trial or proof of concept run more smoothly.  Comments welcome!

Update to 5.0.2. This release adds several more bug fixes and also includes the 5.0.1 fixes noted below.

Update to 5.0.1. This version introduces an automatic fix for the "broken vApp": the repair now runs automatically when the vApp detects that the VMs can no longer communicate. If you update from 5.0 to 5.0.1, you'll be prompted for reconnection information the first time you log into the Admin UI after a connectivity issue. On a fresh 5.0.1 install, you'll be prompted the first time you log into the Admin UI after deployment.

  • Upgrading to 5.0.1 fixes several stability and performance issues
  • 5.0.1 removes the need to run a manual repair when IP addresses change on the VMs

Shutting down/starting up. The vApp is designed to start and stop the VMs in the proper sequence. Always shut down the ENTIRE vApp, even if you only want to modify one of its VMs (the UI VM, for example).

  • Always use the vApp to start the VMs by selecting the vApp in vCenter and then clicking the "green arrow" start button as you would with a VM.
  • Always use the "Power Off" button to shut down the vApp. Even though it sounds like it would cut power to the VMs, it actually does a safe shutdown.
  • Failure to do so will (eventually) cause DB corruption.
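
To make "proper sequence" concrete, here is a minimal Python sketch of ordered start-up and reverse-order shutdown. It is only an illustration: the VM names and the order shown (Analytics VM first, UI VM second) are assumptions, so check the Start Order settings in your own vApp's properties, and the start/stop callables are hypothetical stand-ins for whatever actually powers the guests on and off.

```python
from typing import Callable, List

# Hypothetical VM names and start order; check the Start Order settings
# of your own vApp for the real sequence.
START_ORDER: List[str] = ["analytics-vm", "ui-vm"]


def power_on(order: List[str], start: Callable[[str], None]) -> List[str]:
    """Start each VM in the given order and return the sequence used."""
    started: List[str] = []
    for vm in order:
        start(vm)  # a real runner would wait here until the guest is fully up
        started.append(vm)
    return started


def power_off(order: List[str], stop: Callable[[str], None]) -> List[str]:
    """Stop VMs in the reverse of the start order and return the sequence used."""
    stopped: List[str] = []
    for vm in reversed(order):
        stop(vm)  # meant to be a guest shutdown, not a hard power cut
        stopped.append(vm)
    return stopped


if __name__ == "__main__":
    power_on(START_ORDER, lambda vm: print("starting", vm))
    power_off(START_ORDER, lambda vm: print("shutting down", vm))
```

Stopping in the reverse of the start order is what keeps the two VMs consistent with each other. Powering a single VM on or off by hand bypasses that ordering, which is why the whole vApp should always be started and stopped as a unit.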

IP Pools setup. The vApp uses the IP Pool to set up a VPN tunnel between its two VMs. Here are some settings that often get overlooked or misconfigured:

  • Do not enable the IP Pool; I've seen this cause issues with IP address assignment for the VMs in the vApp.
  • You only need to configure the subnet, mask and gateway on the IPv4 tab. An IP pool range is not required (and you won't be able to edit it unless the IP pool is enabled).
  • Use the DNS tab on the IP Pool properties to configure your DNS servers.  
  • In general, do not use the "blue console" to configure networking, even though it offers an option to do so. Always use the IP Pool and vApp properties to manage networking.
  • Always use static IP addresses. Even in a well-run DHCP environment, an IP address change creates havoc.
  • Always change IP addresses by shutting down the vApp and editing the vApp properties.
  • Verify network connectivity between the first and second VM. From the console of the UI VM, ping "secondvm-internal" and "secondvm-external" to check that all is well. If either ping fails, you very likely need to run the repair (see the section on the vcops-admin command below). If the pings work but something is still wrong, simply shut down and restart the vApp to get things working again.
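
Two of the checks above, that the gateway actually lies inside the subnet/mask entered on the IPv4 tab and that both peer names answer a ping, can be sketched in Python. This is a minimal sketch, assuming a Python 3 interpreter (which the appliance may not ship with) and Linux iputils ping syntax; the function names are hypothetical and not part of vC Ops, and only the two host names come from this post.

```python
import ipaddress
import subprocess
from typing import Callable, Dict, Iterable

# Host names from this post, pinged from the UI VM console.
PEERS = ("secondvm-internal", "secondvm-external")


def gateway_in_subnet(subnet: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway lies inside subnet/netmask (the IPv4 tab values)."""
    network = ipaddress.IPv4Network(f"{subnet}/{netmask}", strict=False)
    return ipaddress.IPv4Address(gateway) in network


def ping_once(host: str) -> bool:
    """Send one echo request with Linux ping (-c 1: one packet, -W 2: wait 2 s)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # no ping binary available, or the command hung
    return result.returncode == 0


def check_peers(
    peers: Iterable[str] = PEERS, ping: Callable[[str], bool] = ping_once
) -> Dict[str, bool]:
    """Map each peer host name to whether it answered a ping."""
    return {host: ping(host) for host in peers}


if __name__ == "__main__":
    results = check_peers()
    for host, ok in results.items():
        print(f"{host}: {'reachable' if ok else 'UNREACHABLE'}")
    if not all(results.values()):
        print("At least one peer is unreachable; consider running the repair.")
```

The ping function is passed in as a parameter so the reporting logic can be exercised without a live vApp. For example, a gateway of 192.168.11.1 with a 192.168.10.0 / 255.255.255.0 pool is flagged as outside the subnet.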

Disk space. The sizing options offered when you deploy the vApp are pretty helpful, and many customers select Small because it is rated for up to 1500 VMs. That figure is only an estimate: I have seen customers with as few as 1000 VMs run out of space within 90 to 180 days. Running out of space can cause a number of issues, depending on which VM has used up all the space on its /data partition. I have seen this manifest itself in a few different ways, so I now check disk space as a matter of routine.

  • Check the /data partition on both the UI and Analytics VMs using df -h.
  • If you need more storage, first shut down the vApp properly (Power Off; trust me, that's the right way), then add another VMDK to the VM(s). When you start the vApp, the VM automatically adds the new disk space to the /data partition.
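
The df -h check, plus a rough estimate of how long the remaining space will last, can be sketched in Python. This is a minimal sketch, assuming a Python 3 interpreter is available (the appliance may not have one, in which case you can feed the df numbers in by hand); the 80% threshold, the function names and the linear growth model are my own illustrative assumptions, not anything vC Ops provides. Measure the daily growth yourself by comparing two readings taken a day apart.

```python
import shutil

# Hypothetical alert threshold; pick whatever margin suits your environment.
WARN_PERCENT = 80.0


def percent_used(total: int, used: int) -> float:
    """Percentage of a filesystem that is in use."""
    return 100.0 * used / total


def days_until_full(free_bytes: int, growth_bytes_per_day: float) -> float:
    """Linear projection of the days left before free space runs out."""
    if growth_bytes_per_day <= 0:
        return float("inf")  # not growing: no projected fill date
    return free_bytes / growth_bytes_per_day


def check_partition(path: str = "/data", warn_at: float = WARN_PERCENT) -> float:
    """Print usage for the filesystem holding `path` (similar to df -h) and return percent used."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    status = "WARNING" if pct >= warn_at else "ok"
    print(f"{path}: {pct:.1f}% used, {usage.free / 2**30:.1f} GiB free [{status}]")
    return pct


if __name__ == "__main__":
    try:
        check_partition("/data")
    except FileNotFoundError:
        print("/data not found; run this on the UI or Analytics VM")
```

For example, with 100 GiB free and 2 GiB of growth per day, days_until_full projects 50 days. The percentage is computed against total size, so it can differ slightly from the Use% column in df, which excludes reserved blocks.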

The vcops-admin command can be your friend. Run from the UI VM (as the admin user), it provides a set of tools to address the most common issues. Use vcops-admin help for a list of commands and options.

  • The repair command fixes a broken vApp.  That is, if your VMs are not communicating due to an IP address change (because you used DHCP for example) this command will get them talking again.
  • You can use vcops-admin to restart the services on the UI VM. Once you've checked storage and networking, a restart sometimes corrects inaccessible web pages or other odd behavior.
  • Sometimes the admin password is forgotten.  It happens.  Use the password command.
  • You can create a diagnostic bundle for VMware support from vcops-admin if you are not able to access the admin UI via web browser.

Missing scroll bars. Some customers have noticed that certain combinations of Windows and IE 8 cause the scroll bars to disappear from the main view panel of the vSphere UI. If you experience this, try IE 9, Firefox or Chrome (I actually find Chrome works best).

I will likely be updating this post with additional information in the future, so please check back frequently.

*Updated 5/22 - 5.0.1 upgrade
*Updated 7/25 - 5.0.2 upgrade and missing scroll bars
