Thursday, May 11, 2017

vROps and Service Discovery - A New Dashboard for Troubleshooting Service Discovery

While working with the new vRealize Operations Service Discovery Management Pack in the Tech Marketing lab here at VMware, I built a dashboard to help troubleshoot service discovery issues.  I am sharing this in case others find it helpful.

There are a few things that can lead to service discovery failures:

  • Version of tools older than 10.1
  • Tools not running
  • Unmanaged tools that cannot be updated from vCenter
  • Invalid guest mapping credentials
This dashboard helps with isolating each of these problems; let me show you how.
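
For readers who prefer to script the same triage, here is a minimal sketch of the checks listed above. The `VmState` record and its field names are hypothetical and only for illustration; they are not vROps property names, and you would need to populate them from your own inventory data.

```python
from dataclasses import dataclass

# From the list above: tools older than 10.1 can cause discovery failures.
MIN_TOOLS_VERSION = (10, 1)


@dataclass
class VmState:
    # Hypothetical record; field names are illustrative only.
    name: str
    tools_version: str          # e.g. "10.0.5"
    tools_running: bool
    tools_managed: bool         # False if tools cannot be updated from vCenter
    guest_credentials_ok: bool


def parse_version(text):
    """Turn '10.0.5' into (10, 0, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def discovery_blockers(vm):
    """Return the likely reasons service discovery would fail for this VM."""
    reasons = []
    if parse_version(vm.tools_version) < MIN_TOOLS_VERSION:
        reasons.append("tools older than 10.1")
    if not vm.tools_running:
        reasons.append("tools not running")
    if not vm.tools_managed:
        reasons.append("unmanaged tools")
    if not vm.guest_credentials_ok:
        reasons.append("invalid guest mapping credentials")
    return reasons
```

An empty list means none of the four known causes apply to that VM, which is roughly what the dashboard lets you confirm visually.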

The dashboard has six useful sections:

1 - Select the vCenter you wish to troubleshoot.  You can also view the collection state of the vCenter Server here before digging too deep.  If there are problems with vCenter Server, you should address those first.
2 - This list provides all of the SDMP adapter instances; it makes it easy for you to configure excluded services for SDMP.
3 - SDMP Stats scoreboard shows some interesting stats based on the SDMP adapter instance selected in item 2.
4 - The VM Discovery Issues list shows the following:
  • VMtools status
  • VMtools support
  • Discovery method (Guest Alias or the default adapter instance credentials)
5 - VMs with No Service Discovery is a list of VMs for you to work from that have no initial discovery.
6 - Tools Status breakdowns provide the overall number of VMs with tools running/not running and the tools versions installed.

To install the dashboard, follow these steps:

1 - Download the zip file for the dashboard content here. (Click the download button)
2 - Extract the zip file to a suitable location.
3 - In the vROps UI, go to Content > Dashboards and import the SDMPTroubleshootingDashboard.json file.
4 - Now go to Content > Views, import the following:
  • SDMP Discovery Failures
  • SDMP No Success Tools Running
  • SDMP No Success Tools Version
  • SDMP VM No Services Discovered
5 - Finally, open the SDMPTroubleshooting.xml file in an editor (Notepad is fine).  In Content > Manage Metric Config > ReskndMetric, create a new metric config called SDMPScoreboard and paste the contents of the XML file into it, overwriting the default contents.  Save it and you are finished!

Monday, December 5, 2016

The Self-Healing Data Center Part 5: Configure an Alert to Trigger the vRO Workflow

Now that we have configured the Translation Shim and vRealize Orchestrator, the next task before testing is to configure vR Ops to send an alert to the shim.  This will be the final post in this series.  When you complete it, you will be ready to apply this solution to automate any alerts you desire by simply creating the appropriate workflow and alert settings.

I am using vR Ops version 6.4 in the steps below, but this should work with any 6.0 or higher version of vR Ops.  We are also using Endpoint Operations to monitor the state of a service on a Linux OS.  Endpoint Operations is NOT required to use this shim; I am only using it because it provides an easy way to trigger an alert by stopping a service on a monitored OS.  It also shows that any automated remediation or activity is possible, not just automation of virtual infrastructure.

Saturday, December 3, 2016

The Self-Healing Data Center Part 4: Configuration of the vRO Example Workflow

So far we have discussed the reasoning behind the Translator Shim, installed it, configured it, started and tested the shim server.  In this blog post, we will set up things on the Orchestrator side so that the workflow is ready to go when an alert is fired from vR Ops.

This blog post assumes you have familiarity with vRealize Orchestrator and some experience with importing workflows and working with the HTTP-REST plugin.  If these are new concepts for you, please don't be discouraged.  Some great references to get comfortable with Orchestrator are:

HOL-1721-SDC-5 - Introduction to vRealize Orchestrator
Blog post on using the REST plugin
Postman + vRO = HTTP-REST Plug-in Operations

Let's dive in.

Wednesday, November 30, 2016

The Self-Healing Data Center Part 3: Configuring and Testing the Orchestrator Translation Shim

So far, we've walked through installing the Translation Shims.  In this blog post we'll configure the Orchestrator shim for use.  The Orchestrator shim is but one of the handful of shims included in the solution.  More are being added via the community.  Participation is encouraged!

The Self-Healing Data Center Part 2: Installing the Translation Shims for Automating vR Ops Alerts

In the previous post, I explained some of the capabilities and limitations of automating vR Ops alert notifications via the REST Notification Plugin.  I also introduced the Translation Shims for Log Insight and/or vRealize Operations Manager Webhooks as a solution to these current limitations.  By the way, as the name indicates, this solution works great with Log Insight webhooks as well!  In fact, it was originally created for that purpose and vR Ops support was added later.

Monday, November 28, 2016

The Self-Healing Data Center Part 1: Using vR Ops with vRO to Automatically Remediate Alerts

If you are a user of vR Ops, you know that it can monitor your infrastructure, server OS, applications and more.  But as this commercial suggests, monitoring is only part of the answer.  Wouldn't it be much better to have vR Ops attempt some simple fixes before giving up and calling for human intervention?

In this blog post series, I will explain how to activate a vRealize Orchestrator workflow based on a vR Ops alert to fix an issue instead of just alerting you.  First some background.

Wednesday, August 24, 2016

A Postman Collection for Upgrading vR Ops Endpoint Operations Agent via REST API

With the release of vRealize Operations Manager (vR Ops) 6.3 this week, I noticed that I had not updated the Endpoint Operations (EP Ops) agents running in my lab since version 6.1.  As a peer pointed out, the 6.3 release notes specifically state that you should upgrade the EP Ops agents to 6.3 before upgrading vR Ops.

As there is already a KB referenced in the release notes, I won't go into the "supported" way to do an agent upgrade.  Rather, in this blog post I want to show how I used the Postman REST client to do the upgrades, as customers may wish to leverage something other than the provided Python script to perform the upgrades in bulk.  Using Postman, you can generate a number of different code snippets for use with your favorite automation tool (JavaScript, Ruby, shell script, etc.).

The supported method uses the same API call, which is an "internal" API call (meaning it is available but may be changed or removed in the future).

POST /internal/agent/upgrade

The body of the POST includes a payload with three elements (JSON example shown below).

{
  "agentId" : "1432528944061-6735281266450674401-1746278254068293921",
  "fileLocation" : "bundles/6.2.1",
  "agentBundleFile" : ""
}

Note that since this is an internal API endpoint, you need an additional header to permit the operation.

X-vRealizeOps-API-use-unsupported : True
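
If you would rather script this call than use Postman, the request can be sketched in Python using only the standard library. This is a rough sketch, not the supported script: the `/suite-api` base path and the use of HTTP Basic authentication are assumptions on my part, and the function name `build_upgrade_request` is made up for illustration. The function only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request


def build_upgrade_request(vrops, user, password, agent_id, file_location, bundle_file):
    """Build (but do not send) the POST for the internal agent upgrade call."""
    # ASSUMPTION: the API lives under /suite-api on the vR Ops host.
    url = f"https://{vrops}/suite-api/internal/agent/upgrade"

    # The three payload elements described above.
    payload = {
        "agentId": agent_id,
        "fileLocation": file_location,
        "agentBundleFile": bundle_file,
    }

    # ASSUMPTION: HTTP Basic authentication with the vR Ops user and password.
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Basic {credentials}",
        # Required because this is an internal (unsupported) endpoint.
        "X-vRealizeOps-API-use-unsupported": "True",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Keeping the request construction separate from sending it makes it easy to inspect the URL, headers and body before anything touches a real agent.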

Easy enough, and in the Postman collection I have created for you there are three REST operations that can be run together to perform an upgrade on a single agent.  First, grab the collection from the link below to import into your Postman client (assuming you have Postman installed already).

Upgrade EP Ops Agent Collection

Also, grab the vR Ops environment I have created for the variables used in the collection.


Import that environment into your Postman client and edit the following keys:

{{user}} = vR Ops user name
{{pass}} = vR Ops password
{{vrops}} = vR Ops FQDN or IP address

I'll come back to the other keys in a moment, but I want to explain why they are required.

As you can see below, the collection includes a GET for the agentID based on a search against the FQDN of the agent's host system, then performs the update for the agent based on that ID, and finally checks the status of the agent's update.

Additionally, the POST Upgrade Agent operation has some parameters in the request payload.  As you can see, the values for "fileLocation" and "agentBundleFile" are based on environment variables as well.  The "fileLocation" is the path under the following directory structure on the vR Ops virtual appliance:


That location is where you will place the "agentBundleFile" for the upgrade (available from the vR Ops download page).  By the way, if you have a cluster deployment then you must have the bundle files installed on each node in the cluster.

Now back to the environment keys you need to update.

{{agentFQDN}} = agent's host FQDN (case sensitive)
{{fileLocation}} = truncated path for bundle files
{{agentBundleFile}} = complete filename of agent bundle to use for the upgrade (OS/arch specific).

Example vrops environment values - the value for agentID is updated by the Postman tests script
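
To make the role of these keys concrete, here is a small sketch that mimics how the {{key}} placeholders in the request payload are filled in from the environment. It is only an illustration of the substitution idea, not Postman's actual implementation, and the values in the usage example are placeholders rather than real file names.

```python
import re


def render(template, env):
    """Replace each {{key}} in the template with its value from env."""
    def lookup(match):
        key = match.group(1)
        # Unknown keys are left untouched so a missing value is easy to spot.
        return env.get(key, match.group(0))

    return re.sub(r"\{\{(\w+)\}\}", lookup, template)


# The payload template, with the same keys as the environment above.
PAYLOAD_TEMPLATE = (
    '{"agentId" : "{{agentID}}", '
    '"fileLocation" : "{{fileLocation}}", '
    '"agentBundleFile" : "{{agentBundleFile}}"}'
)
```

For example, `render(PAYLOAD_TEMPLATE, {"agentID": "1234", "fileLocation": "bundles/6.2.1", "agentBundleFile": "my-bundle-file"})` returns the payload with all three placeholders replaced.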

Once you have the {{agentFQDN}} value set, you can run the collection.  The test script on the GET Get Agent Status on Upgrade operation will fail, and that is to be expected as the upgrade will take a few minutes to complete.  You can run that operation independently as often as you wish to validate the success of the upgrade.  The test is looking for "COMPLETED" in "agent_upgrade_status" within the response.  Other values, such as "IN_PROGRESS_DOWNLOADING", "IN_PROGRESS_UPGRADING" and "FAILED TIMED OUT", are not evaluated in the script I provide, but be aware of them if you create a bulk upgrade script that evaluates the upgrade state for remediation or logging.

Example of the agent_upgrade_status in the response body of the Get Update Status operation
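
If you do build a bulk upgrade script, the status handling could look something like the sketch below. It covers only the status values mentioned above. Because I am not assuming where "agent_upgrade_status" sits in the response, the helper searches the parsed JSON for the first occurrence of that key; the function names and the "done", "pending", "failed" and "unknown" labels are my own.

```python
import json


def find_status(node, key="agent_upgrade_status"):
    """Depth-first search of parsed JSON for the first value stored under key."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_status(child, key)
        if found is not None:
            return found
    return None


def classify(status):
    """Map an agent_upgrade_status value to a coarse outcome for a script."""
    if status == "COMPLETED":
        return "done"
    if isinstance(status, str) and status.startswith("IN_PROGRESS"):
        return "pending"   # e.g. IN_PROGRESS_DOWNLOADING, IN_PROGRESS_UPGRADING
    if isinstance(status, str) and status.startswith("FAILED"):
        return "failed"    # e.g. FAILED TIMED OUT
    return "unknown"       # a value not mentioned in this post


def upgrade_outcome(response_text):
    """Parse a response body and classify the upgrade state it reports."""
    return classify(find_status(json.loads(response_text)))
```

A bulk script could keep polling agents that return "pending", log the ones that return "failed", and treat "unknown" as something to look at by hand.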

This should give you a general understanding of how you can use the vR Ops REST API to upgrade EP Ops agents.