Monday, December 5, 2016

The Self-Healing Data Center Part 5: Configure an Alert to Trigger the vRO Workflow

Now that we have configured the Translation Shim and vRealize Orchestrator, the next task before testing is to configure vR Ops to send an alert to the shim. This is the final post in this series. Once you complete it, you will be ready to automate any alert you like by creating the appropriate workflow and alert settings.

I am using vR Ops version 6.4 in the steps below, but this should work with any vR Ops version 6.0 or higher. We are also using Endpoint Operations to monitor the state of a service on a Linux OS. Endpoint Operations is NOT required to use this shim; I am only using it because it provides an easy way to trigger an alert by stopping a service on a monitored OS. It also shows that any automated remediation or activity is possible, not just automation of virtual infrastructure.

Saturday, December 3, 2016

The Self-Healing Data Center Part 4: Configuration of the vRO Example Workflow

So far we have discussed the reasoning behind the Translation Shim, and we have installed, configured, started and tested the shim server. In this blog post, we will set things up on the Orchestrator side so that the workflow is ready to go when vR Ops fires an alert.

This blog post assumes you are familiar with vRealize Orchestrator and have some experience importing workflows and working with the HTTP-REST plugin. If these concepts are new to you, please don't be discouraged. Some great references for getting comfortable with Orchestrator are:


HOL-1721-SDC-5 - Introduction to vRealize Orchestrator
Blog from vcoteam.info on using the REST plugin
Postman + vRO = HTTP-REST Plug-in Operations

Let's dive in.

Wednesday, November 30, 2016

The Self-Healing Data Center Part 3: Configuring and Testing the Orchestrator Translation Shim

So far, we've walked through installing the Translation Shims. In this blog post we'll configure the Orchestrator shim for use. The Orchestrator shim is just one of the handful of shims included in the solution, and more are being added by the community. Participation is encouraged!

The Self-Healing Data Center Part 2: Installing the Translation Shims for Automating vR Ops Alerts

In the previous post, I explained some of the capabilities and limitations of vR Ops alert notifications for automation via the REST Notification Plugin. I also introduced the Translation Shims for Log Insight and/or vRealize Operations Manager Webhooks as a solution to these limitations. By the way, as the name indicates, this solution works great with Log Insight webhooks as well! In fact, it was originally created for that purpose, and vR Ops support was added later.

Monday, November 28, 2016

The Self-Healing Data Center Part 1: Using vR Ops with vRO to Automatically Remediate Alerts

If you are a user of vR Ops, you know that it can monitor your infrastructure, server OS, applications and more. But as this commercial suggests, monitoring is only part of the answer. Wouldn't it be much better to have vR Ops attempt some simple fixes before giving up and calling for human intervention?

In this blog post series, I will explain how to activate a vRealize Orchestrator workflow based on a vR Ops alert to fix an issue instead of just alerting you.  First some background.

Wednesday, August 24, 2016

A Postman Collection for Upgrading vR Ops Endpoint Operations Agent via REST API

With the release of vRealize Operations Manager (vR Ops) 6.3 this week, I noticed that I had not updated the Endpoint Operations (EP Ops) agents running in my lab since version 6.1. As a peer pointed out, the 6.3 release notes specifically state that you should upgrade the EP Ops agents to 6.3 before upgrading vR Ops.

As there is already a KB referenced in the release notes, I won't go into the "supported" way to do an agent upgrade. Rather, in this blog post I want to show how I used the Postman REST client to do the upgrades, since customers may wish to use something other than the provided Python script to perform the upgrades in bulk. Postman can generate code snippets in a number of languages, so you can use your favorite automation tool (js, Ruby, shell script, etc.).

The supported method uses the same API call, which is an "internal" API call (meaning it's available but may be changed or removed in the future).

POST /internal/agent/upgrade

The body of the POST includes a payload with three elements (JSON example shown below).

{
  "agentId" : "1432528944061-6735281266450674401-1746278254068293921",
  "fileLocation" : "bundles/6.2.1",
  "agentBundleFile" : "agent-x86-64-win-6.2.1.zip"
}

Note that since this is an internal API endpoint, you need an additional header to permit the operation.

X-vRealizeOps-API-use-unsupported : True
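For readers who want to script this outside Postman, here is a minimal Python sketch that assembles the upgrade request from the pieces above: the path, the unsupported-API header and the three-element payload. It assumes the API is rooted at https://<vrops>/suite-api (the post only gives the relative path), build_upgrade_request is a hypothetical helper name, and authentication is deliberately left out.

```python
import json

# Assumption: the REST API is served under https://<vrops>/suite-api; the post only
# gives the relative path of the internal endpoint.
UPGRADE_PATH = "/internal/agent/upgrade"


def build_upgrade_request(vrops_host, agent_id, file_location, bundle_file):
    """Return (method, url, headers, body) for the internal agent-upgrade call.

    Authentication is not handled here; add whatever your vR Ops instance
    requires (the Postman collection uses the {{user}}/{{pass}} keys).
    """
    url = f"https://{vrops_host}/suite-api{UPGRADE_PATH}"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        # Extra header needed to permit an operation on an internal endpoint.
        "X-vRealizeOps-API-use-unsupported": "True",
    }
    body = json.dumps({
        "agentId": agent_id,
        "fileLocation": file_location,
        "agentBundleFile": bundle_file,
    })
    return "POST", url, headers, body
```

The four returned values map directly onto urllib.request.Request(url, data=body.encode(), headers=headers, method=method) or onto a Postman-generated snippet, so the same helper could be looped over many agent IDs.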

Easy enough. The Postman collection I have created for you contains three REST operations that can be run together to upgrade a single agent. First, grab the collection from the link below and import it into your Postman client (assuming you have Postman installed already).

Upgrade EP Ops Agent Collection

Also, grab the vR Ops environment I have created for the variables used in the collection.

vrops.postman_environment.json

Import that environment into your Postman client and edit the following keys:

{{user}} = vR Ops user name
{{pass}} = vR Ops password
{{vrops}} = vR Ops FQDN or IP address

I'll come back to the other keys in a moment, but first I want to explain why they are required.

As you can see below, the collection first runs a GET that looks up the agentID by searching on the FQDN of the agent's host system, then performs the upgrade for the agent with that ID, and finally checks the status of the agent's upgrade.


Additionally, the POST Upgrade Agent operation has some parameters in the request payload. As you can see, the values for "fileLocation" and "agentBundleFile" are also based on environment variables. The "fileLocation" is the path under the following directory on the vR Ops virtual appliance:

/usr/lib/vmware-vcops/user/plugins/inbound/agent_adapter/conf/plugins/agent_plugins/

That location is where you place the "agentBundleFile" for the upgrade (available from the vR Ops download page). By the way, if you have a cluster deployment, you must place the bundle files on each node in the cluster.
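How "fileLocation" and "agentBundleFile" combine with the appliance directory above can be shown as a small path-building helper. This is a hypothetical helper for illustration (for example, to produce the list of paths to verify on each cluster node), not part of the collection.

```python
from pathlib import PurePosixPath

# Directory on the vR Ops appliance under which "fileLocation" is resolved (from the post).
AGENT_PLUGIN_DIR = PurePosixPath(
    "/usr/lib/vmware-vcops/user/plugins/inbound/agent_adapter/conf/plugins/agent_plugins"
)


def bundle_path_on_node(file_location, bundle_file):
    """Return the full path where bundle_file must exist on every cluster node."""
    # Strip slashes so a leading "/" cannot turn the joined path into a new absolute path.
    return str(AGENT_PLUGIN_DIR / file_location.strip("/") / bundle_file)
```

With the payload values from the JSON example, the 6.2.1 Windows bundle would need to sit at .../agent_plugins/bundles/6.2.1/agent-x86-64-win-6.2.1.zip on every node.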

Now back to the environment keys you need to update.

{{agentFQDN}} = agent's host FQDN (case sensitive)
{{fileLocation}} = truncated path for bundle files
{{agentBundleFile}} = complete filename of agent bundle to use for the upgrade (OS/arch specific).

Example vrops environment values; the value for agentID is updated by the Postman test script
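Postman fills each {{key}} placeholder in the requests from the active environment. The sketch below mimics that substitution so the same environment file could drive a script. It assumes the exported environment uses a "values" list of key/value/enabled entries; load_environment and substitute are hypothetical helpers, not Postman APIs.

```python
import json
import re

# Matches Postman-style {{key}} placeholders such as {{vrops}} or {{agentFQDN}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def load_environment(env_json):
    """Return {key: value} for the enabled entries of an exported environment.

    Assumes the export looks like {"values": [{"key": ..., "value": ..., "enabled": ...}]}.
    """
    data = json.loads(env_json)
    return {
        entry["key"]: entry["value"]
        for entry in data.get("values", [])
        if entry.get("enabled", True)
    }


def substitute(template, env):
    """Replace every {{key}} in template; raises KeyError if a key is not set."""
    return PLACEHOLDER.sub(lambda m: env[m.group(1)], template)
```

Failing loudly on an unset key is a deliberate choice: an empty {{agentFQDN}} would otherwise produce a search that silently matches nothing.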

Once you have set the {{agentFQDN}} value, you can run the collection. The test script on the GET Get Agent Status on Upgrade operation will fail; that is expected, as the upgrade takes a few minutes to complete. You can run that operation independently as often as you wish to confirm that the upgrade succeeded. The test looks for "COMPLETED" in "agent_upgrade_status" within the response. Other values, such as "IN_PROGRESS_DOWNLOADING", "IN_PROGRESS_UPGRADING" and "FAILED TIMED OUT", are not evaluated by the script I provide, but be aware of them if you create a bulk upgrade script that evaluates the upgrade state for remediation or logging.


Example of the agent_upgrade_status in the response body of the Get Update Status operation
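For a bulk upgrade script, the status values listed above can be turned into a polling loop. This minimal sketch assumes agent_upgrade_status is a top-level key of the parsed response; fetch_status is a hypothetical callable that wraps the status GET, and the interval and timeout defaults are arbitrary.

```python
import time

# Status values named in the post; the Postman test only treats COMPLETED as success.
IN_PROGRESS = {"IN_PROGRESS_DOWNLOADING", "IN_PROGRESS_UPGRADING"}


def classify(response):
    """Map a status response to "done", "waiting", "failed" or "unknown".

    Assumes agent_upgrade_status is a top-level key of the parsed JSON body.
    """
    status = response.get("agent_upgrade_status", "")
    if status == "COMPLETED":
        return "done"
    if status in IN_PROGRESS:
        return "waiting"
    if status.startswith("FAILED"):
        return "failed"  # e.g. "FAILED TIMED OUT"
    return "unknown"  # log anything unexpected rather than guessing


def wait_for_upgrade(fetch_status, interval_s=30.0, timeout_s=900.0, sleep=time.sleep):
    """Poll until the upgrade leaves the in-progress states or timeout_s elapses.

    fetch_status is a hypothetical zero-argument callable that performs the
    status GET and returns the parsed JSON body. Returns (state, last_response).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        state = classify(response)
        if state != "waiting":
            return state, response
        if time.monotonic() >= deadline:
            return "timeout", response
        sleep(interval_s)
```

In a bulk run you would call wait_for_upgrade once per agent after its POST and log the returned state, keeping "failed", "unknown" and "timeout" agents for remediation.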

This should give you a general understanding of how you can use the vR Ops REST API to upgrade EP Ops agents.





Monday, August 22, 2016

vR Ops Alert "VMware Virtual Data Service" Is Not Available - What to Do?

I was running vR Ops in my home lab and noticed, thanks to the EP Ops agent, that a couple of VMware services were down. One of them was a bit of a mystery: the "VMware Virtual Data Service", which I will shorten to "vdcs" for this post.


It turns out that this service is responsible for the Content Library and Transfer services. When you look for the service within the vSphere Web Client, you won't find it listed under a friendly name but rather by the "vdcs" abbreviation (the full name is com.vmware.vdcs.cls-main).

I attempted to start this service from the web client. The UI didn't return an error, but when I refreshed, the service was still not running. So, off to the logs! The log file for this service is /var/log/vmware/vdcs/wrapper.log, and there I saw the problem:

ERROR  | wrapper  | 2016/08/22 19:28:26 | 4728 pid file, var/log/vmware/vdcs/vmware-vdcs.pid, already exists.

FATAL  | wrapper  | 2016/08/22 19:28:26 | ERROR: Could not write pid file /var/log/vmware/vdcs/vmware-vdcs.pid: Inappropriate ioctl for device

So, I have an orphaned PID file.  Just to validate, I run

ps -ef | grep vmware-vdcs

to make sure that process isn't running. Then I back up the current PID file, delete it, and start the vdcs service.
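The check-then-delete logic behind these steps can be sketched in Python. This is an illustrative sketch, not the exact commands used here: clear_stale_pid_file and the .bak backup name are hypothetical, and it assumes a POSIX system. Starting the vdcs service afterward is still a separate step.

```python
import os
import shutil


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only).

    Unlike `ps -ef | grep ...`, this cannot match the grep process itself.
    """
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a single PID
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def clear_stale_pid_file(pid_path, is_running=pid_is_running):
    """Back up and delete pid_path if the PID it records is not running.

    Returns the backup path, or None when the PID is live and the file is kept.
    An empty or non-numeric PID file is treated as stale.
    """
    with open(pid_path) as f:
        content = f.read().strip()
    try:
        pid = int(content)
    except ValueError:
        pid = None
    if pid is not None and is_running(pid):
        return None
    backup_path = pid_path + ".bak"
    shutil.copy2(pid_path, backup_path)  # keep a copy before deleting
    os.remove(pid_path)
    return backup_path
```

For this case the argument would be /var/log/vmware/vdcs/vmware-vdcs.pid, the path reported in the FATAL log line.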


Then I give vR Ops about 5 minutes to check on things. My error is now cleared!


Fortunately, I have vR Ops to report this - otherwise I would not have known that this service was down until I needed it to be up!