Friday, September 12, 2014

Automating a Multi-Action Security Workflow with NSX, vShield Endpoint and vCO

Hadar Freehling, one of my esteemed peers here at VMware, and I co-developed a proof-of-concept workflow for a network security use case.  Hadar created a short video showing and explaining the use case, but in summary this is a workflow that reacts to and remediates a security issue flagged by a third-party integration with NSX (in the video, Trend Micro is used but it could be any other partner integration with vShield Endpoint).  Basically, here's what happens:

  • a virus is detected on a VM and is quarantined by the AV solution
  • the AV solution tags the VM with an NSX security tag
  • NSX places the VM in a new Security Group, whose network policies steer all VM traffic through an IPS
  • vCenter Orchestrator monitors the security group for changes and, when a VM is added:
    • a snapshot of the VM is taken for forensic purposes
    • a vSpan session (RSPAN) is set up on the Distributed Virtual Switch to begin capturing inbound/outbound traffic on the VM
    • once the VM has been removed from the security group, the vSpan session is removed
Watch the video below for a walk-through by Hadar:



You will note that a portion of the workflow is handled natively by NSX (Security Tag reaction, Security Group policy) but the snapshot and RSPAN are done via a vCO workflow.

If you are interested in exploring this capability, I have provided the vCO workflow package for download.  This is provided as-is and you should fully test it (and modify as needed) before using in your environment.

To use the workflow package you will need (assuming you also have NSX, vShield Endpoint and some third party integration already set up):

- vCO 5.5.2
- the NSX plugin for vCO (installed and configured)
- the REST plugin with your NSX manager added as a REST host
- vCenter plugin configured

The workflow package includes a good number of "helper" workflows which you will not need to run directly.  The master workflow is in the root folder Security Reaction and is named "Set up VM Forensics RUN THIS" (just in case you had any doubt as to which one to run).

The Security Reaction Master Workflow

Running the master workflow will prompt you for three items -

  - The NSX Security Group to monitor.  This is why the NSX plugin is required, so that you can browse the vCO managed objects and locate the desired Security Group.
  - A time to sleep in seconds.  The master workflow will run continuously until manually stopped and will use a REST call to NSX to get the current membership for the Security Group.  We have no recommendation on this poll time, although in testing we used 5-10 seconds.  It would have been better to use some external event to kick off the vCO workflow but we could not find a way to do this from NSX.  It may be possible to do via the partner solution, but we wanted this workflow package to be "partner neutral."
  - Destination IPv4 address.  This is the destination for the RSPAN (or vSpan session in vSphere API terms).  The vSpan session is created with some defaults (for example sampling rate, normal traffic allowed, etc.).  If you want to change any of those properties, you will need to modify the Helper workflow named "Configure encapRemoteMirrorSource vSpan Session on DVS" (modify the "Create Port Mirror" script task).
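
The polling behaviour described above can be sketched roughly as follows.  This is a minimal illustration in Python, not the vCO workflow itself: fetch_members, on_added, on_removed and max_polls are hypothetical names, and the real workflow obtains membership through a REST call to NSX and runs until it is stopped manually.

```python
import time
from typing import Callable, Iterable, Set, Tuple


def diff_membership(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) VM IDs between two consecutive polls."""
    return current - previous, previous - current


def monitor(fetch_members: Callable[[], Iterable[str]],
            on_added: Callable[[str], None],
            on_removed: Callable[[str], None],
            sleep_seconds: float,
            max_polls: int) -> None:
    """Poll group membership and react to changes.

    fetch_members stands in for the REST call to NSX (hypothetical).
    max_polls exists only so this sketch terminates; the real workflow
    loops until it is stopped by hand.
    """
    known: Set[str] = set()
    for _ in range(max_polls):
        current = set(fetch_members())
        added, removed = diff_membership(known, current)
        for vm in sorted(added):
            on_added(vm)      # e.g. take snapshot, create vSpan session
        for vm in sorted(removed):
            on_removed(vm)    # e.g. remove the vSpan session
        known = current
        time.sleep(sleep_seconds)
```

Note that in this sketch any VM already in the group at the first poll is treated as newly added; whether that is desirable depends on how you want the workflow to behave on restart.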

Also note that this workflow doesn't support VMs with multiple vNICs - specifically, it will only create an RSPAN that includes the first vNIC found on a VM.  You can modify the Helper workflow "Implement Forensics" and adjust the script task "Prep for Mirror Creation" so that the additional NICs (if any) are added to the sourcePorts array.  It's something we intended to fix but forgot about until after our final testing and video production - so as they say in the textbooks "this is left as an exercise for the reader."
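
As a starting point for that exercise, the idea is simply to collect a port for every vNIC rather than stopping at the first one.  The sketch below shows the logic in Python under stated assumptions: the real script task is JavaScript inside vCO, and the dictionary layout with a "portKey" entry is a hypothetical stand-in for whatever the "Prep for Mirror Creation" task actually reads from each NIC.

```python
from typing import Dict, List


def collect_source_ports(vnics: List[Dict[str, str]]) -> List[str]:
    """Return the distributed port key of every vNIC, not just the first.

    Each vNIC is modelled as a dict with an optional "portKey" entry
    (hypothetical layout, for illustration only).
    """
    source_ports: List[str] = []
    for nic in vnics:
        port_key = nic.get("portKey")
        if port_key:  # skip NICs that are not attached to a distributed port
            source_ports.append(port_key)
    return source_ports
```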

Of course, there are many other actions that can be taken besides setting up an RSPAN and getting a snapshot.  This solution can be extended to practically any task required during such an event such as creating a ticket in your service desk software, spinning up additional workloads to replace the compromised VM, sending emails, guest OS file system operations... all of these and more can be accomplished using vCO in conjunction with NSX.

Hadar Freehling - @dfudsecurity - is a Security and Compliance Systems Engineer Specialist with VMware and jointly contributed to this solution and blog post.

Thursday, September 11, 2014

Install Gotcha: vCAC, Windows Server 2012 and the Guest Agent

If you are using Windows Server 2012 or later for your IaaS install it is recommended that you disable TLS1.2 on the IIS server.  From the vCAC 6.1 install guide (IaaS Windows Server Requirements):


For certificates using SHA512, TLS1.2 disabled on Windows 2012 machines

I have found that if you use self-signed certificates, you will absolutely need to follow this requirement - otherwise you will have deployments that utilize the Guest Agent stuck at "CustomizeOS" state and never finish deployment.  The Guest Agent start up script uses OpenSSL to grab the IaaS server certificate and this fails for self-signed certs over TLS1.2.

The security protocol settings are available in the registry only.  Fortunately, you can use this handy utility to manage your protocol settings on IIS instead of hunting through the registry.  Or, if you like, refer to Microsoft KB 245030 for the officially supported method.  Essentially, both will change the reg key as shown below....
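
For reference, the protocol settings live under the SCHANNEL Protocols key described in the Microsoft KB.  The following Python sketch only generates the text of a .reg fragment that sets Enabled to 0 for a given protocol and role; it does not touch the registry.  The function name is hypothetical, and you should confirm the exact key and value against the KB before importing anything.

```python
def disable_protocol_reg(protocol: str = "TLS 1.2", role: str = "Server") -> str:
    """Build .reg file text that disables one SCHANNEL protocol for one role.

    Assumption: disabling is done by setting the DWORD "Enabled" to 0 under
    ...\\SCHANNEL\\Protocols\\<protocol>\\<role>; verify against the KB.
    """
    key = ("HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\"
           "SecurityProviders\\SCHANNEL\\Protocols\\" + protocol + "\\" + role)
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[" + key + "]\n"
        "\"Enabled\"=dword:00000000\n"
    )


if __name__ == "__main__":
    print(disable_protocol_reg())
```

A reboot of the IIS server is normally needed before SCHANNEL changes take effect.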




Monday, September 1, 2014

Using a Cloned VM as a SQL Server - Gotcha for vCAC Install

Installing the newest version of vCAC in a lab, I ran into an issue I thought only I would encounter - turns out a peer ran into the very same issue a couple of days later so I thought I would post the problem and solution.

In our case we were both installing a new (as yet unreleased) version of vCAC with a separate SQL server for the IaaS database component.  The IaaS Windows server and SQL server were cloned from the same base image.  By the way, this issue isn't related to vCAC or a particular version - you could really see this with other products.  It's a known issue with MSDTC and VM clones.

The installation of the IaaS component goes fine, and you can even configure the tenant, add fabric groups, vSphere endpoints and business groups - but then things get weird.  You will likely see that in your vSphere reservation, the memory, storage and network are basically empty, as if nothing has been collected.  In fact, if you go and look at the collection status for the compute resource you will see that the Inventory and State collections are not even showing up as configured (neither "on" nor "off").

Finally, you will see these types of entries in the IaaS vCAC server log (you can also see them in the vCAC UI under Infrastructure > Monitoring > Log) -

CollectedDataImportService: Ignoring exception: Error executing query usp_SelectManagementEndpoint Inner Exception: Error executing query usp_SelectEntityProperties
Error processing ping response Error executing query usp_SelectAgent Inner Exception: Error executing query usp_SelectAgentCapabilities
DataBaseStatsService: ignoring exception: Error executing query usp_SelectAgent Inner Exception: Error executing query usp_SelectAgentCapabilities
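
If you want to check an exported log for this pattern, a small scan like the one below can help.  It is only a sketch in Python: the regular expression is derived from the three sample lines above, and a match does not by itself prove an MSDTC problem, since other database connectivity issues could produce similar query errors.

```python
import re
from typing import Iterable, List

# Matches the stored procedure name in "Error executing query usp_..." messages.
QUERY_ERROR = re.compile(r"Error executing query (usp_\w+)")


def failed_procedures(log_lines: Iterable[str]) -> List[str]:
    """Return the distinct stored procedures named in query errors, in first-seen order."""
    seen: List[str] = []
    for line in log_lines:
        for proc in QUERY_ERROR.findall(line):
            if proc not in seen:
                seen.append(proc)
    return seen
```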

The cause is that both the IaaS and SQL VMs were cloned from the same image.  If MSDTC was already installed in that image, both VMs will have the same GUID for their MSDTC nodes and the communication between them will fail.  This assumes you don't have other issues such as firewall configuration problems between the two VMs.

To correct this, simply uninstall and re-install MSDTC on one of the VMs (I did this on the IaaS server) and restart the affected service (for example vCAC Server service on IaaS or SQL Server).  From an elevated command prompt:

msdtc -uninstall
msdtc -install

Re-configure the MSDTC Security settings as you would for the IaaS install.

That should allow collections to run and your reservations will reflect the correct memory, storage and networking information.

UPDATE - you will need to make sure that MSDTC is configured on both the IaaS server and the SQL server for a distributed install.  (Thanks to Steve Kaplan for pointing this out)

Thursday, August 14, 2014

How to Modify or Disable the Idle Session Timeout on Application Director 6

The Application Director UI is set by default to log you out after 60 minutes of inactivity.  This is a good security practice, but you may wish to modify this time to a longer or shorter period - or even disable it.

As described in an earlier post on modifying the session timeout for the vCAC 6.0 UI you simply need to modify the following file on your AppD virtual appliance. 

Edit the file

/home/darwin/tcserver/darwin/webapps/darwin/WEB-INF/web.xml

and find the following section


The timeout value is in minutes.  Changing it to -1 effectively disables the idle timeout.
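
If you would rather script the change than edit the file by hand, something like the following Python sketch could be used on a copy of web.xml.  It assumes the file uses the standard servlet session-timeout element (the original snippet is not reproduced in this post), and the function name is hypothetical.

```python
import re

# Standard servlet deployment descriptor element; assumed to be what web.xml uses.
_TIMEOUT = re.compile(r"(<session-timeout>)\s*-?\d+\s*(</session-timeout>)")


def set_session_timeout(web_xml: str, minutes: int) -> str:
    """Return web.xml text with the session timeout replaced by `minutes`.

    Raises ValueError unless exactly one session-timeout element is found,
    so an unexpected file layout is not silently modified.
    """
    new_xml, count = _TIMEOUT.subn(
        lambda m: f"{m.group(1)}{minutes}{m.group(2)}", web_xml)
    if count != 1:
        raise ValueError(f"expected exactly one <session-timeout>, found {count}")
    return new_xml
```

A text substitution is used instead of an XML parser so that the rest of the file, including comments and namespace declarations, is left byte-for-byte unchanged.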

WARNING - always back up your AppD appliance when making changes.  Also understand that this is not a supported configuration change so use at your own risk.

Monday, July 14, 2014

vCAC Inflate a Thin Disk

I had a customer contact me this week to ask about a vCAC custom property setting that didn't seem to be working.  The background is, they wanted to have all templates staged as thin provisioned but on deployment they would like them to be thick.

**UPDATE** Turns out that the custom property below does work for me, in my lab.  I had placed it initially in the storage property set of the blueprint, instead of as a blueprint property.  So, at least for me, it does work but the solution below may be helpful for other use cases (like a resource action to allow a machine owner to inflate a thin disk).

**UPDATE 2** Sorry that this issue is a moving target, but after looking at this with my customer it seems that the issue is related to Storage DRS in some way.  I'll update this post as I learn more.

What they expected (as I did) is that the custom property VirtualMachine.Admin.ThinProvision set in the blueprint with a value of "false" would deploy the machine's VMDKs as thick.  Just a side note, if you deploy from template in the vSphere client, you are given the option to select the virtual disk format (i.e. "Same format as source" or thick, thin).

However, it seems that this custom property only works with new disks that aren't already part of the template.  This is what my customer was experiencing - the OS drive was deploying thin but any drives added during request time were deployed as thick.

You can "inflate" a thin VMDK by browsing to it in the datastore browser and right clicking.  However, it occurred to me that this could be used as a workaround for my customer using vCO and the vSphere plugin.  So, I wrote a simple action that will inflate a VMDK if you provide the vSphere virtual machine object and the uuid of the VMDK - both bits of info are available using the vCAC Extensibility workflows in vCO.
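
The lookup half of such an action can be sketched as below.  This is Python for illustration only, not the vCO action itself: it assumes each device exposes backing.uuid and backing.fileName as virtual disk backings do in the vSphere API, and the dash and case normalisation is my own assumption.  The inflate step would then be handed the returned path (in the vSphere API this is the VirtualDiskManager InflateVirtualDisk_Task operation).

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_vmdk_path(devices: Iterable[object], disk_uuid: str) -> Optional[str]:
    """Return the datastore path of the disk whose backing uuid matches, else None.

    Devices without a backing, uuid or fileName (NICs, CD drives, ...) are skipped.
    """
    wanted = disk_uuid.replace("-", "").lower()
    for dev in devices:
        backing = getattr(dev, "backing", None)
        uuid = getattr(backing, "uuid", None)
        file_name = getattr(backing, "fileName", None)
        if uuid and file_name and uuid.replace("-", "").lower() == wanted:
            return file_name
    return None


if __name__ == "__main__":
    # Stand-in objects; a real call would pass the VM's hardware device list.
    disk = SimpleNamespace(backing=SimpleNamespace(
        uuid="6000C29A-0000-0000-0000-000000000001",
        fileName="[datastore1] vm01/vm01.vmdk"))
    print(find_vmdk_path([disk], "6000c29a-0000-0000-0000-000000000001"))
```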

Wednesday, May 14, 2014

A Couple of Gotchas Using Out of the Box Content in Application Director

I thought I would share these - not major issues if you know about them. During a recent POC these came to light and hopefully will save others some time and frustration if you are using any of the OOTB (out of the box) content (services, application blueprints, scripts, etc). As I find other gotchas I may add them to this list.

Service - Microsoft .Net Framework 4.0

THE PROBLEM - Service exits with a non-zero errorlevel causing deployment to fail.

DESCRIPTION - I attempted to use this OOTB service to install .Net 4.5, thinking I could just substitute the content property "DOTNETFX40_EXE" and the DOTNET_VERSION property with the newer version (which WILL work - that's not the bug).  However, what I discovered was that the .Net installer returns a non-zero errorlevel (for reboot required) and the INSTALL script is a somewhat elaborate cmd file that traps this error and attempts to exit with an errorlevel 0.

The reason for this is that you really want AppD to handle the reboots so that the deployment workflow is resumed properly when the system comes back up.

But, as well intended as this install cmd script is, it fails to exit with an errorlevel 0.  This is because of the way batch/cmd scripts handle vars.  Basically, the manipulation of the errorlevel works but once the conditional block is closed the original errorlevel is reset, the script exits with a "non-zero" and the AppD workflow fails.

MY FIX - I removed the script's conditional block for trapping the non-zero errorlevel and added "set errorLevel=0" as the last line in the script.  My script looks like this -

@echo off
if exist %WINDIR%\Microsoft.NET\Framework\%DOTNET_VERSION% (
  echo Found %WINDIR%\Microsoft.NET\Framework\%DOTNET_VERSION%, the .NET framework of interest appears to have already been installed.
  echo.
  echo Skip .NET 4.0 installation.
) else (
  echo Installing .Net Framework.
  start "Install .NET Framework" /wait "%DOTNETFX40_EXE%" /q /norestart
)
echo Installation Completed.
echo This lifecycle always reboots. Rebooting now...
REM log the error level returned by the .Net installer for troubleshooting
echo Errorlevel NOW set to %errorLevel%
REM always exit with a zero
set errorLevel=0

Service - vFabric tc Server v2.7.1

THE PROBLEM - tcServer does not install and deployment fails.

DESCRIPTION - The required properties of the service were provided (or catalog values used where appropriate).  However, the install would fail with the error that the EXTERNAL_TEMPLATE could not be found - and that property is NOT shown as required.  This property is referenced in the CONFIGURE script and you can see on line 36 of that script there's a conditional check for the property and if it's populated then it is used - otherwise, nothing is done.  I believe the problem is with the script itself in that a NULL value for that property isn't evaluated as intended.

MY FIX - Really just put in any value there.  I noticed that the OOTB jPetStore Blueprint sets the value to the darwin_global.conf path (as used in the global_conf property) and that seems to work just fine.  Of course, if you actually HAVE an external template that path should be used.  Ideally, you could modify the service and set the property there so you could just use the catalog value each time.

Monday, May 5, 2014

Use vCAC Static IP Without vCenter Customization Spec

You are probably aware that vCAC has a nice little IP Address Management (IPAM) capability built in (referred to as "Static IP" in the documentation) that allows you to create IP pools and settings with Network Profiles that can be associated to Network Paths in your reservations.  If so, you're also aware that using this out of the box for VMware virtual machines requires you to use VM templates and vCenter Customization Specifications*.

However, deploying a Windows VM with a customization spec adds time and if all you really want is the IP address assignment it can be annoying to have to use a customization spec.  In fact, the reason I'm posting this information is because I had a request from a customer to speed up the provisioning time while still using the Static IP feature.  The use case was they simply needed to spin up a Windows server for quick QA and then destroy it.  Now, there are some other ways to accomplish this (snapshot/revert comes to mind) but it did get me thinking of ways to avoid running vCenter customizations on Windows clones to speed up deployment.

One way to accomplish this is with vCenter Orchestrator (vCO).  This post assumes some knowledge of and experience with vCO, but you may be able to put this together without that background.  

In general, a VM deployed with Static IP will have the following machine properties set with the values from the Network Profile:

VirtualMachine.Network0.Address
VirtualMachine.Network0.SubnetMask
VirtualMachine.Network0.Gateway
VirtualMachine.Network0.PrimaryDNS
... and optionally ...
VirtualMachine.Network0.SecondaryDNS

There are other properties for networking, but for our use case we will leverage these specific property values via the vCAC Extensibility Workflows.  These are included with the embedded vCO instance and you will want to go ahead and set that up if you haven't already.  See this presentation for an overview and setup walk-through.

In addition, I will use the Guest Script Manager Package for vCO to run the configurations directly on the new VM's guest OS - note that this requires VMtools to be installed on the guest.
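
To give a feel for what the guest script has to do, the Python sketch below turns those machine properties into netsh commands for a Windows guest.  It is an illustration only and not the script from the Guest Script Manager package: the function name and the default interface name "Local Area Connection" are assumptions, and the actual workflow would pass the values into whatever script you run in the guest.

```python
from typing import Dict, List


def netsh_commands(props: Dict[str, str],
                   interface: str = "Local Area Connection",
                   index: int = 0) -> List[str]:
    """Build netsh commands from VirtualMachine.Network<index>.* properties.

    Address, SubnetMask, Gateway and PrimaryDNS are required;
    SecondaryDNS is optional.
    """
    prefix = f"VirtualMachine.Network{index}."
    address = props[prefix + "Address"]
    mask = props[prefix + "SubnetMask"]
    gateway = props[prefix + "Gateway"]
    primary = props[prefix + "PrimaryDNS"]
    secondary = props.get(prefix + "SecondaryDNS")

    commands = [
        f'netsh interface ip set address name="{interface}" static {address} {mask} {gateway}',
        f'netsh interface ip set dns name="{interface}" static {primary}',
    ]
    if secondary:
        commands.append(
            f'netsh interface ip add dns name="{interface}" {secondary} index=2')
    return commands
```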



*Static IP is also supported for AutoYAST/kickstart using the guest agent.  

Saturday, March 29, 2014

Publish an Application Director blueprint to the vCAC Catalog

This post covers how to publish your VMware Application Director (AppDir) 6.x application blueprints to the vCAC 6.x catalog so that users can request them.  This post assumes you have a knowledge of AppDir blueprints and have configured AppDir 6.x with a vCAC 6.x cloud provider.

Choose the application and create a deployment profile as if you were going to provision the application from AppDir.  When you get to "Step 4: Review" you will notice that a "Publish" button is available.  This will allow you to publish the deployment profile to vCAC.


Clicking the "Publish" button will provide you a dialog to set the name and description for the catalog item in vCAC.



You may now save the deployment profile, there's no need to deploy.  From the vCAC 6.x interface, go to the Administration tab and select Catalog Items.  You will see your application there and you can configure it just as you would an IaaS or ASD service blueprint.  You probably want to set up another Service type for your applications.

Once configured and entitled, you can now request and provision the application from vCAC as well as view application deployment details once it is provisioned.



Sunday, February 9, 2014

vCAC - Automatically Manage Local Administrator AD Groups

In my last post, we covered adding the VM requester's AD account to local administrators on the guest.  This is a quick and dirty way of getting the machine requester up and running with their new VM.  However, many organizations prefer to use AD security groups for this kind of access.  In fact, if you use an AD group to control local admins for a Windows VM, then you can create actions for the provisioned VM so that the owner can assign local admin to whomever they wish.

In this post we will cover the following use case - a new VM is requested and as it is being provisioned, a new AD security group will be created in a designated OU with the name of the VM and some custom suffix (like "vmname-localadm").  The requester of the VM will be placed into this group by default and the new group will be added to local admins on the machine after it has been built and customized.
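
The naming convention in that use case is easy to express in code.  The Python sketch below derives the group name and a distinguished name from the VM name; the function name, the default suffix and the example OU are illustrative assumptions, and no escaping of special characters in the VM name is attempted.

```python
from typing import Tuple


def local_admin_group(vm_name: str, ou_dn: str,
                      suffix: str = "-localadm") -> Tuple[str, str]:
    """Return (group name, distinguished name) for a VM's local admin group.

    Assumes vm_name contains no characters that need escaping in a DN.
    """
    group_name = f"{vm_name}{suffix}"
    return group_name, f"CN={group_name},{ou_dn}"


if __name__ == "__main__":
    print(local_admin_group("web01", "OU=VM Admins,DC=example,DC=com"))
```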

Wednesday, February 5, 2014

vCAC - Add VM Requester to Windows Local Admin Group

This is a request that I get frequently.  The person requesting a Windows VM needs to be a local administrator, so that after the VM is provisioned they can access it via RDP and perform tasks that require this level of access (install software, for example).

This can be accomplished using the Guest Agent for vCAC.  Installing the Guest Agent on the VM template allows vCAC to perform many post-build activities such as running scripts.  In this post I will show how you can use the Guest Agent to run a script that will add the requester of the machine to local administrators group.

Note: Post updated with a new script that accepts UPN (as provided by vCAC 6.0) or sAMAccount (as provided by vCAC 5.2).  Thanks to Sam Pursch for testing and suggesting the fix!
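
The difference between the two account formats can be handled with a small normalisation step.  The Python sketch below is not the script referenced in this post; it only illustrates the idea, and the function names are hypothetical.  Note the loud assumption: for a UPN, the part after "@" is used as the domain, which only works where the UPN suffix resolves as a domain name in your environment.

```python
from typing import Optional, Tuple


def split_account(account: str) -> Tuple[str, Optional[str]]:
    """Split 'user@domain' (UPN) or 'DOMAIN\\user' into (user, domain).

    A bare user name returns (user, None).
    """
    if "@" in account:
        user, _, domain = account.rpartition("@")
        return user, domain
    if "\\" in account:
        domain, _, user = account.partition("\\")
        return user, domain
    return account, None


def add_local_admin_command(account: str) -> str:
    """Build a 'net localgroup' command that adds the account to Administrators."""
    user, domain = split_account(account)
    member = f"{domain}\\{user}" if domain else user
    return f'net localgroup Administrators "{member}" /add'
```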

Thursday, January 30, 2014

Using vCAC Resource Actions

vCloud Automation Center 6.0 includes a new and easier way to extend the machine action menus so that you can add just about any type of operation to the list.  As you can see here, I've added an action to mount an ISO to a virtual CD drive on a VM.



What's involved?  Well, for starters you will need to have Advanced Services configured within your vCAC install.  If you have already set this up you can skip to CREATE A RESOURCE ACTION.

SETTING UP ADVANCED SERVICES FOR VCAC