vCD Expired VM Cleanup Fails

While renewing a vApp from Expired Items in an Organization I decided to cleanup a couple vApps which expired over 4 months ago. I deleted them and didn’t think much of them until I was working in vCenter an hour later.  I noticed recurring tasks trying to power off the VM’s that belonged to these vApps.

Looking at the VM’s they were suspended.  For this to happen the owners ignored the running lease time at which time the VM’s were suspended and the vApps stopped.  When the storage lease time expired the vApp went to Expired Items but the VM was still suspended.

Now that I deleted the vApp the process to remove it from vCenter must only check if it is not powered on.   Since it is suspended the power off task fails and vCD keeps trying this process. All I had to do to correct this was power on the VM and in a couple minutes vCD powered it off and cleaned everything up.

I was thinking I could write a PowerCLI script or vCO workflow to check for this and fix the VM’s.  You wouldn’t want to fix all “Suspended” VM’s as a user might want to suspend them on their own.

“Fence” vApps or use Internal vApp Networks

Over a year ago we started looking at vCloud Director as it was slated to replace Lab Manager.  The beauty of Lab Manager for us was we could clone the same VM’s thousands of times and have them “fenced” from each other for software development, training, automated testing, and supporting clients. We could do this very rapidly and save a lot of disk usage costs with Linked-Clones as well.

We ran into an issue with some of our software packages that didn’t like the IP address or hostname changing during the guest customization process though.  Another issue was the confusion created by the VM using IP’s from the Physical Network pool and and the virtual routers using the same pool for NAT’d access to the VM’s.  We worked around all of this by using Network Templates for the VM’s and connecting those to the Physical Networks upon deployment.  This way the VM’s have the same IP (192.168.x.x) and MAC address and they are still “fenced” from others VM’s on the external network (10.x.x.x).

It took awhile to figure this out in vCloud Director when we were first testing it.  After a couple days of reading posts and numerous tests I figured out how to mimic this setup.  I thought I’d share as I’ve answered this question numerous times on the communities pages.

Whether modifying a current vApp or manually creating a new one you’ll select “Add Network” to create a new network for this vApp.

vAppNet-1

You can use the default 192.168.2.0/24 network if you wish and just add DNS information or you can change the settings to your liking.

vAppNet-2

Name the new network and finish it’s settings.

vAppNet-3

The network setup will now look like this. You have the option to automatically use IP’s from the pool you created, manually assign and IP from the subnet, or use DHCP from the vShield system. Click Next when done as we still need to connect this internal network to an Organization Network.

vAppNet-4

Click “None” in the connection column and select the proper Organization Network. Decide if you want to use NAT only or add firewall settings as well.  You can also have this vApp keep the same IP it gets from the Organization Network if you’d like. We don’t do this so IP’s are not reserved when not in use.

vAppNet-5

At this point we can capture this vApp to a catalog but there’s a major gotcha.  In order for the internal network to be saved you must select “Make identical copy” on the capture screen. When someone adds this vApp Template to their cloud they won’t have to go through this process.  They may have to change the connection between the internal network and organization network depending on their needs and if deployed across organizations.

vAppNet-6

vApp Template not capturing NIC’s “IP POOL” address setting

*** UPDATE ***   The fix has been added to the 5.1.2 release that is now GA.  Read the release notes for other fixes here… https://www.vmware.com/support/vcd/doc/rel_notes_vcloud_director_512.html

While setting up and testing a couple new vApp Templates for a new organization I noticed an issues when capturing to the catalog and using IP Pool for IP addressing of the VM’s.  vApps deployed from the new vApp Templates where not pulling an address from our IP Pool setup within our External Network.  I messed around with the VM inside the vApp I used to capture the vApp Template from.  I thought it had something to do with the new UDEV stuff in RHEL 6.  It looked correct so I tested by converting it to a Template within vCenter and doing a Guest Customization deployment from it.  The VM was working correctly.  We are currently on version 5.1.0.810718

I finally looked closer at the vApp Template and noticed the NIC for the VM was set to DHCP instead of IP Pool.

Image

I checked some vApp Templates I’d captured while on vCD 1.5.0 and they were OK.  I checked another from when we were on 1.5.1 and they were set to DHCP so something changed at that point.  We don’t use these templates very often and weren’t on vCD 1.5.1 very long so no one complained.

Doing a quick search I came across this post on the communities page.  There’s a SQL query to find the NIC for the VM inside the vApp Template based on Catalog and vApp Template name.  I had 11 templates to update so I modified it to use variables for the catalog and template name.

DECLARE @Catalog varchar(80)
DECLARE @vAppTemplate varchar(80)

SET @Catalog = 'catalogName'
SET @vAppTemplate = 'templateName' -- May need % at the end in case vApp Template was copied/moved. May get multiple returns though so be careful.

select nic.nic_id, nic.mac_address, nic.ip_addressing_mode from network_interface as nic inner join vapp_vm as vm on nic.netvm_id = vm.nvm_id where svm_id=(select svm_id from vapp_vm where vapp_id=(select entity_id from catalog_item where name like @vAppTemplate and catalog_id=(select id from catalog where name = @Catalog)))

When running the above query you should get the following output:

sqlQueryResponsePre

Change the last line of the code to the following and run again.

update network_interface set ip_addressing_mode = 1 where nic_id = (select nic.nic_id from network_interface as nic inner join vapp_vm as vm on nic.netvm_id = vm.nvm_id where svm_id=(select svm_id from vapp_vm where vapp_id=(select entity_id from catalog_item where name like @vAppTemplate and catalog_id=(select id from catalog where name = @Catalog))))

It should update 1 row.  If you run the original query again and look at the NIC properties of the VM within the vApp you will now see the following:

sqlQueryResponsePost

vAppTempNicPost

Deployments will now pull from IP Pool as intended.

Per to communities post above there is a hotfix coming out soon for those who submit an SR. Otherwise it will be fixed in the next maintenance release.