Upgrading Cache in Nimble Array

One great feature of Nimble arrays is being able to upgrade resources without downtime.  In one of our arrays the cache hit ratio was constantly below 80%, and Nimble’s InfoSight statistics showed we needed at least 10% more cache than we had to keep up with all activity.  Fortunately we had budgeted for this upgrade, which makes the unit a CS220G-X2 array (going from 320 GB of SSD to 640 GB).

The upgrade process is very simple.  Once the upgrade kit has been purchased and arrives on site, the drives are replaced one at a time.  I’m usually full speed ahead on stuff like this, but I had my reservations about the process.  I know all of the data held in cache is written to the SATA drives as well, but there’s always a part of me thinking the system will come crashing down.

While watching the console of the array I removed the SSD from slot 7 and replaced it with a new 160 GB drive.  After refreshing the Management/Array page a couple of times and not seeing the new drive, I took a look at the Events page.  It took just over 2 minutes for the system to see the new drive and then display “The CS-Model has changed. The model is now CS220G-X2.”  The Array page now showed the correct drive in slot 7 and the additional 80 GB of cache for the system.  I waited 3 minutes before replacing the drive in slot 8, and it showed up within a minute on the Array page.
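
To make the capacity arithmetic explicit, here is a small sketch of how the total cache grows with each swap.  It assumes four cache slots going from 80 GB to 160 GB each, which is what the 320 GB and 640 GB totals and the "additional 80 GB" per drive imply; the function name is my own and nothing here talks to the array.

```python
# Sketch only: total SSD cache after each drive swap.
# Assumption: four cache slots, each going from 80 GB to 160 GB,
# which matches the 320 GB -> 640 GB totals quoted in this post.
OLD_GB = 80
NEW_GB = 160
SLOTS = 4


def cache_after_swaps(swapped: int) -> int:
    """Return total raw cache in GB once `swapped` drives have been replaced."""
    if not 0 <= swapped <= SLOTS:
        raise ValueError("swapped must be between 0 and SLOTS")
    return swapped * NEW_GB + (SLOTS - swapped) * OLD_GB


# One entry per step: before the upgrade, then after each of the four swaps.
totals = [cache_after_swaps(n) for n in range(SLOTS + 1)]
print(totals)
```

Each step adds 80 GB, so the array only reaches the full 640 GB once the fourth drive is in.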

From researching Nimble arrays and passing their SE Certification exam, I knew data goes into cache only on reads/writes, and the array only pre-fetches the blocks needed to complete current read requests.  This means the new drives didn’t automatically “rebuild” or have the data from the old drives copied back onto them.  It also means there’s a substantial performance hit if all 4 drives are replaced within a short period of time.  I browsed to the Performance tab to see how bad it was.  The first drive was replaced at 3:00 PM (which is a slow time for this array).  You can see how much data had to be pulled from SATA (a lower cache hit ratio means more data pulled from SATA).  I took the following screenshots to show the next 5 hours and then later in the morning when the cache hit ratio finally went up.

[Screenshots: CacheHit01, CacheHit02, CacheHit03]

After seeing the very low levels and watching the activity lights on each drive, I decided to wait until the next morning to replace the last two drives.  This gave a couple of disk-intensive operations that run overnight a chance to land in cache, and (hopefully) anything not already cached would be added to the new drives.  I then replaced the last two drives the next morning and saw the same cache hit characteristics as above.  Also, since these drives are not in any form of RAID with each other, they don’t have to be replaced at the same time.

Depending on the load on the system, it can take a few days for the cache hit ratio to stay above the desired 80% threshold.
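
If you wanted to check "stays above 80%" from exported numbers rather than by eyeballing the graph, a minimal sketch could look like this.  The hit/miss counts are made up for illustration, and the function names and the three-sample window are my own assumptions, not anything Nimble provides.

```python
from typing import Optional, Sequence

THRESHOLD = 0.80  # the 80% target mentioned in this post


def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from cache; 0.0 when there were no reads."""
    total = hits + misses
    return hits / total if total else 0.0


def first_sustained_index(ratios: Sequence[float], window: int,
                          threshold: float = THRESHOLD) -> Optional[int]:
    """Index where the first run of `window` consecutive samples at or
    above `threshold` begins, or None if there is no such run."""
    run = 0
    for i, ratio in enumerate(ratios):
        run = run + 1 if ratio >= threshold else 0
        if run == window:
            return i - window + 1
    return None


# Hypothetical (hits, misses) samples taken after a drive swap.
samples = [(20, 80), (45, 55), (70, 30), (82, 18),
           (79, 21), (85, 15), (88, 12), (90, 10)]
ratios = [hit_ratio(h, m) for h, m in samples]
print(first_sustained_index(ratios, window=3))
```

A single sample above 80% doesn't count here; the dip back to 79% resets the run, which is the same "finally went up and stayed up" behavior I was watching for.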

VCD & Storage Clusters with Fast Provisioning Enabled

There are many different design options available when deploying vCloud Director, which makes it both flexible and confusing at the same time.  One topic I wanted to touch on is configuring storage for an Organization with Fast Provisioning enabled.  In setting up an environment for software development, the provisioning speed and repetitive build requirements made Fast Provisioning a must.  While testing multiple setups, some issues arose with each one, but one setup was a clear winner.

More background on system/process requirements:

–  1 Organization
–  1 Org. Virtual Datacenter with Fast Provisioning & Thin Provisioning enabled
–  VCD 5.1.1
–  vCenter 5.1.0 (shared with other non-VCD clusters)
–  4 ESXi 5.1 Hosts
–  4 datastores on same array
–  2 vApp Templates – each is chain length 1 and on a different datastore.
–  PowerCLI script used to deploy 15 vApps from each vApp Template

One main concept to remember is that automated continuous improvement builds/software tests run multiple times each day based off of these parent templates.  Some builds get captured back to catalogs for a couple of days when done and others are deleted right away.  The goal is to balance storage usage and performance with time to deploy, while eliminating as much infrastructure administration as possible.

Setup #1 – *(Any) storage profile, no storage cluster.  This is the “out-of-box” setup and works well if you only have one compute cluster and all hosts use all datastores.

Pros:  When importing or creating vApps, VCD places VMs on datastores as it sees fit and does a pretty good job.  Fast-provisioned vApps use the same datastore as the parent VMDK by default and will create shadow copies when running out of space.

Cons:  When this isn’t the only cluster within your vCenter, VCD will see and monitor all other datastores, even ones not seen by the cluster in use (including ESXi local datastores).  It will send alerts on datastore usage thresholds for them as well.

Setup #2 – Two storage profiles, one storage cluster per profile.  The thought was to tie each storage cluster and profile to specific hosts within the cluster and only license part of the cluster for MS Datacenter 2012, in turn saving a lot on MS licensing costs.

Pros:  Storage DRS does a great job of load balancing VM placement upon creation.  It is also VERY handy when evacuating a datastore.  Multiple storage profiles allow you to place VMs within a vApp on different datastores.  This helps reduce software costs, as you could place Windows VMs on one set of hosts and Linux VMs on another set so you don’t have to buy MS Datacenter licenses for all hosts within the cluster.

Cons:  All VMs get deployed to the DEFAULT storage profile within the Organization, no matter which storage profile the parent is on.  A shadow-copy VM is created for this to happen, which takes much longer than the standard linked clone does.  Also, with storage clusters a parent VM gets a shadow copy on all datastores in the cluster as it gets used more often.  This is due to the placement algorithm of SDRS.  This is great for non-linked-clone VMs but defeats the purpose of the Fast Provisioning feature.  We tried to script this but had issues, and a lot of VCD UI users wouldn’t know how to follow this properly.

VM disk layout for this setup is just like that of setup #3 below.  Screenshots of it from the VMware lctree fling are shown under setup #3.

Setup #3 – One storage profile, one storage cluster.

Pros:  Same as setup #2 but all datastores are presented to all hosts and in the same storage cluster.

Cons:  As time goes by a Shadow VM is created on every datastore within the cluster (other than the datastore where the parent resides) for each vApp Template.

Here are the datastore layouts after 15 copies of the vApp Template based on the first datastore are created.  Notice the Shadow VMs on datastores 02 and 03.

[Screenshots: 1SP-1SC_01, 1SP-1SC_02, 1SP-1SC_03]

After creating 15 copies of the vApp Template on datastore 02 you now have Shadow VMs for it on datastores 01 and 03.

[Screenshots: 1SP-1SC_11, 1SP-1SC_12, 1SP-1SC_13]

Looking at the vApp Templates within the Catalogs view, it lists 2 Shadow VMs for each vApp Template.
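
The end state described for setup #3 can be sketched as a simple mapping: every template eventually gets a Shadow VM on each datastore in the cluster other than its own.  This is only a model of what I observed in the screenshots, not VCD's or SDRS's actual placement logic, and the template and datastore names are placeholders.

```python
from typing import Dict, List


def shadow_placements(templates: Dict[str, str],
                      datastores: List[str]) -> Dict[str, List[str]]:
    """Map each template to the datastores that end up holding a Shadow VM
    for it: every datastore in the cluster except the template's own."""
    return {
        name: [ds for ds in datastores if ds != home]
        for name, home in templates.items()
    }


# Placeholder names: three datastores in the storage cluster, two templates.
datastores = ["datastore01", "datastore02", "datastore03"]
templates = {"templateA": "datastore01", "templateB": "datastore02"}

placements = shadow_placements(templates, datastores)
for name, shadows in placements.items():
    print(name, len(shadows), shadows)
```

With three datastores in the cluster that works out to 2 Shadow VMs per template, and the count grows with every datastore added to the cluster.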


Setup #4 – One storage profile, no storage cluster.

Pros:  When importing or creating vApps, VCD places VMs on datastores as it sees fit and does a pretty good job.  Fast-provisioned vApps use the same datastore as the parent VMDK by default and will create shadow copies when running out of space.

Cons:  Can’t use SDRS or change the Storage Profile of a VM to evacuate a datastore.

This is the desired layout once 15 vApps are deployed from each vApp Template.

[Screenshots: 1SP-noSC_01, 1SP-noSC_02]

The last setup seems to work best for this environment.  Within Lab Manager I would put the Library entry on multiple datastores, check all datastores to see which one had the most free space, and use the corresponding entry for that build.  With setup #4, VCD takes care of this for me in the new build process.  I can still run out of space with Fast Provisioning, though, if I don’t have alerts set up.
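
For reference, the manual Lab Manager check amounted to something like the sketch below: pick the datastore with the most free space, and flag any that are running low.  The free-space figures, the 20% cutoff, and the function names are all hypothetical; in practice the numbers would come from vCenter rather than a hard-coded dictionary.

```python
from typing import Dict, List


def pick_datastore(free_gb: Dict[str, float]) -> str:
    """Return the name of the datastore with the most free space."""
    if not free_gb:
        raise ValueError("no datastores given")
    return max(free_gb, key=lambda name: free_gb[name])


def low_space(free_gb: Dict[str, float], capacity_gb: Dict[str, float],
              min_free_fraction: float = 0.20) -> List[str]:
    """Names of datastores whose free fraction is below the cutoff."""
    return sorted(
        name for name, free in free_gb.items()
        if free / capacity_gb[name] < min_free_fraction
    )


# Hypothetical figures for four datastores of 1000 GB each.
free_gb = {"datastore01": 410.0, "datastore02": 95.0,
           "datastore03": 620.0, "datastore04": 150.0}
capacity_gb = {name: 1000.0 for name in free_gb}

print(pick_datastore(free_gb))
print(low_space(free_gb, capacity_gb))
```

The second function is the part that still matters with setup #4: VCD handles placement, but something has to warn you before a datastore fills up.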

If you have any questions about this please ask them below.  Again, because of its needs, this setup is quite different from most VCD environments.