This post is about why and how to monitor VM location on a vSphere Metro Storage Cluster (vMSC) with vRealize Operations Manager (vROps). Stay curious!
What is meant by VM Location?
First things first. Before we go into the details, what is meant by VM location?
VM location basically combines compute and storage: where does the VM run (host), and where does the VM keep its data (datastore)?
Why should I even care about the VM Location in a vMSC?
To answer this question, we need to take a closer look at the behavior of a vSphere Metro Storage Cluster. I will only cover this topic briefly, since there are many good resources out there that describe it in more detail. However, let's have a quick look at it! What types of vSphere Metro Storage Clusters exist? By design, two types of cluster configuration are possible:
- Non-Uniform Cluster
- Uniform Cluster
Now, what does that mean? Basically, it only describes how the hosts are attached to the storage arrays, whether via physical cabling, FC zoning or LUN masking. As seen in Diagram 1, in a Non-Uniform cluster the hosts are connected only to the “local” storage array in each datacenter:
In a Uniform Cluster, the hosts are attached to the storage arrays in both datacenters and can therefore request resources from all arrays.
As you may already have noticed, in a Non-Uniform cluster configuration the storage solution has to present the LUNs or datastores in an active/active fashion. This means each LUN pair has to be readable and writable on both sites at the same time, since the hosts are only connected locally. Well-known vendors supporting this configuration are, for example, DELL/EMC with SRDF/Metro on their VMAX systems and Hitachi with their GAD (Global Active Device) technology on their G-series systems. In comparison, a Uniform cluster configuration can present its data either active/active or active/passive, as all hosts are connected to all arrays. Of course, it always depends on what your storage vendor supports. In general, HPE is well known for offering its storage metro cluster solution in a uniform configuration, but DELL/EMC and Hitachi support this configuration with their arrays as well.
A word of caution here. Especially in an active/active uniform cluster, where a host can read from and write to both arrays, you should control your data flow with an intelligent multipathing solution and keep traffic as local as possible, as you don't want to end up with a lot of unnecessary traffic on the ISLs (inter-switch links) between your datacenters. In an active/passive uniform configuration, you need to control the data flow by placing your VMs on the datastores that are active in the respective datacenter in order to avoid cross-site traffic.
That said, apart from unnecessary cross-site traffic, in terms of availability it doesn't really matter whether you have a non-uniform, uniform, active/active or active/passive cluster configuration. By design, each LUN has a preferred location (one of the storage arrays) on which it stays active in a split-brain scenario, while the non-preferred location stops serving it to avoid data corruption. This means: whenever a split-brain scenario occurs in which the two storage arrays can no longer replicate data synchronously, and you have VMs running on hosts in datacenter 1, for instance, whose disks are on datastores with their preferred location in datacenter 2, the hosts running those VMs receive a PDL (Permanent Device Loss) SCSI condition and the VMs die immediately. To avoid such outages, you should keep your VMs on datastores that have their primary or preferred location in the same datacenter in which the VMs are running.
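To make the rule concrete, here is a minimal Python sketch of the check described above. It is only an illustration: the `VmPlacement` record, the site labels `DC1`/`DC2` and the sample inventory are made up, and in a real environment you would have to obtain the preferred site of each LUN from your storage array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmPlacement:
    """Hypothetical record describing where a VM runs and where its data is active."""
    name: str
    host_site: str       # datacenter of the ESXi host running the VM
    preferred_site: str  # datacenter where the VM's datastore LUN stays active


def at_risk_in_split_brain(vms):
    """Return the VMs whose host site differs from the preferred site of their datastore."""
    return [vm for vm in vms if vm.host_site != vm.preferred_site]


# Made-up sample inventory, for illustration only.
inventory = [
    VmPlacement("app01", host_site="DC1", preferred_site="DC1"),
    VmPlacement("db01", host_site="DC1", preferred_site="DC2"),
    VmPlacement("web01", host_site="DC2", preferred_site="DC2"),
]

at_risk = at_risk_in_split_brain(inventory)
print([vm.name for vm in at_risk])  # prints ['db01']
```

In this sample only `db01` would be hit by a PDL, because it runs in datacenter 1 while its datastore stays active in datacenter 2.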
How can I monitor wrongly placed VMs?
Actually, it is pretty simple to accomplish this with vROps. Simply create a list view and select “virtual machine” as subject. Add “Host” and “Datastore” as Data Attributes:
Afterwards, go to “Filter” and build a query like the following one. In my lab, the names of the ESXi hosts always begin with the prefix “ESX10” in Datacenter 1 and “ESX20” in Datacenter 2. The same applies to the datastores. Therefore, I just created a query which shows me VMs that run on a host in Datacenter 1 and have their disks in Datacenter 2, and vice versa:
The result is a nice and comprehensive list of VM objects which are split across the two datacenters:
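If you want to reproduce the logic of this filter outside of vROps, for example on an exported list, it can be sketched in a few lines of Python. This is not how vROps evaluates the filter internally. The rows and object names below are invented, and the sketch assumes that datastore names carry the same “ESX10”/“ESX20” prefixes as the hosts.

```python
# Assumed naming convention, taken from the lab described above:
# names starting with "ESX10" belong to Datacenter 1, "ESX20" to Datacenter 2.
SITE_PREFIXES = {"ESX10": "DC1", "ESX20": "DC2"}


def site_of(name):
    """Map a host or datastore name to its site, or None if no prefix matches."""
    for prefix, site in SITE_PREFIXES.items():
        if name.upper().startswith(prefix):
            return site
    return None


def split_vms(rows):
    """Yield (vm, host, datastore) rows whose host and datastore are on different sites."""
    for vm, host, datastore in rows:
        host_site, ds_site = site_of(host), site_of(datastore)
        # Rows with an unknown prefix are skipped rather than guessed.
        if host_site and ds_site and host_site != ds_site:
            yield vm, host, datastore


# Invented sample rows: (VM name, host name, datastore name).
rows = [
    ("app01", "ESX10-01", "ESX10-DS01"),
    ("db01", "ESX10-02", "ESX20-DS03"),
    ("web01", "ESX20-01", "ESX10-DS02"),
]

mismatched = list(split_vms(rows))
for vm, host, datastore in mismatched:
    print(f"{vm}: runs on {host}, stores data on {datastore}")
```

With the sample rows, `db01` and `web01` are reported, which corresponds to the two directions (“and vice versa”) of the vROps query.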
To protect your VMs from possible outages and to optimize your storage traffic in an active/passive uniform metro storage cluster configuration, my recommendation is definitely to keep an eye on VM location. It is extremely simple to do that with vRealize Operations Manager. Of course, besides just monitoring it, you could also implement some mechanisms (scripts or workflows) to automatically move wrongly placed VMs.
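As a starting point for such a mechanism, the following Python sketch turns a list of wrongly placed VMs into a dry-run move plan. It rests on one assumption: compute should follow storage, so each VM is moved to a host on the site of its datastore. The function name, the tuple layout and the sample data are hypothetical, and the sketch deliberately does not call any vSphere API. Executing the plan would be the job of whatever script or workflow tooling you use.

```python
def plan_moves(mismatched):
    """Propose one compute move per VM so that the VM follows its datastore.

    `mismatched` is a list of (vm_name, host_site, datastore_site) tuples.
    Nothing is executed here; the result is only a plan to review.
    """
    plan = []
    for vm, host_site, datastore_site in mismatched:
        if host_site == datastore_site:
            continue  # already placed correctly, nothing to do
        plan.append({"vm": vm, "from_site": host_site, "to_site": datastore_site})
    return plan


# Hypothetical input, e.g. derived from the vROps list view.
wrongly_placed = [("db01", "DC1", "DC2"), ("web01", "DC2", "DC1"), ("app01", "DC1", "DC1")]

move_plan = plan_moves(wrongly_placed)
for step in move_plan:
    print(f"move {step['vm']} from a host in {step['from_site']} to a host in {step['to_site']}")
```

Keeping the planning step separate from the execution makes it easy to review the list before anything is moved.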