For once, today’s post won’t be deeply technical or very long, which makes it more suitable for people who are new to the concepts of vSAN and vRealize Automation (vRA)/vRealize Orchestrator (vRO). Furthermore, I won’t share configuration details but rather focus on the conceptual/logical part of the use case. The post is intended to give you some ideas about the capabilities of the products and, in the best case, to act as an inspiration for your environment or lead to a POC if it piques your interest 😉 Alright, let’s start after you’ve read my Disclaimer…
Whoever is, was, or at least sits close to a virtualization admin knows the pain of building and maintaining virtual machines or containers based on customer requests when it comes to availability and performance requirements. Let me give you a short example that most admins will recognize:
An application owner requests a virtual machine. Asked whether it is a production virtual machine and whether it needs to be highly available, the owner answers “no”, as the VM is apparently supposed to be only a test/dev machine. Roughly half a year later, the same application owner approaches you and asks why the application wasn’t available during a recent outage in one of the datacenters (I think you know where I’m going with this…). Obviously, the former test machine became a production virtual machine with a production workload on it. However, since nobody was aware of it, the machine was never reconfigured to meet the new requirements. Apart from that, reconfiguring this particular virtual machine wouldn’t have been easy either: one would have had to move the virtual disks to another datastore/LUN that is mirrored and, depending on whether it is a stretched cluster or not, adjust the DRS rules accordingly or update the DR solution.
As you might have noticed, there are two major challenges in the scenario above. One is technical, whereas the other is process related:
- Process: In many companies, the whole server administration happens behind “closed doors”. The vi-admins deploy and manage virtual workload based on some initial requirements without exposing any further information to their “customers”. This mostly leads to operational uncertainties and sometimes to miscommunication between the different stakeholders.
- Technical: In traditional environments, you often end up with dozens or even hundreds of LUNs with different capabilities such as availability and performance. You have to monitor the space usage of each individual LUN/Datastore or Datastore Cluster. Furthermore, you have to make sure that the DR solution or the DRS rules are up to date and aligned with the VM storage placement.
Alright, enough about problems, let’s have a look at how we could solve such a challenge in an elegant fashion. To solve the process-related challenge, imagine handing over part of the control/responsibility to the customers (application owners) in the form of a self-service portal where they can request and modify the characteristics of their virtual workloads whenever they want. At this point I’d also like to mention that this would of course go along with some approval mechanisms, as I know how much admins like to stay in control 😉 With that, we would already solve the process problem by increasing visibility into customer workloads. But now let’s take a look at how we could further simplify the whole process and also solve the technical part.
To reduce the complexity of storage/disk management, we could introduce Storage Policy Based Management (SPBM) together with vSAN or VVols (we’ll have a closer look at this concept shortly). This concept allows the configuration of the storage availability and performance characteristics mentioned above at the virtual disk level, without the need to move machines around or to manage storage arrays and possibly a Fibre Channel infrastructure. Last but not least, we’re going to automate all infrastructure-related actions to further simplify and accelerate the whole process.
Solution Components & Architecture
Okay, now that we’ve looked at the conceptual part of the use case, let’s proceed with the logical part: the solution components used and the high-level architecture. In this particular showcase we assume that two datacenters are available, which enables us to build a stretched cluster across them. Of course, one can also adopt the basic concept for non-stretched clusters. However, the assumption for this scenario is that you have a functional vSphere Metro Cluster with vSphere HA enabled for proper datacenter failover.
The next solution component is VMware vSAN as the storage provider of choice. I won’t explain the technology behind it, as there are more than enough resources out there you can check out (I’d recommend https://storagehub.vmware.com/t/vmware-vsan).
However, as mentioned in the introduction of this section, the underlying infrastructure presents itself as a vSphere Metro Cluster and is supplemented with a vSAN Stretched Cluster as shown in Figure 1:
As indicated, we have two datacenters with a few hosts in each and a witness appliance at a third location to ensure proper failover and split brain scenario handling.
Now that the infrastructure level is set, we can proceed and have a look at the coolest part of the whole use case, which is Storage Policy Based Management. I won’t go into too much detail here either, but let me give you a brief description of the benefits.
The SPBM framework allows you to equip a single virtual machine storage object, such as a vmdk disk, with a certain set of storage capabilities (availability, performance etc.). These capabilities can be configured by yourself or consumed from a VASA Provider (vStorage APIs for Storage Awareness). Furthermore, the SPBM framework is accessible through different types of interfaces like PowerCLI, a REST API etc. in order to automate and maintain the configuration.
With vSAN (or VVols) you don’t set these capabilities at the LUN level anymore, but at the VM storage object level instead. That means, instead of creating LUNs with a defined set of capabilities, you simply create policies and assign them to the desired set of objects.
To get a little bit more practical here, let’s do a quick example. After you’ve set up the vSAN stretched cluster, you can create one or more policies depending on your needs. For example, we create one policy which says that the objects assigned to it need to be mirrored to the other datacenter, whereas the next policy we create keeps its objects on one side only (a full list of possible parameters is available here: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-08911FD3-2462-4C1C-AE81-0D4DBC8F7990.html).
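To make the two policies a bit more tangible, here is a minimal Python sketch that models them as plain data. The class, the field names, the site names and the policy names are all hypothetical and purely illustrative. They loosely mirror the idea of "mirror across sites" versus "keep on one site" and are not the actual vSAN or SPBM API. The outage check is a simplification that ignores the witness and any failures inside a site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    """Illustrative stand-in for a storage policy (not a real API object)."""
    name: str
    mirrored_across_sites: bool          # True: a copy of the data lives in each datacenter
    site_affinity: Optional[str] = None  # only meaningful when the data is not mirrored

    def __post_init__(self):
        # A mirrored object lives on both sites, so pinning it to one makes no sense.
        if self.mirrored_across_sites and self.site_affinity is not None:
            raise ValueError("site_affinity only applies to non-mirrored policies")
        # A non-mirrored object has to live somewhere.
        if not self.mirrored_across_sites and self.site_affinity is None:
            raise ValueError("a non-mirrored policy needs a site_affinity")


# Hypothetical policies, one for each case described above.
STRETCHED = StoragePolicy("stretched-mirrored", mirrored_across_sites=True)
SITE_A_ONLY = StoragePolicy("site-a-only", mirrored_across_sites=False, site_affinity="site-a")


def survives_site_outage(policy: StoragePolicy, failed_site: str) -> bool:
    """Simplified check: does the data stay available if failed_site is lost?"""
    return policy.mirrored_across_sites or policy.site_affinity != failed_site
```

With this model, `survives_site_outage(SITE_A_ONLY, "site-a")` is false while the same call with `STRETCHED` is true, which is exactly the difference the test/dev machine from the introduction ran into.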
Alright, before we continue, let’s do a short recap. We have a fully functional vSphere Metro Storage Cluster which can fail over individual VMs with the help of vSAN and vSphere HA in case of a datacenter outage. With the help of the SPBM framework, we can choose which virtual machines are failed over in case of an outage.
vRealize Automation / vRealize Orchestrator
Now that all the preparations at the infrastructure level have been made, we can proceed with the automation/orchestration part of the showcase. Of course, at this point you can already do basic automation with PowerCLI or with the REST API and a scripting language of your choice, but as we want to expose part of the control and responsibility to the customers, we need the help of a powerful self-service portal. At this point, I’d like to introduce you to vRealize Automation and vRealize Orchestrator. Once more, I won’t go into details, but provide a basic introduction for better understanding.
vRealize Automation (vRA), in conjunction with vRealize Orchestrator (vRO), which is a part of vRA but can also be deployed and used independently, represents a powerful bundle which offers not only a rich set of automation and orchestration capabilities, but also a tenant-aware portal which allows you to reflect your business organization and offer a wide variety of services.
To be a little more precise, we’ll use vRA to build a virtual machine blueprint consisting of a virtual machine template and some properties, and present this as a requestable item, along with a form to configure the desired HA capabilities, within a service catalog on a web portal. vRO then does the post-configuration, such as assigning the correct storage policies, updating the DRS group memberships etc., after the VM has been deployed. Here is a brief explanation of the process:
1. A VM is requested via the vRA service catalog.
2. vRA deploys the VM according to the blueprint and the configuration specifications defined in the request form and sends the necessary metadata to vRO with the help of an event subscription.
3. vRO receives all necessary deployment information and triggers the corresponding workflow to assign the storage policies defined in the request form. Additionally, vRO adjusts the DRS rules and other properties accordingly.
4. vSAN, or more precisely the CLOM (Cluster-Level Object Manager), instructs the other vSAN processes to distribute the storage objects as defined in the storage policy so that the virtual machine becomes compliant.
1. VM gets requested via the vRA service catalog.
2. vRA deploys the VM according to the blueprint and configuration specifications defined in the request form and sends necessary meta data to vRO with the help of an event subscription.
3. vRO receives all necessary deployment information and triggers the corresponding workflow to assign the storage policies defined in the order form. Additionally vRO adjusts the DRS rules and other properties accordingly.
4. vSAN or more precisely the CLOM (Cluster-Level Object Manger) instructs the other vSAN processed to distribute storage objects as defined the storage policy so that the virtual machines becomes compliant.
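The decision logic behind step 3 can be sketched in a few lines of Python. This is only a conceptual illustration under assumptions: the request fields, the policy names and the DRS group names are hypothetical, and the function merely returns a list of planned actions instead of calling vRO, vCenter or vSAN.

```python
from dataclasses import dataclass
from typing import List

KNOWN_SITES = ("site-a", "site-b")  # hypothetical site names


@dataclass(frozen=True)
class VmRequest:
    """What the customer entered in the request form (field names are hypothetical)."""
    vm_name: str
    highly_available: bool
    preferred_site: str


def plan_post_provisioning(request: VmRequest) -> List[str]:
    """Return the ordered actions a post-provisioning workflow might perform."""
    if request.preferred_site not in KNOWN_SITES:
        raise ValueError(f"unknown site: {request.preferred_site}")

    # Highly available VMs get the mirrored policy, all others stay on one site.
    if request.highly_available:
        policy = "stretched-mirrored"
    else:
        policy = f"{request.preferred_site}-only"

    return [
        f"assign storage policy '{policy}' to all disks of {request.vm_name}",
        f"add {request.vm_name} to DRS VM group 'vms-{request.preferred_site}'",
        f"wait until {request.vm_name} is compliant with '{policy}'",
    ]


if __name__ == "__main__":
    for action in plan_post_provisioning(VmRequest("app01", True, "site-a")):
        print(action)
```

Keeping the decision separate from the actual API calls makes it easy to review with the application owners before anything is wired into a real workflow.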
The form that we’ve created in the lab is pretty basic and should only serve as an example, but as you can see in Screenshot 1, the customer can select whether the ordered machine should be highly available and on which site it should run:
Of course, depending on the knowledge level of your customers, you can simplify the form even further and only let them select a service level, for instance.
As you can see, customers can easily request a virtual machine according to their requirements, and vRA/vRO as well as vSAN take care of configuring and enforcing them. A later change of these characteristics is also straightforward, as one only has to change the storage policy of the virtual machine and adjust the DRS rules with a workflow within vRA/vRO. That means customers always have control over their workloads and can adjust them based on the current requirements. Of course, you can further extend the functionality in terms of failover settings and introduce additional logic like the configuration of HA restart priorities and dependencies.
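A service level in that sense is little more than a lookup table that translates a friendly name into the technical settings. The following Python sketch shows the idea. The level names, the setting keys and the default site are assumptions made up for this example and do not come from vRA.

```python
from typing import Dict

# Hypothetical service levels and the technical settings they stand for.
SERVICE_LEVELS: Dict[str, Dict[str, object]] = {
    "production": {"highly_available": True},
    "test-dev": {"highly_available": False},
}


def resolve_service_level(level: str, default_site: str = "site-a") -> Dict[str, object]:
    """Translate a service level name into the settings the workflow needs."""
    key = level.strip().lower()
    if key not in SERVICE_LEVELS:
        raise ValueError(f"unknown service level: {level!r}")
    # Copy the settings so callers cannot modify the lookup table by accident.
    return {**SERVICE_LEVELS[key], "preferred_site": default_site}
```

The customer then only picks "production" or "test-dev" in the form, and everything else is derived from the table.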
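To illustrate how small such a later change is, here is a hedged Python sketch that compares the current and the desired settings of a VM and lists only the actions that are actually required. The setting keys, the policy names and the DRS group names are hypothetical, and the function does not talk to any VMware API.

```python
from typing import Dict, List


def policy_for(settings: Dict[str, object]) -> str:
    """Pick the (hypothetical) storage policy name that matches the settings."""
    if settings["highly_available"]:
        return "stretched-mirrored"
    return f"{settings['preferred_site']}-only"


def plan_change(vm_name: str, current: Dict[str, object], desired: Dict[str, object]) -> List[str]:
    """List only the actions needed to get from the current to the desired settings."""
    actions: List[str] = []
    if policy_for(current) != policy_for(desired):
        actions.append(f"reassign storage policy of {vm_name} to '{policy_for(desired)}'")
    if current["preferred_site"] != desired["preferred_site"]:
        actions.append(f"move {vm_name} to DRS VM group 'vms-{desired['preferred_site']}'")
    return actions


if __name__ == "__main__":
    # The test/dev machine from the introduction is promoted to production.
    before = {"highly_available": False, "preferred_site": "site-a"}
    after = {"highly_available": True, "preferred_site": "site-a"}
    print(plan_change("app01", before, after))
```

In the scenario from the beginning of the post, promoting the former test machine results in a single policy reassignment, with no disks to move between LUNs by hand.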