Performance Monitoring – CPU

Welcome to the first post of the upcoming performance monitoring series. In this series I'll cover the key metrics you should look at when you run into performance issues in your environment. We'll work our way through the categories CPU, Memory, Storage and Network. Along the way, I'll also explain possible causes for certain scenarios and show feasible solutions for them.

Even though the information given in this series won't be new to most of you, I keep noticing that performance troubleshooting is still a current topic. Alright, let's not waste any time and get started, right after you've read my Disclaimer 😉!


CPU Demand & CPU Usage

Of course, the first metrics you've most probably already looked at in case of performance issues are CPU demand and CPU usage. But what's the difference between them? CPU usage, I guess, is pretty clear to most of you: it is the CPU time a VM or one of its cores actually consumes, including the hypervisor's overhead. CPU demand, in comparison, is what a VM or its cores are requesting. In an ideal world these two values would be more or less identical, but for several reasons (covered in the next sections) they can differ a lot. For now, let's assume they are identical. Up to about 70-80% CPU demand/usage, everything is fine, as this means the VM is able to utilize its granted resources efficiently. But whenever a VM runs above this threshold (not just in peaks, but for longer periods), it becomes more and more likely that the VM is undersized and could utilize additional resources.
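To make the distinction between short peaks and longer periods a bit more tangible, here is a minimal Python sketch that only flags a VM when most of the collected samples stay above the threshold. The sample values and function name are hypothetical; in practice the values would come from the vSphere performance charts, ESXTOP batch mode or a vROps export:

```python
# Minimal sketch: flag sustained high CPU demand/usage, not just short peaks.
# The sample list is hypothetical; in practice it would come from the vSphere
# performance charts, ESXTOP batch mode or a vROps export.

def is_sustained_high(samples_pct, threshold=75.0, sustained_ratio=0.8):
    """Return True if at least `sustained_ratio` of the samples exceed `threshold` %."""
    if not samples_pct:
        return False
    above = sum(1 for s in samples_pct if s >= threshold)
    return above / len(samples_pct) >= sustained_ratio

# 20-second real-time samples over ~10 minutes (hypothetical values)
usage_samples = [82, 85, 79, 91, 88, 76, 84, 90, 87, 83]
if is_sustained_high(usage_samples):
    print("VM runs above 70-80% for longer periods -> consider adding vCPUs")
```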

If the overall CPU usage of a multi-core virtual machine is between 30% and 70%, but the friendly application owner in your department is still complaining about bad performance on his or her server, I would also recommend checking how the workload is distributed among the cores. Not all applications are truly multi-threading capable, which results in an uneven utilization of the configured virtual cores. Granting more resources in such a situation is more or less pointless, as the application still cannot use the additional cores.
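A quick way to verify this is to compare the per-core usage values directly instead of only looking at the overall number. A minimal sketch with hypothetical values and thresholds; take the per-core usage e.g. from the per-core view of the vSphere performance charts or from ESXTOP:

```python
# Minimal sketch: detect uneven load distribution across the vCPUs of a VM.
# The per-core usage values and thresholds are hypothetical examples.

def looks_single_threaded(per_core_usage_pct, hot_threshold=70.0, idle_threshold=20.0):
    """True if at least one core is busy while the remaining cores are mostly idle."""
    hot  = [c for c in per_core_usage_pct if c >= hot_threshold]
    idle = [c for c in per_core_usage_pct if c <= idle_threshold]
    return len(hot) >= 1 and len(idle) >= len(per_core_usage_pct) - len(hot)

per_core = [95, 12, 8, 10]   # 4 vCPU VM, ~31% overall, but one core is maxed out
if looks_single_threaded(per_core):
    print("Workload is not spreading across the cores -> more vCPUs won't help")
```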

The last metric in this category I would check, even if the machine is obviously not using all of its granted resources, is whether the machine has a CPU limit set. If there is one, you probably want to remove it unless it was set for a good reason.
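If you don't want to click through every single VM, the configured limit is also exposed through the vSphere API. A hedged pyVmomi sketch that simply lists VMs with a CPU limit set; the vCenter host name and credentials are placeholders, and error handling is left out:

```python
# Hedged sketch using pyVmomi: list VMs that have a CPU limit configured.
# Host name and credentials are placeholders; error handling is omitted.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
for vm in view.view:
    if vm.config is None or vm.config.cpuAllocation is None:
        continue
    limit = vm.config.cpuAllocation.limit       # -1 means "unlimited"
    if limit not in (None, -1):
        print(f"{vm.name}: CPU limit of {limit} MHz configured")

Disconnect(si)
```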

In any case, if you have to increase the number of virtual CPUs, always reboot the virtual machine, even if you have CPU Hot-Add enabled. While all modern operating systems can usually handle hot-added CPUs, experience has shown that most applications can't. Therefore, better shut down the VM and make the changes.

To summarize this section:

  • Overall CPU demand should not constantly exceed 70-80%; otherwise, grant more resources.
  • The workload should be distributed evenly among all available cores.
  • Remove CPU limits unless they were set for a good reason.


CPU Contention

As mentioned in the first section of this post, in an ideal world CPU demand and CPU usage are identical, but in the real world they mostly aren't. This can have multiple reasons, which I briefly want to cover in the following subsections. But before we proceed, we need a basic understanding of how a hypervisor offers the underlying hardware resources to the virtual machines. I'm not going to explain the whole concept of virtualization at this point, as that would go far beyond the scope of this post, but let me give you a very short overview.

Assuming you have a physical server with one operating system on it, that OS gets exclusive access to all the hardware resources and hence also to all CPU cores. That means whenever a multi-threading capable application wants to schedule a process, it can execute its threads on all available cores. In a virtualized world, however, we have multiple operating systems running on one physical server. In consequence, all virtual machines have to share the underlying resources, and this is where the ESXi CPU scheduler comes into play. The CPU scheduler, as the name indicates, schedules all the requests for CPU resources from the virtual machines in order to optimally utilize the hardware while providing the best possible performance. What sounds easy is actually pretty complex, as the CPU scheduler not only needs to distribute and schedule all the virtual cores, but also has to consider memory locality, CPU cache locality, co-scheduling constraints and so on. If you are interested in a more detailed explanation of this topic, I'd recommend checking out the resources linked at the bottom of this post. But for now, let's stick to the basics.

CPU Ready

As described in the section above, the ESXi CPU scheduler distributes the virtual CPUs attached to the virtual machines across the physical CPUs based on several factors. As soon as you've provisioned more virtual CPUs to virtual machines than physical ones are available, there will logically be times when some processes have to wait until they get executed. Exactly this behavior can be monitored with the metric "ready time": whenever a virtual CPU core of a virtual machine is ready to run but has to wait until a physical core becomes free, the ready time value increases.

In general, overprovisioning of virtual CPUs is absolutely normal and was, among other things, one of the arguments for virtualization. With that said, overprovisioning alone doesn't create high ready times unless the ratio between virtual and physical CPUs is extremely high. It usually depends on how many resources the virtual CPUs are requesting and at which time. To give an example: if the workload across all virtual machines on a physical host is fairly asynchronous, or if the CPU resources requested by the virtual machines are low, scheduling everything without a lot of ready time isn't a problem. On the other hand, if your overall workload happens mostly at the same time and/or your VMs are pretty resource intensive and request a lot of CPU time, coupled with a high virtual-to-physical core ratio, high ready times are much more likely to happen.
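To get a rough feeling for how far a host is overprovisioned, the vCPU-to-pCPU ratio is a quick indicator. A hedged pyVmomi sketch that reuses the `si` connection from the CPU limit example above; keep in mind that the ratio alone says nothing without looking at the actual demand:

```python
# Hedged sketch: vCPU-to-physical-core ratio per host, reusing the `si`
# connection from the CPU limit example. Only powered-on VMs are counted.
from pyVmomi import vim

content = si.RetrieveContent()
host_view = content.viewManager.CreateContainerView(content.rootFolder,
                                                    [vim.HostSystem], True)
for host in host_view.view:
    p_cores = host.hardware.cpuInfo.numCpuCores
    v_cpus = sum(vm.config.hardware.numCPU
                 for vm in host.vm
                 if vm.config is not None
                 and vm.runtime.powerState == "poweredOn")
    ratio = v_cpus / p_cores if p_cores else 0
    print(f"{host.name}: {v_cpus} vCPUs on {p_cores} pCores -> ratio {ratio:.1f}:1")
```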

Last but not least, there is another factor which increases the likelihood of high ready times even more. Referring to the example I gave in the introduction of this section, if an OS on a bare-metal server executes a multi-threaded process, it runs all threads at the same time to do as much as possible in parallel. This basically also applies to a virtualized environment, where the ESXi scheduler tries to schedule all cores of a multi-core machine at the same time if possible. Now, with an increasing number of virtual machines with a lot of virtual cores, you can imagine that scheduling everything gets more complicated. Even though ESXi is capable of something called "relaxed co-scheduling", which does not strictly force all virtual CPUs to run at the exact same time in order to optimize the scheduling process, there is still a demand for running everything as close together as possible. Therefore, you increase the chance of high ready times by provisioning a lot of monster VMs with many virtual CPUs. That means: always configure as many cores as needed and as few as possible in order to support efficient scheduling and optimal usage of the resources.

You can check whether you have high ready times either via ESXTOP, in the vSphere performance charts or via vRealize Operations Manager (vROps). But be careful: since ready time is a per-core value, you should also check it per core (look at the per-core values of a VM). In vROps you can also check it on a cluster level to quickly get an idea of how your data center is performing, but since not everybody has a shiny vROps installation, there is another easy way to check whether your VMs are suffering from high ready times: search for the VM with the most virtual CPU cores on each host and check the ready times there. As always, there are no fixed thresholds, but in the range of 5-10% you're going to notice it.
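Note that ESXTOP shows %RDY directly as a percentage, while the vSphere performance charts report ready time as a summation in milliseconds per sample interval. The commonly used conversion looks like this (20 seconds is the real-time chart interval; adjust it for historical rollups):

```python
# Convert a "CPU Ready" summation value (milliseconds per sample interval),
# as reported by the vSphere performance charts, into a percentage.
# Real-time charts use a 20 s interval; historical rollups use larger ones.

def ready_percent(ready_ms, interval_s=20):
    return ready_ms / (interval_s * 1000) * 100

# Example: 1600 ms ready time in a 20 s sample for one vCPU
pct = ready_percent(1600)
print(f"Ready time: {pct:.1f}% per vCPU")   # 8.0% -> already noticeable
```

Also keep in mind that the VM-level summation is the sum over all vCPUs of the machine, so either divide by the vCPU count or, better, look at the per-core values directly before comparing against the 5-10% range.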

Co-Stop

As explained in the section above, ready time accrues when CPUs could not be scheduled although they were ready to run. Ready times can therefore occur on single-core VMs as well as on multi-core VMs, whereas Co-Stops only occur on virtual machines with more than one core. As also explained previously, the ESXi CPU scheduler is capable of scheduling the virtual cores of a virtual machine in an asynchronous fashion to better utilize the hardware resources and to support multi-core VMs more efficiently. Hence, some virtual cores within the same virtual machine are sometimes slightly further ahead than others. At a certain threshold this becomes a problem, as the operating system expects a synchronous execution of its threads.

At this point the ESXi scheduler kicks in and co-stops the "fastest" cores to harmonize the execution. This usually happens on systems which are heavily loaded and which already experience high ready times. The Co-Stop value should not exceed approximately 3%.
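The Co-Stop summation from the performance charts is converted exactly like the ready time above. A small sketch that evaluates hypothetical per-core values against the rough thresholds used in this post:

```python
# Minimal sketch: evaluate per-vCPU ready and co-stop summations (ms per 20 s
# real-time sample) against the rough thresholds used in this post.

def contention_report(per_core_ms, interval_s=20, ready_limit=5.0, costop_limit=3.0):
    for core, (ready_ms, costop_ms) in enumerate(per_core_ms):
        ready_pct  = ready_ms  / (interval_s * 1000) * 100
        costop_pct = costop_ms / (interval_s * 1000) * 100
        flag = "OK"
        if ready_pct >= ready_limit or costop_pct >= costop_limit:
            flag = "CONTENDED"
        print(f"vCPU {core}: ready {ready_pct:.1f}%, co-stop {costop_pct:.1f}% -> {flag}")

# Hypothetical (ready_ms, costop_ms) values per core of a 4 vCPU VM
contention_report([(400, 100), (1800, 900), (350, 80), (2200, 700)])
```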

Summary

To recap the whole section: as illustrated in Figure 1, the difference between demand and usage is some sort of contention, such as ready time and Co-Stop:

Figure 1: CPU

As stated above, if you have vRealize Operations Manager in place, you can also very easily check the maximum contention values for any of the VMs on a cluster or data center level:

Screenshot 1: Max. CPU Contention in Cluster (vROps)

If you see bad contention, either reduce the number of provisioned CPU cores, as long as this doesn't conflict with the first section (CPU Demand), or expand your cluster with additional physical resources. As a side note, vROps also provides information about an optimal physical-to-virtual CPU ratio based on the usage metrics.

To summarize:

  • Ready times per core should not exceed 5-10%.
  • Co-Stop values per core should not exceed 3%.
  • CPU Contention overall should not exceed 5-10%.
  • Provision as many cores as needed and as few as possible.


Power Management

Another possible cause of suboptimal performance can be the power management configuration of your servers. You can easily verify whether your server is configured to use any power saving techniques either by checking the settings in the BIOS or, in case it is set to "OS controlled", in the ESXi configuration. In general, using power saving techniques is not a bad thing and can help you reduce energy consumption or make use of Turbo Boost, but if you're facing performance issues and need low CPU latencies, you probably want to switch the policy to "High performance". In any case, you can check whether ESXi is using power saving techniques by opening ESXTOP and looking at the power management section. If your host is configured for "High performance", you'll see that all cores show 100% at P-state 1 and, depending on their activity, switch between C-state 0 and 1:
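If you prefer scripting over clicking through the BIOS or the vSphere Client, the active policy is also exposed on the host object. A hedged pyVmomi sketch, again reusing the `si` connection from the CPU limit example; the short name "static" for the High performance policy is an assumption you should verify against your ESXi version:

```python
# Hedged sketch: print the active power policy of each host, reusing `si`
# from the CPU limit example. The short name "static" for "High performance"
# is an assumption; verify it against your ESXi version.
from pyVmomi import vim

content = si.RetrieveContent()
host_view = content.viewManager.CreateContainerView(content.rootFolder,
                                                    [vim.HostSystem], True)
for host in host_view.view:
    if host.config is None or host.config.powerSystemInfo is None:
        continue
    policy = host.config.powerSystemInfo.currentPolicy
    note = "" if policy.shortName == "static" else "  <- not 'High performance'"
    print(f"{host.name}: {policy.shortName} (key {policy.key}){note}")
```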

Screenshot 2: ESXTOP Power Stats


OS CPU Queue

Now that we've covered all the major hypervisor-related metrics, I'd like to point out another potential bottleneck I've discovered during my time as a systems engineer. Assuming you've checked all the above metrics as well as the memory, disk and network related values and you haven't found anything suspicious, but you keep receiving complaints about the performance, check out the CPU queue at the OS level!

Sometimes applications keep executing a lot of threads with small instructions which do not saturate the actual CPU capacity due to the increased scheduling effort at the OS level (that is at least my explanation for it). However, if the application could spread these threads across more cores, it would most probably benefit from it and the overall usage would increase. In Windows, there is an easy way to check this: open the Performance Monitor and add the counter "Processor Queue Length", located in the category "System". Afterwards we have to tune the chart a little bit to get a better overview. First of all, adjust the scale to "1,0" as shown in Screenshot 3:

Screenshot 3: Scale modification

Afterwards, set the maximum Vertical scale to 50 or so, depending on the value range your OS is in:

Screenshot 4: Vertical scale modification

And then adjust the sample duration to whatever you like to see:

Screenshot 5: Duration modification


Once you have everything configured, wait until the first samples have been collected and check out the results as shown in the following screenshot:

Screenshot 6: Results

The results show, among other values, the average number of threads waiting in the processor queue at the OS level. According to Microsoft, the average value should not be higher than the number of configured CPU cores multiplied by 2. That means, if your machine has 4 virtual CPUs and the average queue length is constantly above 8, it could make sense to increase the core count of the virtual machine. But remember, as mentioned in the first section of this post: plan for a reboot, or better, shut down the machine, modify the CPU count (CPU sockets instead of cores) and then start it again.
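If you don't want to eyeball the chart, the same rule of thumb is easy to script. A minimal sketch with hypothetical sample values; on Windows, the samples could be collected with the built-in typeperf tool or exported from Performance Monitor:

```python
# Minimal sketch: apply the "queue length should stay below 2 x vCPUs" rule of
# thumb to a series of "\System\Processor Queue Length" samples. The values
# here are hypothetical; collect them e.g. with Performance Monitor or typeperf.

def queue_verdict(samples, vcpus):
    avg = sum(samples) / len(samples)
    limit = 2 * vcpus
    if avg > limit:
        return f"avg queue {avg:.1f} > {limit} -> more vCPUs could help"
    return f"avg queue {avg:.1f} <= {limit} -> CPU queue looks fine"

queue_samples = [9, 12, 10, 14, 11, 9, 13, 12]   # hypothetical 4 vCPU VM
print(queue_verdict(queue_samples, vcpus=4))
```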


Summary

To summarize this post, here is a short and comprehensive overview of the most important metrics and their thresholds:

Metric                    Critical Threshold
CPU Demand                ≥ 70-80%
CPU Usage per core        Uneven distribution
CPU Ready time            ≥ 10%
Co-Stop                   ≥ 3%
Max. Limited              > 0 (CPU limit throttling)
CPU Queue Depth (OS)      ≥ 2 x vCores

Of course, there are plenty of other metrics available which allow you to conduct an even deeper analysis, but in general you should be able to identify 95% of your performance issues with the ones stated above. Keep in mind, however, that every category influences and gets influenced by the others. For example, you probably won't see high CPU usage when you have high storage latencies or when the memory buffers are saturated. Therefore, always try to get a complete picture of the situation!

With that said, stay tuned for the following posts…


Sources
