HOL-SDC-1604

Table of Contents
Lab Overview - HOL-SDC-1604 - vSphere Performance Optimization ............................... 3
Lab Guidance .......................................................................................................... 4
vSphere 6 Performance Introduction....................................................................... 6
Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes).............. 7
Introduction to CPU Performance Troubleshooting .................................................. 8
CPU Contention ..................................................................................................... 14
Conclusion and Clean-Up ...................................................................................... 34
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes) ............. 37
Introduction to Latency Sensitivity........................................................................ 38
Performance impact of the Latency Sensitivity setting ......................................... 45
Conclusion and Cleanup........................................................................................ 67
Module 3: CPU Performance Feature: Power Management Policies (15 minutes)............ 70
Introduction to, and Performance Impact of, Power Policies.................................. 71
Configuring the Server BIOS Power Management Settings ................................... 77
Configuring Host Power Management in ESXi ....................................................... 84
Conclusion............................................................................................................. 89
Module 4: vSphere Fault Tolerance (FT) and Performance (30 minutes) ......................... 91
Introduction to vSphere Fault Tolerance ................................................................ 92
Configure Lab for Fault Tolerance ........................................................................ 104
Fault Tolerance Performance ............................................................................... 123
Conclusion........................................................................................................... 129
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes) ... 130
Introduction to Memory Performance Troubleshooting........................................ 131
Memory Resource Control ................................................................................... 137
Conclusion and Clean-Up .................................................................................... 158
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes) 161
Introduction to NUMA and vNUMA....................................................................... 162
vNUMA vs. Cores per Socket ............................................................................... 170
vNUMA with Memory Hot Add ............................................................................. 187
Conclusion and Clean-Up .................................................................................... 192
Module 7: Storage Performance and Troubleshooting (30 minutes).............................. 196
Introduction to Storage Performance Troubleshooting ........................................ 197
Storage I/O Contention........................................................................................ 203
Storage Cluster and Storage DRS ....................................................................... 212
Conclusion and Clean-Up .................................................................................... 226
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes) ... 229
Introduction to Network Performance ................................................................. 230
Show network contention.................................................................................... 236
Conclusion and Cleanup...................................................................................... 243
Module 9: Network Performance Feature: Network IO Control with Reservations (45
minutes)........................................................................................................................ 246
Introduction to Network IO Control...................................................................... 247

Show Network IO Control .................................................................................... 249


Conclusion and Cleanup...................................................................................... 250
Module 10: Performance Monitoring Tool: esxtop CLI introduction (30 minutes) .......... 252
Introduction to esxtop ......................................................................................... 253
Show esxtop CPU Features.................................................................................. 259
Show esxtop memory features ........................................................................... 270
Show esxtop storage features............................................................................. 276
show esxtop Network features ............................................................................ 282
Conclusion and Cleanup...................................................................................... 287
Module 11: Performance Monitoring Tool: esxtop for vSphere Web Client (30
minutes)........................................................................................................................ 289
Introduction to esxtop for vSphere Web Client.................................................... 290
Show esxtop for vSphere Web Client cpu features.............................................. 296
Show esxtop for vSphere Web Client memory features ...................................... 306
Show esxtop for vSphere Web Client storage features ....................................... 314
show esxtop for vSphere Web Client Network features....................................... 323
Conclusion and Cleanup...................................................................................... 330
Module 12: Performance Monitoring and Troubleshooting: vRealize Operations, Next
Steps (30 minutes)........................................................................................................ 333
Introduction to vRealize Operations Manager ..................................................... 334
Use vRealize Operations Manager for Performance Troubleshooting .................. 335
Conclusion and Cleanup...................................................................................... 336


Lab Overview - HOL-SDC-1604 - vSphere Performance Optimization


Lab Guidance
You have 90 minutes for each lab session, and next to each module you can see the
estimated time to complete it. Every module can be completed by itself, and the
modules can be taken in any order, but make sure that you follow the instructions
carefully with respect to the cleanup procedure after each module. In short, all VMs
should be shut down after the completion of each module using the script described in
the modules. In total, there are more than six hours of content in this lab.
Lab Module List:
Lab Overview
Lab Guidance
vSphere 6 Performance Introduction
Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45
minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance
monitoring and Troubleshooting (30 minutes)
Lab Captains: David Morse (Modules 1, 2, 3, 4), Henrik Moenster (Modules 5, 6, 7, 12),
Robert Jensen (Module 8, 9, 10, 11)
This lab manual can be downloaded from the Hands-on Labs Document site found here:


http://docs.hol.pub/catalog/

Activation Prompt or Watermark


When you first start your lab, you may notice a watermark on the desktop indicating
that Windows is not activated.
One of the major benefits of virtualization is that virtual machines can be moved and
run on any platform. The Hands-on Labs take advantage of this benefit, which allows us to run the
labs out of multiple datacenters. However, these datacenters may not have identical
processors, which triggers a Microsoft activation check through the Internet.
Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft
licensing requirements. The lab that you are using is a self-contained pod and does not
have full access to the Internet, which is required for Windows to verify the activation.
Without full access to the Internet, this automated process fails and you see this
watermark.
This cosmetic issue has no effect on your lab. If you have any questions or concerns,
please feel free to use the support made available to you, either at VMworld in the
Hands-on Labs area, in your Expert-led Workshop, or online via the survey comments, as
we are always looking for ways to improve your hands-on lab experience.


vSphere 6 Performance Introduction


This lab, HOL-SDC-1604, covers vSphere performance best practices and various
performance-related features available in vSphere 6. You will work with a broad array of
solutions and tools, including VMware Labs "Flings" and esxtop, to gauge and diagnose
performance in a vSphere environment. vSphere features related to performance
include Network IO Control reservations, vNUMA with Memory Hot Add, Latency
Sensitivity, and Power Policy settings.
While the time available in this lab constrains the number of performance problems we
can review as examples, we have selected relevant problems that are commonly seen in
vSphere environments. By walking through these examples, you should be better
equipped to understand and troubleshoot typical performance problems.
For the complete Performance Troubleshooting Methodology and a list of VMware Best
Practices, please visit the vmware.com website:
http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf
http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-60-monitoring-performance-guide.pdf
Furthermore, if you are interested in performance-related articles, make sure that you
monitor the VMware VROOM! blog:
http://blogs.vmware.com/performance/


Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)


Introduction to CPU Performance Troubleshooting
The goal of this module is to expose you to a CPU contention issue in a virtualized
environment. It will also guide you on how to quickly identify performance problems by
checking various performance metrics and settings.
Performance problems may occur when there are insufficient CPU resources to satisfy
demand. Excessive demand for CPU resources on a vSphere host may occur for many
reasons. In some cases, the cause is straightforward. Populating a vSphere host with too
many virtual machines running compute-intensive applications can make it impossible
to supply sufficient CPU resources to all the individual virtual machines. However,
sometimes the cause may be more subtle, related to the inefficient use of available
resources or non-optimal virtual machine configurations.
Let's get started!


For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names, and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.


On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.


Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.
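For reference, here is a minimal PowerCLI sketch of what a cleanup script like StopLabVMs.ps1 might do. This is an illustration only; the actual lab script may differ and also resets VM settings, which is omitted here. It assumes the PowerCLI window is already connected to the lab vCenter.

# Illustrative sketch only - not the actual StopLabVMs.ps1
Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' } |
    Stop-VM -Confirm:$false    # power off every running VM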

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.


Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!


Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon at the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon


CPU Contention
Below is a list of the most common CPU performance issues:
High Ready Time: A CPU is in the Ready state when the virtual machine is ready to run
but unable to run because the vSphere scheduler is unable to find physical host CPU
resources to run the virtual machine on. Ready Time above 10% could indicate CPU
contention and might impact the Performance of CPU intensive application. However,
some less CPU sensitive application and virtual machines can have much higher values
of ready time and still perform satisfactorily.
High Co-stop time: Co-stop time indicates that there are more vCPUs than necessary,
and that the excess vCPUs create overhead that drags down the performance of the VM.
The VM will likely run better with fewer vCPUs. A vCPU with high co-stop is being
kept from running while the other, more-idle vCPUs are catching up to the busy one.
CPU Limits: CPU Limits directly prevent a virtual machine from using more than a set
amount of CPU resources. Any CPU limit might cause a CPU performance problem if the
virtual machine needs resources beyond the limit.
Host CPU Saturation: When the Physical CPUs of a vSphere host are being
consistently utilized at 85% or more then the vSphere host may be saturated. When a
vSphere host is saturated, it is more difficult for the scheduler to find free physical CPU
resources in order to run virtual machines.
Guest CPU Saturation: Guest CPU (vCPU) Saturation is when the application inside
the virtual machine is using 90% or more of the CPU resources assigned to the virtual
machine. This may be an indicator that the application is being bottlenecked on vCPU
resource. In these situations, adding additional vCPU resources to the virtual machine
might improve performance.
Incorrect SMP Usage: Using large SMP virtual machines can cause extra overhead.
Virtual machines should be correctly sized for the application that is intended to run in
the virtual machine. Some applications may only support multithreading up to a certain
number of threads. Assigning additional vCPUs to the virtual machine may cause
additional overhead. If vCPU usage shows that a machine configured with
multiple vCPUs is only using one of them, it might be an indicator that the
application inside the virtual machine is unable to take advantage of the additional
vCPU capacity, or that the guest OS is incorrectly configured.
Low Guest Usage: Low in-guest CPU utilization might be an indicator that the
application is not configured correctly, or that the application is starved of some other
resource such as I/O or Memory and therefore cannot fully utilize the assigned vCPU
resources.
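If you prefer to pull these counters from the PowerCLI window instead of the charts, the following is a hedged sketch that retrieves recent real-time Ready and Co-stop values for one of the lab VMs. The counter names cpu.ready.summation and cpu.costop.summation are assumed to be available in your vCenter; values are reported in milliseconds per 20-second sample.

# Hedged sketch: recent real-time Ready and Co-stop samples for perf-worker-01a
$vm = Get-VM -Name perf-worker-01a
Get-Stat -Entity $vm -Stat cpu.ready.summation, cpu.costop.summation -Realtime -MaxSamples 15 |
    Sort-Object Timestamp |
    Select-Object Timestamp, MetricId, Instance, Value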


Open a PowerCLI window


Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start the CPU Workload


From the PowerCLI console, type:
.\StartCPUTest.ps1
and press Enter.
While the script configures and starts up the virtual machines, please continue to read
ahead.

CPU Test Started


When the script completes, you will see two Remote Desktop windows open (note: you
may have to move one of the windows to display them side by side, as shown above).
The script has started a CPU intensive benchmark (SPECjbb2005) on both the perf-worker-01a and perf-worker-01b virtual machines, and a GUI is displaying the real-time
performance value as this workload runs.
Above, we see an example screenshot where the performance of each benchmark is
around 15,000.


IMPORTANT NOTE: Due to changing loads in the lab environment, your values may
vary from the values shown in the screenshots.


Navigate to perf-worker-01a (VM-level) Performance Chart


1. Select the perf-worker-01a virtual machine from the list of VMs on the left
2. Click the Monitor tab
3. Click Performance
4. Click Advanced
5. Click on Chart Options


Select Specific Counters for CPU Performance Monitoring


When investigating a potential CPU issue, there are several counters that are important
to analyze:
Demand: Amount of CPU the virtual machine is demanding / trying to use.
Ready: Amount of time the virtual machine is ready to run but unable to because
vSphere could not find physical resources to run the virtual machine on
Usage: Amount of CPU the virtual machine is actually currently being allowed to
use.
1. Select CPU from the Chart metrics
2. Check only the perf-worker-01a object
3. Click None on the bottom right of the list of counters
4. Now check only Demand, Ready, and Usage in MHz
5. Click OK

CPU State Time Explanation


Virtual machines can be in any one of four high-level CPU States:
Wait: This can occur when the virtual machine's guest OS is idle (waiting for
work), or the virtual machine could be waiting on vSphere tasks. Some examples
of vSphere tasks that a vCPU may be waiting on include waiting for I/O to
complete or waiting for ESX level swapping to complete. These non-idle vSphere
system waits are called VMWAIT.
Ready (RDY): A CPU is in the Ready state when the virtual machine is ready to
run but unable to run because the vSphere scheduler is unable to find physical
host CPU resources to run the virtual machine on. One potential reason for
elevated Ready time is that the VM is constrained by a user-set CPU limit or
resource pool limit, reported as max limited (MLMTD).
CoStop (CSTP): Time the vCPUs of a multi-vCPU virtual machine spent waiting
to be co-started. This gives an indication of the co-scheduling overhead incurred
by the virtual machine.
Run: Time the virtual machine was running on a physical processor.


Unpin View Panes to Expand Chart View


Since we are dealing with a limited screen resolution, let's create some more screen real
estate, so we can see more of the performance chart (without having to use the
scrollbars).
To do so, unpin some of the default view panes by clicking the areas noted in the
screenshot.


Monitor Demand vs. Usage lines


Notice the amount of CPU this virtual machine is demanding and compare that to the
amount of CPU usage the virtual machine is actually allocated (Usage in MHz). The
virtual machine is demanding more than it is currently being allowed to use.
Notice that the virtual machine is also seeing a large amount of ready time.
Guidance: Ready time > 10% could be a performance concern.


Explanation of value conversion


NOTE: vCenter reports some metrics such as "Ready Time" in milliseconds (ms). Use
the formula above to convert the milliseconds (ms) value to a percentage.
For multi-vCPU virtual machines, multiply the Sample Period by the number of vCPUs of
the VM to determine the total time of the sample period. It is also beneficial to monitor
Co-Stop time on multi-vCPU virtual machines. Like Ready time, Co-Stop time greater
than 10% could indicate a performance problem. You can examine Ready time and Co-Stop metrics per vCPU as well as per VM. Per vCPU is the most accurate way to
examine statistics like these.
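As a hedged worked example of that conversion (the sample values below are illustrative, not taken from this lab):

# Convert a Ready summation value (ms) to a percentage for a real-time chart
$readyMs   = 2000      # Ready time reported for the sample period, in ms
$sampleMs  = 20000     # real-time charts use a 20-second (20,000 ms) sample period
$vCpuCount = 1         # multiply the sample period by the vCPU count for per-VM stats
$readyPct  = ($readyMs / ($sampleMs * $vCpuCount)) * 100
"Ready: {0}%" -f $readyPct    # 2000 ms over 20,000 ms = 10%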

Navigate to Host-level CPU chart view


1. Select esx-01a.corp.local
2. Select the Monitor tab
3. Select Performance
4. Select the Advanced view
5. Select the CPU view

Examine ESX Host Level CPU Metrics


Notice in the chart that only one of the CPUs in the host seems to have any significant
workload running on it.
One CPU is at 100%, but the other CPU in the host is not really being used.


Edit Settings of perf-worker-01a


Let's see how perf-worker-01a is configured:
1. Click on the perf-worker-01a virtual machine
2. Click Actions
3. Click Edit Settings


Check Affinity Settings on perf-worker-01a


1. Expand the CPU item in the list and you will see that affinity is set to cpu1.
2. Clear the "1" to correctly balance the virtual machines across the physical CPUs
in the system.
3. Press OK to make the changes.
Note: VMware does not recommend setting affinity in most cases. vSphere will balance
VMs across CPUs optimally without manually specifying affinity. Enabling affinity
prevents some features like vMotion from working, can become a management headache, and can lead to
performance issues like the one we just diagnosed.
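If you want to confirm from PowerCLI which worker VMs have affinity set, the following is a hedged sketch (the ExtensionData property names are taken from the vSphere API and are assumptions on my part; an empty result means no affinity is configured).

# Hedged sketch: list any manual CPU affinity on the two worker VMs
Get-VM -Name perf-worker-01a, perf-worker-01b | ForEach-Object {
    '{0}: affinity = {1}' -f $_.Name, ($_.ExtensionData.Config.CpuAffinity.AffinitySet -join ',')
}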


Check Affinity Settings on perf-worker-01b


1. Expand the CPU item in the list and you will see that affinity is set. Unfortunately,
both virtual machines are bound to the same processor (CPU1). This can happen
if an administrator sets affinity for a virtual machine and then creates a second
virtual machine by cloning the original.
2. Clear the "1" to correctly balance the virtual machines across the physical CPUs
in the system.
3. Press OK to make the changes.
Note: VMware does not recommend setting affinity in most cases. vSphere will balance
VMs across CPUs optimally without manually specifying affinity. Enabling affinity
prevents some features like vMotion from working, can become a management headache, and can lead to
performance issues like the one we just diagnosed.


Monitor Ready time


Return to perf-worker-01a and see how the Ready time immediately drops, and the
Usage in MHz increases.


See Better Performance


It may take a moment, but the CPU Benchmark scores should increase. Click back to the
Remote Desktop windows to confirm this.
In this example, we have seen how to compare the Demand and Usage (in MHz) CPU
metrics to identify CPU contention. We showed you the Ready time metric and how it
can be used to detect physical CPU contention. We also showed you the danger of
setting affinity.


Edit Settings of perf-worker-01b


Let's add a virtual CPU to perf-worker-01b to improve performance.
1. Click on the perf-worker-01b virtual machine
2. Click Actions
3. Click Edit Settings


Add a CPU to perf-worker-01b


1. Change the number of CPUs to 2
2. Click OK

Monitor CPU performance of perf-worker-01b


1. Select perf-worker-01b
2. Select Monitor
3. Select Performance
4. Select the CPU view

Notice that the virtual machine is now using both vCPUs. This is because the OS in the
virtual machine supports CPU hot-add, and because that feature has been enabled on
the virtual machine.
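If you want to verify the hot-add setting itself, here is a hedged one-liner (the CpuHotAddEnabled property name is assumed from the vSphere API):

# Hedged check: is CPU hot add enabled on perf-worker-01b?
(Get-VM -Name perf-worker-01b).ExtensionData.Config.CpuHotAddEnabled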

Investigate performance
Notice that the performance of perf-worker-01b has increased, since we added the
additional virtual CPU.
However, this is not always the case. If the host these VMs are running on (esx-01a)
only had two physical CPUs, the addition of an additional vCPU would have caused an
overcommitment, leading to high %READY and poor performance.
Remember, most workloads are not necessarily CPU bound. The OS and the application
need to be able to be multi-threaded to get performance improvements from additional
CPUs. Most of the work that an OS is doing is typically not CPU-bound, that is, most of
their time is spent waiting for external events such as user interaction, device input, or
data retrieval, rather than executing instructions. Because otherwise-unused CPU cycles
are available to absorb the virtualization overhead, these workloads will typically have
throughput similar to native, but potentially with a slight increase in latency.
Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use
might cause slightly increased resource usage, potentially impacting performance on
very heavily loaded systems. Common examples of this include a single-threaded
workload running in a multiple-vCPU virtual machine, or a multi-threaded workload in a
virtual machine with more vCPUs than the workload can effectively use.
Even if the guest operating system doesn't use all of the vCPUs allocated to it, over-configuring virtual machines with too many vCPUs still imposes non-zero resource
requirements on ESXi that translate to real CPU consumption on the host. For example:
Unused vCPUs still consume timer interrupts in some guest operating systems.
(Though this is not true with tickless timer kernels)
Maintaining a consistent memory view among multiple vCPUs can consume
additional resources, both in the guest operating system and in ESXi. (Though
hardware-assisted MMU virtualization significantly reduces this cost.)
Most guest operating systems execute an idle loop during periods of inactivity.
Within this loop, most of these guest operating systems halt by executing the HLT
or MWAIT instructions. Some older guest operating systems (including Windows
2000 with certain HALs, Solaris 8 and 9, and MS-DOS), however, use busy-waiting within their idle loops. This results in the consumption of resources that
might otherwise be available for other uses (other virtual machines, the
VMkernel, and so on). ESXi automatically detects these loops and de-schedules
the idle vCPU. Though this reduces the CPU overhead, it can also reduce the
performance of some I/O-heavy workloads.
The guest operating system's scheduler might migrate a single-threaded
workload amongst multiple vCPUs, thereby losing cache locality.
These resource requirements translate to real CPU consumption on the host.


Close Remote Desktop Connections


Close the two remote desktop connections.


Conclusion and Clean-Up


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines used in this module and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings.


Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key Takeaways


CPU contention problems are generally easy to detect. In fact, vCenter has several
alarms that will trigger if host CPU utilization or virtual machine CPU utilization goes too
high for extended periods of time.
vSphere 6.0 allows you to create very large virtual machines that have up to 128 vCPUs.
It is highly recommended to size your virtual machines for the application workloads that
will be running in them. Sizing your virtual machines with resources that are
unnecessarily larger than the workload can actually use may result in hypervisor
overhead and can also lead to performance issues.
In general, here are some common CPU performance tips:
Avoid a large VM on too small a platform
Rule of thumb: 1-4 vCPU on dual socket hosts, 8+ vCPU on quad socket hosts.
This rule of thumb changes as core counts increase. Try to keep vCPU count
below the core count of any single pCPU for the best performance profile. This is
due to memory locality; see Module 6 about vNUMA for more details on this.
With 8 vCPU, ensure at least vSphere 4.1
Sizing a VM too large is wasteful. The OS will spend more time wasting cycles
trying to keep workloads in sync.
Don't expect consolidation ratios as high with busy workloads as you achieved
with the low-hanging fruit
Virtualizing larger workloads requires revisiting consolidation ratios.
Tier 1 applications and more performance-demanding workloads require more resources


Conclusion
This concludes Module 1: CPU Performance, Basic Concepts and
Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out
the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)


Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30
minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30
minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15
minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations
(45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance
monitoring and Troubleshooting (30 minutes)


Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)


Introduction to Latency Sensitivity


The latency sensitivity feature aims at eliminating the major sources of extra latency
imposed by virtualization to achieve low response time and jitter. This per-VM feature
achieves this goal by giving exclusive access to physical resources to avoid resource
contention due to sharing, bypassing virtualization layers to eliminate the overhead of
extra processing, and tuning virtualization layers to reduce overhead. Performance can
be further improved when the latency sensitivity feature is used together with a passthrough mechanism such as single-root I/O virtualization (SR-IOV).
Since the latency sensitivity feature is applied on a per VM basis, a vSphere host can
run a mixture of normal VMs and latency sensitive VMs.

Who should use this feature?


The latency sensitivity feature is intended for specialized use cases that require
extremely low latency. It is extremely important to determine whether or not your
workload could benefit from this feature before enabling it. In a nutshell, latency
sensitivity provides extremely low network latency with a tradeoff of increased CPU and
memory cost as a result of less resource sharing, and increased power consumption.
We define a "highly latency sensitive application" as one that requires network latencies
in the order of tens of microseconds and very small jitter. Stock market trading
applications are an example of highly latency sensitive applications.
Before deciding if this setting is right for you, you should be aware of the network
latency needs of your application. If you set latency sensitivity to High, it could lead to
increased host CPU utilization, power consumption, and even negatively impact
performance in some cases.

Who should not use this feature?


Enabling the latency sensitivity feature reduces network latency. Latency sensitivity
will not decrease application latency if latency is influenced by storage latency or other
sources of latency besides the network.
The latency sensitivity feature should be enabled in environments in which the CPU is
undercommitted. VMs which have latency sensitivity set to High will be given exclusive
access to the physical CPU they need to run. This means the latency sensitive VM can
no longer share the CPU with neighboring VMs.
Generally, VMs that use the latency sensitivity feature should have a number of vCPUs
which is less than the number of cores per socket in your host to ensure that the latency
sensitive VM occupies only one NUMA node.


If the latency sensitivity feature is not relevant to your environment, feel free to choose
a different module.

Changes to CPU access


When a VM has 'High' latency sensitivity set in vCenter, the VM is given exclusive
access to the physical cores it needs to run. This is termed exclusive affinity. These
cores will be reserved for the latency sensitive VM only, which results in greater CPU
accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs
onto the same cores. When the VM is powered on, each vCPU is assigned to a particular
physical CPU and remains on that CPU.
When the latency sensitive VM's vCPU is idle, ESXi also alters its halting behavior so that
the physical CPU remains active. This reduces wakeup latency when the VM becomes
active again.

Changes to virtual NIC coalescing


A virtual NIC (vNIC) is a virtual device which exchanges data packets between the
VMkernel and the guest operating system. Exchanges are typically triggered by
interrupts to the guest OS or by the guest OS calling into VMKernel, both of which are
expensive operations. Virtual NIC coalescing, which is default behavior in ESXi, attempts
to reduce CPU overhead by holding onto packets for some time before posting interrupts
or calling into VMKernel. In doing so, coalescing introduces additional network latency
and jitter, but these effects are negligible for most non-latency sensitive workloads.
Enabling 'High' latency sensitivity disables virtual NIC coalescing, so that there is
less latency between when a packet is sent or received and when the CPU is interrupted
to process the packet.
This also results in greater power and CPU consumption as a tradeoff for reduced
network latency. This reduces network latency when the number of packets being
processed is small. But, if the number of packets becomes large, disabling virtual NIC
coalescing can actually be counterproductive due to the increased CPU overhead.
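For reference, coalescing can also be disabled manually per vNIC on a powered-off VM, without using the Latency Sensitivity setting. The sketch below is a hedged example using PowerCLI's New-AdvancedSetting cmdlet; "ethernet0" is an assumption and should match the vNIC you intend to tune.

# Hedged sketch: disable virtual NIC coalescing for one vNIC (VM should be powered off)
$vm = Get-VM -Name perf-worker-04a
New-AdvancedSetting -Entity $vm -Name 'ethernet0.coalescingScheme' -Value 'disabled' -Confirm:$false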
Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.


For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names, and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.


On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.


Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.


Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!


Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon at the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon


Performance impact of the Latency Sensitivity setting
In this section, we will observe the impact of the Latency Sensitivity setting on network
latency. To do so, let's start up some workloads to stress the VMs.

Open a PowerCLI window


Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start CPU workload


From the PowerCLI Console, type:
.\StartLSLab.ps1
and press Enter.
The script will configure and start up three VMs (03a, 05a, and 06a), and generate a CPU
workload on two of them (05a and 06a).

VM Stats Collectors: CPU intensive workload started


In a few minutes, when the script completes, you will see two VM Stats Collector
applications start up. Within a minute after, each utility will start a CPU intensive
application on the perf-worker-05a and perf-worker-06a virtual machines and will be
collecting the benchmark results from those CPU intensive workloads. These VMs, perf-worker-05a and perf-worker-06a, will create high demand for CPU on the host, which will
help us demonstrate the Latency Sensitivity feature.

IMPORTANT NOTE: Due to changing loads in the lab environment, your values may
vary from the values shown in the screenshots.

Select ESX Host


The environment where the lab is running is not constant, so it is important to note
the speed of the CPUs on the nested ESXi hosts.
Open the vSphere Web Client again.
You should already be in the Hosts and Clusters view.
1. Select esx-01a.corp.local
2. Make a note of the CPU speed of the processor (in this case 2.6 GHz)
You will be using this in a later step.


Edit perf-worker-04a
We will use the perf-worker-04a virtual machine to demonstrate the Latency Sensitivity
feature. To show how the 'High' Latency Sensitivity setting affects network latency, we
will compare network performance between perf-worker-04a with Latency Sensitivity set
to 'Normal' and that same VM with Latency Sensitivity set to 'High'.
The Latency Sensitivity feature, when set to 'High', has two VM resource requirements.
For best performance, it needs 100% memory reservation and 100% CPU reservation.
To make a fair comparison, both the 'Normal' latency sensitivity VM and the 'High'
latency sensitivity VM should have the same resource reservations, so that the only
difference between the two is the 'High' latency sensitivity setting.
First, we will create resource allocations for the perf-worker-04a virtual machine while
Latency Sensitivity is set to "Normal".
1. Select perf-worker-04a
2. Select edit settings


Set CPU Reservation to Maximum


1. Expand CPU
2. Set the Reservation value to the highest value possible, according to the CPU
speed that you noted in the earlier step. If the CPU speed was 3.1 GHz, set it to
3058 MHz. If it was 2.6 GHz, set it to 2598 MHz.
Note that it must be a few MHz less than the full clock speed for the VM to be able to start.
This sets a near-100% CPU reservation for the VM. When the VM has the 'High' latency
sensitivity setting, this CPU reservation enables exclusive affinity so that one physical
CPU is reserved solely for use of the 'High' Latency Sensitive VM vCPU.
Note that normally you should select "Maximum" reservation, but because this is a
fully virtualized environment, the CPU speed is detected with a wrong value. Therefore
we set it manually according to the underlying hardware.


Set Memory Reservation


Still on the Edit Settings page,
1. Click CPU to collapse the CPU view
2. Click Memory to expand the Memory view
3. Check the box Reserve all guest memory (All locked)
This sets a 100% memory reservation for the VM.
Right now, we are going to test network performance on a 'Normal' Latency Sensitivity
VM, but when we change the VM's latency sensitivity to 'High' later, 100% memory
reservation ensures that all the memory the VM needs will be located close to the
processor which is running the VM. If the VM has a 'High' Latency Sensitivity setting and
does not have a 100% memory reservation, it will not power on.
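The same reservations can also be applied from PowerCLI. This is a hedged sketch only (cmdlet and API property names are believed correct but should be verified against your PowerCLI version; the MHz value is the 2.6 GHz example from the earlier step):

# Hedged sketch: near-100% CPU reservation plus "Reserve all guest memory"
$vm = Get-VM -Name perf-worker-04a
$vm | Get-VMResourceConfiguration | Set-VMResourceConfiguration -CpuReservationMhz 2598

$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.MemoryReservationLockedToMax = $true     # maps to "Reserve all guest memory (All locked)"
$vm.ExtensionData.ReconfigVM($spec)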


Ensure Latency Sensitivity is 'Normal'


Still on the Edit Settings page:
1. Click the VM Options tab
2. Click Advanced to expand this section
3. Confirm the Latency Sensitivity is Normal
4. Click OK

Power on perf-worker-04a
1. Select "perf-worker-04a"
2. Select "Power"

HOL-SDC-1604

Page 50

HOL-SDC-1604

3. Click "Power On"


Monitor esx-02a host CPU usage


1. Select esx-02a.corp.local
2. Select Monitor
3. Select Performance
4. Select Advanced
5. You can see that the Latest value for esx-02a.corp.local Usage should be close to
100%. This indicates that the perf-worker-05a and perf-worker-06a VMs are
consuming as much CPU on the host as they can.

Although an environment which contains latency-sensitive VMs should typically remain
CPU undercommitted, creating demand for CPU makes it more likely that we can see a
difference between the 'Normal' and 'High' Latency Sensitivity network performance.
The VM perf-worker-03a will serve as the network performance test target.
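You can also confirm the host is near CPU saturation from the PowerCLI window. This is a hedged sketch (the cpu.usage.average counter is assumed to be available in this vCenter; the empty Instance is the aggregate across all physical CPUs):

# Hedged sketch: latest aggregate CPU usage (percent) for esx-02a
Get-Stat -Entity (Get-VMHost -Name esx-02a.corp.local) -Stat cpu.usage.average -Realtime -MaxSamples 3 |
    Where-Object { $_.Instance -eq '' } |
    Select-Object Timestamp, Value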

Monitor Resource Allocation


1. Select perf-worker-04a
2. Select Monitor
3. Select Utilization


The Resource Allocation for the 'Normal' Latency Sensitive VM shows only a small
portion of the total CPU and Memory reservation is Active. Your screen may show different
values if the VM is still booting up.

Open a PuTTY window


Click the PuTTY icon on the taskbar


SSH to perf-worker-04a
1. Select perf-worker-04a
2. Click Open

Test network latency on 'Normal' latency sensitivity VM


At the command line, type:
ping -f -w 1 192.168.100.153
Press enter.
Wait for the command to complete, and run this command a total of 3 times. On the
second and third times, you can press the up arrow to retrieve the last command
entered.
Ping is a very simple network workload, which measures Round Trip Time (RTT), in which
a network packet is sent to a target VM and then returned to the sending VM. The VM perf-worker-04a, located on esx-02a.corp.local, is pinging perf-worker-03a, located on
esx-01a.corp.local, with the IP address 192.168.100.153. For a period of one second,
perf-worker-04a sends back-to-back ping requests. Ping is an ideal low-level network
test because the request is processed in the kernel and does not need to access the
application layer of the operating system.
We have finished testing network latency and throughput on the 'normal' Latency
Sensitivity VM. Do not close this PuTTY window as we will use it for reference later. We
will now change the VM to 'high' Latency Sensitivity.


Shut down the perf-worker-04a VM


To enable the latency sensitivity feature for a VM, the VM must first be powered off. You
can still change the setting while the VM is powered on, but it doesn't fully apply until
the VM has been powered off and then back on again.
1. Right-click perf-worker-04a
2. Select Power
3. Click Shut Down Guest OS

Confirm Guest Shut Down


Click Yes
Wait for perf-worker-04a to shut down.

Edit Settings for perf-worker-04a


We will use the perf-worker-04a virtual machine to demonstrate the Latency
Sensitivity feature. To show how the 'High' Latency Sensitivity setting affects network
latency, we will compare network performance with this setting set to Normal and
High.


The Latency Sensitivity feature, when set to 'High', has two VM resource requirements.
For best performance, it needs 100% memory reservation and 100% CPU reservation.
To make a fair comparison, both the 'Normal' latency sensitivity VM and the 'High'
latency sensitivity VM should have the same resource reservations, so that the only
difference between the two is the 'High' latency sensitivity setting.
Earlier, we created resource reservations for the perf-worker-04a virtual machine while
Latency Sensitivity was set to "Normal" (the default setting). Now we will change the setting to "High".
1. Click perf-worker-04a
2. Click Actions
3. Click Edit Settings...


Set 'High' Latency Sensitivity.


1. Select VM Options
2. Expand Advanced
3. Select High
4. Click OK

CPU reservation warning


Perhaps you noticed a warning in the previous picture: "Check CPU Reservation" appears
next to the Latency Sensitivity setting. For best performance, High Latency Sensitivity
requires you set 100% CPU reservation for the VM, which we did earlier. This warning
will always appear in the Advanced Settings screen, even when the CPU reservation has
already been set high enough.
If no reservation is set, the VM is still allowed to power on and no further warnings are
made.

Power on perf-worker-04a
1. Right-click perf-worker-04a
2. Select Power
3. Click Power On


Monitor Resource Allocation


1. Select the "Monitor" tab
2. Select "Utilization"
On the top half of this image, we see that the 'High' Latency Sensitivity VM shows 100%
Active CPU and Private Memory even though the VM itself is idle. Compare this to the
Resource Allocation for the 'Normal' Latency Sensitive VM which we examined earlier. It
shows only a small portion of the total CPU and Memory reservation is Active. This
increase in Active CPU and Memory is the result of the 'High' Latency Sensitivity setting.
Although we cannot see the difference in this environment when 'High' Latency
Sensitivity is set with 100% CPU reservation, the Host CPU will show 100%
utilization of the physical core which is hosting the VM's vCPU. This is a normal
result of exclusive affinity in the Lab environment and occurs even when the VM itself is
idle. On many Intel processors, the physical CPU hosting the vCPU will be idle if the
vCPU is idle but it will still be unavailable to other vCPUs.


Monitor the VM Stats Collectors


Before we set 'High' Latency Sensitivity for perf-worker-04a, the CPU workers had
equivalent benchmark scores. Now, one of the CPU workers will have a lower score. In
the example above, perf-worker-06a has a lower score. Your lab may show either perf-worker-05a or perf-worker-06a with a lower score. This confirms that perf-worker-04a has
impacted perf-worker-06a's access to CPU cycles which decreases its CPU benchmark
score.
Next, we will test network latency on the 'High' Latency Sensitivity VM.

Open a PuTTY window


Click the PuTTY icon on the taskbar


SSH to perf-worker-04a
1. Select perf-worker-04a
2. Click Open


Test network latency on 'High' Latency Sensitive VM


At the command line, run the command:
ping -f -w 1 192.168.100.153
Like last time, wait for the command to complete, and run this command a total of three
times.
We'll take a look at the results in a second, but first we will set the Latency Sensitivity
setting back to default.

Compare network latency tests


From the taskbar, click the PuTTY icons to bring both PuTTY windows to the foreground
and arrange them with Normal Latency Sensitivity on top and High Latency Sensitivity
on the bottom.

Hint: At the bottom of both windows, there should be a timestamp:


Broadcast message from root (timestamp): The oldest timestamp is the Normal
Latency Sensitivity VM. Place this window on top and the other on bottom.
Now let's delve into the performance results.


Important Note: Due to variable loads in the lab environment, your numbers may
differ from those above.
The ping test we completed sends as many ping requests to the remote VM as possible
("Back to back pings") within a one second period. As soon as one ping is returned,
another request is sent. The ping command outputs four statistics per test:

minimum latency (min)


average latency (avg)
maximum latency (max)
mean deviation (mdev), a measure of jitter

Of these, we are most interested in minimum latency and mean deviation.


From 'eyeballing' the differences in numbers between the 'Normal' and 'High' Latency
Sensitivity VMs, hopefully you will be able to see the difference. Note the numbers
within the green brackets; the smaller deviation in the 'High' Latency sensitive VM
represents less "jitter". Because this is a shared virtualized test environment, these
performance results are not representative of the effects of the Latency Sensitivity
setting in a real-life environment. They are for demonstration purposes only.
Remember, these numbers were taken from the same VM with the same resource
allocations, under the same conditions. The only difference between the two is setting
'Normal' versus 'High' Latency Sensitivity.


Close the VM Stats Collector windows


From the taskbar, click the .NET icon to bring the VM Stats Collectors to the foreground.


We have finished the network tests. Close the windows using the X on each window.

Close open PuTTY windows


Close the open PuTTY windows.


Conclusion and Cleanup


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines used in this module and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings.


Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key Takeaways


The Latency Sensitivity setting is very easy to configure. Once you have determined
whether your application fits the definition of 'High' latency sensitivity (tens of
microseconds), configure Latency Sensitivity.

To review:
1. On a powered off VM, set 100% memory reservation for the latency sensitive VM.
2. If your environment allows, set 100% CPU reservation for the latency sensitive VM
such that the MHz reserved is equal to 100% of the sum of the frequency of the VM's
vCPUs.
3. In Advanced Settings, set Latency Sensitivity to High.
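For step 3, the setting can also be applied through the vSphere API from PowerCLI. The sketch below is an assumption on my part (type and enum names taken from the API reference; verify against your PowerCLI version), not the lab's official method, and the VM should be powered off first:

# Hedged sketch: set Latency Sensitivity to High via the vSphere API
$vm = Get-VM -Name perf-worker-04a
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.LatencySensitivity = New-Object VMware.Vim.LatencySensitivity
$spec.LatencySensitivity.Level = [VMware.Vim.LatencySensitivitySensitivityLevel]::high
$vm.ExtensionData.ReconfigVM($spec)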
If you want to learn more about running latency sensitive applications on vSphere,
consult these white papers:
http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

Conclusion
This concludes Module 2: CPU Performance Feature: Latency Sensitivity Setting. We hope you have enjoyed taking it. Please
do not forget to fill out the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.
Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)


Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)


Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30
minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30
minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15
minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations
(45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance
monitoring and Troubleshooting (30 minutes)


Module 3: CPU Performance Feature: Power Management Policies (15 minutes)


Introduction to, and Performance Impact of, Power Policies
VMware vSphere serves as a common virtualization platform for a diverse ecosystem of
applications. Every application has different performance demands which must be
met, but recent increases in density and computing needs in datacenters are straining
power and cooling capacities and costs of running these applications.
vSphere Host Power Management (HPM) is a technique that saves energy by placing
certain parts of a computer system or device into a reduced power state when the
system or device is inactive or does not need to run at maximum speed. vSphere
handles power management by utilizing Advanced Configuration and Power Interface
(ACPI) performance and power states. In VMware vSphere 5.0, the default power
management policy was based on dynamic voltage and frequency scaling (DVFS). This
technology utilizes the processor's performance states and allows some power to be
saved by running the processor at a lower frequency and voltage. However, beginning
in VMware vSphere 5.5, the default HPM policy uses deep halt states (C-states) in
addition to DVFS to significantly increase power savings over previous releases while
still maintaining good performance.
However, in order for ESXi to be able to control these features, you must ensure that the
server BIOS power management profile is set to "OS Control mode" or the equivalent.
In this lab, we will show how to:
1. Customize your server's BIOS settings (using example screen shots)
2. Explain the four power policies that ESXi offers, and demonstrate how to change
this setting
3. Optimize your environment either for balanced power and performance
(recommended for most environments) or for maximum performance
(which can sacrifice some power savings).


For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names, and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.


On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.


Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Configuring the Server BIOS Power Management Settings
VMware ESXi includes a full range of host power management capabilities. These can
save power when an ESXi host is not fully utilized. As a best practice, you should
configure your server BIOS settings to allow ESXi the most flexibility in using the power
management features offered by your hardware, and make your power management
choices within ESXi (next section).
On most systems, the default setting is BIOS-controlled power management. With that
setting, ESXi won't be able to manage power; instead, it will be managed by the BIOS
firmware. The sections that follow describe how to change this setting to OS Control
(recommended for most environments).
In certain cases, poor performance may be related to processor power management,
implemented either by ESXi or by the server hardware. Certain applications that are
very sensitive to processing speed latencies may show less than expected performance
when processor power management features are enabled. It may be necessary to turn
off ESXi and server hardware power management features to achieve the best
performance for such applications. This setting is typically called Maximum
Performance mode in the BIOS.
NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management, with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.
For more details on how and what to configure, see this white paper:
http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf

Configuring BIOS to OS Control mode (Dell example)


The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to allow the OS (ESXi) to control the CPU power-saving features
directly:
Under the Power Management section, set the Power Management policy to
OS Control.
For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System
Profile modes in the System Setup > System BIOS settings. You will see these options:
Performance Per Watt (DAPC-System)

Performance Per Watt (OS)


Performance
Dense Configuration (DAPC-System)
Custom

Choose Performance Per Watt (OS).


Next, you should verify the Power Management policy used by ESXi (see the next
section).

Configuring BIOS to OS Control mode (HP example)


The screenshot above illustrates how a HP ProLiant server BIOS can be configured
through the ROM-Based Setup Utility (RBSU). The settings highlighted in red allow the
OS (ESXi) to control some of the CPU power-saving features directly:
Go to the Power Management Options section, HP Power Profile, and select
Custom
Go to the Power Management Options section, HP Power Regulator, and
select OS Control Mode
Next, you should verify the Power Management policy used by ESXi (see the next
section).

Configuring BIOS to Maximum Performance mode (Dell example)
The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS
can be configured to disable power management:
Under the Power Management section, set the Power Management policy to
Maximum Performance.
For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System
Profile modes in the System Setup > System BIOS settings. You will see these options:

Performance Per Watt (DAPC-System)


Performance Per Watt (OS)
Performance
Dense Configuration (DAPC-System)
Custom

Choose Performance to disable power management.

NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management, with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.

Configuring BIOS to Maximum Performance mode (HP example)
The screenshot above illustrates how to set the HP Power Profile mode in the server's
RBSU to the Maximum Performance setting to disable power management:
Enter RBSU by pressing F9 during the server boot-up process
Select Power Management Options
Change the HP Power Profile to Maximum Performance mode.
NOTE: Disabling power management usually results in more power being consumed by
the system, especially when it is lightly loaded. The majority of applications benefit from
the power savings offered by power management, with little or no performance impact.
Therefore, if disabling power management does not realize any increased performance,
VMware recommends that power management be re-enabled to reduce power
consumption.

Configuring BIOS Custom Settings (Advanced)


The screenshot above illustrates that if a Custom System Profile is selected, individual
parameters can be modified. Here are some examples of these
settings; for more information, please consult your server's BIOS setup manual.
C1E is a hardware-managed state: When ESXi puts the CPU into the C1 state, the
CPU hardware can determine, based on its own criteria, to deepen the state to
C1E. Availability of the C1E halt state typically provides a reduction in power
consumption, with little or no impact on performance.

C-states deeper than C1/C1E (typically C3 and/or C6 on Intel and AMD) are
managed by software and enable further power savings. You should enable all C-states and the deepest C-state in the BIOS to get the best performance per watt.
This gives you the flexibility to use vSphere host power management to control
their use.
When Turbo Boost or Turbo Core is enabled, C1E and deep halt states (for
example, C3 and C6 on Intel) can sometimes even increase the performance of
certain lightly-threaded workloads (workloads that leave some hardware threads
idle). Therefore, you should enable C1E and deep C-states in the BIOS. However,
for a very few multithreaded workloads that are highly sensitive to I/O latency, C-states can reduce performance. In these cases, you might obtain better
performance by disabling them in the BIOS. Also, C1E and deep C-states
implementation can be different for different processor vendors and generations,
so your results may vary.
Some systems have Processor Clocking Control (PCC) technology, which enables
ESXi to manage power on the host system indirectly, in cooperation with the
BIOS. This setting is usually located under the Advanced Power Management
options in the BIOS of supported HP systems and is usually called Collaborative
Power Control. As shown above, there is a Collaborative CPU Performance
Control setting in the Dell PowerEdge BIOS. With this technology, ESXi does not
manage P-states directly. It instead cooperates with the BIOS to determine the
processor clock rate. This feature was turned on by default only in ESXi 5.0 GA
through 5.0 U2, and has since been disabled in ESXi for stability reasons. You should not
re-enable it.

Configuring Host Power Management in ESXi
VMware ESXi includes a full range of host power management capabilities. These can
save power when an ESXi host is not fully utilized. As a best practice, you should
configure your server BIOS settings to allow ESXi the most flexibility in using the power
management features offered by your hardware, and make your power management
choices within ESXi. These choices are described below.

Select Host Power Management Settings for esx-01a


1. Select "esx-01a.corp.local"
2. Select "Manage"
3. Select "Settings"
4. Select "Power Management" in the Hardware section (not under System)

Power Management Policies


On a physical host, the Power Management options could look like this (it may vary
depending on the processors of the physical host).

Here you can see which ACPI states are presented to the host and which Power
Management policy is currently active. There are four Power Management policies
available in ESXi 5.0, 5.1, 5.5, 6.0 and ESXi/ESX 4.1:

High Performance
Balanced (Default)
Low Power
Custom

1. Click "Edit" to see the different options


NOTE: Due to the nature of this lab environment, we are not interacting directly with
physical servers, so changing the Power Management policy will not have any
noticeable effect. Therefore, while the sections that follow will describe each Power
Management policy, we won't actually change this setting.

High Performance
The High Performance power policy maximizes performance, and uses no power
management features. It keeps CPUs in the highest P-state at all times. It uses only
the top two C-states (running and halted), not any of the deep states (for example, C3
and C6 on the latest Intel processors). High performance was the default power policy
for ESX/ESXi releases prior to 5.0.

Balanced (default)
The Balanced power policy is designed to reduce host power consumption while
having little or no impact on performance. The balanced policy uses an algorithm that
exploits the processor's P-states. This is the default power policy since ESXi
5.0. Beginning in ESXi 5.5, we now also use deep C-states (greater than C1) in the
Balanced power policy. Formerly, when a CPU was idle, it would always enter C1. Now
ESXi chooses a suitable deep C-state depending on its estimate of when the CPU will
next need to wake up.

Low Power
The Low Power policy is designed to save substantially more power than the
Balanced policy by making the P-state and C-state selection algorithms more
aggressive, at the risk of reduced performance.

Custom
The Custom power policy starts out the same as Balanced, but allows individual
parameters to be modified.
Click "Cancel" to exit.
The next step describes settings that control the Custom power policy.

Setting custom parameters


To configure the custom policy settings,
1. Select Advanced System Settings (under the System section)
2. Type "in custom policy" in the filter search bar (as shown above) to only show
Custom policy settings.
The settings you can customize include the following (a read-only PowerCLI sketch for inspecting them appears after this list):
Power.CStateMaxLatency : Do not use C-states whose latency is greater than
this value.
Power.CStatePredictionCoef : A parameter in the ESXi algorithm for predicting
how long a CPU that becomes idle will remain idle. Changing this value is not
recommended.
Power.CStateResidencyCoef : When a CPU becomes idle, choose the deepest
C-state whose latency multiplied by this value is less than the host's prediction of
how long the CPU will remain idle. Larger values make ESXi more conservative
about using deep C-states; smaller values are more aggressive.
Power.MaxCpuLoad : Use P-states to save power on a CPU only when the CPU is
busy for less than the given percentage of real time.

Power.MaxFreqPct : Do not use any P-states faster than the given percentage
of full CPU speed, rounded up to the next available P-state.
Power.MinFreqPct : Do not use any P-states slower than the given percentage
of full CPU speed.
Power.PerfBias : Performance Energy Bias Hint (Intel only). Sets an MSR on Intel
processors to an Intel-recommended value. Intel recommends 0 for high
performance, 6 for balanced, and 15 for low power. Other values are undefined.
Power.TimerHz : Controls how many times per second ESXi reevaluates which P-state each CPU should be in.
Power.UseCStates : Use deep ACPI C-states (C2 or below) when the processor is
idle.
Power.UsePStates : Use ACPI P-states to save power when the processor is
busy.
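The same Power.* values can be read from PowerCLI. This is a hedged, read-only sketch (it assumes a connected PowerCLI session; wildcard support in the -Name parameter is assumed here), so it is safe to run in the lab:

# List the host-level power management advanced settings and their current values
Get-AdvancedSetting -Entity (Get-VMHost -Name "esx-01a.corp.local") -Name "Power.*" |
    Sort-Object Name |
    Select-Object Name, Value

A value would be changed by piping one of these settings into Set-AdvancedSetting, but such a change only takes effect when the Custom power policy is active.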

Conclusion
Key takeaways
We hope that you now know how to change power policies, both at the server BIOS level
and also within ESXi itself.
To summarize, here are some best practices around power management policies:
Configure your physical host (server BIOS) to OS Control mode as the power
policy. If applicable, enable Turbo mode and C-states (including deep C-states),
which are usually enabled by default.
Within ESXi, the default Balanced power management policy will achieve the
best performance per watt for most workloads.
For applications that require maximum performance, switch the BIOS power
policy and/or the ESXi power management policy to Maximum Performance
and High Performance respectively. This includes latency-sensitive applications
that must execute within strict constraints on response time. Be aware, however,
that this typically only results in minimal performance gain, but disables all
potential power savings.
Depending on your applications and the level of utilization of your ESXi hosts, the
correct power policy setting can have a great impact on both performance and energy
consumption. On modern hardware, it is possible to have ESXi control the power
management features of the hardware platform used. You can select to use predefined
policies or you can create your own custom policy.
Recent studies have shown that it is best to let ESXi control the power policy. For more
details, see the following references:
http://blogs.vmware.com/performance/2014/09/custom-power-management-settings-power-savings-vsphere-5-5.html
http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf

Conclusion
This concludes Module 3: CPU Performance Feature: Power Policies. We hope you have
enjoyed taking it. Please do not forget to fill out the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.
Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)

Module 3: CPU Performance Feature: Power Policies (15 minutes)


Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30
minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30
minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15
minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations
(45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance
monitoring and Troubleshooting (30 minutes)

Module 4: vSphere Fault
Tolerance (FT) and
Performance (30 minutes)

Introduction to vSphere Fault Tolerance
VMware vSphere Fault Tolerance (FT) is a pioneering component that provides
continuous availability to applications, preventing downtime and data loss in the event
of server failures. VMware Fault Tolerance, built using VMware vLockstep technology,
provides operational continuity and high levels of uptime in VMware vSphere
environments, with simplicity and at a low cost.
With vSphere 6, one of the key new features is support for up to 4 virtual CPUs
(vCPUs) in FT virtual machines, also known as SMP FT. This is especially important for
IT departments that may have limited clustering experience but don't want the
downtime associated with a hardware failure. Use vSphere FT as needed for
applications that require continuous protection during critical times, such as
quarter-end processing.

How does Fault Tolerance work?


VMware vSphere Fault Tolerance (FT) provides continuous availability for applications
in the event of server failures by creating a live "shadow instance" of a virtual machine
that is always up-to-date with the primary virtual machine. In the event of a hardware
outage, vSphere FT automatically triggers failover, ensuring zero downtime and
preventing data loss.
After failover, vSphere FT automatically creates a new, secondary virtual machine to
deliver continuous protection for the application.

FT Architecture
vSphere FT is made possible by four underlying technologies: storage, runtime state,
network, and transparent failover.

Storage
vSphere FT ensures the storage of the primary and secondary virtual machines is
always kept in sync. Whenever vSphere FT protection begins, an initial synchronization
of the VMDKs happens using a Storage vMotion to ensure the primary and secondary
have the exact same disk state.
This initial Storage vMotion happens whenever FT is turned on, a failover occurs, or a
powered-off FT virtual machine powers on. The FT virtual machine is not considered FT-protected until the Storage vMotion completes.
After this initial synchronization, vSphere FT will mirror VMDK modifications between the
primary and secondary over the FT network to ensure the storage of the replicas
continues to be identical.

Runtime State
vSphere FT ensures the runtime state of the two replicas is always identical. It does this
by continuously capturing the active memory and precise execution state of the virtual
machine, and rapidly transferring them over a high speed network, allowing the virtual
machine to instantaneously switch from running on the primary ESXi host to the
secondary ESXi host whenever a failure occurs.

Network
The networks being used by the virtual machine are also virtualized by the underlying
ESXi host, ensuring that even after a failover, the virtual machine identity and network
connections are preserved. Similar to vSphere vMotion, vSphere FT manages the virtual
MAC address as part of the process. If the secondary virtual machine is activated,
vSphere FT pings the network switch to ensure that it is aware of the new physical
location of the virtual MAC address. Since vSphere FT preserves the storage, the precise
execution state, the network identity, and the active network connections, the result is
zero downtime and no disruption to users should an ESXi server failure occur.

Transparent Failover
If a failover occurs, vSphere FT ensures that the new primary always agrees with the old
primary about the state of the virtual machine. This is achieved by holding and only
releasing externally visible output from the virtual machine once an acknowledgment is
made from the secondary affirming that the state of the two virtual machines is
consistent (for the purposes of vSphere FT, externally visible output is network
transmissions).

Benefits of FT
vSphere FT offers the following benefits:
Provides continuous availability, for zero downtime and zero data loss with
infrastructure failures
Protects mission-critical, high-performance applications regardless of operating
system (OS)
Provides uninterrupted service through an intuitive administrative interface

Delivers a fully automated response

What's new in vSphere 6.0 FT


With vSphere 6.0, the new Multi-Processor FT (SMP-FT) implementation now brings
continuous availability protection for VMs with up to 4 vCPUs. In addition, there are
several new features:

Enhanced virtual disk format support (all types of VMDK)


Ability to hot configure FT
Greatly increased FT host compatibility
API for non-disruptive snapshots
Support for vStorage APIs for Data Protection (VADP)

Note that there are some differences between the vSphere editions: Standard and
Enterprise support 2 vCPU FT, while Enterprise Plus raises this to 4 vCPU support.
(credit: http://vinfrastructure.it/2015/02/vmware-vsphere-6-the-new-ft-feature/)

Considerations for vCenter Server with FT


When virtualizing vCenter Server, technologies such as vSphere FT can help protect the
vCenter management server from hardware failures.
Compared to vSphere HA, vSphere FT can provide instantaneous protection, but the
following limitations must be considered:

The vCenter Server system is limited to four vCPUs.


vSphere FT protects against hardware failures but not against application failures.
vSphere FT cannot reduce downtime for patching-related outages.
vSphere FT has resource requirements that can create additional overhead.

Because vSphere FT is suitable for workloads with a maximum of four vCPUs and
64GB of memory, it can be used in tiny and small vCenter Server deployments.

Prerequisites for FT
All hosts with vSphere FT enabled require a dedicated 10Gbps low-latency
VMkernel interface for vSphere FT logging traffic.
The option to turn on vSphere FT is unavailable (dimmed) if any of these conditions
apply:
The VM resides on a host that does not have a license for the feature.
The VM resides on a host that is in maintenance mode or standby mode.

The VM is disconnected or orphaned; that is, its VMX file cannot be accessed.


The user does not have permission to turn the feature on.
Next, we will go through an example of configuring a VM for vSphere FT.
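Before that, note that on a real deployment you can quickly verify the FT logging prerequisite from PowerCLI by checking which VMkernel adapters have FT logging enabled. This is a read-only sketch (it assumes a connected PowerCLI session):

# Lists the VMkernel adapters on each host and whether FT logging is enabled on them
Get-VMHost | Get-VMHostNetworkAdapter -VMKernel |
    Select-Object VMHost, Name, IP, FaultToleranceLoggingEnabled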

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Configure Lab for Fault Tolerance


As it leverages existing vSphere HA clusters, vSphere FT can safeguard any number of
virtual machines in a cluster. Administrators can turn vSphere FT on or off for specific
virtual machines with a point-and-click action in the vSphere Web Client.
To see how this works from a "functional" perspective, we will use our lab environment,
which is a nested ESXi environment. SMP-FT no longer uses the "record/replay"
capability of its predecessor, Uniprocessor Fault Tolerance (UP-FT). Instead, SMP-FT
uses a new Fast Checkpointing technique, which not only improves on the overall
performance of its predecessor but also greatly simplifies and reduces the additional
configuration needed when running in a Nested ESXi environment like this Hands-On Lab.
NOTE: Running SMP-FT in a Nested ESXi environment does not replace or substitute
actual testing of physical hardware. For any type of performance testing, please test
SMP-FT using real hardware.

Edit Cluster's vSphere HA Settings


1. Click Cluster Site A from the Hosts and Clusters list on the left
2. Click Manage
3. Click Settings
4. Click vSphere HA (NOTE: you should see the message "vSphere HA is Turned OFF" as shown above)
5. Click Edit...
We will now enable vSphere HA for the cluster.

Enable vSphere HA
1. Check Turn on vSphere HA

2. Check Host Monitoring


3. Set Virtual Machine Monitoring to Disabled
4. Click Admission Control to configure this option. We will set this in the next
step.

Enable vSphere HA (continued)


1. Under Admission Control, scroll to the bottom and select the last radio button,
Do not reserve failover capacity. This is necessary in our lab environment
since not all HA constraints may necessarily be guaranteed.
2. Click OK.
vSphere HA will now be enabled to reduce downtime. Again, this is a prerequisite for
Fault Tolerance.
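For reference, the same change can be scripted. A minimal PowerCLI sketch (the cluster name comes from this lab's inventory; it assumes a connected session):

# Turn on vSphere HA and disable admission control (other HA options keep their defaults)
Set-Cluster -Cluster "Cluster Site A" -HAEnabled:$true `
    -HAAdmissionControlEnabled:$false -Confirm:$false
# Quick verification
Get-Cluster -Name "Cluster Site A" |
    Select-Object Name, HAEnabled, HAAdmissionControlEnabled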

Verify vSphere HA is Enabled


1. Now that vSphere HA has been enabled, click the Refresh icon at the top of the
vSphere Web Client to ensure this is reflected in the UI.
2. Click on vSphere HA Monitoring.

Review vSphere HA Monitoring page


Review the vSphere HA tab under the Monitor section. You should see a screen
similar to the above.
1. To verify there are no issues, click Configuration Issues.

Review vSphere HA Configuration Issues


Review this Configuration Issues page. This list is empty if vSphere HA succeeded
without issues.
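The same check can be scripted; this read-only snippet uses the ConfigIssue property that the vSphere API exposes on the cluster object (it assumes a connected PowerCLI session):

# An empty result means vSphere HA was configured without issues
(Get-Cluster -Name "Cluster Site A").ExtensionData.ConfigIssue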

Turn On vSphere FT for perf-worker-02a


1. Left-click perf-worker-02a from the list of virtual machines on the left
2. Click the Actions dropdown in the upper right pane
3. Hover over the Fault Tolerance menu, and choose Turn On Fault Tolerance
This will pop up a Fault Tolerance configuration screen, which is shown next.

Select datastore for vSphere FT (1/2)


The first step is to select the datastores for the secondary VM configuration file, tie
breaker file, and virtual hard disk.
1. Click Browse...
2. Click Browse... again from the dropdown.
A list of datastores will pop up next.

Select datastore for vSphere FT (2/2)


Click the only datastore we have in this environment (ds-site-a-nfs01), then click OK.
Repeat the previous step and this step for all three files (Configuration File, Tie
Breaker File, and Hard disk 1).

Ensure compatibility checks succeeded


After selecting the datastore for the secondary files, you should see a screen like the
one above, with a green checkbox that says "Compatibility checks succeeded."
Click Next.

Select host esx-02a for the secondary VM


We now need to select where to host the secondary VM.
1. perf-worker-02a is already running on esx-01a, so we should select the other host
in the cluster; click esx-02a.corp.local.
2. You'll see a warning that we will be using the same datastore (ds-site-a-nfs01)
for both the primary and secondary VM's disks. While not recommended for
production, this is only a demonstration lab environment. Click Next to continue.

Review selections and finalize enabling vSphere FT


Ensure your selections match the screenshot above, and click Finish to turn on fault
tolerance for perf-worker-02a.

Power on perf-worker-02a
1. Left-click perf-worker-02a from the list of virtual machines on the left. Note the
slightly darker blue color for the VM, which indicates it is now fault-tolerant.
2. Click the Actions link in the upper right pane
3. Hover over the Power menu, and choose Power On
This will power on the perf-worker-02a VM, and start the procedure to make it Fault
Tolerant.

Monitor the Fault Tolerance secondary VM creation


1. Click the vSphere Web Client text in the upper left-hand corner to return to the
home screen
2. Click Tasks on the left-hand pane
3. Click the Refresh icon periodically and monitor the Task Console Progress until
you see it has Completed
Step 3 could take up to 5-10 minutes to complete. Once you see that all tasks have
Completed, continue onto the next step.

Select the Fault Tolerant VM


Once the secondary VM has been created in the previous step, click on the perf-worker-02a VM to switch the view back to our fault-tolerant VM.

Verify the VM is Fault Tolerant


From the perf-worker-02a VM view, we can verify that the VM is now Fault Tolerant in a
couple of ways:
The dark blue icon highlighted above (when you hover over it, the tooltip reads
Protected by VMware Fault Tolerance)
The Fault Tolerance section highlighted above, which indicates the Status
(Protected), the location of the secondary VM (esx-02a), and the network
bandwidth used to keep the primary and secondary VM synchronized
Additionally, we could induce a failover from esx-01a to esx-02a by selecting Actions,
Fault Tolerance, Test Failover. However, this is time-consuming and resource-intensive, as it not only makes esx-02a the new Primary VM location, but also makes
esx-01a the new Secondary VM location.
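A lightweight, read-only way to confirm the protection state from PowerCLI is to look at the FaultToleranceState runtime property that the vSphere API exposes for the VM (the VM name is from this lab; a connected session is assumed):

# Reports notConfigured, disabled, enabled, needSecondary, starting or running
(Get-VM -Name "perf-worker-02a").ExtensionData.Runtime.FaultToleranceState

Once the secondary VM is up and synchronized, the value should be "running".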

Turn Off vSphere FT for perf-worker-02a


Now let's reverse the process (disable Fault Tolerance) to clean up the environment:
1. Click the Actions dropdown in the upper right pane
2. Hover over the Fault Tolerance menu
3. Click Turn Off Fault Tolerance

This will pop up a dialog box asking you to confirm; click Yes.
This will unregister, power off, and delete the secondary VM.

Select Hosts and Clusters


We need to reselect the cluster to remove vSphere HA:
1. Get to the Home screen by clicking vSphere Web Client in the upper left
2. Click Hosts and Clusters.

Edit Cluster's vSphere HA Settings


1. Click Cluster Site A from the Hosts and Clusters list on the left
2. Click Manage
3. Click Settings
4. Click vSphere HA (NOTE: you should see the message "vSphere HA is Turned ON" as shown above)
5. Click Edit...
We will now disable vSphere HA for the cluster.

Disable vSphere HA
1. Uncheck Turn on vSphere HA
2. Click OK

Shut down perf-worker-02a


1. Left-click perf-worker-02a from the list of virtual machines on the left. Note the
VM is no longer dark blue, since it is no longer fault-tolerant.
2. Click the Actions link in the upper right pane
3. Hover over the Power menu, and choose Shut Down Guest OS
This will shut down the perf-worker-02a VM.
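The equivalent PowerCLI one-liner, if you prefer to script the cleanup (Stop-VMGuest issues a guest OS shutdown through VMware Tools):

Get-VM -Name "perf-worker-02a" | Stop-VMGuest -Confirm:$false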

Fault Tolerance Performance


Because the hands-on lab environment is shared, it is not feasible to try and run
benchmarks that could potentially saturate the environment (there are other users
taking this, and other labs, too :-).
Therefore, this section will show some results from an FT performance whitepaper, which
used a variety of micro-benchmarks and real-life workloads.
Micro-benchmarks were used to stress CPU, disk, and network subsystems
individually by driving them to saturation.
Real-life workloads, on the other hand, were chosen to be representative of
what most customers would run, and they were configured to have a CPU
utilization of 60 percent in steady state.
Identical hardware test beds were used for all the experiments, and the performance
comparison was done by running the same workload on the same virtual machine with
and without FT enabled.

Kernel Compile
This experiment shows the time taken to do a parallel compile of the Linux kernel. This
is both a CPU- and MMU-intensive workload due to the forking of many parallel
processes. The CPU is 100 percent utilized. This workload does some disk reads and
writes, but generates no network traffic.
As shown in the figure above, FT protection increases the kernel compile time a small
amount -- about 7 seconds.

Network Throughput (Netperf)


Netperf is a micro-benchmark that measures the performance of sending and receiving
network packets. Netperf was configured with several scenarios and network speeds to
demonstrate the throughput and latency performance of TCP/IP under FT.
This netperf experiment measures unidirectional TCP/IP throughput. One experiment is
done in each direction, when the virtual machine is either receiving or transmitting data.
The speed of the virtual machine network is an important factor for performance; the
experiment shown above was on a 1 Gbps uplink.
The throughput experiments reveal some important points about performance under FT
protection:

The FT network traffic is minimal for transmit-heavy workloads.
The FT network traffic is high for receive-heavy workloads.
Receive-heavy workloads tend to increase FT traffic due to the requirement to keep
the replicas in sync. The influx of data into the primary causes large differences
between the replicas, and thus requires more data to be sent over the FT network.
Transmit-heavy workloads, on the other hand, cause very few differences between
the replicas, and thus very little FT traffic. Therefore, transmit-heavy applications, such
as Web servers or read-bound databases, tend to have lower FT traffic requirements.

Network Latency (Netperf)


Another aspect of networking performance to consider is latency. Fault tolerance
introduces some delay to network output (measurable only in milliseconds, as shown
above). The latency occurs because an FT-protected primary withholds network
transmissions until the secondary acknowledges to the primary that it has reached an
identical runtime state.
In this experiment, netperf is run with the TCP_RR configuration (single stream, no
parallel streams) and the round-trip latency is reported here (it is the inverse of the
round-trip transaction rate). TCP_RR is a pure latency benchmark: the sender transmits
a 1-byte message and blocks waiting for a 1-byte response, the benchmark counts the
number of serial transactions completed in a unit time, and has no parallelization.
In a pure latency benchmark, any increase in latency reduces throughput proportionally
(for example, if latency increases 57 times, throughput drops by the same factor).
Normal server applications are not pure latency benchmarks, however. They handle
multiple connections at a time and each connection will transmit several packets worth
of data before pausing to hear a response. The result is that real world applications
tolerate network latencies without dropping throughput. The previous netperf
throughput experiment is an example of this, and the client/server workloads examined
in this paper demonstrate the same thing.
One aspect not measured by netperf is jitter and latency fluctuation. FT-protected
virtual machines can vary widely in latencies depending on the workload, and over time
within a given workload. This can cause significant jitter. Highly latency-sensitive
applications, such as high frequency trading (HFT), or some voice-over-IP (VOIP)
applications may experience high overhead with FT. However, some voice applications,
where the bulk data is carried by separate machines and only call management traffic is
FT protected, would perform fine.

Iometer
Iometer is an I/O subsystem measurement and characterization tool for Microsoft
Windows. It is designed to produce a mix of operations to stress the disk. This
benchmark ran random I/Os of various types. The bar charts above show that the FT-protected VM achieves nearly the same throughput as the non-protected VM.

Swingbench with Oracle 11g


In this experiment, an Oracle 11g database was driven using the Swingbench 2.2 order
entry online transaction processing (OLTP) workload. This workload has a mixture of
CPU, memory, disk, and network resource requirements. The FT-protected virtual
machine is able to achieve nearly the same throughput as the non-FT virtual machine
(top chart).
The latency of basic operations has increased under FT protection (bottom chart), but
remains within an acceptable user threshold of milliseconds.

Conclusion
All Fault Tolerance solutions rely on redundancy. That means a certain cost must be paid
to establish replicas and keep them in sync. These costs come in the form of CPU,
storage, and network overheads. For a variety of workloads, CPU and storage
overheads are generally modest or minimal with FT protection. The most
noticeable overhead for FT-protected virtual machines is the increase in latency for
network packets. However, the experiments performed have shown that FT-protected
workloads can achieve good application throughput despite an increase in network
latency; network latency does not dictate overall application throughput for a wide
variety of applications. On the other hand, applications that are sensitive to network
latency (such as high frequency trading or real-time workloads) will pay a higher cost
under FT protection.
VMware vSphere Fault Tolerance is a revolutionary new technology. It universally
applies the basic principles and guarantees of fault-tolerant technology to any multi-vCPU workload in a uniquely simple-to-use way. The vSphere FT solution is able to
achieve good throughput for a wide variety of applications.

Module 5: Memory
Performance, Basic
Concepts and
Troubleshooting (30
minutes)

Introduction to Memory Performance Troubleshooting
The goal of this module is to expose you to a memory performance problem in a
virtualized environment as an example. It will also guide you on how to quickly identify
performance problems by checking various performance metrics and settings.
Host memory is a limited resource. VMware vSphere incorporates sophisticated
mechanisms that maximize the use of available memory through page sharing,
resource-allocation controls, and other memory management techniques. However,
several vSphere memory over-commitment techniques only come into play when the
host is under memory pressure (in other words, when the sum of all of the VMs' virtual
memory exceeds the physical memory of the hosts running them).
This module will discuss:
Active (Memory Demand) vs. Consumed Memory Usage
Types of Swapping, when they kick in and their impact
Memory Metrics to monitor in order to detect potential memory issues
This test demonstrates Memory Demand vs. Consumed Memory in a vSphere
environment. It also demonstrates how memory overcommitment impacts host and VM
performance.
The first step is to prepare the environment for the demonstration.

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Memory Resource Control


Open a PowerCLI window
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start Memory workload


From the PowerCLI Console, type:
.\StartMemoryTest.ps1
and press Enter. This script will configure and start up two VMs, and generate a memory
workload.
NOTE: Please wait a couple of minutes, and do not proceed with the lab until you see
output as shown in the next step.

Memory activity benchmark


Two windows showing a memory performance benchmark are launched. We need these
to generate workload that we can inspect. We will return to them shortly.
The actual performance numbers will vary from environment to environment.

Select perf-worker-02a
Return to the vSphere Web Client.
1. Select perf-worker-02a

Monitor perf-worker-02a Utilization metrics


1. Select the Monitor tab.
2. Select Utilization
You can see that perf-worker-02a and perf-worker-03a virtual machines are
configured with 1.5GB of memory and are running on the ESXi host esx-01a. If you
wait for a while, the memory consumption of the virtual machines will look something
like the above screenshot. The ESXi host has 8GB of memory, so there is no memory
contention at this time.
A host determines allocations for each VM based on the number of shares allocated to
it and an estimate of its recent working set size (shown as Active Guest Memory
above):
Shares: a modified proportional-share memory allocation policy. Memory shares
entitle a virtual machine to a fraction of available physical memory.
Working set size/active guest memory: an estimate determined by
monitoring memory activity once a minute. Estimates are smoothed over several
time periods using techniques that respond rapidly to increases in working set
size and more slowly to decreases in working set size.

This approach ensures that a virtual machine from which idle memory is reclaimed can
ramp up quickly to its full share-based allocation when it starts using its memory more
actively.
By default, active memory is estimated once every 60 seconds. To modify this, adjust
the Mem.SamplePeriod advanced setting.
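For example, the current value can be inspected from PowerCLI with a read-only call (the host name is from this lab; a connected session is assumed). Changing it is rarely necessary:

# Mem.SamplePeriod is expressed in seconds; the default is 60
Get-AdvancedSetting -Entity (Get-VMHost -Name "esx-01a.corp.local") -Name "Mem.SamplePeriod"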

Select esx-01a Host


1. Select esx-01a.corp.local

View the ESX hosts memory metrics


1. Select "Monitor"
2. Select "Performance"
3. Select "Advanced"
4. Select the "Memory" view

Consumed memory on the host is around 4GB, but active memory is less than 3GB.
Notice that there is no memory contention, as the host has 8GB of memory.
In late 2014, VMware announced that ESXi will no longer have TPS (Transparent Page
Sharing) enabled by default in future releases, although TPS is still available. For more
information, see KB: http://kb.vmware.com/kb/2080735
Transparent page sharing is a method by which redundant copies of memory pages are
eliminated (deduplicated). TPS had always been enabled by default until late 2014.
However, if TPS is enabled and you are running on modern hardware-assisted memory
virtualization systems, vSphere will preferentially back guest physical pages with large
host physical pages (2MB contiguous memory region instead of 4KB for regular pages)
for better performance. vSphere will not attempt to share large physical pages because
the probability of finding two large pages that are identical is very low. If memory
pressure occurs on the host, vSphere may break the large memory pages into regular
4KB pages, which TPS will then be able to use to consolidate memory in the host.

In vSphere 6, TPS has been enhanced to support different levels of page sharing such as
intra VM sharing, inter VM sharing etc. See this article for more information:
http://kb.vmware.com/kb/2097593
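The host-level behavior described in the KB articles above is governed by the Mem.ShareForceSalting advanced setting. A quick, read-only way to check it (a sketch; it assumes a connected PowerCLI session):

# 2 (the default after the 2014 change) effectively limits page sharing to within a VM;
# 0 restores the older inter-VM sharing behavior. See KB 2097593 for the full meaning
# of each value.
Get-AdvancedSetting -Entity (Get-VMHost -Name "esx-01a.corp.local") -Name "Mem.ShareForceSalting"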

Observe Application Performance


As there is no memory pressure in the environment, the virtual machine performance is
good. The virtual machines are configured identically, and therefore their performance
numbers are very similar. The numbers will bounce up and down a bit due to the
design of the lab.

Power on the Memory Hog virtual machine


Select and Right click on perf-worker-04a.
1. Right click "perf-worker-04a"
2. Select "Power"
3. Click "Power On"
Perf-worker-04a has been configured to boot up as a VM that consumes a lot of memory,
a memory hog. The memory hog virtual machine is configured with 5.5GB of memory,
and will consume all the free memory of the host and cause memory contention.
While the memory hog powers on, keep an eye on the benchmark scores for the
memory performance on perf-worker-02a and perf-worker-03a. They will take a large dip
in performance as the memory pressure increases and vSphere has to stabilize the
environment.

Review the Resource Allocation for the virtual machines


1. Select "perf-worker-02a"
2. Select "Monitor"
3. Select "Utilization"

Now that memory pressure is occurring in the system, vSphere will begin to use
memory overcommit techniques to conserve memory use.
It may take a while for vCenter to update the memory utilization statistics, so you might
have to wait. (Try to refresh if nothing happens)
Notice that vSphere has used some memory overcommit techniques on the perf-worker
virtual machines to relieve the memory pressure. Also note that consumed memory for the
virtual machines is now lower than before we applied memory pressure. As long as the
active memory the virtual machine requires can stay in physical memory, the
application will continue to perform well.

Select esx-01a.corp.local
1. Select esx-01a.corp.local

Review the ESX host memory metrics


Review the ESX host memory metrics now that we have powered on the Memory Hog.
1. Select Monitor
2. Select Performance
3. Select Advanced
4. Select Memory

Notice that Granted and Consumed are very close to the full size of the ESX host
(8GB); Active is higher than before, but still less than Consumed. You can also see how swapping
and ballooning started when we increased the memory pressure on the host. Also notice
that Swap used is relatively low. Any active swapping is a performance concern, but
relying on this metric alone can be misleading. To more accurately tell if swapping is
affecting performance, you would need to look at the Swap in rate available from the
Chart Options screen, which we will look at next. Any non-trivial Swap in rate would
likely indicate a performance problem.

Select Chart Options


Let's investigate the Swap in rate:
Click Chart Options

Select "Swap in rate" counter


1. Scroll down to find "Swap in rate"
2. Select "Swap in rate"
3. Click "OK"

Monitor "Swap in rate" graph


Swap in rate is the highlighted purple graph. Note that it is different from the previous
chart.
You don't have to wait for the graph to progress as far as illustrated above. Just let it
run and come back to see the result later, before stopping perf-worker-04a in a later
step.
Overallocating memory tends to be fine for most applications and environments. It is
generally safe to have up to about 20% memory over-allocation, so for initial sizing, start
with no more than 20% over-allocation, and then increase or decrease it after monitoring
application performance and ensuring that memory over-allocation does not cause a
constant Swap in rate to occur. This also depends on whether or not you are using
transparent page sharing.
As you can see, there is a significant amount of memory swap-in activity occurring on the
ESXi host.
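The same counter can be pulled from PowerCLI if you prefer the command line; a hedged sketch using the standard vSphere performance counter name (a connected session is assumed):

# Last ~12 real-time samples (20-second intervals) of the host's swap-in rate, in KBps
Get-Stat -Entity (Get-VMHost -Name "esx-01a.corp.local") `
    -Stat "mem.swapinRate.average" -Realtime -MaxSamples 12 |
    Select-Object Timestamp, Value, Unit

Any sustained non-zero value here corresponds to the swap-in activity visible in the chart.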

Continue to see how that has impacted the memory performance measured.

Monitor Memory Performance under Contention


Now that we have applied memory contention to esx-01a.corp.local, the memory
performance has dropped significantly. The virtual machines are performing at roughly
the same level.
If the performance numbers are still high, wait a couple of minutes, and they
will drop significantly.
If you monitor the benchmarks for a longer period of time, you will see the performance
fluctuate. This is due to the benchmark keeping memory active in the virtual
machines. This causes waves of swap-in and swap-out, which cause the
performance numbers to shift. When ESXi has had time to optimize memory access
between the powered-on virtual machines, the application performance will start to
increase, potentially back up to the same level as before we had
any memory contention. It all depends on the level of memory contention, the level of
memory activity, and the memory overcommit techniques available.
Let's try to change the priority of the virtual machines' access to memory.

Edit perf-worker-03a
1. Select perf-worker-03a
2. Select Summary
3. Click Edit Settings...

Change Memory Shares to High


1. Select "Virtual Hardware"
2. Expand "Memory"
3. Under shares, select "High"
4. Click "OK"

Monitor Memory Performance with High Shares


Wait for a couple of minutes and see how the performance of perf-worker-03a starts to
increase.

Now that we have doubled the amount of memory shares assigned to perf-worker-03a,
the virtual machine is being prioritized over perf-worker-02a and perf-worker-04a. This
results in increased memory performance of perf-worker-03a.
Shares are a way of influencing how access to a resource is prioritized between virtual
machines, but only under resource contention. It will not increase VM performance
in an underutilized environment.
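If you prefer to make the shares change from the command line, a minimal PowerCLI sketch (the VM name is from this lab; a connected session is assumed) looks like this:

# Raise the memory shares level of perf-worker-03a from Normal to High
Get-VM -Name "perf-worker-03a" |
    Get-VMResourceConfiguration |
    Set-VMResourceConfiguration -MemSharesLevel High
# -MemSharesLevel accepts Low, Normal, High or Custom (Custom is combined with -NumMemShares)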
Let's try and remove the memory contention, but keep the high amount of shares
assigned to perf-worker-03a.

Power off perf-worker-04a


1. Right click "perf-worker-04a"
2. Select "Power"
3. Click "Power Off"

Confirm Power Off


Click Yes to confirm powering the VM off.

Observe memory performance


Now monitor the application performance for a couple of minutes. Since we powered off
the memory hog VM, we no longer have memory contention, and the memory
performance returns to the same level as before we powered on the memory hog. So
now the amount of shares assigned to perf-worker-03a is irrelevant.

Close the benchmark windows


1. Close the two benchmark windows

Conclusion and Clean-Up


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines we used and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings.

Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key takeaways


During this lab we saw how memory overcommitting affects performance, and how
vSphere can use several techniques to reduce the impact of memory overcommit. We
also touched on how it is possible to adjust whether ESXi uses TPS intra-VM, inter-VM, or
not at all, depending on how you evaluate the security aspects of TPS. Even
though the memory overcommit techniques in ESXi can compensate for some degree of
memory overcommit, it is still recommended to rightsize the configuration of a virtual
machine if possible. For inspiration on how to rightsize the resource configuration of
virtual machines, take a look at HOL-SDC-1610 Module 5. There you will use vRealize
Operations Manager to identify resource stress and do right sizing.

Conclusion
This concludes Module 5, Memory Performance, Basic Concepts and
Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out
the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)
Module 6: Memory
Performance Feature:
vNUMA with Memory Hot
Add (30 minutes)

Introduction to NUMA and vNUMA


Since 5.0, vSphere has had the vNUMA feature that presents the physical NUMA
topology to the guest operating system. Traditionally virtual machines have been
presented with a single NUMA node, regardless of the size of the virtual machine, and
regardless of the underlying hardware. Larger and larger workloads are being
virtualized, and it has become increasingly important that the guest OS and applications
can make decisions on where to execute application processes and where to place
specific application memory. ESXi is NUMA aware, and will always try to fit a VM within a
single NUMA node when possible. With the emergence of the "Monster VM" this is not
always possible.
Note that because we are working in a fully virtualized environment, we have to enforce
the NUMA architecture presented to a VM; in a real environment you would be able to see
the physical architecture. The purpose of this module is to gain an understanding of how
vNUMA works by itself and in combination with the cores per socket feature.

NUMA
Non-Uniform Memory Access (NUMA) system architecture
Each node consists of CPU cores and memory. A pCPU can access memory across NUMA
nodes, but at a performance cost: memory access time can be 30% to 100% longer.

Without vNUMA
In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes with 6
cores each. This VM is not being presented with the physical NUMA configuration and
hence the guest OS and application only sees a single NUMA node. This means that the
guest has no chance of placing processes and memory within a physical NUMA node.
We have poor memory locality.

With vNUMA
In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes with
6 cores each. This VM is being presented with the physical NUMA configuration, and
hence the guest OS and application sees two NUMA nodes. This means that the guest
can place processes and accompanying memory within a physical NUMA node when
possible.
We have good memory locality.

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

vNUMA vs. Cores per Socket


Besides the possibility of presenting a virtual NUMA architecture to a virtual machine, it
is also possible to alter the number of cores per socket for a virtual machine. This
feature controls how virtual CPUs are presented to a guest OS, essentially allowing the
guest OS to "see" multi-core CPUs, since by default VMware presents multiple singlecore CPUs.
In general, it's best to stick to the default (1 core per virtual socket), and just set the
number of virtual cores as large as the workload needs; this is because there is no
performance gain to be realized by using multiple virtual multi-core CPUs. The primary
use for this feature is for licensing, where an application may require fewer virtual
sockets. In this case, the optimal vNUMA size should be determined, and the number of
virtual sockets should be set to the same value.
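The next steps change Cores per Socket through the Web Client. For reference, the same value is stored in the VM's configuration as the cpuid.coresPerSocket parameter, so it can also be read or set from PowerCLI while the VM is powered off. A minimal sketch, assuming the lab VM name (the parameter may be absent until it has been changed from the default of 1):

# Read and set cores per socket on a powered-off VM (sketch).
$vm = Get-VM -Name "perf-worker-01a"
$vm.NumCpu                                                    # total vCPUs
Get-AdvancedSetting -Entity $vm -Name "cpuid.coresPerSocket"  # may return nothing at the default of 1
# Present the vCPUs as 2 cores per virtual socket.
New-AdvancedSetting -Entity $vm -Name "cpuid.coresPerSocket" -Value 2 -Force -Confirm:$false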

Open a PowerCLI window


Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start the vNUMA Script


From the PowerCLI Console, type:
.\StartvNUMA.ps1
press enter
The script will configure and start up a VM with four vCPUs and memory hot add
enabled.
The script will then launch a Remote Desktop session to the perf-worker-01a
VM.

Open a Command Prompt on perf-worker-01a


In the Remote Desktop window to perf-worker-01a, launch cmd.exe:
1. Click the Start button
2. Click cmd.exe
Note: Make sure you do this inside of the Remote Desktop Connection window
(as shown above).

Observe default Cores/Sockets and NUMA architecture


Type in the following command, followed by enter:
coreinfo -c -n -s
From the output you can see that the VM is presented with:
1. 4 Physical Processors (actually virtual, of course), which equals 4 cores (since it is configured with the default 1 core per virtual socket)
2. 4 Logical Processors, one per physical socket
3. 1 single NUMA node
Now let's see what impact changing the cores per socket has on this output.

Shut down perf-worker-01a


Back in the vSphere Web Client on the controlcenter:
1. Right click perf-worker-01a
2. Select Power
3. Select Shut Down Guest OS

Confirm Shut Down


Click Yes to confirm.

Edit Settings of perf-worker-01a


1. Select perf-worker-01a from the list of VMs on the left
2. Click Actions
3. Click Edit Settings...

Modify CPU configuration


1. Select the Virtual Hardware tab
2. Expand CPU
3. On the Cores per Socket drop-down, select 2
4. Click OK

Power on perf-worker-01a VM
1. Right click perf-worker-01a
2. Select Power
3. Click Power On

Start a Remote Desktop session to perf-worker-01a


Wait a minute to allow the VM to boot, then open a Remote Desktop Connection to perf-worker-01a by double-clicking the 01a shortcut on the desktop.

Open a Command Prompt on perf-worker-01a


In the perf-worker-01a window, launch a Command Prompt:
1. Click the Start button
2. Click cmd.exe
Note: Make sure you're in the Remote Desktop Connection window as shown above.

Verify Multiple Cores per Socket with coreinfo


Type in the following command and press Enter:
coreinfo -c -n -s
From the output you see that one thing has changed: the VM is now presented with 2
Logical Processors (cores) per socket. Since we still have 4 processors (cores), all
presented in a single NUMA node, this is just a matter of presentation to the guest OS,
and has no performance impact. The feature can be used in order to adhere to licensing
terms.

This is valid when vNUMA is not enabled. Let's see what happens when we enable
vNUMA with this 2 cores per socket configuration.

Shut down perf-worker-01a


Back in the vSphere Web Client on the controlcenter:
1. Right click perf-worker-01a
2. Select Power
3. Select Shut Down Guest OS

Confirm Shut Down


Click Yes to confirm.

Edit Settings of perf-worker-01a


1. Select perf-worker-01a from the list of VMs on the left
2. Click Actions
3. Click Edit Settings...

Edit Advanced Configuration Parameters


1. Select the VM Options tab
2. Expand Advanced
3. Click Edit Configuration...

Reduce threshold for enabling vNUMA


1. Click the Name column to sort the configuration parameters alphabetically
2. Locate the row numa.vcpu.min and change the value to 4 (as shown above);
the default is 9.
3. Click OK twice to save this change.

The numa.vcpu.min configuration parameter specifies the minimum number of virtual CPUs in a VM before vNUMA is enabled. The default is 9, which means that a VM must have 9 or more vCPUs before virtual NUMA is enabled.
By decreasing this value to 4, we can see what effect vNUMA has on this VM without
having to increase the number of vCPUs in our resource-constrained lab environment.
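If you prefer scripting, the same advanced parameter can be added with PowerCLI while the VM is powered off; a minimal sketch using the lab VM name:

# Lower the vNUMA threshold to 4 vCPUs for this VM (sketch; VM must be powered off).
$vm = Get-VM -Name "perf-worker-01a"
New-AdvancedSetting -Entity $vm -Name "numa.vcpu.min" -Value 4 -Force -Confirm:$false
# Verify the value that will take effect at the next power-on.
Get-AdvancedSetting -Entity $vm -Name "numa.vcpu.min"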

Power on perf-worker-01a VM
1. Right click perf-worker-01a
2. Select Power
3. Click Power On

Start a Remote Desktop session to perf-worker-01a


Wait a minute to allow the VM to boot, then open a Remote Desktop Connection to perf-worker-01a by double-clicking the 01a shortcut on the desktop.

Open a Command Prompt on perf-worker-01a


In the perf-worker-01a window, launch a Command Prompt:
1. Click the Start button
2. Click cmd.exe
Note: Make sure you're in the Remote Desktop Connection window as shown above.

Verify Multiple vNUMA Nodes


Type in the following command, followed by enter:
coreinfo -c -n -s
From the output you can tell that the VM is now presented with 2 vNUMA nodes, each
having 2 processors, which means that we have successfully enabled vNUMA for this
VM.

We saw earlier in this module that changing the cores per socket alone did not alter the
NUMA architecture presented to the VM. Now we can see that when used in combination
with vNUMA, the cores per socket configuration dictates the presented vNUMA
architecture. This means that when using the cores per socket feature on VMs with more
than 8 vCPUs (default value), the configuration dictates the vNUMA architecture
presented to the VM and therefore can have an impact on VM performance. This is
because we can force a VM to unnecessarily span multiple NUMA nodes.

Best Practices for vNUMA and Cores per Socket


In general, the following best practices should be followed regarding vNUMA and Cores per Socket:
Stick to the defaults for most cases; size vCPUs via the number of virtual cores, but leave Cores per Socket = 1.
For corner cases (e.g. licensing), find the optimal vNUMA size, and set the number of virtual sockets equal to that.
"Right-size" VMs to be multiples of the physical NUMA size; for example, on an 8 cores/node system, 8/16/24/32 vCPUs; on a 10 cores/node system, 10/20/40 vCPUs.
There are many Advanced Virtual NUMA Attributes (see the vSphere Documentation Center for a full list); here are a few:
If the VM is larger than the total physical core count (e.g. a 64 vCPU VM on a 40 core / 80 thread host), try numa.consolidate = false.
If Hyper-Threading is enabled, numa.vcpu.preferHT = true may help (KB 2003582).
If Cores per Socket is too restrictive, the vNUMA size can be manually set with numa.vcpu.maxPerMachineNode.
To enable vNUMA on a VM with 8 or fewer vCPUs, use numa.vcpu.min.

vNUMA with Memory Hot Add


In vSphere releases 5.0 through 5.5, if virtual NUMA was configured in combination with
with memory hot add, the additional memory was only allocated to NUMA node 0. In
vSphere 6, the added memory is distributed evenly across all available NUMA nodes,
providing better VM memory scalability.
Note that vNUMA is still disabled if vCPU hotplug is enabled; therefore, only enable vCPU
hotplug if you plan to use it. See this article for more information:
http://kb.vmware.com/kb/2040375
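You can quickly confirm from PowerCLI whether CPU hotplug and memory hot add are enabled on the VM; a minimal sketch using the VM's configuration object (property names from the vSphere API):

# Check the hot-add flags that affect vNUMA (sketch; VM name from this lab).
$cfg = (Get-VM -Name "perf-worker-01a").ExtensionData.Config
$cfg.CpuHotAddEnabled      # should be False, otherwise vNUMA is not presented
$cfg.MemoryHotAddEnabled   # True in this module, so hot-added memory is spread across the vNUMA nodes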

Launch NumaExplorer
perf-worker-01a should already be running and you should have an active RDP session
to it. If not, power on perf-worker-01a and launch the RDP session from the shortcut on
the desktop, as you did earlier in this module.
On perf-worker-01a, do the following:
1. Click Start
2. Click NumaExplorer

Observe vNUMA memory sizes


1. Click the Refresh button a couple of times, and monitor how the
GetNumaAvailableMemory values change slightly.
As you can see, the VM still has 2 NUMA nodes, each with 2 processor cores.
Furthermore, you can see that the available memory is evenly distributed between the
2 nodes, taking the memory consumption of running processes into consideration.

Edit Settings of perf-worker-01a


1. Select perf-worker-01a from the list of VMs on the left
2. Click Actions
3. Click Edit Settings...

Modify Memory configuration


1. Select the Virtual Hardware tab
2. Change Memory to 4096 MB
3. Click OK

Observe vNUMA memory sizes


Click back to the perf-worker-01a Remote Desktop Connection window.
1. Click the Refresh button again, and notice how the "GetNumaAvailableMemory" value has increased by 1 GB per NUMA node (2 GB added in total).

This experiment shows us that on a vNUMA enabled VM, memory hot add does in fact
distribute the additional memory evenly across vNUMA nodes.

Close Remote Desktop Connection to perf-worker-01a


1. Close the Remote Desktop Connection window, as shown above.

Conclusion and Clean-Up


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines we used and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the "VMware vSphere PowerCLI"
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
press enter
The script will now stop all running VMs and reset their settings.

Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key takeaways


During this lab we learned that for virtual machines with 8 or fewer vCPUs, virtual NUMA
is not enabled by default. In this scenario, increasing the cores per socket value from
the default of 1 (to expose multi-core CPUs to the guest) does not affect performance
since the VM fits within a physical NUMA node, regardless of its configuration. In that
case, the cores per socket setting is used only for licensing purposes.
However, when vNUMA is used, the cores per socket setting does impact the virtual
NUMA topology presented to the guest and can have a performance impact if it does not
match the physical NUMA topology. By default, vNUMA will pick the optimal topology for
you as long as you have not manually increased the cores per socket value. If it has
been changed for licensing purposes, it is important to match the physical NUMA
topology manually.
WARNING! When using the cores per socket configuration in combination with vNUMA,
you need to be careful about the changes you make. Dictating a NUMA architecture of a
VM that does not match the underlying NUMA architecture, or at least fit within the
underlying NUMA architecture, can cause performance problems for demanding
applications. However, this can also be used to manage the NUMA architecture of a VM
so that it matches the physical server's NUMA layout in a cluster with different physical
NUMA layouts. The vNUMA configuration of a VM is locked the first time the VM is
powered on and will (by default) not be altered after that. This is to provide guest OS
and application stability.
In conclusion, when working with vNUMA enabled VMs, you should ensure that the
vNUMA layout of a VM matches the physical NUMA architecture of all hosts within a
cluster.
Ideally, hosts within a cluster should be homogeneous (identical processor
architecture), so ESXi can determine the optimal vNUMA layout.
In a heterogeneous cluster (non-identical hosts), you can either make sure to initially power on a VM on the smallest host in the cluster, or use the cores per socket value to ensure that the vNUMA layout of a VM fits within the smallest physical NUMA node size.
Remember that a NUMA node consists of CPU cores and memory. So if a VM has more
memory than what will fit within a single NUMA node, and the VM has 8 or fewer vCPUs, it
may make sense to enable vNUMA so that the guest OS can better place vCPUs and
memory.
There has been some confusion around the performance impact of setting the cores per
socket of a VM and how vNUMA actually works. By completing this module, we have
shown that:
1. Setting the cores per socket on a VM without vNUMA has no performance impact,
and should only be used to comply with license restrictions.
2. Setting the cores per socket of a VM with vNUMA enabled can have a
performance impact and can be used to force a particular vNUMA architecture.
Use with caution!
3. vNUMA is an important feature to ensure optimal performance of larger VMs (>8
vCPUs by default)
If you want to know more about the vNUMA feature of vSphere, see these articles:
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf (as of June 2015, this paper has not yet been updated with the vSphere 6 additions)
http://blogs.vmware.com/vsphere/tag/vnuma

Conclusion
This concludes Module 6, Memory Performance Feature: vNUMA with Memory
Hot Add. We hope you have enjoyed taking it. Please do not forget to fill out the survey
when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 7: Storage Performance and Troubleshooting (30 minutes)

Introduction to Storage Performance Troubleshooting
Approximately 90% of performance problems in a vSphere deployment are typically
related to storage in some way. There have been significant advances in storage
technologies over the past couple of years to help improve storage performance. There
are a few things that you should be aware of:
In a well-architected environment, there is no difference in performance between
storage fabric technologies. A well-designed NFS, iSCSI or FC implementation will work
just about the same as the others.
Despite advances in the interconnects, the performance limit is still hit at the media itself; in fact, 90% of storage performance cases seen by GSS (Global Support Services, VMware support) that are not configuration related are media related. Some things to remember:
Payload (throughput) is fundamentally different from IOPS (cmd/s)
IOPS performance is always lower than throughput
A good rule of thumb on the total number of IOPS any given disk will provide:
7.2k rpm: ~80 IOPS
10k rpm: ~120 IOPS
15k rpm: ~150 IOPS
EFD/SSD: 20k-100k IOPS (max real world)
So, if you want to know how many IOPS you can achieve with a given number of disks:
Total Raw IOPS = Disk IOPS * Number of disks
Functional IOPS = (Raw IOPS * Write%) / (RAID Penalty) + (Raw IOPS * Read%)
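As a rough worked example (the numbers are illustrative only): eight 15k rpm disks provide about 8 * 150 = 1,200 raw IOPS. Assuming a RAID 5 write penalty of 4 and a 30% write / 70% read mix:
Functional IOPS = (1,200 * 0.30) / 4 + (1,200 * 0.70) = 90 + 840 = 930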
This test demonstrates some methods to identify poor storage performance, and how to
resolve it using VMware Storage DRS for workload balancing. The first step is to prepare
the environment for the demonstration.

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Storage I/O Contention


Open a PowerCLI window
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start the Storage Workloads


From the PowerCLI Console, type:
.\StartStorageTest.ps1
press enter
The script configures and starts up the virtual machines, and launches a storage
workload using Iometer.
The script may take up to 5 minutes to complete. While the script runs, spend
a few minutes on reading through the next step, to gain understanding on
storage latencies.

Disk I/O Latency


When we think about storage performance problems, the top issue is generally latency,
so we need to look at the storage stack and understand what layers there are in the
storage stack and where latency can build up.
At the topmost layer is the application running in the guest operating system. That is ultimately the place where we care most about latency: it is the total amount of latency the application sees, and it includes the latencies of the whole storage stack, including the guest OS, the VMkernel virtualization layers, and the physical hardware. ESXi can't see application latency, because that is a layer above the ESXi virtualization layer.
From ESXi we see three main latencies that are reported in esxtop and vCenter.
The topmost is GAVG, or Guest Average latency, which is the total amount of latency that ESXi can detect.
That is not the total amount of latency the application will see. In fact, if you compare GAVG (the total amount of latency ESXi is seeing) with the actual latency the application is seeing, you can tell how much latency the guest OS is adding to the storage stack, which could tell you if the guest OS is configured incorrectly or is causing a performance problem. For example, if ESXi is reporting a GAVG of 10 ms, but the application or perfmon in the guest OS is reporting a storage latency of 30 ms, that means that 20 ms of latency is somehow building up in the guest OS layer, and you should focus your debugging on the guest OS's storage configuration.
GAVG is made up of two major components: KAVG and DAVG. DAVG is basically how much time is spent in the device, from the driver and HBA down to the storage array, and KAVG is how much time is spent in the ESXi kernel (so how much overhead the kernel is adding). KAVG is actually a derived metric; ESXi does not measure it directly but calculates it with the following formula:
KAVG = Total Latency (GAVG) - DAVG
The VMkernel is very efficient in processing I/O, so there really should not be any significant time that an I/O waits in the kernel; KAVG should be close to 0 in well-configured, well-running environments. When KAVG is not 0, it most likely means that the I/O is stuck in a kernel queue inside the VMkernel. So, the vast majority of the time, KAVG will equal QAVG, or Queue Average latency (the amount of time an I/O is stuck in a queue waiting for a slot in a lower queue to free up so it can move down the stack).
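If you would rather pull these latencies programmatically than read them from esxtop or the performance charts, the standard host disk counters can be retrieved with PowerCLI Get-Stat. A minimal sketch (counter names are the common vSphere disk counters; adjust the host name as needed):

# Pull DAVG-, KAVG- and GAVG-style latencies for a host's devices (sketch).
$esx   = Get-VMHost -Name "esx-01a.corp.local"
$stats = "disk.deviceLatency.average","disk.kernelLatency.average","disk.totalLatency.average"
Get-Stat -Entity $esx -Stat $stats -Realtime -MaxSamples 6 |
    Sort-Object Timestamp |
    Format-Table Timestamp, MetricId, Instance, Value -AutoSize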

View the Storage Performance as reported by Iometer


When the storage script has completed, you should see two Iometer windows, and two
storage workloads should be running.
The storage workload is started on both perf-worker-02a and perf-worker-03a. It will take a few minutes for the workloads to settle and for the performance numbers to become almost identical for the two VMs. The disks these virtual machines are testing share the same datastore, and that datastore is saturated.
The performance can be seen in the Iometer GUI as:
Latency (Average I/O Response Time): around 6 ms
Low IOPS (Total I/Os per Second): around 160 IOPS
Low throughput (Total MBs per Second): around 2.7 MBps
Disclaimer: Because we run this lab in a fully virtualized environment, where
the ESXi host servers also run in virtual machines, we cannot assign physical
disk spindles to individual datastores. Therefore the performance numbers on
these screenshots will vary depending on the actual load in the cloud
environment the lab is running in.


Select perf-worker-03a
1. Select "perf-worker-03a"

View Storage Performance Metrics in vCenter


1. Select "Monitor"
2. Select "Performance"
3. Select "Advanced"
4. Click "Chart Options"

Select Performance Metrics


1. Select "Virtual disk"
2. Select only "scsi0:1"
3. Click "None" under "Select counters for this chart"
4. Select "Write latency" and "Write rate"
5. Click "OK"

The disk that Iometer uses for generating workload is scsi0:1, or sdb inside the guest.

View Storage Performance Metrics in vCenter


Repeat the configuration of the performance chart for perf-worker-02a and
verify that performance is almost identical to perf-worker-03a.
Guidance: device latencies greater than 20 ms may impact the performance of your applications.
Due to the way we create a private datastore for this test, we actually have pretty good
low latency numbers. Scsi0:1 is located on an iSCSI datastore based on a RAMdisk on
perf-worker-04a (DatastoreA), running on the same ESXi host as perf-worker-03a. Hence,
latencies are pretty low for a fully virtualized environment.

vSphere provides several storage features to help manage and control storage performance:
Storage I/O Control
Storage IOPS limits
Storage DRS
Disk shares
Let's configure Storage DRS to solve this contention problem.

Storage Cluster and Storage DRS


A datastore cluster is a collection of datastores with shared resources and a shared
management interface. Datastore clusters are to datastores what clusters are to hosts.
When you create a datastore cluster, you can use vSphere Storage DRS to manage
storage resources.
When you add a datastore to a datastore cluster, the datastore's resources become part
of the datastore cluster's resources. As with clusters of hosts, you use datastore clusters
to aggregate storage resources, which enables you to support resource allocation
policies at the datastore cluster level. The following resource management capabilities
are also available per datastore cluster.
Space utilization load balancing: You can set a threshold for space use. When space use on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to balance space use across the datastore cluster.
I/O latency load balancing: You can set an I/O latency threshold for bottleneck avoidance. When I/O latency on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to help alleviate high I/O load. Remember to consult your storage vendor to get their recommendation on using I/O latency load balancing.
Anti-affinity rules: You can create anti-affinity rules for virtual machine disks. For example, the virtual disks of a certain virtual machine must be kept on different datastores. By default, all virtual disks for a virtual machine are placed on the same datastore.
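The next steps create the datastore cluster in the Web Client; the same result can be scripted with PowerCLI. A minimal sketch, assuming the object names used in this lab (cmdlet parameter names may differ slightly between PowerCLI versions):

# Create a datastore cluster, add the lab datastores and set the space threshold (sketch).
$dc  = Get-Datacenter -Name "Datacenter Site A"
$dsc = New-DatastoreCluster -Name "DatastoreCluster" -Location $dc
Move-Datastore -Datastore (Get-Datastore "DatastoreA","DatastoreB") -Destination $dsc
Set-DatastoreCluster -DatastoreCluster $dsc -SdrsAutomationLevel Manual `
    -SpaceUtilizationThresholdPercent 50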

Change to the Datastore view


1. Change to the datastore view
2. Expand "vcsa-01a.corp.local" and "Datacenter Site A"

Create a Datastore Cluster


1. Right Click on "Datacenter Site A"
2. Select "Storage"
3. Click "New Datastore Cluster..."

Create a Datastore Cluster ( part 1 of 6 )


For this lab, we will accept most of the default settings.
1. Type "DatastoreCluster" as the name of the new datastore cluster.
2. Click Next

Create a Datastore Cluster (part 2 of 6 )


1. Click "Next"

Create a Datastore Cluster ( part 3 of 6 )


1. Change the "Utilized Space" threshold to "50"
2. Click "Next"
Since the HOL is a nested virtual environment, it is difficult to demonstrate high latency
in a reliable manner. Therefore we do not use I/O latency to demonstrate load balancing.
The default is to check for storage cluster imbalances every 8 hours, but it can be
changed to 60 minutes as a minimum.

Create a Datastore Cluster ( part 4 of 6 )


1. Select "Clusters"
2. Select "Cluster Site A"
3. Click "Next"

Create a Datastore Cluster ( part 5 of 6 )


1. Select "DatastoreA" and "DatastoreB"
2. Click "Next"

Create a Datastore Cluster ( part 6 of 6 )


1. Click "Finish"

Run Storage DRS


Take a note of the name of the virtual machine that Storage DRS wants to
migrate.
1. Select "DatastoreCluster"
2. Select the "Monitor" tab
3. Select "Storage DRS"
4. Click "Run Storage DRS Now"
5. Click "Apply Recommendations"

Notice that SDRS recommends moving one of the workloads from DatastoreA to DatastoreB. It is making the recommendation based on space. SDRS makes storage moves based on performance only after it has collected performance data for more than 8 hours. Since the workloads started just recently, SDRS will not make a recommendation to balance the workloads based on performance until it has collected more data.

Storage DRS in vSphere 6.0


1. Select "Manage"
2. Select "Settings"
3. Select "Storage DRS"

4. Investigate the different settings you can configure for Storage DRS
A number of enhancements have been made to Storage DRS in vSphere 6.0, in order to remove some of its previous limitations.
Storage DRS has improved interoperability with deduplicated datastores: it can identify whether datastores are backed by the same deduplication pool or not, and hence avoid moving a VM to a datastore using a different deduplication pool.
Storage DRS has improved interoperability with thin provisioned datastores: it can identify whether thin provisioned datastores are backed by the same storage pool or not, and hence avoid moving a VM between datastores using the same storage pool.
Storage DRS has improved interoperability with array-based auto-tiering: it can identify datastores with auto-tiering and treat them differently, according to the type and frequency of auto-tiering.
Common to all these improvements is that they require VASA 2.0, which requires that the storage vendor has an updated storage provider.

Select the VM that was migrated


1. Return to the "Hosts and Clusters" view
2. Select the virtual machine that was migrated using Storage DRS, in this
case perf-worker-03a

Increased throughput and lower latency


1. Select the "Monitor" tab
2. Select "Performance"
3. Select "Advanced"
Now you should see the performance chart you created earlier in this module.
Notice how the throughput has increased and how the latency is lower (green arrows) than it was when both VMs shared the same datastore.

Return to the Iometer GUIs to review the performance


Return to the Iometer workers, and see how they also report increased performance and lower latencies.
It will take a while, maybe 10 minutes, for Iometer to show these higher numbers. This is due to the way the storage performance is throttled in this lab. If you want to take a shortcut, stop the two workers, wait for 30 seconds, and then restart them (see the arrows in the picture). The workload will then spike and settle at the higher performance level within a couple of minutes.

Stop the Iometer workloads


Stop the Workloads
1. Press the "Stop Sign" button on the Iometer GUI

2. Close the GUI by pressing the X


3. Press the "Stop Sign" button on the Iometer GUI
4. Close the GUI by pressing the X

Conclusion and Clean-Up


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines we used and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
press enter
The script will now stop all running VMs and reset their settings.

HOL-SDC-1604

Page 226

HOL-SDC-1604

Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key takeaways


During this lab we saw the importance of sizing your storage correctly, with respect to both space and performance. We also saw that when two storage-intensive sequential workloads share the same spindles, performance can be greatly impacted. If possible, try to keep workloads separated: keep sequential workloads separate (backed by different spindles/LUNs) from random workloads.
In general, we will aim to keep storage latencies under 20ms, lower if possible, and
monitor for frequent latency spikes of 60ms or more, which would be a performance
concern and something to investigate further.
Guidance: From a vSphere perspective, for most applications, the use of one large
datastore vs. several small datastores tends not to have a performance impact.
However, the use of one large LUN vs. several LUNs is storage array dependent and
most storage arrays perform better in a multi LUN configuration than a single large LUN
configuration.
Guidance: Follow your storage vendor's best practices and sizing guidelines to properly
size and tune your storage for your virtualized environment.

Conclusion
This concludes Module 7, Storage Performance and Troubleshooting. We hope
you have enjoyed taking it. Please do not forget to fill out the survey when you are
finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.
Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)

Introduction to Network Performance


As defined by Wikipedia, network performance refers to measures of service quality of
a telecommunications product as seen by the customer.
These metrics are considered important:
Bandwidth: commonly measured in bits/second, this is the maximum rate that
information can be transferred
Throughput: the actual rate that information is transferred
Latency: the delay between the sender sending the information and the receiver decoding it; this is mainly a function of the signal's travel time and the processing time at any nodes the information traverses
Jitter: variation in the time of arrival at the receiver of the information
Error rate: the number of corrupted bits expressed as a percentage or fraction
of the total sent
In the following module, we will show you how to monitor and troubleshoot some
network-related issues, so that you can troubleshoot similar issues that may exist in
your own environment.

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Show network contention


Network contention occurs when multiple VMs are fighting for the same resources.
In the VMware Hands-on Labs, it's not possible to use network resources in a way that simulates the real world.
Therefore this module will focus on creating network load and showing you where to look when you suspect network problems in your own environment.
You might see different results on your screen, depending on the load of the environment the lab is running in.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Start network load


Start the lab VMs and start generating network load by typing
.\StartNetTest.ps1
in the PowerCLI window, and press enter.

Select VM
1. Select "perf-worker-06a"
2. Select "Monitor" tab
3. Select "Performance" tab

4. Select "Advanced"
5. Click "Chart Options"

Select Chart options


1. Select "Network"
2. Click "None"
3. Select "perf-worker-06a"
4. Select the Receive and Transmit packets dropped counters
5. Click "OK"

Note: If you are unable to select all the metrics shown here, wait until the script has started the VMs and open the "Chart Options" dialog again.

Monitor chart
Depending on how long it has taken you to get here, the network load might already be done. You should still be able to see the load that was running in the charts. Note that in the picture above, we ran the network test twice for illustration purposes.
1. Here you can see the graphical network load on perf-worker-06a
2. Here you can monitor the load of the VM and see the actual numbers for the data transmitted
Some good things to look for are:
Usage:
If this number is too low compared to what you expect, it might be because of problems in the network or in the VM.
Receive and Transmit packets dropped:
This is a good indication of contention. It means that packets are dropped and might need to be re-transmitted, which could be caused by contention or problems in the network.
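The same dropped-packet counters can also be pulled with PowerCLI instead of the performance charts; a minimal sketch using the standard network counter names and the lab VM:

# Check received/transmitted packet drops for the VM over recent real-time samples (sketch).
$vm = Get-VM -Name "perf-worker-06a"
Get-Stat -Entity $vm -Stat "net.droppedRx.summation","net.droppedTx.summation" -Realtime -MaxSamples 10 |
    Sort-Object Timestamp |
    Format-Table Timestamp, MetricId, Instance, Value -AutoSize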
Let's go to the host and see if this is a VM or a host problem.

Select Host
1. Select "esx-01a.corp.local"
2. Select "Monitor" tab
3. Select "Performance" tab
4. Select "Advanced"
5. Select "Network" from the drop-down menu
6. Click "Chart Options"

Select Chart options


1. Click "None"
2. Select "esx-01a.corp.local"
3. Select "Receive and Transmit packets dropped"
4. Click "OK"

Monitor Chart
1. See if there are any dropped packets on the host
In this example, there are no packets dropped on the host, which indicates that this is a VM problem.
Note that you might see different results in the lab, due to the nature of the Hands-on Labs.

Conclusion and Cleanup


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines we used and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
press enter
The script will now stop all running VMs and reset their settings. You can then move on
to another module.

Close PowerCLI window


Close the PowerCLI Window
You can now move on to another module.

Key takeaways


During this lab we saw how to diagnose networking problems in VMs and hosts, using VMware's built-in monitoring tools in vCenter.
There are many other ways to troubleshoot performance:
If you want real-time performance data, esxtop is a great tool for that; it's covered in Module 10.
If you want long-term performance statistics, vRealize Operations could be a good tool. This is covered later in Module 12.
If you want to know more about performance troubleshooting, continue with the next
modules, or see this article:
Troubleshooting network performance issues in a vSphere environment
http://kb.vmware.com/kb/1004087

Conclusion
This concludes Module 8, Network Performance, Basic Concepts and
Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the
survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)

Introduction to Network IO Control


The Network I/O Control (NIOC) feature in VMware vSphere has been enhanced in
vSphere 6.0 to support a number of exciting new features such as bandwidth
reservations.
The full list of features can be seen below.
Bandwidth reservations for classes of traffic: You can specify the minimum
bandwidth that must be reserved for a class of traffic. This guarantees that the
bandwidth to the same class of traffic never falls below the specified threshold.
As an example, if VM traffic is dedicated 5 Gbps of bandwidth, then the combined
VM traffic is always guaranteed 5 Gbps of bandwidth, irrespective of traffic from
other classes of service such as VMware vSphere vMotion, NFS, VMware
vSphere Fault Tolerance (FT), and VMware Virtual SAN.
Bandwidth reservations for VMs: NetIOC also allows the ability to provide
bandwidth reservations to each VM virtual adapter (vNIC), thus providing the
ability to provide dedicated bandwidth reservations at a per VM granularity.
NetIOC also allows you to create abstract network resource pools that can be
attributed to a port group of a distributed virtual switch (DVS). Bandwidth
reserved for a resource pool is available only to VM vNICs that are part of the port
group associated with the resource pool.
Load balancing: This feature allows VMware vSphere Distributed Resource
Scheduling (DRS) to migrate VMs within a cluster of vSphere hosts to
accommodate bandwidth reservations assigned to VM ports. This powerful
feature allows you to assign bandwidth reservations to VMs without worrying
about hitting the reservation limit in a single host of the cluster.
The above features are in addition to NetIOC features already available in vSphere 5,
such as:

Resource isolation through resource pools
Distributing bandwidth with fair shares
Bandwidth limits
Load-based teaming policies

Architecture
An overview of the architecture of NIOC

Show Network IO Control


Due to the nature of the Hands-on Labs, it's not possible to show Network IO Control live. Therefore we have created an offline demo that you can walk through to see the features.
Click here to view an interactive demo of Network IO Control. The demo will open in a new browser tab or window. When you have completed the offline demo, simply click the "Return to the Lab" link in the top right-hand corner of the browser.
Be aware that the lab continues to run in the background. If it takes too long to complete the offline demo, the lab may go into standby mode, and you will have to resume it after completing the module.

Conclusion and Cleanup


Clean up procedure
Since this is an interactive demo, there is no cleanup necessary in this module.
If your lab was suspended during the interactive demo, please resume it now.

Key takeaways


During this lab we saw that NIOC can be used to reserve bandwidth for certain types of VMs and workloads.
If you want to know more about NIOC, see these articles:
Performance Evaluation of Network I/O Control in VMware vSphere 6
http://www.vmware.com/files/pdf/techpaper/Network-IOC-vSphere6-PerformanceEvaluation.pdf
Youtube video explaining vSphere Network I/O Control, Version 3
https://www.youtube.com/watch?v=IvczUp6d8ZY

Conclusion
This concludes Module 9, Network Performance Feature: Network IO Control
with Reservations. We hope you have enjoyed taking it. Please do not forget to fill out
the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 10: Performance Monitoring Tool: esxtop CLI introduction (30 minutes)

Introduction to esxtop
There are several tools to monitor and diagnose performance in vSphere environments.
It is best to use esxtop to diagnose and further investigate performance issues that
have already been identified through another tool or method. esxtop is not a tool
designed for monitoring performance over the long term, but is great for deep
investigation or monitoring a specific issue or VM over a defined period of time.
In this lab, which should take about 30 minutes, we will use esxtop to dive into performance troubleshooting of CPU, memory, storage and network. The goal of this module is to expose you to the different views in esxtop and to present you with a different load in each view. This is not meant to be a deep dive into esxtop, but to get you comfortable with this tool so that you can use it in your own environment.
To learn more about each metric in esxtop, and what they mean, we recommend that you look at the links at the end of this module.
In the next module, we will look at the ESXtopNGC Plugin, which displays host-level statistics in new and more powerful ways by tapping into the GUI capabilities of the vSphere Web Client.
For day-to-day performance monitoring of an entire vSphere environment, vRealize
Operations Manager (vROPs) is powerful tool that can be used to monitor your entire
virtual infrastructure. It incorporates high-level dashboard views and built in intelligence
to analyze the data and identify possible problems. Module 12 of this lab shows you
some basic functions of vROPs. We also recommend that you look at the other vROPs
lab when you are finished with this one, for better understanding of day-to-day
monitoring.

For users with non-US Keyboards


If you are using a device with non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
actions to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell
prompt.

Resetting VMs to Restart Module


From the PowerCLI prompt, type:
.\StopLabVMs.ps1
and press Enter.
The script will stop all running VMs and reset their settings, and you can restart the
module.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere Web Client should be the default home page.
Check the Use Windows session authentication checkbox.
If, for some reason, that does not work, uncheck the box and use these credentials:
User name: CORP\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the Refresh icon in the top of the
vSphere Web Client.

Select Hosts and Clusters


Click on the Hosts and Clusters icon

Show esxtop CPU Features


Esxtop can be used to diagnose performance issues involving almost any aspect of
performance at both the host and virtual machine level. This section will step through
how to view CPU performance, using esxtop in interactive mode.

Open a PowerCLI window


Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Start CPU load on VMs


Type
.\StartCPUTest2.ps1
and press Enter. The lab VMs will now start.

Open PuTTY
Click the PuTTY icon on the taskbar

SSH to esx-01a
1. Select host esx-01a.corp.local
2. Click Open

Start esxtop
1. From the ESXi shell, type
esxtop
and press Enter.
2. Click the Maximize icon so we can see the maximum amount of information.

Select the CPU view


If you just started esxtop, you are on the CPU view by default.
To be sure, press "c" to switch to the CPU view.

Filter the fields displayed


Type
f
To see the list of available fields (counters).
Since we don't have a lot of screen space, let's remove the ID and Group Id counters.
Do this by typing the following letters (NOTE: make sure these are capitalized, as these
are case sensitive!)
A
B
Press Enter

Filter only VMs


This screen shows performance counters for both virtual machines and ESXi host
processes.
To see only values for virtual machines
Press (capital)
V

Monitor VM load
Monitor the load on the 2 worker VMs: perf-worker-01a and perf-worker-01b.
They should both be running at (or near) 100% guest CPU utilization. If not, wait a moment and let the CPU workload start up.
One important metric to monitor is %RDY (CPU Ready). This is the percentage of time a world is ready to run but is waiting for the CPU scheduler to give it physical CPU time. The metric can go up to 100% per vCPU, which means that with 2 vCPUs it has a maximum value of 200%. A good guideline is to keep this value below 5% per vCPU, but the acceptable level will always depend on the application.
Look at the worker VMs to see if they go above the 5% per vCPU threshold. To force esxtop to refresh immediately, press the Space bar.
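
If you prefer to check CPU Ready from PowerCLI instead of the interactive esxtop view, the same information is available as the cpu.ready.summation statistic, reported in milliseconds per sampling interval. The snippet below is a minimal sketch, not part of the lab scripts; it assumes the real-time statistics interval of 20 seconds and converts the value to the per-vCPU percentage discussed above.

# CPU Ready per vCPU for the two worker VMs, from the real-time (20-second) samples.
# %RDY per vCPU = ready time in ms / (20,000 ms x number of vCPUs) x 100
# The Instance -eq "" filter keeps the aggregate sample instead of the per-vCPU instances.
foreach ($vm in (Get-VM "perf-worker-01a", "perf-worker-01b")) {
    Get-Stat -Entity $vm -Stat "cpu.ready.summation" -Realtime -MaxSamples 3 |
        Where-Object { $_.Instance -eq "" } |
        Select-Object Timestamp,
            @{ N = "VM"; E = { $vm.Name } },
            @{ N = "ReadyPctPerVCpu"; E = { [math]::Round($_.Value / (20000 * $vm.NumCpu) * 100, 2) } }
}

The same guideline applies here: values consistently above roughly 5% per vCPU are worth investigating.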

Edit Settings of perf-worker-01a


Let's see how perf-worker-01a is configured:
1. Click on the perf-worker-01a virtual machine
2. Click Actions
3. Click Edit Settings

Add a vCPU to perf-worker-01a


1. Select 2 vCPUs
2. Click "OK"

Edit Settings of perf-worker-01b


Let's add a virtual CPU to perf-worker-01b to improve performance.
1. Click on the perf-worker-01b virtual machine
2. Click Actions
3. Click Edit Settings

Add a vCPU to perf-worker-01b


1. Select 2 vCPUs
2. Click "OK"
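
For reference, the two Edit Settings changes above can also be scripted from PowerCLI. This is a minimal sketch rather than part of the lab automation; it assumes the VMs accept a CPU change in their current power state (CPU Hot Add enabled, as the GUI workflow above implies, or the VMs powered off).

# Set both worker VMs to 2 vCPUs, the same change made through Edit Settings above
Get-VM "perf-worker-01a", "perf-worker-01b" | Set-VM -NumCpu 2 -Confirm:$false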

Monitor %USED and %RDY


Return to the PuTTY (esxtop) window.
Now that we have added an additional vCPU to each virtual machine, you should see results like the screenshot above:
As expected, the vCPU count has increased from 2 to 4.

The %USED is still only around 100, which means that the CPU benchmark is still
only using one vCPU per virtual machine.
%IDLE is now around 100, which means that one vCPU is idle.
%RDY has increased, which means that even if the additional vCPU is not being
used yet, it causes some additional CPU ready time. This is due to the additional
overhead of scheduling SMP virtual machines. This is also why right-sizing your
virtual machines is important, if you want to optimize resource consumption.

Monitor %USED and %RDY (continued)


After a few minutes, the CPU benchmark will start to use the additional vCPUs and
%RDY will increase even more. This is due to CPU contention and SMP scheduling
(increased %CSTP) on the system. The ESXi host now has to schedule 4 vCPUs across the two active virtual
machines, each attempting to run 2 vCPUs at 100%, and they are fighting for resources.
Remember that the ESXi host itself also requires some CPU resources to run, which adds to the
CPU contention.

Show esxtop memory features


Esxtop can be used to diagnose performance issues involving almost any aspect of
performance, at both the host and virtual machine levels. This section will
step through how to view memory performance using esxtop in interactive mode.

Start load on VMs


In the PowerCLI window type
.\StartMemoryTest.ps1
and press Enter to start the memory load.
You can continue to the next step while the script is running, but please don't close any
windows, since that will stop the memory load.

Select the Memory view


In the PuTTY window type
m
To see the memory view

Select correct fields


Type
f
To see the list of available counters.
Since we don't have much screen space, we will remove the 2 counters ID and Group Id.
Do this by pressing (capital letters)
B
H
J

Press enter to return to the esxtop screen

See only VMs


This screen shows performance counters for both virtual machines and ESXi host
processes.
To see only values for virtual machines
Press (capital)
V

Monitor memory load with no contention


When the load on the worker VMs begins, you should be able to see them at the top of
the esxtop window.
Some good metrics to look at are:
MCTL :
Is the balloon driver installed? If not, then it's a good idea to fix that first.
MCTLSZ :
Shows how inflated the balloon is, i.e. how much memory has been taken back from the
guest operating system. This should be 0.

SWCUR :
Shows how much memory the VM currently has swapped out. This should be 0, but a non-zero
value can be acceptable if the next two counters are 0, since that indicates swapping
happened in the past but is not active now.
SWR/S :
Shows the rate of reads from the swap file.
SWW/S :
Shows the rate of writes to the swap file.
Depending on the lab, all counters should be ok. But due to the nature of the nested lab,
it's hard to predict exactly what you will see, so look around and check whether everything looks fine.
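
The balloon and swap counters shown in esxtop also have vCenter equivalents that can be queried from PowerCLI. The snippet below is a minimal sketch, not part of the lab scripts; it assumes the standard statistic names mem.vmmemctl.average (balloon size) and mem.swapped.average (swapped memory), both reported in KB.

# Current balloon and swap usage for the running memory workers, converted from KB to MB
$memStats = "mem.vmmemctl.average", "mem.swapped.average"
foreach ($vm in (Get-VM "perf-worker-02a", "perf-worker-03a")) {
    Get-Stat -Entity $vm -Stat $memStats -Realtime -MaxSamples 1 |
        Select-Object @{ N = "VM"; E = { $vm.Name } },
            MetricId,
            @{ N = "ValueMB"; E = { [math]::Round($_.Value / 1024, 1) } }
}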

Power on perf-worker-04a
1. Right Click "perf-worker-04a"
2. Select "Power"
3. Click "Power On"

Monitor memory load under contention


Now that we have created memory contention on the ESXi host, we can see that:
perf-worker-04a has no VMware Tools (and no balloon driver) installed, and therefore
doesn't have a ballooning target.
perf-worker-02a and 03a are ballooning around 400 MB each.
perf-worker-02a, 03a and 04a are swapping to disk.

Stop load on worker


1. Stop the load on the workers by closing the 2 VM stat collector windows that appeared
after you started the load script.

Show esxtop storage features


Esxtop can be used to diagnose performance issues involving almost any aspect of
performance, at both the host and virtual machine levels. This section will
step through how to view storage performance using esxtop in interactive mode.

Start lab
In the PowerCLI window type
.\StartStorageTest.ps1
and press Enter to start the lab.
The lab will take about 5 minutes to prepare. Feel free to continue with the other steps
while the script finishes.
After you start the script, be sure that you don't close any windows that appear.

Different views
When looking at storage in esxtop, you have multiple options to choose from.
Esxtop shows the storage statistics in three different screens:
adapter screen (d)
device screen (u)
VM screen (v)
plus a vSAN screen (x).

We will focus on the VM screen in this module.


In the PuTTY window type (lower case)
v
To see the storage VM view.

Select correct fields


Type
f
To see the list of available counters.
In this case, all counters are already selected except the vSCSI ID.
Since we have enough room for all counters, we will add this one too by pressing (capital
letter)
A
Press Enter when finished

Start load on VMs


The StartStorageTest.ps1 script that we executed at the beginning of this lab should be
finished now, and you should have 2 IOmeter windows on your desktop, looking like this.
If not, run the
StartStorageTest.ps1
script again and wait for it to finish.

Monitor VM load
You have 4 running VMs in the lab.
2 of them are running IOmeter workloads, and the other 2 are iSCSI storage targets using
a RAM disk. Because they use a RAM disk as the storage target, they do not generate
any physical disk I/O.
The metrics to look for here are:
CMDS/S :
This is the total number of commands per second and includes IOPS (Input/Output
Operations Per Second) and other SCSI commands such as SCSI reservations, locks,
vendor string requests, unit attention commands, etc. being sent to or coming from the
device or virtual machine being monitored.
In most cases, CMDS/s = IOPS, unless there are a lot of metadata operations (such as
SCSI reservations).
LAT/rd and LAT/wr :
Indicates the average response time of read and write I/O, as seen by the VM.
In this case, you should see high CMDS/s values on the worker VMs that are currently
running the IOmeter load (perf-worker-02a and 03a), indicating that we are generating a lot of
I/O, and a high value in LAT/wr, since we are only doing writes.
The numbers on your screen can differ, due to the nature of the Hands-on Labs.
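
If you want to capture the same VM-level picture outside of esxtop, vCenter exposes per-virtual-disk statistics that can be pulled with PowerCLI. The snippet below is a minimal sketch, not part of the lab scripts; it assumes the statistic names virtualDisk.numberWriteAveraged.average (write IOPS) and virtualDisk.totalWriteLatency.average (write latency in ms), and exact names can vary slightly between vSphere versions.

# Write IOPS and write latency per virtual disk for the two IOmeter workers
$diskStats = "virtualDisk.numberWriteAveraged.average", "virtualDisk.totalWriteLatency.average"
foreach ($vm in (Get-VM "perf-worker-02a", "perf-worker-03a")) {
    Get-Stat -Entity $vm -Stat $diskStats -Realtime -MaxSamples 1 |
        Select-Object @{ N = "VM"; E = { $vm.Name } }, MetricId, Instance, Value, Unit
}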

Device or Kernel latency


Press
d
To go to the adapter view.
Here you can see that the storage workload is on vmhba33, which is the software
iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency). DAVG should
be below 25 ms, and KAVG, the latency caused by the kernel, should be very low, and always
below 2 ms.
In this example the latencies are within acceptable values.
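
To keep an eye on host-level latency without leaving esxtop open, the host also advertises a combined latency counter that can be queried from PowerCLI. This is a minimal sketch, not part of the lab scripts; it assumes the statistic name disk.maxTotalLatency.latest (the highest latency observed across all disks on the host, in milliseconds).

# Worst-case disk latency reported by the host over the last few real-time samples (in ms)
Get-Stat -Entity (Get-VMHost "esx-01a.corp.local") -Stat "disk.maxTotalLatency.latest" -Realtime -MaxSamples 5 |
    Select-Object Timestamp, Value, Unit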

Stop load on worker


Close BOTH IOmeter workers:
1. When finished, stop the IOmeter workloads by clicking "STOP"
2. Close each window by selecting the X in the top right corner.

Show esxtop Network features


Esxtop can be used to diagnose performance issues involving almost any aspect of
performance, at both the host and virtual machine levels. This section will
step through how to view network performance using esxtop in interactive mode.

Start Lab VMs


In the PowerCLI window type
.\StartNetTest.ps1
and press Enter.
Continue with the next steps while the script runs; it will take a few minutes.

Select the network view


In the PuTTY window type
n
to see the networking view

Select correct fields


Type
f
To see the list of available counters.
Since we don't have much screen space, we will remove the 2 counters PORT-ID and DNAME.
Do this by pressing (capital letters)
A
F
Press enter when finished.

Monitor load
Monitor the metrics.
Note that the results on your screen might differ, due to the load of the
environment where the Hands-on Labs are running.

The screen updates automatically, but you can force a refresh by pressing
space
The metrics to watch for are:
%DRPTX and %DRPRX :
The percentage of transmitted and received packets that were dropped.
If this number goes up, it might be an indication of high network utilization.
Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and
then waits for 2 minutes before running a network load for 5 minutes.
Depending on how fast you got to this step, you might not see any load if it
took you more than seven minutes.
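
The dropped-packet counters can also be checked per VM from PowerCLI. The snippet below is a minimal sketch, not part of the lab scripts; it assumes the statistic names net.droppedTx.summation and net.droppedRx.summation (dropped packets per sampling interval), and the wildcard simply picks up the powered-on perf-worker VMs, which is an assumption about which VMs the network test uses.

# Dropped transmit/receive packets for the powered-on worker VMs over the last few real-time samples
foreach ($vm in (Get-VM "perf-worker*" | Where-Object { $_.PowerState -eq "PoweredOn" })) {
    Get-Stat -Entity $vm -Stat "net.droppedTx.summation", "net.droppedRx.summation" -Realtime -MaxSamples 3 |
        Select-Object @{ N = "VM"; E = { $vm.Name } }, Timestamp, MetricId, Instance, Value
}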

Restart network load


If you want to start the network load for another 5 minutes, return to the PowerCLI
window.
In the PowerCLI window type
.\StartupNetLoad.bat
And press enter.
The network load will now run for another five minutes. While you wait, you can explore
esxtop more.

Network workload complete


As described previously, the load will stop by itself. When the PowerShell window
says "Network load complete", no more load is being generated.

Conclusion and Cleanup


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines used in this module and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings. You can then move on
to another module.

Key takeaways


During this lab we saw how to use esxtop to monitor CPU, memory,
storage and network load.
We have only scratched the surface of what esxtop can do.
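
One feature worth exploring on your own is esxtop's batch mode, which writes every counter to a CSV file that can be analysed later in a spreadsheet or in Windows Performance Monitor. The line below is a minimal sketch, run from the ESXi shell; the output path is only an example.

# Collect 30 samples at a 10-second interval and save them for offline analysis
esxtop -b -d 10 -n 30 > /tmp/esxtop-capture.csv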

If you want to know more about esxtop, see these articles:


Yellow-Bricks esxtop page
http://www.yellow-bricks.com/esxtop/
Esxtop Bible
https://communities.vmware.com/docs/DOC-9279

Conclusion
This concludes Module 10, Performance Tool: esxtop CLI introduction. We hope
you have enjoyed taking it. Please do not forget to fill out the survey when you are
finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (45 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 11: Performance Monitoring Tool: esxtop for vSphere Web Client (30 minutes)

Introduction to esxtop for vSphere Web Client


The goal of this module is to show you how to use VMware's fling esxtopNGC, which is a
plugin version of esxtop for the vSphere Web Client. The plugin displays ESX server
stats in new and more powerful ways by tapping into the GUI capabilities of the Web
Client.
Some of the features are:

Separate tabs for CPU, memory, network and disk performance statistics
Flexible batch output
Flexible counter selection
Advanced data grid for displaying stats (sortable columns, expandable rows, etc.)
Configurable refresh rate
VM-only stats
Embedded tooltip for counter description

It can be found and downloaded from the VMware Flings website:
https://labs.vmware.com/flings/esxtopngc-plugin

While the time available in this lab constrains the number of performance problems we
can review as examples, we have selected relevant problems that are commonly seen in
vSphere environments. By walking through these examples, you should be better
able to understand and troubleshoot typical performance problems.
For the complete Performance Troubleshooting Methodology and a list of VMware Best
Practices, please visit the vmware.com website:
http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vsphere-esxivcenter-server-60-monitoring-performance-guide.pdf

What is a Fling?
From the Flings website: https://labs.vmware.com/about
Our engineers work on tons of pet projects in their spare time, and are always looking to
get feedback on their projects (or flings). Why flings? A fling is a short-term thing, not
a serious relationship but a fun one. Likewise, the tools that are offered here are
intended to be played with and explored. None of them are guaranteed to become part
of any future product offering and there is no support for them. They are, however,
totally free for you to download and play around with them!

For users with non-US Keyboards


If you are using a device with a non-US keyboard layout, you might find it difficult to enter
CLI commands, user names and passwords throughout the modules in this lab.
The CLI commands, user names and passwords that need to be entered can be copied
and pasted from the file README.txt on the desktop.

On-Screen Keyboard
Another option, if you are having issues with the keyboard, is to use the On-Screen
Keyboard.
To open the On-Screen Keyboard go to "Start - On-Screen Keyboard" or use the shortcut
in the Taskbar.

Getting Back on Track


If, for any reason, you make a mistake or the lab breaks, perform the following
action to get back on track, and restart the current module from the beginning.
Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command
prompt.

Resetting VMs to Restart Module


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings, and you can restart the
module.

Web Client errors


Due to the way we close the lab when we switch lessons, you might experience the error
above.
If this happens, select
1. Yes
to reload the client and continue the lesson.

Start this Module


Let's start this module.
Launch Chrome from the shortcut in the Taskbar.

Login to vSphere
Log into vSphere. The vSphere web client is the default start page.
1. Login using Windows session authentication.
Credentials used are:
User name: corp\Administrator
Password: VMware1!

Refresh the UI
In order to reduce the amount of manual input in this lab, a lot of tasks are automated
using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the
actual state of the inventory immediately after a script has run.
If you need to manually refresh the inventory, click the "Refresh" icon in the top of the
vSphere Web Client.

Navigate the Web Client to Hosts and Clusters Screen


1. Click on the "Hosts and Clusters" icon

Show esxtop for vSphere Web Client CPU features
Open PowerCLI
1. Click the PowerCLI icon on the taskbar

Start Load
Type
.\StartCPUTest2.ps1
and press Enter to start the lab.
You can continue now, but please don't close any windows since it might stop the script.

Esxtop
In the vSphere Web Client:
1. Expand Datacenter and Cluster
2. Select esx-01a.corp.local
3. Select Monitor tab
4. Select top tab

Refresh Rate
1. Click the Set Refresh rate button
2. Change Refresh Rate to 3
3. Select OK

Layout
Since the resolution within our environment is small, we need to make room for the
metrics and counters.
Start by closing all the extra windows in the Web client:
1. Navigator
2. Alarms (should already be closed)
3. Work in progress (should already be closed)
4. Recent tasks (should already be closed)

Display counters
1. Click Select Display Counters

Select display counters


1. Remove the checkmark from GID
2. Click OK

Monitor the load


Monitor the load on the 2 worker VMs (perf-worker-01a and perf-worker-01b). They are
both running at 100% CPU utilization inside the guest OS. The metric to concentrate on
is %RDY (CPU Ready). This metric is the percentage of time a world is ready to run but is
waiting for the CPU scheduler to give it physical CPU time. The metric can go up to 100% per vCPU, which
means that with 2 vCPUs it can go up to 200%.
A good guideline is to keep this below 5% per vCPU, but the acceptable level will always depend on
the user experience of your applications.

Look at the worker VMs to see if they go above the 5% per vCPU threshold. If you want esxtop to
refresh, press the Space bar.

Edit Settings of perf-worker-01a


Let's see how perf-worker-01a is configured:
1. Click on the perf-worker-01a virtual machine
2. Click Actions
3. Click Edit Settings

Add a vCPU to perf-worker-01a


1. Select 2 vCPUs
2. Click "OK"

Edit Settings of perf-worker-01b


Let's add a virtual CPU to perf-worker-01b to improve performance.
1. Click on the perf-worker-01b virtual machine
2. Click Actions
3. Click Edit Settings

Add a vCPU to perf-worker-01b


1. Select 2 vCPUs
2. Click "OK"

Monitor %USED and %RDY


Return to the esxtop tab.
Now we have added an additional vCPU to each virtual machine. If you did it quickly,
you will see an esxtop view like the picture above (note: only look at perf-worker-01a, due to
screenshot timing).

The vCPU count has increased from 2 to 4.


The %RUN is still only around 100, which means that the CPU benchmark is still only
using one vCPU in each virtual machine.
%IDLE is now around 100, which means that one vCPU is idle.
%RDY has increased, which means that, even if the additional vCPU is not being used
yet, it causes some additional CPU ready time. This is caused by the additional overhead
of scheduling SMP virtual machines. This is also why right sizing of your virtual
machines is important, if you want to optimize your resource consumption.

Monitor %USED and %RDY


After a few minutes, the CPU benchmark will start to use the additional vCPUs and
%RDY will increase even more. This is due to CPU contention and SMP scheduling
(increased %CSTP) on the system. The ESXi host now has to schedule 4 vCPUs, and the two virtual
machines, each attempting to run 2 vCPUs at 100%, are fighting for resources.
Remember that the ESXi host itself also requires some CPU resources to run, which adds to the
CPU contention.

Show esxtop for vSphere Web Client memory features
Start Load
In the PowerCLI window type
.\StartMemoryTest.ps1
and press Enter to start the lab.
You can continue now, but please don't close any windows, since it might stop the
script.

Esxtop
1. Expand Datacenter and Cluster
2. Select esx-01a.corp.local
3. Select Monitor tab
4. Select top tab
5. Select Memory tab

Refresh Rate
1. Click the Set Refresh rate button
2. Change Refresh Rate to 3
3. Select OK.

Layout
If you did not change the layout in the last step, then please do this now, to have
enough room to see all the metrics.
Start by closing all the extra windows in the Web Client:
1. Navigator
2. Alarms (should already be closed)
3. Work in progress (should already be closed)
4. Recent tasks (should already be closed)

Display counters
1. Click Select display counters button

Select display counters


Remove or add the check mark for:
1. GID
2. MCTL?
3. MCTLSZ
4. Click OK.

Monitor the load with no contention


When the load on the worker VMs begins, you should be able to see them at the top of
the esxtop window.
Note that you might have to use the scroll bar at the bottom to see all metrics, due to
the resolution of the screen.

Some good metrics to look at are:


MCTL :
Is the balloon driver installed? If not, then it's a good idea to fix that first.
MCTLSZ :
Shows how inflated the balloon is, i.e. how much memory has been taken back from
the guest operating system. This should be 0.
SWCUR :
Shows how much memory the VM currently has swapped out. This should be 0, but a non-zero
value can be acceptable if the next two counters are 0, since that indicates swapping
happened in the past but is not active now.
SWR/S :
Shows the rate of reads from the swap file.
SWW/S :
Shows the rate of writes to the swap file.
Depending on the lab, all counters should be ok. But due to the nature of the nested lab,
it's hard to predict exactly what you will see, so look around and check whether everything looks fine.

Power on perf-worker-04a
1. Right Click "perf-worker-04a"
2. Select "Power"

3. Click "Power On"

Monitor memory load under contention


Now that we have created memory contention on the ESXi host, we can see that:
perf-worker-04a has no VMware Tools (and no balloon driver) installed, and therefore
doesn't have a ballooning target.
perf-worker-02a and 03a are ballooning around 400 MB each.
perf-worker-02a, 03a and 04a are swapping to disk.

Stop load
1. To stop the load on the lab, close the 2 VM stat collector windows.

Show esxtop for vSphere Web Client storage features
Start Load
Open the PowerCLI window and type
.\StartStorageTest.ps1
and press Enter to start the lab.
You can continue now, but please don't close any windows, since it might stop the
script.

Esxtop
In the vSphere Web Client:
1. Expand Datacenter and Cluster
2. Select esx-01a.corp.local
3. Select Monitor tab
4. Select top tab
5. Select Disk VSCSI tab

Refresh Rate
1. Click the Set Refresh rate button
2. Change Refresh Rate to 3
3. Select OK.

Layout
If you haven't done so in the previous steps, start by closing all the extra windows in the
Web Client:
1. Navigator
2. Alarms (should already be closed)
3. Work in progress (should already be closed)
4. Recent tasks (should already be closed)

Display counters
1. Click Select display counters button

Select display counters


Remove or add the check mark for:
1. GID
2. ID
3. NDK
4. Click OK.

Monitor the load


You have 4 running VMs in the lab.
2 of them are running IOmeter workloads, and the other 2 are idle, to show you the
difference between an idle and a loaded VM.

The metrics to look for here are:


CMDS/S :
This is the total number of commands per second and includes IOPS (Input/Output
Operations Per Second) and other SCSI commands such as SCSI reservations, locks,
vendor string requests, unit attention commands, etc. being sent to or coming from the
device or virtual machine being monitored.
In most cases, CMDS/s = IOPS, unless there are a lot of metadata operations (such as
SCSI reservations).
LAT/rd and LAT/wr :
Indicates the average response time of read and write I/O, as seen by the VM.
In this case, you should see high CMDS/s values on the worker VMs that are currently
running the IOmeter load (perf-worker-02a and 03a), indicating that we are generating a lot of
I/O, and a high value in LAT/wr, since we are only doing writes.
The numbers on your screen can differ, due to the nature of the Hands-on Labs.

Device or Kernel latency


1. Go to the Disk Adapter tab
Here you can see that the storage workload is on vmhba33, which is the software
iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency). DAVG should
be below 25 ms, and KAVG, the latency caused by the kernel, should be very low, and always
below 2 ms.
In this example the latencies are within acceptable values.

Stop load
1. To stop the lab, close the 2 IOmeter windows.

Error
If you experience the error above, just select
1. Close the program

Show esxtop for vSphere Web Client network features
Start Load
In the PowerCLI window type
.\StartNetTest.ps1
and press Enter to start the lab.
You can continue now, but please don't close any windows, since it might stop the
script.

Esxtop
In the vSphere Web Client:
1. Expand Datacenter and Cluster
2. Select esx-01a.corp.local
3. Select Monitor tab
4. Select top tab

Network and settings


1. Select Networking tab
2. Click the Set Refresh rate button
3. Change Refresh Rate to 3
4. Select OK.

Layout
If you haven't done so in the previous steps, start by closing all the extra windows in the
Web Client:
1. Navigator
2. Alarms (should already be closed)
3. Work in progress (should already be closed)
4. Recent tasks (should already be closed)

Display counters
1. Click Select display counters button

Select display counters


Remove the checkmark from
1. Port-ID
2. DNAME
3. And click ok.

Monitor the load


The metrics to watch for are:
%DRPTX and %DRPRX :
The percentage of transmitted and received packets that were dropped.
If this number goes up, it might be an indication of high network utilization.

Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and
then waits for 2 minutes before running a network load for 5 minutes.
Depending on how fast you got to this step, you might not see any load to
begin with, since the network load has not started yet.

Restart network load


If you want to start the network load for another 5 minutes, return to the PowerCLI
window. In the PowerCLI window type
.\StartupNetLoad.bat
And press enter.
The network load will now run for another five minutes, while you can explore esxtop
more.

Network workload complete


The load will stop by itself, as described in the last step. When the PowerShell
window says "Network load complete", no more load is being generated.

Conclusion and Cleanup


Clean up procedure
In order to free up resources for the remaining parts of this lab, we need to shut down
the virtual machines used in this module and reset their configuration.

Launch PowerCLI
If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI
icon in the taskbar to open a command prompt.

Power off and Reset VMs


In the PowerCLI console, type:
.\StopLabVMs.ps1
and press Enter.
The script will now stop all running VMs and reset their settings. You can then move on
to another module.

Key takeaways


During this lab we saw how to use esxtop for vSphere Web Client to diagnose and
troubleshoot CPU, memory, storage and network performance.

We hope that we have shown you an alternative way of using esxtop, in the vSphere
Web Client, and that you find it useful.
If you want to know more about esxtop for the vSphere Web Client, see these articles:
VMware flings website
https://labs.vmware.com/flings/esxtopngc-plugin
Esxtop bible
https://communities.vmware.com/docs/DOC-9279
Yellow-Bricks esxtop page
http://www.yellow-bricks.com/esxtop/

Conclusion
This concludes Module 11, esxtop for vSphere Web Client. We hope you have
enjoyed taking it. Please do not forget to fill out the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (45 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Module 12: Performance Monitoring and Troubleshooting: vRealize Operations, Next Steps (30 minutes)

Introduction to vRealize Operations Manager


vRealize Operations Manager 6 features a re-architected platform that now delivers 8x
greater scalability with unified management across vSphere and other domains.
Analytics, smart alerts and problem detection capabilities identify complex issues and
then recommend automated tasks that streamline remediation before problems impact
users.

Architecture
An overview of the architecture of vRealize Operations Manager version 6: vRealize
Operations Manager now uses a scale-out architecture, whereas the older version 5.x used
a scale-up architecture.

Use vRealize Operations Manager for Performance Troubleshooting


Since this lab focuses on deep, parameter-based performance troubleshooting, we
have decided not to include vRealize Operations Manager in this particular lab.
However, vRealize Operations Manager is a very powerful performance troubleshooting
tool, and therefore we have included an offline demo that will hopefully inspire you to
go and explore one of the dedicated vRealize Operations Manager Hands-on Labs.
Click here to view an interactive demo of vRealize Operations Manager. The demo will
open in a new browser tab or window. When you have completed the offline demo,
simply click the "Return to the Lab" link in the top right-hand corner of the browser.
Be aware that the lab continues to run in the background. If it takes too long to
complete the offline demo, the lab may go into standby mode, and you will have to resume
it after completing the module.

Conclusion and Cleanup


Clean up procedure
Since this is an interactive demo, there is no cleanup necessary in this module.
If your lab was suspended during the interactive demo, please resume it now.

Key takeaways


During this lab we saw how vRealize Operations Manager can be used for performance
troubleshooting. But there is much more to vRealize Operations Manager, and if you feel
inspired to learn more about this powerful tool, then try out the following labs:
HOL-SDC-1601, HOL-SDC-1602 and HOL-SDC-1610.
If you want to learn more about vRealize Operations Manager, you can go to the
following sites:
VMware TV on youtube.com: https://www.youtube.com/user/vmwaretv and go to
"Software-Defined Data Center" - "vRealize Ops Mgmt"
VMware vRealize Operations Manager 6.0.1 Documentation Center:
http://pubs.vmware.com/vrealizeoperationsmanager-6/index.jsp#Welcome/welcome.html

Conclusion
This concludes Module 12, Performance Tool: vRealize Operations, next step in
performance monitoring and Troubleshooting. We hope you have enjoyed taking
it. Please do not forget to fill out the survey when you are finished.
If you have time remaining, here are the other modules that are part of this lab along
with an estimated time to complete each one. Click on 'More Options - Table of
Contents' to quickly jump to a module in the manual.

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)
Module 3: CPU Performance Feature: Power Policies (15 minutes)
Module 4: CPU Performance Feature: SMP-FT (30 minutes)
Module 5: Memory Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (45 minutes)
Module 7: Storage Performance and Troubleshooting (30 minutes)
Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)
Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)
Module 10: Performance Tool: esxtop CLI introduction (30 minutes)
Module 11: Performance Tool: esxtop for vSphere Web Client (30 minutes)
Module 12: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Conclusion
Thank you for participating in the VMware Hands-on Labs. Be sure to visit
http://hol.vmware.com/ to continue your lab experience online.
Lab SKU: HOL-SDC-1604
Version: 20151005-065137
