Topics Covered
Introduction
Root Cause Analysis
Performance Characteristics
CPU
Networking
Memory
Disk
Virtual Machine optimisation
ESXTop
vm-support
Service Console
Resource Groups
Design Guidelines
Capacity Planner limitations and cautions
Conclusion
Reference Articles
Introduction
Multiple layers of virtualisation are used to increase service levels, availability and manageability.
However, multiple layers of virtualisation often mask performance and configuration issues, making them more of a challenge to troubleshoot and correct.
The worst outcome is that performance issues after a virtualisation project lead to the perception that VMware reduces performance, which can affect future confidence in VMware.
Performance Basics
Resource Maximums
Host Logical Processors
Virtual CPUs
Guest N/A
8
64
N/A
20 1TB
N/A 256GB
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
Typical Host
vSphere 1U Host
CPUs: 2 x Quad Core
Memory: 32-64GB RAM
Typically 3 VMs per core, so 24 VMs per host
Each has 2GB of RAM = 48GB of RAM
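The sizing arithmetic on this slide can be reproduced with a short sketch. The function name `host_sizing` and its parameters are illustrative assumptions, not part of any VMware tool; the inputs are the figures quoted above.

```python
# Sizing sketch using the "typical host" figures quoted above.
# host_sizing is a hypothetical helper, not a VMware API.

def host_sizing(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (VMs per host, total VM RAM in GB) for a simple consolidation ratio."""
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    return vms, vms * ram_per_vm_gb

# 2 x quad core, 3 VMs per core, 2GB of RAM per VM
vms, ram_gb = host_sizing(sockets=2, cores_per_socket=4, vms_per_core=3, ram_per_vm_gb=2)
print(f"{vms} VMs per host, {ram_gb}GB of VM RAM")
```

The result of 48GB sits inside the 32-64GB range given for the host, before any allowance for overhead memory or the Service Console.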
http://www.vmware.com/resources/techresources/10066
Performance Analysis Tools
esxtop (service console only)
resxtop (remote command line utilities)
Performance graphs in vCenter
esxtop basics
Host Resources
Number of Worlds
Performance Characteristics
CPU: Slow Processing, High CPU Wait
Memory: Slow Processing, Disk Swapping
Networking: Packet Loss, Slow Network
Disk: Log Stalls, Disk Queue
Slow Application Performance, Reduced User Experience, Data Loss and Corruption
CPU
ESX Scheduler
Basic World States: Ready / Run / Wait
Service Console
Virtual Machine
esxtop
CPU
PCPU(%): CPU utilization
%USED: Utilization
%RDY: Ready Time
%RUN: Run Time
%WAIT: Wait and idling time
VI-Client
CPU
Used Time > Ready Time: Possible CPU overcommitment
Chart series: Used Time, Ready Time
Further Investigation
%MLMTD shows this VM has been limited
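As a rough illustration of this check, the sketch below filters a set of per-VM CPU samples for a non-zero %MLMTD. The sample values, the dictionary keys and the helper name `limited_vms` are all hypothetical; in practice the figures would be read from the esxtop CPU screen.

```python
# Sketch: flag VMs whose %MLMTD is above zero, i.e. VMs held back by a CPU limit.
# The samples and key names are made up for illustration only.

def limited_vms(samples):
    """Return the names of VMs reporting a non-zero %MLMTD."""
    return [s["name"] for s in samples if s["mlmtd_pct"] > 0.0]

samples = [
    {"name": "vm-a", "used_pct": 40.0, "rdy_pct": 2.0, "mlmtd_pct": 0.0},
    {"name": "vm-b", "used_pct": 25.0, "rdy_pct": 18.0, "mlmtd_pct": 15.0},
]

print(limited_vms(samples))
```

A VM that shows high ready time together with a non-zero %MLMTD is waiting because of its own limit rather than because the host is short of CPU.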
CPU
Memory
Ballooning vs. Swapping
The balloon driver causes the guest OS to swap pages that it chooses to disk.
ESX swapping will swap any pages to disk.
Memory Ballooning can be disabled (0 value) or controlled on a per Virtual Machine basis using:
sched.mem.maxmemctl
The default is 65% and can be controlled at host level. Ballooning is only an issue in resource contention scenarios (or for VMs that need low latency, e.g. Citrix).
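A minimal sketch of how the setting might be worked out and written down follows. It assumes that the 65% default is a fraction of the VM's configured memory and that `sched.mem.maxmemctl` takes a value in MB; both helper names are hypothetical.

```python
# Sketch under two assumptions: the 65% default applies to configured VM memory,
# and sched.mem.maxmemctl is expressed in MB. Helper names are illustrative.

DEFAULT_BALLOON_FRACTION = 0.65  # default quoted in the text

def default_balloon_max_mb(configured_mb):
    """Upper bound on ballooned memory under the default setting."""
    return int(configured_mb * DEFAULT_BALLOON_FRACTION)

def maxmemctl_line(limit_mb):
    """Format the per-VM configuration entry; 0 disables ballooning."""
    return f'sched.mem.maxmemctl = "{limit_mb}"'

print(default_balloon_max_mb(2048))  # default ceiling for a 2GB VM
print(maxmemctl_line(0))             # entry that disables ballooning
```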
Memory - Host
VI Client shows memory usage of the host. This is calculated as consumed + overhead memory + Service Console.
Performance charts are a very good way of showing the Virtual Machine memory breakdown:
Consumed Memory
Ballooned Memory
Shared Memory
Swapped Memory
Memory - Guest
Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS
Memory
Metric: Description
Memory Active (KB): Physical pages touched recently by the virtual machine
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn't include overhead memory
Memory Granted (KB): Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn't include overhead memory
Memory Shared (KB): Physical pages shared with other virtual machines
Memory Balloon (KB): Physical memory ballooned from a virtual machine
Memory Swapped (KB): Physical memory in swap file (approx. swap out - swap in). Swap out and swap in are cumulative
Overhead Memory (KB): Machine pages used for virtualisation
Memory
Metric: Description
Memory Active (KB): Physical pages touched recently by the host
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Total host physical memory - free memory on host. Includes overhead and Service Console memory
Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn't include overhead memory
Memory Shared (KB)
Shared Common (KB)
Memory Balloon (KB): Machine pages ballooned from virtual machines
Memory Swap Used (KB): Physical memory in swap file (approx. swap out - swap in). Swap out and swap in are cumulative
Overhead Memory (KB): Machine pages used for virtualisation
esxtop
Memory
PMEM: Total physical memory breakdown
VMKMEM: Memory managed by vmkernel
COSMEM: Service Console memory breakdown
PSHARE: Page sharing statistics
SWAP: Swap statistics
MEMCTL: Balloon driver data
Memory
VI Client: esxtop
Active Memory: TCHD
Memory Usage: %ACTV
Consumed Memory: N/A
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop
Memory Balloon: MCTLSZ
Memory Swapped: SWCUR (SWR/s & SWW/s are rates)
Overhead Memory: OVHD & OVHDMAX
Memory
VI Client: esxtop
Memory Active: N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage: N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed: PMEM total - PMEM free
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: PSHARE (shared)
Memory Shared Common: PSHARE (common)
Memory Balloon: MEMCTL
Memory Swap Used: SWAP (r/s and w/s are rates)
Overhead Memory: OVHD & OVHDMAX
Memory
Memory
Networking
Switch Assisted Teaming (IP Hash)
VLAN Trunking
Flow Control (full)
Speed & Duplex (1000Mb / Full)
Port Fast
BPDU Disabled
STP Disabled
Link State Tracking
Jumbo Frames
Network configuration is more likely to blame than resource contention
esxtop
Networking
Transmit and Receive in Mb/s
Transmit and Receive in Packets
esxtop
Networking
Disk
Varying Factors
File system performance
Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
Disk caching
Disk formats (thick, sparse, thin)
ESX Storage Stack
Different latencies for different disks
Queuing within the kernel
Latency labels: K: Kernel, D: Device, G: Guest
Disk
VI Client statistics
Quite coarse statistics:
Disk read / write rate (KB/s)
Disk usage: sum of read BW and write BW (KB/s)
Disk read / write requests (per 20s interval)
Bus resets / Command aborts (per 20s interval)
Per LUN or aggregated stats
Disk
esxtop statistics
Aggregated stats similar to VI Client:
Disk read / write per sec (READS/s, WRITES/s)
MB read / write per sec (MBREAD/s, MBWRTN/s)
Latency statistics:
Kernel Average / command (KAVG/cmd)
Device Average / command (DAVG/cmd)
Guest Average / command (GAVG/cmd)
Queuing information:
Adapter Queue Length (AQLEN)
LUN Queue Length (LQLEN)
VMKernel (QUED)
Active Queue (ACTV)
%Used (%USD = ACTV/LQLEN)
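The %USD relationship listed above is simple enough to sketch. The second helper rests on an assumption the slide does not state: that guest latency is roughly the sum of the kernel and device latencies (GAVG = KAVG + DAVG). Both function names and the sample figures are hypothetical.

```python
# Sketch of two disk relationships.
# queue_used_pct follows the %USD = ACTV/LQLEN definition from the slide.
# guest_latency_ms assumes GAVG = KAVG + DAVG, which the slide does not state.

def queue_used_pct(actv, lqlen):
    """Percentage of the LUN queue depth occupied by active commands."""
    return 100.0 * actv / lqlen

def guest_latency_ms(kavg_ms, davg_ms):
    """Latency seen by the guest as kernel time plus device time (assumption)."""
    return kavg_ms + davg_ms

print(queue_used_pct(actv=16, lqlen=32))              # half of the queue in use
print(guest_latency_ms(kavg_ms=0.5, davg_ms=8.0))     # mostly device latency
```

Under that assumption, a KAVG that is high relative to DAVG points at queuing inside the kernel, while a high DAVG points at the array or the fabric.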
Disk
SAN Rough Estimates Purely looking at a single ESX host, roughly:
Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec
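The estimate can be expressed as a one-line function. The name `throughput_mbps` and the sample inputs (32 outstanding IOs, 32 KB blocks, 10 msec latency) are illustrative assumptions. KB per msec is treated as MB per second, which is the approximation the slide's formula implies.

```python
# Sketch of the rough SAN throughput estimate for a single ESX host.
# throughput_mbps is a hypothetical helper; KB/msec is taken as MB/s.

def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """Throughput (MBps) = (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_kb / latency_ms

# Example inputs are made up: 32 outstanding IOs of 32 KB at 10 msec latency.
print(throughput_mbps(outstanding_ios=32, block_kb=32, latency_ms=10))
```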
Disk
Desired Latency Calculations
Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host
Example:
Number of Hosts: 16
Effective Link Bandwidth: 90 MBps
Throughput per host: 90 / 16 = 5.6 MBps
Desired Latency: (32 outstanding IOs * 32 KB) / 5.6 = 182.86 msec
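The worked example can be checked with a short sketch. The function name is an assumption, and the per-host throughput is entered as 5.6 MBps, the rounded figure used on the slide (90 / 16 is 5.625 before rounding).

```python
# Sketch reproducing the desired-latency example from the slide.
# desired_latency_ms is a hypothetical helper name.

def desired_latency_ms(outstanding_ios, block_kb, throughput_per_host_mbps):
    """Upper bound on latency (msec) that still sustains the per-host throughput."""
    return outstanding_ios * block_kb / throughput_per_host_mbps

HOSTS = 16
LINK_MBPS = 90
THROUGHPUT_PER_HOST = 5.6  # 90 / 16 = 5.625, rounded to 5.6 as on the slide

latency = desired_latency_ms(32, 32, THROUGHPUT_PER_HOST)
print(f"{latency:.2f} msec")
```

Using the unrounded 5.625 MBps gives a slightly lower bound of about 182.04 msec, so the rounding on the slide does not change the conclusion.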
Workload Desired Latency (msec) Observed Latency (msec) Throughput Drop? Throughput (MBps)
Disk
VI Client
Disk
esxtop
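esxtop
Command Options when inside esxtop
Command: Action
space: Update the display
?: Help
q: Quit
f/F: Add or remove fields
o/O: Change the order of displayed fields
s: Set the delay between updates
#: Set the number of instances to display
W: Write the configuration to file
e: Expand or roll up CPU statistics
V: View only virtual machine instances
L: Change the length of the NAME field
m: Display memory statistics
n: Display network statistics
i: Display interrupt statistics
d: Display disk adapter statistics
u: Display disk device statistics
v: Display disk VM statistics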
esxtop
Command Line Options from the console
Command: Action
-b: batch mode
-l: locks the objects available in the first snapshot
-s: enables secure mode
-a: show all statistics
-c: sets the configuration file
-R: enables replay mode (used with vm-support -S)
-d: sets the update interval
-n: runs esxtop for n iterations
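As an illustration of how these options combine, the sketch below assembles a batch-mode command line from the flags listed above. It only builds the argument list and does not run anything; the helper name and the interval and iteration values are hypothetical.

```python
# Sketch: build an esxtop batch-mode command from the -b, -d, -n and -a options.
# esxtop_batch_cmd is a hypothetical helper; nothing is executed here.

def esxtop_batch_cmd(interval_s, iterations, all_stats=False):
    """Return the argument list for a fixed-length batch capture."""
    cmd = ["esxtop", "-b", "-d", str(interval_s), "-n", str(iterations)]
    if all_stats:
        cmd.append("-a")  # include all statistics, not only the default fields
    return cmd

# One minute of data at 5 second intervals (values chosen for illustration).
print(" ".join(esxtop_batch_cmd(interval_s=5, iterations=12)))
```

Batch output is normally redirected to a file for later analysis, which is why a fixed iteration count is useful.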
esxtop
Expand the default window size for your session to get all statistics
vm-support
Creates a packaged zip file containing the following sections:
boot: contains the grub configuration
etc: contains the Console OS configuration files (cron, tcpwrappers, syslog, etc)
proc: contains much of the hardware configuration modules and variables
tmp: contains a lot of the ESX specific configuration output
var: contains log files and any core dumps
vmfs: contains the structure of the VMFS datastores
esx3-installation (where appropriate): contains a copy of the previous esx3 configuration variables
vm-support
Using vm-support to extract performance information:
vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds
The output from this can then be replayed in esxtop for review after it has been extracted:
esxtop -R <path_to_vm-support_output>
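The capture and replay steps can be sketched as two argument lists. This assumes the options are `-S`, `-d` and `-i` for vm-support and `-R` for esxtop; the helper names, the timings and the replay path are hypothetical, and nothing is executed.

```python
# Sketch: build the vm-support capture command and the matching esxtop replay command.
# Assumes the -S, -d, -i and -R options; helper names and the path are illustrative.

def vm_support_perf_cmd(duration_s, interval_s):
    """Performance snapshot capture: duration and interval are in seconds."""
    return ["vm-support", "-S", "-d", str(duration_s), "-i", str(interval_s)]

def esxtop_replay_cmd(extracted_path):
    """Replay a previously extracted vm-support bundle in esxtop."""
    return ["esxtop", "-R", extracted_path]

print(" ".join(vm_support_perf_cmd(duration_s=300, interval_s=10)))
print(" ".join(esxtop_replay_cmd("/tmp/extracted-vm-support")))  # hypothetical path
```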
Resource Groups
Dynamically reallocate resource shares
When an additional VM is added, shares allow you to overcommit resources and still get a graceful re-allocation
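A simplified sketch of share-based allocation under contention is shown below. It ignores reservations, limits and idle VMs, and the function name, capacity and share values are all hypothetical; it is meant only to show how adding a VM shrinks every entitlement proportionally rather than starving one VM.

```python
# Simplified sketch of proportional-share allocation under full contention.
# Ignores reservations, limits and idle VMs; all names and figures are made up.

def allocate_by_shares(capacity, shares):
    """Split capacity between VMs in proportion to their share values."""
    total = sum(shares.values())
    return {vm: capacity * s / total for vm, s in shares.items()}

shares = {"vm-a": 2000, "vm-b": 1000, "vm-c": 1000}
before = allocate_by_shares(8000, shares)  # e.g. 8000 MHz of contended CPU

shares_with_new_vm = dict(shares, **{"vm-d": 1000})
after = allocate_by_shares(8000, shares_with_new_vm)

print(before)
print(after)
```

In this example vm-a keeps twice the entitlement of each other VM both before and after vm-d is added, which is the graceful re-allocation the slide refers to.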
Design Guidelines
Full Resilience / Multiple paths
Standard configuration across all aspects (ESX, Storage, Networking, etc.)
Standard naming conventions
Learn from others' mistakes
Follow vendors' best-practice guidelines
Rule out the basics before requesting support
Conclusion
Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
Performance tools help diagnose issues and help rule out non-issues
Performance tools are useful in different contexts, not always either/or:
Real-time data and troubleshooting: esxtop
Historical data: VI Client
Coarse resource / cluster usage: VI Client
Detailed resource usage: esxtop
Combine information from various tools to get a complete picture
Always benchmark your systems first so you know what optimal performance you can expect
Reference Articles
http://www.vmware.com/pdf/esx3_memory.pdf
http://www.vmworld.com/docs/DOC-2370
http://blogs.vmware.com/performance/
http://communities.vmware.com/docs/DOC-5420
http://kb.vmware.com/kb/1008205
http://communities.vmware.com/community/vmtn/general/performance
http://www.vmware.com/products/vmmark/
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
http://www.vmware.com/pdf/GuestOS_guide.pdf
http://www.vmware.com/resources/techresources/10066
http://www.vmware.com/resources/techresources/10059
http://www.vmware.com/resources/techresources/10062