
Delivering 5000 Desktops with Citrix XenDesktop

Validation Report and Recommendations for a Scalable VDI Deployment using Citrix
XenDesktop and Provisioning Services, NetApp Storage and VMware Server
Virtualization

www.citrix.com

TABLE OF CONTENTS

Introduction
Citrix XenDesktop Overview
Executive Summary
    Key Findings
Methodology and Workload
    Workload Details - Login VSI from Login Consultants
    Component Scalability Results
        XenDesktop Desktop Delivery Controller (DDC)
        Single Server Scalability
        Provisioning Services Scalability
Findings
    The Desktop and Desired User Experience
    Citrix XenDesktop Desktop Delivery Controller
    Storage Recommendations
    Server Hardware Findings
    Server Virtualization Findings
    Additional Implications for Scalability Design
Large Scale Test Results
    Test Details
    Summary of Large Scale Test Results
    Session Performance and Session Startup Times
    Desktop Delivery Controller and Provisioning Services Performance
        Desktop Delivery Controller Performance
        Citrix Provisioning Services (PVS) Performance
    NetApp Storage Performance
    VMware Virtual Center and ESX Performance
        ESX Performance
Summary
Appendix A - Blade Server Hardware and Deployment
Appendix B - Network Diagram
References

Introduction
This document is intended to provide advanced technical personnel (architects, engineers and
consultants) with data to assist in the planning, design and deployment of a Citrix XenDesktop hosted
VM-based (VDI) solution that scales to 5000 desktops. This document presents the findings of internal
Citrix testing that simulates a large enterprise deployment of VDI desktops.
This document provides data generated from a sample deployment, in which a single OS image is
provisioned to 5,000 unique desktop users. This document is not intended to provide definitive guidance
on scalability, and the data should be interpreted and adapted for your specific environment. To help
you understand the data, examples of possible recommendations are made throughout the
document to show how to adjust to different scenarios.
The information gathered from this testing is part of a comprehensive and constantly growing guidebook
to scalability. Please reference the XenDesktop Scalability Guidelines at
http://support.citrix.com/proddocs/topic/xendesktop-bdx/cds-scalability-wrapper-bdx.html for an
understanding of how to scale in building blocks to many tens of thousands of desktops.

Citrix XenDesktop Overview


IT organizations today are looking for new ways to address their desktop challenges, whether it be rapid
provisioning, Windows 7 migrations, security, patching and updating, or remote access. They are
exploring solutions for current business initiatives, such as outsourcing, compliance and globalization.
Many are interested in bring-your-own-computer policies that enable IT to get out of the business of
managing hardware and to focus on the core software and intellectual property that is central to the line of
business.
Citrix XenDesktop offers the most powerful and flexible desktop virtualization solution available on the
market, enabling organizations to start delivering desktops as a service to users on any device, anywhere.
With FlexCast delivery technology, XenDesktop can match the virtual desktop model to the
performance, security, flexibility and cost requirements of each group of users across the enterprise.
This document focuses on the scalability and test results of one of the six FlexCast delivery models:
hosted VM-based desktops, or VDI.

Executive Summary
Citrix internally tested a sample VDI deployment designed for high-availability and simulated real-world
workloads using XenDesktop 4. The end-to-end environment included more than 3300 Windows XP
virtual desktops. In addition, key components were individually tested to determine their ability to
support more than 5000 desktops. Combining the complete system results with the individual
component tests enabled Citrix to extrapolate results to support a single virtual desktop infrastructure
design that can deliver at least 5000 desktops.
The full VDI infrastructure was built using the following components:
o Desktop Delivery Controller for brokering, remoting and managing the virtual desktops
o Citrix Provisioning Services for OS provisioning
o NetApp centralized storage for storing user profiles, write cache and relevant databases
o HP blade servers for hosting the VMs
o VMware ESX and vCenter as the server virtualization infrastructure
o Cisco datacenter network switches

Key Findings
o Workloads and boot or logon storms (from rapid concurrent or simultaneous user logons) have
the largest impact on how you scale and size this VDI design.
o Desktop Delivery Controllers can be virtualized and have roles divided amongst them for the best
scalability and resiliency.
o Citrix Provisioning Services, with the release of 5.1 SP2, demonstrated unparalleled scale
(over 3000 users per physical server) and reliability in this VDI deployment.
o Virtual machine density will vary with OS, workload and, of course, server hardware.

Methodology and Workload


Testing was done in two phases - individual component scalability and full-system scalability. Central to
both phases of testing is the use of a tool that simulates real-world workloads, as well as an internally
built tool to measure session startup times (providing expected user logon times).

Workload Details - Login VSI from Login Consultants
One of the most critical factors of designing a scalable VDI deployment is understanding the true user
workflow and planning adequately in terms of server and storage capacity, while setting a standard for
the user experience throughout.
To accurately represent a real-world user workflow, the third-party tools from Login Consultants were
used throughout the full system testing. These tools also take measures of in-session response time,
providing a way to measure the expected user experience in accessing their desktop throughout large
scale testing, including login storms.
The widely available workload simulation tool, LoginVSI 1.x, was also coupled with the use of the idle
pool (feature in XenDesktop) to spin up sessions, simulating a scenario of all users coming in to work at
the same time and logging on. (Login VSI is freeware and can be downloaded from
www.loginconsultants.com.)
Login VSI is a benchmarking methodology that calculates an index based on the number of
simultaneous sessions that can be run on a single machine. The objective is to find the point at which
the number of sessions generates so much load that the end-user experience becomes noticeably
degraded.
Login VSI simulates a medium-heavy workload user (intensive knowledge worker) running generic
applications such as Microsoft Office 2007, Internet Explorer (including Flash applets) and Adobe Acrobat
Reader. (Note: for the purposes of this test, applications were installed locally, not streamed or hosted.)
Like real users, the scripted session leaves multiple applications open at the same time. Every session
averages about 20% minimal user activity, similar to real-world usage. Note that during each 18-minute
loop users open and close files a couple of times per minute, which is probably more intensive
than most real users.
Each loop will open and use:
Outlook 2007: browse 10 messages and type a new message.
Internet Explorer: one instance is left open; one instance is browsed to Microsoft.com,
VMware.com and Citrix.com (locally cached copies of these websites).
Word 2007: one instance to measure response time (9 times); one instance to review, edit and
print a random document.
Solidata PDF writer and Acrobat Reader: the Word document is printed to PDF and reviewed.
Excel 2007: a very large randomized sheet is opened and edited.
PowerPoint 2007: a random presentation is reviewed and edited.
Three breaks (40, 20 and 40 seconds) are included to emulate real-world usage.

Component Scalability Results
The following components were tested for individual scalability:
Desktop Delivery Controller (DDC)
VMware ESX Server on blade servers
Provisioning Services (note this testing was done as part of full-scale system tests)


XenDesktop Desktop Delivery Controller (DDC)
The DDCs were virtualized on ESX server and some of the roles of the DDC were assigned to specific
DDCs, an approach often taken in Citrix XenApp deployments. The DDCs were configured such that:
DDC 1: Farm Master and Pool Management
DDC 2 & 3: VDA Registrations and XML Brokering
In this environment, 3 DDCs (4 vCPU, 4GB RAM) were shown to be able to sustain a farm size of 6000
desktops and proved stable handling over 120,000 logons from a pool of 5650 users.
It was necessary to have multiple Virtual Center instances to support this scale; each VC instance
required a new XenDesktop desktop group. In the testing, 5 VCs were used with the following
distribution:
2 x 2000 desktops
2 x 700 desktops
1 x 600 desktops
The stability of the deployment was validated using the following method:
All VMs were powered on using Idle Pool Management. This feature of XenDesktop allows the
environment to be automatically brought up in a controlled manner in advance of peak user
activity.

An initial logon storm was created by logging users on at a rate of ~3 per second (see the rough
calculation below).
This was followed by a steady load as users logged off, rebooted and VDAs re-registered.
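As a rough, illustrative calculation (only the logon rate and pool size above are from the test; the arithmetic is our own), the quoted rate implies a logon storm lasting roughly half an hour:

    # Illustrative sketch: approximate duration of a logon storm at ~3 logons/second.
    users = 5650                  # pool size quoted above
    logons_per_second = 3         # approximate rate used in the test
    storm_minutes = users / logons_per_second / 60
    print(round(storm_minutes))   # ~31 minutes to log on the full pool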

Single Server Scalability


The Single Server Scalability tests are focused on determining the number of Virtual Desktops a given
target machine can support. There are many permutations of tests that could be performed to evaluate
specific features or architectures. As this testing is a precursor to more comprehensive scalability tests
and guidelines, exploring a broad set of configurations was not within the scope of the project.
The methodology used is based on Project Virtual Reality Check (Project VRC:
http://www.virtualrealitycheck.net); Project VRC was a collaboration between two consulting companies
(Login Consultants and PQR) with the objective of measuring hypervisor scalability using Login VSI 1.0.
The key differences between the testing methodology used at Citrix and that of Project VRC are:
Provisioning Services was included, enabling pooled XP desktops to run from a single
common vDisk. Some changes were made to the session logon scripts to prevent unnecessary
file copy operations that would impact the PVS write cache; these operations were intended for
XenApp environments.
Connections are brokered via the XenDesktop DDC, not direct connections.
The XP virtual desktops were allocated 512MB RAM, compared to 1GB in the case of
Project VRC.
Roaming profiles were used instead of local profiles, as this is more representative of a VDI
deployment.
Each of the hardware platforms tested was intended to show scalability in memory-bound and CPU-bound
conditions, along with cases where the environment was rich in memory and CPU resources.
VM Density used in Large Scale Testing
The following specifications were used:
o Windows XP pooled desktops
o 1 vCPU and 512MB RAM
o 1.5GB PVS write cache on NFS (NetApp 3170HA)
o ESX 3.5 Update 4

Host hardware                                          VMs/Host   VMs/Core
HP BL460c Dual Quad Core (1.86GHz L5320), 16GiB RAM    28         3.5
HP BL460c Dual Quad Core (2.5GHz L5420), 32GB RAM      50         6.25

Note that at smaller scale, slightly higher single-server density was possible; however, at large scale we
noticed some degradation of performance. Testing showed that with 34 desktops on the 16GB BL460c
blade, ballooning was occurring but was unable to free enough memory. This caused the ESX host
to start swapping guest memory to the storage tier. This impacted the end-user experience, as pages the
guest believed to be in memory were actually on disk, causing an increase in latency when accessing those
pages. Reducing the number of guests per host removed the swapping behaviour and removed the
impact on the end-user experience that was seen when the environment was being scaled out.
With 32GB RAM, 52 desktops were possible, though the system was close to becoming CPU
bound. To avoid the risk of impacting user experience, we slightly reduced the density used in the large-scale
tests.
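As an illustrative sizing sketch (the densities are from the table above; the host counts and any failover headroom are our own arithmetic, not part of the test):

    import math
    desktops = 5000
    density_16gb = 28    # VMs/host on the 16GiB BL460c (memory bound)
    density_32gb = 50    # VMs/host on the 32GB BL460c (approaching CPU bound)
    print(math.ceil(desktops / density_16gb))   # 179 hosts using 16GiB blades
    print(math.ceil(desktops / density_32gb))   # 100 hosts using 32GB blades
    # Spare capacity for host failures or maintenance would be added on top of these raw numbers.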

Provisioning Services Scalability


The scalability of Provisioning Services builds on the results from the single server scalability testing. As we
increased the number of desktops being streamed from PVS, we monitored the Login VSI score and the logon time
to ensure that the end-user experience remained acceptable. Standard Perfmon metrics were also
captured to understand the characteristics of PVS and streaming pooled desktops.
As the full-system scalability testing was conducted and users were added up to the maximum capacity of the
hardware, it was observed that one physical Provisioning Server could easily support all 3300
desktops. This is a significant improvement over earlier testing of previous versions of the technology.

Findings
To build a 5000-desktop VDI deployment, the findings of this round of testing indicate some new
guidance in our overall approach to scalability, to be captured in a comprehensive scalability guide in the
near future:

The Desktop and Desired User Experience
Ensuring proper design of a large-scale VDI deployment requires that you have a good understanding of
how the users on average will be using their desktops and applications. The two critical elements are
login storms and the in-session workload.
The test environment is capable of supporting a login storm of 5000 desktops based on the test data.
The Login VSI workload represents a medium type of user, as described in the Methodology section.
If the average user workload varies greatly from the one described in this design, then you need to
model the workload on at least a single-server basis to gain approximations for sizing the server and storage
components differently.

Citrix XenDesktop Desktop Delivery Controller
XenDesktop Desktop Delivery Controller configuration was an enterprise installation with the following
adjustments to allow distribution of roles to 3 virtualized brokers:
Farm master (DDC1)
Registry configured so that the DDC rejects VDA registrations.

Pool Management throttling was configured at 40 desktops, overriding the default of 10% of the
pool size (~160-170 desktops depending on the group; see the quick check after this list).

Configured as the preferred Farm Master.

VDA registration and XML brokering (DDC2 and DDC3)

The above Pool Management configuration change was made in case Pool Management failed
over to a different DDC.
This configuration was tested to support 5000 sessions.
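The default throttle figure quoted above can be reproduced with a quick, illustrative calculation (the desktop group sizes are taken from the large-scale test described later; the arithmetic itself is not from the report):

    group_sizes = [1604, 1708]                 # desktop group sizes used in the large-scale test
    default_throttle = [int(g * 0.10) for g in group_sizes]
    print(default_throttle)                    # [160, 170] -> the "~160-170 desktops" default
    configured_throttle = 40                   # value used here to avoid overloading Virtual Center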


Storage Recommendations
For a large VDI deployment, a scalable centralized storage solution is both cost-effective and reliable. The
NetApp FAS3170HA was used with 2 controllers, 70 x 300GB drives for storage and PAM II cards.
The PAM II modules in the NetApp FAS3170HA filer did not offer any gains, as the workload on the
storage was write focused. For this version of XenDesktop and this VDI design, the PAM II cards are not
required and are not recommended.
Otherwise, this particular configuration of NetApp is recommended as designed here for 5000 users,
with the assumption that there will be some potential degradation in a complete failover situation (where
one NetApp controller fails completely, or a similar failure occurs). To tune the NetApp sizing for your particular
failover/recovery needs, it is recommended to work with a NetApp sales engineer.
The FAS3170 was running OnTap version 7.3.2 with PAM II cards enabled. One aggregate was created per
controller, with multiple volumes created on each aggregate.
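As a rough, illustrative capacity sketch (assumption: the 1.5GB per-desktop PVS write cache from the single server scalability configuration applies across the full deployment; user profiles and databases are additional and sized separately):

    desktops = 5000
    write_cache_gb = 1.5                 # per-desktop PVS write cache used in the SSS tests
    total_gb = desktops * write_cache_gb
    print(total_gb)                      # 7500 GB, i.e. roughly 7.5TB of NFS capacity for write cache alone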


Server Hardware Findings
For hosting the actual virtual desktops, a blade server configuration is recommended.
In this design, approximately 50 VMs/host was achieved using the following:
HP BL460
2 x 1.86Ghz Intel Xeon L5320 Quad Core (8MiB L2 Cache 1066Mhz Bus)
1 x 36GB HDD SAS 10K rpm
16 GB RAM 667Mhz
Dual Broadcom 1Gb NICs
QLogic QMH2462 Dual Port Fibre Channel HBA

HP BL460c
2 x 2.5Ghz Intel Xeon L5420 Quad Core (12MiB L2 Cache 1333Mhz Bus)
1 x 72GB HDD SAS 10K rpm
32 GB RAM 667Mhz
Dual Broadcom 1Gb NICs
QLogic QMH2462 Dual Port Fibre Channel HBA
Using a similar hardware configuration but with newer Intel Nehalem (55xx series) processors
and 64-96GB memory configurations should provide significantly increased VM density.
For Provisioning Services, dedicated servers were used and were over-specified for this design of 5000
desktops. An HP BL680 (camb5e1b02) was used:

Citrix PVS Server
OS:            Windows 2008 64-bit
Service Pack:  1
Make:          HP
Model:         BL680
CPU:           4 x Intel E7450 2.4GHz
RAM:           64GiB
Disk:          2 x 72GB 10k SAS
Network:       8 x 1GbE
Provisioning Services 5.1 SP2
From the test data, this server was highly underutilized.


The 24-core server is clearly over-specified. With a peak of less than 30% CPU utilization, this equates to
roughly 7.2 cores. A dual quad-core server would be expected to handle this load, though it may be too close
to maximum utilization; hence, instead of two 24-core servers, three 8-core servers would be sufficient.
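The arithmetic behind that observation, as a small sketch (the peak utilisation and core count are from the data above; the server-count suggestion simply restates the text):

    total_cores = 24              # 4 x 6-core E7450 per PVS server
    peak_utilisation = 0.30       # observed peak was below 30%
    print(round(total_cores * peak_utilisation, 1))   # 7.2 cores busy at peak for ~1650 streamed desktops
    # A single 8-core box would run close to its limit at that load, hence the
    # suggestion of three 8-core servers rather than two 24-core servers.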


Server Virtualization Findings
In our testing, two desktop groups were configured, pointing at two different VMware Virtual Center
servers.
Virtual Center 1 would run 1604 desktop sessions on 32 blades.
Virtual Center 2 would run 1708 desktop sessions on 61 blades.
Based on VMware best practice for the software versions used (VMware ESX 3.5 Update 4) and the
published maximums (2000 VMs per Virtual Center), the environment had to be split over 2 Virtual
Center instances.
Since then, VMware has released version 4.0, which has higher limits than the 2000 VMs tested with version
3.5 (in version 4, the limits are 3000 and 4500 VMs for 32-bit and 64-bit guests respectively). In
general, the recommendation is to configure the fewest Virtual Center instances possible.
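As a simple illustration of how these maximums drive the design (the limits are those quoted above; the instance counts are our own arithmetic):

    import math
    desktops = 5000
    print(math.ceil(desktops / 2000))   # 3 Virtual Center instances under the ESX 3.5-era maximum used here
    print(math.ceil(desktops / 3000))   # 2 instances under the vSphere 4 limit for 32-bit guests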
No changes were made or recommended from a standard installation. Servers were placed into logical
clusters, with one cluster matching one blade enclosure.
VMware ESX 3.5.0 build 176894 was used on all ESX hosts in the environment. Each host is
configured with a single virtual switch with both vmnic0 and vmnic1 connected.

The VM Network is configured with vmnic0 as active and vmnic1 as standby.
o This is used for ICA, PVS and general network traffic.
The Service Console is not specifically bound to a specific vmnic.
VMotion is configured with vmnic1 as active and vmnic0 as standby.
o This is used for NFS and VMotion traffic.


Service Console was allocated 800MiB.


NFS configuration changes were made as per current NetApp guidance in NetApp Technical
Report TR-3428.
NTP was configured to sync time.
ESX hosts were installed with the latest HP ESX utilities for monitoring hardware.
Due to interrupt sharing issues between the vmkernel and the Service Console, USB was disabled in
the BIOS. See VMware KB article 1003710. Note that although the BIOS disabled USB, USB was
still available from the iLO, so remote keyboard access was still available.

Additional Implications for Scalability Design
Don't place the PVS vDisk on a CIFS share.
o Windows does not cache files from file shares in memory, so each time a call is made
to the PVS server it in turn has to reach out to the shared storage.
Ensure VMware Virtual Center hasn't set a resource limit on your virtual machines.
o When we moved from the DDC testing, which used 256MiB guests, to the large-scale test,
we increased the VM memory back to 512MiB; however, a limit of 256MiB had been left
on the memory resources available to the guest. This resulted in a VM
which appeared to have 512MiB RAM but was limited to using only 256MiB of physical
RAM, with the rest held in the VMware swap file. This led to a huge increase in
storage IO to the SAN, which crippled the large-scale environment down to fewer than 100
desktops. Check: Virtual Machine Properties -> Resources -> Memory -> Limit.
Don't place too many virtual machines on VMFS volumes.
o Not applicable to the NFS implementation, but seen in SSS testing using local VMFS
volumes and also FC-attached VMFS volumes. The impact was most noticeable on user
logon time, which increased quickly with more than 40 active VMs on a single VMFS
volume. Splitting these onto multiple volumes on the same number of disks alleviated the
problem.
.NET 3.5 SP1 (plus later Windows updates) is necessary to improve scalability of the DDC.


o Without this update applied, we would see VDAs deregister as users began to log in to the
system. This was seen with ~1500 desktops and higher. The Microsoft fixes to .NET
addressed the problem and allowed testing to achieve ~6000 desktops.
By default, Pool Management will attempt to start 10% of the total pool size. In a large
environment this may be more than Virtual Center can cope with.
o The number of concurrent requests can be throttled by editing the Pool Management
Service configuration file:
o C:\Program Files\Citrix\VMManagement\CdsPoolMgr.exe.config
o Modify the <appSetting> section by adding the line:
o <add key="MaximumTransitionRate" value="20"/>
o The Pool Management service needs to be restarted to read the new configuration.
o If VMware DRS is being used, a lower value should be set, as DRS needs additional time
to determine guest placement before powering a guest on. In our testing with DRS enabled,
a rate of 20 was used.
o In our testing, we allowed DRS to do the initial VM placement through a full run; DRS
was then disabled, which allowed the MaximumTransitionRate to be increased to 40
without Virtual Center becoming overloaded.
Details on assigning the farm master roles can be found in CTX117477. Note that the XenDesktop
PowerShell SDK can also be used to configure the preferred farm master.
To stop the farm master handling connections, see the MaxWorkers registry key in CTX117446.
PVS NIC teaming can simplify the deployment of the PVS server.
o NIC teaming also improves reliability: as one PVS server has one IP address, if a
network connection fails, the remaining connections take over the load and the PVS
server continues to operate on its current IP. This is especially useful for failover and HA,
as only one IP address needs to be specified for the login server per host. It also
allows the network layer to handle the load balancing of client connections over the
available NICs.


Large Scale Test Results


Test Details
The test run of 3312 desktops comprised an idle pool spin-up with the following details:
o All sessions launched within approximately 60 minutes.
o Individual logon times were tracked to ensure logon performance did not degrade
significantly.
o All sessions ran the Login VSI 1.1 workload and their response times were logged.
o At the end of the VSI workload phase the users logged off. This triggers Pool
Management to shut down and then restart the desktop.
o PVS HA testing ensured all desktops would continue to run in the event of a PVS
server failure.
o The various product management consoles were used during the test to ensure they remained
responsive to general admin tasks.
Environment: two desktop groups, pointing at two different Virtual Center servers.
o Virtual Center 1 ran 1604 desktop sessions on 32 blades.
o Virtual Center 2 ran 1708 desktop sessions on 61 blades.
o Based on VMware best practice and published maximums, the environment had to be
split over 2 Virtual Center instances. Within each Virtual Center, individual clusters are
created for each blade chassis (of up to 16 blade servers).
o Virtual Center 1 has clusters for two chassis of the more powerful blade servers. Virtual
Center 2 hosts clusters for the other four chassis of blades.


Summary of Large Scale Test Results

Powering on all 3312 desktops ready for users to log in took less than 60 minutes using the
XenDesktop Idle Pool Management capability.
Using a launch rate of 107/minute, 99% of users logged on in 31 minutes (see the cross-check below).
PVS was shown to be able to run 3312 desktops from an HA pair of servers. In a separate test,
one of the PVS servers was shut down, triggering an HA failover. The ~1600 sessions transferred
to the other server within 8 minutes.
The scalability of the environment was verified through analysis of the logon times, Login VSI
test response times and performance metrics gathered from all the major components.
The perfmon data confirms that a number of the servers were oversized and could easily handle
more load than was placed on them in this test.
It took on average 19 seconds from launching the ICA file to having a fully running desktop.
Login VSI response times indicate the system remained at an acceptable performance level for
all users during the test.
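A quick cross-check of the launch figures quoted above (the inputs are from the results; the arithmetic is illustrative only):

    sessions = 3312
    launch_rate_per_minute = 107
    print(round(sessions * 0.99 / launch_rate_per_minute))   # ~31 minutes for 99% of users to log on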

Session Performance and Session Startup Times
The Login VSI results illustrate response time captured against the count of sessions launched.
You can observe that the maximum response time increases only nominally as the session count increases,
and that overall, average response times stay within 2000ms for the duration.

[Chart: Login VSI Maximum, Minimum and Average Response Time (ms) plotted against Active Sessions, from 1 to ~3200 sessions]

Total Sessions Launched                                                  3312
Uncorrected Optimal Performance Index (UOPI)                             3312
Stuck Session Count before UOPI (SSC)                                    0
Lost Session Count before UOPI (LSC)                                     44
Corrected Optimal Performance Index (COPI = UOPI - (SSC * 50%) - LSC)    3268
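The COPI figure follows directly from the formula in the table above; a minimal worked example:

    UOPI = 3312   # Uncorrected Optimal Performance Index (sessions launched)
    SSC = 0       # Stuck Session Count before UOPI
    LSC = 44      # Lost Session Count before UOPI
    COPI = UOPI - (SSC * 0.5) - LSC
    print(int(COPI))   # 3268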

Session start-up time is a measure of the time taken from starting the ICA client on the client launcher,
having received the ICA file from a successful XML brokering request, to the session loading and the
STAT mini agent (a .NET application loaded from the Windows start-up folder) loading. This method of
calculating start-up time is the closest approximation to true user logon time in such a test environment.
Logon times can be seen to fall mostly in a band between 15 and 22 seconds, though some stray sessions
took close to 40 seconds near the end of the logon storm, when the earlier users were part way through
their first workload run.

Session start-up times: Min 11 seconds, Max 39 seconds, Average 19 seconds.

Desktop Delivery Controller and Provisioning Services Performance
Where available, data is presented for the environment during the spin-up phase, which is controlled
via XenDesktop Idle Pool Management, and during the test run, where the Login VSI 1.x workload is
run in all sessions until all desktops have run the full set of scripts at least once; a file is then dropped
on a network share, which triggers the Login VSI scripts to initiate a logoff when they next complete a
full run of the scripts.
As the desktops were configured to reboot on logoff, additional load is placed on the systems when
users begin to log off and idle pool management then powers the desktops back on again.
Standard Microsoft Windows perfmon counters were used to collect the following performance metrics.
Desktop Delivery Controller Performance


As mentioned previously, 3 DDCs were used in this test with specific roles assigned. All are running as
virtual machines on a separate ESX server from the desktop VMs, each configured with 4 vCPUs and 4GB
RAM, running on an HP BL460c with 2 x 1.86GHz Quad Core L5320 CPUs and 16GB RAM.

DDC1: Farm Master + Pool Management


[Charts: % Processor Time _Total (4 vCPU) and XenDesktop Services % Processor Time (CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer and ImaSrv processes), shown for Pool Spin Up and During Test Run]

The main item to note is that during pool spin-up the high-usage process is CdsPoolMgr. This is expected, as it drives Virtual Center to start the guests. The two peaks of the IMA service during pool spin-up are caused by the UI taking the two desktop groups out of Maintenance Mode.
During the test run itself, ImaSrv is responsible for brokering all the desktops, so the zone master takes the most load while making the decision on desktop assignment. In the later stage of the run, desktops start to log off, so the Pool Management Service begins to shut down and restart the desktops.

[Chart: Memory Committed Megabytes (MiB), shown for Pool Spin Up and During Test Run]

The memory usage on this DDC grows significantly towards the end of the run as users log off. Logoff triggers the tainting detection code to shut down the VM; once it is shut down, Pool Management powers it back on again.
Further investigation is required to better understand the dramatic memory increase at the end of the test. It is suspected that, given enough time, garbage collection would correct the spike.
[Charts: PhysicalDisk % Idle Time (_Total), Context Switches per second and Network Utilisation (Mbps, Sent/Received), shown for Pool Spin Up and During Test Run]

The spikes in network traffic at the end of the test correspond to the desktops being shut down and restarted by the Pool Management service. This traffic is between the DDC and the Virtual Center servers, as can be seen from the corresponding increase in traffic on both Virtual Centers at this time.


DDC2: XML + VDA registration


[Charts: % Processor Time _Total (4 vCPU) and XenDesktop Services % Processor Time by process, shown for Pool Spin Up and During Test Run]

In contrast to DDC1, the load is noticeably lower. The main active process is CdsController, which handles communication with the VDAs, including heartbeats and initial registration.

[Charts: Memory Committed Megabytes (MiB) and PhysicalDisk % Idle Time (_Total), shown for Pool Spin Up and During Test Run]

Due to some previous memory-leak tracing for the IMA Service, a user-mode stack trace database was being created for imasrv.exe. This extra tracing was causing the higher-than-normal disk utilisation, showing a steady baseline of 20% utilisation.

[Charts: Context Switches per second and Network Utilisation (Mbps, Sent/Received), shown for Pool Spin Up and During Test Run]

DDC3: XML + VDA registration


[Charts: % Processor Time _Total (4 vCPU) and XenDesktop Services % Processor Time by process, shown for Pool Spin Up and During Test Run]

The load profile is, as expected, similar to DDC2. In contrast to DDC1, the load is noticeably lower. The main active process is CdsController, which handles communication with the VDAs, including heartbeats and initial registration.
[Charts: Memory Committed Megabytes (MiB), PhysicalDisk % Idle Time (_Total), Context Switches per second and Network Utilisation (Mbps, Sent/Received), shown for Pool Spin Up and During Test Run]

Citrix Provisioning Services (PVS) Performance


There are 2 PVS servers handling the 3312 desktops in the environment. The processor and memory
configuration for these servers can clearly be seen to be significantly over-specified. Each server's 8 gigabit
NICs were configured as a NIC team; the blade chassis had a 4 x 10GbE uplink to the core switch.
The PVS servers each run on BL680c blades with 4 x E7450 2.40GHz six-core CPUs and
64GB RAM.
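As an illustrative figure only (assumptions: the ~2.3Gbps peak noted below and an even split of the 3312 desktops across the two PVS servers):

    desktops_per_server = 3312 / 2     # assuming an even split across the HA pair
    peak_gbps = 2.3                    # peak streaming traffic observed on PVS server 1
    print(round(peak_gbps * 1000 / desktops_per_server, 2))   # ~1.39 Mbps of streaming traffic per desktop at peak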

PVS Server 1

[Charts: % Processor Time _Total (4 x 6-core CPUs), Memory Committed Megabytes (MiB), PhysicalDisk % Idle Time (_Total) and Network Utilisation (Mbps, 8 teamed 1GbE NICs), shown for Pool Spin Up and During Test Run]

Peak traffic occurs during the user logon phase of the test run with a peak close to 2.3Gbps.

PVS Server 2

[Chart: % Processor Time _Total (4 x 6-core CPUs), shown for Pool Spin Up and During Test Run]

The 24-core server is clearly over-specified. With a peak of less than 30% CPU utilisation, this equates to roughly 7.2 cores. A dual quad-core server would be expected to handle this load, though it may be too close to maximum utilisation; hence, instead of two 24-core servers, three 8-core servers would be sufficient.
[Charts: Memory Committed Megabytes (MiB), PhysicalDisk % Idle Time (_Total) and Network Utilisation (Mbps, 8 teamed 1GbE NICs), shown for Pool Spin Up and During Test Run]

This network load mirrors the load seen on the other PVS server, with a peak close to 2.2Gbps.


NetApp Storage Performance
Analysis concentrates on the actual test run rather than the spin up phase as the load is significantly
higher. The following summary (courtesy of NetApp) captures the critical read/write and IOPS info for
the 3312 desktop test.
Averages for 3312 Virtual Desktops    Reads    Writes
Mean Network Read/Write ratio         11.5%    88.5%
Max Network Read/Write ratio          20.5%    79.5%
Mean Disk Read/Write ratio            14.2%    85.8%
Max Disk Read/Write ratio             17.8%    82.2%

Mean IOPS per desktop                 4.4 IOPS
Max Average IOPS per desktop          27.7 IOPS

Analysis
o No more than 2 of the 4 CPUs on the storage controllers ever became fully utilised,
staying well within normal operating limits with significant headroom for further growth if
performance during a cluster failover is not required.
o The average latency for all protocols was well within reasonable performance, which would
provide an excellent end-user experience.
o During the start and end of the test run, CIFS accounted for roughly 50% of protocol
usage. This is seen as a large amount of reads at the beginning of the test (when user
profiles are loaded) and a large amount of writes at the end of the test (when profiles are
written back).
o For the remaining duration of the test, NFS played the predominant role, being utilised for
the PVS client-side write cache.
o FCP (Fibre Channel) played little if any part in the workload seen on the filer. FCP was
limited to database traffic for the various components in the environment.
o The majority of all IOs were writes, across all protocols.
o Average and maximum disk utilisation was never more than 40%, which suggests there could be
headroom to accept more virtual machines on these controllers.
o In the event of a cluster failure, the data indicates the filer could handle 3000-4000 desktops
with minimal or no performance degradation.
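For sizing purposes, the per-desktop figures above imply the following aggregate load on the filer (illustrative arithmetic only; the per-desktop values are taken from the table above):

    desktops = 3312
    mean_iops_per_desktop = 4.4
    max_avg_iops_per_desktop = 27.7
    print(int(desktops * mean_iops_per_desktop))       # ~14,500 IOPS sustained across the filer
    print(int(desktops * max_avg_iops_per_desktop))    # ~91,700 IOPS at the maximum of the per-desktop average
    # With ~85% of IO being writes, a read cache such as PAM II offers little benefit,
    # as noted in the Storage Recommendations section.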


VMware Virtual Center and ESX Performance
Two blade servers were installed as physical Virtual Center servers. Within each VC, a cluster is
created for each blade chassis of up to 16 ESX hosts. As there are two different hardware specifications in the
lab, the number of virtual desktops hosted on each VC isn't quite balanced.
Virtual Center 1
Blade Chassis    # Hosts    # Virtual Machines
camb4e1          16         898
camb4e2          16         800
Total            32         1698
During the testing only 1604 desktops were actively used. The remaining VMs remained powered off
though would still be enumerated by Virtual Center and XenDesktop Pool Management. These
additional VMs are present from earlier broker scalability testing.
Virtual Center 2
Blade Chassis    # Hosts    # Virtual Machines
camr3e1          16         481
camr3e2          14         392
camr5e1          16         480
camr5e2          15         420
Total            61         1773
In addition to the desktops above, VC2 also manages camr5e2b13 which hosts some infrastructure
VMs, e.g. 3 x Brokers and 1 x NetApp performance monitor.
Out of the 1773 desktop VMs only 1708 were powered on. As with VC1 these additional VMs were
present from earlier testing at higher host densities.



camr3e2b15: Virtual Center 1

[Charts: % Processor Time _Total (2 x 4-core CPUs) and Process % Processor Time for vpxd, shown for Pool Spin Up and During Test Run]

The vpxd service is exercised when XenDesktop Pool Management requests that VMs be powered up or shut down. This can be seen during the spin-up phase and at the end of the test run. As this server has 8 cores, the peak at ~200% is equivalent to 2 cores being fully utilised.

[Charts: Memory Committed Megabytes (MiB), PhysicalDisk % Idle Time (_Total) and Network Utilisation (Mbps, NIC1 and NIC2 Sent/Received), shown for Pool Spin Up and During Test Run]

camr3e2b16: Virtual Center 2


[Charts: % Processor Time _Total (2 x 4-core CPUs) and Process % Processor Time for vpxd, shown for Pool Spin Up and During Test Run]

The load on vpxd is consistent between the two VC servers. As this server has 8 cores, the peak at ~230% is equivalent to a little more than 2 cores being fully utilised.

[Chart: Memory Committed Megabytes (MiB), shown for Pool Spin Up and During Test Run]

The memory used on each VC is similar, though VC2 is ~300MiB higher. This is to be expected, as it is managing twice the number of ESX hosts and a higher number of VM guests.

[Charts: PhysicalDisk % Idle Time (_Total) and Network Utilisation (Mbps, NIC1 and NIC2 Sent/Received), shown for Pool Spin Up and During Test Run]


ESX Performance
The test environment consists of 2 different hardware configurations running the desktop workload.
The data below is from a BL460c with 32 GB RAM and 2 x L5420 Quad Core CPU.
[Chart: CPU Usage (2 x L5420 Quad Core 2.5GHz CPU) -- average usage for each of cores 0-7, shown for pool spin-up and during the test run]


[Chart: Memory Usage -- Memory Balloon, Memory Shared Common, Memory Granted, Memory Swap Used and Memory Active (MiB) plus Average Memory Usage (%), shown for pool spin-up and during the test run]

[Chart: Disk Usage in Kilobytes/second -- Disk Read Rate and Disk Write Rate for vmhba0:0:0, shown for pool spin-up and during the test run]

This traffic is on the local physical disk of the ESX host rather than on the VMs' disks, as those reside on NFS shared storage. The frequency of the disk activity would suggest some logging, perhaps of performance data from the VMs. The rate of traffic appears to be proportional to the number of running virtual machines.
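If the local-disk write rate really does track the number of running VMs, a rough per-VM logging rate can be estimated by dividing the observed write rate by the VM count at the same point in time. A minimal sketch under that assumption (the sample pairs are illustrative placeholders, not values measured in this test):

    # Rough estimate of per-VM local-disk write rate on an ESX host, assuming the
    # rate scales linearly with the number of running VMs.
    def per_vm_write_rate(samples):
        """samples: list of (disk_write_kbps, running_vm_count) pairs."""
        rates = [kbps / vms for kbps, vms in samples if vms > 0]
        return sum(rates) / len(rates) if rates else 0.0

    samples = [(400, 20), (900, 45), (1500, 75)]   # hypothetical (KBps, VM count) pairs
    print(f"~{per_vm_write_rate(samples):.1f} KBps of local-disk writes per running VM")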
[Chart: Network Utilisation (Mbps) -- vmnic0 and vmnic1 Mbps sent and received, shown for pool spin-up and during the test run]


Summary
• Spend extra time and care on how you simulate the user workload, as it heavily influences all design recommendations.
o Don't forget to consider the entire user population and how and when login storms will occur.
• Use free and reputable tools such as Login VSI from Login Consultants to simulate real-world-like user workloads.
• Design for failover; your infrastructure size will depend on what user experience you want during failover (degraded or not, and by how much). A simple capacity sketch is shown after this list.
o Use central storage and blade servers for scale and reliability.
• Virtualize most major components of XenDesktop.
o Provisioning Services in this design was not virtualized; given its high scalability, you should dedicate a physical server to it in your design. Running PVS virtualized will be an option, but look for recommendations on this in an upcoming document.
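As a rough illustration of the failover point above, the host count can be sized from the desktops-per-host density and the number of host failures you want to tolerate without losing capacity. A minimal sketch with hypothetical input values (not figures taken from this validation):

    # Size the ESX host count so that full desktop capacity remains available
    # after a chosen number of host failures. Inputs are hypothetical examples.
    import math

    def hosts_required(total_desktops, desktops_per_host, tolerated_host_failures=1):
        base = math.ceil(total_desktops / desktops_per_host)
        return base + tolerated_host_failures

    # Example: 5000 desktops at ~80 desktops per host, tolerating one host failure.
    print(hosts_required(5000, 80, tolerated_host_failures=1))   # -> 64

If a degraded experience during failover is acceptable, the same calculation can instead allow a higher per-host density in the failure case rather than adding spare hosts.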


Appendix A: Blade Server Hardware and Deployment

The test environment consists primarily of HP blade servers. Some additional servers hosting the infrastructure for specific test components are detailed later in this report. VMware ESX was installed on the two different specifications of BL460c server, labelled (V1) and (V2), which were used to host both the Windows XP desktops and a small number of VMs for the XenDesktop brokers (DDCs).
The BL680 servers were used to host two Citrix Provisioning Services servers and a Microsoft SQL Server. These machines were somewhat over-specified for their roles.
BL460c (v1): 1.86GHz Dual Processor Quad Core, 16GB RAM
2 x 1.86GHz Intel Xeon L5320 Quad Core (8MiB L2 Cache, 1066MHz Bus)
1 x 36GB SAS HDD, 10K rpm
16GB RAM, 667MHz
Dual Broadcom 1Gb NICs
QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html

BL460c (v2): 2.5GHz Dual Processor Quad Core, 32GB RAM
2 x 2.5GHz Intel Xeon L5420 Quad Core (12MiB L2 Cache, 1333MHz Bus)
1 x 72GB SAS HDD, 10K rpm
32GB RAM, 667MHz
Dual Broadcom 1Gb NICs
QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html


BL680c G5: 2.4GHz Quad Processor Hex Core, 64GB RAM
4 x 2.4GHz Intel Xeon E7450 Hex Core (9MiB L2 Cache, 12MiB L3 Cache, 1000MHz Bus)
2 x 72GB SAS HDD, 10K rpm
64GB RAM, 667MHz
8 x Broadcom 1Gb NICs
QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/680c/index.html


Blade Deployment


Appendix B: Network Diagram

This is a predominantly HP blade-based environment running the virtual machines. Dell 1950 1U servers are used to run many ICA clients on the same server to connect into the environment.
The environment was originally designed to use Fibre Channel for storage traffic; however, in this testing NFS was used as it offers greatly simplified management and scalability.
All traffic is passed either to a top-of-rack Cisco 2960-G switch or via the Cisco blade switch modules in the blade enclosures back to a central Cisco 4510 chassis. This chassis houses multiple 1GbE and 10GbE line cards in addition to the supervisor modules. Where the blade switches support stacking, this feature has been used.


Fibre Channel Storage Network

The Fibre Channel network is used only for the databases on the SQL Server running on one of the BL680 blade servers.
All other storage traffic uses NFS over Ethernet links.


REFERENCES
Citrix Knowledgebase Articles:
Separating the Roles of Farm Master and Controller in the XenDesktop Farm (CTX117477)
Registry Key Entries Used by XenDesktop (CTX117446)
NetApp:
Deployment Guide for XenDesktop 3.0 and VMware ESX Server on NetApp (TR-3795)
NetApp and VMware Virtual Infrastructure 3 Storage Best Practices (TR-3428)
Citrix XenServer 5.0 and NetApp Storage Best Practices (TR-3732)
Citrix XenDesktop 2.0 with NetApp Storage Pilot Deployment Overview (TR-3711)
2,000-Seat VMware View on NetApp Deployment Guide Using NFS (TR-3770)
Project VRC / Login Consultants:
VRC, VSI and Clocks Reviewed
VMware Platform Performance Index v1.1
XenServer Platform Performance Index v1.0
VMware:
VMware Virtual Infrastructure 3.5 Configuration Maximums
Comparison of Storage Protocol Performance
NetApp FAS2020HA Unified Storage


About Citrix
Citrix Systems, Inc. (NASDAQ:CTXS) is the leading provider of virtualization, networking and software-as-a-service technologies for more than 230,000 organizations worldwide. Its Citrix Delivery Center, Citrix Cloud Center (C3) and Citrix Online Services product families radically simplify computing for millions of users, delivering applications as an on-demand service to any user, in any location, on any device. Citrix customers include the world's largest Internet companies, 99 percent of Fortune Global 500 enterprises, and hundreds of thousands of small businesses and prosumers worldwide. Citrix partners with over 10,000 companies worldwide in more than 100 countries. Founded in 1989, annual revenue in 2008 was $1.6 billion.

© 2010 Citrix Systems, Inc. All rights reserved. Citrix, Access Gateway, Branch Repeater, Citrix Repeater, HDX, XenServer, XenApp, XenDesktop and Citrix Delivery Center are trademarks of Citrix Systems, Inc. and/or one or more of its subsidiaries, and may be registered in the United States Patent and Trademark Office and in other countries. All other trademarks and registered trademarks are property of their respective owners.

