ABSTRACT
This paper provides guidance on best practices for EMC XtremIO deployments with
Brocade Storage Area Network (SAN) Fabrics. SANs must be designed so applications
can take full advantage of the extremely low latency, high IO, and bandwidth
capabilities of all-flash arrays (AFA) such as XtremIO.
August 2015
Authors:
Marcus Thordal, Director, Solutions Marketing, Brocade
Anika Suri, Systems Engineer, Brocade
Victor Salvacruz, Corporate Solutions Architect, EMC XtremIO
To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local
representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store
TABLE OF CONTENTS
EXECUTIVE SUMMARY
AUDIENCE
TERMINOLOGY
OLTP
VSI
EXECUTIVE SUMMARY
With a new generation of all-flash arrays (AFAs) providing unprecedented storage performance, storage area networks (SAN) must
be designed so applications can take full advantage of the extremely low latency, high IO, and bandwidth capabilities of the arrays.
In this paper, we provide guidance on best practices for EMC XtremIO deployments with Brocade SAN Fabrics.
Whether you are deploying AFA storage for dedicated or mixed-application workloads or adding XtremIO cluster(s) to an existing
SAN, this paper provides you with a methodology based on joint tests with EMC and Brocade.
AUDIENCE
This guide is intended for IT data storage architects and SAN administrators who are responsible for storage or SAN design based on
the EMC XtremIO storage system and Brocade Gen 5 Fibre Channel SAN fabrics.
TERMINOLOGY
Below are some commonly used terms and abbreviations that you will find throughout this document.
AFA      All-flash array
ICL      Inter-Chassis Links, also referred to as UltraScale ICLs: dedicated ports for connectivity between DCX 8510 directors, providing up to 64 Gbps of throughput per link
ISL      Inter-switch links provide connectivity between two Fibre Channel switches via E_Ports
OLTP     Online transaction processing
VDI      Virtual desktop infrastructure
VSI      Virtual server infrastructure
X-Brick  An X-Brick storage appliance is the building block of an XtremIO storage system; X-Bricks can be clustered to scale performance and capacity
XDP      XtremIO Data Protection
XIOS     XtremIO Operating System
DAE      Disk-array enclosure
Table 1. Terminology
When deploying dedicated SANs for XtremIO cluster storage services, the SAN design can be tailored directly based on the
application workloads and scalability requirements. In the next sections, we discuss how different application types and scalability
requirements influence the SAN design decisions.
APPLICATION WORKLOADS
VDI
When the application workloads are well known and the AFA serves few (one or two) application types, reasonably precise estimates
for the optimal fan-in ratio of compute to storage are possible. For example, if the AFA is used for a VDI application, each compute
(server) will host a fixed number of VDI instances with a known IO profile. This profile determines the number of IOs per server and
correlates to a fixed number of servers per storage port and the number of X-Bricks necessary for the XtremIO cluster.
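The VDI sizing logic above can be sketched as a short calculation. All figures below (IOPS per desktop, desktops per server, usable IOPS per storage port) are illustrative assumptions for the sketch, not XtremIO specifications; substitute the measured IO profile of your own VDI image.

```python
# Sketch: estimating the fan-in of servers per XtremIO storage port for a
# VDI workload with a known IO profile. All numeric inputs are assumed
# example values, not vendor specifications.
iops_per_desktop = 15       # assumed steady-state IOPS per VDI instance
desktops_per_server = 100   # assumed VDI density per compute server
port_iops_budget = 50_000   # assumed usable IOPS per storage port

iops_per_server = iops_per_desktop * desktops_per_server        # IOPS one server generates
servers_per_storage_port = port_iops_budget // iops_per_server  # fan-in per storage port

print(f"IOPS per server: {iops_per_server}")
print(f"Servers per storage port (fan-in): {servers_per_storage_port}")
```

From the fan-in and the total server count, the number of storage ports, and hence X-Bricks, needed for the XtremIO cluster follows directly.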
OLTP
In the case of an OLTP database application, the database runs on one or a few clustered servers, which determines the number of servers per storage port and XtremIO cluster. ISLs can then be sized accordingly to provide the necessary capacity between the servers and the XtremIO cluster(s).
VSI
When the AFA is serving multiple applications, it becomes more difficult to determine the fan-in ratio. To ensure adequate capacity
between servers and XtremIO storage clusters, it is necessary to size the ISLs between the SAN switches more conservatively by
provisioning more bandwidth than if the application workload is very well known. In this way, administrators can mitigate the
potential for a negative performance impact in a shared SAN or mixed application environment.
Figure 2 is an example of a redundant SAN design and Figure 3 shows an example of a redundant and resilient SAN design.
Figure 2. Example of a redundant SAN design
Figure 3. Example of a redundant and resilient SAN design
For most XtremIO deployments, a core-edge topology is best suited to meet the needs of scalability and uniform IO performance. Figure 4 provides an example of a core-edge topology with X-Bricks connected at the core and hosts connected at the edge. Edge switches may be placed at the Top of Rack (ToR) or Middle of Row/End of Row (MoR/EoR), depending on the server density in the racks and the port density of the edge switch used in the design.
Figure 4. Core-edge topology with X-Bricks connected at the core and hosts connected at the edge
For small XtremIO deployments (one to two XtremIO clusters throughout the lifecycle of the environment, with limited requirements for scale), a simple collapsed design, with a single backbone switch in each fabric and hosts and X-Bricks directly connected, will be sufficient. Figure 5 shows an example of a collapsed design with hosts and storage directly connected to a backbone switch.
Figure 5. Collapsed design with hosts and storage directly connected to a backbone switch
For very large scale XtremIO deployments with multiple X-Brick clusters, a full-mesh UltraScale ICL design is recommended. This design, utilizing the ICLs on the DCX 8510 platform for switch fabric backplane interconnect between the switches, combines completely flexible placement of hosts and storage with the maximum level of scalability and uniform IO performance. Figure 6 shows an example of the full-mesh UltraScale ICL design, enabling placement of hosts and storage independently across racks with the DCXs placed MoR/EoR.
Figure 6. Full-mesh UltraScale ICL design with hosts and storage placed independently across racks and DCXs placed MoR/EoR
For core-edge designs, a good rule of thumb is to provision ISL bandwidth equal to the total capacity of all X-Brick ports which are accessible (zoned) to hosts attached to each ToR switch.
For collapsed SAN designs, calculate the number of host and storage ports for the lifetime of the infrastructure.
For UltraScale ICL designs, ICL capacity must be deployed to match the total number of X-Brick ports which are not local on each DCX Backbone.
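The core-edge rule of thumb above can be sketched as a short calculation. The port counts and link speeds below are illustrative assumptions, not values from this paper; the point is simply that ISL bandwidth from each ToR switch should match the aggregate bandwidth of the X-Brick ports zoned to its hosts.

```python
# Sketch of the core-edge ISL sizing rule of thumb: ISL bandwidth from each
# ToR edge switch should equal the total bandwidth of all X-Brick ports
# zoned (accessible) to hosts on that switch. Counts and speeds are
# example assumptions.
import math

xbrick_ports_zoned = 8   # assumed X-Brick ports visible to hosts on this switch
port_speed_gbps = 16     # Gen 5 Fibre Channel port speed
isl_speed_gbps = 16      # each ISL is also a 16 Gbps link

required_isl_bw = xbrick_ports_zoned * port_speed_gbps   # aggregate target bandwidth
isl_count = math.ceil(required_isl_bw / isl_speed_gbps)  # ISLs needed to match it

print(f"Required ISL bandwidth: {required_isl_bw} Gbps -> {isl_count} ISLs")
```

With faster ISLs (or ICL links at 64 Gbps, as on the DCX 8510), the same target bandwidth is met with proportionally fewer links.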
Figure 7. Initial core-edge SAN design
This architecture meets the requirements for connectivity initially. With incremental growth of servers and storage, new servers are
connected with edge switches and storage is connected at the core as shown in Figure 8.
Figure 8. Scaling by adding edge switches (and port blades on the backbone as needed)
In our example, the SAN is scaled to provide connectivity for 400 servers with access to both traditional non-flash and XtremIO storage. Per fabric, the split is 36 traditional non-flash storage ports and two distinct XtremIO clusters, each with a single X-Brick, requiring four storage ports at the core. This configuration results in an overall host to storage port ratio (fan-in) of 10:1.
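The fan-in figure above follows directly from the port counts in the example and can be checked with a one-line calculation:

```python
# Checking the fan-in ratio from the example above: 400 hosts per fabric
# against 36 traditional storage ports plus 4 XtremIO ports (two clusters,
# a single X-Brick each, at the core of each fabric).
hosts = 400
traditional_ports = 36
xtremio_ports = 4

storage_ports = traditional_ports + xtremio_ports   # 40 storage ports per fabric
fan_in = hosts / storage_ports                      # host-to-storage-port ratio

print(f"Host to storage port fan-in: {fan_in:.0f}:1")
```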
With anticipated incremental growth of 30 percent year over year in servers and storage during a five-year lifespan, the solution scales from 400 hosts (and 44 total storage ports) to 1,150 hosts, 104 traditional non-flash storage ports, and 24 XtremIO ports across two XtremIO clusters with six X-Bricks per cluster, for a total of approximately 1,598 used ports including ISLs.
The design can scale well beyond the example shown using dual core or ICL design as discussed in the previous section.
Note: Though the host to storage port ratio requirements, growth rate, and port utilization objectives may differ in your environment, we use this calculation example to illustrate how to project and accommodate the growth of a SAN fabric.
The key building blocks and design assumptions are:
Edge switch = Brocade 6520 (Gen 5 switch with a total of 96 ports)
On each edge switch, planned port allocations are 60 ports for host connections and 12 ports for ISL connections to the backbone core. The remaining 24 ports on the edge switch are reserved as a buffer for unanticipated server growth or ISL bandwidth needs.
On the backbone core, the 48-port blades (FC16-48) are used for storage and ISLs. Throughout the lifetime of the solution, 286 ports will be consumed on the DCX Backbone core.
Table 2 shows how the fabric grows year by year using the anticipated growth rate of 30 percent per year.
                            Year 1   Year 2   Year 3   Year 4   Year 5
Edge hosts                     400      520      680      880     1150
Traditional storage ports       36       48       64       86      104
XtremIO ports                    8       12       16       20       24
Total storage ports             44       60       80      106      128
ISLs                            56       72       96      120      160
Total edge ports               456      592      776     1000     1310
Total core ports               100      132      176      226      288
Edge switches                    6        8       11       14       18
DCX-8510-8 Backbone              1        1        1        1        1
FC16-48 port blade               3        4        5        6        8
Ports used                     556      724      952     1226     1598
Ports unused                   164      236      346      164      514
Port utilization               77%      75%      73%      88%      76%
Total ports per fabric         720      960     1296     1390     2112
Table 2. Fabric growth year by year at 30 percent annual growth (per fabric)
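The host counts behind Table 2 follow from compounding 30 percent growth on the 400-host starting point. The sketch below shows the raw compounding; note that the paper rounds each year up to planning-friendly values (for example, 676 becomes 680 hosts in year 3), which is a reasonable practice when the result drives physical switch purchases.

```python
# Sketch of the 30 percent year-over-year growth projection behind Table 2.
# Raw compounded values are printed; the paper rounds each year up to
# convenient planning numbers.
initial_hosts = 400
growth_rate = 0.30
years = 5

hosts = [initial_hosts]
for _ in range(years - 1):
    hosts.append(hosts[-1] * (1 + growth_rate))   # compound one more year

for year, h in enumerate(hosts, start=1):
    print(f"Year {year}: {h:.0f} hosts (raw)")
```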
ZONING
Brocade Best Practice for Zoning
Zoning is a fabric-based service in SANs that groups host and storage nodes which need to communicate. It allows nodes to
communicate with each other only if they are members of the same zone. Nodes can be members of multiple zones, allowing for a
great deal of flexibility when you implement a SAN using zoning.
Zoning not only prevents a host from unauthorized access of storage assets, but it also stops undesired host-to-host communication
and fabric-wide Registered State Change Notification (RSCN) disruptions.
Brocade recommends that users always implement zoning, even if they are using LUN masking. Also, PWWN identification for zoning
is recommended for both security and operational consistency. For details, please refer to the Brocade SAN Fabric Administration
Best Practices Guide.
Zone membership is primarily based on the need for a host to access a storage port. Hosts rarely need to interact directly with each
other and storage ports never initiate SAN traffic by virtue of their nature as targets. Zones can be grouped by array, by host
operating system, by application, or by location within the data center.
The recommended grouping method for zoning is single initiator zoning (SIZ), sometimes called single HBA zoning. With SIZ, each
zone has only a single host bus adapter (HBA) and one or more storage ports. It is recommended that you use separate zones for
tape and disk traffic when an HBA is carrying both types of traffic.
SIZ is optimal because it prevents any host-to-host interaction and limits RSCNs to just the zones that need the information within
the RSCN.
Figure 9. Single initiator zoning (SIZ)
2 HBAs to 2 X-Bricks:
HBA1 -> X1_SC1_FC1, X1_SC2_FC1, X2_SC1_FC1, X2_SC2_FC1
HBA2 -> X1_SC1_FC2, X1_SC2_FC2, X2_SC1_FC2, X2_SC2_FC2
Table 3
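The single initiator zoning pattern of Table 3 (one zone per HBA, each containing that HBA plus its storage ports) can be sketched as a small generator. The helper function and zone naming convention below are illustrative assumptions, not a Brocade or EMC convention.

```python
# Sketch: generating single initiator zones (SIZ) for a host with two HBAs
# against a two X-Brick XtremIO cluster, mirroring the pattern of Table 3.
# The zone-name format "z_<host>_<hba>" is a hypothetical convention.

def make_siz_zones(host, hbas, storage_ports_per_hba):
    """Build one zone per HBA: a single initiator plus its storage ports."""
    zones = {}
    for hba, ports in zip(hbas, storage_ports_per_hba):
        zones[f"z_{host}_{hba}"] = [hba] + ports
    return zones

# HBA1 (fabric A) is zoned to the FC1 ports of both X-Bricks,
# HBA2 (fabric B) to the FC2 ports.
fabric_a = make_siz_zones(
    "host01",
    ["HBA1"],
    [["X1_SC1_FC1", "X1_SC2_FC1", "X2_SC1_FC1", "X2_SC2_FC1"]],
)
fabric_b = make_siz_zones(
    "host01",
    ["HBA2"],
    [["X1_SC1_FC2", "X1_SC2_FC2", "X2_SC1_FC2", "X2_SC2_FC2"]],
)

for name, members in {**fabric_a, **fabric_b}.items():
    print(name, "->", ", ".join(members))
```

Because each zone contains exactly one initiator, host-to-host interaction is impossible and RSCNs stay scoped to the zones that need them, as described above.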
4 HBAs to 2 X-Bricks:
HBA1 -> X1_SC1_FC1, X1_SC2_FC1, X2_SC1_FC1, X2_SC2_FC1
HBA2 -> X1_SC1_FC2, X1_SC2_FC2, X2_SC1_FC2, X2_SC2_FC2
HBA3 -> X1_SC1_FC1, X1_SC2_FC1, X2_SC1_FC1, X2_SC2_FC1
HBA4 -> X1_SC1_FC2, X1_SC2_FC2, X2_SC1_FC2, X2_SC2_FC2
Table 4 (16 paths in total)
For more details, please refer to the XtremIO host configuration guide available on the EMC support website.
Flow Monitoring: Monitors specified traffic flows from source to destination through the SAN
Flow Mirroring: Captures packet data as it flows through the SAN, then displays and analyzes the captured packet data
Flow Vision is best suited to temporary use while troubleshooting high-latency conditions, as continual use results in the collection of large amounts of diagnostic data. It can be used as needed to verify performance for the most demanding applications.
4. Fabric Performance Impact Monitoring: Identifies and alerts administrators to device or ISL congestion and high levels of
latency in the fabric which can have a severe impact on all-flash array performance. FPI Monitoring provides visualization of
bottlenecks and identifies slow drain devices and impacted hosts and storage.
5. At-a-glance Dashboard: Includes customizable health and performance dashboard views, providing all critical information in
one screen. Viewable dashboard widgets that should be monitored include errors on all-flash array facing ports, top 10 flows,
memory usage, and port health.
6. Forward Error Correction (FEC): Automatically detects and recovers from bit errors, enhancing transmission reliability and
performance. FEC can reduce latency time significantly by preventing the need to retransmit frames with bit errors.
7. Credit Loss Recovery: Automatically detects and recovers buffer credit loss at the virtual channel level, providing protection
against performance degradation and enhancing application availability.
8. Compass (Configuration and Operational Monitoring Policy Automation Services Suite): An automated configuration
and operational monitoring policy tool that enforces consistency of configuration across the fabric and monitors changes,
simplifying SAN configuration and alerting you to changes. In medium to large sized environments, this can prevent inadvertent
changes to switch configurations that may impact the preferred parameters set across the fabric to optimize performance.
Intelligently classified events in-context within the VMware vCenter Log Insight dashboards
Pre-defined queries, alerts, and fields that can be customized to specific environments, offering both simplicity and flexibility
Increased visibility of issues across the SAN network and operational efficiency with Brocade Fabric Vision Technology integration
For details, please refer to the Brocade and VMware Technology Alliance at http://www.brocade.com/partnerships/technologyalliance-partners/partner-details/vmware/index.page
COMPREHENSIVE REPORTING
SAN Health utilizes two main components: a data capture application and a back-end report processing engine. After it finishes the
capture of switch diagnostic data, the back-end reporting process automatically generates a Visio topology diagram and a detailed
snapshot report on the user's SAN configuration. This report contains summary information about the entire SAN as well as specific
details about fabrics, switches, and individual ports. Other useful items in the report include alerts, historical performance graphs,
and recommended best practices.
Because SAN Health provides a point-in-time snapshot of your customer's SAN, Brocade recommends using SAN Health to track
traffic pattern changes in weekly or monthly increments. With a built-in scheduler, the user can run SAN Health after primary
business hours for added safety and convenience. Additional detailed information, sample reports, Visio diagrams, and a list of
supported devices are available at http://www.brocade.com/sanhealth.
EMC Mitrend takes the comprehensive output from SAN Health and creates an easy-to-understand summary presentation to help the
customer make important decisions about the SAN environment. Using SAN Health data, Mitrend recommends consolidation and
technology refresh options.
Together, SAN Health and Mitrend provide the information customers need to quickly monitor the health of their SANs and make
essential daily and long-term decisions.
LUN QUEUE
Generally speaking, it is recommended that you configure more than a single LUN on the XtremIO storage system. Doing so increases the available I/O queue depth, because queues are allocated per LUN across the paths created via zoning. A good rule of thumb is to configure at least four LUNs per application or per host.
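The effect of adding LUNs can be sketched numerically. Queue slots are typically allocated per LUN per path, so total outstanding-I/O capacity scales with both. The per-LUN queue depth of 32 below is an illustrative HBA-style default, not an XtremIO-specific value.

```python
# Sketch: why multiple LUNs increase available queue depth. Total
# outstanding I/O capacity scales with LUN count and path count.
# The per-LUN queue depth (32) is an assumed example value.
paths_per_lun = 4        # e.g. paths resulting from zoning in each fabric
per_lun_queue_depth = 32 # assumed host-side queue depth per LUN per path

for luns in (1, 4):
    total_queue = luns * paths_per_lun * per_lun_queue_depth
    print(f"{luns} LUN(s): {total_queue} outstanding I/Os possible")
```

Under these assumptions, moving from one LUN to the recommended four quadruples the outstanding I/O the host can keep in flight.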
STORAGE TUNING
XtremIO has no knobs for tuning storage, since every application has full access to all resources of the array. XtremIO outlines recommendations and guidelines for various platforms in the XtremIO Storage Host Configuration Guide.
APPENDIX A: REFERENCES
EMC
EMC Storage Analytics 3.1.1 Installation and User Guide
EMC Storage Analytics 3.1.1 Release Notes
XtremIO Storage Array User Guide
XtremIO Release Notes
XtremIO Storage Array Software and Upgrade Guide
XtremIO Performance and Data Services for Oracle Databases
XtremIO 3.0.1, 3.0.2, and 3.0.3 Storage Array Pre-Installation Checklist
XtremIO 3.0.3 Global Services Product Support Bulletin
XtremIO 3.0.1, 3.0.2, and 3.0.3 Storage Array Software Installation and Upgrade Guide
BROCADE
Brocade SAN Fabric Administration Best Practices Guide (http://www.brocade.com/downloads/documents/best_practice_guides/sanadmin-best-practices-bp.pdf)
Brocade Fabric OS Administrator's Guide
(http://www.brocade.com/downloads/documents/product_manuals/B_SAN/FOS_AdminGd_v730.pdf)