Isilon Solution Design Course

Student Guide
Copyright© 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell
Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA.
Use, copying, and distribution of any DELL EMC software described in this publication requires an applicable software
license. The trademarks, logos, and service marks (collectively "Trademarks") appearing in this publication are the
property of DELL EMC Corporation and other parties. Nothing contained in this publication should be construed as
granting any license or right to use any Trademark without the prior written permission of the party that owns the
Trademark.
AccessAnywhere, Access Logix, AdvantEdge, AlphaStor, AppSync, ApplicationXtender, ArchiveXtender, Atmos,
Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Aveksa,
Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC
CertTracker, CIO Connect, ClaimPack, ClaimsEditor, Claralert, CLARiiON, ClientPak, CloudArray, Codebook Correlation
Technology, Common Information Model, Compuset, Compute Anywhere, Configuration Intelligence, Configuresoft,
Connectrix, Constellation Computing, CoprHD, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge, Data
Protection Suite, Data Protection Advisor, DBClassify, DD Boost, Dantz, DatabaseXtender, Data Domain, Direct Matrix
Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document Sciences, Documentum, DR Anywhere, DSSD, ECS,
eInput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender, EMC Centera, EMC ControlCenter, EMC LifeLine,
EMCTV, Enginuity, EPFM, eRoom, Event Explorer, FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony,
Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, Illuminator, InfoArchive, InfoMover,
Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, Isilon, ISIS, Kazeon, EMC LifeLine, Mainframe Appliance
for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor, Metro, MetroPoint, MirrorView, Mozy, Multi-Band
Deduplication, Navisphere, Netstorage, NetWitness, NetWorker, EMC OnCourse, OnRack, OpenScale, Petrocloud,
PixTools, Powerlink, PowerPath, PowerSnap, ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven
Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect,
RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO, Smarts, Silver Trail, EMC Snap,
SnapImage, SnapSure, SnapView, SourceOne, SRDF, EMC Storage Administrator, StorageScope, SupportMate,
SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex, UltraPoint,
UltraScale, Unisphere, Universal Data Consistency, Vblock, VCE, Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix
Architecture, Virtual Provisioning, Virtualize Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe,
Voyence, VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xPression, xPresso, Xtrem,
XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise Storage.
Revision Date: June 2017
Revision Number: MR-7TP-ISID0916, OneFS 8.1

Isilon Solution Design 2

Module 1: Platform Architecture
Lesson 1: Phases of Solution Design
Lesson 2: Sizing
Lesson 3: Node Types
Lesson 4: Node Fundamentals
Lesson 5: Rack and Stack
Lesson 6: Solution Concepts
Module 2: Data Layout and Protection
Lesson 1: Journaling
Lesson 2: File Striping
Lesson 3: Data Protection
Lesson 4: Working with Small Files
Lesson 5: Caching
Lesson 6: Read and Write Performance
Module 3: Networking
Lesson 1: Networking
Lesson 2: Multi-tenancy
Lesson 3: SmartConnect
Lesson 4: Access Zones
Module 4: Data Management
Lesson 1: Information Lifecycle Management
Lesson 2: File System Layout
Lesson 3: File Tiering
Lesson 4: Quotas
Lesson 5: Deduplication
Lesson 6: Snaps
Lesson 7: WORM Compliance
Lesson 8: Antivirus
Module 5: Replication and Recovery
Lesson 1: Replication
Lesson 2: SyncIQ Disaster Recovery
Lesson 3: NDMP Backups
Module 6: Authentication and Authorization
Lesson 1: Client Protocol Support
Lesson 2: Authentication and Authorization
Lesson 3: Permissions and User Identity
Lesson 4: Access Control
Module 7: Monitoring
Lesson 1: Job Engine
Lesson 2: Monitoring and Alerting
Module 8: Solution Tools
Lesson 1: Isilon Sizing Tool
Lesson 2: Other Assessment Tools
Lesson 3: Tools on Cluster
Module 9: Verticals and Horizontals
Lesson 1: Media and Entertainment
Lesson 2: Video Surveillance
Lesson 3: Home Directories and File Shares
Lesson 4: Hadoop
Lesson 5: Life Sciences
Lesson 6: Healthcare
Lesson 7: Oil and Gas
Lesson 8: Financial Services
Module 10: Competition
Lesson 1: ROI and TCO Objectives
Lesson 2: Creating an Analysis
Lesson 3: Competition and Technical Differentiators
References

Module 1: Platform Architecture


Upon completion of this module, you will be able to understand the phases of solution
design, understand sizing, differentiate Isilon hardware, and explain cluster environment

Lesson 1: Phases of Solution Design


Upon completion of this lesson, you should be able to describe the phases of solution
development and understand some of the tools used in the process.

Overview: Solution Design


Solution design is a process that analyzes the needs of a customer and formulates a solution
that addresses those needs. The process can be broken down into several broad phases.
First is the initial conversation. This is, essentially, the foundation of an investigation, probing
the customer for specifics that are used to formulate a proposal. The second phase, the
proposal, is understanding what the solution may look like. In this course, we will use a
topology graphic to represent our proposal to a given customer. The third phase is vetting
the proposal: finding gaps in the solution or identifying unnecessary noise (for example,
unneeded software). The content in each of the modules is intended to broaden your
expertise in Isilon components, features, and functions to help you discover flaws in the
proposal. Finally, once the questions have been answered and the research is complete, the
final phase, creating a solution document, can begin.
The proposal is dynamic and may change as customer requirements change or new
information comes to light. For example, individuals on the solution development team
bring different skill sets. A networking specialist may view the project differently than a
storage specialist. To find your way to a solid solution, it is important to include expertise in
all areas of the data center that may influence the design, such as network, virtualization,
application integration, storage, etc. Involve the best people to get a high quality result.

Interview, Phase 1: Investigation and Discovery


The goal is to quickly determine if Isilon is a good fit for this environment. Avoid investing a
lot of time if Isilon cannot solve the problem, especially given that Dell EMC has a wide
portfolio of other solutions. Try to gather the most relevant information up front. If, for
example, you discover the customer’s application is running an Oracle database that must
have block storage access, then Isilon would not be a good option to present.

Interview Tools


Several sources offer tools that help you rapidly assess whether Isilon suits the customer’s
needs. You can use them as is, or customize them to meet your
business needs. A well-understood, well-documented solution can help you stand out to the
Through the field tools page noted on the slide, you can navigate to the sizing tool, total cost
of ownership (TCO) tools, and calculators. Mainstay advisor also provides tools as well as
access to playbooks. Another great place to get playbook and reference architecture PDFs is
at Inside EMC. The playbooks are designed for developing knowledge and collateral around
Isilon solutions for specific verticals and horizontals.

Phase 2: Initial Proposal


Once you are done analyzing the customer responses to the pre-qualification questions, an
initial proposal can be formed. If Isilon is a good fit for the customer, your first
consideration should be which node type best serves this customer. Positioning
the proper platform or platforms is the baseline for the solution.

Phase 3: Challenge the Proposal


The Isilon Corporate Systems Engineering (CSE) Team can help you validate actual
configurations. Internal hardware resources (particularly in Seattle) can implement almost
any cluster configuration possible.
Consult with the customer when questions arise, especially when considering integration
into the customer's architecture. Limitations in the customer's environment could impede
integration with the Isilon solution. For example, what software does the storage solution
have to integrate with? Can their network infrastructure accommodate the cluster? What
kind of throughput demands might be required of the system?
Always iterate: test → revise proposal → test further → revise further, until the proposal is
solid. Document as you go in case you are working on multiple solution designs.

Phase 4: Solution Document


Get the right people to review the solution document. Often, the storage people know very
little about the network requirements. The backup team may have differing opinions from
the storage team. Get all stakeholders to approve the document, or you may end up
rewriting and revising to meet the goals of each group.
The solution document is intended to be a high-level solution design for the Isilon
environment and used as a reference for all future and ongoing changes and updates. It is
not intended to be an in-depth technical install guide. The document serves as a record of
the design decisions made to ensure that all EMC hardware and software aligns with the
customer requirements.

Putting It All Together


The customer will meet with you, because they have needs. You’ll ask the customer a whole
range of questions, and you will have answers. Is it an Isilon opportunity? Apparently so. Will
you ask all the questions? Come to think of it, probably not, because you don't know what
you don't know. You need to roll it all into a big picture so that you can really grasp what the
situation is at a high level, and make an intelligent recommendation.
Using the approach of “show” versus “tell” has several benefits. You are less likely to miss
details. You will get a better picture of customer workflows using graphics. You can clearly
educate the customer on architectural changes and benefits. You can highlight a better
cluster design more effectively than by just putting numbers into a sizing tool. You can
identify security and authentication challenges and also describe footprint and access needs.

Today’s Reality


Let’s explore what this may look like. You need to create the big picture, and the easiest way
is - a big picture. Grab a whiteboard if you can, and draw it all out. Start big and then drill
down. What are they using to store and manage unstructured data? What applications are
accessing data? What is their backup methodology? What are they using for a network?

Drill Down to Details


Drill down. Ask more detailed questions. How many users? What types of users: Windows,
Linux, Apple, and so on? Ask about workgroups, criticality of data access for each group, and
protocols, and especially note their pain points.

How Would It Look with Isilon?


Use the same structure to re-architect it on paper, centered on Isilon and related Dell EMC
products. Show the customer how it all fits together and why. Show less chaos. Show a
solution that addresses their pain points.

Lesson 2: Sizing


Upon completion of this lesson, you should be able to describe solution development
methodology, qualify an Isilon opportunity, and explain sizing impact in solution design.

Overview: Sizing


Sizing begins in the interview, investigation, and discovery phase of solution design and is
considered throughout the lifecycle of solution design. Sizing can be a very broad subject
that includes everything from ensuring the customer has sufficient rack space and power
distribution to ensuring enough disk capacity is available to reconstitute a large,
deduplicated data set. Sizing must be considered in all design phases. The initial interview will
typically yield general capacities such as the amount of capacity needed for the workflow
and network bandwidth required. The proposal will note more granular sizing
considerations such as protection overhead, snapshots, and division of data between drive
types. Challenging the proposal is where considerations such as L3 cache sizing will play a
role. The solution document should account for the expected growth of the environment.
Sizing is an iterative process. Can we really get it all in one pass? Probably not. Prepare the
customer for the idea that sizing properly is an organic process, and most sales calls require
multiple visits. Your goal is to hit 70-80% of the information you need to gather in the first
meeting. To do this, you must make sure that you are interviewing the right people. Rarely
does any one person understand the organization’s complete information lifecycle. Be aware
of scope creep, which is notoriously common. Customer requirements, often, are not all that
firm. Experience shows that most customers start with applying one or two applications to
storage, and then end up throwing others at it. Your proposal can wind up trying to hit a
moving target - just be aware that as customers continue thinking through their new system,
their goals for it can evolve. If you document goals at each stage, you can call attention to it
when the customer talks with a different goal in mind.
When all else fails, remember your tools. If a customer doesn’t have the information, tools
such as MiTrend can be a real game changer. MiTrend also covers many other EMC storage
systems, such as VNX, VNXe, and XtremIO.
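The granular sizing considerations named in this overview (protection overhead and snapshots) can be roughed out before opening the sizing tool. The sketch below is only an illustration: the function name and the two percentage inputs are assumptions made for this example, not values from the course, and real figures should come from the Isilon Sizing Tool.

```python
def usable_capacity_tb(raw_tb, protection_overhead, snapshot_reserve):
    """Estimate usable capacity after protection overhead and snapshot reserve.

    Both fractions are hypothetical planning inputs in the range [0, 1).
    """
    if not 0 <= protection_overhead < 1 or not 0 <= snapshot_reserve < 1:
        raise ValueError("fractions must be in [0, 1)")
    after_protection = raw_tb * (1 - protection_overhead)
    return after_protection * (1 - snapshot_reserve)


# Example inputs (assumed): 400 TB raw, 20% protection overhead,
# 10% of the remainder held back for snapshots.
print(round(usable_capacity_tb(400, 0.20, 0.10), 1))
```

Treat the result as a first-pass estimate to refine as the proposal is challenged.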

Qualifying an Isilon Opportunity


Shown here are examples of ideal opportunities. During the interview process the customer
may tell you their storage is used for unstructured data, or file-based data. A key takeaway is
to play the detective and ask questions, because the customer does not always touch upon key
areas. Ask how many clients access the current storage array, whether that number is trending
upward, and if so, at what rate.
Isilon is built to provide “N-way expansion,” so it shows its value best when the customer is
scaling out rapidly. With Isilon you are building out storage with nodes, and each node has a
performance profile. Each node will store 1/Nth of the data that you have at any point in time.
OneFS breaks a file into chunks and stores them across all the nodes. We do best with
files larger than 128 KB, so that we can stripe them across drives. Files need to have certain
properties, access patterns, and a certain amount of concurrency across the cluster to get
an even distribution of data. Data storage is a growing area, so although the cluster may
start out small, the goal is that the cluster will grow into a six- or eight-node cluster within the
next year.
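The idea that each of N nodes holds roughly 1/Nth of the data, and that files larger than 128 KB stripe best, can be sketched with a simple round-robin model. This is a simplified illustration only: the function name is hypothetical, protection units are ignored, and it is not the actual OneFS layout algorithm.

```python
STRIPE_UNIT = 128 * 1024  # 128 KB, the stripe size named in the text


def distribute_stripe_units(file_size_bytes, node_count):
    """Count the stripe units placed on each node with round-robin placement.

    Simplified model: ignores protection (parity or mirror) units.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    units = -(-file_size_bytes // STRIPE_UNIT)  # ceiling division
    per_node = [units // node_count] * node_count
    for i in range(units % node_count):
        per_node[i] += 1
    return per_node


# A 1 MiB file on 4 nodes: 8 units spread evenly across the nodes.
print(distribute_stripe_units(1024 * 1024, 4))
# A 64 KB file fits in one unit, so only one node holds its data.
print(distribute_stripe_units(64 * 1024, 4))
```

In this model, small files cannot spread across nodes, which is one way to see why larger files suit the platform better.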
There is no need to worry about high levels of client concurrency. AMD ran a test against a
10 node Isilon cluster using 74 client nodes, with 12 threads per node, 888 concurrent
threads, against the Isilon. The more client concurrency you have the better Isilon will shine.
What is the difference between concurrent and sequential? If the system has a lot of threads
grabbing data, ask whether it is grabbing the data randomly or in a sequential pattern.
Every solution has problems handling random I/O; however, note that such problems are
more pronounced on the OneFS system. OneFS performs very well in a mostly read
environment. In an environment that is primarily for scientific
computation, you will see a burst of writes followed by a burst of reads, then a lot of reads.
These types of patterns tend to be the patterns we go after.
We also do really well with home directories and other types of data where the files are large
enough to be broken down into 128 KB stripes. In summary, if you do not have data that is
growing, lots of concurrency, or a sequential access pattern, it is not an Isilon opportunity! If
you do have a strong “yes” to any of these three categories, there is a very good chance that
Isilon will do well in this environment.
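The qualification rule in the summary can be written as a small check. This is a sketch with hypothetical names. It reads the access-pattern criterion as a sequential access pattern, since the text above notes that random I/O is a weak spot.

```python
def qualify(data_growing, high_concurrency, sequential_access):
    """Return the names of the qualifying traits that are present."""
    traits = {
        "growing data": data_growing,
        "high client concurrency": high_concurrency,
        "sequential access pattern": sequential_access,
    }
    return [name for name, present in traits.items() if present]


matches = qualify(data_growing=True, high_concurrency=False, sequential_access=True)
print("Isilon opportunity" if matches else "Not an Isilon opportunity", matches)
```

An empty list means none of the three categories got a "yes", which the text treats as not an Isilon opportunity.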

Qualifying Questions


Use these questions to qualify an opportunity that, at first look, does not seem ideal.

• How much capacity are you looking for? Large capacity need - 30TB, 50, 100's to
• What does your infrastructure look like? Looking for a NAS solution? (If they are all
Fibre Channel, they’ll have significant infrastructure costs in order to upgrade to NAS.)
Check the number of clients and the connectivity type (1GigE or 10GigE). Looking to
consolidate a large number of NetApp filers and/or workflows into a single system?
• What applications access this storage? Block size, random vs. sequential, read vs.
write? Note: We don’t handle Oracle very well.
• What protocols: SMB, NFS, FTP, or other? Include version.
• How do you measure performance? How many operations per second (transactional,
databases, etc.)? What is the size of operations and are they concurrent or random?
Which is more important, throughput or latency? How many clients do they need to
support? Lots of objects per directory? What are your aggregate bandwidth requirements?
• What is the size and type of your existing NAS infrastructure? Lots of unstructured
file data? File-based data solution? Expressed a desire for a scale-out architecture?
• What is your expected annual growth (both performance and capacity)? Expecting
high data growth of 10% or more per year?
• How are you managing authentication and file security (AD, LDAP, other)?
• What is the makeup and size of your files and directory structure (deep vs. wide)? The
customer’s backup team is often the best authority on this question. If there is no
backup team, ask for a representative sample of data; for example, a sizable home
directory. Then you can extrapolate the answer from the sample. Tools such as
MiTrend can provide backup assessments or file system assessments, which are
perfect for providing sizing data.
• Determine the major reason why they are looking at a new solution. The reason for
replacement will guide you to which aspects of Isilon’s value you should primarily
emphasize to them.
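The answer to the annual growth question feeds directly into capacity planning. A minimal sketch of a compound growth projection follows. The function name and the example inputs are assumptions for illustration, and the model assumes a constant growth rate, which real environments rarely show.

```python
def projected_capacity_tb(current_tb, annual_growth, years):
    """Project capacity need assuming a constant compound annual growth rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_tb * (1 + annual_growth) ** years


# Example inputs (assumed): 100 TB today, 10% growth per year, 3-year horizon.
print(round(projected_capacity_tb(100, 0.10, 3), 1))
```

Rerun the projection whenever the customer revises the growth estimate, since the text notes that requirements tend to move.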

Beyond Top 10 Qualifying Questions


You could expand beyond the Top 10 questions to include questions around budget. Isilon
believes this is NOT a good question for an initial conversation. Cost is not the best starting
point for crafting an effective system. You may want to include DR in every campaign, even if
the customer didn’t ask for it. Designing for DR is a best practice. How about auto-tiering of
data, putting the right data on the right cost drive? If chargeback to various business units is
a consideration, the solution design can include segmenting the business units into unique
storage capacity. Is integration with virtual machines, using tools such as VAAI and VASA,
needed? Integration is present within Isilon, but we do not target virtual environments as a
primary use.
You’ll need to be proactive with investigating what the customer wants to do. A real world
example: A customer was running video playback from NL nodes, which worked fine. They
then decided that they also needed to add virtualization on the NL nodes. The virtualization
did not work very well on the NL nodes as that is not the NL’s primary use. When the
customer complained, it was too late to turn off the VMs because they were running
production gear. The customer had essentially changed the purpose of the system without
checking with Isilon support or engineering. The VMs could have worked great with the
addition of some S nodes if the right questions had been asked. Asking the right questions
when you are investigating can eliminate these types of issues.

Influencing Factors


External factors, data factors and cluster factors must also be considered prior to properly
sizing a solution.
Questions such as: Are they using 10GbE? Do they use jumbo frames? If the customer’s
cluster is not Layer 2 connected into the switch, recommend staying with standard-sized
frames. Jumbo frames provide a 2 to 5% performance improvement compared to standard
frames.
EMC will size in IOPS. How many IOPS are you going to drive? IOPS from the client to the
storage array and from the storage array back to the client, everything is in terms of IOPS
(read, write, namespace operations, sequential vs. random). To measure ops per second,
Isilon uses SPECsfs2008. Other data factors such as block size, metadata size, file sizes, and
node types are all important to us. You can improve latency by using a higher block size,
such as 512 KB for the write buffer. A Windows environment will auto-negotiate block size, so
this is difficult to change. If the company is using a custom application or in-house written
software, watch out for scenarios where the system flushes memory after every write - this
wipes out any performance advantage that caching could have provided. Performance can
turn abysmal. Explicit “write-and-flush” is about the worst thing a developer can do for
performance. Metadata can be aligned with SSDs for performance increases.
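The frame-size question above can be put in rough numbers by comparing how much of each frame on the wire is application payload. The sketch assumes TCP over IPv4 with no header options and counts the standard Ethernet per-frame overhead. It models protocol overhead only and ignores CPU and interrupt effects, so it is an estimate rather than a benchmark.

```python
ETHERNET_OVERHEAD = 38  # bytes per frame: preamble 8 + header 14 + FCS 4 + gap 12
IP_TCP_HEADERS = 40     # bytes inside the MTU: IPv4 20 + TCP 20, no options


def wire_efficiency(mtu):
    """Fraction of bytes on the wire that carry application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


standard = wire_efficiency(1500)  # standard-sized frames
jumbo = wire_efficiency(9000)     # jumbo frames
gain = jumbo / standard - 1

print(f"standard: {standard:.1%}, jumbo: {jumbo:.1%}, gain: {gain:.1%}")
```

Under these assumptions the gain from jumbo frames is a few percent, and it only materializes when every device in the path is configured for the larger MTU.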
If any of the conditions under cluster factors are present, you will see a slight degradation in
performance.
Platform characteristics outside of sizing criteria: A customer running a 108 NL-Series cluster
expects a certain performance level which they are achieving. If a drive then fails, what is the
impact on the 108 NL-Series cluster? If you have 100 drives, statistically, you are likely to
have at least one out at all times. On a small cluster, performance could be hurt by as much
as 50% on an NL-Series cluster. X-Series nodes can be affected by as much as 30%, and on
an S-Series cluster as much as 10% to 20%.
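The worst-case figures quoted above can be folded into a sizing check so that the proposal still meets the customer's target while a drive is out. The sketch uses the upper percentages from the text for small clusters. The function name and the baseline throughput are hypothetical.

```python
# Worst-case performance impact of a failed drive on a small cluster,
# taken from the upper figures quoted in the text.
WORST_CASE_IMPACT = {"NL": 0.50, "X": 0.30, "S": 0.20}


def degraded_throughput(baseline, series):
    """Return the throughput to plan for while a drive is failed."""
    return baseline * (1 - WORST_CASE_IMPACT[series])


# Example with an assumed healthy baseline of 1000 MB/s per cluster.
for series in ("NL", "X", "S"):
    print(series, round(degraded_throughput(1000, series)))
```

Comparing the degraded figure, rather than the healthy one, against the requirement gives a more conservative proposal.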

Sizing Tool


The Isilon Sizing Tool is a valuable and vital resource. Some wizards are designed for specific
workflows, while others help position a platform, such as the Node Pool
Search, which can present a side-by-side comparison. The sizing tool can be accessed on the
Isilon Sizing Tool web page, which includes a link to a demonstration that uses a case study for
the Media and Entertainment Sizing Tool.
Also on the Isilon Sizing Tool’s page under the TOOLS option is the file size calculator, which
provides quick metrics on efficiency and overhead for a given protection and file sizes.

Lesson 3: Node Types


Upon completion of this lesson, you should be able to differentiate the Gen 5 and Gen 6
product families and describe the different types of nodes and their designed purposes.

Isilon Building Blocks


We begin by looking at the building blocks of an Isilon storage system. Isilon
systems are made up of multiple units called ‘nodes’. The nodes combine with Isilon
software to create a ‘cluster’, which behaves as a single, central storage system for a
company’s data. There are primarily three generations of Isilon clusters currently in
the field: Generation 4, 5, and 6. This course focuses on Isilon Generation 5, or Gen
5, and Isilon Generation 6, or Gen 6. The predecessors to Isilon Gen 6 require a
minimum of three nodes to form a cluster. Each node is a single chassis with
compute and drives. A Gen 6 cluster requires a minimum of four nodes. A Gen 6
node is one quarter of a 4U chassis, and as shown, four nodes fit horizontally in a 4U
chassis. Let’s look at how nodes fit together to form a cluster.

Overview: Isilon Node and Cluster


Gen 5 and Gen 6 nodes can exist within the same cluster. Having both types of node
in one cluster is the typical path for a hardware refresh as customers incorporate and
scale with Gen 6 nodes. Currently, a cluster can have up to 144 nodes, regardless of
node type and mix.
You can add Gen 5 nodes to the cluster one at a time, provided the cluster has a
minimum of three nodes of the same series. For example, you cannot use the
capacity of a single X410 node if adding it to a cluster consisting of only S210 nodes.
You would need to add three X410s in this example. For Gen 6, nodes are added to
the cluster in node pairs as shown. A node pair is the minimum incremental node
Architecturally, every Isilon node is equal to every other Isilon node of the same type
in a cluster. No one specific node is a controller or filer. Instead, OneFS unites the
entire cluster in a globally coherent pool of memory, CPU, and capacity. OneFS
automatically distributes file data across the nodes for built-in high availability. So
when a file request is received by an available node, it requests the pieces of the file
from the nodes over the back end network, assembles the file, and delivers it to the
requesting client. Therefore requests are not processed through one controller node,
but rather the node which is most accessible based on availability.
Gen 5 vs. Gen 6: Platform Terminology


Let us take a moment to review the terminology that is used throughout this course.
In the Generation 5 family, a node was a self-contained single unit composed of both
computing and storage hardware, which was mounted directly into a rack. With the
Gen 6 Product Family, a single node is ¼ of a chassis where a chassis can hold up to
four nodes. The chassis is mounted into the rack, and the four nodes exist within the
chassis. The chassis is always 4U high in a rack, whereas Gen 5 nodes ranged from
1U through 4U. Gen 6 nodes consist of a compute module that fits into the back of
the chassis, and drive sleds for the storage that fit into the front of the chassis.
Because a fully populated chassis contains four nodes, the minimum cluster size has
increased from three to four nodes. Additional nodes must be added in node pairs in
either the left half or right half of the chassis. Node pairs allow for various newly
designed features and protection mechanisms to work. The changes and functionality
of the node pairs are discussed later in the course.
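The sizing rules above lend themselves to a quick sanity check during solution design. The following is a minimal Python sketch, not an Isilon tool: the function name validate_cluster and the grouping of nodes by model name are assumptions made for illustration, and applying the Gen 6 four-node minimum per node type is a simplification. The numeric limits are the ones stated in the text.

```python
from collections import Counter

# Limits taken from the course text.
GEN5_MIN_SAME_SERIES = 3   # at least three Gen 5 nodes of the same series
GEN6_MIN_NODES = 4         # one fully populated Gen 6 chassis
GEN6_INCREMENT = 2         # Gen 6 nodes are added in node pairs
MAX_CLUSTER_NODES = 144    # current cluster limit, any node mix


def validate_cluster(gen5_nodes, gen6_nodes):
    """Return a list of rule violations for a proposed node mix.

    gen5_nodes and gen6_nodes are lists of model names (hypothetical input
    format), for example ["X410", "X410", "X410"] and ["H600"] * 4.
    """
    problems = []
    for model, count in sorted(Counter(gen5_nodes).items()):
        if count < GEN5_MIN_SAME_SERIES:
            problems.append(
                f"{model}: {count} node(s), need at least {GEN5_MIN_SAME_SERIES}")
    for model, count in sorted(Counter(gen6_nodes).items()):
        if count < GEN6_MIN_NODES:
            problems.append(
                f"{model}: {count} node(s), need at least {GEN6_MIN_NODES}")
        elif count % GEN6_INCREMENT != 0:
            problems.append(f"{model}: {count} nodes, Gen 6 nodes are added in pairs")
    total = len(gen5_nodes) + len(gen6_nodes)
    if total > MAX_CLUSTER_NODES:
        problems.append(f"cluster has {total} nodes, limit is {MAX_CLUSTER_NODES}")
    return problems


# A single X410 added to an S210 cluster violates the same-series rule.
print(validate_cluster(["S210"] * 3 + ["X410"], []))
# Six H600 nodes (one chassis plus one node pair) pass the check.
print(validate_cluster([], ["H600"] * 6))
```

An empty list means the proposed mix passes these particular checks; it does not replace the sizing guidance covered later in the course.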


Node Positioning


Gen 6 holds four nodes in a single chassis, so while it is true that you can get 800TB/chassis
of raw storage in A2000 nodes, that amounts to only 200TB/node. Conversely, an HD400
node can hold 60 data drives and one SSD (for caching); with a typical choice
of 8TB drives, that’s nearly half a petabyte per node. Let's take a closer look at mapping
between Gen 5 and Gen 6 platforms.
With Gen 6, there is no one-to-one mapping to, or direct comparison with, the nodes of the Gen 5
platform. There is no direct replacement for an S200 or an S210. Performance is easily
quantified; as you can see in the slide above, the H600 is superior to anything in Gen 5 or
Gen 4. However, with respect to per-node capacity, Gen 6 cannot be compared with the
HD400.
Also, the A2000 is denser than the HD400 by a factor of about 1.7, but the HD400 offers a
total cluster capacity of more than double that of A2000 nodes.
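The capacity figures quoted in this section can be reproduced with simple arithmetic. The sketch below is only a worked calculation in Python; it assumes the HD400 is a 4U node, uses decimal units (1 PB = 1000 TB), and applies the 144-node cluster limit mentioned earlier. The variable names are illustrative.

```python
# Figures quoted in the text.
NODES_PER_GEN6_CHASSIS = 4
A2000_RAW_TB_PER_CHASSIS = 800
HD400_DATA_DRIVES = 60
HD400_DRIVE_TB = 8
MAX_CLUSTER_NODES = 144

# Per-node raw capacity.
a2000_tb_per_node = A2000_RAW_TB_PER_CHASSIS / NODES_PER_GEN6_CHASSIS
hd400_tb_per_node = HD400_DATA_DRIVES * HD400_DRIVE_TB

# Density per 4U: one A2000 chassis versus one HD400 node (assumed to be 4U).
density_ratio = A2000_RAW_TB_PER_CHASSIS / hd400_tb_per_node

# Raw capacity of a maximum-size cluster of each node type, in PB.
a2000_cluster_pb = MAX_CLUSTER_NODES * a2000_tb_per_node / 1000
hd400_cluster_pb = MAX_CLUSTER_NODES * hd400_tb_per_node / 1000
cluster_ratio = hd400_cluster_pb / a2000_cluster_pb

print(f"A2000: {a2000_tb_per_node:.0f} TB/node, HD400: {hd400_tb_per_node} TB/node")
print(f"A2000 density advantage per 4U: {density_ratio:.2f}x")
print(f"HD400 cluster capacity advantage: {cluster_ratio:.1f}x")
```

Because the node count is capped per cluster rather than per rack unit, the denser chassis does not translate into the larger maximum cluster.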
F800, H600, S210 Positioning:
Customer responses that lean toward using the F800, H600, and S210 nodes are: small, high
performance working set, flexibility to add SSDs, and the need for high spindle count per
rack unit. Customer concerns such as “We have a lot of random workloads” and “The existing
system just isn’t performing like it used to” are indicators that these platforms may be a
good fit. Avoid measuring the platform value in “dollars per terabyte”. If the customer brings
it up, shift to “dollars to IOPS” or “storage per rack unit.” A rack unit (U) is the space
that a device takes up in a rack; for example, an accelerator node is only 1U, whereas an S210 is 2U and an
X410 is 4U.
H500, H400, X-Series Positioning:
Customer responses that lean toward the H500, H400, and X-Series nodes are: the need for
high concurrency, small initial capacity, large capacity needs, high throughput, high spindle
count (X410), and home directories with SSDs. Customer statements such as “We are looking
for a general purpose system or utility storage” and “We have a lot of large files” are
indicators that these platforms may be a good fit.
These platforms are challenged in environments with high IOPS but low capacity, high IOPS per U, and
very random workloads.
Archive Platform Positioning:
Customers who need deep, long-term storage with little to no change rate, an archival
solution, large capacity, disk-based backup, and minimal performance requirements would
lean toward the A200, A2000, NL410, and HD400. Challenges arise when
performance is “nonexistent” compared to typical expectations and when there are large
numbers of files that will require backing up or replicating.
A100 Positioning:
Accelerators have all L1 Cache, so if an application (or series of them) reads so frequently
that data should remain in L1, this is the solution. Video rendering is a good example. The
A100 can aid with VM-intensive applications. Accelerators excel with single-stream
performance. Examples are media applications moving into 4K video (Ultra HD), and the
large instrument sizes that some large defense contractors use. If the workflow needs more
than 600 MB per second, an A100 node is the only way we can provide it.


Generation 6 Advantages


We’ve looked at both Gen 5 and 6 Isilon nodes, but what are the advantages of Gen 6? First,
this new platform provides flexibility. From a customer perspective, it allows for easier
planning as each chassis will require 4U in the rack, with the same cabling and a higher
storage density in a smaller data center footprint. It should be noted that this also means
there is four times as much cabling across the Gen 6 4U chassis populated with 4 nodes. The
Gen 6 platform enables matched storage to computing power for optimum performance
and availability.


Generation 5 Specification Sheets


Isilon Gen 4 and Gen 5 have four different classes, or series, of nodes. Think S-Series for
performance solutions. It uses SSD for metadata acceleration. This can make a huge
improvement on overall performance of the cluster. Latency on SSDs is in microseconds,
rather than the milliseconds measured on spinning disks. The X-Series is the most
commonly positioned node as it strikes a balance between capacity needs and performance
needs. The X-Series is a good starting point. Look at positioning the NL-Series in
environments looking for an archival-type platform where data is fairly static, such as a
backup-to-disk target with highly sequential workflows. The HD-Series fits in environments
seeking a deep archive platform.
Note that the minimum number of nodes is three, but five nodes are a desired entry point
for maintenance, repair, and performance reasons.
If a customer has a lot of random data that needs to be accessed fast, then the S-Series node
is a good solution. It contains SSDs, high memory capacity, and SAS drives, which all add up
to speed. The largest capacity you can work with is about 28.8TB per node (once you figure
in parity). In a 3-node cluster, the capacity for one of the nodes is essentially there for parity.
However, as you add nodes you get almost a full node’s worth of storage.
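The effect described above, where one node's worth of capacity goes to parity and the overhead shrinks as nodes are added, can be illustrated with a deliberately simplified model. This Python sketch assumes exactly one node's worth of protection overhead regardless of cluster size; real OneFS protection levels and FEC layouts vary, so treat the numbers as an illustration of the trend only. The function name usable_fraction is hypothetical.

```python
def usable_fraction(nodes, parity_nodes=1):
    """Fraction of raw capacity left for data in a simplified N+1 style model.

    Assumes 'parity_nodes' worth of capacity is consumed by protection,
    which is a simplification of how OneFS actually lays out protection.
    """
    if nodes <= parity_nodes:
        raise ValueError("need more nodes than parity nodes")
    return (nodes - parity_nodes) / nodes


for n in (3, 4, 5, 10, 20):
    print(f"{n:>2} nodes: {usable_fraction(n):.1%} of raw capacity usable")
```

In this model the fourth node contributes its full capacity to data, which is the sense in which each added node gives almost a full node's worth of storage.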
Shown here are some highlights of the S-Series specification sheet. The S-Series node is
designed to meet the demands of performance-conscious, highly transactional, and
IOPS-intensive applications. It can deliver over 3 million IOPS and 175GB/s aggregate throughput.
SSDs and HDDs are supported. With SSD technology for file-system metadata, the Isilon
S-Series significantly accelerates namespace-intensive operations. As with all platforms, 1GbE
and 10GbE connectivity can be configured. 80% storage efficiency and deduplication make it
an efficient data storage solution. Isilon’s data storage and redundancy algorithms can
provide substantial storage efficiency compared to some other strategies. It’s easy to
install and scale.
Use cases may include real-time media streaming, system simulations, and DNA and RNA sequencing.
The X-Series strikes a balance between storage capacity and performance. The X210 is a
good low-cost entry point. It has a single quad-core processor at a decent clock speed, although it
has less memory than other models. The X410 is a solid workhorse and contains 36 3.5”
drives. It is well suited to home directories, higher-performing archives, high-performance compute
for applications such as bioinformatics and genomics, and frequently accessed surveillance
data. It is the platform of choice for most applications.
The X-Series allows for greater capacity than the S-Series and can deliver over 200GB/s
throughput. SSDs and HDDs are supported. A cluster of X410 nodes can scale to over 20 PB
in a single namespace, with an aggregate throughput of up to 200GB/sec. As with the
S-Series, the X-Series can also offer 80% storage efficiency, depending on the number of nodes
and the level of protection. Deduplication adds to its efficiency. Simplicity includes quick
install and ease of management.
The NL-Series nodes are designed for cost effective, large capacity storage (up to 210TB per
node), supporting nearline storage and active archiving workloads. They are primarily used
for large data storage and archival solutions, emphasizing capacity over performance. NL
nodes do well in environments with sequential reads. However, there is not enough CPU and
memory in them to do well with random IO. Any random IO will degrade the NL
performance. Drive failures take longer to rebuild on NL, because the drives are significantly
larger. NLs are appropriate when stored data does not change often. A customer should
have at least 200 TB of data to store before considering NL nodes. If they have less than that,
the X-Series may be a better fit.
Use cases may include economical archive, disaster recovery target, disk-to-disk backups.
The HD-Series is designed for cost effective, large capacity storage, but on a more massive
scale than the NL-Series. Termed “deep archive” or “cold” storage (meaning long-term
storage with minimal access), the HD-Series seeks to form the foundation of a scale-out data
lake solution. The HD400 scales up to 3.2 PB in a single rack, reducing OPEX by up
to 50%.
Use cases include deep archive, disaster recovery target, data lake foundation.


The A100 Performance Accelerator provides fast concurrent reads with low latency and
supports highly parallel workloads. The A100 Backup Accelerator scales backup performance
to meet backup windows. More information about the A100 is provided in this course.



Generation 6 Specification Sheets


The Isilon Generation 6 (Gen 6) family consists of six different offerings based on the
customer's need for performance and capacity. Like previous generations of Isilon hardware,
Gen 6 is a scale-out architecture that enables you to mix node types to meet your needs. The
F800 is the all-flash array with high performance and capacity. Next in terms of computing
power are the H600 and H500 nodes. The H range of nodes offers both reasonable
computing performance and storage density. These are spinning media nodes with various
levels of computing power: the H600 combines "turbo compute" modules with 2.5" SAS drives.
The H500 is comparable to a top of the line X410, bundling "high compute" nodes with SATA
drives. The H400 uses a "medium compute" bundle with SATA 4Kn drives.
Customers can purchase the A200 (A for archive) node, which is a "low compute" node
combined with smaller drives. The A2000 uses a deeper chassis, allowing for 80 drives in 4U
of a 40-inch rack.
The F800 is an all-flash solution that caters to high performance and high capacity solution
needs. In the marketplace, it is able to compete against other all-flash vendor solutions for
workflows that depend on high performance. It can accomplish 250-300k IOPS per chassis and
15GB/s aggregate read throughput from the chassis, and even as you scale the cluster,
the latency remains predictable. It is suited for workflows that require extreme performance
and efficiency.
H400, H500, H600:
The H400 provides a balance of performance, capacity and value to support a wide range of
file workloads. The Isilon H400 delivers up to 3 GB/s bandwidth per chassis and provides
capacity options ranging from 120 TB to 480 TB per chassis. The H400 is a capacity optimized
solution with an element of performance. It provides 1.7x the capacity per chassis of the
NL410 and 3x the read performance per chassis.
The H500 node is a hybrid node built for high performance and high capacity. The H500
delivers 5 GB/s bandwidth per chassis with a capacity ranging from 120 TB to 480 TB per
chassis. It provides 3x the read performance per chassis of the X410, uses 3.5" SATA drives, and is
optimized for throughput performance. The H500 gives you predictable performance even
as it scales and is an ideal choice for customers needing to support a broad range of
enterprise workloads and file use cases.
The H600 is designed to provide high performance at value, delivering up to 120,000 IOPS
and 12 GB/s bandwidth per chassis. It is geared toward cost optimized work environments
but it still produces very high performance numbers. Per chassis, it delivers 6x the IOPS and
8x the read operations of the Gen 5 S210. It is a standard 4U solution with
predictable performance even as it scales. It is a high-density performance platform that supports
120 drives per chassis. The Isilon H600 is an ideal choice for high performance computing
(HPC) applications and workloads that don’t require the extreme performance of the Isilon
F800 all-flash storage system.
Note: As shown earlier, the minimum Gen 6 cluster is one chassis of 4 nodes, whereas the
minimum Gen 5 cluster is 3 nodes. For this reason, the performance
comparisons may not be 1:1. A more comprehensive comparison between
Gen 5 and Gen 6 nodes is provided in the performance module of this course.
A200, A2000:
For most businesses today, data storage requirements are continuing to grow rapidly -
nearly doubling every 2 to 3 years. 80% of this new data is file-based, unstructured data.
Rapid data growth, along with other IT challenges, drives the need for efficient data archiving
solutions that can store and protect data for long-term retention. The A200 and A2000
address these challenges. The A200 is an active archive box that is optimized for a low cost
per TB for your active archive needs. It is a standard rack depth solution for all of your near-
line storage needs. The A200 is an ideal active archive storage solution that combines near-
primary accessibility, value and ease of use. The A2000 is a deep archive solution with the
lowest cost per TB. It is a high-density deep archive solution in a deep 40" rack. The capacity
stands at 800TB in a 4U solution. It is an ideal solution for high density, deep archive storage
that safeguards data efficiently for long-term retention.


Lesson 4: Node Fundamentals


Upon completion of this lesson, you should be able to differentiate the Gen 5 and Gen 6
product families and describe the different types of nodes and their designed purposes.


Compute Power (CPU)


Adding CPU by adding nodes improves a OneFS cluster’s overall performance. CPU is largely
used for FEC calculations in high ingest/update workflows. Read, modify, write cycles can
consume considerable CPU as can the AutoBalance and SmartPools data movements which
involve FEC calculation overhead. FEC is discussed later. The raw compute power in various
Isilon node types is not published by Dell EMC Isilon.


Memory (RAM)


The general rule for memory is “the more the better”, but it does get costly. More memory is
most commonly configured to facilitate high streaming I/O rates and high connection count
environments. Since the L1 cache on performance accelerator nodes is not immediately
recycled, it is good at accelerating a variety of operations against data that accumulate in the
L1 cache. RAM will improve L2 hit rates on reads. Memory will improve metadata read
bandwidth, potentially reduce the disk IOPS for read/modify/write cycles on data being
updated, allow tuning options like larger coalescer buffers, and decrease the impact of job
manager tasks on normal operations.


Disk Drives


Hard disk drives participating in a read or write will aggregate their throughput performance.
The X410s are a great example of getting as many spindles as possible to participate in a
read or write, which is clearly demonstrated by their high throughput numbers. Because of
how OneFS stripes data across the nodes, larger files will benefit from this higher spindle
count more than smaller files.
When higher IOPS is needed, faster drives, such as the flash drives in the F800 or the 10k SAS drives
found in the H600 and S210 nodes, are preferred. These drives have higher disk IO and
lower seek times. This is especially helpful for random transactional type workflows.
As a general practice, always consider SSDs. Note that Gen 6 nodes have either one or two
SSDs. SSDs improve performance on the vast majority of Isilon installations. SSDs primarily
speed up namespace operations and actions involving metadata. Such activities are at the
heart of many common office activities. Even the performance of large home directories
benefits noticeably from the use of SSDs. Quote SSDs by default, and remove them
from the proposal only if the customer insists.
Isilon does support self-encrypting drives (SEDs). If you have a node with SEDs, all the drives in the
node must be SEDs. A node with SEDs can only be pooled with other SED
nodes; otherwise, there would be no guarantee that the data is saved on an SED.


Node Types and Specifications


Here is a comparison between the Isilon nodes in each generation as related to performance
and capacity needs. This shows the target workflows for each Isilon node.
Generation 5 Nodes
The Gen 5 portfolio includes five storage nodes and two accelerator nodes. A storage node
includes the following components in a 2U or 4U rack-mountable chassis with an LCD front
panel: CPUs, RAM, NVRAM, network interfaces, InfiniBand adapters, disk controllers, and
storage media.
Customers can choose the combination of different node types that best suits their storage
needs.
The A-Series consists of two separate nodes with two different functions. The A100
performance accelerator adds CPU and memory resources without adding storage capacity.
The A100 Backup Accelerator allows you to perform backups directly to a backup server or
tape array over a Fibre Channel connection, without sending data over the front-end
network.
The S-Series targets IOPS-intensive, random access, file-based applications. The S-Series
node excels in environments where access to random data needs to be fast.


The X-Series achieves a balance between large capacity and high-performance storage.
These nodes are also best for high concurrency applications, where many people have to
access a file at the same time.
S-Series and X-Series nodes can be equipped with SSD media. The SSDs can be used to hold
file system metadata, which provides improved performance for metadata intensive
operations, while improving overall latency. They can also be configured as L3 cache to
provide faster access to frequently accessed data stored on the cluster.
The NL (for Nearline) and HD (for High-Density) nodes are primarily used for large data
storage. The NL-Series nodes are used for active archival, and the HD nodes for deep
archival workloads. NLs and HDs are appropriate when the data stored does not change
often and is only infrequently accessed.
Note that the “A” in the Gen 5 A100 accelerator should not be confused with the Gen 6 “A” in
A200 or A2000 archival platforms.
Generation 6 Nodes
The Gen 6 platform consists of six different offerings based on the customer's need for
performance and capacity. Previous generations of Isilon nodes come in 1U, 2U, and 4U
form factors. Gen 6 has a modular architecture instead, with 4 nodes fitting into a single 4U
chassis.
The F800 is an all-flash array with ultra-high performance. The F800 sits at the top of both
the performance and capacity platform offerings when implementing the 15.4TB model,
giving it the distinction of being both the fastest and densest Gen 6 node.
The H in H600 and H500 stands for 'hybrid' and targets both performance and capacity
intensive workloads. These are spinning media nodes with various levels of available
computing power - H600 combines our turbo compute performance nodes with 2.5" SAS
drives for high IOPS workloads. H500 is comparable to a top of the line X410, combining a
high compute performance node with SATA drives. The whole Gen 6 architecture is
inherently modular and flexible with respect to its specifications.
The H400 uses a medium compute performance node with SATA drives, and A200 (A for
archive) is an entry level node. It uses a low compute performance node with larger drives in
capacities of 2TB, 4TB, or 8TB. The A2000 is the deep rack version of the A200, capable of
containing 80 10TB drives for 800TB of storage by using a deeper chassis with longer drive
sleds containing more drives in each sled.
Gen 6 Specifications at a Glance:
This chart is a summarized list of supported and available options for the Gen 6 nodes.
Gen 6 Capacity at a Glance:
This chart includes the usable capacity per chassis at 100% and at 85% for each Gen 6 node type.
Gen 6 Capacity and Performance at a Glance:


The F800, for Extreme Performance and Scalability, provides 250,000 file operations per
second (SpecSFS 2008 ops), 15 GB/s of aggregate read throughput, and up to 924TB of raw
capacity in a 4U chassis. This is roughly equivalent to 10 S210s in terms of ops/sec and
throughput, with the capacity of 2 HD400s squeezed into one 4U chassis. This is nothing like
any platform we’ve built before.
The H600, designed to be the highest performing hybrid platform, delivers 120,000 ops/sec
and 12GB/s of read throughput using 120 2.5” SAS drives. This is not like the S210 which we
positioned primarily for transactional workloads that need just high ops/sec. The H600 is
also very well suited for workloads with a few, high-speed streams like 4K editing or playback.
The H500, with 60 3.5” SATA drives, is designed for high throughput and scalability. It delivers
5GB/s of read throughput and a moderate level of ops/sec while providing up to 480TB per
4U. This is still optimized to deliver the most throughput possible out of the SATA drives and
will be best for many concurrent low to moderate speed streams.
The H400, also with 60 SATA drives, is a more balanced platform for performance, capacity,
and value. As you can see, compared to the H500, it delivers slightly less performance but
the same capacity range.
The A200 continues this shift towards capacity rather than performance. On a per node
basis, the A200 is less powerful than the NL410, but offers more aggregate performance per
rack unit. Thus, the A200 will be well suited for active archives.
The A2000 completes this portfolio by providing even denser and deeper archive by using
10TB drives and up. It is second only to the F800 in terms of max capacity per chassis, but it
is significantly cheaper and slower than the all-flash solution. The A2000 is also more
powerful than a similar HD400 on a 4U basis, but that performance is best used for data
services jobs like SmartFail and SyncIQ.


SSDs in Gen 6


The usage of SSD in Gen 6 is different from Gen 5. Gen 5 had a given number of drive bays
per node, and you could elect to use some of them for caching SSDs instead of HDDs for
data. Thus, you could have Gen 5 nodes (e.g. X410) with different numbers of HDDs and
numbers of SSDs (e.g. 34/2 vs 36/0). This created a problem because OneFS has strict
requirements that nodes match in the number of HDDs to be in the same node pool. We
created the Node Compatibility concept to relax some of those restrictions and allow nodes
of the same hardware but slightly different HDD count to be in the same pool.
Gen 6 has dedicated slots for data drives and caching drives. For instance, H600 will have
120 HDD slots per chassis, 30 per node. Each chassis will also have two caching SSD slots in
the rear. This eliminates the need for the complex node compatibility like we see in Gen 4
and Gen 5 because any two of the same node type (e.g. two H600 nodes) will always have
the same number of drives.


4Kn Drives


The term 4Kn drives refers to drives formatted with 4 kilobyte (KB), or 4,096 byte (B),
sectors. This is a new, larger sector size compared to previous drives used in Isilon
nodes. The other drives are based on a 512 byte sector size. 4Kn drives with the larger sector
size provide more efficient platter use within the drives. They also provide higher efficiency
when calculating and verifying checksum information on each used sector. In
addition to providing better efficiencies, the 4Kn format is required to support new larger capacity drives.
The new 8TB and larger drives are only available in 4Kn format. To enable support for the
larger capacity drives, 4Kn support is required in OneFS. 4Kn drives are identifiable through
their drive model number. This varies between drive manufacturers.
Several requirements exist to implement 4Kn drives within OneFS. The I/O must be sized and
aligned to meet a 4KB boundary. This means that as OneFS decides how to place files on the
drives, the files must be split with the correct sizing. The sizing must be aligned with the
4KB sector size of the 4Kn drives. It also means the inode size used for
metadata storage with OneFS has to be aligned. The inode size is aligned to 8KB, which is the
same size as our file block size. This replaces the 512B inode size currently used in OneFS.
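To make "sized and aligned to a 4KB boundary" concrete, here is a small Python sketch of generic sector-alignment arithmetic. It does not reflect OneFS internals; the function names is_aligned and align_range are hypothetical, and only the 4,096 byte and 512 byte sector sizes come from the text.

```python
SECTOR_4KN = 4096   # bytes per sector on a 4Kn drive
SECTOR_512N = 512   # bytes per sector on a 512n drive


def is_aligned(offset, length, sector=SECTOR_4KN):
    """True if an I/O starts on a sector boundary and covers whole sectors."""
    return length > 0 and offset % sector == 0 and length % sector == 0


def align_range(offset, length, sector=SECTOR_4KN):
    """Expand a byte range to the smallest enclosing sector-aligned range.

    Returns (aligned_offset, aligned_length).
    """
    start = (offset // sector) * sector
    end = -(-(offset + length) // sector) * sector  # round up to a boundary
    return start, end - start


# An 8KB block (the inode and file block size cited in the text) is aligned on 4Kn.
print(is_aligned(8192, 8192))
# A 4KB write starting at byte 512 is fine for 512n but not for 4Kn,
# and would have to touch two 4KB sectors.
print(is_aligned(512, 4096, SECTOR_512N), is_aligned(512, 4096))
print(align_range(512, 4096))
```

An I/O that is aligned for 512 byte sectors is not necessarily aligned for 4KB sectors, which is why the placement and inode sizing had to change.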
A node can only support either 4Kn drives or 512n drives at one time. You may not mix 4Kn
drives with 512n drives in the same node. OneFS must be able to support each drive type
appropriately for each node, and simultaneously support both drive types in the same
cluster. This is not possible if drive types are mixed.
Similar to one drive type per node, a node pool may consist of only nodes with 4Kn drives or
nodes with 512n drives. Nodes with the same drive type are supported in the same node
pool. Because OneFS stripes data across nodes, each node in the pool must use the same drive
format.
OneFS was updated to appropriately recognize situations of mixed 4Kn and 512n drives in
the same node. An alert and CELOG event are created if the wrong drive type is used when
replacing a drive. This should help minimize the risk of using the wrong type of drive when
replacing failed drives by customers or support personnel.
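The drive-format rules in this section (one format per node, one format per node pool, and a matching format for replacement drives) can be expressed as a few checks. This is an illustrative Python sketch only; the data layout and the function names node_format, pool_is_valid, and replacement_ok are assumptions, and OneFS performs its own detection and raises its own alerts.

```python
def node_format(drive_formats):
    """Return the single sector format used by a node, e.g. "4Kn" or "512n".

    Raises ValueError if the node mixes formats, which the text says is
    not supported.
    """
    formats = set(drive_formats)
    if len(formats) != 1:
        raise ValueError(f"node must use exactly one drive format, got {sorted(formats)}")
    return formats.pop()


def pool_is_valid(nodes):
    """True if every node in the pool uses the same drive format.

    'nodes' maps a node name to the list of its drive formats
    (hypothetical representation).
    """
    return len({node_format(drives) for drives in nodes.values()}) == 1


def replacement_ok(existing_drives, new_drive_format):
    """True if a replacement drive matches the node's existing format."""
    return new_drive_format == node_format(existing_drives)


pool = {"node-1": ["4Kn"] * 15, "node-2": ["4Kn"] * 15}
print(pool_is_valid(pool))
print(replacement_ok(["512n"] * 12, "4Kn"))
```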

Considerations: Gen 5 Node Compatibility


In Gen 5, node compatibility allows an equivalence association between different class
nodes from the same performance series so you can combine them into a single node pool.
You can enable node compatibility when adding S210 nodes to a node pool of S200
nodes, or X410 nodes to a pool of X400 nodes. As new generations of CTO-based
nodes are released, additional compatibilities are created. Nodes must meet compatibility
requirements, be of the same node class, and have identical settings for some
configurations. With OneFS 8.1 and later, nodes with different RAM capacities are
compatible. With versions prior to OneFS 8.1, RAM compatibility must be considered. Nodes
with different drive counts can be added into the same node pool. This compatibility
requires the same size HDDs in all nodes. Compatibilities must be enabled in OneFS. Node
compatibilities can be created before or after you add a new node type to the cluster, and
can be disabled or deleted at any time.

Considerations: Gen 6 Cluster Compatibility


With the introduction of the Gen 6 platform, Gen 6 and Gen 4/5 can coexist in the same
cluster. However, bear in mind that node pools can only consist of the same node type. That
is, an H400 node pool cannot include Gen 5 HD400 nodes, but certainly we can have a
cluster with multiple node pools each with uniform nodes. You can also use SmartPools to
tier data between them. When adding Gen 6 nodes to a Gen 5 cluster, the existing InfiniBand
network is used. Gen 5 nodes cannot be added to an existing Gen 6 cluster that uses an
Ethernet back-end network.
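The back-end rule above reduces to a small decision table. The Python sketch below encodes only what this section states; the function name can_join and the string labels for the back-end type are hypothetical, and Gen 4 is left out for brevity.

```python
def can_join(node_generation, cluster_backend):
    """Return True if a node of the given generation can join the cluster.

    node_generation: 5 or 6
    cluster_backend: "infiniband" or "ethernet" (labels chosen for this sketch)
    """
    if cluster_backend == "infiniband":
        # Gen 6 nodes added to a Gen 5 cluster use the existing InfiniBand network.
        return node_generation in (5, 6)
    if cluster_backend == "ethernet":
        # Gen 5 nodes cannot be added to a Gen 6 cluster with an Ethernet back end.
        return node_generation == 6
    raise ValueError(f"unknown back-end type: {cluster_backend}")


for gen in (5, 6):
    for backend in ("infiniband", "ethernet"):
        print(f"Gen {gen} into {backend} cluster: {can_join(gen, backend)}")
```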
You must use the proper PSUs for F800 and H600 configurations.
Also, if you mix 2.5” and 3.5” drive sleds, the drive sled in slot 0 determines the expected drive
type. If the sled in slot 0 is a 2.5" drive sled, the node will expect all sleds to be 2.5" drives.
Sleds that do not match will not be powered up.
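The slot 0 rule for drive sleds can be sketched in a few lines of Python. This is an illustration of the behavior described in the text, not node firmware logic; the function name powered_sleds and the list-of-strings input are assumptions.

```python
def powered_sleds(sled_types):
    """Given sled drive sizes indexed by slot, return which sleds power up.

    The sled in slot 0 sets the expected drive type; any sled that does
    not match it is left powered down.
    """
    if not sled_types:
        return []
    expected = sled_types[0]
    return [sled == expected for sled in sled_types]


# Slot 0 holds a 2.5" sled, so the 3.5" sled in slot 2 stays powered down.
print(powered_sleds(["2.5", "2.5", "3.5", "2.5", "2.5"]))
```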
Mixing drive types in a single sled is not allowed. Mixing node types in a node pair is not
supported due to both a journal size mismatch and also a potential PSU mismatch. No
damage will occur to the nodes but it will put the nodes in an exposed state. When a node is
powered on there's no way to tell the operating system that the node peer does not match.
Mixing two node pair types in one chassis is supported in the hardware but it will not be
offered for sale at the release date. No date has been determined for this type of availability.

The Equalizer


When you are putting together the story of what the customer workflow and clients are
doing, you are trying to put together a map of what is happening at the data layer of their
operations. You can think of this like using a graphic equalizer to understand the input and
tuning of Dell EMC hardware and software to provide the right balance of performance and
cost. Instead of filtering or boosting audio frequencies, we look at six categories we can work
with from a data perspective.
• How much data processing is going on?
• How much memory usage do we anticipate?
• What’s important in metadata to maintain performance?
• What will the hard drives be doing?
• What does the client network look like?
• What I/O profile are we looking for?
As we move forward and dive into customer verticals and horizontals, use this equalizer
metaphor to help illustrate the areas of importance. However, as with the real graphic
equalizer presets, these are just a starting point. Every customer, environment, and situation
will need adjustment to fit just right. It’s up to you to dig into your customer’s needs and pain
points and determine the right balance for the solution proposal.
Equalizer Exercise:
Let’s take an example from the Media & Entertainment industry. Our prospective customer
has a workflow of serving high definition video content to playout servers feeding six live
television channels. On a scale of 1 to 10, how would you rate each of these 14 variables for
this workflow?
• 1 = Not at all important, could be non-functional and solution will still perform well.
• 10 = Absolutely critical, entire solution hinges on this function.
If we were to create a preset for this workflow, it might look like this. What do these settings
mean when translated to Isilon solutions? A CPU rating a bit above 5 probably means staying
away from nearline storage such as the NL410, HD400, and A2000. Through further conversations, you
can refine this preset to improve your proposal to meet their needs.

Equalizer Exercise

Lesson 5: Rack and Stack


Upon completion of this lesson, you should be able to discuss the node power requirements,
explain node distribution, and highlight front-end and back-end connectivity.

Qualifying Questions


Data center capacity must be part of the solution design document. Environmentals, such as
power and power distribution, HVAC, floor space, and rack space, need to be addressed.

Gen 5 Dimensions and Weight


This table shows the dimensions for each of the Gen 5 nodes. When designing a solution, it’s
important to consider the weight of a system and that the weight does not exceed the
supportability of the customer’s data center floor. As an example, a cluster with 3 S210s, 3
X210s, and 3 NL410s is a minimum of 690 lbs. This does not take into account the weight of
the rack, rails, switches, cabling, and other components that may be mounted in the rack.
The raised floor recommendation is to use 24 in. x 24 in. (60 cm x 60 cm) or heavy-duty,
concrete-filled steel floor tiles.
EMC fully disclaims any and all liability for any damage or injury resulting from the
customer's failure to ensure that the raised floor, subfloor, and/or site floor are capable of
supporting the system weight of the design solution.
The Isilon Site Preparation and Planning Guide has additional information such as floor load
bearing requirements.

Gen 6 Chassis


Dell EMC is concerned for the safety of our employees and customers. Please do not
underestimate the weight of the Gen 6 product family with a fully loaded, fully populated
rack. An A2000 prepopulated rack weighs in excess of 3,500 pounds. This prerack is 1,200
pounds heavier than the HD400 prerack configurations. For comparison, a grizzly bear only
weighs 2,500 pounds, so it is one thousand pounds lighter than our prerack A2000.
For safety reasons, use extreme caution when handling or working with fully populated

Gen 5 Power Overview


Plan to set up redundant power for each rack that contains Isilon nodes. Supply the power
with a minimum of two separate circuits on the building's electrical system. Each AC circuit
requires a source connection that can support a minimum of 4800 VA of single phase 200-
240V AC input power. If one of the circuits fails, the remaining circuit(s) should be able to
handle the full power load of the rack.
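
The redundancy rule above comes down to simple arithmetic. The following is a minimal sketch, not a sizing tool: the function name is hypothetical and the per-node load is a made-up placeholder, not a published figure. It checks whether the rack load still fits on the remaining circuit(s) after any one circuit fails, using the 4800 VA minimum rating from the text.

```python
MIN_CIRCUIT_VA = 4800  # minimum per-circuit rating quoted in the text


def survives_single_circuit_failure(circuit_ratings_va, rack_load_va):
    """Return True if the rack load still fits after losing any one circuit."""
    if len(circuit_ratings_va) < 2:
        return False  # the text calls for at least two separate circuits
    # Worst case: the largest circuit is the one that fails.
    remaining = sum(circuit_ratings_va) - max(circuit_ratings_va)
    return remaining >= rack_load_va


# Hypothetical rack: eight nodes at an assumed 500 VA each (placeholder value).
rack_load = 8 * 500
print(survives_single_circuit_failure([MIN_CIRCUIT_VA, MIN_CIRCUIT_VA], rack_load))
```

Real designs should take node power figures from the Isilon Site Preparation and Planning Guide rather than from placeholder values like the one used here.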

Gen 6 Power Connectivity


Gen 6 chassis come with two different PSU types. The F800 and H600 nodes use
1450W high-line-only supplies, while the rest of the nodes use the 1050W wide-range
supplies. The F800/H600 also requires two 2U step-up transformers. A step-up transformer
increases voltage from its primary winding to its secondary winding. You'll need
two 3 kVA transformers for every F800/H600 chassis for power and redundancy.
When discussing Gen 6 power redundancy, it is important to understand that there is a
single power supply per node, but within the chassis there are two distinct power zones. In
the event that one PSU goes down, the peer node in that power zone powers both
nodes. There is more discussion on power redundancy in a later module.

Gen 6 Environmentals


Beyond the precautions of working with electricity, it is also critical to ensure proper cooling.
Proper airflow must be provided to all Isilon equipment. The ambient temperature of the
environment in which Isilon Gen 5 nodes operate should not exceed the maximum rated
ambient temperature of 35° Celsius (95° Fahrenheit). Gen 6 nodes have an ASHRAE (American
Society of Heating, Refrigerating and Air-Conditioning Engineers) designation of A3, which
enables the nodes to operate in environments with ambient temperatures from 5 degrees
up to 40 degrees Celsius for limited periods of time.



Any standard ANSI/EIA RS310D 19-inch rack system can be used in an Isilon cluster
installation. Isilon recommends the Dell EMC 40U-P rack, 40U-D Titan rack, and the Titan-HD
rack for the A2000. The 40U-P is a shallow rack and the 40U-D is a deep rack. In most cases,
the 40U-P cabinet ships fully assembled and configured; if the rack cabinet is not already
assembled, you will need to build it. The rack cabinet in which you will install the nodes or
chassis must have full earth ground to provide reliable grounding. It should also have dual
power supplies with a power distribution that allows load balancing and will switch between
the two in case of an outage. We suggest that you use a rack cabinet that has dual power
distribution units, one on each side. Nodes are shipped with rail kits that are compatible with
racks that have 3/8 inch square holes, 9/32 inch round holes, or pre-threaded holes of 10-32,
12-24, M5X.8, or M6X1. The brackets adjust in length from 24 inches to 36 inches. All Isilon
nodes can be mounted in standard ANSI/EIA RS310D 19-inch rack systems.
Consult the Isilon Site Preparation and Planning Guide for additional information, such as
40U-P rack delivery, installed dimensions, and stabilizer brackets.

Cluster Overview


All clusters must start with a minimum of three like-type or identical nodes in Generation 5,
or a chassis with four of the same type of Generation 6 node. This means that when
starting a new cluster, you must purchase either three identical Gen 5 nodes or four identical
Gen 6 nodes. You cannot purchase one S-node, one X-node, and one NL-node, and then combine
them to form a Gen 5 three-node cluster. Likewise, you cannot purchase a chassis with a Gen 6
H600 node pair and an H400 node pair to form a cluster.
All similar nodes must initially be purchased in groups of three or four due to the way that
OneFS protects the data. Also, because Gen 6 nodes have a “pair” for protection, the minimum
is four nodes. You can buy a chassis of H600 nodes, a chassis of H400 nodes, and a chassis of
A2000 nodes, and combine them into a single cluster. If you are adding to an existing Gen 5
cluster, you can add a Gen 6 chassis with like nodes to the cluster. If you accidentally bought
two X-nodes, you could still add them to a cluster, but they would stay in a read-only state until
a third X-node is added. The two X-nodes would add memory and processing to the cluster.
Once the third X-node was joined, the three X-nodes would automatically become writeable
and add their storage capacity to the whole of the cluster. Once the minimum number of
like-type nodes is met, you can add any number of nodes of that type. For example, you
might start out with a 3-node cluster of X-nodes and then purchase a single X-node, or 18
more X-nodes; again, once the node minimum is met, any number of nodes of that type can be
added. Keep in mind that Gen 6 nodes are added in node pairs. As of this publication,
clusters can scale up to a maximum of 144 nodes.
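
The minimum-node rules above can be expressed as a short check. This is a sketch for design conversations only: the function names and the tuple layout are hypothetical, and the logic covers just the rules stated in this section (three like-type Gen 5 nodes, four like-type Gen 6 nodes added in pairs, 144 nodes maximum).

```python
MIN_NODES = {5: 3, 6: 4}   # minimum like-type nodes per generation, from the text
MAX_CLUSTER_NODES = 144


def meets_minimum(generation, count):
    """Return True if a group of identical nodes meets its generation's minimum."""
    if generation == 6 and count % 2 != 0:
        raise ValueError("Gen 6 nodes are added in node pairs")
    return count >= MIN_NODES[generation]


def check_cluster(groups):
    """groups: list of (node_type, generation, count) tuples, one per node type."""
    total = sum(count for _, _, count in groups)
    if total > MAX_CLUSTER_NODES:
        raise ValueError("cluster exceeds 144 nodes")
    return {node_type: meets_minimum(gen, count) for node_type, gen, count in groups}


# Two X-nodes fall short of the Gen 5 minimum; one H600 chassis (four nodes) meets the Gen 6 minimum.
print(check_cluster([("X410", 5, 2), ("H600", 6, 4)]))
```

A result of False for a Gen 5 type corresponds to the read-only state described above: those nodes contribute memory and processing, but no writeable capacity, until the minimum is met.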

Internal Connectivity


An Isilon cluster separates internal (back-end) and external (front-end) network connectivity.
The nodes in the cluster communicate internally using InfiniBand for a Gen 5 cluster or a
cluster combining Gen 5 and Gen 6 nodes. InfiniBand is a high-speed interconnect for high
performance computing. The reliability and performance of the interconnect is very
important in creating a true scale-out storage system. The interconnect needs to provide
both high throughput and very low latency. InfiniBand meets this need, acting as the
backplane of the cluster, enabling each node to contribute to the whole. Isilon supports 40
Gb/sec Quad Data Rate (QDR) models of InfiniBand switches.
A new Gen 6 cluster will use a 40 Gb/sec Ethernet back-end. You cannot add Gen 5 nodes to
a Gen 6 cluster that uses an Ethernet back-end.
A single front-end operation can generate multiple messages on the back-end, because the
nodes coordinate work among themselves when they write or read data. Thus, the dual
back-end InfiniBand or Ethernet (Gen 6 only) switches handle all intra-cluster communication
and provide redundancy in the event that one switch fails.

External Connectivity


The external networking components of a Dell EMC Isilon cluster provide client access over a
variety of protocols. Each storage node connects to one or more external Ethernet networks
using 10 Gigabit Ethernet (10 GbE) or 40 Gigabit Ethernet (40 GbE) connections. Gen 5 nodes
also support 1 GbE. The external switch interfaces may support link aggregation. Link
aggregation creates a logical interface that clients connect to. In the event of a NIC or
connection failure, clients do not lose their connection to the cluster. For stateful protocols,
such as SMB and NFSv4, this prevents client-side timeouts and unintended reconnection to
another cluster. Instead, clients maintain their connection to the logical interface and
continue operating normally. Continuous Availability (CA) is supported for stateful protocols
such as SMB and NFSv4. The external network switch should support Gigabit
Ethernet, be a non-blocking fabric switch, have a minimum of 1 MB per port of packet buffer
memory, and support jumbo frames if they will be used.

Considerations: Gen 5


For the Gen 5 nodes and cluster, the list above outlines some environmental and switch
considerations that should be noted in a solution design. Raised or non-raised floors should
support at least 2,600 lbs (1,180 kg) per rack. A fully configured rack uses at least two floor
tiles and can weigh 2,600 pounds. This bearing requirement accommodates equipment
upgrades and/or reconfigurations. 24 in. x 24 in. (60 cm x 60 cm) or heavy-duty,
concrete-filled steel floor tiles are recommended.
Recommended site temperature is +15°C to +32°C (59°F to 89.6°F). A fully configured cabinet
can produce up to 16,400 BTUs per hour.
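
When comparing cooling and power figures, it helps to convert between units. The sketch below uses the standard conversion of roughly 3.412 BTU/hr per watt and the usual Celsius-to-Fahrenheit formula; the function names are illustrative only.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of dissipated power is about 3.412 BTU/hr


def btu_per_hour_to_watts(btu_per_hour):
    """Convert a heat output in BTU/hr to the equivalent power in watts."""
    return btu_per_hour / BTU_PER_HOUR_PER_WATT


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The 16,400 BTU/hr cabinet figure works out to roughly 4.8 kW of heat.
print(round(btu_per_hour_to_watts(16_400)))
# The recommended site temperature limits in Fahrenheit.
print(celsius_to_fahrenheit(15), celsius_to_fahrenheit(32))
```

The result of roughly 4.8 kW is in line with the 4800 VA per-circuit minimum quoted for Gen 5 power.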
Using the internal network for any purpose other than intra-cluster communication is not
supported, and can easily lead to cluster malfunction. Allow for growth: if all the ports on the
back-end switches are used, larger switches will be needed. Daisy chaining back-end
switches is unsupported. Any attempt to extend the InfiniBand fabric beyond the two
switches configured with the cluster can easily lead to cluster malfunction. Only InfiniBand
cables and switches supplied by EMC Isilon are supported. Use a hybrid QSFP-CX4 cable to
connect DDR InfiniBand switches with CX4 ports to nodes that have QSFP (Quad Small
Form-factor Pluggable) ports (A100, S210, X410, and HD400). QSFP is a connector that allows
connection to a QDR port (switch port or InfiniBand card). When connecting a QDR port to a
DDR port, a hybrid cable is needed, with QSFP on one end and CX4 on the other. DDR nodes
can be connected to a QDR switch, and QDR nodes to a DDR switch. When using fibre, a
QSFP optical transceiver that plugs into the QDR port is needed.

When using 3rd party racks, the adjustable rails need mounting screws to secure them to both
the front and rear NEMA rails. The depth of the rails makes installing screws into the rear
NEMA rail very difficult.

Considerations: Gen 6


For the Gen 6 nodes, you must use the proper 1450W PSUs for F800 and H600 configurations.
If you mix 2.5” and 3.5” drive sleds, the drive sled in slot 0 sets the expected drive type. If
the sled in slot 0 is a 2.5” drive sled, the node will expect all sleds to be 2.5” drives. Sleds that
do not match will not be powered up. Mixing drive types in a single sled is not allowed; sleds
must be homogeneous in drive size and protocol (e.g., 3.5” SATA). Mixing node types in a node
pair is not supported because of a journal size mismatch and a potential PSU
mismatch. No damage occurs to the nodes, but it puts the nodes in an exposed state. When a
node is powered on, there is no way to tell the OS that the node peer does not match.
Mixing two node pair types in one chassis is supported in the hardware, but it will not be
offered for sale at this point. No date has been determined for this type of availability.
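
The slot 0 rule for drive sleds can be illustrated in a few lines. This is a conceptual sketch only: the function name and the string labels for drive types are hypothetical, and it models nothing beyond the behavior stated above (slot 0 sets the expected type, and mismatched sleds are not powered up).

```python
def sleds_powered(sled_types):
    """Given the drive type of each sled, indexed by slot, return which sleds power up.

    The sled in slot 0 sets the expected type; sleds that differ are not powered.
    """
    expected = sled_types[0]
    return [sled_type == expected for sled_type in sled_types]


# Slot 0 holds a 2.5" sled, so the 3.5" sled in slot 2 stays powered down.
print(sleds_powered(['2.5"', '2.5"', '3.5"', '2.5"', '2.5"']))
```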

Lesson 6: Solution Concepts


Upon completion of this lesson, you should be able to identify common use cases for Isilon,
explain Edge to Core to Cloud, and highlight Isilon in a Data Lake.

Use Cases


Some of the use cases for the Isilon clusters are:

• A media and entertainment (M&E) production house that needs high single stream
performance at PB scale that is cost optimized. They also require Flash-to-Cloud in a
single name space, archive optimized density, and a low TCO solution. Animation
and post-production workflows quadruple in data size and data rates every time
image formats double from 2K/HD to 4K/UHD to 8K. Industry competition and wider
availability of high resolution consumer displays are pushing production houses to
4K and above.
• EDA - as semiconductor geometries shrink in half, the number of gates quadruples
and thus requires 4x more compute and storage to complete the projects in the
same time.
• Life Science - with new initiatives to capture genomic data of entire nations, storage
today can’t keep up with the amount of genome sequencing that needs to be done.
Traditionally DNA analysis took hours and now we can do that in minutes;
personalized medicine is the next wave.
• Financial Services - continues to innovate with advanced, real-time analytics for
better investment decisions and reduced risk and fraud; thus, the main focus is on
data protection and availability.
Trends like these result in far more data kept online in NAS archives at scale so that companies
can derive business value at any time.

From Edge to Core to Cloud


Cloud storage is something that CIOs are actively considering. They hope to use cloud
economics and want to integrate their existing Isilon environment with the cloud. They have
committed to Isilon for their NAS storage but want the OPEX economic model that cloud
offers. They want cloud integration to be non-disruptive and seamless to users and apps
while lowering costs. They want a single namespace: one way to get to the data.
Security is always a concern when data is off-premises. Customers are always looking at
choice - choice of public or private cloud providers or both. They do not want to be locked in.
They want to place their frozen data in the cloud. And, they like our SmartPools approach.

Challenges for Enterprise Customers


Isilon products enable enterprises to implement a Data Lake 2.0 strategy that extends the
data lake from the data center to edge locations, including remote and branch offices, and
the cloud.
Today the focus is at the core of the data center where the critical corporate data are kept
and the users access these files remotely. In fact, 68% of enterprises have over 10TB of data
at each edge location. You can deploy IsilonSD Edge on VMware at the edge and consolidate
the content from local storage to IsilonSD Edge installations in each branch location. You can
then use SyncIQ to replicate the content to the core so that you can manage backups in a
central location, while retaining fast data access for users at the edge.
We can also extend our story to include our cloud offering. The customer can tier inactive
data at the core to a cloud provider while providing seamless file access to users of the core.
Another solution in this space is to use the core cluster as a cloud target for IsilonSD Edge
installations. This way the virtual installations at the edge of the enterprise are not limited to
their local storage capacity, but can transparently extend their storage to additional capacity
maintained at the central office. SyncIQ can copy the stub files to the core as well, for
archival purposes, thereby still protecting the data against failures at the edge of the
enterprise.
Yet another approach is to let the IsilonSD Edge installations use the public cloud for
capacity, while performing either stub file or full replication to the core. These alternatives
together allow for maximal flexibility to meet organizational needs, whatever shape they
may take.
Some of the value propositions of IsilonSD Edge are:
• Simple deployment: IsilonSD Edge software can be deployed in minutes on
commodity hardware in remote offices.
• Improved data protection: IsilonSD Edge enables customers to consolidate data at
the enterprise edge and replicate the data to the core, leading to simplified and more
reliable backups.
• Data Lake 2.0: IsilonSD Edge extends the data lake from the data center to enterprise
edge locations, including remote and branch offices, while delivering increased
resiliency, reduced costs, simplified management, and improved data protection.

Module 2: Data Layout and Protection


Upon completion of this module, you will be able to explain journaling, describe OneFS file
striping, examine data protection strategies, compare caching options, and discuss
read/write performance.

Lesson 1: Journaling


After completing this lesson, you should be able to describe the journal of Isilon nodes
compared to the previous implementation. In particular, you should be able to explain how
Generation 6 node pairs mirror each other's journals and how the recovery features
improve cluster stability.

Overview: Journaling in Isilon


The file system journal is the first place to store file system changes. The journal stages data
before being written to disk. If the journal is lost, data is lost. The major advantage of
journals is that they can confirm that data was written to nonvolatile storage quickly. The
disadvantage is that data in the journals is not protected to the same extent as it would be if
it were written to drives. The data resides briefly in the journal, but it still constitutes a
window of vulnerability. A larger journal improves performance. Even on SSDs the time
taken to calculate protection and communicate with the drives to store data is a source of
latency for each write operation, so having a journal that can buffer writes in a nonvolatile
fashion is a performance boost.
The combination of OneFS 8.1 and Generation 6 hardware brings new advances to file
system journaling. OneFS versions prior to 8.1 are not compatible with new generation
hardware, so there are no concerns with old software on new journal hardware. OneFS 8.1
can run on earlier versions of hardware, such as the S210 and the NL400 nodes, but because
those nodes do not have the new type of journal hardware, it makes no substantial change
to how OneFS uses their journal facilities.
In pre-Gen 6 versions of the Isilon hardware, there were many challenges surrounding
journals. The journal itself was maintained on a battery-backed volatile storage system, but if
these batteries ran down owing to extended downtime, the journals could be lost. Journals
could also be lost or corrupted due to improper shutdown procedures, and whenever a
journal was lost, data loss became a possibility. Another issue was that the limited size of the
journal came to be a performance bottleneck.
In Gen 6 Isilon hardware, these issues have been addressed.

Journaling Change in Gen 6


In Gen 6, the journal's size was increased for better storage performance. As a journal fills, it
flushes changes to drives within the node. In certain high-volume workflows this could
occasionally cause latency when flushing occurred too often. Larger journals offer more
flexibility in determining when data should be moved to disk. Also in Gen 6, Isilon has added
a separate storage vault for the journal. This comes in the form of an M.2 vault drive on each
node. M.2 is an internal connection standard like SATA or PCI. The node writes the
journal contents to the vault in the event of power loss. The backup battery helps maintain
power while data is stored in the vault. The journal hardware itself also offers lower latency
than older hardware, and while a few microseconds is not a huge difference, it helps to chip
away at the general performance challenges.
During runtime, the journal is kept in RAM, and each node mirrors its journal to its node pair.
In Gen 6, if a node goes down, there is more than one copy of its journal available. When the
node boots, it can refer to its node pair for valid journal copies in the event that its own
copies are bad. The updated status of the journal (not the actual journal contents
themselves) is written as node state blocks to every drive, so that if a journal needs
recovering, the node can refer to these blocks for reconciliation.

Shared Journals on Node Pairs


Node pairs mirror each other's journals, and can run off each other's power supplies. The
introduction of a mirrored journal in Gen 6 improves node reliability because there is a
consistent copy of the journal, either locally, on flash, or on the peer node, in the event of a
node failure. In this illustration, node 1 and node 2 are a node pair and mirror each other’s
journals. This provides an additional level of security with respect to data that has been
written to the journal. Even if the node servicing the client goes down, its paired node has a valid copy of
the journal that can be restored when the node comes back online. The link between node
pairs is built into the mid-plane of the chassis, so that this is an inherent function of node
installation, rather than a separate step.
These numbers for nodes in a chassis have no bearing on anything else with respect to
OneFS, so there is no reason to renumber nodes in terms of LNN. This is simply a numbering
system that helps identify nodes in chassis and drives in sleds.

Journal Behavior for Node Pairs


When a node boots, it first checks its own vault resources before trying to refer to its paired
node. This way if the node can recover its journal from its own resources, there is no need to
reach out to the paired node. On the other hand, if the journal is bad, the node can identify
the journal condition from its node state block data, and recovery should still be possible.
One consequence of this process is that nodes are intended to run in pairs, and if one is
running by itself, it is in an underprotected condition.
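
The boot-time order described above can be pictured as a simple decision chain. This is only an illustrative sketch: the function name, its boolean inputs, and the returned labels are hypothetical and do not correspond to any OneFS interface.

```python
def choose_journal_source(own_vault_valid, peer_mirror_valid):
    """Pick a journal copy in the order the text describes: own vault first, then the paired node."""
    if own_vault_valid:
        return "own vault"       # no need to reach out to the paired node
    if peer_mirror_valid:
        return "paired node"     # fall back to the mirrored copy on the peer
    return "no valid copy"       # hypothetical label; this case is outside what the text covers


print(choose_journal_source(True, True))
print(choose_journal_source(False, True))
```

The second call shows why a node running without its pair is underprotected: with no peer, a bad local journal leaves nothing to fall back on.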

Considerations: Journaling


Since Gen 6 does not run pre-OneFS 8.1 versions, older OneFS version compatibility with
journaling is not a problem. There is no change to the way pre-Gen 6 hardware does
journaling. Virtual nodes inherently lack specific hardware function, so this also does not
apply to them. The existence of node pairs in Gen 6 hardware does not complicate rolling
reboots of a cluster. Each reboot completes properly before the next commences. There is
no additional risk when separating the reboots of node pairs.

Lesson 2: File Striping


Upon completion of this lesson, you will be able to show how files are broken up to form file
stripes, explain the benefits of disk pools, and differentiate a file layout across a 3-node
versus a 6-node cluster.

Data Layout - Variables


Isilon is designed to lay out data in the most efficient, economical, and highest performing
way. There are four variables that combine to determine how data is laid out. This makes the
possible outcomes almost unlimited when trying to understand how the system will work.
The number of nodes in the cluster affects the data layout: because data is laid out vertically
across all nodes in the cluster, the number of nodes determines how wide the stripe
can be. The layout is written as N+Mn, where N is the number of data stripe units and Mn is
the protection level. The
protection level also affects data layout because you can change the protection level of your
data down to the file level, and the protection level of that individual file changes how it will
be striped across the cluster. The file size also affects data layout because the system
employs different layout options for larger files than for smaller files to maximize efficiency
and performance. The access pattern modifies both prefetching and data layout settings
associated with the node pool. Disk access pattern can be set at a file or directory level so
you are not restricted to using only one pattern for the whole cluster.
Ultimately the system’s job is to lay data out in the most efficient, economical, highest
performing way possible. You can manually define some aspects of how it determines what
is best, but the process is designed to be automated. All parts of a file are written into the
same node pool and are contained in a single disk pool.
This module will explore these variables.

Overview: File Striping


Striping protects the cluster’s data and improves performance. In our journey to
understanding OneFS data protection, the first step is grasping the concept of data and
forward error correction or FEC stripes. File stripes are portions of a file that are contained in
a single data and protection band distributed across nodes on the cluster. Each file stripe
contains both data stripe units and protection stripe (FEC) units. When a file is written into
the cluster, the node that the client connects to is responsible for calculating its data
protection. This connected node will take the file and break it into 128K data stripes, then
calculate the FEC stripes needed based on the protection level. Once the FEC is calculated,
the data and FEC stripes together are called the stripe width. So, for example, if the file is
broken into four pieces of data and one FEC - the stripe width would be five, because 4+1=5.
Once the stripe width is determined, the individual data and FEC stripes are sent across the
back-end network to other nodes in the cluster. Depending on the write pattern, the data
and FEC stripes might be written to one drive per node or two drives per node. The
important piece to take away from this slide is that files are broken into stripes of data, FEC
is calculated, and this data is distributed across the cluster. One note: FEC works much like
RAID-5, in that it generates protection data blocks and stores them separately from the data.
Now, let’s take a closer look at data protection.
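
To make the RAID-5 comparison concrete, the sketch below builds one protection unit from four data units and then rebuilds a lost data unit. It deliberately uses plain XOR parity as a stand-in: OneFS uses Reed-Solomon codes, which this example does not implement, and the tiny four-byte units are for illustration only (real stripe units are 128 KB).

```python
from functools import reduce


def xor_units(units):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


data_units = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # four data stripe units
fec_unit = xor_units(data_units)                   # one protection unit (XOR stand-in for FEC)
stripe_width = len(data_units) + 1                 # 4 + 1 = 5

# Simulate losing the third data unit and rebuilding it from the survivors.
survivors = data_units[:2] + data_units[3:] + [fec_unit]
rebuilt = xor_units(survivors)
print(stripe_width, rebuilt == data_units[2])
```

With a single protection unit, any one missing unit in the stripe can be recomputed from the others. Higher protection levels add more FEC units so that more simultaneous losses can be tolerated.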

Data Layout - FEC Stripes


Here we’ll take a moment to look at how files are broken into data stripes. Isilon uses
the Reed-Solomon algorithm, which is an industry standard method to create error-
correcting codes at the file level. Isilon clusters do not use hardware or software-based RAID.
Whereas RAID is hardware-based and protects against disk failures, Reed-Solomon is
software-based and protects data. In OneFS, data protection is calculated on individual files,
giving much greater granularity and control than that of RAID schemes.
When a client connects to a node, that node is responsible for calculating the data stripe
units. This same node will then calculate the data protection needed for the file. The number
of FEC stripes will depend on the level of protection configured for the cluster. Taking a
closer look at each data stripe, you’ll see that it contains a maximum of 16 blocks. Each block
is 8 KB in size, so if we do the math, 16 times 8 is 128, which is the size of our data and FEC
stripe units.
Remember, Isilon specializes in Big Data, which means large files that need to be cut up and
distributed across the nodes in the cluster.
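
The block arithmetic above can be worked through for any file size. The sketch below is a simplified count that assumes the 8 KB block and 16-block stripe unit from the text; it ignores protection overhead and any special handling OneFS applies to small files, and the function name is illustrative.

```python
BLOCK_SIZE_KB = 8
BLOCKS_PER_STRIPE_UNIT = 16
STRIPE_UNIT_KB = BLOCK_SIZE_KB * BLOCKS_PER_STRIPE_UNIT  # 16 x 8 = 128


def layout_counts(file_size_kb):
    """Return (blocks, stripe_units) needed to hold the data of a file of the given size."""
    blocks = -(-file_size_kb // BLOCK_SIZE_KB)            # ceiling division
    stripe_units = -(-blocks // BLOCKS_PER_STRIPE_UNIT)   # ceiling division
    return blocks, stripe_units


print(STRIPE_UNIT_KB)
print(layout_counts(1024))  # a 1 MB file: 128 blocks in 8 data stripe units
```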

Qualifying Questions


MiTrend can be used to gather information that can help analyze the current solution.
Concurrency and access patterns will help to narrow to the right solution. You can optimize
how OneFS lays out data to match your dominant access pattern: concurrent, streaming, or
random.

Node Pools, Neighborhoods, Disk Pools


Let’s begin with a hierarchy. The lowest groupings are disk pools. In Gen 6, neighborhoods
define a grouping of disk pools in a node pool. Next is a node pool, which is used to describe
a group of similar nodes, or a grouping of the underlying neighborhoods and disk pools
spread across similar nodes. Different types of node pools can work together to form a
heterogeneous cluster. A single node pool can range from three (pre-Gen 6) or four (Gen 6)
up to 144 nodes. All the nodes with identical hardware characteristics are automatically
grouped in one node pool. A node pool is the lowest granularity of storage space that users
manage. Underlying disk pools are automatically created and managed by OneFS. At the top
is a tier, which is covered in detail later.
Drive failures represent the largest risk of data loss especially as node pool and drive sizes
increase. All parts of a file are written into the same node pool and are contained in a single
disk pool. Disk pools are subsets or groups of drives within a node pool and data protection
stripes or mirrors don’t span disk pools. The separation into multiple disk pools creates
multiple isolated drive failure zones per node pool. Disk pool configuration is automatically
done as part of the auto provisioning process and cannot be configured manually.
Though it’s recommended to use the default (automatic node pool creation), users can
manually reconfigure node pools. Reconfiguring is an option in the event the created node
pools are not suitable for the customer workflows. Manually configured node pools may not
provide the same level of performance and efficiency as automatically configured node
pools.

Gen 4 and Gen 5 Disk Pools:
In pre-Gen 6 hardware, six drives from each node are grouped together to form a disk pool.
This illustration shows an S-Series, X-Series, and NL-Series that form the cluster. Disk pools
are the smallest unit of the data layout architecture. Similar node drives are automatically
provisioned into disk pools with each disk pool representing a separate failure domain.
For all hard drive models there are 2 to 10 separate disk pools per node pool, making for 2
to 10 isolated failure zones. The grouping is done according to the drive bay number in the
node, bays 1 thru 6 form the first disk pool, 7 thru 12 the second, and so on up to 55 thru 60
for the HD400 node models. Disk pools span no more than 39 nodes, meaning when node
40 is added to the node pool, the disk pools are logically divided, nodes 1-20 with one group
of disk pools and nodes 21-40 with another group of disk pools.
The exception to this rule is when there are 1 to 4 SSDs in a node. In these configurations,
the SSDs are placed into their own disk pool and the hard drives are distributed into as near
equal counts as possible. As an example, an X200 node with 3 SSDs would have three
disk pools: one with the 3 SSDs per node, one with 4 hard drives per node, and another with
5 hard drives per node.
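
The bay-based grouping and the SSD exception can be sketched as follows. This is an illustration of the rules as stated, not of OneFS provisioning code: the function names are hypothetical, the default of two hard drive pools is an assumption taken from the X200 example, and the nine hard drives are inferred from that example's pool sizes (4 + 5).

```python
def disk_pools_by_bay(drive_bays, pool_size=6):
    """Group bay numbers 1..drive_bays into consecutive disk pools of pool_size bays."""
    bays = list(range(1, drive_bays + 1))
    return [bays[i:i + pool_size] for i in range(0, drive_bays, pool_size)]


def pool_sizes_with_ssds(ssd_count, hdd_count, hdd_pools=2):
    """SSDs get their own pool; hard drives are split as evenly as possible."""
    base, extra = divmod(hdd_count, hdd_pools)
    hdd_sizes = [base + (1 if i < extra else 0) for i in range(hdd_pools)]
    return [ssd_count] + sorted(hdd_sizes)


pools = disk_pools_by_bay(60)        # a 60-bay node such as the HD400
print(len(pools), pools[0], pools[-1])
print(pool_sizes_with_ssds(3, 9))    # the X200 example: 3 SSDs, then 4 and 5 hard drives
```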
Gen 6 Neighborhood and Disk Pools:
Node pools in Gen 6 are the same as in previous generations; that is, they are made up of
groups of like-type nodes. Gen 6 node pools are then divided into neighborhoods, which are
made up of disk pools for smaller, more resilient fault domains. None of these features are
user accessible; they take effect automatically based on the size of the cluster.
With Gen 6 nodes, we have drive sleds with three, four, or six drives. Shown here is a
representation of a drive sled with three drives. This configuration is typical in the F800,
H500 and H400 nodes. In the event that a drive fails, replacing the failed drive requires an
entire sled to be pulled from the chassis. Data availability is addressed by putting each drive
within the sled into different disk pools or fault domains. The graphic illustrates a chassis
with four nodes. Each color represents a disk pool. Here disk 1 in each sled in belongs to a
disk pool different than disk 2 or disk 3. By having drives provisioned into separate disk
pools, we limit the chance for data unavailability. If a sled is pulled without proper
precautions, or a failure occurs across an entire sled, provisioning prevents multiple disk
failures from occurring within the same disk pool by prohibiting drives within a disk pool
from sharing the same sled. Pulling any single sled only removes a single disk from any disk
Data is written across disks within a disk pool. For example, a file would be written within
only one disk pool, assuming there is enough space in that pool. Files would not be written
across different disk pools. Considering we are protecting each disk pool for all the node
types at a default +2d:1n, which means you can lose 2 drives or 1 whole node, pulling a
single sled will not put you in a 'data unavailable' situation as you are only temporarily losing
a single disk per disk pool.
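The sled argument can be checked with a small model. The sketch below is illustrative only: it assumes the disk pool of a drive is determined by its slot position inside the sled, and the value of five sleds per node is a placeholder, not a figure taken from this guide.

```python
from collections import Counter


def provision(nodes, sleds_per_node, drives_per_sled):
    """Map each (node, sled, slot) drive to a disk pool id.

    Assumption for illustration: the pool id is simply the slot position
    inside the sled, so no two drives in one sled share a disk pool.
    """
    return {
        (node, sled, slot): slot
        for node in range(nodes)
        for sled in range(sleds_per_node)
        for slot in range(drives_per_sled)
    }


def drives_lost_per_pool(layout, node, sled):
    """Count, per disk pool, the drives removed when one sled is pulled."""
    return Counter(
        pool for (n, s, _slot), pool in layout.items() if n == node and s == sled
    )


layout = provision(nodes=4, sleds_per_node=5, drives_per_sled=3)
lost = drives_lost_per_pool(layout, node=0, sled=2)
print(dict(lost))  # one drive lost from each of the three disk pools
```

Because every pool loses at most one drive, a pulled sled stays within the two-drive tolerance of +2d:1n.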
With Gen 6 nodes, the ideal neighborhood size has dropped from 20 nodes, in pre-Gen 6
nodes, to 10 nodes. Decreasing the size of the neighborhoods improves reliability of data
access because it decreases the number of devices within each neighborhood. The decrease in
the size of the neighborhoods is the reason that it is recommended to use the +2d:1n,
+3d:1n1d, or +4d:2n protection levels. Use of a larger protection level, such as N+2 or N+3
(which would allow the loss of two or three whole nodes), would have detrimental effects on
the storage efficiency given that the neighborhood size is now 10-19 nodes.
To recap, this illustration shows eight like node types in a node pool. The node pool has a
single neighborhood. The neighborhood has three disk pools.

Gen 5 Disk Pool Benefits


So what benefits do we get from disk pools? By separating the drives into multiple disk pools
we increase the statistical MTTDL. MTTDL is covered in detail in a later lesson. If we assume
the default N+2d:1n protection, you can potentially lose 2 drives per disk pool without
exceeding the protection level. So, theoretically you could lose between 4 and 20 drives per
node pool depending upon the node model. Just to reinforce the point, this assumes no
more than 2 drives per disk pool have failed.
In previous versions of OneFS, as a node pool grew in size, a higher protection level was
required to meet the MTTDL standards. As we approached 20 nodes a protection level of
N+4 (tolerate a loss of 4 drives) was required. With the disk pools we can maintain the lower
protection level of N+2d:1n for most configurations and still meet the MTTDL requirements.
The result is to lower the protection overhead, which improves the cluster space usage by
reducing the number of protection stripe units stored on a node pool.
This type of disk pool provisioning allows for greatly improved reliability for large node pools,
and node pools containing larger sized drives, and allows for lower safe protection levels
than would otherwise be achievable. This improves storage efficiency.

Gen 5 Disk Pool Division


Let’s take a closer look at what happens if Diverse Genomics scales out the Gen 5 cluster
beyond 39 nodes. When the 40th node is added, the cluster automatically divides each of the
disk pools into two separate groups. Each group is simply a set of logical disk pools, with
grouping 1 spanning nodes 1-20 and grouping 2 spanning nodes 21-40. Remember, the
location and division of the groups is completely managed by OneFS. The system starts a
process that divides and balances the data between the new disk pools. FEC protection is
recalculated if necessary, that is, if the stripe size has changed. Data is then restriped
within the different disk pools. Note that the process runs until completion and can take a
very long time, possibly many days, depending upon the amount of data, available space on
the node pool, cluster usage, and the node configurations. The data remains accessible and
protected throughout the process.
When additional new nodes similar to the existing installed nodes are added, they are
added to the group with 20 nodes until the next division threshold is reached, at which time
the process for division of the disk pools is repeated.
Disk pools divide again when node 60 is added, then at 80, 100, 120, and 140. These division
points are for OneFS 8.0. Previous OneFS versions have the division points when node 41, 61,
81, 101, 121, and 141 are added.

Gen 6 Neighborhood Division


A Gen 6 node pool splits into two neighborhoods when the 20th node is added. One node
from each node pair moves into a separate neighborhood. Note that, as the illustration
shows, from the 20th node up to the 39th node no two disks in a given drive sled slot of a
node pair share the same neighborhood. Each color represents a different disk pool. The
neighborhoods split again when the node pool reaches 40 nodes. At 40 nodes, each
node within the chassis belongs to a separate neighborhood, thus ensuring that in the
event of a chassis failure, only one node from each neighborhood is lost. To maintain
protection against chassis failure as the cluster scales, the next neighborhood division
happens when the 80th node is added, and then again when the 120th node is added. Given
a protection of +2d:1n, which allows for the loss of two drives or one node, the loss of a
single chassis will not result in a data unavailable or data loss scenario. Remember, the
+2d:1n protection is per fault domain; in this case that is per neighborhood, given
that each neighborhood consists of 20 nodes or fewer.

Data Integrity


ISI Data Integrity (IDI) is the OneFS process that protects file system structures against
corruption via 32-bit CRC checksums. All Isilon blocks, both for file and metadata, use
checksum verification. Metadata checksums are housed in the metadata blocks themselves,
whereas file data checksums are stored as metadata, thereby providing referential integrity.
All checksums are recomputed by the initiator, the node servicing a particular read, on every
request.
In the event that the recomputed checksum does not match the stored checksum, OneFS
generates a system event, logs the event, retrieves and returns the corresponding FEC block
to the client, and attempts to repair the suspect data block.
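The verify-on-read flow can be sketched as follows. This is a simplified illustration and not the OneFS implementation: zlib.crc32 stands in for whatever 32-bit CRC OneFS uses, and rebuild_from_fec is a hypothetical callback representing FEC reconstruction.

```python
import zlib


def crc32(block: bytes) -> int:
    """32-bit CRC of a block (zlib's polynomial, used here only as a stand-in)."""
    return zlib.crc32(block) & 0xFFFFFFFF


def verified_read(block: bytes, stored_crc: int, rebuild_from_fec):
    """Return (data, repaired). On a mismatch, fall back to the rebuild callback."""
    if crc32(block) == stored_crc:
        return block, False
    # Checksum mismatch: serve reconstructed data and flag the block for repair.
    return rebuild_from_fec(), True


good = bytes(8192)            # an 8 KB block of zeros
stored = crc32(good)          # checksum recorded when the block was written
corrupt = b"\x01" + good[1:]  # the same block with its first byte altered

print(verified_read(good, stored, lambda: good)[1])     # False: checksum matched
print(verified_read(corrupt, stored, lambda: good)[1])  # True: rebuilt from FEC
```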

Considerations: File Striping


With a maximum of 16 data stripe units per file stripe, the maximum amount of file data in
a file stripe is 2MB (16 x 128KB). If a file does not fill the 128K stripe unit, the stripe unit
is not padded (i.e., the extra capacity is usable by the cluster).
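The arithmetic above can be worked through in code. This is a minimal sketch, assuming every file stripe reaches the maximum width of 16 data stripe units (a small node pool may use narrower stripes) and using the 8K block size described elsewhere in this guide; the function name stripe_layout is hypothetical.

```python
from math import ceil

BLOCK = 8 * 1024           # file data is divided into 8K blocks
STRIPE_UNIT = 128 * 1024   # 16 blocks make one 128K stripe unit
MAX_DATA_UNITS = 16        # maximum data stripe units per file stripe


def stripe_layout(file_size):
    """Return (blocks, data_stripe_units, file_stripes) for a file of file_size bytes."""
    blocks = ceil(file_size / BLOCK)
    units = ceil(file_size / STRIPE_UNIT)
    stripes = ceil(units / MAX_DATA_UNITS)  # assumes full-width stripes
    return blocks, units, stripes


MB = 1024 * 1024
print(stripe_layout(1 * MB))      # 8 data stripe units in one file stripe
print(stripe_layout(2 * MB))      # exactly fills one file stripe
print(stripe_layout(2 * MB + 1))  # one extra byte starts a second file stripe
```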
With Gen 5, the division points are when node 40, 60, 80, 100, 120, and 140 are added to the
node pool. Gen 6 neighborhoods divide at node 20 and then again at node 40. The split at
node 40 provides protection against peer node and chassis failures. After 40, it splits every
40 nodes so as to maintain chassis protection. Thus, at 20, 40, 80 and so on.
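As a quick planning aid, the division points can be put in a lookup. The sketch below only counts how many divisions a node pool of a given size has passed; it does not model how many disk pools or neighborhoods result. The tuples repeat the node counts given in this guide, and the helper name divisions_passed is hypothetical.

```python
from bisect import bisect_right

# Node counts at which a division occurs, as listed in this guide.
GEN5_ONEFS_8 = (40, 60, 80, 100, 120, 140)
GEN5_PRE_ONEFS_8 = (41, 61, 81, 101, 121, 141)
GEN6 = (20, 40, 80, 120)


def divisions_passed(node_count, division_points):
    """Number of division points reached by a node pool with node_count nodes."""
    return bisect_right(division_points, node_count)


print(divisions_passed(39, GEN5_ONEFS_8))      # no division yet
print(divisions_passed(40, GEN5_ONEFS_8))      # first Gen 5 division
print(divisions_passed(40, GEN5_PRE_ONEFS_8))  # older releases divide at node 41
print(divisions_passed(40, GEN6))              # Gen 6 has divided twice by node 40
```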
The file size and protection level determine capacity efficiency. Keep capacity utilization
at or below 80%, and plan for at least one additional node's worth of free capacity beyond
that.

Lesson 3: Data Protection


Upon completion of this lesson, you will be able to explain Mean Time To Data Loss (MTTDL),
illustrate OneFS protection schemes, distinguish between requested, suggested, and actual
protection, and discuss data access patterns.

Overview: Data Protection


Data protection is one of the variables used to determine how data is laid out. OneFS is
designed to withstand multiple simultaneous component failures (currently four) while still
affording access to the entire file system and data set. Data protection is implemented at the
file system level and, as such, is not dependent on any hardware RAID controllers. This
provides many benefits, including the ability to add new data protection schemes as market
conditions or hardware attributes and characteristics evolve. Because protection is applied
at the file-level, a OneFS software upgrade is all that’s required in order to make new
protection and performance schemes available.
Files smaller than 128KB are treated as small files. Due to the way in which OneFS applies
protection, small files are mirrored.
A cluster can have multiple protection levels enabled. OneFS supports protection levels that
offer a degree of protection where up to four drives, nodes or a combination of both can fail
without data loss. This might be too much protection overhead for many environments, but
it illustrates the protection options available. The requested protection can be set by the
default system setting, at the node pool level, per directory, or per individual file. Shown in
the screen capture from the web administration interface is the requested protection at the
node pool level.
FEC uses erasure coding. Erasure codes encode the file's data in a distributed set of symbols,
adding space-efficient redundancy. With only a part of the symbol set, OneFS can recover
the original file data.

Qualifying Questions


All parts of a file are written into the same node pool and are contained in a single disk pool.
The maximum number of drives for streaming is six drives per node per node pool per file.

Data Protection Terms


MTTDL is a statistical calculation that estimates the likelihood of a hardware failure resulting
in data loss. Basically, MTTDL deals with how long you can go without losing data. Because
there are so many disk drives in a large Isilon installation, it’s common for a drive to be down
at one time or another. Where other systems try to harden against failures, Isilon was
designed to accommodate them. It was built with the expectation that any device could fail
at any point in time. MTTDL is a system view of reliability and asks the question “What
happens when hardware does fail, and will I lose any data when it does?”
Due to the variety of parameters and features in the Gen 6 hardware, the MTTDL calculation
is replaced with an MTTDL simulator that performs the reliability calculations. The
reliability results from these simulations are equal to or higher than the previous MTTDL
reliability and will ensure smooth, efficient, and reliable operations and data protection of
the Gen 6 platform.
As discussed, disk pools improve MTTDL because they create more limited failure domains,
improving the statistical likelihood of tolerating failures over the lifetime of the equipment:
The model predicts that MTTDL is greater than 5,000 years.
We’ll note that MTBF (mean time between failures) refers to individual component failure.
Isilon subscribes to the ‘all devices will fail’ philosophy (MTTDL), whereas MTBF is a
single-component view of reliability. MTTDL is a better measure of what customers actually
care about.

Quorum is important for anticipating failure scenarios. For a quorum, more than half the
nodes must be available over the internal, back-end network. A seven-node Gen 4/5 cluster,
for example, requires a four-node quorum. A 10-node Gen 4, Gen 5, or Gen 6 cluster
requires a six-node quorum. Imagine a cluster as a voting parliament where the simple
majority wins all votes. If 50% or more of the members are missing, there can be no vote.
Reads may occur, depending upon where the data lies on the cluster but for the safety of
new data, no new information will be written to the cluster. So, if a cluster loses its quorum,
the OneFS file system becomes read-only and will allow clients to access data but not to
write to the cluster.
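The quorum rule is simple majority arithmetic, sketched below. The function names are hypothetical; only the "more than half" rule comes from the text.

```python
def quorum_size(total_nodes):
    """Smallest number of nodes that is more than half of the cluster."""
    return total_nodes // 2 + 1


def is_writable(total_nodes, available_nodes):
    """A cluster without quorum is read-only."""
    return available_nodes >= quorum_size(total_nodes)


print(quorum_size(7))      # seven-node cluster needs four nodes
print(quorum_size(10))     # ten-node cluster needs six nodes
print(is_writable(10, 5))  # exactly half missing: no quorum, read-only
```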
Each protection level requires a minimum number of nodes. For example, N+2n needs a
minimum of five Gen 4/5 nodes or six Gen 6 nodes. Why? You can lose two nodes and
still have three Gen 4/5 nodes or four Gen 6 nodes up and running, which is greater than
50%. You must keep quorum to keep the cluster writable.
You can protect your data using anywhere from 2-8x mirroring, depending on the
importance of the data and what is considered acceptable protection overhead. Because
mirrored data creates exact duplicates, it consumes more space and overhead than the
other protection schemes.
N+Mn illustrates the primary protection level in OneFS. The capital “N” represents the
number of data stripe units, and the capital “M” with lowercase “n” represents the number of
simultaneous drive (“M”) or node (“n”) failures that can be tolerated without data loss. M
also represents the number of protection, or FEC, stripe units created to meet the failure
tolerance requirements. The available N+Mn Requested Protection levels are plus one, two,
three, or four “n” (+1n, +2n, +3n, and +4n). With N+Mn protection, only one stripe unit is
written to a single drive on each node.
The minimum number of nodes required in the node pool for each requested protection
level is displayed in the chart. Note that Gen 6 hardware is only scaled out in node pairs,
thereby increasing the minimum node pool size.
If N equals M, the protection overhead is 50 percent. For example, with N+2n, a 256KB
file will have a 50% protection overhead (256KB = 2 stripe units). N must be greater than
M to gain efficiency from the data protection. If N is less than M, the protection results in a
level of FEC-calculated mirroring.
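The overhead figure follows directly from the ratio of FEC stripe units to total stripe units. A minimal sketch, with a hypothetical function name:

```python
def protection_overhead(data_units, fec_units):
    """Fraction of a protection stripe that holds FEC rather than file data."""
    return fec_units / (data_units + fec_units)


print(protection_overhead(2, 2))   # N equals M: 256KB file at N+2n
print(protection_overhead(8, 2))   # N greater than M: overhead drops
print(protection_overhead(16, 2))  # widest stripe at M = 2
```

When N is less than M, for example one data stripe unit with two FEC units, two of every three units are protection, which is the same space cost as keeping three instances of the data.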
Recall that the disk pools provide drive failure isolation zones for the node pool. The
number of sustainable drive failures is counted per disk pool, on separate nodes. Multiple
drive failures on a single node are equivalent to a single node failure. The drive loss
protection level is applied per disk pool.
N+Md:Bn Protection
The best way to read this protection level is to remember that the lowercase “d” indicates
the number of DRIVES and the lowercase “n” the number of NODES.
So N+3d:1n reads as N+3 drives or 1 node. In this protection level, M is the number of drives
per node onto which a stripe unit is written. M is also the number of FEC stripe units per
protection stripe. If you need to be able to tolerate 3 drives failing, then you need to
write 3 pieces of FEC across 3 separate drives. N+Md:Bn utilizes multiple drives per node as
part of the same data stripe and has multiple stripe units per node. This protection
level lowers the protection overhead by increasing the size of the protection stripe. This
protection scheme simulates a larger node pool by utilizing multiple drives per node. The
single protection stripe spans the nodes and each of the included drives on those nodes.
N+Md:Bn Advanced Protection
In addition to the previous N+Md:Bn, there are two advanced forms of Requested Protection,
N+3d:1n1d and N+4d:2n. M represents the number of FEC stripe units per protection stripe.
However, the number of drives per node and the number of stripe units per node is set at
two. The number of stripe units per node does not equal the number of FEC stripe units per
protection stripe. The benefit of the advanced N+Md:Bn protection levels is that they provide
a higher level of node loss protection. As previously stated, the higher protection provides
the extra safety during data rebuilds associated with the larger drive sizes of 4TB and 6TB.
The maximum number of data stripe units is 15 and not 16 when using +3d:1n1d Requested
Protection.
N+3d:1n1d includes three FEC stripe units per protection stripe, and provides protection for
three simultaneous drive losses, or one node and one drive loss. N+4d:2n includes four FEC
stripe units per stripe, and provides protection for four simultaneous drive losses, or two
simultaneous node failures.
Actual Protection Nomenclature
The actual protection nomenclature is represented differently than requested protection
when viewing the output showing actual protection from the isi get -D or isi get -DD
command. The output displays the number of data stripe units plus the number of FEC
stripe units divided by the number of disks per node to which the stripe is written. The chart
displays the representation for the requested protection and the actual protection. N is
replaced in the actual protection with the number of data stripe units for each protection
stripe. If there is no / in the output, it implies a single drive per node. Mirrored file protection
is represented as 2x to 8x in the output.
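The notation described above can be parsed mechanically. The sketch below is an illustration built only from that description: the sample strings are hypothetical values, not captured isi get output, and parse_actual_protection is not an Isilon-provided function.

```python
import re
from typing import NamedTuple, Optional


class ActualProtection(NamedTuple):
    data_units: Optional[int]       # data stripe units per protection stripe
    fec_units: Optional[int]        # FEC stripe units per protection stripe
    drives_per_node: Optional[int]  # 1 when the output has no "/"
    mirrors: Optional[int]          # set only for 2x to 8x mirroring


_FEC = re.compile(r"^(\d+)\+(\d+)(?:/(\d+))?$")
_MIRROR = re.compile(r"^([2-8])x$")


def parse_actual_protection(text):
    """Parse strings such as '8+2/2', '16+2' or '3x' (illustrative formats)."""
    text = text.strip()
    mirror = _MIRROR.match(text)
    if mirror:
        return ActualProtection(None, None, None, int(mirror.group(1)))
    fec = _FEC.match(text)
    if not fec:
        raise ValueError(f"unrecognized protection string: {text!r}")
    drives = int(fec.group(3)) if fec.group(3) else 1
    return ActualProtection(int(fec.group(1)), int(fec.group(2)), drives, None)


for sample in ("8+2/2", "16+2", "3x"):
    print(sample, parse_actual_protection(sample))
```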
Overhead for Protection Levels
The overhead for each protection level depends on the file size, and the number of nodes in
the cluster. The percentage of protection overhead declines as the cluster gets larger. In
general, N+1n protection has a protection overhead equal to one node’s capacity, N+2n
protection has a protection overhead equal to two nodes' capacity, N+3n would be three
nodes, and so on.
OneFS supports data mirroring. Data mirroring requires significant storage overhead and
may not always be the best data-protection method. For example, if you enable 3x mirroring,
the specified content is explicitly duplicated three times on the cluster; depending on the
amount of content being mirrored, this can require a significant amount of capacity.
The table displayed indicates the relative protection overhead associated with each FEC
Requested Protection level available in OneFS. Indicators include when the FEC protection
would result in mirroring.
All Protection Levels
OneFS provides +1n through +4n protection levels, providing protection against one to four
simultaneous component failures, respectively. A single failure can be as little as an
individual disk or, at the other end of the spectrum, an entire node. This chart provides an
easy reference for all of the protection levels. As highlighted, with Gen 6, using +2d:1n,
+3d:1n1d, or +4d:2n is recommended for better reliability, better efficiency, and simplified
protection. Remember that Gen 6 requires a minimum of 4 nodes of the same type, so
where a minimum of three nodes is indicated, for Gen 6 this is four. With mirroring, the
cluster can recover from N - 1 drive or node failures without sustaining data loss, where N
is the number of mirrors. For example, 4x protection means that the cluster can recover
from three drive or three node failures.
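The mirroring rule can be expressed as a two-line calculation. A minimal sketch with hypothetical function names; the raw capacity figure simply multiplies the logical size by the number of instances kept.

```python
def mirror_failures_tolerated(mirrors):
    """An Nx mirror survives N - 1 drive or node failures."""
    if not 2 <= mirrors <= 8:
        raise ValueError("OneFS mirroring ranges from 2x to 8x")
    return mirrors - 1


def mirror_raw_capacity(logical_size, mirrors):
    """Raw capacity consumed when every block is stored `mirrors` times."""
    return logical_size * mirrors


print(mirror_failures_tolerated(4))  # 4x tolerates three failures
print(mirror_raw_capacity(10, 3))    # 10 TB of data at 3x consumes 30 TB raw
```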


No Quorum

Yes Quorum

Use Case: N+2n vs N+2d:1n


Let’s take a look at a use case to help clarify how N+2d:1n is more efficient than N+2n. Using
a 1MB file, there are 8 data stripe units to write in the file stripe (8 x 128K). The desired
protection will sustain the loss of two hard drives.
In a 5 node cluster using N+2n protection, the 1MB file would be placed into 3 separate file
stripes, each with 2 protection stripe units. A total of 6 protection stripe units are required to
deliver the requested protection level (2 disks or 2 nodes) for the 8 data stripe units. The
protection overhead is 43 percent (6 of 14 stripe units). Using N+2d:1n protection, the same
1MB file requires 1 file stripe that is 2 drives wide per node, and only 2 protection stripe
units. The 10 stripe units are written to 2 different drives per node. The protection overhead
is 20% (2 of 10 stripe units), the same as a 10 node cluster at N+2n protection.
Note that higher protection levels linearly impact utilization for large files. As an example, a
10 node X410 cluster at N+2n results in 20% protection overhead, whereas the same cluster
at N+3n results in 30% protection overhead.
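The figures in this use case can be reproduced with a short calculation. This is a simplified model under stated assumptions: a stripe is as wide as the nodes and drives per node allow (capped at 16 data stripe units), every stripe carries the full set of FEC units, and the function name fec_layout is hypothetical.

```python
from math import ceil

MAX_DATA_UNITS = 16  # maximum data stripe units per file stripe


def fec_layout(data_units, nodes, fec_per_stripe, drives_per_node=1):
    """Return (file_stripes, fec_units, overhead) for a simplified FEC layout."""
    width = nodes * drives_per_node  # stripe units available per stripe
    data_per_stripe = min(width - fec_per_stripe, MAX_DATA_UNITS)
    if data_per_stripe < 1:
        raise ValueError("node pool too small for this protection level")
    stripes = ceil(data_units / data_per_stripe)
    fec_units = stripes * fec_per_stripe
    overhead = fec_units / (data_units + fec_units)
    return stripes, fec_units, overhead


# 1MB file = 8 data stripe units on a 5 node cluster
print(fec_layout(8, nodes=5, fec_per_stripe=2))                     # N+2n
print(fec_layout(8, nodes=5, fec_per_stripe=2, drives_per_node=2))  # N+2d:1n
# Full-width stripes on a 10 node cluster
print(fec_layout(8, nodes=10, fec_per_stripe=2))                    # N+2n
print(fec_layout(7, nodes=10, fec_per_stripe=3))                    # N+3n
```

The first call gives three stripes and six FEC units (about 43% overhead), while the second gives one stripe and two FEC units (20%), matching the use case.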

Actual Protection Applied to File


In OneFS, the actual protection applied to a file depends on the requested protection level,
the size of the file, and the number of nodes in the node pool. Actual protection must meet
or exceed the requested protection level but may be laid out differently than the requested
protection default layout. For example, if you have a requested protection of N+2d:1n and
there is a 2MB file and a node pool of at least 18 nodes, the file is actually laid out as N+2n.
Also, if you have a small file of 128KB or less, the file is actually protected using 3x mirroring.
In both cases, the minimum drive loss protection of 2 drives and node loss protection of 1
node are exceeded by the actual protection applied to the file. The exception to meeting the
minimum requested protection is if the node pool is too small and unable to support the
requested protection minimums. For example, consider a node pool with 3 nodes that is set
to N+4n requested protection: the maximum supported protection is 3x mirroring in this
scenario.

Mirrored Data Protection


With mirrored data protection, the protection blocks are copies (or mirrors) of the original
set of data blocks. OneFS can employ 2x to 8x mirrored protection, meaning a 4x mirror
stores 4 instances of the data across the cluster (the original plus 3 copies). By
default, mirroring protects the file’s metadata and some system files that exist under /ifs in
hidden directories.
Mirroring can be explicitly set as the requested protection level. One particular use case is
where the system is used to store only small files, that is, files less than 128KB. Some
workflows store millions of 1KB to 4KB files. Explicitly setting the requested protection to
mirroring saves fractions of a second per file and reduces the write ingest time for the files.
Under certain conditions, mirroring is set as the actual protection on a file even if another
requested protection level is specified. If the files are small, the FEC protection for the file
results in a mirror. The number of mirrored copies is determined by the loss protection
requirements of the requested protection. Mirroring is also used if the node pool is not large
enough to support the requested protection level.
Protection policies have varying impacts on performance: FEC requires read-modify-write
(RMW) operations and additional CPU resources, whereas mirroring requires neither.

Storage Pool Protection Setting


The default file pool policy protection setting is to use the node pool or tier setting.
Requested protection is set per node pool. When a node pool is created, the default
requested protection applied to the node pool is +2d:1n.
The required minimum requested protection for an HD-Series node pool is +3d:1n1d. You
are required to modify the HD-Series node pool requested protection to meet this minimum.
The requested protection should meet the minimum requested protection level for the node
pool configuration. The minimum is based on MTTDL calculations for the number of nodes
and the drive configuration in the nodes. If the requested protection requires modification,
the screen capture shows the File System > Storage Pools > SmartPools page where the
node pool requested protection is modified.

Suggested Protection


When a node pool is below the MTTDL standards, the data is at risk. This doesn’t mean data
loss will occur, but it does indicate the data is below the MTTDL standards, and anything
that puts data at risk should be avoided. Based on the configuration of your
Isilon cluster, OneFS automatically calculates the amount of protection that is recommended
to maintain the cluster’s stringent data protection requirements. Suggested protection refers
to the visual status and CELOG event notification for node pools that are set below the
calculated suggested protection level. The suggested protection is based on meeting the
minimum MTTDL standard for EMC Isilon node pools. The notification doesn’t give the
suggested setting and node pools that are within suggested protection levels are not
displayed. As shown in the web administration interface (File System > Storage Pools >
Summary page), the suggested protection is part of the SmartPools health status reporting.
When a new node pool is added to a cluster or the node pool size is modified, the suggested
protection level is calculated and the MTTDL calculations are compared to a database for
each node pool. The sizing tool is used to determine appropriate node pool sizing for a
customer workflow, and calculates the appropriate suggested protection levels based on the
node pool size and node configuration.
What commonly occurs is that a node pool starts small and then grows beyond the configured
requested protection level. The once adequate +2d:1n requested protection level is no
longer appropriate, but is never modified to meet the increased MTTDL requirements. The
suggested protection feature provides a method to monitor and notify users when the
requested protection level should be changed.

IO Optimization: Data Access Patterns


Data access patterns are another variable used to determine how data is laid out. Ultimately,
the system’s job is to lay data out in the most efficient, economical, highest performing way
possible. You can manually define some aspects of how it determines what is best, but the
process is designed to be automated.
Concurrency is used to optimize workflows with many concurrent users accessing the same
files. The preference is that each protection stripe for a file is placed on the same drive or
drives, depending on the requested protection level. For example, for a larger file with 20
protection stripes, each stripe unit from each protection stripe would prefer to be placed on
the same drive in each node. Concurrency is the default data access pattern. Concurrency
influences the prefetch caching algorithm to prefetch and cache a reasonable amount of
anticipated associated data during a read access.
Streaming is used for large streaming workflow data such as movie or audio files. Streaming
prefers to use as many drives as possible when writing multiple protection stripes for a file.
Each file is written to a disk pool within the node pool. With a streaming data access pattern,
the protection stripes are distributed across the disk pool drives. This maximizes the number
of active drives per node as the streaming data is retrieved. Streaming also influences the
prefetch caching algorithm to be highly aggressive and gather as much associated data as
possible.
A random access pattern prefers using a single drive per node for all protection stripes for a
file, just like a concurrency access pattern. With random, however, the prefetch caching
request is minimal. Most random data does not benefit from prefetching data into cache.

Storage Pool Features


Virtual Hot Spare:
VHS allocation enables you to allocate space to be used for data rebuild in the event of a
drive failure. This feature is available with both the licensed and unlicensed SmartPools
module. By default, all available free space on a node pool is used to rebuild data. The virtual
hot spare option reserves free space for this purpose. VHS provides a mechanism to assure
there is always space available and to protect data integrity in the event of overuse of cluster
space. Another benefit to VHS is it can provide a buffer for support to repair nodes and node
pools that are overfilled. You can uncheck the Deny data writes to reserved disk space
setting and use the space for support activities.
As an example of using the virtual hot spare (VHS) option, if you specify two virtual drives
or 3 percent, each node pool reserves virtual drive space that is equivalent to two drives or
3 percent of its total capacity for virtual hot spare, whichever is larger. You can reserve
space in node pools across the cluster for this purpose, equivalent to a maximum of four full
drives. If you select the option to reduce the amount of available space, free-space
calculations exclude the space reserved for the virtual hot spare. The reserved virtual hot
spare free space is used for write operations unless you select the option to deny new data
writes. VHS is calculated and applied per node pool across the cluster.
VHS reserved space allocation is defined using these options:
• A minimum number of virtual drives in each node pool (1-4)
• A minimum percentage of total disk space in each node pool (0-20 percent)
• A combination of minimum virtual drives and total disk space. The larger number of
the two settings determines the space allocation, not the sum of the numbers. If you
configure both settings, the enforced minimum value satisfies both requirements.
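The "larger of the two settings" rule can be sketched as a small calculation. This is an illustration with hypothetical names and units; it assumes all drives in the node pool are the same size and treats a value of 0 as "setting not configured".

```python
def vhs_reserved(pool_capacity, drive_size, min_virtual_drives=0, min_percent=0.0):
    """Space reserved for virtual hot spare in one node pool.

    The larger of the two settings wins; they are not added together.
    """
    if not 0 <= min_virtual_drives <= 4:
        raise ValueError("virtual drives must be between 0 (unset) and 4")
    if not 0 <= min_percent <= 20:
        raise ValueError("percentage must be between 0 and 20")
    by_drives = min_virtual_drives * drive_size
    by_percent = pool_capacity * min_percent / 100
    return max(by_drives, by_percent)


# Capacities in TB: two virtual drives or 3 percent, whichever is larger
print(vhs_reserved(400, 4, min_virtual_drives=2, min_percent=3))  # 3% wins
print(vhs_reserved(100, 4, min_virtual_drives=2, min_percent=3))  # drives win
```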
It is recommended you use the default settings enabling VHS, ignoring reserved space for
free space calculations, and deny writes to reserved space. The recommended space
allocation setting varies by customer. A safe setting would be At least 2 virtual drive(s).
As a support note, if the Ignore reserved space and Deny data writes options are enabled, it
is possible for the reported file system use percentage to be over 100%.
Global Spillover:
The Enable global spillover and Spillover Data Target options configure how OneFS handles a
write operation when a node pool is full. With the licensed SmartPools module, a customer
can direct data to spill over to a specific node pool or tier group of their choosing. If
spillover is not desired, then you can disable spillover so that a file will not move to
another node pool.
Virtual hot spare reservations can affect when spillover occurs. For example, if the
virtual hot spare reservation is 10 percent of storage pool capacity, spillover occurs if the
storage pool is 90 percent full.
Global Namespace Acceleration:
The purpose of GNA is to accelerate the performance of metadata-intensive applications and
workloads such as home directories, workflows with a heavy enumeration and activities
requiring a large number of comparisons. Example of metadata-read-heavy workflows exist
across the majority of Isilon's established and emerging markets. In some, like EDA, such
workloads are dominant and the use of SSDs to provide the performance they require is
GNA enables SSDs to be used for cluster-wide metadata acceleration and the use of SSDs in
one part of the cluster to store metadata for nodes that have no SSDs. For example, if you
have ten S-Series nodes with SSD drives and three NL nodes that do not have SSD drives,
you can accelerate the metadata for the data that resides on the NL nodes by using GNA to
store metadata on the SSD drives that sit inside the S-Series nodes. The result is that
critical SSD resources are maximized to improve performance across a wide range of
workflows. Global namespace acceleration can be enabled if 20% or more of the nodes in
the cluster contain SSDs and 1.5% or more of the total cluster storage is SSD-based. The
recommendation is that at least 2.0% of the total cluster storage is SSD-based before
enabling global namespace acceleration. If you go below the 1.5% SSD total cluster space
capacity requirement, GNA is automatically disabled and all GNA metadata is disabled. If you
SmartFail a node containing SSDs, the SSD total size percentage or node percentage
containing SSDs could drop below the minimum requirement and GNA would be disabled.
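The thresholds above can be checked with a few lines of arithmetic. The sketch below is only an illustration of the 20% / 1.5% / 2.0% figures quoted in the text; the function name and the capacity numbers in the usage note are hypothetical, and the Isilon Sizing Tool remains the authoritative check.

```python
def gna_status(nodes_total, nodes_with_ssd, ssd_tb, total_tb):
    """Classify a cluster against the GNA thresholds quoted in the text."""
    node_share = nodes_with_ssd / nodes_total   # fraction of nodes that contain SSDs
    ssd_share = ssd_tb / total_tb               # fraction of total capacity that is SSD
    if node_share < 0.20 or ssd_share < 0.015:
        return "not eligible"
    if ssd_share < 0.02:
        return "eligible, below recommended 2%"
    return "eligible"
```

For the ten S-Series plus three NL example, assuming a hypothetical 10 TB of SSD in 400 TB of total capacity, `gna_status(13, 10, 10, 400)` reports "eligible". Re-running the check after a planned SmartFail shows whether the cluster would drop below the minimum.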

Virtual Hot Spare

Global Spillover

Global Namespace Acceleration

Examples: Access Pattern


The process of striping spreads all write operations from a client across the nodes of a
cluster. The example in this animation demonstrates how a file is broken down into chunks,
after which it is striped across disks in the cluster along with forward error correction (FEC).
The file is divided into 8K blocks that are written into 128K stripe units. Even though a client
is connected to only one node, when that client saves data to the cluster, the write operation
occurs in multiple nodes in the cluster. This is also true for read operations. A client is
connected to only one node at a time, however, when that client requests a file from the
cluster, the node to which the client is connected will not have the entire file locally on its
drives. The client’s node retrieves and rebuilds the file using the back-end network.
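To make the block and stripe unit sizes concrete, the sketch below counts how many 8KB blocks and 128KB stripe units a file of a given size occupies. It only models the division described above; it does not model FEC placement or which nodes receive each stripe unit, and the function name is illustrative.

```python
import math

BLOCK_KB = 8          # block size stated in the text
STRIPE_UNIT_KB = 128  # stripe unit size stated in the text


def layout(file_kb):
    """Return (blocks, stripe_units) needed to hold a file of file_kb kilobytes."""
    blocks = math.ceil(file_kb / BLOCK_KB)
    blocks_per_unit = STRIPE_UNIT_KB // BLOCK_KB   # 16 blocks fill one stripe unit
    stripe_units = math.ceil(blocks / blocks_per_unit)
    return blocks, stripe_units
```

A 2MB file (2048KB) therefore occupies 256 blocks in 16 stripe units, before any protection is added.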
This illustrates the simplest example; the examples that follow show other layouts.
Gen 5: Concurrency:
Diverse Genomics will be using the default protection for their home directory files. Shown
here is how a 2MB file is laid out using concurrency on the cluster. The 2MB file is divided
into 2 file stripes and with a N+2d:1n protection level, each file stripe has two FEC units.
N+2d:1n protects against two drive failures or one node failure. Protection schemes such as
N+2d:1n and N+3d:1n are particularly useful for high-density node configurations, where
each node contains up to thirty-six multi-terabyte SATA drives. Here, the probability of
multiple drives failing far surpasses that of an entire node failure. In the unlikely event that
multiple devices have simultaneously failed, such that the file is “beyond its protection level”,
OneFS will re-protect everything possible and report errors on the individual files affected to
the cluster’s logs.
Gen 5: Streaming:
Now let’s see how a 2MB file is laid out using a streaming access pattern on the Diverse
Genomics cluster. The default N+2d:1n protection level is used. Each file stripe has a single
FEC unit. Remember that streaming, which suits workflows such as video, prefers to use as
many drives as possible.
The data layout on the slide is for illustrative purposes only and does not address the
concept of disk pools as discussed earlier. A file would write only within its disk pool and so
the data layout for this file would be constrained to the disk pool.
Gen 6: Access Patterns:
There is no difference in the way the stripe and protection units are written in Gen 6
hardware. This example shows the drive sleds and drives for an H600 chassis with four
nodes. Each of the three colors represents a different disk pool. Here we’ll show a 3MB file
written to the disk pool in blue, which encompasses the first drive in each sled on each node.
Note that the data is written the same whether the access pattern is concurrency or streaming.

Gen 5: Concurrency

Gen 5: Streaming

Gen 6: Access Patterns

Tiering vs Independent Clusters


For archive and large capacity clusters, it’s good to consider whether a single large cluster is
the right solution. Many clusters don’t need namespace acceleration under ordinary
workload, but internal processes such as file system analysis or a rebuild may still take a lot
of time. Depending on the comfort level of the customer, it may be more efficient to split the
archive and the working production data into separate clusters. Separate clusters might also
provide a workaround if the back-end cables are not long enough to handle all nodes in one
cluster, although new nodes with QSFP InfiniBand adapters support fibre cables up to 100
meters. There may also be a case for separate clusters if SEC regulations apply to particular
sets of data: the customer may want one compliant cluster and one Enterprise cluster that
does not have the WORM compliance challenges.

Sizing for Data Protection


Once a pre-Gen 6 cluster grows beyond 6 or 7 nodes, the protection level should be
increased to at least N+2n from the default N+2d:1n in order to decrease the risk of data
loss. At N+2d:1n, as soon as one node fails, customers become nervous, fearing that one
more failure means data loss. But at N+2n, you have some buffer and a chance to fix the
first failure without the threat of imminent data loss. You can remove the failed node from
the cluster without fear. Remember the recommended protection for Gen 6 is N+2d:1n,
N+3d:1n1d or N+4d:2n.
Adding extra protection can affect performance. One easy way to solve this trade-off is
simply to add another node.
In practice, the following steps should be followed:
1. Establish physical configurations (with policies) that satisfy the design MTTDL
requirement. The sizing tool will generally only produce configurations that satisfy
data protection requirements.
2. For these options, evaluate the capacity impacts of data protection choices.
3. Determine whether protection policy or pool configurations impact performance.
4. Determine a configuration that satisfies all requirements.

Sizing Tips:
When sizing, make a habit of checking the Isilon Sizing Tool as a starting point or a double-check.
The default protection level works well on smaller clusters but as you increase the number
of nodes, you are also increasing the chance that one of the nodes will fail. As a cluster gets
larger, the protection level should be increased to accommodate multiple failures, hence a
20 node cluster should be able to withstand losing 3 full nodes or 3 drives, at N+3. Using N+1
or N+2d:1n will not protect as efficiently for large clusters. With a 20 node cluster, the
overhead for going from N+2d:1n to N+2n is relatively small - a mere 5 or 10% of capacity
invested in return for much better resiliency. In a 4-node cluster, you go from 25% overhead
to 50% overhead. Isilon was built to scale, and gets better as the array gets larger.
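The overhead figures above can be reproduced with a simplified model in which each stripe spans every node in the pool. This is an assumption made for illustration: it ignores the maximum stripe width that OneFS applies and treats N+2d:1n as striping across two drives per node. The Sizing Tool remains the authoritative source for real configurations.

```python
def fec_overhead(nodes, fec_units, drives_per_node=1):
    """Fraction of each stripe spent on FEC, assuming the stripe spans all nodes."""
    stripe_width = nodes * drives_per_node
    if fec_units >= stripe_width:
        raise ValueError("stripe is too narrow for this protection level")
    return fec_units / stripe_width


for nodes in (4, 20):
    n2d1n = fec_overhead(nodes, 2, drives_per_node=2)   # N+2d:1n
    n2n = fec_overhead(nodes, 2)                        # N+2n
    print(f"{nodes} nodes: N+2d:1n {n2d1n:.0%}, N+2n {n2n:.0%}")
```

Under these assumptions a 4-node cluster moves from 25% to 50% overhead, while a 20-node cluster moves from 5% to 10%, which is why the stronger protection level is comparatively cheap on larger clusters.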
CPU speed on archival type nodes is slower than other node types designed for workflows
requiring more performance. Thus, in the event of a failure, the rebuild time on archival type
nodes is greater. This is why for large archive clusters, the more fault-tolerant protection
level of N+3 is recommended.
Single Cluster:
Some workflows require 100% uptime. Many companies (especially in the Media and
Entertainment market) accomplish this by using two clusters at 50% utilization. Then, for
maintenance, such as a firmware update, they can redirect the workflow temporarily to one
cluster while they manage the other.
Protection Sizing:
The Isilon Sizing tool will ensure configurations comply with design MTTDL considerations.
Always verify your configurations using the Sizing tool. When considering the size of the
cluster, realize that different protection options influence capacity and performance due to
the overhead and the writes of the protection. Small clusters are constrained to certain
protection options due to the concept of quorum. Mirrored (e.g., 2x) policies are useful for
small files and random access workloads, due to increased performance. Mirroring
policies improve performance because mirroring consumes less CPU; the
system doesn’t need to read/modify/write, it just overwrites data as needed.

Sizing Tips

Single Cluster

Protection Sizing

Considerations: Data Protection


As the cluster scales, the default protection may need to be adjusted. You may not want to
apply a higher protection to the entire cluster. Although this is better protection, it’s less
By default, the suggested protection feature is enabled on new clusters. On cluster
upgrades, the feature is disabled by default. This is by design because a field review and
customer discussion is necessary to mitigate any concerns and to fully explain the suggested
protection feature before it is turned on. Some customer node pools may be below the
suggested protection level and, although important to meet MTTDL, it is not a critical
situation. The discussion consists of the impact on protection overhead, any potential
workflow impacts, and an assessment of any risk. After the discussion, the feature can be
enabled using a non-customer facing command. Customers should contact their EMC Isilon
account team to arrange a field review.
Higher protection levels will heavily impact utilization for small files. Remember that OneFS
considers a small file to be 128KB or less, and with the N+2d:1n default protection, small
files are mirrored at 3x. Using a mirrored scheme makes sense if all files are small: the
capacity utilization is unchanged, but performance increases.
As protection increases, performance decreases, because the system is doing more work to
calculate and stripe the protection data. Thus, the same cluster at N+3 will perform slower
than it does at N+2. For example, N+3 has approximately 5% less performance than N+2 for
sequential writes. Again, often the answer to performance slow-downs caused by raising
protection levels is to add an extra node. Note that in some workflows that desire both high
performance and high protection, it may be an option to ingest data at high speeds, then
subsequently - even immediately - move it to another tier that focuses less on performance
and more on protection.

Considerations: Data Protection (cont'd)


The default protection level for a cluster is N+2d:1n. This protection level works well on
smaller clusters but as you increase the number of nodes, you are also increasing the
chance that one of the nodes will fail. Remember, as a cluster gets larger, the protection
level should be increased to accommodate multiple failures.
For a large NL cluster, bear in mind that the CPU on an NL node is slower than on other node types.
The Sizing Tool defaults to a more fault tolerant protection level of N+3 when you specify
large NL clusters.
Some workflows store millions of 1KB to 4KB files. Explicitly setting the requested protection
to mirroring can save fractions of a second per file and reduce the write ingest time for the
N+3d:1n1d is suggested for node pools with larger drives, minimum for node pools with 6TB

Gen 6 supports all the data protection levels used by the previous generations. Because of
the decrease in Gen 6 neighborhood size, it is recommended to use the N+2d:1n,
N+3d:1n1d or N+4d:2n protection levels. Use of a larger protection level, such as N+2 or N+3
(which would allow the loss of two or three whole nodes), would have detrimental effects on
the storage efficiency given that the neighborhood size is now 10-19 nodes.
The maximum number of drives for streaming is six drives per node per node pool per file.
Data sets can be protected with different policies via SmartPools file pool policies and
manually at the directory and file levels, so take into account any repositories that the
customer might want protected at a higher level than the cluster default.

Lesson 4: Working with Small Files


After completing this lesson, you will be able to define small files in OneFS, illustrate actual
protection applied to files smaller than 128KB, and describe application of calculated FEC and FEC

Small File Size: Under 128KB


In OneFS the definition of a small file varies, but it often refers to a file less than one stripe
unit in length, or 128KB or less. Small files result in the protection being mirrored. When FEC
protection is calculated, it is calculated at the 8KB block level. If there is only one 8KB block to use
in the calculation, the result is a mirror of the original data block. The number of mirrored
blocks is determined by the requested protection level. The table illustrates a 64KB file with
the protection level set at N+2d:1n. Note that with this protection level, a 3x mirror is applied
(the original data plus two mirrored copies). The result is that the 64KB file consumes 192KB
of storage.
Since small files are a single stripe unit and not related to other stripe units, there is no
benefit, or at best minimal benefit, from read or write cache. The use of L3 cache can improve
chances of gaining a cache benefit for repeat random reads. In other words, the same small
file read multiple times could benefit from L3 cache. For many workflows this occurs frequently.
If the workflow is predominantly small files, setting the access pattern to random can reduce
unnecessary cluster resource utilization used when predicting cache data. If the workflow
data is going to be all small files, CPU resources can be saved by setting the requested
protection level as mirrored protection.
Warning: All files managed by the setting will be mirrored regardless of file size. Be selective
in the use, and use only when appropriate.

Small File Size: Over 128KB and Less Than 256KB


If you have files greater than 128KB and less than 256KB some of the FEC blocks will result in
mirrors. Not all 8KB blocks will have a corresponding block in the second data stripe to
calculate FEC against. The table illustrates an example of a 176KB file. Notice the file has one
128KB stripe unit and one 48KB stripe unit. The first six 8KB blocks of each stripe unit will
calculate FEC results. The remaining ten 8KB blocks will result in mirrored protection. This is
still a small file and might have some caching benefit, but very little. L3 cache will recognize
this file size and enable repeat random read caching. Setting a random access pattern may
be appropriate depending on the workflow.
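The 176KB example can be worked through in code. The sketch below is a simplified model covering only files that fit in at most two stripe units: a block position that exists in both stripe units is counted as FEC-protected, and a position that exists only in the first stripe unit is counted as mirrored. The function name and return shape are illustrative.

```python
BLOCK_KB = 8
STRIPE_UNIT_KB = 128


def fec_vs_mirrored(file_kb):
    """Return (first_unit_blocks, second_unit_blocks, fec_positions, mirrored_blocks)."""
    if file_kb > 2 * STRIPE_UNIT_KB:
        raise ValueError("this sketch only covers files up to 256KB")
    blocks = -(-file_kb // BLOCK_KB)              # round up to whole 8KB blocks
    per_unit = STRIPE_UNIT_KB // BLOCK_KB         # 16 blocks per stripe unit
    first = min(blocks, per_unit)
    second = blocks - first
    fec_positions = min(first, second)            # positions present in both units
    mirrored = first - fec_positions              # positions with no partner block
    return first, second, fec_positions, mirrored
```

For a 176KB file this gives 16 blocks in the first stripe unit, 6 in the second, 6 block positions with FEC calculated, and 10 blocks that fall back to mirroring, matching the figures in the text.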

Calculating Space for Small File


With OneFS, only the 8KB blocks required to save the file are used. 8KB is the minimum block size
used. 8KB was chosen for storage efficiencies and was determined to be the optimal size for
the majority of the workflows on Isilon. Any file or portion of a file less than 8KB will
consume an 8KB block. So a 4KB file will consume one 8KB block, and a 12KB file will
consume two 8KB blocks. If we take a 4KB file and have N+2d:1n as the requested protection
level, we can calculate the on disk space requirements. We would have 8KB for the data, and
have two 8KB mirrors for the protection, for a total of 24KB. If we want to get more precise,
we also need to calculate the metadata usage. Metadata is calculated per file. Assuming we
do not have GNA enabled, we have three 512B metadata blocks per file for this example, or
1.5KB. So the total space is 25.5KB for the file on disk.
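The same calculation can be expressed as a small function. This is a sketch for mirrored small files only (128KB or less); the default of three copies corresponds to N+2d:1n as described above, and the three 512B metadata blocks correspond to the no-GNA case in this example. The function and parameter names are illustrative.

```python
import math

BLOCK_KB = 8


def small_file_on_disk_kb(file_kb, copies=3, metadata_blocks=3, metadata_block_kb=0.5):
    """On-disk space in KB for one mirrored small file, including metadata."""
    if file_kb > 128:
        raise ValueError("this sketch only covers files of 128KB or less")
    data_kb = math.ceil(file_kb / BLOCK_KB) * BLOCK_KB   # round up to whole 8KB blocks
    return data_kb * copies + metadata_blocks * metadata_block_kb
```

A 4KB file comes to 25.5KB, as in the text; a 12KB file occupies two blocks per copy and comes to 49.5KB.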

Example: Small Files


All files 128KB or less are mirrored. For a protection strategy of N+1, the 64KB file would have
2x mirroring: the original data and one mirrored copy. Any file less than or equal to 128KB
still has FEC calculated, but the result is a copy. The unused 64KB of the 128KB stripe unit
is free to be used in the next stripe unit. The stripe unit is not padded and the
capacity is not wasted.

Avoiding Overhead for Small Files


There are a few things you can do to avoid miscalculating true overhead when considering
the file sizes of the data set. The first is to break the data set into three categories of file
size and calculate overhead for each separately. The second is to consider the total space of
the files within those categories, not the number of files. A very small number
of large files (over 128 KB) can quickly offset the overhead of many small files.

Small Files vs Large Files


At first observation it appears mirroring small files could be a large space concern. Let's put
this into better perspective by going through some examples. If you had one million 24KB
files stored with a requested protection of N+2d:1n, the amount of space consumed on the
cluster, including the file data, protection overhead and metadata would be approximately
73.5 million KB, or 70.09GB. Even with the protection and metadata overhead, the space
consumed is really not very much.
Now if we stored 1.5 hour YouTube videos at 1080p, they would average approximately
1.2GB per file before protection and metadata, or approximately 1.35GB per file with
protection and metadata overhead. So one million small files is about the same as 52
YouTube videos. It takes few large files to equal the space consumed by a large number of
small files.
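The comparison above can be reproduced as follows. The sketch assumes 3x mirroring for N+2d:1n and 1.5KB of metadata per file, as in the earlier small file calculation; the function names are illustrative.

```python
KB_PER_GB = 1024 ** 2


def small_file_set_gb(file_count, file_kb, copies=3, metadata_kb=1.5):
    """Total on-disk GB for a set of identical mirrored small files."""
    blocks = -(-file_kb // 8)                       # round up to whole 8KB blocks
    per_file_kb = blocks * 8 * copies + metadata_kb
    return file_count * per_file_kb / KB_PER_GB


def equivalent_large_files(total_gb, large_file_gb):
    """How many large files occupy the same on-disk space."""
    return total_gb / large_file_gb


total_gb = small_file_set_gb(1_000_000, 24)
print(f"{total_gb:.2f} GB, about {equivalent_large_files(total_gb, 1.35):.0f} videos")
```

Each 24KB file consumes 73.5KB on disk, so one million of them come to roughly 70 GB, the same space as about 52 videos at 1.35GB each.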
OneFS small file usage may not be highly efficient, but there also is not a huge impact. One
idea is to look at data in three categories: the number of small files and their average file size,
the number of large files and their average file size, and the number of all other (medium) files
and their average file size. The idea is to look at all workflows and not just the workflow with a
large number of small files.

Example: Overhead in Mixed Data Sets


This example illustrates that when the data set is separated into file categories, very few
large files can quickly offset the extra overhead of small files: with 99.98% of the files less
than 129 KB, OneFS has a blended overhead of 58%.

Considerations: Mixed Data Sets


Remember that different file sizes incur different protection overhead depending on the size
and the protection level set. Most data sets include a mix of small and large files contained
together. It takes very few large files in a data set to offset the high protection overhead of
many small files. Storage consolidation has the extra benefit of creating data sets with mixed
file sizes, which further reduces total storage overhead.
Always analyze the full distribution of small and large files, not the average file size. Sizing
from the average file size alone yields a significantly higher storage overhead estimate.

Lesson 5: Caching


After completing this lesson, you will be able to describe Isilon’s caching architecture, explain
the function of L1, L2, and L3 caching, and define Endurant cache.

Overview: OneFS Caching


Isilon employs multiple methods of caching. The caching architecture was designed to
leverage the distributed and highly parallel nature of the cluster. OneFS groups RAM into a
single coherent cache so that a data request on a node benefits from data that is cached
anywhere. NVRAM is grouped to write data with high throughput and to protect write
operations from power failures.
Caching accelerates data access by placing a copy on a lower latency medium other than
spinning drives, thus improving the performance of client reads. Because cache is a copy of
the metadata and user data, any data contained in cache is temporary and can be discarded
when no longer needed. Cache in OneFS is divided into levels and each level serves a specific
purpose in read and write transactions.
The cache levels provide guidance on the immediacy of information from a client-side
transaction perspective and the relative latency or time to retrieve or write information. They
also indicate how the cache is refreshed, how long the data is available, and how the data is
emptied or flushed from cache. SSDs are employed in the cluster’s caching architecture,
increasing capacity, affordability, and persistence.

Client and Node Cache


Displayed here is a diagram of a six node Gen 5 cluster divided into two node pools, with a
detailed view of one of the nodes. Note that caching is unchanged in Gen 6. Illustrated are
the clients connected to the L1 cache and the write coalescer.
• The L1 cache is connected to the L2 cache on all of the other nodes and within the
same node. The connection to other nodes occurs over the back-end network when
data contained on those nodes is required for reads or writes.
• The L2 cache on the node connects to the disk storage on the same node.
• The L3 cache is connected to the L2 cache and serves as a read only buffer. L3 cache
is spread across all of the SSDs in the same node and enabled per node pool.
L1 cache is the immediate buffer on the node connected to the client and is involved in any
immediate client data transaction. L1 cache specifically refers to read transaction requests,
or when a client requests data from the cluster. L1 cache collects the requested data from
the L2 cache of the nodes that contain the data. The write coalescer buffers write
transactions from the client to be written to the cluster. The write coalescer collects the write
blocks and performs the additional process of optimizing the write to disk.
L2 cache stores blocks from previous read and write transactions, buffers write transactions
to be written to disk, and prefetches anticipated blocks for read requests. L2 cache is
available to serve L1 cache read requests and to take data handoffs from the write coalescer.
For write transactions, L2 cache works in conjunction with the NVRAM journaling process to
ensure protected committed writes. L2 cache is node specific, interacting with the data
contained on the specific node. The interactions between the drive subsystem, the hard
drives and the SSDs on the node go through the L2 cache for all read and write transactions.
L3 cache reduces the process- and resource-expensive random read I/O from the hard disks
and improves random read performance within OneFS. The L3 cache implementation and
the advanced caching algorithms are designed to improve most common workflows. L3 cache
can provide an additional level of storage node-side cache by utilizing the node’s SSDs as
read cache. Because SSDs are larger than RAM, SSDs can store significantly more cached
metadata and user data blocks than RAM. Like L2 cache, L3 cache is node specific and only
caches data associated with the specific node.
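The read path through the cache levels can be pictured with a toy model. The sketch below is not OneFS code: the class names are hypothetical, each cache is a plain dictionary, and it assumes that blocks leaving L2 are retained in L3, which the text above does not state. It only illustrates the lookup order of L1 on the client's node, then L2, L3, and finally HDD on the node that holds the data.

```python
class StorageNode:
    """Toy model of one node's L2 (RAM) and L3 (SSD) read caches in front of its HDDs."""

    def __init__(self, hdd_blocks):
        self.hdd = dict(hdd_blocks)
        self.l2 = {}
        self.l3 = {}

    def read(self, addr):
        """Return (data, source), where source names the tier that served the read."""
        if addr in self.l2:
            return self.l2[addr], "L2"
        if addr in self.l3:
            data, source = self.l3[addr], "L3"
        else:
            data, source = self.hdd[addr], "HDD"
        self.l2[addr] = data               # all drive reads pass through L2
        return data, source

    def evict_from_l2(self, addr):
        """Assumption for this sketch: a block evicted from L2 is kept in L3."""
        self.l3[addr] = self.l2.pop(addr)


class InitiatorNode:
    """The node a client is connected to; holds L1 and reads from the owning node."""

    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes   # name -> StorageNode
        self.l1 = {}

    def read(self, node_name, addr):
        key = (node_name, addr)
        if key in self.l1:
            return self.l1[key], "L1"
        data, source = self.storage_nodes[node_name].read(addr)
        self.l1[key] = data
        return data, source
```

The first read of a block is served from HDD, a repeat read through the same initiator is served from L1, and a block that has aged out of L2 is served from L3 rather than from spinning disk.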
Note that since Accelerator nodes (A100) do not write data to their local disks, there are no
blocks to cache. Instead accelerator nodes use all their memory for level 1 cache to service
their clients. The performance advantage of an accelerator is its ability to serve more clients,
and potentially hold a client’s working set entirely in cache.
Another type of caching Isilon uses is Endurant Cache. This is for synchronous writes or
writes that require a stable write acknowledgement be returned to the client. This cache
provides ingest and staging of stable synchronous writes. It manages the incoming write
blocks and stages them to stable battery backed NVRAM or Gen 6 vault, ensuring the
integrity of the write. Endurant Cache also provides stable synchronous write loss protection
by creating multiple mirrored copies of the data, further guaranteeing protection from single
node and often multiple node catastrophic failures. The process lowers the latency
associated with synchronous writes by reducing the “time to acknowledge” back to the client.
The process removes the read-modify-write operations from the acknowledgement latency
path. Endurant Cache was specifically developed to improve NFS synchronous write
performance and write performance to VMware VMFS and NFS datastores.

Example: Cache Coherency


Let’s take a moment to illustrate how the OneFS caching subsystem is coherent across the
cluster. If the same content exists in the private caches of multiple nodes, the cached data is
consistent across all instances. Shown here is the Diverse Genomics six-node cluster.
1. Node 1 and node 5 each have a copy of data located at an address in shared cache.
2. Node 1, in response to a write request, invalidates node 5’s copy.
3. Node 1 updates the value.
4. Node 5 must re-read the data from shared cache to get the updated value.
OneFS uses the MESI (Modified, Exclusive, Shared, Invalid) protocol to maintain cache
coherency. MESI implements an invalidate-on-write policy to ensure that all data is
consistent across the entire shared cache.
Ref: EMC ISILON OneFS SMARTFLASH File System Caching Infrastructure whitepaper.
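The four steps above can be mimicked with a minimal invalidate-on-write model. This sketch is not the OneFS implementation and does not track the individual MESI states; it keeps one authoritative value per address plus a private copy per node, and it writes through to the shared store for simplicity. The class and method names are hypothetical.

```python
class CoherentCache:
    """Toy invalidate-on-write cache shared by several nodes."""

    def __init__(self, node_ids):
        self.store = {}                               # authoritative value per address
        self.copies = {n: {} for n in node_ids}       # private cached copies per node

    def read(self, node, addr):
        cache = self.copies[node]
        if addr not in cache:                         # missing or invalidated: re-read
            cache[addr] = self.store[addr]
        return cache[addr]

    def write(self, node, addr, value):
        for other, cache in self.copies.items():
            if other != node:
                cache.pop(addr, None)                 # invalidate every other copy
        self.copies[node][addr] = value
        self.store[addr] = value                      # write-through (simplification)
```

After node 1 writes, node 5 no longer holds a copy, so its next read fetches the updated value instead of returning stale data.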

Considerations: Caching


The most common use for L3 cache is for metadata read acceleration. The customer
challenge is that SSDs are usually underutilized with metadata read acceleration and only the
metadata is available for faster access. The other challenge is appropriately sizing for data
on SSD. Customers may require fast access to the data blocks on SSD usually for random
data workflows. To size appropriately, very careful manipulation of the data on SSD is
required or configurations with significantly more SSDs are required.
Changing the SSD strategy to use L3 cache is transparent with little to no impact for newly
configured (empty) node pools configured for an SSD strategy. All metadata and user data is
relocated off the SSDs to HDDs within the same node. For existing node pools with metadata
read or metadata read/write acceleration enabled, every file on the node requires
updating. For a data on SSD strategy, both the user data and the metadata mirror copies
must be moved to HDD. The SmartPools and FlexProtect jobs are run to manage the file and
metadata migration. Once all metadata and user data has been vacated from the SSDs, the
SSDs are reformatted specifically for L3 cache use. L3 cache populates the SSD as the node is

Considerations: Caching (cont'd)


Though SSDs provide L3 cache with more benefits, there is a tradeoff for HDDs. Calculating
the SSD space for a given workflow can be done using the isi_cache_stats command. L3
cache is enabled by default for all new node pools added to a cluster. New node pools
containing SSDs are automatically enabled. A global setting is provided in the web
administration interface to change the default behavior. Each node pool can be enabled or
disabled separately. L3 cache is either on or off and no other visible configuration settings
are available. L3 cache consumes all SSD capacity in the node pool when enabled. L3 cache
cannot coexist with other SSD strategies on the same node pool; no metadata read
acceleration, no metadata read/write acceleration, and no data on SSD. SSDs in an L3 cache
enabled node pool cannot participate as space used for GNA either.
If a node contains all SSDs or contains 16 or more SSDs, L3 cache cannot be enabled. As a
best practice, use at most two-to-three SSDs per L3 node.
For more comprehensive best practices and design considerations, review the EMC ISILON
ONEFS SMARTFLASH File System Caching Infrastructure whitepaper located on

Serviceability: isi_cache_stats


The isi_cache_stats command provides information for L1, L2, and L3 cache for both user
data and metadata. To view the cache statistics use the isi_cache_stats -v command. The
example output displayed was taken on a relatively new cluster with L3 cache newly
populating. Over time hit rates should improve with use.

Lesson 6: Read and Write Performance


After completing this lesson, you will be able to investigate SSD sizing and strategies, discuss
the A100 Performance Accelerator, and explain how a node handles read and write requests.

Overview: Read and Write Performance


OneFS uses advanced data layout algorithms to determine data layout for maximum
efficiency and performance. Data is evenly distributed across nodes in the node pool as it is
written. The file size also affects data layout because the system employs different layout
options for larger files than for smaller files to maximize efficiency and performance. When it
comes to tuning performance or isolating performance issues, the customer owns and
maintains the client and network points of the topology, making 66% of the solution the
customer responsibility. Isilon accounts for 33% of the solution.

Qualifying Questions


Namespace operations show up in many ways on an Isilon cluster. Many of the jobs
triggered by the job engine are namespace operations, so SSDs improve performance of
almost every job engine task. Tree walk-type jobs show a very large performance
improvement on systems where SSDs have been added. Wide and deep directories have
tens or hundreds of thousands of files per directory, and cluster-wide, ten million files or
more. A proof of concept or the results of an analysis are instrumental in understanding
workflows and starting a design at the right place.
How do you know how much metadata the customer has? In theory, it is technically possible
to calculate the amount of metadata, but that is generally not how we go about sizing. There
is a chicken-and-egg aspect to this problem. You are likely going to configure initially for an
existing or expected workflow and capacity, which may or may not include an accurate file

SSD Strategy


SSDs provide lower latency and increase IOPS, and allow the flexibility of hybrid nodes with
optimum SSD + SATA/SAS ratios to address a wide range of workflows. In Isilon’s architecture,
SSDs primarily speed up namespace operations and actions involving metadata. Such
activities are at the heart of many common office activities.
SSDs can provide faster namespace operations for use cases such as file directory look-ups,
directory tree walks, access time updates, inode operations, and workflows with home
directories that have a lot of files for end-user storage. Also, SSDs can benefit applications
that generate relatively large numbers of files, have many files per directory, and have wide
and/or deep directory structures.
Though the assertion is difficult to quantify, experience indicates that the more files, the
more directories, the wider and deeper the directories…the more benefit you will see from
SSD and metadata acceleration. In some use cases, SSDs can store data to provide faster
read/write access to the data stored on SSDs such as metadata and data blocks. In this case,
there is no benefit to the remaining data in the node pool.

Workflows That Do Not Benefit from SSDs


It is important to understand that while SSDs can help improve the overall throughput a
node can achieve, if the workflow is such that the HDD spindles are I/O bound due to client
data read/write operations, there is little that can be done other than add more HDD spindles.
While the workflow may not benefit from the use of SSDs, note that the overall cluster
performance and subsequent customer satisfaction will be higher when SSDs are used for
metadata to improve the performance of internal operations in OneFS.
Traditional archive is a good example of a workflow with little SSD benefit. The application is
primarily writing, not reading and files may be rarely accessed.
Some workflows bind I/O operations to HDD spindles due to the nature of client read/write
operations. The SSDs can help overall node throughput but cannot assist in these specific
operations, regardless of defined policies. To improve performance in this scenario, add

SSD Sizing


The amount of metadata depends upon the files themselves: how many files, how many
directories, the depth and width of the directories, the number of inodes required in the
cluster, etc. These are often unknown and are subject to change over time as other
workflows get added to a cluster. For this reason, many field personnel utilize the
preconfigured node types, or the GNA rules. If your proposed deployment is going to include
SmartPools and multiple tiers, SSDs almost always improve the performance of data
movement policies.
The GNA rules have been arrived at after trial and error and some hard experiences. GNA is
namespace acceleration across multiple node pools. It is cluster wide and can be used to
accelerate non-SSD nodes. Without GNA, only those nodes with SSDs have metadata
accelerated. GNA can be enabled if 20% or more of the nodes in the cluster contain at least
one SSD and 1.5% or more of the total raw cluster storage is SSD-based. The 2% rule is when
the GNA is automatically disabled at any SSD rate of 1.5% or less, so for best results, ensure
that at least 2.0% of the total cluster storage is SSD-based before enabling GNA. This will
roughly equate to approximately 200 GB of SSD for every 10 TB of HDD capacity in the
cluster. The Isilon Sizing Tool is very useful when configuring clusters with SSDs. Following
the rules prevents capacity or performance oversubscription of a cluster’s SSD resources.
If you try to quote a new cluster at below 2% SSD capacity, you will have to get an exception
granted by Isilon engineering (contact your CSE to start the process). You will be asked for a
technical justification of why the exception is necessary. You can make the process of
seeking an exception more efficient by preparing the justification in advance.
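The GNA thresholds described in this section can be checked with a few lines of arithmetic. The following is a minimal sketch, not a replacement for the Isilon Sizing Tool: the function name, the example cluster, and the assumption that total raw capacity includes the SSD capacity are all illustrative.

```python
def gna_check(nodes_with_ssd, total_nodes, ssd_tb, total_raw_tb):
    """Evaluate the GNA rules of thumb from this section (illustrative only)."""
    node_fraction = nodes_with_ssd / total_nodes
    ssd_fraction = ssd_tb / total_raw_tb
    return {
        "node_fraction": node_fraction,
        "ssd_fraction": ssd_fraction,
        # GNA can be enabled: >= 20% of nodes have an SSD and >= 1.5% of raw storage is SSD.
        "can_enable": node_fraction >= 0.20 and ssd_fraction >= 0.015,
        # Recommended headroom before enabling GNA: >= 2% of raw storage is SSD.
        "meets_2pct_guideline": ssd_fraction >= 0.02,
    }


# Hypothetical cluster: 8 nodes, 2 of them with SSDs, 3.2 TB of SSD out of 200 TB raw.
result = gna_check(nodes_with_ssd=2, total_nodes=8, ssd_tb=3.2, total_raw_tb=200.0)
print(result)
```

In this hypothetical case the cluster clears the minimum for enabling GNA but not the 2% guideline, which is the situation where a quote would need an exception.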

Performance Accelerator


The A100 Performance Accelerator can be added seamlessly to the cluster to scale the
cluster’s performance. Adding performance independent of capacity reduces costs. The
A100 reduces latency and increases concurrent read throughput for a cached data set by
serving data from RAM (256 GB). It also supports highly parallel workloads.
Cache is used differently in accelerator nodes. Accelerator nodes do not allocate memory
for level 2 (L2) cache, because they do not write any data to local disks and so have no
local blocks to cache. Instead, accelerator nodes use all their memory for level 1 (L1)
cache to service their clients; by definition, all the data handled by an accelerator is
remote data. The cache aging routine in the accelerator cache is
LRU-based, as opposed to the drop-behind used in storage node L1 cache. This is because
the size of the accelerator’s L1 cache is larger, and the data in it is much more likely to be
requested again, so it is not immediately removed from cache upon use. In a cluster
consisting of storage and accelerator nodes, the primary performance advantage of
accelerators is in being able to serve more clients, and potentially hold a client’s working set
entirely in cache.
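The difference between the two cache aging behaviors can be illustrated with a toy model. This is a sketch of the general LRU and drop-behind ideas only; the class names are hypothetical and it does not represent the OneFS cache implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps blocks after use; evicts the least recently used block when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]


class DropBehindCache:
    """Discards a block as soon as it has been read once."""

    def __init__(self):
        self.blocks = {}

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def get(self, block_id):
        return self.blocks.pop(block_id, None)  # removed from cache upon use


lru = LRUCache(capacity=2)
lru.put("a", b"1")
lru.put("b", b"2")
lru.get("a")        # "a" becomes the most recently used block
lru.put("c", b"3")  # cache is full, so "b" is evicted

drop = DropBehindCache()
drop.put("x", b"9")
first_read = drop.get("x")
second_read = drop.get("x")  # already dropped after the first read
```

With LRU, a block that is read repeatedly stays resident, which is why it suits a large cache whose contents are likely to be requested again.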

Anatomies of a Read and Write


Let’s illustrate three examples of Isilon caching. First, we’ll look at how read caching is done,
then asynchronous writes, and finally we’ll look at how we handle synchronous writes.
Let’s start with a read.
When a client requests a file, the node to which the client is connected uses the isi get
command to locate all the file’s data blocks. In this illustration, the client is connected to
node 2. The first file inode is loaded and the file blocks are read from disk on all other nodes.
If the data was recently written, it may already be in L2 cache and there is no need to load
from disk. From there it is loaded directly from L2 cache into L1 cache. If the data isn’t
already in the L2 cache, data blocks are copied into the L2. Non-local data blocks are sent
from other nodes over the back-end network. The same process of loading the data from
disk or cache is done on these remote nodes. Once all data is received on the local node, the
file is reconstructed in L1 cache and sent to the client.
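The read path described above can be sketched as a simple lookup chain. This is a simplified model under stated assumptions: each node is represented as a dictionary with hypothetical "disk" and "l2" entries, the file layout is a list of (node, block) pairs, and the back-end network transfer is not modeled.

```python
def read_block(node, block_id):
    """Return a block from the node's L2 cache, loading it from disk on a miss."""
    if block_id not in node["l2"]:
        node["l2"][block_id] = node["disk"][block_id]  # L2 miss: copy from disk into L2
    return node["l2"][block_id]


def read_file(layout, nodes):
    """Assemble a file in the connected node's L1 cache from (node_id, block_id) pairs."""
    l1_cache = []
    for node_id, block_id in layout:
        # Blocks held by other nodes would arrive over the back-end network.
        l1_cache.append(read_block(nodes[node_id], block_id))
    return b"".join(l1_cache)


nodes = {
    1: {"disk": {"b0": b"AA"}, "l2": {}},
    2: {"disk": {"b1": b"BB"}, "l2": {"b1": b"BB"}},  # recently written, already in L2
    3: {"disk": {"b2": b"CC"}, "l2": {}},
}
layout = [(1, "b0"), (2, "b1"), (3, "b2")]
data = read_file(layout, nodes)
print(data)
```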
Move the slider to anatomy of a write to examine how caching handles an asynchronous
write.
Anatomy of Write
When a client requests that a file be written to the cluster, the node to which the client is
connected is the node that receives and processes the file. In this illustration, the client is
connected to node 2. Node 2 creates a write plan for the file, including calculating Forward
Error Correction (FEC). Data blocks assigned to the node are written to the NVRAM of that
node. Data blocks assigned to other nodes travel through the back-end network
to their L2 cache, and then to their NVRAM. Once all nodes have all the data and FEC blocks
in NVRAM, a commit is returned to the client. Data block(s) assigned to this node stay cached
in L2 for future reads of that file. Data is then written onto the spindles.
The layout decisions are made by the Block Allocation Manager, or BAM, on the node that
initiated a particular write operation. The BAM makes the decision on where best to write
the data blocks to ensure the file is properly protected. To do this, the BAM Safe Write
(BSW) process generates a write plan, which comprises all the steps required to safely
write the new data blocks across the protection group. Once complete, the BSW then
executes this write plan and guarantees its successful completion. OneFS will not write files at
less than the desired protection level, although the BAM will attempt to use an equivalent
mirrored layout if there is an insufficient stripe width to support a particular FEC protection
level.
Finally, move the slider to anatomy of a synchronous write. Here we’ll illustrate how caching
handles a synchronous write.
Anatomy of Synchronous Write
This is an example of the synchronous write of a new file, and how the write process occurs
in OneFS with Endurant Cache. Shown is an NFS client writing a 512KB file in 4KB blocks,
with a simple acknowledgement to be returned after the entire file is written. We’ll assume
an N+1 protection level.
First, a client sends a file to the cluster requesting a synchronous write acknowledgement.
The client begins the write process by sending 4KB data blocks. The blocks are received into
the node’s Write Coalescer, which is a logical separation of the node’s RAM similar to, but
distinct from, L1 and L2 Cache. The point of the ACK request varies depending on the
application, and the form of the ACK request also varies based on the client protocol.
Endurant Cache manages how the write request comes into the system.
Once the entire file has been received into the Write Coalescer, the Endurant Cache
Log Writer process writes mirrored copies of the data blocks, with some log file-specific
information added, in parallel to the Endurant Cache Log Files, which reside in NVRAM
on different nodes. The protection level of the mirrored Endurant Cache Log Files is based
on the Drive Loss Protection Level assigned to the data file to be written; the number of
mirrored copies is two, three, four, or five.
Once the data copies are received into the EC Log Files, a stable write exists and the write
acknowledgement is sent back to the client, indicating that a stable write of the file has
occurred. At this point the client considers the write process complete and can close out the
write cycle with its application or process. The
latency or delay time is measured from the start of the process to the return of the
acknowledgement to the client. This process is very similar to many block storage systems.
From this point forward, the standard asynchronous write process is followed.

How is it determined when the acknowledgement is returned to the client? The answer, like
many with technology, is…it depends. It depends on the application and its interaction with
the protocol, as applications are designed to receive acknowledgements at specific block size
points. It also depends upon the protocol and when the protocol makes the request to the
storage system, usually at the behest of the application. So for some applications and
protocols the acknowledgement request could come as often as every 4K or 8K block sent,
it could come at different incremental sizes, or it could come after an entire file write has
been completed.


SSD Strategies


A solution using SSDs requires some consideration on how to employ the SSDs. Apply SSDs
tactically. SSDs will almost always help a cluster to perform better, especially when
considering internal maintenance. As a good practice, it’s not recommended to use SSDs for
both L3 cache and GNA. Generally, SSDs are best at accelerating metadata operations, but
can also store data in order to accelerate read-intensive workflows.
Putting data on SSDs accelerates reads, and generally does not accelerate data writes.
Metadata writes work differently than data writes, and SSDs do help metadata writes. SSDs
offer almost no benefits to streaming write performance, and in some cases can even
reduce that performance because SSDs mean fewer spindles absorbing the ingest load. For
write intensive workflows, consider metadata read/write acceleration. This may require 2 to
3 times more SSD capacity than metadata read. Using metadata-write (all metadata on SSD) is
especially good for large files that are randomly accessed, such as VMDK files, iSCSI files,
database files, etc. GNA is an option for clusters including nodes without SSD but scaling will
be significantly more complex and expensive.
When trying to design an SSD strategy, a good rule of thumb is that of all the various
namespace operations, the ones that matter the most to SSD performance are reads and
writes. Reads and writes matter more than all other namespace operations combined.
Note that L3 consumes all SSDs and cannot coexist with other SSD strategies with the
exception of GNA. However, since they’re exclusively reserved, L3 Cache node pool SSDs
cannot participate in GNA.

For a comprehensive list of best practices and considerations, reference the EMC ISILON
OneFS SMARTFLASH File System Caching Infrastructure White Paper (Jan 2016).

Considerations: Read and Write Performance


As a general practice, always consider SSDs. SSDs improve performance on the vast majority
of Isilon installations. Take the approach to quote SSDs, and then remove from the proposal
only if the customer insists. Potential symptoms due to absence of SSD for metadata include
prolonged maintenance operations, excessive tree walk times, analyzer jobs that never
complete, and excessive HDD contention and metadata operations starving throughput.
When SSD space overflows (at <1.5%), OneFS abandons all GNA, and reverts to only
accelerating metadata on nodes with SSD. Because of the 2% ratio, future cluster
expansion needs to be planned for. For example, adding only NL nodes to a cluster that is
already at the minimum threshold will disable GNA. Sometimes more than 2%
may be needed if you have a significant amount of metadata, such as when there is a high
ratio of small files to total capacity. The performance impact of overflowing available SSD
storage can be severe.
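When planning expansion, it can help to estimate how much HDD-only capacity a cluster can absorb before the SSD share drops below the GNA threshold. The sketch below is a rough planning aid under stated assumptions: the function name is hypothetical, the threshold defaults to the 1.5% figure quoted above, and total raw capacity is assumed to include the SSD capacity.

```python
def max_hdd_only_growth_tb(ssd_tb, total_raw_tb, min_ssd_fraction=0.015):
    """Raw HDD-only capacity (TB) that can be added before the SSD share
    of total raw capacity falls below min_ssd_fraction."""
    max_total_tb = ssd_tb / min_ssd_fraction
    return max(0.0, max_total_tb - total_raw_tb)


# Hypothetical cluster sized at exactly 2% SSD: 4 TB of SSD in 200 TB raw.
headroom_tb = max_hdd_only_growth_tb(ssd_tb=4.0, total_raw_tb=200.0)
print(f"HDD-only capacity that can be added: {headroom_tb:.1f} TB")
```

A cluster quoted at the 2% guideline has headroom of about one third of its current raw capacity; a cluster quoted at the 1.5% minimum has essentially none.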
Underuse of SSDs will backfire (as well as damage a long-term customer relationship) as
soon as the customer’s data grows. Once reaching the threshold of having less than 2% of
their storage in SSDs, performance will plunge - and so will your credibility as their technical

In NL and HD series nodes, L3 cache is enabled by default and cannot be disabled. L3 cache
runs in metadata only mode.
Clusters with fewer than five nodes may run high IOPS because OneFS striping has to hit every
node per write.



Two internal tools that can help view performance statistics are InsightIQ and the CLI isi
commands. The isi statistics command has approximately 1,500 combinations of data you
can display as statistical output of cluster operations. Other Isilon services, such as InsightIQ,
the web administration interface, and SNMP, gather needed information using the isi
statistics command. The isi statistics command enables you to view cluster throughput
based on connection type, protocol type, and open files per node. You can also use this
information to troubleshoot your cluster as needed. In the background, isi_stats_d is the
daemon that performs a lot of the data collection. To get more information on isi statistics,
run man isi statistics from any node. You can also use the --help options. For example: isi
statistics system --help
The isi get command displays information about a set of files, including the requested
protection, current actual protection, and whether write-coalescing is enabled.

Network traffic, network misconfiguration, client or cluster processing load, or a combination
thereof typically causes performance issues. As other lessons in this course detail, if
InsightIQ is present, you can use it (or other EMC or third-party packet tools) to analyze what
types of namespace operations are occurring most frequently, and in what proportions relative
to one another. You can easily drill down to the per-protocol breakdown and view the
percentage of namespace vs. non-namespace operations. This data can be used to
determine an appropriate SSD strategy.



Noted here are some tools that can also aid in understanding performance in an
environment. A more comprehensive look can be found in a later module.
Iometer measures and characterizes the I/O subsystem for a controlled workflow. It can
identify read and write latencies, profile client caching, measure file server
performance, and support stress and scale testing.
Iperf measures TCP and UDP streaming throughput between two endpoints (e.g., client >
network > servers). Iperf can help measure a result from one compute node and compare to
IOZone measures file-system I/O operations using a variety of I/O APIs. Use when you want
to isolate different types of API calls.
Itrace measures system activity and scheduling. This shows kernel process stacks based on
what they are sleeping on. You can use this to find operations where we sleep/wait for a long
time in long running processes.

Module 3: Networking


Upon completion of this module, you will be able to explain Isilon network integration into
the data center, discuss multi-tenancy, show how SmartConnect optimizes networking, and
explain access zones.

Lesson 1: Networking


Upon completion of this lesson, you will be able to identify properties of front-end NICs,
examine NIC aggregation, and differentiate SBR and default routing in OneFS.

Overview: Isilon Networking


Isilon has many different components allowing a simple cluster implementation or one that
involves complex workflows with complex configurations. Knowing how the internal features
interact is integral to positioning, designing, and servicing the cluster. Client computers are
connected to the cluster through the external network. Illustrated are X-Attire systems and
GearItUp systems connecting to an eight node Generation 6 (Gen 6) cluster through the
external network. The cluster’s external network configuration is built with groupnets,
subnets, and IP address pools. You can create node provisioning rules that automate the
configuration of new network interfaces. For instance, adding a new node to the cluster will
automatically add the new node’s interfaces to the IP address pool.
Using what we have learned so far in the course, keep in mind the following when
considering our questions and introducing the front-end hardware: Clients can access their
files via a node in the cluster because the nodes communicate with each other via the
Ethernet or InfiniBand back-end to locate and move data. Any node may service requests
from any front-end port. There are no dedicated controllers. File data is accessible from all
nodes via all protocols. Nodes communicate internally. Clients can connect to different
nodes based on performance needs.
You need to know whether they’ll be connecting directly into core switches, connecting into
access switches for their storage, and whether they’ll be over-subscribed. Enterprise
networks can have aggressive oversubscription. Uplinks between switches should be
adequately sized for the network load. Here is the graphic that illustrates the external
networking components of a cluster, but with a deeper dive as it applies to the scenario.
SmartConnect and access zones will be covered later in this module, but this architecture
picture can serve as an illustration to the networking concepts of a cluster. Groupnets are at
the top of the networking hierarchy. Each groupnet can have multiple subnets and each
subnet has IP address pools.

Qualifying Questions


When first working with a customer it is essential to understand their network topology and
how all of their systems and equipment interact. You cannot assist in a design if you do not
know about the network connectivity, client distribution, types of protocols, subnets, frame
sizes and whether or not they are using VLANs. The slide lists the minimum basic questions
that should be asked.
Ask the ‘Big Picture’ questions and do the research to determine the types of workflow in the
environment, what the SLAs are, whether VLANs are used, and the available IP ranges. Many of
the questions around networking and connectivity are asked during the initial
interview, but others such as routing and multi-tenancy may come later in the design
process.
Regarding the last bullet point: not all nodes need to be connected to the network in order
to provide data services. The capacity and raw performance of a node that is not connected
to the network is still available through its connection to the other nodes. Some customers
connect only a handful of the nodes in their cluster to the network, because they want the
other nodes mainly for capacity. NANON is the abbreviation for Not All Nodes On Network.
Although it is possible, it is not the recommended configuration because certain features,
such as anti-virus, will not work unless all the nodes are connected to the network. NANON
should only be used in very limited circumstances and is not a best practice. Nodes that are
not on the network (not on Ethernet) cannot send SNMP traps, notifications, emails, or
Isilon log sets. These nodes are disconnected from the rest of the world, and are connected
only to the cluster itself; only on-network nodes can complete these functions on behalf of
the rest of the cluster.

Network Sizing


At this point some additional basic questions should be asked of the customer such as: Will
all nodes be on the network? 10GbE or 40GbE? Will link aggregation be used? If the customer
does need link aggregation, the recommended configuration is Link Aggregation Control
Protocol (LACP), as opposed to static trunking.
You should size the network connectivity with throughput in mind so as to reduce
bottlenecks. When sizing, consider how many Mbps or Gbps the cluster needs to
accommodate. What is the acceptable latency? Is the connectivity to a LAN or WAN or both?
Consider different client groups with perhaps different throughput needs. For example, a
media and entertainment (M&E) group will perhaps need more throughput and less latency
than for example a client group accessing home directories.



Shown here are some terms used when discussing performance. Not all the terms apply to
network performance, but network performance can influence how the cluster performs.
Note that benchmarking numbers are built on best case - in some verticals and specific
workflows, actual performance may be lower. Latency can dominate the discussion if
storage interactions involve many small, serialized operations as opposed to large, coherent
ones.
IOPS Benchmarks:
We use SPEC benchmarks as an analogue of real-world customer needs, and the operations
per second performance is a good representation of what is possible. The actual blend of
operations is part of what makes up the SPEC benchmark, but it is a set of typical file system
operations such as reads, writes and listings. The latest benchmarks reveal that our
throughput rates are not only excellent, but that there's a wide range of capacity available to
meet the needs of our very diverse customer body.

File system operations do not only rely on the speed of the storage itself but also on the
computing capacity of the hardware. This is one reason for the upgrade in the CPU and RAM
capacity from Gen 5 to Gen 6 nodes. Every operation relies upon the CPU's capacity to
perform integrity tests, run networking code and so on. A faster CPU adds up to faster
operations.
At the same time, the software tweaks have made a substantial difference as well. We get
more mileage from every CPU cycle when less of it is wasted.
All these numbers are based on 4 node clusters, which is the minimum cluster size for the
new hardware packages.
Ops per Node Benchmarks:
Benchmark operations rates on the new hardware are equally impressive compared to the
previous generation. The streaming read and write operations are valuable for the media
and entertainment industry, especially in the context of 4K video streaming, but EDA and
other development environments rely a lot more on the rate of operations, performing tasks,
such as creating and deleting and listing files.
IOzone Streaming Benchmarks for NFSv3:
IOzone is a benchmarking tool for file systems, and generates a given style of load on
demand. This graph measures what kind of streaming performance we can achieve, in
megabytes per second, from our new nodes. The two bars represent streaming reads and
streaming writes, respectively. These numbers are for the minimum cluster size, which is
four nodes. The new nodes all have SSDs for L3 cache, so this streaming measurement
already incorporates that facility.
Our drive management software is already well tuned for streaming from spinning discs, so
the performance benefit of SSDs is not as dramatic as it would otherwise be. Even so, the
top range F800 nodes deliver impressive rates of over 16GB/s across the cluster.
Data writes are more demanding than data reads because the nodes need to calculate
protection and distribute blocks across the cluster. A single write can be quite quick because
the journal can easily store the data and acknowledge success to the client, but streaming
writes over time will overwhelm the journal's immediate capacity and slow to the maximum
pace of the drive system.
Despite these challenges, H600 and F800 4-node clusters can accept over 7 or 8 GB/s
streaming write rates respectively.
Gen 5 vs. Gen 6: Throughput per Node:
This graph displays how large the difference is between the capacities of the previous
generation of nodes and the current generation. The headline figures are rather deceptive,
because these are strictly per-node figures whereas four new generation nodes fit into the
same rack space as a single HD400 or X410 or NL410 node. On a per-rack-unit basis, as well
as the basis of aggregate capacity, the A200 node type is competitive with the previous
generation's dense storage units. On a similar basis, the H500 is competitive with the
previous generation's top-of-the-line S210 node. This is a consequence of how we have
rebalanced the CPU-per-node structure of our storage - an old node is not the same storage
capacity as a new node.

The story with read bandwidth per node is similar to that of write bandwidth, but with higher
numbers because read operations are faster than write operations, not least because of the
lower processing overhead. Even so, the general results are very similar in outlook. The top-
of-the-line F800 is head and shoulders above any alternative on a per-node basis, and even
the lower-end new nodes are, on a per-rack-unit basis, quite competitive with the older node
types.


Link Aggregation


Link aggregation, also known as NIC aggregation, is primarily for NIC failover purposes,
enhancing failure resiliency. It is not a performance enhancing option, and can in fact reduce
performance. Link aggregation is an optional IP address pool feature that allows you to
combine the bandwidth of a single node’s physical network interface cards into a single
logical connection for improved network throughput and redundancy. For example, if a node
has two or four physical Gigabit Ethernet (GigE) interfaces on the external network, the ports
are logically combined to act as one interface (three ports are not aggregated). Note that the
aggregated NICs are used for client I/O, but the two channels are not “bonded” into a single
2 Gigabit or 20 Gigabit link.
The link aggregation mode determines how traffic is balanced and routed among
aggregated network interfaces. The aggregation mode is selected on a per-pool basis and
applies to all aggregated network interfaces in the IP address pool. OneFS supports dynamic
and static aggregation modes. A dynamic aggregation mode enables nodes with aggregated
interfaces to communicate with the switch so that the switch can use an analogous
aggregation mode. Static modes do not facilitate communication between nodes and the
switch.
Round-robin: Static aggregation mode that rotates connections through the nodes in a first-
in, first-out sequence, handling all processes without priority. Balances outbound traffic
across all active ports in the aggregated link and accepts inbound traffic on any port.
Note: This method is not recommended if the cluster uses TCP/IP workloads.
Active/Passive Failover: Static aggregation mode that switches to the next active interface
when the primary interface becomes unavailable. The primary interface handles traffic until
there is an interruption in communication. At that point, one of the secondary interfaces will
take over the work of the primary.
Link Aggregation Control Protocol (LACP): Dynamic aggregation mode that supports the
IEEE 802.3ad Link Aggregation Control Protocol (LACP). You can configure LACP at the switch
level, which allows the node to negotiate interface aggregation with the switch. LACP
balances outgoing traffic across the interfaces based on hashed protocol header information
that includes the source and destination address and the VLAN tag, if available. This option is
the default aggregation mode.
Fast EtherChannel (FEC): Static aggregation method that accepts all incoming traffic and
balances outgoing traffic over aggregated interfaces based on hashed protocol header
information that includes source and destination addresses.
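The hash-based balancing used by the LACP and FEC modes above can be sketched as follows. This is an illustration of the general idea only: the actual hash function is implementation specific, CRC32 is used here purely as a stand-in, and the interface names are hypothetical.

```python
import zlib


def pick_interface(src_ip, dst_ip, vlan_id, interfaces):
    """Choose an outgoing interface from a hash of protocol header fields."""
    key = f"{src_ip}|{dst_ip}|{vlan_id}".encode()
    return interfaces[zlib.crc32(key) % len(interfaces)]


aggregated = ["ext-1", "ext-2"]  # hypothetical names for two aggregated ports

# Traffic for one source/destination pair always hashes to the same port.
first = pick_interface("192.0.2.10", "198.51.100.7", 100, aggregated)
second = pick_interface("192.0.2.10", "198.51.100.7", 100, aggregated)
print(first, second)
```

Because a given source and destination pair always maps to the same port, a single client conversation is limited to the bandwidth of one physical link, which is consistent with the statement that aggregation is primarily a resiliency feature rather than a performance feature.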



Virtual LAN (VLAN) tagging is an optional front-end network subnet setting that enables a
cluster to participate in multiple virtual networks. A VLAN is a group of hosts that
communicate as though they are connected to the same LAN regardless of their physical
location. VLAN support allows participation on multiple subnets without multiple network
switches. Also, security and privacy are increased because network traffic across one VLAN is
not visible to another VLAN.
A non-aggregated Ethernet interface can have only one VLAN configured on the interface
whereas an aggregation can have two or more VLANs configured. VLAN tags are set on the
cluster. To correctly deliver the traffic on a trunk port with several VLANs, the device uses the
IEEE 802.1Q encapsulation (tagging). Packets that are encapsulated for several different
VLANs can traverse the same aggregated port and maintain traffic separation between the
VLANs. The switch port needs to be configured for that VLAN ID and configured as a trunk
port if multiple VLANs are configured for the external physical port of a cluster node.
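For reference, the IEEE 802.1Q tag mentioned above is a 4-byte field: a 16-bit tag protocol identifier (0x8100) followed by 16 bits of tag control information holding a 3-bit priority, a 1-bit drop eligible indicator, and a 12-bit VLAN ID. The sketch below builds and parses such a tag; the function names are illustrative and are not part of any OneFS interface.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier for IEEE 802.1Q


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack a 4-byte 802.1Q tag: TPID, then PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be 0-7 and dei must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Unpack a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}


tag = build_vlan_tag(vlan_id=100)
fields = parse_vlan_tag(tag)
print(tag.hex(), fields)
```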



NANON (not all nodes on network) allows a cluster to expand, not for more front-end I/O,
but for additional capacity. Isilon clusters can get big, very big. Imagine a 15-node
X410 cluster, with 2x10GbE links per node. The total potential bandwidth at that point
is 2x10x15=300Gbps, or 37.5GBps. In most cases adding more nodes at this point is going to
be done for capacity and aggregated cache/CPU/disk spindle count reasons, rather than
front-end I/O. As a result, some customers choose to stop connecting additional nodes to the
front-end network, because the cost of network switches and optics cannot be justified. This
decision has pros, such as lower network cost and the ability to perform maintenance on
non-network-connected nodes at any time. As long as enough nodes are online to meet protection
criteria, patches, firmware updates, etc., are never disruptive to clients on these nodes.
There are, however, certain features that need network connectivity and that make NANON an
inadvisable configuration. Features such as anti-virus require all the nodes that access files
to have IP addresses that can reach the ICAP (Internet Content Adaptation Protocol) server.
Quota notifications won’t work with a NANON cluster. If this is required, please contact
technical support for assistance. ESRS does not require all nodes to be on the external
network, because other nodes that are online can proxy out ESRS dial-home events. Make
sure that the ESRS service can reach external servers so that you can properly register every
node with the ESRS gateway servers.
Additionally, the lowest LNN (logical node number) should always be connected as there are
cluster wide notifications that go out via the LNN. If using SMB, it is recommended to have all
nodes connected to the network as the LNN needs to communicate notifications, ESRS data,
and log files from the cluster, as well as ensure there are no clock skew or time issues.
The recommended best practice is to connect all nodes to the network with an
assigned IP address.
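The front-end bandwidth arithmetic from the 15-node example in this section can be reproduced in a few lines. The function name is illustrative, and the result is a theoretical link total, not a measured throughput.

```python
def front_end_bandwidth(connected_nodes, links_per_node, link_gbps):
    """Return the total potential front-end bandwidth as (Gbps, GBps)."""
    total_gbps = connected_nodes * links_per_node * link_gbps
    return total_gbps, total_gbps / 8  # 8 bits per byte


gbps, gbytes_per_s = front_end_bandwidth(connected_nodes=15, links_per_node=2, link_gbps=10)
print(f"{gbps} Gbps, or {gbytes_per_s} GBps")
```

In a NANON design, only the nodes that are actually connected to the front-end network count toward this total, so adding unconnected nodes leaves the figure unchanged.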

Default Routes Compared to SBR


Routing is the process of determining how to get IP packets from a source to a destination.
Source Based Routing, or SBR, simplifies routing when there are multiple access routes and
the default gateway does not appear to be the best route available. Shown here, the client
must send a packet to the Isilon cluster's IP address. First, the client determines
that the destination IP address is not local and it does not have a static route defined for that
address. The client sends the packet to its default gateway, Router C, for further processing.
Next, Router C receives the packet from the client, examines the packet’s destination IP
address, and determines that it has a route to the destination through Router A.
Then, Router A receives the packet on its external interface and determines that it
has a direct connection to the destination IP address. Router A sends the packet
directly to its destination using its internal interface on the 40GbE switch.
Next, the Isilon must send a response packet to the client. Without SBR, it determines that the
destination IP address is not local and that it does not have a static route defined
for that address. OneFS determines to which gateway it must send the response packet
based on its priority numbers. Gateways with lower priority numbers have precedence over
those with higher numbers. OneFS has two default gateways: one with a priority of 1 and one
with a priority of 10. OneFS chooses the gateway with the lower priority number and
sends the packet through that gateway to the 10 GbE switch, not the 40 GbE switch.
Were SBR enabled, the cluster would not refer to the default gateway, but instead examine
the MAC address on the packets it received, and respond to that address. This means that
the 40GbE switch would be used, even if the default gateway were something else. SBR does
not override statically configured routes. It only replaces default routes for responses to
incoming connections. SBR is a cluster-wide configuration option.
Once the response has reached Router A, it travels back to Router C through the core
network, and, finally, returns to the client.
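The difference between the two behaviors can be sketched in a few lines of Python. This is an illustrative model only, not OneFS code: the Gateway class, the reply_gateway function, and the gateway labels are hypothetical names chosen for the example, and the priorities mirror the values of 1 and 10 described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    name: str      # hypothetical label for the router behind a switch
    priority: int  # lower number takes precedence, as described above


def reply_gateway(gateways, arrived_via, sbr_enabled):
    """Pick the gateway used for the reply to an incoming connection."""
    if sbr_enabled:
        # With SBR, the reply goes back the way the request arrived.
        return arrived_via
    # Without SBR, the default gateway with the lowest priority number wins.
    return min(gateways, key=lambda g: g.priority)


# Assumed topology: the priority 1 gateway sits on the 10 GbE switch,
# while the client's request arrived through Router A on the 40 GbE switch.
gw_10gbe = Gateway("gateway-on-10GbE-switch", priority=1)
gw_40gbe = Gateway("router-A-on-40GbE-switch", priority=10)
gateways = [gw_10gbe, gw_40gbe]

without_sbr = reply_gateway(gateways, gw_40gbe, sbr_enabled=False)
with_sbr = reply_gateway(gateways, gw_40gbe, sbr_enabled=True)
```

In this model the reply leaves through the 10 GbE gateway when SBR is off and through Router A when SBR is on, which matches the walkthrough above.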

Considerations: Networking


IP address pools for a subnet can either be IPv4 or IPv6, not both. Multiple subnets are
required if employing both IPv4 and IPv6 ranges.
Though SBR was developed to be enabled or disabled as seamlessly as possible, when
enabling, packets leaving the cluster may be routed differently. How this affects a customer
depends on their network setup. Consider enabling source-based routing when
implementing a cluster in a large network with a complex topology. For example, if the
network is a multi-tenant environment with several gateways, traffic is more efficiently
distributed with source-based routing.
Jumbo Frames are not a silver bullet and you should expect to see typically 2-5%
performance increase on modern hardware. Remember, the entire connected infrastructure
should be configured to work with jumbo frames, or you may see packet fragmentation
reducing throughput. Jumbo Frames rely upon large communications to make them
worthwhile, otherwise they actively reduce performance. Understanding their constraints
will give you a good idea of where bottlenecks and choke points could lie.
With Gen 5, mixed interface types cannot be aggregated, meaning that a 10 GigE must be
combined with another 10 GigE, and not with a 1 GigE. Mixing would result in intermittency
on single interfaces. You cannot aggregate a NIC from node1 and a NIC from node2. When
planning link aggregation, remember that pools using the same aggregated interface cannot
have different aggregation modes. For example, if they are using the same two external
interfaces, you cannot select LACP for one pool and round-robin for the other pool. A node’s
external interfaces cannot be used by an IP address pool in both an aggregated
configuration and as individual interfaces. You must enable NIC aggregation on a cluster
before enabling on the switch in order to avoid data unavailability. Doing it on the switch
first may stop communication from the switch to the cluster and result in unexpected
Tracing routes can be used to ensure network traffic flows as expected. Use source based
routing to keep network traffic on the right path.



Using the isi network interfaces list -v command, you can see both the interface name and
its associated NIC name. For example, ext-1 would be an interface name and em1 would be
a NIC name. NIC names are required if you want to do a tcpdump and may be required for
additional command syntax. It is important to understand that the Ethernet ports can be
identified by more than one name.
SBR is enabled from the CLI. This action cannot be done via the web administration interface.
SBR can be enabled or disabled by running the isi network external modify command as
shown. To check whether SBR is enabled on a cluster, run the isi network external view
command. In the output, if SBR is not enabled on the cluster, Source Based Routing is False.
If SBR is enabled, Source Based Routing is True.

Lesson 2: Multi-tenancy


Upon completion of this lesson, you will be able to discuss multi-tenancy and explain

Overview: Multi-tenancy


In the computer realm, multi-tenancy is defined as the ability to host multiple customers in a
single cloud, application, or storage device. Each customer in that environment is called a
tenant. In our X-Attire scenario, the solution needs to treat each business unit as a separate
and unique tenant with access to the same cluster. With OneFS, multi-tenancy refers to the
ability of an Isilon cluster to simultaneously handle more than one set of networking
configurations. Multi-Tenant Resolver, or MTDNS, refers to the subset of that feature
pertaining specifically to hostname resolution against DNS name servers. These features are
available to customers in OneFS 8.0. Each tenant on the cluster can have its own network
settings. Prior to OneFS 8.0, only one set of DNS servers could be defined on the cluster: this
was a global cluster setting. Isilon is now able to host multiple networks with multiple DNS
servers using a groupnet.
Groupnets are the configuration level for managing multiple tenants on the cluster’s external
network. Even if there are no plans to use multi-tenancy, it is a good practice to organize
data based on access zones, both for security purposes and to enable compartmentalization
of failover by, for instance, AD domain.

Groupnets (introduced in OneFS 8.0) are how the cluster communicates with the world. If
the cluster needs a connection to a second, unique authentication domain, it needs to know
how to find that domain and requires a DNS setting to know how to route out to that
domain.
Groupnets store all subnet settings; they are the top-level object. Groupnets can contain
individual DNS settings that were a single global entry in previous versions. OneFS creates
groupnet0 by default. You only need to configure another groupnet if separate DNS settings
are required; otherwise the cluster will run perfectly well under groupnet0.
Access zones and authentication providers must each exist within one and only one groupnet, and
must reside in the same groupnet to associate with one another.
Conceptually it would be appropriate to think of groupnets as a networking tenant. Having
multiple groupnets on the cluster means that you are configuring access to completely
separate and different networks. This is the configuration level for managing multiple
tenants on your external network. Different groupnets allow portions of the cluster to have
different networking properties for name resolution. Additional groupnets should be created
only in the event that a customer requires a unique set of DNS settings.
Subnets simplify network management and define a range of IP addresses (called pools). IP
address pools can be created within subnets to partition network interfaces according to
workflow or node type. IP address pools can be associated with network interfaces on
cluster nodes. Client connection settings are configured at the IP address pool level.

Considerations: Multi-tenancy


There is no need to create multiple groupnets unless there is a need for two separate sets of
DNS settings. Groupnets are an option for those clusters that will be hosting multiple
companies, departments, or clients that require their own DNS settings. Follow the proper
creation order to eliminate frustration. You cannot create these out of order because the
configuration of one object is dependent upon the previous.
In a multiple tenant solution, with OneFS 8.0 and later a share can be mapped across access
zones. Combining namespaces and overlapping shares is an administrative decision.

Create Networking and Access Zone Environment


When creating a groupnet with access zones and providers in the same zone, you have to
create them in the proper order. Shown here is the use of the CLI. The WebUI can also be
used to configure groupnets.
1. First, create the groupnet using the isi network groupnets create command shown.
2. Then create the access zone and tell it which groupnet to associate it with.
This is done using the isi zone zones create command.
3 & 4. Once that is done, create the networking information (subnets and pools)
using the isi network subnets create and isi network pools create commands.
You must create the access zone after the groupnet and before the pools, because when you create a
pool you must reference the access zone.
5. Then add your provider(s) and point them to the groupnet. This is done using the isi
auth ads create command.
6. Finally, associate your authentication providers with your zone using the isi zone
zones modify command.
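The creation order above is essentially a dependency chain, which can be sketched as a small Python check. This is a planning aid under stated assumptions, not an Isilon tool: the object labels, the DEPENDS_ON table, and the check_order function are hypothetical, and the dependencies are only those described in the steps above.

```python
# Each object type maps to the object types that must already exist.
# The labels are hypothetical; the dependencies follow the steps above.
DEPENDS_ON = {
    "groupnet": [],
    "access_zone": ["groupnet"],
    "subnet": ["groupnet"],
    "pool": ["subnet", "access_zone"],
    "auth_provider": ["groupnet"],
    "zone_provider_link": ["access_zone", "auth_provider"],
}


def check_order(steps):
    """Return (step, missing prerequisites) for the first bad step, else None."""
    created = set()
    for step in steps:
        missing = [dep for dep in DEPENDS_ON[step] if dep not in created]
        if missing:
            return step, missing
        created.add(step)
    return None


documented_order = [
    "groupnet", "access_zone", "subnet", "pool",
    "auth_provider", "zone_provider_link",
]
# Creating the pool before the access zone breaks the chain.
wrong_order = [
    "groupnet", "subnet", "pool", "access_zone",
    "auth_provider", "zone_provider_link",
]

ok_result = check_order(documented_order)
bad_result = check_order(wrong_order)
```

Running the check on the documented order reports no problem, while the second sequence is flagged at the pool step because the access zone it must reference does not exist yet.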

Lesson 3: SmartConnect


Upon completion of this lesson, you will be able to discuss SmartConnect’s networking role,
distinguish between static and dynamic pools, and evaluate SmartConnect Best Practices.

Overview: Client Access


One of the key features of Isilon clusters is just that: they’re clusters. The cluster architecture
itself allows for redundancy, but it also presents a challenge for load balancing and failure
transparency from a client perspective. It may be rare, but network interfaces, and even
entire nodes do fail. When that happens, clients must be able to seamlessly connect and
continue working.
One might be tempted to just use ordinary load balancing tools from a networking company
to achieve this, but such devices have no reference to the Isilon cluster internals, and thus
their load balancing falls short.
Isilon addresses this scenario using SmartConnect.
SmartConnect is one of the most valuable technologies within the Isilon platform.
SmartConnect is a client connection balancing management feature (module) that enables
the balancing of client connections across selected nodes in a cluster. It does this by
providing a single virtual host name for clients to connect to, which simplifies connection
mapping. SmartConnect’s role is to lead clients to nodes that are responsive, as well as avoid
crowding. The cluster appears as a single network element to a client system. Both cluster
and client performance can be enhanced when connections are more evenly distributed.
This leads to a robust client experience in terms of both stability and performance. Also,
SmartConnect can remove nodes that have gone offline from the request queue, and
prevent new clients from attempting to connect to a node that is not available. In addition,
SmartConnect can be configured so new nodes are automatically added to the connection
balancing pool.
Often access zones and SmartConnect are misunderstood or used synonymously, but in fact
they are distinctly different and dependent on one another. SmartConnect deals with getting
the clients from their devices to the correct front-end interface on the cluster. The key is the
“correct” front-end interface for their job function/segment/department. Once the client is at
the front-end interface, the associated access zone then authenticates the client against the
proper directory service.
SmartConnect provides name resolution for the cluster, enabling client connections to the
storage cluster using a single host name or however many host names a company needs.
SmartConnect eliminates the need to install client side drivers, enabling administrators to
manage large numbers of clients in the event of a system failure.
The SmartConnect Advanced license has intelligent algorithms (CPU utilization, aggregate
throughput, connection count or Round-robin) and distributes clients across the cluster to
optimize client performance. It provides dynamic NFS failover and failback of client
connections across storage nodes to provide optimal utilization of the cluster resources.

SmartConnect Architecture


Let’s take another look at our X-Attire and GearItUp topology. SmartConnect can be
configured into multiple zones that can be used to ensure different levels of service for
different groups of clients. For example, here SmartConnect directs X-Attire users to F800
nodes for their needed performance whereas GearItUp users access the H400 nodes for
general purpose file sharing. All of this is transparent to the end-user. The SmartConnect
Service IP (SSIP or SIP) is one IP address that is part of the subnet. Do not put the SIP in an
address pool. The SIP is a virtual IP within the Isilon configuration; it is not bound to any of
the external interfaces.
To configure SmartConnect, you must also create records on the customer’s DNS servers. If
the clients use DNS for name resolution, the DNS server needs to be configured to forward
cluster name resolution requests to the SmartConnect service on the cluster.

Qualifying Questions


Some preliminary information that can determine how the solution is implemented is
knowing whether SmartConnect is licensed to take advantage of the advanced functionality.
Dynamic IP allocation and multiple subnet options require SmartConnect Advanced. Barriers
may include the inability to configure a DNS host record. Clients that perform DNS caching,
such as Mac OS X in certain configurations, might not connect to the node with the lowest
load if they make multiple connections within the lifetime of the cached address. If multiple
IP address pools are required, ensure that there are enough addresses in the environment
to accommodate.

SmartConnect Components


The SIP will never be put into one of the pools, the same way you would not put a static
server IP address into a DHCP scope. The SIP resides on the node with the lowest logical
number. If that node goes down, the SIP seamlessly moves to the next lowest logical node
number. For example, if you had a 5 node cluster and the SIP was answering DNS queries
from node 1, if node 1 went down, the SIP would move to node 2 and node 2 would start
answering the DNS queries. The SmartConnect zone name is a friendly fully-qualified domain
name (FQDN) that users can type to access the cluster.
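The rule for where the SIP resides can be expressed as a one-line selection. The sketch below is a simplified Python model of the behavior described above, not OneFS code; the sip_owner function and the node_status mapping are hypothetical names.

```python
def sip_owner(node_status):
    """Return the lowest logical node number that is up, or None if none are.

    node_status maps a logical node number to True (up) or False (down).
    """
    up_nodes = [node for node, is_up in node_status.items() if is_up]
    return min(up_nodes) if up_nodes else None


# A five-node cluster with every node up: node 1 answers DNS queries.
cluster = {1: True, 2: True, 3: True, 4: True, 5: True}
owner_before = sip_owner(cluster)

# Node 1 goes down: the SIP moves to the next lowest logical node number.
cluster[1] = False
owner_after = sip_owner(cluster)
```

With all five nodes up the model places the SIP on node 1, and after node 1 fails it places the SIP on node 2, as in the example above.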

SmartConnect Licensing


In a traditional scale-up NAS solution, the file system, volume manager, and the
implementation of RAID are all separate entities. Each entity is abstracted from the other.
The functions of each are clearly defined and separate. In scale-up solutions you have
controllers that provide the computational throughput and are connected to trays of disks.
The disks are then carved up into RAID groups and into LUNs. If you need additional
processing, you can add an additional controller, which can run Active/Active or
Active/Passive. If you need additional disk, you can add another disk array. To administer
this type of cluster, there is an overarching management console that allows for single seat
administration. Each of these components is added individually and may have an upper
limit of 16TB although some solutions may be higher. This type of solution is great for
specific types of workflows, especially those applications that require block-level access.
In a scale-out solution, the computational throughput, the disk and disk protection, and the
over-arching management can be combined and exist within a single node or server. OneFS
creates a single file system for the cluster that performs the duties of the volume manager
and applies protection to the cluster as a whole. There is no partitioning, and no need for
volume creation. Because all information is shared among nodes, the entire file system is
accessible by clients connecting to any node in the cluster; this is what SmartConnect
enables. Because all nodes in the cluster are peers, the Isilon clustered storage system also
does not have any master or slave nodes. All data is striped across all nodes in the cluster.
As nodes are added, the file system grows dynamically and content is redistributed. Each
Isilon storage node contains globally coherent RAM, meaning that as a cluster becomes
larger, it also becomes faster. Each time a node is added, the cluster’s concurrent
performance scales linearly.
As shown, the cluster inherently includes the SmartConnect basic functionality. For greater
flexibility and control, the advanced license will be needed.
Note that in OneFS versions 8.1 and later Isilon is now bringing its feature licensing system
into parity with the other Dell EMC legacy storage systems. Internally, the system is known as
ELMS; the Electronic Licensing Management System. Customers may come across it as
Software Licensing Central. This is not a new system, but a proven system with years of
history and practical use behind it. Legacy Isilon installations are not being moved to
eLicensing, but will remain on their current system unless they upgrade to OneFS 8.1.

IP Address Pools


IP address pools are allocated to a set of external network interfaces. The pools of IP
address ranges in a subnet enable you to customize how users connect to your cluster.
Pools control connectivity into the cluster by allowing different functional groups, such as
sales, engineering, marketing, etc., access into different nodes. The combination of address
pools and access zones is what directs client access to the groupnets.

This is of vital importance in those clusters that have different node types. Let’s say GearItUp
adds 4 F800 nodes for a video media group and wants the video media team to connect
directly to the F800 nodes to use a variety of high I/O applications. The administrators can
separate GearItUp’s connection. Access to the home directories will connect to the front-end
of the H400 nodes while the video media group will access the F800 nodes. This
segmentation will keep the home directory users from using bandwidth on the video media
team’s F800 nodes.
The first external IP address pool and IP subnet, subnet0, is configured during the
initialization of the cluster. The initial default IP address pool, pool0, is created within
subnet0. It holds an IP address range and a physical port association. Additional subnets can
be configured as either IPv4 or IPv6 subnets. Additional IP address pools can be created
within subnets and associated with a node, a group of nodes, or network interface card, or
NIC, ports.

Static vs. Dynamic Pools


When configuring IP address pools on the cluster, an administrator can choose either static
pools or dynamic ones. A static pool is a range of IP addresses that allocates only one IP
address to each network interface. Static Pools do not reallocate addresses in the event of
hardware failures. If there are more IP addresses than nodes, as seen here, the additional IP
addresses will wait to be assigned when additional nodes are added to the cluster. When
that happens, the next IP address from the range (in this case .13) is assigned. Static pools
are best used for SMB clients because SMB is a stateful protocol. When an SMB client
establishes a connection with the cluster the session or “state” information is negotiated and
stored on the server, or node in this case. If the node goes offline the state information goes
with it, and the SMB client must reestablish a connection to the cluster. SmartConnect will
hand out the IP address of an active node when the SMB client reconnects.
Dynamic pools are best used for NFS clients. Dynamic pools assign out all the IP addresses in
their range to the NICs on the cluster. You can identify a Dynamic range by the way the IP
addresses present in the interface are displayed as .110-.114 or .115-.119 instead of a single
IP address like .10. NFS is considered a stateless protocol, in that no session or “state”
information is maintained on the cluster side. If a node goes down, the IP address that the
client is connected to will move to another node in the cluster. For example, if a Linux client
were connected to .110 as shown here, and that node were to go down, the IP
addresses .110, .111, .112, .113 and .114 would be distributed as evenly as possible across the remaining two
nodes in the pool, and the Linux client would seamlessly fail over to one of the active nodes. The
client would not know that its original node had failed.
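The dynamic pool failover just described can be modeled in Python. This is a sketch under stated assumptions: the fail_over function is hypothetical, the third node's range (.120-.124) is invented for the example, and the addresses are handed out round-robin, whereas on a real cluster the distribution follows the configured failover policy.

```python
from itertools import cycle


def fail_over(assignments, failed_node):
    """Move the failed node's dynamic IPs to the surviving nodes, round-robin."""
    survivors = sorted(node for node in assignments if node != failed_node)
    if not survivors:
        raise ValueError("no surviving nodes in the pool")
    # Copy the survivors' existing addresses so the input is not modified.
    result = {node: list(assignments[node]) for node in survivors}
    for ip, node in zip(assignments[failed_node], cycle(survivors)):
        result[node].append(ip)
    return result


# Host portions only, in the style of the example above.
pool = {
    1: [".110", ".111", ".112", ".113", ".114"],
    2: [".115", ".116", ".117", ".118", ".119"],
    3: [".120", ".121", ".122", ".123", ".124"],  # assumed range
}
after = fail_over(pool, failed_node=1)
```

In this model node 2 picks up three of the five addresses and node 3 picks up two, so a client that was connected to .110 finds that address answering on node 2.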

Example: Dynamic Pools


Let’s see how dynamic pools could be set up in the X-Attire array to provide NFS failover for
the web hosting application. This example illustrates how NFS failover and failback work.
X-Attire accesses nodes 1, 2, 3, and 4. An IP address pool provides a single static node IP
address to an interface in each cluster node for the Windows home directories.
Another pool of dynamic IP addresses (NFS failover IPs) has been created and distributed
across the nodes.
When node 2 in the Isilon cluster goes offline, the IP addresses (and connected clients)
associated with node 2 failover to the remaining nodes based on the configured connection
balancing policy (Round-robin, Connection count, Throughput, or CPU usage). The static
node IP address for node 2 is no longer available.
If a node with client connections established goes offline, the behavior is protocol-specific.
NFSv3 automatically re-establishes an IP connection as part of NFS failover. In other words, if
the IP address gets moved off an interface because that interface went down, the TCP
connection is reset. NFSv3 reconnects with the IP address on the new interface and retries
the last NFS operation. However, SMBv1 and v2 protocols are stateful. So when an IP
address is moved to an interface on a different node, the connection is broken because the
state is lost. NFSv4 is stateful (just like SMBv1 and v2) and does not benefit from NFS failover.
However, OneFS 8.0 and later supports SMBv3 with continuous availability (CA) and NFSv4
CA clients, which provide continued file operations during both planned and unplanned
network or storage node outages.
Note: A best practice is to set the IP allocation method to static unless using NFSv3. Other
protocols such as SMB and HTTP have built-in mechanisms to help the client recover
gracefully after a connection is unexpectedly disconnected.

Balancing Policies for Connection


SmartConnect load balances client connections across the front-end ports based on
workflows. SmartConnect Advanced (licensed) enables four load balancing options: Round-robin,
Connection count, Throughput, and CPU usage. If the cluster does not have
SmartConnect Advanced licensed, it will load balance by Round-robin only.
• Round-robin: as a very basic example, the first client that connects will go to node 1,
the second to node 2, the third to node 3, etc.
• Connection count: attempts to balance the number of connections to each node by
sending clients to the nodes with the fewest client connections. If one node has seven
clients connected and another has only four, then the SIP will send the next client
connection to the node with only four connections. Data is collected every 10 seconds.
• Network throughput: directs new connections to nodes that have lower external network
throughput, sending the next client connection to the node with the least current
network throughput. Data is collected every 10 seconds.
• CPU utilization: attempts to balance the workload across the cluster nodes by sending
the client connection to the node with the least CPU usage at the time the client
connects. This helps spread the load across the nodes and does not overburden any
one node. Statistics are collected every 10 seconds.
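The four policies listed above differ only in which measurement they compare when choosing a node. The following Python sketch illustrates that idea; it is not SmartConnect's implementation, and the make_selector function, the policy labels, and the per-node statistics are all hypothetical values made up for the example.

```python
from itertools import count


def make_selector(policy):
    """Return a function that picks one node from a list of per-node stats."""
    if policy == "round_robin":
        counter = count()
        # Hand out nodes in order, wrapping around at the end of the list.
        return lambda nodes: nodes[next(counter) % len(nodes)]
    # The other policies pick the node with the lowest value of one metric.
    metric = {
        "connection_count": "connections",
        "throughput": "throughput_mbps",
        "cpu_usage": "cpu_percent",
    }[policy]
    return lambda nodes: min(nodes, key=lambda node: node[metric])


# Hypothetical snapshot of the statistics collected for each node.
nodes = [
    {"name": "node1", "connections": 7, "throughput_mbps": 300, "cpu_percent": 20},
    {"name": "node2", "connections": 4, "throughput_mbps": 900, "cpu_percent": 65},
    {"name": "node3", "connections": 5, "throughput_mbps": 100, "cpu_percent": 40},
]

by_connections = make_selector("connection_count")(nodes)["name"]
by_throughput = make_selector("throughput")(nodes)["name"]
by_cpu = make_selector("cpu_usage")(nodes)["name"]
```

With this snapshot, each policy sends the next client to a different node: connection count picks node2, throughput picks node3, and CPU usage picks node1, which is why the policy should be matched to the workload.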

How do you balance load? Some general recommendations can be made regarding
connection balancing policies in SmartConnect. In general, long-lived and low
performance clients can be best accommodated by either Round-robin or Connection Count
policies to fairly distribute client connections across nodes in a SmartConnect pool. This is a
good approach if performance is not an especially sensitive issue.
For non-persistent and high performance clients it is often best to balance based on the type
of performance applicable to the workload, such as network throughput for a streaming
workload or CPU utilization for a more ops-intensive workload.
Note that customers who want to use external load balancers do not get the same depth of
options. External load balancers tend to get round-robin, connection count and maybe
throughput, but they cannot inspect node CPUs.

IP Failover and Rebalance Policies


IP rebalancing and IP failover are features of SmartConnect Advanced. The rebalance policy
determines how IP addresses are redistributed when node interface members for a given IP
address pool become available again after a period of unavailability. The rebalance policy
could be:
• Manual Failback - IP address rebalancing is done manually from the CLI using isi
network pools rebalance-ips. This causes all dynamic IP addresses to rebalance
within their respective subnet.
• Automatic Failback - The policy automatically redistributes the IP addresses. This is
triggered by a change to either the cluster membership, external network configuration
or a member network interface.

Multiple Tiers per Cluster


Because each SmartConnect zone is managed as an independent SmartConnect
environment, they can have different attributes, such as the client connection policy. For
environments with very different workloads, this provides flexibility in how cluster resources
are allocated. Clients use one DNS name to connect to the performance nodes and another
to connect to the general use nodes. The performance zone could use CPU utilization as
the basis for distributing client connections, while the general use zone could use Round-
robin or Connection count, which will optimize the allocation of cluster resources based on
client requirements and workloads.
So let’s revisit the X-Attire example of an F800 chassis for Marketing’s video media group. X-
Attire can create a subnet and/or pool to be used by high computational power servers to
give a higher level of performance. This is the performance tier shown above. A second
subnet and/or pool is created with a different zone name for general use, often desktops,
that do not need as high a level of performance. This is the general use tier. Each group
connects to a different name and gets different levels of performance. This way, no matter
what the desktop users are doing, it does not affect the performance to the cluster. Because
it is still one cluster, when the data is generated from the cluster, it is immediately available
to the desktop users.

Example: Cluster Name Resolution Process


Here we’ll illustrate how SmartConnect uses X-Attire’s existing DNS server, providing a layer
of intelligence within the OneFS software application.
An NS record delegates the SmartConnect zone's subdomain to a name server whose
hostname resolves to the SIP; the mapping looks like the DNS configuration noted
on the slide. The NS record states that anyone looking to resolve a name in the delegated
subdomain should query the name server named in that record. The A record
maps that name server's hostname to the SIP address. Now anyone looking
for the SmartConnect zone name will be forwarded to that name server, which can be
found at the SIP address.
Specifically, all clients are configured to make requests from the resident DNS server using a
single DNS host name (i.e., cluster). Because all clients point to a single host name, it is easy to manage large numbers of clients. The resident DNS
server forwards the lookup request for the delegated zone to the delegated zone’s server of
authority, in this case the SIP address of the cluster. SmartConnect evaluates the
environment and determines which node (single IP address) the client should connect to,
based on the configured policies. It then returns this information to the DNS server, which,
in turn, returns it to the client. The client then connects to the appropriate cluster node using
the desired protocol.
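The name resolution flow above can be walked through with a toy model in Python. Everything in it is an assumption made for illustration: the SiteDns and SmartConnectService classes are hypothetical, the host names use the reserved example.com domain, the addresses come from the 192.0.2.0/24 documentation range, and the service only implements Round-robin.

```python
from itertools import cycle


class SmartConnectService:
    """Stands in for the DNS service answering at the SIP (Round-robin only)."""

    def __init__(self, zone_name, node_ips):
        self.zone_name = zone_name
        self._next_ip = cycle(node_ips)

    def resolve(self, name):
        if name != self.zone_name:
            raise KeyError(name)
        return next(self._next_ip)  # one node IP per query


class SiteDns:
    """Stands in for the customer's resident DNS server."""

    def __init__(self):
        self.a_records = {}   # hostname -> IP address
        self.ns_records = {}  # delegated zone -> name server hostname
        self.servers = {}     # IP address -> service reachable at that address

    def resolve(self, name):
        if name in self.ns_records:
            ns_host = self.ns_records[name]         # NS: who is authoritative
            sip = self.a_records[ns_host]           # A: where that server is
            return self.servers[sip].resolve(name)  # forward the query to the SIP
        return self.a_records[name]


dns = SiteDns()
dns.a_records["sip.example.com"] = "192.0.2.10"            # A record for the SIP
dns.ns_records["cluster.example.com"] = "sip.example.com"  # NS delegation
dns.servers["192.0.2.10"] = SmartConnectService(
    "cluster.example.com", ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
)
```

Each call to dns.resolve("cluster.example.com") follows the NS record to the A record, forwards the query to the service at the SIP, and returns a single node address, so successive clients are spread across the nodes.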

Best Practices for DNS Delegation


Delegate to address (A) records, not to IP addresses. The SmartConnect service IP on an
Isilon cluster must be created in DNS as an address (A) record, also called a host entry. An A
record maps an FQDN to its corresponding IP address. Delegating to
an A record means that if you ever need to failover the entire cluster, you can do so by
changing just one DNS A record. All other name server delegations can be left alone. In many
enterprises, it is easier to have an A record updated than to update a name server record,
because of the perceived complexity of the process.
SmartConnect requires adding a new name server (NS) record that refers to the
SmartConnect service IP address in the existing authoritative DNS zone that contains the
cluster. Use one NS record for each SmartConnect zone name or alias. You must also
provide a zone delegation to the fully qualified domain name (FQDN) of the SmartConnect
zone. Isilon recommends creating one delegation for each SmartConnect zone name or for
each SmartConnect zone alias on a cluster. This method permits failover of only a portion of
the cluster's workflow (one SmartConnect zone) without affecting any other zones. This
method is useful for scenarios such as testing disaster recovery failover and moving
workflows between data centers.
Isilon does not recommend creating a single delegation for each cluster and then creating
the SmartConnect zones as sub records of that delegation. Although using this method
would enable Isilon administrators to change, create, or modify their SmartConnect zones
and zone names as needed without involving a DNS team, this method causes failover
operations to involve the entire cluster and affects the entire workflow, not just the affected
SmartConnect zone.

Considerations: SmartConnect


Never put the SmartConnect SIP address into one of the IP address pools, the same way you
would not put a static server IP address into a DHCP scope. As a good practice, start with
using the Round-robin balancing policy, and then modify for workflow. Use round-robin to
avoid imbalanced connections.
To successfully distribute IP addresses, the SmartConnect DNS delegation server answers DNS
queries with a time-to-live of 0 so that the answer is not cached. Certain DNS servers, such
as Windows Server 2003, 2008, and 2012, will fix the value to one second. If you have many
clients requesting an address within the same second, this will cause all of them to receive
the same address. In some situations, there may be barriers to deploying SmartConnect, in
which case other means should be specified in the solution design. DNS servers (not
SmartConnect) handle client DNS requests. IIS also does not work well with SmartConnect
(look for a workaround, a plug-in called IIS-IQ).
Certain clients perform DNS caching and might not connect to the node with the lowest load
if they make multiple connections within the lifetime of the cached address. For example,
this issue occurs in Mac OS X for certain client configurations. The site DNS servers handle
DNS requests from clients and route the requests appropriately.
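The effect of caching on connection distribution can be demonstrated with a small simulation in Python. It is a sketch under stated assumptions, not a model of any particular DNS server or client: the CachingResolver class, the smartconnect_answers helper, and the addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from itertools import cycle


def smartconnect_answers(node_ips):
    """Return a callable yielding (ip, ttl) pairs with a TTL of 0."""
    rotation = cycle(node_ips)
    return lambda: (next(rotation), 0)


class CachingResolver:
    """A resolver that keeps each answer for max(ttl, min_ttl) seconds."""

    def __init__(self, upstream, min_ttl):
        self.upstream = upstream  # callable returning (ip, ttl)
        self.min_ttl = min_ttl    # floor that the resolver applies to the TTL
        self._cached = None       # (ip, expires_at) or None

    def resolve(self, now):
        if self._cached and now < self._cached[1]:
            return self._cached[0]  # answer is still cached
        ip, ttl = self.upstream()
        self._cached = (ip, now + max(ttl, self.min_ttl))
        return ip


node_ips = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
request_times = (0.0, 0.2, 0.9)  # three clients asking within one second

honors_zero = CachingResolver(smartconnect_answers(node_ips), min_ttl=0)
fixes_one_second = CachingResolver(smartconnect_answers(node_ips), min_ttl=1)

spread = [honors_zero.resolve(t) for t in request_times]
bunched = [fixes_one_second.resolve(t) for t in request_times]
```

The resolver that honors the TTL of 0 hands the three clients three different nodes, while the one that raises the TTL to one second gives all three the same node until the cached answer expires.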
Static pools are best used for SMB clients because of the stateful nature of the SMB protocol,
and dynamic pools are best used for NFS clients.

Lesson 4: Access Zones


Upon completion of this lesson, you will be able to highlight the authentication structure,
explain how access zones interact with other networking concepts, and discuss file filtering.

Overview: Access Zones


Isilon has powerful cluster features, which can be very useful to a wide range of users. Each
of those use cases has administration needs, which require individualized security and data
management. It looks a lot like the management issues cloud data storage providers have to
solve, which are based in multitenancy.
Isilon embraces multitenancy by giving administrators the features they need to solve these
problems. Simply put, you can divide users and manage their access by their network of
origin before they are authenticated, authenticate them with appropriate services, and then
handle their file permissions correctly across multiple access and authentication systems.
Now we will take a look at how Isilon breaks this down in more detail.
First, when a client wants to connect to a service on an Isilon cluster, we must resolve a
name to an IP address. SmartConnect handles this step, because as you have seen, it is a
DNS server. Granted, SmartConnect is a specialized DNS server that only represents the
cluster, but it understands the cluster and sends clients to the correct, available nodes.
Next, the client connects to the Isilon cluster and has to authenticate. Authentication
is performed by an authentication service configured in that access zone. Once authentication is
complete, the client has access.
As we have seen before, different clients may be in the same groupnet or in different
groupnets, and even within a given groupnet they may be in different access zones.
An access zone can be thought of as one of a cluster’s virtual partitions or
containers. The cluster can be segmented into multiple access zones, allowing data isolation
and granular access control. To control data access, the access zone is associated with a
groupnet. Access zones support configuration settings for authentication and identity
management services, meaning authentication providers are configured and protocol
directories provisioned (e.g., SMB shares and NFS exports) on a zone-by-zone basis.
Access zones provide two important functions in a complex storage environment. First, they
allow you to arrange separate authentication services. Second, they allow you to separate
data into discrete sets that are only accessible within certain access zones. Access zones do
not, however, separate access to different nodes for different groups of clients. To do that,
you need to configure SmartConnect.

Access Zone Architecture


OneFS’s identity management is what maps users and groups from separate directory
services to provide a single combined identity. It also provides uniform access control to files
and directories, regardless of the incoming protocol. lsassd is the cluster’s authentication
daemon and is covered in a later module. The cluster’s default access zone is “System”, and it
uses an internal authentication provider. Configuration of access zones - or any other
configuration of the cluster for that matter - is only supported when an administrator is
connected through the System access zone. Each access zone has its own authentication
providers (File, Local, Active Directory, or LDAP) configured. Multiple instances of the same
provider can occur in different access zones, though doing this is not a best practice.
Once the client is at the front-end interface, the associated access zone authenticates
the client against the proper directory service, whether that is external, like LDAP and AD, or
internal to the cluster, like the local or file providers. Access zones do not dictate which
front-end interface the client connects to; they only determine which directory will be queried to
verify authentication and which shares the client will be able to view. Once the client is
authenticated to the cluster, mode bits and access control lists, or ACLs, dictate the files, folders,
and directories that this client can access. Remember, when the client is authenticated,
Isilon generates an access token for that user. The access token contains all the permissions
and rights that the user has. When a user attempts to access a directory, the access token
is checked to verify that they have the necessary rights.
External Protocols
External access protocols are used by clients to connect to the Isilon cluster. The currently
supported protocols are listed on the slide.
lsassd Daemon
Within OneFS, the lsassd (L-sass-d) daemon mediates between the external protocols and
the authentication providers, with the daemon reaching out to the external providers for
user lookups.
External Providers
In addition to external protocols, there are also external providers. These are external
directories that hold lists of users that the internal providers contact in order to verify user
credentials. Once a user’s identity has been verified, OneFS will generate an access token.
The access token will be used to allow or deny a user access to the files and folders on the
cluster.
Internal Providers
Internal providers sit within the cluster’s operating system and are the Local and File
providers. A Local provider is a list of users local to the cluster, and the File provider
uses a converted /etc/passwd file.

Qualifying Questions


Shown are some leading questions that can help you to understand whether access zones
would be a fit. Behind some of these questions are some inherent limitations in the access zone
architecture. DNS configuration (which DNS servers we talk to, and what domain suffixes we
append to names) is done at the cluster level (i.e., globally). This means that in order for
Isilon to be able to resolve host names (particularly those of the Active Directory domain
controllers to which we are attempting to bind), the global DNS resolver must be able to look up
host names in the various zones/domains to which the administrator wishes to bind the cluster.
For instance, consider the following scenario:
• Cluster DNS is configured to point to, which is a resolver for “EMC.COM”.
Additionally, the DNS suffixes configured are “EMC.COM” and “”.
• The administrator wants to bind a separate “Isilon” access zone to a domain
controller responsible for “”.
• If the EMC DNS server cannot resolve “”, we will not be able to join or
use this Active Directory domain.
Workarounds for this issue include:
• Adding more DNS resolvers to the config (e.g.,, which is a resolver for
• Note that an Isilon cluster currently has a limit of three resolvers per cluster, so
binding to many domains that cannot be resolved by the ‘parent’ DNS server is not
• Another possible workaround is to have the customer delegate authority for
“” to a DNS server in the “” zone; however, that requires the
customer to add delegation records to their DNS infrastructure, which they may not
be willing to do.
Another limitation is that there is no ‘administration’ alignment with access zones. That is,
you cannot delegate authority to administer shares, quotas, etc. on a per-access zone basis.
This may not work well for hosting providers or other institutions that need to be able to
allow other departments/customers/etc. to manage their own shares. Obviously, if the
customer answers ‘yes’ to the central IT question, that is a good thing, since we can’t do it
any other way yet.
One of the main reasons for access zones was to give customers the ability to consolidate
many separate SMB servers (Windows file servers, vfilers, etc.) into a single Isilon cluster,
without needing to change how users accessed the data. For example, users accustomed to
connecting to \\fileserver01\data and \\fileserver02\hr can continue to do so, by simply
creating separate SmartConnect names and access zones that align with those names.
Another primary function of access zones was to provide the ability to bind the cluster to
multiple, untrusted Active Directory domains (or LDAP realms, etc.). This can help customers
with multi-domain environments (either due to acquisitions, or due to needing a separate
prod/dev infrastructure) allow Isilon to integrate without having to set up trusts or purchase
multiple Isilon clusters.
Also, this lesson covers file filtering. It may be good to know up front what types of files the
cluster will filter.

Qualifying Questions (Qualify or Redirect)


The answers to these questions are important as well. As a follow-on to what was discussed
previously, customers who want to be able to assign or delegate administrative privileges for
specific access zones will not be able to do so with current OneFS code.

Access Zone Interactions


Because groupnets are the top networking configuration object, they have a close
relationship with access zones and the authentication providers. The groupnet defines the
external DNS settings for remote domains and authentication providers, so external
authentication providers have an extra parameter that defines the groupnet in which
they exist. When the cluster joins an Active Directory server, the cluster must know which
network to use for external communication with the AD domain. Because of this, if you have
a groupnet, both the access zone and the authentication provider must exist within the same
groupnet or you will see an error. Access zones and authentication providers must exist
within one and only one groupnet. Multiple access zones can reference a single groupnet. If
a groupnet is not specified, the access zone will reference the default groupnet (0).
Incoming connections to the access zone can be directed to a specific IP address pool in the
groupnet. Associating an access zone with an IP address pool restricts authentication to the
associated access zone and reduces the number of available and accessible SMB shares and
NFS exports. An advantage to multiple access zones is the ability to configure audit protocol
access for individual access zones. You can modify the default list of successful and failed
protocol audit events and then generate reports through a third-party tool for an individual
access zone.
A base directory defines the file system tree exposed by an access zone. A base directory is
assigned to each access zone. The access zone cannot grant access to any files outside of the
base directory, essentially creating a unique namespace. However, in OneFS 8.0 and later,
access zones can have a shared subdirectory, allowing data sharing across zones. Access
zones sharing a subdirectory should also share authentication providers. The base directory
of the default System access zone is /ifs and cannot be modified. To achieve data isolation
within an access zone, create a unique base directory path that is not identical to, and does
not overlap, another base directory. In the example shown, do not specify /ifs/data as the
base directory for both the X-Attire and the GearItUp access zones.
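
The base directory rule can be checked before zones are created. The following is a minimal sketch in Python, not a OneFS tool: the function names and the example paths are assumptions made for illustration, and only the zone names come from the example. It treats two base directories as overlapping when they are identical or when one is an ancestor of the other. The System zone is left out because its base directory, /ifs, contains every other path by definition.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """Return True if two base directories are identical or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_overlaps(zones: dict[str, str]) -> list[tuple[str, str]]:
    """List every pair of zone names whose base directories overlap."""
    return [
        (z1, z2)
        for (z1, p1), (z2, p2) in combinations(zones.items(), 2)
        if overlaps(p1, p2)
    ]


# Hypothetical layout: zone names are from the example, the paths are invented.
zones = {
    "X-Attire": "/ifs/data/x-attire",
    "GearItUp": "/ifs/data/gearitup",
}
print(find_overlaps(zones))  # an empty list means no pair overlaps
```

If both zones were given /ifs/data, the same call would report the pair, which is the configuration the text warns against.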

Do I Need Multiple Access Zones?


OneFS enables you to configure multiple authentication providers on a per-zone basis. In
other words, more than one instance of LDAP, NIS, File, Local, and Active Directory providers
per Isilon cluster is possible.
Access zones provide a means to limit data access to specific directory structures by access
zone and SmartConnect zone/IP address pool. Each access zone can be configured with its
own authentication providers, zone-aware protocols such as SMB, FTP, and HTTP, and
associated SmartConnect IP address pools. An access zone becomes an independent point
for authentication and access to the cluster. Only one Active Directory provider can be
configured per access zone. If you connect the cluster to multiple untrusted AD environments,
only one of these AD providers can exist in a zone at one time. Each access zone
may also have relationships to the System access zone. This is particularly useful for storage
consolidation, for example, when merging multiple storage filers that are potentially joined
to different untrusted Active Directory forests and have overlapping directory structures.
SMB shares that are bound to an access zone are only visible/accessible to users connecting
to the SmartConnect zone/IP address pool to which the access zone is aligned. SMB
authentication and access can be assigned to any specific access zone. Here’s an example of
separate namespaces for SMB/NFS:
• A number of SmartConnect zones are created, such as,
Each of those SmartConnect zones can be aligned to an access zone.
• Users connecting to \\ would only see hr shares.
• Users connecting to \\ would only see finance shares.
• Having multiple zones allows you to audit specific zones without needing to audit the
entire cluster.
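
The alignment between SmartConnect zones, access zones, and visible shares can be pictured as two lookups. The sketch below is a toy model only. Every name in it (the SmartConnect names, zone names, and share names) is a hypothetical placeholder, and it does not reproduce how OneFS stores or resolves this configuration.

```python
# All names below are hypothetical placeholders, not values from the course.
SMARTCONNECT_TO_ACCESS_ZONE = {
    "hr.example.com": "hr-zone",
    "finance.example.com": "finance-zone",
}

SHARES_BY_ACCESS_ZONE = {
    "hr-zone": ["hr-policies", "hr-records"],
    "finance-zone": ["finance-reports"],
}


def visible_shares(smartconnect_name: str) -> list[str]:
    """Return the SMB shares a client would see through the given SmartConnect name."""
    zone = SMARTCONNECT_TO_ACCESS_ZONE.get(smartconnect_name.lower())
    if zone is None:
        return []  # unknown name: nothing is exposed in this model
    return SHARES_BY_ACCESS_ZONE.get(zone, [])


print(visible_shares("hr.example.com"))
print(visible_shares("finance.example.com"))
```

A client that connects through the HR name never sees the finance shares, because the lookup passes through exactly one access zone.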

File Filtering in Access Zones


Some features work by access zone and can be individually configured. Authentication is
one significant case; for administrators, however, file filtering is equally important. File
filtering enables administrators to deny or allow file access on the cluster based on the file
extension. Users often want to save an array of irrelevant data, but storage administrators
have to be able to manage that data. File filtering rules prevent files from being written,
based on their extensions, in each access zone. This, in combination with quotas, offers you a
powerful set of tools to manage abuses of file storage.
File filtering can be a blunt instrument. If you block files with .pptx extensions in a zone, you
block all PowerPoint files, not just particular ones. It can also be circumvented by renaming
files, because file filtering does not operate by inspecting file contents. This is a storage
management practice to consider in the light of your organization's particular needs. File
filtering is included with OneFS 8.0 and later, and no license is required.
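
To make the behavior concrete, here is a minimal sketch of extension-based filtering. It is an illustration, not the OneFS implementation: the function name, the `mode` parameter, and the case-insensitive comparison are all assumptions. It shows both a deny list and an allow list, and why a renamed file gets past a deny rule.

```python
from pathlib import PurePosixPath


def write_allowed(filename: str, extensions: set[str], mode: str = "deny") -> bool:
    """Decide whether a write is permitted, looking only at the file extension.

    mode="deny":  extensions is a deny list; anything else may be written.
    mode="allow": extensions is an allow list; anything else is rejected.
    """
    ext = PurePosixPath(filename).suffix.lower()  # case-insensitivity is an assumption
    listed = ext in extensions
    if mode == "deny":
        return not listed
    if mode == "allow":
        return listed
    raise ValueError(f"unknown mode: {mode!r}")


deny = {".pptx", ".pst"}
print(write_allowed("q3-review.pptx", deny))      # blocked by the deny list
print(write_allowed("q3-review.pptx.txt", deny))  # the renamed copy slips through
```

Because the decision uses only the final extension, the contents of the file never enter into it, which is exactly why filtering is easy to configure and easy to circumvent.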

When to Use File Filtering


Some of the reasons for file filtering include the capability to enforce organizational policies.
With all of the compliance considerations today, organizations struggle to meet many of the
requirements. For example, many organizations are required to make all email available for
litigation purposes. To help make sure email is not stored longer than desired, they may not
want to allow *.pst files to be stored on the cluster by the users. Some reasons are practical;
cluster space costs money. Organizations plan storage space increases based on their work.
They may not want typically large files, such as video files, to be stored on the cluster, so
they can prevent *.mov or *.mp4 files from being stored. An organizational legal issue is
copyright infringement. Many users store their *.mp3 files on the cluster, which opens the
organization to potential copyright infringement issues. Another use case is to limit a
cluster to a specific application with its unique set of file extensions. File filtering with
an explicit allow list of extensions can help limit the cluster to its singular intended purpose.

Considerations: Access Zone


Configuration management through a non-System access zone is not permitted through SSH,
the OneFS Platform API, or the web administration interface. However, you can create and
delete SMB shares in an access zone through the Microsoft Management Console (MMC).
Role-based access, which primarily allows configuration actions, is available through only the
System zone. All administrators, including those given privileges by a role, must connect to
the System zone to configure a cluster. Base directories restrict path options for several
features such as SMB shares, NFS exports, the HDFS root directory, and the local provider
home directory template.
As a best practice, the number of access zones should not exceed 50. The maximum number
of access zones has yet to be established. Access zones and authentication providers must
exist within one and only one groupnet.
There are several things to note about joining multiple authentication sources through
access zones. First, the joined authentication sources do not belong to any zone; instead,
they are seen by zones, meaning that the zone does not own the authentication source. This
allows other zones to also include an authentication source that may already be in use by an
existing zone. Multiple instances of the same provider in different access zones is not a best
practice. When joining AD domains, only join those that are not in the same forest. Trusts
within the same forest are managed by AD, and joining them could allow unwanted
authentication between zones. Finally, there is no built-in check for overlapping UIDs, so
when two users in the same zone, but from different authentication sources, share the
same UID, this can cause access issues.
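
Since the cluster does not check for overlapping UIDs, a design review can do so ahead of time. The sketch below assumes the user lists have already been exported from each authentication source into plain dictionaries. The provider names, user names, and UIDs are invented for illustration.

```python
from collections import defaultdict


def find_uid_collisions(
    providers: dict[str, dict[str, int]]
) -> dict[int, list[tuple[str, str]]]:
    """Map each UID claimed by more than one provider to its (provider, user) pairs."""
    seen: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for provider, users in providers.items():
        for user, uid in users.items():
            seen[uid].append((provider, user))
    return {
        uid: owners
        for uid, owners in seen.items()
        if len({provider for provider, _ in owners}) > 1
    }


# Hypothetical providers that would be joined to the same access zone.
providers = {
    "ldap-engineering": {"alice": 2001, "bob": 2002},
    "file-legacy": {"carol": 2002, "dave": 3001},
}
print(find_uid_collisions(providers))
```

Here bob and carol share UID 2002, so files owned by one would appear to be owned by the other once both sources are visible in the same zone.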
OneFS supports overlapping data between access zones for cases where your workflows
require shared data; however, this adds complexity to the access zone configuration that
might lead to future issues with client access. For the best results from overlapping data
between access zones, EMC recommends that the access zones also share the same
authentication providers. Shared providers ensure that users have consistent identity
information when accessing the same data through different access zones.

Best Practices for Access Zones


You can avoid configuration problems on the cluster when creating access zones by
following the best practice guidelines shown here.
• Create unique base directories.
• Separate the function of the System zone from other access zones.
• Create access zones to isolate data access for different clients or users.
• Assign only one authentication provider of each type to each access zone.
• Avoid overlapping UID or GID ranges for authentication providers in the same access
zone.

Module 4: Data Management


Upon completion of this module, you will be able to explain Isilon’s role in an ILM
environment, illustrate file system layout, describe file tiering, quota management, and data
deduplication, and explain snapshots, WORM compliance, and antivirus.

Lesson 1: Information Lifecycle Management


Upon completion of this lesson, you should be able to describe Information Lifecycle
Management (ILM), identify how Isilon fits into an ILM environment, and establish questions
to ask about ILM environments.

Overview: ILM


ILM refers to the full lifecycle of managing data, from creation to deletion. It is an
architectural component of the environment that enables management of the data and is
primarily a concept and an infrastructure, not one specific individual product.

Information Lifecycle Management Components


ILM can include many components, such as basic storage, tiered storage, tools that identify
and analyze data characteristics, archive tools, backup tools, and backup hardware. ILM
means that something is managing where the data is stored. This could be a tool, an
application, a metadata controller, or the storage unit itself.

Examples of Where Isilon Fits Into ILM Directly


Where does Isilon fit into ILM? Isilon is primarily tier one, two, and three storage. Isilon
includes fully automated tiering software called SmartPools, but this in itself does not fully fit
the definition of ILM. Isilon also supports integration with backup vendors as both a target
and a source and is compliant with data retention strategies.

ILM Differs for Each Customer


Isilon can fully fit into an ILM environment, but it is rarely a stand-alone ILM solution. Isilon is
merely one component of a customer’s complete ILM architecture.
Important! You must have a consulting discussion about ILM with each prospective customer.
ILM means different things to different people. An ILM misunderstanding can stop an Isilon
installation; therefore, ask plenty of detailed questions so that you have a full understanding
of customer requirements.

Qualifying Questions


If the answer to “Do you currently have an ILM architecture?” is “yes,” ask the customer to
describe their architecture in detail.
What technologies do you currently employ for ILM? An adequate response must include
both hardware and software. What are the criteria that trigger data movement?
The decision criteria that cause data movement vary. For example, the customer may
consider the newest data the most important and in need of high performance, or perhaps
they consider the most-requested data the most important and move less-requested
data into near-line storage. Other criteria might be based on who authored the data (e.g.,
engineering report), who owns the data, who needs the data, type of data (including file
format), and so on. There is much to explore here, so don’t be shy about doing so.

Worth Repeating


Generally, customers will not intentionally deceive you or withhold information. However,
they are so familiar with their environments, they sometimes forget what it is like not to
know what they know. Omissions of information are common. You may have to question
them more than once to get the entire picture. Dig in deep!

Areas to Evaluate Closely


If a customer has an ILM standard that involves migrating data between different storage
platforms, it will be difficult to implement (compared to a stand-alone Isilon implementation).
If Isilon does not fit as a stand-alone solution, all is not lost. It might still act as a component
of ILM; e.g., there may be another storage layer on top, or it might act as a bottom layer,
such as an archive or DR target.

Lesson 2: File System Layout


Upon completion of this lesson, you should be able to analyze key factors of file system
layout and examine directory layout.

Overview: File System Layout


Most NAS platform technology refreshes involve replacing the hardware completely every
three to five years. The migration to a new hardware array can be straightforward,
performed while you’re still using the old array (for example, using VDM mobility or with
vFilers), or the migration can be a very complex host-based process. When the migration
finishes, there is a new NAS to manage with a new configuration, even if the configuration
was copied from the old array.
Isilon hardware is also refreshed on a three-to-five-year cycle, depending on your business
processes and needs. You perform an Isilon technology refresh by adding new nodes to your
existing cluster, and then using SmartFail to remove the obsolete nodes. All of your
information remains the same, including the cluster name, file system structure,
configuration, data, and DNS information. Proper directory layout is critical to a successful
OneFS disaster recovery plan. Make sure that you understand the following two factors
before you decide on a file system layout.
• Isilon technology refresh cycles
• Isilon disaster recovery
When planning your initial directory layout, consider multi-tenancy and disaster recovery.
During a failover event, NFS clients require their exported paths to remain the same to
enable accessing the data. The mount entry for any NFS connection must have a consistent
mount point so that during failover, you don’t have to manually edit the file system table
(fstab) or automount entries on all connected clients. For more information, see the SyncIQ
Performance, Tuning and Best Practices guide, and the EMC Community Network.



OneFS combines and presents all of the nodes in a cluster as one single global namespace
by providing the default file share /ifs. We recommend that you do not save data to the root
/ifs file path, because you could mistakenly copy files into, or delete important files from, the
/ifs/.ifsvar directory, which is used by the OneFS operating system. Instead, create a logical
collection of directories below /ifs to use as your data directories.
The design of your data storage structure should be planned carefully. A well-designed
directory structure optimizes cluster performance and cluster administration.

Considerations: File System Layout


It is important to set the permissions of the default directories once and then leave them
unchanged. Changing them, especially after user data is on the cluster, could alter permissions
cluster-wide and result in unintended downtime. For example, you could accidentally restrict
access to all data on the cluster, you could restrict administrator access, or, by changing
permissions on the .ifsvar directory, you could even restrict the operating system's ability to
function.
Paths should be consistent if you are using two Isilon clusters for DR. This allows
you to use SyncIQ for data replication to the other Isilon cluster and then, in the event that
you need to fail over to that other Isilon cluster, the path to the users’ data remains
unchanged. The image shows both source and destination clusters with identical paths.

Lesson 3: File Tiering


Upon completion of this lesson, you should be able to describe SmartPools functionality,
explain and configure tiers, node pools, global settings, differentiate unlicensed and licensed
SmartPools capabilities, and clarify results of disk pools.

Overview: SmartPools


SmartPools is a software module that enables administrators to define and control file
management policies within a OneFS cluster. SmartPools uses storage pools, which allow for
the grouping of nodes into storage units that include node pools, CloudPools, and tiers.
Without an active SmartPools license, OneFS applies a default file pool policy to organize all
data into a single file pool. With this basic policy, OneFS distributes data across the entire
cluster so that data is protected and readily accessible. We will discuss the additional
functions that become available when activating a SmartPools license.
All current Isilon storage node types (including Gen 4 and Gen 5 S-Series, X-Series,
NL-Series, and HD-Series, and Gen 6 A-series, H-series, and F-series) can co-exist within a single
file system, with a single point of management. Shown here is a cluster with F800 nodes
optimized for random access, S210 nodes optimized for concurrent access, and NL410
nodes optimized for streaming access. With an active SmartPools license, administrators can
specify exactly which files they want to save onto a particular pool or tier. Node pool
membership changes when nodes are added to or removed from the cluster. Tiers are
formed when combining different node pools.
We’ll get into the details of storage pools and file pool policies shortly, but as this example
shows, with an active SmartPools license, you can create tiers of storage to form a storage
pool and then apply file pool policies to that storage pool.
Let’s see how this looks. File pool policies, including a default file pool policy, are used to
create automated policies to manage file placement, requested protection settings, and I/O
optimization settings. File pool policies enable you to filter files and directories and store
them on specific node pools or tiers according to criteria that you specify, such as the value
of file properties. So, we can change the storage pool tier, change the optimization, and
change the protection level if the file or directory no longer requires greater protection. We
can trigger the changes at any time and on any directory or file.
SmartPools is also used to manage global settings for the cluster, such as L3 cache
enablement status, global namespace acceleration (GNA) enablement, virtual hot spare
(VHS) management, global spillover settings, and more. These will be discussed later.
File pool policies are used to determine where data is placed, how it is protected, and which
other policy settings are applied, based on the user-defined and default file pool policies. The
policies are applied in order through the SmartPools job.
File pool policies are user-created policies used to change the storage pool location,
requested protection settings, and I/O optimization settings. File pool policies add the
capability to modify the settings at any time, for any file or directory. File pool policies
automate file management with user-created policies. Files and directories are selected
using filters, and actions are applied to the files that match the filter settings. The management
is file-based and not hardware-based. Each file is managed independently of the hardware and is
controlled through the OneFS operating system.
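
Ordered, filter-based policy selection can be sketched in a few lines. This is a simplified model and not the SmartPools job itself: the class names, the fields, and the rule that the first matching policy wins outright are assumptions made for illustration. The policy names and target tiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int


@dataclass(frozen=True)
class FilePoolPolicy:
    name: str
    matches: Callable[[FileInfo], bool]  # the filter
    target: str                          # node pool or tier name


DEFAULT_POLICY = FilePoolPolicy("default", lambda f: True, "anywhere")


def select_policy(file: FileInfo, policies: list[FilePoolPolicy]) -> FilePoolPolicy:
    """Return the first policy whose filter matches, else the default policy."""
    for policy in policies:
        if policy.matches(file):
            return policy
    return DEFAULT_POLICY


# Hypothetical policies, evaluated in list order.
policies = [
    FilePoolPolicy("media-to-performance", lambda f: f.path.endswith(".mov"), "tier1"),
    FilePoolPolicy("cold-to-archive", lambda f: f.days_since_access > 60, "tier3"),
]
old_video = FileInfo("/ifs/data/media/clip.mov", 5_000_000, 200)
print(select_policy(old_video, policies).name)
```

The old video matches both filters, so the order of the list decides where it lands. This is why policy order deserves attention in a design.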

Qualifying Questions


What do you do with the customer’s answers? Ask more questions and qualify what they are
saying. Make sure you understand what they are attempting to do and what they are saying.
The primary goal of questioning is to identify where Isilon does not fit.
Key areas that tend to be problematic:
• Customer is leveraging stubbing.
• Customer is archiving, and expects SmartPools to move data to a different directory.
• Customer expects SmartPools to manage all of ILM, including data destruction.
• Certain compliance requirements.
• Integration with some older ILM tools.

Storage Pools


Let’s explore the building blocks of a storage pool. This will help you understand the underlying
structure when moving data between tiers. Shown is a cluster consisting of Gen 6 F800
nodes, Gen 5 X-Series nodes, and Gen 5 NL-Series nodes. The first storage pool component we’ll
discuss is the disk pool, the smallest unit. Similar or identical node drive types are
automatically provisioned into disk pools, with each disk pool representing a separate failure
domain. Disk pools can span from 3 up to 39 nodes in a node pool for Gen 4 and 5 nodes.
Neighborhoods are groups of disk pools and can span from 4 up to 19 nodes in a node pool
for Gen 6 nodes. Data protection stripes or mirrors cannot span disk pools, making them the
granularity level at which files are striped to the cluster. Disk pool configuration is automatic
and cannot be configured manually.
A node pool is used to describe a group of similar or identical nodes. There can be up to 144
nodes in a single node pool. All the nodes with identical hardware characteristics are
automatically grouped in one node pool. A node pool is the lowest granularity of storage
space that users manage.
Multiple node pools with similar performance characteristics can be grouped together into a
single tier with the licensed version of SmartPools. Multiple tiers can be included in a cluster
to meet the business requirements and optimize storage usage. This example shows a
performance tier, tier1, a throughput tier, tier2, and an archive tier, tier3.
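
The relationship between nodes, node pools, and tiers can be pictured as two grouping steps. The sketch below is a data-structure illustration only. It assumes that "identical hardware" can be reduced to a model string, and the node numbers and the model-to-tier mapping are invented. The model names and tier names are the ones used in this lesson.

```python
from collections import defaultdict


def build_node_pools(nodes: dict[int, str]) -> dict[str, list[int]]:
    """Group node numbers by hardware model; each group stands in for a node pool."""
    pools: dict[str, list[int]] = defaultdict(list)
    for node_id, model in sorted(nodes.items()):
        pools[model].append(node_id)
    return dict(pools)


def build_tiers(
    pools: dict[str, list[int]], tier_of: dict[str, str]
) -> dict[str, list[str]]:
    """Group node pools into tiers using an administrator-supplied mapping."""
    tiers: dict[str, list[str]] = defaultdict(list)
    for pool_name in pools:
        tiers[tier_of[pool_name]].append(pool_name)
    return dict(tiers)


# Hypothetical cluster: node number -> hardware model.
nodes = {1: "F800", 2: "F800", 3: "F800", 4: "F800",
         5: "S210", 6: "S210", 7: "S210",
         8: "NL410", 9: "NL410", 10: "NL410"}
tier_of = {"F800": "tier1", "S210": "tier2", "NL410": "tier3"}  # assumed mapping

pools = build_node_pools(nodes)
print(build_tiers(pools, tier_of))
```

The first step is automatic on a real cluster, while the second step reflects a choice the administrator makes with a SmartPools license.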
Now that you understand the components of a storage pool, we can answer the question,
what are storage pools? Storage Pools are an abstraction layer that encompasses disk pools,
neighborhoods, and node pools, as well as tiers. Storage pools also monitor the health and
status at the node pool level. Using storage pools, multiple tiers of Isilon storage nodes
(including Gen 4 and Gen 5 S-Series, X-Series, NL-Series, HD-Series, and Gen 6 A-series, H-
series and F-series) can all co-exist within a single file system, with a single point of
management. By licensing and configuring SmartPools, administrators can specify exactly
which files they want to reside on node pools and tiers.
Whereas storage pools define a subset of the cluster’s hardware, file pools are SmartPools’
logical layer, at which file pool policies are applied. File pool policies provide a single point of
management to meet performance, requested protection level, space, cost, and other
requirements. User created and defined policies are set on the file pools.
We can also discuss CloudPools as another tier. CloudPools is a licensed module that
enables data to be stored in a cold/frozen data tier, thereby taking advantage of lower-cost,
off-premise storage. CloudPools optimizes and protects the transfer of data to the cloud by
using both encryption and compression.



CloudPools offers the flexibility of another tier of storage that is off-premise and off cluster.
CloudPools will optimize and protect the transfer of data to cloud with the use of both
encryption and compression. Essentially what CloudPools does is provide lower TCO for
archival-type data by optimizing primary storage with intelligent data placement. CloudPools
integrates seamlessly with the cloud. It eliminates management complexity and allows a
flexible choice of cloud providers. Customers who want to run their own internal clouds can
use an Isilon installation as the core of their cloud.
CloudPools Concept:
Shown here is an Isilon cluster with tiering between the nodes. When files are moved to a
cloud pool tier, a stub file remains on the cluster (sometimes called a “SmartLink” file). The
stub files are pointers (contain metadata) to the data moved to the cloud, and any cached
data changes not yet written out to the cloud. Stub files have the details for connecting to
the appropriate cloud resource for its file. Also, when enabling encryption, the encryption
keys become a part of the stub file, further securing cloud data from direct access. Data can
also be pulled from the cloud back to the enterprise.
As an example, frequently accessed general purpose file data such as media, documents,
presentations, and so on may reside primarily on the X-Series tier. This data has a policy that
moves files that have not been accessed for more than 60 days to the NL-Series tier. We can
then have a CloudPools policy that moves files that have not been accessed for more than
nine months to the cloud. A user accessing a file that resides on the cloud tier could see
slower performance, as this depends on the choice of cloud and the actual location of the data.
Data that is moved to the cloud is also protected against anyone connecting directly to the
cloud. Files are stored in 1 MB chunks called Cloud Data Objects that appear unreadable to
direct connections. Metadata stored on the Isilon cluster is required to read these files,
adding an extra layer of protection to cloud storage.
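The age-based tiering in this example can be sketched in a few lines of Python. This is only an illustration of the policy logic, not how OneFS implements it: the function name, the tier labels, and the approximation of nine months as 270 days are assumptions made for the sketch.

```python
from datetime import datetime, timedelta

# Thresholds from the example; 270 days is an assumed stand-in for "nine months".
NL_AFTER = timedelta(days=60)
CLOUD_AFTER = timedelta(days=270)

def target_tier(last_access, now):
    """Return the tier a file would be targeted to, based on time since last access."""
    idle = now - last_access
    if idle > CLOUD_AFTER:
        return "cloud"       # CloudPools policy: a stub file stays on the cluster
    if idle > NL_AFTER:
        return "NL-Series"   # SmartPools policy: move to the archive tier
    return "X-Series"        # recently accessed data stays where it is

now = datetime(2017, 6, 1)
for days in (10, 90, 300):
    print(days, target_tier(now - timedelta(days=days), now))
```

Note that a file idle for exactly 60 days stays on the X-Series tier in this sketch, because the example says "more than 60 days".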
Client and application access to data is transparent, so clients simply continue opening
files, with somewhat longer latency for files in the cloud. NDMP backups and SyncIQ policies
continue as if the data were still in place, and can save time by backing up just the stub
files or can copy full files as necessary. Additional details on this functionality follow in
the SyncIQ section of the training.
At a high level, there are two elements we can integrate to expand the data lake beyond the
data center. First is the ability to consolidate and replicate remote location data in a remote
office/branch office (ROBO) type solution. Remote locations are referred to as the “edge” of
the enterprise’s data center. Second is the use of a public or private cloud to tier data out of the
“core” platforms.
Isilon SD Edge is the edge component and CloudPools is the cloud mechanism. Though this
module covers an overview, both concepts will be discussed in more detail within the course.
We’ll revisit our drawing of the tiered cluster; here it is the core of the data center. Our
branch office is employing commodity servers with VMware ESXi and SD Edge running on
them. This is a software defined solution. As many as 68 percent of enterprises have over
10 TB of data at each branch location. Data moves from the edge locations to the core.
CloudPools allows data to expand beyond the core and into the cloud. Cloud vendors such as
Amazon Web Services and Microsoft Azure are supported as well as EMC Elastic Cloud
Storage and even Isilon storage. The overall concept of CloudPools is to move old and
inactive data to more cost efficient storage, taking advantage of massively scalable storage
and reducing the enterprise’s OPEX and CAPEX. In doing so, we expand the data lake to the
enterprise’s edge and to the cloud.
SD Edge Overview:
Let’s take a look at these features, starting with Isilon SD Edge. This is a software defined
scale-out NAS running OneFS and leveraging the OneFS protocols and access methods, and
enterprise grade features. For our design, we are especially interested in using SyncIQ to
consolidate data to the core. Replicating the data may eliminate the need for backups at the
edge sites. SD Edge and SyncIQ are covered in more detail later in this course. The table
compares SD Edge with Isilon. The notable differences are that SD Edge scales to 36 TB and
that a cluster can have from three to six nodes.
SD Edge addresses the common challenges that customers face when trying to manage
remote offices. Most notably, the solution is installed in a virtual environment on commodity
hardware, eliminates disparate islands of storage, adds data protection, and simplifies
management. In the module’s scenario, SD Edge can help consolidate data under the “core”
data center. It’s simple, agile and cost efficient, ideal for remote locations with limited IT
resources. It can be managed with standard VMware tools, removing much of the
management complexity.

The IsilonSD Edge Foundation Edition is a free download for non-production use and has
EMC Community only support.


SmartPools Licensing


SmartPools is a software module enabling administrators to define and control file
management policies within a OneFS cluster. Referring to the chart, with unlicensed
SmartPools, we have a one-tier policy of “anywhere” with all node pools tied to that storage
pool target through the default file pool policy. This means that there is one file pool policy
that applies the same protection level (or, by default, defers the requested protection level
to the node pool setting) and the same I/O optimization settings to all files and folders in the
cluster. After
purchasing and activating a SmartPools license, the administrator can have multiple storage
pools containing node pools or tiers with different performance characteristics on the same
cluster. Data can be managed at a granular level through the use of SmartPools file pool
policies. Because of the ability to have multiple data target locations, some additional target
options are enabled in some global settings. These advanced features include the ability to
create multiple storage tiers, multiple file pool policy targets, and multiple file pool policies,
each with its own protection, I/O optimization, SSD metadata acceleration, and node pool
spillover settings.

SmartPools Considerations


Each node pool must contain at least three nodes. If you have fewer than three nodes, the
node pool is under-provisioned. File pool policies are a set of conditions that move data to
specific targets, either a specific pool or a specific tier. By default, all files in the cluster are
written anywhere on the cluster as defined in the default file pool policy. You cannot target
additional node pools or tiers unless SmartPools is licensed on your cluster. If you license
SmartPools and then create multiple node pools and tiers, you can create multiple file pool
policies. The file pool policies are listed and applied in the order of that list. Only one file
pool policy can apply to a file, so after a matching policy is found, no other policy is
evaluated for that file. The default file pool policy is always last in the ordered list of enabled
file pool policies. File policy filters help to automate and simplify high file volume
management. To simplify the creation of file pool policies, customizable templates are
provided.
Data spillover occurs when a node pool is full. The feature redirects writes to another pool.
Spillover is enabled by default and can be disabled if SmartPools is licensed. Disabling it
ensures that a file exists in only one pool.
SmartPools automatically divides equivalent node hardware into disk pools. The disk pools
are protected against up to two drive failures, depending on the protection setting. This
method of subdividing a node’s disks into separately protected disk pools increases
resiliency to multiple disk failures.

SmartPools Job


File pool policies are applied to the cluster by the SmartPools job. By default, this job runs at
10:00 p.m. (22:00 hours) every day at a low priority and is manageable through the web
administration interface or through the CLI. The SmartPools job enforces the file pool
policies, which are checked in order from the top to the bottom of the list of file pool policies.
The order of the file pool policies is important because the first policy matched defines the
action on the file. Once a match is found, no other policies are checked, and the default
policy settings supply any attributes that the matching policy leaves unspecified.
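The first-match behavior described here can be sketched as follows. This is a minimal illustration, not the SmartPools job itself: the Policy class, the policy names, and the setting values are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Policy:
    name: str
    matches: Callable[[dict], bool]                # filter applied to file attributes
    settings: dict = field(default_factory=dict)   # only the attributes this policy sets

# Hypothetical default policy: it matches every file and is always evaluated last.
DEFAULT = Policy("default", lambda f: True,
                 {"target": "anywhere", "protection": "pool default", "io": "concurrency"})

def apply_policies(file_attrs, ordered_policies):
    """First matching policy wins; the default policy fills in unspecified settings."""
    for policy in list(ordered_policies) + [DEFAULT]:
        if policy.matches(file_attrs):
            return {**DEFAULT.settings, **policy.settings, "policy": policy.name}
    raise AssertionError("unreachable: the default policy matches every file")

policies = [
    Policy("jpg-to-tier2", lambda f: f["name"].endswith(".jpg"), {"target": "tier2"}),
    Policy("large-to-tier3", lambda f: f["size_mb"] > 2, {"target": "tier3"}),
]
# A 5 MB JPG matches both filters, but only the first policy in the list is applied.
print(apply_policies({"name": "photo.jpg", "size_mb": 5}, policies))
```

Swapping the two policies in the list would send the same file to tier3 instead, which is why the list order deserves attention during design.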

File Pool Policies


File pools are the SmartPools logical layer. File pool policies are user-created policies used to
automatically change the storage pool location, requested protection settings, and I/O
optimization settings at any time, for any file or directory. Files and directories are selected
using filters, and actions are applied to the files matching the filter settings. The management
is file-based and not hardware-based. Each file is managed independently of the hardware and is
controlled through the OneFS operating system.
Shown here is an example. The first file pool policy that matches a file is applied. Here that
would be moving the JPG files. No other policies are processed after the first match. File pool
policies are processed in a top-to-bottom order with the default file pool policy last. The
order the policies are listed can determine the actions applied to the file. File pool policies
should be as specific as possible to accomplish the desired results without creating so many
policies that they become unmanageable. They should be placed in an order that delivers the
desired results.
If a setting has not been specified, the setting in the default file pool policy is applied. For
example, if I/O optimization or the requested protection level settings are not specified in
the file pool policy, the I/O optimization or requested protection level settings are
determined from the default file pool policy.
To change the order of the file pool policy, simply click the up or down arrow to move the
policy in the list order.
Policy Order Matters:

What is displayed in the web administration interface varies slightly between OneFS versions.
Each policy’s settings are available for viewing regardless of the OneFS version. Displayed is
the policy created to move files over 2MB to a specific node pool. The interface displays the
filters and attribute settings for the policy in the Filter and Operations columns. You can
review the policy to assist in identifying when the policy should be used and what settings
should be chosen. The listed order of policies is the order in which policies are processed.
You can move policies up or down in the list to meet the desired file setting behavior.
Complex File Pool Policy Filters:
Complex file pool policies are a way of combining multiple file pool policies that have the
same associated actions. The 100 file pool policy limit can often be a challenge in managing
desired behavior on the cluster. Each filter portion of the policy can be fine-tuned using AND
conditions so that only the specific matches invoke the policy. If the policy filter is granular
and that is the desired behavior, you can place the file pool policy at the top of the file pool
policy order so that it is checked before less specific policies.
The use of the OR condition allows multiple filters to be combined in the same policy. Each
filter is independent of the other, and both filters are evaluated against the file. The OR
condition helps to minimize the number of file pool policies.
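One way to picture the AND and OR conditions is as groups of criteria: every criterion inside a group must be true (AND), and the policy matches if any group matches (OR). The sketch below assumes that model; the function names, the paths, and the criteria are hypothetical and only illustrate the logic.

```python
def block_matches(file_attrs, block):
    """A criteria block matches only if every criterion in it is true (AND)."""
    return all(criterion(file_attrs) for criterion in block)

def policy_matches(file_attrs, blocks):
    """A policy matches if any one of its criteria blocks matches (OR)."""
    return any(block_matches(file_attrs, block) for block in blocks)

# Hypothetical policy:
# (path under /ifs/data/media AND larger than 2 MB) OR (name ends in .jpg)
blocks = [
    [lambda f: f["path"].startswith("/ifs/data/media/"), lambda f: f["size_mb"] > 2],
    [lambda f: f["path"].endswith(".jpg")],
]

print(policy_matches({"path": "/ifs/data/media/clip.mov", "size_mb": 40}, blocks))
print(policy_matches({"path": "/ifs/home/user/cat.jpg", "size_mb": 1}, blocks))
print(policy_matches({"path": "/ifs/data/media/note.txt", "size_mb": 1}, blocks))
```

Expressed as separate policies, this would take two entries against the policy limit; combined with OR it takes one.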
File Pool Policy Filters:
File pool policy filters are the file matching criteria used by the policy. The File pool policy
filters can be created in the Configure File Matching Criteria dialog box. At least one criterion
is required, but multiple criteria are allowed. Click on the “Path” field to view criteria options.
You can add AND or OR statements to a list of criteria to narrow the policy based on
customer requirements. You can configure up to three criteria blocks per file pool policy.

Storage Pool Features

Virtual Hot Spare:
VHS allocation enables you to allocate space to be used for data rebuild in the event of a
drive failure. This feature is available with both the licensed and unlicensed SmartPools
module. By default, all available free space on a node pool is used to rebuild data. The virtual
hot spare option reserves free space for this purpose. VHS provides a mechanism to assure
there is always space available and to protect data integrity in the event of overuse of cluster
space. Another benefit to VHS is it can provide a buffer for support to repair nodes and node
pools that are overfilled. You can uncheck the Deny data writes to reserved disk space
setting and use the space for support activities.
For example, if you use the virtual hot spare (VHS) option to specify two virtual drives or 3
percent, each node pool reserves virtual drive space that is equivalent to two drives or 3
percent of its total capacity for virtual hot spare, whichever is larger. You can reserve
space in node pools across the cluster for this purpose, equivalent to a maximum of four full
drives. If you select the option to reduce the amount of available space, free-space
calculations exclude the space reserved for the virtual hot spare. The reserved virtual hot
spare free space is used for write operations unless you select the option to deny new data
writes. VHS is calculated and applied per node pool across the cluster.
VHS reserved space allocation is defined using these options:
• A minimum number of virtual drives in each node pool (1-4)
• A minimum percentage of total disk space in each node pool (0-20 percent)
• A combination of minimum virtual drives and total disk space. The larger number of
the two settings determines the space allocation, not the sum of the numbers. If you
configure both settings, the enforced minimum value satisfies both requirements.
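The "larger of the two" rule can be worked through with a short calculation. The function name, the pool sizes, and the 4 TB drive size below are assumptions chosen only to show the arithmetic.

```python
def vhs_reserved_tb(pool_capacity_tb, drive_size_tb, min_drives=0, min_percent=0.0):
    """Space reserved for VHS in one node pool: the larger of the two minimums."""
    by_drives = min_drives * drive_size_tb
    by_percent = pool_capacity_tb * min_percent / 100
    return max(by_drives, by_percent)

# Setting of "two virtual drives or 3 percent", with hypothetical 4 TB drives.
print(vhs_reserved_tb(400, 4, min_drives=2, min_percent=3))  # 3 percent is larger: 12.0
print(vhs_reserved_tb(100, 4, min_drives=2, min_percent=3))  # two drives are larger: 8
```

Because VHS is calculated per node pool, the same setting reserves different amounts in pools of different sizes.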
It is recommended that you use the default settings: enabling VHS, ignoring reserved space
for free space calculations, and denying writes to reserved space. The recommended space
allocation setting varies by customer. A safe setting would be At least 2 virtual drive(s).
As a support note, if the Ignore reserved space and Deny data writes options are enabled, it
is possible for the reported file system use percentage to be over 100%.
Global Spillover:
The Enable global spillover and Spillover Data Target options configure how OneFS handles a
write operation when a node pool is full. With the licensed SmartPools module, a customer
can direct data to spill over to a specific node pool or tier of their choosing. If spillover
is not desired, then you can disable spillover so that a file will not move to another node
pool.
Virtual hot spare reservations can affect when spillover would occur. For example, if the
virtual hot spare reservation is 10 percent of storage pool capacity, spillover occurs if the
storage pool is 90 percent full.
Global spillover is enabled by default.
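The interaction between the VHS reservation and spillover can be sketched as a simple threshold check. The function names and return labels are hypothetical, and the "write denied" outcome when spillover is disabled is an assumption of this sketch rather than a statement of exact OneFS behavior.

```python
def spillover_threshold_percent(vhs_reserved_percent):
    """Pool fullness at which writes can no longer land in the target pool."""
    return 100 - vhs_reserved_percent

def write_target(pool_used_percent, vhs_reserved_percent, spillover_enabled=True):
    """Where a new write lands for a pool at the given fullness."""
    if pool_used_percent < spillover_threshold_percent(vhs_reserved_percent):
        return "target pool"
    # Assumption: with spillover disabled, the write is refused instead of redirected.
    return "spillover pool" if spillover_enabled else "write denied"

print(spillover_threshold_percent(10))   # 90, matching the example
print(write_target(89, 10))
print(write_target(90, 10))
print(write_target(90, 10, spillover_enabled=False))
```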
Global Namespace Acceleration:
The purpose of GNA is to accelerate the performance of metadata-intensive applications and
workloads such as home directories, workflows with heavy enumeration, and activities
requiring a large number of comparisons. Examples of metadata-read-heavy workflows exist
across the majority of Isilon's established and emerging markets. In some, like EDA, such
workloads dominate and the use of SSDs to provide the performance they require is
GNA enables SSDs to be used for cluster-wide metadata acceleration, using SSDs in one
part of the cluster to store metadata for nodes that have no SSDs. For example, if you have
ten S-Series nodes with SSD drives and three NL nodes that do not have SSD drives, you can
accelerate the metadata for the data residing on the NL nodes by using GNA to store
metadata on the SSD drives that sit inside of the S-Series nodes. The result is that critical SSD
resources are maximized to improve performance across a wide range of workflows. Global
namespace acceleration can be enabled if 20% or more of the nodes in the cluster contain
SSDs and 1.5% or more of the total cluster storage is SSD-based. The recommendation is
that at least 2% of the total cluster storage is SSD-based before enabling global namespace
acceleration. If you go below the 1.5% SSD total cluster space capacity requirement, GNA is
automatically disabled and all GNA metadata is disabled. If you SmartFail a node containing
SSDs, the SSD total size percentage or node percentage containing SSDs could drop below
the minimum requirement and GNA would be disabled.
GNA is less relevant in the latest generation of nodes, because they all contain SSDs that can
be used for L3 cache. Thus there is no need to make up for an absence of SSDs in any given
class of node. Very performance-sensitive customers should be encouraged to make use of
the latest hardware to meet their needs.

Considerations: File Tiering


You should plan to add more node space when the cluster reaches 80% so that it does not
reach 90%. The extra space is needed for moving data around the cluster, as well as for the
VHS space to rewrite data when a drive fails.
If you go below the 1.5% SSD total cluster space capacity requirement, GNA is automatically
disabled and all GNA metadata is disabled. If you SmartFail a node containing SSDs, the SSD
total size percentage or node percentage containing SSDs could drop below the minimum
requirement and GNA would be disabled. If you add high-capacity nodes such as HD or NL,
make sure you also add nodes with SSDs so that the requirements of 20% of nodes with SSDs
and 1.5% of cluster capacity on SSD are still met. If the ratio goes below 1.5%, data on the
SSDs is forcefully evacuated and GNA is disabled without warning.
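A quick check of the two GNA ratios before adding capacity nodes can prevent this surprise. The sketch below uses the 20% and 1.5% thresholds from the text; the function name and the per-node capacities are hypothetical figures chosen for illustration.

```python
def gna_eligible(nodes, min_node_ratio=0.20, min_ssd_ratio=0.015):
    """nodes is a list of (total_tb, ssd_tb) tuples, one per node."""
    nodes_with_ssd = sum(1 for _, ssd_tb in nodes if ssd_tb > 0)
    total_tb = sum(total for total, _ in nodes)
    ssd_tb = sum(ssd for _, ssd in nodes)
    return (nodes_with_ssd / len(nodes) >= min_node_ratio
            and ssd_tb / total_tb >= min_ssd_ratio)

# Hypothetical cluster: ten S-Series nodes (20 TB each, 1.6 TB of it SSD)
# and three NL nodes (100 TB each, no SSD).
cluster = [(20, 1.6)] * 10 + [(100, 0)] * 3
print(gna_eligible(cluster))                     # SSD share is 16/500 = 3.2%

# Adding eight more NL nodes drops the SSD share to 16/1300, about 1.2%.
print(gna_eligible(cluster + [(100, 0)] * 8))
```

In the second case the node ratio is still satisfied (10 of 21 nodes have SSDs), so it is the capacity ratio alone that breaks the requirement.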
File pool policies should be carefully planned and specific enough to keep data from
matching more than one rule. If data matches more than one rule, it follows the first rule
that it matches, and this could cause data to be written to the wrong node pool or tier.
To help create file pool policies, OneFS also provides customizable template policies that can
be used to archive older files, increase the protection level for specified files, send files that
are saved to a particular path to a higher-performance disk pool, and change the access
setting for VMware files. To use a template, click View / Use Template.

There should always be at least 10% free capacity on the cluster. To check capacity you can
use the isi storagepool list command. If file pool policies are not being applied correctly,
check the file pool policy order - order does matter. You can also test the file pool policy
before applying it. This is a great way to keep from accidentally doing something wrong.

Lesson 4: Quotas


Upon completion of this lesson, you should be able to differentiate quota types, examine
thin provisioning and quota nesting, and establish best practices for quotas.

Overview: Quotas


SmartQuotas is a software module used to limit, monitor, thin provision, and report disk
storage usage at the user, group, and directory levels. Administrators commonly use file
system quotas as a method of tracking and limiting the amount of storage that a user, group,
or a project is allowed to consume. SmartQuotas can send automated notifications when
storage limits are exceeded or approached.
Quotas are a useful way to ensure that a user or department uses only their share of the
available space. SmartQuotas are also useful in enforcing an internal chargeback system.
SmartQuotas contain flexible reporting options that can help administrators analyze data
usage statistics for their Isilon cluster. Both enforcement and accounting quotas are
supported, and a variety of notification methods are available.
SmartQuotas allows for thin provisioning, also known as over-provisioning, which allows
administrators to assign quotas above the actual cluster size. With thin provisioning, the
cluster can be full even while some users or directories are well under their quota limit.
Administrators can configure notifications to send alerts when the provisioned storage
approaches actual storage maximums, enabling additional storage to be purchased as
needed.


You can choose to implement accounting quotas or enforcement quotas. Accounting quotas
monitor, but do not limit, disk storage. They are useful for auditing, planning, and billing
purposes. The results can be viewed in a report. SmartQuotas accounting quotas can be
used to:
• Track the amount of disk space that various users or groups use
• Review and analyze reports that can help identify storage usage patterns
• Intelligently plan for capacity expansions and future storage requirements
Enforcement quotas include all of the functionality of accounting quotas, but they also
enable the sending of notifications and the limiting of disk storage. Using enforcement
quotas, a customer can logically partition a cluster to control or restrict how much storage a
user, group, or directory can use. Enforcement quotas support three subtypes and are
based on administrator-defined thresholds:
• Hard quotas limit disk usage to a specified amount. Writes are denied after the
quota threshold is reached and are only allowed again if the usage falls below the
threshold.
• Soft quotas enable an administrator to configure a grace period that starts after the
threshold is exceeded. After the grace period expires, the boundary becomes hard,
and additional writes are denied. If the usage drops below the threshold, writes are
again allowed.
• Advisory quotas do not deny writes to the disk, but they can trigger alerts and
notifications after the threshold is reached.
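The difference between the three enforcement subtypes can be sketched as a single decision function. This is a simplified model for discussion, not the SmartQuotas implementation: the names, the units, and the way the grace period is tracked are assumptions of the sketch.

```python
from enum import Enum

class QuotaType(Enum):
    HARD = "hard"
    SOFT = "soft"
    ADVISORY = "advisory"

def write_allowed(kind, used_gb, threshold_gb, seconds_over=0, grace_seconds=0):
    """Decide whether a new write is accepted under one enforcement quota."""
    if kind is QuotaType.ADVISORY:
        return True                      # never denies writes; only alerts
    if kind is QuotaType.SOFT:
        over = used_gb > threshold_gb
        # Writes continue during the grace period and are denied once it expires.
        return not (over and seconds_over > grace_seconds)
    return used_gb < threshold_gb        # HARD: denied once the threshold is reached

week = 7 * 24 * 3600
print(write_allowed(QuotaType.HARD, 10, 10))
print(write_allowed(QuotaType.SOFT, 11, 10, seconds_over=3600, grace_seconds=week))
print(write_allowed(QuotaType.SOFT, 11, 10, seconds_over=week + 1, grace_seconds=week))
print(write_allowed(QuotaType.ADVISORY, 20, 10))
```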

Quota Types


There are five types of quotas that can be configured: directory, user, default user,
group, and default group.
• Directory quotas are placed on a directory, and apply to all directories and files
within that directory, regardless of user or group. Directory quotas are useful for
shared folders where a number of users store data, and the concern is that the
directory will grow unchecked because no single person is responsible for it.
• User quotas are applied to individual users, and track all data that is written to a
specific directory. User quotas enable the administrator to control how much data
any individual user stores in a particular directory.
• Default user quotas are applied to all users, unless a user has an explicitly defined
quota for that directory. Default user quotas enable the administrator to apply a
quota to all users, instead of individual user quotas.
• Group quotas are applied to groups and limit the amount of data that the collective
users within a group can write to a directory. Group quotas function in the same way
as user quotas, except that they apply to a group of people instead of individual users.
• Default group quotas are applied to all groups, unless a group has an explicitly
defined quota for that directory. Default group quotas operate like default user
quotas, except on a group basis.

Overhead Calculation


Most quota configurations do not need to include overhead calculations. If you configure
overhead settings, do so carefully, because they can significantly affect the amount of disk
space that is available to users.
If you include data-protection overhead in a quota usage calculation, disk-usage calculations
for the quota subtract any space that is required to accommodate the data-protection
settings for that data. The options are:
• Default: The default setting is to only track user data, which is just the data that is
written by the user. It does not include any data that the user did not directly store
on the cluster.
• Snapshot Data: This option tracks both the user data and any associated snapshots.
This setting cannot be changed after a quota is defined. To disable snapshot tracking,
the quota must be deleted and recreated.
• Data Protection Overhead: This option tracks both the user data and any
associated FEC or mirroring overhead. This option can be changed after the quota is
defined.
• Snapshot Data and Data Protection Overhead: This option tracks all data (user data,
snapshots, and overhead) with the same restrictions.
For example, consider a user who is restricted by a 40 gigabyte (GB) quota that
includes data-protection overhead in its disk-usage calculations. If the cluster is
configured with a 2x data-protection level and the user writes a 10 GB file to the
cluster, that file actually consumes 20 GB of space: 10 GB for the file and 10 GB for
the data-protection overhead. In this example, the user has reached 50% of the 40
GB quota by writing a 10 GB file to the cluster.
Quotas can also be configured to include the space that is consumed by snapshots.
A single path can have two quotas applied to it: one without snapshot usage (default)
and one with snapshot usage. If snapshots are included in the quota, more files are
included in the calculation.
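The 40 GB example can be reproduced with a small calculation. The function name and the idea of expressing the 2x protection level as a multiplier are conveniences of this sketch.

```python
def quota_usage_gb(file_gb, protection_multiplier=1.0, include_overhead=False):
    """Space charged against a quota for one file."""
    return file_gb * protection_multiplier if include_overhead else file_gb

QUOTA_GB = 40

# 10 GB file at 2x protection, with overhead counted against the quota.
with_overhead = quota_usage_gb(10, 2.0, include_overhead=True)
print(with_overhead, 100 * with_overhead / QUOTA_GB)        # 20.0 GB, 50.0 percent

# Same file when only user data is tracked (the default).
without_overhead = quota_usage_gb(10, 2.0)
print(without_overhead, 100 * without_overhead / QUOTA_GB)  # 10 GB, 25.0 percent
```

The same file consumes half of the quota in one configuration and a quarter in the other, which is why overhead settings deserve care.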

Quotas and Thin Provisioning


Thin provisioning is a tool that enables an administrator to define quotas that exceed the
capacity of the cluster. Doing this accomplishes two things:
1. It allows a smaller initial purchase of capacity/nodes, and the ability to simply add
more as needed, promoting a capacity on demand model.
2. It enables the administrator to set larger quotas initially so that continual increases
are not needed.
However, thin provisioning requires that cluster capacity use be monitored carefully. With a
quota that exceeds the cluster capacity, there is nothing to stop users from consuming all
available space, which can result in service outages for all users and services on the cluster.
The rule with multiple quotas is that whichever quota uses the space first, wins. If Quota1
uses all 200 TB, then there is only 50 TB available for Quota2.
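The "first to use the space wins" rule can be made concrete with a short calculation. The 250 TB cluster size and the 200 TB limit on Quota2 are assumptions inferred from the 200 TB and 50 TB figures above; the function name is hypothetical.

```python
def writable_tb(quota_limit_tb, quota_used_tb, cluster_capacity_tb, cluster_used_tb):
    """Space a quota can still consume: capped by its own limit and by free cluster space."""
    quota_left = quota_limit_tb - quota_used_tb
    cluster_left = cluster_capacity_tb - cluster_used_tb
    return max(0, min(quota_left, cluster_left))

CLUSTER_TB = 250                            # assumed cluster capacity
limits = {"Quota1": 200, "Quota2": 200}     # Quota2's limit is assumed

# The quotas add up to more than the cluster holds, so it is thin provisioned.
print(sum(limits.values()) > CLUSTER_TB)    # True

# Quota1 fills up first; Quota2 is left with far less than its 200 TB limit.
used = {"Quota1": 200, "Quota2": 0}
print(writable_tb(limits["Quota2"], used["Quota2"], CLUSTER_TB, sum(used.values())))  # 50
```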

Quotas and SyncIQ


Quotas should be set on the source directory that is allowed to replicate to the target
directory. Do not set quotas specifically on target directories. SyncIQ does not replicate the
quota configuration, merely the data in the directory. In OneFS 8.0, quotas can be permitted to
match 1:1 between the source and target clusters. Multiple quotas are supported within a
source directory or domain structure. During replication SyncIQ ignores quota limits.
However, if a quota is over limit, quotas still prevent users from adding additional data.
SyncIQ will never automatically delete an existing target quota. SyncIQ will prefer to fail the
sync job rather than delete an existing quota. This may occur during an initial sync where the
target directory has an existing quota under it, or if a source directory is deleted that has a
quota on it on the target. The quotas still remain and require administrative removal if
desired.
Finally, one of the most common misconfigurations is setting quotas on a directory before the
migration to the cluster has completed. If you limit the amount of data on a directory during
a migration, it is possible to hit the quota and have the migration fail. Do not set quotas on
the directory during a migration. Wait until the migration is complete or the cutover has
been completed.

Considerations: Before Nesting Quotas


Nesting quotas refers to having multiple quotas within the same directory structure. In the
example shown, all quotas are hard enforced. At the top of the hierarchy, the
/ifs/data/media folder has a directory quota of 1 TB. Any user can write data into this
directory, or the /ifs/data/media/temp directory, up to a combined total of 1 TB.
The /ifs/data/media/photo directory has a user quota assigned that restricts the total
amount any single user can write into this directory to 25 GB. Even though the parent
directory (media) is below its quota restriction, a user is restricted within the photo directory.
The /ifs/data/media/video directory has a directory quota of 800 GB that restricts the capacity
of this directory to 800 GB. However, if users place a large amount of data in the
/ifs/data/media/temp directory, say 500 GB, then only 500 GB of data can be placed in the
video directory, as the parent directory (/media) cannot exceed 1 TB.
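The effect of the parent quota on the video directory can be checked with a small model in which the tightest quota on a path wins. Only directory quotas are modeled, 1 TB is treated as 1000 GB to match the round numbers in the example, and the function names are hypothetical.

```python
def used_under(path, usage_gb):
    """Total data stored in a directory and everything beneath it."""
    return sum(gb for d, gb in usage_gb.items()
               if d == path or d.startswith(path + "/"))

def headroom_gb(path, quotas_gb, usage_gb):
    """Space still writable in path: the tightest quota on it or on any parent wins."""
    remaining = [limit - used_under(q, usage_gb)
                 for q, limit in quotas_gb.items()
                 if path == q or path.startswith(q + "/")]
    return max(0, min(remaining)) if remaining else None

quotas_gb = {"/ifs/data/media": 1000, "/ifs/data/media/video": 800}
usage_gb = {"/ifs/data/media/temp": 500, "/ifs/data/media/video": 0}

# The video quota allows 800 GB, but the parent has only 500 GB left.
print(headroom_gb("/ifs/data/media/video", quotas_gb, usage_gb))  # 500
```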

Considerations: SmartQuota


One of the unusual aspects of OneFS is that it has nested quotas. However, one should be
careful when using nested quotas because nested quotas can also cause performance
overhead as clients get deeper into the directory tree; a directory listing may trigger the
calculation of multiple quotas as a result. Because thin provisioning allows you to
provision more space than you physically have, some customers frown on creating quotas
that will thin provision the cluster. In the event that a customer is thin provisioning the
cluster, careful and efficient monitoring of the capacity of the cluster should be done to
ensure that the cluster does not run out of space, or to ensure ordering of more nodes
happens efficiently enough to mitigate any space issues on the cluster. Another
consideration is whether or not overhead calculations will be added to the quotas. Most
customers do not do this but in the event that the customer is doing charge backs of raw
space and overhead, then, and only then, would overhead calculations be taken into
consideration when designing the quota limits and quota location.
You should not configure any quotas on the root of the file system (/ifs), as it could result in
significant performance degradation. Remember, part of the OneFS operating system lives
under /ifs/.ifsvar. If you put a quota on /ifs, you are effectively limiting not only the clients
but the OS as well. For this reason, Isilon recommends never putting a quota on /ifs.



Prior to OneFS 8.0, user quotas had no options to show the available size of the individual
quota and instead would show the complete cluster size. When a network drive is mapped to
a home directory, the user sees 300 TB, or the full size of the cluster, instead of just
the small quota that was set. In OneFS 8.0, an option has been added to allow the user with
a hard quota to see just the amount of space in the quota. This eliminates the chance of a
user getting angry thinking they have another few hundred TB of space when in fact they
were only allotted 10 GB for their directory.

Lesson 5: Deduplication


Upon completion of this lesson, you should be able to describe deduplication on Isilon,
illustrate proper workflows for SmartDedupe, and generate best practices for deduplication.

Overview: Deduplication


Deduplication maximizes the storage efficiency of a cluster by scanning the on-disk data for
identical blocks and then eliminating any duplicates. This approach is referred to as
post-process deduplication. Deduplication runs as an asynchronous batch job that occurs
transparently to the user. The deduplication job has a few phases: the job first builds an
index of blocks, against which comparisons are done in a later phase, and ultimately
confirmations and copies take place. Shown here is a cluster, the file system,
and two files in the /ifs/data directory called ’Canine’ and ’Feline’. This graphic shows no
deduplication.
Let’s see how the deduplication looks. Obviously, this is a very simple example. The Canine
file is indexed to blocks shown in blue. The Feline file is indexed to blocks shown in green.
The actual deduplication job can be a very time consuming one, but because it happens as a
job, which is throttled by the load on the system, the actual customer experience is fairly
seamless. The job runs through blocks saved in every disk pool and compares the block
hash values. If a match is found and confirmed to be a true copy, the block is moved to the
shadow store, and the file block references are updated in the metadata. File metadata is
not deduplicated. One copy of the duplicate blocks is saved, thus reducing storage
consumption. Storage administrators can designate which directories are to go through
deduplication, so as to manage the cluster’s resources to best advantage. Not all workflows
are right for every cluster.
Because this is a post-process form of deduplication, data has to be written to the system
before it is inspected. This has the benefit that cluster writes happen faster, but the
disadvantage is that the Isilon cluster may have duplicate data written to it before it is picked
up and reorganized to eliminate the duplicates.
Since the time deduplication takes is heavily dependent on the size and usage level of the
cluster, a large and complex environment would benefit from using the dry run procedure
and consulting with support or engineering. Another limitation is that the deduplication does
not occur across the length and breadth of the entire cluster, but only on each disk pool
individually. This means that some opportunities for deduplication may be missed if the
identical blocks are on different disk pools. This also means that data which is moved
between node pools may change the level of deduplication that is available for it.

Architecture: Deduplication


The administrator can use the web interface, REST API or CLI to manage the configuration,
scheduling, and control of the Deduplication job. The job itself is a highly distributed
background process that manages the orchestration of deduplication across all nodes in the
cluster. The job scans, detects and shares matching data blocks by using a 'fingerprint index'
of the scanned data. This index contains a sorted list of digital fingerprints (or hashes) and
their associated blocks. After the index is created, the fingerprints are checked for duplicates.
Shadow stores are file system containers that allow data to be stored in a sharable manner.
As a result, files on OneFS can contain both physical data and pointers, or references, to
shared blocks in a shadow store. Each shadow store can contain up to 256 blocks, with each
block able to be referenced by 32,000 files.
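
As a rough sizing aid, the shadow store limits above can be turned into simple arithmetic.
The following Python sketch is illustrative only: the function names are hypothetical, and
the 8 KB block size is an assumption taken from the fingerprinting description later in this
lesson, not from any OneFS API.

```python
import math

BLOCKS_PER_SHADOW_STORE = 256      # limit quoted in the text above
MAX_REFERENCES_PER_BLOCK = 32_000  # limit quoted in the text above
BLOCK_SIZE_BYTES = 8 * 1024        # assumption: 8 KB data blocks


def shadow_stores_needed(shared_blocks: int) -> int:
    """Minimum number of shadow stores needed to hold the given shared blocks."""
    return math.ceil(shared_blocks / BLOCKS_PER_SHADOW_STORE)


def shadow_store_capacity_bytes() -> int:
    """Physical data held by one full shadow store under the assumptions above."""
    return BLOCKS_PER_SHADOW_STORE * BLOCK_SIZE_BYTES
```

For example, shadow_stores_needed(1000) evaluates to 4, because three full shadow stores
hold only 768 blocks.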

Qualifying Questions


The most important question to answer is: Will deduplication be worth it for the customer in
terms of storage saved, with respect to the load placed on the cluster?
Certain considerations may immediately preclude Isilon deduplication. Because it is a
post-process operation, it will not satisfy any requirement for inline deduplication.
Deduplication by its nature does not deal well with compressed data since the compression
process tends to rearrange data to the point that identical files in separate archives are not
identified as such.
Unique files don’t duplicate each other, so the chances of blocks being found which are
identical are very low. The time and resources required to deduplicate a few blocks would be
unjustified. On the other hand, a home directory scenario in which many users may be
saving copies of the same file can offer excellent opportunities for deduplication.
Deduplication is more justifiable when the files in question are relatively static. Rapid
changes in the file system tend to undo deduplication, so that the net savings achieved at
any one time are low.

If in doubt, or attempting to establish the viability of deduplication, a good and relatively
nonintrusive way of identifying the practicality of deduplication is to perform a dry run. The
dry run skips the sharing phase, which is the slowest phase of deduplication, so it usually
places only minor load on the cluster and returns an estimate much more quickly than a full
deduplication run. This enables a customer to decide whether or not the savings offered by
deduplication are worth the effort and load.
The customer can run an analysis job on their production data that will not actually save any
space, but will report back how much capacity could be saved. This helps some
organizations justify the license purchase. It also helps Dell EMC/Partner sales teams sell the
license, which is not currently available as part of any bundles.
If SmartPools is in the picture and the data is on two tiers, deduplication is constrained
per node pool. In this example, the newer data would be deduped on the F800 node pool,
and the older data would be deduped on the A2000 node pool. This could decrease the
benefit of the license purchase, depending on what data is moved to a separate tier.
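
The effect of the per-node-pool constraint can be illustrated with a small Python sketch. It
is a toy model, not how SmartDedupe is implemented: each stored block is represented by a
hypothetical (node pool, fingerprint) pair, and the two functions compare the blocks that
could be shared within each pool against a hypothetical upper bound that ignores pool
boundaries.

```python
from collections import Counter


def blocks_saved_per_pool(blocks):
    """Blocks that can be shared when matching is limited to one node pool.

    blocks is a list of (node_pool, fingerprint) pairs, one per stored block.
    """
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values())


def blocks_saved_cluster_wide(blocks):
    """Hypothetical upper bound if matching ignored node pool boundaries."""
    counts = Counter(fingerprint for _pool, fingerprint in blocks)
    return sum(n - 1 for n in counts.values())


# Illustrative data: fingerprint "x" exists twice on F800 and once on A2000.
example = [("F800", "x"), ("F800", "x"), ("A2000", "x"), ("A2000", "y")]
```

With the example data, only one block can be shared per pool, while two could be shared if
the copy on the A2000 pool were eligible to match the copies on the F800 pool.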

Deduplication and Rehydration Explained


One of the most fundamental components of OneFS SmartDedupe, and deduplication in
general, is “fingerprinting”. In this part of the deduplication process, unique digital signatures,
or fingerprints, are calculated using the SHA-1 hashing algorithm; one for each 8 KB data
block in the sample. When dedupe runs for the first time, it scans the data set and selectively
samples blocks, creating the fingerprint index. This index contains a sorted list of the digital
fingerprints (or hashes) and their associated blocks. After creating the index, the job checks
the fingerprints for duplicates. When a match is found, during the sharing phase, a
byte-by-byte comparison of the blocks is performed to verify they are identical. SmartDedupe
will not deduplicate files that span SmartPools node pools or tiers, or that have different
protection levels set.
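
The fingerprint, match, and verify sequence described above can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions, not the SmartDedupe
implementation: files are plain in-memory byte strings, every block is indexed rather than
selectively sampled, the index is a dictionary rather than a sorted list, and all function
names are hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 8 * 1024  # 8 KB blocks, as described above


def get_block(data: bytes, number: int) -> bytes:
    """Return block `number` of a file (the last block may be short)."""
    start = number * BLOCK_SIZE
    return data[start:start + BLOCK_SIZE]


def build_index(files):
    """Map each SHA-1 fingerprint to the (file name, block number) pairs using it."""
    index = defaultdict(list)
    for name, data in files.items():
        block_count = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
        for number in range(block_count):
            digest = hashlib.sha1(get_block(data, number)).hexdigest()
            index[digest].append((name, number))
    return index


def find_shareable(files):
    """Return groups of block locations confirmed identical byte by byte."""
    groups = []
    for locations in build_index(files).values():
        if len(locations) < 2:
            continue  # fingerprint is unique, nothing to share
        ref_name, ref_number = locations[0]
        reference = get_block(files[ref_name], ref_number)
        # A matching fingerprint is only a candidate; compare the bytes too.
        verified = [(name, number) for name, number in locations
                    if get_block(files[name], number) == reference]
        if len(verified) > 1:
            groups.append(verified)
    return groups
```

Given two files, Canine and Feline, whose first 8 KB block is identical and whose second
block differs, find_shareable returns a single group containing block 0 of each file.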

Considerations: Deduplication


Only a single instance of deduplication can run at one time. Even though this is a low priority
job, the deduplication job will consume CPU and memory resources and should be run
during non-peak or off-hour times. The job should be run multiple times to see the best
results.
A file can be undeduped but it is critical to note that once a file is marked for undedupe, it
cannot be re-deduplicated. This is because an internal flag is set on the file once it is
undeduped. A customer would have to engage support to delve further into this situation.
Undedupe is a job that runs in the Job Engine and it must be started from the CLI. Prior to
running this job, you should remove the path you are expanding from the deduplication
configuration and then ensure that sufficient cluster capacity exists to hold the full,
undeduplicated directory.

Deduplication is most effective when applied to static or archived files and directories. The
less frequently files are modified, the less negative effect deduplication has on the cluster.
For example, virtual machines often contain several copies of identical files that are rarely
modified. Deduplicating a large number of virtual machines can greatly reduce consumed
storage.

Lesson 6: Snaps


Upon completion of this lesson, you should be able to describe benefits of snapshots,
explain how snapshots work in OneFS, and establish best practices for SnapshotIQ.

Overview: Snapshots


SnapshotIQ can take read-only, point-in-time copies (snapshots) of any directory or
subdirectory within OneFS. When a snapshot is taken, it preserves the exact state of a file
system at that instant, which can then be accessed later. This immutable, point-in-time copy
has a variety of applications. For example, snapshots can be used to make consistent
backups, or to restore files, which were inadvertently changed or deleted. Snapshots are also
used for quickly identifying file system changes.
You can disable or enable SnapshotIQ at any time. You can configure basic functions for the
SnapshotIQ application, including automatically creating or deleting snapshots, and setting
the amount of space that is assigned exclusively to snapshot storage. An Isilon OneFS
snapshot is basically a logical pointer to data that is stored on a cluster at a particular point
in time. Each snapshot references a specific directory under OneFS, and includes all the files
stored in that directory and its subdirectories. If the data referenced by a snapshot is
modified, the snapshot stores a physical copy of the data that was modified.

Qualifying Questions


Determining when and where to snap is critical to a healthy SnapshotIQ deployment. You
must first understand the SLAs with your users and what their expected RPO (recovery point
objective) and RTO (recovery time objective) are. Basically, how far back can you go to
recover (RPO), and how quickly did you promise them you would get the data back (RTO)?
Snapshots are not backups. They should not be used as your primary backup solution as they
will not help you if the cluster fails, the site goes dark, or in the event of a natural disaster.
Backups should be kept in addition and stored offsite to mitigate these risk factors.
Snapshots should, however, complement your backup and allow for more granularity in your
restore points. If you only back up a directory once a month, then maybe weekly snaps would
provide the best RPO for your users. Time must be taken to properly determine the answers
to these questions or you run the risk of an unruly deployment of snaps. The final piece to
consider is WHAT do you want to snap and WHERE in the directory tree do you want to snap?
Never snap the root of /ifs. Determine which directories or data are mission critical and snap
accordingly. Do you need a daily snap of the HR personnel photos directory? Probably not.
Do you need hourly snaps of a research project or drug trial? Maybe; you’ll know after
consulting with your user base.

Benefits of Using Snapshots


You can use snapshots to protect data against accidental deletion and modification. If a user
modifies a file and later determines that the changes were unnecessary or unwanted, the
earlier version of the file can be copied back from the snapshot. Reverting a directory is
called “SnapRevert”, which is covered in more detail later in the module.
Because snapshots are available locally, end users can often restore their data without the
assistance of a system administrator, saving administrators the time it takes to retrieve the
data from another physical location. Snapshots can be used to complement your backup
strategy and help meet your SLAs (service level agreements) by offering granular rollback
options, depending on how you have configured your snapshot schedules. Snaps can be
configured and retained hourly, daily, weekly, monthly or yearly.



SnapshotIQ uses both copy on write (CoW) and redirect on write (RoW) strategies for its
differential snapshots, and uses the most appropriate method for a given situation. Both
have pros and cons, and OneFS dynamically picks which flavor to use in order to maximize
performance and keep overhead to a minimum.
With copy on write, as the name suggests, a new write to HEAD results in the old blocks
being copied out to the snapshot version first. Although this incurs a double write penalty, it
results in less fragmentation of the HEAD file, which is better for cache prefetch, etc.
Typically, CoW is most prevalent in OneFS, and is primarily used for small changes, inodes
and directories.
RoW, on the other hand, avoids the double write penalty by writing changes to a snapshot
protected file directly to another free area of the file system. However, the flip side to this is
increased file fragmentation. Since file contiguity is not maintained when changes are written
to other file system regions, RoW in OneFS is used for more substantial changes such as
deletes and large sequential writes.
A snapshot is not a copy of the original data, but only an additional set of pointers to the
original data. At the time it is created, a snapshot consumes a negligible amount of storage
space on the cluster. Snapshots refer to, or are referenced by, the original file. If data is
modified on the cluster, only one copy of the changed data is made. This allows the
snapshot to maintain a pointer to the data that existed at the time that the snapshot was
created, even after the data has changed. A snapshot consumes only the space that is
necessary to restore the files contained in the snapshot. If the files that a snapshot contains
have not been modified, the snapshot consumes no additional storage space on the cluster.
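
The pointer behavior described above can be modeled with a toy copy-on-write structure in
Python. This is a simplified sketch and not OneFS internals: the class and method names are
hypothetical, the file has a fixed number of blocks, and only the CoW path is modeled. A new
snapshot stores nothing; when a block in HEAD is overwritten, the old block is preserved once,
in the most recent snapshot, and older snapshots find it by looking forward in time.

```python
class SnapshottedFile:
    """Toy model of a file with differential, copy-on-write snapshots."""

    def __init__(self, blocks):
        self.head = list(blocks)  # current (HEAD) version of the file
        self.snapshots = []       # oldest first; {block index: preserved block}

    def take_snapshot(self) -> int:
        """Create a snapshot; no data is copied at creation time."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        """Overwrite a HEAD block, preserving the old block at most once."""
        if self.snapshots:
            self.snapshots[-1].setdefault(index, self.head[index])
        self.head[index] = data

    def read_snapshot(self, snap_id):
        """Rebuild the file as it looked when snapshot snap_id was taken."""
        result = []
        for index, head_block in enumerate(self.head):
            for later in self.snapshots[snap_id:]:
                if index in later:
                    result.append(later[index])
                    break
            else:
                result.append(head_block)  # block unchanged since the snapshot
        return result
```

In this model a snapshot of an unmodified file holds zero blocks, and each modified block
is stored in exactly one snapshot, however many snapshots exist.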

Ordered vs Unordered Deletions


An ordered deletion is the deletion of the oldest snapshot of a directory, whereas an
unordered deletion is the deletion of a snapshot that is not the oldest snapshot of a
directory. Unordered deletions can take twice as long to complete and consume more
cluster resources than ordered deletions. However, unordered deletions can save space by
retaining a smaller total number of blocks in snapshots.
The benefits of unordered deletions versus ordered deletions depend on how often the data
referenced by the snapshots are modified. If the data are modified frequently, unordered
deletions save space. However, if data remains unmodified, unordered deletions will most
likely not save space, and it is recommended that you perform ordered deletions to free
cluster resources.

Considerations: Snapshot


You can create snapshots either by configuring a snapshot schedule or manually generating
an individual snapshot. Manual snapshots are useful if you want to create a snapshot
immediately, or at a time that is not specified in a snapshot schedule. For example, if you
plan to make changes to your file system, but are unsure of the consequences, you can
capture the current state of the file system in a snapshot before you make the change.
The most common method is to use schedules to generate the snapshots. A snapshot
schedule generates snapshots of a directory according to a schedule. The benefit of
scheduled snapshots is not having to manually create a snapshot every time you would like
one taken. You can also assign an expiration period to the snapshots that are generated,
automating the deletion of snapshots after the expiration period. It is often advantageous to
create more than one snapshot schedule per directory, with shorter expiration periods
assigned to snapshots that are generated more frequently, and longer expiration periods
assigned to snapshots that are generated less frequently.
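
When combining schedules, it helps to estimate how many snapshots each schedule keeps on
the cluster once it has been running longer than its expiration period. The Python sketch
below is a planning approximation only; the function name and the example schedules are
hypothetical and do not come from OneFS.

```python
from datetime import timedelta


def steady_state_count(interval: timedelta, expiration: timedelta) -> int:
    """Approximate number of unexpired snapshots one schedule retains."""
    return expiration // interval


# Hypothetical schedules for one directory: (interval, expiration period).
schedules = {
    "hourly": (timedelta(hours=1), timedelta(days=1)),
    "daily": (timedelta(days=1), timedelta(weeks=1)),
    "weekly": (timedelta(weeks=1), timedelta(weeks=8)),
}

total = sum(steady_state_count(i, e) for i, e in schedules.values())
```

With these example values the directory settles at roughly 24 hourly, 7 daily, and 8 weekly
snapshots, 39 in total, which can then be compared against the snapshot limits that apply to
the cluster.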
Delete snapshots in order, beginning with the oldest. Do not delete from the middle of the
time range. Snapshots should not be manually deleted but should have an expiration date set
when created. Caution: Deleting snapshots out of order may cause newer snapshots (that are
dependent on data that is being removed) to have to copy the blocks before deletion. This
increases the running time of the SnapshotDelete job and potentially allows it to pause or
queue behind other, higher priority jobs. Therefore, snapshots should not be deleted out of
order if it can be avoided.
If the SnapshotDelete job does not run regularly, the space held by deleted snapshots is not
reclaimed. Specifically, when customers have fast-changing data sets, the cluster can fill
quickly and eventually go read-only.

Considerations: Snapshots (cont'd)


If you are snapping data on a high performance tier and do not want to use the space on the
high performance tier, you can save the snaps to a lower cost-of-ownership tier.
As with all OneFS features, planning and analysis must be done before implementing a
feature on the cluster. One consideration that must be made is whether or not to use the
snapshot alias feature. An alias is a friendly name that always points to the most recent
version of the snapshot. This allows ease of use for the user when doing file or directory
restores, as they know that ‘homedir-new’ will always contain the newest snapshot of the
home directories. Another consideration is that it is imperative when designing your
snapshot strategy to plan for expiry of your snapshots. If you do not set an expiry date, the
snapshot will sit on disk forever, potentially causing you to hit your snap limit or fill your
cluster up completely. Snapshots must be regularly deleted to maintain cluster performance.
When planning snaps, consideration must be taken of other OneFS features that use snaps.
SyncIQ, for example, regularly uses snapshots in its synchronization strategy, and these
snapshots count against the total number of snaps as well as disk usage. Do not manually
delete snaps with a SIQ- prefix because these snaps were created by SyncIQ and are
needed to continue replication.

Lesson 7: WORM Compliance


Upon completion of this lesson, you should be able to recognize use of the system clock vs.
the compliance clock and differentiate between a standard cluster and a Compliance mode
cluster.

Overview: SmartLock


SmartLock is a licensed software application that enables cost-effective and efficient
protection against accidental, premature or malicious deletion or modification of data. Files
are protected from change using SmartLock’s management capabilities. SmartLock provides
WORM (write-once / read-many) status on files. In a WORM state files can be read but not
modified. SmartLock has been integrated with SyncIQ, Isilon’s cluster-to-cluster replication
application, to provide failover capabilities and retention on the SyncIQ source and target. In
OneFS versions later than OneFS 8.0.1, SyncIQ failback is supported on SmartLock

SmartLock Terms


Before configuring SmartLock on a cluster, you must familiarize yourself with a few concepts
that are needed to fully understand SmartLock requirements and capabilities. The first
concept is file retention, which refers to a time period where files are set to a read-only state
and may not be moved, modified, or deleted until a future date. When the retention date is
reached, the file can once again be modified or deleted. Files from the Isilon cluster are
never automatically deleted and OneFS provides no automated means to delete files with
expired retention. The date varies by the organization’s internal and regulatory
requirements. A retention clock manages the date and time associated with the retention
Compliance refers to a regulatory requirement that carries certain restrictions as to how
retention must be implemented. The simple Securities and Exchange Commission (SEC) Rule
17a-4(f) definition states that “the requirement in paragraph (f)(2)(ii)(A) of the rule permits
use of an electronic storage system that prevents the overwriting, erasing or otherwise
altering of a record during its required retention period through the use of integrated
hardware and software control codes”. This
rule is often referred to as the regulatory standard that must be met for data retention by
other regulatory agencies. A specific compliance clock is used for compliance SmartLock
System integrity is one of the required elements to guarantee that the retention of the file
meets the compliance requirements. The system must be secure and protect against
modifications which could allow data to be modified or deleted. Retention date integrity is
another requirement that refers to how the retention date is stored and accessed so that
retention time requirements are met.
Committing a file refers to changing a file from a read-write state to a Write-Once-Read-
Many (WORM) state that has a retention expiration date. Files are committed to a WORM
state when using SmartLock.

Compliance vs SmartLock


Compliance Mode should only be used when SEC Rule 17a-4 must be adhered to. Compliance
Mode starts a separate non-changeable Compliance Mode clock. It also removes all root
access to the cluster. The compadmin account must be used. Not having root access means
that any problems become very difficult to troubleshoot.
Enterprise Mode SmartLock can accomplish many of the same goals without these limits. For
example, privileged delete can be set in Enterprise Mode to on / off / permanently
disabled on a SmartLock directory. If permanently disabled, even root could not turn
privileged delete back on.
In a standard Isilon cluster using Enterprise SmartLock directories, you can commit a file to a
WORM state manually or you can configure SmartLock to automatically commit the file. You
can create two types of SmartLock directories: enterprise and compliance. However, you can
create compliance directories only if the EMC Isilon cluster has been upgraded to SmartLock
compliance mode. Before you can create SmartLock directories, you must activate a
SmartLock license on the cluster.
Enterprise directories enable you to protect your data without restricting your cluster to
comply with regulations defined by U.S. Securities and Exchange Commission rule 17a-4. If
you commit a file to a WORM state in an enterprise directory, the file can never be modified
and cannot be deleted until the retention period passes. However, if you own a file and have
been assigned the ISI_PRIV_IFS_WORM_DELETE privilege or are logged in through the root
user account, you can delete the file before the retention period passes through the
privileged delete feature. The privileged delete feature is not available for compliance
directories. Enterprise directories reference the system clock.

SmartLock Directory Types


Different types of directories have different SmartLock capabilities. When using SmartLock,
there are two types of directories: enterprise and compliance. A third type of directory is a
standard or non-WORM directory. A license must be installed on the cluster to enable
SmartLock capabilities.
Standard non-WORM directories can be supported on the same cluster with SmartLock
directories. Standard directories are just typical directories with read, write, modify, execute
privileges, but have no data retention capabilities.
Enterprise SmartLock directories are data retention directories that do not meet SEC
regulatory compliance requirements. These are the most commonly used directories in a
SmartLock configuration. Enterprise SmartLock directories have the option to allow
administrators or RBAC enabled users the capability to delete files. This capability is known
as privileged deletes. Privileged deletes can be enabled or turned on, temporarily disabled or
turned off, or permanently disabled. A directory can be created or modified as an Enterprise
directory whether it is empty or fully populated with data; in previous versions, this
process required the directory to be empty.
Compliance SmartLock directories are data retention directories that meet SEC regulatory
compliance requirements. A cluster must be set up as a Compliance mode cluster to support
compliance SmartLock directories.
A standard directory can be changed to an Enterprise SmartLock directory. The standard
directory can have data in it. A standard directory may be converted to a Compliance
SmartLock directory.
An empty Enterprise SmartLock directory can be upgraded to a compliance SmartLock
directory. The change process is one way, from Enterprise to compliance. When this occurs,
privileged deletes are disabled permanently and cannot be changed back. The directory is
also set to use a compliance clock instead of the system clock. Data must be copied into the
compliance SmartLock directory structure before the data can be committed to a WORM
state. The compliance clock must be enabled before creating compliance SmartLock
directories.

Committing Files to WORM


In order for a file to have a file retention date applied and to be set to a read-only state, the
file must be committed to WORM. Until they are committed to WORM, files that are in a
SmartLock directory act as standard files that may be moved, modified, or deleted. Files may
be committed manually or by using autocommit. Files can be manually committed by the
administrator or user through Windows controls or UNIX commands. The manual commit
process involves first setting the retention date on the file, then committing the file to WORM
state. This provides a high level of control as to when the file is committed to WORM;
however, it adds a level of management to the process.
The other option is to autocommit files to WORM. Autocommit is configured on a SmartLock
directory as a time period measured from when a file was last modified. During the
autocommit offset time period, files may be deleted, modified or moved. After the time period
has expired, the file is automatically committed to WORM. Autocommit automates the process,
removes management intervention, and ensures a high level of adherence to the organization’s
retention policies.
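
The autocommit rule described above amounts to a simple time comparison. The Python sketch
below is a conceptual model only: the function name is hypothetical, and treating a file as
due exactly when the offset has elapsed is an assumption, not documented OneFS behavior.

```python
from datetime import datetime, timedelta


def autocommit_due(last_modified: datetime, offset: timedelta,
                   now: datetime) -> bool:
    """True once the autocommit offset has elapsed since the last modification."""
    return now - last_modified >= offset
```

For a directory with a 30 minute offset, a file last modified at 12:00 remains writable at
12:29 and is due for commit at 12:30. Any modification during the offset period moves
last_modified forward and restarts the wait.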

Considerations: WORM


SmartLock retention settings apply to both enterprise SmartLock directories and the more
stringent compliance SmartLock directories. SmartLock includes the capability to set explicit
retention expiration dates on a directory or on a per file basis. Explicit retention expiration
dates are set manually by the administrator or by using Windows PowerShell and UNIX
commands. The preferred method is to use the default directory retention period setting.
The default directory retention setting is used to apply the same retention offset period to all
files in that directory when they are committed to WORM.
To ensure proper retention requirements are met, retention boundary parameters are used.
SmartLock includes the capability to set both minimum and maximum retention periods.
The minimum and maximum parameters override any file expiration date outside of the
boundaries to guarantee adherence to retention requirements.
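
The boundary behavior described above can be read as clamping a requested retention period
into the configured range. The following Python sketch illustrates that reading; the function
and parameter names are hypothetical, and the actual directory settings are not shown.

```python
from datetime import timedelta


def effective_retention(requested: timedelta, minimum: timedelta,
                        maximum: timedelta) -> timedelta:
    """Clamp a requested retention period into the [minimum, maximum] range."""
    if minimum > maximum:
        raise ValueError("minimum retention must not exceed maximum retention")
    return max(minimum, min(requested, maximum))
```

With a minimum of one year and a maximum of seven years, a request for 30 days is raised to
one year, a request for ten years is reduced to seven, and a request for three years is kept
as is.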
SmartLock enables the administrator to extend a retention date on specific files and
directories. This provides the capability to override a retention date for use cases such as
litigation hold requirements where files may be required to be preserved beyond the initial
retention period requirement.
Retention dates involve the use of a clock both for setting and referencing the retention date.
The system clock is the typical clock used on the cluster for date and time reference.
Regardless of whether the cluster is a standard Isilon cluster or a Compliance mode cluster,
the system clock is used for standard directories and for enterprise SmartLock directories.
The system clock can be changed by an administrator. Changing the system clock may affect
retention periods for files and allow access to files before the original expiration date.
A compliance directory uses the compliance clock instead of the system clock. The
compliance clock is independent of the system clock. It is set once and may not be
changed after it is set. The compliance clock is used only with compliance SmartLock
directories for setting and referencing retention dates. The compliance clock is initially set on
a cluster using the system clock. If the compliance clock and the system clock begin to differ
from each other, the compliance clock has an auto-correction mechanism that slowly
drifts it towards the system clock time. This drift may be up to 14 days per year.
The drift is used to make small time corrections to accommodate variances caused by system
down time without jeopardizing the retention date integrity. Files protected in a compliance
SmartLock directory retain the same retention period they had remaining when the cluster
was shut down.
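
The 14 days per year figure gives a lower bound on how long the compliance clock needs to
catch up with the system clock after an outage. The Python sketch below is simple
arithmetic on that figure; the function name is hypothetical, and it assumes the correction
always runs at the maximum rate, which the text does not guarantee.

```python
MAX_DRIFT_DAYS_PER_YEAR = 14  # maximum auto-correction rate quoted above


def min_years_to_converge(gap_days: float) -> float:
    """Shortest time, in years, to close a gap between the two clocks."""
    return abs(gap_days) / MAX_DRIFT_DAYS_PER_YEAR
```

For example, a gap of 7 days, such as a cluster that was powered off for a week, takes at
least half a year to correct.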

Considerations: WORM (cont'd)


Compliance mode clusters should only be used when required to meet regulatory
requirements. RBAC is available for both standard clusters and compliance mode clusters.
This applies to all RBAC-supported functionality for management and control.
Administration of the cluster varies based on whether the cluster is a standard cluster or a
compliance mode cluster. On a standard Isilon cluster, the root user can be used to perform
SmartLock commands, as well as other management tasks on the cluster. However, to meet
the system and retention integrity requirements for compliance, the root user is disabled on
a compliance mode cluster and cannot be used. The compadmin user is used to manage
compliance mode clusters. All commands that could potentially affect either system or file
integrity are not available on compliance mode clusters. All other commands have been
converted for use with the sudo prefix.
SmartLock requires the licensed version of IsilonSD Edge. Enterprise and compliance modes
are supported, but the IsilonSD cluster likely does not comply with the regulations defined
by U.S. Securities and Exchange Commission rule 17a-4. This is because the virtualization
software on which the IsilonSD cluster runs maintains a root user who could theoretically
tamper with the disk configuration of the virtual cluster, and therefore the data that resides
on it. When an IsilonSD cluster is placed in compliance mode, you cannot add new nodes to
the cluster. Therefore, you must add as many nodes as necessary before upgrading the
cluster to SmartLock compliance mode.

Considerations: WORM (cont'd)


SmartLock has some other limitations to be aware of and address. There is no auto-delete
functionality in OneFS. The position is that file deletion should be a selective choice on a
case-by-case basis. Once deleted, the data may be unrecoverable except from snapshots or
backup. To clean up old files, all files past the retention date must be identified. There is no
easy way to identify these files in OneFS today. Once identified, the files with expired
retention can be deleted by normal methods.
The limited search capabilities make managing files with expired retention a challenge.
You can use the isi worm files view command to verify the retention status of any file.
Another option, which is not recommended and has created issues for some customers, is
running the UNIX command rm -rf, which recursively deletes, without verification, any file
that is not protected by its retention date setting. You should always include the path to the
SmartLock directory to avoid accidental deletion of non-WORM files. Deletion is attempted
for every file in the directory, and each file is deleted if possible, without asking you to
confirm. Only the SmartLock retention setting prevents the files from deletion. You could run
the rm -r </path> command without the -f option and then confirm each deletion. If you use
the command, we recommend you confirm deletion to ensure only the desired files are removed.
Failback of SmartLock compliance mode directories is supported as of OneFS 8.0.1. Earlier
versions require migrating SmartLock compliance directories from the recovery cluster. Data
failover and failback with earlier versions of OneFS are supported for SmartLock enterprise

Lesson 8: Antivirus


Upon completion of this lesson, you should be able to describe types of file remediation and
establish ICAP design considerations.

Overview: Antivirus


Files stored on an Isilon cluster can be scanned for viruses and other security threats by
integrating with third-party scanning services through the Internet Content Adaptation
Protocol (ICAP). OneFS sends files through ICAP to a server running third-party antivirus
scanning software. These servers are referred to as ICAP servers. ICAP servers scan files for
viruses. If a threat is detected, OneFS informs system administrators by creating an event,
displaying near real-time summary information, and documenting the threat in an antivirus
scan report. You can configure OneFS to request that ICAP servers attempt to repair,
quarantine, or truncate infected files.



Repair: The ICAP server attempts to repair the infected file before returning the file to OneFS.
Quarantine: OneFS quarantines the infected file. A quarantined file cannot be accessed by
any user. However, a quarantined file can be removed from quarantine by the root user if
the root user is connected to the cluster through secure shell (SSH). If you back up your
cluster through NDMP backup, quarantined files will remain quarantined when the files are
restored. If you replicate quarantined files to another Isilon cluster, the quarantined files will
also be quarantined on the target cluster. Quarantines operate independently of access
control lists (ACLs).
Truncate: OneFS truncates the infected file. When a file is truncated, OneFS reduces the size
of the file to zero bytes to render the file harmless.

Scan Options


You can configure global antivirus settings that are applied to all antivirus scans by default.
You can exclude files from antivirus scans based on a variety of filters, including filters that
use wildcard characters, such as *.jpg. Files that match a configured filter are not scanned,
and the filter settings apply to all antivirus scans. You can configure OneFS to automatically
scan files as they are accessed by users.
On-access scans operate independently of antivirus policies. Administrators can manually
scan a file or directory or they can manually run an antivirus policy at any time. This
procedure is available only through the web administration interface.
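
The wildcard exclusion described above can be illustrated with a short sketch. It assumes shell-style glob matching similar to Python's fnmatch module; the exact matching rules OneFS applies are not stated here, and the function name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(filename: str, patterns: list) -> bool:
    """Return True if the file name matches any exclusion pattern.

    Assumption: patterns behave like shell globs (for example *.jpg).
    This is an illustration only, not the OneFS filter implementation.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical filter list: skip JPEG images and temporary files.
exclusions = ["*.jpg", "*.tmp"]
to_scan = [name for name in ["photo.jpg", "report.docx", "cache.tmp"]
           if not is_excluded(name, exclusions)]
```

With these hypothetical inputs, only report.docx would be sent for scanning.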

Considerations: Antivirus


If you configure more than one ICAP server for a cluster, it is important to ensure that the
processing power of each ICAP server is relatively equal. OneFS distributes files to the ICAP
servers on a rotating basis, regardless of the processing power of the ICAP servers. If one
server is significantly more powerful than another, OneFS does not send more files to the
more powerful server.
The number of ICAP servers that is required to support an Isilon cluster depends on how
virus scanning is configured, the amount of data a cluster processes, and the processing
power of the ICAP servers. If you intend to scan files exclusively through antivirus scan
policies, it is recommended that you have a minimum of two ICAP servers per cluster. If you
intend to scan files on access, it is recommended that you have at least one ICAP server for
each node in the cluster.
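
The sizing guidance above can be written as a small helper. This is a minimal sketch of the stated minimums only; it does not model data volume or ICAP server processing power, and the function names are hypothetical.

```python
def recommended_min_icap_servers(node_count: int, scan_on_access: bool) -> int:
    """Return the minimum ICAP server count suggested by the guidance above."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if scan_on_access:
        # On-access scanning: at least one ICAP server per node.
        return node_count
    # Policy-only scanning: a minimum of two ICAP servers per cluster.
    return 2


def round_robin(files: list, servers: list) -> dict:
    """Assign files to servers on a rotating basis, ignoring server capacity."""
    assignment = {server: [] for server in servers}
    for index, name in enumerate(files):
        assignment[servers[index % len(servers)]].append(name)
    return assignment
```

The round_robin helper shows why unequal ICAP servers are a problem: each server receives the same share of files regardless of how powerful it is.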

Module 5: Replication and Recovery


Upon completion of this module, you will be able to identify uses for data replication
(backups, site-to-site), explain SyncIQ limitations for disaster recovery, and determine use
cases for Backup Accelerators and when to use them versus replication or a snap-and-replicate
strategy.

Lesson 1: Replication


Upon completion of this lesson, you will be able to understand SyncIQ replication
capabilities, evaluate replication for disaster recovery, and explain SyncIQ limitations for
disaster recovery.

Overview: SyncIQ


OneFS enables you to replicate data from one Isilon cluster to another through the SyncIQ
software module. You must activate a SyncIQ license on both Isilon clusters before you can
replicate data between them. You can replicate data at the directory level while optionally
excluding specific files and sub-directories from being replicated. SyncIQ creates and
references snapshots to replicate a consistent point-in-time image of a root directory.
Metadata, such as access control lists (ACL) and alternate data streams (ADS), are replicated
along with data. SyncIQ enables you to maintain a consistent backup copy of your data on
another Isilon cluster. SyncIQ offers automated failover and failback capabilities that enable
you to continue operations on another Isilon cluster if a primary cluster becomes
unavailable.
Replication most often takes place between two storage devices, a primary and a secondary.
The primary holds the gold copy of the data which is actively used by clients. The primary is
the source of the replication. The secondary is the target of the replication and holds a copy
of the data. If the source gold data gets updated on the primary, those updates are
eventually replicated to the target.

Replication Options


Isilon offers replication as an option. SyncIQ replication is performed cluster-to-cluster,
asynchronously. With SyncIQ you can replicate files to the same cluster to create an
additional copy of high value data. You can replicate across the LAN to a second cluster to
protect against cluster failure. You can replicate over the WAN to a remote cluster to protect
against cluster and site failure. You can also replicate one-to-many to multiple sites to
protect against multiple cluster and multiple site failures, or to distribute data geographically.

Architecture: SyncIQ Core Features


SyncIQ core features include replication and synchronization of data files contained in the
file system structure. The OneFS top tree directory is /ifs. Do not use /ifs itself as a
replication domain; use only selected subdirectories. Replication is from the source SyncIQ
domain to the target SyncIQ domain. A domain starts at a point in the directory path, such as
/ifs/ClusterA/data/foo, and contains all of the subfolders below it in the tree. SyncIQ
replicates only file data. The data can be copied from the source to the target,
or the target can be synchronized with the source.
SyncIQ runs as jobs under its own job engine that is separate from the cluster maintenance
activity job engine in OneFS. SyncIQ runs based on SyncIQ policies. The policies can be
scheduled or run as required manually.
SyncIQ includes the capability to perform semi-automated failovers from source to target,
and semi-automated failback from target to original source. Failover and failback only
include the cluster preparation activities and do not include DNS changes, client redirection
or any required networking changes. Failover, failback and external settings are discussed in
greater detail later in this module.

Compatibility Matrix and Caveats


SyncIQ supports various versions of OneFS as a source or as a target. The OneFS version on
the target cluster can be lower than the OneFS version on the source cluster. The compatibility
chart is displayed.
When the OneFS version is not the same on both the source and target clusters, some
functions are not available because the earlier releases will have no ability to support
features added in the newer OneFS releases. Minor feature changes require the same or
newer version on the target to enable the new features on the source side. One feature
added in OneFS 7.1.1 was the ability to divide large files to be worked on by multiple workers
at a time. The source and target must be at the same minimum OneFS release level for the
feature to be enabled. As another example, if the target cluster is running OneFS 5, the
automated failover functionality is not available. Conversely, if the source cluster is running
a OneFS version earlier than 8.0, the automated failback functionality is not available.
Another more recent development is the ability to fail back SmartLock protected data in
compliance mode. This was introduced in OneFS 8.0.1. This function is not backward
compatible with earlier versions of OneFS, but is forward compatible with OneFS 8.1.

SyncIQ Options


SyncIQ includes the capability to stop a failover in progress and revert back to the pre-
failover state. The semi-automated failover process preserves the synchronization
relationships between the source and target clusters. Optionally the relationship can be
broken if required and re-established when appropriate. Since SyncIQ is RBAC ready, the
management capabilities can be included in administration roles. For organizations
automating processes, the platform application programming interface or PAPI integration is
available. A scheduling option of when-changes-occur is included to aid with content
distribution workflows. Content distribution workflows have infrequently changed data and
a need to distribute the changes as soon as possible. This is not an implementation to
enable continuous replication. Use time based scheduling for all workflows besides content
distribution. The number of generated snapshots can be an issue when used improperly.
The SyncIQ process uses snapshots on both the source and target clusters. No SnapshotIQ
license is required for basic SyncIQ snapshots on either the source or target clusters. These
snapshots are only used for SyncIQ jobs and are single-instance snapshots with the latest or
last-known good version being retained. To enable multilayered, historical, archival snapshot
use on either cluster, SnapshotIQ licenses are required. SyncIQ is able to support larger
maximum transmission units or MTU over the LAN or WAN. SyncIQ supports auto-
negotiation of MTU sizes over WAN connections. The MTU across the network is negotiated
by the network.
OneFS and SyncIQ negotiate with the switch to set the appropriate MTU. The MTU should be
the same from end-to-end for the connection.
Some additional capabilities aid specific use cases. SyncIQ has the capability to import
manually taken snapshots to use as the point-in-time reference for synchronization
consistency. You can add new nodes while a sync job is running. There is no requirement to
stop the sync job before adding new nodes. Especially useful in troubleshooting potential
sync job issues, you can change the verbosity of the sync policy logging mid-job.
SyncIQ can create a point-in-time report showing the SyncIQ worker activity. A point-in-time
report is a picture of the SyncIQ worker activity at a given instant. The ability to see how
many workers are active is very useful in troubleshooting potential performance issues. Run
the isi sync jobs reports list -v command to view detailed worker output. You may want to
redirect the output to a text file to simplify viewing.

Data Protection - Copy vs. Sync


What is the goal or the requirement for replication? Is a mirrored copy of the source the
goal? Or is the goal to have all source data copied and retain deleted file copies in case they
are required later? With SyncIQ you can choose the option to meet your goal for each
replication policy. When you create a SyncIQ policy you must choose a replication type of
either sync or copy.

Sync maintains a duplicate copy of the source data on the target. Any files deleted on the
source are removed from the target. Sync does not provide protection from file deletion,
unless the synchronization has not yet taken place.
Copy maintains a duplicate copy of the source data on the target, the same as sync. However,
files deleted on the source are retained on the target. In this way copy offers protection
against file deletion, but not against file changes. This retention is passive; it is not the
secure retention provided by SmartLock. Copy policies can include file filter criteria not
available with the synchronization option.
You can always license SnapshotIQ on the target cluster and retain historic SyncIQ
associated snapshots to aid in file deletion and change protection.
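
The difference between the two replication types can be sketched with a toy model in which each cluster is a dictionary mapping file paths to content. This is only an illustration of the semantics described above; the function name replicate and the dictionary model are assumptions, not SyncIQ internals.

```python
def replicate(source: dict, target: dict, mode: str) -> dict:
    """Return the target state after one replication pass.

    source and target map file paths to content; mode is "sync" or "copy".
    """
    if mode not in ("sync", "copy"):
        raise ValueError("mode must be 'sync' or 'copy'")
    result = dict(target)
    # New and changed files overwrite the target copy in both modes.
    result.update(source)
    if mode == "sync":
        # Sync mirrors the source: files deleted on the source are removed.
        for path in list(result):
            if path not in source:
                del result[path]
    return result


# b.txt was deleted on the source; a.txt was changed from v1 to v2.
source = {"a.txt": "v2"}
target = {"a.txt": "v1", "b.txt": "old"}
after_sync = replicate(source, target, "sync")
after_copy = replicate(source, target, "copy")
```

In this model, after_copy still holds b.txt, which illustrates deletion protection, but in both modes the older v1 content of a.txt is overwritten, which is why copy does not protect against file changes.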

How Does SyncIQ Work?


The SyncIQ process executes the same way each time a SyncIQ job is run. SyncIQ uses
snapshot technology to take a point-in-time copy of the data on the source cluster before
starting each replication or copy job. It then compares the new source snapshot to the last
known good source snapshot and creates a changelist based on the differential between the
snapshots. The changed directories, files and metadata are replicated at the block level. The
first time a SyncIQ policy is run, a full replication of the data from the source to the target
occurs. Subsequently, when the replication policy is run, only new and changed files are
replicated.
When a SyncIQ job completes successfully, a snapshot is taken on the target cluster. This
snapshot replaces the previous last known good snapshot. If a sync job fails, the snapshot is
used to reverse any target cluster modifications and return the target to the last known good
state.
On the source cluster when a SyncIQ job completes successfully, the system deletes the
previous source cluster snapshot, and retains only the most recent snapshot.
Historical snapshots can be maintained and deleted using the options in the SyncIQ policy.
Historical snapshots on the source or target clusters require a SnapshotIQ license.
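
The changelist step described above can be sketched as a comparison of two snapshot views. This is a simplified, file-level illustration: the dictionaries mapping paths to a content digest stand in for snapshots, and build_changelist is a hypothetical name, not a OneFS function.

```python
def build_changelist(previous: dict, current: dict) -> dict:
    """Compare the last known good snapshot view with the new snapshot view.

    Both arguments map file paths to a content digest (an assumption made
    for illustration; real snapshots track changes at a finer granularity).
    """
    created = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(p for p in current
                      if p in previous and current[p] != previous[p])
    return {"created": created, "modified": modified, "deleted": deleted}


last_good = {"/ifs/data/a": "d1", "/ifs/data/b": "d2"}
new_snap = {"/ifs/data/a": "d1", "/ifs/data/b": "d9", "/ifs/data/c": "d3"}
changes = build_changelist(last_good, new_snap)
first_run = build_changelist({}, new_snap)
```

With an empty previous view, every file appears as created, which corresponds to the full replication performed the first time a policy runs.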

SyncIQ Limitations


SyncIQ does not offer high availability (HA). The target cluster contains a copy of the source
data synchronized on a schedule. The implementation is active on the source cluster with a
read-only copy on the secondary cluster. Actions must be taken to make the target copy
read/writeable. The use is for disaster recovery or to maintain a second copy of the data only.
As a standard business practice, failover should be used only for major outages or for use in
controlled or planned outage situations.

Performing a complete failover and failback test on a monthly or quarterly basis is
discouraged. It can be performed as long as writes to the source are quiesced (prevented
from changing the data) and all SyncIQ policies are successfully run a final time to assure
complete synchronization between source and target. Failing to perform a final
synchronization can lead to lost data. An alternative test option involves creating a test
policy that is discussed later in this module.
Failing over to the target is not required to retrieve a copy of the data from the target cluster.
The target is a read-only copy of the data. Perform a copy operation to make a copy of the
read-only data on the target cluster to a location outside of the SyncIQ domain on the target,
or to a location on the source cluster, or to the client.
The SyncIQ policy scheduling option Whenever the source is modified is not for
continuous replication. Isilon does not offer a continuous replication option. This option is
for specific workflows that have infrequent updates and require the information to be
distributed as soon as possible. The workflows it was designed for are content distribution
and EDA. Serious issues can be created when using this option to try to simulate a
continuous replication scenario. When using this scheduling option historic snapshots must
be turned off to avoid potentially serious issues.

File System Effect on Disaster Recovery


The mount entry for any NFS connection must have a consistent mount point so that during
failover, you don’t have to manually edit the file system table (fstab) or automount entries on
all connected clients. For more information, see the SyncIQ Performance, Tuning and Best
Practices guide, and this discussion in the EMC Community Network:

Implementation Practices


SyncIQ should replicate files under the /ifs top-level directory. You should never create a
SyncIQ policy to replicate /ifs itself. All SyncIQ policies should be at least one directory
level below /ifs, for example /ifs/data. OneFS does not natively replicate any of the cluster
configuration data. SMB shares, NFS exports, local providers, file providers, and snapshots
on the source cluster are not replicated with SyncIQ. Each cluster is a standalone unit for
configuration. For the target cluster to accept client access in the same manner as the
source cluster, it must be configured the same way. Snapshots on the source should not be
contained on the target cluster. If you require snapshots on the target cluster, the snapshots
should be taken based on the target data.
The RPO for any SyncIQ directory is the point of the last successful synchronization. If a
synchronization is interrupted or has failed, the entire synchronization session is rolled back
to the last known good snapshot.

Retention systems all have issues due to the nature of setting files as immutable, and
SmartLock is no exception. Failover used to be a one-way event with SmartLock Compliance
directories, until OneFS 8.0.1. Failback is now supported on compliance directories. OneFS
now has a conflict detection system that will report when a failback introduced a conflict
among different versions of committed files, as well as a store in which older versions of
committed files are retained. This satisfies SEC regulatory requirements for file retention.
External cluster changes must be managed outside of failover and failback SyncIQ policy
settings. LAN / WAN connectivity must be valid to reach both clusters. The DNS or client
redirection changes must be performed separate from the SyncIQ policies. Permissions and
authentication must be valid on both clusters for the users. AD or LDAP authentication must
be accessible and applied to both primary and secondary clusters.

Source Node Selection / Restriction


Selecting run on all nodes means that the cluster can use any nodes in the cluster to run
the SyncIQ policy, and use any of its external interfaces to replicate the data to the
secondary cluster. Selecting run on only the nodes in the specified subnet and pool, means
that only those interfaces which are members of that specific pool move the replication
traffic. This option is effectively selecting a SmartConnect zone over which the replication
traffic is transferred. You would pick the appropriate subnet and pool from the drop-down
menu. The menu lists all the subnets and pools on the primary cluster. SyncIQ supports only
static IP address pools. If a replication job connects to a dynamically allocated IP address,
SmartConnect might reassign the address while the job is running, which would disconnect the
job and cause it to fail. In the policy configuration context, specifying file criteria in a
SyncIQ policy slows down a copy or synchronization job. Using includes or excludes for
directory paths does not affect performance, but specifying file criteria does.

Target Settings / Restrict Nodes


To select the target cluster you can enter the fully qualified domain name, the host name,
the SmartConnect zone, or the IPv4 or IPv6 address of any node in the target cluster. You can
also enter localhost to direct replication within the same cluster. When connecting over a
WAN link, many situations require using a separate static SmartConnect zone. To accomplish
this, use the DNS SmartConnect SSIP. The target directory or target SyncIQ protection
domain top-level directory should be identical to the source SyncIQ domain. To limit the
target nodes to only run SyncIQ jobs on the nodes connected to the SmartConnect zone, you
must check the box Connect only to the nodes within the target cluster SmartConnect
zone.

Target Snapshots


Snapshots are used on the target directory on the secondary cluster to retain one or more
consistent recovery points for the replication data. You can specify if and how these
snapshots are generated on the secondary cluster. If you want to retain the snapshots
SyncIQ takes, then you should check the box Capture snapshots on the target cluster.
SyncIQ always retains one snapshot of the most recently replicated delta set on the
secondary cluster to facilitate failover, regardless of this setting. Capture snapshots will
retain them beyond the time period in which SyncIQ needs them.
The Snapshot Alias Name is the default alias name for the most recently taken snapshot.
Note the alias name pattern. If this snapshot alias were taken on a cluster called “cluster1”
for a policy called “policy2” it would have the alias “SIQ_cluster1_policy2”.
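
The alias pattern in the example above can be expressed as a one-line helper. This sketch simply reproduces the SIQ_<cluster>_<policy> form shown in the text; the function name snapshot_alias is hypothetical.

```python
def snapshot_alias(cluster_name: str, policy_name: str) -> str:
    """Build the default alias in the SIQ_<cluster>_<policy> form shown above."""
    return f"SIQ_{cluster_name}_{policy_name}"


alias = snapshot_alias("cluster1", "policy2")
```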
The Snapshot Naming Pattern field shows the default naming pattern for all snapshots. To
modify the snapshot naming pattern, in the Snapshot Naming Pattern box, type a naming
pattern. Each snapshot generated for this replication policy is assigned a name based on this
pattern. Using the example naming pattern shown produces names similar to newPolicy-
In the Snapshot Expiration section, specify whether you want SnapshotIQ to automatically
delete snapshots generated according to this policy and/or how long to retain the snapshots,
either Snapshots do not expire or Snapshots expire after, and then stipulate the time
period. The options are in days, weeks, months, and years.

Target Compare Initial Sync (Diff-Sync)


During a full synchronization, SyncIQ transfers all data from the source cluster regardless of
what data exists on the target cluster. Full replications consume large amounts of network
bandwidth and may take a very long time to complete. A differential synchronization
compares the source and target data by doing tree walks on both sides. This is used to re-
establish the synchronization relationship between the source and target. Remember that a
full tree walk will take a lot of I/O and CPU power to complete, so you don't want to do full
synchronizations any more than you have to. Following the tree walks, the changed data is
replicated in place of a full data synchronization. The differential synchronization option is
only executed during the first time the policy is run during the failback operation, after which
the policy will return to using snapshots and the changelist to replicate the differences.
Some SyncIQ replication issues may require using this option including when a SyncIQ policy
is modified. If you modify the source directory, any included or excluded directories, any file
criteria, change the target cluster, or target directory, either a full or differential
synchronization is required.
Before you run the replication policy again, you must enable a target compare initial sync
by running the following command on the primary cluster: isi sync policies modify
<policy name> --target-compare-initial-sync on. With target-compare-initial-sync on for a
policy, the next time the policy runs, the primary and secondary clusters do a directory tree
walk of the source and target directories to determine what is different. The policy then
replicates only those differences from the source to the target.
The target-compare-initial-sync option determines whether the full or differential
replications are performed for this policy. Full or differential replications are performed the
first time a policy is run and after a policy has been reset. If set to on, the cluster performs a
differential replication. If set to off, the cluster performs a full replication. If differential
replication is enabled the first time a replication policy is run, the policy runs slower without
any benefit. The default value is off.
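
The effect of the target-compare-initial-sync setting can be sketched as follows. The dictionaries mapping paths to a content digest stand in for the results of the tree walks on each cluster; this representation and the function name files_to_transfer are assumptions for illustration only.

```python
def files_to_transfer(source: dict, target: dict,
                      target_compare_initial_sync: bool) -> list:
    """Return the paths sent on the first run of a policy, or after a reset.

    off: full replication, every source file is sent regardless of the target.
    on:  differential replication, only files missing or different on the
         target are sent.
    """
    if not target_compare_initial_sync:
        return sorted(source)
    return sorted(path for path, digest in source.items()
                  if target.get(path) != digest)


source = {"a": "d1", "b": "d2", "c": "d3"}
target = {"a": "d1", "b": "old"}
full = files_to_transfer(source, target, False)
diff = files_to_transfer(source, target, True)
empty_target = files_to_transfer(source, {}, True)
```

When the target is empty, the differential run sends the same files as a full run but still pays for the comparison, which is consistent with the note above that enabling it on a first run makes the policy slower without any benefit.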

Policy Assessment


SyncIQ can conduct a trial run of a policy without actually transferring all the file data
between the primary and secondary cluster. This is called an Assessment. SyncIQ scans the
data set and provides a detailed report of how many files and directories were scanned. This
is useful if you want to preview the size of the data set that will be transferred if you run the
policy.
Running a policy assessment is also useful for performance tuning, allowing you to
understand how changing worker loads affects the file scanning process so you can reduce
latency or control CPU resource consumption. It also verifies that communication between
the primary and secondary clusters is functioning properly. The benefit of an assessment is
that it can tell you whether your policy works and how much data will be transferred before
you run the policy. This can be useful when the policy will initially replicate a large amount
of data. If there is a problem with your policy, it is better to know that before you
start moving a large amount of data across your network. This functionality is available only
after you create a new policy and before you attempt a normal synchronization for the first
time.
You can assess only replication policies that have never been run before. You have to run
the assessment when the policy is new. This can be done in the web administration interface
or from the command-line. You can view the assessment information in the SyncIQ report,
which gets generated when you run the assessment. The report displays the total amount of
data that would have been transferred in the Total Data Bytes field.

Managing SyncIQ Performance


One of the simplest ways to manage resource consumption on the source and target
clusters is with proper planning of job scheduling. If the business has certain periods when
response time for clients is critical, then replication can be scheduled around these times. If
a cluster is a target for multiple source clusters, then modifying schedules to evenly
distribute jobs throughout the day is also possible. Another way to maintain performance at
either the source or target cluster is to use a more specific directory selection in the SyncIQ
policy. This can be useful in excluding unnecessary data from replication and making the
entire process run faster, but it does add to the administrative overhead of maintaining
policies. However, when required recovery time objectives (RTOs) and recovery point
objectives (RPOs) dictate that replication schedules be more aggressive or data sets be more
complete, there are other features of SyncIQ that help address this.
SyncIQ offers administrators the ability to control the number of workers that are created
when a SyncIQ job is run. This can improve performance when required or limit resource
load if necessary. Administrators can also specify which source and target nodes are used
for replication jobs on a per policy basis. This allows for the distribution of workload across
specific nodes to avoid using resources on other nodes that are performing more critical
functions.
Replication bandwidth between the source and target cluster can be limited to preserve
network performance. This is useful when the link between the clusters has limited
bandwidth or to maintain performance on the local network. To limit node resource load,
administrators can also use file operation rules to limit the number of files that are
processed in a given time period. This feature, though, is only practical if the majority
of the files are close in size.

Managing Performance: Workers


When a replication job runs, SyncIQ generates worker processes on the source and target
cluster. Workers on the source cluster send data while workers on the target cluster receive
and write data. For example, you can increase the maximum number of workers per node to
increase the concurrent number of files being processed per SyncIQ job. SyncIQ jobs may
have a negative effect on overall node or cluster performance or client response. Conversely,
some SyncIQ jobs may require a higher number of workers in order to replicate data in a
timely fashion. Administrators can control the number of active workers per node using the
SyncIQ policy. The default value is three workers per node and can be modified up to the
maximum of eight per node. When replicating a data set with many small files, increasing
the number of workers per node increases the number of files processed at one time.
However, more workers consume system resources, so caution should be exercised when
making changes to this setting. Each source or primary worker has a corresponding target or
secondary worker.

Worker Efficiency - Large File Splitting


For most operations, the number of SyncIQ workers per file is fixed as one worker per file on
both the primary or source cluster, and the secondary or target cluster. The work is divided
amongst the threads or workers at a file level granularity. Each worker “locks” a single file
then works to transfer it. That means one worker per file. As the SyncIQ job runs the number
of remaining files to replicate decreases and the number of active workers decreases. In
many cases the last portion of a SyncIQ job involves a single worker completing a file sync on
a large file. Until the SyncIQ job completes, another new or queued SyncIQ job cannot start
as part of the five concurrent running SyncIQ jobs.
However, large file synchronization work is divided at the file sub-range and distributed
across threads. A sub-range is a given portion of the file. Instead of locking at a file level,
locking occurs on the sub-range. The replication state, or repstate, is also tracked based on
the file sub-range. This implementation enables multiple workers or threads per file.
Files are divided when the remaining file replication work is greater than or
equal to 20 MB in size. The number of file splits is limited only by the maximum SyncIQ
workers per job. File splitting avoids SyncIQ jobs dropping to single-threaded behavior if the
remaining work is a large file. The result is improved overall SyncIQ job performance, with
greater efficiency for large files and a decreased time to job completion.
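
The sub-range idea can be sketched as follows. The 20 MB threshold comes from the text above; dividing the remaining work evenly among idle workers is an assumption made for illustration, since the actual split strategy is not described here, and the function name split_remaining_work is hypothetical.

```python
SPLIT_THRESHOLD_BYTES = 20 * 1024 * 1024  # 20 MB threshold from the text


def split_remaining_work(offset: int, length: int, idle_workers: int) -> list:
    """Divide the remaining byte range of a file into (offset, length) sub-ranges.

    Assumption: the range is divided evenly, one sub-range per idle worker.
    Below the threshold, or with a single worker, the range is not split.
    """
    if length < SPLIT_THRESHOLD_BYTES or idle_workers < 2:
        return [(offset, length)]
    base, extra = divmod(length, idle_workers)
    ranges = []
    start = offset
    for index in range(idle_workers):
        # Spread any remainder over the first few sub-ranges.
        size = base + (1 if index < extra else 0)
        ranges.append((start, size))
        start += size
    return ranges


mib = 1024 * 1024
large = split_remaining_work(0, 100 * mib, 4)
small = split_remaining_work(0, 10 * mib, 4)
```

Each sub-range can then be locked and transferred by its own worker, which is what keeps the tail of a job from dropping to a single thread.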
File splitting is enabled by default at the time the replication policy is created, but only
when both the source and target clusters are running OneFS 7.1.1 or newer. If either the
source or the target cluster is pre-OneFS 7.1.1, file splitting cannot be enabled. File
splitting can be disabled or re-enabled on a per policy basis from the CLI using the command
isi sync policies modify <policy_name> --disable-file-split [ true | false ]. Use true to
disable file splitting, and false to re-enable it if it had been disabled. Note that the
--disable-file-split option is hidden and is not listed by the -h or --help options.

SmartLock Compliance Mode Replication


The process surrounding replication and failover for compliance mode SmartLock domains
is quite simple in principle. Any committed file cannot be deleted. This is a government
regulatory requirement. On the other hand, managing failover and failback is a realistic
business requirement. The solution is to fail over and fail back, but to look for files with
conflicting versions. Both versions are retained on the cluster, to meet regulations. This can
potentially increase the file footprint on the cluster, but that is a reason to be judicious about
which files are committed, rather than instituting some sort of blind blanket policy. This is an
important discussion to have with the customer when planning this sort of installation.

Overview: SyncIQ Worker Pools


The concept of the SyncIQ worker pool is introduced in OneFS 8.0. As the cluster grows,
more workers are available for allocation to all running policies. Workers are then
dynamically allocated equally to all running policies. To help manage resource usage during
scheduled events, the bandwidth throttling option is retained and two new throttling options
are added: worker throttling and CPU usage throttling.

SyncIQ Scalability - Limits


With OneFS 8.0, new limits are defined. The number of active SyncIQ policies is increased
from 100 to 1,000, a tenfold increase. The number of running SyncIQ jobs is increased from
5 to 50, also a tenfold increase. The maximum number of sworkers, or target workers,
remains at 100 workers per node.
The number of workers on the source cluster is now variable, based on the number of CPU
cores and the number of nodes. For every CPU core in the cluster, 4 workers are available to
the worker pool. So every CPU with 4 cores adds 16 workers to the worker pool, and a node
with two 4-core CPUs adds 32 workers. As an example, if the cluster has 20 nodes with one
4-core CPU per node, 20 x 4 x 4 = 320 source cluster workers, or pworkers, are available in
the pool. If the cluster has 15 nodes with two 4-core CPUs per node, 15 x 8 x 4 = 480
pworkers are available to the pool.
More recent, high performance nodes have one CPU per node, but each of those CPUs may
have over 10 cores (depending on the node series). Check the node version carefully when
making these calculations.
On a per-job basis, there is a maximum number of workers per node. This means that even
if you have a huge number of cores per node, each node will only use up to the per-job
maximum on any given job. By default, this is 8 workers. This helps prevent any one node
from being thrashed by a SyncIQ job.
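The sizing rule above reduces to simple multiplication. The following Python sketch reproduces the two worked figures and the per-job ceiling. The function names are illustrative only and are not part of OneFS; the defaults (4 workers per core, 8 workers per node per job) are the ones quoted in the text.

```python
# Defaults quoted in the text; the names themselves are hypothetical.
WORKERS_PER_CORE = 4
MAX_WORKERS_PER_NODE_PER_JOB = 8


def pool_size(nodes, cpus_per_node, cores_per_cpu,
              workers_per_core=WORKERS_PER_CORE):
    """Total pworkers available in the source cluster worker pool."""
    return nodes * cpus_per_node * cores_per_cpu * workers_per_core


def per_job_cap(nodes, max_per_node=MAX_WORKERS_PER_NODE_PER_JOB):
    """Most workers a single job can hold across the whole cluster."""
    return nodes * max_per_node


if __name__ == "__main__":
    print(pool_size(20, 1, 4))   # 20 nodes, one 4-core CPU each  -> 320
    print(pool_size(15, 2, 4))   # 15 nodes, two 4-core CPUs each -> 480
    print(per_job_cap(20))       # 20 nodes x 8 workers per node  -> 160
```

For newer nodes with a single many-core CPU, pass the actual core count as cores_per_cpu.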

Dynamic SyncIQ Worker Allocation


Why a maximum? Workers are dynamically allocated between running SyncIQ policy jobs. All
running policies get an equal share of workers, plus or minus 1 due to rounding. Worker
allocations are recalculated as sync jobs start and stop. As a job finishes, it may only have
work for a few workers, so its remaining allocated workers are released back into the pool.
As a new job starts, workers may be reallocated from other running jobs to provide resources
for the new policy to execute its tasks. Workers are moved gradually and smoothly between
jobs as required to avoid contention or resource thrashing.
The worker process model remains the same as before. Each worker is an individual process
working on an individual task. The workers are created or ended as they are required.
Workers are started or stopped when switching between tasks.

Example: Available Workers Calculation


To illustrate dynamic worker allocation, we start with our example cluster. The cluster
consists of 4 nodes, each with a single 4-core CPU. We use the default configuration
of 4 workers per CPU core and a per-job limit of 8 workers per node. This gives a total of
4 x 4 x 4 = 64 workers available in the worker pool, and each running policy or job can be
assigned a maximum of 4 x 8 = 32 workers.

Example: Dynamic Worker Allocation


When the first SyncIQ policy starts its job and is the only running job, 32 workers are
allocated to it because that is the per-job maximum for this cluster size.
When the second SyncIQ job begins, the remaining 32 workers in the pool are allocated to
policy 2. Each job has its maximum of 32 workers, and the workers are evenly distributed
between jobs.
When a third job begins, no more workers are free in the worker pool. The daemon
examines the other running jobs and determines how to reallocate some of their workers to
the new job. Each job is allocated an even share, in this case 21 or 22 of the 64 workers. The
number of workers is smoothly reduced for policies 1 and 2 and allocated to policy 3.
You can carry on this example, adding further jobs and reallocating workers. If the
example were a 100-node cluster, you could quickly calculate the number of workers in the
worker pool and the maximum workers per job. SyncIQ truly scales with the cluster and
available node CPU resources.
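The end state of each step in this example can be sketched in Python. This is only a model of the steady-state split (equal shares, plus or minus 1, capped at the per-job maximum). It does not model the gradual hand-over of workers, and the choice of which job receives the extra worker is an assumption made for illustration. The function name is hypothetical.

```python
def allocate_workers(pool, jobs, per_job_cap):
    """Return the steady-state worker count for each running job.

    pool        -- total workers in the worker pool
    jobs        -- number of running policies/jobs
    per_job_cap -- maximum workers any one job may hold
    """
    if jobs == 0:
        return []
    base, extra = divmod(pool, jobs)
    # Assumption: the first `extra` jobs receive the one extra worker.
    shares = [base + (1 if i < extra else 0) for i in range(jobs)]
    return [min(share, per_job_cap) for share in shares]


if __name__ == "__main__":
    # Example cluster from the text: 64 workers in the pool, 32 per job.
    for running in (1, 2, 3, 4):
        print(running, allocate_workers(64, running, 32))
```

With one job, half of the pool stays idle because of the per-job cap; from the third job onward, the whole pool is in use and shares shrink as jobs are added.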

Transfer Performance Rules


You can manage the effect of replication on cluster performance by creating rules that limit
the network traffic created and the rate at which files are sent by replication jobs. For a rule
to be in effect, it must be enabled. When the Rule Type is Bandwidth, the Limit field is in
KB/sec. When the Rule Type is File Count, the Limit field is in files/sec.
Using performance rules, you can set network and file processing thresholds to limit
resource usage. These limits are cluster-wide: they affect all SyncIQ policies and are shared
across jobs running simultaneously. You can configure network-usage rules that limit the
bandwidth used by SyncIQ replication processes. This may be useful during peak usage
times to preserve the network bandwidth for client response. Limits can also be applied to
minimize network consumption on a low bandwidth WAN link that exists between source
and target. Multiple network rules can be configured to allow for different bandwidth limits
at different times. These rules are configured globally under the performance tab of SyncIQ
and apply to all replication jobs running during the defined timeframe on that source cluster.
System resource load can also be modified by using file operation rules. File operation rules
are also global. They can limit the total number of files per second that are processed during
replication. You can schedule when the limits are in effect.

Performance: Source and Target Nodes


If no source subnet:pool is specified, then the replication job could potentially use any of the
external interfaces on the cluster. SyncIQ attempts to use all available resources across the
source cluster to maximize performance. This additional load may have an undesirable
effect on other source cluster operations or on client performance. You can control which
interfaces, and therefore which nodes, SyncIQ uses by specifying a source subnet:pool. You
can specify a source subnet:pool globally under the Settings tab or per policy when creating
a new SyncIQ policy. Specifying a subnet:pool is effectively specifying a SmartConnect zone.
You can isolate source node replication resources by defining a SmartConnect zone. The
SmartConnect zone can define a subset of nodes in a cluster to be used for replication. It can
also be used to define specific subnets or interfaces on each node to isolate replication
traffic from client traffic.
When configuring a SyncIQ policy, you select a target host. If this hostname is a
SmartConnect zone on the secondary cluster, then you have the same ability to control
which nodes or interfaces the replication traffic goes through on the secondary. This would,
of course, require pre-configuring the SmartConnect zone on the secondary cluster.

SyncIQ CloudPools Support


SyncIQ is enhanced with new features to support CloudPools. SyncIQ can synchronize
CloudPools data from a CloudPools-aware Isilon source cluster to an Isilon target cluster.
The enhancements extend existing SyncIQ data protection to CloudPools data and provide
failover and failback capabilities. SyncIQ uses the CloudPools application programming
interface (API) tools to enable support.
The enhancements extend previous SyncIQ capabilities enabling replication of CloudPools
data, including stub files. SyncIQ continues to support all other SyncIQ capabilities during the
process including failover and failback for disaster recovery. The processes and capabilities
of SyncIQ features are based on the OneFS version relationship between the source cluster
and the target cluster. This relationship determines the capabilities and behaviors available
for SyncIQ policy replication.
This does not enable CloudPools operations where they would otherwise not work. For
example, SmartLock protected files cannot be reduced to stubs and uploaded to a cloud.

Stub Files and Deep Copy


As discussed in the CloudPools lesson, when a file is saved to the cloud storage location, the
structure of the file on the cluster changes. The file that remains is called a SmartLink file or
stub file. The stub file contains the file metadata, the cloud storage location, and any cached
CloudPools transactional data for the file. Stub files apply only to files stored using
CloudPools. The illustration represents what is contained in a stub file.
With SyncIQ we have the option to synchronize only the stub files to the target cluster, or to
copy both the stub file data and the actual file data. Synchronizing the full file data along
with the stub file data is called a deep copy. Deep copy preserves the entire file on the
target. Its primary use is with SyncIQ when the target is not CloudPools aware. An example
of a non-CloudPools aware target is a cluster running pre-OneFS 8.0, or a cluster without
access to the cloud storage provider. The lower illustration represents the data
stored during a deep copy.

SyncIQ with CloudPools: 8.0+ > 8.0+


We now take a look at how SyncIQ works with CloudPools data when we have OneFS 8.0 or
later on both the source and target clusters. In this case SyncIQ can replicate and
understand the CloudPools data natively. The CloudPools data contains the stub file and the
cached CloudPools synchronization data. SyncIQ replicates and synchronizes both data
components to the target cluster.
Both the source cluster and target cluster are CloudPools aware. The target cluster supports
direct access to CloudPools data if CloudPools is licensed and enabled on the target cluster
by adding the CloudPools account and password information. This enables
seamless failover for disaster recovery using the standard SyncIQ failover processes.
Failback to the original source cluster updates the stub file information and current cached
CloudPools data as part of the process.

SyncIQ with CloudPools: 8.0+ > pre-8.0


How does SyncIQ differ when the source cluster is CloudPools aware and the target cluster is
not? SyncIQ has been updated to support target clusters with OneFS 6.5 through OneFS
7.2.1. These OneFS versions are pre-CloudPools and are not aware of CloudPools stub files.
When this occurs, SyncIQ initiates a deep copy of the CloudPools data to the target. The files
synchronized contain the CloudPools information stored as part of the file along with a full
copy of the file data. The target cluster cannot connect directly to CloudPools and relies
on the deep copy data stored locally on the cluster. The synchronization behaves like any
standard SyncIQ job updating the target data. In the event of a failover or a failback, the
target relies on the local copy of the data. During failback, the source cluster recognizes
when a file has been tiered to the cloud and updates the cloud with data from the target
appropriately. Any changes made to the target file data are saved as a new file version on the
cloud.

Deep Copy Configuration


In addition to the default SyncIQ behavior, options are provided to control how
CloudPools file data is synchronized. Customers may desire different replication behavior
based on their policies for different data sets. As an example, low importance data stored on
the cloud may not merit the storage space required for a deep copy to a non-CloudPools
aware cluster. Or they may have decided to keep a local copy of all CloudPools data for
archive or as a backup to the services provided through the cloud storage provider.
Three options are available to configure with each SyncIQ policy: Deny, Allow, and Force.
• Deny never deep copies CloudPools data to a target cluster and fails the SyncIQ
policy if a deep copy is required. Deny is the default behavior.
• Allow copies stub file and cached file data when it can, and does a deep copy of the
data when it needs to.
• Force always deep copies all data and never copies only the stub file data to the target.
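The three options can be read as a small decision table. The Python sketch below is one way to express it, under the simplifying assumption that a deep copy is "required" exactly when the target is not CloudPools aware. The enum, exception, and function names are hypothetical and do not correspond to any OneFS API.

```python
from enum import Enum


class DeepCopy(Enum):
    DENY = "deny"     # default behavior according to the text
    ALLOW = "allow"
    FORCE = "force"


class PolicyFailed(Exception):
    """Raised when the policy would fail instead of replicating."""


def plan_transfer(option, target_cloudpools_aware):
    """Return 'stub' or 'deep copy' for a CloudPools file, or raise."""
    if option is DeepCopy.FORCE:
        return "deep copy"            # always send the full file data
    if target_cloudpools_aware:
        return "stub"                 # stub plus cached data is enough
    if option is DeepCopy.ALLOW:
        return "deep copy"            # fall back when a stub will not do
    raise PolicyFailed("deep copy required but the policy is set to Deny")


if __name__ == "__main__":
    print(plan_transfer(DeepCopy.DENY, True))
    print(plan_transfer(DeepCopy.ALLOW, False))
    print(plan_transfer(DeepCopy.FORCE, True))
```

Under this model, Deny only ever fails when the target cannot use stub files, which is why the choice matters most for pre-OneFS 8.0 targets.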

Considerations: CloudPools


In a standard node pool, file pool policies can move data from high performance tiers to
storage tiers and back as defined by their access policies. However, data moved to the cloud
will remain stored in the cloud unless an administrator explicitly requests data recall to local
storage. If a file pool policy change is made that rearranges data on a normal node pool,
data will not be pulled from the cloud. Public cloud storage often places the largest fees on
data removal from cloud storage, thus file pool policies avoid incurring removal fees by
placing this decision in the hands of the administrator.
The connection between a cluster and a cloud pool has limited statistical features. The
cluster does not track the data storage used in the cloud. This means file spillover is not
supported. Spillover to the cloud would also present the potential for file recall fees:
spillover is designed as a temporary safety net, and once the target pool capacity issues are
resolved, data would be recalled back to the target node pool.
Additional statistical details, such as the number of stub files on a cluster or how much
cached data is stored in stub files and would be written to the cloud on a flush of that cache,
are not easily available. Finally, no historical data is tracked on the network usage between
the cluster and cloud, either for write traffic or for read requests. These network usage
details should be obtained from the cloud service management system.
A domain is simply a scoping mechanism for the data contained within a SyncIQ policy, that
is, the directories and files which are replicated. If the source domain hasn’t already been
marked, the domain mark process runs during the resync-prep step of the failback, and it
will require a tree walk. So if you don't run a failback until there's a lot of data associated
with your policy, that domain mark on the first failback can take a long time. It does not
affect your client I/O, which still proceeds on the target, but it does increase the duration
of your failover test or your return to production.



Functionality has been added to create a point-in-time report showing
SyncIQ worker activity. Point-in-time reports are pictures of the SyncIQ worker activity at a
given instant. The ability to see how many workers are active is very useful in
troubleshooting potential performance issues. Run the isi sync jobs reports list -v
command to view detailed worker output. You may want to redirect the output to a text file
to simplify viewing.

Lesson 2: SyncIQ Disaster Recovery


Upon completion of this lesson, you should be able to define failover and failback, and
describe semi-automated failback.

Typical Disaster Recovery Use Cases


The typical use cases include normal SyncIQ replication to protect the data. The primary site
is read/write, the secondary site is read-only. Snapshots are independent for each site and
need to be created on each cluster. Backups should be maintained for each cluster
independently and stored offsite.
Controlled failover to a DR site is very useful for planned outages. The key element is the
completion of a final synchronization prior to the cutover to the DR location. The return to
the primary cluster has two options: failback, which preserves any changes to the secondary
site data using prepare-resync, or discarding any changes made to the secondary site using a
failover revert. Failover revert rolls back to the last-known-good snapshot for the SyncIQ
policy it is applied to.
In the event that a site is completely lost, as occurred in New Orleans with
Hurricane Katrina, the secondary site becomes the primary site. Eventually a new site is
set up, either as the new primary site or as the new secondary site.
* This is an important point for DR of any NAS: Just because the data is protected with
SyncIQ and the permissions are identical does not mean that the customer can access it
without shares or exports being created at the DR site. As discussed, both sites must be
made accessible to the client as seamlessly and with as little effort as possible.

Failover and Failback Definitions


Failover and failback are Isilon terminology for changing which location holds the read/write
data and should be considered the active data set. Failover is the semi-automated process to
enable read/write capabilities on the target. SyncIQ maintains the sync relationship between
the source and target during the process. Failback is the process of resynchronizing and
restoring read/write capabilities to the original source data and returning the target data to
read-only status. This includes the reverse synchronization updating the source data to the
current status of the target data.

Failover / Failback with SyncIQ


SyncIQ includes the ability to do a semi-automated failover and failback between Isilon
clusters. The sync relationship is preserved between the directories and clusters when a
failover is required. Breaking the relationship would require a complete synchronization job,
with an end-to-end comparison of the files, to re-establish the relationship after failing back to
the original source cluster. Data consistency is guaranteed through the use of snapshots and
the ability to reverse incomplete or failed changes to the last-known-good state. SyncIQ
includes the capability to interrupt sync jobs in progress with failover jobs, and the capability
to interrupt a failover job in progress and revert to the original source cluster. Failback does
not apply to SmartLock retention directories. The nature of retention and the immutable
state of the files on the original source prohibits failback.

DR Execution – Failover / Failback


Failover and failback processes are initiated by the administrator using the CLI or the web
administration interface. Failover must be initiated separately for each SyncIQ policy on the
target cluster. There is no global failover or failback selection. Failover/failback can be
performed with standard non-WORM or SmartLock directories with privileged deletes enabled or
disabled. Compliance SmartLock directories cannot be failed back unless you are using at
least OneFS 8.0.1. If you perform a failover on a Compliance SmartLock directory, you can
fail it back in OneFS 8.0.1 or 8.1; the software will detect any conflicts between different
versions of committed files, and retain both versions.
SyncIQ management procedures include never deleting mirror SyncIQ policies used for
failback. SyncIQ snapshots begin with SIQ- and should never be manually deleted.
Historically kept SyncIQ snapshots should be deleted according to the policy settings. Both
mirror policies and SIQ snapshots are baseline elements used by SyncIQ in normal
operations.

SyncIQ Failover Procedure


A failover can be performed in the web administration interface
or using the CLI. A failure is when the source cluster experiences downtime. SyncIQ
therefore assumes the source cluster is no longer available. Performing a failover makes no
changes on the source cluster. The failover is initiated on the target cluster using the isi
sync recovery allow-write command followed by the name or number of the policy you want to
fail over. The command creates the failover job and executes it.
The failover job prevents further synchronizations to the target on that specific policy. The
data under that policy is restored to the last-known-good snapshot. Then the read-only
restriction is removed from the SyncIQ domain for that policy.
The administrator then redirects the clients to the target for new file operations. At this point,
the users are connected to and accessing the target data.

Failover Site SyncIQ Domain Preparation


Here we look at two separate scenarios as part of the failover site preparation for the target.
Remember the process is for each separate sync job.
There are two scenarios: in the first, the last sync job completed successfully; in the
second, the last sync job did not complete successfully or failed mid-job.
The first part of the site preparation stage is to set the SyncIQ directories for the sync job to
no longer accept incoming sync requests. The system then takes a snapshot of the
directories for the sync job, labeled “-new”. The system then compares the “-new” snapshot
to the “-latest” or last-known-good snapshot. If they are the same and no differences are
found, the sync directories have the read-only bit removed and are placed into a read/write
state, ready to accept write activity.
In the case where a sync job has not completed, has failed, or was interrupted in progress, the
“-new” snapshot is taken as before and compared to the “-latest” last-known-good snapshot.
The differences in directories, files, and blocks are then reverted to the last-known-good
state. This process is also called snapshot revert. It restores the files to the last known
consistent state. All synchronized data in the difference between the snapshots is deleted.
Be aware that some data might be lost or unavailable on the target. After this has been
accomplished, the sync directories have the read-only bit removed and are placed into a
read/write state, ready to accept client write activity.
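The two scenarios follow the same sequence and differ only in whether a revert is needed. The Python sketch below models that sequence with plain dictionaries standing in for directory contents and snapshots. It is a conceptual illustration only: the class, field, and function names are hypothetical, and real snapshot comparison works on directories, files, and blocks rather than in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass
class TargetDomain:
    files: dict                # live contents of the sync directories
    latest: dict               # the "-latest" last-known-good snapshot
    accepts_sync: bool = True
    read_only: bool = True


def prepare_for_failover(domain):
    """Run the site preparation steps; return True if a revert was needed."""
    domain.accepts_sync = False              # stop incoming sync requests
    new_snapshot = dict(domain.files)        # take the "-new" snapshot
    needs_revert = new_snapshot != domain.latest
    if needs_revert:
        # Snapshot revert: discard partially synchronized data.
        domain.files = dict(domain.latest)
    domain.read_only = False                 # directories become read/write
    return needs_revert


if __name__ == "__main__":
    good = {"/ifs/data/a.txt": "v1"}
    partial = {"/ifs/data/a.txt": "v1", "/ifs/data/b.txt": "half-written"}
    print(prepare_for_failover(TargetDomain(files=dict(good), latest=good)))
    print(prepare_for_failover(TargetDomain(files=partial, latest=good)))
```

In the second call, the partially transferred file is dropped, which corresponds to the warning that some data might be lost or unavailable on the target.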

SyncIQ Failover Revert


So what is failover revert? It is undoing a failover job in progress. An administrator would use
failover revert if the primary cluster or original source cluster once again became available.
This could result from a temporary communications outage or in a failover test scenario.
Failover revert stops the failover job, restores the cluster to a sync-ready state, and
enables replication to the target cluster to continue without performing a failback.
Failover revert may occur even if data modification has occurred to the target directories. If
data has been modified on the original target cluster, then a failback operation must
be performed to preserve those changes; otherwise any changes to the target cluster data are
lost.
Failover revert is not supported for SmartLock directories. Before a failover revert can take
place, a failover of a replication policy must have occurred. On the original target cluster,
using the web administration interface, navigate to Data Protection > SyncIQ > Local Targets.
In the Local Targets table, select the row of the failover policy you want to revert, click
Disallow Writes, and confirm the operation by clicking Yes. This needs to be done for each
failover policy you want to revert.

Failback Semi-automated Process


Failback is a bit more complicated than failover, so we examine the process in detail.
The prerequisites are:
A sync policy has been failed over. The policy does not involve a SmartLock directory, and
the policy does not exclude any files or directories. The failback job must include all files and
directories of the original failover policy.
Let’s go over the procedure.
From the web administration interface on the source cluster, navigate to Data Protection >
SyncIQ > Policies.
In the row of the policy you want to fail back, click Prepare re-sync, type yes in the
confirmation, and then click Yes one more time. A mirror policy is created for each
replication policy on the secondary cluster. Mirror policies are named using the pattern
<replication_policy_name>_mirror.

Failback Semi-automated Process (cont’d)


On the secondary cluster, you begin to replicate data to the primary cluster by using the
mirror policies. You can replicate data either by manually starting the mirror policies or by
modifying the mirror policies and specifying a schedule. Isilon recommends that you
disallow client access to the secondary cluster and run each mirror policy again to make sure
all data is replicated.
Now on the primary cluster, navigate to Data Protection > SyncIQ > Local Targets, and for
each mirror policy, in the Local Targets table, in the row of the mirror policy, click Allow
Writes, and then in the confirmation dialog box, click Yes.
Now on the secondary cluster, once again navigate to Data Protection > SyncIQ > Policies,
and for each mirror policy, in the Policies table, in the row of the policy, click Prepare re-
sync, and then in the confirmation dialog box, type yes and then click Yes.
As you have witnessed here, the failback process has several more steps than the semi-
automated failover process.

Setting Up SyncIQ Failover Test


To test SyncIQ failover and failback you should create a test scenario. The scenario should
not interrupt normal cluster operations. If you have created DNS entries and aliases per
SmartConnect zone, the test scenario should allow testing without interruption. Create a
separate set of SyncIQ test directories. Each time you test, copy new test data into
the directories. This data can be a set of other files from the cluster. Delete some or all of the
previous test data to test SyncIQ policy deletions or copy policy operations. Create or use the
previous SyncIQ test policy or policies. Run the SyncIQ policies manually. Once completed,
the data should be ready for testing for failover and failback operations. You can also test
failover revert and other SyncIQ tests using the test scenario and policies.

Lesson 3: NDMP Backups


Upon completion of this lesson, you should be able to define design considerations for
backup and identify differences between two-way and three-way backup.

Backup Challenges


Let’s see how Isilon can help backup administrators meet backup challenges.
OneFS is an ideal file system for the backup-at-scale use case. A single container or
volume scales easily to multi-petabyte data sets. The NL-Series nodes also provide a
well-balanced combination of performance and storage density
for use as a backup target. In new installations, consider the Gen 6 archive or A-Series nodes
for density and capacity.
Because Isilon’s performance scales with capacity, we are able to meet aggressive SLAs even
with the increase of unstructured data storage.
Snapshots provide a fast way to keep the backup data current, which makes it easier to meet
recovery point objectives, and our scalable performance improves our recovery time if the
backup data needs to be retrieved.
With SyncIQ, we also support a remote backup site for disaster recovery.
As you can see, Isilon is a compelling solution for backup.

Qualifying Questions


Total capacity - Can a full backup be done in the customer’s backup window? If not, do they
split full backups and do the parts on different days?
Daily change rate - The change rate can be due both to editing of files already on the file
system and to new files added to the file system. This is important for determining the length of
incremental backups. Generally, file systems change only slightly on a daily basis, perhaps
~10-15%, but this can be highly variable depending on the workflow. Customers don’t always
have a good idea of their change rate, but you can get a rough idea from the cluster (via CLI
commands, the web administration interface, or InsightIQ).
Backup schedule - A common backup schedule is weekly fulls, daily incrementals, and some
combination of monthly & yearly fulls. If a full backup takes more than one day (or one
backup period), they may be split into two or more pieces. For example, do a full backup of
half the data on one day, and the other half on the next day.
Backup window - How long do backups have to finish? How busy is the cluster during
backups? Of course, backups are generally done at night during times of low to no activity,
but this is not always the case. Some customers will use their clusters around the clock, and
some customers will let backups run to completion, even if the backup runs during the day.

Qualifying Questions (cont'd)


File system information - The total number of files is important, as a large number of files
will take longer to back up than a small number of files. The size of the files is also important.
A lot of small files will take much longer to back up than a small number of large files that
equate to the same amount of capacity. Backup Accelerators can achieve ~110 MB/s
throughput for files as small as 64 kB.
Backup infrastructure - How many tape devices or virtual tape devices are available for
backing up the cluster? If the backup goes over the LAN, is there a dedicated backup LAN? Is
the LAN GigE or 10 GigE? If the backup uses Fibre Channel, is the SAN dedicated to backup?
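Several of these answers combine into a first-pass estimate of whether backups fit the window. The Python sketch below shows the arithmetic with made-up inputs (100 TB of data, a 10% daily change rate, 2.0 TB/h of sustained throughput, and an 8-hour window); none of these figures come from the course, and the function names are hypothetical. It deliberately ignores file count and file size, which the questions above flag as major factors, so treat the result as an optimistic estimate.

```python
import math


def backup_hours(data_tb, throughput_tb_per_hour):
    """Hours needed to move data_tb at a sustained throughput."""
    return data_tb / throughput_tb_per_hour


def full_backup_pieces(capacity_tb, throughput_tb_per_hour, window_hours):
    """Number of backup windows a full backup must be split across."""
    hours = backup_hours(capacity_tb, throughput_tb_per_hour)
    return math.ceil(hours / window_hours)


def incremental_fits(capacity_tb, daily_change_rate,
                     throughput_tb_per_hour, window_hours):
    """True if one day's changed data can be backed up in one window."""
    changed_tb = capacity_tb * daily_change_rate
    return backup_hours(changed_tb, throughput_tb_per_hour) <= window_hours


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(full_backup_pieces(100, 2.0, 8))      # 50 h of work -> 7 windows
    print(incremental_fits(100, 0.10, 2.0, 8))  # 5 h of work  -> True
```

If the number of pieces exceeds the days between full backups in the schedule, that is an early sign that a snapshot and replication strategy deserves discussion.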

Solution Design and Workflow Profile Document


In order to better serve the customer, make sure to update your Solution Design & Workflow
Profile documents throughout the customer discovery process.

Isilon Supportability


Isilon supports all the major backup vendors (Networker, NetBackup, CommVault, TSM, etc.).
Isilon supports LTO-3, 4, & 5 currently as well as VTL vendors (list maintained in the
compatibility guide).
For large clusters with tens to hundreds of millions (or billions) of files, backup (and restore)
can become unrealistic, and a replication or snap-and-replicate strategy needs to be considered.
Backing up a subset of a large cluster's data set is viable. For compliance, security, or DR
reasons, backing up a few directories of a large file system to an offsite media is a good
For large clusters, or those with hundreds of millions of files, backups are almost impossible to
complete in a reasonable time frame. And even if backups are made, restoring that data can
take much longer than a business can tolerate. However, with the proper number of Backup
Accelerators, it is possible to back up large clusters, though this is becoming increasingly
uncommon. NDMP access inherently involves a third-party application, in that NDMP itself is
simply a protocol Isilon offers for backups.

NDMP Backup Options


With Isilon you have two options for NDMP backup: two-way using the Backup Accelerator,
or three-way transferring data across the IP network.
Two-way (Direct)
• Backup application manages the backup
• Backup Accelerator writes data directly to LTO tape over FC
• Data traverses the InfiniBand back-end network
Three-way (Remote)
• Backup application controls the backup
• Data traverses the front-end network (LAN)

Backup Accelerator for Tape Backup


A companion hardware product to the Isilon nodes is the Backup Accelerator, which is
designed specifically to move data as quickly as possible to tape using the Network Data
Management Protocol (or NDMP) that is well established in the market. Virtually all backup
applications and tape-storage companies support NDMP.
Each Backup Accelerator can support multiple streams of data in excess of 100 MB/s. The
number of Backup Accelerators you need is determined by the backup policies in place and
the size of the backup data set.
A key benefit of the Backup Accelerator is that it features Fibre Channel connectivity to talk
to the tape system. Tape automation is typically located on a SAN, allowing Isilon with
Backup Accelerator to drop into existing environments.

Backup Accelerator Performance Benefits


File system-based and remote three-way backups are slower and more resource-intensive
for large clusters, so Isilon Backup Accelerator nodes should be deployed for efficiency and
scale, even if targeting only a subset of data. A large memory cache and multiple processors
allow the Backup Accelerators to pre-fetch data from the file system and efficiently push the
data to tape. Backup Accelerators offload some of the processing overhead from the cluster
during backups. The high-speed, low-latency data path from the storage nodes via InfiniBand
to a tape library via Fibre Channel provides more consistent, higher performance than
LAN-based backups.

Isilon Solution Design 371

Two-Way or Direct Backup


An example architecture using two-way NDMP backup with the Backup Accelerator.

Isilon Solution Design 372

Two-Way or Direct NDMP Peak Performance


Peak concurrent streaming performance requires LTO-4 or better drives. LTO-3 drives won’t
support those sustained speeds.
Peak performance is unlikely to be achieved in most environments, and is affected
by many variables, such as the number of files, the directory structure, file sizes, the
type of nodes, and the cluster workload.

Isilon Solution Design 373

Three-Way or Remote Backup


An example architecture using three-way NDMP backup without the Backup Accelerator.

Isilon Solution Design 374

Example 1: Direct Backups


Best case, a Backup Accelerator has a throughput to LTO-4 equivalent to 1.7 TB/h.
Six S200 nodes - 13.2 TB, 400 GB SSD = 59 TB usable
At 85% full (50TB), a full backup would take:
• 29 hours with 1 Backup Accelerator
• 14.5 hours with 2 Backup Accelerators
• 10 hours with 3 Backup Accelerators
Conclusion: Full backup is realistic with 2 or 3 Backup Accelerators
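The estimate in this example can be reproduced with a short calculation. The following is a minimal sketch, assuming the best-case 1.7 TB/h per Backup Accelerator quoted here and assuming that throughput scales linearly with the number of accelerators; the function name is illustrative only.

```python
def full_backup_hours(data_tb: float, accelerators: int,
                      tb_per_hour_each: float = 1.7) -> float:
    """Estimate hours for a full backup.

    Assumes every Backup Accelerator sustains the same best-case rate
    and that the rates add up linearly (an idealization).
    """
    if accelerators < 1:
        raise ValueError("at least one Backup Accelerator is required")
    return data_tb / (accelerators * tb_per_hour_each)


# 50 TB is the 85%-full data set from the example.
for count in (1, 2, 3):
    print(f"{count} Backup Accelerator(s): {full_backup_hours(50, count):.1f} hours")
```

The results (about 29.4, 14.7, and 9.8 hours) land close to the rounded figures on the slide. Real backups are likely to take longer, since peak throughput is rarely sustained.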

Isilon Solution Design 375

Example: Direct Two-Way Backup


This is why backing up clusters is not the norm. It’s just not practical in most cases. Roughly
half of our customers back up some or all data on their clusters, but the rest use a snapshot
and replicate strategy. As a cluster grows, snapshot and replicate is really the only viable solution.

Isilon Solution Design 376

Backup Accelerators Sizing Guidelines


You can have as many Backup Accelerators (BAs) as needed in a single cluster as long as you
don’t exceed the following guidelines. Following these node-to-BA ratios ensures that
there is enough spindle performance to drive a consistent stream of data to the BA. Backup
Accelerator sizing recommendations:
Every platform should be sized with:
• One BA for the first three nodes
• One BA for every two additional S-Series nodes
• One BA for every three additional X-Series nodes
• One BA for every three or four additional NL-Series nodes
At the time of publication, no Backup Accelerator sizing recommendations are available for
Gen 6 nodes. These scenarios should be referred for
consultation until guidance becomes available.
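The ratios above can be turned into a quick upper-bound check. This is a sketch under stated assumptions: it reads the guidelines as a maximum, rounds partial groups of additional nodes down, and uses three nodes per BA for NL-Series (the guideline says three or four). The function and dictionary names are hypothetical.

```python
# Additional nodes (beyond the first three) that justify one more BA.
# NL-Series is given as "three or four"; 3 is an assumption made here.
ADDITIONAL_NODES_PER_BA = {"S": 2, "X": 3, "NL": 3}


def max_backup_accelerators(node_count: int, series: str) -> int:
    """Upper bound on Backup Accelerators for a single-series cluster."""
    if node_count < 3:
        raise ValueError("the guideline starts at three nodes")
    additional_nodes = node_count - 3
    # One BA for the first three nodes, plus one per full group of additional nodes.
    return 1 + additional_nodes // ADDITIONAL_NODES_PER_BA[series]


print(max_backup_accelerators(6, "S"))   # six S-Series nodes
print(max_backup_accelerators(9, "X"))   # nine X-Series nodes
```

For a cluster that mixes node series, the guideline does not say how to combine the ratios, so this sketch does not attempt it.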

Isilon Solution Design 377

Considerations: Backups


Finally, while full backups of high capacity clusters may or may not be feasible, don’t forget
that it’s the restore that really matters. Apart from other considerations, is a full restore of
the data set feasible?
What are the RTOs/RPOs for backup? And for restore? Restores may take an incredibly long
time and the data restored from tape may or may not be in a consistent state. Consider two-way,
three-way, or file system backup/restore. Because the Backup Accelerators (BAs) move data over the IB
back-end through FC to tape, they provide the highest and most consistent performance
for backups. Determine the bandwidth requirements. How much data, as a percentage, is
changing between backups? How many files are there, and what is the average file size? Big
data often requires either longer time periods or more bandwidth or both to perform backups.
Best case performance for BAs backing up to LTO-4 or LTO-5 drives is 120MB/s, with four
streams providing up to 1.7TB/hr per BA.
LAN performance is generally limited by the network when using GigE ports, which can
provide up to ~100MB/s throughput. In reality, peak throughput will fall short of this, and is
often as low as 10MB/s, even on a 10GbE link. On a direct connection through a 40GbE link,
throughput could in principle reach 4GB/s, but that scenario would require an essentially
dedicated link from a cluster node to the backup infrastructure.
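To compare these figures on one scale, the rates can be converted to TB per hour. The sketch below uses decimal units (1 TB = 1,000,000 MB), which is an assumption; the labels and the helper name are illustrative.

```python
def mb_per_s_to_tb_per_hour(mb_per_s: float) -> float:
    """Convert MB/s to TB/h using decimal units (1 TB = 1,000,000 MB)."""
    return mb_per_s * 3600 / 1_000_000


# Rates quoted in the text above, all expressed in MB/s.
rates_mb_per_s = {
    "BA, four streams at 120MB/s": 4 * 120,
    "GigE, nominal ~100MB/s": 100,
    "LAN, observed low end 10MB/s": 10,
    "40GbE direct link, theoretical 4GB/s": 4000,
}

for label, rate in rates_mb_per_s.items():
    print(f"{label}: {mb_per_s_to_tb_per_hour(rate):.3f} TB/h")
```

Four streams at 120MB/s work out to 1.728 TB/h, which matches the "up to 1.7TB/hr per BA" figure.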
Although it’s not exactly a backup, due to the high capacities and large file counts, using
snapshots and replicating to a DR site is a common “backup” strategy. Roughly half of all
Isilon customers use this approach instead of backups. One drawback of the snap and
replicate approach is the lack of a catalog. You have to know what you want to restore, or
search through a snapshot. A snap and replicate strategy on Isilon with OneFS protects
against accidental or intentional deletions as well as data corruption, and provides a DR copy
of the data and file system. Keep in mind that snapshots are not writeable, though a
snapshot of a single file can be made writeable (referred to as a clone). Restoring from a
snapshot is much faster than restoring from tape or disk. The choice between backups and
snap/replicate is based on RPO/RTO, as well as customer preference (assuming they have an
option). Some customers insist on backing up a copy to tape for off-site storage. If that’s not
required, then snapshots and replication may be preferred. Bear in mind that
some industries have archival requirements that, even if they make no sense on paper,
reflect regulatory needs.
Backing up anything but smaller clusters using the front-end client connectivity requires
large backup windows in most cases. The use of Backup Accelerators enables the back-end
to be used to consolidate the data from the nodes. The accelerators connect over
Fibre Channel to the backup tape library or virtual tape library system and gain greater
backup efficiencies. Backup to virtual tape library systems is recommended for further
performance gains. If possible, use systems such as Data Domain with in-line deduplication
capabilities to improve remote backup bandwidth efficiencies and storage efficiencies.

Isilon Solution Design 379

Avamar/Isilon Integration


Backing up an Isilon cluster with EMC’s Avamar (a source-based deduplication backup
solution) will be possible with the 7.0 release of Avamar. This will also require the use of
Avamar’s NDMP Accelerator device. Avamar’s backup methodology is to do daily synthetic
full backups. After the first backup, which must be a traditional full backup, Avamar does
daily incremental backups, which can be used in combination with the initial full backup to
create daily synthetic full backups. Because Avamar uses a hashing strategy on the source
to keep track of changes, incremental backups are extremely fast.
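The general idea of hash-based change tracking and synthetic fulls can be sketched as follows. This is a generic illustration, not Avamar's implementation: the choice of SHA-256, the in-memory dictionaries, and all function names are assumptions made for the example.

```python
import hashlib
from typing import Dict


def content_hash(data: bytes) -> str:
    """Fingerprint file content; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).hexdigest()


def changed_files(previous_hashes: Dict[str, str],
                  current_files: Dict[str, bytes]) -> Dict[str, str]:
    """Return {path: hash} for files that are new or whose content changed."""
    changed = {}
    for path, data in current_files.items():
        fingerprint = content_hash(data)
        if previous_hashes.get(path) != fingerprint:
            changed[path] = fingerprint
    return changed


def synthetic_full(previous_hashes: Dict[str, str],
                   current_files: Dict[str, bytes]) -> Dict[str, str]:
    """Build a full catalog by reusing unchanged entries and adding the changes."""
    catalog = {p: h for p, h in previous_hashes.items() if p in current_files}
    catalog.update(changed_files(previous_hashes, current_files))
    return catalog


day1 = {"/ifs/a": b"alpha", "/ifs/b": b"beta"}
day2 = {"/ifs/a": b"alpha", "/ifs/b": b"beta v2", "/ifs/c": b"gamma"}
full = synthetic_full({}, day1)              # first backup: everything is new
print(sorted(changed_files(full, day2)))     # only the changed and new paths
```

Only the files reported by `changed_files` would need to be transferred on the second day; the catalog for that day still describes the complete data set.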

Isilon Solution Design 380

Module 6: Authentication and Authorization


Upon completion of this module, you will be able to identify Isilon supported protocols,
differentiate authentication vs. authorization, examine identity management, and implement
RBAC management.

Isilon Solution Design 381

Lesson 1: Client Protocol Support


Upon completion of this lesson, you will be able to list the client protocols Isilon supports
and explain why these protocols are so important in the multi-tenancy environment of a single Isilon
cluster. You should also be able to explain why Isilon manages user identities to support
multiple client protocols.

Isilon Solution Design 382

Overview: Multiprotocol


Multiple network protocols can be configured if a network supports many types of clients. In
such a multiprotocol environment, authentication sources for both Windows and UNIX
users, including Active Directory, LDAP, and NIS, must be correctly configured. OneFS creates
a heterogeneous environment with different types of network access to files that
accommodates multi-tenancy. The graphic illustrates a typical real-world requirement that
administrators place on any NAS storage solution: it should
be able to provide interoperability of UNIX and Windows clients accessing the same data
from the same file system.

Isilon Solution Design 383

Qualifying Questions


To determine the best permission model for an environment, it is critical to first determine
what behavior is expected or required. The three most important questions in determining
which permission model best suits an environment are:
• Do you even have a multiprotocol environment? True multiprotocol means the same
data is accessed over both SMB and NFS.
• What sources of authentication data exist in the environment? For example, Active
Directory, LDAP, NIS, and so on.
• Are the sources complete and consistent?
The most important point to take away from this lesson is that identity
management can quickly get complicated if the customer has an unusual implementation of
authentication schemes or identity schemes.
Your best fallback is to reach out to a CSE (Customer Support Engineer). Mapping identities
between Windows and UNIX directories can be challenging. Policies can help, and mapping
can be manipulated via the command line and potentially scripted. When implementing, it is
recommended to associate with LDAP, NIS, or file providers before any SMB client connects.
Directory services must be accessible by all nodes supporting front-end client connections.

Isilon Solution Design 384

Primary Storage Protocols


The two primary client access protocols supported by OneFS are NFS and SMB.
Multiple versions of each protocol are supported. OneFS supports several network
communication protocols that enable user access to the files and folders
stored on an Isilon cluster.
Network File System (NFS) - Isilon provides a highly scalable NFSv3 and NFSv4
service with dynamic failover for best-in-class availability for UNIX, Linux, FreeBSD,
and similar clients.
Server Message Block (SMB) - Isilon provides a highly scalable native SMB
implementation supporting Windows clients that use SMB 1.0, SMB 2.0, SMB 2.1, or
SMB 3.0.

Isilon Solution Design 385

Overview: SMB


The Isilon cluster’s default file share (/ifs) gives Windows users access to file system
resources over the network, including resources stored by UNIX and Linux systems. Now,
with Windows XP being EOL (end of life), more and more enterprises are moving to Windows
7 or Windows 8.
Windows 8 supports SMB 3.0 with Continuous Availability (CA) and the Witness protocol,
which are critical to ensure non-disruptive operations on client machines. CA provides
Windows/SMB-based clients the ability to continue file operations during both planned and
unplanned network or storage node outages.
These features provide big benefits to any environment where disruption of client
connectivity or restart of mission-critical applications can cause significant downtime,
translating to significant money and time loss.

Isilon Solution Design 386

Overview: NFSv3 and NFSv4


For NFS, OneFS works with versions 3 and 4 of the Network File System protocol (NFSv3, and
NFSv4). The Isilon cluster’s default export (/ifs) enables Linux and UNIX clients to remotely
mount any subdirectory, including subdirectories created by Windows users. Linux and UNIX
clients can also mount ACL-protected subdirectories that a OneFS administrator created.
In OneFS versions prior to OneFS 8.0, when an NFSv4 client connects to the cluster, it
connects to a single node. If this node goes down or if there is a network
interruption between the client and the node, the NFSv4 client has to reconnect to the
cluster manually. This is due in part to the stateful nature of the protocol. This is an issue
because it is a noticeable interruption to the client’s work. Too many disconnections would
also prompt clients to open help desk tickets with their local IT department to determine the
nature of the interruption or disconnection.

Isilon Solution Design 387

NFSv3 and NFSv4 Compared


NFSv3 does not track state. A client can be redirected to another node, if configured, without
interruption to the client. NFSv4 tracks state, including file locks. Automatic failover is not an
option in NFSv4.
Because of the advances in the protocol specification, NFSv4 can use Windows Access
Control Lists (ACLs). Technically, NFSv4 ACLs are different from Windows ACLs, but there is
sufficient overlap between the two that they can be considered interoperable. NFSv4
mandates strong authentication. It can be used with or without Kerberos. NFSv4 also drops
support for UDP communications and uses only TCP, because of the need for larger packet
payloads than UDP will support.
File caching can be delegated to the client: a read delegation implies a guarantee by the
server that no other clients are writing to the file, while a write delegation means no other
clients are accessing the file at all. NFSv4 adds byte-range locking, moving this function into
the protocol; NFSv3 relied on NLM for file locking.
NFSv4 exports are mounted and browsable in a unified hierarchy on a pseudo root (/)
directory. This differs from previous versions of NFS.

Isilon Solution Design 388

What is NFSv4 Continuous Availability?


As of OneFS 8.0, Isilon offers the continuously available (CA) feature. This option allows
NFSv4 clients to transparently fail over to another node in the event of a network or node
failure. This feature is part of Isilon's non-disruptive operation initiative to give customers
more options for continuous work and less downtime. The CA option allows seamless
movement from one node to another with no manual intervention on the client side. This
enables a continuous workflow from the client side with no apparent disruption to
their working time. CA supports home directory workflows as well.

Isilon Solution Design 389

Considerations: NFS CA Configuration


In OneFS 8.0 and later, NFSv4 CA is enabled by default. This won’t affect the majority of
customers that use NFSv4 with a static IP address pool; however, a customer using
NFSv4 with a dynamic IP address pool will notice a significant drop in the performance
of this pool. The best practice is currently to use NFSv4 with a static pool because NFSv4 acts
and functions similarly to SMB. In the rare instances in which a customer decided, or was
inadvertently told, to use a dynamic pool, upgrading to OneFS 8.0 or later
will decrease the performance of these pools. Review the
current pool types and explain the effects to those customers prior to
upgrading to OneFS 8.0 or more recent versions.

Isilon Solution Design 390

Additional NFS Improvements


Prior to OneFS 8.0, Isilon supported up to 1,000 exports; however, many customers required
or requested a larger number of exports. With OneFS 8.0 and later, in order to meet the
demands of large and growing customers, Isilon supports up to 40,000 exports.

Isilon Solution Design 391

Overview: FTP


Isilon provides support for file transfers to and from the cluster using a standard File
Transfer Protocol (FTP) service and ports. FTP and FTPS run through the shared FTP service
daemon, vsftpd. SFTP runs through sshd.
The Isilon cluster supports FTP access; however, by default the FTP service is disabled. OneFS
includes a secure FTP service called vsftpd, which stands for Very Secure FTP Daemon, that
you can configure for standard FTP and FTPS file transfers. Any node in the cluster can
respond to FTP requests, and any standard user account can be used.
When configuring FTP access, ensure that the specified FTP root is the home directory of the
user who logs in. For example, the FTP root for local user jsmith would be /ifs/home/jsmith.
You can enable the anonymous FTP service on the root by creating a local user named ftp.
The FTP root can be changed for any user by changing the user’s home directory.
The recommended limit of FTP connections per node is 450. This is the tested limit. The
number assumes that of the 450 FTP connections, 400 are idle and 50 are active at a time. If
the number of FTP connections to a node exceeds 450, then FTP performance might be
affected. The guideline of 450 connections per node assumes anonymous access that
requires no authentication.
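The per-node guideline gives a rough way to estimate how many nodes are needed to serve a given number of FTP connections. The sketch below assumes connections are spread evenly across nodes and that the 450-connection tested limit applies unchanged; the function name is hypothetical.

```python
import math

TESTED_FTP_CONNECTIONS_PER_NODE = 450  # tested limit quoted in the text


def min_nodes_for_ftp(total_connections: int,
                      per_node_limit: int = TESTED_FTP_CONNECTIONS_PER_NODE) -> int:
    """Smallest node count that keeps every node at or below the limit."""
    if total_connections < 0:
        raise ValueError("connection count cannot be negative")
    return math.ceil(total_connections / per_node_limit)


print(min_nodes_for_ftp(1000))  # 1,000 concurrent FTP connections
```

Authenticated sessions or a higher share of active connections than the tested 50 out of 450 may lower the practical limit per node.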

Isilon Solution Design 392

Overview: FTPS and SFTP


Generally speaking, FTP in its basic form is not secure. FTPS takes the security up a step in
that it allows you to secure all or part of a session (at the cost of speed), and the SFTP
protocol is used to ensure that all file transmission will be secure and efficient.
FTP over SSL (FTPS, as it’s commonly known) allows for the encryption of both the Control
and Data Connections either concurrently or independently. This is important because the
negotiation of the SSL connection is time-consuming, and having to do it twice (once for the
Data Connection and once for the Control Connection) can be expensive if a client plans to
transfer a large number of small files.
SFTP (SSH File Transfer Protocol) is a relatively new protocol developed in the 1990s,
which allows for the transfer of files and other data over a connection that has previously
been secured using the Secure Shell (SSH) protocol. SFTP is similar to FTPS in that both
protocols communicate over a secure connection, but that’s basically where the similarities end.
Slides Sources:
1. FTPS Server by Rex Yuan

Isilon Solution Design 393

The Two Modes of FTPS


There are two modes of FTPS: Explicit and Implicit. In Explicit mode, an FTPS client must
explicitly request security from an FTPS server and then step up to a mutually agreed
encryption method. Clients can determine which mechanisms are supported by
querying the FTPS server. Common methods of invoking FTPS security included: AUTH
In Implicit mode, by contrast, negotiation is not allowed. A
client is immediately expected to challenge the FTPS server. Also, in order to maintain
compatibility with existing non-TLS/SSL-aware FTP clients, implicit FTPS was expected to
listen on port 990/TCP for the FTPS control channel and 989/TCP for the FTPS data channel.
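As an illustration of Explicit mode from the client side, Python's standard `ftplib.FTP_TLS` class connects on the normal FTP control port and then requests TLS before sending credentials. The sketch below is an assumption-laden example: the host, user, and password are placeholders, port 21 is the conventional FTP control port rather than something stated in this guide, and nothing here is specific to OneFS.

```python
import ftplib

# Ports for Implicit mode, as given in the text above.
IMPLICIT_FTPS_CONTROL_PORT = 990
IMPLICIT_FTPS_DATA_PORT = 989


def open_explicit_ftps(host: str, user: str, password: str,
                       port: int = 21) -> ftplib.FTP_TLS:
    """Open an Explicit-mode FTPS session (the caller closes it with quit())."""
    ftps = ftplib.FTP_TLS()
    ftps.connect(host, port)     # plain connection on the control port
    ftps.login(user, password)   # FTP_TLS.login() secures the control channel first
    ftps.prot_p()                # request protection for the data connection as well
    return ftps


# Hypothetical usage, not executed here:
# session = open_explicit_ftps("cluster.example.com", "jsmith", "password")
# print(session.nlst())
# session.quit()
```

`ftplib.FTP_TLS` is built around the explicit step-up; a client for Implicit mode would instead have to start TLS immediately on port 990.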

Isilon Solution Design 394

SFTP Overview


SFTP is technologically superior to FTPS. It is a good idea to implement support for both
protocols, although they differ in concepts, supported commands, and many
functional areas.
It may be a good idea to use FTPS when you have a server that needs to be accessed from
personal devices (smartphones, PDAs, etc.) or from specific operating systems that
have FTP support but don’t have SSH/SFTP clients. If you are building a custom security
solution, SFTP is probably the better option.
FTPS (FTP over SSL) vs. SFTP (SSH File Transfer Protocol): what to choose

Isilon Solution Design 395

Overview: HTTP and HTTPS


Hypertext Transfer Protocol (HTTP) - Isilon provides an HTTP service for Web-based file
access and administration of the cluster as well as support for distributed authoring and
versioning (DAV). The REST API and WebDAV access both run over HTTP, which is delivered
by the installed web service daemon.
The procedure for encrypting information and then exchanging it is called Hypertext
Transfer Protocol Secure (HTTPS). With HTTPS, even if someone between the sender and the
recipient intercepts the message, they cannot understand it. Only the sender and the
recipient, who know the "code," can decipher the message.
2. Isilon OneFS V8.0.0 Web Administration Guide

Isilon Solution Design 396

HDFS Overview


The Hadoop Distributed File System (HDFS) protocol enables a cluster to work with Apache
Hadoop, a framework for data-intensive distributed applications. HDFS integration requires
you to activate a separate license.
Hadoop is an open-source platform that runs analytics on large sets of data across a
distributed file system. In a Hadoop implementation on an EMC Isilon cluster, OneFS acts as
the distributed file system and HDFS is supported as a native protocol. Clients from a
Hadoop cluster connect to the Isilon cluster through the HDFS protocol to manage and
process data.
Also, Hadoop support on the cluster requires you to obtain an HDFS license.

Isilon Solution Design 397

HDFS in OneFS 8.0+


In OneFS 8.0, the Isilon engineering team made the decision to provide a robust and scalable
version of HDFS for this and all future releases. Starting in OneFS 8.0, the HDFS protocol was
entirely rewritten to increase processing parallelism and scalability, as well as to add a web
administration interface, additional support for auditing, CloudPools, and SMB file filtering.
With this rewrite OneFS has a new foundation, purpose built, to support continued future
HDFS innovations.

Isilon Solution Design 398

Why Isilon for HDFS?


When we partner with any Hadoop vendor, the areas where Isilon adds value and ultimately lowers the
cost of the implementation are the following:
Total Cost of Ownership: Depending on the size of the Hadoop infrastructure, the white box
technology will be more expensive when you factor in the number of servers needed for
replica copies and data locality. You will need to buy 3x-5x the capacity that you really
need because of Hadoop's mirroring. This leads to a larger data center footprint
for Hadoop. You will ultimately have to power and cool more physical machines than
you would with a Hadoop infrastructure that consists of compute nodes and Isilon nodes.
Independent Scaling: In the traditional Hadoop DAS-based model, when you need more
capacity you are buying more compute, and you are buying it at 3x-5x, and in some cases
higher. With Isilon, if you need 500TB of capacity you are buying 500TB of usable capacity
and only buying the compute that you need to run your analytics. If you decide that you
need more compute to run your analytics, you buy more compute nodes. Independent
scaling of the compute and storage is one of the major benefits of this architecture.
Disaster Recovery: When a Hadoop environment is built to make business decisions or
help drive marketing, the data that is produced is extremely valuable. In traditional DAS-based
Hadoop implementations, replication is not really practical. Isilon enables you to
replicate all of the Hadoop data, some of the Hadoop data, or just the data that
Hadoop analytics outputs from the queries run. Leveraging SyncIQ (SIQ) and snapshots gives
enterprises the flexibility of having a robust analytics tool with enterprise replication and
local recovery technology.
Immediate access to Hadoop Data: Isilon enables Hadoop data to be viewed over NFS and
SMB immediately without having to move data out of HDFS itself. Isilon treats the HDFS data
as data that can be processed by any of the protocols that Isilon supports. Traditional
Hadoop clusters do not allow for this.
Tech refreshes: When going with a white box technology model you will have to refresh the
hardware every 3-4 years which means that any data that is currently in Hadoop will need to
be migrated from the old infrastructure to the new infrastructure. With Isilon you can stand
up a new infrastructure and cut it over to Isilon and have all of the data readily available. It
also allows for enterprises to have multiple flavors of Hadoop stood up in their environment
and have shared access to Hadoop data. When it comes time to refresh Isilon, all you need
to do is add the new nodes into the cluster and the OS will automatically migrate data to the
new nodes and take the old nodes out of the cluster without any manual intervention.
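The capacity argument made under Total Cost of Ownership and Independent Scaling can be illustrated with simple arithmetic. This sketch assumes that the 3x-5x figure maps directly to the number of copies kept in a DAS-based Hadoop cluster, and it ignores any protection overhead inside the Isilon cluster because the text quotes usable capacity; the function name is hypothetical.

```python
def raw_capacity_needed_tb(usable_tb: float, copies: int) -> float:
    """Raw capacity to purchase when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("at least one copy of the data is required")
    return usable_tb * copies


usable_tb = 500  # the 500TB example from the text
for copies in (3, 5):
    raw = raw_capacity_needed_tb(usable_tb, copies)
    print(f"{copies} copies: buy {raw:.0f} TB raw for {usable_tb} TB of data")
```

With three to five copies, the 500TB example requires 1,500 to 2,500 TB of raw DAS capacity, along with the servers, power, and cooling that come with it.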

Analytics-ready Storage Choices


EMC has developed the Hadoop-as-a-Service (HDaaS) reference architecture called the
Hadoop Starter Kit to provide a step-by-step guide for quickly and easily deploying any
distribution of choice using the VMware and Isilon technologies we just covered.
The EMC Hadoop Starter Kit enables this consolidation and virtualization of Hadoop. We
have found that EMC Isilon customers with virtualized environments using VMware can
deploy the Hadoop distribution of their choice for minimal investment of time and money.
The EMC Hadoop Starter Kit enables customers to stand up a Hadoop infrastructure in just a
few hours with simple downloads of free software and documented configuration steps
leveraging automation provided by VMware Big Data Extensions.
HSK supports VMware vSphere Big Data Extensions with the following new features for the
rapid deployment of a Hadoop cluster on a VMware vSphere virtual platform.
• Support for Major Hadoop Distributions. Big Data Extensions includes support for all the
major Hadoop distributions including Apache Hadoop, Cloudera, Hortonworks, and
PivotalHD. Customers can easily deploy Hadoop distributions of their choice on a flexible,
scalable compute and storage solution using EMC Isilon.
• Quickly Deploy, Manage, and Scale Hadoop Clusters. Big Data Extensions enables the rapid
deployment of Hadoop clusters on VMware vSphere. You can quickly deploy, manage, and
scale Hadoop nodes using the virtual machine as a simple and elegant container. Big Data
Extensions provides a simple deployment toolkit that can be accessed through VMware
vCenter Server to deploy a highly available Hadoop cluster in minutes using the Big Data
Extensions user interface.
• Graphical User Interface Simplifies Management Tasks. The Big Data Extensions plug-in, a
graphical user interface integrated with vSphere Web Client, lets you easily perform
common Hadoop infrastructure and cluster management administrative tasks.
• Elastic Scaling Lets You Optimize Cluster Performance and Resource Usage. Elasticity-enabled
clusters start and stop virtual machines automatically and dynamically to optimize
resource consumption. Elasticity is ideal in a mixed workload environment to ensure that
high priority jobs are assigned sufficient resources. Elasticity adjusts the number of active
compute virtual machines based on configuration settings you specify.

Isilon Solution Design 401

Overview: Swift


In general, OpenStack Object Storage (Swift) provides redundant, scalable distributed object
storage using clusters of standardized servers. “Distributed” means that each piece of the
data is replicated across a cluster of storage nodes. The number of replicas is configurable,
but should be set to at least three for production infrastructures.
Objects in Swift are accessed via the REST interface, and can be stored, retrieved, and
updated on demand. The object store can be easily scaled across a large number of servers.
Isilon Swift enables you to access file-based data stored on your EMC Isilon cluster as objects.
The Swift API is implemented as a set of Representational State Transfer (REST) web services
over HTTP or secure HTTP (HTTPS). Content and metadata can be ingested as objects and
concurrently accessed through other supported EMC Isilon protocols.
2. Isilon OneFS Version 8.0.0 Web Administration Guide
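To make the REST access model concrete, the sketch below builds (but does not send) a GET request for one object, following the OpenStack Swift convention of a `/v1/{account}/{container}/{object}` path and an `X-Auth-Token` header. The endpoint, account, container, object, and token values are all hypothetical, and the exact endpoint and account naming on an Isilon cluster should be taken from the OneFS documentation rather than from this example.

```python
import urllib.parse
import urllib.request


def build_swift_get(endpoint: str, account: str, container: str,
                    object_name: str, token: str) -> urllib.request.Request:
    """Build a GET request for one object in the Swift path convention."""
    # Percent-encode each path segment; "/" is kept so object names can be nested.
    path = "/".join(urllib.parse.quote(part)
                    for part in (account, container, object_name))
    return urllib.request.Request(
        f"{endpoint}/v1/{path}",
        headers={"X-Auth-Token": token},  # token obtained from a prior auth step
        method="GET",
    )


request = build_swift_get("https://cluster.example.com", "AUTH_demo",
                          "photos", "2017/cat.jpg", "t0k3n")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would return the object body; the same path with PUT or DELETE stores or removes the object.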

Isilon Solution Design 402



The table describes some of the guidelines and considerations for configuring protocols for
OneFS. The Isilon Technical Specifications Guide has the complete listing of these guidelines
and recommendations for configuring OneFS and IsilonSD Edge. Configuration guidelines
are provided for protocols, file system features, software and hardware components, and
network settings.

Isilon Solution Design 403

Serviceability and Troubleshooting


There are many commands, tools, log files and jobs that can be viewed to assist in
troubleshooting the cluster functionality.

Isilon Solution Design 404

Lesson 2: Authentication and Authorization


Upon completion of this lesson, you will be able to differentiate authentication and
authorization, explain access zone functions, enumerate supported authentication systems,
outline Isilon authentication architecture, and understand best practices in access zone

Isilon Solution Design 405

Layers of Access


Interactions with an Isilon cluster have four layers in the process. The first layer is the
protocol layer. This may be Server Message Block, or SMB; Network File System, or NFS; File
Transfer Protocol, or FTP; or some other protocol but this is how the cluster is actually
reached. The next layer is authentication. The user has to be identified using some system,
such as NIS, local files, or Active Directory. The authentication layer is the topic of this
module. The third layer is identity assignment. Normally this is straightforward and based on
the results of the authentication layer, but there are some cases where identities have to be
mediated within the cluster, or where roles are assigned within the cluster based on a user’s
identity. We examine some of these details later in this module. Finally, based on the
established connection and authenticated user identity, the file and directory permissions
are evaluated to determine whether or not the user is entitled to perform the requested
data activities.
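The four layers can be pictured as a chain of checks, each of which must pass before the next is evaluated. The following toy model is only a sketch: the dictionaries stand in for the real protocol services, authentication providers, identity mapping, and file permissions, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    protocol: str
    username: str
    password: str
    path: str
    action: str


# Hypothetical stand-ins for the cluster's real services.
SUPPORTED_PROTOCOLS = {"SMB", "NFS", "FTP"}
CREDENTIALS = {"jsmith": "s3cret"}                               # authentication source
IDENTITIES = {"jsmith": 1001}                                    # username -> UID
PERMISSIONS = {("/ifs/home/jsmith", 1001): {"read", "write"}}    # (path, UID) -> actions


def evaluate(request: AccessRequest) -> str:
    # Layer 1: protocol. How is the cluster being reached?
    if request.protocol not in SUPPORTED_PROTOCOLS:
        return "rejected: unsupported protocol"
    # Layer 2: authentication. Is the user who they claim to be?
    if CREDENTIALS.get(request.username) != request.password:
        return "rejected: authentication failed"
    # Layer 3: identity assignment. Which identity does the cluster use?
    uid = IDENTITIES[request.username]
    # Layer 4: authorization. Do file permissions allow the action?
    allowed_actions = PERMISSIONS.get((request.path, uid), set())
    if request.action in allowed_actions:
        return "allowed"
    return "denied: insufficient permissions"


print(evaluate(AccessRequest("SMB", "jsmith", "s3cret", "/ifs/home/jsmith", "read")))
```

A request that fails an earlier layer never reaches the permission check, which mirrors the order described above.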

Isilon Solution Design 406

Qualifying Questions


Questions to ask:
• What mechanism are you using for SMB authentication?
• If the customer has multiple AD domains or forests, are they all trusted?
• Is LDAP provided by AD, or some other client? If LDAP is present, what’s the schema?
Is Kerberos present? Are they using NIS? NIS+? (Isilon doesn’t support NIS+).
• Are you doing any custom authentication configurations, or identity management

Isilon Solution Design 407

Identity Management and Access Control


Here we will revisit the access zone architecture graphic. Authentication providers support
the task of authentication and identity management by verifying users’ credentials before
allowing them to access the cluster. The authentication providers handle communication
with authentication sources. These sources can be external, such as Active Directory (AD),
Lightweight Directory Access Protocol (LDAP), and Network Information Service (NIS). The
authentication source can also be located locally on the cluster or in password files that are
stored on the cluster. Authentication information for local users on the cluster is stored in
OneFS supports the use of more than one concurrent authentication source. The lsassd
daemon manages the authentication process.
OneFS works with multiple identity management systems to authenticate users and control
access to files. In addition, OneFS features access zones that allow users from different
directory services to access different resources based on their IP address. Meanwhile, role-
based access control (RBAC) segments administrative access by role.
As shown, the identity management systems OneFS authenticates users with are Microsoft
Active Directory (AD), Lightweight Directory Access Protocol (LDAP), Network Information
Service (NIS), local users and local groups, and a file provider for accounts in /etc/spwd.db
and /etc/group files. With the file provider, you can add an authoritative third-party source of
user and group information.

Isilon Solution Design 408

You can manage users with different identity management systems; OneFS maps the
accounts so that Windows and UNIX identities can coexist. A Windows user account
managed in Active Directory, for example, is mapped to a corresponding UNIX account in NIS
or LDAP.
For a review of the access zone architecture, click on the boxes to learn more about each
External Protocols
External access protocols are used by clients to connect to the Isilon cluster. The currently
supported protocols are listed on the slide.
lsassd Daemon
Within OneFS, the lsassd (L-sass-d) daemon mediates between the external protocols and
the authentication providers, with the daemon reaching out to the external providers for
user lookups.
External Providers
In addition to external protocols, there are also external providers. These are external
directories that hold lists of users that the internal providers contact in order to verify user
credentials. Once a user’s identity has been verified, OneFS generates an access token.
The access token is used to allow or deny the user access to the files and folders on the cluster.
Internal Providers
Internal providers sit within the cluster’s operating system and are the Local and File
providers. The Local provider is a list of users local to the cluster, and the File provider
uses a converted /etc/passwd file.


Access Zone Functionality


Isilon provides secure multi-tenancy with access zones. Access zones do not require a
separate license. Access zones enable you to partition cluster access and allocate resources
to self-contained units, providing a shared tenant environment. You can configure each
access zone with its own set of authentication providers, user mapping rules, and
An access zone is a context that you can set up to control access based on an incoming IP
address. The purpose of an access zone is to define a list of authentication providers that
apply only in the context of the zone you created. All user access to the cluster is controlled
through access zones. Each access zone contains all of the necessary configuration to
support authentication and identity management services on OneFS. Using access zones
enables you to group these providers together and limit which clients can log in to the cluster.


Access Zone Capabilities


All user access to the cluster is controlled through access zones that provide a method for
users from different authentication providers to access different cluster resources based on
the IP address to which they connect. OneFS contains a built-in access zone, called System,
which has access to all available authentication providers, NFS exports, and SMB shares.
Administrators are able to partition the cluster into additional access zones and configure
each zone with its own namespace and list of providers. NFS users can authenticate through
their own access zone as NFS is now aware of the individual zones on a cluster, allowing you
to restrict NFS access to data at the target level as you can with SMB zones. Multiple access
zones are particularly useful for server consolidation, for example, when merging multiple
Windows file servers that are potentially joined to different untrusted forests. Access zones
contain all of the necessary configuration to support authentication and identity
management services on OneFS.
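The relationship described above, where the IP address a client connects to selects the access zone and the zone limits which authentication providers are consulted, can be sketched in a few lines of Python. This is only an illustrative model: the zone names, address ranges, and provider labels are invented, and none of it is OneFS configuration syntax.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AccessZone:
    """Hypothetical model of an access zone; not a OneFS data structure."""
    name: str
    base_path: str                                  # root of the zone's directory tree
    providers: list = field(default_factory=list)   # authentication providers the zone can see
    networks: list = field(default_factory=list)    # cluster-side IP ranges tied to the zone


def zone_for_ip(zones, connected_ip):
    """Return the zone whose address range contains the IP the client connected to."""
    addr = ipaddress.ip_address(connected_ip)
    for zone in zones:
        if any(addr in ipaddress.ip_network(net) for net in zone.networks):
            return zone
    return None


# Example layout (all values are made up for illustration).
zones = [
    AccessZone("System", "/ifs", ["local", "file"], ["10.0.0.0/24"]),
    AccessZone("hr", "/ifs/HR", ["ad:hr.example.com"], ["10.0.1.0/24"]),
    AccessZone("eng", "/ifs/eng", ["ldap:eng"], ["10.0.2.0/24"]),
]

zone = zone_for_ip(zones, "10.0.1.25")
print(zone.name, zone.providers)
```

A connection to an address outside every configured range returns None in this sketch; how a real cluster handles that case depends on its network configuration.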


Authentication Sources and Access Zones


There are three things to know about joining multiple authentication sources through access
zones. First, the joined authentication sources do not belong to any zone, instead they are
seen by zones; meaning that the zone does not own the authentication source. This allows
other zones to also include an authentication source that may already be in use by an
existing zone. For example, if you have Zone-A with providers LDAP-1, AD-1 and Zone-B with
NIS, not restricting authentication sources to a zone means that the administrator can then
create Zone-C with the LDAP-1 provider that was used in Zone-A. Second, when joining AD
domains, only join those that are not in the same forest. Trusts within the same forest are
managed by AD, and joining them could allow unwanted authentication between zones.
Finally, there is no built-in check for overlapping UIDs. When two users in the same zone,
but from different authentication sources, share the same UID, access issues can result;
additional details on this topic will be covered in the next module.
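Because there is no built-in check for overlapping UIDs, it can help to compare provider exports before attaching them to the same zone. The following Python sketch shows one possible way to do that comparison offline. The provider names and user lists are hypothetical, and the input format (a dictionary of username-to-UID maps) is an assumption of this example rather than anything OneFS produces.

```python
from collections import defaultdict


def find_uid_collisions(users_by_provider):
    """Map each UID to the (provider, username) pairs that claim it,
    keeping only UIDs claimed by more than one provider."""
    claims = defaultdict(list)
    for provider, users in users_by_provider.items():
        for username, uid in users.items():
            claims[uid].append((provider, username))
    return {
        uid: owners
        for uid, owners in claims.items()
        if len({provider for provider, _ in owners}) > 1
    }


# Hypothetical exports from two providers that would share one zone.
zone_a = {
    "LDAP-1": {"alice": 1001, "bob": 1002},
    "AD-1": {"carol": 1002, "dave": 1003},
}
collisions = find_uid_collisions(zone_a)
print(collisions)
```

Here UID 1002 is claimed by bob in LDAP-1 and carol in AD-1, which is the situation the text warns about.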


Functionality of Groupnets


Groupnets are how your cluster communicates with the world. If your cluster needs to talk
to another customer’s authentication domain, your cluster needs to know how to find that
domain and requires a DNS setting to know how to route out to that domain. Groupnets
store all subnet settings; they are the top-level object and all objects live underneath
Groupnet0. Groupnets in OneFS 8.0 and later can contain individual DNS settings that were
one single global entry in previous versions. After upgrading from pre-OneFS 8.0,
administrators will see a Groupnet0 object; this is no different from what a customer had
prior to the upgrade, with the whole cluster pointing at the same DNS settings. Groupnet0 is
the default groupnet.
Conceptually it would be appropriate to think of groupnets as a networking tenant. Different
groupnets allow portions of the cluster to have different networking properties for name


Example Access Zones


Isilon allows companies to consolidate their many islands of storage into a single namespace.
The challenge comes from overlapping home directory structures and the use of multiple
authentication sources within the company. Access zones allow for each department or
division to continue with its own authentication sources, with access to separate directory
trees within the shared storage system. This allows for overlapping home directory
structures to be contained and isolated by department or division.
The second use case stems from departmental or divisional data isolation. The access zones
can be set up to isolate areas of the directory tree to be accessible only from a particular
access zone. For instance, HR can have secure access to the /ifs/HR/data directory path.


Best Practices: Configuring Access Zones


There are some best practices for configuring access zones.

First, administrators should create a separate /ifs tree for each access zone. This process
enables overlapping directory structures to exist without conflict and a level of autonomous
behavior without the risk of unintentional conflict with other access zone structures.
Second, administrators should consider the System access zone exclusively as an
administration zone. To do this, they should remove all but the default shares from the
System access zone, and limit authentication into the System access zone only to
administrators. Each access zone works with exclusive access to its own shares, providing
another level of access control and data access isolation.
Third, administrators should follow existing best-practice guidelines by connecting to
LDAP or NIS servers before joining the cluster to an Active Directory domain per access zone.
Isilon recommends joining the cluster to the LDAP environment before joining AD so that the
AD users do not have their SIDs mapped to cluster ‘generated’ UIDs. If the cluster is a new
configuration and no client access has taken place, the order LDAP/AD or AD/LDAP doesn’t
matter as there have been no client SID-to-UID or UID-to-SID mappings.


Considerations: Authentication and Authorization


Access zones meet many, but not all, definitions of multi-tenancy; understand the full scope
of the customer’s requirements before qualifying in or out. Due to the nature of OneFS, there
are challenges regarding DNS for untrusted private domains and VLANs in multiple default
gateways. SmartConnect Advanced is highly recommended, though not required, for access
zone implementations.


Authentication Providers


Active Directory:
Active Directory, or AD, is a directory service created by Microsoft that controls access to
network resources and that can integrate with Kerberos and DNS technologies. Active
Directory can serve many functions, but the primary reason for joining the cluster to an AD
domain is to enable domain users to access cluster data.
A cluster that joins a domain becomes a domain resource and acts as a file server. The
domain join process can take up to several minutes depending on the complexity of the
domain being joined. While joining the domain, the browser window displays the status of
the process and confirms when the cluster has successfully joined the AD domain. During
the process of joining the domain, a single computer account is created for the entire cluster.
If the web administration interface is being used to join the domain, you must enable pop-
ups in the browser.
Before joining the domain, complete the following steps:
• NetBIOS requires that computer names be 15 characters or fewer. Two to four
characters are appended to the cluster name you specify to generate a unique name
for each node. If the cluster name is more than 11 characters, you can specify a
shorter name in the Machine Name box on the Join a Domain page.
• Obtain the name of the domain to be joined.
• Use an account to join the domain that has the right to create a computer account in
that domain.
• Include the name of the organizational unit, or OU, in which you want to create the
cluster’s computer account. Otherwise the default OU “Computers” is used.
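The NetBIOS arithmetic in the first step can be checked with a short Python sketch: a 15-character limit minus up to four appended characters leaves 11 characters for the cluster name. The cluster names below are invented examples, and the function is only a planning aid, not part of any Isilon tool.

```python
NETBIOS_MAX = 15       # NetBIOS computer-name limit stated above
MAX_NODE_SUFFIX = 4    # up to four characters are appended per node


def needs_shorter_machine_name(cluster_name):
    """True when the cluster name plus the longest node suffix would exceed
    the NetBIOS limit, that is, when the name is longer than 11 characters."""
    return len(cluster_name) + MAX_NODE_SUFFIX > NETBIOS_MAX


# Hypothetical cluster names: 11 characters fits, 12 does not.
for name in ("isilon-prod", "isilon-prod1"):
    print(name, len(name), needs_shorter_machine_name(name))
```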
When a cluster is destined to be used in a multi-mode environment, the cluster should connect to
the LDAP server first before joining the AD domain, so that proper relationships are
established between UNIX and AD identities. Joining AD first and then LDAP will likely create
some authentication challenges and permissions issues that will require additional
Trusts and Pass-through Authentication:
The AD authentication provider in an Isilon cluster supports domain trusts and NTLM (NT
LAN Manager) or Kerberos pass-through authentication. This means that a user
authenticated to an AD domain can access resources that belong to any other trusted AD
domain. Because the cluster is a domain resource, any user that is authenticated to a
trusted domain can access the cluster’s resources just as members of the cluster’s domain
can. These users must still be granted permission to the cluster’s
resources, but pass-through authentication makes it possible to grant trusted users access
to the cluster’s resources. For this reason, a cluster needs to belong to only one Active
Directory domain within a forest or among any trusted domains. A cluster should belong to
more than one AD domain only to grant cluster access to users from multiple untrusted
NIS provides authentication and uniformity across local area networks. OneFS includes a NIS
authentication provider that enables you to integrate the cluster into an existing NIS
infrastructure in your network. The NIS provider is used by the Isilon clustered storage
system to authenticate users and groups that are accessing the cluster. The NIS provider
exposes the passwd, group, and netgroup maps from a NIS server. Hostname lookups are
also supported. Multiple servers can be specified for redundancy and load balancing.
NIS is different from NIS+, which Isilon clusters do not support.
LDAP can be used in mixed environments and is widely supported. It is often used as a
meta-directory that sits between other directory systems and translates between them,
acting as a sort of bridge directory service to allow users to access resources between
disparate directory services or as a single sign-on resource. It does not offer advanced
features that exist in other directory services such as Active Directory.
A netgroup is a set of systems that reside in a variety of different locations that are grouped
together and used for permission checking. For example, a UNIX computer on the 5th floor,
six UNIX computers on the 9th floor, and 12 UNIX computers in the building next door, all
combined into one netgroup.


Within LDAP, each entry has a set of attributes, and each attribute has a name and one or
more values associated with it, similar to the directory structure in AD. Each entry
has a distinguished name, or DN, which also contains a relative distinguished name
(RDN). The base DN is also known as a search DN since a given base DN is used as the
starting point for any directory search. The top-level names almost always mimic DNS names,
for example, the top-level Isilon domain would be dc=isilon,dc=com for
You can configure Isilon clusters to use LDAP to authenticate clients using credentials stored
in an LDAP repository. The LDAP provider in an Isilon cluster supports the following features:
• Users, groups, and netgroups
• Configurable LDAP schemas. For example, the ldapsam schema allows NTLM
authentication over the SMB protocol for users with Windows-like attributes.
• Simple bind authentication (with or without SSL)
• Redundancy and load balancing across servers with identical directory data
• Multiple LDAP provider instances for accessing servers with different user data
• Encrypted passwords
LDAP Considerations:
To enable the LDAP service, you must configure a base distinguished name (base DN), a port
number, and at least one LDAP server. Before connecting to an LDAP server you should
decide which optional customizable parameters you want to use. You can enable the LDAP
service using the web administration interface or the CLI. LDAP commands for the cluster
begin with isi auth config ldap. To display a list of these commands, run the isi auth config
ldap list command at the CLI.
If there are any issues while configuring or running the LDAP service, there are a few
commands that can be used to help troubleshoot. Often issues involve either misconfigured
base DNs or connecting to the LDAP server. The ldapsearch command can be used to run
queries against an LDAP server to verify whether the configured base DN is correct and the
tcpdump command can be used to verify that the cluster is communicating with the
assigned LDAP server.
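Since a misconfigured base DN is a common cause of trouble, the following Python sketch shows how a base DN can be derived from a DNS-style domain name and how a matching ldapsearch test query might be assembled. It assumes the OpenLDAP client's ldapsearch options (-x for simple bind, -H for the server URI, -b for the search base); the server name and user ID are placeholders, and the exact options available on a given cluster should be confirmed before use.

```python
import shlex


def base_dn_from_dns(domain):
    """Turn a DNS name such as 'isilon.com' into 'dc=isilon,dc=com'."""
    return ",".join(f"dc={label}" for label in domain.strip(".").split("."))


def ldapsearch_command(server, base_dn, uid):
    """Build an OpenLDAP-style ldapsearch command line using simple bind (-x)."""
    argv = ["ldapsearch", "-x", "-H", f"ldap://{server}",
            "-b", base_dn, f"(uid={uid})"]
    # shlex.quote protects the filter's parentheses from the shell.
    return " ".join(shlex.quote(arg) for arg in argv)


base_dn = base_dn_from_dns("isilon.com")
# 'ldap.example.com' and 'jsmith' are placeholder values.
print(ldapsearch_command("ldap.example.com", base_dn, "jsmith"))
```

If the query returns no entries for a user who is known to exist, the base DN is a likely suspect; if the command cannot connect at all, the problem is more likely network reachability, which is where tcpdump helps.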
Note: AD and LDAP both use TCP port 389. Even though both services can be installed on
one Microsoft server, the cluster can only communicate with one of the services if they are both
installed on the same server.
Multiple LDAP servers can be specified to meet various customer needs. One case is if
multiple LDAP servers provide the same authentication data as part of a high availability
system. Another reason is if there are multiple LDAP services which will be connected to
multiple access zones for different user groups.
The Local provider supports authentication and lookup facilities for local users and groups
that have been defined and are maintained locally on the cluster. It does not include system
accounts such as root or admin. UNIX netgroups are not supported in the Local provider.


The Local provider can be used in small environments, or in UNIX environments that contain
just a few clients that access the cluster, or as part of a larger AD environment. The Local
provider plays a large role when the cluster joins an AD domain. Like the local groups that
are used within an Active Directory environment, the local groups created on the cluster can
include multiple groups from any external provider. These external groups would be added
to the cluster local group to assist in managing local groups on the cluster.
OneFS uses /etc/spwd.db and /etc/group files for users and groups associated with running
and administering the cluster. These files do not include end-user account information;
however, you can use the file provider to manage end-user identity information based on
the format of these files.
The file provider enables you to provide an authoritative third-party source of user and
group information to the cluster. The file provider supports the spwd.db format to provide
fast access to the data in the /etc/master.passwd file and the /etc/group format supported
by most UNIX operating systems.
The file provider pulls directly from two files formatted in the same manner as /etc/group
and /etc/passwd. Updates to the files can be scripted. To ensure that all nodes in the cluster
have access to the same version of the file provider files, you should save the files to the
/ifs/.ifsvar directory. The file provider is used by OneFS to support the users root and nobody.
The file provider is useful in UNIX environments where passwd, group, and netgroup files
are synchronized across multiple UNIX servers. OneFS uses standard BSD /etc/spwd.db and
/etc/group database files as the backing store for the file provider. The spwd.db file is
generated by running the pwd_mkdb command-line utility. Updates to the database files can
be scripted.
You can specify replacement files for any combination of users, groups, and netgroups.
Note: The built-in System file provider includes services to list, manage, and authenticate
against system accounts (for example, root, admin, and nobody). Modifying the System file
provider is not recommended.

Lesson 3: Permissions and User Identity


Upon completion of this lesson, you will be able to describe the permission models and
design considerations for NFS, NTFS/ACLs, and multiprotocol environments.


Overview: Permissions and User Identity


OneFS supports three primary identity types, each of which can be stored directly on the file
system. These identity types are used when creating files, checking file ownership or group
membership, and performing file access checks. The identity types supported by OneFS are:
• User identifier, or UID, is a 32-bit number that uniquely identifies users on the cluster.
UIDs are used in UNIX-based systems for identity management.
• Group identifier, or GID, for UNIX serves the same purpose for groups that UID does
for users.
• Security identifier, or SID, is a unique identifier that begins with the domain identifier
and ends with a 32-bit relative identifier (RID). Most SIDs take the form
S-1-5-21-<A>-<B>-<C>-<RID>, where <A>, <B>, and <C> are specific to a domain or computer, and
<RID> denotes the object inside the domain. SID is the primary identifier for users
and groups in Active Directory.
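To make the SID layout concrete, this Python sketch splits a SID string into its domain identifier and trailing RID, following the S-1-5-21-<A>-<B>-<C>-<RID> form described above. The sample SID is made up for illustration, and the validation is deliberately minimal.

```python
def split_sid(sid):
    """Split a SID string into (domain identifier, RID)."""
    parts = sid.split("-")
    if len(parts) < 3 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    rid = int(parts[-1])
    if not 0 <= rid < 2**32:
        raise ValueError("RID must fit in 32 bits")
    return "-".join(parts[:-1]), rid


# Made-up SID for illustration only.
domain_sid, rid = split_sid("S-1-5-21-1004336348-1177238915-682003330-1105")
print(domain_sid, rid)
```

Two accounts from the same domain share the domain identifier and differ only in the RID, which is why the RID alone is not enough to identify a user across domains.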
The available on-disk identity types are UNIX, SID, and Native.
• If the UNIX on-disk identity type is set, the system always stores the UNIX identifier, if
available. During authentication, the lsassd authentication daemon looks up
any incoming SIDs in the configured authentication sources. If a UID/GID is found,
the SID is converted to either a UID or GID. If a UID/GID does not exist on the cluster,
whether it is local to the client or part of an untrusted AD domain, the SID is stored
instead. This setting is recommended for NFSv2 and NFSv3, which use UIDs and GIDs.
• If the SID on-disk identity type is set, the system always stores a SID, if available.
During the authentication process, lsassd searches the configured authentication
sources for SIDs to match to an incoming UID or GID. If no SID is found, the UNIX ID
is stored on-disk.
• If the Native on-disk identity type is set, the lsassd daemon attempts to choose the correct
identity to store on disk by running through each of the ID mapping methods. If a
user or group does not have a real UNIX identifier (UID or GID), it stores the SID. This
is the default setting in OneFS 6.5 and later.
If you upgrade from a previous version of OneFS, by default, the on-disk identity is UNIX. For
new installations or re-imaging, the default on-disk identity type is Native.
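The three on-disk identity settings amount to a preference order with a fallback. The Python sketch below models that decision under simplifying assumptions: identifiers have already been looked up, and a flag named uid_is_real (an invention of this example) marks whether the UID came from an authentication source rather than being generated by the cluster. It is not how lsassd is implemented.

```python
def on_disk_identity(mode, uid=None, sid=None, uid_is_real=True):
    """Pick which identifier to store for the given on-disk identity mode.

    uid_is_real is False when the UID was generated by the cluster rather
    than supplied by an authentication source (an assumption of this sketch).
    """
    if mode == "unix":
        # Prefer the UNIX identifier; fall back to the SID.
        return ("uid", uid) if uid is not None else ("sid", sid)
    if mode == "sid":
        # Prefer the SID; fall back to the UNIX identifier.
        return ("sid", sid) if sid is not None else ("uid", uid)
    if mode == "native":
        # Store a real UNIX identifier when one exists, otherwise the SID.
        if uid is not None and uid_is_real:
            return ("uid", uid)
        if sid is not None:
            return ("sid", sid)
        return ("uid", uid)
    raise ValueError(f"unknown on-disk identity mode: {mode}")


example_sid = "S-1-5-21-1-2-3-1105"   # made-up SID
print(on_disk_identity("native", uid=1000001, sid=example_sid, uid_is_real=False))
```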

Qualifying Questions


When dealing with Identity Management there are some additional questions to ask, such
as: What environment are they using today? Look for special requirements; for example,
should groups be mapped? Is there some reason why groups should NOT be mapped? You
are trying to find out if the policy system currently used for Identity Mapping is up to the task
once Isilon is introduced.


Although Isilon supports varied infrastructures, every vendor does Identity Mapping slightly
differently. You may not be able to do an exact mapping if custom vendor mapping
configurations are used. If we get access to a customer’s current ID Mapping database or file,
we can determine whether we can fully replicate the configuration. Review the prospect’s ID
Mapping requirements and ensure that Isilon can accomplish what is required.

Identity Mapping Rules


On new installations and re-imaging, the on-disk identity is set to Native, which is likely to be
the best identity type for a network that has UNIX and Windows clients. When an authentication
request comes in, the authentication daemon attempts to find the correct UID/GID to store
on disk by checking for the following ID mapping types in this order:
1. If the source has a UID/GID, use it. This occurs when the incoming request comes from an AD
domain that has Services for NFS or Services for UNIX installed. This service adds an additional
attribute to the AD user (uidNumber attribute) and group (gidNumber attribute)
objects. When you configure this service, you identify from where AD will acquire
these identifiers.
2. Check if the incoming SID has a mapping in the ID Mapper.
3. Try name lookups in available UID/GID sources. This can be a local, or sam.db,
lookup, as well as LDAP, and/or NIS directory services. By default, external mappings
from name lookups are not written to the ID Mapper database.
4. Allocate a UID/GID.
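The four-step lookup order above can be sketched as a simple fall-through function. In this Python example, dictionaries stand in for the ID Mapper and the name-lookup sources; the user records, the starting value for allocated UIDs, and the choice to record allocated UIDs in the mapper are all assumptions made for illustration.

```python
import itertools


def resolve_uid(user, id_mapper, name_sources, allocator):
    """Follow the four-step lookup order and report which step matched."""
    # 1. The source itself supplies a UID (for example, an AD uidNumber attribute).
    if user.get("uidNumber") is not None:
        return user["uidNumber"], "source"
    # 2. The incoming SID already has a mapping in the ID mapper.
    if user["sid"] in id_mapper:
        return id_mapper[user["sid"]], "id_mapper"
    # 3. Name lookup in other sources; the result is not written to the mapper.
    for source in name_sources:
        if user["name"] in source:
            return source[user["name"]], "name_lookup"
    # 4. Allocate a new UID and remember it (recording it is this sketch's assumption).
    uid = next(allocator)
    id_mapper[user["sid"]] = uid
    return uid, "allocated"


# Hypothetical state: one existing mapping, one LDAP user, and a placeholder
# starting value for allocated UIDs.
id_mapper = {"S-1-5-21-1-2-3-1106": 5001}
ldap_users = {"carol": 2003}
allocator = itertools.count(1000000)

users = [
    {"name": "alice", "sid": "S-1-5-21-1-2-3-1105", "uidNumber": 1001},
    {"name": "bob", "sid": "S-1-5-21-1-2-3-1106"},
    {"name": "carol", "sid": "S-1-5-21-1-2-3-1107"},
    {"name": "dave", "sid": "S-1-5-21-1-2-3-1108"},
]
results = [resolve_uid(u, id_mapper, [ldap_users], allocator) for u in users]
print(results)
```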
You can configure ID mappings on the Access page. To open this page, expand the
Membership & Roles menu, and then click User Mapping. When you configure the settings
on this page, the settings are persistent until changed. These settings can, however,
have complex implications, so if you are in any doubt, the safe option
is to talk to EMC Technical Support and establish what the likely outcome will be.

Secondary Identifiers


UIDs, GIDs, and SIDs are primary identifiers of identity. Names, such as usernames, are
classified as a secondary identifier. This is because different systems such as LDAP and
Active Directory may not use the same naming convention to create object names and there
are many variations in the way a name can be entered or displayed. Some examples of this
include the following:
• UNIX assumes unique case-sensitive namespaces for users and groups. For example,
Name and name can represent different objects.
• Windows provides a single namespace for all objects that is not case-sensitive, but
specifies a prefix that targets a specific Active Directory domain. For example,
• Kerberos and NFSv4 define principals, which require that all names have a format
similar to email addresses. For example, name@domain.
As an example, given the name support and the domain EXAMPLE.COM, then support,
EXAMPLE\support, and support@EXAMPLE.COM are all names for a single object in Active
Directory.
In an Isilon cluster, whenever a name is provided as an identifier, the correct primary
identifier of UID, GID, or SID is requested.
The administrator can configure the ID mapping system to record mappings based on
names, but it is not the default setting.
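The reason names are only secondary identifiers becomes clearer when the three spellings from the example are reduced to one canonical form. This Python sketch does that for Windows-style names, folding case because Active Directory names are not case-sensitive. The alias table that relates the short domain prefix to the DNS-style domain is a hypothetical input, and the function is not part of OneFS.

```python
def canonical_name(raw, default_domain, netbios_to_dns):
    """Return (DNS_DOMAIN, user) for 'user', 'DOMAIN\\user', or 'user@DOMAIN'."""
    if "\\" in raw:
        domain, user = raw.split("\\", 1)
    elif "@" in raw:
        user, domain = raw.rsplit("@", 1)
    else:
        domain, user = default_domain, raw
    # Fold case, then translate a short domain prefix to its DNS-style name.
    domain = domain.upper()
    domain = netbios_to_dns.get(domain, domain)
    return domain, user.lower()


aliases = {"EXAMPLE": "EXAMPLE.COM"}   # hypothetical alias table
forms = ["support", "EXAMPLE\\support", "support@EXAMPLE.COM"]
resolved = {canonical_name(f, "EXAMPLE.COM", aliases) for f in forms}
print(resolved)
```

All three spellings collapse to a single pair here. The same case folding would be wrong for UNIX names, where Name and name can be different objects, which is one reason the cluster resolves names to a UID, GID, or SID instead of comparing strings.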

Multiple Identities


Although there are multiple ways to authenticate users to the same cluster, the aim is to
treat users uniformly regardless of how they reached the cluster. Whether the case is a team
of developers who have Windows, Apple, and UNIX operating systems on each desktop, or
internal and external sales networks, which are being integrated into a uniform
authentication scheme, or two entire corporations which are merging and therefore
combining their IT infrastructure, the need is to provide a consistent and uniform mapping
of external user identities to the user identities that Isilon uses internally.
This does not apply to a forest of mutually trusting Active Directory servers, because user
identification is handled within AD in this scenario so there is no need for the Isilon cluster to
perform any disambiguation.
Isilon handles multiple user identities by mapping them internally to unified identities.

Considerations: Permissions and User Identity


Identity Management can be difficult, especially when multiple directory services are being
used. For example, mixed-mode environments can use a lot of custom mapping to go
between Windows and NFS users but Isilon does well in mixed-mode. If File Services or ID
Mapping for UNIX are already present, that should make identity mapping for Isilon
relatively easy.
The complexity begins if the user names between platforms do not match. For example, if
Julio is "Julio" on Windows but "John" on UNIX, the user has to be mapped manually. If Julio is
"Julio" on both Windows and UNIX, then identity mapping can be done programmatically.


Lesson 4: Access Control


Upon completion of this lesson, you will be able to describe role-based access control
(RBAC), explain management resources and limitations, and distinguish between various


Overview: RBAC


Role-based access control (RBAC) allows the right to perform particular administrative
actions to be granted to any user who can authenticate to a cluster. Roles are created by a
Security Administrator, assigned privileges, and then assigned members. All administrators,
including those given privileges by a role, must connect to the System zone to configure the
cluster. When these members log in to the cluster through a configuration interface, they
have these privileges. All administrators can configure settings for access zones, and they
always have control over all access zones on the cluster.


RBAC Roles


You can permit and limit access to administrative areas of your EMC Isilon cluster on a per-
user basis through roles. OneFS includes several built-in administrator roles with predefined
sets of privileges that cannot be modified. You can also create custom roles and assign
The following list describes what you can and cannot do through roles:
• You can assign privileges to a role.
• You can create custom roles and assign privileges to those roles.
• You can copy an existing role.
• You can add any user or group of users, including well-known groups, to a role as
long as the users can authenticate to the cluster.
• You can add a user or group to more than one role.
• You cannot assign privileges directly to users or groups.
Built-in roles are included in OneFS and have been configured with the most likely privileges
necessary to perform common administrative functions. You cannot modify the list of
privileges assigned to each built-in role; however, you can assign users and groups to built-in
roles.
The following sections describe each of the built-in roles and include the privileges and
read/write access levels assigned to each role.
Custom roles supplement the built-in roles: administrators can create custom roles
and assign privileges mapped to administrative areas in your EMC Isilon cluster environment.
For example, you can create separate administrator roles for security, auditing, storage
provisioning, and backup.
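The rule that privileges attach to roles, never directly to users or groups, can be modeled in a few lines. In this Python sketch a user's effective privileges are the union of the privileges of every role that lists the user or one of the user's groups. The role names, user names, and group name are hypothetical; the two privilege names are the ones mentioned in this lesson.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical model of a role: a named set of privileges plus members."""
    name: str
    privileges: set = field(default_factory=set)
    members: set = field(default_factory=set)   # user or group names


def privileges_for(user, groups, roles):
    """Union of privileges from every role that lists the user or one of the user's groups."""
    identities = {user, *groups}
    granted = set()
    for role in roles:
        if role.members & identities:
            granted |= role.privileges
    return granted


# Hypothetical custom roles for backup and remote login.
roles = [
    Role("BackupOps", {"ISI_PRIV_SNAPSHOT"}, {"backup-team"}),
    Role("RemoteLogin", {"ISI_PRIV_LOGIN_SSH"}, {"jsmith"}),
]
print(sorted(privileges_for("jsmith", ["backup-team"], roles)))
print(sorted(privileges_for("guest", [], roles)))
```

Because a user or group can belong to more than one role, jsmith ends up with both privileges, while a user who is in no role receives none.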
For the complete listing of different roles and privileges please refer to the latest Isilon web
administration guide.

RBAC Best Practices


Roles also give you the ability to assign privileges to member users and groups. By default,
only the root user and the admin user can log in to the web administration interface through
HTTP or the command-line interface through SSH. Using roles, the root and admin users can
assign others to built-in or custom roles that have login and administrative privileges to
perform specific administrative tasks.




Privileges permit users to complete tasks on an EMC Isilon cluster. Privileges are associated
with an area of cluster administration, such as Job Engine, SMB, or statistics. Privileges have
one of two forms:
• Action - Allows a user to perform a specific action on a cluster. For example, the
ISI_PRIV_LOGIN_SSH privilege allows a user to log in to a cluster through an SSH
• Read/Write - Allows a user to view or modify a configuration subsystem, such as
statistics, snapshots, or quotas. For example, the ISI_PRIV_SNAPSHOT privilege allows
an administrator to create and delete snapshots and snapshot schedules. A
read/write privilege can grant either read-only or read/write access. Read-only access
allows a user to view configuration settings; whereas the read/write access allows a
user to view and modify configuration settings.
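The difference between read-only and read/write access to a subsystem can be expressed as a small check. In this Python sketch a grant table maps a privilege name to "r" or "rw"; that representation and the sample grants are assumptions of the example, not the way OneFS stores privileges.

```python
def is_allowed(grants, privilege, modify=False):
    """Return True if the grant table permits viewing (or modifying) the subsystem."""
    level = grants.get(privilege)
    if level is None:
        return False                      # privilege not granted at all
    return level == "rw" or not modify    # read-only grants cannot modify


# Hypothetical grants: snapshots are view-only, SMB is fully configurable.
grants = {"ISI_PRIV_SNAPSHOT": "r", "ISI_PRIV_SMB": "rw"}

print(is_allowed(grants, "ISI_PRIV_SNAPSHOT"))               # view snapshot settings
print(is_allowed(grants, "ISI_PRIV_SNAPSHOT", modify=True))  # change snapshot settings
print(is_allowed(grants, "ISI_PRIV_SMB", modify=True))       # change SMB settings
```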


Considerations: Access Control


In some cases, privileges cannot be granted or there are privilege limitations. Privileges are
not granted to users that do not connect to the System Zone during login or to users that
connect through the deprecated Telnet service, even if they are members of a role. Privileges
do not provide administrative access to configuration paths outside of the OneFS API. For
example, the ISI_PRIV_SMB privilege does not grant a user the right to configure SMB shares
using the Microsoft Management Console (MMC). Privileges do not provide administrative
access to all log files. Most log files require root access.


Module 7: Monitoring


Upon completion of this module, you will be able to describe Job Engine and its effect on the
cluster, explain monitoring and alerting in OneFS, and contrast sysctl with isi_tardis.


Lesson 1: Job Engine


Upon completion of this lesson, you will be able to explain Job Engine functionality, and
characterize job priorities and impact policies.


Overview: Job Engine


The Job Engine automates tasks cluster-wide while minimizing its own
effect on the cluster's performance. The Job Engine's structure consists of a job hierarchy
and component orchestration. The job hierarchy (phases, tasks, and task items) describes the job
processes, and the components (coordinator, director, manager, and worker) are the pieces
that orchestrate the completion of the process. When a job starts, the Job Engine distributes
job segments (phases and tasks) across the nodes of your cluster. One node acts as job
coordinator and continually works with the other nodes to load-balance the work. In this
way, no one node is overburdened, and system resources remain available for other
administrator and system I/O activities not originated from the Job Engine.
Jobs can have a number of phases. There might be only one phase, for simpler jobs, but
more complex ones can have multiple phases. Each phase is executed in turn, but the job is
not finished until all the phases are complete. Each phase is broken down into tasks. These
tasks are distributed to the nodes by the coordinator, and the job is executed across the
entire cluster. Each task consists of a list of items. The result of each item’s execution is
logged, so that if there is an interruption the job can restart from where it stopped.
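The phase, task, and item hierarchy, together with the logging that lets an interrupted job resume, can be illustrated with a small Python sketch. The nested lists, the in-memory set used as the log, and the fail_on parameter that simulates an interruption are all inventions of this example; the real Job Engine distributes tasks across nodes, which the sketch does not attempt to model.

```python
def run_job(phases, completed, fail_on=None):
    """Run phases in order; each phase is a list of tasks, each task a list of items.

    `completed` is a set of (phase, task, item) indexes that acts as the log:
    items already recorded are skipped, so a rerun resumes where it stopped.
    """
    for p, tasks in enumerate(phases):
        for t, items in enumerate(tasks):
            for i, item in enumerate(items):
                key = (p, t, i)
                if key in completed:
                    continue                  # already done before the interruption
                if item == fail_on:
                    raise RuntimeError(f"interrupted at {item}")
                completed.add(key)            # log the result of this item
    return len(completed)


# Two phases: the first has two tasks, the second has one.
phases = [[["a", "b"], ["c"]], [["d", "e"]]]
log = set()
try:
    run_job(phases, log, fail_on="c")        # simulate an interruption at item "c"
except RuntimeError:
    pass
done_before_restart = len(log)
done_after_restart = run_job(phases, log)    # resumes; "a" and "b" are not repeated
print(done_before_restart, done_after_restart)
```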
The Job Engine oversees many different jobs, from cluster maintenance to administrative
tasks. Some jobs are triggered by an event (e.g., drive failure), some are feature jobs (e.g.,
deleting a snapshot), and some are user action jobs (e.g., deleting data). Jobs do not run
continuously. For example, when a hard drive fails, a job runs to reprotect the data, ensuring
that all protection levels configured on data are properly implemented. All jobs have
priorities. If a low priority job is running when a high priority job is called for, the low priority
job is paused, and the high priority job starts to run. The Job Engine executes these jobs in
the background, and uses some resources from each node.
The Job Engine service has daemons that run on each node. The daemons manage the
separate jobs that are run on the cluster.
After completing a task, each node reports task status to the job coordinator. The node
acting as job coordinator saves this task status information to a checkpoint file.
Consequently, in the case of a power outage, or when paused, a job can always be restarted
from the point at which it was interrupted. This is important because some jobs can take
hours to run and can use considerable system resources.
• The coordinator is the executive of the Job Engine. This thread starts and stops jobs
and processes work results as they are returned during the execution of the job.
• The director runs on each node, communicates with the job coordinator for the
cluster, and coordinates tasks with the three managers. There is one director per node.
• Each manager manages a single job at a time on the node. The three managers on
each node coordinate and manage the tasks with the workers on their respective
node, and are responsible for managing the flow of tasks and task results throughout
the node. Managers request and exchange work with each other and supervise the
worker processes they assign.
• Each node runs one or more workers to complete its work items. OneFS can throttle a
job by reducing the number of workers, thus maintaining acceptable CPU and disk I/O
performance. Each worker is given a task, if any task is available. The worker then
processes the task item by item until the task is complete or the manager removes
the task from the worker. The number of workers assigned to a task is set by the
job's impact policy. The impact policy applied to the cluster is based on the highest
impact policy of all currently running jobs.
Let’s show a sequence. First the job starts. The coordinator then balances the work across
the other nodes in the cluster and the job phase starts. The tasks of the phase begin with
the communication between the director and coordinator.
The relationship between the running jobs and the system resources is complex. Several
dependencies exist between the category of the different jobs and the amount of system
resources consumed before resource throttling begins. The default job settings, job
priorities, and impact policies are designed to balance the job system requirements. The
most important jobs have the highest job priority and should not be modified.

Directory and Manager

Job Priority


Every job is assigned a priority that determines the order of precedence relative to other
jobs. The lower the number assigned, the higher the priority of the job. As an example,
FlexProtect, the job to reprotect data from a failed drive and restore the protection level of
individual files, is assigned a priority of 1, which is the top job priority.
When multiple jobs attempt to run at the same time, the job with the highest priority takes
precedence over the lower priority jobs. If more than three jobs are called upon to run, or
two job phases occupy the same exclusion set, the lower priority job is interrupted and
paused while the higher priority job runs. The paused job restarts from the point at which it
was interrupted. Exclusion sets are explained in greater detail later in this content.
New jobs of the same or lower priority than a currently running job are queued and then
started after the current job completes.
Job priority can be changed either permanently or during a manual execution of a job. If a
job is set to the same priority as the running job, the running job will not be interrupted by
the new job. It is possible to have a low impact, high priority job, or a high impact, low
priority job.
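As a rough illustration of the precedence rules above, the following Python sketch models a scheduler in which a lower number means a higher priority and at most three jobs run at once. It is a simplified model, not the Job Engine's implementation: it ignores exclusion sets, and the class, method names, and every job name and priority number other than FlexProtect at priority 1 are hypothetical.

```python
import heapq

MAX_RUNNING = 3  # the text says up to three jobs can run at once


class Scheduler:
    def __init__(self):
        self.running = {}   # job name -> priority (1 is the top priority)
        self.waiting = []   # min-heap of (priority, arrival order, name)
        self._order = 0

    def _enqueue(self, name, priority):
        heapq.heappush(self.waiting, (priority, self._order, name))
        self._order += 1

    def submit(self, name, priority):
        if len(self.running) < MAX_RUNNING:
            self.running[name] = priority
            return "started"
        # Find the running job with the lowest priority (largest number).
        victim = max(self.running, key=self.running.get)
        if priority < self.running[victim]:
            self._enqueue(victim, self.running.pop(victim))  # pause it
            self.running[name] = priority
            return f"started, paused {victim}"
        self._enqueue(name, priority)  # same or lower priority: wait
        return "queued"

    def finish(self, name):
        """Remove a finished job and start the best waiting job, if any."""
        del self.running[name]
        if self.waiting:
            priority, _, nxt = heapq.heappop(self.waiting)
            self.running[nxt] = priority
            return nxt
        return None
```

In this model a new job only displaces a running job when its priority number is strictly lower, so a job submitted at the same priority as the running jobs waits in the queue.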
In the Job Engine, jobs from similar exclusion sets are queued when conflicting phases may
run. If there is a queued job or new job phase ready to start from another exclusion set or
from the all other jobs category, the job will also be run.
Changing the priority of a job can have a negative effect on the cluster. Job priority is a
tradeoff of importance. Historically, many issues have been created by changing job
priorities. Job priorities should remain at their default unless instructed to be changed by a
senior level support engineer.

Job Workers


The job daemon uses threads to enable it to run multiple tasks at the same time. A thread is
similar to a process, but multiple threads can work in parallel inside a single process. The
coordinator tells the job daemon on each node what the impact policy of the job is, and
consequently, how many threads should be started to get the job done.
Each thread handles its task one item at a time and the threads operate in parallel. A
number of items are being processed at any time. The number of items being processed is
determined by the number of threads. The defined impact level and the actual load placed
on any one node is managed by the maximum number of assigned threads.
It is possible to run enough threads on a node that they can conflict with each other. An
example would be five threads all trying to read data off the same hard drive. Each thread
cannot be served at once and so they have to wait for each other to complete. The disk can
thrash while trying to serve all the conflicting access requests, thus reducing efficiency. A
threshold exists to the useful degree of parallelism available depending upon the job.
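The idea that the thread count bounds how many items are processed at once can be sketched with Python's standard thread pool. This is an analogy under stated assumptions, not the job daemon's code: `run_task`, `handle_item`, and `worker_threads` are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(items, handle_item, worker_threads):
    """Process every item, with at most worker_threads items in flight."""
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        # map() returns results in the same order as the input items.
        return list(pool.map(handle_item, items))
```

Raising `worker_threads` increases parallelism only until the threads start contending for the same resource, which is the thrashing effect described above.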
Increasing the impact policy for a job is not usually advisable. You need to understand what
each job is doing to assess the costs and benefits before changing the impact policy. As a
general recommendation, all impact policy settings should remain as the default settings.

Job Impact Policies


In addition to being assigned a priority, every job is assigned an impact policy that
determines the amount of cluster or node resources assigned to the job. The cluster
administrator must decide what is more important: using system resources to complete
the job, or keeping those resources available for processing workflow requirements.
A default impact policy has been set for each job based on how much of a load the job
places on the system. Very complex calculations are used in determining how cluster
resources are allocated.
By default, the system includes impact policies with varying impact levels (low, medium,
and high), along with the ability to create custom schedule policies if required. Increasing
or lowering an impact level from its default results in increasing or lowering the number of
workers assigned to the job. The number of workers assigned to the job affects the time
required to complete the job and the impact on cluster resources.
By default, the majority of jobs have the LOW impact policy, which has a minimum impact on
the cluster resources.

More time-sensitive jobs have a MEDIUM impact policy. These jobs have a higher urgency of
completion usually related to data protection or data integrity concerns.
The use of the HIGH impact policy is discouraged because it can affect cluster stability. This
has not been found to be a problem with TreeDelete, but is known to be a problem with
other jobs. The HIGH impact policy should not be assigned to other jobs. HIGH impact policy
use can cause contention for cluster resources and locks that can result in higher error rates
and negatively affect job performance.
The OFF_HOURS impact policy allows greater control of when jobs run in order to minimize
impact on the cluster and provide the maximum amount of resources to handle customer
workflows.
Impact policies in the Job Engine are based on the highest impact policy for any currently
running job. Impact policies are not cumulative between jobs but set the resource levels and
number of workers shared between the jobs.
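The "highest policy wins" rule can be shown in a few lines of Python. This is a sketch under assumptions: the ordering LOW < MEDIUM < HIGH is taken from the text, the schedule-based OFF_HOURS policy is left out, and `effective_impact` is a hypothetical helper rather than a OneFS function.

```python
# Ordering assumed for illustration: LOW < MEDIUM < HIGH.
IMPACT_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def effective_impact(running_jobs):
    """Return the highest impact policy among running jobs, or None if idle."""
    if not running_jobs:
        return None
    return max(running_jobs.values(), key=IMPACT_ORDER.__getitem__)
```

Because the result is a maximum and not a sum, adding a second LOW job leaves the effective policy at LOW.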
Modifying the job impact settings can cause significant performance problems. Lowering
the number of workers for a job can cause jobs to never complete. Raising the impact level
can generate errors or disrupt production workflows. Use the default impact policies for the
jobs whenever possible. If customer workflows require reduced impact levels, create a
custom schedule based on the OFF_HOURS impact policy.

Exclusion Sets


The Job Engine includes the concept of job exclusion sets. Exclusion sets provide additional
impact management. Job phases are grouped into three categories: the two exclusion sets
(restripe and mark), and all other jobs. The two exclusion-set categories, restripe jobs and
mark jobs, modify core data and metadata. Although up to three jobs can run at the same
time, multiple restripe or mark job phases cannot safely run simultaneously
without either interfering with each other or risking data corruption. Only one restripe
category job phase and one mark category job phase can run at the same time. There is one
job that is both a restripe job and a mark job (MultiScan). When this job's restripe and mark
phases run, no additional restripe or mark job phases are permitted to run. Up to three
other jobs can run at the same time and can run in conjunction with restripe or mark job
phases. Only one instance of any job may run at the same time.
In addition to the valid simultaneous job combinations, the mark/restripe job (MultiScan) can
run with two “other” jobs.
As an example, a restripe job, such as SmartPools, can run while a mark job, such as
IntegrityScan, is running, and another job that is not a restripe or mark job, such as Dedupe,
can also run.
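The exclusion-set rules can be expressed as a small compatibility check. The Python sketch below is a simplification: it reasons about whole jobs rather than individual phases, caps the total at three concurrent jobs, and uses only the job-to-category assignments named in the text. The function name and data layout are assumptions.

```python
RESTRIPE, MARK = "restripe", "mark"
MAX_JOBS = 3


def can_run_together(jobs):
    """jobs maps a job name to the set of exclusion sets it belongs to.

    An empty set means the job is in the "all other jobs" category.
    Using a dict keeps job names unique, so only one instance of a job fits.
    """
    if len(jobs) > MAX_JOBS:
        return False
    for exclusion_set in (RESTRIPE, MARK):
        members = [name for name, sets in jobs.items() if exclusion_set in sets]
        if len(members) > 1:
            return False  # two jobs would occupy the same exclusion set
    return True
```

For instance, `can_run_together({"SmartPools": {RESTRIPE}, "IntegrityScan": {MARK}, "Dedupe": set()})` is accepted, while pairing MultiScan with SmartPools is rejected because both occupy the restripe set.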
For a comprehensive list of job categories and their definitions, see the Isilon OneFS Version
8 Web Administration Guide on or the OneFS Job Engine white paper at

Consideration: Job Engine


The number of workers for each job equals the impact level for that job multiplied by the
number of nodes in the cluster. The benefits of reconfiguring the default priority, impact
policy and/or schedule should be carefully weighed against any potential effects. The cluster
should be at less than 90% of capacity so that performance is not affected. Disabling the
snapshot delete job prevents space from being freed and can cause performance degradation.
Schedule jobs to run outside of busy production hours. When possible, use the default
priority, impact and schedule for each job, but where there are clear peak times for business
activities, it makes sense to schedule jobs to run with the off hours configuration, so that
ordinary business is less affected by job activities. It is always a best practice to set up alerts
for the event that a job terminates abnormally.

Lesson 2: Monitoring and Alerting


Upon completion of this lesson, you will be able to explain the benefits of InsightIQ, describe
the purpose of the cluster event system, and discuss the purpose of ESRS.

Overview: Monitoring and Alerting


InsightIQ focuses on Isilon data and performance. InsightIQ is available at no charge and
provides advanced analytics to optimize applications and correlate workflow and network events.
It provides tools to monitor and analyze a cluster’s performance and file systems, including
performance, capacity, activity, trending, and analysis. InsightIQ runs on separate hardware
from the clusters it monitors and provides a graphical output for easy trend observation and
analysis. The tool does not burden the cluster beyond the data collection process.
InsightIQ has a straightforward layout of independent components. The Isilon cluster
generates monitoring data. isi_stat_d collects the data, and the cluster presents data
through isi_api_d, which handles PAPI calls, over HTTP. The InsightIQ datastore can be local
to the InsightIQ host or external through an NFS mount from the Isilon cluster, or any NFS-
mounted server. The datastore must have at least 70GB of free disk space. File System
Analytics (FSA) data is kept in a database on the cluster. InsightIQ retrieves monitoring data
from the cluster through PAPI rather than an NFS mount. Previous releases stored FSA data
externally, which was inefficient for a number of reasons.
InsightIQ is accessible through any modern web browser, such as Microsoft Edge, Internet
Explorer, Mozilla Firefox, Apple Safari, and Google Chrome. If InsightIQ is to be loaded on a
Red Hat or CentOS Linux system, EMC provides it in the form of an rpm package.

Qualifying Questions


Verify that you are asking the right questions and getting the customer to think of both the
implementation and the potential opportunities or challenges.

Customizable Reports


InsightIQ’s reporting allows monitoring and analysis of cluster activity in the InsightIQ web-
based application. Reports are customizable, and can provide cluster hardware, software,
and protocol operations information. InsightIQ data can highlight performance outliers,
helping to diagnose bottlenecks and optimize workflows. Use cases include:
• Problem isolation: Report to isolate the cause of performance or efficiency related
• Measurable effects of configuration changes: Report comparing past
performance to present performance
• Application optimization: Report to identify performance bottlenecks or
• Analyze real-time and historical data: Report on cluster information such as
individual component performance
• Forecasting: Report on the past cluster capacity consumption to forecast future

File System Analytics (FSA)


File System Analytics (FSA) is the Isilon system that provides detailed information about files
and directories on an Isilon cluster. Unlike InsightIQ data sets, which are stored in the
InsightIQ datastore, FSA result sets are stored on the monitored cluster in the
/ifs/.ifsvar/modules/fsa directory. The monitored cluster routinely deletes result sets to save
storage capacity. You can manage result sets by specifying the maximum number of result
sets that are retained.
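The retention rule, keeping only a maximum number of result sets, can be sketched as follows. This Python helper is hypothetical and only models the policy: it assumes result set IDs sort in creation order, and it does not touch the cluster or the FSA directory.

```python
def prune_result_sets(result_set_ids, max_retained):
    """Return (kept, deleted), keeping the newest max_retained IDs.

    Assumes that a larger ID means a more recent result set.
    """
    ordered = sorted(result_set_ids)
    if max_retained <= 0:
        return [], ordered
    return ordered[-max_retained:], ordered[:-max_retained]
```

With four result sets and a maximum of two, the two oldest IDs land in the deleted list and the two newest are kept.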
The OneFS Job Engine runs the FSAnalyze job daily, which collects all the information across
the cluster, such as the number of files per location or path, the file sizes, and the directory
activity tracking. InsightIQ collects the FSA data from the cluster for display to the storage

InsightIQ vs. isi statistics


The isi statistics command gathers the same information as InsightIQ, but presents the
information in text form instead of using graphics and charts. The table lists some of the
major differences between isi statistics and InsightIQ. In situations where InsightIQ is
unavailable, isi statistics is a powerful and flexible way of gathering cluster data.
Some isi statistics commands include the following:
• isi statistics protocol --classes read,write,namespace_read,namespace_write
This format provides a display of statistics organized by protocol, such as NFS3, HTTP,
and others. The --classes option lists the protocol operations to measure.
• isi statistics client --remote_names "<IP Address>"
This format provides statistics broken out by users or clients accessing the cluster.
Here are some of the other isi statistics subcommands:
• query mode provides highly customizable access to any statistic in the cluster
statistics library.
• query history mode provides basic access to historical values of statistics which are
configured to support history.
• drive mode shows performance by drive.
• heat mode displays the most active areas of the cluster file system.
• pstat mode displays a selection of cluster-wide and protocol data.
• list mode lists valid arguments to given options.
• system mode displays general cluster statistics. This mode displays operation rates
for all supported protocols, as well as network and disk traffic (in kB per second).
You can use the isi statistics command within a cron job to gather raw statistics over a
specified time period. A cron job can run on UNIX-based systems to schedule periodic jobs.
Note that cron works differently on an Isilon cluster vs. a UNIX machine so contact support
before using it.
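A scheduled collector of the kind described above might look like the following Python sketch. It is an assumption-laden illustration: the command string is copied from the text, the script assumes the command prints once and exits, and `format_sample`, `collect_once`, and the log path are hypothetical names. How it would be scheduled on a cluster is outside the sketch, for the reason given above about cron.

```python
import shlex
import subprocess
from datetime import datetime, timezone

# Command copied from the text; verify it against the OneFS release in use.
COMMAND = "isi statistics protocol --classes read,write,namespace_read,namespace_write"


def format_sample(raw_output, when):
    """Prefix a block of raw command output with an ISO 8601 timestamp line."""
    return f"# {when.isoformat()}\n{raw_output.rstrip()}\n"


def collect_once(log_path, run=subprocess.run):
    """Run the command once and append its raw output to log_path."""
    result = run(shlex.split(COMMAND), capture_output=True, text=True, check=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_sample(result.stdout, datetime.now(timezone.utc)))
```

Each scheduled invocation appends one timestamped sample, so the log accumulates raw statistics over the chosen period.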
InsightIQ retains a configurable amount of historic information with regard to the statistics it
collects. To prevent collection of a large backlog of data, InsightIQ retains data sets to
provide trending information over a year, but these settings are also configurable.

Overview: System Events


The CELOG (cluster event log) monitors, logs and reports important activities and error
conditions on the nodes and cluster. OneFS uses events and event notifications to alert you
to potential problems with cluster health and performance. Events and event notifications
enable you to receive information about the health and performance of the cluster,
including drives, nodes, snapshots, network traffic, and hardware.

The main goal of the system events feature is to provide a mechanism for customers and support
to view the status of the cluster. Events provide notifications for any ongoing issues and
display the history of an issue. This information can be sorted and filtered by date,
type/module, and criticality of the event.
CELOG is designed to support the task-management systems, such as the Job Engine. The
task-management systems notify CELOG of major task changes, such as starting and
stopping a job. However, the task-management system does not notify CELOG of internal
substates, such as which files are being worked on and what percentage of completion the
job has reached. The other type of system events that are generated are a result of errors,
such as file system errors, threshold violations, system messages, and Simple Network
Management Protocol (SNMP) traps.

Cluster Event Architecture


An event is a notification that provides important information about the health or
performance of the cluster. Some of the areas include the task state, threshold checks,
hardware errors, file system errors, connectivity state and a variety of other miscellaneous
states and errors.
The raw events are processed by the CELOG coalescers, stored in log databases, and
coalesced into event groups. CELOG does not report on individual events, but on
event groups. Reporting on event groups is not uniform, but depends on conditions and
defined reporting channels. For example, networking issues would be reported to a channel
that includes network administrators, but database administrators would probably not
benefit much from the information, so their reporting channel need not be on the list for
networking related issues.
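The flow from raw events to event groups to channels can be modeled in a few lines of Python. This is a conceptual sketch, not CELOG's data model: the `cause` and `category` fields, the channel names, and both functions are invented for illustration.

```python
from collections import defaultdict


def coalesce(events):
    """Group raw events that share a cause key (a hypothetical field)."""
    groups = defaultdict(list)
    for event in events:
        groups[event["cause"]].append(event)
    return dict(groups)


def route(groups, subscriptions):
    """Map each channel to the event groups whose category it subscribes to."""
    deliveries = {channel: [] for channel in subscriptions}
    for cause, members in groups.items():
        category = members[0]["category"]
        for channel, categories in subscriptions.items():
            if category in categories:
                deliveries[channel].append(cause)
    return deliveries
```

Two raw link-down events for the same node collapse into one group, and only the channel subscribed to the network category receives it.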

Overview: ESRS


EMC Secure Remote Services (ESRS) is a mature and well-established system that
communicates alerts and logs, and enables EMC support staff to remotely perform support
and maintenance tasks. ESRS monitors the Isilon cluster on a node-by-node basis, sending
alerts regarding the health of your devices. It provides a secure, IP-based customer service
support system that features 24x7 remote monitoring, secure authentication with AES 256-bit
encryption, and RSA digital certificates. ESRS is included with the OneFS operating system
and not licensed separately.
InsightIQ status is monitored through ESRS. Information passed to the cluster is automatic,
passing registration information through to ESRS. There is no administrative intervention
needed to achieve the registration.

ESRS Architecture


The graphic shows the general architecture of ESRS operation in a heterogeneous EMC
environment. ESRS functions as a communications broker between the managed devices, the
Policy Manager, and the EMC Enterprise. All communication with EMC initiates from ESRS on
port 443 or 8443 outbound from the customer site to EMC support services. EMC does not
establish inbound network communications to the systems. This is a security measure which
is to the benefit of customers that run secure sites but do permit limited, controlled
outbound communications.
Although the Policy Manager is optional, it is required to fulfill requirements for
authentication, authorization and auditing. By implementing the optional ESRS Policy
Manager, customers can enable monitoring on a node-by-node basis, allow or deny remote
support sessions, and review remote customer service activities. The Policy Manager enables
permissions to be set for ESRS managed devices. When the ESRS server retrieves a remote
access request from the EMC Enterprise, the access is controlled by the policies configured
on the Policy Manager and enforced by the ESRS server.
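The allow, deny, and audit roles of the Policy Manager can be illustrated with a small Python function. This is a conceptual sketch only: the request fields, the policy values, and the default of denying unknown devices are assumptions, not the product's actual behavior or API.

```python
def authorize(request, policies, audit_log, default="deny"):
    """Return True if the remote session is allowed; record every decision."""
    decision = policies.get(request["device"], default)
    audit_log.append((request["device"], request["requested_by"], decision))
    return decision == "allow"
```

Every request is written to the audit log whether it is allowed or denied, which corresponds to the review of remote service activity described above.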
Communications between the customer site and EMC support flow over an encrypted HTTPS
connection, which means that sensitive information does not traverse the internet
unprotected.
ESRS can be configured for redundancy with more than one ESRS instance installed, allowing
reports through ESRS in the event of hardware or partial data environment failure. On the
EMC support side, only authorized EMC representatives have access to the customer
systems or their information.

Isilon Log Transmission via ESRS


Isilon logs, even compressed, can be many gigabytes of data. There are ways of reducing the
log burden, such as gathering incremental logs rather than complete log records or selecting
specific logs to gather, but even so, logs on Isilon tend to be large. Uploading logs may
require a lot of bandwidth and could take a while with the risk of timeouts and restarts. The
support scripts are based on the isi_gather_info tool. The remote support scripts are
located in the /ifs/data/Isilon_Support directory on each node. The scripts can be run
automatically to collect information about your cluster's configuration settings and
operations. ESRS uploads the information to a secure Isilon FTP site, so that it is available for
Isilon Technical Support personnel to analyze. The remote support scripts do not affect
cluster services or the availability of your data.

Consideration: InsightIQ


Monitor clusters over a LAN connection. Monitoring clusters over a WAN connection can
significantly degrade InsightIQ performance.
The FSA job consumes computing resources on the monitored cluster and can affect cluster
performance. If cluster performance is negatively affected, the FSA feature can be disabled,
thus preventing the job from running.
Limiting the size of the InsightIQ datastore through a quota limits the ability of InsightIQ to
properly run and save its data. If you use a quota, the datastore could become full before
InsightIQ begins deleting older data to make room for newer data. Do not apply a quota to
the /ifs/.ifsvar/modules/fsa directory.
If snapshots are used and the datastore is reverted to a snapshot, the InsightIQ datastore
might become corrupted.
The maximum number of clusters that you can simultaneously monitor is based on the
system resources available to the Linux computer or virtual machine. Isilon recommends
that you monitor no more than 8 storage clusters or 150 nodes with a single instance of
InsightIQ.
Set up an email notification to receive an alert when the datastore begins to fill up.
InsightIQ does not support Active Directory (AD) authentication.
Performance reports include deduplication information, which is cumulative.
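The sizing limits quoted for InsightIQ (no more than 8 clusters or 150 nodes per instance, and at least 70GB free in the datastore) lend themselves to a quick planning check. The Python sketch below is a hypothetical helper for design discussions; it treats both the cluster count and the node count as limits, and it is not part of InsightIQ.

```python
MAX_CLUSTERS = 8
MAX_NODES = 150
MIN_DATASTORE_FREE_GB = 70


def check_insightiq_plan(nodes_per_cluster, datastore_free_gb):
    """Return a list of warnings for a proposed single InsightIQ instance."""
    warnings = []
    if len(nodes_per_cluster) > MAX_CLUSTERS:
        warnings.append(f"{len(nodes_per_cluster)} clusters exceeds {MAX_CLUSTERS}")
    total_nodes = sum(nodes_per_cluster)
    if total_nodes > MAX_NODES:
        warnings.append(f"{total_nodes} nodes exceeds {MAX_NODES}")
    if datastore_free_gb < MIN_DATASTORE_FREE_GB:
        warnings.append(f"datastore has under {MIN_DATASTORE_FREE_GB} GB free")
    return warnings
```

An empty list means the plan fits within the stated recommendations; each warning names the limit that was exceeded.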

ESRS treats each node as a separate device, and each node is individually connected to ESRS.
The cluster is not monitored as a whole. ESRS supports up to 250 devices. Also, as of OneFS 8.0,
SupportIQ (ESRS’s predecessor) is fully deprecated.

Considerations: InsightIQ (cont'd)


There is no one-size-fits-all solution to editing the cluster's configuration or the kernel
parameters. Most customers are very well served by the normal parameter settings and, at
best, would see no substantial improvements from making such changes. If there were such
a generally useful tweak that all customers could use, that tweak would have been
incorporated in Isilon’s default build.
There are some rare cases where unusual or extremely sensitive customer applications can
benefit from some tweaks, and the downsides of the tweaks are negligible, or worth the
benefit to the primary workflow. The right time to propose these ideas is not at the inception
of a discussion, but during a proof of concept installation, and in consultation with
engineering with the goal of meeting customer performance requirements.
In practice, customers may ask about tweaking parameters. Obviously, Isilon makes this
possible, and you can certainly tell customers that the architecture is a flexible and
adaptable one, but at the same time you should also clarify that this is almost never a
concern. Customers may suggest that they always apply a certain sysctl tweak to all their
BSD systems, and this may even be true, but such customers should be warned that the
kernel in OneFS is modified to match the needs of the cluster, and as such kernel
configurations that are perfectly harmless in a regular BSD system may be very detrimental
to cluster operations.

sysctl and isi_tardis


OneFS has more than one level of cluster configuration. There is the per node level kernel
configuration (remember that each node runs its own modified version of BSD, that is
OneFS) and there is the cluster-wide functional level of configuration, that incorporates
configurations such as exports and shares, network pools and so on.
The OneFS kernel shares many traits with the original BSD kernel, including a parameter
modification system called sysctl (sys-control). This is not unique to BSD, but also shows up
in Linux systems, among others. If one were to use a sysctl command to change a
parameter on OneFS, it would only affect one node at a time, rather than the whole cluster.
Cluster-wide configurations are maintained in the isi_tardis (izzi-tardis) system. If you are
familiar with older versions of OneFS, you may have heard of the gconfig (gee-config)
system for maintaining global configurations. This gconfig system is now superseded by the
isi_tardis system. The advantage of the isi_tardis system is that it can roll back its
configuration. This is valuable when performing an upgrade that the customer decides to roll
back for some reason. The OneFS upgrade process can require configuration changes that
the isi_tardis system retains, and rolling back requires the ability to return to the prior

Use of sysctl and isi_tardis


The internal configuration tools do not live a secret existence.

As mentioned before, sysctls are not unique to OneFS, but are known across the BSD and
Linux community. Similarly, the isi_tardis_d daemon that manages the isi_tardis system for
configurations is not secret, and may come up in discussions about troubleshooting.
The temptation for many administrators is to look at them as opportunities to tweak things
for reasons of performance or function. Administrators should resist this temptation.
Modification of kernel parameters through the use of the sysctl interface, or direct cluster
configuration is quite likely to end badly, because there are many moving parts to any given
configuration. Editing kernel parameters is even more dangerous in some ways than editing
isi_tardis configurations, because the exact nature of kernel parameter operations can
change, without warning or documentation between different versions of OneFS.
If there is any question of a reason to edit isi_tardis configurations, or sysctls, then
engineering should be specifically engaged, every time. There is no configuration that will
always be safe, reliable or appropriate, and we don't recommend any general practices in
changing these configurations. More importantly, this is the sort of information that
Technical Support will want to have on hand in the event that the customer calls for product

Module 8: Solution Tools


Upon completion of this module, you will be able to:

• Demonstrate data gathering and analysis tools
• Explain number analysis in workflow context
• Translate gathered data into solution parameters

Lesson 1: Isilon Sizing Tool


Upon completion of this lesson, you will learn where to find the Isilon Sizing Tool, as well as
what it can do for you and what it offers while discussing installation scope with the

Overview: Isilon Sizing Tool


Isilon's sizing tool is an official tool, fully supported and regularly updated, that Isilon's team
produces to help sales and architectural staff produce plans for customers to consider. The
tool's users can build a plan for a cluster, tweak the plan, add or remove nodes, change
OneFS versions under which the plan is created, or start from scratch and build an entire
cluster from the ground up. The sizing tool does not dictate a single answer, but offers a
range of answers for discussion. The exact choice of one plan over another depends upon
the precise situation on the ground.
The information in the sizing tool is maintained and updated as Isilon's offerings improve.
New drive types, new software choices and new node types are all added as the information
becomes available, so you can use the sizing tool to present customers with current choices.
This also means that you can compare and contrast existing cluster installations with what
may be possible during and after upgrades, so it's not only a tool for helping new customers.

References for Isilon Sizing Tool


The Isilon Sizing Tool is a web-based tool. This makes it easy to use from a remote site while
talking to a customer, or internally in the office. Any modern browser should be adequate to
display the website correctly, as well as most tablet-based browsers; this makes it easy and
convenient to engage the customer in an imaginative exercise on the fly.

Sizing Tool Wizards


The Isilon Sizing Tool offers a number of useful wizards, intended to help you quickly drill
down to an appropriate solution. Feel free to explore them and experiment, so that you are
ready to offer customers quick and compelling solutions. The entire tool offers many
functions, including drive and file sizing, specific solutions and saving old solutions. You will
be able to use them all in your day-to-day work.

Exporting Configs for Customer


The sizing tool enables you to export the configuration that you create as a spreadsheet.
This makes it an excellent basis for further discussion, tweaks or development into a firm
sales proposal. This tool also makes a fine foundation for explaining the finer points of
configurations to the customer.
For example, you could use the tool to illustrate how clusters with more nodes achieve
higher storage efficiencies, or how X-Series nodes contain more drives than S-Series nodes.
You could also use the tool to construct complex clusters with different node types and drive
sizes. Other factors, such as RAM capacity also figure in the tool, so you have ample
opportunity to tweak and modify a proposed solution.
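The claim that clusters with more nodes achieve higher storage efficiency can be illustrated with a deliberately simplified parity model. The Python sketch below is not how the sizing tool or OneFS calculates protection overhead: it assumes one stripe unit per node, a fixed number of parity units, and a cap on data units per stripe, and all three parameters are assumptions chosen for illustration.

```python
def protection_efficiency(node_count, parity_units=2, max_data_units=16):
    """Usable fraction of raw capacity for one stripe under N+M protection.

    Simplified model: each stripe places one unit per node, parity_units of
    which hold parity, and a stripe never holds more than max_data_units
    of data.
    """
    data_units = min(node_count - parity_units, max_data_units)
    if data_units < 1:
        raise ValueError("not enough nodes for this protection level")
    return data_units / (data_units + parity_units)
```

Under these assumptions a 4-node cluster yields 50% usable capacity while an 18-node cluster yields about 89%, which is the kind of trend the sizing tool can show with real figures.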

Lesson 2: Other Assessment Tools


Upon completion of this lesson, you will know where to find more assessment tools that
address a wide range of metrics, as well as what the strengths of each tool are. You will also
see how you can use these tools to demonstrate the Isilon difference in a Proof of Concept
or actual installation.




IOZone is an open source benchmarking tool. It runs on a variety of platforms, including
Windows (various versions), Linux, and various UNIX operating systems. It runs from a client
machine and measures actual throughput.
IOZone produces sophisticated graphs and, as a third-party tool, it provides credible metrics
for performance comparison. This is good news for Isilon because it allows us to cut through
FUD (Fear, Uncertainty and Doubt - a competitive sales strategy made famous by IBM) to real
IOZone's typical use case is directly attached storage, but it is possible to use options that
help expose network latencies. (The -c and -e options include close and flush operations in the
timing.) Because IOZone can handle direct attached storage, it can also be run
locally on the cluster to help differentiate local cluster problems from network issues.
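
As a rough illustration of the options mentioned above, the following Python sketch assembles an IOZone command line that includes close and flush time in its measurements. The mount path, file size, and record size are placeholder assumptions, not recommended values, and the sketch only builds the command; it does not run it.

```python
import shlex


def build_iozone_command(test_file, file_size="1g", record_size="128k"):
    """Assemble an iozone argument list for a simple write/read test."""
    return [
        "iozone",
        "-i", "0",          # test 0: write / rewrite
        "-i", "1",          # test 1: read / reread
        "-c",               # include close() in the timing
        "-e",               # include flush (fsync, fflush) in the timing
        "-s", file_size,    # size of the test file
        "-r", record_size,  # record (I/O request) size
        "-f", test_file,    # file to test against, e.g. on a network mount
    ]


# Hypothetical client-side mount point for the cluster.
cmd = build_iozone_command("/mnt/isilon/iozone.tmp")
print(shlex.join(cmd))
```

Running the same command against a network mount and then locally on a node gives two figures that can be compared directly.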


IOZone Use Case


IOZone is a powerful tool for understanding actual storage access rates. It does not rely
upon theoretical numbers or artificial arrangements, but reflects exactly what it finds on the
system where it is run. IOZone, like most benchmarking tools, can differentiate various API
calls and file systems.
The best use of this tool is either before a migration, to determine what the customer's
needs and activity are from their prior technology, or once an Isilon cluster has been
installed for either proof of concept or production, to demonstrate to the customer what
performance they are experiencing. This sort of benchmarking and workload simulation is
never a perfect match to reality, so bear the limitations of the tools in mind. As you gain
experience with them, you will get better at using them to discover what you need.
Tools such as IOZone are vendor neutral, so they provide impartial data that
can be presented to a variety of teams to prove either the source of an issue or the absence
of one. Because you can install it directly on an Isilon cluster, you can illustrate to a customer
what kind of performance penalty their network is creating.
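
One simple way to express that network penalty is as the fraction of locally measured throughput that is lost when the same test runs from a client. The sketch below assumes you already have two throughput figures from comparable benchmark runs; the numbers shown are illustrative only, not measurements.

```python
def network_penalty(local_mb_s, client_mb_s):
    """Return the fraction of local throughput lost over the network."""
    if local_mb_s <= 0:
        raise ValueError("local throughput must be positive")
    return (local_mb_s - client_mb_s) / local_mb_s


# Illustrative figures: on-cluster run vs. the same run from a client.
penalty = network_penalty(local_mb_s=800.0, client_mb_s=600.0)
print(f"Network penalty: {penalty:.0%}")
```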




MiTrend is rather like IOZone in that it works on the basis of actual collected information.
MiTrend is very well set up with information on a wide variety of EMC products for
performing assessments, but all of that depends on the information collection.
MiTrend used to be called the Workload Profile Assessment tool, and you can still find
references to that terminology in the interface because it describes exactly what MiTrend
does. The purpose of the tool is to help you match workflow measurements to a set of
The MiTrend website provides its reports in a friendly format through the web interface,
which makes it another great tool for communicating with customers. MiTrend can be used
to gather actual information on systems such as legacy storage, Windows servers, network
devices and virtual systems as well. This makes it a great tool for determining what the real-
world workload is that the customer needs to serve. MiTrend also works with tools such as
perfcollect to gather data on the server or client-side, so as to gather detailed information on
what is happening. Because the data needs to be uploaded, this does not afford a real time
view of activity, but is a great tool for establishing a baseline.


Assessment Tool Requirements


The major difference between the tools we have discussed so far is that the Isilon sizing tool
does not depend on any specific data collected from the customer's network or environment.
This makes its answers more general, but it also allows for more flexibility. Customers can be
unwilling to allow the kind of specific data collection that IOZone and MiTrend require, for
good reasons. Having more than one tool allows for flexibility in how you approach solution




WireShark is a tool that can capture packets as well as analyze and display the output for
human readers. WireShark is a third-party tool, but it is free, and widely respected in the
tech community. You may well find that a serious customer already has WireShark installed,
and has historic packet captures that illustrate any problems that they may be experiencing.
WireShark is commonly used as a debugging tool, and it is in the recommended toolkit for
Isilon's support staff, but its ability to measure transactions across the network also makes it
a powerful information gathering tool in a Proof of Concept. You can use WireShark output
to demonstrate a qualitative as well as quantitative improvement in performance to a




Iometer is a third-party tool designed to do I/O performance testing on Windows systems. It
is not very sophisticated; for example, it does not currently support direct I/O storage activities,
and thus cannot exclude client-side caching from performance considerations.
Iometer's strongest use case in typical Proof of Concept or presales situations is to
demonstrate how well a set of Windows machines can take advantage of an Isilon
installation. This is a great factor in large home directory workflows, as well as a number of
more advanced use cases. For example, you could use Iometer to illustrate the difference in
performance experienced by clients connecting to different tiers of storage.




Iperf is a common third-party tool that administrators use for validating network throughput.
Iperf can be run from both client and server ends, one-to-one or one-to-many, such as when
you have ten clients concurrently running against an Isilon cluster.
Iperf is not a broad tool. It will not perform a detailed, point-by-point analysis of your routing
infrastructure, or analyze your firewall efficiency. All it does is open a socket between two
points and send packets of a given profile down that link. You can use this to explore the
limits of performance on high latency links, or in complex network environments where it is
difficult to establish the performance of a full chain of network connections. Iperf is
obviously useful for illustrating a problem with network infrastructure, but can also be quite
useful in excluding network limitations as sources of performance problems.
Usually network issues become important when the customer has remote SyncIQ
installations in mind, and you can use iperf to estimate what sort of time may be involved in
performing synchronization, but iperf is also useful to illustrate how well an Isilon cluster
handles a large number of active clients running in parallel.
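
The synchronization estimate described above is simple arithmetic once iperf has reported a throughput figure. The sketch below is a back-of-the-envelope calculation, not a SyncIQ model: the `efficiency` factor is an assumed allowance for protocol and replication overhead, and the data size and link rate are illustrative.

```python
def estimate_sync_hours(data_tb, throughput_mbit_s, efficiency=0.8):
    """Estimate hours needed to move data_tb terabytes over a link.

    throughput_mbit_s is the rate reported by iperf, in megabits per second.
    efficiency is an assumed fraction of that rate available to replication.
    """
    data_bits = data_tb * 1e12 * 8                      # decimal TB to bits
    usable_bits_per_s = throughput_mbit_s * 1e6 * efficiency
    return data_bits / usable_bits_per_s / 3600


# Illustrative: 10 TB initial sync over a link measured at 500 Mbit/s.
hours = estimate_sync_hours(data_tb=10, throughput_mbit_s=500)
print(f"Estimated initial sync: {hours:.1f} hours")
```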


Lesson 3: Tools on Cluster


Upon completion of this lesson, you will learn how to use tools that exist on the Isilon cluster
to collect metrics and demonstrate the performance of the cluster and the network around
it to the customer.




iostat is a common tool in the UNIX world, frequently found on Linux machines as well as
BSD and Isilon nodes. It is more or less what it sounds like: it collects and displays various
system statistics, most notably about I/O rates. iostat is not particularly sophisticated, and
like most command line tools is not very attractive, but it is ready to use if you have an Isilon
installation. You can quickly get answers on activity levels, throughput and related statistics
such as CPU activity, and use these to diagnose customer problems as well as answer
customer questions.
If a new customer has Linux or BSD installations serving data, iostat can similarly be a useful
tool for establishing a ballpark estimate of their baseline load. This is a great start to a sizing
discussion, because you can describe what the customer's needs are in quantitative terms,
and describe how much headroom an Isilon cluster could offer.


isi statistics


isi statistics is an Isilon command-line tool that you can find on OneFS. If the customer has
a cluster already installed, you can use isi statistics to find pain points on the cluster, and
plan an approach to alleviating those. Alternatively, you can use it to show how much headroom
there is on a cluster, or how a Proof of Concept installation is performing.
isi statistics has a very wide range of options, ranging from protocol activity through CPU
performance and network activity. It can show a single output report, or constantly refresh
its output so that you always have the current information available. isi statistics is a
customer-facing tool, so you can introduce a customer's storage administrators to it.




netstat is a real workhorse tool for monitoring anything relating to a system's network. Like
iostat and tcpdump, netstat is present on many different UNIX and UNIX-like systems
including Linux and OneFS. It is worth looking through netstat's man page to examine all the
options available, but among the most common uses are viewing the system's routing table and
its active and listening sockets. netstat can also display information on protocol activity, on interfaces
and more. Not only is this a powerful tool for checking that what you find is what you expect
to find, but if a customer is in any doubt about their networking configuration, this tool can
help clarify the situation.
Skilled use of netstat can also be a real confidence builder. Take some time to learn and
explore its options. A customer who sees that you are familiar with the tool, and how to use
it in various contexts will conclude that you do know your field, and this builds trust.




tcpdump is a common tool in the world of UNIX. Owing to the fact that OneFS is based on
FreeBSD, tcpdump is also present on OneFS.
What tcpdump actually does is to read packets from within the system, according to a series
of filters, and dump them to the screen, or to a file for later analysis.
As a rule, tcpdump is used for debugging purposes, but the record of packets that it
captures can be very revealing with respect to network activity, network errors, network
latency and so on. One can track other transactions that travel over the network, such as
authentication tasks. If there is a concern regarding network throughput or a doubt about
the source of some performance issue, tcpdump can be useful.
The problem with using tcpdump is that it captures all the bits. This means that customers
often consider it a security hole, and object to running tcpdump. To avoid any impropriety,
always get explicit permission before running tcpdump in a customer environment, and
consider engaging the support team to get expert help on using it in a tightly controlled




If you are faced with a Proof of Concept where the performance is not what you expect, it
can be very helpful to gather lots of data quickly and get support's assistance in analysis of
the problem. One of the tools that we have in Isilon, that is not suited for customer use but
can be very revealing with the right background, is isi_get_itrace.
What this tool does is to display the process stack information, in particular a listing of
processes that are sleeping, or waiting on I/O. If a process is shown stuck in this fashion, that
can lead to an analysis of why that is happening, along with a resolution that clears up
apparent performance problems. If you can gather isi_get_itrace output before opening a
conversation with support, that may help to resolve performance issues more quickly and


Module 9: Verticals and Horizontals


After completing this module, you will be able to define industry terminology, describe
relevant workflows, identify storage requirements for workflow, establish Isilon strengths
and challenges in this vertical, and examine design considerations.


Lesson 1: Media and Entertainment


This lesson introduces you to how Isilon can be used in the Media & Entertainment


Industry Terminology


Bit-rate: The speed at which bits are transmitted, usually expressed in bits per second.
Video and audio information, in a digitized image for example, is transferred, recorded, and
reproduced through the production process at some rate (bits/s).
CODEC: An acronym of Coder, Decoder. A device or piece of software which takes one file or
signal format and translates it to another with an ideally undetectable loss of quality.
Color Space: A color model (4:2:2, 4:4:4, RGB, RGBA) describing the way colors can be
represented as sets of numbers, typically as three or four values or color components.
Frame Rate: The number of frames played every second. The standard film frame rate is 24
fps, with NTSC video at 29.97 fps and PAL video at 25 fps. Shooting higher than these rates will
result in slow-motion footage and shooting lower will result in fast-motion.
Resolution: The sharpness or "crispness" of a picture. It can be measured numerically by
establishing the number of scanning lines used to create each frame of video. Typically
represented by the number of pixels wide in the horizontal span denoted as (H) by the
number of pixels high in the vertical span denoted as (V). Also see Format.


Overview: Media and Entertainment


Media and Entertainment is a very broad category. It’s everywhere and ranges from major
motion pictures to massively multiplayer online gaming. If you can watch it, read it, listen to it,
or play it, it is likely coming from a company in the M&E category.
M&E opportunities are everywhere: local radio and TV broadcasters, enterprise and
commercial, marketing departments, training departments, education, college and
university media departments, streaming media for professional training and certifications,
retail, in-store advertising, and the travel and hospitality industry.
Online video is rapidly becoming the de facto standard for mass communication, including
for print publishers.


Workflow Overview


M&E is often broken into three major segments: Content Creation and Post Production,
Content Broadcast, and Content Distribution and Streaming.
As you can see here, the flow of content begins with ingesting content from various sources
and ends with distributing a processed version of that content to the intended system or
Content Creation and Post Production handles content at the beginning of that content flow
while Content Distribution and Streaming is involved with the content that has already been
created or produced. In a way, they are on the opposite ends of the M&E content spectrum.
Content Broadcast however spans across most of the various aspects of the content flow
that includes some Content Creation, as well as Content Distribution.
M&E workflows are unique in that they deal with many very large files as well as many very
small files. Data on the Content Creation side is often very large and rather sequential in
how it is read or written. Data on the Content Distribution and Streaming side is often much
smaller and read more randomly.


Discovery Communications


Overview of M&E - Success Cases: Discovery Communications.


SiriusXM Satellite Radio


Increases the Efficiency and Cost-Effectiveness of Content Delivery Operations.


Industry-wide Challenges


All modern day media needs to be created in or converted into a high quality digital format
so it can be edited and assembled in the Post production stage. This is the first step in the
Content Creation process.
All of the raw footage that is in digital format needs to be stored so that it can be accessed
by content editors.
To prevent versioning issues, Digital Media Asset Management software is used to keep
track of all the various elements and versions of the media. Editing software packages such
as Avid Media Composer, Adobe Premiere, and Apple Final Cut are used to access the media
and assemble the various segments of footage into the feature length format. Similar
software is used to access the media and ensure that the brightness and colors are
consistent throughout the duration of the feature. Often special effects are added to the
media and the whole sequence is rendered or composited with more special software. The
media assets need to be preserved in case of a disaster, so they are often replicated off site
for safe keeping even before the feature is complete.
As you can see, the media needs to be accessed by multiple software products multiple
times in order to complete a segment. Most RAID volumes are too small to hold multiple
copies or in some cases even a single copy of the high quality large format raw footage. In
RAID-centric workflows, the media is often moved or even copied from one volume to
another so the various editors, colorists, or special effects artists can get access to the clips
they need. If multiple editors and artists need to work on the feature at the same time, RAID-centric
workflows can experience congestion caused by too many software connections
trying to access the media through the RAID controllers. Isilon avoids this confusion and
these performance issues by enabling a single-volume file system that can grow to more than 50 PB
(depending on node choice).

Storage – Key Isilon Features


SmartConnect - Enables massive scalability and load balancing of media connections,
streaming media servers and content delivery agents. Allows origin servers, editors, media
playout servers, and the like to continuously grow to meet customer demands without
downtime to upgrade. Allows segregated zones of nodes to support multiple specific
workflow performance requirements.
InsightIQ - Critical tool for media workflow troubleshooting.
SmartPools - Segregate real-time video workflows from editing or transcoding workloads
utilizing performance based disk tiering. Enable and automate archiving-in-place to free up
performance capacity to maximize storage effectiveness.
SyncIQ - Provides DR protection with fail-over, fail-back redundancy between media or data
centers. Facilitates scale-out level data replication between WAN/LAN locations.
Aspera - OneFS native implementation of industry standard media content delivery agent,
optimized for high-latency internet delivery between content creators and content
distribution hubs.

Design Solution


A typical design solution for latency-sensitive real-time video ingest, render, and playback
clients would be to have them connect to all S-Series node 10 GbE network interfaces.
For non-real-time full-resolution video clients like video transcoding, clients could connect to
all X-Series 10 GbE network interfaces.
And for low-res proxy video streaming clients, they would connect to all NL-Series 1 GbE
network interfaces.


Cluster Sizing for Performance


The provided aids help identify codec bitrates commonly found in M&E workloads, the
protocol performance of various nodes in example cluster sizes, and convert between
various units of measurement of speed, capacity, and duration:
• Keys to Easier Sizing - Bitrate Table
• Keys to Easier Sizing - Performance Table
• Keys to Easier Sizing - Unit Conversions
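
The unit conversions involved are straightforward, and a few helper functions can stand in when the job aid is not to hand. This is a minimal sketch that assumes decimal units (1 MB = 1,000,000 bytes); the function names and the sample bitrate are illustrative and are not taken from the job aids.

```python
def mbit_to_mbyte_per_s(mbit_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_s / 8


def mbyte_per_s_to_gb_per_hour(mb_s):
    """Convert a sustained MB/s rate to gigabytes written per hour."""
    return mb_s * 3600 / 1000


def gb_to_tb(gb):
    """Convert gigabytes to terabytes (decimal)."""
    return gb / 1000


# Illustrative stream bitrate of 176 Mbit/s.
rate_mb_s = mbit_to_mbyte_per_s(176)
per_hour_gb = mbyte_per_s_to_gb_per_hour(rate_mb_s)
print(f"{rate_mb_s:.1f} MB/s is about {per_hour_gb:.1f} GB per hour")
```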


Cluster Sizing for Capacity


Multiply number of streams and corresponding MByte bitrates and convert to capacity per
hour.
Establish Capacity Requirements for each workload category Retention Requirement by
multiplying calculated capacity per hour by number of retention hours desired.
Identify capacity of each node type selected in Performance Requirements stage and
determine how many nodes will be needed to satisfy Retention Requirements.
Use provided Node Specs, Unit Conversion, and File Parity Effect Job Aids during calculations.
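
The capacity steps above can be sketched as a short calculation. The stream count, bitrate, retention period, and usable capacity per node below are illustrative assumptions; real usable capacity depends on node type and protection level, which is what the Node Specs and File Parity Effect job aids are for.

```python
import math


def capacity_per_hour_tb(streams, mb_per_s_per_stream):
    """Capacity written per hour, in decimal terabytes."""
    return streams * mb_per_s_per_stream * 3600 / 1e6


def nodes_for_retention(tb_per_hour, retention_hours, usable_tb_per_node):
    """Return (required TB, node count) for a retention requirement."""
    required_tb = tb_per_hour * retention_hours
    return required_tb, math.ceil(required_tb / usable_tb_per_node)


# Illustrative: 12 streams at 22 MB/s, kept for 72 hours,
# with a hypothetical 20 TB usable per node after protection overhead.
per_hour = capacity_per_hour_tb(streams=12, mb_per_s_per_stream=22)
required, nodes = nodes_for_retention(per_hour, retention_hours=72,
                                      usable_tb_per_node=20)
print(f"{per_hour:.4f} TB/hour, {required:.1f} TB retained, {nodes} nodes")
```

The node count from capacity should then be compared with the node count from the performance stage, and the larger of the two used.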


Example: Cluster Sizing


12 Sienna ingest clients doing real-time ingest at 1920x1080, 29.97 frames per second, using
the Apple QuickTime ProRes codec (~22 MB/s per client) over SMB2 = 264 MB/s writes.
3 Apple Final Cut Pro edit clients doing real-time playback of the ingested ProRes content.
Each editor will play back 4 simultaneous streams over SMB2 = 264 MB/s reads.
• According to performance engineering sizing tool data + 30% overhead for Job
Engine performance impact, this solution can be served with a 5-node X200 cluster.
• Since the workflow requires low latency to support real-time ingest and
playback, SSD is added to the configuration.
• Since the workflow requires repeated high-performance cached reads from editing
clients, at least 24 GB of RAM per node is used in the configuration.
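
The throughput figures in this example can be checked with a few lines of arithmetic. Applying the 30% Job Engine overhead as a multiplier on the combined read and write load is an assumption made here for illustration; the sizing tool data itself is not reproduced.

```python
CLIENT_RATE_MB_S = 22        # approximate ProRes rate per stream, from the example
JOB_ENGINE_OVERHEAD = 0.30   # overhead allowance quoted in the example

ingest_clients = 12
edit_clients = 3
streams_per_editor = 4

write_mb_s = ingest_clients * CLIENT_RATE_MB_S
read_mb_s = edit_clients * streams_per_editor * CLIENT_RATE_MB_S

# Assumption: the overhead is applied to the aggregate read + write load.
sized_mb_s = (write_mb_s + read_mb_s) * (1 + JOB_ENGINE_OVERHEAD)

print(f"writes: {write_mb_s} MB/s, reads: {read_mb_s} MB/s")
print(f"aggregate load to size for: {sized_mb_s:.1f} MB/s")
```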


Tools for Determining Media Data Rates


Tools to help with determining media data rates:



Lesson 2: Video Surveillance


This section talks about how Isilon clusters can be used in the Video Surveillance vertical.


Industry Terminology


VMS: Video Management System - allows you to record and view live video from multiple
surveillance cameras (either IP-based or analog cameras with an encoder), monitor alarms,
control cameras and retrieve recordings from an archive.
There are two primary classifications of video analytics: real-time and forensics. For actual
data flow and where the analytics take place, Isilon is a fit on all fronts, but the different
vendors perform the analytics differently: on camera, third-party via the VMS, or third-party
via secondary stream. Analytics companies specialize in imagery/video analytics, usually in
specific vertical applications. There are currently no common video analytics engines, like Hadoop;
however, retail, utilities, transportation, gaming, and defense are common verticals.


Overview: Workflow


Isilon can be used for many different needs in Video Surveillance: Airports, Cities/Public
Government, Prisons/Corrections, Manufacturing, Gaming, and Education. Data remains on-premise
in Isilon and is subject to all IT data governance and protection policies. Files are not
replicated in the cloud and remain under IT control. This allows the customer to remain
regulation compliant.
With Pivotal HD, combined with EMC Isilon's native integration of the Hadoop Distributed
File System (HDFS) protocol, customers have an enterprise-proven Hadoop solution on a
scale-out NAS architecture. This powerful combination succeeds in reducing the complexities
traditionally associated with Hadoop deployments and allows enterprises to easily extract
business value from unstructured data.
Video is a very heavy consumer of bandwidth (BW) at between 2 and 60 Mbps depending on resolution,
rate, and codec. This means that for the streaming portion of the video feeds back to the
VMS (video management system) instances, there is a network-centric deployment that needs
to be considered in designing the storage as well as the infrastructure systems.
It is important to understand how the network topology of the customer will likely define the
solution and overall system, and how this affects Isilon positioning.
For the campus environments, the networks are substantial enough in BW that the primary
data flow is from the distributed IP cameras or Encoders back to a centralized set of servers
as a streaming video feed, where the VMS servers typically apply proprietary processing that
allows distribution to mobile, browsers, clients, video walls, and analytics systems.


The data flow from there is very simple, as it uses the network storage protocol (such as
SMB, NFS, or HTTP) to store the video in real time or as an archive process. Some vendors
can tier within their VMS and some cannot. Some have a live buffer and an archive tier, and
some do not have this functionality but allow for event-based archival versus full streaming.
Note: There is no indication of the ability to do DR. This is because 99% of the VMS vendors
process data coming in and store it in a proprietary manner to make the application
responsible for all data indexing. Thus, the application handles the data migration between
and among tiers. Isilon is working to allow these vendors to use APIs, but due to the nature
of the VMS business, you can see how this is not what most will do.

Video Analytics Examples


Examples of the various analytics types are summarized here and on the next slide.


Video Analytics Examples (cont'd)




VMS Export Capabilities


VMS vendors use exports of a video file to distribute video to third-party
systems (evidence management or simply desktop clients).
Each VMS vendor can create exports with a variety of options, including digital hashing schemes,
encryption, file extensions, and packaging.
Using Isilon as the target via a separate directory/share, any network-connected desktop can
export to a consolidated storage system and attain up to N+4 protection.


Surveillance Systems Storage


There are three Rs in determining capacity: Rate, as shown in the Y axis title (we are using 15
fps for this example as a common use case), Retention time (Z axis), and Resolution (X axis).
As any of these variables increases, so does the capacity, which is why Isilon is well suited:
it can grow very easily.
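
A rough capacity estimate follows directly from these variables. In the sketch below, frame rate and resolution (together with the codec) are assumed to be already folded into a per-camera bitrate, which is taken as an input; the camera count, bitrate, and retention period are illustrative and not drawn from the chart.

```python
def surveillance_capacity_tb(cameras, mbit_s_per_camera, retention_days):
    """Raw capacity in decimal TB for continuous recording, before protection."""
    bytes_per_s = cameras * mbit_s_per_camera * 1e6 / 8
    bytes_per_day = bytes_per_s * 86400
    return bytes_per_day * retention_days / 1e12


# Illustrative: 100 cameras at 4 Mbit/s each, retained for 30 days.
capacity_tb = surveillance_capacity_tb(cameras=100,
                                       mbit_s_per_camera=4,
                                       retention_days=30)
print(f"Raw capacity needed: {capacity_tb:.1f} TB")
```

Doubling any one input doubles the result, which is the linear growth the chart illustrates.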


Compliance Requirements


Evidence is created at the monitors depicted by security guards/officers at the remote or
central site. This evidence is typically exported from the client software running on the
monitor stations.
For chain of custody management, there are a variety of popular evidence management
software suites that support NAS interfaces, such as Mediasolv, Panasonic Arbitrator, or ACE.


Kangwon Land


Kangwon Land is the largest casino in Korea, and provides integrated hotel resort facilities
for visitors with 199 rooms, ski slopes and a golf course. The casino consists of 132 table
game machines, 960 slot and video game machines.
With more than 1,400 HD cameras and 300 SD cameras installed throughout its facilities, the
company required a high performance storage solution to collect, store, analyze, and search
massive video data logs created by the cameras, and provide continuous and highly
available services at all times.
One challenge for Kangwon Land was that video surveillance plays a critical role within
casinos, both to reduce the risk of theft and to meet industry regulations that require no
gambling to take place at a table that is not covered by a camera. In addition, in order to
meet the growing popularity of its integrated resort, Kangwon Land wanted to expand its
facilities to provide new gaming and recreation choices for customers. This meant the
company needed to improve both the effectiveness and the efficiency of its surveillance
infrastructure to capture and store video data. Kangwon Land required video data to remain
highly available for immediate review in the case of suspected gambling fraud and theft, and
enable archival of data to be retrieved when required by legal or industry authorities.
EMC partnered with systems integrator Samsung SDS to propose an IP surveillance solution
based on EMC Isilon storage running the OneFS operating system, with Genetec as the
supplier of video management software, and Bosch ST as the camera vendor.


With the scale-out architecture, EMC was the only solution provider that could meet the
current requirement for a fully active-active, fault-tolerant system, providing 11 petabytes of
data storage and the scalability to meet the casino’s predicted growth as camera numbers
increased throughout the facility.
The EMC video surveillance storage solution enables Kangwon Land to use existing human
and financial resources while scaling its systems to support the surveillance data. The
solution can store all recorded video data from 24-hour surveillance of the casino’s premises
for short-term live playback within EMC Isilon storage providing 270 terabytes within a single
file system. All video data is saved for 30 days, and long term archival of 20 percent of the
video data is held for 180 days in EMC Isilon storage with 7.7 petabytes of capacity.
For more details about this case study, follow the URL on the bottom of the slide.

Isilon FAQs


Q: Isilon nodes can handle >100 MBps, why not just size to this throughput from our
performance testing?
A: The primary limitation in video surveillance and most applications is determined by frame
losses or buffer overflows in the applications themselves. This is a result of the VMS software
implementation using NAS versus SCSI. Note that the VMS servers have to run a thread per
camera stream to normalize the video, which is highly compute, I/O, and memory intensive.


Q: What if an application is not validated?
A: Ideally have a POC or the ISV partner do the testing to get some idea of limitations to
avoid poor CSAT. Many implementations are suboptimal when first evaluated (max of 15
MBps per server). Engage the EMC surveillance team to help in test procedures.
Q: Are SSDs necessary for larger nodes and drive types?
A: For steady state operation, the workloads are very underwhelming for Isilon at
supported bandwidths and per-node server ratios. Even with node failures, the protocol
latencies can be kept well within the VMS implementation’s envelope without SSDs using
NL400s and X400s with 4TB drives. Once 6TB drives and 60-drive chassis come out, this may
change, but testing is needed. If NFS datastores are used, they are very latency sensitive,
and SSDs would be best to absorb FlexProtect/FlexProtectLin.

Design Solution


Every VMS instance and server can automatically load balance to the best Isilon node.
Connection policies available for VMS vendors are “connections” and “round robin.” The best
mechanism for each VMS vendor, as found in testing, is detailed in the technical notes. Using round robin is
typically best suited for larger server counts (>10) and also NFS VMS (Surveillus, Cisco, Next


Considerations in Production
• Round Robin: If the VMS server count is below 10 servers, round robin will likely create
a non-uniform distribution of SMB/NFS connections across the cluster. This leads to node
starvation and overloading.
• Connection Count: If a cluster is also used by other systems in SmartPool for
surveillance, connection count is skewed to non-VMS clients. Avoid SSH or InsightIQ
connections during initialization of VMS to avoid skewing connection count. To
assess, run isi statistics client or check in the OneFS web administration interface.
SmartLock allows administrators to set policy controls at the directory level or at the file level.
For example, admins can set longer retention policies on individual files than the default for
that directory. An administrator also has the option to mix WORM and non-WORM data in
the same file system. SmartLock directories can be on any tier, and can move to any tier with
SmartPools. SmartLock directories can replicate to other clusters with SyncIQ.
Many VMS servers perform “grooming” of video data based on how old the video data is
and/or available space on a volume.
• Due to Isilon's large, single namespace, some (not all or even a majority of) VMS vendors
have issues associated with the volume never getting groomed because of the
mechanisms used for issuing “deletes” based on space available in the volume.
• In order to overcome this potential issue, SmartLock can present a smaller volume to
each VMS server instance.
SmartLock Consideration in Production
• Create default directory SmartLock configurations
• Use hard limits to enforce presentation of threshold to VMS on a per server basis.


Isilon SmartConnect for Surveillance


A delegation record and a name server record need to be created within the DNS server.
The client connects to a friendly host name rather than an IP address. This name is what we
refer to as the SmartConnect zone name. When the client connects, it asks its local DNS server
for an IP address for that name. Because there’s a delegation entry in the DNS server, the
request is forwarded to the cluster’s SmartConnect Service IP address (SSIP).
A basic DNS server runs on the cluster. When the cluster receives that request, it
decides which IP address of which node to return, thereby deciding where that client
makes its connection.
 The first client that connects in the morning gets the IP address of Node 1, so in
this case that address is returned to the DNS server. The DNS server
gives that IP to the client, and the client then connects to Node 1.
 The next client connects in the morning. The same process occurs, but when the request
comes in to the SmartConnect service IP on the cluster, the cluster gives out the IP
address of Node 2, which is returned to the DNS server. The DNS server then hands
this IP to the client, and the client maps its connection to Node 2.
 The next client goes through the same process and is connected to Node 3.
Subsequent clients connect to Node 1, Node 2, Node 3, and so on. That is what is referred
to as round-robin. Round-robin is the basic option included in the basic version of
SmartConnect. It’s the best option to set up initially because it is very easy to troubleshoot
and to verify that everything is configured properly.
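The rotation described above can be sketched in a few lines. This is only a model of the behavior, not the SmartConnect code; the IP addresses (taken from a documentation range) and the `round_robin_resolver` helper are hypothetical.

```python
from itertools import cycle

def round_robin_resolver(node_ips):
    """Return a function that hands out node IPs in rotation, one per DNS query."""
    rotation = cycle(node_ips)
    return lambda: next(rotation)

# Hypothetical addresses for Node 1, Node 2, and Node 3.
node_ips = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
resolve = round_robin_resolver(node_ips)

# Six clients ask for the SmartConnect zone name, one after another.
answers = [resolve() for _ in range(6)]
print(answers)
```

Each client receives the next node's address in turn, and the fourth client wraps back around to Node 1.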

Cluster Sizing


Use the bandwidth (BW) limits established for the video management system (VMS) to
determine how many nodes of a specific node type are required. The key is to make sure that the
per-server BW is below the published maximums. Per-server BW is assumed to be {aggregate
BW}/{number of VMS servers}. For capacity, start from the aggregate minimum capacity and size
the cluster 10-15% greater, so that the cluster does not run at more than 90% utilization.
Most per-server bandwidth limitations come from the VMS server itself.
VMS vendors have a maximum number of video feeds and a maximum bit rate per VMS server (between
100-200 video feeds and 300-500 Mbps per server is a good rule of thumb).
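As a rough worked example of the two calculations above, the sketch below divides aggregate bandwidth by the number of VMS servers and adds 10-15% headroom to the minimum capacity. All input values, including the published per-server maximum, are hypothetical; real limits come from the technical notes for the specific VMS.

```python
def per_server_bandwidth_mbps(aggregate_mbps, vms_servers):
    """Per-server BW = aggregate BW / number of VMS servers."""
    return aggregate_mbps / vms_servers

def sized_capacity_tb(min_capacity_tb, headroom_pct):
    """Size the cluster headroom_pct percent above the aggregate minimum capacity."""
    return min_capacity_tb * (100 + headroom_pct) / 100

# Hypothetical inputs, for illustration only.
aggregate_mbps = 4000
vms_servers = 10
published_max_mbps = 500   # assumed vendor limit per VMS server
min_capacity_tb = 1000

per_server = per_server_bandwidth_mbps(aggregate_mbps, vms_servers)
within_limit = per_server <= published_max_mbps

capacity_low = sized_capacity_tb(min_capacity_tb, 10)
capacity_high = sized_capacity_tb(min_capacity_tb, 15)

print(f"Per-server BW: {per_server} Mbps, within limit: {within_limit}")
print(f"Cluster capacity to size for: {capacity_low} to {capacity_high} TB")
# Resulting utilization: about 91% with 10% headroom, about 87% with 15%.
print(f"Utilization: {min_capacity_tb / capacity_low:.0%} to {min_capacity_tb / capacity_high:.0%}")
```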
Primary node types tested and deployed are X400 and NL400.
 The NL400/24G and X400/48G nodes are recommended. There is no need for SSDs
for large sequential writes unless supporting other workloads (NFS datastores for
 The X400 node handles higher Server:Node ratios better and typically results in
fewer nodes when bandwidth is the primary factor.
 All validated software is tested, and sizing parameters are specified in the tech notes,
during failure modes (for example, FlexProtect running) to ensure 0% lost frames during
node removal/addition. The effects of FlexProtect create higher CPU spikes on NL400s
Sizing/validation is done primarily on GE ports on Isilon nodes to map to lowest cost
 Validations tested up to two VMS servers per Isilon node GE port. Information is
denoted in the Technical Notes for the VMS. We recommend assigning a single VMS
server to each GE interface. (Verint and Aimetis were tested with 10GE interfaces.)
VMS configurations and Isilon do not have every feature enabled in validations:
 VMS watermarking not enabled (~20% reduction in achievable bandwidth per VMS)
 VMS motion detection or on-server analytics-based recording not enabled (~20%
reduction in achievable BW per VMS server) unless specified in the Technical Notes
 Dual writes, dual streaming, and HA designs for VMS vendors not validated (dual
writes cut bandwidth in half)
 SyncIQ, dedupe, SmartPools, and other features, except SmartConnect and
SmartQuotas, are not validated

Lesson 3: Home Directories and File Shares


In this section, we'll talk about how Isilon clusters can be used for home directories and file
share workflows.

Overview: Home Directories and File Shares Workflow


Home directories and file shares are used by most companies. The table on the slide shows
a comparison of typical workflows.
Home directories are generally used to centralize users’ personal files, which typically
contain private data, whereas file shares are used for centralizing data for better
collaboration between users. Home directories provide users secure access to their directory
and have a one-to-one file-to-user ratio, meaning a file is owned and accessed by one user.
In contrast, file shares are accessible by groups of users. Home directories are typically less
active and contain less critical data than file shares.
Home directories and file shares often share the same storage on the same network. Both
SMB and NFS protocols are supported for home directories and file shares. FTP and HTTP
are also supported for file shares.

Industry Example: Columbia Sportswear


Columbia Sportswear Company is an industry leader in the design, manufacture, and
distribution of innovative outdoor apparel, footwear, accessories, and equipment. Founded
in 1938, Columbia Sportswear serves a global market with 3,200 employees and offices in 45
countries. Their legacy infrastructure included aging NetApp, HP, and IBM systems that
became difficult to manage and slow to respond to the changing needs of the business. The
customer profile document details other challenges Columbia Sportswear faced, such as
rapid growth, revenue increases, and platform manageability. The EMC solution included
EMC platforms such as VMAX, Isilon, VCE Vblock, and others, along with Cisco, VMware,
and SAP products. The solution supports more than 2,000 users across the company,
handles data volumes growing at a rate of 90 to 95% a year, and reduced project lead
times from six weeks to two days. The solution also
cut the number of physical servers in half, realized hundreds of thousands of dollars in
savings, and achieved an RPO of 15 minutes and an RTO of 8 hours.

Industry Example: University of Utah Scientific Computing and
Imaging Institute (SCI)


The Scientific Computing and Imaging (SCI) Institute is a permanent research institute at
the University of Utah. Directed by Professor Chris Johnson, the Institute is now home to over
190 faculty, students, and staff and has established itself as an internationally recognized
leader in visualization, scientific computing, and image analysis. The overarching research
objective is to create new scientific computing techniques, tools, and systems that enable
solutions to problems affecting various aspects of human life. A core focus of the Institute
has been biomedicine, but SCI Institute researchers also solve challenging computational
and imaging problems in such disciplines as geophysics, combustion, molecular dynamics,
fluid dynamics, and atmospheric dispersion.
The challenge facing SCI was the limited capacity and inadequate performance of its
previous storage system, which limited research productivity. The EMC solution provided
true linear scalability: Isilon improved performance even as SCI added capacity, enabling
the institute to stay ahead of user demand and under budget. It also accelerated data access:
Isilon more than doubled performance over SCI’s previous system, enabling greater
research productivity and new services for customers. The solution reduced management effort to
less than one full-time equivalent, enabling the SCI IT staff to work on research rather than
managing storage systems. Isilon also accelerated research productivity and increased
bandwidth for new services for less than SCI would have spent simply maintaining its old
system over the same period.

Success Case: Rotary International


Increased IT agility has been a significant outcome of the IT transformation enabled by EMC
and Vblock technologies. With a private cloud, Rotary transitioned its IT resources from an
operational role to focus more on delivering projects that support its mission.
Shulda explains, “Our time to deliver IT resources has been reduced from a couple of days to
hours. So we can be proactive with IT solutions that support and accelerate important
projects, such as world health grants focused on eradicating polio. We’re also able to ensure
that mission-critical applications like SQL Server, SharePoint, and Oracle are well-protected
and scale to new data loads.”

Industry-wide Challenges


The key challenges within the vertical can be grouped into two categories: capacity and
performance. The number of users needing home directories and the capacity to
allocate for each user may be hard to determine, especially in environments with thousands
or tens of thousands of users. Predicting the users’ data growth, or the appropriate
quotas to set, adds more complexity. The protection overhead for home directories and file
shares needs to be accounted for. How much protection is too much? Or too little? File
shares have the same challenges, with the addition of access over different protocols and
how file permissions are handled in such mixed environments. Other challenges include
areas such as file retention, given that over time files become seldom accessed and tend to
accumulate, and the expectation is that home directories and file shares have long data retention.
Will accessing the files meet performance expectations? Many variables need to be
considered to ensure predictable performance: not just the number of connections,
throughput, and bandwidth, but also how capacity growth will impact performance. Keep in
mind that access patterns are typically intermittent, with short burst demand. What are the
file sizes? Typically, file sizes are small (50KB ~ 1MB each). Another consideration is how a
failure will affect data access. Failure effects can range from loss of data access to degraded
performance to little or no effect.

Industry-wide Challenges (cont'd)


Shown in this table is a comparison between home directory and file share data, with the
challenges around access, file size, and backup. Home directory files are seldom accessed and
tend to accumulate. The access pattern is typically intermittent, with short burst demand.
Home directory file sizes (50KB ~ 1MB each) are usually smaller than file share files (>1MB
each). For home directories, snapshots and backups are typically used less often than for
collaborative file shares.

Isilon Strengths


This table highlights the value added to home directories and file shares. From
SmartConnect’s client load balancing to SyncIQ’s ability to replicate and protect data, Isilon
has a compelling story for any challenges customers may face. In addition to the listed
features, we can add data deduplication, which is a great capacity efficiency tool for home
directories and file shares.

Isilon Challenges


Isilon challenges can range from integrating into the customer’s Active Directory to
positioning nodes with SSDs. The Isilon cluster supports home directory creation and user
permissions just like any Windows file server. A user with Windows administrator privileges
is required for the cluster to join the AD domain. Consider using the cluster’s /ifs/home
directory, which is enabled for SMB by default. Use Isilon’s support for Group Policy Object
(GPO) redirection of “My Documents”. Personalized subdirectories based on the Active Directory
Users and Computers %username% variable can be created automatically.
For LDAP integration, the Base Distinguished Name (Base DN) is required and includes components
such as the Common Name (CN), Locality (L), and Domain Component (DC). The port number and
LDAP server’s host name are also required. The default is read permission enabled for
Everyone and Full Control for the Domain Admin Group.
For mixed-mode authentication schemas, manage ACLs with native Windows tools rather than
UNIX tools, and set Isilon’s Global Permissions Policy to Balanced Mode. Metadata
acceleration should be enabled for home directories and file sharing. For performance-sensitive
home directory and file sharing workflows, use SSDs for metadata performance.
Though this is not required, it is strongly recommended.

Cluster Sizing: Capacity Requirements


When determining cluster size, you should first determine the cluster capacity requirements,
and then the performance requirements. Typically, sizing for capacity accommodates the
performance demand of home directories and file shares. The table on the slide shows an
example of determining the combined total capacity requirements for home directories and
file shares.

Cluster Sizing: Performance Requirements


Shown here is how the cluster performance requirements are factored in. Reminder: Sizing for
capacity usually accommodates the performance demand of home directories and file
shares. Given this example, X410 nodes with SSDs will achieve both the capacity and the
performance requirements.

Design Solution


Here is the solution design based on the sizing examples. Adding a node to maintain
performance and capacity in the event of a node failure is very highly recommended. Adding
capacity to keep planned utilization in the 80% range is also very highly recommended to
keep the cluster operating at peak efficiency. The final recommended design solution is a
six-node cluster of X410 109TB nodes.
To accommodate archive and backup capacity, add an NL node tier sized to the customer’s
backup and archive requirements, and follow the same capacity-sizing methodology.
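The two recommendations above (plan for roughly 80% utilization, then add a node for failure tolerance) can be sketched as a simple calculation. The 400 TB requirement is a hypothetical figure, since the slide's table is not reproduced in the text, and the sketch ignores protection overhead; `nodes_required` is an illustrative helper, not a sizing tool.

```python
import math

def nodes_required(required_tb, node_tb, target_utilization=0.80, spare_nodes=1):
    """Nodes needed to hold required_tb at the target utilization, plus spare nodes."""
    usable_per_node = node_tb * target_utilization
    return math.ceil(required_tb / usable_per_node) + spare_nodes

# Hypothetical combined home directory and file share requirement.
required_tb = 400

# 109 TB per node matches the X410 configuration named in the text.
node_count = nodes_required(required_tb, node_tb=109)
print(node_count)
```

With these assumed inputs, five nodes cover the capacity at 80% utilization and the sixth provides the failure headroom.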



 EMC Isilon M&E Reference Site:

 Node | Cluster Performance and Capacity Tools:
 Media File Size Estimator:
 AJA DataCalc
 Digital Rebellion - Video Space Calculator
 Video Space Online

Lesson 4: Hadoop


This lesson talks about how Isilon works with Hadoop.

Industry Terminology


On the slide are terms commonly used in the Hadoop industry.

Overview: Hadoop Workflow


When data was structured and stored in big databases, it was common to run queries against the
database to provide dashboards and reports on details within the data: how many houses
are for sale, how many people are named John, how many customers have 10 or more support
cases, and so on.
The growth of unstructured data (documents, spreadsheets, slide decks) presents a problem
when looking to identify summary information about this data. Hadoop has arisen as the
way to run queries against unstructured data. Isilon offers a few key benefits unique in the
storage industry that really shine for anyone working with Hadoop or looking to get started
with data analytics. Hadoop adoption is growing across all verticals, and once a company
starts seeing value from data analytics, we see those analytics push storage needs even
There are five basic roles in every Hadoop environment:
 HDFS is the storage component, and is made up of the NameNode, secondary
NameNode, and DataNode roles.
 MapReduce is the processing component, and is made up of the JobTracker and
TaskTracker roles.

Industry Example: WGSN Group


WGSN Group is a strategic intelligence, insight, and trends company that provides a
combination of Big Data, deep intelligence, and sharp creative judgment to help its
customers seize opportunities in accelerating markets.
WGSN Group identified a promising opportunity to create a new
market intelligence service that would help fashion retailers make crucial merchandising
decisions. The service would need to gather and process huge volumes of product, sales,
and pricing data from across the industry, and would grow exponentially as the
user base expanded. To successfully roll out the new service, named WGSN INstock, WGSN
Group required an information infrastructure with the performance and scalability to handle
both current and future Big Data requirements.
 Rapidly launch new market intelligence service for fashion retailers
 Support large and growing volumes of Big Data
The results were a streamlined deployment with native Hadoop integration, which enabled a
rapid launch of the new market intelligence service. The solution delivered high performance and
streamlined scalability for growing Big Data assets, all with a simplified platform
Follow the URL on the slide for more details about this case study.

Industry Example: Return Path


Return Path is the worldwide leader in email intelligence, serving Internet service providers
(ISPs), businesses, and individuals. The company’s email intelligence solutions process and
analyze massive volumes of data to maximize email performance, ensure email delivery, and
protect users from spam and other abuse.
Previously they had a hodge-podge of more than 25 different storage systems, including
server-attached storage, shared Oracle appliances, as well as NetApp and Hewlett-Packard
Challenges included data growing at 25-50 terabytes per year, along with disparate systems
that lacked the performance and capacity to support intensive Hadoop analytics.
The Isilon solution included X-Series nodes, Hadoop, internally developed email intelligence
solutions, along with SmartPools, SmartConnect, SmartQuotas, and InsightIQ.
The solution results were unconstrained access to email data for analysis, a 30% reduction in
shared storage data center footprint, and improved availability and reliability for Hadoop
Follow the URL on the slide for more details about this case study.

Industry-wide Challenges


Standard authentication in Hadoop is intended for an environment where individual Hadoop
users are expected to be honest, although they may make innocent mistakes such as
accidentally deleting somebody else’s file.
Hadoop clients simply pass the name of the logged-in user to the Hadoop service (JobTracker,
NameNode, etc.). These user names are not validated by any Hadoop service.
Standard Hadoop provides only basic UNIX-type permissions.
Each file or directory is assigned an owner and a group.

Isilon Strengths


Hadoop on Isilon provides full ACLs for NFS, SMB, and HDFS. Before using the Hadoop
service, a user must authenticate with the Kerberos server (using their password) to obtain a
Kerberos ticket. The Kerberos ticket then must be passed to the correct Hadoop service.
Alternatively, an MIT Kerberos server can be configured to pass authentication requests to
an Active Directory server.
When using Isilon with Hadoop, each file and directory has an Access Control List (ACL)
consisting of one or more Access Control Entries (ACEs). Each ACE assigns a set of
permissions (read, write, delete, etc.) to a specific security identifier (user or group). Users
must authenticate in order to access the data. Isilon offers greater flexibility by allowing
storage and processing to scale independently. Isilon’s clustered architecture is also much more
efficient at file management, data protection, and network operations than traditional
architectures, which adds efficiency and performance for data scientists trying to
get results quickly.
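The ACL and ACE relationship described above can be modeled with a small data structure. This is a simplified, allow-only sketch for illustration; it is not the OneFS ACL implementation, it omits deny entries and entry ordering, and the `ACE` and `ACL` classes and the user and group names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ACE:
    identity: str           # user or group name standing in for a security identifier
    permissions: frozenset  # for example {"read", "write", "delete"}

@dataclass
class ACL:
    entries: list = field(default_factory=list)

    def allows(self, identities, permission):
        """True if any ACE for one of the caller's identities grants the permission."""
        return any(
            ace.identity in identities and permission in ace.permissions
            for ace in self.entries
        )

# Hypothetical ACL on one file: one user entry and one group entry.
acl = ACL([
    ACE("alice", frozenset({"read", "write", "delete"})),
    ACE("analysts", frozenset({"read"})),
])

# bob is a member of the analysts group, so he can read but not write.
print(acl.allows({"bob", "analysts"}, "read"))
print(acl.allows({"bob", "analysts"}, "write"))
```

Compared with a single owner and group per file, a list of entries lets different users and groups hold different permission sets on the same file.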

Isilon Challenges: Common Objections


Some common objections to using Isilon as the repository for Hadoop data are that 'Data
locality is critical for Hadoop' or 'Hadoop should run on bare metal.'
We often hear that the 'Network can’t be faster than SATA bus', or 'Files need to be
replicated 3+ times for tasks to recover from failed nodes', or 'Traditional HDFS gives me
what I need today.' We will address these concerns on the next few slides.

Design Solution


Traditional Hadoop was designed for slow star networks (1 Gbps).
The only way to deal effectively with slow networks was to strive to keep all I/O local to the
server. This is called disk locality.
Disk locality is lost in several common situations:
 All nodes with a replica of the block are running the maximum number of tasks. This
is very common for busy clusters!
 Input files are compressed with a non-splittable encoding such as gzip.
Disk locality provides low-latency I/O; however, this latency has very little effect on batch
operations such as MapReduce.
Today, a non-blocking 10 Gbps switch (up to 2500 MB/sec full duplex) can provide more
bandwidth than a typical disk subsystem with 8 disks (600 - 1200 MB/sec).
We are no longer constrained to maintain data locality in order to provide adequate I/O
This gives us much more flexibility in designing a cost-effective and feature-rich Hadoop
Isilon provides rack-locality to keep data flow internal to the rack.
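The bandwidth comparison above can be checked with a short calculation. The sketch assumes decimal units (1 Gbps = 1000 Mbps, 8 bits per byte) and ignores protocol overhead; the disk figures are the 600 to 1200 MB/sec range quoted in the text.

```python
def gbps_to_mb_per_sec(gbps):
    """Convert a link speed in Gbps to MB/sec, using decimal units and no overhead."""
    return gbps * 1000 / 8

one_way = gbps_to_mb_per_sec(10)   # one direction of a 10 Gbps link
full_duplex = 2 * one_way          # both directions combined

disk_low, disk_high = 600, 1200    # typical 8-disk subsystem, per the text

print(f"10 Gbps one way: {one_way} MB/sec, full duplex: {full_duplex} MB/sec")
print(f"Exceeds top of disk range even one way: {one_way > disk_high}")
```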

Cluster Sizing


On the slide are the questions and calculations used when determining cluster size.

Hadoop Sizing – Compute Requirements


Compute sizing is based on the bandwidth required for Hadoop job performance.

The slide has typical questions to ask when computing the requirements.
Calculate effective HDFS read bandwidth per compute server, which is generally around 50
MB/sec per CPU core.
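As a rough worked example of the rule of thumb above, the sketch below multiplies the core count by 50 MB/sec and derives how many compute servers a target read bandwidth would need. The 16-core server and the 10,000 MB/sec target are hypothetical inputs.

```python
import math

MB_PER_SEC_PER_CORE = 50  # rule of thumb from the text

def server_read_bandwidth(cores):
    """Effective HDFS read bandwidth of one compute server, in MB/sec."""
    return cores * MB_PER_SEC_PER_CORE

def servers_needed(required_mb_per_sec, cores_per_server):
    """Compute servers needed to reach the required aggregate read bandwidth."""
    return math.ceil(required_mb_per_sec / server_read_bandwidth(cores_per_server))

# Hypothetical inputs: 16-core servers and a 10,000 MB/sec aggregate requirement.
per_server = server_read_bandwidth(16)
servers = servers_needed(10_000, 16)
print(per_server, servers)
```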

Hadoop Sizing – Finalize


Here we have the final steps when you're determining size.

A good tip is to use Google to convert units. Remember, though, that Google uses binary
units, not SI (decimal).
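The gap between binary and decimal units matters at this scale, as the following sketch shows. The helper names are illustrative; the constants are the standard definitions (1 TB = 10^12 bytes, 1 TiB = 2^40 bytes).

```python
TB = 10**12   # decimal (SI) terabyte, in bytes
TIB = 2**40   # binary tebibyte, in bytes

def tib_to_tb(tib):
    """Convert binary tebibytes to decimal terabytes."""
    return tib * TIB / TB

def tb_to_tib(tb):
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB

# A figure reported as 100 in binary units is about 110 in decimal units,
# and 100 decimal terabytes is only about 91 in binary units.
print(round(tib_to_tb(100), 2))
print(round(tb_to_tib(100), 2))
```

The difference is close to 10% at the terabyte level, so it is worth confirming which unit each input to the sizing calculation uses.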

Lesson 5: Life Sciences


In this lesson, we will talk about how Isilon is used within the Life Sciences industries.

Use Cases


Many Life Science workflows have common requirements, so a solution developed for
one customer can often apply to other customers within the same category. Connecting a
potential customer in the agricultural sector with a medical clinic, where both produce and
use data in similar ways despite very dissimilar objectives, can have a better effect
than connecting them with a sequencing service provider, where the overall production and
flow of data is so different.
Shown here are some of the Life Sciences customer categories. Service Providers provide
bulk data acquisition as a service to other customers. Discrete Clinical are typically university
hospitals with a limited application of something like NGS or genotyping for a very specific
patient demographic. Commercial Research are pharmaceutical companies or biotechs
developing new drugs, treatments and therapies. Non-Commercial Research such as
institutes, universities, and/or research hospitals perform research without an immediate
commercial objective.
Customers within different opportunity areas explore certain fields of study using a variety
of approaches. The way customers use data (not what data or how they obtain data) is the
best way to understand them.

Industry Terminology


These terms, and the rest of the presentation, focus primarily on Next Generation
Sequencing (NGS), which accounts for the majority of the work done in Life Sciences (LS) today.
Life Sciences is a collection of areas whose application is focused on the study of living
systems to develop useful products, treatments, and services in fields such as
pharmaceuticals, healthcare, and agriculture.

Opportunity Areas


Here is an overview of the Life Sciences opportunity areas.

Industry Example: J. Craig Venter Institute


J. Craig Venter Institute (JCVI) is a not-for-profit genomic research institute and a worldwide
leader in genomic sciences. It focuses on various projects, including sequencing and analyzing viral
isolates and other infectious disease-causing microbes. Its challenges included the
significant time and resources required to manage traditional storage, and the inability of
its systems to scale effectively to meet the demands of JCVI’s workflow. The EMC Isilon
solution allowed seamless scaling of capacity and performance in lock-step by simply adding
nodes to the cluster on the fly. It eliminated the cost and complexity barriers of traditional
storage architectures, and streamlined data access and analysis, increasing workflow
Eddy Navarro, storage team lead at JCVI, notes, “With Isilon, we have the flexibility to ‘pay as
we grow,’ allowing us to expand our storage only as need dictates, while still maintaining the
high-performance necessary to power our data-intensive workflow and advance our research.”

Industry Example: Takara Bio


Takara Bio is an innovative biotechnology research and development company based in
Shiga, Japan. The company has active research and product development activities in the
fields of gene and cell-based therapy, as well as agricultural biotechnology, and is committed
to preventing disease and improving quality of life through the use of biotechnology. The
Dragon Genomics Center is the largest genome analysis institution in Asia and is involved in
developing various contract research services including human genome analysis, genetic
engineering investigations, and custom DNA and RNA synthesis.
Their challenges included the ability of the infrastructure to scale to meet the required
capacity which had a direct effect on the center’s ability to provide competitive services to
The solution needed to meet the demands of increased data output volume, reduce system
processing times, improve speed of results delivery, and ensure that storage is highly
Takara selected the EMC Isilon X200 and NL400 nodes. Takara Bio uses the Isilon X200 for
the analytical servers and for data generated by the genome sequencers, and the NL400 is
used for archiving in order to provide quick access to client data and less frequently
accessed data sets.
The solution resulted in the Dragon Genomics Center reducing its delivery times by at least
50%, enabling it to provide highly competitive research services.

Follow the URL on the slide for more details about this case study.

Industry-wide Challenges


What should a data storage solution provide in Life Sciences? For data volume, begin at the
PB scale. The solution must be scalable. LS data analysis has high performance
requirements in research. Also, LS data transfer has high throughput requirements for both
research and clinical spaces. Regarding data transfer, LS data requires replication to many
collaborators, labs and shared sites. Interoperability with transfer and data managers is
essential. Clinical LS systems need to be 100% reliable. Institutions using LS data require
long term archival for regulatory compliance.

Isilon Challenges


Perhaps the key challenge is understanding the general category to design a solution for.
Knowing what questions to ask becomes significant to determine this. What data will they
generate and where will it come from? This could be internal, external, or both. Who or what
will access the data? We need to understand the compute resources and infrastructure in
use as well as understand the organization of researchers and users. How and/or when will
data move? We should know if they have explicit archive requirements and if any activity is
Research what was implemented at other customers in the same category, and then understand
the underlying technology in order to tune the system and configuration.

Cluster Sizing: Performance


 LS workflows are not typically metadata access intensive. Data is typically staged
before analysis.
 More performance gains can be had by increasing the amount of cache available to
nodes within the cluster.
 Configuring SmartPools is recommended to ensure that the HPC cluster has
exclusive access to the X or S-Series storage tier.
 Instruments: normally communicate via CIFS/SMB and bulk upload their data.
 Node memory and processor configurations should be optimized to support heavy
job scheduler/autobalance activity.

Cluster Sizing: Capacity


On the slide are the calculations used to determine the capacity for cluster sizing.

Design Solution


Optimizing the solution for HPC environments is key. It is by way of the HPC system that
most users in LS will touch the storage environment and where their impressions of storage
quality will be formed.

Lesson 6: Healthcare


This lesson introduces you to how Isilon is typically used in the Healthcare industry.

Industry Terminology


On the slide are common terms used in the Healthcare industry.

Overview: Healthcare Workflow


PACS stands for Picture Archiving and Communication System. It's a system that stores
medical images and other study metadata for clinical imaging studies. It replaces traditional
film and light boxes with digital imaging on high-resolution monitors. These digital images
are stored using the DICOM standard format.
It is important to note that PACS can generate a wide range of file sizes. Computed
radiography studies can generate up to five images at five to ten megabytes each, whereas
a computed tomography study might generate up to 2000 images at 250 kilobytes apiece.
Different vendors also use different compression levels, meaning that the same modality
may generate differently sized files from vendor to vendor.
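To make the contrast between the two modalities concrete, the sketch below works out the per-study totals from the figures quoted above. It assumes decimal units (250 KB = 0.25 MB) and uses the upper-bound image counts; `study_size_mb` is an illustrative helper.

```python
def study_size_mb(images, mb_per_image):
    """Total size of one imaging study, in MB."""
    return images * mb_per_image

# Computed radiography: up to 5 images at 5 to 10 MB each.
cr_low = study_size_mb(5, 5)
cr_high = study_size_mb(5, 10)

# Computed tomography: up to 2000 images at 250 KB (0.25 MB) each.
ct = study_size_mb(2000, 0.25)

print(f"CR study: {cr_low} to {cr_high} MB in 5 files")
print(f"CT study: {ct} MB in 2000 files")
```

A CT study at these upper bounds is roughly ten times larger than a CR study, but spread across 400 times as many, much smaller, files.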
When we discuss Isilon sizing, we will discuss file size variables and their impact on cluster

Workflow Storage Requirements


There is generally a PACS database, as well as short-term, long-term, and disaster recovery
storage tiers. Typically, Isilon is used mainly for the long-term and disaster recovery tiers.
In some cases, Isilon can also be used as the short-term tier. Isilon is not suitable for the
PACS database, since it tends to be a relational database that performs better on a block
storage solution such as VMAX or VNX.

Compliance Requirements


HIPAA, the Health Insurance Portability and Accountability Act, allows people to keep their
insurance coverage when they change or lose a job. It also mandates industry-wide
standards for health care information, billing services and other processes, and requires
protection and confidential handling of protected health information (PHI).
HITECH, the Health Information Technology for Economic and Clinical Health Act, provides
$25.8 billion for health information technology investments and incentive payments.
Adoption and Meaningful Use (MU) of Electronic Health Records allows eligible professionals
to receive up to $44,000 through the Medicare EHR Incentive Program and $63,750 through the
Medicaid EHR Incentive Program. Doctors who do not adopt EHR by 2015 will be penalized
1% of Medicare payments, up to 3% over three years.

Industry Example: Children’s Hospital Boston


Children's Hospital in Boston is the world’s largest pediatric research center with thousands
of researchers focused on developing treatments and caring for sick children.
Some of the challenges the hospital faced were that it had hit the storage limits of its
traditional SAN and had unpredictable data growth. Traditional storage was cost-prohibitive
and incapable of keeping pace with data growth.
After implementing the Isilon solution, data management was simplified by unifying
file-based applications onto the cluster. SyncIQ improved data reliability and eliminated the
impact of data on overall IT backup operations.
Follow the URL on the slide for more details about this case study.


Kyoto University


At Kyoto University, the Diagnostic Imaging and Nuclear Medicine Department conducts a
wide range of research for a variety of disciplines, for example, nuclear medicine, positron
emission tomography, magnetic resonance imaging and device development.
One of the main challenges faced by the department was that all the medical imaging was
centrally managed in a work system called PACS and stored for five years. Also, there was an
enormous amount of image data used in clinical studies that must be stored because
verification data from all cases is analyzed and researched separately from the PACS.
After implementing Isilon, the department has now unified its ultrasonic tomography,
retinal tomography and MRI image data in a single, highly scalable, high performance,
shared pool of storage. Unification of imaging data into a single file system and single
volume increased the efficiency of the research process.
For more information about this case study, search online for: press release "Leading
Japanese University Powers Diagnostic Imaging Research With Isilon IQ"


Industry-wide Challenges


In today's healthcare environment, there are several challenges with which IT departments
must grapple. First is the increase in the volume and types of sensitive patient information
being generated. This data is growing at about 35.3% per year, and is expected to reach
13.5 exabytes by the year 2015.
Looking at the chart on the right, you can see about two-thirds of this information is made
up of medical imaging, nonclinical imaging, and other file-based workloads. These are areas
where Isilon is a great fit.
Further, this data is typically stored in silos for each department and application. There are
many new and changing regulatory requirements, which IT departments must adopt. We
discussed HIPAA and HITECH, but there are many more.
There are also varying data retention periods for the types of data being stored. Some data
only needs to be maintained for seven years; whereas, other data is maintained for the
lifetime of the patient. All of these challenges make the healthcare IT environment very


Isilon Strengths


What is a VNA? A vendor neutral archive (VNA) consolidates the individual archives associated
with departmental PACS. All image and non-image data are stored in a non-proprietary format
on an enterprise-class data management system.
Some of the VNA benefits include the ability to switch PACS systems without complex and
expensive migrations, a common viewer to view images across departments, the ability to
consolidate existing applications or bring new imaging applications online, and the ability to
own your image data.


Isilon Challenges


OneFS mirrors files smaller than 128 KB, and the impact of protection overhead tapers off as
file size approaches 1 MB. Institutions with a large percentage of CT/MRI studies have a large
number of smaller image files, requiring increased cluster capacity.
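As a rough illustration of why many small files raise the capacity requirement, the sketch below estimates protected size under a deliberately simplified model. The 128 KB mirroring threshold comes from the text above; the mirror count of 3 and the 25% parity overhead for larger files are assumptions chosen only for illustration, and the gradual taper toward 1 MB is not modeled.

```python
# Sketch only: mirror_copies and fec_overhead are illustrative assumptions,
# not values taken from the course material.
SMALL_FILE_LIMIT = 128 * 1024  # files below this size are mirrored (from the text)


def protected_size(file_bytes, mirror_copies=3, fec_overhead=0.25):
    """Estimate on-disk bytes for one file under a simplified model."""
    if file_bytes < SMALL_FILE_LIMIT:
        # Mirrored: every copy stores the full file.
        return file_bytes * mirror_copies
    # Larger files: data plus a fixed fraction of parity (a simplification).
    return file_bytes * (1 + fec_overhead)


def capacity_multiplier(file_sizes, **model):
    """Ratio of protected bytes to logical bytes for a set of files."""
    logical = sum(file_sizes)
    protected = sum(protected_size(size, **model) for size in file_sizes)
    return protected / logical


# Hypothetical data sets: many 64 KB image slices versus a few 10 MB files.
small_images = [64 * 1024] * 10
large_files = [10 * 1024 * 1024] * 2
small_ratio = capacity_multiplier(small_images)
large_ratio = capacity_multiplier(large_files)
```

Under these assumptions the small-image set consumes several times its logical size, while the large-file set consumes only modestly more, which is the effect to account for when sizing a cluster for imaging workloads.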


Relevant OneFS Features in Healthcare


On the slide are OneFS features along with how they're typically used in an Isilon solution.


Cluster Sizing


On the slide are general guidelines and questions to keep in mind when sizing a cluster for


Lesson 7: Oil and Gas


In this lesson, we'll talk about using Isilon in the Oil & Gas industry.


Overview: Oil and Gas


Over the last several years, the Oil & Gas industry has made the decision to embrace new
technology in an attempt to overcome key challenges, such as the need to identify ways to
locate new hydrocarbon deposits quickly, extend the life of existing fields by improving
recovery techniques, and manage the exponential growth of demands in data and analytics.
Of the three main sectors of Oil & Gas, Upstream encompasses locating and extracting
hydrocarbons, and is where the biggest investments are being made. Without success in
locating and extracting hydrocarbons, Midstream and Downstream operations are useless.
To EMC, Upstream alone represented a huge $1.6B market opportunity in 2014.
1. Isilon Continues to See Strong Growth in the Oil & Gas Market, Isilon INSIGHT, by George
Bennet, Sept 2014.
2. IDC Energy Insights: Worldwide Oil and Gas IT Strategies, Chris Niven, Apr 2016.


Sectors and Terminology


In general, companies in the Oil & Gas industry operate in three main sectors. This slide
shows those sectors along with the industry terminology for the hydrocarbon value chain.
Upstream: While there is a pressing need for transformation across the industry, the
Upstream sector - which is primarily focused on subsurface exploration, discovery and
extraction of resources - is by far the area where companies are making the biggest
investment in technology relevant to EMC Isilon.
Midstream: The changes in the Upstream sector, including extending the life of existing
fields, deeper drilling and the rise of unconventional plays, are having an effect on the
Midstream sector, which mainly involves the transportation and storage of crude products. The complex
logistics surrounding transport systems, such as pipeline, barge, rail, oil tanker or truck, is
driving businesses to optimize operations and to decrease costs through the
implementation of smart technologies, which heavily use real-time data capture and
analytics, workflow automation, and enhanced collaboration through mobility.
Downstream: The Downstream sector focuses on the refining of crude oil and raw natural
gas, and the retail (marketing and distribution) operations of resulting products such as
gasoline/petrol, diesel, natural gas, aviation fuel, lubricants, liquefied petroleum gas (or LPG)
and asphalt. The increasing global demand for energy, particularly in developing countries,
places pressure on logistics management here too.


Typical Requirements


The table on the slide describes typical requirements for the Oil & Gas industry.


Industry Example: Arcis Corporation


Arcis Corporation is a provider of 2D and 3D seismic data processing and geotechnical
services to the oil and gas industries, and has an extensive seismic data library.
The company needed to quickly and accurately analyze massive amounts of Big Data. Their
existing storage systems were not able to meet data requirements, which meant timely and
valuable analysis was severely compromised.
The benefits of installing Isilon were the unification of 40 disconnected data silos into a
single file system, rapid access to a vast archive of seismic data that opened new lines of
business, and a reduction in the sample analysis iteration cycle from eight weeks to
three weeks.
Rob Howey, Senior VP at Arcis Corporation, said “Our ability to quickly collect, organize, and
analyze seismic data is a critical business enabler that our customers rely on to make
accurate exploration decisions. Without Isilon, we simply would not be able to perform and
deliver such rich analysis to our growing customer base. Isilon not only worked as advertised,
but delivered a wealth of value beyond what we expected.”
For more details about this case study, follow the URL on the slide.
Old Press Release


Provider of Seismic Processing For Oil and Gas Industry Leverages Clustered Storage to
Achieve 10-Fold Increase in Delivery of Seismic Data and Services
SEATTLE, WA - November 13, 2006 - Isilon Systems, the leader in clustered storage, today
announced that Arcis Corporation has deployed Isilon IQ to power its entire seismic data
processing operations. Using Isilon IQ, powered by Isilon’s OneFS operating system software,
Arcis has unified its vast stores of seismic data into one, easily scalable and shared pool of
data and met the concurrent data performance requirements of its 850-node clustered
computing farm. By combining Isilon clustered storage with its high-power clustered
computing architecture, Arcis has achieved new breakthroughs in the search for oil and gas
reserves, accelerating time to results by 10-fold and enabling the company to undertake
massive seismic processing projects that were previously unattainable.
“In the ever-intensifying and expanding search to discover new, extractable sources of oil
and gas, rapid and accurate analysis of seismic data is the key to gaining a competitive
advantage in this high-stakes race and feeding massive consumer demand,” said Rob Howey,
Senior VP, Arcis Corporation. “In order to maximize the value of our vast library of seismic
data and deliver the deepest and most accurate analysis possible to our customers, we
require storage with unparalleled concurrent performance, ease of use, simplicity and
scalability. Isilon IQ delivered as advertised.”
Prior to adopting Isilon IQ, Arcis Corporation had deployed a number of disparate Direct
Attached Storage (DAS) systems to manage their ever-increasing amounts of data. These
systems could only scale to one terabyte per file system, resulting in the creation of 40
separate volumes or silos of data. These legacy systems were extremely difficult to manage
and maintain and did not provide the concurrent data throughput required to keep pace
with the more than 30 users accessing the performance-deprived clustered computing farm.
In contrast, Isilon clustered storage has enabled Arcis to easily scale its storage, unify vital
seismic data into one file system, feed its 850-node clustered computing farm the
concurrent data throughput it requires, and increase performance of its seismic applications,
thereby dramatically improving project turnaround time and enabling Arcis to pursue larger
and more profitable contracts.
“With Isilon IQ, Arcis Corporation is able to maximize the power of clustered computing and
unify all of its raw seismic data into one, easy to manage, single pool of shared data,” said
Brett Goodwin, VP of Marketing & Business Development, Isilon Systems. “Arcis’ ability to
leverage the cutting-edge combination of clustered computing and clustered storage into
true business breakthroughs - such as completing projects that used to take eight weeks in
five, with an even higher degree of analysis - truly demonstrates the power and wide
enterprise applicability of these technologies.”
The acquisition and processing of raw seismic data is an intense process involving massive
land and marine projects using both Pre-Stack Time Migration and Pre-Stack Depth
Migration - complex, 24x7 seismic data analysis processes. First, a project site is selected and
thousands of sensory devices (geophones) are arranged on the surface in a grid to record
the output of a controlled blast or detonation. When the charge is detonated, the geophones
trace the blast by collecting time-series data, sampled at 2 milliseconds for a period of up to
8 seconds or more of listening time. This measurement process can result in hundreds of
billions of samples and hundreds of millions of traces. Arcis conducts these survey projects
independently, as well as processing data from customers’ surveys, resulting in tremendous
amounts of raw seismic data files up to terabytes in size. With Isilon IQ, Arcis has been able
to unify all its data into one easy to manage, single shared pool of storage, maximize the
power of its clustered computing farm to achieve deeper, faster analysis and advance its
business to the next level.
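The scale of a survey like the one described above can be checked with simple arithmetic. The sketch below uses the 2 millisecond sample interval and 8 second listening time from the press release; the trace count of 100 million and the 4 bytes per sample are assumptions picked only to make the example concrete.

```python
def samples_per_trace(listen_seconds, sample_interval_ms):
    """Number of samples one geophone trace records."""
    return (listen_seconds * 1000) // sample_interval_ms


def survey_samples(trace_count, listen_seconds=8, sample_interval_ms=2):
    """Total samples across all traces in a survey."""
    return trace_count * samples_per_trace(listen_seconds, sample_interval_ms)


# Hypothetical survey: 100 million traces, 4 bytes stored per sample.
TRACES = 100_000_000
BYTES_PER_SAMPLE = 4  # assumption, not stated in the source

per_trace = samples_per_trace(8, 2)
total = survey_samples(TRACES)
raw_terabytes = total * BYTES_PER_SAMPLE / 1e12
```

With these inputs each trace holds a few thousand samples, the survey totals hundreds of billions of samples, and the raw data lands in the terabyte range, which is consistent with the figures quoted in the press release.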
Isilon IQ delivers the industry’s first single file system that unifies and provides instant and
ubiquitous access to the rapidly growing stores of digital content and unstructured data,
eliminating the cost and complexity barriers of traditional storage architectures. OneFS 4.5 is
a unified operating system software layer that powers all of Isilon’s award-winning IQ family
of clustered storage systems including the Isilon IQ 1920, 3000, 6000, Accelerator, and EX
6000, which are available immediately.

Industry Example: PetroChina


PetroChina, the publicly-traded arm of China National Petroleum Corporation, executes a
range of geological and geophysical research and development throughout China, focused
on identifying, extracting, producing and distributing oil and gas resources around the world.
Prior to Isilon, PetroChina's traditional storage systems couldn't keep pace with the
organization's exponential data growth and intense performance demands, slowing
workflow productivity and escalating operating costs.
By deploying Isilon scale-out NAS, PetroChina was able to consolidate a variety of
mission-critical applications on a single file system and point of management. This streamlined their
Big Data management allowing them to accelerate time-to-discovery in oil and gas
exploration. Additionally, their scientists now have immediate, highly concurrent access to
seismic data and applications, which improves collaboration and operating efficiency.
For more details about this case study, follow the URL on the slide.

Industry-wide Challenges


One challenge in the Oil & Gas industry is ensuring safe operations, along with minimizing
the risk and impact to the environment. The key to addressing this challenge is by collecting
data that provides the real-time status of every part of operations, and being able to apply
historical information that can help optimize operations. By learning from similar past
scenarios, more accurate predictions can be made about what may happen so that
corrective actions can be taken with little disruption and cost.
Some of the other challenges faced in the industry today include the difficulty during
Upstream activity in identifying just where viable reservoirs of oil & gas may lie in the
subsurface, logistics complexity in Midstream for the transportation of recovered
hydrocarbons, and the management of plant operations in a safe and efficient manner in
Downstream operations.


Industry-wide Challenges (cont'd)


The table on the slide highlights Oil & Gas industry challenges and how they can be


Design Solution


The Isilon scale-out data lake is an ideal platform for handling data diversity using a
consistent and unified approach that is easy to implement and manage. One major benefit
of using Isilon with Hadoop is the ability to use existing and known file protocol mechanisms
for data management and ingestion instead of Hadoop-specific mechanisms that require
specific application-level modifications. Using native protocols enables in-place analytics,
which eliminates migrations, making data workflows faster and helping businesses to gain
faster insights from their data.
Following an October 2014 Lab Validation of EMC Isilon Scale-Out Data Lake, IDC stated that
businesses will find it easy to build out workflows using the EMC Isilon Scale-Out Data Lake
because:
• It enables the use of existing and known file protocol mechanisms (instead of
Hadoop-specific mechanisms that require specific application-level modifications)
• Its performance optimization capabilities make it an ideal platform for enterprise-wide
data storage/analytics with a centralized storage repository
• The use of native protocols enables in-place analytics (eliminates migrations), makes
data workflows faster and helps businesses to gain faster insights
With EMC Isilon, possible data ingest protocols include NFS, SMB and HDFS, which cover a
wide range of data types used across the Hydrocarbon Value Chain. When you leverage the
commodity storage economics of the ECS appliance and the software-defined storage benefits
of ViPR, EMC provides a comprehensive solution for building the Exploration & Production
Data Lake.
Below are some technical details on the benefits of an HDFS-enabled Isilon.
Leveraging HDFS-enabled Isilon storage, you can easily consolidate Hadoop data into a
single, scale-out storage array. Not only do you eliminate multiple copies of data, but data is
shared for better and faster insight. Leveraging Isilon you have the additional benefits: see
• Scale compute and data independently - over 80% storage capacity usage - Dell EMC
Hadoop solutions can also scale easily and independently. This means if you need to
add more storage capacity, you don’t need to add another server (and vice versa).
With EMC Isilon, you also get the added benefit of linear increases in performance as
the scale increases.
• Automatically HDFS-enable existing data - no ingest necessary.
• Ease of import and export, and support for multiple applications via multi-protocol
support - HDFS, NFS, CIFS, HTTP, FTP. This means that with EMC Isilon storage, you
can readily use your Hadoop data with other enterprise applications and workloads
while eliminating the need to manually move data around as you would with
direct-attached storage.
• Fault-tolerant, end-to-end data protection - Isilon also eliminates the
“single-point-of-failure” issue by enabling all nodes in an EMC Isilon storage
cluster to become, in effect, namenodes. This greatly improves the resiliency of your
Hadoop environment. The EMC solution for Hadoop also provides reliable, end-to-end
data protection for Hadoop data, including snapshotting for backup and recovery
and data replication (with SyncIQ) for disaster recovery.
• Rapid deployment with the EMC Hadoop Starter Kit (HSK) leveraging VMware vSphere - HSK
enables IT to quickly and easily deploy a Hadoop cluster and provide Hadoop as a
Service to the business in order to support Big Data projects. Through VMware
vSphere, a virtual Hadoop cluster can be deployed using Isilon shared storage in just
a few short hours with simple downloads of free software and documented
configuration steps leveraging automation provided by VMware Big Data Extensions.
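To make the "no ingest necessary" point in the list above concrete, the sketch below shows how a file already stored on the cluster could be addressed by a Hadoop client without being copied. It assumes that a directory on the cluster is designated as the HDFS root, so that HDFS paths are relative to it. The function name, the example paths, the host name and the port 8020 default are all hypothetical and chosen for illustration only.

```python
from pathlib import PurePosixPath


def hdfs_uri(onefs_path, hdfs_root, host, port=8020):
    """Map a cluster file path to the HDFS URI a Hadoop client would use.

    Raises ValueError if onefs_path is not under hdfs_root.
    """
    relative = PurePosixPath(onefs_path).relative_to(hdfs_root)
    suffix = "/".join(relative.parts)  # empty when the path is the root itself
    return f"hdfs://{host}:{port}/{suffix}"


# Hypothetical example: a seismic file written over NFS or SMB, read via HDFS.
uri = hdfs_uri(
    "/ifs/data/hadoop/seismic/survey1.segy",
    "/ifs/data/hadoop",
    "cluster.example.com",
)
```

The same bytes are reachable through either protocol, so the choice of path is a naming question rather than a data movement step.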


Cluster Sizing


There is no single formula for sizing in this industry, because of the wide variety of potential
avenues for EMC engagement. Therefore the guidance here is more generic and could be
applied to any customer.
Sizing for Oil & Gas companies starts with identifying the key players, their priorities, and the
goals they're trying to accomplish. It's important to understand the primary workflow and
the steps taken from beginning to end.
Getting a detailed understanding of the technologies in use and specific criteria for success
from a storage purchase will be the key in these engagements.


Lesson 8: Financial Services


This lesson provides information about how Isilon is used in the Financial Services industry.


Overview: Financial Services


When people think of the Financial Services Industry, or FSI, they typically think of banks. In
reality, FSI consists of many types of businesses as listed on this slide.
The FSI market has many interesting aspects, including the diverse set of workloads. At the
core, FSI companies have all of the Enterprise IT workloads you would expect, such as file
shares, home directories, backup, archive, etc. But they also may have many other
workloads including Tick Data analysis, big data analytics, video surveillance, audio recording
as well as other specialized applications. Isilon is designed to handle a wide variety of
workloads along with a diverse set of applications and use cases, which is one of the reasons
Isilon is such a great fit for this vertical.
Regulatory compliance is another huge factor in FSI. A lot of the data needs to be protected,
encrypted or access audited to comply with regulations. This is another reason Isilon is such
an obvious choice for FSI.
We will discuss this in more detail as we go through this lesson.


Typical Requirements


On the slide are key requirements for the Financial Services industry.


Industry Terminology


This table has terminology commonly used in the Financial Services industry.


Financial Services Workflows


On any given day, a financial institution’s infrastructure faces numerous workload challenges,
with data pouring into its IT environment from various platforms and protocols. While the
industry does have traditional workflows, including home directories and file shares
(represented on the left in the diagram), FSI also has next generation workflows, such as Big
Data, mobile and cloud (represented in the diagram on the right).
In many cases, the storage infrastructure that supports these applications and workloads is
siloed. This can result in islands of storage that are inefficient and complex to manage, and
can also create hotspots and single points of failure.
Isilon addresses this challenge by offering customers what we call the Isilon scale-out data
lake. This will be the one place to store, manage, protect and secure all unstructured data
within an enterprise, regardless of whether it is used by traditional workflows or next
generation workloads.


Compliance Requirements


Listed here are some major sources of regulatory requirements in the Financial Services
industry. The names may be familiar to you, but you can see the Student Guide for more
The Sarbanes-Oxley Act of 2002 (often shortened to SarbOx or SOX) is legislation
passed by the U.S. Congress to protect shareholders and the general public from accounting
errors and fraudulent practices in the enterprise, as well as improve the accuracy of
corporate disclosures. The U.S. Securities and Exchange Commission (SEC) administers the
act, which sets deadlines for compliance and publishes rules on requirements.
SEC 17a-4 is a regulation issued by the US SEC pursuant to its regulatory authority that
outlines requirements for data retention, indexing, and accessibility for companies that
deal in the trade or brokering of financial securities such as stocks, bonds, and futures.
According to the rule, records of numerous types of transactions must be retained and
indexed on indelible media with immediate accessibility for a period of two years, and with
non-immediate access for a period of at least six years. Duplicate records must also be kept
within the same time frame at an off-site location.
The Federal Information Security Management Act (FISMA) of 2002, enacted as Title III of the
E-Government Act of 2002, is an Act to strengthen Federal Government information security,
including through the requirement for the development of mandatory information security
risk management standards.


Industry Example: Major US Bank


A large bank based in the USA, but serving an international customer base across the globe,
needed to improve their storage efficiency and their data center footprint. In addition to
these efficiency needs, they needed to integrate Hadoop into their system for analytics.
Isilon provided them with a solution consisting of two clusters: an active cluster with five
X-Series nodes and three NL-Series nodes, and an archive cluster with eight NL-Series nodes.
Their physical footprint improved by a factor of seven, and their storage efficiency by a
factor of three. Their environmental efficiency (power, cooling and so on) improved by 40%.
Follow the URL on the slide for more details about this case study.


Industry Example: Large International Bank


A very large bank in the USA needed to consolidate their storage systems after some
acquisitions. They replaced their NetApp filer infrastructure with Isilon for their regular
home directories and project storage. A total of 26 Isilon nodes combined into a data lake
that was so efficient and easy to manage that they could not only serve their internal
customers better, but also free up storage administrators to tackle other needs in their
Old Press Release
Application: 30,000 users! File shares for Aging NetApp Filer infrastructure used for home
shares, project space and application data, CommVault Backups.
Background on Customer: One of the 10 largest banks in the U.S., with approximately
26,000 employees and deep roots in the community dating back more than 150 years. The
Bank offers a broad array of retail, small business and commercial banking products and
services to more than 8 million customers through its extensive network of approximately
1,300 convenient locations throughout the Northeast, Mid-Atlantic, Metro D.C., the Carolinas
and Florida. In addition to banking products, the bank and its subsidiaries offer customized
private banking and wealth management services and vehicle financing and dealer
commercial services.
Customer Challenges: The bank embraced Isilon for their core corporate banking file
services. Acquisitions brought two separate file serving infrastructures to be managed by the
IT team. Both came with an aging NetApp infrastructure. The following requirements were
identified for the move to a standardized Isilon platform:
• Aging NetApp Filer infrastructure used for home shares, project space and
application data.
• SMB is the main protocol requirement.
• Snapshots and Replication for DR.
• 70% of 120TB is cold data.
• Tape for long term retention - Weekly full backup to TSM = 60-70 LTO tapes shipped
off site.
• 7 year retention = approximately 36 PB, currently with no expiration.
• Frees up several man days per month from storage management duties. Reduced
necessary FTE for storage admin from 6 to 2. Frees them up for other projects.
• Improved efficiency and ease of management including familiarity with the Isilon
platform as they add more Admins to manage new workloads and clusters.
• Ease of management and maintenance, which reduces the TCO associated with
growing file stores.
• Removed the need for 36 PB of offsite tape storage by leveraging NL nodes with
SyncIQ and SnapshotIQ.


Industry-wide Challenges


From these trends of mobility, social media, analytics and regulation, we’re seeing our FSI
customers experiencing new challenges with their infrastructure and this comes with new
Often, existing infrastructure is siloed by architecture, organization or policy. This can inhibit
the effective management and use of rapidly growing unstructured content.
And growth is coming from existing and new streams driven by real time, historical,
consumer data utilized for analytics.
As new mobility features are being requested and used, it creates new levels of risk, not only
in terms of security and compliance but also in the complexities of integrating and
supporting multivendor mobility solutions.
New regulations are being introduced all of the time, and failure to comply with government
mandates can be not only financially damaging but can also damage a financial company’s
reputation. Failure to be compliant with all appropriate requirements is not an option.
Is the customer experiencing any of these issues or any others?


Isilon Strengths


Some of the Isilon strengths are listed on this slide.


OneFS Features in Financial and Banking


Here are some OneFS features and how they can be used in an Isilon solution.


Cluster Sizing


The general sizing guidelines for the Financial Services Industry are similar to other verticals,
and are listed here along with questions to ask the customer.


Module 10: Competition


Upon completion of this module, you will be able to explain ROI and TCO, identify
marketplace competition, explain Isilon’s competitive edge, and apply that competitive edge
to a solution proposal.


Lesson 1: ROI and TCO Objectives


After completing this lesson, you should be able to discuss the differences between ROI and
TCO, and explain the benefits of linear scalability.


Overview: ROI and TCO


The first section in the competitive module covers Total Cost of Ownership (TCO) and the
financial benefits Isilon can provide for many customers. There are two key areas to a
financial benefits conversation: Total Cost of Ownership, or TCO, and Return on Investment,
or ROI.
TCO includes the capital expenses, often referred to as CAPEX, associated with acquiring the
hardware, software, related expenses for on-going expansion, services (both professional
and migration), and takes into account any trade-in credits and depreciation. Operational
Expenses, or OPEX, include items such as people (salaries), support, power, cooling, and
floor space. Productivity refers to reduced downtime and improved application performance.
ROI should use all the components of TCO and compare the funding of an Isilon solution to
a competitor's solution over some period of time, usually 3 to 5 years. ROI can be calculated
for both cash purchases, as well as leases depending on the customer’s buying preference.
ROI can be an effective competitive differentiator as Isilon’s ease of use and storage
efficiency will translate into significant savings for many customers.
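The comparison described above can be sketched as a small calculation. This is a simplified model, not a pricing tool: it treats TCO as CAPEX plus a flat annual OPEX over the comparison period and ignores depreciation, trade-in timing, lease terms and productivity effects. All dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    capex: float        # hardware, software, services, net of trade-in credits
    annual_opex: float  # people, support, power, cooling, floor space

    def tco(self, years):
        """Total cost of ownership over the comparison period."""
        return self.capex + self.annual_opex * years


def relative_savings(candidate, alternative, years):
    """Fraction of the alternative's TCO saved by choosing the candidate."""
    saved = alternative.tco(years) - candidate.tco(years)
    return saved / alternative.tco(years)


# Hypothetical figures for a five-year comparison.
option_a = Solution(capex=1_000_000, annual_opex=100_000)
option_b = Solution(capex=1_200_000, annual_opex=200_000)
savings = relative_savings(option_a, option_b, years=5)
```

Running the same comparison at 3 and at 5 years shows how much of the difference comes from recurring OPEX rather than from the initial purchase.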




The question to ask when discussing ROI is: Does buying Isilon use funds more efficiently
than buying a competitor’s solution (over some period of time)? When potential solutions
compete (other factors may or may not be equal), the investment with the higher ROI is
often considered the better choice. Using ROI as a component of the sales process is an
effective competitive differentiator.


Capital Expense Drivers


The storage efficiency and usage rate charts on the slide explain, in general, how you should
deal with calculating storage usage or efficiency. There are two parts to each chart:
subtracting overhead and discretionary reserve will leave you with the actual useable
capacity. In the upcoming slides, we’ll look at the sources of overhead and how it affects
useable capacity and capital expense.
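The subtraction described above can be written out as a short calculation. The raw capacity, overhead and reserve values below are hypothetical placeholders, not figures from the slide.

```python
def usable_capacity_tb(raw_tb, overhead_tb, reserve_tb):
    """Usable capacity: raw capacity minus overhead and discretionary reserve."""
    usable = raw_tb - overhead_tb - reserve_tb
    if usable < 0:
        raise ValueError("overhead and reserve exceed raw capacity")
    return usable


def storage_efficiency(raw_tb, overhead_tb, reserve_tb):
    """Usable capacity expressed as a fraction of raw capacity."""
    return usable_capacity_tb(raw_tb, overhead_tb, reserve_tb) / raw_tb


# Hypothetical system: 500 TB raw, 75 TB overhead, 25 TB held in reserve.
usable = usable_capacity_tb(500, 75, 25)
efficiency = storage_efficiency(500, 75, 25)
```

Because capital expense is driven by raw capacity purchased, a lower overhead for the same usable target translates directly into less hardware to buy.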


Isilon Efficiency


In the IDC white paper, Quantifying the Business Benefits of Scale-Out NAS, IDC quantified
the benefits that companies realized from their deployments. IDC found that, on
average, the companies in its study were able to reduce their average annual cost per TB
used by 41%, achieving a total benefit of $1.6 million (or $812 per TB in use, as shown on the
slide). This report and analysis, comparing Isilon’s storage efficiency to traditional
scale-up NAS, depicts the reduction in cost per useable TB and identifies that Isilon makes
better actual use of the available capacity than competitive architectures, delivering 84%
storage usage and providing a CAPEX savings of 37%.


Sources of Overhead


The table on the slide compares and contrasts the sources of overhead between Isilon
systems and file system and RAID-based legacy NAS products, such as NetApp and
EMC VNX systems. Taking all the sources of overhead into account, Isilon offers a solution
with considerably less overhead.


Example: Clustered Scale-Up Usage


The chart displays volume used, with aggregate size gathered from a NetApp system’s
Autosupport data. The used percent is volume used divided by usable disk. For this
particular customer, only 37% of the capacity is used. This is due to the poor usage rate of its
volume-based architecture. While this may be an extreme example that highlights
inefficiencies, typical environments are at 50% to 60% usage.
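The used-percent figure described above is volume used divided by usable disk. The sketch below applies that ratio per volume and in aggregate; the three volumes and their sizes are hypothetical and are not the customer data shown on the chart.

```python
def used_percent(used_tb, usable_tb):
    """Used capacity as a percentage of usable capacity."""
    return 100 * used_tb / usable_tb


def aggregate_used_percent(volumes):
    """Overall usage across (used_tb, usable_tb) pairs, one per volume."""
    total_used = sum(used for used, _ in volumes)
    total_usable = sum(usable for _, usable in volumes)
    return used_percent(total_used, total_usable)


# Hypothetical volumes: one nearly full, two mostly empty.
volumes = [(9, 10), (2, 10), (1, 10)]
per_volume = [used_percent(used, usable) for used, usable in volumes]
overall = aggregate_used_percent(volumes)
```

The example shows how one volume can be close to full while the system as a whole is lightly used, because free space in one volume cannot absorb growth in another.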


What Causes Usage Difference?


For many customers, Isilon’s single file system and single volume architecture provide 80%
to 90% storage efficiency or usage. But remember, planning for usage greater than 80% can
affect performance and can leave too little free capacity to withstand a node loss. This is
also true for traditional scale-up NAS systems.
Competing clustered scale-up architectures, such as NetApp ONTAP c-mode, incur heavy
overhead due to their multiple volumes and the inefficiencies these create. As seen in the
previous slide, volume usage will vary. This causes performance hot spots and wasted free
space, which in turn means manual management of data to optimize performance and
capacity on an ongoing basis. Typical best-case usage for these environments is in the
50% to 65% range.
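
To see how the usage rate changes the capacity a customer can actually fill, the sketch below multiplies usable capacity by a usage rate. It is illustrative only: the 400 TB figure and the specific rates (0.80 and 0.60) are assumptions chosen from within the ranges quoted above, and the function name is hypothetical.

```python
def effective_capacity_tb(usable_tb, usage_rate):
    """Capacity that can realistically hold data at a given usage rate."""
    if not 0.0 < usage_rate <= 1.0:
        raise ValueError("usage_rate must be in the range (0, 1]")
    return usable_tb * usage_rate


usable_tb = 400.0        # assumed usable capacity for both systems
scale_out_rate = 0.80    # assumed, within the 80% to 90% range above
scale_up_rate = 0.60     # assumed, within the 50% to 65% range above

scale_out = effective_capacity_tb(usable_tb, scale_out_rate)
scale_up = effective_capacity_tb(usable_tb, scale_up_rate)
difference = scale_out - scale_up

print(f"Scale-out: {scale_out:.0f} TB, scale-up: {scale_up:.0f} TB")
print(f"Difference: {difference:.0f} TB")
```

Changing either rate shows how sensitive the comparison is to the usage assumption, so use measured rates from the customer's environment where they are available.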


Real Case: 400TB Usable


In this 400TB example, a customer requesting to purchase ~400TB of useable capacity will
find that the Isilon solution requires less raw capacity for the same amount of useable
capacity. After factoring in the usage rate, or storage efficiency, the customer has nearly
100TB more of actual capacity. This translates to reduced CAPEX and improved ROI when compared to


FUD: Compression, Dedupe, Thin Provisioning


Often, the competitive response from vendors, such as NetApp, is to point out the storage
efficiencies gained from compression or dedupe features found in ONTAP. While these
features can be beneficial for certain data sets, there are limitations to their effectiveness in
improving the economics of NetApp ONTAP cluster mode. These limitations include the
inability to add incremental capacity or benefit the entire cluster. They will only improve
storage efficiency for the controller(s) and volumes where the deduped data resides,
providing no global benefit to the cluster’s efficiency. Another way to look at it is that less
raw capacity is required for all data sets on the Isilon cluster, not just for the dedupe-able
data. This benefits organizations by reducing the overall CAPEX and TCO for an Isilon cluster.
Relying on dedupe or compression to offset the generally poor efficiency of ONTAP raises
additional issues: it does not eliminate the manual processes required to continually
balance and optimize capacity, and it adds overhead that can adversely affect application
performance.
Isilon added the deduplication feature in OneFS 7.1. Like NetApp's, Isilon's deduplication is
a post-process step that recovers space without slowing production operations, but it has the
benefit of deduplicating a much larger set of data at a time. Similarly, thin provisioning is
possible using Isilon's highly flexible quotas. By using the transparent scale-out process to
expand capacity without disruption, Isilon can provide a much larger upper capacity range.


Linear Scalability Reduces CAPEX and OPEX


Another key factor in Isilon's ability to offer improved TCO over competitive products is
its linear scalability. As nodes are added to the cluster, performance scales linearly and
predictably. Upon adding a new node, the system rebalances user I/O and data across all the
spindles and nodes in the cluster; this eliminates hot spots and ensures that all the disks
are optimized simultaneously. There is also no need to manually migrate data and users across
volumes and RAID groups to “balance out” the array and achieve optimal performance. Isilon
does it automatically!
All of these features add up to reducing both capital and operational expenses.


OPEX Management Metrics


Operational expenses and personnel costs associated with managing rapidly growing data
can be a significant burden to many organizations.
Isilon’s ease of use and automated processes are a key factor in allowing the management of
more data with fewer personnel than competitive offerings. Summarizing the results of
several analyst studies and customer experiences, Isilon storage admins manage 5x
more capacity than those managing competitive products. For example, in the diagram on
the slide, assumed traditional staffing is about 400TB per full-time employee, whereas with
the Isilon solution, a full-time employee can manage 2000TB.
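
The staffing comparison above can be turned into a quick headcount estimate. This is a rough sketch only: the two TB-per-employee figures come from the slide, while the 4000 TB environment and the function name are hypothetical.

```python
import math


def admins_needed(capacity_tb, tb_per_admin):
    """Whole number of full-time admins needed to manage a capacity."""
    if tb_per_admin <= 0:
        raise ValueError("tb_per_admin must be positive")
    return math.ceil(capacity_tb / tb_per_admin)


TRADITIONAL_TB_PER_FTE = 400   # from the slide
ISILON_TB_PER_FTE = 2000       # from the slide

capacity_tb = 4000             # hypothetical environment size

traditional_admins = admins_needed(capacity_tb, TRADITIONAL_TB_PER_FTE)
isilon_admins = admins_needed(capacity_tb, ISILON_TB_PER_FTE)
capacity_ratio = ISILON_TB_PER_FTE / TRADITIONAL_TB_PER_FTE

print(f"Traditional: {traditional_admins} FTEs, Isilon: {isilon_admins} FTEs")
print(f"Capacity managed per FTE: {capacity_ratio:.0f}x")
```

Rounding up with `math.ceil` reflects that a partial headcount still requires a whole person, which matters most for small environments.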


TCO Productivity


Productivity improvements can be measured by improved application uptime, increased
performance, and reduced application and user disruptions.
While these areas are a component of TCO, they are harder to quantify than OPEX and
CAPEX. Nevertheless, we’ve included them here for a complete look at TCO.


Reduced Downtime; Improved Productivity


With Isilon, adding capacity, adding performance, and consolidating applications are
simple, nondisruptive processes. Also, old hardware can be seamlessly removed from the
cluster and replaced by more current node varieties. A 2016 IDC Lab Verification Brief
reported that “a disk failure on a single node has no noticeable impact on the cluster.
Furthermore, the operation of replacing the drive is a seamless process and has little
administrative overhead, no different than enterprise disk storage systems.” Analyzing
the impact of a node failure, the brief also reported that “IDC validated that a single-node
failure has no noticeable impact on the cluster. Furthermore, the operation of removing a
node from the cluster and adding it back to the cluster is a seamless process…”
Based on analysis of IDC’s 2016 report, Isilon reduces downtime by 95% and
improves end-user productivity.


Reduced Big Data Storage TCO


IDC found that due to Isilon’s reduction in CAPEX and the increase in user and IT productivity,
customers saw a reduction in their storage costs by 41%.




The EMC Storage TCO Tool produces professional-quality reports that you can use to illustrate
the metrics in a business case presentation.
The summary results in the reports include high-level financial metrics for TCO, ROI, Internal
Rate of Return, Net Present Value, and CAPEX and OPEX savings. You can also view the results
for cash or lease purchase separately.
In addition, you can view the results by detailed cost categories, including power/cooling,
floor space, management costs, additional investments (Services, Training, Migration), or
financial considerations for remaining depreciation and/or credits and buybacks.
These reports can be created online with the web-based version of the EMC Storage TCO
Tool, or you can download the off-line model to Excel in order to access the Word and
PowerPoint reports.


EMC TCO Tool: Use Case


In this use case, one possible option available to this organization would be to continue to
expand the existing home directories storage environment by adding NAS filers and capacity
in order to support the growing performance and capacity requirements.
• Home directory infrastructure and management costs are straining IT budgets
• The organization currently manages a large amount of home directory data or expects
growth exceeding 20% CAGR
• There is an existing or foreseen problem in the form of storage inefficiencies, difficulty
achieving SLAs, or difficulty provisioning new storage
The business’s assessment of the existing storage environment, along with the necessary
acquisitions and administration costs to support the 20% CAGR requirement over a three-year
period, revealed the following requirements:
• Raw capacity required would exceed 980TB
• The cluster would need to be expanded from three to five filers
• Two full-time storage admins would be necessary
• Total expenditures would reach nearly $4M
• Total management operating costs would exceed $1M
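
The effect of a 20% compound annual growth rate over three years can be sketched as follows. The 500 TB starting capacity and the function name are hypothetical; the starting value is not derived from the 980TB figure in the use case.

```python
def project_capacity(start_tb, cagr, years):
    """Return required capacity for year 0 through the final year."""
    return [start_tb * (1 + cagr) ** year for year in range(years + 1)]


start_tb = 500.0   # hypothetical starting capacity
cagr = 0.20        # 20% CAGR, as in the use case
years = 3          # three-year period, as in the use case

projection = project_capacity(start_tb, cagr, years)
growth_factor = projection[-1] / projection[0]

for year, capacity in enumerate(projection):
    print(f"Year {year}: {capacity:.1f} TB")
print(f"Growth over {years} years: {growth_factor:.3f}x")
```

Because growth compounds, three years at 20% yields a factor of 1.2 cubed rather than a flat 60% increase.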


As an alternative to adding infrastructure to the existing storage environment, a
new EMC Isilon 570TB cluster solution is architected to replace the business’s current filers
and to form the infrastructure for growth in its future home directory storage requirements.
When compared to putting capital expense into expanding the existing installed storage, an
investment in the EMC Isilon solution over a three-year period is shown to provide significant
financial benefits, including:
• Capital savings exceeding $800,000 (30% savings)
• Total cost of ownership (TCO) is reduced by 46%
• Capacity managed per admin is increased 75%
• Disk usage increases from less than 60% to 80%+
• Ongoing storage management costs (salaries) decline 73%
The capital costs of deploying the Isilon infrastructure are offset within a four-month
This three-year cost of ownership study, derived from the EMC Storage TCO Tool,
demonstrates the fundamental financial savings of replacing the existing storage with an
investment in the EMC Isilon solution.

Lesson 2: Creating an Analysis



After completing this lesson, you will be able to perform an analysis using the TCO tool, and
access resources related to TCO.

TCO: Creating an Analysis


The new Quick Analysis view enables you to create a high-level analysis by entering the
information in one screen and leveraging the tool’s assumptions and calculations. Only a few
data points are required to create a full analysis.
After you input the customer's basic and analysis information, the tool displays the Analysis
configuration screen for selecting the EMC proposed arrays. On this page, you can select
whether or not to compare the EMC proposed solution against arrays and solutions from
other vendors.
Next, the EMC proposed storage solution page appears. Current Isilon node types can be
selected and added to the analysis using drop-down lists. The tool is not designed to size a
cluster for performance; that should be done outside the TCO tool. Prices are included
in the tool, as well as discount assumptions that you can modify.
You can use the tool's drop-down menus to add node types and quantities, and results will
include the amount of floor space and power calculations for the number and type of nodes



TCO: Isilon Proposed Solution


The TCO tool contains deep editing capabilities and allows you to modify the pre-loaded
assumptions to ensure that the nodes, drives, capacity and availability requirements are
accurately captured.


TCO: Customer’s Existing Environment


The TCO tool is most useful if you can include the information from the customer’s existing
environment or the proposed competitive product for a net new opportunity.
There are pre-loaded assumptions for NetApp and other products, which allow you to simply
add in the specific details.


TCO: Customer’s Growth Calculations


Based on the customer’s growth requirements, the TCO tool calculates the additional
capacity, nodes, rack and floor space over the specified period of time. It will maximize node
and rack capabilities, and add the additional items required to support the anticipated
growth rate.
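
The kind of growth calculation described above can be approximated by hand for a sanity check. The sketch below is not the TCO tool's actual logic. It assumes a hypothetical per-node capacity, starting capacity, and growth rate, and it counts only nodes, not rack or floor space.

```python
import math


def nodes_required(capacity_tb, tb_per_node):
    """Smallest whole number of nodes that covers the capacity."""
    return math.ceil(capacity_tb / tb_per_node)


def growth_plan(start_tb, cagr, years, tb_per_node):
    """Return (year, capacity_tb, total_nodes, nodes_added) per year."""
    plan = []
    previous_nodes = nodes_required(start_tb, tb_per_node)
    for year in range(1, years + 1):
        capacity = start_tb * (1 + cagr) ** year
        nodes = nodes_required(capacity, tb_per_node)
        plan.append((year, capacity, nodes, nodes - previous_nodes))
        previous_nodes = nodes
    return plan


# All inputs below are hypothetical illustration values.
plan = growth_plan(start_tb=300.0, cagr=0.20, years=3, tb_per_node=100.0)

for year, capacity, nodes, added in plan:
    print(f"Year {year}: {capacity:.1f} TB, {nodes} nodes (+{added})")
```

A hand estimate like this is useful mainly for checking that the tool's growth output is in the expected range for the inputs you entered.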


TCO Tool Output


The output of the tool is available in Excel, Word, and PowerPoint formats.
It includes detailed financial information and graphical data for the financial parameters,
including lease or buy. It also includes the operational expenses, such as the cost of labor,
to manage the proposed environment.


Accessing EMC Storage TCO Tool


You can access the EMC Storage TCO Tool by going to the link on the slide
( Channel partners may need to request their user
names and credentials for this tool by sending an email to


Easy Access to Information


Internal to Dell EMC, support and additional materials are accessible via PowerLink. Channel
partners can access the tool support directly from the login area. Also, useful documentation
can be found at the following address to further discuss the Isilon TCO benefits:


Resources and Support


There are a number of industry analyst white papers and supporting articles that can be
incorporated with your use of the TCO tool to further enhance your efforts.


Lesson 2: Wrap Up


Isilon has a number of differentiating features that are associated with its single file system
and single volume architecture. The TCO tool, along with the analyst papers, is a means to
articulate the financial benefits of these features as they directly address a customer’s
requirements, and then to compare them to the customer’s existing environment or to a
proposed competitive solution.


Lesson 3: Competition and Technical Differentiators


After completing this lesson, you will be able to differentiate between tier 1 and tier 2
competitors, and identify technical differentiators between Isilon and competitors.


Who’s Your Competition?


Tier 1 competition includes:
• NetApp cluster mode
• HP StoreAll (IBRIX)
• HDS HNAS (BlueArc)
Tier 2 competition includes:
• Panasas
• Oracle ZFS
• Nexenta (ZFS)
• Market-specific products, for example, Harmonic Media Grid, Quantum StorNext (M+E), and
DDN (M+E, HPC)
On the next slide, we expand on some of the key technical differences between Isilon and
competing products.


Who's Your Competition?


The table on the slide lists the most common competitors to Isilon, divided into tier 1 and
tier 2. Out of this entire list only one, Panasas, does not use traditional scale-up RAID with a
file system to present a global namespace.
Each of these competitors has a particular niche, from low cost to a market focus like HPC in
the case of Panasas, or media and entertainment in the case of Quantum StorNext.
In the following material we’ll focus on the tier 1 clustered scale-up competition and look
at some of the common limitations of this architecture that make it less preferable to
Isilon’s scale-out technology for the requirements of Big Data.


Tech Differentiators: Scale-up Architecture


All tier 1 competitors to Isilon use traditional, dual RAID controller, scale-up storage as the
basis of their product.
As such, all of these competitors share common architectural elements that make them
ineffective for scale-out, Big Data environments. Some of these common elements are:
• Back-end RAID storage
• RAID sets, volumes, and LUNs
• SAN attached file export servers (with the exception of NetApp)
• Independent servers, SAN, and file system management (with the exception of
Next we’ll look at how this common “clustered scale-up” architecture is not well suited
for the requirements of Big Data.


Tech Differentiators: Competitor Limitations


All of these tier 1 competitors suffer from the same limitations of fixed availability and
reliability due to the use of clustered scale-up and traditional dual-controller RAID storage.
All Isilon tier 1 competitors have the following limitations:
• Data availability is limited to a single controller failure
• Data availability is limited to a maximum of 2 drive failures in a RAID set
• RAID parity protection and drive rebuild time cannot scale, and when failures do
occur, they have significant effects across the namespace
• Performance is reduced by >50% when a controller fails
• Performance is significantly compromised when a controller is in drive rebuild
• In a dual-controller failure event, a significant amount of data is completely
unavailable in the namespace


Tech Differentiators: Data Protection


All the tier 1 competitors treat all the data with the same level of protection, RAID 6. They
rely on non-scalable data protection for scale-out storage.
This is again due to their use of traditional scale-up RAID storage, which was developed when
the effect of scale-out, the advent of multi-terabyte drives, and the time required to
rebuild them were unforeseeable.
The use of new 4TB drives increases the risk of data loss in this type of architecture.
As you can see in the table, Isilon is the only vendor that brings scale-out capabilities to Big
Data. Being able to scale availability and reliability as an Isilon cluster and its data grow is a
key differentiating competitive feature of its technology that is highly valued by customers
across markets.


Tech Differentiators: Can't Scale Drive Rebuilds


Isilon storage systems are highly resilient and provide unmatched data protection and
Isilon uses the proven Reed-Solomon Erasure Encoding algorithm rather than RAID to
provide a level of data protection that goes far beyond traditional storage systems.
Here is an example of the flexibility and types of data protection that is standard in an Isilon
• With N+1 protection, data is 100% available even if a single drive or node fails. This is
similar to RAID 5 in conventional storage.
• N+2 protection allows 2 components to fail within the system, similar to RAID 6.
• With N+3 or N+4 protection, three or four components can fail while the data stays
100% available.
Isilon FlexProtect is the foundation for data resiliency and availability in the Isilon storage
• Legacy “scale-up” systems are still dependent on traditional data protection. They
typically use traditional RAID, which consumes 30% to 50% of the available disk
capacity. The time to rebuild a RAID group after a drive failure continues to increase
with drive capacity, and data is susceptible to loss after a 2-disk failure.
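
The N+M protection levels listed above can be illustrated with a small calculation. This is a simplified sketch, not a description of the OneFS on-disk layout: it assumes a stripe of N data units plus M protection units, so that M failures are tolerated and the protection overhead is M / (N + M). The stripe width of 8 data units and the function name are hypothetical.

```python
def protection_summary(data_units, protection_units):
    """Summarize an N+M stripe under the simplified model above."""
    if data_units < 1 or protection_units < 1:
        raise ValueError("data_units and protection_units must be >= 1")
    total_units = data_units + protection_units
    return {
        "tolerated_failures": protection_units,
        "overhead_fraction": protection_units / total_units,
    }


DATA_UNITS = 8  # hypothetical stripe width for illustration

summaries = {m: protection_summary(DATA_UNITS, m) for m in (1, 2, 3, 4)}

for m, summary in summaries.items():
    print(
        f"N+{m}: tolerates {summary['tolerated_failures']} failure(s), "
        f"overhead {summary['overhead_fraction']:.0%}"
    )
```

Under this model, a higher protection level costs more capacity but tolerates more simultaneous failures, which is the trade-off to weigh against the SLA for each data set.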


Isilon provides 100% accessibility to data with 1, 2, 3 or 4 node failures in a pool. Data
protection levels can be established at a file, directory or file system level so all data can be
treated independently, meeting SLAs based on the application or type of data. Due to the
distributed nature of the cluster, all nodes participate in rebuilding files from a failed drive.
As the cluster grows, the data re-protection times become faster and more efficient, making
the adoption of larger capacity drives very simple.
With Isilon, a drive can be rebuilt quickly: the larger the storage system, the faster the
rebuild. In Isilon solutions, drives are hot pluggable and hot swappable with no downtime.
The following example demonstrates the dramatic effect that Isilon’s scalable data
re-protection capabilities have on reducing the risk of data loss. A five-node X400 cluster,
87% full, can restore a 4TB drive in 2-3 days, while a competitive RAID system requires 1 to 2
weeks to complete. Isilon reduces the risk of data loss, while traditional controller-based
systems increase the risk of data loss for large drive sizes.

Objection Handling


The categories in this table are the most common objections brought up by NetApp.
NetApp wants to keep Isilon in an HPC corner by stating that its performance characteristics
aren’t suitable for the enterprise. Both NetApp and Isilon have published SPECsfs
benchmarks. These benchmarks are widely accepted as representative of general-purpose
file operations such as file shares. Isilon holds the record for SPECsfs SMB performance and
has proven linearity for scaling performance as the cluster scales.
Another objection is that Isilon is not unified for both block and file. Customers are facing
three major challenges today: Big Data, cloud and analytics. Discuss the issues that the
customer is facing and how Isilon addresses all three challenges with scale-out NAS and
next-gen unified capabilities: object and the integration of HDFS for Big Data analytics.

Objection Handling (cont'd)


NetApp’s claim that ONTAP is proven in the enterprise is limited to 7-mode. C-mode is a
first-generation product with a limited installed base in the enterprise. Ensure that NetApp
doesn’t blur the distinction between 7-mode and c-mode, or the risk that a new platform
such as c-mode can pose to the enterprise. NetApp may also claim encryption capabilities
that Isilon lacks. Don’t let NetApp blur the distinction between the features available for
c-mode and those of 7-mode. Today, encryption is not available on ONTAP c-mode, and
neither is WORM. Isilon scale-out capabilities include SEC-compliant WORM, and encryption
can be provided with SEDs (self-encrypting drives).


Objection Handling (cont'd)


Often NetApp will claim that its multiple file system and volume implementation of c-mode
is a safer bet than Isilon’s single file system and single volume. Ensure that customers
understand the concept of pools as failure domains and how availability in any pool can be
equal to or better than that of NetApp, because Isilon can provide complete access to data
with up to 4 node failures or 4 drive failures per pool.
NetApp, by contrast, is limited to:
• Data availability through a single controller failure
• Data availability through 2 drive failures in a RAID set
• No scaling of RAID parity protection or drive rebuild time
• Failure modes that have significant effects on cluster performance
• Performance is reduced by 50% when a controller fails
• Performance is significantly compromised when a controller is in drive rebuild mode
• In a dual-controller failure event, a significant amount of data in c-mode is
completely unavailable in the namespace


Isilon Pass: Look at EMC Portfolio for Solution


While there are many opportunities for Isilon, there are also a few that should be avoided as
they are outside of the design and use case of Isilon today. Generally speaking, these are
uses that require small block random I/O, such as OLTP and database environments.
These instances are not scale-out file use cases and generally require lower latencies than
Isilon provides. Often they also require application integration with snapshot and other
utilities. These types of apps, along with unified and VDI, are better served with other EMC
products such as VNX and VMAX. And while VDI may seem like an appropriate application for
Isilon, its irregular performance requirements are better served by VNX. Frequently a VNX or
VMAX plus Isilon may be an appropriate solution for VDI, with the VNX or VMAX hosting the
virtual desktops while Isilon is the storage for the virtualized file stores.
If the environment doesn’t have 150TB of data or more and is not growing at 20% CAGR or
more, Isilon may not be the appropriate solution for this customer’s requirements.




Sales information is thoroughly curated and regularly updated. While this module does
discuss the broad sweep of sales information and outlines the competitive landscape for you,
the detailed information that affects every deal will change frequently. The resources for the
sales teams are at the links listed here, and you should be ready to make practical
recommendations for competitive architectural designs based on this information.
Every opportunity is different, and you should combine what you learn from the battlecards
and playbooks with the customer's specific situation so that you can make the best possible
architectural recommendation rather than just working by formula.
