Availability Objectives
WHITE PAPER
shortcomings.
Table of Contents
Introduction
Defining and measuring high availability
Supporting the four pillars of high availability
Achieving functional high availability
Conclusion

Introduction
In today’s IT environments, the need for the highest levels of availability is a well-established
principle. Businesses increasingly require immediate and continuous access to their information
systems, and regularly set up traditional high availability software clusters to meet
this business objective. This common solution, however, is often not enough to meet a
business’s high availability objectives.
The purpose of this report is twofold. First, we will discuss the need for high availability
and how availability is defined and measured. Second, we will discuss how to weed out
the shortcomings in common high availability designs and describe a methodology to
address them.
This report is written from a technology independent perspective. It is not biased toward any
high availability software product or any server, storage, or network hardware vendor. Also, our
high availability discussion will focus primarily on open systems technology such as UNIX
servers and Windows 2000/Windows NT servers.
An application environment is highly available if it possesses the ability to recover automatically
within a prescribed minimal outage window.1
HA implies that no single point of failure (SPOF) exists in the application environment.
An SPOF is any software, hardware or environmental component that, if it should fail,
would take the application environment offline for an extended outage and require
human intervention to correct.
It is also important to note what HA is not. HA is not continuous access to the application
environment throughout failures. This area of availability, called continuous availability, is
addressed by such technologies as fault tolerant hardware, data center site redundancy, and
real-time remote data replication. Application environments requiring continuous
availability cannot sustain any kind of failure.
What is downtime?
The goal of high availability solutions is to automatically recover from functional downtime
within a minimal outage window. By downtime we mean a service interruption at
any layer of the application environment.
It is important to understand clearly what we mean by the application environment. The
application environment refers to all the hardware and software required to support a
function provided to the business from IT. Largely we think of this environment as the
servers, software, network and storage required for users to be able to execute a pro-

[Figure 1: causes of downtime. Source: Data Quest, November 1999. Categories: software,
hardware, human error, network, local environment, other.]

1 The minimal outage window depends on the critical nature of the business being executed. Generally these windows are from 3 to 7 minutes.
• Database server
• Operating System (at each level)
• Application software (at each level)
• Data storage subsystems for each server
The study showed that fully 27 percent of service interruptions were software related.
From this statistic alone, we can gather that installing any solution that addresses only
hardware related failures would be incomplete.
Not surprisingly, software monitoring utilities have recently become more commonplace,
suggesting that software related failures are being acknowledged and addressed
more frequently.
Other sources of downtime included human error (18 percent), network issues (17
percent), local environment issues (8 percent), and other issues (7 percent). True high
availability can only be achieved when consideration is given to all areas that may cause
downtime.
What is the cost of downtime?
Now that we understand what is meant by downtime, the first step in planning for high
availability is to understand the exposure posed to a company by the interruption of the
application environment. It is often prudent to quantify the cost of downtime per hour of
that environment. Quantifying the cost of downtime is helpful as it clearly and concisely
details the risk a company faces. Deploying high availability solutions is one way of mitigating
that risk. Simply put, if you know how much you can lose, you know how much to
spend in prevention.

Figure 2: cost of downtime
Source: Meta Group, Individual.com, October 2000
Industry              $ per hour
Energy                $3M
Telecommunications    $3M
Finance               $1.5M
IT-dependent mfg.     $1.5M
Healthcare            $0.5M
Media                 $0.5M
Hospitality/Travel    $0.5M

The cost of downtime varies tremendously by industry. A study recently published by the
Meta Group puts the cost of downtime for many common industries anywhere from
$0.5 million per hour to $3 million per hour (see Figure 2). These figures are based on
an entire IT operations center being off-line. However, an outage of a single server can
run to several thousand dollars per hour. These figures are meant only as an
example to show that the revenue lost in an outage is substantial and should be investigated
on an application environment basis for each company.
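
The idea that knowing how much you can lose tells you how much to spend on prevention can be
sketched as a small calculation. The hourly figures below are the ones listed in Figure 2; the
function name and the 4-hour outage used in the example are illustrative assumptions, not
values from the study.

```python
# Hourly cost of downtime in US dollars, as listed in Figure 2 (Meta Group, October 2000).
# These figures assume an entire IT operations center is off-line.
COST_PER_HOUR = {
    "Energy": 3_000_000,
    "Telecommunications": 3_000_000,
    "Finance": 1_500_000,
    "IT-dependent mfg.": 1_500_000,
    "Healthcare": 500_000,
    "Media": 500_000,
    "Hospitality/Travel": 500_000,
}


def outage_cost(cost_per_hour: float, outage_hours: float) -> float:
    """Direct revenue exposure for an outage of the given length.

    Indirect costs (customer satisfaction, SLA penalties, legal liability)
    are not modeled here.
    """
    if cost_per_hour < 0 or outage_hours < 0:
        raise ValueError("cost and duration must be non-negative")
    return cost_per_hour * outage_hours


if __name__ == "__main__":
    # Hypothetical 4-hour outage; the duration is an assumption for illustration.
    for industry, hourly in COST_PER_HOUR.items():
        print(f"{industry}: ${outage_cost(hourly, 4):,.0f}")
```

A per-application estimate would replace the industry-wide figure with the hourly exposure of
the specific application environment being assessed.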
Any accurate cost of downtime study must also consider the indirect costs associated
with service interruptions. It is difficult to translate these numbers into loss per hour of
downtime, but to deny that such intangibles contribute to cost is shortsighted. Examples
of these intangible costs include decreased customer satisfaction, penalties for failure to
meet service level agreements, and legal liability associated with failure to provide service.
This is especially relevant in the healthcare and financial industries.
• Redundant sites
• Hot site disk mirroring
[Figure 3: cost of availability. Surviving label: Recovery Time.]
For the sake of argument, let’s enhance the above application environment to include a
realistic scenario. Figure 4 describes a typical application environment, with each element’s
anticipated uptime/downtime. These are rough estimates of expected downtime
within the environment per year. The most important point to note is that these availability
numbers assume that all other criteria for successful high availability solutions
have been met. The next section describes these criteria and how they can dramatically
affect functional availability of an application environment.

[Figure 4, surviving row: Database application software, 99.3% uptime, 61.32 hours of
downtime per year.]
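
The relationship between an availability percentage and expected downtime per year can be
checked with a short calculation. The sketch below reproduces the 61.32 hours quoted for the
database application software at 99.3%. The second function rests on an assumption that is
not stated in the source: that every component is required and that components fail
independently, so their availabilities multiply. The extra component values in the usage are
hypothetical placeholders.

```python
from typing import Iterable

HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year for a given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


def serial_availability_pct(component_pcts: Iterable[float]) -> float:
    """Combined availability when all components are required.

    Assumes independent failures, so the individual availabilities multiply.
    """
    combined = 1.0
    for pct in component_pcts:
        combined *= pct / 100.0
    return combined * 100.0


if __name__ == "__main__":
    # 99.3% is the database application software figure from the source.
    print(round(annual_downtime_hours(99.3), 2))  # 61.32

    # The other two values are hypothetical, not taken from Figure 4.
    stack = [99.3, 99.9, 99.5]
    print(round(annual_downtime_hours(serial_availability_pct(stack)), 2))
```

Under these assumptions the combined figure is always lower than the weakest component, which
is why a single element's availability number says little about the environment as a whole.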
Hardware and software technology alone is simply not enough. We must also
address other areas that affect availability such as:
• An adequate and well trained staff
• Change management policies and problem determination policies that are detailed and
specific. These policies must be known and respected by all the staff
• Adequate environment monitoring tools
• Successful backup/recovery and disaster recovery tools and plans
If any one of these areas is not adequately addressed, the availability of the application
environment will be in jeopardy.
We call these different areas of application environment support “pillars,” and categorize
them into four groups: infrastructure, business contingency, support services, and operations.
We take the approach that availability objectives are achieved with a combination of
hardware and software technology brought together by a philosophy of availability.
Simply put, the philosophy is as follows:
High Availability of an application environment is achieved when all pillars of that
environment are adequately supported.

[Figure 5: pillars of high availability. Labels: Business Application, Business Continuity,
Support Services, Infrastructure, Operations.]
In some capacity every application environment contains the four components we list as
pillars. However, it is a matter of opinion as to which items are placed in which pillar.
Often the items within each pillar address a multitude of application environments.
What is crucial is that all items that affect availability are placed in pillars for examination.
Understanding these pillars and shoring up any weaknesses provides a solid foundation for
addressing availability effectiveness.
The second major area of the infrastructure pillar is the shared storage infrastructure. While
technically another component of the hardware solution, the shared storage really stands
alone as a crucial piece of the overall availability solution. This area of the pillar requires
focus on the storage hardware technologies such as enterprise storage arrays and their associated
data management software. These tools can move data from one storage device to
another or provide for real time mirror copies, both in local as well as remote locations.
Also important to the storage infrastructure is networked storage such as the storage area
network (SAN), the network attached storage (NAS), and the IP storage solutions.
This pillar covers two major areas: local backup and recovery solutions, and business
continuance solutions. The local backup and recovery solution has obvious influence on
application availability as nearly all applications have a significant data impact. Focus here
is on the use of the technology and the retention policies of the data.
The business continuance, or disaster recovery, solution is also closely coupled with the
availability of application environments. The focus here is on the use of technology,
information from reports such as a business impact analysis, and execution of disaster
recovery testing.
Networks and connectivity are also critical to application availability. Areas such as
redundancy in the network architecture and throughput analysis should be investigated
to understand their influence on the ability to execute a business process. Another key
piece of network support services is the ability to quickly diagnose and repair network
related issues. Critical to this success are detailed diagrams of all network segments;
these should be regularly updated and distributed to support teams.
Support Services
Operations
The best way to ensure availability objectives are being met is to perform an availability
effectiveness assessment of the application environment. This investigation should be
conducted through a series of server and environment interrogations and interviews
with key staff. The investigation should study each of the four pillars in three different
dimensions: tools, staff, and procedures (see Figure 5).
In general the tools of a pillar refer to the hardware and software components installed
to meet specified technology needs. We must discover whether tools exist to support
this pillar, whether they are used or known, and whether the current tool is effective in
supporting the pillar. A critical tool to investigate is the presence of customized diagrams
and documentation that clearly depict the application environment. These can be server,
network and storage configurations.
The staff associated with a pillar consists of the employees, managers, and consultants
needed to support that pillar. This staff must be adequately trained and adequate in number.
For example, it is never a good idea to have only a single person who is capable of
providing system administration duties for critical servers. The staff must be well supported
and represented by management. They must also have adequate training in technology
supporting future IT initiatives. Overall, to support functional high availability,
critical staff should be self-sufficient. Contracted remote monitoring services are beneficial
for supplementing and aiding critical staff, but avoid dependence upon outside
groups and contractors for critical functions.
The procedures associated with the support of a pillar should focus on how and why
technology is used to meet availability objectives. Most importantly, they must be documented
and known to all. Far too often we allow a few individuals to hold far too much
critical information without asking them to write it down. These procedures should
be clear enough that anyone on the staff can follow them. All staff should be educated
on the procedures and instructed to follow them. Lastly, all procedures should be
regularly reviewed and updated.
This quantitative score can then be compared to a perfect score or, if similar questions are
asked, to the score of another application environment. A quick overview of such scores can
show whether deficiencies exist in tools, staff, or procedures within a particular pillar.
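
One way to picture this quantitative feedback is a grid of the four pillars against the three
dimensions, with scored interview answers summed per cell and compared to a perfect score.
The sketch below is an illustration under stated assumptions: the 0 to 5 rating scale, the 60
percent threshold, and the function names are hypothetical, since the source does not specify
how answers are scored.

```python
PILLARS = ("infrastructure", "business contingency", "support services", "operations")
DIMENSIONS = ("tools", "staff", "procedures")
MAX_RATING = 5  # hypothetical scale: each answer is rated from 0 to 5


def score_assessment(answers):
    """Sum ratings per (pillar, dimension) cell.

    `answers` is an iterable of (pillar, dimension, rating) tuples.
    Returns {(pillar, dimension): (achieved, perfect)}.
    """
    totals = {(p, d): [0, 0] for p in PILLARS for d in DIMENSIONS}
    for pillar, dimension, rating in answers:
        if (pillar, dimension) not in totals:
            raise KeyError(f"unknown cell: {pillar!r}, {dimension!r}")
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"rating out of range: {rating}")
        cell = totals[(pillar, dimension)]
        cell[0] += rating       # points achieved
        cell[1] += MAX_RATING   # points a perfect answer would have earned
    return {key: tuple(value) for key, value in totals.items()}


def deficiencies(scores, threshold=0.6):
    """Cells whose achieved/perfect ratio falls below the assumed threshold."""
    return sorted(
        key
        for key, (achieved, perfect) in scores.items()
        if perfect > 0 and achieved / perfect < threshold
    )


if __name__ == "__main__":
    # Hypothetical interview ratings for illustration only.
    sample = [
        ("infrastructure", "tools", 5),
        ("infrastructure", "tools", 4),
        ("operations", "procedures", 1),
        ("operations", "procedures", 2),
    ]
    print(deficiencies(score_assessment(sample)))
```

Keeping the achieved and perfect totals side by side makes it possible to compare two
application environments only where the same questions were asked of both.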
The second crucial method for providing feedback should be a qualitative approach.
It should be drafted by the person evaluating the availability effectiveness based on
interviews. It should report on responses to prepared questions, especially when those
responses differ from person to person. For example, an employee may report that a
backup and recovery tool exists, but is totally worthless. On the other hand, a manager
who is asked the same question might reply that the backup and recovery tool
exists and completely meets their needs.
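
Spotting where answers to the same prepared question differ from person to person is easy to
mechanize, even though interpreting the disagreement remains the evaluator's job. The sketch
below is a minimal illustration; the function name and the tuple layout are hypothetical, and
the sample answers paraphrase the backup and recovery example above.

```python
from collections import defaultdict


def conflicting_responses(responses):
    """Group answers by question and keep only questions with differing answers.

    `responses` is an iterable of (question, respondent, answer) tuples.
    Returns {question: {respondent: answer}} for questions where answers differ.
    """
    by_question = defaultdict(dict)
    for question, respondent, answer in responses:
        by_question[question][respondent] = answer
    return {
        question: answers
        for question, answers in by_question.items()
        if len(set(answers.values())) > 1
    }


if __name__ == "__main__":
    # Illustrative interview notes; wording is paraphrased, not quoted from real interviews.
    notes = [
        ("Is the backup and recovery tool effective?", "employee", "exists but is ineffective"),
        ("Is the backup and recovery tool effective?", "manager", "exists and meets our needs"),
        ("Are network diagrams kept up to date?", "employee", "yes"),
        ("Are network diagrams kept up to date?", "manager", "yes"),
    ]
    for question, answers in conflicting_responses(notes).items():
        print(question, answers)
```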
Since the hardware and software technology associated with high availability is fairly well
understood, the changes required to improve availability effectiveness often do not
require the capital acquisition of technology. Rather, they require proper creation and
management of policies and procedures. It is often most effective if these policies are
created and managed internal to a company, as full-time employees tend to have the best
insights into what will be fruitful solutions.
In general, it is most effective to first address issues that are most important to an application
environment. For example, having clustering software installed and running without
a trained staff to support it can often affect system availability more negatively than not
having clustering software at all.
Conclusion
In nearly all aspects of today’s business world the availability of the underlying IT infrastructure
is crucial. Even being off-line for a short time can have a tremendous effect on
a company’s health and economic viability. But preventing these outages cannot be
addressed simply by a technology solution. People, policy and procedures can have a far
more significant impact on availability. Functional high availability can only be achieved
through an effort to investigate all areas that can stifle the business transaction.
If any company is considering deploying high availability solutions, it is also critical that
they consider availability from the users’ perspective. Specifically, an investigation should
be performed to understand how effective a solution would be in terms of user interaction
and satisfaction.
CNT is one of the world’s largest providers of comprehensive storage networking solutions.
For over 20 years, our experts have analyzed, designed, and built enterprise storage networks.
Visit www.cnt.com to learn about our solutions, products, partnerships, career opportunities,
and more.
© 2003 by Computer Network Technology Corporation (Nasdaq: CMNT). All rights reserved. Any
reproduction of these materials without the prior written consent of CNT is strictly prohibited.
CNT, the CNT logo, Channelink, and UltraNet are registered trademarks of Computer Network
Technology Corporation. All other trademarks identified herein are the property of their
respective owners. CNT is an equal opportunity employer. CNT corporate headquarters’ QMS is
registered to ISO 9001: 2000. Certificate #006765.
USA: 1-800-638-8324
Canada: 905-595-1500
UK: 44-1753-792400
France: 33-1-4130-1212
Australia: 61-2-9540-5486
Germany: 49-89-42 74 11-0
Switzerland: 41-1-73 35-733
Belgium: 32-2-737 76 42
Italy: 39-06-51 49 31
Brazil: 55-11-5509-1504
Japan: 813-5403-4858
Other locations: 1-763-268-6000
PL581 | 0803