
Disaster Recovery and Business Continuity

Pranita Upadhyaya
Outline

Disaster Recovery and Business Continuity


– Business continuity planning
– Business impact assessment
– BCP documentation
– Nature of disaster
– Disaster recovery planning

DR and BCP motivation
 WTC, 9/11 terrorist attacks
 BASEL II
– An international business standard
– A series of recommendations on banking
laws and regulations
 Boom in e-commerce, e-banking, e-government

Disaster aftermaths
 Most companies that experience a major disaster are no longer in business within 5 years!
- The US Bureau of Labor -
 Revenue loss
 Damaged brand image
 Customer loss

 What about the public sector?

How Disasters Affect Businesses
 Direct damage to facilities and equipment
 Transportation infrastructure damage
– Delays deliveries, supplies, customers, and employees going to work
 Communications outages
 Utilities outages
Classification of Disasters
 Natural
– Thunderstorms
– Tornadoes
– Lightning
– Earthquakes
– Volcanoes
– Tsunami
– Landslides
– Floods, droughts
– Epidemics
 Man-made, non-intentional
– Acts of people
– Technological system failures
– Hazardous materials
– Environmental
– Nuclear
– Aviation, railways
– Fires, collapse
 Man-made, intentional
– Workplace violence
– Civil disobedience (labor riots, political riots)
– Terrorism
– Weapons of mass destruction
9 major threats to Data Center
 Cooling system down
 Power system down
 Radioactive contamination
 Terror (including cyber terror)
 Telecom network cut off
 Huge human resources vacuum
 Earthquake
 Flood
 Fire

How BCP and DRP
Support Security
 BCP (Business Continuity Planning) and DRP
(Disaster Recovery Planning)
 Security pillars: C-I-A
– Confidentiality
– Integrity
– Availability
 BCP and DRP directly support availability
BCP and DRP Differences
and Similarities
 BCP
– Activities required to ensure the continuation of critical
business processes in an organization
– Alternate personnel, equipment, and facilities
– Often includes non-IT aspects of business
 DRP
– Assessment, salvage, repair, and eventual
restoration of damaged facilities and systems
– Often focuses on IT systems
Industry Standards Supporting
BCP and DRP

 ISO 27001: Requirements for Information
Security Management Systems. Section 14
addresses business continuity management.
 BS 25999: Code of Practice for Business
Continuity Management.
Industry Standards Supporting
BCP and DRP (cont.)

 NIST 800-34
– Contingency Planning Guide for Information
Technology Systems.
– Seven step process for BCP and DRP projects
– From U.S. National Institute of Standards and
Technology
 NFPA 1600
– Standard on Disaster / Emergency Management
and Business Continuity Programs
– From U.S. National Fire Protection Association
Benefits of BCP and DRP Planning
 Reduced risk
 Process improvements
 Improved organizational maturity
 Improved availability and reliability
 Marketplace advantage
The Role of Prevention
 Not prevention of the disaster itself
– Prevention of surprise and disorganized response
 Reduction in impact of a disaster
– Better equipment bracing
– Better fire detection and suppression
– Contingency plans that provide [near] continuous
operation of critical business processes
– Prevention of extended periods of downtime
What is Disaster Recovery?
 DR : The planned process of restoring systems, data, and infrastructure
required to support key ongoing business operations.
 A DR plan : a proactive measure to minimize a company’s downtime
during sudden emergencies
 An unforeseen event : fire, flood, earthquake, etc

Customer site → Emergency event declared → Personnel mobilized to backup DR site → Company systems run from DR site
Benefits from DR center
 Significantly reducing the impact of sales, financial,
and customer losses during unforeseen interruptions
to the business operations

 A successful DR plan gives


– Confidence in knowing the key operations can take
place at a second site within a set timeframe – even if
your office is affected
– Protection against a single point of failure associated with a
single site for operations and business data
– The ability to recover valuable company data
– Fully functional office working areas for your evacuated
employees during emergencies

Types of DR sites
 Hot standby
– Ideal for: mission-critical applications, high business impact activities
– Pros: almost instant failover, full data integrity, little to no impact to business operations, guaranteed recovery timeframe
– Cons: long setup process, high cost, higher administrative burden
– Average recovery: 10 seconds ~ 2 minutes
 Warm standby
– Ideal for: mission-critical applications, medium-to-high business impact activities
– Pros: fast failover, little data loss, small-to-medium impact to business operations, guaranteed recovery timeframe
– Cons: long setup process, medium-to-high cost, medium administrative burden
– Average recovery: 10 ~ 45 minutes
 Cold standby
– Ideal for: non-mission-critical applications, low business impact activities
– Pros: low initial cost, guaranteed equipment availability
– Cons: unpredictable recovery time, tedious restoration process, potentially large impact to business operations
– Average recovery: 4 hours ~ 2 days
 Offsite data backup storage
– Ideal for: non-mission-critical applications, very low business impact activities
– Pros: flexible, inexpensive, secure
– Cons: very long recovery time, must first configure application environment and then restore data, very large impact to business operations
– Average recovery: 18 hours ~ 8 days
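
The average recovery figures listed above for each site type can drive a first-pass shortlist. The following is only a sketch: it assumes the upper end of each range is the worst case, and the names `SITE_TYPES` and `candidate_sites` are made up for illustration.

```python
from datetime import timedelta

# Upper end of the average recovery range for each site type, as listed above.
# Treating that figure as the worst case is an assumption of this sketch.
SITE_TYPES = [
    ("Offsite data backup storage", timedelta(days=8)),
    ("Cold standby", timedelta(days=2)),
    ("Warm standby", timedelta(minutes=45)),
    ("Hot standby", timedelta(minutes=2)),
]


def candidate_sites(rto: timedelta) -> list[str]:
    """Return the site types whose worst-case recovery fits within the RTO.

    The result keeps the order of SITE_TYPES, which runs from the least
    expensive option to the most expensive one.
    """
    return [name for name, worst_case in SITE_TYPES if worst_case <= rto]


# Example: a service that tolerates at most 3 hours of downtime.
print(candidate_sites(timedelta(hours=3)))
```

A real selection would also weigh the cost and administrative burden noted for each type.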
DR components

 DR center infrastructure
 DR Solution implementation
 DR planning

DR – infrastructure construction

Data center design considerations
 Operational reliability
 Quick changes, including additions and
rapid expansions
 Online status monitoring
 Life cycle management
 Customer access
 Physical security
 Rapid detection, identification and
resolution of faults

Considerations for DR site selection
 Geographic accessibility from the main center
 Expandability for the future demand
 Network capabilities for interconnections (optical fibers)
 Proximity to public utilities (power supply, emergency
services, transport, etc)
 Security
- Natural hazards like flood, seismic activity, and lightning
- Potential man-made hazards (strikes, fire, pollution, etc)
 Manageability
 Economic feasibility

Case : DR site selection - distance
 US : 40 miles (64 km, outside the reach of the same hurricane)
 Japan : on a different tectonic plate, in a different seismic activity zone
 EU : 5~10 km (against bombing attacks)
 Korea : similar to the situation in the EU, usually 30+ km away

 What about Nepal?
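
A distance rule like the ones above can be checked from site coordinates. This is a minimal sketch using the haversine great-circle formula. The coordinates are hypothetical, taking the upper figure of the EU range is an assumption, and `far_enough` is an illustrative name.

```python
import math

# Minimum separations quoted above, in km. Using the upper figure of the
# EU range (10 km) is an assumption made for this sketch.
MIN_DISTANCE_KM = {"US": 64.0, "EU": 10.0, "Korea": 30.0}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def far_enough(main, dr, region):
    """True if the DR site is at least the regional minimum away from the main center."""
    return great_circle_km(*main, *dr) >= MIN_DISTANCE_KM[region]


# Hypothetical (latitude, longitude) pairs for a main center and a DR site.
main_center = (37.5665, 126.9780)
dr_site = (37.2636, 127.0286)
print(round(great_circle_km(*main_center, *dr_site), 1))
print(far_enough(main_center, dr_site, "Korea"))
```

Straight-line distance is only one input. A site can be far away and still share the same fault line, flood plain, or power grid.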

DR site selection - distance
[Figure: curves labeled "disaster manageability" and "responsiveness" plotted against distance, with the question "optimum point ?"]
Site evaluation factors : ASSES
 Availability
– Backup, redundancy
– 24*7 operation
 Stability, Security
– Natural disasters
– Potential man-made disasters
 Survivability
– IT resources
 Efficiency, economics
– Maintenance
– Hi-quality equipment
 Scalability
– Physical scalability
– Functional scalability
General DR plan
 Primary processing location
 Backup processing location
– Mirrors primary processing location
– Can be used for load balancing
 Remote storage and archival
– Tape vaults
– Storage for data files, SaaS library images
– Allows government operations continuity in the event of major disruption
[Figure: Primary, Backup, and Archive sites]
DR Solution implementation

DRS implementation
Phases: Planning → Analyzing → Proceeding & execution
Steps: Define DR requirements → Business impact & system analysis → DR solution → Implementation methodology → Implementing DRS & DRP
 Planning
– DR requirements: RPO, RTO, RAO
– Detailed DR targets
 Analyzing
– BIA, system analysis: business impact, data, customer contact
– DR solution analysis: economics, manageability, technological, reference
 Proceeding & execution
– DR solution selection: H/W solution, S/W solution
– DR planning: DR process, DRP test & update
DR requirements
 Identify the functional areas that MUST
be recovered during an emergency
 Define the Recovery Time Objective (RTO)
- “How much downtime (if any) can be tolerated?”
 Define the Recovery Point Objective (RPO)
- “How much data (if any) can you afford to lose?”

In addition,
 Define the Recovery Access Objective (RAO), and
 the Recovery Scope Objective (RSO)
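
One way to make these requirements concrete is to record an RTO and an RPO per functional area and compare them with the current backup setup. The following is a rough sketch: the class, the function name `gaps`, and all figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryRequirement:
    functional_area: str
    rto: timedelta  # how much downtime can be tolerated
    rpo: timedelta  # how much data loss can be tolerated, expressed as time


def gaps(req, backup_interval, estimated_restore):
    """List the objectives that the current setup would miss."""
    problems = []
    # In the worst case, everything written since the last backup is lost.
    if backup_interval > req.rpo:
        problems.append("RPO")
    if estimated_restore > req.rto:
        problems.append("RTO")
    return problems


# Hypothetical functional area and figures.
billing = RecoveryRequirement("billing", rto=timedelta(hours=3), rpo=timedelta(minutes=15))
print(gaps(billing, backup_interval=timedelta(hours=24), estimated_restore=timedelta(hours=2)))
```

Here a nightly backup misses a 15-minute RPO even though the restore estimate fits within the RTO.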

RPO/RTO vs. cost
 Timeline
– time t0 : point to which critical data is recovered
– time t1 : disaster strikes
– time t2 : systems recovered and operational
 Recovery point (t0 to t1) : how current or fresh is the data after recovery ?
– Scale : days, hours, mins, secs
– Tape backup, periodic replication, asynchronous replication, synchronous replication (increasing cost)
 Recovery time (t1 to t2) : how quickly can systems and data be recovered ?
– Scale : secs, mins, hours, days, weeks
– Extended cluster, manual migration, tape restore (cost increases as recovery time shortens)
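
The two quantities on this timeline follow directly from the three instants t0, t1 and t2. This is a small sketch with a hypothetical incident; the function name and the timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def achieved_objectives(t0: datetime, t1: datetime, t2: datetime) -> dict[str, timedelta]:
    """Compute the achieved recovery point and recovery time.

    t0: point in time to which critical data is recovered
    t1: moment the disaster strikes
    t2: moment systems are recovered and operational
    """
    if not (t0 <= t1 <= t2):
        raise ValueError("expected t0 <= t1 <= t2")
    return {"recovery_point": t1 - t0, "recovery_time": t2 - t1}


# Hypothetical incident: last usable copy at 02:00, disaster at 09:30,
# systems back in operation at 12:00.
result = achieved_objectives(
    datetime(2024, 1, 10, 2, 0),
    datetime(2024, 1, 10, 9, 30),
    datetime(2024, 1, 10, 12, 0),
)
print(result["recovery_point"], result["recovery_time"])
```

Comparing these achieved values with the agreed RPO and RTO shows whether the chosen solution is sufficient.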
DR solutions (type : solution, platform, DB/file)
 System mirroring (S/W type), OS level
– HAGEO, GEORM : IBM unix; DBMS, file system
– VVR (Veritas Volume Replicator) : HP, SUN unix; DBMS, file system
 System mirroring (S/W type), DBMS level
– RRDF : DB2, ORACLE; DBS
– Symmetric Replication, SharePlex : ORACLE; DBMS
 Disk mirroring (H/W type)
– SRDF : EMC; all file systems
– HRC : HITACHI; all file systems
– XRC : IBM; all file systems
• HAGEO : High Availability Geographic Cluster
• GeoRM : Geographic Remote Mirroring
• RRDF : Remote Recovery Data Facility
• SRDF : Symmetrix Remote Data Facility
• HRC : Hitachi Remote Copy
• XRC : eXtended Remote Copy
DR solution selection
 Solutions, from high cost with recovery in minutes to low cost with recovery in days
– Mirroring (copy database)
– Real-time data replication (copy data and database objects)
– Log journaling
– Periodic data replication
– Offsite archive
– Backup tape
 Toward the high-cost end : increasing CAPEX
– DR solution/equipment
– Real-time data replication
– N/W implementation needed
 Toward the low-cost end : increasing OPEX
– Backup data
– Data consistency
DR solution selection
 Availability levels by recovery time
– Continuous availability : 0~1 hour
– High availability : 1~6 hours
– Improved availability : 6~24 hours
– Traditional availability : 24~48 hours
 Data loss axis : no loss, little loss, loss after backup, loss
 Solutions plotted : GDPS/PPRC, SRDF, PPRC, GDPS/XRC, XRC, RR/400, electronic journaling, IRC, remote tape, remote DASD, SOS
 Abbreviations
– IRC : intermittent remote copy
– SOS : standby operating system
– PPRC : peer-to-peer remote copy
– XRC : extended remote copy
– Electronic journaling : dual transaction logging
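
The recovery-time bands on this chart can be read as a simple lookup from recovery time to availability level. In this sketch, treating each upper bound as inclusive is an assumption, and the names `TIERS` and `availability_tier` are illustrative.

```python
# Recovery-time bands from the chart, as (upper bound in hours, label).
# Treating each upper bound as inclusive is an assumption of this sketch.
TIERS = [
    (1, "Continuous availability"),
    (6, "High availability"),
    (24, "Improved availability"),
    (48, "Traditional availability"),
]


def availability_tier(recovery_hours: float) -> str:
    """Map a recovery time in hours to the availability level on the chart."""
    if recovery_hours < 0:
        raise ValueError("recovery time cannot be negative")
    for upper, label in TIERS:
        if recovery_hours <= upper:
            return label
    return "Beyond the chart (over 48 hours)"


print(availability_tier(3))
```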

Business Continuity Planning

Creating a BCP
 Is an on-going process, not a project with a
beginning and an end
• Creating, testing, maintaining, and updating
• “Critical” business functions may evolve
 The BCP team must include both business
and IT personnel
 Requires the support of senior management

BCP phases
1. Project management & initiation
2. Business Impact Analysis (BIA)
3. Recovery strategies
4. Plan design & development
5. Testing, maintenance, awareness, training
I - Project management & initiation
Establish need (risk analysis)
Get management support
Establish team (functional, technical, BCC – Business
Continuity Coordinator)
Create work plan (scope, goals, methods, timeline)
Initial report to management
Obtain management approval to proceed
II - Business Impact Analysis (BIA)
Goal: obtain formal agreement with senior management
on the MTD for each time-critical business resource
MTD – maximum tolerable downtime, also known as
MAO (Maximum Allowable Outage)
Quantifies loss due to business outage (financial, extra
cost of recovery, embarrassment)
Does not estimate the probability of kinds of incidents,
only quantifies the consequences
II - BIA phases
Choose information gathering methods (surveys,
interviews, software tools)
Select interviewees
Customize questionnaire
Analyze information
Identify time-critical business functions
Assign MTDs
Rank critical business functions by MTDs
Report recovery options
Obtain management approval
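
The "Assign MTDs" and "Rank critical business functions by MTDs" steps amount to a sort on the agreed downtime limits. A minimal sketch follows; the business functions and MTD values are hypothetical.

```python
from datetime import timedelta

# Hypothetical business functions with the MTDs agreed during the BIA.
mtds = {
    "Payroll": timedelta(days=5),
    "Online payments": timedelta(hours=2),
    "Customer support": timedelta(hours=24),
}


def rank_by_mtd(mtds):
    """Shortest MTD first: those functions are the most time-critical."""
    return sorted(mtds.items(), key=lambda item: item[1])


for rank, (function, mtd) in enumerate(rank_by_mtd(mtds), start=1):
    print(rank, function, mtd)
```

The resulting order tells the team which recovery strategies to fund and rehearse first.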
III – Recovery strategies
Recovery strategies are based on MTDs
Predefined
Management-approved
Different technical strategies
Different costs and benefits
How to choose?
Careful cost-benefit analysis
Driven by business requirements
Strategies should address recovery of:
•Business operations
•Facilities & supplies
•Users (workers and end-users)
•Network, data center, telecommunications (technical)
•Data (off-site backups of data and applications)
IV – BCP development / implementation
Detailed plan for recovery
•Business & service recovery plans
•Maintenance
•Awareness & training
•Testing
Sample plan phases
•Initial disaster response
•Resume critical business operations
•Resume non-critical business operations
•Restoration (return to primary site)
•Interacting with external groups (customers, media,
emergency responders)
V – BCP final phase
Testing
•Until it’s tested, you don’t have a plan
•Testing types: Structured walk-through, Checklist, Simulation,
Parallel, Full interruption.
Maintenance
•Fix problems found in testing
•Implement change management
•Audit and address audit findings
Awareness / Training
•BCP team is probably the DR team
•BCP training must be on-going, part of corporate culture
DR planning

Disaster recovery plan
 DRP
– is a subset of BCP (business continuity
planning), and
– should include planning for resumption of
applications, data, hardware,
communications (such as networking) and
other IT infrastructure.

Body of DR plan
 Emergency information sheet
– Immediate steps to be taken
– Individuals to be contacted
 Introduction to the plan
– Its purpose, author, organization, scheduled updates
 Communication plan
 Pre-disaster actions
 Instructions for response and recovery
– Step by step, what to do afterwards
Case : DR plan (RTO : 3 hours)
 Main center
– Identify disaster & declare emergency response
– Spread out & redeploy
 DR center
– Identify emergency & make DRS ready
– System recovery : recover system, activate system, restore data, recover N/W
– DB & business recovery : recover DB & task, check consistency
– Start DRS
– Resume business
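
A plan like this can be sanity-checked by adding up estimated step durations and comparing the total with the 3-hour RTO. In the sketch below, the step names follow the slide, while the durations and the assumption that steps run strictly one after another are hypothetical.

```python
from datetime import timedelta

RTO = timedelta(hours=3)  # target stated in the case

# Step names follow the case above; the durations are hypothetical estimates,
# and the steps are assumed to run one after another.
DR_STEPS = [
    ("Identify emergency & make DRS ready", timedelta(minutes=20)),
    ("Recover system", timedelta(minutes=40)),
    ("Recover N/W", timedelta(minutes=20)),
    ("Recover DB & task", timedelta(minutes=60)),
    ("Start DRS", timedelta(minutes=10)),
]


def total_duration(steps):
    """Sum the estimated durations of all steps."""
    return sum((duration for _, duration in steps), timedelta())


def fits_rto(steps, rto):
    """True if the estimated end-to-end recovery fits within the RTO."""
    return total_duration(steps) <= rto


print(total_duration(DR_STEPS), fits_rto(DR_STEPS, RTO))
```

Durations measured in DRP tests would replace the estimates, which is one reason the plan has to be tested and updated regularly.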
