This document describes the backup and restore strategy for systems as set up and
implemented at PIFRA. A System Disaster may be considered "any event that causes
significant disruption in services for a period of time that affects the
organization". This plan therefore covers various levels of service interruption. The
information in this plan is organized so that other sites can choose the parts of the
plan they need for recovery, depending on the type of interruption. For example, in
the event of a server failure, the DRP must provide the contact numbers of the
persons needed for recovery. It should also provide information on onsite and offsite
backups. With this information, the System Administration Team can begin the
recovery process in the most efficient and planned manner.
A sensible recovery plan may be the one thing that keeps the organization going after
a disaster. Here are the basic steps to creating such a plan and making sure it will
work when required. These steps are drawn from an extensive study of the ongoing
processes at the PIFRA Test Site.
Basic Steps
Make disaster recovery an integral part of the way PIFRA's business processes run.
Someone at the top needs explicit responsibility for overseeing the plan, as it is too
easy to make dangerous mistakes when times are tough.
Prioritize
Decide which data and systems need to be recovered first. Each section thinks that
its own is the most important, but the decision has to be made -- and that usually
ends up with System Administration, which has the appropriate business insight. Don't
forget to look outside the data centre for things that need protecting. If employees
have heavily customized desktops to do their work, how will it affect them if they
have to start from scratch? Paper records are also always important.
Disaster Recovery Plan
Redundancy
Make sure the System Administrator has redundancy for critical systems, whether it's a
RAID storage system, server mirroring or even a complete duplicate data centre.
There should be no single point of failure, including power supplies,
telecommunications or even the office building itself, that will disrupt SAP Servers
for any length of time.
Backup
Along with redundancy, backup is the most important part of disaster recovery. Once
the System Administrator knows what needs to be backed up, he must decide when and
how to do the backups. A common scheme is to do a full backup at the beginning of
each week, followed by deltas -- backups of changes -- at least daily if not more
often. These can be differential backups, where the entire difference from the
starting state is copied each time, or incremental backups, where only the difference
since the last backup is stored. Incremental backups take less time but produce more
individual backups that have to be restored in order; with differential backups, the
administrator has just two restorations to make: the full backup and the latest
differential.
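The difference in restore effort between the two schemes can be sketched in a few lines of Python. This is only an illustration: the `Backup` record and the `restore_chain` function are hypothetical names introduced here, not part of any PIFRA or SAP tool, and the weekly schedule in the usage lines is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # days since the weekly full backup (0 = the full backup itself)
    kind: str  # "full", "differential" or "incremental"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Return the backups to restore, in order, to reach the state of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available on or before the target day")
    base = fulls[-1]
    after_base = [b for b in usable if b.day > base.day]
    chain = [base]
    # A differential holds every change since the full backup,
    # so only the newest one is needed.
    diffs = [b for b in after_base if b.kind == "differential"]
    if diffs:
        chain.append(diffs[-1])
    # An incremental holds only the changes since the previous backup,
    # so every later one must be restored, in order.
    last_day = chain[-1].day
    chain.extend(b for b in after_base if b.kind == "incremental" and b.day > last_day)
    return chain


# Assumed example week: full backup on day 0, one delta per day afterwards.
weekly_differential = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 6)]
weekly_incremental = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 6)]
print(len(restore_chain(weekly_differential, 5)))  # 2
print(len(restore_chain(weekly_incremental, 5)))   # 6
```

With daily differentials, a restore to day 5 needs two backups; with daily incrementals, it needs the full backup plus all five deltas.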
Offsite backups are essential, but difficult to manage -- especially for the smaller
organizations. Where teleworking is common, it may be possible to automate the
keeping of remote copies of information as part of the standard access
arrangements. Whatever the backup process -- and floppy disks, CD-Rs, removable
hard disks, tapes, leased lines and VPNs are all common -- ensure that access to
offsite backups isn't dependent on just one person. It is common to duplicate the
weekly backup and keep it offsite, and also to keep monthly backups.
Security
Regular Tests
Run regular tests to shake the bugs out of SAP servers -- and that means testing
absolutely everything. Tests that produce no errors aren't tough enough:
administrators are testing not to make sure it works, but to find out when it
doesn't. This will also tell you if your recovery procedure is working but too slow
or cumbersome -- a system that comes back but takes two days to rebuild may be
inappropriate.
Deciding backup and restoration strategies should be part of the initial architectural
planning of any major system and should influence bus types, storage devices and
the segmentation of the network.
When your SAP business processes change, reassess your plans. An acquisition,
new operating system installation or reorganization can trigger this. Also, when
PIFRA changes an underlying system and migrates data over, make sure you can
recover to the old system for as long as may be necessary -- it's no good having old
data desperately needed if the System Administrator no longer has a system that will
read it.
SIEMENS SAP Implementation for PIFRA Prepared By Muhammad Bilal
There's no point in having your data in the hands of inexperienced persons. Keep
contact information up to date -- lists of employees with addresses and mobile
phone numbers, and supplier contacts -- and make everyone's role in the recovery plan
part of their basic training.
The Siemens Team offers a full range of recovery options, depending on PIFRA's
specific needs and budget, from simple hardware replacement to complex mirroring.
Backups:
Backup means making copies of files from one location to another. The source and the
target can be on the same storage device or on different ones. It is best to copy
the files to a storage device other than the source. Data can be restored from this
backup if the original files are damaged or lost.
The backup strategy adopted at PIFRA is designed to minimize the chances of a system
disaster and to recover the system as early as possible if one occurs.
A System Disaster:
A System Disaster can mean anything: theft, fire, flood, an earthquake, a virus or
anything else that could keep users from accessing SAP Servers and hence the data. If
the System Administrator loses the entire system (possibly including hardware), then
he has to recover as much of the system as possible, as soon as possible.
The data recovery plan for the first case should be a part of the comprehensive plan
for disaster recovery in the second case. In both cases it is important to test
recovery plans and to repeat the testing procedure periodically.
Critical Steps Involved In SAP Recovery Plan
This plan follows the documentation required for PIFRA Test Site Services and was
prepared by the Disaster Recovery Team in co-ordination with the PIFRA Directorate.
A disaster recovery plan is needed in order to provide data security and access
to users during normal operations and at times of disaster.
Members of the Disaster Recovery Team responsible for the Data Processing
component of disaster recovery include:
System Administrator
WAN Administrator
LAN Administrator
The Information Technology Services of Siemens will plan and implement disaster
recovery techniques applicable to:
PIFRA project data maintained by computing systems located at the Central
Office (Test Site).
Data communications systems, including hardware, software, and lines.
As per the System data security audit, the Assistant Manager for Administrative
Services will be the primary administrator for the institutional disaster recovery plan.
Each computer-oriented Pilot Site will develop a disaster recovery and/or business
interruption plan to assure their continued operation when Central systems are
inoperable.
Equipment inventory
The Test Site at the Central Office is responsible for disaster recovery for the
following hardware and software systems at Pilot Sites.
SAP Servers hardware, software, and data.
Goods inventory system hardware, software, and data.
Personnel responsible for hardware, software, and data tapes.
PIFRA electronic mail hardware and software.
WAN hardware, software, configurations, and lines for systems within.
Central Office LAN hardware, software, configurations, data, and wiring.
UPS equipment located at the Test Site Office.
Key client workstations at the Test Site Office.
Physical Space for Restoring Institutional Systems
In the event that the current office space becomes unusable, new office space would be
necessary that meets the following requirements:
Equipment
PIFRA and SAP servers would need to be acquired from the Supplier. The LAN server
could be any Intel-based PC of sufficient size. Access to 2 ports on a link integrity
hub would be required for the PIFRA and electronic mail servers. A 48-port link
integrity switch would be required for restoring the new Central Office LAN.
Personnel
All System administration staff would be required to restore mission critical systems
to operation. Key members of other areas including the Functional Consultants, and
members of their staffs, would be required to assist in restoration of critical systems.
Record Storage
Offsite backup tapes are stored in a safe deposit box at ITS Office of Siemens. Each
Monday the backups for the previous Friday are moved to this site and previously
stored tapes returned. Access to the offsite tapes is available 24 hours. System
Administrator has access to this box. The box also includes a copy of this plan, as
well as brief documentation on PIFRA, Inventory.
Steps involved in Disaster recovery plan:
Risk Analysis
The first step in drafting a disaster recovery plan is conducting a thorough risk
analysis of SAP R/3 systems. List all the possible risks that threaten system uptime
and evaluate how imminent they are at the Test Site. Anything that can cause a system
outage is a threat, from relatively common man-made threats like virus attacks and
accidental data deletions to rarer natural threats like floods and fires.
Once the risks have been identified, ask what can be done to suppress them, and how
much it will cost.
The result of the risk analysis should be a comprehensive list of possible threats,
each with its corresponding solution and cost.
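One way to keep such a list is as a small risk register that can be sorted so the most pressing threats come first. The sketch below is only an illustration: the `Risk` record, the 1-to-5 scales, the likelihood-times-impact score and every number in the sample entries are assumptions made for the example, not figures from the PIFRA analysis. Only the threat names come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    threat: str
    likelihood: int         # assumed scale: 1 (rare) to 5 (frequent)
    impact: int             # assumed scale: 1 (minor) to 5 (severe)
    solution: str
    solution_cost: float    # placeholder figure, in the budget's currency

    @property
    def score(self) -> int:
        # Simple likelihood x impact ranking; other weightings are possible.
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Placeholder entries: the scores and costs are illustrative only.
register = [
    Risk("Flood", 1, 5, "Offsite backups", 2000.0),
    Risk("Virus attack", 4, 3, "Up-to-date anti-virus software", 500.0),
    Risk("Fire", 1, 5, "Offsite backups", 2000.0),
    Risk("Accidental data deletion", 4, 2, "Daily backups", 300.0),
]

for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.threat}: {risk.solution} (cost {risk.solution_cost:.0f})")
```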
Once the Disaster Recovery Plan is set, test it frequently. Eventually you'll need to
perform a component-level restoration of your largest databases to get a realistic
assessment of the recovery procedure, but a periodic walk-through of the procedure
with the Recovery Team will ensure that everyone knows their roles. Regularly test
the systems you're going to use in recovery to validate that all the pieces work.
Always record your test results and update the Disaster Recovery Plan to address any
shortcomings.
Update the Disaster Recovery Plan:
It is very important to update the disaster recovery plan from time to time. How
often depends on how rapidly the organization changes with respect to system
architecture, system activities, etc.
FD Baloch Production
Database Server                  FDB                       B    Siemens ITS
Application Server               FDB                       B    Siemens ITS
District Functionality System    ABF                       C    Siemens ITS
Other Servers
Exchange Server                  PIFRA    192.168.1.7      C    Central Site
File Server                      SIP      192.168.1.250         Central Site
LAN Equipment:
WIRELESS NETWORKING
Application/system software inventory
System Software:
Application Software
Database Software: ORACLE, SAP
Database Failures:
There are certain cases in which the database of a system crashes or fails to open.
These database errors can occur due to the following:
Failures caused by user errors (such as logical errors)
Failures due to errors in upgrade
Oracle Recovery Catalog Maintenance
Use Recovery Manager to register, resynch, and reset a database
Maintain the recovery catalog using change, delete, and catalog commands
Query the recovery catalog to generate reports and lists
Create and execute scripts to perform backup and recovery operations
Create, store, and run scripts
Restore files to a different location if media failure occurs
Recover a database in noarchivelog mode using RMAN
Use the Export utility to create a complete logical backup of a database object
Use the Export utility to create an incremental backup of a database object
Invoke the direct-path method export
Use the Import utility to recover a database object
Additional Security Options
System Backups
Restore:
Raid Technology:
The basic idea behind RAID (Redundant Array of Independent Disks) is to combine
multiple small, inexpensive disk drives into an array which yields performance
exceeding that of one large and expensive drive. This array of drives will appear to
the computer as a single logical storage unit or drive.
Since there are large quantities of data to keep at PIFRA, using RAID technology is
beneficial. One of the primary reasons to use RAID is greater efficiency in
recovering from a disk failure. RAID therefore reduces the chances of a disk failure
becoming a system disaster.
ERDs:
The Emergency Repair Disk (ERD) creation procedure has been integrated with the
Microsoft servers for use in case of registry corruption. The registry is the main
database of the operating system, which holds all the information related to hardware
and software installed on the machine. It is recommended that the ERD be updated
frequently.
Data to backup:
Types of Backup
Offline
Online
Offline
In the case of the SAP R/3 System, an offline backup is taken with the application
and database stopped - that is, the users cannot work.
An offline backup of the complete database gives you a consistent backup of the
database. If you continue to work with the database after the backup, the backup
remains consistent but is no longer up to date. In that case, you have to recover the
database after you restore the backup.
Online
Online backup is taken with the application and database running - that is, the users
can continue to work normally. The management of database changes by the
corresponding Oracle background processes is not affected either.
Backup Utilities
In the case of SAP R/3 Systems, the offline backup is taken while the SAP application
and database are down. Similarly, for systems other than SAP, it is essential that no
users are connected to those systems during an offline backup. Since high capacity
tape drives are now more common, it is simple and safe to back up the entire server.
This full server backup eliminates the possibility of not backing up an important
file.
In an offline backup the data in the database does not change while the backup is
being made, which means that you have a static picture of the database and do not
have to deal with the issue of data changing while the backup is being run. A full
server offline backup also gives you the most complete backup in the event of a
catastrophic disaster. On one tape, you have everything on the server.
SAP R/3 offers the utility programs BRBACKUP, BRARCHIVE and BRRESTORE.
Each of these programs has its own range of functions: backup, archiving of the
redo log files, and restore, respectively.
4. Thursday     -----------------------------------------------
5. Friday       Online Backup (Online Tape # 3)
6. Saturday     -----------------------------------------------
7. Sunday       Offline Backup (Offline Tape # 1)
On the next Monday, the first tape is reused for online backups. DDS tapes for
offline backups will be recycled every 4th Sunday.
Note: We have 3 days of Online Backups of the development system and Offline
Backups of 2 Sundays in hand. The first DDS tape of the Offline Backup is reused
every 4th Sunday.
Offsite backups are to be rotated on a weekly basis.
On the next Monday, the first tape is reused for online backups. DDS tapes for
offline backups will be recycled every 4th Sunday.
Note: We have 3 days of Online Backups of the Quality assurance system and Offline
Backups of 2 Sundays in hand. The first DDS tape of the Offline Backup is reused
every 4th Sunday.
Offsite backups are to be rotated on a weekly basis.
From every 15th day, the first DDS tape is to be reused for online backups.
15. Monday      Online Backup (Online Tape # 1)
16. Tuesday     Online Backup (Online Tape # 2)
17. Wednesday   Online Backup (Online Tape # 3)
18. Thursday    Online Backup (Online Tape # 1)
19. Friday      Online Backup (Online Tape # 2)
20. Saturday    Online Backup (Online Tape # 3)
21. Sunday      Offline Backup (Offline Tape # 3)
22. Monday      Online Backup (Online Tape # 1)
23. Tuesday     Online Backup (Online Tape # 2)
24. Wednesday   Online Backup (Online Tape # 3)
25. Thursday    Online Backup (Online Tape # 1)
26. Friday      Online Backup (Online Tape # 2)
27. Saturday    Online Backup (Online Tape # 3)
28. Sunday      Offline Backup (Offline Tape # 1)
Note: We have 3 days of Online Backups and Offline Backups of 3 Sundays in hand.
The first DDS tape of the Offline Backup is reused on the 4th Sunday. Offsite backups
are to be rotated on a monthly basis.
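The rotation in the schedule above (three online tapes cycling Monday to Saturday, three offline tapes cycling on Sundays) can be expressed as a small function, which is handy for checking which tape is due on a given day. This is a sketch under stated assumptions: the function name `tape_for_day` is hypothetical, and the behaviour for days 1 to 14 is extrapolated from the visible rows 15 to 28 rather than taken from the plan.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def tape_for_day(day: int) -> str:
    """Return the backup due on a 1-based schedule day, where day 1 is a Monday."""
    if day < 1:
        raise ValueError("day must be 1 or greater")
    weekday = (day - 1) % 7   # 0 = Monday ... 6 = Sunday
    week = (day - 1) // 7     # 0-based week number
    if weekday == 6:
        # Sundays: offline backup, cycling through three offline tapes,
        # so the first tape comes round again on the 4th Sunday.
        return f"Offline Backup (Offline Tape # {week % 3 + 1})"
    # Monday to Saturday: online backup, cycling through three online tapes.
    return f"Online Backup (Online Tape # {weekday % 3 + 1})"


# Print the third week of the schedule.
for day in range(15, 22):
    print(f"{day}. {WEEKDAYS[(day - 1) % 7]:<10} {tape_for_day(day)}")
```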
A system can fail or crash in various situations. The possible failures are:
Hardware Failures
Network Failures
Operating System Failure
SAP Software/Application Failure
Database Failure
Reasons for Failures
Fire
Theft
Earthquake
Flood
Electric Fluctuations
UPS Failure
Hard Disk problems
Problems in the hardware e.g. RAM, Motherboard, VGA, Data Buses etc.
Abnormal shutdown of server
Virus Attack
Password Policy
Illegal Operation in the System
To protect the system from a possible crash, a number of options must be taken into
consideration:
To protect against data loss and system disaster, RAID (software and hardware) is
configured.
For safety, you should have two array controllers, and the SCSI disks should be
distributed equally between them.
It is also important that the hard disks be hot-pluggable (i.e. in case of disk
failure the disk can be replaced while the system is running).
There must be at least two network cards to protect the system against
network card failure.
To protect the system from electrical faults, a power system solely dedicated
to the particular LAN setup is needed; a separate electric power system backed
by a generator is a necessary prerequisite.
If the UPS fails to support the servers, the operating system may survive, but
in an extreme case it may crash. In that case, the Repair option of the
operating system installation must be used from the operating system CD.
If a hard disk failure occurs, then first repair or replace the hard disk.
Do not install unnecessary software on the servers, and view the error logs
daily to rectify any problems.
Avoid abnormal shutdown of servers. If it happens, recover the system from
the last known good configuration settings.
If the operating system crashes and a blue screen appears, then a fresh
installation of the operating system is recommended on the same drive where the
previous operating system resides, under a new folder name.
Systems should be kept updated with the latest anti-virus software in order to be
secure from virus attacks.
To secure the system and protect it from any hazard, operating system
passwords must be changed frequently and must be difficult for others to guess.
A proper password policy should be maintained.
Benefits:
Frees PIFRA from complex business continuity recovery issues so that it can focus
on its core business process.
Provides long-term cost efficiencies with complete project management and
professional task execution.
Ensures a comprehensive business continuity program consistent with
PIFRA's policies.
Provides a single point of contact for improved communications and
co-ordination between Client & Customer.
Performance tuning
Performance should be one of the main issues during every phase of system
development: application analysis, design, and implementation. Usually, however, it
becomes an issue only once it has become a problem on the running system.
Solving performance problems requires a close look at the whole system to identify
potential bottlenecks.
More often than not, performance problems arise from several different areas, such as:
hardware configuration
software configuration
application design
Steps To Be Taken
Database audit
production, tape handling, upgrades and patch installations. This may also cover any
site specific procedures that may affect data security and integrity.
corrupted. coupled with responsible person.
database.
Troubleshoot the exact damage (if any) to
the system and recover the system
Database Failure   Datafile(s) may become     Online the datafile(s) using server
(Scenario 5)       offline and inaccessible   manager tool
                   to the users

Database Failure   Limited or no space left   Increase the space of the database by
(Scenario 6)       in database and rollback   either increasing the size of the
                   segments                   datafile or adding a new datafile

Database Failure   System hangs due to        Remove the old archive logs or
(Scenario 7)       archival stuck             backup them on a separate storage
                                              device

Database Failure   Database could not open    Data files may be corrupted, recover
(Scenario 8)       and hence not accessible   the specific datafile
                   to users
                                              If required restore the datafile from
                                              a backup and then recover the
                                              database
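For Scenario 7, the "backup them on a separate storage device" option amounts to moving archive log files that are older than some age out of the archive directory. The sketch below shows the idea at file-system level only. It is a generic illustration, not the procedure used at PIFRA: the function name, the directory arguments and the 7-day default are all assumptions, and in an SAP landscape the BRARCHIVE utility named earlier would normally be the tool for archiving redo logs.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def move_old_archive_logs(
    archive_dir: Path,
    backup_dir: Path,
    max_age_days: float = 7.0,       # assumed retention period
    now: Optional[float] = None,     # injectable clock, mainly for testing
) -> List[Path]:
    """Move files older than max_age_days from archive_dir to backup_dir.

    Returns the new paths of the files that were moved.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # 86400 seconds per day
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(archive_dir.iterdir()):
        # Only plain files whose last modification is older than the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = backup_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Moving rather than deleting keeps the logs available for a later recovery, which matters because a restored datafile can only be rolled forward if the archive logs written since its backup still exist somewhere.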
Important Instructions:
The System Administration team is assigned the task of recovering the system in case
of a disaster. They should know the following regarding the disaster recovery plan:
Monitor your system closely and efficiently in order to reduce the
chances of a system disaster.
Take immediate action in case of errors in backup and resolve the backup
errors.
Proper tagging of the backup tapes should be carried out.
Exact location of all the software necessary for recovering a complete system
should be known.
Take appropriate action for the problem that has occurred; some of these actions
have already been mentioned above.
Check logs and take precautionary action in order to avoid the same disaster
in future.
Update the disaster recovery plan as per changes in the system architecture,
system activities, etc.
Present contact persons and numbers (Siemens Pakistan):