White paper
Environment: Oracle 10g on Red Hat AS4, HP Integrity rx7620 server, HP ProLiant DL580 G2 server, using HP StorageWorks EVA8000 and EVA5000 storage arrays and HP StorageWorks EML E-Series 103e and VLS 6510 libraries
Executive summary
Key findings
Overview
Components
Configuring the hardware
  Hardware statistics
  Partitioning the Integrity rx7620 server
    Configuring the Management Processor
    Creating nPars
  Defining the zones
  Configuring the EVA8000 storage array for primary storage
  Configuring the EVA5000 for disk backups
  Configuring the HP StorageWorks EML E-Series 103e Tape Library
  Configuring the HP StorageWorks 6510 Virtual Library System
Configuring the software
  Configuring the QLogic driver
    Setting up QLogic Dynamic Load Balancing
    EVA8000 and EVA5000: Active-Active/Active-Passive
  OCFS2
    OCFS disk configuration
    Setting up the OCFS Clustered File Systems
  Working with Benchmark Factory
    Oracle parameter changes
  OLTP workload results
Oracle backup and restore
  NetBackup policies
  Oracle templates
  Setting up the storage units
  Set the master server jobs global attribute
  Backup issues
  Setting up restores
Backup and restore performance results
  Backup methodologies
    Disk-to-disk backups: Backing up Oracle 10gR2 to EVA5000
    Disk-to-virtual tape backups: Backing up Oracle 10gR2 to VLS
    Disk-to-tape backups: Backing up Oracle 10gR2 to EML
  EVA performance results
    EVA5000 RAW performance characterization
    EVA8000 RAW performance characterization
  EVA5000 backup and restore results
  EML E-Series backup and restore performance results
  VLS6510 backup and restore performance results
Conclusions
  Oracle RMAN
  Disk-to-disk backup
  Disk-to-tape backup
  Disk-to-VLS backup
  Server configuration
Best practices
  Best practices for disk-to-disk backups on the EVA5000 storage array
  Best practices for disk-to-tape backups on the EML E-Series Tape Library
  Best practices for disk-to-virtual tape backups on the VLS Virtual Tape Library
  Best practices for using Oracle Recovery Manager (RMAN)
Appendix A. Bill of Materials
Appendix B. Configuring Oracle RMAN
Appendix C. Examples
Appendix D. Other issues
  Server OS hangs/crashes
  Oracle session hangs
  NetBackup catalog synchronization
  RMAN not backing up archive logs
  RMAN specific syntax changes
  Imbalanced backups
  Poorly streaming backups
  RAC issues
  General Oracle changes
For more information
  HP
  Oracle
  Symantec NetBackup 6.0
  Quest Benchmark Factory 5.0
  Open Source Tools
Executive summary
Many businesses use Oracle databases to store critical data, and they need a reliable, robust, and efficient backup and recovery method. Backup and recovery of Oracle databases is a vital part of IT data protection strategies. Recovery times and backup windows are at the core of establishing recovery time objectives (RTOs) and recovery point objectives (RPOs). As data grows, a faster backup method is required to maintain these objectives and to give administrators confidence that the implemented backup and recovery strategy remains viable.
The HP StorageWorks Customer Focused Testing Team constructed a Red Hat AS4, Oracle 10g environment with HP StorageWorks storage arrays to represent an enterprise environment. The purpose of the testing was to develop best practices for backing up and restoring an enterprise environment that deploys the HP StorageWorks 8000 Enterprise Virtual Array (EVA8000), HP StorageWorks 5000 Enterprise Virtual Array (EVA5000), HP StorageWorks 6000 Virtual Library System (VLS6000), and HP StorageWorks Enterprise Modular Library (EML) to provide data protection and recovery operations for Oracle 10g with NetBackup 6.
The objectives for the testing, based on actual customer input, included the following:
• Back up an approximately 2.5-TB Oracle database in 2.5 hours or better (approximately 1 TB/hr)
• Establish best practices for online backup and restore of Oracle databases
• Determine the impact to a transaction workload while backups are running
• Provide details of NetBackup 6.0 integration with RMAN
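The first objective is a matter of simple arithmetic: dividing the database size by the backup window gives the sustained rate the backup target must absorb. The following is a minimal sketch of that calculation. The function name and the decimal convention (1 TB = 1,000 GB) are assumptions made for illustration and are not part of the tested configuration.

```python
def required_rate_gb_per_hr(database_gb: float, window_hr: float) -> float:
    """Return the sustained throughput needed to finish a backup inside the window."""
    if window_hr <= 0:
        raise ValueError("backup window must be positive")
    return database_gb / window_hr


# Objective from this paper: approximately 2.5 TB in 2.5 hours.
# Assumption: 1 TB = 1,000 GB (decimal units).
target = required_rate_gb_per_hr(2500, 2.5)
print(f"required rate: {target:.0f} GB/hr")
```

The same function can be used to check whether a larger database or a shorter window would still fit a measured backup rate.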
Key findings
Testing successfully provided the following high-level results:
• Limited the impact to transaction workloads while optimizing database backup and restore times:
  - HP Integrity RAC VLS backup: 923 GB/hr (approximately 2 hours, 30 mins)
  - HP Integrity rx7620 RAC EML restore: 453 GB/hr (approximately 5 hours, 20 mins)
  - HP ProLiant DL580 G2 VLS backup: 100 GB/hr (approximately 5 hours, 30 mins)
  - HP ProLiant DL580 G2 EML restore: 60 GB/hr (approximately 7 hours, 40 mins)
• Successfully demonstrated different EVA5000 configurations.
• Successfully determined maximum server workloads and capacities for the DL580 and Integrity rx7620 database servers.
• Successfully determined best configurations for each backup methodology:
  - Disk-to-disk
  - Disk-to-tape
  - Disk staging
Important findings uncovered during the tests are documented in the Best practices section.
Overview
The main purpose of the project was to conduct backup and restore testing using various backup targets in order to determine the best ways to reduce Oracle database downtime and increase database availability using Symantec NetBackup 6 with RMAN. HP integrated and tested backup and recovery of different Oracle databases with the following objectives:
• Demonstrate best practices to back up an approximately 2.5-TB database within 2.5 hours
• Demonstrate NetBackup 6 integration with Oracle RMAN
• Determine best practices for backup and recovery to tape, virtual tape, and disk
• Characterize the impact of online backups on application performance
Testing included full backups of the databases, staged backups, and full and incremental restores. The backup testing included the database data, control files, and archive logs, with and without user load. Two different restore tests were performed. In the first test, the full database was restored after a simulated disaster, such as the loss of an entire storage array. In the second test, incremental restores were conducted. Each time a restore was conducted, the database was opened and checked for data integrity by running a simulated workload against the database and monitoring the test for errors.
Several options for backup and restore were evaluated, as was their impact on database recovery, complexity, and recovery speed:
• Integrity and ProLiant Servers to HP StorageWorks EML E-Series Tape Library: This scenario utilized a standard RMAN backup method using an online database and Symantec NetBackup to spool the data directly to tape. The test goal was to measure the backup time and throughput for a quiesced and busy full database backup. Times for the full backups were recorded.
• Integrity and ProLiant Servers to HP StorageWorks VLS6510 Virtual Tape Library: This scenario utilized a standard RMAN backup method using an online database and Symantec NetBackup to spool the data directly to a virtual tape library. The test goal was to measure the time taken and throughput for a quiesced and busy full database backup. Times for the full backups were recorded.
• Integrity and ProLiant Servers to HP StorageWorks EVA5000 storage array: This scenario utilized a standard RMAN backup method using an online database and Symantec NetBackup to spool the data directly to another physical disk array. The test goal was to measure the time taken and throughput for a quiesced and busy full database backup. Times for the full backups were recorded.
• Full restore from HP StorageWorks EML E-Series, VLS6510, and EVA5000 to Integrity and ProLiant Servers: This testing provided data for a full restore using each methodology with RMAN and Symantec NetBackup. The test goal was to measure the recovery time using each method with a mounted control file. Times for each restore were recorded.
• Incremental restore from HP StorageWorks EML E-Series, VLS6510, and EVA5000 to Integrity and ProLiant Servers: This testing provided data for an incremental restore using each methodology with RMAN and Symantec NetBackup. The test goal was to measure the recovery time using each method with a mounted control file. Times for each restore were recorded.
Components
To run these tests, HP configured the system illustrated in Figure 1. The environment was based on input from customers and is representative of a typical Oracle database environment. The key components include the following:
• Oracle 10g: Benchmark Factory was used to generate load against the Oracle database, which was backed up and restored.
• HP rx7620 server: This server was used to host the Oracle database. Two configurations were used: a single-instance database and a two-node RAC instance.
• DL580 server: This server was used to host multiple database instances.
• EVA8000: The primary SAN-based storage array, which held the Oracle database, logs, and so on.
• EVA5000: This storage array was used as a disk-to-disk backup target to show how an older EVA may be re-deployed within existing infrastructure.
• HP StorageWorks EML E-Series 103e Tape Library: The EML was configured as a primary tape backup and restore device.
• VLS6510: The VLS was configured as a primary tape backup and restore device.
• Red Hat AS4 operating system: Enterprise Linux operating system used on both the rx7620 and DL580 servers.
• Symantec NetBackup 6.0 was used as the backup application.
• Benchmark Factory was used to create the OLTP data in the databases and simulate 500- and 1,000-user workloads.
Note: At the time of this writing, the rx7620 server was obsoleted by the rx7640 server. The Best Practices outlined in this document are still pertinent.
Hardware statistics
The hardware involved in the configuration of this environment includes:
• Database Server 1 and 2: HP Integrity rx7620 server
  - 8x IA64 1.6-GHz processors in one partition
  - 8x QLogic-based HP A6286A Dual Port FC HBAs
  - 64-GB RAM
  - Two cells and two partitions
• Database Server 3: HP ProLiant DL580 G3 server
  - 4x Intel Xeon 3.0-GHz processors
  - 2x QLogic-based HP FC2214A Dual Port FC HBAs
  - 16-GB RAM
• Storage
  - EVA8000 (primary): 144x 300-GB FC drives, dual controllers (Active/Active), one disk group (FC)
  - EVA5000 (D2D, secondary): 56x 250-GB FATA drives, dual controllers (Active/Active), one disk group (FATA)
• Tape devices
  - EML E-Series 103e Tape Library: 4x LTO3 drives, 24x tapes at 400-GB native, 2 FC paths
  - VLS6510: 24x 250-GB SATA drives, emulating 12x LTO3 drives with four FC paths, 49x tapes at 100 GB each
Creating nPars
The system was prepared for partitioning and then partitioned:
1. If no partition exists, a new complex must be created with the cc command. Choose cell 0
Use the rr command to reset the partition. Use the rs command to restart the partition. Create a new complex with the cc command. Choose cell 0 and save the configuration.
-u Admin h <IPADDRESS>
This created a partition consisting of a single cell, and an OS was loaded onto the system. After an OS is loaded, you can install the nPar command line utilities and connect to the MP to create the second partition. Alternatively, you can install the nPar utilities on a server running Linux, HP-UX, or Windows and create the second partition from a remote host. To speed up the process, one partition was created to provide access to all the on-board SCSI disks, and an OS was loaded on the first SCSI disk. The first SCSI disk was then duplicated to the remaining disks using the dd utility. When the duplication was complete, the system partitions were reset and the rx7620 server was repartitioned into two partitions of one cell each. Because of the disk duplication, each new server partition had an identical bootable OS. Alternatively, a network install could have been performed on each partition if a PXE server had been set up. Because no PXE server was used, disk duplication was the simplest method of preparing the OS disks for each server partition.
Figure 2. The /etc/hp_qla2300.conf file
    qdepth = 16
    port_down_retry_count = 30
    login_retry_count = 30
    failover = 1
    load_balancing = 2
    auto_restore = 0x80
The load_balancing value of 2 selects the Least Recently Used (LRU) policy. In this way, the selected paths were balanced automatically across the HBAs and switches for the EVA8000, and were load balanced across HBAs to the same controller within the same switch for the EVA5000.
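When several hosts must carry the same driver settings, it can help to read the file into a dictionary and compare it against the intended values. The sketch below parses the settings listed in Figure 2. It assumes the simple `name = value` layout shown in that figure; the helper name is hypothetical and is not part of the QLogic or HP tooling.

```python
FIGURE_2_SETTINGS = """\
qdepth = 16
port_down_retry_count = 30
login_retry_count = 30
failover = 1
load_balancing = 2
auto_restore = 0x80
"""


def parse_settings(text: str) -> dict:
    """Parse 'name = value' lines into a dict of integers."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrelated lines
        key, value = (part.strip() for part in line.split("=", 1))
        # Base 0 accepts both decimal ("30") and 0x-prefixed hex ("0x80").
        settings[key] = int(value, 0)
    return settings


settings = parse_settings(FIGURE_2_SETTINGS)
print(settings["load_balancing"], hex(settings["auto_restore"]))
```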
OCFS2
OCFS is the Oracle Clustered File System and was required when using the RAC configuration where a file system must be employed. OCFS provided a Distributed Lock Management (DLM) facility to coordinate writes to the files contained within. OCFS version 2 was used for this environment and is currently the only supported version of OCFS.
Note: OCFS, ASM, and RAW devices can all be used with RAC depending on platform and business requirements.
OCFS disk configuration
The Virtual Disks (VDisks) are zoned so that both hosts have visibility to all LUNs for use with RAC. The single LUN containing u06 and u07 holds the flash recovery area and the Voting and CRS files, respectively. Each of the LUNs has a single partition, which is then formatted as OCFS. Figure 3 shows the hosts and LUN visibility. For OCFS format information, see Figure 4.
Setting up the OCFS Clustered File Systems
Each of the five EVA virtual disks was formatted as an OCFS file system with the defaults listed in Figure 4.
The OCFS2 Console can simplify the formatting and configuration of the OCFS file systems and helps set up lock management and the OCFS heartbeat. The cluster size was set at 64 K with a block size of 4 K. The command line remains useful for mounting OCFS file systems. In a RAC configuration, if the file system will be used to store the Voting Disk file (CRS), Oracle Cluster Registry (CRS), data files, redo logs, archive logs, or control files, the file system should be mounted with the datavolume and nointr options. For example:
    # mount -o _netdev,datavolume,nointr /dev/cciss/c0d7p1 /data
The default /etc/fstab entries were modified to include the _netdev and datavolume options:
    LABEL=CFT106_DATA1 /u02 ocfs2 _netdev,datavolume,nointr 0 0
The datavolume option is necessary when putting data files on an OCFS volume. Prior versions of OCFS would not allow data files to be placed on OCFS volumes. The only allowed file types were the shared Oracle Home. The datavolume option was not added until OCFS version 2.
Note: All of the OCFS2 file systems need to have the _netdev option specified. This guarantees that the network is started before the file systems are mounted and that the file systems are unmounted before the network is stopped.
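Because a missing mount option is easy to overlook across several fstab entries, a small check can flag OCFS2 entries that lack the options described above. This is an illustrative sketch, not part of the tested procedure. It assumes every ocfs2 entry holds Oracle files and therefore needs all three options, and the second sample entry (LABEL=EXAMPLE_DATA2 on /u03) is hypothetical.

```python
REQUIRED_OPTIONS = {"_netdev", "datavolume", "nointr"}


def missing_ocfs2_options(fstab_text: str) -> dict:
    """Map each ocfs2 mount point to the required options it lacks."""
    problems = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ocfs2":
            continue  # not an ocfs2 entry
        mount_point = fields[1]
        options = set(fields[3].split(","))
        missing = REQUIRED_OPTIONS - options
        if missing:
            problems[mount_point] = sorted(missing)
    return problems


sample = """\
LABEL=CFT106_DATA1 /u02 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=EXAMPLE_DATA2 /u03 ocfs2 datavolume 0 0
"""
print(missing_ocfs2_options(sample))
```

In practice the text would be read from /etc/fstab; an empty result means every OCFS2 entry carries the expected options.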
For more information, see the OCFS2 Users Guide at: http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_users_guide.pdf
After creating the database, the Oracle spfile parameters were tuned to generate the best performance for backup during workloads. Refer to Table 5 for the specific Oracle parameters that were used during the testing. These are the optimal settings for this environment. Benchmark Factory scale factors are approximate and should not be used as absolute guides. The scale factor in the following example was expected to generate approximately 930 GB of data; in fact, it created 1.3 TB of data and 300 GB of indexes. The estimate does not include the indexes, whose size varies with the database block size.
The scale factor shown is an estimate as stated. The actual size of the database must include indexes as well. This may be as much as 33% of the total data size after the data has been completely generated. The generated data alone may be as much as 20% greater than the estimated size. Table 4a and Table 4b show the table and index parameters, respectively.
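The sizing guidance above can be turned into a rough planning bound. The sketch below applies the two percentages from the text to an estimated data size. It assumes that the index allowance is a fraction of the generated data size, and the function and parameter names are illustrative only. In the run described above, the generated data exceeded this bound, so the percentages should be treated as rough guides, not limits.

```python
def size_upper_bound_gb(estimated_gb, data_overrun=0.20, index_fraction=0.33):
    """Return (data, indexes, total) planning bounds in GB.

    data_overrun:   generated data may exceed the estimate by this fraction.
    index_fraction: indexes may add this fraction of the generated data size
                    (an interpretation of the text, stated as an assumption).
    """
    data = estimated_gb * (1 + data_overrun)
    indexes = data * index_fraction
    return data, indexes, data + indexes


# Example: the 930-GB estimate mentioned in this section.
data, indexes, total = size_upper_bound_gb(930)
print(f"data <= {data:.0f} GB, indexes <= {indexes:.0f} GB, total <= {total:.0f} GB")
```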
Table 4a. Benchmark Factory Table/index settings
Object        Type   Creation parameters
C_ORDER_LINE  Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_STOCK       Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_DISTRICT    Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_CUSTOMER    Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_HISTORY     Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_ORDER       Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_ITEM        Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_WAREHOUSE   Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
C_NEW_ORDER   Table  tablespace bmf parallel (degree default instances default) nologging cache monitoring
Table 4b. Benchmark Factory Table/index settings
Object           Type   Creation parameters
C_STOCK_I1       Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_WAREHOUSE_I1   Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_NEW_ORDER_I1   Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_CUSTOMER_I2    Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_ORDER_LINE_I1  Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_ORDER_I1       Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_ITEM_I1        Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_DISTRICT_I1    Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
C_CUSTOMER_I1    Index  tablespace bmf parallel (degree default instances default) nologging compute statistics
Oracle parameter changes
Table 5 lists the parameter changes that were made, along with the defaults. These changes were made to accommodate the maximum user load and to maintain performance during workloads while backups were occurring.
Table 5. Changed Oracle parameters
Oracle parameter               Default  Used
sort_area_size                 65536    262144
parallel_max_servers           15       2048
parallel_threads_per_cpu       2        8
db_files                       200      1024
Processes                      150      1250
Dbwr_io_slaves                 0        4
Tape_io_slaves                 False    True
db_file_multiblock_read_count  8        128
cursor_sharing                 Exact    force
The IA64 RAC system performed best, with no negative impact to users regardless of backup activity. The IA64 Single system also showed almost no impact to users and operated at half the transaction rate of the entire RAC. The IA32 Multi system also showed a flat performance curve, with the exception of the EML backup, during which throughput dropped by approximately 6 TPS. This was mainly due to the I/O wait incurred by the system. This could be eliminated by adding more RAM or more HBAs. The next section outlines the backup and restore performance of each methodology in contrast to the OLTP workload.
NetBackup policies
NetBackup policies determine the way a backup is executed. While each of the backup policies has similar attributes, a template or script ultimately determines how the backup is performed. Because the Policy Type used is Oracle, the EML, VLS, and disk backups all share common policy settings. Policy settings used include:
• Policy Type: Set to Oracle
• Policy Storage Unit: Select the appropriately configured storage unit for the Volume Pool
• Policy Volume Pool: Set to the volume pool associated with the storage unit device
• Schedule: Two schedule types must exist if automated backups are to work: an Automatic Backup type and an Application Backup type. The defaults are Full and Default-Application-Backup.
• Clients: This is a list of clients to be backed up by this schedule. At least one client must be defined on an active policy.
• Backup Selections: This can be a template or script.
Multiplexing cannot be set in the Oracle Policy Type because RMAN already performs a level of multiplexing, which can be specified in the template.
Oracle templates
On the NetBackup Media Server you can create a template using the Java NetBackup Admin Console as root or a defined administration user:
    # /usr/openv/netbackup/bin/jnbsa
When the admin console starts, select the Backup Files tab and then select the checkbox next to the database name. A dialog box will appear and ask for database connection information. When logged in, the Backup button as well as the rest of the database objects that can be backed up will display. Selecting the entire database implies a database backup, but granular objects can be selected as well. Clicking the Backup button starts the Backup Wizard. The following information must be provided:
• Authentication
  - Select System or Oracle
  - Select RMAN catalog if you are using a catalog
• Archived Redo Logs
  - Include archived redo logs and specify the range, if any
  - Delete archived logs after they are backed up, if desired
• Configuration
  - Default
  - Existing template
• Backup options
  - Backup file name format: Set the format you want to use for backup files
  - Backup set identifier: Set the identifier you want the template to employ, if any
• Database state
  - Online
  - Offline
• Configuration variables
  - Backup policy name: An Oracle policy name on the EMM server
  - Schedule name: A schedule name of type Application Backup on the EMM server
  - Server name: The EMM server
  - Client name: The Media Server or remote Oracle client system
• Backup limits
  - RMAN defaults: The currently configured RMAN defaults
  - Specify maximum limits
    I/O limits: Read Rate, size of backup piece, number of open files
    Backup Set Limits: FilesPerSet, MaxSize, ArchivelogMaxSize
    I/O output: Number of streams (channels) to configure for the backup
The last step is to run the backup, save the backup to a template, both, or cancel. Figure 6 shows the initial screen after entering the database credentials.
Backup issues
Because each environment is different, expect that issues may arise and require patches or workarounds to allow proper backup execution. One example was encountered shortly after testing began: archive logs would not back up properly after full backups. Expiring all media and performing the backup again allowed the archive logs to be backed up once, but not afterward. Because this was not an acceptable situation, applying MP2 was required to fix the issue. For other issues, see Appendix D. Other issues.
Setting up restores
Three options exist when performing restores to the Oracle database servers in this environment:
• Create a restore job on the Media Server with the Java NetBackup Admin Console.
• Manually create a script on the Media Server and execute the restore with RMAN at the command line.
• Copy a backup template and modify the RMAN script portion to perform restores instead of backups. You can then create a backup policy but use the restore template.
For files backed up on one machine that are to be restored to another machine, a parameter must be set in the bp.conf file located in the /usr/openv/netbackup directory. The line to be added must be in the following format:
    FORCE_RESTORE_MEDIA_SERVER = backup_server,restore_server
where backup_server is the host name of the server that performed the backup and restore_server is the host name of the server that needs to perform the restore. RMAN and NetBackup automated the restore process of de-multiplexing files and mounting the required backup images, regardless of media type. This is especially helpful with disk backups and disk staging.
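When several backup and restore host pairs are involved, generating the bp.conf line programmatically avoids formatting mistakes. The sketch below only builds the text of the entry in the format given above; it does not modify bp.conf. The helper name and the host names dbhost1 and dbhost2 are hypothetical.

```python
def force_restore_entry(backup_server: str, restore_server: str) -> str:
    """Build the bp.conf line that redirects restores to another server."""
    for name in (backup_server, restore_server):
        # Reject empty names and characters that would break the entry format.
        if not name or "," in name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid host name: {name!r}")
    return f"FORCE_RESTORE_MEDIA_SERVER = {backup_server},{restore_server}"


# Hypothetical host names, for illustration only.
print(force_restore_entry("dbhost1", "dbhost2"))
```

The resulting line would then be added to /usr/openv/netbackup/bp.conf by an administrator.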
Note: For the restore to be successful, the control file from the last backup must be available or you must have an RMAN catalog configured.
Backup methodologies
Disk-to-disk backups: Backing up Oracle 10gR2 to EVA5000
For this test, RMAN performed a tape backup and NetBackup translated the data streams into files and wrote them to the defined Disk Storage Units. The I/O load was balanced across HBAs and controller ports.
Disk-to-virtual tape backups: Backing up Oracle 10gR2 to VLS
For this test, RMAN performed a tape backup using 12 streams to the VLS. The I/O load was balanced across HBAs and controller ports.
Disk-to-tape backups: Backing up Oracle 10gR2 to EML
For this test, RMAN performed a tape backup using four streams to the EML. The I/O load was balanced across HBAs and controller ports.
At first glance these results might not mean much, but comparing the backup time against the backup window shows whether a configuration meets the requirements. The target of approximately 1 TB per hour was missed. For comparison, Table 10 shows the restore performance for this EVA5000 configuration.
Table 8. Backup results for disk-to-disk backups using EVA5000, VCS 4.001, without OLTP load

Host type     Database size (GB)   Backup LUNs   Backup time (hrs:min)   RMAN channels   Backup rate (GB/hr)
IA64 RAC      2400                 2             5:47                    12              415
IA64 Single   1320                 1             3:19                    12              400
IA32 Multi    600                  1             4:15                    8               140
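The backup rate reported in Table 8 is the database size divided by the elapsed backup time. The following is a minimal sketch of that arithmetic, not part of the test tooling: the function names are hypothetical, and 1000 GB/hr is an assumed stand-in for the "approximately 1 TB per hour" target.

```python
def hours_from_hhmm(hhmm: str) -> float:
    """Convert an 'hrs:min' string such as '5:47' into fractional hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60.0


def rate_gb_per_hr(size_gb: float, elapsed_hhmm: str) -> float:
    """Average throughput in GB/hr for size_gb moved in elapsed_hhmm."""
    return size_gb / hours_from_hhmm(elapsed_hhmm)


def meets_target(rate: float, target_gb_per_hr: float = 1000.0) -> bool:
    """True if the rate reaches the target.

    The 1000 GB/hr default is an assumption standing in for
    'approximately 1 TB per hour'.
    """
    return rate >= target_gb_per_hr


if __name__ == "__main__":
    # (database size in GB, backup time in hrs:min) taken from Table 8.
    table8 = {
        "IA64 RAC": (2400, "5:47"),
        "IA64 Single": (1320, "3:19"),
        "IA32 Multi": (600, "4:15"),
    }
    for host, (size_gb, elapsed) in table8.items():
        rate = rate_gb_per_hr(size_gb, elapsed)
        print(f"{host}: {rate:.0f} GB/hr, meets target: {meets_target(rate)}")
```

The computed rates (about 415, 398, and 141 GB/hr) agree with the rounded values in the table, and none reaches the assumed target.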
The IA64 systems performed best overall, but the target of approximately 1 TB/hr was missed. Contrasting these results with those taken during OLTP workload shows that performing a backup during peak hours is not advised if you are trying to meet specific backup times. One of the factors behind the results in both of these backups is the firmware. Table 9 shows the same type of backup but with VCS 3.028 and cache mirroring disabled for the backup LUNs.
Table 9. Backup results for disk-to-disk backups using EVA5000, VCS 3.028, without OLTP load

Host type     Database size (GB)   Backup LUNs   Backup time (hrs:min)   RMAN channels   Backup rate (GB/hr)
IA64 RAC      3090                 2             4:24                    12              702
IA64 Single   1373                 1             2:46                    12              508
IA32 Multi    750                  1             2:10                    8               245
Using the back-revision firmware (VCS 3.028) improved backup rates substantially, raising the reported throughput by roughly 27% to 75% depending on the system (compare Tables 8 and 9). The largest contributing factor was turning off controller cache mirroring.
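Because the database sizes differ between Table 8 and Table 9, the fair comparison between the two firmware versions is the backup rate rather than the elapsed time. As a small illustrative sketch (the function and variable names are hypothetical; the rates are copied from the two tables), the relative change can be computed as follows:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Backup rates in GB/hr: (Table 8, VCS 4.001), (Table 9, VCS 3.028).
RATES = {
    "IA64 RAC": (415, 702),
    "IA64 Single": (400, 508),
    "IA32 Multi": (140, 245),
}


def improvements(rates: dict) -> dict:
    """Map each host type to its percent change in backup rate."""
    return {host: percent_change(old, new) for host, (old, new) in rates.items()}


if __name__ == "__main__":
    for host, change in improvements(RATES).items():
        print(f"{host}: {change:+.0f}%")
```

A doubling of throughput would correspond to a change of +100%.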
Table 10. Restore results for disk-to-disk backups using EVA5000, VCS 4.001, offline restore

Host type     Database size (GB)   Backup LUNs   Restore time (hrs:min)   RMAN channels   Restore rate (GB/hr)
IA64 RAC      2400                 2             5:54                     12              406.78
IA64 Single   1300                 1             2:05                     12              624.00
IA32 Multi    750                  1             7:36                     8               87.21
The two IA64 systems performed best during restores, and much better than the IA32 system overall. The RAC system did not perform as well on the restore as the single-instance database. Two factors caused this result on the RAC system:
• The RAC system had imbalanced backups because of NetBackup.
• The disk I/O was shared on the same HBA for backups.

The IA32 system performed poorly due to two similar factors:
• The disk I/O was shared on the same HBA for backups.
• Bigfile tablespaces were used.

The side effect of these factors was long backups and restores that used only one channel, because of the limited capabilities of the hardware and the bigfile tablespace used to store the entire 150-GB database.
From these results, you can conclude:
• Using bigfile tablespaces can hamper overall restore results if you do not have enough tablespaces to spread across multiple devices.
• Using VCS 4.001 or higher firmware and RAID1 can yield the best protection from failures by eliminating the capability to turn off cache mirroring and not taxing controllers with RAID5 parity overhead.
• Using VCS 3.028 firmware and RAID0 LUNs yields the best performance by allowing you to turn off cache mirroring, but eliminates any possible recovery from failures.
• Properly balancing datafiles/tablespaces to RMAN channels and tape devices yields the best results.
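To illustrate why a single bigfile tablespace limits parallelism while many smaller files balance well, the following sketch models each RMAN channel as a queue and assumes elapsed time is proportional to the data handled by the busiest channel. This is a simplified model for illustration only: the greedy largest-first assignment and the function names are assumptions, not the algorithm RMAN or NetBackup actually uses.

```python
import heapq
from typing import List, Sequence


def assign_files_to_channels(file_sizes_gb: Sequence[float],
                             num_channels: int) -> List[List[float]]:
    """Greedy largest-first assignment of files to the least-loaded channel.

    Illustrative model only; not RMAN's real scheduling logic.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    # Min-heap of (current load in GB, channel index).
    heap = [(0.0, ch) for ch in range(num_channels)]
    heapq.heapify(heap)
    channels: List[List[float]] = [[] for _ in range(num_channels)]
    for size in sorted(file_sizes_gb, reverse=True):
        load, ch = heapq.heappop(heap)
        channels[ch].append(size)
        heapq.heappush(heap, (load + size, ch))
    return channels


def busiest_channel_gb(channels: List[List[float]]) -> float:
    """Data handled by the most heavily loaded channel, in GB."""
    return max(sum(files) for files in channels)


if __name__ == "__main__":
    one_bigfile = assign_files_to_channels([150], 8)
    fifteen_files = assign_files_to_channels([10] * 15, 8)
    print("One 150-GB bigfile, 8 channels:", busiest_channel_gb(one_bigfile), "GB")
    print("Fifteen 10-GB files, 8 channels:", busiest_channel_gb(fifteen_files), "GB")
```

In this model, the single 150-GB file keeps one channel busy for the full 150 GB while the other seven sit idle; split into fifteen 10-GB files, the busiest channel handles only 20 GB.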
The two IA64 backups were similar in their rates. There is still an imbalance due to NetBackup, but it is more streamlined on the system bus and there is no contention from the EVA5000 cache. The cause of this imbalance and its fix are discussed in Appendix C. Examples. The IA32 system still has fairly low performance overall, but better than disk-to-disk. This was because one of the limitations was removed by backing up from one FC port to two separate FC ports. Congestion at the system bus or EVA5000 controller cache was not an issue. Table 12 shows backup results for the EML while not under load.
Table 12. Backup results for disk-to-tape backup using EML 103e, without OLTP load

Host type     Database size (GB)   Backup LUNs   Backup time (hrs:min)   RMAN channels   Backup rate (GB/hr)
IA64 RAC      2400                 2             5:24                    4               441.72
IA64 Single   1300                 1             2:13                    4               586.67
IA32 Multi    600                  1             3:20                    4               180
The performance characteristics of this test again look different than when performing a backup under load. Each of the backups for the servers had better performance than the previous backup results in Table 11. This again reinforces the fact that a backup while under heavy load is not advised. The IA32 Multi system saw the greatest percentage increase in performance, though it is not at the level of the IA64 systems.
Table 13. Restore results for disk-to-tape backup using EML 103e, offline restore

Host type     Database size (GB)   Backup LUNs   Restore time (hrs:min)   RMAN channels   Restore rate (GB/hr)
IA64 RAC      2400                 2             3:25                     4               702.44
IA64 Single   1300                 1             1:42                     4               764.71
IA32 Multi    600                  1             2:25                     4               248.28
These results are fairly good for every server. Each restore is done offline, so each server has more resources available for the restore than it would for a backup. The IA32 system could yield even higher performance if more bigfile tablespaces were employed, or if smallfiles were used for the single 150-GB tablespace.

From these results, you can conclude:
• Tape backup is very sensitive to the layout of the data paths and device visibility.
• As long as the files to back up are of a proper size, tape streaming can perform at decent levels of I/O.
• Tape restores are fast, while backups are highly impacted if poor channel balancing occurs.
The backup results show the high throughput the VLS can deliver. The approximate 1-TB/hr target is met on the RAC system. Every system, however, improved greatly over disk and tape during backup.
The VLS6510 allows current tape users to enjoy the speed of disk. Because the VLS can emulate multiple types of tape devices and multiple libraries, you could use this system for consolidation when moving to one type of tape media and devices.
Table 15. Backup results for disk-to-tape backup using VLS6510, with OLTP load

Host type     Database size (GB)   Backup LUNs   Backup time (hrs:min)   RMAN channels   Backup rate (GB/hr)
IA64 RAC      2400                 2             2:44                    12              883.44
IA64 Single   1320                 1             2:43                    12              484.47
IA32 Multi    425                  1             3:45                    8               113.21
Even though the IA64 RAC result is still close to the approximate 1-TB/hr target, there is a large decrease in throughput under OLTP workload. The single largest percent decrease is seen by IA32 Multi, with the IA64 Single following closely behind. This further reinforces that backups during peak workloads should be avoided. Table 16 shows the VLS restore results.
Table 16. Restore results for disk-to-tape backup using VLS6510, without OLTP load

Host type     Database size (GB)   Backup LUNs   Restore time (hrs:min)   RMAN channels   Restore rate (GB/hr)
IA64 RAC      2400                 2             9:25                     12              254.87
IA64 Single   1320                 1             2:46                     12              477.11
IA32 Multi    600                  1             1:48                     8               333.33
These results are startling in places. The first surprise is the low restore performance on IA64 RAC compared with its backup results. The cause is the backup imbalance mentioned earlier in the paper. When a scripted backup is performed that ensures no RMAN script modification by NetBackup, the backup and restore results would be similar instead of highly contrasted as seen here. The backup performance masks the issue, but the restore is impacted. The same is true for the IA64 Single system, but to a lesser degree: because only a single LUN is being written to on IA64 Single, the impact is smaller. Because IA64 RAC is restoring to two LUNs, the restore waits on tape channels to complete and so finishes with less parallelism than on IA64 Single. The IA32 Multi system performed very well for the VLS restore, considering that bigfile tablespaces were being used and were again serialized. If several smallfiles or many bigfile tablespaces were used, this restore could be improved.

From these results, you can conclude:
• Backups are sensitive to the layout of the data paths, device visibility, and the ratio of the number of files to RMAN device channels.
• The VLS is very easy to use and maintain, allowing administrators to incorporate a speedy method of backup into any tape environment. This allows shorter backup windows than traditional tape.
• Configuring many emulated devices on the VLS is essential in achieving maximum performance from the VLS.
• Using four MSA20 disk shelves, the maximum for the VLS6510, is a must where high-throughput backups are required on the VLS6510. It is also essential when emulating multiple types of libraries, tape devices, and media.
Conclusions
The data derived from the workload, backup, restore, and EVA5000 VCS tests provide insight into the important characteristics of each methodology discussed.
Oracle RMAN
• Using few bigfile tablespaces can hamper overall restore results if you do not have enough tablespaces to spread across all backup devices.
• Preventing poorly balanced channels will improve backup performance.
• Conducting backups under load is not advised where short backup windows are required.
Disk-to-disk backup
• Using VCS 4.001 or higher firmware and RAID1 can yield the best protection from failures by eliminating the capability to turn off cache mirroring and not taxing controllers with RAID5 parity overhead.
• Using VCS 3.028 firmware and RAID0 LUNs yields the best performance by allowing cache mirroring to be disabled, but has the potential to corrupt the backup if a failure occurs.
• Using the same HBA for disk backups can cause contention on the PCI bus, resulting in slower backups.
• The EVA5000 VCS 4.00x cannot disable use of the cache mirror port, which is used for more than just host LUN cache mirroring.
• The need for Active/Active versus Active/Passive capabilities should be weighed before implementation.
Disk-to-tape backup
• Tape backup is sensitive to the layout of the data paths and device visibility.
• Having proper size files to back up will allow tape streaming to perform well.
• Using OS and hardware tape buffering as well as enabling hardware compression with a proper block size will allow the tape device to perform at its peak.
• Tape restores are fast, while backups are highly impacted if poor channel balancing occurs.
Disk-to-VLS backup
• Backups are sensitive to the layout of the data paths, device visibility, and the ratio of the number of files to RMAN device channels.
• The VLS is very easy to use and maintain, allowing administrators to incorporate a speedy method of backup into any tape environment. This allows shorter backup windows than traditional tape.
• Configuring many emulated devices on the VLS is essential in achieving maximum performance from the VLS.
• Using four MSA20 disk shelves, the maximum for the VLS6510, is a must where high-throughput backups are required on the VLS6510. It is also essential when emulating multiple types of libraries, tape devices, and media.
Server configuration
• The DL580 server system was heavily loaded as a multi-database server. A DL580 server with more memory and HBAs would be advised if performing high levels of transactions to a server of this type.
• System bus contention should be avoided by ensuring servers have enough HBAs or PCI-X buses.
• If too much swapping occurs during backup, add more RAM to your system until swapping reduces to acceptable levels.
• The latest QLogic driver as of August 11, 2006, supports dynamic load balancing; earlier versions do not. This matters because dynamic load balancing lets servers avoid possible I/O contention, which improves backup and restore performance.
Best practices
During testing, several best practices were developed to improve backup and recovery performance for each scenario.
• Use as many disks as possible for a given Disk Group.
• Create Disk Groups of at least 56 FATA disks for highest throughput per disk group.
• Use two or more LUNs for each host to spread data streams across controllers for improved bandwidth.
Best practices for disk-to-tape backups on the EML E-Series Tape Library
Several settings were adjusted to achieve optimum performance. These settings should be tuned for the specific environment and, when possible, validated first within a test environment. The following is a list of the important tuning options used to achieve good performance during the backup to tape.
• Buffer configuration: The bulk of the NetBackup tuning was done at this level. Three touch files were used within NetBackup to achieve better levels of performance with respect to the use of memory buffers. These files are typically located in the /usr/openv/netbackup/db/config directory. A document that explains buffer configuration for NetBackup can be found at http://seer.support.veritas.com/docs/183702.htm. Another recommended document regarding buffer configuration for LTO3 and NetBackup can be found at http://h71028.www7.hp.com/ERC/downloads/5982-9971EN.pdf.
• Block settings: The OS-level tape block settings were configured using the /etc/stinit.def file to specify settings for the LTO tape devices.

The following settings were used in the test environment:
• NUMBER_DATA_BUFFERS: The number of buffers used by NetBackup to buffer data before sending it to the tape drives. The default value is 16 and was set to 32.
• SIZE_DATA_BUFFERS: The size, in bytes, of each buffer; the total buffer memory is this value multiplied by the NUMBER_DATA_BUFFERS value. The default value is 65536 and was set to 262144.
• NUMBER_DATA_BUFFERS_RESTORE: The number of buffers used by NetBackup to buffer data before writing it to the disk. The default value is 16 and was set to 32. (optional)
• Blocksize: A stinit.def setting, used by stinit to set each tape device's defaults. A setting of 0 is used so that the blocksize is automatically determined at write time.
• Drive-buffering: A stinit.def setting, used by stinit to set each tape device's defaults. Adding this to the stinit device definition will enable hardware buffering for the LTO3 tape device (this parameter can only be used if the drive is buffer capable).
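The buffer settings above determine how much memory NetBackup sets aside for data buffers. As a rough sketch, the total can be estimated as NUMBER_DATA_BUFFERS multiplied by SIZE_DATA_BUFFERS; scaling that figure by the number of drives and the multiplexing level follows a commonly cited sizing rule and is an assumption here, not something measured in this test. The function name is hypothetical.

```python
def buffer_memory_bytes(number_data_buffers: int,
                        size_data_buffers: int,
                        drives: int = 1,
                        multiplexing: int = 1) -> int:
    """Estimate buffer memory in bytes.

    number_data_buffers * size_data_buffers is the per-stream total;
    the drives and multiplexing factors are an assumed sizing rule.
    """
    return number_data_buffers * size_data_buffers * drives * multiplexing


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    default = buffer_memory_bytes(16, 65536)    # default values from the list above
    tuned = buffer_memory_bytes(32, 262144)     # values used in the test environment
    four_drives = buffer_memory_bytes(32, 262144, drives=4)
    print(f"Default settings: {to_mib(default):.0f} MiB per stream")
    print(f"Tuned settings:   {to_mib(tuned):.0f} MiB per stream")
    print(f"Tuned, 4 drives:  {to_mib(four_drives):.0f} MiB")
```

With the tuned values, each stream uses 8 MiB of buffer memory instead of the 1 MiB implied by the defaults, which is worth checking against available RAM when many drives run concurrently.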
Best practices for disk-to-virtual tape backups on the VLS Virtual Tape Library
Several settings were adjusted to achieve optimum performance. These settings should be tuned for the specific environment and, when possible, validated first within a test environment. The VLS6510 uses four MSA20 disk arrays to emulate tape devices.
• Buffer configuration: The bulk of the NetBackup tuning was done at this level. Three touch files were used within NetBackup to achieve better levels of performance with respect to the use of memory buffers. These files are typically located in the /usr/openv/netbackup/db/config directory. A thorough document that explains buffer configuration for NetBackup can be found at http://seer.support.veritas.com/docs/183702.htm. Another recommended document regarding buffer configuration for LTO3 and NetBackup can be found at http://h71028.www7.hp.com/ERC/downloads/5982-9971EN.pdf.
• Block settings: The OS-level tape block settings were configured using the /etc/stinit.def file to specify settings for the LTO tape devices.
• Number of devices: The number of devices emulated can contribute greatly to the overall performance of the VLS. The best performing configuration is generally a 3:1 or 4:1 ratio of emulated devices to MSA20 disk shelves attached to the VLS Interface Controller.

The following settings were used in the test environment:
• NUMBER_DATA_BUFFERS: The number of buffers used by NetBackup to buffer data before sending it to the tape drives. The default value is 16 and was set to 32.
• SIZE_DATA_BUFFERS: The size, in bytes, of each buffer; the total buffer memory is this value multiplied by the NUMBER_DATA_BUFFERS value. The default value is 65536 and was set to 262144.
• Blocksize: A stinit.def setting, used by stinit to set each tape device's defaults. A setting of 0 is used so that the blocksize is automatically determined at write time.
• Drive-buffering: A stinit.def setting, used by stinit to set each tape device's defaults. Adding this to the stinit device definition will enable hardware buffering if the drive is capable.
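The 3:1 to 4:1 guideline for emulated devices per MSA20 disk shelf translates directly into a device count. The following is a small sketch of that calculation; the function name is hypothetical, and the shelf limit of four is taken from the statement in this paper that four MSA20 shelves is the maximum for the VLS6510.

```python
from typing import Tuple

MAX_MSA20_SHELVES = 4  # maximum for the VLS6510, as stated in this paper


def emulated_device_range(msa20_shelves: int,
                          low_ratio: int = 3,
                          high_ratio: int = 4) -> Tuple[int, int]:
    """Return (minimum, maximum) emulated devices for the given shelf count,
    using the 3:1 to 4:1 devices-per-shelf guideline."""
    if not 1 <= msa20_shelves <= MAX_MSA20_SHELVES:
        raise ValueError(
            f"msa20_shelves must be between 1 and {MAX_MSA20_SHELVES}")
    return msa20_shelves * low_ratio, msa20_shelves * high_ratio


if __name__ == "__main__":
    for shelves in range(1, MAX_MSA20_SHELVES + 1):
        low, high = emulated_device_range(shelves)
        print(f"{shelves} shelf/shelves: {low} to {high} emulated devices")
```

With four shelves the guideline gives 12 to 16 emulated devices, which is consistent with the 12 streams used for the VLS backups in these tests.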
3.0-GHz CPU
GB Memory
FCA2214 (dual port HBA)

Storage: EVA8000
EVA8000 (2C12D)
300-GB FC Disk (NDSOMEOMER)
HP StorageWorks SAN 2/16N switches
Brocade SilkWorm 3800 SAN switches
HP OpenView Storage Management Appliance III

1 144 2 2 1 2
Disk-to-disk backup target: EVA5000
EVA5000 (2C8D)
250-GB FATA Disk (ND25058238)

Disk-to-virtual tape backup target: VLS6510
VLS6510
250-GB SATA Disk (ND12341234)
HP OpenView Command View TL

Disk-to-tape backup target: EML 103e
EML E-Series 103e Tape Library
Ultrium 960 LTO-3 drives (ND25058238)

1 4 V3.020 HP01 1 48 1 V3.020 HP02b V3.2 1 56 V4.001 and V3.028 HP01
Figure 7. RMAN configuration defaults

CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/10.2.0/db_1/dbs/snapcf_ORDB1.f'; # default
You may want to modify some of the preceding defaults, in particular Backup Optimization, Default Device Type, Controlfile Autobackup, Parallelism, and Archivelog Deletion Policy. The following are examples of suggested changes to these settings:
• Backup Optimization: Enable this if you plan to use several incrementals and merge them. Only blocks changed since the last backup will be backed up. It is not very useful for full backups.
• Default Device Type: The default device type may need to be a tape library or WORM drive, so setting this may relieve some scripting.
• Controlfile Autobackup: This is highly useful to ensure that a controlfile backup is done often.
• Parallelism: When writing backup sets, this will stream multiple files together to the same channel if set to a value greater than one.
• Archivelog Deletion Policy: Setting this can ease management of scripts because you can set the archivelogs to be deleted at a predefined interval.
Appendix C. Examples
This section shows RMAN script examples, NetBackup templates, and NetBackup screenshots.
Figure 8. RMAN Full Backup script (four channels configured)

RUN {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch02 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch03 TYPE 'SBT_TAPE';
SEND 'NB_ORA_CLIENT=PRIM1,NB_ORA_SERV=EMMSRV,NB_ORA_POLICY=PRIM1-EML,NB_ORA_PC_SCHED=Default-Application-Backup';
BACKUP
    INCREMENTAL LEVEL=0
    FORMAT 'Data_Plus_Arch_%d_u%u_s%s_p%p_t%t'
    TAG 'DB1 Full Standby Backup'
    DATABASE PLUS ARCHIVELOG;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
RELEASE CHANNEL ch02;
RELEASE CHANNEL ch03;
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SEND 'NB_ORA_CLIENT=PRIM1,NB_ORA_SERV=EMMSRV,NB_ORA_POLICY=PRIM1-EML, \
NB_ORA_SCHED=Default-Application-Backup';
BACKUP
    FORMAT 'STBYCTLFILE-_%d_u%u_s%s_p%p_t%t'
    CURRENT CONTROLFILE FOR STANDBY;
RELEASE CHANNEL ch00;
}
Figure 9. RMAN Duplicate script

run {
# Auxiliary channels are the only way to restore a database as a duplicate
allocate auxiliary channel ch00 device type 'sbt_tape';
allocate auxiliary channel ch01 device type 'sbt_tape';
allocate auxiliary channel ch02 device type 'sbt_tape';
allocate auxiliary channel ch03 device type 'sbt_tape';
SEND 'NB_ORA_CLIENT=STBY1,NB_ORA_POLICY=STBY1-EML,NB_ORA_SERV=EMMSRV, \
NB_ORA_SCHED=Default-Application-Backup';
duplicate target database for standby;
release channel ch00;
release channel ch01;
release channel ch02;
release channel ch03;
}
Figure 10. NetBackup 6 Oracle template

#^oracle template configuration file <<MUST BE FIRST IN FILE, DO NOT REMOVE>>
# Template level: 1.9.0
# Generated on: 06/28/06 16:01:13
# -----------------------------------------------------------------
TEMPLATE_ID1=<SOURCE TEMPLATE>
TEMPLATE_ID2=<CURRENT TEMPLATE>
TEMPLATE_OWNER=root
RUN_AS_USER=oracle
# -----------------------------------------------------------------
# BACKUP_TYPE is derived from the schedule type when this script
# is used in a NetBackup scheduled backup. For example, when:
BACKUP_TYPE=INCREMENTAL LEVEL=0
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
ORACLE_SID=PRIM1
TARGETDB_LOGIN=sys
TARGETDB_PASSWD=<SHA128 Encoded Password>
TARGETDB_TNSNAME=PRIM1
# -----------------------------------------------------------------
# RMAN command section
# -----------------------------------------------------------------
RUN {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SEND 'NB_ORA_CLIENT=Client1,NB_ORA_POLICY=Oracle-Policy, \
NB_ORA_SERV=EmmServer,NB_ORA_SCHED=Default-Application-Backup';
BACKUP
    INCREMENTAL LEVEL=0
    FILESPERSET 1
    MAXOPENFILES 8
    FORMAT 'bk_u%u_s%s_p%p_t%t'
    DATABASE;
RELEASE CHANNEL ch00;
# Backup Archived Logs
sql 'alter system archive log current';
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SEND 'NB_ORA_CLIENT=Client1,NB_ORA_POLICY=Oracle-Policy, \
NB_ORA_SERV=EmmServer,NB_ORA_SCHED=Default-Application-Backup';
BACKUP
    FORMAT 'arch-s%s-p%p-t%t'
    ARCHIVELOG ALL DELETE INPUT;
RELEASE CHANNEL ch00;
# Control file backup
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SEND 'NB_ORA_CLIENT=Client1,NB_ORA_POLICY=Oracle-Policy, \
NB_ORA_SERV=EmmServer,NB_ORA_SCHED=Default-Application-Backup';
BACKUP
    FORMAT 'bk_u%u_s%s_p%p_t%t'
    CURRENT CONTROLFILE;
RELEASE CHANNEL ch00;
}
Figure 14. Stinit.def configurations for the EML and VLS

# HP Ultrium 960 LTO-3 devices on the EML E-Series 103e
manufacturer="HP" model="Ultrium 3-SCSI" revision="L29S" {
scsi2logical=1 # Common definitions for all modes
can-bsr drive-buffering can-partitions auto-lock buffer-writes async-writes read-ahead compression
timeout=800
long-timeout=14400
mode1 blocksize=0 density=0x00
}

# HP Ultrium 960 LTO-3 devices emulated on the VLS 6510
manufacturer="HP" model="Ultrium 3-SCSI" revision="R138" {
scsi2logical=1 # Common definitions for all modes
can-bsr drive-buffering can-partitions auto-lock buffer-writes async-writes read-ahead
timeout=800
long-timeout=14400
mode1 blocksize=0 density=0x00 compression=0
}
Server OS hangs/crashes
Issue: System hangs under high load.
Resolution: Upgrade from Red Hat AS4 Update 1 to Update 3.
Imbalanced backups
Issue: Allocated channels do not back up data evenly, resulting in an overall decrease in backup performance.
Resolution: Use the FILESPERSET, DISKRATIO, MAXSETSIZE, or MAXPIECESIZE arguments to create more balanced backup sets.
RAC issues
Issue: OCFS2 timeouts under load.
Resolution: Set default timeout value greater than seven.
Oracle
Backup and Recovery Best Practices Guide
http://www.oracle.com/technology/deploy/availability/pdf/S942_Chien.doc.pdf
Backup and Restore Overview
http://www.oracle.com/technology/deploy/availability/htdocs/BR_Overview.htm
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Intel, Xeon, and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. Java is a US trademark of Sun Microsystems, Inc. Oracle is a registered US trademark of Oracle Corporation, Redwood City, California. 4AA0-8102ENW, October 2006