
SEOUC 2011 November 4th, 2011

How to Solve the Wrong Problem


(DFS Lock Handle)

Romeo Vasileniuc BB&T Specialized Lending

About Romeo..

I have worked with Oracle databases since 1994
I've been involved in nearly all aspects of Oracle database technologies, including RAC, ASM, Data Guard and Streams
Designed and implemented many varieties of highly available database environments using RAC on ASM, OCFS2 and Tru64 CFS
I enjoy performance tuning
Proficient Perl developer
In my current role as data warehouse architect with BB&T, I have architected and implemented many business-driven solutions using Oracle and other vendor products to meet critical business needs, especially in the warehousing area
Oracle Certified Master, OCP, OCE

Agenda

DWH Story
DFS Lock Handle
Oracle DB Storage Overview
Direct I/O
Sync/Async Mode
Supported Platforms
System Monitoring
Oracle testing tools: Orion
Q&A

DWH Story
[Diagram 1: TDWH Service and DWH Service, databases TDWH01 and DWH01. Labels: Tru64 10.2.0.4, RHEL5 10.2.0.4, TTS, Storage Vendor A, Storage Vendor B, 9.2.0.3 >> 10.2.0.4, 10.2.0.4 >> 10.2.0.5]

[Diagram 2: TDWH Service and DWH Service, databases TDWH01 and DWH01. Labels: Storage Vendor A with Tru64 9.2.0.3, Storage Vendor B with RHEL5 10.2.0.5]

DFS Lock Handle

Parameter name    Description

name
The name or "type" of the enqueue or global lock can be determined by looking at the two high order bytes of P1 or P1RAW. The name is always two characters. Use the following SQL statement to retrieve the lock name.
select chr(bitand(p1,-16777216)/16777215)||
       chr(bitand(p1,16711680)/65535) "Lock"
  from v$session_wait
 where event = 'DFS lock handle';

mode
The mode is usually stored in the low order bytes of P1 or P1RAW and indicates the mode of the enqueue or global lock request.
select chr(bitand(p1,-16777216)/16777215)||
       chr(bitand(p1,16711680)/65535) "Lock",
       bitand(p1,65535) "Mode"
  from v$session_wait
 where event = 'DFS lock handle';

id1
The first identifier (id1) of the enqueue or global lock takes its value from P2 or P2RAW. The meaning of the identifier depends on the name (P1).

id2
The second identifier (id2) of the enqueue or global lock takes its value from P3 or P3RAW. The meaning of the identifier depends on the name (P1).
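The P1 layout described on this slide (lock name in the two high order bytes, mode in the low order bytes) can also be decoded outside the database. The following is a minimal Python sketch; the function name decode_p1 and the sample P1 value are illustrative assumptions, not taken from a real trace.

```python
def decode_p1(p1):
    """Split the P1 value of a 'DFS lock handle' wait into (lock name, mode)."""
    p1 &= 0xFFFFFFFF                 # treat P1 as an unsigned 32-bit value
    first = chr((p1 >> 24) & 0xFF)   # highest-order byte: first character
    second = chr((p1 >> 16) & 0xFF)  # next byte: second character
    mode = p1 & 0xFFFF               # low-order two bytes: requested mode
    return first + second, mode


# Hypothetical P1 for an "IV" (invalidation) lock requested in mode 5
print(decode_p1(0x49560005))
```

Shifting and masking avoids the rounding that the SQL version relies on when it divides by 16777215 and 65535.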

DFS Lock Handle 10.2.0.4

Bug 8215444
Sessions may hang in a RAC shared server environment while waiting for invalidation locks to be acquired.
This is most likely to occur when using Shared Servers with RAC; it should be very rare with dedicated server connections.
Rediscovery Notes: In a RAC environment, sessions wait for "DFS lock handle" with the lock being waited on being an invalidation lock (type "IV")
Fixed in: 10.2.0.5, 11.2.0.2, 12.1 (future release)

File System Characteristics

Performance is not the most important point
Oracle does not support files on file systems that do not have a write-through-cache capability
The file system must acknowledge the write operations (Standard NFS UDP / Network Appliance modified NFS)
Security requirements: data files should be accessible only to the database owner
Journaling file systems: changes are recorded in a journal file, so fsck runs more quickly than on non-journaled file systems

Supported File systems

Raw Partitions
Raw reads and writes do not use the OS buffer cache
Can move larger buffers than file system I/O
Requires more experienced administration

Ext2, ext3, ext4
Ext3 is a journaled file system
Ext4 supports larger file systems and offers better performance

ReiserFS
Default file system for Novell/SuSE Linux

Oracle Cluster File System (OCFS/OCFS2)
Designed for use with RAC
Performance 2-5% slower than raw devices
OCFS2 is an extent-based, POSIX-compliant file system
General purpose (shared Oracle home installation)

Recommended File Systems

Single Node
Any file systems supported by the Linux vendor

Multi-node (RAC)
RAW
OCFS/OCFS2
Redhat Global File System (GFS), see Document 329530.1
NFS-based storage systems (e.g. NetApp, EMC)
ASM

Oracle Memory/Disk Workflow

[Diagram: User Process, Buffer Cache (CBC), Redo Log Buffer, Redo Log Writer, DB Writer, Storage]

Linux Write Operations Ext3

Oracle process:
fd1 = open("/system01.dbf", O_SYNC|..)
ssize_t write(fd1, const void *buf, size_t count)

[Diagram: Buffer Cache (user space), kernel switch, Page Cache (kernel space), I/O, Disk]

Linux Read Operations Ext3

Oracle process:
fd1 = open("/system01.dbf", O_SYNC|..);
ssize_t read(fd1, void *buf, size_t count)

[Diagram: Disk, I/O, Page Cache (kernel space), kernel switch, Buffer Cache (user space)]

Linux Write Operations RAW, OCFS

Oracle process:
fd1 = open("/system01.dbf", O_SYNC|O_DIRECT|..);
ssize_t write(fd1, const void *buf, size_t count)

[Diagram: Buffer Cache (user space), kernel switch, I/O, Disk; no Page Cache in the path]

Linux Read Operations RAW, OCFS

Oracle process:
fd1 = open("/system01.dbf", O_SYNC|O_DIRECT|..);
ssize_t read(fd1, void *buf, size_t count)

[Diagram: Disk, I/O, kernel switch, Buffer Cache (user space); no Page Cache in the path]

Linux System Call : Open(..)


int open(const char *pathname, int flags);

O_APPEND: The file is opened in append mode.
O_ASYNC: Enable signal-driven I/O.
O_CLOEXEC (since Linux 2.6.23): Enable the close-on-exec flag for the new file descriptor.
O_CREAT: If the file does not exist it will be created.
O_DIRECT (since Linux 2.4.10): Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.
O_SYNC: The file is opened for synchronous I/O. Any writes on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware.
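To make the O_SYNC and O_DIRECT combination concrete, here is a minimal Python sketch of a single block write. The function name write_block and the 4096-byte block size are assumptions for illustration; the alignment a device really needs may differ. Because O_DIRECT transfers data straight from the user buffer, the buffer comes from an anonymous mmap, which is page-aligned, and the sketch falls back to plain O_SYNC when the file system rejects O_DIRECT with EINVAL.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def write_block(path, payload):
    """Write one zero-padded block, preferring O_SYNC|O_DIRECT.

    Returns (bytes_written, used_direct).
    """
    base = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    direct = getattr(os, "O_DIRECT", 0)   # 0 on platforms without the flag
    buf = mmap.mmap(-1, BLOCK)            # anonymous mmap is page-aligned
    buf[:len(payload)] = payload
    try:
        for flags in ([base | direct] if direct else []) + [base]:
            try:
                fd = os.open(path, flags, 0o600)
            except OSError as exc:
                if exc.errno == errno.EINVAL and flags != base:
                    continue              # file system rejects O_DIRECT
                raise
            try:
                return os.write(fd, buf), flags != base
            except OSError as exc:
                if exc.errno != errno.EINVAL or flags == base:
                    raise                 # only retry an O_DIRECT rejection
            finally:
                os.close(fd)
    finally:
        buf.close()
```

Calling write_block("/some/path/block.dat", b"data") returns the byte count and whether O_DIRECT was actually used, which is a quick way to see how a given mount point behaves.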


AIO Overview

Standard feature of 2.6 kernels (patches for 2.4)
Initiate a number of I/O operations without having to block or wait for any to complete
Later, or after being notified of I/O completion, the process can retrieve the results of the I/O
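The submit-now, collect-later pattern can be imitated in Python with a thread pool. This is an analogy under stated assumptions, not kernel AIO: Python's standard library does not wrap aio_read() or io_submit(), so each read below is an ordinary os.pread() running in a worker thread. The function name read_chunks and the pool size are illustrative.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunks(path, offsets, size):
    """Issue all reads up front, then collect the results afterwards."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=4) as pool:
            # "submit" phase: none of these calls waits for the I/O
            pending = [pool.submit(os.pread, fd, size, off) for off in offsets]
            # ... the caller could do CPU work here ...
            # "retrieve" phase: result() blocks only if that read is unfinished
            return [f.result() for f in pending]
    finally:
        os.close(fd)
```

For example, read_chunks(path, [0, 8192, 16384], 8192) requests three 8 KB reads at once and returns the three byte strings in submission order.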


I/O models

              Blocking                         Non-blocking
Synchronous   Read/Write                       Read/Write (O_NONBLOCK)
Asynchronous  I/O multiplexing (select/poll)   AIO

Synchronous blocking I/O model

Synchronous non-blocking I/O model

Asynchronous blocking I/O model

Asynchronous non-blocking I/O model (AIO)

AIO interface APIs

API function   Description
aio_read       Request an asynchronous read operation
aio_error      Check the status of an asynchronous request
aio_return     Get the return status of a completed asynchronous request
aio_write      Request an asynchronous write operation
aio_suspend    Suspend the calling process until one or more asynchronous requests have completed (or failed)
aio_cancel     Cancel an asynchronous I/O request
lio_listio     Initiate a list of I/O operations

Example AIO Usage : AIOCB Structure


#include <aio.h>

int aio_read(struct aiocb *aiocbp);

struct aiocb {
    /* The order of these fields is implementation-dependent */
    int             aio_fildes;     /* File descriptor */
    off_t           aio_offset;     /* File offset */
    volatile void  *aio_buf;        /* Location of buffer */
    size_t          aio_nbytes;     /* Length of transfer */
    int             aio_reqprio;    /* Request priority */
    struct sigevent aio_sigevent;   /* Notification method */
    int             aio_lio_opcode; /* Operation to be performed; lio_listio() only */

    /* Various implementation-internal fields not shown */
};

Example AIO Usage


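#include <aio.h>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <unistd.h>
using namespace std;

const size_t SIZE_TO_READ = 4096;  // O_DIRECT needs a block-aligned length

int main() {
    // open the file
    int file = open("/tmp/file.txt", O_RDONLY | O_DIRECT | O_SYNC, 0);
    if (file == -1) {
        cout << "Unable to open file!" << endl;
        return 1;
    }

    // create the buffer (O_DIRECT needs an aligned buffer, so no new char[])
    void* buffer = nullptr;
    if (posix_memalign(&buffer, 4096, SIZE_TO_READ) != 0) {
        close(file);
        return 1;
    }

    // create the control block structure
    aiocb cb;
    memset(&cb, 0, sizeof(aiocb));
    cb.aio_nbytes = SIZE_TO_READ;
    cb.aio_fildes = file;
    cb.aio_offset = 0;
    cb.aio_buf = buffer;

    // read!
    if (aio_read(&cb) == -1) {
        cout << "Unable to create request!" << endl;
        free(buffer);
        close(file);
        return 1;
    }
    cout << "Request enqueued!" << endl;

    // wait until the request has finished
    while (aio_error(&cb) == EINPROGRESS) {
        cout << "Working..." << endl;
    }

    // success?
    int numBytes = aio_return(&cb);
    if (numBytes != -1)
        cout << "Success!" << endl;
    else
        cout << "Error!" << endl;

    free(buffer);
    close(file);
    return 0;
}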

FILESYSTEMIO_OPTIONS

Value      AIO   DIO   Description
asynch     Yes   No    This allows asynchronous IO to be used where supported by the OS.
directIO   No    Yes   This allows directIO to be used where supported by the OS. Direct IO bypasses any Unix buffer cache. As of 10.2 most platforms will try to use the "directio" option for NFS mounted disks (and will also check NFS attributes are sensible).
setall     Yes   Yes   Enables both ASYNC and DIRECT IO.
none       No    No    This disables ASYNC IO and DIRECT IO so that Oracle uses normal synchronous writes, without any direct io options.
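The four values reduce to two independent switches for file-system files. A minimal Python sketch of that mapping follows; the dictionary and the function name io_modes are illustrative helpers, not an Oracle interface.

```python
# (async I/O enabled, direct I/O enabled) for each FILESYSTEMIO_OPTIONS value
FILESYSTEMIO_OPTIONS = {
    "asynch":   (True, False),
    "directio": (False, True),
    "setall":   (True, True),
    "none":     (False, False),
}


def io_modes(value):
    """Return (aio, dio) for a FILESYSTEMIO_OPTIONS setting, case-insensitively."""
    return FILESYSTEMIO_OPTIONS[value.strip().lower()]


for name in ("asynch", "directIO", "setall", "none"):
    aio, dio = io_modes(name)
    print(f"{name:9} AIO={aio!s:5} DIO={dio}")
```

An unknown value raises KeyError, which is usually what you want when validating a parameter file.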


FILESYSTEMIO_OPTIONS - ASM

Value      AIO
asynch     DISK_ASYNCH_IO {TRUE|FALSE}
directIO   DISK_ASYNCH_IO {TRUE|FALSE}
setall     DISK_ASYNCH_IO {TRUE|FALSE}
none       DISK_ASYNCH_IO {TRUE|FALSE}

The value descriptions are the same as for FILESYSTEMIO_OPTIONS on file systems. With ASM, asynchronous I/O follows DISK_ASYNCH_IO for every value.

DISK_ASYNCH_IO

Controls whether I/O to datafiles, control files and logfiles is asynchronous (that is, whether parallel server processes can overlap I/O requests with CPU processing during table scans).
Default value: TRUE
Set to FALSE to disable asynchronous I/O
If you set DISK_ASYNCH_IO to FALSE, then you should set DBWR_IO_SLAVES to a value other than its default (0) in order to simulate asynchronous I/O
If DBWR_IO_SLAVES > 0, the number of I/O server processes used by ARCH and LGWR is set to 4. The number used by RMAN is set to 4 only if asynchronous I/O is disabled

How to check the usage at OS Level

If I/O async is enabled:


[oracle@dwh01 ~]$ sudo cat /proc/slabinfo | grep kio
kioctx  89 144 320 12 1 : tunables  54 27 8 : slabdata 12 12 0
kiocb   47 240 256 15 1 : tunables 120 60 8 : slabdata 13 16 0

If I/O async is disabled:

[oracle@lnx6 ~]$ sudo cat /proc/slabinfo | grep kio
kioctx   0   0 320 12 1 : tunables  54 27 8 : slabdata  0  0 0
kiocb    0   0 256 15 1 : tunables 120 60 8 : slabdata  0  0 0
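The same check can be scripted. The sketch below is a minimal Python version that treats a non-zero active object count (the second column) for kioctx or kiocb as evidence of async I/O; the function name aio_in_use is an assumption, and the sample text is the slabinfo output from this slide. It only applies to kernels that expose these two slab caches.

```python
def aio_in_use(slabinfo_text):
    """Return True if the kioctx or kiocb slab caches hold active objects."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # slabinfo columns: name, active_objs, num_objs, objsize, ...
        if len(fields) >= 2 and fields[0] in ("kioctx", "kiocb"):
            if int(fields[1]) > 0:
                return True
    return False


enabled = """kioctx 89 144 320 12 1 : tunables 54 27 8 : slabdata 12 12 0
kiocb 47 240 256 15 1 : tunables 120 60 8 : slabdata 13 16 0"""
disabled = """kioctx 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0"""
print(aio_in_use(enabled), aio_in_use(disabled))
```

On a live system the text would come from reading /proc/slabinfo, which normally requires root.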


How to check the usage at DB Level

If I/O async is enabled:


[oracle@a01 ~]$ ps -eaf | grep lgw
oracle 13924 1 0 Oct12 ? 00:19:18 ora_lgwr_DWH1

[oracle@a01 ~]$ strace -p 13924
io_submit(47566907363328, 2, {{0x2b4309936b70, 0, 1, 0, 41}, {0x2b4309936d38, 0, 1, 0, 42}}) = 2
io_getevents(47566907363328, 1, 1024, {{0x2b4309936b70, 0x2b4309936b70, 512, 0}, {0x2b4309936d38, 0x2b4309936d38, 512, 0}}, {600, 0}) = 2

If I/O async is disabled:

[oracle@b01 ~]$ ps -eaf | grep lgwr
oracle 28400 1 0 Feb22 ? 05:41:21 ora_lgwr_PMT1

[oracle@b01 ~]$ strace -p 28400
..
pwrite64(20, "\1\"\0\0\4)\0\0p\317\10\0\20\200..\213k\r\1\0\0042"..., 512, 5376000) = 512
pwrite64(21, "\1\"\0\0\4)\0\0p\317\10\0\20\200..\213k\r\1\0\0042"..., 512, 5376000) = 512
..

How to check the usage at DB Level - ASM

Async enabled by default on >=10g and ASM
You can disable it using DISK_ASYNCH_IO=false

[oracle@c01 ~]$ strace -p 13260
...
open(0xfe2c85f0, O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0660) = 8
writev(8, [?] 0xffffb608, 2) = 80
...
read(16, "MSA\0\2\0\10\0P\0\0\0\222\377\377\377@\313\373\5\0\0\0"..., 80) = 80
...

Oracle Linux, Filesystem & I/O Type Supportability (Doc ID 279069.1)

10g Async I/O ext3/ext4 raw block (4) ASM OCFS2 NFS GFS GFS2 Yes (8) Depr. (2) (3) Yes Yes Yes Yes (5) Yes (6) Yes

10g Direct I/O Yes (8) Depr. (2) Yes (3) Yes Yes (7) Yes

11g - Async 11g I/O Direct I/O Yes (8) Depr. (2) (3) Yes Yes Yes Yes (7) (5) Yes (6) Yes Yes (8) Depr. (2) Yes (3) Yes Yes Yes


System Monitoring - LGWR


[oracle@dwh1 ~]$ ps -eaf | grep lgwr
oracle 28878 1 0 11:47 ? 00:00:02 ora_lgwr_test1

[oracle@dwh1 ~]$ strace -cp 28878
Process 28878 attached - interrupt to quit
Process 28878 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.68    0.076143          43      1791           pwrite
  5.67    0.004709          16       290           pread
  2.59    0.002152          10       212         7 semtimedop
  0.06    0.000053           0      5945           times
  0.00    0.000000           0         1           read
  0.00    0.000000           0        36           open
  0.00    0.000000           0        36           close
  0.00    0.000000           0        21           stat
  0.00    0.000000           0        35           writev
  0.00    0.000000           0        10           sendto
  0.00    0.000000           0       123           kill
  0.00    0.000000           0        67           semctl
  0.00    0.000000           0        10           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    0.083057                  8577         7 total

System Monitoring - DBWR


[oracle@dwh1 ~]$ ps -eaf | grep dbw
oracle 28876 1 0 11:47 ? 00:00:03 ora_dbw0_test1

[oracle@dwh1 ~]$ strace -cp 28876
Process 28876 attached - interrupt to quit
Process 28876 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 89.77    0.019979          54       370           pwrite
  5.67    0.001262           0     70462           getrusage
  4.49    0.000999          42        24         9 semtimedop
  0.07    0.000015           0       817           times
  0.00    0.000000           0         2           mmap
  0.00    0.000000           0        21           semctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.022255                 71696         9 total

System Monitoring LGWR - ASM


[oracle@pd3 ~]$ ps -eaf | grep lgwr
oracle 30809 1 0 13:37 ? 00:00:00 ora_lgwr_test1

[oracle@pd3 ~]$ strace -cp 30809
Process 30809 attached - interrupt to quit
Process 30809 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.97    0.034288          91       377           io_submit
 13.77    0.007155           4      1839           io_getevents
 11.15    0.005794          20       295           pread
  8.04    0.004178          13       314         7 semtimedop
  0.82    0.000428          31        14           pwrite
  0.23    0.000118           0      5475           times
  0.02    0.000011           0       248           kill
  0.00    0.000000           0        32           open
  0.00    0.000000           0        32           close
  0.00    0.000000           0        21           stat
  0.00    0.000000           0        32           writev
  0.00    0.000000           0         6           sendto
  0.00    0.000000           0        42           semctl
  0.00    0.000000           0         8           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    0.051972                  8735         7 total

System Monitoring DBWR - ASM


[oracle@pd3 ~]$ ps -eaf | grep dbw
oracle 30807 1 0 13:37 ? 00:00:00 ora_dbw0_test1

[oracle@pd3 ~]$ strace -cp 30807
Process 30807 attached - interrupt to quit
Process 30807 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.57    0.036811         449        82           io_submit
  5.32    0.002210           0     96686           getrusage
  4.36    0.001814           4       451           io_getevents
  1.11    0.000463           1       416           mmap
  0.57    0.000238           0      6008           kill
  0.06    0.000024           0      3488           times
  0.00    0.000000           0         2           semop
  0.00    0.000000           0        96           semctl
  0.00    0.000000           0        22         8 semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.041560                107251         8 total

ORION (Oracle I/O Calibration Tool)

Standalone tool for calibrating the I/O performance of storage systems that are intended to be used for Oracle databases
No need to create and run an Oracle DB
May be configured to simulate OLTP or DWH environments

Orion Configuration
[oracle@pd3 ~]$ cat dwh_dg1.lun
/dev/mapper/mpath7p1
/dev/mapper/mpath8p1
/dev/mapper/mpath9p1
/dev/mapper/mpath14p1

Orion Configuration
[oracle@dwh01 orion]$ cat orion-test1.sh
#!/bin/bash
HOST=`hostname -s`
DG=$1
TEST="${HOST}-${DG}"
cd ${ROOT}
echo "Hostname: ${HOST}"
echo " dg: ${DG}"
echo " test: ${TEST}"
sudo ./orion_linux_x86-64 -run advanced \
    -testname ${TEST} \
    -matrix point \
    -num_small 0 \
    -num_large 4 \
    -size_large 1024 \
    -num_disks 4 \
    -type seq \
    -num_streamIO 4 \
    -simulate raid0 \
    -cache_size 0 \
    -stripe 1024 \
    -write 50 \
    -verbose

Orion Run : summary.txt


ORION VERSION 11.1.0.7.0

Commandline:
-run simple -testname dg1 -num_disks 4

This maps to this test:
Test: dg1
Small IO size: 8 KB
Large IO size: 1024 KB
IO Types: Small Random IOs, Large Random IOs
Simulated Array Type: CONCAT
Write: 0%
Cache Size: Not Entered
Duration for each Data Point: 60 seconds
Small Columns:, 0
Large Columns:, 0, 1, 2, 3, 4, 5, 6, 7,
Total Data Points: 29

Name: /dev/mapper/mpath7p1    Size: 1099522496512
Name: /dev/mapper/mpath8p1    Size: 1099522496512
Name: /dev/mapper/mpath9p1    Size: 1099522496512
Name: /dev/mapper/mpath14p1   Size: 1099522496512
4 FILEs found.

Maximum Large MBPS=330.91 @ Small=0 and Large=8
Maximum Small IOPS=8856 @ Small=20 and Large=0
Minimum Small Latency=0.85 @ Small=1 and Large=0

Implementation Tips..

Test, Test, Test
Make friends in the Storage/System Administration Groups
Be aware of any existing bugs/limitations:
Data Pump Export (EXPDP) Received Error ORA-31641 (Doc ID 1330406.1)
Setting filesystemio_options=DIRECTIO/SETALL and using expdp to export tables to a file system (tmpfs) which does not support O_DIRECT.
alter system set filesystemio_options=none scope=spfile;
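Before pointing a directory object at a mount point, it can help to probe whether that file system accepts O_DIRECT at all. The following is a minimal Python sketch; the function name supports_o_direct is an assumption, and the probe only tests open(), so it is a first check and no substitute for testing the export itself.

```python
import errno
import os
import tempfile


def supports_o_direct(directory):
    """Return True if a file in `directory` can be opened with O_DIRECT."""
    if not hasattr(os, "O_DIRECT"):
        return False                      # platform has no O_DIRECT flag
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        probe = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False                  # e.g. tmpfs, as in the tip above
        raise
    else:
        os.close(probe)
        return True
    finally:
        os.unlink(path)                   # always remove the probe file
```

A False result for the export directory suggests either choosing another directory or leaving filesystemio_options at none.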


References

Metalink
Oracle Linux, Filesystem & I/O Type Supportability (Doc ID 279069.1)
ASM Inherently Performs Asynchronous I/O Regardless of filesystemio_options Parameter (Doc ID 751463.1)
How To Check if Asynchronous I/O is Working On Linux (Doc ID 237299.1)
Supported and Recommended File Systems on Linux (Doc ID 236826.1)
Comparing Performance Between RAW IO vs OCFS vs EXT 2/3 (Doc ID 236679.1)
Using Redhat Global File System (GFS) as shared storage for RAC (Doc ID 329530.1)
Asynchronous I/O (aio) on RedHat Advanced Server 2.1 and RedHat Enterprise Linux 3 (Doc ID 225751.1)
Asynchronous I/O Support on OCFS/OCFS2 and Related Settings: filesystemio_options, disk_asynch_io (Doc ID 432854.1)

Other Sites
M. Tim Jones, Boost application performance using asynchronous I/O (http://www.ibm.com/developerworks/linux/library/l-async/index.html)

Questions & Contact Info

http://blog.romeosoft.com/
romeo@romeosoft.com
