
Page Printed: September 19, 2011 10:11:46 AM GMT+05:30

SR 3-4526370651: Database hangs completely

Information

Request Number:        3-4526370651
Primary Contact:       Vinay Kashyap
Status:                Customer Working
Support ID:            15917149
Opened:                September 16, 2011 5:50:43 PM GMT+05:30
Last Update:           September 17, 2011 10:52:56 AM GMT+05:30
Product:               Oracle Server Enterprise Edition
Product Version:       10.2.0.4
Platform:              IBM AIX on POWER Systems (64-bit)
Filed By:              VINAY.KASHYAP@KOTAK.COM
Primary Contact Phone: +919820882572
Severity:              1
Legacy SR Number:

History
Oracle Support - September 17, 2011 10:52:55 AM GMT+05:30 [Notes]
Generic Note
-----------------------
Update
=======
Spoke to the customer's manager, Laxman (9873919042), and discussed the issue.
According to Laxman, distributed transaction recovery was in progress at the time of the issue.
Informed him that the following will help the investigation:
+ Follow the action plan given in the previous update for what to collect from the database
when the hang recurs.
+ Collect OS information such as memory, I/O, and CPU usage during the problem time,
so it can be reviewed for any OS resource shortage (memory, I/O, swap space, etc.).
+ Keep track of the nature of the activity in the database just before the hang occurs.

Thank You
Karunakar
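
One way to gather the OS figures requested in this update is a small capture script started as soon as the hang is noticed. The sketch below is an illustration only, not part of the support action plan: the function name `collect_os_stats`, the output directory, and the sampling values are hypothetical. `vmstat` and `iostat` are widely available; `lsps -a` (paging space) and `svmon -G` (memory) are AIX commands. Any tool not found on the host is skipped.

```shell
#!/bin/sh
# Hypothetical helper: capture OS resource usage while the database is hung.
# Usage: collect_os_stats OUTDIR [INTERVAL_SECONDS] [SAMPLE_COUNT]
collect_os_stats() {
    outdir=$1
    interval=${2:-5}
    count=${3:-12}
    stamp=$(date +%Y%m%d_%H%M%S)

    mkdir -p "$outdir" || return 1

    # Each entry is "label:command line". vmstat and iostat sample repeatedly;
    # lsps and svmon (AIX only) take a single snapshot.
    for entry in \
        "vmstat:vmstat $interval $count" \
        "iostat:iostat $interval $count" \
        "lsps:lsps -a" \
        "svmon:svmon -G"
    do
        label=${entry%%:*}
        cmdline=${entry#*:}
        tool=${cmdline%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            # Run in the background so all tools sample the same time window.
            # Word splitting of $cmdline is intentional.
            $cmdline >"$outdir/${label}_$stamp.out" 2>&1 &
        else
            echo "skipping $tool: not found on this host" >&2
        fi
    done
    wait    # block until every background sampler has finished
    return 0
}
```

For example, `collect_os_stats /tmp/hang_os 5 60` would sample for about five minutes and leave one output file per tool in `/tmp/hang_os` (a hypothetical path).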
Oracle Support - September 17, 2011 10:13:08 AM GMT+05:30 [Notes]
Generic Note
-----------------------
Update
========
Spoke to the customer at 0120 4726187 and discussed the issue.
+ So far the database appears to hang for 10 to 15 minutes at a time.
+ During this time neither new nor existing connections work.
+ The hang cleared by itself without any user intervention,
and new and existing connections started working again.
+ The alert log contains job errors and ORA-60 errors.
+ According to the customer, CPU utilization was around 80 to 90% before the hang and
dropped to around 20% during the hang.
+ Advised the customer to collect the following details when the hang recurs, to
understand more about the cause:
Collect 2 sets of hanganalyze + 2 sets of system state dumps while the issue is occurring.
A) If you are able to connect to the DB during the hang, use these steps.
Open a new sqlplus session each time:
$ sqlplus /nolog
SQL> connect / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name
SQL> exit
... Wait for 1 minute
Open a new sqlplus session each time:
$ sqlplus /nolog
SQL> connect / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name
SQL> exit
... Wait for 1 minute
Open a new sqlplus session each time:
$ sqlplus /nolog
SQL> connect / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name
SQL> exit
... Wait for 1 minute
Open a new sqlplus session each time:
$ sqlplus /nolog
SQL> connect / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name
SQL> exit
B) If you are not able to connect to the DB during the hang, use these steps.
Note
=====
If you are unable to connect to the DB while the issue is occurring, use the following
method to collect the hanganalyze and system state dumps:
export ORACLE_SID=PROD ## Replace PROD with the SID you want to trace
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
exit
-- wait for 1 minute
export ORACLE_SID=PROD ## Replace PROD with the SID you want to trace
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
exit
-- wait for 1 minute
export ORACLE_SID=PROD ## Replace PROD with the SID you want to trace
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
exit
-- wait for 1 minute
export ORACLE_SID=PROD ## Replace PROD with the SID you want to trace
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
exit
+ Upload the AWR and ADDM reports at 1-hour intervals close to the problem time when the
issue recurs.
+ Please also check whether there are any unusual messages in the OS logs from the last hang.
Thank You
Karunakar
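
The alternating sequence in method B can be wrapped in a script so that nothing has to be typed during the hang. This is a minimal sketch under stated assumptions: the function names `oradebug_commands` and `collect_hang_dumps` are hypothetical, the dump levels (3 and 266) and the one-minute pause are taken from the update above, and `sqlplus` is assumed to be on the PATH of the Oracle software owner.

```shell
#!/bin/sh
# Hypothetical helper: print the oradebug commands for one dump.
# KIND is "hanganalyze" or "systemstate".
oradebug_commands() {
    case $1 in
        hanganalyze) dump="oradebug hanganalyze 3" ;;
        systemstate) dump="oradebug dump systemstate 266" ;;
        *) echo "unknown dump kind: $1" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "oradebug setmypid" \
        "oradebug unlimit" \
        "$dump" \
        "exit"
}

# Take 2 hanganalyze and 2 systemstate dumps, alternating, each in a fresh
# preliminary connection, pausing between dumps.
# Usage: collect_hang_dumps SID [PAUSE_SECONDS]
collect_hang_dumps() {
    ORACLE_SID=$1
    export ORACLE_SID
    pause=${2:-60}
    n=0
    for kind in hanganalyze systemstate hanganalyze systemstate; do
        n=$((n + 1))
        echo "dump $n of 4: $kind"
        oradebug_commands "$kind" | sqlplus -prelim "/ as sysdba"
        if [ "$n" -lt 4 ]; then
            sleep "$pause"
        fi
    done
}
```

Calling `collect_hang_dumps klife` would run the four dumps against the instance named in this SR; the trace files are written to the instance's dump destination.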
Oracle Support - September 17, 2011 9:43:35 AM GMT+05:30 [Notes]
Generic Note
-----------------------
Data collected
==================
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2/db
System name: AIX
Node name: MUM-BO-S022
Release: 3
Version: 5
Machine: 00CC80414C00
Instance name: klife
Redo thread mounted by this instance: 1
Oracle process number: 79

Unix process pid: 9601264, image: oracle@MUM-BO-S022 (J001)


*** 2011-09-16 17:16:03.531
*** ACTION NAME:() 2011-09-16 17:16:03.521
*** MODULE NAME:() 2011-09-16 17:16:03.521
*** SERVICE NAME:(SYS$USERS) 2011-09-16 17:16:03.521
*** CLIENT ID:() 2011-09-16 17:16:03.521
*** SESSION ID:(2472.1630) 2011-09-16 17:16:03.521.
.
.
.
*** ACTION NAME:() 2011-09-16 17:36:49.934
*** MODULE NAME:() 2011-09-16 17:36:49.934
*** SERVICE NAME:(SYS$USERS) 2011-09-16 17:36:49.934
*** CLIENT ID:() 2011-09-16 17:36:49.934
*** SESSION ID:(2295.711) 2011-09-16 17:36:49.934
*** 2011-09-16 17:36:49.934
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
*** 2011-09-16 17:51:25.176
*** ACTION NAME:() 2011-09-16 17:51:25.176
*** MODULE NAME:() 2011-09-16 17:51:25.176
*** SERVICE NAME:(SYS$USERS) 2011-09-16 17:51:25.176
*** CLIENT ID:() 2011-09-16 17:51:25.176
*** SESSION ID:(2516.3043) 2011-09-16 17:51:25.176
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00940010-00253d63        79    2516     X             79    2516           X
session 2516: DID 0001-004F-0000532C    session 2516: DID 0001-004F-0000532C
Rows waited on:
Session 2516: obj - rowid = 00010DCF - AAHM71AF3AAAAaQAAZ
(dictionary objn - 69071, file - 375, block - 1680, slot - 25)
Information on the OTHER waiting sessions:
End of information on OTHER waiting sessions.
Current SQL statement for this session:
UPDATE T_PROPOSAL_FINE_TUNE SET DT_PSM_ST_DT = :B1 + 1/86400,
I_PSM_ST_SER_NO=:B4 WHERE S_PSM_PROP_NO=:B2 AND S_RSM_ROLE_ID=:B3
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
7000005779b6880    58  BTSTARGET.TRG_PROPOSAL_FINE_TUNE
7000005795c09b0    17  BTSTARGET.TRG_PROPSTATUSUPD
700000577e2c548    88  procedure BTSTARGET.SP_KTRACK_STATUS_UPDATE
700000577e43b30    90  procedure BTSTARGET.SP_WF_STATUS_PULL
7000004d3ea2700     1  anonymous block
===================================================
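
The "Rows waited on" section of the trace gives the object both in hexadecimal (00010DCF) and as a dictionary object number (69071). A quick way to cross-check the two and to look up the object name is sketched below. The helper names `hex_to_dec` and `object_lookup_sql` are hypothetical; `DBA_OBJECTS` is the standard dictionary view.

```shell
#!/bin/sh
# Convert a hexadecimal value from the trace (without 0x prefix) to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# Hypothetical helper: print a dictionary query for a given object number.
object_lookup_sql() {
    printf "SELECT owner, object_name, object_type\n"
    printf "  FROM dba_objects\n"
    printf " WHERE object_id = %d;\n" "$1"
}

hex_to_dec 00010DCF        # prints 69071
object_lookup_sql 69071
```

The printed query can be run in SQL*Plus by a user with access to `DBA_OBJECTS`. The current SQL statement in the trace suggests the row belongs to T_PROPOSAL_FINE_TUNE, but that is an inference; the query would confirm it.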
VINAY.KASHYAP@KOTAK.COM - September 17, 2011 9:08:54 AM GMT+05:30 [Update from Customer]
We have uploaded the same.
Oracle Support - September 16, 2011 9:44:15 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
1. I need the deadlock trace files.
2. For this error:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
It looks like a DBMS_JOB or Scheduler job is trying to run in the database.
Please follow Note 744645.1 below to find the job name, and let us know your findings.
Note 744645.1 How to find the job name if a scheduled job fails with ORA-12012
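
As a starting point for identifying job 43086, the sketch below prints two dictionary queries. It assumes, without access to the note's text, that the number is either a DBMS_JOB queue entry (visible in `DBA_JOBS`) or the object ID of a Scheduler job (visible in `DBA_OBJECTS`). The helper name `job_lookup_sql` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print queries that look up a failing job number in
# both the DBMS_JOB queue and the data dictionary.
job_lookup_sql() {
    job=$1
    cat <<EOF
-- DBMS_JOB entry, if the number is a job queue job
SELECT job, log_user, priv_user, schema_user, broken, failures, what
  FROM dba_jobs
 WHERE job = $job;

-- Scheduler job, if the number is an object ID
SELECT owner, object_name, object_type
  FROM dba_objects
 WHERE object_id = $job;
EOF
}

job_lookup_sql 43086
```

Whichever query returns a row points to the owner and the code being run, which is where the missing privilege behind ORA-01031 would need to be checked.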
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 9:41:45 PM GMT+05:30 [Update from Customer]
We have uploaded what you asked for, but it seems that the database hangs while recovery of a
failed distributed transaction is in progress.
Oracle Support - September 16, 2011 9:36:37 PM GMT+05:30 [Notes]
Generic Note
-----------------------
Reviewing files
Oracle Support - September 16, 2011 9:27:11 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
Please upload the requested files; I cannot troubleshoot with only the alert.log.
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j002_8102088.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
Fri Sep 16 17:36:49 2011
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
Fri Sep 16 17:51:25 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.

To find a root cause I need the following:

1. Please get systemstates from the database while the hang is occurring:
% sqlplus "/ as sysdba"
SQL> ALTER SESSION SET MAX_DUMP_FILE_SIZE=UNLIMITED;
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> exit
The trace file will be written to the user_dump_dest directory.

2. Can you please upload the deadlock trace files:

Fri Sep 16 18:36:39 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j004_6152346.trc.
Thu Sep 15 22:43:44 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j003_8507528.trc.
Also upload AWR and ASH reports for the deadlock times.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 9:24:52 PM GMT+05:30 [Update from Customer]
Dear team,
Please look at this and advise us.

Fri Sep 16 16:50:40 2011
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
Fri Sep 16 16:50:40 2011
ALTER DATABASE MOUNT
Fri Sep 16 16:50:44 2011
Setting recovery target incarnation to 3
Fri Sep 16 16:50:45 2011
Successful mount of redo thread 1, with mount id 801788448
Fri Sep 16 16:50:45 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Fri Sep 16 16:50:45 2011
ALTER DATABASE OPEN
Fri Sep 16 16:50:45 2011
Beginning crash recovery of 1 threads
parallel recovery started with 11 processes
Fri Sep 16 16:50:46 2011
Started redo scan
Fri Sep 16 16:50:48 2011
Completed redo scan
206145 redo blocks read, 8401 data blocks need recovery
Fri Sep 16 16:50:49 2011
Started redo application at
Thread 1: logseq 88608, block 6272081
Fri Sep 16 16:50:49 2011
Recovery of Online Redo Log: Thread 1 Group 15 Seq 88608 Reading mem 0
Mem# 0: /data1/oradata/klife/redo15.log
Fri Sep 16 16:50:50 2011
Completed redo application
Fri Sep 16 16:50:51 2011
Completed crash recovery at
Thread 1: logseq 88608, block 6478226, scn 171648608910
8401 data blocks read, 5720 data blocks written, 206145 redo blocks read
Fri Sep 16 16:50:52 2011
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=29, OS id=9134230
ARC1 started with pid=30, OS id=9097380
Fri Sep 16 16:50:53 2011
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Fri Sep 16 16:50:53 2011
Thread 1 advanced to log sequence 88609 (thread open)
Thread 1 opened at log sequence 88609
Current log# 16 seq# 88609 mem# 0: /data1/oradata/klife/redo16.log
Successful open of redo thread 1
Fri Sep 16 16:50:53 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Sep 16 16:50:53 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Sep 16 16:50:53 2011
ARC1: Becoming the heartbeat ARCH
Fri Sep 16 16:50:53 2011
SMON: enabling cache recovery
Fri Sep 16 16:50:56 2011
Successfully onlined Undo Tablespace 90.
Fri Sep 16 16:50:56 2011
SMON: enabling tx recovery
Fri Sep 16 16:50:56 2011
Database Characterset is WE8ISO8859P1
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 12
Fri Sep 16 16:50:57 2011
SMON: Parallel transaction recovery tried
Fri Sep 16 16:50:57 2011
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=45, OS id=8093908
Fri Sep 16 16:51:02 2011
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Sep 16 16:51:02 2011
Completed: ALTER DATABASE OPEN
Fri Sep 16 17:35:39 2011
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j002_8102088.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
Fri Sep 16 17:36:49 2011
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
Fri Sep 16 17:51:25 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.
Fri Sep 16 17:51:31 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.
Fri Sep 16 17:51:34 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.
Fri Sep 16 17:51:37 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.
Fri Sep 16 17:51:40 2011
ORA-00060: Deadlock detec
Oracle Support - September 16, 2011 8:33:40 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
To find a root cause I need the following:
1. Please get systemstates from the database while the hang is occurring:
% sqlplus "/ as sysdba"
SQL> ALTER SESSION SET MAX_DUMP_FILE_SIZE=UNLIMITED;
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
(wait 60 seconds)
SQL> exit
The trace file will be written to the user_dump_dest directory.

2. Can you please upload the deadlock trace files:

Fri Sep 16 18:36:39 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j004_6152346.trc.
Thu Sep 15 22:43:44 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j003_8507528.trc.
Also upload AWR and ASH reports for the deadlock times.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 8:30:41 PM GMT+05:30 [Update from Customer]
Dear team, we are facing the same situation again; our database is hanging once more.
Please advise, as we are still unable to find the reason.
Oracle Support - September 16, 2011 8:08:58 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
Can you please upload the deadlock trace files:
Fri Sep 16 18:36:39 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j004_6152346.trc.
Thu Sep 15 22:43:44 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j003_8507528.trc.
Also upload AWR and ASH reports for the deadlock times.
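
To produce AWR and ASH reports for a deadlock time, the snapshot IDs that bracket it are needed first. The sketch below prints a query against `DBA_HIST_SNAPSHOT` for a time window. The helper name `awr_snapshots_sql` and the one-hour window are illustrative choices, not part of the action plan.

```shell
#!/bin/sh
# Hypothetical helper: print a query listing AWR snapshots that overlap a
# time window. Times are given as 'YYYY-MM-DD HH24:MI'.
# Usage: awr_snapshots_sql START END
awr_snapshots_sql() {
    cat <<EOF
SELECT snap_id, begin_interval_time, end_interval_time
  FROM dba_hist_snapshot
 WHERE end_interval_time   >= TO_TIMESTAMP('$1', 'YYYY-MM-DD HH24:MI')
   AND begin_interval_time <= TO_TIMESTAMP('$2', 'YYYY-MM-DD HH24:MI')
 ORDER BY snap_id;
EOF
}

# Window around the 18:36:39 deadlock reported in the alert log.
awr_snapshots_sql '2011-09-16 18:00' '2011-09-16 19:00'
```

With the snapshot IDs in hand, the reports themselves can be generated from SQL*Plus with the scripts shipped under the Oracle home, `@?/rdbms/admin/awrrpt.sql` and `@?/rdbms/admin/ashrpt.sql`, which prompt for the range.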
Oracle Support - September 16, 2011 8:01:11 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
Can you please upload the deadlock trace files:
Fri Sep 16 18:36:39 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j004_6152346.trc.
Thu Sep 15 22:43:44 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j003_8507528.trc.
Also upload AWR and ASH reports for the deadlock times.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 8:00:01 PM GMT+05:30 [Update from Customer]
OK, if you want to talk, then call us on 01204726187.
Oracle Support - September 16, 2011 7:57:15 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
I am reviewing the files and will update you ASAP.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 7:54:12 PM GMT+05:30 [Update from Customer]
Any update, team?
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 7:11:08 PM GMT+05:30 [Update from Customer]
And we have already uploaded our alert log file.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 7:10:41 PM GMT+05:30 [Update from Customer]
Dear team, please call us on this number: 01204726187
Oracle Support - September 16, 2011 7:06:53 PM GMT+05:30 [ODM Issue Clarification]
=== ODM Issue Clarification ===
Calling Vinay on +919820882572 to see exactly what the issue is here.
He is driving, so he needs a call back in 20 minutes.

Oracle Support - September 16, 2011 7:02:02 PM GMT+05:30 [ODM Data Collection]
Name
-------
=== ODM Data Collection ===
FileName
---------------
alert log
FileComment
---------------------
Numerous deadlock errors:
Example:
Fri Sep 16 15:46:07 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j003_9138372.trc.
...
Fri Sep 16 16:48:51 2011
Shutting down instance (abort)
Fri Sep 16 16:49:01 2011
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 6209546
Fri Sep 16 16:50:39 2011
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
...
Fri Sep 16 16:50:48 2011
Completed redo scan
206145 redo blocks read, 8401 data blocks need recovery
Fri Sep 16 16:50:49 2011
Started redo application at
Fri Sep 16 16:51:02 2011
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Sep 16 16:51:02 2011
Completed: ALTER DATABASE OPEN
Fri Sep 16 17:35:39 2011
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j002_8102088.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
Fri Sep 16 17:36:49 2011
Errors in file /oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc:
ORA-12012: error on auto execute of job 43086
ORA-01031: insufficient privileges
ORA-06512: at line 2
...
Fri Sep 16 17:51:25 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j001_9601264.trc.
Final message:
Fri Sep 16 18:36:39 2011
ORA-00060: Deadlock detected. More info in file
/oracle/product/admin/admin/klife/bdump/klife_j004_6152346.trc.
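
A summary like the one in this data collection entry can be pulled from the alert log mechanically. The sketch below assumes the 10g alert log layout seen throughout this SR, where a timestamp line such as "Fri Sep 16 17:51:25 2011" precedes the messages it applies to. The function name `alert_errors` is hypothetical.

```shell
#!/bin/sh
# Print each alert log line that starts with CODE (default "ORA-"),
# prefixed with the most recent timestamp line seen before it.
# Usage: alert_errors ALERT_LOG [CODE]
alert_errors() {
    awk -v code="${2:-ORA-}" '
        # Remember the latest timestamp line, e.g. "Fri Sep 16 17:51:25 2011".
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] [ 0-9][0-9] / { ts = $0; next }
        # Report lines that begin with the requested error code.
        index($0, code) == 1 { print ts " | " $0 }
    ' "$1"
}
```

For example, `alert_errors alert_klife.log ORA-00060` would list every deadlock with its timestamp, which makes it easier to line the deadlocks up against the hang windows. The file name `alert_klife.log` is assumed from the usual `alert_<SID>.log` naming.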
Oracle Support - September 16, 2011 6:22:57 PM GMT+05:30 [ODM Action Plan]
=== ODM Action Plan ===
1. Please upload the latest alert log (use GZIP to compress the file).
2. You say the database cannot be connected to, but give no further status.
- Can you see Oracle processes running (use the "ps" command on the command line)?
- Is the Listener running?
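
The two checks asked for here can be scripted roughly as follows. This is a sketch, assuming the usual `ora_pmon_<SID>` naming of the PMON background process and that `lsnrctl` is on the PATH of the Oracle software owner. The function names `instance_running` and `check_db` are hypothetical.

```shell
#!/bin/sh
# Return 0 if a PMON background process for instance SID is visible in ps.
# The [o] bracket keeps grep from matching its own command line.
instance_running() {
    ps -ef | grep -q "[o]ra_pmon_$1[[:space:]]*\$"
}

# Report instance and listener status.
# Usage: check_db SID
check_db() {
    if instance_running "$1"; then
        echo "instance $1: PMON process found"
    else
        echo "instance $1: no PMON process found"
    fi
    # lsnrctl ships with the Oracle software; skip it if it is not on PATH.
    if command -v lsnrctl >/dev/null 2>&1; then
        lsnrctl status
    else
        echo "lsnrctl not found on PATH; set ORACLE_HOME and PATH first" >&2
    fi
}
```

Running `check_db klife` on the database host would answer both questions in one step.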
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 6:06:16 PM GMT+05:30 [Update from Customer]
What configuration details do you want?
Database Version - 10.2.0.4
OS Version - 5.3
OS - IBM AIX
64-bit
Please specify.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 5:50:44 PM GMT+05:30 [Customer Problem Description]
** Customer's Management 24x7 contact name: laxman singh tomar
** Customer's Management 24x7 contact number: 01204726187
** Primary Customer contact name: laxman
** Current Customer 24x7 voice phone number: 01204726187
** Current Customer E-mail address: kli.dba@kotak.com
** Current Customer Pager/Fax number: 9911093700
DB is down?
No
Data corruption physical or logical?
No
System crashes repeatedly?
No
Critical functionality is not available?
No
Severe performance impact or system hangs?
Yes
Based on the above responses, do you want to proceed with a severity 1 service request?
Yes
1) ### How is this Issue Impacting Your Business ###
We are unable to connect to the database. Users are also facing the same problem. The database is
not available.
VINAY.KASHYAP@KOTAK.COM - September 16, 2011 5:50:43 PM GMT+05:30 [Customer Problem Description]
Problem Description: Not able to connect to database

Copyright (c) 2007, 2011, Oracle. All rights reserved.
