
Solaris Crash Dumps and Basic Analysis

If your Solaris system panics and reboots, it'll probably create a crash dump in
/var/crash. You can also force a crash dump either online (using "savecore -L")
or as part of a reboot (using "reboot -d").
Normally this is where I stop and upload the /var/crash/vmdump.0 file to Oracle
to find out what the problem was. However, you can do some basic investigation
yourself using the following steps (a non-interactive variant is sketched just
after the list):
# savecore -f vmdump.0 /somedirectory
# cd /somedirectory
# mdb *0
mdb> ::status
mdb> ::panicinfo
mdb> ::stack
mdb> ::msgbuf
mdb> ::cpuinfo
mdb> ::ps
mdb> ::arc
mdb> ::memstat
(*If the vmdump file is called vmdump.1 then use 1 instead of 0 in the above steps)
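If you only want the headline information without an interactive session, you can pipe the same dcmds into mdb from the shell and capture the result in a file. This is a minimal sketch assuming the unix.0/vmcore.0 pair extracted above (the output file name is arbitrary):
# cd /somedirectory
# printf '::status\n::panicinfo\n::stack\n::msgbuf\n' | mdb unix.0 vmcore.0 > panic-summary.txt
That gives you a small text summary you can attach to a service request even before uploading the full vmdump file.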
--------------------------------------------------------------------------------
http://eldar.aydayev.com/how-to-analyze-oracle-solaris-os-crash-dump/
First of all we need to collect the memory dump file from the dump device. By default, after a panic the system reboots and the savecore tool stores the dump file under /var/crash/<hostname>/vmdump.0. If the filesystem under /var does not have enough space, you can instead save the dump file to a separate location such as an NFS share. Here is how to do it.
To find out which device is the dump device, check with the dumpadm command:
# dumpadm
Dump content: kernel pages
Dump device: /dev/md/dsk/d3 (swap)
Savecore directory: /var/crash/mexico
Savecore enabled: yes
Save compressed: on
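As an aside, if /var/crash is routinely too small, you can point savecore at a larger filesystem permanently instead of copying the dump by hand every time. A minimal sketch, assuming the NFS path used below is already mounted:
# dumpadm -s /net/nfsserver/dump/server01/crash_dump
# dumpadm -m 10%
# dumpadm
The optional -m setting makes savecore keep at least 10% of the filesystem free; the final dumpadm just confirms the new "Savecore directory" setting.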
Now we will save the crash dump file to the NFS share on the NFS server:
# savecore -vd -f /dev/md/dsk/d3 /net/nfsserver/dump/server01/crash_dump/
This will give us the file /net/nfsserver/dump/server01/crash_dump/vmdump.0:
root@server # savecore -vd -f /dev/md/dsk/d3 /net/nfsserver/dump/server01/crash_dump/
savecore: System dump time: Sun Nov 27 23:08:00 2011
savecore: Saving compressed system crash dump in /net/nfsserver/dump/server01/crash_dump/vmdump.0
savecore: Copying /dev/md/dsk/d3 to /net/nfsserver/dump/server01/crash_dump/vmdump.0
savecore: Decompress the crash dump with
'savecore -vf /net/nfsserver/dump/server01/crash_dump/vmdump.0'
After about 3:15 the dump copy is done.
Now we need to decompress the vmdump.0 file. We can do that with the savecore command as well:
# savecore -vd -f vmdump.0 /net/nfsserver/dump/server01/crash_dump/
System dump time: Sun Nov 27 23:08:00 2011
Constructing namelist /net/nfsserver/dump/server01/crash_dump/unix.0
Constructing corefile /net/nfsserver/dump/server01/crash_dump/vmcore.0
3902138 of 3902138 pages saved
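At this point it is worth a quick check that both files were written completely. Note that even without Solaris CAT you can already open this pair with mdb using the dcmds from the first section; a short sketch:
# cd /net/nfsserver/dump/server01/crash_dump
# ls -l unix.0 vmcore.0
# mdb unix.0 vmcore.0
mdb> ::panicinfo
mdb> ::stack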

Now we have the crash dump output files: unix.0 (the Oracle Solaris kernel namelist) and vmcore.0 (the Oracle Solaris memory snapshot).
Before continuing you need to install SUNWscat, the Solaris Crash Analyzer Tool (Solaris CAT). You can get it from Oracle Support.
After installation, scat is located at /opt/SUNWscat/bin/scat.
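For reference, SUNWscat is delivered as a standard SVR4 package, so installation is the usual pkgadd procedure. This is only a rough sketch; the datastream name SUNWscat.pkg below is a placeholder for whatever file Oracle Support actually provides:
# pkgadd -d SUNWscat.pkg SUNWscat
# pkginfo -l SUNWscat
The pkginfo command just confirms the package is installed and shows its version.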
There are two ways of generating a crash dump analysis report: interactively, or as an explorer collection.
First, interactive:
# /opt/SUNWscat/bin/scat /net/nfsserver/dump/server01/crash_dump/vmcore.0
Interactive mode produces output like this:
Solaris[TM] CAT 5.2 for Solaris 10 64-bit UltraSPARC
SV4990M, Aug 26 2009sn
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Feedback regarding the tool should be sent to SolarisCAT_Feedback@Sun.COM
Visit the Solaris CAT blog at http://blogs.sun.com/SolarisCAT
opening ./vmcore.0 dumphdr
WARNING: ./vmcore.0 incomplete/corrupt. size: 32167952384, expected: 32167960576
symtab core done
loading core data: modules symbols CTF done
core file: /net/nfsserver/dump/server01/crash_dump/vmcore.0
user: UNIX Administrator Eldar Aydayev (eldara:5002)
release: 5.10 (64-bit)
version: Generic_144488-17
machine: sun4u
node name: mexico
domain: mtn.com.ng
hw_provider: Sun_Microsystems
system type: SUNW,SPARC-Enterprise (SPARC64-VII)
hostid: 847c1fc7
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/md/dsk/d3(125G)
kmem_flags: 0xf (AUDIT|DEADBEEF|REDZONE|CONTENTS)
time of crash: Sun Nov 27 23:06:55 WAT 2011 (core is 74 days old)
age of system: 13 days 1 hours 51 minutes 3.68 seconds
panic CPU: 210 (56 CPUs, 224G memory, 3 nodes)
panic string: BAD TRAP: type=34 rp=2a1040da720 addr=deadbeefdeadc187 mmu_fsr=0
sanity checks: settings
NOTE: /etc/system: symbol not found "set noexec-stack=0x1"
NOTE: /etc/system: lwp_default_stksize set to 0x6000 2 times
NOTE: /etc/system: rpcmod:svc_default_stksize set to 0x6000 2 times
vmem CPU
WARNING: TS thread 0x303572b1120 on CPU32 using 99%CPU
WARNING: TS thread 0x376abc6e140 on CPU105 using 100%CPU
WARNING: CPU136 has cpu_intr_actv for PIL 6
WARNING: PIL6 interrupt thread 0x2a101917ca0 on CPU136 pinning TS thread 0x30351a6d200
WARNING: TS thread 0x4cecc9e7b00 on CPU208 using 100%CPU
sysent
WARNING: unknown module acctctl seen 4 times in sysent table
clock misc
WARNING: hat_kpr_enabled is 0
WARNING: 80 severe kstat errors (run "kstat xck")
done
SolarisCAT(./vmcore.0/10U)>
Now let's check the analyze output from the crash dump:
SolarisCAT(./vmcore.0/10U)> analyze
core file: /net/nfsserver/dump/server01/crash_dump/vmcore.0
user: UNIX Administrator Eldar Aydayev (eldara:5002)
release: 5.10 (64-bit)
version: Generic_144488-17
machine: sun4u
node name: server01
domain: aydayev.com
hw_provider: Sun_Microsystems
system type: SUNW,SPARC-Enterprise (SPARC64-VII)
hostid: 847c1fc7
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/md/dsk/d3(125G)
kmem_flags: 0xf (AUDIT|DEADBEEF|REDZONE|CONTENTS)
time of crash: Sun Nov 27 23:06:55 WAT 2011 (core is 74 days old)
age of system: 13 days 1 hours 51 minutes 3.68 seconds
panic CPU: 210 (56 CPUs, 224G memory, 3 nodes)
panic string: BAD TRAP: type=34 rp=2a1040da720 addr=deadbeefdeadc187 mmu_fsr=0

==== checking for trap information ====


CPU 210 had the panic

==== panic thread: 0x3035f247c20 ==== CPU: 210 ====


==== panic user (LWP_SYS) thread: 0x3035f247c20 PID: 17031 on CPU: 210 affinity CPU: 210 ====
cmd: /usr/sbin/in.ftpd -l a (the command that triggered the crash!)
t_procp: 0x4fc4b0f4170
p_as: 0x4fc4c2ad000 size: 4210688 RSS: 3358720
hat: 0x4cf37103500
cnum: CPU160:948/4416 CPU162:544/7597 CPU164:460/5949 CPU166:489/4946 CPU136:985/46 CPU138:370/4159 CPU140:273/5099 CPU142:290/5991 CPU208:1090/6512 CPU210:655/4632 CPU212:473/2761 CPU214:467/3957 CPU0:1026/217 CPU2:382/4415 CPU4:274/4129 CPU6:267/6665 CPU72:1034/7355 CPU74:363/42 CPU76:252/6184 CPU78:252/1587 CPU32:1079/7151 CPU34:370/2611 CPU36:254/5165 CPU38:256/7132 CPU104:1057/6148 CPU106:380/1438 CPU108:273/671 CPU110:263/6088
cpusran: 0,1,2,3,4,5,6,7,33,34,35,36,37,38,39,72,73,74,75,76,77,78,79,104,105,106,107,108,109,110,111,136,137,138,139,140,141,142,143,160,161,162,163,164,165,166,167,208,209,210,211,212,213,214,215
zone: global
t_stk: 0x2a1040dbae0 sp: 0x18c0791 t_stkbase: 0x2a1040d6000
t_pri: 41(TS) t_tid: 1 pctcpu: 23.594872
t_lwp: 0x6013936acd8 machpcb: 0x2a1040dbae0
mstate: LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.0003415 seconds earlier
ms_start: 1 minutes 6.2930931 seconds earlier
psrset: 0 last CPU: 210
idle: 0 ticks (0 seconds)
start: Sun Nov 27 23:05:49 2011
age: 66 seconds (1 minutes 6 seconds)
syscall: #236 shutdown(, 0xffbfd370) (sysent: unix:shutdown+0x0)
tstate: TS_ONPROC thread is being run on a processor
tflg: T_PANIC thread initiated a system panic
T_DFLTSTK stack is default size
tpflg: TP_TWAIT wait to be freed by lwp_wait
TP_MSACCT collect micro-state accounting information
tsched: TS_LOAD thread is in memory
TS_DONT_SWAP thread/LWP should not be swapped
pflag: SMSACCT process is keeping micro-state accounting
SMSFORK child inherits micro-state accounting
pc: unix:panicsys+0x48: call unix:setjmp
unix:panicsys+0x48(0x10a4ec8, 0x2a1040da4c8, 0x18c1160, 0x1, , , 0x1605, , , , , , , , 0x10a4ec8, 0x2a1040da4c8)
unix:vpanic_common+0x78(0x10a4ec8, 0x2a1040da4c8, 0x0, 0x0, 0x0, 0x60104661340)
unix:panic+0x1c(0x10a4ec8, 0x34, 0x2a1040da720, 0xdeadbeefdeadc187, 0x0, 0x7b2272f4)
unix:die+0x9c(0x34, 0x2a1040da720, 0xdeadbeefdeadc187, 0x0)
unix:trap+0x69c(0x2a1040da720, 0xdeadbeefdeadc187)
unix:ktl0+0x48()
trap data type: 0x34 (memory address not aligned) rp: 0x2a1040da720
addr: 0xdeadbeefdeadc187
pc: 0x7b2272f4 ip:ip_output_options+0xd3c: ldx [%o7 + 0x298], %l7
npc: 0x7b2272f8 ip:ip_output_options+0xd40: ldub [%i1 + 0x19], %o4
global: %g1 0x7b2d4bfc
%g2 0x1 %g3 0x10000
%g4 0 %g5 0
%g6 0 %g7 0x3035f247c20
out: %o0 0x68d %o1 0xffffffffffffffff
%o2 0x68e %o3 0x68e
%o4 0x30230d79940 %o5 0x6011de11aa0
%sp 0x2a1040d9fc1 %o7 0xdeadbeefdeadbeef
loc: %l0 0 %l1 0xdeadbeefdeadbeef
%l2 0x33c20aa4b00 %l3 0x6011e6ef580
%l4 0 %l5 0x334bbebbe38
%l6 0x3006 %l7 0x4f9d8bf5890
in: %i0 0x33c20aa4b00 %i1 0x4f9d8bf5800
%i2 0x334bbebbe38 %i3 0
%i4 0x70065f14 %i5 0x2
%fp 0x2a1040da0d1 %i7 0x7b2d4bfc
ip:ip_output_options+0xd3c(, 0x6011de11aa0, 0x334bbebbe38?, , 0x70065f14, 0x2)
ip:ip_output(0x33c20aa4b00, 0x6011de11aa0, 0x334bbebbe38, 0x2) frame recycled
ip:tcp_send_data+0x1d4(0x33c20aa4d00, 0x349955c03c0, 0x6011de11aa0)
ip:tcp_rput_data+0x35b4(, 0x6011de11aa0?)
ip:tcp_input(0x33c20aa4b00, 0x6011de11aa0, 0x601043f1700) frame recycled
ip:squeue_enter_nodrain+0x31c(0x601043f1700, 0x6011de11aa0, 0x7b2ca160, 0x33c20aa4b00, 0x1a)
ip:ip_fanout_tcp+0x868(0x419671c5658, 0x6011de11aa0, 0x601042ee4a8, 0x3600ed592d0, 0xa3, 0x0, 0x0)
ip:ip_wput_local+0x6f4(0x419671c5658, 0x601042ee4a8, 0x3600ed592d0, 0x6011de11aa0, 0x5b3b17c8ff8, 0x0, 0x0)
ip:ip_wput_ire+0x2fbc(0x419671c5658, 0x6011de11aa0, 0x5b3b17c8ff8, 0x6011e6ef580, 0x2, 0x0)
ip:ip_output_options+0xa14(, 0x6011de11aa0, 0x419671c5658?, , 0x70065f14, 0x2)
ip:ip_output(0x6011e6ef580, 0x6011de11aa0, 0x419671c5658, 0x2) frame recycled
ip:tcp_send_data+0x1d4(0x6011e6ef780, 0x419671c5658, 0x6011de11aa0)
ip:tcp_xmit_end+0x98(0x6011e6ef780)
ip:tcp_wput_proto+0x410(0x6011e6ef580, 0x6011de11aa0, 0x601043f1700)
ip:squeue_enter+0x74()
ip:tcp_wput(0x419671c5658, 0x6011de11aa0) frame recycled
unix:putnext+0x218(0x4fbdec59650, 0x6011de11aa0?)
genunix:strput+0x1b4(0x334da7c7ad8, 0x6011de11aa0, 0x0, 0x2a1040db958, 0x0, 0x0)
genunix:kstrputmsg+0x33c(0x38444d08580, , 0x0, 0x0, 0x0, 0x2c4, 0x0)
sockfs:sotpi_shutdown+0x324(, 0x1)
sockfs:shutdown+0x28(, 0x1, 0x1, 0x0)
unix:syscall_trap32+0xcc()
switch to user thread's user stack

==== analyzing panic thread stack for trap frames ====

==== using trap() frame 1 @ 0x2a1040da520, rp(%i0): 0x2a1040da720 ====


type(%l2): 0x34 (memory address not aligned)
pc: 0x7b2272f4 ip:ip_output_options+0xd3c: ldx [%o7 + 0x298], %l7 (root cause: a bug in the kernel IP stack)
npc: 0x7b2272f8 ip:ip_output_options+0xd40: ldub [%i1 + 0x19], %o4
global: %g1 ip:tcp_send_data+0x1d4
%g2 0x1 %g3 0x10000
%g4 0 %g5 0
%g6 0 %g7 0x3035f247c20
out: %o0 0x68d %o1 0xffffffffffffffff
%o2 0x68e %o3 0x68e
%o4 0x30230d79940 %o5 0x6011de11aa0
%sp 0x2a1040d9fc1 %o7 0xdeadbeefdeadbeef
loc: %l0 0 %l1 0xdeadbeefdeadbeef
%l2 0x33c20aa4b00 %l3 0x6011e6ef580
%l4 0 %l5 0x334bbebbe38
%l6 0x3006 %l7 0x4f9d8bf5890
in: %i0 0x33c20aa4b00 %i1 0x4f9d8bf5800
%i2 0x334bbebbe38 %i3 0
%i4 ip(bss):zero_info+0x0 %i5 0x2
%fp 0x2a1040da0d1 %i7 ip:tcp_send_data+0x1d4
ip:ip_output_options+0xd14: ldx [%fp + 0x7f7], %i0
ip:ip_output_options+0xd18: call genunix:freemsg
ip:ip_output_options+0xd1c: restore %g0, %g0, %g0 ( restore )
ip:ip_output_options+0xd20: 80: ldx [%fp + 0x7f7], %i1
ip:ip_output_options+0xd24: or %g0, %l5, %i0 ( mov %l5, %i0 )
ip:ip_output_options+0xd28: call genunix:putq
ip:ip_output_options+0xd2c: restore %g0, %g0, %g0 ( restore )
ip:ip_output_options+0xd30: 81: ldx [%fp + 0x7f7], %o5
ip:ip_output_options+0xd34: ldx [%l5 + 0x28], %o7
ip:ip_output_options+0xd38: ldx [%o5 + 0x28], %i1
ip:ip_output_options+0xd3c: ldx [%o7 + 0x298], %l7
ip:ip_output_options+0xd40: ldub [%i1 + 0x19], %o4
ip:ip_output_options+0xd44: subcc %o4, 0x0, %g0 ( cmp %o4, 0x0 )
ip:ip_output_options+0xd48: be,pn %icc, ip:ip_output_options+0xef8 (95f)
ip:ip_output_options+0xd4c: or %g0, %i3, %l1 ( mov %i3, %l1 )
ip:ip_output_options+0xd50: subcc %o4, 0xd, %g0 ( cmp %o4, 0xd )
ip:ip_output_options+0xd54: 82: bne,a,pn %icc, ip:ip_output_options+0xee8 (94f)
ip:ip_output_options+0xd58: or %g0, %i3, %i0 ( mov %i3, %i0 )
ip:ip_output_options+0xd5c: ldx [%fp + 0x7f7], %g1
ip:ip_output_options+0xd60: ldx [%g1 + 0x10], %l0
ip:ip_output_options+0xd64: ldx [%g1 + 0x18], %l2
SolarisCAT(./vmcore.0/10U)>
From the annotations above you can see the exact root cause of the system crash, and based on this information you can look for the fix released by the vendor as a patch or update. Analyzing the dump this way reduces the time needed to resolve the issue and bring the production system back online.

You can also generate a scat_explore report and send it to the vendor (Oracle) support department to get the right solution for the issue:

# /opt/SUNWscat/bin/scat scat_explore ./vmcore.0


WARNING: ./vmcore.0 incomplete/corrupt. size: 32167952384, expected: 32167960576
Was the system hung? [ y or n ] : y
Please enter a one line basic problem description [ max 256 chars ] :
System has unexpected reboot with crash dump
#Extracting crash data
#Gathering Hang Related data
#Successful extraction
SCAT_EXPLORE_DATA_DIR=./scat_explore_server01_847c1fc7_0x6bc73a4_vmcore.0
And you will get a directory of individual report files plus a compressed tarball:
# ls -alF scat_explore_server01_847c1fc7_0x6bc73a4_vmcore.0/
total 2804
drwxr-xr-x 2 root root 32 Feb 10 04:04 ./
drwxr-xr-x 3 root root 8 Feb 10 04:04 ../
-rw-r--r-- 1 root root 8238 Feb 10 04:02 analyze.out
-rw-r--r-- 1 root root 60230 Feb 10 04:02 callout-a.out
-rw-r--r-- 1 root root 0 Feb 10 04:02 callout-xck.out
-rw-r--r-- 1 root root 62305 Feb 10 04:03 clockinfo.out
-rw-r--r-- 1 root root 1603 Feb 10 04:02 coreinfo.out
-rw-r--r-- 1 root root 66627 Feb 10 04:02 cpu-L.out
-rw-r--r-- 1 root root 205444 Feb 10 04:03 cpu-t.out
-rw-r--r-- 1 root root 156 Feb 10 04:04 dev_busy.out
-rw-r--r-- 1 root root 33154 Feb 10 04:02 dev_info.out
-rw-r--r-- 1 root root 6423 Feb 10 04:02 dispq.out
-rw-r--r-- 1 root root 993 Feb 10 04:02 etcsystem.out
-rw-r--r-- 1 root root 1293 Feb 10 04:02 ifconf.out
-rw-r--r-- 1 root root 3270 Feb 10 04:03 intr.out
-rw-r--r-- 1 root root 757 Feb 10 04:02 memerr.out
-rw-r--r-- 1 root root 22679 Feb 10 04:02 modinfo.out
-rw-r--r-- 1 root root 25528 Feb 10 04:02 msgbuf.out
-rw-r--r-- 1 root root 4956 Feb 10 04:02 panic.out
-rw-r--r-- 1 root root 1209 Feb 10 04:02 panic_buf.out
-rw-r--r-- 1 root root 4860 Feb 10 04:02 panic_thread.out
-rw-r--r-- 1 root root 45 Feb 10 04:02 prob_desc.out
-rw-r--r-- 1 root root 37393 Feb 10 04:04 proc.out
-rw-r--r-- 1 root root 12630 Feb 10 04:04 proc_tree.out
-rw-r--r-- 1 root root 508755 Feb 10 04:04 scat_explore_server01_847c1fc7_0x6bc73a4_vmcore.0.tar.Z
-rw-r--r-- 1 root root 16104 Feb 10 04:04 stack-l.out
-rw-r--r-- 1 root root 26093 Feb 10 04:03 stack_summary.out
-rw-r--r-- 1 root root 57167 Feb 10 04:03 stream_summary.out
-rw-r--r-- 1 root root 1223 Feb 10 04:03 thread_summary.out
-rw-r--r-- 1 root root 7938 Feb 10 04:04 tlist_rfscall.out
-rw-r--r-- 1 root root 8978 Feb 10 04:02 tunables.out
-rw-r--r-- 1 root root 7305 Feb 10 04:02 vfstab.out
