
1.Debug Process and Analyze Process Core on Solaris
  1.1 General Commands
  1.2 Analyze Process Core by Using mdb (Modular Debugger)
  1.3 Analyze Process Core by Using adb on Solaris
  1.4 Analyze Process Core by Using gdb on Solaris
  1.5 Debug Process and Analyze Core by Using truss on Solaris
  1.6 Analyze Process by using dbx
  1.7 Generate a Process Core Dump on Solaris
  1.8 Create Process Core
  1.9 Examining Memory Address Spaces with mdb on Solaris
  1.10 Debug Kernel, System Calls and Processes (DTRACE)
  1.11 Other Debugging Tools on Solaris

2.Debug Process and Analyze Process Core on Linux
  2.1 Debug Processes by using STRACE
  2.2 Analyze a Process Core Dump with gdb on Linux
  2.3 Analyze the Process Core using dbx on Linux
  2.4 Analyze a Core Dump Using Oprofile on Linux
  2.5 Debug Libraries and Symbols on Linux
  2.6 Other Debugging Tools on Linux

3.Debug Process and Analyze Process Core on HP-UX
  3.1 Debug Processes by using tusc
  3.2 Installing tusc on HP-UX 11.xx
  3.3 Debug Processes and Core Files by using HP WDB / GDB
  3.4 Debug Processes by using truss
  3.5 Analyze Process Performance by using Caliper on HP-UX 11.xx
  3.6 Analyze Process Performance by using Prospect on HP-UX 11.xx
  3.7 Live Memory Analysis on HP-UX 11.xx by using KWDB
  3.8 Other Debugging Tools on HP-UX 11.xx

4.Debug Process and Analyze Process Core on IBM AIX
  4.1 Debug Processes by using proctools
  4.2 Debug Processes by using trace
  4.3 Debug Processes by using syscalls
  4.4 Debug Processes by using watch
  4.5 Debug Processes by using ProbeVue
  4.6 Debug Processes by using truss
  4.7 Debug Processes by using dbx
  4.8 Analyze a Process Core by using KDB
  4.9 Other Debugging Tools on IBM AIX

5.Debug Process and Analyze Process Core on IRIX

6.Debug Process and Analyze Process Core on Tru64

7.Generate / Analyze a Crash Dump on Solaris
  7.1 Save a Crash Dump on a Panicked System
  7.2 Setup a System to Save a Crash Dump
  7.3 Crash Dump Analysis on Solaris by using MDB
  7.4 Service Tool Bundle Service Crash Analysis Tool
  7.5 Crash Dump Analysis on Solaris by using ADB
  7.6 Crash Dump Analysis on Solaris by using Crash
  7.7 Crash Dump Analysis on Solaris by using ACT
  7.8 Other Crash Dump Analysis Tools on Solaris

8.Generate / Analyze a Crash Dump on HP-UX
  8.1 Crash Dump Analysis by using KWDB
  8.2 Remote Crash Dump Analysis
  8.3 Crash Dump Analysis by using Q4
  8.4 Crash Dump Analysis by using KWDB Q4 Mode
  8.5 Crash Dump Analysis by using HP WDB / GDB
  8.6 Crash Dump Analysis by using adb

9.Generate / Analyze a Crash Dump on Linux
  9.1 Enable Saving Crash Dump by using kexec-tools
  9.2 Simulate a Panic and Save a Crash Dump
  9.3 Analyze Crash Dump by using crash
  9.4 Analyze Crash Dump by using GDB
  9.5 Analyze Crash Dump by using LKCD
  9.6 Other Useful Commands

10.Generate / Analyze a Crash Dump on Linux
  10.1 Setup and Enable KDB
  10.2 Analyze a Crash Dump by using KDB

11.Debugging Tools
  11.1 Information

1.Debug Process and Analyze Process Core on Solaris


1.1 General Commands

Show Process Tracebacks:
pstack core
Show Process Tracebacks on a Running Process:
pstack process_id
Show Process Threads Info:
pflags core
Show Process Memory Mapping:
pmap core
Show Process Memory Mapping for a Running Process:
pmap -sx `pgrep testprog`
Show Kernel Info:
kstat -n system_misc
Check System Pages:
kstat -n system_pages
Check Processes:
prstat -Lmc 10 10 > prstat.out
more prstat.out

Debug Processes:
pargs core
pcred $$
pldd $$
psig $$
pfiles $$
pfiles pid
pstop $$
prun $$
pwait pid
ptree $$
ptree pid
ptime core
pwdx $$
preap pid
pgrep -u rmc
Kernel Lock Statistics (Use -i 971 as Interval to Avoid Collisions with the Clock Interrupt and Gather Fine-Grained Data):
lockstat -i 971 sleep 300 > lockstat.out
lockstat -i 971 -I sleep 300 > lockstatI.out
Kernel Profiling:
lockstat -Ikw -i 997 sleep 10
CPU Traps Statistics:
trapstat -t
Gather CPU Hardware Counters per Process:
cputrack -N 20 -c pic0=DC_access,pic1=DC_miss -p 19849
OR for a command:
cputrack -N 20 -c pic0=DC_access,pic1=DC_miss bc -l
Gather CPU Statistics:
cpustat -c pic0=Cycle_cnt,pic1=DTLB_miss 1
Check Page Sizes:
pagesize -a
Set Page Size Preference:
ppgsz -o heap=4M ./testprog
Segmap Hit Rate Statistics:
kstat -n segmap
Dump ELF File:
elfdump -e /bin/ls
Dump Section Headers:
elfdump -c /bin/ls
Invoke the Runtime Linker on the Specified Binary File to Check which Libraries are Linked to it:
ldd netstat
Run pldd on Running Processes:

pldd $$
Get Linked List of All Processes:
kstat -n var
mdb -k
> max_nprocs/D
Library Tracing:
apptrace ls
Check Scheduling Classes:
dispadmin -l
priocntl -l
Check Scheduling Class and Thread Priorities:
ps -eLc
Check Timeshare Dispatch Table:
dispadmin -g -c TS
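To make these building blocks concrete, here is a minimal triage sketch for a hypothetical stuck process named testprog (the name and options are illustrative, not a fixed recipe):

# Locate the process (testprog is a hypothetical name)
pgrep testprog
# User-level stack of every thread: where is it stuck?
pstack `pgrep testprog`
# Open file descriptors and memory layout of the live process
pfiles `pgrep testprog`
pmap -x `pgrep testprog`
# Per-thread microstate accounting, 5-second interval, 5 samples
prstat -Lmp `pgrep testprog` 5 5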

1.2 Analyze Process Core by Using mdb (Modular Debugger)


mdb executable_name core_name
$C
$q
OR:
mdb core
::status
::files
::stack
::walkers
::dcmds
cpu0::print cpu_t
::walk walk_name | ::dcmd
::walk cpu | ::print cpu_t
::sizeof cpu_t
address::list
OR:
mdb -k
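As a hedged example of what a first pass with mdb typically looks like on a user-process core (binary and core names are placeholders):

mdb ./a.out ./core
> ::status                     # signal, faulting address, command line
> $C                           # C stack backtrace with frame pointers
> ::walk thread | ::findstack  # stack of every thread in the core
> $q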

1.3 Analyze Process Core by Using adb on Solaris

Invoke the debugger:
adb -c core
Display the message buffer:
$<msgbuf
Get the thread list:
$<threadlist
Check the status:
$<status
Get the process crash time:
time/Y
Get the kernel memory structures:
$<kmastat
Quit the debugger:
$q

1.4 Analyze Process Core by Using gdb on Solaris

The GNU Debugger (GDB) is a powerful debugger available for all major operating systems; recent Solaris releases ship it on the installation media.
Start gdb on a core file:
gdb -c core
OR:
gdb a.out core
OR:
gdb path/to/the/binary path/to/the/core
OR at the gdb prompt:
(gdb) core core
If the executable path is not provided, the debugger uses the invocation path of the process that generated the core file, which is stored in the core file itself. If the invocation path is relative, you must specify the executable while debugging the core file.
To start debugging the process, at the gdb prompt, load the core file:
(gdb) core core
List status-inquiry commands:
(gdb) help status
List data-examination commands:
(gdb) help data
List stack-examination commands:
(gdb) help stack

Analyze a stack frame by its number:
(gdb) frame number
View the code around that frame:
(gdb) list
List local variables:
(gdb) info locals
List file-handling commands:
(gdb) help files
List maintenance/internals commands:
(gdb) help internals
List command aliases:
(gdb) help aliases
List support facilities:
(gdb) help support
List commands for running the program:
(gdb) help running
List tracepoint commands (trace program execution without stopping it):
(gdb) help tracepoints
List user-defined commands:
(gdb) help user-defined
List obscure features:
(gdb) help obscure
View the stack trace:
(gdb) backtrace
List all threads of the process at the time of the crash:
(gdb) info threads
Switch to the specified thread:
(gdb) thread thread_id
Disassemble a specified section of memory:
(gdb) disassemble <address>
Display the memory at a specified address as a string:
(gdb) x/s <address>
Display the contents of the registers:
(gdb) info registers
Display all registers, including floating-point registers:
(gdb) info all-registers
Display information about all of the shared libraries:

(gdb) info shared
Print the target currently under the debugger:
(gdb) info files
(gdb) info target
View the source code:
(gdb) list
Start the target program:
(gdb) run
Set a breakpoint on a function:
(gdb) break sum
On a line number:
(gdb) b 25
At an offset from the current line:
(gdb) b +9
(gdb) b -1
On a memory address (use *):
(gdb) b *0x2324
Set a watchpoint on a variable or expression:
(gdb) watch x
(gdb) watch *&x
(gdb) watch *(<type of x> *)<addr of x>
Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watch
Help:
(gdb) help
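A typical first gdb pass over a core file might look like the following sketch (paths, frame number and variable name are hypothetical):

gdb /path/to/binary /path/to/core
(gdb) backtrace          # where did the process die?
(gdb) info threads       # were other threads doing something suspicious?
(gdb) frame 2            # select a frame of interest from the backtrace
(gdb) info locals        # inspect its local variables
(gdb) x/s buf            # dump a suspect buffer as a string (buf is hypothetical)
(gdb) quit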

1.5 Debug Process and Analyze Core by Using truss on Solaris


Trace System Calls of a Process or Command:
truss -p pid
truss -p 2975/3
truss /usr/local/sbin/snmpd
Trace System Calls, Faults and Signals of a Process or Command and Count them:
truss -c -p pid
Trace a Process, Follow its Children and Count Syscalls, Faults and Signals:
truss -cf -p pid
Trace System Calls, Environment Strings and Timestamps for a Process (and Put it in a File):
truss -d -e -p 1873
truss -d -e -f -o /tmp/dbstart.lst -p 2522
Trace System Calls of a Process and Include a Time Delta on Each Line of Trace Output:
truss -d -D -p 1473
Trace a Process Including a Timestamp on Each Line and Include / Exclude Specific System Calls (in this case read Syscalls):
truss -d -t read -p 1468
truss -d -t !read -p 1468
Trace a find and put the output in a file:
truss find . -print > find.out
Trace the open, close, read, and write System Calls:
truss -t open,close,read,write find . -print > find.out
Trace a Shell Script:
truss -f -o truss.out spell document
Abbreviating Output:
truss nroff -mm document > nroff.out
Because 97% of the output reports lseek(), read(), and write() system calls, to abbreviate it:
truss -t !lseek,read,write nroff -mm document > nroff.out
Trace library calls from within the C library:
truss -u libc
Trace all user-level calls made to any library other than the C library:
truss -u '*' -u !libc -p 1544
Trace all user-level printf and scanf function calls in the C library:
truss -u 'libc:*printf,*scanf' -p 1100
Trace every user-level function call from anywhere to anywhere:
truss -u a.out -u ld:: -u :: ...
Trace the system call activity of process #1, init:
truss -p -v all 1
Trace a Process's exec() Syscalls and Follow its Children:
truss -ftexec -p pid 2> /dev/null &
Trace the System Calls of an Oracle Listener with Timestamps and Put the Output in the File lsnrctl.truss:
truss -d -o lsnrctl.truss -p 3949
Trace All of the System Calls of syslogd into a File:
truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd`
Trace the System Calls and Forks, showing arguments passed to the exec calls and the environment variables:
truss -aef -p <PID>
OR:
truss -aef lsnrctl dbsnmp_start
Trace the System Calls and Forks, show exec arguments and environment variables, and the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors, following children:
truss -aef -rall -wall -p <PID>
Trace the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors and follow children:
truss -rall -wall -f -p <PID>
Trace verbosely the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors and follow children:
truss -wall -rall -vall -f /usr/local/sbin/snmpd
Verbosely Trace init:
truss -p -v all 1
Trace the Machine Faults:
truss -m all -p 1200
Exclude the Machine Faults from the Trace:
truss -m !all -p 1200
Machine Faults that Stop the Process (if one of the specified faults is incurred, truss leaves the process stopped and abandoned):
truss -M all -p 1200
Run truss to Debug read() and write() Syscalls as the Oracle Listener/DBSnmp Starts:
truss -rall -wall lsnrctl start
Count Total CPU Seconds per System Call:
truss -c dd if=500m of=/dev/null bs=16k count=2k
OR:
truss -d -u a.out,libc dd if=500m of=/dev/null bs=16k count=2k
more a.out
Trace all the syscalls, threads and API functions for a CORBA-based process:
truss -t!all -s!all -u 'libit_*::CORBA*' -p 21922
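A short worked example of narrowing down a slow process with truss (the PID and paths are hypothetical):

# First pass: attach and count syscalls/faults/signals; Ctrl-C prints the summary
truss -c -p 4242
# Second pass: trace only the dominant syscall, with timestamps and deltas, to a file
truss -d -D -t read -o /tmp/read.truss -p 4242
# Inspect the trace for large deltas between calls
more /tmp/read.truss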

1.6 Analyze Process by using dbx

To invoke dbx:
dbx program_name
OR:
dbx - pid
OR:
dbx -a pid
OR:
dbx -d 100 program_name core_file
OR:
dbx -d 100 -a pid
OR:
dbx - `pgrep Freeway`
At the dbx prompt:
(dbx) run
(dbx) where
(dbx) status
Analyze a Process Core by Using dbx:
dbx program_name core
OR:
dbx - core
OR:
dbx a.out core
At the dbx prompt:
(dbx) run
(dbx) where
(dbx) threads
(dbx) status
(dbx) list main
(dbx) print msg
(dbx) check -access
(dbx) check -memuse
(dbx) help
(dbx) quit

1.7 Generate a Process Core Dump on Solaris

coreadm
OR:
savecore -d

If after enabling core file generation your system still does not create a core file, you may need to change the file-size writing limits set by your operating system:
ulimit -a
ulimit -c unlimited
ulimit -H -c unlimited

Enable Applications to Generate Core Files:
coreadm -g /path-to-file/%f.%n.%p.core -e global -e process -e global-setid -e proc-setid -e log
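An end-to-end check that core generation is really enabled; a sketch, assuming /var/cores exists and you run it as root:

ulimit -c unlimited                  # lift the per-process core size limit
coreadm -g /var/cores/%f.%n.%p.core -e global -e process -e log
coreadm                              # verify the active configuration
sleep 600 &                          # throwaway victim process
kill -SEGV $!                        # force it to dump core
ls -l /var/cores                     # the core file should appear here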

1.8 Create Process Core

gcore pid

1.9 Examining Memory Address Spaces with mdb on Solaris


prstat
top
ps -ef | grep pid
pmap -x 919
mdb -k
Load the dmod containing the new dcmd:
::load /wd320/max/source/mdb/segpages/i386/segpages.so
Walk through the Segments of the Process Address Space, showing Each Virtual Page in the Segment:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages
Count the Pages currently Valid for the Process:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc
Count the Pages in Memory Not currently Valid in the Page Table(s) for the Process:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc
How Many Pages are Currently Not Valid (and Not in Memory):
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc
How Large is the Address Space (this should be the total size as reported by pmap):
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -v OFFSET | wc
How Many Pages have been Swapped Out:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
pmap -x 919

1.10 Debug Kernel, System Calls and Processes (DTRACE)


dtrace -f /usr/local/sbin/snmpd
dtrace -l -n tcp::entry
dtrace -l -m tcp

dtrace -lv -n fbt:tcp:_info:entry
dtrace -n 'ufs_read:entry { printf("%s\n", stringof(args[0]->v_path)); }'
Find which Process is making the most Syscalls:
dtrace -n 'syscall:::entry { @[execname] = count(); }'
OR:
dtrace -n 'syscall::read:entry { @[execname,pid] = count() }'
Get new Processes with Arguments:
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'
Files opened by process:
dtrace -n 'syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'
Pages paged in by process:
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'
Minor faults by process:
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'
System Call Count by Name:
dtrace -n 'syscall:::entry { @syscalls[probefunc] = count(); }'
Syscall Count by Program:
dtrace -n 'syscall:::entry { @num[execname] = count(); }'
Syscall Count by Syscall:
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
Syscall Count by Process:
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'
Syscalls by Type:
dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
Match the syscall probe only when the execname matches the investigation target, filebench, and count by syscall name:
dtrace -n 'syscall:::entry /execname == "filebench"/ { @[probefunc] = count(); }'
Kernel:
Kernel Profiling:
dtrace -n 'profile-997ms / arg0 != 0 / { @ks[stack()] = count() }'
Counting xcalls:
dtrace -n 'xcalls { @[probefunc] = count() }'
Probe Virtual Memory Info on a Running StarOffice Process:
dtrace -P vminfo'/execname == "soffice.bin"/{ @[probename] = count() }'
dtrace -s ./soffice.d
Successful Signal Details:
dtrace -n 'proc:::signal-send /pid/ { printf("%s -%d %d", execname, args[2], args[1]->pr_pid); }'

Kernel stack trace profile at 1001 Hertz:
dtrace -n 'profile-1001 { @[stack()] = count(); }'
Thread off-cpu stack trace count:
dtrace -n 'sched:::off-cpu { @[stack()] = count(); }'
Adaptive lock block time totals (ns) by kernel stack trace:
dtrace -qn 'lockstat:::adaptive-block { @[stack(5), "^^^ total ns:"] = sum(arg1); }'
Kernel function call counts for module "zfs":
dtrace -n 'fbt:zfs::entry { @[probefunc] = count(); }'
Kernel function call counts for functions beginning with "hfs_":
dtrace -n 'fbt::hfs_*:entry { @[probefunc] = count(); }'
Kernel stack back trace counts for calls to function "arc_read()" (for example):
dtrace -n 'fbt::arc_read:entry { @[stack()] = count(); }'
Identify kernel stacks calling disk I/O:
dtrace -n 'io:::start { @[stack()] = count(); }'
Trace errors along with disk and error number:
dtrace -n 'io:::done /args[0]->b_flags & B_ERROR/ { printf("%s err: %d", args[1]->dev_statname, args[0]->b_error); }'
Look at what is calling semsys:
dtrace -n 'syscall::semsys:entry /execname == "filebench"/ { @[ustack()] = count(); }'
Probe Functions:
dtrace -n 'syscall:::entry { @scalls[probefunc] = count() }'
Check which Process is Creating Threads:
dtrace -n 'thread_create:entry { @[execname] = count() }'
CPU:
What are the top user functions running on CPU (% usr time)?
dtrace -n 'profile-997hz /arg1/ { @[execname, ufunc(arg1)] = count(); }'
What are the top 5 kernel stack traces on CPU (shows why)?
dtrace -n 'profile-997hz { @[stack()] = count(); } END { trunc(@, 5); }'
What threads are on CPU, counted by their thread name? (FreeBSD)
dtrace -n 'profile-997 { @[stringof(curthread->td_name)] = count(); }'
What system calls are being executed by the CPUs?
dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
Which processes are executing the most system calls?
dtrace -n 'syscall:::entry { @[pid, execname] = count(); }'
Get Interrupts by CPU:
dtrace -n 'sdt:::interrupt-start { @num[cpu] = count(); }'
Get Functions by Process by CPU:
dtrace -n 'pid221:libc::entry'

Find what is Context Switching onto the CPU the most:
dtrace -n 'sched:::on-cpu { @[execname] = count(); } profile:::tick-20s { exit(0); }'
Memory:
Tracking memory page faults by process name:
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'
Process allocation (via malloc()) requested size distribution plot:
dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p PID
Process allocation (via malloc()) by user stack trace and total requested size:
dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID
File System:
Trace file creat() calls with file and process name:
dtrace -n 'syscall::creat*:entry { printf("%s %s", execname, copyinstr(arg0)); }'
Frequency count stat() files:
dtrace -n 'syscall::stat*:entry { @[copyinstr(arg0)] = count(); }'
Tracing "cd":
dtrace -n 'syscall::chdir:entry { printf("%s -> %s", cwd, copyinstr(arg0)); }'
Count read/write syscalls by syscall type:
dtrace -n 'syscall::*read*:entry,syscall::*write*:entry { @[probefunc] = count(); }'
Syscall read(2) by file name:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_pathname] = count(); }'
Syscall write(2) by file name:
dtrace -n 'syscall::write:entry { @[fds[arg0].fi_pathname] = count(); }'
Syscall read(2) by filesystem type:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_fs] = count(); }'
Syscall write(2) by filesystem type:
dtrace -n 'syscall::write:entry { @[fds[arg0].fi_fs] = count(); }'
Syscall read(2) by process name for the "zfs" filesystem only:
dtrace -n 'syscall::read:entry /fds[arg0].fi_fs == "zfs"/ { @[execname] = count(); }'
Syscall write(2) by process name and filesystem type:
dtrace -n 'syscall::write:entry { @[execname, fds[arg0].fi_fs] = count(); } END { printa("%18s %16s %16@d\n", @); }'
Check Write Entries:
dtrace -n 'syscall::write:entry { trace(arg2) }'
dtrace -n 'fbt:ufs:ufs_write:entry { printf("%s\n", stringof(args[0]->v_path)); }'
Identify who is responsible for too much reading:
dtrace -n 'syscall::read:entry { @Execs[execname] = count(); }'
dtrace -n 'syscall::open:entry { @Open[copyinstr(arg0)] = count(); }'
dtrace -n 'syscall::exec*:entry { trace(execname); }'

Drill into Complex Structures:
dtrace -qn 'syscall::exec*:entry { printf("%5d %s\n", pid, stringof(curpsinfo->pr_psargs)); }'
Count All ioctl System Calls by Both Executable Name and File Descriptor:
dtrace -n 'syscall::ioctl:entry { @[execname, arg0] = count(); }'
Distribution of Write Size by Executable Name:
dtrace -n 'syscall::write:entry { @[execname] = quantize(arg2); }'
Read bytes by process:
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
Write bytes by process:
dtrace -n 'sysinfo:::writech { @bytes[execname] = sum(arg0); }'
Read size distribution by process:
dtrace -n 'sysinfo:::readch { @dist[execname] = quantize(arg0); }'
Write size distribution by process:
dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'
Disk size by process:
dtrace -n 'io:::start { printf("%d %s %d", pid, execname, args[0]->b_bcount); }'
Chase the Hot Lock Caller:
dtrace -n 'pr_p_lock:entry { @s[stack()] = count() }'
dtrace -n 'pr_p_lock:entry { @s[execname] = count() }'
pgrep process_name
dtrace -n 'pid4485:libc:pread:entry { @us[ustack()] = count() }'
Check UFS Read:
dtrace -q -n 'ufs_read:entry { printf("UFS Read: %s\n", stringof(args[0]->v_path)); }'
dtrace -q -n 'ufs_read:entry { @[execname, stringof(args[0]->v_path)] = count() }'
Show disk I/O size as distribution plots, by process name:
dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'
Processes paging in from the filesystem:
dtrace -n 'vminfo:::fspgin { @[execname] = sum(arg0); }'
Which processes are executing common I/O system calls:
dtrace -n 'syscall::*read:entry,syscall::*write:entry { @rw[execname, probefunc] = count(); }'
What is the rate of disk I/O being issued:
dtrace -n 'io:::start { @io = count(); } tick-1sec { printa("Disk I/Os per second: %@d\n", @io); trunc(@io); }'
NFSv3 count of operations by client address:
dtrace -n 'nfsv3:::op-*-start { @[args[0]->ci_remote] = count(); }'
NFSv3 count of operations by file pathname:
dtrace -n 'nfsv3:::op-*-start { @[args[1]->noi_curpath] = count(); }'

Socket Provider:
Socket accepts by process name:
dtrace -n 'syscall::accept*:entry { @[execname] = count(); }'
Socket connections by process and user stack trace:
dtrace -n 'syscall::connect*:entry { trace(execname); ustack(); }'
mib Provider:
IP event statistics:
dtrace -n 'mib:::ip* { @[probename] = sum(arg0); }'
TCP event statistics with kernel function:
dtrace -n 'mib:::tcp* { @[strjoin(probefunc, strjoin("() -> ", probename))] = sum(arg0); }'
IP Provider:
Received IP packets by host address:
dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }'
IP send payload size distribution by destination:
dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
TCP Provider:
Who is connecting to what:
dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }'
Who isn't connecting to what:
dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }'
What am I connecting to?
dtrace -n 'tcp:::connect-established { @[args[3]->tcps_raddr, args[3]->tcps_rport] = count(); }'
IP payload bytes for TCP send, size distribution by destination address:
dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
MySQL:
MySQL: query trace by query string:
dtrace -n 'mysql*:::query-start { trace(copyinstr(arg0)) }'
MySQL: query count summary by host:
dtrace -n 'mysql*:::query-start { @[copyinstr(arg4)] = count(); }'
MySQL server: trace queries:
dtrace -qn 'pid$target::*mysql_parse*:entry { printf("%Y %s\n", walltimestamp, copyinstr(arg1)); }' -p PID
MySQL client: who's doing what (stack trace by query):
dtrace -Zn 'pid$target:libmysql*:mysql_*query:entry { trace(copyinstr(arg1)); ustack(); }' -p PID
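When a one-liner outgrows the command line, it can be kept as an executable D script; a minimal sketch (the file name and the 10-second window are arbitrary choices):

cat > /tmp/syscalls_by_proc.d <<'EOF'
#!/usr/sbin/dtrace -s
/* Count system calls per executable and exit after 10 seconds. */
syscall:::entry
{
        @num[execname] = count();
}
tick-10s
{
        exit(0);
}
EOF
chmod +x /tmp/syscalls_by_proc.d
/tmp/syscalls_by_proc.d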

1.11 Other Debugging Tools on Solaris

gcore: Take a snapshot of a process:
gcore -o output_filename pid
kill: Kill a process and generate its core dump:
kill -SEGV <pid>
lsof: Get Files Opened by the Specified Process/Command:
lsof -p 28290
lsof -a -p 28290
Check How Many Instances of sendmail are Open:
lsof -c sendmail
File Descriptor Count:
ps -ef
cd /proc/28290/fd
ls -l | wc -l
Get Files Opened by the Specified User:
lsof -u root
Check Open Files on the Specified File System and the Processes that Use Them:
lsof /fs
List All Open Files for the User abe and for the Specified Process IDs:
lsof -p 456,123,789 -u 1234,abe
Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:
lsof -b /nfs/mount/point
Send a SIGHUP signal to All of the Processes that have /u/abe/bar Open:
kill -HUP `lsof -t /u/abe/bar`
Ignore the Device Cache File:
lsof -Di
Get PID, command name, file descriptor, file device number, and file inode number for each file of each process (field output):
lsof -FpcfDi
List the files at descriptors 1 and 3 of every process running the lsof command for login ID "abe" every 10 seconds:
lsof -c lsof -a -d 1 -d 3 -u abe -r10
List All Files using Any Protocol on Any Port of mace.cc.org:
lsof -i @mace
List All Files using Any Protocol on the Specified Port Range of mace.cc.org:
lsof -i @mace:123-140
List All IPv4 Network Files in Use whose PID is 1234:
lsof -i 4 -a -p 1234
fuser: Get Processes and Related Usernames Running on the /var File System:
fuser -uc /var
Get Process IDs and Login Names that have the /etc/passwd File Open:
fuser -u /etc/passwd
Report on the File System and Files, restricting the output to Processes that hold Non-blocking Mandatory Locks:
fuser -cn /export/foo
Kill Processes Running on the /var File System:
fuser -ku /var
Send SIGTERM to Any Processes that hold a Non-blocking Mandatory Lock on the File /export/foo/my_file:
fuser -fn -s term /export/foo/my_file
Get Processes Running on the / File System and Print the Process Names and Arguments:
ps -o pid,args -p "$(fuser / 2>/dev/null)"
Report Device Usage Information:
fuser -d /dev/dsk/c0t0d0
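A common worked case combining fuser and lsof: finding out what keeps a file system busy before unmounting it (the mount point is an example):

fuser -cu /export/data    # PIDs and login names using the file system
lsof /export/data         # full detail on the offending open files
fuser -ck /export/data    # if safe, kill the holders
umount /export/data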

2.Debug Process and Analyze Process Core on Linux


2.1 Debug Processes by using STRACE

Trace the "ls" Command; strace ls Trace the "open" System Call of the "ls" Command: strace -e open ls Trace "open" and "read" System Calls of the "ls" Command:

strace -e trace=open,read ls /home Trace rsync and Log to File: strace -o /tmp/strace_ls_output.txt rsync Trace a Process by PID and Log to File: strace -o /tmp/strace_rsync_21.06.txt -p pid Trace "ls" Command and Print Relative Time for System Calls: strace -r ls Generate Statistics Report of System Calls for "ls" Command: strace -c ls /home Trace All System Calls which have a filename as an argument: strace -o /tmp/strace_rsync_output.txt -e trace=file -p pid Trace All Network Related System Calls: strace -o /tmp/strace_rsync_output.txt -e trace=network -p pid Trace All File Descriptor Related System Calls: strace -o /tmp/strace_rsync_output.txt -e trace=desc -p pid # -e verbose=all is the default verbosity. strace tttT o /tmp/s1.lst p 2395 strace -ttT -p 5164
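A short strace triage sketch tying these options together (PID and file name are hypothetical):

# Which syscalls dominate, by count and total time?
strace -c -p 3141
# Timestamp every call and record the per-call duration to a file
strace -ttT -o /tmp/slow.trace -p 3141
# Look for calls that took a second or longer (-T prints <duration> at line end)
grep -E '<[1-9][0-9]*\.' /tmp/slow.trace | head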

2.2 Analyze a Process Core Dump with gdb on Linux


Start gdb on a core file:
gdb -c core
OR:
gdb a.out core
OR:
gdb path/to/the/binary path/to/the/core
OR at the gdb prompt:
(gdb) core core
If the executable path is not provided, the debugger uses the invocation path of the process that generated the core file, which is stored in the core file itself. If the invocation path is relative, you must specify the executable while debugging the core file.
To start debugging the process, at the gdb prompt, load the core file:
(gdb) core core
List status-inquiry commands:
(gdb) help status
List data-examination commands:
(gdb) help data
List stack-examination commands:
(gdb) help stack
Analyze a stack frame by its number:
(gdb) frame number
View the code around that frame:
(gdb) list
List local variables:
(gdb) info locals
List file-handling commands:
(gdb) help files
List maintenance/internals commands:
(gdb) help internals
List command aliases:
(gdb) help aliases
List support facilities:
(gdb) help support
List commands for running the program:
(gdb) help running
List tracepoint commands (trace program execution without stopping it):
(gdb) help tracepoints
List user-defined commands:
(gdb) help user-defined
List obscure features:
(gdb) help obscure
View the stack trace:
(gdb) backtrace
List all threads of the process at the time of the crash:
(gdb) info threads
Switch to the specified thread:
(gdb) thread thread_id
Disassemble a specified section of memory:
(gdb) disassemble <address>
Display the memory at a specified address as a string:
(gdb) x/s <address>
Display the contents of the registers:
(gdb) info registers
Display all registers, including floating-point registers:
(gdb) info all-registers
Display information about all of the shared libraries:
(gdb) info shared
Print the target currently under the debugger:
(gdb) info files
(gdb) info target
View the source code:
(gdb) list
Start the target program:
(gdb) run
Set a breakpoint on a function:
(gdb) break sum
On a line number:
(gdb) b 25
At an offset from the current line:
(gdb) b +9
(gdb) b -1
On a memory address (use *):
(gdb) b *0x2324
Set a watchpoint on a variable or expression:
(gdb) watch x
(gdb) watch *&x
(gdb) watch *(<type of x> *)<addr of x>
Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watch
Help:
(gdb) help
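None of the above helps unless the kernel actually wrote a core file; a minimal sketch for enabling that on Linux (the target directory is an example and must exist):

ulimit -c unlimited                                  # per-shell; use limits.conf for services
echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern
cat /proc/sys/kernel/core_pattern                    # verify
sleep 600 &                                          # throwaway victim process
kill -SEGV $!                                        # force a core dump
ls -l /var/crash                                     # core.sleep.<pid>.<time> should appear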

2.3 Analyze the Process Core using dbx on Linux

To invoke dbx on a core file:
dbx a.out core
To invoke dbx:
dbx program_name
OR:
dbx - pid
OR:
dbx -a pid
OR:
dbx -d 100 program_name core_file
OR:
dbx -d 100 -a pid
At the dbx prompt:
(dbx) run
(dbx) where
(dbx) threads
(dbx) status
(dbx) list main
(dbx) print msg
(dbx) check -access
(dbx) check -memuse
(dbx) help
(dbx) quit

2.4 Analyze a Core Dump Using Oprofile on Linux

OProfile is a system-wide Linux profiling tool used to analyze performance and runtime problems in applications or the kernel.
Gunzip the Kernel image:
cd /boot
gunzip vmlinux-<something>.gz
Run OProfile without Profiling the Kernel:
opcontrol --no-vmlinux
If you do want to Profile the Kernel:
opcontrol --vmlinux=/boot/vmlinux-`uname -r`
Start Collecting Data:
opcontrol --start
Dump the Collected Data:
opcontrol --dump
Stop OProfile:
opcontrol --stop
If you want to Reset the Profiling Counters:
opcontrol --reset
Report Collected Data:
opreport
To Report Per-Symbol Details:
opreport --symbols
OR:
opreport -l
To Report a Call Graph:
opreport -c
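Putting the opcontrol steps together, a minimal end-to-end session profiling one run of an application (./myapp is a placeholder):

opcontrol --no-vmlinux     # user space only, skip kernel symbols
opcontrol --reset          # discard samples from earlier runs
opcontrol --start
./myapp                    # hypothetical workload to be profiled
opcontrol --dump           # flush samples to disk
opcontrol --stop
opreport -l ./myapp        # per-symbol breakdown for the binary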

2.5 Debug Libraries and Symbols on Linux

Trace a Command's Library Calls:
ltrace /usr/bin/who
Trace a Command's Library Calls and Log to File:
ltrace -o ls.tr ls
Trace Library Calls and System Calls and Log to File:
ltrace -S -o ls.tr ls
Check Linked Libraries:
ldd filename
Check Module Info:
modinfo module_name.ko
The names of the files containing the object code and symbols for libraries are recorded in the ELF file. To read an ELF file:
readelf -a program_of_interest | less
Disassemble a Program:
objdump -D -S <compiled_object_with_debug_symbols> > filename.out
objdump -d -S module_name.ko > /tmp/whatever
List Symbols:
nm /usr/bin/who
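A hedged sketch of turning a raw crash address into a symbol with the binutils tools above (the binary name and address are hypothetical, and the binary must contain debug info for addr2line to resolve file and line):

addr2line -f -e ./myapp 0x4005d6            # function and source line at the address
nm -C ./myapp | less                        # cross-check in the symbol table
objdump -d ./myapp | grep -A 20 '4005d6:'   # disassembly around the address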

2.6 Other Debugging Tools on Linux

gcore: Take a snapshot of a process:
gcore -o output_filename pid
kill: Kill a process and generate its core dump:
kill -SEGV <pid>
lsof: Get Files Opened by the Specified Process/Command:
lsof -p 28290
lsof -a -p 28290
Check How Many Instances of sendmail are Open:
lsof -c sendmail
File Descriptor Count:
ps -ef
cd /proc/28290/fd
ls -l | wc -l
Get Files Opened by the Specified User:
lsof -u root
Check Open Files on the Specified File System and the Processes that Use Them:
lsof /fs
List All Open Files for the User abe and for the Specified Process IDs:
lsof -p 456,123,789 -u 1234,abe
Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:
lsof -b /nfs/mount/point
Send a SIGHUP signal to All of the Processes that have /u/abe/bar Open:
kill -HUP `lsof -t /u/abe/bar`
Ignore the Device Cache File:
lsof -Di
Get PID, command name, file descriptor, file device number, and file inode number for each file of each process (field output):
lsof -FpcfDi
List the files at descriptors 1 and 3 of every process running the lsof command for login ID "abe" every 10 seconds:
lsof -c lsof -a -d 1 -d 3 -u abe -r10
List All Files using Any Protocol on Any Port of mace.cc.org:
lsof -i @mace
List All Files using Any Protocol on the Specified Port Range of mace.cc.org:
lsof -i @mace:123-140
List All IPv4 Network Files in Use whose PID is 1234:
lsof -i 4 -a -p 1234
Files Opened by a Process:
ps -ef
cd /proc/28290/fd
ls -lrt
Process Info:
cd /proc/28290
ls -l
more status
more limits
more io
more mounts
more mountstats
fuser: Get Process IDs and Login Names that have the /etc/passwd File Open:
fuser -u /etc/passwd
Get Verbose Info Including Process IDs and Login Names that have the /etc/passwd File Open:
fuser -vu /etc/passwd
Kill Processes Accessing the /var File System in Any Way:
fuser -km /var
Get Processes Running on the / File System and Print the Process Names and Arguments:
ps -o pid,args -p "$(fuser / 2>/dev/null)"
If there is No Process on the Specified Device, Execute the Given Command:
if fuser -s /dev/ttyS1; then :; else something; fi
Show All Processes at the Local Telnet Port:
fuser telnet/tcp

3.Debug Process and Analyze Process Core on HP-UX


3.1 Debug Processes by using tusc

Trace a Process's System Calls:
tusc pid
Trace a Process's System Calls and Count them:
tusc -c pid
tusc -cc pid
tusc -ccc pid
Trace a Process's System Calls and Count them, adding more Information:
tusc -C pid
tusc -cC pid
Trace init's System Calls, Count them adding more Information, and Print Process Names:
tusc -cCn 1
Trace Verbosely a Process's System Calls and Follow Forks:
tusc -vf pid
Trace a Process's System Calls, Follow Forks and Print Process Names:
tusc -fn pid
Trace Verbosely a Process's System Calls, Follow Forks and Print Process IDs:
tusc -vfp pid
Trace bdf / System Calls and its Forks, Count them adding more Information, and Print Process Names:
tusc -fcCn bdf /
Trace Verbosely bdf / System Calls and its Forks, and Print Process Names and a Timestamp for Each Syscall and Signal:
tusc -vfnT bdf /
Trace bdf / System Calls and its Forks, Count them adding more Information, and Print Process Names and Execution Time:
tusc -fcCnD bdf /
Trace bdf / System Calls and its Forks, and Print Process Names, Duration Time and a Timestamp for Each Syscall and Signal:
tusc -fnDT bdf /
Trace Verbosely bdf / System Calls and its Forks, and Print Process Names, Duration Time and a Timestamp for Each Syscall and Signal:
tusc -vfnDT bdf /
Trace bdf / System Calls and its Forks, Count them adding more Information, and Print Process Names, Execution Time and a Timestamp for Each Syscall and Signal:
tusc -fcCnDT bdf /
Trace a Process's System Calls, Follow Forks and Keep Tracing Children even if the Parent Exits:
tusc -fk pid
Trace a Process's System Calls, Printing Process Names and a Timestamp for Each Syscall and Signal, and Detach if the Process Enters Traced Mode:
tusc -tnT 455
Trace a Process's System Calls, Concentrating on exec() Functions:
tusc -sexec pid 2> /dev/null &
Trace a Process's System Calls and File Descriptors and Log to File (lsnrctl.truss):
tusc -d -o lsnrctl.truss -p 3949
Trace a Process's System Calls and the Specified File Descriptors and Log to File (lsnrctl.truss):
tusc -dFileDescriptors -o lsnrctl.truss -p 3949
Trace Verbosely All of the System Calls of the ps -ef command:
tusc -v -o /var/tmp/syslog.truss.out -sall -p ps -ef
Trace a Process's System Calls, and Print Read Buffers for All File Descriptors:
tusc -rall <PID>
Trace a Process's System Calls and its Forks, and Print the Read Buffers for the Specified File Descriptors:
/usr/local/bin/tusc -f -r 3,4,5,6 -o /tmp/trace_results /usr/local/sbin/snmpd
Trace a Process's System Calls and its Forks, and Print Read and Write Buffers for the given File Descriptors, but don't show Sleeping Syscalls:
tusc -rall -wall -f <PID>
Trace a Process's System Calls and its Forks, and Print Read and Write Buffers for the given File Descriptors:
tusc -rall -wall -f -i <PID>
Trace a Process's System Calls and its Forks, and Print Execution Time:
tusc -f -D /usr/local/sbin/snmpd
Trace a Process's System Calls and its Forks, and Print exec Arguments and Execution Time:
tusc -f -a -D /usr/local/sbin/snmpd
Trace a Process's System Calls and Execution Time and Count them:
tusc -c sqlplus "/ as sysdba" << EOF
exit;
EOF
Trace a Process's System Calls and Execution Time:
tusc -d sqlplus "/ as sysdba" << EOF
exit;
EOF
Trace Specific System Calls:
tusc -s syscall_name 455
Trace Specific Signals:
tusc -S signal_name 455
Execute syslog-ng, follow children, print timestamps and Send output to /tmp/tusc.out:
tusc -faepo /tmp/tusc.out -v -T %H:%M:%S /opt/syslog-ng/sbin/syslog-ng
Execute sqlplus, follow children, print timestamps and Send output to /tmp/tusc.out:
tusc -faepo /tmp/tusc.out -v -T %H:%M:%S sqlplus scott/tiger
Execute sqlplus, follow children and Send output to /tmp/tusc.out:
tusc -faepo /tmp/tusc.out -v sqlplus scott/tiger
Attach to a running process and Send output to /tmp/tusc.out:
tusc -faepo /tmp/tusc.out -v -T %H:%M:%S -p <pid>
tusc -faepo /tmp/tusc.out -v -p <pid>
tusc -faepo /tmp/tusc.out -p <pid>
Unless advised otherwise, the minimum options used should be:
tusc -faepo <output file> ....
Trace Verbosely System Calls and Forks; Print Environment Variables, Process Names, PIDs, Timestamps and Duration Time:
tusc -e -n -p -T '%T' -D -f -v pid
Watch Log Files to Detect System Problems:
tail -f /var/adm/SYSLOG
tail -f /var/adm/messages
tail -f /var/log/syslog
/usr/local/bin/sstep ls
Find the PIDs of the Processes to Trace:
function get_pid {
  (echo foo 0 ${1}; ps -ef) | grep ${1} | grep -v "grep *${1}" | tail -1 | awk '{if ($2 > 0) {print $2} else {print ""}}'
}
/opt/tusc/bin/tusc -o /tmp/tusc.log -v -r all -w all -p -T "%d.%m.%Y %H:%M:%S" `get_pid WorkManager`
OR with Multiple get_pid calls:
/opt/tusc/bin/tusc -o /tmp/tusc.log -v -r all -w all -p -T "%d.%m.%Y %H:%M:%S" `get_pid WorkManager` `get_pid SolidDesigner` `get_pid MEls`
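To make the recommended minimum invocation concrete, a small hedged example tracing one short-lived command (output path arbitrary; per the option descriptions above, -f follows forks, -a prints exec arguments, -e prints environment variables, -p prints PIDs, -o logs to a file):

tusc -faepo /tmp/tusc_bdf.out bdf /
more /tmp/tusc_bdf.out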

3.2 Installing tusc on HP-UX 11.xx

Download the tusc package for your HP-UX version and architecture from the following address: http://hpux.connect.org.uk/hppd/cgi-bin/search?term=tusc&Search=Search
Create a temporary directory and upload the depot onto it:
mkdir /tmp/tempo_inst
Access the temporary directory and gunzip the depot:
cd /tmp/tempo_inst
gunzip tusc-x.x-xxxx-11.xx.depot.gz
ls -l
Install the depot package by using one of the following methods:
a) By using swinstall (recommended):
swinstall -s tusc-x.x-xxxx-11.xx.depot

OR
b) By manually extracting the tarball and copying the contents to the appropriate directories:
tar xf tusc-x.x-xxxx-11.xx.depot
Access the bin subdir of the depot directory and copy its contents to the /bin directory:
cd tusc/tusc-RUN/usr/local/bin/
cp * /bin/
Access the man subdir of the depot directory and copy its contents to the /usr/local/man/man1 directory:
cd ../man/
cp man1/tusc.1 /usr/local/man/man1/

3.3 Debug Processes and Core Files by using HP WDB / GDB


The HP Wildebeest Debugger (WDB) is an HP-supported implementation of the open-source GNU debugger (GDB). HP WDB / GDB can be used to debug or monitor a process, but it is mostly used to analyze crashed processes' core files and system crash dumps.
Check if HP WDB is installed:
swlist -l fileset | grep -i wdb
If HP WDB is not installed, you can download the latest version (6.3) for your HP-UX version and architecture; you need an HP AllianceONE account with appropriate privileges.
Upload the depot file onto the server's /tmp directory, access the directory and decompress it:
cd /tmp
gunzip hpwdb.xxxx.xxxx.depot.gz
Install the depot:
swinstall -s hpwdb.xxxx.xxxx.depot/*
The main paths are:
/opt/langtools/wdb
/opt/langtools/gdb
/opt/langtools/bin
To monitor/debug a process:
gdb crashdebug pid
Before analyzing a process core file, check it:
file corefile_name
strings corefile_name
Check if it's truncated:
elfdump -o -S core
To analyze a process core file generated by the snmpd daemon:
gdb /usr/bin/snmpd core
OR:
gdb /usr/bin/snmpd -c core
OR:
gdb -c core
OR:
gdb /usr/bin/snmpd
OR to start the HP WDB GUI:
wdb /usr/bin/snmpd
OR to start gdb in XDB compatibility mode:
gdb -xdb /usr/bin/snmpd
OR to start gdb in XDB compatibility mode with the Terminal User Interface:
gdb -xdb -tui /usr/bin/snmpd
OR:
gdb
At the gdb prompt:
(gdb) core core
If the executable path is not provided, the debugger uses the invocation path of the process that generated the core file, which is stored in the core file itself. If the invocation path is relative, you must specify the executable while debugging the core file.
To start debugging the process, at the gdb prompt, load the core file:
(gdb) core core
View the stack trace:
(gdb) backtrace
List all threads of the process at the time of the crash:
(gdb) info threads
Switch to the specified thread:
(gdb) thread thread_id
Disassemble a specified section of memory:
(gdb) disassemble <address>
Display the memory at a specified address as a string:
(gdb) x/s <address>
Display the contents of the registers:
(gdb) info registers
Display all registers, including floating-point registers:
(gdb) info all-registers
Display information about all of the shared libraries:
(gdb) info shared
Print the target currently under the debugger:
(gdb) info files
(gdb) info target
View the source code:
(gdb) list
Start the target program:
(gdb) run
Set a breakpoint on a function:
(gdb) break sum
On a line number:
(gdb) b 25
At an offset from the current line:
(gdb) b +9
(gdb) b -1
On a memory address (use *):
(gdb) b *0x2324
Set a watchpoint on a variable or expression:
(gdb) watch x
(gdb) watch *&x
(gdb) watch *(<type of x> *)<addr of x>
Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watch
Force a core dump and create a core image file for the process under the debugger:
(gdb) dumpcore core_filename
Pack the core file along with the relevant executable and libraries in a single tar file for core file debugging on another system:
(gdb) packcore
Unpack the tar file generated by the packcore command, so the debugger can use the executable and shared libraries from the bundle when debugging the core file on a different system from the one where it was created:
(gdb) unpackcore
To Debug Memory with gdb, set heap-checking options:
(gdb) set heap-check [option] [on/off]
Detect leaks:
(gdb) set heap-check leaks [on/off]
Detect double-frees and improper free arguments:
(gdb) set heap-check free [on/off]
Check for out-of-bounds corruption:
(gdb) set heap-check bounds [on/off]
Set the number of frames to be printed for leak and heap profiles:
(gdb) set heap-check frame-count [num]
Produce a heap allocations report:
(gdb) info heap [heap.out]
Produce a memory leak report:
(gdb) info leaks [leaks.out]
List the potential in-block corruptions in all the freed blocks:
(gdb) info corruption
Search for a Pattern in the Memory Address Space:
(gdb) find &str[0], &str[15], "string_to_search"
(gdb) find &a[0], &a[10], "el", 'l'
where:
&a[0] specifies the start address of the memory address range.
&a[10] specifies the end address of the memory address range.
"el", 'l' specifies the pattern.
(gdb) find /1 &int8_search_buf[0], +sizeof(int8_search_buf), 'a', 'a', 'a'
where:
/1 tells the find command to display only one matching pattern.
&int8_search_buf[0] specifies the starting address.
+sizeof(int8_search_buf) specifies the ending address.
'a', 'a', 'a' specifies the pattern (expr1, expr2, expr3).
(gdb) find /b &int8_search_buf[0], &int8_search_buf[0]+sizeof(int8_search_buf), 0x61, 0x61, 0x61, 0x61
where:
/b specifies that the size of the pattern is 8 bits.
&int8_search_buf[0] specifies the starting address.
&int8_search_buf[0]+sizeof(int8_search_buf) specifies the ending address.
0x61, 0x61, 0x61, 0x61 specifies the pattern (expr1, expr2, expr3, expr4).
Avoid Core File Corruption
To prevent overwriting of core files from different processes, set the kernel parameter core_addpid to 1; the core file is then stored in a file named <core.pid> in the current directory. To set the kernel parameter, create a script called corepid:
On HP-UX 11i v1 systems:

case $1 in
on)   echo "core_addpid/W 1\ncore_addpid?W 1" | adb -w -k /stand/vmunix /dev/kmem;;
off)  echo "core_addpid/W 0\ncore_addpid?W 0" | adb -w -k /stand/vmunix /dev/kmem;;
stat) echo "core_addpid/D\ncore_addpid?D" | adb -w -k /stand/vmunix /dev/kmem;;
*)    echo "usage $0: on|off|stat";;
esac
On HP-UX 11i v2 systems:

case $1 in
on)   echo "core_addpid/W 1\ncore_addpid?W 1" | adb -o -w /stand/vmunix /dev/kmem;;
off)  echo "core_addpid/W 0\ncore_addpid?W 0" | adb -o -w /stand/vmunix /dev/kmem;;
stat) echo "core_addpid/D\ncore_addpid?D" | adb -o -w /stand/vmunix /dev/kmem;;
*)    echo "usage $0: on|off|stat";;
esac
Then, get the current settings:
./corepid stat
To enable storing the core file in the file core.pid (set core_addpid to 1), run the script:
./corepid on
Get the current settings again to check the change:
./corepid stat
To disable the feature (set core_addpid to 0), run the script:
./corepid off
./corepid stat
On HP-UX 11i v3 systems, use coreadm, which lets you specify the location and name pattern for core files created by abnormally terminating processes, including a process-specific pattern for the core file name.
To set the global core file settings to include the process ID and the system name in the core file name, and to place the core file in the specified path <path>, run:
coreadm -e global -g <path>/core.%p.%n
Java Core File Debugging
HP WDB shows stack traces of mixed Java, C, and C++ programs for a Java core file. The GDB_JAVA_UNWINDLIB environment variable must be set to the path name of the Java unwind library. If the Java and system libraries used by the failed application reside in non-standard locations, the GDB_SHLIB_PATH environment variable must be set to specify the location of the libraries.
Invoke gdb on a core file generated when running a 32-bit Java application on an Integrity system with /opt/java1.4/bin/java:
gdb /opt/java1.4/bin/IA64N/java core.java

Invoke gdb on a core file generated when running a 64-bit Java application on an Integrity system with /opt/java1.4/bin/java -d64:
gdb /opt/java1.4/bin/IA64W/java core.java
Invoke gdb on a core file generated when running a 32-bit Java application on PA-RISC using /opt/java1.4/bin/java:
gdb /opt/java1.4/bin/PA_RISC2.0/java core.java
Invoke gdb on a core file generated when running a 64-bit Java application on PA-RISC using /opt/java1.4/bin/java:
gdb /opt/java1.4/bin/PA_RISC2.0W/java core.java
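A sketch of the packcore round trip described above, for moving a core to another machine for analysis (host and file names are examples; the exact unpackcore argument form may vary by WDB version):

# On the crashing system, inside gdb with the core loaded:
#   (gdb) packcore
# This writes a tar bundle (core + executable + shared libraries).
scp packcore.tar analysis-host:/var/tmp/    # copy the bundle over
# On the analysis system:
cd /var/tmp
gdb
# (gdb) unpackcore packcore.tar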

3.4 Debug Processes by using truss

Trace a Process's System Calls:
truss -p pid
Trace a Process's System Calls and Count them:
truss -c -p pid
Trace a Process's System Calls and Count them, adding more Information:
truss -C -p pid
Trace a Process's System Calls and Follow Forks:
truss -f -p pid
Trace a Process's System Calls, Concentrating on exec() Functions:
truss -sexec -p pid 2> /dev/null &
Trace a Process's System Calls and File Descriptors and Log to File (lsnrctl.truss):
truss -d -o lsnrctl.truss -p 3949
Trace All of the System Calls of the ps -ef command:
truss -o /var/tmp/syslog.truss.out -sall -p ps -ef
Trace a Process's System Calls, and Print Read Buffers for the given File Descriptors:
truss -rall -p <PID>
Trace a Process's System Calls and its Forks, and Print Read and Write Buffers for the given File Descriptors:
truss -rall -wall -f -p <PID>
Trace a Process's System Calls and its Forks, and Print Execution Time:
truss -f -D /usr/local/sbin/snmpd
Trace a Process's System Calls and its Forks, and Print exec Arguments and Execution Time:
truss -f -a -D /usr/local/sbin/snmpd
Trace a Process's System Calls and Execution Time and Count them:
truss -c sqlplus "/ as sysdba" << EOF
exit;
EOF
Trace a Process's System Calls and Execution Time:
truss -d sqlplus "/ as sysdba" << EOF
exit;
EOF
Verbosely Trace init:
truss -p -v all 1
Run truss on a Command:
truss -d date
Run truss to Debug Application Start:
truss -rall -wall lsnrctl start
truss -aef lsnrctl dbsnmp_start
nohup /opt/tusc/bin/truss -o /tmp/syslog-ng.truss -aef /usr/local/sbin/syslog-ng --debug --foreground --stderr > syslog-ng.out 2>&1 &
grep syslog-ng.conf /tmp/syslog-ng.truss

3.5 Analyze Process Performance by using Caliper on HP-UX 11.xx


HP Caliper is a general-purpose performance analysis tool for applications on HP-UX and Linux systems running on HP Integrity servers. If it is not installed, you can download the current Caliper version 5.5; you need an AllianceONE account with appropriate privileges.
Upload the depot file on the server, gunzip it and install it:
gunzip caliper.xx.xxxx.depot.gz
swinstall -s caliper.xx.xxxx.depot
You can use an initialization file (called .caliperinit) with Caliper, which it automatically reads at startup for data collection or data reporting runs. Putting the options in an initialization file simplifies the command line. This file is not required, but can be useful. If the --read-init-file option is set to True in the .caliperinit file, Caliper will use it. You can find a sample initialization file in the Caliper home, under the examples/startup_file/caliperinit directory; rename it to .caliperinit. Here is an example of the contents:
********************************************************************

# Options applied to all report types.
application = 'myapp'
arguments = '-myarg 2'
context_lines = 0,3
summary_cutoff = 1
detail_cutoff = 5
source_path_map = '/proj/src,/net/dogbert/proj/src:/home/wilson/work'
# Report-specific options.
if caliper_config_file == 'branch':
    sort_by = 'taken'
elif caliper_config_file == 'fprof':
    sort_by = 'fcount'
    report_details = 'statement'
    context_lines = 'all'
# Apply an option to a subset of reports.
if caliper_config_file in ("fcount"):
    module_exclude = '/usr/lib/'
********************************************************************
Caliper uses measurement configuration files that you can edit or create according to your needs; you can find them in /opt/caliper:
cd /opt/caliper
ls -lrt
The measurement configuration files provided with HP Caliper, and the main performance measurements they take, are the following:
alat: measures and reports sampled advanced load address table (ALAT) misses
branch: branch measurement
cgprof: measures and reports a call graph profile, produced by instrumenting the application code
cpu: CPU measurement and per-process metrics
cstack: call stack measurement
cycles: cycles measurement
dcache: data cache measurement
ecount: total CPU event counts measurement
fcount: function call counts measurement
fprof: function profile measurement
icache: instruction cache metrics measurement
scgprof: measures and reports an (inexact) call graph profile, produced by sampling the PMU to determine function calls
traps: collects and reports a profile of traps, interrupts, and faults
fprof (flat profile) shows the parts of the process that have the highest CPU usage:
caliper fprof ./binary_name
Show the parts of the process that have the highest CPU usage, reporting both source and instructions (-r all) and logging the output to a file:
caliper fprof -o out.txt -r all
Run the default measurement, scgprof:
caliper ktrace
Run the function call count measurement:
caliper fcount ktrace

CPU measurement for the specified application or process:
caliper cpu my_new_app
System-wide CPU measurement (log output to file):
caliper cpu -w -e 120 -o cpu.txt
Measure CPU and Memory for the specified process and report:
caliper cpu -o REPORT --memory-usage=all my_app
Measure CPU and system usage for the specified process and report:
caliper cpu -o REPORT --system-usage=all my_app
Create a call graph profile with HP Caliper:
caliper scgprof [caliper_options] program [program_arguments]
Create a report:
caliper report [options]
The overview measurement collects fprof, dcache, and cstack data in one single collection run:
caliper overview -o rpt my_app
Collect system-wide fprof and dcache data for a duration of 300 seconds:
caliper overview -w -e 300 -o rpt
Override the sampling_spec setting in pmu_trace:
caliper pmu_trace -s period,variation,cpu_event program
Override the events to be measured in ecount on HP-UX:
caliper ecount -m cpu_event,cpu_event program
Override the kernel stop functions and get all frames in the cstack on HP-UX:
caliper cstack --stop_functions = "" program
Create a call stack profile report in the file named results.save when profiling the program enh_thr_mutex1:
/opt/caliper/bin/caliper cstack -o results.save enh_thr_mutex1
To stop Caliper:
kill -s INT caliper_process_ID

3.6 Analyze Process Performance by using Prospect on HP-UX 11.xx


Prospect is a performance analysis tool. On HP-UX, Prospect uses the Kernel Instrumentation (KI) tracing and Kernel Timing Clocks (KTC) package. Prospect collects data from the kernel that covers only a "window of time". Prospect is available on HP-UX (PA-RISC 64-bit kernel). If it is not installed, you can download the current Prospect version 2.6.1; you need an AllianceONE account with appropriate privileges.

Upload the depot file on the server, gunzip it and install it:
gunzip prospect.xx.xxxx.depot.gz
swinstall -s prospect.xx.xxxx.depot\*
You can use Prospect to profile Java applications on HP-UX. Prospect has additional features for profiling Java applications when running an HP JVM on HP-UX; to activate these features you must install HP Hotspot JVM version 1.3.1.02 or later.
Obtain symbols of JVM-compiled methods in Prospect's output:
prospect -V3 -foutput java -XX:+Prospect Qsort 1000000
Profile a JVM process with the specified PID:
prospect -j1495 -V4 -foutput2 sleep 20
Profile the specified process or application:
prospect my_app
Verbosely profile the specified process or application:
prospect -v my_proc
You can use Prospect as a statistical profiler to extract function- or assembly-level profiles and exact system call timings for processes of interest. To use Prospect in this mode, you first need to activate KI and keep it active. This is done via the daemon mode of Prospect:
prospect -P
This mode does not consume any processor resources; it is used only as a way to keep the KI trace active. Prospect collects data over an interval in time.
Use KI and distill the output for the immediate child of Prospect (in this case, my_app), and output the summary, memory maps, profiles, and system call tables into a file called "output":
prospect -V 2 -f output my_app
Output only information about the direct descendant child:
prospect -V2 -f output1 my_app
Record all traces sampled in the time my_app ran into a binary file called "Tfile1":
prospect -T Tfile1 my_app
Read the trace out of a file:
prospect -t Tfile1 -f output42
Sample the kernel for 120 seconds and output the results to a file called "output":
prospect -V k -f output sleep 120
See how the kernel is performing while a specific application is running, and also how that application is performing; put the kernel profile in the file "kern_output" and your user process profile (my_app) in the file "proc_output":
prospect -TTfile my_app
prospect -tTfile -Vk -fkern_output
prospect -tTfile -V2 -fproc_output

Start the program to be profiled under prospect hprof (hierarchical profile), generate a user-time profile of a gzip run and save the output to a file:
prospect --hprof --output-file=hprof.out gzip firebolt.tar
The same, with a sampling interval of 100 ms:
prospect --hprof --sampling-interval=100 gzip firebolt.tar
Generate HP Caliper-like fprof reports:
prospect --fprof --output-file fp.out ./qsort32
Attach to a running process specified by process ID:
prospect --fprof --output-file fp.out --attach=1234
Create a binary trace file:
prospect --fprof --datafile=fp.cdf ./qsort32
Generate an fprof report from the binary trace file:
prospect --report --datafile=fp.cdf -o fp.out
Profile for a particular duration of time:
prospect --fprof -o fp.out --duration=5 ./loop 10000000
Specify the function summary cutoff:
prospect --fprof --summary-cutoff=,80 ./wordplay
Specify the function details cutoff:
prospect --fprof ./qsort32 (collect mode)
prospect --report --detail-cutoff=,80 ./qsort32 (report mode)
Generate a single report for a multithreaded application with the results of all threads aggregated together:
prospect --fprof --thread=sum-all ./threadsthread
Report per-thread data for a multithreaded application:
prospect --fprof --thread=all -o fp.out ./threadsthread
Report per-module data for a multithreaded application:
prospect --fprof --per-module-data=TRUE --thread=all ./threadsthread
Exclude load modules:
prospect --fprof --thread=all --module-exclude=/usr/lib ./threadsthread
Include load modules:
prospect --fprof --thread=all --module-default=none --module-include=threadsthread ./threadsthread
Collect profile data until the specified processes terminate:
prospect -V6 pid1,pid2,pid3 -f log
Collect profile data for a specified duration of time:
prospect -V6 pid1,pid2,pid3 -f log sleep <duration>

Get a raw ASCII file of KI trace records:
prospect -T BinTraceFile sleep 30
prospect -t BinTraceFile -F AsciiTraceFile
Prospect KI kernel buffer freeing:
kill <prospect -P daemon>
prospect -a
Prospect KI buffer sizing:
kill <prospect -P daemon>
prospect -a
prospect -A 4194304
prospect -P
Find out how much lockable memory your system has:
dmesg | grep lockable
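Putting the daemon mode together with trace recording (a sketch built only from the commands above; my_app, Tfile1 and output are placeholders):
prospect -P                  # daemon mode: keeps the KI trace active, no CPU cost
prospect -T Tfile1 my_app    # record a binary KI trace while my_app runs
prospect -t Tfile1 -f output # distill the recorded trace into a report file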

3.7 Live Memory Analysis on HP-UX 11.xx by using KWDB


KWDB can analyze a live system to find memory leaks, performance issues and more.
Find the pathname of the currently running vmunix:
kmpath
KWDB on PA requires the kernel file to be preprocessed by pxdb (change the kernel filename if it is not the standard /stand/vmunix):
pxdb /stand/vmunix
Start KWDB with Q4 support to debug the kernel file, and set up the devmem target to read from /dev/mem and /dev/kmem:
kwdb -q4 /stand/vmunix /dev/kmem
OR:
kwdb /stand/vmunix
(kwdb) target devmem
(kwdb) set kwdb q4 on
OR you can also run:
q4 /stand/vmunix /dev/kmem
At the q4 Prompt:
q4> load struct utsname from &utsname
q4> print -tx
Get a listing of all the structures and typedefs that contain the string of characters "callout":
q4> cat callout
Get a listing of all the fields defined in a callout structure:
q4> fields -cx struct callout

Load all the callout structures from the callout table:
q4> load struct callout from callout max ncallout
List all the different flag fields in these structures:
q4> print c_flag | sort -u
Keep only those callout structures with the PENDING_CALLOUT flag set:
q4> keep c_flag & PENDING_CALLOUT
List all the different function addresses pointed to by these structures:
q4> print -x var.real_callout.cc_func | sort -u
Get the names of the kernel routines found in the previous step:
q4> examine 0x191a08 using a
q4> ex 0x19e3e8 using a
q4> ex 0x8c230 using a
Display the instructions of the functions:
q4> conde unselect
Look into the near-term, mid-term and far-future events. Load the near-term callout headers and list the different types from the flag fields:
q4> load struct callout from callout_time_nr max ncallout until callout_time_md
q4> print c_flag | sort -u
List the absolute time fields for the headers:
q4> print indexof c_abs_time_hi c_abs_time_lo
Load the mid-term callout headers and print the absolute time fields for the headers:
q4> load struct callout from callout_time_md max 256
Load the callout header for far-future events (there is only a single header for all far-future events) and display its contents:
q4> load struct callout from callout_time_ff
q4> print -tx
Load the linked list of structures associated with this and print the types and absolute times for each of them:
q4> load struct callout from c_time_next max ncallout next c_time_next
q4> print c_abs_time_hi c_abs_time_lo c_flag
Load the hash headers and display flags, times and links:
q4> load struct callout from callout_hash max 256
q4> print -x flag c_abs_time_lo c_time_next c_hash_next
Load two of the expired headers and display all the fields:
q4> load struct callout from callout_hash skip 256 max 2
q4> print -tx

3.8 Other Debugging Tools on HP-UX 11.xx

gcore:
Take a snapshot of a process:
gcore -o output_filename pid
kill:
Kill a process and generate its core dump:
kill -SEGV <pid>
lsof:
Check Open Files on the specified File System and the Processes that use it:
lsof /fs
Check How Many Instances of sendmail are Open:
lsof -c sendmail
Check iNodes Usage on the specified File System:
lsof i /fs
Check Files Opened by the Specified User:
lsof -u user_name
List All Open Files for the User abe and for the Specified Process IDs:
lsof -p 456,123,789 -u 1234,abe
Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:
lsof -b /nfs/mount/point
Send a SIGHUP Signal to All of the Processes that have /u/abe/bar Open:
kill -HUP `lsof -t /u/abe/bar`
Ignore the Device Cache File:
lsof -Di
Get PID and command name field output for each process, and file descriptor, file device number, and file inode number for each file of each process:
lsof -FpcfDi
List the files at descriptors 1 and 3 of every process running the lsof command for login ID "abe" every 10 seconds:
lsof -c lsof -a -d 1 -d 3 -u abe -r10
List All Files using Any Protocol on Any Port of mace.cc.org:
lsof -i @mace
List All Files using Any Protocol on the Specified Port Range of mace.cc.org:
lsof -i @mace:123-140
List All IPv4 Network Files in Use whose PID is 1234:
lsof -i 4 -a -p 1234
fuser:
Get the Processes and related Usernames Running on the /var File System:
fuser -uc /var
Get the Process IDs and Login Names that have the /etc/passwd File Open:

fuser -u /etc/passwd
Get the Processes Running on the Specified Device:
fuser -xc /dev/hd3
Kill the Processes Running on the /var File System:
fuser -ku /var
Get the Processes Running on the / File System and Print the Process Names and Arguments:
ps -o pid,args -p "$(fuser / 2>/dev/null)"
A Debugging Example (inspecting the midaemon process):
type midaemon
file `which midaemon`
what `which midaemon`
ldd `which midaemon`
grep -i midaemon /etc/*
grep -i midaemon /etc/init.d/*
swlist -l file | grep midaemon
lsof -c midaemon
ps -elf | sed -n '1p; /midaem[.]*on/p;'
lsof | sed -n '1p; / 17949 /p'
lsof | sed -n '1p; / 17923 /p'
tusc 2198
strings `which midaemon` | head -n 7
tail -n 30 /var/opt/perf/status.mi
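As a short combined example (a sketch; /var is just a sample mount point), fuser can identify and then clear the processes that keep a filesystem busy before maintenance:
fuser -cu /var    # list the PIDs and owning users holding /var
fuser -ku /var    # kill those processes (destructive; use with care)
umount /var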

4.Debug Process and Analyze Process Core on IBM AIX


4.1 Debug Processes by using proctools

Proctools are similar to the Solaris ptools: see the Solaris section about ptools.
Get a Process Stack Trace:
procstack
Print the Pending and Held Signals for a Process:
procflags
Display the Signal Actions and Handlers for a Process:
procsig
Report stat and fcntl Info for All Open Files in Each Process:
procfiles -n pid
Print the Current Working Directory of the Process:
procwdx
Display the Process Tree:
proctree

4.2 Debug Processes by using trace

The IBM AIX trace tool is conceptually similar to Linux strace.
Use trace Interactively:
trace
> !anycmd
> q
Start trace Asynchronously:
trace -a; anycmd; trcstop
Trace the System for 10 Seconds:
trace -a; sleep 10; trcstop
Output the Tracing Data to a Specified Log File (Instead of the Default /var/adm/ras/trcfile):
trace -a -o /tmp/my_trace_log; anycmd; trcstop
Trace the Process "mydaemon" which is Currently Running:
trace -A mydaemon-process-id -Pp
Trace a "cp" Command, Excluding Specific Events - in this case, the lockl and unlockl functions (20e and 20f events):
trace -a -k "20e,20f" -x "cp /bin/track /tmp/junk"
Trace a "cp" Command, Excluding the Same Events (20e and 20f), and Produce a Raw Trace Output File:
trace -a -k "20e,20f" -o trc_raw ; cp ../bin/track /tmp/junk ; trcstop
Trace the Hook 234 and the Hooks that will Allow Seeing the Process Names (in this case trace the event-group tidhk plus hook 234):
trace -a -j 234 -J tidhk
Trace Using One Set of Buffers per Processor; the Command will Produce the Files /var/adm/ras/trcfile, /var/adm/ras/trcfile-0, /var/adm/ras/trcfile-1, etc. up to /var/adm/ras/trcfile-(n-1), where n is the number of processors in the system:
trace -aC all
Trace a Program that Starts a Daemon Process and Continue Tracing the Daemon after the Program Exits:
trace -X "mydaemon"
Capture PURR, PMC1 and PMC2:
trace -ar "PURR PMC1 PMC2"
Format a trace Raw Output as a Report:
trcrpt -O "exec=on,pid=on" trc_raw > cp.rpt
Format a trace Raw Output as a Report, Excluding the VMM Activity Detail:
trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2

Format a trace Output which Consists of Multiple Files:
trcrpt -C all -r trace.out > trace.tr
Reading a trace Report:
trace -a -k 20e,20f -o trc_raw
Filter the trace Report Searching for the Event ID of the open() System Call:
trcrpt -j | grep -i open
Filter the trace Report by Checking the Event ID 15b:
trcrpt -d 15b -O "exec=on" trc_raw
Filter the trace Report to Display Only the open() Subroutines:
trcrpt -d 15b -p cp -O "exec=on" trc_raw
To Format a trace Output from One System as a Report on Another System, run:
trcnm > trace.nm
OR Copy Also the /etc/trcfmt of the Traced System (as the Other System could have Different trace Format Stanzas):
trcrpt -n trace.nm -t trcfmt_file -o newfile
And then Run trcrpt on the Other System:
trcrpt -n trace.nm -o newfile
Generate a CPU Report from a trace:
curt -i trace.r -o outputfile
curt -i trace.raw -m trace.nm -o outputfile
curt -e -i trace.r -m trace.nm -n gensyms.out -o curt.out
curt -s -i trace.r -m trace.nm -n gensyms.out -o curt.out
cat curt.out
trace -n -C all -d -j 100,101,102,103,104,106,10C,134,139,200,215,419,465,47F,488,489,48A,48D,492,605,609 -L 1000000 -T 1000000 -afo trace.raw
curt -i trace.raw -n gensyms.out -o curt.out
cat curt.out
Generate the Input File for curt:
HOOKS="100,101,102,103,104,106,10C,119,134,135,139,200,210,215,38F,419,465,47F,488,489,48A,48D,492,605,609"
SIZE="1000000"
export HOOKS SIZE
trace -n -C all -d -j $HOOKS -L $SIZE -T $SIZE -afo trace.raw
export LIBPATH=/usr/ccs/lib/perf:$LIBPATH
trcon ; pthread.app ; trcstop
unset HOOKS SIZE
ls trace.raw*
trace.raw trace.raw-0 trace.raw-1 trace.raw-2 trace.raw-3
trcrpt -C all -r trace.raw > trace.r
rm trace.raw*
ls trace*
trace.r
gensyms > gensyms.out
trcnm > trace.nm
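Putting the pieces together (a sketch; the cp command is just a sample workload, and the file names are placeholders), a minimal trace-and-report session looks like:
trace -a -o /tmp/my_trace_log; cp /etc/hosts /tmp/; trcstop
trcrpt -O "exec=on,pid=on" /tmp/my_trace_log > /tmp/cp.rpt
more /tmp/cp.rpt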

4.3 Debug Processes by using syscalls

Warning: the system crashes if ipcrm -M sharedmemid is run after syscalls has been run. Run stem -shmkill instead of ipcrm -M to remove the shared memory segment.
Display the System Call Counts:
syscalls -start
syscalls -c
Collect the System Calls for a Program:
syscalls -x /bin/ps
Trace a Process and Log to File:
syscalls -o filename -p pid -start
Simulate the C Code Fragment:
output=open("x", 401, 0755);
write(output, "hello", strlen("hello"));
by Running:
syscall open x 401 0755 \; write \$0 hello \#hello

4.4 Debug Processes by using watch

Watch All Files Opened by the "bar" Command:
watch -e FILE_Open /usr/lpp/foo/bar -x
Watch All Files Opened by the "bar" Command and Log to File:
watch -e FILE_Open /usr/lpp/foo/bar -x -o output_file
Watch the Installation of the Specified Program:
watch /usr/sbin/installp xyzproduct

4.5 Debug Processes by using ProbeVue

Start ProbeVue with a Script:
probevue myscript.e
probevue <myscript.e
Run ProbeVue on a Program:
probevue -X progname -A prog-arguments myscript
Format the ProbeVue Output as a CSV File:
probevue -X /usr/bin/tar -A "-cf /dev/null /scratch/bcobb/probevue" ./p2.e | tee t.csv

Example of a ProbeVue Script to Monitor a Program:
#!/usr/bin/probevue
double engine(int p1, int p2);
@@uft:$1:*:engine:entry
{
        printf("PID=%d TID=%d PPID=%d PGID=%d UID=%d GID=%d InKernel=%d\n", __pid, __tid, __ppid, __pgid, __uid, __euid, __kernelmode);
        printf("ProgName=%s errno=%d\n", __pname, __errno);
        printf("---\n");
        stktrace(GET_USER_TRACE,20);
        printf("+++\n");
        stktrace(PRINT_SYMBOLS|GET_USER_TRACE,20);
        exit;
}
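Assuming the script above is saved as monitor.e, it could be invoked with the target process ID passed as the first script argument, which replaces $1 in the uft probe specification (the PID below is a placeholder):
probevue ./monitor.e 204802    # 204802 = PID of the process exporting engine()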

4.6 Debug Processes by using truss

See the Solaris section about truss.
truss -deaf -o truss.out program

4.7 Debug Processes by using dbx

See the Solaris and Linux sections about dbx.
dbx exe core

4.8 Analyze a Process Core by using KDB

Start Analyzing a Core:
kdb dump
At the kdb Prompt, Display the Status:
>stat
Initial CPU Context:
>cpu 1
VMM Error Log:
>vmlog
Process Info:
>proc *
Get the Threads:
>thread *
Display the Process Table Entry for Slot 3:
>p 3

4.9 Other Debugging Tools on IBM AIX

gcore:
Take a snapshot of a process:
gcore -o output_filename pid
kill:
Kill a process and generate its core dump:
kill -SEGV <pid>
Other:
Get which Application Created the Core:
lquerypv -h core 500 64
List the Available Processors:
bindprocessor -q
Show if the 64-bit Kernel is Active:
bootinfo -K
Show whether the Hardware in Use is 32-bit or 64-bit:
bootinfo -y
Check the libraries loaded by the specified process:
ps -u sj1e652a | grep WILogin
procldd 21922
Dump a library looking for API-type exported symbols:
dump -Tv bin/orb/shlib/libit_art5_xlc50.so 2>&1
dump -Ctv E652/bin/WIReportServer
ps -u sj1e652a | grep WILogin
procldd 21922
dump -Tv bin/orb/shlib/libit_art5_xlc50.so 2>&1 | grep EXP | c++filt | more
truss -t!all -s!all -u libit_*::CORBA* -p 21922
dump -Ctv E652/bin/WIReportServer | grep FUNC.*GLOB.*9.*dgWICDZ_i
ps -u sj2e652s -o pid,args | grep WIReportServer
truss -t!all -s!all -u a.out::*dgWICDZ_* -p 18846 2>&1 | tee -a out.txt
cat out.txt | c++filt
pldd 18846
truss -t!all -s!all -u libclntsh -p 18846 2>&1 | tee -a out.txt
dump -H
ldd
procfiles
lockstat -IWk example_tnf 24
InterProcess Communication Facilities:
ipcs
System Attributes (Entries Marked as "True" are Configurable):
lsattr -l sys0 -E
Change the High/Low Water Marks for Pending Write I/Os per File:
chdev -l sys0 -a maxpout=9 -a minpout=6
Process Profiling:
pprof
Paging Space Statistics:
pstat -s
System Variables:
pstat -T
Paging Statistics:
lsps -a
Display the Path Name from an iNode Number:
ncheck -i <inode>
List Files and grep for the iNode:
ls -ail | grep <inode>
Report the Placement of File Blocks:
fileplace -pv /unix
Monitor Activity at All FileSystem Levels and Write the Results to /tmp/filemon.log:
filemon -o /tmp/filemon.log -O all
trcstop
CPU Profile:
tprof
CPU Usage Statistics:
netpmon -o /tmp/netpmon.log -O all
trcstop
Other tools: dkvis, nfsvis, systat, mpvis, dkstat

5.Debug Process and Analyze Process Core on IRIX


gcore:
Take a snapshot of a process:
gcore -o output_filename pid
kill:
Kill a process and generate its core dump:
kill -SEGV <pid>

Other tools:
par
prfstat
SystemTap
lockstat -IWk example_tnf 24

6.Debug Process and Analyze Process Core on Tru64


gcore:
Take a snapshot of a process:
gcore -o output_filename pid
kill:
Kill a process and generate its core dump:
kill -SEGV <pid>
Other tools:
trace
truss
atom -tool ptrace
odump -Dl
ldd
lockstat -IWk example_tnf 24
lockinfo

7.Generate / Analyze a Crash Dump on Solaris


7.1 Save a Crash Dump on a Panicked System

Check if savecore is Enabled:
/etc/init.d/sysetup
Get the Core Dump (or Crash Dump) Configuration:
coreadm
Save a Crash Dump of the Running Solaris System (without actually rebooting or altering the system):
savecore -Lv
OR:
savecore -d
Save a Crash Dump (Rebooting the System):
reboot -d
OR:
uadmin 5 0

OR generate a system panic:
adb -k -w /dev/ksyms /dev/mem
-> rootdir/W 0
-> ls /
If, after enabling core file generation, your system still does not create a core file, you may need to change the file-size limits set by your operating system:
ulimit -a
ulimit -c unlimited
ulimit -H -c unlimited
Check the Generated Crash Dump on Solaris:
ls -lrt /var/crash/sunbkl01
cd /var/crash/sunbkl01
pstack vmcore
file vmcore
strings vmcore
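For instance (a sketch; the savecore directory follows the local hostname by default, and dumpadm shows the configured location), a quick live capture and sanity check might be:
savecore -Lv                          # live dump, no reboot
ls -lrt /var/crash/`uname -n`         # check that unix.N / vmcore.N appeared
file /var/crash/`uname -n`/vmcore.0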

7.2 Setup a System to Save a Crash Dump

Disable / Enable the Saving of Crash Dumps:
dumpadm -n
dumpadm -y
Enable Compressed Crash Dumps (Default):
dumpadm -z on -y
Enable Uncompressed Crash Dumps (Uses Much More Space):
dumpadm -z off -y
Check the dumpadm Configuration:
more /etc/dumpadm.conf
dumpadm
Setup the System for a Full Crash Dump:
dumpadm -c all -d /dev/md/dsk/d201 -s /var/crash/vasdbs02
OR Setup the System for Dumping Kernel Memory Pages Only (it Saves Space and Time, but it's Less Accurate and Less Useful for Debugging a Problem):
dumpadm -c kernel -d /dev/md/dsk/d201 -s /var/crash/vasdbs02
OR Setup the System for Dumping the Kernel Memory Pages plus the Memory Pages of the Process whose Thread was Currently Executing on the CPU on which the Crash Dump was Initiated (If the Thread Executing on that CPU is a Kernel Thread Not Associated with any User Process, Only Kernel Pages will be Dumped):
dumpadm -c curproc -d /dev/md/dsk/d201 -s /var/crash/vasdbs02
more /etc/dumpadm.conf
Reconfigure the Dedicated Dump Device and the Directory in which Crash Dumps will be Saved:
dumpadm -d /dev/dsk/c0t2d0s2 -s /var/crash/server_name
OR (on a System using SVM):

dumpadm -d /dev/md/dsk/d201 -s /var/crash/vasdbs02
OR Reconfigure the Dedicated Dump Device on Swap:
dumpadm -d swap
Restart the dumpadm Service and Check:
svcadm restart svc:/system/dumpadm:default
svcs -a | grep -i dumpadm
To set up a method to automatically save crash dumps on older versions of the Solaris OS, or on servers where dumpadm is not installed, you can create a script /etc/init.d/sysetup with the following content:

if [ ! -d /var/crash/`uname -n` ]
then
        mkdir -p /var/crash/`uname -n`
fi
echo 'checking for crash dump...\c '
savecore /var/crash/`uname -n`
echo ' '
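To hook the script into the boot sequence (a sketch; the run level and the S20 sequence number are examples, adjust them to the local conventions):
chmod +x /etc/init.d/sysetup
ln -s /etc/init.d/sysetup /etc/rc2.d/S20sysetup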

7.3 Crash Dump Analysis on Solaris by using MDB

The Solaris Modular Debugger (mdb) is a powerful debugger that replaces the adb and crash utilities, which you can still find on Solaris systems beside mdb.
Access the crash dump directory and check the files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore
Invoke the mdb Debugger:
mdb -k unix.0 vmcore.0
OR:
mdb -k 0
At the mdb Prompt, Get the time of the crash:
*time-(*lbolt%0t100)=Y
::time/Y
Get the core information:
::coreinfo
Get crash information:
::system
::status
Display the panic string:
::panicinfo

Display the stack trace:
::stack
Display the message buffer (containing the panic string):
::msgbuf
Display the crash log:
::crashlog
Get CPU information at the time of the crash:
::cpuinfo -v
Get semaphore information at the time of the crash:
::ipcs
::dnlc
Display the thread list at the time of the crash:
::threadlist
::tlist killed
::tlist pctcpu
Show the kernel memory structures and the kernel memory log:
::kmastat
::kmalog
::ksemid
::kshmid
::kstat xck filename
::mdump/rd\* -P
::nvlist
::slist
Show memory information at the moment of the crash:
::memstat
::memerr
::meminfo tree process
::meminfo user command
::meminfo -m user
Show symbol and process information (including the process tree):
::nm
::symbols
::ps
::ps -z
::pgrep processname
::ptree
::proc
Show the open files at the moment of the crash:
::pfiles
Display the callouts and the memory walkers:
::callout
::walkers
Display the CPU cycle information:

::cycinfo -v
Display disk, slice, partition table, SVM and ZFS information:
::vfstab
::svm -i
::svm [-s <set>] [-d <devnum>]
::zfs
Get pool information:
::pool
Get NFS and shared filesystem information:
::nfs
::autofs
Get the file lists at the time of the crash:
::findfiles
Get cluster information:
::clust
Get zone information:
::zone
Get information about the previously selected structure:
::whatis P
Get network interface information at the time of the crash:
::ifconf
::netstat
Display memory dump information:
::pkma -fslL
::scatenv mdump_compression
Get the alternate CPU walk and follow it:
::scatenv alternate_cpu_walk
ffffffffaaaf8760::whatis
30018ca2d20::print -t kthread_t
2a101423cc0::findstack
300423b2000::cpuinfo -v
::walk thread
::walk thread | ::findstack
::walk cpu | ::print cpu_t cpu_thread | ::print kthread_t t_pri
0x3000b270078::print -t proc_t p_user.u_psargs
cpu0::print cpu_t cpu_disp | ::print disp_t
Disassemble the First or Second Address in pstack:
ff21fca4::dis
Second Address in pstack:
0003cb08::nmadd -f badfunc
Second Address in pstack and End Address in pstack:
0003cb08::nmadd -f -e 00020dc0 badfunc

0003cb08::dis
Get the register information:
::regs
Display memory leaks and walk the kernel memory log to find leaks:
::findleaks
::walk kmem_log | ::bufctl ! grep tleak
d4db0300::whatis
0x0000000010035a94::whatis -av
::walk kmem_log | ::bufctl -a d4db0300
d4db0300::kgrep | ::whatis -av
80506c0::nmadd -f -e 80506da badfunc
Quit the debugger:
$q
An alternate method to invoke the debugger is to pass echoed commands through a pipe:
echo "*panicstr/s" | mdb -k unix.0 vmcore.0
echo "*cmm_dbg_buf/s" | mdb -k unix.0 vmcore.0 > ./cmm_dbg_buf.out
echo "$<threadlist" | mdb -k unix.0 vmcore.0 > ./threadlist.out
OR:
fmdump -v
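Building on the echo-pipe method, a small loop (a sketch; the dcmd list is just a sensible first-look set) can batch-collect triage data from a dump:
for cmd in "::status" "::msgbuf" "::stack" "::cpuinfo -v" "::ps"
do
        echo "$cmd" | mdb -k unix.0 vmcore.0
done > triage.out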

7.4 Crash Dump Analysis on Solaris by using the Service Crash Analysis Tool (SCAT)


Download the Oracle Solaris Service Tool Bundle from the support.oracle.com web portal. Untar the Package, Access the Directory, Install the Service Tool Bundle and Choose the Components to Install (you must Select the Service Crash Analysis Tool):
./install_stb.sh
Execute the Service Crash Analysis Tool (scat):
cd /var/crash/cldbrm2a
/opt/SUNWscat/bin/scat --scat_explore -a -v unix.1 vmcore.1
OR:
/opt/SUNWscat/bin/scat --scat_explore -a -v 1
Access the Directory Created by scat and Analyze the Files:
cd $SCAT_EXPLORE_DATA_DIR
more panic.out
more panic_thread.out
more panic_buf.out
more analyze.out
more coreinfo.out
more cpu-L.out
more dev
An alternate method to use SCAT is to access its Prompt:

/opt/SUNWscat/bin/scat 0
Then, at the scat Prompt, analyze the crash dump:
SolarisCAT(vmcore.1/11X)> analyze
Get the thread list:
SolarisCAT(vmcore.1/11X)> threadlist
Get CPU information:
SolarisCAT(vmcore.1/11X)> cpuinfo -v
Get the kernel tunables:
SolarisCAT(vmcore.1/11X)> tunables
Get the dispatch queues:
SolarisCAT(vmcore.1/11X)> dispq
Get ZFS information:
SolarisCAT(vmcore.1/11X)> zfs e
Get ZFS ARC information:
SolarisCAT(vmcore.1/11X)> zfs arc
Run Sanity Checks:
scat --sanity_checks vmcore.0
scat can include an optional module to retrieve the type information from. List the Modules:
ctf
Dump the qlc logs (fp_logq or ssfcp_logq):
qlcfc fplog|ssfcplog
Simplify Decoding ddi_devid_t (impl_devid_t) Structures in the Kernel and Display the String Representation of the devid:
dev id
Display the Threads that have an Affinity set for a CPU (Specify <cpu> to Show only the Threads with Affinity for that <cpu>):
tlist affinity <cpu>

7.5 Crash Dump Analysis on Solaris by using ADB

Access the crash dump directory and check the files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore
Invoke the debugger:

adb -k unix.0 vmcore.0
OR:
adb -k 0
At the adb Prompt, Display the message buffer:
$<msgbuf
msgbuf+14/s
msgbuf+10/s
Get the core information:
$>coreinfo
Get crash information:
$>system
$>status
Display the panic string:
$>panicinfo
*panicstr/s
Show the crash log:
$>crashlog
Get the thread list:
$<threadlist
Check the status:
$>status
Get the system crash time:
$>time/Y
Get the boot time:
$>lbolt/X
Get server information:
$<utsname
$<hw_provider/s
$<architecture/s
$<srpc_domain/s
Display the stack trace:
$<stack
$<stacktrace
Display the stack calls:
$<stackcalls
Display the stack registers:
$<stackregs
Stack traceback:

<sp$<stacktrace
Check the root device:
rootfs$<bootobj
Check the swapfile device:
swapfile$<bootobj
dumpfile$<bootobj
Display the CPU cycle information:
$>cpuinfo -v
Get the CPUs:
$<cpus
Get the process on CPU:
$<proconcpu
Get the processes running at the moment of the crash:
$<proc
Get the modules:
$<modules
Show the open files at the moment of the crash:
::pfiles
Display the callouts and the memory walkers:
$>callout
$>walkers
Get the kernel memory structures:
$>kmastat
Show memory information at the moment of the crash:
$>memstat
$>memerr
$>meminfo tree process
Show the kernel memory segments:
$<seglist
Get ipc information:
ipcaccess/10i
Get the segment map:
$>segkmap/J
Show the kernel address space:
$>kas
Show the queues:
$<queue
Get the filesystem list:

$<vfslist
Quit the debugger:
$q
An alternate method to invoke the debugger is to pass echoed commands through a pipe:
echo 'msgbuf$<msgbuf' | adb -k unix.0 vmcore.0
echo 'msgbuf,100/s' | adb -k unix.0 vmcore.0
echo '$c' | adb -k unix.0 vmcore.0
echo "<fp$<stackcalls" | adb -k unix.0 vmcore.0
echo "<fp$<stack" | adb -k unix.0 vmcore.0
echo "<fp$<stackregs" | adb -k unix.0 vmcore.0
echo "<fp$<stacktrace" | adb -k unix.0 vmcore.0

7.6 Crash Dump Analysis on Solaris by using Crash


The crash tool is installed as part of the Solaris operating system; the binary is located in /usr/sbin.
Access the crash dump directory and check the files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore
Invoke the crash tool and output to a file:
crash -d vmcore.0 -n unix.0 -w /tmp/output_filename
Invoke the crash tool to use it from the prompt:
crash -d vmcore.0 -n unix.0
OR:
crash -d 0
At the crash Prompt, Get the core information:
>coreinfo
Get crash information:
>system
>status
Display the panic string:
>panicinfo
*panicstr/s
Show the crash log:
>crashlog

Show the processes running at the moment of the crash:
>proc
>p -e
>p -l
Get the thread list at the moment of the crash:
>threadlist
Check the status:
>status
Get CPU information at the moment of the crash:
>cpuinfo
>cpuinfo -v
Show the buffer:
>buf
Show the queues:
>queue
Get kernel memory structure information at the time of the crash:
>kmastat
Get memory information at the time of the crash:
>meminfo
>memerr
Quit the crash tool:
<CTRL><D>

7.7 Crash Dump Analysis on Solaris by using ACT

The ACT tool analyzes a system kernel dump and generates a human-readable text summary. It's shipped with all the Solaris installation media or with the Service Tool Bundle. To check if it is installed:
pkginfo | grep CTEact
Access the crash dump directory and check the files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore
To invoke ACT and output the core file analysis to separate files in /tmp/dir:
act -d /var/crash/hostname/vmcore.0 -s /tmp/dir/
OR to invoke ACT and output the core file analysis to the act_out file:
act -d /var/crash/hostname/vmcore.0 > /tmp/act_out

OR to invoke ACT on a live server and output to the screen:
act -l
When ACT is invoked to split the core file analysis into the specified directory, it creates the following files: biowait, getblk, modules, msgbuf, mutex, rwlock, threads, system, summary, sunsolve.

7.8 Other Crash Dump Analysis Tools on Solaris

On Solaris you can use some common binaries and commands to analyze a crash dump.
Get the network status at the time of the crash:
netstat unix.0 vmcore.0
Get the NFS status at the time of the crash:
nfsstat -n unix.0 vmcore.0
Get the ARP tables at the time of the crash:
arp -a unix.0 vmcore.0
Get the IPC status at the time of the crash:
/usr/sbin/ipcs -C vmcore.0 -N unix.0

8.Generate / Analyze a Crash Dump on HP-UX


8.1 Crash Dump Analysis by using KWDB

Check the Crash Dump Directory:
ls -lrt /var/adm/crash/c*
Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog
Create the /etc/shutdownlog file if it does not exist:
touch /etc/shutdownlog
If there's No Dump, Re-Save it:
savecrash -vr /tmp

Verify that kwdb (preferred) or q4 is Installed and Loaded:
swlist -l fileset | grep -i KWDB
swlist -l file | grep contrib
swlist -l fileset | grep -i q4
If KWDB is not installed, you can download the HP official depot for your server's HP-UX version and architecture (you need an HP AllianceONE account with appropriate privileges). Then upload the depot to the server and uncompress it:
gunzip KWDB_3.xxxx_depot.gz
Install the depot package for Itanium-based & PA-RISC systems:
swinstall -s /KWDB_3.tape_depot KWDB_3
OR for PA-RISC systems:
swinstall -s /kwdb.pa.depot KWDBPA_3
Analyze the Crash Dump by Using kwdb:
cd /var/adm/crash/crash.#
ls -lrt
kwdb -q4 /var/adm/crash/crash.5
At the kwdb Prompt, Check the panic string:
(kwdb) examine panicstr using s
Display the stack trace with pc and sp (PA-RISC only):
(kwdb) pc sp
Get breakpoint info:
(kwdb) info breakpoints
(kwdb) i b
Trace event 0:
(kwdb) trace event 0
Trace event 0 with input, local and output registers:
(kwdb) trace -args event 0
Load structures:
(kwdb) load struct utsname from &utsname
(kwdb) print -t
Print the console message buffer:
(kwdb) examine &msgbuf+8 using s
Print the system crash date/time:
(kwdb) examine &time using Y
How long had the system been up before the crash:
(kwdb) ticks_since_boot/hz
System load average at the moment of the crash:
(kwdb) examine &avenrun using 3F

(kwdb) examine &real_run using 3F
What command was the specified process running:
(kwdb) load struct proc from 0xb0d240
(kwdb) examine p_cmnd using s
(kwdb) load struct proc from 0x42234040
(kwdb) print -xt p_cmnd
(kwdb) examine 0x41e4db40
(kwdb) print p_comm
How was the kernel built:
(kwdb) examine &_makefile_cflags using s
Load the part of the crash event table that contains valid entries and trace them:
(kwdb) load crash_event_t from &crash_event_table until crash_event_ptr max 100
loaded 4 struct crash_event_table_structs as an array (stopped by until clause)
(kwdb) trace pile
Load the processor info table and trace every processor (HP-UX v11.11):
(kwdb) load mpinfo_t from mpproc_info max nmpinfo
loaded 4 struct mpinfos as an array (stopped by max count)
(kwdb) trace pile
OR (post-HP-UX v11.11 kernels):
(kwdb) load mpinfou_t from &spu_info max nmpinfo
(kwdb) pileon mpinfo_t from pikptr
(kwdb) trace pile
Load the processor information table and trace every processor:
(kwdb) load mpinfou_t from &spu_info max nmpinfo
loaded 1 union mpinfou as an array (stopped by max count)
(kwdb) pileon mpinfo_t from pikptr
loaded 1 struct mpinfo
(kwdb) trace pile
Load the process table and trace the stacks:
(kwdb) load struct proc from proc_list max nproc next
(kwdb) trace pile
Load the crash event table:
(kwdb) load crash_event_t from &crash_event_table until crash_event_ptr max 100
(kwdb) print cet_hpa %#x cet_event
Trace event 1:
(kwdb) trace event 1
Trace event 0 with input, local and output registers:
(kwdb) trace -args event 0
Load structures:
(kwdb) load struct utsname from &utsname
(kwdb) print -t
Check the threads:
(kwdb) load kthread_t from kthread max nkthread

(kwdb) hist
(kwdb) load kthread_t from kthread_list max nkthread next kt_factp
(kwdb) hist
(kwdb) keep kt_cntxt_flags & TSRUNPROC
Display the stack trace for structures from the current pile, for process, processor, thread and crash event structures:
(kwdb) trace pile
(kwdb) print -tx kt_stat kt_cntxt_flags kt_flag kt_spu addrof kt_procp
(kwdb) addrof kt_procp
Check the running processes (at the time the panic occurred):
(kwdb) runningprocs
Display the stack trace for the process at the given address:
(kwdb) trace process at 7032300014
Trace CPU 3, its threads, spinlocks, calls, etc.:
(kwdb) trace -v processor 3
Check the state of the processors:
(kwdb) load mpinfo_t from mpproc_info max nmpinfo
(kwdb) load mpinfou_t from &spu_info max nmpinfo
(kwdb) pileon mpinfo_t from pikptr
(kwdb) call it mpinfo
(kwdb) print indexof addrof threadp curstate
(kwdb) exam &mp_avenrun for nmpinfo using 3F
(kwdb) print indexof addrof held_spinlock spinlock_depth
(kwdb) load lock_t from 0x129a4c0
(kwdb) print -x sl_owner sl_lock_caller sl_unlock_caller
(kwdb) exam sl_lock_caller using a
(kwdb) exam sl_unlock_caller using a
Recall mpinfo (recall the pile saved under the name mpinfo):
(kwdb) recall mpinfo
(kwdb) print indexof spu_state
(kwdb) print indexof last_idletime last_tsharetime
(kwdb) lbolt
(kwdb) recall mpinfo
(kwdb) print mp_rq.nready_free mp_rq.nready_locked
Check the per-processor run queues:
(kwdb) print -t | grep mp_rq
(kwdb) print -t | grep mp_rq > mprq.out
(kwdb) load rtsched_info_t from &rtsched_info
(kwdb) print rts_nready rts_bestq rts_qp rts_numpri
(kwdb) print -t
(kwdb) print addrof kt_lastrun_time kt_wchan | sort -k 3n,3 | uniq -c -f2 | grep -v ^ 1 | sort
Trace the specified thread:
(kwdb) trace thread at 1532338064
(kwdb) load unwindDesc_t from &$UNWIND_START until &$UNWIND_END max 100000
(kwdb) maint info unwind panic

(kwdb) examine &_makefile_cflags using s
Check kernel memory writes and the log:
(kwdb) kmem_writes
(kwdb) load kmem_log_t from &kmem_log max kmem_log_slots
If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
/usr/sbin/diag/contrib/pdcinfo
Check the tombstone:
cd /var/tombstones/
ls -lrt
more ts99
Extract the PIM information:
cstm
cstm>map
cstm>sel dev 25
cstm>info
cstm>infolog
Enter Done, Help, Print, SaveAs, or View: [Done] SA
cstm>quit
ls -l /tmp/pim.HPMC.16Nov03

8.2 Remote Crash Dump Analysis

Get help on the KWDB crash server:
kwdbcr -help
Start the crash server on the system holding the dump:
kwdbcr /var/adm/crash.5
Connect from the analysis system:
kwdb -q4 [-m] vmunix remote_system:port_number | <crash_path in remote system>
OR:
kwdb -q4 [-m] vmunix
(kwdb) target crash remote_system:port_number | <crash_path in remote system>
Check the kwdbcr log:
more /var/opt/kwdb/kwdbcr.log
Run kwdbcr with a log file:
kwdbcr -d -l logfile

8.3 Crash Dump Analysis by using Q4

Q4 is a crash dump analysis tool shipped with the HP-UX OS installation media. It can work alone or in combination with KWDB.
Check the Crash Dump Directory:
ls -lrt /var/adm/crash/*
Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
touch /etc/shutdownlog
If there's No Dump, Re-Save it:
savecrash -vr /tmp
Verify that kwdb (preferred) or q4 is Installed and Loaded:
swlist -l fileset | grep -i KWDB
swlist -l fileset | grep -i q4
swlist -l file | grep contrib
type q4
If Q4 is not installed, you can install it from the HP-UX INSTALL media. First, check that the following patches are installed on the corresponding OS versions:
HP-UX v10.20: PHCO_20261
HP-UX v11.00: PHCO_20262
HP-UX v11.11: PHCO_25723
To check whether a patch is installed, run the following command, substituting the xxxxx with the ID of the patch you're searching for:
/usr/sbin/swlist -l product | grep PHCO_xxxxx
If needed, you can download the patch from the following locations:
For v10.[12]0 versions: ftp://us-ffs.external.hp.com/hpux_patches/s700_800/10.X/PHCO_20261
For v11.0 versions: ftp://us-ffs.external.hp.com/hpux_patches/s700_800/11.X/PHCO_20262
For v11.11 versions: ftp://us-ffs.external.hp.com/hpux_patches/s700_800/11.X/PHCO_25723
Check the Q4 fileset on the media:
swlist -l fileset -s /cdrom | grep Q4
OS-Core.Q4 B.10.10 HP-UX Crash Dump Debugger for PA-RISC systems
Select and load it if not loaded:
swinstall -vs /<CD-ROM mount point> OS-Core.Q4
Prepare the dump tools.
For HP-UX 10.20 through 11.11:
/usr/contrib/bin/q4prep -p
For HP-UX 11.20 and above:
/usr/contrib/Q4/bin/q4prep -p
For HP-UX 10.10, uncompress and untar the Q4Lib:
uncompress /usr/contrib/Q4/lib/Q4Lib.tar.Z
tar -xf /usr/contrib/Q4/lib/Q4Lib.tar
Copy the q4rc.pl sample file to /tmp:
cp /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /tmp/.q4rc.pl
Once the dump tools are installed and prepared, access the crash dump directory and

decompress the dump:
cd /var/adm/crash/crash.5
ls -lrt
gunzip vmunix
strings vmunix | more
file vmunix
Set the Environment:
. /usr/contrib/Q4/bin/set_env
Make a check of vmunix (for HP-UX 11.20 and above):
/usr/contrib/Q4/bin/q4pxdb -s status vmunix
pxdb -s status ./vmunix
OR (for HP-UX 10.20 through 11.11):
/usr/contrib/bin/q4pxdb -s status vmunix
Preprocess the dump (for HP-UX 11.20 and above):
/usr/contrib/Q4/bin/q4pxdb vmunix
OR (for HP-UX 10.20 through 11.11):
/usr/contrib/bin/q4pxdb vmunix
Get the panic info and put the output in a file:
last reboot > reboot.out
Get the installed patch list and put it in a file:
swlist -l product | grep -i PH > patches.out
Access the core directory:
cd /var/adm/crash/core.0
ls -lrt
Analyze the Crash Dump by Using Q4 (for HP-UX 11.20 and above):
/usr/contrib/Q4/bin/q4 -p .
OR (for HP-UX 10.20 through 11.11):
/usr/contrib/bin/q4 -p .
At the q4 Prompt, include the analyze.pl script to add more analysis features:
q4> include analyze.pl
Analyze the dump and put the output in a file:
q4> run Analyze AU > ana.out
Check the panic cause and put the output in a file:
q4> run WhatHappened > what.out
If a hang occurred, check the hang cause and put the output in a file:
q4> run WhatHappened -HANG > whath.out
Exit the q4 Prompt:
q4> exit

If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
/usr/sbin/diag/contrib/pdcinfo
Check the tombstone:
cd /var/tombstones/
ls -lrt
more ts99
Extract the PIM information:
cstm
cstm>map
cstm>sel dev 25
cstm>info
cstm>infolog
Enter Done, Help, Print, SaveAs, or View: [Done] SA
cstm>quit
ls -l /tmp/pim.HPMC.16Nov03
Then analyze the Following Files:
more patches.out
more /etc/shutdownlog
more /var/tombstones/ts* (if they exist and/or if an HPMC was detected)
more /var/adm/syslog/OLDsyslog.log (if the dump was due to a hang)
more ana.out
more what.out
more whath.out
more reboot.out
more crashinfo.out

8.4 Crash Dump Analysis by using KWDB Q4 Mode


KWDB supports a superset of the commands provided by the crash dump analysis tool Q4, which extends its functionality.
Check the Crash Dump Directory:
ls -lrt /var/adm/crash/*
Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog
Create the /etc/shutdownlog file if it does not exist:
touch /etc/shutdownlog
If there's No Dump, Re-Save it:
savecrash -vr /tmp
Verify that kwdb (preferred) or q4 is Installed and Loaded:
swlist -l fileset | grep -i KWDB

swlist -l fileset | grep -i q4
swlist -l file | grep contrib
type q4
If KWDB is not installed, you can download the HP official depot for your server's HP-UX version and architecture (you need an HP AllianceONE account with appropriate privileges). Then upload the depot to the server and uncompress it:
gunzip KWDB_3.xxxx_depot.gz
Install the depot package for Itanium-based & PA-RISC systems:
swinstall -s /KWDB_3.tape_depot KWDB_3
OR for PA-RISC systems:
swinstall -s /kwdb.pa.depot KWDBPA_3
If Q4 is not installed, follow the indications in section 8.3 above.
Access the crash dump directory and analyze the Crash Dump by Using kwdb / q4:
cd /var/adm/crash/crash.#
ls -lrt
kwdb -q4 -p -m .
OR, at the kwdb Prompt, Activate the q4 Mode:
(kwdb) set kwdb q4 on
Run set kwdb q4 off at the q4 Prompt to disable q4 support.
At the q4 Prompt, Check the events that occurred immediately before and during the panic and log to a file:
q4> run WhatHappened > what.out
If you suspect a hang occurred, check the panic events by running:
q4> run WhatHappened -HANG > whath.out
Analyze the dump and log the output to a file:
q4> run Analyze AU > ana.out
Check the panic string:
q4> examine panicstr using s
Display the stack trace with pc and sp (PA-RISC only):
q4> pc sp
Get breakpoint info:
q4> info breakpoints
q4> i b
Trace event 0:
q4> trace event 0
Trace event 0 with input, local and output registers:
q4> trace -args event 0
Load structures:

q4> load struct utsname from &utsname
q4> print -t
Print the console message buffer:
q4> examine &msgbuf+8 using s
Print the system crash date/time:
q4> examine &time using Y
How long had the system been up before the crash:
q4> ticks_since_boot/hz
System load average at the moment of the crash:
q4> examine &avenrun using 3F
q4> examine &real_run using 3F
What command was the specified process running:
q4> load struct proc from 0xb0d240
q4> examine p_cmnd using s
q4> load struct proc from 0x42234040
q4> print -xt p_cmnd
q4> examine 0x41e4db40
q4> print p_comm
How was the kernel built:
q4> examine &_makefile_cflags using s
Load the part of the crash event table that contains valid entries and trace them:
q4> load crash_event_t from &crash_event_table until crash_event_ptr max 100
loaded 4 struct crash_event_table_structs as an array (stopped by until clause)
q4> trace pile
Load the processor info table and trace every processor (HP-UX v11.11):
q4> load mpinfo_t from mpproc_info max nmpinfo
loaded 4 struct mpinfos as an array (stopped by max count)
q4> trace pile
OR (post-HP-UX v11.11 kernels):
q4> load mpinfou_t from &spu_info max nmpinfo
q4> pileon mpinfo_t from pikptr
q4> trace pile
Load the processor information table and trace every processor:
q4> load mpinfou_t from &spu_info max nmpinfo
loaded 1 union mpinfou as an array (stopped by max count)
q4> pileon mpinfo_t from pikptr
loaded 1 struct mpinfo
q4> trace pile
Load the process table and trace the stacks:
q4> load struct proc from proc_list max nproc next
q4> trace pile
Load the crash event table:
q4> load crash_event_t from &crash_event_table until crash_event_ptr max 100

q4> print cet_hpa %#x cet_event
Trace event 1:
q4> trace event 1
Load structures:
q4> load struct utsname from &utsname
q4> print -t
Check the threads:
q4> load kthread_t from kthread max nkthread
q4> hist
q4> load kthread_t from kthread_list max nkthread next kt_factp
q4> hist
q4> keep kt_cntxt_flags & TSRUNPROC
Display the stack trace for structures from the current pile, for process, processor, thread and crash event structures:
q4> trace pile
q4> print -tx kt_stat kt_cntxt_flags kt_flag kt_spu addrof kt_procp
q4> addrof kt_procp
Check the running processes (at the time the panic occurred):
q4> runningprocs
Display the stack trace for the process at the given address:
q4> trace process at 0x41978040
Trace CPU 3, its threads, spinlocks, calls, etc.:
q4> trace -v processor 3
Check the state of the processors:
q4> load mpinfo_t from mpproc_info max nmpinfo
Recall mpinfo (recall the pile saved under the name mpinfo):
q4> recall mpinfo
q4> print indexof spu_state
q4> print indexof last_idletime last_tsharetime
q4> lbolt
q4> recall mpinfo
q4> print mp_rq.nready_free mp_rq.nready_locked
Check the per-processor run queues:
q4> print -t | grep mp_rq > mprq.out
q4> load rtsched_info_t from &rtsched_info
q4> print rts_nready rts_bestq rts_qp rts_numpri
q4> print -t
q4> print addrof kt_lastrun_time kt_wchan | sort -k 3n,3 | uniq -c -f2 | grep -v ^ 1 | sort
Trace the specified thread:
q4> trace thread at 1532338064
q4> load unwindDesc_t from &$UNWIND_START until &$UNWIND_END max 100000
q4> maint info unwind panic
q4> examine &_makefile_cflags using s

Check kernel memory writes and the log:
q4> kmem_writes
q4> load kmem_log_t from &kmem_log max kmem_log_slots
Exit the q4 Prompt:
q4> exit
Run the crashinfo utility, if you have it. It may be in /usr/local/bin or /opt/sfm/tools/; search for it if you don't find it:
find / -type f | grep crashinfo
Run crashinfo and log the output to a file:
/opt/sfm/tools/crashinfo > crashinfo.out
OR:
/usr/local/bin/crashinfo > crashinfo.out
OR:
/opt/sfm/tools/crashinfo -continue | tee crash-43.log
OR pointing at the crash.# directory:
/opt/sfm/tools/crashinfo /var/adm/crash/crash.5 > crashinfo.out
If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
/usr/sbin/diag/contrib/pdcinfo
Check the tombstone:
cd /var/tombstones/
ls -lrt
more ts99
Extract the PIM information:
cstm
cstm>map
cstm>sel dev 25
cstm>info
cstm>infolog
Enter Done, Help, Print, SaveAs, or View: [Done] SA
cstm>quit
ls -lrt /tmp/pim.HPMC.16Nov03
Then analyze the Following Files:
more patches.out
more /etc/shutdownlog
more /var/tombstones/ts* (if they exist and/or if an HPMC was detected)
more /var/adm/syslog/OLDsyslog.log (if the dump was due to a hang)
more ana.out
more what.out
more whath.out
more reboot.out
more crashinfo.out

To install crashinfo: crashinfo is part of the SFM (System Fault Management) bundle. There are two versions of crashinfo: crashinfo-a-2.exe (64-bit PA2.0) and crashinfo-a-i.exe (IA64). The 64-bit PA2.0 version can be run on both PA and IA64 systems, and can analyze both PA2.0 and IA64 crash dumps. The IA64 version will only run on IA64 systems, but can analyze crash dumps from both IA64 and PA2.0 systems. For performance reasons you may wish to use the IA64 version when running on IA64 systems.
Check if crashinfo is installed on the system:
ls -lrt /opt/sfm/tools
Download crashinfo:
/var/adm/crash/depot/SFM-CORE/MISC_TOOLS/opt/sfm/tools/crashinfo-a-2.exe
OR:
/opt/sfm/tools/crashinfo-a-i.exe
To run crashinfo:
/opt/sfm/tools/crashinfo > crashinfo.out
/usr/ccs/bin/pxdb -s status ./vmunix
/usr/ccs/bin/pxdb ./vmunix

8.5 Crash Dump Analysis by using HP WDB / GDB

The HP Wildebeest Debugger (WDB) is an HP-supported implementation of the open source GNU debugger (GDB). HP WDB / GDB can be used to debug and monitor a process, but it is mostly used to analyze crashed process core files and system crash dumps. To analyze a system crash dump, follow the steps below.
Check the Crash Dump Directory:
ls -lrt /var/adm/crash/c*
Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog
Create the /etc/shutdownlog file if it does not exist:
touch /etc/shutdownlog
If there's No Dump, Re-Save it:
savecrash -vr /tmp
Check if HP WDB is installed:
swlist -l fileset | grep -i wdb

If HP WDB is not installed, you can download the latest version (6.3) for your HP-UX version and architecture; you need an HP AllianceONE account with appropriate privileges. Upload the depot file into the server's /tmp directory, access the directory and decompress it:
cd /tmp
gunzip hpwdb.xxxx.xxxx.depot.gz
Install the depot:
swinstall -s hpwdb.xxxx.xxxx.depot \*
The main paths are:
/opt/langtools/wdb
/opt/langtools/gdb
/opt/langtools/bin
Before analyzing a process core file, check it:
file corefile_name
strings corefile_name
Check if it's truncated:
elfdump -o -S core
If the core file is truncated at 2 GB, the system may not support creating files over that size on the filesystem on which the crash dumps are saved. You can check the syslog file to see if the system warned that it could not complete saving the file. If that's the problem, you can enable support for files over 2 GB on the specified filesystem:
fsadm -o largefiles /filesystem_name
To start analyzing the core dump:
gdb -c core
OR:
gdb
At the gdb Prompt:
(gdb) core core
For commands and details refer to section 3.3, "Debug Processes and Core Files by using HP WDB / GDB".

8.6 Crash Dump Analysis by using adb

Check the Crash Dump Directory:
ls -lrt /var/adm/crash/c*
Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
touch /etc/shutdownlog
If there's No Dump, Re-Save it:
savecrash -vr /tmp
Access the crash dump directory and start analyzing the dump (change crash.5 to the name of your crash.# directory):
cd /var/adm/crash/crash.5
ls -lrt
gunzip vmunix.gz
strings vmunix | more
file vmunix
adb -m vmunix .
OR without accessing the crash dump directory:
adb -m /var/adm/crash/crash.5/vmunix /var/adm/crash/crash.5
msgbuf+8/s
At the adb Prompt, Display the message buffer:
$<msgbuf
msgbuf+14/s
msgbuf+10/s
Get the core information:
$>coreinfo
Get crash information:
$>system
$>status
Display the panic string:
$>panicinfo
*panicstr/s
Show the crash log:
$>crashlog
Get the thread list:
$<threadlist
Check the status:
$>status
Quit the debugger:
$q

9.Generate / Analyze a Crash Dump on Linux

9.1 Enable Saving a Crash Dump by using kexec-tools


Check the Presence of the kdump Tool:
yum search kexec-tools
chkconfig --list | grep kdump
more /etc/kdump.conf
OR:
/etc/init.d/kdump status
If necessary, Add the Line to the Yum Repository on Red Hat:
vi /etc/yum.repos.d/rhel-debuginfo.repo
baseurl=ftp://ftp.redhat.com/pub/redhat/linux/enterprise/$releasever/en/os/$basearch/Debuginfo/
Enable the Repository:
yum install --enablerepo rhel-debuginfo httpd-debuginfo
OR for CentOS:
vi /etc/yum.repos.d/centos-debuginfo.repo
baseurl=http://debuginfo.centos.org/$releasever/$basearch/
Enable the Repository:
yum install --enablerepo centos-debuginfo httpd-debuginfo
Install kexec-tools:
yum install kexec-tools
Check or Edit the /etc/kdump.conf File According to your Needs (see the sketch at the end of this section):
vi /etc/kdump.conf
more /etc/sysconfig/kdump
Backup and Edit /boot/grub/grub.conf and Append "crashkernel=128M@16M" to the End of the Kernel Line:
cp /boot/grub/grub.conf /boot/grub/grub.conf.bkp
vi /boot/grub/grub.conf
OR:
cp /boot/grub/menu.lst /boot/grub/menu.lst.bkp
vi /boot/grub/menu.lst
kernel /boot/vmlinuz-2.6.18-128.1.16.el5 ro root=LABEL=/ rhgb quiet crashkernel=128M@16M
Enable the kdump Service:
chkconfig kdump on
chkconfig kdump
chkconfig --list | grep kdump
OR:
/etc/init.d/kdump start
/etc/init.d/kdump status
Reboot the System:

reboot
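As a reference (a minimal sketch, assuming a RHEL/CentOS-style kexec-tools setup; the path and dump level are examples, and core_collector requires makedumpfile), /etc/kdump.conf might contain:
path /var/crash                     # directory where vmcore files are saved
core_collector makedumpfile -c -d 31  # compress and drop uninteresting pages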

9.2 Simulate a Panic and Save a Crash Dump

There are different ways to simulate a panic. The following are the most common:
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
OR:
echo 1 > /proc/sys/kernel/sysrq
On the system console type: Alt-SysRq-u
All filesystems will be re-mounted read-only: this saves the system from running fsck on all the file systems when the system reboots.
On the system console type: Alt-SysRq-c
This will force the system to panic and a crash dump to be taken.

9.3 Analyze a Crash Dump by using crash

On CentOS 5 and 6, Download and Install the kernel-debuginfo and kernel-debuginfo-common Packages:
wget http://debuginfo.centos.org/5/`uname -i`/kernel-debuginfo-`uname -r`.i686.rpm
wget http://debuginfo.centos.org/5/`uname -i`/kernel-debuginfo-common-`uname -r`.i686.rpm
OR for Red Hat 5:
wget ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kexec-tools-debuginfo-1.102pre-96.el5_5.4.x86_64.rpm
OR for SuSE:
wget ftp5.gwdg.de/pub/opensuse/repositories/Kernel:/kdump/openSUSE_11.1/x86_64/kexec-tools-debuginfo-2.0.0-58.1.x86_64.rpm
rpm -Uvh kernel-debuginfo*
Check the crash dump files:
cd /var/crash/2009-06-09-20\:18

ls -lrt
file vmcore
strings vmcore
Start crash and Analyze the Output:
crash /usr/lib/debug/lib/modules/crashed-kernel-version/vmlinux /var/crash/2009-06-09-20\:18/vmcore
OR:
crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/2009-06-09-20\:18/vmcore | tee /var/crash/crash3.log
At the crash Prompt, View the System Data:
crash> sys
Get Info About Open Files:
crash> files
Display the Process Status:
crash> ps
Display Virtual Memory Info:
crash> vm
View the Stack Traces:
crash> bt -a
Display Module Info and the Loading of Symbols and Debugging Data:
crash> mod
Dump the Kernel Log Buffer Contents in Chronological Order:
crash> log
Analyze an EIP Address (from the Preceding Output):
crash> dis -lr c04a9c34
Exit:
crash> exit
Running crash in Unattended Mode: you can run crash in unattended (non-interactive) mode by creating an input file containing the commands you want to pass to crash.
Generate an Input File Containing the Commands:
vi inputfile
bt
log
ps
exit
Run crash:
crash -i inputfile

OR:
crash < inputfile
OR:
crash <debuginfo> vmcore < inputfile > outputfile
OR:
crash <System map> <vmlinux> vmcore < inputfile > outputfile

9.4 Analyze a Crash Dump by using GDB

Check the crash dump files:
cd /var/crash/2009-06-09-20\:18
ls -lrt
file vmcore
strings vmcore
Start gdb on the core file:
gdb -c core
OR:
gdb a.out core
OR:
gdb path/to/the/binary path/to/the/core
objdump -d -S null-pointer.ko > /tmp/whatever
OR from the gdb Prompt:
(gdb) core core
At the gdb Prompt, Analyze the BackTrace:
(gdb) bt
Check the Status Commands:
(gdb) help status
View the Data Commands:
(gdb) help data
View the Stack Commands:
(gdb) help stack
Analyze a Stack Frame by its Number:
(gdb) frame number
View the Code around that Frame:
(gdb) list
List the Local Variables:
(gdb) info locals

View the Files Commands:
(gdb) help files
View the Internals Commands:
(gdb) help internals
View the Command Aliases:
(gdb) help aliases
Check the Support Facilities:
(gdb) help support
View the Commands for Running the Program:
(gdb) help running
Quit the debugger:
(gdb) quit
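A few more standard gdb commands are useful once a core is loaded as above (shown as a sketch; they apply to any process core, not only kernel-related ones):
(gdb) bt full              # backtrace including local variables
(gdb) info threads         # list the threads found in the core
(gdb) thread apply all bt  # backtrace of every thread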

9.5 Analyze a Crash Dump by using LKCD

The Linux Kernel Crash Dump (LKCD) is a project that provides a reliable method of detecting, saving and examining system crashes. Download the current lkcdutils rpm and the patches, and upload the packages to the server. The installation of LKCD requires the kernel patches to be applied, a new kernel to be built and the LKCD utilities to be installed.
Make a copy of the kernel source directory:
cp -r /usr/src/linux-x.x.x /usr/src/linux-x.x.x.lkcd
Access the newly-created directory:
cd /usr/src/linux-x.x.x.lkcd
Test the patches:
patch -p1 --dry-run < <path>/lkcd-x.x.x.diff
If the previous command did not report any errors, apply the kernel patches:
patch -p1 < <path>/lkcd-x.x.x.diff
Configure the kernel, adding LKCD support (compiled into the kernel, not as a module) and enabling the Magic SysRq keys (Magic SysRq is not mandatory but it will allow a crash dump to be created when the system has hung):
make menuconfig
Navigate to Kernel Hacking and type <enter>. Navigate to Magic SysRq key and type <space>: an asterisk should appear next to the line Magic SysRq key. Navigate to Linux Kernel Crash Dump (LKCD) and type <space> until an asterisk appears. If compression options are presented, select all available. Press <tab> <enter> twice until you are prompted to save the configuration: type <enter> to

save and exit menuconfig.
Build the new kernel:
make dep; make bzImage
Install the kernel image:
make install
The kernel build process will have built the file Kerntypes in the kernel source directory: check whether this file was copied to the /boot directory and, if needed, copy the file yourself:
cp Kerntypes /boot
The kernel build process builds the file System.map in the kernel build directory, and the kernel install process copies this file into the /boot directory: check that /boot/System.map matches the copy in the kernel source directory:
diff System.map /boot/System.map
If the two files do not match, make a fresh copy in the /boot directory:
cp System.map /boot
Reboot with the new kernel:
init 6
Once the system is up and running, check that the /proc/sys/dump directory exists:
ls -d /proc/sys/dump
If the directory is missing, the kernel has not been patched or configured properly for LKCD. Once the kernel is patched, install the LKCD utilities rpm:
rpm -i lkcdutils-x_x-x_xxxx.rpm
Edit the system startup script, /etc/rc.sysinit on Red Hat and CentOS or /sbin/init.d/boot on SuSE (to find the system startup script for your distribution, issue the command grep sysinit /etc/inittab). Locate the line

action $"Mounting local filesystems: "mount -a -tnonfs,smbfs,ncpfs


Following this line add this text:

/sbin/lkcd config

If you are using a swap partition as the dump device, then the dump must be saved before swap is activated. Locate the line with the swapon command in the system startup script and change it like this (adding the lkcd commands above it):

/sbin/lkcd config
/sbin/lkcd save
# Start up swapping.
action $"Activating swap partitions: " swapon -a -e

Configure the device on which to save the crash dump by creating a symbolic link to the chosen device and updating the LKCD configuration:
df -k
cat /proc/partitions

ln -s /dev/sdb1 /dev/vmdump
/sbin/lkcd config
Enable the Magic SysRq key with the following command:
echo 1 > /proc/sys/kernel/sysrq
Check or Edit the Configuration File According to your Needs:
vi /etc/sysconfig/dump
The parameter DUMP_ACTIVE must be set to 1 to enable the dump process. Set DUMP_SAVE to 1 if you want to save the memory image to disk. Define the DUMP_LEVEL: 0 = nothing, 1 = dump the dump header and the first 128K bytes, 4 = dump everything except the kernel free pages, 8 = dump all memory. Set DUMP_COMPRESS to 0 if you do not want the dump to be compressed, to 1 to use RLE compression, or to 2 for gzip compression. An example dump configuration file:

DUMP_ACTIVE=1
DUMPDEV=/dev/vmdump
DUMPDIR=/var/log/dump
DUMP_SAVE=1
DUMP_LEVEL=8
DUMP_FLAGS=0
DUMP_COMPRESS=0
PANIC_TIMEOUT=5

After changing the configuration, update and enable Crash Dump Saving:
lkcd config
Check the Configuration Settings:
lkcd query
Setup the Service to Start at Boot:
chkconfig boot.lkcd on
Test LKCD. On the system console type: Alt-SysRq-u
All filesystems will be re-mounted read-only: this saves the system from running fsck on all the file systems when the system reboots.
On the system console type: Alt-SysRq-c
This will force the system to panic and a crash dump to be taken.
If the system startup scripts don't contain the lkcd save command, create the dump files manually:
/sbin/lkcd save
Once the system is back up and running, check that the dump files have been created:
cd /var/log/dump/0
ls -lrt

Invoke LKCD lcrash:
/sbin/lcrash map.0 dump.0 kerntypes.0
OR:
/sbin/lcrash -n 0
At the lcrash Prompt, Get a list of the processes running at the time of the crash:
>>ps
Display system statistics and the log_buf array:
>>stat
Display the crash dump report:
>>report
>>report -w outfile
Display dump data:
>>dump
>>dump c02e4820 8 -o
>>dump c02e4820 8 -d
>>dump c02e4820 8 -x
List the opened namelists:
>>namelist
>>namelist -a /tmp/snd.o
Display module information:
>>module
>>module pcmcia_core
>>module pcmcia_core -f
>>module kernel_module -f -i 10
Display page structure information:
>>page
Evaluate and print expressions:
>>print
Dynamically load a library of lcrash commands:
>>ldcmds
Display all complete and unique stack traces:
>>strace
Display the stack trace for a task_struct:
>>trace
Display information for task_struct structs:
>>task
List symbol table information:
>>symtab -l
List the symbols in the specified module:
>>symtab -l -f /tmp/my_dummy.map

Remove a symbol table: >>symtab -r /tmp/my_dummy.map
Recreate and reload a symbol table: >>symtab -a __ksymtab__ >>symtab -a /tmp/my_dummy.map my_dummy >>symtab -l
Walk a linked list of kernel structures or memory blocks: >>walk
Examine a local variable: >>whatis DUMMY >>print *(dummy_t*) d0000240 >>whatis dummy_s.member2
Display disassembled code: >>dis -F memcmp >>dis 0xc025188e 10 -f
Quit lcrash: >>q
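A minimal lcrash session, assuming the dump files produced above live in /var/log/dump/0 (the report output path is a hypothetical example):

cd /var/log/dump/0
/sbin/lcrash map.0 dump.0 kerntypes.0
>>ps                                # processes at the time of the crash
>>stat                              # system statistics and the log_buf array
>>trace                             # stack trace of the current task
>>report -w /tmp/crash.report       # hypothetical output file
>>q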

9.6 Other Useful Commands

Examining a kernel after a crash can be very useful to check whether it is experiencing issues: cat /proc/sys/kernel/tainted
If a module, a library or a program is suspected of having caused a panic, you can dump/disassemble it: objdump -D -S <compiled_object_with_debug_symbols> > filename.out
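For example, to check the taint state and disassemble a suspect kernel module (mymodule.o is a hypothetical name; any object compiled with debug symbols will do):

cat /proc/sys/kernel/tainted        # non-zero means the kernel is tainted
objdump -D -S mymodule.o > mymodule.out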

10. Generate / Analyze a Crash Dump on IBM AIX


10.1 Setup and Enable KDB

KDB is an interactive kernel debugger shipped with the IBM AIX operating system. It allows the user to control the execution of kernel code (including kernel extensions and device drivers) and to observe and modify variables and registers; to use it interactively, the system must be booted from a boot image with the debugger enabled. The kdb command, in turn, is the tool for analyzing system dumps: it is used for post-mortem analysis of system dumps or for monitoring the running kernel.

Check Current Dump Device(s): sysdumpdev -l
Start the System Dump: sysdumpstart -p
Check the Minimum Size for the Dump Device: sysdumpdev -e
Enable the KDB but Do Not Invoke it at Boot: bosboot -a -d /dev/ipldevice -D
Enable the KDB and Invoke it at Boot: bosboot -a -d /dev/ipldevice -I
Disable the KDB: bosboot -a -d /dev/ipldevice
Check if KDB is Available: kdb (0)>dw kdb_avail (0)>dw kdb_wanted
Find the Dump Object: lsnim -l worker
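Note that bosboot only rebuilds the boot image: the change takes effect at the next reboot. A typical enablement sequence might therefore be (a sketch):

bosboot -a -d /dev/ipldevice -D     # rebuild the boot image with KDB loaded
shutdown -Fr                        # reboot so the new boot image is used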

10.2 Analyze a Crash Dump by using KDB

Check Current Dump Device(s): sysdumpdev -l
Check if KDB is Available: kdb >dw kdb_avail
Find the Dump Object: lsnim -l worker
Access the crash dump directory: cd /var/crash/ then ls -lrt
View the content of the snap package: zcat snap.pax.Z | pax -v
Extract the content of the snap package: zcat snap.pax.Z | pax -r
OR extract just the dump, general, and kernel subdirectories: zcat snap.pax.Z | pax -r ./dump ./general ./kernel (if the archive was already expanded with uncompress snap.pax.Z, feed the resulting snap.pax to pax -r instead)
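Putting the extraction steps together (a sketch; the exact directory layout inside a snap package varies by release):

cd /var/crash
zcat snap.pax.Z | pax -v                             # list the archive contents first
zcat snap.pax.Z | pax -r ./dump ./general ./kernel   # extract what kdb needs
ls -lrt ./dump                                       # locate the dump image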

Check the Timestamps of the Dump and UNIX Files:
what unix | grep _kdb_buildinfo
what dump | grep _kdb_buildinfo
what /usr/sbin/kdb_64 | grep _kdb_buildinfo
what /usr/sbin/kdb_mp | grep _kdb_buildinfo
Analyze a Core: kdb /var/adm/ras/vmcore.0 /unix
At the kdb Prompt, Display system statistics that include the last kernel printf() messages still in memory: >stat
Display all of the stack frames from the current instruction as deep as possible (interrupts, system calls, user stack): >f
Display information about what is currently running on each processor: >status
Display the symptom string for a dump: >symptom
Show system log entries not processed by the log daemon: >errpt
Show the global error-logging control information: >errlg -g
Show the error-logging control information for the specified address: >errlg -a address
Show dump-time trace information: >dmptrc
Display information about the Lightweight Memory Trace (LMT): >mtrc all -v
Display LMT information for CPU 0: >mtrc -C 0 -v
Dump the event buffer on channel 2, related to Thread ID 14539, for an active system trace: >trace -c 2 -t 14539
Initial CPU Context: >cpu 1
Get Breakpoints: >brk
Display the Stack in Raw Format: >dw @r1 90
Display All of the Function Addresses:

>devsw
Display data at utsname: >dw utsname
Find the physical address of utsname: >tr utsname
Get the Machine State: >mst
Get the Machine State Register: >mrs
Dump the Content of the Machine State Register: >dr msr
VMM Error Log: >vmlog
Display information about component dump tables in a system memory dump: >cdt >cdt 11 >cdt -p 11 7
Process Info: >proc *
Display the file table: >file
Print the intr Symbol: >pr -p intr
Show symbols matching *r: >pr -p *r
Print following the next pointer: >pr -l next intr 30047A80
Display the iNode table: >ino
Print details of the inode pool: >jfsnode
Print gfs slot 1: >gfs gfs
Display either the Enhanced Journaled File System (JFS2) d-tree or x-tree structure based on the specified inode parameter: >tree 325C1080
Print gfs slot 2: >gfs gfs+30

Print gfs slot 3: >gfs gfs+60
pid Output: >p 3
Get Threads: >thread *
Print the current thread: >tpid
Show VMM free list information: >freelist
tid Output: >th 12
kdb Output: >p *
Get the Address of the Symbol and Table Of Contents Section of the Executable Module: >nm >nm vmerrlog
Display the inpcb structure for TCP connections: >tcb -s
Display the inpcb structure for UDP connections: >udb -s
Print the socket structure for TCP and UDP sockets: >sock -s
Display mbuf data structure information (mbufs are used to store data in the kernel for incoming and outbound network traffic): >mbuf -p
Display mbuf information and follow the packet chain: >mbuf -a
Follow the mbuf structure within a packet: >mbuf -n effectiveaddress
Display the list of all valid network device driver tables, giving the address of each ndd structure and the name of the corresponding network interface: >ndd -s
Display network connections at the time of the crash: >netstat -an
Display network interface information: >ifnet

Display the list of kernel data structure checkers: >check
Display information about the specified kernel data structure checker: >check -h proc
Run the proc checker to validate the entire process table: >check -l 7 proc
Exit the KDB: >g
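A minimal post-mortem session using only the subcommands shown above:

kdb /var/adm/ras/vmcore.0 /unix
>stat                               # panic message and last kernel printf() output
>status                             # what each processor was running at crash time
>f                                  # stack frames leading up to the crash
>symptom                            # symptom string for the dump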

11. Debugging Tools


11.1 Information

In this section you can find a collection of debugging tools for the main UNIX and Linux operating systems.
GDB: The GNU Project Debugger allows you to see what is going on `inside' another program while it executes, or what another program was doing at the moment it crashed. It is available for different UNIX operating systems and Linux distributions.
HP tusc: tusc traces system calls invoked by a process. It works with HP-UX 11.0 and 11i PA-RISC systems, and HP-UX 11i HP Integrity systems. It is not supported on HP-UX 10.20. tusc is similar in functionality to truss on Solaris.
HP Wildebeest Debugger (WDB): HP WDB is an HP-supported implementation of the open source GNU debugger (GDB). It is available for different HP-UX versions and architectures.
Linux Kernel Crash Dump: LKCD is a project designed to meet the needs of customers and system administrators wanting a reliable method of detecting, saving and examining system crashes. It is available for different Linux distributions.
DTrace Toolkit: The DTrace Toolkit is a collection of DTrace scripts for debugging and deep-diving into a system: you can download the current version (0.99) here.
DTrace TazTool: DTrace TazTool is the DTrace version of taztool, a disk trace tool developed by Richard McDougall which takes the TNF disk trace records and matches them up in pairs for the start and end of a disk transaction. DTrace TazTool can be thought of as an evolution of taztool: the last version is 0.51 and it can be downloaded here. If you want, you can also download taztool 1.1 (as you'll notice, its package name is RMCtaz).

Dexplorer: DExplorer automatically runs a collection of DTrace scripts to examine many areas of the system, and places the output in a meaningful directory structure that is tar'd and gzip'd. You can download the current version, 0.70.
Lsof for HP: lsof lists files, sockets, inodes, etc. opened by processes.
Lsof for Solaris: You can find lsof for Solaris 10 SPARC and x86 on http://www.sunfreeware.com: you have to create a free account to download packages from this site. You can find packages for the previous versions of Solaris on http://unixpackages.com/: packages on this site are not freeware, as you need to buy a subscription (a single-user subscription currently costs $20/year).
SE Toolkit: The SE Toolkit is a collection of scripts for performance analysis which gives advice on performance improvement. It has been a standard in system performance monitoring for the Solaris platform over the last 10 years.
XE Toolkit: The XE Toolkit is a multi-platform, network-aware, secure performance monitoring solution for tactical analysis of enterprise computing systems.
NMON: This Solaris system monitoring tool allows you to perform standard SAR activity reporting and NMON activity reporting. The NMON output can be imported into Excel or RRD to produce simple and efficient graphs.
Ksar: ksar is a sar graphing tool that can currently graph Linux, Mac and Solaris sar output. The sar statistics graphs can be output to a PDF file.
Sar2html: sar2html converts sar binary data to graphical HTML format. It has a command line, a web interface and a data collection script. HP-UX 11.11, 11.23 and 11.31, Red Hat 3, 4, 5 and 6, SuSE 8, 9, 10 and 11, and Solaris 5.9 and 5.10 are supported.
Sarface: sarface is a user interface to the sysstat/sar database which reads data from sar and plots it to a live X11 graph via gnuplot. It mimics the command-line options of sar but can cross-plot any two or more stats and apply simple mathematical functions to them.
Visual SAR: Visual SAR is a Java graphical interpreter of the UNIX sar command. It reads sar output from a file and shows it in graphical format, allowing a quick interpretation of a server's behavior over several days.
Sarvant: Sarvant (SAR Visual ANalysis Tool) is a Python script that analyzes a sar file (from the sysstat utility, 'sar') and produces graphs of the collected data using gnuplot.
Sarparse: Sarparse is a utility based on Cacti to graph sar metrics from remote hosts. It requires NRPE and sar to run out of the box, but could easily be modified for any other transport.
