
Solution Type: Technical Instruction

Solution 206274 : KERNEL: How to enable deadman kernel code

Description
The Solaris[TM] Operating System contains a deadman timer.
This allows the OS to force a kernel panic in the event of a system hang. Analysis of the
resulting crash dump can give information on the cause of the hang. The deadman timer is
implemented in different ways on different hardware platforms and different OS releases.
Some combinations of platform and OS allow you to configure the time between a hang
occurring and a panic being forced.
This document explains how to configure the time that elapses before the deadman code
forces a panic. It also gives pointers on how to collect information that can be used to
analyze why the system hung.

Steps to Follow
To capture a crash dump from a hung or soft-hung system, you can
enable the Solaris[TM] deadman timer.
The deadman timer can be used in the following configurations:

• Solaris[TM] 7 Operating System and higher

• Solaris 2.6 with patch 105181-06 or higher*

• Solaris 2.5.1 with patch 103640-21 or higher*


*Due to bug 4080160, it is likely unsafe to use deadman without the above-mentioned
patches.
Note: In Solaris 7 and below, the deadman timer can only be used for sun4u, sun4m and
sun4d architectures. In Solaris 8 and higher, it is available for all architectures (including
x86).
1. Enable savecore:

• For Solaris 2.6 and previous releases, see Technical Instruction: < Solution:
213750 >.

• For Solaris 7 and later releases, see Technical Instruction: < Solution: 206669 >.
2. Enable the deadman timer within the kernel by adding the following line to /etc/system,
then reboot the system:
set snooping=1
On Solaris[TM] 10 Operating System, the deadman timer can only be enabled globally; all
zones inherit the setting from the global zone.
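As a quick check before rebooting, the following minimal Python sketch scans the text of /etc/system for "set" directives and reports whether snooping is enabled. The function names are hypothetical, and the parsing is deliberately simplified: it assumes comment lines begin with "*" and only handles plain "set name=value" entries with a literal value.

```python
import re

# Matches simple "set name=value" or "set name = value" directives.
_SET_RE = re.compile(r"^\s*set\s+([\w:]+)\s*=\s*(\S+)\s*$")


def parse_system_tunables(text):
    """Return a dict of tunables found in /etc/system-style text."""
    tunables = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (assumed to start with '*').
        if not stripped or stripped.startswith("*"):
            continue
        match = _SET_RE.match(stripped)
        if match:
            tunables[match.group(1)] = match.group(2)
    return tunables


def deadman_enabled(text):
    """True if the text contains an active 'set snooping=1' directive."""
    return parse_system_tunables(text).get("snooping") == "1"
```

For example, deadman_enabled(open("/etc/system").read()) would report whether the setting is present in the file; the setting only takes effect after the reboot.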
3. Specify the time after which the deadman timer should fire:

For Solaris[TM] 8 Operating System and later:


The deadman code is executed once per second. The snoop_interval tunable parameter has
a default value of 50000000 (fifty million microseconds). This corresponds to a deadman
timer of 50 seconds. There is normally no need or reason to change this from the
default on Solaris 8 and later.
To set the deadman timer to 90 seconds specify the following in /etc/system, then reboot
the system:
set snoop_interval = 90000000
This results in: 90000000 (ninety million) microseconds = 90 seconds
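The conversion above is a plain seconds-to-microseconds multiplication. A minimal Python sketch, assuming Solaris 8 or later (where snoop_interval is the full timeout); the function name is hypothetical:

```python
MICROSECONDS_PER_SECOND = 1_000_000


def snoop_interval_line(timeout_seconds):
    """Build the /etc/system line for a deadman timeout on Solaris 8+.

    On these releases snoop_interval is the whole timeout, in microseconds.
    """
    if not isinstance(timeout_seconds, int) or timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    return f"set snoop_interval = {timeout_seconds * MICROSECONDS_PER_SECOND}"
```

Calling snoop_interval_line(90) yields the line shown above, and snoop_interval_line(50) reproduces the default value.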

For Solaris 7 and earlier on sun4u architecture:


The deadman code is executed once per snoop_interval. The snoop_interval tunable
parameter has a default value of 50000000 (fifty million microseconds or 50 seconds). The
system will panic after the deadman code has executed ten times, that is, the panic occurs
after 500 seconds of inactivity. 500 seconds is normally considered too long to wait, and
tuning it to 90 seconds is recommended for Solaris 7 and earlier.
To set the deadman timer to 90 seconds specify the following in /etc/system, then reboot
the system:
set snoop_interval = 9000000
This results in: 10 * 9000000 (nine million) microseconds = 90 seconds
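Because the panic on these releases occurs only after ten executions of the deadman code, the value to set is one tenth of the desired timeout. A minimal Python sketch of that arithmetic, for sun4u on Solaris 7 and earlier; the function and constant names are hypothetical:

```python
MICROSECONDS_PER_SECOND = 1_000_000
# On sun4u with Solaris 7 and earlier, the panic is forced after the
# deadman code has executed ten times.
EXECUTIONS_BEFORE_PANIC = 10


def legacy_sun4u_snoop_interval(timeout_seconds):
    """snoop_interval (microseconds) giving the desired total timeout."""
    return timeout_seconds * MICROSECONDS_PER_SECOND // EXECUTIONS_BEFORE_PANIC


def legacy_sun4u_panic_delay(snoop_interval_us):
    """Seconds of inactivity before the panic for a given snoop_interval."""
    return snoop_interval_us * EXECUTIONS_BEFORE_PANIC / MICROSECONDS_PER_SECOND
```

With these definitions, a 90-second timeout maps to 9000000, and the default of 50000000 corresponds to a 500-second delay.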

For Solaris 7 and earlier on sun4m architecture:


The deadman code is executed every 2 seconds. This happens 30 times before a panic is
forced, which means the deadman fires after 60 seconds of inactivity. The snoop interval
is not tunable.

For Solaris 7 and earlier on sun4d architecture:


The implementation of deadman code differs from all other architectures. The deadman is
fired directly from a level 14 interrupt handler as soon as a hanging clock thread is
detected. The inactivity interval is not tunable. This means the deadman is fired after less
than 1 second of inactivity.
4. When the next hang occurs, the deadman timer should be triggered:
On Solaris 8 and newer releases, a panic is initiated, creating a core file with the
following panic string:
deadman: timed out after %d seconds of clock inactivity
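When checking console logs or crash dump summaries for this panic, the message can be matched with a regular expression derived from the format string above. A minimal Python sketch; the function name is hypothetical, and the text around the message is assumed to vary:

```python
import re

# Pattern built from the panic string format
# "deadman: timed out after %d seconds of clock inactivity".
_DEADMAN_RE = re.compile(
    r"deadman: timed out after (\d+) seconds of clock inactivity"
)


def deadman_timeout_from_panic(panic_string):
    """Return the timeout in seconds if this is a deadman panic, else None."""
    match = _DEADMAN_RE.search(panic_string)
    return int(match.group(1)) if match else None
```

A return value of None indicates that the panic was not forced by the deadman timer.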
On Solaris 7 and previous releases, the system will drop to the ok prompt. At this point,
any specific debugger commands can be run to examine the current state of the system. Of
particular interest are:

• .registers to dump the registers

• ctrace to dump the current stack backtrace


When data collection is complete (make sure to write down the results, since they will not
be recorded on the system), attempt to take a core dump by typing:
sync
at the ok prompt.
On reboot, the system dump will be saved to a "vmcore.*" file in the directory
/var/crash/`uname -n`.
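To confirm after the reboot that a dump was saved, the crash directory can be listed for "vmcore.*" files. A minimal Python sketch, assuming the default location described above; the function name and its parameters are hypothetical, and os.uname().nodename stands in for `uname -n`:

```python
import glob
import os


def crash_dump_files(crash_root="/var/crash", hostname=None):
    """Return the sorted vmcore.* files under <crash_root>/<hostname>."""
    # Default to the node name, which is what `uname -n` prints.
    host = hostname or os.uname().nodename
    pattern = os.path.join(crash_root, host, "vmcore.*")
    return sorted(glob.glob(pattern))
```

An empty list suggests that savecore was not enabled or that the dump did not complete.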

Internal Comments
NOTE: On a Sun Fire[TM] 12K/E25K server, the following steps must also
be performed.
Create a copy of the platform level dsmd tuning file, $SMSETC/config/dsmd_tuning.txt,
and place it into the domain specific configuration directory,
$SMSETC/config/Q/dsmd_tuning.txt
Then, in the DOMAIN SPECIFIC dsmd tuning file $SMSETC/config/Q/dsmd_tuning.txt,
set the following parameters:
obp_heartbeat_time = 1200
os_heartbeat_time = 1200
domain_asr = 0
Do NOT modify the platform level dsmd_tuning.txt file
$SMSETC/config/dsmd_tuning.txt, as it is used for ALL domains, and should not be
changed.
