Troubleshooting Core Dumps

Postings may contain unverified user-created content and change frequently.
The content is provided as-is and is not warrantied by Cisco.
Core dumps occur when a Linux process experiences a fault. This results in an outage of
the affected process or service. The process or service must restart to recover. During
these incidents, the server may remain up, but certain services may experience a brief
outage. This document covers troubleshooting core dumps and backtraces that may occur
on Communications Manager (CM or CUCM), Unity Connection (UC), Cisco Emergency
Responder (CER), Cisco Unified Presence Server (CUPS), Cisco Unified Contact Center
Express (UCCX or IPCC Express), or any product based on Cisco's Voice Operating System
(VOS) appliance model.

Identifying Core Dump Events
Listing Core Dump Files
Performing Core Analysis
Understanding the Backtrace of a Core File
Example 1: File Size Limit Exceeded
Example 2: Core when memory leak reaches maximum process memory size
Example 3: Core Stack Corruption
Cisco Bug Toolkit Search
Troubleshooting Intentional Aborts
Useful Information for Creating TAC Service Requests
Identifying Core Dump Events
Two of the most common ways to identify the occurrence of a core dump in
CallManager are the following:

CoreDumpFileFound RTMT alert messages found in Alert Central

Within RTMT Alert Central, more detail on the specific application that generated the
core can be found by right-clicking on the alert selection. An example of the core dump
alert details can be found below:

Application Event log alert messages indicating a core dump has occurred:
May 15 05:32:09 ccm-pub local7 2 : 0: May 15 09:32:08.865 UTC :
%CCM_LPM-LPMTCT-2-CoreDumpFileFound: The new core dump file(s) have been found in the
system. TotalCoresFound:1 CoreDetails:The following lists up to 6 cores dumped by
corresponding applications. Core1:Cisco CallManager (core.10499.6.ccm.1273915815) App
ID:Cisco Log Partition Monitoring Tool Cluster ID: Node ID: ccm-pub
May 15 05:32:15 ccm-pub local7 2 : 138: May 15 09:32:15.231 UTC : %CCM_RTMT-RTMT-2-RTMT-
ERROR-ALERT: RTMT Alert Name:CoreDumpFileFound Detail:CoreDumpFileFound TotalCoresFound :
1 CoreDetails : The following lists up to 6 coresdumped by corresponding applications. Core1 : Cisco
CallManager(core.10499.6.ccm.1273915815) AppID : Cisco Log Partition Monitoring Tool ClusterID : NodeID :
ccm-pub . The alarm is generated on Sat May 15 05:32:08 EDT 2010. AppID:Cisco AMC Service Cluster ID:
Node ID:ccm-pub
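
The core file name embedded in these alerts can be pulled out programmatically. The Python sketch below is illustrative only: the function name find_cores is hypothetical, and the reading of the file name as core.<pid>.<signal>.<process>.<epoch seconds> is an assumption inferred from the sample above, not something this document states.

```python
import re
from datetime import datetime, timezone

# Assumed naming pattern (inferred from the sample, not documented here):
#   core.<pid>.<signal>.<process>.<epoch seconds>
CORE_NAME = re.compile(r"core\.(\d+)\.(\d+)\.([A-Za-z0-9_\-]+)\.(\d+)")


def find_cores(alert_text):
    """Return one dict per core file name found in an alert message."""
    cores = []
    for match in CORE_NAME.finditer(alert_text):
        pid, signal_number, process, epoch = match.groups()
        cores.append({
            "file": match.group(0),
            "pid": int(pid),
            "signal": int(signal_number),
            "process": process,
            # Interpreting the last field as a Unix timestamp is an assumption.
            "created_utc": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        })
    return cores


alert = (
    "TotalCoresFound:1 CoreDetails:The following lists up to 6 cores dumped by "
    "corresponding applications. Core1:Cisco CallManager "
    "(core.10499.6.ccm.1273915815) App ID:Cisco Log Partition Monitoring Tool"
)
for core in find_cores(alert):
    print(core["file"], core["process"], core["signal"],
          core["created_utc"].isoformat())
```

Under that assumption, the sample name decodes to process ccm, signal 6, and a creation time of 09:30:15 UTC on May 15, 2010, which is about two minutes before the alert timestamps in the log.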

Listing Core Dump Files
On the CallManager server in question, a list of core dumps can be obtained by issuing the
following command:

utils core list (CallManager version 5.x, 6.x)
utils core active list (CallManager version 7.x and later)

An example of 'utils core list' is provided below, where we observe the CCM service as the
core dump generator:

Output of the command 'utils core active list' is similar to the example depicted above, with
the exception of the inclusion of the "active" parameter. This parameter was added in later
CallManager releases to allow core file listing from the CM Inactive partition (previous CM
version on the system, if an upgrade has taken place) without the need to perform a version
switch and reboot. Instead of supplying "active" as the command line parameter, inactive
partition core file listing is performed via 'utils core inactive list'.

An example of 'utils core active list' is provided below:

Performing Core Analysis
Once the core dump instance has been identified via the list command, the next step
is to obtain the core file backtrace for review. This function is provided by the following
command:

utils core analyze <CoreFileName> (CallManager version 5.x, 6.x)
utils core active analyze <CoreFileName> (CallManager version 7.x and later)

An example of the 'utils core analyze' command is provided below, where we are supplying a
ccm service core file that was generated on 11/30/2009 at 11:11:50:

Like the 'utils core active list' command, one can also perform core file analysis on the
inactive partition via the 'utils core inactive analyze <CoreFileName>' command. This
feature is available in CallManager 7.x and later, and a screenshot of the 'utils core active
analyze' command is provided below:

In both examples, a warning is provided stating that this procedure will take a considerable
amount of I/O and may impact system performance. During the analysis process, the raw
core file is parsed and interpreted into a backtrace output that can be used to identify the
cause of the core dump.

The analysis process normally takes a minute or less to complete. The warning about
impact to system performance is a suggestion to run this command during a non-peak
time period to avoid a potential resource issue.
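
The version-dependent command forms described in the last two sections can be summarized in a small helper. This is a minimal Python sketch, not a Cisco tool: the function core_command and its parameters are hypothetical, and it only assembles the command strings quoted in this document for 5.x, 6.x, and 7.x and later.

```python
def core_command(action, major_version, partition="active", core_file=None):
    """Build the CLI string for listing or analyzing core files.

    action        -- "list" or "analyze"
    major_version -- CallManager major version, e.g. 6 or 7
    partition     -- "active" or "inactive" (7.x and later only)
    core_file     -- core file name, required for "analyze"
    """
    if action not in ("list", "analyze"):
        raise ValueError("action must be 'list' or 'analyze'")
    if partition not in ("active", "inactive"):
        raise ValueError("partition must be 'active' or 'inactive'")
    if action == "analyze" and not core_file:
        raise ValueError("analyze needs a core file name")

    if major_version >= 7:
        parts = ["utils", "core", partition, action]
    else:
        # 5.x and 6.x have no partition keyword in the command.
        if partition != "active":
            raise ValueError("inactive partition commands need 7.x or later")
        parts = ["utils", "core", action]

    if action == "analyze":
        parts.append(core_file)
    return " ".join(parts)


print(core_command("list", 6))
print(core_command("analyze", 7, core_file="core.10499.6.ccm.1273915815"))
print(core_command("list", 8, partition="inactive"))
```

The helper only formats text. Running the resulting command on the server, preferably outside peak hours as noted above, remains a manual step.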

Understanding the Backtrace of a Core File
The chief component of the core analysis process is retrieving the backtrace for review.
Once the analysis command has been executed, a section titled "Backtrace" will be
displayed on the command line, similar to the screenshot below:

The core backtrace output is composed of several stack frames (function calls), denoted
by #0, #1, #2, etc., where #0 is the most recent call. These lines show the chain of
function calls that was active at the time of the service fault. In many cases, these
backtrace signatures are a unique fingerprint that can identify a particular known or new
defect in CallManager.
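
To work with a backtrace as data rather than text, each frame line can be split into its index, address, and function name. The Python sketch below is one way to do that, assuming frames follow the "#N 0xADDRESS in FUNCTION (ARGS) ..." layout seen in the examples in this document. The name parse_frames is hypothetical, and wrapped continuation lines are simply skipped.

```python
import re

# "#<index> <address> in <function> (" ; the function name may contain
# spaces (for example "operator new"), so it is matched lazily up to "(".
FRAME = re.compile(
    r"^#(?P<index>\d+)\s+(?P<address>0x[0-9a-fA-F]+)\s+in\s+(?P<function>.+?)\s*\("
)


def parse_frames(backtrace):
    """Return (index, address, function) tuples for each frame line."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((
                int(match.group("index")),
                match.group("address"),
                match.group("function"),
            ))
    return frames


# Frame lines copied from the examples in this document.
sample = """\
#1 0x01276825 in raise () from /lib/tls/libc.so.6
#2 0x01278289 in abort () from /lib/tls/libc.so.6
#7 0x0050b86c in operator new () from /usr/local/cm/lib/libstlport.so.5.1
"""
for index, address, function in parse_frames(sample):
    print(index, address, function)
```

The ordered list of function names is usually the part worth comparing between two cores, since addresses and argument values can differ from one crash to the next.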
Example 1: File Size Limit Exceeded
Core was generated by `/usr/local/cm/bin/ccm'.
Program terminated with signal 25, File size limit exceeded.
#0 0x0067a211 in __write_nocancel () from /lib/tls/libc.so.6
#1 0x00616d0f in _IO_new_file_write () from /lib/tls/libc.so.6
#2 0x00615c6e in new_do_write () from /lib/tls/libc.so.6
#3 0x00615c06 in _IO_new_do_write () from /lib/tls/libc.so.6
#4 0x006164ba in _IO_new_file_sync () from /lib/tls/libc.so.6
#5 0x0060abbb in fflush () from /lib/tls/libc.so.6
#6 0x08271749 in dBProcs::addDevice (this=0xee6b1a0, deviceName=0xbc59c3e0
"SEP0022905BC978", deviceProtocol=0, deviceType=Device7941G) at dBProcs.cpp:7388

In this example, the process was attempting to write to a file. The write attempt generated
an exception, which in turn produced a core file. The cause, "Program terminated with
signal 25, File size limit exceeded.", is a direct match to
CSCsu94937 Multiple services core dumping with signal 25, File size limit exceeded.

Example 2: Core when memory leak reaches maximum process memory size
Memory leak in CCM process, resulting in intentional abort.

#0 0x00a157a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x01276825 in raise () from /lib/tls/libc.so.6
#2 0x01278289 in abort () from /lib/tls/libc.so.6
#3 0x0050d58b in __gnu_cxx::__verbose_terminate_handler () from
/usr/local/cm/lib/libstlport.so.5.1
#4 0x0050b2a1 in __cxxabiv1::__terminate () from /usr/local/cm/lib/libstlport.so.5.1
#5 0x0050b2d6 in std::terminate () from /usr/local/cm/lib/libstlport.so.5.1
#6 0x0050b41f in __cxa_throw () from /usr/local/cm/lib/libstlport.so.5.1
#7 0x0050b86c in operator new () from /usr/local/cm/lib/libstlport.so.5.1
#8 0x0a06bb2d in SdlProcessBase::operator new (size=102700) at
SdlProcessBase.cpp:105
#9 0x0a0014e2 in H245SessionManager::create (parentId={mSdlProcessName
= 0x0, mSdlNodeId = 4, mSdlAppId = 100, mSdlProcessNumber = 150,
mSdlProcessInstance = 2629}, vH245TerminalType=H245_Gateway,
vH245TransportConnectionMode=H245Client, vH245IpAddress=404699044,
vH245IpPort=40076, vTCPTos=96, vPassThruMSD=false, vTCSTimeout=10,
vFastStartInd=0, vFsAudioOutgoingLCN=0, vFsAudioIncomingLCN=0,
pktCaptureContext=0xbffab74d "", allowTCPKeepAlivesForH323=true) at
ProcessH245SessionManager.cpp:221
#10 0x08a5629c in H245Interface::start_Transition (this=0xbff99008, s=@0x5c70990) at
/vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:123
#11 0x08a99354 in H245Interface::fireSignal (this=0xbff99008, sdlSignal=@0x5c70990) at
/vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:175
#12 0x0a06c904 in SdlProcessBase::inputSignal (this=0xbff99008, rSignal=0x5c70990,
traceType=SdlSystemLog::SignalRouterThread, highPriority=0, normalPriority=0,
lowPriority=0, veryLowPriority=0, lazyPriority=0, dbUpdatePriority=0) at
#13 0x0a0746ce in SdlRouter::callProcess (this=0xe225ac0, _sdlSignal=0x5c70990,
_deleteSignal=@0x36b8d07, _traceType=SdlSystemLog::SignalRouterThread, _hp=0,
_np=0, _lp=0, _vlp=0, _lzp=0, _dbp=0) at SdlRouter.cpp:371
#14 0x0a0740f3 in SdlRouter::scheduler (sdlRouter=0xe225ac0) at SdlRouter.cpp:281
#15 0x05514bd7 in ACE_OS_Thread_Adapter::invoke (this=0xfe57a30) at
OS_Thread_Adapter.cpp:94
#16 0x054d5087 in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#17 0x00db73cc in start_thread () from /lib/tls/libpthread.so.0
#18 0x0131a96e in clone () from /lib/tls/libc.so.6

In this example, the CCM process cores due to a memory leak and subsequent resource
exhaustion. Backtraces that include calls to "operator new" are typically the result of a
memory leak. The process has reached the maximum amount of memory allowed by the
operating system, so a core is forced. It is not possible to identify the specific memory leak
from the core alone; the core only shows that a memory leak is the cause. Other methods
must be used to identify the source of the leak. Frequently this is possible by parsing SDL
traces to identify objects that are "Started" or "Created" and not subsequently "Stopped".
From the traces, the above core was eventually traced back to:
CSCte50152 Memory Leak in CCM due to Transient SIP Connections.

Example 3: Core Stack Corruption
Memory corruption results in corrupted stack with "??" characters in place of function calls.

#0 0x4e52500a in ?? ()
#1 0xaffb3070 in ?? ()
#2 0xaffb9084 in ?? ()
#3 0x030dc678 in ?? ()
#4 0x00000000 in ?? ()

In this example, a memory corruption incident occurred that resulted in the stack
being overwritten. In place of function calls, we observe "??" characters.
Unfortunately, a search against this backtrace alone will not correlate to a known defect. It
is recommended that the corresponding service log (e.g. ccm traces, tomcat logs) and the
complete core file be retrieved from the affected system for TAC review.
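
The three example signatures above can be told apart with a few simple checks, which may help when triaging many cores. The following Python sketch is a rough heuristic built only from the patterns shown in Examples 1 to 3. The function classify_backtrace and its labels are hypothetical, and a match is a hint for further investigation, not a diagnosis.

```python
import re

# Captures the function name of every frame, even if several frames
# ended up on one line.
FUNCTION = re.compile(r"#\d+\s+0x[0-9a-fA-F]+\s+in\s+(.+?)\s*\(")


def classify_backtrace(text):
    """Return a coarse label for a backtrace, based on Examples 1 to 3."""
    functions = FUNCTION.findall(text)
    # Example 3: every frame is unresolved.
    if functions and all(name == "??" for name in functions):
        return "stack corruption: collect service logs and the full core file"
    # Example 1: the termination reason names the cause directly.
    if "File size limit exceeded" in text:
        return "file size limit exceeded (signal 25)"
    # Example 2: a failed allocation that ends in abort().
    if "operator new" in functions and "abort" in functions:
        return "possible memory exhaustion: look for a leak in the traces"
    return "unclassified: search the backtrace for known defects"


print(classify_backtrace("#0 0x4e52500a in ?? () #1 0xaffb3070 in ?? ()"))
print(classify_backtrace(
    "#2 0x01278289 in abort () from /lib/tls/libc.so.6\n"
    "#7 0x0050b86c in operator new () from /usr/local/cm/lib/libstlport.so.5.1"
))
```

Anything that falls through to the last label is a candidate for a search against known defects using the function names in the backtrace.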

Cisco Bug Toolkit Search
Once a backtrace has been retrieved for the core dump event, the next step is to search the
Bug Toolkit for potential known defects. The following defect will be used for this example:

CSCta39769 UnicastBridgeControl Causes CUCM to Crash

#0 0x097a0850 in UnicastBridgeControl::removeConfResources (this=0x6a69f698) at
/vob/ccm/Common/Include/CallManager/TDCLCpShares.hpp:2622
#1 0x097ab5ae in UnicastBridgeControl::star_StationClose (this=0x6a69f698,
s=@0x6a981938)
at ProcessUnicastBridgeControl.cpp:2193
#2 0x097bff64 in UnicastBridgeControl::fireSignal (this=0x6a69f698,
sdlSignal=@0x6a981938) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174
#3 0x09e4ae58 in SdlProcessBase::inputSignal (this=0x6a69f698, rSignal=0x6a981938,
traceType=SdlSystemLog::SignalRouterThread, highPriority=0, normalPriority=0,
lowPriority=0, veryLowPriority=0, lazyPriority=0, dbUpdatePriority=0) at
#4 0x09e52c1a in SdlRouter::callProcess (this=0xde9bcc8, _sdlSignal=0x6a981938,
_deleteSignal=@0x324bd97, _traceType=SdlSystemLog::SignalRouterThread, _hp=0,
_np=0,
_lp=0, _vlp=0, _lzp=0, _dbp=0) at SdlRouter.cpp:372
#5 0x09e5263f in SdlRouter::scheduler (sdlRouter=0xde9bcc8) at SdlRouter.cpp:282
#6 0x00a00ef3 in ACE_OS_Thread_Adapter::invoke (this=0x10b70b90) at
#7 0x009c1abf in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#8 0x003bf371 in start_thread () from /lib/tls/libpthread.so.0
#9 0x01339ffe in clone () from /lib/tls/libc.so.6

The following line will be used to perform initial searching in the Bug Toolkit:

#2 0x097bff64 in UnicastBridgeControl::fireSignal (this=0x6a69f698,
sdlSignal=@0x6a981938) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174
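
Values such as frame addresses and pointer arguments differ from one crash to the next, so they are usually dropped before a frame is used as a search string. The Python sketch below shows one possible reduction. The function to_search_terms is hypothetical, and the choice to drop the frame number, the address, and the whole argument list is an assumption; the exact search string used in this document's example is not reproduced here.

```python
import re


def to_search_terms(frame):
    """Reduce a backtrace frame to text that is stable across crashes."""
    # Join a frame that was wrapped over several lines.
    text = " ".join(frame.split())
    # Drop the leading frame number and code address.
    text = re.sub(r"^#\d+\s+0x[0-9a-fA-F]+\s+in\s+", "", text)
    # Drop the first argument list, which carries pointer values.
    # Nested parentheses inside the arguments are not handled.
    text = re.sub(r"\([^()]*\)", "", text, count=1)
    return " ".join(text.split())


frame = (
    "#2 0x097bff64 in UnicastBridgeControl::fireSignal (this=0x6a69f698,\n"
    "sdlSignal=@0x6a981938) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174"
)
print(to_search_terms(frame))
```

If the reduced string returns no matches, searching on the function name alone is a reasonable fallback.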

In the search example screenshot, unique memory location identifiers have been removed
from the search statement to ensure that matches are found. It may also be necessary to
refine the search criteria to a specific CUCM version if no matches are presented after the
search attempt, as shown below:

Software version 7.0 was selected in the modified search above to narrow down to a specific
subset of defects applicable to CUCM. With the search re-submitted for defects related to
version 7.0, the following results are displayed:

Troubleshooting Intentional Aborts
Core dumps that include the "IntentionalAbort" statement indicate a system resource issue
that was responsible for the service fault. The following ccm service core dump backtrace
example will be used to demonstrate steps involved in troubleshooting intentional aborts:

#0 0x001627a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00d64815 in raise () from /lib/tls/libc.so.6
#2 0x00d66279 in abort () from /lib/tls/libc.so.6
#3 0x084c4e7a in preabort () at ProcessCMProcMon.cpp:101
#4 0x084c4e92 in IntentionalAbort (reason=0xa9fdbdc "CallManager's timers appear
incorrect. This may be due to CPU or blocked function. Attempting to restart
CallManager.") at ProcessCMProcMon.cpp:106
#5 0x084c66c3 in CMProcMon::verifySdlTimerServices () at ProcessCMProcMon.cpp:843
#6 0x084c7035 in CMProcMon::callManagerMonitorThread (cmProcMon=0xec122d0) at
ProcessCMProcMon.cpp:439
#7 0x0107e5fb in ACE_OS_Thread_Adapter::invoke (this=0xf3ef3b8) at
#8 0x01040cbf in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#9 0x002dc3cc in start_thread () from /lib/tls/libpthread.so.0
#10 0x00e061ae in clone () from /lib/tls/libc.so.6

The RIS Data Collector Perfmonlog data covering the time of the core dump alert should be
retrieved via RTMT from the CUCM node that experienced the core dump. Using Windows
Performance log viewer, the process CPU utilization counters are reviewed first, as shown
below:

In the screenshot above, it is observed that CPU utilization appears stable prior to the
core dump incident. CPU utilization dips during the crash as resources are released. In
troubleshooting a potential CPU utilization issue, the concern would be a trend in CPU
increase leading up to the core dump incident.
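
A trend of the kind described above can also be checked numerically once counter samples have been exported from the performance logs. The Python sketch below fits a least-squares line through equally spaced samples and reports its slope. The function trend_slope is hypothetical, the sample values are made up for illustration, and how the samples are exported from the logs is outside the scope of this sketch.

```python
def trend_slope(samples):
    """Least-squares slope of equally spaced samples (units per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Made-up values for illustration only; they are not taken from this document.
stable_counter = [22, 25, 21, 24, 23, 22]
rising_counter = [40, 44, 49, 55, 58, 64]

print("stable:", trend_slope(stable_counter))
print("rising:", trend_slope(rising_counter))
```

A slope near zero matches the stable pattern, while a clearly positive slope in the period before the core dump is the pattern of concern. The same check applies to any counter series, including the memory counters reviewed next.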

The next component to examine in the Perfmon data is percentage VM used by the system.
In the current example, it is observed that this counter is particularly high for the time period
leading up to the core dump incident:

Next, the VMSize counters for the individual processes are examined to determine what
caused the gradual increase in memory utilization on the system. In this example, it was
found that the VMSize counter for the CCM process is relatively high and sloping upwards.
This indicates that CCM cored due to a memory leak:
Useful Information for Creating TAC Service Requests
When opening a new TAC service request to troubleshoot a core dump incident, the
following information is useful to provide to TAC to expedite the process:

Full CallManager version in use (e.g. 7.1.3.32900-4)
Date/Time of the core dump incident
  The Application Event log is a useful source for this information
  Provide output of the 'utils core list' command for absolute timestamps and core file names
Offending service that generated the core dump (e.g. CCM, CEF, Tomcat)
Core file backtrace output
Core file
Service logs for offending process
  e.g. for a CCM core dump, provide Cisco CallManager traces for the time period of the incident
RIS Data Collector Perfmonlog
  See the Troubleshooting Intentional Aborts section
