Outline
Crash dumps and tools
Analysis basics
  IRQLs
  Stacks
Manual analysis
  Stack trashes
Hung Systems
When there is no crash dump
Introduction
Many systems administrators ignore Windows crash dump options
"I didn't know I could analyze crashes"
"Crash analysis is too hard"
"A crash dump won't tell me anything anyway"
INF402
There are lots of third-party drivers! From the Online Crash Analysis (OCA) database:
55,000 unique drivers (28,000 in 2004), with 24 new per day
220,000 total drivers (130,000 in 2004), with 98 revised per day
KeBugCheckEx:
Turns off interrupts
Tells other CPUs to stop
Paints the blue screen
Notifies registered drivers of the crash
If a dump is configured (and it is safe to do so), writes the dump to disk
Many Devices
Over 1,263,300 distinct Plug and Play (PnP) IDs (680,000 in 2004)
1,600 PnP IDs added every day
Bugcheck Codes
Bugcheck codes are shared by many components and drivers
There are about 150 defined stop codes Two common ones are:
(DRIVER_)IRQL_NOT_LESS_OR_EQUAL (0x0A, or 0xD1 for the DRIVER_ variant): usually an invalid memory access
UNEXPECTED_KERNEL_MODE_TRAP (0x7F) and KMODE_EXCEPTION_NOT_HANDLED (0x1E): generated by executing garbage instructions, usually caused when a stack is trashed
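As a quick reference, the stop codes named above can be kept in a small lookup table. This is only a sketch: the helper name describe_bugcheck is made up for illustration, and the table covers just the four codes mentioned here, not the roughly 150 defined ones.

```python
# Stop codes named in this section. The numeric values are the
# documented bug check codes; the table is deliberately incomplete.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_bugcheck(code: int) -> str:
    """Format a stop code the way it appears on the blue screen,
    followed by its symbolic name (hypothetical helper)."""
    name = BUGCHECK_NAMES.get(code, "UNKNOWN_BUGCHECK")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    for code in (0x0A, 0xD1, 0x7F, 0x1E):
        print(describe_bugcheck(code))
```

The same code can come from many different drivers, which is why the table maps a code to a name and not to a culprit.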
Often, the bugcheck code and parameters are not enough to solve the crash
Need to examine the crash dump
Kernel
Writes OS memory, not process memory
Most crash debugging doesn't involve looking at process memory anyway
Useful for large-memory systems
Overwrites every time
Default on Windows Vista
Full
Writes all of RAM
Overwrites every time
Minidumps
On Windows XP, Windows Server 2003, and Windows Vista, a minidump is always created, even if the system is set to full or kernel dump
Can extract a minidump from a kernel or full dump using the debugger .dump /m command
To analyze, requires access to the images on the system that crashed
At a minimum, must have access to Ntoskrnl.exe
Microsoft Symbol Server now has images for Windows XP and later
Set image path to same as symbol path (covered later)
Reasons a crash dump might not be written:
The crash corrupted components involved in the dump process
Spontaneous reboot
Paging file on boot volume is too small
Not enough free space for extracted dump
Hung system
We'll cover how to troubleshoot these problems later
At The Reboot
Session Manager process (\Windows\system32\smss.exe) initializes the paging file by calling NtCreatePagingFile
Winlogon calls NtQuerySystemInformation to tell whether there's a dump to extract
If there's a dump, Winlogon executes SaveDump (\Windows\system32\savedump.exe)
SaveDump writes the contents to the appropriate file (for example, Memory.dmp)
Writes an event to the System event log
On Windows XP or later, checks to see if Windows Error Reporting should be invoked
Dumpprep then:
Generates an XML description of the system version, drivers present, and loaded Plug and Play drivers
Depending on the configuration, displays the message box (if enabled) to send the dump
Submits the dump for automatic analysis:
1. XML description of system version, drivers present, loaded Plug and Play drivers
2. Minidump file
OCA can't tell you when it suspects a driver that hasn't been conclusively identified as being responsible by hand analysis
IRQLs
IRQL stands for Interrupt Request Level
Each CPU maintains its IRQL independently
Software and hardware interrupts map to IRQLs
When a CPU raises its IRQL to a level, all interrupts at that level and below are masked for that CPU
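Figure: IRQL levels from higher to lower: SYNCH_LEVEL, ..., DEVICE_IRQL 2, DEVICE_IRQL 1, DISPATCH_LEVEL, APC_LEVEL, PASSIVE_LEVEL. Levels at or below the CPU's current IRQL are masked; levels above it are unmasked.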
If analyzing a minidump, must also configure the image path to point to the location of the images (File->Image File Path)
Use the same string as for the symbol server (Windows XP and beyond)
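The masking rule can be modelled in a few lines. This is a toy sketch, not how the HAL implements it: the Cpu class and its method names are hypothetical, and the two device levels are illustrative values. Only the ordering PASSIVE_LEVEL < APC_LEVEL < DISPATCH_LEVEL < device levels matters here.

```python
from enum import IntEnum


class Irql(IntEnum):
    PASSIVE_LEVEL = 0
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2
    DEVICE_IRQL_1 = 3  # illustrative device level
    DEVICE_IRQL_2 = 4  # illustrative device level


class Cpu:
    """Toy model: each CPU tracks its own IRQL independently."""

    def __init__(self) -> None:
        self.irql = Irql.PASSIVE_LEVEL

    def raise_irql(self, new_irql: Irql) -> Irql:
        """Raise the IRQL and return the previous level for a later restore."""
        if new_irql < self.irql:
            raise ValueError("raise_irql cannot lower the IRQL")
        previous = self.irql
        self.irql = new_irql
        return previous

    def lower_irql(self, previous: Irql) -> None:
        """Restore a level previously returned by raise_irql."""
        if previous > self.irql:
            raise ValueError("lower_irql cannot raise the IRQL")
        self.irql = previous

    def is_masked(self, interrupt_level: Irql) -> bool:
        """An interrupt is masked when its level is at or below the
        CPU's current IRQL. No interrupts are assigned to PASSIVE_LEVEL,
        so nothing is masked while the CPU runs at that level."""
        return interrupt_level <= self.irql
```

Raising one Cpu instance to DISPATCH_LEVEL masks APC_LEVEL and DISPATCH_LEVEL interrupts on that instance only; a second Cpu instance stays at PASSIVE_LEVEL.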
Key IRQLs
PASSIVE_LEVEL:
No interrupts are masked
User-mode code always executes at PASSIVE_LEVEL
Kernel-mode code executes at PASSIVE_LEVEL most of the time
DISPATCH_LEVEL:
Highest software interrupt level
Scheduler is off
Page faults cannot be handled and are illegal operations
Stacks
Each thread has a user-mode and a kernel-mode stack
The user-mode stack is usually 1 MB on x86
The kernel-mode stack is typically 12 KB (20 KB for GUI threads) on x86 systems
Stack Frames
Figure: stack layout for three nested calls (Function 1, Function 2, Function 3), with higher addresses at the top. Each stack frame holds, from higher to lower addresses: the parameters (last parameter at the highest address), the return address, the saved frame pointer, and the local variables.
Calling Conventions
Stacks are easy to interpret if functions use standard calling conventions
Other calling conventions make the stack hard to figure out:
No frame pointer
Register arguments (fast calls)
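The frame layout is what makes a stack walk possible: each saved frame pointer points at the caller's frame, and the return address sits one word above it. A minimal sketch, assuming 32-bit x86 frames and modelling memory as a dictionary; the addresses and the walk_frames helper are invented for illustration.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def walk_frames(memory: dict, frame_pointer: int, max_frames: int = 64) -> list:
    """Follow the chain of saved frame pointers and collect the return
    address stored one word above each of them. The walk stops at a
    null frame pointer or after max_frames frames."""
    return_addresses = []
    while frame_pointer != 0 and len(return_addresses) < max_frames:
        return_addresses.append(memory[frame_pointer + WORD])
        frame_pointer = memory[frame_pointer]  # saved caller frame pointer
    return return_addresses


# Hypothetical stack: Function 1 called Function 2, which called Function 3.
memory = {
    0x1000: 0x1020, 0x1004: 0x401234,  # Function 3 frame, returns into Function 2
    0x1020: 0x1040, 0x1024: 0x401100,  # Function 2 frame, returns into Function 1
    0x1040: 0x0000, 0x1044: 0x401000,  # Function 1 frame, outermost
}

if __name__ == "__main__":
    print([hex(a) for a in walk_frames(memory, 0x1000)])
```

When a function omits the frame pointer, the chain is broken at that frame and a walk like this one produces wrong or missing entries, which is why such stacks are hard to interpret.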
NotMyFault.exe
In order to demonstrate common crash scenarios, Mark wrote NotMyFault.exe
Download from http://www.sysinternals.com/files/notmyfault.zip
It loads MyFault.sys
MyFault.sys has an IOCTL interface that implements different bugs
Paged buffers that are marked not present but are touched at IRQL >= DISPATCH_LEVEL result in the DRIVER_IRQL_NOT_LESS_OR_EQUAL bug check
Memory Manager calls KeBugCheckEx from the page fault handler
The IRQL is not less than or equal to the maximum IRQL at which the operation is legal (which is < DISPATCH_LEVEL)
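The rule reduces to a single comparison in the page fault path. The following is a simplified model, not the Memory Manager's code: handle_page_fault and the BugCheck exception are hypothetical names, and real fault handling involves far more state.

```python
DISPATCH_LEVEL = 2
DRIVER_IRQL_NOT_LESS_OR_EQUAL = 0xD1


class BugCheck(Exception):
    """Stands in for KeBugCheckEx in this model."""

    def __init__(self, code: int, address: int, irql: int) -> None:
        super().__init__(f"STOP 0x{code:08X} at 0x{address:08X}, IRQL {irql}")
        self.code = code
        self.address = address
        self.irql = irql


def handle_page_fault(address: int, current_irql: int, page_present: bool) -> str:
    """Decide what happens when code touches `address`."""
    if page_present:
        return "no fault"
    # Servicing a fault may require waiting for disk I/O, which needs the
    # scheduler. At DISPATCH_LEVEL and above the scheduler is off, so the
    # fault cannot be handled and the system is bug-checked instead.
    if current_irql >= DISPATCH_LEVEL:
        raise BugCheck(DRIVER_IRQL_NOT_LESS_OR_EQUAL, address, current_irql)
    return "paged in"
```

Touching a not-present page at PASSIVE_LEVEL or APC_LEVEL is simply paged in; the same access at DISPATCH_LEVEL raises the bug check.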
Automated Analysis
When you open a crash dump with Windbg or Kd you get a basic crash analysis:
Stop code and parameters
A guess at the offending driver
The analysis is the result of the automated execution of the !analyze debugger command
!analyze uses heuristics to walk up the stack and determine which driver is the likely cause of the crash
Followup is taken from the optional triage.ini file
Crash Transformation
Many crashes can't be analyzed
The victim crashed the system, not the criminal
The analyzer may point at Ntoskrnl.exe or Win32k.sys or other Windows components
Or, you may get many different crash dumps all pointing at different causes
Your goal isn't to analyze impossible crashes
It's to try to make an unanalyzable crash into one that can be analyzed
Don't trust blame of ntoskrnl, win32k, hal, ntfs or other core Windows components
Run Verifier.exe
Choose Create Custom Settings
Choose Select Individual Settings from a List
Enable all options except Low Resource Simulation
First, try any suspicious drivers (recently updated, known to be problematic, etc.)
If crashes are still un-analyzable, try enabling verification on all third-party drivers and/or all unsigned drivers
As a last resort, enable verification on groups of 10-20 drivers at a time
Run the Windows Memory Diagnostic
The following crash examples demonstrate the Driver Verifier making un-analyzable crashes into ones that point at the problem:
Buffer overflow
System code overwrite
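For the last-resort step of verifying 10-20 drivers at a time, the driver list has to be split into batches. A small sketch of that bookkeeping; verifier_batches is a hypothetical helper, the driver names are placeholders, and the default batch size of 15 is just a value inside the suggested range.

```python
def verifier_batches(drivers: list, batch_size: int = 15) -> list:
    """Split a driver list into consecutive groups so that each group
    can be put under Driver Verifier in turn."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [drivers[i:i + batch_size] for i in range(0, len(drivers), batch_size)]


if __name__ == "__main__":
    # Placeholder names; a real list would come from the loaded-driver list.
    drivers = [f"driver{n:02d}.sys" for n in range(35)]
    for number, batch in enumerate(verifier_batches(drivers), start=1):
        print(f"batch {number}: {len(batch)} drivers, first is {batch[0]}")
```

If a batch produces an analyzable crash, the culprit is somewhere in that group and the group can be split further.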
Buffer Overruns
Result when a driver goes past the end (overrun) or the beginning (underrun) of a buffer
Usually detected when overwritten data is referenced
Another driver or the kernel makes the reference
There can be a long delay between corruption and detection
Note that you might have to run several times, since a crash will occur only if:
The kernel references the corrupted pool structures
A driver references the corrupted buffer
Figure: a buffer and a signature in page n+1, with pages n and n+2 marked invalid (higher addresses at the top).
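The layout in the figure can be simulated to show why it turns a silent corruption into an immediate, attributable failure. This is a sketch under assumptions: the buffer is placed at the very end of its page, the rest of the page is filled with a signature byte, and the neighbouring pages are invalid. The GuardedBuffer class, the 4 KB page size, and the signature value are all illustrative.

```python
PAGE_SIZE = 4096
SIGNATURE = 0xA5  # arbitrary fill byte for this model


class GuardedBuffer:
    """Buffer aligned to the end of one page, with a signature fill in
    front of it and invalid pages on both sides."""

    def __init__(self, size: int) -> None:
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("buffer must fit in one page")
        self.size = size
        self.start = PAGE_SIZE - size  # buffer ends exactly at the page end
        self.page = bytearray([SIGNATURE] * PAGE_SIZE)
        self.page[self.start:] = bytes(size)

    def write(self, offset: int, value: int) -> None:
        """Write one byte at `offset` relative to the buffer start."""
        index = self.start + offset
        if index < 0 or index >= PAGE_SIZE:
            # The access lands in a neighbouring invalid page: it faults
            # at once, in the code that made the bad access.
            raise MemoryError("access to invalid page")
        self.page[index] = value

    def signature_intact(self) -> bool:
        """Check the fill in front of the buffer; a change means an underrun."""
        return all(byte == SIGNATURE for byte in self.page[:self.start])
```

An overrun by a single byte raises immediately, while an underrun within the page is caught only when the signature is checked, which mirrors the point that some corruptions are detected later than others.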
Code Overwrites
Caused when a bug results in a wild pointer
A wild pointer that points at invalid memory is easily detected
A wild pointer that points at data is similar to a buffer overrun
Might not cause a problem for a long time
Crash makes it look like it's something else's fault
System code write protection catches code overwrites, but it's not on if:
It's a Windows 2000 system with > 127 MB of memory
It's a Windows XP or Windows Server 2003 system with > 255 MB
In other words, it's off on most systems
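The thresholds above amount to a small predicate. The sketch below encodes only the limits stated on this slide; the function name and the OS strings used as keys are assumptions made for the example.

```python
# Largest memory size (in MB) at which system code write protection
# is still on, per the thresholds listed above.
PROTECTION_LIMIT_MB = {
    "Windows 2000": 127,
    "Windows XP": 255,
    "Windows Server 2003": 255,
}


def code_write_protection_on(os_name: str, memory_mb: int) -> bool:
    """Return True if system code write protection is on by default
    for the given OS and memory size (hypothetical helper)."""
    if os_name not in PROTECTION_LIMIT_MB:
        raise ValueError(f"no threshold known for {os_name!r}")
    return memory_mb <= PROTECTION_LIMIT_MB[os_name]
```

For example, a Windows XP system with 512 MB falls above the 255 MB limit, so a wild pointer can overwrite system code without being caught at the moment of the write.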
Rerun NotMyFault
Crash occurs immediately and even the blue screen points at MyFault.sys
!analyze shows the address of the write and the target (NtReadFile)
Manual Analysis
Sometimes !analyze isn't enough
Doesn't tell you anything useful
You want to know what was happening at the time of the crash
Useful commands:
List loaded drivers: lm kv
Make sure drivers are all recognized and up to date
Stack Trashing
An example of a crash requiring manual analysis is a stack trash
Stack trashes have several possible causes:
A driver pushing things on the stack causes the stack to overflow
A driver overruns a stack-allocated buffer
Look deeper
!thread shows an outstanding IRP
!irp <irp> shows that myfault.sys was the target of the IRP
Hung Systems
Sometimes a system becomes unresponsive
Keyboard and mouse freeze
Grinding to a halt
Storage stack resource deadlock
Debugger should connect and display the bugcheck code
Type !analyze -v, and if necessary, perform additional analysis commands as described earlier
To save a complete memory dump for offline analysis, use .dump (or .dump /f to capture a full dump)
Note: this will be slow over a serial cable
Analyzing a Hang
Then attempt to determine reason for hang. (This is the hard part.)
Use !thread to see what's running and check the stack
Check each CPU by using the ~ command, for example, ~0, ~1
Use !locks to look at possible deadlocks
Use !irql to see the previous IRQL (Windows Server 2003 and later)
If you can't figure it out but want to save it for later analysis:
Use .crash to force a crash
Or .dump to save the current state of the system in a dump file
You can also get a dump of a live system with LiveKd (free download from Sysinternals.com)
Use it to run Windbg or Kd
Use .dump to snapshot the live system
More Information
Windows Internals, 4th Edition
Chapter 10: Crash Dump Analysis
The help file which is installed with Debugging Tools for Windows
Knowledge Base Articles
http://www.microsoft.com/ddk/debugging
Resources
Technical Chats and Webcasts
http://www.microsoft.com/communities/chats/default.mspx
http://www.microsoft.com/usa/webcasts/default.asp
Other books:
http://www.microsoft.com/ddk/newbooks.asp
Virtual Labs
http://www.microsoft.com/technet/traincert/virtuallab/rms.mspx
Newsgroups
http://communities2.microsoft.com/communities/newsgroups/en-us/default.aspx
User Groups
http://www.microsoft.com/communities/usergroups/default.mspx
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.