ADM390 Microsoft® Windows® Crash Dump Analysis

1 ADM390 Microsoft® Windows® Crash Dump Analysis
Mark Russinovich Winternals Software David Solomon Expert Seminars

2 About The Speakers
Authors of: Inside Windows 2000, 3rd Edition (Microsoft Press); Inside Windows 2000/XP/2003 Interactive Internals Video Tutorial. Used by Microsoft for worldwide internal training.
David Solomon: Teaches Windows internals classes. Writes books and articles on Windows internals.
Mark Russinovich: Co-founder and Chief Software Architect for Winternals Software. Teaches Windows internals classes. Author of tools on

3 Outline What causes crashes? Crash dump options
Analysis with WinDbg/Kd Debugging hung systems Microsoft On-line Crash Analysis Using Driver Verifier Live kernel debugging Getting past a crash

4 Introduction
Many systems administrators ignore Windows NT/Windows 2000’s crash dump options: “I don’t know what to do with one.” “It’s too hard.” “It won’t tell me anything anyway.” Basic crash dump analysis is actually pretty straightforward. Even if only 1 out of 5 or 10 dumps tells you what’s wrong, isn’t it worth spending a few minutes?

5 Why Analyze Dumps? The debuggers and Microsoft Online Crash Analysis (OCA) often solve crashes Sometimes, however, they do not, so your analysis might tell you: What driver to disable, update, or replace with different hardware What OEM to send the dump to

6 What Causes Crashes?
The system crashes when a fatal error prevents further execution. Any kernel-mode component can crash the system. Drivers and the OS share the same memory space; therefore, any driver or OS component can, due to a bug, corrupt system memory. Note: This is for performance reasons and is the same on Linux, most Unix variants, VMS, etc.

7 What Are The Root Causes?
Anecdotal evidence suggests: Buggy drivers Bugs in the OS Hardware failure/error Cosmic rays

8 At The Crash
A component calls KeBugCheckEx, which takes five arguments: the stop code and four stop-code-defined parameters.
KeBugCheckEx: turns off interrupts; tells other CPUs to stop; paints the blue screen; notifies registered drivers of the crash.
If a dump is configured: verifies checksums; calls dump I/O functions.
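The five-argument shape described above can be modeled in a few lines. This is only an illustrative sketch: the `BugCheck` class and its `stop_line` method are hypothetical names introduced here, and the output format is an assumption that mimics the conventional 32-bit blue screen STOP line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugCheck:
    """Hypothetical record of the five values a caller passes to KeBugCheckEx."""
    stop_code: int
    params: tuple  # the four stop-code-defined parameters

    def __post_init__(self):
        if len(self.params) != 4:
            raise ValueError("a bug check carries exactly four parameters")

    def stop_line(self) -> str:
        # Render the stop code and parameters as zero-padded 32-bit hex values.
        args = ",".join(f"0x{p:08X}" for p in self.params)
        return f"*** STOP: 0x{self.stop_code:08X} ({args})"


crash = BugCheck(0x0A, (0x00000000, 0x00000002, 0x00000000, 0x804E5E4C))
print(crash.stop_line())
```

What each of the four parameters means depends on the stop code, which is why the record keeps them as an opaque tuple.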

9 Common Stop Codes
There are about 150 defined stop codes, shared by many components and drivers. Common ones include:
IRQL_NOT_LESS_OR_EQUAL (0x0A): usually an invalid memory access.
UNEXPECTED_KERNEL_MODE_TRAP (0x7F) and KMODE_EXCEPTION_NOT_HANDLED (0x1E): generated by executing garbage instructions; usually caused when a stack is trashed.
Documented in the Debugging Tools help file. Often, multiple articles in the Knowledge Base.
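As a rough illustration of working with the stop codes named on this slide, the sketch below parses a STOP line and looks up its name. The table holds only the three codes mentioned here, and `parse_stop_line` and `describe` are hypothetical helpers; the accepted line format is an assumption.

```python
import re

# Only the stop codes named on this slide; the debugger help file lists the rest.
STOP_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
}

# Assumed shape: "STOP: 0x<code> (<p1>,<p2>,<p3>,<p4>)"
STOP_LINE = re.compile(r"STOP:\s*0x([0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop_line(line):
    """Return (stop_code, [params]) extracted from a STOP line."""
    match = STOP_LINE.search(line)
    if match is None:
        raise ValueError("no STOP code found in line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(line):
    """Name the stop code in a STOP line, if it is in the small table above."""
    code, _ = parse_stop_line(line)
    return f"{STOP_NAMES.get(code, 'unlisted stop code')} (0x{code:X})"
```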

10 Dump Options
Complete memory dump (Windows NT 4, Windows 2000, Windows XP): full contents of memory written to <systemroot>\memory.dmp.
Kernel memory dump (Windows 2000, Windows XP, Server 2003): system memory written to <systemroot>\memory.dmp.
Small memory dump (Windows 2000, Windows XP, Server 2003): also called a minidump or triage dump; 64KB of summary written to <systemroot>\minidump\MiniMMDDYY-NN.dmp.
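The MiniMMDDYY-NN.dmp naming pattern is easy to build and parse when sorting a directory of minidumps. The helpers below are a hypothetical sketch; the `century` default of 2000 is an assumption, since the file name only carries a two-digit year.

```python
import re
from datetime import date

# Mini + month + day + two-digit year + "-" + two-digit sequence number.
MINIDUMP_NAME = re.compile(r"^Mini(\d{2})(\d{2})(\d{2})-(\d{2})\.dmp$", re.IGNORECASE)


def parse_minidump_name(name, century=2000):
    """Return (crash_date, sequence) for a MiniMMDDYY-NN.dmp file name."""
    match = MINIDUMP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a minidump file name: {name!r}")
    month, day, year, seq = (int(g) for g in match.groups())
    return date(century + year, month, day), seq


def minidump_name(when, seq):
    """Build the file name for the seq-th minidump written on a given date."""
    return f"Mini{when:%m%d%y}-{seq:02d}.dmp"
```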

11 Enabling Dumps In Windows 2000/XP/2003:

12 What Happens When Crash Dumps Are Enabled
When the system boots it checks HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl. The boot disk paging file’s on-disk mapping is obtained. Relevant components are checksummed: boot disk miniport driver, crash I/O functions, page file map.
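To make the CrashControl check concrete, here is a minimal sketch that interprets values already read from that key. The value names `CrashDumpEnabled`, `DumpFile`, and `MinidumpDir` and the 0 to 3 encoding are not given in the slides; they are stated here as assumptions to verify against the target system. `describe_crash_control` is a hypothetical helper.

```python
# Assumed encoding of the CrashDumpEnabled value (not stated in the slides).
DUMP_TYPES = {
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_crash_control(values):
    """Summarize a dict of CrashControl values as 'dump type -> destination'."""
    mode = values.get("CrashDumpEnabled", 0)
    if mode == 0:
        return "crash dumps disabled"
    kind = DUMP_TYPES.get(mode, f"unrecognized mode {mode}")
    # Small dumps go to a directory; the other types go to a single file.
    target_name = "MinidumpDir" if mode == 3 else "DumpFile"
    return f"{kind} -> {values.get(target_name, '<not set>')}"
```

On a Windows machine the input dict could be filled from the registry, for example with the standard `winreg` module.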

13 At The Reboot
[Diagram: in kernel mode, NtCreatePagingFile and the paging file; in user mode, Session Manager, WinLogon, SaveDump, and Memory.dmp. Arrows numbered 1 to 4 mark the steps taken at the reboot.]

14 At The Reboot
Step 1: Session Manager process (\windows\system32\smss.exe) initializes the paging file by calling NtCreatePagingFile. NtCreatePagingFile determines whether the paging file contains a crash dump header and protects the dump from use.
Step 2: WinLogon calls NtQuerySystemInformation to tell if there’s a dump.

15 At The Reboot
Step 3: If there’s a dump, Winlogon executes SaveDump (\windows\system32\savedump.exe), which writes an event to the System event log.
Step 4: SaveDump writes the contents to the appropriate file. The crash dump portion of the paging file is in use during the copy, so virtual memory can run low.

16 Why Crash Dumps Fail
Most common reasons: the paging file on the boot volume is too small; there is not enough free space for the extracted dump.
Less common: the crash corrupted components involved in the dump process; the miniport driver doesn’t implement dump I/O functions. Windows storage drivers must implement dump I/O to get a Microsoft® digital signature.
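The two most common failure reasons can be checked ahead of time. The sketch below is a hypothetical pre-flight check; `dump_problems` is an invented name, and the expected dump size is left to the caller because it depends on the dump type (for a complete dump, roughly the size of physical memory is a reasonable assumption).

```python
def dump_problems(pagefile_mb, free_disk_mb, expected_dump_mb):
    """Return the reasons a dump of the expected size would likely fail."""
    problems = []
    # The dump is first written into the boot-volume paging file...
    if pagefile_mb < expected_dump_mb:
        problems.append("paging file on boot volume is too small")
    # ...and after the reboot it is extracted to a file on disk.
    if free_disk_mb < expected_dump_mb:
        problems.append("not enough free space for extracted dump")
    return problems


# Example: 1 GB of RAM, 512 MB paging file, 2 GB free disk.
print(dump_problems(pagefile_mb=512, free_disk_mb=2048, expected_dump_mb=1025))
```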

17 Microsoft On-line Crash Analysis (OCA)
By Default, after a reboot XP/Server 2003 prompts you to send information to Can be configured with Computer Properties->Advanced->Error Reporting Can be customized with Group Policies

18 What Does OCA Do? Server farm uses !analyze, but uses Microsoft’s Triage.ini file and database that includes information about known problems Several ways to get OCA results: Via At the OCA site Sometimes OCA will point you at KB articles that describe the problem KB articles may tell you to use Windows Update to get newer drivers, a hotfix, or install a Service Pack

19 Analyzing a Crash Dump
If OCA doesn’t help you, or you have an NT4 or Windows 2000 dump, then you need to open it with one of the kernel debuggers:
WinDbg: Windows program
Kd: command-line program
Both provide the same kernel debugger analysis commands. Part of the Debugging Tools for Windows, which supports Windows NT 4, Windows 2000, Windows XP, Server 2003. Check for updates frequently. Don’t use the older version on install media. Free download from

20 Symbol Files Before you can use any crash analysis tool you need symbol files Symbol files contain global function and variable names Symbols are service pack-specific and have an installer (default directory is \windows\symbols) Windows NT 4: *.dbg Windows 2000: *.dbg, *.pdb Windows XP/2003: *.pdb Note: Service Pack symbols only include updates

21 Microsoft Symbol Server
WinDbg and Kd can download symbols automatically from Microsoft Pick a directory to install symbols and add the following to the debugger’s symbol path: SRV*directory* com/download/symbols The debugger automatically detects the OS version of a dump and downloads the symbols on-demand
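The SRV*directory*server form of the symbol path can be assembled programmatically. This is a minimal sketch: `symbol_path` is a hypothetical helper, and the server URL in the example is a placeholder, not the real Microsoft symbol server address (which is truncated in this transcript).

```python
def symbol_path(local_dir, server_url):
    """Join a local cache directory and a symbol server URL into SRV*dir*url form."""
    for part in (local_dir, server_url):
        # '*' is the field separator, so it cannot appear inside a field.
        if "*" in part:
            raise ValueError("'*' is not allowed inside a symbol path component")
    return f"SRV*{local_dir}*{server_url}"


# Placeholder URL for illustration only.
print(symbol_path(r"C:\symbols", "https://symbols.example.invalid/download/symbols"))
```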

22 Automated Analysis When you open a crash dump with Windbg or Kd you get a basic crash analysis: Stop code and parameters A guess at offending driver The analysis is the result of the automated execution of the !analyze debugger command

23 Automated Analysis
Always execute !analyze with the -v option to get more information: a text description of the stop code, the meaning (if any) of the parameters, and a stack dump. !analyze uses heuristics to walk up the stack and determine what driver is the likely cause of the crash. “Followup” is taken from the optional triage.ini file.

24 Manual Analysis
Sometimes automated analysis isn’t enough: !analyze doesn’t tell you anything useful, or you want to know what else was happening at the time of the crash. Useful commands:
Examine current thread: !thread tid (may or may not be related to the crash)
List all processes: !process 0 0 (make sure you understand what was running on the system)
Examine a specific process: !process <pid> 7
List loaded drivers: lm kv (make sure drivers are all recognized and up to date)
Look at memory usage: !vm
Create a smaller dump file: .dump
Additional commands: !help
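When many dumps need the same first pass, the commands above can be batched. The sketch below only builds a Kd command line and does not run it. It assumes Kd's `-z` (dump file), `-y` (symbol path), and `-c` (initial commands) options; `kd_command` is a hypothetical helper, and the file name and symbol path are placeholders.

```python
def kd_command(dump_file, symbol_path, commands=("!analyze -v",)):
    """Build an argument list that opens a dump, runs the commands, and quits."""
    # Debugger commands are separated by ';'; a trailing 'q' ends the session.
    script = "; ".join(list(commands) + ["q"])
    return ["kd", "-z", dump_file, "-y", symbol_path, "-c", script]


args = kd_command(
    "memory.dmp",
    r"C:\symbols",
    ["!analyze -v", "lm kv", "!vm"],
)
print(args)
```

On a machine with the Debugging Tools installed, the resulting list could be passed to `subprocess.run` with `capture_output=True` to save each analysis to a log.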

25 Driver Verifier If you find a driver in a crash dump that looks like it might be the cause of the crash, turn on verification for it If the Verifier detects a violation it crashes the system and identifies the driver Use “Last Known Good” if the verifier detects a bug during the boot If a bug is detected in a third-party product check for updates and/or contact the vendor’s support

26 NotMyFault.exe
In order to demonstrate common crash scenarios, use NotMyFault.exe. Download from /files/notmyfault.zip. It loads MyFault.sys. MyFault.sys has an IOCTL interface that implements different bugs.
[Diagram: NotMyFault.exe in user mode calls the IOCTL interface of MyFault.sys in kernel mode.]

27 IRQL_NOT_LESS_OR_EQUAL
Run NotMyFault and select “High IRQL fault (kernel mode)” Allocates paged pool buffer Frees the buffer Raises IRQL ≥ DISPATCH_LEVEL Touches the buffer Paged buffers that are marked “not present” but are touched when IRQL ≥ DISPATCH_LEVEL result in the IRQL_NOT_LESS_OR_EQUAL bug check Memory Manager calls KeBugCheckEx from page fault handler The IRQL is not less than or equal to the maximum IRQL at which the operation is legal (which is < DISPATCH_LEVEL)
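The rule behind this bug check can be expressed as a toy model: a not-present page can be faulted in only below DISPATCH_LEVEL. This is a simulation for illustration, not kernel code; `touch` and `BugCheckError` are hypothetical names, and the IRQL numbers are the x86 values.

```python
# x86 IRQL values for the three lowest levels.
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2

IRQL_NOT_LESS_OR_EQUAL = 0x0A


class BugCheckError(Exception):
    """Stands in for the system stopping with a given stop code."""

    def __init__(self, stop_code):
        super().__init__(f"bug check 0x{stop_code:X}")
        self.stop_code = stop_code


def touch(page_present, irql):
    """Model a memory access at the given IRQL."""
    if page_present:
        return "access ok"
    if irql >= DISPATCH_LEVEL:
        # The page fault cannot be serviced at this IRQL, so the system crashes.
        raise BugCheckError(IRQL_NOT_LESS_OR_EQUAL)
    return "page fault serviced"
```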

28 Using the Stack in Analysis
!analyze easily identifies MyFault.sys by looking at the KeBugCheckEx parameters The Memory Manager looked at the stack and determined the address that caused the page fault !analyze often looks at the stack to determine the cause of a crash

29 Stacks Each thread has a user-mode and kernel-mode stack
The user-mode stack is usually 1 MB on x86 The kernel-mode stack is typically 12 KB on x86 systems Stacks allow for nested function invocation Parameters can be passed on the stack Stores return address Serves as storage for local variables

30 Stack Frames
[Diagram: the stack with higher addresses at the top, holding one frame each for Function 1, Function 2, and Function 3, where each function calls the next. From higher to lower addresses, each frame holds the function’s parameters (last parameter first), the return address, the saved frame pointer, and the local variables.]
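The frame layout in the diagram is what lets a debugger walk the stack: each saved frame pointer links to the caller's frame, and the return address sits just above it. Below is a toy walk over a simulated 32-bit stack. `walk_stack` and all addresses are made up for illustration, and it assumes every function keeps a frame pointer.

```python
def walk_stack(memory, ebp, max_frames=64):
    """Follow the saved-frame-pointer chain and collect return addresses.

    memory maps 4-byte-aligned addresses to 32-bit values.
    """
    return_addresses = []
    while ebp != 0 and len(return_addresses) < max_frames:
        return_addresses.append(memory[ebp + 4])  # return address above saved EBP
        ebp = memory[ebp]                         # move to the caller's frame
    return return_addresses


# Simulated stack: Function 3 (innermost) was called by 2, which was called by 1.
stack = {
    0x1000: 0x1020, 0x1004: 0x401200,  # Function 3: saved EBP, return into Function 2
    0x1020: 0x1040, 0x1024: 0x401100,  # Function 2: saved EBP, return into Function 1
    0x1040: 0x0000, 0x1044: 0x401000,  # Function 1: end of chain, return into its caller
}
print([hex(a) for a in walk_stack(stack, 0x1000)])
```

If the stack is trashed, the chain points at garbage and the lookup fails (a `KeyError` in this model), which mirrors why corrupted stacks are hard to analyze.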

31 Stacks Other calling conventions make the stack hard to figure out
No frame pointer Register arguments (fast calls) Debugger requires symbol information to parse The stack is the #1 analysis resource It requires that a driver get “caught in the act” Sometimes that’s not possible without the Driver Verifier’s help

32 Stack Trashing Stack trashes have several possible causes:
A driver pushing things on the stack causes the stack to overflow A driver overruns a stack-allocated buffer Usually results in garbage code being executed (KMODE_EXCEPTION_NOT_HANDLED) Driver Verifier can’t determine cause Since the stack is corrupted, analysis is especially hard

33 Debugging Stack Trashes
Run NotMyFault and select “Stack Trash” Allocates a buffer on the stack Overruns the buffer Returns to the caller Crash doesn’t show much off hand !analyze actually blames Win32K.sys, the Win32 kernel-mode subsystem Stack doesn’t show anything except an exception handler Look deeper !thread shows an outstanding IRP !irp <irp> shows that myfault.sys was the target of the IRP

34 Buffer Overruns
Result when a driver goes past the end (overrun) or the beginning (underrun) of a buffer. Usually detected when overwritten data is referenced: another driver or the kernel makes the reference. There can be a long delay between corruption and detection.
[Diagram: a driver buffer in pool, with pool structures and another driver’s buffer at the adjacent higher addresses.]

35 Causing a Buffer Overrun
Run NotMyFault and select “Buffer Overrun” Allocates a nonpaged pool buffer Writes a string past the end Note that you might have to run several times since a crash will occur only if: The kernel references the corrupted pool structures A driver references the corrupted buffer The crash tells you what happened, but not why

36 A Buffer Overrun Bluescreen
In this example, where the crash was the result of the kernel tripping on corrupt pool tracking structures, the Bluescreen tells you what to do:

37 What is Special Pool?
Special pool is a kernel buffer area where buffers are sandwiched with invalid pages. Conditions for a driver allocating from special pool: Driver Verifier is verifying the driver; special pool is enabled; the allocation is slightly less than one page (4 KB on x86).
[Diagram: from higher to lower addresses: invalid page n+2; page n+1, holding the buffer at its end with a signature filling the rest of the page; invalid page n.]
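The layout explains why an overrun is caught immediately: the buffer ends at the page boundary, so the first byte past it lands on an invalid page. The sketch below models that placement. The function names and the 8-byte alignment are assumptions made for illustration, not the kernel's actual allocator.

```python
PAGE_SIZE = 4096  # x86 page size, as stated on the slide


def special_pool_buffer(page_base, size, align=8):
    """Return the start address of a buffer pushed against the end of its page."""
    if not 0 < size <= PAGE_SIZE:
        raise ValueError("allocation must fit in one page")
    start = page_base + PAGE_SIZE - size
    return start - (start % align)  # round down to the assumed alignment


def access_faults(page_base, address):
    """True if the address falls outside the valid page (on a guard page)."""
    return not (page_base <= address < page_base + PAGE_SIZE)


base = 0x10000
buf = special_pool_buffer(base, 104)
print(access_faults(base, buf + 103))  # last byte of the buffer
print(access_faults(base, buf + 104))  # first byte past the end
```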

38 Turning on Special Pool
Enable Special Pool verification on the suspect driver

39 The Verifier Catching Buffer Overrun
The Driver Verifier catches the overrun when it occurs. The Bluescreen tells you whose fault it is. !analyze explains the crash and also tells you the buggy driver name. The stack shows where the driver bug is.

40 Code Overwrites
Caused when a bug results in a wild pointer. A wild pointer that points at invalid memory is easily detected. A wild pointer that points at data is similar to a buffer overrun: it might not cause a problem for a long time, and the crash makes it look like it’s something else’s fault. Driver Verifier doesn’t catch code overwrite. System code write protection catches code overwrite, but it’s not on if: it’s a Windows 2000 system with > 127 MB memory; it’s a Windows XP or Server 2003 system with > 255 MB; or something has disabled it.

41 Causing a Code Overwrite
Run NotMyFault and select “Code Overwrite”. It overwrites the first bytes of nt!NtReadFile. This function is the most common entry to the I/O system, so a random thread will cause the crash. The crash hints that the fault occurred in NtReadFile: the last user-mode address is ZwReadFile; the ebx register in the exception frame points at NtReadFile; NtReadFile’s start location looks scrambled (u ntreadfile).

42 System Code Write Protection
Make sure system code write protection is on. Set these values under HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management:
LargePageMinimum REG_DWORD 0xFFFFFFFF
EnforceWriteProtection REG_DWORD 1
Reboot to take effect, then rerun NotMyFault. The crash occurs immediately and even the blue screen points at MyFault.sys. !analyze shows the address of the write and the target (NtReadFile).
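The two values can be captured as a .reg file for repeatable setup. This is a minimal sketch that renders the text of such a file. `reg_file` is a hypothetical helper, and the "Version 5.00" header is an assumption appropriate for Windows 2000 and later.

```python
WRITE_PROTECT_KEY = (
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control"
    r"\Session Manager\Memory Management"
)


def reg_file(key, dword_values):
    """Render a .reg file that sets REG_DWORD values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD")
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


print(reg_file(WRITE_PROTECT_KEY, {
    "LargePageMinimum": 0xFFFFFFFF,
    "EnforceWriteProtection": 1,
}))
```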

43 Hung Systems You can tackle a hung system, but only if you’ve prepared: Boot in debug mode, or Set the keystroke-crash Registry value For debug mode you need a second system (the debugger host) connected to the target via serial cable Run Windbg/Kd on the host Edit the target’s boot.ini file: /debugport=comX /baudrate=XXX When the system hangs, connect with the debugger and hit Ctrl-C

44 Hung Systems
To configure keystroke-crash: set HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll to 1. Hold right-ctrl and press scroll-lock twice to crash the system. Use !thread to see what’s running. Examine loaded drivers, IRQL, …

45 Getting Past a Crash
Last-Known Good: Boots with the driver/kernel configuration last used during a successful boot.
Safe Mode: Boots the system with a core set of drivers and services; network and non-network variants.
Recovery Console: Manually disable the offending service, replace corrupt images, update files.
ERD Commander 2003: Registry Editor, Explorer, Driver/Service Manager, password changer, Event Log viewer, Notepad.

46 The Bluescreen Screen Saver
Scare your enemies and fool your friends with the Sysinternals Bluescreen Screen Saver Be careful, your job may be on the line!

47 More Information Inside Windows 2000, 3rd edition
Section on System Crashes in chapter 4 Debugging Tools help file Knowledge Base Articles Usenet newsgroup microsoft.public.windbg for discussion of debugger issues The debugger team wants your feedback and bug reports - mail suggestions or bug reports to

48 Community Resources
Most Valuable Professional (MVP)
Newsgroups: Converse online with Microsoft Newsgroups, including Worldwide
User Groups: Meet and learn with your peers

49 evaluations

50 © 2003 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

