I am investigating system hang of our customer.
When a user is saving an Excel file using our FSD it causes the whole
windows machine to hang and the only way to kill it is by power cycling
the machine. This problem happens so rarely that we haven't been able to
reproduce in-house.
There are two log entries in Event log like
- Source: App hang, EventId: 1002, Hanging application Excel.exe,
version 11.0.8169, hang module hungapp, version 0.0.0.0 hang address
0x00000000 - Source: App hang, EventId: 1001, Fault bucket 517132579
We got a complete machine dump initiated by keyboard, but hanging Excel
process is not in this complete dump.
- !analyze -v -hang doesn't show any useful info
- there is no ready thread, so it looks like deadlock.
0: kd> !ready
Processor 0: No threads in READY state
0: kd> ~1
1: kd> !ready
Processor 1: No threads in READY state
1: kd> ~3
3 is not a valid processor number
- !locks shows two locked resources, but no waiters. Anyway how can I
dump locks on mutants?
1: kd> !kdexts.locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks......
Resource @ 0x88bc3040 Shared 1 owning threads
Threads: 89ac3b33-01<*> *** Actual Thread 89ac3b30
KD: Scanning for held locks..
Resource @ 0x88b52e80 Shared 1 owning threads
Threads: 89ac3dab-01<*> *** Actual Thread 89ac3da8
KD: Scanning for held locks.
7463 total locks, 2 locks currently held
1: kd> !thread 89ac3dab
89ac3dab is not a thread object, interpreting as stack value...
TYPE mismatch for thread object at 89ac3dab
1: kd> dt nt!ERESOURCE 0x88bc3040
Symbol nt!ERESOURCE not found.
1: kd> dt nt!_ERESOURCE 0x88bc3040
+0x000 SystemResourcesList : _LIST_ENTRY [0x888c5148 - 0x88cb41e0]
+0x008 OwnerTable : (null)
+0x00c ActiveCount : 1
+0x00e Flag : 8
+0x010 SharedWaiters : (null)
+0x014 ExclusiveWaiters : (null)
+0x018 OwnerThreads : [2] _OWNER_ENTRY
+0x028 ContentionCount : 0
+0x02c NumberOfSharedWaiters : 0
+0x02e NumberOfExclusiveWaiters : 0
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
1: kd> dt nt!_ERESOURCE 0x88b52e80
+0x000 SystemResourcesList : _LIST_ENTRY [0x88cb6ba8 - 0x88d46900]
+0x008 OwnerTable : (null)
+0x00c ActiveCount : 1
+0x00e Flag : 8
+0x010 SharedWaiters : (null)
+0x014 ExclusiveWaiters : (null)
+0x018 OwnerThreads : [2] _OWNER_ENTRY
+0x028 ContentionCount : 0
+0x02c NumberOfSharedWaiters : 0
+0x02e NumberOfExclusiveWaiters : 0
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
- I switched into context of all processes which were shown with
!stacks and tried to find locked user mode critical sections. I have
found two, but there are no waiters. One critical section seems little
bit strange : take look on high lock count, but no owner.
1: kd> .process /r /p 88b9a5b8
Implicit process is now 88b9a5b8
Loading User Symbols
1: kd> !ntsdexts.locks
CritSec +a762f8 at 00a762f8
LockCount 10964536
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0
*** Locked
Scanned 447 critical sections
1: kd> .process /r /p 88b5e3e8
Implicit process is now 88b5e3e8
Loading User Symbols
1: kd> !ntsdexts.locks
CritSec shsvcs!g_csThemeService+0 at 776ff3d0
LockCount 0
RecursionCount 1
OwningThread 4d4
EntryCount 0
ContentionCount 0
*** Locked
1: kd> !thread -t 4d4
Looking for thread Cid = 4d4 ...
THREAD 88b6ada8 Cid 04cc.04d4 Teb: 7ffde000 Win32Thread: 00000000
WAIT: (WrLpcReceive) UserMode Non-Alertable
88b68e08 Semaphore Limit 0x7fffffff
Not impersonating
DeviceMap e1003150
Owning Process 88b5e3e8 Image: svchost.exe
Wait Start TickCount 9864 Ticks: 209 (0:00:00:03.265)
Context Switch Count 208
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x0000dd4d
LPC Server thread working on message Id dd4d
Start Address kernel32!BaseThreadStartThunk (0x7c810659)
Stack Init b9fd1000 Current b9fd0c34 Base b9fd1000 Limit b9fce000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
b9fd0c4c 80502e56 88b6ae18 88b6ada8 804fb048 nt!KiSwapContext+0x2f (FPO:
[Uses EBP] [0,0,4])
b9fd0c58 804fb048 e13965a8 8055c0a0 e13965a8 nt!KiSwapThread+0x8a (FPO:
[0,0,0])
b9fd0c80 805a4eb2 00000001 00000010 89ac4001
nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo])
b9fd0d30 805a50e6 000000e0 0065fe24 00000000
nt!NtReplyWaitReceivePortEx+0x3dc (FPO: [Non-Fpo])
b9fd0d4c 805409ac 000000e0 0065fe24 00000000
nt!NtReplyWaitReceivePort+0x18 (FPO: [Non-Fpo])
b9fd0d4c 7c90eb94 000000e0 0065fe24 00000000 nt!KiFastCallEntry+0xfc
(FPO: [0,0] TrapFrame @ b9fd0d64)
0065ff34 00000000 00000000 00000000 00000000 ntdll!KiFastSystemCallRet
(FPO: [0,0,0])
- I have dumped all threads with stacks (!process 0 0x1f)and roughly
compared them to health system. There are only two waiting LPC clients,
but I can see them also on health system. So I guess the LPC is not
incorporated into deadlock.
- All threads of FSD look like waiting for request, the same with
our user mode service and GUI application. I cannot find anything
suspicious. There are two additional threads comparing to health system
- probably support for WMI tracing + some timer. I think it is unimportant.
THREAD 88a73b30 Cid 0e50.0e80 Teb: 7ffdc000 Win32Thread: 00000000
WAIT: (UserRequest) UserMode Alertable
890a3f58 SynchronizationEvent
890a3ef8 SynchronizationEvent
88a73c20 NotificationTimer
IRP List:
88a72240: (0006,0190) Flags: 00000070 Mdl: 00000000
Not impersonating
DeviceMap e14dc7c0
Owning Process 88c736a8 Image: Xythos.exe
Wait Start TickCount 5413 Ticks: 4660
(0:00:01:12.812)
Context Switch Count 3
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address ADVAPI32!WmipEventPump (0x77df9981)
Start Address kernel32!BaseThreadStartThunk (0x7c810659)
Stack Init 91bf3000 Current 91bf2960 Base 91bf3000 Limit
91bf0000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 2 DecrementCount 16
ChildEBP RetAddr
91bf2978 80502e56 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
91bf2984 804fad68 nt!KiSwapThread+0x8a (FPO: [0,0,0])
91bf29bc 805bf4a9 nt!KeWaitForMultipleObjects+0x284 (FPO:
[Non-Fpo])
91bf2d48 805409ac nt!NtWaitForMultipleObjects+0x297 (FPO:
[Non-Fpo])
91bf2d48 7c90eb94 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame
@ 91bf2d64)
00b0fea4 7c90e9ab ntdll!KiFastSystemCallRet (FPO: [0,0,0])
00b0fea8 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc (FPO: [5,0,0])
00b0ff44 77df9b26 kernel32!WaitForMultipleObjectsEx+0x12c (FPO:
[Non-Fpo])
00b0ffb4 7c80b683 ADVAPI32!WmipEventPump+0x230 (FPO: [Non-Fpo])
00b0ffec 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])
THREAD 88a78da8 Cid 0e50.0ec0 Teb: 7ffd6000 Win32Thread: 00000000
WAIT: (DelayExecution) UserMode Alertable
88a78e98 NotificationTimer
Not impersonating
DeviceMap e14dc7c0
Owning Process 88c736a8 Image: Xythos.exe
Wait Start TickCount 5451 Ticks: 4622
(0:00:01:12.218)
Context Switch Count 2
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address ntdll!RtlpTimerThread (0x7c92798d)
Start Address kernel32!BaseThreadStartThunk (0x7c810659)
Stack Init 91efe000 Current 91efdcbc Base 91efe000 Limit
91efb000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr
91efdcd4 80502e56 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
91efdce0 804faa13 nt!KiSwapThread+0x8a (FPO: [0,0,0])
91efdd0c 80614a7b nt!KeDelayExecutionThread+0x1c9 (FPO: [Non-Fpo])
91efdd54 805409ac nt!NtDelayExecution+0x87 (FPO: [Non-Fpo])
91efdd54 7c90eb94 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame
@ 91efdd64)
01b2ff98 7c90d85c ntdll!KiFastSystemCallRet (FPO: [0,0,0])
01b2ff9c 7c9279d4 ntdll!NtDelayExecution+0xc (FPO: [2,0,0])
01b2ffb4 7c80b683 ntdll!RtlpTimerThread+0x47 (FPO: [Non-Fpo])
01b2ffec 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])
- It looks like nothing suspicious with memory
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 393035 ( 1572140 Kb)
Page File: ??\C:\pagefile.sys
Current: 2095104 Kb Free Space: 2081120 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Available Pages: 247488 ( 989952 Kb)
ResAvail Pages: 311055 ( 1244220 Kb)
Locked IO Pages: 110 ( 440 Kb)
Free System PTEs: 177951 ( 711804 Kb)
Free NP PTEs: 32766 ( 131064 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 2639 ( 10556 Kb)
Modified PF Pages: 2639 ( 10556 Kb)
NonPagedPool Usage: 4903 ( 19612 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
PagedPool 0 Usage: 19360 ( 77440 Kb)
PagedPool 1 Usage: 918 ( 3672 Kb)
PagedPool 2 Usage: 957 ( 3828 Kb)
PagedPool 3 Usage: 928 ( 3712 Kb)
PagedPool 4 Usage: 943 ( 3772 Kb)
PagedPool Usage: 23106 ( 92424 Kb)
PagedPool Maximum: 92160 ( 368640 Kb)
Shared Commit: 3120 ( 12480 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 4527 ( 18108 Kb)
PagedPool Commit: 23106 ( 92424 Kb)
Driver Commit: 4335 ( 17340 Kb)
Committed pages: 95943 ( 383772 Kb)
Commit limit: 878337 ( 3513348 Kb)
Total Private: 58139 ( 232556 Kb)
0658 Rtvscan.exe 14733 ( 58932 Kb)
0dbc Instant.exe 4223 ( 16892 Kb)
0b60 RTHDCPL.exe 3969 ( 15876 Kb)
04cc svchost.exe 3918 ( 15672 Kb)
0774 explorer.exe 3670 ( 14680 Kb)
09b4 OUTLOOK.EXE 3290 ( 13160 Kb)
0710 CcmExec.exe 2716 ( 10864 Kb)
0330 winlogon.exe 1864 ( 7456 Kb)
00fc SPBBCSvc.exe 1394 ( 5576 Kb)
0f30 wmiprvse.exe 1251 ( 5004 Kb)
0d48 wmiprvse.exe 1219 ( 4876 Kb)
0184 spoolsv.exe 1069 ( 4276 Kb)
0370 lsass.exe 1066 ( 4264 Kb)
08a0 wmiprvse.exe 1024 ( 4096 Kb)
07e4 ccEvtMgr.exe 989 ( 3956 Kb)
0b8c ccApp.exe 978 ( 3912 Kb)
0e50 Xythos.exe 960 ( 3840 Kb)
07a0 ccSetMgr.exe 924 ( 3696 Kb)
0c54 acrotray.exe 911 ( 3644 Kb)
0b94 VPTray.exe 825 ( 3300 Kb)
0414 svchost.exe 738 ( 2952 Kb)
051c nvsvc32.exe 560 ( 2240 Kb)
035c services.exe 536 ( 2144 Kb)
046c svchost.exe 477 ( 1908 Kb)
05b0 svchost.exe 468 ( 1872 Kb)
0318 csrss.exe 430 ( 1720 Kb)
0344 DWRCS.EXE 410 ( 1640 Kb)
024c DefWatch.exe 376 ( 1504 Kb)
0d44 msmsgs.exe 352 ( 1408 Kb)
05f4 SavRoam.exe 341 ( 1364 Kb)
06f0 XfsSvcCon.exe 324 ( 1296 Kb)
0528 svchost.exe 324 ( 1296 Kb)
097c DWRCST.EXE 291 ( 1164 Kb)
059c msiexec.exe 266 ( 1064 Kb)
0cdc ctfmon.exe 238 ( 952 Kb)
04e0 MDM.EXE 236 ( 944 Kb)
06a4 winvnc4.exe 226 ( 904 Kb)
0b74 jusched.exe 182 ( 728 Kb)
0fe8 FNPLicensingSer 174 ( 696 Kb)
04a4 iviRegMgr.exe 148 ( 592 Kb)
02e8 smss.exe 42 ( 168 Kb)
0004 System 7 ( 28 Kb)
0e30 DoScan.exe 0 ( 0 Kb)
0a50 userinit.exe 0 ( 0 Kb)
0668 acrodist.exe 0 ( 0 Kb)
Now, I don't know what to try as the next step. I asked our support to
arrange getting of excel crash dump + system and application event logs.
Does anyone know what means Fault bucket 517132579?
Any hint is appreciated. Thanks in advance.
-Bronislav Gabrhelik