Hi,
I am debugging a core dump I asked a customer to generate to help analyze a hang problem and its seems that our mini-filter driver (aka OurFSFD below) is stuck in FltFsControlFile:
THREAD 896e3db0 Cid 0004.0578 Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
899a2380 SynchronizationEvent
IRP List:
899953b0: (0006,01d8) Flags: 00000884 Mdl: 00000000
Impersonation token: e31b8030 (Level Impersonation)
DeviceMap e2398570
Owning Process 8a197478 Image: System
Wait Start TickCount 406919 Ticks: 3598079 (0:15:36:59.984)
Context Switch Count 561
UserTime 00:00:00.000
KernelTime 00:00:00.062
Start Address srv!WorkerThread (0xba2e9602)
Stack Init ba40a000 Current ba4095c4 Base ba40a000 Limit ba407000 Call 0
Priority 11 BasePriority 9 PriorityDecrement 2
ChildEBP RetAddr
ba4095dc 80833491 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
ba409608 80829a82 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])
ba409650 f723eb4c nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])
ba409688 f723f66f fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x232 (FPO: [Non-Fpo])
ba4096c0 f724f04b fltmgr!FltPerformSynchronousIo+0xb9 (FPO: [Non-Fpo])
ba409720 f724f138 fltmgr!IssueControlOperation+0x2c1 (FPO: [Non-Fpo])
ba409750 f722cc62 fltmgr!FltFsControlFile+0x26 (FPO: [Non-Fpo])
ba4097dc f723bb73 OurFSFD!PostOpCreate+0x84 (FPO: [Non-Fpo]) (CONV: stdcall)
ba409844 f723dfc2 fltmgr!FltpPerformPostCallbacks+0x1c5 (FPO: [Non-Fpo])
ba409858 f723e4f1 fltmgr!FltpProcessIoCompletion+0x10 (FPO: [Non-Fpo])
ba409868 f723eb83 fltmgr!FltpPassThroughCompletion+0x89 (FPO: [Non-Fpo])
ba409898 f724c5de fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x269 (FPO: [Non-Fpo])
ba4098d4 8081df85 fltmgr!FltpCreate+0x26a (FPO: [Non-Fpo])
ba4098e8 808f904b nt!IofCallDriver+0x45 (FPO: [Non-Fpo])
ba4099d0 808f9492 nt!IopParseDevice+0xa35 (FPO: [Non-Fpo])
ba409a08 8093758f nt!IopParseFile+0x46 (FPO: [Non-Fpo])
ba409a88 80933b54 nt!ObpLookupObjectName+0x11f (FPO: [Non-Fpo])
ba409adc 808eaeff nt!ObOpenObjectByName+0xea (FPO: [Non-Fpo])
ba409b58 808ec199 nt!IopCreateFile+0x447 (FPO: [Non-Fpo])
ba409bb4 ba2f23ea nt!IoCreateFile+0xa3 (FPO: [Non-Fpo])
ba409c24 ba2f42dd srv!SrvIoCreateFile+0x36d (FPO: [Non-Fpo])
ba409cf0 ba2f2b5f srv!SrvNtCreateFile+0x5e0 (FPO: [Non-Fpo])
ba409d78 ba2d1e87 srv!SrvSmbNtCreateAndX+0x15c (FPO: [Non-Fpo])
ba409d84 ba2e96c7 srv!SrvProcessSmb+0xb7 (FPO: [0,0,0])
ba409dac 80949c5a srv!WorkerThread+0x138 (FPO: [Non-Fpo])
ba409ddc 8088e0c2 nt!PspSystemThreadStartup+0x2e (FPO: [Non-Fpo])
00000000 00000000 nt!KiThreadStartup+0x16
The SRV service is presumably creating a file with COMPLETE_IF_OPLOCKED flag and in our post-op-routine we do something like this:
FLT_POSTOP_CALLBACK_STATUS
VFileFltPostOpCreate(PFLT_CALLBACK_DATA data, // IN OUT
PCFLT_RELATED_OBJECTS fltObjects, // IN
PVOID cntxt, // IN
FLT_POST_OPERATION_FLAGS flags) // IN
{
(stack variables initialization and ASSERTS).
if (data->IoStatus.Status == STATUS_OPLOCK_BREAK_IN_PROGRESS) {
// Wait for the oplock break to be acknowledged.
DEBUG(“Waiting for oplock break acknowledgement.”);
ns = FltFsControlFile(fltObjects->Instance,
fltObjects->FileObject,
FSCTL_OPLOCK_BREAK_NOTIFY,
NULL,
0,
NULL,
0,
NULL);
if (!NT_SUCCESS(ns)) {
ERROR(“Error waiting for oplock break ack FltFsControlFile(): 0x%08X”,
ns);
goto exit;
}
We make the call to FltFsControlFile without holding any lock ourselves. As you can see, the thread has been waiting for a while for the notification to arrive (some 15 hours).
Looking at !locks I found out that our thread is the proud owner of an ERESOURCE:
Resource @ 0x89622588 Exclusively owned
Contention Count = 4
NumberOfExclusiveWaiters = 4
Threads: 896e3db0-01<*>
Threads Waiting On Exclusive Access:
894a5db0 8963cb40 894aab40 896e5db0
All 4 waiting threads are SRV threads, with very low tick counts for their wait time (and all ExAcquireResourceExclusiveLite have ‘1’ for their wait parameter). Specifically thread 894a5db0 is stuck trying to deliver an oplock break notification:
THREAD 894a5db0 Cid 0004.0b18 Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
89a7f050 SynchronizationEvent
894a5e28 NotificationTimer
Not impersonating
DeviceMap e1002890
Owning Process 8a197478 Image: System
Wait Start TickCount 4004775 Ticks: 223 (0:00:00:03.484)
Context Switch Count 14068
UserTime 00:00:00.000
KernelTime 00:00:00.000
Start Address srv!WorkerThread (0xba2e9602)
Stack Init ba3e2000 Current ba3e1c44 Base ba3e2000 Limit ba3df000 Call 0
Priority 9 BasePriority 9 PriorityDecrement 0
ChildEBP RetAddr Args to Child
ba3e1c5c 80833491 894a5db0 894a5e58 00000000 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
ba3e1c88 80829a82 894a5db0 89622588 00000000 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])
ba3e1cd0 8087cc21 89a7f050 0000001b 00000000 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])
ba3e1d0c 8087ce3b 8999b568 e2488878 80a5e3c0 nt!ExpWaitForResource+0xd5 (FPO: [Non-Fpo])
ba3e1d2c ba2f0764 89622588 00000001 896b1d68 nt!ExAcquireResourceExclusiveLite+0x8d (FPO: [Non-Fpo])
ba3e1d44 ba326b10 8999b568 896b1d48 8999b568 srv!UnlinkRfcbFromLfcb+0x33 (FPO: [Non-Fpo])
ba3e1d58 ba326756 8999b568 8995c300 8999b570 srv!SrvCloseCachedRfcb+0x45 (FPO: [Non-Fpo])
ba3e1d84 ba2e96c7 00000000 894a5db0 00000000 srv!SrvOplockBreakNotification+0x1b1 (FPO: [Non-Fpo])
ba3e1dac 80949c5a 0095c340 00000000 00000000 srv!WorkerThread+0x138 (FPO: [Non-Fpo])
ba3e1ddc 8088e0c2 ba2e9602 8995c340 00000000 nt!PspSystemThreadStartup+0x2e (FPO: [Non-Fpo])
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
Has anybody encountered something similar before? My theory is that since SRV is holding an ERESOURCE while calling IoCreateFile APCs are disabled and the oplock notification is “lost”. What is a good way to check if special kernel APCs are disabled (I can check for APC_LEVEL, but I am not sure that will cover all cases). Can I just cancel the open and return STATUS_SHARING_VIOLATION in these cases?
Thanks,
G.