So own file-system, and there is a bug in my code that I wish to fix. I mostly
test against Win10 and Win11.
The easiest example of the problem is if you download a file (for example with chrome),
the file is handed off to MsMpEng.exe for checking/scanning and does a bunch of operations
with my file system, but never reports back to chrome. Making the download appear stuck.
I’m going to assume I am doing something wrong, but it’s a pretty big haystack.
I have compared FileSpy.exe traces with normal NTFS downloads, and my own file system.
I have exercised FileTest.exe to see if the operations it uses in the trace logs return different values.
Successful runs has MsMpEng.exe create “:Zone.Identifier” stream, but never tries for me, it it is
somewhere around here.
Ordinarily, I would just debug MsMpEng and see where it bails out, but it is protected. I get why, but
obviously it is frustrating. Not that I want to debug MsMpEng at all, I would rather have it tell me
what is wrong. I am guessing it does not have any verbose logs?
Are there other ways to get closer to MsMpEng, perhaps by issuing queries directly somehow, skipping the whole
chome aspect of my debugging (chrome has a lot of source lines and took quite a while to compile and to step through).
Anyone else wrestled with MsMpEng? Any tips?
Break in with the debugger during the hang. Then run:
!process 0 1f msmpeng.exe
And see what the threads are doing.
Too long to paste in here it seems:
https://pastebin.com/eF79Zps5
To my untrained eye, I liked my odds on thread ffff800db3dd9080
Child-SP RetAddr Call Site
ffffb80d`5da616e0 fffff802`7301b840 nt!KiSwapContext+0x76
ffffb80d`5da61820 fffff802`7301ad6f nt!KiSwapThread+0x500
ffffb80d`5da618d0 fffff802`7301a613 nt!KiCommitThreadWait+0x14f
ffffb80d`5da61970 fffff802`73428981 nt!KeWaitForSingleObject+0x233
ffffb80d`5da61a60 fffff802`73428a2a nt!ObWaitForSingleObject+0x91
ffffb80d`5da61ac0 fffff802`7320d9f5 nt!NtWaitForSingleObject+0x6a
ffffb80d`5da61b00 00007ffc`fc2cd144 nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffb80d`5da61b00)
000000be`63f7c108 00007ffc`f9fb306e ntdll!NtWaitForSingleObject+0x14
000000be`63f7c110 00007ffc`9a537896 KERNELBASE!WaitForSingleObjectEx+0x8e
000000be`63f7c1b0 00007ffc`9a53781c mpengine!FilterOplock::ReleaseOplock+0x36
000000be`63f7c1e0 00007ffc`9a5377e4 mpengine!FilterOplock::~FilterOplock+0x18
000000be`63f7c210 00007ffc`9a18bfce mpengine!FilterOplock::`vector deleting destructor'+0x14
000000be`63f7c240 00007ffc`9a5527a2 mpengine!CommonUtil::CRefObject::Release+0x36
000000be`63f7c270 00007ffc`9a575dd3 mpengine!CommonUtil::AutoRef<CommonUtil::IMpThreadPoolProvider>::Release+0x16
000000be`63f7c2a0 00007ffc`9a2f8786 mpengine!CWin32File::ReleaseOplock+0x13
000000be`63f7c2d0 00007ffc`9a5def04 mpengine!nUFSP_native::CloseAllHandles+0x6a
000000be`63f7c300 00007ffc`9a272e95 mpengine!nUFSP_native::CloseFile+0xd4
000000be`63f7c330 00007ffc`9a27179d mpengine!UfsNode::CloseFileUsingPlugin+0x39
000000be`63f7c360 00007ffc`9a2712b1 mpengine!UfsNode::Close+0x35
000000be`63f7c390 00007ffc`9a270dde mpengine!UfsClientRequest::FindNextInNode+0x11d
000000be`63f7c410 00007ffc`9a5034e9 mpengine!UfsNodeFinder::FindFirst+0xce
000000be`63f7c4b0 00007ffc`9a24dc35 mpengine!UfsClientRequest::AnalyzeNode+0x89
000000be`63f7c530 00007ffc`9a24d2db mpengine!UfsClientRequest::AnalyzeLeaf+0xf9
000000be`63f7c5b0 00007ffc`9a24de5f mpengine!UfsClientRequest::AnalyzePath+0x1a3
000000be`63f7c670 00007ffc`9a31ea70 mpengine!UfsCmdBase::ExecuteCmd<<lambda_63254cfa82a2be95f0c1106eef9d5b22> >+0x12b
000000be`63f7c710 00007ffc`9a31d3b1 mpengine!UfsScanFileCmd::Execute+0x50
000000be`63f7c9a0 00007ffc`9a31f1dd mpengine!ksignal+0x1d1
000000be`63f7cc90 00007ffc`9a2cb2a9 mpengine!EngineProcessFile+0x231
000000be`63f7cdd0 00007ffc`9a30ca9c mpengine!CResmgrFile::ScanExpanded+0x2dd
000000be`63f7cfc0 00007ffc`9a30c5ea mpengine!CResmgrFile::ScanExImpl+0x2a4
000000be`63f7d240 00007ffc`9a28813b mpengine!CResmgrFile::ScanEx+0x1a
000000be`63f7d370 00007ffc`9a2cb417 mpengine!ResmgrProcessResource+0x1c7
000000be`63f7d5a0 00007ffc`9a30ca9c mpengine!CResmgrFile::ScanExpanded+0x44b
000000be`63f7d790 00007ffc`9a30c5ea mpengine!CResmgrFile::ScanExImpl+0x2a4
000000be`63f7da10 00007ffc`9a28813b mpengine!CResmgrFile::ScanEx+0x1a
000000be`63f7db40 00007ffc`9a79bca4 mpengine!ResmgrProcessResource+0x1c7
000000be`63f7dd70 00007ffc`9a782b5f mpengine!ResScan+0xa24
000000be`63f7e1b0 00007ffc`9a785de0 mpengine!ScanOpenWithContext+0x18c7
000000be`63f7ea70 00007ffc`9a31d6ef mpengine!UberScanOpen+0xa4c
000000be`63f7eb80 00007ffc`9a532757 mpengine!ksignal+0x50f
I took another crack at it, dumping the stacks before, and after, and it seems this is the most likely stack:
# Child-SP RetAddr Call Site
00 ffffb80d`5ef106e0 fffff802`7301b840 nt!KiSwapContext+0x76
01 ffffb80d`5ef10820 fffff802`7301ad6f nt!KiSwapThread+0x500
02 ffffb80d`5ef108d0 fffff802`7301a613 nt!KiCommitThreadWait+0x14f
03 ffffb80d`5ef10970 fffff802`73428981 nt!KeWaitForSingleObject+0x233
04 ffffb80d`5ef10a60 fffff802`73428a2a nt!ObWaitForSingleObject+0x91
05 ffffb80d`5ef10ac0 fffff802`7320d9f5 nt!NtWaitForSingleObject+0x6a
06 ffffb80d`5ef10b00 00007ffc`fc2cd144 nt!KiSystemServiceCopyEnd+0x25
07 000000ae`3517c218 00007ffc`f9fb306e ntdll!NtWaitForSingleObject+0x14
08 000000ae`3517c220 00007ffc`c1da5d76 KERNELBASE!WaitForSingleObjectEx+0x8e
09 000000ae`3517c2c0 00007ffc`c1da5cfc mpengine!_rsignal+0x5d0d6
0a 000000ae`3517c2f0 00007ffc`c1da5cc4 mpengine!_rsignal+0x5d05c
0b 000000ae`3517c320 00007ffc`c19c0735 mpengine!_rsignal+0x5d024
0c 000000ae`3517c350 00007ffc`c1dac1de mpengine+0xe0735
0d 000000ae`3517c380 00007ffc`c19209c3 mpengine!_rsignal+0x6353e
0e 000000ae`3517c3b0 00007ffc`c1920964 mpengine+0x409c3
0f 000000ae`3517c3e0 00007ffc`c1c56ad0 mpengine+0x40964
10 000000ae`3517c410 00000000`00000e77 mpengine!MpContainerCloseObject+0x334c0
11 000000ae`3517c418 00007ffc`c1b5d2ee 0xe77
12 000000ae`3517c420 00000000`00000000 mpengine+0x27d2ee
and googling around for things I can look at, I found out the following
kd> kv
08 000000ae`3517c220 00007ffc`c1da5d76 : 00000270`44e25010 00000000`00000000 00000000`00000000 00000000`00001054 : KERNELBASE!WaitForSingleObjectEx+0x8e
this “could” be HANDLE 1054 being passed into WaitForSingleObjectEx:
kd> !handle 1054 f
PROCESS ffff800db1933340
SessionId: 0 Cid: 05d4 Peb: ae34cff000 ParentCid: 02b8
DirBase: 113cd3000 ObjectTable: ffffce872eda0380 HandleCount: 952.
Image: MsMpEng.exe
Handle table at ffffce872eda0380 with 952 entries in use
1054: Object: ffff800db66cc560 GrantedAccess: 001f0003 (Protected) (Inherit) Entry: ffffce874bfff150
Object: ffff800db66cc560 Type: (ffff800da62bdc40) Event
ObjectHeader: ffff800db66cc530 (new version)
HandleCount: 1 PointerCount: 32769
kd> dt nt!_object_header ffff800db66cc530
+0x000 PointerCount : 0n32769
+0x008 HandleCount : 0n1
+0x008 NextToFree : 0x00000000`00000001 Void
+0x010 Lock : _EX_PUSH_LOCK
+0x018 TypeIndex : 0xe6 ''
+0x019 TraceFlags : 0 ''
+0x019 DbgRefTrace : 0y0
+0x019 DbgTracePermanent : 0y0
+0x01a InfoMask : 0x8 ''
+0x01b Flags : 0 ''
+0x01b NewObject : 0y0
+0x01b KernelObject : 0y0
+0x01b KernelOnlyAccess : 0y0
+0x01b ExclusiveObject : 0y0
+0x01b PermanentObject : 0y0
+0x01b DefaultSecurityQuota : 0y0
+0x01b SingleHandleEntry : 0y0
+0x01b DeletedInline : 0y0
+0x01c Reserved : 1
+0x020 ObjectCreateInfo : 0xfffff802`73a53900 _OBJECT_CREATE_INFORMATION
+0x020 QuotaBlockCharged : 0xfffff802`73a53900 Void
+0x028 SecurityDescriptor : (null)
+0x030 Body : _QUAD
Assuming I held the debugger in the right direction, it would seem it is waiting for a PUSH_LOCK. I wonder then if the problem is
perhaps related to IRP_MJ_LOCK_CONTROL
. That is the most obvious locking with filesystems?
But could just be an internal lock released by something else too.
No, I don’t think you’re in the right direction…That thread is waiting on a Dispatcher Object (e.g. event, mutex, timer, semaphore, etc.) and the Lock field is just a lock protecting the object itself. If you look at the !thread output it will tell you which type of object the thread is waiting on. For example:
THREAD ffff9c8cac2f2600 Cid 0004.000c Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
**fffff8061dc44180 SynchronizationEvent**
IRP_MJ_LOCK_CONTROL, otoh, is used for Byte Range Locking, which is something else entirely and isn’t generally involved in deadlocks (conflicting operations fail immediately with STATUS_FILE_LOCK_CONFLICT instead of waiting).
Yeah, it never calls IRP_MJ_LOCK_CONTROL so I could rule that out pretty quickly.
I redid my IRP_MJ_READ, IRP_MJ_WRITE and IRP_MJ_CLEANUP completely, with focus on CcMgr
interaction, since there are so many special-case paths.
The good news I can now download files with browsers fine and MsMpEng.exe does not stop responding.
I am not entirely sure which part fixes it, but either way, I feel the interaction with CcMgr is more
correct, AND, it made the problem go away.