Alternatives to debugging MsMpEng.exe

So own file-system, and there is a bug in my code that I wish to fix. I mostly
test against Win10 and Win11.

The easiest example of the problem is if you download a file (for example with chrome),
the file is handed off to MsMpEng.exe for checking/scanning and does a bunch of operations
with my file system, but never reports back to chrome. Making the download appear stuck.

I’m going to assume I am doing something wrong, but it’s a pretty big haystack.

I have compared FileSpy.exe traces with normal NTFS downloads, and my own file system.
I have exercised FileTest.exe to see if the operations it uses in the trace logs return different values.
Successful runs has MsMpEng.exe create “:Zone.Identifier” stream, but never tries for me, it it is
somewhere around here.

Ordinarily, I would just debug MsMpEng and see where it bails out, but it is protected. I get why, but
obviously it is frustrating. Not that I want to debug MsMpEng at all, I would rather have it tell me
what is wrong. I am guessing it does not have any verbose logs?

Are there other ways to get closer to MsMpEng, perhaps by issuing queries directly somehow, skipping the whole
chome aspect of my debugging (chrome has a lot of source lines and took quite a while to compile and to step through).

Anyone else wrestled with MsMpEng? Any tips?

Break in with the debugger during the hang. Then run:

!process 0 1f msmpeng.exe

And see what the threads are doing.

Too long to paste in here it seems:

https://pastebin.com/eF79Zps5

To my untrained eye, I liked my odds on thread ffff800db3dd9080

        Child-SP          RetAddr               Call Site
        ffffb80d`5da616e0 fffff802`7301b840     nt!KiSwapContext+0x76
        ffffb80d`5da61820 fffff802`7301ad6f     nt!KiSwapThread+0x500
        ffffb80d`5da618d0 fffff802`7301a613     nt!KiCommitThreadWait+0x14f
        ffffb80d`5da61970 fffff802`73428981     nt!KeWaitForSingleObject+0x233
        ffffb80d`5da61a60 fffff802`73428a2a     nt!ObWaitForSingleObject+0x91
        ffffb80d`5da61ac0 fffff802`7320d9f5     nt!NtWaitForSingleObject+0x6a
        ffffb80d`5da61b00 00007ffc`fc2cd144     nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffb80d`5da61b00)
        000000be`63f7c108 00007ffc`f9fb306e     ntdll!NtWaitForSingleObject+0x14
        000000be`63f7c110 00007ffc`9a537896     KERNELBASE!WaitForSingleObjectEx+0x8e
        000000be`63f7c1b0 00007ffc`9a53781c     mpengine!FilterOplock::ReleaseOplock+0x36
        000000be`63f7c1e0 00007ffc`9a5377e4     mpengine!FilterOplock::~FilterOplock+0x18
        000000be`63f7c210 00007ffc`9a18bfce     mpengine!FilterOplock::`vector deleting destructor'+0x14
        000000be`63f7c240 00007ffc`9a5527a2     mpengine!CommonUtil::CRefObject::Release+0x36
        000000be`63f7c270 00007ffc`9a575dd3     mpengine!CommonUtil::AutoRef<CommonUtil::IMpThreadPoolProvider>::Release+0x16
        000000be`63f7c2a0 00007ffc`9a2f8786     mpengine!CWin32File::ReleaseOplock+0x13
        000000be`63f7c2d0 00007ffc`9a5def04     mpengine!nUFSP_native::CloseAllHandles+0x6a
        000000be`63f7c300 00007ffc`9a272e95     mpengine!nUFSP_native::CloseFile+0xd4
        000000be`63f7c330 00007ffc`9a27179d     mpengine!UfsNode::CloseFileUsingPlugin+0x39
        000000be`63f7c360 00007ffc`9a2712b1     mpengine!UfsNode::Close+0x35
        000000be`63f7c390 00007ffc`9a270dde     mpengine!UfsClientRequest::FindNextInNode+0x11d
        000000be`63f7c410 00007ffc`9a5034e9     mpengine!UfsNodeFinder::FindFirst+0xce
        000000be`63f7c4b0 00007ffc`9a24dc35     mpengine!UfsClientRequest::AnalyzeNode+0x89
        000000be`63f7c530 00007ffc`9a24d2db     mpengine!UfsClientRequest::AnalyzeLeaf+0xf9
        000000be`63f7c5b0 00007ffc`9a24de5f     mpengine!UfsClientRequest::AnalyzePath+0x1a3
        000000be`63f7c670 00007ffc`9a31ea70     mpengine!UfsCmdBase::ExecuteCmd<<lambda_63254cfa82a2be95f0c1106eef9d5b22> >+0x12b
        000000be`63f7c710 00007ffc`9a31d3b1     mpengine!UfsScanFileCmd::Execute+0x50
        000000be`63f7c9a0 00007ffc`9a31f1dd     mpengine!ksignal+0x1d1
        000000be`63f7cc90 00007ffc`9a2cb2a9     mpengine!EngineProcessFile+0x231
        000000be`63f7cdd0 00007ffc`9a30ca9c     mpengine!CResmgrFile::ScanExpanded+0x2dd
        000000be`63f7cfc0 00007ffc`9a30c5ea     mpengine!CResmgrFile::ScanExImpl+0x2a4
        000000be`63f7d240 00007ffc`9a28813b     mpengine!CResmgrFile::ScanEx+0x1a
        000000be`63f7d370 00007ffc`9a2cb417     mpengine!ResmgrProcessResource+0x1c7
        000000be`63f7d5a0 00007ffc`9a30ca9c     mpengine!CResmgrFile::ScanExpanded+0x44b
        000000be`63f7d790 00007ffc`9a30c5ea     mpengine!CResmgrFile::ScanExImpl+0x2a4
        000000be`63f7da10 00007ffc`9a28813b     mpengine!CResmgrFile::ScanEx+0x1a
        000000be`63f7db40 00007ffc`9a79bca4     mpengine!ResmgrProcessResource+0x1c7
        000000be`63f7dd70 00007ffc`9a782b5f     mpengine!ResScan+0xa24
        000000be`63f7e1b0 00007ffc`9a785de0     mpengine!ScanOpenWithContext+0x18c7
        000000be`63f7ea70 00007ffc`9a31d6ef     mpengine!UberScanOpen+0xa4c
        000000be`63f7eb80 00007ffc`9a532757     mpengine!ksignal+0x50f

I took another crack at it, dumping the stacks before, and after, and it seems this is the most likely stack:

# Child-SP          RetAddr               Call Site
00 ffffb80d`5ef106e0 fffff802`7301b840     nt!KiSwapContext+0x76
01 ffffb80d`5ef10820 fffff802`7301ad6f     nt!KiSwapThread+0x500
02 ffffb80d`5ef108d0 fffff802`7301a613     nt!KiCommitThreadWait+0x14f
03 ffffb80d`5ef10970 fffff802`73428981     nt!KeWaitForSingleObject+0x233
04 ffffb80d`5ef10a60 fffff802`73428a2a     nt!ObWaitForSingleObject+0x91
05 ffffb80d`5ef10ac0 fffff802`7320d9f5     nt!NtWaitForSingleObject+0x6a
06 ffffb80d`5ef10b00 00007ffc`fc2cd144     nt!KiSystemServiceCopyEnd+0x25
07 000000ae`3517c218 00007ffc`f9fb306e     ntdll!NtWaitForSingleObject+0x14
08 000000ae`3517c220 00007ffc`c1da5d76     KERNELBASE!WaitForSingleObjectEx+0x8e
09 000000ae`3517c2c0 00007ffc`c1da5cfc     mpengine!_rsignal+0x5d0d6
0a 000000ae`3517c2f0 00007ffc`c1da5cc4     mpengine!_rsignal+0x5d05c
0b 000000ae`3517c320 00007ffc`c19c0735     mpengine!_rsignal+0x5d024
0c 000000ae`3517c350 00007ffc`c1dac1de     mpengine+0xe0735
0d 000000ae`3517c380 00007ffc`c19209c3     mpengine!_rsignal+0x6353e
0e 000000ae`3517c3b0 00007ffc`c1920964     mpengine+0x409c3
0f 000000ae`3517c3e0 00007ffc`c1c56ad0     mpengine+0x40964
10 000000ae`3517c410 00000000`00000e77     mpengine!MpContainerCloseObject+0x334c0
11 000000ae`3517c418 00007ffc`c1b5d2ee     0xe77
12 000000ae`3517c420 00000000`00000000     mpengine+0x27d2ee

and googling around for things I can look at, I found out the following

 kd> kv

08 000000ae`3517c220 00007ffc`c1da5d76     : 00000270`44e25010 00000000`00000000 00000000`00000000 00000000`00001054 : KERNELBASE!WaitForSingleObjectEx+0x8e

this “could” be HANDLE 1054 being passed into WaitForSingleObjectEx:

 kd> !handle 1054 f

PROCESS ffff800db1933340
    SessionId: 0  Cid: 05d4    Peb: ae34cff000  ParentCid: 02b8
    DirBase: 113cd3000  ObjectTable: ffffce872eda0380  HandleCount: 952.
    Image: MsMpEng.exe

Handle table at ffffce872eda0380 with 952 entries in use

1054: Object: ffff800db66cc560  GrantedAccess: 001f0003 (Protected) (Inherit) Entry: ffffce874bfff150
Object: ffff800db66cc560  Type: (ffff800da62bdc40) Event
    ObjectHeader: ffff800db66cc530 (new version)
        HandleCount: 1  PointerCount: 32769

kd> dt nt!_object_header ffff800db66cc530
   +0x000 PointerCount     : 0n32769
   +0x008 HandleCount      : 0n1
   +0x008 NextToFree       : 0x00000000`00000001 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0xe6 ''
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0x8 ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 1
   +0x020 ObjectCreateInfo : 0xfffff802`73a53900 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xfffff802`73a53900 Void
   +0x028 SecurityDescriptor : (null) 
   +0x030 Body             : _QUAD

Assuming I held the debugger in the right direction, it would seem it is waiting for a PUSH_LOCK. I wonder then if the problem is
perhaps related to IRP_MJ_LOCK_CONTROL. That is the most obvious locking with filesystems?

But could just be an internal lock released by something else too.

No, I don’t think you’re in the right direction…That thread is waiting on a Dispatcher Object (e.g. event, mutex, timer, semaphore, etc.) and the Lock field is just a lock protecting the object itself. If you look at the !thread output it will tell you which type of object the thread is waiting on. For example:

        THREAD ffff9c8cac2f2600  Cid 0004.000c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
            **fffff8061dc44180  SynchronizationEvent**

IRP_MJ_LOCK_CONTROL, otoh, is used for Byte Range Locking, which is something else entirely and isn’t generally involved in deadlocks (conflicting operations fail immediately with STATUS_FILE_LOCK_CONFLICT instead of waiting).

Yeah, it never calls IRP_MJ_LOCK_CONTROL so I could rule that out pretty quickly.

I redid my IRP_MJ_READ, IRP_MJ_WRITE and IRP_MJ_CLEANUP completely, with focus on CcMgr
interaction, since there are so many special-case paths.

The good news I can now download files with browsers fine and MsMpEng.exe does not stop responding.

I am not entirely sure which part fixes it, but either way, I feel the interaction with CcMgr is more
correct, AND, it made the problem go away.