A filter above us reissues IRPs from its completion routine

Hi.

I am analyzing crash dumps from a system where our storage stack filter driver causes a crash in combination with another filter driver. Our driver is a write filter used with an application where users can attach a hard disk; the application applies the write filter driver, injects some files and settings, attaches the disk to a Hyper-V VM and boots the VM.

In this case the host BSODs while the Hyper-V VM is booting, and the stack trace indicates that another filter above us re-sends IRPs immediately from its completion routine. Since we complete write requests immediately in the calling thread, this eventually causes a stack overflow and a BSOD. Looking at the disassembly, I believe they check some condition against an internal queue, and if the expected condition is not met and the IRP is still in that queue, they reissue the IRP. That works as long as the completion routine is called from another thread after the dispatch routine has returned, which is of course what virtually always happens elsewhere in the storage stack. But since we complete the IRP on the same thread, within the dispatch routine, they have not yet removed the IRP from their queue when their completion routine runs.
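
To illustrate the interaction, here is a simplified sketch (not our actual code, which is in the repository linked further down) of a write dispatch routine that completes the IRP synchronously in the calling thread:

```cpp
#include <ntddk.h>

// Simplified sketch only: a write dispatch routine that completes the IRP
// synchronously, in the calling thread.
NTSTATUS
FilterWrite(
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_ PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);

    PIO_STACK_LOCATION ioStack = IoGetCurrentIrpStackLocation(Irp);

    // ... copy the write data into the filter's own diff store here ...

    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = ioStack->Parameters.Write.Length;

    // Completing here means any completion routine registered by a filter
    // above us runs right now, on this same thread and stack, before this
    // dispatch routine has returned. If that completion routine re-issues
    // the IRP with IoCallDriver, we re-enter this routine recursively and
    // eventually overflow the kernel stack.
    IoCompleteRequest(Irp, IO_NO_INCREMENT);

    return STATUS_SUCCESS;
}
```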

I will of course try to figure out whether there is something else I am missing here, or some other bug in our driver that might break what other filters expect to find. I can also change our driver to complete the IRP from another thread, with a small delay, to work around this. But my general question is: how common is it to reissue IRPs directly from a completion routine? Having analyzed crash dumps for quite some time, my feeling is that this is very uncommon behavior, but maybe I have just missed it so far. I would never do anything like that myself, particularly not in a stack as complex as the storage stack. It feels like it is bound to run into something like what we are seeing now, and I am surprised to see it done this way.

Stack trace (aimwrfltr is our driver):

[... repeated until stack overflow ...]
aimwrfltr!AIMWrFltrWrite(struct _DEVICE_OBJECT * DeviceObject = <Value unavailable error>, struct _IRP * Irp = 0xffffdd0b`df740ba0)+0x1c0
nt!IofCallDriver+0x59
partmgr!PmWrite+0xf8
partmgr!PmGlobalDispatch+0x1f
nt!IofCallDriver+0x59
apmwin+0x207b
nt!IofCallDriver+0x59
gpt_loader+0x85bd
nt!IofCallDriver+0x59
PGPwded+0x15181
PGPwded+0x15212
nt!IopfCompleteRequest+0x119
nt!IofCompleteRequest+0x17
aimwrfltr!AIMWrFltrWrite(struct _DEVICE_OBJECT * DeviceObject = <Value unavailable error>, struct _IRP * Irp = 0xffffdd0b`df740ba0)+0x1c0
nt!IofCallDriver+0x59
partmgr!PmWrite+0xf8
partmgr!PmGlobalDispatch+0x1f
nt!IofCallDriver+0x59
apmwin+0x207b
nt!IofCallDriver+0x59
gpt_loader+0x85bd
nt!IofCallDriver+0x59
PGPwded+0x154ea
PGPwded+0x1060e
PGPwded+0x16b34
nt!IofCallDriver+0x59
partmgr!PartitionIo+0x1dd
partmgr!PartitionWrite+0x1e
partmgr!PmGlobalDispatch+0x1f
nt!IofCallDriver+0x59
volmgr!VmReadWrite+0xf9
klbackupdisk+0x2273
klbackupdisk+0x2517
klbackupdisk+0x2408
nt!IofCallDriver+0x59
fvevol!FveFilterRundownReadWrite+0x920
fvevol!FveFilterRundownWrite+0x4c
klbackupdisk+0x2273
klbackupdisk+0x2517
klbackupdisk+0x2408
nt!IofCallDriver+0x59
iorate!IoRateIssueAndRecordIo+0x7f
iorate!IoRateProcessIrpWrapper+0x19d
iorate!IoRateDispatchReadWrite+0x80
nt!IofCallDriver+0x59
volume!VolumePassThrough+0x23
nt!IofCallDriver+0x59
volsnap!VolsnapWriteFilter+0x11d
volsnap!VolSnapWrite+0x13
klbackupdisk+0x2273
klbackupdisk+0x2517
klbackupdisk+0x2408
nt!IofCallDriver+0x59
mounthlp+0x5429
nt!IofCallDriver+0x59
Ntfs!NtfsStorageDriverCallout+0x1c
nt!KxSwitchKernelStackCallout+0x2e
nt!KiSwitchKernelStackContinue
nt!KiExpandKernelStackAndCalloutOnStackSegment+0x18e
nt!KiExpandKernelStackAndCalloutSwitchStack+0xdc
nt!KeExpandKernelStackAndCalloutInternal+0x33
nt!KeExpandKernelStackAndCalloutEx+0x1d
Ntfs!NtfsMultipleAsync+0x110
Ntfs!NtfsNonCachedIo+0x3f9
Ntfs!NtfsCommonWrite+0xac7
Ntfs!NtfsFsdWrite+0x1e8
nt!IofCallDriver+0x59
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15e
FLTMGR!FltpDispatch+0xb6
nt!IofCallDriver+0x59
PGPfsfd+0x52bb
nt!IofCallDriver+0x59
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15e
FLTMGR!FltpDispatch+0xb6
nt!IofCallDriver+0x59
nt!IoSynchronousPageWriteEx+0x137
nt!MiIssueSynchronousFlush+0x70
nt!MiFlushSectionInternal+0x868
nt!MmFlushSection+0x166
nt!CcFlushCachePriv+0x6cd
nt!CcCoherencyFlushAndPurgeCache+0x73
Ntfs!NtfsFlushUserStream+0xc6
Ntfs!NtfsPerformOptimisticFlush+0xa1
Ntfs!NtfsCommonFlushBuffers+0x210
Ntfs!NtfsCommonFlushBuffersCallout+0x19
nt!KeExpandKernelStackAndCalloutInternal+0x78
nt!KeExpandKernelStackAndCalloutEx+0x1d
Ntfs!NtfsCommonFlushBuffersOnNewStack+0x66
Ntfs!NtfsFsdFlushBuffers+0xdd
nt!IofCallDriver+0x59
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15e
FLTMGR!FltpDispatch+0xb6
nt!IofCallDriver+0x59
nt!IofCallDriver+0x59
FLTMGR!FltpDispatch+0xe8
nt!IofCallDriver+0x59
nt!IopSynchronousServiceTail+0x1a5
nt!NtFlushBuffersFileEx+0x1fc
nt!NtFlushBuffersFile+0x16
nt!KiSystemServiceCopyEnd+0x28
nt!KiServiceLinkage
nt!CmpDoFileFlush+0x27
nt!CmpFileFlush+0x18
nt!CmpFileFlushAndPurge+0x18b
nt!HvWriteHivePrimaryFile+0x111
nt!CmpFlushHive+0x661
nt!CmpCompleteUnloadKey+0x1ba
nt!CmpPerformUnloadKey+0x330
nt!CmUnloadKey+0x42f
nt!NtUnloadKey+0x11
nt!KiSystemServiceCopyEnd+0x28
0x00007ffe`8a3605a4

Our source is here: https://github.com/ArsenalRecon/Arsenal-Image-Mounter/blob/master/Unmanaged%20Source/aimwrfltr/write.cpp, function AIMWrFltrWrite

how common is it to reissue IRPs directly from a completion routine?

It’s reasonably common, though I don’t recall whether I’ve seen it in the storage stack. And it’s also strongly discouraged, for precisely the reason you’re seeing: the potential for stack overflow even though “nobody does anything wrong.”

I used to rail against this practice mightily, but have mostly given up after seeing how many people do it and get away with it.

Pity you have to queue a work item to complete an operation that could be done synchronously, just to avoid this. Maybe check for available stack space, and only queue the work item if you’re running short? That way you only pay for the work item on systems where this problem exists.
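
Something along these lines, perhaps; a rough, untested sketch using IoGetRemainingStackSize, where the routine names, the use of DriverContext, and the 2 KB threshold are just placeholder assumptions:

```cpp
#include <ntddk.h>

IO_WORKITEM_ROUTINE DeferredCompletionWorkItem;

// Runs on a system worker thread, i.e. with a fresh kernel stack, so a
// completion routine above that re-issues the IRP starts over instead of
// recursing on the original thread's stack.
VOID
DeferredCompletionWorkItem(
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_opt_ PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);

    PIRP irp = (PIRP)Context;

    // The work item pointer was stashed in DriverContext while we still
    // owned the IRP (assumes the filter isn't using DriverContext for
    // anything else).
    PIO_WORKITEM workItem = (PIO_WORKITEM)irp->Tail.Overlay.DriverContext[0];

    IoCompleteRequest(irp, IO_NO_INCREMENT);
    IoFreeWorkItem(workItem);
}

// Returns the status the dispatch routine should hand back to IoCallDriver.
NTSTATUS
CompleteWriteStackAware(
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_ PIRP Irp,
    _In_ NTSTATUS Status,
    _In_ ULONG_PTR Information)
{
    Irp->IoStatus.Status = Status;
    Irp->IoStatus.Information = Information;

    if (IoGetRemainingStackSize() >= 2048)
    {
        // Plenty of stack left: complete inline as before.
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return Status;
    }

    // Stack is running short, most likely because a filter above keeps
    // re-issuing this IRP from its completion routine. Defer the
    // completion to a work item instead.
    PIO_WORKITEM workItem = IoAllocateWorkItem(DeviceObject);

    if (workItem == NULL)
    {
        // Allocation failed; fall back to inline completion.
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return Status;
    }

    Irp->Tail.Overlay.DriverContext[0] = workItem;

    IoMarkIrpPending(Irp);
    IoQueueWorkItem(workItem, DeferredCompletionWorkItem,
                    DelayedWorkQueue, Irp);

    return STATUS_PENDING;
}
```

On the fast path nothing changes; the work item (and the STATUS_PENDING return) only comes into play when the remaining stack actually gets short.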

Peter


This post got me thinking about how long we’ve been preaching that this is not a good idea. As evidence, I present to you, the following from Chapter 11 of Windows NT Device Driver Development (1998):

Peter


@“Peter_Viscarola_(OSR)” said:
This post got me thinking about how long we’ve been preaching that this is not a good idea. As evidence, I present to you, the following from Chapter 11 of Windows NT Device Driver Development (1998):

Thanks. Yes, I understand and absolutely agree. I consider it too dangerous to be seen as good practice, but here we are. I’ll work out a workaround for now.