What can cause invocation of EvtRequestCancel to be deferred?

I’m seeing some strange behaviour in a KMDF driver, where a non-power-managed parallel-dispatch WDF I/O queue does not call my EvtRequestCancel routine as soon as the WDM cancel routine is invoked. I know this because I replaced WDF’s WDM cancel routine in the IRP with my own. That cancel routine prints a debug message and then calls WDF’s WDM cancel routine. It’s naughty, but it proves that the WDF I/O queue is deferring invocation of my EvtRequestCancel routine for some reason.

The sequence of events is:

  1. Program A opens the device and initiates an operation, which I will call “DOSOMETHING”. EvtIoDeviceControl marks the request CancelableEx without completing the request immediately. I am deliberately preventing DOSOMETHING from completing so that I can test its cancellation.

  2. Program B opens the same device and initiates an operation, which I will call “RESET”. This is completely synchronous (i.e. EvtDeviceIoControl does not return until the operation is finished), and does the following:

(a) Sets a flag that causes all incoming DOSOMETHINGs to fail.

(b) Synchronously waits for all ongoing DOSOMETHING operations to finish before proceeding (so it is waiting on 1 operation at this point). It uses KeWaitForSingleObject on a KEVENT, KernelMode, non-alertable.

(c) Does some stuff, and completes the request.

  1. I hit CTRL-C for program A. I see that the WDM cancel routine is invoked immediately, BUT the WDF I/O queue does not (yet) invoke my EvtRequestCancel for DOSOMETHING.

At this point, things are deadlocked. RESET is stuck at (b), waiting on 1 x DOSOMETHING to finish, but the WDF I/O queue has not invoked DOSOMETHING’s EvtRequestCancel. There’s really nothing else happening as far as the driver is concerned.

  1. I then run PROGRAM C, which opens the same device. At that moment, the WDF I/O queue decides to invoke EvtRequestCancel for DOSOMETHING, in the process context of C.

  2. DOSOMETHING completes its request with STATUS_CANCELLED and decrements a counter to 0, which causes the KEVENT to become signalled. RESET’s KeWaitForSingleObject finishes normally, so RESET completes its request normally too.

Programs A, B, and C then complete normally, and the expected Close and Cleanup events occur, and nothing bad happens. I’m currently running with driver verifier enabled, and it does not see any problems.

I have not specified any WdfSynchronizationLevel or WdfExecutionLevel for any of the WDF driver, device, file or I/O queue objects. My reading of the docs makes me think that this should result in the minimum of automatic synchronization.

What I don’t understand is why cancellation is being deferred by the WDF I/O queue, and, even weirder, why an incoming device open is sufficient to unblock the deadlock.

I took at a look at the WDF source code. I couldn’t really follow it properly, but the gist of what I saw is that invoking EvtRequestCancel immediately, in response to a WDM cancel routine being invoked, is conditional. There’s some reasonably complex logic in the main dispatch routine of the I/O queue object, which decides what callback the I/O queue invokes next. I couldn’t really figure out the rules that it plays by, though.

Can somebody spell out what needs to be done, if anything, to have no automatic synchronization in an I/O queue? Really, I want WDF to call my callbacks the way WDM would, for the most part. Is it sufficient to leave synchronization and execution levels for everything as InheritFromParent?

Is it in fact not possible to turn off serialization of EvtDeviceIoControl with EvtRequestCancel?

I wondered if the KeWaitForSingleObject is messing with IRQLs, given that the wait is non-alertable. Can this affect a WDF I/O queue’s dispatching?

Given that I pretty much want WDM-style behaviour for interactions with user mode, would I be better off using WdfDeviceInitAssignWdmIrpPreprocessCallback and handling IRP_MJ_READ/WRITE/DEVICE_CONTROL/CREATE/CLOSE/CLEANUP myself? With that approach, will EvtDeviceSelfManagedIoSuspend etc. be sufficient to avoid problems due to no longer having the necessary automatic interactions between the WDF I/O queue’s state and the WDF device object’s PnP/power state?

OS is Windows 10 x64. KMDF version string is “Kernel Mode Driver Framework (verifier on) version 01.021.0”.

Good description, good detail. Nicely written request for assistance.

And… good problem!

I think the ONLY thing you didn’t tell us is Queue structure, and the Dispatch Type of the Queue(s).

Also, can you confirm that you don’t have any File Object level callbacks-specified (Cleanup, Close).

Peter
OSR
@OSRDrivers

Hi Peter,

Thanks! I spent ages writing that post.

Ok, if by Queue structure, you’re asking whether there are multiple Queues and how requests are forwarded between them, there’s just one Queue. It is not power-managed, although I’ve tried it power-managed too (made no difference). Dispatch type is Parallel. Initialization of the attributes and config is as follows:

WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig, WdfIoQueueDispatchParallel);
queueConfig.EvtIoRead = ADXDMAEvtIoRead;
queueConfig.EvtIoWrite = ADXDMAEvtIoWrite;
queueConfig.EvtIoDeviceControl = ADXDMAEvtIoDeviceControl;
queueConfig.EvtIoStop = ADXDMAEvtIoStop;
queueConfig.PowerManaged = WdfFalse;

WDF_OBJECT_ATTRIBUTES_INIT(&queueAttributes);
#if 0
queueAttributes.SynchronizationScope = WdfSynchronizationScopeNone;
queueAttributes.ExecutionLevel = WdfExecutionLevelDispatch;
#endif

status = WdfIoQueueCreate(hDevice, &queueConfig, &queueAttributes, &hQueue);
if (!NT_SUCCESS(status)) {
… error handling …

The EvtIoStop callback does nothing at the moment. I thought I would need to handle that event, but at the moment I don’t. I will comment that out to see if it makes a difference.

I noticed in the kernel debugger that there are two I/O queues in my driver, one of which I did not create. Is one of them created by WDF for internal use, e.g. handling PnP & Power IRPs?

I set callbacks on the File Objects. Code to initialize the config & attributes of the File objects is:

WDF_FILEOBJECT_CONFIG_INIT(
&fileConfig,
ADXDMAEvtDeviceFileCreate,
ADXDMAEvtFileClose,
ADXDMAEvtFileCleanup);

fileConfig.FileObjectClass = WdfFileObjectWdfCanUseFsContext | WdfFileObjectWdfCanUseFsContext2 | WdfFileObjectCanBeOptional;

WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&fileAttributes, ADXDMA_FILE_CONTEXT);
#if 0
fileAttributes.SynchronizationScope = WdfSynchronizationScopeNone;
fileAttributes.ExecutionLevel = WdfExecutionLevelPassive;
#endif

WdfDeviceInitSetFileObjectConfig(deviceInit, &fileConfig, &fileAttributes);

After copying & pasting that, I wonder if setting FileObjectClass has any side effects. I’m going to try commenting that out as soon I finish this post.

One thing I noticed in the documentation for WdfDeviceInitSetFileObjectConfig is this:

“By default, each framework file object inherits its synchronization scope and execution level from its parent device object. If the parent device object’s synchronization scope and execution level are not WdfSynchronizationScopeNone and WdfExecutionLevelPassive, the driver must set the WdfSynchronizationScopeNone and WdfExecutionLevelPassive values in the WDF_OBJECT_ATTRIBUTES structure that the FileObjectAttributesparameter specifies. Otherwise, WdfDeviceCreate will return an error status code.”

The KMDF Toaster FDO sample that I initially based my code on doesn’t bother to specify a SynchronizationScope or ExecutionLevel. As far as I can tell, requesting ExecutionLevelPassive is effectively requesting WDF to use a worker thread or something whenever the sytem invokes a WDM dispatch function at > PASSIVE_LEVEL. Given that WDM invokes CREATE, CLOSE and CLEANUP dispatch routines at passive level, I would have thought that ExecutionLevelDispatch would be OK too, because all that does is ask for a guarantee of <= DISPATCH_LEVEL, which WDM automatically satisfies. If inheriting from parent, and none of the parents set an ExecutionLevel, then the evaluated ExecutionLevel is ExecutionLevelDispatch, right? In any case, KMDF doesn’t seem to mind if I don’t set the ExecutionLevel for the file objects.

With regard to EvtDeviceFileCreate, EvtFileClose & EvtFileCleanup, I have verified using WdfDeviceInitAssignWdmIrpPreprocessCallback that nobody is calling the WDM dispatch routines during the sequence of events that I described in my original post. I can see that they get called in the usual process contexts (i.e. as they would be in a WDM driver) when the deadlock condition gets unblocked, and the three programs clean up after themselves.

By the way, if I were to install a checked build of Windows 10, would the symbol files have source code so that I could step through the KMDF source code? I tried stepping through KMDF assembly language in a release build of Windows 10, but there’s got to be a better way than that…

Source is not embedded in the symbols file, no matter if it is checked or free.

Bent from my phone


From: xxxxx@lists.osr.com on behalf of xxxxx@alpha-data.com xxxxx@lists.osr.com
Sent: Monday, August 21, 2017 1:48:30 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] What can cause invocation of EvtRequestCancel to be deferred?

By the way, if I were to install a checked build of Windows 10, would the symbol files have source code so that I could step through the KMDF source code? I tried stepping through KMDF assembly language in a release build of Windows 10, but there’s got to be a better way than that…


NTDEV is sponsored by OSR

Visit the list online at: https:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at https:

To unsubscribe, visit the List Server section of OSR Online at https:</https:></https:></https:>

Hmnmmm… it’s hard to believe that you want to be handling Cleanup. But, be that as it may.

If it won’t entirely screw-up your test, and just for the sake or argument/clarity/experimentation, I’d try commenting out the call to WdfDeviceInitSetFileObjectConfig.

I *really* can’t see what’s going awry. REALLY I can’t.

Maybe tomorrow there’ll be other people not looking at the eclipse, and somebody else can help out…

Peter
OSR
@OSRDrivers

I do some cleanup stuff in EvtFileCleanup, but I could remove setting the cleanup & close callbacks for the purposes of debugging this issue. Not so easy for create, though. You’ve got me wondering - why is EvtFileCleanup generally not needed in a KMDF driver? Does the I/O queue call EvtRequestCancel for driver-owned requests in response to IRP_MJ_CLEANUP? If that’s the case, I can see your point.

However, I’ve just noticed a bug in the DOSOMETHING code. It’s acquiring a WDFSPINLOCK twice (and releasing it twice). I didn’t notice that until now, and surprisingly, neither did the verifier. I’d better check what verifier options I have enabled.

That’s a bit of a schoolboy error on my part. Would that leave the IRQL at DISPATCH_LEVEL, because the 2nd call to WdfAcquireSpinlock overwrites the saved IRQL of the 1st acquire? Returning from EvtIoDeviceControl at DISPATCH_LEVEL isn’t going to do the Queue any good.

Unfortunately I can’t test a fix right now, but I’ll try it tomorrow.

Doron - OK, thanks for the info.

Unless something has changed recently, there are no checked Win 10 builds available? It was really puzzling that symbol files for checked builds were available, but no actual checked builds. I guess MSFT decided they didn’t want drivers to be as error free as possible.

Jan

On 8/21/17, 2:01 PM, “xxxxx@lists.osr.commailto:xxxxx on behalf of Doron Holan” on behalf of xxxxx@microsoft.commailto:xxxxx> wrote:

Source is not embedded in the symbols file, no matter if it is checked or free.

Bent from my phone
________________________________
From: xxxxx@lists.osr.com on behalf of xxxxx@alpha-data.com xxxxx@lists.osr.com
Sent: Monday, August 21, 2017 1:48:30 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] What can cause invocation of EvtRequestCancel to be deferred?

By the way, if I were to install a checked build of Windows 10, would the symbol files have source code so that I could step through the KMDF source code? I tried stepping through KMDF assembly language in a release build of Windows 10, but there’s got to be a better way than that…


NTDEV is sponsored by OSR

Visit the list online at: https:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at https:

To unsubscribe, visit the List Server section of OSR Online at https:


NTDEV is sponsored by OSR

Visit the list online at: http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http:</http:></http:></http:></https:></https:></https:></mailto:xxxxx></mailto:xxxxx>

The need to implement cleanup is quite rare, in any Windows driver (other than a file system). Every time I’ve seen cleanup implemented, it was because the dev didn’t fully understand cancel. That’s not intended to be any reflection on you Mr. Whitlock. Just a statement of my experience.

The only time you want to implement cleanup is when you have process-specific state that Ned’s to be torn down, such as when you need to unmap a shared section. After you return from cleanup (if there is a cleanup) any pending Requsts will be canceled.

And, the only time you can handle cancel is if you own the Request.

Does that help at all? I’ll be happy to try to explain more fully, if it would be useful and you tell me what you’d like to know.

I’m a bit mystified as to how your driver could be recursively acquiring a spin lock and not hanging, if I call KeAcquireSpinLock on a lock that’s already held, I’ll block on that call until the locks available. There’s no checking for,which thread owns the lock… so wouldn’t your second attempt at acquisition deadlock?

Peter
OSR
@OSRDrivers

Hi Peter,

It turns out that I’m not acquring the same WDFSPINLOCK twice - I thought I was, but I was actually aquiring one associated with the device and then acquiring one associated with a file object, nested. I’ve been looking at this code for long enough that I’m starting to see things that aren’t there.

I refactored that bit of code so that it holds at most one driver-created spinlock at a given moment, because I tend to see nested spinlock acquire in the same bit of code as a sign of overcomplication, and is usually avoidable. I doubt it will make any difference, and I’ll try your suggestions tomorrow (can’t easily get rid of the EvtDeviceFileCreate callback though).

I do have more questions about cleanup, but I’ll save them for later - I think I’ve taken up enough of your time for the moment. I appreciate your taking the time to talk about this stuff.

I’ve now removed the EvtFileCleanup callback. There is still a create and close callback, however, and I can’t easily get rid of them, because there is some context information associated with each FileObject.

I also no longer set FileObjectClass.

The double spinlock acquire in my code was in fact a false alarm, due to tired eyes. It was a nested spinlock acquire of two different spinlocks. It’s gone, anyway. However, that exercise does remind me that a bug somewhere else, maybe in EvtDeviceIoControl, might cause the Queue to behave strangely.

None of these changes have made any difference to the behaviour that I’m seeing. I’m going to try to simplify the code as much as possible, so that it is little more than a test case.

Thanks for your help, Peter.

On the subject of cleanup, I think I see that in most cases, it shouldn’t be necessary. Deallocating resources associated with a FileObject should be done in EvtFileClose.

With regard to cleanup, there are two common scenarios that I am aware of:

  1. A user-mode program closes a handle while there are IRPs pending. My understanding of what happens in WDM land is that IRP_MJ_CLEANUP is delivered, assuming that the file object’s reference count has gone to 0. If desired, the driver can complete the pending IRPs with some kind of failure status. If the driver doesn’t, then those pending IRPs will complete normally in due course. The system *won’t* cancel them automatically because the process and its threads are still alive. Is this correct?

Supposing that is correct, then if you consider closing the handle to be a programming error, then you might as well let the pending IRPs complete normally in due course, so no need to handle IRP_MJ_CLEANUP.

  1. A process is forcibly terminated while there are IRPs pending. In that case, in WDM land, IRP_MJ_CLEANUP is delivered and because any threads are terminated too, the system invokes cancel routines for the pending IRPs. So there’s no need to handle IRP_MJ_CLEANUP in this case.

Is my reasoning for these two cases correct, and do things work roughly the same way in KMDF? My reading of the docs makes me think they do, except that the WDFIOQUEUE might not invoke certain callbacks in the same process context as WDM would, depending on DispatchType, ExecutionLevel and SynchronizationScope and what is going on at the time.

Alright, time to put this thread to bed. My conclusion is that the behaviour of WDF I/O Queue is to defer the dispatch of EvtRequestCancel if the Queue is in the “Dispatching” state, regardless of SynchronizationScope or ExecutionLevel, when WDF’s WDM cancel routine is invoked.

That is to say, if the driver has not returned from its EvtDeviceIoRequest by the time a WDM cancel routine is invoked for an IRP corresponding to WDF request, dispatch of EvtRequestCancel is deferred. Can anybody confirm that this is the actual expected behaviour?

Here’s the state of the Queue on breaking on a breakpoint placed in my WDM cancel routine, immediately before it calls WDF’s WDM cancel routine:


Queue: 2 !wdfqueue 0x000041f5ae614548
Parallel, Auto, Not power-managed, PowerOn, Can accept, Can dispatch, Dispatching, ExecutionLevelDispatch, SynchronizationScopeNone
Number of driver owned requests: 2
!wdfrequest 0x000041f5ad387258 !irp 0xffffa78bc8dd2d30
!wdfrequest 0x000041f5ae669f68 !irp 0xffffa78bc88a6d30
Number of waiting requests: 0

Number of driver owned cancelable requests: 1
!wdfrequest 0x000041f5ad387258 !irp 0xffffa78bc8dd2d30

EvtIoRead: (0xfffff80ee906ea70) adxdma!ADXDMAEvtIoRead
EvtIoWrite: (0xfffff80ee906ebe0) adxdma!ADXDMAEvtIoWrite
EvtIoDeviceControl: (0xfffff80ee906ed50) adxdma!ADXDMAEvtIoDeviceControl

I can see that the Queue is in the “Dispatching” state because an EvtDeviceIoControl invocation has not yet returned.

So I need to rethink my design, I guess.

xxxxx@alpha-data.com xxxxx@lists.osr.com wrote:

Alright, time to put this thread to bed. My conclusion is that the behaviour of WDF I/O Queue is to defer the dispatch of EvtRequestCancel if the Queue is in the “Dispatching” state, regardless of SynchronizationScope or ExecutionLevel, when WDF’s WDM cancel routine is invoked.

That is to say, if the driver has not returned from its EvtDeviceIoRequest by the time a WDM cancel routine is invoked for an IRP corresponding to WDF request, dispatch of EvtRequestCancel is deferred. Can anybody confirm that this is the actual expected behaviour?

Are you aware that the full source code for KMDF is available?

When a driver-owned IRP is canceled, KMDF puts the request on a list. 
That list is drained (and hence the cancel callback is called) during
the main “Dispatch” loop, before it dispatches new incoming IRPs.  So,
I’d say you’re right.

Why does your EvtDeviceIoRequest take so long?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

Thanks for the reply. Yes, I did notice the Dispatch routine in the I/O Queue, but it’s not exactly simple. I got the gist of it, which was that it may or may not dispatch a cancel immediately, depending on its state.

The lengthy EvtDeviceIoRequest (I call it “RESET” operation) is synchronously waiting for all “DOSOMETHING” operations to finish, so that it can reset a bunch of state used in processing of DOSOMETHINGs.

It seemed simplest to have the RESET operation execute synchronously, waiting on a KEVENT that gets set when then the condition (IsFailingIncomingDOSOMETHINGs && nDOSOMETHINGCount == 0) becomes true.

This wouldn’t be a problem in plain old WDM, but the design of the WDF I/O Queue doesn’t appear to permit the synchronous RESET operation to be overlapped with the EvtRequestCancel for DOSOMETHINGs, and the result is a deadlock. The docs kind of allude to this behaviour, but they make it sound optional, depending on SynchronizationScope and ExecutionLevel.

https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/using-automatic-synchronization

“If your driver enables automatic synchronization of the callback functions that handle I/O requests, the framework synchronizes these callback functions so that they run one at a time. The following table lists the callback functions that the framework synchronizes.”

It seems there is always some synchronization going on (compared to WDM behaviour) even for SynchronizationScopeNone.

There’s nothing stopping me converting RESET to an asynchronous operation or a worker thread. That way the EvtRequestCancel would get through in a timely manner and RESET would finish as expected, no deadlocks.

I’m a relative newbie to WDF, so I’m still learning its nuances.

What you are doing is pending a request while retaining request ownership.

You have another option. During device initialization, create a secondary queue configured for manual dispatching and with an EvtIoCanceledOnQueue.

Now when you receive a DOSOMETHING request, do not mark it as cancellable but requeue it to the secondary queue configured for manual dispatching (WdfRequestForwardToIoQueue), and return from the I/O handler. Requests forwarded to this queue will stay there unless you remove them manually or unless they are cancelled (and completed within the EvtIoCanceledOnQueue).

Because the request belongs to a WDF queue, it is owned by the framework and the framework will handle the request cancellation and call your registered EvtIoCanceledOnQueue to complete the request when such a cancellation occurs.

Would the delay still be present in that case ? I don’t know but it is Worth trying.

First of all, I apologize for being absent from this thread for the past couple of days. Duty calls, I’m afraid. I’ve been in meetings for the past day and a half, it seems.

No and yes. Cleanup is called when the HANDLE count on the FILE_OBJECT goes to zero (not the reference count). Other than that (which is probably a typo on your part) all good (modulo whether the handle has been opened for a completion port, which makes this happen differently, but let’s not go there).

Yeah. Well, I’m *not* an WDF newbie, and I certainly wouldn’t have expected the behavior you’re seeing. And I do *not* agree that this behavior is even hinted at in the documentation.

I’m not sure if the goal (of the KMDF devs) in this case was to reduce the inherent race conditions between I/O initiation and cancel or if the goal was merely to make their own implementation of cancel easier.

So… the conclusion is that if I’m blocking within my driver’s EvtIo Event Processing Callback, then I won’t ever get a cancel operation (while blocking) for ANY in-progress Request? Do I have that right?

Sure… synchronously blocking sssuuucks as a general architectural approach in a Windows driver. But, in the case Mr. Whitlock outlined, you can see that his expectation is kinda reasonable.

There are a lot of little edge-conditions like this in WDF. That’s why it’s always best to keep your thinking confined to WDF terms, and not try to (ever, if possible) mentally map between WDM and WDF.

Interesting. Thanks for bringing this issue to NTDEV, Mr. Whitlock. I’ve learned a lot. And I’m glad you problem got sorted. With little help from us, I guess, but…

Peter
OSR
@OSRDrivers

If you forward the request in a queue configured for manual dispatching and with an EvtIoCanceledOnQueue callback, then the EvtIoCanceledOnQueue callback is called when you hit CTRL+C (not deferred). The call occurs at APC_LEVEL in the context of the issuing process.

Hi W.D.,

That’s interesting to know, thanks. I’m not sure I like relying on that behaviour though, given that the docs don’t explicitly say that’s how it is. A future version of KMDF might break it, and maybe there are other corner cases where the behaviour is different.

Thanks to all who commented in this thread and especially to Peter.

>I’m not sure I like relying on that behaviour though, given that the docs don’t explicitly say that’s how it is.

This is the documented behaviour:

“The framework calls an EvtIoCanceledOnQueue callback function as soon as it determines that an I/O request has been canceled, regardless of the dispatching method that the driver has set for the I/O queue.”

https://msdn.microsoft.com/en-us/library/windows/hardware/ff541756(v=vs.85).aspx