WdfRequestUnmarkCancelable results in BSOD

I’m new to WDF driver programming, but I’ve been learning quite a bit from this site. Hopefully, someone can point me in the right direction. I have an IOCTL based DMA driver that I wrote for interfacing with PCIe FPGAs. It works well and is reliable until I add support for IO cancellation. Once I add the WdfRequestMarkCancelableEx, WdfRequestUnmarkCancelable calls I get the BSOD .

The driver code is rather long, but here’s what I think is relevant. The entry point is RiffaEvtIoDeviceControl. This starts the IO request for say the send direction, by calling RiffaIoctlSend. Both are listed below. I’ve commented out the WdfRequestMarkCancelableEx calls in the RiffaIoctlSend because it’s causing the BSOD. The EvtRequestCancel callback (RiffaEvtRequestCancel) is also listed below. After the send operation completes, the device causes an interrupt which eventually causes the RiffaCompleteRequest function (listed last) to be called. This calls WdfRequestCompleteWithInformation. You can see that WdfRequestUnmarkCancelable has been commented out there. I’m following the pattern on the MSDN WdfRequestUnmarkCancelable page. So I cannot figure out what’s going on. To make things worse, this works most of the time. It’s only when I run numerous back to back calls through this “send” path that I can get it to BSOD. The dump from the BSOD is not terribly useful to me. I’ve appended that last.

I’ll note that I’ve already double checked that I’m not putting the EvtRequestHandle callback in a PAGEable area. I suspect that the problem lies in either marking or unmarking, but the dump doesn’t help me narrow it down. Any direction or ideas would be greatly appreciated.

Thanks
Matt

=============================

/**
* This event is called when the framework receives IRP_MJ_DEVICE_CONTROL
* requests from the system.
* .
* Queue - Handle to the framework queue object that is associated with the
* I/O request.
*
* Request - Handle to a framework request object.
*
* OutputBufferLength - Length of the request’s output buffer,
* if an output buffer is available.
*
* InputBufferLength - Length of the request’s input buffer,
* if an input buffer is available.
*
* IoControlCode - the driver-defined or system-defined I/O control code
* (IOCTL) that is associated with the request.
*/
VOID RiffaEvtIoDeviceControl(IN WDFQUEUE Queue, IN WDFREQUEST Request,
IN size_t OutputBufferLength, IN size_t InputBufferLength,
IN ULONG IoControlCode) {
PDEVICE_EXTENSION devExt;

devExt = RiffaGetDeviceContext(WdfIoQueueGetDevice(Queue));

// Determine which I/O control code was specified.
switch (IoControlCode) {

case IOCTL_RIFFA_SEND: // (METHOD_OUT_DIRECT)
RiffaIoctlSend(devExt, Request, OutputBufferLength, InputBufferLength);
break;

case IOCTL_RIFFA_RECV: // (METHOD_OUT_DIRECT)
RiffaIoctlRecv(devExt, Request, OutputBufferLength, InputBufferLength);
break;

case IOCTL_RIFFA_LIST: // (METHOD_OUT_DIRECT)
RiffaIoctlList(devExt, Request, OutputBufferLength, InputBufferLength);
break;

case IOCTL_RIFFA_RESET: // (METHOD_OUT_DIRECT)
RiffaIoctlReset(devExt, Request);
break;

default:
// The specified I/O control code is unrecognized by this driver.
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, invalid ioctl request type\n”, devExt->Name);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_DEVICE_REQUEST, 0);
break;
}
}

/**
* Handles IRP_MJ_DEVICE_CONTROL requests for IOCTL_RIFFA_SEND.
* .
* DevExt - Handle to the device extension object.
*
* Request - Handle to a framework request object.
*
* OutputBufferLength - Length of the request’s output buffer,
* if an output buffer is available.
*
* InputBufferLength - Length of the request’s input buffer,
* if an input buffer is available.
*/
VOID RiffaIoctlSend(IN PDEVICE_EXTENSION DevExt, IN WDFREQUEST Request,
IN size_t OutputBufferLength, IN size_t InputBufferLength) {
NTSTATUS status = STATUS_SUCCESS;
PREQUEST_EXTENSION reqExt;
PRIFFA_FPGA_CHNL_IO io;
UINT64 length;
PCHAR buf = NULL;
size_t bufSize;

// Input should be non-zero
if(!InputBufferLength) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send zero length input buffer\n”, DevExt->Name);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
return;
}

// Get the input buffer
status = WdfRequestRetrieveInputBuffer(Request, 0, &buf, &bufSize);
if (!NT_SUCCESS(status)) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send WdfRequestRetrieveInputBuffer failed\n”,
DevExt->Name);
WdfRequestCompleteWithInformation(Request, status, 0);
return;
}
if (bufSize < sizeof(RIFFA_FPGA_CHNL_IO)) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send input buffer too small to contain RIFFA_FPGA_CHNL_IO\n”,
DevExt->Name);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
return;
}
io = (PRIFFA_FPGA_CHNL_IO)buf;

// Validate the length, last, chnl
length = (io->Length < (OutputBufferLength>>2) ? io->Length : (OutputBufferLength>>2));
if (io->Last > 1) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send invalid last: %d\n”, DevExt->Name, io->Last);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
return;
}
if (io->Chnl >= DevExt->NumChnls) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send invalid channel: %d, device only has %d channel(s)\n”,
DevExt->Name, io->Chnl, DevExt->NumChnls);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
return;
}

// Check that this isn’t an already running transaction
if (InterlockedExchange(&DevExt->Chnl[io->Chnl].InUse, 1) == 1) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send already in use on channel: %d\n”,
DevExt->Name, io->Chnl);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
return;
}

// Set the channel number in the request context for RiffaEvtRequestCancel.
reqExt = RiffaGetRequestContext(Request);
reqExt->Chnl = io->Chnl;

// Start a send transaction.
if (length) {
// Start a DMA transaction for sending.
DevExt->Chnl[io->Chnl].Timeout = io->Timeout;
DevExt->Chnl[io->Chnl].Length = (length<<2);
DevExt->Chnl[io->Chnl].SpillAfter = (length<<2);
DevExt->Chnl[io->Chnl].Offset = (io->Offset<<2);
DevExt->Chnl[io->Chnl].Last = io->Last;
DevExt->Chnl[io->Chnl].Provided = 0;
DevExt->Chnl[io->Chnl].ProvidedPrev = 0;
DevExt->Chnl[io->Chnl].Confirmed = 0;
DevExt->Chnl[io->Chnl].ConfirmedPrev = 0;
DevExt->Chnl[io->Chnl].ActiveCount = 0;
DevExt->Chnl[io->Chnl].Cancel = 0;
DevExt->Chnl[io->Chnl].Request = Request;
InterlockedExchange(&DevExt->Chnl[io->Chnl].ReqdDone, 0);
status = RiffaStartDmaTransaction(DevExt, io->Chnl, (length<<2),
0, WdfDmaDirectionWriteToDevice);
if (!NT_SUCCESS(status)) {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, unable to start DMA transaction on channel: %d\n”,
DevExt->Name, io->Chnl);
WdfRequestCompleteWithInformation(Request, status, 0);
return;
}
//else {
// Mark the WDFREQUEST as cancellable.
//status = WdfRequestMarkCancelableEx(Request, RiffaEvtRequestCancel);
//if (!NT_SUCCESS(status)) {
// DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
// “riffa: fpga:%s, ioctl send WdfRequestMarkCancelableEx failed\n”,
// DevExt->Name);
// WdfRequestCompleteWithInformation(Request, status, 0);
//}
//}
}
else if (io->Last) {
// Program the device for zero length send (device RX) and complete
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send zero length with last set\n”, DevExt->Name);
RiffaProgramSend(DevExt, io->Chnl, (UINT32)length, io->Offset, io->Last);
WdfRequestCompleteWithInformation(Request, STATUS_SUCCESS, 0);
}
else {
// Invalid request, results in no send
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s, ioctl send invalid, no length or last\n”, DevExt->Name);
WdfRequestCompleteWithInformation(Request, STATUS_INVALID_PARAMETER, 0);
}
}

/**
* EvtRequestCancel handler for WDFREQUEST objects for IOCTL sends/receives.
* Called if the IO Manager or calling application needs to cancel the send.
* In practice this should only happen when the user application hangs and the
* user CTRL+C signals or Task Ends the application.
*
* Request - WDFREQUEST object from the IOCTL queue, representing the send
*/
VOID RiffaEvtRequestCancel(IN WDFREQUEST Request) {
PDEVICE_EXTENSION devExt;
PREQUEST_EXTENSION reqExt;
UINT32 chnl;
BOOLEAN canCancel;

devExt = RiffaGetDeviceContext(WdfIoQueueGetDevice(WdfRequestGetIoQueue(Request)));
reqExt = RiffaGetRequestContext(Request);

KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_TRACE_LEVEL,
“riffa: fpga:%s chnl:%d, request being cancelled\n”, devExt->Name,
(reqExt->Chnl >= RIFFA_MAX_NUM_CHNLS ? reqExt->Chnl - RIFFA_MAX_NUM_CHNLS : reqExt->Chnl)));

// Acquire the channel lock
WdfSpinLockAcquire(devExt->Chnl[reqExt->Chnl].SpinLock);

// See if we can cancel right now.
canCancel = (devExt->Chnl[reqExt->Chnl].ActiveCount == 0);
if (canCancel) {
// NULL out the request so that no other threads use it
devExt->Chnl[reqExt->Chnl].Request = NULL;
}
else {
// Set the cancel flag so that the last active thread cancels for us.
devExt->Chnl[reqExt->Chnl].Cancel = 1;
}

// Release the channel lock
WdfSpinLockRelease(devExt->Chnl[reqExt->Chnl].SpinLock);

// Cancel the request
if (canCancel) {
InterlockedExchange(&devExt->Chnl[reqExt->Chnl].Ready, 0);
InterlockedExchange(&devExt->Chnl[reqExt->Chnl].InUse, 0);
devExt->Chnl[reqExt->Chnl].ActiveCount = 0;
devExt->Chnl[reqExt->Chnl].Cancel = 0;
WdfDmaTransactionRelease(devExt->Chnl[reqExt->Chnl].DmaTransaction);
WdfTimerStop(devExt->Chnl[reqExt->Chnl].Timer, FALSE);
WdfRequestCompleteWithInformation(Request, STATUS_CANCELLED,
(ULONG_PTR)(devExt->Chnl[reqExt->Chnl].ConfirmedPrev +
devExt->Chnl[reqExt->Chnl].Confirmed)>>2);
}
}

/**
* Called when the WDFREQUEST object for the specified channel should be
* completed with the specified status. The WDFREQUEST has been marked
* cancelable so it must first be unmarked cancelable. After completion, the
* pointer to the WDFREQUEST is NULL’d out to indicate that the WDFREQUEST is
* no longer valid.
*
* DevExt - Pointer to the Device Extension
*
* Chnl - Channel number on which the DMA is taking place
*
* Status - NTSTATUS to set for completion
*/
VOID RiffaCompleteRequest(IN PDEVICE_EXTENSION DevExt, IN UINT32 Chnl,
IN NTSTATUS Status) {
NTSTATUS status;
WDFREQUEST request;

if ((request = DevExt->Chnl[Chnl].Request) != NULL) {
// Try to complete the request
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s chnl:%d, completing request\n”,
DevExt->Name, (Chnl < RIFFA_MAX_NUM_CHNLS ? Chnl : Chnl - RIFFA_MAX_NUM_CHNLS));

//status = WdfRequestUnmarkCancelable(request);
//if (status != STATUS_CANCELLED) {
// Complete the request
DevExt->Chnl[Chnl].Request = NULL;
WdfRequestCompleteWithInformation(request, Status,
(ULONG_PTR)(DevExt->Chnl[Chnl].ConfirmedPrev +
DevExt->Chnl[Chnl].Confirmed)>>2);
//}
//else {
// DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
// “riffa: fpga:%s chnl:%d, request already cancelled\n”,
// DevExt->Name, (Chnl < RIFFA_MAX_NUM_CHNLS ? Chnl : Chnl - RIFFA_MAX_NUM_CHNLS));
//}
}
else {
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
“riffa: fpga:%s chnl:%d, request already nulled out\n”,
DevExt->Name, (Chnl < RIFFA_MAX_NUM_CHNLS ? Chnl : Chnl - RIFFA_MAX_NUM_CHNLS));
}
}

=================================

Loading Dump File [F:\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols;C:\WinDDK\7600.16385.1\src\general\riffa\driver\windows\sys\objchk_win7_amd64\amd64
Executable search path is: C:\WinDDK\7600.16385.1\src\general\riffa\driver\windows\sys\objchk_win7_amd64\amd64
Windows 7 Kernel Version 7601 (Service Pack 1) MP (12 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.17514.amd64fre.win7sp1_rtm.101119-1850
Machine Name:
Kernel base = 0xfffff80002a11000 PsLoadedModuleList = 0xfffff80002c56e90
Debug session time: Tue Feb 17 03:38:55.592 2015 (UTC - 8:00)
System Uptime: 0 days 0:04:00.436
Loading Kernel Symbols



Loading User Symbols
PEB is paged out (Peb.Ldr = 000007ff`fffd3018). Type “.hh dbgerr001” for details
Loading unloaded module list

The context is partially valid. Only x86 user-mode context is available.
The wow64exts extension must be loaded to access 32-bit state.
.load wow64exts will do this if you haven’t loaded it already.
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {8, 2, 1, fffff88000f5c670}

Page 7c2028 not present in the dump file. Type “.hh dbgerr004” for details
Probably caused by : Unknown_Image ( Wdf01000!FxIrpQueue::RemoveIrpFromQueueByContext+30 )

Followup: MachineOwner

16.6: kd:x86> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000008, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
Arg4: fffff88000f5c670, address which referenced memory

Debugging Details:

Page 7c2028 not present in the dump file. Type “.hh dbgerr004” for details

WRITE_ADDRESS: 0000000000000008

CURRENT_IRQL: 0

FAULTING_IP:
Wdf01000!FxIrpQueue::RemoveIrpFromQueueByContext+30
fffff880`00f5c670 48 dec eax

DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT

BUGCHECK_STR: 0xD1

LAST_CONTROL_TRANSFER: from 0000000000000000 to 0000000000000000

STACK_TEXT:
00000000 00000000 00000000 00000000 00000000 0x0

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
Wdf01000!FxIrpQueue::RemoveIrpFromQueueByContext+30
fffff880`00f5c670 48 dec eax

SYMBOL_NAME: Wdf01000!FxIrpQueue::RemoveIrpFromQueueByContext+30

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

BUCKET_ID: INVALID_KERNEL_CONTEXT

Followup: MachineOwner

Have you looked at the WDFLOG output?

It’s not pageable vs non-pageable… there’s a NULL pointer being dereferenced… probably the IRP that is underelying the WDFREQUEST is gone. See the arguments in the crash dump:

I’d bet you have a race condition on cancellation.

That cancel logic you have looks sort of tortured. Is there not a more clean way of handling this?

Peter
OSR
@OSRDrivers

On Mon, Feb 16, 2015 at 11:27 PM, wrote:

> WdfRequestUnmarkCancelable

WDF does not resolve the cancel race condition that has been a feature of
NT from day one. Pay close attention to the doc for
WdfRequestUnmarkCancelable. In particular

However, the driver must not call WdfRequestUnmarkCancelable after
> EvtRequestCancel
> https:
> callsWdfRequestComplete
> https:.
> In the following Example section, the example stores a local copy of the
> request handle and then clears it when the EvtRequestCancel callback
> function calls WdfRequestComplete. The driver does not call
> WdfRequestUnmarkCancelable if the local copy of the request handle has
> been cleared. This example uses the framework’s automatic synchronization
> https: to
> synchronize the callback functions.
>
> Note that calling WdfObjectReference
> https: to
> add an additional reference to the request object does not enable your
> driver to call WdfRequestUnmarkCancelable after the driver’s
> EvtRequestCancel
> https: callback
> function callsWdfRequestComplete
> https:
> .
>
https://msdn.microsoft.com/en-us/library/windows/hardware/ff550035(v=vs.85).aspx

The only way I know of to handle the race condition inherent in this design
is to associate a context with each request and to track the completion
state in that context. While the Irp will become invalid, the context and
the request object will not (if you use object references to keep them
valid), so you can independently track the state. IMHO KMDF ought to do
this for you, but not my design.

Mark Roddy</https:></https:></https:></https:></https:></https:>

@Peter

Thanks for pointing that out. So my guess is that calling WdfRequestUnmarkCancelable is resulting in the null pointer. Since the request object is not null (checked right before), the underlying IRP might be? I have debug statements in the EvtRequestCancel callback. But I guess I wouldn’t see them if they executed right before this null defreference? I’ll remove everything but the debug statements in that callback to see if it’s being triggered (I’m certainly not doing it from the user application).

As far as the cancel logic, isn’t this the only way to handle cancels in KMDF? The IO request comes into RiffaIoctlSend. If valid, WdfRequestMarkCancelable is called with the RiffaEvtRequestCancel callback. The send transfer takes place and the device interrupts to signal completion. Then RiffaCompleteRequest is called, where it checks if the RiffaEvtRequestCancel callback has already cancelled (the device context’s request field null check). If not null it completes the request. The only thing I see that I’m not doing is nulling out the device context’s request field in RiffaCompleteRequest. This could result in the RiffaEvtRequestCancel trying to complete the request, but again, I’m not sure what is triggering the cancel callback. Additionally, I might want to put a spin lock around the completion calls, but I was under the impression that automatic sync would prevent this race condition. Perhaps not.

I haven’t looked at the WDFLOG (if this is what you mean: https://msdn.microsoft.com/en-us/library/windows/hardware/ff545531(v=vs.85).aspx). I was hoping to avoid descending into this abyss but perhaps I need to rethink that.

Thanks,
Matt

@Mark

Right, the example on the WdfRequestMarkCancelable MSDN page is what I’m going for. Automatic sync, keep a reference to the WDFREQUEST on the device context object, null it out in the cancel callback. This is basically what (I believe) I’ve done here. I’m not seeing how my approach differs from the example and what you suggested.

Thanks,
Matt

Have you turned on wdf verifier?

d

Bent from my phone


From: xxxxx@gmail.commailto:xxxxx
Sent: ?2/?17/?2015 8:36 AM
To: Windows System Software Devs Interest Listmailto:xxxxx
Subject: RE:[ntdev] WdfRequestUnmarkCancelable results in BSOD

@Mark

Right, the example on the WdfRequestMarkCancelable MSDN page is what I’m going for. Automatic sync, keep a reference to the WDFREQUEST on the device context object, null it out in the cancel callback. This is basically what (I believe) I’ve done here. I’m not seeing how my approach differs from the example and what you suggested.

Thanks,
Matt


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer</mailto:xxxxx></mailto:xxxxx>

That needs to be the VERY FIRST thing you do. It’s not “descending into the abyss” it’s the first and primary step in debugging any issue you observe in a WDF driver.

Also, as Mr. Holan mentioned, be sure you’ve enabled WDF Verifier and (I’ll add) enabled verbose logging.

Well, no. There’s no “only way”… MY personal favorite way is to put pending Requests on a WDFQueue while they’re in progress. That pretty much eliminates the race problems. At least for me.

Well, you don’t want to put spin locks around your calls to WdfRequestComplete, if that’s what you mean. That’s generally considered “not a great idea” due to the chance of deadlocking.

There is NO DOUBT that WDF could be more helpful when it comes to cancel management. Maybe someday…

Peter
OSR
@OSRDrivers

A reference on the WDFREQUEST is insufficient if you are handling
cancellation of requests on your own. Instead you also need a request
context that tracks the state of the request - if it has or has not been
completed. What happens is that you can call WdfRequestMarkCancelable after
you have completed the request, at which point the IRP is invalid and the
crash occurs.

Mark Roddy

On Tue, Feb 17, 2015 at 11:36 AM, wrote:

> @Mark
>
> Right, the example on the WdfRequestMarkCancelable MSDN page is what I’m
> going for. Automatic sync, keep a reference to the WDFREQUEST on the device
> context object, null it out in the cancel callback. This is basically what
> (I believe) I’ve done here. I’m not seeing how my approach differs from the
> example and what you suggested.
>
> Thanks,
> Matt
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

On Tue, Feb 17, 2015 at 2:08 PM, wrote:

> Well, no. There’s no “only way”… MY personal favorite way is to put
> pending Requests on a WDFQueue while they’re in progress. That pretty much
> eliminates the race problems. At least for me.

That works if you can always cancel a request, otherwise you are back to
self managing the race condition.

Mark Roddy

Marion Bond wrote:

Cancel surely can fail; no?

From kernel mode, no it can’t. IoCancelIrp always succeeds. But the
“cancel operation” in that case is merely the setting of a single bit
and the calling of a callback. It does nothing else. It’s up to the
current owner to notice the bit and take some action, but that’s
completely separate from the “cancel” request.

The UM API CancelIO(Ex) is certainly documented to be able to fail and
anyway it provides no guarantee that the irp completes any faster

That can fail because of a bad handle, for example.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Yes. If the Request is always cancelable *while it is queued*, then the race condition is eliminated.

And if the Request is *never* cancelable, there’s no race condition in the first place (because you wouldn’t have marked the request as cancelable in the first place)

The problem exists when the Request is *sometimes* cancelable. In most cases, I’ve found that I can design my way out of this problem: Requests that will complete quickly? Never cancelable. Request that will complete “someday” (think a read on a serial port)? Always cancelable. Pair this up with a flag for your ISR (“there is no longer a Request in progress… do not put that data in the buffer, even if there’s one latent interrupt”) and you’re good to go.

Now, I’m not saying there aren’t cases where you’re not stuck doing it “the old fashioned way” and I’m not saying that KMDF shouldn’t make this easier. It could and it should. But, in drivers as in life, not putting yourself in the wrong situation can often make things easier.

Peter
OSR
@OSRDrivers

On Wed, Feb 18, 2015 at 8:42 AM, wrote:

> Pair this up with a flag for your ISR ("there is no longer a Request in
> progress… do not put that data in the buffer

Consider a controller that has requests in progress some of which will
complete “soon”, some “not so soon” and all of which may, at any point in
time, be dma-ing data into buffers. And consider that the protocol layer
insists that cancel must be honored in a timely fashion. There is no “flag
in your isr” that is relevant. Some blithely unaware piece of hardware is
going to be splattering memory with data asynchronous to whatever you do,
until you set out to inform it that a specific operation has to be stopped,
which is also likely an asynchronous operation. You cannot just cancel the
request in your cancel routine, or let a wdfqueue handle this for you. You
have to account for both a staged asynchronous cancel operation and a
concurrent completion operation. Adding a context to the request and doing
the accounting in that context seems to be the simplest and most correct
way to do this.

Mark Roddy

@ Mark

“Adding a context to the request and doing
the accounting in that context seems to be the simplest and most correct
way to do this.”

I’m keeping a reference to the WDFREQUEST on my device context object. There are only 2 places that can complete the request: the normal path and the EvtRequestCancel callback. When either one of these paths executes, I call WdfRequestComplete and null out the reference on the device context so that the other path won’t try to complete. This sounds like what you’re describing. Or are you saying something else?

@Peter

The computer that’s crashing is a client machine that I don’t have WDDK installed on. I’m working off the memory dump. I was able to get the wdflog from that, it’s pasted below. The transaction error is surprising. I’ll double check that I’m not trying to cancel the transaction multiple times. But what’s odd is that if I comment out the WdfRequestComplete call in the EvtRequestCancel callback, but leave the DbgPrintEx statements, the system never crashes. I never see a debug output that indicates the EvtRequestCancel callback is executed, and everything works great. But the moment I uncomment the WdfRequestComplete statement (and only that statement), the system BSODs and of course there’s nothing in the kernel debug log. It’s as if the driver knows I’m trying to debug it.

Any ideas?

16.6: kd:x86> !wdfcrashdump
Retrieving crashdump log information…
Local buffer 0x00374A98, bufferSize 4096
Driver: RIFFA
Trace searchpath is:

Trace format prefix is: %7!u!: %!FUNC! -
TMF file used for formatting log is: C:\WinDDK\7600.16385.1\tools\tracing\i386\Wdf01009.tmf
There are 54 log entries for RIFFA
— start of log —
1: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering PnP State 0x1AF599A0 from 0xFFFFFA80
2: FxPkgPnp::Dispatch - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F, IRP_MJ_PNP, !0xA0! IRP 0x801AF599
3: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering PnP State 0x1AF599A0 from 0xFFFFFA80
4: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering PnP State 0x1AF599A0 from 0xFFFFFA80
5: FxInterrupt::AssignResources - Is MSI? 1, MSI-ID 0, AffinityPolicy WdfIrqPolicyOneCloseProcessor, Priority WdfIrqPriorityUndefined, Group 0, Affinity 0xf0, Irql 0xb, Vector 0xb0
6: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power policy state 0x1AF599A0 from 0xFFFFFA80
7: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power idle state 0x1AF599A0 from 0xFFFFFA80
8: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
9: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
10: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
11: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
12: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
13: FxPkgIo::ResumeProcessingForPower - Power resume all queues of WDFDEVICE 0xE50A9FF8
14: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power idle state 0x1AF599A0 from 0xFFFFFA80
15: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power idle state 0x1AF599A0 from 0xFFFFFA80
16: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
17: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering Power State 0x1AF599A0 from 0xFFFFFA80
18: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power policy state 0x1AF599A0 from 0xFFFFFA80
19: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power policy state 0x1AF599A0 from 0xFFFFFA80
20: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power policy state 0x1AF599A0 from 0xFFFFFA80
21: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering power idle state 0x1AF599A0 from 0xFFFFFA80
22: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering PnP State 0x1AF599A0 from 0xFFFFFA80
23: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F entering PnP State 0x1AF599A0 from 0xFFFFFA80
24: FxPkgPnp::Dispatch - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F, IRP_MJ_PNP, !0xA0! IRP 0x801AF599
25: FxPkgFdo::HandleQueryPnpDeviceStateCompletion - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F returning PNP_DEVICE_STATE 0x452303264 IRP 0xFFFFFA80
26: FxPkgPnp::Dispatch - WDFDEVICE 0xE50A9FF8 !devobj 0x0000057F, IRP_MJ_PNP, !0xA0! type 0x801AF599 IRP 0x07FFFFFA
27: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
28: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
29: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
30: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
31: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
32: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
33: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
34: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
35: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
36: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
37: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
38: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
39: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
40: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
41: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
42: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
43: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
44: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
45: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
46: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
47: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
48: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
49: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
50: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
51: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DEE2D8 is already released, NTSTATUS= 57F
52: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E591F418 is already released, NTSTATUS= 57F
53: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6BF8 is already released, NTSTATUS= 57F
54: FxDmaTransactionBase::ReleaseForReuse - WDFDMATRANSACTION E4DE6418 is already released, NTSTATUS= 57F
---- end of log ----

Yeah… I’m not sure what the “Transaction is already released” messages are trying to say.

Did you enable driver verifier and WDF Verifier? Does this log have verbose logging enabled?

Peter
OSR
@OSRDrivers