I can’t seem to locate the original thread I started last month. So I’ll start a new one here.
I have an IOCTL based DMA driver that I wrote for interfacing with PCIe FPGAs. It’s for Win7 and built using the KMDF APIs. It works well and is reliable until I add support for IO cancellation. Once I add the WdfRequestMarkCancelableEx, WdfRequestUnmarkCancelable calls I get the BSOD .
The Minidump log when I have driver verifier turned on (as well as KMDF VerifyOn set to 1) is below.
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VERIFIER_ENABLED_VISTA_MINIDUMP
BUGCHECK_STR: 0x1E
PROCESS_NAME: svchost.exe
CURRENT_IRQL: 2
TAG_NOT_DEFINED_c000000f: FFFFF880032FFFB0
EXCEPTION_RECORD: fffff880032ff838 – (.exr 0xfffff880032ff838)
ExceptionAddress: fffff80002c7ccd0 (nt!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 1
Parameter[0]: 0000000000000000
TRAP_FRAME: fffff880032ff8e0 – (.trap 0xfffff880032ff8e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffff880032ffa78 rbx=0000000000000000 rcx=fffff98002aaafd0
rdx=fffff98002aaafd0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002c7ccd1 rsp=fffff880032ffa78 rbp=fffff880032ffb02
r8=0000000000000000 r9=fffff88000ede610 r10=7efefefefefeff40
r11=8101010101010100 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
nt!DbgBreakPoint+0x1:
fffff800`02c7ccd1 c3 ret
Resetting default scope
LAST_CONTROL_TRANSFER: from fffff80002c7c8be to fffff80002c84e90
STACK_TEXT:
fffff880032fe918 fffff800
02c7c8be : fffffa80199bc580 fffff880
00c0bf6b fffff880032ff090 fffff800
02cb0a00 : nt!KeBugCheck
fffff880032fe920 fffff800
02cb06cd : fffff80002e95398 fffff800
02dce22c fffff80002c10000 fffff880
032ff838 : nt!KiKernelCalloutExceptionHandler+0xe
fffff880032fe950 fffff800
02caf4a5 : fffff80002dd62c4 fffff880
032fe9c8 fffff880032ff838 fffff800
02c10000 : nt!RtlpExecuteHandlerForException+0xd
fffff880032fe980 fffff800
02cc0431 : fffff880032ff838 fffff880
032ff090 fffff88000000000 00000000
c0000010 : nt!RtlDispatchException+0x415
fffff880032ff060 fffff800
02c84542 : fffff880032ff838 00000000
00000000 fffff880032ff8e0 fffff980
02aaae40 : nt!KiDispatchException+0x135
fffff880032ff700 fffff800
02c82374 : 0000000000000000 fffff880
032ff960 00000000c0000010 fffff980
02aaae40 : nt!KiExceptionDispatch+0xc2
fffff880032ff8e0 fffff800
02c7ccd1 : fffff88000efe0bd fffff980
02aaafd0 fffff98002aaafd0 00000000
00000000 : nt!KiBreakpointTrap+0xf4
fffff880032ffa78 fffff880
00efe0bd : fffff98002aaafd0 fffff980
02aaafd0 0000000000000000 fffff880
00ede610 : nt!DbgBreakPoint+0x1
fffff880032ffa80 fffff880
00eff691 : 0000000000000000 fffff980
02aaae40 fffff98002aaae40 fffff980
0c722e60 : Wdf01000!FxRequest::Vf_VerifyRequestIsCancelable+0x79
fffff880032ffad0 fffff880
00e5dbcc : 0000000000000002 fffff980
0c722e60 fffff880032ffb50 fffff880
00000008 : Wdf01000!FxIoQueue::Vf_VerifyRequestCancelable+0x99
fffff880032ffb10 fffff880
00e68685 : ffff9d59b1850b10 fffff980
0c722e60 0000067ff38dd198 00000000
00000022 : Wdf01000!FxIoQueue::RequestCancelable+0x3c
fffff880032ffb80 fffff880
061a821b : fffff9800c722e60 0000067f
fd4db1b8 fffff98002b24e40 fffffa80
1bf8049c : Wdf01000!imp_WdfRequestUnmarkCancelable+0x115
fffff880032ffbe0 fffff880
061a6a22 : 0000067ff38dd198 fffffa80
1bf8049c fffff880061a9c70 fffffa80
1bf8030c : riffa!WdfRequestUnmarkCancelable+0x1b [c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfrequest.h @ 775]
fffff880032ffc10 fffff880
061a7d6c : fffffa801bf802d0 fffff800
00000000 fffff88000000000 fffffa80
1bf8030c : riffa!RiffaCompleteRequest+0xa2 [c:\winddk\7600.16385.1\src\general\riffa\driver\windows\sys\riffa.c @ 1462]
fffff880032ffc70 fffff880
061a5252 : fffffa801bf802d0 fffff880
00000000 fffff88000000001 fffffa80
00000000 : riffa!RiffaTransactionComplete+0x10c [c:\winddk\7600.16385.1\src\general\riffa\driver\windows\sys\riffa.c @ 1907]
fffff880032ffcc0 fffff880
00ed96d3 : 0000067ffd4db1b8 0000057f
e407ffd8 000000004e54241d 00000000
4559db6a : riffa!RiffaEvtInterruptDpc+0x622 [c:\winddk\7600.16385.1\src\general\riffa\driver\windows\sys\riffa.c @ 878]
fffff880032ffed0 fffff800
02c907ac : fffff880032d0180 fffff880
032d0180 fffff98002b24ed8 00000000
00000042 : Wdf01000!FxInterrupt::DpcHandler+0xc3
fffff880032fff00 fffff800
02c87925 : 0000000000000000 fffffa80
19cdb300 0000000000000000 fffff880
00ed9840 : nt!KiRetireDpcList+0x1bc
fffff880032fffb0 fffff800
02c8773c : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 : nt!KxRetireDpcList+0x5
fffff880063657e0 00000000
00000000 : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 : nt!KiDispatchInterruptContinue
STACK_COMMAND: kb
FOLLOWUP_IP:
riffa!WdfRequestUnmarkCancelable+1b [c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfrequest.h @ 775]
fffff880`061a821b 4883c428 add rsp,28h
FAULTING_SOURCE_LINE: c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfrequest.h
FAULTING_SOURCE_FILE: c:\winddk\7600.16385.1\inc\wdf\kmdf\1.9\wdfrequest.h
FAULTING_SOURCE_LINE_NUMBER: 775
FAULTING_SOURCE_CODE:
771: WDFREQUEST Request
772: )
773: {
774: return ((PFN_WDFREQUESTUNMARKCANCELABLE) WdfFunctions[WdfRequestUnmarkCancelableTableIndex])(WdfDriverGlobals, Request);
775: }
776:
777: //
778: // WDF Function: WdfRequestIsCanceled
779: //
780: typedef
SYMBOL_STACK_INDEX: c
SYMBOL_NAME: riffa!WdfRequestUnmarkCancelable+1b
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: riffa
IMAGE_NAME: riffa.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 54e4c333
FAILURE_BUCKET_ID: X64_0x1E_VRF_riffa!WdfRequestUnmarkCancelable+1b
BUCKET_ID: X64_0x1E_VRF_riffa!WdfRequestUnmarkCancelable+1b
Followup: MachineOwner
It seems to suggest that the call to WdfRequestUnmarkCancelable in my RiffaTransactionComplete function is to blame. OK. But my understanding is that I need to call WdfRequestUnmarkCancelable on every WDFREQUEST that I call WdfRequestMarkCancelable. According to the WdfRequestUnmarkCancelable MSDN page (https://msdn.microsoft.com/en-us/library/windows/hardware/ff550035(v=vs.85).aspx), this must be done everywhere except the EvtRequestCancel callback.
The general flow of things in my driver:
-> RiffaEvtIoDeviceControl()
// This is the entry point
-> RiffaIoctlSend()
// Save the WDFREQUEST in a field on the PDEVICE_EXTENSION object.
// Start a DMA transfer.
// Call WdfRequestMarkCancelableEx() with the WDFREQUEST and own EvtRequestCancel callback called RiffaEvtRequestCancel.
The driver waits for an interrupt to signal the end of the DMA transfer.
-> RiffaEvtInterruptDpc()
// Interrupt handler
-> RiffaTransactionComplete()
// Ends the DMA transfer, releases the DMA transaction object.
-> RiffaCompleteRequest()
// Gets WDFREQUEST off the PDEVICE_EXTENSION object. Checks if it’s null. If null the request was cancelled and this does nothing. If not null, sets the field on the PDEVICE_EXTENSION to null and calls WdfRequestUnmarkCancelable(), followed by WdfRequestCompleteWithInformation().
If ever invoked, the EvtRequestCancel callback, RiffaEvtRequestCancel, sets the WDFREQUEST field on the PDEVICE_EXTENSION to null and calls WdfRequestCompleteWithInformation().
It’s a pretty straightforward/simple driver modeled after the PLX9x5x driver example. I have yet been able to determine what’s causing this problem. The only thing I do know is that it only seems to happen when issuing a lot of back to back invocations. Stepping through invocations with a debugger seems to slow things down enough to prevent whatever’s going on.
I’ll post the relevant function bodies in a follow up post.
I’ll add a few extra detail and suspicions. The suspicions about possible causes stem from other somewhat related posts I’ve found. But as of yet, I’ve been unable to identify these ideas as the culprit.
-
The driver runs flawlessly without WdfRequestMarkCancelable & WdfRequestUnmarkCancelable. With low frequency and high frequency back to back invocations. But I need to be able to cancel the IO request. So I’m stuck using WdfRequestMarkCancelable & WdfRequestUnmarkCancelable.
-
Someone had a similar problem and discovered that the WDFREQUEST objects were being reused behind the scenes and that person’s driver was failing to call WdfRequestUnmarkCancelable on one of the code paths. His error was similar but was being raised when calling WdfRequestMarkCancelable on a WDFREQUEST that was presumably still marked cancelable. I don’t think this is what’s going on here, but knowing that the WDFREQUESTS are being used is useful.
-
In my initial post on this, someone vaguely suggested that the WDFREQUEST might be non-null on the PDEVICE_EXTENSION but that somehow the “underlying request” object" might have been completed. I don’t see how this could happen as both code paths that complete this WDFREQUEST mark it as null before doing so. This suggestion was a bit vague, so I don’t really know what would cause such a situation. If you do, please enlighten me.
Any ideas would be appreciated. Thanks.
Matt