Trying to fail DMA Transfer

All,

I’m still trying to work through my problem of timing out stalled DMA transfers. Now that I have the timer working, I’m having a problem when trying to fail the stalled DMA transfer. Within the timer callback, once I detect that the transfer has stalled, I run the following code:

NTSTATUS Status;
WdfDmaTransactionDmaCompletedFinal(DMATransaction, 0, &Status);
WDFREQUEST Request = WdfDmaTransactionGetRequest(DMATransaction);
size_t bytesTransferred = WdfDmaTransactionGetBytesTransferred(DMATransaction);
WdfDmaTransactionRelease(DMATransaction);
Status = STATUS_IO_TIMEOUT;
WdfRequestCompleteWithInformation(Request, Status, bytesTransferred);

DMATransaction is pulled from the device context, and I’ve verified that it matches what was passed into my ProgramWriteDma() callback. However, the call to WdfDmaTransactionDmaCompletedFinal() bugchecks. The analysis leads me to believe the issue is related to the fact that I’m calling in a DISPATCH_LEVEL context (this is the timer’s callback). However, according to MSDN this method can be called at DISPATCH_LEVEL (which one would expect since it’s pretty much always called from a DPC). Any ideas? I’ve included the analysis report below.

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: 00000000, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 00000000, address which referenced memory

Debugging Details:

READ_ADDRESS: 00000000

CURRENT_IRQL: 0

FAULTING_IP:
+1562faf00e0dfc0
00000000 ?? ???

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xD1

PROCESS_NAME: Idle

LAST_CONTROL_TRANSFER: from 80a30d41 to 80ab9138

FAILED_INSTRUCTION_ADDRESS:
+1562faf00e0dfc0
00000000 ?? ???

STACK_TEXT:
80ae5ee4 80a30d41 00000003 80ae6240 00000000 nt!RtlpBreakWithStatusInstruction
80ae5f30 80a319ac 00000003 8a53f8b0 8aa3c410 nt!KiBugCheckDebugBreak+0x19
80ae6310 80a31f18 0000000a 00000000 00000000 nt!KeBugCheck2+0x574
80ae6330 8010250a 0000000a 00000005 00000002 nt!KeBugCheck+0x14
80ae6340 a829b6d7 8aa3c490 a83029b4 80ae6374 HAL!KfLowerIrql+0x22
80ae6350 a829b7b4 8aac4738 8aa3c410 a829bc38 wdf01000!FxDmaScatterGatherTransaction::PutScatterGatherList+0x30
80ae635c a829bc38 8a53f8b0 8a9f27e0 00000000 wdf01000!FxDmaScatterGatherTransaction::TransferCompleted+0x15
80ae6374 a8299bdd 00000000 80ae63bc 00000003 wdf01000!FxDmaTransactionBase::DmaCompleted+0x147
80ae638c b95c7c0d 00000000 755c3be8 00000000 wdf01000!imp_WdfDmaTransactionDmaCompletedFinal+0xb6
80ae63a4 b95c84b4 755c3be8 00000000 80ae63bc HIBDrvr!WdfDmaTransactionDmaCompletedFinal+0x1d [c:\winddk\7600.16385.0\inc\wdf\kmdf\1.9\wdfdmatransaction.h @ 328]
80ae63d4 a82c9dc7 7560d818 a82ca2b4 80ae6420 HIBDrvr!EvtTimerTimeoutFunc+0xa4 [d:\hib_pci_driver\hibdrvr.cpp @ 1346]
80ae63f0 a82ca2cb 80ae6514 80a3de83 8a9f2848 wdf01000!FxTimer::TimerHandler+0x82
80ae63f8 80a3de83 8a9f2848 8a9f27e0 0c272f6a wdf01000!FxTimer::_FxTimerDpcThunk+0x17
80ae6514 80a3dfbe b88b23f6 00000001 80ae6524 nt!KiTimerListExpire+0x15d
80ae6534 80ad9899 80af26a0 00000000 0000b8c7 nt!KiTimerExpiration+0xce
80ae6560 80ad975f 00000000 0000000e 00000000 nt!KiRetireDpcList+0x7e
80ae6564 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x4f

STACK_COMMAND: kb

FOLLOWUP_IP:
HIBDrvr!WdfDmaTransactionDmaCompletedFinal+1d [c:\winddk\7600.16385.0\inc\wdf\kmdf\1.9\wdfdmatransaction.h @ 328]
b95c7c0d 5d pop ebp

FAULTING_SOURCE_CODE:
324: NTSTATUS* Status
325: )
326: {
327: return ((PFN_WDFDMATRANSACTIONDMACOMPLETEDFINAL) WdfFunctions[WdfDmaTransactionDmaCompletedFinalTableIndex])(WdfDriverGlobals, DmaTransaction, FinalTransferredLength, Status);
328: }
329:
330: //
331: // WDF Function: WdfDmaTransactionGetBytesTransferred
332: //
333: typedef

SYMBOL_STACK_INDEX: 9

SYMBOL_NAME: HIBDrvr!WdfDmaTransactionDmaCompletedFinal+1d

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: HIBDrvr

IMAGE_NAME: HIBDrvr.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4d5a9fdf

FAILURE_BUCKET_ID: 0xD1_CODE_AV_NULL_IP_HIBDrvr!WdfDmaTransactionDmaCompletedFinal+1d

BUCKET_ID: 0xD1_CODE_AV_NULL_IP_HIBDrvr!WdfDmaTransactionDmaCompletedFinal+1d

Followup: MachineOwner

The bug check indicates that the problem is not that you are running at DISPATCH_LEVEL, but that invalid memory is being accessed.
It is likely that the SGL is not valid. The driver may have already completed the request.

Igor Sharovar

You wrote:

I’m still trying to work my problem of timing out stalled DMA transfers. Now that I got the timer working,
I’m having a problem when trying to fail the stalled DMA transfer. Within the timer callback, once I detect
that the transfer has stalled, …

What you’re asking is simply impossible. Once you have initiated a DMA transaction, the hardware owns that transaction (and the bus!) until the transaction is complete. If it stalls, you have a hardware bug, and you cannot recover from that in software.

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

When I say “stalled” I don’t mean that the hardware has control of the bus. Typically it’ll be something like my driver requests the start of a DMA transfer, but the hardware never even requests the bus. Yes, it is a bug in my hardware, and until we get our hardware all straightened out it’s really annoying that the software locks up with a pending IRP. It seems to me that WdfDmaTransactionDmaCompletedFinal() is meant for just this type of purpose… the downside is that I can’t get it to work!

Igor, I did check into it and found that none of the three functions for completing a request are being called. So I’m having a really hard time figuring out what exactly the memory problem is.

Thanks,
Ryan

Ryan,
Tim made a very good point. The DMA transaction is not finished and you don’t know what happened on the PCI bus. Try using a PCI bus analyzer to see the PCI transactions during your driver’s DMA transfer.

Igor Sharovar

Ryan stated …
When I say “stalled” I don’t mean that the hardware has control of the bus. Typically it’ll be something like my driver requests the start of a DMA transfer, but the hardware never even requests the bus. Yes, it is a bug in my hardware, and until we get our hardware all straightened out it’s really annoying that the software locks up with a pending IRP.

… and I reply …
This is a little more serious than “really annoying”; you have a race condition here, pure and simple. Consider that your driver does the “right things” and DMA transfer starts, but the HW doesn’t respond and you [somehow] cancel the DMA transfer and *then* the HW responds, blasting bytes into who knows what page of memory. OK, so you put a timeout waiting for the HW … how long, a uSec, a ms, an hour, a long weekend? And when the timeout hits, do you then tell the HW to cancel the DMA which might right then be in progress [which is another race?] … in a word, trying to fix the hardware issue like this with a software fix is going to be much more problematic than simply canceling a DMA transfer …

Faced with your problem, I would propose a modified common buffer DMA approach to the HW folks: allocate a block of pinned, contiguous memory and give this to the HW as its “sandbox”. Set up three registers: a “start”, a “complete”, and a “direction”. When you want to DMA *to* the HW, set the “direction” high, write the data to the buffer, set the “start” high, and then poll the “start” register for it to go low. When/if the HW starts its DMA it will set the “start” register low, and when complete it will set the “complete” register high – your driver polls for the “complete” to go high after the “start” goes low. Once you get “complete” you read the buffer and rinse/repeat. For DMA *from* the HW it’s the same idea; when the direction bit goes low you don’t treat the buffer as safe and wait for the “start” bit to go high, which the HW should set when it’s starting its DMA. When/if the HW completes the DMA it will set the “complete” high and you know it’s safe to read from the buffer. You can monitor the time elapsed since “start” was set to soft-reset the HW as needed …

Yes, slow/inefficient/handmade/etc. but much, much safer than trying to make the OS play chicken with the HW …

Cheers!
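
The host-to-device half of the handshake Craig describes can be sketched as a small user-mode C model. This is only an illustration under stated assumptions: the `sandbox_regs` layout, the `write_to_device` helper, the result codes, and the two model devices are all hypothetical names invented here, and the device is simulated by a callback that runs once per poll so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block for the handshake; names and layout are
   assumptions for illustration, not a real device's register map. */
typedef struct {
    volatile uint32_t direction; /* 1 = host-to-device, 0 = device-to-host */
    volatile uint32_t start;     /* host sets to 1; device clears it when it begins */
    volatile uint32_t complete;  /* device sets to 1 when the transfer is done */
} sandbox_regs;

typedef enum {
    XFER_OK,
    XFER_TOO_LARGE,
    XFER_NEVER_STARTED,   /* "start" never went low: device ignored the request */
    XFER_NEVER_COMPLETED  /* "start" went low but "complete" never went high */
} xfer_result;

/* Stand-in for the hardware: invoked once per poll iteration. */
typedef void (*device_step_fn)(sandbox_regs *regs, void *ctx);

/* Host-to-device transfer: fill the sandbox, raise "start", then poll
   "start" low followed by "complete" high, giving up after max_polls. */
xfer_result write_to_device(sandbox_regs *regs, uint8_t *sandbox,
                            size_t sandbox_len, const uint8_t *data,
                            size_t len, unsigned max_polls,
                            device_step_fn step, void *ctx)
{
    unsigned polls = 0;

    assert(regs && sandbox && data && step);
    if (len > sandbox_len)
        return XFER_TOO_LARGE;

    regs->direction = 1;          /* host-to-device */
    memcpy(sandbox, data, len);   /* stage the data in the common buffer */
    regs->complete = 0;
    regs->start = 1;              /* ask the device to begin */

    while (regs->start != 0) {    /* wait for the device to pick it up */
        if (polls++ >= max_polls)
            return XFER_NEVER_STARTED;
        step(regs, ctx);
    }
    while (regs->complete == 0) { /* wait for the device to finish */
        if (polls++ >= max_polls)
            return XFER_NEVER_COMPLETED;
        step(regs, ctx);
    }
    return XFER_OK;
}

/* Model device that behaves: clears "start", then raises "complete". */
void responsive_device(sandbox_regs *regs, void *ctx)
{
    int *phase = ctx;
    if (*phase == 0 && regs->start) {
        regs->start = 0;
        *phase = 1;
    } else if (*phase == 1) {
        regs->complete = 1;
        *phase = 2;
    }
}

/* Model device that never responds, like the stalled hardware in this thread. */
void dead_device(sandbox_regs *regs, void *ctx)
{
    (void)regs;
    (void)ctx;
}
```

With `responsive_device` the call returns `XFER_OK`; with `dead_device` it returns `XFER_NEVER_STARTED`, which is the point at which a driver would soft-reset the hardware. In a real driver the polling would presumably be driven by a timer or DPC rather than a busy loop, and the poll budget would be expressed as elapsed time.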

Craig, good thought on going the common buffer DMA approach instead. I’ll definitely look at going that route because there really are a number of advantages from a system approach for us.

As far as your comments about the race condition, I’m actually not worried about that. My timeout is 1.5 seconds, and IF the hardware is actually going to do a DMA, it’ll always happen within a few milliseconds (and usually within microseconds) of writing the start register, so if nothing has happened by 1.5 seconds I know the hardware is never going to try to do the transfer (and at that point the driver commands a hardware reset since something obviously went haywire).

I did figure out why the code in my original post was causing a bugcheck. I was calling WdfDmaTransactionDmaCompletedFinal() from within a spinlock acquired by calling WdfInterruptAcquireLock(). Unfortunately my knowledge of driver development isn’t good enough yet to figure out why this was a problem, but apparently that’s a no-no.

Thanks for the inputs. I’m sure you’ll hear from me in the near future as I run into a brick wall trying to implement a common buffer :)

Ryan
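
On the spinlock question in the last post, one plausible explanation (not confirmed anywhere in this thread) is IRQL: WdfInterruptAcquireLock() raises the processor to the device's interrupt IRQL (DIRQL), while WdfDmaTransactionDmaCompletedFinal() is documented as callable only at IRQL <= DISPATCH_LEVEL. That would be consistent with the stack above, where the bugcheck is raised from HAL!KfLowerIrql underneath PutScatterGatherList. The user-mode C model below only illustrates the ordering: take the lock just long enough to claim the stalled transfer, release it, and complete afterwards. Every `model_*` name, the IRQL values, and the `transfer_pending` flag are hypothetical stand-ins, not WDF APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified IRQL levels; real values are architecture-specific. */
enum { MODEL_PASSIVE = 0, MODEL_DISPATCH = 2, MODEL_DIRQL = 5 };

/* A timer callback runs at DISPATCH_LEVEL, so the model starts there. */
static int model_irql = MODEL_DISPATCH;
static int model_saved_irql = MODEL_DISPATCH;

/* Stand-in for WdfInterruptAcquireLock: raises to the device IRQL. */
static void model_interrupt_acquire_lock(void)
{
    model_saved_irql = model_irql;
    model_irql = MODEL_DIRQL;
}

/* Stand-in for WdfInterruptReleaseLock: restores the previous IRQL. */
static void model_interrupt_release_lock(void)
{
    assert(model_irql == MODEL_DIRQL);
    model_irql = model_saved_irql;
}

/* Stand-in for WdfDmaTransactionDmaCompletedFinal. Returns false where
   the real call would violate its IRQL <= DISPATCH_LEVEL requirement. */
static bool model_dma_completed_final(void)
{
    return model_irql <= MODEL_DISPATCH;
}

/* Hypothetical state shared between the ISR path and the timer. */
typedef struct {
    bool transfer_pending;
} model_device_state;

/* Ordering as diagnosed in the thread: completion while the lock is held. */
bool timeout_completion_inside_lock(model_device_state *s)
{
    bool ok = true;

    model_interrupt_acquire_lock();
    if (s->transfer_pending) {
        s->transfer_pending = false;
        ok = model_dma_completed_final(); /* runs at DIRQL: not allowed */
    }
    model_interrupt_release_lock();
    return ok;
}

/* Reordered: claim the transfer under the lock, complete after release. */
bool timeout_completion_after_lock(model_device_state *s)
{
    bool stalled;

    model_interrupt_acquire_lock();
    stalled = s->transfer_pending;
    s->transfer_pending = false; /* so the interrupt path will not also complete it */
    model_interrupt_release_lock();

    if (!stalled)
        return true;             /* nothing to fail */
    return model_dma_completed_final(); /* back at DISPATCH_LEVEL */
}
```

In the model, `timeout_completion_inside_lock` reports a violation and `timeout_completion_after_lock` does not. Clearing the pending flag while the lock is held is the part that matters for the race discussed earlier: whichever path claims the transfer first is the only one that completes it.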