Problem with KMDF occasionally not copying map registers to output buffer after DMA read.

I’m working on a 64bit KMDF driver for a card which is restricted to 32bit DMA transfers, so I’ve configured a DMA enabler using WdfDmaProfilePacket in order that KMDF will allocate suitable map registers for me. I’m using automatic synchronisation to reduce the need for locking throughout the driver.

When I send a read request to my device from usermode, in my EvtProgramReadDma function I note that the WDF read request’s output buffer has an address in 64bit space as expected, and has been initialised such that every byte is a 0xE0 filler byte. I then program the PLX9x5x chip on the card to perform the DMA card-to-RAM read operation, noting that the read buffer address provided to me by KMDF in the single-element SG list is a 32bit address, also as expected, and this buffer initially has seemingly arbitrary bytes in it.


EvtProgramReadDma successfully tells the PLX to start the transfer, and I then receive an interrupt from the chip’s DMA controller via my EvtInterruptIsr function confirming that the transfer is complete; I queue my EvtInterruptDpc function, which then executes.


If I set a WinDbg breakpoint at this point in execution, I can inspect the two buffers and see that the DMA-transferred data has arrived in the 32bit read buffer (map registers) as expected, and the final 64bit output buffer is still filled with the 0xE0 filler byte, also as expected. (For clarity, the read data in question will never be a series of 0xE0 bytes.)


I then call WdfDmaTransactionDmaCompletedFinal because, while I’m requesting theoretically large reads (of one 4K page), the device will only transfer the data it has already available before interrupting. The usermode application deals with inspecting the returned data (which sometimes might only amount to a few bytes) and with determining whether to issue a further read request - so I want my driver to indicate to the usermode application that no more bytes are expected to be transferred for this request. Thus KMDF will not try to start any further DMA reads for the request.


I will _almost_ always find after calling WdfDmaTransactionDmaCompletedFinal that the contents of the 32bit buffer have been transferred by KMDF to the final request output buffer (in 64bit address space).


The problem I’m having is that very occasionally, this doesn’t appear to happen. In this scenario, I log the call to WdfDmaTransactionCompletedDmaFinal with the correct non-zero byte count provided, it returns TRUE as expected, but the 64bit output buffer remains apparently unaltered, when usually it would have had data copied into it by KMDF by this point. When I continue to complete the IO request with WdfRequestSetInformation (providing the same non-zero byte count) and WdfRequestCompleteWithPriorityBoost (with STATUS_SUCCESS since I’m indicating a successful transfer, albeit of fewer bytes than requested), the usermode application finds that while the read request appears to have completed successfully with the noted byte count, the bytes in the 64bit buffer are still all the 0xE0 filler byte.


The particularly puzzling aspect is that in this scenario KMDF appears not to be copying _anything_ from the 32bit read buffer to the final 64bit output buffer – I’ve verified from my trace output that the length provided to WdfDmaTransactionDmaCompletedFinal is non-zero and the length provided to WdfRequestSetInformation is the same non-zero value. I’d expect still to get the specified number of bytes written to the 64bit buffer even if they were the wrong bytes (say if, owing to some bug in my code, the copy had somehow taken place just before the DMA controller placed the read data in the 32bit buffer etc) – but it appears that on occasion nothing at all is copied.


I did also try the slightly more convoluted approach of using WdfDmaTransactionDmaCompletedWithLength to indicate the success of the first transfer, resulting in the framework calling my EvtProgramReadDma function to program a further transfer for the remaining bytes, and in this case immediately calling WdfDmaTransactionCompletedDmaFinal in EvtProgramReadDma to indicate no more bytes will be transferred - but the same scenario would still arise on occasion.


Can anyone suggest why this might be happening?


Processor: x64 (Intel Core i5-4590)

Driver test PC: Dell Optiplex 3020

OS: Windows 10 20H2

KMDF: 1.9 (I still want to maintain Windows 7 compatibility)

SDK/WDK: 22000

First, let me thank you for what is one of the most complete problem descriptions that I’ve read in this forum in ages.

It sounds like you’re doing all the right steps. Have you run your driver under WDF Verifier with (everything but low resource simulation and with) DMA Verification enabled?

A gratuitous comment: For the amount of data you talk about transferring, you’d be much better off just using programmed I/O and forgetting about using DMA. But, there may be more going on in your driver than you describe here.

For fu, try building your driver with the latest version of the Framework. There were tons of DMA-related changes in KMDF 1.11 …. See if using the new version (and running on Win 20H2 as you are, or even newer) makes any difference.

Peter

Many thanks for your reply. Yes, I’ve got WDF Verifier verifying my driver with all its options turned on (save for handle tracking where I’ve only enabled tracking of DMA and IO-related objects), and I’ve got Driver Verifier verifying my driver (plus Wdf01000.sys) with all its checks turned on except (systematic|randomised) low resource simulation as you suggest.

I did try to use the !dma extension to WinDbg, but it always produces the message “Error retrieving adapter list offset.” which appears to be a standard problem with this extension that I’ve not yet found the relevant structure definitions to be able to work round.

I’ve just tried targeting the driver at KMDF 1.11, and upgrading the driver testing machine to Windows 10 21H2, but the problem continues to occur. I’d initially hoped that VerifyDownlevel being on might have turned something up following the Windows version upgrade, but I realise of course that both versions 20H2 and 21H2 include the same KMDF version and hence inherently have (I think) the same version of WDF Verifier.

One interesting observation is that the busier the machine is, the less likely the problem appears to be to occur, so it does seem as though there’s a race condition somewhere. Ideally I’d find some way to log every write to the output buffer of my read requests (since I think halting the machine and changing breakpoints etc will impact the timing of the operations), to at least determine whether the data is written to the output buffer and then erroneously cleared somehow, or whether the buffer remains entirely unaltered throughout the affected read request.

The transfers performed are a variety of sizes, some of which benefit greatly from DMA, but yes, at present the driver uses DMA for all reads, even in sub-10-byte cases such as my test case when it’d be far more efficient to use programmed I/O; this is something which I’m intending to improve once I’ve got the DMA operation operating reliably.

I’d be very grateful for anything further which might occur to you!

Pretty sure HalpDmaSyncMapBuffers is the function that does the copy back and forth, though I don’t have an easy way to try it at the moment. If you set a breakpoint there do you see it called in the working and broken cases?

I’ve tried setting a breakpoint in nt!HalpDmaSyncMapBuffers, but it doesn’t get hit at all, which is puzzling. Presumably though once that’s sorted my best bet would be to turn on KdPrint-style debug logging of my driver to WinDbg and use something like bm nt!HalpDmaSyncMapBuffers “ds mymodule!AddrOfSomeStringConstant; g” so that I essentially get some output interspersed into my existing debug output showing where the sync function was called - and hence in particular I could then see if it wasn’t called? (In this case I’d obviously rather use a WinDbg command that just displays an “immediate” string, but I’m not aware of a command that does that…)

I’ve also now discovered that if I use a scatter-gather DMA profile (changing WdfDmaProfilePacket to WdfDmaProfileScatterGatherDuplex), the problem never appears to occur. Specifically, just changing the profile parameter of WDF_DMA_ENABLER_CONFIG_INIT like this removes the problem - although making that single change in isolation of course means that my existing EVT_WDF_PROGRAM_DMA code will then only ever handle the first element from the scatter-gather list, hence introducing the potential for shortened reads.

That said, the application design happens to mitigate this slightly in two ways; the buffers used for these reads at present happen to be page-aligned 4K buffers so only one SG list element should be required, and if a read returns fewer bytes than necessary the application simply issues a further read.