Walking MDL To Create Hardware Format SGL

Hi All,
I posted part of this question earlier. For context, I will copy some of the information from that post.

Background

I am writing a very simple driver where the application will send me a read and/or write buffer (DMA from and to the device). The device needs a circular buffer for issuing requests and completing requests (NVMe-style queues). There can be more than one request issued to the device at a time (up to 4K requests). These requests transfer data to / from the user mode application buffers.
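
To make the circular submission queue concrete, here is a minimal sketch in plain C. Everything in it is an assumption for illustration: the `Command` layout, the `QUEUE_DEPTH` value, and the function names are hypothetical, and it uses free-running head/tail counters rather than any particular device's convention. In the real driver the slots would live in a common buffer and the device, not `sq_consume`, would advance the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u /* hypothetical depth; must be a power of two */

/* Hypothetical command layout; a real device defines its own. */
typedef struct {
    uint16_t command_id;
    uint64_t sgl_address; /* device-visible address of the formatted SGL */
} Command;

typedef struct {
    Command  slots[QUEUE_DEPTH];
    uint32_t tail; /* free-running count of commands written by the driver */
    uint32_t head; /* free-running count of commands consumed by the device */
} SubmissionQueue;

/* Number of commands currently queued (unsigned wrap-around is intended). */
static uint32_t sq_count(const SubmissionQueue *q)
{
    return q->tail - q->head;
}

static bool sq_full(const SubmissionQueue *q)
{
    return sq_count(q) == QUEUE_DEPTH;
}

/* Driver side: place a command in the next free slot. */
static bool sq_submit(SubmissionQueue *q, Command cmd)
{
    if (sq_full(q)) {
        return false; /* caller must hold the request until a slot frees up */
    }
    q->slots[q->tail & (QUEUE_DEPTH - 1u)] = cmd;
    q->tail++; /* a real driver would now ring the device's doorbell */
    return true;
}

/* Models the device taking the oldest command off the queue. */
static bool sq_consume(SubmissionQueue *q, Command *out)
{
    if (sq_count(q) == 0u) {
        return false;
    }
    *out = q->slots[q->head & (QUEUE_DEPTH - 1u)];
    q->head++;
    return true;
}
```

The point of the sketch is only that several commands can sit in the ring at once; a full ring is the one case where a new request has to wait.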

The DMA profile that I am going to use is
WdfDmaProfileScatterGather64Duplex : The device supports packet-based, scatter/gather DMA operations, using 64-bit addressing. The device also supports duplex operation.

From what I have read so far, this is the flow when using the Windows (KMDF) DMA abstraction:

Add Device : Create the queues using the common buffer DMA APIs. To use Windows DMA abstraction, create a single DMA enabler object.
EvtIoxxx : Create and Initialize the DMA transaction object.
EvtProgramDma : You will get your SGL, format it to the h/w specific format and then program the hardware for the DMA transfer.
Interrupt calls the DPC
DPC completes the request [one transaction is one transfer]
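
The "format it to the h/w specific format" part of the EvtProgramDma step could look roughly like the sketch below. It is plain C so it compiles outside a driver: `SgElement` and `SgList` are stand-ins that mirror the `Address`/`Length` and `NumberOfElements`/`Elements` fields of the WDM `SCATTER_GATHER_LIST` the framework hands to the callback, and `HwDescriptor` with its `DESC_FLAG_LAST` bit is a purely hypothetical device format. In a real callback the address would come from `Elements[i].Address.QuadPart` and the table would sit in a common buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for SCATTER_GATHER_ELEMENT / SCATTER_GATHER_LIST (assumption:
   only the fields needed here are modeled). */
typedef struct {
    uint64_t address; /* logical (device-visible) address of the fragment */
    uint32_t length;  /* fragment length in bytes */
} SgElement;

typedef struct {
    uint32_t         number_of_elements;
    const SgElement *elements;
} SgList;

/* Hypothetical hardware descriptor; a real device defines its own layout. */
#define DESC_FLAG_LAST 0x1u

typedef struct {
    uint64_t address;
    uint32_t length;
    uint32_t flags;
} HwDescriptor;

/* Copy each fragment into the device's descriptor table and mark the final
   entry. Returns the number of descriptors written, or 0 if the list is
   empty or does not fit in the table. */
static size_t format_hw_sgl(const SgList *sgl, HwDescriptor *table, size_t capacity)
{
    if (sgl->number_of_elements == 0u || sgl->number_of_elements > capacity) {
        return 0;
    }
    for (uint32_t i = 0; i < sgl->number_of_elements; i++) {
        table[i].address = sgl->elements[i].address;
        table[i].length  = sgl->elements[i].length;
        table[i].flags   = (i + 1u == sgl->number_of_elements) ? DESC_FLAG_LAST : 0u;
    }
    return sgl->number_of_elements;
}
```

Note that nothing here walks an MDL: the addresses come from the list the DMA abstraction already built.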

Problem

**My problem is that the documentation states: “If the driver has specified a packet based DMA profile, it must serialize all of the DMA transactions because the framework allows only one packet based DMA transaction to execute at any given time”.**

I do not understand this statement completely. Does it mean that only a single DMA transfer can be in progress at any given time? That would be a severe limitation of the DMA abstraction.

My device will have to have multiple IOs pending at any given time (up to 4K IOs).
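
To illustrate what "multiple IOs pending" means on the driver side, here is a small plain C sketch of a table that maps a command ID to whatever context the completion path needs. All names are hypothetical, `MAX_IN_FLIGHT` is taken from the 4K figure above, and a real driver would guard the table with a lock because the submit path and the DPC both touch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN_FLIGHT 4096u /* assumption: one slot per possible pending I/O */

typedef struct {
    bool  busy;
    void *request_context; /* in a driver: the request or DMA transaction handle */
} InFlightSlot;

typedef struct {
    InFlightSlot slots[MAX_IN_FLIGHT];
    uint32_t     next_hint; /* where the next search for a free slot starts */
} InFlightTable;

/* Submit path: claim a free slot and return its index as the command ID,
   or -1 if every slot is in use. Not thread-safe as written. */
static int32_t inflight_acquire(InFlightTable *t, void *ctx)
{
    for (uint32_t n = 0; n < MAX_IN_FLIGHT; n++) {
        uint32_t id = (t->next_hint + n) % MAX_IN_FLIGHT;
        if (!t->slots[id].busy) {
            t->slots[id].busy = true;
            t->slots[id].request_context = ctx;
            t->next_hint = (id + 1u) % MAX_IN_FLIGHT;
            return (int32_t)id;
        }
    }
    return -1;
}

/* Completion path: look up the context for a completed command ID and free
   the slot. Returns NULL for an out-of-range or idle ID. */
static void *inflight_release(InFlightTable *t, uint32_t id)
{
    if (id >= MAX_IN_FLIGHT || !t->slots[id].busy) {
        return NULL;
    }
    void *ctx = t->slots[id].request_context;
    t->slots[id].busy = false;
    t->slots[id].request_context = NULL;
    return ctx;
}
```

Completions can then arrive in any order, since each one carries the ID of the slot it belongs to.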

My question

If I cannot use the DMA abstraction APIs, how will I create the SGL? The only option that remains is to walk the MDL myself and then create the hardware-specific SGL in a common buffer. [The book “Developing Drivers With WDF” says that it is not a good idea to walk the MDL yourself and build the SGL.]

Any help is really appreciated.

  • Aj

You have no choice but to use the Windows (or WDF) DMA abstractions. It’s not optional, if you want your driver to work. So, no MDL walking or any other such bullshit is allowed. Just forget about it and get back to solving your problem.

Let’s take a step back for a minute. You have a driver that implements the WDF abstraction, and uses a scatter/gather 64-bit profile. Now… Do you have a functional problem, or are you concerned about a potential problem based on the documentation?

The issue the docs are talking about is map register allocation. Windows will allocate the map registers you need, and will block your calls to start new DMA operations until sufficient map registers are available. To avoid one driver allocating all the map registers and “freezing out” everyone else, there’s a max allocation algorithm. BUT… In your case (64-bit Scatter/Gather profile) you’re not using any map registers (putting aside the whole DMAR/IOMMU thing for a moment). So, there should be no limit on the number of Transactions you can simultaneously start.

So…… back to the question: What problem are you trying to solve? Just keep creating, initializing, executing and completing, and deleting DMA Transactions. As long as your “execute” callback gets called, you’re good to go. Isn’t this the way that horrible 9x5x sample works (that’s a real question, I honestly don’t remember)?

Peter

The PLX 9x5x sample uses a sequential queue, so that is never going to be an issue with it. But you did answer my question, Peter. I was just worried that once there is a DMA transaction in progress, I would not be able to post another transaction. The books/docs seem to imply that, and the sample was also doing this.

Thanks a lot.
Appreciate the help.

  • Aj

Yeah, there are so very many reasons to dislike that sample…. But, a good sample that does real DMA on real hardware is exceptionally tough to put together. The vast differences in how DMA is used, coupled with the complexities of programming a real DMA engine (which can drown an otherwise “simple” example in implementation-specific details), make creating a DMA example that normal people can follow and learn from hard.

Microsoft should hire us to write a nice set of DMA samples, don’t you think? :wink:

Anyhow, code it up (I get the impression you haven’t done so yet). You’ll discover very quickly if there’s a problem, right?

So….

@“Peter_Viscarola_(OSR)” said:
“Microsoft should hire us to write a nice set of DMA samples, don’t you think? :wink:”

That would be great, because then we would have some really good samples in place that reflect how things really work.

You are right, I have not coded it up at all; I am just making design choices for now. That is why I am asking such questions.
I will give this a try and then see how things go.

Thanks

  • Aj