DMA implementation using KMDF

I have a general question regarding DMA in a KMDF driver. I read through the documentation, the WDF book, and the relevant samples and have come to the possibly incorrect conclusion that the DMA API framework presented in WDF is meant more for I/O-driven events that use the DMA hardware as a Read/Write DMA device.

My driver uses DMA internally as a means of transferring data and is totally transparent to a user. I have no intention of exposing the DMA hardware capabilities to the user. Therefore it is not I/O driven. This is why I am not seeing the instant benefit of using the methods presented in what I was reading so far.

I can set up the DMA hardware (PLX 9056) to perform the transfers without using the DMA API, but I am not certain how I should go about obtaining a buffer that is DMA safe. So far, it seems that the buffers obtained using the DMA API are the only memory buffers that appear to be DMA safe. I may be heading down the wrong path here already.

Does anyone have any information that could put me on the right track? Am I not seeing the fact that the DMA API framework can satisfy my need?

The DMA APIs are not tied to the I/O subsystem. You can initialize a
DMA transaction in 2 ways, by calling WdfDmaTransactionInitialize or
WdfDmaTransactionInitializeUsingRequest. The difference between the two
is that the former allows you to explicitly pass a buffer to use in the
transaction while WdfDmaTransactionInitializeUsingRequest extracts the
buffer (and length) from the WDFREQUEST based on the request type.

When you ask about obtaining a DMA buffer that is safe, do you mean a
logical address or a virtual address? If you mean a logical address,
the only way to translate from a virtual address to a logical address
safely based on your hardware characteristics is to use the OS DMA APIs
(or the KMDF APIs, which use the OS APIs underneath).

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@ddc-web.com
Sent: Tuesday, July 24, 2007 3:38 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] DMA implementation using KMDF

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Doran -

Thanks for the reply. By a “DMA safe” buffer I meant a physically contiguous buffer that could be used in the case where I did not use the DMA APIs. I guess I am confused because it seems to me that I do not need the DMA APIs to actually perform the DMA. On the other hand, it seems I may need the DMA APIs to deal with not actually having physically contiguous memory.

The 9056 is a pretty “dumb” device in that once you load up a source, destination, and transfer length and write a start bit to the register, it will just go without regard to paging. Also, I believe the source address (since these are DMA reads off of my device) would have to be a physical address.

This method was implemented originally in Linux (I know it’s a dirty word on the OSR forums). In Linux, the only thing I really had to worry about was obtaining a physically contiguous memory buffer to write to, and the rest was just a few register writes to make the DMA happen. I basically obtained a chunk of non-paged memory and wrote my own basic memory manager for my application that supplied buffers for DMA transactions.

It was a very bad approach. It was very primitive, but it worked since it was more of an embedded system.

I think I just need some pointers on how to re-align my thinking since I am in the Windows world. Can these DMA APIs actually alleviate the need for obtaining contiguous memory?

You do not have to use ALL of the DMA APIs. The easiest “blessed”
method of getting a physically contiguous buffer is to use
IoGetDmaAdapter to get yourself a DMA adapter object, and just think of
that object as a “helper”. You can use its AllocateCommonBuffer and
FreeCommonBuffer methods to allocate and free physically contiguous
memory without using any of the other methods.

The other methods would help you do scatter/gather DMA and manage the
map registers, but for common buffer DMA, which sounds like what you
want, you don’t necessarily need anything other than those first two.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Using the Windows DMA abstraction is the only really proper way to do DMA in Windows.

Sure, you can get a contiguous buffer and jam the physical address into the device. And, yes, this will work on most hardware platforms. But by doing this, you’re breaking the contract that you have with the Windows HAL and Kernel that says they’ll provide you a simple, useful, set of interfaces and you agree to use them without going “behind their back” and doing things yourself.

So, sorry… I’d really recommend you embrace and use the proper Windows DMA abstractions. Not doing so is really no different than coding “IN” and “OUT” instructions in your driver instead of WRITE_PORT_UCHAR – cuz you “know” that they’ll work.

Peter
OSR

(P.S. There used to be an excellent paper on DMA in both KMDF and WDM on WHDC… I see this has been “updated” (and lessened in quality) and now only contains info on KMDF: http://www.microsoft.com/whdc/driver/wdf/dma.mspx – In any case, it’s still got a core of good stuff and goes a long way toward explaining the whole DMA abstraction thing. I recommend reading it.)

Thanks to both Tim and Peter for your responses. I have read up on your suggestions and I have some further questions. I now do feel like there should be no reason not to use the API because of what it does for you. I went back to what Doran originally said to me and I saw his response from a different angle. He said :

"The DMA APIs are not tied to the i/o subsystem. You can initialize a DMA transaction in 2 ways, by calling WdfDmaTransactionInitialize or WdfDmaTransactionInitializeUsingRequest. The difference between the two is that the former allows you to explicitly pass a buffer to use in the transaction while WdfDmaTransactionInitializeUsingRequest extracts the buffer (and length) from the WDFREQUEST based on the request type. "

It seems that WdfDmaTransactionInitializeUsingRequest is directly tied to WDFREQUEST objects, which primarily come down through the I/O subsystem. And that is fine, because most of the time DMA will be used in response to a Request. That leaves me to use WdfDmaTransactionInitialize. So far everything is clear to me. I followed the examples closely and started mirroring what they were doing in my code. My driver uses DMA to transfer data from my device internally to the driver. I set up the enabler, allocated a common buffer, created a transaction object, and came to the point where I was about to start the DMA transaction with a call to WdfDmaTransactionInitialize.

The problem I am currently trying to resolve is how to actually get the MDL pointer that WdfDmaTransactionInitialize requires from the common buffer I just allocated. At first I thought it would be some simple call, but I still can’t find anything of that sort after searching for quite a while.

I feel like I am just missing some base concept here.

I think you are confusing two concepts.
WdfDmaTransactionInitializeUsingRequest is used when you intend to use
the user’s buffer directly as your DMA buffer. It helps you lock down
the pages, create an MDL, create a scatter/gather list, etc. But if you
are using a common buffer, then you aren’t using the user’s buffer
directly. Instead, you will usually be copying data from the user
buffer into/out of the common buffer.

Both schemes are valid, but they’re handled differently. For common
buffer DMA, you don’t need as much help from KMDF, because the buffer is
known to be physically contiguous already.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

I think I am confused also. To me, there seem to be a few things missing that connect these concepts together. Doran said earlier in the thread that the DMA API is not tied to the I/O subsystem, but so far all the examples in the DDK use a Request in some way, shape or form (including the one example of WdfDmaTransactionInitialize). Your response even refers to the complete DMA transaction as being tied to a Request. But, in my case, I am using DMA internally to the driver to get data from my device. There are no requests passed down to me that initiate a DMA transfer.

That said, it also seems to me, based on your previous post, that maybe getting a common buffer is not what I want to do, because WdfDmaTransactionInitialize wants a PMDL. I have come across a function called IoAllocateMdl that I am going to attempt to call on the WDF common buffer I created to get its MDL so I can pass it along to WdfDmaTransactionInitialize. But this seems like a lot of work that is contrary to what KMDF has done for most operations you can perform at the driver level. It just feels like I am doing something manually that is probably already an automatic operation.

Which leads me to ask this question… What is the best method of using DMA in a driver whose DMA transactions never go up to the user, DMA operations that have nothing to do with user buffers? Perhaps I need to go to some older API functions, because it seems the newer KMDF implementation doesn’t really help my situation.

Thanks again Tim for your response. This is helping me understand what’s going on tremendously.

“Best” is a difficult question to answer.

Does your hardware do scatter/gather? If not, then you don’t really
need any operating system DMA services at all, other than
AllocateCommonBuffer and FreeCommonBuffer. You have a contiguous
physical buffer, you know the physical and virtual addresses and they
aren’t going to change. You have everything you need. Just go pound
the registers.

The DMA services are largely intended to help manage the relatively more
complex task of managing a device that does scatter/gather. The
additional complexity is worth it because you avoid an additional copy.
But if you are going to have an additional copy anyway, then you don’t
gain anything.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

The hardware I am using is a PLX 9056, which is essentially the same hardware used in the PLX example in the DDK (9656). It does support scatter/gather and I was hoping to use it. However, in order to use the DMA API I still need a way to obtain an MDL that is contiguous. Or do I? Does the type of MDL I feed WdfDmaTransactionInitialize matter?

Aw… PLEASE don’t say that. Please??

The whole DMA abstraction is intended for a whole lot more than facilitating scatter/gather. Consider support for systems with more than 4GB of physical memory from 32-bit busmaster devices. Drivers failing to support this case is a major cause of failures.
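
To make the 4GB case concrete, here is a minimal user-mode C sketch (not kernel code, and not how the HAL is actually implemented) of the kind of reachability check the DMA abstraction takes care of for a driver. The helper name `dma_reachable` and its parameters are hypothetical, chosen only for illustration. A buffer that fails this check for a 32-bit busmaster has to be remapped or copied through a buffer the device can reach, which is exactly the step a driver skips when it writes raw physical addresses into the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when every byte of [phys, phys + len) can be addressed by a
 * device that drives only `addr_bits` address lines (32 for a 32-bit
 * busmaster). Hypothetical helper, for illustration only. */
bool dma_reachable(uint64_t phys, uint64_t len, unsigned addr_bits)
{
    assert(addr_bits >= 1 && addr_bits <= 64);

    if (len == 0)
        return true;                      /* nothing to transfer */

    uint64_t max_addr = (addr_bits == 64) ? UINT64_MAX
                                          : ((1ull << addr_bits) - 1);
    uint64_t last = phys + (len - 1);     /* last byte the transfer touches */

    if (last < phys)
        return false;                     /* range wraps around */
    return last <= max_addr;
}
```

With this model, a 4KB buffer at physical address 0x140000000 is reachable by a 64-bit device but not by a 32-bit one, and neither is a buffer that merely straddles the 4GB line.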

The DMA model – in WDM and KMDF – is very easy to follow. The overhead is relatively low. It provides a nice, clean, programming model for a driver.

PLEASE don’t bypass the DMA abstraction. Doing so violates the contract with the HAL and the kernel. It’s no more valid to do this than to directly manipulate the PCI config space registers.

Peter
OSR

Peter, Thanks for answering. You are not the first person to state that I shouldn’t bypass the API.

However, the KMDF implementation does not seem to allow me to implement what I need to do. If you read through the thread you can see the specific points where trying to use the API as is becomes a stumbling block for me. Do you have any suggestions as to how to overcome these?

So far, Tim’s suggestion, as “wrong” as it may be seems to be the only response that has actually shown a path to what I need to do.

I think it is time that maybe I give the WDM DMA model more attention and see if it suits my needs a little better.

Are there any dangers of mixing WDM and KMDF implementation?

You can quite happily mix WDM and KMDF. Use KMDF for request handling, PnP/Power support, serialization of your code… Use WDM for the DMA model if you wish. That’s certainly not a problem.

But, don’t necessarily give up on the DMA Enabler functions in KMDF: WdfDmaTransactionInitialize doesn’t require a request (as you noted above, I think)… Then you call WdfDmaTransactionExecute, etc… No Request object required…

I realize it’s kinda hard to figure the DMA part of the equation out while you’re simultaneously trying to figure out everything else about Windows drivers. Trust me, though… if you stick with KMDF you’ll have a heck of a lot less to figure out :-)

Peter
OSR

> I think I just need some pointers on how to re-align my thinking since I am
> in the Windows world. Can these DMA APIs actually alleviate the need for
> obtaining contiguous memory?

The usual NT kernel/WDM (not KMDF) has 2 sets of DMA APIs: common-buffer and
MDL-based.

Common-buffer APIs allow you to allocate DMA-safe kernel nonpaged memory.
The memory is contiguous on both the virtual address side and the DMA side; you
get the starting DMA address for it, and the PVOID to it.

MDL-based APIs allow you to run the DMA over the user app’s buffer directly.

Some words about MDLs.

In the Windows kernel, an MDL is the buffer descriptor which describes the buffer in
terms of the underlying physical pages. Its structure is “struct _MDL”,
followed by the array of physical page numbers.

To create a MDL, one of the following (I simplify a bit) can be used:

  1. IoAllocateMdl+MmProbeAndLockPages, to create a MDL around user or kernel
    paged buffer. MmProbeAndLockPages can fail by raising an exception, so call
    it inside __try/__except. It actually locks the physical pages in memory, and
    puts their page numbers in the array at the MDL’s tail.
    MmUnlockPages+IoFreeMdl is the undo.

  2. IoAllocateMdl+MmBuildMdlForNonPagedPool, to create a MDL around the kernel
    nonpaged memory. It cannot fail, and has the simple IoFreeMdl as undo.

  3. IoAllocateMdl+IoBuildPartialMdl, to create a sub-MDL over existing parent
    MDL. NOTE: you must guarantee that you destroy the sub-MDL before the parent
    MDL can be destroyed, violation of this rule gives you PFN_LIST_CORRUPT BSOD.

How can you use a MDL?

  1. As a parent MDL in IoBuildPartialMdl
  2. Map these pages to kernel addresses available from any context (ISRs
    included) by MmGetSystemAddressForMdlSafe. This is usually used for ISR-driven
    PIO hardware.
  3. Use it for DMA.

Also note that, for DO_DIRECT_IO devices, the Io manager automatically does
IoAllocateMdl+MmProbeAndLockPages around the app’s buffer and sends this MDL to
you as Irp->MdlAddress. The same is done for METHOD_xxx_DIRECT IOCTLs.

So, Irp->MdlAddress is a MDL pre-built by Io and describing the app’s buffer it
passed to Read/WriteFile.

Now the MDL-based DMA itself. These routines translate the MDL, actually its
tail with physical page numbers, to a scatter-gather list - a list of
(StartDmaAddress, ByteLength) pairs. You can then send this list to your
hardware if you need to.
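
The translation described above can be modeled in plain user-mode C. This is only a sketch of the idea, not the kernel's implementation: `sg_element` and `build_sg_list` are hypothetical names, the 4KB page size is an assumption, and the page frame number is treated as if it were directly the DMA address, which the real routines do not promise (they may remap through map registers). It takes the page-number array from an MDL-like description plus the byte offset and length, and emits (address, length) pairs, merging pages that happen to be physically adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumed 4KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* One scatter/gather element: a start address and a byte length. */
typedef struct {
    uint64_t address;
    uint32_t length;
} sg_element;

/* Builds a scatter/gather list for a buffer described by page frame numbers.
 * pfns        - one entry per page the buffer touches
 * byte_offset - offset of the first byte within the first page
 * byte_count  - buffer length in bytes
 * Returns the number of elements written, or 0 if `out` is too small
 * (a zero-length buffer also yields 0 elements). */
size_t build_sg_list(const uint64_t *pfns, size_t page_count,
                     uint32_t byte_offset, size_t byte_count,
                     sg_element *out, size_t out_capacity)
{
    size_t used = 0;
    size_t remaining = byte_count;

    assert(byte_offset < PAGE_SIZE);

    for (size_t i = 0; i < page_count && remaining > 0; i++) {
        uint32_t offset = (i == 0) ? byte_offset : 0;
        uint32_t chunk = PAGE_SIZE - offset;
        if (chunk > remaining)
            chunk = (uint32_t)remaining;

        uint64_t addr = (pfns[i] << PAGE_SHIFT) + offset;

        if (used > 0 &&
            out[used - 1].address + out[used - 1].length == addr) {
            out[used - 1].length += chunk;   /* extends the previous run */
        } else {
            if (used == out_capacity)
                return 0;                    /* caller's list is too small */
            out[used].address = addr;
            out[used].length = chunk;
            used++;
        }
        remaining -= chunk;
    }

    assert(remaining == 0);   /* pfns must cover the whole buffer */
    return used;
}
```

In this model a physically contiguous buffer collapses to a single element, which is why a device without scatter/gather support can still be programmed from it.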

The MDL-based DMA API actually comes in 2 flavors: old and new. The old one is
AllocateAdapterChannel (allocates the necessary resources for MDL -> SGL
mapping) and MapTransfer (each successive MapTransfer call returns the next
SGL element - PhysicalAddress and length). FlushAdapterBuffers is the undo.

The new API is GetScatterGatherList and BuildScatterGatherList. They return you
the fully built SGL as a single entity in a single call.

You can use any of these APIs in your driver. Note that all of them need
IoGetDmaAdapter first.

As for Linux - I remember seeing “struct kiobuf” in some Linux kernel
sources, which was IIRC the same as MDL in Windows.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

As far as I can tell, nobody actually answered your question- my apologies if I missed a post somewhere.

You can use an MDL for a non-contiguous buffer in WdfDmaTransactionInitialize. It would be a rather worthless DDI if you could not ;->.

I can’t think, off-hand, of a reason to use WDM DMA in a KMDF driver- the KMDF model is quite complete and even adds some good value [in aligning buffers under unusual circumstances]. If you do have a case that requires this, I’d like to understand it, if I may (you can email me directly).

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@ddc-web.com
Sent: Friday, July 27, 2007 12:29 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] DMA implementation using KMDF


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Hi Eric,

I have almost the same task as you:
I have a non-scatter/gather device and I need to provide continuous DMA of audio data to driver-allocated memory and video data to user-allocated GPU memory.
I’m using the same function for both, with an additional WdfCommonBufferCreate() call for audio.

In my case the call sequence is:
WdfDmaEnablerCreate
WdfCommonBufferCreate (if necessary)
IoAllocateMdl
MmBuildMdlForNonPagedPool
WdfDmaTransactionCreate
WdfDmaTransactionInitialize
WdfDmaTransactionExecute

It works pretty well for both DMA paths and I don’t have any problems with it.

Regards,
Igor

Thank you for your answer, Maxim. I have some further questions. Since WdfDmaTransactionInitialize requires an MDL, there should be nothing wrong with calling IoAllocateMdl on a WDF common buffer object, right?

Here is the harder question. If the answer to my first question is yes, then I am thinking of doing something like this. My implementation requires a memory pool… could I create a lookaside list and specify AllocateCommonBuffer and FreeCommonBuffer as the memory allocation/free functions when initializing the list?

This should, in my theory, give me a pool of DMA’able memory.

> This should, in my theory, give me a pool of DMA’able memory.

No, write your own list allocator implementation over the common buffer; the
OS-provided one will not work.
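
A list allocator of the kind suggested here might look roughly like the following user-mode C sketch. Everything in it is hypothetical (`dma_pool` and its functions are illustrative names, not a WDK API): it carves one contiguous buffer into fixed-size blocks, threads a free list through the unused blocks, and hands back both the virtual pointer and the matching logical (device-visible) address for each block. It assumes the caller supplies the base virtual and logical addresses of an already allocated common buffer, and it does no locking, so a real driver would have to serialize access to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pool of fixed-size blocks carved out of one contiguous DMA buffer. */
typedef struct {
    uint8_t  *va_base;      /* virtual address of the common buffer    */
    uint64_t  la_base;      /* logical (device-visible) base address   */
    size_t    block_size;
    size_t    block_count;
    void     *free_head;    /* free list threaded through free blocks  */
} dma_pool;

/* The "next" link lives in the first bytes of each free block. */
static void *link_get(const void *block)
{
    void *next;
    memcpy(&next, block, sizeof next);
    return next;
}

static void link_set(void *block, void *next)
{
    memcpy(block, &next, sizeof next);
}

void dma_pool_init(dma_pool *pool, void *va, uint64_t la,
                   size_t total_bytes, size_t block_size)
{
    assert(block_size >= sizeof(void *));
    pool->va_base = (uint8_t *)va;
    pool->la_base = la;
    pool->block_size = block_size;
    pool->block_count = total_bytes / block_size;
    pool->free_head = NULL;

    /* Push blocks in reverse so the first allocation returns block 0. */
    for (size_t i = pool->block_count; i-- > 0; ) {
        void *block = pool->va_base + i * block_size;
        link_set(block, pool->free_head);
        pool->free_head = block;
    }
}

/* Returns a block and stores its logical address, or NULL when exhausted. */
void *dma_pool_alloc(dma_pool *pool, uint64_t *logical_out)
{
    void *block = pool->free_head;
    if (block == NULL)
        return NULL;
    pool->free_head = link_get(block);
    *logical_out = pool->la_base +
                   (uint64_t)((uint8_t *)block - pool->va_base);
    return block;
}

void dma_pool_free(dma_pool *pool, void *block)
{
    uint8_t *p = (uint8_t *)block;
    assert(p >= pool->va_base &&
           p < pool->va_base + pool->block_count * pool->block_size);
    assert((size_t)(p - pool->va_base) % pool->block_size == 0);
    link_set(block, pool->free_head);
    pool->free_head = block;
}
```

Because every block's logical address is just the base logical address plus the block's offset, the scheme only works when the underlying buffer is contiguous on the DMA side, which is what a common buffer provides.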


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Igor,

I did not see your post… I apologize. Our implementations are very similar. My initial code looks almost exactly like your call sequence. I am starting to feel a lot better since you have this working. Right now I am encountering an error on IoAllocateMdl for the WdfCommonBuffers I created. I am sure it is something very silly.

Bob,
Based on Igor’s call sequence (and mine), there is no need for the buffer source to be a WDFCommonBuffer? Any buffer will do as long as I feed it to IoAllocateMdl and MmBuildMdlForNonPagedPool?

A common buffer guarantees a physically contiguous buffer; memory
allocated from pool does not give you that guarantee. If your DMA
engine can handle non-contiguous physical memory, any buffer will work.

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@ddc-web.com
Sent: Monday, July 30, 2007 9:06 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] DMA implementation using KMDF
