Accessing MDL Created By WdfRequestProbeAndLockUserBufferForRead (/Write)

ajitabhs · March 16, 2022, 12:45am

Hi Guys,
I was just curious about something. I understand that WdfRequestProbeAndLockUserBufferForRead does the following things:-

Allocate MDL for the user mode buffer
Call MmProbeAndLockPages
Call MmGetSystemAddressForMdlSafe
Create WDF Memory object to represent the buffer
Add the WDF memory object as a child of the WdfRequest
Set the WDF Memory object to point to the previously allocated MDL.

Now my question is: once I have called “WdfRequestProbeAndLockUserBufferForRead”, how do I access the MDL? I need the MDL so that I can use the DMA APIs.
Do I just use the WdfRequestRetrieveInputWdmMdl and WdfRequestRetrieveOutputWdmMdl functions?
Or can I just use WdfDmaTransactionInitializeUsingRequest (because everything is already set up in the WdfRequest itself)?

Please let me know,
Thanks
Aj -

ajitabhs · March 16, 2022, 2:20am

Just to clarify a little more,

The IOCTLS are METHOD_BUFFERED
I am using WdfRequestRetrieveInputBuffer and WdfRequestRetrieveOutputBuffer to get the pointer to input and output memory.
The input memory and output memory have embedded pointers in them. The pointer in the input buffer is the “Read pointer” and the pointer in the output buffer is the “Write pointer”.
I am using WdfRequestProbeAndLockUserBufferForRead on the “Input Pointer” and WdfRequestProbeAndLockUserBufferForWrite on the “Write Pointer”

All of this is being done in “EvtIoInCallerContext”.

Do I just use WdfRequestRetrieveInputWdmMdl and WdfRequestRetrieveOutputWdmMdl functions to get the MDL and then use DMA Apis or should I directly use WdfDmaTransactionInitializeUsingRequest ?

Thanks,
Aj-

Tim_Roberts · March 16, 2022, 3:38am

The input memory and output memory had embedded pointers in them.

I cannot emphasize strongly enough (and others will join me) what a phenomenally bad idea this is. Really. Don’t do it. Redesign things to use METHOD_XX_DIRECT ioctls instead. Are you really prepared to handle calls from 32-bit processes, which have different pointer sizes?
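To make the pointer-size concern concrete, here is a minimal, portable sketch (plain C, not driver code). The structure names and fields are hypothetical and only model how a request carrying an embedded pointer is laid out by a 32-bit caller versus how a 64-bit driver would declare it. The third layout, which uses fixed-width fields, is one common way to sidestep the mismatch; it is an assumption here, not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as a 32-bit caller would build it:
 * the embedded pointer occupies 4 bytes. */
typedef struct {
    uint32_t ReadPointer;   /* stands in for a 32-bit void* */
    uint32_t ReadLength;
} REQUEST_FROM_32BIT;

/* The same logical structure as a 64-bit driver would declare it:
 * the embedded pointer occupies 8 bytes. */
typedef struct {
    uint64_t ReadPointer;   /* stands in for a 64-bit void* */
    uint32_t ReadLength;
} REQUEST_AS_64BIT;

/* Pointer-size-independent layout (an assumption, not from the thread):
 * every field has a fixed width, so both kinds of caller fill it the same way. */
typedef struct {
    uint64_t ReadPointer;
    uint32_t ReadLength;
    uint32_t Reserved;      /* explicit padding keeps the size unambiguous */
} REQUEST_FIXED;

static_assert(sizeof(REQUEST_FIXED) == 16, "fixed layout must be 16 bytes");

/* Returns nonzero when the 32-bit and 64-bit layouts disagree on either
 * the total size or the offset of ReadLength. */
static int layouts_disagree(void)
{
    return sizeof(REQUEST_FROM_32BIT) != sizeof(REQUEST_AS_64BIT) ||
           offsetof(REQUEST_FROM_32BIT, ReadLength) !=
               offsetof(REQUEST_AS_64BIT, ReadLength);
}
```

A driver that interprets a 32-bit caller's bytes using the 64-bit declaration would read ReadLength from the wrong offset. In KMDF, WdfRequestIsFrom32BitProcess can tell the two kinds of caller apart if the layout cannot be made pointer-size independent.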

You can’t use WdfDmaTransactionInitializeUsingRequest, because neither of the buffers you want to use is in the IRP/WdfRequest.

ajitabhs · March 16, 2022, 5:33am

Mr. Tim,

I am completely aware of the concerns that you have raised, and I do not like METHOD_NEITHER myself. The problem here is that we are writing an accelerator for an open source library. The APIs exposed by that library cannot be changed. The interfaces that we expose to that library have to adhere to some limitations. For example, the library needs to pass 4 pieces of information:

An input buffer
Metadata about the input buffer
An output buffer
Metadata about the output buffer.

With “DeviceIoControl” we can only pass two pointers (an InputBuffer pointer and an OutputBuffer pointer).

Now, I have already proposed that we combine the input buffer and its metadata into a single buffer and send that across. But “I have been told” that this is a huge limitation from the perspective of the interfaces needed by the library we are accelerating. (Basically, the input buffer and its metadata have different lifespans.) A similar explanation was given to me for the output buffer. Copying is not an option.

From my experience, I could say (unfortunately) that the discussion ends right here.

Any suggestions in this matter are highly valuable and appreciated.

So my question again is: if I cannot use WdfDmaTransactionInitializeUsingRequest, will I need to use the WdfRequestRetrieveInputWdmMdl and WdfRequestRetrieveOutputWdmMdl functions to get the MDL and then use the DMA APIs?

Please let me know.
Thanks,
Aj

Peter_Viscarola_OSR · March 16, 2022, 5:33pm

/shakes head

I’ve written the driver for an accelerator that supports a standard library (OpenCL). It was a very complex project that required some really difficult architecture and implementation. I don’t know if you’re facing anything similar, but let me remind you that good architecture and design will result in easier/better/more-reliable implementation.

So, I urge you – most ardently – to be sure you’ve got an optimum architecture and a solid design… not something based on some guesses, some ideas, what some guys tell you in a forum, and what some guy you work for/with telling you some bullshit (if he’s so smart, why isn’t HE designing the driver).

In our project, we also have buffers (optionally) built in user-mode with pointers passed into the driver in InCallerContext… this was simply not practically avoidable. We did all the necessary MDL allocation and creation in WDM – obviously don’t forget to do your ProbeForRead/Write. If you later need to create a DMA transaction using this MDL, it’s simply a matter of calling WdfDmaTransactionInitialize.

But (referring back to your previous thread) you so very definitely do not want to be doing anything other than building your MDL in the context of the calling thread.

Peter

ajitabhs · March 16, 2022, 6:19pm

Hi Peter,
I agree with you here and I will not be using the DMA APIs in the context of calling thread. This is something which is in my hands. Not having embedded pointers in the input and the output buffers is something which is not, as I explained earlier.

So back to my question.

I have used WdfRequestProbeAndLockUserBufferForRead (and Write) for probing and locking the pages. How do I get the MDL for this? I know that WDF has internally created the MDL for me; I just don’t know how to get access to that MDL. From what you are saying, it seems like I will need to use the WDM way of explicitly probing and locking the pages, allocating the MDL, and then using the DMA APIs.

Please let me know.
Aj

Tim_Roberts · March 16, 2022, 7:03pm

It’s interesting. I checked the KMDF source code. The ProbeAndLock functions “associate” the MDL with the WDFMEMORY object they create, so that the WDFMEMORY refers to the buffer through that MDL, and there’s even a comment that says:

        // Some DMA drivers may just retrieve the MDL from the WDFMEMORY
        // and not even attempt to access the underlying bytes, other
        // than through hardware DMA.

but I can’t find any API anywhere that actually allows a driver to “retrieve the MDL from the WDFMEMORY”, and I don’t see anything that allows us to initialize a WDFDMATRANSACTION using a WDFMEMORY. Maybe I missed it, but it seems like a gap.

ajitabhs · March 16, 2022, 7:52pm

Thanks Tim. I could not find anything either.
Seems like I am the first one trying to DMA into user mode buffers using KMDF.
Will keep looking, or else will move to WDM to get to the MDL, which I hate to do but looks like the last resort.
Really appreciate the help!!
Aj

Peter_Viscarola_OSR · March 16, 2022, 8:41pm

WdfRequestProbeAndLockUserBufferForRead

I am so lost in this thread. Aren’t you dealing with a user data buffer, the pointer to and length for which are passed-in as an argument to your driver? If that’s the case, then you most certainly wouldn’t be calling WdfRequestProbeAndLockUserBufferForRead, which assumes a WDFREQUEST.

??

Peter

ajitabhs · March 16, 2022, 9:03pm

@Peter,
Let me clarify a bit. I have an IOCTL which is METHOD_BUFFERED. Using DeviceIoControl we pass two buffers. One is the “input buffer” and the other is the “output buffer”. The “input buffer” has an embedded “Read Pointer” and “Read Length”. Similarly, the “output buffer” has an embedded “Write Pointer” and “Write Length” in it.

When the call comes to the EvtIoInCallerContext callback, I do the following:-

WdfRequestRetrieveInputBuffer: to get access to the [DeviceIoControl] “Input Buffer”.
WdfRequestRetrieveOutputBuffer: to get access to the [DeviceIoControl] “Output Buffer”.
Once I have access to these buffers, I need to probe and lock the embedded “Read Pointer” in the “Input Buffer” and the “Write Pointer” in the “Output Buffer”.

I use WdfRequestProbeAndLockUserBufferForRead to Probe and Lock the embedded “Read Pointer” in the “Input Buffer” and WdfRequestProbeAndLockUserBufferForWrite to Probe and Lock the embedded “Write Pointer” in the “Output Buffer”.

These Probe and Lock APIs are documented as usable for probing and locking embedded pointers.

Here:-

“For example, an I/O control code that uses the buffered access method might pass a structure that contains an embedded pointer to a user-mode buffer. In such a case, the driver can use WdfRequestProbeAndLockUserBufferForRead to obtain a memory object for the buffer.”

But it does not say how to get access to the MDL for DMA once I have the memory object.

Hope that makes it a little clearer.
Aj

Peter_Viscarola_OSR · March 16, 2022, 9:12pm

OK… Now I understand.

First:

Similarly the “output buffer” has the embedded “Write Pointer” and “Write Length” in it.

That’s just wrong. You want to pass BOTH of these pointers in via the InBuffer.
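As a rough illustration of carrying both descriptors in the input buffer, here is a minimal, portable sketch in plain C. The ACCEL_REQUEST structure and the accel_request_is_sane helper are hypothetical names invented for this example. The checks only cover the shape of the request (size, null pointers, zero lengths, address wrap-around); they do not replace probing and locking the user buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input-buffer layout: both user-buffer descriptors travel
 * in the input buffer, using fixed-width fields. */
typedef struct {
    uint64_t ReadPointer;    /* user buffer the device reads from */
    uint64_t WritePointer;   /* user buffer the device writes to  */
    uint32_t ReadLength;     /* bytes at ReadPointer              */
    uint32_t WriteLength;    /* bytes at WritePointer             */
} ACCEL_REQUEST;

static_assert(sizeof(ACCEL_REQUEST) == 24, "unexpected ACCEL_REQUEST size");

/* Returns 1 if the input buffer is large enough to hold an ACCEL_REQUEST
 * and its fields pass basic sanity checks, 0 otherwise. */
static int accel_request_is_sane(const void *in_buf, size_t in_len)
{
    const ACCEL_REQUEST *req = (const ACCEL_REQUEST *)in_buf;

    if (in_buf == NULL || in_len < sizeof(*req))
        return 0;                               /* truncated request */
    if (req->ReadPointer == 0 || req->WritePointer == 0)
        return 0;                               /* null user pointer */
    if (req->ReadLength == 0 || req->WriteLength == 0)
        return 0;                               /* nothing to transfer */
    if (req->ReadPointer > UINT64_MAX - req->ReadLength)
        return 0;                               /* range would wrap */
    if (req->WritePointer > UINT64_MAX - req->WriteLength)
        return 0;                               /* range would wrap */
    return 1;
}
```

With METHOD_BUFFERED the structure the driver inspects is already a kernel-side copy, so the caller cannot alter the fields after these checks run. The user buffers the fields point to still need the usual probe-and-lock treatment.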

There’s extra meta-data that’s passed as well, right? In addition to just pointer and length??? How much meta-data??? Isn’t there a better way to be doing this? Like, passing the read pointer/length as the OutBuffer and the write pointer/length as the InBuffer… and passing that meta-data in some other way? I dunno, maybe embed it in the InBuffer? Send it as a separate IOCTL? Some other way??

I’m not entirely sure you’ve done the best design, as I perhaps alluded to earlier. And I think you’re making things too hard.

If you really, really, really need to do it the way you’ve described… Why not just get the buffer pointers and lengths (as you’re doing now), and do what I do: Probe and lock the buffer and build the MDL using WDM? Isn’t that, ah, MUCH easier??

Do you WANT the lifetime of this buffer to be the same as the REQUEST that it came with?? If so, store the PMDL in a Context associated with the Request and tear them down in the Request Destroy callback.

Peter

(Edited to add a bunch of stuff)

ajitabhs · March 16, 2022, 9:36pm

Yes, I think that going the WDM route is the way to go for this and yes, I will register the Destroy callback for this. This is the route that I am going to take. But I think it would have been a lot easier if I could access the WDFMEMORY MDL somehow. Just one API and I do not have to mix WDM with WDF.

I do believe that the documentation is a little misleading about using “WdfRequestProbeAndLockUserBufferForRead”

The reason why I have two structures “Input Buffer” and “Output Buffer” , with two embedded pointers is because I have one fire thread and one complete thread. The fire thread (and the input buffer) unblocks as soon as the DeviceIoControl returns. The “Output buffer” unblocks when the overlapped result says that the operation is completed, reducing the (life time) coupling between the input and the output buffers. But I do not see why we cannot achieve the same effect with passing the pointers in the Input buffer itself. Will make this modification right away.

MBond2 · March 17, 2022, 1:20am

You are certainly making a mistake. The idea of a ‘fire thread’ and a ‘complete thread’ demonstrates that most conclusively. Then there is the very strange overlapped result logic.

ajitabhs · March 17, 2022, 3:49am

After thinking a little bit I realized the mistake. Now making amendments to the design. Basically, lpInBuffer has to have all the parameters that are needed for the command to complete, and lpOutBuffer needs to have everything that I need to return from that command. I forgot this: “The I/O manager allocates a system buffer which is max(inbuffersize, outbuffersize) and then uses the same buffer for sending and returning the parameters to/from the driver”.
Don’t know how I forgot such a small thing. That’s a side effect of being away from writing drivers for some time.
It has nothing to do with the life cycles of the buffers. The buffers in which we are doing the DMA need to be alive for the entire lifecycle of the WdfRequest anyway.
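The shared system buffer behaviour can be modelled in user-mode C. The sketch below is a simplified simulation, not the I/O manager's actual code: buffered_ioctl_model, HANDLER and double_it are hypothetical names, and zero-filling the buffer with calloc is a choice made for the model only. What it shows is that the caller's input is copied in, the handler reads and writes the same buffer, and only what the handler wrote is copied back out. Whatever the caller had placed in its output buffer beforehand never reaches the handler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: reads its input from, and writes its output
 * to, the same system buffer. Returns the number of bytes written. */
typedef size_t (*HANDLER)(void *system_buffer, size_t in_len, size_t out_len);

/* Simplified model of a METHOD_BUFFERED round trip. Returns 0 on success,
 * -1 if the system buffer could not be allocated. */
static int buffered_ioctl_model(const void *in, size_t in_len,
                                void *out, size_t out_len, HANDLER handler)
{
    size_t sys_len = in_len > out_len ? in_len : out_len;  /* max(in, out) */
    unsigned char *sys;
    size_t written;

    assert(handler != NULL);
    sys = calloc(1, sys_len ? sys_len : 1);
    if (sys == NULL)
        return -1;

    if (in_len != 0)
        memcpy(sys, in, in_len);         /* only the INPUT is copied in */

    written = handler(sys, in_len, out_len);
    if (written > out_len)
        written = out_len;               /* never overrun the caller's buffer */
    if (written != 0)
        memcpy(out, sys, written);       /* result is copied back out */

    free(sys);
    return 0;
}

/* Example handler: reads one unsigned int and overwrites it with its double. */
static size_t double_it(void *system_buffer, size_t in_len, size_t out_len)
{
    unsigned int v;

    if (in_len < sizeof v || out_len < sizeof v)
        return 0;
    memcpy(&v, system_buffer, sizeof v); /* consume the input first */
    v *= 2;
    memcpy(system_buffer, &v, sizeof v); /* output overwrites the input */
    return sizeof v;
}
```

Because the output overwrites the input in place, a handler has to capture everything it needs from the input before it writes any result.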

Thanks everyone for all the help and a great discussion.

Aj

Peter_Viscarola_OSR · March 17, 2022, 12:51pm

I’m going to advise you, again, to reconsider your design. If, indeed, you have separate “fire” and “complete” lifetimes… with the ability to have separate notifications, you really want two separate IOCTLs, one for “fire” and another for “complete”. That’d solve this all very nicely. You could even tie them together, right? Requiring a “complete” before a paired “fire” or something…. Remember: This isn’t Linux. In Windows, IOCTLs can be asynchronous. This is the way Windows drivers are intended to work.

Peter