Benchmarking DMA performance through DeviceIoControl

Hi all, I am currently converting our WDM driver for a PCIe device to KMDF.
In the WDM driver we used ReadFile/WriteFile for DMA transfers, but we are now converting this to a custom IOCTL call in the hope of passing more information in and out of the driver for benchmarking purposes. This is a feature we want to use to check performance in general, but I would already like to use it to debug why the KMDF version is about 10% slower.

I finally figured out that for the IOCTL calls I have to use METHOD_OUT_DIRECT for DMA reads and METHOD_IN_DIRECT for DMA writes so that the userland buffers get mapped correctly.
For these IOCTLs the “input” buffer is copied by KMDF from userland to the driver and I can’t pass back any information through that buffer. The output buffer is automagically mapped/locked via KMDF so that I can easily get the SG lists, but that only maps the actual buffer for DMA transfers where I can’t add additional information.
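
For reference, the transfer method is encoded in the low two bits of the control code itself. Below is a minimal sketch of how the two direct IOCTLs might be declared. The names IOCTL_MYDEV_DMA_READ/IOCTL_MYDEV_DMA_WRITE, the function numbers 0x800/0x801, the device type, and the access bits are all assumptions for illustration. The macro and constants are re-declared with their Windows header values only so the snippet compiles outside the WDK.

```c
#include <stdint.h>

/* Values mirror the Windows headers; the guard lets the real headers win. */
#ifndef CTL_CODE
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((uint32_t)(DeviceType) << 16) | ((uint32_t)(Access) << 14) | \
     ((uint32_t)(Function) << 2) | (uint32_t)(Method))
#define METHOD_BUFFERED     0
#define METHOD_IN_DIRECT    1
#define METHOD_OUT_DIRECT   2
#define METHOD_NEITHER      3
#define FILE_ANY_ACCESS     0
#define FILE_READ_ACCESS    1
#define FILE_WRITE_ACCESS   2
#define FILE_DEVICE_UNKNOWN 0x00000022
#endif

/* Hypothetical codes: device-to-memory DMA fills the caller's buffer
 * (OUT_DIRECT), memory-to-device DMA only reads from it (IN_DIRECT). */
#define IOCTL_MYDEV_DMA_READ \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_READ_ACCESS)
#define IOCTL_MYDEV_DMA_WRITE \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_IN_DIRECT, FILE_WRITE_ACCESS)

/* Extract the transfer method from a control code (low two bits). */
static inline uint32_t ioctl_method(uint32_t code)
{
    return code & 3u;
}
```

With either direct method the first (input) buffer is still handled as a buffered copy; only the second buffer is locked and described by an MDL.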

Is there any way to pass back additional info, or is the only option to switch to METHOD_NEITHER for the IOCTL calls and do all the mapping/locking of the buffers myself?

Also, what KMDF/WDM functions should I use to benchmark all the various steps for the DMA? Is there something reliable like QueryPerformanceCounter in KMDF?

Thanks.

If you call WdfRequestRetrieveOutputBuffer you can return whatever you want in the output buffer…

Peter
OSR
@OSRDrivers

xxxxx@osr.com wrote:

> If you call WdfRequestRetrieveOutputBuffer you can return whatever you want in the output buffer…

Yes, but he’s using his output buffer as the DMA source/sink. He needs
an ioctl with three buffers: one buffered in, one buffered out, and one
direct in/out.

In the worst case, I suppose he could pass a user-mode pointer in the
buffered input buffer. That’s no worse than thinking about METHOD_NEITHER.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
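
The layout described above could look something like the following sketch: a small header travels in the buffered input buffer and carries the user-mode address of an optional statistics block, while the direct output buffer stays reserved for the DMA data. All names here (DMA_REQUEST_HEADER, DMA_STATS, dma_header_check) are hypothetical, and the kernel-side probe-and-lock step is only described in a comment, not implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics block the driver would fill in on request. */
typedef struct DMA_STATS {
    uint64_t MapTicks;        /* ticks spent building the SG list */
    uint64_t TransferTicks;   /* ticks from DMA start to completion */
    uint64_t TicksPerSecond;  /* counter frequency for conversion */
} DMA_STATS;

/* Hypothetical header sent in the buffered input buffer. The pointer is
 * stored as a 64-bit integer so 32-bit and 64-bit callers share one layout. */
typedef struct DMA_REQUEST_HEADER {
    uint64_t StatsUserPtr;    /* user-mode address of DMA_STATS, 0 = none */
    uint32_t StatsLength;     /* size of the buffer at StatsUserPtr */
    uint32_t Reserved;
} DMA_REQUEST_HEADER;

/* Validate the header: returns -1 if malformed, 0 if no statistics were
 * requested, 1 if a statistics buffer of sufficient size was supplied. */
static int dma_header_check(const void *input, size_t inputLength)
{
    const DMA_REQUEST_HEADER *hdr = (const DMA_REQUEST_HEADER *)input;

    if (input == NULL || inputLength < sizeof(*hdr))
        return -1;
    if (hdr->StatsUserPtr == 0)
        return 0;   /* normal path: nothing extra to lock */
    if (hdr->StatsLength < sizeof(DMA_STATS))
        return -1;
    /* In a KMDF driver the pointer must still be probed and locked in the
     * caller's process context before it is touched, for example with
     * WdfRequestProbeAndLockUserBufferForWrite from EvtIoInCallerContext. */
    return 1;
}
```

Leaving StatsUserPtr at zero in production builds means the second buffer is never locked, so the extra cost is paid only when the statistics are actually wanted.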

Thanks, guys.
Tim is right, I probably have to use METHOD_NEITHER and map both the DMA
buffer and the other buffer for passing back information myself.
Do you think this adds a lot of overhead to the actual DMA?
I mean, the DMA buffer has to be mapped either way; with METHOD_NEITHER
I’ll just have to do it myself rather than relying on KMDF for that.
But mapping a second buffer, is that adding a noticeable amount of overhead
to the whole DMA transaction? We are trying to squeeze out as much
performance as we can.
I guess I could just not map the second buffer if the performance info we
are trying to pass back is not required; the info is usually not required
in most instances (only for debugging and bandwidth checking).

Any suggestions regarding QueryPerformanceCounter in KMDF?

Thanks.

If you really have command, response, and data buffers, you can combine the command and response buffers into one. Search on this list for posts with my name attached from about 2005-2009 for threads addressing that particular problem. If the actual address is a Seagate.com address, you’ve found the right thread(s).

I think we ended up using METHOD_BUFFERED and embedding the data buffer pointer.

Phil

Not speaking for LogRhythm
Phil Barila | Senior Software Engineer

Tim Bragulla wrote:

> Tim is right, I probably have to use METHOD_NEITHER and map both the
> DMA buffer and the other buffer for passing back information myself.

Actually, Tim did NOT suggest METHOD_NEITHER. Tim suggested passing a
user-mode pointer in your input buffer. That way, the I/O system maps
the output buffer for you.

> Do you think this adds a lot of overhead to the actual DMA?

Insignificant. I assume this is a small buffer, so you’re talking about
locking and unlocking one page of memory.

> I mean, the DMA buffer has to be mapped either way; with
> METHOD_NEITHER I’ll just have to do it myself rather than relying on
> KMDF for that.
> But mapping a second buffer, is that adding a noticeable amount of
> overhead to the whole DMA transaction? We are trying to squeeze out as
> much performance as we can.

Premature optimization is the root of all evil. First, make it work.
Then, make it work fast.

> Any suggestions regarding QueryPerformanceCounter in KMDF?

Did you Google for this? QueryPerformanceCounter in user mode ends up
calling KeQueryPerformanceCounter in kernel mode.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
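
KeQueryPerformanceCounter returns a tick count and, through its optional argument, the tick frequency, so each DMA step can be timed by subtracting two readings and scaling by that frequency. Below is a minimal sketch of the conversion in portable C. The helper name ticks_to_ns is made up for illustration; in a driver the inputs would be the QuadPart values taken from two KeQueryPerformanceCounter calls.

```c
#include <stdint.h>

/* Convert an elapsed tick count to nanoseconds given the counter
 * frequency in ticks per second. Splitting into whole seconds and a
 * remainder avoids overflowing 64 bits for long intervals (this holds
 * for frequencies below roughly 18 GHz). Returns 0 if frequency is 0. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t frequency)
{
    const uint64_t NS_PER_SEC = 1000000000ULL;
    uint64_t whole, rem;

    if (frequency == 0)
        return 0;

    whole = ticks / frequency;   /* full seconds */
    rem   = ticks % frequency;   /* leftover ticks, always < frequency */

    return whole * NS_PER_SEC + (rem * NS_PER_SEC) / frequency;
}
```

Collecting raw tick deltas in the driver and converting them in user mode keeps the arithmetic out of the DMA path.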

Hi Tim,

Yes, sorry about misinterpreting your email.
After I wrote it I came to the conclusion that I will
use METHOD_OUT_DIRECT/METHOD_IN_DIRECT so that at least the user buffer for
the DMA gets automatically mapped.
I’ll pass in another user pointer with the input buffer and map it only if
required (which should only be the case for debugging).
That buffer should be very small as it won’t be passing back a lot of data.

I know I can use KeQueryPerformanceCounter, but I had also heard of RDTSC, so I
wasn’t sure what the recommended method would be; I should have been clearer
about it…

Thanks for your help; very much appreciated!

Tim Bragulla wrote:

> I know I can use KeQueryPerformanceCounter, but also heard of RDTSC,
> so I wasn’t sure what the recommended method would be; should have
> been clearer about it…

On some systems, KeQueryPerformanceCounter actually returns the rdtsc
value, although these days I believe it usually uses a high-performance
counter on the motherboard. It depends on the motherboard capabilities.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.