WDFQUEUEs and KMDF DMA questions

A while ago, I earned the coveted PeterGV “the most interesting post on
NTDEV in 3 weeks” award by asking about searching WDFQUEUEs for requests,
since the docs clearly say that it’s not kosher to simply call
WdfIoQueueRetrieveFoundRequest() without first calling
WdfIoQueueFindRequest(). Since we have a fixed hardware capacity, and the
tag is essentially the index into the array of hardware resources, an array
of requests is the most natural structure to use for our requests that are
inflight on the hardware. In all of the back and forth, Doron gave me our
current arrangement, which is described below.
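
Since the tag doubles as the array index, the in-flight bookkeeping is O(1) in both directions. Here is a minimal user-mode sketch of that tag-as-index table; all names are hypothetical, not from the actual driver:

```c
/* Hypothetical model of the tag-as-index request table: the hardware
   tag doubles as the index into a fixed array of in-flight slots, so
   "find the request for tag N" is a direct array lookup. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HW_QUEUE_DEPTH 32   /* e.g. SATA NCQ: 32 tagged commands */

typedef struct {
    void *request;          /* would be a WDFREQUEST in the real driver */
    bool  in_use;
} Slot;

static Slot g_slots[HW_QUEUE_DEPTH];

/* Returns the tag (array index) assigned, or -1 if the hardware is
   full and the caller must park the request somewhere else. */
static int slot_alloc(void *request)
{
    for (int tag = 0; tag < HW_QUEUE_DEPTH; tag++) {
        if (!g_slots[tag].in_use) {
            g_slots[tag].in_use = true;
            g_slots[tag].request = request;
            return tag;
        }
    }
    return -1;              /* no room at the inn */
}

/* Completion path: hardware reports a tag; look the request up directly. */
static void *slot_free(int tag)
{
    void *req = g_slots[tag].request;
    g_slots[tag].in_use = false;
    g_slots[tag].request = NULL;
    return req;
}
```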

We ended up with two queues, the default Q that all IO goes through, which
we configure as parallel, so we can have maximal parallel pre-processing
(BuildSGL, translate to HBA-specific form) as we implement the KMDF version
of Extreme DMA, and a “ReadyForTheHardwareQ” (RFHQ) that is also parallel,
which has a single EvtIoDefault callback that marks the request
cancellable, then hangs it on the hardware. All of this works great, until
we get to the state where the hardware is almost full and the app sends
more IO than will fit into the hardware.
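
For the curious, the two-queue arrangement sketches out roughly like this (KMDF 1.x; callback names such as DefaultEvtIoDefault and RfhqEvtIoDefault are placeholders, and error handling is elided):

```c
/* Sketch only: the default queue is parallel for maximal parallel
   pre-processing; the RFHQ is also parallel and hands requests to the
   hardware from its EvtIoDefault callback. */
NTSTATUS status;
WDF_IO_QUEUE_CONFIG queueConfig;
WDFQUEUE rfhq;

/* Default queue: all IO enters here; parallel dispatch lets BuildSGL
   and HBA translation run on many requests at once. */
WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig,
                                       WdfIoQueueDispatchParallel);
queueConfig.EvtIoDefault = DefaultEvtIoDefault;
status = WdfIoQueueCreate(Device, &queueConfig,
                          WDF_NO_OBJECT_ATTRIBUTES, WDF_NO_HANDLE);

/* "ReadyForTheHardwareQ": its EvtIoDefault marks the request
   cancelable, then hangs it on the hardware. */
WDF_IO_QUEUE_CONFIG_INIT(&queueConfig, WdfIoQueueDispatchParallel);
queueConfig.EvtIoDefault = RfhqEvtIoDefault;
status = WdfIoQueueCreate(Device, &queueConfig,
                          WDF_NO_OBJECT_ATTRIBUTES, &rfhq);

/* At the end of DefaultEvtIoDefault, after pre-processing: */
status = WdfRequestForwardToIoQueue(Request, rfhq);
```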

As an example: SATA NCQ has a capacity of 32 tagged commands, so let's
suppose we get to 31 running, and 3 more arrive from the host app.
Since my RFHQ is parallel, all three pop out at *about* the same time, and
the first one to grab the lock wins. The other two invocations of
EvtIoDefault have no place to put their requests, since I can’t put them
back on the RFHQ they came out of, since it’s parallel, and all the docs
say WdfRequestRequeue is only valid on a manual Q.

Since I am using a parallel queue for the RFHQ, it appears one option is
to forward any excess (no room at the inn) IOs to a manual holding
queue, which I would then have to poll. If I'm going to do that, I might
as well make the RFHQ a manual queue and be done with it.

Alternatively, I could stop the RFHQ by calling WdfIoQueueStop(), then
forward them back into the RFHQ, and restart it by calling
WdfIoQueueStart() when there is room on the hardware, but that puts the
deferred requests at the tail of the Q, and I can easily imagine a
pathological case where some IO(s) get(s) stuck in a loop like that if the
timing and phase of the moon are correct.

Question #1: Do I understand the usage and limitations of parallel
WDFQUEUEs correctly at this point? Is my overcapacity scenario correct?

KMDF enhancement request #1: Add WdfRequestRequeue capability to WDFQUEUEs
configured as parallel. It wouldn't be burdensome to require that the
queue be stopped first, in order to avoid the busy looping and potential
recursion of repeated requeues.

Question #2: Given that I have a fixed hardware limit, is my conclusion
that a manual Q is the right thing for the RFHQ the correct conclusion, or
have I missed something?

I mentioned that we’ve approximated the KMDF version of Extreme DMA. We’re
diving to WDM for the DMA Adapter and using BuildScatterGatherList to fill
a pre-allocated SGL. This doesn’t appear to be possible within KMDF. The
reason this doesn’t appear to be possible is that the EvtProgramDma
callback is supposed to return TRUE when it starts the IO on the hardware,
and FALSE otherwise. I had a lot less discomfort calling BuildSGL, because
the callback for that one is a VOID function, so there was no need to be
dishonest when disconnecting the callback from the act of starting it on
the hardware. I would probably be willing to trade the pre-allocated SGL
for the if I could defer the hardware start (hang it on my RFHQ) in
EvtProgramDma.
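
For reference, my reading of the contract looks like this; a sketch with hypothetical helpers (HardwareHasFreeSlot, StartTransferOnHardware), not our actual code:

```c
/* Sketch of the EvtProgramDma contract as the docs describe it:
   return TRUE only if the transfer was actually started on the
   hardware, FALSE otherwise. */
BOOLEAN
MyEvtProgramDma(
    WDFDMATRANSACTION Transaction,
    WDFDEVICE Device,
    WDFCONTEXT Context,
    WDF_DMA_DIRECTION Direction,
    PSCATTER_GATHER_LIST SgList
    )
{
    /* The framework hands us the SGL it built and expects us to
       program the device here and now. Returning TRUE while merely
       parking the request on the RFHQ is the part that appears to
       violate the contract. */
    if (!HardwareHasFreeSlot(Device)) {
        return FALSE;   /* framework treats this as a failed transfer */
    }

    StartTransferOnHardware(Device, SgList, Direction);
    return TRUE;
}
```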

Question #3: Do I understand the KMDF DMA contract right? It isn't kosher
to return TRUE from EvtProgramDma when I may cancel or abort the IO
without actually starting the DMA on the hardware?

KMDF enhancement request #2: Allow pre-allocated SGLs.

KMDF enhancement request #3: Allow decoupling the SGL construction from
the DMA start.

I really do like the KMDF a lot. But I’m also pushing the limits of
“playing nice”, because I’m doing a lab harness, not a production driver,
so I recognize that my needs may not be consistent with some KMDF design
requirements.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842

Having a manual holding queue for requests that can’t fit on the
hardware sounds like an okay solution. There’s a callback for empty to
non-empty transitions so you don’t have any need to poll the queue.

It sounds like making the RFHQ a manual queue would be sensible. If the
first thing you do on the other side is to grab a lock and find a slot
in your request table then there’s little point in having multiple
threads going through the queue at the same time anyway as you’re just
increasing contention on your lock.

-p

On 10/5/06, Peter Wieland wrote:
> Having a manual holding queue for requests that can’t fit on the
> hardware sounds like an okay solution. There’s a callback for empty to
> non-empty transitions so you don’t have any need to poll the queue.

There is? I didn’t see that in the current (KMDF 1.1) definition of
WDF_IO_QUEUE_CONFIG. Where is it hiding?

Beverly

Look at WdfIoQueueReadyNotify. That’s how you set the callback Peter
mentioned.
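
A rough sketch of registering that callback and draining the manual queue from it (helper names like HardwareHasFreeSlot and StartOnHardware are placeholders):

```c
/* Register for the empty-to-non-empty notification on the manual
   holding queue; no polling needed. */
status = WdfIoQueueReadyNotify(ManualQueue, RfhqReadyNotify, NULL);

/* EVT_WDF_IO_QUEUE_STATE callback: runs when the queue goes from
   empty to non-empty. Drain it while the hardware has room. */
VOID
RfhqReadyNotify(
    WDFQUEUE Queue,
    WDFCONTEXT Context
    )
{
    WDFREQUEST request;

    while (HardwareHasFreeSlot() &&
           NT_SUCCESS(WdfIoQueueRetrieveNextRequest(Queue, &request))) {
        StartOnHardware(request);
    }
}
```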

Phil


I’ve been working with WDFQUEUEs for a while now and I’ve developed my own
opinions about how to structure my code in these sorts of situations. So,
for all to comment on, here are Jake’s rules of WDFQUEUE usage:

  1. Sort your I/O at the top level and forward to a set of second-level
    *manual* queues. You seem to be doing this already.

  2. Set the callback telling you when a manual queue goes non-empty just as
    Peter suggests below.

  3. Once you’ve retrieved a request from that manual queue, never put it
    back into any WDFQUEUE ever again.

Way too much of my effort (and effort from other people on this list) has
been wasted finding a reasonable way to search queues for requests that have
been partially processed. Furthermore, the cancellation model in KMDF,
which is really quite nice, is completely undermined if you put a partially
processed request back into a queue, which automatically makes it cancelable
whether you like it or not.

I find that my code is much more manageable if I just put partially
processed requests on linked lists, using the request context structures for
the list entries. If the request is in a state where it needs to be
cancelable, then call WdfRequestMarkCancelable.
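
A minimal sketch of that pattern, with illustrative names (PendingList, MyEvtRequestCancel):

```c
/* Link partially processed requests through their KMDF context
   structures instead of putting them back into a WDFQUEUE. */
typedef struct _REQUEST_CONTEXT {
    LIST_ENTRY ListEntry;       /* entry in the driver-owned list */
    /* ... per-request state ... */
} REQUEST_CONTEXT, *PREQUEST_CONTEXT;

WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(REQUEST_CONTEXT, GetRequestContext);

/* After retrieving the request from the manual queue: */
PREQUEST_CONTEXT ctx = GetRequestContext(Request);
InsertTailList(&DevExt->PendingList, &ctx->ListEntry);

/* Only when the request genuinely needs to be cancelable: */
WdfRequestMarkCancelable(Request, MyEvtRequestCancel);
```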

  • Jake Oshins
    Windows Kernel Team


Jake,

Thanks for the confirmation of what I had come to conclude myself.

  1. We were doing the second-level queues as parallel, but we have to do
    them as manual now, so I’m doing exactly what you suggest in 2).

  2. Yes, that was my conclusion, too. In my particular case, an array is
    more useful than a list, but I can see how a list would be useful, too, if
    I didn’t have the tag as a natural index, or if the tags were not
    contiguous, or random, or… :-)

Phil
