WDF queue overheads

I would like to know how WDF queues are implemented in the framework and
the overhead due to queues. Does every queue have a dedicated thread?

I have a situation where the driver should wait for some asynchronous
input from my device at some point in time. It can either create a
work item or create a Request and queue it to do this job. I find the
latter method advantageous, but I want to know the runtime overhead due
to an additional queue.

xxxxx@gmail.com wrote:

> I would like to know how WDF queues are implemented in the framework and
> the overhead due to queues. Does every queue have a dedicated thread?

What would make you think so? No, this is not the case, nor is it even
necessary. Think about the way they are implemented. Parallel queues
dispatch immediately, on the thread of the requestor. Serial queues
only dispatch when a previous request is completed, and that’s only done
by using a WDF call, so there is already a thread to use. Manual queues
only dispatch on user request.

In each case, there is already a thread to use. No separate thread is
necessary.
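
To make that concrete, here is a toy user-mode model of a sequential queue. It is only a sketch and not framework code: `toy_queue`, `toy_queue_submit` and `toy_queue_complete` are made-up names, and the lock a real queue needs is left out because the model is single threaded. It shows that the handler always runs on a thread that already exists, either the submitter's or the completer's.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*handler_fn)(int request_id);

/* Toy sequential queue: at most one request is outstanding at a time. */
typedef struct {
    int pending[MAX_PENDING];   /* FIFO ring of waiting request ids */
    size_t head, count;
    int busy;                   /* nonzero while a request is outstanding */
    handler_fn handler;
} toy_queue;

static void toy_queue_init(toy_queue *q, handler_fn handler) {
    q->head = 0;
    q->count = 0;
    q->busy = 0;
    q->handler = handler;
}

/* Called on the requestor's thread. If the queue is idle the handler runs
   right here, on the caller's thread; otherwise the request just waits. */
static int toy_queue_submit(toy_queue *q, int request_id) {
    if (!q->busy) {
        q->busy = 1;
        q->handler(request_id);
        return 0;
    }
    if (q->count == MAX_PENDING) return -1;   /* ring is full */
    q->pending[(q->head + q->count) % MAX_PENDING] = request_id;
    q->count++;
    return 0;
}

/* Called by whoever completes the current request. The next waiting
   request is dispatched on that same thread; no extra thread is needed. */
static void toy_queue_complete(toy_queue *q) {
    assert(q->busy);
    if (q->count == 0) {
        q->busy = 0;
        return;
    }
    int next = q->pending[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    q->handler(next);
}

/* Demo handler (hypothetical): records the order of dispatch. */
static int dispatched[MAX_PENDING];
static size_t dispatched_count;

static void record_handler(int request_id) {
    if (dispatched_count < MAX_PENDING)
        dispatched[dispatched_count++] = request_id;
}
```

Submitting three requests to an idle queue dispatches only the first one; each call to `toy_queue_complete` then dispatches the next from inside the completing call.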

> I have a situation where the driver should wait for some asynchronous
> input from my device at some point in time. It can either create a
> work item or create a Request and queue it to do this job. I find the
> latter method advantageous, but I want to know the runtime overhead due
> to an additional queue.

Queues are lightweight and incredibly useful. The simple fact that they
understand how to handle cancellation is in itself a huge win. The two
major lessons I have learned from my time with KMDF are “exploit queues
whenever possible” and “don’t oversynchronize”.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Vijairaj, can you explain how you are waiting for the hardware in your design? Are you saying you want to create a request in the driver and insert it into a queue? Or will you send synchronous I/O in a work item?

As Tim says, there is no thread per queue. The only overhead of a WDF queue is its size in memory (which is not that much) and a single lock.

d

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

First, thanks Tim for your response.

> I would like to know how wdf queues are implemented in the framework
> and the overhead due to queues. Does every queue have a dedicated
> thread?

> What would make you think so?

It’s rather my wish. The driver creates a new request and this has to be
queued without blocking the application that called the driver. I
thought of using a work item, but the documentation says that work items
are a scarce resource, and I don’t want to use one since this request
can block for minutes. I want to keep the driver simple; that’s why I am
looking into possible options without creating a thread myself.


Vijairaj

There is no need to create a thread, if I understand you correctly. First, the application which called into your driver will never be blocked (assuming the application opened the handle as OVERLAPPED) unless you put explicit blocking in your EvtIoXxx routine. This is because the framework always marks incoming requests as pending and returns STATUS_PENDING to the application.

So now you want your hardware to tell you that something happened. It will tell you either in an ISR (from which you will queue a DPC) or in an I/O completion routine. Either way, you don’t actually hold onto any thread. You program the hardware and, when it tells you it is done, you complete the request that you put into the WDFQUEUE.

d


Thanks Doron for your response,

> Vijairaj, can you explain how you are waiting for the hw in your
> design? Are you saying you want to create a request in the driver and
> insert it into a queue? Or you will send synchronous i/o in a work
> item?

The total architecture is like this:

  1. This is a USB device with one bulk IN and one bulk OUT endpoint.
  2. The driver always originates the communication and the device always
    has an answer.
  3. Multiple applications can simultaneously have handles open to the
    driver.
  4. Usually, the application sends an IOCTL to the driver and the driver
    performs a Write+Read operation. Write+Read is atomic w.r.t. the
    application.
  5. In some special cases the communication with the device is as follows:

        App1 -(IOCTL1)-> Driver --> Write
        <------------------------+

        App2 -(IOCTLn)-> Driver -(Queue)-> WaitOnQueue

        App1 -(IOCTL2)-> Driver
        <-------------+

        App1 -(IOCTL3)-> Driver --> Read
        <------------------------+

  6. After App1(IOCTL1), until App1(IOCTL3), all other requests
    AppN(IOCTLn) are queued.
  7. Since I can’t rely on App1 to issue IOCTL3, I have to automatically
    initiate a read after IOCTL1.

Here are the possible ways I could think of:

  1. Queue a work item after App1(IOCTL1) and wait for the device response
    in the work item’s context.
  2. Create a request and queue it (provided the queue has an associated
    thread. But now as I understand from your earlier response, this is not
    possible).
  3. Briefly configure the BulkIN endpoint as a continuous reader.

> There is no need to create a thread if I understand you correctly.

> First, the application which called into your driver will never be
> blocked (assuming the application opened the handle as OVERLAPPED)
> unless you put explicit blocking in your EvtIoXxx routine.

The read request that I send in EvtIoXxx uses
WDF_REQUEST_SEND_OPTION_SYNCHRONOUS, so there is explicit blocking,
which is why the application thread blocks. If I don’t find a simpler
alternative, I will have to fall back on your suggestion and implement
a non-blocking read.

A couple of details that you should consider:

  1. You cannot simply queue a driver-created request into a WDFQUEUE; normally only I/O manager requests can be put into a WDFQUEUE. It can be done, but it is a little hairy and I can’t go into the details now.
  2. Once you configure a pipe for a continuous reader, you can’t unconfigure the continuous reader. It is an all-or-nothing affair.

I would suggest that you do use the continuous reader and queue the incoming read buffers in the driver, then modify your design around the continuous reader and buffering. You can queue the WDFMEMORYs directly if you put a LIST_ENTRY in the context for the WDFMEMORY that is returned; you get that context by specifying the appropriate context type in the WDF_OBJECT_ATTRIBUTES when configuring the reader.

It appears that you have quite a complicated design. I would strongly suggest that you move as much of the complexity as possible up to the application and keep your driver as simple as possible.
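
The buffering idea can be sketched in plain C. This is a user-mode model, not KMDF code: `list_link` stands in for the kernel's LIST_ENTRY, `read_buffer` is a hypothetical stand-in for a WDFMEMORY context that embeds the link, and locking is omitted (a real driver would guard the list with a lock).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal doubly linked list link, modeled on the kernel's LIST_ENTRY. */
typedef struct list_link {
    struct list_link *next, *prev;
} list_link;

/* Hypothetical per-buffer context. The link lives inside the record, so
   queueing a buffer needs no extra allocation. */
typedef struct {
    list_link entry;
    size_t length;
    unsigned char data[64];
} read_buffer;

/* Recover the containing record from a pointer to its embedded link. */
#define BUFFER_FROM_LINK(p) \
    ((read_buffer *)((char *)(p) - offsetof(read_buffer, entry)))

static void list_init(list_link *head) { head->next = head->prev = head; }
static int list_empty(const list_link *head) { return head->next == head; }

static void list_push_tail(list_link *head, list_link *e) {
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static list_link *list_pop_head(list_link *head) {
    list_link *e = head->next;
    if (e == head) return NULL;      /* list is empty */
    head->next = e->next;
    e->next->prev = head;
    e->next = e->prev = e;
    return e;
}

/* What a reader completion callback would do: stash the filled buffer. */
static void on_read_complete(list_link *pending, read_buffer *buf,
                             const void *bytes, size_t n) {
    if (n > sizeof buf->data) n = sizeof buf->data;   /* truncate to fit */
    memcpy(buf->data, bytes, n);
    buf->length = n;
    list_push_tail(pending, &buf->entry);
}

/* What a later IOCTL handler would do: take the oldest buffer, if any. */
static read_buffer *take_oldest(list_link *pending) {
    list_link *e = list_pop_head(pending);
    return e ? BUFFER_FROM_LINK(e) : NULL;
}
```

Buffers come back out in arrival order, and `take_oldest` returns NULL when nothing has arrived yet, which is the point where a driver would park the caller's request instead of blocking.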

d


Why not do an async read and process the results in the read’s completion routine?

d


> Why not do an async read and process the results in the read’s completion
> routine?

I am currently looking into this and have to reorganize a few things.
Thanks for your time and effort.

One comment to add to Doron’s suggestions: instead of relying on the app to
issue the read IOCTL, have the initial IOCTL request be a bidirectional
Write/Read operation.



> > The EvtIoDeviceControl function has some KeWaitXXX calls with
> > indefinite timeout.
>
> Ok, so there’s your real problem. That’s a terrible design.

IMHO dismissing indefinite wait as a whole is unreasonable.

Here is some background to the design and the reason for indefinite wait:

  1. This is a smart card reader driver.
  2. This indefinite wait is when the driver waits for data from the device.
  3. The user mode smart card APIs don’t have options to perform
    asynchronous operations, so even if the driver provides this capability,
    albeit with added complexity, it’s going to be wholly unused.
  4. This is not a simple transfer of data between application and the
    device but involves some formatting added by the driver before writing
    to the device and a lot of post-processing after reading from the device,
    so a lot of context has to be preserved if the requests are placed in a
    queue while waiting for data from the device. The easiest alternative
    was to wait for the data from the device.
  5. In addition to all this, the request has to be passed to the kernel
    mode smclib, which has its own complexities.
  6. In addition to this, there are some other complexities involved, as
    described @ http://www.osronline.com/showthread.cfm?link=119520

Taking all this into account, an indefinite wait was the simplest solution
that I could think of. Do you have any alternate suggestions?


Vijairaj

Actually, the NT driver model is built around not performing infinite waits. By waiting indefinitely, you guarantee that the thread cannot be terminated while it is in the wait, which means the application becomes completely unresponsive. By not blocking, you allow the thread to continue and do other things.

  1. This has no bearing on doing blocking waits.
  2. You get data from the device either by sending an IRP to it or when an interrupt fires. Either way gives you an async way to know when it is done, without blocking.
  3. You don’t know what’s under the covers or what will change in the future. By handling the I/O async, it doesn’t matter whether the caller issues the I/O async or sync; by optimizing for the sync case, you are precluding future improvements.
  4. Post-processing: that is what a completion routine is for. Does the post-processing routine require IRQL == PASSIVE_LEVEL? That would be the only reason not to do it directly in the completion routine (in which case you want to queue a work item). Put the context of the operation into an object context structure for the request and you have everything that you need.
  5. smclib’s complexity has no real bearing here as far as I can tell. If smclib’s post-processing is complex, it is going to be complex whether it runs after a sync operation or from the completion routine.
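
As a rough illustration of points 2 and 4, here is a user-mode sketch of the pattern: start the read, return immediately, and do the post-processing in a completion callback that finds everything it needs in a per-request context. None of these names are WDF APIs; `request_ctx`, `toy_device`, `start_async_read` and `device_data_arrived` are all invented for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-request context: everything post-processing needs. */
typedef struct {
    int request_id;
    unsigned char reply[32];
    size_t reply_len;
    int completed;      /* set once the request has been completed */
    int status;
} request_ctx;

typedef void (*completion_fn)(request_ctx *ctx,
                              const unsigned char *data, size_t len);

/* Toy device: remembers one outstanding read and its completion callback. */
typedef struct {
    request_ctx *outstanding;
    completion_fn on_complete;
} toy_device;

/* Start the read and return at once; nothing waits in here. */
static int start_async_read(toy_device *dev, request_ctx *ctx,
                            completion_fn cb) {
    if (dev->outstanding) return -1;   /* one read at a time in this model */
    dev->outstanding = ctx;
    dev->on_complete = cb;
    return 0;
}

/* Stands in for the interrupt/DPC or the lower driver completing the I/O. */
static void device_data_arrived(toy_device *dev,
                                const unsigned char *data, size_t len) {
    request_ctx *ctx = dev->outstanding;
    if (!ctx) return;                  /* nothing was waiting for data */
    dev->outstanding = NULL;
    dev->on_complete(ctx, data, len);
}

/* Completion routine: post-process using only the context, then complete. */
static void read_completed(request_ctx *ctx,
                           const unsigned char *data, size_t len) {
    if (len > sizeof ctx->reply) len = sizeof ctx->reply;
    memcpy(ctx->reply, data, len);
    ctx->reply_len = len;
    ctx->status = 0;
    ctx->completed = 1;   /* a driver would complete the request here */
}
```

The caller of `start_async_read` gets control back before any data exists; the request is finished later from `device_data_arrived`, on whatever thread delivers the data.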

d
