IOCTL requests from 4 threads running on 4 different cores

Hello,

My PCI card receives data at ~3 Gb/s.
It has 4 separate channels and one PCI interrupt.

Each time the card receives 4 MB of data it raises an interrupt.
The device driver is written with KMDF 7600.16385.1 and is based on general\PLX9x5x.
Upon interrupt, PLxEvtInterruptDpc puts a message into one of 4 internal queues; the queue is selected according to the interrupt status register.
There are 4 internal queues, one queue per channel.
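
In rough outline, the DPC path looks something like this (a simplified sketch with hypothetical type and field names, not the actual driver code):

#include <ntddk.h>
#include <wdf.h>

#define PLX_NUM_CHANNELS 4

/* Hypothetical bookkeeping types -- not part of the PLX9x5x sample. */
typedef struct _CHANNEL_MSG {
    LIST_ENTRY ListEntry;
    ULONG      ByteCount;              /* e.g. the 4 MB transfer size */
} CHANNEL_MSG, *PCHANNEL_MSG;

typedef struct _CHANNEL_STATE {
    WDFSPINLOCK Lock;                  /* WdfSpinLockCreate'd at device add */
    LIST_ENTRY  MsgList;               /* the per-channel "internal queue" */
    WDFQUEUE    PendingQueue;          /* manual-dispatch queue for waiting IOCTLs */
} CHANNEL_STATE, *PCHANNEL_STATE;

typedef struct _DEVICE_CONTEXT {
    CHANNEL_STATE Channel[PLX_NUM_CHANNELS];
    ULONG         LastInterruptStatus; /* latched by the ISR; a real driver would
                                          read and clear it under the interrupt lock */
} DEVICE_CONTEXT, *PDEVICE_CONTEXT;

WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(DEVICE_CONTEXT, GetDeviceContext);

VOID
PLxEvtInterruptDpc(
    _In_ WDFINTERRUPT Interrupt,
    _In_ WDFOBJECT    AssociatedObject
    )
{
    PDEVICE_CONTEXT devCtx = GetDeviceContext(WdfInterruptGetDevice(Interrupt));
    ULONG           status = devCtx->LastInterruptStatus;
    ULONG           ch;
    PCHANNEL_MSG    msg;

    UNREFERENCED_PARAMETER(AssociatedObject);

    for (ch = 0; ch < PLX_NUM_CHANNELS; ch++) {
        if ((status & (1u << ch)) == 0) {
            continue;                  /* no data for this channel */
        }

        msg = (PCHANNEL_MSG)ExAllocatePoolWithTag(NonPagedPool,
                                                  sizeof(CHANNEL_MSG), 'xlP9');
        if (msg == NULL) {
            continue;
        }
        msg->ByteCount = 4 * 1024 * 1024;

        /* The DPC and the IOCTL handler can touch this list concurrently,
           so it is guarded by a spin lock. */
        WdfSpinLockAcquire(devCtx->Channel[ch].Lock);
        InsertTailList(&devCtx->Channel[ch].MsgList, &msg->ListEntry);
        WdfSpinLockRelease(devCtx->Channel[ch].Lock);
    }
}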

The device driver also contains a queue that handles IOCTL requests from the user-level driver.
The user-level driver sends an IOCTL request, which returns when PLxEvtInterruptDpc posts a message to the
corresponding internal queue.
The IOCTL request contains the queue ID, so the IOCTL handler checks only the relevant queue for a new message.
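
The IOCTL handler, again only as a sketch (the control code and input layout are made up here, and it reuses the hypothetical DEVICE_CONTEXT above), checks just the requested channel:

/* Hypothetical IOCTL; the real control code and buffer layout will differ. */
#define IOCTL_PLX_WAIT_FOR_DATA \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

VOID
PLxEvtIoDeviceControl(
    _In_ WDFQUEUE   Queue,
    _In_ WDFREQUEST Request,
    _In_ size_t     OutputBufferLength,
    _In_ size_t     InputBufferLength,
    _In_ ULONG      IoControlCode
    )
{
    PDEVICE_CONTEXT devCtx = GetDeviceContext(WdfIoQueueGetDevice(Queue));
    PCHANNEL_STATE  chan;
    PULONG          channelId;
    PLIST_ENTRY     entry = NULL;
    PCHANNEL_MSG    msg;
    NTSTATUS        status;

    UNREFERENCED_PARAMETER(OutputBufferLength);
    UNREFERENCED_PARAMETER(InputBufferLength);

    if (IoControlCode != IOCTL_PLX_WAIT_FOR_DATA) {
        WdfRequestComplete(Request, STATUS_INVALID_DEVICE_REQUEST);
        return;
    }

    /* First ULONG of the input buffer carries the queue (channel) ID. */
    status = WdfRequestRetrieveInputBuffer(Request, sizeof(ULONG),
                                           (PVOID *)&channelId, NULL);
    if (!NT_SUCCESS(status) || *channelId >= PLX_NUM_CHANNELS) {
        WdfRequestComplete(Request, STATUS_INVALID_PARAMETER);
        return;
    }

    chan = &devCtx->Channel[*channelId];

    /* Check only the relevant channel's internal queue. */
    WdfSpinLockAcquire(chan->Lock);
    if (!IsListEmpty(&chan->MsgList)) {
        entry = RemoveHeadList(&chan->MsgList);
    }
    WdfSpinLockRelease(chan->Lock);

    if (entry != NULL) {
        msg = CONTAINING_RECORD(entry, CHANNEL_MSG, ListEntry);
        WdfRequestCompleteWithInformation(Request, STATUS_SUCCESS,
                                          msg->ByteCount);
        ExFreePoolWithTag(msg, 'xlP9');
        return;
    }

    /* Nothing yet: park the request on the channel's manual queue so the
       DPC can complete it when the next interrupt delivers data. */
    status = WdfRequestForwardToIoQueue(Request, chan->PendingQueue);
    if (!NT_SUCCESS(status)) {
        WdfRequestComplete(Request, status);
    }
}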

The application contains 4 threads; each thread handles the data from one channel.
It runs on a multi-core PC, so each thread can run on a different core (Windows decides this).

From what I know, the device driver runs on core 0 only.
How can it answer IOCTL requests from 4 different cores?
Should I use WdfIoQueueDispatchParallel for the IOCTL queue?

Thanks,
Zvika

No device drivers will run on as many cores as you have. There is no
restriction on which core a driver runs on. You should be using a parallel
queue here for performance. And there is no problem with handling
requests from N cores; the driver will take each request on the core it came in
on.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
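
For what it's worth, a minimal sketch of the parallel IOCTL queue suggested above, as it might appear in EvtDriverDeviceAdd (the callback name is a placeholder, and device is the WDFDEVICE returned by WdfDeviceCreate):

WDF_IO_QUEUE_CONFIG queueConfig;
WDFQUEUE            queue;
NTSTATUS            status;

/* Default queue, parallel dispatch: EvtIoDeviceControl may be invoked
   concurrently, on whatever cores the requests arrive on. */
WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig,
                                       WdfIoQueueDispatchParallel);
queueConfig.EvtIoDeviceControl = PLxEvtIoDeviceControl;

status = WdfIoQueueCreate(device, &queueConfig,
                          WDF_NO_OBJECT_ATTRIBUTES, &queue);
if (!NT_SUCCESS(status)) {
    return status;
}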


What Mr. Burn meant to say was “No. Device drivers will run on as many cores as you have.”

To clarify further: Your statement that “the device driver runs on core 0 only” is not correct.

Peter
OSR

>To clarify further: Your statement that “the device driver runs on core 0 only” is not correct.

It could well happen that all interrupts from the device are directed to core 0; that is, the ISR is always called on core 0, and its DPC will also run there.

But it doesn’t matter which core the IRP came from. You can have 4 UM threads running on different cores or on the same core. Unless you set the thread affinity, they will run on whichever core the scheduler chooses. This doesn’t matter for the driver.

>Unless you set the thread affinity, they will run on a core that the scheduler chooses.
And if the hardware has MSI-X support, there could be even more performance benefits.

Igor Sharovar
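
If the card (or a future revision) exposes one MSI-X message per channel, KMDF lets you create one WDFINTERRUPT per message in EvtDevicePrepareHardware, so each channel can get its own ISR/DPC, potentially delivered on different cores. A rough sketch, with placeholder ISR/DPC names (MSI also has to be enabled for the device in the INF):

NTSTATUS
PLxEvtDevicePrepareHardware(
    _In_ WDFDEVICE    Device,
    _In_ WDFCMRESLIST ResourcesRaw,
    _In_ WDFCMRESLIST ResourcesTranslated
    )
{
    ULONG                           count;
    ULONG                           i;
    PCM_PARTIAL_RESOURCE_DESCRIPTOR raw;
    PCM_PARTIAL_RESOURCE_DESCRIPTOR trans;
    WDF_INTERRUPT_CONFIG            cfg;
    WDFINTERRUPT                    interrupt;
    NTSTATUS                        status;

    count = WdfCmResourceListGetCount(ResourcesTranslated);

    for (i = 0; i < count; i++) {
        trans = WdfCmResourceListGetDescriptor(ResourcesTranslated, i);
        raw   = WdfCmResourceListGetDescriptor(ResourcesRaw, i);

        if (trans->Type != CmResourceTypeInterrupt) {
            continue;
        }

        /* One WDFINTERRUPT per interrupt resource: with MSI-X each
           message gets its own ISR/DPC and may be delivered to a
           different processor. */
        WDF_INTERRUPT_CONFIG_INIT(&cfg, PLxEvtMessageIsr, PLxEvtMessageDpc);
        cfg.InterruptRaw        = raw;
        cfg.InterruptTranslated = trans;

        status = WdfInterruptCreate(Device, &cfg,
                                    WDF_NO_OBJECT_ATTRIBUTES, &interrupt);
        if (!NT_SUCCESS(status)) {
            return status;
        }
    }

    return STATUS_SUCCESS;
}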

> From what I know, the device driver runs on core 0 only.

I have no idea how you know this, because it is undocumented. Also, a
device driver has several components, such as the ISR, the DPC, and the
top-level dispatch routines. So you can’t possibly know where something
is running unless you specify what part of the driver you are talking
about.

> How can it answer IOCTL requests from 4 different cores?

I don’t know what you mean by “answer”, since that is not a verb that
would apply in this case. You can respond to the arrival of the IOCTL,
and you can expect, on a processor of N cores, that you could have N
concurrent threads running in your passive-level dispatch routine. And
more than that: at passive level a thread might exhaust its timeslice and
be descheduled, leaving it in the middle of something (note that code
holding a spin lock cannot be preempted this way), so you could have even
more threads than that in your passive-level dispatch routine at once.
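
A tiny illustration of that point, with a hypothetical per-device statistics block: even a trivial update that several concurrent EvtIoDeviceControl invocations can reach has to be serialized.

/* Hypothetical passive-level state shared by all concurrent handler
   invocations; the WDFWAITLOCK is created with WdfWaitLockCreate at
   device add. */
typedef struct _PLX_STATS {
    WDFWAITLOCK Lock;
    ULONG64     RequestsSeen;
} PLX_STATS, *PPLX_STATS;

VOID
PlxCountRequest(
    _Inout_ PPLX_STATS Stats
    )
{
    /* Up to one thread per core -- or more, if threads are preempted
       in the middle of the handler -- can be here at the same time. */
    WdfWaitLockAcquire(Stats->Lock, NULL);
    Stats->RequestsSeen += 1;
    WdfWaitLockRelease(Stats->Lock);
}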

Then there is the dequeuing of the next operation from the queue. As far
as I know, you could potentially have N threads trying to do this at the
same time as M threads are trying to put things into the queues.

Then there is the IoCompleteRequest call (I don’t know its WDF equivalent),
and you could have many of these happening at the same time.
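
For reference, the KMDF counterpart of IoCompleteRequest is WdfRequestComplete / WdfRequestCompleteWithInformation, and the same observation applies. Continuing the hypothetical sketches above, the DPC might complete a request that the IOCTL handler parked on the channel's manual queue:

/* Inside the DPC, after noting that channel 'ch' has data: complete a
   waiting request, if one was parked on the channel's manual queue. */
WDFREQUEST request;
NTSTATUS   status;

status = WdfIoQueueRetrieveNextRequest(devCtx->Channel[ch].PendingQueue,
                                       &request);
if (NT_SUCCESS(status)) {
    /* KMDF analogue of IoCompleteRequest; many of these can be in
       flight at once, one per completed request. */
    WdfRequestCompleteWithInformation(request, STATUS_SUCCESS,
                                      (ULONG_PTR)(4 * 1024 * 1024));
}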

In addition, although I don’t know if any hardware vendors have
implemented this, there is the potential for multiple cores to want to be
in the ISR, but that is prevented by the ISR spinlock in the KINTERRUPT
object. But if the interrupts are routed by the hardware to different
cores, then the DPCs, which by default run on the core that took the
interrupt, could be running concurrently. You should expect this will
happen.
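
One consequence worth spelling out: any state the DPC shares with the ISR should be touched under the framework's interrupt lock, which also covers the case where DPCs run on more than one core. A small sketch, using the hypothetical LastInterruptStatus field from earlier:

ULONG status;

/* Raises to the device IRQL and takes the interrupt spin lock, so the
   ISR (on any core) cannot run while we read and clear the field. */
WdfInterruptAcquireLock(Interrupt);
status = devCtx->LastInterruptStatus;
devCtx->LastInterruptStatus = 0;
WdfInterruptReleaseLock(Interrupt);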

It is very important to understand all the possible concurrencies that can
occur in a driver, and expect that any concurrency that is possible will
occur.

> Should I use WdfIoQueueDispatchParallel for the IOCTL queue?



>> To clarify further: Your statement that “the device driver runs on core 0 only” is not correct.
>
> It could well happen that all interrupts from the device are directed to core 0; that is, the ISR is always called on core 0, and its DPC will also run there.
>
> But it doesn’t matter which core the IRP came from. You can have 4 UM threads running on different cores or on the same core. Unless you set the thread affinity, they will run on whichever core the scheduler chooses. This doesn’t matter for the driver.

Actually, it matters a LOT if each of these threads can be accessing a
shared object, such as a queue.

And putting a footnote on page 189 of the manual that says “You must set
the thread affinity of all applications to a specific core” will never be
read, and in any case would scream “I HAD NO CLUE WHAT I WAS DOING!” which
is not something you want to make evident.

