My question is not on the technical side, but more on the design side.
Suppose I have to drive a device that, once configured and started (by writing 1 to a register), processes data asynchronously and asserts an interrupt each time it finishes processing a chunk of data (a fixed-size packet), until it is stopped (by clearing the same register).
An application needs to access the data to do something with it.
What I need is to avoid any unnecessary memcpy on the data path between the device and the application.
Based on your experience, what would be the best design pattern? (This list is of course not exhaustive.)
1. Allocate memory for n packets inside the application. Forward the PacketReady interrupt to the application (through a semaphore for example), so that the application can issue a ReadFile to get the data, as soon as it has a free buffer.
2. Allocate memory for n packets inside the application. Make the application issue n ReadFile before the start, keep the PacketReady interrupt private to the driver, and make the driver perform the DMAs and complete the ReadFile requests one after the other (avoids the need for a shared semaphore between the driver and the application).
3. Allocate memory for n packets inside the driver. Forward the PacketReady interrupt to the application (through a semaphore for example), so that the application can issue a DeviceIoControl to map the buffer inside its process address space and access the data.
> 1. Allocate memory for n packets inside the application. Forward the PacketReady interrupt to the
> application (through a semaphore for example), so that the application can issue a ReadFile to get
> the data, as soon as it has a free buffer.
Semaphores are useless for most practical tasks, at least in their Windows implementation.
This was discussed recently here.
The idea is also bad even if you use an event instead of a semaphore, because of scheduling delays.
> 2. Allocate memory for n packets inside the application. Make the application issue n ReadFile
> before the start, keep the PacketReady interrupt private to the driver, and make the driver perform the
> DMAs and complete the ReadFile requests one after the other (avoids the need for a shared
> semaphore between the driver and the application).
Very good and correct idea.
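Just note that if your hardware does not support scatter-gather DMA, you will have at least one mandatory memcpy(): the application's buffer is generally not physically contiguous, so the driver has to DMA into a contiguous buffer of its own and copy from there.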
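To make option 2 concrete, here is a minimal user-mode model in portable C of the "pre-post n reads, complete them in order" pattern. It is only a sketch under stated assumptions: `PendingRead`, `ReadQueue`, `post_read` and `complete_next_read` are hypothetical names, not WDK APIs, and the memset() stands in for the DMA transfer into the application's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PACKET_SIZE 8   /* hypothetical fixed packet size, in bytes */
#define MAX_PENDING 4   /* n: number of reads the application pre-posts */

/* One pre-posted read: it only references an application-owned buffer. */
typedef struct {
    unsigned char *buffer;
    int completed;
} PendingRead;

/* FIFO of pending reads, owned by the (modeled) driver. */
typedef struct {
    PendingRead *slots[MAX_PENDING];
    size_t head;    /* index of the oldest pending read */
    size_t count;   /* number of reads currently pending */
} ReadQueue;

/* Application side: post a read before the device is started.
 * Returns 0 on success, -1 if the queue is already full. */
static int post_read(ReadQueue *q, PendingRead *r)
{
    assert(q != NULL && r != NULL && r->buffer != NULL);
    if (q->count == MAX_PENDING)
        return -1;
    q->slots[(q->head + q->count) % MAX_PENDING] = r;
    q->count++;
    return 0;
}

/* Driver side: called once per PacketReady interrupt.
 * The packet goes straight into the oldest pending application buffer
 * (memset is a stand-in for the DMA), then that read is completed.
 * Returns the completed read, or NULL if no read was pending. */
static PendingRead *complete_next_read(ReadQueue *q, unsigned char seq)
{
    PendingRead *r;

    if (q->count == 0)
        return NULL;
    r = q->slots[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    memset(r->buffer, seq, PACKET_SIZE);
    r->completed = 1;
    return r;
}
```

The point of the model is that the interrupt never reaches the application: the application only sees its reads completing in the order it posted them, and re-posts a buffer once it is done with it.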
> 3. Allocate memory for n packets inside the driver. Forward the PacketReady interrupt to the
> application (through a semaphore for example), so that the application can issue a DeviceIoControl to
> map the buffer inside its process address space and access the data.
This is usually considered a worse idea than 2, but it also has its uses, especially if there will always be exactly 1 app working with the device.
Just note that you should map the memory once at app init, and then use a “get the producer/consumer pointers” IOCTL to the driver to find out which part of the buffer is filled with data at the moment.
This is how the WaveRT sound driver works, with the Vista+ sound server process as the only app that uses this contract.
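The "map once, then query the producer/consumer pointers" contract can be sketched as a single-threaded model in portable C. Everything here is an assumption for illustration: `SharedRing`, `ring_produce`, `ring_query` and `ring_release` are hypothetical names, `ring_query` stands in for the IOCTL, and a real driver would additionally need proper synchronization and memory barriers between the DPC and the application.

```c
#include <assert.h>
#include <stddef.h>

#define PACKET_SIZE  8   /* hypothetical fixed packet size, in bytes */
#define RING_PACKETS 4   /* n: packets in the driver-allocated buffer */

/* Buffer allocated by the driver and mapped once into the application. */
typedef struct {
    unsigned char data[RING_PACKETS][PACKET_SIZE];
    unsigned long producer;  /* total packets written (driver updates) */
    unsigned long consumer;  /* total packets released (app updates) */
} SharedRing;

/* Driver side, once per PacketReady: fill the next slot.
 * Returns 0 on success, -1 on overrun (the app is too slow). */
static int ring_produce(SharedRing *r, unsigned char seq)
{
    unsigned char *slot;
    size_t i;

    if (r->producer - r->consumer == RING_PACKETS)
        return -1;
    slot = r->data[r->producer % RING_PACKETS];
    for (i = 0; i < PACKET_SIZE; i++)
        slot[i] = seq;           /* stand-in for the DMA transfer */
    r->producer++;
    return 0;
}

/* Models the "get pointers" IOCTL: returns how many packets are ready
 * and points *first at the oldest one (NULL if none are ready). */
static size_t ring_query(const SharedRing *r, const unsigned char **first)
{
    size_t ready = (size_t)(r->producer - r->consumer);

    *first = ready ? r->data[r->consumer % RING_PACKETS] : NULL;
    return ready;
}

/* Application side: hand n processed packets back to the driver. */
static void ring_release(SharedRing *r, size_t n)
{
    assert(n <= r->producer - r->consumer);
    r->consumer += n;
}
```

The application reads the packets in place, so there is no copy at all on this path; the price is that the buffer layout becomes a private contract between one driver and one application.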
1. If the application issues a ReadFile, the driver will have to lock the pages in physical memory before starting the DMA. Is it best to VirtualLock all the buffers inside the application during initialization so that there is no latency penalty during the ReadFile?
2. At the moment the application stops the acquisition, there are pending I/Os (ISRs, DPCs, and ReadFile requests). How can I be sure that all pending I/Os are processed by the application before answering that the stop is done? What is the usual way to do that?
Here is what I wanted to do:
Clear the start register of the device.
Delay the stop IOCTL IRP completion until a read pointer (managed by the ReadFile completions queue) reaches a write pointer (managed by the PacketReady interrupts reception).
But what should be done with all the ReadFile requests issued by the application that will never be completed by the driver (because they have no PacketReady counterpart)?
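The two steps proposed above (clear the start register, then hold the stop request until the read pointer catches up with the write pointer) can be modeled in portable C as follows. This is only a sketch of the bookkeeping, with hypothetical names (`Acquisition`, `acq_packet_ready`, `acq_read_completed`, `acq_request_stop`); it assumes no PacketReady arrives after the register is cleared, and it deliberately leaves the question of the leftover ReadFile requests open.

```c
#include <assert.h>

typedef struct {
    unsigned long write_ptr;  /* packets signalled by PacketReady */
    unsigned long read_ptr;   /* packets delivered by completed reads */
    int running;              /* models the device start register */
    int stop_pending;         /* a stop request is waiting to complete */
    int stop_completed;       /* the stop request has been completed */
} Acquisition;

/* Complete the held stop request once every packet has been delivered. */
static void acq_try_complete_stop(Acquisition *a)
{
    if (a->stop_pending && a->read_ptr == a->write_ptr) {
        a->stop_pending = 0;
        a->stop_completed = 1;
    }
}

/* Interrupt path: one more packet is available in the device. */
static void acq_packet_ready(Acquisition *a)
{
    a->write_ptr++;
}

/* Completion path: one packet was delivered through a completed read. */
static void acq_read_completed(Acquisition *a)
{
    assert(a->read_ptr < a->write_ptr);
    a->read_ptr++;
    acq_try_complete_stop(a);
}

/* Stop request: step 1 clears the start register, step 2 holds the
 * request until the pointers meet (immediately, if already drained). */
static void acq_request_stop(Acquisition *a)
{
    a->running = 0;
    a->stop_pending = 1;
    acq_try_complete_stop(a);
}
```

In this model the stop request completes on the same call that delivers the last outstanding packet, which is the condition described in the second step.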
> 1. If the application issues a ReadFile, the driver will have to lock the pages in physical memory
> before starting the DMA.
The driver just sets DO_DIRECT_IO, and the I/O manager does the rest automatically.
> Is it best to VirtualLock all the buffers inside the application during initialization so that there is no
> latency penalty during the ReadFile?
You can try, but IIRC VirtualLock is a recommendation and not a hard lock.
> But what to do with all the ReadFile issued by the application that will never be completed by the
> driver (because they have no PacketReady counterpart) ?