Completion Routine of WDF filter driver is sometimes not called

ColinFinck · August 9, 2021, 2:00pm

Hi all!

As Sysinternals PortMon does not work on 64-bit Windows, I’ve written an open-source upper filter driver to monitor reads and writes of legacy serial/parallel ports: https://github.com/enlyze/PortSniffer
To save me from repeating all the PnP boilerplate code (and generally follow this decade’s best practices), I’ve decided to write it as a WDF (KMDF) driver.

The driver seemed to work very well during my testing.
But I recently got a bug report about missing reads when using PuTTY and a serial loopback adapter (RX and TX lines directly connected to make an echo server): https://github.com/enlyze/PortSniffer/issues/4
Without my filter driver, I can type a character in PuTTY and it is instantly shown in PuTTY’s console thanks to the loopback adapter.
As soon as I activate my filter driver and monitor reads, the character is not shown in PuTTY’s console. Monitoring only writes still works well.
The difference is that I can monitor write requests directly, but for reads, I need to wait for the lower driver to call my Completion Routine when it has finished reading.

I’ve debugged that bug a bit and it turned out that the Completion Routine I set in https://github.com/enlyze/PortSniffer/blob/b3b9ed0520310faef5a3d727575dd01517daf8c4/src/driver/EnlyzePortSniffer.c#L807 is not called in this scenario. This happens independently of the used Windows version or USB-Serial adapter/driver (tried both FTDI and Prolific to rule out a bug in the lower driver).
I already used WinDbg to step through the PortSnifferFilterEvtIoRead routine of my driver and confirm that all its calls succeed, which is the case.
However, from the debug traces, I can conclude that PortSnifferFilterEvtIoRead is called while PortSnifferFilterEvtIoReadCompletionRoutine is not, despite being set via WdfRequestSetCompletionRoutine(Request, PortSnifferFilterEvtIoReadCompletionRoutine, filterContext).
I have also enabled KMDF Verifier and checked the WDF Log, with no new insights though.
Finally, I looked for other examples of KMDF filter drivers, but there is basically only Microsoft’s “toastmon” example which does the same as I’m doing: https://github.com/microsoft/Windows-driver-samples/blob/df47b2d284558fa9aacd19257153037e4ebba60e/general/toaster/toastDrv/kmdf/toastmon/toastmon.c#L940-L959

Do you have any idea why my Completion Routine is not called, or why this bug is only triggered by a serial loopback adapter?
Any particular things I could try out to further debug this issue?
Are there any other public examples of KMDF filter drivers I could check?

Best regards,

Colin Finck

ColinFinck · August 9, 2021, 3:55pm

I’ve tried a few more things and eventually changed WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&ioQueueConfig, WdfIoQueueDispatchSequential) to WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&ioQueueConfig, WdfIoQueueDispatchParallel).
This fixes the problem for me.

The question now is: Why does that fix the problem?
I understand that the serial loopback adapter always creates a write and a read request nearly simultaneously. I would have expected both requests to arrive at my filter driver though, no matter if requests from the I/O queue are dispatched sequentially or in parallel.

Neither https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wudfddi_types/ne-wudfddi_types-_wdf_io_queue_dispatch_type nor https://www.osr.com/using-counted-queues-in-wdf/ speak about any hard limitations when using sequentially dispatched I/O queues.

Doron_Holan · August 9, 2021, 4:19pm

Filters should operate under the premise of least surprise. In the case of I/O, that means by default the filter should not be changing the order, asynchronous state, or dependencies between incoming requests. Why? Because these could be a part of the I/O contract for the function driver, and applications can take dependencies on that contract (and unfortunately on the internal implementation). WdfFdoInitSetFilter gives you the correct filter behavior out of the box. Your addition of the sequential queue interfered with the I/O contract and the assumptions the app was making. The docs won’t go into this level of detail because they discuss general behavior, not the assumptions, contracts, and quirks of a specific device stack.

ColinFinck · August 10, 2021, 8:02am

I already do call WdfFdoInitSetFilter as the first thing when entering PortSnifferFilterEvtDeviceAdd: https://github.com/enlyze/PortSniffer/blob/b3b9ed0520310faef5a3d727575dd01517daf8c4/src/driver/EnlyzePortSniffer.c#L609

To register actual functions for filtering read and write requests, I need to initialize an I/O Queue myself though, and I need to decide between WdfIoQueueDispatchParallel and WdfIoQueueDispatchSequential there.
There is no “default” setting here.

Can I be sure that WdfIoQueueDispatchParallel will always cause least surprises for the underlying lower driver?

Tim_Roberts · August 10, 2021, 4:20pm

Here is an overgeneralized rule for which I will no doubt be criticized: You never need sequential dispatching. Your default choice should always be parallel. Sequential is too often used as a crutch. If you have a thick stack of poorly written legacy code that doesn’t worry about multithreading, then maybe it’s easier to choose sequential than to think about locking issues, but for new code and for filter code, parallel should be the first choice.

Peter_Viscarola_OSR · August 10, 2021, 5:29pm

overgeneralized …

Yes, I think over generalized is right. It is certainly NOT the tool to jump to for everything.

But… Sequential Dispatching can be very useful. We use it surprisingly frequently in our drivers.

The first use is for devices where doing one thing at a time, and finishing it before starting another thing, makes sense. You’ve got a thermocouple. There are commands to read the temp, to calibrate the thermocouple, to gather statistics, and whatever else similar. No sense handling these Requests in parallel. In an expanded use of this kind, you can have (for example) one read and one write in progress simultaneously with two Queues that use Sequential Dispatching. Very handy for many devices, with perhaps the addition of a third Queue for handling IOCTLs (either Sequential or Parallel).

The second use is for devices that can handle multiple, simultaneous, Requests and where the “preparation step” for putting a Request onto the device isn’t very significant. Maybe you fill-in some data in a shared-memory table or a set of device registers, and let the device “have at it.” In any case, while you’re waiting for the device to process the Request to completion (presumably signaled by an interrupt or something) you forward the pending Request(s) to a WDF Queue (with Manual Dispatching)… which, of course, triggers the release of another Request (if there is one) from the Queue with Sequential Dispatching. The use of Sequential Dispatching here makes the code simple and clean and with no, real, added disadvantage.

“What you see is a function of where you sit.” So, the types of drivers you write will dictate the types of facilities you wind-up using. I started my view of WDF thinking “Sequential bad, Parallel good” – and as I got more familiar with the use of Queues, I discovered I sort of liked Sequential dispatching.

Peter

ColinFinck · August 11, 2021, 11:58am

Thanks for the detailed answers!

Based on that, I will switch to WdfIoQueueDispatchParallel for the device I’m filtering, but continue to use WdfIoQueueDispatchSequential for the control device of my filter driver.
The control device is set exclusive via WdfDeviceInitSetExclusive anyway, and all critical paths are guarded via wait locks, but can’t hurt to have a third concurrency protection (can it?)

MBond2 · August 21, 2021, 11:22pm

A key point to observe is the sophistication of the device being controlled. Peter uses the example of a thermocouple, which is a very simple device. It might be a super complex, hyper accurate thermocouple built of gold, platinum, and diamonds, but from the point of view of the I/O system, it is simple and supports only sequential commands. A sequential queue is a good choice in this case because Microsoft has already written the synchronization logic that you would otherwise have to duplicate.

Contrast that device with an NVMe storage device. That device can process thousands of concurrent read and write operations in hardware. Certainly a sequential queue in software cannot possibly drive the full throughput of such a device.

If you don’t know any better, why apply a limit that could be terribly wrong? If the final device access will be sequential, someone will enforce that limitation at a lower level. But if it is not, then applying a limitation will at least reduce system performance and, as Doron points out, disrupt the actual function of the stack.