I’ve been getting reports of rare BSODs affecting our product. We queue APCs to trigger LoadLibrary() in processes, and the BSODs are all the result of an APC rundown function being invoked after our driver has been unloaded (during an upgrade of the product). In some cases, the rundown function is invoked 60 seconds or so after the unload has occurred.
I’ve spent the day reading through the ReactOS APC source code and then looking at the Windows APC functions in IDA. I’ve concluded that only about half of the APC APIs are exported. The cancellation functions like KeRemoveQueueApc() are not exported. Even flushing is not possible, because KeFlushQueueApc() is also not exported.
In the past, our product has been written to work around this by using an interlocked counter. When APCs are inserted onto the queue, the counter is incremented and when either the kernel or rundown functions are invoked, the counter is decremented. When the driver is unloaded, we wait for the counter to fall to zero before allowing the driver to unload. An arbitrary maximum of 10 seconds has been given to this wait.
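For anyone trying to picture the counter scheme, here is a minimal user-mode model of it in portable C11. It is a sketch, not our driver code: the names (g_outstanding_apcs, apc_queued, apc_completed, wait_for_apcs) are made up for illustration, C11 atomics stand in for the interlocked operations a driver would use, and the wait is expressed as a number of polls rather than a 10 second timer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the scheme: number of APCs queued but not yet
 * seen by either the kernel routine or the rundown routine. */
static atomic_long g_outstanding_apcs = 0;

/* Call when an APC is inserted onto the queue. */
static void apc_queued(void)
{
    atomic_fetch_add(&g_outstanding_apcs, 1);
}

/* Call from the kernel routine or the rundown routine of an APC. */
static void apc_completed(void)
{
    long previous = atomic_fetch_sub(&g_outstanding_apcs, 1);
    assert(previous > 0); /* a decrement without a matching increment is a bug */
}

/* Unload-time wait: poll the counter up to max_polls times, calling
 * sleep_one_interval (may be NULL) between polls. Returns true if the
 * counter reached zero, false if the wait gave up with APCs outstanding. */
static bool wait_for_apcs(unsigned max_polls, void (*sleep_one_interval)(void))
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (atomic_load(&g_outstanding_apcs) == 0)
            return true;
        if (sleep_one_interval != NULL)
            sleep_one_interval();
    }
    return atomic_load(&g_outstanding_apcs) == 0;
}
```

In this model, a false return from wait_for_apcs() corresponds to the wait expiring while APCs are still queued, which is the situation in the crash reports where the rundown function runs long after the unload.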
It’s been pointed out to me today that there are two problems with this: