Hello All:
Thanks for taking a look at my discussion. My problem seems like it would be pretty common, but I couldn't find any specific talks or text.
Scenario:
SW:
FPGA:
FpgaDriver:
General Problem:
My Solution Attempt 1 (failed):
My Solution Attempt 2 ( Pending response from this discussion ):
Thanks again for your help.
Regards,
Juan
Comments
Yes. If we step back a bit: the DPC watchdog is there to make sure that the system remains responsive. Normal DPCs come in at such a high priority, that they preempt the scheduler. In ugly cases, you can have more CPUs than runnable threads, yet the poor threads are stuck underneath a DPC so they just get starved, even though other CPUs are sitting idle.
If you have an unbounded amount of work to do, you cannot do it all in a DPC. And queuing an unbounded number of back-to-back DPCs is cheating; the system is still unresponsive.
If you don't care about 10 microseconds of latency, do all your work in a thread. Super easy. Nobody will get upset if your work is unbounded, and the scheduler knows what to do with threads.
In your ISR, queue a DPC that signals a KEVENT/KQUEUE/etc. that readies your thread (in case it isn't running already). The only downside of this approach is that there are a few extra microseconds of latency between ISR and thread, since the scheduler might have to context switch your thread if it was sleeping. Of course, if the workload is truly bottomless, even this latency doesn't matter, since the thread can just loop forever.
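The signal-then-drain shape described above can be sketched in user-mode C++. This is only a model, not driver code: std::condition_variable stands in for the KEVENT, signal_from_dpc stands in for the DPC, and every name here (Worker, signal_from_dpc, stop) is hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// User-mode stand-in for the driver state; all names are hypothetical.
struct Worker {
    std::mutex lock;
    std::condition_variable wake;   // plays the role of the KEVENT
    std::deque<int> fifo;           // plays the role of the hardware FIFO
    bool stopping = false;
    int processed = 0;
    std::thread thread;             // declared last so it starts last

    Worker() : thread([this] { run(); }) {}
    ~Worker() { if (thread.joinable()) stop(); }

    // What the DPC would do: record the new work and ready the thread.
    void signal_from_dpc(int item) {
        {
            std::lock_guard<std::mutex> g(lock);
            fifo.push_back(item);
        }
        wake.notify_one();
    }

    // Thread body: sleep until signaled, then drain everything available.
    void run() {
        std::unique_lock<std::mutex> g(lock);
        for (;;) {
            wake.wait(g, [this] { return stopping || !fifo.empty(); });
            while (!fifo.empty()) {
                fifo.pop_front();
                ++processed;
            }
            if (stopping) return;
        }
    }

    // Ask the thread to exit, wait for it, and report the items handled.
    int stop() {
        assert(thread.joinable());
        {
            std::lock_guard<std::mutex> g(lock);
            stopping = true;
        }
        wake.notify_one();
        thread.join();
        return processed;
    }
};
```

The thread only sleeps when the FIFO is empty, so a burst of signals costs one wake-up rather than one per item.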
If you really do care about that bit of latency, then you can try a hybrid approach: do the first few I/Os in the DPC, but fall back to a thread if it looks like you've spent more than a few milliseconds in the DPC. (There's the same bit of latency when you fall back to the thread, but this won't matter, since these I/Os were already delayed by the time your DPC spent processing the head of the queue.)
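A minimal sketch of that hybrid loop, again as a user-mode C++ model rather than kernel code. The should_yield callback is a hypothetical hook: in a driver it might compare timestamps against a budget or wrap a kernel yield hint. run_dpc_pass and DpcPassResult are likewise made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Outcome of one "DPC" pass: how much was handled inline, and whether
// the remainder must be handed to the worker thread.
struct DpcPassResult {
    std::size_t handled_in_dpc;
    bool fall_back_to_thread;
};

// Process queued I/Os until the queue is empty or should_yield() reports
// that the budget is spent. Here the hook is polled once per item; a real
// driver would more likely poll it on a coarser interval.
DpcPassResult run_dpc_pass(std::deque<int>& queue,
                           const std::function<bool()>& should_yield,
                           const std::function<void(int)>& process) {
    assert(should_yield && process);
    std::size_t handled = 0;
    while (!queue.empty()) {
        if (should_yield()) {
            return {handled, true};   // leave the rest for the thread
        }
        process(queue.front());
        queue.pop_front();
        ++handled;
    }
    return {handled, false};          // everything fit within the budget
}
```

When fall_back_to_thread is true, the caller would signal the worker thread and return from the DPC, leaving the unprocessed items in the queue.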
If you're running a newish kernel, KeShouldYieldProcessor can help you with that hybrid approach. Call it every millisecond or so; when it returns TRUE, you should fall back to a thread. This API has the very interesting property that it will actually suppress the DPC watchdog if the kernel can determine that your DPC isn't actually blocking any other work on the CPU. So if you use it properly, and if there's little else happening on the CPU, you can run your DPC for unbounded duration.
Note that DPCs don't run faster than threads; the CPU runs as fast as it does, regardless of IRQL. If the software can't keep up while running in a loop in a thread, then it won't somehow get faster while running in a DPC. (A low-priority thread can get pre-empted more than a DPC gets pre-empted, but if you crank the thread priority up to 15, then you'll only be pre-empted by roughly the same things that could delay your DPC.)
You do need to have a story for what to do when the software can't keep up with the hardware. You can build a backpressure system (if the FPGA detects the FIFO is full, it slows down); find a way to parallelize the work across more CPUs; simply drop I/Os; or buy a faster CPU.
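Two of those options, backpressure and dropping, can be illustrated with a bounded software queue. This is a user-mode C++ sketch with hypothetical names (BoundedFifo, full, dropped); how the "full" condition is actually reported back to the FPGA is device-specific and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Bounded queue that exposes a "full" flag (so a producer can slow down)
// and counts the items it had to drop. All names are hypothetical.
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false, and counts a drop, when the queue is already full.
    bool push(int item) {
        if (full()) {
            ++dropped_;
            return false;
        }
        items_.push_back(item);
        return true;
    }

    // Precondition: the queue is not empty.
    int pop() {
        assert(!items_.empty());
        int value = items_.front();
        items_.pop_front();
        return value;
    }

    bool empty() const { return items_.empty(); }
    bool full() const { return items_.size() >= capacity_; }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t capacity_;
    std::deque<int> items_;
    std::size_t dropped_ = 0;
};
```

Keeping a drop counter makes the overload visible in testing instead of letting it show up only as missing data.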
@Jeffrey_Tippet_[MSFT]:
Thank you for the great information!
Approach 2 (WorkItem) - Success:
I ended up having free time so I started work on my 2nd approach (using a WorkItem).
I ran into another issue, a WorkItem with a thread-consuming while-loop that caused a BSOD as well.
Approach 3 (Hybrid):
Thanks!
Juan
Hmmmm... JUST as an aside for Mr. Tippet: **KeShouldYieldProcessor** is (still) undocumented. The prototype is provided in the WDK, but the function remains undocumented.
Peter
Peter Viscarola
OSR
@OSRDrivers
Noted. I filed this bug to see if we can get this doc page written: https://github.com/MicrosoftDocs/windows-driver-docs/issues/1825
If your code is running "forever", it's better to have your own dedicated thread. Workitems are meant for short-lived tasks, where the cost of allocating & spinning up a thread would outweigh the work it does.
On older kernels, the kernel has a finite and fixed number of worker threads... something like 8 threads, if I recall correctly? So if 8 drivers try to spin up long-running workitems, it's really bad. Newer kernels (Win8+, I think) will dynamically grow the workitem pool if it appears to be starved. But in any case, you don't need to put the workitem pool into that awkward position: just allocate your own thread.
What is the other BSOD? If it's another DPC watchdog timeout (0x133), then it's because you raised to DISPATCH_LEVEL for too long. Taking a thread and raising to DISPATCH_LEVEL for unbounded duration is almost as bad for the system as having long-running DPCs, since it blocks thread scheduling and DPCs. DISPATCH_LEVEL should be reserved for short-lived, deterministic bursts of activity.
My apologies. I could/should have done that myself. I forget that we have this nice GitHub based bug system for the docs...
Peter
Peter Viscarola
OSR
@OSRDrivers
@Jeffrey_Tippet_[MSFT]
Sorry for the poor wording; the BSOD was with regard to the WorkItem running indefinitely.
Approach 4 (SystemThread):
I began working on this approach and made the changes to have the DPC create a new system thread. However, I have now run into a KERNEL_AUTO_BOOST_LOCK_ACQUISITION_WITH_RAISED_IRQL bugcheck.
So it must be something to do with the data I am accessing. The data access was OK in a WorkItem (PASSIVE_LEVEL) but no longer in a SystemThread( DISPATCH_LEVEL ).
{thinking out loud}:
For context - simplified code snippets below:
DPC:
extern "C" void EvtInterruptDpc(IN WDFINTERRUPT interrupt, IN WDFOBJECT object)
{
    WDFDEVICE device = WdfInterruptGetDevice(interrupt);
    PDEVICE_CONTEXT deviceContextP = GetDeviceContext(device);
    PsCreateSystemThread( &( deviceContextP->mySysThreadHandle ),
                          GENERIC_EXECUTE | 0xFFFF,
                          NULL, NULL, NULL,
                          FncSysThreadRoutine,
                          interrupt );
}
SysThread:
VOID FncSysThreadRoutine( IN PVOID context )
{
    WDFINTERRUPT interrupt = ( WDFINTERRUPT ) context;
    WDFDEVICE device = WdfInterruptGetDevice( interrupt );
    PDEVICE_CONTEXT pDevContext = GetDeviceContext( device );
    while( !pDevContext->myMetaDataFifo.isEmpty() )
    {
        pDevContext->myReadRequests.processNextReadReq( isReadReqEmpty );
        if( isReadReqEmpty )
        {
            // No read request pending: wait up to 5 ms for a new one.
            timeout.QuadPart = -50000; // 5 ms, relative (100 ns units; negative means relative)
            KeWaitForSingleObject( &(pDevContext->myNewReadReqEvt), Executive, KernelMode, TRUE, &timeout );
        }
    }
    pDevContext->InterruptRegister.writeRegister( FPGA_INTERRUPT_ENABLE_ADDR, ENABLE_ALL_INTERRUPTS );
}
//
Thanks again.
Juan
Update:
I tried to narrow down the offending call. I went as far as commenting out the body of FncSysThreadRoutine and simply letting it return, with a TraceEvent added to see whether the function was executed. Still a BSOD with the same error.
So I commented out the call to PsCreateSystemThread, and no more BSOD.
Update:
I ended up moving the PsCreateSystemThread call to EvtDeviceAdd. Then, using an event in the DPC, I signal the thread to run. I pass the device context into the thread, and from there I am able to get what I need to access the MetaDataFifo and ReadRequestCollection.
With my limited testing, things seem to be working.
Is this the correct approach: having the thread created at device-add time?
By the way, thread function is calling PsTerminateSystemThread before it exits.
Thanks to all.
Juan.
Did you think to check the documentation?
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-pscreatesystemthread
Tim Roberts
Software Wizard Emeritus
Sure, that can be fine.
That's fine.
Unfortunately this has a race. You need to check the fifo one more time after you enable interrupts. Otherwise, the fifo might get a new item after you've exited the loop, but before you've enabled interrupts.
If your device is always putting new items into the fifo, then you don't have to worry about the race, since maybe you'll miss an interrupt on some, but another will inevitably come along and get you the interrupt.
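The re-check after enabling interrupts can be sketched with a toy model in user-mode C++. FakeDevice, drain_and_rearm, and the before_enable hook are all hypothetical; the hook exists only so a test can make an item "arrive" inside the window between the last emptiness check and the enable.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Toy device: while interrupts are masked, a new FIFO entry raises no
// interrupt, so nothing would wake the thread for it.
struct FakeDevice {
    std::deque<int> fifo;
    bool interrupts_enabled = false;
};

// Drain the FIFO, re-enable interrupts, then look at the FIFO once more.
// Returns the number of items processed before it was safe to sleep.
int drain_and_rearm(FakeDevice& dev, const std::function<void()>& before_enable) {
    assert(!dev.interrupts_enabled);      // called with interrupts masked
    int processed = 0;
    for (;;) {
        while (!dev.fifo.empty()) {
            dev.fifo.pop_front();
            ++processed;
        }
        if (before_enable) before_enable();   // test hook: the race window
        dev.interrupts_enabled = true;
        if (dev.fifo.empty()) {
            return processed;             // nothing slipped in; safe to wait
        }
        dev.interrupts_enabled = false;   // an item slipped in: drain again
    }
}
```

Without the second emptiness check, the item injected by the hook would sit in the FIFO with no interrupt pending to announce it.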
Hello Jeffrey:
So what I ended up doing is moving the enable interrupts back up to the DPC.
The DPC will signal a NewMetaDataEvt event, enable the interrupts and exit.
When the FIFO is empty, the processing thread will wait for a NewMetaDataEvt event.
I'm not sure if the below is the "best" way of doing this, but it seems to be working (see below pseudo code).
Thank you, again, Jeffrey!
SysThread:
Regards,
Juan
In general, it looks good.
If I understand correctly, this means the FW generates an interrupt "only if the FIFO is empty before it inserts a new item". That should be fine.
There is a super rare, tiny theoretical race with setEvent( processThreadExitEvt ): as soon as you set the event, PNP can stop your device and unload your driver. If there's another few CPU instructions after that last setEvent, they could try to run after the driver is unloaded, which would crash. This is very unlikely, since it'd require just the perfect sequence of thread scheduling to happen. You could fix it by changing the driver unload code to wait on the thread itself instead of processThreadExitEvt. (Thread handles are waitable objects too, and enter the signaled state when the thread has finished exiting.)
You can eliminate some DPCs + fiddling with events if you suppress interrupts while you know the thread is actively doing work and not waiting. But this is a small optimization of CPU usage, which may not be worth the additional code complexity. It depends on your tradeoff between the cost of CPU usage versus the cost of having to maintain complicated code. It sounds like you already know this, and have chosen to optimize for readable code, which is a great choice.
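For the unload race described above, the user-mode analogue of "wait on the thread itself" is joining the thread rather than waiting on a flag the thread sets just before it returns. The sketch below is a C++ model with hypothetical names (ProcessingThread, stop_and_wait); in a driver the equivalent would presumably be a wait on the thread object obtained from the thread handle.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct ProcessingThread {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> fully_exited{false};
    std::thread thread;

    void start() {
        thread = std::thread([this] {
            while (!stop_requested.load()) {
                std::this_thread::yield();   // stand-in for real work
            }
            // Trailing instructions like this one are the problem case:
            // they still execute after any "I'm done" event has been set.
            fully_exited.store(true);
        });
    }

    // Unload path: request the stop, then wait on the thread itself.
    void stop_and_wait() {
        assert(thread.joinable());
        stop_requested.store(true);
        thread.join();   // returns only after the thread body has returned
    }
};
```

Because join() returns only once the thread function has finished, nothing from the thread can run after stop_and_wait() completes.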
You may be able to shave off a few CPU cycles from the waits by changing to a SynchronizationEvent. Then you can delete the clear( ) calls. This usage of KEVENTs effectively becomes a "condition variable", if you have encountered that term before.
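To show why the clear( ) calls become unnecessary, here is a user-mode C++ model of an auto-reset event, which is how a SynchronizationEvent behaves: a successful wait consumes the signal. AutoResetEvent and try_wait are hypothetical names for illustration, not kernel APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Model of an auto-reset event: set() makes it signaled, and the wait
// that observes the signal also resets it, so no explicit clear is needed.
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> g(lock_);
            signaled_ = true;          // repeated sets collapse into one
        }
        cv_.notify_one();
    }

    // Blocks until signaled, then consumes the signal.
    void wait() {
        std::unique_lock<std::mutex> g(lock_);
        cv_.wait(g, [this] { return signaled_; });
        signaled_ = false;
    }

    // Non-blocking variant: consumes the signal if one is present.
    bool try_wait() {
        std::lock_guard<std::mutex> g(lock_);
        if (!signaled_) return false;
        signaled_ = false;
        return true;
    }

private:
    std::mutex lock_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

Since several sets can collapse into a single wake-up, the waiter should always re-check the FIFO after waking rather than assume one item per signal.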