KMDF - Rescheduling a DPC or break DPC out into a WorkItem

OneKneeToe · October 16, 2019, 6:52pm

Hello All:

Thanks for taking a look at my discussion. My problem seems like it would be pretty common, but I couldn’t find any specific talks or text.

Scenario:
SW:

Allocates memory in chunks. Each chunk is given a Chunk ID.
SW then pushes ChunkIDs and Memory addresses into an FPGA FIFO (ChunkFifo).
SW then issues a FileRead on the FpgaDriver and waits for the read complete.

FPGA:

Pops a Chunk from the ChunkFifo.
DMAs data and generates meta-data associated with this chunk.
FPGA pushes the Chunk MetaData into a MetaDataFifo.
FPGA generates an interrupt, only if the MetaDataFifo is empty.

FpgaDriver:

It receives the file read requests and pushes them into a ReadRequestCollection.
It receives an interrupt. The ISR queues a DPC to handle the interrupt.
- The DPC checks whether the MetaDataFifo is empty.
  - If the MetaDataFifo is empty, exit DPC.
  - If the MetaDataFifo is not empty, check the ReadRequestCollection
  - If the ReadRequestCollection is empty, exit DPC.
  - If the ReadRequestCollection is not empty
    - Pop the MetaData from the FIFO.
    - Copy the MetaData into the ReadRequest buffer.
    - Complete the ReadRequest
    - Exit DPC.
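
For illustration only, the DPC flow listed above can be modelled in plain user-mode C++ (this is not driver code). The names `MetaData`, `ReadRequest`, `DpcResult`, and `dpcPass` are hypothetical, and the two `std::deque` objects merely stand in for the hardware MetaDataFifo and the ReadRequestCollection.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical stand-ins for the hardware FIFO entries and pending reads.
struct MetaData    { std::uint32_t chunkId; std::uint32_t byteCount; };
struct ReadRequest { MetaData* buffer; bool completed; };

enum class DpcResult { FifoEmpty, NoReadPending, CompletedOne };

// One pass of the DPC logic described above: pair at most one metadata
// entry with one pending read request, then return.
DpcResult dpcPass(std::deque<MetaData>& metaDataFifo,
                  std::deque<ReadRequest*>& readRequests)
{
    if (metaDataFifo.empty()) return DpcResult::FifoEmpty;
    if (readRequests.empty()) return DpcResult::NoReadPending;

    ReadRequest* request = readRequests.front();
    readRequests.pop_front();

    *request->buffer = metaDataFifo.front(); // copy metadata into the read buffer
    metaDataFifo.pop_front();                // pop the entry from the FIFO
    request->completed = true;               // stands in for completing the request
    return DpcResult::CompletedOne;
}
```

Note that `NoReadPending` and `CompletedOne` can both return while the FIFO still holds entries, which is exactly the state in which the FPGA raises no further interrupt.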

General Problem:

I will have no more Interrupts from the FPGA as long as the FIFO is not empty.
If I return from the DPC I won’t get another interrupt.
If I add a loop in the DPC (while FIFO not empty) I will run into a DPC Watchdog Timeout.

My Solution Attempt 1 (failed):

Process one MetaData-ReadRequest.
If the FIFO is not empty, re-queue the DPC - WdfInterruptQueueDpcForIsr(…) - and exit the DPC.
This approach works for a few seconds until I eventually get a DPC Watchdog Timeout.
- I am not entirely sure why.
- The DPCs are returning as there is nothing there to block the DPC.
- Is it because there are too many consecutive DPCs running back-to-back?

My Solution Attempt 2 ( Pending response from this discussion ):

I saw something about having the DPC schedule a WorkItem.
The DPC can return and the WorkItem can run as long as it needs as it can be pre-empted.
Let the WorkItem run in a while-loop until the FIFO is empty.
- This could be never if the FPGA is producing data faster than SW can read - currently the case.

Thanks again for your help.

Regards,
Juan

Jeffrey_Tippet_MSFT · October 16, 2019, 10:53pm

Is it because there are too many consecutive DPCs running back-to-back?

Yes. If we step back a bit: the DPC watchdog is there to make sure that the system remains responsive. Normal DPCs come in at such a high priority, that they preempt the scheduler. In ugly cases, you can have more CPUs than runnable threads, yet the poor threads are stuck underneath a DPC so they just get starved, even though other CPUs are sitting idle.

If you have an unbounded amount of work to do, you cannot do it all in a DPC. And queuing an unbounded number of back-to-back DPCs is cheating; the system is still unresponsive.

If you don’t care about 10 microseconds of latency, do all your work in a thread. Super easy. Nobody will get upset if your work is unbounded, and the scheduler knows what to do with threads.

In your ISR, queue a DPC that signals a KEVENT/KQUEUE/etc. that readies your thread (in case it isn’t running already). The only downside of this approach is that there are a few extra microseconds of latency between ISR and thread, since the scheduler might have to context switch your thread if it was sleeping. Of course, if the workload is truly bottomless, even this latency doesn’t matter, since the thread can just loop forever.

If you really do care about that bit of latency, then you can try a hybrid approach: do the first few I/Os in the DPC, but fall back to a thread if it looks like you’ve spent more than a few milliseconds in the DPC. (There’s the same bit of latency when you fall back to the thread, but this won’t matter, since these I/Os were already delayed by the time your DPC spent processing the head of the queue.)

If you’re running a newish kernel, KeShouldYieldProcessor can help you with that hybrid approach. Call it every millisecond or so; when it returns TRUE, you should fall back to a thread. This API has the very interesting property that it will actually suppress the DPC watchdog if the kernel can determine that your DPC isn’t actually blocking any other work on the CPU. So if you use it properly, and if there’s little else happening on the CPU, you can run your DPC for unbounded duration.
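
A rough user-mode C++ model of the hybrid approach, not driver code: process items until the queue drains, a time budget expires, or a yield check fires, then hand the remainder to a thread. The `shouldYield` callback is only a hypothetical stand-in for a kernel check such as KeShouldYieldProcessor (its real signature is not assumed here), and `handOff` stands in for signalling the worker thread. This sketch polls on every iteration for simplicity, whereas the advice above is to check roughly once per millisecond.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>

enum class DrainOutcome { Drained, HandedOff };

// Processes queued items until the queue is empty, the time budget is
// spent, or shouldYield() returns true. In the last two cases the
// remaining items are left in the queue and handOff() is called.
DrainOutcome drainWithBudget(std::deque<int>& work,
                             std::chrono::microseconds budget,
                             const std::function<bool()>& shouldYield,
                             const std::function<void()>& handOff,
                             std::size_t& processed)
{
    const auto deadline = std::chrono::steady_clock::now() + budget;
    while (!work.empty()) {
        if (shouldYield() || std::chrono::steady_clock::now() >= deadline) {
            handOff();                    // leave the rest for the thread
            return DrainOutcome::HandedOff;
        }
        work.pop_front();                 // "process" one item
        ++processed;
    }
    return DrainOutcome::Drained;
}
```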

This could be never if the FPGA is producing data faster than SW can read - currently the case.

Note that DPCs don’t run faster than threads; the CPU runs as fast as it does, regardless of IRQL. If the software can’t keep up while running in a loop in a thread, then it won’t somehow get faster while running in a DPC. (A low-priority thread can get pre-empted more than a DPC gets pre-empted, but if you crank the thread priority up to 15, then you’ll only be pre-empted by roughly the same things that could delay your DPC.)

You do need to have a story for what to do when the software can’t keep up with the hardware. You can build a backpressure system (if the FPGA detects the FIFO is full, it slows down); find a way to parallelize the work across more CPUs; simply drop I/Os; or buy a faster CPU.
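
As a small illustration of the backpressure and drop options, here is a user-mode C++ sketch of a bounded queue. `BoundedFifo` and its members are hypothetical names; in the real system the FIFO lives in the FPGA, so this only models the policy, not the hardware.

```cpp
#include <cstddef>
#include <deque>

// Bounded software queue: a hypothetical stand-in for a FIFO that can fill up.
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {}

    // Returns false when the queue is full. A producer that slows down on
    // false is applying backpressure; one that carries on is dropping items.
    bool tryPush(int item) {
        if (items_.size() >= capacity_) {
            ++rejected_;
            return false;
        }
        items_.push_back(item);
        return true;
    }

    // Removes the oldest item, if there is one.
    bool tryPop(int& item) {
        if (items_.empty()) return false;
        item = items_.front();
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }
    std::size_t rejected() const { return rejected_; } // pushes refused so far

private:
    std::size_t capacity_;
    std::size_t rejected_ = 0;
    std::deque<int> items_;
};
```

The `rejected()` counter gives a cheap way to measure how often the consumer falls behind before choosing between the options listed above.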

OneKneeToe · October 17, 2019, 3:18pm

@Jeffrey_Tippet_[MSFT]:

Thank you for the great information!

I will give the hybrid approach a try today.

Approach 2 (WorkItem) - Success:
I ended up having free time so I started work on my 2nd approach (using a WorkItem).

The WorkItem will loop while the FIFO is not empty.

I ran into another issue, a WorkItem with a thread-consuming while-loop that caused a BSOD as well.

I used a Condition-Variable ( KEvent ) to signal when a new read request has been added to the collection.
Now, if there are no more read requests, the WorkItem will KeWaitForSingleObject for the KEvent.
If the MetaDataFifo is empty, the WorkItem will exit.

Approach 3 (Hybrid):

Thank you, again, Jeffrey for the response.
Do some work in the DPC for N ms (~10 ms).
- Use KeShouldYieldProcessor as a safeguard, in case I need to exit the DPC prior to N ms.
- Before exiting DPC, if FIFO is not empty, handover work (design decision needed):
  - To a WorkItem if MetaDataFIFO will be empty more often than full.
  - To a SystemThread if MetaDataFIFO will rarely be empty.
    - If this is the case, better to simplify the DPC and just hand-over all the work to a thread.

Thanks!
Juan

Peter_Viscarola_OSR · October 17, 2019, 4:41pm

Hmmmm… JUST as an aside for Mr. Tippet: **KeShouldYieldProcessor** is (still) undocumented. The prototype is provided in the WDK, but the function remains undocumented.

Peter

Jeffrey_Tippet_MSFT · October 17, 2019, 6:18pm

@“Peter_Viscarola_(OSR)” said:
Hmmmm… JUST as an aside for Mr. Tippet: **KeShouldYieldProcessor** is (still) undocumented. The prototype is provided in the WDK, but the function remains undocumented.

Noted. I filed this bug to see if we can get this doc page written: https://github.com/MicrosoftDocs/windows-driver-docs/issues/1825

Jeffrey_Tippet_MSFT · October 17, 2019, 6:24pm

I ran into another issue, a WorkItem with a thread-consuming while-loop that caused a BSOD as well.

If your code is running “forever”, it’s better to have your own dedicated thread. Workitems are meant for short-lived tasks, where the cost of allocating & spinning up a thread would outweigh the work it does.

On older kernels, the kernel has a finite and fixed number of worker threads… something like 8 threads, if I recall correctly? So if 8 drivers try to spin up long-running workitems, it’s really bad. Newer kernels (Win8+, I think) will dynamically grow the workitem pool if it appears to be starved. But in any case, you don’t need to put the workitem pool into that awkward position: just allocate your own thread.

What is the other BSOD? If it’s another DPC watchdog timeout (0x133), then it’s because you raised to DISPATCH_LEVEL for too long. Taking a thread and raising to DISPATCH_LEVEL for unbounded duration is almost as bad for the system as having long-running DPCs, since it blocks thread scheduling and DPCs. DISPATCH_LEVEL should be reserved for short-lived, deterministic bursts of activity.

Peter_Viscarola_OSR · October 17, 2019, 6:36pm

I filed this bug to see if we can get this doc page written

My apologies. I could/should have done that myself. I forget that we have this nice GitHub based bug system for the docs…

Peter

OneKneeToe · October 17, 2019, 10:05pm

@Jeffrey_Tippet_[MSFT]

What is the other BSOD?
Sorry for the poor wording, the BSOD was with regard to the WorkItem running indefinitely.

Approach 4 (SystemThread):
I began working on this approach and made the changes to have the DPC create a new system thread. However, now, I have run into a KERNEL_AUTO_BOOST_LOCK_ACQUISITION_WITH_RAISED_IRQL.

So it must be something to do with the data I am accessing. The data access was OK in a WorkItem (PASSIVE_LEVEL) but no longer in a SystemThread( DISPATCH_LEVEL ).

{thinking out loud}:

How then can I handover this work to a DISPATCH thread?
What data is accessible at DISPATCH?
Is the problem with the interrupt or device objects?
- What if I pass in pointers to the MetaDataFifo and myReadRequests objects instead?

For context - simplified code snippets below:

DPC:
extern "C" void EvtInterruptDpc(IN WDFINTERRUPT interrupt, IN WDFOBJECT object)
{
    WDFDEVICE device = WdfInterruptGetDevice(interrupt);
    PDEVICE_CONTEXT deviceContextP = GetDeviceContext(device);
    PsCreateSystemThread( &( deviceContextP->mySysThreadHandle ),
                          GENERIC_EXECUTE | 0xFFFF,
                          NULL, NULL, NULL,
                          FncSysThreadRoutine,
                          interrupt );
}

SysThread:
VOID FncSysThreadRoutine( IN PVOID context )
{
    WDFINTERRUPT interrupt = ( WDFINTERRUPT ) context;
    WDFDEVICE device = WdfInterruptGetDevice( interrupt );
    PDEVICE_CONTEXT pDevContext = GetDeviceContext( device );
    bool isReadReqEmpty = false;
    LARGE_INTEGER timeout;
    timeout.QuadPart = -50000LL; // relative timeout: 5 ms in 100 ns units

    while( !pDevContext->myMetaDataFifo.isEmpty() )
    {
        pDevContext->myReadRequests.processNextReadReq( isReadReqEmpty );
        if( isReadReqEmpty )
        {
            // No pending read: wait (up to 5 ms) for a new read request.
            KeWaitForSingleObject( &(pDevContext->myNewReadReqEvt), Executive, KernelMode, TRUE, &timeout );
        }
    }
    pDevContext->InterruptRegister.writeRegister( FPGA_INTERRUPT_ENABLE_ADDR, ENABLE_ALL_INTERRUPTS );
}


Thanks again.
Juan

OneKneeToe · October 17, 2019, 11:05pm

Update:

I tried to narrow down the offending call. I went as far as to comment out the body of FncSysThreadRoutine and simply allow it to return - Added a TraceEvent to see if the function was executed - still BSOD with same error.

So I commented out the call to PsCreateSystemThread and no more BSOD.

Why can’t my DPC create a system thread?

OneKneeToe · October 18, 2019, 2:18am

Update:

I ended up moving the PsCreateSystemThread call to EvtDeviceAdd. Then, using an event in the DPC, I signal the thread to run. I pass the device context into the thread, and from there I am able to get what I need to access the MetaDataFifo and ReadRequestCollection.
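
The shape of that design (thread created once up front, then woken by a signal) can be modelled in user-mode C++. This is an analogy, not driver code: the constructor plays the role of creating the thread at device-add time, `post()` plays the role of the DPC signalling the event, and a `std::condition_variable` stands in for the KEVENT. `FifoWorker` and its members are hypothetical names.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class FifoWorker {
public:
    // The thread is created once, up front, not from the "DPC" path.
    FifoWorker() : thread_([this] { run(); }) {}
    ~FifoWorker() { stop(); }

    // "DPC" side: record new work and wake the worker, never block for long.
    void post(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            fifo_.push_back(item);
        }
        wake_.notify_one();
    }

    // Asks the worker to finish the queued items and exit, then waits for it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    std::vector<int> processed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return processed_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until there is work or a stop request.
            wake_.wait(lock, [this] { return stopping_ || !fifo_.empty(); });
            if (fifo_.empty()) return;    // stop requested and queue drained
            processed_.push_back(fifo_.front());
            fifo_.pop_front();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<int> fifo_;
    std::vector<int> processed_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};
```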

With my limited testing, things seem to be working.

Is this the correct approach: having the thread created at device-add time?

By the way, thread function is calling PsTerminateSystemThread before it exits.

Thanks to all.
Juan.

Tim_Roberts · October 18, 2019, 6:27am

So I commented out the call to PsCreateSystemThread and no more BSOD.
Why can’t my DPC create a system thread?

Did you think to check the documentation?

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-pscreatesystemthread

IRQL PASSIVE_LEVEL

Jeffrey_Tippet_MSFT · October 18, 2019, 6:01pm

Is this the correct approach: having the thread created at device-add time?

Sure, that can be fine.

By the way, thread function is calling PsTerminateSystemThread before it exits.

That’s fine.

while ( fifo has any items ) {
. . . do stuff . . .;
}
writeRegister( ENABLE_ALL_INTERRUPTS );

Unfortunately this has a race. You need to check the fifo one more time after you enable interrupts. Otherwise, the fifo might get a new item after you’ve exited the loop, but before you’ve enabled interrupts.

If your device is always putting new items into the fifo, then you don’t have to worry about the race: you may miss the interrupt for some items, but another item will inevitably come along and raise one.
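
The re-check can be sketched with a single-threaded user-mode C++ model. `FakeDevice`, `hardwarePush`, and `drainAndRearm` are hypothetical names, and the `raceWindow` callback is a test hook that simulates the hardware adding an item between the end of the drain loop and the interrupt enable.

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical model of the device: an arriving item raises an interrupt
// only if interrupts are enabled at that moment.
struct FakeDevice {
    std::deque<int> fifo;
    bool interruptsEnabled = false;
    int interruptsRaised = 0;

    void hardwarePush(int item) {
        fifo.push_back(item);
        if (interruptsEnabled) ++interruptsRaised;
    }
};

// Drains the FIFO, enables interrupts, then checks the FIFO one more time.
// Returns the number of items handled.
std::size_t drainAndRearm(FakeDevice& dev, const std::function<void()>& raceWindow)
{
    std::size_t handled = 0;
    for (;;) {
        while (!dev.fifo.empty()) {
            dev.fifo.pop_front();         // "do stuff" with one item
            ++handled;
        }
        raceWindow();                     // an item may land here, unannounced
        dev.interruptsEnabled = true;
        if (dev.fifo.empty()) return handled; // nothing slipped in: safe to sleep
        dev.interruptsEnabled = false;    // something slipped in: drain again
    }
}
```

Without the final check, an item pushed inside the race window would sit in the FIFO with no interrupt to announce it.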

OneKneeToe · October 22, 2019, 11:17pm

@“Jeffrey_Tippet_[MSFT]” said:

while ( fifo has any items ) {
. . . do stuff . . .;
}
writeRegister( ENABLE_ALL_INTERRUPTS );

Unfortunately this has a race. You need to check the fifo one more time after you enable interrupts. Otherwise, the fifo might get a new item after you’ve exited the loop, but before you’ve enabled interrupts.

Hello Jeffrey:

So what I ended up doing is moving the enable interrupts back up to the DPC.

The ISR disables the interrupts, clears the interrupt, and finally schedules the DPC and exits.
The DPC will signal a NewMetaDataEvt event, enable the interrupts and exit.
- Since the FW will not generate another interrupt unless the MetaDataFifo is empty, I should be OK.
When the FIFO is empty, the processing thread will wait for a NewMetaDataEvt event.
I also added an exit path, so when the device is closed, the processing thread can exit.

I’m not sure if the below is the “best” way of doing this, but it seems to be working (see below pseudo code).

Thank you, again, Jeffrey!

SysThread:

VOID FncSysThreadRoutine( IN PVOID context )
{
    // keepRunning changes at device close time.
    while( pDevContext->keepRunning )
    {
        // while-loop as a double check.
        while( metaDataFifo is empty )
        {
            waitForSingle( newMetaDataEvt )
            clear( newMetaDataEvt )
        } // while FIFO

        // Have meta data, now check if we have read requests.
        if( readRequestCollection is empty )
        {
            waitForSingle( newReadReqEvt )
            clear( newReadReqEvt )
        }

        // Have meta data and a read request.
        // Copy meta data and complete the request.

    } // while keep running

    // Tell device close that we are exiting.
    setEvent( systemThreadExitEvt );

} // FncSysThreadRoutine

Regards,
Juan

OneKneeToe · October 22, 2019, 11:31pm

@“Jeffrey_Tippet_[MSFT]” said:

while ( fifo has any items ) {
. . . do stuff . . .;
}
writeRegister( ENABLE_ALL_INTERRUPTS );

Unfortunately this has a race. You need to check the fifo one more time after you enable interrupts. Otherwise, the fifo might get a new item after you’ve exited the loop, but before you’ve enabled interrupts.

Hello Jeffrey (re-posting, previous post was deleted??)

So what I ended up doing is moving the enable interrupts back up to the DPC.

The ISR will disable interrupts, clear the interrupt, schedule the DPC and exit.
The DPC will signal a newMetaDataEvt, enable interrupts, and exit.
- Since the FW will only generate an interrupt if the FIFO is empty, I should be OK leaving the interrupt enabled.
The processing thread will then wait for the newMetaDataEvt when it detects the FIFO is empty.

I don’t know if the below approach is the best, but it seems to be working…

Thank you, again, Jeffrey.

SysThread:

VOID FncSysThreadRoutine( IN PVOID context )
{
    // keepRunning updated by device close.
    while( pDeviceContext->keepRunning )
    {
        // While-loop as a double check.
        while( MetaDataFIFO is empty )
        {
            waitOnSingle( newMetaDataEvt );
            clear( newMetaDataEvt );
        } // while metaDataFIFO

        // We have meta data, now let's check if we have a read request.
        if( readRequestCollection is empty )
        {
            waitOnSingle( newReadRequest );
            clear( newReadRequest );
        } // if read empty

        // Have meta data and a read request.
        // Copy the meta data and complete the read request.

    } // while keepRunning

    // Let the device close know we're exiting.
    setEvent( processThreadExitEvt );
} // FncSysThreadRoutine

Regards,
Juan

Jeffrey_Tippet_MSFT · October 23, 2019, 12:42am

In general, it looks good.

Since the FW will only generate an interrupt if the FIFO is empty, I should be OK leaving the interrupt enabled.

If I understand correctly, this means the FW generates an interrupt “only if the FIFO is empty before it inserts a new item”. That should be fine.

There is a super rare, tiny theoretical race with setEvent( processThreadExitEvt ): as soon as you set the event, PNP can stop your device and unload your driver. If there’s another few CPU instructions after that last setEvent, they could try to run after the driver is unloaded, which would crash. This is very unlikely, since it’d require just the perfect sequence of thread scheduling to happen. You could fix it by changing the driver unload code to wait on the thread itself instead of processThreadExitEvt. (Thread handles are waitable objects too, and enter the signaled state when the thread has finished exiting.)
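
A user-mode C++ analogy of waiting on the thread itself: `std::thread::join` returns only after the thread function has completely returned, so no trailing instructions can still be running. In a driver, the usual equivalent is to obtain the thread object (for example with ObReferenceObjectByHandle) and pass it to KeWaitForSingleObject; that detail is an assumption here, since the thread does not show it. `ProcessingThread` is a hypothetical name, and the sleep loop only stands in for the real wait-and-process loop.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

class ProcessingThread {
public:
    ProcessingThread() : thread_([this] { run(); }) {}
    ~ProcessingThread() { shutdown(); }

    // Requests exit, then waits on the thread itself rather than on an
    // "I am exiting" event that the thread sets just before it returns.
    void shutdown() {
        keepRunning_.store(false);
        if (thread_.joinable()) thread_.join(); // returns after run() has returned
    }

    bool finished() const { return finished_.load(); }

private:
    void run() {
        while (keepRunning_.load()) {
            // Placeholder for the real work loop.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        finished_.store(true);  // last statement of the thread body
    }

    std::atomic<bool> keepRunning_{true};
    std::atomic<bool> finished_{false};
    std::thread thread_;  // declared last so the flags exist before it starts
};
```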

You can eliminate some DPCs + fiddling with events if you suppress interrupts while you know the thread is actively doing work and not waiting. But this is a small optimization of CPU usage, which may not be worth the additional code complexity. It depends on your tradeoff between the cost of CPU usage versus the cost of having to maintain complicated code. It sounds like you already know this, and have chosen to optimize for readable code, which is a great choice.

You may be able to shave off a few CPU cycles from the waits by changing to a SynchronizationEvent. Then you can delete the clear( ) calls. This usage of KEVENTs effectively becomes a “condition variable”, if you have encountered that term before.
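
To show why the clear( ) calls become unnecessary, here is a user-mode C++ model of an auto-reset event, the behaviour a SynchronizationEvent provides: a successful wait consumes the signal. `AutoResetEvent` and `tryWait` are hypothetical names; this is an analogy built on `std::condition_variable`, not the kernel implementation.

```cpp
#include <condition_variable>
#include <mutex>

// Auto-reset event: set() leaves it signaled until exactly one wait
// consumes the signal, so no separate clear step is needed.
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;             // repeated sets coalesce into one
        }
        cv_.notify_one();
    }

    // Blocks until signaled, then resets the event before returning.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }

    // Non-blocking variant: consumes the signal if present.
    bool tryWait() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!signaled_) return false;
        signaled_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

Because several sets can collapse into one wake-up, the waiter should still re-check its real condition (FIFO empty, no read request) in a loop, as the outer while-loop in the pseudocode above already does.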