Win10: Interrupt miss in application

zvivered · October 21, 2020, 7:55am

Hello,

OS: Windows 10 IoT LTSC
My FPGA generates an MSI interrupt every 20 ms.
The application sends an IOCTL request to the driver.
In the driver, this request is forwarded to an internal queue (WdfRequestForwardToIoQueue).
On each interrupt, the interrupt handler's DPC pulls the request from the internal queue and calls WdfRequestCompleteWithInformation to complete the IOCTL back to the application.
The reply contains an interrupt counter.

The above mechanism works OK for some time (e.g. 30 minutes), and then the application reports an interrupt-miss error: according to the counter, the application missed an interrupt.

There are 2 other processes that work with this driver at the same time.
Each of them also waits for an interrupt, but each of the 3 applications waits for a different interrupt.
Can this cause the problem ?

Is there an alternative to the above mechanism so there will be no interrupt miss ?

Thank you,
Zvika

Peter_Viscarola_OSR · October 21, 2020, 12:55pm

Haven’t we already had this discussion, Mr. @zvivered ?

HOW is the interrupt being “missed”? Is there not an IOCTL from the app available in the DPC?

How many IOCTLs is the app sending? It should send a bunch of them, so that the driver always has one available in the DPC when it needs one.

Peter

zvivered · October 21, 2020, 7:12pm

Hi Peter, All,

Thank you for your reply.
Currently: the application sends **one** IOCTL, which blocks until an interrupt (or timeout).

You suggest: during application initialization, the application sends **N** IOCTL requests and does not wait for a reply.
Then, in the forever while loop, the application sends **one** request and waits for the reply.
Did I understand correctly ?

Best regards,
Zvika

Peter_Viscarola_OSR · October 21, 2020, 7:25pm

I’m not sure if you understood correctly, because I don’t have your code in front of me.

1. You open a handle for what Windows calls “overlapped I/O”.
2. You send N IOCTLs, N > 1, and do not wait.
3. You receive IOCTL completions, using whatever method you choose.
4. Each time you get one IOCTL completion, before processing that completed IOCTL, you send back an IOCTL to the driver, and do not wait.
5. You process the IOCTL completion.
6. Goto 3.
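
The steps above boil down to one invariant: the driver should never reach its DPC with an empty queue. A minimal, portable C model of that bookkeeping follows. It is not Win32 code; `simulate_misses`, the one-interrupt-per-tick timing, and the fixed stall position are all assumptions made for illustration. It shows how the number of requests kept in flight decides whether an application stall turns into missed interrupts.

```c
/* Model: one interrupt per tick. The "driver" completes one pending request
 * per interrupt if it has one; the "app" resubmits one request for every
 * completion it reaps, except while it is stalled (e.g. descheduled).
 * Returns the number of interrupts that found no pending request. */
unsigned simulate_misses(unsigned n_pending, unsigned stall_len,
                         unsigned total_ticks)
{
    const unsigned stall_start = 10;   /* arbitrary tick where the app stalls */
    unsigned pending = n_pending;      /* step 2: N requests sent up front */
    unsigned completed = 0;            /* completions not yet reaped by the app */
    unsigned missed = 0;

    for (unsigned t = 0; t < total_ticks; t++) {
        /* Driver side: the interrupt fires. */
        if (pending > 0) {
            pending--;
            completed++;
        } else {
            missed++;                  /* nothing to complete: a "miss" */
        }

        /* App side: reap completions unless stalled during this tick. */
        int stalled = (t >= stall_start && t < stall_start + stall_len);
        if (!stalled) {
            while (completed > 0) {
                pending++;             /* step 4: resubmit first */
                completed--;           /* step 5: then process */
            }
        }
    }
    return missed;
}
```

Under this model, `simulate_misses(1, 3, 100)` reports 3 misses while `simulate_misses(4, 3, 100)` reports none: a stall lasting S interrupt periods needs at least S + 1 requests in flight.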

We HAVE covered this with you multiple times in the past… dating back as far as 2014.

Peter

zvivered · October 21, 2020, 9:28pm

Hi Peter,

Your help is highly appreciated.

Best regards,
Zvika

zvivered · December 22, 2020, 8:16am

Hello Peter, All,

In the attached code I implemented the advice I got here, but I still miss interrupts.

Application

Open (Gwbr.cpp, line 20)
call WaitForDwellEvent (Gwbr.cpp, line 77) 10 times with timeout = 0.

while (1)
{
    WaitForDwellEvent (INFINITE, &Counter);
    //Check Counter
    WaitForDwellEvent (0, &Temp); //Send IOCTL - Do not wait
}

Driver
Init.c: allocate 7 manual queues.

Control.c: handle IOCTL requests. Line 119
Handle IOCTL sent in the above WaitForDwellEvent

IsrDpc.c: Handle 7 MSI interrupts. Line 194
If an interrupt is active, the driver pulls a message from the relevant queue and completes the request.

I hope it’s not too much to ask.
Can you please inspect the code and tell me what I am doing wrong?

Thank you in advance,
Zvika

Peter_Viscarola_OSR · December 22, 2020, 12:47pm

> I hope it’s not too much to ask.

With all due respect, it IS too much to ask from me.

A proper code review is time consuming, complex, and expensive. For me to take a brief look, and toss some random comments to you based on general impressions gained while eating breakfast, would risk my not giving you the right answers… which is worse than giving you no answers at all. I know this from experience.

Perhaps some of our other colleagues here will be able to help better than I can.

Peter

Tim_Roberts · December 23, 2020, 7:43am

I can see one flaw right off the bat. Every WDFREQUEST you get must either be completed or forwarded to another I/O queue. The “default” branch in your ioctl handler doesn’t do that. In most samples, you don’t complete the request in each switch case, you just set “status” and “information” and complete the request after the switch. The “default” case can then just “break”, and the STATUS_INVALID_DEVICE_REQUEST will happen automatically. Your case that forwards to an I/O queue would then “return” instead of “break”.
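
The shape Tim describes can be sketched in portable C. This is a model only: `struct request`, `complete_request`, `forward_to_queue`, the IOCTL codes, and the status values are hypothetical stand-ins for the WDF request object and calls, not the poster's code or the real API.

```c
/* Hypothetical stand-ins for IOCTL codes and NTSTATUS values. */
enum { IOCTL_READ_REG = 1, IOCTL_WAIT_DWELL = 2 };
enum { ST_SUCCESS = 0, ST_INVALID_DEVICE_REQUEST = -1 };

/* Hypothetical request object that records what happened to it. */
struct request { int completed; int forwarded; int status; };

static void complete_request(struct request *r, int status)
{
    r->completed++;
    r->status = status;
}

static int forward_to_queue(struct request *r)
{
    r->forwarded++;
    return ST_SUCCESS;   /* the stub always succeeds */
}

/* Every path either forwards the request or completes it exactly once. */
void on_device_control(struct request *r, int code)
{
    int status = ST_INVALID_DEVICE_REQUEST;   /* default for unknown codes */

    switch (code) {
    case IOCTL_READ_REG:
        status = ST_SUCCESS;   /* do the work, set status, fall out */
        break;
    case IOCTL_WAIT_DWELL:
        status = forward_to_queue(r);
        if (status == ST_SUCCESS) {
            return;            /* the queue owns the request; do not complete */
        }
        break;                 /* forwarding failed: complete with that error */
    default:
        break;                 /* status is already "invalid device request" */
    }
    complete_request(r, status);   /* single completion point */
}
```

With a single completion point after the switch, an unrecognized IOCTL code cannot leave a request hanging.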

You’re also not checking for errors in the WdfRequestRetrieveXxx calls, nor are you checking that the BAR offsets are within range. That’s extremely dangerous. You should never trust the user-mode app.
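
For the BAR range check, a minimal sketch of an overflow-safe test is shown here. The name `bar_access_ok` and its parameters are assumptions for illustration; the actual BAR sizes and any alignment rules depend on the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true only if [offset, offset + length) lies entirely inside a BAR
 * of bar_size bytes. Written so that offset + length is never computed,
 * which avoids wraparound when user mode passes huge values. */
bool bar_access_ok(uint64_t offset, uint64_t length, uint64_t bar_size)
{
    if (length == 0 || length > bar_size) {
        return false;
    }
    return offset <= bar_size - length;
}
```

A driver would call a check like this on every user-supplied offset before touching the mapped registers, and fail the request otherwise.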

Tim_Roberts · December 23, 2020, 7:55am

Without seeing source code for SendDeviceIoctl, there’s little we can do. Nothing in the code you posted actually sends an ioctl or waits for a result.

You still haven’t described how you know interrupts are being “missed”. I assume you’re getting them all in kernel mode, so is it your user-mode accounting that comes up short?

MBond2 · December 24, 2020, 12:27am

this code does not make sense

while (1)
{
    WaitForDwellEvent (INFINITE, &Counter);
    //Check Counter
    WaitForDwellEvent (0, &Temp); //Send IOCTL - Do not wait
}

looking at nothing else in your code, this does not follow any valid pattern.

for overlapped IO, there are two general patterns - top down or bottom up. The ‘top down’ pattern is the classic one where the application determines what to do and the kernel does it and tells you when it's done. A typical example is writing to a file. The timing of when a certain activity should start is entirely under the control of the UM process.

the ‘bottom up’ pattern is where the what to do is still controlled by the UM application, but the when to do it is under the control of the KM component. either within itself, or in response to some hardware or network event.

with the top down pattern, IO is initiated from UM when work needs to be done. you do not expect any special queueing on the driver side because the driver’s job is to do the work ‘now’. This loop cannot implement that pattern.

with the bottom up pattern, IO is initiated from UM in advance of when the work is to be done by the driver. in this pattern the driver is expected to queue and pend multiple IOCTLs from UM, and then complete them ‘later’. This loop also does not implement this pattern, as further IOCTLs submitted from UM to KM should happen in the completion handler and not be in any kind of loop.

but in my shop, while(1) would be a gigantic red flag. We would expect while(true) and proper code formatting (a matter of opinion).

reading between the lines, I am almost sure that when you say that events are being ‘missed’, they happen during times when your driver does not have a pending IOCTL to complete, because this loop cannot ever send enough.

zvivered · December 26, 2020, 6:41pm

Hello Tim, MBond2, All,

The attached Gwbr.cpp.txt also contains the code for SendDeviceIoctl.
WaitForDwellEvent sends an IOCTL to the driver, which is handled in Control.c.txt (line 119).
The IOCTL is forwarded to a queue allocated only for this IOCTL code, and the request is completed in IsrDpc.c.txt (line 210).

In IsrDpc.c.txt, upon getting a “dwell” interrupt, a request is pulled from this queue (line 200) and completed.

The tip I got in another thread (https://community.osr.com/discussion/292320/calling-deviceiocontrol-from-multiply-threads#latest) was to send a few IOCTLs without waiting during initialization.
Each IOCTL then waits forever until a “dwell” interrupt is received. Only then is one more IOCTL sent without waiting.

The method for checking a missed interrupt in the application:
Upon “dwell” interrupt, a counter is incremented. This counter is sent back to the application upon an interrupt.
The reply is built in IsrDpc.c.txt (line 207)
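
The counter check on the application side can be sketched as follows. This assumes a 32-bit counter that increments by one per interrupt and that replies are examined in the order the driver produced them. The name `missed_between` is hypothetical.

```c
#include <stdint.h>

/* Number of counter values skipped between two consecutive replies.
 * Unsigned arithmetic makes this correct across 32-bit wraparound. */
uint32_t missed_between(uint32_t prev, uint32_t cur)
{
    return cur - prev - 1u;
}
```

For example, replies carrying 5 and then 8 mean two interrupts (6 and 7) were never reported to the application.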

Thank you,
Zvika

MBond2 · December 26, 2020, 11:49pm

I guess we can infer that you have fixed your problem. we can also infer that your pattern is the ‘bottom up’ pattern and you have correctly identified that pending multiple IRPs allows your driver to do something useful when these interrupts happen during times when your previous solution did not have any pending IO.

next you need to consider how many pending IRPs you should have. This is a problem that you likely cannot exactly solve, but it depends on how important it is to ‘never’ miss an event and what happens when the system is under load and the UM components cannot process completions and send new IRPs for a ‘long’ time
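
A back-of-the-envelope way to size the number of pending requests, assuming strictly periodic interrupts and a single worst-case stretch during which the application submits nothing. Both are simplifications, and `min_pending_requests` and the stall figures are hypothetical.

```c
/* Smallest number of requests to keep in flight so that an application
 * stall of worst_stall_ms does not leave the driver's queue empty, given
 * one interrupt every period_ms. Returns 0 for an invalid period. */
unsigned min_pending_requests(unsigned period_ms, unsigned worst_stall_ms)
{
    if (period_ms == 0) {
        return 0;
    }
    /* ceil(stall / period) interrupts arrive during the stall, plus the one
     * that may already be on its way when the stall begins. */
    return (worst_stall_ms + period_ms - 1) / period_ms + 1;
}
```

With the 20 ms period from the original post, 10 pending requests cover a stall of up to 180 ms under this model, and a 500 ms stall would need 26.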

Tim_Roberts · December 27, 2020, 12:06am

An interrupt-driven driver always needs to be able to handle the case where a new interrupt arrives before the previous interrupt has finished processing, especially if interrupts arrive quickly. Are you handling that case? You can’t assume a nice linear progression from ISR to DPC to user-mode.

Peter_Viscarola_OSR · December 27, 2020, 7:40pm

IOW, you are not guaranteed a 1:1 correspondence between calling WdfInterruptQueueDpcForIsr and calls to your DPC. Not to mention, your DPC can run in parallel on multiple CPUs simultaneously.

Peter

zvivered · December 29, 2020, 9:58am

Hello Tim, MBond2, Peter, All,

Thank you very much for your replies.

Can you please suggest a mechanism (or sample code) to handle the case in which a new interrupt arrives before the previous interrupt has finished processing?
How should I handle a non-1:1 correspondence between calls to WdfInterruptQueueDpcForIsr and calls to my DPC?

Best regards,
Zvika

Peter_Viscarola_OSR · December 29, 2020, 4:06pm

> How should I handle a non-1:1 correspondence between calls to WdfInterruptQueueDpcForIsr and calls to my DPC?

I get this question almost every time I teach our WDF seminar. And I frankly never know how to answer it. The answer is: “You write some code. That is, you write whatever code you need to write to handle this situation, in your driver, based on your hardware. I can’t tell you what that is.”

This code might be something as simple as keeping a count of unserviced interrupts in your ISR, and decrementing that count to zero in your DPC.
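
A minimal sketch of the counting idea, written with C11 atomics so it runs anywhere. In a Windows driver the Interlocked functions or the interrupt lock would play this role. The function names are hypothetical, and draining the count with a single exchange is one variant of "decrementing to zero".

```c
#include <stdatomic.h>

static atomic_uint g_unserviced;   /* interrupts seen but not yet handled */

/* ISR side: record that one more interrupt happened (then queue the DPC). */
void model_isr(void)
{
    atomic_fetch_add(&g_unserviced, 1);
}

/* DPC side: take the whole count in one step and return how many
 * interrupts this DPC invocation is responsible for. */
unsigned model_dpc(void)
{
    return atomic_exchange(&g_unserviced, 0);
}
```

If three interrupts fire before the DPC runs once, `model_dpc` returns 3, and the DPC would then try to complete three pending requests rather than one.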

Or, maybe that code entails you saving some state away (say, the latched state of your device’s registers) on each ISR invocation… and retrieving that saved state in your DPC and processing it. This save/retrieve might be something as simple as non-destructively OR’ing register status into copy of one or more of your hardware registers (we call these “shadow registers”) that you maintain in (for example) your Interrupt context. Or the save/retrieve might be something as complex as storing a bunch of stuff into a data area and then making an entry on a linked-list (with ExInterlockedInsertTailList) for each ISR invocation, and retrieving and processing all those entries (using ExInterlockedRemoveHeadList) once you get to your DPC.
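
The shadow-register variant can be sketched the same way. This assumes a 32-bit status register in which each bit identifies one interrupt source; the names are hypothetical, and C11 atomics again stand in for the driver's own synchronization.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t g_shadow_status;   /* accumulated status bits */

/* ISR side: OR the latched hardware status into the shadow copy. */
void shadow_isr(uint32_t hw_status)
{
    atomic_fetch_or(&g_shadow_status, hw_status);
}

/* DPC side: take all accumulated bits and clear the shadow in one step. */
uint32_t shadow_dpc_take(void)
{
    return atomic_exchange(&g_shadow_status, 0);
}
```

OR'ing preserves which sources fired but not how many times each fired, so a driver that must report every occurrence would pair this with a per-source count.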

Only you know what you need to do, based on your hardware and driver architecture. The key thing is you cannot assume that your calls to WdfInterruptQueueDpcForIsr and calls to your DpcForIsr will be 1:1… you have to understand that these can be N:1… and write whatever code you need to write to ensure that your driver “does the right thing” under this constraint.

Peter

Tim_Roberts · December 29, 2020, 7:08pm

The interrupt tells you “something interesting happened”. It should not be giving you any more detail than that. It is up to the driver to determine what happened. If some data finished transferring, then your driver needs to be able to find out how much data there was. If your driver has to assume that each interrupt means 64k bytes transferred, then your design is badly broken.

MBond2 · December 29, 2020, 11:16pm

In addition to all of the coalescing that can happen in software, interrupts can be coalesced in hardware too. Message signaled interrupts have been the norm for many years now and the hardware will queue exactly 1 and discard any duplicates.

the point is the same: no matter how you get to your DPC, n things of interest have happened on your hardware, and you have n things to report back up to UM or wherever. and if you don’t have at least n pending IRPs to complete, then you have to decide what to do. Throw the data away or queue it up (to some limit). but as Tim points out, if you can’t figure out how to calculate n, or get the details of all n things, you are sure to have data loss before you even worry about how the UM program calls the driver.
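
One way to make the "queue it up (to some limit)" choice explicit is sketched here as a portable model. `struct dwell_state`, `BACKLOG_LIMIT`, and the function names are hypothetical, and a real driver would store event data rather than bare counts.

```c
#define BACKLOG_LIMIT 8u   /* assumed cap on undelivered events */

struct dwell_state {
    unsigned pending_requests;   /* IOCTLs waiting in the driver's queue */
    unsigned backlog;            /* events held because no request was pending */
    unsigned dropped;            /* events discarded once the backlog was full */
    unsigned delivered;          /* events reported to user mode */
};

/* DPC side: n events happened. Deliver what we can, hold some, drop the rest. */
void on_events(struct dwell_state *s, unsigned n)
{
    while (n > 0 && s->pending_requests > 0) {
        s->pending_requests--;
        s->delivered++;
        n--;
    }
    while (n > 0 && s->backlog < BACKLOG_LIMIT) {
        s->backlog++;
        n--;
    }
    s->dropped += n;             /* the loss is counted, not silent */
}

/* IOCTL side: a new request arrives. Satisfy it from the backlog if possible. */
void on_request(struct dwell_state *s)
{
    if (s->backlog > 0) {
        s->backlog--;
        s->delivered++;
    } else {
        s->pending_requests++;
    }
}
```

Keeping a `dropped` count lets the driver report data loss to the application instead of leaving it to be inferred from a gap in the counter.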

certain conditions will inevitably result in data loss. so then it becomes an engineer’s job to determine acceptable thresholds. If you anticipate an interrupt frequency on the order of 1 Hz, then just about any design that works at all will work in nearly every case. But even then, there is still no reason to do it deliberately badly.

zvivered · December 31, 2020, 10:54am

Hello,

I think I have further information:

As I wrote, during init, I’m sending 10 IOCTL requests with the code: DWELL_INTERRUPT_REQUEST_CODE
I checked that those requests are handled.
For each of them, the message is sent to a queue.

Upon interrupt, a message is read from this queue in DPC using WdfIoQueueRetrieveNextRequest
In the application, after this IOCTL is answered, another IOCTL is sent.

In the case of an “interrupt miss”, WdfIoQueueRetrieveNextRequest returns 0x8000001A, which means STATUS_NO_MORE_ENTRIES.

How can this happen ?
Is it possible that the queue is too small ?

Thank you,
Zvika

Peter_Viscarola_OSR · December 31, 2020, 2:34pm

Your app hasn’t sent enough entries. Have the app send 100 instead of 10 entries. See if that works.

Peter