Win10: Interrupt miss in application

zvivered Member Posts: 81

Hello,

OS: Windows 10 IoT LTSC
My FPGA generates an MSI interrupt every 20 ms.
The application sends an IOCTL request to the driver.
In the driver, this request is forwarded to an internal queue (WdfRequestForwardToIoQueue).
Upon interrupt, the interrupt handler's DPC pulls the request from the internal queue and calls WdfRequestCompleteWithInformation to complete the IOCTL back to the application.
The reply contains an interrupt counter.

This mechanism works for some time (e.g. 30 minutes),
and then the application reports an interrupt miss error: according to the counter, it missed an interrupt.

Two other processes work with this driver at the same time.
Each of them also waits for an interrupt, but each of the three applications waits for a different interrupt.
Can this cause the problem?

Is there an alternative to the above mechanism that avoids missed interrupts?

Thank you,
Zvika

Comments

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235

    Haven’t we already had this discussion, Mr. @zvivered ?

    HOW is the interrupt being “missed”? Is there not an IOCTL from the app available in the DPC?

    How many IOCTLs is the app sending? It should send a bunch of them, so that the driver always has one available in the DPC when it needs one.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • zvivered Member Posts: 81

    Hi Peter, All,

    Thank you for your reply.
    Currently: the application sends **one** IOCTL, which blocks until an interrupt (or timeout).

    You suggest: during application initialization, the application sends **N** IOCTL requests and does not wait for a reply.
    Then, in the forever while loop, the application sends **one** request and waits for a reply.
    Did I understand correctly?

    Best regards,
    Zvika

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235

    I'm not sure if you understood correctly, because I don't have your code in front of me.

    1. You open a handle for what Windows calls "overlapped I/O"
    2. You send N IOCTLs, N > 1, and do not wait.
    3. You receive IOCTL completions, using whatever method you choose
    4. Each time you get one IOCTL completion, before processing that completed IOCTL, you send back an IOCTL to the driver, and do not wait
    5. You process the IOCTL completion.
    6. Goto 3
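The six steps above can be modelled without any Windows calls. The sketch below is a single-threaded simulation, not real overlapped I/O: `FakeDriver`, `send_ioctl_no_wait`, `on_interrupt`, `run_app`, and `QUEUE_DEPTH` are hypothetical names invented for illustration. In a real application, steps 1 to 3 would typically use `CreateFile` with `FILE_FLAG_OVERLAPPED`, `DeviceIoControl` with an `OVERLAPPED` structure, and a completion mechanism such as `GetOverlappedResult`. The model only shows the ordering that matters: prime N requests, then repost before processing each completion.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8u  /* N: requests kept pending; the value is an assumption */

/* Hypothetical stand-in for the driver's manual queue of pended IOCTLs. */
typedef struct {
    unsigned pending;   /* requests currently parked in the driver */
    unsigned counter;   /* interrupt counter reported in each completion */
    unsigned missed;    /* interrupts that found no pended request */
} FakeDriver;

/* Steps 2 and 4: send one IOCTL and return at once, without waiting. */
static void send_ioctl_no_wait(FakeDriver *d) { d->pending++; }

/* Models the DPC: complete one pended request with the counter, if any.
   Returns 1 and writes *out when a request was completed, otherwise 0. */
static int on_interrupt(FakeDriver *d, unsigned *out) {
    d->counter++;
    if (d->pending == 0) { d->missed++; return 0; }
    d->pending--;
    *out = d->counter;
    return 1;
}

/* Runs the application loop for `interrupts` interrupts.
   Returns the number of gaps seen in the counter sequence. */
static unsigned run_app(FakeDriver *d, unsigned interrupts) {
    unsigned last = 0, gaps = 0;
    assert(d != NULL);
    for (unsigned i = 0; i < QUEUE_DEPTH; i++)  /* step 2: prime N requests */
        send_ioctl_no_wait(d);
    for (unsigned i = 0; i < interrupts; i++) {
        unsigned value;
        if (!on_interrupt(d, &value))           /* step 3: receive a completion */
            continue;
        send_ioctl_no_wait(d);                  /* step 4: repost before processing */
        if (value != last + 1)                  /* step 5: process (check the counter) */
            gaps++;
        last = value;
    }
    return gaps;
}
```

Because the simulation is single-threaded it never stalls, so it never misses. In a real system the depth N decides how long the application may stall before the driver runs out of pended requests.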

    We HAVE covered this with you multiple times in the past... dating back as far as 2014.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • zvivered Member Posts: 81

    Hi Peter,

    Your help is highly appreciated.

    Best regards,
    Zvika

  • zvivered Member Posts: 81
    edited December 2020

    Hello Peter, All,

    In the attached code I implemented the advice I got here.
    But I still miss interrupts.

    Application

    Open (Gwbr.cpp, line 20)
    call WaitForDwellEvent (Gwbr.cpp, line 77) 10 times with timeout = 0.

    while (1)
    {
    WaitForDwellEvent (INFINITE, &Counter);
    //Check Counter
    WaitForDwellEvent (0, &Temp); //Send IOCTL - Do not wait
    }

    Driver
    Init.c: allocate 7 manual queues.

    Control.c: handle IOCTL requests. Line 119
    Handle IOCTL sent in the above WaitForDwellEvent

    IsrDpc.c: Handle 7 MSI interrupts. Line 194
    If an interrupt is active, the driver pulls a message from the relevant queue and completes the request.

    I hope it's not too much to ask.
    Can you please inspect the code and tell me what I am doing wrong?

    Thank you in advance,
    Zvika

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235

    "I hope it's not too much to ask."

    With all due respect, it IS too much to ask from me.

    A proper code review is time consuming, complex, and expensive. For me to take a brief look, and toss some random comments to you based on general impressions gained while eating breakfast, would risk my not giving you the right answers... which is worse than giving you no answers at all. I know this from experience.

    Perhaps some of our other colleagues here will be able to help better than I can.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Tim_Roberts Member - All Emails Posts: 13,762

    I can see one flaw right off the bat. Every WDFREQUEST you get must either be completed or forwarded to another I/O queue. The "default" branch in your ioctl handler doesn't do that. In most samples, you don't complete the request in each switch case, you just set "status" and "information" and complete the request after the switch. The "default" case can then just "break", and the STATUS_INVALID_DEVICE_REQUEST will happen automatically. Your case that forwards to an I/O queue would then "return" instead of "break".
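The structure described above can be sketched in plain C. This is a model under stated assumptions, not the poster's Control.c: the `Request` type, the status values, the IOCTL codes, and the helpers `complete_request` and `forward_to_queue` are all hypothetical stand-ins. In a KMDF driver the two helpers would correspond to `WdfRequestCompleteWithInformation` and `WdfRequestForwardToIoQueue`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values and IOCTL codes, for illustration only. */
enum { ST_SUCCESS = 0, ST_INVALID_DEVICE_REQUEST = 1 };
enum { IOCTL_GET_VERSION = 1, IOCTL_WAIT_INTERRUPT = 2 };

typedef struct {
    int completed;       /* set once the request has been completed */
    int forwarded;       /* set once the request is parked in a manual queue */
    int status;
    size_t information;
} Request;

/* Stand-in for completing a request; a request may be disposed of only once. */
static void complete_request(Request *r, int status, size_t info) {
    assert(!r->completed && !r->forwarded);
    r->completed = 1;
    r->status = status;
    r->information = info;
}

/* Stand-in for forwarding to a manual queue; always succeeds in this model. */
static int forward_to_queue(Request *r) {
    assert(!r->completed && !r->forwarded);
    r->forwarded = 1;
    return ST_SUCCESS;
}

static void handle_ioctl(Request *r, int code) {
    int status = ST_INVALID_DEVICE_REQUEST;  /* outcome unless a case changes it */
    size_t information = 0;

    switch (code) {
    case IOCTL_GET_VERSION:
        status = ST_SUCCESS;
        information = sizeof(unsigned);
        break;
    case IOCTL_WAIT_INTERRUPT:
        status = forward_to_queue(r);
        if (status == ST_SUCCESS)
            return;              /* the queue owns the request now */
        break;                   /* forwarding failed: complete with the error */
    default:
        break;                   /* completed below as an invalid request */
    }
    complete_request(r, status, information);
}
```

With a single completion point after the switch, every path either completes the request or hands it to a queue, so no case can leak a request.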

    You're also not checking for errors in the WdfRequestRetrieveXxx calls, nor are you checking that the BAR offsets are within range. That's extremely dangerous. You should never trust the user-mode app.
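A range check of the kind described above might look like the following sketch. `bar_range_ok` and `read_reg32` are hypothetical helpers, and the BAR is modelled as an ordinary array rather than mapped device memory. Rejecting zero-length and misaligned accesses is an assumption about what the hardware needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [offset, offset + length) lies inside a BAR of bar_size bytes.
   Written so that offset + length is never computed and cannot overflow. */
static bool bar_range_ok(uint64_t offset, uint64_t length, uint64_t bar_size) {
    if (length == 0)
        return false;            /* empty access treated as invalid (assumption) */
    if (offset >= bar_size)
        return false;
    return length <= bar_size - offset;
}

/* Model of a validated 32-bit register read at a user-supplied offset. */
static bool read_reg32(const uint32_t *bar, uint64_t bar_size,
                       uint64_t offset, uint32_t *out) {
    assert(bar != NULL && out != NULL);
    if (offset % sizeof(uint32_t) != 0)
        return false;            /* misaligned register offset */
    if (!bar_range_ok(offset, sizeof(uint32_t), bar_size))
        return false;            /* outside the BAR: reject the request */
    *out = bar[offset / sizeof(uint32_t)];
    return true;
}
```

In a driver, a `false` result would translate into completing the request with an error status instead of touching the hardware.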

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Tim_Roberts Member - All Emails Posts: 13,762

    Without seeing source code for SendDeviceIoctl, there's little we can do. Nothing in the code you posted actually sends an ioctl or waits for a result.

    You still haven't described how you know interrupts are being "missed". I assume you're getting them all in kernel mode, so is it your user-mode accounting that comes up short?

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • MBond2 Member Posts: 233

    this code does not make sense

    while (1)
    {
    WaitForDwellEvent (INFINITE, &Counter);
    //Check Counter
    WaitForDwellEvent (0, &Temp); //Send IOCTL - Do not wait
    }

    looking at nothing else in your code, this does not follow any valid pattern.

    for overlapped IO, there are two general patterns - top down or bottom up. The 'top down' pattern is the classic one where the application determines what to do and the kernel does it and tells you when it's done. A typical example is writing to a file. The timing of when a certain activity should start is entirely under the control of the UM process.

    the 'bottom up' pattern is where the what to do is still controlled by the UM application, but the when to do it is under the control of the KM component. either within itself, or in response to some hardware or network event.

    with the top down pattern, IO is initiated from UM when work needs to be done. you do not expect any special queueing on the driver side because the driver's job is to do the work 'now'. This loop cannot implement that pattern.

    with the bottom up pattern, IO is initiated from UM in advance of when the work is to be done by the driver. in this pattern the driver is expected to queue and pend multiple IOCTLs from UM, and then complete them 'later'. This loop also does not implement this pattern, as further IOCTLs submitted from UM to KM should happen in the completion handler and not be in any kind of loop.

    but in my shop, while(1) would be a gigantic red flag. We would expect while(true) and proper code formatting (a matter of opinion).

    reading between the lines, I am almost sure that when you say that events are being 'missed', they happen during times when your driver does not have a pending IOCTL to complete, because this loop cannot ever send enough.

  • zvivered Member Posts: 81

    Hello Tim, MBond2, All,

    The attached Gwbr.cpp.txt also contains the code for SendDeviceIoctl.
    WaitForDwellEvent sends an IOCTL to the driver, which is handled in Control.c.txt (line 119).
    The IOCTL is forwarded to a queue allocated only for this IOCTL code, and the request is completed in IsrDpc.c.txt (line 210).

    In IsrDpc.c.txt, upon getting a "dwell" interrupt, a request is pulled from this queue (line 200) and completed.

    The tip I got in another thread: https://community.osr.com/discussion/292320/calling-deviceiocontrol-from-multiply-threads#latest
    was to send a few IOCTLs without waiting during initialization.
    Each IOCTL waits forever until a "dwell" interrupt is received. Only then is one more IOCTL sent without waiting.

    The method for detecting a missed interrupt in the application:
    Upon a "dwell" interrupt, a counter is incremented. This counter is sent back to the application with each interrupt.
    The reply is built in IsrDpc.c.txt (line 207).

    Thank you,
    Zvika

  • MBond2 Member Posts: 233

    I guess we can infer that you have fixed your problem. we can also infer that your pattern is the 'bottom up' pattern and you have correctly identified that pending multiple IRPs allows your driver to do something useful when these interrupts happen during times when your previous solution did not have any pending IO.

    next you need to consider how many pending IRPs you should have. This is a problem that you likely cannot solve exactly, but it depends on how important it is to 'never' miss an event and what happens when the system is under load and the UM components cannot process completions and send new IRPs for a 'long' time.

  • Tim_Roberts Member - All Emails Posts: 13,762

    An interrupt-driven driver always needs to be able to handle the case where a new interrupt arrives before the previous interrupt has finished processing, especially if interrupts arrive quickly. Are you handling that case? You can't assume a nice linear progression from ISR to DPC to user-mode.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235

    IOW, you are not guaranteed a 1:1 correspondence between calling WdfInterruptQueueDpcForIsr and calls to your DPC. Not to mention, your DPC can run in parallel on multiple CPUs simultaneously.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • zvivered Member Posts: 81

    Hello Tim, MBond2, Peter, All,

    Thank you very much for your replies.

    Can you please offer a mechanism (or sample code) to handle the case in which a new interrupt arrives before the previous interrupt has finished processing?
    How should I handle a not 1:1 correspondence between calling WdfInterruptQueueDpcForIsr and calls to your DPC?

    Best regards,
    Zvika

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235
    edited December 2020

    "How should I handle a not 1:1 correspondence between calling WdfInterruptQueueDpcForIsr and calls to your DPC?"

    I get this question almost every time I teach our WDF seminar. And I frankly never know how to answer it. The answer is: "You write some code. That is, you write whatever code you need to write to handle this situation, in your driver, based on your hardware. I can't tell you what that is."

    This code might be something as simple as keeping a count of unserviced interrupts in your ISR, and decrementing that count to zero in your DPC.
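The counting approach can be sketched with C11 atomics. This is a portable model, not driver code: `g_unserviced`, `isr_record`, and `dpc_claim` are hypothetical names. A real driver would more likely use `InterlockedIncrement` and `InterlockedExchange`, or protect the count with the interrupt lock.

```c
#include <stdatomic.h>

/* Interrupts recorded by the ISR that no DPC run has claimed yet. */
static atomic_uint g_unserviced;

/* Models the ISR: record the interrupt. The real ISR would then
   request its DPC, which may or may not result in a new DPC run. */
static void isr_record(void) {
    atomic_fetch_add(&g_unserviced, 1u);
}

/* Models the DPC: claim every recorded interrupt in one atomic step
   and return how many events this single run has to service. */
static unsigned dpc_claim(void) {
    return atomic_exchange(&g_unserviced, 0u);
}
```

If three interrupts arrive before the DPC runs, one call to `dpc_claim` returns 3, and the DPC can complete up to three pended requests instead of one. A DPC run that finds the count already claimed gets 0 and has nothing to do.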

    Or, maybe that code entails you saving some state away (say, the latched state of your device's registers) on each ISR invocation... and retrieving that saved state in your DPC and processing it. This save/retrieve might be something as simple as non-destructively OR'ing register status into a copy of one or more of your hardware registers (we call these "shadow registers") that you maintain in (for example) your Interrupt context. Or the save/retrieve might be something as complex as storing a bunch of stuff into a data area and then making an entry on a linked-list (with ExInterlockedInsertTailList) for each ISR invocation, and retrieving and processing all those entries (using ExInterlockedRemoveHeadList) once you get to your DPC.
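The shadow-register idea can be sketched in the same portable style. `g_shadow_status`, `isr_latch`, and `dpc_collect` are hypothetical names, and the status bits are made up. In a driver the shadow copy would live in the interrupt context, as described above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shadow of the device status register: bits seen but not yet processed. */
static _Atomic uint32_t g_shadow_status;

/* Models the ISR: OR the latched hardware status into the shadow copy,
   so bits from earlier, still unprocessed interrupts are preserved. */
static void isr_latch(uint32_t latched_status) {
    atomic_fetch_or(&g_shadow_status, latched_status);
}

/* Models the DPC: take all accumulated bits and clear the shadow copy. */
static uint32_t dpc_collect(void) {
    return atomic_exchange(&g_shadow_status, 0u);
}
```

A shadow register records which sources fired, not how many times each one fired. If the count matters, as it does for an interrupt counter reported to user mode, it needs a counter per source in addition to the status bits.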

    Only you know what you need to do, based on your hardware and driver architecture. The key thing is you cannot assume that your calls to WdfInterruptQueueDpcForIsr and calls to your DpcForIsr will be 1:1... you have to understand that these can be N:1... and write whatever code you need to write to ensure that your driver "does the right thing" under this constraint.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Tim_Roberts Member - All Emails Posts: 13,762

    The interrupt tells you "something interesting happened". It should not be giving you any more detail than that. It is up to the driver to determine what happened. If some data finished transferring, then your driver needs to be able to find out how much data there was. If your driver has to assume that each interrupt means 64k bytes transferred, then your design is badly broken.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • MBond2 Member Posts: 233

    In addition to all of the coalescing that can happen in software, interrupts can be coalesced in hardware too. Message signaled interrupts have been the norm for many years now and the hardware will queue exactly 1 and discard any duplicates.

    the point is the same: no matter how you get to your DPC, n things of interest have happened on your hardware, and you have n things to report back up to UM or wherever. and if you don't have at least n pending IRPs to complete, then you have to decide what to do. Throw the data away or queue it up (to some limit). but as Tim points out, if you can't figure out how to calculate n, or get the details of all n things, you are sure to have data loss before you even worry about how the UM program calls the driver.
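One possible policy for the "n events, fewer than n pending IRPs" case is a bounded backlog. The sketch below is a counting model only: `Delivery`, `on_events`, `on_request`, and `BACKLOG_LIMIT` are hypothetical, and a real driver would store the event data itself, not just a count.

```c
#include <assert.h>
#include <limits.h>

#define BACKLOG_LIMIT 64u  /* assumed cap on events held for later delivery */

typedef struct {
    unsigned pending;  /* requests pended by user mode */
    unsigned backlog;  /* events waiting for a request */
    unsigned dropped;  /* events discarded because the backlog was full */
} Delivery;

/* DPC side: n new events occurred. Delivers as many events as there are
   pended requests, oldest first, and returns how many were delivered. */
static unsigned on_events(Delivery *d, unsigned n) {
    unsigned total, delivered;
    assert(n <= UINT_MAX - d->backlog);
    total = d->backlog + n;
    delivered = total < d->pending ? total : d->pending;
    d->pending -= delivered;
    total -= delivered;
    if (total > BACKLOG_LIMIT) {           /* over the limit: throw the rest away */
        d->dropped += total - BACKLOG_LIMIT;
        total = BACKLOG_LIMIT;
    }
    d->backlog = total;
    return delivered;
}

/* Request arrival: returns 1 if a waiting event completes the request
   immediately, or 0 if the request is pended for a future event. */
static int on_request(Delivery *d) {
    if (d->backlog > 0) {
        d->backlog--;
        return 1;
    }
    d->pending++;
    return 0;
}
```

Keeping a `dropped` count makes the loss visible, so the driver can report it to user mode instead of silently skipping events.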

    certain conditions will inevitably result in data loss. so then it becomes an engineer's job to determine acceptable thresholds. If you anticipate an interrupt frequency on the order of 1 Hz, then just about any design that works at all will work in nearly every case. But even then, there is still no reason to do it deliberately badly.

  • zvivered Member Posts: 81

    Hello,

    I think I have further information:

    As I wrote, during init I'm sending 10 IOCTL requests with the code DWELL_INTERRUPT_REQUEST_CODE.
    I checked that those requests are handled.
    Each of them is forwarded to a queue.

    Upon interrupt, a request is retrieved from this queue in the DPC using WdfIoQueueRetrieveNextRequest.
    In the application, after this IOCTL is completed, another IOCTL is sent.

    In case of an "interrupt miss", WdfIoQueueRetrieveNextRequest returns 0x8000001A, which means STATUS_NO_MORE_ENTRIES.

    How can this happen?
    Is it possible that the queue is too small?

    Thank you,
    Zvika

  • Peter_Viscarola_(OSR) Administrator Posts: 8,235

    Your app hasn’t sent enough entries. Have the app send 100 instead of 10 entries. See if that works.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • zvivered Member Posts: 81

    Hello,

    I had a bug in the application.
    In SendDeviceIoctl (line 92), if the timeout is 0, rc=WAIT_TIMEOUT and the event is not returned to the pool.
    After fixing it, I no longer get STATUS_NO_MORE_ENTRIES from WdfIoQueueRetrieveNextRequest in IsrDpc.c

    The cause of the miss:
    The time from one interrupt to the next is 20 ms.
    But processing this event in the application can take more than 20 ms.
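The timing above gives a rough way to size the number of pended requests. The helper below is a hypothetical back-of-the-envelope calculation, assuming strictly periodic interrupts: during a stall of S ms with a period of P ms, at most floor(S / P) + 1 interrupts can fire.

```c
#include <assert.h>

/* Upper bound on interrupts that can fire during an application stall of
   stall_ms, given one interrupt every period_ms. Each of those interrupts
   consumes one pended request that the stalled application cannot replace. */
static unsigned max_interrupts_during(unsigned stall_ms, unsigned period_ms) {
    assert(period_ms > 0);
    return stall_ms / period_ms + 1;
}
```

With a 20 ms period, 10 pended requests cover stalls shorter than 200 ms under this model, and a one-second stall needs at least 51. This only covers occasional stalls: if the average processing time per event stays above 20 ms, the application falls behind steadily and no queue depth prevents a miss.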

    Thank you very much for the detailed answers.
    I really appreciate your help.

    Zvika

  • MBond2 Member Posts: 233

    you do understand that you can certainly get this failure again even after you have fixed whatever bug you had? It is inherent in the nature of the system and you should have some kind of strategy to deal with it. many times, the strategy is simply 'i need 4 buffers pended to handle normal load, so i'll pend 4,000 and if things go that far bad then i'll just raise an error'. that is a valid strategy. It is not valid to say 'I need 4, so i'll pend 8 and cross my fingers'.

  • zvivered Member Posts: 81

    Hello MBond2,

    Thank you very much!
