Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results

Please help: PCI driver DMA DPC causes BSOD on Win10 version 1809

zyezye Member Posts: 4

Hi,

Please help us with a DPC timeout BSOD issue on the Win10 1809 version. We have a customized PCIe card that captures data from another device. We developed a WDM driver which has worked well for many years, from Win7 through the Win10 1703 version. But now the same driver gets a BSOD on Win10 1809, as below (we did not try the Win10 1709 and 1803 versions):

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff80644d5c380, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding this single DPC timeout

The problem only happens in the DPC for the DMA interrupt. In the DMA's DPC function we have code like:
....
KeAcquireSpinLockAtDpcLevel(&devExt->ImageReadStateLock);
readIrp = devExt->CurrentReadIrp;
if (!readIrp) {
    ....
    KeReleaseSpinLockFromDpcLevel(&devExt->ImageReadStateLock);
}
else {
    devExt->ReadAdapter->DmaOperations->PutScatterGatherList(
        devExt->ReadAdapter,
        devExt->ReadScatterGatherList,
        FALSE);
    devExt->ReadScatterGatherList = NULL;
    ....
}

The dump file shows that the "PutScatterGatherList()" call stalled and took a long time, which finally triggered the DPC_WATCHDOG_VIOLATION.

I really do not know how to fix this problem, since the same code has worked well for many years. Is it possible that some Win10 1809 (or maybe 1803) security policy configuration could cause this problem?

Please help! Thanks a lot!

Comments

  • Martin_Burnicki Member - All Emails Posts: 24

    I've never used code like this in the drivers I'm maintaining. Anyway, in the code fragment you posted, KeAcquireSpinLockAtDpcLevel() is called on entry, but KeReleaseSpinLockFromDpcLevel() is only called in the "if (!readIrp)" code path, not in the "else" part.

    If the release is really missing in the full code, and not only in the fragment you posted, that smells like the cause of a big problem. Just an idea.

  • zyezye Member Posts: 4

    Thanks a lot for your reply. We do have "KeReleaseSpinLockFromDpcLevel" in the else part; I just did not show it. As I said, this driver has worked very stably for many years, from Win7 through the Win10 1703 version. The dump file shows that "PutScatterGatherList()" cannot finish in a short time, which I do not understand.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,380

    I think I've answered this question off-line... but just for the record:

    The crash you’re seeing isn’t unusual… it’s simply telling you that you’re spending too much time in your DpcForIsr, as described in your dump. You're allowed 500 clock ticks, you spent 501 clock ticks... see more info in the DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK as described in the crash.

    The only fix for this is to discover WHY you’re spending so much time in your DpcForIsr, and not do that. Seriously. Do some profiling in your DpcForIsr… figure out exactly what’s going on.

    This is the only way to avoid this issue.

    How much time are you spending waiting for that lock? Is it highly contended? Do you loop in your DpcForIsr (doing "the grand loop"?)... if so, do you limit the number of iterations you use in your loop?
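
    A bounded version of "the grand loop" can be sketched in portable C (a minimal illustration; the names and the fake queue are invented stand-ins for the real work list, and in a real driver the leftover work would be handled by re-queueing the DPC rather than looping on):

```c
#include <stdbool.h>

#define MAX_ITEMS_PER_DPC 64  /* arbitrary per-invocation work budget */

typedef struct { int pending; int processed; } FakeQueue;

/* Drain at most MAX_ITEMS_PER_DPC items per call. Returns true if work
 * remains, i.e. the caller should queue another DPC instead of spinning. */
static bool DrainBounded(FakeQueue *q)
{
    int n = 0;
    while (q->pending > 0 && n < MAX_ITEMS_PER_DPC) {
        q->pending--;           /* "process" one item */
        q->processed++;
        n++;
    }
    return q->pending > 0;      /* leftover -> requeue, don't keep looping */
}
```

    The point is simply that each DPC invocation has a hard upper bound on the work it does, so a burst of queued items can never consume the whole watchdog budget in one shot.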

    Windows is becoming increasingly strict in enforcing "proper behavior" in drivers. As you know, when YOU spend lots of time in YOUR DpcForIsr, you risk increasing latency for OTHER drivers in the system... to the point where those drivers can start to fail. The ONLY fix for this is to ensure that your driver is "well behaved"...

    So, sorry... no quick fix available for this one!

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Peter_Viscarola_(OSR) Administrator Posts: 7,380

    (I'm working with the OP, who is a former WDF student of mine, offline... I'll ask him to post-back here once his problem is rectified.)

    Peter Viscarola
    OSR
    @OSRDrivers

  • Jan_Bottorff Member - All Emails Posts: 471
    via Email
    As others have said, you need to figure out why your driver spends so much time at DISPATCH_LEVEL. If it were me, I'd add some ETW tracing in the ISR/DPC code and collect data on how much time it spends in different areas. I'd start by sprinkling half a dozen events with some summary data, like bytes transferred, at interesting places, such as the entry/exit of the DPC. ETW events are timestamped with a QPC timer, which is often currently processor clock speed / 1024, so around 0.3 µs resolution. Some of the ETW trace decoder tools are silly and only show ms resolution, even though the actual timestamp is much better. I've seen ETW traces with a million events/sec, but it's better if you keep the event rate somewhat lower. You CAN still extract ETW trace data if the system crashes; there are WinDbg commands to write the buffers to a file you can view with your preferred ETW tools. You should also look at non-crashing runs and get a sense of what normal timing for your driver looks like.

    The DPC watchdog timer is set REALLY long: 0x500 ticks is about 20 seconds (assuming 16 ms ticks). Are you really spending 20 seconds in one DPC? Typically, if you spend 20 seconds in a DPC or ISR, you're stuck on a spinlock or something. There are also two forms of DPC_WATCHDOG timeout: one is a per-DPC limit, and one is for back-to-back DPCs, which includes DPCs from other drivers. So your driver might run out of time if another driver is greedy, which is all the more reason your own driver should not be greedy.
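
    As a quick sanity check on that arithmetic (helper name invented; assumes the "ticks" are clock interrupts at the common 15.625 ms interval), 0x500 ticks at 15.625 ms per tick comes out to exactly 20.000 seconds:

```c
#include <stdint.h>

/* Convert watchdog ticks to milliseconds, assuming each tick is one
 * clock interrupt of tick_us microseconds (15625 us = 15.625 ms). */
static uint64_t TicksToMs(uint64_t ticks, uint64_t tick_us)
{
    return ticks * tick_us / 1000u;
}
```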

    It's REALLY easy to add ETW tracing with the TraceLogging API; see my example on this list from 6 months ago, or follow the basic MSFT doc pages. I often use the PerfView tool on GitHub to look at traces. TraceView also does the trick, especially if you use the ETW name-to-GUID hashing tool (etwguid); then you can just put *MyTraceProvider into TraceView instead of the GUID. Message Analyzer seems to currently be broken (again) for TraceLogging traces.

    Some kinds of drivers will absolutely need DPC_WATCHDOG avoidance code, probably including any driver that can receive unbounded fast data streams. The classic example is a high-speed network card: Ethernet packets just arrive, and you can't just use more and more DPC time as the receive rate goes up.

    Jan
  • Tim_Roberts Member - All Emails Posts: 13,035
    via Email
    Jan_Bottorff wrote:
    > The DPC watchdog timer is set REALLY long, 0x500 ticks is about 20 seconds (assuming 16ms ticks).

    I believe the term "ticks" here actually refers to 100ns timing units,
    so 0x500 actually means 128µs.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,380

    so 0x500 actually means 128µs.

    And that would make sense, because the "official" MSFT guideline is to spend "no more than 100us" in your DPC per invocation.

    The OP has suspended work on this issue temporarily, but I have counseled him to do his due diligence in terms of debugging/profiling. There is, I suppose, a vanishingly small chance that he's discovered a Windows bug (there have been ongoing changes to the DMA code to support DMA remapping)... and I don't have source code to look at right now to be able to run the issue down for him.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Scott_Noone_(OSR) Administrator Posts: 3,151

    You can get the DPC watchdog timeout information from a dump using !dpcwatchdog. Looking at a DPC_WATCHDOG_VIOLATION dump (thank you Google and tech support forums for thy bounty) you can see the timeout is 20 seconds for a single DPC and 120 seconds cumulative:

    Current DPC: tcpip!TcpPeriodicTimeoutHandler (Normal DPC)
    Debugger Saved IRQL: 13
    Cumulative DPC Time Limit: 120.000 seconds
    Current Cumulative DPC Time: 20.000 seconds
    Single DPC Time Limit: 20.000 seconds
    Current Single DPC Time: 20.016 seconds
    

    This is a Windows 10 RS3 target, so it's possible the numbers are smaller on RS4/RS5.

    -scott
    OSR

  • Scott_Noone_(OSR) Administrator Posts: 3,151

    Another fun fact for today that I just learned: the DPC watchdog appears to be disabled if you have the debugger attached. I tried to force a machine to crash by infinitely looping on a spinlock in a DPC and nothing happened. Breaking in and running !dpcwatchdog I see this:

    Current DPC: Nothing!BuggyDpcRoutine (Normal DPC)
    Debugger Saved IRQL: 13
    Cumulative DPC Time Limit: 0.000 seconds
    Current Cumulative DPC Time: 72.234 seconds
    Single DPC Time Limit: 0.000 seconds
    Current Single DPC Time: 72.234 seconds
    

    Makes sense that this crash might come as a surprise to someone if they did most of their test/validation with a debugger attached.

    -scott
    OSR

  • Jan_Bottorff Member - All Emails Posts: 471
    via Email
    I'm pretty sure the DPC watchdog limit is a lot longer than 128 µs, although 20 seconds does seem like an extremely long time. As I remember, it gets checked in the timer ISR/IPI, so a single 16 ms timer tick would already be way longer than 128 µs. Perhaps it's in opaque DPC_WATCHDOG units. The APIs to retrieve it will just tell you what fraction of the way you are to a timeout. I usually pick a number like 25% as a threshold for when to hand off work to a passive-level thread.

    Perhaps a request for a documentation update is in order, to clarify what unit the watchdog "ticks" are in.

    It would be pretty easy for the OP to write a few lines of code to measure what the limit actually is. Raise IRQL to DISPATCH_LEVEL, loop around incrementing a counter with a 10 uSec stall in each loop, see how many times you loop before getting a DPC_WATCHDOG crash.

    There is also another watchdog timer which basically detects when a core is no longer responding to timer ticks. All the cores get timer ticks, and there is code that makes sure all cores are responding to those ticks. A core can sometimes be stuck "outside" OS execution, like trapped back to a hypervisor. I believe the number I've recently seen is that if 4 timer ticks happen and a core doesn't react, a timer watchdog bugcheck happens. If, for example, you have two processor sockets and one socket gets into a fatal error state, half the cores will stop responding to timer ticks. Another case happens if, for example, you do a PCI config read and the device keeps responding with a retry status; that core is basically locked on that read and not responding to interrupts. A lot of complex I/O devices process PCI config requests in firmware, so if that firmware locks up while the OS is doing a config cycle, what you see is an OS timer watchdog bugcheck.

    Even though the watchdog timers can be annoying to developers, I do believe they are a really important feature for detecting hardware and software errors.

    Jan
  • Jan_Bottorff Member - All Emails Posts: 471
    via Email
    Excellent point!

    Jan
  • Jeffrey_Tippet_[MSFT] Member - All Emails Posts: 545

    There are two different and basically unrelated DPC watchdogs.

    The single DPC watchdog measures how long a KDEFERRED_ROUTINE executes. If it's more than 20 seconds†, you get a bugcheck 133/0.

    The cumulative DPC watchdog measures how long a processor stays at DISPATCH_LEVEL (or higher). If it's more than 120 seconds†, you get a bugcheck 133/1.

    Note that the cumulative DPC watchdog doesn't actually have anything to do with DPCs. You can trigger it from a regular boring thread, just by doing KeRaiseIrql(DISPATCH_LEVEL), then waiting for a couple minutes -- no DPC involved. (Another common repro: trying to recursively acquire a spinlock.)

    Also note that the cumulative DPC watchdog complains about situations that are not always under your control. If someone ran their crappy code at high IRQL for 119 seconds, then handed the processor to your DPC, you might happen to be the unlucky chap on the processor when the 120 second timer expires. Recent versions of Windows include callstack captures of the last 10 seconds† before a 133/1 is declared, so you can see what other things were going on just prior to the bugcheck.

    @Scott_Noone_(OSR) said:

    Another fun fact for today that I just learned: the DPC watchdog appears to be disabled if you have the debugger attached.

    I am not aware of that behavior. However, DPC watchdogs are disabled in most varieties of VM guests†. Use physical hosts, if you need to test against the DPC watchdog.

    If you are currently running in a DPC (an actual DPC, not just raised IRQL), you can use KeQueryDpcWatchdogInformation to learn what the DPC watchdog limits are†, and maybe how close you are to exceeding them†. However, using this API correctly requires a deep understanding of the NT kernel and its DPC watchdog. So we generally recommend that you do not use this API, except possibly for debugging / diagnostics. To properly scare you away, let me point out that the returned DpcWatchdogCount has some super strange behavior. (It only makes sense if you are intimately familiar with how the DPC watchdog is implemented. Give me a whiteboard and a beer, and I can explain it in about 20 minutes.) Also, the behavior of DpcWatchdogCount changed somewhere between Windows 8.1 and Windows 10, and is likely to subtly change again.

    @Jan_Bottorff said:

    Some kinds of drivers will absolutely need DPC_WATCHDOG avoidance code, probably including any driver that can receive unbounded fast data streams. The classic example is a high-speed network card: Ethernet packets just arrive, and you can't just use more and more DPC time as the receive rate goes up.

    ^^This^^. If your driver has no bound on the number of DPCs that get queued up, then your driver has a 133/1 bugcheck waiting for it.

    † Implementation detail, subject to change in any release, even a QFE. In fact, some of these details have already changed since the DPC watchdog was invented.

  • Tim_Roberts Member - All Emails Posts: 13,035
    via Email
    On Oct 19, 2018, at 5:18 PM, Jan_Bottorff wrote:
    >
    > I'm pretty sure the DPC watchdog limit is a lot longer than 128 uSec, although 20 seconds does seem like an extremely long time.

    I suppose we should do an experiment to find out, but I think it really is 128us. I encountered this bug check a year or two ago, and it was for an operation much less than a millisecond.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.


  • zyezye Member Posts: 4

    Thanks very much for all your replies. After two weeks of desperate debugging, unfortunately I still cannot fix this problem. But I still want to give an update here. We have opened a ticket with the Microsoft WDK group and provided two dump files to Microsoft engineers. However, no solution has been found yet.

    I really do not think our DPC function is the root cause. The same driver code has been used for almost 10 years, from Win7 through Win10 (the 1607, 1703, 1709 and 1803 versions), with no problems. Only on the latest Win10 1809 version does the BSOD happen consistently (with the same hardware).

    My debugging shows the root cause is the "PutScatterGatherList()" call inside our DPC. "PutScatterGatherList()" cannot finish and takes forever to run, which I do not understand.

    To prove this, I did another test in which I disabled all the interrupts in our driver so that the DPC function never runs. Since "PutScatterGatherList()" is also used in our "CancelActiveRead()" function, I can produce another BSOD by canceling the DMA read alone, without the DPC running.

    Our "CancelActiveRead()" function has code like:

    VOID CancelActiveRead(  PDEVICE_OBJECT DeviceObject,   PIRP Irp)
    {
        //-----------------------------------------------------------------------------
        PGEMINI_DEVICE_EXT  DevExt = (PGEMINI_DEVICE_EXT)DeviceObject->DeviceExtension;
        KIRQL OldIrql;
        //-----------------------------------------------------------------------------
    
        DbgPrint("\nCancelActiveRead() enter()\n");
    
        ......  // Some code here
    
        //-------------------------------------------------------------------------
        KeAcquireSpinLock(&DevExt->ImageReadStateLock, &OldIrql);
    
        DbgPrint("\nCancelActiveRead() get ImageReadStateLock\n");
    
        //
        //  This drives everything practically into reset
        //
        SetPCICardReset(DevExt);
    
        //
        //  Clear the Transfer length
        //
        ProgramDmaTransferLength (DevExt, 0);
    
        //
        //  Undo some of the resets
        //
        ClearPCICardReset(DevExt);
    
        //
        //  Disarm the Card!  - yes we are doing this underneath the
    //  app, sort of, except that either the app told us to CancelIo,
        //  or Windows is killing the thread...
        //
        ClearControlRegisterBits( DevExt, CTRL_BIT_ARM );
        SetControlRegisterBits( DevExt, CTRL_BIT_FLUSH_INPUT );
    
        //
        //  If these are equal, the DPC hasn't seen it yet
        //
        if (DevExt->CurrentReadIrp == Irp)
            DevExt->CurrentReadIrp = NULL;
    
    
        //
        //  Return the scatter/gather list we've been using
        //
        //  Third parameter is FALSE because this is a Read
        //  operation
        //
        DbgPrint("\nCancelActiveRead() Call PutScatterGatherList()\n");
        DevExt->ReadAdapter->DmaOperations->PutScatterGatherList(
                                            DevExt->ReadAdapter,
                                            DevExt->ReadScatterGatherList,
                                            FALSE);
        DbgPrint("\nCancelActiveRead() end PutScatterGatherList()\n");
        //
        //  Make sure we don't have a stale pointer
        //
        DevExt->ReadScatterGatherList = NULL;
    
    
        KeReleaseSpinLock(&DevExt->ImageReadStateLock, OldIrql);
        DbgPrint("\nCancelActiveRead() release ImageReadStateLock\n");
        //-------------------------------------------------------------------------
    
        .......  // other codes here
        .......
    }
    

    From WinDbg, I got:

    CancelActiveRead() enter()

    CancelActiveRead() get ImageReadStateLock

    CancelActiveRead() Call PutScatterGatherList()

    No output at all after this.

    Since I added two DbgPrint calls, just before and after "PutScatterGatherList()", I can see that the first DbgPrint is displayed but the second never shows up. And the target machine freezes.

    In our driver, the "PutScatterGatherList()" function is only used in "CancelActiveRead()" and the DPC function. Therefore I think "PutScatterGatherList()" taking forever to run is the root problem that caused the BSOD. Because it can never finish, it finally triggered the DPC_WATCHDOG_VIOLATION.

    I just do not understand what causes "PutScatterGatherList()" to never finish, and I still have no clue how to fix it.

  • zyezye Member Posts: 4

    We finally figured out the reason, so I'll give an update here; it may be helpful for others. After working with Microsoft and Scott (OSR), we figured out that Win10 1809 (RS5) has a bug in the Version 2 DMA adapter: the V2 API can loop forever when the MDL is out of range. We changed to the Version 3 DMA adapter and tested it. So far V3 works and we have not found any issues yet. Thanks a lot for everyone's help.

  • Lubo Member Posts: 7

    Can I get more information about the bug in the "Version 2" DMA somewhere? Was it fixed in the 1903 update? Is there a way to work around it other than forcing DMA Version 3 (which is not easily possible on our side at the moment)?

    The reason I am curious is that we are seeing a very similar problem with our PCIe driver on one customer's PC and nowhere else. They have the 1809 update installed. One set of dump files shows DPC_WATCHDOG_VIOLATION/DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED, and the other dump file set always points to our DPC routine at a specific code line: a division-by-zero bugcheck, although there is no division at that line (for the record, the only division in our DPC is done several lines before that, and the divisor is a cached performance counter frequency value).

    I don't want to start describing our problem in detail, just trying to figure out if this could be the same problem.

    Thank you!

  • Scott_Noone_(OSR) Administrator Posts: 3,151

    Does the call stack point to your driver calling PutScatterGatherList? That was the footprint of the bug we saw in the V2 version of the structure. It was acknowledged as a bug in 1809, but I didn't follow up to see whether it was fixed.

    -scott
    OSR

  • Lubo Member Posts: 7

    Hi Scott! (former student here) No reference to PutScatterGatherList at all in the stack. We do not actually do any DMA/SG-related work in the DPC. We just create a DMA transaction, fill a Common Buffer with SG descriptors at the very beginning, and then let the device fill the memory in circular-buffer fashion, interrupting each time an "image" is DMA'd (it's an image acquisition device... I mean, a camera).
    I might be on the wrong path then, but since the same driver has been working fine for years, the 1809 changes raised some suspicion. I am going to dig deeper, and will perhaps post a separate question if I get completely stuck.

    For reference, here are the two kinds of stack traces we are getting from multiple dumps (the customer sees a BSOD about once a day). PMPCIE is our driver.

    KMODE_EXCEPTION_NOT_HANDLED
    nt!KeBugCheckEx
    nt!KiFatalExceptionHandler+0x22
    nt!RtlpExecuteHandlerForException+0xf
    nt!RtlDispatchException+0x430
    nt!KiDispatchException+0x144
    nt!KiExceptionDispatch+0xc2
    nt!KiDivideErrorFault+0x2ef
    PMPCIE!PmEvtInterruptDpc+0xaa [c:\dev\...\pmpcieinterruptsupport.cpp @ 188] 
    Wdf01000!FxInterrupt::_InterruptDpcThunk+0x9d [minkernel\wdf\framework\shared\irphandlers\pnp\km\interruptobjectkm.cpp @ 410] 
    nt!KiExecuteAllDpcs+0x2e7
    nt!KiRetireDpcList+0x1ae
    nt!KiIdleLoop+0x5a
    
    DPC_WATCHDOG_VIOLATION
    nt!KeBugCheckEx
    nt!KeAccumulateTicks+0x144c93
    nt!KeClockInterruptNotify+0xcf
    hal!HalpTimerClockIpiRoutine+0x21
    nt!KiCallInterruptServiceRoutine+0xa5
    nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
    nt!KiInterruptDispatchNoLockNoEtw+0x37
    nt!MiResolveProtoPteFault+0x399
    nt!MiDispatchFault+0x359
    nt!MmAccessFault+0x15a
    nt!KiPageFault+0x343
    
  • Tim_Roberts Member - All Emails Posts: 13,035

    Can you show us the code in PmEvtInterruptDpc?

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • Lubo Member Posts: 7

    Hi Tim, I appreciate your help, but I am not sure I can post the whole code. Apologies if the digest below doesn't help. Perhaps I can send the whole thing via email?

    Below is a shortened version and an explanation of why it's done this way. Maybe you'll spot a problem right away.

    ISR (not shown here)

    1. The ISR can get fired very often; in extreme cases it could be 20,000x a second (e.g. at 20,000 FPS).
    2. HOWEVER: in this particular case the rate would be much lower, around 10 ISRs a second.
    3. Since we had issues with skipped frames at very high rates, we implemented an "ISR event queueing" scheme, as described here.
    4. The ISR is called on every frame written, i.e. the device interrupts: "hey, I just DMA'd a frame into your circ. buffer".
    5. The ISR retrieves a preallocated event item (ExInterlockedPopEntrySList) and fills the event with a simple frame counter and a timestamp from KeQueryPerformanceCounter.
    6. The ISR queues the event into a custom ISR-DPC queue (ExInterlockedInsertTailList, max 64 items capacity).
    7. The ISR then calls WdfInterruptQueueDpcForIsr (if the same interrupt arrives again before the DPC executes, WdfInterruptQueueDpcForIsr should fail, but the 'event' is still queued before that).
    8. The ISR routine returns.

    DPC (below)

    1. DPC iterates over the 'ISR event queue' (PmIrqEventDequeue -> ExInterlockedRemoveHeadList)
    2. DPC checks the mask for the EOF event. (there might be different flags used for other purposes)
    3. DPC queues a 'notification' event into another, different queue, this queue is processed by a thread on passive level that sends notifications to user space.
    4. (in short: there is a queue between ISR and DPC and different queue between DPC and passive level thread)
    5. Meanwhile, the ISR may get fired again, pushing another event onto the queue; that's why we limit the DPC with MAX_IRQ_EVENTS

    So far this has been working fine, although I'm not an experienced kernel developer; if there is a problem with the approach itself, I'll be happy to learn. I am also investigating the possibility that the division by Frequency really is the problem and the Frequency gets corrupted (it's updated at the start of the acquisition). But again, this has been flawless for years, and it still is on 99% of the computers we tested, even at very high frame rates. So far, just the one computer at the customer's site reports these weird bugchecks.

    Function itself

    VOID PmEvtInterruptDpc(IN WDFINTERRUPT WdfInterrupt, IN WDFOBJECT WdfDevice) {
        PDEVICE_EXTENSION DeviceExtension = GetDeviceContext(WdfDevice);
        ULONG ulProcessed = 0;
        PPM_IRQ_EVENT irqEvt = NULL;
    
        while ((irqEvt = PmIrqEventDequeue(DeviceExtension)) != NULL) {
            if( irqEvt->InterruptMask & PMPCIE_INT_MASK_EOF) {
                // We need to store the last timestamps because these are still used with status IOCTL
                DeviceExtension->TimeStamp = irqEvt->TimeStamp;
                DeviceExtension->TimeStamp.QuadPart -= DeviceExtension->AcquisitionStartTime.QuadPart;
                DeviceExtension->TimeStamp.QuadPart *= RES100US;
                DeviceExtension->TimeStamp.QuadPart /= DeviceExtension->Frequency.QuadPart; //< **** DIVISION HERE! ****
    
                PM_ACQ_METADATA acqMd;
                acqMd.EofCount = irqEvt->EofCount;
                acqMd.TimeStampEof = DeviceExtension->TimeStamp.QuadPart;
                PmQueueAcqEvent(DeviceExtension,PM_EVENT_DING_NOTIFICATIONS,(void *)NOTIFICATION_EOF, &acqMd); //< **** LINE 188 ****, bugcheck sometimes reported here, division by zero
                KeSetEvent(&DeviceExtension->AcqEvent,0,FALSE);
                irqEvt->InterruptMask &= (ULONG)(~PMPCIE_INT_MASK_EOF); // Clear the flag
            }
            ulProcessed++;
            // Push the event object back for reuse
            PmIrqEventReturnForReuse(DeviceExtension, irqEvt);
        // Since we could theoretically spend infinite time in this loop
        // (because the ISR can keep filling the queue), we should stop at some point and give up
            if (ulProcessed > MAX_IRQ_EVENTS) // Max 64 items
            {
                DbgPrint("PMPCIE: Too many IRQs pending in DPC, quitting.");
                break;
            }
        }
        return;
    }
    
  • Tim_Roberts Member - All Emails Posts: 13,035

    You might add ASSERT(DeviceExtension->Frequency.QuadPart); just before the division.

    Is RES100US equal to 10,000? Remember that, on some systems, the frequency reported by KeQueryPerformanceCounter is the CPU clock, and in that case it wouldn't take much of a multiplication to overflow a 64-bit integer.

    Have you disassembled the code to see if the exception is really occurring on a div instruction? It could be another case of optimization rearrangement. If you look at the code, it doesn't actually NEED to compute the division until it passes acqMd to PmQueueAcqEvent.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,380

    Can you please show us the code where the divide is being done?

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Lubo Member Posts: 7

    Peter, the division is done right on this line:
    DeviceExtension->TimeStamp.QuadPart /= DeviceExtension->Frequency.QuadPart;

    Tim, yes, RES100US is defined as 10000. Not sure about the overflow you mentioned; even if the dividend overflowed, it's the divisor Frequency that worries me (unless I misunderstood completely).

    Good point about the optimization! WinDbg is clear in this regard:

    FAULTING_IP: 
    PMPCIE!PmEvtInterruptDpc+aa [c:\...\pmpcieinterruptsupport.cpp @ 188]
    fffff802`80339926 48f7bfe0130000  idiv    rax,qword ptr [rdi+13E0h]
    

    And the disassembly (may not exactly correspond to the code in above post)

        fffff802`803398fa 747d           je      PMPCIE!PmEvtInterruptDpc+0xfd (fffff802`80339979)
        fffff802`803398fc 488b4b30       mov     rcx, qword ptr [rbx+30h]
        fffff802`80339900 4c8d4dd0       lea     r9, [rbp-30h]
        fffff802`80339904 482b8fd8130000 sub     rcx, qword ptr [rdi+13D8h]
        fffff802`8033990b 4c8b87f8130000 mov     r8, qword ptr [rdi+13F8h]
        fffff802`80339912 4869c110270000 imul    rax, rcx, 2710h
        fffff802`80339919 4c8987f0130000 mov     qword ptr [rdi+13F0h], r8
        fffff802`80339920 4899           cqo     
        fffff802`80339922 4c8945d8       mov     qword ptr [rbp-28h], r8
        fffff802`80339926 48f7bfe0130000 idiv    rax, qword ptr [rdi+13E0h] // <===
        fffff802`8033992d 8bd0           mov     edx, eax
        fffff802`8033992f 488987e8130000 mov     qword ptr [rdi+13E8h], rax
        fffff802`80339936 412bd0         sub     edx, r8d
        fffff802`80339939 488945e0       mov     qword ptr [rbp-20h], rax
        fffff802`8033993d 899700140000   mov     dword ptr [rdi+1400h], edx
        fffff802`80339943 8b4b28         mov     ecx, dword ptr [rbx+28h]
        fffff802`80339946 894dd0         mov     dword ptr [rbp-30h], ecx
        fffff802`80339949 8b4b2c         mov     ecx, dword ptr [rbx+2Ch]
        fffff802`8033994c 8955e8         mov     dword ptr [rbp-18h], edx
        fffff802`8033994f ba02000000     mov     edx, 2
        fffff802`80339954 894dd4         mov     dword ptr [rbp-2Ch], ecx
        fffff802`80339957 488bcf         mov     rcx, rdi
        fffff802`8033995a 448d4206       lea     r8d, [rdx+6]
        fffff802`8033995e e801efffff     call    PMPCIE!PmQueueAcqEvent (fffff802`80338864)
    

    Now, assuming the Frequency is really the culprit - which means we must have a synchronization problem - this raises a question:

    Can a div/0 exception in a DPC lead to a DPC_WATCHDOG_VIOLATION? Perhaps the OS is saving the stack trace, handling the exception, further delaying the DPC, and causing the watchdog to bugcheck sooner, essentially hiding the root cause in the other dump file?

    In any case, I now cache the Frequency only once, in EvtDeviceAdd (MSDN says it's set at system boot and never changes if we target modern OSes), and I am waiting for the customer to see if that helps. If it does, it will confirm the synchronization problem.

  • Tim_Roberts Member - All Emails Posts: 13,035

    My caution about overflow was not related to your div by zero problem. I'm just pointing out the potential math problems. I don't think it's an issue for you, because your multiplier is smallish, but let me describe how this kind of code can cause problems.

    On some systems, KeQueryPerformanceCounter is based on the CPU's cycle counter. That means the frequency is the clock frequency. Say your CPU is 3.6 GHz. That means the counter overflows a 32-bit integer every second. That's OK, the field is 64-bits wide.

    But some people try to convert to Windows' 100ns units by multiplying the counter by 10,000,000 and dividing by the frequency. That's a problem, because it only takes 9 minutes at 3.6GHz for cycle counter x 10,000,000 to overflow a 64-bit integer. You'll still get a result, but truncated.
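
    A small portable C sketch of the overflow-safe way to do this conversion (illustrative only; the helper name is invented): split the counter into whole seconds and a remainder before scaling, so no intermediate product comes anywhere near 2^64.

```c
#include <stdint.h>

/* Convert a QPC counter value to 100 ns units without overflowing.
 * The naive form, counter * 10000000 / freq, wraps once counter
 * exceeds UINT64_MAX / 10000000, i.e. after roughly 8.5 minutes of
 * counts at 3.6 GHz. */
static uint64_t CounterTo100ns(uint64_t counter, uint64_t freq)
{
    uint64_t secs = counter / freq;   /* whole seconds */
    uint64_t rem  = counter % freq;   /* leftover counts, always < freq */
    /* rem < freq, so rem * 10000000 stays far below 2^64 for any
     * realistic performance counter frequency */
    return secs * 10000000u + (rem * 10000000u) / freq;
}
```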

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • Scott_Noone_(OSR) Administrator Posts: 3,151

    Unlikely that the divide by zero is causing the watchdog. For the DPC_WATCHDOG_VIOLATION post the output of the following:

    !running -ti

    -scott
    OSR

  • Lubo Member Posts: 7

    Unfortunately !running -ti gives:

    6: kd> !running -ti
    GetUlongFromAddress: unable to read from fffff80126b602c4
    Couldn't find processor info
    

    Probably because this is an "Automatic" small minidump. Also, it seems we actually have only one dump with this watchdog error so far; it might be completely unrelated - the customer simply sent the whole Minidump folder... So far no reports of further crashes; fingers crossed this was the div/0 problem.

  • Scott_Noone_(OSR) Administrator Posts: 3,151

    Yeah, that's a minidump and it's not going to be much use for the DPC_WATCHDOG problem. The clipped version of the stack you provided looks reasonably normal, so the problem is probably on another processor. With a minidump you only get the one processor so it's a dead end. We'll all just pretend we didn't see that one :)

    -scott
    OSR

  • MichaelK Member Posts: 1

    Can a div/0 exception in DPC lead to a DPC_WATCHDOG_VIOLATION?

    It's not necessarily a div/0 exception; it might also be a division overflow (which happens when the result of the division doesn't fit in RAX). And it actually fits the instruction sequence you posted above:
    ...
    cqo
    ...
    idiv

    If rax has its MSB set and the divisor is small in magnitude, it can lead to the exception.
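
    For the record, the two conditions under which a 64-bit signed idiv faults (#DE) can be captured in a portable guard (a sketch; the function name is invented). Note that once the compiler has emitted cqo, the 128-bit dividend is just the sign extension of rax, so the only overflow case left in this particular sequence is INT64_MIN / -1:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 64-bit signed division would raise #DE on x86-64: either
 * a zero divisor, or a quotient that cannot fit in 64 bits (after cqo,
 * only INT64_MIN / -1 can produce such a quotient). */
static bool DivWouldFault(int64_t dividend, int64_t divisor)
{
    return divisor == 0 || (dividend == INT64_MIN && divisor == -1);
}
```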
