Hi,
Please help us with a DPC timeout BSOD issue on the Win10 1809 version. We have a customized PCIe card that captures data from another device. We developed a WDM driver that has worked well for many years, from Win7 through Win10 1703. But now the same driver gets a BSOD on Win10 1809, as below (we did not try the Win10 1709 and 1803 versions):
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff80644d5c380, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding this single DPC timeout
The problem only happens in the DPC for the DMA interrupt. In the DMA's DPC function we have code like:
....
KeAcquireSpinLockAtDpcLevel(&devExt->ImageReadStateLock);
readIrp = devExt->CurrentReadIrp;
if (!readIrp) {
    ....
    KeReleaseSpinLockFromDpcLevel(&devExt->ImageReadStateLock);
}
else {
    devExt->ReadAdapter->DmaOperations->PutScatterGatherList(
        devExt->ReadAdapter,
        devExt->ReadScatterGatherList,
        FALSE);
    devExt->ReadScatterGatherList = NULL;
    ....
}
The dump file shows that the PutScatterGatherList() call stalled and took a long time, which finally triggered the DPC_WATCHDOG_VIOLATION.
I really do not know how to fix this, since the same code has worked well for many years. Is it possible that some Win10 1809 (or maybe 1803) security policy configuration can cause this problem?
Please help! Thanks a lot!
Comments
I've never used code like this in the drivers I maintain. Anyway, in the code fragment you posted, KeAcquireSpinLockAtDpcLevel() is called on entry, but KeReleaseSpinLockFromDpcLevel() is only called in the "if (!readIrp)" path, not in the "else" part.
If the release is really missing in the full code, and not only in the fragment you posted, that smells like the cause of a big problem. Just an idea.
Thanks a lot for your reply. We do have KeReleaseSpinLockFromDpcLevel() in the else part; I just did not show it. As I said, this driver has been very stable for many years, from Win7 through Win10 1703. The dump file shows that PutScatterGatherList() cannot finish in a short time, which I do not understand.
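The full DPC is shaped roughly like this, with the release on both paths (a sketch; the elided work is summarized in comments):

KeAcquireSpinLockAtDpcLevel(&devExt->ImageReadStateLock);
readIrp = devExt->CurrentReadIrp;
if (!readIrp) {
    // ... no read in flight: just clean up ...
    KeReleaseSpinLockFromDpcLevel(&devExt->ImageReadStateLock);
}
else {
    devExt->ReadAdapter->DmaOperations->PutScatterGatherList(
        devExt->ReadAdapter,
        devExt->ReadScatterGatherList,
        FALSE);
    devExt->ReadScatterGatherList = NULL;
    // ... complete the read IRP, start the next transfer, etc. ...
    KeReleaseSpinLockFromDpcLevel(&devExt->ImageReadStateLock);  // the release I elided
}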
I think I've answered this question off-line... but just for the record:
The crash you’re seeing isn’t unusual… it simply is telling you that you’re spending too much time in your DpcForIsr, as described in your dump. You're allowed 500 clock ticks, you spent 501 clock ticks... see more info in the DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK as described in the crash.
The only fix for this is to discover WHY you’re spending so much time in your DpcForIsr, and not do that. Seriously. Do some profiling in your DpcForIsr… figure out exactly what’s going on.
This is the only way to avoid this issue.
How much time are you spending waiting for that lock? Is it highly contended? Do you loop in your DpcForIsr (doing "the grand loop"?)... if so, do you limit the number of iterations you use in your loop?
Windows is becoming increasingly strict in enforcing "proper behavior" in drivers. As you know, when YOU spend lots of time in YOUR DpcForIsr, you risk increasing latency for OTHER drivers in the system... to the point where those drivers can start to fail. The ONLY fix for this is to ensure that your driver is "well behaved"...
So, sorry... no quick fix available for this one!
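For example, a crude first step is to bracket the suspect region with KeQueryPerformanceCounter and complain when something is slow. A minimal sketch, using the names from the fragment you posted (the 100us threshold and the DbgPrint are illustrative only):

LARGE_INTEGER freq, t0, t1, t2;
LONGLONG lockUs, workUs;

t0 = KeQueryPerformanceCounter(&freq);
KeAcquireSpinLockAtDpcLevel(&devExt->ImageReadStateLock);
t1 = KeQueryPerformanceCounter(NULL);

// ... the existing DPC work, including PutScatterGatherList() ...

t2 = KeQueryPerformanceCounter(NULL);
KeReleaseSpinLockFromDpcLevel(&devExt->ImageReadStateLock);

// Convert each phase to microseconds and flag anything slow.
lockUs = ((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart;
workUs = ((t2.QuadPart - t1.QuadPart) * 1000000) / freq.QuadPart;
if (lockUs > 100 || workUs > 100) {
    DbgPrint("DPC slow: lock wait %I64d us, work %I64d us\n", lockUs, workUs);
}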
Peter
Peter Viscarola
OSR
@OSRDrivers
(I'm working with the OP, who is a former WDF student of mine, offline... I'll ask him to post-back here once his problem is rectified.)
Peter Viscarola
OSR
@OSRDrivers
The DPC watchdog timer is set REALLY long; 0x500 ticks is about 20 seconds (assuming 16ms ticks). Are you really spending 20 seconds in one DPC? Typically, if you spend 20 seconds in a DPC or ISR, you're stuck on a spinlock or something. There are also two forms of DPC_WATCHDOG timeout: one is a per-DPC limit, and one is for back-to-back DPCs, which includes DPCs from different drivers. Your driver might have to be frugal with time because another driver is greedy, which is all the more reason your own driver should not be greedy.
It's REALLY easy to add ETW tracing with the TraceLogging API; see my example on this list from 6 months ago, or follow the basic MSFT doc pages. I often use the PerfView tool on GitHub to look at traces. TraceView also does the trick, especially if you use the ETW name-to-GUID hashing tool (etwguid); then you can just put *MyTraceProvider into TraceView instead of the GUID. Message Analyzer seems to currently be broken (again) for TraceLogging traces.
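To give a flavor of how little code it takes, here's a minimal kernel-mode sketch (the provider name and GUID below are placeholders; generate your own GUID for a real driver):

#include <TraceLoggingProvider.h>

TRACELOGGING_DEFINE_PROVIDER(
    g_TraceProvider,
    "MyCompany.MyPcieDriver",
    (0x3a4c9e1f, 0x2b07, 0x4d8a, 0x9c, 0x1d, 0x5e, 0x6f, 0x70, 0x81, 0x92, 0xa3));

// DriverEntry:   TraceLoggingRegister(g_TraceProvider);
// DriverUnload:  TraceLoggingUnregister(g_TraceProvider);

// In the DPC, bracket the suspect call so any stall shows up in the trace:
TraceLoggingWrite(g_TraceProvider, "PutSgListEnter");
devExt->ReadAdapter->DmaOperations->PutScatterGatherList(
    devExt->ReadAdapter, devExt->ReadScatterGatherList, FALSE);
TraceLoggingWrite(g_TraceProvider, "PutSgListExit");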
Some kinds of drivers will absolutely need DPC_WATCHDOG avoidance code: probably any driver that can receive unbounded, fast data streams. The classic example is a high-speed network card. Ethernet packets just arrive, and you can't simply use more and more DPC time as the receive rate goes up.
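The usual avoidance pattern looks something like this: drain a bounded amount of work per DPC invocation, then requeue yourself (or re-arm the interrupt) and give everyone else a turn. A sketch only; DEVICE_EXTENSION and the helper routines are hypothetical:

#define MAX_ITEMS_PER_DPC 64

VOID MyDpcForIsr(PKDPC Dpc, PVOID Context, PVOID SysArg1, PVOID SysArg2)
{
    PDEVICE_EXTENSION devExt = (PDEVICE_EXTENSION)Context;
    ULONG processed = 0;

    while (processed < MAX_ITEMS_PER_DPC && DeviceHasPendingWork(devExt)) {
        ProcessOneItem(devExt);
        processed++;
    }

    if (DeviceHasPendingWork(devExt)) {
        // Budget exhausted: requeue and return, so other DPCs can run.
        // (Back-to-back requeueing can still starve the processor, which
        // is why real NICs re-arm the interrupt instead.)
        KeInsertQueueDpc(Dpc, SysArg1, SysArg2);
    } else {
        ReenableDeviceInterrupt(devExt);
    }
}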
Jan
> The DPC watchdog timer is set REALLY long, 0x500 ticks is about 20 seconds (assuming 16ms ticks).
I believe the term "ticks" here actually refers to 100ns timing units,
so 0x500 actually means 128µs.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
And that would make sense, because the "official" MSFT guideline is to spend "no more than 100us" in your DPC per invocation.
The OP has suspended work on this issue temporarily, but I have counseled him to do his due diligence in terms of debugging/profiling. There is, I suppose, a vanishingly small chance that he's discovered a Windows bug (there have been ongoing changes to the DMA code to support DMA remapping)... and I don't have source code to look at right now to be able to run the issue down for him.
Peter
Peter Viscarola
OSR
@OSRDrivers
You can get the DPC watchdog timeout information from a dump using !dpcwatchdog. Looking at a DPC_WATCHDOG_VIOLATION dump (thank you Google and tech support forums for thy bounty), you can see the timeout is 20 seconds for a single DPC and 120 seconds cumulative.
This was a Windows 10 RS3 target, so it's possible the numbers are smaller on RS4/RS5.
-scott
OSR
Another fun fact for today that I just learned: the DPC watchdog appears to be disabled if you have the debugger attached. I tried to force a machine to crash by infinitely looping on a spinlock in a DPC, and nothing happened. Breaking in and running !dpcwatchdog confirmed it.
Makes sense that this crash might come as a surprise to someone who did most of their test/validation with a debugger attached.
-scott
OSR
Perhaps a request for a documentation update is in order, to clarify what unit the watchdog "ticks" are in.
It would be pretty easy for the OP to write a few lines of code to measure what the limit actually is: raise IRQL to DISPATCH_LEVEL, loop around incrementing a counter with a 10us stall in each iteration, and see how many times you loop before getting a DPC_WATCHDOG crash.
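Something like this (it deliberately crashes the machine, so run it only on a disposable test box; outside a real DPC this exercises the cumulative watchdog rather than the single-DPC one, but the counter left in the dump still gives you the limit in 10us units):

volatile ULONG64 g_StallLoops;   // examine in the crash dump afterwards

VOID MeasureDpcWatchdogLimit(VOID)
{
    KIRQL oldIrql;

    KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
    for (;;) {
        KeStallExecutionProcessor(10);   // stall 10 microseconds
        g_StallLoops++;
    }
    // Never reached; the watchdog bugchecks first.
}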
There is also another watchdog timer, which basically detects when a core is no longer responding to timer ticks. All the cores get timer ticks, and there is code that makes sure all cores are responding to them. A core can sometimes be stuck "outside" OS execution, for example trapped back to a hypervisor. I believe the number I've seen recently is that if 4 timer ticks happen and a core doesn't react, a timer watchdog bugcheck happens. If, for example, you have two processor sockets and one socket gets into a fatal error state, half the cores will stop responding to timer ticks. Another case happens if, say, you do a PCI config read and the device keeps responding with a retry status; that core is basically locked on that read and not responding to interrupts. A lot of complex I/O devices process PCI config requests in firmware, so if that firmware locks up while the OS is doing a config cycle, what you see is an OS timer watchdog bugcheck.
Even though the watchdog timers can be annoying to developers, I do believe they are a really important feature for detecting hardware and software errors.
Jan
There are two different and basically unrelated DPC watchdogs.
The single DPC watchdog measures how long a KDEFERRED_ROUTINE executes. If it's more than 20 seconds†, you get a bugcheck 133/0.
The cumulative DPC watchdog measures how long a processor stays at DISPATCH_LEVEL (or higher). If it's more than 120 seconds†, you get a bugcheck 133/1.
Note that the cumulative DPC watchdog doesn't actually have anything to do with DPCs. You can trigger it from a regular boring thread, just by doing KeRaiseIrql(DISPATCH_LEVEL), then waiting for a couple minutes -- no DPC involved. (Another common repro: trying to recursively acquire a spinlock.)
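Spelled out, the recursive-acquire repro is just this (deliberately broken code):

KSPIN_LOCK Lock;
KIRQL OldIrql1, OldIrql2;

KeInitializeSpinLock(&Lock);
KeAcquireSpinLock(&Lock, &OldIrql1);
KeAcquireSpinLock(&Lock, &OldIrql2);   // spins forever at DISPATCH_LEVEL;
                                       // ~120 seconds later: bugcheck 133/1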
Also note that the cumulative DPC watchdog complains about situations that are not always under your control. If someone ran their crappy code at high IRQL for 119 seconds, then handed the processor to your DPC, you might happen to be the unlucky chap on the processor when the 120 second timer expires. Recent versions of Windows include callstack captures of the last 10 seconds† before a 133/1 is declared, so you can see what other things were going on just prior to the bugcheck.
@Scott_Noone_(OSR) said:
> the DPC watchdog appears to be disabled if you have the debugger attached
I am not aware of that behavior. However, DPC watchdogs are disabled in most varieties of VM guests†. Use physical hosts, if you need to test against the DPC watchdog.
If you are currently running in a DPC (an actual DPC, not just raised IRQL), you can use the KeQueryDpcWatchdogInformation API to learn what the DPC watchdog limits are†, and maybe how close you are to exceeding them†. However, using this API correctly requires a deep understanding of the NT kernel and its DPC watchdog, so we generally recommend that you not use it, except possibly for debugging/diagnostics. To properly scare you away, let me point out that the returned DpcWatchdogCount has some super strange behavior. (It only makes sense if you are intimately familiar with how the DPC watchdog is implemented. Give me a whiteboard and a beer, and I can explain it in about 20 minutes.) Also, the behavior of DpcWatchdogCount changed somewhere between Windows 8.1 and Windows 10, and is likely to subtly change again.
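For diagnostics only, then, a sketch of sampling the counters from inside a DPC (how to interpret the counts varies by OS version†, so treat the output as a hint, not a contract):

KDPC_WATCHDOG_INFORMATION wdInfo;

if (NT_SUCCESS(KeQueryDpcWatchdogInformation(&wdInfo))) {
    DbgPrint("DPC: %lu/%lu ticks; cumulative: %lu/%lu ticks\n",
             wdInfo.DpcTimeCount, wdInfo.DpcTimeLimit,
             wdInfo.DpcWatchdogCount, wdInfo.DpcWatchdogLimit);
}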
> Some kinds of drivers will absolutely need DPC_WATCHDOG avoidance code
^^This^^. If your driver has no bound on the number of DPCs that get queued up, then your driver has a 133/1 bugcheck waiting for it.
† Implementation detail, subject to change in any release, even a QFE. In fact, some of these details have already changed since the DPC watchdog was invented.
> I'm pretty sure the DPC watchdog limit is a lot longer than 128 uSec, although 20 seconds does seem like an extremely long time.
I suppose we should do an experiment to find out, but I think it really is 128us. I encountered this bug check a year or two ago, and it was for an operation much less than a millisecond.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Thanks very much for all your replies. After two weeks of desperate debugging, I unfortunately still cannot fix this problem, but I want to give an update here. We have opened a ticket with the Microsoft WDK group and provided two dump files to Microsoft engineers. However, no solution has been found yet.
I really do not think our DPC function is the root cause. The same driver code has been used for almost 10 years, from Win7 through Win10 (1607, 1703, 1709 and 1803), with no problems. Only on the latest Win10 1809 does the BSOD happen consistently (with the same hardware).
My debugging shows the root cause is the PutScatterGatherList() call inside our DPC: PutScatterGatherList() never finishes, which I do not understand.
To prove this, I did another test in which I disabled all the interrupts in our driver so that the DPC never runs. Since PutScatterGatherList() is also used in our CancelActiveRead() function, I can produce another BSOD by canceling the DMA read alone, without the DPC running.
Our "CancelActiveRead()" function has the code like:
From the WinDbg, I got
"CancelActiveRead() enter()
CancelActiveRead() get ImageReadStateLock
CancelActiveRead() Call PutScatterGatherList()
"
NO ANY output after this.
Since I added two dbgprints just before and after the "PutScatterGatherList()", I can only see the first dbgprint is displayed but the second dbgprint never show up. And the target machine freezes.
In our driver, the "PutScatterGatherList()" function is only used in "CancelActiveRead()" and DPC function. Therefore I think the "PutScatterGatherList()" take forever to run is the root problem caused the BSOD. Because it can not finish forever, finially triggered the DPC_WATCHDOG_VIOLATION.
I just do not understand what causes the "PutScatterGatherList()" can not finish. And I still have no clue how to fix it.
We finally figured out the reason, and I'll give an update here since it may be helpful for others. After working with Microsoft and Scott (OSR), we figured out that Win10 1809 (RS5) has a bug in the Version 2 DMA adapter: the V2 API can loop forever when the MDL is out of range. We changed to the Version 3 DMA adapter and tested it. So far V3 works and we have not found any issues yet. Thanks a lot for everyone's help.
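For reference, the switch is essentially just requesting Version 3 in the device description passed to IoGetDmaAdapter. A sketch; every field other than Version is a placeholder for your device's real properties:

DEVICE_DESCRIPTION devDesc;
ULONG nMapRegs;

RtlZeroMemory(&devDesc, sizeof(devDesc));
devDesc.Version           = DEVICE_DESCRIPTION_VERSION3;  // was the V2-era value
devDesc.Master            = TRUE;
devDesc.ScatterGather     = TRUE;
devDesc.Dma64BitAddresses = TRUE;
devDesc.InterfaceType     = PCIBus;
devDesc.MaximumLength     = MAX_READ_TRANSFER;            // hypothetical constant

devExt->ReadAdapter = IoGetDmaAdapter(PhysicalDeviceObject, &devDesc, &nMapRegs);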
Can I get more information about the bug in the Version 2 DMA adapter somewhere? Was it fixed in the 1903 update? Is there a way to work around it other than forcing DMA Version 3 (which is not easily possible on our side at the moment)?
The reason I am curious is that we are seeing a very similar problem with our PCIe driver on one customer's PC and nowhere else. They have the 1809 update installed. One set of dump files shows DPC_WATCHDOG_VIOLATION / DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED, and the other dump file set always points to our DPC routine at a specific code line: a division-by-zero bugcheck, although there is no division at that line (for the record, the only division in our DPC is done several lines before that, and the divisor is a cached performance counter frequency value).
I don't want to start describing our problem in detail, just trying to figure out if this could be the same problem.
Thank you!
Does the call stack point to your driver calling PutScatterGatherList? That was the footprint of the bug we saw in the V2 version of the structure. It was acknowledged as a bug in 1809, but I didn't follow up to see whether it was fixed.
-scott
OSR
Hi Scott! (former student here) No reference to PutScatterGatherList at all in the stack. We do not actually do any DMA/SG-related stuff in the DPC. We just create a DMA transaction, fill a Common Buffer with SG descriptors at the very beginning, and then let the device fill the memory in a circular-buffer fashion, interrupting each time an "image" is DMA'd (it's an image acquisition device, I mean... camera). I might be on the wrong path then, but since the same driver has been working fine for years, the 1809 changes raised some suspicion. I am going to dig deeper, and will perhaps post a separate question if I get completely stuck.
For reference, here are the two kinds of stack traces we are getting from multiple dumps (the customer sees a BSOD about once a day). PMPCIE is our driver.
Can you show us the code in PmpEvtInterruptDpc?
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Hi Tim, I appreciate your help, but I am not sure I can post the whole code. Apologies if the digest below doesn't help; perhaps I can send the whole thing via email?
Below is a shortened version and an explanation of why it's done this way. Maybe you'll spot a problem right away.
ISR (not shown here):
- pops a free "event" entry (ExInterlockedPopEntrySList), fills the event with a simple frame counter and a timestamp from KeQueryPerformanceCounter,
- inserts it into the event queue (ExInterlockedInsertTailList, max 64 items capacity),
- calls WdfInterruptQueueDpcForIsr (if the same interrupt arrives again before the DPC is executed, WdfInterruptQueueDpcForIsr should fail, but the "event" is still queued before that).
DPC (below):
- dequeues events (PmIrqEventDequeue -> ExInterlockedRemoveHeadList),
- processes at most MAX_IRQ_EVENTS of them.
So far this has been working fine, although I'm not an experienced kernel developer; if there is a problem with the approach itself I'll be happy to learn. I am also investigating the possibility that the division by Frequency is really the problem and the Frequency gets corrupted (it's updated at the start of the acquisition). But again, this has been flawless for years, and it still is on 99% of the computers we tested, even with very high frame rates. Just the one computer at the customer's site reports these weird bugchecks so far.
Function itself
You might add ASSERT(DeviceExtension->Frequency.QuadPart); just before the division.
Is RES100US equal to 10,000? Remember that, on some systems, the frequency returned by KeQueryPerformanceCounter is the CPU clock, and in that case it doesn't take much of a multiplication to overflow a 64-bit integer.
Have you disassembled the code to see if the exception is really occurring on a div instruction? It could be another case of optimization rearrangement. If you look at the code, it doesn't actually NEED to compute the division until it passes acqMd to PmQueueAcqEvent.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Can you please show us the code where the divide is being done?
Peter
Peter Viscarola
OSR
@OSRDrivers
Peter, the division is done right on this line:
DeviceExtension->TimeStamp.QuadPart /= DeviceExtension->Frequency.QuadPart;
Tim, yes, RES100US is defined as 10000. I'm not sure about the overflow you mentioned; even if the dividend overflowed to zero, it's the divisor Frequency that worries me (unless I misunderstood completely).
Good point about the optimization! WinDbg is clear in this regard, and the disassembly (which may not exactly correspond to the code in the above post) shows the exception really is on the cqo/idiv sequence.
Now, assuming the Frequency is really the culprit, which means we must have a synchronization problem, this raises a question. In any case, I now cache the Frequency only once, in EvtDeviceAdd (MSDN says it's set at system boot and never changes if we target modern OSes), and I'm waiting to hear from the customer whether that helps. If yes, it will confirm the synchronization problem.
My caution about overflow was not related to your div-by-zero problem. I'm just pointing out the potential math problems. I don't think it's an issue for you, because your multiplier is smallish, but let me describe how this kind of code can cause problems.
On some systems, KeQueryPerformanceCounter is based on the CPU's cycle counter. That means the frequency is the clock frequency. Say your CPU is 3.6 GHz. That means the counter overflows a 32-bit integer every second. That's OK, the field is 64-bits wide.
But some people try to convert to Windows' 100ns units by multiplying the counter by 10,000,000 and dividing by the frequency. That's a problem, because it only takes 9 minutes at 3.6GHz for cycle counter x 10,000,000 to overflow a 64-bit integer. You'll still get a result, but truncated.
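The overflow-safe way to do that conversion is to split off the whole seconds before scaling (and, given the rest of this thread, guarding the divisor costs nothing). A sketch:

LARGE_INTEGER freq;
LARGE_INTEGER counter = KeQueryPerformanceCounter(&freq);
LONGLONG t100ns = 0;

if (freq.QuadPart != 0) {
    LONGLONG secs = counter.QuadPart / freq.QuadPart;  // whole seconds
    LONGLONG rem  = counter.QuadPart % freq.QuadPart;  // leftover ticks, < freq
    // rem * 10000000 stays well inside 64 bits for any realistic frequency.
    t100ns = secs * 10000000LL + (rem * 10000000LL) / freq.QuadPart;
}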
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Unlikely that the divide by zero is causing the watchdog. For the DPC_WATCHDOG_VIOLATION, post the output of the following:
!running -ti
-scott
OSR
Unfortunately, !running -ti gives nothing useful, probably because this is an "Automatic" small minidump. Also, it seems we actually have only one dump with this watchdog error so far; it might be completely unrelated, as the customer simply sent the whole Minidump folder... So far no reports of further crashes; fingers crossed it was the div/0 problem.
Yeah, that's a minidump, and it's not going to be much use for the DPC_WATCHDOG problem. The clipped version of the stack you provided looks reasonably normal, so the problem is probably on another processor. With a minidump you only get the one processor, so it's a dead end. We'll all just pretend we didn't see that one.
-scott
OSR
It's not necessarily a div/0 exception; it might also be a division overflow (which happens when the result of the division doesn't fit in RAX). And it actually fits the instruction sequence you posted above:
...
cqo
...
idiv
If RAX has the MSB set and the divisor is too small, it might lead to the exception.
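For what it's worth, with a cqo/idiv pair the only 64-bit operands that actually overflow are INT64_MIN divided by -1 (the quotient would be +2^63, which doesn't fit). A tiny user-mode demo of that flavor of #DE:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    volatile long long num = LLONG_MIN;  // 0x8000000000000000
    volatile long long den = -1;         // volatile defeats constant folding
    printf("%lld\n", num / den);         // integer-overflow fault on x64
    return 0;
}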