Changing DPC time allotment (in ticks) to help slow non-updated drivers avoid 0x133 watchdog BSODs

Hi,

Some drivers for older equipment have not been updated (and may never be) to behave well with newer Windows 10 builds (e.g. 2004), so they start very slowly, especially when waking from hibernation, which leads to the beloved DPC_WATCHDOG_VIOLATION BSODs. The error does not appear every time, since the drivers mostly manage to finish within the five hundred ticks threshold (which translates to about twenty seconds in human terms), but I must allow the drivers to do their job for a little longer than five hundred ticks; maybe six hundred ticks would be enough for the error to go away altogether.
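As a rough check of the tick arithmetic above: five hundred decimal ticks would only come to about eight seconds at the usual Windows clock interval, so the twenty-second figure fits better if the 500 is read as hexadecimal (bug check parameters are printed in hex). The sketch below assumes a 15.625 ms tick and the hex reading; both are assumptions, not something confirmed in this thread, and `ticks_to_seconds` is just an illustrative helper.

```python
# Assumed clock tick length in milliseconds (the common Windows default).
# The real value can differ per machine, so treat this as an assumption.
TICK_MS = 15.625


def ticks_to_seconds(ticks, tick_ms=TICK_MS):
    """Convert a number of clock ticks to seconds."""
    return ticks * tick_ms / 1000.0


# The same digits read as decimal and as hexadecimal.
readings = {
    "500 (decimal)": 500,
    "0x500 (hex)": 0x500,
    "600 (decimal)": 600,
    "0x600 (hex)": 0x600,
}

for label, ticks in readings.items():
    print(f"{label:>14}: {ticks:5d} ticks = {ticks_to_seconds(ticks):.2f} s")
```

Under these assumptions, raising the allotment from 0x500 to 0x600 ticks would buy roughly four extra seconds.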

I could not find proper documentation on how this threshold should be changed (I was surprised a bit, considering that MS maintains a giant site which has almost everything in triple or quadruple versions but not this).

I can create the “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog” registry branch, but the only official documentation I found is this: https://docs.microsoft.com/en-us/windows-hardware/drivers/display/disabling-the-watchdog-timer-while-testing-display-drivers. It only covers a subset of drivers, namely display drivers, not all drivers as I need, and it does not offer a way to change the magical five hundred ticks threshold anyway.

So, does anyone know how to change the DPC time allotment (in ticks) threshold for the watchdog? Or, if that is not possible, how to turn off the DPC watchdog not just for display drivers but altogether? (I know it is not recommended, but I need to solve the issue in any way possible for the time being).

Thanks in advance.

Assuming that this is a closed system, this might be reasonable to do. But other than through a specific support arrangement with Microsoft, I don’t think you will even find a supported or documented way to do this

@MBond2 said:
Assuming that this is a closed system, this might be reasonable to do. But other than through a specific support arrangement with Microsoft, I don’t think you will even find a supported or documented way to do this

I am surprised that MS does allow the watchdog to be turned off for display drivers to aid both driver developers and users of slow video card operations (for cryptocurrency mining or scientific calculations) but does not officially allow it for any other types of drivers which also might require some freedom during development or use.

You’re surprised? Really? I gotta tell you, any driver that spends 20 seconds in a DPC is badly broken. Heck, cut that to 2 seconds, or even 200 milliseconds, and I’d STILL say it was broken. Surely there are event-based mechanisms of handling it that don’t require camping out with interrupts blocked.

+1 with emphasis.

Peter

Let’s probe a bit more. The example exceptions are for ‘cryptocurrency mining or scientific calculations’. How do these pertain in any way to a need for a long time running at DPC?

Reading between the lines, I think they’re implying that systems dedicated to heavily compute-intensive tasks often use sucky graphics cards with poorly written drivers.

@MBond2 said:
Let’s probe a bit more. The example exceptions are for ‘cryptocurrency mining or scientific calculations’. How do these pertain in any way to a need for a long time running at DPC?

There is a lot of that on the net. People really do use the watchdog-off feature for display drivers.

You do know that display drivers are consistently the #1 or #2 category of driver with the most bugs, right? This, as measured by crashes in the field.

Peter

@“Peter_Viscarola_(OSR)” said:
You do know that display drivers are consistently the #1 or #2 category of driver with the most bugs, right? This, as measured by crashes in the field.

Peter

Indeed; I just wish MS would also make a registry setting for other drivers, too. And add a setting for the DPC time allotment threshold.

@Tim_Roberts said:
You’re surprised? Really? I gotta tell you, any driver that spends 20 seconds in a DPC is badly broken. Heck, cut that to 2 seconds, or even 200 milliseconds, and I’d STILL say it was broken. Surely there are event-based mechanisms of handling it that don’t require camping out with interrupts blocked.

This is not solely a driver issue. For example, older Windows 10 versions did not hang on me and newer versions do, even though not a single driver was updated (I have older hardware).

So MS has made changes in the OS that affected the performance of the drivers, but I have not seen a log of the underlying low-level HAL changes, so I am not sure what happened to leave me with regular hangs and DPC watchdog violation errors that never occurred before with the very same drivers.

And add a setting for the DPC time allotment threshold.

But that makes absolutely no sense, architecturally. The max time spent in a DPC is set for the overall good health of the system… not for the benefit of one driver.

As Mr Roberts noted, the limit is enormous and measured in seconds. Drivers (including my own drivers) used to get past the limit on older versions of Windows by queuing a new DPC (and thus causing the timer to reset). Newer versions of Windows prevent this mitigation by having an “overall time spent at DPC level” watchdog. We can argue whether that is a good idea or not (I would argue it is not) but that really doesn’t matter… cuz it’s what we’ve got to live with.

Peter
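The difference between the two watchdogs described above can be pictured with a toy model. This is only an illustrative sketch in Python, not how the kernel implements it: the class, the function, and the 25-second cumulative limit are made up for the example, and the 20-second per-DPC limit is simply the figure quoted in this thread.

```python
import math
from dataclasses import dataclass


@dataclass
class WatchdogModel:
    """Toy model of the per-DPC and cumulative DPC watchdogs (hypothetical)."""
    single_limit: float       # max seconds allowed for one DPC
    cumulative_limit: float   # max total seconds spent at DPC level
    cumulative: float = 0.0   # running total across back-to-back DPCs

    def run_dpc(self, duration):
        """Run one DPC; return the watchdog that would fire, or None."""
        if duration > self.single_limit:
            return "single-DPC"
        self.cumulative += duration
        if self.cumulative > self.cumulative_limit:
            return "cumulative"
        return None


def run_chain(model, chunk_seconds, count):
    """Queue `count` back-to-back DPCs of `chunk_seconds` each."""
    for _ in range(count):
        verdict = model.run_dpc(chunk_seconds)
        if verdict is not None:
            return verdict
    return None


# Older behaviour in this model: only the per-DPC limit exists, so splitting
# 30 s of work into three requeued 10 s DPCs slips past the watchdog.
old_style = WatchdogModel(single_limit=20.0, cumulative_limit=math.inf)
print("old, 3 x 10 s:", run_chain(old_style, 10.0, 3))

# Newer behaviour in this model: the cumulative limit catches the same trick.
new_style = WatchdogModel(single_limit=20.0, cumulative_limit=25.0)
print("new, 3 x 10 s:", run_chain(new_style, 10.0, 3))
```

In this model, requeuing only defeats the per-DPC check; once total time at DPC level is tracked, splitting the work no longer helps.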

Again, how does this pertain in any way to a need for a long time running at DPC? What sort of ‘features’ are we talking about?

We are talking about older drivers that have worked absolutely fine with older builds of Windows 10 but not so much now. Since the drivers are not going to be updated, one of the solutions would be to make the DPC time allotment threshold bigger. Most of the time there are still no errors as the drivers, even though slow, manage to finish their job just before the threshold. So changing it by just a little bit will already solve the issue.

Last I looked at this there is no way for a driver or admin to control this value. A hypervisor can disable it for a guest (see hv-relaxed in QEMU) and it’s a little longer if Verifier is enabled, but that’s it.

So, even though configuring the value might fix the symptom you’re having it’s not an option. You might want to debug the crashes further and see if there’s something else contributing that you might have control over. Otherwise you’re stuck without access to the source/developers.

Thanks.

To debug things, I recorded the OS start-up with Windows Performance Recorder and opened the resulting giant 4.2 GB report file with Windows Performance Analyzer, but I cannot find the time span when the drivers are loaded to see which of them are slower than the others. Not sure if this will ultimately help me, but at least I will better understand what is going on.

Does anyone know how to navigate the report and find the drivers there?

What kind of work does the driver do for which it needs 20 seconds in a DPC: spinning on physical I/O? Heavy calculations? Scanning all the RAM or kernel address space?
– pa

Most of the time there are still no errors as the drivers, even though slow, manage to finish their job just before the threshold

You make it sound like the drivers, being old, are tired and don’t move as quickly as newer drivers :wink:

I think I already explained why some drivers that would work on older versions of Windows are “caught” by the DPC watchdog.

Let me go out on a limb here: Any driver that spends a huge amount of time in a DPC (which I will arbitrarily define as any integer number of milliseconds that requires 4 digits) is broken. It was broken when it was written. It is broken now. It will be broken in the future. The fact that Windows didn’t DO anything to flag such drivers as broken in earlier versions of the OS does not alter this fact. I can’t tell you the number of times that we’ve had clients with well-written drivers that were delayed, and thus had to deal with error conditions, because of other drivers in the system that were misbehaving. This is what the watchdog attempts to flag/curtail.

When you’re running in kernel-mode, you’re part of the operating system. You have to play by the rules of the operating system. If you don’t, all the OTHER parts of the OS are subject to error. Unfortunately, not every dev is sufficiently schooled, sufficiently adept, or has sufficient time to ensure that their OS extensions are working properly. Hence… we have the DPC watchdog.

Peter