About cumulative DPC_WATCHDOG_VIOLATION

Hello,

I am confused about the interpretation of the documentation for DPC_WATCHDOG_VIOLATION with Parameter 1 = 1, i.e. the cumulative DPC_WATCHDOG_VIOLATION.

It says “The system cumulatively spent an extended period of time at IRQL DISPATCH_LEVEL or above”.
I am confused because it says “system” and not “core”. The description actually makes sense for a core. But if it is really about the entire system, does that mean all cores together can contribute to this BSOD? That is, is the time that two or more cores spend at DISPATCH_LEVEL added together when it is checked against the timeout value?

If it is really from the perspective of a single core, then I suppose there is no way to avoid it other than to stop scheduling DPCs back to back, right?

Also, is it right to assume that the driver called out in the BSOD can just be a victim and not really the culprit?

Regards,
Suresh

Responding to my own post as I now have some clarity, and also a follow-up question.

I confirmed that it is indeed per core and not per system (as in all the cores together). The !dpcwatchdog debugger extension tells you which core is causing it.

My driver is a Storport miniport. So, in order to avoid the violation, I thought I would queue a work item, with the assumption that the work item would subsequently be scheduled on the same core from which I am calling StorPortQueueWorkItem.
However, this assumption proved wrong!
I see that the work item gets scheduled on an arbitrary core, which I confirmed by using StorPortGetCurrentProcessorNumber both in the scheduling context (at DISPATCH_LEVEL) and in the subsequent worker thread (at PASSIVE_LEVEL). They don’t always match.
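
For reference, this is roughly the logging I used to confirm the mismatch. It is only a sketch: the routine bodies below are simplified stand-ins for the real Storport callback prototypes, and I am using KeGetCurrentProcessorNumberEx, the plain kernel equivalent of StorPortGetCurrentProcessorNumber, purely for illustration.

#include <ntddk.h>

// Called from within the miniport DPC, i.e. at DISPATCH_LEVEL.
VOID LogCoreFromDpc(VOID)
{
    PROCESSOR_NUMBER proc;
    KeGetCurrentProcessorNumberEx(&proc);
    DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
               "DPC on group %u, processor %u\n",
               (ULONG)proc.Group, (ULONG)proc.Number);
}

// Called from the queued work item, i.e. at PASSIVE_LEVEL.
// Under load this frequently reports a different core than the DPC did.
VOID LogCoreFromWorkItem(VOID)
{
    PROCESSOR_NUMBER proc;
    KeGetCurrentProcessorNumberEx(&proc);
    DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
               "Work item on group %u, processor %u\n",
               (ULONG)proc.Group, (ULONG)proc.Number);
}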

So, given this, how do I make sure that a given core reaches PASSIVE_LEVEL when I want it to? Is there any other way apart from using StorPortQueueWorkItem?

Regards,
Suresh

how do I make sure that a given core reaches PASSIVE_LEVEL when I want it to

Hmmmm… rapidly return from your DPC? And hope there aren’t a pile of DPCs queued behind yours?

Thanks Peter, but how to make it deterministic is my concern.

The issue (the cumulative DPC watchdog) happens only in certain heavy-load tests.
Even though I don’t have control over DPCs scheduled by others, I can briefly disable my own device’s interrupts. But when to re-enable them would be the question.

If there were a way to make sure that a given core has reached PASSIVE_LEVEL, and I thought scheduling a work item was one, then I could re-enable the interrupts from that PASSIVE_LEVEL thread.

I hear you: How about using StorPortSetSystemGroupAffinityThread in your worker thread?

I guess I don’t understand why you want to get scheduled on the same core. To avoid being the perpetrator of the watchdog timer violation, get out of your dpc ASAP. So the work item approach does that, right? When it runs, on whatever core, re-enable device interrupts as appropriate. You do not need to be on the same core to do this. There are of course other reasons to keep execution on the same set of cores, but they are performance-related, not related to this dpc watchdog problem.

I guess I don’t understand why you want to get scheduled on the same core

He’s concerned about the cumulative timer.

What the OP does not seem to understand is that, even if you DO schedule a work item (IRQL == PASSIVE_LEVEL) on the same core, that work item will not run until the DPC list has been emptied. That’s why I said “rapidly return from your DPC” and Mr. Roddy said “get out of your dpc ASAP”.

Nothing that you schedule at IRQL PASSIVE_LEVEL will ever take precedence over (and certainly will not de-schedule) anything running at IRQL DISPATCH_LEVEL (like your DPC, or any DPCs behind yours in the DPC list).

So, to make a long answer short, “rapidly return from your DPC”: do as little work in your DPC as possible, schedule your work item to do as much of the work as you safely can, and then get out of your DPC.
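
In sketch form (hypothetical device-extension and routine names; the real DPC and work-item callback prototypes come from storport.h):

#include <ntddk.h>

typedef struct _MY_DEVICE_EXTENSION {
    volatile LONG WorkPending;   // hypothetical "deferred work exists" flag
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

// DISPATCH_LEVEL: note that work exists and leave.  Every extra loop or
// register poll here adds to the time this core spends at DISPATCH_LEVEL.
VOID MyDpcRoutine(PMY_DEVICE_EXTENSION DevExt)
{
    InterlockedExchange(&DevExt->WorkPending, 1);
    // ...queue the PASSIVE_LEVEL work item (StorPortQueueWorkItem in a
    // miniport) and return immediately so the rest of this core's DPC
    // list can drain...
}

// PASSIVE_LEVEL: do the real processing here, where it does not count
// against the cumulative DPC watchdog.
VOID MyWorkItemRoutine(PMY_DEVICE_EXTENSION DevExt)
{
    if (InterlockedExchange(&DevExt->WorkPending, 0) != 0) {
        // ...complete requests, re-arm the device, etc...
    }
}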

I understand that the work item will not get a chance to run until the DPC queue on that core is exhausted. That’s why I am planning to disable the interrupts, so that no more DPCs get scheduled, at least from my driver.

I do plan to re-enable the interrupts from my work item. But what if the work item gets scheduled on another core while the DPC queue on the original core is still not depleted? My driver will just start queuing DPCs again.
So I want to re-enable the interrupts only after making sure that the core at DISPATCH_LEVEL has had a chance to briefly transition to PASSIVE_LEVEL, and a work item on that very core guarantees that.
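
In other words, the plan looks something like this (a sketch with a hypothetical device whose interrupt mask is a memory-mapped register; the register, mask values, and routine names are made up for illustration):

#include <ntddk.h>

#define MY_INT_DISABLE 0x0UL   // hypothetical: mask all device interrupts
#define MY_INT_ENABLE  0x1UL   // hypothetical: unmask device interrupts

typedef struct _MY_DEVICE_EXTENSION {
    PULONG IntMaskRegister;    // hypothetical mapped interrupt-mask register
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

// DISPATCH_LEVEL: stop the device from raising further interrupts (and so
// from queuing more of my DPCs), then defer the rest to a work item.
VOID MyDpcRoutine(PMY_DEVICE_EXTENSION DevExt)
{
    WRITE_REGISTER_ULONG(DevExt->IntMaskRegister, MY_INT_DISABLE);
    // ...queue the work item and return...
}

// PASSIVE_LEVEL work item: re-enable interrupts.  The catch described
// above: this may run on a different core while the original core's DPC
// list is still draining, so this alone does not prove that core ever
// got back to PASSIVE_LEVEL.
VOID MyWorkItemRoutine(PMY_DEVICE_EXTENSION DevExt)
{
    WRITE_REGISTER_ULONG(DevExt->IntMaskRegister, MY_INT_ENABLE);
}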

By the way, thanks Peter for the pointer to StorPortSetSystemGroupAffinityThread; it works exactly the way I want! Unfortunately, it is supported only from WS2022 onwards, and I need to support WS2019 too.
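
For completeness, the shape of what worked is roughly the following, sketched here with the plain kernel affinity APIs (KeSetSystemGroupAffinityThread / KeRevertToUserGroupAffinityThread) standing in for the StorPort wrapper. The helper name is mine, and TargetProc is assumed to have been captured earlier in the DPC.

#include <ntddk.h>

// PASSIVE_LEVEL work item: pin this thread to the core the DPC ran on.
// Getting scheduled there at all means that core has dropped back to
// PASSIVE_LEVEL, i.e. its DPC list has drained.
VOID ReenableInterruptsOnDpcCore(PROCESSOR_NUMBER TargetProc)
{
    GROUP_AFFINITY affinity;
    GROUP_AFFINITY previous;

    RtlZeroMemory(&affinity, sizeof(affinity));
    affinity.Group = TargetProc.Group;
    affinity.Mask  = (KAFFINITY)1 << TargetProc.Number;

    KeSetSystemGroupAffinityThread(&affinity, &previous);

    // ...re-enable the device interrupt here...

    KeRevertToUserGroupAffinityThread(&previous);
}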

OK… so, I think I finally understand what you’re trying to do. You want some sort of signal that the core on which your DPC is running has returned to IRQL PASSIVE_LEVEL, so that you know for sure you can re-enable interrupts (which will fire another DPC) without incurring the dread cumulative DPC watchdog violation.

Your thinking is that this is better than simply queuing a work item to run on ANY processor, then re-enabling interrupts… because if you do that, you don’t KNOW that the core has returned to PASSIVE_LEVEL and, if it hasn’t, then you haven’t “fixed” anything in terms of the cumulative DPC watchdog.

OK… that’s correct. But, consider a few things: First, the cumulative DPC watchdog timeout is very large. For you to get a cumulative DPC watchdog violation, SOMEBODY is spending far more time than they should at IRQL DISPATCH_LEVEL. Not a little too much time, but like WAY too much time.

Second, you have no knowledge or control of how much time has been spent at IRQL DISPATCH_LEVEL before your DPC is entered, or how many DPCs remain on the DPC list after your DPC object. This means that all you can do is return from your DPC as quickly as possible each time your DPC is invoked. That’s really the only control you have. If you always do that, then you’re doing just about all your driver can do to reduce the time the system spends at IRQL DISPATCH_LEVEL. If there’s ANOTHER driver in the system that is misbehaving, you’re not going to be able to fix that no matter what… even if you DO delay re-enabling your interrupts until the core has returned to PASSIVE_LEVEL (thereby potentially diminishing the performance of your device/driver). The other driver could STILL be spending too much time in its DPC.

In short, I would suggest the proper mitigation for this problem is for your driver to “rapidly return from your DPC” or, as a wise man once said, “get out of your dpc ASAP”. While I admire your desire for a more definitive solution, I really don’t think there is one to be found. If you’re already doing this (and I mean REALLY returning ASAP from your DPC), and problems persist… the issue is another driver in the system and not yours.

Peter

So basically your plan is to starve your device to mitigate bad behavior by other devices. That is really very considerate of you. I personally would just minimize my dpc time.

Thank you Peter and Mark! The advice is pretty clear now and is reasonable.

Note also that under really heavy load, the worker at PASSIVE_LEVEL should not just handle the one ‘thing’ that triggered the interrupt, but should also check for more ‘things’ that have arrived on the device and would otherwise trigger a new interrupt immediately, before you re-enable interrupts. Polling at PASSIVE_LEVEL will perform better than interrupts at very high data rates. Hybrid algorithms are complex to implement, but can substantially improve performance.
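
A sketch of that hybrid loop, with a hypothetical device (DeviceHasPendingWork, ProcessOneCompletion, and the interrupt enable/disable helpers are made-up names standing in for real register accesses):

#include <ntddk.h>

typedef struct _MY_DEVICE_EXTENSION MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

// Hypothetical helpers standing in for real hardware access.
BOOLEAN DeviceHasPendingWork(PMY_DEVICE_EXTENSION DevExt);
VOID    ProcessOneCompletion(PMY_DEVICE_EXTENSION DevExt);
VOID    EnableDeviceInterrupts(PMY_DEVICE_EXTENSION DevExt);
VOID    DisableDeviceInterrupts(PMY_DEVICE_EXTENSION DevExt);

// PASSIVE_LEVEL worker: drain everything that has already arrived, then
// close the race between "queue looked empty" and "interrupts back on"
// by checking one more time after unmasking.
VOID MyWorkerRoutine(PMY_DEVICE_EXTENSION DevExt)
{
    for (;;) {
        while (DeviceHasPendingWork(DevExt)) {
            ProcessOneCompletion(DevExt);   // polling: no interrupt needed
        }

        EnableDeviceInterrupts(DevExt);

        if (!DeviceHasPendingWork(DevExt)) {
            break;   // truly idle; the next interrupt restarts the cycle
        }

        // Work slipped in just before the unmask: mask again and keep
        // polling rather than taking an interrupt for it.
        DisableDeviceInterrupts(DevExt);
    }
}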
