Raising IRQL on one core blocks threads on other cores?

I don’t know, but if I had this problem I would use windbg to look at what
the fork was going on with all the other ‘cpus’.

Mark Roddy

Most of us have seen cores get “lost” by a deadlock on a spin lock;

Well, this is a slightly different story, albeit a similar one

In case of a spinlock-related deadlock you are, indeed, just bound to deadlock completely at some point. If a CPU tries
to acquire the target spinlock in this situation, it is just bound to deadlock. Since spinlocks normally get acquired on code paths that happen to be generally useful, and hence get executed on a more or less regular basis, all cores are eventually going to get taken out of play at some point, trying to acquire a lock that will never ever get released.

However, in our (HYPOTHETICAL, of course) case, we lose the ability to execute DPCs that are queued to the DPC queue of the spinning core.
The same DPC cannot get enqueued more than once, can it? Therefore, every time an interrupt gets routed to that target core, we are just bound to “lose” the DPC that the ISR queues - it will get placed in the target core’s queue, where it will never run.

then it eventually descends into a sort of weird, dead, state for the reasons Mr. Bassov lists

Fair enough - in case of a spinlock-related deadlock you may start noticing some “funny” effects well before all CPUs are actually deadlocked, and it is going to happen because of “DPC starvation” that I have described above

Another point to consider is that, in order to get the observations that the OP gets (i.e. of a frozen GUI), all we have to do is to “lose” a SINGLE
DPC (i.e. the one that gets queued by the mouseclass driver) this way…

Anton Bassov

What a long reply. Concision is a virtue, Mr. Bassov.

In case of a spinlock- related deadlock you are, indeed, just bound to deadlock completely at some point

Well… yes and no. It depends on the paths in which the spin lock is acquired and what’s happening on the device.

In case of a spinlock-related deadlock you may start noticing some “funny” effects well before all CPUs are actually deadlocked, and it is going to happen because of “DPC starvation” that I have described above

Which, you know, is what I said…

the observations that the OP gets (i.e. of a frozen GUI), all we have to do is to “lose” a SINGLE DPC

Well… yes? More accurately, we need the USB HCD to queue a DPC on the core that is no longer servicing the DPC list (because it’s “otherwise engaged” at IRQL DISPATCH_LEVEL). It’d be bad luck for this to happen right away. And even in this case, you’d still see other UI elements working. The keyboard won’t immediately die.

In summary, as interrupts eventually get routed to the “busy” core, the DPCs that service the “bottom half” of those interrupts will be blocked, and no further DPC will be queued for those device instances for which a DPC is pending, as Mr. Bassov correctly notes.

So, again… the system “eventually descends into a sort of weird, dead, state” – “eventually” here might be a few seconds. But I can’t see this being immediate.

Mr. Roddy’s comment is really the most insightful: Just look at what’s happening with WinDbg, and we can all stop guessing?

Peter

Wow, thank you all for the very insightful replies. As it stands, I’ve been testing this on my local machine, which may not be the best idea. If I can get it working in a Virtual Machine then I think I can take a look with WinDbg at what’s going on. Very new to this all, so I apologize for my ignorance.

Hmmmmm… I’m not sure I’d trust results on a VM.

Get a (physical machine) test system hooked up.

ETA: More cores would be better, as well. Get a test system with 4 or 8 cores and you should definitely see the slower degradation we’d expect. On a 2 core machine, you have a 50% chance of interrupts happening on the “locked” core. Also, get something with GUI elements running (they’ll run until, you know, the graphics card gets hung).

Peter

There are (mostly older) systems with chipsets that cause ALL interrupts to occur on CPU 0 instead of being evenly spread among CPUs. Regardless, if an interrupt occurs on the CPU that you are spinning and there’s a DPC involved, it won’t run until the spinning DPC is done so that may lock up the GUI / system.

//Daniel

um no?

If there is an interrupt on the CPU that is running a DPC, the interrupt service routine will run and then the DPC will resume. Also, a DPC is not a hardware interrupt; it cannot by itself block other CPUs.

Mark Roddy

That’s what I think I said. The spinning DPC will resume. Perhaps an exception to this is if a HighImportance DPC is queued, then it should run immediately.

//Daniel

finding any machine in 2020 that has only 2 cores seems hard, but I think that many current desktop grade systems do route all interrupts to cpu 0 even in 2020.

in any case it will be a highly platform specific behavior that will certainly change between each physical machine and between VMs on different hypervisors and with differences in the physical machines that they run on too

Perhaps an exception to this is if a HighImportance DPC is queued, then it should run immediately.

Absolutely not. There is never DPC preemption.

All setting the Importance of a DPC does is influence (a) whether the DPC Object is queued at the head or the tail of the DPC List, and (b) if the DPC is targeted to a processor other than the current processor, whether an IPI is sent to the remote processor to inform it that it now has something on the DPC List that needs to be processed.

Peter

Absolutely not. There is never DPC preemption.

That’s what I would expect too, but here’s what the docs say about HighImportance:
Place the DPC at the beginning of the DPC queue, and begin processing the queue immediately.

If KeInsertQueueDpc is executed by the ISR, can’t it process the DPC queue before the ISR returns?

//Daniel

If KeInsertQueueDpc is executed by the ISR, can’t it process the DPC queue before the ISR returns

No. How could this work? The ISR is running at DIRQL and holding the interrupt spin lock… we need to run the DPC at IRQL DISPATCH_LEVEL and not holding the lock.

How could DPC preemption work, given everything we know about DPCs?

Peter

Place the DPC at the beginning of the DPC queue, and begin processing the queue immediately.

… IF the kernel is not already processing the queue. That’s the missing clause here.

It’s difficult to write what is exactly correct in this situation concisely. I would write it to read:

Place the DPC at the beginning of the DPC list, and cause the list to be evaluated the next time the IRQL is about to drop below IRQL DISPATCH_LEVEL. If the DPC is queued to a processor other than the current processor, an IPI is sent to that processor, thus causing the DPC list on that processor to be serviced subsequent to the IPI.

Peter

the docs say about HighImportance:
Place the DPC at the beginning of the DPC queue, and begin processing the queue immediately.

Regardless of its priority, a DPC cannot preempt ANY code that runs at IRQL > APC_LEVEL. Full stop. Otherwise, it would simply violate the most fundamental principles of Windows design.

In order to understand why, consider what happens if you hold a spinlock and your code gets preempted by a DPC - if that DPC tries to acquire the lock in question, a deadlock is guaranteed.

In fact, raising IRQL to DISPATCH_LEVEL is just a way of saying “This code must not get preempted by anyone, apart from an ISR, because I may hold a lock”…

Anton Bassov

There are (mostly older) systems with chipsets that cause ALL interrupts to occur on CPU 0 instead of being evenly spread among CPUs.

Something tells me that you must be speaking about the chipsets from VIA Technologies…

Anton Bassov

If the DPC is queued to a processor other than the current processor, an IPI is sent to that processor

It’s interesting to realize that the system needs to be able to queue a DPC no matter what it was previously doing. The DPC queue cannot use a spinlock of some sort - what would it do if an interrupt occurred while the lock was already held? Still, I don’t see why the IPI is required. Why shouldn’t any CPU be able to access the DPC queue on any other CPU?

Something tells me that you must be speaking about the chipsets from VIA Technologies.

Possibly - from what I see, it’s becoming more a thing of the past. They are great for uninterrupted live audio, if you set the affinity of your threads to a CPU >= 1.

//Daniel

Why shouldn’t any CPU be able to access the DPC queue on any other CPU ?

It can. That’s not the reason for the IPI. DPCs with Medium and Low importance can also be queued to a non-local processor. The IPI is only generated for High Importance DPCs, as a way to cause the “remote” processor to immediately evaluate the DPC List. Otherwise, the remote processor’s DPC List wouldn’t be evaluated until the next time there was a raise and subsequent lowering of IRQL above DISPATCH_LEVEL.

Peter

The DPC queue cannot use a spinlock of some sort,

Of course it can…

The only thing you have to do is to elevate IRQL to a level above DIRQL when accessing this lock, and everything will work just fine.
Therefore, it cannot be a “regular” spinlock, i.e. the one that you acquire with KeAcquireSpinLock()

what should it do if an interrupt occurs while the lock was already held ?

As long as the current IRQL is above DIRQL there is simply no chance for an ISR to run until IRQL gets lowered. Therefore, there is no problem here whatsoever…

Why shouldn’t any CPU be able to access the DPC queue on any other CPU ?

Of course it should - otherwise it would be simply unable to queue a DPC to some other processor’s queue

Still I don’t see why the IPI is required.

As Peter explained to you already, it does so simply in order to make the target CPU check its DPC queue.

OTOH, I was under the impression that high- and medium-priority DPCs are always enqueued to the current processor’s queue, so that only a low-priority one may end up in some other processor’s queue. If this is the case, then there is no urgency with it whatsoever, so requesting an IPI seems a bit extreme…

Anton Bassov

OTOH, I was under the impression that high-and-medium-priority DPCs are always enqueued to the current processor’s queue,

Nope. Importance and Target Processor are separate parameters.

When the docs say “begin processing the queue immediately” read “generate an IPI if the target processor is other than the current processor.”

Peter