Threaded DPCs [was Thread scheduler and deferred procedure calls]

Anton, let’s move to a new er… thread :)

xxxxx@hotmail.com wrote:
>> Also, NT6 has a new type of “threaded” DPCs that can be preempted by
>> realtime threads
>
> There is no such thing as “threaded DPC” - if you give DPCs their own
> stack that can get swapped, then it already becomes a workitem or a
> dedicated thread, depending on whether you process multiple DPCs in the
> context of the same thread or give each DPC its own stack. An execution
> unit like that becomes eligible for scheduling and, hence, can block and
> be synchronized by dispatcher synchronization constructs rather than
> spinlocks, and, in all respects, is not subject to the limitations imposed
> by being a DPC.
>
> In other words, this “new type of DPCs” has been around since the
> beginning of NT…

These threaded DPCs confuse me too.
Mainly by the fact that sometimes they can run at passive,
and sometimes at dispatch. As you noticed, completely
different sync mechanisms are needed in each case.
And checking IRQL each time on entry to these DPCs is expensive.

But usually a DPC immediately acquires some spinlock to
sync with the rest of the driver, which brings it back to dispatch.
Then the only effect of threaded DPCs is that realtime
threads may get ahead of them.

Regards,
–PA
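
Pavel’s spinlock point is easy to make concrete. Below is a minimal sketch with hypothetical names (MY_DEVICE_EXTENSION, MyDeferredRoutine); KeAcquireSpinLock/KeReleaseSpinLock are the documented calls, the rest is illustration only:

```c
#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {    // hypothetical per-device data
    KSPIN_LOCK Lock;
    ULONG      PendingCount;
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

//
// Deferred routine that immediately takes the driver's spin lock. If it runs
// as a threaded DPC at PASSIVE_LEVEL, KeAcquireSpinLock raises IRQL to
// DISPATCH_LEVEL anyway, so the body still executes at dispatch; the only
// remaining difference is that a real-time thread could have preempted the
// routine before the lock was taken.
//
VOID
MyDeferredRoutine(
    _In_ PKDPC Dpc,
    _In_opt_ PVOID DeferredContext,
    _In_opt_ PVOID SystemArgument1,
    _In_opt_ PVOID SystemArgument2
    )
{
    PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)DeferredContext;
    KIRQL oldIrql;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    // Correct whether we were entered at PASSIVE_LEVEL or DISPATCH_LEVEL.
    KeAcquireSpinLock(&devExt->Lock, &oldIrql);

    devExt->PendingCount++;              // stand-in for real work on shared state

    KeReleaseSpinLock(&devExt->Lock, oldIrql);
}
```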

> These threaded DPCs confuse me too. Mainly by the fact that sometimes they can run at passive,
> and sometimes at dispatch.

Well, running them at DPC level just defeats the whole purpose in the first place, because at that point they already cannot be preempted, unless NT6 just introduced different logic for handling real-time threads…

Anton Bassov

“threaded DPCs” were new in, ah, XP maybe… They were added speculatively for later use? Instead of running the DPC routine at DISPATCH_LEVEL they run it at IRQL PASSIVE_LEVEL.

IIRC, this was one of those things added around the time people were searching for better solutions to smooth media streaming. To the best of my knowledge, they’ve never really been used. However, I haven’t actually gone LOOKING for drivers that create threaded DPCs.

Hey… if you want a passive-level callback and you don’t want to create your own thread pool… perhaps threaded DPCs are for you.

P
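
For anyone who wants to try what Peter describes, the plumbing is small. A minimal sketch with hypothetical names (MY_DEVICE_EXTENSION, MyInterruptServiceRoutine, MyThreadedDpcRoutine); KeInitializeThreadedDpc (documented for Vista/NT6 and later) and KeInsertQueueDpc are the real calls, everything else is illustration:

```c
#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {    // hypothetical per-device data
    KDPC  ThreadedDpc;
    ULONG InterruptCount;
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

KDEFERRED_ROUTINE MyThreadedDpcRoutine;

// Called once, e.g. during device start, to set up the DPC object.
VOID
MySetupThreadedDpc(
    _In_ PMY_DEVICE_EXTENSION DevExt
    )
{
    // Same KDPC object as always; only the initializer differs.
    KeInitializeThreadedDpc(&DevExt->ThreadedDpc,
                            MyThreadedDpcRoutine,
                            DevExt);
}

// ISR: runs at DIRQL, where work items are off limits, but queuing a DPC
// (threaded or not) is allowed.
BOOLEAN
MyInterruptServiceRoutine(
    _In_ PKINTERRUPT Interrupt,
    _In_opt_ PVOID ServiceContext
    )
{
    PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)ServiceContext;

    UNREFERENCED_PARAMETER(Interrupt);

    // ... acknowledge/quiesce the device here ...

    KeInsertQueueDpc(&devExt->ThreadedDpc, NULL, NULL);
    return TRUE;
}

// Deferred routine: may be entered at PASSIVE_LEVEL (threaded) or at
// DISPATCH_LEVEL (if threaded DPCs are disabled), so it must not assume
// either.
VOID
MyThreadedDpcRoutine(
    _In_ PKDPC Dpc,
    _In_opt_ PVOID DeferredContext,
    _In_opt_ PVOID SystemArgument1,
    _In_opt_ PVOID SystemArgument2
    )
{
    PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)DeferredContext;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    InterlockedIncrement((volatile LONG *)&devExt->InterruptCount);
}
```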

Peter,

if you want a passive-level callback and you don’t want to create your own thread pool…

… then queuing a workitem seems to be the right solution for you, don’t you think? What we are actually trying to establish here is what makes these “threaded DPCs” so terribly different from workitems. What makes them basically different from “normal” DPCs is plainly obvious, i.e. the ability to block execution…

Anton Bassov

From your ISR?

THAT’S the difference. Quite frankly, I was confused about why these were a good idea since I saw the code hit the depot. I don’t see it myself. But… you know… Cutler wrote it, so we just must be too dumb to see its brilliance.

Peter
OSR

Pithyness aside for a second…

So a work item guarantees you are in the ‘system’ process context or some such, right?

A threaded DPC runs as PASSIVE_LEVEL and if I understand Peter’s comment, can be queued from IRQL >= DISPATCH_LEVEL.

So that restriction is easily solved in work-item-world with a DPC that queues a work item, so threaded DPCs must somehow be different even from that, 'cause that alone seems just a bit lame.
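
For comparison, the “lame but simple” pattern just mentioned - a normal DPC whose only job is to queue a work item - looks roughly like this (hypothetical names; the work item is assumed to have been created earlier with IoAllocateWorkItem):

```c
#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {    // hypothetical per-device data
    PIO_WORKITEM WorkItem;               // allocated earlier with IoAllocateWorkItem()
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

IO_WORKITEM_ROUTINE MyPassiveWorker;

// Ordinary DPC routine, DISPATCH_LEVEL: its only job is to hand off to a
// work item so the real processing runs later at PASSIVE_LEVEL in the
// system process context.
VOID
MyDpcRoutine(
    _In_ PKDPC Dpc,
    _In_opt_ PVOID DeferredContext,
    _In_opt_ PVOID SystemArgument1,
    _In_opt_ PVOID SystemArgument2
    )
{
    PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)DeferredContext;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    IoQueueWorkItem(devExt->WorkItem, MyPassiveWorker, DelayedWorkQueue, devExt);
}

VOID
MyPassiveWorker(
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_opt_ PVOID Context
    )
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Context);

    // PASSIVE_LEVEL, system process context: pageable data, waits, etc. are fine.
}
```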

Reading the FM (WDK DOCS) leads me to conclude that threaded DPCs were not created to make it easy for a driver writer to run a callback at PASSIVE_LEVEL by contract, but rather as a way to make it possible for the system to prioritize *some* DPC-like activity behind real-time threads. The rather terse “introduction to threaded dpcs” sort-a says “hey, it might run at PASSIVE_LEVEL, it might run at DISPATCH_LEVEL, be ready for either” as a way of warning that it is really just a DPC after all and you should not use routines that *require* IRQL == DISPATCH_LEVEL unless you explicitly raise IRQL.
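
As a concrete reading of that warning, here is a sketch of the “explicitly raise IRQL” escape hatch; the callee is a placeholder, only KeRaiseIrql/KeLowerIrql are real:

```c
#include <wdm.h>

// Hypothetical helper called from a threaded DPC routine. If a callee
// genuinely requires IRQL == DISPATCH_LEVEL, raise explicitly rather than
// assuming it, because the routine may have been entered at PASSIVE_LEVEL.
// Raising to the level you are already at is legal, so this also works in
// the ordinary-DPC case.
VOID
MyDoDispatchLevelWork(
    VOID
    )
{
    KIRQL oldIrql;

    KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);

    // ... placeholder: call whatever actually requires DISPATCH_LEVEL ...

    KeLowerIrql(oldIrql);
}
```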

So, as it kinda says, it’s a DPC but not that important that it needs to keep all threads from running. You might use it for shutting off the lawn sprinkler but not pop’n up the bagel in the toaster. Nobody likes burnt bagels but you can never put too much water on the grass (in MA in July anyway).

Does the threaded DPC ‘steal’ context and have other behaviors like being able to target a CPU, etc. like a real (non-threaded) DPC? I don’t see how it can make a context guarantee nor how to set a CPU affinity but that might just be me.

I’m just really curious 'cause someday I am sure I will have to sit through a design review where the presenter claims to have solved world hunger, perpetual motion, and gravity compensation just through the use of threaded DPCs and I would like to be able to ‘measure’ the goodness.

Thx.

-dave


From your ISR?

Of course - apparently, this is exactly what you do by queuing this “threaded DPC”. This is nothing more than just a question of terminology…

I was confused about why these were a good idea since I saw the code hit the depot.

Well, the advantage is obvious - in some cases a DPC is unnecessary, because the only thing it does is queue a workitem that does all the work. The only question is why they could not see it from the very beginning and originally disallowed queuing workitems from an ISR. When they realized that it can sometimes be beneficial, they allowed it, but presented the whole thing not as a workitem but as some “threaded DPC”, which is very typical of MSFT - they just could not miss their chance of “introducing a new feature”…

Anton Bassov

"On server systems, where overall system performance is more important than
system latency, threaded DPCs work in the identical manner as ordinary DPCs
do. " Argh, I hope that is missing the words “by default”. If you know how
many audioheads use server system because they don’t want the nonsense and
because the timeslice is longer which reduces the chance that their realtime
(block processing) thread is preempted before it finishes.

Still I am not seeing how threaded DPCs are going to be of any help to a
realtime application unless they become the common way of scheduling work in
the kernel and take over from ordinary DPCs. I think it’s not going to be
possible for the system to substitute threaded DPCs for third-party normal
DPCs, because it cannot detect the logic that depends on them.

//Daniel


wrote in message news:xxxxx@ntdev…
> Peter,
>
>> if you want a passive-level callback and you don’t want to create your
>> own thread pool…
>
> … then queuing a workitem seems to be the right solution for you, don’t
> you think. What we are actually trying to establish here is what makes
> these “threaded DPCs” so terribly different from workitems. What makes
> them basically different from the “normal” DPCs is plainly obvious, i.e.
> the ability to block execution…
>

Work items are not an option in ISRs due to IRQL restrictions.

//Daniel

> Work items are not an option in ISRs due to IRQL restrictions.

Am I correct that a threaded DPC is the same as a work item but can be queued from the ISR?


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Based on the WDK docs, the answer seems to be a resounding NO.

  1. A Threaded DPC does not make a context guarantee. A Work Item does.
  2. A Threaded DPC runs at IRQL <= DISPATCH_LEVEL. A Work Item runs at PASSIVE_LEVEL.
  3. A Threaded DPC may be targeted to a specific CPU (NT6).
  4. An administrative (registry) option controls if Threaded DPCs are enabled or simply treated as ‘normal’ DPCs.

It seems that Threaded DPCs are simply to allow a driver to tell the OS the task to be performed is lower priority than real-time threads and can wait until no real-time threads are runnable on any (or the target) processor. If the OS has Threaded DPCs disabled, the driver gets a normal DPC.
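
On point 3: the object behind a threaded DPC is still an ordinary KDPC, so the usual targeting and importance calls appear to apply to it as well. A sketch under that assumption (hypothetical init routine; I have not verified the threaded combination against the docs):

```c
#include <wdm.h>

// Hypothetical initialization helper: set up a threaded DPC, bind it to
// processor 0, and lower its importance relative to other DPCs. The system
// still decides when, and at what IRQL, the routine actually runs.
VOID
MyInitTargetedThreadedDpc(
    _Out_ PKDPC Dpc,
    _In_ PKDEFERRED_ROUTINE Routine,
    _In_opt_ PVOID Context
    )
{
    KeInitializeThreadedDpc(Dpc, Routine, Context);

    KeSetTargetProcessorDpc(Dpc, 0);          // assumed to work as for normal DPCs
    KeSetImportanceDpc(Dpc, LowImportance);   // likewise an assumption
}
```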

It would appear to be a design mistake to assume Threaded DPCs are a way of scheduling a PASSIVE_LEVEL callback. That behavior is *possible* but not part of the contract. The WORK_ITEM remains as the mechanism to supply that capability.

It sounds like a grand way to keep certain DPC activity from messing up streaming media or other work which is thread based.

-Dave


When threaded, threaded DPCs will always run at IRQL PASSIVE_LEVEL.

But, think about it. The contract has to call for <= DISPATCH_LEVEL because an admin can “shut off” threaded DPCs with a registry parameter. Thus, your driver is forced to treat a “threaded DPC routine” as a “regular DPC routine” just in case.

It seems to me like a big mess. Like many other things about Windows drivers, the behavior of YOUR driver is ultimately determined by what OTHER drivers do in the system. So, if everybody ELSE’s drivers used threaded DPCs, and YOUR pseudo-real-time multimedia signal processing driver used real DPCs, you’d be better off.

It’s back to the case of knowledgeable devs using the system appropriately again. If everybody actually followed the rules and did only highly time critical stuff in their ISRs, and limited the work to only what’s necessary in their DPCs, we’d all be fine… and probably wouldn’t need to even discuss threaded DPCs.

Peter
OSR

> If only you knew how many audioheads use server systems because they don’t want the nonsense
> and because the timeslice is longer, which reduces the chance that their realtime
> (block-processing) thread is preempted before it finishes.

Quantum duration becomes meaningful only when there are multiple threads of the same priority
in the ready state. If your thread of interest is of sufficiently high real-time priority (and, apparently, the audio application designer will ensure it), quantum duration is of no importance whatsoever. The only practical consequence of a longer quantum for an interactive user is a slow GUI. This is why buying a server-grade OS because you want to be able to run audio applications is just a total no-brainer…
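
For what “the audio application designer will ensure it” usually amounts to, here is a minimal user-mode sketch (hypothetical worker; real audio applications typically go through MMCSS instead, but the effect on the scheduling argument above is the same):

```c
#include <windows.h>

// Hypothetical block-processing worker.
DWORD WINAPI AudioBlockWorker(LPVOID Context)
{
    UNREFERENCED_PARAMETER(Context);
    // ... process a block, hand it to the device, wait for the next ...
    return 0;
}

int main(void)
{
    HANDLE thread;

    // Real-time priority class plus time-critical thread priority yields
    // priority 31, so only other priority-31 threads, DPCs, and ISRs can
    // get in the way.
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

    thread = CreateThread(NULL, 0, AudioBlockWorker, NULL, 0, NULL);
    if (thread != NULL) {
        SetThreadPriority(thread, THREAD_PRIORITY_TIME_CRITICAL);
        WaitForSingleObject(thread, INFINITE);
        CloseHandle(thread);
    }
    return 0;
}
```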

I think it’s not going to be possible for the system to substitute threaded DPCs for third-party
normal DPCs, because it cannot detect the logic that depends on them.

It may be possible, but they would have to find some common denominator for all types of DPCs, which would require quite a lot of changes in the kernel. Under the currently existing model it is impossible. For example, if a DPC routine calls KeAcquireSpinLockAtDpcLevel() and you invoke it at low IRQL, this opens the possibility of a deadlock.

Still I am not seeing how threaded DPCs are going to be of any help to a realtime application
unless they become the common way of scheduling work in the kernel and take over from ordinary DPCs.

That’s for sure - either the scheduler has control over absolutely all DPCs, or it has control over none, because any DPC that it has no control over may put it out of play for as long as it wishes…

Anton Bassov

wrote in message news:xxxxx@ntdev…
> Quantum duration becomes meaningful only when there are multiple threads
> of the same priority
> in the ready state. If your thread of interest is of sufficiently high
> real-time priority (and, apparently, the audio application designer will
> ensure it), quantum duration is of no importance whatsoever. The only
> practical consequence of a longer quantum for interactive user is slow
> GUI. This is why buying server-grade OS because you want to be able to
> run audio applications is just a total no-brainer…
>

Audio is processed in blocks by realtime threads before they are sent down.
You do not want to consume your timeslice before you have finished
processing your block; that means your processing thread needs to be
rescheduled and is sure to cause latency. Note that all this is not about
performance, it’s about latency. That’s why audioheads are running server
platforms as audio workstations. I agree that a workstation OS can be
optimized for background applications as well, but the idea that multimedia
applications only run on a client OS is just not realistic.

//Daniel

> You do not want to consume your timeslice before you have finished processing your block;
> that means your processing thread needs to be rescheduled and is sure to cause latency.
> Note that all this is not about performance, it’s about latency.

Rescheduling the currently running thread involves no latency whatsoever - it will just resume execution, and that’s it. You don’t have to remove it from the run list if you give it another timeslice, do you? The problem you have described may occur:

  1. If there are some other threads of the same real-time priority on the ready list. In that case they will be given their timeslices after your thread’s quantum expires, which will, indeed, result in some latency - you have to wait until you are given the CPU again.

  2. If your thread gets preempted by some thread of a higher priority. However, if that happens, it will happen regardless of quantum duration.

In any case, it would be unwise for you to run multiple real-time applications concurrently, because, as a result, none of them will be real-time. Assuming that there is only one real-time application around at the moment, the problem may occur only if the application itself is poorly designed.

That’s why audioheads are running server platforms as audio workstations.

That’s just because they are “audioheads”. Do you really think they have any idea about the thread quantum??? What apparently attracts them is the fact that a server-grade OS offers higher performance -
they just don’t understand that this higher performance becomes apparent only when there are thousands of threads running simultaneously. Otherwise, the only thing they get is a slow GUI.

the idea that multimedia applications only run on client OS is just not realistic.

Well, multimedia applications may run everywhere, but, apparently, “audioheads” use their computers not only for music, don’t you think? As a result of using a server-grade OS, they get exactly the same performance from multimedia applications and a noticeable slowdown of “regular” GUI ones…

Anton Bassov


  3. They had a limited quantum, but because it was higher on a server OS, fortunately for the thread it could do its processing before its quantum was finished, so they did not add any extra latency.

//Daniel


> 3) They had a limited quantum, but because it was higher on a server OS, fortunately
> for the thread it could do its processing before its quantum was finished, so they did
> not add any extra latency

Where would this latency possibly come from if there were no other threads of the same (or higher) real-time priority at the time??? The thread would just be given a chance to resume execution, and that’s it - even if its quantum had expired, it would automatically be given a new one, so that it would resume its execution on the spot…

Anton Bassov