Threaded DPC

Hi,
I have probably overlooked this on MSDN, but I couldn’t find it:

Will the other CPU cores keep executing while a threaded DPC is running?

Thanks,
Best regards/Cheers,
Hagen.

Yes, all other cores can still execute code, including your driver’s code. A threaded DPC is not a global lock.

d

Bent from my phone


From: 30101461400n behalf of
Sent: Thursday, June 7, 2018 1:39 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Threaded DPC

NTDEV is sponsored by OSR

Visit the list online at: https:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at https:

To unsubscribe, visit the List Server section of OSR Online at https:

Thanks, Doron!

…just to understand it correctly:
If I move my IRQ-initiated handling from a traditional DPC into a threaded DPC, is it still guaranteed that the DPCs are queued and do not run in parallel on different cores?

Thanks,
Hagen.

Ummmm… that’s not guaranteed today. DPCs *can* definitely run in parallel on separate cores.

IIRC, the introduction of threaded DPCs was tied to a “glitch free media playback” issue. Also, note that Threaded DPCs can be *disabled* on a system-wide basis and when this happens, the threaded DPC runs as an actual DPC.

In short… Threaded DPCs are not something I would recommend. You want a work item? Queue one from your DPC.

Peter
OSR
@OSRDrivers

Thanks, Peter,

Oh! That’s something!

Currently I only signal an event at DPC time, which triggers streaming in user mode. I am planning to move the streaming into the kernel for multi-client support, so I was considering doing that in a threaded DPC.

As far as I remember, work items, on the other hand, have worse latency, and that would definitely be an issue here…

So I will have to rethink the concept…

Thanks,
Hagen.

May I suggest: PsCreateSystemThread *one time* in your driver. Have your ISR pass data to and signal that thread when there’s work to do. There’s relatively low latency in that approach.

Peter
OSR
@OSRDrivers

> If I move my IRQ initiated handling from traditional DPC into Threaded DPC it will still be ensured that those are queued and not running in parallel on different cores?

Although a DPC cannot be queued more than once at any given moment (check the headers for more info - IIRC, the KDPC structure has only a single list entry, so it simply cannot be queued more than once), it can still execute in parallel on different cores: once it has been dequeued it can be queued again, and that second instance may start running on another core while the first is still executing. Therefore, you need spinlock-level synchronisation if your DPC routine accesses global variables that may also be accessed by other code.

Since a threaded DPC may end up running just like a “regular” one for reasons that are beyond your control (for example, if an admin has disabled threaded DPCs in the registry settings), you have to write it exactly the same way you would write a “regular” one, without any special assumptions.
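
To illustrate the shape of that advice, here is a minimal user-mode C11 sketch, not kernel code. The names `device_ctx`, `ctx_lock`, `ctx_unlock` and `deferred_routine` are made up for illustration, and the `atomic_flag` spin loop is only a stand-in for a kernel spin lock (in a real driver one would presumably use a `KSPIN_LOCK` with `KeAcquireSpinLockForDpc`/`KeReleaseSpinLockForDpc`). The only point is that the routine touches shared state exclusively while holding the lock, so it stays correct whether or not another core happens to be running the same routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context shared by every invocation of the routine. */
typedef struct {
    atomic_flag lock;        /* user-mode stand-in for a kernel spin lock */
    uint64_t    interrupts;  /* shared state: touch only while holding lock */
} device_ctx;

static void ctx_init(device_ctx *ctx)
{
    atomic_flag_clear(&ctx->lock);
    ctx->interrupts = 0;
}

static void ctx_lock(device_ctx *ctx)
{
    /* Spin until this caller is the one that flipped the flag to "set". */
    while (atomic_flag_test_and_set_explicit(&ctx->lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void ctx_unlock(device_ctx *ctx)
{
    atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

/* Stand-in for the deferred routine. It is written as if another core
 * could be executing it at this very moment. */
static uint64_t deferred_routine(device_ctx *ctx)
{
    assert(ctx != NULL);
    ctx_lock(ctx);
    uint64_t seen = ++ctx->interrupts;  /* read-modify-write under the lock */
    ctx_unlock(ctx);
    return seen;
}
```

Nothing in the routine depends on whether it runs as a threaded or an ordinary deferred call, which is the "no special assumptions" rule from the paragraph above.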

Anton Bassov

Thanks, Peter,

I will have to do some measurements with your suggestion.
I understand this does not compare to the latency of kernel-to-user-mode signalling, which in the current approach introduces jitter that disturbs stable low-latency streaming. Although we can go really low here, from time to time Windows just decides to do some other stuff before signalling/executing the waiting user-mode thread.

Thanks, Anton,
our streaming approach is lock free as such, so the producer and the consumer are allowed to run in parallel. But I must ensure that neither the producer side nor the consumer side executes its own work on more than one core at a time.
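
For readers unfamiliar with this kind of design, a rough sketch of a lock-free single-producer/single-consumer ring in user-mode C11 follows. It is not the implementation discussed in this thread; `spsc_ring`, `ring_push`, `ring_pop` and the slot count are all hypothetical. It shows why the constraint matters: each index has exactly one writer, so the layout is only safe if `ring_push` never runs on two cores at once, and likewise `ring_pop`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* arbitrary for the sketch; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0, "RING_SLOTS must be a power of two");

typedef struct {
    int32_t       slots[RING_SLOTS];
    atomic_size_t head;  /* next slot to write; advanced only by the producer */
    atomic_size_t tail;  /* next slot to read; advanced only by the consumer */
} spsc_ring;

static void ring_init(spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false when the ring is full. */
static bool ring_push(spsc_ring *r, int32_t sample)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1u)] = sample;
    /* Publish the slot only after its contents have been written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
static bool ring_pop(spsc_ring *r, int32_t *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1u)];
    /* Hand the slot back to the producer only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

If two instances of the producer could run concurrently, both could read the same `head` and overwrite the same slot, which is exactly the situation the constraint above is meant to rule out.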

Thanks for your suggestions! Highly appreciated!
Hagen.

Weeeeellll? Again, I’m not sure I understand what you’re saying here.

There’s absolutely no difference between Windows scheduling a user-mode thread and Windows scheduling a kernel-mode thread. While the average latency is low, the WORST CASE latency can be… significant in some cases.

I’ve been giving you “bits and pieces” of info here… and at this point, I think I’d need to understand more about your overall goals and constraints before I suggested anything further.

Peter
OSR
@OSRDrivers

> I think I’d need to understand more about your overall goals and constraints before I suggested anything further.

I think what the OP really needs is an introduction to the system-level concepts. In fact, I had realised it the very moment I saw his original question concerning “a threaded DPC blocking all CPUs in the system”, and his subsequent “follow-ups” had further confirmed my original suspicion.

What I would recommend here is, first of all, just getting a copy of “Windows Internals” (plus, probably, some book on the generic OS-level concepts as well). I think this should be a starting point.

Otherwise, he would simply be unable to understand what we are actually saying, effectively turning into some kind of VinayKP who, after having worked on his NDIS driver for around 18 months, finally discovered for himself that apps may gain access to the network services only via the socket interface…

Anton Bassov

Thanks Peter,

I had a wrong presumption here, based on measurements I made comparing the IRQ->DPC latency, which shows fairly low jitter, with the DPC->user-thread signalling, which simply put is not suitable for real-time audio. Well, most of the time it stays below the level at which glitch-free, low-latency audio can be achieved. But “most of the time” is still bad when you lose the “one and only take” of a music studio performance.
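
The gap between “most of the time” and the worst case can be made concrete with a small C sketch. The `latency_stats` type and the `summarize` function are hypothetical helpers, and any values fed to them below are invented for illustration, not measurements from this thread. The sketch reports the mean, the maximum and the number of missed deadlines for a set of latency samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t mean_us;  /* average latency in microseconds */
    uint64_t max_us;   /* worst observed latency in microseconds */
    size_t   misses;   /* samples that exceeded the deadline */
} latency_stats;

/* Summarize n latency samples (microseconds) against a deadline. */
static latency_stats summarize(const uint64_t *samples_us, size_t n,
                               uint64_t deadline_us)
{
    assert(samples_us != NULL && n > 0);
    latency_stats s = {0, 0, 0};
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples_us[i];
        if (samples_us[i] > s.max_us)
            s.max_us = samples_us[i];
        if (samples_us[i] > deadline_us)
            s.misses++;
    }
    s.mean_us = sum / n;
    return s;
}
```

With nine invented samples of 100 us, one of 2100 us and a 1000 us deadline, the mean is 300 us, comfortably below the deadline, yet one deadline is still missed. For audio, that single miss is the glitch.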

Thanks Anton,
for the suggestion!

Cheers,
Hagen.

> But “most of time” is still bad when you loose the “one and only take” on a music studio performance.

Real-time audio recording is certainly a task with hard RT requirements because, as you have said yourself, missing even a single deadline is unacceptable. There is absolutely no way to turn any GPOS, including Windows, into an RTOS. Check the archives - we have discussed it many times in this NG.

The best thing you can do if you need to run some hard RT task on a GPOS is to run the entire GPOS as a separate task within an RTOS, so that you get the best of both worlds. This has been done more than once. However, I am not sure if you can do it with Windows these days, because you would need to write your own HAL, and I am not sure if the HAL Developer kit is still available.

Certainly, you can use an open-source OS for the purpose or, if a Windows environment is a crucial requirement for your project, run Windows as a guest within a hypervisor with hard RT capabilities. The “only” problem is that, assuming you are speaking about a PC-grade system, your choice of hardware platform is not really suitable for hard RT tasks either, because of the possibility of SMIs. An SMI is unmaskable and completely transparent to the OS software, so the OS cannot do anything about it: it cannot mask the SMI; it does not know what triggers it or how often it fires; and handling it may take hundreds of milliseconds (no, this is not a typo). It does not really sound like a reliable option for professional sound recording, does it…

This is why professional sound recording relies, AFAIK, upon special-purpose hardware that may cost 30-50 times more than a PC. Certainly, quite a few attempts are being made to use the Windows PC for this purpose - AFAIK, there are multiple projects that try to do it. However, you just cannot make it 100% reliable, no matter how hard you try…

Anton Bassov

Want to record in real-time on a Windows PC? Our customers do it all the
time via Roland VS series recorders hooked to the PC via a virtualized SCSI
hard disk. Our driver easily handles the load, which isn’t onerous, and the
Windows file system has the horsepower and flexibility (i.e. buffering) to
not miss a lick - pun intended.

Bill Casey, ASC



Who handles the _actual_ job of recording under this scheme? If I got it right, the role of the PC is just to get the recorded data from the recording device and to save it (i.e. to do something that does not require RT processing as long as the device is capable of buffering data), and the _actual_ recording/processing is done by the device itself. In order to keep the PC blissfully ignorant of all the “gory details”, the device presents itself to the PC as a SCSI disk.

Did I get it right? If this is the case, this is most definitely NOT “real-time recording on a Windows PC” - it requires a separate recording device that probably happens to be more expensive than the PC it is attached to, and that is operated by its own OS, completely independent of its Windows host.

BTW, while we are at it, the whole thing sounds pretty much like an advertisement, which is strictly prohibited on this list. I just wonder who is going to be the “end recipient” on the receiving end (pun intended) of “The Hanging Judge’s” whip. Since you were replying to my post, I would not rule out the possibility of getting accused of indirectly provoking you to advertise your product.

Anton Bassov

Thanks Anton,

I am aware that Windows is not an RTOS; the requirement is to provide drivers to be used in a Windows-based recording studio. Therefore the goal here is best effort. The customer is aware that, no matter how much buffering is involved (in terms of still being semi-RT), a spike which invalidates (= delays) the data can always occur.
This is all rather theoretical, because in practice it seems to be good enough, but we really have to put some effort in to keep it well behaved (including MMCSS and other advice to the customer on how to configure the system).

@Bill,
it’s not a problem to record all the data; the problem is to present it on time. And I don’t see how offloading it to the file system would help here.
More than that, it is the bidirectionality of the data flow through Windows that defines the time window in which data is valid. (Doing the routing on an external machine does not help when both directions have to be used on time inside a Windows-based DAW.)

Thanks,
Hagen.

Fair enough…

What you can try here is to adjust thread priorities to ensure that no other thread can run on the CPU when your target thread is eligible to run. Another suggestion is to minimise the number of devices that may interrupt, in order to keep the DPC queue as short as possible. For example, in your particular case a NIC does not really seem to be needed (at least not while a recording session is going on), and this kind of device may do “wonders” in terms of latency if network traffic is high enough. Locking all memory buffers that your target thread may access, in order to avoid the possibility of page faults, may be yet another suggestion…

Anton Bassov

Hmmmm…

The distribution of latencies from ISR to DPC will vary significantly, depending on the system and the devices and drivers that are configured. It is not at ALL uncommon to have very substantial variations in ISR-DPC latencies, due to an unfortunate convergence of events. It’s one of the most common issues encountered by Windows driver devs who deal with real-time or time-critical requirements, or with vast volumes of data.

I *do* however agree that the variation in latencies, as well as the actual measured times, from ISR to DPC should be smaller than those for DPC to user thread.

There are lots of techniques to battle latency at various places in the chain. At this point, without knowing more about your requirements and constraints, it would be irresponsible to continue to make recommendations.

Windows’ “battles” with latencies are historic. One of the most embarrassing things, as an architect practicing in the field of Windows internals, is to walk into almost any recording studio in the world and see Pro Tools running on a Mac. Now, granted, most of these users don’t care what OS they’re running… but even in “Windows shops” you tend to see Pro Tools on a Mac. In my experience, this is due to the historic problems folks have had with latencies (and resulting drops) in Windows.

Pro Tools runs very well on Windows these days.

ANYhow… like I said… I hesitate to recommend anything more specific without knowing more specifics about your situation.

Peter
OSR
@OSRDrivers

Thanks for all the suggestions and comments!
I am satisfied with the answers to my original question, and I even received more answers on the deeper topics behind it!

Thanks!
Hagen.