Sysvad: how to implement Inverted Call to notify user-mode applications.

jmmagalhaes · August 26, 2020, 5:59pm

Hello, my name is João Magalhães and I am part of a driver development team.

Our goal is to build a noise-cancelling driver with the following architecture:

The capture would be more or less this:
1 - on Skype (or other application) the user selects the driver’s microphone based on Sysvad
2 - the audio engine captures the microphone sound
3 - the audio engine sends the microphone stream to an application to remove noise from the stream
4 - our driver receives the modified stream
5 - our driver sends the stream to Skype

And the reproduction would be more or less this:
1 - on Skype (or other application) the user selects the driver’s speaker based on Sysvad
2 - Skype sends the audio stream to our driver
3 - our driver sends the audio stream to an application to remove noise from the stream
4 - the audio engine receives the modified stream
5 - the audio engine sends the stream to the actual system speaker

We will use Sysvad for two purposes: to get the audio stream from the user-mode application and send it to the speaker via the audio engine, and to get the audio stream from the mic via the audio engine and send it to this same application.

We thought about using Inverted Call notifications, to warn our application when the audio buffer is ready to be read. I first tried using the Inverted Call example to make a separate device only for this notification system via IOCTL, but I couldn’t make sysvad access the notification queue from this other device.

Now I am trying to create the notification queue in Sysvad itself, but its architecture is based on mini-ports to access every audio channel, and I am stuck trying to figure out some way to implement this, and if it is really the best way to do it.

The core questions are: is the inverted call the best way to notify our application when the buffer is ready to be read, and, if it is, which is the best way to do it?

Thanks in advance and regards,
João Magalhães

PS: Inverted Call example

Peter_Viscarola_OSR · August 26, 2020, 6:18pm

DO NOT repeatedly post the same question. If your question doesn’t show up in the forum immediately, don’t assume the forum is broken and post it a second time. Especially if you are a new user.

Read the sticky at the top of this category titled “Did you post something and it did not appear?”

Frustrated,

Peter

Peter_Viscarola_OSR · August 26, 2020, 6:22pm

is the inverted call the best way to notify our application when the buffer is ready to be read

Preface: I know nothing… less than nothing… about audio drivers. So… my reply is based on what I know about standard WDF drivers. Mr. Roberts will hopefully be kind enough to correct me if I write something that doesn’t apply…

Why not have the app just post a bunch of async reads that the driver can use whenever it needs them? Why go through the whole “notify the app that it needs to do a read, then the app sends a read” dance? We have asynchronous IOCTLs here on Windows. Why not use them?

Peter

jmmagalhaes · August 26, 2020, 8:11pm

Firstly, I’d like to apologize for the duplicate posts. I really didn’t know, and I should have read the sticky.

We are pretty new to driver development and I started learning about it just a while ago, so what I describe may not be the best way to do things.

But about those async reads that you mentioned, you mean I could put them on a queue and, as the driver writes the first buffer, it completes the request and sends it to the application? Or did I understand it wrong?

Peter_Viscarola_OSR · August 26, 2020, 8:20pm

you mean I could put them on a queue and, as the driver writes the first buffer, it completes the request and send it to the application?

Yup. That’s how drivers typically work on Windows (when the read itself is not necessary to provoke an operation on the hardware, of course).

If your device just “produces” data, the most common pattern is for the driver to check a Queue to see if there’s a Read that can be used for that data. If so, it uses the just-arrived data to (fully or partially) satisfy that pending Read. If not, it depends: typically the driver buffers the data while waiting for the app to send it a Read. It is reasonably uncommon for drivers to “signal” the app “I just got some data, please send me a read so I can give it to you.”

The idea is most commonly that the app starts up, and supplies the driver with enough buffers (via sending async reads or IOCTLs) so that the driver never “runs out” and has to buffer the data itself. Because, you know, that can get tricky.
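A rough sketch of that pattern follows. It is a plain C simulation, not WDF code: `DriverSim`, `PendingRead`, `post_read`, and `data_arrived` are hypothetical names invented for illustration, and the queue depth and backlog size are arbitrary assumptions. In a real driver the framework's queue and request-completion routines would take the place of the ring buffer and the `completed` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8     /* assumed limit on reads the app keeps posted */
#define BACKLOG_BYTES 256 /* assumed size of the driver's fallback buffer */

/* One read posted in advance by the application (hypothetical type). */
typedef struct {
    unsigned char *buffer; /* application-owned destination */
    size_t capacity;       /* size of that destination */
    size_t bytes_done;     /* filled in when the read completes */
    int completed;         /* 0 while pending, 1 once completed */
} PendingRead;

typedef struct {
    PendingRead *slots[MAX_PENDING]; /* FIFO ring of pending reads */
    size_t head, count;
    unsigned char backlog[BACKLOG_BYTES]; /* data with no read to take it */
    size_t backlog_len;
} DriverSim;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Complete a read with up to n bytes from src; returns bytes consumed. */
static size_t complete_read(PendingRead *r, const unsigned char *src, size_t n) {
    size_t take = min_size(n, r->capacity);
    if (take > 0) memcpy(r->buffer, src, take);
    r->bytes_done = take;
    r->completed = 1;
    return take;
}

/* App side: post a read. Returns 0 on success, -1 if the queue is full. */
int post_read(DriverSim *d, PendingRead *r) {
    assert(d != NULL && r != NULL);
    r->bytes_done = 0;
    r->completed = 0;
    if (d->backlog_len > 0) {
        /* Data was already waiting: satisfy this read immediately. */
        size_t used = complete_read(r, d->backlog, d->backlog_len);
        memmove(d->backlog, d->backlog + used, d->backlog_len - used);
        d->backlog_len -= used;
        return 0;
    }
    if (d->count == MAX_PENDING) return -1;
    d->slots[(d->head + d->count) % MAX_PENDING] = r;
    d->count++;
    return 0;
}

/* Driver side: new data arrived. Returns the number of bytes dropped
   because neither a pending read nor the backlog could hold them. */
size_t data_arrived(DriverSim *d, const unsigned char *data, size_t n) {
    assert(d != NULL && (data != NULL || n == 0));
    while (n > 0 && d->count > 0) {
        /* Use the oldest pending read for the just-arrived data. */
        PendingRead *r = d->slots[d->head];
        d->head = (d->head + 1) % MAX_PENDING;
        d->count--;
        size_t used = complete_read(r, data, n);
        data += used;
        n -= used;
    }
    /* No reads left: buffer whatever still fits. */
    size_t keep = min_size(n, BACKLOG_BYTES - d->backlog_len);
    if (keep > 0) memcpy(d->backlog + d->backlog_len, data, keep);
    d->backlog_len += keep;
    return n - keep;
}
```

In this sketch a read completes as soon as it receives any data, even if its buffer is only partly filled. The backlog path is the "tricky" case: once it overflows, data is lost, which is why the app should keep enough reads posted.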

Peter

Tim_Roberts · August 26, 2020, 8:22pm

Yes. Well, “send it to the application” is a bit much. The driver just completes the request. The rest is automatic.

Audio drivers are not necessarily a good place to start. Audio drivers are big, and there are real-time considerations even for a virtual driver.

MBond2 · September 6, 2020, 11:13pm

In addition to what has been said, remember that context switches are expensive. Especially, between KM and UM context, so you want to minimize them. Memory is cheap, so why make two context switches when you can make just one?

Yes, this is a fundamental difference between the signal / select concepts common on *nix platforms and standard windows programming. You may know this already, but look into OVERLAPPED IO and the IOCP concepts.

Tim_Roberts · September 7, 2020, 12:16am

In addition to what has been said, remember that context switches are expensive. Especially, between KM and UM context

That’s easy to say, but I’m not sure the data backs you up. The “syscall” instruction that switches from UM to KM takes about 10 times as long as a function call. That’s not expensive. If you have to switch from one UM process to another, that can be expensive because there are operating system and page tables to update, but a UM-KM-UM transition within a single thread is not very expensive at all.

Besides, what’s the alternative?

MBond2 · September 7, 2020, 12:35am

Not to make two calls instead of one for a single operation?

Peter_Viscarola_OSR · September 7, 2020, 3:11am

context switches are expensive.

Sorry, but this oft-repeated bit of “common wisdom” hasn’t really been true since the days of “int 0x2e”. Processor vendors have spent copious amounts of time and effort to get “syscall” and “sysenter” to execute quickly. Look how very little the instruction does.

Peter

MBond2 · September 7, 2020, 10:29pm

Contrast the jmp instruction. That is certainly much cheaper than sysenter / sysexit.

But the point does not depend on the cost of these instructions. The logic is the same as with the question of what is the best kind of synchronization – none; when optimizing an algorithm, what is the ideal result for any step – NOP etc.

If you can change the design in such a way that you need to do less work, or the work that needs to be done can be done more independently, or with fewer resources etc., then a clear win results.

In this particular case, using OVERLAPPED IO to pend a sufficient number of buffers in advance of data being available, is clearly a better design (on Windows) than having the driver ‘signal’ the application that data is now available, the application waking up, issuing a read and getting it. Cheap or expensive, the entire ‘signal’ work path disappears. There are fewer calls between UM and KM and the design of both application and driver is simplified. There is less code to write and maintain, and better performance at runtime. Are there any disadvantages with this plan, except that it does not run on Windows 95? Even if you want to port to *NIX (or use a common application code base), OVERLAPPED / IOCP are simple to emulate from a select loop with a queue. What am I missing?

Tim_Roberts · September 8, 2020, 4:36am

In this particular case, using OVERLAPPED IO to pend a sufficient number of buffers in advance of data being available,
is clearly a better design (on Windows) than having the driver ‘signal’ the application that data is now available, the application waking up,
issuing a read and getting it.

The difference is trivial. What you have saved here is the read. After all, what inverted call does is complete the request, which sets an event and signals the application that data is now available, thus waking the application. The two processes are virtually identical, except that the overlapped-read scenario removes one UM->KM->UM round trip. Is that better? Sure, it is. That’s why it is recommended. Is it SIGNIFICANTLY better? No.
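To put rough numbers on "what you have saved here is the read", here is a toy count of the calls the application makes into the driver under each design. The function names and the model itself are assumptions for illustration. It counts calls only and says nothing about how long each call takes, which is the part under dispute.

```c
#include <assert.h>

/* Notify-then-read: the app keeps one notification request pending,
   and for every buffer it issues one read plus one new notification
   request to re-arm. Total: 1 initial request + 2 calls per buffer. */
unsigned long calls_notify_then_read(unsigned long n_buffers) {
    return 1UL + 2UL * n_buffers;
}

/* Pre-posted reads: the app posts `depth` reads up front and re-posts
   one read each time a buffer completes. Total: depth + 1 per buffer. */
unsigned long calls_preposted_reads(unsigned long n_buffers, unsigned long depth) {
    assert(depth > 0);
    return depth + n_buffers;
}
```

For 1000 buffers with 4 reads kept posted, this model gives 2001 calls against 1004: one extra round trip per buffer, and nothing else.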

Performance is important, but reliability, maintainability, and programmer time are also important. If you come up with a good, reliable design that works and happens to have one more UM/KM transition than a comparable but more complicated design, the trade-off is just not as obvious as you claim.

MBond2 · September 9, 2020, 9:28pm

The difference is not trivial, but we needn’t argue about it since we all recommend the same approach.

hungcui · September 21, 2020, 4:19pm

hi @jmmagalhaes
Can you show me how to do it? I would be very grateful if you could send me the code for this project, please.

MBond2 · September 22, 2020, 12:53am

You want me to write the code for you? To a very great degree, I will try to help you in any way that I can, but there was a sign in the machine shop in my high school that read something like ‘you cannot help someone by doing for them what they should do for themselves’. This is I’m sure a paraphrase as high school was a long time ago, but you get the point.

On this forum I will help you to a virtually unlimited degree to understand how to write the code you need to write, but I’m not going to write it for you. I think I express the prevailing opinion of other members also.