Seeking Assistance: How to Create Virtual Input and Output Devices in Windows 11

Hello everyone,

I hope this message finds you well. I'm currently working on a project where I need to create virtual microphone and speaker devices (virtual input and output devices) on Windows 11. My goal is to develop a solution that allows audio routing between applications without the need for physical audio hardware.

Here's what I'm aiming to achieve:

  • Virtual Microphone (Input Device): An emulated microphone that can feed audio data from a source (e.g., an audio file or another application) into applications that accept microphone input (e.g., Zoom, Skype).

  • Virtual Speaker (Output Device): A virtual speaker that can capture audio output from applications and route it elsewhere (e.g., to a recording application or for processing).

What I've Explored So Far:

  1. Windows Driver Kit (WDK) with C++:

    • I considered developing a kernel-mode driver using the WDK.
    • Explored the SYSVAD (System Virtual Audio Driver) sample provided by Microsoft.
    • Realized that kernel-mode driver development is complex and poses risks to system stability.

  2. User-Mode Alternatives:

    • Looked into using Virtual Audio Cable software to create virtual devices.
    • Explored the Windows Audio Session API (WASAPI) for loopback capture and rendering.
    • Considered creating an application that routes audio between input and output streams.

Challenges I'm Facing:

  • Limited Documentation and Examples:

    • There's a scarcity of comprehensive guides on creating virtual audio devices in user mode.
    • Most resources focus on kernel-mode drivers, which I prefer to avoid due to complexity.

  • Need for a User-Mode Solution:

    • I want to develop this entirely in user mode to reduce risk and simplify deployment.
    • Aim to avoid dealing with driver signing and kernel-mode debugging.

  • Integration with Windows 11:

    • Ensure compatibility with Windows 11's audio architecture.
    • Make the virtual devices appear as standard audio devices in the system settings.

What I'm Requesting Help With:

  1. Guidance on User-Mode Development:

    • How can I create virtual input and output audio devices in user mode on Windows 11?
    • Are there any APIs, SDKs, or frameworks that facilitate this process?

Thank you in advance for your assistance!

If your plan is to have this work with unmodified user-mode applications, then it cannot be done in user mode. End of story. The Audio Engine only enumerates kernel-mode devices. Virtual Audio Cable has a kernel-mode driver, and if it solves your problem, it is a far better solution than developing your own.

The SimpleAudioSample sample does most of what you need, and is much easier to understand than SYSVAD. SYSVAD has become a platform to demonstrate every new feature that gets added to the audio system.

If you have a certain amount of control over the applications, it is possible to write user-mode DirectShow audio sources and sinks, or even Media Foundation filters, but the applications have to know to look there.


Thank you for your assistance.

I have successfully executed the SimpleAudioSample and removed the beep sound. Now, I would like to capture audio from the user's microphone, process it, and then transmit the processed audio to the virtual microphone created by SimpleAudioSample. The goal is to connect this virtual device to any online meeting software, allowing it to receive the enhanced audio.

My objective is similar to Krisp, a noise removal software. They capture the user's microphone input, apply machine learning-based noise reduction, and then send the processed audio to a virtual microphone. I aim to implement a similar functionality.

Would you be able to guide me in achieving this?

I have to be a little cautious, because Krisp was one of my clients. They are smart people, and I respect the work they've done. There is certainly generic advice I can offer.

The audio processing will go on in a separate application. You don't want that kind of computing in a kernel driver. That application attaches to the live microphone and the live speaker. The meeting software would attach to your virtual audio driver for both microphone and speaker. The rest is just plumbing, and timing, of course. You're introducing latency, and have to consider that in your processing.
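
To make the plumbing concrete, here is a rough sketch of the microphone direction in portable C++. Everything in it is hypothetical: `Pipeline`, `runPipeline`, and the three callbacks are illustrative names, not part of any Windows API or of SimpleAudioSample. In a real application, `capture` would wrap whatever capture API you use for the live microphone, `process` would be your noise reduction, and `deliver` would hand the block to the virtual driver.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// One block of 16-bit PCM samples (assumed format, for illustration only).
using Block = std::vector<std::int16_t>;

// Hypothetical stage callbacks; none of these are real Windows APIs.
struct Pipeline {
    std::function<bool(Block&)> capture;        // fill block from the live mic; false when the stream ends
    std::function<void(Block&)> process;        // in-place processing (e.g. noise reduction)
    std::function<void(const Block&)> deliver;  // hand the block to the virtual microphone
};

// Runs capture -> process -> deliver until capture reports end of stream.
// Returns the number of blocks that went through the pipeline.
std::size_t runPipeline(const Pipeline& p, std::size_t samplesPerBlock) {
    Block block(samplesPerBlock);
    std::size_t blocks = 0;
    while (p.capture(block)) {
        p.process(block);
        p.deliver(block);
        ++blocks;
    }
    return blocks;
}
```

The speaker direction is the mirror image: read what the meeting software rendered to the virtual speaker, then play it on the live speaker.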

One of the trickier parts of a process like that is managing the buffer sizes. Too big, and the latency goes wild. Too small, and you get underruns that cause pops and clicks.
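
The trade-off is easy to put numbers on. The helpers below are only a sketch; the function names are made up for illustration, and the 48 kHz, 16-bit stereo format is an assumption, not something the driver dictates.

```cpp
#include <cstddef>
#include <cstdint>

// Latency added by one buffer of `frames` audio frames at `sampleRate` Hz, in milliseconds.
constexpr double bufferLatencyMs(std::uint32_t frames, std::uint32_t sampleRate) {
    return 1000.0 * static_cast<double>(frames) / static_cast<double>(sampleRate);
}

// Size in bytes of a buffer holding `frames` frames of interleaved PCM.
constexpr std::size_t bufferBytes(std::uint32_t frames,
                                  std::uint32_t channels,
                                  std::uint32_t bytesPerSample) {
    return static_cast<std::size_t>(frames) * channels * bytesPerSample;
}
```

With those assumptions, `bufferLatencyMs(480, 48000)` is 10 ms and `bufferBytes(480, 2, 2)` is 1920 bytes. Every buffer in the chain (capture, processing, virtual device) adds its own share, so the end-to-end figure is the sum across stages.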

I have implemented a similar approach; however, I am unable to transmit the processed audio to the virtual microphone because the SimpleAudioSample virtual microphone does not provide an output channel. Currently, I am utilizing PortAudio for routing.

Right, that's what you have to write: a set of private ioctls that provides back-door access to the circular buffers used by ReadBytes and WriteBytes. That's the core of the job.
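
For anyone who has not written one, this is roughly what such a circular buffer looks like. It is a user-mode C++ sketch of the data structure only, not the code in SimpleAudioSample: `ByteRing` and its members are hypothetical names, and a real driver would also need locking between the ioctl path and the audio stream path, which is left out here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity byte ring buffer. Not thread-safe: synchronization is the caller's job.
class ByteRing {
public:
    explicit ByteRing(std::size_t capacity) : buf_(capacity) {}

    // Copies up to n bytes in; returns how many were accepted (fewer than n when full).
    std::size_t write(const std::uint8_t* src, std::size_t n) {
        n = std::min(n, buf_.size() - size_);
        if (n == 0) return 0;
        for (std::size_t i = 0; i < n; ++i) {
            buf_[(head_ + size_ + i) % buf_.size()] = src[i];
        }
        size_ += n;
        return n;
    }

    // Copies up to n bytes out, oldest first; returns how many were available.
    std::size_t read(std::uint8_t* dst, std::size_t n) {
        n = std::min(n, size_);
        if (n == 0) return 0;
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = buf_[(head_ + i) % buf_.size()];
        }
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
        return n;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;  // index of the oldest byte
    std::size_t size_ = 0;  // number of bytes currently stored
};
```

In this picture, the private ioctl from your processing application plays the role of `write` for the virtual microphone, and the driver's capture path plays the role of `read`. A short write signals that the buffer is full, and a short read is the underrun case mentioned earlier.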