Windows Audio Driver Architecture

Hello, I recently finished a Windows 10/11 PTPv2 daemon project (you can find it on GitHub under nt2ds/Win32_PTP, as I cannot post links). The daemon is part of an AES67 virtual sound card (like Dante DVS or Voicemeeter) that will transmit AES67 streams onto the network.
This means the Virtual Audio Device (VAD from now on) must appear as a normal playback device in Windows, capture the audio data, and pass it to the transmitter app, which will run in user space.

For the last week I’ve been trying to wrap my head around Windows drivers and the audio architecture, but it is all so scattered and poorly documented that I still have many, many questions.

The most important of all is:
Should I use WaveRT or WaveCyclic for a Virtual Audio Device?
From an older post here, I read that WaveCyclic is the way to go for a VAD, because WaveRT requires actual hardware (Windows sends the audio packets to a physical device), as opposed to WaveCyclic, which (from my understanding) creates a memory space where it writes the captured audio data into a cyclic buffer.

Q2:
What is up with WDM and WASAPI?
Isn’t WDM (the Windows Driver Model) just the way the Windows driver architecture is structured? What does it have to do with WASAPI? What does it have to do with KS?
How does WDM even relate to audio, beyond defining a driver’s architecture?

Q3:
Microsoft Learn says that KS is the kernel-mode processing of streamed data. OK, and?
How do I process streamed data in the kernel?
Who actually processes the data?
How does the component that processes the data actually get the data?
There are lots of things Microsoft Learn leaves out.

Q4:
How is a device actually “presented” as a playback device in the system, so I can just click it and switch the output? Whose “responsibility” is this? WDM, Wave, KS, WASAPI?
Isn’t WDM just the Windows Driver Model, which specifies the driver architecture? Am I missing something here?
I cannot understand the whole Windows audio architecture: what goes where, and how all these pieces connect with each other.
I would really appreciate some answers so I can clear things up in my head, or some material I can read on these topics.
Thanks in advance!

I can’t decide for you which technologies to use in your project, but I can hopefully explain a bit about the audio stack.

WaveRT is generally the more modern technology and is preferred over WaveCyclic; the modern built-in drivers like HDAUDIO and USBAUDIO2 use it. WaveCyclic sends IOCTLs through the driver stack as isolated packets to read/write audio data. WaveRT doesn’t send “packets” of audio data through the stack. Instead, the audio producer and consumer share a cyclic memory buffer that they both read from and write to directly. That buffer doesn’t have to belong to physical hardware: Microsoft’s virtual SysVAD sample, for example, is a WaveRT driver.
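To make the shared-buffer idea concrete, here is a minimal C sketch of a cyclic buffer with one producer position and one consumer position. It is a user-mode model only: the `cyclic_buffer` type and the `cb_*` functions are hypothetical and not part of any Windows API. In a real WaveRT render stream the audio engine writes into the mapped buffer and paces itself using the position the driver reports; for a VAD, my understanding is that the driver itself plays the “hardware” role, typically consuming the buffer on a timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a WaveRT-style shared cyclic buffer.
 * Producer: e.g. the audio engine writing render data.
 * Consumer: e.g. a virtual device reading it (the "DMA" side). */
typedef struct {
    uint8_t *data;     /* backing memory, `size` bytes long */
    size_t   size;     /* buffer size in bytes */
    uint64_t written;  /* total bytes ever written (producer position) */
    uint64_t read;     /* total bytes ever read (consumer position) */
} cyclic_buffer;

void cb_init(cyclic_buffer *cb, uint8_t *mem, size_t size) {
    assert(mem != NULL && size > 0);
    cb->data = mem;
    cb->size = size;
    cb->written = 0;
    cb->read = 0;
}

/* Bytes written by the producer that the consumer has not read yet. */
size_t cb_pending(const cyclic_buffer *cb) {
    return (size_t)(cb->written - cb->read);
}

/* Copy up to n bytes in at the producer position, wrapping at the end.
 * Clamps n so unread data is never overwritten; returns bytes written. */
size_t cb_write(cyclic_buffer *cb, const uint8_t *src, size_t n) {
    size_t space = cb->size - cb_pending(cb);
    if (n > space) n = space;
    size_t off = (size_t)(cb->written % cb->size);
    size_t first = cb->size - off;             /* bytes until the buffer end */
    if (first > n) first = n;
    memcpy(cb->data + off, src, first);
    memcpy(cb->data, src + first, n - first);  /* wrapped remainder */
    cb->written += n;
    assert(cb_pending(cb) <= cb->size);
    return n;
}

/* Copy up to n bytes out at the consumer position, wrapping at the end.
 * Returns the number of bytes actually read. */
size_t cb_read(cyclic_buffer *cb, uint8_t *dst, size_t n) {
    size_t avail = cb_pending(cb);
    if (n > avail) n = avail;
    size_t off = (size_t)(cb->read % cb->size);
    size_t first = cb->size - off;
    if (first > n) first = n;
    memcpy(dst, cb->data + off, first);
    memcpy(dst + first, cb->data, n - first);  /* wrapped remainder */
    cb->read += n;
    return n;
}
```

Keeping monotonically increasing 64-bit positions and reducing them modulo the size only when copying is one common way to tell “empty” from “full” without a spare slot; real WaveRT drivers report positions in their own terms, so treat this purely as an illustration of the data flow.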

WDM is, as you said, just a general model for Windows drivers. WASAPI, on the other hand, is a user-mode audio API. They have no direct connection. KS is the kernel-mode framework for streaming media data, and it is built on top of WDM. Both KS and WDM operate below WASAPI.

Kernel Streaming is quite a complex framework, so there isn’t a general answer to your questions. It depends on what kind of data we are talking about (audio, video), what hardware, what your role is, and so on. KS is essentially an interface between the hardware-specific driver (miniport) and the software components that want to talk to it. Explaining KS fully here would be a very long read, but the documentation exists. You have to understand things like pins, the various roles in the framework, specific IOCTL codes and so on.

At its core, a miniport exposes so-called pins that represent device-specific data streams. A software component calls KsCreatePin to instantiate such a pin, configures it using IOCTLs, and then starts streaming by either reading data provided by the miniport/hardware or sending data to it (for WaveCyclic the data is transferred through IRPs, for WaveRT by reading from/writing to the shared buffer). For WaveRT, the most important codes here are the KSPROPERTY_RTAUDIO_* ones (like KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION). For me, the KsStudio tool (part of the WDK) was very helpful.

On modern Windows, the component that usually handles the KS interactions from the software side is audiodg, a user-mode process (Windows Audio Device Graph Isolation) that also acts as the “bridge” between KS and WASAPI.
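Part of “configuring a pin” is moving it through the KS stream states. The sketch below models that walk in plain C, under the assumption (my reading of the KSSTATE documentation, worth double-checking) that a pin steps through the states in order, STOP, ACQUIRE, PAUSE, RUN, and back the same way, rather than jumping directly. The `pin_state` enum only mirrors the order of KSSTATE_STOP..KSSTATE_RUN from ks.h and is defined locally so the code compiles anywhere; `pin_transition`, `step_log` and `log_step` are hypothetical names, and the callback stands in for something like a miniport’s SetState.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the order of KSSTATE_STOP..KSSTATE_RUN in ks.h (local copy, not the real type). */
typedef enum { PIN_STOP = 0, PIN_ACQUIRE = 1, PIN_PAUSE = 2, PIN_RUN = 3 } pin_state;

/* Hypothetical per-step callback, standing in for a miniport's SetState. */
typedef void (*set_state_fn)(pin_state from, pin_state to, void *ctx);

/* Move *current to target one adjacent state at a time, calling cb (if any)
 * for every intermediate step. Returns the number of steps taken. */
int pin_transition(pin_state *current, pin_state target, set_state_fn cb, void *ctx) {
    assert(target >= PIN_STOP && target <= PIN_RUN);
    int steps = 0;
    while (*current != target) {
        pin_state next = (target > *current) ? (pin_state)(*current + 1)
                                             : (pin_state)(*current - 1);
        if (cb) cb(*current, next, ctx);
        *current = next;
        steps++;
    }
    return steps;
}

/* Example callback: records every state the pin passes through. */
typedef struct { pin_state seen[8]; int count; } step_log;

static void log_step(pin_state from, pin_state to, void *ctx) {
    step_log *steps = ctx;
    (void)from;
    if (steps->count < 8) steps->seen[steps->count++] = to;
}
```

With this model, asking a stopped pin to run produces three callbacks (ACQUIRE, PAUSE, RUN), which is why a driver’s state handler should be prepared to see every intermediate state rather than only the final one.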

If you are writing an audio miniport driver yourself, a lot of the KS-specific logic is actually handled for you by the PortCls (port class) driver, so you should read up on that part as well.
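The port/miniport split can be pictured as a port that owns the generic request handling and a miniport that only fills in a small table of device-specific callbacks. The C sketch below is a heavily simplified, hypothetical model of that idea: the callback names are loosely borrowed from IMiniportWaveRTStream methods (AllocateAudioBuffer, SetState, GetPosition), but the signatures, `port_request`, `port_dispatch` and `virtual_stream` are all invented for illustration and are not the real PortCls interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified miniport stream callbacks. NOT real PortCls signatures. */
typedef struct {
    size_t   (*allocate_audio_buffer)(void *stream, size_t requested_bytes);
    void     (*set_state)(void *stream, int state);
    uint64_t (*get_position)(void *stream);
} miniport_stream_ops;

/* Hypothetical request kinds the "port" receives from KS clients. */
typedef enum { REQ_ALLOCATE_BUFFER, REQ_SET_STATE, REQ_GET_POSITION } port_request;

/* Port side: the generic request handling lives here; it calls into the
 * miniport only for the device-specific part of each request. */
uint64_t port_dispatch(const miniport_stream_ops *ops, void *stream,
                       port_request req, uint64_t arg) {
    assert(ops != NULL && stream != NULL);
    switch (req) {
    case REQ_ALLOCATE_BUFFER: return ops->allocate_audio_buffer(stream, (size_t)arg);
    case REQ_SET_STATE:       ops->set_state(stream, (int)arg); return 0;
    case REQ_GET_POSITION:    return ops->get_position(stream);
    }
    return 0;
}

/* Miniport side: a hypothetical hardware-less stream, as a VAD might have. */
typedef struct {
    size_t   block_align;   /* bytes per frame, e.g. 6 for 24-bit stereo */
    size_t   buffer_bytes;  /* buffer size actually granted */
    int      state;         /* last state set by the port */
    uint64_t position;      /* byte position; a real VAD would advance it on a timer */
} virtual_stream;

static size_t vs_allocate(void *s, size_t requested) {
    virtual_stream *vs = s;
    assert(vs->block_align > 0);
    /* Round down to whole frames so the buffer never ends mid-frame. */
    vs->buffer_bytes = requested - requested % vs->block_align;
    return vs->buffer_bytes;
}

static void vs_set_state(void *s, int state) { ((virtual_stream *)s)->state = state; }

static uint64_t vs_get_position(void *s) { return ((virtual_stream *)s)->position; }

const miniport_stream_ops virtual_stream_ops = { vs_allocate, vs_set_state, vs_get_position };
```

For example, asking a 24-bit stereo `virtual_stream` (block_align 6) for a 1000-byte buffer grants 996 bytes. The point of the design is that the `virtual_stream` code never sees KS requests directly, which mirrors why a PortCls miniport can stay comparatively small.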

The audio samples from Microsoft on GitHub are a valuable resource. See, for example, the Simple audio sample.