How to intercept and modify the microphone (speaker) data stream in an audio filter driver (KMDF)?

I am new to kernel driver development and know very little about audio drivers.

Now I have a work item to develop a filter driver to intercept the microphone data and modify it.

I used KMDF to implement a demo. Specifically, I called WdfDeviceInitSetIoInCallerContextCallback to register a callback function that logs each request it receives. The code is as follows:

VOID AudioFilterIoInCallerContext(_In_ WDFDEVICE Device, _In_ WDFREQUEST Request)
{
    NTSTATUS status;

    WDF_REQUEST_PARAMETERS params;
    WDF_REQUEST_PARAMETERS_INIT(&params);
    WdfRequestGetParameters(Request, &params);
    if (params.Type == WdfRequestTypeDeviceControl)
    {
        //IOCTL_KS_PROPERTY              3080195
        //IOCTL_KS_ENABLE_EVENT          3080199
        //IOCTL_KS_DISABLE_EVENT         3080203
        //IOCTL_KS_METHOD                3080207
        //IOCTL_KS_WRITE_STREAM          3112979
        //IOCTL_KS_READ_STREAM           3096599
        //IOCTL_KS_RESET_STATE           3080219
        size_t inLen = params.Parameters.DeviceIoControl.InputBufferLength;
        size_t ouLen = params.Parameters.DeviceIoControl.OutputBufferLength;
        PVOID t3Ptr = params.Parameters.DeviceIoControl.Type3InputBuffer;
        ULONG ctlCode = params.Parameters.DeviceIoControl.IoControlCode;
        if (ctlCode == 3096599)
        {
            PIRP pIRP = WdfRequestWdmGetIrp(Request);
            DbgPrint("[MH]IORD: %lu, IL: %llu, OL: %llu, IP: 0x%p, UP: 0x%p, SP: 0x%p\n", ctlCode, inLen, ouLen, t3Ptr, pIRP->UserBuffer, pIRP->AssociatedIrp.SystemBuffer);
        }
        else if (ctlCode == 3112979)
        {
            PIRP pIRP = WdfRequestWdmGetIrp(Request);
            DbgPrint("[MH]IOWT: %lu, IL: %llu, OL: %llu, IP: 0x%p, UP: 0x%p, SP: 0x%p\n", ctlCode, inLen, ouLen, t3Ptr, pIRP->UserBuffer, pIRP->AssociatedIrp.SystemBuffer);
        }
        else
        {
            DbgPrint("[MH]IOCTL: %lu, IL: %llu, OL: %llu, IP:0x%p\n", ctlCode, inLen, ouLen, t3Ptr);
        }
    }
    else
    {
        DbgPrint("[MH]IRP, type: %d, func: %u\n", params.Type, params.MinorFunction);
    }

    status = WdfDeviceEnqueueRequest(Device, Request);

    if (!NT_SUCCESS(status))
    {
        DbgPrint("[MH]WdfDeviceEnqueueRequest failed with status code 0x%08X\n", status);
        WdfRequestComplete(Request, status);
    }
}
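
For reference, the decimal values in the comments above can be reproduced from the bit layout that the Windows CTL_CODE macro uses. The following is a minimal portable sketch, not the real header: every SKETCH_ and sketch_ name is hypothetical, and the constants are copied by hand from the Windows headers (an assumption worth checking against ks.h in the WDK) so that the code builds without the WDK.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as CTL_CODE in the Windows headers:
   (device type << 16) | (access << 14) | (function << 2) | method. */
#define SKETCH_CTL_CODE(dev, func, method, access)              \
    (((uint32_t)(dev) << 16) | ((uint32_t)(access) << 14) |     \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

/* Hand-copied values (assumption: they match the Windows headers). */
#define SKETCH_FILE_DEVICE_KS    0x2Fu
#define SKETCH_METHOD_NEITHER    3u
#define SKETCH_FILE_ANY_ACCESS   0u
#define SKETCH_FILE_READ_ACCESS  1u
#define SKETCH_FILE_WRITE_ACCESS 2u

/* Function numbers of the KS IOCTLs listed in the callback's comments. */
enum sketch_ks_function {
    SKETCH_KS_FN_PROPERTY      = 0,
    SKETCH_KS_FN_ENABLE_EVENT  = 1,
    SKETCH_KS_FN_DISABLE_EVENT = 2,
    SKETCH_KS_FN_METHOD        = 3,
    SKETCH_KS_FN_WRITE_STREAM  = 4,
    SKETCH_KS_FN_READ_STREAM   = 5,
    SKETCH_KS_FN_RESET_STATE   = 6
};

/* Build a KS IOCTL code; all KS IOCTLs use METHOD_NEITHER. */
static uint32_t sketch_ks_ioctl(uint32_t function, uint32_t access)
{
    return SKETCH_CTL_CODE(SKETCH_FILE_DEVICE_KS, function,
                           SKETCH_METHOD_NEITHER, access);
}

/* Check the computed codes against the decimal values in the post. */
static void sketch_self_check(void)
{
    assert(sketch_ks_ioctl(SKETCH_KS_FN_PROPERTY,
                           SKETCH_FILE_ANY_ACCESS) == 3080195u);
    assert(sketch_ks_ioctl(SKETCH_KS_FN_WRITE_STREAM,
                           SKETCH_FILE_WRITE_ACCESS) == 3112979u);
    assert(sketch_ks_ioctl(SKETCH_KS_FN_READ_STREAM,
                           SKETCH_FILE_READ_ACCESS) == 3096599u);
}
```

Calling sketch_ks_ioctl(SKETCH_KS_FN_READ_STREAM, SKETCH_FILE_READ_ACCESS) yields 3096599, the value the callback compares against. In the driver itself, including ks.h and comparing against IOCTL_KS_READ_STREAM and IOCTL_KS_WRITE_STREAM by name would avoid the magic numbers.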

I then installed the driver as an upper filter driver in the device's registry entry.

HKLM\SYSTEM\CurrentControlSet\Enum\HDAUDIO\FUNC_01&VEN_15AD&DEV_1975&SUBSYS_15AD1975&REV_1001\5&217be3d6&0&0001 UpperFilters

I expected that when I play music, the callback would be triggered continuously and log IOCTL_KS_WRITE_STREAM requests. In fact it is not: the only output is a handful of individual IOCTL_KS_PROPERTY requests.

I don't know where the problem is, and I can't find any official Microsoft documentation to refer to.

My system: Windows 11 Pro x64 Build 22621

Any help would be appreciated. Thanks.

Nope, it doesn't work that way. In a WaveRT driver, which is what most hardware today uses, the audio packets do not travel through ioctls. Instead, the Audio Engine gets pointers directly into circular buffers in the hardware. It reads and writes the hardware directly. The audio driver is not involved in the audio stream in any way.
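
To make the circular-buffer idea concrete, here is a small portable simulation in C. It is only a sketch of the access pattern, not the WaveRT interfaces: all sketch_ names are hypothetical, the "hardware" is an ordinary function, and the write position that real hardware would report is just a counter in the struct. The point it illustrates is that the consumer copies samples straight out of shared memory, so there is no per-packet request for a filter driver to see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_FRAMES 8u /* deliberately tiny so the wrap-around is visible */

typedef struct {
    int16_t  frames[RING_FRAMES]; /* stands in for the shared hardware buffer */
    uint64_t write_pos;           /* total frames deposited so far */
} sketch_ring;

/* "Hardware" side: deposit captured frames, wrapping at the buffer end. */
static void sketch_hw_capture(sketch_ring *r, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r->frames[r->write_pos % RING_FRAMES] = src[i];
        r->write_pos++;
    }
}

/* "Audio engine" side: copy out everything that is new since *read_pos by
   reading the shared memory directly. Returns the number of frames copied.
   Overruns (the writer lapping the reader) are not handled in this sketch. */
static size_t sketch_engine_read(const sketch_ring *r, uint64_t *read_pos,
                                 int16_t *dst, size_t max_frames)
{
    size_t n = 0;
    while (*read_pos < r->write_pos && n < max_frames) {
        dst[n++] = r->frames[*read_pos % RING_FRAMES];
        (*read_pos)++;
    }
    return n;
}
```

A caller would keep one sketch_ring and one read cursor, call sketch_hw_capture() whenever new samples "arrive", and poll sketch_engine_read() periodically. Nothing in that loop resembles an IOCTL_KS_READ_STREAM or IOCTL_KS_WRITE_STREAM request, which is consistent with the callback only ever logging property requests.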

Also remember that, with the Audio Engine, applications don't talk to the audio driver. They talk to the Audio Engine process, which is where all the streaming happens.

Philosophically, they do not want you to "intercept and modify microphone(speaker) data" at all. The ease of filtering is what killed the original audio model; so many people forced in cute little audio filter gadgets that the latency became completely crazy, and made it impossible to write reliable professional audio applications. That's why the new architecture was introduced in Vista.

It IS possible to do filtering (like echo cancellation and other effects) using Audio Processing Objects, or APOs. These APOs are DLLs that get loaded into the Audio Engine process at well-defined places in the stream. They aren't particularly easy to install, but at least they're in user mode. Read about it here:

Thanks for your answers.

Through your answer, I found the official Microsoft documentation on the WaveRT port driver. I had seen it before, but I didn't understand it at the time. Now I do.

However, APOs may not solve my problem. My goal is not to modify the original audio data.

There is a device on my motherboard that appears to be a USB sound card, but it is actually a fake sound card: if you record from it directly, no data is produced. User-mode program A needs to use this device as its sound source, and the audio data stream comes from user-mode program B. A is a third-party program, and B is a program I developed myself, so A and B cannot communicate directly. That is why I thought of a filter driver: receive B's data in the driver and feed it to A.

Now, if the filter driver is not feasible, I may need to study how to write data to the circular buffer, and I need to learn more. If you have good suggestions, I hope you can give me your advice.

Thank you very much!

My first guess is that you are misdiagnosing the situation, because it sounds crazy. No manufacturer would put a USB device on their motherboard that pretends to be a microphone but provides no data. Where did it come from? Is your microphone broken? If so, then just disable the device.

Audio redirection like that is COMPLICATED. Months of work. There is a package on the internet called Virtual Audio Cable that can do what you're asking, and I believe it is free for personal use.

USB Audio Devices are not WaveRT devices, because the circular buffers are not directly writeable. Requests go to USBAUDIO.SYS, which turns them into USB requests.

Presumably it is possible to tell program A which audio device to record from? It is not somehow hard-coded to record only from that specific device?

If so, your problem could be much simpler. If not, and you have full control of the system, it may be simpler to hook the open calls so that A opens a different device. Providing audio data from a UM application to a virtual audio device that is entirely under your control is a much easier task than trying to hijack some other device. That's not a trivial project either.

Thanks for your reply.

The device is not an ordinary sound card but a special one. It is embedded in another device that runs Linux, and the sound card is virtualized by that Linux system.

In the end, my work item was cancelled, and the Linux side will be responsible for providing the audio data to the microphone.

Thanks again.

Thanks for your answers.

Program A maintains a whitelist of device identifiers and only uses devices on that list. No specific encoding or format is used.

Please refer to what is said above.

Thanks again.

That's probably a good choice, given the difficulties of any other solution that doesn't follow the normal rules for an audio device, and the idea that whatever component presents itself as an audio device should provide the audio data.