Hello everyone! I’m working on a project where I need to capture input from a computer’s microphone. I want to process this input to merge it with another audio stream. This way, I can play the combined audio as if it’s coming directly from the user’s microphone.
So far, I’ve captured the microphone input, mixed it with the other audio stream, and sent the result to a virtual audio cable. However, this pipeline introduces significant latency. Our product is written in C#, and we currently use NAudio for audio processing. Despite trying every API available to us, we haven’t been able to reduce the input delay.
After some research, I found that some software tackles this problem differently. It installs a driver for the user’s microphone, which apparently lets it modify the microphone’s input by mixing in extra data or sounds. The processing then no longer has to happen inside the application itself: the driver receives the additional stream and outputs the combined audio through the microphone.
At least, that is how it appears to work. I’ve been looking through various resources to understand how these products do it, and Windows Audio Processing Objects (APOs) look like what I need, although I’m not certain yet.
I came here to ask the following questions:
Are Windows APOs the solution for this? Am I on the right track?
While I consider myself to be a fairly seasoned developer, I’m completely new to the world of Windows ecosystem development, especially regarding drivers. Any help or guidance would be greatly appreciated.