Sync data exchange between kernel and user mode

Hi,
I’m looking for a proper way to sync my kernel data and my user mode data, with a focus on low latency.
I have a double-buffered UM application which receives and sends data from and to USB (ISOCH streaming).
(Double buffering as described here: )

I need to send the incoming USB data up to the UM app and send the UM app’s output back down to USB.
One packet in and one packet out must stay in sync.
I have successfully implemented the inverted call model for signaling incoming data from USB.
My idea so far was: once I receive data from USB IN, I complete the IOCTL, and the UM app can read the data, toggle the double buffer, and fill my USB output buffer.
With this, the corresponding output packet is ready, and USB OUT can pick up the output data for the next USB OUT completion.
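For reference, the UM side of my inverted call currently looks roughly like this (a trimmed sketch; the IOCTL code, device path, pool depth, and packet size are placeholders, not my real ones):

```c
#include <windows.h>
#include <winioctl.h>

// All names below are illustrative; the real driver defines its own.
#define IOCTL_MYDEV_WAIT_INPUT \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define DEVICE_PATH   L"\\\\.\\MyUsbAudio"   // placeholder device name
#define PENDING_COUNT 4                      // depth of the pending pool
#define PACKET_BYTES  512                    // one packet's worth of data

typedef struct {
    OVERLAPPED ov;
    BYTE       data[PACKET_BYTES];
} PENDING_REQ;

// Park one request in the driver; it completes when a USB IN packet arrives.
static void PostRequest(HANDLE dev, PENDING_REQ *r)
{
    HANDLE evt = r->ov.hEvent;
    ZeroMemory(&r->ov, sizeof(r->ov));       // reuse the OVERLAPPED cleanly
    r->ov.hEvent = evt;
    DeviceIoControl(dev, IOCTL_MYDEV_WAIT_INPUT, NULL, 0,
                    r->data, PACKET_BYTES, NULL, &r->ov);
    // Expected: FALSE with GetLastError() == ERROR_IO_PENDING.
}

int main(void)
{
    PENDING_REQ req[PENDING_COUNT] = {0};
    HANDLE evts[PENDING_COUNT];
    HANDLE dev = CreateFileW(DEVICE_PATH, GENERIC_READ | GENERIC_WRITE, 0,
                             NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (dev == INVALID_HANDLE_VALUE) return 1;

    for (int i = 0; i < PENDING_COUNT; i++) {
        req[i].ov.hEvent = evts[i] = CreateEventW(NULL, FALSE, FALSE, NULL);
        PostRequest(dev, &req[i]);
    }
    for (;;) {
        // Pick up a completed request, consume the packet, repost it.
        DWORD i = WaitForMultipleObjects(PENDING_COUNT, evts, FALSE, INFINITE)
                  - WAIT_OBJECT_0;
        DWORD got = 0;
        GetOverlappedResult(dev, &req[i].ov, &got, FALSE);
        // ... hand req[i].data (got bytes) to the buffer toggling here ...
        PostRequest(dev, &req[i]);
    }
}
```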
It seems that the timing of the USB callbacks for IN and OUT jitters a lot, and the timing of my IOCTL completion into UM also has some peaks, so from time to time my output buffer runs empty. Of course I could increase the buffering somehow, but my focus must be on low latency.
Is there another way to sync both worlds besides increasing the buffering between UM and the kernel?
There is also something I’m not 100% sure about and want to ask:
Can I be sure that the incoming and outgoing USB callbacks always complete in the same order in which I started them, i.e. from the beginning:
In, out, in, out, …, in, out
or is it possible that from time to time something could happen like:
In, out, in, in, out, in, out, …
or, even worse for me: in, out, out, in, out, in, …

So any ideas on how to design this properly would help me.

Thanks
K. Weller

It is unclear what you mean by syncing data here. Most drivers don’t retain or process the data coming from lower or higher layers, but simply pass it along (possibly with a transformation)

Inverted call is not appropriate for providing data to UM. It’s a model for KM to request services from a UM service.

Ordering is not guaranteed.

If you describe your overall objective, we might be able to help more

No, USB doesn’t do that. Remember that a USB frame is all scheduled in advance, and is locked in once the frame begins. There’s inherent buffering that you simply cannot control. By the time you receive a notification that data has arrived, the next microframe has already started and you can’t insert something in time.

USB is totally unsuited to applications that require synchronized back-and-forth transactions like this. You need to design your hardware to be more flexible. I’m sorry, there’s just no alternative.

Tim, you clearly read more from the OP’s post than I did. Do you think he is trying to implement some kind of service (satisfied in UM) back to the HW side with no pipeline?

If that’s what he wants, it will surely never work. USB is particularly bad for this pattern, but it won’t work well on any bus unless the latency and jitter tolerances are at least 2 orders of magnitude slower than the expected rate (that’s a rule of thumb)

Yep, that’s what his diagram shows: isochronous IN through KM to UM, processed, and sent back through KM to isochronous OUT.

I guess you read his message better than I did. I couldn’t quite understand what he was getting at except that he has a problem with latency.

We don’t know what his overall objective / problem is, but almost certainly pipelining needs to be in his future. USB is particularly bad, but there isn’t any bus that will work well for this

Sorry for the late reply, and thank you very much, MBond2 and Tim_Roberts, for your answers and for engaging with my question. I really appreciate it!

I’m trying to handle audio data via USB isochronous, just as Tim described.

The UM app receives the KM input data. At the moment I signal the UM app that its input buffer is filled, I also get one output packet, which I want to deliver back to KM and send out via USB as quickly as possible.
So I want to tighten this as much as possible, and I’m looking for a good design/mechanism for how to do it.

The USB data is channel-interleaved and the UM buffers expect one channel per buffer, so I need to copy it anyway; I cannot just pass the memory back and forth via pointers etc.

I hope this makes it clearer.

Thanks again for your time and help!

Regards,
K. Weller

Audio data? Normally audio is loss tolerant. I know you want to provide the best outcome, but can you lose data? The answer to that question makes a big difference to your design choices

Stepping back a moment, is there a reason you didn’t simply design your hardware to be USB Audio Class compliant? Then the operating system’s efficient and streamlined Audio Engine would have handled all of these details for you.

Excuse me for the late answer and update here; unfortunately I was away for a few days.

MBond2:
No, of course I don’t want to lose data. You would most probably hear it, and that is a no-go.

Tim Roberts:

I need to work with the existing ASIO interface, which is a given because of its low latency.

My HW is USB Audio Class compliant and uses asynchronous feedback, so the numbers of input and output samples are equal, because the HW clock is used and the DAC/ADC are in sync.

So I need to transfer the data between UM and KM at the same rate and as quickly as possible.

Regards,
K. Weller

WHAT data do you need to transfer? If you are USB Audio Class compliant, then the Audio Engine is taking care of the data. What are you transferring? Where does it come from?

I transfer the interleaved audio data myself, from and to the HW.
Unfortunately I cannot use the Audio Engine for this, because I need, for example, to mix several audio streams before I send them out to the HW. The other way round it is similar: I need to distribute the incoming audio data to different recipients. That’s a given and can’t be changed.

So my focus for this question is:
I get data from the HW and marshal it from the interleaved structure into one plain buffer per channel:
[CH1,CH2,CH1,CH2] → [CH1, CH1], [CH2, CH2]
Now I need to transfer this to UM; or I could consider doing the marshaling in UM, but it has to be done somewhere.
The other way round, I receive [CH1, CH1], [CH2, CH2] from UM and marshal it back to interleaved audio, [CH1,CH2,CH1,CH2]; or, as above, I could do this in UM as well.
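The marshaling itself is simple; just for illustration (the sample type and function names here are arbitrary, not my real code):

```c
#include <stdint.h>
#include <stddef.h>

// De-interleave [CH1,CH2,CH1,CH2,...] into one plain buffer per channel.
// 'frames' is the number of samples per channel.
static void Deinterleave(const int32_t *in, int32_t **out,
                         size_t channels, size_t frames)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t ch = 0; ch < channels; ch++)
            out[ch][f] = in[f * channels + ch];
}

// The return path: interleave per-channel buffers back to [CH1,CH2,...].
static void Interleave(int32_t *const *in, int32_t *out,
                       size_t channels, size_t frames)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t ch = 0; ch < channels; ch++)
            out[f * channels + ch] = in[ch][f];
}
```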

The question is still how to design the transfer and notification between KM and UM.
As I said, I have implemented the inverted call mechanism for this.
But I am still struggling with how to synchronize the incoming and outgoing data without buffering in between.
If I get the input block, I need to deliver the output block just in time.

Thanks
K. Weller

I need to distribute the incoming audio data to different recipients. That’s a given and can’t be changed.

Your path is going to end in tears. Every layer you add adds latency. That’s an axiom. You are talking about reimplementing the Audio Engine. Surely it would be better to find a way to implement your desires within the highly optimized skeleton that already exists.

Have you investigated what can be done in an APO? There are APOs per endpoint, and APOs per device.

I think you need to apply some judgement. The goals of minimum possible latency and maximum data loss prevention are incompatible. You are going to have to decide which one is more important. For Microsoft, the audio engine prioritizes latency at the expense of possible data loss.

As an example, imagine that you are a person of very regular habits. Every morning you get up and check the weather forecast. Then one day, for some reason, the weather isn’t available when you normally read it and you miss a day. The next day, do you try to read about both yesterday’s weather and today’s weather, or do you just read today’s forecast? If the purpose of reading the weather is to decide what to wear, then yesterday’s forecast is of no help and should just be skipped. But if the purpose of reading the forecast is to collect time series data and produce some analytics (forecast accuracy, longest streak of rain / sun / hot / cold, etc.), then yesterday’s forecast is still of use and should be read

The first kind of use is loss tolerant. The second use is not - the veracity of the analysis is compromised by missing data. Being loss tolerant does not mean that we want to lose data - if the weather forecast was available, you would have used it - but it does mean that you have to decide how to dress without the usual input. One choice is to dress the same as yesterday, assuming that the weather doesn’t change too much (repeat the previous audio sample). Another is to go back to bed and wait for tomorrow (null sample). There are others, and I am skipping over many details
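In code, the two “how to dress” choices for an audio underflow are about this simple (an illustrative sketch, nothing more):

```c
#include <string.h>
#include <stdint.h>

// On an underflow ("the forecast didn't arrive"), either repeat the last
// good packet or emit silence. Both are standard concealment choices.
void FillOnUnderflow(int32_t *out, const int32_t *lastGood,
                     size_t samples, int repeatLast)
{
    if (repeatLast && lastGood != NULL)
        memcpy(out, lastGood, samples * sizeof(int32_t)); // "dress like yesterday"
    else
        memset(out, 0, samples * sizeof(int32_t));        // null samples
}
```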

Windows is a pre-emptive multi-tasking OS. By definition there will be times when your data is delayed. On modern machines & versions of Windows those times will be rare, but the design of your software has to consider that once in a blue moon, it will happen.

The audio engine has been the subject of extensive discussion, in large part I think because it assumes that audio data is loss tolerant. This is rare for data processed by Windows, which means that it uses techniques that are rarely used on Windows - shared memory that can both underflow and overflow. Memory barriers and the hardware-based cache coherency protocol provide integrity with a minimum of latency. But great care is taken on the KM side to avoid potential security issues. And the cleanup problem. And, and, and. This is not a simple project, and it took Microsoft several major revisions over a span of years to get to what we have today
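To make the shape of that concrete, here is a minimal sketch of a single-producer/single-consumer ring over memory visible to both sides - illustrative only, not Microsoft’s code; on the KM side, KeMemoryBarrier plays the role of MemoryBarrier:

```c
#include <windows.h>

// Indices only ever grow; masking picks the slot. RING_SIZE must be a
// power of two. Such a ring can underflow and overflow, but never blocks.
#define RING_SIZE 4096u

typedef struct {
    volatile ULONG writePos;   // advanced only by the producer
    volatile ULONG readPos;    // advanced only by the consumer
    BYTE data[RING_SIZE];
} RING;

// Producer side: returns FALSE on overflow.
BOOL RingWrite(RING *r, const BYTE *src, ULONG len)
{
    ULONG w = r->writePos, rd = r->readPos;
    if (RING_SIZE - (w - rd) < len) return FALSE;   // overflow
    for (ULONG i = 0; i < len; i++)
        r->data[(w + i) & (RING_SIZE - 1)] = src[i];
    MemoryBarrier();        // publish the data before advancing the index
    r->writePos = w + len;
    return TRUE;
}

// Consumer side: returns FALSE on underflow.
BOOL RingRead(RING *r, BYTE *dst, ULONG len)
{
    ULONG w = r->writePos, rd = r->readPos;
    if (w - rd < len) return FALSE;                 // underflow
    MemoryBarrier();        // see the data the producer published
    for (ULONG i = 0; i < len; i++)
        dst[i] = r->data[(rd + i) & (RING_SIZE - 1)];
    r->readPos = rd + len;
    return TRUE;
}
```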

For Microsoft, the audio engine prioritizes latency at the expense of possible data loss.

For what it’s worth, this is true of all of the streaming components. The kernel streaming video path also prioritizes latency at the expense of data loss. This is why USB isochronous pipes are used for audio and video: they implement the same philosophy.

Thanks Tim,

Well, the driver already exists and has been working for years.
The ASIO part is on a “fastpath” and is treated with high priority; the others can wait.
So what I want to do is optimize the signaling between KM and UM.
An APO is not what I want.
So I would really appreciate some hints on how I could achieve this, how I could design the KM->UM->KM part in an efficient way.

Let’s set audio aside for a moment and try to look at it like this:
What I want is to pass the incoming USB data as quickly as possible from USB input to USB output, taking the roundabout route through UM.
Does this make sense?

Thanks
K. Weller

Sorry, somehow the latest two posts did not show up when I answered the first one above.

Thanks MBond2 for your comment.

Hmm, well, I think I know and understand what both of you are saying about loss tolerance, and I’m aware that isochronous has no error correction.
But at the end of the day I don’t care about the USB part here; I care about KM->UM->KM.

So maybe the above statement or “requirement” should be the basis of my question:

What I want is to pass the incoming USB data as quickly as possible from USB input to USB output, taking the roundabout route through UM.

But do you mean I should accept data loss at this point too?
And let’s say I accept it; what is a good design to implement this?

Thanks again.

K. Weller

I think I can speak for Tim and say that we are both saying that you want to accept the possibility of loss in the UM ↔ KM part. You have already accepted the possibility of loss on the hardware side, so you aren’t really accepting any new risk

Given that you don’t want to use the audio engine, the next question is whether all of these streams of audio should be sent to the same UM process? or more specifically the same HANDLE that UM will open to your driver?

The next model you want to learn about is the long-lived IRP method of creating shared memory regions. And then you want to look into interlocked data structures and side-band control IRPs.
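In rough outline, the long-lived IRP part looks something like this (a WDM-flavored sketch; the names are mine, and the cancel and cleanup handling, which are essential, are omitted):

```c
#include <ntddk.h>

// UM sends a METHOD_OUT_DIRECT IOCTL with a large buffer and never waits
// on it; the driver parks the IRP and uses the locked-down buffer as
// shared memory until cleanup.

typedef struct _DEVICE_EXTENSION {
    PIRP   SharedIrp;    // the parked IRP
    PVOID  SharedBase;   // system-space view of the UM buffer
    SIZE_T SharedSize;
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

NTSTATUS OnCreateSharedRegion(PDEVICE_EXTENSION dx, PIRP Irp)
{
    PIO_STACK_LOCATION sp = IoGetCurrentIrpStackLocation(Irp);

    // With METHOD_OUT_DIRECT, the I/O manager has already probed and
    // locked the user buffer and built an MDL for it.
    PVOID base = MmGetSystemAddressForMdlSafe(Irp->MdlAddress,
                                              NormalPagePriority);
    if (base == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    dx->SharedBase = base;
    dx->SharedSize = sp->Parameters.DeviceIoControl.OutputBufferLength;
    dx->SharedIrp  = Irp;

    IoMarkIrpPending(Irp);
    // A real driver must also set a cancel routine and complete this IRP
    // on IRP_MJ_CLEANUP, or the owning process can never exit.
    return STATUS_PENDING;
}
```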

Assuming that this is not a commodity device for broad distribution, you can justify the security issues associated with a simple/naive approach.

What I want is to pass the incoming USB data as quickly as possible from USB input to USB output, taking the roundabout route through UM.
Does this make sense?

You have said you are USB Audio Class compliant. So, are you completely replacing usbaudio.sys? How do you expect to intercept the data without involving the Audio Engine (which, of course, is already a UM component)?

Given that you don’t want to use the audio engine, the next question is whether all of these streams of audio should be sent to the same UM process? or more specifically the same HANDLE that UM will open to your driver?
Yes, only one HANDLE is on the UM side. This one HANDLE gets the audio input and delivers the audio output.

long-lived IRP method of creating shared memory regions.
Do you mean sending an IRP, keeping it PENDING, and using the memory of the OVERLAPPED buffer, or something else?
I found a few posts on the OSR forum here, but not much in the rest of the WWW.

side-band control IRPs.

Can you give me a more detailed link about this? I didn’t find much about it either.

Does this make sense?
I think we can reduce it to this, so yes, it makes sense.

So, are you completely replacing usbaudio.sys?
Yes