How to perform adaptive resampling

OSR_Community_User · April 25, 2016, 4:26am

Hi,

I am developing an audio driver that sends audio data over a socket
to a streaming application.

The driver and the streaming application run on two different
computers, each with its own clock, and I have no way to
synchronize them.

I can’t modify the streaming application code, but it sends me useful
playback feedback telling me whether I am sending too much or not enough
audio data.

With two different clocks and no way to synchronize them, I seem to be
facing a classic adaptive resampling problem (this article explains the
problem I am referring to:
http://kokkinizita.linuxaudio.org/papers/adapt-resamp.pdf), and I can see
that my driver sends data too slowly, which leads to a buffer underrun
after a few minutes of continuous playback.

Since I can’t really modify the streaming application source code, I have
to find a way to implement some kind of adaptive resampling on the
driver side.

I am not sure how I should implement such a mechanism.

Depending on the audio feedback sent by the streaming application, I
compute a clock adjustment and set a timer to fire every second, plus or
minus this adjustment. Each time the timer fires, I call the
IPortWaveCyclic::Notify method. I also implemented
IMiniportWaveCyclicStream::SetNotificationFreq, but it does not work.
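A minimal sketch of the timer-based idea described above, assuming the streaming application's feedback can be reduced to a buffer fill level in frames. The thread does not say what the feedback looks like, so `fill_frames`, `target_frames`, the gains and the 500 ppm clamp are all hypothetical. A proportional-integral term turns the fill error into a small fractional rate correction, and the timer period is scaled by it. This is plain arithmetic, not a PortCls API.

```python
class RateController:
    """Hypothetical PI controller turning buffer-fill feedback into a rate correction."""

    def __init__(self, kp=1e-6, ki=1e-8, max_ppm=500.0):
        self.kp = kp                  # proportional gain, per frame of error
        self.ki = ki                  # integral gain, per accumulated frame of error
        self.limit = max_ppm * 1e-6   # largest correction allowed, as a fraction
        self.integral = 0.0

    def update(self, fill_frames, target_frames):
        """Return a fractional correction: > 0 means deliver data faster."""
        error = target_frames - fill_frames   # positive when the sink is running low
        self.integral += error
        correction = self.kp * error + self.ki * self.integral
        # Clamp to +/- max_ppm so one bad feedback value cannot swing the rate.
        return max(-self.limit, min(self.limit, correction))


def timer_period(nominal_s, correction):
    """Shorten the period when correction > 0, lengthen it when < 0."""
    return nominal_s / (1.0 + correction)


ctrl = RateController()
corr = ctrl.update(fill_frames=47_000, target_frames=48_000)
print(timer_period(1.0, corr))
```

The integral term is what removes a constant offset between the two clocks; a real loop would also need anti-windup (the integral keeps growing while the output is clamped) and gains tuned to the actual feedback interval.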

Thanks in advance for any tips

Cheers

Alex_Grig · April 25, 2016, 11:27am

It depends on the audio quality you are working with and the magnitude of discrepancy you need to compensate.

If it’s not HiFi music data, you could simply insert an extra sample or remove a sample every so often.
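For illustration, the insert/drop approach can be sketched as follows, assuming the interleaved audio has already been split into frames (one list entry per frame, for example a tuple of per-channel samples). The class name and the fixed correction interval are hypothetical. It repeats or drops one whole frame every `frames_per_correction` frames, and its counter carries across buffers so the corrections stay evenly spaced.

```python
class SampleSlipper:
    """Repeat or drop one frame every `frames_per_correction` frames."""

    def __init__(self, frames_per_correction):
        self.n = frames_per_correction
        self.count = 0  # frames seen since the last correction, kept across calls

    def process(self, frames, direction):
        """direction > 0: repeat a frame (sink consumes faster than we produce);
        direction < 0: drop a frame (sink consumes slower); 0: pass through."""
        out = []
        for frame in frames:
            self.count += 1
            if self.count == self.n:
                self.count = 0
                if direction < 0:
                    continue            # drop this frame
                if direction > 0:
                    out.append(frame)   # emit this frame twice
            out.append(frame)
        return out
```

Every slip is a small discontinuity in the waveform, which is why this is suggested only for audio that is not hi-fi music.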

OSR_Community_User · April 25, 2016, 12:14pm

Smart idea, but I would prefer neither to drop nor to insert any samples.
This driver should provide an audio data path that is as bit-perfect as possible.

Alex_Grig · April 25, 2016, 1:26pm

>This driver should provide an audio data path as bit perfect as possible.

This doesn’t make sense. You need to come up with numbers and the result you want to have, and then choose the solution that satisfies those numbers. It depends on whether the playback rate discrepancy is 0.1%, 1%, 10% or 0.01%, whether the audio is speech or music, and so on.
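To put numbers on the discrepancy: if the sink consumes samples faster or slower than the source produces them by a fraction p, the mismatch grows by p times the sample rate every second, and a one-frame correction is needed roughly every 1/p frames. The helper names below are made up for illustration.

```python
def frames_between_corrections(discrepancy_percent):
    """How many frames pass before the mismatch adds up to one whole frame."""
    return round(100.0 / discrepancy_percent)


def drift_frames_per_hour(sample_rate_hz, discrepancy_percent):
    """Frames gained or lost per hour for a given rate mismatch."""
    return sample_rate_hz * 3600 * discrepancy_percent / 100.0


for pct in (10, 1, 0.1, 0.01):
    print(pct, frames_between_corrections(pct), drift_frames_per_hour(48_000, pct))
```

Even at 0.01%, a 48 kHz stream gains or loses 17,280 frames (0.36 s) per hour, so the answer to "which discrepancy" decides whether a fix is needed every few seconds or every few minutes.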

Tim_Roberts · April 25, 2016, 2:01pm

Matthieu Collette wrote:

> Smart idea but I would prefer not to drop nor insert any samples.
> This driver should provide an audio data path as bit perfect as possible.

That’s complicated. That means doing your own metrics to come up with
the long-term average data rate on each end, and adjusting your resample
algorithm accordingly. Which, I guess, is exactly what you said in your
subject line.

You might ask your question on the [wdmaudiodev] mailing lists. Most of
the cool audio guys hang out there, including several members of the
Microsoft audio team, who do participate in discussions.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Alex_Grig · April 25, 2016, 3:49pm

Are both source and playback hardware devices?

OSR_Community_User · April 26, 2016, 3:30am

Hi Tim,

Yes, I definitely need an adaptive solution, because the bandwidth of
the network link between the computer and the end device may vary
depending on the type of connection (Ethernet, Wi-Fi).
Based on the feedback data sent by the end device, I can tell whether
I should resend some packets, or speed up or slow down the pace, but I
am still struggling with the driver part.

I’ll take a look at the [wdmaudiodev] list and post a few questions to
those cool guys you are talking about.

I’ll keep you updated.

Thanks

OSR_Community_User · April 26, 2016, 3:37am

On 2016-04-25 17:24:00 +0000, xxxxx@broadcom.com said:

>> This driver should provide an audio data path as bit perfect as possible.
>
> This doesn’t make sense. You need to come with numbers and what result
> you want to have, and then choose the solution that satisfies these
> numbers. It depends on whether the playback rate discrepancy is 0.1% or
> 1% or 10% or 0.01%. Whether the audio is speech or music. And so on.

By bit-perfect I mean no data loss and no data compression: being able
to send the audio data as is.

I am not sure I understand what you mean by rate discrepancy. Do you mean
the rate delta or the rate ratio between the two devices?

OSR_Community_User · April 26, 2016, 3:38am

On 2016-04-25 19:46:49 +0000, xxxxx@broadcom.com said:

> Are both source and playback hardware devices?

Yes, both are hardware devices.

Maxim_S_Shatskih · April 26, 2016, 4:22am

> I am not sure to understand what you mean by rate discrepancy, you mean
> rate delta or rate ratio between two devices ?

I would suggest finding the old USB 1.x spec and reading the chapter “Issues with Isochronous Devices” in it.

The chapter is probably also present in the modern USB specs.

The chapter is mostly not about USB details; it is about the general principles of resampling, clock mastering, adaptive rates, etc. It is useful not only for USB, but also for generic DirectShow with its concept of a “master clock”.

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

OSR_Community_User · April 26, 2016, 4:38am

Ok thanks, I’ll try to find this document.

Alex_Grig · April 26, 2016, 10:08am

OK, so far we have:

Hardware source with some fixed sample rate.
Remote hardware sink with some fixed sample rate that differs by an unknown maximum delta %.
Network link, unknown whether lossy (UDP) or lossless (TCP/IP).
Unknown type of audio; cannot decide which loss/distortion is objectionable or not.

Do you know these variables yourself, or are you just wandering in the dark, like the rest of this forum?

OSR_Community_User · April 26, 2016, 11:26am

On 2016-04-26 14:05:44 +0000, xxxxx@broadcom.com said:

Hi!

First of all, thanks for your time and help.

> OK, so far we have:
>
> Hardware source with some fixed sample rate.

The sample rate of the hardware source is not fixed; it depends on the
file currently being played.
Allowed values are 44100, 48000, 88200, 96000, 176400 and 192000 Hz.

> Remote hardware sink with some fixed sample rate that differs by
> unknown maximum delta %.

The remote hardware supports the same sample rates as the source
hardware.
Its sample rate may differ by an unknown maximum delta %.

> Network link, unknown whether lossy (UDP) or lossless (TCP/IP).

Two network links have to be considered:
A TCP connection between the driver and the streaming application.
A UDP connection between the streaming application and the remote
hardware, over either Ethernet or Wi-Fi.

> Unknown type of audio; cannot decide which loss/distortion is
> objectionable or not.

I’m dealing with hi-fi audio data and do not want to alter the audio data
if possible.
The preferred behavior is to play the audio file as is, without any
modification, or not to play it at all.

> Do you know these variables yourself, or just wandering in the dark,
> like the rest of this forum?

Sorry, I realize that I could have given a more detailed description
of my problem.
Here is what I know for sure.

The basic setup is as follows:

A driver running on a computer, sending audio data through a TCP socket to a streaming application that also runs on the same computer.
A remote hardware device receiving audio data from the streaming application through a UDP socket.

I don’t have access to the remote hardware source code and I can’t modify it.

I don’t have access to the streaming application source code either; I
just know it tells me whether I send too much or not enough data.
I also know it packetizes the audio data to be sent through the UDP
socket and is in charge of re-emitting audio packets when some are lost.

Alex_Grig · April 26, 2016, 11:54am

This whole scheme is quite ill-conceived.

Typical cheap crystal oscillators have a precision of about 0.001% (think 1 second per day); very cheap ones can be 0.01% (10 seconds per day). Suppose your source is clocked by a Very Cheap Oscillator and gives you roughly 1 second of drift every two hours.
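The arithmetic behind these figures, as a quick check: a tolerance expressed as a percentage of the nominal frequency turns directly into seconds of drift over a given duration. One step is implicit here: the source and the sink each have their own oscillator, so in the worst case their errors add up. The function names are illustrative.

```python
def drift_seconds(tolerance_percent, duration_s):
    """Clock drift accumulated by one oscillator that is off by tolerance_percent."""
    return duration_s * tolerance_percent / 100.0


def worst_case_drift_seconds(source_percent, sink_percent, duration_s):
    """Source and sink errors in opposite directions add up."""
    return drift_seconds(source_percent + sink_percent, duration_s)


DAY_S = 24 * 3600
TWO_HOURS_S = 2 * 3600

print(drift_seconds(0.001, DAY_S))    # typical cheap crystal, per day
print(drift_seconds(0.01, DAY_S))     # very cheap crystal, per day
print(drift_seconds(0.01, TWO_HOURS_S))
print(worst_case_drift_seconds(0.01, 0.01, TWO_HOURS_S))
```

One 0.01% oscillator gives about 0.72 s over two hours, and two of them drifting in opposite directions about 1.44 s, the same order of magnitude as the 1 second buffer discussed next.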

You have a few different approaches to this:

If you don’t care about latency, you can buffer 1 second of data and have trouble-free playback of a two-hour file, which is longer than the longest symphony, and definitely longer than the longest symphony movement (part).
If you need low latency (why?), you can use sliding filter interpolation, which is a FIR filter with coefficients calculated on the fly.
You can use a VCXO to clock your playback device, or adjust the PLL dividers in the playback device as necessary.
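The sliding filter interpolation mentioned above can be sketched as a windowed-sinc interpolator: each output sample lands at a fractional position in the input, and the FIR coefficients are the sinc function evaluated at that fractional offset, tapered by a window. This is a plain, unoptimized illustration on a mono list of floats, assuming a Hann window and 16 taps on each side (both arbitrary choices). Production code would precompute a polyphase coefficient table, carry the fractional position across buffers, and, for ratios well below 1, lower the cutoff to avoid aliasing; that last point is negligible for clock-drift ratios close to 1.

```python
import math


def windowed_sinc_resample(x, ratio, half_taps=16):
    """Resample the mono signal x by `ratio` (output rate / input rate).

    Output sample n sits at the fractional input position t = n / ratio.
    Its value is a weighted sum of the input samples around t, with weights
    sinc(t - k) * hann(t - k): FIR coefficients computed on the fly for
    each fractional offset. Input samples outside x count as zero.
    """
    out = []
    for n in range(int(len(x) * ratio)):
        t = n / ratio
        centre = math.floor(t)
        acc = 0.0
        for k in range(centre - half_taps + 1, centre + half_taps + 1):
            if 0 <= k < len(x):
                d = t - k
                sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_taps))
                acc += x[k] * sinc * window
        out.append(acc)
    return out
```

For drift correction the ratio would be something like 1.0001, taken from a long-term rate estimate and updated slowly.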

To figure out how to filter the playback rate delta, you need to be an EE.

OSR_Community_User · April 26, 2016, 12:30pm

On 2016-04-26 15:51:32 +0000, xxxxx@broadcom.com said:

> This whole scheme is quite ill-conceived.

You are right, things could have been done differently. I actually have
no choice but to use this scheme.

> Typical cheap crystal oscillators have precision about 0.001% (think 1
> second per day); very cheap ones can be 0.01% (10 seconds per day).
> Suppose your source is clocked by a Very Cheap Oscillator and will give
> you 1 second drift every two hours.

I didn’t know those figures.
Actually, 10 to 20 minutes are enough to cause a buffer underrun,
which seems way too fast, even in the worst case you are referring to.
It seems I have a performance issue more than a clock issue, because my
driver does not provide audio data fast enough.
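A quick way to test that reasoning: if a buffer holding B seconds of audio drains completely in T seconds, the average mismatch between production and consumption is B/T. The buffer sizes below are hypothetical, since the thread does not say how much audio is buffered.

```python
def implied_mismatch_percent(buffer_s, time_to_underrun_s):
    """Average rate mismatch needed to drain buffer_s of audio in the given time."""
    return 100.0 * buffer_s / time_to_underrun_s


for buffer_s in (0.5, 1.0):
    for minutes in (10, 20):
        pct = implied_mismatch_percent(buffer_s, minutes * 60)
        print(f"{buffer_s} s buffer, underrun after {minutes} min: {pct:.3f}%")
```

For these assumed buffer sizes, even the smallest result (about 0.042%) is roughly twice the worst case of two 0.01% oscillators drifting apart (0.02%), which is consistent with suspecting a throughput problem rather than clock drift.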

> You have a few different approaches to this:
>
> If you don’t care about latency, you can buffer 1 second of data,
> and have trouble-free playback of a two hour file, which is longer than
> a longest symphony, and definitely longer than a longest symphony
> movement (part).

This solution does not seem too complicated to implement; I’ll try this one first.
Speaking of latency, what about network latency? Could it be an
issue? The computer and the remote playback device can’t be connected
together without going through a router.
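A minimal sketch of the 1 second buffering idea: a FIFO that outputs nothing until it has accumulated the prebuffer, and goes back to prebuffering after an underrun instead of limping along on a nearly empty queue. The class and helper names are hypothetical, and the byte count assumes packed PCM (for example 2 channels of 24-bit samples).

```python
from collections import deque


def prebuffer_bytes(sample_rate_hz, channels, bytes_per_sample, seconds):
    """Memory needed to hold `seconds` of packed PCM."""
    return int(round(sample_rate_hz * channels * bytes_per_sample * seconds))


class PrebufferFifo:
    """FIFO that withholds output until prebuffer_frames frames have arrived."""

    def __init__(self, prebuffer_frames):
        self.prebuffer_frames = prebuffer_frames
        self.queue = deque()
        self.started = False

    def push(self, frames):
        self.queue.extend(frames)
        if len(self.queue) >= self.prebuffer_frames:
            self.started = True

    def pop(self, n):
        """Return up to n frames; [] while (re)filling the prebuffer."""
        if not self.started:
            return []
        out = [self.queue.popleft() for _ in range(min(n, len(self.queue)))]
        if len(out) < n:
            self.started = False  # underrun: refill before resuming playback
        return out
```

At 192 kHz, 2 channels and 24 bits, one second is 1,152,000 bytes. Short network delays are absorbed as long as they stay below the prebuffered amount; the cost is a start-up delay equal to the prebuffer.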

> If you need small latency (why?), you can use a sliding filter
> interpolation, which is a FIR filter with coefficients calculated on
> the fly.

Low latency is not a requirement, but reliability is.

> You can use VCXO to clock your playback device, or adjust the PLL
> dividers in the playback device as necessary.
> To figure out how to filter the playback rate delta, you need to be an EE.

The playback device has its own clock, so why would a VCXO be a better
solution? In any case, I can’t modify the hardware.

Thanks a lot for your help.

Tim_Roberts · April 26, 2016, 12:47pm

Matthieu Collette wrote:

> Actually, 10 to 20 minutes are enough to lead to a buffer under run
> which seems way to fast, even in the worse case you are refering to.
> It seems I have a performance issue more than a clock issue because my
> driver does not provide audio data fast enough.

The transport medium should be irrelevant. It will affect the latency,
and it might cause short-term spikes that you really can’t hide, but
over the long term at a given latency, you should only be experiencing
the difference between the production rate and the consumption rate.
That’s computable, although not necessarily easy.
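One way to do that computation, assuming you can log (time, cumulative frame count) pairs for both what you send and what the sink reports as played, on a common local time base: fit a straight line by least squares and take its slope as the long-term rate, which averages out short-term network jitter. The function names are hypothetical.

```python
def estimate_rate(times_s, frame_counts):
    """Least-squares slope of cumulative frames versus time, in frames per second."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_f = sum(frame_counts) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, frame_counts))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


def mismatch_ppm(production_rate, consumption_rate):
    """Positive when the sink consumes faster than the source produces."""
    return (consumption_rate / production_rate - 1.0) * 1e6
```

The longer the observation window, the smaller the error in the slope, which is why this only gives a useful answer after minutes of data rather than seconds.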

Now, I suppose it’s not entirely impossible that a sucky internet
connection might not be able to keep up with a high-end audio stream.
That’s the kind of problem that the early video conference systems had
to handle, usually by renegotiating a lower quality compression rate.


Maxim_S_Shatskih · April 26, 2016, 5:48pm

> Ok thanks, I’ll try to find this document.

What I remember from it:

The device can be:
a) adaptive:
it provides USB commands to tweak its internal clock so that it matches the sampling rate exactly (avoiding skew).
b) synchronous:
it ties its sampling clock to the USB bus clock. So, if you tweak the USB bus clock (yes, for the whole bus), this device behaves like an adaptive one.
c) asynchronous:
no clock tweaking at all; jitter or software resampling is a must.
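For the asynchronous case, the USB specification has the sink report its actual rate explicitly: a feedback endpoint returns Ff, the number of samples per frame. At full speed (1 ms frames) the value is a 10.10 fixed-point number left-justified in three bytes, giving a 10.14 format in which the four extra low bits may optionally carry more precision. A sketch of that encoding using all 14 fractional bits; the function names are illustrative and not taken from any USB stack.

```python
def encode_fs_feedback(sample_rate_hz):
    """Ff = samples per 1 ms full-speed frame, as unsigned 10.14 in 3 bytes (LSB first)."""
    ff = round(sample_rate_hz / 1000.0 * (1 << 14))
    return ff.to_bytes(3, "little")


def decode_fs_feedback(raw):
    """Inverse of encode_fs_feedback: samples per frame as a float."""
    return int.from_bytes(raw, "little") / (1 << 14)
```

The host then sends Ff samples per frame on average, so a software source follows the sink's clock without any resampling.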


Maxim_S_Shatskih · April 26, 2016, 6:23pm

> The sample rate of the hardware source is not fixed, it depends on the
> current file being played.
> Allowed value could be one of 44100, 48000, 88200, 96000, 176400 or 192000 Hz.

File being played? Then it is a software source, not a hardware one. It has no clock of its own, and trivially adapts to the sink clock.
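If the source is software, the sink's own consumption can set the pace. A minimal sketch, assuming the streaming application's feedback can be turned into a running total of frames played (the thread only says it reports too much or too little data, so this is hypothetical): keep a target amount queued downstream and send only what tops it up. Nothing is resampled; the data rate simply follows the sink's clock.

```python
class PullPacer:
    """Top up the downstream queue to target_queue_frames using playback feedback."""

    def __init__(self, target_queue_frames):
        self.target = target_queue_frames
        self.sent = 0  # total frames handed to the streaming application so far

    def frames_to_send(self, frames_played):
        queued = self.sent - frames_played   # frames still waiting downstream
        n = max(0, self.target - queued)
        self.sent += n                       # assumes the caller really sends n frames
        return n
```

Because the amount sent is derived from what was actually played, any clock difference shows up as slightly more or fewer frames per feedback period instead of accumulating into an underrun.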

Usually RTP, which is based on UDP, is used for audio over a network.
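For reference, the fixed RTP header defined in RFC 3550 is 12 bytes. For audio its timestamp normally advances at the sampling clock (by the number of samples in each packet), which is what lets a receiver compare the sender's clock with its own. A sketch with Python's `struct` module; payload type 96 and the other field values are arbitrary examples.

```python
import struct


def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Pack a 12-byte RTP fixed header (version 2, no padding, extension or CSRCs)."""
    byte0 = 2 << 6                                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


# Example: second packet of a stream carrying 480 samples (10 ms at 48 kHz) per packet.
hdr = rtp_header(seq=1, timestamp=480, ssrc=0x12345678)
```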

You can do deep buffering like DirectShow does. This will ensure Hi-Fi quality, but will cause a time lag from start to the first audible sound, which is unacceptable for communication apps (as DirectShow was, at least 10 years ago).


MBond · April 26, 2016, 7:19pm

Early? I thought this is exactly what Skype and WebEx do now.


—
NTDEV is sponsored by OSR


Tim_Roberts · April 26, 2016, 7:47pm

Marion Bond wrote:

> Early? I thought this is exactly what skype and webex do now

Yes, with video. Twenty years ago, in the ISDN days, you had to do it
with audio, too. I doubt they do that much anymore. Voice-quality
data rates aren’t much of a bandwidth burden these days.
