How to perform adaptive resampling

Hi,

I am developing an audio driver which sends audio data over a socket
to a streaming application.

The driver and the streaming application run on two different
computers, each with its own clock, and I have no way to
synchronize them.

I can’t modify the streaming application code, but it sends me useful
playback feedback telling me whether I am sending too much or not enough
audio data.

With two different clocks and no way to synchronize them, I am having
what seems to be a classic adaptive resampling problem (the article I
am referring to, which explains the problem, is
http://kokkinizita.linuxaudio.org/papers/adapt-resamp.pdf).
I can see that my driver sends data too slowly, which leads to a
buffer underrun after a few minutes of continuous playback.

Since I can’t modify the streaming application source code, I have
to find a way to implement some kind of adaptive resampling on the
driver side.

I am not sure how I should implement such a mechanism.

Depending on the audio feedback sent by the streaming application, I
compute a clock adjustment and set a timer to fire every second plus or
minus this adjustment. Each time the timer fires, I call the
IPortWaveCyclic::Notify method. I also implemented
IMiniportWaveCyclicStream::SetNotificationFreq, but it does not work.

Thanks in advance for any tips :wink:

Cheers

It depends on the audio quality you need and the magnitude of the discrepancy you have to compensate for.

If it’s not hi-fi music data, you could simply insert an extra sample or remove a sample every so often.
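
To get a feel for what "every so often" means, here is a minimal sketch of the arithmetic. It assumes the clock mismatch is known and expressed in parts per million (ppm); the function names and the 100 ppm figure are illustrative, not something taken from the driver.

```python
def samples_between_corrections(mismatch_ppm: float) -> float:
    """Samples that pass before one sample has to be inserted or dropped."""
    if mismatch_ppm == 0:
        raise ValueError("no correction is needed when the clocks match")
    return 1_000_000 / abs(mismatch_ppm)


def corrections_per_second(sample_rate_hz: float, mismatch_ppm: float) -> float:
    """How many samples per second have to be inserted or dropped."""
    return sample_rate_hz * abs(mismatch_ppm) / 1_000_000


# Hypothetical example: a 100 ppm (0.01%) mismatch at 48 kHz means
# one correction every 10000 samples, i.e. roughly 4.8 per second.
print(samples_between_corrections(100))
print(corrections_per_second(48_000, 100))
```

Whether a handful of inserted or dropped samples per second is audible depends on the material, which is why the quality requirement matters.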

On 2016-04-25 15:24:57 +0000, xxxxx@broadcom.com said:

> It depends on the audio quality you need and the magnitude of the
> discrepancy you have to compensate for.
>
> If it’s not hi-fi music data, you could simply insert an extra sample or
> remove a sample every so often.

Smart idea, but I would prefer not to drop or insert any samples.
This driver should provide an audio data path that is as bit-perfect as possible.

>This driver should provide an audio data path as bit perfect as possible.

This doesn’t make sense. You need to come up with numbers and the result you want to have, and then choose the solution that satisfies these numbers. It depends on whether the playback rate discrepancy is 0.1% or 1% or 10% or 0.01%, whether the audio is speech or music, and so on.

Matthieu Collette wrote:

> Smart idea but I would prefer not to drop nor insert any samples.
> This driver should provide an audio data path as bit perfect as possible.

That’s complicated. That means doing your own metrics to come up with
the long-term average data rate on each end, and adjusting your resample
algorithm accordingly. Which, I guess, is exactly what you said in your
subject line.

You might ask your question on the [wdmaudiodev] mailing lists. Most of
the cool audio guys hang out there, including several members of the
Microsoft audio team, who do participate in discussions.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Are both source and playback hardware devices?

On 2016-04-25 17:58:48 +0000, Tim Roberts said:

> Matthieu Collette wrote:
> > Smart idea but I would prefer not to drop nor insert any samples.
> > This driver should provide an audio data path as bit perfect as possible.
>
> That’s complicated. That means doing your own metrics to come up with
> the long-term average data rate on each end, and adjusting your resample
> algorithm accordingly. Which, I guess, is exactly what you said in your
> subject line.
>
> You might ask your question on the [wdmaudiodev] mailing lists. Most of
> the cool audio guys hang out there, including several members of the
> Microsoft audio team, who do participate in discussions.

Hi Tim,

Yes, I definitely need an adaptive solution because the bandwidth on
the network link between the computer and the end device may vary
depending on the type of connection (Ethernet, Wi-Fi).
Based on the feedback data sent by the end device, I know whether
I should resend some packets, speed up or slow down the pace, but I
am still struggling with the driver part.

I’ll take a look at the [wdmaudiodev] list and post a few questions to
those cool guys you are talking about.

I’ll keep you updated.

Thanks

On 2016-04-25 17:24:00 +0000, xxxxx@broadcom.com said:

> > This driver should provide an audio data path as bit perfect as possible.
>
> This doesn’t make sense. You need to come with numbers and what result
> you want to have, and then choose the solution that satisfies these
> numbers. It depends on whether the playback rate discrepancy is 0.1% or
> 1% or 10% or 0.01%. Whether the audio is speech or music. And so on.

By bit-perfect I mean no data loss and no data compression, that is,
being able to send the audio data as is.

I am not sure I understand what you mean by rate discrepancy. Do you
mean the rate delta or the rate ratio between the two devices?

On 2016-04-25 19:46:49 +0000, xxxxx@broadcom.com said:

> Are both source and playback hardware devices?

Yes, both are hardware devices.

> I am not sure to understand what you mean by rate discrepancy, you mean
> rate delta or rate ratio between two devices ?

I suggest you find the old USB 1.x spec and read the chapter “Issues with Isochronous Devices” in it.

The chapter is probably also there in the modern USB specs.

It has almost nothing to do with USB details; it is about the general principles of resampling, clock mastering, adaptive rates, etc. It is useful not only for USB, but also for generic DirectShow with its concept of a “master clock”.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

On 2016-04-26 08:20:02 +0000, Maxim S. Shatskih said:

> > I am not sure to understand what you mean by rate discrepancy, you
> > mean rate delta or rate ratio between two devices ?
>
> I would suggest you to find the old USB 1.x spec, and read the chapter
> “Issues with Isochronous Devices” in it.
>
> Probably the chapter is also there in modern USB specs.
>
> The chapter is more like nothing about USB details, it is about general
> principles of resampling, clock mastering, adaptive rates etc. It is
> not only useful for USB, but also for generic DirectShow with its
> concept of “master clock”.

Ok thanks, I’ll try to find this document.

OK, so far we have:

  1. Hardware source with some fixed sample rate.
  2. Remote hardware sink with some fixed sample rate that differs by unknown maximum delta %.
  3. Network link, unknown whether lossy (UDP) or lossless (TCP/IP).
  4. Unknown type of audio; cannot decide which loss/distortion is objectionable or not.

Do you know these variables yourself, or are you just wandering in the dark, like the rest of this forum?

On 2016-04-26 14:05:44 +0000, xxxxx@broadcom.com said:

Hi !

First of all, thanks for your time and help.

> OK, so far we have:
>
> 1. Hardware source with some fixed sample rate.

The sample rate of the hardware source is not fixed; it depends on the
file currently being played.
Allowed values are 44100, 48000, 88200, 96000, 176400 or 192000 Hz.

> 2. Remote hardware sink with some fixed sample rate that differs by
>    unknown maximum delta %.

The remote hardware supports the same set of sample rates as the source
hardware.
Its actual rate may differ by an unknown maximum delta %.

> 3. Network link, unknown whether lossy (UDP) or lossless (TCP/IP).

Two network links have to be considered:
a TCP connection between the driver and the streaming application, and
a UDP connection between the streaming application and the remote
hardware, over either Ethernet or Wi-Fi.

> 4. Unknown type of audio; cannot decide which loss/distortion is
>    objectionable or not.

I’m dealing with hi-fi audio data and do not want to alter it if
possible.
The preferred use case is to play the audio file as is, without any
modification, or not to play it at all.

> Do you know these variables yourself, or are you just wandering in the dark,
> like the rest of this forum?

Sorry, I realize that I could have given a more detailed description
of my problem.
Here is what I know for sure.

The basic setup is as follows:

  • a driver running on a computer, sending audio data through a TCP
    socket to a streaming application which also runs on the same computer
  • a remote hardware device receiving audio data from the streaming
    application through a UDP socket

I don’t have access to the remote hardware source code and I can’t modify it.

I don’t have access to the streaming application source code either; I
just know it tells me if I send too much or not enough data.
I also know it packetizes the audio data to be sent through the UDP
socket and is in charge of retransmitting audio packets that are
lost.

This whole scheme is quite ill-conceived.

Typical cheap crystal oscillators have precision about 0.001% (think 1 second per day); very cheap ones can be 0.01% (10 seconds per day). Suppose your source is clocked by a Very Cheap Oscillator and will give you 1 second drift every two hours.
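
As a rough check of those round numbers, the drift is simply the relative error multiplied by the elapsed time. The sketch below is only that arithmetic; the helper names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TWO_HOURS = 2 * 60 * 60


def drift_seconds(error_percent: float, elapsed_seconds: float) -> float:
    """Accumulated clock drift for a relative frequency error given in percent."""
    return elapsed_seconds * error_percent / 100.0


def error_percent_for(drift_s: float, elapsed_seconds: float) -> float:
    """Relative frequency error (in percent) that produces the given drift."""
    return 100.0 * drift_s / elapsed_seconds


print(round(drift_seconds(0.001, SECONDS_PER_DAY), 3))  # about 0.864 s per day
print(round(drift_seconds(0.01, SECONDS_PER_DAY), 3))   # about 8.64 s per day
print(round(drift_seconds(0.01, TWO_HOURS), 3))         # about 0.72 s in two hours
print(round(error_percent_for(1.0, TWO_HOURS), 4))      # about 0.0139 %
```

So "1 second every two hours" corresponds to an error of roughly 0.014%, slightly worse than the 0.01% case, which makes it a reasonable pessimistic assumption.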

You have a few different approaches to this:

  1. If you don’t care about latency, you can buffer 1 second of data and have trouble-free playback of a two-hour file, which is longer than the longest symphony, and definitely longer than the longest symphony movement (part).

  2. If you need small latency (why?), you can use a sliding filter interpolation, which is a FIR filter with coefficients calculated on the fly.

  3. You can use VCXO to clock your playback device, or adjust the PLL dividers in the playback device as necessary.

To figure out how to filter the playback rate delta, you need to be an EE.
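
For readers wondering what approach 2 looks like in practice, here is a minimal sketch of a windowed-sinc interpolator whose FIR coefficients are recomputed for every output sample. It is written in Python for readability, not as driver code. The Hann window, the tap count (`half_width`) and the function names are assumptions chosen for illustration; this plain sinc is only appropriate for ratios very close to 1, as in clock-drift compensation, and a production resampler would use longer, better-designed filters.

```python
import math


def windowed_sinc_taps(frac: float, half_width: int = 8) -> list[float]:
    """FIR coefficients for interpolating `frac` (0 <= frac < 1) samples
    past the current input sample, using a Hann-windowed sinc."""
    taps = []
    for k in range(-half_width + 1, half_width + 1):
        x = k - frac  # signed distance from tap k to the interpolation point
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
        taps.append(sinc * window)
    return taps


def resample(samples: list[float], ratio: float, half_width: int = 8) -> list[float]:
    """Resample `samples` by `ratio` (output rate / input rate).
    Samples outside the input are treated as zero."""
    out = []
    n = len(samples)
    step = 1.0 / ratio  # input samples consumed per output sample
    pos = 0.0           # fractional read position in the input
    while pos < n - 1:
        base = int(pos)
        taps = windowed_sinc_taps(pos - base, half_width)
        acc = 0.0
        for i, k in enumerate(range(-half_width + 1, half_width + 1)):
            idx = base + k
            if 0 <= idx < n:
                acc += taps[i] * samples[idx]
        out.append(acc)
        pos += step
    return out


# Hypothetical usage: stretch a low-frequency test tone by 0.1 %.
tone = [math.sin(2 * math.pi * 0.02 * i) for i in range(400)]
stretched = resample(tone, 1.001)
print(len(tone), len(stretched))
```

In an adaptive setup, `ratio` would not be a constant but would be updated slowly from the measured rate mismatch.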

On 2016-04-26 15:51:32 +0000, xxxxx@broadcom.com said:

> This whole scheme is quite ill-conceived.

You are right, things could have been done differently, but I have no
other choice than to use this scheme.

> Typical cheap crystal oscillators have precision about 0.001% (think 1
> second per day); very cheap ones can be 0.01% (10 seconds per day).
> Suppose your source is clocked by a Very Cheap Oscillator and will give
> you 1 second drift every two hours.

I didn’t know those figures.
Actually, 10 to 20 minutes are enough to lead to a buffer underrun,
which seems way too fast, even in the worst case you are referring to.
It seems I have a performance issue more than a clock issue, because my
driver does not provide audio data fast enough.
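
The suspicion can be checked with a little arithmetic: a buffer holding B seconds of audio drains in B divided by the relative rate mismatch. The sketch below assumes a constant mismatch; the buffer sizes are hypothetical, since the actual buffer depth is not stated in this thread.

```python
def seconds_until_underrun(buffered_seconds: float, mismatch_ppm: float) -> float:
    """Time for a buffer to drain when the consumer runs `mismatch_ppm` faster
    than the producer (constant mismatch assumed)."""
    if mismatch_ppm <= 0:
        raise ValueError("the buffer only drains for a positive mismatch")
    return buffered_seconds / (mismatch_ppm * 1e-6)


def implied_mismatch_ppm(buffered_seconds: float, seconds_to_underrun: float) -> float:
    """Rate mismatch (ppm) implied by a buffer draining in the given time."""
    return buffered_seconds / seconds_to_underrun * 1e6


# A 1 s buffer against a 100 ppm (0.01%) mismatch lasts about 10000 s (~2.8 h).
print(round(seconds_until_underrun(1.0, 100)))
# Hypothetical buffer depths draining in 15 minutes (900 s):
print(round(implied_mismatch_ppm(1.0, 900)))  # about 1111 ppm for a 1 s buffer
print(round(implied_mismatch_ppm(0.1, 900)))  # about 111 ppm for a 100 ms buffer
```

With a deep buffer, an underrun after 15 minutes implies a mismatch far larger than crystal tolerance, which would point to the driver's pacing rather than the clocks. With only about 100 ms buffered, ordinary oscillator drift could explain it.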

> You have a few different approaches to this:
>
> 1. If you don’t care about latency, you can buffer 1 second of data,
>    and have trouble-free playback of a two hour file, which is longer than
>    a longest symphony, and definitely longer than a longest symphony
>    movement (part).

This solution does not seem too complicated to implement; I’ll try this one first.
Speaking of latency, what about network latency: could it be an
issue? The computer and the remote playback device can’t be connected
without going through a router.

> 2. If you need small latency (why?), you can use a sliding filter
>    interpolation, which is a FIR filter with coefficients calculated on
>    the fly.

Small latency is not a requirement, but reliability is.

> 3. You can use VCXO to clock your playback device, or adjust the PLL
>    dividers in the playback device as necessary.
>
> To figure out how to filter the playback rate delta, you need to be an EE.

The playback device has its own clock; why would a VCXO be a better
solution? In any case, I can’t modify the hardware.

Thanks a lot for your help.

Matthieu Collette wrote:

> Actually, 10 to 20 minutes are enough to lead to a buffer underrun,
> which seems way too fast, even in the worst case you are referring to.
> It seems I have a performance issue more than a clock issue, because my
> driver does not provide audio data fast enough.

The transport medium should be irrelevant. It will affect the latency,
and it might cause short-term spikes that you really can’t hide, but
over the long term at a given latency, you should only be experiencing
the difference between the production rate and the consumption rate.
That’s computable, although not necessarily easy.
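
One way to compute that long-term difference is to compare cumulative frame counters from both ends and smooth the result. The sketch below assumes the driver can obtain, or derive from the feedback, a running count of frames consumed by the sink; the thread does not say the feedback has that form. The class name and the smoothing constant are illustrative.

```python
class RateRatioEstimator:
    """Tracks the smoothed ratio of consumed frames to produced frames."""

    def __init__(self, smoothing: float = 0.05) -> None:
        self.smoothing = smoothing  # weight of each new measurement
        self.ratio = 1.0            # start by assuming matched clocks
        self._last = None           # previous (produced, consumed) counters

    def update(self, produced_frames: int, consumed_frames: int) -> float:
        """Feed cumulative counters sampled at the same instant."""
        if self._last is not None:
            delta_produced = produced_frames - self._last[0]
            delta_consumed = consumed_frames - self._last[1]
            if delta_produced > 0 and delta_consumed > 0:
                instant = delta_consumed / delta_produced
                # Exponential smoothing hides short-term network jitter.
                self.ratio += self.smoothing * (instant - self.ratio)
        self._last = (produced_frames, consumed_frames)
        return self.ratio


# Hypothetical usage: the sink consumes 0.1% faster than the source produces.
estimator = RateRatioEstimator()
produced = consumed = 0
for _ in range(500):
    produced += 48_000
    consumed += 48_048
    estimator.update(produced, consumed)
print(round(estimator.ratio, 6))
```

The smoothed ratio is the value a resampler, or a pacing timer, would then follow; a smaller `smoothing` value reacts more slowly but is less disturbed by bursts on the transport.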

Now, I suppose it’s not entirely impossible that a sucky internet
connection might not be able to keep up with a high-end audio stream.
That’s the kind of problem that the early video conference systems had
to handle, usually by renegotiating a lower-quality compression rate.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> Ok thanks, I’ll try to find this document.

What I remember from it:

  • the device can be:
    a) adaptive:
    it provides USB commands to tweak its internal clock so that it matches the sampling rate exactly (avoiding skew).
    b) synchronous:
    it ties its sampling clock to the USB bus clock. So, if you tweak the USB bus clock (yes, for the whole bus), this device behaves like an adaptive one.
    c) asynchronous:
    no clock tweaking at all; jitter or software resampling is a must.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> The sample rate of the hardware source is not fixed, it depends on the
> current file being played.
> Allowed value could be one of 44100, 48000, 88200, 96000, 176400 or 192000 Hz.

A file being played? Then it is a software source, not a hardware one. It has no clock of its own, and trivially adapts to the sink clock.

Usually, RTP is used for audio over a network; it is based on UDP.

You can do deep buffering like DirectShow does. This will ensure hi-fi quality, but will cause a time lag between start and the first audible sound, which is unacceptable for communication apps (as DirectShow is, or at least was 10 years ago).


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Early? I thought this is exactly what Skype and WebEx do now.

Marion Bond wrote:

> Early? I thought this is exactly what skype and webex do now

Yes, with video. Twenty years ago, in the ISDN days, you had to do it
with audio, too. I doubt they do that much any more. Voice-quality
data rates aren’t much of a bandwidth burden these days.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.