Basic AVStream data flow questions

I’m about to start work on my first AVStream minidriver (PCI-based video capture). Unfortunately, I’m still a bit unclear on the overall data flow for such drivers, and have a few questions in that regard:

1.) First and foremost, is the data pulled up from higher layers or is it pushed up from lower layers? I have been assuming that, like most drivers, data is pulled up via IRP requests from higher layers. However, this seems somewhat counterintuitive, given that the video capture hardware will be isochronously spewing data up to the driver.

2.) If, indeed, data is being pulled up from higher layers (assume, for example, that some Media Foundation application is the ultimate client), who is responsible for issuing the IRPs, and how do they get issued at the appropriate rate to consume the data at the same rate that it is being produced?

3.) Terminology. The documentation states that “Each pin has an independent queue of data buffers. When a data packet arrives at the pin (either a read or write request), AVStream adds the packet to the queue and might call the pin’s process dispatch.” Are these “data buffers” synonymous with “frames”? Are these “data packets” synonymous with “IRPs”? Is there a one-to-one relationship between frames and IRPs?

4.) Assume that there is some drift or other mismatch in production/consumption rates. What happens if the previous frame has not been completed by the time a new request is received? Conversely, what happens if the hardware sends data and there isn’t yet a frame buffer to put it in?

I apologize that these are probably basic questions, but the answers were not immediately apparent to me while reading through the docs.

xxxxx@cspeed.com wrote:

I’m about to start work on my first AVStream minidriver (PCI-based video capture). Unfortunately, I’m still a bit unclear on the overall data flow for such drivers, and have a few questions in that regard:

1.) First and foremost, is the data pulled up from higher layers or is it pushed up from lower layers? I have been assuming that, like most drivers, data is pulled up via IRP requests from higher layers. However, this seems somewhat counterintuitive, given that the video capture hardware will be isochronously spewing data up to the driver.

Well, the answer really is “both”. Empty frames are created by the
graph and handed to you from above. Your “Process” callback will be
called when empty frames are available. It’s up to you to fill those
frames and send them back. If you check the “avshws” sample in the WDK,
you’ll see that they map the empty frames as soon as they arrive, so
they can have the fake hardware DMA directly into the user buffer.
Thus, they might have many empty frames being held in the driver.

That means your driver must decide how it will handle starvation, if the
buffers get behind. In some cases, I have found it easier to handle
this by having my own circular buffer scheme, and then copying data into
the user buffers.
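
For what it’s worth, that copy approach boils down to a Process callback shaped roughly like the untested sketch below. MY_PIN_CONTEXT and DequeueCapturedFrame are placeholders for whatever ring-buffer bookkeeping your driver keeps; they are not from any sample.

#include <ntddk.h>
#include <ks.h>

// Hypothetical per-pin state and ring-buffer helper (not a real KS API).
typedef struct _MY_PIN_CONTEXT MY_PIN_CONTEXT;
BOOLEAN DequeueCapturedFrame(MY_PIN_CONTEXT *Ctx, PUCHAR Dest, ULONG DestSize, ULONG *BytesCopied);

NTSTATUS
MyPinProcess(
    IN PKSPIN Pin
    )
{
    // Lock whatever frame is sitting at the leading edge of the pin's queue.
    PKSSTREAM_POINTER Leading =
        KsPinGetLeadingEdgeStreamPointer(Pin, KSSTREAM_POINTER_STATE_LOCKED);

    while (Leading != NULL) {

        MY_PIN_CONTEXT *Ctx = (MY_PIN_CONTEXT *)Pin->Context;
        ULONG BytesCopied = 0;

        // Copy one captured frame out of the driver's own ring buffer into
        // the mapped view of the caller's buffer.
        if (!DequeueCapturedFrame(Ctx,
                                  Leading->OffsetOut.Data,
                                  Leading->OffsetOut.Remaining,
                                  &BytesCopied)) {
            // Nothing captured yet; leave the frame in place and wait for
            // the next Process call.
            KsStreamPointerUnlock(Leading, FALSE);
            break;
        }

        // Record how much of the frame was filled, then advance the leading
        // edge past it; this is what sends the frame back up to the graph.
        Leading->StreamHeader->DataUsed = BytesCopied;
        KsStreamPointerAdvanceOffsetsAndUnlock(Leading, 0, BytesCopied, TRUE);

        // Pick up the next waiting frame, if any.
        Leading = KsPinGetLeadingEdgeStreamPointer(Pin, KSSTREAM_POINTER_STATE_LOCKED);
    }

    return STATUS_SUCCESS;
}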

2.) If, indeed, data is being pulled up from higher layers (assume, for example, that some Media Foundation application is the ultimate client), who is responsible for issuing the IRPs, and how do they get issued at the appropriate rate to consume the data at the same rate that it is being produced?

When the DirectShow graph starts, DirectShow queries the graph to find
out who is going to be responsible for allocating frames. Sometimes,
it’s the source filter, sometimes no filter volunteers, in which case the graph
allocates the memory. The filters all agree on buffer allocation
requirements (and your AVStream filter gets to participate in that),
which identifies how large the buffers need to be, and how many each
filter suggests. The graph allocates those frames and starts them
circulating.
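
Your side of that negotiation is just a framing declaration attached to the pin descriptor. The WDK samples use the DECLARE_SIMPLE_FRAMING_EX macro for it; the numbers here (4 frames of a 640x480 16bpp image) are purely illustrative:

#include <ntddk.h>
#include <ks.h>

// Suggest 4 system-memory frames, each big enough for one 640x480 16bpp
// image. These are preferences, not guarantees (see below).
DECLARE_SIMPLE_FRAMING_EX(
    CapturePinAllocatorFraming,
    STATICGUIDOF(KSMEMORY_TYPE_KERNEL_NONPAGED),
    KSALLOCATOR_REQUIREMENTF_SYSTEM_MEMORY |
        KSALLOCATOR_REQUIREMENTF_PREFERENCES_ONLY,
    4,                      // suggested number of frames in circulation
    0,                      // no particular alignment requirement
    640 * 480 * 2,          // minimum frame size (illustrative)
    640 * 480 * 2           // maximum frame size (illustrative)
    );

The pin descriptor’s AllocatorFraming field then points at this structure.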

After you fill a buffer and send it back (by advancing the “leading edge
pointer”), the buffer moves along to the next filter in the graph. When
it gets to a filter that consumes the buffer (often the renderer), the
buffer is returned to the free list and will be handed back to you.

So, no one really cares how fast you produce data. They will hand
buffers back to you as soon as they become empty, and you fill them. For a
real-time preview, frames are displayed immediately, so modulo graph
overhead, buffers are always available.

3.) Terminology. The documentation states that “Each pin has an independent queue of data buffers. When a data packet arrives at the pin (either a read or write request), AVStream adds the packet to the queue and might call the pin’s process dispatch.” Are these “data buffers” synonymous with “frames”? Are these “data packets” synonymous with “IRPs”? Is there a one-to-one relationship between frames and IRPs?

For frame-based video, one buffer equals one frame. For audio, one
buffer equals some quantum of time.

In AVStream, you don’t usually think about IRPs. Empty buffers are sent
from user mode (by ksproxy) in a DeviceIoControl request
(IOCTL_KS_READ_STREAM). Like all ioctls, that results in an IRP, but
the IRP is intercepted by ks.sys, which is linked with your driver.
Ks.sys then calls your Process callback. Your Process callback then
uses the KS APIs to fetch the leading edge of the stream. You don’t
directly work with the IRP.
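
Put another way, all you register is the pin’s dispatch table; ks.sys owns the IRP and calls you back. A bare skeleton, with placeholder names:

#include <ntddk.h>
#include <ks.h>

// Hypothetical pin callbacks -- the point is that none of them ever sees
// the IOCTL_KS_READ_STREAM IRP itself.
NTSTATUS MyPinCreate(PKSPIN Pin, PIRP Irp);
NTSTATUS MyPinProcess(PKSPIN Pin);
NTSTATUS MyPinSetDeviceState(PKSPIN Pin, KSSTATE ToState, KSSTATE FromState);

const KSPIN_DISPATCH CapturePinDispatch = {
    MyPinCreate,            // Create
    NULL,                   // Close
    MyPinProcess,           // Process -- ks.sys calls this when frames are queued
    NULL,                   // Reset
    NULL,                   // SetDataFormat
    MyPinSetDeviceState,    // SetDeviceState
    NULL,                   // Connect
    NULL,                   // Disconnect
    NULL,                   // Clock
    NULL                    // Allocator
};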

4.) Assume that there is some drift or other mismatch in production/consumption rates. What happens if the previous frame has not been completed by the time a new request is received? Conversely, what happens if the hardware sends data and there isn’t yet a frame buffer to put it in?

Your driver has to decide what to do if you get data and there is no
empty buffer at the leading edge. The avshws sample handles it this
way: as each empty buffer arrives, they lock it in memory and create a
DMA request. The DMA completion interrupt then advances the leading
edge, which sends the filled buffer back up. In this case, if buffers
get behind, the driver will simply not have a DMA request pending, and
it will be up to the hardware to decide what to do with the data.

In my AVStream drivers, I generally decide to throw away my oldest frame
when this happens.
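
As a sketch of that completion path: if your Process routine clones the leading-edge stream pointer (KsStreamPointerClone) for each frame it hands to the hardware, the DMA-completion side can be as small as this (routine name is mine, not from the sample):

#include <ntddk.h>
#include <ks.h>

// Called from the driver's DMA-completion path with the stream-pointer clone
// that Process created for this frame when it was handed to the hardware.
VOID
OnFrameDmaComplete(
    IN PKSSTREAM_POINTER Clone,
    IN ULONG BytesCaptured
    )
{
    // Record how much data the hardware actually put in the frame.
    Clone->StreamHeader->DataUsed = BytesCaptured;

    // Deleting the clone drops our reference; once nothing else holds the
    // frame, AVStream completes the underlying request and the filled buffer
    // moves on to the next filter in the graph.
    KsStreamPointerDelete(Clone);
}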


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

Thanks for the very thorough response! It definitely helped me understand AVStream better.

I do have a couple more questions, though, based on your responses.

Is it the Frames member of the KSALLOCATOR_FRAMING structure that is responsible for determining how many total frames will be available to the minidriver? For example, if I specified 4, does this mean that the minidriver could theoretically handle a burst of 4 frames from the hardware without losing any data (assuming that none are in use prior to the burst)?

Also, it is a little difficult to tell from the “simulated” DMA in the sample driver, but does KSPIN_FLAG_GENERATE_MAPPINGS only work for host-initiated DMA transfers, or is it possible to use with device-initiated transfers? (I’m a little flaky on DMA, so it’s possible that this question doesn’t make sense…)

Whoops, almost forgot. I also had another, easy question:

Are there any AVStream/related APIs that require C++ interfacing? I noticed that both AVStream sample drivers use C++, but could C be used just as easily instead?

xxxxx@cspeed.com wrote:

Is it the Frames member of the KSALLOCATOR_FRAMING structure that is responsible for determining how many total frames will be available to the minidriver? For example, if I specified 4, does this mean that the minidriver could theoretically handle a burst of 4 frames from the hardware without losing any data (assuming that none are in use prior to the burst)?

No, the promises aren’t that concrete. They’re really just strong
suggestions. When you say “4”, the graph will allocate and circulate at
least 4 frame buffers. That does NOT mean there will always be 4 empty
buffers waiting to be filled. Three of the 4 buffers might be in
process farther downstream.

Further, in some cases, you might not even get 4. If the graph
discovers it can connect you directly to a texture surface in the
renderer, for example, it might create two texture surfaces and hand
THOSE to you to be filled in.

Also, it is a little difficult to tell from the “simulated” DMA in the sample driver, but does KSPIN_FLAG_GENERATE_MAPPINGS only work for host-initiated DMA transfers, or is it possible to use with device-initiated transfers? (I’m a little flaky on DMA, so it’s possible that this question doesn’t make sense…)

Think about the timing. Before the device can do DMA, the buffer must
be mapped and locked, and you have to tell the device those addresses.
If your device is doing DMA on its own, how could you synchronize the
device with the buffer addresses?

If your device does DMA autonomously, then you have no choice but to
allocate a common buffer in your driver, do the DMA to that, and copy
from there into the leading edge.
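
The shape of that common-buffer setup is roughly the following; error handling is stripped and the routine name is mine, not from any sample:

#include <wdm.h>

// Allocate a contiguous, device-visible buffer for an autonomously streaming
// PCI bus master. Program *LogicalAddress into the hardware; copy frames out
// of the returned virtual address into the leading edge as they arrive.
PVOID
AllocateCaptureCommonBuffer(
    IN PDEVICE_OBJECT Pdo,
    IN ULONG Length,
    OUT PPHYSICAL_ADDRESS LogicalAddress,
    OUT PDMA_ADAPTER *Adapter
    )
{
    DEVICE_DESCRIPTION Desc;
    ULONG MapRegisters = 0;

    RtlZeroMemory(&Desc, sizeof(Desc));
    Desc.Version = DEVICE_DESCRIPTION_VERSION;
    Desc.Master = TRUE;                 // bus-master PCI capture device
    Desc.InterfaceType = PCIBus;
    Desc.MaximumLength = Length;

    *Adapter = IoGetDmaAdapter(Pdo, &Desc, &MapRegisters);
    if (*Adapter == NULL) {
        return NULL;
    }

    // Returns the kernel virtual address of the common buffer; the device
    // uses *LogicalAddress to reach the same memory.
    return (*Adapter)->DmaOperations->AllocateCommonBuffer(
        *Adapter, Length, LogicalAddress, FALSE /* CacheEnabled */);
}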

Are there any AVStream/related APIs that require C++ interfacing? I noticed that both AVStream sample drivers use C++, but could C be used just as easily instead?

It can all be done in C. The advantage of C++ is that you can borrow
from the samples more directly.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

>However, this seems somewhat counterintuitive

No. Pull is easier than push. Actually, push is just a notification plus a pull.

>get issued at the appropriate rate to consume the data at the same rate that it is being produced?

The client pre-issues lots of IRPs to the driver. Yes, if queue replenishment (this pre-issuing) is done too rarely, then you will have packet drops. So, the only guarantee the client needs to provide is to replenish no less often than once per time period X * N, where X is the time covered by one packet and N is the number of pre-issued packets.

With video and its constant rate, this is trivial.
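
For example, with illustrative numbers: at 30 frames per second, X is about 33 ms, so with N = 4 pre-issued packets the client only has to replenish the queue once every X * N ≈ 133 ms to avoid drops.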

>4.) Assume that there is some drift or other mismatch in production/consumption rates.

There is no “consumption rate” here; the consumer is not real-time.

There is only the pre-issue rate, which can easily be made higher, beyond any doubt, than anything your video stream requires.

>Conversely, what happens if the hardware sends data and there isn’t yet a frame buffer to put it in?

Then you have a packet drop.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Great! Thanks again for all your help. This is all very useful information.