AVStream driver buffering

I’ve done a variety of kernel-mode drivers over the years, but this is
my first AVStream driver. I’ve created a driver based on the “avshws”
sample in the DDK. It’s for a simple video capture card which does not
support DMA.

In my Process routine, I simply advance the leading edge stream pointer.
In the DPC for my ISR (queued when data is available), I get the
trailing edge stream pointer, copy the data from the hardware into the
buffer, and advance the trailing edge pointer.

However, what I see occasionally is a NULL return from
KsPinGetTrailingEdgeStreamPointer. It appears that I’m only ever
getting one buffer to fill; sometimes Process is called too late (i.e.
after my hardware interrupt and the call to the DPC).

I’ve tried playing with the KSALLOCATOR_FRAMING_EX values, to no avail.
I guess I expected that I would get a queue of buffers (i.e. at least
2), and that I could fill them directly as data arrived from the hardware.

Can someone point me in the right direction here? I’d really like to
avoid buffering the data internally, since an extra copy would be
undesirable.

Thanks in advance,

– mkj


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________

No takers? Let me try asking a somewhat different question. If I
understand correctly, I can create my own allocator to achieve the
buffering I want. However, there is no guarantee that my allocator will
be used, is that correct?

This device will be used in an application we are creating; does that
change the equation with respect to allocators? When you are in control
of the graph, are there ways to ensure the capture device’s allocator is
the one which gets used?

If not (or if creating an allocator is not a wise path to go down), is
there any way to influence the standard allocator to put more than one
buffer into play? Or am I basically stuck? Do I really need my own
private buffer(s), and should I just live with the extra RtlCopyMemory?

TIA,

–mkj

Michael Jones wrote:

<snip>

It’s been a while since this question was asked, but… I will try it anyway.

Why are you using the trailing edge stream pointer? It’s possible that you are doing exactly what
you want to be doing, but I am guessing that you might be misunderstanding the purpose of the
leading and trailing edge stream pointers.

On one of the hardware designs I had the pleasure of working on, the FIFO was so small that it
couldn’t hold any meaningfully sized data. However, it was possible to feed the hardware a number
of buffers, and it would switch to the next one when it finished with the current one. For that
we had to use the trailing edge, so the hardware could buffer and work on more than one buffer at
a time.
Even with that design we still worked mostly with the leading edge, and released it first.

If I understand you correctly, you are releasing the trailing edge stream pointer, which is not
something that downstream clients expect you to do.
You are handing them a buffer out of order. Most of the time they will keep waiting for the
first buffer (the leading edge), unless leading edge == trailing edge, which means you have a
one-buffer queue.

Also, playing with allocator properties may not buy you anything, because in many cases those
represent only preferences and will be overridden or ignored by the downstream client.

– Max.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-262702-
xxxxx@lists.osr.com] On Behalf Of Michael Jones
Sent: Friday, September 15, 2006 10:06 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] AVStream driver buffering

<snip>


Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Thanks for the response. I was beginning to give up hope :-)

In my Process routine, I simply advance the Lead Edge pointer. When my
device interrupts, I schedule a DPC; in the DPC I get the trailing edge
pointer, copy the data from the device into the (trailing edge) buffer,
and then advance the trailing edge pointer.

What I *thought* would happen was that I would be given some number of
buffers, each of which would cause a call to my Process function (I set
KSPIN_FLAG_INITIATE_PROCESSING_ON_EVERY_ARRIVAL along with
KSPIN_FLAG_DISTINCT_TRAILING_EDGE), where I advance the leading edge.
Then, in my DPC, I would fill the oldest buffer (via the trailing edge),
and advance that.

Clearly, that is not what’s happening. Is the solution to instead clone
the leading edge (in Process), and use the clone in my DPC? I chose to
use the trailing edge pointer since it seemed a little simpler than
dealing with clones.

–mkj

Max Paklin wrote:

<snip>

Playing with allocators is something I would consider a last-resort tactic; it is definitely not
the first thing you should try. You are obviously mismanaging the AVStream queue.

Also, you can assert that your pin allocator framing is a hard requirement by removing the
“preferences only” flag, which is usually set by default. Then your allocator properties will be
honored, or the pin connection will fail.
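As a sketch (modeled on the framing in the avshws sample; the name and the 2 * PAGE_SIZE sizes
are placeholders, so substitute your pin name and your actual frame size), a hard-requirement
framing for two frames might look like this:

```c
#include <ks.h>

// Framing for the capture pin: two frames, treated as a hard requirement
// because KSALLOCATOR_REQUIREMENTF_PREFERENCES_ONLY is NOT set. The name
// and the 2 * PAGE_SIZE frame size are placeholders for illustration.
DECLARE_SIMPLE_FRAMING_EX(
    CaptureAllocatorFraming,
    STATICGUIDOF(KSMEMORY_TYPE_KERNEL_NONPAGED),
    KSALLOCATOR_REQUIREMENTF_SYSTEM_MEMORY,
    2,                  // frame count
    0,                  // alignment
    2 * PAGE_SIZE,      // minimum frame size
    2 * PAGE_SIZE       // maximum frame size
);
```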

Generally speaking, if you control the graph and the filter you are connecting to, then you can
do whatever you want. If the filter is not under your control, then chances are high that it will
use its own allocator (the standard DShow allocator) and ignore yours. The DShow base classes
library source is in the Windows SDK; see for yourself how the allocator gets selected.

– Max.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-263291-
xxxxx@lists.osr.com] On Behalf Of Michael Jones
Sent: Wednesday, September 20, 2006 1:26 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] AVStream driver buffering

<snip>



The trailing edge is NOT the head of the queue; the leading edge is. You are not picking up the
oldest element, as you appear to think.
I can’t understand what you are trying to do, or why you think you need to deal with the
trailing edge.

The easiest way to go about AVStream queue management is to get the leading edge, create a
clone, and advance the stream pointer, just as you said. In your DPC you use the clone, delete it
when you are done with it, and go on to the next one. It is an easy, low-cost solution. Clones
are allocated from a lookaside list and are therefore cheap; they are nothing but a copy of the
buffer header, so nothing else is involved and no additional overhead is incurred.

On a personal note, I never liked the queue management implementation in AVStream. To me it is
counter-intuitive and requires serious brain-twisting before you can get it; the whole concept
seems quite overdesigned.
At one time I remember regularly walking the AVStream queues, advancing stream pointers, and
collecting the clones in my own list. That way I could at least think straight about the data I
had in hand. Plus, the warm and fuzzy feeling of holding all the bits and pieces in my own hands
certainly contributed to the urge to implement that solution.

– Max.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-263745-
xxxxx@lists.osr.com] On Behalf Of Michael Jones
Sent: Monday, September 25, 2006 10:54 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] AVStream driver buffering

<snip>



Max Paklin wrote:

Trailing edge is NOT the head of the queue, leading edge is. You are not picking up the oldest element as you appear to think. I can’t understand what you are trying to do and why you think you need to deal with trailing edge.

Actually, if you read the documentation, his plan seems fundamentally
sound. If you have enabled the trailing edge, it should point to the
oldest unfilled frame, whereas the leading edge points to the newest
unfilled frame. Thus, when a new frame comes in, the idea is to copy it
to the trailing edge and advance that.

However, since none of the samples use the trailing edge, and the
leading-edge-clone mechanism works just fine, it would seem like good
advice to forget about the trailing edge stuff. It was designed for
decoders that need to produce multiple frames, possibly out of order.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thanks Tim. It’s good to know I didn’t completely miss the boat when I
read the docs. I will change the code to clone the leading edge,
however, since the trailing edge scheme seems to be terra incognita.

Cheers,

– mkj

Tim Roberts wrote:

<snip>



Hmm, I just ran across this code snippet in the Europa sample
(src\wdm\bda\europa\anlgvideocap.cpp, in the function CAnlgVideoCap::Start):

    NTSTATUS ntStatus = STATUS_UNSUCCESSFUL;
    DWORD dwIndex = 0;

    do // while system buffers are available
    {
        ntStatus = CAnlgVideoCap::Process();
        dwIndex++;
    }
    while( ntStatus == STATUS_SUCCESS );

    // Check whether the number of inserted buffers is equal to
    // the number of formerly demanded buffers.
    if( dwIndex != NUM_VD_CAP_STREAM_BUFFER )
    {
        // This is not a bug, but there is a Microsoft restriction
        // where sometimes we don't get all the requested buffers.
        _DbgPrintF( DEBUGLVL_VERBOSE,
            ("Warning: Number of buffers do not match!"));
    }

The Process function does a lot of stuff, but basically it advances the
leading edge stream pointer once each time it is called. Could this be
related to what I’m seeing? Does anyone know what this “restriction” is,
and whether there is any way to avoid it (or at least improve the odds
that I won’t run into it)?

Cheers,

– mkj



I stand corrected.
The only time I used the trailing edge was about 5-6 years ago. My memory is not serving me well.

– Max.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-264315-
xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Friday, September 29, 2006 9:49 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] AVStream driver buffering

<snip>



Just to wrap up my own thread: it turns out that the trailing edge
pointer scheme I outlined below does indeed work as I had hoped. I had
been testing my filter in GraphEdt, since I could not convince AmCap to
see my capture device.

In GraphEdt, when I rendered my output pin, someone upstream seems to
allocate only a single buffer, and that’s why I only ever saw the one
buffer. Over the past several days I have been working to get AmCap to
recognize my device; lo and behold, when it finally did, it allocated
the number of buffers I asked for (3 in this case).

This was before I got around to replacing the trailing edge stuff with
the “clone the leading edge” code. I added some debug tracing to be
sure it was operating as I expected; it does, at least in AmCap.

Thanks to all who responded. It seems that with AVStream drivers, the
bugs you need to fix are not the ones which are apparent :-)

Cheers,

–mkj

Michael Jones wrote:

<snip>