The question is basically identical to this post:
http://www.osronline.com/showthread.cfm?link=229801
We have a capture filter with one output pin, fixed format RGB24 (which does not support metadata by itself).
We would like to add proprietary metadata to each frame that is DMA’d into a stream buffer.
Something like setting the PresentationTime / Duration / DataUsed fields in the KSSTREAM_HEADER struct.
The KSSTREAM_HEADER structure documentation mentions:
The KSSTREAM_HEADER structure is a variable-length structure that describes a packet of data to be read from or written to a streaming driver pin.
And in the Remarks section:
This structure can be followed in memory by additional information specific to the type of data in the data packet.
So, if we specify header.Size = (sizeof(KSSTREAM_HEADER) + sizeof(PROPRIETARY_DATASTRUCT)), can we add a PROPRIETARY_DATASTRUCT at the end of the KSSTREAM_HEADER struct?
If so, how can we retrieve this additional data from the capture pin?
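Concretely, what we have in mind on the driver side is something like the following sketch (PROPRIETARY_DATASTRUCT is our own, made-up metadata struct; KSSTREAM_HEADER comes from ks.h):

    // Our own, hypothetical metadata struct.
    typedef struct _PROPRIETARY_DATASTRUCT {
        ULONG    FrameCounter;
        LONGLONG HwTimestamp;
    } PROPRIETARY_DATASTRUCT;

    // The layout we would like: our metadata appended to the header.
    typedef struct _EXTENDED_HEADER {
        KSSTREAM_HEADER        Header;
        PROPRIETARY_DATASTRUCT Extra;
    } EXTENDED_HEADER;

    // Per frame: hdr->Header.Size = sizeof(EXTENDED_HEADER);
    //            hdr->Extra.FrameCounter = ...;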
We now have a slightly modified version of the SDK DirectShow “dump” filter sample, that retrieves all the data from an IMediaSample in the Receive callback.
This data is written to a file. It uses the standard IMediaSample API to get the timestamps, etc.
Is there a way we can get to the raw KSSTREAM_HEADER for each frame using this dump filter?
This would allow us to extract the extra metadata and write it to a file for offline review/analysis.
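For reference, the Receive callback is roughly this (simplified sketch; WriteRecord is our own helper, error handling omitted):

    HRESULT CDumpInputPin::Receive(IMediaSample *pSample)
    {
        REFERENCE_TIME tStart = 0, tStop = 0;
        pSample->GetTime(&tStart, &tStop);          // presentation timestamps

        BYTE *pData = NULL;
        pSample->GetPointer(&pData);                // raw frame bits
        long cbData = pSample->GetActualDataLength();

        WriteRecord(tStart, tStop, pData, cbData);  // dump to file (our helper)
        return CRenderedInputPin::Receive(pSample);
    }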
Thank you in advance for any input.
- Bernard Willaert
Software Development Engineer
Barco - Healthcare division
Belgium
xxxxx@hotmail.com wrote:
> We have a capture filter with one output pin, fixed format RGB24 (which does not support metadata by itself).
> We would like to add proprietary metadata to each frame that is DMA’d into a stream buffer.
> Something like setting the PresentationTime / Duration / DataUsed fields in the KSSTREAM_HEADER struct.
> The KSSTREAM_HEADER structure documentation mentions:
> The KSSTREAM_HEADER structure is a variable-length structure that describes a packet of data to be read from or written to a streaming driver pin.
> And in the Remarks section:
> This structure can be followed in memory by additional information specific to the type of data in the data packet.
> So, if we specify header.Size = (sizeof(KSSTREAM_HEADER) + sizeof(PROPRIETARY_DATASTRUCT)), can we add a PROPRIETARY_DATASTRUCT at the end of the KSSTREAM_HEADER struct?
No. It is variable length, but the variability is not under your
control, unless you are sending the KS ioctls directly. In a DirectShow
graph, the size of the KSSTREAM_HEADER is established by ksproxy when it
sends the IOCTL_KS_READ_STREAM ioctls. For a video stream, for example,
the stream header actually consists of a KSSTREAM_HEADER plus a
KS_FRAME_INFO.
I hope it is obvious that your driver cannot unilaterally decide to
extend the size of the KSSTREAM_HEADER. If you are given a 40-byte
buffer, you can’t simply decide to write 64 bytes into it.
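On the driver side, for what it’s worth, that extra KS_FRAME_INFO is reachable right after the header. A rough pin-centric sketch (headers from ks.h/ksmedia.h; the FrameNumber counter is made up):

    NTSTATUS PinProcess(PKSPIN Pin)
    {
        static LONGLONG FrameNumber = 0;

        PKSSTREAM_POINTER Leading =
            KsPinGetLeadingEdgeStreamPointer(Pin, KSSTREAM_POINTER_STATE_LOCKED);
        if (Leading == NULL)
            return STATUS_SUCCESS;

        PKSSTREAM_HEADER Header = Leading->StreamHeader;
        if (Header->Size >= sizeof(KSSTREAM_HEADER) + sizeof(KS_FRAME_INFO)) {
            // The KS_FRAME_INFO sits directly after the KSSTREAM_HEADER.
            PKS_FRAME_INFO FrameInfo = (PKS_FRAME_INFO)(Header + 1);
            FrameInfo->ExtendedHeaderSize = sizeof(KS_FRAME_INFO);
            FrameInfo->dwFrameFlags = KS_VIDEO_FLAG_FRAME;
            FrameInfo->PictureNumber = ++FrameNumber;
        }
        // ... DMA the frame, set Header->DataUsed, advance the stream pointer ...
        return STATUS_SUCCESS;
    }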
> We now have a slightly modified version of the SDK DirectShow “dump” filter sample, that retrieves all the data from an IMediaSample in the Receive callback.
> This data is written to a file. It uses the standard IMediaSample API to get the timestamps, etc.
> Is there a way we can get to the raw KSSTREAM_HEADER for each frame using this dump filter?
No. Ksproxy translates the KSSTREAM_HEADER information into DirectShow
structures and then recycles the memory. By the time it gets to a
downstream filter, the header is long gone.
You have a couple of choices. You could use steganography, by embedding
your information into the low-order bit of each pixel byte. If this is
for debug information, you could replace the first scanline with your
metadata, and have your app write over that with black. Or, you can
invent your own FOURCC code, and have a DirectShow filter that
translates it to RGB24 by removing the metadata.
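The scanline trick, for example, is just a memcpy at capture time. A debug-only sketch (FRAME_META and the magic value are made up):

    #define METADATA_MAGIC 0x4D455441   // 'META', so the reader can verify

    typedef struct _FRAME_META {
        ULONG    Magic;         // METADATA_MAGIC
        ULONG    FrameCounter;
        LONGLONG HwTimestamp;
    } FRAME_META;

    VOID EmbedMetaInFirstScanline(PUCHAR FrameBuffer, ULONG Stride,
                                  const FRAME_META *Meta)
    {
        // One RGB24 scanline is Stride bytes; the struct easily fits.
        ASSERT(sizeof(FRAME_META) <= Stride);
        RtlCopyMemory(FrameBuffer, Meta, sizeof(FRAME_META));
        // The app reads the metadata back out, then paints this
        // line black before display.
    }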
--
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
Thank you, Tim!
So basically there are 3 options:
- have a dedicated output pin at the capture filter for a metadata stream, plus the normal video output pin. Our concern is then: how synchronized are these 2 streams?
- have one single output pin with a bigger buffer size that combines pixel data and metadata. Then we need to throw in an additional custom splitter filter to separate out the metadata and render the video in a standard way.
- embed/hide the metadata in the standard video output and enable this option for debugging only (learned a new word here: ‘steganography’, thanks for that!).
Kind regards,
>> For a video stream, for example, the stream header actually consists of a KSSTREAM_HEADER plus a KS_FRAME_INFO.
If we fill in the KS_FRAME_INFO in the driver:
How do we get to the extra KS_FRAME_INFO from a user-mode debug filter like “dump”?
There is only a “Receive” callback with an IMediaSample* parameter and a dedicated API, which doesn’t include something like a “getKsFrameInfo()”?
Thanks,
xxxxx@hotmail.com wrote:
> >> For a video stream, for example, the stream header actually consists of a KSSTREAM_HEADER plus a KS_FRAME_INFO.
> If we fill in the KS_FRAME_INFO in the driver:
> How do we get to the extra KS_FRAME_INFO from a user-mode debug filter like “dump”?
> There is only a “Receive” callback with an IMediaSample* parameter and a dedicated API, which doesn’t include something like a “getKsFrameInfo()”?
No, you can’t. This is all abstracted by ksproxy. It is a private
communication between the Microsoft ksproxy component and the Microsoft
KS port driver. The KS_FRAME_INFO is received by ksproxy; ksproxy
copies the known fields into the equivalent DirectShow structures, and
then recycles the memory. Since there is no interface for extending the
DirectShow interfaces, there’s no place to put the extra information.
--
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
xxxxx@hotmail.com wrote:
> So basically there are 3 options:
> - have a dedicated output pin at the capture filter for a metadata stream, plus the normal video output pin. Our concern is then: how synchronized are these 2 streams?
That’s up to you. You put the timestamps in the stream header. If you
put the same timestamps in the two headers, then your application can
sync them up.
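Something along these lines in the driver, stamping both pins from the same capture clock (the duration constant is made up; use your real frame rate):

    #define FRAME_DURATION_100NS 333333   // ~30 fps, for example

    VOID StampHeader(PKSSTREAM_HEADER Header, LONGLONG CaptureTime100ns)
    {
        Header->PresentationTime.Time        = CaptureTime100ns;
        Header->PresentationTime.Numerator   = 1;
        Header->PresentationTime.Denominator = 1;
        Header->Duration = FRAME_DURATION_100NS;
        Header->OptionsFlags |= KSSTREAM_HEADER_OPTIONSF_TIMEVALID |
                                KSSTREAM_HEADER_OPTIONSF_DURATIONVALID;
    }

    // Call this with the same CaptureTime100ns on the video frame's
    // header and on the matching metadata buffer's header.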
If those are your only two pins, then you could use a “filter-centric”
AVStream model instead of a “pin-centric” model. In the filter-centric
model, you have your Process callback in the filter instead of the
pins. The Process callback will not be called unless there are
available frames on all the pins. It gets handed an array of pins, and
you process them all at once.
The avssamp sample driver is filter-centric, as a way to ship video plus
the associated audio together.
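Schematically, a filter-centric Process callback looks like this (the pin IDs and Fill helpers are made up; see avssamp for the real thing):

    #define VIDEO_PIN_ID 0
    #define META_PIN_ID  1

    VOID FillVideoFrame(PVOID Buffer, ULONG Size);   // hypothetical
    VOID FillMetadata(PVOID Buffer, ULONG Size);     // hypothetical

    NTSTATUS FilterProcess(PKSFILTER Filter, PKSPROCESSPIN_INDEXENTRY Index)
    {
        // KS only calls this when every pin has a buffer available,
        // so the two streams stay in lock-step.
        PKSPROCESSPIN VideoPin = Index[VIDEO_PIN_ID].Pins[0];
        PKSPROCESSPIN MetaPin  = Index[META_PIN_ID].Pins[0];

        FillVideoFrame(VideoPin->Data, VideoPin->BytesAvailable);
        FillMetadata(MetaPin->Data, MetaPin->BytesAvailable);

        VideoPin->BytesUsed = VideoPin->BytesAvailable;
        MetaPin->BytesUsed  = MetaPin->BytesAvailable;
        return STATUS_SUCCESS;
    }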
> - embed/hide the metadata in the standard video output and enable this option for debugging only (learned a new word here: ‘steganography’, thanks for that!).
You’ll be hearing that a lot more in the future. All of your money, for
example, now contains information embedded in the artwork using
steganography that can be used to detect counterfeiting. Stock and bond
certificates have used it for a couple of decades. Some copy machines
actually detect the signatures and refuse to copy protected documents.
--
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.