We have created a capture filter with a single video output pin - based on the avshws sample.
Our PCIe device is capable of doing bus master scatter/gather DMA from on-board capture memory to system memory.
The frame size of captured video can be as large as 4 Mpixel at 3 bytes/pixel, i.e. 12 MB per frame.
So, in the worst case, each frame needs roughly 12 MB / 4 kB ≈ 3000 S/G list entries (3072 for a page-aligned 12 MiB buffer, one more if the buffer is not page-aligned).
According to our HW engineers, this may become a performance bottleneck: the driver has to rewrite the entire S/G mapping list in the FPGA for each individual frame, in the pin's DispatchProcess() callback, using the KSSTREAM_POINTER_OFFSET entries.
This is the case when the video consumer connected to our output pin (e.g. a renderer) allocates and provides the frame buffers - in other words, when we set the MemoryFlags member of the KS_FRAMING_ITEM structure to KSALLOCATOR_REQUIREMENTF_MUST_ALLOCATE.
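For reference, the per-frame refresh looks roughly like this - a simplified, non-compilable sketch of our processing path, assuming the pin is created with KSPIN_FLAG_GENERATE_MAPPINGS so AVStream presents each frame as KSMAPPING entries; WriteSgEntryToFpga() and StartFrameDma() are placeholders for our hardware access:

```c
/* Sketch only: the locked stream pointer's OffsetOut describes the frame
 * as an array of KSMAPPING entries (physical address + byte count). */
NTSTATUS PinProcess(PKSPIN Pin)
{
    PKSSTREAM_POINTER Leading =
        KsPinGetLeadingEdgeStreamPointer(Pin, KSSTREAM_POINTER_STATE_LOCKED);
    if (Leading == NULL)
        return STATUS_SUCCESS;

    /* The expensive part: rewrite the whole FPGA S/G list (up to ~3000
     * entries for a 12 MB frame) before every single transfer. */
    PKSMAPPING Mapping = Leading->OffsetOut.Mappings;
    for (ULONG i = 0; i < Leading->OffsetOut.Count; i++)
        WriteSgEntryToFpga(Mapping[i].PhysicalAddress, Mapping[i].ByteCount);

    StartFrameDma();  /* completion DPC later advances/unlocks the pointer */
    return STATUS_SUCCESS;
}
```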
Ideally, we would like to fill the S/G list in hardware only once, before streaming starts, with enough
entries to span an exact number of frames (e.g. 3) and then recycle that memory as a ring buffer.
To do this, do we have to allocate the target capture memory ourselves in the AVStream driver?
Or is a "common buffer" approach a better way to avoid updating the S/G mapping list in the FPGA for every frame?
And how does either option perform compared to the direct S/G approach?
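By "common buffer" we mean something along these lines (WDM-style sketch, not compilable standalone; the DMA_ADAPTER would be obtained via IoGetDmaAdapter at device start, ProgramFpgaRing() is a placeholder, and the frame count of 3 is just our example):

```c
/* Sketch: allocate one physically contiguous common buffer large enough
 * for N frames, so the FPGA needs only a base address - or a handful of
 * S/G entries - programmed once before streaming starts. */
#define NUM_FRAMES  3
#define FRAME_BYTES (12UL * 1024 * 1024)

PHYSICAL_ADDRESS LogicalAddress;
PVOID RingBase = DmaAdapter->DmaOperations->AllocateCommonBuffer(
                     DmaAdapter,
                     NUM_FRAMES * FRAME_BYTES,   /* 36 MB, contiguous */
                     &LogicalAddress,
                     TRUE);                      /* cached mapping */
if (RingBase == NULL)
    return STATUS_INSUFFICIENT_RESOURCES;

/* Program the FPGA once: base = LogicalAddress, size = 36 MB. */
ProgramFpgaRing(LogicalAddress, NUM_FRAMES * FRAME_BYTES);
```

One caveat we are aware of: a 36 MB physically contiguous allocation can fail on a fragmented system, so it would presumably have to be made early in the driver's life.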
For comparison, the approach we used in the past (not AVStream-based) was:
- allocate a large system memory buffer at driver startup
- program the S/G list in the FPGA based DMA bus master once
- with each captured frame: DMA this frame into the system memory buffer
- invoke a user-mode callback with a pointer to the captured frame.
This has proven to be very efficient.
Thanks in advance for any advice on this.
- Bernard Willaert
Software Development Engineer
Barco - Healthcare division
Belgium