AVStream driver allocate buffer

Hi all,

My AVStream driver is based on the WDK sample avshws, but it uses common-buffer DMA instead of scatter/gather DMA. At the moment I use RtlCopyMemory to copy data from the common buffer to the output pin’s frame buffer. There are many instances on a PC, so the amount of data being copied is large.

In the avshws sample, every time I get a PKSSTREAM_POINTER (by calling KsPinGetLeadingEdgeStreamPointer), the frame buffer is already allocated, so I have to copy data from the common buffer into the frame buffer. Is there an approach where the frame buffer is not yet allocated when I get the PKSSTREAM_POINTER, so that I only need to tell it the address of the data? That is, I would give it a pointer into the common buffer instead of copying memory. If so, I think the CPU load would drop a lot when the amount of data transferred is large.

I’ve heard of memory mapping on Linux. Is there a similar technique on Windows?

Thanks in advance!

> data from common buffer to the frame buffer. I wonder whether there is an approach that when i get the PKSSTREAM_POINTER, the frame buffer was not allocated and i just need to tell it the address of the data, that is to say, i just need to give it a pointer of the common buffer instead of memory copy. If so, i think the CPU loading will decrease a lot when the amount of data transfer is large.

The classic way is to use scatter-gather DMA and have the device write its data directly into the chain of MDLs attached to the IRP.

Alternatively, you can map the driver’s whole common buffer into the app’s address space once, in the driver’s open path, and access the data directly by pointer, with some kind of “advance produce pointer” IOCTL to the driver.

Usually this is considered a not-so-good idea, but in multimedia stacks (for instance, in Vista+ audio) it can be a good one: the mixer requires bytewise access to the output data by pointer, and this approach lets the mixer live in user mode and use the common buffer directly as its output, without dealing with memory allocations, IRPs and MDLs.

So, for the audio stack, this kind of data transfer does exist. I would expect AVStream to offer it too as some standard data access method, but I don’t know the exact details.
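
The bookkeeping behind a mapped buffer with an "advance pointer" call can be sketched in portable C. This is a user-mode simulation only, not driver code: `shared_ring`, `ring_produce`, `ring_peek` and `ring_advance` are hypothetical names, the buffer is ordinary memory rather than a mapped common buffer, and the mapping itself, the IOCTL plumbing, and any locking or memory barriers are left out. It shows the capture direction, where the device produces and the application consumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one fixed-size region shared by producer and consumer.
 * In the scheme described above this would be the driver's common buffer
 * mapped into the application; here it is plain memory so the logic can be
 * exercised in user mode. */
#define RING_SIZE 16u   /* bytes; tiny on purpose, must be a power of two */

typedef struct {
    uint8_t  data[RING_SIZE];
    uint32_t produce;   /* total bytes ever written (free-running counter) */
    uint32_t consume;   /* total bytes ever released by the reader */
} shared_ring;

void ring_init(shared_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}

/* Bytes written by the producer and not yet released by the reader. */
uint32_t ring_available(const shared_ring *r)
{
    return r->produce - r->consume;
}

/* Producer side (stand-in for the device's DMA): writes at most as many
 * bytes as there is free space and returns the number actually written. */
uint32_t ring_produce(shared_ring *r, const uint8_t *src, uint32_t len)
{
    uint32_t free_bytes = RING_SIZE - ring_available(r);
    if (len > free_bytes)
        len = free_bytes;
    for (uint32_t i = 0; i < len; i++)
        r->data[(r->produce + i) % RING_SIZE] = src[i];
    r->produce += len;
    return len;
}

/* Reader side: returns a pointer straight into the shared region, with no
 * copy. *len receives the number of contiguous readable bytes, which can be
 * less than ring_available() when the data wraps around the end. */
const uint8_t *ring_peek(const shared_ring *r, uint32_t *len)
{
    uint32_t avail  = ring_available(r);
    uint32_t off    = r->consume % RING_SIZE;
    uint32_t to_end = RING_SIZE - off;
    *len = (avail < to_end) ? avail : to_end;
    return &r->data[off];
}

/* Stand-in for the "advance pointer" IOCTL: the reader releases n bytes.
 * Returns 0 on success, -1 if n exceeds what is currently available. */
int ring_advance(shared_ring *r, uint32_t n)
{
    if (n > ring_available(r))
        return -1;
    r->consume += n;
    return 0;
}
```

A reader would loop on `ring_peek`, process the bytes in place, and call `ring_advance` when done. For an output (render) path such as the audio mixer case, the roles are reversed and the application advances the produce counter instead.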

> I’ve heard of memory map on linux

This is what is described above.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

xxxxx@gmail.com wrote:

> My AVStream driver is based on the WDK sample avshws, but it uses common-buffer DMA instead of scatter/gather DMA. At the moment I use RtlCopyMemory to copy data from the common buffer to the output pin’s frame buffer. There are many instances on a PC, so the amount of data being copied is large.

What do you mean by “many instances in a PC”?

> In the avshws sample, every time I get a PKSSTREAM_POINTER (by calling KsPinGetLeadingEdgeStreamPointer), the frame buffer is already allocated, so I have to copy data from the common buffer into the frame buffer. Is there an approach where the frame buffer is not yet allocated when I get the PKSSTREAM_POINTER, so that I only need to tell it the address of the data? That is, I would give it a pointer into the common buffer instead of copying memory.

Not in AVStream. This is why the sample starts with scatter/gather DMA
instead of common buffer. You can say that you want to be the “memory
allocator” for your output pin, but even with that there’s no guarantee
that the requests will come in the order that your DMA runs.

How much data are you transferring?

> If so, I think the CPU load would drop a lot when the amount of data transferred is large.

Why do you think CPU loading is a problem? What is the CPU load?
Today’s CPUs can copy memory pretty darned fast. We had a capture chip
doing two full HDTV streams over PCIExpress using a common buffer
solution that needed these copies, and it was never more than about 20%
of a CPU from 4 years ago.
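
One way to put a number on the copy cost for a given machine is a small user-mode measurement. The sketch below is only an approximation: `measure_copy_mb_per_s` and `core_fraction` are hypothetical helper names, `memcpy` stands in for RtlCopyMemory, 1 MB is taken as 10^6 bytes, and the result ignores the memory bandwidth that the copy competes for with the DMA itself.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies a frame_bytes-sized buffer `iterations` times and returns the
 * observed rate in MB/s (1 MB = 1e6 bytes). Returns 0.0 if the size is zero,
 * an allocation fails, or the clock did not advance during the run. */
double measure_copy_mb_per_s(size_t frame_bytes, unsigned iterations)
{
    if (frame_bytes == 0)
        return 0.0;

    unsigned char *src = malloc(frame_bytes);
    unsigned char *dst = malloc(frame_bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }
    memset(src, 0xA5, frame_bytes);

    volatile unsigned char sink = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i;        /* make each pass distinct */
        memcpy(dst, src, frame_bytes);
        sink ^= dst[frame_bytes - 1];     /* read the result so the copy is used */
    }
    clock_t end = clock();

    free(src);
    free(dst);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)frame_bytes * iterations) / 1e6 / seconds;
}

/* Fraction of one core needed to copy required_mb_per_s when a core can
 * copy copy_mb_per_s. Returns -1.0 if the copy rate is not positive. */
double core_fraction(double required_mb_per_s, double copy_mb_per_s)
{
    if (copy_mb_per_s <= 0.0)
        return -1.0;
    return required_mb_per_s / copy_mb_per_s;
}
```

For example, if the measured rate came out at 3000 MB/s (an illustrative figure, not a measurement), `core_fraction(600.0, 3000.0)` would put a 600 MB/s stream at about one fifth of a core.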


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

> What do you mean by “many instances in a PC”?

=> My driver has many device instances on one PC.

> How much data are you transferring?

=> About 500M to 600M per second.

xxxxx@gmail.com wrote:

>> What do you mean by “many instances in a PC”?

> => My driver has many device instances on one PC.

How many?

>> How much data are you transferring?

> => About 500M to 600M per second.

Do you mean megaBYTES here? That will require at least a 4-lane
PCIExpress slot, and most PCs don’t have more than 2 or 3 such slots, so
you can’t have all that many instances.
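
The lane estimate can be reproduced with a little arithmetic. The sketch below assumes first-generation PCI Express (2.5 GT/s per lane with 8b/10b encoding, so about 250 MB/s of raw bandwidth per lane per direction) and an illustrative 80% payload efficiency to account for packet overhead. The efficiency figure and the function name `min_link_width` are assumptions made for this example, and newer PCIe generations would give different results.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: PCIe 1.x, 2.5 GT/s per lane with 8b/10b encoding, which is
 * about 250 MB/s of raw bandwidth per lane in each direction. */
#define PCIE_GEN1_RAW_MB_PER_LANE 250u

/* Assumption: illustrative share of raw bandwidth left for payload after
 * packet headers and flow control. Real efficiency depends on the system. */
#define ASSUMED_EFFICIENCY_PERCENT 80u

/* Returns the smallest common link width (x1, x2, x4, x8, x16) whose usable
 * bandwidth covers required_mb_per_s, or 0 if even x16 is not enough. */
unsigned min_link_width(unsigned required_mb_per_s)
{
    static const unsigned widths[] = {1, 2, 4, 8, 16};
    const unsigned usable_per_lane =
        PCIE_GEN1_RAW_MB_PER_LANE * ASSUMED_EFFICIENCY_PERCENT / 100u;

    assert(required_mb_per_s > 0);   /* a zero request is a caller error */

    for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++) {
        if (widths[i] * usable_per_lane >= required_mb_per_s)
            return widths[i];
    }
    return 0;
}
```

Under these assumptions both 500 MB/s and 600 MB/s per device land on an x4 link, since an x2 link offers only about 400 MB/s of usable bandwidth.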

What do you hope to DO with that much data on a continuous basis? You
can’t shove it to disk or out to a network.

If that really is your bandwidth requirement, then it’s possible that a
common buffer design will not work for you. Does your hardware do
scatter/gather DMA?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.