Hi all,
I am developing an AVStream minidriver and have to copy a large amount of video data from a common buffer to the output pin’s buffer. I don’t think this is efficient, because the copy will take up a lot of CPU time.
Is there a more efficient way to move the data than RtlCopyMemory,
such as memory mapping?
Here is the key part of my code; I don’t think it is efficient enough.
PKSSTREAM_POINTER pStreamPointer = KsPinGetLeadingEdgeStreamPointer (m_Pin, KSSTREAM_POINTER_STATE_LOCKED);
PKSSTREAM_HEADER pHeader = pStreamPointer->StreamHeader;
PBYTE pData = static_cast< PBYTE >( pHeader->Data );
RtlCopyMemory(pData, pAddr, nLength); // pAddr is the address of common buffer
Can somebody give me some advice? Thanks in advance!
How big is the buffer?
d
-----Original Message-----
From: xxxxx@gmail.com
Sent: Wednesday, June 10, 2009 8:12 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] copy large amount of data
—
NTDEV is sponsored by OSR
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
D1 picture size * 4 channels * 25 fps (PAL) or D1 picture size * 4 channels * 30 fps (NTSC), that is
704*576*2*4*25 or 720*480*2*30 bytes per second per device.
There may be 4 devices per PC, so the amount of data is large.
Without having given much thought to AVStream drivers (don’t these use DMA?):
RtlCopyMemory/memcpy is just implemented as a REP MOVS instruction, which is
like molasses going uphill in January (with crunches ©). You can get much
better performance with SSE instructions, which can copy 8 or 16 bytes at a
time even on a 32-bit system.
//Daniel
wrote in message news:xxxxx@ntdev…
> I wonder whether there is an effective approach to copy data rather than
> RtlCopyMemory.
>
Yes, I have to copy the data from the DMA common buffer to the output pin.
Have you run any measurements of the performance hit that these memory
copies introduce?
I’ve worked on custom drivers (not AVStream drivers) that bring several Gb/s
to user mode with memory copies, and usually the performance hit is not that
bad at all.
Just my 2 cents
GV
----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Wednesday, June 10, 2009 9:26 PM
Subject: RE:[ntdev] copy large amount of data
> D1 picture size * 4 channels * 25 fps (PAL) or D1 picture size * 4 channels *
> 30 fps (NTSC), that is
>
> 704*576*2*4*25 or 720*480*2*30 bytes per second per device.
>
> There may be 4 devices per PC, so the amount of data is large.
xxxxx@gmail.com wrote:
> D1 picture size * 4 channels * 25 fps (PAL) or D1 picture size * 4 channels * 30 fps (NTSC), that is
> 704*576*2*4*25 or 720*480*2*30 bytes per second per device.
> There may be 4 devices per PC, so the amount of data is large.
Don’t guess, do the math. 720x480x2x30 is 20 megabytes per second.
That’s a trivial amount of data. Current processors can copy memory at
a gigabyte per second or more. Even with 4 devices running full bore,
you are talking about less than 10% CPU load.
FIRST, get it to work. THEN, decide whether the performance is adequate.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
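
For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes 2 bytes per pixel (as in the original figures) and uses the 1 GB/s copy rate as a round number; the function names are made up for illustration. The share of copy bandwidth is only a rough proxy for CPU load.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second for one uncompressed video stream.
constexpr std::uint64_t StreamBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t bytesPerPixel, std::uint64_t fps)
{
    return width * height * bytesPerPixel * fps;
}

// The two per-stream figures from the thread.
static_assert(StreamBytesPerSecond(720, 480, 2, 30) == 20736000, "NTSC D1, one stream");
static_assert(StreamBytesPerSecond(704, 576, 2, 25) == 20275200, "PAL D1, one stream");

// Share (in percent) of an assumed copy bandwidth that a given data rate uses.
inline double CopyLoadPercent(std::uint64_t bytesPerSecond, std::uint64_t copyBytesPerSecond)
{
    assert(copyBytesPerSecond != 0);
    return 100.0 * static_cast<double>(bytesPerSecond) / static_cast<double>(copyBytesPerSecond);
}
```

With one NTSC stream per device, four devices come to 4 * 20,736,000 = 82,944,000 bytes per second, about 8.3% of an assumed 1 GB/s. If every device really carries 4 channels, the total is four times that (331,776,000 bytes per second, roughly 33% of the same assumed rate), so it is worth being clear about which case applies.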
Tim Roberts wrote:
Following up to my own post, we did an AVStream driver for a PCI Express
HDTV capture card three years ago that could capture two streams at
once. Because of the DMA implementation on the chip, I used a common
buffer and copied the data to the stream. Even when running Media
Center and displaying the streams full screen, it was still less than
25% CPU load, and processors today are better yet.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
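
Since the advice in this thread is to measure before optimizing, here is a rough sketch of a frame-copy timing loop. It is a user-mode approximation only (plain heap buffers and memcpy, not a DMA common buffer or a locked stream pointer), and the function name is hypothetical, so treat the result as a ballpark figure rather than what the driver will see.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies one frame-sized buffer `frames` times and returns the measured
// throughput in bytes per second (0.0 if the timer reports no elapsed time).
double MeasureCopyBytesPerSecond(std::size_t frameBytes, int frames)
{
    assert(frameBytes > 0 && frames > 0);
    std::vector<unsigned char> src(frameBytes, 0x5A);
    std::vector<unsigned char> dst(frameBytes, 0);
    volatile unsigned char sink = 0;  // keeps the copies from being optimized away

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        src[0] = static_cast<unsigned char>(i);   // vary the source each pass
        std::memcpy(dst.data(), src.data(), frameBytes);
        sink = dst[frameBytes - 1];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    const std::chrono::duration<double> elapsed = stop - start;
    if (elapsed.count() <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(frameBytes) * frames / elapsed.count();
}
```

Calling it as `MeasureCopyBytesPerSecond(720 * 480 * 2, 300)` times 300 NTSC D1 frames; comparing the result with the required data rate gives a first idea of how much headroom there is.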
xxxxx@resplendence.com wrote:
> Without having given much thought to AVStream drivers (don’t these use DMA?)
Yes, assuming it’s on a bus that does DMA, but for timing reasons it’s
common to DMA into a common buffer and copy into the stream buffers.
> RtlCopyMemory/memcpy is just implemented as a REP MOVS instruction,
> which is like molasses going uphill in January (with crunches ©). You
> can get much better performance with SSE instructions, which can copy 8
> or 16 bytes at a time even on a 32-bit system.
You’re exaggerating. Yes, you can get better performance with SSE, but
not necessarily “much better”, and there is a cost in complexity and
startup time. REP MOVSD runs at one dword per CPU cycle.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
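
To make the SSE suggestion concrete, here is a minimal user-mode sketch of a 16-bytes-at-a-time copy using SSE2 intrinsics, falling back to memcpy where SSE2 is not available. The function name and the HAVE_SSE2_COPY macro are made up for illustration, and nothing here claims it beats RtlCopyMemory; that would have to be measured. As far as I know, a 32-bit kernel-mode driver would also have to bracket SSE use with KeSaveFloatingPointState / KeRestoreFloatingPointState, which is part of the complexity cost mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_COPY 1
#endif

// Copies `length` bytes from src to dst; the buffers must not overlap.
void CopyBytes16(void* dst, const void* src, std::size_t length)
{
    assert(length == 0 || (dst != nullptr && src != nullptr));
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

#ifdef HAVE_SSE2_COPY
    // Unaligned 128-bit loads and stores, 16 bytes per iteration.
    while (length >= 16) {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(d), chunk);
        s += 16;
        d += 16;
        length -= 16;
    }
#endif
    // Remaining tail bytes (or everything, when SSE2 is unavailable).
    if (length != 0) {
        std::memcpy(d, s, length);
    }
}
```

The unaligned load/store variants are used so the sketch works for any buffer addresses; aligned variants need 16-byte-aligned pointers.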