PCI/AGP memory question

Hi,

First, I must explain that I am not a driver expert, so please don't expect me to know too much.

Here’s my problem:
I have an old TV tuner PCI card based on the Zoran ZR36120 multimedia controller. Because the company no longer exists and its driver was 16-bit, and because I wanted to train myself a little (my job is actually not far from this), I decided to implement a WDM streaming minidriver for the hardware. After some time I managed it: my driver works, but the display is not smooth enough.
The ZR36120 chip can only DMA into contiguous memory areas (no scatter/gather support) for the top and bottom video fields. Hence, in DriverEntry I requested the allocation of a common DMA buffer (DmaBufferSize in the HW_INITIALIZATION_DATA structure) and set up the ZR36120 to DMA into this buffer. Then, when an SRB is queued by the driver, I map the SRB data buffer using “write combining”:

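pSrbExt->pvData = NULL;
pSrbExt->pMDL = IoAllocateMdl(pDataPacket->Data, biWidthBytes * VIDEO_HEIGHT, FALSE, FALSE, NULL);

if (pSrbExt->pMDL != NULL) {
    __try {   // MmProbeAndLockPages raises an exception on failure
        MmProbeAndLockPages(pSrbExt->pMDL, KernelMode, IoWriteAccess);
        pSrbExt->pvData = MmMapLockedPagesSpecifyCache(pSrbExt->pMDL, KernelMode, MmWriteCombined, NULL, FALSE, NormalPagePriority);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(pSrbExt->pMDL);
        pSrbExt->pMDL = NULL;
    }
}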

Then, in the DPC, after the DMA transfer of a video field completes, I simply copy the field to the SRB data buffer through the mapped pSrbExt->pvData pointer.
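The DPC copy step can be sketched as a line-by-line copy. This is a hypothetical helper (copy_field is not from the original driver), written in user-mode C with memcpy standing in for RtlCopyMemory; it assumes each field is stored as consecutive lines in the common buffer and that the SRB buffer may use a different line pitch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies one video field from the DMA common buffer
 * (src) to the SRB data buffer (dst), one scan line at a time.
 * src_stride / dst_stride: byte distance between consecutive lines.
 * line_bytes: bytes copied per line (720 pixels * 2 bytes for YUV 4:2:2).
 * Lines are written in increasing address order, the sequential pattern
 * that write-combined mappings generally handle best. */
static void copy_field(unsigned char *dst, size_t dst_stride,
                       const unsigned char *src, size_t src_stride,
                       size_t line_bytes, size_t lines)
{
    size_t y;
    for (y = 0; y < lines; ++y) {
        memcpy(dst + y * dst_stride, src + y * src_stride, line_bytes);
    }
}
```

When both strides equal the line width (720 x 2 = 1440 bytes), the loop is equivalent to a single memcpy of the whole field.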

When the SRB is completed, I call:

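if (pSrbExt->pMDL != NULL) {
    if (pSrbExt->pvData != NULL) {
        MmUnmapLockedPages(pSrbExt->pvData, pSrbExt->pMDL);
        pSrbExt->pvData = NULL;
    }
    MmUnlockPages(pSrbExt->pMDL);   // only reached if the pages were locked
    IoFreeMdl(pSrbExt->pMDL);
    pSrbExt->pMDL = NULL;
}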

OK, the problem is that copying a field takes a long time: about 14 ms (18 ms without “write combining”), which leaves only 6 ms for the OS. I have 720x288x2 bytes to copy for every field, which gives 29.6 MB/s. Is that all I can expect from AGP 2x?
I have also noticed that even when only one line per field is DMA-transferred, copying the same field buffer from the common DMA buffer to the SRB buffer still takes 10 ms. Shouldn’t PCI and AGP be independent? Shouldn’t I achieve much higher transfer rates (up to 512 MB/s)? Copying the same block between ordinary memory buffers takes only 5 ms, and with an optimized copy routine (MMX, SIMD, prefetching) only 1-2 ms, although that routine doesn’t help at all when copying to the SRB data buffer.
Can it be improved?
I use a PIII 800 MHz with a VIA chipset.
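The figures above can be checked with a few lines of C. The sketch assumes PAL timing (50 fields per second, so a 20 ms field period), which matches the 288 lines per field and the 14 ms + 6 ms budget quoted above; the helper names are illustrative.

```c
/* Per-field budget for the capture path described above.
 * Assumes PAL timing (50 fields per second) and YUV 4:2:2 at
 * 2 bytes per pixel. */
#define FIELD_WIDTH      720
#define FIELD_HEIGHT     288
#define BYTES_PER_PIXEL  2
#define FIELDS_PER_SEC   50.0

/* Bytes in one field: 720 * 288 * 2. */
static unsigned long field_bytes(void)
{
    return (unsigned long)FIELD_WIDTH * FIELD_HEIGHT * BYTES_PER_PIXEL;
}

/* Effective copy throughput in MB/s (1 MB = 1,000,000 bytes). */
static double throughput_mb_per_s(unsigned long bytes, double copy_ms)
{
    return (double)bytes / (copy_ms / 1000.0) / 1e6;
}

/* Milliseconds left in one field period after the copy. */
static double spare_ms(double copy_ms)
{
    return 1000.0 / FIELDS_PER_SEC - copy_ms;
}
```

With a 14 ms copy this gives 414,720 bytes per field, about 29.6 MB/s, and 6 ms left per 20 ms field period.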

Thanks in advance and best regards
Dariusz Dziara

Dariusz Dziara wrote:

Here’s my problem:
I have an old TV tuner PCI card based on the Zoran ZR36120 multimedia
controller. …

OK, the problem is that copying a field takes a long time: about 14 ms
(18 ms without “write combining”), which leaves only 6 ms for the OS.
I have 720x288x2 bytes to copy for every field, which gives
29.6 MB/s. Is that all I can expect from AGP 2x?
I have also noticed that even when only one line per field is DMA
transferred, copying the same field buffer from the common DMA buffer to
the SRB buffer takes 10 ms. Shouldn’t PCI and AGP be independent?

That depends on what you mean. There is only one “pipe” out of the CPU.
That pipe goes to a bridge chip, and the PCI and AGP bridges both come
out of that bridge.

The big speed boost from AGP comes when the device itself is pulling
from memory. When you’re blasting from the CPU with a rep movsd, AGP is
just PCI with a fancy paint job.

You said you’re using a DMA common buffer? OK, so the Zoran DMAs over
PCI into your common buffer. You use MoveMemory to copy that to the SRB
buffer. Eventually, a render filter copies that into an overlay surface
on the graphics card over AGP. Right? That’s a lot of copies. The
Zoran is a plain PCI device. Your frames are about 450k bytes; on a
perfect PCI bus, that’s at least 3 ms, and it’s quite possible that the
Zoran’s bus mastering is not perfect.
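The lower bound above follows from the theoretical peak of conventional PCI. This sketch assumes a 32-bit bus at 33.33 MHz moving 4 bytes per clock (about 133 MB/s) with no arbitration, address, or wait-state overhead, so real transfers will be slower; pci_min_ms is a hypothetical helper.

```c
/* Lower bound on the time to move `bytes` across a 32-bit, 33.33 MHz PCI
 * bus, assuming the theoretical peak of 4 bytes per clock and zero
 * protocol overhead. */
static double pci_min_ms(unsigned long bytes)
{
    const double peak_bytes_per_s = 4.0 * 33.33e6;   /* ~133 MB/s */
    return (double)bytes / peak_bytes_per_s * 1000.0;
}
```

For one 720x288x2 field (414,720 bytes) it gives roughly 3.1 ms, consistent with the "at least 3 ms" estimate.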

Plus, ALL of those transfers involve main memory, so they’re all going
to be fighting for the memory bus.
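A rough, hypothetical accounting of that contention: if each field crosses the memory bus four times on the path described above (DMA write into the common buffer, CPU read and CPU write during the copy, renderer read before the AGP transfer), the main-memory traffic can be estimated as below. It ignores caching and bus protocol overhead, so treat it as an order-of-magnitude figure only.

```c
/* Hypothetical estimate of main-memory traffic in MB/s.
 * Counted passes per field (assumption, not measured):
 *   1. Zoran DMA writes the field into the common buffer
 *   2. the DPC copy reads the common buffer
 *   3. the DPC copy writes the SRB buffer
 *   4. the renderer reads the SRB buffer before sending it over AGP */
static double memory_traffic_mb_per_s(unsigned long field_bytes,
                                      double fields_per_s)
{
    const int passes = 4;
    return (double)field_bytes * passes * fields_per_s / 1e6;
}
```

At 414,720 bytes per field and 50 fields per second this comes to about 83 MB/s of main-memory traffic.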

Full-frame 30fps video is a hefty burden for an 800MHz processor.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Just a thought: is the output from the Zoran chip in a format that the video chip's coprocessor could use directly as a DirectDraw surface? If so, you could have the video coprocessor blit the video data. Just a thought.

Regards,
William Michael Jones “Mike”

Hi,

First of all, thanks to everyone for the responses.

  1. The ZR36120 supports both YUV 4:2:2 and RGB 8:8:8. Currently I use the YUV format (2 bytes per pixel) because RGB seems to yield too much data (3 bytes per pixel), and with RGB the DMA transfers simply don't work on my PC.
    I am not sure whether blitting YUV to a DirectDraw surface is possible at all in my case (Windows XP + GeForce II Ultra).

  2. As I concluded from some driver code I found on the Internet, accessing DirectDraw requires DirectDraw user-mode handles. I have noticed that those handles are taken from the KS_FRAME_INFO structure (hDirectDraw and hSurfaceHandle).

But as I’ve read somewhere, these fields are set when the “Overlay Mixer” or “Overlay Mixer 2” filter is connected.
What about “VMR9”, which replaces the “Overlay Mixer”? Can DirectDraw be used in that case?
I also print the “hDirectDraw” and “hSurfaceHandle” values when an SRB arrives, and there’s another thing I do not understand: every second time these handles are NULL. Why?
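For reference, capture minidrivers usually find KS_FRAME_INFO as an extended header placed directly after the KSSTREAM_HEADER in the SRB's data packet, and should check the DirectDraw handles before using them. The sketch below is a runnable user-mode illustration only: MOCK_STREAM_HEADER and MOCK_FRAME_INFO are hypothetical, heavily simplified stand-ins for the real ks.h/ksmedia.h structures, and it assumes the pin's stream header actually includes the KS_FRAME_INFO extension.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for KSSTREAM_HEADER and
 * KS_FRAME_INFO. The real structures have more fields; only the
 * members needed for this illustration are reproduced. */
typedef struct {
    unsigned long  Size;
    long long      Duration;
    void          *Data;
} MOCK_STREAM_HEADER;

typedef struct {
    unsigned long  ExtendedHeaderSize;
    long long      PictureNumber;
    void          *hDirectDraw;
    void          *hSurfaceHandle;
} MOCK_FRAME_INFO;

/* The frame info immediately follows the stream header, so it is
 * located by stepping one header past the header pointer. */
static MOCK_FRAME_INFO *frame_info_from_header(MOCK_STREAM_HEADER *hdr)
{
    return (MOCK_FRAME_INFO *)(hdr + 1);
}

/* Nonzero only when both DirectDraw handles were supplied. */
static int has_ddraw_handles(const MOCK_FRAME_INFO *fi)
{
    return fi->hDirectDraw != NULL && fi->hSurfaceHandle != NULL;
}
```

In the driver the same check would run per SRB, so a packet arriving with NULL handles can fall back to the ordinary copy path instead of touching DirectDraw.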

Best Regards
Dariusz Dziara

----- Original Message -----
From: William Michael Jones
Newsgroups: ntdev
To: Windows System Software Devs Interest List
Sent: Tuesday, February 07, 2006 7:37 PM
Subject: Re:[ntdev] PCI/AGP memory question


