RE: RE: Mapping scattered pages into process address space

Just a few remarks on the answers so far:

  1. MmMapLockedPages() is fine for Win2000, but you should not use it
    anymore on Windows XP, as MS marks it as obsolete in the most recent
    DDKs. The driver verifier will bluescreen on an attempt to use it.
    So, if you want to be able to run on NT 5.1, use
    MmMapLockedPagesSpecifyCache().

  2. If your device can do scatter/gather, why even worry about common
    buffers? You’re trying to share your memory with an application, so simply
    malloc() it in your app and pass it to the driver via an IOCTL. Make an MDL
    describing your buffer, use MmProbeAndLockPages() to lock it down, then
    map it with MmMapLockedPagesSpecifyCache() and return the user mode
    virtual address to your app. Just don’t forget to unlock the buffer before
    your app quits or, more seriously, is killed by a TerminateProcess() call.
    Otherwise you will be bluescreened with a PROCESS_HAS_LOCKED_PAGES code.
    Register a callback so that your driver is notified on process
    termination and can clean up.

  3. If performance is crucial to your project, you might consider two
    things. First, the system-supplied DMA functionality might be slow compared
    to what is possible. If you are using busmaster DMA, you need not tell the
    system about any transfers you’re doing; (Io)MapTransfer et al. do not need
    to be used. Lock your pages and get their physical addresses by reading
    the MDL’s page array with a ((MDL*)p)+1 kind of access. MDLs store the
    page frame numbers of the physical pages (that is, physical address >> 12)
    right after their control structure. At least on x86 this is so.
    Itanium is different, but I don’t think you will ever plan on that platform.
    Alpha, PA-RISC and PPC are gone, so why worry?
    Second, make sure you use cached memory for your buffers. That way
    you will get the most out of your CPU’s read bandwidth. I have heard
    reports that AllocateCommonBuffer() will not allocate cached memory
    even when you request it via the CacheEnabled parameter. If you don’t
    need to touch the buffers again later, this doesn’t matter.
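
The PFN arithmetic from point 3 can be sketched in plain user-mode C. This is
only an illustration under stated assumptions: 4 KB pages as on x86, and a PFN
array passed in as an ordinary uint32_t array (in a driver it would come from
the locked MDL, for which the DDK headers also provide the MmGetMdlPfnArray()
macro). The names sg_entry and build_sg_list() are hypothetical, not DDK APIs.
The sketch turns each PFN into a physical address (PFN << 12, plus the byte
offset within the first page) and merges physically adjacent pages into one
scatter/gather segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumption: 4 KB pages, as on x86 */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical scatter/gather entry as a busmaster device might consume it. */
typedef struct {
    uint64_t phys;   /* physical start address of the segment */
    uint32_t len;    /* segment length in bytes */
} sg_entry;

/*
 * Build a scatter/gather list from a PFN array.
 *   pfns        page frame numbers of the locked buffer, in buffer order
 *   byte_offset offset of the buffer start within the first page
 *   byte_count  total buffer length in bytes
 * Returns the number of entries written; 0 means the buffer was empty
 * or 'max' entries were not enough.
 */
size_t build_sg_list(const uint32_t *pfns, uint32_t byte_offset,
                     uint32_t byte_count, sg_entry *out, size_t max)
{
    size_t n = 0;        /* entries written so far */
    size_t page = 0;     /* index into the PFN array */
    uint32_t off = byte_offset;

    assert(byte_offset < PAGE_SIZE);

    while (byte_count > 0) {
        /* PFN is "physical address >> 12", so shift back and add the offset. */
        uint64_t phys = ((uint64_t)pfns[page] << PAGE_SHIFT) + off;
        uint32_t chunk = PAGE_SIZE - off;
        if (chunk > byte_count)
            chunk = byte_count;

        if (n > 0 && out[n - 1].phys + out[n - 1].len == phys) {
            out[n - 1].len += chunk;    /* physically adjacent: extend segment */
        } else {
            if (n == max)
                return 0;               /* caller's list is too small */
            out[n].phys = phys;
            out[n].len = chunk;
            n++;
        }

        byte_count -= chunk;
        off = 0;                        /* only the first page has an offset */
        page++;
    }
    return n;
}
```

For a three-page buffer that starts 0x10 bytes into frame 0x100 and continues
in frames 0x101 and 0x200, this yields two segments: one at 0x100010 covering
the first two pages, and one at 0x200000 for the third.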

Oh, before I forget, is there anybody out there who wants faster bus
speeds? With PCI, most software improvements don’t give you better
performance, as you’re already limited to about 200 MB/s. I would like to see
something like AGP, which I can tell you gives you about 770 MB/s (the best
speed I measured on a Fire GL 2 graphics board) when running in 4x mode. Isn’t
that something? So if you have any weight in the industry, why not step
up and stand up for systems with multiple AGP ports? AFAIK there’s no limit
on the number of AGP slots in a system, unless someone has a better
understanding of the architecture.
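
For a rough sense of these numbers, the theoretical peak of a parallel bus is
clock rate times bus width times transfers per clock. A small sketch, assuming
the nominal figures of 33.33 MHz and 32 bits for plain PCI, and 66.67 MHz,
32 bits and four transfers per clock for AGP 4x. Those figures are the usual
nominal bus parameters, not something stated in this message, and
peak_mb_per_s() is a made-up helper:

```c
#include <assert.h>

/*
 * Theoretical peak bandwidth in MB/s (1 MB = 10^6 bytes).
 * MHz * bytes per transfer * transfers per clock gives MB/s directly.
 */
double peak_mb_per_s(double clock_mhz, unsigned bus_bytes,
                     unsigned transfers_per_clock)
{
    assert(clock_mhz > 0.0);
    return clock_mhz * (double)bus_bytes * (double)transfers_per_clock;
}

/* Assumed nominal figures: 32-bit buses, so 4 bytes per transfer. */
double pci_32_33_peak(void) { return peak_mb_per_s(33.33, 4, 1); }
double agp_4x_peak(void)    { return peak_mb_per_s(66.67, 4, 4); }
```

On those assumptions AGP 4x peaks at roughly 1067 MB/s, so the 770 MB/s
measured above would be about 72% of the theoretical maximum.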

Cheers,

Dipl.Ing. (FH) Klaus P. Gerlicher
Software design engineer
ATi Research GmbH
Tel. +49-(0)8151-266-420
xxxxx@ATi.com

-----Original Message-----
From: Stephen Williams [SMTP:xxxxx@icarus.com]
Sent: Sunday, 5 August 2001 00:03
To: NT Developers Interest List
Subject: [ntdev] RE: Mapping scattered pages into process address space

xxxxx@storagecraft.com said:
> > Common buffer pages are never scattered.
>
> They are if you allocate them a page at a time, as needed.

xxxxx@storagecraft.com said:
> Is it a good practice? Why not allocate the single, but large common
> buffer?

That’s what I currently do, but these are buffers big enough to hold
a set of images, so that’s a lot of common memory that is not available
to the system once the buffer is allocated. I only want to allocate
the memory when the device is in use, and in amounts dictated by the
application.

Because the buffer is so large, it is nearly impossible to allocate
a contiguous buffer that large once the system is running. The memory
is too fragmented.

Under Linux, the driver allocates the memory a page at a time according
to the needs of the application/firmware, and since I allocate it a page
at a time, fragmentation is not an issue. The application uses the mmap
system call, which the driver implements by supplying a vm_ops with the
appropriate functions, and I’m golden. Mapping scattered memory into the
application is trivial.

Under NT, this technique simply doesn’t exist, so I provide an ioctl to
perform the mmap. I use ZwMapViewOfSection to map a buffer into the
address space of the application, and that works. (This buffer can be
shared by threads/processes.) However, that buffer under NT must be
contiguous.

And since it is contiguous under NT, it is impractical (fragmentation)
to allocate it when the application needs it, and I’m forced to make
the user configure the driver (by registry keys) to preallocate the
buffer at driver boot time. This is an extra, useless administrative
step, and a waste of memory because the memory is only needed when the
application is using the board, and different applications may need
different size buffers.

Another possibility is to allocate the memory in the application and
pass it down through an ioctl. The ioctl can pin the memory down and
get the physical addresses needed to pass to the hardware. The problem
I see with this is that the memory may wind up somewhere where the
device can’t get to it (isn’t that what AllocateCommonBuffer is for?)
and then where will I be?
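
The reachability worry can at least be checked once the pages are pinned and
their page frame numbers are known. A minimal sketch, assuming 4 KB pages and
a device that can drive a fixed number of address bits (for example 32);
first_unreachable_page() is a hypothetical helper, not an NT or Linux API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumption: 4 KB pages */

/*
 * Return the index of the first page whose last byte lies above the
 * highest address the device can generate, or -1 if every page is
 * reachable. 'dma_address_bits' is the device's address width.
 */
long first_unreachable_page(const uint32_t *pfns, size_t count,
                            unsigned dma_address_bits)
{
    uint64_t limit;
    size_t i;

    assert(dma_address_bits >= 1 && dma_address_bits <= 64);

    /* Highest byte address the device can reach. */
    limit = (dma_address_bits == 64)
                ? UINT64_MAX
                : ((1ULL << dma_address_bits) - 1);

    for (i = 0; i < count; i++) {
        /* Physical address of the last byte in this page. */
        uint64_t last_byte = (((uint64_t)pfns[i] + 1) << PAGE_SHIFT) - 1;
        if (last_byte > limit)
            return (long)i;
    }
    return -1;
}
```

If the check fails for some page, the driver would have to fall back to memory
it allocated itself, or copy through a buffer the device can reach.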

Steve Williams “The woods are lovely, dark and deep.
xxxxx@icarus.com But I have promises to keep,
xxxxx@picturel.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep.”


You are currently subscribed to ntdev as: xxxxx@ATi.com
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

