MapTransfer on PCI

I have a processor board on a PCI bus that communicates with
applications on the host via shared memory and message passing.
Both the shared memory and the message passing are managed by
the driver that talks to a matching driver on the PCI board O/S.

Currently, the shared memory is created using AllocateCommonBuffer,
then application programs map the buffer using a DeviceIoControl
that uses MmMapLockedPagesSpecifyCache. This seems to work,
except that I’m allocating tens of megabytes of common buffers,
which seems a little impolite.

I am looking at instead locking down memory allocated in user
mode, using a DeviceIoControl and METHOD_DIRECT, then using
MapTransfer to get mappings of that memory, page at a time
(scattered pages are OK for the target board) that the target
board can use to randomly access those pages.

The problem here is that I’m only able to get some 65000 map
registers for each board, begging the question “Am I trading
one precious resource for another?”

Another problem is that I’m not certain that pages thusly mapped
are directly mapped, or if I’m actually getting an internal
trampoline buffer that will need to be flushed periodically.

And what is really the story with these “map registers” when
the device is a bus mastering PCI device that can generate
32bit addresses on a 32bit system?

I’m not immediately concerned about 64bit systems, but I do want
to keep them in mind as they are starting to be interesting. But
of course even 32bit systems with PAE will have extended addresses,
although my hardware currently only generates 32bit addresses.
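
A minimal user-mode sketch of the "page at a time" walk described above, assuming 4 KiB pages. The names `PAGE_BYTES`, `page_piece` and `split_into_pages` are made up for illustration; in a real driver each piece would be handed to MapTransfer rather than stored in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumption: 4 KiB pages, as on x86 */

/* One piece of a locked-down buffer that stays inside a single page. */
struct page_piece {
    uintptr_t va;      /* virtual address where this piece starts */
    size_t    length;  /* bytes in this piece */
};

/*
 * Split [va, va + length) into pieces that never cross a page boundary.
 * Writes at most max_pieces entries to out and returns the total number
 * of pieces the buffer needs (which may exceed max_pieces).
 */
size_t split_into_pages(uintptr_t va, size_t length,
                        struct page_piece *out, size_t max_pieces)
{
    size_t count = 0;

    assert(out != NULL || max_pieces == 0);

    while (length > 0) {
        /* Bytes left between va and the end of its page. */
        size_t room = PAGE_BYTES - (size_t)(va & (PAGE_BYTES - 1));
        size_t piece = length < room ? length : room;

        if (count < max_pieces) {
            out[count].va = va;
            out[count].length = piece;
        }
        count++;
        va += piece;
        length -= piece;
    }
    return count;
}
```

A buffer that does not start on a page boundary produces a short first piece and usually a short last piece, so it needs one more mapping than its length alone suggests.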

Steve Williams “The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep.”

I once saw an unlimited number of DMA map registers in this case.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

----- Original Message -----
From: “Stephen Williams”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Sent: Thursday, June 23, 2005 10:40 PM
Subject: [ntdev] MapTransfer on PCI

> —
> Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256
>
> You are currently subscribed to ntdev as: xxxxx@storagecraft.com
> To unsubscribe send a blank email to xxxxx@lists.osr.com

I tested it. I asked for 128K map registers for each detected board
by passing 131072 to the requested number of map registers, and Windows
changed the value to 65537 (0x10001). That is the in-out parameter
that is supposed to limit the number of MapTransfers one may have
active. I tested on Windows 2000. I need this to work on 2000/XP,
and so on.

And the other part of the question is whether this is a common buffer
mapping or Windows is really returning a pointer to an internal buffer
that I must flush with FlushAdapterBuffers. That would be fatal to me.

Steve Williams

You cannot construct a direct IO request that is greater than 64M (- a
little bit) as the MDLs are restricted here as well.

A 32bit busmaster PCI device on a 32bit platform will not use
bounce buffers, so map registers are nothing much at all other than
physical addresses.

Obviously on a 64bit platform map register bounce buffers are a distinct
possibility.
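
A sketch of where "64M minus a little bit" could come from, assuming the limit is the MDL's 16-bit size field (so one MDL is at most 65535 bytes), a 28-byte MDL header, and 4-byte page-frame entries on 32-bit x86. These three numbers are assumptions, not values stated in this thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest buffer one MDL can describe if the whole MDL (header plus one
 * page-frame entry per page) must fit in 65535 bytes.
 * All three parameters are assumptions supplied by the caller.
 */
uint64_t max_mdl_bytes(uint64_t page_size, uint64_t mdl_header_bytes,
                       uint64_t pfn_entry_bytes)
{
    assert(pfn_entry_bytes > 0 && mdl_header_bytes < 65535);
    uint64_t max_pages = (65535 - mdl_header_bytes) / pfn_entry_bytes;
    return max_pages * page_size;
}

/* Number of mappings of at most chunk_bytes needed to cover total_bytes. */
uint64_t chunks_needed(uint64_t total_bytes, uint64_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

With those assumed sizes the result is 16376 pages, or 64 MiB minus 32 KiB, and a 128 MiB region would need three separate direct I/O requests.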

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Stephen Williams
Sent: Thursday, June 23, 2005 3:07 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] MapTransfer on PCI


Actually, Mark is slightly wrong – a 32-bit device will end up using
trampoline buffers IF you enable PAE mode - doesn’t make any sense
unless you have >4GB RAM but it is possible nonetheless!

/simgr
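
The PAE case comes down to whether the device can put a page's physical address on the bus at all. A minimal sketch of that check follows; `device_can_reach` is a hypothetical helper, not a Windows API, and the HAL's real decision logic is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when a device that drives addr_bits address bits can reach every
 * byte of the physical range [phys, phys + length).
 */
bool device_can_reach(uint64_t phys, uint64_t length, unsigned addr_bits)
{
    assert(addr_bits >= 1 && addr_bits <= 64);

    if (length == 0)
        return true;

    uint64_t last = phys + (length - 1);  /* last byte of the range */
    if (last < phys)
        return false;                     /* range wraps around */
    if (addr_bits == 64)
        return true;
    return (last >> addr_bits) == 0;      /* no bits above the device's reach */
}
```

A page just below 4GB is reachable by a 32-bit device; the first page above 4GB is not, which is the case where an intermediate buffer would be needed.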

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Roddy, Mark
Sent: Thursday, June 23, 2005 3:59 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] MapTransfer on PCI


Roddy, Mark wrote:

> You cannot construct a direct IO request that is greater than 64M (- a
> little bit) as the MDLs are restricted here as well.

I’m already prepared for that, in that I can make multiple mappings
of <64Meg.

> A 32bit busmaster PCI device on a 32bit platform will not use
> bounce buffers, so map registers are nothing much at all other than
> physical addresses.

That implies that Windows would not care about the number of map
registers requested of IoGetDmaAdapter, yet it cuts my request down
to some limited value. Why? Color me confuzed.

> Obviously on a 64bit platform map register bounce buffers are a distinct
> possibility.

Granted, and I’m going to have to address that sometime. That’s a
separate development path, though. That brings up a question: how
can I tell that this is happening so that I can start reporting error
messages?

Steve Williams

Stephen Williams wrote:

> That implies that Windows would not care about the number of map
> registers requested of IoGetDmaAdapter, yet it cuts my request down
> to some limited value. Why? Color me confuzed.

These are very good questions.

First of all, Windows does not now and never has used the contents of
the NumberOfMapRegisters parameter for INPUT. Ever. I see that this
appears in the comments section in the DDK docs, but it is wrong – at
least for all existing HALs. I don’t know how this got started, but I’ve
been hearing it for years.

So, the value you pass into this parameter is entirely ignored.

Second, whether you use bounce buffers depends not only on whether
you’re a bus master and you can directly address all of memory, but also
whether you indicate that your device supports scatter/gather (this is
all according to the DEVICE_DESCRIPTION data structure, of course).

If you are a bus master AND you support scatter/gather (and you support
64 bit addressing when there’s memory on the system over the 4GB mark)
then the number of map registers returned from IoGetDmaAdapter is set
according to the max transfer size you indicated on your call.

If you do NOT support scatter/gather, even if you’re a bus master, the
number of map registers you can use at one time is limited. The number
probably depends on the HAL implementation.
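
The rules above can be restated as a small decision function. This is only a model of what is described in this message: `dma_caps` is a stand-in for the relevant DEVICE_DESCRIPTION flags, not the WDM structure itself, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the DEVICE_DESCRIPTION flags discussed here (not the WDM struct). */
struct dma_caps {
    bool master;          /* device is a bus master */
    bool scatter_gather;  /* device handles scattered pages in hardware */
    bool dma64;           /* device can generate 64-bit addresses */
};

enum dma_outlook {
    DMA_DIRECT,      /* map registers amount to plain physical addresses */
    DMA_MAY_BOUNCE   /* the HAL may limit registers or use intermediate buffers */
};

/* Apply the rules from the message above to one device description. */
enum dma_outlook dma_outlook_for(const struct dma_caps *caps,
                                 bool memory_above_4gb)
{
    assert(caps != NULL);

    if (!caps->master || !caps->scatter_gather)
        return DMA_MAY_BOUNCE;
    if (memory_above_4gb && !caps->dma64)
        return DMA_MAY_BOUNCE;
    return DMA_DIRECT;
}
```

A 32-bit bus master with scatter/gather comes out as direct on a machine with no memory above 4GB, and as possibly bounced once such memory is present.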

For anyone that’s not up on their DMA concepts (and how many of us
really are, right?) I’d recommend reading the WHDC whitepaper (that I
wrote) on DMA in WDM and KMDF:
http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/dma.doc
– I think you’ll find it worth a quick skim at least because it deals
exactly with the questions you’ve raised.

HTH,

Peter
OSR

Hi Peter,

Since you just admitted to being the author of dma.doc (which I’ve read
multiple times now - thank you for writing it!), I had a question
regarding forcing alignment requirements for a WDM driver. I searched /
reread this doc multiple times and was unable to find anything showing
the appropriate way of forcing alignment in a WDM driver.

I did find some information in the DDK, a section called “Initializing a
Device Object” which describes setting the AlignmentRequirement field of
the device object. Would this be the correct way to accomplish this?
(I ask because I am working on a WDM driver for a PCI card which must
have quadword address alignment or it - the card - goes boom.)

Thanks in Advance!
-Mike


PeterGV (OSR) wrote:

> If you are a bus master AND you support scatter/gather (and you support
> 64 bit addressing when there’s memory on the system over the 4GB mark)
> then the number of map registers returned from IoGetDmaAdapter is set
> according to the max transfer size you indicated on your call.

Ah HAH! I set the MaximumLength to 0x0fffffff, and that /4096 (rounded
up) is the number of map registers it returned, plus one (presumably
for alignment). When I set the MaximumLength to a very large value
(there is no maximum length for the hardware) I get much higher numbers
output.

My device is bus mastering and supports scatter/gather, so on
32bit systems I should be golden. My hardware is also 64bit
capable so with some software work I should be golden there as
well.


Steve Williams

Michael Becker wrote:

> Hi Peter,
>
> I searched / reread this doc multiple times and was unable to find
> anything showing the appropriate way of forcing alignment in a WDM driver.
> I did find some information in the DDK, a section called “Initializing a
> Device Object” which describes setting the AlignmentRequirement field of
> the device object. Would this be the correct way to accomplish this?

Another really good question.

While the full story is more complex, the short answer is that there
really IS no way to “force” an alignment requirement in Windows. But,
YES… you should specify your required alignment in the
AlignmentRequirement field of the Device Object. See the DDK topic
“Initializing a Device Object”.

You can prove this to yourself very simply: Create a driver that does
just about nothing. After you’ve created the device object and attached
it to the underlying device stack, set the alignment requirement to, oh,
something like 0xFF (indicating a 256 byte alignment requirement), and
set neither DO_BUFFERED_IO nor DO_DIRECT_IO in your device object (thus
requesting NEITHER I/O).

Write a user-mode app that sends a byte aligned buffer to your driver
and see what happens.

The answer: Your driver gets the buffer.

Part of the story is that the alignment requirement is typically
advisory, and nothing more. If a user opens your device specifying
NO_INTERMEDIATE_BUFFERING (like for a file in the disk stack), then if
the buffer sent by the user is not aligned the user MAY get back an
error (depending on the file system involved).

Bottom line: For alignment requirement, set the field in the device
object, and then you’re on your own :-)
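
Being "on your own" in practice means checking each buffer address in the driver before handing it to the hardware. A minimal sketch of that check follows, assuming the mask convention used above (alignment in bytes minus one, so 0xFF means 256-byte alignment). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks in the style of AlignmentRequirement: alignment in bytes minus one. */
#define ALIGN_MASK_QUAD 0x7u    /* 8-byte (quadword) alignment */
#define ALIGN_MASK_256  0xFFu   /* 256-byte alignment */

/* True when addr satisfies the alignment described by mask. */
bool is_aligned(uintptr_t addr, uintptr_t mask)
{
    assert((mask & (mask + 1)) == 0);  /* mask must be 2^n - 1 */
    return (addr & mask) == 0;
}
```

A driver whose card cannot tolerate a misaligned address could fail the request, or copy through an aligned buffer of its own, whenever this check returns false.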

I hope that helps,

Peter
OSR

Stephen Williams wrote:

> I am looking at instead locking down memory allocated in user
> mode, using a DeviceIoControl and METHOD_DIRECT, then using
> MapTransfer to get mappings of that memory, page at a time
> (scattered pages are OK for the target board) that the target
> board can use to randomly access those pages.

Now I have another question about the MapTransfer function, and
that is “How are the pages unmapped?” It is possible for my PCI
device (assuming I am not stuck with a trampoline buffer) that
I don’t need to.

I suspect, though, that the canonical way is to use FreeAdapterChannel
to “free” the mappings. My problem is that I’m supporting random
map and unmap of pages. If it really takes a FreeAdapterChannel
to release mapping resources, then that appears to be *global*
(for the adapter channel) and also the documentation implies that
I can have only 1 AdapterChannel for a PCI device, and that implies
that FreeAdapterChannel is global for the device.

This isn’t quite what I want. Better would be to create an AdapterChannel
for each mapped region, and free that AdapterChannel when I want to
clean up just that mapping. Unfortunately, the documentation seems
to imply I can have only one AdapterChannel allocated per PCI bus
mastering DMA device. Eh?


Steve Williams

Stephen Williams wrote:

> This isn’t quite what I want. Better would be to create an AdapterChannel
> for each mapped region, and free that AdapterChannel when I want to
> clean up just that mapping. Unfortunately, the documentation seems
> to imply I can have only one AdapterChannel allocated per PCI bus
> mastering DMA device. Eh?

In the first place, I’d be happier overall (and so would you) if you
were using GetScatterGatherList/PutScatterGatherList. This is just in
general a more modern function and doesn’t have the “one request can be
queued at a time or you blue screen” restriction that
AllocateAdapterChannel has. And it’s got a clearer programming/use model.

However…
You may have multiple DMA Adapters per device. In fact, I typically
call IoGetDmaAdapter once per simultaneously active DMA operation that
my device can support (like, one read and one write).
AllocateAdapterChannel requests to allocate THE (single) channel. You
are correct that there aren’t multiple adapter channels per DMA Adapter.

AllocateAdapterChannel sets up a single operation and “allocates” the
map registers for the operation. You keep these by returning
DeallocateObjectKeepRegisters from your execution routine.

FreeMapRegisters “frees” the map registers you allocated.

FreeAdapterChannel reclaims the object, and starts the queued request,
if there is one. Prior to doing this, it also calls FreeMapRegisters if
you forgot to do so.

For each transfer iteration, you always must:

MapTransfer - This “programs” the map registers
(do your DMA operation)
FlushAdapterBuffers - Flushes any data remaining in the adapter cache.

Sooooo… how many of these operations do you need in progress at a time?

You could always create multiple DMA adapters, AllocateAdapterChannel,
and then KEEP the map registers (don’t call either FreeMapRegisters or
FreeAdapterChannel).

You can then do pairs of MapTransfer (DMA operation here) and then
FlushAdapterBuffers. Repeat.

Call FreeAdapterChannel before unloading.

It’s not pretty, but it should work, and won’t be aberrant assuming
you’re not using bounce buffers for any reason.

P

PeterGV (OSR) wrote:

> You could always create multiple DMA adapters, AllocateAdapterChannel,
> and then KEEP the map registers (don’t call either FreeMapRegisters or
> FreeAdapterChannel).
>
> You can then do pairs of MapTransfer (DMA operation here) and then
> FlushAdapterBuffers. Repeat.
>
> Call FreeAdapterChannel before unloading.
>
> It’s not pretty, but it should work, and won’t be aberrant assuming
> you’re not using bounce buffers for any reason.

Hah! Sounds like just the ticket! It didn’t occur to me to think
that I can have multiple adapters. It makes sense, actually. With this
technique, I can bring up and tear down my mappings (up to 16, each
up to 64Meg) as fluidly as the whims of the application.

The only lingering concern of mine is what happens when (not if) I
start seeing 64bit host machines. I believe if I tell the adapter that
the device is 64bit capable (a true statement, after a bit of firmware
work) I’ll also be safe from evil trampoline buffers.


Steve Williams

Stephen Williams wrote:

> Hah! Sounds like just the ticket! It didn’t occur to me to think
> that I can have multiple adapters. It makes sense, actually. With this
> technique, I can bring up and tear down my mappings (up to 16, each
> up to 64Meg) as fluidly as the whims of the application.
>
> The only lingering concern of mine is what happens when (not if) I
> start seeing 64bit host machines. I believe if I tell the adapter that
> the device is 64bit capable (a true statement, after a bit of firmware
> work) I’ll also be safe from evil trampoline buffers.

Exactly correct.

Using multiple adapters when your device is capable of multiple
simultaneous DMA operations actually makes good sense. You’re properly
describing to the HAL the extent of your DMA capabilities. One adapter
equals one “simultaneous” DMA transfer.

I don’t know if you’re planning to use the solution I outlined of
calling GetDmaAdapter, keeping the mapping registers for your driver’s
lifetime, and then calling MapTransfer/FlushAdapterBuffers, but that
solution has the caveat that “this only makes good engineering sense if
your adapter can reach all of physical memory without the assistance of
the HAL.” IOW, your adapter needs to be 64-bit capable on any machine with
4GB or more of physical memory, it must support scatter/gather in
hardware, and it must indicate both of these things in the
DEVICE_DESCRIPTION data structure.

PeterGV (OSR) wrote:

> Exactly correct.
>
> Using multiple adapters when your device is capable of multiple
> simultaneous DMA operations actually makes good sense. You’re properly
> describing to the HAL the extent of your DMA capabilities. One adapter
> equals one “simultaneous” DMA transfer.

Seems obvious once a few live brain cells are assigned to the concept.

> I don’t know if you’re planning to use the solution I outlined of
> calling GetDmaAdapter, keeping the mapping registers for your driver’s
> lifetime, and then calling MapTransfer/FlushAdapterBuffers

Calling GetDmaAdapter and using FlushAdapterBuffers was my intention,
given current understanding. It’s easy for my driver to frame the
map and unmap with Get… and Flush…

> has the caveat that “this only makes good engineering sense if your
> adapter can reach all of physical memory without the assistance of the
> HAL.” IOW, your adapter needs to be 64-bit capable on any machine with
> 4GB or more of physical memory, it must support scatter/gather in
> hardware, and it must indicate both of these things in the
> DEVICE_DESCRIPTION data structure.

All of that is (mostly) so. I have a little firmware work to do on
the card itself to manage 64bit addresses, but the PCI bridge that the
card is using does support 64bit, and I understand what to do in the
DEVICE_DESCRIPTION (set a single extra flag) when I’m ready.

Thanks again,

Steve Williams

Stephen Williams wrote:

> Calling GetDmaAdapter and using FlushAdapterBuffers was my intention,
> given current understanding. It’s easy for my driver to frame the
> map and unmap with Get… and Flush…

OK… sorry… I mis-typed above, and I’m afraid I’ve confused things.
Let me clarify again, if only for the archive:

IoGetDmaAdapter – As many times as you have simultaneous operations.

Then:

  1. AllocateAdapterChannel (allocates adapter object and map registers)

  2. execution routine callback

  3. within execution routine callback MapTransfer

  4. execution routine callback returns “DeallocateObjectKeepRegisters”
    (returning the adapter object)

  5. DMA completes (typically DpcForIsr)

  6. In DpcForIsr: FlushAdapterBuffers

  7. In DpcForIsr: FreeMapRegisters

  8. In DpcForIsr: FreeAdapterChannel (starts pending request, if there is
    one, resulting in callback to execution routine).

The standard DMA model would have you execute steps 1 through 8 for each
packet-based I/O request. You may do this in parallel as many times as
you have DMA Adapters. But BE CAREFUL: Note that there’s a single queue
in your device object that’s used if an adapter is not available. Thus,
you really, really, want to use the new DMA model and not this old-style
stuff if at all possible.

My point earlier – in addition to the fact that you can allocate
multiple DMA adapters – was that you could do steps 1-4 once in your
driver during initialization, then call MapTransfer and
FlushAdapterBuffers to process each request, and then steps 7 and 8
during unload. My caveat (that this only worked for bus-master DMA
controllers that support h/w scatter/gather and can reach all of
physical memory without the HAL’s help) applied to this “modified”
scheme. This scheme is not a good idea for general use, but can be
useful under limited circumstances if you know what you’re doing.
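
The call ordering of the "modified" scheme can be written down as a toy state machine. This is a user-mode model for checking sequences, not driver code: steps 1 through 4 are collapsed into one event, steps 7 and 8 into another, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dma_state {
    DMA_IDLE,       /* no adapter channel allocated */
    DMA_REGS_KEPT,  /* steps 1-4 done, map registers held */
    DMA_MAPPED      /* MapTransfer done, a transfer may be in flight */
};

enum dma_call {
    CALL_ALLOCATE_AND_KEEP,          /* steps 1-4, once at initialization */
    CALL_MAP_TRANSFER,               /* per request */
    CALL_FLUSH_ADAPTER_BUFFERS,      /* per request, after the DMA completes */
    CALL_FREE_REGISTERS_AND_CHANNEL  /* steps 7-8, once at unload */
};

/* Apply one call; returns false and leaves the state alone if out of order. */
bool dma_step(enum dma_state *state, enum dma_call call)
{
    assert(state != NULL);

    switch (call) {
    case CALL_ALLOCATE_AND_KEEP:
        if (*state != DMA_IDLE) return false;
        *state = DMA_REGS_KEPT;
        return true;
    case CALL_MAP_TRANSFER:
        if (*state != DMA_REGS_KEPT) return false;
        *state = DMA_MAPPED;
        return true;
    case CALL_FLUSH_ADAPTER_BUFFERS:
        if (*state != DMA_MAPPED) return false;
        *state = DMA_REGS_KEPT;
        return true;
    case CALL_FREE_REGISTERS_AND_CHANNEL:
        if (*state != DMA_REGS_KEPT) return false;
        *state = DMA_IDLE;
        return true;
    }
    return false;
}
```

One such state would exist per DMA adapter, so each mapping can cycle through MapTransfer and FlushAdapterBuffers independently of the others.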

Sorry for being confusing,

Peter
OSR

> Now I have another question about the MapTransfer function, and
> that is “How are the pages unmapped?” It is possible for my PCI
> device (assuming I am not stuck with a trampoline buffer) that
> I don’t need to.
>
> I suspect, though, that the canonical way is to use FreeAdapterChannel
> to “free” the mappings. My problem is that I’m supporting random

No, FlushAdapterBuffers is the undo for MapTransfer.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com