1G DMA buffer under XP - how?

Hi!

I need a 1G DMA buffer for our PCI card. It features scatter DMA*, but
supports 32-bit addressing only.
What is the preferred method to allocate such a huge DMA buffer?
MSDN says that under XP, MmAllocateContiguousMemory can allocate more
than 1G. But what are the actual limits of AllocateCommonBuffer, and
how big is the largest buffer that can be passed to a kernel driver via
direct I/O? (METHOD_IN_BUFFERED)

*: our scatter DMA is a little special. The pages are linked together
via their last DWORD, which points to the next page.
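To illustrate, this is roughly what one page looks like to the card; the struct is only my sketch of the layout, assuming standard 4K pages:

```c
/* My sketch only: one page as the card sees it, assuming 4K x86 pages. */
#include <ntddk.h>

#define DMA_PAGE_SIZE   4096
#define DMA_LINK_OFFSET (DMA_PAGE_SIZE - sizeof(ULONG))

typedef struct _DMA_PAGE {
    UCHAR Data[DMA_LINK_OFFSET];  /* payload the DMA engine streams out        */
    ULONG NextPagePhys;           /* 32-bit physical address of the next page  */
} DMA_PAGE, *PDMA_PAGE;

C_ASSERT(sizeof(DMA_PAGE) == DMA_PAGE_SIZE);
```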

Thanks for your help.


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

I don’t like your chances. Much of the top 2GB of the virtual address range
is already reserved for kernel use. The system can restrict itself to the top
1GB, but that is (possibly) only to provide a 3GB user virtual address range.

If your card supports scatter DMA then you don’t need to allocate physically
contiguous memory.

METHOD_IN_BUFFERED is not direct I/O. METHOD_IN_DIRECT or METHOD_NEITHER
would be needed for large I/O transfers.
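Something along these lines, with all the names invented, using the standard CTL_CODE macro:

```c
/* Hypothetical control codes (names and function numbers made up).
 * Buffered I/O copies through the nonpaged system buffer; the direct
 * methods make the I/O manager build an MDL for the caller's buffer,
 * which is what large transfers want. */
#include <winioctl.h>            /* CTL_CODE; in the driver itself use ntddk.h */

#define FILE_DEVICE_MYCARD 0x8000

#define IOCTL_MYCARD_SMALL_CONFIG \
    CTL_CODE(FILE_DEVICE_MYCARD, 0x800, METHOD_BUFFERED,  FILE_ANY_ACCESS)

#define IOCTL_MYCARD_BIG_XFER \
    CTL_CODE(FILE_DEVICE_MYCARD, 0x801, METHOD_IN_DIRECT, FILE_ANY_ACCESS)

#define IOCTL_MYCARD_REGISTER_BUF \
    CTL_CODE(FILE_DEVICE_MYCARD, 0x802, METHOD_NEITHER,   FILE_ANY_ACCESS)
```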

ned


Direct I/O is limited to ~64M. However, it is unclear from the OP’s post why
he thinks he needs to allocate a 1GB buffer. I think he has a
misunderstanding about his device’s DMA capabilities. If it does SG then he
certainly doesn’t need contiguous memory.

I assume what he really needs to do is to perform DMA transfers from/to a
huge, virtually contiguous user-mode buffer. Supposedly (as in, I’ve never
actually done it) one can use METHOD_NEITHER to communicate from user mode
to the driver, and then the driver can construct MDLs to create 64M windows
on the 1GB user VA.
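Roughly like this, sketched from memory and untested; WINDOW_SIZE, the function name and its parameters are all made up, and cleanup of already-locked MDLs on failure is omitted for brevity:

```c
/* In the METHOD_NEITHER handler, running in the requesting process's
 * context at PASSIVE_LEVEL, carve the user VA into 64M windows and lock
 * each window with its own MDL. */
#define WINDOW_SIZE (64 * 1024 * 1024)

NTSTATUS
LockUserBufferInWindows(
    PUCHAR UserVa,       /* base of the (1GB) user-mode buffer          */
    SIZE_T TotalLength,
    PMDL   *MdlArray,    /* caller-supplied array, one entry per window */
    ULONG  *MdlCount
    )
{
    SIZE_T offset = 0;
    ULONG  i = 0;

    while (offset < TotalLength) {
        SIZE_T remaining = TotalLength - offset;
        ULONG  chunk = (ULONG)((remaining > WINDOW_SIZE) ? WINDOW_SIZE : remaining);
        PMDL   mdl = IoAllocateMdl(UserVa + offset, chunk, FALSE, FALSE, NULL);

        if (mdl == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        __try {
            /* IoReadAccess: the device will read (DMA out of) this memory. */
            MmProbeAndLockPages(mdl, UserMode, IoReadAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(mdl);
            return GetExceptionCode();
        }

        MdlArray[i++] = mdl;
        offset += chunk;
    }

    *MdlCount = i;
    return STATUS_SUCCESS;
}
```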


Presumably the 64MB limit is on account of the I/O manager creating an MDL
for direct I/O use. With METHOD_NEITHER, which I have used, the 64MB limit is
not there if the MDL memory is allocated first. If 1GB of user memory is
involved then either I/O is not time critical, in which case the DMA transfer
could be done in smaller segments, OR if I/O IS time critical he will have no
chance of locking down all of the memory for a single I/O transfer,
regardless of whether it is described by a single MDL or multiple MDLs.
He’ll have to be a little more descriptive about what the aim of the
exercise is.

ned


Hi guys!

Thanks for the answers.
I badly need the 1G buffer, but it needn’t be contiguous, because
the card has a scatter-DMA-like feature.
But unfortunately, the size of a buffer described by an MDL structure
is limited to 64M; no MDL can be larger than that.
So direct I/O (METHOD_IN_DIRECT of course, thanks for the correction)
can deal with buffers up to 64M.
The solution may be to use 16 separate buffers of at most 64M each.
They can be passed to the driver in series; the driver then links them
together and starts the DMA. I don’t know if this will work, there may be
hidden (at least for me) limits.
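On the user-mode side I imagine something like the following (my untested sketch; IOCTL_MYCARD_BIG_XFER is just a made-up METHOD_IN_DIRECT code, and the driver would have to pend each request so the pages stay locked while the DMA runs):

```c
#include <windows.h>
#include <winioctl.h>

#define FILE_DEVICE_MYCARD 0x8000
#define IOCTL_MYCARD_BIG_XFER \
    CTL_CODE(FILE_DEVICE_MYCARD, 0x801, METHOD_IN_DIRECT, FILE_ANY_ACCESS)

#define CHUNK_SIZE  (64 * 1024 * 1024)
#define CHUNK_COUNT 16

BOOL RegisterChunks(HANDLE hDevice, PVOID chunks[CHUNK_COUNT])
{
    /* The OVERLAPPEDs must stay valid while the requests are pending
     * (i.e. while the DMA runs), hence static in this sketch. */
    static OVERLAPPED ov[CHUNK_COUNT];
    DWORD bytes;
    int   i;

    for (i = 0; i < CHUNK_COUNT; i++) {
        /* Page-aligned, zero-filled memory for one 64M piece of the ring. */
        chunks[i] = VirtualAlloc(NULL, CHUNK_SIZE,
                                 MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (chunks[i] == NULL) {
            return FALSE;
        }

        ov[i].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        /* With METHOD_IN_DIRECT the I/O manager builds and locks an MDL for
         * the output buffer before the driver ever sees the request.
         * hDevice is assumed to be opened with FILE_FLAG_OVERLAPPED, and the
         * driver is assumed to pend these IRPs until the DMA is stopped. */
        if (!DeviceIoControl(hDevice, IOCTL_MYCARD_BIG_XFER,
                             NULL, 0,
                             chunks[i], CHUNK_SIZE,
                             &bytes, &ov[i])
            && GetLastError() != ERROR_IO_PENDING) {
            return FALSE;
        }
    }
    return TRUE;
}
```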

The good news is that the buffer needn’t be mapped entirely into the system
virtual address space. It can be mapped in parts (to link the pages
together), or not at all. I would be very happy if only one page could be
mapped at a time. The user program could do the linking too, but that is a
very ugly and insecure method.

I write about linking the pages because my card does not feature real
scatter DMA, only a special solution similar to it.
The pages are linked together via their last DWORD, which is a pointer to
the next page. Only 32-bit addressing is implemented, so only the low
4G is accessible to the DMA engine. This is hopefully not a problem for
me, because we’ll never need to deal with buffers larger than 1G.

The DMA will sweep through the buffer at 40 megabytes/sec, or maybe
faster in the future. When it reaches the end of the buffer, it should
restart at the beginning, continuously, always using the same data.
Only the operator can stop the operation.
Because of the high speed, the entire buffer must be locked into memory
before the DMA starts; the PCI bus speed does not allow the buffer to be
swapped out, etc.
The DMA may be done in parts, the next buffer starting to be processed
when the first is done, but all of it must be locked into memory.

This is the problem; I hope you can understand my bad English :-)
I will be happy about any idea or information.
The only options I know of so far are to use MmAllocateContiguousMemory,
which is documented to be able to allocate over 1G when running under XP,
or 16 separate buffers of at most 64M each, processed in series.
But the professionals have not yet confirmed that the second method will
surely work.
I’m not familiar with Windows kernel driver development; I have only written
simple drivers for PCI I/O cards so far.
The first method seems much simpler to me, however I know that it’s
very ugly, creates a lot of overhead, etc.
Also, the memory must be allocated at driver initialization time, because
later, when system memory gets fragmented and other buffers are locked
into memory, I won’t get that much contiguous space.

Thanks for your help; please let me know if you have any ideas.


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

Thus spake Mark Roddy :

> Direct I/O is limited to ~64M. However, it is unclear from the OP’s post why
> he thinks he needs to allocate a 1GB buffer. I think he has a
> misunderstanding about his device’s DMA capabilities. If it does SG then he
> certainly doesn’t need contiguous memory.

The 1G buffer is absolutely required, but it needn’t be contiguous.
I understand the DMA capabilities, because I developed the card too :-)

> I assume what he really needs to do is to perform DMA transfers from/to a
> huge, virtually contiguous user-mode buffer. Supposedly (as in, I’ve never
> actually done it) one can use METHOD_NEITHER to communicate from user mode
> to the driver, and then the driver can construct MDLs to create 64M windows
> on the 1GB user VA.

This sounds good! But METHOD_NEITHER is not of much use, is it?
I could pass the buffer virtual address and size in a METHOD_BUFFERED request
too. The problem is that I cannot deal with user-mode virtual addresses from
kernel-mode code. How can I build an MDL for the buffer, or for a part of
it? And how can it be mapped into the system virtual address space?
Thanks for the idea, I’ll dig into the DDK docs…
If somebody has time to explain this a little more, please do!
A source example etc. would be nice. Thanks!


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“We all live in a yellow subroutine”


VF… You really need to step back for a minute and take some time to
learn the basics about how to write a driver for Windows. I mean no
offense by this, but seriously, you have little chance of succeeding in
writing a reliable driver without more background. At least take the
time to read a couple of good books on the topic. Really. Your device
has some interesting challenges, and without understanding the bigger
picture, your driver is going to have some equally interesting problems!

For example: What are you going to do about 32-bit x86 systems that
happen to have 4GB or more of physical memory installed? Surely you’re
going to say “well, I won’t support those”… but will your driver
actually CHECK for this when it’s loaded and refuse to load as a result?
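Just as one illustrative sketch of mine (checking the locked pages rather than the installed RAM, and certainly not the only approach):

```c
/* After MmProbeAndLockPages succeeds, walk the MDL's PFN array and reject
 * the buffer if any locked page sits at or above 4GB, which a 32-bit-only
 * DMA engine cannot reach. Relevant on PAE systems with > 4GB of RAM. */
BOOLEAN
AllPagesBelow4G(PMDL Mdl)
{
    PPFN_NUMBER pfns  = MmGetMdlPfnArray(Mdl);
    ULONG       pages = (ULONG)ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                                   MmGetMdlVirtualAddress(Mdl),
                                   MmGetMdlByteCount(Mdl));
    ULONG       i;

    for (i = 0; i < pages; i++) {
        if (((ULONGLONG)pfns[i] << PAGE_SHIFT) >= 0x100000000ULL) {
            return FALSE;   /* page not addressable by the card */
        }
    }
    return TRUE;
}
```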

You need to do some upfront design: Who allocates the buffer, for
example? Can the user-mode program allocate the buffer?

Whether or not you can use the user virtual address that’s passed into
your driver via METHOD_NEITHER depends on the context in which your
driver will be called. Assuming you’re called in the context of the
requesting process, given the user VA, you can build an MDL of whatever
size you want with IoAllocateMdl and MmProbeAndLockPages. On initial
setup of the buffer list, you can map the buffer described by the MDL
into kernel virtual address space using MmMapLockedPagesSpecifyCache in
order to set up the linkage between the pages.
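Here is the general shape of that sequence, as an untested sketch only. It assumes the MDL is already locked, the buffer is page aligned, every locked page lies below 4GB, and the function and parameter names are invented:

```c
NTSTATUS
LinkPagesInMdl(
    PMDL  Mdl,                  /* one locked chunk (<= 64M) of the ring      */
    ULONG FirstPagePhysOfNext   /* 32-bit phys addr the last page links to:
                                   first page of the next chunk, or of the
                                   first chunk to close the ring             */
    )
{
    PPFN_NUMBER pfns;
    PUCHAR      sysVa;
    ULONG       pageCount, i;

    /* Map the whole chunk into system VA so we can touch each page. */
    sysVa = MmMapLockedPagesSpecifyCache(Mdl, KernelMode, MmCached,
                                         NULL, FALSE, NormalPagePriority);
    if (sysVa == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    pfns      = MmGetMdlPfnArray(Mdl);
    pageCount = (ULONG)ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                           MmGetMdlVirtualAddress(Mdl),
                           MmGetMdlByteCount(Mdl));

    for (i = 0; i < pageCount; i++) {
        ULONG nextPhys;

        if (i + 1 < pageCount) {
            /* Next page within this chunk. */
            nextPhys = (ULONG)((ULONGLONG)pfns[i + 1] << PAGE_SHIFT);
        } else {
            /* Last page of this chunk: chain to the next chunk. */
            nextPhys = FirstPagePhysOfNext;
        }

        /* The card reads the last DWORD of each page as the link. */
        *(ULONG *)(sysVa + i * PAGE_SIZE + PAGE_SIZE - sizeof(ULONG)) = nextPhys;
    }

    MmUnmapLockedPages(sysVa, Mdl);
    return STATUS_SUCCESS;
}
```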

But, in all this, consider the fact that you’re going to attempt to pin
1GB of physical memory. I’m sure hoping that this system has LOTS of
additional physical memory on it.

What I’ve described to you here are basic principles of Windows driver
development, not esoteric advanced knowledge. So, once again, I most
strongly suggest that you step back and do some reading first. You’ll
save time in the long run, and the result will be a more reliable driver
in the end.

Peter
OSR

You do not say whether the DMA at 40MB per second must scan through all
of the memory without delays. If so, then allocating 1G of virtual memory
in user space, initialising it, and then passing multiple I/O requests to
the kernel driver will not work. For DMA to be continuously active over
the 1G, all of the memory must be locked down. This is not going to
succeed, regardless of whether it is described by 1 MDL or 16 MDLs.

If things are not time critical then you have more freedom.

If I/O IS time critical I suggest you give up on using NT memory or even NT.
One solution is to reserve memory outside NT by using the boot.ini MAXMEM
switch, so that NT uses a defined amount of memory for itself and the
remainder can be claimed by a driver you write, similar to some ramdisk
implementations.
The memory would be physically contiguous and can be mapped into a user
process space, provided the process you are mapping this memory into has a
large enough hole in its 2G address range. That is not guaranteed, on
account of various DLL load addresses you may not have control over.
Check the DDK mapmem sample driver.
One thing you should be careful of is shared VGA memory, where some
cheap motherboards reserve the top part of your RAM for VGA use.
Your driver must detect this by parsing the resource usage info in the
registry.
Unfortunately I’ve never been able to determine the unused memory range
using NT, so you may have to do raw pointer access to non-NT memory
(after mapping bits of it into kernel virtual memory). If you touch memory
that other hardware (VGA) may have reserved, you may lock up the box.
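Rough, untested sketch of the idea (loosely like the mapmem sample); the numbers are examples only, assuming /MAXMEM=512 on a 1.5GB box:

```c
/* With /MAXMEM=512 in boot.ini, physical memory above 512MB is never used by
 * NT and a driver can map pieces of it. A real driver must first verify that
 * nothing else (e.g. shared VGA memory) claims the range. */
#define RESERVED_PHYS_BASE  (512ULL * 1024 * 1024)   /* first byte past MAXMEM */
#define MAP_CHUNK_SIZE      (4 * 1024 * 1024)        /* map it a piece at a time */

PVOID
MapReservedChunk(ULONGLONG Offset)
{
    PHYSICAL_ADDRESS pa;

    pa.QuadPart = RESERVED_PHYS_BASE + Offset;

    /* Ordinary RAM on x86 PCI is cache coherent, so MmCached is fine here.
     * Returns NULL on failure; must be balanced with MmUnmapIoSpace. */
    return MmMapIoSpace(pa, MAP_CHUNK_SIZE, MmCached);
}
```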

ned
