Hi guys!
Thanks for the answers.
I really do need the 1G buffer, but it needn't be contiguous, because
the card has a scatter-DMA-like feature.
Unfortunately, the size of a buffer described by a single MDL structure
is limited to 64M; no MDL can be larger than that.
So direct I/O (METHOD_IN_DIRECT, of course — thanks for the correction)
can only deal with buffers up to 64M.
One solution may be to use 16 separate buffers of at most 64M each.
They can be passed to the driver in series; the driver then links them
together and starts the DMA. I don't know whether this will work — there
may be limits hidden (at least from me).
The good news is that the buffer needn't be mapped entirely into system
virtual address space. It can be mapped in parts (to link the pages
together), or not at all; I would be perfectly happy if only one page
could be mapped at a time. The user program could do the linking too,
but that would be a very ugly and insecure method.
I write about linking the pages because my card does not feature real
scatter DMA, only a special scheme similar to it: the pages are linked
together through their last DWORD, which is a pointer to the next page.
So only 32-bit addressing is implemented, and only the low 4G is
accessible to the DMA engine. This is hopefully not a problem for me,
because we'll never need to deal with buffers larger than 1G.
The DMA will sweep through the buffer at 40 megabytes/sec, maybe
faster in the future. When it reaches the end of the buffer, it should
restart at the beginning, continuously, always using the same data.
Only the operator can stop the operation.
Because of the high speed, the entire buffer must be locked into memory
before the DMA starts; the PCI bus speed leaves no room for the buffer
to be swapped out, etc.
The DMA may be done in parts — the next buffer may start processing when
the first is done — but all of them must stay locked in memory.
This is the problem; I hope you can understand my bad English.
I will be happy about any idea or information.
The only options I know of so far are to use MmAllocateContiguousMemory,
which is documented to be able to allocate over 1G under XP,
or to process 16 separate buffers of at most 64M each in series.
But the professionals have not yet confirmed that the second method
surely works.
I'm not familiar with Windows kernel driver development; I have only
written simple drivers for PCI I/O cards so far.
The first method seems much simpler to me, though I know it's very ugly,
causes a lot of overhead, etc.
Also, the memory would have to be allocated at driver initialization
time, because later, when system memory gets fragmented and other
buffers are locked into memory, I won't get that much contiguous space.
Thanks for your help; please let me know if you have any ideas.
--
Valenta Ferenc — Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”