map registers

Hi!

I’m still working on our cassette recorder project… :-(
As I wrote a few months ago, we need a huge DMA buffer, at least 1G.
It needn’t be contiguous; it may be fragmented, since our PCI card supports
scatter/gather busmaster DMA.
But the only ways I know so far to allocate such a huge DMA buffer are
MmAllocateContiguousMemorySpecifyCache and AllocateCommonBuffer.
(Both of these basically work fine; only the size limit is the problem.)
The Mdl structure itself, and all functions based on Mdls, are
limited to about 64M, which is far from enough.
Look at this URL:

http://www.microsoft.com/whdc/driver/tips/km-basics.mspx#EEAA

—>
MmAllocateContiguousMemory first attempts to allocate memory from the nonpaged pool. If a driver requests more contiguous memory than is currently available in the nonpaged pool, MmAllocateContiguousMemory instead sometimes tries to find a consecutive run of physical pages that satisfy the driver’s request. If so, it maps the pages using system page-table entries (PTEs) and returns this mapping to the caller.

Whether MmAllocateContiguousMemory searches for physical pages depends on several constraints, which might change in the future. For example, currently it searches only if the caller is running at APC_LEVEL or lower, because the memory manager must acquire synchronization at APC_LEVEL before searching.

The number of system PTEs varies in different releases of the operating system. It can range up to 460 MB on Windows 2000 and might exceed 1 GB on Windows XP and later versions. Consequently, a driver running under Windows XP or Windows Server™ 2003 might be able to allocate significantly more memory than under Windows 2000, even on the same physical hardware.
—>

The problem:
I never managed to allocate 1G in one block. My allocator function tries
to allocate the entire buffer; if that fails, it tries half the size, and
so on, until the requested amount is reached or the size drops below 64M.
It can allocate almost 1G, but only in 4…5 chunks. (The machine has
2G of system memory.) The largest chunk is 384 or 512M, I don’t remember.
The allocation takes a couple of minutes, usually more than 10 min, and the
mouse freezes meanwhile.
What is wrong with this?
My driver is an ordinary kernel driver for a PCI card. What IRQL is it
running at? Maybe not at APC_LEVEL or below?
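
For reference, here is a minimal sketch of the back-off strategy described above; the CHUNK bookkeeping, the chunk limit, and the 32-bit address ceiling are my assumptions, not taken from the actual driver.

    #include <ntddk.h>

    #define MAX_CHUNKS     32
    #define MIN_CHUNK_SIZE (64 * 1024 * 1024)   /* stop halving below 64M */

    typedef struct _CHUNK {
        PVOID            Va;    /* kernel VA of the chunk (each chunk is contiguous) */
        PHYSICAL_ADDRESS Pa;    /* physical address for the card's buffer chain      */
        SIZE_T           Size;  /* chunk size in bytes                               */
    } CHUNK;

    /* Try to cover Total bytes with contiguous chunks, halving the request
     * whenever an allocation fails.  Returns the number of bytes obtained. */
    static SIZE_T AllocateFragmentedBuffer(SIZE_T Total, CHUNK Chunks[], ULONG *Count)
    {
        PHYSICAL_ADDRESS low, high, skip;
        SIZE_T remaining = Total, trySize = Total;

        low.QuadPart  = 0;
        high.QuadPart = 0xFFFFFFFF;     /* assume a 32-bit PCI bus master */
        skip.QuadPart = 0;
        *Count = 0;

        while (remaining > 0 && trySize >= MIN_CHUNK_SIZE && *Count < MAX_CHUNKS) {
            SIZE_T want = (trySize < remaining) ? trySize : remaining;
            PVOID  va   = MmAllocateContiguousMemorySpecifyCache(want, low, high,
                                                                 skip, MmCached);
            if (va == NULL) {
                trySize /= 2;           /* back off and retry with half the size */
                continue;
            }
            Chunks[*Count].Va   = va;
            Chunks[*Count].Pa   = MmGetPhysicalAddress(va);
            Chunks[*Count].Size = want;
            (*Count)++;
            remaining -= want;
        }
        return Total - remaining;
    }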

I tried changing the NonpagedPoolSize registry entry. MS says that at
most 80% of system RAM is allowed here. I set it to 1G, then
1.5G, with no noticeable effect; the 1G allocation still fails.
Do you have any idea how to make this work?
Which version of WinXP should be used, which installation options, registry
settings, etc.? I need only a little more memory: 1G or maybe 1.1G in total
will do fine. And I would be very happy if the allocation ran faster.

My driver got only 2 map registers when calling IoGetDmaAdapter.
Maybe this is not enough? How can I make Windows give me more map registers?
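
For context, the map register count falls out of IoGetDmaAdapter; a minimal sketch of the call (the DEVICE_DESCRIPTION values below are assumptions about the card, not facts from the original post):

    #include <ntddk.h>

    /* Ask the HAL for a DMA adapter; NumberOfMapRegisters comes back with the
     * number of map registers the HAL will grant per transfer for this
     * device description. */
    static PDMA_ADAPTER GetAdapter(PDEVICE_OBJECT Pdo, ULONG *NumberOfMapRegisters)
    {
        DEVICE_DESCRIPTION dd;

        RtlZeroMemory(&dd, sizeof(dd));
        dd.Version           = DEVICE_DESCRIPTION_VERSION;
        dd.Master            = TRUE;       /* bus-master DMA                     */
        dd.ScatterGather     = TRUE;       /* device can follow chained buffers  */
        dd.Dma32BitAddresses = TRUE;       /* card decodes 32-bit addresses      */
        dd.InterfaceType     = PCIBus;
        dd.MaximumLength     = 0x100000;   /* assumed largest single transfer    */

        return IoGetDmaAdapter(Pdo, &dd, NumberOfMapRegisters);
    }

The granted count is usually derived from MaximumLength (roughly one map register per page of the largest transfer, plus one), so a very small MaximumLength in the device description is one common reason for getting only a couple of map registers.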

Or, if this won’t work at all, I need a completely different solution.
For example: hide 1.5G of the total 2G of memory from Windows.
How can this be done, and then how can I access the hidden memory?

Thanks in advance.

Ps: Happy New Year!


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

xxxxx@lists.osr.com wrote on 01/05/2005 12:00:00 PM:

> Hi!
>
> I’m still working on our casette recorder project… :-(
> As I wrote a few months ago, we need a huge DMA buffer, at least 1G.
> It needn’t to be contiguous, may be fragmented, our PCI card supports
> scatter/gather busmaster DMA.
> But the only way I know so far to allocate such a huge DMA buffer, is
> the MmAllocateContiguousMemorySpecifyCache or AllocateCommonBuffer.
> (Both of these works fine basically, only the amount limit is the
> problem)
> The Mdl structure itself, and all functions that based on Mdl’s, are
> limited to about 64M, it’s far not enough.

I suppose it’s a lack of imagination on my part, but I’m having trouble
figuring out why something as slow as a cassette tape would need anything
like that much memory, since that kind of memory allocation suggests a
very high bandwidth requirement, and I just don’t see a cassette having
that much bandwidth.

I found your original posts from a couple of months back. Is this the
“almost” SG busmaster that uses a linked list of buffer segments instead
of a real SG implementation, and that’s further screwy because you stated
that it’s used as a circular buffer?

You stated a requirement of only 40 MB/sec in October. I think you can
easily achieve that with a smaller circular buffer than 1GB.

Phil

Philip D. Barila Windows DDK MVP
Seagate Technology, LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.

Thus spake Philip D Barila :

> I suppose it’s a lack of imagination on my part, but I’m having trouble
> figuring out how something as slow as a cassette tape needing anything
> like that much memory, since that kind of memory allocation suggests a
> very high bandwidth requirement, and I just don’t see a cassette having
> that much bandwidth.

It runs at 128x speed and records both sides at the same time.
The required bandwidth is about 40 megabytes/sec.
I don’t think the data can be simultaneously read from the HDD and
written to the PCI card, because the IDE controller is generally
connected to the PCI bus too, and the two streams together are 80M/s.
Therefore, I need a fully memory-locked buffer, because not even part of
it can be swapped out to disk…

> I found your original posts from a couple of months back, is this the
> “almost” SG busmaster that uses a linked list of buffer segments instead
> of a real SG implementation, and it’s further screwy because you stated
> that it’s used as a circular buffer?

Yes, because of the limited resources of the CPLD circuit used.
But a real scatter/gather implementation wouldn’t help a bit.
One SG list can’t be bigger than the maximum size of an Mdl, which is
only 64 megabytes :-(
The buffer has to be played repeatedly, 80x…100x, before new
data arrives.

> You stated a requirement of only 40 MB/sec in October. I think you can
> easily achieve that with a smaller circular buffer than 1GB.

How?

> Philip D. Barila Windows DDK MVP


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

Nobody uses such huge buffers.

What is usually done is:

  • the next MJ_WRITE IRP that arrives at the device is passed to
    GetScatterGatherList, and its SGL is appended to the tail of the “huge SGL”.
  • it is this “huge SGL” over which the device’s DMA runs.
  • IRPs are completed once the device’s DMA has passed all SGL entries for a
    particular IRP.
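
A rough sketch of that flow, assuming hypothetical per-device state and a placeholder HwAppendDescriptor() that programs one address/length pair into the card’s chained buffer list; initialization and completion from the card’s interrupt DPC are omitted, and none of this is Maxim’s actual code.

    #include <ntddk.h>

    typedef struct _DEVICE_EXTENSION {      /* hypothetical per-device state */
        PDMA_ADAPTER DmaAdapter;
        LIST_ENTRY   PendingIrps;           /* IRPs whose SGL is on the ring */
        KSPIN_LOCK   Lock;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    /* Placeholder: program one address/length pair into the card's chained
     * buffer list.  Entirely hardware-specific. */
    static VOID HwAppendDescriptor(PDEVICE_EXTENSION DevExt,
                                   PHYSICAL_ADDRESS Pa, ULONG Length)
    {
        UNREFERENCED_PARAMETER(DevExt);
        UNREFERENCED_PARAMETER(Pa);
        UNREFERENCED_PARAMETER(Length);
    }

    /* HAL callback: append this write's SG elements to the "huge SGL" and
     * park the IRP until the DMA pointer has passed all of its elements. */
    static VOID WriteListControl(PDEVICE_OBJECT DeviceObject, PIRP Irp,
                                 PSCATTER_GATHER_LIST Sgl, PVOID Context)
    {
        PDEVICE_EXTENSION devExt   = (PDEVICE_EXTENSION)DeviceObject->DeviceExtension;
        PIRP              writeIrp = (PIRP)Context;    /* IRP passed as context below */
        ULONG             i;

        UNREFERENCED_PARAMETER(Irp);
        for (i = 0; i < Sgl->NumberOfElements; i++)
            HwAppendDescriptor(devExt, Sgl->Elements[i].Address, Sgl->Elements[i].Length);

        writeIrp->Tail.Overlay.DriverContext[0] = Sgl;  /* keep for PutScatterGatherList */
        ExInterlockedInsertTailList(&devExt->PendingIrps,
                                    &writeIrp->Tail.Overlay.ListEntry, &devExt->Lock);
        /* Completion happens later, e.g. in the DPC for the card's progress
         * interrupt, once the DMA has consumed every element of this IRP. */
    }

    NTSTATUS DispatchWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        PDEVICE_EXTENSION  devExt = (PDEVICE_EXTENSION)DeviceObject->DeviceExtension;
        PIO_STACK_LOCATION sp     = IoGetCurrentIrpStackLocation(Irp);
        KIRQL              oldIrql;
        NTSTATUS           status;

        IoMarkIrpPending(Irp);
        KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);  /* GetScatterGatherList wants DISPATCH_LEVEL */
        status = devExt->DmaAdapter->DmaOperations->GetScatterGatherList(
                     devExt->DmaAdapter, DeviceObject, Irp->MdlAddress,
                     MmGetMdlVirtualAddress(Irp->MdlAddress),
                     sp->Parameters.Write.Length,
                     WriteListControl, Irp /* context */, TRUE /* write to device */);
        KeLowerIrql(oldIrql);

        if (!NT_SUCCESS(status)) {              /* never queued; fail the IRP here */
            Irp->IoStatus.Status = status;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
        }
        return STATUS_PENDING;                  /* IRP was marked pending above */
    }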

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com


xxxxx@lists.osr.com wrote on 01/06/2005 12:03:54 AM:

> Thus spake Philip D Barila :
>
> > I suppose it’s a lack of imagination on my part, but I’m having trouble
> > figuring out how something as slow as a cassette tape needing anything
> > like that much memory, since that kind of memory allocation suggests a
> > very high bandwidth requirement, and I just don’t see a cassette having
> > that much bandwidth.
>
> It runs at 128x speed, records the two sides the same time.
> The required bandwidth is about 40 megabytes/sec.
> I don’t think, that the data can be simultaneously read from HDD and
> written to the PCI card, because the IDE controller is generally
> connected to the PCI too, and the two together is 80M/s.
> Therefore, I need an entirely memory locked buffer, because even part of
> it cannot be swapped out to disk too…

OK, that explains your bandwidth requirements. One PCI bus is capable of
sustaining this, as long as both adapters (disk HBA and your own) don’t do
excessive PIO or excessive numbers of short bursts. Since you don’t have
any control over the disk HBA, make sure your device transfers in long
bursts. Two PCI busses, which is very common, will eliminate the
contention between the adapters for the bus. More on this later.

> > I found your original posts from a couple of months back, is this the
> > “almost” SG busmaster that uses a linked list of buffer segments instead
> > of a real SG implementation, and it’s further screwy because you stated
> > that it’s used as a circular buffer?
>
> Yes. Because of the limited resources of the CPLD circuit used.
> But a real scatter/gather implementation wouldn’t help a bit.
> One SG list can’t be bigger than the maximum size of an Mdl, which is
> only 64 megabytes :-(
> The buffer should be played more times repeatedly, 80x…100x before new
> data arrives.

I don’t understand this requirement. You have to write to 80 or 100 tapes
before some new data arrives, and that data arrives on a schedule?

> > You stated a requirement of only 40 MB/sec in October. I think you can
> > easily achieve that with a smaller circular buffer than 1GB.
>
> How?

You only need to make sure the tape doesn’t get starved for data from the
disk. 40 MB/sec is approaching the limits of a single disk streaming from
the inner cylinders, but you can get around that by using multiple disks,
in a RAID 0 if you don’t need fault-tolerance, and it doesn’t sound like
you do. You can also just partition your disk with a small partition,
which will limit the data storage to the outer cylinders. So you can
definitely feed the tape’s bandwidth requirements from the disk.

Once you’ve set up your storage to ensure that you have sufficient
bandwidth, you can transfer the data from the disk into your circular
buffer until it’s full, then start your tape reading it. Keep
transferring more data into the buffer right behind the reader. You have
to sync the reader and writer for this to work, so the writer (disk)
doesn’t overrun the data your reader (tape) is still reading.
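
The synchronization Phil describes boils down to two cursors over one ring; a tiny sketch of the bookkeeping (the names and the ring size are made up, and all real I/O is omitted):

    #include <ntddk.h>

    #define RING_BYTES (256 * 1024 * 1024)   /* assumed ring size, well under 1G */

    typedef struct _RING {
        ULONG64 WriteCursor;   /* total bytes the disk side has copied in   */
        ULONG64 ReadCursor;    /* total bytes the tape card has consumed    */
    } RING;

    /* Bytes the disk writer may add right now without overwriting data the
     * tape reader has not consumed yet. */
    static ULONG64 RingSpaceForWriter(const RING *R)
    {
        return RING_BYTES - (R->WriteCursor - R->ReadCursor);
    }

    /* Bytes already staged and available for the tape reader. */
    static ULONG64 RingDataForReader(const RING *R)
    {
        return R->WriteCursor - R->ReadCursor;
    }

As long as the writer only advances WriteCursor when RingSpaceForWriter() is large enough for the next disk transfer, the reader can never be overrun, and the ring can stay much smaller than the full recording.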

Phil

Philip D. Barila Windows DDK MVP
Seagate Technology, LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.

Thus spake Philip D Barila :

> I don’t understand this requirement. You have to write to 80 or 100 tapes
> before some new data arrives, and that data arrives on a schedule?

No. This is not a requirement, only extra information :-)
The same data will be processed many times: a minimum of 80…100 times
(one tape reel), maybe many more.

> You only need to make sure the tape doesn’t get starved for data from the
> disk. 40 MB/sec is approaching the limits of a single disk streaming from
> the inner cylinders, but you can get around that by using multiple disks,

[…]

Stop, please. Thank you for your help, but this would be much more
complicated and expensive than allocating a 1G-1.1G buffer.
The whole system works fine now; I only need some more buffer memory.

The latest idea (big thanks to Graham Simon) is to limit Windows’s
memory usage with the /maxmem option, and then map the hidden memory
with MmMapIoSpace into kernel virtual address space in small parts
(like ramdisk drivers do).
To figure out the physical address of the usable memory range, SMBIOS
must be queried via WMI (for the total memory), and then
MmGetPhysicalMemoryRanges() used to exclude the memory Windows is using.
I need the physical address and size of the hidden memory and must find a
solution for this. Can someone help me?
I’m reading a lot of docs; I don’t know when I will finish.
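
For what it’s worth, a sketch of that plan: enumerate what Windows is using with MmGetPhysicalMemoryRanges(), treat everything above that (up to the SMBIOS total) as hidden, and map it in small windows. MmGetPhysicalMemoryRanges() is only thinly documented, and HiddenBase/WindowBytes here are assumptions.

    #include <ntddk.h>

    /* Highest physical address Windows itself is using.  The returned array is
     * allocated from pool, terminated by an all-zero entry, and must be freed. */
    static ULONGLONG HighestWindowsPhysicalByte(VOID)
    {
        PPHYSICAL_MEMORY_RANGE ranges = MmGetPhysicalMemoryRanges();
        ULONGLONG top = 0;
        ULONG i;

        if (ranges == NULL)
            return 0;
        for (i = 0;
             ranges[i].BaseAddress.QuadPart != 0 || ranges[i].NumberOfBytes.QuadPart != 0;
             i++) {
            ULONGLONG end = (ULONGLONG)ranges[i].BaseAddress.QuadPart +
                            (ULONGLONG)ranges[i].NumberOfBytes.QuadPart;
            if (end > top)
                top = end;
        }
        ExFreePool(ranges);
        return top;        /* with /maxmem, everything above this is "hidden" */
    }

    /* Map one window of the hidden region.  MmMapIoSpace consumes system PTEs,
     * so map small windows on demand rather than the whole 1G+ at once. */
    static PVOID MapHiddenWindow(ULONGLONG HiddenBase, SIZE_T WindowBytes)
    {
        PHYSICAL_ADDRESS pa;
        pa.QuadPart = (LONGLONG)HiddenBase;
        return MmMapIoSpace(pa, WindowBytes, MmNonCached);
    }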

> Phil


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“Failed reading source file… (The operation completed successfully.)”

Thus spake Maxim S. Shatskih :

> - next MJ_WRITE IRP arrived to the device is passed via
> GetScatterGatherList and its SGL is attached to the tail of the “huge SGL”.
> - it is the “huge SGL” over which the device’s DMA runs.
> - IRPs are completed when the device DMA passed all SGL entries for a
> particular IRP.

OK, thanks, I didn’t know this. But it doesn’t help me allocate such
a huge buffer…

> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

xxxxx@lists.osr.com wrote on 01/06/2005 10:07:24 AM:

[snip]

> Stop, please. Thank you for your help, but this would be much more
> complicated and expensive, than allocating 1G-1.1G buffer.
> The whole system works fine now, I only need some more buffer memory.

Since you are bent on doing it this way, you’ve already described how
to do this.

You say that your hardware will just jump through the link at the end of
each buffer, so you don’t need one buffer; you need as many as it takes to
allocate as much as you need. Lock each one of them with an MDL, so that
they won’t ever get paged. Then just link them together in whatever form
your hardware needs. You might have more success at this if you allocate
your memory in UM, then send it to your driver in a series of IOCTLs. You
can pend all the IOCTLs until you are done with them, then go back and
complete them. If you define your IOCTL as METHOD_IN_DIRECT or
METHOD_OUT_DIRECT, the I/O manager will take care of probing and locking
the buffers for you.
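
A minimal sketch of the IOCTL side of that suggestion; IOCTL_TAPE_SUBMIT_SEGMENT and QueueSegment() are hypothetical names, and a real driver would also need cancel handling.

    #include <ntddk.h>

    /* Hypothetical IOCTL: user mode hands the driver one buffer segment.
     * With METHOD_OUT_DIRECT the I/O manager probes and locks the user
     * buffer and describes it with Irp->MdlAddress before we see the IRP. */
    #define IOCTL_TAPE_SUBMIT_SEGMENT \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

    /* Placeholder: remember the MDL's pages (e.g. via MmGetMdlPfnArray) and
     * link them into the card's buffer chain; the IRP stays pending until
     * the driver is finished with those pages. */
    static VOID QueueSegment(PVOID DeviceExtension, PIRP Irp, PMDL Mdl)
    {
        UNREFERENCED_PARAMETER(DeviceExtension);
        UNREFERENCED_PARAMETER(Irp);
        UNREFERENCED_PARAMETER(Mdl);
    }

    NTSTATUS DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        PIO_STACK_LOCATION sp  = IoGetCurrentIrpStackLocation(Irp);
        PMDL               mdl = Irp->MdlAddress;    /* already probed and locked */

        if (sp->Parameters.DeviceIoControl.IoControlCode != IOCTL_TAPE_SUBMIT_SEGMENT ||
            mdl == NULL) {
            Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_INVALID_PARAMETER;
        }

        IoMarkIrpPending(Irp);
        QueueSegment(DeviceObject->DeviceExtension, Irp, mdl);
        return STATUS_PENDING;
    }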

Phil

Philip D. Barila Windows DDK MVP
Seagate Technology, LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.

> It runs at 128x speed, records the two sides the same time.
> The required bandwidth is about 40 megabytes/sec.
> I don’t think, that the data can be simultaneously read from HDD and
> written to the PCI card, because the IDE controller is generally
> connected to the PCI too, and the two together is 80M/s.

An IDE controller is usually part of a chipset (South Bridge) and
is connected to the memory controller by a much faster bus (at least 266 MB/s
for not-too-old chipsets) than standard 133 MB/s PCI.
Something like this:
RAM <--> MC <-- 266 MB/s connection --> IDE/SB <-- 133 MB/s PCI --> PCIDEV0/1/2/…

So in your case the internal bus will use 80/266 ≈ 30% of max BW, and
the PCI bus with the slots will use 40/133 ≈ 30% of max BW.
It seems that you don’t need to have the buffer for all of your data.

Dmitriy Budko, VMware

Thus spake xxxxx@seagate.com :

> You say that your hardware will just jump through the link at the end of
> each buffer, so you don’t need one buffer, you need as many as it takes to
> allocate as much as you need. Lock each one of them with an MDL, so that
> they won’t ever get paged. Then just link them together in whatever form

Won’t work. If the total size of the chunks allocated by AllocateCommonBuffer()
or MmAllocateContiguousMemorySpecifyCache() calls can’t grow beyond
about 1 gigabyte, then a lot of MDL-allocated memory won’t either.
The problem may not be the amount of memory itself, but the amount of
kernel virtual address space! At least that’s what I think.
So I need to allocate the memory without mapping it,
then map it in chunks of an acceptable size.
Solutions I know so far:
- AWE: can’t disable caching, not well documented, and it may be virtual,
like the old GlobalLock, and not lock real physical pages into memory…
Or is it worth trying?
- Hiding memory and mapping it in parts with MmMapIoSpace().
This should work, but I have a lot of questions.
Any more ideas?
Or am I wrong, and more MDLs can solve the problem? Has anybody tried
this?

> complete them. If you define your IOCTL as METHOD_IN_DIRECT or
> METHOD_OUT_DIRECT, the IoManager will take care of probing and locking
> them for you.

Yes, I use this to transfer the data from the user program to the buffer.

> Phil


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

Reserving the memory or using AWE won’t work in all situations, since
neither is going to guarantee you get memory that your controller can
read. The only component that really knows what memory windows your
controller can access is the HAL, and that requires going through common
buffer or DMA buffer mapping.

However, if you’re going to go down this route, perhaps you
should instead look at MmAllocatePagesForMdl. You could use it to
build multiple MDLs containing physical pages but using no virtual
address mappings.
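
A sketch of Peter’s suggestion as I read it, building a list of MDLs whose pages never get a kernel virtual mapping; the chunk size and the array bookkeeping are assumptions.

    #include <ntddk.h>

    #define MDL_CHUNK_BYTES (32 * 1024 * 1024)   /* assumed per-MDL request size */

    /* Grab up to Total bytes of physical pages as a list of MDLs.  The pages
     * get no virtual mapping; MmGetMdlPfnArray() on each MDL yields the PFNs
     * to feed into the card's buffer chain.  Returns the bytes obtained. */
    static SIZE_T AllocateUnmappedPages(SIZE_T Total, PMDL MdlArray[],
                                        ULONG MaxMdls, ULONG *Count)
    {
        PHYSICAL_ADDRESS low, high, skip;
        SIZE_T got = 0;

        low.QuadPart  = 0;
        high.QuadPart = 0xFFFFFFFF;   /* card is assumed to be a 32-bit master */
        skip.QuadPart = 0;
        *Count = 0;

        while (got < Total && *Count < MaxMdls) {
            SIZE_T want = Total - got;
            PMDL   mdl;

            if (want > MDL_CHUNK_BYTES)
                want = MDL_CHUNK_BYTES;
            mdl = MmAllocatePagesForMdl(low, high, skip, want);
            if (mdl == NULL)
                break;                              /* no more pages available  */
            got += MmGetMdlByteCount(mdl);          /* may be less than 'want'  */
            MdlArray[(*Count)++] = mdl;
        }
        return got;
    }

    /* Release everything allocated above. */
    static VOID FreeUnmappedPages(PMDL MdlArray[], ULONG Count)
    {
        ULONG i;
        for (i = 0; i < Count; i++) {
            MmFreePagesFromMdl(MdlArray[i]);
            ExFreePool(MdlArray[i]);                /* the MDL structure itself */
        }
    }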

So how are you filling this 1GB buffer with data? Is there another
controller that’s going to be writing into it? Or are you copying data
to it from another source?

-p


Thus spake Peter Wieland :

> Reserving the memory or using AWE won’t work in all situations, since
> neither is going to guarantee you get memory that your controller can
> read. The only component that really knows what memory windows your

Yes, I know. But it should work, because there are no different memory
types in this system; all of the system memory is accessible through DMA.
Does anybody know the UserPfnArray structure?
Maybe I should allocate the buffer that way, then map the pages with
MmMapIoSpace. Or is this a bad idea?

> However if you’re going to try and go down this route, perhaps you
> should instead look at MmAllocatePagesForMdl. You could do this to
> build multiple MDLs containing physical pages but using no virtual
> address mappings.

I’ll try this too… Thanks.

> So how are you filling this 1GB buffer with data? Is there another
> controller that’s going to be writing into it? Or are you copying data
> to it from other source.

No, a little user program provides the data (from files on CD/HDD, but
the data goes through a couple of conversions, for example Dolby).
Then the driver copies it to the DMA buffer.

> -p


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
This advertising space is for sale

Thus spake Dmitriy Budko :

> So in your case the internal bus will use 80/256 = 31% of max BW,
> the PCI bus with slots will use 40/166 = 31% of max BW.
> It seems that you don’t need to have the buffer for all of your data.

But the HDD will go off and do thermal calibration, etc., and a buffer underrun
occurs :-( Our idea is that once the data is loaded into the buffer and
the DMA is started, no software or hardware performance problem can slow
down the operation any more; the PCI card works independently.

> Dmitriy Budko, VMware


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
… ERR0R: Timing error! Please wait! And wait! And wait!

-----Original Message-----

From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of VF
Sent: Thursday, January 06, 2005 10:38 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] map registers

> Thus spake Peter Wieland :
>
> > Reserving the memory or using AWE won’t work in all situations, since
> > neither is going to guarantee you get memory that your controller can
> > read. The only component that really knows what memory windows your
>
> Yes, I know. But it should work, because there are no different memory
> types in the system, all of the system memory is accessible through DMA.

So as long as it’s only ever going to run on this one computer I guess
you’re fine. Personally I’d feel pretty unsafe taking that bet.

> Does anybody know the UserPfnArray structure?
> Maybe I allocate the buffer with this function, then map the
> pages with MmMapIoSpace. Or is this a bad idea?

It’s a bad idea.

>
> > However if you’re going to try and go down this route, perhaps you
> > should instead look at MmAllocatePagesForMdl. You could do this to
> > build multiple MDLs containing physical pages but using no virtual
> > address mappings.
>
> I’ll try this too… Thanks.
>
> > So how are you filling this 1GB buffer with data? Is there another
> > controller that’s going to be writing into it? Or are you copying
> > data to it from other source.
>
> No, a little user program provides the data. (From files on
> CD/HDD, but the data goes through a couple of conversions,
> for example, Dolby) Then the driver copies it to the DMA buffer.

Explain again why you’re composing the data in kernel mode? Offhand it
seems like you could compose it in big user-mode buffers (where address
space is readily available) and call WriteFile to issue asynchronous
writes to your driver. The driver requests a scatter-gather list for
the buffer, then puts those into the big ring of disjoint addresses that
your controller is reading from.

You’ve already decided to assume that all the memory in your system is
addressable by your controller, so there won’t be any expensive DMA
mapping done and you should get very good throughput.
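
For completeness, the user-mode half of that idea might look roughly like this; the device name, segment size, and the single outstanding write are all simplifications (a real feeder would keep several writes in flight at once):

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    #define SEG_BYTES (16 * 1024 * 1024)   /* assumed segment size */

    int main(void)
    {
        HANDLE     dev;
        BYTE      *buf;
        OVERLAPPED ov;
        DWORD      written;

        /* Hypothetical device name; open for asynchronous (overlapped) I/O. */
        dev = CreateFileA("\\\\.\\CassetteCard", GENERIC_WRITE, 0, NULL,
                          OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        if (dev == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "open failed: %lu\n", GetLastError());
            return 1;
        }

        buf = (BYTE *)VirtualAlloc(NULL, SEG_BYTES, MEM_COMMIT, PAGE_READWRITE);
        if (buf == NULL)
            return 1;
        memset(&ov, 0, sizeof(ov));
        ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        /* The Dolby/format conversion would fill the buffer here; stubbed out. */
        memset(buf, 0, SEG_BYTES);

        if (!WriteFile(dev, buf, SEG_BYTES, NULL, &ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());
            return 1;
        }
        /* Wait before reusing or freeing the buffer: the driver keeps the IRP
         * (and therefore the locked pages) pending until it is done with them. */
        GetOverlappedResult(dev, &ov, &written, TRUE);

        CloseHandle(ov.hEvent);
        VirtualFree(buf, 0, MEM_RELEASE);
        CloseHandle(dev);
        return 0;
    }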


Thus spake Peter Wieland :

> Explain again why you’re composing the data in kernel mode? Offhand it
> seems like you could compose it in big user-mode buffers (where address

Because the card does not support real scatter/gather DMA.
The pages have to be linked together by their last longword, and
only the driver knows the physical addresses…
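
Purely as an illustration of that linking scheme (the exact descriptor format belongs to the card, so this layout is an assumption): each page’s last 32-bit word is overwritten with the physical address of the next page.

    #include <ntddk.h>

    /* Chain already-allocated, page-sized buffers for the card: the last
     * longword of each page holds the physical address of the next page,
     * and the final page points back to the first to form a ring. */
    static VOID LinkPagesForCard(PVOID PageVa[], PHYSICAL_ADDRESS PagePa[], ULONG Count)
    {
        ULONG i;

        for (i = 0; i < Count; i++) {
            ULONG  next     = (i + 1 == Count) ? 0 : i + 1;   /* wrap to a ring */
            PULONG lastWord = (PULONG)((PUCHAR)PageVa[i] + PAGE_SIZE - sizeof(ULONG));

            /* A 32-bit PCI master only needs the low dword of the address. */
            *lastWord = PagePa[next].LowPart;
        }
    }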

> writes to your driver. The driver requests a scatter-gather list for
> the buffer, then puts those into the big ring of disjoint addresses that
> your controller is reading from.
>
> You’ve already decided to assume that all the memory in your system is
> addressible by your controller, so there won’t be any expensive DMA
> mapping done and you should get very good throughput.

The only problem is how to allocate and lock so much memory.
Once that’s done, there are a couple of methods to handle the buffers and
perform the DMA. But first, the buffer must be allocated.
I’ll try the MDL trick you mentioned.
Tomorrow; in my timezone it’s 01:00, going to sleep…


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

Hi!

Boys, you are fantastic. The driver now works fine, with the method
detailed in my previous letter.
I can’t even believe it; I must perform some tests…
Thanks everybody, especially Graham Simon and Peter Wieland!


Valenta Ferenc Visit me at http://ludens.elte.hu/~vf/
“My love is REAL, unless declared INTEGER.”

> > - next MJ_WRITE IRP arrived to the device is passed via
> > GetScatterGatherList and its SGL is attached to the tail of the “huge SGL”.
> > - it is the “huge SGL” over which the device’s DMA runs.
> > - IRPs are completed when the device DMA passed all SGL entries for a
> > particular IRP.
>
> Ok, thanks, I didn’t know this. But doesn’t help me to allocate such
> a huge buffer…

Such an approach allows you to relax the requirement and to operate without the
huge buffer.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> An IDE controller is usually part of a chipset (South Bridge) and
> is connected to a memory controller by much faster bus (at least 266 MB/s
> for not too old chipsets) than a standard PCI 133 MB/s.

Are you sure it is not a plain PCI device?

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> Because the card does not supports real scatter/gather DMA.
> The pages should be linked together by their last longword.
> Only the driver knows the physical addresses…

Then you have two options:

  • a major hardware redesign to make the card something like the IDE DMA
    controllers of the VX chipset, or OHCI 1394, or aic78xx, or ANY other PCI logic
    with proper scatter-gather-list support
    OR
  • take an interrupt on each contiguous chunk.

The card is grossly misdesigned; it mixes metadata with data, for instance.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> > An IDE controller is usually part of a chipset (South Bridge) and
> > is connected to a memory controller by much faster bus (at least 266 MB/s
> > for not too old chipsets) than a standard PCI 133 MB/s.
>
> Are you sure it is not a plain PCI device?
>
> Maxim Shatskih, Windows DDK MVP

Yes, I am sure. It looks like a plain PCI device to drivers and the BIOS,
but it is quite different from the HW point of view.

From the datasheet for Intel’s old (2000) ICH2 I/O Controller Hub:
http://www.intel.com/design/chipsets/datashts/29068702.pdf

“The chipset’s hub interface architecture ensures
that the I/O subsystem; both PCI and the integrated I/O features
(IDE, AC’97, USB, etc.), will receive adequate bandwidth. By placing
the I/O bridge on the hub interface (instead of PCI), the hub architecture
ensures that both the I/O functions integrated into the ICH2 and the PCI
peripherals obtain the bandwidth necessary for peak performance.”

Dmitriy Budko, VMware