Maximum 16 map registers?

Hi,

We just came across some server hardware that behaves a bit unusually. Specifically, the call to IoGetDmaAdapter returns a mere 16 as the maximum number of map registers.
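
For reference, the query itself is nothing special; roughly like this (a trimmed-down sketch, not our exact device description, and the MaximumLength value is purely illustrative):

    #include <wdm.h>

    /* Sketch: ask the HAL for a DMA adapter and the map-register limit.
       Everything except the 32-bit addressing flag is illustrative. */
    ULONG QueryMaxMapRegisters(PDEVICE_OBJECT PhysicalDeviceObject)
    {
        DEVICE_DESCRIPTION devDesc;
        PDMA_ADAPTER adapter;
        ULONG maxMapRegisters = 0;

        RtlZeroMemory(&devDesc, sizeof(devDesc));
        devDesc.Version           = DEVICE_DESCRIPTION_VERSION;
        devDesc.Master            = TRUE;       /* bus-master DMA */
        devDesc.ScatterGather     = TRUE;
        devDesc.Dma32BitAddresses = TRUE;       /* OHCI 1394 can only address 32 bits */
        devDesc.InterfaceType     = PCIBus;
        devDesc.MaximumLength     = 0x100000;   /* illustrative */

        adapter = IoGetDmaAdapter(PhysicalDeviceObject, &devDesc, &maxMapRegisters);
        if (adapter == NULL) {
            return 0;
        }

        /* A real driver keeps the adapter; for this sketch we release it. */
        adapter->DmaOperations->PutDmaAdapter(adapter);

        /* On the machine in question this comes back as 16. */
        return maxMapRegisters;
    }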

Does anyone have an idea why this might be happening?

We have tested our drivers on various x64 machines with a limited number of map registers, but 16 is so low that we have serious timing issues (incoming packets get lost due to very high input rates).

Some basic information about the machine:
Fujitsu Siemens Primergy RX300 S3
Dual Core Intel Xeon 5160
Windows Server 2003 - 5.2.3790 SP2 (32 bit)
BIOS: Fujitsu Siemens, Phoenix Technologies Ltd 4.06 Rev 1.07.2119 2/1/2007
SMBIOS: 2.34
RAM: 4GB

On the Properties tab of “My Computer”, under the amount of memory it says “Physical Address Extension”.

I tried tweaking some BIOS settings but to no avail. I also tried booting with the /NOPAE switch, but there was still no change (even the “Physical Address Extension” label was still there on the Properties tab of “My Computer”).

Any help would be greatly appreciated.

Regards
Dimitris Staikos
Unibrain

Some more information:
(*) Booting with 2GB of RAM results in a reasonable number for maximum map registers (524289) and everything works OK.
(*) Booting with 4GB and /BURNMEMORY=1024 results in 524289 map registers.
(*) Booting with 4GB and /BURNMEMORY=1023 results in 16 map registers.

We don’t have the appropriate type of memory for this motherboard in our lab, so we could not boot this machine with more than 4GB to see how it behaves; I don’t have that info.

Regards,
Dimitris Staikos
Unibrain

Have you tried using /MAXMEM to limit memory instead of /BURNMEMORY? IIRC
/BURNMEMORY takes the memory from low kernel address space, which is the same
area the map registers are allocated from.
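
For example, something like this in boot.ini (illustrative ARC path and values, adjust for your setup):

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /MAXMEM=3072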


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


xxxxx@unibrain.com wrote:

Some more information:
(*) Booting with 2GB of RAM results in a reasonable number for maximum map registers (524289) and everything works OK.
(*) Booting with 4GB and /BURNMEMORY=1024 results in 524289 map registers.
(*) Booting with 4GB and /BURNMEMORY=1023 results in 16 map registers.

We don’t have in our lab the appropriate type of memory for this motherboard so we could not boot this machine with more than 4GB and see how it behaves, so I don’t have that info.

Right. The threshold where this happens actually depends on your
motherboard. Here’s the explanation.

The motherboard has to leave room in the physical address space for the
PCI bus. Motherboards do that division differently. Some allocate
3.5GB for memory and 0.5GB for PCI. Some do 3GB/1GB. Some actually do
2GB/2GB. Yours, apparently, does 3GB/1GB.

The physical memory that doesn’t fit in that lower space gets assigned
physical addresses above 4GB. So, the memory map of your machine looks
like this:

0-1GB RAM
1-2GB RAM
2-3GB RAM
3-4GB PCI devices
4-5GB RAM

As long as ALL of the physical memory in use has 32-bit addresses,
Windows does not need to use DMA bounce buffers at all. The DMA
abstraction APIs become a do-nothing passthrough, so you can pretend
that there are an infinite number of map registers.

But as soon as ANY page of memory has a physical address beyond 32 bits,
the operating system has to assume that every DMA operation will involve
one of those pages, so it has to allocate real bounce buffers in the low
4GB. It only allocates a few such bounce buffers, so you only get a few
map registers.

When you burn all of that extra gigabyte, all of the accessible physical
memory fits in 32-bits. When you burn less than a full gigabyte, some
of it lives above 4GB, and you need bounce buffers and map registers.
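
In practical terms, that means your driver has to size each transfer against
whatever count comes back. Something along these lines (a rough sketch with a
hypothetical helper, assuming 4KB pages, not anyone's actual code):

    #include <wdm.h>

    /* Sketch: given the map-register count returned by IoGetDmaAdapter,
       check whether one buffer can be mapped in a single DMA transfer.
       Each map register covers one page, and an unaligned buffer can
       straddle an extra page, so count the pages the buffer spans. */
    BOOLEAN FitsInOneTransfer(ULONG MaxMapRegisters, PVOID BufferVa, ULONG BufferLength)
    {
        ULONG needed = ADDRESS_AND_SIZE_TO_SPAN_PAGES(BufferVa, BufferLength);

        /* With 16 map registers and 4KB pages that is at most 64KB,
           and only if the buffer happens to be page-aligned. */
        return (BOOLEAN)(needed <= MaxMapRegisters);
    }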


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thanks Don and Tim!

Some light has been shed on the situation, but it still remains a mystery to me why the operating system decides to set the map register limit to just 16, instead of the 256 we get on x64 systems (a value we can work with).

We tested with the MAXMEM switch, but if we specify anything greater than 3072 (3GB) the OS always works as if we had specified exactly 3072 and we always get the practically unlimited number of map registers (524289).

By applying some “common-sense reverse engineering by hypothesis” I can assume the following:
Both the 256 we get on x64 systems and the 16 we get on this weird machine are non-random numbers, and they are always the same no matter how much memory is installed. This suggests that they are *probably* not the result of some formula, but rather values hardwired into the operating system, with 256 being the norm for 64-bit systems and 16 being some sort of corner case or fallback where Windows is unsure what to do and hands out a very “safe” value.

So now I am trying to figure out how we can talk the OS into returning a reasonable number like 256 instead of 16. Any suggestions?

Warm Regards,
Dimitris Staikos
Unibrain

wrote in message news:xxxxx@ntdev…
>
> We tested with the MAXMEM switch, but if we specify anything greater than
> 3072 (3GB) the OS always works as if we had specified exactly 3072 and we
> always get the practically unlimited number of map registers (524289).
>

On the memory greater than 3GB: look at how much memory the system is actually using.
There are a number of Intel chipsets that just grab the top 1GB, so even
if there is 4GB of RAM the system only acts as if there is 3GB. This is
what Tim was commenting on about motherboards and chipsets.


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply

Don, I am not quite sure I get what you mean. If I understand Tim correctly, in my case when only 3GB of RAM is used, plus 1GB taken by the Intel chipset, I end up with 4GB total, which fits in 32-bit addressing, so the map registers practically drop out of the picture and I get a huge number as the maximum. When I go slightly above 4GB (RAM plus space reserved by the chipset), I only get 16 map registers.

My question remains, why 16 and not 256 like it is on x64 systems? 16 is way too low. Isn’t there anything we can do to change this?

Dimitris Staikos
Unibrain

In an earlier post you stated:

“We tested with the MAXMEM switch, but if we specify anything greater than
3072 (3GB) the OS always works as if we had specified exactly 3072 and we
always get the practically unlimited number of map registers (524289).”

What I was referring to is that I suspect your Intel motherboard’s chipset
is taking the other gigabyte and that is why it acts like 3072.


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


Just as a final update on this thread, let me share our findings on this issue with everyone.

It seems that when 32-bit Windows is running on 32-bit or 64-bit hardware with PAE capabilities, and (RAM + PCI space reserved by the Intel chipset) > 4GB, then Windows sets the maximum number of map registers to 16. This was witnessed on several machines and informally confirmed by Peter Wieland in an email communication we had.

I don’t have the reasoning why such a small number was selected by the kernel team.
What I know is that when the system is in this configuration there are LOTS of problems. Looks like most people haven’t properly tested their drivers on such a configuration.

We did various tests on the same PC with 2GB of RAM and then 8GB of RAM.
For starters, the 2GB configuration is much faster: booting is much faster, login is instant while at 8GB it takes more than a minute, and so on.
Moreover it looks like even standard drivers failed to function properly on the 8GB configuration. For example 1394 camera devices (using the Microsoft drivers) would be visible to MSN’s video configuration dialog when running with 2GB, but no camera would be available on the same dialog when running on 8GB (even though the camera device was present in device manager).
Maybe it is a bug in MSN, but when we installed our own IIDC camera driver (running on top of the Microsoft 1394 stack) the camera was listed in MSN even at 8GB; however, the preview video dialog was displaying an “all grey” image. The camera was started and streaming all right, so it somehow occurred to me to try a remote desktop from my PC to the test PC and open the video configuration dialog on the remote session. Guess what, I could see the video image on the remote session, but not on the local session! Does that mean that the video drivers had some sort of problem displaying video the way the MSN video configuration dialog asked them to? Who knows?

Of course the testing we did was neither scientific nor complete, but we have other work to do as well instead of testing MS corner cases :-) Our client also confirmed that they had several other problems with the “max 16 map registers” server machines, so they dumped these machines and will use different hardware. Maybe that was Microsoft’s intention when they set the map register limit to 16: that everybody has so many problems with this configuration that they finally dump it.

Dimitris Staikos
Unibrain

Regarding the reason why it is set to 16, I think that is basically an historic accident. The original use of the ‘map register’ functionality on x86 platforms was to provide a mechanism for ISA devices to function in a 32-bit world. Those devices had to access physical memory below 16MB; that memory was considered a very scarce resource, so each device was allotted at most 64KB at a time (16 map registers of one 4KB page each).

Microsoft, in my opinion, has no motivation to improve this situation for 32-bit-crippled PCI devices. They would prefer that such PCI devices go away and stop being plugged into their systems. Consequently, brain-dead 32-bit PCI devices will continue to suffer from a 64KB block of map registers until those devices are as rare as ISA devices.

That said, if your driver is well behaved, other than being slow, it should not have ‘lots of problems’. In fact it should not have any problems at all.


Thanks for your response Mark!

Microsoft, in my opinion, has no motivation to improve this situation
for 32bit crippled PCI devices. They would prefer that such PCI devices
go away and stop being plugged into their systems.
Consequently, brain-dead 32bit PCI devices will continue to suffer from
a 64k block of map registers until those devices are as rare as isa devices.

I hadn’t realized that PCI devices that ONLY have Dma32BitAddresses==TRUE were nowadays considered crippled. I thought most devices are like that :-) Anyway, since we are working with 1394 adapters, which use the OHCI 1.1 spec, we are stuck with 32-bit DMA addressing.

That said, if your driver is well behaved, other than being slow,
it should not have ‘lots of problems’.
In fact it should not have any problems at all.

It’s clear to me now that my previous points were not so clear, so I will try to clarify using some real numbers in a LONG topic.

When the device you work with is under your “complete” control, being slow is not a big issue.
However when your device operates autonomously, shooting data at you at rates up to 800Mbps, then you better be fast enough or you miss data. This is exactly what happens with 1394 isochronous transmissions and hi-res 1394 digital cameras that are capable of streaming at very high rates.

Isochronous transmissions are “guaranteed bandwidth” but not “guaranteed delivery”.
Isochronous packets arrive every 125 microseconds.
If one 1394 packet is lost because the 1394 chip is not ready to do DMA when the packet is transmitted on the cable then the whole frame is useless.
Moreover due to the packet header format you have no 100% reliable way of detecting that something bad happened. As a bonus the next frame also gets lost (even if you manage to detect that the previous was bad). I will be happy to clarify both points to anyone interested.

Now let’s see why a maximum of 64KB per DMA transfer is a big pain for 1394 cameras.

Instead of speaking theoretically I will perform an example calculation using a medium-resolution format, specifically IIDC Format_1 Mode_3 which is 1024x768 YUV 4:2:2 at 15 fps.
In this setup each frame consists of 512 packets of 3072 bytes each.

If I decide to use all 16 available map registers in one DMA transfer I will be able to map up to 64KB, which gives me 65536/3072=21.33 packets. Let’s forget page alignment and suppose that indeed 21 packets can always be mapped.
This means that the driver will build a DMA context program for the 1394 chip that will be able to receive 21 isochronous packets. After the 21st packet is received an interrupt occurs and the 1394 DMA context is HALTED.

At this point we have exactly 125 microseconds before the 22nd packet flies on the cable.
125 microseconds to get into the ISR, have the DPC queued, executed, process the DMA context program descriptors to find out what happened, free DMA resources and prepare the DMA context program for the next 21 packets and restart the halted DMA context (which is not an instant operation).

Since each frame is 512 packets, I will have to break each frame into 512/21 = 24.38, i.e. 25 DMA transfers.
This means that 24 times for each incoming frame I depend on very delicate timing.

How does it work in practice? Very very poorly.
So at some point not so long ago we switched to what we call “Double Buffering Isochronous Receive”.

This means that when a single frame cannot fit into one DMA transfer, we split it into pieces that are each HALF the maximum DMA transfer. This way we have two active DMA transfers programmed on the 1394 chip.
The 1394 chip generates an interrupt half-way (for the 1st transfer) and continues receiving isochronous packets into the buffers of the 2nd DMA transfer. Instead of a mere 125 microseconds, the isochronous request completion processing cycle has much more time to do its work, plus the DMA context on the 1394 chip does NOT get halted.

Let’s see how this translates into the 16 map register case. Instead of using all 16 map registers in one DMA transfer, we program 2 DMA transfers using 8 map registers each.
32KB/3072 = 10.66, which results in 10 isochronous packets in each DMA transfer (ignoring page alignment).

So now we have 2 active DMA context programs on the 1394 chip. When the first one completes, the drivers have 10*125 microseconds to do their job. This sounds much more promising, but in the tests we ran (on pretty recent and decent hardware) the results are not good.
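
To make the arithmetic above easy to reproduce, here it is as a back-of-the-envelope calculation in plain C (a standalone sketch that only mirrors the numbers above, ignoring page alignment as before):

    #include <stdio.h>

    /* Back-of-the-envelope numbers for IIDC Format_1 Mode_3
       (1024x768 YUV 4:2:2 at 15 fps), ignoring page alignment. */
    int main(void)
    {
        const unsigned packetBytes     = 3072;   /* bytes per isochronous packet  */
        const unsigned packetsPerFrame = 512;    /* packets per video frame       */
        const unsigned bytesPerMapReg  = 4096;   /* one 4KB page per map register */
        const double   packetPeriodUs  = 125.0;  /* one iso packet every 125 us   */

        /* Single-buffer case: all 16 map registers in one DMA transfer. */
        unsigned single        = (16 * bytesPerMapReg) / packetBytes;            /* 21 packets   */
        unsigned xfersPerFrame = (packetsPerFrame + single - 1) / single;         /* 25 transfers */

        /* Double-buffered case: two transfers of 8 map registers each. */
        unsigned half = (8 * bytesPerMapReg) / packetBytes;                       /* 10 packets   */

        printf("single buffer : %u packets/transfer, %u transfers/frame, %.0f us to rearm\n",
               single, xfersPerFrame, packetPeriodUs);
        printf("double buffer : %u packets/transfer, %.0f us to rearm\n",
               half, half * packetPeriodUs);
        return 0;
    }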

Here are some test results:

YUV422 800x600 7.5fps -> Actual rate reported 5.6 fps. Image quality “looks” OK
YUV422 800x600 15 fps -> Actual rate reported 10.9 fps. Image very often is “scrambled” (packets may contain a non-integral number of scan lines so a lost packet produces a visible rearrangement in the rest of the image).
YUV422 1024x768 7.5fps -> Actual rate reported 0.xx fps. Image is always “scrambled”.

Btw, YUV422 1024x768 7.5 fps uses isochronous receive double buffering with 18 packets in each DMA transfer and still the system can’t keep up. Going at 15 fps (10 isochronous packets per DMA transfer) simply makes the situation worse (if it really makes any difference to say that something is worse than 0.xx fps).

I gather that we could detect the 16 map register limitation and try to use Common Buffer DMA instead, writing a whole new bunch of code just for these fancy machines.
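
Just to illustrate what that fallback would look like (a sketch only, assuming the DMA_ADAPTER we already obtained from IoGetDmaAdapter; error handling and the OHCI descriptor programming are omitted):

    #include <wdm.h>

    /* Sketch of the common-buffer fallback: allocate one contiguous buffer
       the 1394 chip can DMA into directly (within its 32-bit addressing
       limit), then copy completed packets out in software at DPC time.
       'Adapter' is the PDMA_ADAPTER returned by IoGetDmaAdapter; the size
       is illustrative. */
    PVOID AllocateIsochCommonBuffer(PDMA_ADAPTER Adapter, PPHYSICAL_ADDRESS LogicalAddress)
    {
        return Adapter->DmaOperations->AllocateCommonBuffer(Adapter,
                                                            64 * 1024,    /* illustrative */
                                                            LogicalAddress,
                                                            TRUE);        /* CacheEnabled */
    }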

I don’t know if it is worth it. I think that max 16 map registers is plain silly in today’s hardware under any modern OS configuration.

Since we mainly deal with industrial clients and software houses, our only option is to advise them NOT to use such hardware with our drivers. They won’t crash if they do, but they won’t be able to do much meaningful work with 1394 digital cameras.

Warm Regards,
Dimitris Staikos
Unibrain