Can we allocate L == P memory?

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/allocating-system-space-memory

[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/ne-wdm-_pool_type

From the above URLs, I apparently could not find a way to get system memory where the logical address == the physical address. Is that conclusive?

In case somebody is wondering why we need this: I think such possibilities make things more innovative, and I am sure at least some drivers out there would benefit.

Thanks
Sudhakar

The term “logical address” can have multiple meanings. By “logical” here do you mean, specifically, “device bus logical”? Like, for use in DMA?

Assuming that’s the case, the answer is “no” – There is no case where Windows will guarantee that a Device Bus Logical Address will be the same as the Physical Address. This is quite deliberate. Device Bus Logical Addresses are an abstraction of the DMA subsystem. WHERE the memory that backs that logical address is physically located is not your concern.

Peter

In case somebody is wondering why we need this: I think such possibilities make things more innovative, and I am sure at least some drivers out there would benefit.

You seem to have a “rather peculiar” understanding of innovation…

Can you give a practical example of a situation where imposing a “logical_address == physical_address” limitation may be beneficial?
As with any other limitation, the only benefits that I can see are those of removing it.

Anton Bassov

Recent versions of Windows will use the IOMMU if the hardware is known to work correctly. With an IOMMU, Windows can assign any integer value to “L”. So we could make L == P if there’s some good reason to go through the trouble. But I’m having difficulty imagining such a motivation.

sudhakar wrote:

From the above URLs, I apparently could not find a way to get system memory where the logical address == the physical address. Is that conclusive?

In case somebody is wondering why we need this: I think such possibilities make things more innovative, and I am sure at least some drivers out there would benefit.

In an unusual turn of events, I agree with Anton.  ;)  I think you
simply misunderstand the concepts here.  What you’re saying is basically
nonsense.

What we call the “physical address” is the address that the CPU has to
put on its address bus in order to access memory.  It is meaningful only
to the north bridge and the memory controller.

What we call the “logical address” is specific to a particular I/O bus.
Imagine, for example, a PCI Express bridge that, for whatever reason,
only implements a 32-bit address space on the device side, but still
needs to access arbitrary physical memory. The bridge could have a
32-to-64 mapping table to cross the spaces.  When doing DMA, the address
that your driver needs to tell your I/O device is the 32-bit mapping.
That’s the “logical address”.

So, the need for logical/physical mapping is not under the operating
system’s control.  There is no single “logical address” for a given
physical address.  The physical-to-logical mapping has to be done on a
bus-by-bus basis.

In an unusual turn of events, I agree with Anton

Not really. More on it below…

What we call the “physical address” is the address that the CPU has to put on its address bus in order to access memory.
It is meaningful only to the north bridge and the memory controller.

The physical-to-logical mapping has to be done on a bus-by-bus basis.

…which does not hold true for an x86-based system that is NOT equipped with an IOMMU - on such a system there is no difference between the bus address (i.e. as it is known to a device) and the memory one (i.e. as it is known to the CPU). Certainly, it would be pretty stupid to make an assumption like that in your code - after all, this is what the DMA API is for. However, the address in itself is still perfectly meaningful to ANY bus agent on such a system.

So, the need for logical/physical mapping is not under the operating system’s control.

…and this part already does not hold true for an x86-based system that IS equipped with an IOMMU. The OS software could do such a mapping just fine if it had any interest in doing so…

In other words, our “eternal state of mutual disagreement” still holds…

Anton Bassov

And we are back to the question of “are we describing the Windows architectural model, or the model for a given processor,” from the discussion in another concurrent thread.

In short… “What Mr. Roberts said.”

In needlessly longer verbiage: In Windows, except under some very limited special circumstances (that I’m not even sure I can articulate), you are never allowed to assume a Device Bus Logical Address is equivalent to a Physical Address. You’re just not. It’s a primary architectural construct of the DMA subsystem. We were doing this when we had “mapping registers”, both real and virtual, before there were ever what we know today as IOMMUs. The concept that the NT team used dates back, believe it or not, to the PDP-11 (Unibus Map Registers… hence the name “Map Registers”, by the way).

There is no reason to assume that the address we use to reference main memory from the CPU’s memory bus is the same address space that we use to reference main memory from a given device bus as part of a DMA operation. These 2+ address spaces could be entirely independent (and use a different mapping granularity than main memory, as well, by the way).

Sooo… “What Mr. Roberts said”

Peter

And we are back to the question of “are we describing the Windows architectural model or the model for a given processor” concept,

Actually, what we are trying to do here is just to give a proper definition of the term “logical address”. More on that below…

“What Mr. Roberts said”

…is, IMHO, not an entirely correct definition of it. If you asked me to give my own definition of this term, I would put it the following way:

“Logical address is just a generic term. Its actual meaning depends upon the hardware specifics of the particular system and/or architecture. This address may or may not correspond to the “physical” address as it is known to the CPU, which, again, depends upon the hardware specifics of the particular system and/or architecture (and, on some platforms, upon the OS-level software as well). As a driver writer, one should never make any assumptions concerning this part, and should always rely upon the system-provided API when dealing with logical addresses.”

Anton Bassov

That a developer should make no assumptions about physical addresses: agreed.

I have realized that the word “logical” is generally associated with I/O.

By the word “logical” I was referring to the virtual address, without any thought of DMA in the first place. But the posts helped me understand the kernel intricacies and the nomenclature.

Without the DMA context: suppose that, in a large-RAM system, many of the drivers could request huge chunks of P == V addresses at the beginning and manage them themselves; could we then have reduced page tables and performance benefits?

With the DMA context: as I read about the IOMMU and other stuff now, I am realizing it is akin to distributed file system, or even more to cluster file system, semantics retold with respect to sharing. P == V, where possible, may enable some more innovation here. But I do not have any immediate knowledge of the complexities to suggest anything concrete.

The reason why I mentioned “L == P” in the first place was that I remembered, from long back, seeing this construct when debugging on a different OS (NetWare). But I do not know how it was allocated or what the use cases were, so I hypothesized some. Looking at the documentation for Windows, I could not find any such APIs. So I thought I would mention it here, so that I might learn about typical possible use cases if such a possibility exists. I hope this is fine.

thanks
sudhakar

sudhakar wrote:

Without the DMA context: suppose that, in a large-RAM system, many of the drivers could request huge chunks of P == V addresses at the beginning and manage them themselves; could we then have reduced page tables and performance benefits?

Nope.  Remember that, with exactly one exception (the CR3 register),
every single address reference in an x86/x64 processor is a virtual
address that goes through the page tables.  Every single one.  There
is simply no hardware mechanism that allows you to bypass the page
tables, so there is absolutely nothing to be gained by a physical ==
virtual mapping.

The Windows kernel is smart enough to use 4MB pages where possible for
big kernel sections, which eases the memory burden for the page tables,
but that’s all implementation detail that has no operational impact on
drivers or applications.

Because virtual memory address translation is so fundamental to
low-level operation, that part of the processor architecture is highly
optimized.  The look-aside tables mean that the address translation
rarely adds more than 1 cycle to an instruction.

The reason why I mentioned “L==P” in the first place was, I had remembered long back seeing this construct when debugging on a different OS (NetWare). … Looking at documentation of windows, I could not find any such APIs.

There are no APIs, because there is no hardware mechanism to abstract. 
It is what it is.  Seriously, there is nothing to see here.  You should
move on to things you CAN control.

so there is absolutely nothing to be gained by a physical == virtual mapping.

Let’s continue our “history of mutual disagreements”…

Let’s assume we have an x86_64 system with N GB of physical memory. How many page tables do we need in order to make a linear mapping (i.e. one where virtual_address == fixed_offset + physical_address) of all this memory into the kernel address space?

If we ignore memory holes for the sake of simplicity, our calculations stand as follows:

  1. 1 + N/(512*512) Level 2 tables
  2. 1 + N/512 Level 3 tables
  3. N Level 4 tables

In practical terms, we would need a few extra page tables in order to handle the memory holes, but this number is negligible.

Now compare it to the “loose” mapping, i.e. the one that maintains no correspondence between physical and virtual addresses whatsoever. Which of them do you think is going to require more page tables?

Another point to consider is that allocations/deallocations of linearly-mapped memory reduce to simple heap operations, without any need to mess around with kernel VADs, PTEs, and other things related to physical-to-virtual memory mapping.

Certainly, one may claim that these advantages are insignificant. However, if we look at Linux, we notice that kmalloc-area allocations (i.e. the ones where virtual_address == fixed_offset + physical_address) are encouraged, while vmalloc-area ones (i.e. the ones where the above-mentioned virtual-to-physical correspondence does not hold) are, for performance reasons, frowned upon…

Anton Bassov