"Page Fault article" how H/W translate address

I read this (Page Fault) article, and I am confused about one concept in
it. The lines are:
“Windows and the underlying hardware platform agree upon a means to tell
the hardware how to translate the address of a virtual page into a
corresponding physical page”.
I want to know how the H/W performs address translation. Is that not the
responsibility of the operating system?
Secondly, how does the operating system map a virtual address onto a
physical one?

Thanks in advance, with regards,
Naushahi

I don’t know where you found this article. I have to assume that the
sentence you’re quoting is out of context, because it doesn’t seem to
say much.

Each CPU platform on which Windows runs has some sort of virtual memory
mechanism. IA-32, with which I’m very familiar, has a 2-tier page table
structure and associated address translation hardware. Program code uses
32-bit virtual addresses. The CPU’s translation logic decomposes the
virtual address into a page-directory index, a page-table index, and an
offset. It uses the directory and table indices to locate a page table
entry. The page table entry in turn either points to a page frame in
physical memory, or else it is marked not present, which causes the CPU
to generate a page fault whenever a program tries to access memory
within that page. The operating system’s virtual memory manager handles
the page fault by fetching a page from a swap file or by doing any of a
few other things.

There are a great many other details about this, and I’m pretty sure
someone will jump into this thread to point out some of them, but these
are the basics.

The Inside Windows Xxx books contain a thorough discussion of virtual
memory management in the NT line of systems.


Walter Oney, Consulting and Training
Basic and Advanced Driver Programming Seminars
Now teaming with John Hyde for USB Device Engineering Seminars
Check out our schedule at http://www.oneysoft.com

Hi, Walter Oney

Thanks for the reply. I found this article on OSR Online. For reference,
I am pasting its heading:
So What Is A Page Fault?
OSR Staff | Published: 06-May-03| Modified: 07-May-03

Could you read it in detail to get the concept?
Another thing: you say “address translation hardware”. Is there some
physical hardware or program code through which the operating system and
CPU interact for address translation? Please explain this.

Thanks, with regards,
Naushahi

The problem with trying to explain exactly how address translation is done
is that it varies from machine to machine.

On a current x86 CPU, address translation starts by taking the segment
register and the offset and computing the linear address. In Windows,
that linear address is the virtual address. The CPU then first looks in a
page translation cache it maintains (the “translation lookaside buffer”) to
see if it already knows the virtual to physical page translation. If there
is no entry in the cache, it then uses the physical address in the CR3
register as the base address of the page tables.
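
As a rough illustration of the "check the cache first, walk the tables only on a miss" order described above, here is a toy sketch. It is not how any real CPU is built: the TLB is modelled as a plain dictionary, and `translate`, `fake_walk` and the made-up mapping are all hypothetical names invented for the example.

```python
# Toy model: a TLB is just a small cache from virtual page number to
# physical page number. All names here are illustrative.
PAGE_SHIFT = 12  # 4 KB pages


def translate(vaddr, tlb, walk_page_tables):
    """Return the physical address for vaddr, consulting the TLB first."""
    vpn = vaddr >> PAGE_SHIFT                 # virtual page number
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)  # byte within the page
    if vpn not in tlb:
        # TLB miss: fall back to the (slow) page-table walk and cache it.
        tlb[vpn] = walk_page_tables(vpn)
    return (tlb[vpn] << PAGE_SHIFT) | offset


walks = []  # records every slow-path walk so we can see the cache working


def fake_walk(vpn):
    walks.append(vpn)
    return vpn + 0x100  # pretend mapping: purely made up for the demo


tlb = {}
first = translate(0x00403ABC, tlb, fake_walk)
second = translate(0x00403DEF, tlb, fake_walk)  # same page, so a TLB hit
```

Both addresses fall in the same virtual page, so only the first call reaches `fake_walk`; the second is served from the dictionary.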

The format of the page tables is defined by the x86 processor. You’d think
it would be simple, but it isn’t. For a typical machine (that is, an x86
that is not using some of the unusual options) CR3 holds the physical
address of the page directory. There are 1024 entries, with each entry being
4 bytes long. Each 4 byte entry can address a single page (it uses 20 bits
for this, the other bits are control information). The page it addresses is
a “page table”. A single page table consists of 1024 entries, each being
4 bytes long. Each entry can address a single page (again, using 20 bits to
do it). If it does address a page, that is actually the page containing the
data.

So, if you have a 32 bit linear address, the 10 most significant bits are
used to find the correct entry in the page directory (since 2^10 = 1024).
The next 10 bits are used to find the correct entry in the page table
(again, 2^10 = 1024). That means you have the low 12 bits - and 2^12 is
4096, the exact size of one page. So the low 12 bits of the address are
used to find the correct BYTE on the page.
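
The 10/10/12 split can be written out as a few shifts and masks. This is only a sketch of the arithmetic in the paragraph above; the function name and the sample address are made up for the example.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    pde_index = (addr >> 22) & 0x3FF   # top 10 bits: entry in the page directory
    pte_index = (addr >> 12) & 0x3FF   # next 10 bits: entry in the page table
    offset = addr & 0xFFF              # low 12 bits: byte within the 4096-byte page
    return pde_index, pte_index, offset


pde, pte, off = split_linear_address(0x00403ABC)
print(pde, pte, hex(off))  # prints: 1 3 0xabc
```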

One of the control bits in both the page directory as well as the page table
is the “Valid” bit. If this bit is cleared, it tells the hardware that the
entry is NOT valid, which will cause a page fault. If this bit is set, it
tells the hardware that the entry is valid, so that it may use the 20 bits
that represent a page address.
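
To make the role of the valid bit concrete, here is a small simulation of the two-level walk. It assumes the layout described above (4-byte entries, upper 20 bits for the page address, valid bit in bit 0) and models physical memory as a dictionary of 1024-entry lists. `PageFault`, `walk` and the hand-built tables are hypothetical names for the sketch, not anything from Windows or the processor manuals.

```python
VALID = 0x1              # bit 0 of an entry, as on x86
FRAME_MASK = 0xFFFFF000  # upper 20 bits hold the page frame address


class PageFault(Exception):
    """Raised where real hardware would trap to the OS."""


def read_entry(phys_mem, table_base, index):
    # Each table is modelled as a list of 1024 ints stored under its
    # physical base address.
    return phys_mem[table_base][index]


def walk(phys_mem, cr3, vaddr):
    pde = read_entry(phys_mem, cr3, (vaddr >> 22) & 0x3FF)
    if not pde & VALID:
        raise PageFault(hex(vaddr))
    pte = read_entry(phys_mem, pde & FRAME_MASK, (vaddr >> 12) & 0x3FF)
    if not pte & VALID:
        raise PageFault(hex(vaddr))
    return (pte & FRAME_MASK) | (vaddr & 0xFFF)


# Hand-built tables: directory at 0x1000, one page table at 0x2000.
phys_mem = {0x1000: [0] * 1024, 0x2000: [0] * 1024}
phys_mem[0x1000][1] = 0x2000 | VALID   # directory entry 1 -> page table
phys_mem[0x2000][3] = 0x7000 | VALID   # table entry 3 -> frame 0x7000

result = walk(phys_mem, 0x1000, 0x00403ABC)
```

Any address whose directory entry or table entry has the valid bit clear raises `PageFault` instead of returning an address, which is the point at which the OS takes over.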

Again, please note that I restricted my example to one specific case of the
x86. The process is SIMILAR in other cases. For example, the AMD-64 in
long mode uses four levels of translation in its page tables and each
entry is 8 bytes. The IA-64 uses 8 byte entries, and Windows uses 8KB pages
on it.

I hope this helps.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Naushahi [mailto:xxxxx@hotmail.com]
Sent: Wednesday, May 14, 2003 12:56 AM
To: NT Developers Interest List
Subject: [ntdev] Re: “Page Fault article” how H/W translate address


You are currently subscribed to ntdev as: xxxxx@osr.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

> Could you read it in detail to get the concept?
> Another thing: you say “address translation hardware”. Is there some
> physical hardware or program code through which the operating system and
> CPU interact for address translation? Please explain this.

To add a little to what Tony said and more directly address your point:

The hardware processor has some method (or maybe more than one method) of
taking a virtual address and determining where in physical memory that
address resides. The general method is to use what are called “page
tables”, which are arrays of partial pointers, that are indexed by pieces of
the original virtual address. The entries in the page tables are then added
together, ORed together, or used as indexes for further page table accesses,
until a final address is arrived at. At almost any point in this operation
a page table entry can indicate that it does not contain a valid address,
which will result in an address translation failure, commonly known as a
‘page fault’.

These tables can be very complicated, and the format is defined by the
hardware. However, the OS has to *build* these tables, because it is the
OS, and not the hardware, that knows where it is putting things in physical
memory. So the OS has to know as much about address translation as the
hardware does, and they have to agree on how it is done.

So the OS and hardware interact in translation. The OS knows where it is
putting things in memory. It then builds the page tables (in the format the
hardware wants) to say where the items live in memory. It then tells the
hardware where to find the page tables it built. Once the OS has told the
hardware about the tables, the hardware can use them to translate virtual
addresses to the real physical addresses.
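
The division of labour described above can be sketched in a few lines: one side builds the tables, the other side only reads them. This is a toy model under assumptions of my own (x86-style two-level tables, a dictionary standing in for physical memory, a trivial bump allocator). `ToyOS`, `map_page` and `hardware_translate` are hypothetical names, not Windows routines.

```python
VALID = 0x1
FRAME_MASK = 0xFFFFF000
PAGE_SIZE = 0x1000


class ToyOS:
    """Builds two-level tables in a dict that stands in for physical memory."""

    def __init__(self):
        self.phys_mem = {}
        self.next_free = PAGE_SIZE            # trivial bump allocator
        self.cr3 = self.alloc_table()         # "tell the hardware" the root

    def alloc_table(self):
        base = self.next_free
        self.next_free += PAGE_SIZE
        self.phys_mem[base] = [0] * 1024
        return base

    def map_page(self, vaddr, frame):
        """Record that the page containing vaddr lives in physical frame."""
        directory = self.phys_mem[self.cr3]
        pde_index = (vaddr >> 22) & 0x3FF
        if not directory[pde_index] & VALID:
            directory[pde_index] = self.alloc_table() | VALID
        table = self.phys_mem[directory[pde_index] & FRAME_MASK]
        table[(vaddr >> 12) & 0x3FF] = (frame & FRAME_MASK) | VALID


def hardware_translate(phys_mem, cr3, vaddr):
    """What the CPU does with the tables; returns None where it would fault."""
    pde = phys_mem[cr3][(vaddr >> 22) & 0x3FF]
    if not pde & VALID:
        return None
    pte = phys_mem[pde & FRAME_MASK][(vaddr >> 12) & 0x3FF]
    if not pte & VALID:
        return None
    return (pte & FRAME_MASK) | (vaddr & 0xFFF)


os_ = ToyOS()
os_.map_page(0x00403000, 0x00ABC000)
mapped = hardware_translate(os_.phys_mem, os_.cr3, 0x00403ABC)
unmapped = hardware_translate(os_.phys_mem, os_.cr3, 0x00500000)
```

Note that `hardware_translate` never decides where anything lives. It only follows what `map_page` wrote, which is the "agreement" between OS and hardware that the article sentence refers to.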

That is a VERY brief and shallow overview! Address translation can be very
complex, both in the hardware and in the OS. It is even more complex than
WDM driver design, though not by much.

Loren

And in the case of a page fault, the processor/OS gets the corresponding page
from the disk and loads it into physical memory. I would like to know the
mechanism behind this. Does this go as an interrupt for which the ‘Disk
Driver’ provides the ISR…

Giri.

The problem with trying to explain this area is really that different
hardware works differently. For example, the MIPS processor (on which Windows
once ran) did not have page tables in hardware - only a TLB. While this
might not seem to make a difference, the MIPS was the original target
processor, so it impacted the overall NT design.

The x86 is more complicated because you also have to go through the extra
step of computing the linear address. Windows mitigates that by using flat
addressing (they set up the selectors and then don’t change them), but the
CPU still supports this. Windows even USES it in a few cases.
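
The extra step is small enough to show directly. This sketch covers only the base-plus-offset addition and leaves out the limit and access-rights checks the CPU also performs. The function name is made up, and the nonzero base is an arbitrary value chosen for illustration.

```python
def linear_address(segment_base, offset):
    """x86 protected mode: linear = segment base + offset, wrapped to 32 bits.

    Limit and access-rights checks are omitted from this sketch.
    """
    return (segment_base + offset) & 0xFFFFFFFF


FLAT_BASE = 0x00000000          # flat model: the segment starts at address 0
flat = linear_address(FLAT_BASE, 0x00403ABC)

# A segment with a nonzero base; the base value is made up for the demo.
based = linear_address(0x7FFDE000, 0x18)
```

With a base of zero the offset and the linear address are the same number, which is why flat addressing makes the segmentation step invisible to most code.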

I’d argue that building a robust VM system is considerably more complex than
designing a WDM driver. Imagine just the nightmare of supporting hot
swappable memory!

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Loren Wilton [mailto:xxxxx@earthlink.net]
Sent: Wednesday, May 14, 2003 6:27 AM
To: NT Developers Interest List
Subject: [ntdev] Re: “Page Fault article” how H/W translate address


Giri,

This is completely inside the OS - the hardware attempts the translation and
if it fails it generates a page fault - then it is up to the OS to handle it
from there!

Processing a page fault is a very intensive task - the one routine that
handles this is thousands of lines of code, just to decide if the page fault
SHOULD be handled.

If a given fault should be handled (so the page table can be modified and
the faulting operation restarted) the OS needs to decide how to handle it.
There are numerous cases here, and doing them off the top of my head I risk
missing some, but here goes:

  • The page could actually BE valid, but the access is disallowed (e.g.,
    writing to a read-only page). The OS raises an exception, which of course
    will skip back up the stack PAST the interrupt frame. Hope your driver or
    application can handle the exception!

  • The page could be valid and the access disallowed, but the page is “copy on
    write”. The OS allocates a new physical page, copies the data contents of
    the current page to the new page and changes the page table to point to that
    new page (oh, and adjusts the other data structures internally, of
    course).

  • The page could be invalid but “demand zeroed”, in which case the OS
    allocates a zero-filled physical page and fixes up the page table to point
    to it.

  • The page could be invalid and the data contents stored in a paging file.
    In this case the OS uses the page table entry to figure out which paging
    file and which page within the paging file. The OS allocates a physical
    page and then reads the data contents in from the paging file.

  • The page could be invalid and the data contents backed by a section. In
    this case the OS will figure out where the data for the section is located.
    In most cases the section is file backed, so the OS will allocate a new
    physical page and initiate I/O from the file.
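
The five cases above can be laid out as a decision function. This is only a sketch of the list, not of the real Windows fault handler, which considers far more state. `Action`, `classify_fault` and every parameter are hypothetical, invented here to show the order of the checks.

```python
from enum import Enum, auto


class Action(Enum):
    RAISE_ACCESS_VIOLATION = auto()
    COPY_ON_WRITE = auto()
    DEMAND_ZERO = auto()
    READ_FROM_PAGING_FILE = auto()
    READ_FROM_SECTION = auto()


def classify_fault(valid, is_write, writable, copy_on_write, backing):
    """Pick one of the five cases listed above.

    backing is one of "zero", "pagefile" or "section" and only matters
    when the page is not valid. All parameters are hypothetical.
    """
    if valid:
        if is_write and not writable:
            if copy_on_write:
                return Action.COPY_ON_WRITE
            return Action.RAISE_ACCESS_VIOLATION
        raise ValueError("a valid page with a permitted access does not fault")
    if backing == "zero":
        return Action.DEMAND_ZERO
    if backing == "pagefile":
        return Action.READ_FROM_PAGING_FILE
    if backing == "section":
        return Action.READ_FROM_SECTION
    raise ValueError(f"unknown backing: {backing!r}")
```

For example, `classify_fault(True, True, False, True, "zero")` returns `Action.COPY_ON_WRITE`, while the same write to a page that is not copy-on-write returns `Action.RAISE_ACCESS_VIOLATION`.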

The OS “knows” where the data for different virtual addresses exist because
it maintains a “virtual address descriptor tree” (VAD tree) that describes
each range of virtual pages within the address space (note: there are
different rules for the OS part of the address space since there is only ONE
there and it is present in all process contexts).

If the OS must read data it calls the I/O Manager to do so (IoPageRead) and
provides it with an MDL describing the newly allocated physical memory. The
I/O Manager then builds an IRP_MJ_READ, notes this is paging I/O
(IRP_PAGING_IO and IRP_NOCACHE) and sends it to the owner of the file object
specified in the call to IoPageRead. Typically this is a file system
driver. File system drivers MUST have special handling for paging I/O, and
particularly paging FILE I/O. They then figure out where the data is
located on disk and call down to the block storage driver (whatever it is)
to retrieve the appropriate data.

Presumably, this eventually requires involvement from a disk driver. Not
all disk drivers are interrupt driven, although most are. But this is just
part of the normal storage stack.

One other odd note: paging I/O operations are called at IRQL APC_LEVEL
because the system needs to avoid additional reentrancy. This is to
mitigate the threat of stack overflows and reentrancy deadlocks.

Unfortunately, I can go on for days about VM (and some of the list
subscribers have heard some of my lectures on this topic area.) It is just a
complex topic.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Seshagiri Babu K.V. [mailto:xxxxx@sasken.com]
Sent: Wednesday, May 14, 2003 6:46 AM
To: NT Developers Interest List
Subject: [ntdev] Re: “Page Fault article” how H/W translate address


> I’d argue that building a robust VM system is considerably more complex
> than designing a WDM driver. Imagine just the nightmare of supporting hot
> swappable memory!

I did this a couple years ago on Whistler. Also hot-swappable APICs and
part of hot swappable CPUs, but at the time there wasn’t quite enough
support available in NT to do that. We had a whole program for
hot-swappable most-everything, like we’ve done on mainframes for the better
part of 25 years now. The Kernel guys were not real happy when they saw
some of the HAL hacks I had to do to make that stuff work. I wasn’t either,
but I didn’t have access to the source to do it right.

Loren