Few basic doubts related to NT internals

OSR_Community_User · June 10, 2002, 9:42am

I have the following doubts:

1. Why do we have two separate threads named “Modified Page Writer” and
“Mapped Page Writer”? I understand that these two threads do two slightly
different tasks, but why couldn’t a single thread do both? After all, both
tasks involve flushing data from volatile physical memory to secondary
storage.
2. Where exactly is the “copy on write” flag located? Is it represented by
some flag in the PFN data structure, in the Page Table Entry (PTE), or in
the Virtual Address Descriptor (VAD)?
3. Why is no page fault allowed at or above DISPATCH_LEVEL IRQL?
4. The PFN entry for a page in a working set contains two fields named
“Reference count” and “Share count”. What is the difference between the two?

Regards,
Prashant S

Dan_Partelly · June 10, 2002, 10:50am

Both of those dedicated threads serve one purpose only: flushing modified page
frames to secondary storage, thus ensuring that a certain number of page
frames is always available for reuse. Logic says (although I did not check
in WinDbg) that two threads are required because the mapped page writer,
while flushing to secondary storage, can itself cause page faults, and those
need free memory for the inpage operations to complete. If there are no free
pages to satisfy the request and the only writer thread is the one that is
blocked, the system would deadlock. The simplest design is two threads, so
that the mapped page writer can block without any problem. Again, this is an
educated guess; it was not verified by tracing the code.

The page fault handler uses bit 9 in a hardware PTE (IA32 architecture; the
bit has no architectural meaning to the CPU, it is one of the bits left
available to software) to determine if it must copy the page before a write.
However, the whole COW issue is a bit more complex, and is managed through
other core OS structures as well.

Other useful bits are:

bit 10: Prototype PTE
bit 11: Transition PTE
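
To make the bit numbers above concrete, here is a minimal C sketch of how such a PTE could be decoded. It assumes a 32-bit non-PAE x86 PTE. Bits 0 (present) and 1 (read/write) and the frame number in bits 12 to 31 are architectural; the meanings of bits 9 to 11 are simply the ones given in this post, and all macro and function names are made up for illustration, not copied from any Windows header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 bits. */
#define PTE_VALID         (1u << 0)   /* present */
#define PTE_WRITE         (1u << 1)   /* 0 = read-only, 1 = read/write */
/* Software-defined bits, numbered as in the post (assumption). */
#define PTE_COPY_ON_WRITE (1u << 9)
#define PTE_PROTOTYPE     (1u << 10)
#define PTE_TRANSITION    (1u << 11)
#define PTE_FRAME_SHIFT   12          /* frame number lives in bits 12..31 */

/* Hypothetical helper: build a valid, read-only, copy-on-write PTE. */
static inline uint32_t pte_make_cow(uint32_t frame)
{
    assert(frame < (1u << 20));       /* only 20 bits of frame number fit */
    return (frame << PTE_FRAME_SHIFT) | PTE_VALID | PTE_COPY_ON_WRITE;
}

/* True if a write fault on this PTE should be resolved by copying the page:
 * the page is present, the hardware write bit is clear, and bit 9 is set. */
static inline bool pte_needs_copy_on_write(uint32_t pte)
{
    return (pte & PTE_VALID) != 0 && (pte & PTE_WRITE) == 0 &&
           (pte & PTE_COPY_ON_WRITE) != 0;
}

/* Extract the page frame number from a valid PTE. */
static inline uint32_t pte_frame_number(uint32_t pte)
{
    return pte >> PTE_FRAME_SHIFT;
}
```

Under these assumptions a copy-on-write page is kept read-only in hardware (bit 1 clear), so the first write faults and the handler can look at bit 9 to decide whether to copy the page.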

The main reason why an inpage operation has to run at IRQL < DISPATCH_LEVEL
is that the thread must block, waiting for the inpage operation (bringing
the page back from secondary storage) to complete. Blocking for a nonzero
interval at DISPATCH_LEVEL or higher IRQL is impossible. (DISPATCH_LEVEL is
the IRQL at which the OS thread dispatcher operates.)
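
The rule can be sketched as a small user-mode model. The IRQL constants use the same values as the x86 DDK headers (PASSIVE_LEVEL 0, APC_LEVEL 1, DISPATCH_LEVEL 2); the ToyAccess enum and the toy_* functions are hypothetical names, and the real fault path is of course far more involved.

```c
#include <assert.h>
#include <stdbool.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef enum ToyAccess {
    TOY_ACCESS_OK,        /* page was resident, no fault taken */
    TOY_ACCESS_PAGED_IN,  /* fault taken, thread waited for the inpage */
    TOY_ACCESS_FATAL      /* fault taken where waiting is impossible */
} ToyAccess;

/* A thread may block only below DISPATCH_LEVEL. */
static inline bool toy_can_wait(int irql)
{
    return irql < DISPATCH_LEVEL;
}

/* Model of touching one page of memory at a given IRQL. */
static inline ToyAccess toy_touch_page(bool resident, int irql)
{
    assert(irql >= PASSIVE_LEVEL);
    if (resident) {
        return TOY_ACCESS_OK;          /* no page fault, IRQL is irrelevant */
    }
    /* Not resident: resolving the fault means waiting for disk I/O. */
    return toy_can_wait(irql) ? TOY_ACCESS_PAGED_IN : TOY_ACCESS_FATAL;
}
```

This is also why code running at DISPATCH_LEVEL or above must touch only nonpaged (resident) memory.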

The reference count indicates whether any PTE refers to this page in the PFN
database. Valid page frames have a nonzero reference count, which indicates
that the page is actively used. The reference count is decremented
whenever a PTE no longer points to it. When the reference count reaches 0,
the page frame is considered no longer in use and is automatically put on
one of four different MM lists that hold unused pages (bad page list, free
page list, modified page list, and standby page list).
Usually it is incremented when a page is initially put into a working set,
and later when the page is locked in memory for any purpose, usually I/O. The
reference count is decremented when the share count reaches 0, or when the
page is unlocked.

I think the share count indicates whether a process working set references
this page. When the share count is 0, the page in question is no longer
owned by a working set.

I’d like Tony Mason to comment a bit on this last point, if he reads this
post, and tell me if I’m right here.

OSR_Community_User · June 10, 2002, 11:06am

> 1. Why do we have 2 separate threads named “Modified Page writer” and
> “Mapped Page writer”? I understand that these two threads do two slightly
> separate tasks but why could not we have a single thread doing both the
> tasks? After all, both tasks involve flushing data from volatile physical
> memory to the secondary storage.

I guess there is a chance of overload with a single thread; that is why, by
default, the work is handled in two separate threads (clean).

> 2. Where exactly is the “copy on write” flag located? Is it
> represented by some flag in the PFN data structure, or some flag in the
> Page Table Entry (PTE) or in the Virtual Address Descriptor (VAD)?

In the PTE there is a bit called Read/Write (bit 1):
0 - Read only
1 - Read/Write

NT uses this as “copy on write”.

> 3. Why no page fault is allowed above IRQL_DISPATCH_LEVEL?

The page fault handler and the scheduler run at DISPATCH_LEVEL. If
something happens at this level itself, it cannot switch to the page
fault handler to rectify the fault.

> 4. PFN for a page in working set contains two entries named “Reference
> count” and “Share count”. What is the difference in these two?

No idea, I will have to look it up.

I hope somebody points it out if any of this is wrong.

Regards,
Satish K.S

OSR_Community_User · June 10, 2002, 11:12am

> 2.
> The page fault handler uses bit 9 in a hardware PTE (IA32 architecture,
> the bit is not CPU architectural, but software “overloaded”) to determine
> if it must copy the page before a write. However, the whole COW issue is a
> bit more complex, and is managed through other core OS structures as well.
>
> Other useful bits are
>
> bit 10: Prototype PTE
> bit 11: Transition PTE

I have a small doubt.

If I enable the write bit in the PTE (bit 1) and then write something, how
will the CPU generate a fault?

Regards,
Satish K.S

Dan_Partelly · June 10, 2002, 11:13am

> I guess there is a chance of overload in case of 1 thread, that’s why
> by default it is handled in 2 different separate threads (clean)

Nope.

> In PTE, there is a bit called Read/Write (1st bit)
> 0 - Read only
> 1 - Read/Write

The implementation is based on page-level protection, but you are not right
about this one either.

> this case if something happens in this level itself it can’t switch to
> page fault handler to rectify the fault.

Something happens indeed. The question was “why”. I bet he already knew
that something happens.

----- Original Message -----
From: “int3”
To: “NT Developers Interest List”
Sent: Monday, June 10, 2002 6:07 PM
Subject: [ntdev] Re: Few basic doubts related to NT internals

OSR_Community_User · June 10, 2002, 5:28pm

> 1. Why do we have 2 separate threads named “Modified Page writer” and
> “Mapped Page writer”? I understand that these two threads do two slightly
> separate tasks but why could not we have a single thread doing both the
> tasks? After all, both tasks involve flushing data from volatile physical
> memory to the secondary storage.

Yes, and there is also Cc’s lazy writer (not a thread, but lots of
ExQueueWorkItem callbacks).
The reason is that they are for different tasks.
The lazy writer is there to shrink the window during which the cache is
dirty and a crash would be disastrous.
The Mapped Page Writer is used to flush user mappings that were mapped with
write access; it has the same intent as the lazy writer, but does this in a
different way.
The Modified Page Writer is for pagefiles; its goal is just to increase
the number of available physical pages.

> 2. Where exactly is the “copy on write” flag located? Is it
> represented by some flag in the PFN data structure, or some flag in the
> Page Table Entry (PTE) or in the Virtual Address Descriptor (VAD)?

One of the “available for OS use” flags in the PTE.

> 3. Why no page fault is allowed above IRQL_DISPATCH_LEVEL?

Because resolving a page fault requires a wait for the inpage to complete,
and waiting is not allowed at DISPATCH_LEVEL.

> 4. PFN for a page in working set contains two entries named “Reference
> count” and “Share count”. What is the difference in these two?

Share count is the number of valid PTEs pointing to it.
Reference count is the number of MDL-based locks on it.

Max

Dan_Partelly · June 10, 2002, 9:56pm

> Share count is number of valid PTEs pointing to it.
> Reference count is number of MDL-based locks on it.
>
> Max

Are you sure about this? Can you point me to the exact disassembly range
that made you conclude this? I think locking with MmProbeAndLockPages
increments the reference count and not the share count. I really think
you’re wrong about this, Max.

Regards, Dan

----- Original Message -----
From: “Maxim S. Shatskih”
To: “NT Developers Interest List”
Sent: Monday, June 10, 2002 11:52 PM
Subject: [ntdev] Re: Few basic doubts related to NT internals

Dan_Partelly · June 10, 2002, 10:01pm

Sorry Max, I misunderstood what you wanted to say. We were both talking
about the same thing; my understanding of the English was wrong here.

OSR_Community_User · June 10, 2002, 10:09pm

> > Share count is number of valid PTEs pointing to it.
> > Reference count is number of MDL-based locks on it.
> >
> > Max
>
> Are you sure about this? Can you point me to the exact disassembly
> range which made you conclude this?

I can.

> I think locking with MmProbeAndLockPages
> increments the ReferenceCount and not the share count.

And this is what I said. Reference count is the number of
MmProbeAndLockPages locks on the page, plus, IIRC, 1 for all share
counts together. Share count is the number of PTEs. I don’t remember
offhand whether system PTEs from MmMapLockedPages touch the share count;
IIRC they do not.

Max
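
The bookkeeping described in this post can be sketched as a toy model. It follows the description above (share count = number of valid PTEs; reference count = number of locks, plus one while the share count is nonzero) and has not been checked against the kernel; the ToyPfn type and the toy_* functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyPfn {
    unsigned share_count;      /* valid PTEs currently mapping the frame */
    unsigned reference_count;  /* locks, plus 1 while share_count > 0 */
} ToyPfn;

/* A PTE starts pointing at the frame. */
static inline void toy_map_pte(ToyPfn *p)
{
    if (p->share_count++ == 0) {
        p->reference_count++;  /* first mapping holds one reference */
    }
}

/* A PTE stops pointing at the frame. */
static inline void toy_unmap_pte(ToyPfn *p)
{
    assert(p->share_count > 0);
    if (--p->share_count == 0) {
        p->reference_count--;  /* last mapping drops that reference */
    }
}

/* Models a lock such as the one taken by MmProbeAndLockPages. */
static inline void toy_lock(ToyPfn *p)
{
    p->reference_count++;
}

static inline void toy_unlock(ToyPfn *p)
{
    assert(p->reference_count > 0);
    p->reference_count--;
}

/* With no references left, the frame may move to one of the page lists. */
static inline bool toy_is_unused(const ToyPfn *p)
{
    return p->reference_count == 0;
}
```

In this model a page that was unmapped from every working set but is still locked for I/O keeps a reference count of 1, so it cannot be reused until the lock is released.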

OSR_Community_User · June 11, 2002, 12:34am

> 2. Where exactly is the “copy on write” flag located? Is it
> represented by some flag in the PFN data structure, or some flag in the
> Page Table Entry (PTE) or in the Virtual Address Descriptor (VAD)?

Let me go back to this once again.

Non-shared physical pages:
PDE->PTE->PFN->PTE

The Page Table Entry (PTE) points to the Page Frame Number (PFN) entry
(which has its own structure and is mostly defined by the hardware itself,
e.g. x86). If the page is not shared, this structure points back to the
Page Table Entry (PTE) itself: in the PFN structure there is a field which
holds the back pointer to the PTE.

Shared physical pages:
PDE->PTE->PFN->PPTE

The Page Table Entry (PTE) points to the Page Frame Number (PFN) entry. If
the page is shared, the PFN entry points to a PPTE (Prototype PTE) instead
of the PTE. These PPTE entries are maintained by the VMM. A PPTE has the
exact format of a hardware PTE. When a page fault occurs, the VMM detects
whether it is copy-on-write or not using the PPTE flag, which is located in
the PFN structure itself, and acts accordingly.

The CPU does not care as long as it gets the value in the back pointer field
of the PFN; it can be either a PTE or a PPTE.
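
The two cases can be sketched as a toy PFN entry with a back pointer and a flag saying which kind of PTE it refers to. The field and function names are hypothetical and only mirror the description in this post; in this sketch the entry is a software structure that a fault handler would consult, while the hardware page walk itself stops at the PDE and PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyPte;

typedef struct ToyPfnEntry {
    ToyPte *pte_address;    /* back pointer: owning PTE or prototype PTE */
    bool    prototype_pte;  /* true when pte_address is a prototype PTE */
} ToyPfnEntry;

/* Non-shared page: the entry points back at the process's own PTE. */
static inline ToyPfnEntry toy_pfn_private(ToyPte *pte)
{
    assert(pte != NULL);
    ToyPfnEntry e = { pte, false };
    return e;
}

/* Shared page: the entry points at the prototype PTE instead. */
static inline ToyPfnEntry toy_pfn_shared(ToyPte *prototype)
{
    assert(prototype != NULL);
    ToyPfnEntry e = { prototype, true };
    return e;
}

/* What a fault handler would look up for this frame. */
static inline const ToyPte *toy_controlling_pte(const ToyPfnEntry *e,
                                                bool *is_prototype)
{
    *is_prototype = e->prototype_pte;
    return e->pte_address;
}
```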

Hope this helps a bit.

Regards,
Satish K.S