Dispatcher objects from a paged pool

Rotor_Clix · March 18, 2011, 7:16pm

Never allocate dispatcher objects from a paged pool. If you do, it will cause occasional system bugchecks.
Why?

“Because dispatcher objects are manipulated by the dispatcher at interrupt request levels (IRQLs) greater than or equal to DISPATCH_LEVEL,”
http://www.osr.com/crash_analysis.pdf

What manipulates objects?
Dispatcher itself. Right?

The dispatcher also brings back paged-out memory, doesn’t it? So even if a dispatcher object is paged out, it should not be a problem, because the component doing the work is the dispatcher itself.

I don’t understand why allocating dispatcher objects from paged pool causes a problem.

Doron_Holan · March 18, 2011, 7:20pm

Because dispatcher objects are referenced at DISPATCH_LEVEL, and you cannot handle a page fault at DISPATCH_LEVEL. The same rule applies to dispatcher objects on the stack when you wait with UserMode (which allows the stack to be paged out).

d
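
To make the rule concrete, here is a minimal user-mode sketch in C. It is a toy model, not kernel code: `toy_object`, `toy_pool`, and `toy_signal` are hypothetical names invented for illustration, and the IRQL constants simply follow the usual x86/x64 numbering (only their ordering matters). It models one thing: touching an object whose page is not resident is recoverable below DISPATCH_LEVEL and fatal at or above it.

```c
#include <stdbool.h>

/* IRQL numbering as on x86/x64 Windows; only the ordering matters here. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical pool kinds and object layout for this model. */
typedef enum { TOY_POOL_NONPAGED, TOY_POOL_PAGED } toy_pool;

typedef struct {
    toy_pool pool;      /* where the object was allocated */
    bool     resident;  /* is its page currently in RAM? */
    bool     signaled;  /* toy stand-in for dispatcher state */
} toy_object;

typedef enum { TOY_OK, TOY_PAGED_IN, TOY_BUGCHECK } toy_result;

/* Models code that signals the object while running at `irql`. */
toy_result toy_signal(toy_object *obj, int irql)
{
    /* Nonpaged pool is always resident, so no fault can occur. */
    bool present = (obj->pool == TOY_POOL_NONPAGED) || obj->resident;

    if (!present) {
        /* This access is a page fault: fatal at DISPATCH_LEVEL or above. */
        if (irql >= DISPATCH_LEVEL)
            return TOY_BUGCHECK;
        /* Below DISPATCH_LEVEL the page can be brought back in. */
        obj->resident = true;
        obj->signaled = true;
        return TOY_PAGED_IN;
    }

    obj->signaled = true;
    return TOY_OK;
}
```

In this model a paged object signaled at DISPATCH_LEVEL yields `TOY_BUGCHECK` only while its page is out; if the page happens to be resident the same call returns `TOY_OK`. That is one way to read why the bugchecks are described as occasional.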

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Rotor_Clix · March 18, 2011, 7:31pm

Thanks Mr. Holan, but which part of the kernel handles page faults?
Isn’t the answer the thread dispatcher?

Tim_Roberts · March 18, 2011, 7:41pm

xxxxx@gmail.com wrote:

> Thanks Mr. Holan, but which part of the kernel handles page faults?
> Isn’t the answer the thread dispatcher?

What would lead you to that conclusion? The thread dispatcher handles
the scheduling of threads. Page faults are handled by the page fault
handler, in the memory manager. A page fault does not require a new
thread; it’s handled on the faulting thread.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Doron_Holan · March 18, 2011, 7:44pm

It doesn’t matter who handles the fault; what matters is the IRQL when the fault occurred. If there is a bug in the kernel that touches paged pool at DISPATCH_LEVEL outside of the page fault handler, it is still a bugcheck. It is not as if the page fault handler can special-case faults.

d

dent from a phine with no keynoard

Rotor_Clix · March 18, 2011, 8:39pm

“What would lead you to that conclusion?” Tim Roberts

1-) Never allocate dispatcher objects from a paged pool
2-) dispatcher objects are manipulated by the dispatcher at (…) DISPATCH_LEVEL
3-) you cannot handle a page fault at dispatch

I think I just combined three pieces of information. So I concluded that the page fault handler must run at DISPATCH_LEVEL, and the dispatcher runs at DISPATCH_LEVEL, so the dispatcher handles page faults. But this is wrong.

I just searched and saw that page faults are handled at APC_LEVEL.

Is this right? For example, the scheduler selects a thread, but it is paged out, so it calls (or selects?) the page fault handler; the page fault handler brings in its pages, and after that the scheduler makes the thread run. Right?

If that is right, a page fault could also bring back a paged-out dispatcher object. (This is the confusion that I am trying to resolve.)

Thanks Doron Holan and Tim Roberts

Tim_Roberts · March 18, 2011, 8:52pm

xxxxx@gmail.com wrote:

> “What would lead you to that conclusion?” Tim Roberts
>
> 1-) Never allocate dispatcher objects from a paged pool
> 2-) dispatcher objects are manipulated by the dispatcher at (…) DISPATCH_LEVEL
> 3-) you cannot handle a page fault at dispatch
>
> I think I just combined three pieces of information. So I concluded that the page fault handler must run at DISPATCH_LEVEL, and the dispatcher runs at DISPATCH_LEVEL, so the dispatcher handles page faults. But this is wrong.

I see your reasoning. Yes, “dispatch_level” is used in many cases other
than by the dispatcher.

> Is this right? For example, the scheduler selects a thread, but it is paged out, so it calls (or selects?) the page fault handler; the page fault handler brings in its pages, and after that the scheduler makes the thread run. Right?

Not exactly. The kernel thread data structures themselves are never
paged out. The process memory space might be paged out, but the
scheduler doesn’t worry about that. It merely selects the thread and
starts it executing. If the very first instruction that thread runs
happens to be paged out, then the process will immediately get a
user-mode page fault (from the processor chip itself). The page fault
handler will read in the page, and the thread continues on until it
touches another page that is paged out.

In all but some very exceptional cases, paging can simply be ignored.
It is something that happens magically in the background. You can
pretend that memory is infinite. The only time this model fails is when
you are at a raised IRQL.

> If that is right, a page fault could also bring back a paged-out dispatcher object. (This is the confusion that I am trying to resolve.)

Page faults cause a processor interrupt. The page fault handler runs as
the interrupt request handler for the page fault exception. The
conflict with raised IRQL code is that you do not want it to be
interrupted. The kernel often needs to run through its lists of pending
dispatcher objects, and that process also runs at a raised IRQL (to
prevent interrupts). Thus, there is a requirement that dispatcher
objects never be paged out.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

anton_bassov · March 18, 2011, 9:58pm

> So I concluded page-fault handler must run at dispatch_level and dispatcher runs

> at dispatch_level so dispatcher handle page faults. But this is wrong.

IRQL >= DISPATCH_LEVEL is just the Windows notion of atomic context, which means that you cannot make blocking calls at elevated IRQL. If a page is swapped out to disk, the failing thread has to be put to sleep while the page is being brought back into RAM, and this is the kind of thing you cannot do at elevated IRQL. This is why you cannot cause page faults at elevated IRQL. Simple, ugh…

Anton Bassov

Rotor_Clix · March 19, 2011, 8:06am

Thanks for replies. I read them carefully.

Meanwhile I searched and read this:
http://www.osronline.com/showthread.cfm?link=132420

"Page fault handler (INT 15) gets raised whenever page is not in memory (Present bit is off), and it gets raised at any IRQL *in context of the failing thread* - page faults are always handled in context of the failing thread regardless of IRQL, because they are just exceptions "

I think this is valid for all types of exceptions. So exceptions preempt all kinds of interrupts (I mean all kinds of code, running at any IRQL).

So what this documentation, http://msdn.microsoft.com/en-us/library/ms810029.aspx, says in its description of APC_LEVEL (asynchronous procedure calls and “page faults”) is not entirely correct, and is misleading. Right?

Anton Bassov :“If it above APC_LEVEL, you get a bugcheck with PAGE_FAULT_IN_NONPAGED_AREA code, because it does not make sense for page fault handler to even proceed to examining VADs”
Can you please explain that part?

anton_bassov · March 19, 2011, 8:53am

> Anton Bassov :"If it above APC_LEVEL, you get a bugcheck with

> PAGE_FAULT_IN_NONPAGED_AREA code, because it does not make sense for page
> fault handler to even proceed to examining VADs" Can you please explain that part?

If a page fault is raised, it means that the failing thread accessed an address that is either paged out to disk or just plainly invalid. To find out whether the address is valid, the VADs have to be examined. However, if the thread is running at elevated IRQL, it cannot be put to sleep, so even if the address itself is valid, the system still has to bugcheck. Therefore, it just does not make sense to examine the VADs if you are about to bugcheck anyway. What may be possibly unclear here???
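
The ordering described above can be sketched as a small user-mode model in C. This is an illustration under stated assumptions, not the memory manager's actual code: `toy_vad`, `toy_vad_contains`, and `toy_handle_fault` are hypothetical names, a "VAD" is reduced to a plain address range, and the `lookups` counter exists only to show when the ranges get examined.

```c
#include <stdbool.h>
#include <stddef.h>

/* IRQL numbering as on x86/x64 Windows; only the ordering matters here. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical stand-in for a VAD: one valid address range [start, end). */
typedef struct { unsigned long start, end; } toy_vad;

typedef enum { FAULT_RESOLVED, FAULT_INVALID_ADDRESS, FAULT_BUGCHECK } fault_outcome;

/* Walks the ranges; *lookups counts how many times a walk was needed. */
static bool toy_vad_contains(const toy_vad *vads, size_t n,
                             unsigned long addr, int *lookups)
{
    ++*lookups;
    for (size_t i = 0; i < n; i++)
        if (addr >= vads[i].start && addr < vads[i].end)
            return true;
    return false;
}

fault_outcome toy_handle_fault(int irql, const toy_vad *vads, size_t n,
                               unsigned long addr, int *lookups)
{
    /* Above APC_LEVEL the thread cannot be put to sleep, so the outcome
       is a bugcheck whether or not the address is valid. No lookup. */
    if (irql > APC_LEVEL)
        return FAULT_BUGCHECK;

    if (!toy_vad_contains(vads, n, addr, lookups))
        return FAULT_INVALID_ADDRESS;

    /* Valid address: the page would be brought in, possibly after blocking. */
    return FAULT_RESOLVED;
}
```

Calling `toy_handle_fault` with `DISPATCH_LEVEL` and an address inside a valid range returns `FAULT_BUGCHECK` and leaves `lookups` untouched, which is the point: the validity check is skipped because its answer could not change the result.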

Anton Bassov

Aditya_Shrivastava · March 21, 2011, 5:36pm

@Anton,

>However, if thread runs at elevated IRQL it cannot be put to sleep - even if the address in itself is valid the system still has to bugcheck.

And why does it need to put that thread to sleep, considering the execution is already halted and will only resume after the fault is resolved?

Aditya

Alex_Grig · March 21, 2011, 7:30pm

"And why does it need to put that thread to sleep, considering the execution is
already halted "

Because the code that needs to touch the dispatcher object (for example, KeSetEvent) may be running in the context of a DPC. It will also be running under a spinlock. If it is put to sleep while holding that spinlock, the result is a massive deadlock.

anton_bassov · March 21, 2011, 7:54pm

> And why does it need to put that thread to sleep, considering the execution is already halted

> and will only resume after the fault is resolved?

It is not halted until a blocking call is made. Don’t forget that the page fault handler runs in the context of the failing thread. Therefore, if the page contents have to be brought into RAM from disk, the page fault handler has to put the failing thread to sleep, and this is something that it cannot do at elevated IRQL. In some cases blocking is, indeed, unnecessary: for example, if the page is on the standby list (i.e. physically still in RAM), or if a zeroed page is sufficient for resolving the fault, the whole thing can be done without blocking. However, consider the extra degree of uncertainty and the number of additional heisenbugs that allowing “nonblocking” page faults at elevated IRQL would imply…
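
A rough user-mode sketch in C of the distinction drawn above, with hypothetical names throughout (`toy_page_state`, `toy_fault_must_block`, `toy_fault_allowed`). It is only a model of the argument: the first function says which kinds of fault need the thread to wait, and the second shows the rule actually applied, which looks at IRQL alone so that the outcome does not depend on where the page happens to be at that instant.

```c
#include <stdbool.h>

/* IRQL numbering as on x86/x64 Windows; only the ordering matters here. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical page states, following the cases named above. */
typedef enum {
    PAGE_ON_STANDBY_LIST,  /* contents are physically still in RAM */
    PAGE_DEMAND_ZERO,      /* a zeroed page is enough to resolve the fault */
    PAGE_ON_DISK           /* contents must be read back from disk */
} toy_page_state;

/* Only a read from disk forces the faulting thread to wait. */
bool toy_fault_must_block(toy_page_state s)
{
    return s == PAGE_ON_DISK;
}

/* The rule as applied: IRQL alone decides. The page state is
   deliberately ignored, so the result is the same every time. */
bool toy_fault_allowed(int irql, toy_page_state s)
{
    (void)s;
    return irql < DISPATCH_LEVEL;
}
```

With this model, `toy_fault_allowed(DISPATCH_LEVEL, PAGE_ON_STANDBY_LIST)` is false even though `toy_fault_must_block(PAGE_ON_STANDBY_LIST)` is also false. A rule keyed to the page state instead would make the same code succeed or bugcheck depending on memory pressure, which is the source of heisenbugs the post warns about.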

Anton Bassov