PASSIVE_LEVEL code while paging file is unavailable

In order for migration and save/restore to work with my xen drivers,
they need to be able to disconnect from the ‘backend’ devices (disk and
network) on save and then reconnect on resume.

The teardown code to do this runs at PASSIVE_LEVEL, so obviously at some
point during the suspend operation I will have disconnected the scsiport
controller that contains the operating system and the paging file.
Anything that tries to swap after that point is therefore going to hang
until the restore is complete.

If all the code that my drivers want to run is currently running in
nonpaged memory, does it matter that other code will hang on pagefile
access? In other words, will the fact that a pagefile operation is
pending somewhere stop my code from running, or only the thread waiting
for the pagefile?

What I’m finding is that most of the time it works, but occasionally it
hangs during the suspend. I’m not sure if it is a result of the above
occurring, or an unrelated bug in my code. The hang happens because a
call to KeWaitForSingleObject never returns, even though the event being
waited on definitely gets set.

I assume I won’t have this problem if I run at DISPATCH_LEVEL, but that
would involve a fair amount of work so I’d like to be sure about what is
going on… any ideas?

Thanks

James

> What I’m finding is that most of the time it works, but occasionally it hangs during the suspend. I’m not sure if it is a result of the above occurring, or an unrelated bug in my code. The hang happens because a call to KeWaitForSingleObject never returns, even though the event being waited on definitely gets set.

Actually… it turns out that the event was definitely not getting set
(typo == setting wrong event). It worked most of the time because the
thing being waited for was almost always already complete so
KeWaitForSingleObject wasn’t getting called.

Anyway, my question still stands… are threads running at PASSIVE_LEVEL
that don’t access the disk in any way (eg paging file or other on-disk
content) going to be blocked if the disk becomes unavailable?

Thanks

James

> are threads running at PASSIVE_LEVEL that don’t access the disk in any way (eg paging file or other on-disk content) going to be blocked if the disk becomes unavailable?

Well, I am afraid you are the only one who knows the answer to this question - it depends on what your code does. For example, if it waits on an event/mutex that is supposed to get signaled by some other thread that relies upon the pagefile, what do you think will happen if this "other thread" blocks??? Who is going to signal the event then???

Anton Bassov

> > are threads running at PASSIVE_LEVEL that don’t access the disk in any way (eg paging file or other on-disk content) going to be blocked if the disk becomes unavailable?
>
> Well, I am afraid you are the only one who knows the answer to this question - it depends on what your code does. For example, if it waits on an event/mutex that is supposed to get signaled by some other thread that relies upon the pagefile, what do you think will happen if this "other thread" blocks??? Who is going to signal the event then???

You didn’t read the first email did you? None of my code needs the
pagefile. There is no ‘other thread that needs the pagefile’ that my
code is waiting on.

With the disk unavailable, more and more of the system is eventually
going to be stalled waiting for the pagefile/registry/other disk stuff.
My question is, will that indirectly affect the operation of my code
somehow? Eg would Windows decide not to service a thread because another
(completely unrelated) thread is waiting on the pagefile? Could I create
a new thread in the above situation (again - my thread doesn’t need the
pagefile as far as I’m aware, but maybe there is some ‘under the covers’
requirement for it)?

The more I think about it, the more I think the answer is ‘no, my code
will run just fine even if the rest of the pageable parts of the system
lock up around it’, especially as most of the drivers in question are in
the paging path* so it wouldn’t make sense for them to be able to get
stalled. I probably just need to re-order my disconnection sequence so
that the non-paging-path drivers are all completely disconnected before
the paging-path drivers are disconnected…

(* I think that’s the right term - I mean that they are required to
support paging at all)

Once again, writing it out has helped me put my thoughts in order - it
just happened this time after I hit ‘send’ on the original email :-)

James

You’re right, strictly speaking. Your thread will run as long as it’s runnable… if it doesn’t pagefault anything from the paging file, it won’t wait on the paging file.

so…

the answer to this is NO. As long as the thread, and anything upon which the thread depends, doesn’t need resources from the paging file.

Peter
OSR

> The more I think about it, the more I think the answer is ‘no, my code will run just fine even if the rest of the pageable parts of the system lock up around it’, especially as most of the drivers in question are in the paging path* so it wouldn’t make sense for them to be able to get stalled. I probably just need to re-order my disconnection sequence so that the non-paging-path drivers are all completely disconnected before the paging-path drivers are disconnected…

My experience is that if the system disk/pagefile becomes inaccessible,
page-fault disk I/Os time out (the default is a small number of seconds)
or fail, and eventually (maybe instantly, maybe not for many minutes) a
page-fault I/O that matters fails and the system blue screens with a
crash like “kernel in-page error”.

Jan

> You didn’t read the first email did you?

I did, but it does not provide enough information that is needed for giving you a definite reply. More on this below…

> None of my code needs the pagefile. There is no ‘other thread that needs the pagefile’ that my code is waiting on.

Yes, but what about API calls that it makes??? This is what I meant in my previous reply - after all, the scenario that I had described in it may well happen behind the scenes, rather than directly in your code.
The above statement implies that your code does not make any API calls that may block for this or that reason behind the scenes. In other words, your code is able to run at elevated IRQL without any problem. Is that the case here???

Anton Bassov

Peter is right. It’s also worth pointing out that this same condition
exists in regular WDM drivers which are part of the paging stack and its
parents during a transition to any sleep state. After the power state
callback is invoked, all future code in that driver will be invoked with the
disks potentially spun down and in a state where they can’t be spun up.
Lots of it can run at PASSIVE_LEVEL, though the rules have to be worked out
very carefully so that you don’t call anything that will cause a page fault.
You can tell whether a function can be called in this state by checking
whether it is documented as callable at either PASSIVE_LEVEL or
DISPATCH_LEVEL. As an example, KeWaitForSingleObject can be called at
DISPATCH_LEVEL (though at that IRQL the timeout must be 0), so you can
know for certain that KeWaitForSingleObject doesn’t cause a page fault.
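A sketch of that zero-timeout form (a WDM fragment, not a complete driver; the event name is a placeholder):

```c
/* Safe at DISPATCH_LEVEL: a zero timeout turns the wait into a poll,
 * so KeWaitForSingleObject cannot block or touch pageable data.      */
LARGE_INTEGER zero;
zero.QuadPart = 0;

NTSTATUS status = KeWaitForSingleObject(&SomeEvent,   /* placeholder event */
                                        Executive,
                                        KernelMode,
                                        FALSE,        /* not alertable     */
                                        &zero);       /* timeout must be 0 */
if (status == STATUS_TIMEOUT) {
    /* Not signaled yet; re-check later (e.g. from a timer DPC) rather
     * than blocking, since a blocking wait is illegal at DISPATCH_LEVEL. */
}
```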

Typically, WDM drivers in this situation just handle their power IRPs in
code that can run at DISPATCH_LEVEL. Doron and I thought long and hard
about this when we were laying out the PnP/Power stuff in what became KMDF.
We didn’t want to force all PnP/Power callbacks to DISPATCH_LEVEL in these
drivers, but the rules were definitely hard to explain as a result. I even
labeled all the special states in the diagram of the power state machine
that drives KMDF with little lightning bolts to remind us that we could
get shocked. I believe that the DDK has some text to try to explain this
contract on the PnP/Power callbacks, but I knew that no option we chose
would be straightforward. Fortunately, the issue rarely comes up because
we at Microsoft write most of the drivers which are parents or grandparents
of the paging stacks. Filters, though, can still be an issue.


Jake Oshins
Hyper-V I/O Architect (former power guy, former KMDF designer)
Windows Kernel Team

This post implies no warranties and confers no rights.


> Typically, WDM drivers in this situation just handle their power IRPs in code that can run at DISPATCH_LEVEL. Doron and I thought long and hard about this when we were laying out the PnP/Power stuff in what became KMDF. We didn’t want to force all PnP/Power callbacks to DISPATCH_LEVEL in these drivers, but the rules were definitely hard to explain as a result. I even labeled all the special states in the diagram of the power state machine that drives KMDF with little lightning bolts to remind us that we could get shocked. I believe that the DDK has some text to try to explain this contract on the PnP/Power callbacks, but I knew that no option we chose would be straightforward. Fortunately, the issue rarely comes up because we at Microsoft write most of the drivers which are parents or grandparents of the paging stacks. Filters, though, can still be an issue.

I think it’s not a very good assumption anymore that only Microsoft will
write drivers in the paging path. For example, ANY NIC driver that is being
used for iSCSI boot is essentially in the paging path. Any converged I/O
adapters, like the Broadcom Ethernet CNICs or the Xsigo Infiniband product
have virtual busses below the virtual storage adapters. It seems quite
possible FCoE adapters from Emulex and Qlogic will also have virtual busses
in the paging path.

It’s nice to hear you indicated the danger areas on the power state diagram
used for KMDF. I assume that’s the same power state diagram that they showed
on the screen at the DDC but said they would not be giving to us developers,
so your helpful warnings are ONLY useful to Microsoft internal developers.

Jan

> > The more I think about it, the more I think the answer is ‘no, my code will run just fine even if the rest of the pageable parts of the system lock up around it’, especially as most of the drivers in question are in the paging path* so it wouldn’t make sense for them to be able to get stalled. I probably just need to re-order my disconnection sequence so that the non-paging-path drivers are all completely disconnected before the paging-path drivers are disconnected…
>
> My experience is that if the system disk/pagefile becomes inaccessible, page-fault disk I/Os time out (the default is a small number of seconds) or fail, and eventually (maybe instantly, maybe not for many minutes) a page-fault I/O that matters fails and the system blue screens with a crash like “kernel in-page error”.

When the system is idle, the chances are the save+restore/migrate will
have completed before Windows has time to issue another SRB. If it does
issue another SRB, there will only be a short additional delay before
the SRB completes. If it delays long enough for Windows to think
something has gone wrong, then something probably has gone wrong :-)

Thanks

James

> You’re right, strictly speaking. Your thread will run as long as it’s runnable… if it doesn’t pagefault anything from the paging file, it won’t wait on the paging file.
>
> so…
>
> > are threads running at PASSIVE_LEVEL that don’t access the disk in any way (eg paging file or other on-disk content) going to be blocked if the disk becomes unavailable?
>
> the answer to this is NO. As long as the thread, and anything upon which the thread depends, doesn’t need resources from the paging file.

In retrospect, it was probably a bit of a silly question. I should have
realised that in order to be ‘in’ the paging path, my driver must not be
depending on the page file or I would have run into trouble long ago.

James

In general, unless explicitly specified otherwise, most PASSIVE_LEVEL routines are pageable. You clearly can’t call anything pageable once the page file is unavailable without risking a deadlock.
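As a sketch of that rule in WDM terms (the routine name is hypothetical): anything deliberately placed in a paged section should carry PAGED_CODE(), so Driver Verifier catches a call made at the wrong time:

```c
/* Routines placed in a PAGE* section are only safe while paging works.
 * PAGED_CODE() lets Driver Verifier catch violations early.            */
#ifdef ALLOC_PRAGMA
#pragma alloc_text(PAGE, XenDemoPagedHelper)   /* hypothetical routine */
#endif

VOID XenDemoPagedHelper(VOID)
{
    PAGED_CODE();   /* asserts IRQL <= APC_LEVEL under Driver Verifier */
    /* ... anything here may page-fault; never reach this routine once */
    /* the paging disk has been disconnected ...                       */
}
```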

  • S



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

“In general, unless explicitly specified otherwise, most PASSIVE_LEVEL routines are pageable.”

I think Vista/Win2008 finally almost got rid of that insanity. They don’t seem to page the kernel stacks out anymore, but drivers still contain pageable code, which occasionally gets in the way of power management. I remember when Win2008 tried to call paged code during the processor-addition simulation test, while Driver Verifier had marked the page containing the nt!PopSystemIrpCompletion() function inaccessible.

If you sum up the whole System32\Drivers directory, you’ll get some 20 megabytes. The paged code sections amount to under 5 MB or so. Is all the danger of such obscure bugs worth saving 5 MB of nonpaged space?

I agree. OTOH, why have something like the parallel and serial port drivers resident in memory when you’re just about NEVER going to use either…

There are two sides to each coin,

Peter
OSR

“OTOH, why have something like the parallel and serial port drivers
resident in memory when you’re just about NEVER going to use either…”

Take, for example, Win7 x64. The total size of the pageable sections in cdfs, cdrom, classpnp, disk, fastfat, kbdclass, kbdhid, mouclass, pci, serial, wdf01000 is a WHOLE FAT HALF A MEGABYTE. About 500 kilobytes. I can only imagine the amount of regression testing spent on not breaking their fragile locking schemes after each little change.

Not all drivers dynamically lock/unlock their PAGEable segments; look at the mouclass and kbdclass sources in the WDK.
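For drivers that do lock their pageable sections on demand, the usual shape (a sketch; the routine name is hypothetical and error handling is omitted) is:

```c
/* Pin the entire PAGE* section containing SomePagedRoutine into
 * memory, run through the risky window, then allow paging again. */
PVOID lockHandle = MmLockPagableCodeSection((PVOID)SomePagedRoutine);

/* ... code in that section is now resident and safe to call even */
/* though the paging disk may be unreachable ...                  */

MmUnlockPagableImageSection(lockHandle);
```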

d

Keep in mind that in virtualization heavy environments, that cost may be multiplied by the number of VMs.

Also, there are some pretty large drivers out there (say, look at the size of a recent NVIDIA or ATI video driver).

  • S


Yes, that’s a feature that made sense when systems had only 16 MB of RAM
available. That number has scaled by a factor of 1024 since then, so I
just cannot understand why they don’t make an effort to get rid of the
page file altogether.

//Daniel


At the moment, we’re very concerned with the number of VMs we can run on a
single machine. If we got rid of the page file, then we’d just have to put it
back at the virtualization layer, and we’d have a lot less information about
what’s worth paging.

I’ve heard your argument a lot lately. Please understand that we experience
huge memory pressure all the time. Everybody would like to get just one
more VM running well.


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Team

This post implies no warranties and confers no rights.


wrote in message news:xxxxx@ntdev…
> Yes that’s a feature that made sense when systems had only 16MB of RAM
> available. That number has scaled with a factor of 1024 since then so I
> just cannot understand why they don’t do efforts to get rid of the page
> file altogether.
>
> //Daniel
>
>
>
> wrote in message news:xxxxx@ntdev…
>> “In general, unless explicitly specified otherwise, most PASSIVE_LEVEL
>> routines
>> are pagable.”
>>
>> I think Vista/Win2008 finally almost got rid of that insanity. They
>> don’t seem to page the kernel stacks out anymore, but still contain
>> pageable code, which occasionally gets in the way of the power
>> management.
>>
>
>

The way I see it, even though there is a paging file, you cannot push it
very far. If you make it much bigger than your actual amount of RAM,
above a certain threshold you will see a sharp drop in usability and
scalability. It’s just a way of pretending to have double what you
actually own, which looks kind of greedy.

If I were to be king for a day, I would build a test OS with all virtual
memory resident and then start counting how much better a place the
world would be. I imagine the performance increase would be incredible,
and I foresee low latencies and predictability in the time domain, given
the absence of an expensive page fault handler that needs to read pages
in from disk.

//Daniel
