STOP: 0x4D (0x2A8FB,0x2A8FB,0,0) when allocating large amounts of memory using MmAllocatePagesForMdl

James_Harper · February 28, 2011, 6:54am

A user has reported that they are seeing bug check 0x4D when stress
testing my xen drivers when ‘ballooning down’ (returning physical memory
to xen for use in other VMs). The allocation of memory to give back to
xen is done using MmAllocatePagesForMdl(Ex).

Given the large values for P1 and P2 in the bug check, representing
~714MB of dirty pages waiting to be written to the page file, my theory
is that my driver has allocated so much memory so quickly that Windows
doesn’t have time to write out the pages to disk and can’t satisfy a
subsequent allocation request. Does that sound plausible?

A dirty solution is probably just to introduce a delay into the ‘balloon
down’ loop, but it would be better I think to check with Windows if
there is enough free memory before we take it all. What API is best to
give me that figure? I’m allocating memory 1MB at a time so any
individual allocation shouldn’t exhaust things if I check for headroom
first, it’s just when I do it 700 times in rapid succession that it
causes a problem.

Thanks

James

Alex_Grig · February 28, 2011, 10:30am

Are you making the call on PASSIVE_LEVEL?

James_Harper · February 28, 2011, 5:15pm

>

Are you making the call on PASSIVE_LEVEL?

Yes. My call to MmAllocatePagesForMdlEx is done at PASSIVE_LEVEL.

James

Alex_Grig · February 28, 2011, 6:05pm

From what context the function is called? Is it a system workitem or your own workitem?

James_Harper · February 28, 2011, 6:12pm

>

From what context the function is called? Is it a system workitem or
your own
workitem?

It’s running in a thread created in my driver. It waits on notification
from Xen that the allocated memory has changed then allocates memory and
gives the physical pages xen, or asks xen to put pages back into
physical memory and frees the memory back to windows, depending on if we
are decreasing or increasing our memory size. It does this in a loop,
1MB at a time.

James

Alex_Grig · February 28, 2011, 6:15pm

What flags do you pass to the call?

James_Harper · February 28, 2011, 6:34pm

>

What flags do you pass to the call?

None. Flags = 0. During initial allocation (at the start of DriverEntry)
I have tested using MM_DONT_ZERO_ALLOCATION to try and resolve a
performance problem but it doesn’t make any difference anyway.

Right now I’m experimenting with not allocating any further memory if
\KernelObjects\LowMemoryCondition is set. Is there a way to
KeWaitForMultipleObjects but wait for the event to be cleared?
Performance and response isn’t critical here so I’m just doing a loop
with an exponential delay for now.

James

MBond · February 28, 2011, 6:52pm

While it would be a good thing to play nice in the sand box by not
allocating all available memory, I suspect that your bug is an unchecked
allocation failure. Your first need to fix this bug; and it will be easier
if it is readily reproducible

“James Harper” wrote in message news:xxxxx@ntdev…

What flags do you pass to the call?

None. Flags = 0. During initial allocation (at the start of DriverEntry)
I have tested using MM_DONT_ZERO_ALLOCATION to try and resolve a
performance problem but it doesn’t make any difference anyway.

Right now I’m experimenting with not allocating any further memory if
\KernelObjects\LowMemoryCondition is set. Is there a way to
KeWaitForMultipleObjects but wait for the event to be cleared?
Performance and response isn’t critical here so I’m just doing a loop
with an exponential delay for now.

James

James_Harper · February 28, 2011, 7:02pm

>

While it would be a good thing to play nice in the sand box by not
allocating all available memory, I suspect that your bug is an
unchecked
allocation failure. Your first need to fix this bug; and it will be
easier
if it is readily reproducible

An unchecked allocation failure would be something like:

some_struct = alloc_mem(sizeof(some_struct));
some_struct->some_val = 42;

which would be an 0xD1 or similar BSoD right? 0x4D seems like an
internal windows thing like:

ptr = alloc_mem(1234);
if (!ptr)
KeBugCheck(0x4D);

Eg in the case of an allocation that must succeed.

James

James_Harper · March 2, 2011, 7:12pm

My latest conclusion is that the user is just overloading their VM’s
excessively, resulting in an overload of the disk IO channel, resulting
in extraordinarily long delays in IO requests being completed. I added
timestamping to the logging and can see that after 60 seconds of no
response and the issuing of several scsi reset commands, Windows crashes
with the 0x4D error.

The user is running some tests which exercise a 2GB working set of data
on a VM with 512MB available, on several VM’s at once, resulting in
massive amounts of disk thrashing. This is obviously showing up a bit of
a weakness in Xen’s IO scheduling as 60 seconds for an IO completion is
obviously not correct in any sense of the word - Xen is not servicing
all the VM’s IO requests fairly.

I’m not sure there is much I can do about this. It would be nice to
simply let the VM stall but Windows is obviously under enormous memory
pressure to page out in this case and so a disk IO channel hang is going
to break things.

I’m not sure I’m doing the HwScsiResetBus correctly though. I’ll start
another thread for that.

James

MBond · March 2, 2011, 7:52pm

I see. 60 seconds for an IOP completion is certainly a long time but not
unheard of. I have seen loaded disks where a write can take several minutes
to complete (UM WriteFile on FILE_FLAG_NO_BUFFERING +
FILE_FLAG_WRITE_THROUGH) It will be a major performance problem if the IOP
is a paging request and that kind of latency may trigger other problems but
I don’t know of any reason for the BSOD but others more expert than me will
surely comment.

“James Harper” wrote in message news:xxxxx@ntdev…

My latest conclusion is that the user is just overloading their VM’s
excessively, resulting in an overload of the disk IO channel, resulting
in extraordinarily long delays in IO requests being completed. I added
timestamping to the logging and can see that after 60 seconds of no
response and the issuing of several scsi reset commands, Windows crashes
with the 0x4D error.

The user is running some tests which exercise a 2GB working set of data
on a VM with 512MB available, on several VM’s at once, resulting in
massive amounts of disk thrashing. This is obviously showing up a bit of
a weakness in Xen’s IO scheduling as 60 seconds for an IO completion is
obviously not correct in any sense of the word - Xen is not servicing
all the VM’s IO requests fairly.

I’m not sure there is much I can do about this. It would be nice to
simply let the VM stall but Windows is obviously under enormous memory
pressure to page out in this case and so a disk IO channel hang is going
to break things.

I’m not sure I’m doing the HwScsiResetBus correctly though. I’ll start
another thread for that.

James

Pavel_Lebedinsky · March 2, 2011, 8:55pm

The problem is not just that individual IOs are taking too long. Bugcheck 4d occurs when a thread needs to obtain a free physical page and is not able to do that for 70 seconds. That means all free/zeroed/standby pages in the system have already been consumed, processes have already been trimmed to their minimums, and writing dirty pages to disk also didn’t produce any clean pages during that time (or all those pages have been stolen by other threads).

Thanks,
Pavel

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of m
Sent: Wednesday, March 02, 2011 4:52 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] 0x4D (0x2A8FB,0x2A8FB,0,0) when allocating large amounts of memory using MmAllocatePagesForMdlEx

I see. 60 seconds for an IOP completion is certainly a long time but not unheard of. I have seen loaded disks where a write can take several minutes to complete (UM WriteFile on FILE_FLAG_NO_BUFFERING +
FILE_FLAG_WRITE_THROUGH) It will be a major performance problem if the IOP is a paging request and that kind of latency may trigger other problems but I don’t know of any reason for the BSOD but others more expert than me will surely comment.

“James Harper” wrote in message news:xxxxx@ntdev…

My latest conclusion is that the user is just overloading their VM’s excessively, resulting in an overload of the disk IO channel, resulting in extraordinarily long delays in IO requests being completed. I added timestamping to the logging and can see that after 60 seconds of no response and the issuing of several scsi reset commands, Windows crashes with the 0x4D error.

The user is running some tests which exercise a 2GB working set of data on a VM with 512MB available, on several VM’s at once, resulting in massive amounts of disk thrashing. This is obviously showing up a bit of a weakness in Xen’s IO scheduling as 60 seconds for an IO completion is obviously not correct in any sense of the word - Xen is not servicing all the VM’s IO requests fairly.

I’m not sure there is much I can do about this. It would be nice to simply let the VM stall but Windows is obviously under enormous memory pressure to page out in this case and so a disk IO channel hang is going to break things.

I’m not sure I’m doing the HwScsiResetBus correctly though. I’ll start another thread for that.

James_Harper · March 2, 2011, 9:19pm

>

The problem is not just that individual IOs are taking too long.
Bugcheck 4d
occurs when a thread needs to obtain a free physical page and is not
able to
do that for 70 seconds. That means all free/zeroed/standby pages in
the system
have already been consumed, processes have already been trimmed to
their
minimums, and writing dirty pages to disk also didn’t produce any
clean pages
during that time (or all those pages have been stolen by other
threads).

Thankyou Pavel. That’s the sort of answer I was looking for. So I can
tell the user that if their Xen server cannot satisfy the IO request in
70 seconds then there is nothing I can do about it

James

Pavel_Lebedinsky · March 3, 2011, 1:43am

I think you can actually help prevent this kind of situation by allocating in chunks and stopping if available memory gets too low (like you proposed at the beginning of this thread). The best way to check available memory is using the High/LowMemoryCondition kernel events. Under most conditions you should probably stop allocating as soon as the high memory event is no longer signaled. When things are getting really desperate it might be OK to continue beyond that, but I think you should definitely stop once the low memory event becomes signaled.

Thanks,
Pavel

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of James Harper
Sent: Wednesday, March 02, 2011 6:19 PM
To: Windows System Software Devs Interest List
Subject: RE: Re:[ntdev] 0x4D (0x2A8FB,0x2A8FB,0,0) when allocating large amounts of memory using MmAllocatePagesForMdlEx

Thank you Pavel. That’s the sort of answer I was looking for. So I can tell the user that if their Xen server cannot satisfy the IO request in
70 seconds then there is nothing I can do about it

Paolo_Bonzini · March 3, 2011, 10:36am

Have you looked at the balloon drivers for KVM (at
http://www.kernel.org/pub/scm/virt/kvm/kvm-guest-drivers-windows.git/)? They
do use LowMemoryCondition.

James_Harper · March 7, 2011, 7:14pm

>

I think you can actually help prevent this kind of situation by
allocating in
chunks and stopping if available memory gets too low (like you
proposed at the
beginning of this thread). The best way to check available memory is
using the
High/LowMemoryCondition kernel events. Under most conditions you
should
probably stop allocating as soon as the high memory event is no longer
signaled. When things are getting really desperate it might be OK to
continue
beyond that, but I think you should definitely stop once the low
memory event
becomes signaled.

It appears that the Linux kernel in Dom0 (‘host’ OS) this particular
user is using has a bug that means IO requests from the VMs are serviced
unfairly, resulting in some VM’s getting starved for IO, in some cases
long enough for Windows to give up and declare that pages are
unavailable. So the problem is exaggerated but not actually caused by my
drivers, and the user now has a solution.

Thanks for your time

James