kernel stack allocations have failed

Ah, if only the source were up on GitHub.

So I have this annoying problem with a set of VM images: every dump I look
at has this VERY DISCONCERTING warning from !vm:

******* 427456 kernel stack PTE allocations have failed ******
The (large) number of failures varies from dump to dump, but the warning is
always present.

There is no indication that these alleged failures are causing any problem
at all, but while root-causing other, real problems, various people see
that warning and decide they need to look no further.

If I break into a functioning VM with the debugger, I get the same warning.

I’ve determined that all of the failures occur
between nt!ExpInitSystemPhase1 and nt!ExInitSystemPhase2. There are no
failures after nt!ExInitSystemPhase2 starts.

I’d love to be able to set a breakpoint on whatever counter !vm is looking
at, to catch it being incremented, but I have no idea what that counter
is. Any clues?

Mark Roddy

In spite of its hideous licensing model (which only gets worse by the day),
IDA tells me it might be from the following structure:

?? ((nt!_MI_SYSTEM_INFORMATION *)@@masm(nt!MiState))->SystemPtes.KernelStackPteInfo
struct _MI_SYSTEM_PTE_TYPE * 0x81893290
+0x000 Bitmap : _RTL_BITMAP
+0x008 BasePte : (null)
+0x00c Flags : 0
+0x010 VaType : 0 ( MiVaUnused )
+0x014 FailureCount : (null)
+0x018 PteFailures : 0
+0x01c SpinLock : 0
+0x01c GlobalPushLock : (null)
+0x020 TotalSystemPtes : 0
+0x024 Hint : 0
+0x028 LowestBitEverAllocated : 0
+0x02c CachedPtes : (null)
+0x030 TotalFreeSystemPtes : 0

Specifically, it looks to be the PteFailures value. Does that match your
number in the dump?

-scott
OSR
@OSRDrivers

Nah, I looked at that, thanks to Alex et al. in V7 of Windows Internals. That
counter doesn’t correlate: it’s zero, but whatever !vm is looking at isn’t.

Thanks anyway.

Mark Roddy

(In reply to Scott Noone, Thu, Sep 6, 2018 at 3:04 PM)

--
NTDEV is sponsored by OSR

Visit the list online at: <http://www.osronline.com/showlists.cfm?list=ntdev>

MONTHLY seminars on crash dump analysis, WDF, Windows internals and
software drivers!

To unsubscribe, visit the List Server section of OSR Online at
<http://www.osronline.com/page.cfm?name=ListServer>

Oh, and I thought about renewing my IDA license after grovelling through
WinDbg’s disassembler, but its license is indeed hideous. I’d have to go
beg others to pay for it.

Mark Roddy

(In reply to Mark Roddy, Thu, Sep 6, 2018 at 4:33 PM)

Oh wait, I take that all back. That is exactly what I need. Thanks!
Mark Roddy

(In reply to Mark Roddy, Thu, Sep 6, 2018 at 4:36 PM)

FWIW I just ran !vm on a crash and I see the same thing:

******* 181632 kernel stack PTE allocations have failed ******

The structure looks bogus:

2: kd> ?? ((nt!_MI_SYSTEM_INFORMATION *)@@masm(nt!MiState))->SystemPtes.KernelStackPteInfo
struct _MI_SYSTEM_PTE_TYPE
+0x000 Bitmap : _RTL_BITMAP_EX
+0x010 BasePte : 0xfffff802c6da1590 _MMPTE
+0x018 Flags : 0x20
+0x01c VaType : 0 ( MiVaUnused )
+0x020 FailureCount : 0x00000000000b820a -> ??
+0x028 PteFailures : 0x2c580
+0x030 SpinLock : 0
+0x030 GlobalPushLock : (null)
+0x038 Vm : 0x0000000000000005 _MMSUPPORT_INSTANCE
+0x040 TotalSystemPtes : 0xa38400
+0x048 Hint : 0xffffd7fb71000000
+0x050 LowestBitEverAllocated : 0xfffffc7e40000000
+0x058 CachedPtes : 0x0000000f0000000b _MI_CACHED_PTES
+0x060 TotalFreeSystemPtes : 0xfffff802`c6da1008

My debugger build is 17134 and the crash is from OS build 16299.

It really looks like the data type doesn’t match what the target machine is
using, which is kind of terrifying. But it seems very unlikely that this
actually has anything to do with kernel stack allocations failing (this
crash was generated by a debugger ASSERT having to do with file sizes).

-scott
OSR
@OSRDrivers

Hmmmm, maybe it’s just 16299.

That would generate a lot of wtf sighs of relief.

Mark Roddy

(In reply to Scott Noone, Thu, Sep 6, 2018 at 4:55 PM)

> kernel stack PTE allocations have failed

I just wonder how something like that is possible from a technical standpoint in the first place…

The way I understand it, the kernel stack of a thread is allocated upon thread creation and stays physically present throughout the thread’s entire lifetime, unless the thread enters a user-mode wait. In that case its kernel stack may get PHYSICALLY swapped out, but its virtual address still remains the same, for understandable reasons. How can PTE allocation for the kernel stack fail but thread creation per se still succeed??? Could someone please enlighten me on this?

Anton Bassov

Well sure, that was why I needed some answer as to why 'bag was
reporting what ought to be a fairly fatal condition on systems that seemed
to be happily chugging along. Every process needs at least one thread, and
every thread needs a kernel stack, so I could imagine a situation where
processes were randomly failing to start because of some temporary resource
malfunction. It just didn’t seem realistic.

Mark Roddy

(In reply to Anton Bassov, Thu, Sep 6, 2018 at 7:54 PM)