Strange stack corruption during single-step debugging

Hello!
Function to reproduce:
function proc
    add rsp,8   ; RSP now points above the return address slot
    sub rsp,8   ; RSP is back at the return address
    ret
function endp

If I just run this function, everything is OK. But if I single-step through
it (F10), the stack becomes corrupted (the return address is overwritten).
If I set an access breakpoint on the address of the return address, it turns
out that nt!KiDebugTrapOrFault overwrites the stack. If I swap the add and sub:
function proc
    sub rsp,8   ; allocate 8 bytes below the return address
    add rsp,8   ; release them; RSP is back at the return address
    ret
function endp
this code works fine even in single-step debugging.

I don’t really have a question, but maybe you have something to say about
this situation. I understand that the incremented stack pointer shouldn’t be
used inside this function to reference memory, but I definitely don’t see
anything criminal about just incrementing and decrementing the stack pointer
itself. I don’t know whether this is a bug in the debug exception dispatcher
or intended behavior.

Don’t forget that the stack grows downwards on x86…

What you do in the first version (i.e. when you add 8 to RSP) can be interpreted, from a logical standpoint, as either:

  1. De-allocating a local variable from the stack before the function even attempts to return, or

  2. Popping off an argument that has not yet been pushed on the stack

Certainly, as long as you re-adjust it accordingly before returning, everything should work just fine (and it does, indeed, without a debugger), but the very concept of interactive debugging invariably involves memory modifications that have to be done by the debugger. When you single-step through the code, the debugger has to insert a breakpoint into the code and, apparently, modify the stack in expectation of the forthcoming debug exception, and it has to do so upon every instruction execution. Therefore your “bold” stack modification, apparently, just gets at odds with the ones made by the debugger, so the stack gets corrupted.

However, in the latter case the only thing you do is push an argument on the stack,
so everything works flawlessly even in single-stepping mode.

Anton Bassov

Thank you, Anton!
You are absolutely right. I never actually dug into how debuggers work and
was assuming that the CPU just stops after each instruction if the trap flag
is set. But the processor actually generates a debug exception after each
instruction. Now that behavior makes sense.
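
The difference between the two versions can be illustrated with a toy model. This is a minimal Python sketch, not a description of how nt!KiDebugTrapOrFault or the debugger is implemented. It assumes the debug exception is delivered on the current stack and models only the five-quadword frame that the CPU pushes in 64-bit mode (after aligning RSP down to 16 bytes). The entry address 0x1008, the return address value 0x1111 and the helper single_step are hypothetical.

```python
# Toy model of single-stepping on x86-64 (illustration only).
# Assumption: the debug exception is delivered on the current stack, so the
# CPU pushes its frame just below RSP (after aligning RSP down to 16 bytes).

FRAME_QWORDS = 5         # SS, RSP, RFLAGS, CS, RIP pushed by the CPU
ENTRY_RSP = 0x1008       # hypothetical RSP at function entry
RETURN_ADDRESS = 0x1111  # hypothetical value stored at [ENTRY_RSP]


def single_step(instructions):
    """Simulate a debug exception after every instruction.

    Returns True if the return address is still intact at the end."""
    stack = {ENTRY_RSP: RETURN_ADDRESS}
    rsp = ENTRY_RSP
    for op, amount in instructions:
        rsp += amount if op == "add" else -amount
        base = rsp & ~0xF                      # 16-byte alignment
        for i in range(1, FRAME_QWORDS + 1):
            stack[base - 8 * i] = 0xDEAD       # stand-in for pushed values
        # Returning from the handler restores RSP, so rsp is unchanged here.
    return stack[ENTRY_RSP] == RETURN_ADDRESS


add_then_sub = [("add", 8), ("sub", 8)]
sub_then_add = [("sub", 8), ("add", 8)]
print(single_step(add_then_sub))   # False: return address overwritten
print(single_step(sub_then_add))   # True: return address intact
```

In the add-first order the return address sits below RSP at the moment the exception frame is pushed, so it is the first thing overwritten. In the sub-first order it is always at or above RSP.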

NTDEV is sponsored by OSR

Visit the list online at: <http://www.osronline.com/showlists.cfm?list=ntdev>

MONTHLY seminars on crash dump analysis, WDF, Windows internals and
software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at
<http://www.osronline.com/page.cfm?name=ListServer>

Except where the platform ABI defines the notion of a true “red zone” where some portion of unallocated stack space is guaranteed not to be used by traps, exceptions, etc. (NT does not for most platforms), you must always assume that space on the stack that has not yet been allocated is volatile and might be overwritten at any time, even if you subsequently allocate the same stack slot again later in the same function.

This may happen even without the debugger, e.g. if you take a hardware interrupt or some other trap that pushes data onto the stack and overwrites data in the previously-unallocated stack region.

  • Ken (Msft)
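
The point about re-allocating the same slot can be shown with a small simulation. This is a minimal Python sketch under made-up assumptions: the Stack class, its addresses and the five-quadword frame size are hypothetical, and the model ignores alignment and any stack switching.

```python
# Toy model: a downward-growing stack where an interrupt pushes a frame
# below the current stack pointer. Sizes and addresses are made up.

class Stack:
    def __init__(self, top=0x2000):
        self.sp = top
        self.mem = {}

    def alloc(self, nbytes):   # like `sub rsp, nbytes`
        self.sp -= nbytes

    def free(self, nbytes):    # like `add rsp, nbytes`
        self.sp += nbytes

    def interrupt(self, frame_qwords=5):
        # The frame lands in whatever lies below sp; sp is restored on return.
        for i in range(1, frame_qwords + 1):
            self.mem[self.sp - 8 * i] = "trap frame"


s = Stack()
s.alloc(8)
slot = s.sp
s.mem[slot] = "my data"

s.free(8)        # the slot is now unallocated
s.interrupt()    # may happen at any time
s.alloc(8)       # same slot, allocated again

print(s.sp == slot)   # True: same address as before
print(s.mem[slot])    # trap frame: the old contents are gone
```

Allocating the slot again gives back the same address but not the same contents.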

Thank you so much, Ken! I didn’t even think about interrupt dispatching.
Indeed, it is now crystal clear that randomly incrementing the stack pointer
is a dangerous practice. Thank you very much!

> Except where the platform ABI defines the notion of a true “red zone”
> where some portion of unallocated stack space is guaranteed not to be used
> by traps, exceptions, etc. (NT does not for most platforms), you must
> always assume that space on the stack that has not yet been allocated is
> volatile and might be overwritten at any time, even if you subsequently
> allocate the same stack slot again later in the same function.

Actually, the above seems to be valid in 100% of cases, regardless of ABI…

Just look at the whole thing from the CPU’s perspective. When an interrupt or exception occurs, it has to set up a trap frame based upon the stack pointer. Therefore, if you randomly add some value to the stack pointer (which is equivalent to de-allocating some local variables) and an interrupt occurs, the trap frame is going to overwrite at least part of the memory occupied by the “de-allocated” local variables anyway, even if the trap handler itself does not touch certain areas of the stack.

At this point we are already in a position to give a precise explanation of why the OP’s code crashes. Judging from his code, there are no local variables in his function, so by adding 8 bytes to RSP he actually de-allocates the memory that is occupied by the return address. As a result, the trap frame that gets set up due to the debug exception overwrites the address that the target function is supposed to return to. Consequently, when he restores RSP, the value on top of the stack already points to the middle of nowhere, so he crashes when his target function tries to return…

If he had some local variables in his function, he would get away with the whole thing as long as he did not expect them to hold certain values upon stack restoration. However, since there are no local variables, he crashes right on the spot…

Anton Bassov
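
The role of local variables can be checked with a small calculation. This is a minimal Python sketch under stated assumptions: only the five-quadword hardware frame is modeled (a real trap handler uses more stack below it), the CPU aligns RSP down to 16 bytes before pushing, and the entry address 0x1008 and the helper hit_by_trap are hypothetical.

```python
ENTRY_RSP = 0x1008    # hypothetical; the return address is stored here
FRAME_QWORDS = 5      # hardware frame only; the handler may use more


def hit_by_trap(locals_bytes):
    """Return (return_address_hit, locals_hit) for a trap taken right
    after `add rsp, 8` in a function with `locals_bytes` of locals."""
    rsp = ENTRY_RSP - locals_bytes   # prologue: sub rsp, locals_bytes
    rsp += 8                         # the questionable add rsp, 8
    base = rsp & ~0xF                # CPU aligns RSP before pushing
    written = {base - 8 * i for i in range(1, FRAME_QWORDS + 1)}
    locals_range = range(ENTRY_RSP - locals_bytes, ENTRY_RSP)
    return ENTRY_RSP in written, any(a in locals_range for a in written)


print(hit_by_trap(0))      # (True, False): return address destroyed
print(hit_by_trap(0x20))   # (False, True): only a local is clobbered
```

With no locals the exposed slot is the return address itself. With 0x20 bytes of locals the same adjustment only exposes the lowest local.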

It depends on the processor architecture and the OS’s platform ABI.

Some architectures do not cause the hardware to directly push data onto the kernel stack when dispatching an interrupt. Some do, but allow the OS to designate an alternate stack pointer if it so desires.

NT on AMD64 and NT on x86 don’t have a true “red zone”. NT on ARM32/Thumb2 has a very small (8 byte) red zone in user *and* kernel mode that will not be overwritten by interrupts or the OS itself when the OS reflects exceptions/etc. down to user mode, which is reserved for use by profiling instrumentation tools that may need temporary storage on architectures that don’t have combined address computation and store/load instructions. That red zone is intended only for profiling/instrumentation tools though, and, for example, compiled code or handwritten assembler should steer clear of using it.

Some other non-NT OSes have larger red zones or may employ them on different architectures than we do, or for different purposes (i.e. other than just allowing instrumentation tools a convenient way to insert code that can grab a register at any time).

  • Ken (Msft)
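
The effect of a red zone can be expressed as a simple interval check. This is a minimal Python sketch, assuming that whoever builds the trap frame (hardware or the OS trap handler) starts writing red_zone bytes below SP. The helper survives_trap and the 40-byte frame size are hypothetical, chosen only for illustration.

```python
def survives_trap(offset, size, red_zone, frame_bytes=40):
    """True if a value of `size` bytes stored `offset` bytes below SP
    is not overlapped by a trap frame built below the red zone.

    Addresses are relative to SP, so SP itself is 0."""
    value = range(-offset, -offset + size)
    frame = range(-red_zone - frame_bytes, -red_zone)
    return not (set(value) & set(frame))


print(survives_trap(8, 8, red_zone=0))    # False: no red zone
print(survives_trap(8, 8, red_zone=8))    # True: inside an 8-byte red zone
print(survives_trap(16, 8, red_zone=8))   # False: beyond the red zone
```

With red_zone=0, which matches what is said above about NT on x86 and AMD64, nothing below SP is protected.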

> It depends on the processor architecture

Sure - I was speaking strictly about x86 and x86_64

> Some architectures do not cause the hardware to directly push data onto
> the kernel stack when dispatching an interrupt.

Sure. In fact, the very idea of hardware-assisted stack management (i.e. with special instructions for calls, pushes, pops and returns, and, hence, with a special register for a stack pointer) is at odds with the very concept of a load-and-store architecture. The only way that memory is supposed to be accessed on a load-and-store architecture is by special load and store instructions, but the above-mentioned stack-related instructions are just bound to access it behind the scenes.

Therefore, these architectures prefer to deal with registers, rather than with memory (i.e. with a stack), when it comes to control transfer. For example, when you call a function they save the return address in a register, rather than on the stack (the so-called branch-and-link). It is the callee’s responsibility to save it on the stack with a store instruction so that it knows where to return. When it comes to returning, the return address, again, has to be loaded into a register before the branch is made.

On such an architecture everything, indeed, depends upon the software-defined convention (i.e. the ABI): it has total control over the stack layout…

Anton Bassov
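
The branch-and-link convention described above can be sketched as a tiny register machine. This is a minimal Python model, not any real instruction set: the Machine class, its method names and the addresses are hypothetical.

```python
class Machine:
    def __init__(self):
        self.lr = None       # link register
        self.sp = 0x3000     # stack pointer (made-up address)
        self.mem = {}
        self.stores = 0      # number of explicit stores to memory

    def branch_and_link(self, return_to):
        self.lr = return_to          # the call touches no memory

    def store_lr(self):              # callee prologue: explicit store
        self.sp -= 8
        self.mem[self.sp] = self.lr
        self.stores += 1

    def load_lr(self):               # callee epilogue: explicit load
        self.lr = self.mem[self.sp]
        self.sp += 8

    def ret(self):
        return self.lr               # branch to the address in the register


# Leaf call: the return address never reaches the stack.
m = Machine()
m.branch_and_link(0x100)
leaf_target = m.ret()
leaf_stores = m.stores

# Nested call: the outer callee must save the link register itself.
m = Machine()
m.branch_and_link(0x100)
m.store_lr()
m.branch_and_link(0x200)     # the inner call overwrites the link register
inner_target = m.ret()
m.load_lr()
outer_target = m.ret()

print(hex(leaf_target), leaf_stores)          # 0x100 0
print(hex(inner_target), hex(outer_target))   # 0x200 0x100
```

Every memory access here is an explicit store or load issued by software, so the stack layout is entirely a matter of convention.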

That code is NOT OK in normal execution; it’s just that the failure will not happen every time, and may even happen only rarely, depending on how many interrupts trigger on that processor.

When you add to the stack pointer, you are telling the processor that all memory on the stack below the new value can be overwritten at any time, for example if an interrupt happens on that processor. When you’re single-stepping, you may be taking and returning from an interrupt between the stepped instructions, and the return address is corrupted because you told the processor it was free to push items on the stack from an interrupt.

When dealing with Windows kernel assembler, you should always assume anything below the stack pointer is garbage or will become garbage at any moment.

Jan
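
Why the failure is rare in normal execution but certain when single-stepping can be illustrated with a quick simulation. This is a minimal Python sketch: p_interrupt is a made-up probability that an interrupt arrives in the one-instruction window after add rsp,8, not a measured value, and failure_rate is a hypothetical helper.

```python
import random


def failure_rate(p_interrupt, calls=100_000, seed=1):
    """Fraction of calls in which an interrupt lands in the window where
    the return address lies below RSP (between `add rsp,8` and `sub rsp,8`)."""
    rng = random.Random(seed)
    failures = sum(rng.random() < p_interrupt for _ in range(calls))
    return failures / calls


print(failure_rate(0.0))     # 0.0: no interrupts, never fails
print(failure_rate(1.0))     # 1.0: a trap after every instruction always fails
print(failure_rate(0.001))   # small but nonzero
```

Single-stepping corresponds to the p_interrupt=1.0 case, which is why the corruption shows up every time under the debugger.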
