Incorrect function call on stack

Hi,

I am debugging a 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL) bugcheck and have found an inconsistency in the stack trace.

Here is the stack trace:

Child-SP RetAddr Call Site

00 fffff80415750468 fffff8040d1c6869 nt!KeBugCheckEx
01 fffff80415750470 fffff8040d1c2c8e nt!KiBugCheckDispatch+0x69
02 fffff804157505b0 fffff804675d3088 nt!KiPageFault+0x44e
03 fffff80415750740 fffff804675d35fb MyDriver!Function7+0x38
04 fffff80415750770 fffff804675f3005 MyDriver!Function4+0x39b
05 (Inline Function) ---------------- MyDriver!Function3+0x10c
06 fffff804157507f0 fffff8046760507f MyDriver!Function2+0x2e5
07 fffff80415750870 fffff8046d87a6f8 MyDriver!Function1+0x34f
08 fffff80415750900 fffff8040d067729 storport!RaidpAdapterTimerDpcRoutine+0x58
09 fffff80415750960 fffff8040d0666e7 nt!KiProcessExpiredTimerList+0x159
0a fffff80415750a50 fffff8040d1b8a5a nt!KiRetireDpcList+0x4a7
0b fffff80415750c60 0000000000000000 nt!KiIdleLoop+0x5a

Note that, according to this trace, Function7 is called from Function4.
Here is the skeleton of Function4:

void Function4()
{


(pFunction5)();
→ Function6();
return;
}

In WinDbg the line where Function6 is called is highlighted, which I take to mean that control was transferred from the previous line.
That happens to be a call to Function5 through a pointer to that function. So in the stack Function5 should show up after Function4, but it is Function7! Function7 is never called from anywhere within Function4 anyway.

Any idea what can lead to this? If I execute an ‘ln’ command on the value of that function pointer (pFunction5), I correctly get Function5!

Thanks,
Suresh

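Did you compile your driver as a Debug or a Release build? In the latter case, the compiler or linker may perform interprocedural optimizations that make the generated code hard to map back to the source. For example, if Function5 or Function6 compiles to exactly the same code as Function7, the linker can keep a single copy (identical COMDAT folding, /OPT:ICF in the Microsoft linker), so there is no need to store all of them in the driver binary, and the debugger then reports whichever name it finds for that address.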

If you wish to track function calls more reliably, add a debug print at the beginning and/or end of your functions and then use the !dbgprint command (or just watch the prints in WinDbg). Alternatively, you can single-step through the code, which is very slow but may help in the end.
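The entry/exit tracing suggested above could look roughly like the sketch below. It is ordinary user-mode C so that it compiles anywhere; the TRACE_ENTER/TRACE_EXIT macros, the trace_log buffer, and the function bodies are all hypothetical. In a real driver the macro body would call a kernel print routine such as DbgPrintEx instead of appending to a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory trace log; a driver would print instead. */
static char trace_log[256];

/* Append "<tag><function name>;" to the log. */
static void trace_append(const char *tag, const char *func) {
    size_t used = strlen(trace_log);
    assert(used < sizeof(trace_log));
    /* snprintf never writes past the space that is left in the buffer. */
    snprintf(trace_log + used, sizeof(trace_log) - used, "%s%s;", tag, func);
}

/* __func__ is the enclosing function's name as written in the source. */
#define TRACE_ENTER() trace_append(">", __func__)
#define TRACE_EXIT()  trace_append("<", __func__)

/* Hypothetical bodies mirroring the call chain discussed in the thread. */
static void Function7(void) { TRACE_ENTER(); TRACE_EXIT(); }
static void Function6(void) { TRACE_ENTER(); Function7(); TRACE_EXIT(); }
void Function4(void)        { TRACE_ENTER(); Function6(); TRACE_EXIT(); }

/* Read-only access to the collected trace. */
const char *get_trace(void) { return trace_log; }
```

Because __func__ comes from the source text rather than from stack frames, the log still names Function6 even if the optimizer inlines it or removes its frame.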

Thanks for your comment, Martin. However, this issue is not reproducible: it was observed once, and we have the dump from that occurrence. Even so, a single occurrence is scary, since it may happen again at any time and, by Murphy’s law, most likely in a customer configuration.

I wanted to understand what could have caused it and whether there is anything to fix in our code. I don’t think using function pointers in drivers is a bad idea, or is it?
Is it really related to the function pointer at all? The WinDbg ‘ln’ command does show the correct value for the intended function.

It’s possibly due to tail call optimization. If the end of your function is something like this:

foo() {
    return bar();
}

Normally the compiler will generate the following assembly:

call bar
ret

But that’s not optimal, because the return address pushed by that call leads only to a ret instruction that returns to foo’s caller. Instead, the compiler may choose to leave foo’s caller’s return address on the stack and convert the call to a jmp:

jmp bar

The call stack in this case looks wrong because this frame has no return back to foo on the stack.
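As a rough illustration of why the frame goes missing, here is a toy model of the return-address chain. It is not real unwinding; ToyStack, toy_call, toy_tail_jmp, and toy_frame_is are hypothetical names made up for this sketch. A call adds a frame, while a tail call reuses the caller's slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 8

/* Toy model of the chain of frames a debugger walks. */
typedef struct {
    const char *frames[MAX_FRAMES];
    size_t depth;
} ToyStack;

/* CALL: the callee gets a new frame on top of the caller's frame. */
void toy_call(ToyStack *s, const char *callee) {
    assert(s->depth < MAX_FRAMES);
    s->frames[s->depth++] = callee;
}

/* Tail call (JMP): the callee takes over the caller's slot,
   so the caller disappears from the chain. */
void toy_tail_jmp(ToyStack *s, const char *callee) {
    assert(s->depth > 0);
    s->frames[s->depth - 1] = callee;
}

/* Returns 1 if the frame at index holds the given function name. */
int toy_frame_is(const ToyStack *s, size_t index, const char *name) {
    return index < s->depth && strcmp(s->frames[index], name) == 0;
}
```

If Function4 calls Function6 and Function6 ends with a tail call to Function7, this model ends up with only Function4 and Function7 in the chain. Whether that is what happened in the dump discussed here is an assumption; the disassembly of Function6 would show a jmp instead of a call if so.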

Well, let’s think about the whole thing from the debugger’s perspective. It does not know where a function actually ends (i.e. the location of its RET instruction), right? The only thing it knows is the address where a function begins (it can get this information from the debug symbols). Therefore, if it sees that a return address on the stack falls somewhere between the start addresses of functions N and N+1, it may logically conclude that function N has made a call.

However, a function N+1 is not necessarily going to begin immediately after the RET instruction of a function N. In fact, you can be almost sure that there is at least some 0xCC-filled padding in between these two. Furthermore, PE sections are page-aligned, so that the area in between the end of the last function and the end of the page is going to be 0xCC-filled as well.

Now let’s consider what happens if execution jumps to some 0xCC-filled area that immediately follows a function X. It is understandable that the exception will get raised straight away, but how is the debugger going to understand the address that gets pushed on the stack? It is going to interpret it as MyDriver!FunctionX+0xabc, although this function has absolutely nothing to do with the exception.

In other words, you may just be looking at the wrong culprit. For example, some function that Function4 calls may occasionally corrupt the stack and return to a wrong address at the end of the code section, which in turn causes an exception.
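The lookup described above can be sketched as a nearest-preceding-symbol search. This is the simplified model from the post, not how WinDbg is actually implemented: a real debugger may also use function sizes from the symbols and, on x64, the unwind data in the image. The symbol table, addresses, and names below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol: only the start address is known. */
typedef struct {
    uint64_t start;
    const char *name;
} Symbol;

/* Made-up table, sorted by ascending start address. Assume FunctionX's
   code really ends at 0x10ff and 0x1100..0x13ff is 0xCC padding. */
static const Symbol symbols[] = {
    { 0x1000, "FunctionX" },
    { 0x1400, "FunctionY" },
};

/* Return the last symbol that starts at or below addr and store the
   offset into it. Returns NULL if addr is below the first symbol. */
const Symbol *nearest_symbol(uint64_t addr, uint64_t *offset) {
    const Symbol *best = NULL;
    assert(offset != NULL);
    for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
        if (symbols[i].start <= addr) {
            best = &symbols[i];
        }
    }
    if (best != NULL) {
        *offset = addr - best->start;
    }
    return best;
}
```

With this table, address 0x1230 lies in the padding, yet the lookup reports it as FunctionX+0x230, which is exactly the kind of misleading attribution the post warns about.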

Anton Bassov

> However, a function N+1 is not necessarily going to begin immediately after the RET instruction of a function N. In fact, you can be almost sure that there is at least some 0xCC-filled padding in between these two. Furthermore, PE sections are page-aligned, so that the area in between the end of the last function and the end of the page is going to be 0xCC-filled as well.

I would suggest the same thing; however, the stack shows MyDriver!Function7+0x38, so that does not seem to be the case (unless the function is really, really short, which can easily be checked in a disassembler). Of course, the debugger may also be misled by stack corruption.

OP:
Did you run your driver with Driver Verifier and/or use your own memory allocation wrappers to check for memory corruption? You may also try to test the driver extensively on 32-bit Windows, since the call stack is more reliable there (fewer general-purpose registers available for compiler optimizations, stdcall calling convention).
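For illustration, an allocation wrapper of the kind mentioned above might place guard bytes around each block and verify them later. The sketch below uses user-mode malloc/free so it can be compiled anywhere; guarded_alloc, guarded_check, guarded_free, and the guard constants are hypothetical names, and a driver would wrap its pool allocation routines instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8
#define GUARD_BYTE 0xA5

/* Layout: [size_t size][front guard][user bytes][back guard].
   Note: the user pointer is not guaranteed to be maximally aligned. */
void *guarded_alloc(size_t size) {
    unsigned char *raw = malloc(sizeof(size_t) + 2 * GUARD_SIZE + size);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof(size_t));
    memset(raw + sizeof(size_t), GUARD_BYTE, GUARD_SIZE);
    memset(raw + sizeof(size_t) + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + sizeof(size_t) + GUARD_SIZE;
}

/* Returns 1 if both guards are intact, 0 if something wrote
   just before or just after the user bytes. */
int guarded_check(const void *user) {
    const unsigned char *p = user;
    const unsigned char *front = p - GUARD_SIZE;
    size_t size;
    memcpy(&size, front - sizeof(size_t), sizeof(size_t));
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE || p[size + i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* Verify the guards, then release the whole underlying block. */
void guarded_free(void *user) {
    if (user == NULL) {
        return;
    }
    assert(guarded_check(user));
    free((unsigned char *)user - GUARD_SIZE - sizeof(size_t));
}
```

A wrapper like this only catches small overruns and underruns, and only at the moment the guards are checked, so it complements Driver Verifier rather than replacing it.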

> …unless the function is really really short …

…or just contains some 0xCC-filled padding that is meant to be either jumped over or avoided by returning…

Anton Bassov

Thanks for all the comments and suggestions. In the meantime I have noticed one thing that I would like to point out.

As I mentioned earlier, when I select the frame MyDriver!Function4 in WinDbg with the source path set, the cursor points to the call to Function6, so I assumed that control was transferred from the previous line (the call to Function5 through a function pointer).

However, inside Function6 there is actually a call to Function7 (note that Function7 is at the top of the stack)! So the execution actually happened in this order: Function4 → Function6 → Function7.

So two of my assumptions turned out to be wrong:

  1. The line highlighted in the source window is yet to be executed, and it is always the previous line that transferred control or has the problem
  2. The call stack should have shown the order Function4 → Function6 → Function7 (but in reality Function6 is missing)

Are these really wrong assumptions to have, or could something else be lurking beneath this issue?

@Martin, yes, I have checked with Driver Verifier, but I have not yet tested on a 32-bit machine.
Also, the actual crash was in Function7, which I have now fixed; I am just trying to make sure I interpret stack traces correctly.

Regards,
Suresh

> 2. The call stack should have shown the order Function4 → Function6 → Function7 (but in reality Function6 is missing)

This may happen when Function6 gets inlined.

When we ask the compiler to inline a function, the stack trace still shows it, marked as inline. E.g. Function3 is inline below.

04 fffff80415750770 fffff804675f3005 MyDriver!Function4+0x39b
05 (Inline Function) ---------------- MyDriver!Function3+0x10c
06 fffff804157507f0 fffff8046760507f MyDriver!Function2+0x2e5

Do you mean Function6 gets inlined without us specifically telling the compiler to do so? And in that case it won’t appear in the stack at all, not even marked as inline like above?

> Do you mean Function6 gets inlined without we specifically telling compiler to make it so? And in such a case it won’t appear in the stack, even as inline, like above?

I think the compiler can basically inline whatever it thinks gives better results (in terms of speed etc.), unless you tell it otherwise. I suppose the inlining may be done in such a way that it leaves no traces on the stack. The code of the inlined function just gets “copied” to the place(s) of its invocation.
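One way to "tell it otherwise" is a no-inline annotation. The sketch below is an assumption-laden example: NOINLINE is a hypothetical portability macro, and the helper functions are invented. The annotations themselves (__declspec(noinline) for MSVC, __attribute__((noinline)) for GCC and Clang) are the compilers' own.

```c
#include <assert.h>

/* Hypothetical portability macro for "do not inline this function". */
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Small, static, and called once: a likely candidate for automatic
   inlining in an optimized build. */
static int helper_may_inline(int x) { return x * 2 + 1; }

/* Same body, but the compiler is asked to keep it as a real call. */
static NOINLINE int helper_kept(int x) { return x * 2 + 1; }

int caller(int x) {
    /* Inlining changes the shape of the stack, never the result. */
    assert(helper_may_inline(x) == helper_kept(x));
    return helper_may_inline(x) + helper_kept(x);
}
```

Keeping a function out of line makes it more likely to show up as its own frame, but it does not stop the compiler from turning a final call into a tail jump, so a frame can still go missing for that reason.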

The debugger is not omniscient. It does the best it can. All it has is a series of mileposts (Function4 enters here, Function6 enters here). It does not have a complete thought-transference from the compiler and linker. In addition, the x64 calling convention means that many functions never store their parameters on the stack at all. Exception handling is itself an imprecise science. The debugger makes its best guess, but it’s always going to be up to a human to divine what actually went on.

> Do you mean Function6 gets inlined without we specifically telling compiler to make it so? 

Apparently not in a Debug build; how would you debug code like that? In a Release build, however, it may well happen if compiler optimisations are enabled. For example, consider a static function that gets called only once in the entire file. In such a case the compiler may well merge it into its caller.

Anton Bassov