Incorrect function call on stack

Suresh_Patil Member Posts: 119

Hi,

I am debugging a 0xD1 bugcheck and have found an inconsistency in the stack trace.

Here is the stack trace:

# Child-SP RetAddr Call Site
00 fffff80415750468 fffff8040d1c6869 nt!KeBugCheckEx
01 fffff80415750470 fffff8040d1c2c8e nt!KiBugCheckDispatch+0x69
02 fffff804157505b0 fffff804675d3088 nt!KiPageFault+0x44e
03 fffff80415750740 fffff804675d35fb MyDriver!Function7+0x38
04 fffff80415750770 fffff804675f3005 MyDriver!Function4+0x39b
05 (Inline Function) ---------------- MyDriver!Function3+0x10c
06 fffff804157507f0 fffff8046760507f MyDriver!Function2+0x2e5
07 fffff80415750870 fffff8046d87a6f8 MyDriver!Function1+0x34f
08 fffff80415750900 fffff8040d067729 storport!RaidpAdapterTimerDpcRoutine+0x58
09 fffff80415750960 fffff8040d0666e7 nt!KiProcessExpiredTimerList+0x159
0a fffff80415750a50 fffff8040d1b8a5a nt!KiRetireDpcList+0x4a7
0b fffff80415750c60 00000000`00000000 nt!KiIdleLoop+0x5a

Note that, according to the trace, Function7 is called from Function4.
Here is the skeleton of Function4:

void Function4()
{
....
....
(pFunction5)();
--> Function6();
return;
}

In WinDbg the line where Function6 is called is highlighted, which means that control was passed from the previous line.
That happens to be a call to Function5 through a pointer to that function. So in the stack Function5 should show up after Function4, but it is Function7! Function7 is never called from anywhere within Function4 anyway.

Any idea what can lead to this? If I execute an 'ln' command on the value of that function pointer (pFunction5), I correctly get Function5!

Thanks,
Suresh

Comments

  • Martin_Dráb Member - All Emails Posts: 46

    Did you compile your driver as a Debug or a Release build? In the latter case, the compiler (or linker) may perform inter-procedural optimizations that make the generated code harder to match to the source. For example, it may find that Function5/6 is identical to Function7, so there is no need to store all of them in the driver binary.

    If you wish to track function calls more reliably, add a debug print at the beginning and/or the end of your functions and then use the !dbgprint command (or just watch the prints in WinDbg). Alternatively, you can single-step your code (which is a very slow process but may help in the end).

    Martin Dráb
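
A minimal sketch of the entry/exit tracing suggested above. The helper names (trace_enter, trace_exit, TRACE_ENTER, TRACE_EXIT, trace_matches) are hypothetical and not part of the original driver, and the function bodies are placeholders. printf is used so the sketch runs in user mode; in a driver the print would presumably go through DbgPrint or DbgPrintEx instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_MAX 16

/* Names of the functions entered so far, in call order (hypothetical log). */
static const char *g_trace[TRACE_MAX];
static int g_trace_count = 0;

/* Records and prints a function entry. */
static void trace_enter(const char *name)
{
    assert(name != NULL);
    if (g_trace_count < TRACE_MAX)
        g_trace[g_trace_count++] = name;
    printf("--> %s\n", name);
}

/* Prints a function exit. */
static void trace_exit(const char *name)
{
    printf("<-- %s\n", name);
}

#define TRACE_ENTER() trace_enter(__func__)
#define TRACE_EXIT()  trace_exit(__func__)

/* Returns 1 if the index-th recorded entry is the function called name. */
int trace_matches(int index, const char *name)
{
    return index >= 0 && index < g_trace_count &&
           strcmp(g_trace[index], name) == 0;
}

/* Placeholder bodies that mirror the call structure described in the thread. */
static void Function7(void) { TRACE_ENTER(); TRACE_EXIT(); }
static void Function6(void) { TRACE_ENTER(); Function7(); TRACE_EXIT(); }
static void Function5(void) { TRACE_ENTER(); TRACE_EXIT(); }

void Function4(void)
{
    void (*pFunction5)(void) = Function5;

    TRACE_ENTER();
    pFunction5();   /* indirect call through the function pointer */
    Function6();    /* calls Function7 internally */
    TRACE_EXIT();
}
```

Calling Function4() once records the entries in execution order, so the log shows which functions actually ran even when an optimized build leaves some of them without a frame in the stack trace.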

  • Suresh_Patil Member Posts: 119

    Thanks for your comment, Martin. However, this issue is not reproducible; it was observed once and we have the dump from it. Even if it happened only once, it is scary because it may happen again at any time, and by Murphy's law most likely in a customer configuration :p

    I wanted to understand what could have caused it and whether there is anything to fix in our code. I don't think using function pointers in drivers is a bad idea, or is it?
    Also, is it really related to the function pointer? The WinDbg 'ln' command does show the correct value for the intended function.

  • Scott_Noone_(OSR) Administrator Posts: 3,135

    It's possibly due to tail call optimization. If the end of your function is something like this:

    foo() {
        return bar();
    }
    

    Normally the compiler will generate the following assembly:

    call bar
    ret
    

    But that is not optimal: the return address pushed by the call only leads to a ret instruction that returns to foo's caller. Instead, the compiler may choose to leave foo's caller's return address on the stack and convert the call into a jmp:

    jmp bar
    

    The call stack in this case looks wrong because there is no return address pointing back into foo on the stack, so foo's frame is missing from the trace.

    -scott
    OSR
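
The foo/bar case above can be written out as a small C sketch. The function bodies and the contrasting foo_no_tail are hypothetical, added only for illustration. Whether a given compiler really turns the call into a jmp depends on the compiler and the optimization settings, so this shows the source pattern, not a guaranteed outcome.

```c
#include <assert.h>

int bar(int x)
{
    return x * 2;
}

/* The call to bar is the last thing foo does, and its result is returned
   unchanged. An optimizing build may emit "jmp bar" here, in which case
   no return address pointing into foo is ever pushed on the stack. */
int foo(int x)
{
    return bar(x + 1);
}

/* Not a tail call: foo_no_tail still has work to do after bar returns,
   so the compiler has to emit a real call and keep the frame. */
int foo_no_tail(int x)
{
    int r = bar(x + 1);
    return r + 1;
}
```

If bar faulted, a trace taken from an optimized build could show bar directly under foo's caller, with foo missing. That is one possible explanation for a frame that is absent from the trace in the original post.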

  • anton_bassov Member Posts: 5,003

    Well, let's think about the whole thing from the debugger's perspective. It does not know where a function actually ends (i.e. the location of its RET instruction), right? The only thing it knows is the address where a function begins (it can get this info from the debug symbols). Therefore, if it sees that the return address on the stack falls somewhere between the start addresses of functions N and N+1, it may logically conclude that function N has made a call.

    However, a function N+1 is not necessarily going to begin immediately after the RET instruction of a function N. In fact, you can be almost sure that there is at least some 0xCC-filled padding in between these two. Furthermore, PE sections are page-aligned, so that the area in between the end of the last function and the end of the page is going to be 0xCC-filled as well.

    Now let's consider what happens if execution jumps to some 0xCC-filled area that immediately follows a function X. It is understandable that the exception will get raised straight away, but how is the debugger going to understand the address that gets pushed on the stack? It is going to interpret it as MyDriver!FunctionX+0xabc, although this function has absolutely nothing to do with the exception.

    In other words, you may just be looking at the wrong culprit. For example, some function that Function4 calls may be corrupting the stack once in a while, effectively returning to a wrong address at the end of the code section, which, in turn, causes an exception.

    Anton Bassov
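
The lookup described above can be modelled as a nearest-preceding-symbol search. This is a simplified sketch of the reasoning in the post, not how WinDbg is actually implemented. The Symbol type, the nearest_symbol function, and the addresses in demo_symbols are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) symbol table: only the start address is known. */
typedef struct {
    uint64_t    start;
    const char *name;
} Symbol;

/* Returns the symbol with the greatest start address that is <= addr, or
   NULL if addr lies before the first symbol. The table must be sorted by
   ascending start address. On success *offset receives addr - start. */
const Symbol *nearest_symbol(const Symbol *table, size_t count,
                             uint64_t addr, uint64_t *offset)
{
    const Symbol *best = NULL;

    for (size_t i = 0; i < count; i++) {
        if (table[i].start > addr)
            break;
        best = &table[i];
    }
    if (best != NULL && offset != NULL)
        *offset = addr - best->start;
    return best;
}

/* Made-up layout: FunctionX starts at 0x1000, FunctionY at 0x1400. */
const Symbol demo_symbols[] = {
    { 0x1000, "FunctionX" },
    { 0x1400, "FunctionY" },
};
```

With this table, an address such as 0x13f8 resolves to FunctionX+0x3f8 even if FunctionX's own code ends well before that point and the address really lies in the padding behind it.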

  • Martin_Dráb Member - All Emails Posts: 46

    However, a function N+1 is not necessarily going to begin immediately after the RET instruction of a function N. In fact, you can be almost sure that there is at least some 0xCC-filled padding in between these two. Furthermore, PE sections are page-aligned, so that the area in between the end of the last function and the end of the page is going to be 0xCC-filled as well.

    I would suggest the same thing. However, there is MyDriver!Function7+0x38 on the stack, so this does not seem to be the case (unless the function is really, really short, which can easily be checked in a disassembler). Of course, the debugger may be misled by stack corruption.

    OP:
    Did you run your driver with Driver Verifier and/or use your own memory allocation wrappers to check for memory corruption? You may also try to test the driver extensively on 32-bit Windows, since the call stack is more reliable there (fewer general-purpose registers for compiler optimizations, stdcall calling convention).

    Martin Dráb

  • anton_bassov Member Posts: 5,003

    .....unless the function is really really short .......

    .....or just contains some 0xCC-filled padding that is meant to be either jumped over or avoided by returning.....

    Anton Bassov

  • Suresh_Patil Member Posts: 119

    Thanks for all the comments and suggestions. In the meantime I have noticed one thing that I would like to point out.

    As I mentioned earlier, in WinDbg with the source path set, when I select the frame MyDriver!Function4 I see that the cursor is pointing to the call to Function6, so I assumed control was transferred from the previous statement (which is the call to Function5 through a function pointer).

    However, inside Function6 there is actually a call to Function7 (note that Function7 is at the top of the stack)! So the execution actually happened in this order: Function4 -> Function6 -> Function7.

    So two assumptions went wrong:
    1) The line highlighted in the source window is yet to be executed, and it is always the previous line that transfers control or has the problem
    2) The call stack should have shown the order Function4 -> Function6 -> Function7 (but in reality Function6 is missing)

    Are these really wrong assumptions to have, or could something else be lurking beneath this issue?

    @Martin, yes, I have checked with the Verifier, but have not yet tested on a 32-bit machine.
    Also, the actual crash is in Function7, which I have fixed now; I am just trying to make sure I interpret stack traces correctly.

    Regards,
    Suresh

  • Martin_Dráb Member - All Emails Posts: 46

    2) The call stack should have shown the order Function4 -> Function6 -> Function7 (but in reality Function6 is missing)

    This may happen when Function6 gets inlined.

    Martin Dráb

  • Suresh_Patil Member Posts: 119

    When we are asking the compiler to make a function inline, the stack trace still shows it as inline. E.g. Function3 is inline below.

    04 fffff80415750770 fffff804675f3005 MyDriver!Function4+0x39b
    05 (Inline Function) ---------------- MyDriver!Function3+0x10c
    06 fffff804157507f0 fffff804`6760507f MyDriver!Function2+0x2e5

    Do you mean Function6 gets inlined without we specifically telling compiler to make it so? And in such a case it won't appear in the stack, even as inline, like above?

  • Martin_Dráb Member - All Emails Posts: 46

    Do you mean Function6 gets inlined without we specifically telling compiler to make it so? And in such a case it won't appear in the stack, even as inline, like above?

    I think the compiler can basically inline whatever it thinks gives better results (in terms of speed etc.), unless you tell it otherwise. I suppose the inlining may be done in such a way that it leaves no traces (on the stack). The code of the inlined function just gets "copied" to the place(s) of its invocation.

    Martin Dráb

  • Tim_Roberts Member - All Emails Posts: 13,002

    The debugger is not omniscient. It does the best it can. All it has are a series of mileposts (Function4 enters here, Function6 enters here). It does not have a complete thought-transference from the compiler and linker. In addition, the x64 calling sequence means that many functions never store their parameters on the stack at all. In addition, exception handling in itself is an imprecise science. The debugger makes its best guess, but it's always going to be up to a human to divine what actually went on.

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • anton_bassov Member Posts: 5,003

    > Do you mean Function6 gets inlined without we specifically telling compiler to make it so?
    

    Presumably not in a Debug build; how would you debug code like that? In a Release build, however, it may well happen if compiler optimisations are enabled. For example, consider a static function that is called only once in the entire file. In such a case the compiler may well merge it into its caller.

    Anton Bassov
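
The static-function case above can be sketched as follows. The function names and the NOINLINE macro are hypothetical. The macro wraps `__declspec(noinline)` for MSVC and `__attribute__((noinline))` for GCC-compatible compilers, and expands to nothing elsewhere. Whether helper_may_inline is really inlined depends on the compiler and its optimization settings.

```c
#include <assert.h>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE /* no known equivalent; the compiler decides */
#endif

/* Static and called from a single place: an optimizing build may merge
   this into its caller even though it is not declared inline. */
static int helper_may_inline(int x)
{
    return x + 1;
}

/* Marked NOINLINE: the compiler is asked to keep this as a separate
   function, so it can still appear as its own frame in a stack trace. */
static NOINLINE int helper_own_frame(int x)
{
    return x + 2;
}

int caller(int x)
{
    return helper_may_inline(x) * helper_own_frame(x);
}
```

Marking a function this way trades a possible optimization for a more readable call stack, which may be worthwhile for functions that are known to show up in crash dumps.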
