Strange exception record in mini dump

Hi guys. I have two minidump files, and both contain an exception record. The strange thing is that if I manually evaluate the instruction at eip, the result doesn’t match the exception record:
.ecxr shows that I am trying to read 20bda4d4, which is a valid address on the current thread’s stack, so it shouldn’t have raised an exception:
0:048> .ecxr
eax=00000055 ebx=00000004 ecx=00000240 edx=0c748da8 esi=00000240 edi=0d364b18
eip=50de2523 esp=20bda404 ebp=20bdadc8 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
syMain+0x683:
50de2523 8b9d0cf7ffff mov ebx,dword ptr [ebp-8F4h] ss:002b:20bda4d4=00000000

.exr -1 shows that I am trying to read 209da4d4, which is an invalid address, and that access raised the exception:
0:048> .exr -1
ExceptionAddress: 50de2523 (syMain+0x00000683)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 209da4d4
Attempt to read from address 209da4d4

What confuses me is that it looks as if ebp held a wrong value when the exception record was written, but how could that happen? The exception record captures the state at the moment of the exception, so when the CPU executed 50de2523: mov ebx,dword ptr [ebp-8F4h] and faulted, the ebp in use should have been saved right then. Am I missing something here? I would be very grateful for any help.
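
The mismatch can be checked by hand. Here is a minimal Python sketch that uses only the values printed by .ecxr and .exr -1 above; the variable and function names are made up for illustration:

```python
# Values copied from the .ecxr and .exr -1 output above.
EBP = 0x20BDADC8             # ebp in the context record
DISPLACEMENT = -0x8F4        # from: mov ebx,dword ptr [ebp-8F4h]
REPORTED_FAULT = 0x209DA4D4  # Parameter[1] of the exception record


def effective_address(base: int, displacement: int) -> int:
    """Address a 32-bit [base+disp] operand resolves to (wraps at 2**32)."""
    return (base + displacement) & 0xFFFFFFFF


computed = effective_address(EBP, DISPLACEMENT)
delta = computed - REPORTED_FAULT

print(f"computed from ebp : {computed:#010x}")
print(f"reported by .exr  : {REPORTED_FAULT:#010x}")
print(f"difference        : {delta:#x}")
```

With these inputs the computed address is 0x20bda4d4 and the two values differ by 0x200000, so the context record and the exception record really do disagree. The sketch says nothing about which of the two is wrong.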

Here is more information from WinDbg:
0:048> !teb
TEB at ffe7d000
ExceptionList: 20bd9e10
StackBase: 20be0000
StackLimit: 20ae0000
SubSystemTib: 00000000
FiberData: 00001e00
ArbitraryUserPointer: 00000000
Self: ffe7d000
EnvironmentPointer: 00000000
ClientId: 000023f4 . 00002140
RpcHandle: 00000000
Tls Storage: 0051df30
PEB Address: fffde000
LastErrorValue: 0
LastStatusValue: c0000023
Count Owned Locks: 0
HardErrorMode: 0

    0:048> ub syMain+0x683
    syMain+0x671:
    50de2511 03c0            add     eax,eax
    50de2513 8bd8            mov     ebx,eax
    50de2515 668b02          mov     ax,word ptr [edx]
    50de2518 03d3            add     edx,ebx
    50de251a 6689044f        mov     word ptr [edi+ecx*2],ax
    50de251e 41              inc     ecx
    50de251f 3bce            cmp     ecx,esi
    50de2521 7cf2            jl      syMain+0x675 (50de2515)
    
    0:048> u syMain+0x683
    syMain+0x683:
    50de2523 8b9d0cf7ffff    mov     ebx,dword ptr [ebp-8F4h]
    50de2529 8bbdecf6ffff    mov     edi,dword ptr [ebp-914h]
    50de252f 8b5508          mov     edx,dword ptr [ebp+8]
    50de2532 47              inc     edi
    50de2533 89bdecf6ffff    mov     dword ptr [ebp-914h],edi
    50de2539 3bfa            cmp     edi,edx
    50de253b 0f8c7ffdffff    jl      syMain+0x420 (50de22c0)
    50de2541 85db            test    ebx,ebx

On May 23, 2019, at 8:08 PM, NetSpring wrote:
>
> Hi guys. I have two minidump files, and both contain an exception record. The strange thing is that if I manually evaluate the instruction at eip, the result doesn’t match the exception record:

Do you have some very large data structures on the stack? I see references to ebp-0x900, which means you have at least three pages of stack, but is there lots more? Perhaps you should try making some of those statics.

Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Note: The email was trying to reply to an invalid Discussion (291354).

Hi Tim, thanks for your reply. Indeed, I have some pretty large arrays on the stack, but that shouldn’t be the problem because we still have enough space on the stack. What confuses me is:

  1. .ecxr shows that I am reading 0x20bda4d4, which is a valid stack address
  2. .exr -1 shows that I am reading 0x209da4d4, which is an invalid address

So which command should I trust? Or is it possible that the minidump recorded a wrong exception address?
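
One way to sanity-check both addresses is to compare them with the stack bounds that !teb reports. This is a rough Python sketch. It assumes StackLimit is the lowest committed stack address and StackBase is the exclusive top of the stack. The function name is hypothetical, and the reserved region may extend below StackLimit, which this check does not see:

```python
STACK_BASE = 0x20BE0000   # StackBase from !teb (top of the stack, exclusive)
STACK_LIMIT = 0x20AE0000  # StackLimit from !teb (lowest committed address)


def in_committed_stack(addr: int) -> bool:
    """True if addr lies between StackLimit (inclusive) and StackBase."""
    return STACK_LIMIT <= addr < STACK_BASE


for label, addr in [(".ecxr", 0x20BDA4D4), (".exr -1", 0x209DA4D4)]:
    inside = in_committed_stack(addr)
    print(f"{label:8} {addr:#010x} in committed stack: {inside}")
```

Under those assumptions the .ecxr address falls inside the range and the .exr -1 address falls below StackLimit, which matches the "valid" and "invalid" labels in the list above.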

Well, repeating something that I said on the [ntdev] list this morning, the debugger is not omniscient. Trust, but verify.

Stack is a funny thing. Just because there’s a megabyte of stack does not mean that a megabyte of memory is allocated for it. The address space is reserved, but pages won’t be allocated until they are needed.

Still, none of that explains a 0x200000 difference in the addresses, unless you really do have 2 megabytes of arrays. Are you able to reproduce this? Have you TRIED moving the arrays out of the stack, perhaps by using a std::vector?

Thanks, Tim. I can’t reproduce this problem, and I only have three dumps from the customer. One dump shows a 0x200000 difference between the addresses; the other two show a larger difference of 0x20000000. As far as I know, touching a reserved page triggers a page fault that the OS handles for us, so we shouldn’t have to worry about the missing pages, right?
StackBase: 20be0000 StackLimit: 20ae0000

And according to !teb, we only have a 1 MB stack, so we can’t have 2 megabytes of arrays on the stack. Am I right? What do you mean by “none of that explains a 0x200000 difference in the addresses, unless you really do have 2 megabytes of arrays”? Why would having 2 megabytes of arrays explain the difference?
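
The arithmetic behind that question can be laid out explicitly. This is a small Python sketch that uses the StackBase and StackLimit values from !teb and the two address differences reported for the dumps. The helper name is made up, and this is only bookkeeping, not a diagnosis:

```python
STACK_BASE = 0x20BE0000
STACK_LIMIT = 0x20AE0000
DELTAS = [0x200000, 0x20000000]  # address differences seen in the dumps


def is_single_bit(value: int) -> bool:
    """True if exactly one bit is set, i.e. value is a power of two."""
    return value > 0 and value & (value - 1) == 0


span = STACK_BASE - STACK_LIMIT
print(f"StackBase - StackLimit: {span:#x} bytes ({span // 1024} KiB)")
for d in DELTAS:
    ratio = d // span
    print(f"delta {d:#x}: {ratio}x the span, single bit: {is_single_bit(d)}")
```

Both differences are larger than the 1 MiB between StackLimit and StackBase, and each is a single power of two (bit 21 and bit 29).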