Some memory corruptions debugging

Dejan_Maksimovic Member - All Emails Posts: 224
edited April 14 in WINDBG

I got the following CI error during very early boot, without our minifilter:

* This break indicates this binary is not signed correctly: \Device\HarddiskVolume2\ALFACW_2019_04_14-13_37_21\Windows\System32\COMDLG32.dll
* and does not meet the system policy.
* The binary was attempted to be loaded in the process: \Device\HarddiskVolume2\Windows\System32\smss.exe
* This is not a failure in CI, but a problem with the failing binary.
* Please contact the binary owner for getting the binary correctly signed.
Code Integrity violation: 6271

Obviously, the driver corrupted loading of the mentioned DLL. But the stack is not helpful:
890e383c 863a625a 8959ca80 00000000 00000008 CI!CipReportAndReprieveUMCIFailure+0x331 (FPO: [Non-Fpo])
890e3908 81dbfaa2 8959ca80 8ee00000 000d6000 CI!CiValidateImageHeader+0x79e (FPO: [13,35,4])
890e3954 81dbf7ba 000d6000 8c55c580 8c783a78 nt!SeValidateImageHeader+0x4a (FPO: [Non-Fpo])
890e3a3c 81debdc4 ffffffff 8c783a78 00000000 nt!MiValidateSectionCreate+0x1b4 (FPO: [Non-Fpo])
890e3a68 81dc2b19 8d413200 ffffffff 8c783a78 nt!MiValidateSectionSigningPolicy+0x5e (FPO: [Non-Fpo])
890e3b50 81dea336 00000000 00000000 00b4f4b8 nt!MiCreateNewSection+0x41d (FPO: [Non-Fpo])
890e3be0 81de9adf 00000010 01000000 8c783a78 nt!MiCreateImageOrDataSection+0x256 (FPO: [Non-Fpo])
890e3c84 81de9a3c 00000000 890e3cfc 00000010 nt!MiCreateSection+0x7f (FPO: [Non-Fpo])
890e3cc4 81de98b7 890e3d4c 000f001f 00b4f4b8 nt!MmCreateSection+0x82 (FPO: [Non-Fpo])
890e3d30 81bd93ed 00b4f428 000f001f 00b4f4b8 nt!NtCreateSection+0x137 (FPO: [Non-Fpo])
890e3d30 7788fcd0 00b4f428 000f001f 00b4f4b8 nt!KiSystemServicePostCall (FPO: [0,3] TrapFrame @ 890e3d54)
00b4f3d0 7788e8aa 7788aec1 00b4f428 000f001f ntdll!KiFastSystemCallRet (FPO: [0,0,0])
00b4f3d4 7788aec1 00b4f428 000f001f 00b4f4b8 ntdll!ZwCreateSection+0xa (FPO: [7,0,0])
00b4f470 00fe830f 0000008d 00b4f4f8 00000000 ntdll!LdrVerifyImageMatchesChecksumEx+0x71 (FPO: [Non-Fpo])
WARNING: Frame IP not in any known module. Following frames may be wrong.
00b4f58c 778b7267 00b700e0 040030cc 0000ff00 0xfe830f
00b4f5c4 778555fc 00b700d8 04002ac8 04000010 ntdll!RtlpHpVsFreeChunkInsert+0x61ac8
00b4f60c 7784e90c 00000000 00000000 00b70000 ntdll!RtlpHpVsContextFree+0x36c (FPO: [3,9,4])
00b4f668 7785424e 04002b24 00000000 778542e1 ntdll!RtlpHpFreeHeap+0x32c (FPO: [3,7,4])
00b4f6a8 00000000 f6373652 00000000 04002b28 ntdll!RtlpHpVsContextAllocate+0x19e (FPO: [Non-Fpo])

How do I even get the data loaded for that file in WinDBG, to see if the offending offsets give a clue about the corruption?
Or are there any other ways to figure this one out? Adding debug prints has not helped.

A similar issue occurred on a different boot, but with MiCreateSection causing a fault in memset, due to an invalid address or size.

Comments

  • Dejan_Maksimovic Member - All Emails Posts: 224

    Another case (similar situation) reports a memory_corruption with Arg2 = 2 (execute?).
    Funnily enough, some debug prints showed that my driver read data in JUST below that address (e.g. we read in at bde7000, Length 1000, and the faulting address Arg1 is bde8000). (This part does not always happen, i.e. I do not always have debug output showing such address "concatenation".)

    This is the other case I mentioned, the one faulting in MiCreateSection while running memset on the faulting address.
    ```
    kd> !pte 9dc7d000 << faulting address
    VA 9dc7d000
    PDE at C0602770 PTE at C04EE3E8
    contains 000000000F9CC863 contains 0000000000000000
    pfn f9cc ---DA--KWEV not valid

    kd> !pte 9dc7c000
    VA 9dc7c000
    PDE at C0602770 PTE at C04EE3E0
    contains 000000000F9CC863 contains 80000000231F0863
    pfn f9cc ---DA--KWEV pfn 231f0 ---DA--KWEV
    ```
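
    The PDE and PTE addresses in the !pte output are consistent with the 32-bit PAE self-map layout, which is a quick way to confirm the debugger is looking at the page you think it is. A minimal sketch, assuming the conventional PAE bases 0xC0000000 (PTEs) and 0xC0600000 (PDEs) with 8-byte entries; the function names are hypothetical helpers, not debugger or kernel APIs:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed self-map bases for 32-bit PAE Windows. */
    #define PTE_BASE 0xC0000000u
    #define PDE_BASE 0xC0600000u

    /* Each PAE entry is 8 bytes. A PTE maps 4 KB (shift 12). */
    uint32_t pte_address(uint32_t va) { return PTE_BASE + (va >> 12) * 8u; }

    /* A PDE maps 2 MB (shift 21). */
    uint32_t pde_address(uint32_t va) { return PDE_BASE + (va >> 21) * 8u; }

    /* Bit 0 of a PTE is the Valid (present) bit. */
    int pte_is_valid(uint64_t pte) { return (int)(pte & 1u); }

    /* Reproduces the values from the two !pte queries quoted above. */
    void check_against_thread(void) {
        assert(pte_address(0x9dc7d000u) == 0xC04EE3E8u);
        assert(pte_address(0x9dc7c000u) == 0xC04EE3E0u);
        assert(pde_address(0x9dc7d000u) == 0xC0602770u);
        assert(!pte_is_valid(0x0000000000000000ULL));   /* faulting page */
        assert(pte_is_valid(0x80000000231F0863ULL));    /* page below it */
    }
    ```

    The two PTEs are adjacent (C04EE3E0 and C04EE3E8), so the faulting address is the first byte of the page immediately after a valid, mapped page.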

    Last call before KiTrap:
    a02a7984 81badf2d 9dc7d000 00000000 fffff000 nt!memset+0x45

    which, assuming the debug output tells the whole story, makes no sense: it attempts to set 0xFFFFF000 bytes to zero.
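
    One way to read that length, if the third column really is memset's count: 0xFFFFF000 is what a 32-bit unsigned subtraction produces when the true result is -0x1000. This is only a sketch of the arithmetic, not a claim about what MiCreateSection computes; span_length and as_signed are hypothetical helpers.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical: length of [start, end) computed in 32-bit unsigned
       arithmetic, with no check that end >= start. */
    uint32_t span_length(uint32_t start, uint32_t end) {
        return end - start;   /* wraps modulo 2^32 when end < start */
    }

    /* Reinterpret a 32-bit length as a signed quantity to spot a wrapped
       subtraction. Done in 64-bit math to avoid implementation-defined casts. */
    int64_t as_signed(uint32_t len) {
        return (len & 0x80000000u) ? (int64_t)len - 0x100000000LL : (int64_t)len;
    }

    /* The count value from the memset frame quoted above. */
    void check_thread_value(void) {
        assert(as_signed(0xFFFFF000u) == -0x1000);
    }
    ```

    For example, span_length(0x9dc7d000, 0x9dc7c000) yields 0xFFFFF000, i.e. an end address exactly one page below the start. Whether anything like that happens here is an open question.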

  • Dejan_Maksimovic Member - All Emails Posts: 224

    Any clues on how I can even start debugging either of these? I can reproduce the second (memory_corruption) BSOD almost instantly and with complete certainty, so I am quite sure it is something I am doing.

    But corrupting memory that far off, or setting it to an incorrect access state?

    My driver is supposed to offload all writes to several folders (the Windows folder is not one of them, at least not yet).

  • Scott_Noone_(OSR) Administrator Posts: 3,123

    Don't believe the "Args to Child", it's meaningless. Have you tried enabling Verifier all over the place? I'd start with your driver and FltMgr.sys

    -scott
    OSR

  • Dejan_Maksimovic Member - All Emails Posts: 224
    via Email
    I tried all variations of Driver Verifier: either special pool or memory tracking
    (not at the same time), I/O verification, etc.
    The only one I did not test was Low Memory simulation.

    >
    > Don't believe the "Args to Child", it's meaningless. Have you tried enabling
    > Verifier all over the place? I'd start with your driver and FltMgr.sys
    >

    Why are "Args to Child" meaningless?
  • Tim_Roberts Member - All Emails Posts: 12,966
    via Email
    Dejan_Maksimovic wrote:
    >
    > Why are "Args to Child" meaningless?

    In the 64-bit world, many functions don't ever commit their parameters
    to memory.  The first 4 are all done in registers, where the dump can't
    see them.  In many cases, they do spill to memory, so the arguments are
    not entirely useless, but if you see trash, it's probably not real.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Dejan_Maksimovic Member - All Emails Posts: 224
    via Email
    Uh, right...... except in this case, I was looking at that stack and it had:
    ```
    push eax
    push 0
    push [ecx+edx]
    call memset
    ```


    Though the memory must be corrupted already somehow at this point.

    An update:
    Since I can reproduce the CI Verifier error 6474 on a specific file
    during boot, I dumped its contents into my driver's buffer (each time it
    was read, wherever it was read in), wrote the data out with .writemem
    in WinDBG, and compared it to the DLL on that W10 system without my driver.
    Bit-for-bit identical. I might be corrupting the location where the
    signature is read from (if it's in some CAT file), or even worse, the
    location of the calculated hash (a completely random location).


    Back to square one.
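
    A byte-level comparison like the one described can also report where two images first diverge, which is more useful than a yes/no answer if a later capture does differ. A minimal sketch, assuming the dumped buffer and the reference DLL have both already been read into memory; first_difference is a hypothetical helper, not part of any toolkit mentioned in this thread:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Returns the offset of the first differing byte in the first n bytes of
       a and b, or -1 when those n bytes are identical. */
    long first_difference(const uint8_t *a, const uint8_t *b, size_t n) {
        assert(a != NULL && b != NULL);
        for (size_t i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return (long)i;
            }
        }
        return -1;
    }
    ```

    Dividing a returned offset by 0x1000 gives the 4 KB page in which the two copies first disagree, which can then be matched against the read offsets the filter logged.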
  • Tim_Roberts Member - All Emails Posts: 12,966

    Dejan_Maksimovic wrote:

    > Uh, right...... except in this case, I was looking at that stack and it had:
    > push eax
    > push 0
    > push [ecx+edx]
    > call memset

    That's 32-bit code. The comment does not apply.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Scott_Noone_(OSR) Administrator Posts: 3,123

    On the x86 the Args to Child show EBP+8h, EBP+Ch, and EBP+10h. These are not necessarily the arguments to the function (though they might be).
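
    To make that concrete, here is a minimal sketch of how one such stack row is assembled for a conventional EBP-based frame, using a simulated stack rather than real debugger memory. KbRow, read_dword, and kb_row are hypothetical names; FPO frames and register-passed arguments are exactly the cases where these three dwords are not the real arguments.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t child_ebp;   /* EBP of this frame */
        uint32_t ret_addr;    /* dword at EBP+4 */
        uint32_t args[3];     /* dwords at EBP+8, EBP+0Ch, EBP+10h */
    } KbRow;

    /* Read one dword from a simulated stack whose first element lives at
       stack_base and which holds count dwords. */
    uint32_t read_dword(const uint32_t *stack, size_t count,
                        uint32_t stack_base, uint32_t addr) {
        assert(addr >= stack_base && (addr - stack_base) % 4u == 0);
        size_t index = (addr - stack_base) / 4u;
        assert(index < count);
        return stack[index];
    }

    /* Build one row: whatever happens to sit at EBP+8..EBP+10h is reported,
       whether or not the callee treats those slots as its arguments. */
    KbRow kb_row(const uint32_t *stack, size_t count,
                 uint32_t stack_base, uint32_t ebp) {
        KbRow row;
        row.child_ebp = ebp;
        row.ret_addr = read_dword(stack, count, stack_base, ebp + 4u);
        for (int i = 0; i < 3; i++) {
            row.args[i] = read_dword(stack, count, stack_base,
                                     ebp + 8u + 4u * (uint32_t)i);
        }
        return row;
    }
    ```

    The columns are a raw view of fixed stack slots, so they are worth cross-checking against the disassembly of the call site before trusting them.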

    Did you try !chkimg in the debugger? This should be the same as what you've done, but worth a shot.

    -scott
    OSR

  • Dejan_Maksimovic Member - All Emails Posts: 224
    via Email
    !chkimg does not work, as the DLL was not loaded. I have nothing to
    direct !chkimg to.
    (yes, I did .reload)



    > On the x86 the Args to Child show EBP+8h, EBP+Ch, and EBP+10h. These are not
    > necessarily the arguments to the function (though they might be).
    >
    > Did you try !chkimg in the debugger? This should be the same as what you've
    > done, but worth a shot.
    >
  • Scott_Noone_(OSR) Administrator Posts: 3,123

    Oh well, worth a shot...

    I hate bugs like these. Your saving grace is that it's reproducible and presumably early during boot, so the repro is fast. If it were me I would log every single operation that comes in to the filter for this specific file. Then I would run a ProcMon boot trace without my filter and see what the results are when I'm not there. Rinse and repeat until something presents itself.

    -scott
    OSR

  • Dejan_Maksimovic Member - All Emails Posts: 224
    via Email
    Kinda true. The fact that it reproduces so easily and consistently makes me
    happy that it is some code bug rather than a "deal breaker" (seen
    those in the FS world? :))

    ProcMon boot trace sounds like a groovy idea! Let's hope it does not
    crash (the number of performance/crash issues we have had with ProcMon,
    even without our driver, is funny!)


    > Oh well, worth a shot...
    >
    > I hate bugs like these. Your saving grace is that it's reproducible and
    > presumably early during boot, so the repro is fast. If it were me I would
    > log every single operation that comes in to the filter for this specific
    > file. Then I would run a ProcMon boot trace without my filter and see what
    > the results are when I'm not there. Rinse and repeat until something
    > presents itself.
  • Dejan_Maksimovic Member - All Emails Posts: 224

    Well, the one difference I notice in the ProcMon boot log is that LESS of the DLL is read than when my driver is loaded.

    When the error occurred, I also dumped 0x400 bytes at the address passed to Se/CIValidateImageSection. The data is bit-for-bit identical to the data in scesrv.dll without my driver.

    I can surmise that either the other values were overwritten (if the checksum is passed to these routines at all), or that the driver has already corrupted so much memory that the validator routines are f***ed :) But they should be read/execute, not writable, right?

    For anyone wondering, due to the APIs involved, the idea is to just redirect data read for the DLLs from another location, without changing any of the data. There is no malware intention here.

  • Dejan_Maksimovic Member - All Emails Posts: 224

    In the case where memory_corruption occurs rather than a CI error, and the last call before the bugcheck is memset, I notice that the address passed to memset is exactly 0x1000 above an address that contains a valid header. Inspecting the stack shows which DLL is being loaded, and comparing that DLL to the data at the valid address shows they match bit-for-bit.
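
    Both observations in this thread (a read at bde7000 of length 1000 with a fault at bde8000, and a fault exactly 0x1000 above a valid header) fit the same pattern: the faulting address is the first byte past the end of a one-page region. A minimal sketch for checking logged reads against a fault address; classify_fault is a hypothetical helper, and treating 9dc7c000 as the address of the valid header is an assumption the !pte output suggests but the thread does not state:

    ```c
    #include <assert.h>
    #include <stdint.h>

    typedef enum {
        BEFORE_BUFFER,
        INSIDE_BUFFER,
        ONE_PAST_END,
        BEYOND_BUFFER
    } FaultPosition;

    /* Classify a faulting address relative to the region [base, base + length). */
    FaultPosition classify_fault(uint32_t base, uint32_t length, uint32_t fault) {
        assert(length > 0);
        uint64_t end = (uint64_t)base + length;   /* 64-bit so the sum cannot wrap */
        if (fault < base) {
            return BEFORE_BUFFER;
        }
        if (fault < end) {
            return INSIDE_BUFFER;
        }
        if (fault == end) {
            return ONE_PAST_END;
        }
        return BEYOND_BUFFER;
    }
    ```

    Running every logged (address, length) pair through a check like this would show whether the fault always lands one past the end of something the driver touched, which would point at a length or page-count calculation rather than a stray write.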
