Some memory corruptions debugging

Dejan_Maksimovic · April 14, 2019, 12:48pm

I got the following CI error during very early boot, without our minifilter:

[quote]

This break indicates this binary is not signed correctly: \Device\HarddiskVolume2\ALFACW_2019_04_14-13_37_21\Windows\System32\COMDLG32.dll
and does not meet the system policy.
The binary was attempted to be loaded in the process: \Device\HarddiskVolume2\Windows\System32\smss.exe
This is not a failure in CI, but a problem with the failing binary.
Please contact the binary owner for getting the binary correctly signed.
Code Integrity violation: 6271[/quote]

Obviously, the driver corrupted loading of the mentioned DLL. But the stack is not helpful:

890e3908 81dbfaa2 8959ca80 8ee00000 000d6000 CI!CiValidateImageHeader+0x79e (FPO: [13,35,4])
890e3954 81dbf7ba 000d6000 8c55c580 8c783a78 nt!SeValidateImageHeader+0x4a (FPO: [Non-Fpo])
890e3a3c 81debdc4 ffffffff 8c783a78 00000000 nt!MiValidateSectionCreate+0x1b4 (FPO: [Non-Fpo])
890e3a68 81dc2b19 8d413200 ffffffff 8c783a78 nt!MiValidateSectionSigningPolicy+0x5e (FPO: [Non-Fpo])
890e3b50 81dea336 00000000 00000000 00b4f4b8 nt!MiCreateNewSection+0x41d (FPO: [Non-Fpo])
890e3be0 81de9adf 00000010 01000000 8c783a78 nt!MiCreateImageOrDataSection+0x256 (FPO: [Non-Fpo])
890e3c84 81de9a3c 00000000 890e3cfc 00000010 nt!MiCreateSection+0x7f (FPO: [Non-Fpo])
890e3cc4 81de98b7 890e3d4c 000f001f 00b4f4b8 nt!MmCreateSection+0x82 (FPO: [Non-Fpo])
890e3d30 81bd93ed 00b4f428 000f001f 00b4f4b8 nt!NtCreateSection+0x137 (FPO: [Non-Fpo])
890e3d30 7788fcd0 00b4f428 000f001f 00b4f4b8 nt!KiSystemServicePostCall (FPO: [0,3] TrapFrame @ 890e3d54)
00b4f3d0 7788e8aa 7788aec1 00b4f428 000f001f ntdll!KiFastSystemCallRet (FPO: [0,0,0])
00b4f3d4 7788aec1 00b4f428 000f001f 00b4f4b8 ntdll!ZwCreateSection+0xa (FPO: [7,0,0])
00b4f470 00fe830f 0000008d 00b4f4f8 00000000 ntdll!LdrVerifyImageMatchesChecksumEx+0x71 (FPO: [Non-Fpo])
WARNING: Frame IP not in any known module. Following frames may be wrong.
00b4f58c 778b7267 00b700e0 040030cc 0000ff00 0xfe830f
00b4f5c4 778555fc 00b700d8 04002ac8 04000010 ntdll!RtlpHpVsFreeChunkInsert+0x61ac8
00b4f60c 7784e90c 00000000 00000000 00b70000 ntdll!RtlpHpVsContextFree+0x36c (FPO: [3,9,4])
00b4f668 7785424e 04002b24 00000000 778542e1 ntdll!RtlpHpFreeHeap+0x32c (FPO: [3,7,4])
00b4f6a8 00000000 f6373652 00000000 04002b28 ntdll!RtlpHpVsContextAllocate+0x19e (FPO: [Non-Fpo])

How do I even get the data loaded for that file in WinDBG, to see if the offending offsets give a clue about the corruption?
Or are there any other ways to figure this one out? Placing debug prints is not helpful.

A similar issue occurred on a different boot, but with MiCreateSection causing a fault in memset, due to an invalid address or size.

Dejan_Maksimovic · April 14, 2019, 4:47pm

Another case (similar situation) reports a memory_corruption with Arg2 = 2 (execute?)
Funny enough, some debug prints showed that my driver read in data JUST below that address (e.g. we read in bde7000, Length 1000, and the faulting address Arg1 is bde8000). This part does not always happen, i.e. I do not always have debug output showing such address “concatenation”.

That’s the other case I mentioned, the one faulting in MiCreateSection while running memset on the faulting address.

```
kd> !pte 9dc7d000
                    VA 9dc7d000
PDE at C0602770            PTE at C04EE3E8
contains 000000000F9CC863  contains 0000000000000000
pfn f9cc      ---DA--KWEV   not valid

kd> !pte 9dc7c000
                    VA 9dc7c000
PDE at C0602770            PTE at C04EE3E0
contains 000000000F9CC863  contains 80000000231F0863
pfn f9cc      ---DA--KWEV   pfn 231f0     ---DA--KWEV
```
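As a reading aid for the two `!pte` outputs, the addresses and raw values can be checked by hand. This is a minimal sketch in C, assuming the 32-bit PAE layout that the output suggests (8-byte PTEs mapped at 0xC0000000, bit 0 = valid, page frame number starting at bit 12). The macro and function names are hypothetical and not part of any debugger or kernel API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumptions for x86 PAE Windows: PTEs are 8 bytes each and the page
 * table is self-mapped at 0xC0000000 (this matches the "PTE at" values). */
#define EXAMPLE_PTE_BASE  0xC0000000u
#define EXAMPLE_PTE_VALID 0x1ULL
#define EXAMPLE_PFN_MASK  0x000FFFFFFFFFF000ULL

/* Virtual address of the PTE that maps 'va'. */
static uint32_t pte_address_for(uint32_t va)
{
    return EXAMPLE_PTE_BASE + (va >> 12) * 8u;
}

/* True when the hardware valid (present) bit is set. */
static bool pte_is_valid(uint64_t pte)
{
    return (pte & EXAMPLE_PTE_VALID) != 0;
}

/* Page frame number stored in the entry. */
static uint64_t pte_pfn(uint64_t pte)
{
    return (pte & EXAMPLE_PFN_MASK) >> 12;
}
```

Under these assumptions, `pte_address_for(0x9dc7d000)` gives C04EE3E8 with an all-zero (invalid) entry, while the page just below it, 9dc7c000, has a valid entry with PFN 231f0. In other words, the faulting address is the first unmapped page right after a mapped one.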

Last call before KiTrap:
```a02a7984 81badf2d 9dc7d000 00000000 fffff000 nt!memset+0x45```

which, provided the debug output is all there is, would make no sense: it is attempting to set 0xFFFFF000 bytes to zero.

Dejan_Maksimovic · April 15, 2019, 6:56pm

Any clues how I can even start debugging either of these? I can almost instantly reproduce the second (memory_corruption) BSOD, with complete certainty, so I am quite sure it is something I am doing.

But corrupting memory that far off, or setting it to an incorrect access state?

My driver is supposed to offload all writes to several folders (Windows is not one of them, at least not yet).

Scott_Noone_OSR · April 18, 2019, 8:23pm

Don’t believe the “Args to Child”, it’s meaningless. Have you tried enabling Verifier all over the place? I’d start with your driver and FltMgr.sys

Dejan_Maksimovic · April 18, 2019, 8:52pm

I did all variations of DV, either special pool or memory tracking
(not at the same time), I/O verifications, etc.
The only one I did not test was Low Memory simulation.

Don’t believe the “Args to Child”, it’s meaningless. Have you tried enabling
Verifier all over the place? I’d start with your driver and FltMgr.sys

Why are “Args to Child” meaningless?

Tim_Roberts · April 18, 2019, 9:41pm

Dejan_Maksimovic wrote:

Why are “Args to Child” meaningless?

In the 64-bit world, many functions don’t ever commit their parameters
to memory. The first 4 are all done in registers, where the dump can’t
see them. In many cases, they do spill to memory, so the arguments are
not entirely useless, but if you see trash, it’s probably not real.

Dejan_Maksimovic · April 18, 2019, 10:41pm

Uh, right… except in this case, I was looking at that stack and it had:

```
push 0
push [ecx+edx]
call memset
```
  
Though the memory must be corrupted already somehow at this point.  
  
Some update:  
Since I can reproduce the CI Verifier error 6474, on a specific file  
during boot, I dumped its contents into my driver buffer (whenever it  
was read in, where it was read), .memwrite-ed the data in WinDBG, and  
compared to the DLL on that W10 system without my driver.  
Bit to bit identical. I might be corrupting the location where the  
signature is read (if it's in some CAT file), or even worse the  
location of the calculated hash (a completely random location).  
  
Back to square one.

Tim_Roberts · April 18, 2019, 11:09pm

Dejan_Maksimovic wrote:

Uh, right… except in this case, I was looking at that stack and it had:
```
push 0
push [ecx+edx]
call memset
```

That’s 32-bit code. The comment does not apply.

Scott_Noone_OSR · April 19, 2019, 6:54pm

On the x86 the Args to Child show EBP+8h, EBP+Ch, and EBP+10h. These are not necessarily the arguments to the function (though they might be).
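
To illustrate what the debugger is doing, here is a minimal sketch in C that models one x86 stack frame as an array of 32-bit slots starting at EBP. The struct and function names are hypothetical, purely for illustration. The three "Args to Child" columns are simply slots 2, 3 and 4, whatever happens to be stored there.

```c
#include <stdint.h>

/* Hypothetical model of an x86 frame as seen from EBP:
 *   ebp[0] = saved EBP, ebp[1] = return address,
 *   ebp[2] = EBP+8, ebp[3] = EBP+0Ch, ebp[4] = EBP+10h. */
typedef struct {
    uint32_t a0;
    uint32_t a1;
    uint32_t a2;
} args_to_child;

/* Returns the three DWORDs a "kb"-style stack listing would show.
 * No attempt is made to check that they really are parameters. */
static args_to_child read_args_to_child(const uint32_t *ebp)
{
    args_to_child a = { ebp[2], ebp[3], ebp[4] };
    return a;
}
```

If the function takes fewer than three stack arguments, uses a register-based calling convention, or was compiled with FPO, those slots hold locals or the caller's data instead, which is why the columns cannot be trusted blindly.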

Did you try !chkimg in the debugger? This should be the same as what you’ve done, but worth a shot.

Dejan_Maksimovic · April 19, 2019, 7:47pm

!chkimg does not work, as the DLL was not loaded. I have nothing to
direct !chkimg to.
(yes, I did .reload)

Scott_Noone_OSR · April 19, 2019, 8:06pm

Oh well, worth a shot…

I hate bugs like these. Your saving grace is that it’s reproducible and presumably early during boot, so the repro is fast. If it were me I would log every single operation that comes in to the filter for this specific file. Then I would run a ProcMon boot trace without my filter and see what the results are when I’m not there. Rinse and repeat until something presents itself.

Dejan_Maksimovic · April 19, 2019, 8:11pm

Kinda true, the fact that it is reproed so easily and always makes me
happy that it is some code bug, rather than a “deal breaker” (seen
those in FS world?:))

ProcMon boot trace sounds like a groovie idea! Let’s hope it does not
crash (the amount of performance/crash issues we had with ProcMon is
funny, without our driver!)

Dejan_Maksimovic · April 20, 2019, 7:45pm

Well, the one difference I notice in the ProcMon boot log is that LESS of the DLL is read than in the case when my driver is loaded.

When the error occurred, I also did a dump of 0x400 bytes of the address passed to Se/CIValidateImageSection - it is bit to bit identical to the data in the scesrv.dll without my driver.

I can surmise that either the other values were overwritten (if the checksum is passed at all to these routines), or that the driver already corrupted so much of the memory that the validator routines are f***ed. But they should be read/execute, not writable, right?

For anyone wondering, due to the APIs involved, the idea is to just redirect data read for the DLLs from another location, without changing any of the data. There is no malware intention here.

Dejan_Maksimovic · April 21, 2019, 9:20pm

In the case where memory_corruption occurs, and not a CI error, where the last non bugcheck code is memset, I notice that the address passed to memset is exactly 0x1000 above an address that contains a valid header (and inspecting the stack shows which DLL is loaded - comparing that DLL to the valid address shows they match bit to bit).

Dejan_Maksimovic · June 6, 2019, 8:36pm

This post did not appear when sent via e-mail, so…
Hmmm. If not for a completely random debug print that I added
recently, for a completely different reason, I would not have solved
this (hopefully ;)).

So what seemed to be the issue? It appears that the file size seen by
the cache manager is zero (this is what our bug was, the file opened
seemed to be zero when it was queried by Cc).
The section creation API still tried to read the data in, and by this
time the file had its right size (but the cache manager view was
probably not invalidated by the FS?).

The file size is zero and PAGE_SIZE is 0x1000. From WRK, the only
place where RtlZeroMemory (memset) is called with a random parameter
is when it calculated FILE_SIZE-PAGE_SIZE.
Thus memset was trying to zero out 0xFFFFF000 bytes of memory. Even
though the data read in is correct, and the amount read in is correct,
the file size is not invalidated, I guess.
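
The arithmetic behind that length can be reproduced in isolation. This is a minimal sketch in C, assuming 32-bit unsigned sizes as on the x86 system in this thread. `size_minus_page` is a hypothetical stand-in for the FILE_SIZE-PAGE_SIZE computation described above, not the actual WRK code.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 0x1000u

/* Hypothetical stand-in: file size minus one page in 32-bit unsigned
 * arithmetic. When file_size is smaller than a page the subtraction
 * wraps around instead of going negative. */
static uint32_t size_minus_page(uint32_t file_size)
{
    return file_size - EXAMPLE_PAGE_SIZE;
}
```

With a file size of zero, the result wraps to 0xFFFFF000, which matches the third value on the `nt!memset+0x45` stack line earlier in the thread.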

Huh, 3 months almost