Hi guys,
We have a storage filter driver that attaches at a couple of levels in the driver stack. The system BSODs occasionally, with no consistent crash location. All of the BSODs point to some kind of memory corruption, though. Most crashes are one of:
- “IRQL_NOT_LESS_OR_EQUAL (a), an attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses.” or,
- “KERNEL_SECURITY_CHECK_FAILURE (139), ExceptionCode: c0000409 (Security check failure or stack buffer overrun).”
I will provide a few WinDbg outputs from the BSODs below. However, don’t pay too much attention to any particular one, because these are only a sample; the others fail all over the place: within our driver, in the system kernel, and even at the application level (see “My NOTES” within the outputs below). Unfortunately, I can’t post the complete BSOD outputs here because of the message size limitation, so I have cut most of them down.
We also ran this on different Windows 10 x64 systems with the same results, so it is definitely not specific to one particular machine.
One more thing to note. In quite a few BSODs we’ve seen the following error message from the “!timer” command: “Timer at has wrong Blink! (Blink , should be )”, and it referred to the timer for one of our timer DPC procedures. So I would very much appreciate any insight into how a timer’s Blink address can get corrupted. I was hoping our DPC code was the issue; however, some BSODs occurred before that DPC and timer had even been initialized by KeInitializeDpc/KeInitializeTimer.
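For reference, my understanding is that the “wrong Blink” complaint comes from an ordinary doubly linked list consistency check: each entry's Blink must point at the entry that precedes it. The bugcheck 139 dumps below with Arg1 = 3 (corrupted LIST_ENTRY) report the same kind of problem. Here is a user-mode C sketch of that check. It is not our driver code and not the kernel's implementation; the names list_entry, list_init, list_insert_tail and list_is_consistent are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-mode stand-in for the kernel's LIST_ENTRY (two-pointer layout). */
typedef struct list_entry {
    struct list_entry *flink;  /* forward link: next entry */
    struct list_entry *blink;  /* backward link: previous entry */
} list_entry;

/* An empty list is a head that points at itself in both directions. */
static void list_init(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

/* Append an entry just before the head, i.e. at the tail. */
static void list_insert_tail(list_entry *head, list_entry *entry)
{
    list_entry *last = head->blink;
    entry->flink = head;
    entry->blink = last;
    last->flink = entry;
    head->blink = entry;
}

/*
 * Walk the list forward and verify that every entry's blink points at
 * the entry we just came from. max_entries bounds the walk so that a
 * corrupted list cannot make it loop forever.
 */
static bool list_is_consistent(const list_entry *head, size_t max_entries)
{
    const list_entry *prev = head;
    const list_entry *cur = head->flink;

    for (size_t i = 0; cur != head; ++i) {
        if (i >= max_entries || cur == NULL || cur->blink != prev) {
            return false;  /* "wrong Blink": cur->blink should be prev */
        }
        prev = cur;
        cur = cur->flink;
    }
    return head->blink == prev;
}
```

In this sketch, overwriting the blink of an entry that is still linked (for example because the memory holding it was freed, reused, or re-initialized while queued) is enough to make the walk fail at that entry, even though the code doing the walk is not at fault.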
With Driver Verifier enabled, the crash usually lands in our timer DPC handler when it accesses a member of its DeferredContext, which is located in NonPaged Pool (see one of the outputs below). However, that’s not always the case. As I mentioned above, BSODs occur even without our DPC being initialized, since it gets initialized only when certain conditions occur.
I have two guesses at this point: either something is going wrong with our timer DPC/handler, or something is corrupting a system thread stack. So I was wondering if you could have a look at this and let me know where we should dig further.
I would greatly appreciate any ideas on what might be happening here. Please let me know if you need more information on anything.
Thank you!
Mike
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses.
Arguments:
Arg1: ffffb04e0000002d, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff800410c556b, address which referenced memory
READ_ADDRESS: Unable to get offset of nt!_MI_VISIBLE_STATE.SpecialPool
Unable to get value of nt!_MI_VISIBLE_STATE.SessionSpecialPool
ffffb04e0000002d
STACK_TEXT:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiPageFault+0x469
nt!RtlpHpLfhSubsegmentDecommitPages+0xcb
nt!RtlpHpLfhOwnerCompact+0x90
nt!RtlpHpLfhContextCompact+0xaf
nt!RtlpHpHeapCompact+0x76
nt!ExpHpCompactionRoutine+0x207
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28
KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: fffff8887b1e69c0, Address of the trap frame for the exception that caused the bugcheck
Arg3: fffff8887b1e6918, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved
CURRENT_IRQL: 2
ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
STACK_TEXT:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiFastFailDispatch+0xd0
nt!KiRaiseSecurityCheckFailure+0x323
nt!RtlpHpLfhSlotAllocate+0x19d53c
nt!ExAllocateHeapPool+0x2b1
nt!ExAllocatePoolWithTag+0x64
nt!ObWaitForMultipleObjects+0x399
nt!NtWaitForMultipleObjects+0x119
nt!KiSystemServiceCopyEnd+0x25
nt!KiServiceLinkage
dxgkrnl!CTokenManager::ProcessTokens+0x1bd
dxgkrnl!CTokenManager::TokenThread+0x79
dxgkrnl!NtTokenManagerThread+0x1be
nt!KiSystemServiceCopyEnd+0x25
win32u!NtTokenManagerThread+0x14
dwmcore!CGlobalSurfaceManager::ProcessKernelTokens+0x114
dwmcore!CGlobalSurfaceManager::s_TokenThreadMain+0x9
KERNEL32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
KERNEL_SECURITY_CHECK_FAILURE (139)
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffff9b82e8e36e90, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffff9b82e8e36de8, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved
STACK_TEXT:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiFastFailDispatch+0xd0
nt!KiRaiseSecurityCheckFailure+0x30e
nt!RtlpHpLfhSlotAllocate+0x1830fa
nt!ExAllocateHeapPool+0x98b
nt!ExAllocatePoolWithTag+0x3d
Wdf01000!FxPoolAllocator+0x73 [minkernel\wdf\framework\shared\object\wdfpool.cpp @ 337]
Wdf01000!FxIoTarget::FormatIoctlRequest+0x329 [minkernel\wdf\framework\shared\targets\general\km\fxiotargetkm.cpp @ 373]
Wdf01000!FxIoTargetSendIoctl+0x158 [minkernel\wdf\framework\shared\targets\general\fxiotargetapi.cpp @ 1193]
Wdf01000!imp_WdfIoTargetSendIoctlSynchronously+0x48 [minkernel\wdf\framework\shared\targets\general\fxiotargetapi.cpp @ 1421]
My NOTES: This one occurred from within our application (as you will see, it wasn’t even calling our driver):
KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffff81088d55b550, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffff81088d55b4a8, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved
PROCESS_NAME: .exe
STACK_TEXT:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiFastFailDispatch+0xd0
nt!KiRaiseSecurityCheckFailure+0x30e
nt!KiInsertTimerTable+0x14bc6c
nt!KiCommitThreadWait+0x4e4
nt!KeWaitForSingleObject+0x520
win32kfull!UmfdClientSendAndWaitForCompletion+0x12e
win32kfull!UmfdQueryAdvanceWidths+0xdc
win32kfull!RFONTOBJ::bGetWidthTable+0x123
win32kfull!NtGdiGetWidthTable+0x192
nt!KiSystemServiceCopyEnd+0x28
win32u!NtGdiGetWidthTable+0x14
gdi32full!bFillWidthTableForGTE+0x166
gdi32full!pcfLocateCFONT+0x310
gdi32full!GetTextExtentPointWInternal+0x141
gdi32full!GetTextExtentPoint32W+0xe
GDI32!GetTextExtentPoint32WStub+0x43
_7ff671310000!CGraphWnd::DrawData+0x2b5
_7ff671310000!CGraphWnd::OnPaint+0x11b
My NOTES: This one occurred in our driver code while accessing a member of our DPC DeferredContext, which was located in NonPaged Pool:
IRQL_NOT_LESS_OR_EQUAL (a)
Arguments:
Arg1: ffffa78dd9d80f50, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff8067c8953d1, address which referenced memory
PROCESS_NAME: svchost.exe
STACK_TEXT:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiPageFault+0x454
nt!KiCancelTimer+0x41
nt!KiSetTimerEx+0x7b
nt!KeSetTimer+0x14
VerifierExt!KeSetTimer_wrapper+0x3e
nt!VerifierKeSetTimer+0x10
!Xxxxx::timerDpc+0x2c8
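
My NOTES: As a reading aid for the two IRQL_NOT_LESS_OR_EQUAL dumps above, Arg3 is a bitfield. Going by the legend printed in the dump itself, bit 0 distinguishes a read from a write and bit 3 flags an execute. So the first dump (Arg3 = 0) was a read inside the heap compaction code, while this last one (Arg3 = 1) was a write inside nt!KiCancelTimer. A small user-mode C sketch of that decoding follows; the type and function names are hypothetical and only mirror the legend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of Arg3 from an IRQL_NOT_LESS_OR_EQUAL (0xA) bugcheck. */
typedef struct access_info {
    bool is_write;    /* bit 0: 0 = read operation, 1 = write operation */
    bool is_execute;  /* bit 3: 1 = execute operation (where supported) */
} access_info;

/* Pull the two documented bits out of the raw Arg3 value. */
static access_info decode_bugcheck_a_arg3(uint64_t arg3)
{
    access_info info;
    info.is_write = (arg3 & (UINT64_C(1) << 0)) != 0;
    info.is_execute = (arg3 & (UINT64_C(1) << 3)) != 0;
    return info;
}
```

Calling decode_bugcheck_a_arg3(0) describes the first dump (read, not execute) and decode_bugcheck_a_arg3(1) describes the last one (write, not execute).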