Query about mini filter kernel stack corruption

Hello,

I am working on a mini filter driver.

This driver enables reading of offline/archived files which are stored at a remote location by interfacing with a user mode program.

  1. The overall architecture involves creating a custom reparse point, when this reparse point is accessed, the mini filter gets a call back and it sends a message to a user mode program using FltSendMessage where the file offset and the data size is passed. The user mode program receives this message, fetches the data from remote storage and sends the data back to the driver using FilterSendMessage.

  2. The mini filter has a custom thread pool which has a max of 5 threads and they are used to send a message to the use mode program using FltSendMessage.

  3. The callback associated with IRP_MJ_READ creates a custom notification which contains the reparse point name, the offset needed and the length needed. The callback calls FltSendMessage to user mode program.

  4. The Message Notify function registered using FltCreateCommunicationPort receives the incoming data and calls FltCompletePendedPostOperation.

For large files >5gb which are being copied the driver raises a random bug check because the kernel stack gets corrupted. The crash may happen in any of the routines.
In most of the routines I am tracing the remaining kernel stack size using IoGetRemainingStackSize() and before the crash all threads print stack size in the range of 18kbyes to 21kbytes. However suddenly all the threads start showing invalid stack size upto 556Mbytes and as the parameters are invalid the driver crashes.

Please find below the trace message where one thread shows correct stack size but the same thread after few more reads shows a very big stack size.
The bugcheck is raised because IP is NULL.

PostReadWrite: setting up job offset = 26287767552, DataSize = 32768, ThreadID = 0000000000000930, Stack Size = 18904
?
?
?
PostReadWrite: setting up job offset = 26289569792, DataSize = 32768, ThreadID = 0000000000000DE8, Stack Size = 18904

MessageNotify: Result = 1 Address = FFFFD00022602000, offset = 0, DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF = 26287767552 IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289602560, DataSize = 32768, ThreadID = 0000000000000F00, Stack Size = 18904

MessageNotify: Result = 1 Address = FFFFD00022614000, offset = 0, DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF = 26287800320 IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289635328, DataSize = 32768, ThreadID = 0000000000000930, Stack Size = 556267992

MessageNotify: Result = 1 Address = FFFFD00022653000, offset = 0, DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 544646600 STOFF = 26287833088 IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289668096, DataSize = 32768, ThreadID = 0000000000000324, Stack Size = 576137688

This crash does not happen if the user mode program introduces a delay of 1ms after each call to FilterSendMessage.

The code does not write big buffers into pointers stored in stack.
Please let me know how can I troubleshoot this further to identify root cause.
Please let me know if you need more information.

thanks.
Mandar

Can you post the !analyze information from WinDbg when the crash occurs?

Pete


Kernel Drivers
Windows File System and Device Driver Consulting
www.KernelDrivers.com
866.263.9295

------ Original Message ------
From: mandar.nanivadekar@hp.com
To: “Windows File Systems Devs Interest List”
Sent: 9/27/2016 11:06:35 PM
Subject: [ntfsd] Query about mini filter kernel stack corruption

>Hello,
>
>I am working on a mini filter driver.
>
>This driver enables reading of offline/archived files which are stored
>at a remote location by interfacing with a user mode program.
>
>1. The overall architecture involves creating a custom reparse point,
>when this reparse point is accessed, the mini filter gets a call back
>and it sends a message to a user mode program using FltSendMessage
>where the file offset and the data size is passed. The user mode
>program receives this message, fetches the data from remote storage and
>sends the data back to the driver using FilterSendMessage.
>
>2. The mini filter has a custom thread pool which has a max of 5
>threads and they are used to send a message to the use mode program
>using FltSendMessage.
>
>3. The callback associated with IRP_MJ_READ creates a custom
>notification which contains the reparse point name, the offset needed
>and the length needed. The callback calls FltSendMessage to user mode
>program.
>
>4. The Message Notify function registered using
>FltCreateCommunicationPort receives the incoming data and calls
>FltCompletePendedPostOperation.
>
>For large files >5gb which are being copied the driver raises a random
>bug check because the kernel stack gets corrupted. The crash may happen
>in any of the routines.
>In most of the routines I am tracing the remaining kernel stack size
>using IoGetRemainingStackSize() and before the crash all threads print
>stack size in the range of 18kbyes to 21kbytes. However suddenly all
>the threads start showing invalid stack size upto 556Mbytes and as the
>parameters are invalid the driver crashes.
>
>Please find below the trace message where one thread shows correct
>stack size but the same thread after few more reads shows a very big
>stack size.
>The bugcheck is raised because IP is NULL.
>
>PostReadWrite: setting up job offset = 26287767552, DataSize = 32768,
>ThreadID = 0000000000000930, Stack Size = 18904
>?
>?
>?
>PostReadWrite: setting up job offset = 26289569792, DataSize = 32768,
>ThreadID = 0000000000000DE8, Stack Size = 18904
>
>MessageNotify: Result = 1 Address = FFFFD00022602000, offset = 0,
>DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF =
>26287767552 IOPLENGTH = 32768
>
>PostReadWrite: setting up job offset = 26289602560, DataSize = 32768,
>ThreadID = 0000000000000F00, Stack Size = 18904
>
>MessageNotify: Result = 1 Address = FFFFD00022614000, offset = 0,
>DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF =
>26287800320 IOPLENGTH = 32768
>
>PostReadWrite: setting up job offset = 26289635328, DataSize = 32768,
>ThreadID = 0000000000000930, Stack Size = 556267992
>
>MessageNotify: Result = 1 Address = FFFFD00022653000, offset = 0,
>DataSize = 32768 ThreadID = 00000000000003B4 StackSize = 544646600
>STOFF = 26287833088 IOPLENGTH = 32768
>
>PostReadWrite: setting up job offset = 26289668096, DataSize = 32768,
>ThreadID = 0000000000000324, Stack Size = 576137688
>
>
>This crash does not happen if the user mode program introduces a delay
>of 1ms after each call to FilterSendMessage.
>
>The code does not write big buffers into pointers stored in stack.
>Please let me know how can I troubleshoot this further to identify root
>cause.
>Please let me know if you need more information.
>
>thanks.
>Mandar
>
>
>
>
>
>—
>NTFSD is sponsored by OSR
>
>
>MONTHLY seminars on crash dump analysis, WDF, Windows internals and
>software drivers!
>Details at http:
>
>To unsubscribe, visit the List Server section of OSR Online at
>http:</http:></http:>

Thanks peter

Please find all the info below. This includes analyze -v, the thread info.
Please let me know if any more info is needed or if the minidump can be uploaded somewhere.

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but …
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: 0000000000000000, The address that the exception occurred at
Arg3: ffffd0012e65a8c8, Exception Record Address
Arg4: ffffd0012e65a0d0, Context Record Address

Debugging Details:

VIRTUAL_MACHINE: VMware

SYSTEM_VERSION: None

BIOS_DATE: 09/30/2014

BASEBOARD_PRODUCT: 440BX Desktop Reference Platform

BASEBOARD_VERSION: None

BUGCHECK_P1: ffffffffc0000005

BUGCHECK_P2: 0

BUGCHECK_P3: ffffd0012e65a8c8

BUGCHECK_P4: ffffd0012e65a0d0

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP:
+0
00000000`00000000 ?? ???

EXCEPTION_RECORD: ffffd0012e65a8c8 – (.exr 0xffffd0012e65a8c8)
ExceptionAddress: 0000000000000000
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000008
Parameter[1]: 0000000000000000
Attempt to execute non-executable address 0000000000000000

CONTEXT: ffffd0012e65a0d0 – (.cxr 0xffffd0012e65a0d0)
rax=ffffe0005ba72d70 rbx=ffffe0005c46a040 rcx=ffffe0005f803a40
rdx=ffffe0005ba72d70 rsi=ffffe0005c46a040 rdi=ffffe0005b898900
rip=0000000000000000 rsp=ffffd0012e65ab00 rbp=0000000000000080
r8=ffffe0005ba72d70 r9=000000000000002f r10=fffff8015342a180
r11=ffffe0005f803a40 r12=0000000000000000 r13=fffff80267a88000
r14=fffff8015342a300 r15=fffff801534252a0
iopl=0 nv up ei pl zr na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010242
00000000`00000000 ?? ???
Resetting default scope

CPU_COUNT: 8

CPU_MHZ: a28

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 2d

CPU_STEPPING: 7

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT_SERVER

PROCESS_NAME: System

CURRENT_IRQL: 0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1: 0000000000000008

EXCEPTION_PARAMETER2: 0000000000000000

WRITE_ADDRESS: GetPointerFromAddress: unable to read from fffff80267ddc138
GetUlongPtrFromAddress: unable to read from fffff80267ddc298
GetUlongPtrFromAddress: unable to read from fffff80267ddc518
0000000000000000

FOLLOWUP_IP:
HPEDpHsmX64+55f6 [f:\svn\storage_optimizer\storage_optimizer\trunk\filterdriver\src\hparchsm.c @ 3833]
fffff801`534225f6 488b542458 mov rdx,qword ptr [rsp+58h]

FAILED_INSTRUCTION_ADDRESS:
+0
00000000`00000000 ?? ???

BUGCHECK_STR: AV

ANALYSIS_VERSION: 10.0.10240.9 amd64fre

IP_IN_FREE_BLOCK: 0

LAST_CONTROL_TRANSFER: from ffffe0005bb59e00 to 0000000000000000

SYMBOL_ON_RAW_STACK: 1

STACK_ADDR_RAW_STACK_SYMBOL: ffffd0012e65ab30

STACK_COMMAND: dps ffffd0012e65ab30-0x20 ; kb

STACK_TEXT:
ffffd0012e65ab10 0000000000000000
ffffd0012e65ab18 ffffe0005bb59df0
ffffd0012e65ab20 ffffe0005c46a040
ffffd0012e65ab28 fffff801534225f6 HPEDpHsmX64!HpArcHsmDeferredReadWriteCompletion+0x126 [f:\svn\storage_optimizer\storage_optimizer\trunk\filterdriver\src\hparchsm.c @ 3833]
ffffd0012e65ab30 fffff8015342a180 HPEDpHsmX64!HpArcHsmData+0x40
ffffd0012e65ab38 ffffe0005f803a40
ffffd0012e65ab40 ffffe0005c46a040
ffffd0012e65ab48 fffff80267b58472 nt!KeReleaseSpinLock+0x22
ffffd0012e65ab50 0000000000000001
ffffd0012e65ab58 ffffe0005f803a40
ffffd0012e65ab60 ffffe0005c4a35b0
ffffd0012e65ab68 0000000067000000
ffffd0012e65ab70 0000000000000000
ffffd0012e65ab78 0000000000000000
ffffd0012e65ab80 ffffe00100000000
ffffd0012e65ab88 ffffe000610b1dd8

FAULTING_SOURCE_LINE: f:\svn\storage_optimizer\storage_optimizer\trunk\filterdriver\src\hparchsm.c

FAULTING_SOURCE_FILE: f:\svn\storage_optimizer\storage_optimizer\trunk\filterdriver\src\hparchsm.c

FAULTING_SOURCE_LINE_NUMBER: 3833

FAULTING_SOURCE_CODE:
3829: //
3830: ExFreeToNPagedLookasideList(&HpArcHsmData.WorkItemContextList, pWorkItem);
3831:
3832: //

3833: // Get the stream context here.
3834: //
3835: status = FltGetStreamContext(Instance, Data->Iopb->TargetFileObject,
3836: &pHpArcHsmContext);
3837: if (!NT_SUCCESS(status))
3838: {

SYMBOL_NAME: HPEDpHsmX64+55f6

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: HPEDpHsmX64

IMAGE_NAME: HPEDpHsmX64.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 57eca42b

IMAGE_VERSION: 1.1.135.1

BUCKET_ID_FUNC_OFFSET: 55f6

FAILURE_BUCKET_ID: AV_NULL_IP_HPEDpHsmX64!Unknown_Function

BUCKET_ID: AV_NULL_IP_HPEDpHsmX64!Unknown_Function

PRIMARY_PROBLEM_CLASS: AV_NULL_IP_HPEDpHsmX64!Unknown_Function

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_null_ip_hpedphsmx64!unknown_function

FAILURE_ID_HASH: {c8a1565a-0f63-308e-6a6b-549d699dd542}

Followup: MachineOwner

4: kd> !threads
Index TID TEB StackBase StackLimit DeAlloc StackSize ThreadProc
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
0 0000000000000000 0xffffe0005c46a040 0xffffe0005c46a048 0xffffe0005c46a048 0xffffffffffffffff 0x0000000000000000 0x6ffffc000
Total VM consumed by thread stacks 0x00000000

The register windows shows ip = 0

> rip=0000000000000000

you have a stack corruption

So it’s a NULL pointer dereference *in the instruction pointer*.

It looks like your source code has changed slightly, but usually this would point to the line following the faulting call:

3830: ExFreeToNPagedLookasideList(&HpArcHsmData.WorkItemContextList, pWorkItem);

This doesn’t match well with what you are seeing, but what I’m guessing is that a few lines earlier you’ve done something with that work item. Thus, the most plausible theory is that you’ve jumped through a NULL function pointer.

Tony
OSR

Activate DriverVerifier with special pool settings and replace ExAllocateFrom****LookasideList / ExFreeTo***LookasideList to ExAllocatePoolWithTag / ExFreePoolWithTag as DriverVerifier is unable to detect memory corruptions for LookasideLists or has a very limited capability for this.

I don’t see anything suggesting the stack is corrupted. This looks more like a jump through a null pointer.

Tony
OSR

Thanks all, I will relook at my code and also try driver verifier.

Tony, one query I have is, in some of the routines I check the kernel stack size for each thread by using IoGetRemainingStackSize() and print it.
I can see that the stack size suddenly increases.

I thought that rip becomes NULL because the stack goes bad , and the stack size suddenly increases for all the threads in all the routines which were working normally so far.

The crash does not happen immediately but after a few I/O calls after the stack seems to go bad.
Can the stack go bad because of access violation or this is a different problem altogether.

PostReadWrite: setting up job offset = 26287767552, DataSize = 32768, ThreadID =
0000000000000930, Stack Size = 18904
?
?
?
PostReadWrite: setting up job offset = 26289569792, DataSize = 32768, ThreadID =
0000000000000DE8, Stack Size = 18904

MessageNotify: Result = 1 Address = FFFFD00022602000, offset = 0, DataSize =
32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF = 26287767552
IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289602560, DataSize = 32768, ThreadID =
0000000000000F00, Stack Size = 18904

MessageNotify: Result = 1 Address = FFFFD00022614000, offset = 0, DataSize =
32768 ThreadID = 00000000000003B4 StackSize = 21960 STOFF = 26287800320
IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289635328, DataSize = 32768, ThreadID =
0000000000000930, Stack Size = 556267992

MessageNotify: Result = 1 Address = FFFFD00022653000, offset = 0, DataSize =
32768 ThreadID = 00000000000003B4 StackSize = 544646600 STOFF = 26287833088
IOPLENGTH = 32768

PostReadWrite: setting up job offset = 26289668096, DataSize = 32768, ThreadID =
0000000000000324, Stack Size = 576137688