About the system crash caused by InsertTailList

In my driver code, I need to use InsertTailList to insert data into the linked list. The main code is as follows:

// CloneFlowContext calls ExAllocatePoolWithTag with NonPagedPool and the tag 'tag' to allocate a clonedFlowContext structure, which contains a LIST_ENTRY member named listEntry.
clonedFlowContext = CloneFlowContext(&localFlowContext);

if (clonedFlowContext == NULL) {
	DbgPrint("AleAuthConnectFn::CloneFlowContext Fail.\n");
	goto Exit;
}

KeAcquireInStackQueuedSpinLock(&g_pWfpGlobal->flowContextListLock, &lockHandle);

if (g_pWfpGlobal->isUninstalled) {
	KeReleaseInStackQueuedSpinLock(&lockHandle);
	goto Exit;
}

InsertTailList(&g_pWfpGlobal->flowContextList, &clonedFlowContext->listEntry);
KeReleaseInStackQueuedSpinLock(&lockHandle);

ExFreePoolWithTag(clonedFlowContext, 'tag'); // The system crashes when execution reaches this call, which frees the clonedFlowContext memory.

I searched for possible causes and found that the crash seems to be related to “IRQL”. On Windows 7 the driver runs fine and does not crash, and KeGetCurrentIrql() reports IRQL=0 (PASSIVE_LEVEL). On Windows 10 the system crashes, and KeGetCurrentIrql() reports IRQL=2 (DISPATCH_LEVEL).

Later, I saw this sentence in Microsoft’s development documentation:

“Callers of InsertTailList can be running at any IRQL. If InsertTailList is called at IRQL >= DISPATCH_LEVEL, the storage for ListHead and the list entries must be resident.”

I don’t quite understand this sentence. Does it mean that if IRQL >= DISPATCH_LEVEL, the memory I dynamically allocate for the clonedFlowContext structure must always exist and can never be released? If so, that would cause a memory leak. What should I do?

Is this really my problem? I hope to get an answer here, thanks!

If you allocate the memory from NonPagedPool, then it will be resident and there are no IRQL issues. I don’t think you told us what the crash was. Have you captured a dump and analyzed it?

@Tim_Roberts Thanks for the guide, looks like I need to check again.

The minidump, which you posted but since deleted, shows bugcheck 192. That does mean you accessed paged memory at a raised IRQL. Your IRQL was raised because you acquire a spinlock (and good for you, because you NEED to do so when using a linked list). Thus, by far the most likely cause is that you allocated your list from paged memory. That doesn’t work. The list and all the list entries need to be non-paged pool.

BTW, when you read a minidump in windbg, the FIRST command you want to type is !analyze -v, as the introduction suggests. That provides lots of good info about the crash.

If your code is actually as it appears, then you are freeing the list entry after inserting it. That would be a bug. Are you actually removing the entry from the list before freeing it?
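
To illustrate the remove-before-free order described above, here is a minimal user-mode sketch. It is not driver code: the `Demo*` types and helpers are hypothetical stand-ins that mimic the shape of `LIST_ENTRY`, `InsertTailList`, `RemoveEntryList`, and `CONTAINING_RECORD`, and `malloc`/`free` stand in for the pool allocator. In a real driver the unlink would happen while holding the same spin lock that protects the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-mode stand-in for the kernel's LIST_ENTRY. */
typedef struct DemoListEntry {
    struct DemoListEntry *Flink;   /* next entry */
    struct DemoListEntry *Blink;   /* previous entry */
} DemoListEntry;

/* Hypothetical flow context with an embedded list entry. */
typedef struct DemoFlowContext {
    int flowId;
    DemoListEntry listEntry;
} DemoFlowContext;

/* Recover the containing structure from a pointer to its embedded entry. */
#define DEMO_CONTAINING_RECORD(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

static void DemoInitializeListHead(DemoListEntry *head) {
    head->Flink = head;
    head->Blink = head;
}

static int DemoIsListEmpty(const DemoListEntry *head) {
    return head->Flink == head;
}

static void DemoInsertTailList(DemoListEntry *head, DemoListEntry *entry) {
    DemoListEntry *last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

static void DemoRemoveEntryList(DemoListEntry *entry) {
    entry->Blink->Flink = entry->Flink;
    entry->Flink->Blink = entry->Blink;
}

/* Unlink every entry first, then free it. Returns the number freed. */
static int DemoDrainList(DemoListEntry *head) {
    int freed = 0;
    while (!DemoIsListEmpty(head)) {
        DemoListEntry *entry = head->Flink;
        DemoRemoveEntryList(entry);   /* the list no longer points at it */
        free(DEMO_CONTAINING_RECORD(entry, DemoFlowContext, listEntry));
        freed++;
    }
    return freed;
}

/* Insert `count` contexts, then drain the list.
   Returns the number of contexts freed, or -1 if an allocation failed. */
static int DemoInsertThenDrain(int count) {
    DemoListEntry head;
    DemoInitializeListHead(&head);
    for (int i = 0; i < count; i++) {
        DemoFlowContext *ctx = malloc(sizeof *ctx);
        if (ctx == NULL) {
            DemoDrainList(&head);     /* clean up what was already inserted */
            return -1;
        }
        ctx->flowId = i;
        DemoInsertTailList(&head, &ctx->listEntry);
    }
    return DemoDrainList(&head);
}
```

Freeing a context while it is still linked, as the posted snippet appears to do, would leave the neighbouring entries pointing at released memory, so the next list operation touches a freed block.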

@Tim_Roberts @Mark_Roddy

I'm terribly sorry!
Due to my carelessness, when I posted this question the Target OS Version (Driver Settings–>General–>Target OS Version) in my Visual Studio 2019 project was set to “Windows7”, but I ran the compiled .sys file on Windows 10. Surprisingly, it loaded correctly, although it then hit the crash described above.
Later, I set Driver Settings–>General–>Target OS Version to “Windows10 or higher” and put the compiled .sys file on Windows 10, but it failed to load with the error message: “System error 127 has occurred. The specified procedure cannot be found.”
I was puzzled again. Checking step by step, I found that as long as the “FwpsCalloutRegister” function appears in the code, the driver fails to load, even if the function is never called. Only deleting it helps.

CODE:

NTSTATUS RegisterCalloutForLayer(
	IN const GUID* layerKey,
	IN const GUID calloutKey,
	IN FWPS_CALLOUT_CLASSIFY_FN classifyFn,
	IN FWPS_CALLOUT_NOTIFY_FN notifyFn,
	IN FWPS_CALLOUT_FLOW_DELETE_NOTIFY_FN flowDeleteNotifyFn,
	OUT UINT32* calloutId,  
	OUT UINT64* filterId,
	FWPM_FILTER_CONDITION *conditions,
	UINT32 conditionLength
) {

	NTSTATUS        status = STATUS_SUCCESS;
	FWPS_CALLOUT    sCallout = { 0 };
	FWPM_FILTER     mFilter = { 0 };
	FWPM_FILTER_CONDITION localConditions[1] = { 0 };
	UINT32 localConditionLength = 0;
	FWP_CONDITION_VALUE conditionValue;
	FWP_BYTE_BLOB* byteBlob = NULL;
	FWPM_CALLOUT    mCallout = { 0 };
	FWPM_DISPLAY_DATA mDispData = { 0 };
	BOOLEAN         bCalloutRegistered = FALSE;

	sCallout.calloutKey = calloutKey; 
	sCallout.classifyFn = classifyFn; 
	sCallout.flowDeleteFn = flowDeleteNotifyFn; 
	sCallout.notifyFn = notifyFn;	

	// As long as this function appears in the code, the driver fails to load. The macro expands to "FwpsCalloutRegister3".
	status = FwpsCalloutRegister(g_pWfpGlobal->deviceObject,
		&sCallout, 
		calloutId  
	);
	
	//Do something...

	return status;
}

In fwpvi.h, I saw that its macro definition is like this:

#if (NTDDI_VERSION >= NTDDI_WIN10_RS3)
#define FwpsCalloutRegister FwpsCalloutRegister3
#define FwpsvSwitchEventsSubscribe FwpsvSwitchEventsSubscribe0
#define FwpsvSwitchEventsUnsubscribe FwpsvSwitchEventsUnsubscribe0
#define FwpsvSwitchNotifyComplete FwpsvSwitchNotifyComplete0
#elif (NTDDI_VERSION >= NTDDI_WIN8)
#define FwpsCalloutRegister FwpsCalloutRegister2
#define FwpsvSwitchEventsSubscribe FwpsvSwitchEventsSubscribe0
#define FwpsvSwitchEventsUnsubscribe FwpsvSwitchEventsUnsubscribe0
#define FwpsvSwitchNotifyComplete FwpsvSwitchNotifyComplete0
#elif (NTDDI_VERSION >= NTDDI_WIN7)
#define FwpsCalloutRegister FwpsCalloutRegister1
#else
#define FwpsCalloutRegister FwpsCalloutRegister0
#endif // (NTDDI_VERSION >= NTDDI_WIN7)
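
The `#if` chain above picks the numbered function name at compile time, so the Target OS Version setting decides which export the driver binary imports. As a rough illustration, the following user-mode sketch mirrors that chain at run time. The `DEMO_` constants and `DemoResolveCalloutRegister` are hypothetical names; the numeric values are the ones `sdkddkver.h` assigns to `NTDDI_WIN7`, `NTDDI_WIN8`, and `NTDDI_WIN10_RS3`. Whether a missing `FwpsCalloutRegister3` export on the test machine is what produced error 127 is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical copies of the sdkddkver.h version constants. */
#define DEMO_NTDDI_WIN7      0x06010000UL
#define DEMO_NTDDI_WIN8      0x06020000UL
#define DEMO_NTDDI_WIN10_RS3 0x0A000004UL

/* Run-time mirror of the preprocessor chain: returns the name that
   FwpsCalloutRegister would expand to for a given NTDDI_VERSION. */
static const char *DemoResolveCalloutRegister(unsigned long ntddiVersion) {
    if (ntddiVersion >= DEMO_NTDDI_WIN10_RS3) {
        return "FwpsCalloutRegister3";
    } else if (ntddiVersion >= DEMO_NTDDI_WIN8) {
        return "FwpsCalloutRegister2";
    } else if (ntddiVersion >= DEMO_NTDDI_WIN7) {
        return "FwpsCalloutRegister1";
    }
    return "FwpsCalloutRegister0";
}
```

For example, a build targeting Windows 7 resolves to `FwpsCalloutRegister1`, while a build targeting Windows 10 RS3 or later resolves to `FwpsCalloutRegister3`.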

I’m very distressed right now and don’t know where the problem is…

I’m confused by your description. If you are going to RUN this on Windows 7, then you must COMPILE it for Windows 7. The compiled-for-Windows-7 version will also run on Windows 10, but the reverse is not true.

@Tim_Roberts
Thank you very much, I found my problem: my callback function runs at DISPATCH_LEVEL, but I was calling API functions that must not be called at DISPATCH_LEVEL, which caused the blue screen crash.
In addition, I would like to ask: is Windows 10 compatible with Windows 7 drivers? I ask because I see settings for this in the VS2019 build configuration.

Picture: https://drive.google.com/file/d/1ejJXa_AvrdbpYI8rhnt5ws53_tmQWNbw/view

Yes. This is one attribute where Windows is much better than MacOS or Linux. Many drivers built for XP will still load and run on Windows 11.
