KERNEL_SECURITY_CHECK_FAILURE BSOD in ndisFreeNblToNPagedPool after using NdisFreeNetBufferList?

Some background:

We have an NDIS LWF driver that drops received packets and sends them to a user-mode service for checking. The service then sends the packet buffers back to our LWF, which creates an NBL for each packet, chains all the created NBLs together, and indicates them using NdisFIndicateReceiveNetBufferLists.

The only recent change to our driver is that we now store the related NetBufferListInfo for each NBL. We do this in our filter receive handler by creating a clone NBL for the entire NBL list that we received, and then copying the NetBufferListInfo to this cloned NBL (using NdisAllocateCloneNetBufferList + NdisCopyReceiveNetBufferListInfo). The cloned NBL is used only to keep the NetBufferListInfo. We include the pointer to this cloned NBL with each packet that we send to user mode. Later, when user mode sends us the packets it wants indicated, we find the corresponding cloned NBL for each packet in our internal struct, create the NBL for that packet (using NdisAllocateNetBufferList), and copy the saved info into it using NdisCopyReceiveNetBufferListInfo to restore the original NBL info. After the new NBLs are created we free the cloned NBLs using NdisFreeCloneNetBufferList, and finally indicate the new, chained NBLs using NdisFIndicateReceiveNetBufferLists.

Note that we use the cloned NBLs only to save the NBL info for each packet, so we pass NDIS_CLONE_FLAGS_USE_ORIGINAL_MDLS as a performance improvement. I'm not sure whether the BSOD is caused by this change; I mention it only to give insight into the recent changes.


Back to the problem:

We have encountered a single machine, running Windows 10 in VirtualBox, that gets this BSOD 15-20 minutes after boot:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure.  The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffffec8f95f76380, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffffec8f95f762c0, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved

Callstack:

nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiFastFailDispatch+0xd0
nt!KiRaiseSecurityCheckFailure+0x323
ndis!ndisFreeNblToNPagedPool+0x7c
ndis!NdisFreeNetBufferList+0x11d
our FilterReturnBufferList handler
ndis!ndisCallReceiveCompleteHandler+0x33
ndis!NdisReturnNetBufferLists+0x4c9
tcpip!FlpReturnNetBufferListChain+0xd4
NETIO!NetioDereferenceNetBufferListChain+0x104
tcpip!TcpReceive+0x64f
tcpip!TcpNlClientReceiveDatagrams+0x22
tcpip!IppProcessDeliverList+0xc1
tcpip!IppReceiveHeaderBatch+0x21b
tcpip!IppFlcReceivePacketsCore+0x32f
tcpip!IpFlcReceivePackets+0xc
tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x270
tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x17c
nt!KeExpandKernelStackAndCalloutInternal+0x78
nt!KeExpandKernelStackAndCalloutEx+0x1d
tcpip!NetioExpandKernelStackAndCallout+0x8d
tcpip!FlReceiveNetBufferListChain+0x46d
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x22b
ndis!ndisInvokeNextReceiveHandler+0x4b
ndis!ndisFilterIndicateReceiveNetBufferLists+0x3cad1
ndis!NdisFIndicateReceiveNetBufferLists+0x6e
our IOCTL handler

I tried to look through ndisFreeNblToNPagedPool, and it seems the BSOD is related to NetBufferListInfo[27] of the NBL. The logic appears to be basically the following, which is the classic LIST_ENTRY corruption check performed by RemoveEntryList:

if ((NextEntry->Blink != Entry) || (PrevEntry->Flink != Entry)) {
    /* fast fail -> bugcheck 0x139 with Arg1 = 3 (corrupted LIST_ENTRY) */
}
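
To make the check concrete, here is a minimal user-mode sketch of why a second removal of the same entry trips it. The `list_entry` type and the `list_*` helpers are my own stand-ins written for this illustration, not the kernel's LIST_ENTRY macros, and the function returns `false` at the point where the kernel would bugcheck:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

static void list_init(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

static void list_insert_tail(list_entry *head, list_entry *entry)
{
    list_entry *prev = head->blink;
    entry->flink = head;
    entry->blink = prev;
    prev->flink = entry;
    head->blink = entry;
}

/* Same neighbour check as in the snippet above; returns false where
 * the kernel would fast-fail instead of unlinking. */
static bool list_remove_checked(list_entry *entry)
{
    list_entry *next = entry->flink;
    list_entry *prev = entry->blink;

    if (next->blink != entry || prev->flink != entry) {
        return false;
    }
    prev->flink = next;
    next->blink = prev;
    return true;
}

/* Returns 1 if the first removal succeeds and the second is rejected. */
static int demo_double_remove(void)
{
    list_entry head, a, b;

    list_init(&head);
    list_insert_tail(&head, &a);
    list_insert_tail(&head, &b);

    bool first = list_remove_checked(&a);   /* neighbours still point at a */
    bool second = list_remove_checked(&a);  /* b.blink is now &head, not &a */

    return (first && !second) ? 1 : 0;
}
```

The same check also fails if the entry was never linked, or if its memory was overwritten after being linked, so a failure here does not by itself prove a double free.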

So how can I find out from the dump why this BSOD is happening? I thought maybe it's a double free, but we only allocate the new NBLs in the IOCTL callback using NdisAllocateNetBufferList, copy the related NBL info to each of them, chain them together, and then indicate them. Then, in our return NBL callback, we just loop through the NBLs and free them using NdisFreeNetBufferList only if their NdisPoolHandle is equal to ours.
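
For reference, the return-path logic described above can be modelled roughly as follows. This is a user-mode mock, not our driver code: `mock_nbl`, `mock_free` and `free_owned_nbls` are hypothetical names, the `pool` field stands in for NdisPoolHandle, and `free_count` only exists so that a double free would be visible in the model:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a NET_BUFFER_LIST. */
typedef struct mock_nbl {
    struct mock_nbl *next;  /* next NBL in the chain */
    const void *pool;       /* stands in for NdisPoolHandle */
    int free_count;         /* number of times this NBL was "freed" */
} mock_nbl;

/* Stands in for NdisFreeNetBufferList. */
static void mock_free(mock_nbl *nbl)
{
    nbl->free_count++;
}

/* Frees every NBL that came from my_pool and returns the chain of
 * remaining NBLs (in order) that must be passed on to the layer below. */
static mock_nbl *free_owned_nbls(mock_nbl *chain, const void *my_pool)
{
    mock_nbl *pass_head = NULL;
    mock_nbl **pass_tail = &pass_head;

    while (chain != NULL) {
        mock_nbl *next = chain->next;  /* read before the NBL is freed */
        chain->next = NULL;            /* unlink from the original chain */

        if (chain->pool == my_pool) {
            mock_free(chain);
        } else {
            *pass_tail = chain;
            pass_tail = &chain->next;
        }
        chain = next;
    }
    return pass_head;
}

/* Returns 1 if owned NBLs are freed exactly once and foreign ones kept. */
static int demo_return_path(void)
{
    int ours, theirs;  /* only their addresses are used as pool handles */
    mock_nbl c = { NULL, &ours, 0 };
    mock_nbl b = { &c, &theirs, 0 };
    mock_nbl a = { &b, &ours, 0 };

    mock_nbl *rest = free_owned_nbls(&a, &ours);

    return (a.free_count == 1 && c.free_count == 1 && b.free_count == 0 &&
            rest == &b && rest->next == NULL) ? 1 : 0;
}
```

In this model each owned NBL is freed exactly once per call, so for a double free to occur the same NBL would have to reach the handler twice or be freed somewhere else as well.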

I also used the ndiskd report to get a view of the NDIS stack, and the only third-party driver is the VirtualBox NDIS Light-Weight Filter, which is below us on the NDIS stack. Could this be somehow related to the VirtualBox LWF driver?

Also note that so far only one customer has had this issue, so I doubt that it's a bug where we always double free the NBL; otherwise we would have caught it long ago during testing.