BSOD in WFP driver based on WDK inspect sample

Hi

We have a WFP driver based on the WDK inspect sample.
As the stack below shows, at mydriver!WFPCloneReinjectInbound+0x18c I am calling FwpsInjectTransportReceiveAsync0.

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 0000000000001200, (reserved)
Arg3: 0000000000000000, Memory contents of the pool block
Arg4: ffffe00005c8e168, Address of the block of pool being deallocated

Debugging Details:

POOL_ADDRESS: ffffe00005c8e168

FREED_POOL_TAG: NDnd

BUGCHECK_STR: 0xc2_7_NDnd

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 2

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre

LAST_CONTROL_TRANSFER: from fffff8001d714f5c to fffff8001d5c38a0

STACK_TEXT:
ffffd000218881c8 fffff8001d714f5c : 00000000000000c2 0000000000000007 0000000000001200 0000000000000000 : nt!KeBugCheckEx
ffffd000218881d0 fffff80052303653 : 0000000000000000 ffffe000049b0500 ffffe000049a1390 0000000000000000 : nt!ExDeferredFreePool+0x6ec
ffffd000218882c0 fffff80053383455 : 0000000000000000 fffff800534fa6fd 0000000000000000 0000000000000000 : NETIO!NetioFreeMdl+0x232d3
ffffd00021888310 fffff800522d9142 : ffffe000031e3500 0000000000000001 0000000000000000 0000000000000000 : tcpip!FlpReturnNetBufferListChain+0x8b585
ffffd00021888360 fffff800522d53a2 : 0000000000000000 ffffe000049b05f0 0000000000000000 ffffe000050ee140 : NETIO!NetioDereferenceNetBufferList+0xb2
ffffd000218883a0 fffff800532fad53 : 0000000000000000 ffffd00021888400 0000000000000000 0000000000000000 : NETIO!NetioDereferenceNetBufferListChain+0x2e2
ffffd00021888440 fffff800532f9040 : fffff8005344b180 ffffe000050ee140 ffffe000024e0000 ffffe000024e0000 : tcpip!IppReceiveHeaderBatch+0x323
ffffd00021888560 fffff800533edd30 : ffffe00003488bd0 0000000000000000 0000000000000001 0000000000000000 : tcpip!IppFlcReceivePacketsCore+0x680
ffffd000218888e0 fffff800534fa2fd : ffffe00004ae2902 ffffe00002375c10 ffffd00021888bb9 ffffd00021883000 : tcpip!IppInspectInjectReceive+0x148
ffffd00021888940 fffff8001d52ef63 : 0000000000000000 0000000000000000 0000000000000000 fffff800534fa7c0 : fwpkclnt!FwppInjectionStackCallout+0xe5
ffffd000218889d0 fffff8005350b7ae : fffff800534fa218 ffffd00021888b40 0000000000000010 ffffe00003b32c70 : nt!KeExpandKernelStackAndCalloutInternal+0xf3
ffffd00021888ac0 fffff80052d0231c : ffffe00003b32c70 0000000000000000 ffffe000049b0700 ffffe00002e42650 : fwpkclnt!FwpsInjectTransportReceiveAsync0+0x2ea
ffffd00021888c00 fffff80052d026ed : ffffe000050ee140 ffffe00002e42650 fffff80052d06e10 0000000000000000 : mydriver!WFPCloneReinjectInbound+0x18c
ffffd00021888c80 fffff8001d571554 : ffffe00003b33880 ffffe00002e42650 0000000000000080 0000000000000001 : mydriver!WFP_AuthenticateThread+0x315
ffffd00021888d40 fffff8001d5c9ec6 : ffffd000205ce180 ffffe00003b33880 ffffd000205da240 0000000000005000 : nt!PspSystemThreadStartup+0x58
ffffd00021888da0 0000000000000000 : ffffd00021889000 ffffd00021883000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

There are some observations which may help:

-> It happens occasionally when we pend a packet at ALE AUTH RECEIVE (INBOUND), process it in a separate thread, and then reinject it: during reinjection the stack dereferences the NET_BUFFER_LIST.
Since this happens only sometimes, the BSOD occurs when we then try to dereference it ourselves.

-> The machine has NSClient++ installed. We observed that when nscp.exe connects on port 5666, the server process sees the connection as INBOUND on port 5666, and the NET_BUFFER_LIST is dereferenced while the packet is being reinjected. After uninstalling NSClient++ the problem still happened, though very infrequently.

-> I want to know under what conditions the dereference happens, so that I can skip my own dereference later in that particular case.

-> Searching Google, I found many cases where a WFP driver crashes similarly, but everywhere the only solution offered is to uninstall the particular driver.

That’s eerie, because I saw a remarkably similar crash on my laptop yesterday:

ffffd0013026d1d8 fffff8025c486f05 : 00000000000000c2 0000000000000007 0000000000001254 0000000000000000 : nt!KeBugCheckEx
ffffd0013026d1e0 fffff8019b5991bf : ffffe001c6da3890 ffffe001c0dd8a70 ffffe001cfe48080 0000000000000000 : nt!ExFreePool+0x23d
ffffd0013026d2c0 fffff8019b0a4149 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : NETIO!NetioFreeMdl+0x2707f
ffffd0013026d310 fffff8019b571440 : ffffe001c0c30b60 0000000000000000 0000000000000000 0000000000000000 : fwpkclnt!FwppInjectComplete+0x59
ffffd0013026d350 fffff8019ae868aa : ffffe001c1e78250 0000000000000001 0000000000000000 ffffe001cb655d90 : NETIO!NetioDereferenceNetBufferListChain+0x2d0
ffffd0013026d400 fffff8019ae42e3d : ffffe001c17e0370 0000000001543498 ffffe001c1e785b4 00000033e929b7ac : tcpip!TcpFlushDelay+0x7a
ffffd0013026d490 fffff8019ae41e29 : ffffe001c1799af0 ffffd001302669f4 ffffd001302668f4 ffffd00100000003 : tcpip!TcpPreValidatedReceive+0x3ad
ffffd0013026d580 fffff8019ae41a22 : 0000000000000000 0000000000000002 0000000000000000 0000000000000006 : tcpip!IppDeliverListToProtocol+0x59
ffffd0013026d630 fffff8019ae40e74 : 0000000000000001 00000a8000000000 ffffd00100000018 0000000000000000 : tcpip!IppProcessDeliverList+0x62
ffffd0013026d6a0 fffff8019ae958a8 : 0000000000003dfb 0000000000000020 0000000000000000 ffffd0013026d828 : tcpip!IppReceiveHeaderBatch+0x214
ffffd0013026d7a0 fffff8019ae95469 : 0000000000000000 fffff8025c290a5a fffff8019afe9300 fffff8019afed9c8 : tcpip!IppLoopbackIndicatePackets+0x1f8
ffffd0013026d820 fffff8025c290925 : ffffe001c451a080 ffffd0013026d9e0 fffff8019ae95360 0000000000000000 : tcpip!IppLoopbackTransmitCalloutRoutine+0x109
ffffd0013026d890 fffff8019ae3f154 : 0000000000000000 0000000000000000 0000000000000002 fffff8019afe9310 : nt!KeExpandKernelStackAndCalloutInternal+0x85
ffffd0013026d8e0 fffff8019ae3e8c5 : ffffe001c17dca78 0000000000000000 ffffe001c17dca78 ffffd00100003b9a : tcpip!IppDispatchSendPacketHelper+0x5f4
ffffd0013026db30 fffff8019ae3cbc8 : ffffd0013026df00 ffffe001c2a11040 ffffd0013026df00 ffffe001c17dca78 : tcpip!IppPacketizeDatagrams+0x2e5
ffffd0013026dc60 fffff8019af88b6e : 0000000000000000 ffffd0013026df07 fffff8019afe9310 ffffe001c0e55ad0 : tcpip!IppSendDatagramsCommon+0x4b8
ffffd0013026ded0 fffff8019b0a4d70 : ffffe001c451a080 2000000168f46907 fffff8019b0a4cd0 fffff8019b5718db : tcpip!IppInspectInjectTlSend+0x16e
ffffd0013026e000 fffff8025c290925 : ffffd0013026e1e0 ffffd0013026e1e0 0000000000000000 ffffe001c773f1e8 : fwpkclnt!FwppInjectionStackCallout+0xa0
ffffd0013026e090 fffff8019b0a66c6 : ffffe001c15d1ac0 ffffe001c0c30b60 ffffe001c15d19a0 ffffe001c15d1ba0 : nt!KeExpandKernelStackAndCalloutInternal+0x85
ffffd0013026e0e0 fffff8019b0a4c8e : 0000000000000007 ffffd0013026e220 ffffd0013026e220 ffffe001c0c30b60 : fwpkclnt!NetioExpandKernelStackAndCallout+0x52
ffffd0013026e120 fffff8019b0a6393 : ffffe00100000000 ffffe001c6f844e0 ffffe001c1c24a90 0000000000003dfb : fwpkclnt!FwppInjectTransportSendAsync+0x552
ffffd0013026e320 fffff8019bb3a85f : 00000000003bb953 fffff8019b22e766 0000000000000000 ffffe001cb506ba0 : fwpkclnt!FwpsInjectTransportSendAsync0+0x63
ffffd0013026e390 fffff8019bb3f55d : ffffe001c1c24a90 0000000000000000 00000000046eedb8 ffffe001c3d78fc0 : vsdatant+0xa85f
ffffd0013026e420 fffff8019bb5410e : 000000008400008f 0000000000000000 ffffe001c1c24a90 0000000000000000 : vsdatant+0xf55d
ffffd0013026e4d0 fffff8019bb54ed8 : ffffe001c784c140 ffffe001c7836f00 e001c79a40607aa3 000000000012019f : vsdatant+0x2410e
ffffd0013026e680 fffff8019bb54f2c : 0000000000000001 0000000000000000 ffffe001c79a4090 ffffd0013026ea80 : vsdatant+0x24ed8
ffffd0013026e720 fffff8025c6516b3 : 0000000000000000 ffffd0013026ea80 ffffe001c79a4090 ffffe001c451a080 : vsdatant+0x24f2
ffffd0013026e750 fffff8025c650456 : e001c4433f30bd55 00000000001f0003 0000000000000000 0000000000000000 : nt!IopXxxControlFile+0x1253
ffffd0013026e920 fffff8025c36cb63 : 0000000000000000 0000000000000001 0000000000000001 fffff8025c64de00 : nt!NtDeviceIoControlFile+0x56
ffffd0013026e990 000000006bca1e52 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x13
0000000000e7f0e8 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x6bca1e52

My stack isn’t identical, but it is quite similar. In my case, I suspect it is related to the (just released) VPN client software, but I don’t have a smoking gun yet (though the fact it hasn’t happened since I got this crash dump because I haven’t used the VPN client *is* a bit of a smoking gun.)

Since I am not sure exactly what’s going wrong yet, I enabled Driver Verifier: special pool, pool tracking, IRP logging and I/O verification. I did this on the drivers that were on the stack so that the next time this happens I’ll get more information. I’d suggest you try something similar and see what you find.

Tony
OSR

Thanks

This issue is reproducible on only one machine, and mostly with nscp.exe on port 5666 INBOUND.
With nscp.exe disabled, it still happened on svchost (port 3389), but only after a long time (3 days).

According to my observation and analysis, although the clone NBL is passed to FwpsInjectTransportReceiveAsync0, in some cases the original NBL is dereferenced.

So, in a race, the BSOD may happen when we try to dereference it later.
Or, if we have already dereferenced it, the BSOD may happen inside the stack, as in the dump I enclosed.

I am unable to find the root cause of why the original NBL is dereferenced in some cases.

The decision is probably made in tcpip!IppReceiveHeaderBatch, but I could not decipher how.
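
A note on the race described above: one way to rule out a double release from the driver's own code paths is to make the release idempotent. The following is a minimal user-mode sketch in C11, not code from the driver in question. The pended_packet type, its fields, and both functions are hypothetical names. In a kernel driver the flag would be flipped with an interlocked operation, and the counter increment would be replaced by the actual dereference call. It does not explain why the stack would release the original NBL; it only guarantees that the driver itself does so at most once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode model of a pended-packet record.
   Not a WDK type; names are for illustration only. */
typedef struct pended_packet {
    atomic_bool released;  /* true once our reference has been dropped */
    int release_calls;     /* counts real releases, for demonstration only */
} pended_packet;

static void pended_packet_init(pended_packet *p)
{
    atomic_init(&p->released, false);
    p->release_calls = 0;
}

/* Drops the driver's reference at most once. Returns true if this call
   performed the release, false if another path had already done so. */
static bool pended_packet_release_once(pended_packet *p)
{
    if (atomic_exchange(&p->released, true)) {
        return false;      /* already released elsewhere: do nothing */
    }
    p->release_calls++;    /* stand-in for the real dereference call */
    return true;
}
```

Both the worker thread and any cleanup path would call pended_packet_release_once, and only the first caller performs the release.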

FYI

Check for IPsec.
If it is an IPsec packet, skip it or reconstruct it.

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/fwpsk/nf-fwpsk-fwpsinjecttransportreceiveasync0

To allow IPsec to process inbound packets first, the callout that inspects the transport layer data must have a lower value of subLayerWeight in the FWPS_FILTER0 structure than the universal sublayer. In addition, the callout driver must not intercept tunnel-mode packets for which the combination of FWPS_PACKET_LIST_INBOUND_IPSEC_INFORMATION0 members ( isTunnelMode && ! isDeTunneled ) is returned by the FwpsGetPacketListSecurityInformation0 function. The callout driver must wait for the packet to be detunneled and then should intercept it at the transport layer or at a forward layer.
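
The tunnel-mode condition in the quoted documentation reduces to a single predicate. Here is a small user-mode sketch in C, assuming a hypothetical inbound_ipsec_flags struct that mirrors only the two members named above. In a driver these values would come from FwpsGetPacketListSecurityInformation0, which this sketch does not call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two members of
   FWPS_PACKET_LIST_INBOUND_IPSEC_INFORMATION0 quoted above.
   This is not the WDK definition. */
typedef struct inbound_ipsec_flags {
    bool isTunnelMode;
    bool isDeTunneled;
} inbound_ipsec_flags;

/* True when the callout should leave the packet alone: it is a
   tunnel-mode packet that has not been detunneled yet. */
static bool must_not_intercept(const inbound_ipsec_flags *f)
{
    return f->isTunnelMode && !f->isDeTunneled;
}
```

A callout would evaluate this before pending or cloning the packet, and let the packet continue untouched when it returns true.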

https://social.msdn.microsoft.com/Forums/en-US/7c593871-ef79-45f0-8d87-5f839b85c4c5/wfp-driver-bsod-at-fwpsinjecttransportreceiveasync0?forum=wfp

if (packet->ipSecProtected)
{
   //
   // When an IpSec protected packet is indicated to AUTH_RECV_ACCEPT or
   // INBOUND_TRANSPORT layers, for performance reasons the tcpip stack
   // does not remove the AH/ESP header from the packet. And such
   // packets cannot be recv-injected back to the stack w/o removing the
   // AH/ESP header. Therefore before re-injection we need to "re-build"
   // the cloned packet.
   //
   status = FwpsConstructIpHeaderForTransportPacket(
               clonedNetBufferList,
               packet->ipHeaderSize,
               packet->addressFamily,
               (UINT8*)&packet->remoteAddr,
               (UINT8*)&packet->localAddr,
               packet->protocol,
               0,
               NULL,
               0,
               0,
               NULL,
               0,
               0
               );

   if (!NT_SUCCESS(status))
   {
      goto Exit;
   }
}

/roll eyes

You know you’re replying to a six year old post, right?

Thread locked.

Peter