NDIS Lightweight filter unloading problems

Harvey_War · May 25, 2012, 5:14am

Hello folks,

We have an NDIS lightweight filter based on the WDK “filter” sample. It works pretty well, but we are facing an annoying issue when it comes to unloading it.

In our FilterReceiveNetBufferLists routine, we are creating a copy of the received buffer and originating a receive indication with the copy (including the NDIS_RECEIVE_FLAGS_RESOURCES flag). The original buffer is returned using NdisFReturnNetBufferLists.
It’s worth noting that we are making a new copy and calling NdisFIndicateReceiveNetBufferLists for every buffer in the received buffer list, so this means that we may be indicating several times for one NBL received.

Of course, we are freeing the new buffers right after returning from the indicate call.
Also, we are doing passthrough for every receive having the NDIS_RECEIVE_FLAGS_RESOURCES flag.
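
To make the receive path described above easier to reason about, here is a minimal user-mode model of it in plain C. It is not NDIS code and not the poster's driver: every name in it (`ModelNbl`, `ReceiveLedger`, `model_receive`, `MODEL_RECV_FLAG_RESOURCES`) is hypothetical, and the flag value is a stand-in rather than the one from ndis.h. It only counts what happens to each buffer, so that the "one indication per buffer" behavior and the ownership bookkeeping can be checked on paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "resources" receive flag; NOT the ndis.h value. */
#define MODEL_RECV_FLAG_RESOURCES 0x1u

/* Model of one NBL: how many buffers it carries, plus the next NBL in the chain. */
typedef struct ModelNbl {
    unsigned buffer_count;
    struct ModelNbl *next;
} ModelNbl;

/* Ledger of everything the modeled receive handler did, for leak checking. */
typedef struct ReceiveLedger {
    unsigned passed_through;   /* original NBLs indicated up unchanged */
    unsigned returned;         /* original NBLs handed back to the lower layer */
    unsigned clones_allocated; /* copies created by the filter */
    unsigned clones_indicated; /* copies indicated up (synchronously) */
    unsigned clones_freed;     /* copies freed after the indication returned */
} ReceiveLedger;

/* Model of the receive handler as described in the post (assumed behavior). */
static void model_receive(ReceiveLedger *ledger, const ModelNbl *chain,
                          unsigned receive_flags, bool filter_running)
{
    assert(ledger != NULL);

    /* With the resources flag set, the lower layer keeps ownership of the
     * originals, so the filter must not return them itself. */
    bool lower_keeps_ownership =
        (receive_flags & MODEL_RECV_FLAG_RESOURCES) != 0;

    for (const ModelNbl *nbl = chain; nbl != NULL; nbl = nbl->next) {
        if (lower_keeps_ownership || !filter_running) {
            ledger->passed_through++;       /* plain passthrough */
            continue;
        }
        /* One copy and one indication per buffer in this NBL. */
        for (unsigned i = 0; i < nbl->buffer_count; i++) {
            ledger->clones_allocated++;
            ledger->clones_indicated++;     /* indicated with the resources flag */
            ledger->clones_freed++;         /* freed right after the call returns */
        }
        ledger->returned++;                 /* original goes back down */
    }
}

/* True when every clone was indicated and freed, and every original was
 * either passed through or returned. */
static bool ledger_is_balanced(const ReceiveLedger *ledger,
                               unsigned originals_seen)
{
    return ledger->clones_allocated == ledger->clones_freed
        && ledger->clones_allocated == ledger->clones_indicated
        && ledger->passed_through + ledger->returned == originals_seen;
}
```

In this model, a chain of two NBLs carrying 1 and 3 buffers produces 4 separate indications and 2 returns. A real driver would keep equivalent counters with interlocked operations, since receives can arrive on several processors at once.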

It seems to work fine, but we are having problems with the unload phase when some application is using the network. If, for example, Dropbox is running (not necessarily uploading or downloading content, just active) and we try to unload the filter, the system enters an almost frozen state, and the driver won’t unload until we close Dropbox; then everything goes back to normal and our filter unloads without problems.

Tracing our driver we can see several calls to FilterPause for the adapters and protocols installed on the system.
The hang occurs after returning from the FilterPause that corresponds to the NIC miniport. This is the stack:

[83d46958 System]
4.000040 83daf798 0003019 Blocked nt!KiSwapContext+0x26
nt!KiSwapThread+0x266
nt!KiCommitThreadWait+0x1df
nt!KeWaitForSingleObject+0x393
nt!ExWaitForRundownProtectionReleaseCacheAware+0x9c
tcpip!FlUnbindAdapter+0x61
ndis!ndisUnbindProtocol+0x235
ndis!ndisCloseMiniportBindingsForPause+0x1d3
ndis!ndisDetachFilter+0x288
ndis!NdisFDeregisterFilterDriver+0xf0
MYDRIVER!FilterUnload+0xc2
nt!IopLoadUnloadDriver+0x1e
nt!ExpWorkerThread+0x10d
nt!PspSystemThreadStartup+0x9e
nt!KiThreadStartup+0x19

Then when we close Dropbox, we can see NDIS calling FilterRestart, FilterPause and finally FilterDetach.

Doing some disassembly, we saw that after Dropbox is closed, the system signals the event that unblocks us, and TCPIP frees the NBL list by calling “flpFreeNetBufferListChain”. So it seems pretty clear that the problem is related to buffers. We tracked every indication, every returned buffer, and so on, but we don’t see any buffer loss or mishandling.

Also, the NDISKD debugger extensions didn’t help me; trying to find pending NBLs shows this:

3: kd> !ndiskd.pendingnbls
Type information missing error for ndisGlobalNetBufferListPoolList

PHASE 1/3 Found 0 NBL pool(s).
Type information missing error for ndisMaxNumberOfProcessors
PHASE 2/3: Found 0 freed NBL(s).

Pending Nbl Currently held by _
No pending NBLs were found.
PHASE 3/3: Found 0 pending NBL(s) of 0 total NBL(s).
Search complete.

BTW, got the same output with the checked build version.

So, any ideas?

Thanks a lot in advance!

Jeffrey_Tippet_MSFT · May 25, 2012, 10:36pm

The !ndiskd.pendingnbls problem should be fixed in the next release of the WDK (coming out in the first week of June, so not long to wait). Sorry about that.

From the stack, I agree with your assessment - there is a problem with a packet somewhere. TCPIP will wait for two things during unbind:

1. When TCPIP gives *receive* packets up to higher layers (like TDI), then TCPIP must wait for those higher drivers to complete the packet before TCPIP can unbind.
2. When TCPIP *sends* packets down through the filter stack to the miniport, then TCPIP must wait for those send packets to complete back to it before TCPIP can unbind.

Therefore, the most common way for an NDIS filter to introduce a hang is if the NDIS filter loses a *send* packet. But from your description, I gather that you’re not messing with the *send* path at all. (Is this true?) If you don’t mess with the send path, then that rules out case #2 above. Therefore, the next most likely cause is something above TCPIP. It’s hard to imagine your NDIS filter causing problems up there.

The “Dropbox theory” is also interesting. As far as I know, Dropbox doesn’t have any kernel code, so it shouldn’t be interacting with your filter at all. It’s more likely that it’s interacting with some 3rd party firewall or WFP callout.

What I would do is check if your LWF is causing this problem at all. Remove your LWF, and build the WDK sample unmodified, and see if that repros the issue. Or, you can get similar test coverage by removing all LWFs and just disable/enable the NIC a few times. (Disabling the NIC also unbinds TCPIP, so if there’s a problem above TCPIP, then disabling the NIC will repro it.)

Another way to go about this is to look for 3rd party firewall or network security software. If there is any, disable it and see if the problem still repros. Or try the filter on a clean install of the OS.

Finally, !ndiskd.pendingnbls is a good way to diagnose these sorts of issues. When you get a working copy of !ndiskd, check its output for indications of packets that are still in-flight. If they are attributed to TCPIP, then you can bet that somebody above TCPIP (a TDI client, or possibly a WFP callout) is mishandling the NBL. If the packets are attributed to your LWF or the NIC driver, then that driver leaked the NBL on the send path.

Harvey_War · May 28, 2012, 6:58am

Hello Jeffrey, thanks a lot for your response.

On the “send” subject: yes, all we do in the send path is pass the buffers down with NdisFSendNetBufferLists.
The system (Windows 7 x86) is pretty much clean, with no 3rd party software installed, just the usual services: QoS, file and printer sharing, NetBIOS, and WFP Lightweight Filter.

Another interesting thing we noticed is that the problem (system almost frozen) also occurs when we try to unbind the “WFP Lightweight Filter” driver with our filter loaded. Closing Dropbox also fixes the issue. In fact, we have disabled every WFP and NDIS filter (except ours), and the problem is still there.

The thing is, we are pretty sure that it’s our filter’s fault, because if we change our Receive path and just do the passthru thing, our filter unloads without problems; it’s just that we cannot find the issue, as we are following every rule and documentation detail, and yet the problem is still there.

We have tried waiting in the FilterPause callback for any outstanding receives to finish, and also returning PENDING and completing the pause later on, but the result is the same, and we can only recover the system by closing Dropbox.
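
The "return PENDING and complete the pause later" approach mentioned above can be sketched as a small state machine. This is a single-threaded user-mode model in plain C, not driver code: `ModelFilter`, `model_indicate`, `model_return`, and `model_pause` are hypothetical names, and the `pause_complete_called` flag merely stands in for the point where a real filter would complete the pause (in NDIS that would be a call to NdisFPauseComplete). A real driver would also need a lock or interlocked operations around the counter and the state.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FILTER_RUNNING, FILTER_PAUSING, FILTER_PAUSED } ModelFilterState;
typedef enum { PAUSE_SUCCESS, PAUSE_PENDING } ModelPauseStatus;

typedef struct ModelFilter {
    ModelFilterState state;
    unsigned outstanding;        /* indications we originated, not yet back */
    bool pause_complete_called;  /* stands in for completing the pause later */
} ModelFilter;

/* The filter originates an indication: only allowed while running. */
static void model_indicate(ModelFilter *f)
{
    assert(f->state == FILTER_RUNNING);
    f->outstanding++;
}

/* One of our indications has come back to us. */
static void model_return(ModelFilter *f)
{
    assert(f->outstanding > 0);
    f->outstanding--;
    /* The last return while pausing is what finishes the pause. */
    if (f->outstanding == 0 && f->state == FILTER_PAUSING) {
        f->state = FILTER_PAUSED;
        f->pause_complete_called = true;
    }
}

/* Pause request: succeed at once if idle, otherwise pend until drained. */
static ModelPauseStatus model_pause(ModelFilter *f)
{
    assert(f->state == FILTER_RUNNING);
    if (f->outstanding == 0) {
        f->state = FILTER_PAUSED;
        return PAUSE_SUCCESS;
    }
    f->state = FILTER_PAUSING;
    return PAUSE_PENDING;
}
```

Note that when every indication is made with NDIS_RECEIVE_FLAGS_RESOURCES, as in this thread, the indication completes synchronously, so the filter's own outstanding count is already back to zero when the receive handler returns. In that situation a counter like this would not, by itself, reveal a packet that is still held above the filter.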

Also, as a reminder: in our FilterReceive, we are making a copy of every buffer within the NBL, and calling NdisFIndicateReceiveNetBufferLists with NDIS_DEFAULT_PORT_NUMBER and NDIS_RECEIVE_FLAGS_RESOURCES.
Then, right after the indication, we just free our buffer and return the original NBL with NdisFReturnNetBufferLists. We do this only if the filter state is “running”; in every other case, we just call NdisFIndicateReceiveNetBufferLists with the original parameters.
Is that the right way of creating new indications?
Because, acting this way, if the “main” NBL has 3 buffers, then we will be doing 3 different indications… could there be any problem with this method?

One interesting thing we have seen is that, when Dropbox is active, very frequently the NBLs in the Receive path contain more than just 1 or 2 buffers; this leads us to think that our behavior could be wrong.

Is there any situation in which the filter behavior should be different?
Is there any other way of identifying a pending NBL?

Any idea will be very welcome…thanks guys!

Arsen · May 22, 2023, 5:14pm

Hello. If you solved this problem, please tell me how. I have the same problem, also only in the receive path. Thanks.