WFP Callout Driver Causing Out-Of-Memory on Domain Controller Due to High Non-Paged Pool Usage

We have a WFP (Windows Filtering Platform) callout driver installed on a Domain Controller (DC). Over the course of a week, the DC enters an Out-Of-Memory (OOM) state. After analyzing the kernel memory dump and logs, we made the following observations:

  1. High SMB Traffic:
  • The DC server has an SMB share published, which is generating significant SMB traffic.
  • The SMB drivers SRV2.SYS, SRVNET.SYS, and MRXSMB.SYS are allocating large amounts of Non-Paged Pool memory; the largest single consumer is the LShs tag from srv2.sys (approximately 244 MB).
  2. TCP Connections and WFP Streams:
  • Heavy Non-Paged Pool memory allocations are also observed for:
    • TCP connections.
    • WFP stream data.
    • ALE (Application Layer Enforcement) endpoints.
  3. WFP Callout Driver Involvement:
  • The issue only occurs when our WFP driver's callouts are registered.
  • Our driver pends SMB Tree Connect requests for user-mode processing and later calls FwpsStreamContinue0 to resume the connection (a simplified sketch of this pend/resume path is shown below).

Are there any known issues with WFP callouts and SMB traffic that could lead to excessive Non-Paged Pool memory usage?
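
Conceptually, the pend/resume path looks like the sketch below. It is heavily simplified, and only the WFP structures and FwpsStreamContinue0 are the documented API; the single global bookkeeping, the function names, and the worker plumbing are illustrative. The classifyFn defers the stream segment, and a worker resumes the flow once the user-mode service has returned its verdict.

```c
#include <ntddk.h>
#pragma warning(push)
#pragma warning(disable:4201)   // nameless struct/union in the WFP headers
#include <fwpsk.h>              // build with NDIS61 (or later) defined
#pragma warning(pop)

// Run-time callout ID returned by FwpsCalloutRegister* (registration omitted).
static UINT32 gCalloutId;

// A real driver keeps one record per pended flow; a single global is used
// here only to keep the sketch short.
static UINT64 gPendedFlowId;
static UINT16 gPendedLayerId;

// Stream-layer classifyFn: defer the stream so the Tree Connect can be
// inspected in user mode.
VOID NTAPI StreamClassifyFn(
    _In_ const FWPS_INCOMING_VALUES0* inFixedValues,
    _In_ const FWPS_INCOMING_METADATA_VALUES0* inMetaValues,
    _Inout_opt_ VOID* layerData,
    _In_opt_ const VOID* classifyContext,
    _In_ const FWPS_FILTER1* filter,
    _In_ UINT64 flowContext,
    _Inout_ FWPS_CLASSIFY_OUT0* classifyOut)
{
    FWPS_STREAM_CALLOUT_IO_PACKET0* ioPacket =
        (FWPS_STREAM_CALLOUT_IO_PACKET0*)layerData;

    UNREFERENCED_PARAMETER(classifyContext);
    UNREFERENCED_PARAMETER(filter);
    UNREFERENCED_PARAMETER(flowContext);

    classifyOut->actionType = FWP_ACTION_PERMIT;

    if (ioPacket == NULL ||
        !FWPS_IS_METADATA_FIELD_PRESENT(inMetaValues,
                                        FWPS_METADATA_FIELD_FLOW_HANDLE))
    {
        return;
    }

    // Remember which flow/layer was deferred so the worker can resume it.
    gPendedFlowId  = inMetaValues->flowHandle;
    gPendedLayerId = inFixedValues->layerId;

    // Pause the stream: no further data is indicated on this flow until
    // FwpsStreamContinue0 is called for it.
    ioPacket->streamAction       = FWPS_STREAM_ACTION_DEFER;
    ioPacket->countBytesEnforced = 0;
}

// Invoked later (for example from a work item) after the user-mode service
// has returned its verdict for the pended Tree Connect.
VOID ResumeDeferredFlow(VOID)
{
    NTSTATUS status = FwpsStreamContinue0(gPendedFlowId,
                                          gCalloutId,
                                          gPendedLayerId,
                                          0);   // streamFlags
    if (!NT_SUCCESS(status)) {
        // The flow may already have been torn down; nothing more to do here.
    }
}
```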

1: kd> !poolused -t 5 2
..
 Sorting by NonPaged Pool Consumed

                   NonPaged                    Paged
 Tag      Allocs         Used      Allocs         Used

 LShs      58835    244748128           0            0   SMB2 lease hash table, Binary: srv2.sys  ⇒ 244 MB
 EtwB        212    145678592          12       331776   Etw Buffer, Binary: nt!etw
 TTcb      55481     67480256           0            0   TCP Connections, Binary: tcpip.sys
 StCx      51292     59088384           0            0   WFP stream internal callout context, Binary: netio.sys
 AleE      58268     44765632           0            0   ALE endpoint context, Binary: tcpip.sys
 ViIn          1      8388608           0            0   Verifier Internal Tag for pool tracking, Binary: nt!Vf
 ConT        339      4845568           0            0   UNKNOWN pooltag 'ConT', please update pooltag.txt
 File      13398      5834208           0            0   File objects

 TOTAL   1342370    857100016      353620    153417152

There is a nearly 1:1 relationship between the number of open TCP connections (TTcb: 55,481 allocations), their corresponding ALE endpoints (AleE: 58,268), and the per-flow WFP stream callout contexts (StCx: 51,292).
Are you sure SMB clients are closing their connections?
Do these numbers ever decrease?
Can you create a minimal example that reproduces the issue?

Are you associating any context with your ALE "flows" that you're not cleaning up?
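
If you are, the usual lifecycle is roughly the sketch below (the context type, pool tag, and helper names are illustrative). Anything attached with FwpsFlowAssociateContext0 has to be freed in the callout's flowDeleteFn, or removed earlier with FwpsFlowRemoveContext0; otherwise it sits in non-paged pool for every flow.

```c
#include <ntddk.h>
#pragma warning(push)
#pragma warning(disable:4201)
#include <fwpsk.h>
#pragma warning(pop)

#define MY_FLOW_CTX_TAG 'xCwF'        // hypothetical pool tag

typedef struct _MY_FLOW_CONTEXT {
    UINT64 flowId;
    // ... per-flow state handed around the driver ...
} MY_FLOW_CONTEXT;

static UINT32 gCalloutId;             // from FwpsCalloutRegister*

// Attach a per-flow context (typically from a flow-established classify).
NTSTATUS AttachFlowContext(_In_ UINT64 flowId, _In_ UINT16 layerId)
{
    MY_FLOW_CONTEXT* ctx;
    NTSTATUS status;

    ctx = (MY_FLOW_CONTEXT*)ExAllocatePoolWithTag(NonPagedPoolNx,
                                                  sizeof(*ctx),
                                                  MY_FLOW_CTX_TAG);
    if (ctx == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    RtlZeroMemory(ctx, sizeof(*ctx));
    ctx->flowId = flowId;

    status = FwpsFlowAssociateContext0(flowId,
                                       layerId,
                                       gCalloutId,
                                       (UINT64)(ULONG_PTR)ctx);
    if (!NT_SUCCESS(status)) {
        ExFreePoolWithTag(ctx, MY_FLOW_CTX_TAG);
    }
    return status;
}

// flowDeleteFn registered with the callout: the one place the engine hands
// the context back. Skipping the free here leaks one allocation per flow.
VOID NTAPI FlowDeleteFn(_In_ UINT16 layerId,
                        _In_ UINT32 calloutId,
                        _In_ UINT64 flowContext)
{
    UNREFERENCED_PARAMETER(layerId);
    UNREFERENCED_PARAMETER(calloutId);

    if (flowContext != 0) {
        ExFreePoolWithTag((VOID*)(ULONG_PTR)flowContext, MY_FLOW_CTX_TAG);
    }
}
```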

@Jason_Stephenson We are using a lookaside list to maintain the context data, but our tag does not show up in the "!poolused" output, and the lookaside list structure contains only a few entries in MEMORY.DMP, so it looks like we are cleaning up our context entries correctly.
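
For context, it is the standard non-paged lookaside pattern, roughly as sketched below (the context layout and the pool tag are illustrative, not the real ones); entries go back to the list as soon as the user-mode verdict comes back.

```c
#include <ntddk.h>

#define PENDED_CTX_TAG 'tCdP'           // hypothetical pool tag

typedef struct _PENDED_REQUEST_CTX {
    LIST_ENTRY link;
    UINT64     flowId;
    // ... data handed to the user-mode service ...
} PENDED_REQUEST_CTX;

static NPAGED_LOOKASIDE_LIST gCtxLookaside;

VOID InitContextLookaside(VOID)
{
    ExInitializeNPagedLookasideList(&gCtxLookaside,
                                    NULL,               // default allocate
                                    NULL,               // default free
                                    0,                  // flags
                                    sizeof(PENDED_REQUEST_CTX),
                                    PENDED_CTX_TAG,
                                    0);                 // depth (reserved)
}

PENDED_REQUEST_CTX* AllocContext(VOID)
{
    PENDED_REQUEST_CTX* ctx =
        (PENDED_REQUEST_CTX*)ExAllocateFromNPagedLookasideList(&gCtxLookaside);
    if (ctx != NULL) {
        RtlZeroMemory(ctx, sizeof(*ctx));
    }
    return ctx;
}

// Called once user mode has answered and the flow has been resumed.
VOID FreeContext(_In_ PENDED_REQUEST_CTX* ctx)
{
    ExFreeToNPagedLookasideList(&gCtxLookaside, ctx);
}

VOID DeleteContextLookaside(VOID)
{
    ExDeleteNPagedLookasideList(&gCtxLookaside);
}
```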

Does anybody have any insight into the "SMB2 lease hash table" allocations?

Any use of FwpsFlowAssociateContext0?

No, we are not attaching any context and are not calling that API.

@DymOK93 We are not sure whether the SMB connections are still open or have been closed, but it is not clear why the "SMB2 lease hash table" has allocated so much from Non-Paged Pool: almost 250 MB (58,835 allocations of roughly 4 KB each) out of 4 GB of total physical memory.

Is there any way to disable the SMB lease feature, or to control/throttle it?

We are trying to repro the issue, but it does not hit memory pressure and things return to normal over time.
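
A single-client loop like the sketch below (the UNC path and iteration count are placeholders) generates plenty of opens against the share, but the redirector reuses one SMB session/TCP connection, which may be part of why the lab machine never reaches the same memory pressure; reproducing the per-connection growth probably needs many distinct clients.

```c
// User-mode load sketch: repeatedly open/close a file over the share to push
// SMB traffic through the callout. The UNC path and loop count are placeholders.
#include <windows.h>
#include <stdio.h>

#define SHARE_FILE L"\\\\dc01\\testshare\\probe.txt"   // placeholder UNC path

int wmain(void)
{
    for (int i = 0; i < 100000; ++i) {
        HANDLE h = CreateFileW(SHARE_FILE, GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
                               NULL);
        if (h != INVALID_HANDLE_VALUE) {
            CloseHandle(h);
        }
        if ((i % 1000) == 0) {
            wprintf(L"iteration %d\n", i);
        }
    }
    return 0;
}
```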