Sleep/hibernate kills network connection with NDIS filter loaded

For a KMDF-based NDIS LWF, is there anything special that needs to happen during the sleep/hibernate cycle? I notice that when systems (both Win7 and Win10) with my filter (modifying, optional) loaded resume from a suspended state, the network connection is dead. Only after I unload the driver does it come back to life. My current workaround is to unload the driver when the service is notified that the system is suspending and reload it on resume, but that seems clunky. I’ve been trying to reproduce this in a VM, but everything works fine there, so it’s difficult to figure out what the holdup is. Is there something I need to handle specifically for suspend/resume, or is it more likely that suspend/resume is just exposing a bug in my code?

This should work seamlessly. And breaking this is not a common “gotcha” of LWFs, so I don’t have a psychic debugging answer for you. Instead, some generic avenues for investigation:

  • Make sure your filter isn’t interfering with any OIDs (especially OID_PNP_SET_POWER, I suppose).
  • If the miniport driver is a little older, it might not implement the NDIS_MINIPORT_ATTRIBUTES_NO_PAUSE_ON_SUSPEND flag, in which case NDIS will pause the datapath when going to low power, and restart the datapath when coming out of low power. Make sure your filter driver doesn’t lose NBLs or otherwise cause a hang during pause. This is easy to see from !ndiskd.miniport, since you’ll see evidence that the miniport is in the middle of a pause transition. Within a few hundred milliseconds of resuming from low power, the miniport should have come out of pause already.
  • In general, filter drivers are unaware of system power states, and don’t really need any special code to deal with power transitions. Check if your filter driver has any such code (aside, I suppose, from the workaround you mentioned), and code review it again. If the filter is doing something strange in response to a power IRP or OID_PNP_SET_POWER / OID_PM_PARAMETERS, that could be the source of the problem.
  • “the network connection is dead” is generic. If you can narrow that down to a specific problem, that might give more hints on the symptoms. Is the miniport in a bad state? Is there an NDIS thread stuck somewhere? Are OIDs getting failed? Are NBLs getting dropped? Does this repro with different miniport drivers from different IHVs, or just a particular one?

Thanks Jeff,

I’m not doing anything crazy with OIDs other than PROMISCUOUS, and since your answer to my last question that all works fine. That’s what’s so strange about this, because I didn’t think I needed to handle any power issues. When I say “the connection is dead” I mean that ping.exe returns “general failure,” the adapter IP is 169.xxx, and Wireshark shows a lot of attempts looking for the gateway with no response. These are test boxes that are not configured for kernel debugging, so I don’t have an answer about the miniport or NBLs. Like I said, it doesn’t reproduce in the VM, so it’s a bit harder to debug. I may “NotMyFault” the box when it’s in the bad state so I at least have a dump to investigate.

I believe I have identified the problem but I’m not sure how it got this way. When running !ndiskd.filter on MyLwf1 I get the following output.

kd> !ndiskd.filter fffff6026ed3cc60
State Running
Datapath Receive only
References 1
Flags RUNNING

The datapath value is set to ‘Receive only.’ What does that refer to and how does it get into that state? My other loaded LWF (MyLwf2) has the datapath set to ‘Normal.’ If that’s not the problem, there is nothing else I’m seeing in the !ndiskd output that looks unusual.

6: kd> !ndiskd.netadapter ffffa38fe192c1a0

MINIPORT

Realtek PCIe GBE Family Controller

Ndis handle        ffffa38fe192c1a0
Ndis API version   v6.40
Adapter context    ffffa38fe1b3a000
Driver             ffffa38fe7e65ae0 - rt640x64  v9.1
Network interface  fffff6025f54ea20

Media type         802.3

STATE

Miniport           Running
Device PnP         Started             Show state history
Datapath           Normal
Interface          Up
Media              Connected
Power              D0
References         0n15                Show detail
Total resets       0
Pending OID        None
Flags              BUS_MASTER, DEFAULT_PORT_ACTIVATED,
                   SUPPORTS_MEDIA_SENSE, DOES_NOT_DO_LOOPBACK,
                   MEDIA_CONNECTED
PnP flags          RECEIVED_START, HARDWARE_DEVICE

6: kd> !ndiskd.pendingnbls ffffa38fe192c1a0

PHASE 1/3: Found 51 NBL pool(s).
PHASE 2/3: Found 0 freed NBL(s).

Pending Nbl        Currently held by                                        
No pending NBLs were found.                                              

PHASE 3/3: Found 0 pending NBL(s) of 1029 total NBL(s).
Search complete.
6: kd> !ndiskd.oid -miniport ffffa38fe192c1a0

ALL PENDING OIDs

[Showing all OIDs on the stack for miniport ffffa38fe192c1a0]

No pending or queued OIDs were found.

6: kd> !ndiskd.filter fffff6026e78ac60

FILTER

Realtek PCIe GBE Family Controller-MyLwf2-0000

Ndis handle        fffff6026e78ac60
Filter driver      fffff6026162ed60 - MyLwf2
Module context     fffff6027111cdb0
Miniport           ffffa38fe192c1a0 - Realtek PCIe GBE Family Controller
Network interface  fffff60271d66a20

State              Running
Datapath           Normal
References         1
Flags              RUNNING

Higher filter      fffff60272938c60 - Realtek PCIe GBE Family Controller-VirtualBox NDIS Light-Weight Filter-0000
Lower filter       fffff6026ed3cc60 - Realtek PCIe GBE Family Controller-MyLwf1-0000

Driver handlers

6: kd> !ndiskd.filter fffff6026ed3cc60

FILTER

Realtek PCIe GBE Family Controller-MyLwf1-0000

Ndis handle        fffff6026ed3cc60
Filter driver      fffff6025fe5ad60 - MyLwf1
Module context     fffff6026e0ccdd0
Miniport           ffffa38fe192c1a0 - Realtek PCIe GBE Family Controller
Network interface  fffff6026f54ca20

State              Running
Datapath           Receive only
References         1
Flags              RUNNING

Higher filter      fffff6026e78ac60 - Realtek PCIe GBE Family Controller-MyLwf2-0000
Lower filter       fffff6026f04ac60 - Realtek PCIe GBE Family Controller-WFP Native MAC Layer LightWeight Filter-0000

“Receive only” happens when your filter only registers FilterReceiveNetBufferLists and FilterReturnNetBufferLists, and it does not register FilterSendNetBufferLists or FilterSendNetBufferListsComplete.

It’s perfectly fine for a filter to do this – it just means that the Tx path will skip over the LWF, and the LWF only participates in the Rx path. But if you didn’t do it intentionally, obviously that’s something to look into.
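The rule described above can be written down as a toy model in C. This is not NDIS code: the struct, enum, and function names are hypothetical, and only the "Normal" and "Receive only" states correspond to labels seen in the !ndiskd output in this thread. The model also assumes a path counts as registered only when both handlers of its pair are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the four datapath entry points a filter may
   register. This is NOT NDIS_FILTER_DRIVER_CHARACTERISTICS, only a model. */
typedef struct {
    bool send;           /* FilterSendNetBufferLists registered?         */
    bool send_complete;  /* FilterSendNetBufferListsComplete registered? */
    bool receive;        /* FilterReceiveNetBufferLists registered?      */
    bool return_nbls;    /* FilterReturnNetBufferLists registered?       */
} lwf_handlers;

/* Hypothetical names; the last two have no counterpart in this thread. */
typedef enum {
    DP_NORMAL,        /* shown by !ndiskd as "Normal"       */
    DP_RECEIVE_ONLY,  /* shown by !ndiskd as "Receive only" */
    DP_SEND_ONLY,     /* assumed label */
    DP_NEITHER        /* assumed label */
} datapath_mode;

/* Classify a filter by which handler pairs it registered (assumption:
   a path participates only if both handlers of the pair are set). */
datapath_mode classify_datapath(const lwf_handlers *h)
{
    assert(h != NULL);
    bool tx = h->send && h->send_complete;
    bool rx = h->receive && h->return_nbls;

    if (tx && rx) return DP_NORMAL;
    if (rx)       return DP_RECEIVE_ONLY;
    if (tx)       return DP_SEND_ONLY;
    return DP_NEITHER;
}
```

In this model a filter that leaves both send handlers unset is classified as DP_RECEIVE_ONLY, which corresponds to the state reported for MyLwf1, so the first thing to check is what the driver actually puts in its send handler fields at registration time.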

It looks like there’s another 3rd party LWF in there: the VirtualBox one. Try removing it if you can, and see if that improves things. There might be an interop issue.

Everything else looks pretty good. It seems like everything is in a good state, but for some reason, someone’s not delivering packets. You can try capturing traffic off the box to see if packets are getting out. (Running Wireshark locally doesn’t tell you much about any bugs in the host driver stack, because Wireshark interfaces with the driver stack in a clumsy and weird way.) The built-in netmon driver can capture between each Modifying LWF, so you can see whether packets are getting through any particular layer. (Use netsh trace start capture=yes CaptureMultiLayer=yes.)

You can also try setting a breakpoint on NdisMIndicateReceiveNetBufferLists, and just following the packets up the stack; or NdisSendNetBufferLists down the stack.

I tried the netsh capture, but on resume I got a BSOD. The current process context is my service, but I don’t see any sign of my drivers. Based on the bugcheck code, I thought it was a lock issue, but there is only one lock currently held and one waiter, and both are related to srvnet.sys. There are no pending NBLs, but there are some pending OIDs. Running !verifier on the various memory addresses returns nothing, and !dpcs doesn’t show anything of interest.

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
Arg1: ffffa505b0dcf194
Arg2: 0000000000000002
Arg3: 0000000000000000
Arg4: fffff80f60ebcb90
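
For reference, the four parameters above can be decoded mechanically. The sketch below follows the parameter descriptions that !analyze -v prints for bug check 0xD1 (Arg1 memory referenced, Arg2 IRQL, Arg3 access-type bitfield, Arg4 address that referenced the memory). The struct and function names are hypothetical helpers, not part of any debugger API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded parameters of bug check 0xD1. */
typedef struct {
    uint64_t referenced_address; /* Arg1: memory that was referenced     */
    uint8_t  irql;               /* Arg2: IRQL at the time of the access */
    bool     is_write;           /* Arg3 bit 0: 0 = read, 1 = write      */
    bool     is_execute;         /* Arg3 bit 3: 1 = execute operation    */
    uint64_t faulting_ip;        /* Arg4: address that made the access   */
} d1_info;

/* Split the raw bug check arguments into named fields. */
d1_info decode_d1(const uint64_t args[4])
{
    assert(args != NULL);
    d1_info info;
    info.referenced_address = args[0];
    info.irql               = (uint8_t)args[1];
    info.is_write           = (args[2] & 0x1u) != 0;
    info.is_execute         = (args[2] & 0x8u) != 0;
    info.faulting_ip        = args[3];
    return info;
}

/* x64 names for the three lowest IRQLs; higher levels are not named here. */
const char *irql_name(uint8_t irql)
{
    switch (irql) {
    case 0:  return "PASSIVE_LEVEL";
    case 1:  return "APC_LEVEL";
    case 2:  return "DISPATCH_LEVEL";
    default: return "above DISPATCH_LEVEL";
    }
}
```

For this dump the fields come out as a read of ffffa505b0dcf194 at DISPATCH_LEVEL, and Arg4 (fffff80f60ebcb90) is the same address the stack trace resolves to ndis!ndisOidPostPacketFilter+0xd0.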

 # Child-SP          RetAddr           Call Site
ffffa505`af237648 fffff801`0725a669 nt!KeBugCheckEx
ffffa505`af237650 fffff801`072572eb nt!KiBugCheckDispatch+0x69
ffffa505`af237790 fffff80f`60ebcb90 nt!KiPageFault+0x42b
ffffa505`af237920 fffff80f`60ead38c ndis!ndisOidPostPacketFilter+0xd0
ffffa505`af237a00 fffff80f`60ee5ca2 ndis!ndisOidRequestComplete+0xfc
ffffa505`af237aa0 fffff80f`60ee3375 ndis!ndisMOidRequestCompleteInternal+0xd2
ffffa505`af237b20 fffff80f`60f1d1c4 ndis!ndisMRawOidRequestComplete+0xf5
ffffa505`af237b70 fffff80f`60ee388c ndis!ndisMpHookDefaultOidRequestComplete+0x14
ffffa505`af237ba0 fffff80f`637867c4 ndis!NdisMOidRequestComplete+0xac
ffffa505`af237be0 fffff80f`60eb14dd rt640x64!MpPollingDpc+0x18c8
ffffa505`af237c60 fffff801`07133669 ndis!ndisMTimerObjectDpc+0xcd
ffffa505`af237cb0 fffff801`071326d7 nt!KiProcessExpiredTimerList+0x159
ffffa505`af237da0 fffff801`07250a45 nt!KiRetireDpcList+0x4c7
ffffa505`af237fb0 fffff801`07250840 nt!KxRetireDpcList+0x5
ffffa505`b0066660 fffff801`0724ffc9 nt!KiDispatchInterruptContinue
ffffa505`b0066690 fffff801`071762ca nt!KiDpcInterrupt+0x2a9
ffffa505`b0066820 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x10aa
ffffa505`b0066900 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x6a2
ffffa505`b00669e0 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x6a2
ffffa505`b0066ac0 fffff801`071735c7 nt!MiWalkPageTablesRecursively+0x6a2
ffffa505`b0066ba0 fffff801`070ae953 nt!MiWalkPageTables+0x1e7
ffffa505`b0066c90 fffff801`072f82e4 nt!MiEmptyWorkingSetInitiate+0x103
ffffa505`b0066e60 fffff801`072f8e8a nt!MiEmptyTargetedWorkingSet+0x78
ffffa505`b0066eb0 fffff801`078b3a4e nt!MiTrimAllSystemPagableMemory+0xde
ffffa505`b0066f00 fffff801`078c88f7 nt!MmVerifierTrimMemory+0x6a
ffffa505`b0066f30 fffff801`078c6d3e nt!ViKeRaiseIrqlSanityChecks+0xcb
ffffa505`b0066f70 fffff801`078c6b86 nt!VerifierKeAcquireInStackQueuedSpinLockCommon+0x62
ffffa505`b0066fa0 fffff80f`5f4d5b4a nt!VerifierKeAcquireInStackQueuedSpinLock+0x16
ffffa505`b0066fe0 fffff80f`5f4d5683 FLTMGR!FltpPerformPostCallbacks+0x16a
ffffa505`b00670b0 fffff80f`5f4d4f3c FLTMGR!FltpPassThroughCompletionWorker+0x73
ffffa505`b0067120 fffff801`078b3634 FLTMGR!FltpPassThroughCompletion+0xc
ffffa505`b0067150 fffff801`0712501f nt!IovpLocalCompletionRoutine+0x174
ffffa505`b00671b0 fffff801`078b2f71 nt!IopfCompleteRequest+0x11f
ffffa505`b00672d0 fffff801`072952ed nt!IovCompleteRequest+0x1bd
ffffa505`b00673c0 fffff80f`625bcaff nt!IofCompleteRequest+0x17041d
ffffa505`b00673f0 fffff801`07206b9a Npfs!NpFsdFileSystemControl+0x3f
ffffa505`b0067420 fffff801`078b2d29 nt!IopfCallDriver+0x56
ffffa505`b0067460 fffff801`072884ef nt!IovCallDriver+0x275
ffffa505`b00674a0 fffff80f`5f7f53b3 nt!IofCallDriverSpecifyReturn+0x819cf
ffffa505`b00674d0 fffff801`078c0631 VerifierExt!IofCallDriver_internal_wrapper+0x13
ffffa505`b0067500 fffff80f`5f4d7207 nt!VerifierIofCallDriver+0x21
ffffa505`b0067540 fffff80f`5f50aed0 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x157
ffffa505`b00675b0 fffff801`078d0ab8 FLTMGR!FltpFsControl+0x110
ffffa505`b0067610 fffff801`078d0c16 nt!ViGenericDispatchHandler+0x54
ffffa505`b0067650 fffff801`07206b9a nt!ViGenericFileSystemControl+0x16
ffffa505`b0067680 fffff801`078b2d29 nt!IopfCallDriver+0x56
ffffa505`b00676c0 fffff801`072961ad nt!IovCallDriver+0x275
ffffa505`b0067700 fffff801`075a9f7b nt!IofCallDriver+0x16d9cd
ffffa505`b0067740 fffff801`075ae4ea nt!IopSynchronousServiceTail+0x1ab
ffffa505`b00677f0 fffff801`0756b2a6 nt!IopXxxControlFile+0x68a
ffffa505`b0067920 fffff801`0725a143 nt!NtFsControlFile+0x56
ffffa505`b0067990 00007ff8`ea1db0c4 nt!KiSystemServiceCopyEnd+0x13
0000006f`02ffe558 00007ff8`e67db1e5 ntdll!NtFsControlFile+0x14
0000006f`02ffe560 00007ff8`cd69c485 KERNELBASE!WaitNamedPipeW+0x1e5
0000006f`02ffe670 000001c3`72ceeec0 System_Core_ni!DomainBoundILStubClass.IL_STUB_PInvoke(System.String, Int32)

1: kd> !locks
KD: Scanning for held 
Resource @ 0xffffbe0c75df5b50    Exclusively owned
    Contention Count = 4
    NumberOfExclusiveWaiters = 1
     Threads: ffffbe0c75fe8040-01<*> 
     Threads Waiting On Exclusive Access:
              ffffbe0c73874040       
27030 total locks, 1 locks currently held

1: kd> .thread ffffbe0c`73874040
nt!KiCommitThreadWait+0x13b
nt!KeWaitForSingleObject+0x1ff
nt!ExpWaitForResource+0x6d
nt!ExAcquireResourceExclusiveLite+0x1c9
srvnet!SrvNetUpdateNetNameWorkerRoutine+0x95
nt!IopProcessWorkItem+0x8b
nt!ExpWorkerThread+0xf5

1: kd> .thread ffffbe0c75fe8040
nt!KiCommitThreadWait+0x13b
nt!KeWaitForSingleObject+0x1ff
nt!IopParseDevice+0x1609
nt!ObpLookupObjectName+0x73b
nt!ObOpenObjectByNameEx+0x1df
nt!IopCreateFile+0x3f5
nt!NtCreateFile+0x79
TDI!TdiOpenNetbiosAddress+0x140
srvnet!SrvNetOpenEndpointHandle+0x49
srvnet!SrvNetTdiAllocateEndpoint+0x27c
srvnet!SrvNetAllocateEndpoint+0xb8
srvnet!SrvNetAddServedName+0x16e
srvnet!SvcXportAdd+0x12d
srvnet!SrvAdminProcessFsctlFsp+0xaa
nt!IopProcessWorkItem+0x8b
nt!ExpWorkerThread+0xf5

Only the first OID_GEN_CURRENT_PACKET_FILTER is marked as complete and all others are not completed. The only OID fiddling my filters do is add PROMISCUOUS to PACKET_FILTER. All others are just passed through.

ALL PENDING OIDs
    NetAdapter         ffffbe0c736db1a0 - Realtek PCIe GBE Family Controller
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f4d4c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f554c60 - Realtek PCIe GBE Family Controller-WFP Native MAC Layer LightWeight Filter-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f594c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0001
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4fa8ac60 - Realtek PCIe GBE Family Controller-MyLwf1-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f558c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0003
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f5c2c60 - Realtek PCIe GBE Family Controller-MyLwf2-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f586c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0004
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f84ec60 - Realtek PCIe GBE Family Controller-VirtualBox NDIS Light-Weight Filter-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f806c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0005
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f9fcc60 - Realtek PCIe GBE Family Controller-QoS Packet Scheduler-0000
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f964c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0006
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
    Filter             ffffdf8e4f5f2c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0007
        Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Queued OIDs        OID_GEN_STATISTICS
                           OID_GEN_STATISTICS
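
Since the crash is on the completion path of OID_GEN_CURRENT_PACKET_FILTER, it may help to spell out the bit manipulation that "adding PROMISCUOUS" implies. The following is a minimal, self-contained sketch of the arithmetic only, not the poster's driver code. The PKT_* constants mirror the NDIS_PACKET_TYPE_* values in ntddndis.h; the state struct and function names are hypothetical, and hiding the added bit on queries is an assumed design choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the NDIS_PACKET_TYPE_* definitions in ntddndis.h. */
#define PKT_DIRECTED     0x00000001u
#define PKT_MULTICAST    0x00000002u
#define PKT_BROADCAST    0x00000008u
#define PKT_PROMISCUOUS  0x00000020u

/* Hypothetical per-module state: what the layers above last asked for. */
typedef struct {
    uint32_t upper_filter;
} lwf_filter_state;

/* Set path: remember the caller's value and compute the value to place
   in the copy of the request that is passed down to the miniport. */
uint32_t filter_to_send_down(lwf_filter_state *s, uint32_t requested)
{
    assert(s != NULL);
    s->upper_filter = requested;
    return requested | PKT_PROMISCUOUS;
}

/* Query path (assumed behavior): strip the bit this filter added, unless
   the upper layers had asked for promiscuous mode themselves. */
uint32_t filter_to_report_up(const lwf_filter_state *s, uint32_t from_miniport)
{
    assert(s != NULL);
    uint32_t added = PKT_PROMISCUOUS & ~s->upper_filter;
    return from_miniport & ~added;
}
```

In a real LWF this change would normally be made on a cloned request (for example one obtained from NdisAllocateCloneOidRequest) so that the buffer owned by the original requester is left untouched, and the original request is completed only after the clone completes.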

1: kd> !dpcs
CPU Type      KDPC       Function
 1: Normal  : 0xfffff80107447780 0xfffff8010719bbe0 nt!PpmCheckPeriodicStart

1: kd> dt 0xfffff80107447780 _KDPC
ntdll!_KDPC
   +0x000 TargetInfoAsUlong : 0x10313
   +0x000 Type             : 0x13 ''
   +0x001 Importance       : 0x3 ''
   +0x002 Number           : 1
   +0x008 DpcListEntry     : _SINGLE_LIST_ENTRY
   +0x010 ProcessorHistory : 0xff
   +0x018 DeferredRoutine  : 0xfffff801`0719bbe0     void  nt!PpmCheckPeriodicStart+0
   +0x020 DeferredContext  : (null) 
   +0x028 SystemArgument1  : (null) 
   +0x030 SystemArgument2  : (null) 
   +0x038 DpcData          : 0xffffbe0c`702cc3b0 Void

I’ll disable VirtualBox next to see if that makes any difference.