Sleep/hibernate kills network connection with NDIS filter loaded

MDH Member Posts: 16

For a KMDF-based NDIS LWF, is there anything special that needs to happen during the sleep/hibernate cycle? I notice that when systems (both Win7 and Win10) with my filter (modifying, optional) loaded resume from a suspended state, the network connection is dead. Only when I unload the driver does it pop back to life. My current workaround is to unload the driver when the service is notified that the system is being suspended and then reload it when it resumes, but that seems clunky. I've been trying to reproduce this in a VM, but everything works fine there, so it's difficult to figure out what the holdup is. Is there something I need to handle specifically for suspend/resume, or is it more likely that the power transition is just causing a bug in my code to manifest?

Comments

  • Jeffrey_Tippet_[MSFT] Member - All Emails Posts: 560

    This should work seamlessly. And breaking this is not a common "gotcha" of LWFs, so I don't have a psychic debugging answer for you. Instead, some generic avenues for investigation:

    • Make sure your filter isn't interfering with any OIDs (especially OID_PNP_SET_POWER, I suppose).
    • If the miniport driver is a little older, it might not implement the NDIS_MINIPORT_ATTRIBUTES_NO_PAUSE_ON_SUSPEND flag, in which case NDIS will pause the datapath when going to low power, and restart the datapath when coming out of low power. Make sure your filter driver doesn't lose NBLs or otherwise cause a hang during pause. This is easy to see from !ndiskd.miniport, since you'll see evidence that the miniport is in the middle of a pause transition. Within a few hundred milliseconds of resuming from low power, the miniport should have come out of pause already.
    • In general, filter drivers are unaware of system power states, and don't really need any special code to deal with power transitions. Check if your filter driver has any such code (aside, I suppose, from the workaround you mentioned), and code review it again. If the filter is doing something strange in response to a power IRP or OID_PNP_SET_POWER / OID_PM_PARAMETERS, that could be the source of the problem.
    • "the network connection is dead" is generic. If you can narrow that down to a specific problem, that might give more hints on the symptoms. Is the miniport in a bad state? Is there an NDIS thread stuck somewhere? Are OIDs getting failed? Are NBLs getting dropped? Does this repro with different miniport drivers from different IHVs, or just a particular one?
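The pause point above can be sketched as a small bookkeeping model: a filter that holds on to NBLs must not finish pausing until every held NBL has been released, and it must refuse to take new ones while pausing. This is a minimal, single-threaded sketch that does not call NDIS; every name in it (`lwf_datapath`, `lwf_hold_nbl`, and so on) is hypothetical, and a real driver would protect the counter with a spin lock or interlocked operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a filter's pause bookkeeping; no NDIS calls. */
typedef enum { LWF_RUNNING, LWF_PAUSING, LWF_PAUSED } lwf_state;

typedef struct {
    lwf_state state;
    size_t    outstanding_nbls; /* NBLs the filter holds and has not released */
} lwf_datapath;

void lwf_init(lwf_datapath *dp)
{
    dp->state = LWF_RUNNING;
    dp->outstanding_nbls = 0;
}

/* The filter wants to queue or originate an NBL.
 * Returns false when not running: the caller must not hold the NBL. */
bool lwf_hold_nbl(lwf_datapath *dp)
{
    if (dp->state != LWF_RUNNING)
        return false;
    dp->outstanding_nbls++;
    return true;
}

/* A held NBL has been released.
 * Returns true when this release finishes a pending pause; a real filter
 * would call NdisFPauseComplete at that point. */
bool lwf_release_nbl(lwf_datapath *dp)
{
    assert(dp->outstanding_nbls > 0);
    dp->outstanding_nbls--;
    if (dp->state == LWF_PAUSING && dp->outstanding_nbls == 0) {
        dp->state = LWF_PAUSED;
        return true;
    }
    return false;
}

/* Models FilterPause. Returns true when the pause completes immediately,
 * false when it stays pending until the last held NBL is released. */
bool lwf_pause(lwf_datapath *dp)
{
    assert(dp->state == LWF_RUNNING);
    if (dp->outstanding_nbls == 0) {
        dp->state = LWF_PAUSED;
        return true;
    }
    dp->state = LWF_PAUSING;
    return false;
}

/* Models FilterRestart. */
void lwf_restart(lwf_datapath *dp)
{
    assert(dp->state == LWF_PAUSED);
    dp->state = LWF_RUNNING;
}
```

If the counter never drops to zero (a lost NBL), the model stays in `LWF_PAUSING` forever, which corresponds to the stuck pause transition that !ndiskd.miniport would show.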
  • MDH Member Posts: 16

    Thanks Jeff,

    Not doing anything crazy with OIDs other than PROMISCUOUS, but after your last answer to my question that all works fine. That's what's so strange about this, because I didn't think I needed to handle any power issues. When I say "the connection is dead" I mean that ping.exe returns "general failure," the adapter IP is 169.xxx, and Wireshark shows a lot of attempts to find the gateway with no response. These are test boxes that are not configured for kernel debugging, so I don't have an answer about the miniport or NBLs. Like I said, it doesn't reproduce in the VM, so it's a bit harder to debug. I may "NotMyFault" the box while it's in the bad state so I at least have a dump to investigate.

  • MDH Member Posts: 16

    I believe I have identified the problem but I'm not sure how it got this way. When running !ndiskd.filter on MyLwf1 I get the following output.

    kd> !ndiskd.filter fffff6026ed3cc60
    State Running
    Datapath Receive only
    References 1
    Flags RUNNING

    The datapath value is set to 'Receive only.' What does that refer to and how does it get into that state? My other loaded LWF (MyLwf2) has the datapath set to 'Normal.' If that's not the problem, there is nothing else I'm seeing in the !ndiskd output that looks unusual.

    6: kd> !ndiskd.netadapter ffffa38fe192c1a0

    MINIPORT

    Realtek PCIe GBE Family Controller
    
    Ndis handle        ffffa38fe192c1a0
    Ndis API version   v6.40
    Adapter context    ffffa38fe1b3a000
    Driver             ffffa38fe7e65ae0 - rt640x64  v9.1
    Network interface  fffff6025f54ea20
    
    Media type         802.3
    

    STATE

    Miniport           Running
    Device PnP         Started             Show state history
    Datapath           Normal
    Interface          Up
    Media              Connected
    Power              D0
    References         0n15                Show detail
    Total resets       0
    Pending OID        None
    Flags              BUS_MASTER, DEFAULT_PORT_ACTIVATED,
                       SUPPORTS_MEDIA_SENSE, DOES_NOT_DO_LOOPBACK,
                       MEDIA_CONNECTED
    PnP flags          RECEIVED_START, HARDWARE_DEVICE
    

    6: kd> !ndiskd.pendingnbls ffffa38fe192c1a0

    PHASE 1/3: Found 51 NBL pool(s).
    PHASE 2/3: Found 0 freed NBL(s).

    Pending Nbl        Currently held by                                        
    No pending NBLs were found.                                              
    

    PHASE 3/3: Found 0 pending NBL(s) of 1029 total NBL(s).
    Search complete.
    6: kd> !ndiskd.oid -miniport ffffa38fe192c1a0

    ALL PENDING OIDs

    [Showing all OIDs on the stack for miniport ffffa38fe192c1a0]
    

    No pending or queued OIDs were found.

    6: kd> !ndiskd.filter fffff6026e78ac60

    FILTER

    Realtek PCIe GBE Family Controller-MyLwf2-0000
    
    Ndis handle        fffff6026e78ac60
    Filter driver      fffff6026162ed60 - MyLwf2
    Module context     fffff6027111cdb0
    Miniport           ffffa38fe192c1a0 - Realtek PCIe GBE Family Controller
    Network interface  fffff60271d66a20
    
    State              Running
    Datapath           Normal
    References         1
    Flags              RUNNING
    
    Higher filter      fffff60272938c60 - Realtek PCIe GBE Family Controller-VirtualBox NDIS Light-Weight Filter-0000
    Lower filter       fffff6026ed3cc60 - Realtek PCIe GBE Family Controller-MyLwf1-0000
    
    Driver handlers
    

    6: kd> !ndiskd.filter fffff6026ed3cc60

    FILTER

    Realtek PCIe GBE Family Controller-MyLwf1-0000
    
    Ndis handle        fffff6026ed3cc60
    Filter driver      fffff6025fe5ad60 - MyLwf1
    Module context     fffff6026e0ccdd0
    Miniport           ffffa38fe192c1a0 - Realtek PCIe GBE Family Controller
    Network interface  fffff6026f54ca20
    
    State              Running
    Datapath           Receive only
    References         1
    Flags              RUNNING
    
    Higher filter      fffff6026e78ac60 - Realtek PCIe GBE Family Controller-MyLwf2-0000
    Lower filter       fffff6026f04ac60 - Realtek PCIe GBE Family Controller-WFP Native MAC Layer LightWeight Filter-0000
    
  • Jeffrey_Tippet_[MSFT] Member - All Emails Posts: 560

    "Receive only" happens when your filter only registers FilterReceiveNetBufferLists and FilterReturnNetBufferLists, and it does not register FilterSendNetBufferLists or FilterSendNetBufferListsComplete.

    It's perfectly fine for a filter to do this -- it just means that the Tx path will skip over the LWF, and the LWF only participates in the Rx path. But if you didn't do it intentionally, obviously that's something to look into.
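To check a driver's registration against this rule, the mapping from registered handlers to datapath mode can be written down as a small self-contained model. This is only a sketch: the struct and enum names are hypothetical stand-ins for the four datapath entry points, only "Normal" and "Receive only" are labels taken from the !ndiskd output in this thread, and treating a half-registered pair as a mistake is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*lwf_handler)(void);

/* Hypothetical stand-in for the four datapath entry points a filter
 * may register. NULL means "not registered". */
typedef struct {
    lwf_handler send_nbls;          /* FilterSendNetBufferLists */
    lwf_handler send_nbls_complete; /* FilterSendNetBufferListsComplete */
    lwf_handler receive_nbls;       /* FilterReceiveNetBufferLists */
    lwf_handler return_nbls;        /* FilterReturnNetBufferLists */
} lwf_handlers;

typedef enum {
    LWF_DP_NORMAL,       /* both Tx and Rx pairs registered */
    LWF_DP_RECEIVE_ONLY, /* only the Rx pair registered */
    LWF_DP_SEND_ONLY,    /* only the Tx pair registered (label assumed) */
    LWF_DP_BYPASSED,     /* neither pair registered (label assumed) */
    LWF_DP_MISMATCHED    /* one half of a pair missing (sketch's own rule) */
} lwf_dp_mode;

lwf_dp_mode lwf_classify(const lwf_handlers *h)
{
    int tx_send = h->send_nbls != NULL;
    int tx_done = h->send_nbls_complete != NULL;
    int rx_recv = h->receive_nbls != NULL;
    int rx_ret  = h->return_nbls != NULL;

    if (tx_send != tx_done || rx_recv != rx_ret)
        return LWF_DP_MISMATCHED;
    if (tx_send && rx_recv)
        return LWF_DP_NORMAL;
    if (rx_recv)
        return LWF_DP_RECEIVE_ONLY;
    if (tx_send)
        return LWF_DP_SEND_ONLY;
    return LWF_DP_BYPASSED;
}

/* Placeholder handler so callers can mark an entry point as registered. */
void lwf_stub(void) {}
```

A filter that expected to see "Normal" but classifies as `LWF_DP_RECEIVE_ONLY` here has left its send handlers NULL somewhere, either at registration or in a later change to its optional handlers.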

    It looks like there's another 3rd party LWF in there: the VirtualBox one. Try removing it if you can, and see if that improves things. There might be an interop issue.

    Everything else looks pretty good. It seems like everything is in a good state, but for some reason, someone's not delivering packets. You can try capturing traffic off the box to see if packets are getting out. (Running Wireshark locally doesn't tell you much about any bugs in the host driver stack, because Wireshark interfaces with the driver stack in a clumsy and weird way.) The builtin netmon driver can capture between each Modifying LWF, so you can see whether packets are getting through any particular layer. (Use netsh trace start CaptureMultiLayer=yes).

    You can also try setting a breakpoint on NdisMIndicateReceiveNetBufferLists, and just following the packets up the stack; or NdisSendNetBufferLists down the stack.

  • MDH Member Posts: 16

    Tried to do the netsh capture, but on resume I got a BSOD. The current process context is my service, but I'm not seeing any signs of my drivers. Based on the bugcheck code, I thought it was a lock issue, but there is only one lock currently held and one thread waiting on it, and both are related to srvnet.sys. There are no pending NBLs, but there are some pending OIDs. !verifier on the various memory addresses doesn't return anything, and !dpcs doesn't show anything that seems to be of interest.

    DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
    Arg1: ffffa505b0dcf194
    Arg2: 0000000000000002
    Arg3: 0000000000000000
    Arg4: fffff80f60ebcb90
    
     # Child-SP          RetAddr           Call Site
    ffffa505`af237648 fffff801`0725a669 nt!KeBugCheckEx
    ffffa505`af237650 fffff801`072572eb nt!KiBugCheckDispatch+0x69
    ffffa505`af237790 fffff80f`60ebcb90 nt!KiPageFault+0x42b
    ffffa505`af237920 fffff80f`60ead38c ndis!ndisOidPostPacketFilter+0xd0
    ffffa505`af237a00 fffff80f`60ee5ca2 ndis!ndisOidRequestComplete+0xfc
    ffffa505`af237aa0 fffff80f`60ee3375 ndis!ndisMOidRequestCompleteInternal+0xd2
    ffffa505`af237b20 fffff80f`60f1d1c4 ndis!ndisMRawOidRequestComplete+0xf5
    ffffa505`af237b70 fffff80f`60ee388c ndis!ndisMpHookDefaultOidRequestComplete+0x14
    ffffa505`af237ba0 fffff80f`637867c4 ndis!NdisMOidRequestComplete+0xac
    ffffa505`af237be0 fffff80f`60eb14dd rt640x64!MpPollingDpc+0x18c8
    ffffa505`af237c60 fffff801`07133669 ndis!ndisMTimerObjectDpc+0xcd
    ffffa505`af237cb0 fffff801`071326d7 nt!KiProcessExpiredTimerList+0x159
    ffffa505`af237da0 fffff801`07250a45 nt!KiRetireDpcList+0x4c7
    ffffa505`af237fb0 fffff801`07250840 nt!KxRetireDpcList+0x5
    ffffa505`b0066660 fffff801`0724ffc9 nt!KiDispatchInterruptContinue
    ffffa505`b0066690 fffff801`071762ca nt!KiDpcInterrupt+0x2a9
    ffffa505`b0066820 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x10aa
    ffffa505`b0066900 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x6a2
    ffffa505`b00669e0 fffff801`071758c2 nt!MiWalkPageTablesRecursively+0x6a2
    ffffa505`b0066ac0 fffff801`071735c7 nt!MiWalkPageTablesRecursively+0x6a2
    ffffa505`b0066ba0 fffff801`070ae953 nt!MiWalkPageTables+0x1e7
    ffffa505`b0066c90 fffff801`072f82e4 nt!MiEmptyWorkingSetInitiate+0x103
    ffffa505`b0066e60 fffff801`072f8e8a nt!MiEmptyTargetedWorkingSet+0x78
    ffffa505`b0066eb0 fffff801`078b3a4e nt!MiTrimAllSystemPagableMemory+0xde
    ffffa505`b0066f00 fffff801`078c88f7 nt!MmVerifierTrimMemory+0x6a
    ffffa505`b0066f30 fffff801`078c6d3e nt!ViKeRaiseIrqlSanityChecks+0xcb
    ffffa505`b0066f70 fffff801`078c6b86 nt!VerifierKeAcquireInStackQueuedSpinLockCommon+0x62
    ffffa505`b0066fa0 fffff80f`5f4d5b4a nt!VerifierKeAcquireInStackQueuedSpinLock+0x16
    ffffa505`b0066fe0 fffff80f`5f4d5683 FLTMGR!FltpPerformPostCallbacks+0x16a
    ffffa505`b00670b0 fffff80f`5f4d4f3c FLTMGR!FltpPassThroughCompletionWorker+0x73
    ffffa505`b0067120 fffff801`078b3634 FLTMGR!FltpPassThroughCompletion+0xc
    ffffa505`b0067150 fffff801`0712501f nt!IovpLocalCompletionRoutine+0x174
    ffffa505`b00671b0 fffff801`078b2f71 nt!IopfCompleteRequest+0x11f
    ffffa505`b00672d0 fffff801`072952ed nt!IovCompleteRequest+0x1bd
    ffffa505`b00673c0 fffff80f`625bcaff nt!IofCompleteRequest+0x17041d
    ffffa505`b00673f0 fffff801`07206b9a Npfs!NpFsdFileSystemControl+0x3f
    ffffa505`b0067420 fffff801`078b2d29 nt!IopfCallDriver+0x56
    ffffa505`b0067460 fffff801`072884ef nt!IovCallDriver+0x275
    ffffa505`b00674a0 fffff80f`5f7f53b3 nt!IofCallDriverSpecifyReturn+0x819cf
    ffffa505`b00674d0 fffff801`078c0631 VerifierExt!IofCallDriver_internal_wrapper+0x13
    ffffa505`b0067500 fffff80f`5f4d7207 nt!VerifierIofCallDriver+0x21
    ffffa505`b0067540 fffff80f`5f50aed0 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x157
    ffffa505`b00675b0 fffff801`078d0ab8 FLTMGR!FltpFsControl+0x110
    ffffa505`b0067610 fffff801`078d0c16 nt!ViGenericDispatchHandler+0x54
    ffffa505`b0067650 fffff801`07206b9a nt!ViGenericFileSystemControl+0x16
    ffffa505`b0067680 fffff801`078b2d29 nt!IopfCallDriver+0x56
    ffffa505`b00676c0 fffff801`072961ad nt!IovCallDriver+0x275
    ffffa505`b0067700 fffff801`075a9f7b nt!IofCallDriver+0x16d9cd
    ffffa505`b0067740 fffff801`075ae4ea nt!IopSynchronousServiceTail+0x1ab
    ffffa505`b00677f0 fffff801`0756b2a6 nt!IopXxxControlFile+0x68a
    ffffa505`b0067920 fffff801`0725a143 nt!NtFsControlFile+0x56
    ffffa505`b0067990 00007ff8`ea1db0c4 nt!KiSystemServiceCopyEnd+0x13
    0000006f`02ffe558 00007ff8`e67db1e5 ntdll!NtFsControlFile+0x14
    0000006f`02ffe560 00007ff8`cd69c485 KERNELBASE!WaitNamedPipeW+0x1e5
    0000006f`02ffe670 000001c3`72ceeec0 System_Core_ni!DomainBoundILStubClass.IL_STUB_PInvoke(System.String, Int32)
    
    1: kd> !locks
    KD: Scanning for held 
    Resource @ 0xffffbe0c75df5b50    Exclusively owned
        Contention Count = 4
        NumberOfExclusiveWaiters = 1
         Threads: ffffbe0c75fe8040-01<*> 
         Threads Waiting On Exclusive Access:
                  ffffbe0c73874040       
    27030 total locks, 1 locks currently held
    
    1: kd> .thread ffffbe0c`73874040
    nt!KiCommitThreadWait+0x13b
    nt!KeWaitForSingleObject+0x1ff
    nt!ExpWaitForResource+0x6d
    nt!ExAcquireResourceExclusiveLite+0x1c9
    srvnet!SrvNetUpdateNetNameWorkerRoutine+0x95
    nt!IopProcessWorkItem+0x8b
    nt!ExpWorkerThread+0xf5
    
    1: kd> .thread ffffbe0c75fe8040
    nt!KiCommitThreadWait+0x13b
    nt!KeWaitForSingleObject+0x1ff
    nt!IopParseDevice+0x1609
    nt!ObpLookupObjectName+0x73b
    nt!ObOpenObjectByNameEx+0x1df
    nt!IopCreateFile+0x3f5
    nt!NtCreateFile+0x79
    TDI!TdiOpenNetbiosAddress+0x140
    srvnet!SrvNetOpenEndpointHandle+0x49
    srvnet!SrvNetTdiAllocateEndpoint+0x27c
    srvnet!SrvNetAllocateEndpoint+0xb8
    srvnet!SrvNetAddServedName+0x16e
    srvnet!SvcXportAdd+0x12d
    srvnet!SrvAdminProcessFsctlFsp+0xaa
    nt!IopProcessWorkItem+0x8b
    nt!ExpWorkerThread+0xf5
    

    Only the first OID_GEN_CURRENT_PACKET_FILTER is marked as complete and all others are not completed. The only OID fiddling my filters do is add PROMISCUOUS to PACKET_FILTER. All others are just passed through.

    ALL PENDING OIDs
        NetAdapter         ffffbe0c736db1a0 - Realtek PCIe GBE Family Controller
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f4d4c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f554c60 - Realtek PCIe GBE Family Controller-WFP Native MAC Layer LightWeight Filter-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f594c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0001
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4fa8ac60 - Realtek PCIe GBE Family Controller-MyLwf1-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f558c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0003
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f5c2c60 - Realtek PCIe GBE Family Controller-MyLwf2-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f586c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0004
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f84ec60 - Realtek PCIe GBE Family Controller-VirtualBox NDIS Light-Weight Filter-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f806c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0005
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f9fcc60 - Realtek PCIe GBE Family Controller-QoS Packet Scheduler-0000
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f964c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0006
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
        Filter             ffffdf8e4f5f2c60 - Realtek PCIe GBE Family Controller-Microsoft NDIS Capture-0007
            Current OID        OID_GEN_CURRENT_PACKET_FILTER
            Queued OIDs        OID_GEN_STATISTICS
                               OID_GEN_STATISTICS
    
    1: kd> !dpcs
    CPU Type      KDPC       Function
     1: Normal  : 0xfffff80107447780 0xfffff8010719bbe0 nt!PpmCheckPeriodicStart
    
    1: kd> dt 0xfffff80107447780 _KDPC
    ntdll!_KDPC
       +0x000 TargetInfoAsUlong : 0x10313
       +0x000 Type             : 0x13 ''
       +0x001 Importance       : 0x3 ''
       +0x002 Number           : 1
       +0x008 DpcListEntry     : _SINGLE_LIST_ENTRY
       +0x010 ProcessorHistory : 0xff
       +0x018 DeferredRoutine  : 0xfffff801`0719bbe0     void  nt!PpmCheckPeriodicStart+0
       +0x020 DeferredContext  : (null) 
       +0x028 SystemArgument1  : (null) 
       +0x030 SystemArgument2  : (null) 
       +0x038 DpcData          : 0xffffbe0c`702cc3b0 Void
    

    I'll disable VirtualBox next to see if that makes any difference.
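Since the only OID change described above is adding PROMISCUOUS to the packet filter, here is a minimal sketch of keeping that change self-consistent: remember what the upper layer asked for, add the bit on the way down, and hide it again on the way up. It is not a diagnosis of the bugcheck. The helper names are hypothetical, the constant is defined locally with the value of NDIS_PACKET_TYPE_PROMISCUOUS, and passing a zero filter through untouched is a conservative assumption of this sketch, not something the thread confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as NDIS_PACKET_TYPE_PROMISCUOUS; defined locally so the
 * sketch builds without the WDK headers. */
#define PF_PROMISCUOUS 0x00000020u

/* Hypothetical per-filter-module shadow of the upper layer's request. */
typedef struct {
    uint32_t requested_by_upper;
} pf_shadow;

/* Set path: returns the value to send down in the filter's own copy of
 * the request, leaving the caller's original buffer untouched.
 * Assumption: a zero filter ("receive nothing") is passed through as is. */
uint32_t pf_on_set(pf_shadow *s, uint32_t upper_filter)
{
    s->requested_by_upper = upper_filter;
    if (upper_filter == 0)
        return 0;
    return upper_filter | PF_PROMISCUOUS;
}

/* Query path: returns the value to report upward, hiding the promiscuous
 * bit unless the upper layer asked for it itself. */
uint32_t pf_on_query_complete(const pf_shadow *s, uint32_t lower_filter)
{
    if (s->requested_by_upper & PF_PROMISCUOUS)
        return lower_filter;
    return lower_filter & ~PF_PROMISCUOUS;
}
```

Keeping the original request intact and working on a copy means whatever sits above the filter sees exactly the value it set when the request completes.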
