WFP callout unload/load problem when using FwpsFlowAssociateContext0(...) function

Hello,

I am playing with the Inspect sample from Microsoft, which illustrates the use of different WFP callouts. I noticed that once I associate my own context with a flow handle in ALE_AUTH_CONNECT_V4 using FwpsFlowAssociateContext0, I start having problems with driver unloading/loading.

That is, if I unload the driver using

net stop inspect

and then load it using

net start inspect

I receive the following error:

System error 2 has occurred.
The system cannot find the file specified.

I tried to remove all the flow contexts using FwpsFlowRemoveContext0 in the Unload routine (I keep them in a list, then iterate over the list and call FwpsFlowRemoveContext0 on each element), but this does not help.

If I do not associate a context with the flow handle, I don’t have any problems: I can load/unload the driver in a stress-test script and everything works for days.

I therefore have the following questions:

  1. Why does the OS give such a strange error message on load? Couldn’t it have cancelled the unload of the driver as long as there are pending contexts associated with the flow handles?

  2. Why do I still have the problem even after removing the context with FwpsFlowRemoveContext0?

  3. If this is a bug in Windows, is there a KB article explaining it?

Thanks for any thoughts on this!

On Unload consider:

Delete your WFP filters.
Unregister your callouts.

Note: The act of unregistering your callouts should cause your flow delete
callouts to be called for each flow. In flow delete you simply free your
context - don’t call FlowRemoveContext.

AFTER deleting filters and unregistering callouts you might ensure your list
is empty.

Keep at it. This will work once you re-read the docs and attend to the
details.

Thomas F. Divine
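
The unload order described in the reply above can be modelled in plain user-mode C. This is a sketch, not WFP code: model_associate, model_flow_delete, model_delete_filters, model_unregister_callout and model_unload are hypothetical stand-ins for the association step, flowDeleteFn, filter deletion, callout unregistration and the driver's Unload routine. It assumes, as the reply states, that unregistering the callouts makes the engine invoke the flow-delete callback for every flow that still carries a context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-flow context, kept on a driver-owned singly linked list. */
typedef struct FlowCtx {
    struct FlowCtx *next;
    unsigned long long flowId;
} FlowCtx;

static FlowCtx *g_flows = NULL;       /* list of live contexts */
static bool g_filtersPresent = true;  /* models "filters still added" */

/* Stand-in for the association step (FwpsFlowAssociateContext0 in a driver). */
static FlowCtx *model_associate(unsigned long long flowId) {
    FlowCtx *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->flowId = flowId;
    c->next = g_flows;
    g_flows = c;
    return c;
}

/* Plays the role of flowDeleteFn: unlink and free only.
   It does not call any remove-context routine. */
static void model_flow_delete(FlowCtx *ctx) {
    FlowCtx **pp = &g_flows;
    while (*pp != NULL && *pp != ctx) pp = &(*pp)->next;
    if (*pp != NULL) *pp = ctx->next;
    free(ctx);
}

/* Stand-in for filter deletion. */
static void model_delete_filters(void) { g_filtersPresent = false; }

/* Stand-in for callout unregistration. ASSUMPTION (from the reply): the
   engine calls the flow-delete callback once per flow with a context. */
static void model_unregister_callout(void) {
    assert(!g_filtersPresent);  /* filters must already be gone */
    while (g_flows != NULL) model_flow_delete(g_flows);
}

/* Unload order from the reply: filters, then callouts, then verify. */
static bool model_unload(void) {
    model_delete_filters();
    model_unregister_callout();
    return g_flows == NULL;     /* the list should now be empty */
}
```

The point of the model is the division of labour: the unload path never frees contexts itself, it only triggers the callback and then checks that the list has drained.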

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@meta.ua
Sent: Thursday, May 9, 2013 6:21 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] WFP callout unload/load problem when using
FwpsFlowAssociateContext0(…) function




Hello All,

I am getting the same issue.
Even after unregistering all the filters and all the callouts, in that order, the
DRIVER_OBJECT is still present (it does not show up in the Driver\ list),
and !object still shows a PointerCount of 1 for the driver object.

The driver is unloaded using sc stop. During unload, someone opens a handle to the object and PointerCount suddenly jumps, but it is then reset.
I set a breakpoint on write access to the object header of the driver object.

Hence the driver can’t unload. Can someone please check and let me know what I am missing?

I have attached the entire WinDbg trace and pasted a little bit of info.
Thanks

3: kd> !object ffff9306c660ba60
Object: ffff9306c660ba60 Type: (ffff9306c450ab00) Driver
ObjectHeader: ffff9306c660ba30 (new version)
HandleCount: 0 PointerCount: 3
Directory Object: ffffc88fbd8f17c0 Name: GIAppDef
3: kd> ba w8 ffff9306c660ba30
3: kd> g
TRACE: GetMSDosName : nameInfo = C:\Windows\System32\sc.exe
TRACE: GetUtf8StringLengthRaw : Getting utf8 string length for C:\Windows\System32\sc.exe
TRACE: GetUtf8StringLengthRaw : --- Exiting GetUtf8StringLengthRaw ---
TRACE: GetUtf8StringLengthRaw : Getting utf8 string length for sc stop giappdef
TRACE: GetUtf8StringLengthRaw : --- Exiting GetUtf8StringLengthRaw ---
TRACE: MemCopy : dest = FFFFC8003E33AFD0, src = FFFFC8003E346FE0, len = 27
TRACE: MemCopy : --- Exiting MemCopy ---
TRACE: MemCopy : dest = FFFFC8003E33AFEB, src = FFFFC8003E1C4FE0, len = 18
TRACE: MemCopy : --- Exiting MemCopy ---
TRACE: MemCopy : dest = FFFFC8003DF0AEF0, src = FFFFC8003DF4AEF0, len = 267
TRACE: MemCopy : --- Exiting MemCopy ---
TRACE: IsMemEqual : s1 = FFFFC8003E33AF5C, s2 = FFFFC80038610FBC, len = 32
TRACE: IsMemEqual : --- Exiting IsMemEqual ---
DEBUG: _GetHashKey : PROCESS type C:\Windows\System32\sc.exe alarm key 2331978580
DEBUG: GIAppDefProcCreateExitCb : Process with PID 4472 starting. ProcessTable update successful.
Breakpoint 1 hit
nt!ObfReferenceObject+0x25:
fffff803aab0c305 48ffc3 inc rbx
3: kd> !object ffff9306`c660ba60
Object: ffff9306c660ba60 Type: (ffff9306c450ab00) Driver
ObjectHeader: ffff9306c660ba30 (new version)
HandleCount: 0 PointerCount: 4
Directory Object: ffffc88fbd8f17c0 Name: GIAppDef
3: kd> kv

Child-SP RetAddr : Args to Child : Call Site

00 ffffe78111e4fca0 fffff803aaeffb0a : ffffc88fc92ae030 ffffe78111e4fde0 ffffe78100000002 ffffc88fc5829670 : nt!ObfReferenceObject+0x25
01 ffffe78111e4fce0 fffff803aae25d0d : ffff9306c8ce3000 ffffe78111e4ff40 ffffe78100000240 ffff9306c450ab00 : nt!ObpLookupObjectName+0xa1a
02 ffffe78111e4feb0 fffff803aae3b9ad : ffffe78100000001 ffff9306c705b080 0000000000000000 ffffe78111e50350 :

I have found why the driver can’t be unloaded, but I still need help resolving the problem. During unload, I call FwpsFlowRemoveContext for the remaining flow contexts (which are stored in a linked list), and it returns STATUS_UNSUCCESSFUL. According to https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/fwpsk/nf-fwpsk-fwpsflowremovecontext0 this means "There is no context currently associated with the data flow."

However, there are contexts associated, because FwpsCalloutUnregisterById() fails with STATUS_DEVICE_BUSY. https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/fwpsk/nf-fwpsk-fwpscalloutunregisterbyid0
I have verified that this is an active flow and that the process (svchost) which initiated it is still active.
The flow deletion function for this flow has not been called.

Any clues as to why FwpsFlowRemoveContext fails?

There is a similar issue reported, but with no reply from MS. https://social.msdn.microsoft.com/Forums/sqlserver/en-US/3d7e45ee-d3dd-49cf-8599-743cdac76097/fwpsflowremovecontext-returns-statusunsuccessful?forum=wfp


I ran into this as well. I currently use a counter that gets decremented by the flowDeleteFn. When FwpsCalloutUnregisterById fails with STATUS_DEVICE_BUSY, I check every second whether the counter has reached zero yet. This helps, but sadly sometimes the counter has still not reached zero even after ~10s, so FwpsCalloutUnregisterById still fails.

Is there no way to “force” WFP to remove the context? I think I will try FwpsFlowRemoveContext0, and if that does not help, I guess one could manually maintain a hashmap (flowHandle to context).

Okay, so while FwpsFlowRemoveContext0 returns STATUS_SUCCESS and calls flowDeleteFn, the subsequent call to FwpsCalloutUnregisterById still fails with STATUS_DEVICE_BUSY, even after waiting ~1s. Seems like a bug to me.

I guess one could manually maintain a hashmap (flowHandle to context).

The problem with that approach seems to be that the flowDeleteFn is not called anymore, as per docs:
The filter engine calls this callout function only if the callout driver associated a context with the data flow.

So we’d have to manually figure out when to free the associated data. Does anyone know how one could do that, or when one should free the associated data?

Call FwpsFlowAbort0() if FwpsFlowRemoveContext0() does not return success.

It fixed driver unloading for me.
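
The remove-then-abort fallback can be sketched as a user-mode model. This is not WFP code: ModelFlow, model_remove_context, model_abort_flow and model_teardown are hypothetical stand-ins, with model_remove_context playing the role of FwpsFlowRemoveContext0 and model_abort_flow the role of FwpsFlowAbort0. The contextVisible flag is an assumption used to reproduce the failure reported earlier in the thread, where the remove call reports that no context is associated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODEL_SUCCESS, MODEL_UNSUCCESSFUL } ModelStatus;

/* Hypothetical record the driver keeps for each flow it tagged. */
typedef struct {
    unsigned long long flowId;
    bool contextVisible;  /* can the remove call find our context? */
    bool aborted;         /* set once the flow has been aborted */
} ModelFlow;

/* Stand-in for FwpsFlowRemoveContext0: fails when no context is found. */
static ModelStatus model_remove_context(ModelFlow *f) {
    if (!f->contextVisible) return MODEL_UNSUCCESSFUL;
    f->contextVisible = false;
    return MODEL_SUCCESS;
}

/* Stand-in for FwpsFlowAbort0. */
static void model_abort_flow(ModelFlow *f) {
    f->aborted = true;
}

/* For every tracked flow, try to remove the context first and abort the
   flow only when removal fails. Returns how many flows were aborted. */
static int model_teardown(ModelFlow *flows, int count) {
    assert(flows != NULL || count == 0);
    int aborted = 0;
    for (int i = 0; i < count; i++) {
        if (model_remove_context(&flows[i]) != MODEL_SUCCESS) {
            model_abort_flow(&flows[i]);
            aborted++;
        }
    }
    return aborted;
}
```

Aborting is kept as the fallback rather than the first step because it tears down the connection itself, so flows whose context can be removed normally are left alone.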