WFP callouts synchronization

Hi all.

Can someone please clarify the following for me:

  1. Are callouts [always] invoked with IRQL=DPC?
    I’d like to use KeAcquireInStackQueuedSpinLockAtDpcLevel/KeReleaseInStackQueuedSpinLockFromDpcLevel but the doc says IRQL <= DISPATCH_LEVEL. I have never seen IRQL < DISPATCH_LEVEL for my callouts.

  2. I can assign some context to a filter registration:
    filter.rawContext = myContextPointer;

myContextPointer is a WDFFILEOBJECT’s context structure pointer.

And then get it back inside a callout: filter->context.

How do I make sure this structure is not deallocated while my callout(s) are in progress? Assume I first call FwpmFilterDeleteByKey0, FwpmCalloutDeleteByKey0 and FwpsCalloutUnregisterByKey0 during Close/Cleanup, and only then is my context structure deallocated (WDF deallocates it automatically).

I assume Fwpm/Fwps functions do not wait/spin until all outstanding callouts are finished.
Probably some callouts can even be invoked after unregistration happens.

Thanks.

> I’d like to use KeAcquireInStackQueuedSpinLockAtDpcLevel/KeReleaseInStackQueuedSpinLockFromDpcLevel

I don’t understand your problem. Test if you are at DPC. If so, use these
routines; otherwise use the non-DpcLevel acquire/release.

> How do I make sure this structure is not deallocated at the time my callout(s) are in progress

Call WdfObjectReference on your object when you allocate it (I’m assuming
by WdfObjectAllocateContext). Make sure you specify a cleanup routine.
Then once your callout is done with the context, dereference it so the
cleanup routine can be called. It won’t be deallocated while there is
still a reference to it.

> Probably some callouts can even be invoked after unregistration happens

Yes.

In reply to the original post of Wed, Feb 17, 2016 at 7:58 PM.


Paul,
I understand how to manage WDF object lifetimes.

Please reread my issue #2. Callouts are invoked by NETIO asynchronously, and I have to pass a raw pointer to my object when registering my filter. And there are no known means to say “Hey, NETIO, please do not invoke my callouts any more and please make sure all outstanding callouts are finished by the time I release my resources”.

FwpsCalloutRegister0 requires a DeviceObject pointer; that’s why your driver is not unloaded until you unregister all callouts.

FwpmFilterAdd0 does not reference any objects.

Andrii,

  1. My experience is that they are always invoked at DISPATCH IRQL, but I’m not sure if this is guaranteed.

  2. Guaranteeing callouts have stopped before destroying context information is a bit tricky in WFP and I don’t think there is a nice way. My understanding is that once FwpsCalloutUnregisterByKey has completed, no new callouts will fire. However, that doesn’t mean that a callout that began before or during FwpsCalloutUnregisterByKey can’t still continue and try to access your context.

To make your callouts safe with respect to context destruction, you can do your context destruction in your WDFDEVICE’s EvtDestroyCallback and then ensure that the KMDF framework doesn’t invoke it until your callouts are finished by using the WdfObjectReference/WdfObjectDereference technique that Paul mentioned. You also need to maintain a boolean state indicating whether you are in the unload process or not and a lock to make sure that checking the unload state and calling WdfObjectReference in your callout is atomic with respect to setting it before you call FwpsCalloutUnregisterByKey.

In your callout:

  1. Acquire the unload state lock
  2. Check the unload state.
  3. If you are unloading you know not to access your context and you can just release the lock.
  4. If you are not unloading you need to call WdfObjectReference on your WDFDEVICE, then release the lock (you don’t want to hold it for too long) and continue with callout processing.
  5. In the “not unloading” code path, call WdfObjectDereference on your WDFDEVICE once you’ve finished with the context.

Where you are currently calling FwpsCalloutUnregisterByKey:

  1. Acquire the unload state lock
  2. Set the unload state to “unloading”
  3. Release the unload state lock
  4. Call FwpsCalloutUnregisterByKey

In your WDFDEVICE’s EvtDestroyCallback:

  1. Destroy your context

This logic means that by the time FwpsCalloutUnregisterByKey is called, any callouts that are in progress have either already incremented the WDFDEVICE ref count (stopping your EvtDestroyCallback from firing and destroying the context until the last one completes) or have not yet checked the unload state and, when they do, will see you are unloading and not attempt to access the context. No callouts will attempt to access the context after it has been destroyed.
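
The ordering argument above can be checked with a small user-mode model in C. This is only a sketch under stated assumptions, not WDK code: an atomic_flag spin loop stands in for the unload state lock, an atomic counter stands in for WdfObjectReference/WdfObjectDereference, and the last release stands in for EvtDestroyCallback. The names owner_t, callout_enter, callout_leave and begin_unload are hypothetical and are not WFP or KMDF APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* User-mode model only; every name here is hypothetical. */
typedef struct {
    atomic_flag lock;      /* models the unload state lock */
    bool        unloading; /* protected by lock */
    atomic_int  refs;      /* models the WDF object reference count */
    int        *context;   /* models the context the callout uses */
    bool        destroyed; /* set once the modelled destroy step ran */
} owner_t;

static void lock_acquire(owner_t *o) {
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire)) { }
}

static void lock_release(owner_t *o) {
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

static void owner_init(owner_t *o) {
    atomic_flag_clear(&o->lock);
    o->unloading = false;
    atomic_init(&o->refs, 1);          /* the owner's own reference */
    o->context = calloc(1, sizeof *o->context);
    o->destroyed = false;
}

/* Models WdfObjectDereference: the last release destroys the context. */
static void owner_release(owner_t *o) {
    int prev = atomic_fetch_sub(&o->refs, 1);
    assert(prev >= 1);
    if (prev == 1) {
        free(o->context);
        o->context = NULL;
        o->destroyed = true;
    }
}

/* Callout steps 1-4: check the state and take a reference atomically. */
static bool callout_enter(owner_t *o) {
    lock_acquire(o);
    bool ok = !o->unloading;
    if (ok)
        atomic_fetch_add(&o->refs, 1);
    lock_release(o);
    return ok;                         /* false: do not touch the context */
}

/* Callout step 5: drop the reference once done with the context. */
static void callout_leave(owner_t *o) {
    owner_release(o);
}

/* Unload path: flag first, then (in a real driver) unregister. */
static void begin_unload(owner_t *o) {
    lock_acquire(o);
    o->unloading = true;
    lock_release(o);
    /* A real driver would call FwpsCalloutUnregisterByKey0 here. */
    owner_release(o);                  /* drop the owner's reference */
}
```

In this model, a callout that entered before begin_unload keeps the context alive until its callout_leave, while a callout that enters afterwards gets false back and never touches the context.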

If you are worried about contention on the unload state lock in your callouts, you could use a lock with shared read/exclusive write capability since the unload state is read many times and written to only on unload.

I’m aware that this is a bit complicated but I can’t think of another way to do it. I’m putting a feature request in to Microsoft for a new callout unregister function that blocks until all activity has ceased but that obviously doesn’t help right now.

Thinking a bit more, it’s possible that WFP holds onto its device object reference until the last callout has completed, in which case simply moving your context destruction into the EvtDestroyCallback is enough to make it safe. I’ve sent a query to Microsoft tech support (am dealing with them for other WFP issues at the moment) and will report back if they give me an answer on this.

> Test if you are at DPC. If so use these routines otherwise use the non-DpcLevel acquire/release.

One of the most absurd suggestions one may think of…

The only purpose of the DPC-level acquisition/release variants is optimisation, because changing IRQL may involve serious overhead on some HALs. Therefore, the DPC-level variants allow raw spinlock acquisition and release in situations where you are 100% sure IRQL is elevated (i.e. in the context of a DPC routine). I hope by now you understand why checking IRQL before acquisition is just an absurd, although harmless, exercise: if you are not sure, you can always use the non-DPC version. It does not work the other way around; more on that below.

> I don’t understand your problem.

The problem is going to arise if you use DPC-level (i.e. raw) spinlock acquisition while context switches on a CPU are not disabled (i.e. current IRQL is below DISPATCH_LEVEL). Consider what happens if a CPU tries to acquire a spinlock that was acquired earlier by the same CPU in the context of thread X, and does so at elevated IRQL (i.e. from a DPC routine): as you can see, a deadlock is guaranteed here.

This is why the OP asks whether he can safely use raw spinlock acquisition/release here, and the answer to his question is obviously negative: as long as there is a theoretical possibility of such an acquisition being attempted while context switches on a CPU are not disabled (i.e. the scenario that the official documentation allows), doing so may result in a deadlock…

Anton Bassov

My problem is that the callout and filter are associated with a WDF file object and not with the device itself, so I can’t use the WdfObjectReference/WdfObjectDereference technique.

I wonder when WFP calls notifyFn. Are all outstanding callouts finished by that time?

I don’t see the problem but I’ve not worked with WDF File Objects before so perhaps I’m missing something.

Can you not reference and dereference them like any other WDFOBJECT?

https://www.osr.com/nt-insider/2014-issue1/wdf-file-object-callbacks-properties-demystified/

This article would suggest that you can use the ref count technique…

It cannot help me because WFP has to do it. I can’t bind my object to a filter.

Don’t feed the trolls…don’t feed the trolls…ok…

> One of the most absurd suggestions
Thanks.

So what part of “test” from my answer doesn’t make sense? Nowhere did I
say to blindly try to use the DPC routines in non-DPC paths which is what
your post implies I said. Maybe I wasn’t clear but if you TEST (i.e.
KeGetCurrentIrql == DISPATCH) and it is TRUE then context switches are
disabled because YOU ARE ALREADY AT DPC and therefore you can use the raw
routines. How can that be TRUE and a context switch occurs? That’s a
serious question Anton. Doesn’t DPC mean that context switches on the
particular CPU are disabled? Shit, you even said yourself that context
switches are disabled at IRQL >= DISPATCH. So how can KeGetCurrentIrql
return DISPATCH and then the thread performs a context switch to something
lower to make the use of DPC acquire/release routines invalid?

As for the OP, like I said, reference your objects and they won’t go away
until you dereference them.

In reply to the message of Fri, Feb 19, 2016 at 4:46 AM.


Paul,

> Don’t feed the trolls…don’t feed the trolls…ok…

Hey, I don’t want to make yet another enemy, do I…

What you do not want to understand is that the very purpose of raw spinlocks is to avoid dealing with IRQL because of the possible overhead that it may have on some HALs; this is nothing more than an optimization. Code like

    if (KeGetCurrentIrql() == DISPATCH_LEVEL)
        KeAcquireSpinLockAtDpcLevel(…);

is absurd simply because the first line automatically negates any optimization/advantage that the second line is meant to provide. This is the only thing I was trying to say.

> So what part of “test” from my answer doesn’t make sense?

I hope now you got it…

In either case, as I said in my previous post, this approach is totally harmless and safe despite its absurdity, although unconditionally calling KeAcquireSpinLock() seems to be a better option. I would reserve the KeAcquireSpinLockAtDpcLevel() variant only for those cases where I am 100% sure IRQL is elevated…

I guess now it is my turn to say “Maybe I was not clear” …

Anton Bassov

Despite the useless verbal sparring and exaggerated claims of “absurdity”, Anton is basically correct: you don’t test the IRQL and then determine which flavor of the spin lock functions to call… This negates any possible (tiny) gain from the optimization that the xxxxAtDpcLevel flavor gets you.

You just call the basic function call and be done with it.

For the purpose of clarification for the OP: If the docs say the callback can happen at IRQLs <= DISPATCH_LEVEL then this is architecturally possible, regardless of what you’ve seen experientially.

So just use the vanilla flavor of the lock claim function, and you’ll be good to go.

Peter
OSR
@OSRDrivers

Mr Cattley said:

testing IRQL is for debugging asserts only.

This.

Peter
OSR
@OSRDrivers

?

In reply to the message of Sunday, 21 February 2016 2:14 AM.

Peter,

> Despite the useless verbal sparring and exaggerated claims of “absurdity”…

Believe it or not, on this particular occasion I had no intention of trolling. Seriously. In fact, the only reason why I posted here was because I decided to take a break from the “trolling mode” and post something technical. Apparently, idiomatically speaking, I just forgot to flush the caches and invalidate the TLBs that got filled on the thread where you requested me to be “creative”…

Anton Bassov

Andrii,

I’m not sure if this will help you but Microsoft support have confirmed to me that (following a FwpsCalloutUnregister call) WFP will not let the object reference count return to zero until all callouts have finished. However, the object dereferencing is asynchronous, so you need to free your context information only when your object is being destroyed by the framework, not when FwpsCalloutUnregister returns.

> the DPC routines in non-DPC paths which is what your post implies I said. Maybe I wasn’t clear but if you TEST (i.e. KeGetCurrentIrql == DISPATCH)

No cleanly written code should ever call KeGetCurrentIrql for such a purpose, except for debug asserts and printouts.

Anton is correct. KeGetCurrentIrql is expensive, and so should be avoided.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> KeGetCurrentIrql is expensive, and so should be avoided.

Well, apparently, not everywhere, but on some HALs it can involve something as heavy as multiple IO port instructions (I think it works this way on the PIC HAL). OTOH, this is more of a historical matter and applies only to antique hardware that can be, for all practical purposes, simply ignored in the year 2016. On those HALs where IRQL is implemented purely in software or stored in the local APIC’s TPR, the cost of getting and setting IRQL is tiny…

Concerning the claims of absurdity, the proposed code is equivalent to

    if (KeGetCurrentIrql() < DISPATCH_LEVEL)
    {
        KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
        KeAcquireSpinLockAtDpcLevel(…);
    }
    else
        KeAcquireSpinLockAtDpcLevel(…);

Am I the only one who finds the above lines absurd?

Anton Bassov