Best practices for Windows Kernel call-back functions and operations

Mecanik · October 23, 2022, 3:37pm

I have asked for opinions on this question on Stack Overflow, but unfortunately that “community” has become… poisoned. I’ll leave it at that.

With that being said, I decided to come here, where the actual kernel experts are.

This question aims to get a bit of clarity and more information about Windows Kernel call-backs.

If you go to the official documentation, you will find the following:

Keep routines short and simple.
Do not make calls into a user mode service to validate the process, thread, or image.
Do not make registry calls.
Do not make blocking and/or Interprocess Communication (IPC) function calls.
Do not synchronize with other threads because it can lead to reentrancy deadlocks.
…

In my opinion, this defeats the purpose of having a call-back in the first place. If you can’t validate a thread, image, process or even perform IPC… what’s the point?

Based on MSDN you shouldn’t even contact your service, or log information via IPC or registry, etc. So again, what’s the point of having one?

Consider an AV-like product that wants to validate images in LOAD_IMAGE_NOTIFY_ROUTINE/OB_PRE_OPERATION_CALLBACK: according to these rules, you are not supposed to. So now what?

I’ve already seen countless drivers doing a lot of validations and operations in these call-backs, even though they do not follow “best practices”. And yet, nothing “bad” happens.

Please share your thoughts as to:

Why is MSDN really recommending these best practices?
How would one perform “correct” and “best practice” operations without System Worker Threads? In the example above (LOAD_IMAGE_NOTIFY_ROUTINE/OB_PRE_OPERATION_CALLBACK), it would be pointless to “queue” a validation for an image when your purpose is to prevent loading it if it’s invalid. Please share an example if possible.

Mark_Roddy · October 23, 2022, 4:22pm

Do not make calls into a user mode service to validate the process, thread, or image.
I have no clue what that really means, but certainly you can notify your user mode service that it has work to do and wait for that work to complete, as that is basically the point of having the callback to begin with.

Mecanik · October 23, 2022, 4:28pm

@Mark_Roddy said:

Do not make calls into a user mode service to validate the process, thread, or image.
I have no clue what that really means, but certainly you can notify your user mode service that it has work to do and wait for that work to complete, as that is basically the point of having the callback to begin with.

Agreed. One of the methods… but look at what Microsoft says

That’s the whole question… WHY.

Mark_Roddy · October 23, 2022, 9:22pm

Well literally they tell you to not make a function call into a user mode service. So I agree: don’t do that.

NiallNSec · October 25, 2022, 6:01pm

Certain types of drivers violate these conditions all the time, the biggest offenders being AV/EDR products. The best practices, in my (limited) opinion, serve as more of a serious warning. Unless you know EXACTLY what you are doing you can cause serious problems by not following them, but that doesn’t mean you can’t do something like call a user mode service in a callback.

Usually, if you need to break one of those rules, there will be an alternative safe way of achieving your goal which does follow Microsoft’s guidelines. So it should only be exceptional cases where you would consider breaking one of these rules. If you do choose to ignore Microsoft’s recommendations then you should be prepared for situations where a future update causes problems for your driver.

(Also, it’s worth noting that Microsoft themselves do appear to break these rules. If you were to, for example, suspend the Windows Defender user mode service you would be surprised to find that many things stop working because they are held up in WdFilter.)

MBond2 · October 25, 2022, 11:32pm

I think that you should apply judgement to these rules. In any performance critical section of anything, keeping the work to be done as short and simple as possible is an obvious goal. Gratuitous blocking or waiting should be avoided. But when the functional objective cannot be achieved without blocking, then blocking must happen. Just make sure that you have a strategy to avoid deadlock and consider what to do if the component that you need to wait on does not respond, which usually comes down to a decision to fail open or fail closed.

Mark’s literal interpretation, that they tell you not to execute a call instruction targeting a UM address, should also be followed, of course.
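
The fail-open versus fail-closed decision described above can be illustrated with a minimal sketch. It is plain portable C rather than driver code, and every name in it (`scan_reply`, `timeout_policy`, `should_allow`) is hypothetical and made up for illustration. In a minifilter, the timed-out case might correspond to, for example, `FltSendMessage` returning `STATUS_TIMEOUT` when called with a non-NULL timeout, but that mapping is an assumption about how such a driver would be structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of asking the user mode service for a verdict. */
typedef enum {
    REPLY_ALLOW,     /* service answered: allow the operation */
    REPLY_DENY,      /* service answered: block the operation */
    REPLY_TIMED_OUT  /* service did not answer within the timeout */
} scan_reply;

/* Hypothetical policy chosen ahead of time for the "no answer" case. */
typedef enum {
    FAIL_OPEN,   /* no answer: allow, favouring availability */
    FAIL_CLOSED  /* no answer: block, favouring security */
} timeout_policy;

/*
 * Decide whether the operation may proceed. A real answer from the
 * service always wins; the policy only matters when the wait timed out,
 * so an unresponsive service can never stall the callback forever.
 */
bool should_allow(scan_reply reply, timeout_policy policy)
{
    assert(policy == FAIL_OPEN || policy == FAIL_CLOSED);

    switch (reply) {
    case REPLY_ALLOW:
        return true;
    case REPLY_DENY:
        return false;
    case REPLY_TIMED_OUT:
    default:
        return policy == FAIL_OPEN;
    }
}
```

The point of the sketch is only that the wait is bounded and the behaviour on timeout is chosen in advance, not improvised inside the callback.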

brad_H · October 28, 2022, 10:44am

@Mark_Roddy said:

Do not make calls into a user mode service to validate the process, thread, or image.
I have no clue what that really means, but certainly you can notify your user mode service that it has work to do and wait for that work to complete, as that is basically the point of having the callback to begin with.

I’m pretty sure they are talking about FltMgr’s Communication Ports.

And there is no issue in using communication ports to contact your service for some user-mode checking.

Scott_Noone_OSR · November 4, 2022, 4:24pm

Originally some of these Ps callbacks were called with a non-recursively acquirable per-process lock held. If you happened to try and perform an operation that also attempted to acquire this lock you’d deadlock. This led to confusing and draconian language being added to the docs telling people to basically not do anything in these callbacks.

The locking has changed over time and it’s possible to do more in these callbacks these days, though the guidance stays (presumably to avoid constraining the Ps implementation from changing in the future). The primary issue, though, is around anything to do with the process (address space, PEB, loaded modules, etc.). Lots of products just end up scanning the underlying file (e.g. FltCreateSectionForDataScan) and that would be safe given that there are no file system locks held in these callbacks. Anything else they’re doing just happens to work by way of the current implementation.