Concurrent delivery of image load events via PsSetLoadImageNotifyRoutine() and a callback function.

chuckchopp · February 5, 2023, 6:52pm

In the past year my development responsibilities expanded to include the maintenance & troubleshooting of a software driver written in C++ using the DriverWorks framework. Much of the code shows a revision history with very few changes made over the 15+ year span of time since it was originally written. Recently, a bugcheck started occurring and I was able to track down the root-cause, which is a double-remove operation being performed on an item in a doubly linked list. While analyzing the source code and a crash dump I noticed that the callback function registered via PsSetLoadImageNotifyRoutine() appears to be getting called simultaneously by two different threads reporting image load events for the same process, yet is coded with an implicit assumption that image load notifications for a given process will occur sequentially, not concurrently. Obviously, that assumption is incorrect based on the content of the crash dump and the double-remove operation that caused the bugcheck.

PsSetCreateProcessNotifyRoutine(), PsSetCreateThreadNotifyRoutine() and PsSetLoadImageNotifyRoutine() are used by this driver.

The NTDDK documentation for PCREATE_PROCESS_NOTIFY_ROUTINE states “The operating system calls the driver’s process-notify routine at PASSIVE_LEVEL inside a critical region with normal kernel APCs disabled.”.

The NTDDK documentation for PCREATE_THREAD_NOTIFY_ROUTINE states “The driver’s thread-notify routine runs at IRQL = PASSIVE_LEVEL or APC_LEVEL.”.

The NTDDK documentation for PLOAD_IMAGE_NOTIFY_ROUTINE states “The operating system calls the driver’s load-image notify routine at PASSIVE_LEVEL inside a critical region with normal kernel APCs always disabled and sometimes with both kernel and special APCs disabled.”.

I read the OSR article “Understanding Critical Regions” and am very clear in my understanding that a critical region is not a critical section. There is explicit mention that acquiring any Spinlocks, Kernel Mutexes, Fast Mutexes or Guarded Mutexes will implicitly result in entering a critical region, while other locking/synchronization constructs do not implicitly enter a critical region.

Two of the callbacks are documented as being called by the kernel while in a critical region, so it’s clear that the thread executing the callback cannot be interrupted in those cases. However, no statements are made about any kind of serialization of the invocation of those callbacks by the kernel, especially as it relates to notifications of a given type for the same process or for different processes.

Also of note, thread create/delete notifications don’t even happen within a critical region, and the driver code I’m working has a function for that callback that makes other calls that ultimately lead to obtaining & releasing a lock while manipulating a linked list. I need to determine if the lock mechanism being used falls under the category of locks that implicitly enter a critical region or of its using one of the other types that do not have that behavior. If it is using the latter type, then that represents another place where a fault could occur in the driver code leading to a deadlock or some other eventually fatal problem if the thread were to be interrupted while it held the lock.

Question: What documentation, if any, defines the concurrency rules for when callback functions are called to deliver the notification of image load events and thread create/delete events?

It’s obvious that a process create/delete event should only ever get delivered a single time for a given process and cannot, by definition, both be delivered concurrently for the same process by different threads.

What is of concern is that this driver has been working just fine, as-is, until recently and only has the bugcheck occurring in a specific runtime environment. I suspect that there is some negative interaction between two or more security enhancement products being present and functional at the same time such that a newly created process is being subjected to DLL injection or some other code injection that results in additional threads being created and DLLs loaded in a way that is causing the simultaneous delivery of image load events. The lack of locking in the callback functions then leads to the double-remove and subsequent bugcheck. Also of note, it has not been possible, yet, to reproduce the crash in a controlled development lab environment, but that is likely due to not having sufficiently duplicated the runtime environment in a virtual machine. In the environment where the failure happens, though, it is easily repeatable and the complete memory dump that is produced always shows the same underlying double-remove as the cause for the bugcheck.

Dejan_Maksimovic · February 5, 2023, 7:47pm

Hello,

Not sure I understood, but is there any sync object acquired during the
manipulation of the shared data?

Regards, Dejan.

ThatsBerkan · February 5, 2023, 8:07pm

The system might have been compromised, in my opinion. As Dejan_Maksimovic suggested, you can just use a synchronization object as a workaround for now, before modifying the linked list.

chuckchopp · February 5, 2023, 11:34pm

The driver has a callback function registered via PsSetLoadImageNotifyRoutine(), and the callback function gets called by the kernel for each image load event that happens for a process. That has been working fine for 15+ years.

Yes, there is synchronization involved with access to the list as it’s a class derived from KInterlockedList<> [part of the DriverWorks framework] in terms of “add”, “find” and “remove” operations. However, the code in the callback, as originally written, performs a “find” followed by a “remove” and there is no overarching lock being held as the find is performed and then a subsequent remove being performed if the find was successful. That’s where things are blowing up as there are 2 threads entering the callback at the same time for the same process, and so they momentarily synchronize on the “find” but then things become non-deterministic as they execute a few more instructions before attempting to perform the “remove” operation. This results in one of the first “remove” operation to be complete being successful followed by the second/redundant attempt to remove the item from the list causing a bugcheck.

I intend to make a change to introduce additional synchronization within the callback function to ensure that this is no longer a problem.

My question, though, was intended to ask whether this simultaneous invocation of the load image callback for the same process was something that is supposed to be possible based on how the kernel is supposed to be invoking the callback. Somehow, that process is attempting to load 2 different DLLs at the same time and the notifications of those events are being delivered. Given that there is supposed to be other locking going on in the process itself while DLLs are being loaded and mapped into memory I’m at a loss to explain the why/how. All I can do is respond to what is actually happening and deal with it.

For example, when the driver goes active, it does the following:

status = PsSetLoadImageNotifyRoutine(&LoadImageNotify);

with LoadImageNotify(…) declared as:

static VOID STDMETHODCALLTYPE LoadImageNotify(IN PUNICODE_STRING FullImageName, IN HANDLE ProcessId, IN PIMAGE_INFO ImageInfo)

Tracing messages show LoadImageNotify() being entered by thread A with ProcessId == X, but before thread A returns from the function, thread B also enters LoadImageNotify() with ProcessId == X, albeit for a different FullImageName string value.

Tracing messages showing multiple threads calling into LoadImageNotify() with different ProcessId parameter values is quite normal. However, there is a particular runtime environment over which I have no control where I see LoadImageNotify() invoked simultaneously by 2 different threads for the same ProcessId value as described above. That runtime environment affects several hundred computers where the driver is deployed and the bugcheck happens consistently in a repeatable manner on any/all of those computers.

So, yes, adding appropriate synchronization will resolve the problem. What I’m trying to figure out is how/why this started happening to begin with. I agree that the code in the callback function is not behaving correctly and should fail exactly as it is failing in this scenario. Not knowing what caused the change in behavior on the part of the kernel is what I’m trying to determine. If I can get a proper understanding of the rules that the kernel follows as “contractual behavior” regarding serialized vs. simultaneous invocation of these event notification types of callback functions then I can ensure that the various callback methods implemented in this driver are using appropriate synchronization mechanisms.

Dejan_Maksimovic · February 6, 2023, 8:44am

I don’t recall that class’s implementation (I used it, but 20 years ago).
But there is NO interlocked find, there is only synchronized find.

You have to have synchronization, as find is not atomic itself, and find +
remove surely isn’t.

At first I thought you had two callbacks for the same DLL, which I don’t
think is possible at the same time.
But for two DLLs - yeah, it’s possible and how you never saw it is beyond
me It is so common.

Dejan.