Is the Inverted Call Model right for my needs?

Hi all,

I have a question about event-driven kernel-to-user communication. Since I'm a newbie in driver development, I came up with a project that would help me learn how it all works:


I'm building a KMDF driver and a userland service. The driver needs to send the PID of each created process to the userland service, which will then do some processing on it. So each time a process is created, the service needs to be notified and receive the PID immediately, without missing any events. I searched the internet for a while and came across the inverted call model; many say this is the way to go for kernel-to-userland communication.


I have implemented this, however, I do have some issues with it that I hope you all could clarify:

  1. My KMDF driver uses PsSetCreateProcessNotifyRoutineEx and has an I/O queue for the IOCTL requests coming from userland.
  2. Each time the PsSetCreateProcessNotifyRoutineEx callback is called, my driver completes a request from the queue and sends the PID in the output buffer.
  3. My userland service has a separate thread that performs a synchronous DeviceIoControl in an infinite loop.

Now, this all works beautifully, but only if the userland application is fast enough to send a new IOCTL after processing the previous one, so I'm in real risk here to miss out on process creation events if the processing would take longer than normal.


So now for my plethora of questions:

  1. Is this a correct implementation of the inverted call model, and what exactly is inverted about it? As far as I can see, the userland application needs to send an IOCTL first (which in my eyes is not event-driven kernel communication).
  2. If yes to the previous question, should I implement logic myself to keep track of all process creation events in some sort of FIFO list? Whenever the client application sends an IOCTL, I would complete it with an entry from the list and then remove that entry, until the list is empty. Is there a specific way I need to go about this, or do I just create an array and make sure I manage the memory correctly?

Best regards!

Now, this all works beautifully, but only if the userland application is fast enough to send a new IOCTL after processing the previous one,
so I’m in real risk here to miss out on process creation events if the processing would take longer than normal.

At the risk of provoking Mr. Johnson’s wrath, I would suggest doing things in a UNIX-like fashion. You will need two separate IOCTLs here if you decide to go this way.

IOCTL X will act as a logical equivalent of a signal delivery to the target thread, so it will be completed asynchronously by your driver (this is going to happen in a callback). IOCTL Y will act as a poll()/select() equivalent: it will be submitted by your app only upon IOCTL X’s completion, and completed synchronously. In order to make sure that no events are lost, your driver will maintain a counter that gets incremented by the callback and decremented by IOCTL Y’s handler, both with interlocked operations.

For example, consider the following approach. If InterlockedIncrement() returns 1 your callback completes a pending IOCTL X and does nothing if the return value is above 1. Your IOCTL Y handler will process all the outstanding requests in a loop, decrementing the above mentioned counter upon every iteration before actually processing the request. If InterlockedDecrement() returns 0 IOCTL Y handler will break out of the loop and complete the request.
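To make the counter handshake concrete, here is a minimal single-threaded sketch in portable C11, not driver code. It assumes the events live in a small fixed-size ring, which the description above does not specify. `Notifier`, `notify_event`, `drain_events` and `RING_SIZE` are hypothetical names, and `atomic_fetch_add`/`atomic_fetch_sub` stand in for InterlockedIncrement()/InterlockedDecrement().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 32  /* hypothetical capacity; overflow is not handled in this sketch */

typedef struct {
    atomic_long outstanding;     /* events recorded but not yet drained */
    uint32_t    ring[RING_SIZE]; /* assumed storage for the PIDs */
    unsigned    write_idx;
    unsigned    read_idx;
    int         signals_sent;    /* how often "IOCTL X" would have been completed */
} Notifier;

static void notifier_init(Notifier *n) {
    atomic_init(&n->outstanding, 0);
    n->write_idx = 0;
    n->read_idx = 0;
    n->signals_sent = 0;
}

/* Callback side: record the event, then bump the counter. */
static void notify_event(Notifier *n, uint32_t pid) {
    assert(atomic_load(&n->outstanding) < RING_SIZE);
    n->ring[n->write_idx++ % RING_SIZE] = pid;
    /* fetch_add returns the old value; +1 mimics InterlockedIncrement's result. */
    long now = atomic_fetch_add(&n->outstanding, 1) + 1;
    if (now == 1) {
        n->signals_sent++;       /* a real driver would complete IOCTL X here */
    }
}

/* "IOCTL Y" side: drain events until the counter drops to zero.
   Returns the number of PIDs copied. If it returns max_out, events may
   remain and the caller has to call again. */
static int drain_events(Notifier *n, uint32_t *out, int max_out) {
    int copied = 0;
    if (max_out <= 0 || atomic_load(&n->outstanding) == 0) {
        return 0;
    }
    while (copied < max_out) {
        /* Decrement first, then process, as in the description above. */
        long left = atomic_fetch_sub(&n->outstanding, 1) - 1;
        out[copied++] = n->ring[n->read_idx++ % RING_SIZE];
        if (left == 0) {
            break;
        }
    }
    return copied;
}
```

In a real driver the callback and the IOCTL Y handler can run concurrently, so the ring itself would need its own synchronization; the sketch only shows how the counter decides when to signal and when to stop draining.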

It does not sound THAT complex, does it…

Anton Bassov

From what I understand, the inverted call idea is that your user mode application submits multiple IOCTLs asynchronously (overlapped I/O) and then waits on them. Each request gets pended and queued up by the driver, and when a request gets completed, the user mode thread gets signaled.

If you haven’t already, see this article on it: https://www.osr.com/nt-insider/2013-issue1/inverted-call-model-kmdf/

For what you’re wanting to accomplish in this example though I would probably just create a queue or list in the driver to keep the process information and then complete the incoming IRPs by pulling things out of the list/queue.

For what you’re wanting to accomplish in this example though I would probably just create a queue or list
in the driver to keep the process information and then complete the incoming IRPs by pulling things out of the list/queue.

The OP is concerned about the scenario when the thread that actually processes the requests gets bogged for some reason
(for example, because of the priority inversion scenario in some KM driver so that it has no chance of doing anything about it).
In this case the callback may, indeed, complete all outstanding IRPs before the target app has a chance to submit the new ones.
As a result, the app will start missing the events.

Therefore, the OP asks us how this “unfortunate situation” may be avoided…

Anton Bassov

If the driver is maintaining a list/queue of the events then there’s no problem unless you run out of memory. Yes the IRP’s submitted by the user mode thread may be completed with some delay, but realistically how many new processes are going to be launched per second?
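A rough model of that idea, in portable C rather than KMDF code: events go into a FIFO when no request is waiting, and an incoming request is either completed straight from the FIFO or pended. Everything here is an assumption for illustration. `EventChannel`, `on_process_created`, `on_ioctl` and `EVENT_CAPACITY` are hypothetical names, only a single pended request is modeled, and a full FIFO stands in for "running out of memory".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_CAPACITY 64  /* hypothetical bound standing in for available memory */

typedef struct {
    uint32_t pids[EVENT_CAPACITY]; /* FIFO of PIDs not yet delivered */
    size_t   head;                 /* index of the oldest stored PID */
    size_t   count;                /* number of stored PIDs */
    bool     request_pending;      /* models one pended IOCTL waiting for data */
    uint32_t delivered_pid;        /* PID returned by the last completed request */
    size_t   dropped;              /* events lost because the FIFO was full */
} EventChannel;

static void channel_init(EventChannel *ch) {
    ch->head = 0;
    ch->count = 0;
    ch->request_pending = false;
    ch->delivered_pid = 0;
    ch->dropped = 0;
}

/* Process-creation side. Returns false only if the event had to be dropped. */
static bool on_process_created(EventChannel *ch, uint32_t pid) {
    if (ch->request_pending) {
        /* A request is waiting, so the FIFO must be empty: complete it now. */
        assert(ch->count == 0);
        ch->delivered_pid = pid;
        ch->request_pending = false;
        return true;
    }
    if (ch->count == EVENT_CAPACITY) {
        ch->dropped++;
        return false;
    }
    ch->pids[(ch->head + ch->count) % EVENT_CAPACITY] = pid;
    ch->count++;
    return true;
}

/* IOCTL side. Returns true if the request completes immediately with
   *out_pid set, or false if it has to be pended until the next event. */
static bool on_ioctl(EventChannel *ch, uint32_t *out_pid) {
    if (ch->count > 0) {
        *out_pid = ch->pids[ch->head];
        ch->head = (ch->head + 1) % EVENT_CAPACITY;
        ch->count--;
        ch->delivered_pid = *out_pid;
        return true;
    }
    ch->request_pending = true;
    return false;
}
```

In an actual driver both paths would have to run under a lock, since the notify callback and the IOCTL handler can execute at the same time.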

@anton_bassov said:

For what you’re wanting to accomplish in this example though I would probably just create a queue or list
in the driver to keep the process information and then complete the incoming IRPs by pulling things out of the list/queue.

The OP is concerned about the scenario when the thread that actually processes the requests gets bogged for some reason
(for example, because of the priority inversion scenario in some KM driver so that it has no chance of doing anything about it).
In this case the callback may, indeed, complete all outstanding IRPs before the target app has a chance to submit the new ones.
As a result, the app will start missing the events.

Therefore, the OP asks us how this “unfortunate situation” may be avoided…

Anton Bassov

Thank you, Anton, for putting it so eloquently; that was exactly my concern :). My initial idea was indeed to create a queue or list, but I have not really come to grips yet with the queue system and how I can use it for custom data. The only queues I know of are WDFQUEUEs, but I guess these are for I/O operations only and not really usable for custom data?

I’ll read this first; that might already answer my question.

Thanks for the help everybody!

Given, you need to create a queue of some kind and size it… somehow.

I’d suggest you simply make that queue the Queue of incoming Requests. Have the app send multiple Requests simultaneously, as previously suggested. The driver completes them as it has data to return. If the app gets “too busy” and the driver runs out of Requests, just increase the number of Requests the app sends in advance. The Requests will live on a WDFQUEUE and will be handled automatically by the Framework.
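To see how the number of pre-posted Requests trades off against bursts of events, here is a toy model in portable C, not driver code. It assumes, purely for illustration, that events arrive in bursts and that the app manages to re-post all of its Requests between bursts. `missed_events` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Counts events that find no pended Request to complete.
   depth    : number of Requests the app keeps posted in advance
   bursts   : number of events arriving in each burst
   n_bursts : number of entries in bursts */
static size_t missed_events(size_t depth, const size_t *bursts, size_t n_bursts) {
    size_t missed = 0;
    assert(bursts != NULL || n_bursts == 0);
    for (size_t i = 0; i < n_bursts; i++) {
        size_t pending = depth;      /* app has re-posted all of its Requests */
        for (size_t e = 0; e < bursts[i]; e++) {
            if (pending > 0) {
                pending--;           /* driver completes one pended Request */
            } else {
                missed++;            /* nothing left to complete: event is lost */
            }
        }
    }
    return missed;
}
```

With bursts of 3, 10 and 1 events, a depth of 1 misses 11 events, a depth of 4 misses 6, and a depth of 10 misses none, which is the "just increase the number of Requests" knob in miniature.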

Think about it. I think you’ll see this is the easiest way forward, with the least code to write and debug, and will meet your needs quite nicely. Why invent a SECOND queue?

Peter

All,

What I’ve done is manually create a linked list to hold all events; it works like a charm now :).
Thanks all for the help!

Soooo… you created a queue of events that you use to serve data to your queue of Requests. Thus, instead of one queue (Requests), you have two queues: one of events and one of Requests.

Surely you can see that one queue of size N followed by a second of size M (serving the same activity) in this case is entirely equivalent to one single queue of size N+M, right?

Or are there some unspecified constraints that make two queues a better idea?

Peter