Synchronizing with a DPC or work item

Michael_Rolle · February 14, 2019, 10:51pm

My driver has a resource that I want to protect across a few function calls, which might not all be in the same thread.
I plan to use a synchronization Event.
A driver dispatch routine will wait on the Event. It usually queues a DPC, which in some cases will request an IoWorkItem. Whichever of these three was the last to run will set the Event, which then allows another dispatch routine to use the resource.
Are there any gotchas associated with this? I saw something somewhere about kernel dispatcher objects being used in an arbitrary thread context. I believe this applies to a wait (since the wait involves knowing about which thread is waiting), but not to any SetEvent call (with wait = false).
I also have a “service” DPC which gets queued from time to time. This will do some work on the protected resource. All DPCs on the same resource are targeted to the same CPU, so that takes care of preventing it from running concurrently with the “dispatch” DPC. The service DPC is queued in response to a timer event, or possibly from a hardware interrupt.
There are occasions where the dispatch DPC puts the resource in a “disabled” state where I don’t want any service DPC to run any more. It would call KeRemoveQueueDpc, but if it returns FALSE, is it possible that the service DPC is still waiting on the CPU’s DPC queue?
I assume that this is possible if the timer or interrupt event occurred while the dispatch DPC was running but before the KeRemoveQueueDpc call. So how do I synchronize this?
I gather that I would need a second “service completed” dispatcher object of some sort. It would have to be in the unsignaled state if the dispatch DPC disabled the resource and the service DPC was queued and KeRemoveQueueDpc returned FALSE and the service DPC has not yet run.
Does this sound like a good strategy, or is there a simpler solution?
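
The rule that "whichever of these three was the last to run will set the Event" can be modelled with a count of outstanding stages. What follows is a minimal user-mode C11 sketch, not WDK code: `request_t`, `request_begin`, `request_hand_off` and `request_stage_exit` are hypothetical names, and the `done` flag merely stands in for the synchronization Event.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode model of one request; not WDK code. */
typedef struct {
    atomic_int  outstanding; /* stages queued or running, not yet finished */
    atomic_bool done;        /* stands in for the synchronization Event */
} request_t;

/* Called by the dispatch routine before it starts work:
   the dispatch routine itself is the first outstanding stage. */
static void request_begin(request_t *r)
{
    atomic_init(&r->outstanding, 1);
    atomic_init(&r->done, false);
}

/* Called by a running stage just BEFORE it queues its successor
   (dispatch routine -> DPC, or DPC -> work item). */
static void request_hand_off(request_t *r)
{
    int prev = atomic_fetch_add(&r->outstanding, 1);
    assert(prev > 0); /* only a stage that is still running may hand off */
    (void)prev;
}

/* Called by every stage as its last action. Returns true for the one
   stage that turned out to be last and therefore signalled completion. */
static bool request_stage_exit(request_t *r)
{
    if (atomic_fetch_sub(&r->outstanding, 1) == 1) {
        /* Kernel analogue (assumption): KeSetEvent(&event, IO_NO_INCREMENT, FALSE) */
        atomic_store(&r->done, true);
        return true;
    }
    return false;
}
```

In a driver the counting would presumably be done with InterlockedIncrement and InterlockedDecrement, and the final stage would call KeSetEvent with Wait = FALSE, which is permitted at IRQL <= DISPATCH_LEVEL. Incrementing before the successor is queued is what keeps the count from reaching zero early.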

rstruempf · February 15, 2019, 2:53am

I don’t fully grasp your plan, but when you say “wait on an event”, you cannot wait from DISPATCH_LEVEL. The thread scheduler is disabled on that CPU, so the current thread cannot be suspended to wait, and no other thread can be scheduled on that CPU. The only way to synchronize at DISPATCH_LEVEL and above is with spinlocks. When acquiring the spinlock, the kernel raises the IRQL to the highest IRQL at which the resource will be accessed, normally DISPATCH_LEVEL, and then acquires the spinlock. DISPATCH_LEVEL ensures that the thread is not preempted on that CPU, and the spinlock serializes access across CPUs.

If all of your code will be forced to one CPU, and all access will be from DISPATCH_LEVEL, though, then you don’t need to do anything. DISPATCH_LEVEL interrupts cannot interrupt other DISPATCH_LEVEL routines on the same CPU.

Michael_Rolle · February 15, 2019, 4:09am

Let me start over.
My driver collects information from internal CPU registers (for profiling user code). The user will have the profiled code pinned to one CPU.
The protocol for a File for my driver is a sequence of operations:

1. Begin a session, on CPU # N. Hardware on N is programmed to start sampling the code running on N. The driver allocates nonpaged pool to hold, among other things, a quantity of sample information.
2. The driver will start collecting data. Every time the hardware takes a sample, it sets a Sample bit and this can be set up to generate an interrupt. It requires driver intervention for each sample taken, to clear the Sample bit, read the sample information, and re-enable the next sample.
3. The driver knows there is a sample, either by checking the hardware periodically using a timer, or (if I can get this to work) by enabling an interrupt and servicing the interrupt.
4. From time to time, the user will issue a Read on the file, to transfer sample data it has collected. There are also some miscellaneous Control codes the File may issue.
5. End the session. The sampling hardware is turned off, and the nonpaged pool is returned.

The driver can have several sessions running concurrently, all on different CPUs and belonging to different Files. It ensures that a File cannot start a session on N if some other file already has a session running on the same N. This coordination is done at PASSIVE_LEVEL with a mutex for serialization. Most of the work for each session is performed by DPCs tied to that session’s N.

The difficulty I have is that the End session operation has to do some more things at PASSIVE_LEVEL after the DPC has finished its work. I fire up a work item to do this. The driver has to wait for this to be completed before it can process any new IRPs from this or any other File. That’s why I want the work item to be able to signal something that a later driver dispatch routine can wait on.

As part of the End processing I also want to handle the last sample that may be in the hardware. I can’t have a sample service DPC sitting in N’s DPC queue while the End work item frees the nonpaged pool. So it will have to wait for the service DPC to complete, if it had been queued and the call to remove it from the queue returned FALSE. Alternatively I could just always wait for the service to complete. Either way, the service DPC will need to signal something that the End work item can wait on.

Mark_Roddy · February 15, 2019, 3:40pm

Seems fine to me. You can use any waitable object for this, and I would
probably use an event as they are really simple to use. The other poster
was probably confused about “dispatch routine” vs “DISPATCH_LEVEL” because
one of those is misnamed, and has been for many decades :-).

Mark Roddy

Michael_Rolle · February 16, 2019, 8:15pm

@Mark_Roddy said:
The other poster was probably confused about “dispatch routine” vs “DISPATCH_LEVEL” because
one of those is misnamed, and has been for many decades :-).

Mark Roddy

Which one is misnamed, and why, and what should it be?

anton_bassov · February 16, 2019, 11:15pm

Which one is misnamed, and why, and what should it be?

Actually, neither of them seems to be “misnamed” - they just refer to two completely unrelated types of dispatching, which may, indeed, confuse a newbie…

The term “dispatch routine” refers to driver routines that dispatch IRPs that are sent to a driver, while “DISPATCH_LEVEL” relates to the system dispatcher, i.e., the one that dispatches threads. Raising IRQL to DISPATCH_LEVEL disables the system dispatcher, effectively ensuring that no context switches may occur on a given CPU until IRQL goes down to <=APC_LEVEL…

Anton Bassov

rstruempf · February 17, 2019, 12:42am

I don’t think that I am confused. I understand how ISRs work, and that they do as little work as possible as quickly as possible at their high IRQL, and then post a DPC completion routine to do the rest at DISPATCH_LEVEL. I know that although these DPC routines are software interrupts, that by piggy backing on hardware ISRs, that they interrupt your code the same as a hardware interrupt. I understand how the dynamics work with the thread scheduler disabled, and how spinlocks are used to synchronize any interactions with these, ensuring that the IRQL is raised to the highest level that will access the resource before acquiring the spinlock.

The part that I am apparently confused about is that there are other Deferred Procedure Calls that are not fired by the kernel as the IRQL is to drop below DISPATCH_LEVEL?

Peter_Viscarola_OSR · February 17, 2019, 1:17am

that by piggy backing on hardware ISRs, that they interrupt your code the same as a hardware interrupt.

I know I’m confused by that statement.

Lots of senseless drift in this thread.

there are other Deferred Procedure Calls that are not fired by the kernel as the IRQL is to drop below DISPATCH_LEVEL?

Huh? Yes, there are other DPCs (each pending DPC callback is represented by an object queued to a particular processor). The DPC list is drained before the processor IRQL drops below IRQL DISPATCH_LEVEL. It’s a time-honored OS design principle.

Peter

rstruempf · February 17, 2019, 4:33am

@“Peter_Viscarola_(OSR)” said:

that by piggy backing on hardware ISRs, that they interrupt your code the same as a hardware interrupt.

I know I’m confused by that statement.

A hardware interrupt occurs, interrupts running code and takes over its thread context. The ISR does only as much work as is necessary at the high IRQL and then queues a DPC to complete the handling of that interrupt. When the IRQL is about to transition below DISPATCH_LEVEL, the CPU processes the DPC at DISPATCH_LEVEL, again on whatever thread happens to be scheduled on that processor. So although DPCs are software interrupts, they behave like hardware interrupts.

there are other Deferred Procedure Calls that are not fired by the kernel as the IRQL is to drop below DISPATCH_LEVEL?

Huh? Yes, there are other DPCs (each pending DPC callback is represented by an object queued to a particular processor). The DPC list is drained before the processor IRQL drops below IRQL DISPATCH_LEVEL. It’s a time-honored OS design principle.

That was my point. The OP was talking about having his DPC routine wait until some other event is complete. My original response was that DPC routines are at DISPATCH_LEVEL and cannot “wait”, but that he should be able to use a spinlock to synchronize access to a resource shared by a DPC routine.

The second response said, no, that I was confusing DPC procedures and DISPATCH_LEVEL.

I can take this to a second thread if you like, but I’m trying to figure out why I am confused when I say that a DPC routine runs at DISPATCH_LEVEL and cannot be suspended so it cannot wait on an event or mutex, etc., that it can only use a spinlock to synchronize access by other threads on other CPUs also at DISPATCH_LEVEL.

anton_bassov · February 17, 2019, 10:37pm

[begin quote]

There are occasions where the dispatch DPC puts the resource in a “disabled” state where I don’t want any service DPC to run any more. It would call KeRemoveQueueDpc, but if it returns FALSE, is it possible that the service DPC is still waiting on the CPU’s DPC queue?

[end quote]

This is not a question of the return value…

Consider the following scenario. KeRemoveQueueDpc(), which you have called from your “dispatch” DPC routine, is about to return, so it writes a return value into RAX. No matter what it is about to return, your “service” DPC is not in a queue at this moment - even if it was in a queue at the time of the call, it has already been removed by now. However, before KeRemoveQueueDpc() has had a chance to execute the RET instruction, an interrupt occurs, and your ISR enqueues the “service” DPC again.

As you can see, the “service” DPC may happily still be in a queue by the time KeRemoveQueueDpc() returns control, and this does not depend on KeRemoveQueueDpc()'s return value in any possible way. If we were speaking about a uniprocessor system, you could protect yourself from this scenario by running the “dispatch” DPC routine with interrupts disabled. However, that is not going to help you on an MP system anyway, because the “service” DPC may get re-queued on another CPU.

So how do I synchronize this?

As you can see for yourself, there is no way to synchronize it, unless you don’t mind doing something as stupid as running your “dispatch” DPC routine while holding KDPC’s spinlock with interrupts disabled. What you have to do here is just to change your requirements and restructure your code in order to ensure that the above scenario either cannot occur, or does not pose any problems or errors even if it does occur…

Anton Bassov
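
One way to read the advice to restructure so that a late “service” DPC "does not pose any problems" is to let the DPC check a per-session state flag before it touches the buffer, and to free the buffer only once nothing is queued or running. The following is a minimal user-mode C11 sketch of that idea, not WDK code. All names (`session_t`, `service_queue`, `service_run`, `session_disable`, `session_can_free`) are hypothetical, `service_queue` models only a successful insertion of the DPC, and the sketch assumes the timer and interrupt sources are stopped before the End path checks `session_can_free`, so the in-flight count can no longer grow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode model of one profiling session; not WDK code. */
typedef struct {
    atomic_bool enabled;    /* cleared by the End path before teardown */
    atomic_int  in_flight;  /* service callbacks queued or running */
    int        *samples;    /* stands in for the nonpaged-pool buffer */
    size_t      count;
    size_t      capacity;
} session_t;

static void session_init(session_t *s, int *buffer, size_t capacity)
{
    atomic_init(&s->enabled, true);
    atomic_init(&s->in_flight, 0);
    s->samples = buffer;
    s->count = 0;
    s->capacity = capacity;
}

/* Models the ISR or timer successfully queuing the service DPC. */
static void service_queue(session_t *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

/* Models the service DPC body. Returns true if it stored a sample.
   A callback that arrives after the session was disabled does nothing
   except account for itself. */
static bool service_run(session_t *s, int sample)
{
    bool stored = false;
    if (atomic_load(&s->enabled) && s->count < s->capacity) {
        s->samples[s->count++] = sample;
        stored = true;
    }
    /* Decrement last, so the buffer stays valid while this callback runs. */
    atomic_fetch_sub(&s->in_flight, 1);
    return stored;
}

/* Models the End path marking the resource "disabled". */
static void session_disable(session_t *s)
{
    atomic_store(&s->enabled, false);
}

/* The buffer may be released only when the session is disabled
   and no service callback is queued or running. */
static bool session_can_free(session_t *s)
{
    return !atomic_load(&s->enabled) && atomic_load(&s->in_flight) == 0;
}
```

With this shape the return value of KeRemoveQueueDpc no longer matters: a callback that slips in after the disable finds the flag cleared and returns without touching the buffer. In a real driver, once the interrupt is disconnected and the timer cancelled, calling KeFlushQueuedDpcs from PASSIVE_LEVEL may be a simpler way to wait for any already-queued DPC to finish before the pool is freed.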

Michael_Rolle · February 27, 2019, 11:01pm

I have simplified my design since the original post.
Now the driver creates separate devices for each CPU, with their own device and symbolic link names. The user code decides which CPU it wants to use before opening the File, rather than opening a File on a single device and specifying the CPU in a Control IRP.
There’s still a Start IOCTL code, it just doesn’t specify the CPU.
The Start IRP does some stuff in a DPC and this may fail for some reason. Normally the DispatchControl would mark the IRP pending and the DPC would complete the IRP with its result. However, there’s some more work at PASSIVE_LEVEL that is required, namely, connecting an interrupt vector to an ISR on the designated CPU. If this fails, I want the IRP to be completed with an error status.
So I think I want to have DispatchControl wait for the DPC to finish, using an Event, then do the ISR stuff and finally complete the IRP. I don’t mind the short delay.

Peter_Viscarola_OSR · February 28, 2019, 1:47am

Mr Rolle… With all due respect, you’re kinda pushing your luck. You’re setting some sort of record for frequency of posts, and you’re asking mostly questions that aren’t amenable to simple, factual, answers here on the Community. I’ve told you that you’re not likely to get good end-to-end engineering design one question at a time. But you persist.

If this is a hobby project, fine. Everyone needs a hobby. But please be clear about that, so the professionals here — myself included — can decide if they want to try to teach you Windows architecture, computer OS and device concepts, and how to write Windows drivers, in single post increments… cuz you wanna know for the sake of amusement and curiosity.

If this is a real work project, you long ago reached the point where you need to acquire professional help for your project, in addition to doing some serious studying. Hire a consultant who knows what they’re doing. Take a seminar. But seriously, don’t ask us to design your driver one post at a time. It’s basically asking us to act irresponsibly, answering point questions without the full picture and effectively commit malpractice.

Please. Don’t make me put you on moderation. We’ve tried to be helpful, but enough is enough, you know?

anton_bassov · February 28, 2019, 7:02am

You’re setting some sort of record for frequency of posts,

Oh, come on - although the OP is undoubtedly highly “productive” in this respect, he still does not come anywhere close to Max when it comes to flooding NTDEV. Certainly, breaking Max’s records is not the easiest task one could imagine - every time I noticed that a dozen or so originally dormant threads had, all of a sudden, grown at least 20 posts longer completely out of the blue, I already knew that our “Windows fanboy” had woken up.

Certainly, his astounding “efficiency” was significantly “enhanced” by Yours Truly’s “contributions” to NTDEV as well, but this is already a different story…

Anton Bassov