Putting a barrier for a DPC before continuing the rest of the code

Hi all,

In a Windows driver, I want to execute a function on a specific core.

I have a couple of options. One is KeGenericCallDpc, which broadcasts a DPC to all cores; each core's routine would then check whether it's running on the right core before doing the work (but that's stupid).
The second option is using KeSetSystemAffinityThread to run on a specific core, but the problem is that this function doesn't support more than 64 cores.

All in all, I think the best option is targeting a DPC at the specific core. For this purpose, I use KeSetTargetProcessorDpc to select my core (in this example, core 10) and finally insert the DPC into the queue.

        // Allocate and initialize a DPC object from nonpaged pool.
        PRKDPC Dpc = (PRKDPC)ExAllocatePoolWithTag(NonPagedPool, sizeof(KDPC), POOLTAG);
        if (Dpc == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        KeInitializeDpc(Dpc,           // Dpc
                        MyDpcFunction, // DeferredRoutine
                        NULL);         // DeferredContext

        KeSetTargetProcessorDpc(Dpc, 10);  // target core 10
        KeInsertQueueDpc(Dpc, NULL, NULL);
        ShowTheResults();

The problem with the above code is that ShowTheResults() might execute before MyDpcFunction does, so the result isn't ready yet because the DPC hasn't run. How can I put a barrier in place to make sure that MyDpcFunction has executed on the other core (core 10) before ShowTheResults() runs?

Also, KeSignalCallDpcSynchronize doesn't seem to be a solution, since it only applies inside a DPC broadcast by KeGenericCallDpc.

So, my questions:

1 - Is using a DPC the best choice here? Is there a better way?
2 - How can I put a barrier in place so that my DPC executes on the other core before the rest of the code (ShowTheResults()) continues? (A sketch of the only pattern I can think of follows right below.)
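
The sketch: have the DPC signal an event through its DeferredContext and wait on that event at PASSIVE_LEVEL, reusing the Dpc allocated above. I'm not sure this is safe or the best approach:

        // MyDpcFunction signals an event passed in via DeferredContext.
        VOID MyDpcFunction(PKDPC Dpc, PVOID DeferredContext,
                           PVOID SystemArgument1, PVOID SystemArgument2)
        {
            UNREFERENCED_PARAMETER(Dpc);
            UNREFERENCED_PARAMETER(SystemArgument1);
            UNREFERENCED_PARAMETER(SystemArgument2);

            // ... do the per-core work here ...

            KeSetEvent((PKEVENT)DeferredContext, IO_NO_INCREMENT, FALSE);
        }

        // Caller, at PASSIVE_LEVEL:
        KEVENT DpcDone;
        KeInitializeEvent(&DpcDone, NotificationEvent, FALSE);

        KeInitializeDpc(Dpc, MyDpcFunction, &DpcDone);
        KeSetTargetProcessorDpc(Dpc, 10);
        KeInsertQueueDpc(Dpc, NULL, NULL);

        // Block until MyDpcFunction has run on core 10.
        KeWaitForSingleObject(&DpcDone, Executive, KernelMode, FALSE, NULL);
        ShowTheResults();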

That’s sort of an unusual question.

Why use a DPC? Why not just use a worker thread, and call KeSetSystemAffinityThread at the beginning of that thread?

In terms of waiting for the work to be complete… What IRQL do you need to wait at? Either wait, or have the thread queue your follow-up processing when it’s done.
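
For instance, something along these lines (just a sketch; PinnedWorkItem and the event plumbing are invented names, and it assumes the caller can wait at PASSIVE_LEVEL):

        // A system thread that pins itself to processor 10, does the work,
        // and signals an event so the caller knows the result is ready.
        VOID PinnedWorkItem(PVOID Context)
        {
            PKEVENT doneEvent = (PKEVENT)Context;

            KeSetSystemAffinityThread(AFFINITY_MASK(10)); // run on processor 10

            // ... do the work here ...

            KeSetEvent(doneEvent, IO_NO_INCREMENT, FALSE);
            PsTerminateSystemThread(STATUS_SUCCESS);
        }

        // Caller, at PASSIVE_LEVEL:
        KEVENT workDone;
        HANDLE threadHandle;

        KeInitializeEvent(&workDone, NotificationEvent, FALSE);
        PsCreateSystemThread(&threadHandle, THREAD_ALL_ACCESS, NULL, NULL, NULL,
                             PinnedWorkItem, &workDone);
        ZwClose(threadHandle); // we wait on the event, not the thread

        KeWaitForSingleObject(&workDone, Executive, KernelMode, FALSE, NULL);
        ShowTheResults();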

Peter

You can use KeSetSystemGroupAffinityThread to get over the 64 processor limit.
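
For example (just a sketch; pick whatever group and processor number you need):

        GROUP_AFFINITY affinity = { 0 };
        GROUP_AFFINITY previousAffinity;

        affinity.Group = 1;                // processor group 1
        affinity.Mask  = AFFINITY_MASK(2); // processor 2 within that group

        KeSetSystemGroupAffinityThread(&affinity, &previousAffinity);

        // ... the current thread now runs on the chosen processor ...

        KeRevertToUserGroupAffinityThread(&previousAffinity);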

@“Peter_Viscarola_(OSR)” said:
That’s sort of an unusual question.

Why use a DPC? Why not just use a worker thread, and call KeSetSystemAffinityThread at the beginning of that thread?

In terms of waiting for the work to be complete… What IRQL do you need to wait at? Either wait, or have the thread queue your follow-up processing when it’s done.

Peter

The reason is that KeSetSystemAffinityThread is limited to 64 cores.

@“Scott_Noone_(OSR)” said:
You can use KeSetSystemGroupAffinityThread to get over the 64 processor limit.

Thanks, I didn’t know about this function.

KeSetSystemAffinityThread is limited to 64 cores.

and

Thanks, I didn’t know about this function.

… it’s remarkable what one can learn from carefully reading the documentation, isn’t it?

For example, from the docs on KeSetSystemAffinityThreadEx:

Drivers that are designed to handle information about processor groups should use the KeSetSystemGroupAffinityThread routine, which specifies a processor group, instead of KeSetSystemAffinityThreadEx, which does not.

Peter

Is there a reason you want to execute on a specific core? This in and of itself is an unusual desire.

I mean, if there is another core available to do the work and the one you want to use is busy, is there a reason you want to wait?

Is there a reason you want to execute on a specific core? This in and of itself is an unusual desire.

I guess you may want to check the following thread

https://community.osr.com/discussion/290734/cpu-pinning-in-windows

This thread is relatively long, but you have to read it all the way to the end if you want to get the most out of it, because the most “exciting” part comes almost at the very end. Enjoy!!!

Anton Bassov

Anton, you’re not helping. You know that, right?

Is there a reason you want to execute on a specific core? This in and of itself is an unusual desire.

Not really that unusual. Consider that you want to complete a request on the same processor on which the request was initiated, to ensure you’re “near” the data.
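
For example, capture the arrival processor and target the completion DPC there (just a sketch, using the group-aware Ex variants and assuming a DPC initialized as in the original post):

        PROCESSOR_NUMBER procNumber;

        // When the request arrives: remember which processor we're on.
        KeGetCurrentProcessorNumberEx(&procNumber);

        // When completing: target the DPC at that same processor.
        KeSetTargetProcessorDpcEx(Dpc, &procNumber);
        KeInsertQueueDpc(Dpc, NULL, NULL);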

I do agree that trying to manually manage core allocations is usually a bad idea… it’s the kind of thing that devs tend to think of as being clever, but when you really sit down to work it out, just “letting the OS do its thing” is almost always the best approach.

Peter

Anton, you’re not helping.

Of course I am, especially in the context of Marion’s statement: I referred him to the thread where I list the valid and justified reasons for CPU reservations and for assigning certain tasks to particular CPUs.

[begin quote]

…reserving the CPUs has nothing to do with RT, as you seem to believe. Although it may have a wide range of applications covering anything from load balancing and improving CPU utilisation to making use of NUMA capabilities of the target machine, handling RT tasks is not among them…

[end quote]

The rest of the thread provides a thorough analysis of the reasons why CPU reservations are simply insufficient for running RT tasks on a GPOS (although some people seem to believe otherwise; it seems to be a fairly common misconception).

Consider that you want to complete a request on the same processor on which the request was initiated, to ensure you’re “near” the data.

This is, indeed, a perfectly valid and reasonable use of CPU reservations that I mentioned in the above-mentioned thread. Another example that comes to mind straight away is the situation where you have multiple CPU-intensive tasks that operate on their own datasets and, hence, don’t need to synchronise access to shared variables or compete for resources other than CPU time. In situations like this, it makes perfect sense to assign each thread to a separate CPU so that they can run in parallel, improving CPU utilisation.
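
To illustrate the idea (just a sketch with invented names): create one system thread per active processor, and have each thread pin itself to its own CPU with the group-aware affinity routine.

        // Each worker pins itself to the processor index passed as its context.
        VOID PinnedWorker(PVOID Context)
        {
            ULONG cpuIndex = (ULONG)(ULONG_PTR)Context;
            PROCESSOR_NUMBER procNumber;
            GROUP_AFFINITY affinity = { 0 };
            GROUP_AFFINITY previousAffinity;

            KeGetProcessorNumberFromIndex(cpuIndex, &procNumber);
            affinity.Group = procNumber.Group;
            affinity.Mask  = AFFINITY_MASK(procNumber.Number);
            KeSetSystemGroupAffinityThread(&affinity, &previousAffinity);

            // ... crunch this thread's private dataset: no shared state, no locks ...

            KeRevertToUserGroupAffinityThread(&previousAffinity);
            PsTerminateSystemThread(STATUS_SUCCESS);
        }

        // Create one worker per active processor.
        ULONG count = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
        for (ULONG i = 0; i < count; i++) {
            HANDLE threadHandle;
            if (NT_SUCCESS(PsCreateSystemThread(&threadHandle, THREAD_ALL_ACCESS,
                                                NULL, NULL, NULL, PinnedWorker,
                                                (PVOID)(ULONG_PTR)i))) {
                ZwClose(threadHandle);
            }
        }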

Anton Bassov

Peter

In your example, surely the HT processor sharing the same core would be just as ‘near’ to the data. And how ‘far’ from it would the other processors / cores on the same die be? Far enough to justify simply not running until the exact one you want becomes available?

As you say, trying to optimize this leads to madness. So why say that it is not that unusual? If it is madness to do it, I hope it is at least unusual madness. These days, madness is the new normal, but I hope most participants here know better than to ingest bleach.

In your example, surely the HT processor sharing the same core would be just as ‘near’ to the data. And how ‘far’ from it would the other processors / cores on the same die be? Far enough to justify simply not running until the exact one you want becomes available?

There is a wonderful OS-agnostic article series available on the topic, Ulrich Drepper’s “What Every Programmer Should Know About Memory”. Although it is more than 10 years old, one may still find it informative and useful:
https://lwn.net/Articles/250967/

Concerning your particular question, you may want to jump straight to the following installment of it:
https://lwn.net/Articles/254445/

I hope most participants here know better than to ingest bleach

Well, I don’t know about you, but I think it is really great to have a few pints of Ajax before going for all 5G masts that you can find in your area. OTOH, some people may prefer Comet instead…

Anton Bassov