The MSDN doc for KeQueryActiveProcessors contains this line:
“Callers cannot assume that KeQueryActiveProcessors maps processors to bits consecutively, or that the routine consistently uses the same mapping each time it is called. The only valid use for the return value is to determine the number of active processors by counting the number of bits that are set.”
This is confusing. Does that mean the bits in the KAFFINITY correspond to real processor numbers? Or does it mean the bits are randomly set?
If randomly set, how am I supposed to specify, say, the first two processors when calling KeSetSystemAffinityThread or other affinity-related APIs?
It means exactly what it says: The only use of the return value is to determine the NUMBER of active processors, and not to infer anything about the position of the bits in the returned value.
It also only speaks to this DDI, and thus doesn’t say anything about KeSetSystemAffinityThread. I would not generalize this information to KeSetSystemAffinityThreadEx or KeSetSystemGroupAffinityThread.
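If it helps, here is a minimal (and untested) sketch of that one valid use, counting the set bits to determine the number of active processors. On newer versions of Windows, KeQueryActiveProcessorCount and KeQueryActiveProcessorCountEx will do this counting for you.

    #include <ntddk.h>

    //
    // Count the bits set in the KAFFINITY returned by KeQueryActiveProcessors.
    // Per the documentation, the count is the ONLY information you may derive
    // from this value; the bit positions do not reliably identify particular
    // processors.
    //
    ULONG CountActiveProcessors(VOID)
    {
        KAFFINITY Active = KeQueryActiveProcessors();
        ULONG Count = 0;

        while (Active != 0) {
            Count += (ULONG)(Active & 1);
            Active >>= 1;
        }

        return Count;
    }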
Of course, the underlying issue *really* is “why do you care”? Do you really wish to distinguish between processors 0 and 1 being active versus processors 0 and 2? And that numbering would be based on… what? Before you say “NUMA!” as your answer, note that if you’re interested in NUMA issues, there is a separate set of DDIs (see KeQueryNodeActiveAffinity and friends) that needs to be taken into account for this.
In any case, I hope that answer helps.
Peter
OSR
@OSRDrivers
Googling “windows processor affinity” will return a lot of articles describing why changing the processor affinity in user mode, say thru the Task Manager, can improve performance, e.g.:
http://www.techrepublic.com/blog/windows-and-office/change-the-processor-affinity-setting-in-windows-7-to-gain-a-performance-edge/
Anyway, without digging into why we need to change the processor affinity (we can do this even thru the user-mode WinAPI, or simply by clicking some checkboxes in the Task Manager), how do we specify the parameters for the documented API KeSetSystemAffinityThread()? It requires a KAFFINITY parameter, but is there any reliable way to tell which bits to set if the result from KeQueryActiveProcessors is not reliable at all?
While it is not debatable that CPU Affinity can be useful in certain conditions, what is also undeniable is that the Tech Republic article is a sad piece of technical writing with almost no useful technical content.
Setting threads belonging to a process to have affinity with a specific subset of processors is fine.
If you want to affinitize App A and App B to different processors, set App A with a CPU Affinity Mask of 0x01 and App B with a CPU Affinity Mask of 0x02. Nothing hard about that. And no reason at all, none, zero, to call KeQueryActiveProcessors.
Similarly, if you want App X and App Y to “fight it out” on the same processor, just affinitize them using the same KAFFINITY mask. Again, KeQueryActiveProcessors is not required or even useful.
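For illustration only, here is a bare-bones user-mode sketch of exactly that; the hAppA and hAppB handles are assumed to have been opened elsewhere with PROCESS_SET_INFORMATION access, and all error handling is omitted.

    #include <windows.h>

    //
    // Pin App A to processor 0 and App B to processor 1.
    //
    void AffinitizeApps(HANDLE hAppA, HANDLE hAppB)
    {
        SetProcessAffinityMask(hAppA, 0x1);   // App A: processor 0 only
        SetProcessAffinityMask(hAppB, 0x2);   // App B: processor 1 only
    }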
I think you’re making this problem harder than it is.
Whatever you do, be sure to think carefully through your decisions regarding affinitizing threads. It’s one of those things that people who are new to understanding Windows internals often jump on as a good idea, but when you REALLY sit back and think about it, you discover that it’s almost never a good idea at all. The Kernel usually does a far better job than you will of determining where to run threads to achieve best overall system throughput, assuming the importance of all threads is reflected by their scheduling priorities. This is especially true in multi-threaded (SMT) and NUMA type architectures where Windows makes some pretty sophisticated decisions about which processor to use for scheduling. If you want to force different behavior, you CAN use affinity… but use it carefully, and be sure you’re taking into account the different potential physical processor topologies on which your threads are going to run. It’s not that easy.
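If, after all of that, you still decide a driver thread must be restricted to particular processors, at least save and restore the previous affinity around the work. A rough sketch, assuming a single processor group and the Vista-and-later Ex DDIs:

    #include <ntddk.h>

    //
    // Temporarily restrict the current thread to the processors in NewAffinity,
    // do the work, then restore the thread's previous affinity.
    //
    VOID RunOnRestrictedAffinity(KAFFINITY NewAffinity)
    {
        KAFFINITY Previous = KeSetSystemAffinityThreadEx(NewAffinity);

        //
        // ... do whatever needs the restricted affinity here ...
        //

        KeRevertToUserAffinityThreadEx(Previous);
    }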
Peter
OSR
@OSRDrivers
> This is especially true in multi-threaded (SMT) and NUMA type
> architectures where Windows makes some pretty sophisticated decisions about
> which processor to use for scheduling.
In the quest to make the scheduler more scalable, Windows doesn’t have a dispatcher lock anymore, and each processor has its own ready-thread list. But it might have made things worse on client, non-NUMA boxes. Sometimes you see that a process with one runaway thread is causing systemwide degradation and slow response, because the dispatcher doesn’t know to move ready threads onto another idle processor in the same socket.
It might make more sense to have a per-socket queue, because a spinlock (and a data structure in general) which doesn’t cross a socket boundary costs much less than cross-node locks and structures, because of better cache locality.
xxxxx@osr.com wrote:
Whatever you do, be sure to think carefully through your decisions regarding affinitizing threads. It’s one of those things that people who are new to understanding Windows internals often jump on as a good idea, but when you REALLY sit back and think about it, you discover that it’s almost never a good idea at all. The Kernel usually does a far better job than you will of determining where to run threads to achieve best overall system throughput…
This is a critically important point. Some people seem to think that
assigning a thread affinity means their thread will be given total
ownership of that processor. It’s not true. You still compete for
processors, but now you are competing with an artificial handicap. The
actual effect of thread affinity is usually to reduce performance,
because all you’ve done is eliminate circumstances where your thread
COULD have run.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
On Thu, Dec 3, 2015 at 8:05 PM wrote:
>
> Sometimes you see that a process with one runaway thread is causing
> systemwide degradation and slow response, because the dispatcher doesn’t
> know to move ready threads into another idle processor of the socket.
That’s odd; to the best of my understanding, the idle processor should
steal work from other processors’ queues. See Windows Internals 6th
Edition, Part 1, page 468. Perhaps you’re referring to the situation where
the idle processor is parked?
> actual effect of thread affinity is usually to reduce performance,
If you have compute-intensive tasks (bulk encryption, to name one), it is good to spin up a thread per core and affinitize each thread to its own core, then divide the task between the cores (one chunk per core) this way, as in the sketch below.
Otherwise, sometimes a ready thread will be starving, competing on the same core with a running thread of the same priority.
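A rough, untested sketch of that pattern in kernel mode, assuming a single processor group; ChunkWorker and the way the work is divided are placeholders, and thread cleanup and error handling are omitted.

    #include <ntddk.h>

    //
    // Hypothetical worker: each thread pins itself to one processor and then
    // processes its own chunk of the data.
    //
    VOID ChunkWorker(PVOID Context)
    {
        ULONG ProcessorIndex = (ULONG)(ULONG_PTR)Context;

        KeSetSystemAffinityThreadEx((KAFFINITY)1 << ProcessorIndex);

        // ... process the chunk assigned to this processor ...

        PsTerminateSystemThread(STATUS_SUCCESS);
    }

    //
    // Spin up one worker thread per active processor.
    //
    NTSTATUS StartPerCoreWorkers(VOID)
    {
        ULONG Count = KeQueryActiveProcessorCount(NULL);
        ULONG i;

        for (i = 0; i < Count; i++) {
            HANDLE ThreadHandle;
            NTSTATUS Status = PsCreateSystemThread(&ThreadHandle,
                                                   THREAD_ALL_ACCESS,
                                                   NULL, NULL, NULL,
                                                   ChunkWorker,
                                                   (PVOID)(ULONG_PTR)i);
            if (!NT_SUCCESS(Status)) {
                return Status;
            }
            ZwClose(ThreadHandle);
        }

        return STATUS_SUCCESS;
    }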
–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com