> Thanks!
"> ok, so the waking of a thread due to mutex released happens as a result
of
> an interrupt.
And most especially, you cannot wake a thread from a mutex at DIRQL
levels, that is, from an ISR. Note that there are special rules about
calling KeReleaseMutex from DISPATCH_LEVEL (see the docs)."
I meant when the system wakes a thread. As I understood from what I read,
when you call release, a DPC is queued, and when it is time for it to be
processed, the kernel raises the IRQL to dispatch level.
I don’t follow any of the above. When the scheduler dispatches a
schedulable thread, it sets a series of parameters that define the “thread
context”. In the case of a user app, this includes setting the virtual
memory map to point to the correct VM map for the process. In the case of
a kernel thread, the VM map for the kernel space is guaranteed to be
identical in all kernel-level threads, or user-level threads that have
entered the kernel. There is only one kernel-level VM map. In user
space, this set of addresses does not exist, and an attempt to use a
kernel address gives an access fault.
When you call release, there is no reason to expect a DPC is involved.
Generally, when there are DPCs queued, any time a core drops to below DPC
level, a DPC queued on that core’s DPC queue will be dispatched, and
indeed, the level is raised to DISPATCH_LEVEL. This is an
oversimplification of what is really happening, and not quite precise, but
for your purposes it is close enough. There is no preemptive scheduler
for DPCs; a DPC must run to completion, and do so within a limited time
budget (typically something like 50us, 0.000050 seconds). A DPC cannot be
preempted except by a hardware interrupt, and, in fact, the reason for
using DPCs is to minimize interrupt latency so fast-response devices will
not become starved or suffer data over/under-run.
"> Some functions will be called at dispatch level. So this should mean
that
> those functions will not be ran by multiple threads.
Not true. You could have multiple threads running on multiple cores, each
thread running at DISPATCH_LEVEL. And therefore, running concurrently."
"No. What it means is that those functions will be called at IRQL ==
DISPATCH_LEVEL. That is a constraint on what you can do (you cannot
block
except on a spinlock).
That does not mean the function is serialized. Another CPU, also at
DISPATCH_LEVEL (of course) can call that function simultaneously. By
definition that other CPU is executing a different ‘thread’. With 16
cores
you could have 16 threads (at dispatch level) active in your entry point.
"
It seems I misunderstood the concept of ‘thread’ here. I thought that only
when code runs under the management of a thread scheduler could it be
called a thread. I understand, then, that these kinds of threads also
have context info, similar to threads managed by the scheduler (and
similar to user space threads).
Yes, note that I said that “thread” is a slippery concept at kernel level.
A thread is, strictly speaking, defined by a register set and a stack.
The register set includes the registers that make memory maps available,
hence a “kernel thread” is a set of registers such that the kernel address
context is defined, and the stack is a stack in the kernel.
The key here is that a scheduler is responsible for dispatching both
user-level threads and the entities called “kernel threads”, all of which
nominally run at PASSIVE_LEVEL, by setting up these register contexts,
including [E/R]SP and the memory map pointers. However, when a
user-level thread calls the kernel, the kernel entry routine provides a
different context, including making the kernel address space visible, and
switching to a kernel stack for execution. But an interrupt also does
this; the interrupt handler has a private register set, and switches to a
kernel stack, and makes the kernel map available. Think of this as
happening by heavy-duty magic (I’ve written many ISRs, and the first dozen
or so instructions are Very Deep Magic, then the next couple hundred
instructions are merely Deep Magic, and finally you are in full kernel
mode with a stack and everything). Logically, it looks like a separate
thread of execution, but if you look around, you will see that some other
thread, such as a PASSIVE_LEVEL user thread, a PASSIVE_LEVEL kernel
thread, or a DPC “thread” was executing. The concurrency looks like any
normal thread concurrency, since it is “concurrent” with the executing
state it “hijacked”. This means that if there are data structures which
you must access from two or more levels, such as ISR and DPC, or ISR and
PASSIVE_LEVEL, you have to use interrupt-level synchronization in those
other threads. These synchronization primitives raise the processor level
to DIRQL (thus prohibiting interrupts on the current core) and set the
ISR lock in the KINTERRUPT object (thus prohibiting interrupt dispatch
from other cores). This allows you to do things like set the device
register values for multiple device registers without an interrupt coming
in the middle of your setup.
Your concept of “threads” is very simplistic, since you think of it in
terms of the scheduler dispatching “thread objects”, but the truth is that
a “thread” is essentially defined as a register context and a stack, and
nothing more. Thus, any mechanism that provides a register context can be
thought of as a “thread scheduler”. The interrupt hardware and the first
few hundred instructions thus fulfill the requirements of being a
“scheduler” because what they do is put the core into a particular
register set/stack state which is different than the register set/stack
state that had been executing. The DPC mechanism thus also forms a
“scheduler” because each time a DPC is executed, the core is given a
register set/stack state that is not the same as it previously had. So
you need a broader definition of what constitutes “thread” than the more
restrictive view you have.
My question should then be reformulated:
Some functions will be called at dispatch level. So this should mean that
if I have 4 CPUs, those functions will be run by max 4 threads. I suppose
it’s possible to have more threads than CPUs in kernel as we have in user
space; and this mapping of multiple threads to fewer CPUs can only be done
by a thread manager / scheduler. So there will be parts in kernel that
will be run by dynamically created threads (threads created by a thread
manager), at IRQL < dispatch.
It is not only possible to have more threads than cores, it is typical.
There may be hundreds of threads, and a huge number of them might be
marked as “runnable”, but you can only run as many contexts as cores;
that’s a maximum, not a requirement: you could be running fewer threads than
cores, and some cores will be doing their “idle loop”, which is not
necessarily a “loop”. In modern CPUs, it may be a HALT instruction or
something similar that drops that core’s power consumption to a much lower
value, particularly valuable for multicore systems on laptops, tablets,
and cell phones.
But this does not mean that all “threads” are running at PASSIVE_LEVEL.
Taking our looser definition (a register set/stack state), an 8-core system
might have one core running a user-level thread in an app, another core
running a kernel-only thread (created by PsCreateSystemThread in a driver, for
example) at PASSIVE_LEVEL, and during the execution of that thread it
might have transitions to high-priority PASSIVE_LEVEL (a kernel thread
that acquires a mutex by KeWaitFor… is boosted to level 31 priority),
transitions to DISPATCH_LEVEL (which would happen when it acquires a spin
lock) or even to DIRQL levels (which would happen when the thread had to
synchronize with state also used by an ISR). Similarly, a core might
be running a DPC at DISPATCH_LEVEL, which may have transitions up to DIRQL
levels to synchronize access to state shared with an ISR. And so on.
Some cores may be running ISRs, so they will be DIRQL-level threads, as
defined by our broader concept of “thread”.
“Note that “threads” are a slippery concept at the kernel level, since an
ISR “hijacks” some innocent bystander thread, and in effect runs
“concurrent” with that thread. Similarly, DPC routines “hijack” some
thread. So you have to think of the register context defining a “thread”
rather than a dispatcher artifact like thread ID.”
It may be that I’m still confused about it, being accustomed to the
user-space view of threads (it has a HANDLE and an ID, and the CPU
preempts it to resume a different thread). In the kernel, is the term
“thread” synonymous with “CPU”?
A “thread”, as represented by a KTHREAD object, is a fully-schedulable
entity created by the thread subsystem and managed by the scheduler. A
“thread handle” is a user-level synonym for a KTHREAD object, found by
doing a lookup in the process’s handle table. But “managing” these
threads means that the scheduler, in order to dispatch them, sets the
register/stack state to the correct one for that thread; when the thread
is preempted or descheduled for any reason, that register/stack state is
saved as part of the thread description, and restored whenever the
scheduler wishes to dispatch the thread.
But DPCs and ISRs are “scheduled” by other means; all that the
“scheduling” requires is that the correct register/stack state be set.
The first few hundred instructions of the kernel’s interrupt handler (the
code that eventually calls your ISR) create this state, so the ISR
“thread” is “scheduled” by the hardware. Similarly, DPCs are “scheduled”
by the DPC mechanism, which from your viewpoint is pure magic, but when
you are in a DPC, its state is defined by a register set and stack. A DPC
can be preempted by an ISR, and an ISR can be preempted by an interrupt of
higher priority. So the component called “the scheduler” in the kernel
architecture discussions in various books is only one of many “schedulers”
that cause “threads” to execute.
By the way, I found that in NDIS there is also NDIS_RW_LOCK_EX, which can
be used at dispatch level. If one had to choose between it and a spin
lock, perhaps the only difference between them is read-write lock versus
non-condition-check lock.
An ERESOURCE (Executive Resource) is an example of a
multiple-reader-single-writer lock. I don’t know if NDIS uses an ERESOURCE
or has its own private mrsw implementation. There are, in addition,
spin-locks that support mrsw behavior, and they can be executed at
DISPATCH_LEVEL. I don’t know what you mean by “non-condition-check” lock.
joe
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev