Synchronization in Kernel

Hello,

I’ve got a few things that are unclear:

  1. If threads and processes are mechanisms created by the kernel (which I know to be true), what creates the need for synchronization in the kernel? Is each CPU core assigned to a part of the kernel, or how does it happen?

  2. When should I expect concurrency in the kernel: in any callback called by the kernel? (The exception must surely be where the documentation says that callback X is called only after Y has finished.)

  3. What’s the difference between a fast/guarded mutex and a spinlock?

Thanks!

You need to find a basic book on multithreading and parallel processing.

In the meantime, here it is:

Because there are objects that must not be modified by more than one thread at a time, you need a mechanism to protect those objects. Read in your Computer Science 101 book about “mutual exclusion” and “semaphore” primitives.

  1. You should ALWAYS expect concurrency in kernel, except when you have an explicit serialization guarantee.

  2. Read the WDK documentation.

I guess you should simply stop writing your driver and, instead, read some books on OS design.
Your questions reveal that your level of understanding of kernel-level design principles is, for the time being, at zero. Therefore, your chance of producing a workable driver with this level of understanding of kernel principles is ABSOLUTELY ZERO…

What’s the difference between a fast/guarded mutex and a spinlock?

If I tell you that a mutex is a dispatcher-level synchronization construct that may be used only in a threaded context, while a spinlock is an inter-CPU synchronization primitive that may be used in an atomic context, are you going to understand it??? If I tell you that the former can be implemented only on top of the latter on an MP system,
while on a UP one simply disabling preemption will suffice, are you going to understand why it works this way??? Therefore, go and get a good book on OS design…
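
To make the contrast concrete, here is a minimal sketch in WDM C (illustrative only, with hypothetical names; initialization shown inline and error handling omitted):

KMUTEX g_Mutex;         // dispatcher object; waiters are descheduled
KSPIN_LOCK g_SpinLock;  // inter-CPU primitive; waiters spin at DISPATCH_LEVEL

VOID InitLocks(VOID)
{
    KeInitializeMutex(&g_Mutex, 0);
    KeInitializeSpinLock(&g_SpinLock);
}

VOID ThreadContextOnly(VOID)    // legal only at IRQL < DISPATCH_LEVEL
{
    // Blocks (deschedules) the current thread until the mutex is free.
    KeWaitForSingleObject(&g_Mutex, Executive, KernelMode, FALSE, NULL);
    /* ... touch shared state ... */
    KeReleaseMutex(&g_Mutex, FALSE);
}

VOID AtomicContextOk(VOID)      // legal at IRQL <= DISPATCH_LEVEL
{
    KIRQL oldIrql;
    // Raises this CPU to DISPATCH_LEVEL and busy-waits; never blocks the thread.
    KeAcquireSpinLock(&g_SpinLock, &oldIrql);
    /* ... touch shared state ... */
    KeReleaseSpinLock(&g_SpinLock, oldIrql);
}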

Anton Bassov

I believe I do have a basic understanding of multithreading and parallel processing.
I have used mutexes, semaphores, and critical sections in user-mode apps before.

“You should ALWAYS expect concurrency in kernel, except when you have an
explicit serialization guarantee.”

Should I expect a driver callback to be executed simultaneously by multiple threads / processors?
Like, in NDIS, there are these callbacks:
FilterSendNetBufferLists
FilterSendNetBufferListsComplete.

Should I assume that while FilterSendNetBufferLists is running, another call to FilterSendNetBufferLists may happen?

Also, if NDIS calls FilterSendNetBufferListsComplete as a result of NdisFSendNetBufferLists being called in FilterSendNetBufferLists, should I expect it possible for FilterSendNetBufferLists and FilterSendNetBufferListsComplete to run simultaneously?
e.g., NDIS calling FilterSend for Nbl2 while processing FilterSendComplete for Nbl1.

“If I tell you that a mutex is a dispatcher-level synchronization construct that
may be used only in a threaded context while a spinlock is an inter-CPU synchronization
primitive that may be used in an atomic context, are you going to understand it???”

The mutex is a dispatcher object. Thread dispatching happens at dispatch level. At dispatch level, the thread dispatcher resumes / starts executing threads. A user-mode mutex can be used for synchronization between multiple processes.
The spinlock is a kind of loop that repeatedly reads a state to see if it has changed. It disallows context switching while spinning, so no other thread can execute on that CPU while it’s spinning. And the spinlock helps you in a multi-processor scenario.

OK, so the waking of a thread due to a mutex being released happens as a result of an interrupt.

Perhaps a better question:
If I’m not spawning threads, how should I expect my driver callbacks to be called…
Some functions will be called at dispatch level. So this should mean that those functions will not be run by multiple threads. Perhaps I should expect them to be run by multiple processors at once. As for the functions not running at dispatch level, perhaps I should expect random system threads to be assigned to do their work.

I have Windows Internals 6th edition (part 1, 2), but it will be a while until I finish reading it all.
Perhaps I should research more before asking such questions.

> I believe I do have a basic understanding of multithreading and parallel

processing.
I have used mutexes, semaphores, and critical sections in user-mode apps before.

“You should ALWAYS expect concurrency in kernel, except when you have an
explicit serialization guarantee.”

Should I expect a driver callback to be executed simultaneously by
multiple threads / processors?
Like, in NDIS, there are these callbacks:
FilterSendNetBufferLists
FilterSendNetBufferListsComplete.

Should I assume that while FilterSendNetBufferLists is running, another call
to FilterSendNetBufferLists may happen?

Also, if NDIS calls FilterSendNetBufferListsComplete as a result of
NdisFSendNetBufferLists being called in FilterSendNetBufferLists, should I
expect it possible for FilterSendNetBufferLists and
FilterSendNetBufferListsComplete to run simultaneously?
e.g., NDIS calling FilterSend for Nbl2 while processing FilterSendComplete
for Nbl1.

“If I tell you that a mutex is a dispatcher-level synchronization construct
that may be used only in a threaded context while a spinlock is an inter-CPU
synchronization primitive that may be used in an atomic context, are you
going to understand it???”

The mutex is a dispatcher object. Thread dispatching happens at dispatch
level. At dispatch level, the thread dispatcher resumes / starts executing
threads. A user mode mutex can be used for synchronization between
multiple processes.
The spinlock is a kind of loop that repeatedly reads a state to see if it
has changed. It disallows context switching while spinning, so no other thread
can execute on that CPU while it’s spinning. And the spinlock helps you in
a multi-processor scenario.

No, strangely enough (historical naming), thread dispatching is NOT
something you can invoke at DISPATCH_LEVEL: a thread may only wait (block)
at IRQL below DISPATCH_LEVEL. If you attempt any kind of wait at
DISPATCH_LEVEL on your current thread, a BSOD will result. Thus, at
DISPATCH_LEVEL, you may not use a mutex, semaphore, fast mutex, or
executive resource. For most purposes, you are limited to a KSPIN_LOCK for
your synchronization (or a queued spin lock, which is also a KSPIN_LOCK,
just to keep life exciting).
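
For reference, the queued form takes the same KSPIN_LOCK, just through a different API (a sketch; see the WDK docs for the exact rules):

KSPIN_LOCK g_Lock;   // KeInitializeSpinLock(&g_Lock) once at init

VOID TouchSharedState(VOID)   // IRQL <= DISPATCH_LEVEL
{
    KLOCK_QUEUE_HANDLE handle;   // per-acquisition state, lives on the stack
    KeAcquireInStackQueuedSpinLock(&g_Lock, &handle);
    /* ... shared state ... */
    KeReleaseInStackQueuedSpinLock(&handle);
}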

ok, so the waking of a thread due to a mutex being released happens as a result of
an interrupt.

And most especially, you cannot wake a thread from a mutex at DIRQL
levels, that is, from an ISR. Note that there are special rules about
calling KeReleaseMutex from DISPATCH_LEVEL (see the docs).

Perhaps a better question:
If I’m not spawning threads, I’m thinking how should I expect my driver
callbacks to be called…
Some functions will be called at dispatch level. So this should mean that
those functions will not be run by multiple threads.

Not true. You could have multiple threads running on multiple cores, each
thread running at DISPATCH_LEVEL. And therefore, running concurrently.
Whether or not this applies to callbacks is something I cannot comment on,
but, in general, running at DISPATCH_LEVEL is no guarantee that there is
no concurrency.

Perhaps I should
expect them to be run by multiple processors at once. As for the
functions not running at dispatch level, perhaps I should expect random
system threads to be assigned to do their work.

Yes, the assignment of system threads running at PASSIVE_LEVEL is
essentially random, from your viewpoint. Once you are at PASSIVE_LEVEL,
you still could have concurrency from one or more threads running at
DISPATCH_LEVEL, and in some cases (probably not for NDIS drivers, but
certainly true for hardware drivers) you have the possibility of a thread
at DIRQL, one (at least) thread at DISPATCH_LEVEL, and many threads
at PASSIVE_LEVEL. If there is any possibility these share a data
structure, that data structure has to be protected by the most powerful
synchronization required (that is, if it is accessed at DIRQL, as
hardware registers might be, then you will need KeSynchronizeExecution,
or more properly its KMDF equivalent, whose name escapes me at the moment).
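
The pattern being referred to looks roughly like this (a sketch only; the KMDF analogue is, I believe, WdfInterruptSynchronize):

// KSYNCHRONIZE_ROUTINE: runs at the device's DIRQL with the interrupt
// spin lock held, so it is serialized against the ISR.
BOOLEAN SyncWithIsr(PVOID Context)
{
    /* safe to touch state (e.g., hardware registers) shared with the ISR */
    return TRUE;
}

VOID TouchIsrSharedState(PKINTERRUPT Interrupt, PVOID Context)
{
    // Raises IRQL to the interrupt's DIRQL and takes the ISR spin lock.
    KeSynchronizeExecution(Interrupt, SyncWithIsr, Context);
}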

I have Windows Internals 6th edition (part 1, 2), but it will be a while
until I finish reading it all.
Perhaps I should research more before asking such questions.

Windows Internals is not going to be as much help as you think; read about
synchronization in WDF/KMDF.
joe



>Should I expect a driver callback to be executed simultaneously by multiple
threads / processors?

Like, in NDIS, there are these callbacks:
FilterSendNetBufferLists
FilterSendNetBufferListsComplete.

Yes, you should expect and plan for this. NDIS does not serialize these
entry-points. All packet path activity is deserialized.

Should I assume that while FilterSendNetBufferLists is running, another call
to FilterSendNetBufferLists may happen?

Yes.

Also, if NDIS calls FilterSendNetBufferListsComplete as a result of
NdisFSendNetBufferLists being called in FilterSendNetBufferLists, should I
expect it possible for
FilterSendNetBufferLists and FilterSendNetBufferListsComplete to run
simultaneously?
e.g., NDIS calling FilterSend for Nbl2 while processing FilterSendComplete
for Nbl1.

Yes. That is certainly possible. It is also certainly possible that you
could have a ‘nested’ callback to FilterSendNetBufferListsComplete() while
your call to NdisFSendNetBufferLists() is active on a thread/CPU (depending
on IRQL) and that another CPU could proceed into calling your
FilterSendNetBufferListsComplete() or any other datapath entry. This is
what deserialized means. A free-for-all across multiple CPUs and threads.
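
In code terms (a sketch only; FILTER_CONTEXT and its fields are hypothetical), any state shared across these deserialized callbacks needs its own lock, for example an NDIS_SPIN_LOCK:

typedef struct _FILTER_CONTEXT {
    NDIS_HANDLE    FilterHandle;     // from FilterAttach
    NDIS_SPIN_LOCK Lock;             // NdisAllocateSpinLock() at attach time
    ULONG          OutstandingSends; // shared across CPUs: must be protected
} FILTER_CONTEXT, *PFILTER_CONTEXT;

VOID FilterSendNetBufferLists(NDIS_HANDLE FilterModuleContext,
    PNET_BUFFER_LIST NetBufferLists, NDIS_PORT_NUMBER PortNumber,
    ULONG SendFlags)
{
    PFILTER_CONTEXT ctx = (PFILTER_CONTEXT)FilterModuleContext;

    NdisAcquireSpinLock(&ctx->Lock);   // other CPUs may be in here too
    ctx->OutstandingSends++;
    NdisReleaseSpinLock(&ctx->Lock);

    NdisFSendNetBufferLists(ctx->FilterHandle, NetBufferLists,
                            PortNumber, SendFlags);
}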

Some functions will be called at dispatch level. So this should mean that
those functions will not be run by multiple threads.

No. What it means is that those functions will be called at IRQL ==
DISPATCH_LEVEL. That is a constraint on what you can do (you cannot block
except on a spinlock).
That does not mean the function is serialized. Another CPU, also at
DISPATCH_LEVEL (of course) can call that function simultaneously. By
definition that other CPU is executing a different ‘thread’. With 16 cores
you could have 16 threads (at dispatch level) active in your entry point.

Perhaps I should expect them to be run by multiple processors at once.

Yes.

As for the functions not running at dispatch level, perhaps I should
expect random system threads to be assigned to do their work.

Random is not necessarily the most precise description. The point is the
thread context is ‘arbitrary’ meaning you don’t control it and cannot
predict what it will be.

Perhaps I should research more before asking such questions.

Perhaps. But you might also now understand that this list, or at least some
of its contributors, has at times been prone to treating some questions with
harsh responses. It takes some thickness of skin. Keep asking, but
understand that sometimes you will get challenged to go off and ask a better
question.

Good Luck,
Dave Cattley


> Hello,

I’ve got a few things that are unclear:

  1. If threads and processes are mechanisms created by the kernel
    (which I know to be true), what creates the need for synchronization in
    the kernel? Is each CPU core assigned to a part of the kernel, or
    how does it happen?

Once the kernel code is loaded, it is potentially accessed and executed
concurrently by every core. There is no “division of responsibility”. So
there is always the potential for concurrency, everywhere, all the time.
In hardware drivers, you can expect a minimum of three threads: one DIRQL
thread (never more than one, because ISR dispatching assures there can
only be one DIRQL thread running at a time; the “ISR spin lock” is what
handles this); at least one DISPATCH_LEVEL thread (and it is possible for
multiple cores to be executing DISPATCH_LEVEL code, even the same DPC
routine, because you can schedule a DPC on a specific core [by default it
runs on the core that took the interrupt], and on some advanced
architectures the interrupt can be taken on different cores, depending
upon the incidental details of the hardware implementation, the phase of
the Moon, and the angular relation of Jupiter and Mars); and there can be
many, many, many PASSIVE_LEVEL threads executing, all concurrent with the
DPCs and/or ISR. In general, you would like to structure your driver in
such a way that you have predictable concurrency, but for many drivers
this is simply impossible (any driver that streams or provides a
continuous flow of interrupts independent of the state of the driver).

Note that “threads” are a slippery concept at the kernel level, since an
ISR “hijacks” some innocent bystander thread, and in effect runs
“concurrent” with that thread. Similarly, DPC routines “hijack” some
thread. So you have to think of the register context defining a “thread”
rather than a dispatcher artifact like thread ID.

  2. When should I expect concurrency in kernel: in any callback called by
    the kernel? (The exception must surely be where the documentation says that
    callback X is called only after Y has finished.)

Concurrency is there all the time. I can’t speak as to potential
concurrency of WDF callbacks, but you have to also ask “concurrent with
what?”. For example, a callback done at DISPATCH_LEVEL might still be
concurrent with a DPC and/or an ISR. A callback done at PASSIVE_LEVEL is
always potentially concurrent with anything.

  3. What’s the difference between a fast/guarded mutex and a spinlock?

A fast mutex is a scheduler object; as such, waiting on it involves
descheduling the thread. This means it can be used only at IRQL below
DISPATCH_LEVEL.
Spin locks, the generic/queued kind, are valid at DISPATCH_LEVEL and
below, and the code executing under them always runs at DISPATCH_LEVEL
(thus imposing limitations, even if the thread that acquired the spin lock
had been running at PASSIVE_LEVEL before). In addition,
KeSynchronizeExecution (or its WDF equivalent, to synch to an ISR) will
synchronize to ISRs (using the ISR spin lock) from either PASSIVE_LEVEL or
DISPATCH_LEVEL, but during the time the lock is held you are running
at DIRQL and therefore once more limited in what you can call within
the scope of such a lock.
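
To put the fast-mutex half of that in code (a minimal sketch; the object must live in nonpaged memory and be initialized once with ExInitializeFastMutex):

FAST_MUTEX g_FastMutex;

VOID BelowDispatchOnly(VOID)
{
    // Raises the thread to APC_LEVEL while held; the caller must already
    // be below DISPATCH_LEVEL or the acquire is illegal.
    ExAcquireFastMutex(&g_FastMutex);
    /* ... shared state ... */
    ExReleaseFastMutex(&g_FastMutex);
}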
joe

Thanks!



Thanks!

"> ok, so the waking of a thread due to mutex released happens as a result of

an interrupt.

And most especially, you cannot wake a thread from a mutex at DIRQL
levels, that is, from an ISR. Note that there are special rules about
calling KeReleaseMutex from DISPATCH_LEVEL (see the docs)."

I meant when the system wakes a thread. When you call release, a DPC will be queued, and when it’s time for it to be processed, the kernel will raise the IRQL to dispatch level. That’s how I understood what I read.

"> Some functions will be called at dispatch level. So this should mean that

those functions will not be run by multiple threads.

Not true. You could have multiple threads running on multiple cores, each
thread running at DISPATCH_LEVEL. And therefore, running concurrently."

"No. What it means is that those functions will be called at IRQL ==
DISPATCH_LEVEL. That is a constraint on what you can do (you cannot block
except on a spinlock).
That does not mean the function is serialized. Another CPU, also at
DISPATCH_LEVEL (of course) can call that function simultaneously. By
definition that other CPU is executing a different ‘thread’. With 16 cores
you could have 16 threads (at dispatch level) active in your entry point. "

It seems I misunderstood the concept of ‘thread’ here. I thought that code could be called a thread only when it runs under the management of a thread scheduler. I understand, then, that these kinds of threads also have context info, similar to threads managed by the scheduler (and similar to user-space threads).

My question should then be reformulated:
Some functions will be called at dispatch level. So this should mean that if I have 4 CPUs, those functions will be run by at most 4 threads. I suppose it’s possible to have more threads than CPUs in the kernel, as we have in user space; and this mapping of multiple threads onto fewer CPUs can only be done by a thread manager / scheduler. So there will be parts of the kernel that will be run by dynamically created threads (threads created by a thread manager), at IRQL < dispatch.

“Note that “threads” are a slippery concept at the kernel level, since an
ISR “hijacks” some innocent bystander thread, and in effect runs
“concurrent” with that thread. Similarly, DPC routines “hijack” some
thread. So you have to think of the register context defining a “thread”
rather than a dispatcher artifact like thread ID.”

It may be that I’m still confused about it, accustomed to the user-space view of threads (it has a HANDLE and an ID, and you know the CPU preempts it to resume a different thread). In the kernel,
is the term thread synonymous with CPU?

By the way, I found that in NDIS there is also NDIS_RW_LOCK_EX, which can be used at dispatch level. If I had to make a choice between it and a spin lock, perhaps the only difference between them is read-write lock versus non-condition-check lock.

> Thanks!

"> ok, so the waking of a thread due to mutex released happens as a result
of
> an interrupt.

And most especially, you cannot wake a thread from a mutex at DIRQL
levels, that is, from an ISR. Note that there are special rules about
calling KeReleaseMutex from DISPATCH_LEVEL (see the docs)."

I meant when the system wakes a thread. When you call release, a DPC will
be queued, and when it’s time for it to be processed, the kernel
will raise the IRQL to dispatch level. That’s how I understood what I read.

I don’t follow any of the above. When the scheduler dispatches a
schedulable thread, it sets a series of parameters that define the “thread
context”. In the case of a user app, this includes setting the virtual
memory map to point to the correct VM map for the process. In the case of
a kernel thread, the VM map for the kernel space is guaranteed to be
identical in all kernel-level threads, or user-level threads that have
entered the kernel. There is only one kernel-level VM map. In user
space, this set of addresses does not exist, and an attempt to use a
kernel address gives an access fault.

When you call release, there is no reason to expect a DPC is involved.

Generally, when there are DPCs queued, any time a core drops to below DPC
level, a DPC queued on that core’s DPC queue will be dispatched, and
indeed, the level is raised to DISPATCH_LEVEL. This is an
oversimplification of what is really happening, and not quite precise, but
for your purposes it is close enough. There is no preemptive scheduler
for DPCs; a DPC must run to completion, and do so within a limited time
budget (typically something like 50us, 0.000050 seconds). A DPC cannot be
preempted except by a hardware interrupt, and, in fact, the reason for
using DPCs is to minimize interrupt latency so fast-response devices will
not become starved or suffer data over/under-run.
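
A sketch of the mechanism being described (illustrative; a real driver would typically queue the DPC from its ISR):

KDPC g_Dpc;

// KDEFERRED_ROUTINE: runs at DISPATCH_LEVEL in an arbitrary thread
// context, and must run to completion; it cannot block.
VOID MyDpcRoutine(PKDPC Dpc, PVOID DeferredContext,
                  PVOID SystemArgument1, PVOID SystemArgument2)
{
    /* short, non-blocking work only */
}

VOID InitDeferredWork(VOID)
{
    KeInitializeDpc(&g_Dpc, MyDpcRoutine, NULL);
}

VOID DeferWork(VOID)   // e.g., called from an ISR at DIRQL
{
    // Queues the DPC; it runs once the target core's IRQL drops
    // below DISPATCH_LEVEL.
    KeInsertQueueDpc(&g_Dpc, NULL, NULL);
}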

"> Some functions will be called at dispatch level. So this should mean
that
> those functions will not be run by multiple threads.

Not true. You could have multiple threads running on multiple cores, each
thread running at DISPATCH_LEVEL. And therefore, running concurrently."

"No. What it means is that those functions will be called at IRQL ==
DISPATCH_LEVEL. That is a constraint on what you can do (you cannot
block
except on a spinlock).
That does not mean the function is serialized. Another CPU, also at
DISPATCH_LEVEL (of course) can call that function simultaneously. By
definition that other CPU is executing a different ‘thread’. With 16
cores
you could have 16 threads (at dispatch level) active in your entry point.
"

It seems I misunderstood the concept of ‘thread’ here. I thought that code
could be called a thread only when it runs under the management of a thread
scheduler. I understand, then, that these kinds of threads also
have context info, similar to threads managed by the scheduler (and
similar to user-space threads).

Yes, note that I said that “thread” is a slippery concept at kernel level.
A thread is, strictly speaking, defined by a register set and a stack.
The register set includes the registers that make memory maps available,
hence a “kernel thread” is a set of registers such that the kernel address
context is defined, and the stack is a stack in the kernel.

The key here is that there is a scheduler responsible for dispatching
user-level threads, and the entities called “kernel threads”, all of which
are nominally running at PASSIVE_LEVEL, by setting up these register
contexts, including [E/R]SP, and the memory map pointers. However, when a
user-level thread calls the kernel, the kernel entry routine provides a
different context, including making the kernel address space visible, and
switching to a kernel stack for execution. But an interrupt also does
this; the interrupt handler has a private register set, and switches to a
kernel stack, and makes the kernel map available. Think of this as
happening by heavy-duty magic (I’ve written many ISRs, and the first dozen
or so instructions are Very Deep Magic, then the next couple hundred
instructions are merely Deep Magic, and finally you are in full kernel
mode with a stack and everything). Logically, it looks like a separate
thread of execution, but if you look around, you will see that some other
thread, such as a PASSIVE_LEVEL user thread, a PASSIVE_LEVEL kernel
thread, or a DPC “thread”, was executing. The concurrency looks like any
normal thread concurrency, since it is “concurrent” with the executing
state it “hijacked”, which means that if there are data structures which
you must access from two or more levels, such as ISR and DPC, or ISR and
PASSIVE_LEVEL, you have to use interrupt-level synchronization in those
other threads. These synchronization primitives raise the processor level
to DIRQL (thus prohibiting interrupts on the current core) and set
the ISR lock in the KINTERRUPT object (thus prohibiting interrupt dispatch
from other cores). This allows you to do things like set the device
register values for multiple device registers without an interrupt coming
in the middle of your setup.

Your concept of “threads” is very simplistic, since you think of it in
terms of the scheduler dispatching “thread objects”, but the truth is that
a “thread” is essentially defined as a register context and a stack, and
nothing more. Thus, any mechanism that provides a register context can be
thought of as a “thread scheduler”. The interrupt hardware and the first
few hundred instructions thus fulfill the requirements of being a
“scheduler” because what they do is put the core into a particular
register set/stack state which is different than the register set/stack
state that had been executing. The DPC mechanism thus also forms a
“scheduler” because each time a DPC is executed, the core is given a
register set/stack state that is not the same as it previously had. So
you need a broader definition of what constitutes “thread” than the more
restrictive view you have.

My question should then be reformulated:
Some functions will be called at dispatch level. So this should mean that
if I have 4 CPUs, those functions will be run by at most 4 threads. I suppose
it’s possible to have more threads than CPUs in the kernel, as we have in user
space; and this mapping of multiple threads onto fewer CPUs can only be done
by a thread manager / scheduler. So there will be parts of the kernel that
will be run by dynamically created threads (threads created by a thread
manager), at IRQL < dispatch.

It is not only possible to have more threads than cores, it is typical.
There may be hundreds of threads, and a huge number of them might be
marked as “runnable”, but you can only run as many contexts as cores;
that’s a “maximum” requirement: you could be running fewer threads than
cores, and some cores will be doing their “idle loop”, which is not
necessarily a “loop”. In modern CPUs, it may be a HALT instruction or
something similar that drops that core’s power consumption to a much lower
value, particularly valuable for multicore systems on laptops, tablets,
and cell phones.

But this does not mean that all “threads” are running at PASSIVE_LEVEL.
Taking our looser definition as register set/stack state, an 8-core system
might have one core running a user-level thread in an app, another core
running a kernel-only thread (created by PsCreateSystemThread in a driver,
for example) at PASSIVE_LEVEL, and during the execution of that thread it
might have transitions to high-priority PASSIVE_LEVEL (a kernel thread
that acquires a mutex by KeWaitFor… is boosted to level 31 priority),
transitions to DISPATCH_LEVEL (which would happen when it acquires a spin
lock) or even to DIRQL levels (which would happen when the thread had to
synchronize with state also used by an ISR). Similarly, a core might
be running a DPC at DISPATCH_LEVEL, which may have transitions up to DIRQL
levels to synchronize access to state shared with an ISR. And so on.
Some cores may be running ISRs, so they will be running DIRQL-level
threads, as defined by our broader concept of “thread”.
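
For reference, creating such a kernel-only thread from a driver looks roughly like this (a sketch; error handling trimmed):

VOID WorkerThread(PVOID StartContext)   // runs at PASSIVE_LEVEL
{
    /* ... do work, wait on dispatcher objects, etc. ... */
    PsTerminateSystemThread(STATUS_SUCCESS);
}

NTSTATUS StartWorker(VOID)
{
    HANDLE threadHandle;
    NTSTATUS status = PsCreateSystemThread(&threadHandle, THREAD_ALL_ACCESS,
                          NULL, NULL, NULL, WorkerThread, NULL);
    if (NT_SUCCESS(status)) {
        ZwClose(threadHandle);   // the thread keeps running without the handle
    }
    return status;
}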

“Note that “threads” are a slippery concept at the kernel level, since an
ISR “hijacks” some innocent bystander thread, and in effect runs
“concurrent” with that thread. Similarly, DPC routines “hijack” some
thread. So you have to think of the register context defining a “thread”
rather than a dispatcher artifact like thread ID.”

It may be that I’m still confused about it, accustomed to the user-space
view of threads (it has a HANDLE and an ID, and you know the CPU preempts
it to resume a different thread). In the kernel,
is the term thread synonymous with CPU?

A “thread”, as represented by a KTHREAD object, is a fully-schedulable
entity created by the thread subsystem and managed by the scheduler. A
“thread handle” is a user-level synonym for a KTHREAD object, found by
doing a lookup in the process’s handle table. But “managing” these
threads means that the scheduler, in order to dispatch them, sets the
register/stack state to the correct one for that thread; when the thread
is preempted or descheduled for any reason, that register/stack state is
saved as part of the thread description, and restored whenever the
scheduler wishes to dispatch the thread.

But DPCs and ISRs are “scheduled” by other means; all that the
“scheduling” requires is that the correct register/stack state be set.
The first few hundred instructions of the kernel’s interrupt handler (the
code that eventually calls your ISR) create this state, so the ISR
“thread” is “scheduled” by the hardware. Similarly, DPCs are “scheduled”
by the DPC mechanism, which from your viewpoint is pure magic, but when
you are in a DPC, its state is defined by a register set and stack. A DPC
can be preempted by an ISR, and an ISR can be preempted by an interrupt of
higher priority. So the component called “the scheduler” in the kernel
architecture discussions in various books is only one of many “schedulers”
that cause “threads” to execute.

By the way, I found that in NDIS there is also NDIS_RW_LOCK_EX, which can
be used at dispatch level. If I had to make a choice between it and
a spin lock, perhaps the only difference between them is read-write lock
versus non-condition-check lock.

An ERESOURCE (Executive Resource) is an example of a
multiple-reader-single-writer lock. I don’t know if NDIS uses an ERESOURCE
or has its own private mrsw implementation. There are, in addition,
spin locks that support mrsw behavior, and they can be acquired at
DISPATCH_LEVEL. I don’t know what you mean by a “non-condition-check” lock.
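
A minimal ERESOURCE sketch of the mrsw pattern (the object must live in nonpaged memory, be initialized once with ExInitializeResourceLite, and may be acquired only below DISPATCH_LEVEL):

ERESOURCE g_Resource;

VOID Reader(VOID)
{
    KeEnterCriticalRegion();                         // block normal kernel APCs
    ExAcquireResourceSharedLite(&g_Resource, TRUE);  // many readers may hold it
    /* ... read shared state ... */
    ExReleaseResourceLite(&g_Resource);
    KeLeaveCriticalRegion();
}

VOID Writer(VOID)
{
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&g_Resource, TRUE);  // one writer at a time
    /* ... modify shared state ... */
    ExReleaseResourceLite(&g_Resource);
    KeLeaveCriticalRegion();
}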
joe



In the kernel, your code may be executing in the context of:

  1. A non-system thread, in kernel mode. This happens when a usermode process made a Read/Write/DeviceIoControl call (socket send/recv also go through these calls). The user thread then enters kernel mode through a defined gate, and continues through the drivers on the device stack (including possibly your filter routine). Since multiple usermode threads may be sending/receiving data at the same time, your filter routine may be called by different threads.

  2. A system thread. It’s a thread created by some driver, as a worker thread or an otherwise dedicated thread. For example, srv.sys creates its own worker threads. Those threads could also call send/recv equivalents. There are multiple such threads.

The threads run at PASSIVE_LEVEL, but may temporarily raise to DISPATCH_LEVEL (and above). Some callbacks may be called at raised IRQL.

  3. A DPC routine. A DPC routine is not associated with any thread (this is an arbitrary thread context). A DPC routine can run in parallel with other DPC routines on other CPUs. A DPC routine can only be interrupted by an ISR.

  4. An interrupt routine (ISR). (A quick way to observe which of these contexts your code landed in is sketched just below.)
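
One quick way to observe which of these contexts a given callback landed in is to log the IRQL and the current thread pointer (illustrative only; DbgPrint has its own IRQL caveats):

VOID LogContext(PCSTR Where)
{
    // Both calls are cheap and callable at raised IRQL; the thread pointer
    // identifies whichever (possibly arbitrary) thread was borrowed.
    DbgPrint("%s: IRQL=%u thread=%p\n",
             Where, (ULONG)KeGetCurrentIrql(), PsGetCurrentThread());
}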

The keyword here is STACK - this is what makes a thread different from any other unit of execution
(like, say, a DPC). As long as a unit of execution has its own place where its execution context can be saved,
it can be made schedulable not only on a FIFO basis, and it can go blocking on dispatcher constructs. Unlike
threads, DPCs haven’t got their own stacks. In a way, they may still be “scheduled” (because the order of DPC execution does not necessarily have to depend on the order of their arrival in the DPC queue), but DPC B cannot start running on the CPU before DPC A returns control, solely due to the fact that DPC A has no place to save its execution context …

Anton Bassov

Nope.

IRQL is a per-CPU concept. You can be in RoutineA running at IRQL DISPATCH_LEVEL on CPU0, while that same code, RoutineA, is running at IRQL PASSIVE_LEVEL on CPU1. This means the code running on CPU1 could be pre-empted, and thus RoutineA could be re-entered on CPU1 an arbitrary number of times.

This is why serialization is “important” in Windows drivers.

Mr. Grig is, once again, both clear and concise on this topic. Let me amplify his already good response:

Windows has only one thing called a thread, and when you’re running code, you’re executing in the context of SOME thread.

Specifically, as enumerated by Mr. Grig, we have:

  1. User-Threads: Some specific user process is mapped into the low half of the process address space.

  2. System-Threads: Created in the context of the SYSTEM process. No user process mapped into the low half of the process address space.

  3. Deferred Procedure Call: A kernel-mode call back that runs at IRQL DISPATCH_LEVEL and in the context of an ARBITRARY thread and process, and on a dedicated system-supplied stack.

  4. ISR: A kernel-mode call back that runs at DIRQL (see note) as a result of a hardware device changing its state and requesting service. Runs in the context of an ARBITRARY thread and process.

The definition of “ARBITRARY thread and process” is: A thread and process that we cannot predict in advance. In other words, ANY unspecified thread and process. It could be a user-thread. It could be a system thread.

Peter
OSR

Note from above about ISRs running at DIRQL: There is an option to elect PASSIVE_LEVEL processing of ISRs that is mostly of interest to drivers for Simple Peripheral Bus, GPIO, and devices connected via serial port controllers. In other words, “special case” stuff, mostly on SoC systems today. Ignore this for now; it’s entirely tangential to the point.

Both an ISR and a DPC have a stack. The difference is that the stack does
not persist beyond the execution of the code, and therefore cannot be used
to hold any persistent state. Once an ISR or DPC terminates execution,
the stack is discarded, and the space used can be recycled for another
purpose, although I have this nagging memory that it will likely be
re-used as a stack for some other ISR or DPC. Key here is the
understanding that a “snapshot” of state cannot be taken, although in the
case of one ISR preempting another because it is handling a
higher-hardware-priority device, the stack of the preempted ISR remains
intact.

As I said earlier, the concept of “thread” gets a little slippery…just
when you think you have a solid definition, it morphs out from under you.
joe

The keyword here is STACK - this is what makes a thread different from
any other unit of execution
(like, say, a DPC). As long as a unit of execution has its own place where
its execution context can be saved,
it can be made schedulable not only on a FIFO basis, and it can go blocking
on dispatcher constructs. Unlike
threads, DPCs haven’t got their own stacks. In a way, they may still be
“scheduled” (because the order of DPC execution does not necessarily have
to depend on the order of their arrival in the DPC queue), but DPC B cannot
start running on the CPU before DPC A returns control, solely due to the
fact that DPC A has no place to save its execution context …

Anton Bassov



Peter, I think it’s more useful to say that DPCs don’t run in the context of any thread. Certainly the person who walked me through the kernel when I was just getting started looked at it that way, and I’ve always subscribed to that view. The fact that all currently surviving versions of Windows use a dedicated stack for DPCs reinforces this in my mind. The older versions where (on some processor architectures) you could end up executing a DPC on the running thread’s stack were more arguably running in the context of that thread.

  • Jake


>The fact that all currently surviving versions of Windows use a dedicated stack for
DPCs reinforces this in my mind.

I’ve seen KeRetireDpcList (?) on the stacks of threads in (if I remember correctly) Win2008 (maybe R2).

> >The fact that all currently surviving versions of Windows use a dedicated stack for

DPCs reinforces this in my mind.

I’ve seen KeRetireDpcList (?) on the stacks of threads in (if I remember correctly) Win2008 (maybe R2).

IIRC this was idle thread stack, which is the one used for DPCs.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

>> I’ve seen KeRetireDpcList (?) on the stacks of threads in (if I remember correctly) Win2008 (maybe R2).

IIRC this was idle thread stack, which is the one used for DPCs.

That was a real thread stack.

Not that it truly matters, but the Idle thread stack is different from the
DPC stack.

For amusement/clarification I walk through some of the details with the
debugger here:

http://analyze-v.com/?p=598

(Usual caveats of “implementation detail subject to change” apply)

-scott
OSR


I once had an interesting discussion with a hardware guy. I was doing platform bring-up on a new computer for which he was the hardware designer. It had two processors, and the hardware engineer assumed that each of them would have a specific job. He kept asking me “but which one owns the screen – they can’t both own the screen.” (This was an old RS/6000 and the video subsystem would crash if there were two simultaneous accesses to the video frame buffer.)

I finally got him to go away, if not particularly satisfied, when I told him that processors were like hands that do things and the software is the brain. The brain has to ensure that your two hands don’t try to both do the exact same thing simultaneously.

I think that he mostly didn’t like my answer because he wanted a metaphor where the processor was a brain and the software was ancillary.

  • Jake Oshins
    (former IBM guy)
    Windows Kernel Team


>I’ve seen KeRetireDpcList (?) on the stacks of threads in (if I remember
correctly) Win2008 (maybe R2).

Here is an example (2008 R2):

fffff880027dfb20 fffff88001006761 msdsm!DsmpRequestComplete+0x243
fffff880027dfb80 fffff8000249ea91 mpio!MPIOPdoCompletion+0x5c1
fffff880027dfc30 fffff8000293819f nt!IopfCompleteRequest+0x3b1
fffff880027dfd10 fffff8800145b7d8 nt!IovCompleteRequest+0x19f
fffff880027dfde0 fffff8800146244e storport!RaidUnitCompleteRequest+0x208
fffff880027dfec0 fffff800024a6b1c storport!RaidpAdapterRedirectDpcRoutine+0x4e
fffff880027dff00 fffff8000249e165 nt!KiRetireDpcList+0x1bc
fffff880027dffb0 fffff8000249df7c nt!KyRetireDpcList+0x5
fffff8801be00e70 fffff800024e731c nt!KiDispatchInterruptContinue
fffff8801be00ea0 fffff8800145ab21 nt!KiDpcInterrupt+0xcc
fffff8801be01030 fffff8800145a71d storport!RaUnitScsiIrp+0x3c1
fffff8801be010f0 fffff8000293e750 storport!RaDriverScsiIrp+0x5d
fffff8801be01130 fffff88001007946 nt!IovCallDriver+0xa0
fffff8801be01190 fffff88001007a89 mpio!MPIOReadWrite+0x3f2
fffff8801be01230 fffff8800100936d mpio!MPIOPdoHandleRequest+0xdd
fffff8801be014d0 fffff880010070bf mpio!MPIOPdoInternalDeviceControl+0x45
fffff8801be01540 fffff880010069d3 mpio!MPIOPdoCommonDeviceControl+0x6bb
fffff8801be015c0 fffff8800100178a mpio!MPIOPdoDispatch+0x1b3
fffff8801be01610 fffff8000293e750 mpio!MPIOGlobalDispatch+0x12
fffff8801be01640 fffff88002103445 nt!IovCallDriver+0xa0
fffff8801be016a0 fffff88002103975 CLASSPNP!ServiceTransferRequest+0x355
fffff8801be01740 fffff8000293e750 CLASSPNP!ClassReadWrite+0xd5
fffff8801be01790 fffff8800102b0af nt!IovCallDriver+0xa0
fffff8801be017f0 fffff8000293e750 partmgr!PmGlobalDispatch+0x9f
fffff8801be01820 fffff80002708828 nt!IovCallDriver+0xa0
fffff8801be01880 fffff800027084bd nt!RawReadWriteDeviceControl+0xa8
fffff8801be018c0 fffff8000293e750 nt!RawDispatch+0x7d
fffff8801be01920 fffff8800139a6af nt!IovCallDriver+0xa0
fffff8801be01980 fffff8000293e750 fltmgr!FltpDispatch+0x9f
fffff8801be019e0 fffff800027a371b nt!IovCallDriver+0xa0
fffff8801be01a40 fffff800027ae183 nt!IopSynchronousServiceTail+0xfb
fffff8801be01ab0 fffff8000249a8d3 nt!NtWriteFile+0x7e2
fffff8801be01bb0 000000007754139a nt!KiSystemServiceCopyEnd+0x13