generic questions about communication between kernel mode and user mode

Speer, Kenny wrote:

I don’t agree with this. I primarily use WMI as the control path for drivers and have been doing so for at least the last 8 years. Yes, there are some bugs in the object model, but overall it just works. I find that new college grads can get a handle on it fairly quickly with the help of working code and Google. I see that you stated it does have its merits, but I would not be so quick to turn new developers away from WMI.

If you are talking C#, then I agree. The CLR makes it easy. My comment
was strictly aimed at native C++.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

>I have never seen a case where shared memory is the right solution

NDSPI provider with kernel bypass.

>I once needed a version of bsearch that returned not the equal pointer, but a pointer that was either equal to or less than, such that the next pointer would have been greater than.

These days, you would just use std::upper_bound().
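For the archives, a minimal sketch of that “equal to or less than” lookup built on std::upper_bound; the FindFloor name, the int element type and the raw-array interface are just mine for illustration:

#include <algorithm>
#include <cstddef>

// Returns a pointer to the largest element <= key in the sorted range
// [first, first + count), or nullptr if every element is greater than key.
// std::upper_bound finds the first element greater than key, so stepping
// back one slot yields the "equal to or less than" element described above.
const int* FindFloor(const int* first, std::size_t count, int key)
{
    const int* last = first + count;
    const int* it = std::upper_bound(first, last, key);
    return (it == first) ? nullptr : it - 1;
}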

> >I once needed a version of bsearch that returned not the equal pointer, but a pointer that was either

equal to or less than, such that the next pointer would have been greater than.

That’s why I write all bsearch, trees, etc. myself.

These days, you would just use std::upper_bound().

In the kernel?


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

>> These days, you would just use std::upper_bound().

In the kernel?

It doesn’t throw.

Joe,

I just wonder how your huge paragraphs are related to the topic that you were supposed to discuss,
i.e. the one of sharing memory (which, btw, you had introduced yourself - no one was even remotely suggesting that the OP should use shared memory). Why can’t you simply say “I was wrong”…

Anton Bassov

>I have never seen a case where shared memory is the right solution, have

thus never used it, and disagree with using it

It can be helpful in latency-sensitive solutions. While it’s true that a
user/kernel switch is rather cheap nowadays in terms of cycles, what is
often overlooked is that a user/kernel transaction can cause a thread
switch to occur. This matters, for instance, if multiple buffers are to be
processed per cycle, which would otherwise require multiple IOCTLs and
user/kernel switches.

A real-time user thread (for instance, one that needs to fill up a buffer)
can run longer than its assigned quantum if it does not perform a
user/kernel switch and no clock interrupt occurs in the meantime.

There are designs in which the user thread never switches into kernel mode
even once (by means of an IOCTL or event) but instead marks a bit in the
shared memory that the driver can pick up to see that the work is done.

Even without any signaling at all, it’s possible for the driver to simply
push whatever is in the buffer to the hardware whenever it’s due.
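To make the idea concrete, here is a minimal user-mode sketch of that “mark a bit in shared memory” scheme, assuming the driver has already mapped a locked-down buffer into the process; the SHARED_RING layout and all names are hypothetical:

#include <windows.h>
#include <string.h>

typedef struct _SHARED_RING {
    volatile LONG ProducerDone;     // 1 = user mode has filled the payload
    UCHAR         Payload[4096];
} SHARED_RING;

// Fill the buffer and publish the "work is ready" bit without a single
// user/kernel transition. InterlockedExchange is a full barrier, so the
// driver observes the payload before it observes the flag.
// (Len is assumed to be <= sizeof(Ring->Payload) in this sketch.)
void SubmitWork(SHARED_RING* Ring, const void* Data, size_t Len)
{
    memcpy(Ring->Payload, Data, Len);
    InterlockedExchange(&Ring->ProducerDone, 1);
}

// The driver side (conceptually) polls from a periodic DPC or a
// hardware-paced routine:
//     if (InterlockedCompareExchange(&Ring->ProducerDone, 0, 1) == 1)
//         ProgramHardware(Ring->Payload);    // hypothetical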

Also, shared memory is always locked down and as such avoids hard
pagefaults. It’s a guaranteed way for a user application to get resident
memory, while VirtualLock is not.
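For reference, the kernel-side pinning that makes such a region stay resident is the usual MDL pattern; a rough sketch (WDM, error handling trimmed, the function name is mine):

#include <ntddk.h>

// Pins the user buffer; once MmProbeAndLockPages succeeds these pages can
// no longer take a hard page fault, which is what makes the shared region
// guaranteed-resident (unlike a VirtualLock'ed one).
NTSTATUS LockSharedBuffer(PVOID UserVa, ULONG Length, PMDL *MdlOut)
{
    PMDL Mdl = IoAllocateMdl(UserVa, Length, FALSE, FALSE, NULL);
    if (Mdl == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    __try {
        // Probe for write access and lock the pages in place.
        MmProbeAndLockPages(Mdl, UserMode, IoWriteAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(Mdl);
        return GetExceptionCode();
    }

    // Keep the MDL around; teardown is MmUnlockPages + IoFreeMdl.
    *MdlOut = Mdl;
    return STATUS_SUCCESS;
}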

//Daniel

> what is often overlooked is that a user/kernel transaction can cause a thread switch to occur.

This matters, for instance, if multiple buffers are to be processed per cycle, which would otherwise
require multiple IOCTLs and user/kernel switches.

A real-time user thread (for instance, one that needs to fill up a buffer) can run longer than its
assigned quantum if it does not perform a user/kernel switch and no clock interrupt occurs in the meantime.

The only situation when your statement is valid is the one where some kernel component either makes a blocking call to KeWaitXXX or signals an event, effectively unblocking some thread of higher priority, and does so transparently to your app. If this is what you mean, then you are correct. However, a suggestion that a kernel call per se may cause a switch is simply wrong - it has been this way under Windows NT right from day one. It might have been true under early UNICes (and under not-so-early Linux as well - IIRC, kernel preemption was not introduced until the 2.6 series). Under those systems a context switch was
considered only upon the return to userland…

Also, shared memory is always locked down and as such avoids hard pagefaults. It’s a guaranteed
way for a user application to get resident memory, while VirtualLock is not.

This is true…

Even if no page faults occur, mapping buffers and copying data may add significant overhead if done on a regular basis, particularly if the amounts of data are large. Page faults will make things an order of magnitude worse. If we take into consideration that we are speaking about a system that may swap out pages for no reason whatsoever, this factor may be a significant one…

Anton Bassov

>The only situation when your statement is valid is the one when some kernel

component makes either a blocking call to KeWaitXXX or signals an event,
However, a suggestion that a kernel call per se may cause a switch is
simply wrong -

Waits and pagefaults are some of those places where Windows checks to see if
there is a better thread candidate to switch to. But the published list of
conditions is said to be non-exhaustive. I believe DeviceIoControl, which is
needed for the inverted call model, triggers that too, perhaps because it
makes use of events under the hood.
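For anyone following along, here is a bare-bones user-mode sketch of such an inverted-call IOCTL issued overlapped; the device name and the IOCTL_MYDRV_WAIT_FOR_EVENT control code are hypothetical:

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

// Hypothetical control code; a real driver would publish this in a header.
#define IOCTL_MYDRV_WAIT_FOR_EVENT \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

int main(void)
{
    // Hypothetical device name; FILE_FLAG_OVERLAPPED is what makes the
    // IOCTL below asynchronous.
    HANDLE hDevice = CreateFileW(L"\\\\.\\MyDriver", GENERIC_READ | GENERIC_WRITE,
                                 0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hDevice == INVALID_HANDLE_VALUE)
        return 1;

    BYTE notification[64];
    DWORD bytes = 0;
    OVERLAPPED ov = { 0 };
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);

    // The driver parks this request and completes it whenever it has
    // something to report - the "inverted call" - so the call returns
    // immediately with ERROR_IO_PENDING instead of blocking the thread.
    if (!DeviceIoControl(hDevice, IOCTL_MYDRV_WAIT_FOR_EVENT,
                         NULL, 0, notification, sizeof(notification),
                         &bytes, &ov)
        && GetLastError() == ERROR_IO_PENDING)
    {
        // Wait here (or use a completion port) until the driver completes it.
        GetOverlappedResult(hDevice, &ov, &bytes, TRUE);
    }

    printf("driver returned %lu bytes\n", bytes);
    CloseHandle(ov.hEvent);
    CloseHandle(hDevice);
    return 0;
}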

//Daniel

> I believe DeviceIoControl, which is needed for the inverted call model, triggers that too,

perhaps because it makes use of events under the hood.

Well, in this sense the use of a shared buffer does not eliminate switches either - after all, an app and
a driver need to synchronize their access to it, right…

Anton Bassov

>A real-time user thread (for instance that needs to fill up buffer) can run longer than its assigned quantum if it does not perform a user/kernel switch and no clock interrupt occurs in the meantime.

Well, how do you think the time slice does expire, if not on a clock interrupt?

>Well, how do you think the time slice does expire, if not on a clock

interrupt?

On a clock interrupt, when calling into a Ke function which affects thread
state, when hitting a pagefault, when getting interrupted by a device
interrupt which completes an I/O request of another process, or possibly
something else.

//Daniel

> DeviceIoControl, which is needed for the inverted call model, triggers that

too, perhaps because it makes use of events under the hood.

Synchronous - yes, overlapped - no.

Neither is a user/kernel transition by itself.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> Well, in this sense the use of shared buffer does not eliminate switches either - after all, an app and

a driver need to synchronize their access to it, right…

Yes, and this is the worst thing about the shared memory.

Shutting it down can also be a problem, but only if it is umode-allocated. If it is a DMA common buffer (which is what the aforementioned WaveRT audio uses) - then no problem.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> Well, how do you think the time slice does expire, if not on a clock interrupt?

:-)

More so: a user/kernel switch does not mean rescheduling.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> >Well, how do you think the time slice does expire, if not on a clock interrupt?

On a clock interrupt or when calling into a Ke function which affects thread state, or
hitting a pagefault or by getting interrupted by a device interrupt which completes an I/O
request of another process, or possibly something else.

Well, you are somehow mixing up voluntary yield (i.e. blocking on a pagefault), forceful preemption by some thread of higher priority (in case of a DPC signaling an event or completing a request), and actual timeslice expiry (i.e. a situation when another thread of the same priority may be considered for being scheduled on a given CPU), all in the same pot. As Alex has already pointed out, the latter may happen only upon a timer interrupt, simply because the system has no notion of time in between two clock ticks…

Anton Bassov

>As Alex has already pointed out, the latter may happen only upon a timer

interrupt, simply because the system has no notion of time in between two
clock ticks…

That was true before Vista, when it used a Monte Carlo way of measuring. It
was unfair because this could also charge time spent in interrupts to a
thread. Nowadays Windows does not rely on a clock-interval-based quantum but
uses an accurate CPU cycle counter for that.
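Side note: that cycle-based accounting is visible from user mode as well, via QueryThreadCycleTime (Vista and later); a quick sketch:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG64 before = 0, after = 0;

    // Vista+ charges threads by CPU cycles rather than by whole clock
    // ticks; the same per-thread cycle counter is readable here.
    QueryThreadCycleTime(GetCurrentThread(), &before);

    volatile unsigned long long x = 0;
    for (int i = 0; i < 10000000; ++i)
        x += i;                              // burn some cycles

    QueryThreadCycleTime(GetCurrentThread(), &after);
    printf("cycles charged to this thread: %llu\n", after - before);
    return 0;
}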

//Daniel

Thanks, Daniel - I did not know that…

Anton Bassov

> thread. Nowadays Windows does not rely on a clock interval based quantum but

uses an accurate CPU cycle counter

How?

Quantum end must be signaled, i.e. as an interrupt.

How can a CPU cycle counter signal an interrupt?


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> Quantum end must be signaled, i.e. as an interrupt.

Well, the only thing that has to be done as far as the quantum is concerned is to check a thread-specific counter
before resuming the thread after the DPC queue has been flushed. The immediate consequence is that, with this approach, you can allow quantum expiry upon ANY interrupt. Furthermore, you can check it upon voluntary yielding as well - when the thread is allowed to run again you can see what the situation with its quantum is, and put it into the corresponding place in the runqueue…

> How can a CPU cycle counter signal an interrupt?

It cannot do it, but it does not have to…

Anton Bassov