KeWaitForSingleObject() can only reach 0.01sec timeout

> more likely 10-year scenario is a computer with 100 processors, each one
> executing a single process without EVER yielding the CPU.

… and then if one of them does spin at almost 100% (interrupts disabled, 1 thread), nobody cares…

-------------- Original message --------------
From: Tim Roberts

> Mike Kemp wrote:
> >
> > But I bet MS and Intel are already designing a future system to bring
> > context switching forward from the 1980’s to this century. I imagine
> > that in 5 years all our current apps will be running in a virtual
> > legacy machine so they can continue to work so slow, while our next
> > apps will cruise along in the fast lane, perhaps with a ns of
> > processing here and there when they really need it… :-)
>
> I think your VM vision is likely, but I’m not sure the context switching
> revolution you envision will EVER happen. We’re talking about a
> mechanism to make the user think that his computer is doing 100 things
> at once. Human reaction times have not changed since the 1980s, and are
> unlikely to change significantly in the foreseeable future. Indeed, a
> more likely 10-year scenario is a computer with 100 processors, each one
> executing a single process without EVER yielding the CPU.
>
>
> > (oh and driver kernel support that runs at an untrusted level to keep
> > the machine safe but driver enabled, and - but sorry, time to wake
> > up… the VS SP1 has finally downloaded!)
>
> Be aware that it is not universally agreed that VS SP1 actually
> represents an improvement to VS2005.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.

anton bassov wrote:

In order to make KeStallExecutionProcessor() work the way MSDN describes it, maskable interrupts have to be disabled. It looks like MSFT “forgot” this part…

I don’t think they forgot. The function name and the WDK description might be a bit misleading, but I think that is the intended behavior. And this is how most uSleep implementations work.

If you want the “other” behavior, you can always raise IRQL before calling KeStallExecutionProcessor.
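Roughly, that would look like the sketch below (illustrative only: the choice of HIGH_LEVEL and the helper name are mine, and the stall should still be kept to a few tens of microseconds):

```
#include <ntddk.h>

/*
 * Illustrative sketch only: stall without being preempted or interrupted
 * by maskable interrupts on this processor.  KeStallExecutionProcessor is
 * meant for very short delays, so keep Microseconds small.
 */
VOID StallWithInterruptsMasked(ULONG Microseconds)
{
    KIRQL oldIrql;

    KeRaiseIrql(HIGH_LEVEL, &oldIrql);        /* mask maskable interrupts */
    KeStallExecutionProcessor(Microseconds);  /* busy-wait for the interval */
    KeLowerIrql(oldIrql);                     /* restore the previous IRQL */
}
```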

KeStallExecutionProcessor() is based upon an infinite loop that executes the RDTSC instruction

Not really significant to the issue, but note that the exact timer used by the routine is different depending on the HAL.
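Just to sketch the general idea being discussed (this is not the actual HAL code, the real time source varies by HAL as noted, and the calibration variable here is hypothetical):

```
#include <ntddk.h>
#include <intrin.h>

/*
 * Hypothetical sketch of an RDTSC-based busy-wait along the lines
 * described above.  TscTicksPerMicrosecond is assumed to have been
 * calibrated elsewhere, and the TSC is assumed to tick at a fixed rate -
 * neither is guaranteed on real hardware.  If an interrupt or a context
 * switch occurs inside the loop, the routine simply returns later than
 * requested.
 */
extern ULONG64 TscTicksPerMicrosecond;

VOID RdtscStall(ULONG Microseconds)
{
    ULONG64 target = __rdtsc() +
                     (ULONG64)Microseconds * TscTicksPerMicrosecond;

    while (__rdtsc() < target) {
        _mm_pause();    /* CPU hint; the thread is still spinning */
    }
}
```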

> I don’t think they forgot. The function name and the WDK description might be a
> bit misleading, but I think that is the intended behavior.

Actually, the only thing I am saying is that they should have documented its behaviour properly…

If you want the “other” behavior, you can always raise IRQL before calling
KeStallExecutionProcessor.

Sure - the only problem is that MSDN does not even suggest that this step may ever be required, so you just would not know it unless you disassemble KeStallExecutionProcessor()

Not really significant to the issue, but note that the exact timer used by the
routine is different depending on the HAL.

Actually, it is quite significant…

For example, on a UP HAL KeQueryPerformanceCounter() relies upon the system timer, rather than the RDTSC instruction. There is a good chance that the same holds true for KeStallExecutionProcessor() as well. If this is the case, then the minimal period you can expect is already measured in ms…
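One way to check what a particular HAL actually delivers is simply to measure it. A rough kernel-mode sketch (the 50 µs request and the DbgPrint output are purely illustrative):

```
#include <ntddk.h>

/*
 * Rough sketch: measure how long KeStallExecutionProcessor(50) really
 * takes on the current HAL, using KeQueryPerformanceCounter as the clock.
 */
VOID MeasureStall(VOID)
{
    LARGE_INTEGER freq, before, after;

    before = KeQueryPerformanceCounter(&freq);
    KeStallExecutionProcessor(50);               /* ask for ~50 us */
    after  = KeQueryPerformanceCounter(NULL);

    DbgPrint("Stall took %I64d us (counter frequency %I64d Hz)\n",
             ((after.QuadPart - before.QuadPart) * 1000000) / freq.QuadPart,
             freq.QuadPart);
}
```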

I believe MSDN should have just explained all that - otherwise, people may start using KeStallExecutionProcessor() when they expect real-time results…

Anton Bassov

Thanks for the comments. I appreciate the insight into the OS operation and the lines of thought of the experts here… I’ve just a few notes in the hope of encouraging the OS designers to consider some speed issues…

It’s not “very useful”. It might be useful in certain rather specific
situations, but a reliance on them often indicates a misunderstanding of
the problem, or an attempt to work around buggy hardware that should
probably be hidden in a kernel driver anyway.

It would be very useful to me. I had understood that the best advice was to do everything possible in user mode, and the least you can get away with (preferably nothing) in the kernel. I don’t think we should be forced to write a driver just to do this. An example is a device I have that I address over Ethernet, so I don’t have to write a driver, but I have to throttle data to it to one packet every ms or so. How can I do that without hogging the CPU? What I do now is loop on the high-precision timer with a Sleep(0) so I can yield, but the user still sees 100% CPU (see the sketch below). Note this is not a real-time requirement, but it is a time-sensitive requirement that seems to fall foul of the current user-mode API.
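Roughly what that loop looks like (a user-mode sketch; the packet-send routine and the 1 ms figure are placeholders):

```
#include <windows.h>

static void SendOnePacket(void)
{
    /* hypothetical placeholder for the real "send one packet" call */
}

/*
 * Sketch of the throttle described above: send a packet, then spin on
 * QueryPerformanceCounter (yielding with Sleep(0)) until ~1 ms has
 * passed.  It works, but the spinning is why the user sees 100% CPU.
 */
void ThrottledSendLoop(void)
{
    LARGE_INTEGER freq, last, now;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&last);

    for (;;) {
        SendOnePacket();

        for (;;) {                      /* wait ~1 ms without blocking */
            QueryPerformanceCounter(&now);
            if ((now.QuadPart - last.QuadPart) * 1000 >= freq.QuadPart)
                break;
            Sleep(0);                   /* yield, but keep spinning */
        }
        last = now;
    }
}
```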

the cost of task switching is amortized over the time each process has to do useful work.

An ideal candidate for some hardware support then. Where’s my Duodecium chip?

KeStallExecutionProcessor() is based upon an infinite loop that executes the RDTSC instruction… Therefore… context switches… may occur while this loop runs. There is no guarantee that KeStallExecutionProcessor() will return strictly after the requested interval elapses - some delays are possible… And this is how most uSleep implementations work.

This is great news - all we need is an implementation of uSleep() in user space and my problem at least is solved (BTW the code works fine wholly in user space on OSX using uSleep()).

I’m not sure the context switching revolution you envision will EVER happen

Arthur C. Clarke’s nth Law: “When an eminent scientist tells you something is possible he is almost certainly right; when he tells you it is not possible he is almost certainly wrong.”

Human reaction times have not changed since the 1980s, and are
unlikely to change significantly in the foreseeable future

Funny how my computer is always just a bit too slow for me then. Always was (I hope not always will be). Little things, like wanting what I type (e.g. in Word) to appear immediately, without fail. I wonder if anyone else has examples of computers not being fast enough for the human using them, or is it just me? Also my hearing is pretty sensitive to context switches in a data stream!

a more likely 10-year scenario is a computer with 100 processors, each one
executing a single process without EVER yielding the CPU.

Quite possible on the first part, but as long as we can have more threads than processors there will still be task switching. I use 256 threads at one point.

M

anton bassov wrote:

For example, on UP HAL KeQueryPerformanceCounter() relies upon the system timer,
rather than RDTSC instruction. There is a good chance that the same holds true for
KeStallExecutionProcessor() as well. If this is the case, then the minimal period you
can expect is measured already in ms…

Any timer used by KeQueryPerformanceCounter has a resolution of ~us or better.

Specifically, the legacy PIT 8254 timer has a resolution of ~0.84 us

xxxxx@rahul.net wrote:

anton bassov wrote:

> For example, on UP HAL KeQueryPerformanceCounter() relies upon the system timer,
> rather than RDTSC instruction. There is a good chance that the same holds true for
> KeStallExecutionProcessor() as well. If this is the case, then the minimal period you
> can expect is measured already in ms…
>

Any timer used by KeQueryPerformanceCounter has a resolution of ~us or better.

Specifically, the legacy PIT 8254 timer has a resolution of ~0.84 us

Yes, but there is a caveat. The 8254 sits on a legacy I/O port, and it
can actually take multiple microseconds to read the countdown timer
value. It is a *very* expensive operation, relatively speaking.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
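To make the cost concrete: getting the 8254’s current count takes a latch command plus two data reads on legacy ports 0x43/0x40, and each of those port cycles is slow compared with reading a CPU register. A sketch of what such a read looks like (illustrative only; normally only the HAL touches these ports):

```
#include <ntddk.h>

/*
 * Illustrative sketch: latch and read counter 0 of the legacy 8254 PIT.
 * Each port access below is a separate legacy I/O cycle, which is what
 * makes polling this timer so expensive compared with RDTSC.  Only the
 * HAL should really be touching these ports.
 */
USHORT ReadPitCounter0(VOID)
{
    UCHAR low, high;

    WRITE_PORT_UCHAR((PUCHAR)(ULONG_PTR)0x43, 0x00);  /* latch counter 0 */
    low  = READ_PORT_UCHAR((PUCHAR)(ULONG_PTR)0x40);  /* low byte of count */
    high = READ_PORT_UCHAR((PUCHAR)(ULONG_PTR)0x40);  /* high byte of count */

    return (USHORT)((high << 8) | low);
}
```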

Tim Roberts wrote:

> Specifically, the legacy PIT 8254 timer has a resolution of ~0.84 us
>

Yes, but there is a caveat. The 8254 sits on a legacy I/O port, and it
can actually take multiple microseconds to read the countdown timer
value. It is a *very* expensive operation, relatively speaking

That’s very true. I was curious and checked a couple of UP HALs.

Windows 2000 SP4 UP (non ACPI):

KeQueryPerformanceCounter: PIT
KeStallExecutionProcessor: software pre-calibrated loop

Windows XP SP2 UP (ACPI):

KeQueryPerformanceCounter: PM/ACPI timer
KeStallExecutionProcessor: PM/ACPI timer

XP uses an indirect pointer; the actual timer routine used is decided at runtime.

The software loop used by W2K is an interesting idea. On the one hand, it is very cheap and would be very accurate if not interrupted; on the other hand, it is the method most affected by interrupts.
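As a rough illustration of that idea (not the actual W2K code; the calibration variable here is assumed to be set once at boot against a real timer):

```
#include <ntddk.h>

/*
 * Hypothetical sketch of a pre-calibrated delay loop in the style
 * described above.  LoopsPerMicrosecond is assumed to have been measured
 * once at boot against a hardware timer.  The loop costs nothing to read
 * (no port I/O, no MSRs), but it only counts iterations, not elapsed
 * time, so any interrupt taken during the loop stretches the delay.
 */
extern ULONG LoopsPerMicrosecond;

VOID CalibratedStall(ULONG Microseconds)
{
    volatile ULONG64 spin;
    ULONG64 total = (ULONG64)Microseconds * LoopsPerMicrosecond;

    for (spin = 0; spin < total; spin++) {
        /* burn a known number of iterations */
    }
}
```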