question about KeStallExecutionProcessor

The description of this function in the WDK documentation is:

KeStallExecutionProcessor is a processor-dependent routine that busy-waits for at least the specified number of microseconds, but not significantly longer. This routine is for use by device drivers and other software that must wait for an interval of less than a clock tick but more than a few instructions.

As far as I know, a clock tick is ten nanoseconds, which is less than a microsecond.
The first sentence says “at least the specified number of microseconds”, but the second sentence speaks of “an interval of less than a clock tick”. Aren’t these two statements incompatible?

Thanks for your reply.

Clock ticks are generally on the order of 10 milliseconds* and not 10 nanoseconds.

*: As a general ballpark value. The clock tick interval on an actual machine may vary.

-S (Msft)

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of daedae11
Sent: Sunday, November 27, 2011 11:02 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] question about KeStallExecutionProcessor


—
NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

No, the clock tick mentioned in the documentation is indeed the normal OS
timer tick (measured in milliseconds).
The ten nanoseconds you mention is just the unit used to express time
intervals in various DDIs (strictly, 100-nanosecond units); it does not
correspond to anything physical.
If you need an accurate delay in nanoseconds, Windows does not offer a
ready solution. Roll your own.

Regards,
– pa


It is pretty clear from the description: KeStallExecutionProcessor is a busy spin loop. Clock resolution is not much of a factor because you are not waiting on a dispatcher event and/or yielding the processor back to the scheduler; you are holding on to the CPU for the duration of the wait. This is why this API is only appropriate for very short wait intervals - anything longer and you are not playing nice with the needs of the rest of the OS (kernel, drivers, apps) by hogging the CPU.

d

debt from my phone


From: Pavel A.
Sent: 11/27/2011 11:22 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] question about KeStallExecutionProcessor


Actually, on most modern machines, the clock tick is 15ms. The clock tick
on uniprocessor x86 systems was 10ms, and on multiprocessor x86 systems is
15ms. On other (no longer supported) Windows platforms, there were other
values used, but they don’t matter. In general, anything that depends on
the actual clock tick resolution and makes any assumption is probably BAD
(Broken As Designed) code. What I would tell my students in my app-level
systems programming course is that “Modern operating systems have a simple
concept of time: time moves forward. Any dependency on realtime behavior
in modern operating systems (except RTOS systems) is problematic. Between
the scheduler, OS overheads, and other factors, there isn’t a whole lot
more you can tell about time.”

I know of no system that has a 10ns tick; even on modern highly-pipelined
asynchronous opportunistic execution engines, that is only enough time to
execute between 30 and 80 instructions, assuming cache hits are optimal.
I’ve not seen a modern OS that supports clocks faster than 5ms, and that
was an exceptional one for a figure that low. General-purpose operating
systems lived for years with 50/60Hz clocks (line frequency), and even
that was pushing some of the older architectures.
joe


My impression would be: if you need an accurate delay in nanoseconds,
redesign your hardware or driver. There’s something deeply wrong with
the world view of the hardware designer. KeStallExecutionProcessor has
always been, to my mind, an indication of a failure of the hardware
design.

By the way, getting an accurate spin loop at fine resolutions is a real
challenge. It has to work across a variety of chip sets, a variety of CPU
architectures, etc. You will need to understand EXACTLY how the
speculative execution works, how the caches work, deal with interrupt
preemption, and understand what a “serializing instruction” is and does.
Professional programmers on a closed course. Do not try this at home.

(There was a time in my life when I had to do this, and it was difficult
even without opportunistic asynchronous pipelining without caches, to
cover all models of the mainframe that might be running the device, which
was truly badly designed).
joe

No, the clock tick mentioned in the docum is indeed the normal OS timer
tick
(measured in milliseconds).
Ten nano is just the resolution of time interval for various DDI, it does
not correspond to anything physical.
If you need accurate delay in nanoseconds, Windows does not offer a ready
solution. Roll your own.

Regards,
– pa

“daedae11” wrote in message news:xxxxx@ntdev…
>> The discription in WDK document about this function is:
>>
>> KeStallExecutionProcessor is a processor-dependent routine that
>> busy-waits
>> for at least the specified number of microseconds, but not significantly
>> longer.This routine is for use by device drivers and other software that
>> must wait for an interval of less than a clock tick but more than for a
>> few instructions.
>>
>> As far as I know, a clock tick is ten nanosecond which is less than a
>> microseconds.
>> The first sentence says “at least the specified number of microseconds”.
>> However, the second sentence become “an interval of less than a clock
>> tick”. Isn’t them incompatible?
>>
>> Thanks for your reply.
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

Hence the statement about the clock tick interval being roughly on the order of (but not necessarily exactly) 10ms.

-S (Msft)

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@flounder.com
Sent: Sunday, November 27, 2011 11:35 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] question about KeStallExecutionProcessor


> In general, anything that depends on the actual clock tick resolution
> and makes any assumption is probably BAD (Broken As Designed) code.
> What I would tell my students in my app-level systems programming
> course is that “Modern operating systems have a simple concept of
> time: time moves forward. Any dependency on realtime behavior in
> modern operating systems (except RTOS systems) is problematic.
> Between the scheduler, OS overheads, and other factors, there isn’t a
> whole lot more you can tell about time.”

True, but consider the needs of audio apps, games and other applications where correct timing is an essential parameter - they are still meant to run under a GPOS that cannot provide ANY guarantees related to time measurement, other than “the maximal possible precision is, say, 10-15 ms - you cannot get anything more precise than that”…

Anton Bassov

On 28/11/2011 07:34, xxxxx@flounder.com wrote:

> I know of no system that has a 10ns tick; even on modern highly-pipelined
> asynchronous opportunistic execution engines, that is only enough time to
> execute between 30 and 80 instructions, assuming cache hits are optimal.
> I’ve not seen a modern OS that supports clocks faster than 5ms, and that
> was an exceptional one for a figure that low. General-purpose operating
> systems lived for years with 50/60Hz clocks (line frequency), and even
> that was pushing some of the older architectures.

On FreeBSD it’s common to see: “Timecounters tick every 1.000 msec”; a
few years ago I also worked on a project which used a 1ms tick under
VxWorks.


Bruce Cran

The problem is actually more complex. Consider the architecture of many
RTOS schedulers. “Repeated” actions are considered time-critical for
correct behavior, and “one-time” events are lower priority. So, if I’m
doing computer-based music, where two percussion sounds must be
synchronized within 1ms but happen only once during the piece, these are,
by classic RTOS schedulers, irrelevant, and may or may not happen in sync.
But updating the SMPTE frame counter on the screen? Well, that has to
happen 29.97 times per second, and is therefore considered critical.

Many of the features that make real-time music work are handled either in
drivers, or by using large buffers, or by using special multimedia
interfaces which essentially live “outside” the normal scheduling
mechanism.

I have written and maintained real-time multimedia code on a variety of
platforms, and it has essentially always been a nightmare. One system was
called “Real-time Unix” and its claim to fame was a 10ms timer. By our
requirements, this was about two orders of magnitude too coarse for
serious computer music apps (100us starts to get within bounds, although
we could live with 250us).
joe


> Many of the features that make real-time music work are handled either in
> drivers, or by using large buffers, or by using special multimedia
> interfaces which essentially live “outside” the normal scheduling
> mechanism.

Well, although you can write a custom microkernel/HAL that runs the GPOS
as its lowest-priority task, this “RTOS” must be self-contained, i.e. it
cannot rely on any services that the GPOS provides, for understandable
reasons. However, playing music invariably involves dealing with either
storage or the network, because the actual data to be played must come
from somewhere, which means GPOS services are still required.

Therefore, the “GPOS as a task in an RTOS” concept does not seem to apply
here - you are still forced to rely (at least partly) upon the scheduler
that the GPOS provides…

Anton Bassov

On 28-Nov-2011 09:34, xxxxx@flounder.com wrote:

> Actually, on most modern machines, the clock tick is 15ms. The clock tick
> on uniprocessor x86 systems was 10ms, and on multiprocessor x86 systems is
> 15ms.

This “15 ms” is actually ~15.625 ms: the default clock interrupt fires
64 times per second, and 1/64 s = 15.625 ms. If you run a sequence of
Sleep(1) calls, the difference in GetTickCount() around each Sleep()
alternates between 15 and 16 ms.

– pa

“By our requirements, this was about two orders of magnitude too coarse for
serious computer music apps (100us starts to get within bounds, although
we could live with 250us)”

Was there a requirement for maintaining phase-coherent C of the 5th octave (about 4 kHz)?

By the way, a typical band whose members stand 3 m apart can’t play in sync better than 10 ms anyway - sound itself takes roughly that long to cross the stage.

On Mon, 28 Nov 2011 08:34:38 +0100, wrote:
> I’ve not seen a modern OS that supports clocks faster than 5ms, and that
> was an exceptional one for a figure that low.

NT has supported clock intervals lower than 5 ms for a long time:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff545614.aspx
http://technet.microsoft.com/en-us/sysinternals/bb897568
“Inside Windows NT High Resolution Timers”

Most, if not all, multimedia applications make use of this.

xxxxx@broadcom.com wrote:

> “By our requirements, this was about two orders of magnitude too coarse for
> serious computer music apps (100us starts to get within bounds, although
> we could live with 250us)”
>
> Was there a requirement for maintaining phase-coherent C of 5th octave (about 4 kHz?).
>
> By the way, a typical band staying at 3m distance can’t play with sync better than 10ms anyway.

No, but as a keyboard player, I know that if I play a MIDI keyboard into
a PC and listen to the results out of the speaker, I will be
subconsciously annoyed by delays of far less than 10ms.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> I’ve not seen a modern OS that supports clocks faster than 5ms

timeBeginPeriod(1) will set the Windows clock to 1 ms.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

“I will be subconsciously annoyed by delays of far less than 10ms.”

Middle A is 440 Hz, and its period is 2.2 ms. A typical attack for this note will be a few periods. Say 10 ms.

wrote in message news:xxxxx@ntdev…
> “I will be subconsciously annoyed by delays of far less than 10ms.”
>
> Middle A is 440 Hz, and its period is 2.2 ms. A typical attack for this
> note will be a few periods. Say 10 ms.
>

The speed of sound is only weakly influenced by its frequency (or phase),
depending on the medium. A “typical attack” depends on the instrument you
are playing, not on the frequency of the note.

//Daniel

“The speed of sound is only weakly influenced by its frequency (or phase)
depending on the medium. “typical attack” is dependent on the instrument you
are playing, not on the frequency of the note.”

The point I’m making is that the “window of audible uncertainty” for middle notes is over 5 ms anyway, even if you’re the one playing the instrument. Which I would not call “far less than 10 ms”. Also, keyboard instruments have pretty steep attacks, although the piano is not the steepest (which would perhaps be a clavichord).

The usual solution is buffering. If playing audio or video from either a
network or disk source, simply ensure that at least one of these periods
of data is buffered, and the user experiences a slightly delayed but
internally consistent result. Few users care about a 15-30 ms delay in
receiving AV data, but most will complain if that data jitters by even a
tenth of that amount. Game developers have a much more difficult problem
IMHO, as they need to respond to input as closely as possible to the
generation of output. This tends to lead to designs based on busy loops.
Back in the good ol’ days, game authors for DOS simply used interrupt
handlers for keystrokes etc. and had more or less deterministic results,
but now that the OS wants to control the HW, they have a much tougher
time.
