Peter, I’ve been wondering whether that change matters. For the record,
there are two visible changes that I know of in the driver contract in
Windows 7.
1) KeSetEvent never returns with the Dispatcher Lock held, because there
is no dispatcher lock any more.
2) Timers may execute on all processors, not just processor zero. Thus
it’s possible that two different timers execute simultaneously. This was
never true before.
Both are side effects of scalability work, and probably quite necessary.
But I’m curious whether you (all of you) think that we’ll see bugs in
drivers as a result of either of these.
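To make the second change concrete: two different timer callbacks that touch the same state can no longer count on being serialized by processor zero, so the state needs its own lock. The following is a minimal user-mode sketch in portable C11, not kernel code. All names (shared_state, timer_a_callback, timer_b_callback) are hypothetical, and the atomic_flag spin lock is only a stand-in for whatever lock a real driver would use (for example a KSPIN_LOCK).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state shared by two different timer callbacks. */
typedef struct {
    atomic_flag lock;    /* stand-in for a kernel spin lock */
    long        pending; /* protected by lock */
} shared_state;

static void state_init(shared_state *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->lock);
    s->pending = 0;
}

static void state_lock(shared_state *s)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void state_unlock(shared_state *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Timer A queues one unit of work. */
static void timer_a_callback(shared_state *s)
{
    state_lock(s);
    s->pending += 1;
    state_unlock(s);
}

/* Timer B drains the queued work and returns how much it took. */
static long timer_b_callback(shared_state *s)
{
    long taken;
    state_lock(s);
    taken = s->pending;
    s->pending = 0;
    state_unlock(s);
    return taken;
}
```

Code that relied on "all timers run on processor zero" instead of a lock like this would be the kind of driver that might break.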
–
Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group
This post implies no warranties and confers no rights.
wrote in message news:xxxxx@ntdev…
> Thanks for this. For me, this is easily the most interesting (and perhaps the most useful) thing I’ve read on the list for the past week.
>
> Great comment about a subtle change in a subtle parameter,
>
> Peter
> OSR
Consider two synchronization (auto-reset) events which protect a shared
buffer in a producer/consumer fashion:
Reader thread:
R1) Read from the buffer
R2) PulseEvent(readEvent,0,TRUE) // Signal writer that reading is done
R3) WaitForSingleObject(writeEvent)
R4) Goto R1
Writer thread:
W1) Write to the buffer
W2) PulseEvent(writeEvent,0,TRUE) // Signal read thread that writing is done
W3) WaitForSingleObject(readEvent) // Wait until reader did reading
W4) Goto W1
With atomic set and wait this will work as expected:
R1-R2-R3-W4-W1-W2-W3-R4-R1-R2-R3-etc.
Without atomic set and wait, the order may become:
R1-R2-W4-W1-W2-R3-W3 -> deadlock
or
R1-R2-R3-W1-W2-R4-R1-R2-W3-R3 -> deadlock
I will give this some more thought and see if I can think of a case with a
true race condition because the set and wait is not atomic.
//Daniel
This is the “missed pulse” problem that often comes up when people
try to implement something similar to POSIX condition variables
using win32 events. Some people attempt to solve it using
SignalObjectAndWait (which internally uses the Wait=TRUE trick)
but this is still not 100% reliable (even pre-win7) because an APC
can temporarily remove the thread from its wait, causing it to
miss a pulse anyway. (This is why PulseEvent is now deprecated,
by the way).
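The missed pulse can be shown with a tiny single-threaded model of the two signalling styles. This is an illustrative sketch only: model_event, model_pulse, model_set and model_wait are hypothetical names, and the model captures just the one property that matters here (a pulse leaves nothing behind when nobody is waiting, while a set on an auto-reset event stays signaled until consumed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool signaled; /* sticky state left behind by a set */
    int  waiters;  /* threads blocked on the event right now */
} model_event;

/* Pulse: releases one waiter if one is blocked right now.
 * The event never stays signaled. Returns the number released. */
static int model_pulse(model_event *e)
{
    assert(e != NULL);
    if (e->waiters > 0) {
        e->waiters--;
        return 1;
    }
    return 0; /* nobody waiting: the pulse is lost */
}

/* Set on an auto-reset event: releases one waiter,
 * otherwise leaves the event signaled for a later wait. */
static int model_set(model_event *e)
{
    assert(e != NULL);
    if (e->waiters > 0) {
        e->waiters--;
        return 1;
    }
    e->signaled = true;
    return 0;
}

/* Wait: returns true if satisfied immediately (consuming the signal),
 * false if the caller would block. */
static bool model_wait(model_event *e)
{
    assert(e != NULL);
    if (e->signaled) {
        e->signaled = false;
        return true;
    }
    e->waiters++;
    return false;
}

/* Replays "signal first, wait second", i.e. the signal arrives
 * before the other thread has started waiting. */
static bool signal_before_wait(bool use_pulse)
{
    model_event e = { false, 0 };
    if (use_pulse)
        model_pulse(&e);
    else
        model_set(&e);
    return model_wait(&e);
}
```

In this model signal_before_wait(true) leaves the waiter blocked, while signal_before_wait(false) lets it through, which is why schemes built on set rather than pulse do not depend on the ordering.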
Drivers can disable APCs so technically it’s possible to construct
a kernel-mode example that seems to work reliably on previous
OS versions and suffers from the missed pulse problem on win7.
However, I think that behavior in this case was not well-defined
to begin with, so it’s not something that drivers should be relying on.
–
Pavel Lebedinsky/Windows Kernel Test
This posting is provided “AS IS” with no warranties, and confers no rights.
On Thu, Aug 27, 2009 at 2:18 AM, Jake Oshins wrote:
> 2) Timers may execute on all processors, not just processor zero. Thus it’s possible that two different timers execute simultaneously. This was never true before.
Hmmm… on the one hand, tracking down system hangs due to somebody blocking processor 0 will not occur, on the other hand identifying and fixing the bug that caused the hang is going to be more difficult. This is an excellent change that will make my life more difficult.
(Also, PulseEvent is broken by design and has no guarantee about actually making any forward progress at all.)
S
In actuality, I think it is better just to introduce a third event here, and, at that point, you can stop worrying whether set-and-wait is atomic or not. Here is how it can be done: A, B and C are synch (i.e. auto-reset) events, with C initialized to the signaled state and A and B to the non-signaled one.
Event C synchronizes access to the buffer, and events A and B ensure that access to the buffer is always sequenced like W-R-W-R-W-R and never like R-R or W-W. With such an approach it just does not matter whether set-and-wait is atomic or not.
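The code that accompanied this post did not survive in the archive, so the following is only one possible reading of the scheme, not the original. It assumes the events are set (sticky) rather than pulsed, that A means "buffer written" and B means "buffer read", and it models both threads as step machines driven by an arbitrary interleaving so that every set and every wait is a separate, non-atomic step. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EV_A, EV_B, EV_C, EV_COUNT };

typedef struct {
    bool   ev[EV_COUNT]; /* sticky auto-reset events: true = signaled */
    int    wpc, rpc;     /* next step of the writer / reader */
    char   log[64];      /* order of buffer accesses: 'W' or 'R' */
    size_t len;
} model;

/* A wait succeeds only if the event is signaled, and consumes the signal. */
static bool try_wait(model *m, int e)
{
    if (!m->ev[e]) return false;
    m->ev[e] = false;
    return true;
}

/* Writer: wait C, write, set C, set A, wait B, repeat. */
static void writer_step(model *m)
{
    switch (m->wpc) {
    case 0: if (!try_wait(m, EV_C)) return; break;
    case 1: m->log[m->len++] = 'W'; break;
    case 2: m->ev[EV_C] = true; break;
    case 3: m->ev[EV_A] = true; break;
    case 4: if (!try_wait(m, EV_B)) return; break;
    }
    m->wpc = (m->wpc + 1) % 5;
}

/* Reader: wait A, wait C, read, set C, set B, repeat. */
static void reader_step(model *m)
{
    switch (m->rpc) {
    case 0: if (!try_wait(m, EV_A)) return; break;
    case 1: if (!try_wait(m, EV_C)) return; break;
    case 2: m->log[m->len++] = 'R'; break;
    case 3: m->ev[EV_C] = true; break;
    case 4: m->ev[EV_B] = true; break;
    }
    m->rpc = (m->rpc + 1) % 5;
}

/* Runs one pseudo-random interleaving chosen by seed. Returns true if
 * the first `accesses` buffer accesses strictly alternate W, R, W, R. */
static bool run_interleaving(unsigned seed, size_t accesses)
{
    model m = { .ev = { [EV_C] = true } }; /* only C starts signaled */
    assert(accesses < sizeof m.log);
    for (unsigned tick = 0; tick < 100000u && m.len < accesses; tick++) {
        seed = seed * 1103515245u + 12345u; /* simple LCG picks a thread */
        if ((seed >> 16) & 1u)
            writer_step(&m);
        else
            reader_step(&m);
    }
    if (m.len < accesses) return false; /* stalled */
    for (size_t i = 0; i < m.len; i++)
        if (m.log[i] != (i % 2 == 0 ? 'W' : 'R')) return false;
    return true;
}
```

Because each signal stays set until the other side consumes it, the order in which the two threads reach their waits does not change the outcome in this model.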
<mr. oshins> 2) Timers may execute on all processors, not just processor zero. Thus it’s possible that two different timers execute simultaneously. This was never true before. </mr.>
Any chance we will see this service-packed into NT6-es prior to Win7?
For what it is worth, I would have considered it a bug in a code review to
rely on this behavior, and since timers often must synchronize with other
activity potentially on other processors, it seems only vaguely accessible
as a ‘behavior’ to be leveraged.
OTOH the new behavior is great. I wish it was so in all revs of the OS.
I really doubt that we’d put that in a service pack. We’d have little
incentive and quite a bit of risk.
As it happens, this change went in to make life easier on the scheduler in
our hypervisor. The hypervisor in Windows Server 2008 (not R2) is not
bottlenecked on timer processing. VMWare does (mostly) gang-scheduling, so
they won’t really care. It’s only after we did some performance work and
broke other bottlenecks that spreading out timer delivery mattered for the
hypervisor in Windows Server 2008 R2 (Win 7).
–
Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group
This post implies no warranties and confers no rights.
Another contract change, introduced in Server 2008 R2, is support for more than 64 processors. This has 2 visible changes:
1) MAXIMUM_PROCESSORS is now meaningless.
2) Suppose a driver was using the current processor’s index in the affinity mask as an index into its own data structure (such as an array which has MAXIMUM_PROCESSORS elements). The index will never exceed MAXIMUM_PROCESSORS, so you will not go off the end of the array. However, 2 concurrent threads of execution (be it dispatch routines, DPCs, whatever) can both have the same index but be in different processor groups, thus invalidating the assumption that the index can be used lock-free, exclusively, or as some type of unique id.
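A small portable model of the collision described in the second point. The proc_number struct only mirrors the (group, number) shape of the WDK PROCESSOR_NUMBER type so that the example compiles anywhere. PROCS_PER_GROUP and the flat_index formula are assumptions for illustration (real groups need not be full); a real driver would ask the kernel for a system-wide index, which to my knowledge is what KeGetCurrentProcessorNumberEx returns.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the (group, number) shape of PROCESSOR_NUMBER; hypothetical. */
typedef struct {
    uint16_t group;  /* processor group */
    uint8_t  number; /* index within the group, 0..63 */
} proc_number;

#define PROCS_PER_GROUP 64u /* assumption: every group is full */

/* What pre-group code effectively used: the index within the group. */
static unsigned legacy_index(proc_number p)
{
    return p.number;
}

/* A system-wide index that stays unique across groups (illustrative). */
static unsigned flat_index(proc_number p)
{
    assert(p.number < PROCS_PER_GROUP);
    return p.group * PROCS_PER_GROUP + p.number;
}
```

Processor 5 in group 0 and processor 5 in group 1 get the same legacy_index, so two concurrently running routines would share one array slot, while flat_index keeps them apart.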
d
Is there a document that describes how a driver should maintain per-cpu
data under this new regime? Or is doing that a no-no from windows 7
onwards?
There is a document on WHDC; I do not have access to a PC to check for the exact link. There are notifications about processor changes that you can subscribe to, and you can use them to dynamically reallocate structures to account for the change in the number of procs.
d
Sent from my phone with no t9, all spilling mistakes are not intentional.
It has these tantalizing headings in its table of contents:
Kernel-Mode Driver Modifications
Per-Processor Data Structures
Static Array
Dynamic Array
It’s an interesting read. Thanks.
It also contains the first use I have ever come across of the word
“affinitizes”, as in the dot point “Is performance critical and
affinitizes interrupt and deferred procedure call (DPC) workload beyond
the first 64 processors.”.
What would be really nice is a layer of abstraction above this all for
per-cpu data storage, eg something like:
not too hard to write your own of course, but harder to make it future
proof for changes like Windows 7.
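The snippet from the original post was not preserved. As a rough idea of what such a layer could look like, here is a hypothetical user-mode sketch in portable C. percpu_area, percpu_init, percpu_slot and percpu_free are made-up names, and SLOT_ALIGN is an assumed cache-line size. In a driver the slot count and the index would come from the kernel's group-aware queries (to my knowledge KeQueryMaximumProcessorCountEx and KeGetCurrentProcessorNumberEx), and the allocation would come from nonpaged pool rather than calloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOT_ALIGN 64u /* assumed cache-line size, to limit false sharing */

typedef struct {
    unsigned char *base;   /* one padded slot per processor */
    size_t         stride; /* item size rounded up to SLOT_ALIGN */
    unsigned       count;  /* number of slots */
} percpu_area;

/* Allocates zeroed storage for cpu_count processors. Returns 0 on success. */
static int percpu_init(percpu_area *a, unsigned cpu_count, size_t item_size)
{
    assert(a != NULL && cpu_count > 0 && item_size > 0);
    a->stride = (item_size + SLOT_ALIGN - 1) / SLOT_ALIGN * SLOT_ALIGN;
    a->count  = cpu_count;
    /* Slots are padded to SLOT_ALIGN; the base address itself is only
     * as aligned as calloc makes it. */
    a->base   = calloc(cpu_count, a->stride);
    return a->base != NULL ? 0 : -1;
}

/* Returns the slot for a system-wide processor index, or NULL if the
 * index is out of range (e.g. a processor added after init). */
static void *percpu_slot(const percpu_area *a, unsigned cpu_index)
{
    assert(a != NULL);
    if (cpu_index >= a->count)
        return NULL;
    return a->base + (size_t)cpu_index * a->stride;
}

static void percpu_free(percpu_area *a)
{
    assert(a != NULL);
    free(a->base);
    a->base  = NULL;
    a->count = 0;
}
```

Sizing the area at run time, rather than with a compile-time constant such as MAXIMUM_PROCESSORS, is the part that would need to track OS changes.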
My drivers maintain some per-cpu data, and from reports I’ve had, the
‘winlh’ build runs just fine under Windows 7, but I suspect that it
wouldn’t under a >64 CPU system, or even maybe NUMA. That said, my
drivers run in a VM under Xen, and I suspect that if you are trying to
solve a problem by creating a VM with more than 64 CPUs then you are
doing it wrong.
> It also contains the first use I have ever come across of the word
> “affinitizes”, as in the dot point “Is performance critical and
> affinitizes interrupt and deferred procedure call (DPC) workload beyond
> the first 64 processors.”.
Noah Webster is rolling over in his grave. Microsoft has a bad habit of
inventing new words by verbizing their nouns.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
On Fri, 28 Aug 2009 19:05:40 +0200, Tim Roberts wrote:
> Noah Webster is rolling over in his grave. Microsoft has a bad habit of inventing new words by verbizing their nouns.
Is this a deliberate joke or not? According to Merriam-Webster’s Online Dictionary, the word you mean is “verbalize”.
Weeeeellll… Maybe. But “affinitize” doesn’t bother me. It’s a useful word, immediately descriptive of its meaning. In fact, I was surprised not to find “affinitize” in the dictionary. This is how the language grows. Perhaps it makes sense as a computer science only term?
I don’t know, it all comes down to a matter of personal taste, doesn’t it?
I DO have lots of trouble with other word uses: “Actionable” feedback is one of my pet peeves… though, believe it or not, the non-legal meaning has found its way into at least one dictionary. And I’ve often complained, to anyone who’d listen, of what I find to be Windows XP’s most annoying message EVER: “Windows is starting up”. I didn’t realize “up” needed to be started. Who wrote this, and why weren’t they satisfied with “Windows is starting”?
‘Actionable’ is a truly irritating word indeed. Although unrelated, to me it always falls solidly into the set of really egregious managerial/hr-ish buzzword expressions that have become the preferred means of those everywhere looking to communicate clearly and simply, ‘hey, this is all bullshit.’
Short list:
actionable
compliance/compliant (perhaps the worst of them all in my book)
tco
roi
coaching
mentoring
leadership
values
core values
vision
aware
independent
transparent
…
As the person responsible for the word “affinitize” appearing in said document, I was encouraged that Bing turned up 3,840 references to the word. Most are in computer science but refer to the affinity of all manner of one thing to another. Some are in chemistry and other fields. And a few refer to this word as a neologism.
I also just think it’s a great word. But now I’m just justifying myself.