I think putting the “guidance” into the “reference” is a great idea! +1!
Whitepapers have their place, but they should live inside the doc tree, where the reference pages can easily link to them, rather than as a bunch of independently downloadable entities.
That’s my take on it, anyway.
Phil B
Phil Barila | Senior Software Engineer
720.881.5364 (w)
LogRhythm, Inc.
A LEADER 2012 SIEM Magic Quadrant
WINNER of SC Magazine’s 2012 SIEM Best Buy
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Diane Olsen
Sent: Wednesday, January 09, 2013 7:34 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Baffling issue with Queued spin-locks
The current WDK documentation team DEFINITELY sees things differently. In fact, less than an hour ago I was asked to ask this list how they’d feel if, instead of publishing a bunch of new whitepapers this time around, we took that same guidance and integrated it into the online documentation!
One reason for doing this is that, as you say, the information really belongs directly in the WDK docs. The other reason is that keeping a bunch of separate whitepapers around as downloadable Word documents creates a lot of extra maintenance work for us over time.
Thoughts?
–Diane
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Saturday, January 05, 2013 10:53 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Baffling issue with Queued spin-locks
My opinion is that the documentation on queued spin locks is misleading. I’ve been trying to get it corrected for years, and the old documentation team just kept wanting to publish more whitepapers and such, rather than put the right guidance directly in the WDK. Their take on it was that the WDK was mostly reference, and guidance wasn’t reference. We went around on that one a bunch of times. Perhaps the current team will eventually see things differently.
Any design choice is a set of tradeoffs. In order to understand those tradeoffs, it’s often useful to understand the implementations.
Traditional spinlock:
The acquiring thread raises to the IRQL of the lock (usually DISPATCH_LEVEL), disabling the dispatcher (scheduler). No more thread preemptions are possible. The thread then spins using an atomic compare-and-exchange instruction. When the right value is returned, it owns the lock. Both before and after acquiring the lock, the thread can be preempted by interrupts.
This strategy is simple and relatively cheap when lock contention is low. When lock contention is high, it tends to tie up the bus in cache management traffic, as all the waiters are doing atomic compare-and-exchange operations on the same cache line. In a NUMA machine, waiters in the same NUMA node as the owner are much more likely to successfully acquire the lock.
A waiter that is handling an interrupt will never acquire the lock.
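To make that concrete, here is a rough sketch of the traditional scheme in driver-style C. It is purely illustrative, not the kernel’s actual KeAcquireSpinLock implementation; the toy lock type and function names are invented for this example.

    #include <ntddk.h>

    /* Toy model of a traditional spinlock -- illustrative only. */
    typedef struct _TOY_SPINLOCK {
        volatile LONG Owned;                    /* 0 = free, 1 = held */
    } TOY_SPINLOCK;

    VOID ToyAcquire(TOY_SPINLOCK *Lock, KIRQL *OldIrql)
    {
        KeRaiseIrql(DISPATCH_LEVEL, OldIrql);   /* dispatcher off; interrupts still land */

        /* Every waiter does an atomic compare-and-exchange on the same
         * cache line -- the source of the bus traffic described above. */
        while (InterlockedCompareExchange(&Lock->Owned, 1, 0) != 0) {
            YieldProcessor();                   /* spin-wait hint to the CPU */
        }
    }

    VOID ToyRelease(TOY_SPINLOCK *Lock, KIRQL OldIrql)
    {
        InterlockedExchange(&Lock->Owned, 0);   /* release the lock word */
        KeLowerIrql(OldIrql);
    }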
Queued spinlock:
The acquiring thread raises to the IRQL of the lock. The thread then attempts to insert itself at the end of the list of waiters, using a compare-and-exchange operation, making the end of the list point to a local structure. Then it spins reading the local structure to see if it owns the lock. The owner then releases the lock by following the list and assigning ownership to the next waiter by writing into that waiter’s local structure.
This strategy is a little more complex, and no more expensive in the uncontended case. In the heavily contended case, it’s much more efficient, as all the waiters are spinning on different cache lines, and thus very little cache management traffic flows across the bus. Waiters in remote NUMA nodes are no less likely to acquire the lock.
Note, though, that if a waiter is interrupted or if the hypervisor schedules some other virtual processor, it may have ownership of the lock assigned to it while it is still off handling the ISR or while the virtual processor isn’t even running. So all the other waiters will wait longer. (I’m assuming here that these sorts of performance issues are usually more interesting in servers and that servers, to a close approximation, are now virtualized.)
For this reason, we initially only used queued locks within the kernel itself, and only when the lock IRQL was higher than device ISRs (something not typically available to drivers).
Eventually, we exposed queued lock primitives to drivers, and at DISPATCH_LEVEL. The documentation that got written made a lot of people think that queued locks are always a win. They’re not. They’re useful if the lock tends to have a lot of waiters, and if there’s no good way to avoid that situation.
Note also that if there is a whole bunch of threads simultaneously attempting to insert themselves into the list of waiters, the contention looks a lot like a traditional spinlock, since those compare-exchange operations have to happen on the list itself, which is shared between all the processors.
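Here is the queued (MCS-style) idea sketched the same way, again purely illustrative: the real KLOCK_QUEUE_HANDLE internals are opaque, the names below are invented, and the raise-to-DISPATCH_LEVEL step is omitted for brevity. Note that the only shared-state operation on the acquire path is the single exchange on the tail pointer, which is exactly where the insertion contention mentioned above shows up; the spin itself happens on the waiter’s private node.

    #include <ntddk.h>

    /* Toy MCS-style queued lock -- illustrative only. */
    typedef struct _TOY_QNODE {
        struct _TOY_QNODE * volatile Next;
        volatile LONG Wait;                     /* 1 = still waiting for ownership */
    } TOY_QNODE;

    typedef struct _TOY_QLOCK {
        TOY_QNODE * volatile Tail;              /* last waiter in the queue, or NULL */
    } TOY_QLOCK;

    VOID ToyQueuedAcquire(TOY_QLOCK *Lock, TOY_QNODE *Me)
    {
        TOY_QNODE *prev;

        Me->Next = NULL;
        Me->Wait = 1;

        /* Append ourselves to the waiter list with one atomic exchange. */
        prev = (TOY_QNODE *)InterlockedExchangePointer(
                                (PVOID volatile *)&Lock->Tail, Me);
        if (prev == NULL) {
            return;                             /* queue was empty; we own the lock */
        }
        prev->Next = Me;

        /* Spin on our own node -- a private cache line, so waiting
         * generates almost no cross-processor cache traffic. */
        while (Me->Wait) {
            YieldProcessor();
        }
    }

    VOID ToyQueuedRelease(TOY_QLOCK *Lock, TOY_QNODE *Me)
    {
        if (Me->Next == NULL) {
            /* No visible successor: try to swing the tail back to empty. */
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile *)&Lock->Tail, NULL, Me) == Me) {
                return;
            }
            /* A new waiter is mid-insert; wait for it to link itself in. */
            while (Me->Next == NULL) {
                YieldProcessor();
            }
        }
        Me->Next->Wait = 0;                     /* hand ownership to the next waiter */
    }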
As a last thought, I want to point out that the problem of having a virtual processor unscheduled while waiting for queued lock acquisition is important enough that Windows will cede virtual processor time to the hypervisor when a thread has failed to insert itself into the lock queue more than a couple of times. Similarly, it will cede time to the hypervisor when failing to acquire a traditional spinlock. This makes the two lock forms perform more similarly to each other when virtualized.
- Jake Oshins
Windows Kernel Team
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of James Harper
Sent: Saturday, January 5, 2013 4:33 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Baffling issue with Queued spin-locks
> In my disk upper-filter driver, I was using normal spin-locks. Things were working fine (including WHQL tests). Then I stumbled upon “queued spin locks”. Greed had me in its grip on reading that they are more efficient.
> I changed all the ordinary spin-locks to queued spin locks. Things shattered.
> After enabling driver verifier, I get to see an assert which declares a timeout for the DPC watchdog.
> I wonder what went wrong.
Are you sure you changed all of the calls to be queued spinlocks?
Is your PKLOCK_QUEUE_HANDLE allocated on the stack? Have a read of Doron’s blog entry on this http://blogs.msdn.com/b/doronh/archive/2006/03/08/546934.aspx and make sure you aren’t doing any of the stuff he says not to do.
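For what it’s worth, a correct conversion usually looks something like the sketch below; the device context and function names are invented for the example. The two things above are exactly what bite people: the KLOCK_QUEUE_HANDLE must be a local, one per acquisition (never a shared global reused by concurrent callers), and every call site on a given KSPIN_LOCK has to be converted, since mixing the plain and queued acquire/release APIs on the same lock is asking for trouble.

    #include <ntddk.h>

    /* Hypothetical per-device state -- names invented for the example. */
    typedef struct _DEVICE_CONTEXT {
        KSPIN_LOCK Lock;            /* KeInitializeSpinLock'd at setup time */
        LIST_ENTRY PendingList;     /* InitializeListHead'd at setup time */
    } DEVICE_CONTEXT;

    VOID QueuePendingItem(DEVICE_CONTEXT *Ctx, LIST_ENTRY *Entry)
    {
        /* One KLOCK_QUEUE_HANDLE per acquisition, on the caller's stack;
         * it serves as the per-waiter queue node for this acquire. */
        KLOCK_QUEUE_HANDLE lockHandle;

        KeAcquireInStackQueuedSpinLock(&Ctx->Lock, &lockHandle);
        InsertTailList(&Ctx->PendingList, Entry);
        KeReleaseInStackQueuedSpinLock(&lockHandle);

        /* Every other place that touches Ctx->Lock must use the same
         * queued acquire/release pair, not KeAcquireSpinLock. */
    }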
James
NTDEV is sponsored by OSR
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer