Read up on acquire and release memory semantics. Interlocked operations combine a memory barrier (full or partial) with access to one or more variables. IMHO Peter is correct that, except for implementing synchronization primitives or extremely exotic cases, these calls are not useful to the vast majority of programmers.
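As a concrete illustration of acquire/release and of an interlocked-style operation, here is a minimal sketch in portable C11 `<stdatomic.h>` rather than the Windows API: a test-and-set spinlock, i.e. the "implementing synchronization primitives" case. The `demo_spinlock` type and `spin_*` functions are hypothetical names. `atomic_exchange_explicit` plays the role an InterlockedExchange-style call would, doing the read-modify-write and carrying the acquire ordering in one step, while unlock is a plain release store. It is not a replacement for the kernel's own locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} demo_spinlock;

static bool spin_trylock(demo_spinlock *l)
{
    /* One atomic read-modify-write that also carries acquire ordering:
       accesses in the critical section cannot be hoisted above it. */
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Wait with relaxed loads so we don't keep writing the cache line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_unlock(demo_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    /* Release: writes made under the lock are visible before it reads as free. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Acquire on lock and release on unlock is all a lock needs: nothing inside the critical section can move above the acquire or below the release, while unrelated accesses outside it remain free to be reordered.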
From: James Harper
Sent: Tuesday, November 11, 2014 3:54 PM
To: Windows System Software Devs Interest List
Seriously?
To be quite honest, I don’t even know where I’d drop a KeMemoryBarrier(). Sure, I know what it does, but I’ve never known a place where I should code one in a real implementation (other than on something like ARM, of course). I bet I’m not the only one.
Mr. Harper, maybe you’d be willing to write something up for The NT Insider on the practical use of KeMemoryBarrier as a tutorial and warning to help the clueless such as myself regarding this topic?
KeMemoryBarrier() stops both the compiler and the CPU from reordering memory accesses across it (you probably knew that). My driver uses a shared-memory ring between the frontend (Windows) and the backend (Xen dom0). It’s lock-free, so to prevent a race, at the end of the ring-processing loop we do:
1. Check whether the backend has written any more entries to the ring since we last looked, and if so, go around the loop again.
2. If it hasn’t, set the notification variable so the backend will notify us when more data is written.
3. Check again whether the backend has written any more entries while we were doing the above.
So when I exit from the ISR/DPC, I know that the backend will send another event through when there is more data.
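I was missing a KeMemoryBarrier() between #2 and #3, so either the compiler or the CPU was effectively performing the read in #3 before the write in #2 was visible to the backend (x86 permits exactly this store-then-load reordering). The backend could then miss my request for notification while I missed its new entries.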
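The three steps and the barrier between #2 and #3 can be sketched in portable C11 as follows. This is an illustration under stated assumptions, not the actual driver code: the `shared_ring` layout, `RING_SIZE`, and `drain_ring` are hypothetical, loosely following the Xen convention of free-running producer/consumer indices plus an event index the frontend writes to request notification. `atomic_thread_fence(memory_order_seq_cst)` stands in for KeMemoryBarrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; power of two so free-running indices wrap cleanly */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* Hypothetical shared page. The backend advances rsp_prod; the frontend
   writes rsp_event to say "notify me once rsp_prod reaches this value". */
struct shared_ring {
    _Atomic uint32_t rsp_prod;
    _Atomic uint32_t rsp_event;
    uint32_t rsp[RING_SIZE];
};

/* Frontend: consume everything available, then re-arm notification safely.
   *cons is our private consumer index; *sum stands in for real processing.
   Returns the number of entries consumed. */
static unsigned drain_ring(struct shared_ring *sr, uint32_t *cons, uint32_t *sum)
{
    unsigned consumed = 0;

    for (;;) {
        /* Step 1: acquire so entry contents are read only after we see prod. */
        uint32_t prod = atomic_load_explicit(&sr->rsp_prod, memory_order_acquire);
        while (*cons != prod) {
            *sum += sr->rsp[*cons % RING_SIZE];
            (*cons)++;
            consumed++;
        }

        /* Step 2: ask the backend to notify us when it produces entry *cons. */
        atomic_store_explicit(&sr->rsp_event, *cons + 1, memory_order_relaxed);

        /* The missing barrier: the step 2 store must be visible before the
           step 3 load. Acquire/release cannot order a store before a later load. */
        atomic_thread_fence(memory_order_seq_cst);

        /* Step 3: if the backend produced meanwhile, go around again. */
        if (atomic_load_explicit(&sr->rsp_prod, memory_order_relaxed) == *cons)
            return consumed;
    }
}
```

The fence has to be a full one: acquire and release orderings never stop a store from being overtaken by a later load, and that store-then-load pair is exactly steps 2 and 3. The producer needs a matching full barrier between publishing its index and reading the event index, otherwise the race just moves to the other side.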
AFAICS, the only practical use of KeMemoryBarrier() is in processing lock-free data structures. I guess if I used InterlockedXxx() calls, the explicit barriers wouldn’t be needed, but I’m communicating with another VM that doesn’t use InterlockedXxx() calls, and I don’t think you can mix the two. Also, the ring-processing code is mostly macros from a Xen header file that include mb() and wmb() calls (which I macro’d to KeMemoryBarrier()), so I need to be consistent. I’m doing things a bit manually in this case to optimise better: it’s a Storport driver, so I know exactly how many outstanding requests I have.
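For the mb()/wmb() mapping, a portable stand-in plus the producer (backend) half of the ring might look like the sketch below. The `ring_mb`/`ring_wmb` macros, `shared_ring`, and `push_response` are hypothetical names. The C11 release fence is at least as strong as a write-only barrier, so, like mapping everything to KeMemoryBarrier(), it errs on the safe side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical barrier macros. In the driver described above, mb() and wmb()
   were simply defined as KeMemoryBarrier(); these are portable C11 stand-ins. */
#define ring_mb()  atomic_thread_fence(memory_order_seq_cst)
#define ring_wmb() atomic_thread_fence(memory_order_release)

#define RING_SIZE 8u /* hypothetical */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* Hypothetical shared page with free-running indices. */
struct shared_ring {
    _Atomic uint32_t rsp_prod;  /* advanced by the producer (backend) */
    _Atomic uint32_t rsp_event; /* set by the consumer to request a notification */
    uint32_t rsp[RING_SIZE];
};

/* Producer side: publish one response and report whether the consumer asked
   to be notified about it. Assumes the caller already checked for free space. */
static bool push_response(struct shared_ring *sr, uint32_t value)
{
    uint32_t old_prod = atomic_load_explicit(&sr->rsp_prod, memory_order_relaxed);
    uint32_t new_prod = old_prod + 1;

    sr->rsp[old_prod % RING_SIZE] = value;
    ring_wmb(); /* entry contents before the index that publishes them */
    atomic_store_explicit(&sr->rsp_prod, new_prod, memory_order_relaxed);
    ring_mb();  /* new index visible before we look at the consumer's request */

    uint32_t event = atomic_load_explicit(&sr->rsp_event, memory_order_relaxed);
    /* Notify iff event lies in (old_prod, new_prod], using wrapping arithmetic. */
    return (uint32_t)(new_prod - event) < (uint32_t)(new_prod - old_prod);
}
```

The notification test uses unsigned wrap-around, so it keeps working after the indices overflow. Skipping the free-space check is only reasonable when, as with a Storport driver that knows its outstanding request count, overflow is impossible by construction.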
As for article writing, it’s not something I’m really good at.
James
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev