std::atomic<unsigned> vs. InterlockedXxx

Since C++11 we have std::atomic and I wonder if I could use it instead of InterlockedXxx. Let’s assume that I only want to count, but do it in a thread-safe way. I might write:

void test1(volatile LONG& a) {
   InterlockedIncrement(&a);
}

Or, by using std::atomic:

void test2(volatile std::atomic<long>& a) {
   ++a;
}

Compiling the second case gets me this assembler function:

?test@@YAXAAU?$atomic@J@std@@@Z (void __cdecl test(struct std::atomic<long> &)):
  00000000: 8B 44 24 04        mov         eax,dword ptr [esp+4]
  00000004: B9 01 00 00 00     mov         ecx,1
  00000009: F0 0F C1 08        lock xadd   dword ptr [eax],ecx
  0000000D: C3                 ret

As far as I understand it, that’s sufficient for multi-threading on one CPU (testing confirms that). But is it also safe on multi-CPU systems (can’t test it :))?

Any help appreciated.

But is it also safe on multi-CPU systems

The short answer is “yes”, because of the LOCK prefix before the instruction.

Anton Bassov

cxxl wrote:

Since C++11 we have std::atomic and I wonder if I could use it instead of InterlockedXxx. Let’s assume that I only want to count, but do it in a thread-safe way. I might write:

As far as I understand it, that’s sufficient for multi-threading on one CPU (testing confirms that). But is it also safe on multi-CPU systems (can’t test it)?

In user-mode, yes.  Unlike much of the standard library, all of the
std::atomic functions are declared as being exception-free, so it should
even be safe in kernel mode, but the necessary include paths are not
part of the default kernel build environment.

As Mr Roberts says, there’s nothing in theory that prevents most of the <atomic> header from working in an NT kernel driver. There are just a few details to mop up. Microsoft’s compiler team is open to doing that work, but they don’t see a strong demand yet. (Many kernel hackers are… not fans… of C++.) If y’all want to see <atomic> (and <type_traits> and <memory>) officially supported for kernel drivers, please drop them a line and add your customer case.

BTW if you are just collecting high-volume statistics, I have a slight preference for InterlockedIncrementNoFence, which is faster on certain processor architectures, and also clearly communicates that you’re not using the interlock for any synchronization purposes. This corresponds with std::memory_order_relaxed.
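
For what it’s worth, here is a minimal sketch of a relaxed statistics counter in both spellings (the counter name is made up, and this assumes pure counting with no ordering requirements):

#include <atomic>

// One shared statistics counter; relaxed ordering says "make the increment
// atomic, but don't emit any extra barriers".
std::atomic<long> g_packets_seen{0};

void count_packet() {
   g_packets_seen.fetch_add(1, std::memory_order_relaxed);
}

// The Windows-API spelling of the same idea would be roughly:
//    volatile LONG g_packets_seen = 0;
//    InterlockedIncrementNoFence(&g_packets_seen);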

BTW #2: In your example of std::atomic, you don’t need the volatile modifier. It’s just going to hurt performance without improving correctness. Generally you should stick to one platform’s story for memory model: use Windows’s InterlockedXxx everywhere, or use C++'s atomic<T> everywhere, or use the (discouraged!) MSVC language extension of overloading the volatile keyword to impart barriers (compiler flag /volatile:ms). You shouldn’t mix these together in the same codebase, since it’s redundant at best, and confusing and slow at worst.
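
To illustrate the second point, the earlier example works the same without the qualifier; this is just a sketch, but on x86 it still compiles to a locked read-modify-write:

#include <atomic>

void test2(std::atomic<long>& a) {
   ++a;   // std::atomic alone provides the atomicity and ordering
}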

please drop them a line and add your customer case

Can you suggest the best way for a typical non-MSFT connected person to do that?

Peter

@“Peter_Viscarola_(OSR)” said:

please drop them a line and add your customer case

Can you suggest the best way for a typical non-MSFT connected person to do that?

Peter

Sure - they have a variety of feedback channels to select from:

As usual, if you work for a large company that has a developer support account, you can also reach out to your Microsoft contact. One of the advantages of paid support is that it’s a little easier to draw attention to your feedback.

In recent years the designers of C++ have added quite a few kernel-useful features to the language. It’s interesting that people at Microsoft are uncertain of the demand for these in the kernel and continue to quietly leave them out. I find this methodology unprecedented. Of course we want them. We hear over and over again how a few old fogies failed to get on with C++. It’s incomprehensible how this can be a justification for denying all developers the option of leveraging new, powerful and practical language features that would certainly benefit the Windows platform.

Having to complete bizarre feedback forms and surveys to indicate interest in such things as atomic is a bit of a sham. If something takes more than two clicks to perform, the drop-off becomes exponential. Why not make a survey asking how many people don’t want these things in the kernel, using the same difficult methods to provide the feedback, and compare the results? Better yet, please just fix the toolchain and stop making irrational excuses.

On Jan 19, 2019, at 5:14 PM, Rourke wrote:
>
> In recent years the designers of C++ have added quite a few kernel-useful features to the language. It’s interesting that people at Microsoft are uncertain of the demand for these in the kernel and continue to quietly leave them out. I find this methodology unprecedented. Of course we want them. We hear over and over again how a few old fogies failed to get on with C++. It’s incomprehensible how this can be a justification for denying all developers the option of leveraging new, powerful and practical language features that would certainly benefit the Windows platform.

The fogies are not the justification. The problem is that anything Microsoft “blesses” into the kernel has to be supported. That means, if there is a problem, they guarantee under their support contracts that they will fix those problems. If a company loses revenue because their drivers crashed because of a compiler issue, Microsoft can be liable. In order to provide that guarantee, they have to KNOW that the compiler features will work in all conditions. That’s expensive.

Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

In recent years the designers of c++ have added quite a few kernel useful features to the language.

I agree.

I find this methodology unprecedented

Really? Which operating system are you thinking of that sets a precedent in which:

  • Kernel programming practices are subject to up/down votes of random people?
  • C++ is allowed, including exceptions?

Please, don’t randomly spout off. It dilutes your initial valid point and your very reasonable request. Instead, thank Mr Tippet for his help, and go show your support for your quest in one or more of the ways he graciously outlined.

Peter

The problem is that anything Microsoft “blesses” into the kernel has to be supported. That means,
if there is a problem, they guarantee under their support contracts that they will fix those problems.

That’s interesting. A lot of people here remember the 2003 DDK. Ironically, there was a compiler bug with the interlock functions being discussed here, such as InterlockedOr, InterlockedAnd, and InterlockedXor, that generated code that didn’t work. This compiler bug was silently creating faulty device drivers each time unsuspecting developers used these functions. Interestingly, Microsoft discovered the bug several weeks after release of the DDK, but they never patched the DDK, and they never made people downloading it aware of the problem. And this went on for YEARS, with the very latest supported DDK creating faulty drivers.

I would be more comfortable using std:: functions that are exercised and tested by a much broader user base. And chances are people are already familiar with std:: functions from school or other areas of development, so the learning curve is reduced; std:: functions also tend to be a lot more powerful than the alternative APIs in the WDK. So a suggestion for the future: perhaps have the WDK functions be wrappers for std::atomic and others, as a first step towards modernizing the WDK.
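
Purely as a sketch of that suggestion (the name is made up, and it assumes <atomic> is usable in the build environment), such a wrapper could look roughly like this:

#include <atomic>

using LONG = long;   // stand-in for the Windows typedef

// InterlockedIncrement returns the *new* value, while fetch_add returns the
// old one, hence the +1.
inline LONG MyInterlockedIncrement(std::atomic<LONG>& addend) {
   return addend.fetch_add(1, std::memory_order_seq_cst) + 1;
}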

thank Mr Tippet for his help, and go show your support for your quest in one or more of the ways he graciously outlined.

It’s great to see that there are so many incredible minds on this message board, and good insights direct from Microsoft as well. I am not trying to upset anyone. I really did hit all the links suggested, but after seeing nothing relevant I decided to save digging into that for a rainy day. Does this message board support doing polls, by chance? That could add some interesting discussion points and increased input from the field.

In recent years the designers of C++ have added quite a few kernel-useful features to the language. It’s interesting that people at Microsoft are uncertain of the demand for these in the kernel and continue to quietly leave them out. I find this methodology unprecedented. Of course we want them. We hear over and over again how a few old fogies failed to get on with C++. It’s incomprehensible how this can be a justification for denying all developers the option of leveraging new, powerful and practical language features that would certainly benefit the Windows platform.

Have you thought about the technical aspects of these “kernel-useful features”, and the issues that may potentially arise in making them work, before making your post?

I am going to give you a little brain-teaser. Let’s look at this particular example of an “atomic” statement. As long as we are speaking about x86, atomic adding, subtracting, XORing, and other ALU operations are supported by the hardware. Therefore, the only thing that has to be done in order to make it work is adding a LOCK prefix to the instruction, which is true for both kernel and userland. However, Load-And-Store architectures typically don’t support any atomic operations other than test-and-set (they need to support atomic test-and-set because they have to provide some means of MP synchronisation).

Therefore, my question is: “How would you implement, say, an atomic increment on such a platform, under the constraint that you are not allowed to supply anything apart from a pointer to the target variable?”

If you come up with a correct answer to this question you are going to realize that a language feature that is “cool” on architecture X may already be “not-so-cool” on architecture Y. Certainly, the same is true of functions like InterlockedXXX, but, unlike a language feature, the OS designers always have a chance to leave such a function unsupported on certain architectures so that its use can be regulated with #ifdefs. You cannot take the same approach with a C++ keyword or operator, can you…

Anton Bassov

If you set your include paths right and use the most recent MSVC (2017 15.8, released Aug 14, 2018), you can use <atomic> in the kernel, at least for types which can be manipulated in a lock-free way. In theory, bigger types can be operated on with std::atomic, but I have not personally tested that in the kernel.

You can see which versions of things and how I set up my paths here: https://github.com/ben-craig/kernel_test_harness/blob/master/build.ninja

RE: Anton Bassov’s comments about Load-And-Store architectures. Most (not all) multi-core architectures support pointer-sized atomics. ARMv6 and later has a bunch of atomic operations, all abstracted appropriately by std::atomic. You can query in the preprocessor whether a type is always lock-free, in case you are paranoid about accidentally getting ported to a Cortex-M0 or PA-RISC, though. For example:
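
Something along these lines would catch an accidental port to a target without lock-free long atomics at compile time (a sketch; both checks are standard C++, the second requiring C++17):

#include <atomic>

// ATOMIC_LONG_LOCK_FREE is 2 when std::atomic<long> is always lock-free,
// 1 when it sometimes is, and 0 when it never is.
#if ATOMIC_LONG_LOCK_FREE != 2
#error "std::atomic<long> is not guaranteed lock-free on this target"
#endif

// The C++17 equivalent, usable outside the preprocessor:
static_assert(std::atomic<long>::is_always_lock_free,
              "std::atomic<long> must be lock-free here");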

anton_bassov wrote:

Therefore, my question is “How would you implement, say, an atomic increment on such a platform under the constraint that you are not allowed supplying anything, apart from a pointer to the target variable?”

How is that possibly relevant to this discussion?  Assuming the platform
has a standard-compliant C++ compiler, the compiler’s standard library
authors have already solved that problem.  And all of the platforms on
which Windows runs have several standard-compliant C++ compilers.

If you come up with a correct answer to this question you are going to realize that a language feature that is “cool” on architecture X may be already “not-so-cool” on architecture Y. Certainly, the same is true with functions like InterlockedXXX, but, unlike a language feature, the OS designers always have a chance to leave this function unsupported on certain architectures so that its use can be regulated with #ifdefs. You cannot take the same approach with a C++ keyword or operator, can you…

So what?

@Ben_Craig said:
If you set your include paths right and use the most recent MSVC (2017 15.8, released Aug 14, 2018), you can use <atomic> in the kernel, at least for types which can be manipulated in a lock-free way.

Yes, a couple of the maintainers of Microsoft’s C++ standard library have been dabbling with a skunkworks project to remove roadblocks on using a few STL headers in kernel drivers. This is not an official statement of support… just something they’ve been investigating in their spare time.

If you look at our source code for the NetAdapterCx.sys OS component, you might notice that it relies on a forked copy of <type_traits> and a naive reimplementation of std::unique_ptr<T>. We in the OS group are pretty eager to get our hands on the real STL too.

I might have something to do with that skunkworks project :slight_smile: Freestanding Proposal

@Ben_Craig said:
I might have something to do with that skunkworks project :slight_smile: Freestanding Proposal

Golly, that was some pleasant reading. I wish you luck in getting this paper through all the reviews & red tape.

I’ll add the remark that std::unique_ptr<T> is one of the biggest value propositions for the kernel code that we write here. So I’m slightly concerned to see it’s on the potential cut-list. But hey, I’ll take what I can get. If the C++ committee wants to officially acknowledge that some developers want to std::move without buying into the whole runtime, then that’s progress. Maintaining a cheap imitation of an RAII class is a lot easier than maintaining a cheap imitation of an entire standard library.
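
For anyone wondering what such a “cheap imitation” looks like, here is roughly the shape of one; the class name and pool tag are illustrative only, and it assumes a WDM build environment for ExFreePoolWithTag:

#include <wdm.h>

// A bare-bones, move-only RAII wrapper for a pool allocation: no allocator
// hooks, no custom deleters, just "free it when it goes out of scope".
template <typename T>
class scoped_pool_ptr {
   T* p_;
public:
   explicit scoped_pool_ptr(T* p = nullptr) : p_(p) {}
   ~scoped_pool_ptr() { if (p_) ExFreePoolWithTag(p_, 'lpmX'); }
   scoped_pool_ptr(const scoped_pool_ptr&) = delete;
   scoped_pool_ptr& operator=(const scoped_pool_ptr&) = delete;
   scoped_pool_ptr(scoped_pool_ptr&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
   T* get() const { return p_; }
   T* operator->() const { return p_; }
};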

BTW, I might have had something to do with that skunkworks project too :slight_smile: I ported a few of the STL headers to a kernel driver, and provided that as a proof-of-concept to the STL maintainers. Microsoft’s NDIS team also contributed partial kernel support to the C++ library that we OS developers use internally, and which should be publicly open-sourced “soon”. Maybe 2020 will be the year of C++ in NT kernel drivers.

You just saw R3 of the paper. R4 should be published soon, and unique_ptr is there in its entirety. I realized that the problem wasn’t with the new and delete operators; it was with the default operator new and operator delete. unique_ptr doesn’t force the default allocator in any way.

RE: Anton Bassov’s comments about Load-And-Store architectures. Most (not all) multi-core architectures support pointer-sized atomics. ARMv6 and later has a bunch of atomic operations, all abstracted appropriately by std::atomic

Could you please expand on that a bit…

What do you mean by “bunch of atomic operations” ??? According to ARM official documentation, the only atomic instructions that ARMv6 provides are LDREX and STREX.
(http://infocenter.arm.com/help/topic/com.arm.doc.dht0008a/DHT0008A_arm_synchronization_primitives.pdf).

Let’s look at how they work

[begin quote]

The LDREX instruction loads a word from memory, initializing the state of the exclusive monitor(s) to track the synchronization operation. For example, LDREX R1, [R0] performs a Load-Exclusive from the address in R0, places the value into R1 and updates the exclusive monitor(s).

The STREX instruction performs a conditional store of a word to memory. If the exclusive monitor(s) permit the store, the operation updates the memory location and returns the value 0 in the destination register, indicating that the operation succeeded. If the exclusive monitor(s) do not permit the store, the operation does not update the memory location and returns the value 1 in the destination register. This makes it possible to implement conditional execution paths based on the success or failure of the memory operation. For example, STREX R2, R1, [R0] performs a Store-Exclusive operation to the address in R0, conditionally storing the value from R1 and indicating success or failure in R2.
[end quote]

Judging from this description, the best that you can get out of it is implementing a spinlock, with LDREX implementing the outer loop and STREX the inner one, effectively acting as an equivalent to x86’s atomic test-and-set (i.e. something that I was speaking about in my previous post). This is all that ARM seems to provide. It does not seem to go anywhere close to what x86 allows, i.e. loading-updating-storing as a single atomic instruction, does it? OTOH, it simplifies the “puzzle” that I gave to Mr. Rourke - the “constraint” of allowing only a single pointer is not really a constraint here, although the whole thing still has to be built around a spinlock…

Anton Bassov

Being “atomic” doesn’t require that an operation execute in a single instruction. It just requires that the operation be indivisible and can’t tear. There are other requirements to get lock freedom, which I’m not as confident on (I believe one requirement there is a guarantee of forward progress).

For ARMv6 and beyond, you can get an atomic increment by doing a LDREX on the location of your int, doing a regular increment in a register, then doing a STREX on the original memory location. If it succeeds, then great. If it fails, then you loop back to the LDREX and try again. Readers of the integer memory always see a consistent value, and you never drop an increment.
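
In portable C++, that retry pattern is exactly what a compare-exchange loop expresses; on an LL/SC machine the compiler lowers it to an LDREX/STREX pair, and a failed store just goes around the loop again. A sketch (not ARM-specific code, and of course fetch_add(1) does this for you):

#include <atomic>

long increment(std::atomic<long>& a) {
   long observed = a.load(std::memory_order_relaxed);          // like LDREX
   while (!a.compare_exchange_weak(observed, observed + 1)) {  // like STREX
      // observed has been refreshed with the current value; try again
   }
   return observed + 1;   // the value we actually stored
}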

For a decent list of what some mappings of C++ atomics to assembly are, look here: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html