Spinlocks vs. atomic instructions

Hello,

I have a few global variables multiple routines can read and write to.

So far, I know of only one necessary spinlock in my driver
(I allocate non-paged pool, one routine reads it, while another one could attempt to free it).

Almost all of my globals are DWORDs and I wouldn’t mind them changing spontaneously as long as they change as a whole.

So, for example, if a variable has the value 0x00000000 and the new value is 0xFFFFFFFF, the operation should be atomic:
0x00000000 -> 0xFFFFFFFF
and not happen gradually, like:
0x00000000 -> 0xFFFF0000 -> 0xFFFFFFFF

As far as I know, x86-32 supports atomic instructions for 8-, 16- and 32-bit data, so synchronization isn’t necessary in this case (please correct me if I’m wrong).

But what about floating-point D- and QWORDS?

Are x87 and SSE/SSE2 instructions like FST, FSTP, MOVSS & MOVSD guaranteed to be atomic as well (for both 32- & 64-bit)?

Thanks.

Examine the “Interlocked” family of functions.

Thomas F. Divine

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-294455-
xxxxx@lists.osr.com] On Behalf Of xxxxx@hushmail.com
Sent: Friday, July 20, 2007 1:31 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Spinlocks vs. atomic instructions


Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Another question to ask yourself in terms of needing a lock is if you
need to evaluate 2 vars for state to make a decision (and not have var1
change while checking var2), or to have 2 variables be set to certain
values based on each others state. If any 2 vars are interdependent,
you need a lock to set and evaluate them as one unit.
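
As a rough illustration of that point, here is a minimal sketch in portable C11. The `range_t` type, its `lo`/`hi` fields and the helper names are hypothetical, and the `atomic_flag` spin loop is only a stand-in for a real kernel spinlock; a driver would use the OS-supplied lock instead. The idea is simply that two values with a shared invariant (`lo <= hi`) are written and read as one unit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical pair of interdependent settings: the invariant is lo <= hi. */
typedef struct {
    atomic_flag lock;   /* stand-in for an OS spinlock */
    uint32_t lo;
    uint32_t hi;
} range_t;

static void range_lock(range_t *r)
{
    /* Spin until the flag was previously clear (acquire ordering). */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void range_unlock(range_t *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Update both values as one unit so no reader sees a new lo with an old hi. */
static void range_set(range_t *r, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    range_lock(r);
    r->lo = lo;
    r->hi = hi;
    range_unlock(r);
}

/* Read both values as one unit and return the width hi - lo. */
static uint32_t range_width(range_t *r)
{
    range_lock(r);
    assert(r->lo <= r->hi);
    uint32_t width = r->hi - r->lo;
    range_unlock(r);
    return width;
}
```

Without the lock, each individual 32-bit store could still be atomic, yet a reader could pair the new `lo` with the old `hi` and compute a meaningless width.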

d


wrote in message news:xxxxx@ntdev…
> As far as I know, x86-32 supports atomic instructions for 8-, 16- and
> 32-bit data, so
> synchronization isn’t necessary in this case (please correct me if I’m
> wrong).

16-bit access is only atomic if the least significant bit of the address is
zero, meaning that it’s aligned on a 16-bit boundary. 32-bit accesses are
only atomic if the lower two bits of the address are zero.

Phil

Philip D. Barila
Seagate Technology LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.

“Philip D. Barila” wrote in message
news:xxxxx@ntdev…
> wrote in message news:xxxxx@ntdev…
>> As far as I know, x86-32 supports atomic instructions for 8-, 16- and
>> 32-bit data, so
>> synchronization isn’t necessary in this case (please correct me if I’m
>> wrong).
>
> 16-bit access is only atomic if the least significant bit of the address
> is zero, meaning that it’s aligned on a 16-bit boundary. 32-bit accesses
> are only atomic if the lower two bits of the address are zero.

Uggh, I shouldn’t do this so late at night. I should have said that memory
accesses wider than one byte are only atomic if they are not split across
cache lines. In general, modern compilers’ default alignment of multi-byte
types ensures that restriction is met, but packing can negate that.
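
To make the packing remark concrete, here is a small sketch in C. The struct names and the helper are hypothetical, and the offsets in the comments assume a mainstream compiler where `uint32_t` has 4-byte alignment and `#pragma pack` is supported (MSVC, GCC and Clang all accept it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads so that value sits on a 4-byte boundary. */
struct natural_layout {
    uint8_t  tag;
    uint32_t value;
};

/* Byte packing removes the padding, so value lands at offset 1. */
#pragma pack(push, 1)
struct packed_layout {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

/* Returns nonzero when a member at this offset is naturally aligned for its
   size, assuming the enclosing object itself starts on an aligned address. */
static int offset_is_naturally_aligned(size_t offset, size_t size)
{
    assert(size != 0);
    return (offset % size) == 0;
}
```

With the packed layout, `value` is no longer naturally aligned, so the alignment-based atomicity guarantee does not cover it, and depending on where the object sits it may even straddle a cache line.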

Phil

Philip D. Barila
Seagate Technology LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.

Instead of worrying about processor-specific details, you should
probably just use the OS-provided synchronization functions when
logically required even if you don’t really need to.

If you’re worried about performance issues, then *maybe* you should
care about avoiding locks. Even so, I’d write it “correctly” first
and then evaluate performance objectively.


Yeah, I agree - you have other issues to worry about, besides just
atomicity, like memory barriers, compiler optimizations (i.e. what
volatile turns off), etc., any of which might come into play for you.

Use OS-supplied synch functions or OS-supplied interlock functions.

-sd

On Jul 20, 2007, at 12:41 PM, Shawn Brooks wrote:

Instead of worrying about processor-specific details, you should
probably just use the OS provided synchronization functions when
logically required even if you don’t really need to.

If you’re worried about performance issues, then *maybe* you should
care about avoiding locks. Even so, I’d write it “correctly” first
and then evaluate performance objectively.


Yes, that sounds plausible.

In my case, “protecting” the other globals basically means calling KeAcquireSpinLockAtDpcLevel a bit earlier since I already have to guard the pool memory in the same routine.

Thomas,

Examine the “Interlocked” family of functions.

Actually, as long as the OP does not cross a cache line, he does not even need interlocked functions to ensure atomicity of memory access: as long as an access does not cross a cache line, MOV is guaranteed to be atomic (at least on the P6 family and above). Furthermore, as long as he relies on natural data alignment, his memory accesses are guaranteed to be atomic even on older processors. This is not a problem at all. The real problem here is that the OP does not seem to see any difference between atomicity of memory access and synchronization, and this is, apparently, going to get him into trouble: imagine if his variables are inter-dependent…

Anton Bassov

While it may be true that aligned memory accesses are atomic, they
don’t enforce memory barriers, and they may need volatile qualifiers.
It’s very hard to do this stuff right without using the OS’s built-in
synch/interlock functions, which do make ordering guarantees.
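
A minimal sketch of the ordering problem, written in portable C11 rather than with the Windows functions: the names `g_payload`, `g_ready`, `publish` and `try_consume` are hypothetical, and the C11 release/acquire operations merely stand in for the ordering guarantees that the OS-supplied synch/interlock functions give you.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t    g_payload;   /* ordinary data, written before publishing */
static atomic_bool g_ready;     /* flag that publishes the payload */

/* Writer: store the data first, then set the flag with release ordering so
   the data write cannot be reordered after the flag write. */
static void publish(uint32_t value)
{
    g_payload = value;
    atomic_store_explicit(&g_ready, true, memory_order_release);
}

/* Reader: load the flag with acquire ordering; if it is set, the payload
   written before the release store is guaranteed to be visible. */
static bool try_consume(uint32_t *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&g_ready, memory_order_acquire)) {
        return false;
    }
    *out = g_payload;
    return true;
}
```

If both variables were plain globals, each store would still be atomic on x86 when aligned, but neither the compiler nor every architecture is obliged to keep the two stores in program order.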

Plus, while the OP may indeed be an expert on (current) computer
platforms, do you expect every maintenance coder to be that good too?
Real world experience says they won’t always be. And what if the next
programmer decides to compile it for IA-64 or architecture N+1,
N+2, …? Will they make exactly the same memory read/write ordering
guarantees that x86 does? Even x86 has a history of inconsistency on
this point.

So, if you’re an absolute expert writing for a narrowly defined
platform, your comments are correct; most people/projects don’t fall
into that category.

http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx
http://64.233.169.104/search?q=cache:tPN1977REOsJ:www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf+scott+meyers+volatile+barrier&hl=en&ct=clnk&cd=1&gl=us&client=safari

-Steve

> While it may be true that aligned memory accesses are atomic, they
> don’t enforce memory barriers, and they may need volatile qualifiers.
> It’s very hard to do this stuff right without using the OS’s built-in
> synch/interlock functions, which do make ordering guarantees.

I am afraid you misunderstood me; otherwise, you would not have used the term “synch/interlock functions” in this context. Ironically, you quoted everything apart from the sentence that actually expresses the main idea of my post…

What I am trying to say here is that, although an interlocked operation ensures atomicity of memory access, in most cases it is *NOT* a substitute for synchronization, so that, by choosing to replace a spinlock with some construct that guarantees nothing more than atomicity of memory access, the OP has most likely chosen the wrong path from the very beginning…

Concerning memory barriers and memory read/write ordering… please read the OP’s original post carefully, and you will see that he is NOT concerned about it at all, i.e. he makes one more grave mistake: he believes that ensuring atomicity of memory access is all he needs (this objective alone can be achieved simply by properly aligning data). However, if he used a spinlock, this issue would be solved by itself thanks to the LOCK prefix in the test-and-set.

> And what if the next programmer decides to compile it for IA-64 or architecture N+1, N+2, …?

Well, I somehow presume the OP is not concerned about anything, apart from x86 - otherwise, he would not have asked the original question, in the first place…

Anton Bassov

Anton Bassov wrote:

> What I am trying to say here is that, although an interlocked operation ensures
> atomicity of memory access, in most cases it is *NOT* a substitute for
> synchronization, so that, by choosing to replace a spinlock with some construct
> that guarantees nothing more than atomicity of memory access, the OP has most
> likely chosen the wrong path from the very beginning…

I think I know the difference between memory atomicity and “proper” synchronization primitives.

Yes, atomicity is not necessarily a substitute for a spinlock,
but a spinlock (and friends) is always a substitute for atomicity.

Because of the way I use most of my global variables, they do not require a spinlock as long as memory atomicity is guaranteed.

I admit, I didn’t know about the memory alignment requirements and their possible side-effects.

I just wasn’t sure how FP operations might behave.
Although this question hasn’t been answered yet, I understand the concern about portability and have decided to use spinlocks even if I technically don’t have to on x86-32 and x86-64 platforms.

Comments embedded …

On 7/21/07, xxxxx@hushmail.com wrote:
>
> Anton Bassov wrote:
> > What I am trying to say here is that, although an interlocked operation
> > ensures atomicity of memory access, in most cases it is NOT a substitute
> > for synchronization, so that, by choosing to replace a spinlock with some
> > construct that guarantees nothing more than atomicity of memory access,
> > the OP has most likely chosen the wrong path from the very beginning…
>
> I think I know the difference between memory atomicity and “proper”
> synchronization primitives.
>
> Yes, atomicity is not necessarily a substitute for a spinlock,
> but a spinlock (and friends) is always a substitute for atomicity.
>
> Because of the way I use most of my global variables, they do not require
> a spinlock as long as memory atomicity is guaranteed.

How do you use these global variables? And exactly how do you know that memory
atomicity is guaranteed? What kind of operations do you perform on those
variables? Finally, what instructions are being used to operate on those
variables?

Using the Interlocked APIs or some other lock (depending on your requirements)
is the best suggestion given here.


> as long as you do not cross the cache line, MOV is
> guaranteed to be atomic (at least on P6 family and above) .

It is a bit off-topic, but…
A while ago I did some research on read atomicity and found
that Intel makes this promise (from
http://download.intel.com/design/PentiumII/manuals/24319202.pdf):
7.1.1. Guaranteed Atomic Operations
The Intel386™, Intel486™, Pentium®, and P6 family processors guarantee that the following
basic memory operations will always be carried out atomically:
• Reading or writing a byte.
• Reading or writing a word aligned on a 16-bit boundary.
• Reading or writing a doubleword aligned on a 32-bit boundary.
The P6 family processors guarantee that the following additional memory operations will
always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary. (This operation is also
guaranteed on the Pentium® processor.)
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus.
• 16-, 32-, and 64-bit accesses to cached memory that fit within a 32-Byte cache line.
Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries
are not guaranteed to be atomic by the Intel486™, Pentium®, or P6 family processors. The P6
family processors provide bus control signals that permit external memory subsystems to make
split accesses atomic; however, nonaligned data accesses will seriously impact the performance
of the processor and should be avoided where possible.
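
The conditions in the quoted section can be turned into two small checks. This is only a sketch in C: the helper names are hypothetical, and the 32-byte line size used in the usage note comes from the P6 wording quoted above, so other processors will need a different value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of size; size must be a power of two. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* True when the bytes [addr, addr + size) span more than one cache line. */
static bool crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size != 0 && line_size != 0);
    uintptr_t first_line = addr / line_size;
    uintptr_t last_line  = (addr + size - 1) / line_size;
    return first_line != last_line;
}
```

For example, with a 32-byte line a 4-byte access at address 0x1001 is misaligned but stays within one line, whereas the same access at 0x101E touches two lines and falls outside the guarantee.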
But I could not, and still cannot, find anything similar for, let’s say, AMD.

Can anyone point to an official document (not a newsgroup and the like,
but vendor’s manual or guide) about read/write/modify atomicity for any of
the AMD processors?

I searched all PDFs I could find at

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_11467_11513,00.html

for “atomic” but found only what relates to LOCK prefix (and this is NOT what I was looking for).


> And exactly how do you know that memory atomicity is guaranteed?

Very simply: just don’t do what the example below does:

UCHAR array[256];
PULONG pointer = (PULONG)&array[1]; /* one byte past the start of array, so generally not 4-byte aligned */

The compiler aligns variables on their natural boundary (i.e. WORD and DWORD are aligned on 2- and 4-byte boundaries respectively), so that, unless you do some “modification” the way the sample code above does, *atomicity* of memory access is guaranteed. However, *atomicity* of memory access alone does not solve all the other possible issues, and, of those that remain unsolved, only one may get solved by using the “Interlocked” family of functions (interlocked functions act as a memory barrier). The only approach that solves absolutely all issues is using synchronization functions, i.e. doing something that the OP originally wanted to avoid…

> And what kind of operations do you do on those variables?

Actually, in his original post the OP made it clear that he just reads and writes variables (i.e. no increments, decrements or any other operation that requires bus locking, and, hence, is dealt with by “Interlocked” family of functions). His original statement is quoted below:

[begin quote]

I have a few global variables multiple routines can read and write to.

[end quote]

> Finally, what instructions are being used to operate on those variables?

Well, as long as we are speaking about simple reads and writes, anything apart from MOV is very unlikely to be used…

> Using the Interlocked APIs or some other lock (depending on your requirements)
> is the best suggestion given here.

Actually, I think the best suggestion given here is just to give up the idea of replacing synchronization functions with any other construct (including the “Interlocked” family of functions) and, instead, just use a spinlock. It looks like the OP was convinced by my arguments and decided to use a spinlock…

Anton Bassov

Anton Bassov wrote:

> Well, as long as we are speaking about simple reads and writes, anything apart
> from MOV is very unlikely to be used…

Most of my DWORDs are floating-point numbers, so FST, FSTP, MOVSS, MOVSD (depending on arch, etc) are almost exclusively used.
I actually use my own typedef to easily switch between single and double precision, so they can also be QWORDs if I decide so.

These variables are user-supplied multipliers.
They’re not codependent, not used for conditional jumps, only for multiplication with local variables, so anything above memory atomicity is not necessary.

That’s also the reason why I can’t use Interlocked* functions (I could, but they wouldn’t do what I want them to).
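
For readers wondering how a floating-point value could be published atomically at all: one workaround, not proposed by anyone in this thread, is to move the raw bits of the float through a 32-bit integer. The sketch below uses portable C11 atomics, and `g_multiplier_bits`, `multiplier_store` and `multiplier_load` are hypothetical names. In a Windows driver the analogous move would be to copy the bits into a LONG and pass that to `InterlockedExchange`; that is an assumption about how one might adapt it, not something tested here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical shared multiplier, stored as the raw bits of a float. */
static _Atomic uint32_t g_multiplier_bits;

/* Copy the float's bit pattern into an integer and store it atomically. */
static void multiplier_store(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    atomic_store_explicit(&g_multiplier_bits, bits, memory_order_relaxed);
}

/* Load the bits atomically and reinterpret them as a float. */
static float multiplier_load(void)
{
    uint32_t bits = atomic_load_explicit(&g_multiplier_bits, memory_order_relaxed);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Relaxed ordering is used because the multipliers are described as independent of each other; this gives atomicity only, with no ordering relative to other variables, which matches what the OP says he needs.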

Anton Bassov wrote:

> It looks like the OP was convinced by my arguments and decided to use a spinlock…

No, it was Shawn Brooks’ and Steve Dispensa’s objections.

I sometimes wonder whether some people actually bother to read previous replies. There’s so much redundancy in here.

Sorry, that should have been “interdependent”. Codependence has a different denotation.

> Most of my DWORDs are floating-point numbers,

Incredible statement…

Since when do Windows developers use the term DWORD for any data type other than ULONG??? Please note that DWORD is a user-mode typedef for ‘unsigned long’ in windef.h. Therefore, when a Windows developer hears ‘DWORD’, he/she automatically assumes that you are speaking not just about a 4-byte variable, but strictly about a ULONG. This is not the case when you speak about ‘sizeof (DWORD)’, for understandable reasons: at that point it is obvious that you are speaking just about a 4-byte variable that may be of some type other than DWORD.

In your original post you made it clear that you were speaking about DWORDs. No wonder I just could not have imagined that, in actuality, you were speaking about floating-point numbers. I am not the only one who got confused - otherwise, no one would have even mentioned interlocked functions, in the first place…

> No, it was Shawn Brooks’ and Steve Dispensa’s objections.
>
> I sometimes wonder whether some people actually bother to read previous replies.
> There’s so much redundancy in here.

Actually, after having read your post that came immediately after Shawn’s and Steve’s, I thought that you were still not convinced by their arguments at that moment. This is all that you had said:

[begin quote]

Yes, that sounds plausible.

In my case, “protecting” the other globals basically means calling
KeAcquireSpinLockAtDpcLevel a bit earlier since I already have to guard the pool
memory in the same routine.

[end quote]

Nothing in the above statement suggests that you actually decided to use a spinlock, don’t you think??? Therefore, I jumped in and gave some more details on the idea that they had expressed, and, after that, you told us about your decision to use a spinlock. Therefore, you can accuse me of anything apart from having skipped Shawn’s and Steve’s replies…

Anton Bassov

Anton Bassov wrote:

> In your original post you made it clear that you were speaking about DWORDs.
> No wonder I just could not have imagined that, in actuality, you were speaking
> about floating-point numbers. I am not the only one who got confused -
> otherwise, no one would have even mentioned interlocked functions,
> in the first place…

Single-precision floating-point numbers are double words just like 32-bit integers.
Double-precision floating-point numbers are quadruple words just like 64-bit integers.
(Of course, I assume x86 architecture.)

My choice of words is entirely correct.

I can see how you (and possibly others) have been confused by this, but let’s see what I actually asked
(questions usually end with question marks):

> But what about floating-point D- and QWORDS?
> Are x87 and SSE/SSE2 instructions like FST, FSTP, MOVSS & MOVSD
> guaranteed to be atomic as well (for both 32- & 64-bit)?

As you can see, my only two questions were about FP operations, so it’s quite reasonable to assume I’m talking about FP types, don’t you agree?

Okay, most replies were about the “(please correct me if I’m wrong)” part, and although they were not entirely off-topic, they did not directly answer my questions.

Anton Bassov wrote:

> Nothing in the above statement suggests that you actually decided to use a
> spinlock, don’t you think???

My reply started with: “Yes, that sounds plausible.”
In other words: “Yes, I agree with you.”

If I already agree with their concerns, why shouldn’t I follow their advice?!

It is a very simple matter of semantics… really.

> Single-precision floating-point numbers are double words just like 32-bit integers.
> Double-precision floating-point numbers are quadruple words just like 64-bit integers.
> (Of course, I assume x86 architecture.)
>
> My choice of words is entirely correct.

This is where the funny things start…

Indeed, in terms of the words themselves, your statement is absolutely correct, and I don’t argue with it. However, from a Windows developer’s perspective it becomes not-so-correct if you put it in capital letters the way you did: in the Windows world, particularly in the user-mode part of it, DWORD is generally known as a typedef for ‘unsigned long’…

Anton Bassov