> Speer, Kenny wrote:
> This is the sort of advice that I would love to use, but this statement
> in the documentation is too ambiguous:
>
> "The RtlCopyMemory routine runs faster than RtlMoveMemory,"
>
> What does “run faster” mean? Is it 1000x faster or just .0005x faster?
> Some products will be very dependent on this statement.
It’s a ridiculous statement that has been handed down since the mists of
time, based on the documentation for the original C run-time library
some 40 or 50 years ago.
*****
Actually, it can still be true. The important part of the text below is
the reference to the “normal” direction, and that is key. memcpy uses the “normal”
direction. “Normal” is whatever is most efficient on the hardware
platform. It can be either low-to-high or high-to-low, and the C library
carefully leaves this unspecified. Note that with memcpy, if the source
and destination overlap, the results are undefined. Thus, the “move” forms
were introduced. If the buffers do not overlap, a “move” routine can do
whatever memcpy does. But if they overlap, then the test below determines
whether the data is copied in the forward direction (which is not
necessarily the “normal” direction) or the backward direction. This allows
you to “slide” a group of contiguous values in either direction with the
same call, and ensures that the results are well-defined.
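To make the “slide” case concrete, here is a minimal sketch in user-mode C. It uses the C library memmove as a stand-in for RtlMoveMemory, and the helper names slide_up and slide_down are made up for illustration. In both helpers the source and destination ranges overlap, so memcpy would not be safe here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: slide the first `count` ints of `buf` toward higher
   addresses by `shift` slots. The caller must ensure that `buf` has room
   for count + shift elements. The ranges overlap, so memmove is required. */
void slide_up(int *buf, size_t count, size_t shift)
{
    assert(buf != NULL);
    memmove(buf + shift, buf, count * sizeof *buf);
}

/* Hypothetical helper: slide `count` ints that start at buf[shift] toward
   lower addresses so that they start at buf[0]. Also an overlapping move. */
void slide_down(int *buf, size_t count, size_t shift)
{
    assert(buf != NULL);
    memmove(buf, buf + shift, count * sizeof *buf);
}
```

The same call handles both directions; the caller never has to work out which way the copy must run.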
Now, as far as “efficiency” is concerned, part of the issue is that many
compilers, including Microsoft's, can implement “memcpy” as an “intrinsic”,
meaning the compiler replaces the call with actual inline code. The
inline code moves data in the “normal” direction. The “move” version
generally has no intrinsic implementation, so it requires an out-of-line
call.
On modern pipelined superscalar multilevel-cache architectures, the
differences in performance will not show up as long as the move is in the
“normal” direction. But if it requires moving in the “non-normal”
direction, the performance degradation is highly platform-specific. In
modern Intel x86/x64 machines, however, these differences may be minor.
Hint: if uncertain, do the experiment and measure and see.
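One way to run that experiment is sketched below, under several assumptions: it is user-mode C, it uses memcpy and memmove as stand-ins for RtlCopyMemory and RtlMoveMemory, and time_copies and copy_fn are names invented for this example. Calling through a function pointer usually forces an out-of-line call, so this compares the library routines and not any intrinsic expansion. clock() is also coarse, so use a large repetition count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and memmove. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: run `fn` on the same buffers `reps` times and
   return the elapsed processor time in seconds, as reported by clock(). */
double time_copies(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    assert(fn != NULL && reps > 0);

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call it once as time_copies(memcpy, dst, src, n, reps) and once as time_copies(memmove, dst, src, n, reps) with non-overlapping buffers, then repeat with several sizes. An optimizing compiler may still inline or hoist the calls, so check the generated code before trusting the numbers.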
The bottom line: for non-overlapping copies, use the “copy” form. If
there is any difference in performance, this will get the best
performance. If the source and destination can overlap, you ***MUST***
use the “move” forms; this is not optional. In this case, correctness
trumps performance.
joe
Here is a pseudo-code implementation of memmove:

    if the source address is greater than the destination address
        copy in the normal direction    //WRONG! “forward”
    else
        copy in the backwards direction

Here is a pseudo-code implementation of memcpy:

    copy in the normal direction
You can see how significant the performance impact is likely to be. It
is so close to 0 that it cannot be measured, except for very small
copies. The “copy” algorithms themselves are identical.
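For anyone who wants to see the pseudo-code above as compilable C, here is a byte-at-a-time sketch. It is not how any real C library or the kernel implements these routines (those use wider, platform-tuned copies), and the names sketch_memmove and sketch_memcpy are made up. The addresses are compared as uintptr_t because comparing pointers into unrelated objects with > is undefined in standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of memmove: one address comparison picks the copy direction. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)s > (uintptr_t)d) {
        /* Source is above destination: copy forward, low to high. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Source is at or below destination: copy backward, high to low. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Sketch of memcpy: no test at all. Forward is assumed to be the
   "normal" direction here. The buffers must not overlap. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

The only extra work in the move version is the single comparison and branch before the loop.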
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.