RtlCompareMemory - Please clarify

Could somebody please clarify what exactly RtlCompareMemory() returns ?
The DDK documentation says:

“RtlCompareMemory returns the number of bytes that compare as equal. If all
bytes compare as equal, the input Length is returned.”

In the following case:

Block1 = AA BB CC DD EE FF (no spaces in actual memory)
Block2 = AA 22 33 44 EE 00 (no spaces in actual memory)

Will it return 2 because 2 bytes are equal [i.e. Byte 0 (AA) and Byte 4 (EE) ] ?

OR

Will it return 1 because after 1 byte [i.e. Byte 0 (AA)], it encounters
bytes which are unequal ?

Basically I am looking for a function which will give me the
location/offset of the first byte where 2 memory blocks differ. Can I use
RtlCompareMemory() for this ? Any other ideas ?

Any help is appreciated.

Thanks!
Puja

From:
Sent: Friday, May 05, 2000 8:15 PM

>
> Could somebody please clarify what exactly RtlCompareMemory() returns ?
> The DDK documentation says:
>
> “RtlCompareMemory returns the number of bytes that compare as equal. If all
> bytes compare as equal, the input Length is returned.”
>
> In the following case:
>
> Block1 = AA BB CC DD EE FF (no spaces in actual memory)
> Block2 = AA 22 33 44 EE 00 (no spaces in actual memory)
>
> Will it return 2 because 2 bytes are equal [i.e. Byte 0 (AA) and Byte 4 (EE)] ?
>
> OR
>
> Will it return 1 because after 1 byte [i.e. Byte 0 (AA)], it encounters
> bytes which are unequal ?

RtlCompareMemory() works like the standard “C” memcmp() function. Namely,
it returns the number of bytes in the first block that match corresponding
bytes in the second block, starting from the beginning of the first block.

So, for the above example, the result from either RtlCompareMemory() or
memcmp() would be 1. Only the first byte AA in Block1 matches the
corresponding byte in Block2. The second byte BB in Block1 does not match
Block2.

> Basically I am looking for a function which will give me the
> location/offset of the first byte where 2 memory blocks differ. Can I use
> RtlCompareMemory() for this ? Any other ideas ?

Yes, that’s exactly the information RtlCompareMemory() returns. Note that
you could also include <string.h> and use the standard memcmp() (which is
also exported by NTOSKRNL) – whichever you prefer.

- Matt
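
As a rough illustration of the return value described above, the sketch below models it in ordinary user-mode C. It is not the kernel's implementation, and matching_prefix_length is a made-up name used only for this example. The loop counts how many leading bytes are equal, which is also the offset of the first byte that differs.

```c
#include <stddef.h>

/* User-mode model of the documented RtlCompareMemory() result: the number
 * of leading bytes that are equal in both blocks.
 * matching_prefix_length is a hypothetical name, not a DDK routine. */
size_t matching_prefix_length(const void *a, const void *b, size_t length)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    for (i = 0; i < length; i++) {
        if (pa[i] != pb[i])
            break;          /* i is the offset of the first mismatch */
    }
    return i;               /* equals length when every byte matches */
}
```

For the Block1/Block2 example in the question, this model returns 1. For two identical 6-byte blocks it returns 6.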

> Basically I am looking for a function which will give me the
> location/offset of the first byte where 2 memory blocks differ. Can I use
> RtlCompareMemory() for this ? Any other ideas ?

Is there some reason you just don’t write a little loop, as in:

for (i = 0; i < length; i++) {
    if (*a++ != *b++)
        break;
}

if (i != length) {
    // i is the offset of the first byte that differs
}

If one of the operands is in device memory, you may want to use correct
device memory access functions like READ_REGISTER_UCHAR. You may also want
to compare bigger chunks (like QWORDS, with MMX registers) for performance
reasons, and then figure out the actual byte offset. You may also want to
arrange for device memory to ONLY be read ONCE, as in:

for (i = 0; i < length; i++) {
    x = READ_REGISTER_UCHAR(a++);
    if (x != *b++)
        break;
}

if (i != length) {
    // x still contains the device value that mismatched
}

Or you may want to do some appropriate cache prefetch touching, to get the
memory subsystem and processor to overlap better (processor dependent).
RtlCompareMemory will not know of all these unique cases.

I believe “memcmp” will also work fine in a driver.

- Jan
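
The "compare bigger chunks, then figure out the actual byte offset" idea above can be sketched in portable C roughly as follows. This is only an illustration under some assumptions: both blocks are ordinary memory (device memory would still need the READ_REGISTER_* accessors), plain 64-bit integers stand in for MMX registers, and first_difference_chunked is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for ordinary (non-device) memory: returns the offset
 * of the first differing byte, or length if the blocks are identical. */
size_t first_difference_chunked(const void *a, const void *b, size_t length)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i = 0;

    /* Compare 8 bytes at a time; memcpy sidesteps alignment requirements. */
    while (length - i >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, pa + i, sizeof x);
        memcpy(&y, pb + i, sizeof y);
        if (x != y)
            break;          /* the mismatch is somewhere in this chunk */
        i += sizeof(uint64_t);
    }

    /* Byte scan: pins down the mismatch inside the chunk, or covers the
     * tail that is shorter than one chunk. */
    while (i < length && pa[i] == pb[i])
        i++;

    return i;
}
```

Whether this actually beats a byte loop or a library routine depends on the compiler and processor, so it would need measuring on the target machine.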

> Is there some reason you just don’t write a little loop, as in:
>
> for(i = 0;i < length;i++) {
> if (*a++ != *b++)
> break;
> }

I can’t speak for the original poster, but you should always use the library
functions (memcmp, memcpy, RtlCompareMemory, etc.) for that sort of thing
since it uses the string instructions that are built into the Intel CPU. I
think I measured them long ago and found them to be about 25% faster than
the optimized versions of the code I wrote. Of course, I suppose the
compiler these days could be much smarter, and there may not actually be a
difference.

Taed has given the major reason why I didn’t want to use a loop to do the
comparison. The other reason is that I need to compare very large memory
blocks ~ 100 K, so I thought I’d be better off using a library function
if I could find one. Moreover, I need to do this in a REALTIME thread,
running at HIGH_PRIORITY, although I’m not sure what the implications might
be if I ran in a loop vs. calling a lib. function.

Thanks for the clarification everybody!

Puja

On 05/06/00, “Taed Nelson” wrote:
> > Is there some reason you just don’t write a little loop, as in:
> > for(i = 0;i < length;i++) {
> > if (*a++ != *b++)
> > break;
> > }
>
> I can’t speak for the original poster, but you should always use the library
> functions (memcmp, memcpy, RtlCompareMemory, etc.) for that sort of thing
> since it uses the string instructions that are built into the Intel CPU. I
> think I measured them long ago and found them to be about 25% faster than
> the optimized versions of the code I wrote. Of course, I suppose the
> compiler these days could be much smarter, and there may not actually be a
> difference.

At 02:21 PM 5/6/00 -0700, you wrote:

> Is there some reason you just don’t write a little loop, as in:
> for(i = 0;i < length;i++) {
> if (*a++ != *b++)
> break;
> }

> I can’t speak for the original poster, but you should always use the library
> functions (memcmp, memcpy, RtlCompareMemory, etc.) for that sort of thing
> since it uses the string instructions that are built into the Intel CPU. I
> think I measured them long ago and found them to be about 25% faster than
> the optimized versions of the code I wrote. Of course, I suppose the
> compiler these days could be much smarter, and there may not actually be a
> difference.

My guess is the run-time functions (kernel or C) do NOT take into account
processor unique instructions, such as the cache prefetch and non-caching
move instructions on recent processors. It’s been a couple of years since I
measured all the tradeoffs, but I very much believe the Intel string
instructions will cause horrible cache pollution (anybody correct me if
this is not the case). Ideally, calling a kernel run-time function for
things like this should do some processor specific code (different on
Pentium II and Pentium III), as the OS certainly has to know about
different processors. I don’t know if W2K kernel functions have this
enhancement.

The Microsoft C compiler does know how to inline many of the low level
run-time functions. I know for sure it can generate inline string move
instructions to replace memcpy (not sure about memcmp). If the code
optimizer was really together, it should detect a simple loop that moves
memory, and use the appropriate instructions.

If one of the operands is device memory, over say a PCI bus, there may be
some advantage to using larger data sizes (like 64 or 128 bit) move
operations, as the hardware should know to do a 2 or 4 DWORD burst, instead
of single DWORD bursts. Burst length is extremely important to getting good
PCI bus efficiency. Stripmining into registers may also generate even better
PCI bus bursts, like (for moving memory from a PCI target to main memory):

next:
    movq mm0,[deviceMemory+0x00]
    movq mm1,[deviceMemory+0x08]
    movq mm2,[deviceMemory+0x10]
    movq mm3,[deviceMemory+0x18]
    movq mm4,[deviceMemory+0x20]
    movq mm5,[deviceMemory+0x28]
    movq mm6,[deviceMemory+0x30]
    movq mm7,[deviceMemory+0x38]
    movntq [dest+0x00],mm0
    movntq [dest+0x08],mm1
    movntq [dest+0x10],mm2
    movntq [dest+0x18],mm3
    movntq [dest+0x20],mm4
    movntq [dest+0x28],mm5
    movntq [dest+0x30],mm6
    movntq [dest+0x38],mm7
    sub ecx, 0x40
    jnz next

This MAY generate 64-byte PCI bursts (a bus analyzer would be needed to see
what really happens) and writes the data to memory without cache pollution.
I know there was a discussion here a while back about getting burst PCI
target read transfers if the source device memory is declared uncached
(don’t remember the conclusion).

Also note that MMX instructions are like floating-point instructions: care
must be taken to make them work in kernel mode (especially at greater than
IRQL PASSIVE).

My understanding of the actual application by xxxxx@usa.net is to sweep
128K of PCI target memory, and look for changes against a memory buffer.
Getting fast PCI bursts, and not flushing the processor cache on every
“poll” seems like a desirable quality. Twenty carefully chosen instructions
will potentially perform MUCH better than a generic memory comparison
function. Of course if the polling happens once every 60 seconds, it really
doesn’t matter. If the polling happens 100 times/sec, it really does.

I guess my point is there may be HUGE efficiency losses by just using some
generic run-time function, and “it depends” on lots of factors if this is a
problem.

- Jan

Hello Matt,

Maybe I am missing something. You said that RtlCompareMemory() works
exactly like memcmp(), but documentation for memcmp() says that it compares
two memory blocks lexically. It says that the return value

< 0 means buf1 less than buf2
= 0 means buf1 identical to buf2
> 0 means buf1 greater than buf2

Doesn’t it mean that memcmp() never returns the offset ?

Also, see below is the source for memcmp() [File MSDEV\crt\src\memcmp.c].

Please clarify since this is very confusing.

Thanks
Puja

/***
*memcmp.c - compare two blocks of memory
*
* Copyright (c) 1985-1993, Microsoft Corporation. All rights reserved.
*
*Purpose:
* defines memcmp() - compare two memory blocks lexically and
* find their order.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>

#ifdef _MSC_VER
#pragma function(memcmp)
#endif /* _MSC_VER */

/***
*int memcmp(buf1, buf2, count) - compare memory for lexical order
*
*Purpose:
* Compares count bytes of memory starting at buf1 and buf2
* and find if equal or which one is first in lexical order.
*
*Entry:
* void *buf1, *buf2 - pointers to memory sections to compare
* size_t count - length of sections to compare
*
*Exit:
* returns < 0 if buf1 < buf2
* returns 0 if buf1 == buf2
* returns > 0 if buf1 > buf2
*
*Exceptions:
*
*******************************************************************************/

int __cdecl memcmp (
        const void * buf1,
        const void * buf2,
        size_t count
        )
{
        if (!count)
                return(0);

        while ( --count && *(char *)buf1 == *(char *)buf2 ) {
                buf1 = (char *)buf1 + 1;
                buf2 = (char *)buf2 + 1;
        }

        return( *((unsigned char *)buf1) - *((unsigned char *)buf2) );
}

==

On 05/05/00, “Matt A.” wrote:
> > “RtlCompareMemory returns the number of bytes that compare as equal. If all
> > bytes compare as equal, the input Length is returned.”
> >
> > In the following case:
> >
> > Block1 = AA BB CC DD EE FF (no spaces in actual memory)
> > Block2 = AA 22 33 44 EE 00 (no spaces in actual memory)
> >
> > Will it return 2 because 2 bytes are equal [i.e. Byte 0 (AA) and Byte 4 (EE)] ?
> >
> > OR
> >
> > Will it return 1 because after 1 byte [i.e. Byte 0 (AA)], it encounters
> > bytes which are unequal ?
>
> RtlCompareMemory() works like the standard “C” memcmp() function. Namely,
> it returns the number of bytes in the first block that match corresponding
> bytes in the second block, starting from the beginning of the first block.
>
> So, for the above example, the result from either RtlCompareMemory() or
> memcmp() would be 1. Only the first byte AA in Block1 matches the
> corresponding byte in Block2. The second byte BB in Block1 does not match
> Block2.
>
> > Basically I am looking for a function which will give me the
> > location/offset of the first byte where 2 memory blocks differ. Can I use
> > RtlCompareMemory() for this ? Any other ideas ?
>
> Yes, that’s exactly the information RtlCompareMemory() returns. Note that
> you could also include <string.h> and use the standard memcmp() (which is
> also exported by NTOSKRNL) – whichever you prefer.
>
> - Matt

From:
Sent: Sunday, May 07, 2000 7:11 PM

> Hello Matt,
>
> Maybe I am missing something. You said that RtlCompareMemory() works
> exactly like memcmp(), but documentation for memcmp() says that it compares
> two memory blocks lexically. It says that the return value
>
> < 0 means buf1 less than buf2
> = 0 means buf1 identical to buf2
> > 0 means buf1 greater than buf2
>
> Doesn’t it mean that memcmp() never returns the offset ?
>
> Also, see below is the source for memcmp() [File MSDEV\crt\src\memcmp.c].
>
> Please clarify since this is very confusing.

Yes, and I apologize for confusing you even more!

What I said about memcmp() was entirely wrong. How embarrassing! I must
have been working too long and posting too late. ;)

memcmp() does NOT return the same sort of value as RtlCompareMemory(). As
you say, memcmp() returns a signed value indicating the lexical relation the
first block of memory has to the second.

However, based on your description of “a function which will give … the
location/offset of the first byte where 2 memory blocks differ”, I still
think RtlCompareMemory() is the function you want to use.

- Matt
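
To make the difference concrete, here is a small user-mode sketch of what memcmp() can tell you. The C standard only promises the sign of the result, so the hypothetical helper memcmp_sign below reduces it to -1, 0 or +1. It reports which block orders first and says nothing about where the blocks diverge.

```c
#include <string.h>

/* Hypothetical helper: collapse memcmp()'s result to -1, 0 or +1, since
 * only the sign of the return value is defined, not its magnitude. */
int memcmp_sign(const void *a, const void *b, size_t n)
{
    int r = memcmp(a, b, n);
    return (r > 0) - (r < 0);
}
```

For the Block1/Block2 example from the original question, memcmp_sign(Block1, Block2, 6) is +1, because the first differing pair is BB versus 22 and memcmp() compares bytes as unsigned char. The offset of that pair (1) is not recoverable from the result.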

> What I said about memcmp() was entirely wrong. How embarrassing! I must
> have been working too long and posting too late. ;)

It happens to all of us! It’s OK as long as we clear things up. Thanks for
that!

> However, based on your description of “a function which will give … the
> location/offset of the first byte where 2 memory blocks differ”, I still
> think RtlCompareMemory() is the function you want to use.

I’m fine as long as I have a function which solves my purpose! I was just
curious about memcmp() (in case I need to use it in future), even though I
plan to use RtlCompareMemory().

Thanks again!

Puja