NTDEV

RtlCompareMemory - Please clarify

OSR_Community_User (Member, Posts: 110,217)
Could somebody please clarify what exactly RtlCompareMemory() returns?
The DDK documentation says:

"RtlCompareMemory returns the number of bytes that compare as equal. If all
bytes compare as equal, the input Length is returned."

In the following case:

Block1 = AA BB CC DD EE FF (no spaces in actual memory)
Block2 = AA 22 33 44 EE 00 (no spaces in actual memory)

Will it return 2 because 2 bytes are equal [i.e. Byte 0 (AA) and Byte 4
(EE)]?

OR

Will it return 1 because after 1 byte [i.e. Byte 0 (AA)], it encounters
bytes which are unequal?

Basically I am looking for a function which will give me the
location/offset of the first byte where 2 memory blocks differ. Can I use
RtlCompareMemory() for this? Any other ideas?

Any help is appreciated.

Thanks!
Puja

Comments

  • OSR_Community_User (Member, Posts: 110,217)
    From: <[email protected]>
    Sent: Friday, May 05, 2000 8:15 PM


    >
    > Could somebody please clarify what exactly RtlCompareMemory() returns ?
    > The DDK documentation says:
    >
    > "RtlCompareMemory returns the number of bytes that compare as equal. If all
    > bytes compare as equal, the input Length is returned."
    >
    > In the following case:
    >
    > Block1 = AA BB CC DD EE FF (no spaces in actual memory)
    > Block2 = AA 22 33 44 EE 00 (no spaces in actual memory)
    >
    > Will it return 2 because 2 bytes are equal [i.e. Byte 0 (AA) and Byte 4
    > (EE) ] ?
    >
    > OR
    >
    > Will it return 1 because after 1 byte [i.e. Byte 0 (AA) ], it encounters
    > bytes which are unequal ?

    RtlCompareMemory() works like the standard "C" memcmp() function. Namely,
    it returns the number of bytes in the first block that match *corresponding*
    bytes in the second block, starting from the beginning of the first block.

    So, for the above example, the result from either RtlCompareMemory() or
    memcmp() would be 1. Only the first byte AA in Block1 matches the
    corresponding byte in Block2. The second byte BB in Block1 does not match
    Block2.

    > Basically I am looking for a function which will give me the
    > location/offset of the first byte where 2 memory blocks differ. Can I use
    > RtlCompareMemory() for this ? Any other ideas ?

    Yes, that's exactly the information RtlCompareMemory() returns. Note that
    you could also include <string.h> and use the standard memcmp() (which is
    also exported by NTOSKRNL) -- whichever you prefer.

    - Matt
  • OSR_Community_User (Member, Posts: 110,217)
    >Basically I am looking for a function which will give me the
    >location/offset of the first byte where 2 memory blocks differ. Can I use
    >RtlCompareMemory() for this ? Any other ideas ?

    Is there some reason you don't just write a little loop, as in:

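    // a, b: pointers to the two blocks (e.g. const UCHAR *); length: byte count
    for (i = 0; i < length; i++) {
        if (*a++ != *b++)
            break;
    }

    if (i != length) {
        // i is the offset of the first differing byte
        ...
    }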

    If one of the operands is in device memory, you may want to use correct
    device memory access functions like READ_REGISTER_UCHAR. You may also want
    to compare bigger chunks (like QWORDS, with MMX registers) for performance
    reasons, and then figure out the actual byte offset. You may also want to
    arrange for device memory to ONLY be read ONCE, as in:

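    // a: PUCHAR into mapped device memory; b: buffer in main memory
    for (i = 0; i < length; i++) {
        x = READ_REGISTER_UCHAR(a++);   // each device byte is read exactly once
        if (x != *b++)
            break;
    }

    if (i != length) {
        // x still contains the device value that mismatched
        ...
    }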

    Or you may want to do some appropriate cache prefetch touching, to get the
    memory subsystem and processor to overlap better (processor dependent).
    RtlCompareMemory will not know of all these unique cases.

    I believe "memcmp" will also work fine in a driver.

    - Jan
  • OSR_Community_User (Member, Posts: 110,217)
    > Is there some reason you just don't write a little loop, as in:
    > for(i = 0;i < length;i++) {
    > if (*a++ != *b++)
    > break;
    > }

    I can't speak for the original poster, but you should always use the library
    functions (memcmp, memcpy, RtlCompareMemory, etc.) for that sort of thing,
    since they use the string instructions that are built into the Intel CPU. I
    think I measured them long ago and found them to be about 25% faster than
    the optimized versions of the code I wrote. Of course, I suppose the
    compiler these days could be much smarter, and there may not actually be a
    difference.
  • OSR_Community_User (Member, Posts: 110,217)
    Taed has given the major reason why I didn't want to use a loop to do the
    comparison. The other reason is that I need to compare very large memory
    blocks (~100 KB), so I thought I'd be better off using a library function
    if I could find one. Moreover, I need to do this in a REALTIME thread
    running at HIGH_PRIORITY, although I'm not sure what the implications
    might be of running a loop vs. calling a library function.

    Thanks for the clarification everybody!

    Puja


    (In reply to Taed Nelson, 05/06/00)
  • OSR_Community_User (Member, Posts: 110,217)
    At 02:21 PM 5/6/00 -0700, you wrote:
    >> Is there some reason you just don't write a little loop, as in:
    >> for(i = 0;i < length;i++) {
    >> if (*a++ != *b++)
    >> break;
    >> }
    >
    >I can't speak for the original poster, but you should always use the library
    >functions (memcmp, memcpy, RtlCompareMemory, etc.) for that sort of thing
    >since it uses the string instructions that are built into the Intel CPU. I
    >think I measured them long ago and found them to be about 25% faster than
    >the optimized versions of the code I wrote. Of course, I suppose the
    >compiler these days could be much smarter, and there may not actually be a
    >difference.

    My guess is that the run-time functions (kernel or C) do NOT take into
    account processor-specific instructions, such as the cache prefetch and
    non-caching move instructions on recent processors. It's been a couple of
    years since I measured all the tradeoffs, but I very much believe the Intel
    string instructions will cause horrible cache pollution (anybody correct me
    if this is not the case). Ideally, calling a kernel run-time function for
    things like this should run processor-specific code (different on
    Pentium II and Pentium III), as the OS certainly has to know about
    different processors. I don't know if W2K kernel functions have this
    enhancement.

    The Microsoft C compiler does know how to inline many of the low-level
    run-time functions. I know for sure it can generate inline string move
    instructions to replace memcpy (not sure about memcmp). If the code
    optimizer were really smart, it would detect a simple loop that moves
    memory and use the appropriate instructions.

    If one of the operands is device memory, say over a PCI bus, there may be
    some advantage to using larger (64- or 128-bit) move operations, as the
    hardware should know to do a 2- or 4-DWORD burst instead of single-DWORD
    bursts. Burst length is extremely important to getting good PCI bus
    efficiency. Strip-mining into registers may generate even better PCI bus
    bursts, like this (for moving memory from a PCI target to main memory):

    ; esi = source (PCI target memory), edi = destination (main memory)
    ; ecx = byte count, assumed to be a nonzero multiple of 64
    next:
    movq mm0,[esi+0x00]
    movq mm1,[esi+0x08]
    movq mm2,[esi+0x10]
    movq mm3,[esi+0x18]
    movq mm4,[esi+0x20]
    movq mm5,[esi+0x28]
    movq mm6,[esi+0x30]
    movq mm7,[esi+0x38]
    movntq [edi+0x00],mm0
    movntq [edi+0x08],mm1
    movntq [edi+0x10],mm2
    movntq [edi+0x18],mm3
    movntq [edi+0x20],mm4
    movntq [edi+0x28],mm5
    movntq [edi+0x30],mm6
    movntq [edi+0x38],mm7
    add esi, 0x40       ; advance both pointers by one 64-byte block
    add edi, 0x40
    sub ecx, 0x40
    jnz next
    sfence              ; order the non-temporal stores before later stores
    emms                ; release the MMX register state

    This MAY generate 64-byte PCI bursts (a bus analyzer would be needed to see
    what really happens) and writes the data to memory without cache pollution.
    I know there was a discussion here a while back about getting burst PCI
    target read transfers if the source device memory is declared uncached
    (don't remember the conclusion).

    Also note that MMX instructions are like floating-point instructions: care
    must be taken to make them work in kernel mode (especially at IRQL greater
    than PASSIVE_LEVEL).

    My understanding of the actual application by [email protected] is to sweep
    128K of PCI target memory, and look for changes against a memory buffer.
    Getting fast PCI bursts, and not flushing the processor cache on every
    "poll" seems like a desirable quality. Twenty carefully chosen instructions
    will potentially perform MUCH better than a generic memory comparison
    function. Of course, if the polling happens once every 60 seconds, it really
    doesn't matter. If the polling happens 100 times/sec, it really does.

    I guess my point is that there may be HUGE efficiency losses from just using
    some generic run-time function, and whether this is a problem "depends" on
    lots of factors.

    - Jan
  • OSR_Community_User (Member, Posts: 110,217)
    Hello Matt,

    Maybe I am missing something. You said that RtlCompareMemory() works
    exactly like memcmp(), but documentation for memcmp() says that it compares
    two memory blocks lexically. It says that the return value

    < 0 means buf1 less than buf2
    = 0 means buf1 identical to buf2
    > 0 means buf1 greater than buf2

    Doesn't it mean that memcmp() never returns the offset ?

    Also, below is the source for memcmp() [File MSDEV\crt\src\memcmp.c].

    Please clarify since this is very confusing.

    Thanks
    Puja

    /***
    *memcmp.c - compare two blocks of memory
    *
    * Copyright (c) 1985-1993, Microsoft Corporation. All rights reserved.
    *
    *Purpose:
    * defines memcmp() - compare two memory blocks lexically and
    * find their order.
    *
    *******************************************************************************/

    #include
    #include

    #ifdef _MSC_VER
    #pragma function(memcmp)
    #endif /* _MSC_VER */

    /***
    *int memcmp(buf1, buf2, count) - compare memory for lexical order
    *
    *Purpose:
    * Compares count bytes of memory starting at buf1 and buf2
    * and find if equal or which one is first in lexical order.
    *
    *Entry:
    * void *buf1, *buf2 - pointers to memory sections to compare
    * size_t count - length of sections to compare
    *
    *Exit:
    * returns < 0 if buf1 < buf2
    * returns 0 if buf1 == buf2
    * returns > 0 if buf1 > buf2
    *
    *Exceptions:
    *
    *******************************************************************************/

    int __cdecl memcmp (
            const void * buf1,
            const void * buf2,
            size_t count
            )
    {
            if (!count)
                    return(0);

            while ( --count && *(char *)buf1 == *(char *)buf2 ) {
                    buf1 = (char *)buf1 + 1;
                    buf2 = (char *)buf2 + 1;
            }

            return( *((unsigned char *)buf1) - *((unsigned char *)buf2) );
    }

    (In reply to Matt A., 05/05/00)
  • OSR_Community_User (Member, Posts: 110,217)
    From: <[email protected]>
    Sent: Sunday, May 07, 2000 7:11 PM


    > Hello Matt,
    >
    > Maybe I am missing something. You said that RtlCompareMemory() works
    > exactly like memcmp(), but documentation for memcmp() says that it compares
    > two memory blocks lexically. It says that the return value
    >
    > < 0 means buf1 less than buf2
    > = 0 means buf1 identical to buf2
    > > 0 means buf1 greater than buf2
    >
    > Doesn't it mean that memcmp() never returns the offset ?
    >
    > Also, see below is the source for memcmp() [File MSDEV\crt\src\memcmp.c].
    >
    > Please clarify since this is very confusing.

    Yes, and I apologize for confusing you even more!

    What I said about memcmp() was entirely wrong. How embarrassing! I must
    have been working too long and posting too late. ;-)

    memcmp() does NOT return the same sort of value as RtlCompareMemory(). As
    you say, memcmp() returns a signed value indicating the lexical relation the
    first block of memory has to the second.

    However, based on your description of "a function which will give ... the
    location/offset of the first byte where 2 memory blocks differ", I still
    think RtlCompareMemory() is the function you want to use.

    - Matt
  • OSR_Community_User (Member, Posts: 110,217)
    > What I said about memcmp() was entirely wrong. How embarrassing! I must
    > have been working too long and posting too late. ;-)

    It happens to all of us! It's OK as long as we clear things up. Thanks for
    that!

    > However, based on your description of "a function which will give ... the
    > location/offset of the first byte where 2 memory blocks differ", I still
    > think RtlCompareMemory() is the function you want to use.

    I'm fine as long as I have a function that serves my purpose! I was just
    curious about memcmp() (in case I need to use it in the future), even
    though I plan to use RtlCompareMemory().

    Thanks again!

    Puja