Is it safer to use memcpy() rather than RtlCopyMemory() in a driver?
What are the pros and cons of each?
Thanks
According to this article, http://www.codeproject.com/Articles/9575/Driver-Development-Part-2-Introduction-to-Implemen
RtlCopyMemory is just a wrapper for memcpy in case memcpy changes. At the moment, there is no difference between them:
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))
It may be at this point, but there is no guarantee that it will be that
way tomorrow. Driver developers should use RtlCopyMemory in my opinion
since it is the documented API for the kernel programming environment.
By the way the article you mention has caused a significant number of
the crashes I have tracked down for clients, i.e. I think it is a total
piece of crap.
Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
As usual, Mr. Burn is correct.
If you look at the kernel mode functions that copy memory in the OS and in the headers, when they need to copy something they call RtlCopyMemory. Look at acpiioct.h, ndis.h, tdikrnl.h, and ALL the various inline functions in WDF. They all use RtlCopyMemory and do not call memcpy directly.
It is true that wdm.h unconditionally defines:
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))
but this is an artifact of implementation, not architecture.
The function that copies memory in kernel mode is RtlCopyMemory. How that’s implemented is how it’s implemented. Nothing more.
Peter
OSR
P.S. Mr. Burn wrote:
The article came from CodeProject. That’s almost always enough said, right there. When the source is CodeProject, the code is guilty until proven innocent as far as I’m concerned. There’s no vetting of the content, and any major fool can, and will, post anything. You don’t have to look too far to prove this.
Peter
OSR
And don’t even think about using RtlCopyMemory or memcpy to access your device memory.
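The concern with device memory is that a general-purpose copy routine may choose any access width and ordering, which memory-mapped hardware often does not tolerate. Below is a minimal user-mode sketch of the alternative, assuming 32-bit registers; the helper name read_device_buffer32 is hypothetical and a plain array stands in for the device window. In a real driver, the register-access routines documented for the kernel, such as READ_REGISTER_BUFFER_ULONG, serve this purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 32-bit words out of a register window.
 * Every word is read through a volatile-qualified pointer, one aligned
 * 32-bit load at a time, in ascending order, instead of leaving the access
 * pattern to an optimized memcpy. */
static void read_device_buffer32(const volatile uint32_t *regs,
                                 uint32_t *dst, size_t count)
{
    assert(regs != NULL && dst != NULL);
    assert(((uintptr_t)regs % sizeof(uint32_t)) == 0); /* aligned window */
    for (size_t i = 0; i < count; i++) {
        dst[i] = regs[i];
    }
}
```

This sketch deliberately leaves out memory barriers and any bus-specific ordering rules, which the kernel's own register routines are there to handle.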
On 5/6/2013 11:06 AM, xxxxx@orbitech.org wrote:
> According to this article, http://www.codeproject.com/Articles/9575/Driver-Development-Part-2-Introduction-to-Implemen
> RtlCopyMemory is just a wrapper for memcpy in case memcpy changes. ATM, there is no difference between them
> #define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))
On x86 builds that is true; but on x64 builds RtlCopyMemory uses XMM
instructions (at least in one of its control paths). If you need to
copy large amounts of memory on x64 builds RtlCopyMemory will result in
substantial performance improvement as compared to memcpy. The use of
XMM instructions on x64 builds leads to some unpleasant surprises for
people who try to port x86 applications to 64-bits and who are in the
habit of using misaligned pointers or RtlCopyMemory with memory that is
exposed by a PCI BAR.
Really? That’s interesting.
Can you point me to where you see that? Because in the (Windows 8) copy of WDM.H, at line 11118, I see a definition for RtlCopyMemory that does not appear to be within any conditional.
I see similar definitions (without condition) in other header files (such as winnt.h, and others).
Peter
OSR
xxxxx@gmail.com wrote:
> Is it safer to use memcpy() rather than RtlCopyMemory() in a driver?
You’ve already received the answer that RtlCopyMemory is a #define for
memcpy.
Remember, however, that memcpy is not guaranteed to handle the case
where the buffers overlap. I have virtually eliminated memcpy (and
RtlCopyMemory) from my vocabulary, and use memmove (or RtlMoveMemory)
exclusively.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
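A small user-mode sketch of the overlap case described above. The helper name shift_right_by_one is hypothetical; the point is only that source and destination share bytes, which is where memcpy (and therefore RtlCopyMemory) gives no guarantee and memmove (or RtlMoveMemory) does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: shift the first `len` bytes of `buf` right by one
 * position. Source [0, len) and destination [1, len + 1) overlap, so
 * memmove is required; calling memcpy here would be undefined behavior.
 * The caller must provide room for at least len + 1 bytes. */
static void shift_right_by_one(char *buf, size_t len)
{
    assert(buf != NULL);
    memmove(buf + 1, buf, len);
}
```

Calling it on an 8-byte buffer holding "abcdef" with len 6 leaves the original six characters intact one position to the right, with the first byte unchanged.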
This is the sort of advice that I would love to use, but this statement in the documentation is too ambiguous:
“The RtlCopyMemory routine runs faster than RtlMoveMemory,"
What does “run faster” mean? Is it 1000x faster or just .0005x faster? Some products will be very dependent on this statement.
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Monday, May 06, 2013 12:42 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] RtlCopyMemory() Vs Memcpy()
NTDEV is sponsored by OSR
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
On 5/6/2013 3:21 PM, xxxxx@osr.com wrote:
> Really? That’s interesting.
> Can you point me to where you see that? Because in the (Windows 8) copy of WDM.H, at line 11118, I see a definition for RtlCopyMemory that does not appear to be within any conditional.
> I see similar definitions (without condition) in other header files (such as winnt.h, and others).
> Peter
> OSR
uf nt!memcpy
[…]
nt!memcpy+0x77:
fffff802`0c489ef7 4c2bc1          sub     r8,rcx
fffff802`0c489efa f3420f6f041a    movdqu  xmm0,xmmword ptr [rdx+r11]
fffff802`0c489f00 f3410f7f03      movdqu  xmmword ptr [r11],xmm0
nt!memcpy+0x85:
fffff802`0c489f05 4903cb          add     rcx,r11
nt!memcpy+0x88:
fffff802`0c489f08 4d8bc8          mov     r9,r8
fffff802`0c489f0b 49c1e905        shr     r9,5
fffff802`0c489f0f 4981f900200000  cmp     r9,2000h
fffff802`0c489f16 0f8776000000    ja      nt!memcpy+0x112 (fffff802`0c489f92)
nt!memcpy+0x9c:
fffff802`0c489f1c 4983e01f        and     r8,1Fh
nt!memcpy+0xa0:
fffff802`0c489f20 f30f6f040a      movdqu  xmm0,xmmword ptr [rdx+rcx]
fffff802`0c489f25 f30f6f4c0a10    movdqu  xmm1,xmmword ptr [rdx+rcx+10h]
fffff802`0c489f2b 4883c120        add     rcx,20h
fffff802`0c489f2f 660f7f41e0      movdqa  xmmword ptr [rcx-20h],xmm0
fffff802`0c489f34 660f7f49f0      movdqa  xmmword ptr [rcx-10h],xmm1
fffff802`0c489f39 49ffc9          dec     r9
fffff802`0c489f3c 75e2            jne     nt!memcpy+0xa0 (fffff802`0c489f20)
[…]
movdqu xmm0,xmmword ptr [rdx+rcx] <
What is that?
Speer, Kenny wrote:
> This is the sort of advice that I would love to use, but this statement in the documentation is too ambiguous:
> “The RtlCopyMemory routine runs faster than RtlMoveMemory,"
> What does “run faster” mean? Is it 1000x faster or just .0005x faster? Some products will be very dependent on this statement.
It’s a ridiculous statement that has been handed down since the mists of
time, based on the documentation for the original C run-time library
some 40 or 50 years ago.
Here is a pseudo-code implementation of memmove:
    if the source address is greater than the destination address
        copy in the normal direction
    else
        copy in the backwards direction
Here is a pseudo-code implementation of memcpy:
    copy in the normal direction
You can see how significant the performance impact is likely to be. It
is so close to 0 that it cannot be measured, except for very small
copies. The “copy” algorithms themselves are identical.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
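The pseudo-code above can be sketched in C as follows. This is a deliberately naive byte-at-a-time version for illustration only, not how any real C run-time or the kernel implements these routines; the names copy_forward and move_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* memcpy-style: always copy in the forward direction. */
static void *copy_forward(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* memmove-style: one extra address comparison picks the direction. */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (dst != NULL && src != NULL));
    if ((uintptr_t)s > (uintptr_t)d)
        return copy_forward(dst, src, n);   /* source above: go forward */
    for (size_t i = n; i > 0; i--)          /* source below: go backward */
        d[i - 1] = s[i - 1];
    return dst;
}
```

The only difference between the two is the single comparison at the top of move_bytes, which is the cost being discussed.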
George M. Garner Jr. wrote:
On 5/6/2013 3:21 PM, xxxxx@osr.com wrote:
> [quote]
> On x86 builds that is true; but on x64 builds RtlCopyMemory uses XMM
> instructions (at least in one of its control paths).
> [/quote]
>
> Really? That’s interesting.
>
> Can you point me to where you see that? Because the (Windows 8) copy of WDM.H at line 11118 I see a definition for RtlCopyMemory that does not appear to be within any conditional.
> uf nt!memcpy
> […]
> nt!memcpy+0x77:
> fffff802`0c489ef7 4c2bc1          sub     r8,rcx
> fffff802`0c489efa f3420f6f041a    movdqu  xmm0,xmmword ptr [rdx+r11]
> fffff802`0c489f00 f3410f7f03      movdqu  xmmword ptr [r11],xmm0
Yes, but you just proved a statement that is very different from the one
you originally made. Your statement above implies that x64
RtlCopyMemory uses XMM instructions and memcpy does not, and is
therefore inferior. Here you have shown that memcpy uses XMM instructions.
The original assertion was that RtlCopyMemory == memcpy. I believe that
assertion still stands.
By the way, memcpy is a compiler intrinsic, so if intrinsics are
enabled, you won’t even get to the run-time code shown there.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
Tim,
> Yes, but you just proved a statement that is very different from the one
> you originally made. Your statement above implies that x64
> RtlCopyMemory uses XMM instructions and memcpy does not, and is
> therefore inferior. Here you have shown that memcpy uses XMM instructions.
> The original assertion was that RtlCopyMemory == memcpy. I believe that
> assertion still stands.
> By the way, memcpy is a compiler intrinsic, so if intrinsics are
> enabled, you won’t even get to the run-time code shown there.
If you use RtlCopyMemory in your code you will import memcpy from the
kernel (which is optimized on x64-builds). If you use memcpy directly
unoptimized code is either inlined or a separate function body is
generated in your driver, depending on what optimizations you have
enabled. So the practical result is just what I said. At least that is
the way things are working here.
Google (or bing) around a little and you will find some studies that
demonstrate the superior performance of xmm instructions on 64-bit
builds and not on x86 builds. MS has obviously done some research of
its own and come to the same conclusion.
Regards,
George.
Oh, almost forgot about the prefetch on 64-bit builds:
nt!memcpy+0x2c0:
fffff802`0c48a140 4881e980000000  sub     rcx,80h
fffff802`0c48a147 0f18040a        prefetchnta [rdx+rcx]
fffff802`0c48a14b 0f18440a40      prefetchnta [rdx+rcx+40h]
fffff802`0c48a150 ffc8            dec     eax
fffff802`0c48a152 75ec            jne     nt!memcpy+0x2c0 (fffff802`0c48a140)
I have to give MS a lot of credit here. They did their homework!
@Tim Roberts:
REP MOVSD will move whole cache lines in one clock, if the alignment allows that, and the address goes up. I’m not sure if that’s the case with address decrement, which you will often get with memmove.
This is an unsupported assertion. I’m once again asking for the facts to back this.
So again, I’ll ask: Please point me to where RtlCopyMemory is defined as something other than memcpy. Because, as I said, you can see in WDM.H it’s a strict #define.
Your code dump simply shows that memcpy uses XMM. Which is fine. It does not, however, demonstrate that RtlCopyMemory and memcpy in Windows kernel mode are different.
Peter
OSR
George M. Garner Jr. wrote:
> If you use RtlCopyMemory in your code you will import memcpy from the
> kernel (which is optimized on x64-builds). If you use memcpy directly
> unoptimized code is either inlined or a separate function body is
> generated in your driver, depending on what optimizations you have
> enabled. So the practical result is just what I said.
Sorry, but this is simply not true. Look at the code. Here is the
essential definition of RtlCopyMemory:
#define RtlCopyMemory(a,b,c) memcpy(a,b,c)
It’s a preprocessor macro. Textual substitution. By the time the code
generator pass sees it, RtlCopyMemory DOES NOT EXIST. The result is
EXACTLY the same as if you had typed memcpy. It has to be. The
language requires it.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
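The textual-substitution point can be checked with the preprocessor itself. The sketch below uses a stand-in macro with the same shape as the wdm.h definition quoted in this thread; the names MY_COPY_MEMORY, EXPANSION_OF, expanded_call_text, and copy_block are hypothetical, chosen so the example builds outside the WDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the same shape as the RtlCopyMemory definition quoted
 * above; not the real WDK macro. */
#define MY_COPY_MEMORY(Destination, Source, Length) \
    memcpy((Destination), (Source), (Length))

/* Two-level stringification: the argument is macro-expanded first,
 * then turned into a string literal. */
#define STRINGIFY_(x) #x
#define EXPANSION_OF(x) STRINGIFY_(x)

/* Returns the text that remains after preprocessing a MY_COPY_MEMORY call.
 * The macro name itself does not survive into that text. */
static const char *expanded_call_text(void)
{
    return EXPANSION_OF(MY_COPY_MEMORY(dst, src, 16));
}

/* Ordinary use: after preprocessing this is a plain memcpy call. */
static void copy_block(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    MY_COPY_MEMORY(dst, src, n);
}
```

Printing the string returned by expanded_call_text() shows the call as the compiler proper receives it, which is one way to confirm that a macro of this form leaves nothing of its own behind.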
On 5/6/2013 5:04 PM, xxxxx@osr.com wrote:
> This is an unsupported assertion. I’m once again asking for the facts to back this.
> So again, I’ll ask: Please point me to where RtlCopyMemory is defined as something other than memcpy. Because, as I said, you can see in WDM.H it’s a strict #define.
> Your code dump simply shows that memcpy uses XMM. Which is fine. It does not, however, demonstrate that RtlCopyMemory and memcpy in Windows kernel mode are different.
> Peter
> OSR
Dude! My code dump WAS from the w8 RTM kernel! Or does “uf nt!memcpy”
mean something else up in OSR-land?
As far as the assertion which you quote above, it is based on
disassembly of x86 and x64 binaries compiled using the W7 WDK and either
“memcpy” or “RtlCopyMemory.”
Here, where we do engineering, it means you’re dumping memcpy.
Your assertion was that memcpy and RtlCopyMemory are different.
So, why not now dump RtlCopyMemory for us.
You can’t, because they are not different. They are identical, per the macro I showed you previously. In fact, I will assert that there IS no code for RtlCopyMemory in the Windows Kernel (unless you count a #define as a definition).
As Mr. Roberts said:
And you, Mr. Garner, are – once again – blowing smoke and making assertions that you have not been able to support.
Cite your evidence, that… as you said:
It seems that your dumping memcpy on x64, and showing the XMM instructions, in fact proves that your statement above is false.
Peter
OSR
Disclaimer: this says nothing about fast or slow, just about the existence of a
similarly named API.
Well, there was one, RtlCopyMemoryNonTemporal, in XP (identical code in
both ntdll and ntoskrnl). I remember because it had some 0f c3
opcodes which OllyDbg wasn’t disassembling, and Virtual PC was using a
series of 0f 3f opcodes (possibly alluded to as the VPC backdoor).
It used movnti in XP SP3, with a prefetchnta and an sfence at the end.
It was not exported and didn’t seem to be referenced,
so I wrote a hack to call it and see (I asked OllyDbg to step into
unknown commands):
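#include <windows.h>

// 32-bit x86 only: relies on MSVC inline assembly and a 32-bit module base.
int main (void)
{
    HMODULE hMod = GetModuleHandleW(L"ntdll.dll");
    // 0x2cd8 is the offset of the unexported routine in the ntdll.dll build
    // being examined; it will differ on any other build.
    ULONG RtlCopyMemoryNonTemporal = (ULONG) hMod + 0x2cd8;
    char from[0x100] = {"let me be the one to be copied iam the sacrifical goat let me be the one to be sacred"};
    char to[0x100];
    __asm
    {
        mov eax,RtlCopyMemoryNonTemporal
        mov edx,0x23 // unaligned triggers rep movsb for the last 3 bytes else only sse2
        push edx     // length
        lea edx, from
        push edx     // source
        lea edx, to
        push edx     // destination
        call eax
    }
    return 0;
}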
Haven’t seen a plain RtlCopyMemory, though.