Hi all,
Just to give you some feedback on something I couldn't find documented elsewhere: using RtlZeroMemory() on the mapped registers of a PCIe card can hang the computer so hard that even the kernel debugger sees nothing.
This post is mainly a reminder.
The issue
On Windows 10 (10.0.19043).
The code looked like this:
area = MmMapIoSpace(basePhyAddr, size, MmNonCached);
RtlZeroMemory(area, size);
This ran perfectly in Debug builds but crashed in Release builds.
According to wdm.h, RtlZeroMemory() is defined as:
#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))
I found no remark in any Microsoft documentation about using RtlZeroMemory() on mapped registers.
The solution
At first, I used a simple for() loop. Once I had validated the write operations, I looked for a better solution and settled on RtlSecureZeroMemory().
On amd64 it compiles to a plain CPU intrinsic; elsewhere it is a bunch of assembly code.
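For reference, the simple for() loop approach could look roughly like the sketch below. It is not the original code: zero_mmio32 is a hypothetical helper name, and the sketch assumes the region is 32-bit aligned and that the device accepts plain 32-bit writes. In a real driver each store would normally go through WRITE_REGISTER_ULONG() rather than a bare pointer write.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a WDK API: clears a register window with one
 * aligned 32-bit store per iteration. The volatile qualifier keeps the
 * compiler from merging the stores or replacing the loop with memset(). */
static void zero_mmio32(volatile uint32_t *base, size_t size_bytes)
{
    size_t count = size_bytes / sizeof(uint32_t); /* whole 32-bit words only */

    for (size_t i = 0; i < count; i++) {
        base[i] = 0u; /* in a driver: WRITE_REGISTER_ULONG(&base[i], 0) */
    }
}
```

Any trailing bytes beyond the last whole 32-bit word are left untouched, so the size should be a multiple of 4.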
RtlZeroMemory calls memset. If you zero more than 16 bytes, it uses the XMM “movups” instruction to write 16 bytes at a time. If your device can’t handle burst write requests bigger than 32 bits, that will explode, but that’s not Windows’ fault.
On the other hand, it’s an unusual device that needs register space cleared in bulk like that.
I knew memset() was used, but I had no idea how it is implemented in the Windows kernel libraries. So it uses an SSE instruction (movups on the XMM registers). I’ve seen other implementations that use DMA features to do the job on other CPUs.
As for the fault: simply adding a note to the documentation, such as “only to be used on internal memory”, would be helpful.
By the way, the secure version uses another CPU instruction, which appears to be supported by the PCIe bus controller. Sorry, I haven’t read the amd64 ISA documentation about it, but I’m quite sure it does some burst accesses here too.
About cleaning the mapped region: it’s an exchange area, so I think it’s good practice to clean it before use, isn’t it?
Why do you need to “clean” a mapped region at all? If this is a circular buffer that boots up in a random state, then random is just as good as zero, right? Your code was zeroing out the entire BAR. Typically, a BAR contains registers that affect operation, where writes of any kind can have dangerous side effects. It is not “best practice” to zero out such a BAR at all. All registers should have a well-defined “reset” state, and that state should be what it needs to begin operation.
Hi Tim,
I agree, I will never do any “bulk” writes on a card’s registers.
As I wrote earlier, these particular registers (a subset of the card’s BAR) are an exchange area between the PC and the card. So they can be seen as a shared memory embedded in the card. They are used by several services, and thus they need to be cleaned to avoid any “collision”.
“Leave this place clean after use”.
It seems unlikely that the memory really does need to be ‘cleaned’ after use.
From a security point of view, all of these ‘services’ that can use this area must run either on the card itself or in ring 0 on the host side. Code in either location can arbitrarily read or corrupt data that it is not supposed to read or write.
From a correctness point of view, each one of these services that can use this area must have some way of indicating ‘this resource is mine now and don’t touch it’. That should prevent reading partially written or corrupted data. It doesn’t really matter if your protocol has fixed size structures, delimited ones, null terminated or polka dotted and striped ones - initializing the unused bytes to zero does not help and costs extra memory writes.
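To make the ownership idea concrete, here is a minimal sketch of one possible handshake. Everything in it is an assumption for illustration: the mailbox_t layout, the MBOX_* names and the single-producer, single-consumer model are not from the card discussed here. A real device would also need whatever memory barriers and write sizes the hardware requires.

```c
#include <stddef.h>
#include <stdint.h>

#define MBOX_WORDS 16u /* hypothetical payload capacity, in 32-bit words */

enum { MBOX_FREE = 0, MBOX_FULL = 1 };

/* Hypothetical layout of one exchange slot in the shared area. */
typedef struct {
    volatile uint32_t state;  /* MBOX_FREE or MBOX_FULL */
    volatile uint32_t length; /* number of valid payload words */
    volatile uint32_t payload[MBOX_WORDS];
} mailbox_t;

/* Producer side: returns 0 on success, -1 if the slot is busy or the
 * message does not fit. The state word is written last, so the consumer
 * never sees a partially written message. */
static int mbox_post(mailbox_t *mb, const uint32_t *msg, uint32_t words)
{
    if (words > MBOX_WORDS || mb->state != MBOX_FREE) {
        return -1;
    }
    for (uint32_t i = 0; i < words; i++) {
        mb->payload[i] = msg[i];
    }
    mb->length = words;
    mb->state = MBOX_FULL;
    return 0;
}

/* Consumer side: copies out the valid words and frees the slot. Returns
 * the word count, or -1 if nothing is pending or 'out' is too small.
 * Words beyond 'length' are never read, so stale data there is harmless. */
static int mbox_take(mailbox_t *mb, uint32_t *out, uint32_t cap)
{
    if (mb->state != MBOX_FULL) {
        return -1;
    }
    uint32_t words = mb->length;
    if (words > MBOX_WORDS || words > cap) {
        return -1;
    }
    for (uint32_t i = 0; i < words; i++) {
        out[i] = mb->payload[i];
    }
    mb->state = MBOX_FREE;
    return (int)words;
}
```

With a scheme like this, only the state word has to start in a known value; the payload can hold leftovers from the previous message without anyone ever reading them.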