RtlZeroMemory() on a PCIe card's mapped registers: BAD IDEA?!

Hi all,
Just some feedback on something I couldn't find documented elsewhere: calling RtlZeroMemory() on the mapped registers of a PCIe card can stall the computer so badly that even the kernel debugger sees nothing.
This post is mainly a reminder.

The issue

On Windows 10 (10.0.19043).

The code looked like this:

area = MmMapIoSpace(basePhyAddr, size, MmNonCached);
RtlZeroMemory(area, size);

This ran perfectly in Debug builds but crashed in Release builds.

According to wdm.h, RtlZeroMemory() is defined as
#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))

I found no remark in any Microsoft documentation about using RtlZeroMemory() on mapped registers.

The solution

At first, I used a simple for() loop. Once I had validated the write operations, I looked for a better solution and ended up with RtlSecureZeroMemory().
On amd64 it compiles to a single CPU intrinsic (__stosb, i.e. rep stosb); on other architectures it falls back to a volatile byte-by-byte loop.

Only one question

Have you ever seen this?

Regards,
Eric.

RtlZeroMemory calls memset. If you zero more than 16 bytes, it uses the SSE “movups” instruction to write 16 bytes at a time through an XMM register. If your device can’t handle burst write requests bigger than 32 bits, that will explode, but that’s not Windows’ fault.

On the other hand, it’s an unusual device that needs register space cleared in bulk like that.

Hi Tim,
Thanks for your reply.

I knew memset() was used, but I had no idea how it is implemented in the Windows kernel libs. So it uses an SSE instruction on the XMM registers. I’ve seen other implementations that use DMA features to do the job on other CPUs.

About the fault: simply adding “only to be used on internal memory” to the doc would be helpful.

By the way, the secure version uses another CPU instruction, which the PCIe bus controller seems to support. Sorry, I haven’t read the amd64 ISA doc about it, but I’m fairly sure it does some burst accesses here too.

About cleaning the mapped region: it’s an exchange area, so I think it’s good practice to clean it before use, isn’t it?

Best regards,
Eric.

The fault : simply add in the doc : only to be used on internal memory

This is unnecessary. By definition, all accesses to device memory require the use of HAL-provided functions. It is not optional.

Peter

Hi Peter,
Thanks for your reply.
Could you tell me which HAL function cleans a mapped region?

Best regards,
Eric.

Well, you use the ordinary HAL functions: WRITE_REGISTER_xxxx. In a loop.

If this is an initialization function, the performance isn’t important… but correctness is.
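
For illustration, a minimal sketch of such a loop, writing one 32-bit register at a time. The names REG_WRITE32, zero_exchange_area and the EXAMPLE_KERNEL_BUILD switch are hypothetical, not from the thread; outside a driver build the macro falls back to a plain volatile store so the sketch can be compiled and tried in user mode. It assumes the device accepts 32-bit writes and that the area is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef EXAMPLE_KERNEL_BUILD   /* hypothetical switch set by the driver project */
#include <wdm.h>
/* In a driver, every store goes through the HAL routine. */
#define REG_WRITE32(reg, val) WRITE_REGISTER_ULONG((PULONG)(reg), (ULONG)(val))
#else
/* User-mode stand-in (assumption): one volatile 32-bit store per call. */
#define REG_WRITE32(reg, val) (*(volatile uint32_t *)(reg) = (uint32_t)(val))
#endif

/* Zero a mapped exchange area with individual 32-bit writes.
 * Trailing bytes beyond the last whole 32-bit word are left untouched. */
static void zero_exchange_area(volatile uint32_t *base, size_t size_bytes)
{
    size_t words = size_bytes / sizeof(uint32_t);
    size_t i;

    for (i = 0; i < words; i++) {
        REG_WRITE32(&base[i], 0);
    }
}
```

In the original code this would replace the RtlZeroMemory(area, size) call, with area cast to a volatile 32-bit pointer and limited to the exchange subset of the BAR rather than the whole mapping.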

Peter

Why do you need to “clean” a mapped region at all? If this is a circular buffer that boots up in a random state, then random is just as good as zero, right? Your code was zeroing out the entire BAR. Typically, a BAR contains registers that affect operation, where writes of any kind can have dangerous side effects. It is not “best practice” to zero out such a BAR at all. All registers should have a well-defined “reset” state, and that state should be what it needs to begin operation.


Hi Tim,
I agree, I will never do any “bulk” writes on the card’s registers.

As I wrote in my first post, these particular registers (a subset of the card’s BAR) are an exchange area between the PC and the card. So they can be seen as a shared memory embedded in the card. They are used by several services, and thus they need to be cleaned to avoid any “collision”.
“Leave this place clean after use”.

Regards,
Eric.
PS : thanks for your time.

It seems unlikely that the memory really does need to be ‘cleaned’ after use.

From a security point of view, all of these ‘services’ that can use this area must run either on the card itself or in ring 0 on the host side. Either location can arbitrarily read or corrupt data that it is not supposed to read or write.

From a correctness point of view, each one of these services that can use this area must have some way of indicating ‘this resource is mine now and don’t touch it’. That should prevent reading partially written or corrupted data. It doesn’t really matter if your protocol has fixed size structures, delimited ones, null terminated or polka dotted and striped ones - initializing the unused bytes to zero does not help and costs extra memory writes.
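
To make the ownership idea concrete, here is a rough user-mode sketch. The slot layout, the SLOT_* values and the slot_send/slot_receive helpers are all hypothetical, not taken from the thread or from any real card. It assumes an owner word that each side checks before touching the slot, and a length word that says how much of the payload is valid. In a real driver the accesses would go through the READ_REGISTER_xxxx / WRITE_REGISTER_xxxx routines, and the owner word would need a defined reset value.

```c
#include <stdint.h>

/* Hypothetical owner values and slot size, for illustration only. */
enum { SLOT_HOST = 1, SLOT_CARD = 2 };
#define SLOT_PAYLOAD_WORDS 16u

typedef struct {
    volatile uint32_t owner;    /* which side may touch the slot right now */
    volatile uint32_t length;   /* number of valid payload words */
    volatile uint32_t payload[SLOT_PAYLOAD_WORDS];
} exchange_slot;

/* Writer: fill the payload, publish the length, hand the slot over last.
 * Returns 0 on success, -1 if the slot is not ours or the message is too long. */
static int slot_send(exchange_slot *s, const uint32_t *msg, uint32_t words,
                     uint32_t me, uint32_t peer)
{
    uint32_t i;

    if (s->owner != me || words > SLOT_PAYLOAD_WORDS)
        return -1;
    for (i = 0; i < words; i++)
        s->payload[i] = msg[i];
    s->length = words;
    s->owner = peer;            /* the hand-over is the final write */
    return 0;
}

/* Reader: copies only `length` words, so stale words past that are never read.
 * Returns the word count, or -1 if the slot is not ours or the length is bad. */
static int slot_receive(exchange_slot *s, uint32_t *out, uint32_t max_words,
                        uint32_t me)
{
    uint32_t i, n;

    if (s->owner != me)
        return -1;
    n = s->length;
    if (n > SLOT_PAYLOAD_WORDS || n > max_words)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = s->payload[i];
    return (int)n;
}
```

With a scheme like this, leftover bytes from a previous message sit beyond the published length and are simply ignored, which is why zeroing the area between uses buys nothing.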
