Accessing non-cacheable memory from user space

Hello,

We are trying to improve a piece of code we work on so that it allocates memory from a cacheable address space, to make sure this memory can be stored in the CPU's L1 cache (unlike some address ranges that are marked non-cacheable so they can be used, for example, for DMA transfers).

The same code also has a simulation mode, but there, of course, all the memory can be cached. So I was wondering if there is a way to call from user space into the kernel, reserve a piece of non-cacheable memory, and then use it from user space. If we can do that, we can prove that using non-cacheable memory has a performance impact.

Thanks.

Take a look at http://www.osronline.com/article.cfm?article=39; it will show you how to allocate non-cached memory your app can use.
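
In outline, the approach in that article looks something like the sketch below (hypothetical helper name, minimal error handling; this illustrates the general technique, not the article's exact code): allocate non-cached memory in the driver, then map it into the requesting process with a non-cached mapping.

#include <ntddk.h>

/* Sketch only: allocate non-cached, physically contiguous memory in the
   driver and map it into the requesting user-mode process. */
PVOID MapNonCachedBufferToUser(SIZE_T size, PMDL *mdlOut, PVOID *kernelVaOut)
{
    PHYSICAL_ADDRESS low, high, boundary;
    PVOID kernelVa;
    PVOID userVa = NULL;
    PMDL mdl;

    low.QuadPart = 0;
    high.QuadPart = -1;      /* no upper limit on the physical address */
    boundary.QuadPart = 0;

    kernelVa = MmAllocateContiguousMemorySpecifyCache(
                   size, low, high, boundary, MmNonCached);
    if (kernelVa == NULL) {
        return NULL;
    }

    mdl = IoAllocateMdl(kernelVa, (ULONG)size, FALSE, FALSE, NULL);
    if (mdl == NULL) {
        MmFreeContiguousMemorySpecifyCache(kernelVa, size, MmNonCached);
        return NULL;
    }
    MmBuildMdlForNonPagedPool(mdl);

    /* Map into the current (requesting) process, also non-cached. For
       UserMode mappings this call can raise an exception on failure. */
    __try {
        userVa = MmMapLockedPagesSpecifyCache(
                     mdl, UserMode, MmNonCached, NULL, FALSE,
                     NormalPagePriority);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        userVa = NULL;
    }

    *mdlOut = mdl;           /* caller must unmap and free on cleanup */
    *kernelVaOut = kernelVa;
    return userVa;           /* valid only in the caller's process context */
}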

Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


If this memory is going to be used only from user space, you can call VirtualAlloc with the PAGE_NOCACHE protection modifier.
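
Something like this crude sketch (a rough illustration, not a rigorous benchmark; note that PAGE_NOCACHE must be OR'ed with an access protection such as PAGE_READWRITE), which also gives you the cached-vs-uncached timing comparison you wanted:

#include <windows.h>
#include <string.h>
#include <stdio.h>

/* Time a simple fill of 'size' bytes allocated with the given protection.
   Crude: the measurement includes first-touch page faults, which hit both
   cases equally. */
static double TimeFill(DWORD protect, SIZE_T size)
{
    LARGE_INTEGER freq, t0, t1;
    void *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, protect);
    if (p == NULL) return -1.0;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    memset(p, 0xAB, size);          /* write every byte */
    QueryPerformanceCounter(&t1);

    VirtualFree(p, 0, MEM_RELEASE);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}

int main(void)
{
    SIZE_T size = 1 << 20;  /* 1 MB */

    /* PAGE_NOCACHE is a modifier; combine it with a protection constant. */
    printf("cached:     %f s\n", TimeFill(PAGE_READWRITE, size));
    printf("non-cached: %f s\n", TimeFill(PAGE_READWRITE | PAGE_NOCACHE, size));
    return 0;
}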

Thanks,
Pavel


Thanks a lot, this is helpful!

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Don Burn
Sent: Thursday, April 28, 2011 2:31 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Accessing non-cacheable memory from user space


> marked non-cacheable so they can be used, for example, for DMA transfers)

On a typical PC, cache coherency with DMA is automatically maintained in hardware.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

xxxxx@emc.com wrote:

> We are trying to improve a piece of code we work on so that it
> allocates memory from a cacheable address space, to make sure this
> memory can be stored in the CPU's L1 cache (unlike some address ranges
> that are marked non-cacheable so they can be used, for example, for
> DMA transfers).

That's not much of a concern on x86 systems. With a small number of
obscure exceptions, DMA on x86 systems is cache-aware.

What makes you think the code you have is getting non-cacheable memory?

> The same code also has a simulation mode, but there, of course, all
> the memory can be cached. So I was wondering if there is a way to call
> from user space into the kernel, reserve a piece of non-cacheable
> memory, and then use it from user space. If we can do that, we can
> prove that using non-cacheable memory has a performance impact.

I ran an experiment on this just after Vista was released. I wrote a
driver to reprogram the MTRR registers on the fly, so we could enable
and disable caching. Our particular tests were focused on write-through
caching vs. write-back caching. With write-through, writes behave as
though the cache were not present: every write goes all the way to the
memory chips.

The results were quite remarkable. With the PassMark benchmark, most of
the results were two full orders of magnitude better with write-back
caching. You’d see 150 times, 200 times, 300 times improvement.

In one experiment, I made my driver a boot driver, so that it booted
with write-back disabled. The boot time went from 50 seconds to 15 minutes.
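
The heart of that driver is just a write to the IA32_MTRR_DEF_TYPE MSR. Very roughly, it looked like the sketch below (a simplified illustration, not the actual driver; it omits the full update sequence the Intel SDM requires, and since MTRRs are per-core it would have to run on every processor):

#include <ntddk.h>
#include <intrin.h>

/* IA32_MTRR_DEF_TYPE: bits 7:0 = default memory type (0 = UC, 4 = WT,
   6 = WB), bit 11 = MTRR enable. */
#define IA32_MTRR_DEF_TYPE 0x2FF

/* Replace the default MTRR memory type on the current processor.
   NOTE: the Intel SDM requires a full sequence around MTRR updates
   (set CR0.CD, WBINVD, flush TLBs, then restore), all omitted here. */
static VOID SetDefaultMemoryType(UCHAR type)
{
    ULONG64 defType;

    _disable();                              /* no interrupts mid-update */
    defType = __readmsr(IA32_MTRR_DEF_TYPE);
    defType = (defType & ~0xFFull) | type;   /* swap the default type */
    __writemsr(IA32_MTRR_DEF_TYPE, defType);
    _enable();
}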


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> I ran an experiment on this just after Vista was released. I wrote a
> driver to reprogram the MTRR registers on the fly, so we could enable
> and disable caching. Our particular tests were focused on write-through
> caching vs. write-back caching. With write-through, writes behave as
> though the cache were not present: every write goes all the way to the
> memory chips.

And what about modifying the PTEs in a way that matches your modifications of the MTRRs?

Please note that, as a general rule, in case of a conflict between the caching type specified for a page in the PTE and the one specified for a physical range via the MTRRs, the hardware assumes the more restrictive caching type, but certain combinations may produce unpredictable results. IIRC, the Intel Developer's Manual even provides a table of the possible combinations, so you can check it for more detailed info…

> In one experiment, I made my driver a boot driver, so that it booted
> with write-back disabled. The boot time went from 50 seconds to 15
> minutes.

In fact, that may have resulted not from disabling caching per se but from the above-mentioned cache-type conflict. After all, the OS does not know about your experiments, so it specifies the caching type in the PTEs the way it always does. Therefore, it may be just the above-mentioned "unpredictable result", in the Developer's Manual's terms.

BTW, I vaguely recall a discussion where it was mentioned that, in some cases, specifying different caching types in the PTEs for the same physical page mapped at two different virtual addresses may result in a system freeze. IIRC, this behavior is not mentioned anywhere in Intel's manuals…

Anton Bassov

Let’s all jump in with a comment, shall we?

As Mr. Roberts said, DMA on x86 systems is coherent without regard to the memory's cache attribute. Thus, in terms of the current implementation, if not necessarily the architecture, the buffer returned by AllocateCommonBuffer (used for common-buffer DMA) is always allocated cached.

From the docs on the "CacheEnabled" parameter of AllocateCommonBuffer:


This parameter is ignored on computers with x86-based, x64-based, and Itanium-based processors, because the operating system assumes that DMA is always coherent with the caches of these processors. For these processors, the operating system always allocates common buffers that are cache-enabled, and a driver can use common buffers only if DMA operations for the device are coherent with the contents of the processor caches.
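
For reference, the call goes through the adapter object's operations table, roughly like this sketch (assuming a DMA_ADAPTER previously obtained via IoGetDmaAdapter):

#include <wdm.h>

/* Sketch: allocate a common buffer through the DMA adapter's
   operations table. */
PVOID AllocCommonBufferExample(PDMA_ADAPTER dmaAdapter, ULONG length,
                               PPHYSICAL_ADDRESS logicalAddress)
{
    /* CacheEnabled = TRUE; as quoted above, this parameter is ignored
       on x86/x64/Itanium, where the buffer is always cache-enabled. */
    return dmaAdapter->DmaOperations->AllocateCommonBuffer(
               dmaAdapter, length, logicalAddress, TRUE);
}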

Peter
OSR




> Let's all jump in with a comment, shall we?
>
> As Mr. Roberts said, DMA on x86 systems is coherent without regard to
> the memory's cache attribute. Thus, in terms of the current
> implementation, if not necessarily the architecture, the buffer
> returned by AllocateCommonBuffer (used for common-buffer DMA) is
> always allocated cached.
>
> Peter
> OSR

Actually, Mr. Roberts said something a little bit more nuanced and certainly
more correct. He used the phrase "with a small number of obscure
exceptions."

Those exceptions are important to note, if only to stop people from arguing
about whether or not they exist. PCI Express devices can have non-cached
virtual channels, though we at Microsoft strongly recommend that you don't
try to make use of that. In general, we've seen this capability produce
really hard-to-debug failures and little real gain.

Jake Oshins
Windows Kernel Team