Hi all,
When reading the DDK documentation, I don’t really understand the differences between MmCached, MmNonCached and MmWriteCombined.
What impact does this have on device memory mapped with MmMapIoSpace?
What impact does this have on RAM mapped with MmMapLockedPagesSpecifyCache?
Could you point me to a document that explains those differences?
Thanks a lot for your help.
Vincent
You can find out more than you want to know about I/O address spaces,
caching and write combining in the system programming manuals of AMD and
Intel CPUs.
//Daniel
This is exactly the problem with those fine manuals: they overwhelm you with
irrelevant details.
To put it very short and simple (maybe too simple; please correct me if so):
For mapping of I/O space you want non-cached memory.
(The WDK documentation says that noncached memory is scarce, allocations may
fail, so better use cached, etc. - but its explanation of how to use cached
memory is IMHO confusing and hard to follow.)
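For instance, something like this (just a sketch; the physical address and
length would come from the translated CmResourceTypeMemory resource, and the
variable names are placeholders):

PHYSICAL_ADDRESS regsPa;   /* base of the register BAR, from the translated resources */
SIZE_T           regsLen;  /* length of the register block */
PVOID            regs;

regs = MmMapIoSpace(regsPa, regsLen, MmNonCached);
if (regs == NULL) {
    return STATUS_INSUFFICIENT_RESOURCES;
}

/* ... access the device via READ_REGISTER_ULONG()/WRITE_REGISTER_ULONG() ... */

MmUnmapIoSpace(regs, regsLen);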
To allocate a shared or “common” DMA buffer (such as for bus-mastering PCI
devices) use IoGetDmaAdapter().
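Roughly along these lines (a sketch only; the PDO, interface type and
COMMON_BUFFER_SIZE are assumptions for illustration):

DEVICE_DESCRIPTION  dd;
ULONG               nMapRegs;
PDMA_ADAPTER        dmaAdapter;
PHYSICAL_ADDRESS    commonPa;
PVOID               commonVa;

RtlZeroMemory(&dd, sizeof(dd));
dd.Version           = DEVICE_DESCRIPTION_VERSION;
dd.Master            = TRUE;            /* bus-mastering device */
dd.ScatterGather     = TRUE;
dd.Dma32BitAddresses = TRUE;
dd.InterfaceType     = PCIBus;
dd.MaximumLength     = COMMON_BUFFER_SIZE;

dmaAdapter = IoGetDmaAdapter(PhysicalDeviceObject, &dd, &nMapRegs);
if (dmaAdapter == NULL) {
    return STATUS_INSUFFICIENT_RESOURCES;
}

commonVa = dmaAdapter->DmaOperations->AllocateCommonBuffer(
               dmaAdapter, COMMON_BUFFER_SIZE, &commonPa, TRUE /* CacheEnabled */);
if (commonVa == NULL) {
    return STATUS_INSUFFICIENT_RESOURCES;
}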
MmWriteCombined: Ignore it unless you’re dealing with a video adapter.
All usermode memory (buffers passed to you in I/O requests) and kernel pools
are MmCached.
And finally, there is an important rule: each physical block of memory can
have several virtual address space mappings - but all of them must have the
same Mm caching type, otherwise bad things may happen.
Be careful to avoid this (e.g. don't attempt to remap a usermode buffer as
MmNonCached, etc.).
The latest OSes won't let you do this, but older ones (Win2K) will.
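For example, when mapping a locked-down user buffer (described by an MDL) into
system space, ask for MmCached, which matches the caching type of the existing
user-mode mapping (sketch only; Mdl is assumed to describe a buffer that has
already been probed and locked):

PVOID sysVa;

sysVa = MmMapLockedPagesSpecifyCache(Mdl,
                                     KernelMode,
                                     MmCached,            /* match the existing user mapping */
                                     NULL,                /* let Mm choose the address */
                                     FALSE,               /* don't bugcheck on failure */
                                     NormalPagePriority);
if (sysVa == NULL) {
    return STATUS_INSUFFICIENT_RESOURCES;
}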
Regards,
–PA
First of all, thank you for your replies.
In fact, I have the address of a physical memory page (RAM) into which a
device can perform DMA transfers:
1. I map this page with the MmCached parameter.
2. I memset 0xFF into the whole page.
3. I perform a DMA transfer from a device memory location that contains all 0xAB.
The result is that the whole RAM page contains 0xAB and not 0xFF anymore.
I thought some parts of the page would still contain 0xFF, because the CPU was
not aware of the DMA and wouldn't have updated its cache accordingly.
Am I wrong?
Regards.
Vincent
The CPU cache is aware of DMA. And it updates a little faster than you type

–PA
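In other words, the experiment looks roughly like this in code (a sketch; the
DMA kick-off and wait helpers are made-up placeholders, and the page is
assumed to be mapped with MmMapIoSpace since only its physical address is
known). On x86/x64 the device's writes are snooped, so the final read sees
0xAB rather than stale 0xFF left in the cache:

PVOID va;

/* 1. Map the RAM page with a cached mapping. */
va = MmMapIoSpace(DmaPagePa, PAGE_SIZE, MmCached);

/* 2. Fill it with 0xFF through the CPU (the data may sit in the cache). */
RtlFillMemory(va, PAGE_SIZE, 0xFF);

/* 3. Have the device DMA a page of 0xAB into DmaPagePa.
      StartDeviceToHostDma()/WaitForDmaCompletion() are hypothetical helpers. */
StartDeviceToHostDma(DmaPagePa, PAGE_SIZE);
WaitForDmaCompletion();

/* The DMA writes are snooped by the CPU, so stale cache lines are
   invalidated/updated and this reads 0xAB everywhere. */
ASSERT(((PUCHAR)va)[0] == 0xAB);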
I don't remember how parallel PCI does it, but in the case of PCIe, DMA is snooped by default unless the No Snoop bit in the TLP is set. One would feel very sorry for setting that bit without taking other facts into account :) Don't try it at home.
–
Calvin Guan
NetXtreme II 10Gbps Converged NIC
Broadcom Corporation
Connecting Everything(r)
> For mapping of i/o space you want non-cached memory.
> (WDK documentation says that noncached memory is scarce, allocation may
> fail, so better use cached, etc. - but their explanation how to use cached
> memory is IMHO confusing and hard to follow).
Which documentation says that? Caching attributes are just bits
in the PTE so they shouldn’t have any effect on whether allocations
succeed or fail.
–
This posting is provided “AS IS” with no warranties, and confers no
rights.
> Which documentation says that? Caching attributes are just bits in the PTE
> so they shouldn't have any effect on whether allocations succeed or fail.
Actually, there are 2 types of caching policy control:
- Control of caching for virtual addresses. This is done via PTEs.
- Control of caching for physical memory ranges. This is done via MTRRs.
IIRC, in case of a conflict between the former and the latter for a given page (as well as between overlapping variable-size ranges), the results are not always defined, but, as a general rule, UC takes precedence, no matter whether it is specified in the PTE or in the MTRR. Therefore, apparently, the system tries to avoid conflicts between caching types whenever possible, and prefers not to mess around with MTRRs unless it has a good reason to do so. This is how I understood the MSDN quotation that Pavel had provided…
Anton Bassov
“Pavel Lebedinsky” wrote in message news:xxxxx@ntdev…
> Which documentation says that? Caching attributes are just bits
> in the PTE so they shouldn’t have any effect on whether allocations
> succeed or fail.
Perhaps I shouldn’t have mentioned this. The OP asked for a short clear
explanation…
The following is a quote from the NdisMAllocateSharedMemory documentation (WDK
doc build of August 06, 2008):
"Whenever possible, a miniport driver calls NdisMAllocateSharedMemory with
Cached set to TRUE because its request is more likely to succeed.
In any platform, noncached memory is always a scarce system resource.
Usually, drivers can get larger allocations from cached memory as well.
A miniport driver must allocate its shared memory space from noncached
memory if either of the following is true:
* The NIC or miniport driver writes directly into receive buffers before
the miniport driver indicates the newly received data.
For example, a NIC that sets flags in each received frame after it has been
transferred must have access to receive buffers in noncached memory.
Otherwise, the miniport driver could not determine when it should issue a
flush to maintain cache coherency: either the miniport driver would take a
performance hit by waiting for a fail-safe interval to flush the cached
receive buffer, or the miniport driver would make indications in which the
frame flags were randomly set.
* The NIC transfers some number of received frames sequentially into
contiguous physical memory within the shared memory space.
If such a NIC transferred incoming frames into contiguous cached memory, its
driver cannot maintain data integrity for all such frames when any frame
might straddle a cache-line boundary. When the miniport driver flushed the
range for such a frame, it also might flush the cache space containing some
of the next frame if it was already transferred, thereby making that next
frame incoherent.
A miniport driver should align the buffers it allocates from shared cached
memory on an integral of the host data-cache-line boundary to prevent
cache-line tearing during DMA. Cache-line tearing can cause data-integrity
problems in the driver or degrade the driver’s (and the system’s) I/O
performance by requiring excessive data-cache flushing to maintain data
integrity. "
Once I showed this passage to our project manager; he read it back and forth
for several minutes, then said: "we'll use non-cached memory, period."
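In code, the whole debate in that passage boils down to a single flag (a
sketch only; the miniport handle and RX_BLOCK_SIZE are assumed names):

PVOID                 rxVa;
NDIS_PHYSICAL_ADDRESS rxPa;

NdisMAllocateSharedMemory(MiniportAdapterHandle,
                          RX_BLOCK_SIZE,
                          TRUE,      /* Cached - the case the WDK recommends when possible */
                          &rxVa,
                          &rxPa);
if (rxVa == NULL) {
    return NDIS_STATUS_RESOURCES;
}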
And this is from the description of AllocateCommonBuffer:
“CacheEnabled - Specifies whether the allocated memory can be cached.
This parameter is ignored on computers with x86-based, x64-based, and
Itanium-based processors, because the operating system assumes that DMA is
always coherent with the caches of these processors.
For these processors, the operating system always allocates common buffers
that are cache-enabled, and a driver can use common buffers only if DMA
operations for the device are coherent with the contents of the processor
caches.”
Regards,
–PA
> This parameter is ignored on computers with x86-based, x64-based, and Itanium-based processors
Well, the modern NT kernel does not seem to support any architectures apart from the ones mentioned above, does it? I just wonder whether they are trying to confuse readers on purpose…
Anton Bassov
The quoted text appears to be total bullshit for the most part, probably an
accumulation of obsolete information that has not been revisited for
accuracy in quite a while. Memory/cache coherency is guaranteed by hardware
for x86, x64, ia64 platforms and has been for a long time. The NIC cannot
write a bit into system memory that will be unobserved by a processor
reading that memory at a later time.
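(A side note of mine, not Mark's: the portable way to express "flush only if
the platform needs it" is KeFlushIoBuffers, which is effectively a no-op on the
cache-coherent architectures he lists.)

/* Mdl describes the DMA buffer. On x86/x64/IA64 this expands to nothing,
   because DMA is coherent with the processor caches; on non-coherent
   platforms it performs the required cache flush. */
KeFlushIoBuffers(Mdl, TRUE /* ReadOperation: device -> memory */, TRUE /* DmaOperation */);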
–
Mark Roddy