MmNonCached or MmCached

Hi guys

I’m a little confused about the definition of MEMORY_CACHING_TYPE. Could you
tell me the difference between MmNonCached and MmCached?

Thanks
Fan Zhang

Do you understand how a CPU’s memory cache works? Platforms have on-chip
memory that caches data items for faster retrieval by the processor.
Unfortunately, data in the cache is not visible to DMA devices that
access memory until the data is flushed from the cache. Likewise, if
the device updates memory, the cache may not reflect the change.
MmNonCached indicates that memory accesses should not go through the
cache. MmCached indicates that the data should be cacheable, as with
normal memory allocations such as ExAllocatePool.
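
The incoherence described above can be illustrated with a toy model. This is only a sketch in portable C, not driver code: `ToySystem`, `cpu_write`, `dma_read` and the other names are hypothetical, and the model assumes a write-back cache that the device never snoops. The `cached` flag loosely stands in for the MmCached / MmNonCached choice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 16

/* Toy model (hypothetical): one write-back cache in front of RAM. */
typedef struct {
    uint8_t ram[MEM_SIZE];    /* what a DMA device sees */
    uint8_t cache[MEM_SIZE];  /* what the CPU sees through the cache */
    uint8_t valid[MEM_SIZE];  /* 1 = a cached copy exists */
    uint8_t dirty[MEM_SIZE];  /* 1 = cached copy is newer than RAM */
} ToySystem;

void toy_init(ToySystem *s) { memset(s, 0, sizeof *s); }

/* CPU store: cached != 0 models MmCached, cached == 0 models MmNonCached. */
void cpu_write(ToySystem *s, unsigned addr, uint8_t value, int cached)
{
    assert(addr < MEM_SIZE);
    if (cached) {
        s->cache[addr] = value;
        s->valid[addr] = 1;
        s->dirty[addr] = 1;      /* RAM is now stale */
    } else {
        s->ram[addr] = value;    /* goes straight to RAM */
        s->valid[addr] = 0;      /* drop any cached copy */
        s->dirty[addr] = 0;
    }
}

/* CPU load: a cached load fills the cache on a miss and then hits forever. */
uint8_t cpu_read(ToySystem *s, unsigned addr, int cached)
{
    assert(addr < MEM_SIZE);
    if (!cached)
        return s->ram[addr];
    if (!s->valid[addr]) {
        s->cache[addr] = s->ram[addr];
        s->valid[addr] = 1;
    }
    return s->cache[addr];
}

/* Device accesses bypass the cache entirely (no snooping in this model). */
uint8_t dma_read(const ToySystem *s, unsigned addr)
{
    assert(addr < MEM_SIZE);
    return s->ram[addr];
}

void dma_write(ToySystem *s, unsigned addr, uint8_t value)
{
    assert(addr < MEM_SIZE);
    s->ram[addr] = value;
}

/* Write dirty lines back to RAM, then invalidate everything. */
void toy_flush(ToySystem *s)
{
    for (unsigned i = 0; i < MEM_SIZE; i++) {
        if (s->valid[i] && s->dirty[i])
            s->ram[i] = s->cache[i];
        s->valid[i] = 0;
        s->dirty[i] = 0;
    }
}
```

In this model a cached `cpu_write` stays invisible to `dma_read` until `toy_flush` runs, while a non-cached write is visible immediately. Real hardware is more involved, and some platforms keep caches and DMA coherent on their own.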

Don Burn (MVP, Windows DKD)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

> I’m a little confused about the definition of MEMORY_CACHING_TYPE. Could you tell me the difference
> between MmNonCached and MmCached?

What I would highly recommend is to look up the paper “What Every Programmer Should Know About Memory”, published by Red Hat. It gives you a really good introduction to memory concepts.

Anton Bassov

Hi Don,

Thanks for your explanation. Is there any method that can force the
cache contents out to physical memory? I found a function named
KeFlushIoBuffers. However, it is empty. Why?

Fan Zhang

There is also a similar document about memory management published by MSFT. That, together with the one you mentioned, would be a good starting point for understanding the concept!

-pro

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Marvin(Fan) Zhang wrote:

> Thanks for your explanation. Is there any method that can force the
> cache into physical memory?

That’s not required in a PC system. The x86 chipsets maintain cache
coherency.

> I found there is a function named KeFlushIoBuffers. However, it is
> empty, why?

It’s empty for x86 and x64 builds, because the chipsets do this
automatically. It is not empty for Itanium builds, because the chipsets
don’t handle it.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> That, together with the one you mentioned, would be a good starting point for understanding the concept!

And this is exactly what the OP needs. Judging from the way he asked his question, he does not know much (if anything at all) about the topic. Therefore, he needs a basic introduction to memory concepts.

What I really like about this Red Hat doc is the way it explains things. It seems to be a much more pleasant read compared, say, to the Mindshare books, let alone the Intel manuals, don’t you think?

Anton Bassov

Absolutely. I would read the Red Hat doc first, then the MSFT doc. To me, the
two together cover the fundamentals. After that, the Intel manuals make sense.
I don’t like the Mindshare books that much; they just restate the spec in words
without explaining the why’s…

-pro

Thanks, guys. I learned a lot from this discussion with you.

Actually, the rule is that you must call it. Should you be running on a
chipset that handles cache coherency intrinsically, the HAL will have a
function with an empty body. If, at some point in the future, you have a
different chipset, its HAL will do the right thing. Key here is that you
have to obey the protocols; looking inside the functions to see what they
do, and determining that you don’t need to call one because one
implementation on one chipset on one machine appears to have an empty body
is not the way to construct reliable and robust drivers.
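
The point about obeying the protocol can be sketched in portable C. Everything here is hypothetical: `PlatformOps`, `start_dma` and the two flush routines are illustrative stand-ins, not the Windows HAL or the real KeFlushIoBuffers, and the 64-byte line size is an assumption. The caller always makes the flush call; whether the body does any work is the platform’s business.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical platform hook; NOT the real Windows HAL interface. */
typedef struct {
    const char *name;
    void (*flush_io_buffers)(void *buffer, size_t length);
} PlatformOps;

/* Counter kept only so the demonstration has something to observe. */
static unsigned g_lines_flushed = 0;

/* Coherent platform: hardware does the work, so the body is empty. */
static void flush_coherent(void *buffer, size_t length)
{
    (void)buffer;
    (void)length;
}

/* Non-coherent platform: stands in for real cache maintenance. */
static void flush_noncoherent(void *buffer, size_t length)
{
    (void)buffer;
    g_lines_flushed += (unsigned)((length + 63) / 64); /* assumed 64-byte lines */
}

const PlatformOps coherent_platform    = { "coherent",    flush_coherent };
const PlatformOps noncoherent_platform = { "noncoherent", flush_noncoherent };

/* Driver-side code: follows the contract without peeking at the body. */
void start_dma(const PlatformOps *ops, void *buffer, size_t length)
{
    assert(ops != NULL && ops->flush_io_buffers != NULL);
    ops->flush_io_buffers(buffer, length);
    /* ... an imaginary device would be programmed here ... */
}

unsigned lines_flushed(void) { return g_lines_flushed; }
```

The same `start_dma` source runs unchanged on both platforms, which is the property that skipping the call on the “empty” platform would throw away.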

We see this same failure a lot in application-level programming, where
people substitute types like DWORD for WPARAM, because they once saw, a long
time ago, that WPARAM and DWORD had the same definition, on one platform,
and therefore these types are obviously going to be the same forever for all
platforms into the foreseeable future, so they don’t need to follow the
actual interface specifications but can bypass all that unnecessary overhead
by using the definition they are comfortable with. These are the people who
are now going nuts (or driving people like me nuts) converting to 64-bit
platforms.
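
A small sketch of why that substitution bites on 64-bit builds. The typedefs below are stand-ins written with `<stdint.h>` types, not the Windows headers: `toy_dword` models a type that is always 32 bits, and `toy_wparam` models a parameter type that is pointer-sized on a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_dword;   /* stand-in for DWORD: always 32 bits */
typedef uint64_t toy_wparam;  /* stand-in for WPARAM on a 64-bit build */

/* Wrong: parks the parameter in the type that "used to be the same". */
toy_wparam pass_through_dword(toy_wparam wparam)
{
    assert(sizeof(toy_dword) < sizeof(toy_wparam));
    toy_dword narrowed = (toy_dword)wparam;  /* upper 32 bits are dropped */
    return (toy_wparam)narrowed;
}

/* Right: keeps the type the interface actually specifies. */
toy_wparam pass_through_wparam(toy_wparam wparam)
{
    toy_wparam kept = wparam;
    return kept;
}
```

Small values survive both paths, which is why the bug stays hidden until a value with bits above the low 32 shows up.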

By the way, if you are concerned about “efficiency”, remember that call/ret
execute in times measured in PICOseconds, so you save at most two
instruction cycles (that is, about 700ps on a 2.8GHz machine, probably a lot
less, because the top of stack is fairly likely to be in the L1 cache if not
in the stack simulation registers). So there is no rationale for NOT
calling KeFlushIoBuffers in a driver, unless, of course, you know that no
one will ever want to run your driver on an architecture that does not have
intrinsic cache coherency management in the chipset. This involves more
dependency on crystal balls than most of us are willing to accept.
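
The 700ps figure can be checked with simple arithmetic. The helper below is a hypothetical illustration; it takes the “two cycles” estimate from the paragraph above at face value and only converts cycles to time.

```c
#include <assert.h>

/* Time taken by `cycles` clock cycles at `clock_ghz` GHz, in picoseconds.
   One cycle at 1 GHz lasts 1 ns, which is 1000 ps. */
double cycles_to_picoseconds(double cycles, double clock_ghz)
{
    assert(clock_ghz > 0.0);
    return cycles * 1000.0 / clock_ghz;
}
```

For two cycles at 2.8GHz this gives 2000 / 2.8, a little over 714ps, which matches the rough figure quoted.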

What surprises me is that you think that a function with an empty body is
unusual. You really need to separate out the concept of “abstraction” from
“one particular implementation of that abstraction”.
joe

> It’s empty for x86 and x64 builds, because the chipsets do this
> automatically.

I stumbled on the “No Snoop” attribute bit in the PCIe TLP header while debugging a “once every day or two” issue, back when PCIe was still called “3GIO”. Yeah, everything broke if any DMA agent was not snooping. It’s perhaps the best lesson I have learned on this topic.

Calvin

DMA is, IMO, poorly documented and poorly understood. In a different world, UNIX, a couple of months ago I was trying to understand the underlying DMA infrastructure in the kernel so that I could correlate it with Windows…

The problem was that we were trying to slice a PCI function into virtual functions and then work with those. Continuous DMA memory was allocated (as opposed to packet DMA), and theories abounded; the expectation was that DMA was not programmed properly. I was haunted by it for a while, then found exactly what Joe describes here: at allocation time, use the APIs to say that you want coherent DMA. Inside, the API does the magic if it is needed and otherwise stays silent.

So following the contract is good :)

-pro

2010/11/25 Joseph M. Newcomer

> By the way, if you are concerned about “efficiency”, remember that call/ret
> execute in times measured in PICOseconds, so you save at most two
> instruction cycles

Sorry, I don’t understand the exact meaning here. Why do you say call/ret?
Do you mean that on other chipsets KeFlushIoBuffers is a real function, so it
uses call/ret, and that this takes at most two instruction cycles?