>How do you tell the HAL to map the memory without caching? I would like
all accesses to the memory regions to go directly to the board, without
any read/write caching. If caching were to take place, then we would be
required to explicitly flush the cache… In reading the Intel
documentation, we have found that the CPUID instruction can serialize the
CPU and also possibly flush the cache. The CPUID instruction is available
from user mode land. I have used the CPUID for identification, but have
never used it to serialize the CPU. Does anyone have any feedback about
using CPUID from user mode?
- How to map memory uncached?
- Advice on using CPUID for serialization and cache flush…
When you map the device memory in the driver, one of the parameters to the
mapping API controls whether the mapping is cached or uncached. Note that
caching and instruction stream serialization are different things. Also, to
get PCI bursts larger than 1 DWORD to a target PCI device, you may need the
mapping to be cached (especially for PCI target reads).
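To make the CPUID part of the question concrete, here is a minimal user mode sketch. It assumes GCC or Clang on an x86 target, where <cpuid.h> provides __get_cpuid(); the function name serialize_with_cpuid is made up for illustration. As far as the Intel manuals describe it, CPUID serializes instruction execution but does not write back or invalidate the caches, so treat this as a serialization point only, not a cache flush.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/*
 * Hypothetical helper: executes CPUID leaf 0 from user mode.
 * CPUID is a serializing instruction, so earlier instructions complete
 * before later ones are fetched. It does NOT flush the cache.
 * Returns 1 if CPUID was executed (and stores the highest basic leaf in
 * *max_leaf), or 0 if CPUID is not available on this build target.
 */
static int serialize_with_cpuid(unsigned int *max_leaf)
{
#if HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    *max_leaf = eax;
    return 1;
#else
    *max_leaf = 0;
    return 0;
#endif
}
```

CPUID is comparatively slow, so you would not want it in a hot loop.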
Instruction stream synchronization may be needed if, for example, accesses
to system memory shared with a bus master device must be ordered with respect
to your uncached bus accesses. If you don’t synchronize the
instruction stream, you could potentially write to device memory, and then
read from memory that’s written by bus mastering. As the processor may
execute instructions out of order, you may actually be reading the shared
memory BEFORE you execute the device memory write instruction. The Intel
processor manual has a large section on exactly how reads/writes are
ordered to different types of addresses. You also may actually want memory
to be write buffered/combined to get better PCI target burst performance.
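The write-then-read hazard described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the portable C11 atomic_thread_fence() as a full memory barrier rather than a serializing instruction, and the function and parameter names are hypothetical. In a real driver device_reg would point into the uncached device mapping and shared_status into memory written by the bus master, and you would more likely use whatever barrier the kernel provides. Whether a fence alone is enough depends on the memory types involved, so check the ordering rules in the Intel manual.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical example: write a command to a device register, then read
 * a status word that the device updates by bus mastering.
 * The fence keeps the read from being performed ahead of the write.
 */
static uint32_t read_status_after_doorbell(volatile uint32_t *device_reg,
                                           volatile const uint32_t *shared_status,
                                           uint32_t command)
{
    /* Write to device memory first. */
    *device_reg = command;

    /* Full barrier: the store above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Only now read the memory shared with the bus master. */
    return *shared_status;
}
```

The volatile qualifiers stop the compiler from dropping or merging the accesses; the fence is what constrains the ordering between them.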
Actually FLUSHING the memory cache can be VERY expensive, something like
40,000+ clocks as I remember, and processor performance also suffers while
the cache reloads. Some versions of Windows also don’t control the MTRR
registers very well, so getting the exact caching/write combining
behavior desired can be tricky.
Also note that bugs caused by instruction stream synchronization and
caching issues can be very tricky to debug.
Also note that depending on your hardware design, mapping a device directly
into user mode space has the potential for seriously disrupting the whole
system. Specifically, if your PCI device stalls an access by issuing PCI
bus transaction retries for a long time, the processor can
be FROZEN on the read/write until it completes. Interrupts will NOT be
serviced while the processor waits for the access to happen. For PCI 2.1
and later devices, bus master activity should interleave with the retry
attempts. Some video cards have produced interrupt latencies of more than 10
milliseconds because of this, causing other devices to malfunction. Kernel
mode code can cause this latency disruption too, but kernel mode
developers are supposed to know better.