This is a very interesting topic, and I have a related question.
I have a PCI device without DMA. I improved its write speed by 15% by
changing the MmMapIoSpace cache type from MmNonCached to
MmFrameBufferCached. That’s good, but I still only get 26 MB/s!
Would it make a difference to change the driver from an IOCTL using
METHOD_NEITHER to WriteFile with DO_DIRECT_IO?
What is the best way to get long PCI bursts without DMA?
By the way, I’m still using NT4.
/Martin Green
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Mats PETERSSON
Sent: Monday, November 15, 2004 5:05 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Access speed of memory allocated with
‘MmAllocateNonCachedMemory’
The ratio in access speed between non-cached and cached memory will be
something on the order of 2x to 100x, depending on many factors, such as
the speed of the processor, cache sizes, write-combine facilities used,
etc. This is for the obvious reason: non-cached memory isn’t cached
internally in the processor, so the processor is forced to access this
memory externally. It is also likely not to do burst accesses to
non-cached memory, but that’s not guaranteed; it depends on the chipset
(memory controller) involved. However, it is guaranteed that if you do a
write to memory followed by a read, the write finishes before the memory
is allowed to be read.
If the chipset supports write-combining, and the memory type is set to
write-combine rather than “No caching”, then the processor/chipset is
allowed to combine several writes, out of order, so that they appear as
one or more larger writes. Imaginary example (four dword stores):
    mov dword ptr [2000h], eax
    mov dword ptr [2008h], ebx
    mov dword ptr [2004h], ecx
    mov dword ptr [200ch], edx
This is not guaranteed to come out in the order of the writes, but could
well come out in the more sequential order of:
2000+2004 as one write
2008+200C as one write
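To make the reordering above concrete, here is a small user-mode toy
model in C. It is only a sketch: real write-combining buffers are
implemented in hardware and behave differently in detail. The names
Store and combine_writes, and the max_burst limit, are my own
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One pending store: target address and size in bytes. */
typedef struct {
    uint32_t addr;
    uint32_t len;
} Store;

/* qsort comparator: ascending by address. */
static int by_addr(const void *a, const void *b)
{
    const Store *x = a;
    const Store *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Toy model of write combining (hypothetical, not a hardware model):
 * sort the pending stores by address, then merge each store into the
 * previous one when it starts exactly where the previous one ends and
 * the merged size stays within max_burst bytes.
 * The merged writes replace the start of stores[]; the return value is
 * the number of writes that would go out on the bus.
 */
static size_t combine_writes(Store *stores, size_t n, uint32_t max_burst)
{
    assert(max_burst > 0);
    if (n == 0)
        return 0;

    qsort(stores, n, sizeof stores[0], by_addr);

    size_t out = 0; /* index of the write currently being built */
    for (size_t i = 1; i < n; i++) {
        int adjacent = stores[i].addr == stores[out].addr + stores[out].len;
        if (adjacent && stores[out].len + stores[i].len <= max_burst)
            stores[out].len += stores[i].len;  /* extend current write */
        else
            stores[++out] = stores[i];         /* start a new write */
    }
    return out + 1;
}
```

Feeding it the four 4-byte stores from the example (2000, 2008, 2004,
200C) with max_burst set to 8 gives two 8-byte writes, at 2000 and at
2008. With max_burst set to 16, all four collapse into a single write.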
However, for completely non-cacheable memory, the writes HAVE TO be
issued in the order in which the processor completes the instructions
(and completion of instructions must be strongly ordered with respect to
memory accesses).
The memory access pattern is configured through a series of Model
Specific Registers called MTRRs (Memory Type Range Registers). There are
several of these MTRRs. Memory can be configured as “No caching”,
“Write Combine” or “Cacheable”.
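As an illustration of what these registers hold, here is a minimal C
sketch that decodes the type field of an MTRR and the default-type
register (IA32_MTRR_DEF_TYPE on Intel processors). Reading the MSR
itself needs ring 0, so the sketch takes the raw 64-bit value as a
parameter. The names mtrr_type_name, MtrrDefType and
decode_mtrr_def_type are hypothetical, chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory type encodings used in x86 MTRR type fields. */
enum {
    MTRR_TYPE_UC = 0, /* uncacheable ("No caching") */
    MTRR_TYPE_WC = 1, /* write combining */
    MTRR_TYPE_WT = 4, /* write through */
    MTRR_TYPE_WP = 5, /* write protected */
    MTRR_TYPE_WB = 6  /* write back */
};

/* Map a type encoding to its abbreviation; other values are reserved. */
static const char *mtrr_type_name(uint8_t type)
{
    switch (type) {
    case MTRR_TYPE_UC: return "UC";
    case MTRR_TYPE_WC: return "WC";
    case MTRR_TYPE_WT: return "WT";
    case MTRR_TYPE_WP: return "WP";
    case MTRR_TYPE_WB: return "WB";
    default:           return "reserved";
    }
}

/* Decoded view of the default-type register (hypothetical struct). */
typedef struct {
    bool    mtrrs_enabled;  /* bit 11: MTRRs enabled */
    bool    fixed_enabled;  /* bit 10: fixed-range MTRRs enabled */
    uint8_t default_type;   /* low byte: type for unmapped ranges */
} MtrrDefType;

/* Split a raw default-type MSR value into its fields. */
static MtrrDefType decode_mtrr_def_type(uint64_t msr)
{
    MtrrDefType d;
    d.default_type  = (uint8_t)(msr & 0xFF);
    d.fixed_enabled = ((msr >> 10) & 1) != 0;
    d.mtrrs_enabled = ((msr >> 11) & 1) != 0;
    return d;
}
```

For example, a raw value of 0xC06 decodes as MTRRs enabled, fixed-range
MTRRs enabled, and a default type of 6, which mtrr_type_name reports as
WB.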
Cacheable is of course the complete opposite of non-cacheable, the
processor is allowed to write in ANY order it likes, and writes may
well happen long after a read for some other region, and the processor
may even read memory on a speculative basis (i.e. you read address 2000,
and the processor decides that “I haven’t got any better idea of what to
do, so I’ll load up 2010 too”).
The comprehensive list of memory types is:
NC - Non-cacheable. No caching is allowed for this memory. No
     speculative reads.
CD - Cache disable. This mode prevents data from being loaded into the
     cache; data already in the cache is still available for code
     caching, but not for data caching.
WC - Write combine. Allows minimal re-ordering of writes so that the
     number of writes is reduced. Speculative reads allowed.
WP - Write protected. This means that the cache is write protected, not
     that the memory is write protected. So cache lines are allocated
     on reads, but writes go directly to memory and invalidate the
     cache line. Speculative reads allowed.
WT - Write through. Writes go to memory at the same time as the cache is
     updated. Speculative reads allowed.
WB - Write back. Writes go to memory only when modified data in the
     cache has to be evicted. Of course, speculative reads are allowed
     here too.
If you call ExAllocatePoolWithTag, you will get WB memory.
MmAllocateNonCachedMemory should, as far as I understand, return “NC”
memory. You can also allocate “WC” memory by calling
MmAllocateContiguousMemorySpecifyCache with the MmWriteCombined cache
type.
–
Mats
xxxxx@lists.osr.com wrote on 11/15/2004 08:10:57 AM:
Hello all,
I have not made exact measurements yet, but at first sight it seems
that access to memory allocated with ‘MmAllocateNonCachedMemory()’ is
about 4 times slower than access to memory allocated with
“ExAllocatePoolWithTag” (paged or non-paged doesn’t matter). Does
anyone have an idea about the cause of this?
Christiaan
Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256
You are currently subscribed to ntdev as: unknown lmsubst tag
argument:
‘’
To unsubscribe send a blank email to xxxxx@lists.osr.com