FW: Cache line size on Pentium 4

From: Campbell, Randolph L
Sent: Friday, August 18, 2000 8:23 AM
To: Nemiroff, Daniel
Subject: RE: [ntdev] Cache line size on Pentium 4

It is more complicated than this. The Foster/Willamette can have up to three
levels of cache. The first level cache has a 128 byte line width, the
second-level cache has a 64 byte line width, and the third-level cache has a
128 byte line width. If you are going to use the cache line size (for
performance reasons), you should use CPUID to find out the number of cache
levels and their line sizes. As far as acting badly goes, the DMA will still
function correctly and the performance will probably be equal or better (due
to changes in P4P FSB (quad pumped) and chipset (better RDRAM performance)).
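
To illustrate the CPUID suggestion, here is a minimal sketch of decoding raw CPUID register values into a line size. It assumes a processor that implements the leaf 4 "deterministic cache parameters" interface, which may not be available on every part discussed in this thread (older parts report caches through leaf 2 descriptor bytes instead). The struct and function names are hypothetical, not DDK names, and the code only decodes values; actually executing CPUID is left to the compiler intrinsic of your choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one decoded cache level; not a DDK type. */
struct cache_info {
    unsigned type;       /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;      /* 1, 2, 3, ... */
    unsigned line_size;  /* line size in bytes */
    unsigned long size;  /* total cache size in bytes */
};

/* Decode the EAX/EBX/ECX values returned by CPUID leaf 4 for one subleaf.
 * Each count field is stored as "value minus one". */
static void decode_cpuid_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx,
                               struct cache_info *out)
{
    assert(out != 0);

    unsigned ways       = ((ebx >> 22) & 0x3FF) + 1;
    unsigned partitions = ((ebx >> 12) & 0x3FF) + 1;
    unsigned sets       = ecx + 1;

    out->type      = eax & 0x1F;
    out->level     = (eax >> 5) & 0x7;
    out->line_size = (ebx & 0xFFF) + 1;
    out->size      = (unsigned long)ways * partitions * out->line_size * sets;
}

/* CPUID leaf 1, EBX bits 15:8 hold the CLFLUSH line size in 8-byte units. */
static unsigned clflush_line_size(uint32_t leaf1_ebx)
{
    return ((leaf1_ebx >> 8) & 0xFF) * 8;
}
```

A caller would run CPUID leaf 4 with ECX = 0, 1, 2, ... and stop at the first subleaf whose decoded type is 0, collecting one line size per cache level along the way.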

-Randy

-----Original Message-----
From: Nemiroff, Daniel
Sent: Friday, August 18, 2000 7:31 AM
To: Campbell, Randolph L
Subject: FW: [ntdev] Cache line size on Pentium 4

Is the following true/a concern?

daniel

-----Original Message-----
From: Jan Bottorff [mailto:xxxxx@pmatrix.com]
Sent: Thursday, August 17, 2000 10:20 PM
To: NT Developers Interest List
Subject: [ntdev] Cache line size on Pentium 4

I happened to be browsing the Pentium 4 docs and noticed they said the cache
line size is now 128 bytes, not the 32 bytes we have all known all these years.

It seems the DDK function that returns the cache line size on an x86
processor was hard coded. A quick check of an older DDK version I have handy
shows “#define KeGetDcacheFillSize() 1L”. I seem to remember noticing the
bogus value once in the past, and being forced to use a better value, like
32 bytes, to program some hardware. Why isn’t this value pulled from the PCR,
like KeGetDcacheFillSize for non-x86 processors?
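
As a rough sketch of the workaround described above, a driver could sanity-check whatever fill size it is handed and substitute a better value when the reported one is implausible. Everything here is an assumption for illustration: the helper names are hypothetical, the 32-byte fallback is simply the value mentioned in this message, and the plausibility bounds (a power of two between 4 and 4096) are arbitrary.

```c
#include <stddef.h>

/* Assumption: fall back to the 32-byte value mentioned in this thread. */
#define FALLBACK_LINE_SIZE 32u

/* Return the reported fill size if it looks plausible, else the fallback.
 * A reported value of 1 (as in the old DDK macro) is rejected. */
static unsigned effective_line_size(unsigned long reported)
{
    int power_of_two = reported != 0 && (reported & (reported - 1)) == 0;

    if (!power_of_two || reported < 4 || reported > 4096)
        return FALLBACK_LINE_SIZE;
    return (unsigned)reported;
}

/* Round a byte count up to a whole number of cache lines.
 * 'line' must be a power of two, which effective_line_size() guarantees. */
static size_t round_up_to_line(size_t bytes, unsigned line)
{
    return (bytes + line - 1) & ~((size_t)line - 1);
}
```

With these helpers, a reported value of 1 becomes 32, and a 100-byte buffer is padded to 128 bytes for a 32-byte line.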

This suggests DMA devices may act badly on systems with Pentium 4
processors. PCI devices that tried really hard to do full cache line bursts
will now often be doing partial cache line bursts, with potential
performance implications.

I’m assuming the Intel docs are not a misprint? I’ll have to put it on my
to-ask list for next week’s Intel developers conference.

- Jan

> It is more complicated than this. The Foster/Willamette can have up to three
> levels of cache. The first level cache has a 128 byte line width, the
> second-level cache has a 64 byte line width, and the third-level cache has a
> 128 byte line width. If you are going to use the cache line size (for
> performance reasons), you should use CPUID to find out the number of cache
> levels and their line sizes. As far as acting badly goes, the DMA will still
> function correctly and the performance will probably be equal or better (due
> to changes in P4P FSB (quad pumped) and chipset (better RDRAM performance)).

Hmmm. That’s not exactly proper NT driver style. Some of the more religious
types here would seriously frown on inserting a very processor-specific CPUID
instruction into their driver.

The issue of PCI busmaster DMA sounds like it WOULD be a problem. If you
programmed the device thinking the cache line size was 32 bytes (like it is
now), it will at times issue a PCI WRITE INVALIDATE command instead of a
plain WRITE. With that command the PCI device is claiming it will write the
whole cache line, so the processor can just discard the line from its cache
instead of writing it back. If the PCI device thinks a cache line is 32 bytes,
it may terminate the burst long before the cache line is really filled.
This may cause very subtle bugs.
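
To make the concern concrete, here is a small sketch of the arithmetic involved. It relies on the PCI configuration space Cache Line Size register (offset 0x0C) being expressed in 32-bit DWORDs; the helper names are hypothetical, and the whole-line check is only an illustration of the rule a device is expected to follow, not code from any real driver.

```c
#include <stdint.h>

/* The PCI Cache Line Size register (config offset 0x0C) counts 32-bit
 * DWORDs, so a 32-byte line is programmed as 8 and a 128-byte line as 32. */
static uint8_t pci_cls_from_bytes(unsigned line_bytes)
{
    return (uint8_t)(line_bytes / 4);
}

/* A burst can honestly claim to overwrite whole cache lines only if it
 * starts on a line boundary and covers a whole number of lines. */
static int covers_whole_lines(uint64_t addr, uint32_t len, unsigned line_bytes)
{
    return len != 0 && addr % line_bytes == 0 && len % line_bytes == 0;
}
```

A 32-byte burst at an aligned address passes the check for a 32-byte line but fails it for a 128-byte line, which is exactly the mismatch a device programmed with the old value would run into.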

It seems like the only way this would work correctly is if the P4+chipset
actually still did the cache writeback even when the PCI device initiated a
transfer that was supposed to write a whole line?

- Jan