I agree it is good background knowledge, but the only practical conclusions for the programmer are the standard ones:
- Algorithms with good locality of reference (both temporal and spatial) perform better; and
- Occasionally, padding structures to cache-line alignment improves performance.
In kernel mode (KM), there is almost nothing you can do about either of these in the average driver. In user mode (UM), stay away from OOP (at least for storage) and beware of generic heap allocation.
The data in this paper does not cover a wide enough range of concurrency to draw any meaningful conclusion about whether the optimal associativity of a cache varies with the number of consumers (cores / logical processors). This is likely to depend on the application(s) under test, and especially on whether the CPUs are working together on a small working set or running different applications on orthogonal data. Unfortunately, the hardware they used is quad-core and similar, which does not represent a situation where 20 or more cores share the cache, so there is no opportunity to see how performance varies at that scale.
Sent from Surface Pro
From: Jamey Kirby
Sent: Tuesday, June 16, 2015 10:45 AM
To: Windows System Software Devs Interest List
This is one of the best papers I have read on the subject of memory. It is a Red Hat paper and a skosh *nix-centric, and a bit dated (2007), but something that every systems programmer should read.
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
On Mon, Jun 15, 2015 at 6:34 PM, Marion Bond wrote:
That is an interesting conclusion. Do you know if there is any correlation between the effectiveness of the level of associativity for a cache versus the number of logical processors making use of the cache?
For example, would an L3 cache used by 2 logical processors exhibit a different profile with respect to associativity than an L3 cache used by 40 logical processors?
From: xxxxx@hotmail.com
Sent: Monday, June 15, 2015 9:56 AM
To: Windows System Software Devs Interest List
> http://yarchive.net/comp/linux/page_coloring.html
BTW, while we are at it, N-way cache associativity happens to be a pretty nonlinear factor in terms of potential performance improvement. Unfortunately, I cannot immediately find a link to a document that provides an in-depth analysis of cache associativity, but, IIRC, introducing 2-way associativity gives a tremendous improvement over direct 1-to-1 mapping, and introducing 4-way associativity gives a similar improvement over 2-way. However, 8-way associativity gives only a marginal improvement (if any at all) over 4-way, and increasing N above 8 is clearly just a waste of transistors…
Anton Bassov
—
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
–
Jamey Kirby
Disrupting the establishment since 1964
This is a personal email account and as such, emails are not subject to archiving. Nothing else really matters.