Re: [ntdev] Re: [ntdev] Hash Mapping at driver level

I agree it is good background knowledge, but the only practical conclusions for the programmer are the standard ones:

Algorithms with good locality of reference (both temporal and spatial) have better performance; and

Occasionally, padding structures to cache-line alignment improves performance.
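To make the padding point concrete, here is a minimal C11 sketch of per-CPU counters padded so that each CPU's slot starts on its own cache line. This avoids false sharing, where updates from different CPUs to neighbouring counters keep invalidating one shared line. The 64-byte line size, the MAX_CPUS limit and the names padded_counter, record_packet and total_packets are illustrative assumptions, not anything taken from the paper or the WDK.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Real code should query the line size. */
#define CACHE_LINE_SIZE 64
#define MAX_CPUS 32   /* hypothetical upper bound on logical processors */

/* Hypothetical per-CPU statistics slot. Aligning the first member to the
   line size makes the whole struct line-aligned and pads its size up to a
   multiple of the line size, so two CPUs never write to the same line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) uint64_t packets;
    uint64_t bytes;
} padded_counter;

_Static_assert(sizeof(padded_counter) % CACHE_LINE_SIZE == 0,
               "each slot must occupy whole cache lines");

/* One slot per logical processor; slot i starts on its own cache line. */
static padded_counter g_counters[MAX_CPUS];

/* Called only by the CPU that owns slot 'cpu', so no locking is needed. */
void record_packet(unsigned cpu, uint64_t len)
{
    assert(cpu < MAX_CPUS);
    g_counters[cpu].packets++;
    g_counters[cpu].bytes += len;
}

/* Reader sums all slots; this only reads the other CPUs' lines. */
uint64_t total_packets(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < MAX_CPUS; i++)
        sum += g_counters[i].packets;
    return sum;
}
```

The padding wastes most of each line, which is the usual trade: memory for fewer cross-CPU cache-line transfers. It only pays off when different CPUs really do write to adjacent data.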

In kernel mode (KM), there is almost nothing you can do about either of these in the average driver. And in user mode (UM), stay away from OOP (at least for storage) and beware of generic heap allocation.
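As one illustration of avoiding generic heap allocation for storage, here is a hypothetical fixed-size node pool in C. It carves hash-chain nodes out of one contiguous array, so nodes allocated one after another are adjacent in memory rather than wherever a general-purpose allocator happens to place them. The node layout and all the names are made up for this sketch; whether it actually helps depends on the access pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for a hash-bucket chain. */
typedef struct node {
    struct node *next;
    unsigned key;
    unsigned value;
} node;

/* Fixed-capacity pool: every node lives in one contiguous array. */
typedef struct {
    node *slots;
    size_t capacity;
    size_t used;        /* slots handed out so far from the array */
    node *free_list;    /* nodes returned by pool_free, reused first */
} node_pool;

int pool_init(node_pool *p, size_t capacity)
{
    p->slots = calloc(capacity, sizeof(node));
    p->capacity = capacity;
    p->used = 0;
    p->free_list = NULL;
    return p->slots != NULL;
}

node *pool_alloc(node_pool *p)
{
    if (p->free_list) {             /* recycle a previously freed node */
        node *n = p->free_list;
        p->free_list = n->next;
        return n;
    }
    if (p->used == p->capacity)
        return NULL;                /* exhausted; caller decides what to do */
    return &p->slots[p->used++];    /* next slot, adjacent to the previous one */
}

void pool_free(node_pool *p, node *n)
{
    assert(n >= p->slots && n < p->slots + p->used);  /* must come from this pool */
    n->next = p->free_list;
    p->free_list = n;
}

void pool_destroy(node_pool *p)
{
    free(p->slots);
    p->slots = NULL;
    p->capacity = p->used = 0;
    p->free_list = NULL;
}
```

The fixed capacity is a deliberate simplification; a real table would grow the pool in large chunks rather than falling back to one allocation per node.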

The data in this paper does not cover a wide enough range of concurrency to draw any meaningful conclusions about whether the optimal associativity of a cache varies with the number of consumers (cores / logical processors). This is likely to depend on the application(s) under test, and especially on whether the CPUs are working together on a small working set or running different applications on orthogonal data. Unfortunately, the hardware they used is quad-core or similar and does not represent the situation where 20 or more cores may share the cache, so there is no opportunity to see how performance may vary.

Sent from Surface Pro

From: Jamey Kirby
Sent: Tuesday, June 16, 2015 10:45 AM
To: Windows System Software Devs Interest List

This is one of the best papers I have read on the subject of memory. It is a Red Hat paper and a skosh *nix-centric, and a bit dated (2007), but something that every systems programmer should read.

https://people.freebsd.org/~lstewart/articles/cpumemory.pdf

On Mon, Jun 15, 2015 at 6:34 PM, Marion Bond wrote:

That is an interesting conclusion. Do you know whether there is any correlation between the effectiveness of a given level of cache associativity and the number of logical processors using the cache?

For example, would an L3 cache used by 2 logical processors exhibit a different profile with respect to associativity than an L3 cache used by 40 logical processors?

Sent from Surface Pro

From: xxxxx@hotmail.com
Sent: Monday, June 15, 2015 9:56 AM
To: Windows System Software Devs Interest List

> http://yarchive.net/comp/linux/page_coloring.html

BTW, while we are at it, in terms of potential performance improvement, N-way cache associativity happens to be a pretty non-linear factor. Unfortunately, I cannot immediately find a link to a document that provides an in-depth analysis of cache associativity, but, IIRC, introducing 2-way associativity gives a tremendous improvement over the direct 1-to-1 mapping, and introducing 4-way associativity gives a similar improvement compared to 2-way. However, 8-way associativity gives only a marginal (if any) improvement over 4-way, and increasing N above 8 is clearly just a waste of transistors…
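To see the mechanism behind associativity, a toy N-way set-associative cache with LRU replacement is easy to model. This is a sketch under simplifying assumptions (tags only, true LRU, simple modulo set indexing), not a model of any real CPU, and it does not attempt to reproduce the 2/4/8-way figures recalled above. It only shows that when more hot lines map to one set than the set has ways, they keep evicting each other (conflict misses). The names cache_model, cache_init, cache_access and cache_destroy are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy N-way set-associative cache with LRU replacement (tags only). */
typedef struct {
    unsigned line_size;   /* bytes per line */
    unsigned num_sets;
    unsigned ways;        /* associativity; 1 = direct-mapped */
    uint64_t *tags;       /* num_sets * ways entries */
    uint64_t *last_used;  /* LRU timestamp per entry; 0 means empty */
    uint64_t clock;
    unsigned long hits, misses;
} cache_model;

int cache_init(cache_model *c, unsigned total_bytes, unsigned line_size, unsigned ways)
{
    assert(line_size != 0 && ways != 0 && total_bytes % (line_size * ways) == 0);
    c->line_size = line_size;
    c->ways = ways;
    c->num_sets = total_bytes / (line_size * ways);
    c->tags = calloc((size_t)c->num_sets * ways, sizeof *c->tags);
    c->last_used = calloc((size_t)c->num_sets * ways, sizeof *c->last_used);
    c->clock = 0;
    c->hits = c->misses = 0;
    return c->tags != NULL && c->last_used != NULL;
}

void cache_access(cache_model *c, uint64_t addr)
{
    uint64_t line = addr / c->line_size;
    unsigned set = (unsigned)(line % c->num_sets);  /* simple modulo index */
    uint64_t tag = line / c->num_sets;
    uint64_t *t = &c->tags[(size_t)set * c->ways];
    uint64_t *u = &c->last_used[(size_t)set * c->ways];
    unsigned victim = 0;

    c->clock++;
    for (unsigned w = 0; w < c->ways; w++) {
        if (u[w] != 0 && t[w] == tag) {   /* hit: refresh the LRU stamp */
            u[w] = c->clock;
            c->hits++;
            return;
        }
        if (u[w] < u[victim])             /* track empty or least recently used way */
            victim = w;
    }
    c->misses++;                          /* miss: fill or evict the victim way */
    t[victim] = tag;
    u[victim] = c->clock;
}

void cache_destroy(cache_model *c)
{
    free(c->tags);
    free(c->last_used);
}
```

For example, with a 1 KB cache and 64-byte lines, alternating between addresses 0 and 1024 misses on every access when direct-mapped (both land in set 0), but only takes two compulsory misses when 2-way. Cycling through 0, 512 and 1024 still thrashes a 2-way LRU cache, while a 4-way one holds all three lines. Extra ways help only until they cover the number of hot lines that collide in a set, which is one way to think about diminishing returns.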

Anton Bassov


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer





Jamey Kirby
Disrupting the establishment since 1964

This is a personal email account and as such, emails are not subject to archiving. Nothing else really matters.

Whatever we may think about it, this is what hardware designers have done.

This is an example of the cache configuration of a machine I have (as reported by GetLogicalProcessorInformation):

Cache 49 [Level 3] Unified [20 MB] 64 byte lines [20-way] Logical Processors [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

Type Intel64 Family 6 Model 45 Stepping 7
Name Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

This L3 cache is 20-way associative and services 16 of the 32 logical processors in the system. This system has Hyper-Threading enabled, so 8 physical cores share this cache.
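For the L3 cache above, the textbook geometry works out as sets = capacity / (line size × ways) = 20 MB / (64 B × 20) = 16384 sets (taking 20 MB as 20 × 2^20 bytes). An address therefore splits into a 6-bit line offset and a 14-bit set index, and addresses 1 MB apart compete for the same 20 ways. The small C sketch below encodes that arithmetic; cache_geometry and the helper names are hypothetical. It assumes simple modulo indexing, whereas, as far as I know, the Sandy Bridge-EP L3 is split into per-core slices selected by an undocumented address hash, so treat this as the idealized model only.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook cache geometry (hypothetical helper type). */
typedef struct {
    uint64_t capacity;   /* bytes */
    uint64_t line_size;  /* bytes */
    uint64_t ways;       /* associativity */
} cache_geometry;

/* sets = capacity / (line size * ways) */
uint64_t cache_sets(cache_geometry g)
{
    assert(g.line_size != 0 && g.ways != 0);
    return g.capacity / (g.line_size * g.ways);
}

/* Set index under the simple "line number modulo number of sets" model;
   real sliced L3 caches hash the address instead. */
uint64_t cache_set_index(cache_geometry g, uint64_t addr)
{
    return (addr / g.line_size) % cache_sets(g);
}

/* Distance between addresses that map to the same set in this model:
   line size * number of sets, i.e. capacity / ways. */
uint64_t cache_alias_stride(cache_geometry g)
{
    return g.line_size * cache_sets(g);
}
```

Under this model, up to 20 lines spaced exactly 1 MB apart can be resident together in one set; a 21st such line starts evicting the others even though the cache as a whole is nowhere near full.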

The other NUMA node in this system has an identical L3 cache shared by the other 16 logical processors.

As you point out, the L1 and L2 instruction and data caches are per core (shared by two logical processors because of Hyper-Threading). It occurs to me that this is reasonably efficient in cases (like mine) where a single application uses all cores and, even though there are context switches etc., mostly the same code pages are executed.

Sent from Surface Pro

From: xxxxx@hotmail.com
Sent: Thursday, June 18, 2015 10:22 AM
To: Windows System Software Devs Interest List

> … the situation where 20 or more cores may share the cache so there is no opportunity to see how performance may vary

Well, from the hardware designer's perspective it would be a truly moronic idea to make separate cores share the same cache, don't you think? In fact, it would simply defeat the very purpose of caching in the first place. To realize this, all you have to do is ask yourself where this cache is going to be physically located. Different threads on the same physical core may, indeed, share L3 cache (L1 and L2 would still be thread-specific), but making different cores share the same cache is going to be AT LEAST pointless on a UMA architecture where all the cores share the same memory controller. On a NUMA architecture where every core has its own memory controller, this approach is obviously going to result in serious performance degradation, rather than the enhancement it purports to offer…

Anton Bassov

