Spinlock vs RWLocks

Peter,

I may be totally wrong, but I heard that Windows, at some point, had a couple of entries in the Top 500 list.
They’ve got pretty serious players there. For example, according to the June 2012 data, first place is held by a machine with 1,572,864 cores (0x180000 in hex notation), which happens to be
24*(1 + the maximum value of an unsigned short). Certainly, not all of them have that many cores -
there are only around 30 machines on that list with more than 65,535 cores, and the last
place is occupied by a machine with a “modest” 6,064 cores. You can check http://www.top500.org/list/2012/06/100/?page=1 for more info. In any case, you’ve got absolutely nothing to do on this list with a system that supports at most 256 CPUs. Therefore, I thought Windows might support higher-end machines…
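
A trivial C check of the arithmetic above, using nothing but the June 2012 figures already quoted:

/* 1,572,864 is exactly 24 * (USHRT_MAX + 1), i.e. 0x180000 in hex. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned long cores = 1572864UL;

    printf("cores = %lu = 0x%lX\n", cores, cores);                   /* 0x180000 */
    printf("multiples of 65,536: %lu\n", cores / (USHRT_MAX + 1UL)); /* 24 */
    return 0;
}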

> If that’s the environment that my software is targeting… why, yes. If not in-house, then certainly
> at a test site with which I’m partnering.

???

Are you sure you will be able to find such a machine for the sole purpose of testing??? Surely some of your clients may have one, but it is more than likely to be one they use for production purposes. Furthermore, it is more than likely to be a high-availability machine, i.e. one they just cannot afford to reboot at any moment they wish just for the fun of doing it. Therefore, they are not going to let you use it for testing/experimentation purposes…

Fair enough - indeed, those who seek high performance normally would not use Windows, would they…

Anton Bassov

For what it’s worth, I had a bug where my driver crashed on a box with >256 CPUs (320, actually).

> I may be totally wrong, but I heard that Windows, at some point, had a couple of entries in the Top 500 list.

Right at the moment, 3 of the top 500 supercomputers are listed as Windows based.

> They’ve got pretty serious players there. For example, according to the June 2012 data, first place is held by a machine with 1,572,864 cores (0x180000 in hex notation)

Most (all?) of the Top 500 supercomputers are clusters, so when they say 1,572,864 cores, that is the total across all cluster nodes. Each node might only have 16 or 32 cores.

I suspect a big factor in Windows not being popular for Top 500 supercomputers is that each node would need an OS license, so if you have 100,000 nodes (assume 16 cores each), Windows would need 100,000 licenses. Say you get a deal and a license is only $500 per node; that’s $50 million just for the OS licenses. Linux, on the other hand, costs $0 for a license. If you could build each node for, say, $3,000 (processor/memory/storage/switch fabric), then Windows adds about 17% ($500/$3,000) to the cost of each node, so for a fixed budget you end up with roughly 14% fewer nodes - and that much less performance - because some of the money went to OS licenses and not hardware.

On the other hand, look at the distributed computing statistics from the BOINC project (https://boinc.berkeley.edu/index.php), which say its 356,976 active computers delivered about 6.864 PetaFLOPS in the last 24 hours. If you also look at the BOINC OS breakdown (http://boincstats.com/en/stats/-1/host/breakdown/os/) you will find Windows machines make up something like 90% of the CPU credits, so let’s say 90% of the total PetaFLOPS, which gives 6.178. If we then go look at the current Top 500 supercomputer list (http://www.top500.org/lists/2012/11/) and find the ranking for a 6.178 PetaFLOPS machine, it would be #5. My guess is (anybody want to find the data?) that the cost of computing via BOINC is a tiny fraction of the cost of computing via one of the Top 500 supercomputers (since hardware time and power are basically being donated), and since the majority of BOINC runs on Windows 7, one could claim Windows 7 is currently an OS creating very cost-effective PetaFLOPS for many projects.
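
For what it’s worth, here is a tiny back-of-the-envelope check of those numbers - every input below is taken from the figures above, nothing is newly measured:

#include <stdio.h>

int main(void)
{
    /* License math: 100,000 nodes, $500 license and $3,000 of hardware each. */
    double nodes = 100000.0, license = 500.0, hw = 3000.0;

    printf("total license cost: $%.0f million\n", nodes * license / 1e6);                /* 50   */
    printf("license overhead  : %.1f%% of hardware cost\n", 100.0 * license / hw);       /* 16.7 */
    printf("fixed budget      : %.1f%% fewer nodes\n", 100.0 * license / (hw + license)); /* 14.3 */

    /* BOINC math: ~90% of 6.864 PetaFLOPS credited to Windows hosts. */
    printf("Windows share     : %.3f PetaFLOPS\n", 0.90 * 6.864);                        /* 6.178 */
    return 0;
}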

Jan

> Most (all?) of the Top 500 supercomputers are clusters, so when they say 1,572,864 cores,
> that is the total across all cluster nodes. Each node might only have 16 or 32 cores.

This is true. Furthermore, most of these nodes may be “workers” that are responsible strictly for
computations - all file IO operations will be deferred to dedicated IO nodes where the file system is mounted. This file system may be a distributed one, with a single metadata server and multiple user data servers that actually store file data.

I think you can already see what the main potential bottlenecks under such an architecture are - they obviously have nothing to do with the “workers” but, instead, have everything to do with the IO nodes, particularly the metadata server. This is where an extremely efficient OS, file system and storage stack are required, and this is exactly the place where Linux is used…
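
A purely illustrative sketch of that split - every name below is made up, it is not any real cluster file system’s API - just to show where the serialization lives: every open has to go through the lone metadata server, while the actual data traffic fans out across the data servers.

#include <stdio.h>

#define NUM_DATA_SERVERS 1024

typedef struct {
    unsigned data_server;        /* which data server holds the bytes          */
    unsigned long long object;   /* object id handed out by the metadata node  */
} handle_t;

/* Stand-ins for the cluster's message layer - hypothetical helpers. */
static handle_t metadata_open(const char *path)    /* ONE node serves everybody */
{
    handle_t h = { 0, 0 };
    const char *p;

    for (p = path; *p; p++)                         /* fake placement decision */
        h.object = h.object * 31 + (unsigned char)*p;
    h.data_server = (unsigned)(h.object % NUM_DATA_SERVERS);
    return h;
}

static void data_read(handle_t h)                   /* spread over many nodes */
{
    printf("read object %llu from data server %u\n", h.object, h.data_server);
}

int main(void)
{
    /* Thousands of "workers" doing this at once all serialize on
       metadata_open(); only data_read() scales with the data servers. */
    data_read(metadata_open("/scratch/job42/input.dat"));
    return 0;
}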

However, when it comes to testing you still have to use all these “workers”, because otherwise you will never manage to put any real load on the file IO processing nodes…

Anton Bassov

Back in 1990, Gordon Bell wondered why he had managed to work for two
supercomputer companies that failed. Why did they fail?

There are very, very, very, very few places that can afford a
supercomputer: Livermore Labs and similar places doing nuclear research,
the National Weather Service, and a couple of the big independent weather
companies. Overall, the market looks like 12 supercomputers would
saturate the world’s needs (sound familiar?).

The other customers are the supercomputer centers scattered around the
world. So Gordon studied who uses these. A researcher could get an NSF
grant with $5000 for supercomputer time. This entitled the researcher to
about 30 seconds a day of supercomputer time. He said, “That’s one result
a day”. For half that price, the researcher could buy a 386 machine with a
387 math coprocessor (this was 1990, remember), which could run the same
program, but it took 24 hours to get the result. This meant, guess what,
one result per day. The researcher could then take the remaining half of
the $5000 computer budget and buy a second 386, no math coprocessor, and
use it to process email, write reports, etc. But for the research, it
produced one result per day. Now, compare that 20MHz 386/387 with a
modern Core series multicore machine; it would probably take just a couple
hours per result.

So the current supercomputer race is largely a “mine is bigger than yours”
contest, trying to show that country X’s major manufacturer can build a
faster machine than country Y. It is the Space Race of the second decade
of the 21st century.

In 2000, the supercomputer trade group published a list of the top 50
supercomputers. In 2005, they published another list, but the important
thing is that EACH of the top 50 supercomputers in 2005 had more power
than ALL of the supercomputers in the year 2000 top-50 list combined.

There are still many supercomputer-scale problems, and we actually do need
faster machines, but is there an actual “market” for them? Gordon thought
not. (Gordon was the chief architect of the PDP-11, one of the founders
of Digital Equipment, and is now at Microsoft Research)
joe


I would have thought ULONGLONG, since the processor maps are bitmaps.
There are already configurations that exceed 64 processors. 64-processor
systems are Old News; we saw these many years ago at WinHEC. One box was
all Itanics, and it was done the year that Intel sold 500 Itanic chips during
their fourth quarter. I realized that nearly all of those sales were
probably to this one company. A half-height 19" rack held 64 AMD x64
machines, cost about $300,000, and exceeded the power of a 1980s
supercomputer costing $20,000,000.
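
As an aside, here is a minimal kernel-mode sketch (offered only as an illustration) of how those bitmaps are actually exposed these days: since Windows 7 the affinity map is not one flat ULONGLONG but a 64-bit KAFFINITY mask per processor group, and the routines below are all documented WDK calls.

/*
 * Illustration only: enumerate processor groups and their affinity
 * masks.  With more than 64 logical processors the system reports
 * several groups, each with its own 64-bit mask.
 */
#include <ntddk.h>

VOID LogProcessorTopology(VOID)
{
    USHORT groupCount = KeQueryActiveGroupCount();
    ULONG  totalLps   = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
    USHORT group;

    DbgPrint("%u group(s), %lu logical processors total\n",
             groupCount, totalLps);

    for (group = 0; group < groupCount; group++) {
        /* Each group holds at most 64 LPs, so one KAFFINITY suffices. */
        KAFFINITY mask    = KeQueryGroupAffinity(group);
        ULONG     inGroup = KeQueryActiveProcessorCountEx(group);

        DbgPrint("group %u: %lu LPs, affinity mask 0x%Ix\n",
                 group, inGroup, mask);
    }
}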

joe


Not that I really want to enter into the meat of this debate, but Windows Server 2012 supports 640 logical processors (cores or threads), not 256. Install the Hyper-V role and that number drops to 320.
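
Just for reference, a minimal user-mode sketch (not a statement about any particular SKU’s limit) that asks the running box what it actually exposes; both calls have been in kernel32 since Windows 7 / Server 2008 R2:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WORD  groups = GetActiveProcessorGroupCount();
    DWORD total  = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    WORD  g;

    printf("%lu logical processors in %u group(s)\n",
           (unsigned long)total, (unsigned)groups);

    for (g = 0; g < groups; g++)
        printf("  group %u: %lu logical processors\n",
               (unsigned)g, (unsigned long)GetActiveProcessorCount(g));

    return 0;
}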

  • Jake Oshins
    Windows Kernel Team

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Tuesday, November 20, 2012 6:48 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Spinlock vs RWLocks

Hmmm… let me think… “that well exceeds the one described by unsigned short type”… UCHAR… USHORT… what WOULD that number be? USHORT’s half an ULONG… and we’re talking C in kernel mode here… Hmmmm… 65536? Yes, I think that’s what you meant. 65536! More than 65535 CPUs? Why would I do that? Windows only supports 256 CPUs per system.

But… I’ll play along for fun, because well, it’s still morning here and I haven’t left for the office yet. So, here’s my answer: If that’s the environment that my software is targeting… why, yes. If not in-house, then certainly at a test site with which I’m partnering.

I guess I don’t understand your entire point (hey, THAT’s a first, huh?).

The way WE write software is we create a set of functional requirements, define a design that we expect will meet those requirements, write the software, and then test our software to determine – as far as practical – that our implementation and design in fact DO meet the stated functional requirements. The requirements, of necessity, include supported target environments and performance metrics, if performance is part of the goal.

One place where I *will* differ, if only slightly, from Mr. Tippet is that in some cases, or for some releases, or for some operations, performance is completely secondary to functionality. The product has to work properly, even if it exhibits crappy performance. In these cases, I’ve found that it actually works to have a performance goal that’s entirely subjective. For example, the goal might be “doing xyz in environment abc must work, and performance can’t totally suck” – My experience is that three or four engineers locked in a room can generally come to a consensus on when performance of an operation does and does not “totally suck.” Yes, this is a VERY LOW performance goal… but it’s still a valid goal. And YES, you might get customers who disagree when it comes to whether the exhibited performance in their environment “totally sucks” or not. In that case, you explain to them that functionality and not performance was the goal for the product/release/feature and whether you ever plan to enhance the performance in this area.

In general, when we have customers that report our software behaving in ways other than those we’ve specified, performance or otherwise, we first ask: “Is this product running in a way we intended and in a supported environment?” If it’s not, we explain that to the customer.

If you try to dig a hole with an axe, and you break that axe handle, it’s REALLY not a problem the axe manufacturer should have to deal with. You can complain all you want about how the axe is inefficient, and how the handle should not have broken, but the bottom line is the problem lies with YOU not with the axe – you’re using the wrong tool for the job.

It’s no different in the world of software. In the real world, there’s zero need to complicate things further. You just make your job harder, and you gain nothing.

Peter
OSR

