A CPU/memory-bound thread on physical processor 1 will run faster if
physical CPU 2, on which other threads could interfere with memory access,
is held idle. This sort of defeats the purpose of having multiple
processors. The real question for an MP system is how much useful work the
system is doing in total, not how much work one thread is doing.
Is there a general purpose MP OS out there that even considers not
scheduling threads on inactive processors?
=====================
Mark Roddy
-----Original Message-----
From: xxxxx@tab.at [mailto:xxxxx@tab.at]
Sent: Friday, September 10, 2004 6:48 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Differences between UP and MP systems?
Hi Mats,
I think out of your list #4 would be rather easy to implement and a good
option. Maybe not for all cases, but at least for some cases. I mean a
thread in logical CPU 1 will run faster if logical CPU 2 is “executing”
HALT. On a system with low-priority threads that do some cleanup and
housekeeping work but do always run, the latency for critical operations
could be improved. For most applications threads will be the same priority,
so HT would work well, and if there’s critical work to do, it would not be
slowed down just to execute some cleanup thread that might as well run at a
later time.
#5 would be similar to #4 except that it would not let two threads with the
same high priority run in parallel - I see no real benefit in that. Latency for
one of those threads would be better, but latency for the other thread would
be worse.
#3 finally would be very complicated and would have to be implemented
for one specific and well-known CPU design. If Intel (in the case of the
P4) changed the caches or anything else in the CPU, the solution might
end up being slower and less responsive than a non-HT-aware system.
Ah, yes, xxxxx@home was just an example - I actually meant any of those
“use-up-all-idle-CPU-time-for-non-important-stuff” programs.
> The whole point of hyperthreading is that there are two threads to run in
> the processor, so the processor can do something useful when it’s blocked
> waiting for a memory operation to finish (or something else that takes
> time), and if you only schedule one thread at a time, that wouldn’t work…
> If you don’t want this, turn off HT in the BIOS…
Yeah, I understand the point of HT, and I like parallel execution (even
without a better efficiency) for ISRs and stuff. I just don’t like the idea
that critical threads are slowed down by non-critical threads. I find it
somewhat strange that a thread running at very high priority runs
“full-speed” when there are no other threads running, and runs like 30%
slower if there is a very very low priority thread running too - it
shouldn’t do that, that’s what the whole priority-system is all about.
But it’s ok, I can accept the way it is implemented in XP - I just
wanted to point out that there are some real differences between a HT system
and a 2-physical-CPU system that could be accounted for in the scheduler.
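To put rough numbers on the "30% slower" observation: the short Python sketch below just redoes the arithmetic, assuming the ~140% combined-throughput figure quoted elsewhere in this thread and an even split between the two logical CPUs. Both are assumptions for illustration, not measurements, and the names `per_thread_share`, `UP_TOTAL` and `HT_TOTAL` are made up for the example.

```python
def per_thread_share(total_throughput: float, busy_logical_cpus: int) -> float:
    """Throughput each thread gets if the core's capacity is split evenly."""
    return total_throughput / busy_logical_cpus


# Assumed figures, relative to one thread running alone on the core.
UP_TOTAL = 1.00  # sibling logical CPU halted (or HT disabled)
HT_TOTAL = 1.40  # both logical CPUs busy; the "as advertised" ~140%

alone = per_thread_share(UP_TOTAL, 1)
shared = per_thread_share(HT_TOTAL, 2)

slowdown = 1.0 - shared / alone   # fraction of speed lost by the important thread
runtime_factor = alone / shared   # how much longer the same job takes

print(f"share with HT sibling busy: {shared:.0%}")
print(f"slowdown for the important thread: {slowdown:.0%}")
print(f"same job takes {runtime_factor:.2f}x as long")
```

Under those assumptions the high-priority thread gets about 70% of its stand-alone speed, so a job takes roughly 1.4 times as long, even though the core as a whole does more work.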
Have a nice day,
Paul
Mats PETERSSON
Sent by: xxxxx@lists.osr.com
10.09.2004 10:49
Please reply to “Windows System Software Devs Interest List”
To: “Windows System Software Devs Interest List”
Cc:
Subject: Re: [ntdev] Differences between UP and MP systems?
Paul,
I’m not sure exactly what you want the scheduler to do… The processor, in
HT mode, will ALWAYS run two threads at the same time. Of course, one of
those threads may be the IDLE thread, which does a “HALT”. But no matter
what you do, the processor will attempt to execute two threads, and there is
really no control for the scheduler that says “Give more priority to
Thread1” or some such. Both threads are given equal priority. So there are a
few possible options for the scheduler (and some variations):
1. Schedule only one thread at a time. Hyperthreading is essentially
meaningless.
2. Always schedule one thread per logical processor, using hyperthreading to
the maximum.
3. Try to figure out what the behaviour of the thread is, and schedule
accordingly (not sure what the rules for this would be).
4. If a thread has higher priority than other runnable threads, then run
only that one; if they are equal priority, run two threads.
5. If a thread has high priority, then schedule it on its own; if it has low
priority, schedule it together with other low priority threads.
Now, I believe #2 is the current method of scheduling. The only added
feature to take care of in a HT system is that two threads that are sharing
any form of data should be scheduled on the same physical processor, in the
case of multiple physical processors.
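For what it's worth, option #4 could be sketched roughly like this. It is only an illustration in Python of the decision rule, not how the XP scheduler (or any real scheduler) is implemented; the `Thread` class, the `pick_for_ht_core` function and the "larger number means higher priority" convention are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thread:
    name: str
    priority: int  # hypothetical convention: larger value = more important


def pick_for_ht_core(
    runnable: List[Thread],
) -> Tuple[Optional[Thread], Optional[Thread]]:
    """Choose what the two logical CPUs of one HT core should run.

    None stands for the idle thread (i.e. the logical CPU executes HALT).
    """
    if not runnable:
        return (None, None)

    # Highest priority first; sorted() is stable, so ties keep queue order.
    ordered = sorted(runnable, key=lambda t: t.priority, reverse=True)
    first = ordered[0]

    # Option #4: only fill the sibling if the next thread has EQUAL priority.
    if len(ordered) > 1 and ordered[1].priority == first.priority:
        return (first, ordered[1])

    # Otherwise the top thread runs alone and the sibling is halted.
    return (first, None)


if __name__ == "__main__":
    compiler = Thread("compiler", 8)
    background = Thread("background-cruncher", 1)
    print(pick_for_ht_core([background, compiler]))
```

With one normal-priority thread and one low-priority background thread runnable, this rule leaves the second logical CPU halted; with two threads of equal priority it uses both, which is the same behaviour as option #2.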
It’s very hard to figure out what is the right choice in the #3+ options
above. This is because a thread could well have high priority, but also be
using the memory inefficiently (not all code that has high priority is
written to make good use of the processor). #3 would be an ideal solution,
but then you’d have to use LOTS of different metrics (memory usage
efficiency, which execution units are used, etc, etc).
Note also that if you’re running xxxxx@home, the float unit will be more
busy than the integer unit, so a thread running mostly in cache, using the
integer unit would run quite well. Of course, SETI also uses quite a big
chunk of memory, so caches and memory controller will be more stressed than
if the other thread was the “IDLE” thread…
The whole point of hyperthreading is that there are two threads to run in
the processor, so the processor can do something useful when it’s blocked
waiting for a memory operation to finish (or something else that takes
time), and if you only schedule one thread at a time, that wouldn’t work…
If you don’t want this, turn off HT in the BIOS…
–
Mats
xxxxx@lists.osr.com wrote on 09/09/2004 06:47:24 PM:
> Hi!
>
> I believe there are many things that will improve latency but decrease
> efficiency when going from UP to MP logical processors. AFAIK changing
> IRQL does cost much more with MP than with UP? The effect on caches etc.
> has already been mentioned.
>
> Concerning the scheduler, well, I really do hope that there’s code in
> XP to deal with some HT issues. Consider the following case…
> Let’s say you have some normal-priority thread doing some lengthy
> computation, and one background thread like xxxxx@home uses to do its
> stuff. No other threads are doing real work except the “standard stuff”,
> which I expect to take < 1% of the available CPU time.
> So on a UP system the normal-priority thread would run alone and get
> nearly 100% of the total computation-power. On a “true” MP system (2
> physical CPUs) the normal-priority thread would get 100% of one CPU,
> which would be all it can use anyway, and the low-priority thread would
> get most of the time on the second CPU.
> Now, if a non-HT-aware scheduler is used to run a HT machine, the case
> would be the same, only that the total computation-power isn’t near
> the 200% you get from 2 physical CPUs, but rather ~140% (at least as
> advertised by Intel…). The normal-priority thread would end up getting
> only ~70% of the computation-power as compared to the non-HT case. Of
> course the “sum of all the work that the CPU gets done” would be more (or
> the same) as in the UP case, but nearly 50% of it would be “non-important”
> work… - the normal-priority thread would take longer to get its job done
> than with HT disabled.
> An HT-aware scheduler could disable HT (by simply HALTing one logical CPU)
> if there are 2 threads with different priorities, and only let the 2
> logical CPUs run in the case of interrupts or 2 concurrent threads with
> the same priority.
> In my opinion that would be a far better thing than to just let my
> screensaver (or xxxxx@home or whatever) take away much of the nice
> computation-power from the compiler, renderer or whatever “hungry”
> singlethreaded application I might have running.
>
> BTW: does anyone know if the scheduler in XP does that? And can anyone
> tell me if the scheduler in win2k sp4 is “HT aware”? I’m running my 2k-box
> with HT disabled for now…
>
> Regards,
> Paul Groke
>
> Mats PETERSSON
> Sent by: xxxxx@lists.osr.com
> 09.09.2004 18:10
> Please reply to “Windows System Software Devs Interest List”
>
> To: “Windows System Software Devs Interest List”
> Cc:
> Subject: Re: [ntdev] Differences between UP and MP systems?
>
> First of all, I’d like to point out that some of the performance
> difference may not come from differences in the kernel. It’s also true to
> say that the performance may degrade from increased pressure on the caches
> and memory subsystem.
>
> If you have two threads reading from memory, they will interfere with each
> other in the sense that the memory controller may have to access different
> sections of memory for each thread (most likely, unless the threads are
> just reading the same memory, in which case it should be in the cache).
> Reading different blocks of memory will reduce the efficiency of the
> memory controller by adding extra commands to be sent to the memory chips.
> This can amount to much more than 5% of the memory performance, but I
> guess the average benchmark isn’t only reading memory, so it’s highly
> likely that this is not all that is different.
>
> If you have two threads sharing the same cache, there will be a greater
> likelihood that something needed soon is thrown out, because each thread
> will read data into the cache that takes the space needed by the other
> thread. Thrashing the cache can of course happen in single threading too,
> but it’s increased by the fact that you have two different threads that
> may not have any knowledge of each other, and the cache isn’t any bigger
> when running hyperthreading, so this may slow things down.
>
> A further reason for a slowdown might be that, despite the advertised
> effects of hyperthreading, two threads are actually using the processor
> core LESS efficiently than a single thread. This obviously depends very
> much on what the actual benchmark is doing (not only benchmarks, of course,
> but a 5% slowdown is very hard to perceive on a machine; generally, you
> need to get at least 20% difference before you realize there’s a
> difference). This would make most of the difference on small apps that
> don’t do much memory accessing, and spend a lot of time on CPU-bound
> calculations. Memory-bound applications will suffer from the above two
> problems.
>
> Also, as you mention, some of the kernel is different, primarily, it will
> do LOCK prefixes on some of the memory accesses where one CPU has to know
> it’s the only CPU to access this location, and I think someone mentioned
> something about some SpinLock call being essentially a No-Op on the UP
> kernel, whilst it’s “a real function” on the MP kernel.
>
> Aside from the NTKERNXX, I believe HAL.DLL is also different depending on
> the configuration.
>
> My guess is that the major difference in performance would be caused by
> the cache/memory issues I’ve mentioned, rather than by differences in the
> kernel. But that would naturally depend a lot on what benchmarks are being
> run too.
>
> To measure the true difference between MP and UP kernel, you should be
> able to switch off the HyperThreading and run the same benchmark in single
> processor mode, without re-installing the kernel. If there’s a noticeable
> difference, then my first three reasons are highly likely.
>
> –
> Mats
>
> xxxxx@lists.osr.com wrote on 09/09/2004 04:47:18 PM:
>
> > I was looking at upgrading my home PC and was looking at the various
> > HyperThreading information. What surprised me is that a number of
> > benchmarks showed a degradation on systems where HT is enabled versus it
> > being disabled, up to 5% when it’s mentioned by the tester. The apparent
> > cause of this is the OS differences between the Uniprocessor
> > (ntoskrnl.exe) and Multiprocessor (ntkrnlmp.exe) kernels.
> >
> > As far as I know, the only differences between UP and MP are in the
> > ntoskrnl/ntkrnlmp.exe kernel; I don’t think that any other files such as
> > the HAL need to change. The only difference that I could think of was with
> > SpinLocks, but I don’t see that accounting for a 5% difference. There’s
> > surely some difference in the task scheduler, but I wouldn’t think that’s
> > major either. I suspect that there are other differences, but I couldn’t
> > find any with my Google searching, so could someone educate me/us?
> >
> > Thanks in advance!
> >
> >
> > —
> > Questions? First check the Kernel Driver FAQ at
> > http://www.osronline.com/article.cfm?id=256
> >
> > You are currently subscribed to ntdev as: xxxxx@3dlabs.com
> > To unsubscribe send a blank email to xxxxx@lists.osr.com