I have some processor-intensive code in my filter driver that I process with
kernel threads. I ran some benchmark tests on a couple of dual-processor
servers. The old server has a couple of 700 MHz Pentium III chips, while the
new server has a couple of 3 GHz Pentium IVs with hyperthreading.
Well, I just assumed that the new server would blow the socks off the old
one, but to my utter amazement, for some benchmarks it was actually slower.
Are there any tips for creating and scheduling kernel threads that I should
be aware of?
Thanks.
On Oct 21, 2005, at 4:41 PM, Neil Weicher wrote:
> I have some processor intensive code in my filter driver that I process
> with kernel threads. I ran some benchmark tests on a couple of
> dual-processor servers. The old server has a couple of 700 Pentium III
> chips, while the new server has a couple of 3GHz Pentium IVs with
> hyperthreading.
>
> Well I just assumed that the new server would blow the socks off the old
> one, but to my utter amazement, for some benchmarks it was actually
> slower.
I can’t offer much help other than to suggest using a profiler on
both boxes - that is indeed an interesting result. All other things
being equal (i.e. no code changes, same OS, etc.), this surprises me
too. VTune may be useful if this is really Intel-specific weirdness.
I’d definitely try profiling, regardless, if you really have
CPU-bound kernel-mode code.
One other change that jumps out is that you have moved from a 2-proc
box to a (pseudo) 4-proc box - you might reevaluate your locking
architecture, but profiling will reveal that if it’s a problem. You
could also try turning off hyperthreading.
Good luck. I’m curious to know what you come up with.
Steve Dispensa
MVP - Windows DDK
www.kernelmustard.com
Sorry - I meant it is a 3 GHz Xeon. It has 2MB of L2 cache.
In any case, the processor-intensive code is highly optimized assembler.
There must be some assembler code that the Xeon processors do not like.
—
Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17
You are currently subscribed to ntfsd as: xxxxx@netlib.com
To unsubscribe send a blank email to xxxxx@lists.osr.com
Yeah, it really is odd. Do you do any sort of locking, or anything
that could page-fault? You might consider switching to in-stack queued
spin locks if you’re currently using legacy spin locks, but I’d
really profile before attempting any more optimization. If you don’t
have VTune, try kernrate:
http://www.microsoft.com/whdc/driver/perform/drvperf.mspx
I’m sure you know what you’re doing regarding your optimized assembly
code, but do you think the chip change (and going from two to four
logical processors) could mean that the code is actually not as optimal
for the new target?
Good luck; sounds like a fun one.
-sd
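For anyone who has not used the in-stack queued spin locks suggested above, here is a minimal sketch of the calling pattern. It is not code from this thread. KeInitializeSpinLock, KeAcquireInStackQueuedSpinLock and KeReleaseInStackQueuedSpinLock are the real WDK routines; WORK_STATS, InitStats and RecordBlocks are hypothetical names made up for illustration. The #else branch holds single-threaded stand-ins (an assumption, not real locks) so the sketch also builds outside the WDK, and the _KERNEL_MODE check assumes a driver build that uses the compiler's /kernel option.

```c
#ifdef _KERNEL_MODE
#include <ntddk.h>
#else
#include <stddef.h>
/* Stand-ins (assumption): single-threaded placeholders, NOT real locks.
   They exist only so the sketch compiles outside the WDK. */
typedef unsigned long ULONG;
typedef size_t KSPIN_LOCK, *PKSPIN_LOCK;
typedef struct { PKSPIN_LOCK Lock; } KLOCK_QUEUE_HANDLE, *PKLOCK_QUEUE_HANDLE;
static void KeInitializeSpinLock(PKSPIN_LOCK lock) { *lock = 0; }
static void KeAcquireInStackQueuedSpinLock(PKSPIN_LOCK lock,
                                           PKLOCK_QUEUE_HANDLE handle)
{
    *lock = 1;
    handle->Lock = lock;
}
static void KeReleaseInStackQueuedSpinLock(PKLOCK_QUEUE_HANDLE handle)
{
    *handle->Lock = 0;
}
#endif

/* Hypothetical state shared by the driver's worker threads. */
typedef struct _WORK_STATS {
    KSPIN_LOCK Lock;
    ULONG      BlocksProcessed;
} WORK_STATS;

/* Call once before any worker thread touches the structure. */
void InitStats(WORK_STATS *stats)
{
    KeInitializeSpinLock(&stats->Lock);
    stats->BlocksProcessed = 0;
}

/* Adds to the shared counter under the queued spin lock. */
void RecordBlocks(WORK_STATS *stats, ULONG count)
{
    KLOCK_QUEUE_HANDLE handle;   /* queue entry lives on this thread's stack */

    /* In the kernel this raises IRQL to DISPATCH_LEVEL, so nothing
       touched while the lock is held may be pageable. */
    KeAcquireInStackQueuedSpinLock(&stats->Lock, &handle);
    stats->BlocksProcessed += count;
    KeReleaseInStackQueuedSpinLock(&handle);
}
```

Each caller passes its own KLOCK_QUEUE_HANDLE, so a waiting processor spins on its own queue entry rather than on the shared lock word, which tends to matter more as the processor count grows. A given KSPIN_LOCK should be used one way only: do not mix KeAcquireSpinLock and the in-stack queued routines on the same lock.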
Neil,
Current Xeons (server versions of the Pentium IV) have a much longer pipeline
than the Pentium III, so they require quite different optimizations. Many
Pentium III optimizations just slow down the Pentium IV.
You could also try your code on an AMD Opteron system. Code optimized for the
PIII would usually have excellent performance on Opteron and Athlon CPUs.
Dmitriy Budko
VMware
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Neil Weicher
Sent: Friday, October 21, 2005 4:32 PM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] strange performance results
Sorry - I meant it is a Xeon 3ghz. It has 2MB of L2 cache.
In any case, the processor intensive code is highly optimized assembler.
There must be some assembler code that the Xeon processors do not like.