Going Faster...!

OSR_Community_User Member Posts: 110,217
We've just completed an exhaustive test of several Pentium and Athlon
processors, trying to find the fastest combination of components on which to
run our NTW4 software. Our code is carefully written to run entirely in
memory - once we're loaded into memory, the hard drive is very seldom
accessed and only then by NT's internal housekeeping. Our code runs as an NT
Service and performs no video operations of any kind, so video bandwidth
shouldn't be a factor either. Math operations are almost entirely integer,
not FP. Because of these factors, we believe(d) we are limited only by core
and bus speed... and while we don't expect speed to scale linearly with core
speed, we expected _something_.

Our testing showed that above 800MHz core speed there doesn't appear to be
any benefit to faster cores. Going from 800 to 867 to 900 MHz, which
represents a 13% increase in core speed, we see exactly _zero_ delivered
improvement. Going from 450 to 700 to 800, we do see improvements - but they
level off above 800. Changing the external bus speed (from 100 MHz to 133
MHz) also has no measurable effect.

These NTW4 machines are running very lean. Only a minimal set of NT services is running,
and there are only two cards in the backplane (AGP video and PCI network).
TaskMan reports only 13 processes, most of them NT's own, and memory
consumption at idle is under 15MB.

Since we've factored out the disk and video, what's left? Any suggestions?

RLH

Comments

  • OSR_Community_User Member Posts: 110,217
    If your code is multi-threaded, then add more CPUs. That should definitely
    speed things up.

    Jim
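    How much extra CPUs help depends on how much of the work really runs in
    parallel. A minimal sketch of Amdahl's law (a standard textbook model, not
    something measured in this thread) in C; `amdahl_speedup` is a hypothetical
    helper and the parallel fractions below are illustrative assumptions:

```c
/* Amdahl's law: best-case speedup on n CPUs when a fraction p (0..1) of the
   single-CPU run time spreads perfectly across all CPUs and the remaining
   (1 - p) stays serial. Ignores locking, cache-line sharing and scheduler
   overhead, so real results will be lower. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

    With p = 0.5 a second CPU gives at most about 1.33x; with p = 0.9, about
    1.82x. A single-threaded service gets nothing from the second CPU.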

    -----Original Message-----
    From: [email protected]
    [mailto:[email protected]]On Behalf Of Richard Hartman
    Sent: Friday, May 26, 2000 4:47 PM
    To: NT Developers Interest List
    Subject: [ntdev] Going Faster...!


  • Paul_Bunn Member Posts: 251
    Memory bandwidth may be an issue. I wrote a small program to test this and
    it is available at:
    ftp://ftp.ultrabac.com/pub/utils/bm_mem/x86/bm_mem.zip
    One of the tests stress-tests the ability to perform rapid context switches.
    I'm rather proud of the fact that this little program identified a bug in a
    beta of Win2K that caused a disastrous performance problem, which MS was
    able to fix prior to release (after blaming my code for the problem, of
    course!). The results it gives aren't much use by themselves, but they
    are excellent as an "index" for comparing machines against each
    other.
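    To give a rough idea of what a memory-bandwidth test measures, here is a
    sketch. It is NOT bm_mem's code, whose internals aren't described here.
    `copy_bandwidth_mb_s` is a hypothetical helper that times repeated `memcpy`
    calls over a buffer much larger than the L2 cache, using standard C
    `clock()` (processor time, adequate for one busy thread). Like bm_mem's
    output, the number is mainly useful as a relative index between machines:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Updated on every pass so the compiler cannot discard the copies. */
static volatile unsigned char copy_sink;

/* Hypothetical helper (not part of bm_mem): copies `bytes` bytes `passes`
   times and returns MB copied per second (each byte is read once and written
   once). Use a size well above the L2 cache, e.g. 32 MB, so the copy runs
   from main memory. Returns -1.0 on bad arguments or allocation failure,
   and 0.0 if clock() was too coarse to measure the run. */
double copy_bandwidth_mb_s(size_t bytes, int passes)
{
    if (bytes == 0 || passes <= 0)
        return -1.0;

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, bytes);   /* touch every page before timing starts */
    memset(dst, 0x00, bytes);

    clock_t start = clock();
    for (int i = 0; i < passes; i++) {
        src[(size_t)i % bytes] = (unsigned char)i;  /* make each pass distinct */
        memcpy(dst, src, bytes);
        copy_sink ^= dst[bytes - 1];                /* consume the result */
    }
    clock_t end = clock();

    free(src);
    free(dst);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes * (double)passes / (1024.0 * 1024.0) / seconds;
}
```

    Running the same call with the same size on each candidate machine gives
    comparable figures; the absolute value depends on compiler and C library.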

    You may find that the problem is the quantum NT's scheduler is giving you.
    One way around this might be to boost your execution priority to see if
    this helps (provided you're not concerned about the performance of other
    applications).

    You might also want to run VTune from Intel to see where the processors are
    spending their time.

    Regards,

    Paul Bunn, UltraBac.com, 425-644-6000
    Microsoft MVP - WindowsNT/2000
    http://www.ultrabac.com




  • OSR_Community_User Member Posts: 110,217
    >Since we've factored out the disk and video, what's left? Any suggestions?

    Lots of things. For example, memory bandwidth and locality? PCI bus
    bandwidth? An 800 MHz processor can thrash the cache just as fast as a 900
    MHz processor.
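    One way to see the locality effect is to walk the same data in
    cache-friendly and cache-hostile order. A minimal sketch in C, assuming a
    square matrix of ints stored row-major; `sum_matrix` and `time_sum` are
    hypothetical names, and the matrix size is your choice (e.g. 2048 x 2048
    ints, about 16 MB, far beyond a 256K or 512K L2):

```c
#include <stddef.h>
#include <time.h>

/* Sums an n x n row-major matrix. by_rows != 0 walks consecutive addresses,
   using each cache line fully before moving on; by_rows == 0 walks down
   columns with a stride of n ints, so once the matrix outgrows the cache
   most accesses miss. */
long long sum_matrix(const int *m, size_t n, int by_rows)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += by_rows ? m[i * n + j] : m[j * n + i];
    return total;
}

/* Times one traversal with clock() and stores the sum in *out so the work
   cannot be optimized away. Returns elapsed processor seconds. */
double time_sum(const int *m, size_t n, int by_rows, long long *out)
{
    clock_t start = clock();
    *out = sum_matrix(m, n, by_rows);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Both orders return the same sum and execute the same instructions; on a
    large matrix the column walk is typically several times slower, a gap that
    a faster core clock does little to close.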

    If you run a profile with VTune does it show any hotspots? Like for cache
    misses?

    You also could find the Intel processor performance counter plugin for NT's
    performance monitor (resource kit???), and look at lots of processor
    internal performance measures.

    The processor support chipset may have some effect on things too. For
    example, the memory latency of a 440BX chipset is lower than the newer 820
    chipset running with SDRAM. The 820 was designed for RDRAM, so it needs
    a memory protocol translator device in the memory access path. Are your
    comparisons on IDENTICAL systems, except for processor clock speed?
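    Why a faster core can show zero gain: time spent waiting on DRAM is fixed
    in nanoseconds by the memory and chipset, not by the core clock. A
    back-of-the-envelope two-term model sketched in C; the function names and
    every workload number are hypothetical, chosen only to show the shape of
    the curve, and it ignores overlapping misses. Substitute counts measured
    with VTune or the processor performance counters:

```c
/* Hypothetical two-term model of run time:
   - core_cycles: work that scales with the core clock
   - cache_misses * miss_latency_ns: stall time set by DRAM/chipset latency,
     which does not shrink when the core clock rises. */
double model_runtime_s(double core_cycles, double clock_hz,
                       double cache_misses, double miss_latency_ns)
{
    return core_cycles / clock_hz + cache_misses * miss_latency_ns * 1e-9;
}

/* Speedup from running the same workload at f2_hz instead of f1_hz. */
double model_speedup(double core_cycles, double f1_hz, double f2_hz,
                     double cache_misses, double miss_latency_ns)
{
    return model_runtime_s(core_cycles, f1_hz, cache_misses, miss_latency_ns) /
           model_runtime_s(core_cycles, f2_hz, cache_misses, miss_latency_ns);
}
```

    With 1e9 core cycles, 5e7 misses and 150 ns per miss, 800 MHz gives
    1.25 s + 7.5 s = 8.75 s and 900 MHz about 8.61 s: a 12.5% faster clock
    buys under 2%. A lower-latency chipset shrinks the second term directly.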

    You might also be bumping into some device latency limitation. For example,
    a LAN device I once worked on took 100+ microseconds for the firmware to
    process a command. Even on an infinitely fast processor, LAN performance
    would not have changed much, because commands could only be processed at a
    firmware-limited rate.
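    The firmware example in numbers: if each command costs some host CPU time
    plus a fixed 100 microseconds in the device, and commands are handled one
    at a time, throughput can never exceed 1,000,000 / 100 = 10,000 commands
    per second. A sketch in C; `commands_per_second` is a hypothetical helper
    and the one-at-a-time (no overlap) assumption is ours:

```c
/* Commands per second when each command needs host_us of CPU time plus
   firmware_us of device processing, with no overlap between the two
   (one outstanding command at a time). */
double commands_per_second(double host_us, double firmware_us)
{
    return 1e6 / (host_us + firmware_us);
}
```

    With 20 us of host time the rate is about 8,333/s; halving host time to
    10 us (a much faster CPU) only raises it to about 9,091/s, and even zero
    host time stops at 10,000/s.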

    If your product is a PCI device, you might also want to use a PCI bus
    analyzer to see how efficient your bus transfers are. Seeing the actual
    processor bus activity takes a very expensive piece of equipment (maybe
    thousands of dollars a month to rent).
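    Bus efficiency can be put in numbers: conventional PCI moves at most one
    bus-width word per clock during data phases, so 32-bit/33 MHz PCI peaks at
    about 133 MB/s, and every clock spent on address phases, wait states or
    turnaround lowers the delivered figure. A rough sketch in C;
    `pci_effective_mb_s` is hypothetical and the overhead-clock count is an
    input you would read off an analyzer trace:

```c
/* Delivered PCI throughput for back-to-back bursts, in MB/s (10^6 bytes).
   bus_bytes:       4 for 32-bit PCI, 8 for 64-bit
   clock_mhz:       bus clock, e.g. 33.33
   data_phases:     data clocks per burst
   overhead_clocks: address phase, wait states, turnaround, etc. per burst */
double pci_effective_mb_s(double bus_bytes, double clock_mhz,
                          double data_phases, double overhead_clocks)
{
    double peak = bus_bytes * clock_mhz;  /* bytes/clock * 10^6 clocks/s */
    return peak * data_phases / (data_phases + overhead_clocks);
}
```

    For example, 4-byte bursts of 16 data phases with 4 clocks of overhead
    each deliver 80% of peak, which at 33 MHz is 4 x 33 x 0.8 = 105.6 MB/s. A
    single data phase with the same 4 clocks of overhead gets only 20%.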

    You could also be having an issue like your LAN device interrupting
    5000 times/sec and polluting the cache. Even if not much data is
    being transferred, the cache pollution could hurt. The faster Pentium
    IIIs have only 256K of faster L2 cache vs. 512K on slower processors. The
    Xeons can have heaps more L2 cache, at a hefty price. I think Xeons also
    don't come in core clock speeds as fast as the Pentium IIIs.
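    A rough way to size the interrupt-plus-cache-pollution cost: charge each
    interrupt its handler cycles plus the memory stalls spent later
    re-fetching the cache lines it evicted. A sketch in C; the function and
    every input value are hypothetical placeholders for measured numbers:

```c
/* Fraction of one CPU consumed by interrupts. handler_cycles scale with the
   core clock, while refilling each evicted cache line costs a fixed
   miss_latency_ns no matter how fast the core is. */
double interrupt_overhead_fraction(double interrupts_per_s, double handler_cycles,
                                   double evicted_lines, double miss_latency_ns,
                                   double clock_hz)
{
    double seconds_per_interrupt = handler_cycles / clock_hz +
                                   evicted_lines * miss_latency_ns * 1e-9;
    return interrupts_per_s * seconds_per_interrupt;
}
```

    With 5000 interrupts/s, 2000 handler cycles, 100 evicted lines and 150 ns
    per refill, the cost is 5000 x (2.5 us + 15 us) = 8.75% of the CPU at
    800 MHz and still about 8.6% at 900 MHz, because the refill term dominates.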

    My suggestion is to collect data, for example from the Intel performance counters.

    - Jan