Pentium 4 application design implications

I’ve been living with a Pentium 4 system for a little while now and am
noticing some differences that may have an impact on application design. My
box specifically has a P4 at 1.4 GHz, with 384 MB of PC800 RAMBUS RDRAM.

The biggest differences from a Pentium 3 machine so far are:

  1. addition of SSE2 instructions: basically 128-bit wide MMX plus
    extensive FP SIMD instructions (a minimal example follows this list)

  2. TREMENDOUSLY better memory bandwidth (finally): I measure about 650
    MBytes/sec of memory copy performance, where Pentium 3’s with SDRAM
    tended to be around 150 MBytes/sec. None of this was even done using
    Pentium 4 optimized code (a simple way to measure this is sketched
    below)

  3. a wider cache line (it’s not totally clear whether I should use 64 or
    128 bytes as the cache line width), which has implications for drivers
    and for SMP code that needs to align shared data to prevent cache
    thrashing (see the alignment sketch after this list)
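For point 1, here’s a minimal sketch of what SSE2 code looks like, using the
intrinsics from <emmintrin.h> that SSE2-capable compilers provide (whether
your particular compiler has them yet is an assumption on my part):

#include <emmintrin.h>  /* SSE2 intrinsics */

/* Add two arrays of doubles, two elements per instruction.
   Assumes n is even and the pointers are 16-byte aligned. */
void add_doubles_sse2(double *dst, const double *a,
                      const double *b, int n)
{
    int i;
    for (i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);           /* load 2 doubles */
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(dst + i, _mm_add_pd(va, vb)); /* packed FP add  */
    }
}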
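For point 2, my bandwidth numbers came from nothing fancier than timing a big
memcpy; something like the following (the 64 MB buffer size and pass count
are arbitrary choices, picked to defeat the caches, and error checks are
omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MB     (1024 * 1024)
#define BUF_MB 64   /* large enough that the caches don't help */

int main(void)
{
    char   *src = malloc(BUF_MB * MB);
    char   *dst = malloc(BUF_MB * MB);
    int     i, passes = 16;
    clock_t t0, t1;

    memset(src, 1, BUF_MB * MB);  /* touch the pages before timing */
    memset(dst, 1, BUF_MB * MB);

    t0 = clock();
    for (i = 0; i < passes; i++)
        memcpy(dst, src, BUF_MB * MB);
    t1 = clock();

    printf("%.0f MBytes/sec\n",
           passes * BUF_MB / ((double)(t1 - t0) / CLOCKS_PER_SEC));

    free(src);
    free(dst);
    return 0;
}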
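And for point 3, this is the kind of alignment I mean for SMP-shared data.
The struct is made up for illustration, and 128 bytes is the conservative
choice given the ambiguity above:

#define CACHE_LINE 128  /* conservative P4 figure; 32 bytes on a P3 */

/* Keep each CPU's hot data on its own cache line, so writes by one
   processor don't keep invalidating the line the other one is using. */
struct per_cpu_stats {
    unsigned long count;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

struct per_cpu_stats stats[2]; /* one slot per processor */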

My previous beliefs about how to achieve high disk I/O throughput may need
adjusting. I suspect BUFFERED file operations may now suddenly be much more
effective for fast I/O. In the past week I happened to be doing some video
editing, and noticed that rendering performance was significantly better
when my source and destination file streams were on different disks. When
they shared a disk, the slowdown seemed to be caused by disk head
thrashing. My belief is that the app, or perhaps the OS, should buffer
LARGE chunks, perhaps as much as a few megabytes. The memory bandwidth is
so high on the Pentium 4 that it’s going to be a big win to focus on
reducing disk head thrashing, even if you have to copy the data to
temporary buffers. In the past I would not have imagined that consolidating
128 kbyte data chunks by copying, and then doing unbuffered I/O, would be a
win; now I do (a sketch of the idea follows).
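To make that concrete, here is a rough sketch of the consolidate-then-write
scheme in plain Win32 calls. The 2 MB chunk size is an assumed figure, and
real code needs error paths and proper tail handling:

#include <windows.h>

#define CHUNK (2 * 1024 * 1024)  /* assumed: a couple of megabytes */

/* Copy src to dst in large unbuffered chunks. VirtualAlloc memory is
   page aligned, which satisfies FILE_FLAG_NO_BUFFERING's alignment
   requirement. Error handling omitted for brevity. */
BOOL BigChunkCopy(const char *src, const char *dst)
{
    HANDLE hSrc = CreateFile(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING,
                             FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                             NULL);
    HANDLE hDst = CreateFile(dst, GENERIC_WRITE, 0, NULL,
                             CREATE_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);
    BYTE  *buf  = VirtualAlloc(NULL, CHUNK, MEM_COMMIT, PAGE_READWRITE);
    DWORD  got;

    /* Each pass pays one seek per side instead of one per 64k. */
    while (ReadFile(hSrc, buf, CHUNK, &got, NULL) && got) {
        /* NOTE: unbuffered writes must be sector-size multiples; the
           final partial chunk needs rounding up plus SetEndOfFile, or
           a second buffered handle. Left out to keep the sketch short. */
        WriteFile(hDst, buf, got, &got, NULL);
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(hSrc);
    CloseHandle(hDst);
    return TRUE;
}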

Doing the math…

Assume a seek rate of about 100 seeks/sec and a transfer rate of 25
MBytes/sec (typical for the 75 GByte IBM 75GXP drives I have, for under
$400 no less). If you do alternating 64 kbyte unbuffered I/O, each 64k
chunk costs about 2,600 microseconds of transfer time plus 10,000 for the
seek, and since a copy both reads and writes every chunk, that cost is paid
twice, giving about 2.5 MBytes/sec of file copy performance. If we increase
the transfer size to 1 megabyte, we get about 50,000 microseconds per
transfer, for about 10 MBytes/sec of file copy performance. This isn’t
quite to the point of transfer-time saturation, but it’s getting close.
It’s also hard to say what impact the disk’s internal buffering has.
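If you want to play with the numbers yourself, the model fits in a few lines
of C (the constants are the assumed figures above, not measurements):

#include <stdio.h>

/* Model: each chunk of a copy costs one seek plus its transfer time,
   and every chunk is moved twice (read from source, write to dest)
   because both streams share one disk. */
static double copy_rate_mb_per_sec(double chunk_mb)
{
    const double seek_sec     = 1.0 / 100.0;  /* 10 ms per seek */
    const double transfer_mbs = 25.0;         /* sustained rate */
    double per_chunk = seek_sec + chunk_mb / transfer_mbs;
    return chunk_mb / (2.0 * per_chunk);      /* read + write   */
}

int main(void)
{
    printf("64 KB chunks: %.1f MB/s\n", copy_rate_mb_per_sec(0.0625)); /* ~2.5 */
    printf(" 1 MB chunks: %.1f MB/s\n", copy_rate_mb_per_sec(1.0));    /* ~10  */
    return 0;
}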

This strongly suggests that anybody who wants fast streaming disk I/O had
better be doing 1 megabyte+ transfers, even if they have to copy everything
to a temp buffer. Keep in mind these are total back-of-the-envelope
calculations, and it’s early in the morning when my brain is not always
functioning correctly.

Running a real test on my system here… If I drag and drop a large file (a
gigabyte) in file explorer to initiate a copy, performance monitor says
transfers are 64 kbytes or less, and total disk throughput is only a few
megabytes/sec (which is about right given those transfer sizes). This
suggests my back-of-the-envelope calculations may not be too far off.

I’d say it’s a bit annoying to see the standard OS tool for copying files
getting performance that’s almost HALF AN ORDER OF MAGNITUDE less than my
hardware should be capable of. I also noticed that for files larger than 4
GB, the time-to-completion estimate in file explorer is totally broken: it
counts down rapidly, then wraps around to some huge number. This is all on
Win2000 SP1 with the latest updates.

Dear Microsoft: do your developers actually USE Win2000 at all? You claim
to want to compete with serious workstations, yet without looking very
hard I can find horrible bugs like this. I can’t believe it would take
more than a programmer day or two to change the file explorer buffer size
from 64 KBytes to 2 MBytes (tuned dynamically based on available memory)
and see a gigantic improvement in copy performance. I personally spent a
few hours last week watching file copies slowly happen.

  • Jan


> take more than a programmer day or two to change the file explorer buffer
> size from 64 KBytes to 2 MBytes (tuned dynamically based on available
> memory)
It would not be so easy. Explorer’s copy uses:

  • memory-mapping the source
  • writing the destination from this memory-mapped source
    (I dunno whether the destination writes are cached - I hope not)

So, 64KB is the inpage cluster size - and I expect the lazy writer and the
modified page writer (MPW) to have some cluster size too, maybe 64KB as
well. So this is the kernel’s problem, not Explorer’s.
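For reference, the scheme I mean looks roughly like this - a sketch, not
Explorer’s actual source, and it assumes a file small enough for one view,
with error handling omitted:

#include <windows.h>

/* Copy by mapping the source and writing the view out. Source reads
   arrive as page faults, clustered by the memory manager (the 64KB
   inpage clusters above), so the copier never gets to pick a larger
   transfer size. Files over 4 GB would need a windowed view loop. */
BOOL MappedCopy(const char *src, const char *dst)
{
    HANDLE hSrc = CreateFile(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, 0, NULL);
    DWORD  size = GetFileSize(hSrc, NULL);
    HANDLE hMap = CreateFileMapping(hSrc, NULL, PAGE_READONLY, 0, 0, NULL);
    BYTE  *view = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    HANDLE hDst = CreateFile(dst, GENERIC_WRITE, 0, NULL,
                             CREATE_ALWAYS, 0, NULL);
    DWORD  wrote;

    WriteFile(hDst, view, size, &wrote, NULL); /* faults in the source */

    UnmapViewOfFile(view);
    CloseHandle(hMap);
    CloseHandle(hSrc);
    CloseHandle(hDst);
    return wrote == size;
}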

Max

