This statement is STILL too strong. Here are two distinct cases that I
would consider to obey WRITE_THROUGH and yet still allow a disk
controller to reorder writes:
- NTFS writes I/O operations in “clusters” that consist of one (or more)
disk sectors. A single cluster might involve I/O to disjoint regions on
the disk (sector sparing or some bizarre striping implementation).
There is *no* ordering constraint on the writes within that region.
Essentially, then, the “WRITE_THROUGH” request (which is nothing more
than a request to the disk and/or controller to disable write-back
caching for that particular request) merely says “tell me when the I/O is
really committed to disk”. Nothing within the operation requires that
the sectors be written in any particular order (I’m not sure how NTFS
handles this issue, but we used to compute a checksum over log records
to detect sector-write failures on replay - see the sketch after this
list).
- Distinct I/O operations to distinct regions of the disk may be
interleaved. There is no “atomicity” or ordering with respect to two
different I/O operations to the same disk. Combining that with the
previous example, there’s no ordering relative to sector-level writes
split between two different I/O operations.
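To make the checksum idea concrete, here is a minimal sketch of the
general approach (the record layout and the CRC-32 polynomial are just
for illustration - this is not what we actually shipped):

/* Sketch only: per-record checksum used to detect torn (partially
 * written) log records on recovery.  Layout and polynomial are
 * illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>

typedef struct _LOG_RECORD {
    uint32_t Length;      /* bytes of payload that follow the header */
    uint32_t Checksum;    /* CRC-32 over the payload                 */
    uint8_t  Payload[];   /* variable-length before/after images     */
} LOG_RECORD;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t Crc32(const uint8_t *Data, size_t Length)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < Length; i++) {
        crc ^= Data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A record whose checksum does not match was only partially committed
 * (for example, its sectors hit the media out of order before a crash)
 * and marks the end of the usable log on replay. */
static int LogRecordIsValid(const LOG_RECORD *Record)
{
    return Crc32(Record->Payload, Record->Length) == Record->Checksum;
}

On replay you scan forward through the log and stop at the first record
that fails this check; everything before it is known good regardless of
the order in which its individual sectors were written.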
While Ravi’s point is correct - my comments about NTFS were overly
strong (think of it as disabling write-back caching on the data IT cares
about, rather than ALL data on the drive) - the underlying point
remains: any journaling system needs to know that once its write to disk
has been acknowledged, it will never go away. There is no other way to
ensure inter-disk operations remain consistent with one another. That
shouldn’t be an issue for NTFS (where, as I understand it, the journal
is part of the NTFS file system) but it is an issue if you store the
journal on a different disk drive (witness IBM’s AIX, for example, where
JFS used a separate logical volume to store the journal for a file
system).
Of course, this does not equate to requiring that the data be “on disk”,
merely that it be persistent. A disk (or controller) with NVRAM can
provide blazingly fast performance by lying about such I/O operations -
but it guarantees the data will eventually be written back to disk.
That’s sufficient for our purposes.
Regards,
Tony
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ravisankar Pudipeddi
Sent: Tuesday, November 02, 2004 1:16 PM
To: ntdev redirect
Subject: RE: [ntdev] Ordering of I/O requests
Disk controllers that reorder writes when you have specified
WRITE_THROUGH and synchronously waited for a request to complete before
sending down another are buggy. Or they are lying so as to get a boost
in performance at the expense of correctness. You will see chkdsk
running on those machines. Caching in the controller is one thing, but
failing to preserve write ordering when WRITE_THROUGH is supplied is
something else altogether.
Actually, NTFS doesn’t disable write-back caching - because it assumes
ordering is never broken, it uses WRITE_THROUGH.
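For reference, the user-mode analogue looks roughly like this - a sketch
only, with a made-up path and the error handling trimmed:

/* Sketch: a write-through write on a synchronous handle.  The path is
 * hypothetical; error handling is trimmed for brevity. */
#include <windows.h>

BOOL WriteLogBlockThrough(const void *Buffer, DWORD Length)
{
    HANDLE h = CreateFileW(L"D:\\journal.log",
                           GENERIC_WRITE, 0, NULL, OPEN_ALWAYS,
                           /* don't complete the request until the device
                              reports the data as committed */
                           FILE_FLAG_WRITE_THROUGH | FILE_ATTRIBUTE_NORMAL,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    DWORD written = 0;
    /* Synchronous handle: WriteFile does not return until the request
       completes, so the next write is not sent down until this one has
       been acknowledged. */
    BOOL ok = WriteFile(h, Buffer, Length, &written, NULL);

    CloseHandle(h);
    return ok && written == Length;
}

Adding FILE_FLAG_NO_BUFFERING would also bypass the system cache, but
that is a separate issue from write-through.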
Ravi
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tony Mason
Sent: Tuesday, November 02, 2004 9:35 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Ordering of I/O requests
I have a very file-systems-centric view here - and 15 years ago I was
developing the transaction/journaling components of a journaling file
system, so I’ve been over this territory before.
The general rule is: nobody cares unless they explicitly ask. In other
words, if you don’t ask, the underlying system will assume that you
don’t care about the ordering of specific operations.
For a journaling file system, we have the same problem as a database.
In my personal example from so many years ago, we used an old value/new
value journal, which means that we stored the data BEFORE the change and
the data AFTER the change. We would then periodically write a block of
such change records out to disk.
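Roughly, each change record carried both images - something like the
sketch below (the field names and sizes are made up for illustration;
this is not the actual on-disk format we used):

/* Illustrative old value/new value change record.  Field names and
 * sizes are assumptions for the example only. */
#include <stdint.h>

typedef struct _CHANGE_RECORD {
    uint64_t TransactionId;    /* transaction this change belongs to */
    uint64_t TargetOffset;     /* where on disk the change applies   */
    uint32_t DataLength;       /* valid bytes in each image          */
    uint8_t  OldValue[512];    /* before image: used to undo/abort   */
    uint8_t  NewValue[512];    /* after image: used to redo/recover  */
} CHANGE_RECORD;

Recovery can then roll a transaction forward by applying the new values
or roll it back by restoring the old values, whichever the log says is
appropriate.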
There are ordering constraints here - we must ensure that pieces of the
log are committed (permanently recorded on disk) before the
corresponding changes to metadata are written out to disk. Thus, to
guarantee our transactional semantics, we needed to ensure that any
hardware-level out-of-order caching was disabled. Otherwise, we
couldn’t guarantee correct recovery.
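In user-mode Win32 terms the ordering rule boils down to the following
sketch; FlushFileBuffers stands in here for whatever “force the log”
primitive the journaling code actually uses, and the function itself is
just an illustration:

/* Sketch of the write-ahead ordering rule: the log block must be durably
 * on disk before the metadata it describes is allowed out. */
#include <windows.h>

BOOL CommitChanges(HANDLE LogHandle, HANDLE MetadataHandle,
                   const BYTE *LogBlock, DWORD LogLength,
                   const BYTE *Metadata, DWORD MetadataLength)
{
    DWORD written;

    /* 1. Write the block of change records and force it to the media. */
    if (!WriteFile(LogHandle, LogBlock, LogLength, &written, NULL) ||
        written != LogLength)
        return FALSE;
    if (!FlushFileBuffers(LogHandle))
        return FALSE;

    /* 2. Only now may the corresponding metadata be written.  A crash
          before this point is recoverable by replaying (or undoing)
          the logged changes. */
    return WriteFile(MetadataHandle, Metadata, MetadataLength, &written, NULL)
        && written == MetadataLength;
}

If the controller quietly reorders or defers the flushed log write, step
2 can reach the disk first and recovery is no longer guaranteed to be
correct - which is the whole point of this discussion.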
NTFS does exactly the same thing - it explicitly requests that
write-back caching be DISABLED in the disk controller and on the disk
drive itself (see IOCTL_DISK_SET_CACHE_INFORMATION and
IOCTL_DISK_SET_CACHE_SETTING, for example; no doubt there are other
mechanisms floating around here as well).
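From user mode, the same request can be issued with DeviceIoControl - a
sketch only (the drive path is an example, and administrative rights are
required to open a physical drive like this):

/* Sketch: disable the drive's write-back cache via
 * IOCTL_DISK_SET_CACHE_INFORMATION.  Drive path is an example only. */
#include <windows.h>
#include <winioctl.h>

BOOL DisableWriteBackCache(void)
{
    HANDLE disk = CreateFileW(L"\\\\.\\PhysicalDrive0",
                              GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE,
                              NULL, OPEN_EXISTING, 0, NULL);
    if (disk == INVALID_HANDLE_VALUE)
        return FALSE;

    DISK_CACHE_INFORMATION cacheInfo;
    DWORD bytes;
    BOOL ok = DeviceIoControl(disk, IOCTL_DISK_GET_CACHE_INFORMATION,
                              NULL, 0, &cacheInfo, sizeof(cacheInfo),
                              &bytes, NULL);
    if (ok) {
        cacheInfo.WriteCacheEnabled = FALSE;  /* turn off write-back caching */
        ok = DeviceIoControl(disk, IOCTL_DISK_SET_CACHE_INFORMATION,
                             &cacheInfo, sizeof(cacheInfo), NULL, 0,
                             &bytes, NULL);
    }

    CloseHandle(disk);
    return ok;
}

Whether a given drive or RAID controller actually honors the request is,
again, up to the hardware.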
If you want to implement caching, the only safe transparent way to do
this (from a single-disk transactional perspective) is to guarantee no
reordering of operations. But databases where the log is on one disk
and the database on another will NOT be happy if they find out that data
they were told had committed to the log didn’t, while the updates (which
must now be aborted) were written to the 2nd disk drive. The only safe
way to do this is to disable caching on both drives, so that once data
has been acknowledged back, we know that it has been written out to
disk.
A quick search also turns up yet more information. For example
http://www.storagereview.com/guide2000/ref/hdd/if/scsi/protCQR.html
discusses tagged command queuing (and this DOES improve performance).
There is a wealth of information about this topic floating around. This
just demonstrates that systems DO reorder disk operations.
Regards,
Tony
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Programmers Society
Prokash Sinha
Sent: Tuesday, November 02, 2004 12:12 PM
To: ntdev redirect
Subject: RE: [ntdev] Ordering of I/O requests
Now again I’m lost. Just out of curiosity: if a disk controller does
reorder writes and reads, and cannot feed the upper interface layer with
the most current data (transaction semantics), then I would assume it is
nothing but a brick. So the question becomes: when would it really be
necessary to turn off disk caching, assuming that the caching (say, at
the sector level) is implemented correctly in the disk firmware?
-pro