NTFS fault tolerant?

Here are some questions for NTFS gurus. We store information in a database of
our own format. From time to time we need to update critical structures, one
at a time. Each structure is small, about 200-300 bytes. We use the WriteFile
routine followed by the FlushFileBuffers routine (the file is opened for
buffered I/O). We really, really want to be tolerant of surprise shutdowns
(hardware resets and power losses). So, here are the questions:

  1. Can we say that at any given time we have either the old version of the
    information or the new one? Or can it be updated partially? As far as I
    understand, if the piece of data crosses page boundaries, we cannot
    guarantee that.

  2. Is it possible that data near the updated location becomes corrupted? We
    noticed that data before and/or after the written block sometimes becomes
    corrupted if power is lost.

  3. Can anyone give some advice on how we can achieve our goal? Does anyone
    know how enterprise databases guarantee consistency?

–htfv

  1. Write the new content of the structure to some log file.
  2. Mark the log as valid.
  3. Write to the main file.
  4. After the main-file write completes, truncate the log.

This must be the ONLY way you write your structure to the disk.
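A minimal sketch of that sequence in Win32 C, under assumptions not in the
original post: the log's first byte serves as the valid flag, WriteDurable and
UpdateStructure are invented names, and a real log record would also carry the
target offset, length, and a checksum. The FlushFileBuffers call after each
step is what enforces the ordering:

#include <windows.h>

/* Sketch of the log-then-write sequence above. Error handling abbreviated. */

static BOOL WriteDurable(HANDLE h, LONGLONG offset,
                         const void *buf, DWORD len)
{
    LARGE_INTEGER li;
    DWORD written;
    li.QuadPart = offset;
    if (!SetFilePointerEx(h, li, NULL, FILE_BEGIN)) return FALSE;
    if (!WriteFile(h, buf, len, &written, NULL) || written != len)
        return FALSE;
    return FlushFileBuffers(h);  /* force it to disk before the next step */
}

BOOL UpdateStructure(HANDLE hMain, HANDLE hLog,
                     LONGLONG offset, const void *rec, DWORD len)
{
    const BYTE valid = 1;

    /* 1. Write the new content of the structure to the log file. */
    if (!WriteDurable(hLog, sizeof(valid), rec, len)) return FALSE;

    /* 2. Mark the log as valid. */
    if (!WriteDurable(hLog, 0, &valid, sizeof(valid))) return FALSE;

    /* 3. Write to the main file. */
    if (!WriteDurable(hMain, offset, rec, len)) return FALSE;

    /* 4. Truncate the log so recovery knows the update completed. */
    if (SetFilePointer(hLog, 0, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
        return FALSE;
    return SetEndOfFile(hLog) && FlushFileBuffers(hLog);
}

On startup, if the valid flag is set, recovery replays the logged record into
the main file and then truncates the log.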

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

----- Original Message -----
From: “Alexey Logachyov”
Newsgroups: ntfsd
To: “Windows File Systems Devs Interest List”
Sent: Wednesday, January 19, 2005 6:09 PM
Subject: [ntfsd] NTFS fault tolerant?


Is it possible that some data bordering the updated structure gets corrupted?
Let's imagine the following situation:

  1. Write the new content to the log file. The content is smaller than the
    allocation unit size (or whatever it is called).
  2. Mark the log as valid.
  3. Since the data structure is small, some surrounding data is first read
    from the disk (by the VMM).
  4. Write the new content to the main file. The written block is larger than
    the updated structure.
  5. At this moment the computer loses power.
  6. When we start again, we first check the log file. If data is found in the
    log file, we conclude that the program terminated unexpectedly, so we try
    to copy the log-file content to the main file. But we don't have the data
    that was in the main file near the updated structure.

It seems we have to read a piece of data from the main file first, update it
in memory, and then write it to the log file, paying attention to all the
alignment details.
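A sketch of that read-modify-write under stated assumptions (a 512-byte sector
size hard-coded here; BuildLogBlock is an invented name; real code would query
the sector size and the caller must supply a block buffer of at least
*alignedLen bytes):

#include <windows.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumption; query it in real code */

BOOL BuildLogBlock(HANDLE hMain, LONGLONG structOffset, DWORD structLen,
                   const void *newData, BYTE *block,
                   LONGLONG *alignedOffset, DWORD *alignedLen)
{
    LARGE_INTEGER li;
    DWORD got;

    /* Expand [structOffset, structOffset+structLen) to whole sectors. */
    *alignedOffset = structOffset - (structOffset % SECTOR_SIZE);
    *alignedLen = (DWORD)(((structOffset + structLen - *alignedOffset
                            + SECTOR_SIZE - 1) / SECTOR_SIZE) * SECTOR_SIZE);

    /* Read the old surrounding data so it is preserved in the log. */
    li.QuadPart = *alignedOffset;
    if (!SetFilePointerEx(hMain, li, NULL, FILE_BEGIN)) return FALSE;
    if (!ReadFile(hMain, block, *alignedLen, &got, NULL) || got < *alignedLen)
        return FALSE;

    /* Patch only the structure; the neighbouring bytes stay intact. */
    memcpy(block + (structOffset - *alignedOffset), newData, structLen);
    return TRUE;
}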

…I hope I'm making myself clear.

–htfv

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntfsd…

First of all, use non-cached I/O.
Second, have a look at a database design 101 resource.

Cheers
Lyndon

“Alexey Logachyov” wrote in message news:xxxxx@ntfsd…

Please explain how the use of non-cached I/O will help. This is important to
us.

Second, we cannot change the format of the database. Our database is
specialized for our needs. It has nothing to do with tables, relations,
primary keys, and all that kind of stuff. What we need is some advice on how
to store data so that we don't lose or corrupt it on disk in case of
unexpected termination.

–htfv

“Lyndon J Clarke” wrote in message
news:xxxxx@ntfsd…

I'm by no means an authority on this… but it stands to reason that
non-cached I/O means data is written directly to disk. In the event of a
crash, the data isn't sitting in a cache somewhere waiting to be written out.
Of course, power loss partway through a write would still cause trouble…

As a simple example: when developing software, if I'm using logs I'll write
them non-cached, so that if my application or driver crashes, the log is
'complete' up to that point. I have used cached writes in the past, and yes,
sometimes additional random data is written to the file.

Always writing directly to disk can cause problems with performance and
timing, so in your situation perhaps reopening the file without caching for
these critical updates would be better.

As for avoiding data loss from 'random shutdowns', whether through power loss
or something else, I'm not sure what to suggest beyond the obvious point that
cached data that hasn't been flushed yet is likely to be lost.

Off the top of my head I have a few ideas, like writing the new data to a
spare block first and switching over to the updated area once the write is
verified as correct. A rough sketch of that idea follows.
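One way that might look, purely as an illustration (the slot layout, Sum32,
and PickCurrent are all invented here): keep two fixed slots, always overwrite
the older one, and on startup take the newest slot whose checksum verifies.

#include <windows.h>

/* Two fixed slots for the structure; every write goes to the OLDER
   slot, so a torn write can only damage the copy being replaced. */
typedef struct _SLOT {
    DWORD Sequence;   /* higher value wins                 */
    DWORD Checksum;   /* over Data, to detect a torn write */
    BYTE  Data[300];  /* the 200-300 byte structure        */
} SLOT;

/* Any robust checksum works; a trivial placeholder is used here. */
static DWORD Sum32(const BYTE *p, DWORD n)
{
    DWORD s = 0;
    while (n--) s = s * 31 + *p++;
    return s;
}

/* On startup, pick the newest slot whose checksum verifies. */
const SLOT *PickCurrent(const SLOT *a, const SLOT *b)
{
    BOOL aOk = (a->Checksum == Sum32(a->Data, sizeof(a->Data)));
    BOOL bOk = (b->Checksum == Sum32(b->Data, sizeof(b->Data)));
    if (aOk && bOk) return (a->Sequence > b->Sequence) ? a : b;
    if (aOk) return a;
    if (bOk) return b;
    return NULL;  /* both torn: should be impossible if slots alternate */
}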

I’m sure since this is such a critical area someone here can provide an
‘accepted’ method or point you at a resource.

BR,

Rob Linegar
Software Engineer
Data Encryption Systems Limited
www.des.co.uk | www.deslock.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexey Logachyov
Sent: 20 January 2005 10:17
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] NTFS fault tolerant?


The issue of reliability deals with the classes of failures against
which you are trying to protect. For example, transactional systems are
attempting to protect against a category of errors related to premature
termination of the system. File systems often also try to include
additional information to protect against physical damage to the
underlying storage (which transactional systems do NOT protect against!).

When you issue a cached I/O operation, the data is written to a location
in memory and asynchronously written back to disk - you have no
guarantees as to when it has been successfully written back.

When you issue a non-cached I/O operation, the data is written to a
persistent location. Once that I/O has completed, you are guaranteed
that even if the system halts prematurely (such as a spontaneous
reboot), the data can be retrieved. Thus, transactional systems
generally use a log to record the change about to be made, and then
make the change. If the system crashes AFTER the log is written but
BEFORE the data is written, the data is rewritten during recovery from
the premature failure.

In transactional systems I’ve written previously, we normally checksum
the records within the log to protect against partial-write errors (in
other words, where only part of the record was written, not the entire
record) that might occur when the system fails. Within the file system,
we also duplicated some critical information in order to make recovery
from media failures (bad sectors, etc) simpler. In distributed systems,
we often would keep critical information stored in multiple replicas -
so that even if one computer “ceased to exist” we would still be able to
recover from the problems.
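As an illustration of such a checksummed log record (the layout is generic,
not taken from any particular system; the CRC-32 shown is the standard
reflected variant):

#include <windows.h>

/* On recovery, a record whose stored checksum does not match the
   recomputed one is treated as never having been written. */
typedef struct _LOG_RECORD {
    DWORD    Checksum;     /* CRC over everything after this field  */
    DWORD    Length;       /* payload length in bytes               */
    LONGLONG TargetOffset; /* where the payload belongs in the file */
    BYTE     Payload[1];   /* Length bytes follow                   */
} LOG_RECORD;

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
DWORD Crc32(const BYTE *p, DWORD n)
{
    DWORD crc = 0xFFFFFFFF;
    while (n--) {
        DWORD i;
        crc ^= *p++;
        for (i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320 & (0 - (crc & 1)));
    }
    return ~crc;
}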

Thus, I suggest the first thing you do is try to identify the categories
of failures from which you need to protect yourself and then devise
strategies for protecting against those failures. For example you might
have a list like:

(1) Failure due to power failure
(2) Failure due to shutdown while work is in-progress
(3) Failure due to bugcheck
(4) Failure due to destruction of the data storage unit.

Then your solutions might be:

(1) A logging (transactional) system in which the integrity of the log
is protected using checksums.
(2) & (3) A logging system allowing recovery upon reboot
(4) Periodic off-site backups of the data (implemented by documenting
this for users)

I hope this makes sense. I can say that in my experience, building
resilient systems capable of handling a broad range of failures is quite
a bit more difficult than implementing the basic functionality.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

Looking forward to seeing you at the Next OSR File Systems Class April
4, 2005 in Boston!

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexey Logachyov
Sent: Thursday, January 20, 2005 5:17 AM
To: ntfsd redirect
Subject: Re:[ntfsd] NTFS fault tolerant?


We already use the method you describe off the top of your head for large
blocks of data: we write data to spare blocks, but to switch over to the
updated area we still need to update the structure I referred to.

Even if we use non-cached I/O, we will have to do caching of one kind or
another. As I said, our structure is smaller than an allocation unit (the
sector size), so we have to load the whole allocation unit first. Maybe
specifying FILE_FLAG_WRITE_THROUGH without FILE_FLAG_NO_BUFFERING would help,
but I need to investigate.

As for cached I/O, we currently use WriteFile followed by FlushFileBuffers,
so in effect this should be the same as using non-cached I/O. Am I wrong?
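For reference, a sketch of how those two flag choices look at open time (the
file name and handle names are illustrative). FILE_FLAG_NO_BUFFERING is what
imposes the sector-alignment requirements; FILE_FLAG_WRITE_THROUGH alone does
not:

#include <windows.h>

/* Sketch of the two open modes discussed; "db.dat" is illustrative. */
void OpenVariants(void)
{
    /* FILE_FLAG_WRITE_THROUGH alone: writes still go through the
       cache (so reads stay cheap), but WriteFile does not complete
       until the data has been sent to the device. No alignment
       restrictions, so small unaligned updates still work. */
    HANDLE hWt = CreateFileA("db.dat", GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, NULL);

    /* Adding FILE_FLAG_NO_BUFFERING bypasses the cache entirely, but
       file offsets, transfer sizes, and buffer addresses must all be
       multiples of the sector size - hence the explicit
       read-modify-write of a whole sector for a 200-300 byte update. */
    HANDLE hNb = CreateFileA("db.dat", GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING |
        FILE_FLAG_WRITE_THROUGH, NULL);

    if (hWt != INVALID_HANDLE_VALUE) CloseHandle(hWt);
    if (hNb != INVALID_HANDLE_VALUE) CloseHandle(hNb);
}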

“Rob Linegar” wrote in message news:xxxxx@ntfsd…


This may be a bit off topic here, but on a related note: do any of you know
whether most disk devices today guarantee atomicity of sector writes (i.e., a
sector of data can never be half written; it's either written completely or
not at all)?

Matt

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tony Mason
Sent: Thursday, January 20, 2005 12:17 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] NTFS fault tolerant?
