Sudden powerdown disk corruption

A rather ordinary IBM Selectric. I arrived for a meeting about five
minutes after it happened.
joe

That must have been SOME typewriter.

Peter
OSR



Thanks.

>20000V 15 picofarad

Holds 0.003 J, which is about as much energy as a quarter dropped from 10
cm.
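A quick sanity check of that comparison, as a minimal C sketch; the quarter's mass (about 5.67 g) and g = 9.81 m/s^2 are my assumed numbers, not from the thread:

    /* E = 1/2 * C * V^2 for the capacitor, E = m * g * h for the falling
     * quarter. */
    #include <stdio.h>

    int main(void)
    {
        double C = 15e-12;        /* 15 pF                        */
        double V = 20000.0;       /* 20 kV                        */
        double e_cap = 0.5 * C * V * V;

        double m = 0.00567;       /* US quarter, ~5.67 g (assumed) */
        double g = 9.81;          /* m/s^2                         */
        double h = 0.10;          /* 10 cm                         */
        double e_quarter = m * g * h;

        printf("capacitor: %.4f J\n", e_cap);      /* ~0.003 J */
        printf("quarter:   %.4f J\n", e_quarter);  /* ~0.006 J */
        return 0;
    }

Same order of magnitude, so the comparison holds up.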



There was a time when selenium-based rectifiers were used. When they burned out, they raised a horrible stink.

…and the best part about The Good Old Days is that they are in the past!

My father knew the local TV repairman, and from age 9 to about 13 my source
for resistors and capacitors was whatever I could salvage from these
devices.
joe.




> 1-Farad capacitors for cars are now for sale in lots of consumer electronic
> shops to power the car audio for a long time after the ignition is off.

Energy density in the best capacitors is much less than the energy density in
the lousiest batteries.

Consider a 1 F capacitor charged to 20 V: that's only 200 J. Now take a
low-quality 1.5 V, 1 Ah AA battery: that's 5400 J.
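For anyone who wants to replay the arithmetic, here is a minimal C sketch of the two figures above (E = 1/2 * C * V^2 for the capacitor, V * Ah * 3600 for the battery); the numbers are just the ones already quoted in the post:

    #include <stdio.h>

    int main(void)
    {
        double e_cap  = 0.5 * 1.0 * 20.0 * 20.0;  /* 1 F at 20 V   -> 200 J  */
        double e_batt = 1.5 * 1.0 * 3600.0;       /* 1.5 V, 1 Ah AA -> 5400 J */

        printf("capacitor: %.0f J\n", e_cap);
        printf("battery:   %.0f J\n", e_batt);
        return 0;
    }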

The question now is: why bother with a capacitor if you can just leech off
the main car battery?

Isolation. The huge voltage spikes seen at the battery (both positive and
negative) are the problem. But a relatively simple backup battery which
is used to power the circuit in question, and not used for anything else,
should work nicely.
joe



> My father knew the local TV repairman, and from age 9 to about 13 my source
> for resistors and capacitors was whatever I could salvage from these
> devices.

My father was an electronics engineer who designed measuring equipment, and I was getting BRAND NEW resistors and capacitors and transistors! {:^{P

I don’t think I saw a new resistor until I built a multimeter from a kit
at age 12. And it was a couple more years before I could afford to buy my
own…

Those three 450V can capacitors were salvaged from TV sets.
joe




> Having read this great thread, couldn’t you solve your core problem by having a read-only boot disk? You could then guarantee that you could get to the chkdsk recovery of your writable media, which you say is more or less acceptable. No capacitors, though.

Hi Mark. I thought of this idea early on, and it is still a possibility. So far I have three examples, which is a fairly small universe. In all cases a chkdsk recovered the file system, but in one case a file had disappeared and I needed to put it back manually.

So I can’t really say that I can be sure (yet) that chkdsk will always be sufficient. It will certainly be better than the current state, and that might be “good enough”. It is under consideration as a possible ‘solution’.

Thanks,
Loren

Generally, the problem with hard drives is that they have moving parts,
and things with moving parts break. But a small boot disk could be
powered down after the boot, so it would only be spun up on recovery
events. A capacitor or battery would still be useful for short outages; I
have four 1500VA backup supplies, and their major contribution is in
dealing with the 5-second outages (there’s a reason our local power
company is affectionately[?] referred to as “Duquesne Flicker & Flash”).
joe




What you do care about is that any write in progress during a power failure
could damage any other arbitrary data that is stored on the device. This is
a consequence of multi-level cell media design and cannot be mitigated by
any software protocol that assumes the orthogonality of write operations
with respect to other data. This is directly analogous to the head-crash
problem that others have mentioned (which has been effectively solved in HDD
hardware for some years), where any in-progress operation during a power
failure may affect the data in any sector on the media, thus destroying the
force unit access / committed-to-persistent-media semantic underpinning this
and all ‘transactionally consistent’ filesystems.

To the OP:

I believe that there exist SSDs that effectively implement the control
necessary to prevent arbitrary data corruption during power failure, but
expect them to be an order of magnitude more expensive than commodity
hardware. I suspect that a more practical solution would be simply to
prevent your systems from losing power by installing a UPS of some kind
between your power supply and your equipment. By simply reducing the
frequency with which your systems experience uncontrolled shutdown, you
reduce the probability of malfunction. And if you can reduce that frequency
to near zero, then you can reduce the frequency of malfunction caused by any
of these effects to near zero, too.

wrote in message news:xxxxx@ntdev…

> But what this means is you don’t actually know when the data is committed
> to the platter, and it isn’t always FIFO. This means tests like A == B and
> B != C are not guaranteed to work to tell you the commitment order.

Actually, I don’t really care about the commitment order - it is just of
zero importance here. What I DO care about is that when I am told that my
write request has been committed I want to be 100% sure it has been
actually committed to the persistent storage and will survive a reboot. I
am not going to write to C until I get a confirmation that my outstanding
write to B has been actually committed to the disk; I am not going to
write to B until I get a confirmation that my outstanding write to A has
been actually committed to the disk; and I am not going to write to A until
I get a confirmation that all my previous outstanding writes have been
committed. I know it does not sound particularly efficient, but this is just
a checkpoint event that does not happen that often.

Therefore, I don’t mind if the disk reorders my requests. However, I DO want
to be sure that I get informed about the events only after they have already
taken place. Otherwise, I just have no way to ensure data consistency in
case of power failure, no matter how elaborate the methods that I design
are…

Anton Bassov
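A user-mode sketch of the ordering Anton describes might look like the following. The file name, 4 KB block size, and three-block checkpoint are invented for illustration; FILE_FLAG_WRITE_THROUGH plus FlushFileBuffers only expresses the request for durability, and whether the device actually honours it is exactly the question this thread is debating.

    #include <windows.h>
    #include <stdio.h>

    /* Write one block and do not return until the stack reports it flushed. */
    static BOOL committed_write(HANDLE h, LONGLONG offset,
                                const void *buf, DWORD len)
    {
        LARGE_INTEGER pos;
        DWORD written;

        pos.QuadPart = offset;
        if (!SetFilePointerEx(h, pos, NULL, FILE_BEGIN))
            return FALSE;
        if (!WriteFile(h, buf, len, &written, NULL) || written != len)
            return FALSE;

        /* Belt and braces: ask the stack to flush anything it still holds. */
        return FlushFileBuffers(h);
    }

    int main(void)
    {
        /* FILE_FLAG_NO_BUFFERING requires sector-aligned buffers, offsets,
         * and lengths; "checkpoint.dat" is made up for the example.        */
        static __declspec(align(4096)) char a[4096];
        static __declspec(align(4096)) char b[4096];
        static __declspec(align(4096)) char c[4096];

        HANDLE h = CreateFileA("checkpoint.dat", GENERIC_WRITE, 0, NULL,
                               OPEN_ALWAYS,
                               FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING,
                               NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        /* B is not issued until A is reported durable, and C not until B. */
        if (committed_write(h, 0 * 4096, a, sizeof a) &&
            committed_write(h, 1 * 4096, b, sizeof b) &&
            committed_write(h, 2 * 4096, c, sizeof c))
            printf("checkpoint written in order\n");

        CloseHandle(h);
        return 0;
    }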

> What you do care about is that any write in progress during a power failure
> could damage any other arbitrary data that is stored on the device. This is
> a consequence of multi-level cell media design

This is true. You can check

http://www.eetimes.com/design/memory-design/4237059/Understanding-the-effects-of-power-failure-on-flash-based-SSDs

As you will see, according to their experiments, the retroactive data corruption effect can result in bit error rates as high as 25% for the first pages.

> and cannot be mitigated by any software protocol that assumes the
> orthogonality of write operations with respect to other data.

> thus destroying the force unit access / committed-to-persistent-media
> semantic underpinning this and all ‘transactionally consistent’ filesystems.

Well, this is already a bit of an exaggeration. Your concerns are perfectly valid, but this is what mirroring and checksumming done at the FS level are for. For example, ZFS was designed specifically with the assumption in mind that data (both user data and metadata) may eventually get corrupted due to bit rot…

Anton Bassov
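As a rough illustration of the mirror-plus-checksum idea, here is a minimal C sketch. This is not ZFS's actual on-disk format (ZFS keeps the checksum in the parent block pointer rather than next to the data it covers), and the Fletcher-style byte sum and fixed 4 KB block are simplifications of mine.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define BLOCK_SIZE 4096

    struct stored_block {
        uint8_t  data[BLOCK_SIZE];
        uint32_t checksum;
    };

    /* Simple Fletcher-style checksum over the block contents. */
    static uint32_t fletcher32(const uint8_t *p, size_t len)
    {
        uint32_t a = 0, b = 0;
        for (size_t i = 0; i < len; i++) {
            a = (a + p[i]) % 65535;
            b = (b + a)    % 65535;
        }
        return (b << 16) | a;
    }

    /* Return a pointer to a verified copy, or NULL if both mirrors are bad. */
    static const uint8_t *read_verified(const struct stored_block *copy0,
                                        const struct stored_block *copy1)
    {
        if (fletcher32(copy0->data, BLOCK_SIZE) == copy0->checksum)
            return copy0->data;
        if (fletcher32(copy1->data, BLOCK_SIZE) == copy1->checksum)
            return copy1->data;
        return NULL;   /* both copies damaged: detected, but not recoverable */
    }

    int main(void)
    {
        struct stored_block m0, m1;

        memset(m0.data, 0xAB, BLOCK_SIZE);
        m0.checksum = fletcher32(m0.data, BLOCK_SIZE);
        m1 = m0;

        m0.data[100] ^= 0xFF;              /* simulate corruption of copy 0 */

        const uint8_t *good = read_verified(&m0, &m1);
        printf(good ? "recovered from mirror\n" : "unrecoverable\n");
        return 0;
    }

The point of the sketch is only that corruption is detected and survived as long as at least one copy still verifies; it says nothing about preventing corruption of any particular sector, which is the distinction debated below.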

And what prevents the checksum from being corrupted because it happens to
share some physical connection to another sector with an in-progress write?

Adding checksums or other data integrity verification cannot prevent
corruption. It can increase the probability of detecting it (and possibly
of recovering from it), but if the media is unstable for any reason, then
any data integrity technique has a non-zero probability of failure.

Using multi-level cell technology increases this error rate because, unlike
conventional disks, where decades of hard-learned lessons have taught design
engineers the dangers and canonical mitigations for head crashes and other
electrical/physical failures, SSDs are new. Heretofore, SSD design issues
have primarily focused on speed and wear-leveling, but block level tests
using test harnesses are a long way from actual file system use in the
field. And based on the frequency with which IBM updates drive firmware for
certain product lines, one can infer that at least for that manufacturer,
SSD is not a mature technology yet.


> And what prevents the checksum from being corrupted because it happens to
> share some physical connection to another sector with an in-progress write?
> Adding checksums or other data integrity verification cannot prevent
> corruption.

Well, no one says that it can prevent corruption of any given sector, and no one says that it should even try to reach this goal, which, after all, is unrealistic in itself. Its purpose is totally different, don’t you think…

> It can increase the probability of detecting it (and possibly of recovering from it),

Exactly, and, as long as this goal is met, the filesystem as a whole
(including user data) is not corrupt, although data on certain sectors may be
destroyed or corrupted…

> but if the media is unstable for any reason, then any data integrity
> technique has a non-zero probability of failure.

First let’s decide what the term “failure” refers to - we seem to be speaking
about different things. I am speaking about ensuring the integrity of a
filesystem and user data (which, as long as more than one copy of any given
data sector is made by the FS, can be achieved), and you seem to be speaking
about preventing damage to a given sector, which is obviously an unrealistic
goal…

> Using multi-level cell technology increases this error rate because, unlike
> conventional disks, where decades of hard-learned lessons have taught design
> engineers the dangers and canonical mitigations for head crashes and other
> electrical/physical failures, SSDs are new.

You seem to be idealizing conventional rotating media. Although, compared to
SSD, it is indeed more stable, silent data corruption due to bit rot may be a
real-life problem rather than a merely theoretical one. As long as disk
capacities were relatively low, this issue was of almost zero practical
importance. However, given the growing storage capacities of modern disks,
this silent corruption becomes more and more of a concern, and ZFS was
designed specifically with this issue in mind…

Anton Bassov

I think the main point is that bit rot on magnetic media and corruption
caused by multi-level cell failures are sufficiently different that
techniques used to mitigate the consequences of one form of corruption are
unsuitable for mitigating the other. In the same way that a file system that
optimizes based on CHS addressing no longer optimizes anything, protecting
data with an algorithm based on the orthogonality of stored data is
unsuitable for an MLC SSD without some knowledge of its internal construction.

Note that the possibility that the whole file system can become corrupted
by any single operation was exactly my point when I said that it
undermines the basic principles of transactional file systems. This is
exactly the real-world problem the OP is having, and another file system
might be more or less prone to the problem, but when the hardware can’t
conform to the interface assumptions, only bad things can happen.


Wow, this is a very interesting thread!

Back to the original topic at hand, I do not think I would have gone with a full-fledged OS route for this sort of device/application. Perhaps a microcontroller-based solution would have been more resilient and easier to patch to deal with this sort of unexpected behavior on a hardware/electrical level. (A $5 voltage regulator and a $2 capacitor and you have 5 minutes of battery backup!)
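For what it's worth, the hold-up arithmetic for that kind of circuit is roughly t = C * (Vstart - Vmin) / Iload. The 1 F supercap, 5 V rail, 3.6 V regulator drop-out, and 10 mA load below are assumed numbers, picked only to show the shape of the calculation:

    #include <stdio.h>

    int main(void)
    {
        double C       = 1.0;    /* farads (supercap, assumed)           */
        double v_start = 5.0;    /* volts on the cap when power fails    */
        double v_min   = 3.6;    /* minimum input the regulator accepts  */
        double i_load  = 0.010;  /* amperes drawn by the microcontroller */

        double t = C * (v_start - v_min) / i_load;
        printf("hold-up time: %.0f s (about %.1f minutes)\n", t, t / 60.0);
        return 0;
    }

With those numbers you get a couple of minutes; a larger cap or a lighter load gets you to the five-minute ballpark mentioned above.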

I’m not saying this to criticize the original design decisions, but rather to adapt some working points in hopes of perhaps mitigating the problems you’re experiencing.

Microcontroller firmware is typically stored on read-only memory (well, EEPROM and the like, but read-only while running), and data is written to a secondary disk or device either as files or binary data, and flushed in real time.

A read-only boot disk has been brought up several times. I experimented with read-only boot drives for Windows in the past, and there are many issues that arise as the OS does not expect to be running on read-only medium - though I am beyond certain that there are many others on this list with more experience with solving this problem than myself. I do recall that it was possible to get it to work. However, perhaps the easiest would be some sort of “pseudo-write” solution where the actual data is read-only with a write buffer on top, so that Windows thinks it is being loaded from an RW device. I know some flash disks can simulate that sort of behavior when you trigger the write-lock on the media.

You’ve provided very little info about the configuration of the disk controllers in the OS. You have not mentioned whether or not kernel/driver-level write caching has been enabled, disabled, or even set to the max. I do believe this option was present even back in NT (and I have no clue if the OP means NT NT or another NT-based release).
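For reference, on an NT-based system one way to check that setting from user mode is to query the disk's cache information. A minimal sketch follows; \\.\PhysicalDrive0 is an example path, and whether this IOCTL is supported at all depends on the bus and driver stack, which for the OP's flash device is exactly the unknown.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0",
                               GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("open failed: %lu\n", GetLastError());
            return 1;
        }

        DISK_CACHE_INFORMATION info;
        DWORD bytes;
        if (DeviceIoControl(h, IOCTL_DISK_GET_CACHE_INFORMATION,
                            NULL, 0, &info, sizeof(info), &bytes, NULL)) {
            printf("write cache: %s\n",
                   info.WriteCacheEnabled ? "enabled" : "disabled");
        } else {
            printf("IOCTL not supported here: %lu\n", GetLastError());
        }

        CloseHandle(h);
        return 0;
    }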

A lot of the comments here are talking about the differences between SSDs and conventional magnetic media. I must point out that the original post says an “8 GB flash disk”, which I presume is actually more along the lines of an SD card than an SSD. Again, more info is needed - the original question is very light on specifications and details.

However, if it is indeed an SD card and not an SSD, then a lot of the assumptions made in the thread no longer apply. While the pseudo-addressing still applies, you no longer have firmware making highly-sophisticated read/write caches in its own memory (again, this would require unprovided info about the SD card controller, too!), and even if it were present, it’s nowhere near as sophisticated as that in a recent, say, Sandforce controller.

And if this is neither an SSD nor an SD but rather a USB stick - then that’s a whole ‘nother can of worms. You have to take into account the USB bus’ handling of power failure and this goes back to more unprovided specs. Is this a modern CPU w/ integrated USB in the chip itself, on the northbridge, or completely separate on the motherboard? And, of course, each USB flash disk is a beast of a different nature. You have everything from the crappiest Chinese knockoffs that can’t even guarantee proper read/write under normal circumstances that sell for basically the cost of the flash storage itself, to some really high performance units that have a significant amount of tech in the embedded controllers to provide high-quality data transmissions.

Then you have the drivers. If this is USB, then Windows NT (to the best of my knowledge!) needs 3rd-party drivers to provide that functionality. What’s going on behind the scenes there? The same applies if not USB - again, where does the controller physically reside, and how is it controlled?

In short, a very interesting question with too many things that could go wrong and not enough details to provide working solutions. I understand the original question was actually very different (can I use the “bad power” interrupt to save data?), but there are many lessons to be learned.