Sudden powerdown disk corruption

OK, not really my area of expertise anymore…

Is there some reason that putting [the equivalent] of a UPS between the
computer and the raw power is not an option?

It seems like the most obvious solution. Somebody must make a DC-DC
converter with instant auxiliary (battery) cut-over that could isolate you
from the vehicle power.

Weight does not seem to be a factor and you probably would not need a
terribly large battery.

I’ll go back under my rock now where I belong.

Good Luck,
Dave Cattley

[I will now be looking under every amusement ride for a PC]

> I think I’ve suggested that. Actually it needs to be solved in both arenas: I need to get a 'power
> good' line out of the main power supply that will go false some determinate time before power
> drops below some specific voltage, and then I need some software to do something with that
> power good signal. I’m negotiating with the vehicle vendor on getting the power good signal
> and the guarantee on how long it is good for (which will be not many milliseconds).
>
> Now, what can I do with the signal to make writes stop fairly quickly?

Actually, this approach is certainly not new. The problem that I mentioned in my previous post was recognized more than 30 years ago, and was addressed (IIRC, by IBM) by introducing a “power failure interrupt” that was meant to cancel all outstanding writes as fast as it could. However, I heard that it did not work as well as intended, which eventually led to the realization that this problem is better addressed by file systems designed with the possibility of power failure in mind.

I am afraid you need to find some file system other than NTFS. What you need here is a custom copy-on-write system that never overwrites the existing data. All updates that you make to such a FS are invisible (in fact, simply non-existent, in practical terms) until the moment you update the filesystem root node.

Let’s say you have 3 copies of this root node at locations A, B and C. Every time you update A, you update B and C as well. Then you can discover the valid state of the file system by examining these locations. There are the following cases (see the sketch after the list):

  1. A==B && B==C. Everything is OK.

  2. A==B && B!=C. The system definitely crashed after the updates of A and B had completed. Therefore, the state pointed to by A and B is valid.

  3. A!=B && B==C. The system crashed before the update of B had commenced, which means that the state pointed to by A may be invalid (i.e., the power failure may have occurred while A was being updated). Therefore, it is more reasonable to treat the state pointed to by B and C as the last known valid one.

  4. A!=B && B!=C. The system crashed while B was being updated. Therefore, the state pointed to by A is valid.
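
A minimal sketch of that recovery check, in C. The on-disk layout, the generation counter, and the helper names are assumptions made for illustration; a real implementation would also verify a per-copy checksum so a torn write can be detected, not just compared.

/* Sketch only: pick the last known-valid root among the three copies. */
#include <stdint.h>
#include <string.h>

typedef struct _ROOT_COPY {
    uint64_t Generation;        /* incremented on every checkpoint          */
    uint8_t  Payload[496];      /* pointers to allocation/file trees, etc.  */
    uint32_t Checksum;          /* covers Generation + Payload              */
} ROOT_COPY;

static int RootEqual(const ROOT_COPY *x, const ROOT_COPY *y)
{
    return x->Generation == y->Generation &&
           memcmp(x->Payload, y->Payload, sizeof(x->Payload)) == 0;
}

/* Returns the copy to mount from, following the four cases above. */
const ROOT_COPY *PickValidRoot(const ROOT_COPY *a, const ROOT_COPY *b,
                               const ROOT_COPY *c)
{
    int ab = RootEqual(a, b);
    int bc = RootEqual(b, c);

    if (ab && bc)  return a;    /* case 1: all three agree                  */
    if (ab && !bc) return a;    /* case 2: crash after A and B were written */
    if (!ab && bc) return b;    /* case 3: A may be torn, trust B and C     */
    return a;                   /* case 4: crash while B was being written  */
}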

I know that writing a custom FS under Windows is certainly not the easiest thing one could imagine,
but still I don’t see any other more or less reliable way of addressing your problem…

Anton Bassov

> I am afraid you need to find some file system other than NTFS. What you need here is a custom
> copy-on-write system that never overwrites the existing data. All updates that you make to such a
> FS are invisible […] until the moment you update the filesystem root node.

The system you describe might be useful if SSDs weren’t so damn smart these days.

How sure are you that the write you are doing isn’t going to require erasing a block that contains existing data?

James

> would be possible to construct a 1-Farad capacitor for under $5,000 (1977

1-Farad capacitors for cars are now for sale in lots of consumer electronics shops, to power the car audio for a long time after the ignition is off.

They are about a third of the size of a Coca-Cola can.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> How sure are you that the write you are doing isn’t going to require erasing a block that
> contains existing data?

This is exactly the reason why a COW-based FS is particularly useful here - you NEVER EVER erase/update anything, apart from the root node. Instead, you write an update to some new location, which, in turn, requires another update (I hope you understand why - you have to change the allocation tree/bitmap/etc., which means you have to make yet another update), so that every update you make to the FS is propagated all the way up to its root node. This is why all your updates are simply non-existent until you update the root node - if something goes wrong, the filesystem stays exactly in the same state it was in before you updated it. Basically, all your updates are nothing more than an exercise in scratching on unused space.

The only thing that potentially may be screwed up by the power failure is the actual update of the root node itself, because it happens to be the thing that you have to actually overwrite. This is why I am speaking about the scheme with 3 separate copies of it…

Anton Bassov
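
A sketch, for illustration, of the copy-on-write propagation described above: a modified node is never rewritten in place; it is written to fresh space and its new address is pushed up into its parent, which in turn must be rewritten, and so on until only the root pointer remains to be switched. All structures and helper names below are hypothetical, not a real FS layout.

#include <stdint.h>

typedef uint64_t BLOCK_ADDR;

typedef struct _TREE_NODE {
    BLOCK_ADDR Child[16];           /* on-disk addresses of child nodes     */
    /* ... allocation info, checksums, etc. ...                             */
} TREE_NODE;

extern BLOCK_ADDR AllocateFreshBlock(void);   /* never hands out live space */
extern void       WriteBlock(BLOCK_ADDR where, const void *data);

/* Rewrites one node into fresh space after pointing it at a relocated
   child, and returns the node's new address.  The caller (the parent)
   must then do the same, all the way up; nothing becomes visible until
   the root pointer itself is finally updated. */
BLOCK_ADDR CowRewriteNode(const TREE_NODE *node, int childIndex,
                          BLOCK_ADDR newChildAddr)
{
    TREE_NODE copy = *node;              /* modify a private copy ...       */
    copy.Child[childIndex] = newChildAddr;

    BLOCK_ADDR fresh = AllocateFreshBlock();
    WriteBlock(fresh, &copy);            /* ... and write it somewhere new  */
    return fresh;                        /* parent must now be rewritten    */
}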

> This is exactly the reason why a COW-based FS is particularly useful here - you NEVER EVER
> erase/update anything, apart from the root node. […] This is why I am speaking about the scheme
> with 3 separate copies of it…

SSDs these days don’t make the location of the underlying data visible. Your FS might decide that, because one block of data is stored in cluster 123 and it wants to store this data in a new location, cluster 100123 would be a good idea. With all the remapping that goes on, clusters 123 and 100123 might in fact be in the same erase block.

It’s unlikely, but all three copies of the root node could in fact be stored together (less unlikely if your disk is getting full and the root node copies are the only thing that gets modified).

But the system you describe is still better than anything else…

James

> With all the remapping that goes on, clusters 123 and 100123 might in fact be in the same erase block.
>
> It’s unlikely, but all three copies of the root node could in fact be stored together (less unlikely if your disk
> is getting full and the root node copies are the only thing that gets modified).

I see what you mean…

Basically, this is the problem of any shitty drive that may tell you that the operation has been completed while, in actuality, it is still pending somewhere in a hardware cache. It is understandable that writes to A, B and C have to be strictly sequential and cannot be combined in a single operation. Now consider what happens if the disk decides to “optimize” things behind your back and sends all 3 writes in one go. I heard that some shitty commodity drives may do things like that…

Anton Bassov

A serious problem with modern hard drives is they have onboard RAM caches
with multi-megabyte capacity. The old “head scheduling” algorithms
formerly done by the OS driver are now done in the drive firmware. But
what this means is you don’t actually know when the data is committed to
the platter, and it isn’t always FIFO. This means tests like A == B and B
!= C are not guaranteed to tell you the commitment order. SSDs
and flash drives are different entities in terms of how smart they are
internally; for example, I think block writes work differently on each,
with flash drives being simpler and more straightforward. But I think
this may be addressing the problem from the wrong direction.

I might consider canceling all pending IRPs (which would require changes
in the apps if they need to recover from this). Then enqueue the request
to cleanly close the file system. Another approach, and I don’t know how
to do this in KMDF, would be to make the queues priority-ordered and move
the flush-write to the head of the queue. Key here is you need a hardware
guarantee that your system has N ms of run time after power-down and that
you can do the cancel-and-flush within those N ms. So the software
solution is to make sure that your 200 ms of reads don’t block whatever you
need to do to get your writes cleaned up.
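
A rough user-mode sketch of that cancel-and-flush sequence. The power-fail event, the handle names, and doing this from user mode at all are assumptions for illustration; a KMDF driver would do the equivalent with its own queues.

#include <windows.h>

extern HANDLE g_PowerGoodLostEvent;  /* hypothetical: signalled on the power-fail warning */
extern HANDLE g_LogFile;             /* the pre-opened log file                           */
extern volatile LONG g_StopWrites;   /* checked by the writer threads                     */

DWORD WINAPI PowerFailWatcher(LPVOID unused)
{
    UNREFERENCED_PARAMETER(unused);

    WaitForSingleObject(g_PowerGoodLostEvent, INFINITE);

    InterlockedExchange(&g_StopWrites, 1);  /* 1. stop issuing new writes              */
    CancelIoEx(g_LogFile, NULL);            /* 2. cancel whatever is still outstanding */
    FlushFileBuffers(g_LogFile);            /* 3. push buffered data toward the medium */

    return 0;
}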

In addition, there might be a way to buy a reliable transacted file system,
instead of building one. While expensive, it will not be as expensive as
writing your own.

Actually, your description of the power source makes it sound much less
system-friendly than a mere automotive system. But if you’ve already got
a guaranteed DC voltage, the “UPS” problem is easier: you just put a
battery in the circuit. I used a 6.3V sealed lead-acid battery with a
dual diode drop (2*0.7V) to deliver 4.9 V, which was enough to reliably
power the board. A friend designed a trickle-charge circuit to keep the
battery charged. I could handle a 6hr power outage. Due to the nature of
the app and problem domain, flaky circuit margins as the battery was dying
didn’t matter, because there was time for human intervention; an alarm
started sounding as soon as it went on battery backup (Sonalert made two
modules, and I picked the more annoying frequency).
joe


> But what this means is you don’t actually know when the data is committed to the platter, and
> it isn’t always FIFO. This means tests like A == B and B != C are not guaranteed to work to tell
> you the commitment order.

Actually, I don’t really care about the commitment order - it is just of zero importance here. What I DO care about is that when I am told that my write request has been committed I want to be 100% sure it has been actually committed to the persistent storage and will survive a reboot. I am not going to write to C until I get a confirmation that my outstanding write to B has been actually committed to the disk; I am not going to write to B until I get a confirmation that my outstanding write to A has been actually committed to the disk; and I am not going to write to A until I get a confirmation that all my previous outstanding writes have been committed. I know it does not sound particularly efficient, but this is just a checkpoint event that does not happen that often.

Therefore, I don’t mind if the disk reorders my requests. However, I DO want to be sure that I get informed about the events only after they have actually taken place. Otherwise, I just have no way to ensure data consistency in case of power failure, no matter how elaborate the methods that I design are…

Anton Bassov
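
At the Win32 level, the strictly sequential commit described above might look roughly like this. It is only a sketch; the offsets and names are made up, and whether the drive truly honours the flush is exactly the cache-lying problem raised earlier in the thread.

#include <windows.h>

/* Writes one root copy at the given offset and does not return success
   until the OS reports the data flushed toward stable storage. */
static BOOL WriteCommitted(HANDLE h, LONGLONG offset, const void *buf, DWORD len)
{
    OVERLAPPED ov = { 0 };
    ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
    ov.OffsetHigh = (DWORD)(offset >> 32);

    DWORD written = 0;
    if (!WriteFile(h, buf, len, &written, &ov) || written != len)
        return FALSE;

    return FlushFileBuffers(h);          /* do not proceed until this succeeds */
}

/* Checkpoint: A, then B, then C, each one committed before the next begins. */
BOOL CommitRootCopies(HANDLE h, const void *root, DWORD len,
                      LONGLONG offA, LONGLONG offB, LONGLONG offC)
{
    return WriteCommitted(h, offA, root, len) &&
           WriteCommitted(h, offB, root, len) &&
           WriteCommitted(h, offC, root, len);
}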

While in BOF mode I recall when the new IBM 370 at the Computer Lab in Cambridge suffered a severe problem when an engineer servicing a VDU managed to drop an EHT cable onto the logic rail. Apparently 5V logic does not like 22kV spikes.

A warning with caps: when I was very young, on my Dad’s advice I powered a valve (tube) 6.3V heater directly off the 240V mains via a 4-microfarad capacitor. It worked perfectly (the reactance dropping the effective voltage with no power loss). Some years later, after the thing had been disassembled for a long time, I managed to get a nasty shock off the capacitor, which was still storing a substantial charge.

End BOF mode… Mike

If you think road vehicles are bad, try working in the railroad locomotive
power environment sometime. 20KV 5ms spikes, both positive and negative, on
the 74VDC lines are commonplace. (Think about what happens when you switch

Having read this great thread, couldn’t you solve your core problem by
having a read-only boot disk? You could then guarantee that you could get
to the chkdsk recovery of your writable media, which you say is more or
less acceptable. No capacitors though.

Mark Roddy

On Tue, Feb 5, 2013 at 10:39 PM, Loren Wilton wrote:

> I’ve got a bunch of vehicle-mounted systems in the field that are based on
> NT Embedded. They use an 8G flash disk so that vehicle vibration won’t
> cause head crashes.
>
> The systems work fine, with one problem. The spec called for normal power
> sequences. When those occur things work fine. However, it turns out that
> the machines are subject to anything from 1-2 to 10 sudden power losses
> every day, with no warning. This is outside the spec for the unit, but the
> vehicle manufacturer simply said “we didn’t mention that because we didn’t
> think it was a problem”.
>
> Of course, it is. I’m getting systems returned as non-functional, and it
> turns out they are all suffering from disk corruption. I suspect this is
> due to a power drop in the middle of a disk write.
>
> I’m fishing for possible workarounds for this problem. There is nothing I
> can do to prevent the power losses, and the vehicle manufacturer can’t fix
> that either. So I have to live with it and find a way to keep them from
> causing disk corruption.
>
> Could I use the UPS “low battery” warning? What will NT do when it sees
> that? I can get around 75ms notification from the vehicle when power is
> going down before it dumps completely. I know that is real close, but is
> there a path where I could get disk writes inhibited before the power fail?
> Lost data is preferable to trashed disk directories. (Almost all disk
> write activity should be to pre-opened log files, so there generally should
> be no directory activity at the time of a power failure.)
>
> Thanks for any suggestions!
>
> Loren

> 1-Farad capacitors for cars are now for sale in lots of consumer electronic
> shops to power the car audio for a long time after the ignition is off.

Energy density in the best capacitors is much less than energy density in the lousiest batteries.

Consider a 1 F, 20 V capacitor. It’s only 200 J. Now, a low-quality 1.5 V, 1 Ah AA battery: it’s 5400 J.
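
(For reference, the arithmetic behind those two figures: E = ½·C·V² = ½ · 1 F · (20 V)² = 200 J for the capacitor, while 1.5 V · 1 Ah = 1.5 Wh = 1.5 · 3600 J ≈ 5400 J for the cell.)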

The question now is: why bother with a capacitor, if you can just leech off the main car battery?

Good point, and it is this precise problem that is becoming harder and
harder to address because “write commitment notification” is not part of
the abstract specification of the interface (I’ve told this story here
before, but one disk vendor, when I asked a similar question at a trade
show, said “If there’s any disk corruption, we just blame Microsoft”).
joe


What Windows does at any sign of possible corruption is bugcheck, and
I think that is what you should do too. With NTFS supporting master file
table mirror records, the chance of corruption this way is very small.
Having always tested my drivers on development machines and having faced
many bugchecks while risking “blowing my source tree”, as they had warned, it
actually never happened in all those years.

If you are getting serious data corruption through power failure it must be
due to hardware error or damage, not due to a software problem such as an
inconsistent state of some file system table.

//Daniel

> What Windows does in case of any sign of possible corruption is bugcheck

No. A red event, Ntfs/55 (IIRC), is logged if NTFS detects metadata corruption at runtime. No BSODs.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> severe problem when an engineer servicing a VDU managed to drop an EHT cable onto the logic
> rail. Apparently 5V logic does not like 22kV spikes.

Do you mean the CRT anode cable?

Anode/flyback units of CRT equipment cannot produce current strong enough to be dangerous to a human (though the shock is quite noticeable - like being hit by a powerfully kicked soccer ball).

It can still be a disaster for logic chips though.

Also, powering up a CRT device with the anode cable disconnected from the tube seems a bit of a strange idea :)

> long time, I managed to get a nasty shock off the capacitor which was still storing a substantial
> charge.

Good insulation in the capacitor.

When I was a schoolboy, we had fun charging capacitors off the school’s wall power sockets and giving one another small electric shocks :)


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

The problem of capacitive charge can be serious. In some vacuum tube
equipment, the capacitor for the HV circuitry could hold enough charge to
even be lethal if it discharged across the heart at exactly the wrong
time - sort of an inverse defibrillator. DC power is not ordinarily
dangerous.

CRT flyback circuits had a 20000V 15 picofarad capacitor. A standard trick
in physics lab was to charge one of these up and toss it to someone. 15pf
didn’t have enough energy to do any damage but could give the equivalent of
a static shock. And after a while, you didn’t have to waste time charging
them…just toss one and watch people jump away…

A rep from Amdahl Computer said that the real danger from -5.2V @ 1500A
(it was an ECL machine) was flying screwdrivers. A screwdriver dropped
across the power bus would cause an arc so explosive that it could drive a
screwdriver across the room and bury it in drywall up to its handle.

EMF from turning off an electric typewriter destroyed a CRT in the
adjacent office…it induced large spikes on the unprotected circuit
board. It was the EMF pulse through the air that did it, not through the
power lines. When the CRT vendor sprayed the inside of their cases with
conductive paint and grounded it, the problem went away.
joe


That must have been SOME typewriter.

Peter
OSR

> 20000V 15 picofarad

Holds 0.003 J, which is about as much energy as a quarter dropped from 10 cm.
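
(The arithmetic: E = ½·C·V² = ½ · 15 pF · (20 kV)² = 0.003 J; a ~5.7 g quarter dropped from 10 cm carries m·g·h ≈ 0.006 J, so the two are indeed the same order of magnitude.)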

> CRT flyback circuits had a 20000V 15 picofarad capacitor.

More-or-less modern equipment (since the late 1970s in the USSR/Russia and, I think, 5-10 years earlier in the Western countries) uses a diode+capacitor voltage multiplier for the anode voltage, implemented in a solid block of plastic. In even more modern equipment (like most PC CRT monitors since the 1980s), this device is combined with the flyback transformer in a single unit.

But older designs (~1970-75 in the USSR/Russia), which used a kenotron rectifier tube with 1V heating (a single wire loop around the flyback; this ensures that the anode voltage appears on the CRT only after the horizontal deflection is up and running) mounted on the flyback itself, really had such a capacitor.

When I was around 12 years old, I was curious enough to disassemble this part of my old granny’s black-and-white 50cm TV (older than myself :) ), and really got an electric shock off this capacitor (14kV 10pF).

It was like a sudden hit from a seriously kicked soccer ball.

BTW - I think I have a .djvu file with a schematic of this TV :) If I find it, I can email it on request :)

> a static shock. And after a while, you didn’t have to waste time charging
> them…just toss one and watch people jump away…

:)


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com