Also: are you sure the corruption is due to software? Vehicular systems
are notorious for things like power spikes, or power failures that can be
portrayed as |||||||||||||...|...|.|.|...|.|.|...|.|.|
where | indicates power present and . indicates power absent. I knew someone
who had worked in the auto industry and had a real set of horror stories
about power and temperature problems they were encountering during the
initial digital revolution of the 1980s. His previous experience had been
in “battle-hardened” military systems, and he said those were easy
compared to automotive systems.
One possible way of extending that 75ms could be a big MF capacitor. I
once did an embedded system with about 50,000 µF in the power supply (it
was a low-power system-on-a-board, and I had a couple hundred ms to “safe”
the system; it had no secondary storage, and this was well before flash
drives). I had about a second before the cutoff circuitry cut the power to
zero. (I didn’t design that circuit; I found it in an application note.
The problem was that, like most circuitry of that era, it ran just dandy at
5.0 volts but erratically at 4.3 volts, so the power curve was a slow
decay from 5.0 to about 4.7 volts, then complete cutout to zero. On
startup, power did not appear until the supply reached about 4.8 volts,
and the “reset” line was held low until somewhere between 4.9 and 5.0
volts had been present for more than a couple of seconds.) The lesson here
is that if you continue writing to the SSD at below-spec voltages, it
doesn’t matter that the CPU and write circuitry continue to run; the data
may not arrive at the SSD intact, or the voltage may be too low to get it
written reliably (and when the address-select circuits aren’t working
right, all bets are off). I was never very good at analog circuitry, so I
either bought power supplies or relied on serious experts to give me
designs.
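To put rough numbers on the capacitor idea, here is a minimal Python sketch, assuming an ideal capacitor directly across the rail and a load that draws either constant current (t = C·ΔV/I) or constant power (usable energy ½C(V1² − V2²)). It ignores ESR, leakage, and regulator dropout, so treat it as a first estimate only. The function names and the 15 mA and 1 A load figures are hypothetical illustrations, not measurements from either system.

```python
def _check_window(v_start: float, v_min: float) -> None:
    # The rail must start above the minimum usable voltage.
    if not v_start > v_min > 0:
        raise ValueError("need v_start > v_min > 0")


def holdup_time_constant_current(c_farads: float, v_start: float,
                                 v_min: float, i_amps: float) -> float:
    """Seconds for the rail to sag from v_start to v_min at constant current.

    From I = C * dV/dt, so t = C * (v_start - v_min) / I.
    """
    _check_window(v_start, v_min)
    if i_amps <= 0:
        raise ValueError("load current must be positive")
    return c_farads * (v_start - v_min) / i_amps


def holdup_time_constant_power(c_farads: float, v_start: float,
                               v_min: float, p_watts: float) -> float:
    """Seconds of hold-up for a constant-power load (e.g. behind a switcher).

    Usable energy is 0.5 * C * (v_start**2 - v_min**2), divided by the power.
    """
    _check_window(v_start, v_min)
    if p_watts <= 0:
        raise ValueError("load power must be positive")
    return 0.5 * c_farads * (v_start ** 2 - v_min ** 2) / p_watts


def capacitance_for_holdup(t_seconds: float, v_start: float,
                           v_min: float, i_amps: float) -> float:
    """Farads needed to ride through t_seconds at constant current."""
    _check_window(v_start, v_min)
    return i_amps * t_seconds / (v_start - v_min)


if __name__ == "__main__":
    # 50,000 uF across a 5.0 V to 4.7 V window; the load currents are hypothetical.
    print(holdup_time_constant_current(0.05, 5.0, 4.7, 0.015))  # about 1 s
    print(capacitance_for_holdup(0.075, 5.0, 4.7, 1.0))        # about 0.25 F
```

With 50,000 µF and a 5.0 V to 4.7 V window, a constant 15 mA load rides through for about a second; a hypothetical 1 A load would need roughly 0.25 F just to cover 75ms across the same 0.3 V window, which is why the hardware side deserves an expert's design.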
So before you go searching for a software solution, make sure the hardware
is going to function correctly during those 75ms.
(A friend designed a disk controller. It had special power-detection
circuitry, and would retract the heads from the disk cartridge on power
failure. What he forgot to do was turn off power to the write head. So
if there was a power failure during a write operation, the heads wrote a
spiral as they retracted outwards. The disk was unrecoverable. The extra
check bits on each track let him recover from an 8-bit burst error, but he
later computed that, based on the head retraction, the decay spiral
corrupted 11 bits on each sector it passed over. There was no software
solution for this; several hundred controllers required hardware mods.)
So consider my log file example in the previous post, and if you are
trying to do some kind of commit or rollback in the power-failure window,
make sure the hardware itself will be reliable for the 75ms, or whatever
you end up with. Just because the CPU can continue to fetch and execute
instructions doesn’t guarantee that your SSD will have the power (in
watt-seconds at nominal operating voltage) to successfully write the data.
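The log file example from the previous post isn't reproduced here, so what follows is a generic technique, not that example: a minimal Python sketch of length-prefixed, CRC-checked log records, so that a record torn by a power failure is detected and discarded on the next boot instead of being trusted. The record layout (4-byte length, 4-byte CRC-32, payload) and the function names are hypothetical. This only protects against a torn tail; it cannot help if a write at below-spec voltage lands on the wrong sector.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte little-endian payload length,
# 4-byte CRC-32 of the payload, then the payload itself.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one log record so a torn or garbled write can be detected later."""
    if not payload:
        # Length 0 is reserved to mean "unwritten space" (see recover()).
        raise ValueError("empty records are not allowed")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> tuple[list[bytes], int]:
    """Return (records, good_end).

    records are the intact payloads from the start of the log, and good_end
    is the byte offset just past the last intact record. Scanning stops at
    the first short header, zero length, short payload, or CRC mismatch,
    which is where a write cut off by a power loss would leave the log.
    """
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        if length == 0:
            break  # zero-filled (never written) space: end of log
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # payload was cut short
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # header or payload garbled
        records.append(payload)
        offset = end
    return records, offset
```

On restart, a real system would resume appending (or overwrite) from good_end, so the torn fragment is never read back as data.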
joe
I’ve got a bunch of vehicle-mounted systems in the field that are based on
NT Embedded. They use an 8 GB flash disk so that vehicle vibration won’t
cause head crashes.
The systems work fine, with one problem. The spec called for normal power
sequences, and when those occur, things work fine. However, it turns out
that the machines are subject to anywhere from one or two to ten sudden
power losses every day, with no warning. This is outside the spec for the
unit, but the vehicle manufacturer simply said “we didn’t mention that
because we didn’t think it was a problem”.
Of course, it is. I’m getting systems returned as non-functional, and it
turns out they are all suffering from disk corruption. I suspect this is
due to a power drop in the middle of a disk write.
I’m fishing for possible workarounds for this problem. There is nothing I
can do to prevent the power losses, and the vehicle manufacturer can’t fix
that either. So I have to live with it and find a way to keep them from
causing disk corruption.
Could I use the UPS “low battery” warning? What will NT do when it sees
that? I can get around 75ms notification from the vehicle when power is
going down before it dumps completely. I know that is really close, but is
there a path where I could get disk writes inhibited before the power
fails?
Lost data is preferable to trashed disk directories. (Almost all disk
write activity should be to pre-opened log files, so there generally
should be no directory activity at the time of a power failure.)
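The “lost data is preferable to trashed directories” rule maps onto a simple write gate. Below is a minimal Python sketch, assuming some thread receives the roughly 75ms power-down notice and calls power_fail(); that hook, the GatedLog class, and the file size are hypothetical. The log file is created and filled with zeros once, up front (a fresh log per open, as a simplifying assumption), so later appends overwrite already-allocated space instead of growing the file, and each append is followed by os.fsync() so no dirty data for this file is left in the OS cache when the gate closes. This limits only what the application issues in the window; it does not stop the OS, other processes, or the file system itself (timestamp updates, for example) from writing, and flushing every append costs throughput. The native Windows counterparts would be opening with FILE_FLAG_WRITE_THROUGH or calling FlushFileBuffers.

```python
import os
import threading


class GatedLog:
    """Hypothetical preallocated log with a power-fail write gate.

    The file is truncated and zero-filled on open, then written in place.
    """

    CHUNK = 64 * 1024

    def __init__(self, path: str, size: int):
        self._lock = threading.Lock()
        self._power_failing = False
        self._size = size
        self._offset = 0
        flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
        self._fd = os.open(path, flags, 0o644)
        # Write real zeros (not a sparse extend) so later appends overwrite
        # already-allocated blocks instead of changing the file's size.
        zeros = bytes(self.CHUNK)
        remaining = size
        while remaining > 0:
            remaining -= os.write(self._fd, zeros[: min(self.CHUNK, remaining)])
        os.fsync(self._fd)

    def append(self, data: bytes) -> bool:
        """Write data at the current tail; return False if it was dropped."""
        with self._lock:
            if self._power_failing:
                return False  # drop data rather than write on a dying supply
            if self._offset + len(data) > self._size:
                return False  # log full; a real design would wrap or rotate
            os.lseek(self._fd, self._offset, os.SEEK_SET)
            view = memoryview(data)
            while view:
                view = view[os.write(self._fd, view):]
            os.fsync(self._fd)  # leave no cached dirty data for this file
            self._offset += len(data)
            return True

    def power_fail(self) -> None:
        """Close the gate. Taking the lock lets an append already in
        progress finish, but no new append can start afterwards."""
        with self._lock:
            self._power_failing = True

    def close(self) -> None:
        with self._lock:
            os.close(self._fd)
```

Returning False instead of raising lets the caller count dropped records without special-casing the power-down path.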
Thanks for any suggestions!
Loren
NTDEV is sponsored by OSR