A serious problem with modern hard drives is that they have onboard RAM caches
several megabytes in size. The old “head scheduling” algorithms that the OS
driver used to implement are now done in the drive firmware. What this means
is that you don’t actually know when the data is committed to the platter,
and the commits aren’t necessarily FIFO. So tests like A == B and B != C are
not guaranteed to tell you the commitment order. SSDs and flash drives are
different beasts in terms of how smart they are internally; for example, I
think block writes work differently on each, with flash drives being the
simpler and more straightforward of the two. But I think this may be
attacking the problem from the wrong direction.
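As a rough illustration of how little control the host really has: about the
strongest ordering request you can make from user mode is to open the file
with FILE_FLAG_WRITE_THROUGH and call FlushFileBuffers after each write, and
even then the drive firmware decides when the data actually reaches the
platter. The file name below is just a placeholder for the example.

/*
 * Minimal user-mode sketch (my example, not from the original post).
 * Write-through plus an explicit flush is a *request*, not a guarantee;
 * a drive that acknowledges the flush from its cache can still reorder
 * the actual media commits.
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\data\\journal.bin",   /* placeholder path */
                           GENERIC_WRITE,
                           0,                           /* no sharing */
                           NULL,
                           OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL |
                           FILE_FLAG_WRITE_THROUGH,     /* ask for write-through */
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    const char recordA[] = "record A";
    DWORD written;

    WriteFile(h, recordA, sizeof(recordA), &written, NULL);

    /* Ask the stack to commit the write before we continue; whether it
       really hits the platter is up to the drive firmware. */
    FlushFileBuffers(h);

    CloseHandle(h);
    return 0;
}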
I might consider canceling all pending IRPs (which would require changes in
the apps if they need to recover from this). Then enqueue the request to
cleanly close the file system. Another approach, and I don’t know how to do
this in KMDF, would be to make the queues priority-ordered and move the
flush-write to the head of the queue. The key here is that you need a
hardware guarantee that your system has N ms of run time after power-down,
and that you can do the cancel-and-flush within those N ms. So the software
solution is to make sure that your 200 ms of reads don’t block whatever you
need to do to get your cleanup write out.
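I don’t have a KMDF priority-queue recipe, but here is a rough, untested
sketch of the cancel-then-flush idea: keep the reads on one queue and park
the cleanup write on a separate manual-dispatch queue, then from the
power-loss DPC purge the read queue and start the cleanup write. The names
ReadQueue, FlushQueue and StartFlushWrite() are mine, not anything from this
thread.

/*
 * Rough sketch only.  Assumes a dedicated manual queue for the cleanup
 * write; StartFlushWrite() is a hypothetical helper that programs the
 * hardware for that request.
 */
#include <ntddk.h>
#include <wdf.h>

typedef struct _DEVICE_CONTEXT {
    WDFQUEUE ReadQueue;    /* queue holding the long-running reads      */
    WDFQUEUE FlushQueue;   /* manual queue holding the cleanup write    */
} DEVICE_CONTEXT, *PDEVICE_CONTEXT;

WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(DEVICE_CONTEXT, GetDeviceContext)

VOID StartFlushWrite(PDEVICE_CONTEXT Ctx, WDFREQUEST Request); /* hypothetical */

VOID
EvtPowerLossDpc(
    _In_ WDFINTERRUPT Interrupt,
    _In_ WDFOBJECT    AssociatedObject
    )
{
    PDEVICE_CONTEXT ctx = GetDeviceContext(WdfInterruptGetDevice(Interrupt));
    WDFREQUEST flushRequest;

    UNREFERENCED_PARAMETER(AssociatedObject);

    /* Complete everything still queued for read with STATUS_CANCELLED;
       the apps must be prepared to reissue those reads later. */
    WdfIoQueuePurge(ctx->ReadQueue, WDF_NO_EVENT_CALLBACK, WDF_NO_CONTEXT);

    /* Now get the cleanup write to the hardware ahead of everything else. */
    if (NT_SUCCESS(WdfIoQueueRetrieveNextRequest(ctx->FlushQueue, &flushRequest))) {
        StartFlushWrite(ctx, flushRequest);
    }
}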
In addition, there might be a way to buy a reliable transacted file system
instead of building one. While expensive, it will not be as expensive as
writing your own.
Actually, your description of the power source makes it sound much less
system-friendly than a mere automotive system. But if you’ve already got
a guaranteed DC voltage, the “UPS” problem is easier: you just put a
battery in the circuit. I used a 6.3V sealed lead-acid battery with a
dual diode drop (2*0.7V) to deliver 4.9 V, which was enough to reliably
power the board. A friend designed a trickle-charge circuit to keep the
battery charged. I could handle a 6hr power outage. Due to the nature of
the app and problem domain, flaky circuit margins as the battery was dying
didn’t matter, because there was time for human intervention; an alarm
started sounding as soon as it went on battery backup (Sonalert made two
modules, and I picked the more annoying frequency).
joe
> With all the remapping that goes on, clusters 123 and 100123 might in
> fact be in the same erase block.
> It’s unlikely, but all three copies of the root node could in fact be
> stored together (less unlikely if your disk
> is getting full and the root node copies are the only thing
> that gets modified).
I see what you mean…
Basically, this is the problem with any shitty drive that may tell you that
the operation has completed while, in actuality, it is still pending
somewhere in a hardware cache. It is understandable that writes to A, B
and C have to be strictly sequential and cannot be combined in a single
operation. Now consider what happens if the disk decides to “optimize”
things behind your back and sends all 3 writes in one go. I have heard that
some shitty commodity drives may do things like that…
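For what it’s worth, about the only barrier the host can put between the
three writes is an explicit flush after each one, so the drive is at least
not invited to coalesce them; a drive that acknowledges the flush without
honoring it defeats even this. The offsets and helper name below are just
my illustration.

/*
 * Sketch only: issue A, B and C as separate writes with a flush
 * "barrier" between them, instead of letting the stack batch them.
 */
#include <windows.h>

static BOOL WriteRecordWithBarrier(HANDLE h, LONGLONG offset,
                                   const void *buf, DWORD len)
{
    LARGE_INTEGER pos;
    DWORD written;

    pos.QuadPart = offset;
    if (!SetFilePointerEx(h, pos, NULL, FILE_BEGIN))
        return FALSE;
    if (!WriteFile(h, buf, len, &written, NULL) || written != len)
        return FALSE;

    /* Ordering barrier: ask the stack to push this write to the media
       before the next record is issued. */
    return FlushFileBuffers(h);
}

/* Usage:
 *   WriteRecordWithBarrier(h, offsetA, &A, sizeof(A));
 *   WriteRecordWithBarrier(h, offsetB, &B, sizeof(B));
 *   WriteRecordWithBarrier(h, offsetC, &C, sizeof(C));
 */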
Anton Bassov