It’s been a long time since I’ve turned to this forum, but I’m stumped on a
problem that doesn’t make any sense.
I’m working on a DMA driver for Vista/Win7 for a custom compute engine that
supports multiple channels of DMA. The data flow through the various DMA
channels is all interdependent in such a way that if the driver receives a
cancellation for a DMA request that is queued up for one of the DMA
channels, it needs to idle all of the DMA engines, cancel all of the
outstanding requests for all of the DMA engines, and reinitialize
everything. This, in itself, is not a real problem because the applications
that use this device never explicitly call CancelIo, so the only real
reason any queued or in-progress DMA requests would get cancelled is if the
application crashes or is killed without performing a clean shutdown.
Furthermore, I now seem to have all of the cancellation and cleanup working
just fine - or so it appears.
To test that the cancellation is working correctly, I run a script that
continuously fires up an application that uses the hardware, and then after
a random period of time kills the running app while it’s in the middle of
doing intensive DMA transfers on multiple channels simultaneously. The
script keeps doing this in a loop until something goes wrong.
Due to the nature of how the hardware works in conjunction with its
corresponding application software, it only makes sense for one application
to ever have a particular device open at any given time. Thus, the driver
implements a flag in the device object’s device extension that gets
atomically set during EvtDeviceFileCreate using InterlockedExchange, and
then cleared in EvtFileClose. If the flag is found to have been previously
set in EvtDeviceFileCreate, it completes the create request with
STATUS_SHARING_VIOLATION.
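To make the flag logic concrete, here is a minimal user-mode sketch of it. This is not the actual driver code: the struct and function names are hypothetical, and C11 atomic_exchange stands in for InterlockedExchange so that the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag kept in the device extension. */
typedef struct {
    atomic_long in_use;   /* 0 = device free, 1 = opened by an app */
} device_extension_model;

/* Models the check done in EvtDeviceFileCreate. Returns true if the open
 * is allowed, false where the driver would complete the create request
 * with STATUS_SHARING_VIOLATION. */
bool model_file_create(device_extension_model *ext)
{
    /* Like InterlockedExchange(&flag, 1): store 1 and get the previous
     * value back in a single atomic step. */
    long previous = atomic_exchange(&ext->in_use, 1);
    return previous == 0;
}

/* Models the flag handling done in EvtFileClose. */
void model_file_close(device_extension_model *ext)
{
    long previous = atomic_exchange(&ext->in_use, 0);
    /* A close should only ever follow a successful create. */
    assert(previous == 1);
    (void)previous;
}
```

The point of the sketch is that the flag is only ever cleared in the close path, so if the close callback never runs, every later create fails.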
So, after running the script all day long without issue, eventually (after
500+ iterations) it stops: having killed the app during the prior
iteration, it cannot open the device again when firing up a new app for the
next iteration. EvtDeviceFileCreate gets called by the new app,
but sees that the flag is set, and fails the request. The crux of the
problem is that EvtFileClose never got called following the killing of the
app during the prior iteration. The docs state, “The framework calls a
driver’s EvtFileClose callback function when the last handle for a file
object has been closed and released, and all outstanding I/O requests have
been completed or canceled.”
Fortunately, I captured this while running a debug version of the driver
with WinDBG attached, so I’ve got the condition captured in a live debug
session. I’ve verified that all outstanding I/O requests from the app that
was last killed were correctly cancelled and no I/O requests are left
outstanding for the device. I’ve poked around looking for all sorts of
things, hoping to find some dangling DMA transaction object or something
else out of sorts, but as far as I can tell, everything was cleaned up
correctly following the killing of the prior app, with one exception:
!wdfopenhandles reveals that there’s still a lingering open file object on
the device. That indicates that the file handle used by the app that was
last killed somehow still hasn’t been closed and released properly, and
until that occurs, the EvtFileClose callback won’t get called. (In addition
to clearing the flag, this callback also does some other final cleanup tasks
that are required before the device can be used by another app, but that’s
outside the scope of this discussion. The key issue is that EvtFileClose
isn’t getting called.)
Oddly, even though there seems to be a handle still open for the file
object, the app that was killed is gone - i.e. it’s not lingering around
waiting for the last I/O to complete or anything like that. So I can’t
figure out why the file object is still out there. Windows should have taken
care of closing all of the handles to the file object when the app was
killed, so why is there a handle still not closed or released?
Once in this state, the driver is effectively unusable since you can’t open
it with a new app. Furthermore, if you try to uninstall it via Device
Manager, it says you have to reboot. And, at least according to my client (I
haven’t tried it yet myself), when you try to reboot the system in this
condition it BSODs during shutdown. Not good all around.
Note that this is currently being tested on 64-bit Win7.
Does anybody have any suggestions of what I might be missing or what else I
could look for that might be the cause of this problem? I’ve still got the
live debug session going where I’ve captured this state of the device, so
hopefully somebody will suggest a debugger command that I haven’t tried yet
that might provide some insight to the source of the problem.
Looking for a “duh!” moment. I just haven’t had one yet.
Thanks,
- Jay
Jay Talbott
Principal Consulting Engineer
SysPro Consulting, LLC
http://www.sysproconsulting.com