> Hello,
I’ll clear up a few of the comments and describe my current progress:
Firstly, please feel free to collectively roll your eyes if this is a
really bad design, but I am not and was never using METHOD_NEITHER; I set
up the IOCTL as METHOD_OUT_DIRECT and was calling the probe-and-lock
function on the extracted embedded pointer. It was never my intent to use
METHOD_NEITHER. I thought that we would still need to probe-and-lock the
buffer pointed to by the embedded pointer since the output buffer for the
IOCTL is a separate thing from this buffer and is not actually where the
payload will be filled in.
Yes, if you supply a pointer to user address space in the IOCTL call, you
do have to lock the buffer down. For all practical purposes, this is
isomorphic to METHOD_NEITHER in all the possible bad ways. [drum roll,
no, eye roll] Why are you providing a pointer to a buffer in the payload
instead of just providing the buffer in the call itself?
The reason for this design is that the data structure we pass down with
the embedded pointer is filled with members that describe the data both
before and after the data transfer. The application specifies the payload
buffer size in one field prior to sending the IOCTL. The driver fills the
payload buffer with data the device observes, and then fills in other data
like IO completion status, a timestamp, etc.
And this differs from using the METHOD_OUT_DIRECT buffer how?
This design was in place prior to my involvement with the development of
this driver, however we are now trying to transition to a new protocol and
decided to implement a similar design. I am certainly open to different
design approaches and am willing to do whatever will make this work.
The design would be considered bizarre. There’s no good reason I see that
makes this design advantageous (there are reasons that “neither” mode can
make sense, but your explanation does not mention any of them).
So, unless there are factors you have not mentioned, my reaction [eye
roll] is that it is a good opportunity to redo the design.
Joseph, thank you for your points. I just wanted to mention that I am not
fixated on either of the two non-existent problems mentioned; my
initial observation was that perhaps the problem was caused by the user
buffer being freed mid-transfer, but everyone chimed
in and said that this should not happen, so I no longer believe this to be
the case and did not push this theory. I keep mentioning the problem
happening “when application is terminated mid-transfer” simply because it
is the only reproduction case I currently have for this problem, and is
exactly the case I am trying to fix. As for the reason for the
design, it is described above.
I am quite willing to believe that there are problems caused by a
mid-transfer abort. The most common case is that if an IRP is “in flight”
(currently in communication with the device), it is possible, for example,
for a DMA transfer to continue, scribbling over random storage, or for a
programmed transfer to use a now-stale pointer. One way to
deal with this is to not allow an active IRP to be completed until the
transfer is finished. For a device with a bounded response time, this is
the best approach. If the device is potentially unbounded response time,
recovery is more complex. The problem is that when you complete an IRP, a
process that is being shut down will shut down, and the buffers will
disappear. But if you have parts of the driver which don’t recognize this
(e.g. the ISR or DPC) they will continue to use now-stale pointers. So,
given your scenario, the first thing I’d look for is the consequences of
an overeager cancel; for example, the active IRP still has a cancel
routine set and the cancel routine doesn’t check to see if the IRP
is actually active.
So, I have changed the code around in the following way:
- The application now passes in two buffers with the IOCTL, input and
output. The input buffer contains a struct with fields describing the data
size, etc., and the output buffer is the payload buffer as is; no embedded
pointer.
Yes, that sounds better
- My EvtIoInCallerContext routine intercepts requests and passes all
requests on with WdfDeviceEnqueueRequest except for the problematic IOCTL
- For this IOCTL, the EvtIoInCallerContext routine calls
WdfRequestRetrieveInputBuffer, checks the specified buffer size, and then
calls WdfRequestRetrieveOutputBuffer with that buffer size (Maybe I don’t
need to do this in EvtIoInCallerContext since I’m no longer using
ProbeAndLock?)
I suspect that you are correct that this is not necessary, but I’m not a
KMDF expert.
- A request context is allocated and the buffer size is filled in.
- WdfMemoryCreatePreallocated is called with the pointer to the
previously retrieved output buffer. The WDFMEMORY object that gets created
is a member of the request context.
- WdfDeviceEnqueueRequest is called and the request is requeued.
- The EvtIoDeviceControl routine receives the request, retrieves the
request context, calls WdfUsbTargetPipeFormatRequestForRead with the
RequestContext->memoryObject passed as the third parameter. A completion
routine is set and the request is sent.
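For reference, the sequence described in the steps above might look roughly like the following. This is a non-compilable sketch, not the poster’s actual code: the context type, the `CAPTURE_PARAMS` input struct, the pipe handle, and all names are invented, and error handling is elided:

```c
/* Sketch only; will not compile outside a real KMDF project. */

typedef struct _REQUEST_CONTEXT {
    size_t    PayloadSize;    /* from the input buffer              */
    WDFMEMORY PayloadMemory;  /* wraps the retrieved output buffer  */
} REQUEST_CONTEXT, *PREQUEST_CONTEXT;
WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(REQUEST_CONTEXT, GetRequestContext);

VOID EvtIoInCallerContext(WDFDEVICE Device, WDFREQUEST Request)
{
    /* ...if this is not the capture IOCTL, just
       WdfDeviceEnqueueRequest(Device, Request) and return... */

    WDF_OBJECT_ATTRIBUTES attrs;
    PREQUEST_CONTEXT ctx;
    PCAPTURE_PARAMS params;   /* hypothetical input struct */
    PVOID payload;
    size_t payloadLen;

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attrs, REQUEST_CONTEXT);
    WdfObjectAllocateContext(Request, &attrs, &ctx);

    WdfRequestRetrieveInputBuffer(Request, sizeof(*params), &params, NULL);
    WdfRequestRetrieveOutputBuffer(Request, params->PayloadSize,
                                   &payload, &payloadLen);
    ctx->PayloadSize = payloadLen;

    /* Parent the WDFMEMORY to the request so their lifetimes match. */
    WDF_OBJECT_ATTRIBUTES_INIT(&attrs);
    attrs.ParentObject = Request;
    WdfMemoryCreatePreallocated(&attrs, payload, payloadLen,
                                &ctx->PayloadMemory);

    WdfDeviceEnqueueRequest(Device, Request);
}

VOID EvtIoDeviceControl(WDFQUEUE Queue, WDFREQUEST Request,
                        size_t OutLen, size_t InLen, ULONG IoControlCode)
{
    PREQUEST_CONTEXT ctx = GetRequestContext(Request);

    WdfUsbTargetPipeFormatRequestForRead(g_Pipe, Request,
                                         ctx->PayloadMemory, NULL);
    WdfRequestSetCompletionRoutine(Request, ReadComplete, NULL);
    if (!WdfRequestSend(Request, WdfUsbTargetPipeGetIoTarget(g_Pipe),
                        WDF_NO_SEND_OPTIONS)) {
        WdfRequestComplete(Request, WdfRequestGetStatus(Request));
    }
}
```

Parenting the WDFMEMORY object to the request (rather than leaving it framework-owned) is one way to make sure it is cleaned up automatically when the request completes or is cancelled.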
Unfortunately, this design reproduces the exact same behavior as the
other: everything works fine and I can’t make the device hang in most
cases, but it does hang when I kill the application mid-transfer. Again,
all in-flight requests do appear to ‘clean up’ immediately.
Can you explain “hang”? If the manifestation is that future requests
arrive at the driver and get enqueued, but never dequeued, it could be
that the cancellation leaves the queues in a state where they think they
are busy. In a WDM driver, this would suggest that you called
IoCompleteRequest but failed to call the “dequeue next request” routine
(which, if it finds the pending queue is empty, resets the “device
currently processing an IRP” state). The fact that you see the same
problem after the change in buffer management suggests that the buffer
management was not the problem. So look for queue management problems.
Unfortunately, I’m still not sure what the bus sees. We’re looking at
getting a bus analyzer trace, but our USB3 analyzer seems to be
malfunctioning right now. I’m looking at taking this step next.
If the hang is that the device is not responding, then the problem is not
in the queue management, but in the device itself. If you see a request
being sent to the device, but the device is not responding, it means that
the device is expecting some other request and refuses to respond.
Consider: if the protocol is a sequence of packets ABC, you abort the
transfer while B is running, and then retry, the device sees ABABC and is
in a weird state because it saw an A after B, instead of the expected C.
This is
almost entirely guesswork on my part, but it is consistent with things I
have seen happen to other devices on pre-Windows systems.
You should be able to infer this without a bus analyzer, just by adding
some debug printouts at enqueue, dequeue, and similar events.
joe
NTDEV is sponsored by OSR
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer