I’m trying to chase down an obscure bug in a hardware device driver.

Background: This is the only driver in the stack for the target
hardware. No IRPs are ever ‘sent down’. The driver is built from
Oney’s framework as published in the first edition of “Programming the
Microsoft Windows Driver Model”.

Scenario: Userland performs a query through IOCTL. This query is
placed in the write queue for the device, the IRP is locally cached,
marked pending and DispatchControl() returns an appropriate value. When
the periodic read service picks up the response to the query, the
initiating IRP is recovered from the internal cache, data is copied to
the userland buffer and the IRP is completed.

Symptom: On rare occasions, the MdlAddress pointer in the IRP
“disappears” (becomes reset to NULL) between the time DispatchControl is
called and the time the transaction handler attempts to place the IRP
into the internal cache. A userland pointer is unconditionally
recovered from the MdlAddress field as DispatchControl enters, using
MmGetSystemAddressForMdl. (I know it’s deprecated, but the driver has
to run on Win98 as well.) DispatchControl calls a transaction handler
function to place the query in the write queue and cache the IRP for
later completion. The handler calls a cache function to perform the add
to the internal cache. It is at this point (adding the pending
transaction to the internal cache) that the MdlAddress pointer is tested.
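
To make the sequence above concrete, here is a minimal user-mode sketch of the path, not the driver source. MOCK_IRP, MOCK_MDL, PENDING_CACHE, cache_add and dispatch_control are hypothetical stand-ins invented for illustration, and reading the `mapped` field stands in for the MmGetSystemAddressForMdl call. One difference from the real driver is assumed on purpose: the sketch guards the read at entry, where the driver maps unconditionally.

```c
#include <stddef.h>

/* User-mode stand-ins for the kernel types (hypothetical; only the
   fields discussed in this post are modelled). */
typedef struct MOCK_MDL { void *mapped; } MOCK_MDL;
typedef struct MOCK_IRP { MOCK_MDL *MdlAddress; } MOCK_IRP;

#define CACHE_SLOTS 8

typedef struct PENDING_ENTRY {
    MOCK_IRP *irp;     /* IRP held until the read service sees a reply */
    void     *buffer;  /* address captured when dispatch was entered   */
} PENDING_ENTRY;

typedef struct PENDING_CACHE {
    PENDING_ENTRY slot[CACHE_SLOTS];
    size_t        count;
} PENDING_CACHE;

/* Models the cache function: this is the point where MdlAddress is
   tested. Returns 0 on success, -1 if the MDL is missing or the
   cache is full. */
static int cache_add(PENDING_CACHE *cache, MOCK_IRP *irp, void *buffer)
{
    if (irp == NULL || irp->MdlAddress == NULL)
        return -1;
    if (cache->count >= CACHE_SLOTS)
        return -1;
    cache->slot[cache->count].irp = irp;
    cache->slot[cache->count].buffer = buffer;
    cache->count++;
    return 0;
}

/* Models DispatchControl: recover the buffer address at entry, then
   hand the IRP to the cache. The NULL guard here is an addition for
   the sketch; the real driver maps unconditionally. */
static int dispatch_control(PENDING_CACHE *cache, MOCK_IRP *irp)
{
    void *buffer = irp->MdlAddress ? irp->MdlAddress->mapped : NULL;
    return cache_add(cache, irp, buffer);
}
```

In this model, an IRP that arrives with a NULL MdlAddress is rejected at cache_add, which is where the failure shows up in the driver. What the model cannot show is the field changing between the two functions, which is the behaviour in question.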

Questions: What could cause the MdlAddress pointer to be NULLed between
the initial MmGetSystemAddressForMdl call and the attempt to add to the
internal cache? Would moving the driver to the second edition version
of Oney’s framework address this?

On a related note: The userland query function has a timeout value and
the buffer provided for retrieving the query response is an automatic
variable. (Legacy library code… I’m not supposed to change it without
a lot of justification.) If a query should take longer than
user_timeout to complete, the buffer represented by the MdlAddress field
of the cached IRP will no longer exist. KdPrint statements in the
OnCancel routine never show up in the debug console, so presumably the
IRP is not cancelled. What happens to MdlAddress? (I’m theorizing that
timeouts may be occurring under heavy device loads… the symptom only
seems to occur with one user’s userland program, which is not
instrumented very well for debugging)

Many thanks in advance for any clues you may have to offer.
Roy M. Silvernail - xxxxx@parker.com