Hi Scott, thanks for your reply.
Actually, I’m using a TransferBuffer rather than a TransferBufferMDL. I suppose
it could not hurt to switch strategies, as a test.
Yes, I thought about what you mention (buffers/URBs being freed
prematurely), so that is why I focused on proving to myself that all
requests were indeed being cancelled (and touched on it in my second email).
Some ASIO programs do a rapid succession of start/stop during loading, so
cancellation would be stressed more, but this corruption has occurred in
cases before cancellation came into effect (unless I missed a DbgPrint on
the console- I’ll reconfirm that with a set breakpoint). When an input IRP
completes, the record data is copied, the URB is freed, and it signals my
system thread (which had been waiting, and has a low realtime priority) to
send down more input buffers up to a specified maximum. I opted for that
approach for fear of recursion when sending down buffers from the
completion routine (and scheduling a DPC is dangerous because of surprise
removal, and KeFlushQueuedDpcs is not supported on all OS versions). For now
I do not recycle the IRPs; I’ve done that before, but I have heard that IRPs
have their own lookasides, so I figured calling IoAllocateIrp/IoFreeIrp
would not be expensive. I do allocate my input/output URBs and buffers from
lookasides (though for now I reverted to ExAllocatePoolWithTag/ExFreePool,
so that verifier special pool applies- in fact the starting point of this
was that ExFreeToNPagedLookasideList was observed to hang, probably due to
corruption in its list). URBs are only freed in the completion routines,
so I can’t see how I could be freeing a request which is outstanding.
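To make the flow above concrete, here is a minimal user-mode sketch of the "completion signals a worker, worker tops up to a maximum" bookkeeping. It is not the driver code: the struct, the function names and the plain int used in place of a kernel event are all hypothetical, and a real driver would need interlocked operations or a spin lock around the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for input requests; all names are illustrative. */
typedef struct {
    size_t outstanding;   /* requests currently owned by the lower stack */
    size_t max_in_flight; /* the "specified maximum" of input buffers */
    int    wake_pending;  /* stands in for the event the worker waits on */
} input_queue;

/* Runs where the completion routine would: the record data has been copied
   and the URB freed, so only the count drops and the worker is signalled.
   Nothing is resubmitted from here, which avoids completion recursion. */
static void on_input_complete(input_queue *q)
{
    assert(q->outstanding > 0);
    q->outstanding--;
    q->wake_pending = 1;
}

/* Runs in the worker thread after it wakes: claims every free slot and
   returns how many new requests the caller should now send down. */
static size_t top_up(input_queue *q)
{
    size_t to_send = 0;
    q->wake_pending = 0;
    while (q->outstanding < q->max_in_flight) {
        q->outstanding++;
        to_send++;
    }
    return to_send;
}
```

With a maximum of 4, the first top_up() asks for 4 requests; after two completions the next top_up() asks for 2. Throttling for the stress test then amounts to lowering max_in_flight.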
I did mention cancellation in the second email, but I do not believe that it
is necessary in order to produce the corruption. I’ll confirm that.
I like your idea of IRP/URB to physical page listing, maybe that could help
me figure out which request resulted in observed corruption.
I can’t wait to figure this one out; it’s been driving me nuts, especially
since I can reproduce this only on one machine. Thanks for your reply, I do
appreciate it.
Philip Lukidis
-----Original Message-----
From: Scott Noone [mailto:xxxxx@osr.com]
Sent: Friday, July 08, 2005 5:48 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] question for USB isochronous I/O (input)- massive
corruption issue
Hi,
Never had this particular problem, though I can’t say that I’ve ever tested
in your configuration nor tried exactly what you’re trying.
It sounds to me like MDL abuse, if the MDLs that you’re passing down the USB
stack are freed/unlocked prematurely it could result in the random memory
corruption that you’re seeing. I can think of lots of goofy stuff that would
cause this to happen, but it sounds like this code is pretty well tested so
it’s probably something subtle (or some horrible HC issue). Are you
specifying a TransferBuffer or a TransferBufferMDL when you build your URBs?
Are you ping-ponging the IRPs (i.e. constantly resubmitting the IRPs again
after they have been completed) or just letting them be completed once and
throwing them away?
Also, you mention your cancel logic in this message but not the previous
one. Do you only see this memory corruption after you’ve cancelled requests?
As a last resort you might want to try keeping track of the IRP/URB pairs
and physical pages that they’re reading into. Once the system crashes you
could then try to match the scoobied physical page up with a particular
IRP/URB/MDL set and hope that some light is shed on the issue…
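The tracking idea above could look roughly like the following user-mode sketch: a small ring of records mapping a request tag to the physical pages of its buffer, plus a lookup by corrupted physical address. Everything here is an assumption for illustration (the names, the 4 KB page size, the ring sizes). In a driver the addresses might come from something like MmGetPhysicalAddress on each page of the transfer buffer, and the ring would have to live somewhere that survives into the crash dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KB pages */
#define MAX_TRACKED       64u  /* hypothetical ring size */
#define MAX_PAGES_PER_REQ 8u   /* hypothetical per-request page limit */

typedef struct {
    uint32_t request_id;             /* hypothetical tag for an IRP/URB pair */
    size_t   page_count;
    uint64_t pfn[MAX_PAGES_PER_REQ]; /* physical frame numbers of the buffer */
} tracked_request;

typedef struct {
    tracked_request slot[MAX_TRACKED];
    size_t next; /* ring index of the next slot to overwrite */
    size_t used; /* number of valid slots */
} page_log;

static void page_log_init(page_log *log)
{
    log->next = 0;
    log->used = 0;
}

/* Record the physical pages a request will be reading into. */
static void log_request(page_log *log, uint32_t id,
                        const uint64_t *phys_addrs, size_t count)
{
    tracked_request *r = &log->slot[log->next];
    assert(count <= MAX_PAGES_PER_REQ);
    r->request_id = id;
    r->page_count = count;
    for (size_t i = 0; i < count; i++)
        r->pfn[i] = phys_addrs[i] >> SKETCH_PAGE_SHIFT;
    log->next = (log->next + 1) % MAX_TRACKED;
    if (log->used < MAX_TRACKED)
        log->used++;
}

/* Search newest to oldest for a request whose buffer covered phys_addr.
   Returns 1 and stores the tag in *id_out on a match, otherwise 0. */
static int find_request_for_address(const page_log *log, uint64_t phys_addr,
                                    uint32_t *id_out)
{
    uint64_t pfn = phys_addr >> SKETCH_PAGE_SHIFT;
    for (size_t k = 0; k < log->used; k++) {
        size_t idx = (log->next + MAX_TRACKED - 1 - k) % MAX_TRACKED;
        const tracked_request *r = &log->slot[idx];
        for (size_t i = 0; i < r->page_count; i++) {
            if (r->pfn[i] == pfn) {
                *id_out = r->request_id;
                return 1;
            }
        }
    }
    return 0;
}
```

A miss on the lookup is informative too: it would suggest the DMA landed in a page that no tracked request ever described.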
HTH,
-scott
–
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com
“Philip Lukidis” wrote in message
news:xxxxx@ntdev…
> Some extra information, hope it helps place everything in context:
>
> - For recording audio, I always send down multiple input URB requests to the
> isoch endpoint so that I don’t lose data. My issue occurs only when I am
> not sending down enough input URB requests (as a stress test). I would
> normally expect the input data simply to be lost in that case.
> - The observed issue is that when I know that insufficient URBs for input
> have been sent down to the host controller, input packet data which I
> observe in my CATC analyzer is splattered in host memory- sometimes, but
> not always, producing the verifier C1 special pool BSOD. Sometimes other
> BSODs occur, like illegal operand and such, because code/data has been
> blasted to hell through DMA.
> - I always send down URB buffers whose size is an integral multiple of the
> maximum packet size. So if my maximum packet size is X, I always send down
> a URB buffer of size N*X, where N is determined by the user’s choice of the
> ASIO buffer size.
> - I always use the USBD_SHORT_TRANSFER_OK flag, and never use the
> USBD_START_ISO_TRANSFER_ASAP flag (meaning I enter the frame number myself).
> - This issue so far only occurs on one machine, with WinXP SP1 and WinXP
> SP2. Verifier’s special pool flushes it out all that much faster. The host
> controller is EHCI, and my device operates at full speed. Using the checked
> kernel/HAL/USB stack did not result in untoward warnings before the BSOD.
> - The machine where this has been observed is an ASUS Springdale mobo, with
> a hyperthreading 3.0 GHz CPU. The issue occurs whether hyperthreading is on
> or off. In any case, the driver has been tested extensively on
> dual/hyperthreading machines, and the issue so far has been observed on this
> machine alone- WHEN I throttle the number of input URBs sent down.
> - Testing on OHCI machines has not reproduced the issue so far.
>
> When I need to cancel my requests, I have a cancelAllRequests() routine
> which calls IoCancelIrp for all outstanding IRPs, and which synchronizes
> with the IRP completion routine. When all requests have come back up, I
> send down URB_FUNCTION_ABORT_PIPE (perhaps redundant, but I wanted to make
> sure). This code is pretty old, seems to be fine on inspection, and has
> passed testing on checked HAL/kernel machines with verifier enabled- UP, SMP
> and HT. The code is based on Walter Oney’s example for canceling async IRPs
> that the driver itself created (page 284 of his second edition book), but
> extended for multiple IRPs down at once.
>
> Are there any suggestions at all how this issue could occur…what could I
> be doing to provoke this? (besides passing down a bad buffer pointer- that
> code has been reviewed, and extensive verifier/checked build stress tests
> have been done).
>
> thanks,
>
> Philip Lukidis
>
> -----Original Message-----
> From: Philip Lukidis
> Sent: Thursday, July 07, 2005 4:39 PM
> To: ‘xxxxx@lists.osr.com’
> Subject: question for USB isochronous I/O (input)- massive corruption
> issue
>
>
> Hello. I’m having a strange issue with my device on WinXP SP2, on an Intel
> Springdale mobo. My device is a USB audio device, for which I have written
> an ASIO audio driver, whose lower layer is a kernel USB driver, which
> performs the necessary input and output isochronous I/O. On the
> aforementioned mobo, where the device is plugged into an EHCI port, I
> see from time to time that isochronous I/O (input) is giving me enormous
> host memory corruption issues when my USB driver has not fed the host
> controller with enough input URBs (deliberate for stress testing). When I
> make sure to keep the host controller well supplied with input URBs, all is
> (seemingly) well.
>
> However, when the host controller is starved of input URBs (perhaps for
> hundreds of milliseconds), thereafter I see enormous host memory corruption.
> The BSODs are so many in variety that it would be (I think at least)
> pointless to post them. I’ve been using the checked HAL/kernel/USB stack,
> with verifier enabled for my driver, but that did not help pinpoint the
> issue. For USB input URB management, I do use the USBD_SHORT_TRANSFER_OK
> flag, and I calculate my USB frames rather than use the
> USBD_START_ISO_TRANSFER_ASAP flag. So far, I have not duplicated this issue
> on any machine except one- a Springdale (ASUS), 3 GHz, 512 MB DDR RAM.
>
> The latest finding is interesting. My latest BSOD pointed towards a region
> of nuked memory, which had a data pattern identical to an input USB packet
> data pattern recorded on my CATC USB bus analyzer. So it looks to me that
> starving the host controller of input URBs (seemingly on that one machine so
> far) results in improper DMA to host memory…Any takers as to why this may
> be, or any suggestions at all would be welcome.
>
> thanks,
>
> Philip Lukidis
>
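For readers following the quoted description of URB sizing (buffers of N times the maximum packet size) and explicit frame numbering instead of USBD_START_ISO_TRANSFER_ASAP, the arithmetic can be sketched as below. This is only an illustration under stated assumptions, not the poster's code: it assumes a full-speed endpoint with one packet per 1 ms frame, and the "lead_frames" margin and the restart policy after starvation are hypothetical choices.

```c
#include <assert.h>
#include <stdint.h>

/* Buffer size for one URB: N packets, each of the maximum packet size X. */
static uint32_t iso_buffer_size(uint32_t packets_per_urb,
                                uint32_t max_packet_size)
{
    return packets_per_urb * max_packet_size;
}

/* Byte offset of packet i inside that buffer. */
static uint32_t iso_packet_offset(uint32_t i, uint32_t max_packet_size)
{
    return i * max_packet_size;
}

/* Start frame for the next URB, assuming one packet per frame.
   Normally the next URB starts where the previous one ends. If that frame
   is already behind (current_frame + lead_frames), e.g. after the stream
   was starved, restart from the current frame plus the margin instead.
   Unsigned subtraction keeps the comparison valid across counter wrap. */
static uint32_t next_start_frame(uint32_t prev_start,
                                 uint32_t packets_per_urb,
                                 uint32_t current_frame,
                                 uint32_t lead_frames)
{
    uint32_t candidate = prev_start + packets_per_urb;
    uint32_t earliest = current_frame + lead_frames;
    if (candidate - earliest > 0x7fffffffu) /* candidate is in the past */
        candidate = earliest;
    return candidate;
}
```

With X = 192 bytes and N = 8, each URB carries a 1536-byte buffer. The starvation case is the interesting one for this thread: a start frame computed purely from the previous URB would point into the past once the controller has been idle for hundreds of frames.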