RE: question for USB isochronous I/O (input)- massive corruption issue

Some extra information, hope it helps place everything in context:

  • For recording audio, I always send down multiple URB IN requests to the
    isoch endpoint so that I don’t lose data. My issue occurs only when I am
    not sending down enough URB IN requests (as a stress test). My normal
    assumption is that the input data would simply be lost.
  • The observed issue is that when I know that insufficient URBs for input
    have been sent down to the host controller, input packet data which I
    observe in my CATC analyzer is splattered in host memory- sometimes, but
    not always, producing the verifier C1 special pool BSOD. Sometimes other
    BSODs occur, like illegal operand and such, because code/data has been
    blasted to hell through DMA.
  • I always send down URB buffers whose size is an integral multiple of the
    maximum packet size. So if my maximum packet size is X, I always send down
    a URB buffer of size NX, where N is determined by the user’s choice of the
    ASIO buffer size.
  • I always use the USBD_SHORT_TRANSFER_OK flag, and never use the
    USBD_START_ISO_TRANSFER_ASAP flag (meaning I enter the frame number myself)
  • This issue so far only occurs on one machine, with WinXP SP1 and WinXP
    SP2. Verifier’s special pool flushes it out that much faster. The host
    controller is EHCI, and my device operates at full speed. Using the checked
    kernel/HAL/USB stack did not result in untoward warnings before the BSOD.
  • The machine where this has been observed is an ASUS Springdale mobo, with
    a hyperthreading 3.0 GHz CPU. The issue occurs whether hyperthreading is on
    or off. In any case, the driver has been tested extensively on
    dual/hyperthreading machines, and the issue so far has been observed on this
    machine alone- WHEN I throttle the amount of input URBs sent down.
  • Testing on OHCI machines has not reproduced the issue so far.
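
To make the buffer sizing and explicit frame numbering described in the list above concrete, here is a minimal user-mode sketch in C. It is not the driver code from this thread. The names iso_packet_t, layout_iso_packets and next_start_frame are hypothetical, and the resync rule (jump ahead by a fixed lead when the wanted frame is already in the past) is an assumption about how a driver might recover after starvation. In a real driver the offsets would go into the URB's USBD_ISO_PACKET_DESCRIPTOR entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one isochronous packet descriptor. */
typedef struct {
    uint32_t offset;   /* byte offset of this packet inside the transfer buffer */
} iso_packet_t;

/* Fill `count` packet offsets for a buffer of count * max_packet bytes
 * (N packets of size X). Returns the total buffer size in bytes. */
static uint32_t layout_iso_packets(iso_packet_t *pkts, uint32_t count,
                                   uint32_t max_packet)
{
    assert(pkts != NULL && count > 0 && max_packet > 0);
    for (uint32_t i = 0; i < count; i++)
        pkts[i].offset = i * max_packet;
    return count * max_packet;
}

/* Pick the start frame for the next URB when the ASAP flag is not used.
 * Assumes one packet per frame (full speed). Normally the next URB starts
 * right after the previous one. If that frame is closer than `lead` frames
 * to the current frame, or already in the past (the starved case), resync
 * to current_frame + lead instead. */
static uint32_t next_start_frame(uint32_t prev_start, uint32_t prev_count,
                                 uint32_t current_frame, uint32_t lead)
{
    uint32_t wanted = prev_start + prev_count;
    /* The signed difference keeps the comparison valid across wraparound. */
    if ((int32_t)(wanted - current_frame) < (int32_t)lead)
        wanted = current_frame + lead;
    return wanted;
}
```

With X = 192 and N = 4, layout_iso_packets yields offsets 0, 192, 384, 576 and a 768-byte buffer. The point of the second helper is that after a starvation period the naive "previous start plus count" frame number is stale, so whatever the driver submits next has to be checked against the current frame.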

When I need to cancel my requests, I have a cancelAllRequests() routine
which calls IoCancelIrp for all outstanding IRPs, and which synchronizes
with the IRP completion routine. When all requests have come back up, I
call URB_FUNCTION_ABORT_PIPE (perhaps redundant, but I wanted to make sure).
This code is pretty old, and seems to be fine on inspection, and has passed
testing on checked HAL/kernel machines with verifier enabled- UP, SMP and
HT. The code is based on Walter Oney’s example for canceling async IRPs
(page 284 of his second ed. book), but extended for multiple IRPs down at
once.
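
The ordering described above (cancel everything, wait for every request to come back, only then abort the pipe) can be modeled in user mode with a counter and a flag. This is a hedged sketch using C11 atomics, not Oney's code and not the driver from this thread. pipe_state_t, try_submit, on_complete, begin_cancel and safe_to_abort are all hypothetical names, and a real driver would use kernel primitives (spin locks, events) rather than stdatomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int  outstanding;  /* requests currently owned by the lower stack */
    atomic_bool cancelling;   /* set once cancellation has begun */
} pipe_state_t;

static void pipe_init(pipe_state_t *p)
{
    atomic_init(&p->outstanding, 0);
    atomic_init(&p->cancelling, false);
}

/* Called before sending a request down. Counts the request first and then
 * checks the flag, so a canceller that later observes zero outstanding
 * requests knows no new request can still slip through. */
static bool try_submit(pipe_state_t *p)
{
    atomic_fetch_add(&p->outstanding, 1);
    if (atomic_load(&p->cancelling)) {
        atomic_fetch_sub(&p->outstanding, 1);  /* back out, nothing was sent */
        return false;
    }
    return true;
}

/* Called from the completion path for every request that came back up. */
static void on_complete(pipe_state_t *p)
{
    int before = atomic_fetch_sub(&p->outstanding, 1);
    assert(before > 0);
    (void)before;
}

/* Called at the start of the cancel-all routine, before cancelling IRPs. */
static void begin_cancel(pipe_state_t *p)
{
    atomic_store(&p->cancelling, true);
}

/* True only once cancellation has started and every request has returned;
 * this is the point at which aborting the pipe would be issued. */
static bool safe_to_abort(pipe_state_t *p)
{
    return atomic_load(&p->cancelling) && atomic_load(&p->outstanding) == 0;
}
```

The model also shows why buffers must stay valid until on_complete has run for each request: until the counter reaches zero, the lower stack may still own the memory.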

Are there any suggestions at all how this issue could occur…what could I
be doing to provoke this? (besides passing down a bad buffer pointer- that
code has been reviewed, and extensive verifier/checked build stress tests
have been done).

thanks,

Philip Lukidis

-----Original Message-----
From: Philip Lukidis
Sent: Thursday, July 07, 2005 4:39 PM
To: ‘xxxxx@lists.osr.com’
Subject: question for USB isochronous I/O (input)- massive corruption
issue

Hello. I’m having a strange issue with my device on WinXP SP2, on an Intel
Springdale mobo. My device is a USB audio device, for which I have written
an ASIO audio driver, whose lower layer is a kernel USB driver, which
performs the necessary input and output isochronous I/O. On the
aforementioned mobo, with the device plugged into an EHCI port, I
see from time to time that isochronous I/O (input) is giving me enormous
host memory corruption issues when my USB driver has not fed the host
controller with enough input URBs (deliberate for stress testing). When I
make sure to keep the host controller well supplied with input URBs, all is
(seemingly) well.

However, when the host controller is starved of input URBs (perhaps for
hundreds of milliseconds), thereafter I see enormous host memory corruption.
The BSODs are so many in variety that it would be (I think at least)
pointless to post them. I’ve been using the checked HAL/kernel/USB stack,
with verifier enabled for my driver, but that did not help pinpoint the
issue. For USB input URB management, I do use the USBD_SHORT_TRANSFER_OK
flag, and I calculate my USB frames rather than use the
USBD_START_ISO_TRANSFER_ASAP flag. So far, I have not duplicated this issue
on any machine except one: a Springdale (ASUS), 3 GHz, 512 MB DDR RAM.

The latest finding is interesting. My latest BSOD pointed towards a region
of nuked memory, which had a data pattern identical to an input USB packet
data pattern recorded on my CATC USB bus analyzer. So it looks to me that
starving the host controller of input URBs (seemingly on that one machine so
far) results in improper DMA to host memory…Any takers as to why this may
be, or any suggestions at all would be welcome.

thanks,

Philip Lukidis

Hi,

Never had this particular problem, though I can’t say that I’ve ever tested
in your configuration nor tried exactly what you’re trying :)

It sounds to me like MDL abuse: if the MDLs that you’re passing down the USB
stack are freed/unlocked prematurely, it could result in the random memory
corruption that you’re seeing. I can think of lots of goofy stuff that would
cause this to happen, but it sounds like this code is pretty well tested so
it’s probably something subtle (or some horrible HC issue). Are you
specifying a TransferBuffer or a TransferBufferMDL when you build your URBs?
Are you ping-ponging the IRPs (i.e. constantly resubmitting the IRPs again
after they have been completed) or just letting them be completed once and
throwing them away?

Also, you mention your cancel logic in this message but not the previous
one. Do you only see this memory corruption after you’ve cancelled requests?

As a last resort you might want to try keeping track of the IRP/URB pairs
and physical pages that they’re reading into. Once the system crashes you
could then try to match the scoobied physical page up with a particular
IRP/URB/MDL set and hope that some light is shed on the issue…
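
A rough idea of what such tracking could look like, as a user-mode C sketch only: a small ring of records, each holding a tag for the IRP/URB pair and the page frame numbers its buffer covers, plus a lookup from a corrupted physical address back to the most recent matching record. All names here (tracker_t, tracker_record, tracker_find) are hypothetical, and the 4 KB page size is an assumption. In a real driver the PFNs would come from the MDL (for example via MmGetMdlPfnArray), and the table would live in nonpaged memory so it can be inspected in the crash dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACKED        64   /* ring size: how much history to keep */
#define MAX_PAGES_PER_REQ  8    /* assumed upper bound on pages per buffer */
#define PAGE_SHIFT_ASSUMED 12   /* assumes 4 KB pages */

typedef struct {
    uint32_t request_id;               /* tag identifying the IRP/URB pair */
    uint32_t page_count;
    uint64_t pfn[MAX_PAGES_PER_REQ];   /* physical page frame numbers */
} tracked_request_t;

typedef struct {
    tracked_request_t slots[MAX_TRACKED];
    uint32_t next;                     /* total records written so far */
} tracker_t;

/* Record the pages for one submitted request, overwriting the oldest slot. */
static void tracker_record(tracker_t *t, uint32_t id,
                           const uint64_t *pfns, uint32_t n)
{
    assert(t != NULL && pfns != NULL);
    tracked_request_t *r = &t->slots[t->next % MAX_TRACKED];
    if (n > MAX_PAGES_PER_REQ)
        n = MAX_PAGES_PER_REQ;
    r->request_id = id;
    r->page_count = n;
    memcpy(r->pfn, pfns, n * sizeof(pfns[0]));
    t->next++;
}

/* Find the most recent request whose buffer covered the page containing
 * phys_addr, or NULL if no tracked request used that page. */
static const tracked_request_t *tracker_find(const tracker_t *t,
                                             uint64_t phys_addr)
{
    uint64_t pfn = phys_addr >> PAGE_SHIFT_ASSUMED;
    uint32_t filled = t->next < MAX_TRACKED ? t->next : MAX_TRACKED;
    for (uint32_t back = 1; back <= filled; back++) {
        const tracked_request_t *r = &t->slots[(t->next - back) % MAX_TRACKED];
        for (uint32_t i = 0; i < r->page_count; i++)
            if (r->pfn[i] == pfn)
                return r;
    }
    return NULL;
}
```

If the scribbled page never matches any record, that by itself is informative: it would suggest the controller wrote somewhere the driver never handed it.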

HTH,

-scott


Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com
