RE: question for USB isochronous I/O (input)- massive corruption issue

OSR_Community_User · July 8, 2005, 6:28pm

Hi Scott, thanks for your reply.

Actually, I’m using a TransferBuffer instead of a TransferBufferMDL. I
suppose it could not hurt to switch strategies, as a test.

Yes, I thought about what you mention (buffers/URBs being freed
prematurely), which is why I focused on proving to myself that all
requests were indeed being cancelled (and touched on it in my second email).
Some ASIO programs do a rapid succession of start/stop during loading, so
cancellation would be stressed more, but this corruption has occurred in
cases before cancellation came into effect (unless I missed a DbgPrint on
the console- I’ll reconfirm that with a breakpoint).

When an input IRP completes, the record data is copied, the URB is freed,
and the completion routine signals my system thread (which had been waiting,
and has a low realtime priority) to send down more input buffers up to a
specified maximum. I opted for that approach for fear of recursion when
sending down buffers from the completion routine; scheduling a DPC is
dangerous because of surprise removal, and KeFlushQueuedDpcs is not
supported on all OS versions.

For now I do not recycle the IRPs; I’ve done that before, but I have heard
that IRPs have their own lookasides, so I figured calling
IoAllocateIrp/IoFreeIrp would not be expensive. I do allocate my
input/output URBs and buffers from lookasides (though for now I have
reverted to ExAllocatePoolWithTag/ExFreePool so that verifier special pool
applies- in fact the starting point of all this was that
ExFreeToNPagedLookasideList was observed to hang, probably due to
corruption in its list). URBs are only freed in the completion routines,
so I can’t see how I could be freeing a request which is outstanding.

I did mention cancellation in the second email, but I do not believe that it
is necessary in order to produce the corruption. I’ll confirm that.

I like your idea of IRP/URB to physical page listing, maybe that could help
me figure out which request resulted in observed corruption.

I can’t wait to figure this one out, it’s been driving me nuts, especially
since I can reproduce this only on one machine. Thanks for your reply, I do
appreciate it.

Philip Lukidis

-----Original Message-----
From: Scott Noone [mailto:xxxxx@osr.com]
Sent: Friday, July 08, 2005 5:48 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] question for USB isochronous I/O (input)- massive
corruption issue

Hi,

Never had this particular problem, though I can’t say that I’ve ever tested
in your configuration nor tried exactly what you’re trying.

It sounds to me like MDL abuse: if the MDLs that you’re passing down the USB
stack are freed/unlocked prematurely, it could result in the random memory
corruption that you’re seeing. I can think of lots of goofy stuff that would
cause this to happen, but it sounds like this code is pretty well tested so
it’s probably something subtle (or some horrible HC issue). Are you
specifying a TransferBuffer or a TransferBufferMDL when you build your URBs?

Are you ping-ponging the IRPs (i.e. constantly resubmitting the IRPs again
after they have been completed) or just letting them be completed once and
throwing them away?

Also, you mention your cancel logic in this message but not the previous
one. Do you only see this memory corruption after you’ve cancelled requests?

As a last resort you might want to try keeping track of the IRP/URB pairs
and physical pages that they’re reading into. Once the system crashes you
could then try to match the scoobied physical page up with a particular
IRP/URB/MDL set and hope that some light is shed on the issue…
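
One way to set up the IRP/URB-to-physical-page bookkeeping suggested above is a small in-memory log that is still readable from the crash dump. What follows is a minimal user-mode sketch in C, not driver code. The names track_log, track_submit, track_complete and track_find_page are hypothetical, and the page frame numbers are plain integers here; in a real driver they would presumably be read from the buffer's MDL (for example with MmGetPfnArrayForMdl) at submit time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_SLOTS     64   /* hypothetical: number of records kept (ring) */
#define TRACK_MAX_PAGES 8    /* hypothetical: max pages per transfer buffer */

/* One record per submitted request. */
typedef struct {
    const void *irp;                 /* stands in for the PIRP value        */
    const void *urb;                 /* stands in for the PURB value        */
    uint64_t pfn[TRACK_MAX_PAGES];   /* physical page frames of the buffer  */
    size_t pfn_count;
    int in_flight;                   /* 1 while the request is outstanding  */
} track_entry;

/* Must be zero-initialized before first use. */
typedef struct {
    track_entry slot[TRACK_SLOTS];
    size_t next;                     /* next slot to overwrite              */
} track_log;

/* Record a request just before it is sent down; returns its slot index.
   Note: this sketch overwrites the oldest slot without checking whether
   that request is still outstanding. */
static size_t track_submit(track_log *log, const void *irp, const void *urb,
                           const uint64_t *pfn, size_t pfn_count)
{
    size_t idx = log->next;
    track_entry *e = &log->slot[idx];
    size_t i;

    if (pfn_count > TRACK_MAX_PAGES)
        pfn_count = TRACK_MAX_PAGES;
    e->irp = irp;
    e->urb = urb;
    for (i = 0; i < pfn_count; i++)
        e->pfn[i] = pfn[i];
    e->pfn_count = pfn_count;
    e->in_flight = 1;
    log->next = (idx + 1) % TRACK_SLOTS;
    return idx;
}

/* Mark the request complete but keep the record for post-mortem analysis. */
static void track_complete(track_log *log, size_t idx)
{
    log->slot[idx].in_flight = 0;
}

/* Return the most recent record whose buffer covered the given page frame,
   or NULL if no recorded request ever covered it. */
static const track_entry *track_find_page(const track_log *log, uint64_t pfn)
{
    size_t n, i;

    for (n = 0; n < TRACK_SLOTS; n++) {
        /* Walk backwards, starting from the newest record. */
        size_t idx = (log->next + TRACK_SLOTS - 1 - n) % TRACK_SLOTS;
        const track_entry *e = &log->slot[idx];

        for (i = 0; i < e->pfn_count; i++)
            if (e->pfn[i] == pfn)
                return e;
    }
    return NULL;
}
```

After a crash, looking up the corrupted page either names the request that owned it (and whether it was still in flight), or returns NULL, which would suggest that the DMA landed in a page that no tracked request ever covered.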

HTH,

-scott

–
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Philip Lukidis” wrote in message
news:xxxxx@ntdev…
> Some extra information, hope it helps place everything in context:
>
> - For recording audio, I always send down multiple input URBs to the
> isoch endpoint so that I don’t lose data. My issue occurs only when I am
> not sending down enough input URBs (as a stress test). My normal
> assumption is that the input data would simply be lost.
> - The observed issue is that when I know that insufficient input URBs
> have been sent down to the host controller, input packet data which I
> observe in my CATC analyzer is splattered in host memory- sometimes, but
> not always, producing the verifier C1 special pool BSOD. Sometimes other
> BSODs occur, like illegal operand and such, because code/data has been
> blasted to hell through DMA.
> - I always send down URB buffers whose size is an integral multiple of the
> maximum packet size. So if my maximum packet size is X, I always send down
> a URB buffer of size N*X, where N is determined by the user’s choice of
> the ASIO buffer size.
> - I always use the USBD_SHORT_TRANSFER_OK flag, and never use the
> USBD_START_ISO_TRANSFER_ASAP flag (meaning I enter the frame number
> myself).
> - This issue so far only occurs on one machine, with WinXP SP1 and WinXP
> SP2. Verifier’s special pool flushes it out all that much faster. The host
> controller is EHCI, and my device operates at full speed. Using the
> checked kernel/HAL/USB stack did not result in untoward warnings before
> the BSOD.
> - The machine where this has been observed is an ASUS Springdale mobo,
> with a hyperthreading 3.0 GHz CPU. The issue occurs whether hyperthreading
> is on or off. In any case, the driver has been tested extensively on
> dual/hyperthreading machines, and the issue so far has been observed on
> this machine alone- WHEN I throttle the number of input URBs sent down.
> - Testing on OHCI machines has not reproduced the issue so far.
>
> When I need to cancel my requests, I have a cancelAllRequests() routine
> which calls IoCancelIrp for all outstanding IRPs, and which synchronizes
> with the IRP completion routine. When all requests have come back up, I
> send URB_FUNCTION_ABORT_PIPE (perhaps redundant, but I wanted to make
> sure). This code is pretty old, seems to be fine on inspection, and has
> passed testing on checked HAL/kernel machines with verifier enabled- UP,
> SMP and HT. The code is based on Walter Oney’s example for canceling async
> IRPs (page 284 of his second ed. book), but extended for multiple IRPs
> down at once.
>
> Are there any suggestions at all as to how this issue could occur…what
> could I be doing to provoke this? (Besides passing down a bad buffer
> pointer- that code has been reviewed, and extensive verifier/checked build
> stress tests have been done.)
>
> thanks,
>
> Philip Lukidis
>
> -----Original Message-----
> From: Philip Lukidis
> Sent: Thursday, July 07, 2005 4:39 PM
> To: ‘xxxxx@lists.osr.com’
> Subject: question for USB isochronous I/O (input)- massive corruption
> issue
>
>
> Hello. I’m having a strange issue with my device on WinXP SP2, on an Intel
> Springdale mobo. My device is a USB audio device, for which I have written
> an ASIO audio driver, whose lower layer is a kernel USB driver, which
> performs the necessary input and output isochronous I/O. On the
> aforementioned mobo, where the device is plugged into an EHCI port, I
> see from time to time that isochronous I/O (input) is giving me enormous
> host memory corruption issues when my USB driver has not fed the host
> controller with enough input URBs (deliberate for stress testing). When I
> make sure to keep the host controller well supplied with input URBs, all
> is (seemingly) well.
>
> However, when the host controller is starved of input URBs (perhaps for
> hundreds of milliseconds), thereafter I see enormous host memory
> corruption. The BSODs are so many in variety that it would be (I think at
> least) pointless to post them. I’ve been using the checked HAL/kernel/USB
> stack, with verifier enabled for my driver, but that did not help pinpoint
> the issue. For USB input URB management, I do use the
> USBD_SHORT_TRANSFER_OK flag, and I calculate my USB frames rather than use
> the USBD_START_ISO_TRANSFER_ASAP flag. So far, I have not duplicated this
> issue on any machine except one- a Springdale (ASUS), 3 GHz, 512 MB DDR
> RAM.
>
> The latest finding is interesting. My latest BSOD pointed towards a region
> of nuked memory, which had a data pattern identical to an input USB packet
> data pattern recorded on my CATC USB bus analyzer. So it looks to me that
> starving the host controller of input URBs (seemingly on that one machine
> so far) results in improper DMA to host memory…Any takers as to why this
> may be? Any suggestions at all would be welcome.
>
> thanks,
>
> Philip Lukidis
>
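
The buffer sizing and manual frame numbering described in the quoted messages can be written out as simple arithmetic. This is a sketch under assumptions, not the driver's actual code: it assumes a full-speed isochronous endpoint with one packet per 1 ms frame, and the names iso_plan, plan_iso_request, iso_packet_offset, next_start_frame and resync_start_frame are hypothetical. The resync step is one possible way to handle the starved case, where the frame computed for the next request may already be in the past; in a driver the current frame would be queried from the bus driver (for example with URB_FUNCTION_GET_CURRENT_FRAME_NUMBER).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t start_frame;    /* frame in which the first packet is scheduled */
    uint32_t packet_count;   /* N: one packet per frame (full-speed assumed) */
    uint32_t buffer_length;  /* N * X bytes                                  */
} iso_plan;

/* Size a request as N packets of the maximum packet size X. */
static iso_plan plan_iso_request(uint32_t start_frame,
                                 uint32_t max_packet_size,
                                 uint32_t packets)
{
    iso_plan p;

    p.start_frame = start_frame;
    p.packet_count = packets;
    p.buffer_length = packets * max_packet_size;
    return p;
}

/* Byte offset of packet `index` inside the transfer buffer. */
static uint32_t iso_packet_offset(uint32_t max_packet_size, uint32_t index)
{
    return index * max_packet_size;
}

/* Frame at which the following request should start so that the stream
   is contiguous. */
static uint32_t next_start_frame(const iso_plan *p)
{
    return p->start_frame + p->packet_count;
}

/* If the planned frame has already passed (or is closer than `lead` frames
   away), move it to current_frame + lead. The signed difference keeps the
   comparison correct across frame counter wraparound. */
static uint32_t resync_start_frame(uint32_t planned, uint32_t current_frame,
                                   uint32_t lead)
{
    uint32_t earliest = current_frame + lead;

    if ((int32_t)(planned - earliest) < 0)
        return earliest;
    return planned;
}
```

For example, with a made-up maximum packet size of 192 bytes and N = 8, each request carries 1536 bytes and the next request starts 8 frames after the previous one, unless the stream has fallen behind and has to be resynchronized.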

—
Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@guillemot.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Scott_Noone_OSR · July 8, 2005, 6:58pm

Hi,

When you just specify a TransferBuffer, the USB stack will build an MDL for
you and stick it in TransferBufferMDL so I always figure that you might as
well do it yourself and save him the work.

Note that before your completion routine runs this MDL will have been freed
but NOT set to NULL in the URB (you can verify this for yourself, dump the
URB in the completion routine and a poison MDL is in the URB like magic!).
I’ve seen people have issues very similar to yours when they recycle URBs
but don’t properly reinitialize them before sending them down again. The USB
stack sees that the TransferBufferMDL isn’t NULL and starts using the freed
MDL (which, as luck would have it, still points to valid memory because it’s
on a lookaside list) and things eventually go kaboom when the pages get
recycled. It doesn’t sound like you’re doing any recycling here, but I just
wanted to note it because it’s something to be aware of.
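
The reinitialization point can be illustrated with a stand-in structure. This is a sketch under stated assumptions: fake_iso_urb is a hypothetical simplification and not the real URB layout from usb.h, and prepare_urb is an invented helper. The only idea shown is that every field, including the MDL pointer left behind by the previous submission, is reset before the URB goes down again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the transfer-related fields of a URB. */
typedef struct {
    void *transfer_buffer;
    void *transfer_buffer_mdl;
    unsigned long transfer_buffer_length;
    unsigned long transfer_flags;
} fake_iso_urb;

/* Fully reinitialize a (possibly recycled) URB before each submission. */
static void prepare_urb(fake_iso_urb *urb, void *buffer,
                        unsigned long length, unsigned long flags)
{
    /* Wipe everything first so that no stale pointer survives reuse. */
    memset(urb, 0, sizeof(*urb));

    urb->transfer_buffer = buffer;
    urb->transfer_buffer_length = length;
    urb->transfer_flags = flags;

    /* Only the buffer pointer is supplied, so the MDL pointer must be
       NULL when the request is sent down. */
    assert(urb->transfer_buffer_mdl == NULL);
}
```

The assert mirrors the kind of check one could place just before the request is submitted; in a driver it would be an ASSERT on the real TransferBufferMDL field.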

I’d definitely double check whether or not cancel is required to repro; a
race there (or hardware that doesn’t like its in-progress I/O cancelled)
seems like a likely candidate.

-scott

–
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com

Tim_Roberts · July 8, 2005, 7:03pm

Philip Lukidis wrote:

> I can’t wait to figure this one out, it’s been driving me nuts, especially
> since I can reproduce this only on one machine. Thanks for your reply, I do
> appreciate it.

Do not overlook the possibility that the host controller in this machine
is simply defective. All it would take is for the DMA machinery to miss
one “stop” signal, and you’d have disaster. Do you have a second sample
of this exact same motherboard?

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

OSR_Community_User · July 8, 2005, 8:34pm

Thanks again for your reply. I’ll double check the cancelation part being
required or not for this to occur.

Yes, I noticed the poison MDL in the past. I don’t recycle the URBs myself,
but use a lookaside instead. I thought that this would be as efficient. In
the past I’ve done the recycling myself, but questioned its worth. In any
case, each time I allocate a URB from the lookaside I set the
TransferBufferMDL to NULL after I have set the TransferBuffer pointer. But
thanks for bringing that up; I might as well put an ASSERT in that section
to make sure that it is NULL.

If cancelation is the issue, then a possibly easier solution than canceling
the IRPs and synchronizing with the completion routine would be to call
abort on the pipe, and wait for IRPs to complete. Or is calling IoCancelIrp
required?
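
Whichever way the requests are flushed, the "wait for IRPs to complete" part can be sketched as a biased counter. This is a user-mode illustration in C11, assuming hypothetical names (drain_tracker and the drain_* helpers); in a driver the counter would presumably use InterlockedIncrement/InterlockedDecrement and the flag would be a kernel event that the stopping thread waits on. It does not answer whether IoCancelIrp is required in addition to aborting the pipe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int outstanding;  /* starts at 1: a bias held until stop begins */
    atomic_bool drained;     /* stands in for a kernel event               */
} drain_tracker;

static void drain_init(drain_tracker *t)
{
    atomic_init(&t->outstanding, 1);
    atomic_init(&t->drained, false);
}

/* Drop one reference; whoever drops the last one signals "drained". */
static void drain_release(drain_tracker *t)
{
    if (atomic_fetch_sub(&t->outstanding, 1) == 1)
        atomic_store(&t->drained, true);
}

/* Call before each request is sent down. No new requests may be
   submitted once drain_begin_stop has been called. */
static void drain_on_submit(drain_tracker *t)
{
    atomic_fetch_add(&t->outstanding, 1);
}

/* Call from the completion routine of each request. */
static void drain_on_complete(drain_tracker *t)
{
    drain_release(t);
}

/* Call once when stopping, after the abort has been issued:
   drops the initial bias. */
static void drain_begin_stop(drain_tracker *t)
{
    drain_release(t);
}

static bool drain_is_drained(drain_tracker *t)
{
    return atomic_load(&t->drained);
}
```

The bias of 1 keeps the "drained" signal from firing during normal streaming when the count of requests momentarily reaches zero; it can only fire after the stop path has released the bias and the last completion has come back.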

Anyways, thanks for the pointers.

Philip Lukidis

OSR_Community_User · July 8, 2005, 8:49pm

Unfortunately I don’t have a second sample, but I’ll consider this
possibility if no other cause is found. So far, it has been a reliable test
machine except for this one issue.

thanks for your reply,

Philip Lukidis

-----Original Message-----
From: Tim Roberts
To: Windows System Software Devs Interest List
Sent: 07/08/2005 7:02 PM
Subject: Re: [ntdev] question for USB isochronous I/O (input)- massive
corruption issue

Philip Lukidis wrote:

> I can’t wait to figure this one out, it’s been driving me nuts, especially
> that I can reproduce this only on one machine. Thanks for your reply, I do
> appreciate it.

Do not overlook the possibility that the host controller in this machine
is simply defective. All it would take is for the DMA machinery to miss
one “stop” signal, and you’d have disaster. Do you have a second sample
of this exact same motherboard?

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

OSR_Community_User · July 11, 2005, 11:26am

Well, it turns out that I don’t have to call IoCancelIrp for this corruption
to occur. In fact, I have removed it, and I now use abort on the pipe, and
I wait for all IRPs to come up. This is by far easier, but it did not help.

I’ll now mention what *did* make the problem disappear: I was using a USB
2.0 hub for convenience, which I realized that I never used on any other
test machine. Well, plugging directly into the mobo USB ports made the
issue disappear. Any ideas why?

thanks,

Philip Lukidis

-----Original Message-----
From: Scott Noone [mailto:xxxxx@osr.com]
Sent: Friday, July 08, 2005 5:48 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] question for USB isochronous I/O (input)- massive
corruption issue

Hi,

Never had this particular problem, though I can’t say that I’ve ever tested
in your configuration nor tried exactly what you’re trying.

It sounds to me like MDL abuse, if the MDLs that you’re passing down the USB
stack are freed/unlocked prematurely it could result in the random memory
corruption that you’re seeing. I can think of lots of goofy stuff that would
cause this to happen, but it sounds like this code is pretty well tested so
it’s probably something subtle (or some horrible HC issue). Are you
specifying a TransferBuffer or a TransferBufferMDL when you build your URBs?

Are you ping-ponging the IRPs (i.e. constantly resubmitting the IRPs again
after they have been completed) or just letting them be completed once and
throwing them away?

Also, you mention your cancel logic in this message but not the previous
one. Do you only see this memory corruption after you’ve cancelled requests?

As a last resort you might want to try keeping track of the IRP/URB pairs
and physical pages that they’re reading into. Once the system crashes you
could then try to match the scoobied physical page up with a particular
IRP/URB/MDL set and hope that some light is shed on the issue…
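
One way to keep the IRP/URB-to-physical-page record suggested above is a small ring log that is filled at submission time and searched after a crash. The sketch below is plain user-mode C so that it can be compiled anywhere; the names (`track_log`, `track_submit`, `track_complete`, `track_find`) and the ring and page limits are invented for illustration and are not part of any WDK API. In a real driver the page addresses would presumably come from something like MmGetPhysicalAddress on each page of the TransferBuffer, and the log would need a spin lock, since completion routines can run at DISPATCH_LEVEL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACK_PAGE_SIZE 4096u /* x86 page size */
#define TRACK_MAX_PAGES 8u    /* assumed cap on pages per transfer buffer */
#define TRACK_HISTORY   128u  /* ring size; completed entries are kept too */

typedef struct {
    uintptr_t irp;                  /* opaque IRP pointer value */
    uintptr_t urb;                  /* opaque URB pointer value */
    uint64_t  pfn[TRACK_MAX_PAGES]; /* physical page frame numbers */
    size_t    page_count;
    int       completed;            /* 0 = outstanding, 1 = completed */
    int       valid;
} track_entry;

typedef struct {
    track_entry ring[TRACK_HISTORY];
    size_t      next;               /* next slot to overwrite */
} track_log;

/* Index of the n-th newest slot (n = 0 is the most recent submission). */
static size_t newest_index(const track_log *log, size_t n)
{
    return (log->next + TRACK_HISTORY - 1u - n) % TRACK_HISTORY;
}

void track_init(track_log *log)
{
    memset(log, 0, sizeof *log);
}

/* Record a request just before it is sent down the stack. */
void track_submit(track_log *log, uintptr_t irp, uintptr_t urb,
                  const uint64_t *page_phys_addrs, size_t page_count)
{
    assert(page_count <= TRACK_MAX_PAGES);
    track_entry *e = &log->ring[log->next];
    log->next = (log->next + 1u) % TRACK_HISTORY;
    memset(e, 0, sizeof *e);
    e->irp = irp;
    e->urb = urb;
    e->page_count = page_count;
    e->valid = 1;
    for (size_t i = 0; i < page_count; i++)
        e->pfn[i] = page_phys_addrs[i] / TRACK_PAGE_SIZE;
}

/* Mark the most recent outstanding entry for this IRP as completed.
 * Returns 1 if such an entry was found, 0 otherwise. */
int track_complete(track_log *log, uintptr_t irp)
{
    for (size_t n = 0; n < TRACK_HISTORY; n++) {
        track_entry *e = &log->ring[newest_index(log, n)];
        if (e->valid && !e->completed && e->irp == irp) {
            e->completed = 1;
            return 1;
        }
    }
    return 0;
}

/* Most recent entry whose buffer touched the page containing phys_addr,
 * or NULL if no logged request used that page. */
const track_entry *track_find(const track_log *log, uint64_t phys_addr)
{
    uint64_t pfn = phys_addr / TRACK_PAGE_SIZE;
    for (size_t n = 0; n < TRACK_HISTORY; n++) {
        const track_entry *e = &log->ring[newest_index(log, n)];
        if (!e->valid)
            continue;
        for (size_t i = 0; i < e->page_count; i++)
            if (e->pfn[i] == pfn)
                return e;
    }
    return NULL;
}
```

Keeping completed entries in the ring is deliberate: if a trashed page maps to an entry already marked completed, that would be consistent with the controller writing into a buffer after its URB came back.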

HTH,

-scott

–
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Philip Lukidis” wrote in message
news:xxxxx@ntdev…
> Some extra information, hope it helps place everything in context:
>
> - For recording audio, I always send down multiple URB IN requests to the
> isoch endpoint so that I don’t lose data. My issue occurs only when I am
> not sending down enough URB IN requests (as a stress test). My normal
> assumption is that the input data would be lost.
> - The observed issue is that when I know that insufficient URBs for input
> have been sent down to the host controller, input packet data which I
> observe in my CATC analyzer is splattered in host memory- sometimes, but
> not always, producing the verifier C1 special pool BSOD. Sometimes other
> BSODs occur, like illegal operand and such, because code/data has been
> blasted to hell through DMA.
> - I always send down URB buffers whose size is an integral multiple of the
> maximum packet size. So if my maximum packet size is X, I always send down
> a URB buffer of size NX, where N is determined by the user’s choice of the
> ASIO buffer size.
> - I always use the USBD_SHORT_TRANSFER_OK flag, and never use the
> USBD_START_ISO_TRANSFER_ASAP flag (meaning I enter the frame number
> myself).
> - This issue so far only occurs on one machine, with WinXP SP1 and WinXP
> SP2. Verifier’s special pool flushes it out all that much faster. The host
> controller is EHCI, and my device operates at full speed. Using the
> checked kernel/HAL/USB stack did not result in untoward warnings before
> the BSOD.
> - The machine where this has been observed is an ASUS Springdale mobo,
> with a hyperthreading 3.0 GHz CPU. The issue occurs whether hyperthreading
> is on or off. In any case, the driver has been tested extensively on
> dual/hyperthreading machines, and the issue so far has been observed on
> this machine alone- WHEN I throttle the amount of input URBs sent down.
> - Testing on OHCI machines has not reproduced the issue so far.
>
> When I need to cancel my requests, I have a cancelAllRequests() routine
> which calls IoCancelIrp for all outstanding IRPs, and which synchronizes
> with the IRP completion routine. When all requests have come back up, I
> call URB_FUNCTION_ABORT_PIPE (perhaps redundant, but I wanted to make
> sure). This code is pretty old, seems to be fine on inspection, and has
> passed testing on checked HAL/kernel machines with verifier enabled- UP,
> SMP and HT. The code is based on Walter Oney’s example for canceling
> asynchronous IRPs that the driver itself created (page 284 of his second
> ed. book), but extended for multiple IRPs down at once.
>
> Are there any suggestions at all how this issue could occur…what could I
> be doing to provoke this? (besides passing down a bad buffer pointer- that
> code has been reviewed, and extensive verifier/checked build stress tests
> have been done).
>
> thanks,
>
> Philip Lukidis
>
> -----Original Message-----
> From: Philip Lukidis
> Sent: Thursday, July 07, 2005 4:39 PM
> To: ‘xxxxx@lists.osr.com’
> Subject: question for USB isochronous I/O (input)- massive corruption
> issue
>
>
> Hello. I’m having a strange issue with my device on WinXP SP2, on an Intel
> Springdale mobo. My device is a USB audio device, for which I have written
> an ASIO audio driver, whose lower layer is a kernel USB driver, which
> performs the necessary input and output isochronous I/O. On the
> aforementioned mobo, where the device is plugged into an EHCI port, I
> see from time to time that isochronous I/O (input) is giving me enormous
> host memory corruption issues when my USB driver has not fed the host
> controller with enough input URBs (deliberate for stress testing). When I
> make sure to keep the host controller well supplied with input URBs, all
> is (seemingly) well.
>
> However, when the host controller is starved of input URBs (perhaps for
> hundreds of milliseconds), thereafter I see enormous host memory
> corruption. The BSODs are so many in variety that it would be (I think at
> least) pointless to post them. I’ve been using the checked HAL/kernel/USB
> stack, with verifier enabled for my driver, but that did not help pinpoint
> the issue. For USB input URB management, I do use the
> USBD_SHORT_TRANSFER_OK flag, and I calculate my USB frames rather than use
> the USBD_START_ISO_TRANSFER_ASAP flag. So far, I have not duplicated this
> issue on any machine except one- a Springdale (ASUS), 3 GHz, 512 MB DDR RAM.
>
> The latest finding is interesting. My latest BSOD pointed towards a region
> of nuked memory, which had a data pattern identical to an input USB packet
> data pattern recorded on my CATC USB bus analyzer. So it looks to me that
> starving the host controller of input URBs (seemingly on that one machine
> so far) results in improper DMA to host memory… Any takers as to why this
> may be, or any suggestions at all would be welcome.
>
> thanks,
>
> Philip Lukidis
>
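
The manual frame bookkeeping described in the quoted message (buffers of N packets, with the start frame filled in by the driver instead of USBD_START_ISO_TRANSFER_ASAP) can be sketched in user-mode C as follows. This is only an illustration under stated assumptions: one packet per 1 ms frame for a full-speed endpoint, a hypothetical `iso_scheduler` type and helper names, and a 1024-frame window that is assumed to mirror USBD_ISO_START_FRAME_RANGE from usbdi.h. When the pipe has been starved, the planned frame is already in the past, so the sketch resynchronizes to the current frame plus some headroom rather than reusing a stale start frame.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror USBD_ISO_START_FRAME_RANGE in usbdi.h. */
#define ISO_START_FRAME_RANGE 1024u

typedef struct {
    uint32_t next_start_frame; /* frame planned for the next URB */
    uint32_t min_lead_frames;  /* headroom required ahead of "now" */
} iso_scheduler;

/* Transfer buffer size for a URB carrying packet_count packets. */
uint32_t iso_buffer_bytes(uint32_t max_packet_size, uint32_t packet_count)
{
    return max_packet_size * packet_count;
}

/* Pick the start frame for the next URB of packet_count packets.
 * current_frame is what the bus reports right now (in a driver, the
 * result of a get-current-frame-number request). *resynced is set to 1
 * when the planned frame was too late or too far ahead and had to be
 * recomputed from current_frame. */
uint32_t iso_next_start(iso_scheduler *s, uint32_t current_frame,
                        uint32_t packet_count, int *resynced)
{
    assert(s->min_lead_frames < ISO_START_FRAME_RANGE);

    /* Signed distance, so that 32-bit frame counter wraparound is
     * tolerated (relies on two's-complement conversion). */
    int32_t ahead = (int32_t)(s->next_start_frame - current_frame);

    *resynced = 0;
    if (ahead < (int32_t)s->min_lead_frames ||
        ahead >= (int32_t)ISO_START_FRAME_RANGE) {
        s->next_start_frame = current_frame + s->min_lead_frames;
        *resynced = 1;
    }

    uint32_t start = s->next_start_frame;
    s->next_start_frame = start + packet_count; /* one packet per frame */
    return start;
}
```

A resync means the frames in between carried no URB, so a driver would normally expect that input simply to be lost; logging each resync next to the crash time may help correlate starvation with the corruption.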

OSR_Community_User · July 11, 2005, 11:42am

Perhaps I should have mentioned that it was an externally powered USB 2.0
hub, by Cicero. I’ve never had any problems with it in general.

thanks,

Philip Lukidis

OSR_Community_User · July 11, 2005, 2:11pm

More data, if anyone is interested. The issue is NOT confined to the one
model of USB 2.0 hub which I previously mentioned (Cicero). Now for the
first time I can duplicate this on more than one machine, and with more than
one type of hub. The formula for duplication would be plugging a USB 2.0
externally powered hub (Belkin OR Cicero so far) into an enhanced controller
root hub. Then I install my device by plugging into the USB 2.0 hub (the
generic hub off the root hub). Note that plugging in a bus powered USB 1.1
hub into an enhanced controller root hub does NOT cause the issue to
surface. All the hubs which cause the issue to be observed use usbhub.sys
as the driver.

When the duplication parameters are fulfilled, a machine which was previously
fine shows the corruption. It’s as if an old buffer/MDL is being used by
the host controller somehow…but that is a pure *shot in the dark* (I’ll
try to prove it with DbgPrints). Without the USB 2.0 hub, the issue does
not occur. Any suggestions or hints as to why this would occur?

thanks,

Philip Lukidis

Norbert_Kawulski · July 27, 2005, 9:43am

Hello Philip,
I once had a very similar problem (and never solved it).
I was facing random BSODs on a particular system running Windows XP.
I analyzed the memory dumps and found a recurring pattern of
overwritten bytes just after a page boundary. This suggests that all the
different BSODs have one cause: something hammers into the beginning
of a (4K) page.

However, this seems to be related to running our drivers ???
These same drivers run fine on a checked MP system with verifier.

The memory corruption seems to have a length of 0xC bytes.
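A check of this kind can be automated once the corrupted addresses have been collected from several dumps. The following is only a sketch under stated assumptions: 4 KB pages, and a hand-gathered list of addresses where each overwritten range begins. The function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumption: 4 KB pages, as on x86 */

/* Offset of an address within its page. */
uint32_t page_offset(uint64_t addr)
{
    return (uint32_t)(addr & (PAGE_SIZE_BYTES - 1u));
}

/* Count how many corrupted ranges begin within `window` bytes of the
 * start of a page. `addrs` holds the first corrupted address of each
 * range, as read out of the dumps. */
size_t count_page_start_hits(const uint64_t *addrs, size_t n, uint32_t window)
{
    assert(addrs != NULL || n == 0);

    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (page_offset(addrs[i]) < window) {
            hits++;
        }
    }
    return hits;
}
```

If nearly every entry counts as a hit with a window of 0xC, that supports the idea of a single writer that targets page starts, rather than unrelated bugs.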

It was an HP notebook with an Acer Labs M5237 host controller.

Using a PCMCIA USB 2.0 Card in this machine did not show the problems
that the built-in hardware had shown. So we defined the hardware as
being broken, basta.

Norbert.
“Diplomacy is the art of telling someone to go to hell and have them
look forward to the trip.”

OSR_Community_User · July 27, 2005, 2:52pm

In my case, I can make the problem go away by attaching to the root hub,
because the problem manifests itself when attached to a generic 2.0 hub off
the root hub *while* under heavy load. Also, I have duplicated this on two
different machines. Finally, the hammer blows to memory definitely
consisted of input packets from my device.

Was there any pattern to your corruption? Did you recognize the corrupting
bytes as being part of your input packets, as in my case? Did it occur when
attached to the root hub? Did it occur for isoch I/O only, or for
bulk/interrupt I/O as well?

thanks,

Philip Lukidis

Norbert_Kawulski · July 28, 2005, 9:31am

Yes, it was only isochronous.
(Yes), the corrupted data looked like an input packet.
It looked like arbitrary pages were hit.
The problem report stated that a USB 2.0 hub had to be connected.
It was a customer machine that I have no access to any more.
The problem did not reproduce on our QA machines.
Norbert.

“Two feet on the ground is better than one in the mouth.”
---- snip ----

Was there any pattern to your corruption? Did you recognize the corrupting
bytes as being part of your input packets, as in my case? Did it occur when
attached to the root hub? Did it occur for isoch I/O only, or for
bulk/interrupt I/O as well?

---- snip ----

OSR_Community_User · July 28, 2005, 9:54pm

Well, there was at least one extra trigger required…there had to be a
period during which the host controller had no input URB queued up. I
achieved this by throttling the IN URB sending routine in my driver and
increasing the machine load. Using this, I can duplicate the issue on at
least two different machines. Otherwise, it would never show up (in fact,
it was hidden for months here). Maybe that is why you never duplicated the
issue, but that is a *pure guess*. If possible, it would be an interesting
test to perform on an arbitrary machine.
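To make the trigger concrete, here is a small user-mode model of the starvation condition. It is not driver code, and every name in it is hypothetical. It assumes 1 ms frames, a fixed number of frames per IN URB, and a refill pass that only runs every so many frames (the throttle). It counts the frames during which nothing was queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model parameters; none of these names come from the driver. */
typedef struct {
    uint32_t frames_per_urb; /* frames covered by one isochronous IN URB */
    uint32_t max_pending;    /* most URBs the driver keeps queued */
    uint32_t refill_period;  /* throttle: refill pass runs every N frames */
} iso_model;

/* Count the frames during which no IN URB was queued. */
uint32_t count_starved_frames(const iso_model *m, uint32_t total_frames)
{
    assert(m != NULL && m->frames_per_urb > 0 && m->refill_period > 0);

    uint32_t queued = 0;  /* frames of buffer space currently queued */
    uint32_t starved = 0;

    for (uint32_t frame = 0; frame < total_frames; frame++) {
        if (frame % m->refill_period == 0) {
            /* Refill pass: submit whole URBs until max_pending are queued.
             * A partly consumed URB still counts as pending. */
            while ((queued + m->frames_per_urb - 1) / m->frames_per_urb
                   < m->max_pending) {
                queued += m->frames_per_urb;
            }
        }
        if (queued > 0) {
            queued--;   /* the controller uses one frame of a queued URB */
        } else {
            starved++;  /* nothing queued for this frame */
        }
    }
    return starved;
}
```

With a refill on every frame the count stays at zero. Stretching refill_period beyond frames_per_urb * max_pending opens a gap in every cycle, which is the window in which the corruption was reproduced here.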

hth

Philip Lukidis
