crash in usbuhci!UhciAbortAsyncTransfer

OSR_Community_User · May 21, 2007, 4:23pm

Debugging a crash in usbuhci!UhciAbortAsyncTransfer, I’ve come across the following:

usbuhci!UhciAbortAsyncTransfer+0xf:
f9c674c9 8b7d0c mov edi,dword ptr [ebp+0Ch]
f9c674cc 8b4748 mov eax,dword ptr [edi+48h]
f9c674cf 6a00 push 0
f9c674d1 ff7510 push dword ptr [ebp+10h]
f9c674d4 bb01000010 mov ebx,10000001h
f9c674d9 50 push eax
f9c674da 687461415f push 5F416174h
f9c674df 53 push ebx
f9c674e0 ff7508 push dword ptr [ebp+8]
f9c674e3 c645ff00 mov byte ptr [ebp-1],0
f9c674e7 8945f4 mov dword ptr [ebp-0Ch],eax
f9c674ea ff15a485c6f9 call dword ptr [usbuhci!RegistrationPacket+0x104 (f9c685a4)]
f9c674f0 8b4758 mov eax,dword ptr [edi+58h]
f9c674f3 8bf0 mov esi,eax
f9c674f5 eb0e jmp usbuhci!UhciAbortAsyncTransfer+0x4b (f9c67505)

> f9c674f7 8b4e18 mov ecx,dword ptr [esi+18h] <<<=== jump target from below is here
f9c674fa 3b4d10 cmp ecx,dword ptr [ebp+10h]
f9c674fd 740a je usbuhci!UhciAbortAsyncTransfer+0x4f (f9c67509) <<<=== found the pointer of interest
f9c674ff 8975f8 mov dword ptr [ebp-8],esi
f9c67502 8b7624 mov esi,dword ptr [esi+24h]
f9c67505 85f6 test esi,esi
f9c67507 75ee jne usbuhci!UhciAbortAsyncTransfer+0x3d (f9c674f7) <<<=== break out of loop on NULL
f9c67509 8b4e08 mov ecx,dword ptr [esi+8] <<<=== use the pointer just determined to be NULL
f9c6750c c1e913 shr ecx,13h

The first obvious thing is that, regardless of whether the jump at 0xf9c67507 is taken after the test of esi at 0xf9c67505, esi is going to be dereferenced, either by the next instruction or by the jump target just above. My crash is at 0xf9c67509, and esi is NULL.

From the looks of things, a list is being scanned on the assumption (bug?) that the pointer of interest will always be found and that the loop will exit with a good pointer, since there is no test between breaking out of the loop and using the pointer. FWIW, this runs under a spinlock acquired a few call frames earlier.
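
The control flow described above can be sketched in C. This is only a reading of the disassembly, not the driver's source: the names `td_node`, `control`, `transfer`, `next`, `find_node` and `read_control_bits` are invented for illustration, and only the roles and offsets of the fields come from the listing. The sketch leaves out the previous-node pointer that the listing saves at [ebp-8]. `read_control_bits` adds the NULL test that the listing appears to lack.

```c
#include <stddef.h>

/* Hypothetical list node. Field names are invented; only the roles are
 * taken from the disassembly (offsets shown for reference). */
struct td_node {
    unsigned        control;   /* read after the loop     ([esi+8])   */
    const void     *transfer;  /* compared to the target  ([esi+18h]) */
    struct td_node *next;      /* followed by the loop    ([esi+24h]) */
};

/* Same control flow as the listing: stop on a match or at the end of
 * the list. Returns NULL when the target is not on the list. */
static struct td_node *find_node(struct td_node *head, const void *target)
{
    struct td_node *cur;

    for (cur = head; cur != NULL; cur = cur->next) {
        if (cur->transfer == target)
            break;
    }
    return cur;
}

/* The step the listing appears to skip: test the result before using it.
 * Returns 1 and stores control >> 0x13 on success, 0 if not found. */
static int read_control_bits(struct td_node *head, const void *target,
                             unsigned *bits)
{
    struct td_node *node = find_node(head, target);

    if (node == NULL)
        return 0;   /* the code in the listing would dereference NULL here */
    *bits = node->control >> 0x13;
    return 1;
}
```

Whether returning early is the right recovery inside the real driver is unknown; the sketch only shows where a test would have to sit for the loop to tolerate a transfer that is not on the list.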

The version of the driver is “5.1.2600.2180 (xpsp_sp2_rtm.040803-2158)”.

Call stack:
kd> k
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
f9e63c74 f9c651fb usbuhci!UhciAbortAsyncTransfer+0x4f
f9e63c98 f962d29b usbuhci!UhciAbortTransfer+0x3d
f9e63ce4 f962d563 USBPORT!USBPORT_DmaEndpointPaused+0x263
f9e63d10 f962f98c USBPORT!USBPORT_DmaEndpointWorker+0x149
f9e63d38 f963341a USBPORT!USBPORT_CoreEndpointWorker+0x6d2
f9e63d7c f962bfc0 USBPORT!USBPORT_Worker+0x212
f9e63dac 805c4a28 USBPORT!USBPORT_WorkerThread+0x12a
f9e63ddc 80540fa2 nt!PspSystemThreadStartup+0x34
00000000 00000000 nt!KiThreadStartup+0x16

Any ideas on how to determine if our code has caused an assumption to be violated or determine if this is a bug in the USB driver?

Thanks,
Tom

OSR_Community_User · May 21, 2007, 4:56pm

I would test this against the checked build of the OS. The usbuhci
driver will probably fail an assertion before you get to your crash.
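
A rough user-mode sketch of why a checked build can help: assertions are compiled in when `DBG` is 1 and compiled out otherwise, so a violated assumption is reported at the point where it is made rather than at a later dereference. `SKETCH_ASSERT`, `assert_failures` and `find_index` are hypothetical names; the macro only counts and logs failures, whereas a kernel assertion would normally break into the debugger. Whether usbuhci actually asserts at this particular spot is not known.

```c
#include <stdio.h>
#include <stddef.h>

#ifndef DBG
#define DBG 1   /* assumption: 1 models a checked build, 0 a free build */
#endif

static int assert_failures;   /* counts failed checks so the demo can continue */

#if DBG
/* Checked build: evaluate the condition and report a failure. */
#define SKETCH_ASSERT(expr) \
    ((expr) ? (void)0 : (void)(assert_failures++, \
        fprintf(stderr, "assertion failed: %s\n", #expr)))
#else
/* Free build: the check disappears entirely. */
#define SKETCH_ASSERT(expr) ((void)0)
#endif

/* Hypothetical lookup that assumes the key is always present.
 * Returns the index of key, or -1 if it is missing. */
static int find_index(const int *items, int count, int key)
{
    int i, found = -1;

    for (i = 0; i < count; i++) {
        if (items[i] == key) {
            found = i;
            break;
        }
    }
    SKETCH_ASSERT(found != -1);   /* fires only when DBG is 1 */
    return found;
}
```

With `DBG` set to 0 the same lookup returns -1 silently, which is the situation a free build leaves you in.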

Chris_Aseltine · May 21, 2007, 10:00pm

Tom Ramsdell wrote:

> f9e63c74 f9c651fb usbuhci!UhciAbortAsyncTransfer+0x4f
> f9e63c98 f962d29b usbuhci!UhciAbortTransfer+0x3d
> f9e63ce4 f962d563 USBPORT!USBPORT_DmaEndpointPaused+0x263
> f9e63d10 f962f98c USBPORT!USBPORT_DmaEndpointWorker+0x149
> f9e63d38 f963341a USBPORT!USBPORT_CoreEndpointWorker+0x6d2
> f9e63d7c f962bfc0 USBPORT!USBPORT_Worker+0x212
> f9e63dac 805c4a28 USBPORT!USBPORT_WorkerThread+0x12a

For what it’s worth, I’ve seen this same exact backtrace. In my case it was seen on new, probably buggy hardware. I don’t know what the cause really was.

I tend to see UHCI bugchecks fairly often, but have not gotten a lot of insight or help into how to debug them. Usually my approach is “try a totally different way to do what I need to do,” from which I’ve obtained mixed results overall.

Michal_Vodicka-2 · May 22, 2007, 7:07pm

Chris Aseltine wrote:

> Tom Ramsdell wrote:
>
> > f9e63c74 f9c651fb usbuhci!UhciAbortAsyncTransfer+0x4f
> > f9e63c98 f962d29b usbuhci!UhciAbortTransfer+0x3d
> > f9e63ce4 f962d563 USBPORT!USBPORT_DmaEndpointPaused+0x263
> > f9e63d10 f962f98c USBPORT!USBPORT_DmaEndpointWorker+0x149
> > f9e63d38 f963341a USBPORT!USBPORT_CoreEndpointWorker+0x6d2
> > f9e63d7c f962bfc0 USBPORT!USBPORT_Worker+0x212
> > f9e63dac 805c4a28 USBPORT!USBPORT_WorkerThread+0x12a
>
> For what it’s worth, I’ve seen this same exact backtrace. In my case it was seen on new, probably buggy hardware. I don’t know what the cause really was.
>
> I tend to see UHCI bugchecks fairly often, but have not gotten a lot of insight or help into how to debug them. Usually my approach is “try a totally different way to do what I need to do,” from which I’ve obtained mixed results overall.

I see a slightly different backtrace, but the crash is at the same point (on Vista, not XP SP2). As Tom pointed out, there seems to be a clear bug in the usbuhci.sys code.

We’re seeing this bug when running DTM tests on a particular machine and OS version combined with a specific version of the USB fixes. It looks like a timing-related problem, probably a race condition. I’m trying to solve it via PSS, but with no success so far because they insist on a local repro. More reports about this issue are probably needed before somebody decides it is worth fixing. So please, instead of trying totally different ways to do what you need to do, report OS bugs to MS. I guess even a hardware problem doesn’t justify a NULL pointer dereference.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Chris_Aseltine · May 22, 2007, 10:11pm

Michal Vodicka wrote:

> So please, instead of trying totally different ways to do what
> you need to do, report OS bugs to MS.

Well, here’s the problem: we are only allotted a few tickets per year and can’t really afford to use them in this manner (to generate “noise” around common issues). Even when we’ve reported problems, the response has not really been useful.

For example, right now we have a ticket open where usbohci.sys is sitting on bulk read IRPs for up to 30 seconds on hyperthreading machines after the data comes in on the wire. The support guy tried to blame Norton Antivirus…

Michal_Vodicka-2 · May 23, 2007, 9:04am

> > So please, instead of trying totally different ways to do what
> > you need to do, report OS bugs to MS.
>
> Well, here’s the problem: we are only allotted a few tickets per year and can’t really afford to use them in this manner (to generate “noise” around common issues). Even when we’ve reported problems, the response has not really been useful.

If I understood correctly, you also have a problem with this issue. And if MS admits a bug, you shouldn’t be charged for it.

> For example, right now we have a ticket open where usbohci.sys is sitting on bulk read IRPs for up to 30 seconds on hyperthreading machines after the data comes in on the wire. The support guy tried to blame Norton Antivirus…

Support guys will try to blame everything available, and if there isn’t anything, they will try to blame the hardware :-/ One would expect that after releasing a buggy beta version of the OS they would try to fix it ASAP. The other possibility is that they’re totally overloaded with bug fixing and try to pick the most serious ones.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Chris_Aseltine · May 23, 2007, 12:16pm

Michal Vodicka wrote:

> If I understood correctly, you also have a problem with this issue. And
> if MS admits a bug, you shouldn’t be charged for it.

Nope: for example, ZLP issue in Win2k usbser.sys. Obvious bug, product support admitted it and then quoted us $30k for a hotfix. That’s 300 engineering hours, and in that amount of time we just rolled our own…

Michal_Vodicka-2 · May 23, 2007, 12:36pm

> > If I understood correctly, you also have a problem with this issue. And
> > if MS admits a bug, you shouldn’t be charged for it.
>
> Nope: for example, ZLP issue in Win2k usbser.sys. Obvious bug, product support admitted it and then quoted us $30k for a hotfix. That’s 300 engineering hours, and in that amount of time we just rolled our own…

Have you complained about it? If I understand correctly, a support request has to be marked as non-charge. Support should do that when they admit a bug.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Chris_Aseltine · May 23, 2007, 1:02pm

Michal Vodicka wrote:

> Have you complained about it? If I understand correctly, a support
> request has to be marked as non-charge. Support should do that
> when they admit a bug.

Complained about what? To whom? I don’t think we got charged for the support incident, but we didn’t get a hotfix, either.

Michal_Vodicka-2 · May 23, 2007, 1:21pm

> Complained about what? To whom? I don’t think we got charged for the support incident, but we didn’t get a hotfix, either.

Ah, I see. I guess I was talking about the charge for the support incident. Sorry, I misunderstood your previous post.

Anyway, I believe it makes sense to report OS bugs to MS even if it is a long, painful, tiresome and hopeless process.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]