Blue screen on surprise removal of a USB device

Hi:

This is totally a shot in the dark, but I have to start somewhere :slight_smile:

I was asked to debug a legacy WDM driver for a USB device. I think the code was heavily borrowed from the DDK bulkusb sample; at least the PnP management part is almost identical.

The problem is that the user hit a blue screen after surprise-removing the device. Unfortunately, the steps that lead to the blue screen are not entirely clear, and I have not been able to reproduce it in house yet. Here is what I have been able to collect about the crash.

The user unplugged the USB device but left the user application running, and it was still attempting to read from the device. After an unspecified amount of time, the user tried to close the application, and a blue screen happened. In another case, the user reconnected the device and found that the application was no longer working. When the user then tried to close the application, a blue screen happened again.

Following is the dump I got from the user. If someone can spot something obvious, that would be great. Otherwise, I will try to reproduce it and get a better understanding of the crash.

Thanks in advance.

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 00000000, The address that the exception occurred at
Arg3: ba50fa8c, Exception Record Address
Arg4: ba50f788, Context Record Address

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

FAULTING_IP:
+16
00000000 ?? ???

EXCEPTION_RECORD: ba50fa8c -- (.exr 0xffffffffba50fa8c)
ExceptionAddress: 00000000
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000008
Parameter[1]: 00000000
Attempt to execute non-executable address 00000000

CONTEXT: ba50f788 -- (.cxr 0xffffffffba50f788)
eax=00000006 ebx=83ca3850 ecx=00000000 edx=00000000 esi=8410dac8 edi=83ca35e0
eip=00000000 esp=ba50fb54 ebp=ba50fb88 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00210246
00000000 ?? ???
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: System

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

EXCEPTION_PARAMETER1: 00000008

EXCEPTION_PARAMETER2: 00000000

WRITE_ADDRESS: 00000000

FOLLOWUP_IP:
fireballusb+36c7
a22e16c7 8945f0 mov dword ptr [ebp-10h],eax

FAILED_INSTRUCTION_ADDRESS:
+28b952f01addfdc
00000000 ?? ???

BUGCHECK_STR: 0x7E

LAST_CONTROL_TRANSFER: from ba1acc8d to 00000000

STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
ba50fb50 ba1acc8d 00000000 68627375 70646f52 0x0
ba50fb88 ba1b3e8a 8410da00 83fa9008 83fa9008 usbhub!USBH_PdoRemoveDevice+0x7b
ba50fba4 ba1acf06 8410dac8 83fa9008 00000002 usbhub!USBH_PdoPnP+0x58
ba50fbc8 ba1aa5e4 0110dac8 83fa9008 ba50fc14 usbhub!USBH_PdoDispatch+0x5a
ba50fbd8 804ef19f 8410da10 83fa9008 8411a210 usbhub!USBH_HubDispatch+0x48
ba50fbe8 a22e16c7 00000001 ffffffff 8052b734 nt!IopfCallDriver+0x31
ba50fc14 a22df03e 847c6b68 83fa9008 00000002 fireballusb+0x36c7
ba50fc48 804ef19f 847c6b68 83fa9008 ba50fcd4 fireballusb+0x103e
ba50fc58 80592b63 8410da10 8410da10 00000002 nt!IopfCallDriver+0x31
ba50fc84 80592dc5 847c6b68 ba50fcb0 00000000 nt!IopSynchronousCall+0xb7
ba50fcd8 804f6f70 8410da10 00000002 00000000 nt!IopRemoveDevice+0x93
ba50fd00 80594796 e1bd1978 00000018 e472b8e8 nt!IopRemoveLockedDeviceNode+0x160
ba50fd18 805947fd 8a5ce5c0 00000002 e472b8e8 nt!IopDeleteLockedDeviceNode+0x34
ba50fd4c 805948a1 8410da10 0272b8e8 00000002 nt!IopDeleteLockedDeviceNodes+0x3f
ba50fd7c 8053879d 8a5f4dd8 00000000 8a792da8 nt!IopDelayedRemoveWorker+0x4b
ba50fdac 805cff62 8a5f4dd8 00000000 00000000 nt!ExpWorkerThread+0xef
ba50fddc 8054612e 805386ae 00000001 00000000 nt!PspSystemThreadStartup+0x34
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

SYMBOL_STACK_INDEX: 6

SYMBOL_NAME: fireballusb+36c7

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: fireballusb

IMAGE_NAME: fireballusb.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4d0fd2d0

STACK_COMMAND: .cxr 0xffffffffba50f788 ; kb

FAILURE_BUCKET_ID: 0x7E_NULL_IP_fireballusb+36c7

BUCKET_ID: 0x7E_NULL_IP_fireballusb+36c7

Followup: MachineOwner

xxxxx@yahoo.com wrote:

The user unplugged the USB device but left the user application running, and it was still attempting to read from the device. After an unspecified amount of time, the user tried to close the application, and a blue screen happened. In another case, the user reconnected the device and found that the application was no longer working. When the user then tried to close the application, a blue screen happened again.

This is an easy sequence to get wrong. As long as the user application
has the file handle open, the driver for the unplugged device cannot be
removed. It will hang around as a “zombie” driver, with no hardware.
When the device is plugged in again, a NEW instance is created. The old
zombie instance is still present, and the application is still talking
to that old instance.

If the driver assumes there can only be one device at a time (through
the use of globals, for instance), disaster often ensues. The new
instance can trash global variables that the old instance relied on.
Or, if the driver uses a single symbolic link, the new instance cannot
create the link because the old one still exists.
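
A minimal sketch of the per-instance pattern this implies, written as portable C rather than real WDM code. `DEVICE_EXT`, `instance_start`, and the `FireballN` name format are hypothetical; in a real driver the state would live in the device extension and the link would be created with IoCreateSymbolicLink or replaced by a device interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a WDM device extension: everything one
 * device instance needs lives here, not in file-scope globals. */
typedef struct DEVICE_EXT {
    int  instance;        /* unique per plugged-in device        */
    int  open_handles;    /* handles the application still holds */
    char link_name[32];   /* per-instance symbolic link name     */
} DEVICE_EXT;

/* Initialise one instance. Returns 0 on success, -1 if the link name
 * would not fit. The name format is an assumption for illustration. */
int instance_start(DEVICE_EXT *ext, int instance)
{
    assert(ext != NULL);
    memset(ext, 0, sizeof(*ext));
    ext->instance = instance;
    int n = snprintf(ext->link_name, sizeof(ext->link_name),
                     "Fireball%d", instance);
    return (n > 0 && (size_t)n < sizeof(ext->link_name)) ? 0 : -1;
}
```

Starting a second instance touches only its own structure, so a zombie instance that still has an open handle keeps its state and its own link name.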

Further, the driver needs to set a flag when it knows the device has
been removed, so it can immediately reject any new application I/O requests.

In the end, your application needs to be smarter. It needs to use
RegisterDeviceNotification to be notified when the device tree changes,
so it can go figure out whether its device has gone missing. That way,
it can close the handle properly.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I’d go take a look at 0xba1acc8d in WinDbg and determine whether it is in your driver or not. If it is in your driver, you’ve managed to corrupt a function pointer and are jumping or calling to 0x00000000. That’s a no-no under any circumstance. If that address is not in your driver, then I’d say your driver is still to blame, but now you have to go looking for corrupted memory.
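
In the posted stack text, the frame directly below the 0x0 frame is usbhub!USBH_PdoRemoveDevice+0x7b, which matches the return address ba1acc8d, while fireballusb appears two frames further down. The arithmetic behind the "is it in my driver" check can be sketched in C as below. `module_base` and `in_module` are hypothetical helpers, and any image size passed to `in_module` is a placeholder; the real range would come from the debugger (for example `lm m fireballusb`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address implied by a "module+offset" frame from the stack text. */
uint32_t module_base(uint32_t frame_addr, uint32_t offset)
{
    assert(frame_addr >= offset);
    return frame_addr - offset;
}

/* True if addr lies inside [base, base + size). */
bool in_module(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && addr - base < size;
}
```

Both fireballusb frames in the dump (a22e16c7 at +0x36c7 and a22df03e at +0x103e) give the same base, a22de000, which is a quick sanity check that the offsets were read correctly.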

Gary G. Little


Thanks Tim:

The driver does set a flag and rejects I/O requests right after receiving the surprise removal. I will go hunting for any possible global variables.

Unfortunately, the application is written by the user and they think that the driver should not crash in any case :slight_smile:

Cheers,

Thanks Gary, I will try to play with those addresses and see what happens.

xxxxx@yahoo.com wrote:

Unfortunately, the application is written by the user and they think that the driver should not crash in any case :slight_smile:

Well, that much is true. However, no matter how smart the driver is,
when you unplug and replug the device, the application is not going to
recover unless there is code in the application to detect that, close
the old, and open the new.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.