Detecting USB device removal - BSOD in VMWare environment

Hi,

We have a USB smartcard reader driver. Card tracking is done through interrupt, hence we always have a IRP pending on the interrupt endpoint. In order to implement device recovery mechanism during ESD scenario, (our reader firmware gets into an unknown state), hence we followed the below procedure.

When ESD occurs, the pending interrupt IRP fails with STATUS_UNSUCCESSFUL. So we take this as an indication of ESD and initiate a USB reset which recovers the device (firmware resets itself, and starts running from the start and thus becomes responsive). When device removal occurs the pending IRP fails with status “STATUS_DEVICE_NOT_CONNECTED, or STATUS_CANCELLED or STATUS_PENDING”. In those cases we dont do a device reset.

This implementation has worked properly across Oses Win7 to 8.1 for more that 4 to 5 years. Currently in one particular Virtual OS setup (our device is connected to Virtual OS [Windows 7-32bit], a VMWare setup, we dont have OS details from the customer), when the device is removed, the pending interrupt IRP fails with STATUS_UNSUCCESSFUL, instead of “STATUS_DEVICE_NOT_CONNECTED, or STATUS_CANCELLED or STATUS_PENDING”. Hence we start the USB reset process which results in a blue screen crash.

At this point we have not even got the surprise removal notification, hence we are not able to differentiate between device removal and an ESD scenario. Is there a way to isolate the device removal properly in the IRP callback function.

Thanks,
Joseph

xxxxx@gmail.com wrote:

This implementation has worked properly across Oses Win7 to 8.1 for more that 4 to 5 years. Currently in one particular Virtual OS setup (our device is connected to Virtual OS [Windows 7-32bit], a VMWare setup, we dont have OS details from the customer), when the device is removed, the pending interrupt IRP fails with STATUS_UNSUCCESSFUL, instead of “STATUS_DEVICE_NOT_CONNECTED, or STATUS_CANCELLED or STATUS_PENDING”. Hence we start the USB reset process which results in a blue screen crash.

Why does it result in a blue screen? If you haven’t received the
surprise removal notification yet, the driver stack should still be intact.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Sounds like the VMware virtual usb controller is getting return codes
wrong. This puts you in a difficult situation as your clue is the return
status for the urb, which is going to be asynchronous to any pnp
notification you could use. Are you also looking at the USBD_STATUS values
in the urb? Are they also wrong? It is possible VMware’s driver is
returning NTSTATUS either STATUS_SUCCESS or STATUS_UNSUCCESSFULL and not
reflecting the specific USBD status. Otherwise you might consider reporting
a bug with VMware, but unless there are a lot of licenses involved that
might not be too fruitful and approach.

Mark Roddy

On Wed, Jun 25, 2014 at 10:11 AM, wrote:

> Hi,
>
> We have a USB smartcard reader driver. Card tracking is done through
> interrupt, hence we always have a IRP pending on the interrupt endpoint. In
> order to implement device recovery mechanism during ESD scenario, (our
> reader firmware gets into an unknown state), hence we followed the below
> procedure.
>
> When ESD occurs, the pending interrupt IRP fails with STATUS_UNSUCCESSFUL.
> So we take this as an indication of ESD and initiate a USB reset which
> recovers the device (firmware resets itself, and starts running from the
> start and thus becomes responsive). When device removal occurs the pending
> IRP fails with status “STATUS_DEVICE_NOT_CONNECTED, or STATUS_CANCELLED or
> STATUS_PENDING”. In those cases we dont do a device reset.
>
> This implementation has worked properly across Oses Win7 to 8.1 for more
> that 4 to 5 years. Currently in one particular Virtual OS setup (our device
> is connected to Virtual OS [Windows 7-32bit], a VMWare setup, we dont have
> OS details from the customer), when the device is removed, the pending
> interrupt IRP fails with STATUS_UNSUCCESSFUL, instead of
> “STATUS_DEVICE_NOT_CONNECTED, or STATUS_CANCELLED or STATUS_PENDING”. Hence
> we start the USB reset process which results in a blue screen crash.
>
> At this point we have not even got the surprise removal notification,
> hence we are not able to differentiate between device removal and an ESD
> scenario. Is there a way to isolate the device removal properly in the IRP
> callback function.
>
> Thanks,
> Joseph
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

Hi,

Thank you for the feedback. We are using a WorkItem to do the USB reset. Will this have an impact on the crash? Also, is there a way to conclude that the VMWare handover USB Controller driver is the root cause of the crash, as the actual crash shows our driver as the probable cause.

Am attaching the debug info from the crash dump.

---------------- snip ----------------------
kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.
Arguments:
Arg1: 00000004, The power transition timed out waiting to synchronize with the Pnp
subsystem.
Arg2: 00000258, Timeout in seconds.
Arg3: 00000000, The thread currently holding on to the Pnp lock.
Arg4: 8078adf4, nt!TRIAGE_9F_PNP on Win7 and higher

Debugging Details:

DRVPOWERSTATE_SUBCODE: 4

DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT

BUGCHECK_STR: 0x9F

PROCESS_NAME: System

CURRENT_IRQL: 2

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre

DPC_STACK_BASE: FFFFFFFF8078B000

LAST_CONTROL_TRANSFER: from 8332cdd7 to 8332fbfc

STACK_TEXT:
8078add4 8332cdd7 0000009f 00000004 00000258 nt!KeBugCheckEx+0x1e
8078ae00 8357bfb5 8078ae4c 832ca4d9 8a3ddab0 nt!PnpBugcheckPowerTimeout+0x57
8078ae08 832ca4d9 8a3ddab0 8a3ddad0 6c7ad7ec nt!PopBuildDeviceNotifyListWatchdog+0x16
8078ae4c 832ca47d 8337bd20 8078af78 00000002 nt!KiProcessTimerDpcTable+0x50
8078af38 832ca33a 8337bd20 8078af78 00000000 nt!KiProcessExpiredTimerList+0x101
8078afac 832c84ce 00015dfe 00000000 00000200 nt!KiTimerExpiration+0x25c
8078aff4 832c7c9c 807e0ae4 00000000 00000000 nt!KiRetireDpcList+0xcb
807e0b00 8321fb29 00000002 00000000 8337bd20 nt!KiDispatchInterrupt+0x2c
807e0b18 8321fba9 807e0c28 00000000 807e0b3c hal!HalpCheckForSoftwareInterrupt+0x83
807e0b28 832c78a4 849bfd48 807e0c28 00000000 hal!KfLowerIrql+0x61
807e0b3c 832c7809 832c7a00 00000001 00000000 nt!KiExitThreadWait+0x89
807e0bac 8d8d9101 807e0c28 00000000 00000000 nt!KeWaitForSingleObject+0x60d
807e0bfc 8d8e5321 85d32028 807e0c28 00000000 usbhub!UsbhWaitEventWithTimeoutEx+0x221
807e0c48 8d8e1531 00d320e0 8603246c 86032030 usbhub!UsbhSyncResetDeviceInternal+0x16e
807e0c6c 8d8c4ed1 86032030 86032030 86b16378 usbhub!UsbhFdoResetPdoPort+0x158
807e0c90 8d8c4b91 86032030 86b16478 86032030 usbhub!UsbhPdoInternalDeviceControl+0xe2
807e0ca4 83287c1e 86032030 86b16378 866890e0 usbhub!UsbhGenDispatch+0x4a
807e0cbc 875f88f1 866100b0 86689028 8338c3bc nt!IofCallDriver+0x63
807e0cec 83465613 00689028 866890e0 849bfd48 MYDRIVER!USBPipeResetThread+0xb5 [x:\driver\usb.c @ 1338] <– reader driver
807e0d00 832ce14b 866100b0 00000000 849bfd48 nt!IopProcessWorkItem+0x23
807e0d50 8345a12b 00000001 acb0a116 00000000 nt!ExpWorkerThread+0x10d
807e0d90 83301559 832ce03e 00000001 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19

---------------- snip ----------------------

Well you answered this in your original post. You are resetting the port
while the system is attempting to remove the device. Technically that
should not cause a bugcheck. Your driver can legitimately request a port
reset at any time, and asynchronous to that, surprise removal processing
can occur. Why the usb controller cannot handle this situation is a
question for the people at VMware.

You didn’t answer my question re the urb USBD_STATUS_* values. Are they
wrong too?

Summary: while it is an unusual scenario, I don’t think it is your bug. My
guess is that the VMware usb controller driver is not handling “port reset
while unplugged” correctly. And as you first noted, they are not providing
the correct NTSTATUS values.

Mark Roddy

On Thu, Jun 26, 2014 at 6:23 AM, wrote:

> Hi,
>
> Thank you for the feedback. We are using a WorkItem to do the USB reset.
> Will this have an impact on the crash? Also, is there a way to conclude
> that the VMWare handover USB Controller driver is the root cause of the
> crash, as the actual crash shows our driver as the probable cause.
>
> Am attaching the debug info from the crash dump.
>
> ---------------- snip ----------------------
> kd> !analyze -v
>
> *****
>
>
> * Bugcheck Analysis
>
>
>
>
>

>
> DRIVER_POWER_STATE_FAILURE (9f)
> A driver has failed to complete a power IRP within a specific time.
> Arguments:
> Arg1: 00000004, The power transition timed out waiting to synchronize with
> the Pnp
> subsystem.
> Arg2: 00000258, Timeout in seconds.
> Arg3: 00000000, The thread currently holding on to the Pnp lock.
> Arg4: 8078adf4, nt!TRIAGE_9F_PNP on Win7 and higher
>
> Debugging Details:
> ------------------
>
>
> DRVPOWERSTATE_SUBCODE: 4
>
> DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
>
> BUGCHECK_STR: 0x9F
>
> PROCESS_NAME: System
>
> CURRENT_IRQL: 2
>
> ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre
>
> DPC_STACK_BASE: FFFFFFFF8078B000
>
> LAST_CONTROL_TRANSFER: from 8332cdd7 to 8332fbfc
>
> STACK_TEXT:
> 8078add4 8332cdd7 0000009f 00000004 00000258 nt!KeBugCheckEx+0x1e
> 8078ae00 8357bfb5 8078ae4c 832ca4d9 8a3ddab0
> nt!PnpBugcheckPowerTimeout+0x57
> 8078ae08 832ca4d9 8a3ddab0 8a3ddad0 6c7ad7ec
> nt!PopBuildDeviceNotifyListWatchdog+0x16
> 8078ae4c 832ca47d 8337bd20 8078af78 00000002 nt!KiProcessTimerDpcTable+0x50
> 8078af38 832ca33a 8337bd20 8078af78 00000000
> nt!KiProcessExpiredTimerList+0x101
> 8078afac 832c84ce 00015dfe 00000000 00000200 nt!KiTimerExpiration+0x25c
> 8078aff4 832c7c9c 807e0ae4 00000000 00000000 nt!KiRetireDpcList+0xcb
> 807e0b00 8321fb29 00000002 00000000 8337bd20 nt!KiDispatchInterrupt+0x2c
> 807e0b18 8321fba9 807e0c28 00000000 807e0b3c
> hal!HalpCheckForSoftwareInterrupt+0x83
> 807e0b28 832c78a4 849bfd48 807e0c28 00000000 hal!KfLowerIrql+0x61
> 807e0b3c 832c7809 832c7a00 00000001 00000000 nt!KiExitThreadWait+0x89
> 807e0bac 8d8d9101 807e0c28 00000000 00000000 nt!KeWaitForSingleObject+0x60d
> 807e0bfc 8d8e5321 85d32028 807e0c28 00000000
> usbhub!UsbhWaitEventWithTimeoutEx+0x221
> 807e0c48 8d8e1531 00d320e0 8603246c 86032030
> usbhub!UsbhSyncResetDeviceInternal+0x16e
> 807e0c6c 8d8c4ed1 86032030 86032030 86b16378
> usbhub!UsbhFdoResetPdoPort+0x158
> 807e0c90 8d8c4b91 86032030 86b16478 86032030
> usbhub!UsbhPdoInternalDeviceControl+0xe2
> 807e0ca4 83287c1e 86032030 86b16378 866890e0 usbhub!UsbhGenDispatch+0x4a
> 807e0cbc 875f88f1 866100b0 86689028 8338c3bc nt!IofCallDriver+0x63
> 807e0cec 83465613 00689028 866890e0 849bfd48
> MYDRIVER!USBPipeResetThread+0xb5 [x:\driver\usb.c @ 1338] <– reader driver
> 807e0d00 832ce14b 866100b0 00000000 849bfd48 nt!IopProcessWorkItem+0x23
> 807e0d50 8345a12b 00000001 acb0a116 00000000 nt!ExpWorkerThread+0x10d
> 807e0d90 83301559 832ce03e 00000001 00000000 nt!PspSystemThreadStartup+0x9e
> 00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
>
> ---------------- snip ----------------------
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>