USB reliability/race condition problem

I encountered a race condition/reliability problem when sending IRPs onto the USB cable.
I use WDM to send out a synchronous IRP. The code is listed below.

    // Build the vendor control request in the URB.
    UsbBuildVendorRequest(_pVendorRequestURB, URB_FUNCTION_VENDOR_DEVICE,
                          sizeof(struct _URB_CONTROL_VENDOR_OR_CLASS_REQUEST),
                          transferFlags, 0, request, value, index,
                          buffer, NULL, bufferLength, NULL);

    KeInitializeEvent(&event, NotificationEvent, FALSE);

    // Threaded IRP: on completion the I/O manager fills ioStatus and
    // signals event.
    pIRP = IoBuildDeviceIoControlRequest(IOCTL_INTERNAL_USB_SUBMIT_URB,
                                         _topUsbStackDevice, NULL, 0, NULL, 0,
                                         TRUE, &event, &ioStatus);
    if (pIRP == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;  // assumes an NTSTATUS-returning routine

    // Hand the URB built above to the USB stack.
    nextStack = IoGetNextIrpStackLocation(pIRP);
    nextStack->Parameters.Others.Argument1 = _pVendorRequestURB;
    ntStatus = IoCallDriver(_topUsbStackDevice, pIRP);

    if (ntStatus == STATUS_PENDING)
    {
        // IDEA: consider using sanity timer here also
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
        ntStatus = ioStatus.Status;
    }
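The "IDEA: consider using sanity timer here also" comment can be made concrete with a bounded wait. The sketch below is a hedged, user-mode-compilable illustration: relative_timeout_100ns is a hypothetical helper, and the kernel-mode wait it feeds appears only in a comment. It relies on the KeWaitForSingleObject convention that a negative timeout is relative to the current time and expressed in 100-nanosecond units; the 5000 ms budget is an arbitrary example value.

```c
#include <stdint.h>

/* Hypothetical helper for the "sanity timer" idea: converts a millisecond
 * budget into the LARGE_INTEGER QuadPart value KeWaitForSingleObject
 * expects. Negative values mean "relative to now", in 100-ns units. */
static int64_t relative_timeout_100ns(uint32_t milliseconds)
{
    /* 1 ms = 10,000 units of 100 ns; negate to request a relative wait. */
    return -(int64_t)milliseconds * 10000;
}

/* Kernel-mode usage (not compiled here), replacing the untimed wait:
 *
 *   LARGE_INTEGER timeout;
 *   timeout.QuadPart = relative_timeout_100ns(5000);
 *   status = KeWaitForSingleObject(&event, Executive, KernelMode,
 *                                  FALSE, &timeout);
 *   if (status == STATUS_TIMEOUT) {
 *       // Ask the USB stack to give the IRP back, then wait with no
 *       // timeout: event and ioStatus must stay valid until the IRP
 *       // has really completed.
 *       IoCancelIrp(pIRP);
 *       KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
 *   }
 *   ntStatus = ioStatus.Status;
 */
```

The second, untimed wait after IoCancelIrp matters if event and ioStatus are locals, as the snippet suggests: returning before the stack completes the IRP would let the completion write into a stack frame that no longer exists.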

Basically, we are doing writes and reads alternately and continuously. Using a USB analyzer, I have confirmed that after billions of reads and writes, one of the read packets is never sent onto the USB cable. IoCallDriver returns success, but the analyzer shows that the read never went out. Because the read never actually occurred, the buffer is not touched.
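One way to make "the buffer is not touched" detectable inside the driver, rather than only on the analyzer, is to fill the read buffer with a known pattern before every read and check it after a successful completion. This is a minimal sketch under that assumption; READ_SENTINEL, fill_sentinel and buffer_untouched are hypothetical names, and the functions are plain C meant to be called around the IoCallDriver sequence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill pattern; choose a byte the device is unlikely to send. */
#define READ_SENTINEL 0xA5

/* Call just before submitting the read URB. */
static void fill_sentinel(uint8_t *buf, size_t len)
{
    memset(buf, READ_SENTINEL, len);
}

/* Call after the read reports success. Returns true when every byte
 * still holds the sentinel, i.e. nothing appears to have been written. */
static bool buffer_untouched(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)READ_SENTINEL)
            return false;
    }
    return true;
}
```

A device that legitimately returns all-0xA5 data would produce a false positive, so this check is best paired with the URB's TransferBufferLength after completion.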

I can reproduce the failure with different host/device/USB cable combinations, so I do not think it is a hardware connection problem.

I have confirmed that in the failure case all of the code above is executed exactly as in the normal case, and the contents of pIRP and _pVendorRequestURB are the same as in the normal case.

After the failure, reads and writes continue to work normally.

I am confused by Windows returning success on a read while the packet is never sent onto the USB cable. Has anyone encountered a similar problem before? Does anyone have recommendations for further debugging steps I could take?

On 8/14/2009 9:42 AM, xxxxx@hotmail.com wrote:

I could reproduce the failure case with different host/device/USB
cable combination so I think it is not a hardware connection problem.

Please make sure you also test on a PC that does not use an Intel ICH
as the USB root. From customer “USB problems” reports, we see that some
ICHs can cause problems that manifest only with USB 2.0.

What version(s) of Windows?

> What version(s) of Windows?

The problem exists on Windows 2K, XP and Vista.

Let’s see if there’s any other interesting information…

What is the value of the USBD status in the URB after completion?
There is also the IRP status, which ought to be STATUS_SUCCESS given
that is what IoCallDriver returns, but it would be good to know whether
it differs as well.
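A sketch of logging both statuses, plus the transferred length, for every completion, so the failing read can be compared field by field with a good one. It assumes the values have already been copied out of ioStatus.Status and the URB after completion; transfer_record, classify_transfer and status_success are hypothetical names. Instead of including the WDK headers, status_success models the sign-bit convention shared by NT_SUCCESS and the WDK's USBD_SUCCESS macro, and the pending test models USBD_PENDING (top two bits equal to 01).

```c
#include <stdbool.h>
#include <stdint.h>

/* NTSTATUS and USBD_STATUS both flag errors with the top bit. */
static bool status_success(uint32_t s)     { return (s & 0x80000000u) == 0; }
static bool usbd_status_pending(uint32_t s) { return (s >> 30) == 1; }

/* Hypothetical per-transfer snapshot of the fields worth logging. */
typedef struct {
    uint32_t irp_status;       /* ioStatus.Status after completion */
    uint32_t usbd_status;      /* URB header Status after completion */
    uint32_t requested_length; /* TransferBufferLength before submission */
    uint32_t transfer_length;  /* TransferBufferLength after completion */
} transfer_record;

typedef enum {
    XFER_OK,
    XFER_IRP_FAILED,
    XFER_USBD_PENDING,
    XFER_USBD_FAILED,
    XFER_EMPTY,
    XFER_SHORT
} transfer_verdict;

static transfer_verdict classify_transfer(const transfer_record *r)
{
    if (!status_success(r->irp_status))
        return XFER_IRP_FAILED;
    if (usbd_status_pending(r->usbd_status))   /* should not stay pending */
        return XFER_USBD_PENDING;
    if (!status_success(r->usbd_status))
        return XFER_USBD_FAILED;
    if (r->requested_length != 0 && r->transfer_length == 0)
        return XFER_EMPTY;
    if (r->transfer_length < r->requested_length)
        return XFER_SHORT;                     /* worth logging */
    return XFER_OK;
}
```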

Have you noticed any pattern in how many I/Os it takes to repro the bad
read? If it is near INT_MAX or a similar limit, that info would be useful.
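If the repro count is logged, a quick check against integer limits can answer this question. A minimal sketch, assuming the driver keeps a 64-bit I/O counter (so the counter itself cannot wrap) and reports the count at the failing read; near_power_of_two is a hypothetical helper that tests whether a count lies within a tolerance of 2^k, which covers both INT_MAX (2^31 - 1) and the unsigned 32-bit wrap at 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if count is within `tolerance` of 2^k for some k in
 * [min_bit, 63]. */
static bool near_power_of_two(uint64_t count, uint64_t tolerance,
                              unsigned min_bit)
{
    assert(min_bit < 64);
    for (unsigned k = min_bit; k < 64; k++) {
        uint64_t p = (uint64_t)1 << k;
        uint64_t diff = count > p ? count - p : p - count;
        if (diff <= tolerance)
            return true;
    }
    return false;
}
```

A min_bit of 16 skips small powers of two, which a multi-million-I/O run would pass long before any interesting limit.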

Do you repro every time? About how long does it take in wall clock
time? About how many bytes have been transferred?
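Answering these questions requires running totals. A minimal sketch, assuming the driver records every completed write and read along with the length the stack reports back; io_stats, io_stats_record and io_stats_rate are hypothetical names, and 64-bit counters keep the bookkeeping from wrapping during a multi-hour run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-run statistics. */
typedef struct {
    uint64_t ios;   /* completed writes + reads */
    uint64_t bytes; /* sum of TransferBufferLength after completion */
} io_stats;

static void io_stats_record(io_stats *s, uint32_t transferred)
{
    s->ios += 1;
    s->bytes += transferred;
}

/* Average I/O rate over the run, in I/Os per second. */
static double io_stats_rate(const io_stats *s, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return (double)s->ios / elapsed_seconds;
}
```

As a worked example of the rate arithmetic, 10,000,000 I/Os over 30 minutes (1,800 s) average about 5,556 I/Os per second.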

As for the OS, Win7 would be better, but I’ll see what I can find about
Vista for now. The USB stack is improved in Win7, and it has ETW tracing,
so we could see whether the stack logged any exception or error around the
time of the bad read.

Hi Philip,

I guess by “USBD status” you mean the USBD_STATUS Status field in the _URB_HEADER. I didn’t note its exact value, but I did check that it is the same as in the success case.

IoCallDriver also returns STATUS_SUCCESS in the failure case.

The number of I/Os before the failure is random and differs from machine to machine. Basically, the faster the machine, the smaller the number. On a Dell Precision 390 with a 2.13 GHz Core 2 Duo, it happens after billions of writes/reads, usually after about 3-10 hours. On a Dell OptiPlex GX620 with a 3.0 GHz P4, it happens after about ten million writes/reads, usually within 30 minutes.

I haven’t had a chance to reproduce it on Win7. But I did enable Driver Verifier on Vista with IRP Logging, I/O Verification, and Enhanced I/O Verification enabled. No fault was found in the failure case.

So should I next try to reproduce it on Windows 7 with “ETW tracing” enabled?

Is _pVendorRequestUrb->TransferBufferLength zero or non-zero after you
get the transfer back with the unmodified buffer?

If it’s non-zero:

On Windows 7 RTM, we would be interested in a kernel memory dump if
you can take one as soon as you repro (right after the request returns
with an unmodified buffer and a non-zero transfer length). To cause the
memory dump, you can either have the driver conditionally call
KeBugCheck, or set a breakpoint and use “.crash” in the kernel debugger.
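The trigger condition described here can be isolated into a small predicate so the driver bug-checks only on the exact failure signature. A minimal sketch; should_capture_dump is a hypothetical helper, and the KeBugCheckEx call in the comment uses an arbitrary, made-up bug check code, with parameters chosen so the URB and buffer are easy to locate in the dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only for the signature under investigation: the request reported
 * success, the read buffer is unmodified, and TransferBufferLength is
 * non-zero. */
static bool should_capture_dump(bool irp_success, bool buffer_untouched,
                                uint32_t transfer_length)
{
    return irp_success && buffer_untouched && transfer_length != 0;
}

/* Driver-side use (not compiled here; 0xE0000001 is a hypothetical code):
 *
 *   if (should_capture_dump(NT_SUCCESS(ntStatus), untouched,
 *           _pVendorRequestURB->UrbControlVendorClassRequest.TransferBufferLength))
 *       KeBugCheckEx(0xE0000001, (ULONG_PTR)_pVendorRequestURB,
 *                    (ULONG_PTR)buffer, bufferLength, 0);
 */
```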

I don’t remember the TransferBufferLength now; I will try to reproduce the failure quickly tomorrow.

But I am sure all the member data in the URB is the same in the failure case as in the success case. So if it is non-zero in the success case, it must be non-zero in the failure case.

If it is indeed non-zero after I repro it, it may take a while for me to get Windows 7 and set it up. The question is: once I get the dump file, how can I send it to you, or how can I analyze it myself?

Thanks.