UMDF unhandled exception

Hi,

I’m developing a UMDF driver based on the FX2 sample.
Multiple applications use the driver, and they all should receive all the data from the device.
When an application performs a read, the request is put in a manual queue associated with the application. Meanwhile, another thread keeps sending read requests to the device.
When a request is completed, OnCompletion either pulls the pending request of each application from its queue and completes it, or, if the application wasn’t quick enough to put one in the queue, puts the data in a linked list of buffers. On the next OnRead, the data is retrieved directly from this linked list and the request is not put in the manual queue.
So far so good. This has been working fine for a long time.
Now I’ve encountered a new scenario where after sleep this sometimes fails.
On D0 exit, I end the pending read thread, and restart it on D0 entry.
Several applications are performing read and Ioctl simultaneously.
The driver fails with a WUDF Unhandled Exception 0xc0000005 (I think it’s an access violation) in
hr = it->ReadQueue->RetrieveNextRequest(&fxRequest);
I didn’t see in the documentation anything about exceptions which the method throws.
What could be causing this?
Should I catch the exception and override it in some way?

In addition, sometimes when I set breakpoints in WinDbg, the debugger doesn’t stop at them, even though they are highlighted in red and I know the code there is executed (I see its debug messages in the WinDbg command window). What could be causing this?

Thanks,
Gadi
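
The fan-out design described in this post can be modeled in portable C++ roughly as follows. This is only a sketch under stated assumptions, not the poster's driver code: `ClientState`, `PendingRead`, `OnDeviceData` and `BacklogSize` are hypothetical names, a `std::function` stands in for a request parked in the per-application manual queue, and a `std::deque` stands in for the linked list of buffers. The point it illustrates is that both paths touch the same per-client state, so they need a common lock.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;
// Hypothetical stand-in for a read request parked in a manual queue.
using PendingRead = std::function<void(const Buffer&)>;

class ClientState {
public:
    // Models OnRead: serve buffered data if any, otherwise park the request.
    void OnRead(PendingRead request) {
        std::optional<Buffer> ready;
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (m_backlog.empty()) {
                m_pending.push_back(std::move(request));
                return;
            }
            ready = std::move(m_backlog.front());
            m_backlog.pop_front();
        }
        request(*ready);  // complete outside the lock
    }

    // Models the per-client part of OnCompletion: complete a parked
    // request if one exists, otherwise keep the data for the next OnRead.
    void OnDeviceData(const Buffer& data) {
        PendingRead request;
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (m_pending.empty()) {
                m_backlog.push_back(data);
                return;
            }
            request = std::move(m_pending.front());
            m_pending.pop_front();
        }
        request(data);  // complete outside the lock
    }

    std::size_t BacklogSize() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_backlog.size();
    }

private:
    mutable std::mutex m_lock;
    std::deque<PendingRead> m_pending;  // parked reads
    std::deque<Buffer> m_backlog;       // data that arrived with no read parked
};
```

Completing the request outside the lock keeps the callback from re-entering `ClientState` while the lock is held.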

None of the UMDF functions throw exceptions that you should expect to catch. If you’re getting an exception back then either there’s a bug in your code (perhaps you’re using an invalid interface pointer or passing in an invalid value) or a bug in the framework code.

Under no circumstances should you catch exceptions from the framework code and override/dismiss/handle them in any way, unless explicitly directed to in the documentation for the method. This applies in general to all programming in my mind.

What is the stack trace in the host when the exception happens? Can you send us a dump file?

Thanks,
-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@n-trig.com
Sent: Wednesday, August 29, 2007 9:31 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] UMDF unhandled exception



NTDEV is sponsored by OSR


Hi Gadi,

You are not supposed to handle any exceptions. Any exceptions out of
driver or UMDF are fatal and are not supposed to be handled (unless
driver code throws its own exceptions which it wants to handle). At such
point UMDF host would be terminated (similar to bugcheck in kernel
except that it won’t take the machine down).

Are you properly synchronizing run down of the pending read thread (i.e.
don’t leave an outstanding read which could later complete and touch
freed memory)? You should run with AppVerifier enabled to attempt to
catch the problem where it happens as opposed to some later side effect.

Once you have AppVerifier enabled, attach a debugger (user or kernel)
and on AV there should be a break into the debugger. Please send the
stack with proper symbols loaded for UMDF (not necessarily for your
driver) and we can look into it more.

Thanks,
Praveen
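
One way to read "synchronizing run down" is that the D0-exit path must not return until the read thread has actually stopped touching shared state. A minimal model in portable C++ follows. It is a sketch under assumptions, not UMDF code: `ReadPump`, `Start`, `Stop`, `IsRunning` and `Iterations` are hypothetical names, and the sleep stands in for sending a read and waiting for it to complete.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

class ReadPump {
public:
    ~ReadPump() { Stop(); }  // never destroy state the thread may still use

    // Models D0 entry: start the pump if it is not already running.
    void Start() {
        if (m_thread.joinable()) return;
        m_stop.store(false);
        m_thread = std::thread([this] { Run(); });
    }

    // Models D0 exit: request the stop, then wait until the thread is gone.
    void Stop() {
        m_stop.store(true);
        if (m_thread.joinable()) m_thread.join();
    }

    bool IsRunning() const { return m_thread.joinable(); }
    unsigned Iterations() const { return m_iterations.load(); }

private:
    void Run() {
        while (!m_stop.load()) {
            m_buffer.assign(64, 0);  // touches memory owned by the pump
            ++m_iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> m_stop{false};
    std::atomic<unsigned> m_iterations{0};
    std::vector<unsigned char> m_buffer;
    std::thread m_thread;
};
```

In a real driver the request still outstanding at the device would also have to be cancelled (for example with `IWDFIoRequest::CancelSentRequest`) or allowed to complete before the wait can finish; the model above only shows the signal-then-join ordering.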

The exception I caught with the AppVerifier is “Corrupted suffix pattern for heap block”.

This happens with 2 specific applications, and not with others, something which raises a few questions:

  1. If this is an application issue, why does the whole driver fail, and not just the application which caused the issue?
  2. Heap corruption has nothing to do with entering sleep mode. Why does it only appear after sleep (and not during normal operation, or after selective suspend)?
  3. Praveen, what do you mean by “synchronizing run down of the pending read thread (i.e. don’t leave an outstanding read which could later complete and touch freed memory)”? Should I purge all the manual queues and delete all the linked-list data structures (which I formed to overcome slow application read requests) every time the driver exits D0? After all, the memory is not supposed to be freed just because the device exited D0. It should resume on D0 entry from the same memory snapshot it had when it exited D0.

Below you can find more info which appeared in the debugger. I can send you the UMDF dump and the symbols, but I guess from what I see that this is more of an application issue (correct me if I’m wrong), so let me know if you still need them.
Thanks,
Gadi
Page heap: pid 0xA44: page heap enabled with flags 0x3.
AVRF: NtrigApplet.exe: pid 0xA44: flags 0x80000001: application verifier enabled
Page heap: pid 0xA4C: page heap enabled with flags 0x3.
AVRF: quickset.exe: pid 0xA4C: flags 0x80000001: application verifier enabled

=======================================
VERIFIER STOP 0000000F : pid 0xA44: Corrupted suffix pattern for heap block.

04541000 : Heap handle used in the call.
045C6F70 : Heap block involved in the operation.
0000008F : Size of the heap block.
045C6FFF : Reserved.

=======================================
This verifier stop is not continuable. Process will be terminated
when you use the `go’ debugger command.

=======================================

Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
001b:773a2ea8 cc int 3
kd> !heap -p -a 045C6F70

address 045c6f70 found in
_DPH_HEAP_ROOT @ 4541000
in busy allocation ( DPH_HEAP_BLOCK: UserAddr UserSize - VirtAddr VirtSize)
45421c0: 45c6f70 8f - 45c6000 2000
773ddc7e ntdll!RtlpAllocateHeap+0x000000c6
773c214c ntdll!RtlAllocateHeap+0x000001e3
6ca6b8d0 +0x6ca6b8d0

Hi,

I’ve continued investigating the exception, and found that it was emanating from a NULL pointer which wasn’t supposed to be null.

This pointer is a new queue being created with the following function call:

hr = m_FxDevice->CreateIoQueue(pQueueCallbackInterface,
                               FALSE,
                               WdfIoQueueDispatchManual,
                               false,
                               false,
                               &ReadQueueItemToAdd->ReadQueue);

From a certain point on, the error code returned is 723. According to WinError.h, this is ERROR_ARBITRATION_UNHANDLED (The arbiter has deferred arbitration of these resources to its parent).
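
For reference, the NULL queue pointer follows directly from using the output parameter without checking the returned HRESULT first. The sketch below shows the check in portable C++ so it can be compiled anywhere. Everything in it is a hypothetical stand-in (`Decode`, `FromWin32`, `FakeQueue`, `FakeCreateIoQueue`, `AddReadQueue`); a real driver would use `FAILED(hr)` and `HRESULT_CODE(hr)` from the Windows headers. It assumes the 723 above is the code field of the HRESULT.

```cpp
#include <cstdint>

struct DecodedHresult {
    bool failed;        // severity bit (bit 31)
    unsigned facility;  // bits 16..28
    unsigned code;      // bits 0..15
};

// Split a 32-bit HRESULT into its fields.
constexpr DecodedHresult Decode(std::uint32_t hr) {
    return { (hr & 0x80000000u) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu };
}

// Mirrors the layout HRESULT_FROM_WIN32 produces (FACILITY_WIN32 is 7).
constexpr std::uint32_t FromWin32(std::uint32_t err) {
    return err == 0 ? 0u : (0x80000000u | (7u << 16) | (err & 0xFFFFu));
}

struct FakeQueue {};

// Hypothetical stand-in for a queue-creation call that can fail.
inline std::uint32_t FakeCreateIoQueue(bool succeed, FakeQueue** out) {
    static FakeQueue queue;
    *out = succeed ? &queue : nullptr;
    return succeed ? 0u : FromWin32(723);
}

// Store the queue only if creation succeeded; on failure leave the slot
// untouched so later code never dereferences a NULL queue.
inline bool AddReadQueue(bool succeed, FakeQueue** slot) {
    FakeQueue* created = nullptr;
    const std::uint32_t hr = FakeCreateIoQueue(succeed, &created);
    if (Decode(hr).failed) {
        return false;
    }
    *slot = created;
    return true;
}
```

The caller can then refuse to add the per-application entry at all when `AddReadQueue` returns false, instead of adding an entry whose queue is NULL.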

I’ve noticed it happens when the system wakes from sleep (but this might be because this is the specific scenario I’ve been testing).

Later on I get a lot of the following message:

Power Irp Watchdog: warning for PDO=858C7C90 Current=8EA2E020 IRP=8444F680 (2) status c00000bb

Power Irp Watchdog: warning for PDO=858C7C90 Current=8EA2E020 IRP=8444F680 (2) status c00000bb

And finally the whole thing results in a bug check and a BSOD.

Please tell me the reason this happens and how to solve/override/avoid it.

Thanks,

Gadi

*** Fatal System Error: 0x0000009f
                       (0x00000003,0x858C7C90,0x8EA2E020,0x8444F680)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.

Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

Connected to Windows Vista 6000 x86 compatible target, ptr64 FALSE

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 9F, {3, 858c7c90, 8ea2e020, 8444f680}

Probably caused by : WUDFRd.sys

Followup: MachineOwner


nt!RtlpBreakWithStatusInstruction:
81c81770 cc int 3

kd> !analyze -v

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver is causing an inconsistent power state.
Arguments:
Arg1: 00000003, A device object has been blocking an Irp for too long a time
Arg2: 858c7c90, Physical Device Object of the stack
Arg3: 8ea2e020, Functional Device Object of the stack
Arg4: 8444f680, The blocked IRP

Debugging Details:


DRVPOWERSTATE_SUBCODE: 3

DEVICE_OBJECT: 8ea2e020

DRIVER_OBJECT: 8a3dcbc8

IMAGE_NAME: WUDFRd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4549b25b

MODULE_NAME: WUDFRd

FAULTING_MODULE: 90c6b000 WUDFRd

DEFAULT_BUCKET_ID: VISTA_RC

BUGCHECK_STR: 0x9F

PROCESS_NAME: Idle

CURRENT_IRQL: 2

LAST_CONTROL_TRANSFER: from 81cd86e3 to 81c81770

STACK_TEXT:
81cf1734 81cd86e3 00000003 81cfacc4 00000000 nt!RtlpBreakWithStatusInstruction
81cf1784 81cd9150 00000003 8444f680 96c71058 nt!KiBugCheckDebugBreak+0x1c
81cf1b30 81cd856d 0000009f 00000003 858c7c90 nt!KeBugCheck2+0x5f4
81cf1b54 81c4af78 0000009f 00000003 858c7c90 nt!KeBugCheckEx+0x1e
81cf1bb0 81c5044b 81cf1cbc 81cf1c88 00000001 nt!PopCheckIrpWatchdog+0x165
81cf1bf0 81ca98d1 81d09fa0 00000000 2e7ef880 nt!PopCheckForIdleness+0x33f
81cf1ce8 81ca9221 00000000 00000000 00017713 nt!KiTimerExpiration+0x498
81cf1d50 81c9128e 00000000 0000000e 00000000 nt!KiRetireDpcList+0xba
81cf1d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x46

STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

FAILURE_BUCKET_ID: 0x9F_IMAGE_WUDFRd.sys_DATE_2006_11_02

BUCKET_ID: 0x9F_IMAGE_WUDFRd.sys_DATE_2006_11_02

Followup: MachineOwner