See below…
> From: xxxxx@lists.osr.com [mailto:bounce-549868-
> xxxxx@lists.osr.com] On Behalf Of xxxxx@flounder.com
> Sent: Tuesday, January 14, 2014 12:33 PM
> To: Windows System Software Devs Interest List
> Subject: RE:[ntdev] KiPageFault into BSOD when stepping over
>
> > OK. Here's the failing code:
> >
> > NTSTATUS
> > MyVolFltEvtDeviceAdd(_In_ WDFDRIVER Driver, _Inout_ PWDFDEVICE_INIT DeviceInit)
> > /*++
> > Routine Description:
> >
> > EvtDeviceAdd is called by the framework in response to AddDevice
> > call from the PnP manager. We create and initialize a device object to
> > represent a new instance of the device.
> >
> > Arguments:
> > Driver - Handle to a framework driver object created in DriverEntry
> > DeviceInit - Pointer to a framework-allocated WDFDEVICE_INIT structure.
> > Return Value:
> > NTSTATUS
> > --*/
> > {
> > NTSTATUS status;
> >
> > PAGED_CODE();
> >
> > UNREFERENCED_PARAMETER(Driver);
> >
> > //PDEVICE_OBJECT Pdo = WdfFdoInitWdmGetPhysicalDevice(DeviceInit);
> >
> > DECLARE_UNICODE_STRING_SIZE(name, 256);
…
>
> > ULONG retLen;
> > status = WdfFdoInitQueryProperty(DeviceInit, DevicePropertyClassGuid,
> >     sizeof(name_buffer), name.Buffer, &retLen);
>
> >
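For readers less familiar with the WDK macro: DECLARE_UNICODE_STRING_SIZE(name, 256) declares both a WCHAR array called name_buffer (which is why the call above can pass sizeof(name_buffer)) and a UNICODE_STRING called name that points at it. Because only name.Buffer is handed to WdfFdoInitQueryProperty, the query cannot update name.Length; the caller has to set it from retLen afterwards. The following is a user-mode sketch of that buffer handling only. MY_UNICODE_STRING, MY_DECLARE_UNICODE_STRING_SIZE and fake_query_string are hypothetical stand-ins, not WDK definitions, and the sketch assumes (as is usual for string properties) that the reported length includes the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode stand-ins; NOT the real WDK definitions. */
typedef uint16_t WCHAR16;            /* UTF-16 code unit, 2 bytes like WCHAR */
typedef unsigned long ULONG_T;

typedef struct {
    uint16_t Length;                 /* bytes in use, excluding the NUL */
    uint16_t MaximumLength;          /* capacity of Buffer in bytes */
    WCHAR16 *Buffer;
} MY_UNICODE_STRING;

/* Mimics the shape of DECLARE_UNICODE_STRING_SIZE as I understand it:
   a <var>_buffer array plus a string with Length 0 and MaximumLength
   equal to the array size in bytes. */
#define MY_DECLARE_UNICODE_STRING_SIZE(var, size)                     \
    WCHAR16 var##_buffer[size];                                       \
    MY_UNICODE_STRING var = { 0, (uint16_t)((size) * sizeof(WCHAR16)), \
                              var##_buffer }

/* Hypothetical query: copies a sample GUID-shaped string, NUL included,
   and reports the byte count through resultLength. Returns -1 (with the
   required size in resultLength) if the buffer is too small. */
static int fake_query_string(ULONG_T bufferLength, void *buffer,
                             ULONG_T *resultLength)
{
    static const char sample[] = "{00000000-0000-0000-0000-000000000000}";
    ULONG_T needed = (ULONG_T)(sizeof(sample) * sizeof(WCHAR16));
    WCHAR16 *out = buffer;
    size_t i;

    *resultLength = needed;
    if (bufferLength < needed)
        return -1;
    for (i = 0; i < sizeof(sample); i++)
        out[i] = (WCHAR16)sample[i];
    return 0;
}

/* Queries into the fixed buffer, then fixes up Length by hand, since the
   query only ever saw name.Buffer. */
static int get_class_guid_length(uint16_t *lengthOut)
{
    MY_DECLARE_UNICODE_STRING_SIZE(name, 256);
    ULONG_T retLen = 0;              /* a real ULONG; its address is passed */
    int status = fake_query_string((ULONG_T)sizeof(name_buffer),
                                   name.Buffer, &retLen);

    if (status != 0)
        return status;
    assert(retLen >= sizeof(WCHAR16) && retLen <= name.MaximumLength);
    name.Length = (uint16_t)(retLen - sizeof(WCHAR16)); /* drop the NUL */
    *lengthOut = name.Length;
    return 0;
}
```

With 256 code units the buffer is 512 bytes; the 38-character sample string plus its NUL occupies 78 bytes, so Length ends up as 76.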
> > I’m sure you understand it’s running at passive IRQL:
> > 1: kd> !irql
> > Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
> >
> > FAULTING_SOURCE_FILE: c:\program files (x86)\windows kits\8.1\include\wdf\kmdf\1.11\wdffdo.h
> >
> > FAULTING_SOURCE_LINE_NUMBER: 202
> >
> > FAULTING_SOURCE_CODE:
> > 198: PULONG ResultLength
> > 199: )
> > 200: {
> > 201: return ((PFN_WDFFDOINITQUERYPROPERTY)
> > WdfFunctions[WdfFdoInitQueryPropertyTableIndex])(WdfDriverGlobals,
> > DeviceInit, DeviceProperty, BufferLength, PropertyBuffer,
> > ResultLength);
>
> While it is nice that C and C++ let you compose long and complex
> expressions like this, you will find it a LOT easier to debug if you
> break this into about four lines of code
…
> But it looks like you made one of the silliest possible errors. You
> read the documentation of the function, and it said “PULONG ResultLength”.
> So you assumed you had to have a variable of type PULONG that you would
> pass in, which means you have no idea how C works. So you declared a
> variable of type PULONG. Did you initialize it?
>
> >> 202: }
>
> You need a good remedial course in the C language.
> joe
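Joe's suggestion to break a table-dispatch expression like the quoted line 201 into separate steps can be sketched in user mode. Since line 201 lives in Microsoft's wdffdo.h, in a real driver this would only apply to one's own wrapper code. GENERIC_FN, PFN_QUERY, FunctionTable, QueryTableIndex and fake_query are hypothetical stand-ins for PFN_WDFFDOINITQUERYPROPERTY, WdfFunctions and WdfFdoInitQueryPropertyTableIndex; they are not the real KMDF definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (NOT the real KMDF definitions) for the pattern
   in the quoted thunk: a table of untyped function pointers, an index
   into it, and a typed pointer for one entry. */
typedef void (*GENERIC_FN)(void);
typedef int (*PFN_QUERY)(void *globals, int property, unsigned long bufferLength,
                         void *buffer, unsigned long *resultLength);

enum { QueryTableIndex = 0, TableSize = 1 };

/* Dummy implementation that reports half the buffer size as "used". */
static int fake_query(void *globals, int property, unsigned long bufferLength,
                      void *buffer, unsigned long *resultLength)
{
    (void)globals;
    (void)property;
    (void)buffer;
    *resultLength = bufferLength / 2;
    return 0;
}

static GENERIC_FN FunctionTable[TableSize] = { (GENERIC_FN)fake_query };

/* One-expression form, shaped like the quoted line 201. */
static int query_one_line(void *globals, int property, unsigned long bufferLength,
                          void *buffer, unsigned long *resultLength)
{
    return ((PFN_QUERY)FunctionTable[QueryTableIndex])(globals, property,
                                                        bufferLength, buffer,
                                                        resultLength);
}

/* The same call split into four steps; each one can be stepped over and
   its result inspected on its own in a debugger. */
static int query_split(void *globals, int property, unsigned long bufferLength,
                       void *buffer, unsigned long *resultLength)
{
    GENERIC_FN raw = FunctionTable[QueryTableIndex]; /* 1: fetch the slot     */
    PFN_QUERY query;
    int status;

    assert(raw != NULL);
    query = (PFN_QUERY)raw;                           /* 2: cast to real type */
    status = query(globals, property, bufferLength,   /* 3: make the call     */
                   buffer, resultLength);
    return status;                                    /* 4: return separately */
}
```

Both functions do the same thing; the split form just gives the debugger a place to stop after the table lookup, after the cast, and after the call.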
Sorry, Dr. Newcomer, but the crash is on return from the inlined KMDF
function WdfFdoInitQueryProperty.
So: a) your stylistic criticism had better be addressed to Microsoft, and,
more importantly,
b) PULONG ResultLength is simply the last of the function's parameters.
So there is actually no C++ problem here.
I have no idea what that assertion means. The code is simply wrong,
W-R-O-N-G, big-time. The reason it is a Heisenbug is that single-stepping
can alter the state of the stack, so that when single-stepping, the stack
has a different garbage value for the uninitialized variable than if the
programmer lets it run.
The error holds whether the source code is C or C++, and is independent
of the number of parameters or the position of the variable in the
parameter list. The code is garbage. It has to be fixed. What is
amazing is that it has been out there for so long with this deep and
fundamental error in it, and nobody noticed!
What the OP states is that he gets this bugcheck only if he breaks into the
debugger somewhere before this function call and then steps over (or into)
this line of code; he gets no bugcheck if no debugger is attached or if he
breaks into the debugger after this line.
See the explanation above. Recall the rule that using a variable whose value
has not been established produces “undefined” results. The observed
behavior is an example of one of the possible outcomes of code this bad.
It could be much worse.
To the OP: Andrii, what if you break into the debugger before this line but
then just let the code run instead of stepping? No crash? Or the same one?
Once the code is this broken, it doesn’t matter what the OP does.
Undefined behavior is the only possible outcome. The problem is not in
the debugger, it is in the fact that the code is deeply and irrecoverably
erroneous as written, and while the fix is trivial (remove the P from the
declaration and put & in front of the variable name in the call), until the code IS fixed, it
is simply not functional. The fact that it has not failed earlier is
nothing short of miraculous. Or maybe it was failing, leading to
unaccountable BSODs as the store through whatever pointer value was left
on the stack overwrote some random important piece of data.
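The fix Joe describes (declare a ULONG and pass its address, rather than passing an uninitialized PULONG) can be sketched in user mode. query_with_out_param is a hypothetical callee shaped like the ResultLength parameter, not a KMDF function. The broken form appears only as a comment, because executing it is undefined behavior. For comparison, the snippet quoted at the top declares `ULONG retLen;` and passes `&retLen`, which is the fixed shape.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ULONG_T;

/* Hypothetical callee: like the last parameter of the query, it stores a
   result size through the caller-supplied pointer. */
static int query_with_out_param(ULONG_T bufferLength, ULONG_T *resultLength)
{
    /* Catches a NULL pointer, but cannot detect a garbage one. */
    assert(resultLength != NULL);
    *resultLength = bufferLength;   /* the store through the pointer */
    return 0;
}

/* Broken shape, shown only as a comment because running it is undefined
   behavior: retLen holds whatever was left on the stack, so the callee's
   store lands at an arbitrary address (which may differ under a debugger).

       ULONG_T *retLen;
       query_with_out_param(64, retLen);
*/

/* Fixed shape: no P in the declaration, and the address of a real
   variable is passed. */
static ULONG_T query_fixed(ULONG_T bufferLength)
{
    ULONG_T retLen = 0;
    int status = query_with_out_param(bufferLength, &retLen);

    assert(status == 0);
    (void)status;
    return retLen;
}
```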
There is no real choice here: if Microsoft wrote the code, Microsoft has
to fix the code. If I had realized this was Microsoft code and I was
writing a driver, I would take one of two courses: (a) stop all
development until the error was corrected or (b) avoid calling the broken
function while development continued. Simply calling this function opens
you up to random memory damage and undefined behavior. It is unusable as
written, assuming that the display we saw is the actual code.
I once worked with a compiler under development that had the property that it
would frequently use 17 of the 16 available registers. When I saw this in
my code, I would simply put my development on hold until the compiler team
fixed the problem. When my boss’s boss took me to task because I’d
promised a port to the VAX “in a month”, and it was now six weeks in, I
pulled up my time sheets (I learned to keep careful time sheets) and
pointed out that I was still on time; I had thus far expended fewer than
five days on the project. When he demanded to know why, I pointedly said
that the use-17-of-the-16-register bug was what killed it every time, and
I could not debug my code when the compiler compiled it incorrectly. I
was heavy on the sarcasm because I knew that he was the person who was
writing the register allocator, and it was his bug. I ended the meeting
by saying “When I get a working compiler, I expect it will take fewer than
five more days to port it, which means my estimate was off by two weeks.” I
then added, “Please let me know when we have a working compiler and I will
try it again”. It took two more weeks to fix that bug, and it took me
three days to finish the port.
This bug is a fatal bug. No progress can be made until it is fixed,
unless progress can be made without calling that function. Somebody at
Microsoft had better get a serious fire lit under them to get a fix out
for this no later than last Tuesday.
joe
Best regards,
Alex Krol
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev