KiPageFault into BSOD when stepping over

Good day, gentlemen.

I’m constantly running into a bug check with the following stack:
00 ffffd00020463d78 fffff8010fa520ea nt!DbgBreakPointWithStatus
01 ffffd00020463d80 fffff8010fa519fb nt!KiBugCheckDebugBreak+0x12
02 ffffd00020463de0 fffff8010f9c9da4 nt!KeBugCheck2+0x8ab
03 ffffd000204644f0 fffff8010f9f1b1f nt!KeBugCheckEx+0x104
04 ffffd00020464530 fffff8010f8b85ad nt! ?? ::FNODOBFM::`string'+0x1797f
05 ffffd000204645d0 fffff8010f9d3f2f nt!MmAccessFault+0x7ed
06 ffffd00020464710 fffff8000034a2e3 nt!KiPageFault+0x12f
07 ffffd000204648a0 fffff80000e9441f Wdf01000!imp_WdfFdoInitQueryProperty+0x28
08 ffffd000204648f0 fffff80000e9a17f MyVolFlt!WdfFdoInitQueryProperty+0x5f
09 ffffd00020464940 fffff8000031055b MyVolFlt!MyVolFltEvtDeviceAdd+0x9f
0a ffffd00020464bd0 fffff8010f9449d9 Wdf01000!FxDriver::AddDevice+0xab

This happens ONLY when I step into/over WdfFdoInitQueryProperty. Breaking into the debugger after this call produces no bug check.

I’ve run into this problem multiple times in different places (in this module and in other modules). I can’t figure out what’s wrong.
1: kd> !irql
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: ffffe00020464c10, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff8000034a2e3, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000002, (reserved)

1: kd> !pool ffffe00020464c10
Pool page ffffe00020464c10 region is Nonpaged pool
ffffe00020464000 is not a valid large pool allocation, checking large session pool…
Unable to read large session pool table (Session data is not present in mini and kernel-only dumps)
ffffe00020464000 is not valid pool. Checking for freed (or corrupt) pool
Address ffffe00020464000 could not be read. It may be a freed, invalid or paged out page

1: kd> ? poi(DeviceInit)
Evaluate expression: -35183830610928 = ffffe000`20464c10

Is this somehow connected with kd? How can I avoid this bugcheck?

Thanks.

This is an access to missing or paged memory at high IRQL. It cannot be stepped over.

Find out from the crash dump or from the live debugger why the memory is missing or accessed at the wrong IRQL.

Plus the usual advice on using Driver Verifier and Special Pool.

The question is why there is a pointer to the invalid page. It could be a
dangling pointer following a release of storage, or bad pointer
arithmetic. Some more context, such as the source code near the point of
failure, would help; if you show variable names, you need to show the code
that sets their values.
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

OK. Here’s the failing code:

NTSTATUS
MyVolFltEvtDeviceAdd(_In_ WDFDRIVER Driver, _Inout_ PWDFDEVICE_INIT DeviceInit)
/*++
Routine Description:

EvtDeviceAdd is called by the framework in response to AddDevice
call from the PnP manager. We create and initialize a device object to
represent a new instance of the device.

Arguments:
Driver - Handle to a framework driver object created in DriverEntry
DeviceInit - Pointer to a framework-allocated WDFDEVICE_INIT structure.
Return Value:
NTSTATUS
--*/
{
NTSTATUS status;

PAGED_CODE();

UNREFERENCED_PARAMETER(Driver);

//PDEVICE_OBJECT Pdo = WdfFdoInitWdmGetPhysicalDevice(DeviceInit);

DECLARE_UNICODE_STRING_SIZE(name, 256);
ULONG retLen;
status = WdfFdoInitQueryProperty(DeviceInit, DevicePropertyClassGuid, sizeof(name_buffer), name.Buffer, &retLen);

I’m sure you understand it’s running at passive IRQL:
1: kd> !irql
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)

FAULTING_SOURCE_FILE: c:\program files (x86)\windows kits\8.1\include\wdf\kmdf\1.11\wdffdo.h

FAULTING_SOURCE_LINE_NUMBER: 202

FAULTING_SOURCE_CODE:
198: PULONG ResultLength
199: )
200: {
201: return ((PFN_WDFFDOINITQUERYPROPERTY) WdfFunctions[WdfFdoInitQueryPropertyTableIndex])(WdfDriverGlobals, DeviceInit, DeviceProperty, BufferLength, PropertyBuffer, ResultLength);

202: }

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: ffffe00020464c10, memory referenced.

Arg1 is DeviceInit.

Is this a debugger side effect? It doesn’t happen when no debugger is attached, or when I simply don’t step over that point.

> OK. Here’s the failing code:
> [...]
> DECLARE_UNICODE_STRING_SIZE(name, 256);

I am sort of curious how you determined that 256 is a valid value here.
In app space, it is 261 characters, and there is a manifest constant that
is used to get this value, MAX_PATH. I presume there is a similar
constant defined in ntddk.h, and that’s what you should use here.

> ULONG retLen;
> status = WdfFdoInitQueryProperty(DeviceInit, DevicePropertyClassGuid,
> sizeof(name_buffer), name.Buffer, &retLen);

I don’t know the spec for this, but in app space it is a common error to
use sizeof() on a Unicode string but the APIs want character count, not
byte count. In the kernel, string counts in UNICODE_STRINGs are in bytes,
but be sure you double-check this. And it would probably make sense that
you store the result in the name.Length field. But what is name_buffer?
I don’t see it declared anywhere, and the UNICODE_STRING would not have a
meaningful sizeof(). The value is likely to be name.MaxLength or whatever
the field is (I can’t check it right now, and I’m trusting a very rusty
memory that has not had to look at a UNICODE_STRING in several years). I
note that retLen is not declared anywhere, either. It should be a local
variable. You can’t post code that has undefined variables; we don’t know
where they are declared, or even their types.
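
For readers wondering where name_buffer comes from and which unit sizeof yields, here is a small user-mode sketch. U16_STRING and DECLARE_U16_STRING_SIZE are hypothetical stand-ins, written on the assumption that the kernel's DECLARE_UNICODE_STRING_SIZE(name, 256) declares a 256-element name_buffer array alongside a counted string whose length fields are byte counts. Check the WDK headers before relying on that assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for a counted UTF-16 string.
   Both length fields are byte counts, not character counts. */
typedef struct {
    uint16_t  Length;         /* bytes currently in use */
    uint16_t  MaximumLength;  /* bytes available in Buffer */
    uint16_t *Buffer;
} U16_STRING;

/* Hypothetical look-alike macro: declares <var>_buffer and <var>. */
#define DECLARE_U16_STRING_SIZE(var, count)                          \
    uint16_t var##_buffer[count];                                    \
    U16_STRING var = { 0, (uint16_t)((count) * sizeof(uint16_t)),    \
                       var##_buffer }

/* sizeof applied to the array yields its size in bytes. */
size_t buffer_bytes(void)
{
    DECLARE_U16_STRING_SIZE(name, 256);
    assert(name.Buffer == name_buffer);
    return sizeof(name_buffer);
}

/* Dividing by the element size yields the character count. */
size_t buffer_chars(void)
{
    DECLARE_U16_STRING_SIZE(name, 256);
    assert(name.Length == 0);
    return sizeof(name_buffer) / sizeof(name_buffer[0]);
}

/* The macro stores the capacity in bytes as well. */
size_t maximum_length_bytes(void)
{
    DECLARE_U16_STRING_SIZE(name, 256);
    return name.MaximumLength;
}
```

Under these assumptions, sizeof(name_buffer) is a byte count, which is the unit a byte-length parameter expects.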

> I’m sure you understand it’s running at passive IRQL:
> 1: kd> !irql
> Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
>
> FAULTING_SOURCE_FILE: c:\program files (x86)\windows
> kits\8.1\include\wdf\kmdf\1.11\wdffdo.h
>
> FAULTING_SOURCE_LINE_NUMBER: 202
>
> FAULTING_SOURCE_CODE:
> 198: PULONG ResultLength
> 199: )
> 200: {
> 201: return ((PFN_WDFFDOINITQUERYPROPERTY)
> WdfFunctions[WdfFdoInitQueryPropertyTableIndex])(WdfDriverGlobals,
> DeviceInit, DeviceProperty, BufferLength, PropertyBuffer,
> ResultLength);

While it is nice that C and C++ let you compose long and complex
expressions like this, you will find it a LOT easier to debug if you break
this into about four lines of code. Even in C, you can make this more
readable by declaring any variables you need in a local scope, e.g.,
{
SOMETYPE st;
ANOTHERVAR v;
st = …some computation…;
v = …some computation based on st…;
return v;
}

In C++, you don’t need to declare the variables until they are used, but I
often use this technique to give very limited scope to temporary variables
I might need.

There are many things that can go wrong here; WdfFunctions at that index
may have an invalid address or have been damaged by some bad pointer work.

But it looks like you made one of the silliest possible errors. You read
the documentation of the function, and it said “PULONG ResultLength”. So
you assumed you had to have a variable of type PULONG that you would pass
in, which means you have no idea how C works. So you declared a variable
of type PULONG. Did you initialize it? If you did not initialize it, it
holds garbage. If your luck is good, the value that is in this
uninitialized variable will cause a BSOD; if your luck is bad, it will be
a pointer to something important and that something important will be
clobbered.

The correct way, which you would know if you understood C/C++, would be

ULONG ResultLength;

…(WdfDriverGlobals, …, &ResultLength)

I suggest reading about pointers in C, and fully understand what a pointer
is, and does, and how they are created. &ResultLength creates a PULONG
referencing the ULONG ResultLength. This is beginner’s C knowledge.
Learn the language you are programming in. The specification of a type of
an argument to a function DOES NOT MEAN YOU NEED A VARIABLE OF THAT TYPE.
It means you need an /expression/ of that type. So, for example, if
ResultLength is a ULONG, the expression &ResultLength is a PULONG. And
that satisfies the requirement of the function prototype. You simply
passed an uninitialized variable in; if you had set your warning level to
4, I think it would have caught this, and certainly if you used the
/ANALYZE option which runs The Program Formerly Known As Prefast, that
would definitely have caught it. So you need to understand how to
properly use the tool chain that creates a driver. /W3 is simply
inadequate for most serious programming.
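
The out-parameter pattern described above can be sketched in plain user-mode C. query_property and query_length are hypothetical names used only for illustration, not KMDF APIs. The caller declares the value itself and passes its address, which is also what the OP's `ULONG retLen; ... &retLen` does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical query function: copies a fixed payload into the caller's
   buffer and reports the required byte count through an out-parameter. */
int query_property(void *buffer, unsigned long buffer_length,
                   unsigned long *result_length)
{
    static const char payload[] = "class-guid";

    assert(result_length != NULL);   /* the caller must pass a valid address */
    *result_length = (unsigned long)sizeof(payload);
    if (buffer == NULL || buffer_length < sizeof(payload)) {
        return -1;                   /* too small; the size is still reported */
    }
    memcpy(buffer, payload, sizeof(payload));
    return 0;
}

/* Caller: declare the value itself, then pass its address. */
unsigned long query_length(void)
{
    char buffer[64];
    unsigned long result_length = 0; /* the storage lives here */

    if (query_property(buffer, sizeof(buffer), &result_length) != 0) {
        return 0;
    }
    return result_length;
}
```

An uninitialized pointer variable passed in place of &result_length would compile against the same prototype, but the callee would then write through whatever address happened to be in it.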

> 202: }
>
> PAGE_FAULT_IN_NONPAGED_AREA (50)

Yep, you have been lucky. You got a BSOD instead of clobbering something
important.

You need a good remedial course in the C language.
joe


Joe, you are commenting on wdffdo.h here. Not on his code, which entirely
invalidates the harangue.

//Daniel

Sorry, Dr. Newcomer - but the crash is on return from inlined KMDF
function WdfFdoInitQueryProperty.
So - a) your stylistic criticism better be addressed to Microsoft and,
more importantly,
b) PULONG ResultLength is simply last one of function parameters.
So - C++ problems are actually absent.

What OP states is that he gets this bugcheck only if he breaks into the
debugger somewhere before this function invocation and then steps over
(or into) this line of code, and he has no bugcheck if there is no
debugger attached or if he breaks into the debugger after this line.

To OP: Andrii, and if you break into debugger before this line but just
run code instead of stepping? No crash? Or the same one?

Best regards,
Alex Krol

> Joe, you are commenting on wdffdo.h here. Not on his code, which entirely
> invalidates the harangue.
>
> //Daniel

That was not apparent to me. You would not believe how many times I see
this error. Every programmer with marginal C knowledge makes it. After I
explain it to them, I get some of the following

ULONG result;
PULONG presult = &result;

and when I ask why, they tell me the function prototype requires a PULONG
variable!

Certainly the second-worst example was

PULONG result = new ULONG;
…function using result…
…do things with *result
delete result;

and the worst (which was also in C++) was

PULONG result = (PULONG)malloc(sizeof(ULONG));
…function using result…
…do things with *result…
free(result);

I would have at least three students per class who did some variant of
these; they did not comprehend the concept of pointers, initialized
variables, or how to read function prototypes. Many of these had more
than a decade of C programming experience, and one complained that he had
found pointers so esoteric that he never understood why anyone would want
to use them [let us not digress into a discussion of
pointers-vs-references, e.g., a Java/C# vs C/C++ discussion…]. With ten
years’ experience, he was also struct-challenged, and the notion of union
was just so much noise. If he wanted an array of multivalued objects, he
would create an array of int, an array of bool, an array of… instead of
declaring a struct and making it an array.

Maybe I’m just frustrated because I have lost six of the last ten days to
illnesses of various sorts, including two hospital stays. But I think
that code as we saw it shows a serious defect in thinking. And the OP
should have spotted that error.

I didn’t think that any of the WDF source was available, so seeing a
newbie mistake like this caused me to think it was the OP’s code. If this
is Microsoft code, some manager somewhere should catch pluperfect hell for
either (a) not catching this or (b) not realizing his programmers were so
undertrained that they could make this kind of error.

In my Advanced Systems Programming course, I even devoted six slides to
this problem, only to have students make the same error on their very
first lab. Some of them just don’t get pointers at all!
joe


Hello, Alex.

kd> bp MyVolFlt!MyVolFltEvtDeviceAdd
kd> g
Break instruction exception - code 80000003 (first chance)
MyVolFlt!MyVolFltEvtDeviceAdd:
fffff800`011be0e0 cc int 3
1: kd> g
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x00000050
(0xFFFFE00020464C10,0x0000000000000000,0xFFFFF800002692E3,0x0000000000000002)

So the same :( And it’s not only this code. I get similar bugchecks while debugging other drivers (this one is just a skeleton project with nearly no real work performed).

If I set bp right after WdfFdoInitQueryProperty call - it runs like a charm.

I should probably mention my environment:

  1. Host - Windows 8.1 Enterprise x64 with Hyper-V
  2. Target - Windows 8.1 Enterprise x64 under Hyper-V
  3. WinDbg 6.3.9600.16384 (WDK 8.1)
  4. VM has COM1 configured as a pipe for KD
  5. bootdebug is enabled
  6. Target is FRE build

My only guess was Dynamic Memory in the VM configuration, so I disabled it. Still no luck.

See below…

> Sorry, Dr. Newcomer - but the crash is on return from inlined KMDF
> function WdfFdoInitQueryProperty.
> So - a) your stylistic criticism better be addressed to Microsoft and,
> more importantly,
> b) PULONG ResultLength is simply last one of function parameters.
> So - C++ problems are actually absent.

I have no idea what that assertion means. The code is simply wrong,
W-R-O-N-G, big-time. The reason it is a Heisenbug is that single-stepping
can alter the state of the stack, so that when single-stepping, the stack
has a different garbage value for the uninitialized variable than if the
programmer lets it run.

The error is true whether the source code is C or C++, and is independent
of the number of parameters or the position of the variable in the
parameter list. The code is garbage. It has to be fixed. What is
amazing is that it has been out there for so long with this deep and
fundamental error in it, and nobody noticed!

> What OP states is that he gets this bugcheck only if he breaks into
> debugger somewhere before this function invocation and then steps over
> (or into) this line of code and he has no bugcheck if there is no
> debugger attached or if he breaks into debugger after this line.

See above explanation. Recall the rule that using a variable whose value
has not been established produces “undefined” results. The observed
behavior is an example of one of the possible outcomes of code this bad.
It could be much worse.

> To OP: Andrii, and if you break into debugger before this line but just
> run code instead of stepping? No crash? Or the same one?

Once the code is this broken, it doesn’t matter what the OP does.
Undefined behavior is the only possible outcome. The problem is not in
the debugger, it is in the fact that the code is deeply and irrecoverably
erroneous as written, and while the fix is trivial (remove the P from the
declaration and add & to the parameter name), until the code IS fixed, it
is simply not functional. The fact that it has not failed earlier is
nothing short of miraculous. Or maybe it was failing, leading to
unaccountable BSODs as the store through whatever pointer value was left
on the stack overwrote some random important piece of data.

There is no real choice here: if Microsoft wrote the code, Microsoft has
to fix the code. If I had realized this was Microsoft code and I was
writing a driver, I would exercise one of two choices: (a) stop all
development until the error was corrected or (b) avoid calling the broken
function while development continued. Simply calling this function opens
you up to random memory damage and undefined behavior. It is unusable as
written, assuming that the display we saw is the actual code.

I once worked with a developing compiler, that had the property that it
would frequently use 17 of the 16 available registers. When I saw this in
my code, I would simply put my development on hold until the compiler team
fixed the problem. When my boss’s boss took me to task because I’d
promised a port to the VAX “in a month”, and it was now six weeks in, I
pulled up my time sheets (I learned to keep careful time sheets) and
pointed out that I was still on time; I had thus far expended fewer than
five days on the project. When he demanded to know why, I pointedly said
that the use-17-of-the-16-register bug was what killed it every time, and
I could not debug my code when the compiler compiled it incorrectly. I
was heavy on the sarcasm because I knew that he was the person who was
writing the register allocator, and it was his bug. I ended the meeting
by saying “When I get a working compiler, I expect it will take fewer than
5 more days to port it, which means my estimate was off by two weeks.” I
then added, “Please let me know when we have a working compiler and I will
try it again”. It took two more weeks to fix that bug, and it took me
three days to finish the port.

This bug is a fatal bug. No progress can be made until it is fixed,
unless progress can be made without calling that function. Somebody at
Microsoft had better get a serious fire lit under them to get a fix out
for this no later than last Tuesday.
joe


> Hello, Alex.
> [...]
> If I set bp right after WdfFdoInitQueryProperty call - it runs like a
> charm.

No, it most definitely does not “run like a charm”, unless you want to
believe in charms, in which case it runs only because the random garbage
left on the stack causes it to merely corrupt some random memory location,
rather than try to access a nonexistent memory location. It is truly the
“luck of the draw” that leaves an address on the stack that does not cause
a BSOD. Now, if the spec of that function says that parameter can be
NULL, then through only the most amazing coincidences, the value on the
stack just happens to be zero. This is not “running”, this is “not
failing in spite of a deep and fundamental bug in the code”. And it is
only random luck that would leave this particular stack location set to
NULL. When you start single-stepping, the stack gets a different pattern
of garbage, and that particular garbage is fatal.

I repeat: the code cannot be trusted. It does not “run” except by luck.
It is entirely an accident that the value left on the stack without
single-stepping does not cause a BSOD. Officially, the meaning of that
code is undefined, and it is entitled to do anything at all, including
causing your computer to vanish from Earth and take up orbit around
Jupiter. However, the most likely “undefined” behavior is to damage some
random piece of memory somewhere.

> Probably I must mention my environment:
>
>   1. Host - Windows 8.1 Enterprise x64 with Hyper-V
>   2. Target - Windows 8.1 Enterprise x64 under Hyper-V
>   3. WinDbg 6.3.9600.16384 (WDK 8.1)
>   4. VM got COM1 configured as a pipe for KD
>   5. bootdebug is enabled
>   6. Target is FRE build

None of the above matters. The code is wrong. Nothing can save it,
except fixing the bug. On the other hand, if you used to be in ordnance
disposal and don’t mind playing with armed explosives, you may be
perfectly comfortable continuing to use this code. Just don’t be
surprised if it blows up in your face.

> My only guess was Dynamic Memory in VM configuration. So I disabled it.
> Still no luck.

No. The code is wrong. It cannot possibly work correctly, ever. If it
has been giving the illusion of working, that is just the most amazing
luck in the known universe. Until that bug is fixed, be afraid. Be very
afraid.



Ha! Curiouser and curiouser!
Actually, I think the only person on this list who can shed some light
on it is Doron Holan - he, after all is the KMDF man at Microsoft.
BTW, does this happen when the debug target is a physical Windows 8.1
machine and not a VM?

Best regards,
Alex Krol

Dr. Newcomer wrote:

> No. The code is wrong

The code in question is

__checkReturn
__drv_maxIRQL(PASSIVE_LEVEL)
NTSTATUS
FORCEINLINE
WdfFdoInitQueryProperty(
__in
PWDFDEVICE_INIT DeviceInit,
__in
DEVICE_REGISTRY_PROPERTY DeviceProperty,
__in
ULONG BufferLength,
__out_bcount_full_opt(BufferLength)
PVOID PropertyBuffer,
__out
PULONG ResultLength
)
{
return ((PFN_WDFFDOINITQUERYPROPERTY)
WdfFunctions[WdfFdoInitQueryPropertyTableIndex])(WdfDriverGlobals,
DeviceInit, DeviceProperty, BufferLength, PropertyBuffer, ResultLength);
}

(Well, it is copy-pasted from the old KMDF 1.9, but this inlined function was
not changed in later versions.)
In the debugger output you are seeing just the last lines, starting from
PULONG ResultLength, and missing the closing ) that follows it.
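
The shape of such a wrapper, an index into a function table followed by a cast and a call, can be sketched in plain C. Everything here (function_table, query_fn, query_impl, the index name) is hypothetical and only mirrors the structure of the header code above. The second wrapper does the same work in separate steps, which gives a debugger one source line per step.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type used for every slot in the table. */
typedef void (*generic_fn)(void);

/* Signature of one particular entry (hypothetical). */
typedef int (*query_fn)(int property, unsigned long *result_length);

enum { QUERY_PROPERTY_INDEX = 0, TABLE_SIZE = 1 };

/* Hypothetical implementation that lives behind the table. */
static int query_impl(int property, unsigned long *result_length)
{
    *result_length = (unsigned long)property * 2u;
    return 0;
}

/* Hypothetical table standing in for a framework-provided one. */
static generic_fn function_table[TABLE_SIZE] = {
    (generic_fn)query_impl
};

/* One-expression wrapper: index, cast, and call on a single line. */
static inline int query_property(int property, unsigned long *result_length)
{
    return ((query_fn)function_table[QUERY_PROPERTY_INDEX])(property,
                                                            result_length);
}

/* The same wrapper split into steps. */
static inline int query_property_stepwise(int property,
                                          unsigned long *result_length)
{
    generic_fn slot = function_table[QUERY_PROPERTY_INDEX];
    query_fn target = (query_fn)slot;
    int status;

    assert(target != NULL);
    status = target(property, result_length);
    return status;
}
```

In both forms result_length is just a parameter forwarded to the table entry; the storage it points at belongs to the caller.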

Best regards,
Alex Krol

Alex, I don’t have a physical machine available for debugging here :(

And a small update. After the first call to WdfFdoInitQueryProperty dereferences rdx (DeviceInit):
fffff800`003702e3 488b1a mov rbx,qword ptr [rdx]

subsequent step-overs don’t trigger bug checks. The WDFDEVICE_INIT structure is allocated by the framework and must be valid at that point.

Second run with step into WdfFdoInitQueryProperty:
1: kd> !pool ffffe000017d8e20 <- rdx = DeviceInit
Pool page ffffe000017d8e20 region is Nonpaged pool

*ffffe000017d8e10 size: 1f0 previous size: 1c0 (Allocated) *FxDr
Pooltag FxDr : KMDF driver globals/generic pool allocation tag. Fallback tag in case driver tag is unusable., Binary : wdf01000.sys

What does ‘!pte ffffe000`20464c10’ say (run on dump from your first post)?

Kris


I haven’t done anything in KMDF for a couple of years now, so it might be
completely unrelated, but did you check what’s going on on the other CPUs?
I have seen race conditions become more “visible” when stepping in the debugger.

Kris


Krzysztof, I was digging into this right before you wrote :)

Before unsuccessful call to WdfFdoInitQueryProperty (bp before call is made)
1: kd> !pte poi(DeviceInit)
VA ffffe00020464c10
PXE at FFFFF6FB7DBEDE00    PPE at FFFFF6FB7DBC0000    PDE at FFFFF6FB78000810    PTE at FFFFF6F000102320
contains 0000000000381863  contains 0000000000382863  contains 0000000000000000
pfn 381       ---DA--KWEV  pfn 382       ---DA--KWEV  not valid

After successful call to WdfFdoInitQueryProperty (bp after the call is made):

1: kd> !pte poi(DeviceInit)
VA ffffd00020464c10
PXE at FFFFF6FB7DBEDD00    PPE at FFFFF6FB7DBA0000    PDE at FFFFF6FB74000810    PTE at FFFFF6E800102320
contains 00000000002A4863  contains 00000000002A3863  contains 0000000000541863  contains 8000000002820963
pfn 2a4       ---DA--KWEV  pfn 2a3       ---DA--KWEV  pfn 541       ---DA--KWEV  pfn 2820      -G-DA--KW-V

Does this mean Windows fixes kernel PTEs on the fly? OK, but why does it bugcheck at that point?

PS: Other cores are idle.
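
As an aid to reading the !pte output above, here is a user-mode sketch that decodes an x64 page-table entry value and computes where the PTE for a virtual address lives. The flag bits are the architectural ones. PTE_BASE is an assumption inferred from the "PTE at" addresses in this thread (a fixed self-map base) and should not be taken as valid for other Windows versions. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x64 page-table entry bits. */
#define PTE_VALID     (1ULL << 0)            /* P: entry is present */
#define PTE_WRITE     (1ULL << 1)            /* R/W */
#define PTE_ACCESSED  (1ULL << 5)
#define PTE_DIRTY     (1ULL << 6)
#define PTE_GLOBAL    (1ULL << 8)
#define PTE_NO_EXEC   (1ULL << 63)           /* NX */
#define PTE_PFN_MASK  0x000FFFFFFFFFF000ULL  /* bits 12..51 */

/* Assumption: fixed PTE self-map base, inferred from the output above. */
#define PTE_BASE 0xFFFFF68000000000ULL

bool pte_is_valid(uint64_t pte)      { return (pte & PTE_VALID) != 0; }
bool pte_is_writable(uint64_t pte)   { return (pte & PTE_WRITE) != 0; }
bool pte_is_global(uint64_t pte)     { return (pte & PTE_GLOBAL) != 0; }
bool pte_is_executable(uint64_t pte) { return (pte & PTE_NO_EXEC) == 0; }

/* Page frame number stored in the entry. */
uint64_t pte_pfn(uint64_t pte)
{
    return (pte & PTE_PFN_MASK) >> 12;
}

/* Address of the PTE mapping va: one 8-byte entry per 4 KB page
   of the 48-bit virtual address space. */
uint64_t pte_address(uint64_t va)
{
    uint64_t page_index = (va >> 12) & 0xFFFFFFFFFULL;  /* 36 bits */
    return PTE_BASE + page_index * 8;
}
```

With these definitions, the value 8000000002820963 from the second dump decodes as valid, writable, global, non-executable, with pfn 2820, and an all-zero entry decodes as not valid, matching the debugger's rendering.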

  1. Do you ever map memory as non-cached, by any chance?

  2. List the breakpoints: bl

Does it show any stray breakpoints you forgot about?