Bugcheck in fltmgr.sys

I’m seeing a bugcheck in fltmgr.sys that seems to be related to stopping
the service that my driver works with. When the service stops, my driver
does some cleanup (flushes CSQs, etc.) and then stops filtering requests.
The bugchecks happen on one of the test systems I use for high-load
testing, when my service locks up and I then stop it.

There are two tricky bits to this: 1) I don’t have a serial or FireWire
connection to this machine (it has neither port; I’m going to rummage
through some old hardware and see if we have a serial card), so I can’t
verify that my cleanup code is running as expected (my guess is that it
is running, but something is being cleaned up improperly). 2) My driver
isn’t on the call stack when it bugchecks.

The bugcheck appears to be occurring in FltpPassThroughInternal, where it
uses edi as a pointer in a test instruction. edi is NULL, which explains
the bugcheck, but I don’t know what edi is supposed to hold or how it
became NULL. Any suggestions as to how I might go about answering either
of those questions?

Thanks,

~Eric

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but …
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bae76f40, The address that the exception occurred at
Arg3: b88c6bac, Trap Frame
Arg4: 00000000

Debugging Details:

Page f3bc6 not present in the dump file. Type ".hh dbgerr004" for details
Page f3e1a not present in the dump file. Type ".hh dbgerr004" for details
PEB is paged out (Peb.Ldr = 7ffdf00c). Type ".hh dbgerr001" for details
PEB is paged out (Peb.Ldr = 7ffdf00c). Type ".hh dbgerr001" for details

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

FAULTING_IP:
fltmgr!FltpPassThroughInternal+48
bae76f40 f6470301 test byte ptr [edi+3],1

TRAP_FRAME: b88c6bac -- (.trap 0xffffffffb88c6bac)
ErrCode = 00000000
eax=8ac48e70 ebx=00000000 ecx=8aff0100 edx=00000000 esi=b88c6c60 edi=00000000
eip=bae76f40 esp=b88c6c20 ebp=b88c6c2c iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
fltmgr!FltpPassThroughInternal+0x48:
bae76f40 f6470301 test byte ptr [edi+3],1 ds:0023:00000003=??
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x8E

PROCESS_NAME: dd.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8082d800 to 80827c63

STACK_TEXT:
b88c6778 8082d800 0000008e c0000005 bae76f40 nt!KeBugCheckEx+0x1b
b88c6b3c 8088a262 b88c6b58 00000000 b88c6bac nt!KiDispatchException+0x3a2
b88c6ba4 8088a216 b88c6c2c bae76f40 badb0d00 nt!CommonDispatchException+0x4a
b88c6bc4 bae75310 b88c6c60 8924da80 b88c6c60 nt!KiExceptionExit+0x186
b88c6c2c bae778d2 b88c6c60 00000000 89ba1ee8 fltmgr!FltpPerformPreCallbacks+0x11a
b88c6c48 bae77ce3 b88c6c00 8924da80 8b010af8 fltmgr!FltpPassThrough+0x1c2
b88c6c78 8081df65 89ba1ee8 8ac48e70 8ac48e70 fltmgr!FltpDispatch+0x10d
b88c6c8c 808f5437 8ac48fb8 8ac48e70 8924da80 nt!IofCallDriver+0x45
b88c6ca0 808f25eb 89ba1ee8 8ac48e70 8924da80 nt!IopSynchronousServiceTail+0x10b
b88c6d38 8088978c 0000079c 00000000 00000000 nt!NtReadFile+0x5d5
b88c6d38 7c8285ec 0000079c 00000000 00000000 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012fe88 00000000 00000000 00000000 00000000 0x7c8285ec

STACK_COMMAND: kb

FOLLOWUP_IP:
fltmgr!FltpPassThroughInternal+48
bae76f40 f6470301 test byte ptr [edi+3],1

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: fltmgr!FltpPassThroughInternal+48

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: fltmgr

IMAGE_NAME: fltmgr.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 45d697cc

FAILURE_BUCKET_ID: 0x8E_fltmgr!FltpPassThroughInternal+48

BUCKET_ID: 0x8E_fltmgr!FltpPassThroughInternal+48

Followup: MachineOwner

3: kd> u bae76f40
fltmgr!FltpPassThroughInternal+0x48:
bae76f40 f6470301 test byte ptr [edi+3],1
bae76f44 741e je fltmgr!FltpPassThroughInternal+0x6c (bae76f64)
bae76f46 8b4608 mov eax,dword ptr [esi+8]
bae76f49 f6400408 test byte ptr [eax+4],8
bae76f4d 7511 jne fltmgr!FltpPassThroughInternal+0x68 (bae76f60)
bae76f4f ff770c push dword ptr [edi+0Ch]
bae76f52 8a5701 mov dl,byte ptr [edi+1]
bae76f55 8a0f mov cl,byte ptr [edi]

Hi Eric,

Are you running a checked build or a free build of Filter Manager? To get the benefit of all the checks in Filter Manager, you should probably run a checked build under verifier (at least for your minifilter and Filter Manager; you might want to add the other minifilters on the system too, since you never know …). This combination has a lot more sanity checks, which might help identify the problem early on.

Other than that I can’t really offer more information. I’ve looked at the beginning of “FltpPassThroughInternal” but without knowing whether it’s a checked build I don’t know how far down to look. Maybe you could post a bit more of the disassembly before FltpPassThroughInternal+48 so I can try to figure out what it’s trying to do …

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.

Hey Alex,

I’m pretty sure verifier isn’t even set up on that machine; I don’t
think I ever did that when I first started testing on it (-5 to me).
I’ll definitely turn it on and see about installing the checked builds
of at least the HAL, kernel, and fltmgr.

I’ve been seeing different symptoms (i.e., it’s crashing in different
places) on this machine than on my test VM (where verifier is enabled,
and the checked build is installed and run from time to time). I wonder
if that’s related to the sanity checking.

Thanks for the suggestions. I’ll put up something more informative when
I get the chance to do more testing on that machine. In the meantime,
here’s a longer assembler listing.

~Eric

This is a free build of fltmgr, by the way.

3: kd> u fltmgr!FltpPassThroughInternal fltmgr!FltpPassThroughInternal+68
fltmgr!FltpPassThroughInternal:
bae76ef8 8bff mov edi,edi
bae76efa 55 push ebp
bae76efb 8bec mov ebp,esp
bae76efd 8b4d0c mov ecx,dword ptr [ebp+0Ch]
bae76f00 53 push ebx
bae76f01 56 push esi
bae76f02 8b7508 mov esi,dword ptr [ebp+8]
bae76f05 8b4608 mov eax,dword ptr [esi+8]
bae76f08 894838 mov dword ptr [eax+38h],ecx
bae76f0b 8b4608 mov eax,dword ptr [esi+8]
bae76f0e 8b4020 mov eax,dword ptr [eax+20h]
bae76f11 8b4060 mov eax,dword ptr [eax+60h]
bae76f14 80480301 or byte ptr [eax+3],1
bae76f18 837e0c00 cmp dword ptr [esi+0Ch],0
bae76f1c 7410 je fltmgr!FltpPassThroughInternal+0x36 (bae76f2e)
bae76f1e 8b4608 mov eax,dword ptr [esi+8]
bae76f21 56 push esi
bae76f22 897034 mov dword ptr [eax+34h],esi
bae76f25 e8cce2ffff call fltmgr!FltpPerformPreCallbacks (bae751f6)
bae76f2a 8bd8 mov ebx,eax
bae76f2c eb02 jmp fltmgr!FltpPassThroughInternal+0x38 (bae76f30)
bae76f2e 33db xor ebx,ebx
bae76f30 b803010000 mov eax,103h
bae76f35 3bd8 cmp ebx,eax
bae76f37 743b je fltmgr!FltpPassThroughInternal+0x7c (bae76f74)
bae76f39 8b4604 mov eax,dword ptr [esi+4]
bae76f3c 57 push edi
bae76f3d 8b7860 mov edi,dword ptr [eax+60h]
bae76f40 f6470301 test byte ptr [edi+3],1
bae76f44 741e je fltmgr!FltpPassThroughInternal+0x6c (bae76f64)
bae76f46 8b4608 mov eax,dword ptr [esi+8]
bae76f49 f6400408 test byte ptr [eax+4],8
bae76f4d 7511 jne fltmgr!FltpPassThroughInternal+0x68 (bae76f60)
bae76f4f ff770c push dword ptr [edi+0Ch]
bae76f52 8a5701 mov dl,byte ptr [edi+1]
bae76f55 8a0f mov cl,byte ptr [edi]
bae76f57 e8b4d7ffff call fltmgr!FltpCanReturnPendingFromDispatch (bae74710)
bae76f5c 84c0 test al,al
bae76f5e 7504 jne fltmgr!FltpPassThroughInternal+0x6c (bae76f64)

Looks like edi == [[param1+4]+60h], if my five-minute look on a 3-inch screen was correct. (I’m not sure what the first parameter to that function is, but it seems to be a pointer to a struct of some sort.)

- S

-----Original Message-----
From: Eric Diven
Sent: Monday, September 29, 2008 12:52
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

NTFSD is sponsored by OSR

For our schedule debugging and file system seminars
(including our new fs mini-filter seminar) visit:
http://www.osr.com/seminars

You are currently subscribed to ntfsd as: unknown lmsubst tag argument: ‘’
To unsubscribe send a blank email to xxxxx@lists.osr.com

So what’s going on is that Filter Manager is processing an IRP and has finished calling all the minifilters’ pre-callbacks. It then checks whether any of them wanted to pend the operation (that’s the comparison with 103h, STATUS_PENDING), and if none did, it gets the current IO_STACK_LOCATION for the IRP and tests whether its Control member has the SL_PENDING_RETURNED flag set. So edi in [edi+3] points to the IO_STACK_LOCATION.
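A rough C rendering of the check described above, as a user-mode sketch. It assumes, following the reading in this thread, that [esi+4] holds the IRP and that [eax+60h] is the IRP’s current stack location pointer. The Mock* types, the CheckResult enum, and pending_returned_check are hypothetical names for illustration, not Filter Manager source, and only the fields at the offsets the listing touches are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the WDK definitions. */
#define MOCK_STATUS_PENDING      0x103u /* the 103h compared against ebx */
#define MOCK_SL_PENDING_RETURNED 0x01u  /* bit tested by "test byte ptr [edi+3],1" */

typedef struct MockIoStackLocation {
    uint8_t MajorFunction; /* [edi+0], loaded into cl */
    uint8_t MinorFunction; /* [edi+1], loaded into dl */
    uint8_t Flags;         /* [edi+2] */
    uint8_t Control;       /* [edi+3], the faulting test */
} MockIoStackLocation;

typedef struct MockIrp {
    /* Loaded by "mov edi,dword ptr [eax+60h]" in the listing. */
    MockIoStackLocation *CurrentStackLocation;
} MockIrp;

typedef enum {
    CHECK_SKIPPED_PENDED, /* status == 103h: the je at +0x3f skips the test */
    CHECK_NOT_SET,
    CHECK_SET,
    CHECK_WOULD_FAULT     /* stack location pointer is NULL, as edi was */
} CheckResult;

CheckResult pending_returned_check(uint32_t precallback_status, const MockIrp *irp)
{
    assert(irp != NULL);
    if (precallback_status == MOCK_STATUS_PENDING) {
        return CHECK_SKIPPED_PENDED;
    }
    const MockIoStackLocation *sl = irp->CurrentStackLocation;
    if (sl == NULL) {
        /* fltmgr has no such guard: it dereferences [edi+3] directly. */
        return CHECK_WOULD_FAULT;
    }
    return (sl->Control & MOCK_SL_PENDING_RETURNED) ? CHECK_SET : CHECK_NOT_SET;
}
```

With a NULL stack location pointer, the real code has no guard and reads address 0x00000003, which matches the ds:0023:00000003=?? line in the trap frame.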

I don’t know what’s going on in there but I hope it’ll help you figure it out.

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.

Okay, some pieces might be starting to fall into place:

This happens when an IRP gets cancelled and then the filter’s
communication port gets disconnected (because the service terminates
for one reason or another; I’ve been beating some bugs out of it). The
other symptom I’ve been seeing is that when the disconnect happens and
the FltCbdq queues get flushed, the flush routine bugchecks with an
unhandled STATUS_ACCESS_VIOLATION.

In the flush routine, the bugcheck happens when I test cbd->Flags
against FLTFL_CALLBACK_DATA_POST_OPERATION to determine which of
FltCompletePendedPreOperation/FltCompletePendedPostOperation I need to
call.
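The flag test described above is a two-way dispatch. A minimal user-mode sketch, assuming a hypothetical MOCK_FLAG_POST_OPERATION constant in place of FLTFL_CALLBACK_DATA_POST_OPERATION (its real value is deliberately not reproduced) and an enum in place of the two FltCompletePended* calls. The point it illustrates: reading data->Flags is the first touch of the callback data, so if another path has already completed the request, this read may be exactly where a stale pointer faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FLTFL_CALLBACK_DATA_POST_OPERATION. */
#define MOCK_FLAG_POST_OPERATION 0x1u

typedef struct MockCallbackData {
    uint32_t Flags;
} MockCallbackData;

typedef enum {
    COMPLETE_AS_PRE_OPERATION,  /* would call FltCompletePendedPreOperation */
    COMPLETE_AS_POST_OPERATION  /* would call FltCompletePendedPostOperation */
} CompletionKind;

/* Chooses the completion routine for a request pulled off the queue.
 * The Flags read dereferences the callback data; on an already-completed
 * request that memory may no longer be valid. */
CompletionKind choose_completion(const MockCallbackData *data)
{
    assert(data != NULL);
    return (data->Flags & MOCK_FLAG_POST_OPERATION)
               ? COMPLETE_AS_POST_OPERATION
               : COMPLETE_AS_PRE_OPERATION;
}
```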

In my pre-callback, I’m putting the IRP in a FltCbdq, then calling
FltSendMessage to send a message to my service. If that fails, it will
try to pull the IRP out of the FltCbdq, set an error status in
IoStatus.Status, and return FLT_PREOP_SUCCESS_NO_CALLBACK. If things
are okay, it’ll pend the IRP.

I’m wondering if *BOTH* the flush routine and the fltmgr code are trying
to complete the IRP, and who wins the race determines where I see the
bugcheck. Does this sound plausible?
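The double-completion hypothesis can be modelled with a single queued entry whose removal succeeds for only one caller. This is a user-mode sketch with hypothetical names (MockPendedRequest, mock_remove, mock_flush, and the two failure-path variants), not the FltCbdq API. It shows that if the flush runs between sending the message and the failure path’s removal, a failure path that reports a final status regardless of whether its own removal succeeded ends up with the request completed twice. The "checked" variant assumes that when removal fails, another path owns completion; in minifilter terms that would presumably mean returning FLT_PREOP_PENDING, which is worth confirming against the FltCbdq documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-entry model of a pended request in a cancel-safe queue. */
typedef struct MockPendedRequest {
    bool in_queue;   /* still owned by the queue */
    int completions; /* how many paths completed it */
} MockPendedRequest;

/* Removal succeeds for exactly one caller; that caller owns completion. */
static bool mock_remove(MockPendedRequest *r)
{
    if (!r->in_queue) {
        return false;
    }
    r->in_queue = false;
    return true;
}

/* Flush on port disconnect: completes only what it actually removed. */
void mock_flush(MockPendedRequest *r)
{
    if (mock_remove(r)) {
        r->completions++;
    }
}

/* Failure path as described in the thread: a final status goes back to
 * fltmgr even when this path's removal failed. */
void mock_preop_failure_unchecked(MockPendedRequest *r)
{
    (void)mock_remove(r);
    r->completions++;
}

/* Ownership-respecting variant: complete only if this path removed it. */
void mock_preop_failure_checked(MockPendedRequest *r)
{
    if (mock_remove(r)) {
        r->completions++;
    }
}
```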

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexandru Carp
Sent: Monday, September 29, 2008 7:19 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Hi Eric,

“If that fails, it will try to pull the IRP out of the FltCbdq, set an error status in IoStatus.Status, and return FLT_PREOP_SUCCESS_NO_CALLBACK.”

Actually, that doesn’t do much. If you want to fail the operation you should return FLT_PREOP_COMPLETE after you set the error in IoStatus.Status.
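Alex’s point can be sketched as the decision at the end of the pre-operation callback. This is a user-mode model with hypothetical names (MockPreopStatus mirrors the FLT_PREOP_* names used in this thread, but the enum values are not the fltKernel.h ones; finish_preop and send_succeeded are illustrative). It assumes the failure path’s removal from the queue already succeeded. Setting the status and then returning SUCCESS_NO_CALLBACK just lets the operation continue down the stack; to actually fail it, the callback returns COMPLETE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the FLT_PREOP_CALLBACK_STATUS names. */
typedef enum {
    MOCK_FLT_PREOP_SUCCESS_NO_CALLBACK,
    MOCK_FLT_PREOP_PENDING,
    MOCK_FLT_PREOP_COMPLETE
} MockPreopStatus;

typedef struct MockIoStatus {
    uint32_t Status;
} MockIoStatus;

#define MOCK_STATUS_SUCCESS      0x00000000u
#define MOCK_STATUS_UNSUCCESSFUL 0xC0000001u /* STATUS_UNSUCCESSFUL */

/* Called after the message to the service was attempted. Assumes that on
 * failure the request was already pulled back out of the queue. */
MockPreopStatus finish_preop(bool send_succeeded, MockIoStatus *io_status)
{
    assert(io_status != NULL);
    if (send_succeeded) {
        return MOCK_FLT_PREOP_PENDING;  /* the service's reply completes it */
    }
    io_status->Status = MOCK_STATUS_UNSUCCESSFUL;
    return MOCK_FLT_PREOP_COMPLETE;     /* fail the operation here */
}
```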

Thanks,
Alex.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Eric Diven
Sent: Monday, September 29, 2008 4:51 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Wow, another -5 to me on this one. Thanks for the good catch and for
looking into this. That would have been in the code for years. Would
fixing that also close off any possible race, since no further
processing by any minifilters or legacy filters below fltmgr would be
necessary? I.e., would fltmgr still be checking the flags in the
IO_STACK_LOCATION?

I’ve just now gotten access back to the machine I’ve been testing on.
I’ll get some more diagnostics in place and see if I can duplicate the
issue.

Thanks again,

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexandru Carp
Sent: Monday, September 29, 2008 7:59 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Well, I think you’re right about what actually happens. Sometimes an IRP might be “completed” twice, once by the file system and once when it is removed from the queue and based on which was last you would see a different bugcheck.

However, I don’t think the FLT_PREOP_COMPLETE change would fix the bug. I mean, all that happens is that the IRP is completing sooner now… So the bug might happen less frequently but it should still happen…

I’m still not clear on why the IRP is completed twice.
But I guess the main question I have is why you queue the IRP before calling the user-mode service. Does your user-mode service need to use the queue? If so, I would guess you need some synchronization there…

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Eric Diven
Sent: Monday, September 29, 2008 5:07 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Okay, good point about FLT_PREOP_COMPLETE.

I’m not sure I can close the opportunity for the race completely. I
think I’m going to sleep on it.

Thanks for the help, this has been fascinating.

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexandru Carp
Sent: Monday, September 29, 2008 8:33 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Okay, the answer to the main question is that the service does some
processing on a file and then messages the driver back. When it does,
the driver removes the IRP from the queue and completes it, using the
queue context the service includes in the message. That’s why the
driver has to queue the IRP before it sends the message. If there’s a
better way of doing this, I’d definitely entertain making a change.

Generally, though, I’m not sure a driver can synchronize this kind of
access to the queue by itself. After all, even if releasing your lock
is the last thing you do before returning to fltmgr, another thread
could still poach the IRP from the queue and complete it if the
scheduler switches out the queueing thread and switches in the
de-queueing thread before fltmgr finishes its business.

Specific to my situation, I think I can reduce the chances of this
happening by putting a sleep in my flush routine between disabling the
queue and flushing the IRPs. That should help ensure that anybody else
who is still using the IRP is clear of it before it gets completed.

Thanks,

~Eric

I don’t know enough about the code to be able to make any useful suggestion. I mean, the algorithm looks pretty simple:

  1. queue IRP
  2. call user mode
  3. if failure -> remove IRP and complete it
  4. else -> pend (IRP to be removed from the list and completed when the UM service is done)

But it seems there are some IRPs being completed twice. Either there is something else in the design that I’m missing or the code doesn’t properly reflect the logic.

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.

Right, that’s what it’s doing, as described earlier in the thread. After
stewing over this some more, I think you’re right that I can synchronize
around steps 1-3, since that would ensure that if the cleanup callback
gets called between 2 and 3, it waits until after the IRP is removed
from the queue before doing anything. That prevents the cleanup callback
from removing the IRP from the queue and completing it, and then having
the pre-op callback fail to remove the IRP in step 3 and return a
failure, which causes the bugcheck in fltmgr.sys.
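A user-mode sketch of synchronizing steps 1-3 against the cleanup path, under stated assumptions: the atomic_flag spin lock only stands in for whatever lock the driver would use (in a minifilter it would have to be one that can be held at PASSIVE_LEVEL across FltSendMessage, such as a mutex, not a spin lock), and MockFilterState, mock_preop, mock_flush, and the send_ok parameter are hypothetical. The model also ignores whether holding a lock across the send could deadlock with the port-disconnect callback; that would need checking in the real driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct MockFilterState {
    atomic_flag lock; /* guards steps 1-3 and the flush */
    bool in_queue;
    int completions;
} MockFilterState;

static void lock_acquire(MockFilterState *s)
{
    while (atomic_flag_test_and_set(&s->lock)) {
        /* spin until the holder releases */
    }
}

static void lock_release(MockFilterState *s)
{
    atomic_flag_clear(&s->lock);
}

/* Steps 1-3 under one lock; returns true if the request was left pended. */
bool mock_preop(MockFilterState *s, bool send_ok)
{
    bool pended;
    lock_acquire(s);
    s->in_queue = true;        /* 1. queue the request */
    if (send_ok) {             /* 2. message the service */
        pended = true;         /* 4. left for the service's reply */
    } else {
        s->in_queue = false;   /* 3. pull it back out... */
        s->completions++;      /*    ...and fail it */
        pended = false;
    }
    lock_release(s);
    return pended;
}

/* Cleanup/flush takes the same lock, so it runs before step 1 or after
 * step 3, never in the window between steps 2 and 3. */
void mock_flush(MockFilterState *s)
{
    lock_acquire(s);
    if (s->in_queue) {
        s->in_queue = false;
        s->completions++;
    }
    lock_release(s);
}
```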

Thanks, this has been a fun one to puzzle through.

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alexandru Carp
Sent: Tuesday, September 30, 2008 3:47 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Oh, that’s nasty. To test that my handling of the failure case was
correct, I took out the test for failure, so the callback always removed
the IRP and returned FLT_PREOP_COMPLETE. I did NOT, however, take out
the FltSendMessage, so the service did its thing and then phoned back in
when it was done.

When I called FltCbdqRemoveIo with the
PFLT_CALLBACK_DATA_QUEUE_IO_CONTEXT the service sent back,
FltCbdqRemoveIo bugchecked with a memory access error because fltmgr had
long since completed the IRP. The Irp field in the CbdqIoContext was no
longer valid.

I understand why it’s like this having now looked through the pieces,
but it’s certainly something worth being aware of.

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Eric Diven
Sent: Tuesday, September 30, 2008 5:33 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Bugcheck in fltmgr.sys

Heh, nice one! Glad you could figure it out.

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.