We recently came across what appears to be a rather interesting race
condition within the I/O Manager when using associated IRPs and I am
wondering if others in the community have seen this problem before, since it
is always best to obtain independent confirmation.
The particulars:
NTFS uses associated IRPs when it needs to break a single (larger) I/O up
into a set of smaller I/O operations. This occurs, for example, when the
file is fragmented and a single I/O operation straddles disjoint fragments.
Thus, it constructs a set of two or more associated IRPs.
As these associated IRPs complete, the I/O Manager decrements the associated
IRP count stored in the master IRP, and when that count drops to zero the
I/O Manager completes the master IRP.
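To make the counting scheme concrete, here is a minimal user-mode sketch in C11. It is not the I/O Manager's code: the MasterIrp struct, its fields, and complete_associated() are hypothetical stand-ins, and atomic_fetch_sub stands in for the kernel's interlocked primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a master IRP; NOT the real IRP layout. */
typedef struct MasterIrp {
    atomic_uint associated_count;  /* associated IRPs still outstanding */
    bool completed;                /* set once the master is completed */
} MasterIrp;

/* Called once for each associated IRP that completes.
 * Returns true if this call dropped the count to zero and therefore
 * completed the master. */
static bool complete_associated(MasterIrp *master)
{
    /* atomic_fetch_sub returns the value held BEFORE the subtraction. */
    unsigned old = atomic_fetch_sub(&master->associated_count, 1u);
    assert(old > 0);

    if (old == 1) {
        /* Last one out: only this caller may still touch the master. */
        master->completed = true;
        return true;
    }
    /* Not the last one: the master may already be gone, so it is not
     * touched again on this path. */
    return false;
}
```

With the count initialized to 2, the first call returns false and the second returns true and marks the master completed.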
In some recent test runs, someone found a crash (actually several crashes)
when testing Exchange 2000 on top of an NTFS volume that was, in turn, using
a non-standard storage device (fast, but non-standard.) In analyzing this
crash I found the following code sequence leading up to the crash:
ntkrnlmp!IopfCompleteRequest+108
8041da28 8b7e0c mov edi,[esi+0xc]
8041da2b 6840c44680 push 0x8046c440
8041da30 83caff or edx,0xffffffff
8041da33 8d4f0c lea ecx,[edi+0xc]
8041da36 e8314e0400 call ntkrnlmp!ExfInterlockedAddUlong (8046286c)
<<<<<<<<<<<< Call preceding location of the crash.
8041da3b 8bd8 mov ebx,eax
8041da3d 8b4750 mov eax,[edi+0x50] <<<<<<<<<<<<<<<<<<< Location of the crash
The code leading up to the crash is decrementing the count in the master IRP
(that’s what is in the EDI register - the address of the master IRP.)
Initially I thought it was some sort of problem with the IRP, but the crash
didn’t occur when the interlocked add was performed (which is really a funky
interlocked decrement.)
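For anyone puzzled by the "or edx,0xffffffff" feeding an add routine: adding 0xffffffff to a 32-bit unsigned value wraps around and is the same as subtracting one. The sketch below shows that in user-mode C11. The function names are hypothetical, and atomic_fetch_add is only a stand-in for ExfInterlockedAddUlong (it takes no spin lock argument); like the kernel routine as described in this note, it hands back the value from before the add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* User-mode stand-in for an interlocked add: atomically adds `increment`
 * to *addend and returns the value held BEFORE the add. */
static uint32_t interlocked_add_u32(_Atomic uint32_t *addend,
                                    uint32_t increment)
{
    return atomic_fetch_add(addend, increment);
}

/* A decrement written as "add 0xffffffff". Unsigned arithmetic wraps
 * modulo 2^32, so this subtracts one from the count. */
static uint32_t interlocked_decrement_via_add(_Atomic uint32_t *count)
{
    return interlocked_add_u32(count, UINT32_C(0xffffffff));
}
```

Starting from a count of 2, interlocked_decrement_via_add() returns 2 (the old value) and leaves 1 in the count, which matches the 2 seen in EAX and EBX in the trap frame.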
The “mov ebx,eax” is just storing away the return value from
ExfInterlockedAddUlong (which is the OLD value of the Master IRP’s count
field.) The 0x50 appears to refer to Tail.Overlay.Thread (there are other
fields here, but they relate to the APC, and that doesn’t seem applicable
here.) The trap frame tells yet more of the story:
1: kd> .trap f7474c8c
eax=00000002 ebx=00000002 ecx=a5889e9c edx=8046c440 esi=85b24b28 edi=a5889e90
eip=8041da3d esp=f7474d00 ebp=85b24cb8 iopl=0 nv up ei ng nz na po nc vip=0 vif=0
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010286
ErrCode = 00000000
8041da3d 8b4750 mov eax,[edi+0x50]
The EAX register contains 2 - indicating that this decrement was NOT the
last one, it was the “next to the last one”. The EDI register points to
invalid memory at this point - even though EDI+0xC (which is on the same
page) was valid two instructions earlier. I poked at the other CPU and
noticed it was in the same general code path, which led me to wonder
whether this was a race condition. Presumably the other CPU performed the
final decrement, completed the master IRP and freed it in the window between
this CPU's decrement and its read of [edi+0x50]. All of the subsequent
analysis seems to bear this out - it is a VERY narrow race condition. I
checked XP and the same code sequence is still there, so the race is still
present in current code (unless it has been fixed very recently.)
It looks to me like the race is that the I/O Manager uses the master IRP
after it decrements the reference count. From a “strict” standpoint this
should be a no-no, since decrementing the reference count means you aren’t
going to use the pointer anymore. Thus, using the pointer AFTER
decrementing the reference count would seem to be a bona fide bug.
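The ordering I have in mind can be sketched in user-mode C11 as follows. This is an illustration of the pattern only, not the actual I/O Manager source: the Master struct, the Released struct and release_master_ref() are hypothetical names, the thread field merely stands in for whatever lives at offset 0x50, and I am assuming that field is not modified while associated IRPs are outstanding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the master IRP; NOT the real layout. */
typedef struct Master {
    atomic_uint count;   /* outstanding associated IRPs */
    void *thread;        /* stands in for Tail.Overlay.Thread */
} Master;

/* What the caller is allowed to know after dropping its reference. */
typedef struct Released {
    bool was_last;       /* true if this decrement took the count to zero */
    void *thread;        /* value captured while the reference was held */
} Released;

static Released release_master_ref(Master *master)
{
    Released r;

    /* Read everything needed from the master BEFORE the decrement,
     * while this caller's reference still keeps it alive. */
    r.thread = master->thread;

    unsigned old = atomic_fetch_sub(&master->count, 1u);
    assert(old > 0);
    r.was_last = (old == 1);

    /* From here on, unless was_last is true, `master` may already have
     * been completed and freed by another CPU and is not dereferenced. */
    return r;
}
```

The crashing sequence does the reverse: it decrements first and then reads [edi+0x50], so a caller that was not the last one can end up reading a master IRP that another CPU has already completed and freed.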
This shows up (of course) when running Driver Verifier, typically after
running more than 24 hours in this Exchange test scenario. I suspect that
the race condition is sufficiently rare that this bug would almost NEVER
show up with Driver Verifier OFF (since the freed memory would still be
“valid”.)
But, the point of this note is to ask if anyone else has seen this problem.
I’ve reported my suspicions to Microsoft (and I think the fix is really
simple - if you really need to use the master IRP, do so BEFORE the
decrement, not after it) but I’m looking for potential independent
confirmation.
Thanks!
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com