Sorry for the partial repost from the WinDbg section, but this has moved on from my original posting.
I have been looking at a crash reported against bridge.sys. The crash occurs on the second of the following two instructions (sorry, there are no public symbols for bridge.sys):
bridge+0x80b4:
fffff807`6fc780b4 e8d3efffff call bridge+0x708c (fffff807`6fc7708c)
fffff807`6fc780b9 4b8b4c3420 mov rcx,qword ptr [r12+r14+20h]
During this sequence RSI contains the NBL and R14 the context. The crash occurs on the second line because the call on the first line frees the context in R14.
If I run this with our driver installed and set a breakpoint on the second line, we see:
Breakpoint 0 hit
bridge+0x80b9:
fffff802`740a80b9 4b8b4c3420 mov rcx,qword ptr [r12+r14+20h]
kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=ffffb802199a82a0 rdi=ffffb8021d183010
rip=fffff802740a80b9 rsp=ffffd9837517fcf0 rbp=ffffd9837517fe00
r8=0000000000000001 r9=0000000000000001 r10=0000000000000002
r11=ffffd9837517fb90 r12=0000000000000000 r13=0000000000000000
r14=ffffb8021d9d7350 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
bridge+0x80b9:
fffff802`740a80b9 4b8b4c3420 mov rcx,qword ptr [r12+r14+20h] ds:002b:ffffb802`1d9d7370=0000000000000000
kd> !pool @r14 2
Pool page ffffb8021d9d7350 region is Nonpaged pool
*ffffb8021d9d7340 size: 40 previous size: 0 (Free) *Brdg
Owning component : Unknown (update pooltag.txt)
So the next instruction attempts to dereference the context in r14 that was just freed.
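To make the ordering problem concrete, here is a minimal user-mode sketch of what I think the two instructions amount to. Everything in it is hypothetical: there are no symbols for bridge.sys, so FakeContext, FakeBlock and the function names are made up, and the only things taken from the debugger output are the +20h field offset and the register values. The "free" only flips a flag so that the model itself never touches released memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: only the qword at +20h is known from the disassembly. */
typedef struct FakeContext {
    uint8_t  unknown[0x20];   /* contents not known */
    uint64_t field_at_20;     /* what `mov rcx,qword ptr [r12+r14+20h]` reads */
} FakeContext;

/* Simulated pool block; the flag stands in for (Allocated)/(Free) in !pool. */
typedef struct FakeBlock {
    int         allocated;
    FakeContext ctx;
} FakeBlock;

/* Stand-in for the call at bridge+0x80b4: it only marks the block as free. */
static void fake_free_context(FakeBlock *b)
{
    b->allocated = 0;
}

/* Order seen in the crash: free first, then read.
 * Returns -1 where the real code would be touching a freed block. */
static int free_then_read(FakeBlock *b, uint64_t *out)
{
    fake_free_context(b);
    if (!b->allocated)
        return -1;            /* model only: the real instruction has no such check */
    *out = b->ctx.field_at_20;
    return 0;
}

/* The order I would expect: read the field while the block is still owned. */
static int read_then_free(FakeBlock *b, uint64_t *out)
{
    *out = b->ctx.field_at_20;
    fake_free_context(b);
    return 0;
}

/* Effective address of [r12+r14+20h] for given register values. */
static uint64_t effective_address(uint64_t r12, uint64_t r14)
{
    return r12 + r14 + 0x20;
}
```

With r12 = 0 and r14 = ffffb8021d9d7350 from the register dump, effective_address gives ffffb8021d9d7370, which is the address the debugger shows for the faulting operand.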
Since I can reproduce this, I looked at what happens with our product uninstalled, and exactly the same instruction sequence runs. If I break on the first line, prior to the call that frees the context, we see:
Breakpoint 1 hit
bridge+0x80b4:
fffff807`6fc780b4 e8d3efffff call bridge+0x708c (fffff807`6fc7708c)
kd> dt _net_buffer_list Context @rsi
ndis!_NET_BUFFER_LIST
_NET_BUFFER_LIST
+0x010 Context : 0xffffae88`c0ffd9d0 _NET_BUFFER_LIST_CONTEXT
kd> !nbl @rsi
NBL ffffae88bbc109f0 Next NBL NULL
First NB ffffae88bbc10b70 Source ffffae88bb0da1a0 - Intel(R) PRO/1000 MT Desktop Adapter
Context stack ffffae88c0ffd9d0 Pool ffffae88bb4c3400 -
Flags INDICATED, NBL_ALLOCATED, PROTOCOL_020_0
Walk the NBL chain Dump data payload
Show out-of-band information Show in Microsoft Network Monitor
If we dump the NBL information we see that NetBufferListInfo[28] is null
!ndiskd.nbl ffffae88bbc109f0 -info
…
Info[0n28] NULL
And as previously discussed the context is stored in r14
!pool @r14 2
Pool page ffffae88c0ffd9d0 region is Nonpaged pool
*ffffae88c0ffd9c0 size: 40 previous size: 0 (Allocated) *Brdg
Owning component : Unknown (update pooltag.txt)
Step over onto the second line and we now see
bridge+0x80b9:
fffff807`6fc780b9 4b8b4c3420 mov rcx,qword ptr [r12+r14+20h]
kd> dt _net_buffer_list Context @rsi
ndis!_NET_BUFFER_LIST
_NET_BUFFER_LIST
+0x010 Context : (null)
Now if we dump the nbl we no longer see the context stack
kd> !ndiskd.nbl ffffae88bbc109f0
NBL ffffae88bbc109f0 Next NBL NULL
First NB ffffae88bbc10b70 Source ffffae88bb0da1a0 - Intel(R) PRO/1000 MT Desktop Adapter
Pool ffffae88bb4c3400 -
Flags INDICATED, RETURNED, NBL_ALLOCATED, PROTOCOL_020_0,
PROTOCOL_200_0
Walk the NBL chain Dump data payload
Show out-of-band information Show in Microsoft Network Monitor
And NetBufferListInfo[28] now points to the freed context
!ndiskd.nbl ffffae88bbc109f0 -info
…
Info[0n28] ffffae88c0ffd9d0
The memory is still showing as allocated
kd> !pool @r14 2
Pool page ffffae88c0ffd9d0 region is Nonpaged pool
*ffffae88c0ffd9c0 size: 40 previous size: 0 (Allocated) *Brdg
Owning component : Unknown (update pooltag.txt)
The reason for this is the following code from NdisFreeNetBufferListContext (the NBL is in rbx), where the context is soft-freed (i.e. cached in the NBL rather than returned to the pool):
fffff802`6e5cce6a 410fb74808 movzx ecx,word ptr [r8+8] ; ecx = context->size (0x20)
fffff802`6e5cce97 0fb7433a movzx eax,word ptr [rbx+3Ah] ; eax = nbl->ndisreserved2 (bad case: 80, i.e. 60 for context + 20 for bridge; good case: 20)
fffff802`6e5cce9b 2bc2 sub eax,edx ; rdx (context->offset) = 0 so rax still == 80
fffff802`6e5cce9d 3bc8 cmp ecx,eax ; ecx = context->size == 20
---------------------------------------------------------------------------------------------------------------------------------------------------------
fffff802`6e5cce9f 7c09 jl ndis!NdisFreeNetBufferListContext+0x9a (fffff802`6e5cceaa) Branch ; Branch to ExFreePool in bad case
----------------------------------------------------------------------------------------------------------------------------------------------------------
ndis!NdisFreeNetBufferListContext+0x91:
fffff802`6e5ccea1 4c898370010000 mov qword ptr [rbx+170h],r8 ; cache context in nbl +170 ; net_buffer_list_info[0n28]
fffff802`6e5ccea8 eb40 jmp ndis!NdisFreeNetBufferListContext+0xda (fffff802`6e5cceea) Branch ; and jump to end..
With our driver installed, though, our context has already been ‘soft’ freed and cached in net_buffer_list_info[0n28], so the bridge.sys context free is the final block and the entire context is hard freed (i.e. by calling ExFreePool).
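For anyone who prefers C to disassembly, this is my reading of the compare-and-branch above as a small model. It is only a sketch of the few instructions I quoted, not the real NdisFreeNetBufferListContext: the struct and function names are invented, the word at nbl+3Ah is modelled as a plain field, and the instructions I skipped between +6a and +97 are not represented. The hex values are the ones from my comments in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { INFO_SLOT = 28 };            /* Info[0n28] in !ndiskd.nbl -info */

typedef enum { SOFT_FREE_CACHED, HARD_FREE_EXFREEPOOL } FreeKind;

/* Hypothetical stand-in for the context header fields used by the check. */
typedef struct ModelContext {
    uint16_t size;                  /* word at [r8+8] */
    uint16_t offset;                /* value held in edx */
} ModelContext;

/* Hypothetical stand-in for the parts of the NBL used by the check. */
typedef struct ModelNbl {
    uint16_t            reserved_3a;          /* word at [rbx+3Ah] */
    const ModelContext *info[INFO_SLOT + 1];  /* just enough slots to reach 28 */
} ModelNbl;

/* Mirrors: movzx ecx / movzx eax / sub eax,edx / cmp ecx,eax / jl. */
static FreeKind model_free_context(ModelNbl *nbl, const ModelContext *ctx)
{
    int ecx = ctx->size;
    int eax = (int)nbl->reserved_3a - (int)ctx->offset;

    if (ecx < eax)                  /* jl taken: falls into the ExFreePool path */
        return HARD_FREE_EXFREEPOOL;

    nbl->info[INFO_SLOT] = ctx;     /* mov qword ptr [rbx+170h],r8 */
    return SOFT_FREE_CACHED;
}

/* If +170h is slot 28 and slots are 8-byte pointers, where does the array start? */
static unsigned info_array_base(unsigned slot_offset, unsigned slot)
{
    return slot_offset - slot * 8u;
}
```

Feeding in size = 20h, offset = 0 and a reserved value of 20h takes the caching path, while the same context with a reserved value of 80h takes the ExFreePool path, which matches the good and bad cases above. info_array_base(0x170, 28) comes out as 90h, which is consistent with +170h being slot 28 of the info array.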
Ultimately, though, the issue appears to be that bridge.sys uses a context after freeing it. Has anybody else come across this before?
Cheers
Mark