bridge.sys crash

Sorry for the partial repost from the windbg section but this has moved on from my original posting…

I have been looking at a crash reported against bridge.sys. It occurs on the second of the following two instructions (sorry, there are no public symbols for bridge.sys):

bridge+0x80b4:
fffff807`6fc780b4 e8d3efffff      call    bridge+0x708c (fffff807`6fc7708c)
fffff807`6fc780b9 4b8b4c3420      mov     rcx,qword ptr [r12+r14+20h]

During this sequence RSI contains the NBL and R14 the context. The crash occurs on the second instruction because the call on the first one frees the context in R14.
If I run this with our driver installed and set a breakpoint on the second instruction, we see:

Breakpoint 0 hit

bridge+0x80b9:
fffff802`740a80b9 4b8b4c3420      mov     rcx,qword ptr [r12+r14+20h]
kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=ffffb802199a82a0 rdi=ffffb8021d183010
rip=fffff802740a80b9 rsp=ffffd9837517fcf0 rbp=ffffd9837517fe00
 r8=0000000000000001  r9=0000000000000001 r10=0000000000000002
r11=ffffd9837517fb90 r12=0000000000000000 r13=0000000000000000
r14=ffffb8021d9d7350 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
bridge+0x80b9:
fffff802`740a80b9 4b8b4c3420      mov     rcx,qword ptr [r12+r14+20h] ds:002b:ffffb802`1d9d7370=0000000000000000

kd> !pool @r14 2
Pool page ffffb8021d9d7350 region is Nonpaged pool
*ffffb8021d9d7340 size:   40 previous size:    0  (Free)      *Brdg
		Owning component : Unknown (update pooltag.txt)

So the next instruction attempts to dereference the context in r14 that was just freed.
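To make the arithmetic explicit, here is a small sketch that recomputes the effective address of the faulting read from the register dump above and checks that it falls inside the freed Brdg block. The helper names are mine, and it assumes the size reported by !pool is in hex bytes and includes the pool header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address of: mov rcx, qword ptr [r12 + r14 + 20h] */
static uint64_t effective_address(uint64_t r12, uint64_t r14, uint64_t disp)
{
    return r12 + r14 + disp;
}

/* True if an 8-byte read at addr lies entirely inside the block
 * [block_start, block_start + block_size). Hypothetical helper. */
static bool qword_read_in_block(uint64_t addr, uint64_t block_start,
                                uint64_t block_size)
{
    assert(block_size >= 8);
    return addr >= block_start && addr - block_start <= block_size - 8;
}
```

With r12 = 0, r14 = ffffb8021d9d7350 and a displacement of 20h this gives ffffb8021d9d7370, which matches the address the debugger shows for the mov, and that address sits inside the 40h-byte block starting at ffffb8021d9d7340 that !pool reports as Free.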

Since I can reproduce this, I looked at what happens when I uninstall our product, and bridge.sys follows exactly the same sequence: it still reads from the context after the call that frees it.

If I break on the first instruction, prior to the call that frees the context, we see…

Breakpoint 1 hit
bridge+0x80b4:
fffff807`6fc780b4 e8d3efffff      call    bridge+0x708c (fffff807`6fc7708c)

kd> dt _net_buffer_list Context @rsi
ndis!_NET_BUFFER_LIST
 _NET_BUFFER_LIST
   +0x010 Context : 0xffffae88`c0ffd9d0 _NET_BUFFER_LIST_CONTEXT

kd> !nbl @rsi
    NBL                ffffae88bbc109f0    Next NBL           NULL
    First NB           ffffae88bbc10b70    Source             ffffae88bb0da1a0 - Intel(R) PRO/1000 MT Desktop Adapter
    Context stack      ffffae88c0ffd9d0    Pool               ffffae88bb4c3400 - 
    Flags              INDICATED, NBL_ALLOCATED, PROTOCOL_020_0

    Walk the NBL chain                     Dump data payload
    Show out-of-band information           Show in Microsoft Network Monitor

If we dump the NBL information we see that NetBufferListInfo[28] is null

!ndiskd.nbl ffffae88bbc109f0 -info
…
Info[0n28]                             NULL

And as previously discussed the context is stored in r14

!pool @r14 2
Pool page ffffae88c0ffd9d0 region is Nonpaged pool
*ffffae88c0ffd9c0 size:   40 previous size:    0  (Allocated) *Brdg
		Owning component : Unknown (update pooltag.txt)

Step over onto the second line and we now see

bridge+0x80b9:
fffff807`6fc780b9 4b8b4c3420      mov     rcx,qword ptr [r12+r14+20h]

kd> dt _net_buffer_list Context @rsi
ndis!_NET_BUFFER_LIST
 _NET_BUFFER_LIST
   +0x010 Context : (null)

Now if we dump the nbl we no longer see the context stack

kd> !ndiskd.nbl ffffae88bbc109f0
    NBL                ffffae88bbc109f0    Next NBL           NULL
    First NB           ffffae88bbc10b70    Source             ffffae88bb0da1a0 - Intel(R) PRO/1000 MT Desktop Adapter
                                           Pool               ffffae88bb4c3400 - 
    Flags              INDICATED, RETURNED, NBL_ALLOCATED, PROTOCOL_020_0,
                       PROTOCOL_200_0

    Walk the NBL chain                     Dump data payload
    Show out-of-band information           Show in Microsoft Network Monitor

And NetBufferListInfo[28] now points to the freed context

!ndiskd.nbl ffffae88bbc109f0 -info
…
Info[0n28] ffffae88c0ffd9d0

The memory is still showing as allocated

kd> !pool @r14 2
Pool page ffffae88c0ffd9d0 region is Nonpaged pool
*ffffae88c0ffd9c0 size:   40 previous size:    0  (Allocated) *Brdg
		Owning component : Unknown (update pooltag.txt)

The reason for this is the following code from NdisFreeNetBufferListContext (the nbl is in rbx), where the context is soft-freed (i.e. cached in the nbl):

fffff802`6e5cce6a 410fb74808      movzx   ecx,word ptr [r8+8]      ; ecx = context->size (0x20)
fffff802`6e5cce97 0fb7433a        movzx   eax,word ptr [rbx+3Ah]   ; eax = nbl->ndisreserved2: 80 in bad case (60 for context, 20 for bridge), 20 in good case
fffff802`6e5cce9b 2bc2            sub     eax,edx                  ; edx (context->offset) = 0, so eax is still 80 in the bad case
fffff802`6e5cce9d 3bc8            cmp     ecx,eax                  ; ecx = context->size == 20
--------------------------------------------------------------------------------
fffff802`6e5cce9f 7c09            jl      ndis!NdisFreeNetBufferListContext+0x9a (fffff802`6e5cceaa)  Branch  ; branch to ExFreePool in bad case
--------------------------------------------------------------------------------
ndis!NdisFreeNetBufferListContext+0x91:
fffff802`6e5ccea1 4c898370010000  mov     qword ptr [rbx+170h],r8  ; cache context in nbl+170, i.e. net_buffer_list_info[0n28]
fffff802`6e5ccea8 eb40            jmp     ndis!NdisFreeNetBufferListContext+0xda (fffff802`6e5cceea)  Branch  ; and jump to end

When our driver is added, though, our context has already been ‘soft’ freed and cached in net_buffer_list_info[0n28], so the bridge.sys context free is the final block and the entire context is hard freed (i.e. by calling ExFreePool).
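For anyone who finds the disassembly hard to follow, this is roughly how I read the branch, written as a user-mode C model. It is only a sketch reconstructed from the instructions quoted above: the struct and field names are hypothetical stand-ins, not the real NDIS definitions, and the hard-free path just returns a value instead of calling ExFreePool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these names are NOT taken from the NDIS headers. */
typedef struct ctx_model {
    uint16_t size;    /* word read from [r8+8] */
    uint16_t offset;  /* value held in edx     */
} ctx_model;

typedef struct nbl_model {
    uint16_t reserved_3a;   /* word read from [rbx+3Ah]                  */
    ctx_model *cached_ctx;  /* qword at [rbx+170h], shown as Info[0n28]  */
} nbl_model;

typedef enum { SOFT_FREED, HARD_FREED } free_kind;

/* Models only the compare-and-branch in the quoted instructions. */
static free_kind free_context_model(nbl_model *nbl, ctx_model *ctx)
{
    assert(nbl != NULL && ctx != NULL);
    int ecx = ctx->size;                        /* movzx ecx, word ptr [r8+8]    */
    int eax = nbl->reserved_3a - ctx->offset;   /* movzx eax, [rbx+3Ah]; sub edx */
    if (ecx < eax) {                            /* cmp ecx, eax; jl              */
        return HARD_FREED;                      /* real code takes the ExFreePool path */
    }
    nbl->cached_ctx = ctx;                      /* mov qword ptr [rbx+170h], r8  */
    return SOFT_FREED;
}
```

Plugging in the values from the comments: with the word at +3Ah equal to 0x20 (good case) the 0x20-byte context is cached, and with it equal to 0x80 (bad case) the branch is taken and the context is really freed.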

Ultimately, though, the issue appears to be that bridge.sys uses a context after freeing it. Has anybody else come across this before?

Cheers

Mark