Hello Everyone
I have a bug check (alas, at customer) in the network stack, which is easily
identified as a stack overflow, but the stack is something the like of which
I’ve never encountered before in his space. I was wondering if any of the
NDIS gurus here have seen the like and would be kind enough to show some
signposts.
To set the stage a little, the adapter is an HP NC 375i (quad port, as it
happens) which is NetXen/QLogin chipset. The windows version is Server 2008
R2. The NDIS 6 adapter miniport driver is hpnd6x64, above which we have
(mine) NDIS 5 IM Filter NfPktFltr (child of passthru). We also have a couple
of NDIS 6 LWF in the picture - wfplwf and pacer.
So what happened here?
Well, (mine) NDIS 5 IM filter NfPktFltr QueryInformationHandler() has been
called, and has returned, after calling NDisRequest() as usual to pass the
request on down to the lower miniport. The request seems to have been given
to a system worker thread, and the errant stack is that of the system worker
thread. The business starts with ndisDoOidRequests(), which calls
ndisMDoOidRequest(), which in turn calls NDIS 6 MP hpnd6x64
OidRequestHandler(). Everything as expected up to here.
Now NDIS 6 MP hpnd6x64 OidRequestHandler() calls NdisMOidRequestComplete(),
which calls ndisMOidRequestCompleteInternal(). Everything as expected up to
here. The surprise is that ndisMOidRequestCompleteInternal() then calls back
into ndisMDoOidRequest(), which in turn calls back into NDIS 6 MP hpnd6x64
OidRequestHandler(). This mutual recursion happens more than 50 times, and
thread stack space is getting short now, eventually
ndisMOidRequestCompleteInternal() calls ndisOidRequestComplete(), which
calls ndisCompleteOidRequestToRequest(), which in turn calls into NDIS 5 IM
NfPktFltr RequestCompleteHandler(). So NDIS 5 IM NfPktFltr
RequestCompleteHandler() calls NdisMQueryInformationComplete() and a couple
of frames later ndisOidRequestComplete() calls KeSetEvent(). Everything
normal at the end here, except of course the stack overflow.
The part here of which I haven’t encountered the like in this space, well of
course it’s the mutual recursion around the loop NDIS 6 MP
OidRequestHandler() – NdisMOidRequestComplete() –
ndisMOidRequestCompleteInternal() – ndisMDoOidRequest() - NDIS 6 MP
OidRequestHandler(). If anyone has seen this before, can offer any ideas,
I’d be extremely grateful for signposts.
I’ve put the annotated stack trace below - compressed so as not to show the
same information 50+ times over
Thanks,
Lyndon
8: kd> kv nf
Memory Child-SP RetAddr : Args to Child
: Call Site
00 fffff8800211bfe8 fffff800
0166fca9 : 000000000000007f 00000000
00000008 0000000080050033 00000000
000006f8 : nt!KeBugCheckEx
01 8 fffff8800211bff0 fffff800
0166e172 : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 :
nt!KiBugCheckDispatch+0x69
02 140 fffff8800211c130 fffff800
0167486c : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 :
nt!KiDoubleFaultAbort+0xb2 (TrapFrame @ fffff880`0211c130)
** Stack overflow
03 5fbed0 fffff88002718000 fffff880
00e02fef : 0000000000000000 fffff880
00000000 fffff88000e63700 fffffa80
07f22168 : nt!KeSetEvent+0x1dc
04 70 fffff88002718070 fffff880
00e4d91c : fffff88002115101 00000000
00000000 fffff8800c9339e0 00000000
00000000 :
NDIS!ndisOidRequestComplete+0x44f
05 90 fffff88002718100 fffff880
00e4db9e : fffffa8008a669a0 00000000
00000000 fffffa800e2e11a0 fffffa80
00000230 :
NDIS!ndisMOidRequestCompleteInternal+0xdc
06 80 fffff88002718180 fffff880
00e9a629 : fffffa800e2e11a0 fffffa80
1235f010 fffff88000e63110 00000000
00000100 :
NDIS!ndisCompleteLegacyRequest+0x10e
07 60 fffff880027181e0 fffff880
04588d40 : 0000000000000000 00000000
00000000 0000000000000000 fffffa80
1235f010 :
NDIS!NdisMQueryInformationComplete+0xa9
*** NfPktFltr!RequestCompleteHandler called
NdisMQueryInformationComplete therefore this was not our own
IM-generated NDisRequest, we must have been in our miniport edge
QueryInformationHandler.
08 40 fffff88002718220 fffff880
00e8a98b : fffffa8008cdca00 fffff800
016b9fdc 000000006572444e fffffa80
00000230 :
NFPktfltr!RequestCompleteHandler+0x18c
** NfPktFltr!RequestCompleteHandler called therefore NfPktFltr called
NDisRequest
09 30 fffff88002718250 fffff880
00e02f33 : fffffa800e24b1a0 00000000
00000000 fffffa801235f000 fffffa80
0e24b1a0 :
NDIS!ndisCompleteOidRequestToRequest+0x5b
0a 30 fffff88002718280 fffff880
00e4d91c : fffffa8008cdca00 00000000
00000000 fffffa8008cdca30 00000000
00000000 :
NDIS!ndisOidRequestComplete+0x393
0b 90 fffff88002718310 fffff880
00e9a6de : fffffa8008cdca30 fffff880
00e63110 0000000000000000 fffffa80
08cdca30 :
NDIS!ndisMOidRequestCompleteInternal+0xdc
0c 80 fffff88002718390 fffff880
03c65899 : fffffa800e33e000 fffffa80
08cdca78 fffffa8008cdca30 fffffa80
00000000 :
NDIS!NdisMOidRequestComplete+0x6e
0d 40 fffff880027183d0 fffff880
00e8792c : fffffa800e24b1a0 fffffa80
08cdca78 0000000000000000 fffffa80
0e24b1a0 : hpnd6x64+0x29899
** The address hpnd6x64+0x29899 is offset 0x1e1 from the start of the
hpnd6x64 OidRequestHandler address displayed by ndiskd. The
previous instruction was a call through pointer to
NDIS!NdisMOidRequestComplete as expected. So this looks like it’s
part of the OidRequestHandler of hpnd6x64.
0e 30 fffff88002718400 fffff880
00e4d9a6 : 0000000000000000 00000000
00000000 0000000000000000 fffffa80
0e24b1a0 :
NDIS!ndisMDoOidRequest+0x26c
0f b0 fffff880027184b0 fffff880
00e9a6de : fffffa8008cdca30 fffff880
00e63110 0000000000000000 fffffa80
08cdca30 :
NDIS!ndisMOidRequestCompleteInternal+0x166
10 80 fffff88002718530 fffff880
03c65899 : fffffa800e33e000 fffffa80
08cdca78 fffffa8008cdca30 fffffa80
00000000 :
NDIS!NdisMOidRequestComplete+0x6e
11 40 fffff88002718570 fffff880
00e8792c : fffffa800e24b1a0 fffffa80
08cdca78 0000000000000000 fffffa80
0e24b1a0 : hpnd6x64+0x29899
** The above four frame sequence repeats 52 more times (of course the stack
pointer progresses in direction of overflow).
e2 30 fffff8800271da20 fffff880
00e4d9a6 : 0000000000000000 00000000
00000000 0000000000000000 fffffa80
0e24b1a0 :
NDIS!ndisMDoOidRequest+0x26c
e3 b0 fffff8800271dad0 fffff880
00e9a6de : fffffa8008cdca30 fffff880
00e63110 0000000000000000 fffffa80
08cdca30 :
NDIS!ndisMOidRequestCompleteInternal+0x166
e4 80 fffff8800271db50 fffff880
03c65899 : fffffa800e33e000 fffffa80
08cdca78 fffffa8008cdca30 fffffa80
00000000 :
NDIS!NdisMOidRequestComplete+0x6e
e5 40 fffff8800271db90 fffff880
00e8792c : fffffa800e24b1a0 fffffa80
08cdca78 0000000000000000 fffffa80
0e24b1a0 : hpnd6x64+0x29899
*** Going by name we expect ndisMDoOidRequest to call the
OidRequestHandler of an NDIS 6 miniport (or the
QueryInformationHandler of an NDIS 5 miniport, and the
OidRequestHandler of an NDIS 6 LWF). Since hpnd6x64 is an NDIS 6
miniport we expect the OidRequestHandler to be called here.
e6 30 fffff8800271dbc0 fffff880
00e0745d : 0000000000000000 fffff800
018155a0 fffff88000e63110 fffffa80
0715dcb0 :
NDIS!ndisMDoOidRequest+0x26c
e7 b0 fffff8800271dc70 fffff800
0167d961 : fffff88000e073d0 fffffa80
06898680 fffffa8007cf98c0 00000000
00000002 :
NDIS!ndisDoOidRequests+0x8d
** Here’s a worker thread running some NDis internal … something to do
with Oid Request-s we suppose?
e8 40 fffff8800271dcb0 fffff800
01914c06 : 0000000000000000 fffffa80
06898680 0000000000000080 fffffa80
0681a450 :
nt!ExpWorkerThread+0x111
e9 90 fffff8800271dd40 fffff800
0164ec26 : fffff880020a4180 fffffa80
06898680 fffff880020af2c0 00000000
00000000 :
nt!PspSystemThreadStartup+0x5a
ea 40 fffff8800271dd80 00000000
00000000 : fffff8800271e000 fffff880
02718000 fffff880027180e0 00000000
00000000 :
nt!KxStartSystemThread+0x16