Not receiving (subscribed ethertype) NBL in protocol driver (even after miniport indicate)

All,
I have an NDIS 6.x miniport and my own protocol driver.
I see my miniport calling NdisMIndicateReceiveNetBufferLists() to indicate an NBL to NDIS.
But I do not see that NBL in my protocol. I do see that NBL in Wireshark (a protocol driver). The NBL in question is random, i.e. I cannot deterministically break on that NBL in my miniport.
How do I debug this?
For now I am looking at the NDIS tracing ‘recv’ component to see if it helps.
I only know that I did not receive the NBL after an internal timeout (of course).
Are there any ways to debug this?
I disabled all protocols attached to my miniport except my protocol. Under that condition I do not see the issue.

My hypothesis:
NDIS indicated the NBL to one of the protocols.
Either

  1. That protocol somehow messed up the NBL chain, or
  2. For some reason NDIS decided it doesn’t have to pass that ether_type NBL to me.

The above are a bit drastic at this stage, so I am wondering if there are any other reasons this could happen, or other ways to debug it.
I specifically subscribe to the ether types I need (in NDIS_OPEN_PARAMETERS.FrameTypeArray etc). My miniport (in some conditions) doesn’t set the NET_BUFFER_LIST_INFO(,NetBufferListFrameType).
I do not know how NDIS decides which packets to indicate to which protocols. Does NDIS always do a deep header check to figure out the ether_type, so that it can pass the packet up only to the interested protocol drivers? I ask because some drivers might not bother to set the OOB[ether_type].

Basically my test is: traffic is running and link pulls are happening, and my protocol seems not to get indicated on some packets, randomly, at random intervals.

What percent of NBLs are vanishing? What is the pattern – is it one at a time, with random intervals, or does it seem to be bursty? Is any other event correlated with the problem?

NetBufferListFrameType is not such a big deal if you’re a regular 802.3 miniport; NDIS knows how to read the frametype out of the Ethernet header (or SNAP header). It will use the value it finds to match against your protocol’s FrameTypeArray. In order for NDIS to deliver the packet to you, these requirements must be met:

  1. Your protocol must have opened the adapter, and the open must be in Running state.
  2. The FrameTypeArray is empty, or it contains an entry that matches the NB’s ethertype.
  3. The packet filter (OID_GEN_CURRENT_PACKET_FILTER) permits the destination address (unicast, multicast, etc.)
  4. The packet wasn’t dropped by some LWF.
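
Requirement 2 can be pictured with a small user-mode model. This is only a sketch of the matching rule as described above, not NDIS's actual code: `frame_ethertype` and `frame_type_matches` are hypothetical helper names, and the sketch assumes a plain untagged 802.3 frame (no VLAN tag, no SNAP header).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14u /* dst MAC (6) + src MAC (6) + type/length (2) */

/* Read the 16-bit type field (network byte order) at bytes 12..13.
 * Returns 0 if the buffer is too short to hold an Ethernet header. */
static uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
    if (frame == NULL || len < ETH_HEADER_LEN)
        return 0;
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Model of requirement 2: an empty array accepts every frame type,
 * otherwise the ethertype must appear in the array. */
static bool frame_type_matches(const uint16_t *types, size_t count,
                               uint16_t ethertype)
{
    assert(types != NULL || count == 0);
    if (count == 0)
        return true;
    for (size_t i = 0; i < count; i++) {
        if (types[i] == ethertype)
            return true;
    }
    return false;
}
```

Logging the result of such a check in the miniport, just before the indication, would show whether a "lost" frame ever carried an ethertype that the protocol subscribed to.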

In order to read the ethertype, NDIS needs to map the packet’s Ethernet header into system VA. Normally, miniports indicate packets that are pre-mapped, but if not, NDIS needs to allocate the VA. This can fail if the system is critically low on VA, and NDIS just drops the packet. In practice, this is not an issue, because most miniports pre-map their receive packets (actually, they are just sliced out of a single common buffer); and even then, VA is rarely so scarce that we can’t allocate a few bytes for the packet header. But for completeness, I’ll mention this too.

It’s not legal for one protocol to somehow mess up the NBL chain for your protocol. NDIS is implemented in such a way that this would be difficult to do: if two protocols would receive the same NBLs, then NDIS sets NDIS_RECEIVE_FLAGS_RESOURCES on the indications. Then the protocol really isn’t supposed to unlink NBLs from the chain, so it’s unlikely that any protocol is doing this.
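
As a rough illustration of a well-behaved protocol under that flag, here is a user-mode model in which the receive path only reads the chain and copies what it needs. Everything in it is an assumption made for the sketch: `model_nbl` stands in for NET_BUFFER_LIST, `MODEL_RECEIVE_FLAGS_RESOURCES` is a placeholder value rather than the constant from ndis.h, and `copy_if_resources` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_RECEIVE_FLAGS_RESOURCES 0x1u /* placeholder, not from ndis.h */

typedef struct model_nbl {
    struct model_nbl *next;
    const uint8_t *data;
    size_t len;
} model_nbl;

typedef struct {
    uint8_t bytes[64];
    size_t len;
} model_copy;

/* When the resources flag is set, copy each frame (truncated to 64 bytes)
 * into caller-owned storage and leave the chain untouched: the chain is
 * taken as const, so no 'next' pointer can be rewritten here.
 * Returns the number of frames copied, or 0 when the flag is clear. */
static size_t copy_if_resources(const model_nbl *head, uint32_t flags,
                                model_copy *copies, size_t max_copies)
{
    assert(copies != NULL || max_copies == 0);
    if ((flags & MODEL_RECEIVE_FLAGS_RESOURCES) == 0)
        return 0;
    size_t n = 0;
    for (; head != NULL && n < max_copies; head = head->next) {
        size_t len = head->len < sizeof copies[n].bytes
                         ? head->len : sizeof copies[n].bytes;
        if (len > 0)
            memcpy(copies[n].bytes, head->data, len);
        copies[n].len = len;
        n++;
    }
    return n;
}
```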

(That’s not to say it’s impossible for one protocol to break the rules and hide an NBL from you… but it seems unlikely).

Check that your protocol isn’t getting paused – this is easy to check with a quick breakpoint, so I’d check this first. You will not get packets while you’re paused.

The next thing I’d do is instrument the protocol and miniport to add NBL counters: one incremented just before calling NdisMIndicateReceiveNetBufferLists, and one incremented as soon as NDIS calls ProtocolReceiveNetBufferLists. If the counters match, then you might have a bug in your protocol driver.
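
A minimal sketch of such counters, written as a user-mode model with C11 atomics. The names (`model_nbl`, `note_chain`, `missing_nbls`) are hypothetical, and a real driver would presumably walk the actual NET_BUFFER_LIST chain and use a kernel interlocked primitive instead of `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for NET_BUFFER_LIST: only the chain link is modelled. */
typedef struct model_nbl {
    struct model_nbl *next;
} model_nbl;

static atomic_uint_fast64_t g_miniport_indicated; /* bumped before indicating */
static atomic_uint_fast64_t g_protocol_received;  /* bumped on receive entry */

/* Count the NBLs in a chain; an indication may carry more than one. */
static uint64_t count_chain(const model_nbl *head)
{
    uint64_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Add the length of the chain to the given counter. */
static void note_chain(atomic_uint_fast64_t *counter, const model_nbl *head)
{
    assert(counter != NULL);
    atomic_fetch_add(counter, count_chain(head));
}

/* Positive result: NBLs were indicated that the protocol never saw. */
static int64_t missing_nbls(void)
{
    return (int64_t)(atomic_load(&g_miniport_indicated) -
                     atomic_load(&g_protocol_received));
}
```

Counting NBLs rather than indications matters here, because the two sides can see chains of different lengths for the same traffic.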

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Wednesday, March 30, 2011 7:13 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Not receiving (subscribed ethertype) NBL in protocol driver(evn aft miniport_indicate

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Thanks Jeffrey for the detailed explanations.
Here is what transpired after my previous post.
The issue still happens even if I disable all the other protocols, so this is not likely an issue with other protocols.

I am a regular 802.3 miniport.
Yes, the counters match in my miniport and my protocol.
It is one NBL at a time, at random intervals.
I will check the LWF stuff.
I think my open is not paused, but will check.

I am printing all the frame contents in both my miniport and my protocol. When the issue happens I see the frame print from my miniport but not from my protocol.
When I see this NBL drop symptom, I do not see any of the error paths being printed, i.e. I am not deliberately dropping that packet (I normally drop because it is not of my ether_type etc).

I had changed my miniport to set the NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE flag so that my protocol can optimize (decide whether it needs to process the NBL chain or not). I set this flag correctly etc.
My optimization was: if NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE is set and the ether_type of the first NBL is not mine, I skip processing the whole chain.
I subscribe for only one ether_type.

But I see a case where I get a NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE chain with 3 NBLs, but not all 3 NBLs are of the same ether_type.
The NBL chain looks like below:
NBL0 - Belongs to me
NBL1 - Doesn't belong to me
NBL2 - Belongs to me

I will check if NDIS_RECEIVE_FLAGS_RESOURCES is set.

So when I was processing the second NBL (NBL1), I hit my assert because I was not at the first NBL; I was also dropping the whole chain anyway. I have fixed my protocol to no longer key off the single_ether_type hint.
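
For illustration, per-NBL filtering that ignores the hint could look like the user-mode model below. It is a sketch, not the poster's actual code: `model_nbl` is a stand-in for NET_BUFFER_LIST with the ethertype assumed to be already parsed, and `collect_matching` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct model_nbl {
    struct model_nbl *next;
    uint16_t ethertype; /* host byte order, already parsed from the frame */
} model_nbl;

/* Check every NBL on its own instead of trusting the first one.
 * Stores up to 'cap' matching NBLs in 'accepted' and returns the total
 * number of matches found in the chain (which may exceed 'cap'). */
static size_t collect_matching(const model_nbl *head, uint16_t wanted,
                               const model_nbl **accepted, size_t cap)
{
    assert(accepted != NULL || cap == 0);
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        if (head->ethertype != wanted)
            continue; /* skip this NBL only, keep walking the chain */
        if (n < cap)
            accepted[n] = head;
        n++;
    }
    return n;
}
```

With the chain described above (mine, not mine, mine), this accepts NBL0 and NBL2 and skips only NBL1.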

So basically the issue NOW is: how did I get a chain with NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE set when not all the NBLs are of the same ether_type?
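
One way to catch this at the point of indication is a debug-only consistency check on the chain. The following is a user-mode sketch under stated assumptions: `model_nbl` is a stand-in for NET_BUFFER_LIST, `MODEL_RECEIVE_FLAGS_SINGLE_ETHER_TYPE` is a placeholder value rather than the constant from ndis.h, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RECEIVE_FLAGS_SINGLE_ETHER_TYPE 0x1u /* placeholder value */

typedef struct model_nbl {
    struct model_nbl *next;
    uint16_t ethertype; /* host byte order, already parsed from the frame */
} model_nbl;

/* True when the flag claims a single ethertype but some NBL in the
 * chain differs from the first one. */
static bool single_ether_type_violated(const model_nbl *head, uint32_t flags)
{
    if ((flags & MODEL_RECEIVE_FLAGS_SINGLE_ETHER_TYPE) == 0 || head == NULL)
        return false;
    for (const model_nbl *n = head->next; n != NULL; n = n->next) {
        if (n->ethertype != head->ethertype)
            return true;
    }
    return false;
}

/* Debug-build check that stops as soon as the hint is broken. */
static void assert_single_ether_type(const model_nbl *head, uint32_t flags)
{
    assert(!single_ether_type_violated(head, flags));
}
```

Running the same check in the miniport just before the indication and in the protocol on entry would narrow down whether the chain was already mixed when it left the miniport or was changed somewhere on the way up.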

Basically my questions now are:
A) Does NDIS ever coalesce receive NBL chains before indicating them to protocols? Under what conditions?
B) If so, does NDIS do any additional checks (like NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE) before it decides to coalesce?

My current hypothesis is that my NBL chains somehow got coalesced by NDIS before being indicated to me, without regard to NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE. I might be wrong.

I want to exploit the single_ether_type performance hint that the miniport/NDIS provides. But based on the above behaviour, my decision for now is that I can’t exploit single_ether_type.

I will debug more, but let me know about the above NDIS mechanism questions.

Thanks

Hmm, that’s a good bug. This is easily the sort of thing that somebody (including NDIS itself) could get wrong, since it’s a fairly uncommon flag, and setting it inappropriately usually wouldn’t be noticed.

I can only think of one case in NDIS itself (versions 6.0 - 6.20) where we’ll rearrange received NBLs. That’s for Receive Side Throttling ( http://msdn.microsoft.com/ff570441.aspx - there is some support in Vista too, despite the fact that only 6.20 miniports explicitly participate ). But we do not set NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE there (I just double-checked).

So the next place to look would be the LWFs bound between your protocol and miniport. I suppose unbinding them one-by-one is the easiest way to rule them out. (The builtin PACER and WFPLWF can be unbound without causing any problems).

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Thursday, March 31, 2011 7:42 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Not receiving (subscribed ethertype) NBL in protocol driver(evn aft miniport_indicate

My miniport is NDIS 6.x, server side (2k8 SP2 and R2), so no RST is involved, I guess.
During my initial investigation, I disabled all other protocols and still saw the issue.
I also disabled only PACER (the QoS protocol) and still saw the issue.
Also, our installer (for a different reason) already detaches the standard WFP LWF from my NIC :-(, but anyway I will check this LWF stuff on the failing machine again and debug further…

Thanks