NDIS protocol driver scrambling or dropping NBL(s) in ProtocolReceiveNBLs

Hi

I have a protocol driver, PD1. It subscribes to certain ether_types (not IPv4/IPv6) and also adds a multicast address.
I also have a third-party protocol driver, PD2. It also subscribes to certain ether_types (not IPv4/IPv6) and also sets a multicast address.

My issue is:
0) When I disable all the MS and other protocols on the NIC, the following happens:
a) At start I get all the NBLs I need, and I make a connection.
b) Then I lose the connection because I do not receive the periodic multicast NBLs.
Basically, frames are dropped above the miniport. (I see them on the wire.)
c) I open Wireshark and everything is fine.

I am thinking:
A) This has nothing to do with the way I am using OID_802_3_MULTICAST_LIST, because of a).
B) Rather, it is PD2 that is mangling the NBL list it gets in its ProtocolReceiveNetBufferLists().
C) My hypothesis is that without Wireshark, PD2 gets an NBL pattern that it mishandles (basically unlinking the chain but not re-chaining everything, or missing NdisReturnNetBufferLists() for some NBLs, e.g. when NDIS_TEST_RECEIVE_CAN_PEND=1).
D) With Wireshark enabled (promiscuous mode, probably CAN_PEND=0), PD2 gets an NBL pattern that it can ALWAYS handle.

I used
!ndiskd.pendingnbls with NBLTracking enabled on the target.
I am not sure if/how to use its output to debug my issue.

I do not have source code for PD2.
PD1/PD2 are NDIS 6.0, the miniport is 6.20. The target is Win7 (i.e. 2k8 R2 SP1) server.
(I know OID_802_3_MULTICAST_LIST doesn't validate whether an address is unicast or multicast.) I also use OID_802_3_MULTICAST_LIST to add a unicast filter (but the pkts I am missing are multicast). Probably of no significance here.

Let me know any ideas/tools on how to move forward on this.

Thx

Also

For [a] to succeed above, I must have got all the required pkts, which include that multicast pkt as well (it is a sort of big-bang pkt to kick-start my protocol state machine).
The ether_types and multicast filters of PD1 and PD2 are different.
Everything works fine when I do not disable any of the other protocols.

I disabled them because I have another production issue where I miss pkts I need (but they are on the wire), so I am suspecting a misbehaving protocol. The missing pkt there could be multicast or unicast (but its ether_type is one PD1 subscribed to, of course).

What OID_GEN_CURRENT_PACKET_FILTER is your protocol (PD1) sending down?

!ndiskd.mopen will show you what NDIS thinks your protocol’s packet filter is. While you’re at it, you could look at what PD2’s packet filter is.

0: kd> !ndiskd.mopen fffffa8003f55c30

RECEIVE PATH

Packet filter DIRECTED, MULTICAST, BROADCAST
Frame Type(s) 0x0800, 0x0806
Multicast groups 01-00-5e-00-00-01
00-00-01-00-00-00

Output below

  1. PD1 looks good
  2. PD2 seems to have only MULTICAST set (I need to check what its semantics are). But consider it good for now; it shouldn't matter to the issue here anyway.
  3. (Driver version is probably from the version resources, so they need updating.)
  4. Flags [No flags set] –> Need to see what this is.

>PD1 Protocol
0: kd> !ndiskd.protocol fffffa800a263ce0
PROTOCOL
PD1
Ndis handle fffffa800a263ce0
Ndis API version v6.0
Driver context fffff880011eb340
Driver version v5.1
Reference count 2
Flags [No flags set]

>PD1 mopen
0: kd> !ndiskd.mopen fffffa8015a8e8d0
OPEN
Ndis handle fffffa8015a8e8d0
Flags USE_MULTICAST_LIST
References 1
State Running
Pause requests 0 [Open can restart]

Protocol fffffa800a263ce0 - PD1

RECEIVE PATH

Packet filter DIRECTED, MULTICAST
Frame Type(s) PD1_e1, PD1_e2
Multicast groups 01-xx-xx-xx-xx-xx See all on miniport


> PD2 Protocol
0: kd> !ndiskd.protocol fffffa800a41e010
PROTOCOL
PD2
Ndis API version v6.0
Driver context NULL
Driver version v0.0
Reference count 2
Flags [No flags set]

>PD2 mopen
0: kd> !ndiskd.mopen fffffa801710f8d0

OPEN
Flags USE_MULTICAST_LIST
References 1
State Running
Pause requests 0 [Open can restart]

Protocol fffffa800a41e010 - PD2

RECEIVE PATH

Packet filter MULTICAST
Frame Type(s) PD2_e
Multicast groups 01-xx-xx-xx-xx-xx See all on miniport

Also, what is the purpose of NDIS/miniport sending NDIS_STATUS_PACKET_FILTER to protocols via ProtocolStatusEx? I.e. is it:

  1. Information only - PD1 treats as such now.
  2. Protocols need to take action?
    I added instrumentation to just trap if the new filter is not a superset of PD1's.
    Previously, on this status indication I used to set my filter again, irrespective of what the indication contained.
    (I probably removed that because my filter never changes irrespective of what other protocols do, as NDIS maintains it per binding.)
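
For reference, the superset check mentioned in item 2 can be sketched in plain user-mode C as below. This is only an illustration, not driver code: the PF_* names and filter_covers() are made up for this sketch, and a real driver would use the NDIS_PACKET_TYPE_* constants from the WDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for this sketch only; a real driver would use the
 * NDIS_PACKET_TYPE_* constants from the WDK headers. */
#define PF_DIRECTED   0x00000001u
#define PF_MULTICAST  0x00000002u
#define PF_BROADCAST  0x00000008u

/* Returns true when every bit this binding asked for (wanted) is still
 * present in the filter reported by the status indication (reported). */
bool filter_covers(uint32_t reported, uint32_t wanted)
{
    return (reported & wanted) == wanted;
}
```

With wanted = PF_DIRECTED | PF_MULTICAST, a reported filter of PF_MULTICAST alone would trip the trap, while any filter containing both bits would not.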

I set OID_GEN_CURRENT_PACKET_FILTER and OID_802_3_MULTICAST_LIST during ProtocolBindAdapterEx() and remove them during unbind.

****Not sure if this is of significance, but my unicast mac_addr is not the LAN MAC address, and I add it to the filter list using OID_802_3_MULTICAST_LIST. I add 2 of these unicast filters for my protocol to function.****

O.k.

I add 3 multicast filters “one by one” and then set the filter = directed | mcast.

Q1) Is OID_802_3_MULTICAST_LIST cumulative per-binding? I thought so, but will do the cumulative add now.
Q2) Then not sure, why I get the (later missing) pkts some of the time. Maybe the PKT filter and list is best effort from NDIS/miniport.

Thx.

So it sounds like you’re overwriting your multicast address. I gather that you are setting multiple multicast addresses, one at a time. But the OID_802_3_MULTICAST_LIST doesn’t work that way. (More below). You should expect to see ALL your multicast addresses (including the unicast address that you send down via that OID) listed under !ndiskd.mopen for PD1. Your output below only shows a single multicast address, which doesn’t sound like it’s what you want.

RECEIVE PATH

Packet filter DIRECTED, MULTICAST
Frame Type(s) PD1_e1, PD1_e2
Multicast groups 01-xx-xx-xx-xx-xx // if you are joining multiple multicast groups, you should see them all listed here

Also what is the purpose of NDIS/miniport sending ProtocolStatusEx.NDIS_STATUS_PACKET_FILTER to protocols.

It’s not usually useful. If you needed to know what the current packet filter is for diagnostic purposes, I suppose you could use this information. But most protocols can happily ignore this status indication.

Q1) Is OID_802_3_MULTICAST_LIST cumulative per-binding?

No, this OID resets the multicast list for your protocol binding. If you issue this OID twice, the second call overwrites the first. However, there is a slightly different request: OID_802_3_ADD_MULTICAST_ADDRESS. This request DOES have the semantics that you imagined: each call is cumulative with all prior requests. NDIS 6.x protocols are free to use either OID. Use whichever is easier for you, although for your own sanity, I do not recommend that you use both in the same protocol driver.

Incidentally, NDIS does track the multicast list separately per protocol, and is careful to union them all together before sending it down to the miniport.
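
To make the difference concrete, here is a small user-mode model of the semantics described above. It is a sketch only, not NDIS code: McastList, mcast_set_list(), mcast_add() and mcast_union() are hypothetical names, and MAX_ADDRS is an arbitrary cap. "Set list" replaces the binding's list, "add" is cumulative, and the miniport-wide view is the union of all bindings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN   6
#define MAX_ADDRS 8   /* arbitrary cap for this model */

/* Hypothetical per-binding multicast list (not an NDIS structure). */
typedef struct {
    unsigned char addr[MAX_ADDRS][MAC_LEN];
    size_t count;
} McastList;

/* Returns 1 if mac is already in the list, 0 otherwise. */
int mcast_contains(const McastList *l, const unsigned char *mac)
{
    for (size_t i = 0; i < l->count; i++)
        if (memcmp(l->addr[i], mac, MAC_LEN) == 0)
            return 1;
    return 0;
}

/* Models a "set list" request: the new list REPLACES the old one. */
void mcast_set_list(McastList *l, const unsigned char (*macs)[MAC_LEN], size_t n)
{
    assert(n <= MAX_ADDRS);
    if (n > 0)
        memcpy(l->addr, macs, n * MAC_LEN);
    l->count = n;
}

/* Models an "add address" request: cumulative with earlier requests. */
void mcast_add(McastList *l, const unsigned char *mac)
{
    if (mcast_contains(l, mac))
        return;                      /* already present, nothing to do */
    assert(l->count < MAX_ADDRS);
    memcpy(l->addr[l->count++], mac, MAC_LEN);
}

/* Models the miniport-wide view: the union of every binding's list. */
void mcast_union(McastList *out, const McastList *bindings, size_t nbindings)
{
    out->count = 0;
    for (size_t b = 0; b < nbindings; b++)
        for (size_t i = 0; i < bindings[b].count; i++)
            mcast_add(out, bindings[b].addr[i]);
}
```

In this model, calling mcast_set_list() three times with one address each leaves only the last address in the list, which matches the single address seen in the !ndiskd.mopen output; passing all three addresses in one call keeps all three.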

Q2) Then not sure, why I get the (later missing) pkts some of the time. Maybe the PKT filter and list is best effort from NDIS/miniport.

Indeed, a packet filter is best-effort. There are plenty of cases where NDIS or the miniport will give you MORE packets than you asked for. (It’s a bug, of course, if NDIS or the miniport gives you FEWER packets than you asked for.) For example, if the NIC makes a receive indication with NDIS_RECEIVE_FLAGS_RESOURCES, the NBL chain will not be filtered in NDIS at all; your protocol will typically get EVERY packet in the chain, including packets that don’t match your packet filter. (This is not contractual; it’s only an implementation detail that is subject to change in the future.)
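
Since a protocol may be handed more frames than it asked for, it is common for a receive handler to re-check each frame itself. Below is a minimal user-mode sketch of such a check, assuming an untagged Ethernet II header in a contiguous buffer; frame_ether_type() and frame_is_wanted() are hypothetical helper names, and a real driver would first have to map the NET_BUFFER data before looking at the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14  /* dest(6) + src(6) + EtherType(2), untagged */

/* Reads the big-endian EtherType from an untagged Ethernet II header.
 * Returns false if the buffer is too short to hold a full header. */
bool frame_ether_type(const unsigned char *frame, size_t len, uint16_t *type)
{
    assert(type != NULL);
    if (frame == NULL || len < ETH_HEADER_LEN)
        return false;
    *type = (uint16_t)((frame[12] << 8) | frame[13]);
    return true;
}

/* Returns true only if the frame's EtherType is one this protocol
 * registered for; anything else is an "extra" frame to be ignored. */
bool frame_is_wanted(const unsigned char *frame, size_t len,
                     const uint16_t *wanted, size_t nwanted)
{
    uint16_t type;
    if (!frame_ether_type(frame, len, &type))
        return false;
    for (size_t i = 0; i < nwanted; i++)
        if (wanted[i] == type)
            return true;
    return false;
}
```

A frame that fails this check is simply not processed; it still has to be returned to NDIS like any other NBL.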

  1. PD2 seem to have only MULTICAST set, (I need to check what its semantics are).

It means the protocol set a packet filter of NDIS_PACKET_TYPE_MULTICAST.

Driver version probably from versions resources

Actually it comes from NDIS_PROTOCOL_DRIVER_CHARACTERISTICS.M*orDriverVersion, but it doesn’t really matter much.

[No flags set] –> Need to see what this is.

This is !ndiskd’s way of saying that no protocol-driver flags are set on the protocol block. Since protocol flags are an internal implementation detail, you’re not really expected to care much about this. There aren’t any interesting flags for protocol drivers. (A miniport block is more interesting.)

Thanks Jeff

> NDIS 6.x protocols are free to use either OID
I made the changes to do the cumulative add using MULTICAST_LIST.
(Also, since the ADD_MULTICAST_ADDRESS OID is optional in 6.x, I presume NDIS handles it for miniports that don't have that pair.)

But I see the anomaly below between the PD1 per-binding and the miniport-wide filter list dumps.
I hope it is just an ndiskd issue? The order I use is MULTICAST_LIST[-01, -04, -05].

The per-binding dump is incorrect. The miniport dump is correct. I am still debugging.

RECEIVE PATH

Packet filter DIRECTED, MULTICAST
Multicast groups 01-10-18-01-00-01 See all on miniport
80-00-01-00-00-00<<<<<<<<<<<<<<**************
01-10-18-01-00-04

0: kd> !ndiskd.miniport fffffa800a99c1a0 -filterdb
PACKET FILTER
Multicast addresses (Current count: 4; Maximum: …)
01-10-18-01-00-01
01-10-18-01-00-04
01-10-18-01-00-05
01-80-c2-00-00-0e <-- from PD2

Some minor points and nit-picks.
[Q2]
Also, in ndis.sys, which has precedence: OID_GEN_CURRENT_PACKET_FILTER or OID_802_3_MULTICAST_LIST? I.e.:

A protocol driver subscribes for ether_type e1 and sets pkt_filter = unicast | mcast.
All e1 pkts are multicast only, addressed to mcast_Addr1.
>*****Protocol driver didn't configure mcast_Addr1******
>but for now (somehow) the miniport is enabled to receive mcast_Addr1

Miniport indicates to NDIS all ***e1 pkts it got***.
>Will the protocol get ALL e1 pkts?
>Will the protocol still get all the e1 pkts if it did not set the pkt_filter to MULTICAST?

[Q3]
Probably http://msdn.microsoft.com/en-us/library/windows/hardware/ff569073(v=vs.85).aspx can be updated to clarify the non-cumulative nature; or maybe the existence of the ADD/DELETE_MULTICAST_ADDRESS OID pair already implies that?

> I hope it is just an ndiskd issue?

Oh dear, it is indeed just an ndiskd issue. I put this bug into ndiskd over two years ago, and I’m embarrassed to say that I’ve never noticed it until you pointed it out. Even the example output I pasted earlier in the thread clearly shows an obviously-bad multicast address! Thanks for catching this. A future update to the WDK will have a better ndiskd. Specifically, the problem is that !ndiskd.mopen only displays the first multicast address correctly; every subsequent address is junk data. The multicast addresses displayed in !ndiskd.miniport -filterdb should still be correct.

I presume NDIS handles that for miniports who don’t have that pair

Yes. Actually, the documentation is misleading in that it suggests miniports might possibly see that pair of OID requests. Miniports will NEVER see either OID_802_3_ADD_MULTICAST_ADDRESS or OID_802_3_DELETE_MULTICAST_ADDRESS; NDIS always handles those internally and translates them to OID_802_3_MULTICAST_LIST. If your miniport implements the ADD or DELETE requests, you have dead code.

which has precedence OID_GEN_CURRENT_PACKET_FILTER or OID_802_3_MULTICAST_LIST?

Both are required to guarantee delivery of packets. Since NDIS (and the miniport) might decide to give your protocol EXTRA packets, you might sometimes see your packets when you have only set one (or neither) of the two OID requests. But to guarantee that you get your packets, you need to send both OID requests. It doesn’t matter which order you issue the OID requests in.
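
The "both are required" rule for multicast can be written down as a tiny model. This is a sketch under assumptions, not how ndis.sys is implemented: PF_MULTICAST is a stand-in for NDIS_PACKET_TYPE_MULTICAST, and multicast_delivery_guaranteed() is a hypothetical name. It only answers "is delivery guaranteed?"; as noted above, extra frames may still show up when it returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN       6
#define PF_DIRECTED   0x00000001u  /* stand-in values for this sketch */
#define PF_MULTICAST  0x00000002u

/* A MAC address is a group (multicast/broadcast) address when the
 * least significant bit of its first octet is set. */
bool is_group_address(const unsigned char *mac)
{
    return (mac[0] & 0x01) != 0;
}

/* Delivery of a multicast frame is guaranteed only when BOTH hold:
 *   1. the binding's packet filter includes the multicast bit, and
 *   2. the destination is in the binding's multicast list.
 * This model covers the multicast-list path only; directed and
 * broadcast traffic are deliberately left out. */
bool multicast_delivery_guaranteed(uint32_t packet_filter,
                                   const unsigned char (*list)[MAC_LEN],
                                   size_t n,
                                   const unsigned char *dest)
{
    assert(dest != NULL);
    if (!is_group_address(dest))
        return false;
    if ((packet_filter & PF_MULTICAST) == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (memcmp(list[i], dest, MAC_LEN) == 0)
            return true;
    return false;
}
```

Setting only the packet filter, or only the multicast list, makes this return false, even though the frames may happen to arrive anyway on a given system.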

Probably [MSDN] can be updated to clarify on the non-cumulative nature

Yes, that’s a good point. I’ve filed an internal bug to clarify those MSDN pages.

Jeff

I landed in the above scenario while trying to minimize the problem space of my production case (by disabling all MS protocols, leaving just PD1 and PD2).

To debug the main production issue, which still happens even with the above fixes: any tools/ideas? The issue is that I do not receive some of the unicast pkts I subscribed to.

The pkt is unicast and has ether_type PD1_e1, which I subscribed to in PD1's NdisOpenAdapterEx() call.
This pkt has a MAC address different from the TCPIP MAC, i.e. different from the one in the !ndiskd.miniport dump.
Clearly I have set the pkt filter to include unicast as well (see below).
I add that address to my NIC HW filter through a custom OID to the miniport.
But I am still not getting that pkt at times, and so I am back to suspecting PD2, the third-party protocol I need for my stack to work (no source code etc.).
****I can guarantee 100% that PD1 is not mishandling ProtocolReceiveNBLs()****

[Q1] Any tools, ndiskd options, approaches for me to try.

I will re-check, as indicated earlier, that no pause is happening. I am not sure why a PAUSE would happen when no protocols are joining/leaving (and assuming miniport resets are not happening).
Still, I should not lose that pkt, and *******anyway that pkt is seen in Wireshark********.
I will disable all filters as well (we disable all during install by design).

Then I will try putting my protocol in NDIS_PACKET_TYPE_PROMISCUOUS

0: kd> !ndiskd.miniport fffffa800a99c1a0 -filterdb
PACKET FILTER
Traffic classes
Combined packet filter DIRECTED, MULTICAST
Miniport packet filter DIRECTED, MULTICAST
Unicast address
00-xx-xx

[Q2] Have below generic questions on NDIS policy/mechanism

(PD2 only subscribes to PD2_e2 and its filter is only MULTICAST. PD2_e2 pkts are always multicast.)
Suppose NDIS got NBL chain <PD2_e2, PD1_e1, PD2_e2> from the miniport to indicate to protocols.

[a] Does NDIS ever give the same chain above (or does it take care to break it up) to PD1 and PD2 without status_resources=1 being set? I guess not?

[b] Suppose PD2 gets the above chain first and doesn't call NdisReturnNetBufferLists() for the PD1_e1 NBL.
What is the end result? How to detect/track the culprit?

[c] Any consequences if protocols return the pkts from a chain out of order or in sub-chains? I guess not.

Thanks

===
[d] For general understanding

Will NDIS ALWAYS indicate the NBLs to the protocol in the same thread context (DPC or not) as the miniport?
When (and why) will NDIS reschedule the indication to protocols (using its own thread, DPC or not)?
Does NDIS set the DISPATCH_LEVEL flag if miniports do not set it (if NDIS knows it is already in DPC context)?

  1. The issue happens on Windows Server 2012 as well.

  2. The pkt we are missing is ether1_unicast
    My protocol driver has
    a) subscribed for that ether1 and
    b) also set the pkt filter to unicast
    c) that unicast MAC (different from the native MAC addr, i.e. the LAN MAC addr) is added to the miniport using a custom OID
    d) ******Previous pkts matching this filter we receive*****

****
so I am not sure why my protocol doesn’t receive this pkt (when all the filters are a PERFECT MATCH), but Wireshark does. All Wireshark did was put itself in promiscuous mode. I am running, not paused etc.
************

  1. When we run Wireshark in non-promiscuous mode, we do not see the pkt in Wireshark.

  2. Current disposition is that we end up in this state after repeated cable pulls (for this discussion, the issue keeps happening once the cable is re-inserted and long after that).
    On link up we do not set the filters or the multicast list OID again.

  3. MSDN
    NDIS_PACKET_TYPE_PROMISCUOUS
    Specifies all packets regardless of whether VLAN filtering is enabled or not and whether the VLAN identifier matches or not.

Is it literally above i.e. all about VLAN and stuff or is it just, you will get all pkts that the miniport indicates (or the NIC/HW indicates)?

Another probable ndiskd doc issue.
The dump below is for the Wireshark protocol when it is run in non-promiscuous mode.
The text still talks about promiscuous mode when it should be about ALL_LOCAL etc.

RECEIVE PATH

This protocol will receive all traffic indicated by the miniport, because:

  • this open has set a promiscuous packet filter

Packet filter ALL_LOCAL
Frame Type(s) [This protocol has not registered any frame types]
Multicast groups [This protocol has not added any multicast groups]

I’m afraid I’m out of ideas on how to debug this problem further. I’ve done my best to answer your other questions though. Maybe somebody else on this list has experience with this type of problem?

Suppose NDIS got NBL chain <PD2_e2, PD1_e1, PD2_e2> from the miniport to indicate to protocols.
Does NDIS ever give the same chain above (or does it take care to break it up) to PD1 and PD2 without status_resources=1 being set? I guess not?

NDIS reserves the right to indicate the complete chain to PD1 and PD2. STATUS_RESOURCES is one example of the several cases where NDIS exercises that right.

[b] Suppose PD2 gets the above chain first and doesn't call NdisReturnNetBufferLists() for the PD1_e1 NBL.

Theoretically, if PD2 leaks the NBL, then you’d miss it. However in practice, NDIS will usually use a synchronous receive indication for NBL chains that are given to multiple protocols. Therefore, you’d see a thread stuck in PD2 if PD2 was actually leaking one of these NBLs.

You can use !ndiskd.pendingnbls on Windows 8 to see if there are any NBLs stuck anywhere.

[c] Any consequences if protocols return the pkts from a chain out of order or in sub-chains? I guess not.

There are no negative consequences of this, and protocols do it all the time. In fact, NDIS itself will sometimes rearrange returned NBLs.
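
As an illustration of splitting a received chain into sub-chains without losing anything, here is a user-mode model. It is a sketch only: the Nbl struct is a hypothetical stand-in for a NET_BUFFER_LIST with just a link and an EtherType, and split_chain() is a made-up helper. The point is that every node ends up on exactly one NULL-terminated output chain, which is the bookkeeping a protocol has to get right before it returns NBLs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NET_BUFFER_LIST: only the link and the
 * one field this model needs. */
typedef struct Nbl {
    struct Nbl *next;
    uint16_t ether_type;
} Nbl;

/* Counts the nodes on a NULL-terminated chain. */
size_t chain_length(const Nbl *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Splits head into two NULL-terminated chains, preserving relative
 * order: *mine gets nodes whose EtherType equals wanted, *others gets
 * the rest. Every input node lands on exactly one output chain. */
void split_chain(Nbl *head, uint16_t wanted, Nbl **mine, Nbl **others)
{
    Nbl **mine_tail = mine;
    Nbl **others_tail = others;

    assert(mine != NULL && others != NULL);
    *mine = NULL;
    *others = NULL;

    while (head != NULL) {
        Nbl *next = head->next;   /* save the link before unlinking */
        head->next = NULL;        /* node is now a chain of one */
        if (head->ether_type == wanted) {
            *mine_tail = head;
            mine_tail = &head->next;
        } else {
            *others_tail = head;
            others_tail = &head->next;
        }
        head = next;
    }
}
```

For the chain <PD2_e2, PD1_e1, PD2_e2> discussed above, splitting on PD1's EtherType yields one chain holding the single PD1_e1 node and another holding the two PD2_e2 nodes in their original order; the combined length of the two outputs always equals the input length. A driver that forgets to save the next pointer, or leaves a stale link in place, would drop or double-return nodes.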

Is it literally above i.e. all about VLAN and stuff or is it just, you will get all pkts that the miniport indicates (or the NIC/HW indicates)?

The miniport is told about the packet filter, and most miniports will alter their behavior based on it. So if there is a Promiscuous packet filter, then most Ethernet miniports will indicate up *more* packets than if the packet filter had just been Unicast.

The dump below is for the Wireshark protocol when it is run in non-promiscuous mode. The text still talks about promiscuous mode when it should be about ALL_LOCAL etc.

This is actually correct – the ALL_LOCAL packet filter is effectively promiscuous too: all traffic from the remote endpoint will *also* be indicated up to your protocol. Wireshark appears to be relying on that behavior.

Thanks, I got enough leads/info above. I can debug forward now.