Invalid ARP entry

This is not specifically “kernel-mode-only” related, but I am hoping there are some networking experts out here to help out with this question…

What causes invalid entries in the ARP cache? Also, why would they not be removed through an ‘arp -d’ command?

I have a root-enumerated NDIS 6 miniport that seems to run fine. For our testing purposes, I install 2 instances on my system. During our tests, I see 4 ARP requests from Adapter A and an ARP reply from adapter B, and then after a few seconds, this pattern repeats. When I look at the ARP cache, indeed, the ARP request was never resolved for that interface, and that entry remains invalid.

Any idea why an ARP request would never get resolved?

Thanks in advance!
MarkH

I should add that this issue does not occur with the exact same device driver and test scenario on my colleague’s system, which is also Windows 10 Version 2004. So the puzzle is: why is ARP resolution an issue on my system? What are some things we should be looking for?

The ARP protocol exists to discover the physical or MAC address that corresponds to a given IPv4 address on Ethernet media. The ARP cache on any host (including Windows) consists of entries that have been discovered by gratuitous ARP (GARP), promiscuous sniffing or, most commonly, ARP requests. They all expire after a time if not actively used. After they expire, when new IPv4 packets need to be sent, a new ARP request will be needed.
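To make the expiry behaviour concrete, here is a minimal toy model in Python. It is only a sketch: the `ArpCache` class, its 30-second lifetime and the addresses are made up for illustration and do not reflect how Windows or any real stack implements its neighbor cache.

```python
class ArpCache:
    """Toy IPv4 -> MAC cache whose entries expire when idle (hypothetical model)."""

    def __init__(self, lifetime_s=30.0):
        self.lifetime_s = lifetime_s   # assumed idle lifetime, not a real default
        self._entries = {}             # ip -> (mac, time last learned or used)

    def learn(self, ip, mac, now):
        """Record a mapping learned from an ARP reply, GARP, or sniffed frame."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip, now):
        """Return the MAC for ip, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_used = entry
        if now - last_used > self.lifetime_s:
            # Idle too long: drop the entry, so a new ARP request is needed.
            del self._entries[ip]
            return None
        # Active use keeps the entry alive.
        self._entries[ip] = (mac, now)
        return mac


cache = ArpCache(lifetime_s=30.0)
cache.learn("192.0.2.10", "00-11-22-33-44-55", now=0.0)
still_valid = cache.lookup("192.0.2.10", now=10.0)   # used within the lifetime
expired = cache.lookup("192.0.2.10", now=45.0)       # idle for 35 s, so expired
```

A `None` result is the point at which a real host would have to send a fresh ARP request before it could transmit.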

IIRC, ARP entries that show ‘invalid’ mean that an ARP request was sent, but no response has yet been received. I don’t have direct knowledge, but empirically, Windows will hold IP packets like TCP SYN packets while the ARP request is pending. The ‘arp -d’ may well delete the entry, but the continuing need to send packets will cause new ‘invalid’ entries almost immediately.
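The following Python sketch illustrates that reasoning: why deleting an unresolved entry does not appear to stick. It is a toy model based only on the empirical behaviour described above; the `NeighborTable` class and its method names are hypothetical and are not the Windows implementation.

```python
class Neighbor:
    def __init__(self):
        self.mac = None      # None means unresolved, shown here as 'invalid'
        self.pending = []    # packets held while the ARP request is outstanding


class NeighborTable:
    """Toy neighbor table with an unresolved state (hypothetical model)."""

    def __init__(self):
        self.entries = {}
        self.arp_requests_sent = []   # log of IPs we sent ARP requests for

    def send(self, ip, packet):
        entry = self.entries.get(ip)
        if entry is None:
            # No entry yet: create an unresolved one and send an ARP request.
            entry = self.entries[ip] = Neighbor()
            self.arp_requests_sent.append(ip)
        if entry.mac is None:
            entry.pending.append(packet)   # hold the packet until resolved
            return "queued"
        return "sent"

    def on_arp_reply(self, ip, mac):
        """Resolve the entry and return the packets that can now be sent."""
        entry = self.entries.get(ip)
        if entry is None:
            return []
        entry.mac = mac
        flushed, entry.pending = entry.pending, []
        return flushed

    def delete(self, ip):
        """Rough analogue of deleting the entry by hand."""
        self.entries.pop(ip, None)

    def state(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return "absent"
        return "resolved" if entry.mac else "invalid"


table = NeighborTable()
table.send("192.0.2.20", "TCP SYN")         # no reply arrives: entry is invalid
table.delete("192.0.2.20")                  # entry is gone for a moment
table.send("192.0.2.20", "TCP SYN retry")   # next packet recreates it, still invalid
```

In this model the entry only leaves the ‘invalid’ state when `on_arp_reply` is called, which matches the observation that the real question is why the reply never arrives or is never accepted.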

As to why you can’t successfully process the ARP request, I have no idea. But common causes are bad VLAN configurations and spanning tree issues. The fact that you say root-enumerated implies that there is no physical hardware (or at least that it is not Ethernet), so it’s going to be hard to help, I think.

Thank you for your reply @MBond2. But are you able to please expound upon what you mean by “bad VLAN configuration”? Yes, that is correct that there is no physical hardware; but I am puzzled as to why this works on 2 of my colleagues’ systems, but not mine. Is there anything to look for in the configuration of the virtual NIC?

Well, in a typical scenario with real hardware, ‘bad VLAN configuration’ means that there is some kind of mismatch between what the host expects to send and what the switch expects to receive. The host may expect to send only untagged packets, but the switch may be configured for only tagged VLAN 101. The host may be configured for tagged 201, but the switch for tagged 301 with untagged packets sent to VLAN 12. Or maybe both hosts expect untagged packets, but the switchports are configured with two different access VLANs. The possibilities are many, but the point is that there is some kind of mismatch.

If this is the case, then the OS will send out ARPs, but they either get discarded by the switch or disappear into the Ether and never make it to the destination host that should respond.
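As a rough illustration of how such a mismatch swallows ARP traffic, here is a small Python sketch of a two-port switch. Everything in it is a simplifying assumption made for the example: the port dictionaries, the `ingress_vlan`, `egress_tag` and `arp_reaches` helpers, and the VLAN numbers. It does not model any particular switch or virtual NIC.

```python
def ingress_vlan(port, tag):
    """Return the VLAN a frame lands in on ingress, or None if it is dropped."""
    if tag is None:
        # Untagged frame: accepted only if the port has an untagged VLAN.
        return port["untagged_vlan"]
    return tag if tag in port["tagged_vlans"] else None


def egress_tag(port, vlan):
    """Return (delivered, tag) for a frame in `vlan` leaving on this port."""
    if vlan == port["untagged_vlan"]:
        return True, None          # leaves untagged
    if vlan in port["tagged_vlans"]:
        return True, vlan          # leaves with an 802.1Q tag
    return False, None             # port is not a member of this VLAN


def arp_reaches(host_a_tag, port_a, port_b, host_b_tag):
    """True if an ARP frame from host A arrives in the form host B expects."""
    vlan = ingress_vlan(port_a, host_a_tag)
    if vlan is None:
        return False               # discarded by the switch on ingress
    delivered, tag = egress_tag(port_b, vlan)
    return delivered and tag == host_b_tag


# Hypothetical port configurations matching the cases described above.
access_10 = {"untagged_vlan": 10, "tagged_vlans": set()}
access_20 = {"untagged_vlan": 20, "tagged_vlans": set()}
tagged_101_only = {"untagged_vlan": None, "tagged_vlans": {101}}
tagged_301_native_12 = {"untagged_vlan": 12, "tagged_vlans": {301}}

same_vlan = arp_reaches(None, access_10, access_10, None)            # delivered
different_access = arp_reaches(None, access_10, access_20, None)     # lost
untagged_into_tagged = arp_reaches(None, tagged_101_only, access_10, None)   # lost
wrong_tag = arp_reaches(201, tagged_301_native_12, access_10, None)  # lost
```

Only the first case delivers the request; in the other three the sender sees nothing but its own unanswered ARPs, which would leave the cache entry unresolved.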

Virtual NICs must have some kind of virtual media, and there is probably some kind of configuration for it, not just for VLANs but for lots of other settings like MTU. That’s where I would look next.