NDIS 5.1 not returning indicated packets for a while

I have a customer running my drivers under Windows 2003 and they have encountered a situation where the network appears to freeze for a short time (seconds to a minute).

After a bit of to and fro and adding some debugging to the driver, I can see that during these times of freezing, I am indicating packets to Windows but Windows is not returning them for a long time. They are always returned eventually though.

I have encountered this before with small packets and resolved it by always indicating such packets with STATUS_RESOURCES, but I am reluctant to do this for all packets unless it is the only way to fix this. That reluctance isn’t based on any actual performance testing, though, which I have yet to do.

What could cause this? The packets are all TCP (and I think all SMB/CIFS), and all have only a single buffer attached to them. The problem happens rarely, maybe a few times a day. I have not been able to reproduce this problem in my testing environment.

One thing I am not doing is falling back to STATUS_RESOURCES when my supply of packets gets low. Perhaps this is all I need to do. I’ve never encountered a situation where Windows didn’t quickly return packets before, though.

Thanks

James

If you can detect this condition in a debugger and then break on your MiniportReturnPacket entry, you might be able to tell from the call stack what entity was hanging on to the packets because it is likely that the offending entity will be very near the root of the call chain.

And yes, you probably should prefer to indicate low resources but keep making forward progress and not hold or drop indications until packets are returned.
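The low-resources fallback described above can be modelled in plain C roughly as follows. This is only a sketch, not driver code: `rx_pool`, `rx_status` and the `rx_pool_*` functions are hypothetical names, and the watermark value is an assumption to be tuned. In a real NDIS 5.1 miniport the two outcomes would correspond to setting NDIS_STATUS_SUCCESS or NDIS_STATUS_RESOURCES on the packet with NDIS_SET_PACKET_STATUS before calling NdisMIndicateReceivePacket, and the counter would need to be protected by a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NDIS packet status values. */
typedef enum {
    RX_STATUS_SUCCESS,   /* receiver may keep the packet and return it later */
    RX_STATUS_RESOURCES  /* receiver must copy; miniport keeps ownership */
} rx_status;

typedef struct {
    size_t total;          /* receive packets allocated at init */
    size_t free_count;     /* packets currently owned by the miniport */
    size_t low_watermark;  /* at or below this, stop lending packets out */
} rx_pool;

void rx_pool_init(rx_pool *p, size_t total, size_t low_watermark)
{
    /* A watermark of at least 1 guarantees one packet always stays in hand. */
    assert(low_watermark >= 1 && low_watermark <= total);
    p->total = total;
    p->free_count = total;
    p->low_watermark = low_watermark;
}

/* Decide how the next received packet should be indicated. */
rx_status rx_pool_indicate(rx_pool *p)
{
    if (p->free_count <= p->low_watermark) {
        /* Running low: indicate with "resources" so the packet is
           reusable as soon as the indication call returns. */
        return RX_STATUS_RESOURCES;
    }
    p->free_count--;  /* packet is lent out until the return callback */
    return RX_STATUS_SUCCESS;
}

/* Called from the return-packet path when a lent packet comes back. */
void rx_pool_return(rx_pool *p)
{
    assert(p->free_count < p->total);
    p->free_count++;
}
```

With a pool of 4 and a watermark of 2, the first two indications lend packets out and the third falls back to the resources path, so the driver keeps making forward progress even if nothing is returned for a while.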

Do you get a bunch returned all at once or do they trickle back?

Good Luck,
Dave Cattley

> If you can detect this condition in a debugger and then break on your
> MiniportReturnPacket entry, you might be able to tell from the call stack
> what entity was hanging on to the packets because it is likely that the
> offending entity will be very near the root of the call chain.

The machine having problems is remote, in production, and not mine, so getting a debugger onto it is likely not possible.

> And yes, you probably should prefer to indicate low resources but keep
> making forward progress and not hold or drop indications until packets are
> returned.

I’m making that change now.

> Do you get a bunch returned all at once or do they trickle back?

I’m DebugPrint-ing stats every 10 seconds, so the granularity is very coarse, but it appears that the packets get stuck all at once, and then returned all at once.
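A finer-grained view than a 10-second snapshot could come from recording when each packet is indicated and measuring the interval when it is returned. The following is a rough sketch in plain C, not actual driver code: `rx_hold_stats`, `RX_SLOTS` and the `rx_stats_*` functions are hypothetical, and the timestamps are whatever the caller supplies (in a driver they might come from something like KeQueryInterruptTime, with a spinlock around the updates).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_SLOTS 64  /* hypothetical number of receive packets */

typedef struct {
    uint64_t indicated_at[RX_SLOTS];  /* time each slot was lent out */
    uint8_t  lent[RX_SLOTS];          /* 1 while the slot is outstanding */
    uint32_t outstanding;             /* packets currently not returned */
    uint32_t max_outstanding;         /* high-water mark of the above */
    uint64_t max_hold;                /* longest indicate-to-return time */
} rx_hold_stats;

void rx_stats_init(rx_hold_stats *s)
{
    memset(s, 0, sizeof(*s));
}

/* Record that the packet in `slot` was indicated at time `now`. */
void rx_stats_on_indicate(rx_hold_stats *s, unsigned slot, uint64_t now)
{
    assert(slot < RX_SLOTS && !s->lent[slot]);
    s->lent[slot] = 1;
    s->indicated_at[slot] = now;
    s->outstanding++;
    if (s->outstanding > s->max_outstanding)
        s->max_outstanding = s->outstanding;
}

/* Record that the packet in `slot` came back at time `now`. */
void rx_stats_on_return(rx_hold_stats *s, unsigned slot, uint64_t now)
{
    assert(slot < RX_SLOTS && s->lent[slot]);
    uint64_t held = now - s->indicated_at[slot];
    if (held > s->max_hold)
        s->max_hold = held;
    s->lent[slot] = 0;
    s->outstanding--;
}
```

Printing `max_hold` and `max_outstanding` alongside the existing periodic stats would show how long the worst packet was held and how many were stuck at once, even when the stall falls between two reporting intervals.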

Thanks for the input

James

> What could cause this?

Well, the very first idea that comes into my head is that of some dumbly-implemented third-party tap/PF/sniffer/analyzer/etc that communicates with the userland. If this “masterpiece” neither returns nor indicates packets to NDIS until it gets authorization from the userland, this is exactly the kind of scenario one would expect if its userland part blocks for a significant time…

Anton Bassov