NDIS filter driver pausing requirements

Hi,

When reading https://docs.microsoft.com/en-us/windows-hardware/drivers/network/pausing-a-filter-module and https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ndis/nc-ndis-filter_pause, I tend to think that the only way to fulfill some of the requirements is incrementing a shared counter for each NBL passed in calls to NdisFSendNetBufferLists or NdisFIndicateReceiveNetBufferLists from my filter driver and decrementing it for each NBL received back in calls to FilterSendNetBufferListsComplete or FilterReturnNetBufferLists (my handlers) from underlying or overlying drivers. Only when the count is zero can the pausing process be completed (with additional measures to make the check safe and prevent race conditions).

As even interlocked operations aren’t free and every sharing should be avoided (my driver needs to cope with 10 Gbps traffic), I want to be sure I’ve understood the documentation correctly.

Can the algorithm be improved? I’m not only referring to more lightweight ways of performing the counting, that I’m already considering, like having a counter per processor and things like that.

Specifically, when the documentation says “Should not originate any new receive indications”, “Should not originate any new send requests”, is it referring to any calls to NdisFIndicateReceiveNetBufferLists or NdisFSendNetBufferLists from the filter driver? Or only to receive indications and send requests for new packets created genuinely by the filter driver, not those calls resulting from the driver altering/forwarding packets handed out by underlying or overlying drivers? Are all premature pause completion confirmations from the filter driver problematic?

For example, this phrase seems to relax receive rules a bit:

In the Pausing or Paused states, a filter driver should continue to handle OID requests or status indications. The driver should reject calls to its FilterSendNetBufferLists function. The driver can pass on calls to its FilterReceiveNetBufferLists function. However, the driver cannot pass any buffers that it created. The driver must not originate any receive indications or send requests.

It seems to point out that the deeper constraint is not being able to create buffers (allocate) after the filter driver confirms the pause; i.e., extending packets passing through the driver. But not all calls to NdisFIndicateReceiveNetBufferLists seem to be prohibited in paused mode, and there’s still a hope that not all NBLs must be counted.

I’ll appreciate any information that allows me to better assess/alleviate the need for NBL counting to properly implementing filter driver pausing.

Thank you very much.

Best regards.

1 Like

You’ve basically arrived at the correct conclusions.

“Should not originate any new send requests”, is it referring to any calls to NdisFIndicateReceiveNetBufferLists or NdisFSendNetBufferLists from the filter driver?

That’s the mitigation: you aren’t required to refcount NBLs that you just pass through. If you originate* an NBL, you need to refcount that. If you make meaningful* modifications to an NBL, you need to refcount that. But if you just pass through an NBL, you don’t need to refcount it. (Someone else will be refcounting it, since someone originated it.)

  • “originate”: you called NdisAllocateNetBufferList or similar API; it’s your NBL and your responsibility to call NdisFreeNetBufferList on it.

  • “meaningful”: any change that you’d be required to undo on the completion path. So for example, writing to NBL->Scratch or NBL->Next is not meaningful, because you don’t need to undo that. But if you insert an MDL or change the packet length, then you must refcount this NBL, so your filter is still around long enough to undo what you did. Once you’ve undone your changes, you can release the refcount.

The idea is that, once your filter is Paused, it can be detached at any time. So you cannot allow your filter to enter the Paused state until there’s no more work for you to do. “work” here means freeing NBLs that you originated or undoing meaningful modifications that you performed.

Can the algorithm be improved?

You might take a look at EX_RUNDOWN_REFs. They’re not perfect: there’s no way to be notified when the rundown is complete, so you have to have a thread sit and wait for it. But it’s actually just fine to do that wait from within your FilterPause handler, since that thread would block anyway. Ex rundowns are basically the best you can do with a single integer refcount; their implementation uses all the usual tricks to improve performance.

But as you mentioned, there’s only so much you can do with a single integer; the next level of scalability comes from per-processor counts. There’s a rundown for that too: EX_RUNDOWN_REF_CACHE_AWARE. And you can always build your own. (Watch out for false sharing.)

1 Like

Thank you very much, Jeffrey. Your help is always invaluable!

You mention “meaningful modifications” which is another recurring headache for me.
Do I need to undo (in FilterReturnNetBufferLists and FilterSendNetBufferListsComplete) changes that didn’t produce allocations nor change the packet’s length, but altered the packet’s content?
Some examples:

  • UDP packet port modified as the packet traversed the filter driver.
  • IP header’s protocol field modified to bypass normal protocol processing by overlying drivers but retain accessibility through raw sockets.
    My guess is that it depends on whether not undoing the changes could hinder other drivers in their post-processing of operation completions.
    I suppose there’s no way to prevent, let’s say, a miniport from reading the UDP port in MiniportReturnNetBufferLists to make a relevant decision about some aspect of NBL destruction, even if it’s uncommon or inadvisable to depend on packet contents on the completion handler.
    Are there any official constraints or recommendations in this respect?

Best regards.

Yeah, unfortunately there’s a substantial “grey area” where, perhaps, you’re technically required to undo your changes, but where many drivers don’t bother, to improve performance.

Here’s my breakdown on where each datastructure field goes. There’s a bit of wiggle room on some of these. For example, editing packet payload on transmit path is slightly riskier than editing on the receive path, since protocols might do a retransmit and not be aware that the buffer was edited out from under them.

Fields that an NDIS driver MUST NOT write

  • NET_BUFFER_LIST::NdisReserved
  • NET_BUFFER_LIST::Context (use NdisAllocateNetBufferListContext instead)
  • NET_BUFFER::Reserved
  • NET_BUFFER::NdisReserved
  • NET_BUFFER::DataPhysicalAddress
  • NET_BUFFER::SharedMemoryInfo (in the latest version of NDIS)
  • MDL::Size
  • MDL::Process
  • MDL::MdlFlags (MmGetSystemAddressForMdlSafe writes a flag that is permitted)
  • MDL::MappedSystemVa (use MmGetSystemAddressForMdlSafe instead)

Fields that an NDIS driver MUST NOT write, unless that driver allocated the NBL/NB

  • NET_BUFFER_LIST::FirstNetBuffer
  • NET_BUFFER_LIST::ParentNetBufferList
  • NET_BUFFER_LIST::NdisPoolHandle
  • NET_BUFFER_LIST::SourceHandle
  • NET_BUFFER::Next
  • NET_BUFFER::NdisPoolHandle

Fields that an NDIS driver MAY write, but which MUST be restored before completion

  • NET_BUFFER::CurrentMdl
  • NET_BUFFER::CurrentMdlOffset
  • NET_BUFFER::DataLength
  • NET_BUFFER::MdlChain
  • NET_BUFFER::DataOffset
  • MDL::Next
  • MDL::StartVa
  • MDL::ByteCount
  • MDL::ByteOffset

Fields that an NDIS driver MAY write, but which SHOULD be restored before completion
These are the grey area ones. On paper, you should restore these. To get maximum compat/interop, you should restore these. But some drivers do not.

  • NET_BUFFER_LIST::NblFlags
  • NET_BUFFER_LIST::NetBufferListInfo (in general; some exceptions exist)
  • NET_BUFFER::ChecksumBias
  • The packet payload itself

Fields that designated NDIS drivers MAY write, depending on the field (see docs)

  • NET_BUFFER_LIST::ProtocolReserved
  • NET_BUFFER_LIST::MiniportReserved
  • NET_BUFFER_LIST::Flags
  • NET_BUFFER::ProtocolReserved
  • NET_BUFFER::MiniportReserved

Fields that an NDIS driver MAY write, and which carry no obligation to restore before completion

  • NET_BUFFER_LIST::Next (unless the NDIS_RECEIVE_FLAGS_RESOURCES flag is set!)
  • NET_BUFFER_LIST::Scratch
  • NET_BUFFER_LIST::ChildRefCount
  • NET_BUFFER_LIST::Status

@“Jeffrey_Tippet_[MSFT]” said:
Fields that an NDIS driver MUST NOT write

  • NET_BUFFER::SharedMemoryInfo (in the latest version of NDIS)

Can you clarify this, please?
Shouldn’t NET_BUFFER::SharedMemoryInfo be set by miniport drivers making VMQ receive indications?

NDIS 6.20 introduced that for VMQ, but in subsequent versions, the virtualization stack doesn’t use the fancy shared memory provider feature.

But you’re right that I shouldn’t have classified that as “MUST NOT”. That should have been “designated NDIS drivers MAY write”, with the caveat that it is a waste of CPU cycles to write it on current versions of the OS.

@“J._M.” said:
Do I need to undo (in FilterReturnNetBufferLists and FilterSendNetBufferListsComplete) changes that didn’t produce allocations nor change the packet’s length, but altered the packet’s content?
Some examples:

  • UDP packet port modified as the packet traversed the filter driver.
  • IP header’s protocol field modified to bypass normal protocol processing by overlying drivers but retain accessibility through raw sockets.

Sorry to resurrect an old thread, but as someone new to NDIS and starting work on a filter driver, I’m not clear on something. How is it possible to alter a packet without introducing a copy that would need to be freed in the FilterSendNetBufferListsComplete handler? I thought a driver could not write any data into an MDL it didn’t create.

What am I missing, here?

Thanks,
Dave

Sorry to resurrect an old thread

From the community guidelines, Mr. Ruske, which you no doubt read before you posted:

There are a number of other behaviors that are considered rules of “good conduct” for posting here on the forums. These include:

  1. Please do not revive “old” threads. Please not reply to threads where the last post is more than a month old. This is referred to as “necroposting.” If you have a comment/question/issue that’s raised in a “dead” thread, just start a new thread (you can post a link to the old thread if you want) and ask your question there.

Peter

1 Like

Entirely correct, I screwed up, and can now only ask for forgiveness and promise to follow the guideline moving forward. Is it too late to create a new thread (as I should have done), or would that just compound the damage at this point?