Reusing a NET_BUFFER_LIST in a protocol driver

I’ve been trying to optimize my protocol driver by reusing the NET_BUFFER_LIST allocated by NdisAllocateNetBufferAndNetBufferList(). The documentation for this function says the NBL can be reused by reinitializing it with another MDL chain. I’ve been trying to do this with no success, so there must be a detail I’m missing somewhere. Maybe my Google-fu is weak, but I’ve done a lot of searching for an example and the best I could find was this:
https://community.osr.com/discussion/275123
I don’t like this method because it does a NdisMoveMemory() to update the NBL. I tried it anyway, just to see whether I could reuse the NBL (using memory from SendNetBufferListPool and not freeing it). This causes some kind of memory corruption, and the system eventually crashes in nt!RtlRbRemoveNode. In any case, I don’t see this method as an optimization. There is also a filter driver post here:
https://community.osr.com/discussion/172579/how-to-modify-netbufferlist-in-filterreceivenetbufferlists
This is not what I’m trying to do. I’d like to reuse a NET_BUFFER_LIST that is allocated and owned by my driver.

According to the documentation here:
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/nblapi/nf-nblapi-ndisallocatenetbufferandnetbufferlist
In the remarks section, it says the NBL can be reused by reinitializing it with another MDL chain. This is exactly what I’d like to do but it doesn’t work. Currently I’m doing this to write each packet:

The user provides the network packet in two parts, the header and the payload. Each part gets its own MDL from IoAllocateMdl() and is locked with MmProbeAndLockPages(); the two MDLs are then linked into a single MDL chain that makes up the complete packet. Then we do:
pNetBufferList = NdisAllocateNetBufferAndNetBufferList(
    pOpenContext->SendNetBufferListPool,
    sizeof(NPROT_SEND_NETBUFLIST_RSVD),
    0,
    Packet.pMDL,
    0,
    Packet.size);
Then,
NdisSendNetBufferLists(
    pOpenContext->BindingHandle,
    pNetBufferList,
    NDIS_DEFAULT_PORT_NUMBER,
    wSendFlags);

This works fine, and Packet.pMDL is freed in the SendComplete callback. At this point I’d like to keep pNetBufferList instead of freeing it. On the next call to the driver I’d like to simply update the MDL and data size in the NBL. I suspect this would be a huge optimization. According to the doc, if I were to keep pNetBufferList as pKeptNBL, this should be enough:

NET_BUFFER_FIRST_MDL  (NET_BUFFER_LIST_FIRST_NB(pKeptNBL)) = Packet.pMDL;
NET_BUFFER_CURRENT_MDL(NET_BUFFER_LIST_FIRST_NB(pKeptNBL)) = Packet.pMDL;
NET_BUFFER_DATA_LENGTH(NET_BUFFER_LIST_FIRST_NB(pKeptNBL)) = Packet.size;

I’m not playing with offsets, so they should remain zero. Wouldn’t it be nice if it were that easy? Obviously this does not work or I would not be asking the question. Are there any online examples? A code sample is worth a thousand words, but if someone knows the correct solution please let me know.
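
One thing worth checking is whether the kept NBL still carries state from its previous trip down the stack. The sketch below is a user-mode model in plain C, not NDIS code: the model_* structs and model_nbl_rearm() are hypothetical names that only mirror the fields behind the NDIS accessor macros named in the comments. It shows the idea of resetting the link pointers, status and offsets along with the MDL pointers and data length. Whether this set of fields is sufficient in a real driver is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MDL: one buffer in a singly linked chain. */
typedef struct model_mdl {
    struct model_mdl *next;
    size_t byte_count;
} model_mdl;

/* Hypothetical stand-in for a NET_BUFFER. */
typedef struct model_nb {
    struct model_nb *next;      /* cf. NET_BUFFER_NEXT_NB */
    model_mdl *first_mdl;       /* cf. NET_BUFFER_FIRST_MDL */
    model_mdl *current_mdl;     /* cf. NET_BUFFER_CURRENT_MDL */
    size_t current_mdl_offset;  /* cf. NET_BUFFER_CURRENT_MDL_OFFSET */
    size_t data_offset;         /* cf. NET_BUFFER_DATA_OFFSET */
    size_t data_length;         /* cf. NET_BUFFER_DATA_LENGTH */
} model_nb;

/* Hypothetical stand-in for a NET_BUFFER_LIST. */
typedef struct model_nbl {
    struct model_nbl *next;     /* cf. NET_BUFFER_LIST_NEXT_NBL */
    model_nb *first_nb;         /* cf. NET_BUFFER_LIST_FIRST_NB */
    int status;                 /* cf. NET_BUFFER_LIST_STATUS */
} model_nbl;

/* Total number of bytes described by an MDL chain. */
size_t model_chain_length(const model_mdl *mdl)
{
    size_t total = 0;
    for (; mdl != NULL; mdl = mdl->next)
        total += mdl->byte_count;
    return total;
}

/*
 * Prepare a kept NBL for another send. Besides attaching the new chain and
 * length, clear everything a previous send may have left behind: the link to
 * other NBLs, the completion status, the link to other NBs, and both offsets.
 */
void model_nbl_rearm(model_nbl *nbl, model_mdl *chain, size_t length)
{
    assert(nbl != NULL && nbl->first_nb != NULL && chain != NULL);
    assert(length <= model_chain_length(chain));

    model_nb *nb = nbl->first_nb;

    nbl->next = NULL;            /* do not resend a stale chain of NBLs */
    nbl->status = 0;             /* forget the previous completion status */

    nb->next = NULL;
    nb->first_mdl = chain;
    nb->current_mdl = chain;
    nb->current_mdl_offset = 0;
    nb->data_offset = 0;
    nb->data_length = length;
}
```

In this model, a call such as model_nbl_rearm(&nbl, &header_mdl, 114) leaves the NBL unlinked, with zero offsets and the new chain attached. In the real driver, the SendComplete handler would presumably also have to stop returning the kept NBL to the pool, otherwise the next send would touch freed memory.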

How much are you really saving? Is the cost of preparing for reuse really significantly less than the cost of allocating new?

I’m glad you asked that question. This driver was ported from NDIS5 to NDIS6 not too long ago, and I’ve been getting a lot of complaints that the NDIS6 version is slower. This is actually true when sending one packet at a time. When sending many packets in a list of NBLs, the performance evens out. The only real difference I can see is that the NDIS5 version had pre-allocated NDIS_PACKET structures that it was reusing. I stripped all that out when porting the driver to NDIS6.

I recall seeing somewhere in the MS docs that reusing the NBL improves performance. If I succeed in reusing the NBL, I’ll post the results here.