NDIS filter driver: buffer alignment and non-contiguous header modification

J_M · July 24, 2019, 5:42pm

Hi,
Let me ask you for some help.
I thought NDIS driver writers had to ensure NET_BUFFER 4-byte alignment at the Ethernet-IP border. That is, the 14-byte Ethernet header, if in the same MDL as the IP header, had to begin at an alignment offset of 2, so that the IP header was 4-byte aligned.
But, after reading other threads in this forum, now I think the opposite: NET_BUFFERs passed to FilterSendNetBufferLists/FilterReceiveNetBufferLists can be unaligned.
This seems counterintuitive, because appropriate allocation and use of buffers could prevent that for the sake of performance.
So, if I want to read an IPV4_HEADER, and I want to avoid copies as much as possible, I could write:
NdisAdvanceNetBufferDataStart(NetBuffer, sizeof(ETHERNET_HEADER), FALSE, NULL);
__declspec(align(4)) UINT8 Storage[sizeof(IPV4_HEADER)]; // Only used if the IPv4 header is not contiguous.
UNALIGNED PIPV4_HEADER IPHeader = (UNALIGNED PIPV4_HEADER)NdisGetDataBuffer(NetBuffer, sizeof(IPV4_HEADER), Storage, 1, 0);
Is it right that the “UNALIGNED” macro is mandatory in this case if I don’t want to increase the AlignMultiple value passed to NdisGetDataBuffer?
That would solve the reading part.
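
For readers following along, here is a minimal user-mode sketch of the contiguous-or-copy behavior that the Storage parameter relies on. It is only a model under stated assumptions: `struct segment` and `get_data` are hypothetical stand-ins for an MDL chain and NdisGetDataBuffer, every segment is assumed to be mapped, and the AlignMultiple/AlignOffset parameters are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one MDL in a chain (not an NDIS type). */
struct segment {
    const uint8_t *va;            /* mapped address of this segment's data */
    size_t length;                /* number of bytes in this segment */
    const struct segment *next;   /* next segment, or NULL */
};

/* Models the documented behavior: return a pointer into the chain when the
 * requested bytes are contiguous, otherwise copy them into `storage` and
 * return `storage`. Returns NULL when the data is split and no storage was
 * supplied, or when the chain is too short. */
const void *get_data(const struct segment *seg, size_t offset,
                     size_t bytes_needed, void *storage)
{
    uint8_t *out = storage;
    size_t copied = 0;

    assert(bytes_needed > 0);

    /* Skip whole segments that lie before the requested offset. */
    while (seg != NULL && offset >= seg->length) {
        offset -= seg->length;
        seg = seg->next;
    }
    if (seg == NULL)
        return NULL;                     /* offset is past the end */

    /* Contiguous case: no copy, the caller reads the data in place. */
    if (seg->length - offset >= bytes_needed)
        return seg->va + offset;

    /* Split case: storage is required. */
    if (storage == NULL)
        return NULL;

    for (; seg != NULL && copied < bytes_needed; seg = seg->next) {
        size_t avail = seg->length - offset;
        size_t want = bytes_needed - copied;
        size_t n = want < avail ? want : avail;

        if (n > 0)
            memcpy(out + copied, seg->va + offset, n);
        copied += n;
        offset = 0;                      /* later segments start at byte 0 */
    }
    return copied == bytes_needed ? storage : NULL;
}
```

The point of the model is that a pointer returned in the split case refers to a copy, so writes through it never reach the packet itself.
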
Now imagine the header is not contiguous (Sending Ethernet Frames only prohibits MAC header splitting) and I need to write directly to it. Then, I need access to individual bytes for writing.
I guess using NdisGetDataBuffer with BytesNeeded = 1 can be overkill.
I’m currently using this alternative code to get a pointer to a single byte to write to it:
PUINT8 Byte = (PUINT8)MmGetMdlVirtualAddress(NET_BUFFER_CURRENT_MDL(NetBuffer)) + NET_BUFFER_CURRENT_MDL_OFFSET(NetBuffer);
In fact, I used to read the ETHERNET_HEADER (which is documented to be always contiguous) also this way (casting to PETHERNET_HEADER instead of PUINT8). But I’ve stopped doing it (in favor of NdisGetDataBuffer), because I don’t know if the guarantee is forced by NDIS.
Anyway, I’ve read the documentation for MmGetMdlVirtualAddress and it says:

MmGetMdlVirtualAddress returns a virtual address that is not necessarily valid in the current thread context. Lower-level drivers should not attempt to use the returned virtual address to access memory, particularly user memory space.
My question is: even if that is true in the general case (particularly for user memory space), can I assume in FilterSendNetBufferLists/FilterReceiveNetBufferLists that NET_BUFFERs handed to me are already locked and mapped into kernel virtual memory, and that MmGetMdlVirtualAddress will return a valid virtual address I can directly use without any other measures?
If not, what would be the correct way of writing to a non-contiguous NET_BUFFER?
I know it is inadvisable to modify the original NET_BUFFER instead of modifying a clone, and I actually clone the NET_BUFFER_LIST, but first I need to advance each NET_BUFFER past the TCP header, then clone, and finally retreat the original buffer and the clone by different lengths (the clone will have some additional headers added by the driver). As I want to deal with IP and TCP options differing among the different NET_BUFFERs in a NET_BUFFER_LIST passed to FilterSendNetBufferLists, I’m currently modifying the original buffer (at the first byte after the “advance”) to keep a backup of the length I need to retreat (a complicated trick because I don’t want to allocate context space).
My doubt also comes from the documented fact that NdisGetDataBuffer can return NULL when NB’s buffer can’t be mapped into VA. If that is the case also for filter drivers, buffers could be “not mapped into VA” when passed to the filter driver, making MmGetMdlVirtualAddress useless.
Last question: can I definitely assume NdisRetreatNetBufferDataStart will always return NDIS_STATUS_SUCCESS (so that I can ignore the result) when retreating by an offset previously advanced (at least) by a call to NdisAdvanceNetBufferDataStart? I’m trying to reduce the amount of conditions I check, because they are overwhelming.

Thank you very much.
Best regards.

Jeffrey_Tippet_MSFT · July 26, 2019, 11:37pm

Is it right that the “UNALIGNED” macro is mandatory in this case if I don’t want to increase the AlignMultiple value passed to NdisGetDataBuffer?

So, on paper, UNALIGNED is required. But all 4 processor architectures that Windows currently ships on don’t implement unaligned access in software; they rely on either processor microcode or kernel trap handlers to fix up unaligned access. You’ll notice that our compiler toolchain, by default, generates code that optimistically assumes the data is aligned. (Unless you pass the under-documented -QRunaligned- flag.)

So in practice, you’ll see people forgetting about UNALIGNED, since no current platform will nag you about it. I’d still recommend including it anyway, since (a) it’s good practice for systems engineers to think about alignment, and (b) you never know what the next processor architecture will look like, and (c) it’s sort of handy to document where the weird stuff happens in your code.
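
If you would rather not depend on the UNALIGNED annotation at all, a portable alternative (a different technique, not what the macro does) is to assemble multi-byte fields from individual bytes, since single-byte loads have no alignment requirement. A minimal sketch in plain C, where `read_be32_unaligned` is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit field stored in network byte order (big-endian) from an
 * address with no particular alignment. Each access is a one-byte load,
 * so this is safe regardless of how the header is placed in memory. */
uint32_t read_be32_unaligned(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For example, pointing it at the source-address field of an IPv4 header yields the address as a host-order integer, whatever the header's alignment.
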

In fact, I used to read the ETHERNET_HEADER (which is documented to be always contiguous) also this way (casting to PETHERNET_HEADER instead of PUINT8). But I’ve stopped doing it (in favor of NdisGetDataBuffer), because I don’t know if the guarantee is forced by NDIS.

NDIS6 doesn’t have any alignment requirements for the packet payload. Microsoft encourages NIC vendors to align the start of the IP header on a 4-byte boundary (which requires 2 mod 4 alignment of any outer 14-byte Ethernet header), but I think a couple years ago, our TCPIP stack reduced the penalty for an unaligned IP header, to where it doesn’t matter as much.
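
The "2 mod 4" arithmetic can be checked with a few lines of ordinary C. This is just an illustration of the numbers, with hypothetical function names, and it assumes a plain 14-byte Ethernet header with no VLAN tag:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14u  /* destination MAC + source MAC + EtherType */

/* Returns nonzero if an IP header that directly follows a 14-byte Ethernet
 * header starting at frame_start would sit on a 4-byte boundary. */
int ip_header_is_4_byte_aligned(uintptr_t frame_start)
{
    return ((frame_start + ETH_HEADER_LEN) % 4u) == 0;
}

/* Returns the first address at or after buffer_start where the frame can
 * begin so that the IP header ends up 4-byte aligned. */
uintptr_t first_frame_start_with_aligned_ip(uintptr_t buffer_start)
{
    uintptr_t start = buffer_start;

    while (!ip_header_is_4_byte_aligned(start))
        start++;

    /* 14 mod 4 == 2, so the frame itself must start at 2 mod 4. */
    assert(start % 4u == 2u);
    return start;
}
```

So a frame placed at the start of a 4-byte-aligned buffer has a misaligned IP header, while the same frame shifted by 2 bytes does not.
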

can I assume in FilterSendNetBufferLists/FilterReceiveNetBufferLists that NET_BUFFERs handed to me are already locked and mapped into kernel virtual memory, and that MmGetMdlVirtualAddress will return a valid virtual address I can directly use without any other measures?

Nah, you shouldn’t assume that. NDIS6 doesn’t force it, and I think I heard anecdotally once about a driver that didn’t map the MDLs into kernel VA. (NetAdapter makes it explicit whether you want the payload mapped into VA, PA, MDLs, and/or SGLs. So a NetAdapter based NIC driver can rely on getting the mappings it wants, and the OS doesn’t need to pay for mappings that aren’t needed.)

Use MmGetSystemAddressForMdlSafe. And the good news is that that routine is super efficient if the MDL is already mapped (which it would be 99.99% of the time).
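
To show the shape of the logic for writing a single byte into a non-contiguous buffer, here is a user-mode sketch. It is not driver code: `struct segment` is a hypothetical stand-in for an MDL, and a NULL `va` models the case where MmGetSystemAddressForMdlSafe returns NULL. In a real filter you would walk the actual MDL chain and make that call on each MDL you touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MDL in a NET_BUFFER's chain. */
struct segment {
    uint8_t *va;            /* NULL models "mapping into system VA failed" */
    size_t length;          /* bytes described by this segment */
    struct segment *next;   /* next segment in the chain, or NULL */
};

/* Returns a pointer to the byte at logical `offset` from the start of the
 * chain, or NULL if the offset is past the end of the data or the segment
 * that holds the byte has no mapping. */
uint8_t *chain_byte_at(struct segment *seg, size_t offset)
{
    for (; seg != NULL; seg = seg->next) {
        if (offset < seg->length)
            return seg->va != NULL ? seg->va + offset : NULL;
        offset -= seg->length;   /* the byte lives in a later segment */
    }
    return NULL;
}

/* Writes one byte at a logical offset. Returns 1 on success and 0 if the
 * byte cannot be reached, in which case nothing is modified. */
int chain_write_byte(struct segment *chain, size_t offset, uint8_t value)
{
    uint8_t *p;

    assert(chain != NULL);
    p = chain_byte_at(chain, offset);
    if (p == NULL)
        return 0;
    *p = value;
    return 1;
}
```

The design choice worth copying is that the mapping check and the bounds check happen in one place, so every caller gets the NULL handling for free.
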

I’m trying to reduce the amount of conditions I check, because they are overwhelming.

Understood – there are a lot. One trick you can play is you can move all this to a pre-processing step. Before your FilterSendNetBufferLists / FilterReceiveNetBufferLists does anything fancy, run through the NBL chain and ensure everything is mapped to kernel VA and any other sanity / preconditions are met. If you don’t like an NBL, it’s much easier to drop it early than to wait until you’ve acquired some locks and buffers and are in the middle of editing it.
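
A rough user-mode sketch of that pre-processing pass, under loud assumptions: `struct packet` and `struct segment` are hypothetical stand-ins for an NBL and its MDLs, a NULL `va` models an MDL that could not be mapped, and the only other precondition checked is a minimum length. A real filter would split the NBL chain the same way and complete or return the rejected list before doing any editing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for an MDL and an NBL (not NDIS types). */
struct segment { const uint8_t *va; size_t length; struct segment *next; };
struct packet  { struct segment *first; struct packet *next; };

/* Returns 1 if every segment is mapped and the packet carries at least
 * min_bytes of data, 0 otherwise. */
int packet_passes_precheck(const struct packet *pkt, size_t min_bytes)
{
    const struct segment *seg;
    size_t total = 0;

    for (seg = pkt->first; seg != NULL; seg = seg->next) {
        if (seg->va == NULL)
            return 0;               /* unmapped: reject before editing */
        total += seg->length;
    }
    return total >= min_bytes;
}

/* Splits `list` into accepted and rejected lists, preserving order.
 * Returns the number of rejected packets. */
size_t partition_packets(struct packet *list, size_t min_bytes,
                         struct packet **accepted, struct packet **rejected)
{
    struct packet **acc_tail = accepted;
    struct packet **rej_tail = rejected;
    size_t dropped = 0;

    assert(accepted != NULL && rejected != NULL);
    *accepted = NULL;
    *rejected = NULL;

    while (list != NULL) {
        struct packet *next = list->next;

        list->next = NULL;          /* unlink before appending */
        if (packet_passes_precheck(list, min_bytes)) {
            *acc_tail = list;
            acc_tail = &list->next;
        } else {
            *rej_tail = list;
            rej_tail = &list->next;
            dropped++;
        }
        list = next;
    }
    return dropped;
}
```

After this pass, the code that actually edits packets only ever sees the accepted list and can skip the per-packet mapping checks.
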