Using large MDL with WskSendTo function

Hi All,

In my driver, which is an NDIS miniport driver, I am planning to use Winsock Kernel (WSK) sockets to send data to the network. The driver will receive data from the NDIS port driver and copy it into a ring buffer (32 MB). This ring buffer will be described by a single MDL allocated with IoAllocateMdl().

For socket programming, when I checked the documentation for WskSendTo(), I saw that it takes an MDL, Offset, and Length embedded in the WSK_BUF structure. Can I then use something like this for the WSK_BUF:

WSK_BUF.Mdl = RingBufMDL (One time initialization)
WSK_BUF.Offset = Offset In the RingBufMDL (Will be updated for every call to WskSendTo())
WSK_BUF.Length = Size of the bytes to be sent from the RingBufMDL

For each call to WskSendTo(), I will just update the Offset and Length. Basically, I am trying to avoid calling IoAllocateMdl and IoFreeMdl for each call to WskSendTo(), which takes a WSK_BUF. Please advise me on this.

Have you tried it? What leads you to fear that it wouldn’t work?

BTW, IoAllocateMdl and IoFreeMdl are lightweight functions. To a certain degree, you are guilty of premature optimization here.

Thanks Tim.

I was just thinking that calling IoAllocateMdl and IoFreeMdl for every NDIS packet might be time-consuming overall.

Hello All,

I have one more question: does WskSendTo() send packets sequentially? For example, if I send 10 packets numbered 1 to 10 to the socket using WskSendTo(), will the IoCompletionRoutine() for these packets be called in the same order? As far as I can see, WskSendTo() does not use any internal buffering.

Is this a new kind of network adapter? Most established NIC families have established code bases. Usually, the send path is simple to implement relative to the receive path.

If you send 10 packets from a single thread, they will probably be sent in that order, but the completion handler calls are another matter. Before you decide that this matters, though, what happens if 10 different threads send 1 packet each?

Before you say that this is unreasonable, consider a DNS server. Every single input packet (DNS query) is a separate request, so it is not unreasonable to process them on multiple threads in a thread pool. And that means that on a single socket, multiple threads will read and write concurrently.

That’s just with one socket. Real NICs have to process traffic from possibly hundreds of processes and thousands of sockets.

In the send path, you have the option of stalling the caller if you can’t immediately complete the request. In the receive path, no such option exists. The data can continue to arrive at the wire rate, even when the system memory pressure means that allocations fail. And it is theoretically impossible to pre-allocate enough memory.

Hi MBond2,

This is the problem I am trying to understand. Assume that I have 20 network packets to write to the socket from a single thread. To facilitate sending, I am using a pool of 10 memory buffers. Consider this situation: I write packets 1 to 10 into the 10 available memory buffers. As soon as I fill a buffer, I invoke WskSendTo() to write it to the socket. Assume that the IoCompletion routines (the callbacks from WskSendTo) for these buffers are delayed. I then have to wait for a free buffer before writing packet 11 onwards. I can stall my write operation until a buffer is freed.

Here is the thing I am trying to understand. If the IoCompletion routine is invoked in order for packets 1 to 10, then as soon as I receive the IoCompletion for the 1st packet, I can write the 11th packet’s content into the 1st memory buffer. Basically, I don’t need to check which memory buffer in the pool is free. I can decide this very easily if the IoCompletion routines complete in order.

After packets 1 to 10 are written to the memory buffers:
Wait for first memory buffer free → IoCompletion for packet no 1 → First Memory buffer is free → Write 11th packet
Wait for second memory buffer free → IoCompletion for packet no 2 → Second Memory buffer is free → Write 12th packet
and so on.

Basically, I am trying to avoid calling IoAllocateMdl and IoFreeMdl for each call to WskSendTo(), which takes a WSK_BUF. Please advise me on this.

This works fine. Winsock itself does this for built-in TCP/UDP/RAW sockets.
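To make the one-MDL idea concrete, here is a minimal user-mode sketch of the bookkeeping only. It is not driver code: `send_range`, `ring_cursor`, and `ring_reserve` are hypothetical names, and `send_range` merely stands in for the Offset and Length fields of a WSK_BUF (the Mdl field is left out because it never changes in this scheme). The sketch assumes a single MDL covering the ring, so one offset/length pair cannot describe a datagram that wraps past the end of the ring; it skips the tail instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Offset/Length pair of a WSK_BUF. */
typedef struct {
    uint32_t offset;  /* byte offset into the ring's single MDL */
    size_t   length;  /* number of bytes to send */
} send_range;

typedef struct {
    size_t size;  /* total ring size in bytes */
    size_t head;  /* next byte to hand out; always < size */
} ring_cursor;

/* Reserve `len` contiguous bytes for one datagram.
 * If the datagram would straddle the end of the ring, the unused tail
 * is skipped and the range starts again at offset 0.
 * Returns 0 on success, -1 if len is 0 or larger than the ring. */
static int ring_reserve(ring_cursor *rc, size_t len, send_range *out)
{
    if (len == 0 || len > rc->size)
        return -1;
    assert(rc->head < rc->size);

    if (rc->size - rc->head < len)
        rc->head = 0;                 /* skip the tail, restart at 0 */

    out->offset = (uint32_t)rc->head;
    out->length = len;

    rc->head += len;
    if (rc->head == rc->size)
        rc->head = 0;                 /* exact fit: wrap the cursor */
    return 0;
}
```

Note that this only picks the range. It does not track whether an earlier send still owns those bytes, and a region must not be overwritten until the send that references it has completed, so the sketch would need to be paired with some in-flight tracking.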

I have one more question: does WskSendTo() send packets sequentially?

Assuming you’re sending them all to the same socket, precisely-speaking, the answer is no. But in practice, the answer is yes.

If the datagrams are all going to the same socket, then most of the time, they’ll be processed in order. But it’s possible for a WFP callout or NDIS filter driver to pluck out a single unlucky datagram and delay it. We don’t encourage them to do that, since packet reordering within a socket is usually bad for perf, so most don’t do this. So it’s not completely crazy for your application to require previous transmits to be completed before sending the next, especially if you have enough space in the ring buffer to hide occasional small delays.

If you’re shipping your product on millions of arbitrary machines, you’ll discover an annoying gotcha: on occasion, NIC drivers will completely lose an NBL. This will eventually result in some sort of bugcheck (usually 0x9F), but until then, the system might sort of limp along in a seemingly-normal state. If you rely on completions to come back, you’ll get bitten hard if the NIC driver loses one of your NBLs. Your datapath will seize up, and customers will of course blame you, since your feature is likely to be the first user-visible symptom to appear. To defend against this, I suggest adding some careful counters & bookkeeping to know when a completion is taking an unreasonable time. If you’re only shipping your product on a few machines, or machines with carefully-vetted NIC drivers, this won’t be as much of a problem.
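As a rough illustration of the "counters & bookkeeping" suggestion, here is a small user-mode model. Everything in it is an assumption made for the sketch: the `send_tracker` type, the function names, the 10-slot pool size, and the caller-supplied timestamps (any monotonic unit). A real driver would also need to synchronize access between the send path and the completion routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEND_SLOTS 10   /* illustrative pool size */

typedef struct {
    bool     in_flight[SEND_SLOTS];
    uint64_t sent_at[SEND_SLOTS];   /* timestamp recorded at send time */
    uint64_t total_sent;
    uint64_t total_completed;
} send_tracker;

/* Record that the buffer in `slot` was handed to the send call at `now`. */
static void tracker_on_send(send_tracker *t, int slot, uint64_t now)
{
    assert(slot >= 0 && slot < SEND_SLOTS && !t->in_flight[slot]);
    t->in_flight[slot] = true;
    t->sent_at[slot] = now;
    t->total_sent++;
}

/* Record that the completion for `slot` arrived. */
static void tracker_on_complete(send_tracker *t, int slot)
{
    assert(slot >= 0 && slot < SEND_SLOTS && t->in_flight[slot]);
    t->in_flight[slot] = false;
    t->total_completed++;
}

/* Count sends that have been outstanding for longer than `limit`. */
static int tracker_overdue(const send_tracker *t, uint64_t now, uint64_t limit)
{
    int n = 0;
    for (int i = 0; i < SEND_SLOTS; i++) {
        if (t->in_flight[i] && now - t->sent_at[i] > limit)
            n++;
    }
    return n;
}
```

A periodic check of `tracker_overdue()` (plus the gap between `total_sent` and `total_completed`) would give you something to log or report when a completion never comes back, rather than having the datapath stall silently.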

Also, if you’re still anxious about requiring in-order completion, it’s probably neither too difficult nor bad for perf to add the ability to use buffers out of order. (This is what Winsock does for the built-in TCP/UDP/RAW sockets). You can keep a freelist of idle buffers, and fill the ring with buffer indexes. Or similar schemes. This means that if someone does lose a buffer, your pool of available buffers will shrink a bit, but you won’t completely seize up. You don’t have to do this, but it might help you sleep better.

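One possible shape for the freelist idea, sketched as a single-threaded user-mode model. The names (`buf_pool`, `pool_acquire`, `pool_release`) and the pool size of 10 are hypothetical, and the sketch leaves out locking: in a driver the completion routine can run concurrently with the send path, so the list would need a spin lock or an interlocked list around it.

```c
#include <assert.h>

#define POOL_SIZE 10   /* illustrative: the 10 buffers from the example */

typedef struct {
    int free_idx[POOL_SIZE];  /* stack of idle buffer indexes */
    int free_count;           /* number of valid entries in free_idx */
} buf_pool;

/* Start with every buffer idle; index 0 is handed out first. */
static void pool_init(buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_idx[i] = POOL_SIZE - 1 - i;
    p->free_count = POOL_SIZE;
}

/* Take any idle buffer. Returns its index, or -1 if all are in flight. */
static int pool_acquire(buf_pool *p)
{
    if (p->free_count == 0)
        return -1;
    return p->free_idx[--p->free_count];
}

/* Return a buffer to the pool; called from the completion path,
 * in whatever order the completions happen to arrive. */
static void pool_release(buf_pool *p, int idx)
{
    assert(idx >= 0 && idx < POOL_SIZE);
    assert(p->free_count < POOL_SIZE);
    p->free_idx[p->free_count++] = idx;
}
```

With this layout the sender stalls only when `pool_acquire()` returns -1, and a lost completion costs one buffer instead of blocking every packet queued behind it.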

Thanks Jeffrey for the great input.

Yes, as you mentioned, I will maintain a free list of available buffers instead of relying on in-order packet completion.

Jeffrey has much more direct knowledge than I do, but as a general programming practice, you should never rely on in-order completion of anything that runs on SMP machines with preemptive multitasking. The operations might indeed complete in order, but thread preemption can immediately derail any ordering guarantees unless you run at a high enough IRQL. And if you do, system performance likely suffers unless this is a purpose-built system.