What driver in the network driver stack fragments the buffers?

Hello.

I have tried / tested it this way:
I have VM1 and VM2, both configured with a NAT network adapter, and an external virtual switch set up on VM1 (it uses the one network adapter available in VM1).

With a test app I made a TCP client send a buffer of 3000 bytes from VM1 to VM2 (the TCP server lies in VM2).

In my virtual switch extension (VM1) I noticed that the packet that arrived on ingress was one NB inside one NBL, the packet size was 3058 bytes (so 58 bytes of headers), and the IPv4 “Don’t Fragment” flag was set.
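
A minimal sketch of how such an ingress check can be done (assuming the Ethernet and IPv4 headers have already been made contiguous, e.g. via NdisGetDataBuffer() on the NET_BUFFER; the function name is made up):

```c
#include <stdint.h>
#include <stddef.h>

#define ETH_HDR_LEN     14
#define ETHERTYPE_IPV4  0x0800
#define IPV4_FLAG_DF    0x4000   /* bit in the flags / fragment-offset field */

/* Given a pointer to a contiguous Ethernet frame, report the IPv4 Total Length
 * and whether the Don't Fragment bit is set. Returns 1 if DF is set, 0 if not,
 * -1 if the frame is too short or not IPv4. */
static int ipv4_df_is_set(const uint8_t *frame, size_t frame_len,
                          uint16_t *total_len_out)
{
    if (frame_len < ETH_HDR_LEN + 20)
        return -1;                                            /* Eth + minimal IPv4 */

    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    uint16_t total_len  = (uint16_t)((ip[2] << 8) | ip[3]);   /* IPv4 Total Length */
    uint16_t flags_frag = (uint16_t)((ip[6] << 8) | ip[7]);   /* flags + fragment offset */

    if (total_len_out)
        *total_len_out = total_len;

    return (flags_frag & IPV4_FLAG_DF) ? 1 : 0;
}
```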

Now I wonder:
a) why did the protocol driver not fragment the buffers (and why did it even set “Don’t Fragment”)?
The NIC is reported in the virtual switch extension to be of type “external”.

b) which driver is responsible for fragmenting buffers, the protocol or the miniport driver?

Thanks!

> a) why did the protocol driver not fragment the buffers (and why did it even set “Don’t Fragment”)?
>
> The NIC is reported in the virtual switch extension to be of type “external”.

As long as the NIC supports jumbo frames, you can pack 3 KB of data into a single transmission. Whether it is going to reach the destination without getting fragmented is another question, and that question will be answered negatively by the very first router that would have to fragment the packet: because DF is set, it drops the packet and sends an ICMP error back, informing the sender that the destination cannot be reached without fragmentation. This is how TCP can discover the minimum MTU on the path to the destination and send data in chunks that do not exceed this size, effectively avoiding IP fragmentation.

In your particular case this is probably exactly what happens: TCP is simply “investigating” the situation and trying to discover the minimum path MTU…
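
Just to illustrate the arithmetic (assuming 20-byte IPv4 and 20-byte TCP headers with no options; the MTU values are examples only):

```c
#include <stdio.h>

/* How a 3000-byte send would be split for a given path MTU. */
int main(void)
{
    const int payload = 3000;
    const int mtus[] = { 1500, 4088, 9000 };     /* a typical MTU and two jumbo sizes */

    for (size_t i = 0; i < sizeof(mtus) / sizeof(mtus[0]); i++) {
        int mss = mtus[i] - 20 - 20;             /* MTU minus IPv4 and TCP headers */
        int segments = (payload + mss - 1) / mss;
        printf("MTU %5d -> MSS %5d -> %d segment(s) for %d bytes\n",
               mtus[i], mss, segments, payload);
    }
    return 0;
}
```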

> b) which driver is responsible for fragmenting buffers, the protocol or the miniport driver?

Well, IP fragmentation in itself is 100% IP’s (i.e. the protocol layer’s) responsibility, and this is what the DF bit in the IP header refers to. The data link layer may theoretically perform its own fragmentation as well, for example if its MTU is less than 576 bytes (the minimum datagram size that every IPv4 host is required to be able to reassemble). However, such fragmentation has to be totally transparent to IP: from IP’s perspective it is simply sending and receiving 576-byte datagrams…
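
To make the IP-layer part concrete, here is a small sketch of the split IP performs when it is allowed to fragment (DF clear): fragment offsets are multiples of 8 bytes, and every fragment except the last carries the More Fragments flag. The 1500-byte MTU and 20-byte header are assumed example values:

```c
#include <stdio.h>

int main(void)
{
    const int mtu = 1500;
    const int ip_hdr = 20;
    const int payload = 3020;                 /* e.g. 3000 bytes of TCP data + 20-byte TCP header */

    int max_frag_data = ((mtu - ip_hdr) / 8) * 8;   /* fragment data must be a multiple of 8 */
    int offset = 0;

    while (offset < payload) {
        int chunk = payload - offset;
        int more = 0;
        if (chunk > max_frag_data) {
            chunk = max_frag_data;
            more = 1;                         /* MF set on all fragments but the last */
        }
        printf("fragment: offset=%4d bytes (field value %3d), data=%4d, MF=%d\n",
               offset, offset / 8, chunk, more);
        offset += chunk;
    }
    return 0;
}
```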

Try to get yourself Volume 1 (“The Protocols”) of Stevens’ classic TCP/IP Illustrated; when it comes to the TCP/IP protocol suite, it is the most authoritative source of information in existence…

Anton Bassov

Thanks!

I have read a bit today about path MTU discovery.
As I understand it, the DF flag is set, the data is sent, and an ICMP error message (“fragmentation needed but don’t-fragment bit set”) may come back, which might contain the MTU of the next-hop interface / router (for TCP, the MSS is also taken into account).

So:
a) if the sender receives an MTU value (in the ICMP message), it re-sends the data in packets no larger than that MTU, again with DF set.
b) if it doesn’t receive an MTU value, it tries an arbitrary smaller size, again with DF set.
c) either way, it keeps failing with “fragmentation needed but DF set” until it finds a packet size that can pass through (a toy sketch of this search follows below).
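
A toy sketch of that search, just to check the logic (the 1400-byte path MTU and the “plateau” table are made-up example values; a real stack probes as a side effect of normal traffic and caches the result per destination):

```c
#include <stdio.h>

static const int real_path_mtu = 1400;        /* unknown to the sender */

/* Stand-in for "send with DF set and see if an ICMP 'fragmentation needed'
 * error comes back": returns 0 if the packet passes, otherwise the next-hop
 * MTU reported in the ICMP error, or -1 if the router reported none. */
static int probe_path(int size, int router_reports_mtu)
{
    if (size <= real_path_mtu)
        return 0;
    return router_reports_mtu ? real_path_mtu : -1;
}

int main(void)
{
    /* Fallback sizes for case (b), when the ICMP error carries no next-hop
     * MTU (RFC 1191 suggests a table of typical "plateau" values like this). */
    static const int plateaus[] = { 1500, 1492, 1280, 1006, 576 };
    const int nplateaus = (int)(sizeof(plateaus) / sizeof(plateaus[0]));
    int router_reports_mtu = 0;               /* set to 1 to model case (a) */
    int idx = 0;
    int estimate = plateaus[idx];

    for (;;) {
        int r = probe_path(estimate, router_reports_mtu);
        if (r == 0) {
            printf("size %d passes: path MTU estimate settles here\n", estimate);
            break;
        }
        if (r > 0) {
            estimate = r;                     /* case (a): use the reported MTU */
            printf("ICMP error reports next-hop MTU %d, retrying\n", estimate);
        } else if (idx + 1 < nplateaus) {
            estimate = plateaus[++idx];       /* case (b): guess a smaller size */
            printf("no MTU in ICMP error, trying %d\n", estimate);
        } else {
            printf("giving up at the smallest plateau (%d)\n", estimate);
            break;
        }
    }
    return 0;
}
```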

And this method is used to avoid having packets fragmented repeatedly, potentially at every router in the path (where each device / router could have a different MTU).

So I understand that, if I want to add an encapsulation over IP, I do not need to concern myself with fragmentation, as it is handled automatically by these mechanisms. Even if a packet fails with an ICMP “fragmentation needed and DF set” error, the protocol driver will try again with smaller packets; in my filter driver they grow in size (because of the encapsulation overhead), reach the interface / router, possibly fail again, the protocol driver tries an even smaller packet size, my filter driver again makes them grow, and eventually they pass through.
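
The arithmetic for the encapsulation case, with example numbers (the 50-byte overhead and the 20-byte header sizes are assumptions, not values from any particular driver):

```c
#include <stdio.h>

/* If a filter adds 'encap_overhead' bytes to every packet, the size TCP can
 * send and still fit the physical MTU shrinks by that amount. */
int main(void)
{
    const int physical_mtu = 1500;
    const int encap_overhead = 50;            /* e.g. an outer header the filter prepends */
    const int ip_hdr = 20, tcp_hdr = 20;

    int effective_mtu = physical_mtu - encap_overhead;
    int effective_mss = effective_mtu - ip_hdr - tcp_hdr;

    printf("physical MTU %d: encapsulated packets must be <= %d,\n"
           "so the TCP payload per segment is at most %d bytes\n",
           physical_mtu, effective_mtu, effective_mss);
    return 0;
}
```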