Hey everyone,
I’m trying to nail down an out-of-order issue in my NDIS 6.2 driver’s rather simple send and receive path, and it is probably highlighting a fundamental flaw in my understanding. I found this excellent archived thread and I believe I’m fulfilling the requirements listed by Jeffrey, but the problem remains.
To set the stage, in the simple case I have a 128-core system with 16 transmit queues and 32 receive queues (2 per MSI-X entry). The receive path is easy since it uses RSS: Windows selects the queues and I indicate packets as I receive them, so there’s no real way for me to reorder packets.
In the transmit path, 16 cores are affinitized to the 16 queues via their MSI-X entries, and they handle calling NdisMSendNetBufferListsComplete. For MiniportSendNetBufferLists, I’ve spread the 16 queues evenly across the other 112 cores.
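For anyone trying to picture the mapping, here is a minimal sketch of how 112 sending cores could be spread evenly over 16 queues (7 cores per queue). The CPU numbering is an assumption made for illustration: it treats CPUs 0 to 15 as the MSI-X completion cores and CPUs 16 to 127 as the senders, and `tx_queue_for_cpu` is a hypothetical helper, not an NDIS routine. In a real driver the index would presumably come from something like KeGetCurrentProcessorNumberEx.

```c
#include <assert.h>
#include <stdint.h>

/* Counts taken from the post; the CPU numbering layout is an assumption. */
#define TOTAL_CPUS      128u
#define NUM_TX_QUEUES   16u
#define FIRST_SEND_CPU  NUM_TX_QUEUES                  /* assumed: CPUs 0..15 handle completions */
#define SEND_CPUS       (TOTAL_CPUS - NUM_TX_QUEUES)   /* 112 sending CPUs */
#define CPUS_PER_QUEUE  (SEND_CPUS / NUM_TX_QUEUES)    /* 7 sending CPUs per queue */

/* Hypothetical helper: map a system-wide CPU index to a transmit queue index. */
static uint32_t tx_queue_for_cpu(uint32_t cpu)
{
    assert(cpu < TOTAL_CPUS);

    if (cpu < FIRST_SEND_CPU) {
        /* Completion CPU n is affinitized to queue n. */
        return cpu;
    }

    /* Sending CPUs are grouped in contiguous blocks of CPUS_PER_QUEUE. */
    return (cpu - FIRST_SEND_CPU) / CPUS_PER_QUEUE;
}
```

With this layout every CPU resolves to exactly one queue, which is the property the rest of the post relies on. It says nothing about which CPU a given TCP flow is sent from.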
The logic is simple: when I receive a NET_BUFFER_LIST from MiniportSendNetBufferLists, I find the mapped queue and append that NET_BUFFER_LIST to its work list. The thread then pops off whatever is at the head of the work list and writes it to the NIC. These 16 queues aren’t ordered with respect to each other and can all write to the NIC simultaneously, but from the docs this seems fine since each CPU only ever writes to a single queue.
Going through Jeffrey’s instructions:
If a single call to MiniportSendNetBufferLists has two packets A1 and A2; you will transmit A1 before transmitting A2.
This is true, since a single call to MiniportSendNetBufferLists always lands on the same queue, and writing to the NIC is serialized per queue via spinlock.
If we call you at DISPATCH_LEVEL on CPU X to send packet B1, then later call you again at DISPATCH_LEVEL on CPU X to send packet C1; you will transmit B1 before transmitting C1.
This is true, since we finish with an entire list of NET_BUFFER_LISTs before moving on to the next, and CPUs always write to the same queue.
If we tell you to send packet D1, and you call NdisMSendNetBufferListsComplete(D1), then later we tell you to send packet E1; you will transmit D1 before transmitting E1.
This is true, assuming it refers to packets sent from CPUs that map to the same queue.
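To make the per-queue serialization behind the three answers above concrete, here is a user-space sketch of one queue's work list. Everything in it is a stand-in chosen for illustration: `nbl_t` is a stripped-down substitute for NET_BUFFER_LIST, the `atomic_flag` loop substitutes for a kernel spin lock, and `record_write` substitutes for the actual write to the NIC. It is not my driver code, just the shape of the logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for NET_BUFFER_LIST: a chain link plus an id. */
typedef struct nbl {
    struct nbl *next;
    int id;
} nbl_t;

typedef struct tx_queue {
    atomic_flag lock;   /* stand-in for a kernel spin lock */
    nbl_t *head;
    nbl_t *tail;
} tx_queue_t;

typedef void (*tx_write_fn)(const nbl_t *nbl, void *ctx);

static void tx_queue_init(tx_queue_t *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
}

static void tx_lock(tx_queue_t *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) {
        /* spin until the current holder releases the flag */
    }
}

static void tx_unlock(tx_queue_t *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Append a whole chain (A1 -> A2 -> ...) under one lock acquisition,
 * so packets from a single send call stay adjacent and in order. */
static void tx_enqueue_chain(tx_queue_t *q, nbl_t *chain)
{
    if (chain == NULL) {
        return;
    }
    nbl_t *last = chain;
    while (last->next != NULL) {
        last = last->next;
    }
    tx_lock(q);
    if (q->tail != NULL) {
        q->tail->next = chain;
    } else {
        q->head = chain;
    }
    q->tail = last;
    tx_unlock(q);
}

/* Pop from the head and write while still holding the lock, so the
 * pop order and the write order cannot diverge. Returns packets written. */
static size_t tx_drain(tx_queue_t *q, tx_write_fn write_to_nic, void *ctx)
{
    size_t written = 0;
    tx_lock(q);
    while (q->head != NULL) {
        nbl_t *cur = q->head;
        q->head = cur->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        cur->next = NULL;
        write_to_nic(cur, ctx);
        written++;
    }
    tx_unlock(q);
    return written;
}

/* Illustration-only "NIC": records the order in which ids were written. */
#define LOG_CAP 16
typedef struct tx_log {
    int ids[LOG_CAP];
    size_t count;
} tx_log_t;

static void record_write(const nbl_t *nbl, void *ctx)
{
    tx_log_t *log = ctx;
    assert(log->count < LOG_CAP);
    log->ids[log->count++] = nbl->id;
}
```

One thing the sketch highlights: the ordering only holds within a queue if the pop and the NIC write happen under the same lock hold. If the head were popped under the lock and written after releasing it, two threads draining the same queue could still swap adjacent packets. Nothing here orders packets across different queues.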
I’m somewhat at a loss here – can anyone confirm that it’s fine if the ordering between NET_BUFFER_LISTs on different CPUs isn’t maintained? Or are TCP streams (where ordering matters) always going to hit the same CPU?
Cheers,
David