I/O completion port does not deliver requests in the order they were completed

Hi,

I have developed a simple WDF function driver for a IEEE 1394 device. The function driver uses the Microsoft 1394 bus driver in the FIFO list backing store mode. In this mode the function driver adds backing store to an interlocked single linked list, and the 1394 bus driver retrieves backing store from the list when it receives a packet and returns the packet to the function driver using a callback.

From user mode an application thread uses the driver in the following way:

  1. opens a handle to the device
  2. creates an I/O completion port (IOCP) for the handle with a concurrency count of 1
  3. passes 300 backing store requests (IOCTL, METHOD_IN_DIRECT) to the driver
  4. calls GetQueuedCompletionStatus on the completion port
  5. processes the completed request, and
  6. returns the backing store request to the driver
  7. repeats from step 4

The packets I receive on the address range contain an ascending sequence number. A bus analyser confirms that the packets appear in order on the bus and there are no retransmissions or corrupted packets.

I have tested this implementation with Windows XP Professional Version 2002 SP2 on IBM R51, R60, and T60 laptops. It works fine except on the R60 with a Intel Celeron M 420 CPU @ 1.60 GHz. Here to my surprise the application thread sometimes receive packets out of order from the IOCP!

To further investigate the problem I have introduced the following check in the callback invoked by the bus driver:

  1. acquire a spinlock
  2. check that the received packet is the successor to the previously received packet, otherwise trace the error
  3. complete the associated request
  4. release spinlock

This check confirms that the driver completes packets in order. However, the application thread still sometimes receives packets out of order.

Am I wrong in my understanding of completion ports: threads are queued in LIFO order and completed requests are queued in FIFO order? From my understanding I concluded that if only a single thread received packets from the IOCP then the thread would receive the packets in the order they were completed, thus packets would appear to the application thread in order.

According to my understanding the following happens when my driver completes a request: an APC is queued on the thread that submitted the request, when the thread is scheduled the APC is executed finalizing the completion of the request in the context of the submitting thread in this case queuing a completion packet on the IOCP. When the APC terminates the application thread performs its own work.

Am I doing any obvious mistakes? I thought I had created a text book example on using IOCPs…

xxxxx@hedemand.com wrote:


I have tested this implementation with Windows XP Professional Version 2002 SP2 on IBM R51, R60, and T60 laptops. It works fine except on the R60 with a Intel Celeron M 420 CPU @ 1.60 GHz. Here to my surprise the application thread sometimes receive packets out of order from the IOCP!

To further investigate the problem I have introduced the following check in the callback invoked by the bus driver:

  1. acquire a spinlock
  2. check that the received packet is the successor to the previously received packet, otherwise trace the error
  3. complete the associated request
  4. release spinlock

This check confirms that the driver completes packets in order. However, the application thread still sometimes receives packets out of order.

Am I wrong in my understanding of completion ports: threads are queued in LIFO order and completed requests are queued in FIFO order? From my understanding I concluded that if only a single thread received packets from the IOCP then the thread would receive the packets in the order they were completed, thus packets would appear to the application thread in order.

We actually had a similar experience using I/O completion ports with our
USB driver. We have come to the conclusion that I/O completion ports
are not really designed for hardware. They’re designed for things like
web servers, where you need to handle a bazillion unrelated I/O requests.

The operating system guarantees that GetQueuedCompletionStatus will
return requests to you in the order they were received. However, once
it’s in your completion thread, anything can happen. We had to grab a
lock after the GQCS call, and that was the problem. If three requests
came in all at once, two of them would be waiting for the lock. When
the lock was released, there are NO guarantees about who gets awakened
first.

Our solution was to create only one thread calling
GetQueuedCompletionStatus instead of N+1 threads (where N = number of
CPUs), as recommended in the docs. This solved the serialization
problem, and since we had to grab a lock anyway, it did not impact
performance at all.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hi Tim,

thank you for your reply! I think my design is exactly like the solution you describe: there is only a single application thread that submits all backing store requests and it is the same application thread (and only that thread) that calls GetQueuedCompletionStatus on the IOCP.

I have researched the problem a little more and I am able to reproduce it by opening a big file in notepad. What happens is that the application thread receives the packets in the order …, 2, 3, 7, 4, 5, 6, 8, 9, … Opening a very big file enhances the problem: … 2, 3, 998, 4, 5, 6, … 997, 999, 1000, … keep in mind that the driver received and completed the packets in order!

The application thread must receive the incoming packets in order, so an IOCP seems like a nice FIFO for receiving incoming packets. I cannot believe IOCPs are “broken” like this.

Br,
Thomas Nielsen

From MSDN:

When you perform an input/output operation with a file handle
that has an associated input/output completion port, the I/O system
sends a completion notification packet to the completion port when the
I/O operation completes. The completion port places the completion
packet in a first-in-first-out queue. The
GetQueuedCompletionStatus function retrieves these queued completion
packets.

Notice the wording in the first sentence; it says “when the I/O
operation completes”. So remember an IO could be completed in ANY order
with respect to other IO and placed in the queue. Once in the queue, it
is guaranteed to be FIFO, but FIFO with respect to the order of
completion.

-Jeff

Hi Jeff,

I am aware of these remarks to the GetQueuedCompletionStatus. In the design I have tried to synchronize the completion of the backing store requests in the following way to ensure that packets were received from the IOCP in order.

The IEEE 1394 bus driver returns a received packet to my driver using a callback registerred by my driver. This callback first acquires a spinlock which is only associated with the receiving address range, then it checks if the packet is the successor to the previous packet, if it was not an error trace is issued, then WdfRequestComplete is called and finally the spinlock is released. Thus the order of a received packet is checked and the packet is completed in a mutual exclusive way. This marks the completion of the backing store request from the perspective of my driver. I get no error traces, so I conclude that the driver completes the backing store requests in the expected order.

I learned from Russinovich’s Windows Internals that there is more to the completion of an I/O request: an APC is queued to the APC queue of the thread that submitted the request, and the APC is executed performing the work of the I/O manager when the thread is again scheduled. I am assuming the APC queue is a FIFO queue.

Since I am submitting the backing store requests from the only thread that calls GetQueuedCompletionStatus on the IOCP, and since the driver completes the the requests in order in a mutual exclusive way, then I would expect the thread to receive the packets in order from the IOCP.

What is breaking this argument?

Br,
Thomas Nielsen

I would assume that this

>Celeron M 420 CPU @ 1.60 GHz. Here to my surprise the application thread

sometimes receive packets out of order from the IOCP!

This is normal. IRP completion order is not guaranteed.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> The operating system guarantees that GetQueuedCompletionStatus will

return requests to you in the order they were received.

Yes, but “the order they were received” is a volatile thing in the kernel,
since IoCompleteRequest ordering is not guaranteed.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> I am aware of these remarks to the GetQueuedCompletionStatus. In the design I

have tried to synchronize the completion of the backing store requests in the

I would add the serial number field in the IOCTL’s packet structure and
increment it in the driver on each new callback from 1394bus. Then assemble the
packets in proper order in user mode.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

OK, so the I/O request completion order is nondeterministic in general. Is there no special cases in which the completion order is deterministic?

The two step completion process described by Russinovich’s Windows Internals:

  1. driver calls WdfRequestComplete, an an APC is queued (in FIFO manner) on the requesting thread, and
  2. The queued APC executes the last step in the I/O Manager’s completion of the request inthe context of the requesting thread
    gave me the impression that the requesting thread would experience a completion order identical to the order in which the driver called WdfRequestComplete.

What is the I/O manager doing, or what conditions does it meet during its competion part that will result in a nondeterministic completion order?

Br,
Thomas Nielsen

So the general conclusion is that if a user thread passes backing store requests to a driver, and the order of the received data is important, then the thread must itself arrange the completed requests into the correct order. This places requirements on the interface to the driver and will most likely require the implementation of a user mode driver API to hide the ugly details of this task.

When there’s only one outstanding request at a time? :slight_smile:

There probably are a number of conditions under which deterministic I/O completion order could happen, but that’s like saying a system can be realtime some of the time. Given that you’re completing requests while holding a spinlock (a no-no generally) such that you should only be putting one request into the completion port’s queue at a time, I’ll admit I’m surprised you’re not seeing a deterministic order. I’m assuming you only have one thread servicing the completion port (you are using a completion port I seem to recall - if not then you’re probably never going to be deterministic).

But I don’t believe we’ve done anything to guarantee determinism, and what you’re seeing would seem to back that up, and the only way to figure it out would probably be to put some tracing in the kernel and try to track down why things are going out of order when we don’t promise they’ll be in order anyway.

The I/O manager is designed to be fast and dumb. Guaranteeing completion ordering might be feasible, but I’m sure (like anything) it would come with a cost. It would probably incur that cost for any application regardless of whether it cared about order of completion since I doubt it would be a zero-insertion force style addition. Though it makes your life more painful, it’s better to put the onus for special requirements like this onto the application (or class of applications) that are introducing the requirement rather than making everyone pay the cost.

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@hedemand.com
Sent: Thursday, February 28, 2008 11:35 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] I/O completion port does not deliver requests in the order they were completed

OK, so the I/O request completion order is nondeterministic in general. Is there no special cases in which the completion order is deterministic?

The two step completion process described by Russinovich’s Windows Internals:

  1. driver calls WdfRequestComplete, an an APC is queued (in FIFO manner) on the requesting thread, and
  2. The queued APC executes the last step in the I/O Manager’s completion of the request inthe context of the requesting thread
    gave me the impression that the requesting thread would experience a completion order identical to the order in which the driver called WdfRequestComplete.

What is the I/O manager doing, or what conditions does it meet during its competion part that will result in a nondeterministic completion order?

Br,
Thomas Nielsen


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer