FILE_FLAG_OVERLAPPED - &OVERLAPPED

Hello all.

Digging in kernel32.dll, I found that when NULL is passed as the address of the OVERLAPPED structure to the Read/Write/IoCtl API functions, these functions execute WaitForSingleObject(hFile, …) if they get STATUS_PENDING. From the docs I know that passing NULL for OVERLAPPED is forbidden on handles opened for asynchronous I/O. Then why do these functions wait? Just to guard against programmers’ mistakes?


Thanks in advance,
Mikae.

>Then why these functions execute Wait?

Because they don’t know how the handle was opened.

On 16-May-2012 13:52, xxxxx@yahoo.com wrote:

> Hello all.

Hello, Mikae :)

> Digging in kernel32.dll I found that when passing NULL as address of OVERLAPPED structure in Read/Write/IoCtl API functions these functions execute WaitForSingleObject(hFile, …) in the case of STATUS_PENDING. From the docs I know that it is forbidden to pass NULL for OVERLAPPED on asynchronously opened handles. Then why these functions execute Wait? Just to prevent programmers’ mistakes?

Passing NULL in this case is Undefined Behavior. How it is implemented
(or patched), may vary in each specific case.

– pa

In this case, if there are multiple overlapped I/Os pending, the wait on the hFile will be satisfied when the first I/O completes, which is not necessarily the I/O issued on that thread.

d
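
The point about the handle being signaled by whichever I/O finishes first can be sketched with a toy model. This is portable C, not the Win32 or kernel implementation; every name in it (model_handle, model_request, and so on) is made up for the illustration, and the "event" is just a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one event per "handle", shared by every request on it. */
typedef struct {
    bool signaled;          /* what a wait on the handle would observe */
} model_handle;

typedef struct {
    model_handle *handle;
    bool done;              /* true once this particular request completed */
} model_request;

/* Issue a request: the shared event is reset and the request is pending. */
static void model_issue(model_handle *h, model_request *r)
{
    assert(h != NULL && r != NULL);
    r->handle = h;
    r->done = false;
    h->signaled = false;
}

/* Complete a request: mark it done and signal the shared event. */
static void model_complete(model_request *r)
{
    assert(r != NULL && r->handle != NULL);
    r->done = true;
    r->handle->signaled = true;
}

/* Would a wait on the handle return now? It cannot tell requests apart. */
static bool model_wait_satisfied(const model_handle *h)
{
    return h->signaled;
}

/* Scenario: A and B are pending on one handle and B finishes first.
   Returns true if a waiter for A would wake while A is still pending. */
static bool model_premature_wakeup(void)
{
    model_handle h = { false };
    model_request a, b;
    model_issue(&h, &a);
    model_issue(&h, &b);
    model_complete(&b);
    return model_wait_satisfied(&h) && !a.done;
}
```

In the model, the waiter for A wakes up as soon as B completes, which is why a per-request event (or a completion port) is needed once more than one operation can be pending on the handle.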

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Alex, I don’t think they need to know, since IopSynchronousServiceTail already executes Wait(FileObject->Event, …) for a synchronous operation when the lower driver returns STATUS_PENDING. ReadFile doesn’t know how the file was opened, but it does check for the presence of an OVERLAPPED structure. It looks like ReadFile tries to be synchronous even when the file was opened asynchronously and the programmer didn’t supply an OVERLAPPED structure.

Well, I have another, more general question. Why is it so important to serialize synchronous I/O operations? When a file is opened in synchronous mode, NtXxxFile serializes access to the file. Could you give a small example where serialization is important?

Thank you.

Hello, Pavel :)

I know that. I just wonder why ReadFile waits. It looks like an attempt to hide the API user’s bug from the API user. I’m not saying whether that is good or bad; I just want to know that I didn’t miss something.

Doron, I didn’t get you. As I see from the WRK, I have to supply an event for an asynchronous operation, and that event is set when the IRP completes. If the operation is synchronous, I don’t supply an event and FileObject->Event is used. To prevent multiple threads from using the same file object’s event simultaneously, there is a file object lock that is acquired for synchronous operations.

It looks consistent. The only things I don’t get are why synchronous operations must be serialized (it is easier to implement but makes I/O slower) and why ReadFile waits on the handle even when the handle was opened for asynchronous I/O (probably to make forgetful programmers’ lives easier, but I’m not sure).

It has nothing to do with making a forgetful programmer’s life easier; it is faster to wait and follow the synchronous contract when there is no OVERLAPPED. Serialization is important because some apps need a programming model in which I/O is serialized. Lots of apps don’t need overlapped I/O; they want a simpler file I/O contract.

>it is faster to wait and follow the synchronous contract when there is no OVERLAPPED.

But for a synchronous operation the wait is already done in kernel mode. There are the following options:

  1. ReadFile (synch): wait in kernel mode.
  2. ReadFile (asynch): provide the event with OVERLAPPED.
  3. ReadFile (asynch): OVERLAPPED not provided. MSDN says not to do that. In this case the wait can be satisfied by my I/O completion or by someone else’s synch I/O completion, since the event is shared and there is no serialization between synch and asynch operations.

> Lots of apps don’t need overlapped io, they want a simpler file io contract

This is what I think too. I just can’t imagine an example of an app that needs serialization.

On 16-May-2012 17:45, xxxxx@yahoo.com wrote:

> I just can’t imagine an example of an app needs serialization.

printf()

-pa

Pavel,

Ok, we have two threads issuing WriteFile(“Hello world!\n”) and WriteFile(“Goodbye world!\n”) simultaneously. In the end it all comes down to IRPs (I am not considering FastIO here). At some point we need serialization, since we have only one output device. But isn’t it better to do it in the driver, so that the lock is finer-grained?

> It looks consistent, the only things I don’t get is why synchronous operation must be serialized (it is easier to implement but makes IO slower)

To guard the lseek pointer (FILE_OBJECT::CurrentByteOffset).


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Hi Mikae
We had a problem some time ago that was traced to a component of Exchange opening a file asynchronously but then not passing in an OVERLAPPED structure. In this unhappy circumstance ReadFile appears to assume the file was opened synchronously, which can have unintended consequences (in our case, once filter manager was added to the equation, the read completed prematurely). There’s an excellent article by Alex that describes this problem at

http://fsfilters.blogspot.com/2011/05/irp-completion-statuspending-and-fltmgr.html

Regards

Mark

Maxim, it guards not only the pointer but also the FO::Event that is used to wait for the I/O. I don’t know how FILE_OBJECT::CurrentByteOffset is used (I didn’t see any operation on it in NtWriteFile, but I am a bit tired). Perhaps CurrentByteOffset could be protected with interlocked operations? I will try to find out later. I just want to point out that CurrentByteOffset is not the only thing under protection.

My question came out of a conversation with someone. I explained to him how synch/asynch works and how FO::Lock prevents multiple synch threads from using the single FO::Event at the same time, and I also mentioned FO::CurrentByteOffset.

The dialogue was like this:
He: Why do they do serialization? They can speed up synch IO without serialization.
Me: In this case you have to add a new event to IRP structure, since you can’t use only one FO::Event. Probably they don’t want to change code that works.
He: So what? MS can do it for the sake of speed.
Me: Ok, apps may also need it to make life easier.
He: Can you give an example?
Me: Hmmm… Have to think about it.

I also got the answer to my previous question: asynch IRPs are added to the thread’s IRP list, and CancelIo uses that list to cancel pending IRPs.

Mark, thank you for the article. I will read it tomorrow. For now, my bet is that the wait was satisfied by some thread completing a synch I/O while your I/O was still in progress. Once someone else’s completing I/O sets FO::Event, you return from ReadFile although your data is not ready yet, so you read the previous contents of the buffer you passed. At some point your IRP completes and the data you requested suddenly arrives when you no longer expect it. I’m not sure I am correct; it is just interesting to model the situation.

Thank you all for interesting conversation.

The default for all file systems is to serialize I/O requests and execute
them synchronously. There’s an MSDN article about the conditions that
force synchronous file I/O, and one of them is a normal open-and-read or
open-and-write scenario (FILE_FLAG_NO_BUFFERING changes this; see my Async
Explorer on my MVP Tips site, http://www.flounder.com/asynchexplorer.htm)
joe

>how FILE_OBJECT::CurrentByteOffset is used (I didn’t see any operation on it during NtWriteFile, but

It takes default offset from it.

And the FSD’s code updates it.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
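
To make the point about the default offset concrete, here is a toy model in portable C of two writes that both rely on a shared current offset. It is a sketch only: model_file and the model_* functions are invented for the illustration and are not how NtWriteFile or any FSD is written. The model splits a write into "take the default offset" and "copy the data and update the offset", then runs the two steps serialized and interleaved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_FILE_SIZE 64

/* Toy file object: a byte buffer plus a shared "current byte offset". */
typedef struct {
    char   data[MODEL_FILE_SIZE];
    size_t current_offset;
} model_file;

/* Step 1 of a write with no explicit offset: take the default offset. */
static size_t model_write_begin(const model_file *f)
{
    return f->current_offset;
}

/* Step 2: copy the data at that offset and advance the shared offset. */
static void model_write_finish(model_file *f, size_t at, const char *s)
{
    size_t n = strlen(s);
    assert(at + n <= MODEL_FILE_SIZE);
    memcpy(f->data + at, s, n);
    f->current_offset = at + n;
}

/* Serialized: each write runs both steps before the next one starts. */
static void model_serialized(model_file *f, const char *a, const char *b)
{
    model_write_finish(f, model_write_begin(f), a);
    model_write_finish(f, model_write_begin(f), b);
}

/* Interleaved: both writes sample the offset before either updates it. */
static void model_interleaved(model_file *f, const char *a, const char *b)
{
    size_t at_a = model_write_begin(f);
    size_t at_b = model_write_begin(f);   /* same value as at_a */
    model_write_finish(f, at_a, a);
    model_write_finish(f, at_b, b);       /* lands on top of a */
}
```

With the serialized path the second string is appended after the first; with the interleaved path the second string overwrites the start of the first and the final offset is wrong. That is the kind of damage the per-file-object serialization prevents for callers that never pass an explicit offset.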

That sounds reasonable, except for the small matter that most applications
actually care about the order in which ‘Hello world’ and ‘Goodbye world’ get
written to the device. Unless each I/O operation (IOOP) is stateless, as in a
UDP server like DNS, the context of one IOOP with respect to another is
important. For applications using sync IO, typically only a single thread
issues IO per handle, but if that is not the case, the synchronization means
that the programmer can expect the IOOPs to complete in the same order that
the UM code issued them. This is largely unhelpful because of context
switches, but at least it is comprehensible to a UM programmer who has no
idea about IRPs or how the IO completes in the kernel.

IMO the one limitation of overlapped IO & IOCP is that it requires the app
to synchronize the calls that pend reads, because there is no way for the app
to determine the sequence in which a particular IOOP was ‘seen’ by the
kernel. The IOCP will dequeue in order, but one instruction later the
thread may be preempted by the thread that dequeues the next completion, and
the sequence is lost. This overhead is still small compared with the
benefit of multiple pending reads for many applications, however; it is more
the lack of elegance that bothers me than anything else.

To me, the great power of the IOCP is that I do NOT need to
synchronize anything; I just handle the completions in the order they are
delivered, with no concern at all for the order in which they were
submitted. Each transaction carries its own state along with it. If you
care about FIFO-ness then you cannot have more than one thread waiting for
the IOCP, but even then, you only get them in the order they appear in the
IOCP, which may not be the order in which they were submitted, or even the
order in which they completed (there was a thread about a year ago
in which a number of experts told me that completion order and IOCP order
are explicitly not guaranteed to be the same!)

To get the full benefit of the IOCP model, you have to assume I/O
completes in some opportunistic order, and therefore the disconnect
between completion order and IOCP order becomes irrelevant.

To carry the state along, embed the OVERLAPPED structure in a
state-carrying structure, e.g.,

    typedef struct {
        OVERLAPPED ovl;
        BYTE buffer[somesize];  // or LPBYTE buffer and allocate it yourself
        DWORD count;            // actual buffer space used
        MYREQUESTORINF inf;     // whatever you want, either individual
                                // fields or some other struct that carries your state
    } MYSTATE, * PMYSTATE;

You pass the address of ovl to your I/O operation, and in your IOCP
receiving thread you do appropriate casts to get it back to a PMYSTATE.
So there is never a need to do any kind of synchronization. If your app
truly requires FIFO responses based on submittal order, redesign your app
to remove this requirement, because we are already told that the
completions are not guaranteed to appear in submission, or even
completion, order.
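
The "appropriate casts" can be sketched as follows. This is a minimal illustration under stated assumptions: REQUEST_STATE and state_from_overlapped are hypothetical names, the fields other than ovl are invented, and the stand-in OVERLAPPED exists only so the sketch compiles off Windows. The OVERLAPPED member is deliberately not placed first, to show the general case where a plain cast would be wrong; on Windows the CONTAINING_RECORD macro does the same pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in so the sketch compiles off Windows; on Windows the real
   OVERLAPPED from <windows.h> is used instead. */
typedef struct {
    unsigned long Internal, InternalHigh, Offset, OffsetHigh;
    void *hEvent;
} OVERLAPPED;
#endif

/* Hypothetical per-request state, shaped like the MYSTATE idea above. */
typedef struct {
    int        request_id;   /* illustrative field, not from the thread */
    OVERLAPPED ovl;          /* deliberately NOT the first member */
    unsigned   bytes_used;
} REQUEST_STATE;

/* Recover the enclosing REQUEST_STATE from the OVERLAPPED pointer that a
   completion hands back, by subtracting the member's offset. */
static REQUEST_STATE *state_from_overlapped(OVERLAPPED *p)
{
    assert(p != NULL);
    return (REQUEST_STATE *)((char *)p - offsetof(REQUEST_STATE, ovl));
}
```

When ovl is the first member, as in MYSTATE, the offset is zero and the arithmetic reduces to a direct cast; using the offset form keeps the code correct if the structure is later rearranged.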

It is important when considering concurrency to handle everything as
asynchronous opportunistic completion events. No good ever comes of
trying to preserve the dead single-thread sequential behavior model. You
have to shift paradigms, big time, or you are doomed.
joe

Joseph, didn’t read the whole article yet, but you say:

“When an I/O operation is started on an asynchronous handle, the handle goes into a non-signaled state. Therefore, when used in the context of a WaitForSingleObject or WaitForMultipleObjects operation, the file handle will become signaled when the I/O operation completes. However, Microsoft actively discourages this technique; it does not generalize if there exists more than one pending I/O operation; the handle would become signaled if any I/O operation completed. Therefore, although this technique is feasible, it is not considered best practice.”

As I understand it, this is not feasible at all, since a wait on the file handle will be satisfied by any I/O completion (the event is shared). Or do you rely on knowing that some lower driver completes operations in the order they were issued to it?

Maxim, thank you, found it in fast FAT WDK sample driver.

NTDEV,

> That sounds reasonable, except for the small matter that most applications
> actually care about which order the ‘Hello world’ and ‘Goodbye world’ get
> written to the device.

With two threads I can’t rely on the order in which ‘Hello world’ and ‘Goodbye world’ get written to the device, even with serialization. Either way I need some code of my own to synchronize the issuing of I/O to the handle.

Perhaps some additional context will help.

If the nature of your IO is that each operation is independent, then
carrying the state with the request is obviously the best choice and, as you
say, frees you from any additional synchronization beyond whatever the
underlying drivers have done. A good example of this would be a DNS server.
Each DNS request is completely contained in a UDP datagram and the response
is completely contained in a single datagram. There is no relationship
whatsoever between any requests and every IO operation can be handled
stateless. The only synchronization needed in this server would be on the
in memory DNS data (ideally a shared reader, single writer lock).

Another example, which is not completely stateless, but can effectively use
this model, is a batched log writer. Here some data to be logged is saved
in a memory buffer and periodically flushed to disk. There is some synch
needed when issuing the IO to ensure the correct file offset for each
operation and switch the incoming data stream to the next buffer, but on
completion, the buffer can be freed or returned to a standby list without
regard to what order it completed or whether there are several others in
progress.

If however, your IO is of a nature that each operation is not independent,
then carrying the state from your application is not sufficient. Consider
the example of a TCP socket server: IO issued on independent connections is
not dependent in any way, but IOOPs on the same socket are. For writes, the
application must use synchronization to enforce the consistency of whatever
protocol it is using, so while there may be multiple pending IOOPs, the
order in which they entered the driver’s write queue must be the same as the
order that the thread(s) intended, so an exclusive lock per socket is needed.
For example, thread A acquires the lock and sends three buffers, then thread
B acquires the lock and sends two. The lock guarantees the order in which
the writes are queued but the order in which the completions are handled is
irrelevant because the completion handler need only free or reuse the
resources for the write. This case is exactly like the log writer above.

It is for reads that the problem I was referring to exists. If the
application uses a single read operation per socket, then there is no
ambiguity with respect to order of operations because all of the IOOPs are
independent. This model works well if there are a large number of low
bandwidth connections, but does not achieve high performance if the
connections are high bandwidth or have bursts of traffic because additional
buffering and copying in KM is required when there is no read pending.
Keeping some number of reads pending on every socket at all times, so that
as soon as the TCP stack receives data it can copy it into a user buffer and
complete the read, greatly improves the performance for reading data on high
bandwidth or ‘bursty’ connections. But as these reads are not all
independent, ambiguity in completion order becomes a problem. Unlike
writes, where the completion routine is simply freeing resources, for reads
the completion routine must start or perform the protocol decode and
whatever action the server should take on receipt of this data. In order
for the application to process the TCP stream, it needs to do so by
gathering data from the buffers in the same order as they were filled in KM.
Because the thread execution order is indeterminate (context switches +
multiple processors), the obvious choice is to add a sequence to each read
(with a scope of the socket) and to use that plus some pointer swapping
magic to reassemble the stream in a way that one of the completion routines
can begin processing (the one that completed the read representing the next
block is usually a good choice, though this is not required necessarily; a
forward progress guarantee is all that is needed).
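
The per-socket sequence idea can be sketched like this. It is a simplified, single-threaded model in portable C, assuming each read was tagged with a sequence number when it was issued; the reassembler type, MAX_PENDING, STREAM_CAP and the function names are all invented for the illustration, and a real server would hand buffers to a protocol decoder rather than copy them into one array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8     /* assumed cap on reads in flight per socket */
#define STREAM_CAP  256   /* assumed size of the in-order output buffer */

typedef struct {
    unsigned    next_seq;             /* next sequence number to deliver */
    bool        ready[MAX_PENDING];   /* slot holds a completed read */
    const char *data[MAX_PENDING];
    size_t      len[MAX_PENDING];
    char        stream[STREAM_CAP];   /* reassembled, in-order bytes */
    size_t      stream_len;
} reassembler;

static void reassembler_init(reassembler *r)
{
    memset(r, 0, sizeof *r);
}

/* Record a completed read under the sequence number it was issued with,
   then deliver every buffer that has become contiguous.
   Returns the number of buffers delivered by this call. */
static unsigned reassembler_complete(reassembler *r, unsigned seq,
                                     const char *buf, size_t n)
{
    unsigned delivered = 0;
    unsigned slot = seq % MAX_PENDING;

    /* The sequence number must fall inside the window of pending reads. */
    assert(seq >= r->next_seq && seq - r->next_seq < MAX_PENDING);
    assert(!r->ready[slot]);
    r->ready[slot] = true;
    r->data[slot] = buf;
    r->len[slot] = n;

    /* Drain in order, starting from the next expected sequence number. */
    while (r->ready[r->next_seq % MAX_PENDING]) {
        slot = r->next_seq % MAX_PENDING;
        assert(r->stream_len + r->len[slot] <= STREAM_CAP);
        memcpy(r->stream + r->stream_len, r->data[slot], r->len[slot]);
        r->stream_len += r->len[slot];
        r->ready[slot] = false;
        r->next_seq++;
        delivered++;
    }
    return delivered;
}
```

Completions that arrive early are parked in their slot; whichever completion fills the gap at next_seq drains everything that is now contiguous, which matches the "forward progress" requirement described above. In a multithreaded server the reassembler itself would need protection per socket, which this sketch leaves out.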

All of this leads me to the problem or limitation or annoyance that I
mentioned in the beginning; that in order for an application to ensure that
these sequence numbers are assigned in the same order as the underlying
drivers queue the reads, it must use a lock to ensure that a read is fully
queued (ReadFile or WSARecv fail with pending or succeed immediately) before
issuing another read whereas if the API returned a per connection, per
direction sequence, then the lock could be avoided. It is this lack of
elegance in this one element of a design paradigm that otherwise has nothing
to object to that has irked me, albeit only moderately, and provoked my
comments.
