How to asynchronously create a file

Adrien_de_Croy · May 17, 2015, 8:00pm

Hi all

sorry if this topic is covered in a FAQ somewhere.

We’re going to some effort to restructure our code base to use asynch IO
for all things, and one stumbling block we have hit is that we need to
access lots of files, and CreateFile itself is a synchronous call that
can take a long time to complete (esp over a network path).

Is there any way to asynchronously create a file? E.g. use DeviceIoControl
or something so that we get a pending result and a callback or
completion of some sort that happens when the handle is available?

Thanks

Adrien

anton_bassov · May 17, 2015, 9:10pm

> E.g. use DeviceIoControl or something so that we get a pending result…

Whom are you going to send an IOCTL to and, even more important, what association are you going to use for your call??? Let’s assume your call has failed. How on Earth is the OS supposed to notify you about it??? As long as you have an open file handle there is no problem here whatsoever, because this handle refers to a FILE_OBJECT that provides an association with your target. However, at the time when you try to create/open a file there is no corresponding FILE_OBJECT yet. If you think about it a bit, you will understand why creating/opening a file is different from reads/writes on it as far as asynchronous notification is concerned, at least with the existing API…

Anton Bassov

Adrien_de_Croy · May 17, 2015, 9:37pm

There are plenty of ways commonly used to notify success or failure
after the fact.

When the IRP is completed it has a final status and extra information.
This could include an error value if it’s an error. Even NtCreateFile
has a status block.

But I guess we’d need a handle first to associate the request with the
result.

Is there any way, then, to create/open files the way socket connections
are made, so that the handle instantiation can be synchronous (but
reliably quick) and the association with a file object (slow and/or
unreliable) can be decoupled from it and made asynchronous? Then the
handle can be used to identify the thing that the response is for.

Otherwise we read all these articles about the amazing wonders of async
programming but we have to resort to hacks and over-provisioning of
threads to poorly work around the problems created by blocking
CreateFile calls. So it’s a sham.

Adrien

>—
>NTDEV is sponsored by OSR
>
>Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
>OSR is HIRING!! See http://www.osr.com/careers
>
>For our schedule of WDF, WDM, debugging and other seminars visit:
>http://www.osr.com/seminars
>
>To unsubscribe, visit the List Server section of OSR Online at
>http://www.osronline.com/page.cfm?name=ListServer

anton_bassov · May 17, 2015, 10:24pm

> But I guess we’d need a handle first to associate the request with the result.

Voila! You started getting it…

> Otherwise we read all these articles about the amazing wonders of async programming

You should realize that all these “wonders of async programming” apply only in very specific situations. These are, first, socket IO and, second, direct IO on disk files. As long as you want to use it for buffered IO on disk files, as well as for operations on cached metadata, its practical usefulness is just zero. Although this is true under any OS, it may be particularly true under Windows, because Windows caches files (as well as metadata) on a per-file basis, rather than using a disk buffer cache for file data and VFS caches (namely, inode cache and dentry cache) for metadata, and it also implements more aggressive policies for purging unused cache entries. This implies that the thread is more likely to block if no valid data is found in the cache, which simply defeats the purpose of async IO.

Since file creation is an inherently cacheable operation, you would not get any performance enhancement here even if a specific API for this purpose did exist…

Anton Bassov

Adrien_de_Croy · May 17, 2015, 10:53pm

a problem scenario for us is serving a file (to an HTTP client) from a
cache volume on a network store.

All the socket stuff can be very efficient, but in the end we need to
call CreateFile, and that blocks, which means we need to pass CreateFile
calls to another thread pool so we can release the socket IOCP?

Surely underneath it all, the file creation (or more likely just
opening) which maps to a network call is happening asynchronously, and
being converted to blocking by an upper layer anyway?

Alex_Grig · May 17, 2015, 11:06pm

Have a thread pool dedicated to CreateFile calls. Post a workitem to it when you need to open a file.
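
A minimal sketch of that suggestion, written in Python for brevity rather than Win32 C, so the built-in `open()` stands in for CreateFile. The names `OPEN_POOL` and `open_async` and the pool size of 4 are made up for illustration; the point is only that the blocking open runs as a work item on a pool reserved for opens, and the caller gets a future back immediately.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical pool reserved for blocking open calls only, so a slow open
# (for example on a network path) never ties up the threads servicing sockets.
OPEN_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="open")


def open_async(path: str, mode: str = "rb") -> Future:
    """Queue a blocking open() as a work item and return immediately.

    The Future completes with the file object, or with the OSError
    raised by open() if the call failed.
    """
    return OPEN_POOL.submit(open, path, mode)


def demo() -> bytes:
    # Create a small file to open so the sketch is self-contained.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    try:
        future = open_async(path)
        # The caller is free to do other work here; result() waits only
        # at the point where the handle is actually needed.
        with future.result(timeout=5) as f:
            return f.read()
    finally:
        os.remove(path)
```

A failed open surfaces through the same future (`future.exception()`), which is the "completion with a final status" the original question asks for, only implemented in user mode.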

anton_bassov · May 18, 2015, 1:17am

Well, assuming that you are speaking about a socket() call, I can assure you it is not simple at all.
IIRC, it involves multiple CreateFile(), DeviceIoControl() and CloseHandle() calls with the
\Device\Afd\Endpoint target. All this stuff is done synchronously…

Anton Bassov

Adrien_de_Croy · May 18, 2015, 1:40am

The socket() call, even though synchronous, does not require network access to
complete and is almost always very fast, unlike connect().

CreateFile for a file is like socket() + connect() in one call.
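
To illustrate the split being described, here is a rough Python sketch of the socket side: the handle is created synchronously and cheaply, the connect is started without waiting, and the final status is collected later. The helper names `start_connect` and `finish_connect` are invented for this example, a loopback listener stands in for the remote end, and the readiness check follows POSIX-style behaviour (Windows reports a failed connect through the exceptional set instead).

```python
import errno
import select
import socket


def start_connect(address):
    """Create the socket synchronously, then start a non-blocking connect.

    Returns the socket right away; the connection may still be in progress.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # fast, local
    sock.setblocking(False)
    err = sock.connect_ex(address)  # returns without waiting for the peer
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, "connect failed immediately")
    return sock


def finish_connect(sock, timeout=5.0):
    """Wait for the pending connect and return its final error code (0 = ok)."""
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return errno.ETIMEDOUT
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def demo():
    # A local listener stands in for the remote server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sock = start_connect(listener.getsockname())
    try:
        return finish_connect(sock)
    finally:
        sock.close()
        listener.close()
```

The socket returned by `start_connect` is what ties the later completion to the request. A file open has no such pre-existing handle, which is the asymmetry under discussion.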

anton_bassov · May 18, 2015, 1:45am

> Have a thread pool dedicated to CreateFile calls. Post a workitem to it when you
> need to open a file.

Actually, this is a pretty accurate description of how the GNU/Linux userland C library implements aio.h.
It does not make any use of the async API provided by the Linux kernel (which is ignored by almost everyone in the kernel anyway). In order to make use of it one needs a separate libaio library…

Anton Bassov

Maxim_S_Shatskih · May 18, 2015, 5:32am

>Is there any way to asynchronously create a file?

Pre-Vista - none, not at all.

You need to use your own work items and your own logic to do async create (sync create from kernel’s point of view, but in a work item).

BTW - async IO is NOT faster than sync IO.

The only benefit of async IO is that you need fewer threads -> scalability.

Are you aware of this?

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · May 18, 2015, 5:44am

>rather than using disk buffer cache for file data

I don’t think any modern OS uses this outdated design.

Linux caches file data in physical pages. Same as Windows.

In both OSes, the same pages are used for mmap().

Yes, these OSes differ in how the set of these pages is organized. Windows uses prototype PTE tables, while Linux uses them only for SysV shmem() stuff, and, for mapped files, it uses hash lists of physical page descriptors.

But the general picture is the same. Even the functions are similar - generic_file_write() is the same as FsRtlCopyWrite().

Yes, disk buffers are also used in Linux, for metadata caching, and, due to legacy stuff, the physical page IO is done by attaching the fake buffer descriptors to it, but page cache is not buffer cache. Also, recent Linux versions decoupled “struct buf” from “struct bio” totally.

> and VFS caches (namely, inode cache and dentry cache) for metadata,

This is not 1-to-1 cache of the disk data, and Windows FSDs have the same (though they have no VFS).

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · May 18, 2015, 5:46am

> Surely underneath it all, the file creation (or more likely just
> opening) which maps to a network call is happening asynchronously, and
> being converted to blocking by an upper layer anyway?

Pre-Vista no. MJ_CREATE is inherently synchronous.

Vista+ - at least MJ_CREATE is cancellable, and I think (can be wrong) there was an API for async create.

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · May 18, 2015, 5:48am

> Well, assuming that you are speaking about a socket() call, I can assure you it is not simple at all.
> IIRC, it involves multiple CreateFile(), DeviceIoControl() and CloseHandle() calls with the
> \Device\Afd\Endpoint target. All this stuff is done synchronously…

Yes, but there are also WSAXxx and AcceptEx, which are not compatible with BSD sockets but are async.

BSD sockets API is sync.

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · May 18, 2015, 5:52am

> Actually, this is a pretty accurate description of how GNU/Linux userland C library implements aio.h.

Is it the same as FreeBSD’s aio of 10 years ago, with just plain pathetic completion notifications?

Or did they finally create an analog of IOCP? The Linux kernel has it, under the name of “event queue” or such. Is it exposed to user mode? Is aio using it?

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · May 18, 2015, 6:02am

Also note a Windows limitation: an IRP in progress belongs to the thread that issued it, and is cancelled on thread exit.

So, your thread which is submitting async IO cannot exit until all of its IRPs have completed.

This poses major limitations on the design. One of the possible designs is to have 1 dedicated high-priority thread which only submits IRPs and does nothing else. The completed IRPs are then processed in some IOCP consumer thread (at this moment they do not reference the “master” thread anymore).

This gives you the ability to exit from the “processor” thread, and thus implement smart thread pool management.
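
The shape of that design can be sketched roughly as follows. This is Python, which has no notion of an I/O request being tied to the thread that issued it, so it only shows the structure: the small pool inside `submitter` and the `slow_operation` placeholder are stand-ins for real overlapped I/O, and a plain queue stands in for the IOCP. All names here are invented for the example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_operation(n):
    # Placeholder for a real I/O request.
    return n * n


def submitter(requests, completions):
    """The only thread that issues operations; it stays alive until shutdown."""
    with ThreadPoolExecutor(max_workers=2) as io:
        while True:
            item = requests.get()
            if item is None:  # shutdown sentinel
                break
            future = io.submit(slow_operation, item)
            # Post the outcome to the completion queue when it finishes.
            future.add_done_callback(
                lambda f, key=item: completions.put((key, f.result()))
            )
    # Leaving the with-block waits for operations still in flight.


def processor(completions, results, count):
    """Short-lived consumer: handles `count` completions, then exits."""
    for _ in range(count):
        results.append(completions.get(timeout=5))


def demo():
    requests, completions, results = queue.Queue(), queue.Queue(), []
    master = threading.Thread(target=submitter, args=(requests, completions))
    master.start()
    for n in range(4):
        requests.put(n)
    # Two processors that each exit after two completions; their exit has
    # no effect on work still owned by the submitter thread.
    workers = [
        threading.Thread(target=processor, args=(completions, results, 2))
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)
    master.join()
    return sorted(results)
```

Because only the submitter owns outstanding work, processor threads can be created and retired freely, which is what makes the pool management flexible.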

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

anton_bassov · May 18, 2015, 12:05pm

>> rather than using disk buffer cache for file data
>
> I don’t think any modern OS uses this outdated design.

Stop thinking, Max, and instead start reading books, articles, etc., as well as looking at the code - this will save you quite a few silly statements that you seem to be fond of making…

> Linux caches file data in physical pages. Same as Windows.

Correct. However, since you are so desperate to “think”, then try to do it properly, at least once in a while. Let’s say you want to read a page from the file. First of all, the FSD has to translate it to the disk address. However, after this has been done once, what is the point of going to the FSD every time you want to read or flush this page??? This is what the disk buffer cache is for - the FSD translates a page to its disk address and inserts it into the buffer cache. At this point, the memory management layer can simply check whether a page is in the buffer cache and, if it is, read or flush it using the disk address associated with it (this association was established earlier by the FSD) without ever having to go to the FSD again. This is how normal OSes do things - it is called “disk buffer cache”. Unlike a normal OS, Windows does not have one and instead uses some dubious “Cache Manager” that handles the cache on a per-file basis and goes to the FSD every time it has to read or flush a page, and does so recursively at the request of the same FSD that, in turn, does it at MM’s request…

> In both OSes, the same pages are used for mmap().

True, but, as you can see, it is done differently…

> Linux kernel has it, under the name of “event queue” or such. Is it exposed to user mode?
> Is aio using it?

Not the one that the C library implements - you need libaio for this. The “only” question is who is going to implement it in the kernel - historically, the only one who ever made use of it is /usb/gadget. This API had been around for a decade, had been completely ignored by everyone, and finally seems to have been deprecated (at least kick_iocb() has been gone since version 3.10)…

Anton Bassov

Peter_Viscarola_OSR · May 18, 2015, 12:50pm

Before you beat-up on people, get your own thinking in order.

Well, no. FSDs in general do not translate to DISK relative addresses. At least, not on Windows. They translate to VOLUME relative addresses. There’s a big difference.

Going to the FSD *nominally* each time there’s a read ensures the FSD is “in the loop” for each operation. There’s NOMINAL overhead required to achieve this. After all, on Windows all the FSD does is call CcCopyRead (whether the data is cached or not). So, here you’re talking about a distinction without any meaningful difference.

Peter
OSR
@OSRDrivers

anton_bassov · May 18, 2015, 2:24pm

>Before you beat-up on people, get your own thinking in order.

Oh, come on - you are not about to proceed with psychopharmacology topic again, are you…

> Well, no. FSDs in general do not translate to DISK relative addresses. At
> least, not on Windows.

…while we were speaking about Linux and disk buffer cache that Max claims is “outdated design”…

> They translate to VOLUME relative addresses. There’s a big difference.

You seem to be just desperate to set me on a “Windows bashing frenzy”, right? Indeed, there is a big difference… but only under Windows, because the normal OSes don’t make any, at least as far as mounted FSDs are concerned. Why? Simply because everyone is supposed to do its own job, and the FSD’s job is all about managing the storage of file data on a block device, without having to worry about how this device is actually implemented. There MAY be exceptions to this rule - for example, a layered file system like ZFS may implement both FS and logical volume management…

> Going to the FSD *nominally* each time there’s a read ensures the FSD is “in the
> loop” for each operation. There’s NOMINAL overhead required to achieve this.

OK, I am out of this discussion - allocating an IRP is nominal; obtaining the locks in advance in order to ensure that you don’t deadlock is nominal; maintaining a complex locking hierarchy is nominal; and having to do this throughout the entire FS filter stack is nominal as well…

> After all, on Windows all the FSD does is call CcCopyRead (whether the data is
> cached or not)

…and if it is not in the cache, then look above and see what has to be done (sorry, I forgot - it is all nominal anyway, so there is nothing to worry about)…

Anton Bassov

Maxim_S_Shatskih · May 18, 2015, 4:59pm

> you want to read or flush this page??? This is what disk buffer cache is for - FSD translates a page
> to its disk address and inserts it into a buffer cache.

And then? So, after the page has its buffer heads attached to it, the IO for it does NOT go through the FSD at all?

Amazing.

This means that FSFs on Linux are not only very complex.

This means that FSFs on Linux are impossible.

Also, on-the-fly defrag is impossible.

Also, caching the non-disk FSs (see: SMB client) is a royal PITA. That’s probably why the SMB client on both Linux and FreeBSD is of pathetic quality.

Are you really sure of it? I think that the buffer heads attached to a page are not registered in any global structure and are only used as “IRPs” for disk IO, and are now replaced by “struct bio”. Am I wrong?

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Adrien_de_Croy · May 18, 2015, 5:40pm

Yes, our main concern is scalability.

Thanks

Adrien
