ERESOURCE

Hi,
I gather from earlier postings on this list that APCs must be disabled
for as long as an ERESOURCE is acquired. I have the following situation:

I’m trying to break up read/write requests into smaller requests for my
remote file system. A request is taken off an internal queue, one chunk of
the read or write request is sent, and the request is requeued. To preserve
atomicity of the I/O, I need to acquire the FCB ERESOURCE(s) before I send
out the first chunk and hold them until the last chunk is sent (is that right?).
There are the following problems with this:

  1. I have to hold the ERESOURCE while my request is queued. There could be
    long delays till a reply is received for a given read/write chunk.

  2. Only one chunk of a given request is sent when the file system code is
    entered. So if a request is requeued, I end up enabling APCs (by calling
    FsRtlExitFileSystem) just before leaving the FS dispatch point. The
    EResource is still held at this point! Can this still result in a
    deadlock?

I just wanted to know whether this approach is correct and, if not, how this
atomicity is achieved in other remote file systems.

Also, can we skip ERESOURCE objects altogether and just use our own
synchronization scheme? Or does NT require that we use them (they are part of
the required FCB header)?

Enabling and disabling APCs is done in nested fashion. In other words, if
your logic is the following, APCs stay disabled until the outermost exit:

Enter File System (APCs disabled)
Acquire ERESOURCE
Remove Element from list
Send request
Enter File System (APCs disabled)
Acquire ERESOURCE
Process Request
Release ERESOURCE
Exit File System (APCs disabled)
Reached end of list
Release ERESOURCE
Exit File System (APCs enabled)

The way that APC enable/disable works is via a counter field. Each time you
call FsRtlEnterFileSystem (a/k/a KeEnterCriticalRegion) the counter is
decremented. Each time you call FsRtlExitFileSystem (a/k/a
KeLeaveCriticalRegion) it increments that counter field. When the field
reaches zero, the kernel then delivers any pending APCs.
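
The counter behaviour described above can be sketched as a small user-mode model. This is only an illustration under stated assumptions: `ThreadModel`, `enter_file_system`, `exit_file_system` and `queue_apc` are hypothetical names invented for the sketch, not kernel APIs, and the real kernel bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-mode model of one thread's APC-disable counter. */
typedef struct {
    int apc_disable;     /* 0 = APCs enabled, negative = disabled */
    int pending_apcs;    /* APCs queued while delivery was disabled */
    int delivered_apcs;  /* APCs delivered so far */
} ThreadModel;

void deliver_pending(ThreadModel *t) {
    t->delivered_apcs += t->pending_apcs;
    t->pending_apcs = 0;
}

/* Models FsRtlEnterFileSystem: each call decrements the counter. */
void enter_file_system(ThreadModel *t) {
    t->apc_disable--;
}

/* Models FsRtlExitFileSystem: each call increments the counter and,
 * when it returns to zero, pending APCs are delivered. */
void exit_file_system(ThreadModel *t) {
    assert(t->apc_disable < 0);  /* an exit must match an earlier enter */
    t->apc_disable++;
    if (t->apc_disable == 0) {
        deliver_pending(t);
    }
}

/* An APC arriving for the thread: delivered at once only if enabled. */
void queue_apc(ThreadModel *t) {
    t->pending_apcs++;
    if (t->apc_disable == 0) {
        deliver_pending(t);
    }
}

bool apcs_enabled(const ThreadModel *t) {
    return t->apc_disable == 0;
}
```

With two nested enters, an APC queued in the middle stays pending after the inner exit and is delivered only by the outer exit, which matches the "(APCs disabled)" and "(APCs enabled)" annotations in the sequence above.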

You can build your own synchronization primitives, but I think that is
ill-advised. The history of computer science is littered with numerous
incorrect serialization techniques. In addition, Microsoft really does make
their built-in versions both efficient to use as well as to debug.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Jagannath Krishnan [mailto:xxxxx@hotmail.com]
Sent: Tuesday, March 04, 2003 1:16 PM
To: File Systems Developers
Subject: [ntfsd] ERESOURCE


Thanks Tony!
The scheme I was trying to describe is a little different. I’ll use your
method to describe it.

Enter File System (Disabling APCs)
Acquire ERESOURCE
Dequeue request 1, send chunk 1 of request 1, enqueue request 1
Exit File System (enabling APCs)
return

Enter File System (Disabling APCs)
Dequeue request 1, process reply of chunk 1, send chunk 2 of request 1,
enqueue request 1
Exit File System (enabling APCs)
return
.
.
.
Enter File System (Disabling APCs)
Dequeue request 1, process reply of chunk n-1, send chunk n of request 1,
enqueue request 1
Exit File System (enabling APCs)
return

Enter File System (Disabling APCs)
Dequeue request 1, process reply of chunk n
Release ERESOURCE
Exit File System (enabling APCs)
return

Of course, between any two steps, a chunk of some entirely different request
can be sent.

In an earlier posting on this list last month, you described how APCs
need to be disabled while the resource is acquired, to prevent deadlocks.
In the above scheme, APCs are enabled just before exiting file system
code. The resource is held across multiple invocations of the FS code until
an entire request completes. Will this still result in deadlock? Is it
inadvisable for some other reason?

I’m also concerned that I have to hold the resource for so long. Is there
any other, more standard way to provide atomicity of operations in remote
file systems? I notice that SMBMRX in the IFS toolkit doesn’t seem to
use ERESOURCE objects.

Thanks again for your prompt reply!
Jagannath

The first question I asked myself when I started thinking about your post is
the same one you should be asking: what is your ERESOURCE protecting?
There are only some rare circumstances in which I’ve done thread-to-thread
I/O handoffs of lock ownership.

So, is the ERESOURCE protecting the queue or is it protecting the request
structure? If it is protecting the queue, you only need to hold it around
queue/dequeue operations. If it protects the request, you only need to hold
it while using/modifying the request. This latter case would make more
sense in the scheme you describe, but then I don’t think you need the
ERESOURCE at all - the request is a token, and only the owner of the token
can modify it.
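
A minimal sketch of the "lock only the queue, treat the request as a token" idea, written as a user-mode model. Everything here is an assumption made for illustration: `Request`, `RequestQueue`, `process_one` and the `atomic_flag` spin loop are hypothetical stand-ins, and a real driver would use whatever kernel lock suits its queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request: whoever dequeued it owns it exclusively. */
typedef struct Request {
    struct Request *next;
    int chunks_sent;
    int chunks_total;
} Request;

/* Hypothetical queue; the flag stands in for the lock guarding it. */
typedef struct {
    atomic_flag lock;
    Request *head;
    Request *tail;
} RequestQueue;

static void queue_lock(RequestQueue *q) {
    while (atomic_flag_test_and_set(&q->lock)) { /* spin */ }
}

static void queue_unlock(RequestQueue *q) {
    atomic_flag_clear(&q->lock);
}

void queue_init(RequestQueue *q) {
    atomic_flag_clear(&q->lock);
    q->head = NULL;
    q->tail = NULL;
}

/* The lock is held only while the list pointers are touched. */
void enqueue(RequestQueue *q, Request *r) {
    r->next = NULL;
    queue_lock(q);
    if (q->tail != NULL) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    queue_unlock(q);
}

Request *dequeue(RequestQueue *q) {
    queue_lock(q);
    Request *r = q->head;
    if (r != NULL) {
        q->head = r->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
    }
    queue_unlock(q);
    return r;
}

/* One pass: send one chunk with no lock held, requeue if unfinished.
 * Returns 1 if the request finished, 0 if requeued, -1 if queue empty. */
int process_one(RequestQueue *q) {
    Request *r = dequeue(q);
    if (r == NULL) {
        return -1;
    }
    r->chunks_sent++;  /* stands in for sending a chunk; only we hold r */
    if (r->chunks_sent < r->chunks_total) {
        enqueue(q, r);
        return 0;
    }
    return 1;
}
```

No lock is held while a chunk is in flight: the request is off the queue at that point, so no other thread can reach it.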

I hope this makes sense.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Jagannath Krishnan [mailto:xxxxx@hotmail.com]
Sent: Tuesday, March 04, 2003 2:20 PM
To: File Systems Developers
Subject: [ntfsd] RE: ERESOURCE


Thanks again Tony!
I’m actually protecting neither the queue nor the ‘request’ (IRP context)
with the ERESOURCE. I’m hoping to protect the file on ‘disk’ that is being
read/modified by this request. I’m doing this in accordance with, say,
FASTFAT or Nagar’s SFSD, where you protect the FCB using 2 ERESOURCE
objects.

My understanding is that if you have 2 write requests to the same file
(possibly using different handles) you need to serialize them. This is
done using the ERESOURCE objects (main and paging). With the algorithm in
my previous mail, I’m trying to ensure that even though a write request may be
split into multiple chunks, each satisfied separately, the ERESOURCE is
held for the entire duration of write1, so that write2 is not allowed to
proceed until all of write1 completes.

Given this, am I right in concluding that:

  1. I need to serialize 2 requests to the same file for a remote file
    system
  2. ERESOURCE is the best way to achieve this serialization by protecting
    the FCB
  3. I can hold the ERESOURCE for long periods of time (until all chunks of 1
    request complete)
  4. APCs will be re-enabled when exiting the FS code while the ERESOURCE is
    still held, but that’s OK as I’m exiting the FS code right then. (Please
    see my previous mail for the skeleton algorithm.)

Is this correct? Is there a better way to serialize?

Thanks a lot!
Jagannath

Jagannath,

Ah. I think I understand your issue or concern. Fortunately, it isn’t
nearly as hard as you make it.

The key point here is that you need only serialize your own data
structures. It is a uniquely file-systems perspective, perhaps, but we
really don’t care about user data (besides storing bits and retrieving
bits). What we care about is our meta-data and our data structures. So we
need to serialize access to our data. Let the user fend for himself.

In all seriousness, the I/O Manager will take care of most user applications
because they specified FILE_SYNCHRONOUS_IO_NONALERT. They are serialized
and the I/O handled synchronously by the I/O Manager. For applications that
use asynchronous I/O, they need to handle their own synchronization.
Otherwise, they end up in a “last writer wins” situation (even with your
synchronization) so they don’t know the results. An application can use
byte range locks or some other private scheme.

A database engine *relies* upon the ability to perform non-overlapping
asynchronous I/O operations to achieve really good performance. A design
like yours would add nothing that such an engine needs and would at
the same time throttle its performance.

Thus, you probably don’t need that ERESOURCE at all to protect the user data
access. The ERESOURCE in file systems is used to serialize access to data
structures shared between distinct threads - notably the fields of the
FSRTL_ADVANCED_FCB_HEADER (or the now obsolete FSRTL_COMMON_FCB_HEADER).

I hope this helps explain things a little bit better.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Jagannath Krishnan [mailto:xxxxx@hotmail.com]
Sent: Tuesday, March 04, 2003 5:28 PM
To: File Systems Developers
Subject: [ntfsd] RE: ERESOURCE


Thanks a bunch Tony!
That’s what I was hoping to hear :)

Just to be absolutely clear on this, let me restate what I think you are
saying. I understand that the ERESOURCE protects the fields in the FCB.
Also, serializing access to the file on disk can be achieved by serializing
access to the FCB’s fields. When you say that the I/O Manager will serialize 2
writes, does it mean that one write will complete entirely before the
second request is sent to the driver? I may be asking the obvious here…

My worry was that if the driver gets both writes (or indeed, any
combination of 2 conflicting requests on the same file, including
reads/deletes), then because I’m breaking up each request
into chunks, chunks from write1 and write2 may interleave (the numbers 1
and 2 don’t indicate which one originated earlier or anything).

I.e., I’m not worried about enforcing any order on the writes. My only
concern is that the 2 writes shouldn’t overlap as follows in my driver:

chunk1 of write 1
chunk1 of write 2
chunk2 of write 2
chunk2 of write 1

etc.
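
The effect of that ordering can be shown with a tiny user-mode model. It is only a sketch under assumptions: both writes are taken to cover the same 4-byte range, each split into two 2-byte chunks, and `send_chunk` is a hypothetical stand-in for sending one chunk to the server.

```c
#include <assert.h>
#include <string.h>

#define FILE_SIZE  4
#define CHUNK_SIZE 2

/* Hypothetical stand-in for sending one chunk of a write to the server:
 * it just fills that chunk's byte range in an in-memory "file". */
void send_chunk(char *file, int chunk_index, char fill) {
    assert(chunk_index >= 0);
    assert((chunk_index + 1) * CHUNK_SIZE <= FILE_SIZE);
    memset(file + chunk_index * CHUNK_SIZE, fill, CHUNK_SIZE);
}

/* The interleaved order from the list above; write 1 stores 'A' bytes
 * and write 2 stores 'B' bytes over the same range. */
void interleaved_writes(char *file) {
    send_chunk(file, 0, 'A');  /* chunk1 of write 1 */
    send_chunk(file, 0, 'B');  /* chunk1 of write 2 */
    send_chunk(file, 1, 'B');  /* chunk2 of write 2 */
    send_chunk(file, 1, 'A');  /* chunk2 of write 1 */
}

/* The same two writes, one after the other. */
void serialized_writes(char *file) {
    send_chunk(file, 0, 'A');
    send_chunk(file, 1, 'A');
    send_chunk(file, 0, 'B');
    send_chunk(file, 1, 'B');
}
```

In this model the interleaved order leaves the range holding half of each write (B bytes followed by A bytes), while the serialized order leaves the data of whichever write ran last.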

Are you saying that this is not something the FSD needs to worry about? (I
hope so! :> ). Nagar’s book seems to state emphatically that it is the
FSD’s responsibility (page 402), hence my confusion, I guess.

If it is indeed the FSD’s responsibility to enforce this serialization, then
could you please let me know about the 4 conclusions in my last mail?
Especially whether you think there’ll be a deadlock. In this case I plan to
acquire the FCB’s ERESOURCEs in the dispatch points, as the book suggests.
Thus 2 accesses to the same file will result in one of the client threads
calling into our FSD and waiting on this FCB’s ERESOURCE because the other
one has it. This will ensure serialization.

Even if it is not the FSD’s responsibility, then, as you rightly mentioned
in your last mail, the ERESOURCEs will have to be used to protect the FCB’s
fields. In the example above, apart from the fact that user data will
overlap in the file, the 2 client threads that initiate these
writes will both contend for the same FCB and may update the file size and
other values incorrectly. To protect against this, you are again correct
that I may only need to acquire the ERESOURCE for the short time that the
FCB is being accessed, and can release it before a request is requeued
after sending one chunk. I guess my confusion here is that in the
SFSD and FASTFAT code they seem to acquire the ERESOURCE(s)
protecting the FCB before the request is started at the lower level and hold
them until they get the results back, not only while modifying the FCB. This
further seems to indicate that they are trying to prevent 2 simultaneous
requests into the FSD for the same file from mucking with each other.

Best regards,
Jagannath

Synchronous I/O behaviour is guaranteed by the I/O Manager using the
FILE_OBJECT lock (a KEVENT) and a “busy” indicator.

----- Original Message -----
From: “Jagannath Krishnan”
To: “File Systems Developers”
Sent: Wednesday, March 05, 2003 1:41 AM
Subject: [ntfsd] RE: ERESOURCE


Thanks, Dan.
I wasn’t too clear about this in my very last mail, though I mentioned it in
passing earlier. The interleaving problem that I described still seems to
exist for requests on the same on-disk file with different handles, right?
If the file is opened by name twice and then write1 is performed on
handle1 and write2 on handle2, they’ll have different file objects and the
I/O Manager won’t serialize them, right? Which takes me back to my earlier
question about the ERESOURCE and deadlock, given that I’m exiting the FS code
right after enabling APCs but still holding the resource, and whether it
is all necessary in the first place.

Thanks!
Jagannath


Am I the only one worried about Jagannath’s sequence of Enter File System
and Acquire ERESOURCE calls detailed below?

I was under the impression that you should not hold an ERESOURCE after
re-enabling APCs (by calling Exit File System before releasing the
ERESOURCE, as shown below.)

I always thought Tony’s example was the right one:
Enter FS/Acquire ERESOURCE/Release ERESOURCE/Exit FS
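
The difference between the two orderings can be made concrete with a small user-mode bookkeeping model. This is only a sketch: `FsThreadModel` and its functions are hypothetical names invented here, not kernel APIs, and the model merely counts the moments at which a resource is held while APCs are enabled. It does not show whether a deadlock actually follows.

```c
#include <assert.h>

/* Hypothetical per-thread bookkeeping for the rule discussed above. */
typedef struct {
    int apc_disable;     /* 0 = APCs enabled, negative = disabled */
    int resources_held;  /* ERESOURCE acquisitions not yet released */
    int violations;      /* times a resource was held with APCs enabled */
} FsThreadModel;

/* Models Enter FS. */
void fs_enter(FsThreadModel *t) {
    t->apc_disable--;
}

/* Models Exit FS; flags the case where APCs come back on while a
 * resource is still held. */
void fs_exit(FsThreadModel *t) {
    assert(t->apc_disable < 0);
    t->apc_disable++;
    if (t->apc_disable == 0 && t->resources_held > 0) {
        t->violations++;
    }
}

/* Models Acquire ERESOURCE; flags acquisition with APCs enabled. */
void acquire_resource(FsThreadModel *t) {
    if (t->apc_disable == 0) {
        t->violations++;
    }
    t->resources_held++;
}

/* Models Release ERESOURCE. */
void release_resource(FsThreadModel *t) {
    assert(t->resources_held > 0);
    t->resources_held--;
}
```

Running Enter FS/Acquire/Release/Exit FS through this model records no violation, while Enter FS/Acquire/Exit FS records one at the exit, which is the point in the sequence below that the concern is about.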

Carl Appellof

“Jagannath Krishnan” wrote in message
news:xxxxx@ntfsd…
>
> Thanks Tony!
> The scheme I was trying to describe is a little different. I’ll use your
> method to describe it.
>
> Enter File System (Disabling APCs)
> Acquire ERESOURCE
> Deque Request1, send chunk 1 of request 1, enque request1
> Exit File System (enabling APCs)
> return
>
> Enter File System (Disabling APCs)
> Deque Request1, Process reply of chunk1, send chunk 2 of request 1, enque
> request1
> Exit File System (enabling APCs)
> return
> .
> .
> .
> Enter File System (Disabling APCs)
> Deque Request1, process reply of n-1, send chunk n of request 1, enque
> request1
> Exit File System (enabling APCs)
> return
>
> Enter File System (Disabling APCs)
> Deque Request1, process reply of chunk n
> Release ERESOURCE
> Exit File System (enabling APCs)
> return
>

> i.e. Im not worried about enforcing any order on the writes. Im only
> worried that the 2 writes shouldnt overlap as follows in my driver:
>
> chunk1 of write 1
> chunk1 of write 2
> chunk2 of write 2
> chunk2 of write 1

Serializing writes to avoid the above problem is the file system’s
responsibility. A related problem is serializing reads with regard to
writes. Normally the file system does this by acquiring the resource before
sending the request down to the disk driver and releasing it after the
return from IoCallDriver. This obviously doesn’t address the problem of
overlapping asynchronous writes. To address that, FAT uses a special event
embedded in the FCB, OutstandingAsyncEvent. FAT resets (clears) the event
before sending an asynchronous request to the disk driver, signals it in the
completion routine for asynchronous writes, and waits for it in the routine
that acquires the resource before allowing a write or read to proceed. So
when there is an outstanding write request for a file, no other read or
write can proceed. Conversely, if there is an outstanding synchronous write
request, no asynchronous request can start.
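
The event protocol just described can be sketched as a user-mode model. This is an illustration under assumptions, not FAT’s code: `FcbModel` and its functions are hypothetical, the outstanding-write counter is an assumption added so that several asynchronous writes can be in flight, and a real driver would block on the event where this model only returns a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-FCB state for the event protocol. */
typedef struct {
    int  outstanding_async_writes;  /* assumed counter of in-flight writes */
    bool event_signaled;            /* models OutstandingAsyncEvent */
} FcbModel;

void fcb_init(FcbModel *f) {
    f->outstanding_async_writes = 0;
    f->event_signaled = true;  /* nothing outstanding, so nobody waits */
}

/* Called before an asynchronous write is sent to the disk driver:
 * the first outstanding write clears the event. */
void begin_async_write(FcbModel *f) {
    if (f->outstanding_async_writes++ == 0) {
        f->event_signaled = false;
    }
}

/* Called from the completion routine: the last completion signals
 * the event again. */
void complete_async_write(FcbModel *f) {
    assert(f->outstanding_async_writes > 0);
    if (--f->outstanding_async_writes == 0) {
        f->event_signaled = true;
    }
}

/* The acquire path would wait on the event; the model just reports
 * whether a new read or write could proceed right now. */
bool may_start_io(const FcbModel *f) {
    return f->event_signaled;
}
```

In this model a new read or write is held off from the moment the first asynchronous write is issued until the last one completes.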

Alexei.

> that acquires resource to allow proceed with write or read. So when there
> is an outstanding write request for a file no other read or write can
> proceed.

What about paging writes?

IIRC the resource locks in FCB header are used only to protect the
file size, so, writes to the middle of the file are not throttled by
the FCB locks.
PagingIoResource, for instance, is used to guard the truncations only.

Max

> What about paging writes?
>
> IIRC the resource locks in FCB header are used only to protect the
> file size, so, writes to the middle of the file are not throttled by
> the FCB locks.
> PagingIoResource, for instance, is used to guard the truncations only.

A paging write is different from a normal write because it doesn’t change
the logical contents of the file, and so it doesn’t affect the results
returned by normal reads/writes. On the other hand, changing the file size
does affect the results of paging I/O (because you may need to truncate the
write), and so they need to be serialized.
There should be no collisions between paging writes and non-cached I/O from
an application because the cache is supposed to be flushed and purged before
a non-cached write proceeds. The issue with asynchronous processing arises
only in the case of non-cached I/O, when the write is completed
asynchronously by the disk driver after the file system has already returned
control to the application.

Alexei.