Let me repeat this question separately.
How justified would it be to keep a filepath->fileID map so that the most
frequently used files can be opened by ID? Would there be any
significant performance improvement?
TIA,
Vladimir
Defragmentation API. For a file system you probably do not need to keep a
filename-to-ID map, since the ID can simply be a disk offset (not required,
but possible).
For a filter, you'd better query.
--
Kind regards, Dejan M. MVP for DDK
http://www.alfasp.com E-mail: xxxxx@alfasp.com
Alfa Transparent File Encryptor - Transparent file encryption services.
Alfa File Protector - File protection and hiding library for Win32
developers.
Alfa File Monitor - File monitoring library for Win32 developers.
Thanks, Dejan!
My question was whether it is feasible to keep a Path->ID map to improve
the performance of creates. That is, when a create arrives, my filter would
try to map the path to an ID and, if a mapping exists, transform the
path-based create into an ID-based create. Assuming a huge number of files
on disk, finding the actual file entry by path is quite expensive. So I
thought that, given that a file ID is essentially the exact location of the
file entry on disk (correct me if I'm wrong), I might get a big performance
benefit by [quickly] transforming path-based creates into ID-based
creates.
That's a general thought, an idea. Before jumping into experiments
(i.e. wasting time), I wanted to kick the tires and see if anyone has
something to say.
Regards,
Vladimir
-----Original Message-----
From: Dejan Maksimovic [mailto:xxxxx@alfasp.com]
Sent: Friday, June 11, 2004 11:23 AM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] Open file by ID
[snip]
Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17
You are currently subscribed to ntfsd as:
xxxxx@borland.com
To unsubscribe send a blank email to xxxxx@lists.osr.com
C:\Long file name 1\Long file name 2\Long file name 3\...\Long file name
100...
That's 2^100 possible path combinations... Would you keep even 1% of them in
memory?
While 2^100 is "unlikely", 2^10 is not. That's 1024 paths for one file :-(
Regards, Dejan.
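The arithmetic behind the 2^100 figure can be sketched as follows. The assumption here, which the post does not spell out, is that each long path component presumably also has a short (8.3) alias, so every component can be written two ways. The function names and the sample alias strings are made up for illustration.

```python
from itertools import product

def count_path_spellings(component_count: int, names_per_component: int = 2) -> int:
    """Number of distinct spellings of one path when each component has that many names."""
    return names_per_component ** component_count

def enumerate_spellings(components):
    """Yield every spelling of a path; `components` is a list of (long_name, short_name) pairs."""
    for choice in product(*components):
        yield "\\".join(choice)

# Hypothetical two-level path with made-up 8.3 aliases.
sample = [("Long file name 1", "LONGFI~1"), ("Long file name 2", "LONGFI~2")]
spellings = list(enumerate_spellings(sample))
print(len(spellings))            # 4
print(count_path_spellings(10))  # 1024
```

A map keyed by the raw path string would need one entry per spelling unless the key is first normalized to a single form.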
Dejan: Think a bit further
Here is what is given:
I ran some perf tests, and here are some stats (in case anybody is
interested).
On a system with a total of 32K files spread across 16 folders (i.e. 2K
files/folder), the average time to open one (random) file is 1.5
milliseconds. On the same system with 1M files spread across 16 folders
(i.e. 64K files/folder), the average time to open one file is 13
milliseconds, and that time grows exponentially with the total number
of files.
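As a rough check on the two data points quoted above (the post gives no others), the growth can be expressed as a scaling exponent. This is a minimal sketch, assuming 32K means 32 * 1024 files and 1M means 1024 * 1024 files; the function name is illustrative.

```python
import math

def scaling_exponent(n_small, t_small, n_large, t_large):
    """Exponent k in t ~ n**k implied by two (file count, open time) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)

files_small, time_small_ms = 32 * 1024, 1.5      # figures quoted in the post
files_large, time_large_ms = 1024 * 1024, 13.0

k = scaling_exponent(files_small, time_small_ms, files_large, time_large_ms)
print(f"files x{files_large // files_small}, "
      f"time x{time_large_ms / time_small_ms:.1f}, exponent {k:.2f}")
```

Two points cannot establish a trend on their own; more measurements at intermediate file counts would be needed to characterize how the open time really grows.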
-----Original Message-----
From: Dejan Maksimovic [mailto:xxxxx@alfasp.com]
Sent: Friday, June 11, 2004 12:54 PM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] Open file by ID
[snip]
Yeah! Now I see your point.
But I still strongly believe that even if
I restrict the path (index) to LFN only, I'm still gonna get my benefits.
At least dental.
Anyway, it looks like I have to build a prototype to test that
concept...
Regards,
Vladimir
Try doing it the way I do filters: add entries (wildcard-based) that say which
files/folders should be monitored/MRUed.
--
Kind regards, Dejan M. MVP for DDK
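One way to read this suggestion is as an include list of wildcard patterns that decides which paths get tracked at all. This is a minimal sketch, assuming case-insensitive matching with `fnmatch`-style patterns. The pattern strings and the `should_track` name are hypothetical and are not taken from any of the products mentioned.

```python
from fnmatch import fnmatchcase

# Hypothetical include list; only paths matching one of these patterns are cached.
INCLUDE_PATTERNS = [
    r"c:\data\hot\*",
    r"c:\logs\*.log",
]

def should_track(path: str, patterns=INCLUDE_PATTERNS) -> bool:
    """Return True if the path matches any wildcard entry (compared case-insensitively)."""
    lowered = path.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)

print(should_track(r"C:\Data\Hot\index.db"))   # True
print(should_track(r"C:\Temp\scratch.tmp"))    # False
```

Note that `*` in `fnmatch` also matches path separators, so the first pattern covers nested subfolders as well. Keeping the list short bounds how many paths the map ever has to hold.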
----- Original Message -----
From: "Vladimir Chtchetkine"
To: "Windows File Systems Devs Interest List"
Sent: Friday, June 11, 2004 12:21 PM
Subject: [ntfsd] Open file by ID
> Let me repeat this Q separately
> How justified would be to have a filepath->fileID map in order to have
> most often used files being opened via ID? Is there going to be any
> significant performance improvement?
If I understand your suggestion correctly, DEC's 16-bit RSX systems did
something like this nearly 3 decades ago, called a "path cache" IIRC. One
reason was because caching entire directories was often too burdensome for
the 16-bit environment with limited physical memory, so a smallish path
cache could eliminate the common path look-ups far more efficiently (and
with a far lower instruction path-length as well).
You should consider whether to depend solely upon the access control
information for the target file to limit access, or emulate the
per-directory controls in the path (which the requestor would encounter
during a normal path traversal) by maintaining each directory's ACL in the
cached entry (and changing it if the actual directory's ACL changes).
Similar considerations apply to removing path entries when a file (or
directory) is moved.
- bill
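The bookkeeping Bill describes can be sketched as a small path cache that is purged whenever a file, or any directory above it, is renamed or has its ACL changed. This is a minimal sketch: `PathIdCache`, its method names, and the sample IDs are illustrative, and how a real filter would learn about renames or ACL changes is left out.

```python
class PathIdCache:
    """Maps normalized paths to file IDs and drops entries made stale by directory changes."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def _norm(path: str) -> str:
        return path.rstrip("\\").lower()

    def remember(self, path: str, file_id: int) -> None:
        self._ids[self._norm(path)] = file_id

    def lookup(self, path: str):
        """Return the cached ID, or None so the caller falls back to a normal path open."""
        return self._ids.get(self._norm(path))

    def invalidate(self, changed_path: str) -> int:
        """Drop the entry for changed_path and everything beneath it; return the count dropped."""
        root = self._norm(changed_path)
        stale = [p for p in self._ids if p == root or p.startswith(root + "\\")]
        for p in stale:
            del self._ids[p]
        return len(stale)


cache = PathIdCache()
cache.remember(r"C:\data\a\one.bin", 0x1001)
cache.remember(r"C:\data\b\two.bin", 0x1002)
cache.invalidate(r"C:\data\a")             # directory renamed or its ACL changed
print(cache.lookup(r"C:\data\a\one.bin"))  # None
print(cache.lookup(r"C:\DATA\b\two.bin"))  # 4098
```

Dropping the whole subtree is the conservative choice: a miss only costs a normal path open, whereas a stale hit would open a file through a path that is no longer valid or no longer permitted.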
To augment Bill Todd's comments as well: in UNIX file systems this is
typically referred to as a "DNLC" cache (Directory Name Lookup Cache).
It is a tremendous win, assuming you can fit within the restrictions of
the DNLC cache itself. I've certainly seen file systems for which this
would yield incorrect results and it can also have unexpected
interaction issues with security.
Restricting it to your own (kernel mode) filter's use is likely to
provide a performance win, but also may expose the system to new
exploits and certainly could cause incorrect behavior when used with
arbitrary file systems. At present, this would only work properly with
NTFS and CDFS, both of which support open by file ID. The SFU Server
(a/k/a "NFS Server") uses open by file ID, as does SFM.
Regards,
Tony
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
Bill, Tony:
Thanks for the replies. It looks like my "separate" question was too broad,
which caused some confusion. When I referred to "file ID", I meant precisely
the NTFS feature; I didn't mean reinventing the wheel. I guess the security
questions related to open by ID are fully addressed by NTFS itself, right?
Unfortunately (and to my surprise), I didn't see much benefit from using a
path-to-ID map and ID-open instead of path-open. I don't know how to explain
that yet, but in the 1M-file environment I've seen just a 2-3% improvement in
performance. Strange...
Regards,
Vladimir
----- Original Message -----
From: "Vladimir Chtchetkine"
To: "Windows File Systems Devs Interest List"
Sent: Friday, June 11, 2004 10:56 PM
Subject: Re: [ntfsd] Open file by ID
> Bill, Tony:
>
> Thanks for the reply. It looks like my "separate" question was too broad
> which made some confusion. When I referred "file ID" I meant precisely
> NTFS's feature, I didn't mean reinventing the wheel. I guess that security
> questions related to open by ID are fully addressed by NTFS itself, right?
Well, only sort-of. If you effectively perform by-ID access when the
application has specified a directory-path access, you are clandestinely
substituting the semantics of by-ID access for path access. In particular,
unless you guard against it, you may allow access via a directory path which
is either no longer valid (the "move" - actually, rename - issue I
mentioned) or no longer legal for the accessor (the ACL issues I mentioned):
while the file remains legally accessible by ID, it should not be legally
accessible via the path.
Now, since traversing a directory path takes non-zero time, this may only
drastically widen the window in which changes in the portion of the path
already traversed in a tree-walk don't affect the balance of the look-up
(though my vague recollection is that NTFS may actually guard against
renaming any path-element used in the access path to a file while the file
is accessed, in which case again you would be changing real system
semantics, albeit subtly).
>
> Unfortunately (and to my surprise) I didn't see much benefit from using
> path-ID map and using ID-open instead of path-open. I don't know how to
> explain that yet but on 1M files environment I've seen just 2-3% improvement
> in performance.
That may just mean that you're not cache-constrained in keeping directories
memory-resident once they've been accessed, and all you're seeing is the
difference in instruction path-length (i.e., no, or minimal, net saving in
disk I/O). Cut down significantly on the physical memory available for
caching (e.g., by increasing other system activity that competes with it)
and the value of the path-cache should become greater.
As Tony mentioned, Unix has done this kind of thing for a long time too.
IIRC Linux, for example, caches individual "dentries" for each
recently-accessed target within a given directory, and thus can walk the
directory path for a recently-accessed file by using the dentries in
succession to resolve each inode ID in the path - which also provides help
with the initial path to other unaccessed targets within the same sub-tree
that contains a recently-accessed target.
- bill
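The per-component caching described above can be sketched as a dictionary keyed by (parent ID, name) and walked one component at a time. This illustrates the idea only and is not Linux's actual dentry cache; all names and ID values are made up.

```python
ROOT_ID = 0

class ComponentCache:
    """Caches (parent_id, name) -> child_id, loosely in the spirit of a dentry cache."""

    def __init__(self, slow_lookup):
        self._entries = {}
        self._slow_lookup = slow_lookup  # stands in for reading the directory from disk
        self.misses = 0

    def resolve(self, path: str) -> int:
        current = ROOT_ID
        for name in filter(None, path.split("/")):
            key = (current, name)
            if key not in self._entries:
                self.misses += 1
                self._entries[key] = self._slow_lookup(current, name)
            current = self._entries[key]
        return current


# Toy directory tree: (parent_id, name) -> id.
TREE = {(0, "usr"): 1, (1, "lib"): 2, (2, "a.so"): 3, (2, "b.so"): 4}
cache = ComponentCache(lambda parent, name: TREE[(parent, name)])

cache.resolve("/usr/lib/a.so")   # three misses: usr, lib, a.so
cache.resolve("/usr/lib/b.so")   # one more miss: only b.so is new
print(cache.misses)              # 4
```

The second lookup reuses the cached `usr` and `lib` entries, which is the "help with the initial path" to other targets in the same sub-tree.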
> If I understand your suggestion correctly, DEC's 16-bit RSX systems did
> something like this nearly 3 decades ago, called a "path cache" IIRC.
So does UNIX, and so do all of NT's file systems internally.
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
> "DNLC" cache (Directory Name Lookup Cache). It is a tremendous win, assuming
> you can fit within the restrictions of the DNLC cache itself.
At least in FreeBSD the interface to filesystems is polymorphic, very similar
to the interface to graphics drivers in Windows.
The driver (or UNIX FSD) can define any semantics for some operation which it
wants - be it pathname lookup or TextOut.
There is also the standard semantics, and the driver can just define its
semantics to be equal to standard one - by either not hooking the operation, or
declaring the standard routine as a part of its dispatch table.
Also, the driver's semantics can either be wrapped around the standard one or
fall through to the standard one in some cases ("punting").
For instance, the "lookup" call (resolve pathname to vnode) is passed to the
driver. The standard semantics for "lookup" decomposes it into 2 calls to the
driver: "resolve pathname to file ID", then "load vnode by file ID". These 2
calls also have their standard semantics, where caching is implemented.
So, the particular filesystem can completely override the name cache or vnode
cache, or both.
Maxim Shatskih, Windows DDK MVP
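The dispatch arrangement Maxim describes (a standard implementation that a driver may keep, replace, or wrap and punt to) can be sketched with ordinary method overriding. This is a loose analogy and not FreeBSD's actual VFS interface; every class and method name below is invented for illustration.

```python
class StandardFs:
    """Default semantics: lookup = resolve name to file ID (cached) + load vnode by ID (cached)."""

    def __init__(self, names):
        self._names = names            # path -> file ID, stands in for on-disk directories
        self._name_cache = {}
        self._vnode_cache = {}

    def resolve_name(self, path):
        if path not in self._name_cache:
            self._name_cache[path] = self._names[path]
        return self._name_cache[path]

    def load_vnode(self, file_id):
        return self._vnode_cache.setdefault(file_id, {"id": file_id})

    def lookup(self, path):
        return self.load_vnode(self.resolve_name(path))


class UncachedNamesFs(StandardFs):
    """Overrides the name cache but keeps the standard vnode cache."""

    def resolve_name(self, path):
        return self._names[path]


class PuntingFs(StandardFs):
    """Handles one special path itself and punts everything else to the standard routine."""

    def lookup(self, path):
        if path == "/special":
            return {"id": -1}
        return super().lookup(path)


names = {"/a": 10}
print(StandardFs(names).lookup("/a"))       # {'id': 10}
print(UncachedNamesFs(names).lookup("/a"))  # {'id': 10}
print(PuntingFs(names).lookup("/special"))  # {'id': -1}
```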
Bill:
Thanks! Fortunately, I'm not developing a general-purpose system. What
I'm trying to do is address a particular problem in a particular
product, so security-related issues are not much of a concern here.
Moving files, of course, could be. But I think that on a Windows
installation it's only defrag and prefetch that move files (correct me if
I'm wrong), and those can be addressed administratively.
Anyway, it looks like I'm going to have a hard time justifying ID-open
vs. path-open. The results of the simple testing are not that compelling at
all. I agree with you that under heavy memory usage the results will be
much different and strongly in favor of ID-open, but the actual problem was
scalability. And that's another thing I can't explain...
In the 32K-file environment, the average time for open by ID was around 1.5
msec/file. In the 1M-file environment it was around 12 msec/file. Unless I
am missing something really obvious, I was expecting those times to be
close because (as I understood it) open by ID should be independent of the
number of files to search through. Of course, mapping a name to an ID in a
1M-entry map will be slower than in a 32K-entry map, but not
that significantly...
Any thoughts on that?
Regards,
Vladimir
-----Original Message-----
From: Bill Todd [mailto:xxxxx@metrocast.net]
Sent: Friday, June 11, 2004 9:56 PM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] Open file by ID
[snip]
> What
> I'm trying to do is to address a particular problem in a particular
> product. So, security-related issues are not that much of concern here.
> Moving file, of course, could be. But, I think, on Win installation it's
> only defrag and prefetch that move files (correct me if I'm wrong).
The issue I noted wasn't physical movement on the disk, but logical movement
within the directory structure caused by renaming the file or one of the
directories in the path to it without updating the path-cache entry for it.
I still don't think you understand that issue, though it's a subtle one and
may not be a problem for the particular situation you're trying to address.
> ...
> Anyways, it looks like I'm going to have a hard time justifying ID-open
> vs. Path-open. Results of the simple testing are not that compelling at
> all. I'm agreed with you that under a heavy memory usage results will be
> much different and in big favor for ID-open but the actual problem was
> scalability. And that's another thing I can't explain...
> On 32K files environment average time for open by ID was around 1.5
> msec/file. On 1M files env. it was around 12 msec/file. Unless I was
> missing something really obvious, I was expecting those times to be
> close because (as I understood) open by ID should be independent from
> number of files to search through. Any thoughts on that? Of course,
> mapping name to ID on 1M map will be slower than on 32K map, but not
> that significantly...
> Any thoughts on that?
The only one that comes to mind involves relative caching of the MFT. With
the smaller number of files, the MFT may have been largely cache-resident
(in fact, it pretty well had to be, since the average time for the open was
only a small fraction of the time required for a single disk access) such
that after the first file in a region of the MFT was opened several others
leveraged the cached portion of the MFT to avoid having to perform any disk
access on the open. With the larger number of files, their MFT records may
have been sufficiently spread out that most opens required a single disk
access (to the usually uncached MFT record), which is about what you saw
(assuming a 7200 rpm ATA disk).
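The disk-access estimate can be checked with simple arithmetic. This is a minimal sketch, assuming an average seek time of about 8.5 ms; that value is only a plausible figure for a desktop drive of the period and is not a number from the thread.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm

ASSUMED_AVG_SEEK_MS = 8.5   # assumption, not a measured value

latency_ms = average_rotational_latency_ms(7200)
access_ms = ASSUMED_AVG_SEEK_MS + latency_ms
print(f"rotational latency {latency_ms:.2f} ms, random access about {access_ms:.1f} ms")
```

Under that assumption, one uncached MFT record read per open lands in the same range as the 12 to 13 ms per open reported for the 1M-file case, while 1.5 ms per open is well below the cost of a single disk access.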
Bill: You're correct in your assumption that rename is not an issue in
my particular situation. I do understand that a rename would break the
path->ID map, so I would have to sync my map (which is a pretty
heavy job), but renames (of either files or paths) are not going to occur
in this particular system. That's why I didn't pay much attention to
renames.
But what still puzzles me is that a) the average ID-open time is nowhere
near constant and b) it stays very close to the path-open time (within
the same environment). I understand that caching greatly influences both
pictures, but to my taste there is too much correlation between the
(theoretically independent) path-based times and ID-based times.
Something smells fishy.
Currently I'm testing entirely from user mode (UM) and using UM file IDs (16
bytes). I'm also going to try kernel-mode (KM) IDs (8 bytes, I believe) and
see how this changes the picture.
Thanks for your willingness to help.
Best regards,
Vladimir
-----Original Message-----
From: Bill Todd [mailto:xxxxx@metrocast.net]
Sent: Monday, June 14, 2004 4:08 PM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] Open file by ID
[snip]