how to modify data on the fly?

Hi All,

Is it possible for a filter to modify data only on the fly? I mean changing the data only in IRP requests and Fast I/O (so that the data in the cache stays the same as the data on disk). It seems not to work for Notepad. Someone told me the reason is file mapping, but how can I hook the data fetching for a file mapping?

Any advice is very much appreciated.

Best regards,

Ryan


Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!

Trap paging I/O IRPs. You can recognize them because they have the
IRP_PAGING_IO | IRP_NOCACHE flags set.
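
A user-mode sketch of the test described above. The two flag values are the ones defined in the DDK's wdm.h, copied here so the sketch compiles on its own (check them against your own headers). is_paging_io() and transform_buffer() are hypothetical names, and the XOR transform is only a length-preserving placeholder, not real encryption. IRP_NOCACHE alone also appears on noncached user requests, which is why the check keys on IRP_PAGING_IO.

```c
#include <stddef.h>
#include <stdint.h>

/* IRP->Flags bits as defined in wdm.h, reproduced for a user-mode build. */
#define IRP_NOCACHE   0x00000001u
#define IRP_PAGING_IO 0x00000002u

/* Hypothetical helper: nonzero when a read/write IRP is paging I/O,
   i.e. data moving between the cache/mapped section and the disk. */
static int is_paging_io(uint32_t irp_flags)
{
    return (irp_flags & IRP_PAGING_IO) != 0;
}

/* Hypothetical placeholder transform: XOR with a one-byte key.
   Applying it twice restores the original bytes. NOT encryption. */
static void transform_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] ^ key);
}
```

In a real filter the flags would come from the IRP in the read/write dispatch path; a complete filter would also have to deal with noncached user requests, which bypass the cache entirely.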

However, this WILL cause the data in the cache to be altered and NOT
represent the actual data on disk. I believe you stated that you wanted
the cache manager to hold the actual data and not your modified data. I
believe this is something you just can't do with memory mapping. I
recommend that you keep the cache manager consistent with the data you
want to present, and translate that data back into your on-disk format
on paging writes.
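
A minimal sketch of that recommendation, assuming a length-preserving transform (XOR here, purely as a placeholder for a real cipher). The paging read translates in place, so the cached page holds what applications should see; the paging write translates into a separate shadow buffer, so the cached or mapped page is never altered. All function names are hypothetical, and the memcpy() into `disk` stands in for sending the shadow buffer down to the file system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-preserving transform. XOR with a fixed key is its
   own inverse, so the same routine encodes and decodes. NOT encryption. */
static void xor_transform(unsigned char *dst, const unsigned char *src,
                          size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(src[i] ^ key);
}

/* Paging read: the page arrives in on-disk form and is translated in
   place, so what lands in the cache/section is the presented data. */
static void on_paging_read(unsigned char *page, size_t len, unsigned char key)
{
    xor_transform(page, page, len, key);
}

/* Paging write: the page is still visible through the cache or a mapped
   view, so translate into a shadow buffer and write that instead. */
static void on_paging_write(const unsigned char *page, unsigned char *shadow,
                            unsigned char *disk, size_t len, unsigned char key)
{
    xor_transform(shadow, page, len, key);
    memcpy(disk, shadow, len); /* stands in for passing the shadow buffer down */
}
```

Translating inside the cached page itself would briefly expose on-disk-format data to anyone reading through the cache or a mapped view, which is the inconsistency described above.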

-----Original Message-----
From: Dan Partelly [mailto:xxxxx@rdsor.ro]
Sent: Wednesday, July 09, 2003 6:05 AM
To: File Systems Developers
Subject: [ntfsd] Re: how to modify data on the fly?

You are currently subscribed to ntfsd as: xxxxx@exagrid.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Thanks a lot for your suggestions.

One more question: can it work with FSDs unsupport cache manager?

----- Original Message ----- From: “Tom Hansen”
To: “File Systems Developers”
Sent: Wednesday, July 09, 2003 10:47 PM
Subject: [ntfsd] Re: how to modify data on the fly?

What is “unsupport cache manager”? All FSDs use the cache manager, and
Microsoft supports it, even if its algorithms are not optimal for all
circumstances. You can't write a cache manager that provides optimal
speed for an OS that can be used as a workstation, SQL server, file server,
print server, IIS server, etc.

Now, if you mean can you write a file system filter that provides
encryption and also supports memory mapped files: I can, but I don't know
about you. This is one of the most difficult drivers to get right. There
are no examples unless you pay for them. I know of one company that will
provide libraries and maybe a sample or two, but the last I heard the
price was in the vicinity of $50,000.

P.S. I don’t work for that company.

P.P.S. Most of the people in newsgroups dislike (hate) HTML messages.

“Ryan David” wrote in message news:xxxxx@ntfsd…

> speed for an OS that can be used as a workstation, SQL server

SQL servers do not rely on the OS’s cache.

Max

Does SQL server disable the cache manager? If not, CC is still active.

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntfsd…

I believe SQL does non-buffered I/O to effectively disable the cache
manager for its requests.

-----Original Message-----
From: David J. Craig [mailto:xxxxx@yoshimuni.com]
Sent: Thursday, July 10, 2003 12:59 PM
To: File Systems Developers
Subject: [ntfsd] Re: how to modify data on the fly?

Does that say anything about Microsoft's (or some part of Microsoft's)
view of its own cache manager? I have done it myself, but keeping requests
512-byte aligned and a multiple of 512 bytes long can be fun. Maybe that
explains why SQL needs the 3GB user address space.

“Tom Hansen” wrote in message news:xxxxx@ntfsd…

Actually, it says a lot about database design, where reliability of data
(i.e. being sure it is on disk) is paramount. SQL and other enterprise
databases take the performance hit, to be sure that transactions are
actually on disk.

Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting

“David J. Craig” wrote in message
news:LYRIS-1729-118105-2003.07.10-14.50.24–burn#xxxxx@lists.osr.com…

WinFS (aka SQL Server as a filesystem) is going to be fun. All hail
Moore’s law…

- Nick Ryan (MVP for DDK)

A requirement of transactional systems is that they must be able to control
the order of write operations. This is because the journal must be written
in a specific order with respect to the actual protected data. The exact
details depend upon the type of journaling.

For example, in the ancient past, I worked on a journaling file system that
used a log technique known as “old value/new value”. This means that each
entry within a transaction contained both the old data value (for “undo”) as
well as the new data value (for “redo”). Thus, we had a set of ordering
constraints on how data was written (first the undo information had to be on
disk, then the data block could be written after that point).

In general, this scheme does not work if there is an out-of-order caching
scheme within this system. Thus, transactional systems (like SQL) use
non-cached I/O. They can implement their OWN caching, but they do not rely
upon the file system data cache. The one model where caching will work is
if the ordering of operations is preserved. However, this can be quite
complicated for a database (such as SQL) where the log may be located on
another device - the order preservation would need to be across devices.
Hence the simplest solution is to avoid adding caching at all.
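
To make the ordering constraint concrete, here is a toy model of an old value/new value log. It assumes a purely synchronous "disk"; struct disk, logged_update() and undo_txn() are hypothetical names. The point is only that each record is appended before its data block is overwritten, and that the stored old values are enough to roll back an uncommitted transaction. An out-of-order cache sitting between the two steps is exactly what would break this.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 4
#define LOG_CAP 16

/* One "old value/new value" log entry: enough to undo or redo. */
struct log_rec {
    int txn;
    size_t block;
    int old_value; /* for undo */
    int new_value; /* for redo */
};

/* Toy synchronous "disk": a durable log plus data blocks. */
struct disk {
    struct log_rec log[LOG_CAP];
    size_t log_len;
    int data[NBLOCKS];
};

/* Ordering rule: the log record is on "disk" before the data block
   is overwritten. */
static bool logged_update(struct disk *d, int txn, size_t block, int value)
{
    if (block >= NBLOCKS || d->log_len >= LOG_CAP)
        return false;
    struct log_rec r = { txn, block, d->data[block], value };
    d->log[d->log_len++] = r; /* step 1: write the undo/redo record */
    d->data[block] = value;   /* step 2: only now write the data block */
    return true;
}

/* Roll back a transaction by walking its records newest-first and
   restoring the old values. */
static void undo_txn(struct disk *d, int txn)
{
    for (size_t i = d->log_len; i-- > 0; ) {
        if (d->log[i].txn == txn)
            d->data[d->log[i].block] = d->log[i].old_value;
    }
}
```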

Many years ago there was a commercial product that would speed up SQL. It
was a file system filter driver that would enable caching! The fine print
was the funniest part - it recommended that you only use their product in
conjunction with a UPS.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: David J. Craig [mailto:xxxxx@yoshimuni.com]
Sent: Thursday, July 10, 2003 2:50 PM
To: File Systems Developers
Subject: [ntfsd] Re: how to modify data on the fly?

Also, any complex application engaging in complex access patterns to
files will have a far better idea of the best page management strategy
than general-purpose algorithms, no matter how smart they are or can
get. Note in particular that DBs engage in highly structured access
which is predictable given such things as the SQL queries themselves.
When the application is being benchmarked down to the last ergs of
available system capability, that is simply what it takes. Virtually
all high-end DBs operate on raw disks in the UNIX world, which is
effectively the same thing.

The size requirement for noncached IO may be annoying but noncached IO
means noncached IO. In order to do that you have to present the IO to
the device in the units it does IO in. Otherwise, something has to cache
it to fix the IO alignment/length to match the device requirements. It’s
a pretty physical requirement, nothing fancy.
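
The alignment rule can be sketched as follows, assuming 512-byte sectors (the "512 mod/multiple" case raised earlier in the thread; real code should use the device's reported sector size). align_range() widens an arbitrary byte range to the enclosing whole-sector range, which is the work "something" has to do when the caller does not. The helper names are hypothetical; aligned_alloc() is standard C11.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 512-byte sectors. Query the device in real code. */
#define SECTOR_SIZE 512u

/* Noncached I/O must start on a sector boundary and cover whole sectors. */
static bool is_noncached_ok(uint64_t offset, uint64_t length, uint32_t sector)
{
    return sector != 0 && offset % sector == 0 && length % sector == 0;
}

/* Widen [offset, offset + length) to the enclosing sector-aligned range.
   'sector' must be nonzero. */
static void align_range(uint64_t offset, uint64_t length, uint32_t sector,
                        uint64_t *aligned_off, uint64_t *aligned_len)
{
    uint64_t start = offset - (offset % sector);
    uint64_t end = offset + length;
    uint64_t end_up = (end + sector - 1) / sector * sector;
    *aligned_off = start;
    *aligned_len = end_up - start;
}

/* The buffer address usually has to be sector-aligned too. C11
   aligned_alloc wants a size that is a multiple of the alignment,
   which holds for any length produced by align_range(). */
static void *alloc_sector_buffer(uint64_t aligned_len, uint32_t sector)
{
    return aligned_alloc(sector, (size_t)aligned_len);
}
```

For example, a 1000-byte request at offset 100 becomes a 1536-byte transfer starting at offset 0, and the caller then copies out the bytes it actually wanted.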

Dan Lovinger
Microsoft Corporation

This posting is provided “AS IS” with no warranties and confers no
rights.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tony Mason
Sent: Thursday, July 10, 2003 12:08 PM
To: File Systems Developers
Subject: [ntfsd] Re: how to modify data on the fly?

I triggered a lot of responses on this. I do know how to do uncached I/O,
have used it, and have worked with mainframe database engines in the past. I
still wonder how commits can be ensured with some of the SAN file systems
where the disk drives might be on another computer. I think the designer
has to know all the specifications of the hardware and software to build a
system that can handle commits and journaling correctly. That was the
advantage of mainframes with vendor-supplied database engines. How can SQL
know that a write will actually get committed? Even the hard drives have
caches that might be holding data. I guess a UPS is no longer optional,
but I won't work without one even on my workstation.

“Daniel Lovinger” wrote in message
news:xxxxx@ntfsd…

No, SQL Server works with database files in noncached mode and has its
own user-mode cache management.

Max

----- Original Message -----
From: “David J. Craig”
Newsgroups: ntfsd
To: “File Systems Developers”
Sent: Thursday, July 10, 2003 8:58 PM
Subject: [ntfsd] Re: how to modify data on the fly?

> Does that say anything about Microsoft’s, or some part thereof, view of
> their own cache manager?

No.
A database's cache manager is tightly coupled with its transaction log
engine, which is impossible with the system cache manager and with mmap().
So databases keep their own cache in user space, and the system cache
manager is shut off since, in the presence of a user-mode cache, the
system cache provides nothing useful except an extra memcpy().

Max

> (i.e. being sure it is on disk) is paramount. SQL and other enterprise
> databases take the performance hit, to be sure that transactions are

They do not.
Using a user-mode cache instead of a kernel-mode one is not a
performance hit.

Max

> constraints on how data was written (first the undo information had to be on
> disk, then the data block could be written after that point.

This is common to most logging algorithms: first the log record
describing the operation must hit the disk platter, and only after
that can the data updated by the operation be lazy-written.
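
This rule is the usual write-ahead-logging check. A minimal sketch, assuming each dirty page records the log sequence number (LSN) of the last log record that touched it and that LSNs start at 1. struct page, struct wal_log and the functions are hypothetical, and this is not a description of SQL Server's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

struct page {
    uint64_t page_lsn; /* LSN of the latest log record for this page */
    bool dirty;
};

struct wal_log {
    uint64_t next_lsn;    /* next LSN to hand out; starts at 1 */
    uint64_t flushed_lsn; /* every record <= this is on the platter */
};

/* Record an update: append a log record (in memory) and stamp the page. */
static uint64_t update_page(struct wal_log *lg, struct page *pg)
{
    pg->page_lsn = lg->next_lsn++;
    pg->dirty = true;
    return pg->page_lsn;
}

/* Make the log durable up to 'lsn' (stand-in for a write-through log write).
   Records that have not been generated yet cannot be flushed. */
static void flush_log_to(struct wal_log *lg, uint64_t lsn)
{
    if (lsn >= lg->next_lsn)
        lsn = lg->next_lsn - 1;
    if (lsn > lg->flushed_lsn)
        lg->flushed_lsn = lsn;
}

/* The lazy writer may write a dirty page only after the log record
   describing its latest change is durable. */
static bool can_lazy_write(const struct wal_log *lg, const struct page *pg)
{
    return !pg->dirty || pg->page_lsn <= lg->flushed_lsn;
}
```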

Also note that SQL Server's logging is far more complex than, say,
NTFS's, because it supports explicit transaction rollback and an
unlimited number of operations per transaction (NTFS has a limited
number, since it does not allow the user to declare several updates as a
single transaction).

Max

> advantage of mainframes with vendor supplied database engines. How can SQL
> know that a write will actually get committed? Even the hard drives have
> caches that might be holding data? I guess a UPS is not longer optional,

For a SCSI disk, there is the ForceUnitAccess (FUA) bit in the CDB. Any
noncached IO will result in such SCSI commands being sent to the disk.

You can also disable the on-disk write cache entirely, which is usually
done for IDE disks, where the task file protocol has nothing similar to
ForceUnitAccess.

Max
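
For reference, a sketch of where the ForceUnitAccess bit lives in a SCSI WRITE(10) CDB: opcode 0x2A, FUA is bit 3 of byte 1, and the LBA and transfer length are big-endian. This only builds the 10 bytes in user mode and sends nothing to a device. build_write10() is a hypothetical helper, and the DPO, WRPROTECT, group number and control fields are left at zero.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_OP_WRITE10 0x2Au
#define CDB10_FUA       0x08u /* byte 1, bit 3: Force Unit Access */

/* Build a WRITE(10) CDB for 'blocks' logical blocks starting at 'lba'. */
static void build_write10(uint8_t cdb[10], uint32_t lba, uint16_t blocks, int fua)
{
    memset(cdb, 0, 10);
    cdb[0] = SCSI_OP_WRITE10;
    if (fua)
        cdb[1] |= CDB10_FUA;
    cdb[2] = (uint8_t)(lba >> 24); /* LBA, most significant byte first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8); /* transfer length, big-endian */
    cdb[8] = (uint8_t)blocks;
}
```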