Zeroing a buffer in the page fault handler...

Hi All,

I am having a problem with a file system I am developing. It basically comes
down to zeroing a buffer that is passed into the filesystem on the paging
I/O path. The buffer is zeroed because the read is over a hole (this file
system supports holes transparently, and no, we don’t yet support the
Windows sparse file ioctl interface). Unfortunately, once the buffer has
been zeroed and the handler returns from the paging request, the pages are
marked as dirty and flushed by the lazy writer at some point later.

So on filesystem block sizes greater than PAGE_SIZE, extents are allocated
(so the hole, or part thereof, is converted), and when we read off disk
(before or past the pages the lazy writer flushed) we get junk based on
whatever is on disk. So how can I zero a buffer without dirtying the
page(s), so that the lazy writer doesn’t unnecessarily flush zeroed pages
to disk?

PS: I have tried using the CcPinRead API but ran into exceptions thrown due
to STATUS_IN_PAGE_ERROR. I put it down to being something you don’t want to
do in the fault handler (I haven’t investigated that any further as yet).
My assumption was that the fault handler pins the pages anyway and calls
CcSetDirtyPinnedData() or something to that effect…

Thanks,

- Ian Costello
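
To make the block-size problem concrete, the arithmetic can be sketched in plain user-mode C. The 4 KB page and 16 KB block sizes, and the helper names block_start, junk_before and junk_after, are assumptions for illustration; none of them come from the filesystem under discussion. The sketch only shows how much of a freshly allocated block a single flushed page leaves unwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration only; the thread gives no concrete values. */
#define PAGE_SIZE_BYTES  4096u
#define BLOCK_SIZE_BYTES 16384u   /* filesystem block larger than a page */

/* Start of the filesystem block that contains the given file offset. */
static uint64_t block_start(uint64_t offset)
{
    return offset - (offset % BLOCK_SIZE_BYTES);
}

/* Bytes of the block that lie before a flushed, page-aligned page.
 * The flush never wrote them, so they hold old disk contents. */
static uint64_t junk_before(uint64_t page_offset)
{
    return page_offset - block_start(page_offset);
}

/* Bytes of the block that lie after the flushed page. */
static uint64_t junk_after(uint64_t page_offset)
{
    uint64_t block_end = block_start(page_offset) + BLOCK_SIZE_BYTES;
    return block_end - (page_offset + PAGE_SIZE_BYTES);
}
```

With these assumed sizes, a zeroed page flushed at file offset 8192 lands in the block starting at 0, leaving 8192 bytes before the page and 4096 bytes after it that were never written.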

Try the following:

  • if the whole request is inside the hole, fail it with STATUS_END_OF_FILE.

  • if the tail of the request is inside the hole, set Irp->IoStatus.Information
    so that it does not cover the range that intersects the hole.

This will force MM to do its own zeroing.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
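
The two cases above can be sketched as a small decision function. This is plain user-mode C so that it builds without the WDK: status_t, completion_t and complete_paging_read are hypothetical names, and the two status constants are local stand-ins carrying the ntstatus.h values. In a real read dispatch routine the results would be stored in Irp->IoStatus.Status and Irp->IoStatus.Information. A request whose head lies in the hole is reported as unhandled, since the advice above does not cover it.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (hypothetical names) with the ntstatus.h values. */
typedef uint32_t status_t;
#define SKETCH_STATUS_SUCCESS      ((status_t)0x00000000u)
#define SKETCH_STATUS_END_OF_FILE  ((status_t)0xC0000011u)

typedef struct {
    status_t status;       /* would go into Irp->IoStatus.Status */
    uint64_t information;  /* would go into Irp->IoStatus.Information */
    int      handled;      /* 0 when neither suggested case applies */
} completion_t;

/* Request covers [req_off, req_off + req_len),
 * hole covers    [hole_off, hole_off + hole_len). */
static completion_t complete_paging_read(uint64_t req_off, uint64_t req_len,
                                         uint64_t hole_off, uint64_t hole_len)
{
    completion_t c = { SKETCH_STATUS_SUCCESS, req_len, 1 };
    uint64_t req_end  = req_off + req_len;
    uint64_t hole_end = hole_off + hole_len;

    /* No overlap: the whole request is read from disk as usual. */
    if (hole_len == 0 || hole_end <= req_off || hole_off >= req_end) {
        return c;
    }
    /* Whole request inside the hole: fail it and let MM zero the pages. */
    if (hole_off <= req_off && hole_end >= req_end) {
        c.status = SKETCH_STATUS_END_OF_FILE;
        c.information = 0;
        return c;
    }
    /* Tail inside the hole: report only the bytes in front of the hole. */
    if (hole_off > req_off && hole_end >= req_end) {
        c.information = hole_off - req_off;
        return c;
    }
    /* Head (or middle) inside the hole: not covered by the two cases. */
    c.information = 0;
    c.handled = 0;
    return c;
}
```

For a 16 KB request at offset 0 with a hole starting at 8192 and running past the end of the request, the function reports success with 8192 bytes, leaving the rest of the buffer to MM.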

----- Original Message -----
From: “Ian Costello”
To: “Windows File Systems Devs Interest List”
Sent: Tuesday, September 28, 2004 5:32 AM
Subject: [ntfsd] Zeroing a buffer in the page fault handler…

> —
> Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17
>
> You are currently subscribed to ntfsd as: xxxxx@storagecraft.com
> To unsubscribe send a blank email to xxxxx@lists.osr.com

Do you create an MDL that describes the buffer to be zeroed? If you create
such an MDL, it may be the source of the problem.

Alexei.


Thanks Max,

We do return STATUS_END_OF_FILE for a request that covers the hole. This
works fine and Mm seems to do the right thing. The other case you discussed
works fine as well (so we are 66% of the way to sorting out the problem
:))…

Anyway, when the head of the request covers a hole and the tail covers an
allocated extent, we are in trouble, as we would return the bytes read and
not fault in the remaining page(s), which are backed by allocated extent(s).

So is there any way of turning off the dirty bit for the pages that cover
the hole in the above case (the pages we are going to have to zero)?

- Ian
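
The mixed case can be pictured as a split of the request at the extent boundary. A minimal sketch, assuming a single boundary that falls strictly inside the request; head_hole_split_t and split_head_hole are hypothetical names, and the code is plain user-mode C rather than part of any filesystem. It only computes the two ranges: the head that the filesystem itself would have to zero, and the tail that is read from the allocated extent.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t zero_len;  /* bytes at the start of the buffer to zero */
    uint64_t read_off;  /* file offset at which the disk read starts */
    uint64_t read_len;  /* bytes to read from the allocated extent */
} head_hole_split_t;

/* Assumes req_off < extent_off < req_off + req_len, that is, the hole
 * covers the head of the request and the extent covers the tail. */
static head_hole_split_t split_head_hole(uint64_t req_off, uint64_t req_len,
                                         uint64_t extent_off)
{
    head_hole_split_t s;
    s.zero_len = extent_off - req_off;   /* head: lies in the hole */
    s.read_off = extent_off;             /* tail: backed by the extent */
    s.read_len = req_len - s.zero_len;
    return s;
}
```

A byte count measured from the start of the buffer can leave a trailing range to MM, but it cannot describe a leading one, which is why this case needs the zeroing done by the filesystem.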

Create an alternative mapping to the same region of memory (a new MDL,
with a new VA assigned to it) and zero that. Then tear down that new
set of PTEs and new MDL - your dirty bit has been discarded, since it
was done through a different PTE.

This technique would work on any platform and is essentially the same
reason that PIO devices don’t force a rewrite of the data they just read
from the disk.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

Looking forward to seeing you at the Next OSR File Systems Class October
18, 2004 in Silicon Valley!
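
The idea of a second view onto the same physical pages can be tried out in user mode. The sketch below is only an analogy, assuming a POSIX system: it maps one temporary file twice with mmap, zeroes through the second view, tears that view down, and leaves the first view in place. It does not use the Windows kernel MDL routines and says nothing about PTE dirty bits; zero_through_alias and REGION_SIZE are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Map the same backing pages at two virtual addresses, zero them through
 * the second view, then unmap only that view. Returns 0 on success and
 * stores the surviving first view in *first_view_out; returns -1 on error. */
static int zero_through_alias(unsigned char **first_view_out)
{
    FILE *backing = tmpfile();
    if (backing == NULL) return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, REGION_SIZE) != 0) { fclose(backing); return -1; }

    unsigned char *first = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (first == MAP_FAILED) { fclose(backing); return -1; }

    unsigned char *alias = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (alias == MAP_FAILED) {
        munmap(first, REGION_SIZE);
        fclose(backing);
        return -1;
    }

    memset(first, 0xAB, REGION_SIZE);     /* stand-in for stale contents */
    int ok = (first != alias) && (alias[0] == 0xAB);

    memset(alias, 0, REGION_SIZE);        /* zero through the second view */
    munmap(alias, REGION_SIZE);           /* tear down the second view only */
    fclose(backing);                      /* the first mapping stays valid */

    if (!ok) { munmap(first, REGION_SIZE); return -1; }
    *first_view_out = first;
    return 0;
}
```

After a successful call the caller sees zeroes through the first view even though every write went through the second address, and should munmap the returned view when done.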

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ian Costello
Sent: Tuesday, September 28, 2004 8:54 PM
To: ntfsd redirect
Subject: RE: [ntfsd] Zeroing a buffer in the page fault handler…


Tony,

Thanks for the suggestion. I have tried implementing “something”, which I
have added below. Could you please review it and tell me whether it is
correct? The pages are still being paged out by the lazy writer. The buffer
that is passed into this function comes from the cache manager in the form
of paging I/O (it is described by an MDL) and is mapped in via
MmGetSystemAddressForMdlSafe().

NTSTATUS
fs_zero_buffer(void *buffer, off_t offset, off_t length)
{
    LARGE_INTEGER FileOffset;
    PVOID ZeroBuffer;
    PVOID Bcb = NULL;
    PMDL Mdl = NULL;

    FileOffset.QuadPart = offset;

    /* todo: check that we have not gone across a 256k boundary */

    /* Allocate an Mdl to describe the mapped data */
    Mdl = IoAllocateMdl(buffer, length, FALSE, FALSE, NULL);
    if (Mdl == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /* lock down the data into memory */
    MmBuildMdlForNonPagedPool(Mdl);
    Mdl->MdlFlags |= MDL_MAPPED_TO_SYSTEM_VA;
    ZeroBuffer = MmMapLockedPagesSpecifyCache(Mdl,
                                              KernelMode,
                                              MmCached,
                                              NULL,
                                              FALSE,
                                              NormalPagePriority);
    if (Buffer == NULL) {
        IoFreeMdl(Mdl);
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    /* Zero the buffer, this will set the dirty bit */
    RtlZeroMemory(ZeroBuffer, length);
    /* Cleanup the Mdl */
    MmUnmapLockedPages(Mdl->MappedSystemVa, Mdl);
    IoFreeMdl(Mdl);

    return STATUS_SUCCESS;
}

I am at the point where I am hacking up something to query whether a PTE has
the dirty bit set (I also have tracing inside the function, although I
haven’t included it above). I have confirmed that a new VA for the Buffer is
created. Note: this code is not complete, as there are other caveats that
I have not yet handled.

Thanks,

Ian Costello

Whoops, slight typo in the code I posted: the NULL check after
MmMapLockedPagesSpecifyCache tests Buffer. It should test ZeroBuffer.

Ian,

By using an MDL for non-paged pool and setting the MDL_MAPPED_TO_SYSTEM_VA
bit, you’ve circumvented the system so that it doesn’t create a second
mapping. Since there’s no second mapping, the dirty bit gets set in the
original mapping.

We first started using this “trick” as a mechanism for modifying
read-only memory. It does work, but the implementation isn’t nearly so
complicated as your implementation - best to take the VA, probe and lock
it, and call MmGetSystemAddressForMdl - the VA you get should NOT be the
same as the original buffer.

Hope this helps.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ian Costello
Sent: Monday, October 04, 2004 1:05 AM
To: ntfsd redirect
Subject: RE: [ntfsd] Zeroing a buffer in the page fault handler…
