IRQL in FSD/Filter driver

Hi everyone!

Sorry for raising this question again and again, but could anyone verify that
in my FSD/filter driver all IRPs and Fast I/O requests come in at IRQL <
DISPATCH_LEVEL? In other words, can I build my filter driver (and decide
from which pool to allocate data) on the assumption that the only time I can
see DISPATCH_LEVEL or higher is when I acquire a spin lock or in my
completion routine?

Thanks in advance,

Vladimir.

Yes, this is right (the driver's Dispatch/FastIoDispatch routines are
called at IRQL <= APC_LEVEL; a CompletionRoutine can be called
at IRQL <= DISPATCH_LEVEL).

But there are some requests (mostly IRP_MJ_READ and IRP_MJ_WRITE) which,
even though they arrive at IRQL < DISPATCH_LEVEL, must not cause any
page fault. This restriction applies only to paging files [i.e. an
IRP_PAGING_IO request to an ordinary mapped file can cause a page fault
(e.g. because of an outswapped FCB/CCB), which must then be satisfied from
some paging file].

Paul
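
To make those assumptions concrete, here is a minimal sketch of how a filter
might encode the IRQL expectations; the routine names, the FILT_CONTEXT
structure and the pool tag are placeholders, not anything from the thread:

    #include <ntifs.h>

    typedef struct _FILT_CONTEXT {          // hypothetical per-request data
        PIRP Irp;
    } FILT_CONTEXT, *PFILT_CONTEXT;

    // Dispatch and fast-I/O entry points arrive at IRQL <= APC_LEVEL, so
    // data touched only on this path may come from paged pool.
    NTSTATUS FiltDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        PFILT_CONTEXT Ctx;

        UNREFERENCED_PARAMETER(DeviceObject);
        ASSERT(KeGetCurrentIrql() <= APC_LEVEL);

        Ctx = ExAllocatePoolWithTag(PagedPool, sizeof(FILT_CONTEXT), 'tliF');
        if (Ctx == NULL) {
            Irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_INSUFFICIENT_RESOURCES;
        }
        Ctx->Irp = Irp;
        // ... normally the IRP would be passed down here ...
        ExFreePoolWithTag(Ctx, 'tliF');
        Irp->IoStatus.Status = STATUS_SUCCESS;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_SUCCESS;
    }

    // A completion routine may run at IRQL <= DISPATCH_LEVEL, so its code
    // and any data it touches must be nonpaged.
    NTSTATUS FiltCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
    {
        UNREFERENCED_PARAMETER(DeviceObject);
        UNREFERENCED_PARAMETER(Context);
        ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
        if (Irp->PendingReturned) {
            IoMarkIrpPending(Irp);
        }
        return STATUS_SUCCESS;
    }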


Thanks, Paul!

So, if my driver doesn't allow paging file creation/open (returns some error
from IRP_MJ_CREATE if the SL_OPEN_PAGING_FILE flag is set), I'm free to
allocate my data from PagedPool and keep my handlers in pageable code, as
long as they are not touched from the completion routine or while a spin
lock is held. Sorry for the annoyance, but this is a design decision I have
to make, and I would rather “over check” than “under check”.

Thanks for the patience :-)

Vladimir
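
For illustration, a minimal sketch of the create-path check described above,
assuming a simple pass-through filter; the device-extension layout and the
choice of STATUS_ACCESS_DENIED are assumptions, not taken from the thread:

    #include <ntifs.h>

    typedef struct _FILT_DEV_EXT {          // hypothetical device extension
        PDEVICE_OBJECT AttachedTo;          // FSD device we are attached to
    } FILT_DEV_EXT, *PFILT_DEV_EXT;

    // Refuse paging-file creates so the filter never has to service
    // paging-file I/O and can keep its private data in paged pool.
    NTSTATUS FiltCreate(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        PIO_STACK_LOCATION IrpSp = IoGetCurrentIrpStackLocation(Irp);
        PFILT_DEV_EXT DevExt = (PFILT_DEV_EXT)DeviceObject->DeviceExtension;

        if (IrpSp->Flags & SL_OPEN_PAGING_FILE) {
            Irp->IoStatus.Status = STATUS_ACCESS_DENIED;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_ACCESS_DENIED;
        }

        // Ordinary creates are simply passed down to the underlying FSD.
        IoSkipCurrentIrpStackLocation(Irp);
        return IoCallDriver(DevExt->AttachedTo, Irp);
    }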


> page fault. This restriction applies only to paging files [i.e. an
> IRP_PAGING_IO request to an ordinary mapped file can cause a page fault
> (e.g. because of an outswapped FCB/CCB), which must then be satisfied from
> some paging file].

Another question:

- Is there any proof that CcPinMappedData and friends prevent the cache
  page from being outswapped? Can I rely on this?
  For instance, FASTFAT uses MCBs to contain the VCN->LCN mapping for a
  file. If the cluster is not in the MCB yet, it consults the pinned FAT.
  Can a page fault occur during that lookup?

Max

> - Is there any proof that CcPinMappedData and friends prevent the cache
>   page from being outswapped? Can I rely on this?

No.

Dirty, pinned modified-no-write pages cannot be written by the lazy
writer, and since the lazy writer is the only thread that will write
those pages they are effectively locked in memory. Clean pinned
modified-no-write pages can be reclaimed like any other clean page in
the system.

If you really want the pages in memory, build an MDL and lock them down.
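
A minimal sketch of the "build an MDL and lock them down" approach; the
routine name and parameters are placeholders, and error handling is
abbreviated:

    #include <ntifs.h>

    // Pinning through Cc does not guarantee residency for clean pages; to
    // really keep a range of the cache view in memory, lock it with an MDL.
    NTSTATUS LockCacheRange(PVOID Buffer, ULONG Length, PMDL *MdlOut)
    {
        PMDL Mdl = IoAllocateMdl(Buffer, Length, FALSE, FALSE, NULL);

        if (Mdl == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        __try {
            // IoReadAccess is enough if the goal is just residency.
            MmProbeAndLockPages(Mdl, KernelMode, IoReadAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(Mdl);
            return GetExceptionCode();
        }

        *MdlOut = Mdl;
        return STATUS_SUCCESS;
    }

    // Release later with: MmUnlockPages(Mdl); IoFreeMdl(Mdl);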

> For instance, FASTFAT uses MCBs to contain the VCN->LCN mapping for a
> file. If the cluster is not in the MCB yet, it consults the pinned FAT.
> Can a page fault occur during that lookup?

Yes.

> > - Is there any proof that CcPinMappedData and friends prevent the cache
> >   page from being outswapped? Can I rely on this?
>
> No.
>
> Dirty, pinned modified-no-write pages cannot be written by the lazy writer,
> and since the lazy writer is the only thread that will write those pages
> they are effectively locked in memory.

According to David Solomon's book, the cache is one more working set
(separate from the processes' working sets) and is therefore subject to
working set management and trimming. Can dirty cache pages be trimmed from
the working set, thus allowing the MPW (modified page writer) thread to
flush them to the underlying storage without help from the lazy writer?

Second question: CcFlushCache can be used to flush any byte range, which is
possibly smaller than a page. CcFlushCache then causes an
IoSynchronousPageWrite back to the same FSD.
What will the parameters be? Will only the requested range be written on
flush, or will it be rounded up to a multiple of PAGE_SIZE?

Max

> According to David Solomon's book, the cache is one more working set
> (separate from the processes' working sets) and is therefore subject to
> working set management and trimming. Can dirty cache pages be trimmed from
> the working set, thus allowing the MPW (modified page writer) thread to
> flush them to the underlying storage without help from the lazy writer?

This is the differentiation between regular streams and
modified-no-write streams. MNW streams can only be written by the cache
manager’s lazy writer, and since pinning of a range in the cache manager
prevents the lazy writer from writing the range, a dirty MNW page is
effectively locked in memory. Mapping pages in the cache does not
perform any synchronization - the page can be written at any time.

So for a regular stream, yes; for an MNW stream, no.

(filesystem metadata streams are always MNW - this prevents partial
updates from being written to disk, and serves as the basis for logging
mechanisms)
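
As a rough illustration of how a file system dirties metadata through a
pinned MNW stream, here is the generic CcPinRead / CcSetDirtyPinnedData /
CcUnpinData pattern; the stream file object, offset and routine name are
hypothetical, not taken from any particular FSD:

    #include <ntifs.h>

    // Update a piece of on-disk metadata through a pinned (MNW) stream.
    // While the range is pinned, the lazy writer will not write it; marking
    // it dirty before unpinning makes the lazy writer flush it later.
    NTSTATUS SetMetadataUlong(PFILE_OBJECT MetadataStream, LONGLONG Offset, ULONG NewValue)
    {
        LARGE_INTEGER FileOffset;
        PVOID Bcb;
        PULONG Slot;

        FileOffset.QuadPart = Offset;

        if (!CcPinRead(MetadataStream, &FileOffset, sizeof(ULONG), PIN_WAIT,
                       &Bcb, (PVOID *)&Slot)) {
            return STATUS_UNSUCCESSFUL;
        }

        *Slot = NewValue;                   // modify the cached metadata
        CcSetDirtyPinnedData(Bcb, NULL);    // tell Cc the pinned range is dirty
        CcUnpinData(Bcb);                   // lazy writer will write it later
        return STATUS_SUCCESS;
    }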

> Second question: CcFlushCache can be used to flush any byte range, which
> is possibly smaller than a page. CcFlushCache then causes an
> IoSynchronousPageWrite back to the same FSD.
> What will the parameters be? Will only the requested range be written on
> flush, or will it be rounded up to a multiple of PAGE_SIZE?

Pages. Mm has no knowledge of which exact bytes get dirtied in a page to
break it into sector chunks, and Cc can’t assume it has perfect
knowledge of which bytes got dirtied (imagine user mapped sections).

> So for a regular stream, yes; for an MNW stream, no.

And how can the FSD specify which streams are regular and which are MNW?
By the PinAccess parameter to CcInitializeCacheMap?

> Pages. Mm has no knowledge of which exact bytes get dirtied in a page to
> break it into sector chunks, and Cc can't assume it has perfect knowledge
> of which bytes got dirtied (imagine user mapped sections).

So - low-level writes are always page-sized, even low-level metadata
writes?

Max

> And how can the FSD specify which streams are regular and which are MNW?
> By the PinAccess parameter to CcInitializeCacheMap?

This is keyed off the FsContext2 field of the file object that first
initiates caching: if it is non-NULL, the file is a regular file; if it is
NULL and the FCB common header's Flags2 field does not contain
FSRTL_FLAG2_DO_MODIFIED_WRITE, the file is modified-no-write. MNW means
that the lazy writers are the only threads that can write dirty pages in
the stream.

For example, all internal metadata in FAT is modified through MNW
streams (directory streams and the virtual volume file).
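
A hedged sketch of what that keying might look like when a file system
initiates caching on one of its internal metadata streams; the routine and
parameter names are placeholders, the assertions simply restate the
conditions described above (FsContext2 left NULL, FSRTL_FLAG2_DO_MODIFIED_WRITE
not set), and PinAccess is TRUE because the metadata is accessed through the
pin interface:

    #include <ntifs.h>

    // Start caching an internal metadata stream (e.g. FAT's virtual volume
    // file) so that it is treated as modified-no-write, per the keying
    // described above.
    VOID StartMetadataCaching(PFILE_OBJECT StreamFileObject,
                              PFSRTL_COMMON_FCB_HEADER Header,
                              PCC_FILE_SIZES FileSizes,
                              PCACHE_MANAGER_CALLBACKS Callbacks,
                              PVOID LazyWriteContext)
    {
        ASSERT(StreamFileObject->FsContext2 == NULL);
        ASSERT((Header->Flags2 & FSRTL_FLAG2_DO_MODIFIED_WRITE) == 0);

        CcInitializeCacheMap(StreamFileObject,
                             FileSizes,
                             TRUE,              // PinAccess
                             Callbacks,
                             LazyWriteContext);
    }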

> So - low-level writes are always page-sized, even low-level metadata
> writes?

Paging I/O is done in pages, truncated to the file size (which the file
system will round up to sector size).