I spent many hours discussing this in my file systems class, so it’s difficult to capture that in an e-mail on the discussion forum.
The issue of “direct” versus “buffered” versus “neither” relates to the transfer of information between two distinct address spaces (user mode versus kernel mode), while the job of the memory manager is to control the mapping of pages within an address space (“virtual address”) to the corresponding data, which may be in physical memory. When a virtual-to-physical mapping is not defined, the hardware cannot do the translation and invokes the OS. This is known as a “page fault”, and it happens either when the virtual-to-physical mapping is not defined (for the hardware) or when the operation on the memory is not consistent with the protection on the virtual page, such as a write to a read-only (virtual) page.
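To make the two fault conditions concrete, here is a minimal user-mode sketch in C. It is a toy model only: toy_pte, access_result and toy_access are names invented for this illustration, and a real page table entry carries far more state than these two bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page-table entry: only the two bits this example cares about. */
typedef struct {
    bool valid;     /* a virtual-to-physical mapping is defined */
    bool writable;  /* the page protection permits writes */
} toy_pte;

typedef enum {
    ACCESS_OK,          /* the "hardware" can translate; no fault */
    FAULT_NOT_MAPPED,   /* no translation is defined for the hardware */
    FAULT_PROTECTION    /* mapping exists, but the access violates protection */
} access_result;

/* Decide what the "hardware" does for one access to the page. */
access_result toy_access(const toy_pte *pte, bool is_write)
{
    assert(pte != NULL);
    if (!pte->valid)
        return FAULT_NOT_MAPPED;     /* first condition: nothing to translate */
    if (is_write && !pte->writable)
        return FAULT_PROTECTION;     /* second condition: write to read-only page */
    return ACCESS_OK;
}
```

Both fault results end up in the same place (the OS is invoked); the distinction only matters for how the fault is then resolved.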
In Windows the Memory Manager is responsible for handling virtual memory. When a page fault occurs the kernel receives a processor fault (0xE on x86 and x64 CPUs) and control is transferred to a function in the kernel (e.g., KiTrap0E). That OS function does some preliminary analysis of the page fault and then normally transfers control to the Memory Manager (MmAccessFault).
The hardware has only its translation cache (“translation lookaside buffer”) and the page tables (shared between OS and CPU) to consult. The Memory Manager maintains a number of other data structures that allow it to “fix up” the virtual-to-physical mapping so that it can “satisfy” the page fault. For a region of virtual memory that represents a file-backed section, Mm keeps a pointer to a file object. It can then allocate a physical page (or pages) and call the I/O Manager to “fill in” those pages with the correct data (IoPageRead, which is actually declared in ntifs.h). The IRPs built in this case have the IRP_PAGING_IO bit set. That bit tells a file system or filter that this is paging I/O, but the I/O Manager sets it because it needs to know how to properly clean up the IRP when it is done. For example, the I/O Manager does not use APCs to indicate completion of a paging I/O IRP - it directly sets the event object in the IRP, because the Memory Manager has never allowed APCs during paging I/O (in XP and earlier this was done by raising IRQL to APC_LEVEL; in Server 2003 and later it is done with a guarded mutex, which disables all APCs via a different mechanism that doesn’t raise IRQL).
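The fault-handling sequence above can be sketched as a small user-mode C model. Everything here is hypothetical and named only for illustration: toy_file stands in for the file object, toy_read_request for the IRP, TOY_FLAG_PAGING_IO for IRP_PAGING_IO, and toy_handle_fault for the Memory Manager path. It shows the order of events, not how Mm or the I/O Manager are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u          /* tiny pages keep the example readable */
#define TOY_FLAG_PAGING_IO 0x1u    /* hypothetical stand-in for IRP_PAGING_IO */

/* Stand-in for the file object that backs a section. */
typedef struct {
    const unsigned char *data;
    size_t length;
} toy_file;

/* Stand-in for the read request sent down to the file system. */
typedef struct {
    unsigned flags;
    size_t file_offset;
    unsigned char *page;   /* destination "physical page" */
} toy_read_request;

/* One virtual page of a file-backed region. */
typedef struct {
    bool valid;                          /* translation visible to the "hardware" */
    unsigned char frame[TOY_PAGE_SIZE];  /* the "physical page" */
    const toy_file *backing;             /* what the memory manager remembers */
    size_t file_offset;
} toy_page;

/* "File system": fill the page from the file, zero-padding past end of file. */
static void toy_fs_read(const toy_file *file, const toy_read_request *req)
{
    size_t avail = 0;
    if (req->file_offset < file->length)
        avail = file->length - req->file_offset;
    if (avail > TOY_PAGE_SIZE)
        avail = TOY_PAGE_SIZE;
    memset(req->page, 0, TOY_PAGE_SIZE);
    if (avail > 0)
        memcpy(req->page, file->data + req->file_offset, avail);
}

/* "Memory manager": satisfy the fault by reading the page in, then mark the
   translation valid. Returns the flags that were set on the read request. */
unsigned toy_handle_fault(toy_page *page)
{
    assert(page != NULL && page->backing != NULL);
    toy_read_request req = { TOY_FLAG_PAGING_IO, page->file_offset, page->frame };
    toy_fs_read(page->backing, &req);
    page->valid = true;   /* the retried access will now translate */
    return req.flags;
}

/* Read one byte through the "virtual address", faulting the page in on demand. */
unsigned char toy_read_byte(toy_page *page, size_t offset_in_page)
{
    assert(page != NULL && offset_in_page < TOY_PAGE_SIZE);
    if (!page->valid)
        (void)toy_handle_fault(page);
    return page->frame[offset_in_page];
}
```

The point of the model is that the caller of toy_read_byte never asks for any I/O; the read is generated on its behalf by the fault path, which is why the request carries a flag that marks it as paging I/O.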
So buffered/neither/direct have to do with how the OS transfers information from the user address space into the OS address space:
- Buffered allocates a kernel buffer and copies the data (it is “captured”)
- Direct builds an MDL that describes the virtual pages of the user data. When you “lock” those pages, it means that the virtual-to-physical translation is fixed. Those physical pages cannot be used for anything else by the OS as long as the buffer is locked. The kernel address is then constructed by using the locked MDL description to create a second virtual-to-physical mapping to those pages (“MmGetSystemAddressForMdlSafe”, which calls MmMapLockedPagesSpecifyCache when the mapping hasn’t been set in the MDL yet, or otherwise uses the mapping indicated in the MDL). The advantage of this approach is that there’s no data copy. The disadvantage of this approach is that the contents of the buffer are shared with user mode and thus are not captured. But the kernel address is valid and remains so until the mapping is torn down.
- Neither hands the user’s buffer address to the OS. It may or may not be valid and it may be valid at the start of an operation and go invalid by the end of the operation.
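The difference between “captured” and “shared” in the list above can be shown with a short user-mode C sketch. It is only an analogy: toy_capture_buffered and toy_map_direct are hypothetical names, an ordinary pointer alias stands in for the second mapping created from the MDL, and Neither is not modeled because its whole point is that the address may not be valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Buffered model: allocate a private buffer and copy ("capture") the data.
   The caller owns the result and must free() it. Returns NULL on failure. */
unsigned char *toy_capture_buffered(const unsigned char *user_buf, size_t len)
{
    assert(user_buf != NULL && len > 0);
    unsigned char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, user_buf, len);
    return copy;
}

/* Direct model: no copy. The returned pointer stands in for a second mapping
   of the same physical pages, so it aliases the caller's memory. */
unsigned char *toy_map_direct(unsigned char *user_buf)
{
    assert(user_buf != NULL);
    return user_buf;
}
```

If the "user" changes its buffer after the request is issued, the buffered copy still holds the old contents, while the direct view sees the change. That is the trade-off described above: Direct saves the copy but the data can change underneath you.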
I hope this helps clarify these concepts in your mind. VM is amazingly complicated and the file systems in Windows and the VM system are very intertwined with one another.
Tony
OSR
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Muhammad Umair
Sent: Sunday, January 27, 2013 9:17 AM
To: ntfsd redirect
Subject: Re:[ntfsd] [FLTFL_OPERATION_REGISTRATION_SKIP_PAGING_IO] Why is this flag used here? (minifilter)
Thank you so much for the clarification!
Forgive my lack of knowledge; I need a little explanation of how paged I/O works. I understand that some memory will not be physically resident and will instead be paged out to a storage device. I also understand that when you try to read something that is paged out, the system brings that page into physical memory and then it is read.
In the case of, for example, a write operation: I have read about the different I/O types, direct, buffered and neither. How does paged I/O fit into the write operation? I mean, when the write operation is buffered, then maybe the allocated buffer has been paged out? But in the case of, say, direct I/O, there is no buffer, so how does paged I/O fit into that? Or will the MDL be paged, or something? Similarly, neither I/O, I think, only uses a virtual address, with no MDL or buffer. Forgive my confusion, but I am just trying to figure out how this all fits together.
On Sun, 27 Jan 2013 07:37:14 +0500, Tony Mason wrote:
> The paging I/O operations in the set information path change the file
> sizes. Those cannot matter for deletion detection.
>
> There are no paging I/O operations for create or cleanup.
>
> The four operations that support paging I/O operations are: read,
> write, query information and set information.
>
> Tony
> OSR
>
>