huge performance issue with MmMapLockedPagesSpecifyCache

Hey guys, I’m getting a little frustrated on this one.

I’m using a control device object for my minifilter to send writes to
the drivers below me in the stack. I’m using direct I/O because I’m
transferring a lot of data through it, and previously I was using
MmGetSystemAddressForMdlSafe/FltWriteFile to do the writes.

While testing, I saw a massive hit from calling
MmGetSystemAddressForMdlSafe alone, without even calling FltWriteFile
(something like a 90% reduction in throughput), so I changed my driver
to use FltAllocateCallbackData/FltPerformSynchronousIo instead. The
performance is still completely lousy; it looks about the same as it did
before the change. If I omit the call to FltPerformSynchronousIo, it
blazes along at about 80Mb/sec; when I make the call, I see about
7Mb/sec. That is just as bad as the old way. How suspicious.
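As a quick sanity check on those figures (an editorial sketch, not part of the original post): going from about 80 to about 7, in whatever unit the "Mb/sec" numbers are, is a drop of roughly 91%, which matches the "like a 90%" hit described above. The helper name reduction_pct is illustrative only.

```c
#include <assert.h>

/* Illustrative helper (not from the original driver): percentage drop in
 * throughput going from `before` to `after`, both in the same units. */
static double reduction_pct(double before, double after)
{
    assert(before > 0.0); /* avoid dividing by zero */
    return 100.0 * (before - after) / before;
}
```

With the numbers from the post, reduction_pct(80.0, 7.0) evaluates to 91.25.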

So I set a breakpoint on MmMapLockedPagesSpecifyCache, which
MmGetSystemAddressForMdlSafe (actually a macro) wraps, waited for my call
to FltPerformSynchronousIo, enabled the breakpoint, and poof, there’s the
problem (stack trace follows).

So, any ideas about what I can do when formulating my I/O request to keep
NTFS from making this call?

~Eric

kd> kb
ChildEBP RetAddr  Args to Child
fa1369ec f9978840 ffab1000 00000000 00000001 nt!MmMapLockedPagesSpecifyCache
fa136a0c f997d5e8 82336e70 00000010 82336e70 Ntfs!NtfsMapUserBuffer+0x40
fa136bfc f997c7d0 ffb2e510 82336e70 80a5bf00 Ntfs!NtfsCommonWrite+0x1cbd
fa136c70 809b550c 81a51030 82336e70 80873864 Ntfs!NtfsFsdWrite+0x16a
fa136ca0 8081df33 f9a49198 fa136cd4 f9a49198 nt!IovCallDriver+0x112
fa136cac f9a49198 81d4f55c 80873864 00000000 nt!IofCallDriver+0x13
fa136cd4 f9a4c716 fa136cf4 81a58ee8 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x3cc
fa136d14 f9aa3c5c 00000000 81e458b0 ffac8f18 fltmgr!FltPerformSynchronousIo+0x1c8
fa136d28 ba65edc3 81d4f55c e10badd4 81cd8768 fltmgr!FltvPerformSynchronousIo+0x80
fa136d50 ba65ebe7 822f6f68 822f6fd8 ffb111f8 OurDriver!ehr_file_write_sync+0x193
fa136d6c 808ec1eb ffac8f18 82268ff0 808ae5fc OurDriver!ehr_file_write_async_worker+0x27
fa136d80 8088043d 81f4a180 00000000 81e458b0 nt!IopProcessWorkItem+0x13
fa136dac 80949b7c 81f4a180 00000000 00000000 nt!ExpWorkerThread+0xeb
fa136ddc 8088e062 80880352 00000001 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

Just to be sure, I double-checked: when I call FltPerformSynchronousIo,
it is my MDL that is being mapped.

~Eric

Eric,

You don’t specify, but I assume that this is all going to be noncached I/O?
If not, NTFS will have to map your buffer (get a system virtual address
for it) so that it can copy the data into the cache.

If you must use the cache, you might want to explore the IRP_MN_MDL approach.
I don’t see any documentation for it, but the FAT sources and the documentation
for CcPrepareMdlWrite() should help you piece together what you want…

If you have specified noncached I/O, then I’m a bit flummoxed…

Rod
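If you try the noncached route (for a minifilter-issued write, that would mean adding IRP_NOCACHE to Iopb->IrpFlags), keep in mind that noncached writes generally must start at a sector-aligned file offset and have a length that is a multiple of the volume's sector size. Below is a minimal user-mode sketch of that check, assuming a power-of-two sector size such as 512; the helper name is hypothetical, and it does not check buffer alignment, which the underlying device may also require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a write at byte_offset of `length`
 * bytes satisfies the usual sector-alignment rule for noncached I/O.
 * Assumes sector_size is a nonzero power of two (e.g. 512 or 4096). */
static bool noncached_write_is_aligned(uint64_t byte_offset,
                                       uint32_t length,
                                       uint32_t sector_size)
{
    assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

    uint64_t mask = (uint64_t)sector_size - 1;

    /* Both the starting offset and the length must be whole sectors. */
    return (byte_offset & mask) == 0 && (length & mask) == 0;
}
```

For example, a 4096-byte write at offset 0 passes with 512-byte sectors, while a write starting at offset 100 does not.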

Thanks for the suggestion, but it looks like the first thing FAT does
for IRP_MN_MDL is call MmGetSystemAddressForMdlSafe. Damn anyway.

Instead, I’m working on sharing a chunk of memory between the driver and
the service. That way I can map it once instead of every time I call
into my CDO. (http://www.osronline.com/article.cfm?id=39)

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Rod Widdowson
Sent: Monday, September 15, 2008 4:16 AM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] huge performance issue with
MmMapLockedPagesSpecifyCache

NTFSD is sponsored by OSR

For our schedule debugging and file system seminars (including our new
fs mini-filter seminar) visit:
http://www.osr.com/seminars

You are currently subscribed to ntfsd as: xxxxx@edsiohio.com To
unsubscribe send a blank email to xxxxx@lists.osr.com

Okay, I’m really starting to pull my hair out over this one. I’ve set
up memory shared between the driver and my service according to the
directions in the NT Insider article I referenced earlier. I have an
MDL whose pages are locked into physical memory. I’ve mapped it into
the system address space using MmGetSystemAddressForMdlSafe, and into
the user-mode address space using MmMapLockedPagesSpecifyCache.

I’m doing the write using a pointer into the system mapping of the
buffer, and NTFS still appears to be calling MmMapLockedPagesSpecifyCache.
Hell, I’ve even tried illegitimately setting the
FLTFL_CALLBACK_DATA_SYSTEM_BUFFER flag, and the performance is still
abysmal.

Can anybody please take a look at this and tell me what in my code is
causing the issue? Do I have incorrect IrpFlags?
Do I need to provide an MDL too, or what?

status = FltAllocateCallbackData (args->iid, NULL, &cbd);
TEST_STATUS_AND_ABORT (("Could not allocate callback data"));

cbd->Iopb->TargetInstance = args->iid;
cbd->Iopb->TargetFileObject = fileo;
cbd->Iopb->IrpFlags = IRP_WRITE_OPERATION;

cbd->Iopb->MajorFunction = IRP_MJ_WRITE;
cbd->Iopb->MinorFunction = 0;

params = &cbd->Iopb->Parameters;

params->Write.Length = data_size;
params->Write.Key = 0;
RtlCopyMemory (&params->Write.ByteOffset, &args->offset,
sizeof (LARGE_INTEGER));

// data is a pointer into the aforementioned system-space
// mapping of the buffer
params->Write.WriteBuffer = data;
params->Write.MdlAddress = NULL;

// this is totally illegitimate. I know.
cbd->Flags |= FLTFL_CALLBACK_DATA_SYSTEM_BUFFER;

FltPerformSynchronousIo (cbd);

status = cbd->IoStatus.Status;

FltFreeCallbackData (cbd);

Thanks,

~Eric

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Eric Diven
Sent: Monday, September 15, 2008 11:41 AM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] huge performance issue with
MmMapLockedPagesSpecifyCache

> Thanks for the suggestion, but it looks like the first thing fat does
> for IRP_MN_MDL is call MmGetSystemAddressForMdlSafe. Damn anyway.

That seems strange; that’s exactly what CcPrepareMdlWrite is supposed to
avoid. Looking at my WLH sources for FAT, I see:

if (!FlagOn(IrpContext->MinorFunction, IRP_MN_MDL)) {

[snip]
SystemBuffer = FatMapUserBuffer( IrpContext, Irp );
[snip]
} else {
[snip]
CcPrepareMdlWrite( FileObject,
[snip]

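Note the !: FatMapUserBuffer is only called when IRP_MN_MDL is not set.
The IRP_MN_MDL path goes to CcPrepareMdlWrite instead.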
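To make the branch direction concrete, here is a tiny user-mode model of the quoted FAT logic. The IRP_MN_NORMAL/IRP_MN_MDL values and the FlagOn macro are copied from the WDK headers (wdm.h and ntifs.h); the function fat_write_maps_user_buffer is hypothetical and only mirrors the if/else shape of the snippet above, not the real FAT write path.

```c
#include <assert.h>
#include <stdbool.h>

/* Minor function values as defined in wdm.h. */
#define IRP_MN_NORMAL 0x00
#define IRP_MN_MDL    0x02

/* Same definition as the FlagOn macro in ntifs.h. */
#define FlagOn(_F, _SF) ((_F) & (_SF))

/* Hypothetical model of the quoted branch: returns true when the write
 * would go through FatMapUserBuffer (the caller's buffer gets mapped),
 * false when it would take the CcPrepareMdlWrite branch instead. */
static bool fat_write_maps_user_buffer(unsigned char minor_function)
{
    if (!FlagOn(minor_function, IRP_MN_MDL)) {
        return true;   /* IRP_MN_MDL clear: FatMapUserBuffer branch */
    }
    return false;      /* IRP_MN_MDL set: CcPrepareMdlWrite branch */
}
```

So a normal write (IRP_MN_NORMAL) takes the mapping branch, and an IRP_MN_MDL write does not.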