Avoid Copying IO Buffers

Hello

I’m working on a device that receives IRP_MJ_READ/IRP_MJ_WRITE requests, and I need to send the
buffers I get to a user-mode component for inspection.
Currently, I copy each buffer that arrives in an IRP to a user-supplied buffer.
The problem with this method is that copying large buffers for every read or write
request can be pretty heavy. To avoid the copying, I decided to
map the read/write buffers using “MmMap… SpecifyCache()”.

The problem with that approach is that the MDL of the read/write IRP is tied to
a different process context, so I decided to build a new MDL in the context of the process
that I want to share the buffer with. For every request I create a new MDL and map
it using “mm… SpecifyCache”. I haven’t been able to get it to work yet, and I’m not
sure whether this will improve performance or incur even more overhead.

Thanks in advance

Mike

Not sure I get what you’re asking.

The reads/writes are coming from process A, but you want to send them to a process B?

Peter
OSR

xxxxx@osr.com wrote:

> Not sure I get what you’re asking.
>
> The reads/writes are coming from process A, but you want to send them to a process B?

That’s what I read. He has a process submitting read and write
requests, and he wants to have another user-mode process that basically
acts as a filter driver.

I was waiting for the avalanche of posts telling him what a bad idea
that was before chiming in.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hi

Sorry for being unclear.

I have a device that receives system-wide read/write I/O. I want to hand the data buffers to another “controller” process, which will perform some operation on those buffers.
I did all that using inverted callbacks. The problem is that I’m copying each buffer to a user-mode buffer,
and I would like to avoid the copy by mapping the buffer instead, to lower the overhead.

Is there a good way to do that?

Thanks

Mike

Sorry… but I’ve GOT to ask the obvious question: Why not have the controller program post the reads or writes initially??

Peter
OSR

(I *hate* it when people follow-up their own posts, don’t you?)

And, assuming having the controller program post the read/writes isn’t possible or practical, where are you currently locating the buffers?

Assuming you’ve got them in some non-paged memory space, then you must have allocated that memory somehow… hopefully in kernel mode. If this is so, it should be rather straightforward to simply IoAllocateMdl, MmBuildMdlForNonPagedPool, and MmMapLockedPagesSpecifyCache to map the pages into the target process.

If the controlling program is under, er, your control (that is, if you’re responsible for writing it), then you can pro’lly make-up a protocol whereby the controlling program is given a series of buffers from your driver, your driver builds MDLs for these buffers and maps them ONCE into the controlling program’s address space, and then it’s just a matter of informing the controlling program which buffer to use and indicating when it’s “available”…

It seems like I’ve answered this question at least twice in the past month… Perhaps we need an article or something on this?

Peter
OSR
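
The map-once protocol Peter outlines can be modeled without any kernel calls. The sketch below is plain user-mode C, not driver code: the `shared_pool` struct stands in for memory a driver would allocate from non-paged pool and map once into the controller, and every name in it (`shared_pool`, `pool_publish`, `pool_release`, the slot count and size) is hypothetical. It assumes a single thread, so the interlocked operations and notifications a real driver/controller pair would need are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 4        /* hypothetical pool depth */
#define SLOT_SIZE  4096     /* hypothetical per-slot capacity in bytes */

/* Who may touch a slot right now. SLOT_FREE is 0 so memset() initializes it. */
typedef enum { SLOT_FREE = 0, SLOT_OWNED_BY_CONTROLLER } slot_state;

typedef struct {
    slot_state    state;
    size_t        length;             /* number of valid bytes in data */
    unsigned char data[SLOT_SIZE];
} slot;

/* Stands in for the region a driver would map once into the controller. */
typedef struct {
    slot slots[SLOT_COUNT];
} shared_pool;

void pool_init(shared_pool *pool)
{
    assert(pool != NULL);
    memset(pool, 0, sizeof *pool);    /* every slot starts out SLOT_FREE */
}

/* Driver side: place one request's payload into the first free slot.
   Returns the slot index to report to the controller, or -1 if the
   payload does not fit or no slot is free. */
int pool_publish(shared_pool *pool, const void *payload, size_t length)
{
    if (length > SLOT_SIZE)
        return -1;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (pool->slots[i].state == SLOT_FREE) {
            memcpy(pool->slots[i].data, payload, length);
            pool->slots[i].length = length;
            pool->slots[i].state = SLOT_OWNED_BY_CONTROLLER;
            return i;
        }
    }
    return -1;
}

/* Controller side: hand the slot back once inspection is finished.
   Returns 0 on success, -1 if the index is invalid or the slot was not
   owned by the controller. */
int pool_release(shared_pool *pool, int index)
{
    if (index < 0 || index >= SLOT_COUNT)
        return -1;
    if (pool->slots[index].state != SLOT_OWNED_BY_CONTROLLER)
        return -1;
    pool->slots[index].state = SLOT_FREE;
    return 0;
}
```

In this model the driver side calls `pool_publish` for each request and tells the controller which index to look at; the controller calls `pool_release` when it is done. Each payload is still copied into its slot once. What the scheme avoids is building and tearing down a mapping for every request.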

I don’t think I explained the problem properly.
My controller is just a user-mode application that communicates with my device.
My device receives read/write IRPs from different applications that run on the system.
Think of this device as a simple disk…

xxxxx@yahoo.com wrote:

> Sorry for being unclear.
>
> I have a device that receives system wide read/write IO. I want to hand the data buffers to another “controller” process which will perform some operation on those buffers.
> I did all that using inverted callbacks, problem is that I’m copying the buffer to a user mode buffer
> and I would like to avoid the copying and mapping that buffer instead to lower the overhead.
>
> My question is if there is some good way to do it?

Actually, no, there is no good way. Every process has its own address
space. You can’t avoid a copy. Even if you get clever and share a
chunk of memory between the driver and the “controller” process, you’ll
still end up copying in and copying out.

How much data are we talking about? Today’s CPUs do memory copies
really stinkin’ fast.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
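
One way to put a number on “really stinkin’ fast” for a particular machine is to time the copy directly. The following is a rough user-mode sketch, assuming `clock()` has enough resolution for the chosen iteration count. `time_copies` is a hypothetical helper name, and user-mode `memcpy` timings are only a proxy for what a driver would see in kernel mode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies `bytes` bytes `iterations` times and returns the elapsed CPU time
   in seconds as reported by clock(). Returns a negative value if `bytes`
   is zero or an allocation fails. */
double time_copies(size_t bytes, int iterations)
{
    assert(iterations > 0);
    if (bytes == 0)
        return -1.0;

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, bytes);

    /* Reading the copied data into a volatile keeps the compiler from
       discarding the copies as dead code. */
    volatile unsigned char sink = 0;

    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i;    /* change the input on every pass */
        memcpy(dst, src, bytes);
        sink ^= dst[bytes - 1];
    }
    clock_t end = clock();

    (void)sink;
    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copies(size, 100000)` for sizes from 4 KB up to 64 KB and dividing by the iteration count gives a per-copy cost to weigh against the expected request sizes.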

Soooooo… You have application A sending data to your driver, and you want to efficiently pass that data to application B. Is that a reasonable description? No hardware involved?

If this is the case, I don’t see any REASONABLE alternative but to copy the data from the address space of Application A to the address space of Application B.

Sounds like a job for… (play trumpets here)… Direct I/O!

Peter
OSR

All I was thinking about is finding some way to reuse the read/write IRP buffers that my device receives from applications doing read/write I/O.
By mapping this already-allocated kernel-mode buffer into the right process context (creating PTEs), I could probably save one copy of the data. I was wondering whether there would be any performance gain.

I already get an MDL, locked and probed, in the IRP, and then I pend the IRP.
So why can’t I use the MDL of that IRP and hand it to the user app? This could spare my device
some operations, such as buffer allocation and copying.
The problem is that I’m not sure whether it is possible to reuse that MDL.

Thanks, Mike

I dunno, dude. You’ve got me totally confused.

I’ve now given you two different, and very specific, answers… based on two very different and not very specific sets of information you’ve provided. Pick one, I guess.

Sorry I can’t be more helpful… Hey, I tried,

Peter
OSR

I guess the first answer will work for me.

Thanks Peter.

Mike

xxxxx@yahoo.com wrote:

> I already get an MDL locked and probed in the IRP and then I pend the IRP.

That gives you a kernel space address, corresponding to pages in the
user-mode space of the originating process.

> So why I cannot use the mdl of that IRP and hand it to the user app? This could spare my device
> some operation such as buffer allocation, and copying and more.
> Problem is that I’m not sure if it is possible to reuse that MDL?

How would you “hand” the MDL to the other IRP? Both of the two
processes have allocated memory, within their own unique process address
space, and handed you pointers to those buffers. You can certainly
convert both of those to kernel addresses, using any one of a number of
techniques, but there’s no way to pull a switcheroo and shove one MDL
into another IRP. The process that owns the IRP doesn’t ever SEE the
MDL. All it has is a buffer in its own address space.

The two processes own two different sets of physical pages. Nope, you
are going to have to copy.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> The problem with this method is that the MDL of the Read/Write IRP sent is
> tied to a different process context.

MDLs are tied, the mappings to the system address space are not.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

An empirical rule from the Pentium II era: if the amount of data is more than 1 page, then mapping is a gain.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

wrote in message news:xxxxx@ntdev…
> All I was thinking about is to find someway to reuse the read/write IRP
> buffers that are being received by my device from applications that are doing
> IO read/write.
> So by mapping this already allocated kernel mode buffer to the right process
> context (creating PTEs) I could probably save one copy operation of the data.
> And I was wondering if there is going to be any performance gain.

Last time I spoke with our mm & performance folks about this (since UMDF copies buffers it’s a concern of mine) the magic number for direct mapping vs copying was actually around 24 KB. Locking and mapping a page and then unmapping it has a considerable expense since you have to muck with page tables, then with the TLB. The CPU is quite good at copying.

I haven’t personally verified those numbers, but I would recommend that anyone trying to decide whether to copy or map less than about 32 KB of data at a time look carefully at the performance of each option before choosing a path.

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Maxim S. Shatskih
Sent: Friday, March 07, 2008 11:27 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Avoid Copying IO Buffers

Empiric of Pentium-2 era: if the data amount is > 1 page, then mapping has
gain.




NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

That’s mostly true. If the OS that you’re dealing with is running in a
virtual machine, though, the mapping cost goes up while the
copying cost doesn’t. We’ve found that, in a VM, the flip-over point
happens around 64 KB.

  • Jake Oshins
    Windows Virtualization Guy
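
The break-even figures mentioned in this thread can be captured as tunable defaults rather than hard rules. A minimal sketch, assuming the roughly 24 KB (native) and 64 KB (virtualized) numbers quoted in the posts. `choose_transfer` and the constant names are hypothetical, and the 24 KB figure was reported as unverified, so both values are starting points for measurement.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TRANSFER_COPY, TRANSFER_MAP } transfer_method;

/* Break-even sizes taken from the thread. Replace them with numbers
   measured on the target system. */
#define BREAK_EVEN_NATIVE  ((size_t)24 * 1024)   /* "around 24kb", unverified */
#define BREAK_EVEN_VIRTUAL ((size_t)64 * 1024)   /* "around 64K" inside a VM */

/* Picks copying for transfers below the break-even size and mapping at or
   above it. `running_in_vm` is nonzero when the OS runs in a virtual machine. */
transfer_method choose_transfer(size_t bytes, int running_in_vm)
{
    size_t break_even = running_in_vm ? BREAK_EVEN_VIRTUAL : BREAK_EVEN_NATIVE;

    assert(break_even > 0);
    return bytes >= break_even ? TRANSFER_MAP : TRANSFER_COPY;
}
```

With these defaults a 32 KB transfer would be mapped on bare metal but copied inside a VM, which is why the advice to measure each option on the real configuration matters.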

“Peter Wieland” wrote in message
news:xxxxx@ntdev…
> Last time I spoke with our mm & performance folks about this (since
> UMDF copies buffers it’s a concern of mine) the magic number for
> direct mapping vs copying was actually around 24kb. Locking and
> mapping a page and then unmapping it has a considerable expense
> since you have to muck with page tables, then with the TLB. The CPU
> is quite good at copying.
>
> I haven’t personally verified those numbers but I would recommend
> that anyone trying to decide whether to copy or map < about 32kb of
> data at a time look carefully at the performance of each option
> before choosing a path.
>
> -p