METHOD_BUFFERED vs METHOD_IN_DIRECT

Robert_Gaurie · April 22, 2013, 11:31pm

I have a Device_Ioctl call I use to write data to my hardware. The input buffer is an array of various values such as an address, width, count, and finally a pointer to the data buffer containing the values I want to write to my hardware.

driverParam[0] = offset;
driverParam[1] = count;
driverParam[2] = width;
driverParam[3] = (UINT32) hostAddr; //pass pointer to buffer as a UINT32

status = DeviceIoControl
(
hDevice,
IOCTL_DEVICE_WRITE,
driverParam,
4*sizeof(UINT32),
NULL,
0,
&bytesReturned,
&ol
);

The driver will retrieve the input buffer via WdfRequestRetrieveInputBuffer and
write to my hardware with:
WRITE_REGISTER_BUFFER_ULONG((PULONG) bytePointer, (PULONG)inputBuffer[4], count);
//bytePointer is an assembled address from my offset value

This has worked for years but recently I noticed on some systems I get a crash and the kernel dump points to this instruction suggesting the (PULONG)inputBuffer[4] value is invalid now:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.

I tried switching to METHOD_IN_DIRECT and passing this buffer in using the outputBuffer, and the crash disappears. I’m guessing that the buffered method loses track of the buffer I point to and it may become pageable memory, causing this crash. I also tried forcing the buffer to be a local value in my IOCTL driver call, ignoring the pointer I passed in, and the crash vanishes too. I don’t know how else to prove the pointer is invalid after the WdfRequestRetrieveInputBuffer call.

The MSDN site says that small transfers should use the BUFFERED method because DIRECT I/O will be slower so I’d like to stick with buffered. I’m thinking I could pass in the entire buffer rather than a pointer but then I’d have to copy the entire buffer in my application which would cause performance problems too. The parameters have been set for years so I can’t simply build the buffer with data and parameters without a copy.

What is the proper way to pass in a data buffer for the driver? Does it appear that I’m making valid assumptions? Is there some way to pass in a pointer and not lose the data it points to? I can’t seem to use the output buffer for the input data when using METHOD_BUFFERED. I tried the same approach I used with METHOD_IN_DIRECT, and it didn’t work for BUFFERED transfers.

Doron_Holan · April 23, 2013, 12:09am

What you are doing is METHOD_NEITHER, by embedding the user-mode pointer as a value. Not to mention it is not 64-bit compatible. I would guess the reason you are now bugchecking is that this is a power-managed queue processing the request and you are no longer in the app’s context, thus invalidating the pointer. So, you have a few options:
1. Map the buffer in an in-caller-context callback (EvtIoInCallerContext) using the appropriate WDF API. This will unmap the buffer upon completion.
2. Use IN_DIRECT and let the OS map it for you.
3. Use METHOD_BUFFERED. Since the input and output buffer are the same, you need to capture the input values before writing to the output.

I would go with 2. Simplest solution
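Option 3 relies on a METHOD_BUFFERED detail: the I/O manager copies the input into one system buffer and copies the output back out of that same buffer, so writing a result can overwrite parameters that have not been read yet. The portable C simulation below is only a sketch of that hazard, not WDF code: the byte array stands in for the request's system buffer, and WRITE_PARAMS, WRITE_RESULT and both handler functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input and output layouts for a METHOD_BUFFERED IOCTL. */
typedef struct { uint32_t offset, count, width; } WRITE_PARAMS;
typedef struct { uint32_t status, bytesWritten; } WRITE_RESULT;

/* The shared system buffer is sized for the larger of the two layouts. */
static_assert(sizeof(WRITE_PARAMS) >= sizeof(WRITE_RESULT),
              "simulated system buffer must hold either layout");

/* Correct: copy the parameters out of the shared buffer first. */
static WRITE_PARAMS handle_correctly(unsigned char *systemBuffer)
{
    WRITE_PARAMS params;
    WRITE_RESULT result;

    memcpy(&params, systemBuffer, sizeof params);   /* capture input */
    result.status = 0;
    result.bytesWritten = params.count * params.width;
    memcpy(systemBuffer, &result, sizeof result);   /* overwrites the input */
    return params;                                  /* local copy is intact */
}

/* Wrong: write the result first, then read parameters that are gone. */
static WRITE_PARAMS handle_wrongly(unsigned char *systemBuffer)
{
    WRITE_RESULT result = { 0, 0 };
    WRITE_PARAMS params;

    memcpy(systemBuffer, &result, sizeof result);   /* clobbers offset, count */
    memcpy(&params, systemBuffer, sizeof params);   /* reads clobbered values */
    return params;
}
```

In the wrong version the result overlays the first eight bytes, so offset and count come back as zero while width survives only because the result happens to be shorter than the parameters.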

d

—
NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

OSR_Community_User · April 23, 2013, 3:25am

In general, there is so little advantage to putting in a pointer in this
fashion that it is hardly worth the effort to do so.

The first and most fundamental error here is the idea that you can cast a
pointer to a 32-bit integer. This code will not port to Win64, and if you
do a 64-bit driver, you have to worry about whether it was called from a
32-bit or 64-bit app. So you should forget any mechanism that believes
anything about pointer sizes. You will save yourself infinite grief.
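The pointer-width problem can be made concrete with a few lines of portable C. This sketch mimics the OP's (UINT32) hostAddr cast, using uint64_t to stand in for a 64-bit pointer; the sample address in the test is made up, and any user address above 4 GB behaves the same way.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit slot is narrower than a 64-bit pointer: the OP's design flaw. */
static_assert(sizeof(uint32_t) < sizeof(uint64_t),
              "a UINT32 slot cannot hold a 64-bit pointer");

/* Mimics driverParam[3] = (UINT32) hostAddr; for a 64-bit address. */
static uint32_t truncate_to_uint32(uint64_t address)
{
    return (uint32_t)address;   /* the high 32 bits are silently dropped */
}

/* Returns 1 if the address survives the trip through a 32-bit slot. */
static int survives_uint32_round_trip(uint64_t address)
{
    return (uint64_t)truncate_to_uint32(address) == address;
}
```

The compiler accepts the cast without complaint, which is why code like this can appear to work until a process happens to place its buffer above 4 GB.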

I have a Device_Ioctl call I use to write data to my hardware. The input
buffer is an array of various values such as an address, width, count, and
finally a pointer to the data buffer containing the values I want to write
to my hardware.

driverParam[0] = offset;
driverParam[1] = count;
driverParam[2] = width;
driverParam[3] = (UINT32) hostAddr; //pass pointer to buffer as a UINT32

***
This represents such a fundamental failure of design in 2013 that I cannot
fathom any excuse that could allow it to make sense. Lose the entire idea
that a pointer fits into 32 bits. That way madness lies.
****

status = DeviceIoControl
(
hDevice,
IOCTL_DEVICE_WRITE,
driverParam,
4*sizeof(UINT32),
***
Of course, the only sensible thing would be to write 3*sizeof(UINT32) +
sizeof(PVOID).

You are literally living in another century.
****

NULL,
0,
***
This is why it makes no sense. There are two perfectly fine parameters
that you should be using to point to the data, and you should use
IN_DIRECT. In a giant act of stupidity, the output was called input, and
the input was called output, which is true unless it is input. The only
intelligent way to think of DeviceIoControl is that it has two buffers:
The “parameters” buffer is always passed in buffered mode, and the
“data” buffer is either in buffered mode (in which case, it can only be
written by the driver), or is a direct-mode source of data for the driver
(IN_DIRECT) or a direct-mode sink of data from the driver (OUT_DIRECT). So
you are inventing a mind-bogglingly complex solution to a truly trivial
problem.

Lose the entire concept of passing a user-level address. You simply don’t
want to go there.
***

&bytesReturned,
&ol
);

The driver will retrieve the input buffer via
WdfRequestRetrieveInputBuffer and
write to my hardware with:
WRITE_REGISTER_BUFFER_ULONG((PULONG) bytePointer, (PULONG)inputBuffer[4],
count);
//bytePointer is an assembled address from my offset value

***
This line is so completely nonsense that it is mind-blowing. You can’t
use a user-level address like this; the fact that it EVER worked is what
is surprising. Do NOT use Driver Verifier on this driver; it will reach
out of your screen and strangle you. Rewrite it to be a proper driver
before using Driver Verifier.

In addition, you have shown this line with absolutely NO context! You
have not

You do not have a device driver. You have an artifact of code that
guarantees that it will crash your system. As I said, what is amazing is
that it ever worked!

There is actually no need to pass an offset and count in. Since you know
the offset, there is no reason you can’t compute this address in your
program; the count can be passed as the buffer size, so instead of NULL, 0
you pass the address you want and the count, and your problems will go
away. So your parameters should be

DeviceIoControl(handle, <new IOCTL code>, NULL, 0, hostAddr + offset, count,
&bytesTransferred, NULL)

Don’t worry about how to fix your driver. It is unfixable, and the “fix”
involves learning how to create an MDL, lock down the pages, etc., and even
then it will be wrong because it will lock down all the pages, so you will
have to build a partial MDL, and the actual fix is so trivial that it is
not worth the effort.
****

This has worked for years but recently I noticed on some systems I get a
crash and the kernel dump points to this instruction suggesting the
(PULONG)inputBuffer[4] value is invalid now:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
****
NEWS FLASH! You ASKED for this crash. The code you showed can guarantee
this! If you had used the Driver Verifier when you wrote this, it would
have caught it. You obviously believe in miracles, because that is the
only mechanism that could prevent this.
****
An attempt was made to access a pageable (or completely invalid) address
at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.

***
When we ask for the !analyze -v output, we don’t mean that you should omit
all the useful information and include the lines that most of us can
recite from memory. Like, what routine were you in, what IRQL were you
running at, what are the registers, what is the stack dump, and all that
REALLY IMPORTANT stuff.
*****

I tried switching to METHOD_IN_DIRECT and pass this buffer in using the
outputBuffer and the crash disappears. I’m guessing that the buffered
method loses track of the buffer I point to and it may become pageable
memory causing this crash. I also tried forcing the buffer to be a local
value in my IOCTL driver call, ignoring the pointer I passed in, and the
crash vanishes too. I don’t know how else to prove the pointer is invalid
after the WdfRequestRetrieveInputBuffer call.
****
There is no concept of “losing track”; there is absolutely nothing you
have done that could cause the buffer to be paged in and locked down. It
didn’t “become” pageable; you did nothing whatsoever to make it non-paged!

I have no idea what you mean by “forcing the buffer to be a local” because
I know of no way to accomplish this.

You can safely assume that if you have not explicitly done something to
lock the pages, they are not going to be locked. That’s why IN/OUT_DIRECT
is used; it tells the I/O Manager that the pages must be paged in and
locked down. WdfRequestRetrieveInputBuffer obtains a pointer to those 4
UINT32. Period. That pointer is perfectly valid, but nowhere along the way
have you done anything to lock the pages of your data buffer down. Your
choice to interpret some of those bits as a pointer is a deep and
fundamental design error which makes unsupportable assumptions, like
pointers being 32 bits, or a user address being valid in arbitrary kernel
contexts. Beyond that, there is NO WAY the I/O Manager is going to know
that some of those bits are a pointer. That’s now your responsibility. But
the interface you have designed is remarkably clumsy, and should be
scrapped, rather than trying any other “fix”.

****

The MSDN site says that small transfers should use the BUFFERED method
because DIRECT I/O will be slower so I’d like to stick with buffered.

****
The reason direct mode is slow is that it has to do all those things like
bring in all the pages and make sure they’re locked down! If you really want
to use buffered mode, pass in a pointer as the “parameters” which has
everything you need. For example, the count is redundant because you can
pass that in to the DeviceIoControl. The pointer to that buffer can be
computed based on the offset. You didn’t show the “width” parameter being
used at all. Bottom line, you have to think VERY carefully about ever
passing a user address into the kernel as part of the bits you send. The
simplest predicate to apply is “If I’m putting a user address in the data
packet I’m sending, I’ve made a fundamental design error”

(Note: before I generate a ton of responses from the advanced driver
writers saying “That’s completely wrong, and bad advice” let me assure
you: newbies need simple design rules to build successful drivers.
Putting up a sign that says “land mines” in front of a field of land mines
saves the experience described here, which is stepping on a land mine.
Those of us with serious driver experience either know where the mines are
buried, or we know how to use mine detectors. Newbies say “mines?”)

The fact that you didn’t understand what went wrong means you have walked
into a mine field and are now looking at your missing leg and saying “What
happened? My GPS said this was the shortest route!”

The MSDN was correct, but you completely misunderstood what it was saying.
*****

I’m
thinking I could pass in the entire buffer rather than a pointer but then
I’d have to copy the entire buffer in my application which would cause
performance problems too.

****
Several problems here:
(a) why do you think you need to copy the entire buffer?
(b) why do you think a copy on a modern machine is slow?
(c) how big is your buffer? On the one hand, you say you want to use
buffered mode because you have small amounts of data. Then you say that
copying those few bytes will be a performance problem? [hint: what do you
think the kernel does when you use buffered I/O? Oh, right, it COPIES
THE DATA]
(d) if you pass in the width (whatever that is) and a direct reference to
the data, where is the copy being done? I see no copy here.
***

The parameters have been set for years so I
can’t simply build the buffer with data and parameters without a copy.
***
And you think this is a problem?

Seriously, this is not a device driver you have described; it is a land
mine. Nothing short of a total rewrite is going to save it. Or, abandon
the concept of pointer, and offset, and count, and write a simple driver
that is actually correct.

You either have to change the interface or figure out all the things you
HAVE to fix to make it work. Changing the interface and the calls is
ultimately simpler. However, do NOT reuse the IOCTL code if you change
the interface.
***
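One concrete reason not to reuse the IOCTL code: the transfer method is encoded in the low two bits of the code itself, so the method and the meaning of the buffers travel together. The sketch below restates the CTL_CODE bit layout from the Windows headers in portable C so the arithmetic can be checked; the MY_ prefixed names and both IOCTL definitions are hypothetical, and real code should use CTL_CODE and the METHOD_ and FILE_ constants from the SDK/WDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as CTL_CODE: device type in bits 16-31, required
   access in bits 14-15, function in bits 2-13, method in bits 0-1. */
#define MY_CTL_CODE(DeviceType, Function, Method, Access)          \
    (((uint32_t)(DeviceType) << 16) | ((uint32_t)(Access) << 14) |  \
     ((uint32_t)(Function) << 2) | (uint32_t)(Method))

#define MY_METHOD_BUFFERED     0u
#define MY_METHOD_IN_DIRECT    1u
#define MY_METHOD_OUT_DIRECT   2u
#define MY_METHOD_NEITHER      3u
#define MY_FILE_DEVICE_UNKNOWN 0x22u
#define MY_FILE_ANY_ACCESS     0u

/* Hypothetical codes: the old interface keeps function 0x800; the
   redesigned interface gets a new function number and a new method. */
#define IOCTL_DEVICE_WRITE_OLD \
    MY_CTL_CODE(MY_FILE_DEVICE_UNKNOWN, 0x800, MY_METHOD_BUFFERED, MY_FILE_ANY_ACCESS)
#define IOCTL_DEVICE_WRITE_V2 \
    MY_CTL_CODE(MY_FILE_DEVICE_UNKNOWN, 0x801, MY_METHOD_IN_DIRECT, MY_FILE_ANY_ACCESS)

static_assert(IOCTL_DEVICE_WRITE_OLD != IOCTL_DEVICE_WRITE_V2,
              "old and new interfaces must not share a code");

/* Decode helpers, e.g. for logging which method a request arrived with. */
static uint32_t method_of(uint32_t code)   { return code & 0x3u; }
static uint32_t function_of(uint32_t code) { return (code >> 2) & 0xFFFu; }
```

With a fresh function number, an old application that still sends the old code gets a clean "unsupported request" instead of having its buffers misinterpreted under the new layout.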

What is the proper way to pass in a data buffer for the driver? Does it
appear that I’m making valid assumptions?

***
(a) pointers are 32 bits
(b) it is valid to touch unlocked pages in a driver
(c) there is a psychic component in the I/O Manager that knows that some
of the bits are a pointer and will lock that data down.

Any one of these invalid assumptions dooms your driver.
****

Is there some way to pass in a
pointer and not lose the data it points to?
****
Absolutely! Use the “data” parameter and direct mode. Or, if you like
walking around in minefields while poking with a stick (so you don’t
actually step on a land mine), then you can simulate everything direct
mode does, getting the same performance cost direct mode has, but with the
added advantage of costing weeks of effort.
****
I can’t seem to use the
output buffer for the input data wen using METHOD_BUFFERED. At least I
tried as I did with METHOD_IN_DIRECT and it didn’t work for BUFFERED
transfers.

****
No surprise here; if you use the “data” parameter, you can only write to
it. That’s how the call is defined to work in buffered mode.

I suspect the root cause of this horrid code is a completely failed
understanding of “performance”. You have pre-optimized a solution based
on rumor, rather than factual data. It is hard to offer specific advice
because

(a) you have given no numbers on the quantity of bytes you send down
(b) you have given no numbers on how many times per second you need to
make this call
(c) you have no measurements to tell you anything about actual performance
(d) the code fragment you show has no context
(e) you have not indicated if this device needs interrupts and hence also
needs a DPC
(f) you omitted any meaningful data from !analyze -v

Provide relevant information and the readers here might be able to suggest
a simpler and correct approach.

I say, scrap it and start over. Others may not be so negative.

Someday, I should write an essay called “how to define a driver
interface”, which used to be a two-hour lecture when I taught courses on
this.

joe
*****

Tim_Roberts · April 23, 2013, 12:17pm

xxxxx@yahoo.com wrote:

The MSDN site says that small transfers should use the BUFFERED method because DIRECT I/O will be slower so I’d like to stick with buffered.

This is such an incredibly small micro-optimization that I’m surprised
such a note is still in the documentation. Consider the
implementation. With METHOD_BUFFERED, the system allocates new
non-paged space and copies your user-mode buffer into and out of it.
With METHOD_xx_DIRECT, the system just locks your buffer into memory and
passes the kernel address. I seriously doubt that the difference in
performance is measurable, and even if it were, it wouldn’t matter
unless you were making hundreds of thousands of requests per second.
I’m pretty sure you’re not doing that.

Ignore this note. Do what works.

The parameters have been set for years so I can’t simply build the buffer with data and parameters without a copy.

I’m sorry, but that is EXACTLY what you have to do. Pass the first
three parameters as the input buffer, pass the pointer as the output
buffer. That way, the I/O system takes care of making sure the buffer
has a kernel address, and that the buffer remains valid until your
request completes. Plus, it will work regardless of whether the
application is 32-bit or 64-bit, and whether the driver is 32-bit or 64-bit.

It has NEVER been safe to pass a raw user-mode address to a driver. It
works much of the time, but there are just way too many corner cases,
and it fails miserably when you try to cross the 32/64 borders.

(Actually, you only have to pass TWO parameters, because the length of
the buffer will be passed to you already.)
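The two-parameter version can be sketched as follows, under the assumption (the OP never says) that "width" is the element size in bytes. WRITE_PARAMS_V2 and element_count are hypothetical names; dataLength stands for the data buffer length the driver receives with the request, for example the OutputBufferLength argument of a WDF EvtIoDeviceControl callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block: only what the driver cannot learn from
   the data buffer itself. Assumes "width" is the element size in bytes. */
typedef struct {
    uint32_t offset;  /* device register offset */
    uint32_t width;   /* 1, 2 or 4 bytes per element (assumption) */
} WRITE_PARAMS_V2;

/* Fixed-size fields only: same layout for 32-bit and 64-bit callers. */
static_assert(sizeof(WRITE_PARAMS_V2) == 8, "layout must not depend on bitness");

/* Derives the element count from the data buffer length in bytes.
   Returns 0 if the request is malformed. */
static size_t element_count(const WRITE_PARAMS_V2 *params,
                            size_t inputLength, size_t dataLength)
{
    if (inputLength < sizeof *params)
        return 0;                       /* parameter block too short */
    if (params->width != 1 && params->width != 2 && params->width != 4)
        return 0;                       /* unsupported element width */
    if (dataLength == 0 || dataLength % params->width != 0)
        return 0;                       /* no data, or a partial element */
    return dataLength / params->width;
}
```

The length check comes before any dereference of the parameter block, which mirrors the order a driver should validate buffers in.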

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Robert_Gaurie · April 23, 2013, 3:55pm

Thanks everyone. I was aware of the 64 bit incompatibility and the proper solutions avoid that mistake. I wasn’t sure how the performance would be affected so I was worried about using direct. If it’s not noticeable I’ll just stick with direct.
Thanks again.

Peter_Viscarola_OSR · April 23, 2013, 10:13pm

Hmmmm… I beg to differ. The overhead that matters in METHOD_xxx_DIRECT is not what’s required to map the user buffer, but rather the overhead implicit in UNmapping the user data buffer from kernel virtual address space, which requires invalidating the TLB. While TLB flushing is handled *much* better in Windows than it was “in the old days”, this operation is still expensive… at least it is to the best of my knowledge. TLB invalidation costs overall system performance and is hard to measure.

OSR_Community_User · April 24, 2013, 2:13am

But his driver would have to reach out to those user pages the pointer
references, make sure they are paged in and locked down, and upon the
completion of the IRP would have to unmap the pages. I saw the error as
treating that sentence as meaning “use buffered I/O because it has less
overhead”, but he did not understand that his “buffered alternative” did
not solve the REAL problem of locking pages into memory, which is a
prerequisite for successful performance of the operation. Any program can
be “more efficient” if it omits costly but necessary steps. What he
missed was that the page-in and locking are required, no matter who does
them, and the unmapping is still required. But instead of handing off this
cumbersome work to a kernel component that knows how to do it right, he
chose not to do it, which gives him the desired performance by avoiding
that bugaboo of all coding: that the code be correct (some bean counters
are fussy that way; they think the company should ship a product that
works).

It is indeed an “optimization”, whose only downside is that the optimized
code is no longer a valid device driver, and is best thought of as a BSOD
generator, and it doesn’t even do THAT well, since accidents of timing
can mean that it mostly fails to create a BSOD each time it is called.

Here’s an optimized tax form program:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("You owe the government $%8.2f\n", ((double)rand()) / 100.0);
    return 0;
}
This is a classic “Unix-efficient” program: small, fast, and wrong.

OSR_Community_User · April 24, 2013, 3:32am

The problem as I see it is that this is a classic Unix optimization. The
result is a driver that is small, fast, and wrong. The OP read the advice
that direct mode is less efficient, and solved it by not using it, thus
avoiding all those ghastly overheads of bringing all the pages in and
locking them down, then unmapping them. Hey, look at this! It runs faster
when it’s wrong!

[and before someone hijacks this thread to a *nix-vs.-Windows flame war, I
worked in Unices for 15 years and had to deal with software which, on a
nearly daily basis, exhibited one or more pathologies which could be
summed up by the authors as “What does it matter if it’s wrong? It’s
small and fast!”.]

Here, for example, is my highly-efficient tax program:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("You owe the government $%8.2f\n", ((double)rand()) / 100.0);
    return 0;
}

This kind of solution was first pointed out by H. L. Mencken, who said,
“For every complex problem, there is an answer which is clear, simple, and
wrong.”

And this driver exemplar isn’t even a representative of a clear or simple
solution. It involves passing a pointer to a user buffer along with values
like “offset” and “length” to select a substream of data bytes. If this
were converted to use METHOD_xxx_DIRECT (I think “out”, but the stupid
names given to the parameters are always confusing) then the kernel would
have to lock all of the pages in the buffer down. And, since the OP
didn’t want us to hurt our brains by trying to do arithmetic, he failed to
say things like “The data being sent down is rarely more than 60 bytes,
100 at most” or “The data being sent down is rarely more than 60 pages,
100 at most” or “The data being sent down is rarely more than 60
megabytes, 100 at most”, nor is the size of the buffer specified relative
to the contents being transmitted. Does each call transfer 0.1% of the
buffer, or 1% or 10% or 50%? Parameters like these usually form the
basis for technical tradeoffs. If the buffer is 10MB being sent out in
30-byte chunks, the whole notion of locking down the entire buffer is
nonsensical. If instead the application computes the start of each chunk
and passes a length of 30, the I/O Manager will typically lock down one
page, and occasionally two.

At that point the “micro-optimization” is just that: a pointless waste of
programming time to save nothing important. If the general case is that
90% of the 10MB buffer is transferred, the “optimization” becomes entirely
nonsensical. And to be correct (which the current code appears not to be
concerned about at all), a driver handed a buffer pointer, offset, and
length obliges the I/O Manager to lock down what might be a giant buffer,
transact I/O on some tiny subset, and unmap the gigantic buffer. The REAL
optimization is to use direct mode for the “data” parameter, specifying
the actual start address and the length in bytes. This will mean that
only the needed pages need to be loaded and locked.
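The page-count argument is easy to check with arithmetic. The sketch below assumes 4 KB pages and uses the same formula as the WDK's ADDRESS_AND_SIZE_TO_SPAN_PAGES macro; pages_spanned is a hypothetical helper for illustration, not a WDK function, and the addresses in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 4 KB pages, as on x86 and x64 Windows. */
#define PAGE_SHIFT_4K 12u
#define PAGE_SIZE_4K  (1u << PAGE_SHIFT_4K)

static_assert(PAGE_SIZE_4K == 4096u, "4 KB pages assumed");

/* Number of pages touched by [va, va + size): offset within the first
   page, plus the length, rounded up to whole pages. */
static uint64_t pages_spanned(uint64_t va, uint64_t size)
{
    uint64_t byteOffset = va & (PAGE_SIZE_4K - 1);
    return (byteOffset + size + (PAGE_SIZE_4K - 1)) >> PAGE_SHIFT_4K;
}
```

With these numbers, a 30-byte chunk locks one page, or two when it straddles a page boundary, while locking a whole page-aligned 10MB buffer means 2,560 pages.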

It is neither optimization nor pessimization to do those things that
establish the necessary preconditions for valid code execution. Those
operations are there because they need to be, for correctness. A different
design would entail more, or fewer, of such operations. The tradeoffs
become “which of these choices allows me to achieve my goal in the most
cost-effective manner?” The problem with the OP’s code is that there are
NO calls to establish necessary preconditions. Putting a pointer to the
subsequence of bytes as a direct “data” parameter leaves only the
mysterious “width” parameter, which should be the only one passed in.
This would mean that only the subsequence of bytes actually being used
needs to be locked down, and the size of the buffer is irrelevant.

The tradeoffs are not between “more efficient” and “less efficient” code
in the abstract; the tradeoffs are between “efficient and hopelessly
incorrect code” and “Less efficient but guaranteed correct code”. Hmmm.
Difficult Choice. I’ll Have To Think About that…
joe

Peter_Viscarola_OSR · April 24, 2013, 8:57am

Thanks for the dissertation. Always entertaining.

I didn’t claim the use of an embedded pointer was a good thing. It’s obviously not. In fact, I was contributing to thread drift and not commenting on the OP’s issue at all.

I was merely responding to Mr. Roberts’s comment on the WDK docs mentioning the overhead difference between buffered and direct I/O. He characterized the difference as “incredibly small” and seemed to focus, as most people do in evaluating buffered vs direct, on the relative cost of setting up the operations. I was reminding him, and all reading along, that the real cost difference is in the tear-down of the extra mapping when the TLB needs to be flushed.

That’s all…

Peter
OSR

Maxim_S_Shatskih · April 25, 2013, 5:18am

>values such as an address, width, count, and finally a pointer to the data buffer

This is called “method neither”.

To handle this correctly, you need:

1. Declare the pointer in the IOCTL input structure as a 64-bit entity, always, even for 32-bit code.
2. Use IoAllocateMdl+MmProbeAndLockPages with a try/except frame in the driver to access the pointer.
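A minimal sketch of the first point, assuming the parameter set from the OP's driverParam array (DEVICE_WRITE_REQUEST and its field names are hypothetical): the user address always travels in a uint64_t, and an explicit reserved field keeps the layout identical for 32-bit and 64-bit callers. This only fixes the size problem; a driver must still probe and lock the buffer, as in the second point, before touching it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IOCTL input layout: the user address is always 64 bits. */
typedef struct {
    uint32_t offset;
    uint32_t count;
    uint32_t width;
    uint32_t reserved;    /* explicit padding keeps userBuffer 8-aligned */
    uint64_t userBuffer;  /* user-mode address, zero-extended if 32-bit */
} DEVICE_WRITE_REQUEST;

/* Same size and field offsets on x86 and x64 builds. */
static_assert(sizeof(DEVICE_WRITE_REQUEST) == 24, "unexpected size");
static_assert(offsetof(DEVICE_WRITE_REQUEST, userBuffer) == 16,
              "unexpected offset");

/* Application side: store a pointer without truncating it. */
static void set_user_buffer(DEVICE_WRITE_REQUEST *req, const void *buffer)
{
    req->userBuffer = (uint64_t)(uintptr_t)buffer;
}

/* Reader side: recover the pointer. In a driver this is still an
   untrusted user address that must be probed and locked before use. */
static const void *get_user_buffer(const DEVICE_WRITE_REQUEST *req)
{
    return (const void *)(uintptr_t)req->userBuffer;
}
```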

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Pavel_Lebedinsky · April 26, 2013, 1:03pm

To clarify Maxim’s #2 above, when you are dealing with raw user pointers (as in METHOD_NEITHER, or pointers embedded in other structures) you have two options:

a) IoAllocateMdl+MmProbeAndLockPages+MmGetSystemAddressForMdlSafe, then access the contents of the user buffer through the returned system mapping.

b) ProbeForRead/ProbeForWrite, then access the user buffer directly under try/except.

If you change your design to use a single contiguous buffer instead of embedded pointers then you can avoid all this and just use METHOD_BUFFERED (or direct).

Doron_Holan · April 26, 2013, 1:11pm

WdfRequestProbeAndLockUserBufferForRead/Write (http://msdn.microsoft.com/en-us/library/windows/hardware/ff549987(v=vs.85).aspx ) does (a) for you and then unmaps the buffer when the request is completed.

d

OSR_Community_User · April 28, 2013, 1:57am

Which doesn’t solve the basic design failure of thinking pointers are 32 bits, which has a trivial solution: use the “data” parameter to point to the data! Since the suggested solution has the same overhead as direct I/O without the convenience or forward compatibility, it feels a bit like suggesting library paste to attach the wings to the pig. It works, sort of, but the failure modes can be catastrophic (e.g. moving to 64 bit), so why use a complex solution that has the same runtime costs as METHOD_OUT_DIRECT but costs a whole lot more to implement? Note also that in buffered mode the driver has to reconstruct the address of the data from uninterpreted data bits, add the offset, and then use only length bytes, which seems to be needless complexity in the driver when it can be done more cheaply in application space. Lacking any useful data, it is hard to tell what the tradeoffs are, but passing an offset and length in the first buffer, plus a pointer to the base and the length of the entire buffer, means potentially a lot of pages may need to be locked down instead of only the page(s) needed.
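The page-locking point can be quantified with a little arithmetic. This sketch counts the pages that probing and locking [va, va + size) would pin, in the same spirit as the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES. It assumes 4 KB pages; span_pages and SKETCH_PAGE_SIZE are hypothetical names, not WDK definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages, as on x86/x64 Windows. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of distinct pages touched by the byte range [va, va + size). */
static uint64_t span_pages(uint64_t va, uint64_t size)
{
    if (size == 0)
        return 0;
    uint64_t first = va / SKETCH_PAGE_SIZE;              /* page of first byte */
    uint64_t last  = (va + size - 1) / SKETCH_PAGE_SIZE; /* page of last byte */
    return last - first + 1;
}
```

For a page-aligned 1 MB base buffer, locking the whole thing pins 256 pages, while a 64-byte range inside it pins one page, or two if the range straddles a page boundary.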

MSDN says buffered mode is cheaper, but it doesn’t say how much cheaper, or whether there is a particular point where the cost of the buffer copy exceeds the cost of direct I/O. Lacking any quantitative data on the tradeoffs, it is hard to see how ANY decision could be made. My advice has been to treat driver complexity as the dominant parameter. Once you have a driver you can instrument it, and if you discover that it impacts app or system performance then, and only then, do you think about adding driver complexity. Otherwise, the KISS principle should apply.

Another major failure is to think of DeviceIoControl as the user interface. In a clean design that is meant for wide distribution, you should provide a DLL whose interface is something like

BOOL WriteToMyDevice(LPVOID addr, SIZE_T offset, SIZE_T length);

and that is the ONLY supported interface for apps. The version 1.0 driver comes with the version 1.0 DLL, and the version 2 driver comes with the version 2 DLL. Thus the driver writer is free to implement whatever works best, and the only compatibility issue is what the exported interface looks like. But if you bind unnecessary details of the driver design into the driver interface, then you may have to keep supporting an ill-designed or totally inappropriate interface lest all existing apps break. If changing the implementation of the interface would break the apps, then the interface is designed incorrectly.
joe
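A minimal sketch of the DLL-facade idea, under stated assumptions: the wire header WIRE_HEADER_V1 is hypothetical; fake_device_ioctl and g_last_call stand in for the real DeviceIoControl call so the example runs anywhere; and offset is read as an offset into the caller's buffer, following the earlier point about doing that arithmetic in application space. Portable types replace LPVOID, SIZE_T and BOOL. Only WriteToMyDevice would be exported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Private to the DLL (hypothetical layout): apps never see it, so it can
   change together with the driver without breaking them. */
typedef struct WIRE_HEADER_V1 {
    uint32_t version;
    uint32_t reserved;
    uint64_t length;   /* bytes of data the driver should write */
} WIRE_HEADER_V1;

/* Stand-in for DeviceIoControl: records what the driver would receive.
   A real DLL might instead issue a METHOD_IN_DIRECT IOCTL with the header
   as the input buffer and the data range as the second (direct) buffer. */
static struct {
    WIRE_HEADER_V1 hdr;
    const void *data;
    size_t data_len;
} g_last_call;

static bool fake_device_ioctl(const WIRE_HEADER_V1 *hdr,
                              const void *data, size_t data_len)
{
    g_last_call.hdr = *hdr;
    g_last_call.data = data;
    g_last_call.data_len = data_len;
    return true;
}

/* The only entry point apps use. The offset is applied here, in
   application space, so the driver only sees the bytes it needs. */
bool WriteToMyDevice(const void *addr, size_t offset, size_t length)
{
    if (addr == NULL || length == 0)
        return false;
    WIRE_HEADER_V1 hdr = { 1u, 0u, (uint64_t)length };
    const unsigned char *start = (const unsigned char *)addr + offset;
    return fake_device_ioctl(&hdr, start, length);
}
```

Because applications link only against WriteToMyDevice, a later DLL/driver pair could change the wire format or the transfer method without recompiling any application.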

OSR_Community_User · April 28, 2013, 9:51pm

> To clarify #2 below, when you are dealing with raw user pointers (as in
> METHOD_NEITHER, or pointers embedded in other structures) you have two
> options:
>
> a) IoAllocateMdl+MmProbeAndLockPages+MmGetSystemAddressForMdlSafe, then
> access the contents of the user buffer through the returned system
> mapping.
>
> b) ProbeForRead/ProbeForWrite, then access the user buffer directly under
> try/except.
>
> If you change your design to use a single contiguous buffer instead of
> embedded pointers then you can avoid all this and just use METHOD_BUFFERED
> (or direct).
****
In the context of the OP’s question, because the data to be sent is in the user buffer, METHOD_BUFFERED cannot be used: all it can do is write to the data buffer; the contents are not copied in when the call is made (nor could they be, because only one physical buffer is allocated in the kernel). So only direct mode will work. And since it will have the same teardown cost as the simulation of “neither”, the driver is easier to write.
joe

(This is the case where the “output buffer”, which we are told to think of as “the input buffer”, is really an output buffer, except that while it is already called output, it is, from the viewpoint of the app, supposedly input, but it isn’t, it really is output. Sorry, if I don’t stop here my head is going to explode.)
*****
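The single-buffer behaviour described above can be modelled in a few lines of user-mode C. This is a simulation of the documented I/O manager behaviour for METHOD_BUFFERED, not Windows code; simulate_method_buffered, demo_driver and g_seen are hypothetical names. It shows why data placed only in the "output" buffer never reaches the driver. With METHOD_IN_DIRECT the I/O manager instead probes that same buffer for read access and describes it with an MDL, which is why the original poster's switch worked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The "driver" gets the single system buffer and returns how many bytes to
   copy back (what a real driver reports in IoStatus.Information). */
typedef size_t (*dispatch_fn)(unsigned char *system_buffer,
                              size_t in_len, size_t out_len);

/* Simulation of METHOD_BUFFERED: ONE buffer of max(in_len, out_len). */
static int simulate_method_buffered(const void *in, size_t in_len,
                                    void *out, size_t out_len,
                                    dispatch_fn driver)
{
    size_t sys_len = in_len > out_len ? in_len : out_len;
    /* The model zero-fills; a driver should not treat bytes past in_len
       as caller data. */
    unsigned char *sys = calloc(sys_len ? sys_len : 1, 1);
    if (sys == NULL)
        return -1;
    if (in_len > 0)
        memcpy(sys, in, in_len);        /* the input buffer IS copied in */
    /* the caller's output buffer is NOT copied in */
    size_t information = driver(sys, in_len, out_len);
    if (information > out_len)
        information = out_len;
    if (information > 0)
        memcpy(out, sys, information);  /* copied back out on completion */
    free(sys);
    return 0;
}

/* Demo driver: remembers the first byte it finds in the system buffer. */
static unsigned char g_seen;

static size_t demo_driver(unsigned char *system_buffer,
                          size_t in_len, size_t out_len)
{
    (void)in_len;
    (void)out_len;
    g_seen = system_buffer[0];
    return 0;
}
```

Calling it with data only in the output buffer leaves demo_driver looking at zeros; moving the same byte into the input buffer makes it visible.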
