When do BUFFERED and NEITHER methods break even?

Hi guys!

Does anybody know the (approximate) magic number at which METHOD_BUFFERED breaks even with METHOD_NEITHER as far as performance is concerned?
I vaguely remember something like 64 bytes or so as the point where the overhead of copying back and forth breaks even with the overhead of building and locking MDLs, but that was too long ago. So, are there any guidelines on when to use which method?

I remember that 1 page is the approximate boundary above which _DIRECT IOCTLs are
faster than _BUFFERED.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> Does anybody know (approximate) magic number when METHOD_BUFFERED breaks even
> with METHOD_NEITHER as far as performance is concerned?

If you use METHOD_NEITHER, you have to do all the necessary validation yourself, which is going to offset the potential speed-up from eliminating the copy to and from the system buffer. As Don Burn has said, he asks his developers to present a document listing all the pros and cons whenever they want to use METHOD_NEITHER and, according to Don, in most cases they come to the conclusion that METHOD_NEITHER offers no advantage in the given situation. In other words, think it over again: there is a good chance that METHOD_BUFFERED is the more reasonable approach in your situation…

Anton Bassov

Thanks, Maxim! One page seems like too much, but then again, it was ten years ago that I heard the number. I guess I’ll have to run some stats myself :(

Anton, there are cases where one buffering method is more suitable than another, and for large buffers DIRECT or NEITHER is more suitable than BUFFERED. That’s all. Of course, as usual, if you work with buffers located in UM (even if they are locked), you have to be careful not to trust anything in those buffers (such as embedded offsets or lengths). But as long as you remember this rule and follow it religiously, you can freely choose one method over another and benefit from your choice depending on the scenario.
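
To make the "embedded offsets or lengths" rule concrete, here is a minimal sketch in portable C rather than driver code. The `request_header` layout and the `locate_payload` helper are hypothetical, invented for illustration. The header is copied into a local before checking, so the values that are validated are the values that get used even if the caller rewrites the buffer concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout: a fixed header followed by a payload that
   the header locates through an embedded offset and length. */
typedef struct {
    uint32_t payload_offset;  /* from the start of the buffer */
    uint32_t payload_length;  /* in bytes */
} request_header;

static_assert(sizeof(request_header) == 8, "header is two 32-bit fields");

/* Returns a pointer to the payload inside buf, or NULL if the header is
   not consistent with buf_size. */
static const uint8_t *locate_payload(const uint8_t *buf, size_t buf_size,
                                     uint32_t *out_length)
{
    request_header hdr;

    if (buf == NULL || out_length == NULL || buf_size < sizeof hdr)
        return NULL;

    memcpy(&hdr, buf, sizeof hdr);        /* capture once, then validate */

    if (hdr.payload_offset < sizeof hdr)  /* must not overlap the header */
        return NULL;
    if (hdr.payload_offset > buf_size)    /* offset must lie in the buffer */
        return NULL;
    /* Compare against the remaining space instead of adding offset and
       length, so the check itself cannot wrap around. */
    if (hdr.payload_length > buf_size - hdr.payload_offset)
        return NULL;

    *out_length = hdr.payload_length;
    return buf + hdr.payload_offset;
}
```

In a real driver the same checks would run on a captured copy of the header, inside whatever exception handling the access to user memory requires; that part is outside this sketch.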

No one is arguing about BUFFERED versus DIRECT, though your assumption
that a page is too much is incorrect AFAIK. The overhead of setting up
a DIRECT call is large enough that you could copy a page instead. As Anton
pointed out, I ask developers to justify method NEITHER, and they quickly
discover they can’t. The amount of pain to get it right is just too much,
and for those who have used method NEITHER, a good code review has always
found bugs that cause crashes.


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply

> Anton, there are cases when one buffering method is more suitable than another.
> And for large buffers one of DIRECT or NEITHER methods are more suitable than
> BUFFERED. That’s all. Of course, as usual, if you work with buffers located in
> UM (even if they are locked) you have to be careful not to trust anything in
> these buffers (like embedded offsets or lengths), but as long as you remember
> this rule and follow it religiously you can freely choose one over another, and
> benefit from your choice depending on the scenario.

To begin with, METHOD_DIRECT and METHOD_NEITHER are entirely different things. No one argues that METHOD_DIRECT works better than METHOD_BUFFERED for large transfers. However, this is not necessarily the case with METHOD_NEITHER. Please don’t forget that “religiously following” the above rule does not come for free: the amount of validation you have to do will, in most cases, offset any potential benefit. Certainly, there are some situations where METHOD_NEITHER applies. For example, consider a scenario where, for one reason or another, you have to pass a linked list of buffers with an IOCTL (say, the data is so large that you cannot be sure the client app will always find contiguous address space for it). In that case it is indeed better to use METHOD_NEITHER than to send a separate request for each buffer.
However, these situations are not that frequent, and I am not 100% sure that your particular situation is among them…

Anton Bassov

Don: I’ll gladly present my code for your review. Just kidding ;)

Anton: the only significant difference between the DIRECT and NEITHER methods is that for the former IOMgr takes care of building the MDL for the buffer, while for the latter that is your responsibility. Other than that, there are no differences in how you handle these types of I/O. Am I missing something?

Yes, you don’t have a clue about how much validation the I/O Manager is
doing with the buffers. If your code has not been reviewed by a set of really
good developers (and re-reviewed every time you touch it afterward), it is
probably a disaster waiting to happen.


Don Burn (MVP, Windows DDK)

Don, if you really care and want to make this discussion constructive, then just give me some of these clues that I don’t have. Otherwise, if you don’t care, why bother replying to clueless posts?

You stated that all that METHOD DIRECT does over METHOD NEITHER is build an
MDL. Sorry, that is not the case. If you can really justify using METHOD
NEITHER, fine, but expect a lot of bugs, since validating the buffers is not
a small thing. If it were, we would not hear of all the buffer
overflow security holes that continue to appear in the various OSes.

So far you have not addressed the issue of why you think you need METHOD
NEITHER. Even having worked on systems that deliver high speed video, I
have never encountered a justified use of this method.

If you want to do this, then go read some of the better security books on
buffer validation. Take a lot of time and expect to have a lot of code
when you are done. Finally, if this is for an IOCTL, use the device path
exerciser with Driver Verifier on it, running the code for a long time (I
recommend a few days) to find the bugs that always seem to be there.


Don Burn (MVP, Windows DDK)

Don: We’re talking about different things. You’re saying that using NEITHER requires justification. I totally agree, and I don’t use it very often, simply because in most cases one of the DIRECT methods is all I need. However, I want to use the NEITHER method to tune my inverted-call machinery, so that the IRP that delivers a (potentially large) UM response can also pick up a (potentially large) KM request. This can improve overall inverted-call performance when many KM requests arrive quickly one after another. I know the performance benefit may not be significant enough to fully justify using the NEITHER method, but it’s one of the cases where NEITHER seems to be beneficial. So, why not benefit from it?
I do use the device path exerciser in my tests. Maybe not as extensively as for many days, but certainly for overnight runs. And I did have some bugs related to missing buffer validation, but nothing overwhelming enough to justify not using NEITHER. So, if you could name the top five (or three, or whatever number you want) checks that IOMgr does on DIRECT and that devs typically don’t do on NEITHER, it would be much more beneficial for all of us than just continuing this discussion of preferences.

> Anton: the only significant difference between DIRECT and NEITHER methods is
> that for the first one IOMgr will take care of MDLing the buffer while for the
> second one it’s your responsibility.

Actually, using an MDL defeats the very purpose of METHOD_NEITHER, don’t you think?
METHOD_NEITHER is meant to be used by the highest-level drivers so that they can access the target buffer directly by its virtual address in the context of the calling process. Otherwise, METHOD_NEITHER just turns into METHOD_DIRECT.

In general, you should realize that the I/O method, in practical terms, refers to how your driver *actually* does I/O rather than to the formal declaration. For example, let’s say you use METHOD_BUFFERED and pass the addresses of some other buffers in the system buffer. If you build MDLs for them, your *actual* I/O will be METHOD_DIRECT, and if you access them directly, it will be METHOD_NEITHER. Therefore, a single request may contain additional subrequests that *in actuality* rely upon I/O methods different from the one the main request uses…
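
For reference, the "formal declaration" is nothing more than the low two bits of the control code. The sketch below reproduces the layout of the Windows SDK `CTL_CODE` macro and the `METHOD_*` values under local `SK_` names (an assumption made so that it compiles anywhere without the SDK headers); the function code 0x800 is a hypothetical example.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the Windows SDK definitions, renamed with an SK_
   prefix so the sketch does not clash with the real headers. */
enum {
    SK_METHOD_BUFFERED   = 0,
    SK_METHOD_IN_DIRECT  = 1,
    SK_METHOD_OUT_DIRECT = 2,
    SK_METHOD_NEITHER    = 3
};

static_assert(SK_METHOD_NEITHER == 3, "the method fits in two bits");

#define SK_FILE_DEVICE_UNKNOWN 0x00000022u
#define SK_FILE_ANY_ACCESS     0u

/* Same bit layout as CTL_CODE: device type, access, function, method. */
#define SK_CTL_CODE(dev, func, method, access)                \
    (((uint32_t)(dev) << 16) | ((uint32_t)(access) << 14) |   \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

/* Extract the declared transfer method from a control code. */
static uint32_t sk_method_of(uint32_t ioctl_code)
{
    return ioctl_code & 3u;
}
```

With these definitions, `SK_CTL_CODE(SK_FILE_DEVICE_UNKNOWN, 0x800, SK_METHOD_NEITHER, SK_FILE_ANY_ACCESS)` evaluates to 0x222003 and the BUFFERED variant to 0x222000. Nothing in those two bits says how the driver treats pointers it finds inside the buffer, which is the point made above.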

Anton Bassov

So, I ran some tests ;) and got puzzling (well, puzzling to me) results. All tests were run as follows:
For each pair (method, buffer_size), in a time-restricted loop (4 seconds for each pair), I issued and counted synchronous DeviceIoControl calls to my test driver (no verifier attached). For each IOCTL, the in and out buffers had the same size (buffer_size) and both were page-aligned. In the driver, for the NEITHER method I create an MDL, lock it and get the mapped system address. After that (and in the BUFFERED case, starting with this) I just complete the IRP with IO_STATUS.Information set to the IOCTL’s input buffer length. So there is no data copy involved, just a measurement of the pure minimal overhead.
The result was sort of expected, with one puzzling exception: a sudden (and significant) drop in BUFFERED performance in the 1.5K to 3.5K buffer size region. This drop is pretty consistent. No matter how many times I run the test, it’s always there for at least one buffer size in this range. The machine I ran this test on has a single HT CPU and 3G of memory. I should probably Kernrate the test to get an idea of where the drop comes from, but before starting that extra work I’d rather ask the experts ;) So, does anybody have an idea why I see what I see?
Here are the stats that I collected:

Method: BUFFERED, Buffer size = 6144, Requests 336257, Rate = 84 req / msec
Method: NEITHER, Buffer size = 6144, Requests 387828, Rate = 96 req / msec

Method: BUFFERED, Buffer size = 5632, Requests 341320, Rate = 85 req / msec
Method: NEITHER, Buffer size = 5632, Requests 387961, Rate = 96 req / msec

Method: BUFFERED, Buffer size = 5120, Requests 341749, Rate = 85 req / msec
Method: NEITHER, Buffer size = 5120, Requests 387452, Rate = 96 req / msec

Method: BUFFERED, Buffer size = 4608, Requests 345939, Rate = 86 req / msec
Method: NEITHER, Buffer size = 4608, Requests 389834, Rate = 97 req / msec

Method: BUFFERED, Buffer size = 4096, Requests 350218, Rate = 87 req / msec
Method: NEITHER, Buffer size = 4096, Requests 402132, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 3584, Requests 380733, Rate = 95 req / msec
Method: NEITHER, Buffer size = 3584, Requests 400632, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 3072, Requests 110274, Rate = 27 req / msec
Method: NEITHER, Buffer size = 3072, Requests 400716, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 2560, Requests 122674, Rate = 30 req / msec
Method: NEITHER, Buffer size = 2560, Requests 402039, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 2048, Requests 123675, Rate = 30 req / msec
Method: NEITHER, Buffer size = 2048, Requests 400389, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 1536, Requests 420653, Rate = 105 req / msec
Method: NEITHER, Buffer size = 1536, Requests 402658, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 1024, Requests 431754, Rate = 107 req / msec
Method: NEITHER, Buffer size = 1024, Requests 402271, Rate = 100 req / msec

Method: BUFFERED, Buffer size = 512, Requests 457249, Rate = 114 req / msec
Method: NEITHER, Buffer size = 512, Requests 402445, Rate = 100 req / msec
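
The measuring loop described above can be sketched roughly as follows. This is an illustration, not the code that produced the numbers: `request_fn`, `count_requests` and `noop_request` are hypothetical names, the no-op stands in for a synchronous DeviceIoControl call, and the clock is plain wall-clock time. The truncating division in `rate_per_msec` is an assumption that happens to be consistent with the reported figures (for example 336257 requests in 4000 msec giving 84).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef void (*request_fn)(void *ctx);  /* hypothetical: issues one request */

/* Wall-clock time in milliseconds. */
static int64_t now_msec(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Issue requests back to back until duration_msec has elapsed and
   return how many completed. */
static uint64_t count_requests(request_fn issue, void *ctx,
                               int64_t duration_msec)
{
    assert(issue != NULL && duration_msec > 0);

    const int64_t deadline = now_msec() + duration_msec;
    uint64_t completed = 0;

    do {
        issue(ctx);
        ++completed;
    } while (now_msec() < deadline);

    return completed;
}

/* Whole requests per millisecond, truncated. */
static uint64_t rate_per_msec(uint64_t requests, uint64_t duration_msec)
{
    return requests / duration_msec;
}

/* Stand-in for a synchronous DeviceIoControl call. */
static void noop_request(void *ctx)
{
    (void)ctx;
}
```

Reading the clock on every iteration adds its own cost to each request, and the loop measures elapsed time rather than CPU time, so whatever else the scheduler runs during the interval ends up in the count.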

I guess I don’t really follow the discussion: The delta between METHOD_NEITHER and METHOD_BUFFERED will be exactly:

a) I/O Manager overhead required to validate the buffers (no validation for METHOD_NEITHER);
b) I/O Manager overhead required to allocate pool and copy the data (no copy for METHOD_NEITHER)

Given that your driver will have to do its own buffer validation for the neither I/O buffer, and assuming you can write code as well as the Windows folks, the overhead due to that checking will offset the I/O Manager’s overhead to do the same.
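
As a rough illustration of the kind of check "its own buffer validation" starts with, here is a simplified, portable model of a user-range test: alignment, no wrap-around, and an upper bound on the address range. It is not the kernel routine; `user_range_ok` and its `user_limit` parameter are hypothetical, and real driver code would additionally have to guard every access to the buffer against faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: is [base, base + length) aligned and entirely below
   user_limit? A zero-length range is accepted because nothing in it will
   be touched. */
static bool user_range_ok(uintptr_t base, size_t length,
                          size_t alignment, uintptr_t user_limit)
{
    /* A bad alignment argument is a programming error, not bad input. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    if (length == 0)
        return true;
    if ((base & (alignment - 1)) != 0)
        return false;                    /* misaligned start */
    if (base >= user_limit)
        return false;                    /* starts outside user space */
    if (length > user_limit - base)
        return false;                    /* runs past the limit or wraps */
    return true;
}
```

Passing this test says nothing about whether the pages are actually present or stay mapped while the driver uses them, which is a large part of why the remaining work is easy to get wrong.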

Thus, you’re left with a direct, linear increase in (CPU TIME) cost related to the size of the data that is copied. So, the answer to the OP’s question about where neither breaks even with buffered in an IOCTL is that it never does. Neither is always going to be faster (unless your driver has to allocate a buffer from non-paged pool and copy the data from the METHOD_NEITHER buffer into it).

The problem with trying to measure “wall clock” time (as done in this experiment) is you need to account for all the anomalies that the scheduler can introduce.

Whether it is WISE to use METHOD_NEITHER is an entirely different question. I agree with Don (et al): Don’t do it unless you are certain that you know what you’re doing. There are validation and security issues that need to be properly dealt with. I am of the opinion that METHOD_NEITHER IOCTLs are a hideous plague upon humankind, and have at points even gone as far as advocating removal of this support from Windows. Seriously.

Peter
OSR

Sorry, your sizes are just way too small to justify using METHOD_NEITHER.
The only times I have seen anything near justification for using it was in
a system that handed in buffers of 4MB and larger at a time. Of course, such
big sizes do not work for METHOD_BUFFERED, so comparing things this way does
not work. What does work is comparing METHOD_NEITHER, including all the code
you need to validate things, with METHOD_IN_DIRECT or
METHOD_OUT_DIRECT.

As Peter pointed out, wall clock timing runs into a number of scheduler
problems. About the best you can do is run the tests on a system with as
much as possible in the way of user space stuff shut down (i.e. all services
you can) and, if possible, using the EMS console so there is no graphics or
shell. Then run the test for 24 hours; this will at least give you a good
average to study.


Don Burn (MVP, Windows DDK)

Don, the buffer sizes you see in the stats are not there to justify using NEITHER. It just happens that it is indeed close to the page size where BUFFERED and NEITHER start to break even on the “minimal required handling”. So I simply threw out the “upper part” of the stats, which started at around 1M buffer size.

Now, guys, I’m not stubborn (although, at times, I may seem like it), and I do value the opinion of people who are more knowledgeable and experienced than I am. And I listen to their advice. However, having been in this business (FS-stack drivers) for ten years and having done some real projects, I have also heard a number of “urban legends”, even from experienced and authoritative guys, that turned out to be not as scary as they seemed. So, with all due respect, I’m much more interested in knowing what is really going on behind the “buffer validation” that IOMgr does on DIRECT methods than in blindly following the advice “just don’t do it”. Yeah, I’m convinced that for what I currently need DIRECT is good enough, but I’m still unsatisfied in my quest for knowledge ;)

Cheers

To go back to your original problem: if performance is the thing you are
most concerned about, I would recommend you figure out a maximum in-flight
request value (MIR), then allocate a buffer of 2*MIR*(sizeof(data)), pass
that in as DIRECT, map the data and pend the call. Then, for the
individual calls, you pass a buffer of 4 bytes, which is the index into the
data area from the DIRECT call you passed. These later calls should be
BUFFERED. On shutdown of the application you cancel the DIRECT IOCTL.

This will eliminate the overhead you are so worried about, without using
things like NEITHER that are hard to get right. Of course there are
additional optimizations such as multiple buffer indexes in a single IOCTL
that can be experimented with.
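
A user-mode model of the indexing part of this scheme might look like the sketch below. It only illustrates the bookkeeping: the heap allocation stands in for the buffer that the pended DIRECT request would map, and `MAX_INFLIGHT`, the 256-byte `data_slot` and all function names are assumptions made for the example. The point is that the small BUFFERED request carries nothing but a 4-byte index, so the only thing left to validate per call is a single bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { MAX_INFLIGHT = 8 };               /* "MIR"; the value is assumed */
enum { SLOT_COUNT = 2 * MAX_INFLIGHT };  /* 2*MIR slots, as suggested */

typedef struct { uint8_t bytes[256]; } data_slot;  /* hypothetical payload */

typedef struct {
    data_slot *slots;   /* stands in for the mapped DIRECT buffer */
    uint32_t   count;   /* number of slots in the region */
} shared_region;

/* Allocate the region once, up front. Returns nonzero on success. */
static int region_init(shared_region *r)
{
    assert(r != NULL);
    r->slots = calloc(SLOT_COUNT, sizeof(data_slot));
    r->count = (r->slots != NULL) ? SLOT_COUNT : 0;
    return r->slots != NULL;
}

/* Tear the region down, as when the pended request is cancelled. */
static void region_free(shared_region *r)
{
    assert(r != NULL);
    free(r->slots);
    r->slots = NULL;
    r->count = 0;
}

/* Resolve the 4-byte index carried by a small request to its slot.
   Returns NULL for any index outside the region. */
static data_slot *slot_from_index(const shared_region *r, uint32_t index)
{
    if (r == NULL || r->slots == NULL || index >= r->count)
        return NULL;
    return &r->slots[index];
}
```

The contents of each slot still come from user mode, so anything read out of a slot deserves the same distrust as any other user buffer.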


Don Burn (MVP, Windows DDK)

> … “buffer validation” that IOMgr does on DIRECT methods, than blindly follow
> advice “just don’t do it”.

IoAllocateMdl + MmProbeAndLockPages

Disassemble NtReadFile yourself.


Maxim Shatskih, Windows DDK MVP

Maxim, this was my impression too (although IOCTL is the subject, not Read). But since Don and Pete suggest that there is much more to it, this makes me suspicious.

The problem is that using METHOD_NEITHER where the first thing you do is
build the MDL and map it just reduces the call to a METHOD_XXX_DIRECT. So
why are you writing code to do something the kernel is going to do for you?

The problem with METHOD_NEITHER is getting it right when you want to do
something other than code METHOD_XXX_DIRECT yourself. Unfortunately, way
too many folks have done this with disastrous results. Even trying to
handle the buffer in smaller pieces can lead to errors and security holes.
My comments on what the IoManager does for you are in relation to your
trying to handle it for this case, since I have never seen the first case,
except in code where someone tried to be tricky and in the end gave up
saying I should have used METHOD_XXX_DIRECT.

So you can use METHOD_NEITHER to code METHOD_XXX_DIRECT yourself. But
that would just make any experienced developer question why and what did he
do wrong. Or you can try to use METHOD_NEITHER to do something fancy, and
almost guarantee you will do something wrong.


Don Burn (MVP, Windows DDK)