ZwMapViewOfSection's BaseAddress is inaccessible

Jamey_Kirby · June 3, 2017, 10:35am

A driver I recently created uses a system thread running in the user-mode
process context. I pass a buffer and several events to the driver in an
IOCTL; no reverse callback or section is needed. This is very fast and
efficient. It takes a little work to make sure termination and cleanup are
clean, but wow, it’s zippy.

On Fri, Jun 2, 2017, 1:06 AM wrote:

>

>
> Well, I don’t see anything wrong with this approach either. However, as
> we can see, some posters prefer to present their “Dr.Flounder -style”
> propaganda/hysteria as a sound technical analysis while blowing the
> existing minor issues totally out of the proportion…
>
>
> Anton Bassov
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list online at: <
> http://www.osronline.com/showlists.cfm?list=ntdev>
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and
> software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at <
> http://www.osronline.com/page.cfm?name=ListServer>

david_gomes · June 4, 2017, 7:30am

@Jamey Kirby can you post some guidelines?

OSR_Community_User · June 4, 2017, 10:42am

If you want to notify a user app that something happened, you can use:

- ETW (Event Tracing for Windows).

https://msdn.microsoft.com/fr-fr/library/windows/desktop/bb968803(v=vs.85).aspx

- PNP custom event notification.

https://msdn.microsoft.com/en-us/library/windows/hardware/ff565465(v=vs.85).aspx

david_gomes · June 4, 2017, 11:01am

@N. D. My problem isn’t notification but passing data. I tried using sections, tried KeStackAttachProcess and directly using the user-mode address, tried KeStackAttachProcess + RtlCopyMemory, tried MmCopyVirtualMemory, and nothing worked! I don’t really want to use an inverted call.

Slava_Imameev · June 4, 2017, 11:06am

This is overkill for the OP, who is after an event shared via a handle passed to the driver. That is accomplished with ObReferenceObjectByHandle, KeSetEvent, and KeWaitForSingleObject.

Peter_Viscarola_OSR · June 4, 2017, 11:55am

Because?

Yet you’re willing to use some of the other complex, heavyweight, difficult, bug-prone methods you’ve listed.

The OS provides I/O requests for use in exactly the situation you want. What is it that makes you need to avoid completing a pending IRP?

Peter
OSR

Jamey_Kirby · June 4, 2017, 12:02pm

Sure. I’ll do it shortly when I get back to my desk.


OSR_Community_User · June 4, 2017, 12:48pm

You can transfer data with a custom PNP event.

https://msdn.microsoft.com/en-us/library/windows/hardware/ff564596(v=vs.85).aspx

Jamey_Kirby · June 4, 2017, 12:48pm

davg556:

1) Set up an IOCTL in your driver.
2) In user mode, allocate your buffer with VirtualAlloc(…, MEM_COMMIT |
MEM_RESERVE).
3) Lock the buffer with VirtualLock().
4) Create your events.
5) Call that IOCTL with your buffer and your events in a data structure.
6) Reference the event handles with ObReferenceObjectByHandle() so that you
can use kernel primitives to access the events. You could skip this and
use functions like ZwSetEvent() rather than KeSetEvent(), but I prefer to
use the native objects when in kernel mode, so I do the reference. It gives
you more flexibility.
7) In the IOCTL, you are running in the process context of the UM app that
called you, so now create a thread with PsCreateSystemThread(), but for the
process parameter, instead of passing NULL for the system process, pass
ZwCurrentProcess(). This will cause the thread to be created in your UM
process’s context. If you run procexp.exe, you’ll see your thread running
as a thread in your process (but owned by your driver). Now, when your
thread is running, it will have access to everything in your process’s
address space.
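
To make step 5 concrete, here is a minimal sketch of what the one-time setup block might look like, written as portable C so it compiles anywhere. Every name in it (shared_setup, the field names, SKETCH_PAGE_SIZE, shared_setup_is_plausible) is hypothetical and not taken from the driver described above; handles are carried as pointer-sized integers purely for illustration, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical layout of the setup block passed once in the IOCTL.
 * Handles are pointer-sized integers so the sketch stays portable. */
typedef struct shared_setup {
    void     *buffer;            /* user-mode buffer from VirtualAlloc() */
    size_t    buffer_length;     /* size of that buffer in bytes */
    uintptr_t start_request;     /* event: driver -> app, work is ready */
    uintptr_t complete_request;  /* event: app -> driver, work is done */
    uintptr_t terminate_start;   /* event: app -> driver, please stop */
    uintptr_t terminate_stop;    /* event: driver -> app, stopped */
    uintptr_t main_thread;       /* handle to the app's main thread */
} shared_setup;

/* Cheap sanity checks a driver could run before probing the buffer and
 * referencing the handles. Returns true when the block looks usable. */
static bool shared_setup_is_plausible(const shared_setup *s)
{
    if (s == NULL || s->buffer == NULL || s->buffer_length == 0)
        return false;
    /* VirtualAlloc() hands back page-aligned memory. */
    if (((uintptr_t)s->buffer % SKETCH_PAGE_SIZE) != 0)
        return false;
    /* Every handle must be present. */
    return s->start_request && s->complete_request &&
           s->terminate_start && s->terminate_stop && s->main_thread;
}
```

These checks do not replace ProbeForWrite() or ObReferenceObjectByHandle(); they only reject obviously malformed input before the real validation runs.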

When your process terminates, your thread will NOT terminate, so you have
to handle that. One event that I pass down to the driver is a terminate
event. When the user-mode app sets this event, it signals the thread to
terminate and do any app-specific cleanup. I pass two events
(terminate_start_event and terminate_stop_event). When my application
wants to terminate, it sets the start event and then waits for the stop
(terminate complete) event in WaitForSingleObject(). This synchronizes the
thread’s termination so that the application does not exit before the
thread does. The driver cleans up and then sets the terminate complete
event.

Abnormal termination is handled by sending, as another parameter to the
IOCTL, the process’s main thread (the thread that is running main(),
WinMain(), or ServiceMain()). This is a fail-safe in the event that the
application terminates abnormally. The driver also waits on this thread
object. If it becomes signaled, do your cleanup and terminate the system
thread.
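
The two termination paths can be sketched as a small decision table. This is a portable model, not kernel code: the enum and struct names are hypothetical, and the choice not to signal the stop event after an abnormal exit is an assumption (nobody is left to wait on it), not something stated above.

```c
#include <stdbool.h>

/* Which waited-on object became signaled (hypothetical ordering). */
typedef enum wake_reason {
    WAKE_TERMINATE_START,   /* app set the terminate start event */
    WAKE_MAIN_THREAD_EXIT,  /* app's main thread object became signaled */
    WAKE_REQUEST_COMPLETE   /* app finished a request */
} wake_reason;

typedef struct wake_action {
    bool do_cleanup;        /* drop references, free per-session state */
    bool signal_stop_event; /* set the terminate complete event */
    bool exit_thread;       /* leave the worker loop */
} wake_action;

/* Map the wake-up reason to what the worker thread should do next. */
static wake_action decide_wake_action(wake_reason reason)
{
    wake_action a = { false, false, false };
    switch (reason) {
    case WAKE_TERMINATE_START:
        /* Orderly shutdown: the app is waiting for the acknowledgement. */
        a.do_cleanup = true;
        a.signal_stop_event = true;
        a.exit_thread = true;
        break;
    case WAKE_MAIN_THREAD_EXIT:
        /* Abnormal exit (assumption): clean up, nobody to acknowledge to. */
        a.do_cleanup = true;
        a.exit_thread = true;
        break;
    case WAKE_REQUEST_COMPLETE:
        /* Normal traffic: keep running. */
        break;
    }
    return a;
}
```

In a real driver the reason would presumably come from the index returned by a multi-object wait such as KeWaitForMultipleObjects().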

That handles termination. Now for request processing. I send in a
start_request event and a complete_request event. When the driver needs
work from UM, it fills in the buffer, sets the start_request event, and
then waits on the complete_request event. UM gets the start_request event,
processes the request in the buffer, and when done, sets the
complete_request event.
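
The request handshake above can be modelled in a few lines of portable C. This is a single-threaded sketch of the protocol only: the two bool flags stand in for auto-reset events, the char array stands in for the shared buffer, and all names (channel, driver_submit, app_service, driver_collect) are hypothetical. The "processing" step just upper-cases ASCII letters so the result is visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUFFER_SIZE 64

typedef struct channel {
    char buffer[SKETCH_BUFFER_SIZE]; /* stands in for the shared buffer */
    bool start_request;              /* stands in for the start_request event */
    bool complete_request;           /* stands in for the complete_request event */
} channel;

/* Driver side: fill the buffer, then signal start_request. */
static bool driver_submit(channel *ch, const char *work)
{
    size_t n = strlen(work);
    if (n >= sizeof ch->buffer || ch->start_request || ch->complete_request)
        return false;               /* too large, or a request is in flight */
    memcpy(ch->buffer, work, n + 1);
    ch->start_request = true;
    return true;
}

/* App side: consume start_request, process in place, signal completion. */
static bool app_service(channel *ch)
{
    if (!ch->start_request)
        return false;               /* nothing to do */
    ch->start_request = false;      /* auto-reset behaviour */
    for (char *p = ch->buffer; *p; ++p)
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - 'a' + 'A');
    ch->complete_request = true;
    return true;
}

/* Driver side: consume complete_request and read the result. */
static const char *driver_collect(channel *ch)
{
    if (!ch->complete_request)
        return NULL;                /* app has not finished yet */
    ch->complete_request = false;
    return ch->buffer;
}
```

Only one request is in flight at a time in this model, which matches the single buffer and single pair of events described.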

You need to wrap access to your UM objects and buffers in a __try __except
block; safety check for bad hombres.

My driver is quite complex, and I pass lots of things to the driver. What I
outlined is a simple case. I did mine the way I did more as an exercise
than for practicality. My goal was to reduce the IOCTLs to the driver to
one.

It was a little tricky to get everything working smoothly, but it is now
rock-solid with excellent data transfer speeds.

I wrote this post in a bit of a rush, so I apologize in advance for my
chicken scratch.

– Jamey


MBond · June 4, 2017, 8:06pm

Note that the call to VirtualLock here is unnecessary and requires UM privileges that may not be present (SeLockMemoryPrivilege). It is good practice of course to do this, but it is not strictly necessary, as in KM all buffers from UM must be probed and locked as a matter of course to prevent security issues.

Note also that signaling an event that a UM thread has been waiting on will require interaction with the scheduler, a KM / UM transition and possibly a context switch. This overhead exactly parallels the most costly parts of an IOCTL, plus an interaction with the scheduler which may not be required for an IOCTL. It is of course possible to avoid these overheads by using only interlocked operations in the shared memory region, but that would mean developing your own version of a CRITICAL_SECTION that can be shared between UM & KM (an inherent security issue) or using a spin wait in UM; neither is an attractive option IMHO.
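
For reference, an "interlocked operations only" handoff might look roughly like the following one-slot mailbox. This is a portable C11 sketch using <stdatomic.h> rather than the Windows Interlocked* routines, it assumes exactly one producer and one consumer, and the names (mailbox, mailbox_try_put, mailbox_try_get) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { SLOT_EMPTY = 0, SLOT_FULL = 1 };

typedef struct mailbox {
    atomic_int state;   /* SLOT_EMPTY or SLOT_FULL */
    uint32_t   payload; /* data guarded by state */
} mailbox;

static void mailbox_init(mailbox *m)
{
    m->payload = 0;
    atomic_init(&m->state, SLOT_EMPTY);
}

/* Producer: write the payload, then publish it with release ordering. */
static bool mailbox_try_put(mailbox *m, uint32_t value)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire) != SLOT_EMPTY)
        return false;               /* previous item not consumed yet */
    m->payload = value;
    atomic_store_explicit(&m->state, SLOT_FULL, memory_order_release);
    return true;
}

/* Consumer: one polling step; acquire ordering makes the payload visible. */
static bool mailbox_try_get(mailbox *m, uint32_t *out)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire) != SLOT_FULL)
        return false;               /* nothing published yet */
    *out = m->payload;
    atomic_store_explicit(&m->state, SLOT_EMPTY, memory_order_release);
    return true;
}
```

A consumer that loops on mailbox_try_get() until it succeeds is exactly the spin wait mentioned above: no scheduler involvement, but a core is burned while waiting.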

I have said many times on this forum that a correct implementation of a shared memory interface with a UM app generally is much more complex and performs no better than the use of IOCTLs. Clearly it is possible to do better under specific circumstances where you can make assumptions that the MSFT engineers cannot, but do you really think you can outguess the 1,000+ man-years that MSFT have put into the NT kernel? (I have no figures on this, but it seems a reasonable number.)


anton_bassov · June 4, 2017, 9:33pm

Ha-ha-ha!!! The most interesting point here is that he DOES NOT seem to probe and lock pages in a driver at all - he calls VirtualLock() instead. In fact, the point of the whole exercise seems to be avoiding the standard procedure of probing and locking pages in an MDL with subsequent mapping of the target MDL into the kernel address space. He wants to access the target buffers by their userland addresses instead, and, in order to be able to do so, he starts a system thread that runs only in the context of the target app, rather than in an arbitrary one.

There are 2 points that are worth mentioning here:

1. IIRC, VirtualLock() locks pages only in the working set, rather than in RAM, by specifying a corresponding flag in an underlying call to ZwLockVirtualMemory(). As you have pointed out already, locking pages in physical RAM requires SeLockMemoryPrivilege, which, IIRC, is disabled by default even for the Admin account - Admins have to explicitly grant this privilege to themselves. Therefore, VirtualLock() locks pages only in the working set - otherwise, it would have to return failure 99% of the time. This means the target pages may still be swapped out under some circumstances.

2. Even if it were perfectly safe to access userland addresses from a driver, doing so would, in terms of performance, offer no advantage whatsoever compared to the standard and well-defined approach of accessing a probed, locked and mapped MDL that describes the target range.

I don’t mean to offend Mr. Kirby in any way, but the thing he suggests is typical of an ambitious newbie who does things in convoluted ways just to prove that they can be done.
Even if it worked the intended way (and it does not seem to, as we saw already), it would still add absolutely unnecessary extra complications without offering even the slightest advantage…

Anton Bassov

Jamey_Kirby · June 4, 2017, 10:46pm

First, I do probe. My write-up was quick and not complete. I wasn’t going
to write the code for him. Here is a piece of my code:

__try {
    // Validate the user-mode request block before writing to it.
    ProbeForWrite(request, sizeof(ENCAPSULATED_REQUEST), PAGE_SIZE);
    request->transfer_bytes = srb->DataTransferLength;
    request->valid_sense_bytes = srb->SenseInfoBufferLength;
    request->path = srb->PathId;
    request->target = srb->TargetId;
    request->lun = srb->Lun;
    StorPortCopyMemory(&request->cdb, &srb->Cdb, sizeof(CDB));
    if ((srb->SrbFlags & SRB_FLAGS_DATA_OUT) && srb->DataTransferLength) {
        PVOID buffer;
        StorPortGetSystemAddress(vhba_ext, srb, &buffer);
        // Copy outbound data, clamped to MSTOR_MAX_IO_SIZE.
        StorPortCopyMemory(request->io_buffer, buffer,
            srb->DataTransferLength <= MSTOR_MAX_IO_SIZE ?
            srb->DataTransferLength : MSTOR_MAX_IO_SIZE);
    }
}
// The matching __except clause is not shown in this excerpt.
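
The length clamp in the last statement can be isolated as a small helper. The sketch below is portable C, not StorPort code, and copy_clamped is a hypothetical name. Returning the number of bytes actually copied lets a caller record the clamped length; note that the excerpt above stores the unclamped DataTransferLength in transfer_bytes, and whether that is intentional is not stated.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most `capacity` bytes from src to dst and report how many
 * bytes were actually copied. */
static size_t copy_clamped(void *dst, size_t capacity,
                           const void *src, size_t requested)
{
    size_t n = requested <= capacity ? requested : capacity;
    if (n != 0)
        memcpy(dst, src, n);
    return n;
}
```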

I also mentioned in my post that this was more for experimental purposes. I
took an existing driver, and converted it to use this method. Not quite
sure what I will do with it yet.

The best way to learn is to make mistakes.

– Jamey


Jamey_Kirby · June 4, 2017, 11:02pm

Anton, thanks for forcing me to go back and review my code. I found a bug.
Nothing related to our discussion, but on the return path, I was forgetting
to copy my sense buffer back to the Srb.


anton_bassov · June 4, 2017, 11:48pm

Jamey,

> I also mentioned in my post that this was more for experimental purposes.

Fair enough, but what is the point of this experimentation if it does not offer you any advantage,
real or otherwise, while giving you an extra headache (and even a certain overhead of calling ProbeForRead() upon every access of the target buffer)???

The only situation where it may make sense is when your target buffer is REALLY HUGE (and I mean it - at least a few hundred MB) so that you cannot afford to lock the entire buffer into RAM. However, your call to VirtualLock() strongly suggests this is not your objective…

Anton Bassov

OSR_Community_User · June 5, 2017, 12:09am

The inverted call model is quite flexible.

The app could pend a large number of, say, standard pended requests in a
(main) pool. The driver would then consume these requests as needed.

In a secondary (manual) queue, the driver would pend special requests. For
instance, one that on completion, would signal the app that the main pool
of requests is empty. The app would then, in response to the completion of
this special request, pend a large number of requests in the main pool
along with a new special request so that the driver can resume its
operations.

With the KMDF framework it is easy: just push and pop requests from the
appropriate queue.
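
The two-queue refill scheme can be modelled with a couple of counters. This is a portable sketch of the bookkeeping only, not KMDF code: the counters stand in for the main and manual queues, and the names (inverted_call, app_pend, driver_notify) are hypothetical.

```c
#include <stdbool.h>

/* Counters stand in for the two queues of pended requests. */
typedef struct inverted_call {
    unsigned main_pool;     /* ordinary requests pended by the app */
    bool     refill_pended; /* the special "pool is empty" request */
    unsigned refills;       /* how often the special request completed */
} inverted_call;

/* App side: pend `count` ordinary requests plus one special request. */
static void app_pend(inverted_call *ic, unsigned count)
{
    ic->main_pool += count;
    ic->refill_pended = true;
}

/* Driver side: complete one pended request to deliver an event.
 * Returns false if no request is available and the event must wait.
 * When the last ordinary request is consumed, the special request is
 * completed as well so the app knows to refill the pool. */
static bool driver_notify(inverted_call *ic)
{
    if (ic->main_pool == 0)
        return false;
    ic->main_pool--;
    if (ic->main_pool == 0 && ic->refill_pended) {
        ic->refill_pended = false;
        ic->refills++;
    }
    return true;
}
```

In a KMDF driver the counters would be replaced by the framework queues themselves, with requests retrieved from and completed out of the appropriate queue.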

david_gomes · June 5, 2017, 4:29am

Ok I concede, I’ll try implementing inverted calls and come back if it doesn’t work.

Thanks anyways.

Jamey_Kirby · June 5, 2017, 10:57am

It’s faster using a dedicated kernel thread attached to the application.
I’ve tested inverted IOCTL, shared sections, and kernel thread attach.
Kernel thread attach is faster. Inverted call does make life easier. You
see, it was that experimentation that led me to the result that it is
faster. So people can talk until they are blue in the face as far as I am
concerned. I know the truth because I tried it.


Don_Burn · June 5, 2017, 11:07am

Jamey,

For some specific examples your test is valid. I’ve done these experiments myself, and depending on what you are attempting to achieve, inverted call is just as fast. There is no “one right way”; they can all be valid depending on the circumstances.

What people should recognize is that premature optimization is never a good idea. Inverted call can be done simply in KMDF, and it can then be optimized in many ways if needed. What I find discouraging is the number of complex interfaces out there that, once you test the product, could have been done more simply. It is better to use a simple approach and then optimize to the level of performance you need once you know everything is working than to add complexity to a driver before it is proven to be needed.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com


Jamey_Kirby · June 5, 2017, 11:12am

I am attempting to achieve experimentation. Sometimes I do things just so I
know the answer for myself.
It only took me a day to code all this up in an existing driver. I’m
semi-retired, so I have a little leisure time to do these sorts of things.


Jamey_Kirby · June 5, 2017, 2:07pm

Inverted IOCTL requires you to send the IOCTL between each request. This
involves checking and validating the input address each time. Even with a
pool of IRPs, you still have to continually send down new IRPs and handle
address safety.

If the IOCTL is using buffered method, you are limited to 64K per transfer.
If you have any other data that needs to be in the buffer, your transfer
buffer could end up being only 32K.

If you use the neither method, then we’re back to sharing user-mode memory,
which requires all of the overhead of validating and setting up the buffer
each time the IRP is sent down.

With my method, the buffer is sent down once, and validated in the IOCTL (I
moved my probe out of my loop as it was overkill).

So there is some overhead with the inverted IOCTL method that is not
present in the shared buffer model.
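
For readers unfamiliar with the "buffered" and "neither" terminology: the transfer method is encoded in the low two bits of the IOCTL control code. The sketch below mirrors the bit layout of the CTL_CODE macro from the Windows headers under renamed identifiers so it compiles anywhere. IOCTL_SKETCH_SETUP, function number 0x800, and the choice of the neither method are illustrative assumptions, not details of the driver discussed here.

```c
#include <stdint.h>

/* Same bit layout as CTL_CODE in the Windows headers, renamed so the
 * sketch compiles on any platform:
 * device type in bits 16-31, access in 14-15, function in 2-13,
 * transfer method in 0-1. */
#define SKETCH_CTL_CODE(dev, func, method, access) \
    (((uint32_t)(dev) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

enum {
    SKETCH_METHOD_BUFFERED   = 0,
    SKETCH_METHOD_IN_DIRECT  = 1,
    SKETCH_METHOD_OUT_DIRECT = 2,
    SKETCH_METHOD_NEITHER    = 3
};

#define SKETCH_FILE_DEVICE_UNKNOWN 0x22u
#define SKETCH_FILE_ANY_ACCESS     0u

/* Hypothetical control code for a one-time setup IOCTL. */
#define IOCTL_SKETCH_SETUP \
    SKETCH_CTL_CODE(SKETCH_FILE_DEVICE_UNKNOWN, 0x800, \
                    SKETCH_METHOD_NEITHER, SKETCH_FILE_ANY_ACCESS)

/* The transfer method lives in the low two bits of the code. */
static unsigned ioctl_method(uint32_t code)
{
    return code & 3u;
}
```

With the neither method the driver receives the raw user-mode address and is responsible for probing it, which is the per-IRP validation cost described above.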

I added code to track SRB process timing using
StorPortNotification(IoTargetRequestServiceTime, …), but I don’t yet know
how to query those results. I may add my own profiling.

I posted a request to this list asking for some information on using the
IoTargetRequestServiceTime, but I’ve gotten no responses.

– Jamey
