Shared memory between kernel and user mode

> IOCTL_SCSI_MINIPORT can do that, but it’s buffered.

IOCTL_SCSI_PASS_THROUGH_DIRECT is direct


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim S. Shatskih wrote:

>> IOCTL_SCSI_MINIPORT can do that, but it’s buffered.

> IOCTL_SCSI_PASS_THROUGH_DIRECT is direct

Ah, yes, I let myself get fooled by the ioctl definition:

\DDK\7600\inc\api\ntddscsi.h:
#define IOCTL_SCSI_PASS_THROUGH        CTL_CODE(IOCTL_SCSI_BASE, 0x0401, METHOD_BUFFERED, FILE_READ_ACCESS | FILE_WRITE_ACCESS)
#define IOCTL_SCSI_PASS_THROUGH_DIRECT CTL_CODE(IOCTL_SCSI_BASE, 0x0405, METHOD_BUFFERED, FILE_READ_ACCESS | FILE_WRITE_ACCESS)

But there’s a level of indirection. The buffered structures contain a
pointer to a data buffer, and THAT buffer is mapped direct.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

It is hard to do random access over a set of pending buffers

“Marion Bond” wrote in message news:xxxxx@ntdev…
Sequential? What does sequence have to do with it?

Sent from Surface Pro

From: Maxim S. Shatskih
Sent: Tuesday, March 31, 2015 1:04 PM
To: Windows System Software Devs Interest List

>But seriously, the better design is for your app to use overlapped IO and pend several buffers.

If memory access is only sequential - then yes.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> Trying to force the driver for the convenience of the application is just ass backwards.

Well, when I made a similar statement some time ago, PGV said it was one of the worst statements he had heard in years. IIRC, an app was incapable of processing the data a driver was producing because it processed everything synchronously in the context of a single thread, so the driver writer was looking for a way to throttle the data rate by suspending the driver.

When I tried to point out that designing a driver to meet the demands of an improperly designed app is pretty much the same thing as re-designing the electricity supply in a building in order to use an appliance that requires a non-standard voltage, I was accused by Tim of “derailing the train of discussion right into loonie-land”…

Anton Bassov

Well… for some value of “force”…

Trying to DESIGN the driver for the convenience of the application is precisely what good driver developers DO. It’s called “tailoring your upper edge interface to meet functional requirements.” Of course, the edge between “design” and “force” is the subject of many a spirited debate among driver and application developers, consultants and clients.

And that ugly edge is often why it’s best to hide the implementation details from the application developer by a tasty little DLL. InitializeDevice(…), GetDataPointer(…), WaitForNewData(…) or whatever. This decouples the app itself from the mechanics of the interface.

But everyone who’s reading here already knows this.

Ah, Anton! About what you were saying:

Peter
OSR
@OSRDrivers

I fully concur with Peter’s interpretation of “force” (that is why I used
“ass backwards”), and I think it applies to the OP’s efforts. The developer
wants to use MapViewOfFile explicitly to communicate. If he had said “we
have a function MapCommunicationRegion that acts similar to MapViewOfFile,
and we want to extend it to handle the link to the driver”, this would be a
productive discussion; right now it is basically how to do something that is
a poor design and will likely produce an unstable system wherever it is used.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com


Guys, thanks for your help. I will definitely change my code and use the proposed methods. Just one more question on this topic: would it be better to have a contract with the driver and constantly keep one or a few pended IOCTLs supplied to it, so the driver always has a buffer available to write into, or to just send the first IOCTL, pend it forever, and use its buffer like a shared section so both the driver and the app can use it? Or does it really not matter?

Secondly, I just want to share with the community that, for now, I found a workaround that makes my primary approach, shared memory via ZwCreateSection, work:
A named event is created in the user process before going to the driver.
The driver creates the section while in the context of the user process, saves handle1, and pends the IRP. Later, in a system thread, it opens that section by name and completes the IRP. The user process then opens the section via handle1. This method works without problems for a non-admin account. Locking is done via InterlockedCompareExchange.

Here it comes :)

I have done what you are trying to do and I was successful. It is quite
simple. To get full access to the section, you need to make some
adjustments to your section’s DACL (RtlSetDaclSecurityDescriptor()).

I would be glad to take the discussion offline as I will get beat up pretty
bad on this forum for taking you down this road.



Jamey Kirby
Disrupting the establishment since 1964

This is a personal email account and as such, emails are not subject to
archiving. Nothing else really matters.

I guess this is a good place to ask this, since it fundamentally is part of the same topic:

What if I want to share a driver-allocated DMA buffer (a common buffer; I need it for continuous DMA) with a user-mode process (or any driver-allocated buffer, since it’s an MDL anyway)? I’m specifically interested in the way the WaveRT audio model does this, and whether it can be considered a “clean” (or reliable) method, i.e. not relying on undocumented functionality/etc. I have a use-case for something similar (streaming), but needing a lot more bandwidth, and being custom (i.e. not using any device type-specific driver models).

Also, as a side question, won’t the current continuous common-buffer model, without kernel-mode involvement, fail on a non-cache-coherent architecture? I.e., is it even possible to implement WaveRT the way it’s meant to be™ on, say, Itanium? Guess not (it would require manual cache flushing), but still curious :)

Alex

> I have done what you are trying to do and I was successful. It is quite
> simple. To get full access to the section, you need to make some
> adjustments to your section’s DACL (RtlSetDaclSecurityDescriptor()).

That’s like answering the question:
“A client has asked me to build and install a custom shelving system. I’m at the point where I need to nail it, but I’m not sure what to use to pound the nails in. Should I use an old shoe or a glass bottle?”

with:

“I had great success with an old shoe. I filled it with concrete beforehand, and it totally worked. I was successful!”

xxxxx@gmail.com wrote:

> What if I want to share a driver-allocated DMA (common, need it for continuous DMA) buffer with a user-mode process (or any driver-allocated buffer, since it’s an MDL anyway)? I’m specifically interested in the way the WaveRT audio model does this, and if it can be considered a “clean” (or reliable) method, i.e. not relying on undocumented functionality/etc.

Although it runs in user mode, the Audio Engine is a protected and
trusted process. Users cannot load arbitrary code into that process to
interfere with things. Plus, the exposed interfaces are narrow and
well-defined. The only danger if the buffer is overwritten is garbled audio.

Sharing memory is not always a federal crime. You have to do a risk
analysis to figure out the risks and the exposure.

> I have a use-case for something similar (streaming), but needing a lot more bandwidth, and being custom (i.e. not using any device type-specific driver models).

How much bandwidth, exactly? As I’ve pointed out before, video capture
is one of the highest bandwidth tasks in most systems, and its driver
model manages to work just fine without sharing memory.

> Also, as a side question, won’t the current continuous common buffer model without kernel mode interference fail when faced with a non cache-coherent architecture? A.K.A. is it even possible to implement WaveRT the way it’s meant to be™ on, say, Itanium? Guess not (requiring manual cache flushing), but still curious :)

Cache coherency is mostly an issue with two-way transfers. With WaveRT,
the streams flow in one direction. You do have to watch the buffer
pointers, though, and for those you do have to be careful about flushing.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> How much bandwidth, exactly? As I’ve pointed out before, video capture
> is one of the highest bandwidth tasks in most systems, and its driver
> model manages to work just fine without sharing memory.

~5 GB/s bidirectional worst case (gigabytes, correct). Achievable with METHOD_BUFFERED and the like (memcpy), but you can imagine the CPU overhead ;)

xxxxx@gmail.com wrote:

> How much bandwidth, exactly? As I’ve pointed out before, video capture
> is one of the highest bandwidth tasks in most systems, and its driver
> model manages to work just fine without sharing memory.
> ~5 GB/s bidirectional worst case (gigabytes, correct). Achievable with METHOD_BUFFERED and the like (memcpy), but you can imagine the CPU overhead ;)

Probably about 25%.

What are you going to do with all that data? You can’t copy it to disk,
and you won’t have enough PCI Express bandwidth left to put it anywhere.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> What are you going to do with all that data? You can’t copy it to disk,
> and you won’t have enough PCI Express bandwidth left to put it anywhere.

Not at liberty to say what I’m doing with it :(
Suffice it to say:
-It doesn’t need to be read from/written to disk.
-PCI-E bandwidth is ample: 16 GB/s theoretical on an x16 3.0 slot.
-Memory bandwidth might’ve been a concern, but testing says otherwise.

> “I had great success with an old shoe. I filled it with concrete beforehand, and it totally worked. I was successful!”

BTW, you forgot to repeat your mantra about “never running the system with the Admin privilege level” - after all, in order to do what Jamey mentioned you need to run as Admin…

Anton Bassov