Guys, since I am new to driver programming, I want to ask about the performance of using read/write/ioctl. I am developing a PCI driver for a real-time application. The application above the driver will transfer data roughly every millisecond on average. I have read that mapping kernel memory to user mode is something that should be avoided (as far as I understand) because it affects security and reliability, so we should use the standard read/write/ioctl interface that is provided. But what is the performance of doing this? Is it fast enough when the application is constantly calling the Win32 API and triggering IRPs every millisecond? Or are there alternatives for this case? I appreciate your opinion.
Regards,
Sofian
> performance of doing this? Is it fast enough when the application is constantly calling the Win32 API
> and triggering IRPs every millisecond?
Normal. A millisecond is a lot of time.
Also, the app can do buffering and send 10 times more data once per 10 ms.
–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
Right now I have a stack of drivers for a client that is sending messages to
real hardware at a rate of close to 600,000 messages per second. That is an
IOCTL per message going down a stack of three drivers and hitting the
hardware. These are KMDF drivers, though the path in question is Fast I/O.
You should have little problem making 1,000 requests a second.
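For reference, the per-request path on the application side is just a DeviceIoControl call in a loop. Here is a minimal user-mode sketch; the device name, IOCTL code, and message layout are hypothetical placeholders, not anything from the drivers discussed above:

#include <windows.h>
#include <stdio.h>

// Hypothetical IOCTL code and payload; replace with your driver's definitions.
#define IOCTL_SEND_MESSAGE CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

int main(void)
{
    HANDLE dev = CreateFileW(L"\\\\.\\MyPciDevice", GENERIC_READ | GENERIC_WRITE,
                             0, NULL, OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    UCHAR message[64] = { 0 };          // one message per IOCTL
    DWORD bytesReturned;

    for (int i = 0; i < 1000; i++) {    // ~1,000 requests per second
        if (!DeviceIoControl(dev, IOCTL_SEND_MESSAGE,
                             message, sizeof(message),   // input buffer
                             NULL, 0,                    // no output expected
                             &bytesReturned, NULL)) {    // synchronous call
            printf("DeviceIoControl failed: %lu\n", GetLastError());
            break;
        }
    }

    CloseHandle(dev);
    return 0;
}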
–
Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply
If you still want to speed things up despite all the other comments indicating that things will work fine at 1 ms, you could do what we do.
Basically, we don't use read and write dispatch routines but handle everything via DeviceIoControl (DIOC) calls (you don't say whether you need read and write so that things can look like a file or whatever).
One DIOC specifies a user memory buffer that is to be used. The driver builds an MDL with IoAllocateMdl and then uses MmProbeAndLockPages to lock down the memory referenced by the MDL. I think VirtualLock is also used by the user-mode code beforehand.
A separate DIOC simply triggers a transfer into a section of the user buffer, and the driver signals that the data is available using an event. There are more complications, but that gives you the general idea.
This avoids any per-transfer buffer setup overhead and, I believe, avoids any data copying - the lower-level driver transfers data directly into user memory.
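As a rough illustration of just the buffer-registration step, the driver side might look something like this. The IOCTL name, the METHOD_NEITHER choice, and the device-extension fields are assumptions for the sketch, not necessarily what the poster's driver does; cleanup, cancellation, and DMA setup are omitted:

// Inside the IRP_MJ_DEVICE_CONTROL dispatch routine.
case IOCTL_REGISTER_BUFFER:
{
    // With METHOD_NEITHER the raw user pointer arrives in Type3InputBuffer.
    PVOID userBuffer = irpStack->Parameters.DeviceIoControl.Type3InputBuffer;
    ULONG length     = irpStack->Parameters.DeviceIoControl.InputBufferLength;

    PMDL mdl = IoAllocateMdl(userBuffer, length, FALSE, FALSE, NULL);
    if (mdl == NULL) {
        status = STATUS_INSUFFICIENT_RESOURCES;
        break;
    }

    __try {
        // Probe and lock the user pages so later driver/DMA access cannot fault.
        MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        status = GetExceptionCode();
        break;
    }

    devExt->SharedMdl      = mdl;   // unlock (MmUnlockPages) and free at cleanup
    devExt->SharedSystemVa = MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
    status = (devExt->SharedSystemVa != NULL) ? STATUS_SUCCESS
                                              : STATUS_INSUFFICIENT_RESOURCES;
    break;
}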
> A separate DIOC simply triggers a transfer into a section of the user buffer, and the driver
> signals that the data is available using an event. There are more complications, but that gives
> you the general idea.
This works particularly well when the ‘triggering’ IOCTL request is simply
pended until the transfer is complete. This allows the application to
leverage I/O completion ports and scales very nicely. Allocating, sharing,
and managing ‘door-bell’ events between UM and KM is not necessarily the
only way to handle synchronization of a mapped shared buffer.
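On the application side, the pended-IOCTL model is just overlapped I/O feeding a completion port. A minimal sketch follows; the device name, IOCTL code, and buffer layout are hypothetical, and error handling is abbreviated:

#include <windows.h>

#define IOCTL_WAIT_FOR_DATA CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

int main(void)
{
    // FILE_FLAG_OVERLAPPED lets the driver pend the request without blocking us.
    HANDLE dev = CreateFileW(L"\\\\.\\MyPciDevice", GENERIC_READ | GENERIC_WRITE,
                             0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (dev == INVALID_HANDLE_VALUE) return 1;

    // Associate the device handle with a new I/O completion port.
    HANDLE iocp = CreateIoCompletionPort(dev, NULL, 0 /*CompletionKey*/, 0);
    if (iocp == NULL) return 1;

    UCHAR chunk[4096];
    OVERLAPPED ov = { 0 };
    DWORD bytes;

    for (;;) {
        // Post the 'trigger' IOCTL; the driver completes it when data is ready.
        if (!DeviceIoControl(dev, IOCTL_WAIT_FOR_DATA, NULL, 0,
                             chunk, sizeof(chunk), &bytes, &ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            break;  // real failure
        }

        ULONG_PTR key;
        LPOVERLAPPED completed;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &completed, INFINITE))
            break;

        // 'bytes' of new data are now in 'chunk'; process them and loop around.
    }

    CloseHandle(iocp);
    CloseHandle(dev);
    return 0;
}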
Cheers,
Dave Cattley
Dave wrote:
> This works particularly well when the 'triggering' IOCTL request is simply pended until the
> transfer is complete. This allows the application to leverage I/O completion ports and scales
> very nicely. Allocating, sharing, and managing 'door-bell' events between UM and KM is not
> necessarily the only way to handle synchronization of a mapped shared buffer.
That is, of course, absolutely true and rather neat. I was just giving details of what we had done in this situation (where, for complex reasons, pending the IOCTL is not an option).
One of my first NT drivers was for an RS-485 line running at 500 Kbps. Once the
synchronization and other "first timer" "oh craps!" had been handled, the
driver never lost a bit. Internally buffer the data per interrupt in the
driver and deliver it to the user when an IRP is received, or pend multiple
IRPs from the user and consume them as data is received. Lots of ways
to do it.
The personal opinion of
Gary G. Little
You should take different considerations into account when you evaluate your speed.
The data transfer rate depends on the following factors:
- the design of your hardware: PCI or PCI Express, and if PCIe, at what link speed;
- DMA capabilities - I assume your hardware has them;
- also remember that Windows is not a real-time system, so you may see varying latency when moving data from the kernel to your application.
Igor Sharovar
> I think VirtualLock is also used by the user-mode code beforehand.
This looks redundant.
Also, the driver does not need to call MmProbeAndLockPages; the I/O manager can do it for the driver.
> A separate DIOC simply triggers a transfer into a section of the user buffer, and the driver signals
> that the data is available using an event. There are more complications, but that gives you the general idea.
Or, even faster (a rough sketch follows below):
- the buffer has producer and consumer pointers, each existing as a user-mode and a kernel-mode version
- when the driver needs to output data, it writes it and advances the kernel-mode version of the producer pointer
- the app processes the data and advances the user-mode version of the consumer pointer
- at some rate (not necessarily very often), the app calls an "exchange pointers" IOCTL, which passes the user-mode version of the consumer to the driver and extracts the driver's version of the producer
- after this, the app sees the advanced producer and knows that it has more data to process, and the driver sees the advanced consumer and knows that it has more buffer space.
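Sketching that scheme in C - the structure names, the IOCTL, and the use of buffered I/O are assumptions for illustration; the real layout is up to you:

// The header and the data slots live in one buffer shared between UM and KM.
typedef struct _RING_HEADER {
    ULONG Producer;      // index of the next slot the driver will fill
    ULONG Consumer;      // index of the next slot the app will read
    ULONG SlotCount;     // total number of fixed-size slots in the ring
} RING_HEADER;

// Payload of a hypothetical IOCTL_EXCHANGE_POINTERS (METHOD_BUFFERED):
// the app passes its consumer index in, the driver returns its producer index.
typedef struct _EXCHANGE_POINTERS {
    ULONG Consumer;      // in:  how far the app has consumed
    ULONG Producer;      // out: how far the driver has produced
} EXCHANGE_POINTERS, *PEXCHANGE_POINTERS;

// Driver side of the exchange, inside the device-control dispatch routine:
//
//   PEXCHANGE_POINTERS x = (PEXCHANGE_POINTERS)Irp->AssociatedIrp.SystemBuffer;
//   devExt->Ring.Consumer = x->Consumer;      // driver learns about freed slots
//   x->Producer = devExt->Ring.Producer;      // app learns about newly written data
//   Irp->IoStatus.Information = sizeof(EXCHANGE_POINTERS);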
–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
Why don't you consider sharing memory between user mode and kernel mode using
section objects?
I used a shared section object to share the desktop's contents between a user-mode
application and a kernel-mode driver (and the desktop's contents refresh at a
very fast rate).
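For completeness, a driver-side sketch of creating such a named section follows. The name and size are hypothetical, a real driver would supply a proper security descriptor and do cleanup, and the application would open the section with OpenFileMapping/MapViewOfFile:

// Create a named section that a user-mode app can open by name.
UNICODE_STRING    name;
OBJECT_ATTRIBUTES oa;
HANDLE            sectionHandle = NULL;
LARGE_INTEGER     maxSize;
NTSTATUS          status;

RtlInitUnicodeString(&name, L"\\BaseNamedObjects\\MySharedSection");
InitializeObjectAttributes(&oa, &name,
                           OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
                           NULL, NULL);         // NB: pass a real SECURITY_DESCRIPTOR here
maxSize.QuadPart = 64 * 1024;                   // hypothetical 64 KB buffer

status = ZwCreateSection(&sectionHandle, SECTION_ALL_ACCESS, &oa,
                         &maxSize, PAGE_READWRITE, SEC_COMMIT, NULL);

if (NT_SUCCESS(status)) {
    // Map a view for the driver's own use. Mapping with ZwCurrentProcess() is
    // only valid in the context of the calling process; many drivers instead
    // keep a system-space view (MmMapViewInSystemSpace) so it is always usable.
    PVOID  base = NULL;
    SIZE_T viewSize = 0;
    status = ZwMapViewOfSection(sectionHandle, ZwCurrentProcess(), &base,
                                0, 0, NULL, &viewSize, ViewUnmap,
                                0, PAGE_READWRITE);
    // 'base' now addresses the same pages the app sees via MapViewOfFile.
}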
Sharing memory should be considered only if the I/O path is too slow. Measure the performance of your different options, starting from the simple (IRPs) and moving to the more complex (shared memory).
d
Sent from my phone with no t9, all spilling mistakes are not intentional.
>> I think VirtualLock is also used by the user-mode code beforehand.
> This looks redundant. Also, the driver does not need to call MmProbeAndLockPages;
> the I/O manager can do it for the driver.
Actually, either of the above steps combined with sharing a buffer may be quite useful if we are speaking about large amounts of data, particularly if the transfer rate is high. In such a situation you would not want to probe and lock the buffer on every transfer, would you? Instead, you would allocate a buffer and
lock it once, so that all subsequent operations on it are guaranteed not to cause page faults. Concerning synchronizing access to this buffer, you can use inverted call (although some additional pains associated with sharing a buffer will still remain)...
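A bare-bones WDM sketch of the inverted-call idea follows; the IOCTL name and list fields are hypothetical, and the omission of a cancel routine is a simplification for illustration only:

// In the device-control dispatch: pend the 'wait for data' request instead of
// completing it, and remember it for later.
case IOCTL_WAIT_FOR_DATA:
    IoMarkIrpPending(Irp);
    // A production driver should use a cancel-safe queue (IoCsq*) or set a
    // cancel routine; this simple list only shows the shape of the idea.
    ExInterlockedInsertTailList(&devExt->PendingIrpList,
                                &Irp->Tail.Overlay.ListEntry,
                                &devExt->PendingIrpLock);
    return STATUS_PENDING;

// Later - e.g. from the DPC that runs after the hardware has filled part of
// the locked/shared buffer - pop a pended IRP and complete it to wake the app:
//
//   irp->IoStatus.Status = STATUS_SUCCESS;
//   irp->IoStatus.Information = bytesProduced;   // how much new data arrived
//   IoCompleteRequest(irp, IO_NO_INCREMENT);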
Anton Bassov