Re: [ntdev] Storport PDO I/O thread performance question

MBond · April 2, 2015, 7:45am

The part of your description that sounds worrying is ‘the thread’. I don’t know anything about your design, but a single thread will always have performance limitations.

Think about your numbers for a minute. You say 20,000 IOPS, so on a single thread that means 0.00005 seconds (50 µs) per request on average. Not too bad for a design that requires at least two context switches and interactions with the scheduler, plus an allocation and a free, on each request.
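The per-request figure can be checked with a line of arithmetic. The sketch below is only that arithmetic; the function name is made up for illustration, and it assumes requests are handled strictly one at a time.

```c
#include <assert.h>

/* Average time available per request, in microseconds, when requests are
   processed strictly one after another at the given rate.
   Hypothetical helper, not part of any driver API. */
double per_request_budget_us(double iops)
{
    assert(iops > 0.0);
    return 1000000.0 / iops;
}
```

Calling per_request_budget_us(20000.0) gives 50 microseconds, which has to cover the wake-up, the queue handling and the copy for each request.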

Your debug versus release performance indicates that the limitation is not in the execution of your code, but in the thread signaling and synchronization. You get a bigger difference for larger blocks because of the more efficient memory copy, but it does nothing to overcome your fundamental limits.


From: xxxxx@gmail.com
Sent: Thursday, April 02, 2015 5:21 AM
To: Windows System Software Devs Interest List

In order to test performance of my storport driver, I created a simple RAMdisk plugin for it.
Then I ran ATTO Disk Benchmark on the resulting drive. Results were surprisingly low:
80 MB/s @ 4KB block size
160 MB/s @ 8KB block size
…
1.7GB/s @ 128KB block size

Basically, transfer speed roughly doubles for each doubling of transfer size.
That means my current design can only do about 20000 transactions per second.
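As a rough check of that estimate, the request rate implied by each benchmark line is throughput divided by block size. This is a minimal sketch with a made-up function name; it assumes the benchmark reports binary units (1 MB = 1024 * 1024 bytes), which the post does not state.

```c
#include <assert.h>

/* Requests per second implied by a throughput and a block size.
   ASSUMPTION: binary units, 1 MB = 1024 * 1024 bytes and 1 KB = 1024 bytes.
   Hypothetical helper for illustration only. */
double implied_iops(double megabytes_per_sec, double block_kb)
{
    assert(block_kb > 0.0);
    return (megabytes_per_sec * 1024.0 * 1024.0) / (block_kb * 1024.0);
}
```

Under that assumption, implied_iops(80.0, 4.0) and implied_iops(160.0, 8.0) both come out at 20480 requests per second, while 1.7 GB/s at 128KB works out to somewhat fewer, which is consistent with "roughly doubles".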

I’m using LIST_ENTRY (doubly linked list) to store event log, KEVENT to signal the thread and KSPIN_LOCK to synchronise access to the list.

There’s not much other code in there. After getting event signal, the thread only calls these:
ExInterlockedRemoveHeadList (get one request to process)
CONTAINING_RECORD (get the Irp from list entry)
IoGetCurrentIrpStackLocation
MmGetSystemAddressForMdlSafe (get the buffer)
RtlCopyMemory (copy from memory to buffer)
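The queue part of that sequence can be sketched in user mode. This is not driver code: the types and functions below are hypothetical stand-ins that mimic how a LIST_ENTRY queue and CONTAINING_RECORD work, the request struct takes the place of the IRP, and the spin lock, the KEVENT wait and the memory copy are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-in for the kernel's LIST_ENTRY (hypothetical names). */
typedef struct list_entry {
    struct list_entry *flink;   /* next entry */
    struct list_entry *blink;   /* previous entry */
} list_entry;

/* Same idea as CONTAINING_RECORD: recover the enclosing structure
   from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request record; in the driver this role is played by the IRP. */
typedef struct request {
    int id;
    list_entry link;
} request;

void list_init(list_entry *head) { head->flink = head->blink = head; }

int list_is_empty(const list_entry *head) { return head->flink == head; }

void list_insert_tail(list_entry *head, list_entry *entry)
{
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

/* Returns NULL when the list is empty, as ExInterlockedRemoveHeadList does.
   No locking here; the real routine takes the spin lock. */
list_entry *list_remove_head(list_entry *head)
{
    list_entry *first = head->flink;
    if (first == head)
        return NULL;
    head->flink = first->flink;
    first->flink->blink = head;
    return first;
}

/* Drain the queue as the worker thread would once it has been woken.
   Writes the handled request ids to out[] and returns how many there were. */
int drain_queue(list_entry *head, int *out, int capacity)
{
    int n = 0;
    list_entry *e;
    assert(out != NULL);
    while (n < capacity && (e = list_remove_head(head)) != NULL) {
        request *r = CONTAINER_OF(e, request, link);
        out[n++] = r->id;
    }
    return n;
}
```

Queueing three requests with list_insert_tail and then calling drain_queue returns them in arrival order. Draining everything that is queued per wake-up, rather than one request per signal, is a design choice of this sketch and not something the post describes.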

Note: Compiling in release mode doesn’t change performance by more than 5% at 4KB block size. Interestingly, at 128KB block size the speed increase is 15%.

Is this (20K IO/s) really the top performance I can get out of this implementation? Should I move to direct request handling if I want more?

NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

MBond · April 3, 2015, 8:07am

First read up on IRQL

http://blogs.msdn.com/b/doronh/archive/2010/02/02/what-is-irql.aspx

Also, some further suggestions

Testing performance with non-optimized code (a debug build) is a waste of time.

Also test your performance without doing anything in your read and write routines. Your read / write isn’t getting any useful data anyway.

Are you using a tool or your own code to send the IO from UM? If your own code, make sure you use overlapped IO and send multiple concurrent requests. If a tool, which one?

Max is suggesting that you implement Fast IO. Fast IO is an optional interface for completing IO without the overhead of an IRP. This is only possible in specific circumstances (like yours, where you have no hardware and need no locks); when it isn’t possible, an IRP will be used.


From: xxxxx@gmail.com
Sent: Friday, April 03, 2015 3:48 AM
To: Windows System Software Devs Interest List

My previous post doesn’t account for Maxim’s suggestions; I was typing it for much too long.

@Maxim:
In what way does StartIo behave differently from IRP_MJ_READ / WRITE and IRP_MJ_SCSI? I already have trouble with those, and now there is a third I/O concept? Is there a fourth?
Why does the driver framework enable so many different code paths for the same functionality? Is there documentation describing the differences among them?

Is there a difference between memcpy and RtlCopyMemory? Also, what is DISPATCH?

Finally, I don’t understand anything about your last paragraph:

>And yes, allocate the memory for your disk using MmAllocatePagesForMdl and map them (if the disk is not large) immediately.
The only way I “know” how to work with MDLs is through Irp.MdlAddress. Isn’t that IRP specific? How can I allocate memory using that?
How do you map this immediately? Currently I’m mapping the buffer that came with IRP, not my disk buffer.

Thanks
