User Mode events mapped to kernel vs inverted call

wd1 · February 10, 2015, 8:18am

Hi All. I’m trying to understand / improve / fix an issue I’m having with a driver that signals a UM thread via several events created in user space and mapped as the result of an IOCTL via ObReferenceObjectByHandle(). The UM code has a dedicated thread that loops calling WaitForMultipleObjects() using the mapped events. Periodically, the driver performs some actions at DISPATCH_LEVEL (in a DPC) and then calls KeSetEvent() to set one of the events, signaling the UM code that it should wake up and do something. About 99% of the time this all works perfectly and the DPC takes about 40uS to run. The other 1% of the time the DPC takes 60 - 600+ uS to execute. I suspect that the variable delay may be a result of the kernel waiting for the system-wide dispatcher lock, but I have no real data to back that claim up.

It was suggested on another forum that I abandon using events and move to an inverted call model with overlapped IO.
Will this improve or change the situation? Doesn’t the same dispatcher lock have to be acquired whether the dispatch happens directly from KeSetEvent() or as an artifact of an IO completion, i.e. an event buried in an overlapped IO structure?

Thanks,
Wade.

Maxim_S_Shatskih · February 10, 2015, 8:43am

> It was suggested on another forum that I abandon using events and move to an inverted call model with overlapped IO.
> Will this improve or change the situation?

No, it will not. An inverted call gives the same performance, but the code is simpler and cleaner.

The delays you experience are probably inevitable.

–
Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Mark_Roddy · February 10, 2015, 8:52am

Are you talking about the time it takes for your thread to run after your
DPC signals the thread, or the time it takes for your DPC to run? There is
essentially nothing you can do about either case :-). Not a realtime OS.
Your DPC can be blocked behind another DPC and/or delayed by interrupt
processing. Thread scheduling is also indeterminate. Changing from an event
driven model to an inverted call model will not eliminate thread scheduling
delay.

Mark Roddy

> —
> NTDEV is sponsored by OSR
>
> Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

wd1 · February 10, 2015, 9:13am

Hi Mark. Thanks for your reply.
No, not DPC delay - I understand the difference. I am measuring only the amount of time that the call to KeSetEvent() takes once in the DPC. Is there a way to at least profile why the system is taking up to 600uS to acquire the dispatcher lock? If I can identify the cause, perhaps I can at least provide a workaround. A delay of 200uS is acceptable, but 600+ is problematic for us.
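
One way to get data on how often the slow case occurs is to timestamp each KeSetEvent() call and bucket the results. The following is only a rough, portable sketch of the bookkeeping side, not code from the driver discussed here. It assumes the tick deltas and the counter frequency come from something like KeQueryPerformanceCounter() read immediately before and after KeSetEvent(). The names `ticks_to_us`, `latency_hist`, `hist_add` and the bucket limits are hypothetical, with the limits picked from the 200uS and 600uS figures mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a performance-counter tick delta to whole microseconds.
 * freq_hz is the counter frequency in ticks per second (assumed non-zero). */
uint64_t ticks_to_us(uint64_t ticks, uint64_t freq_hz)
{
    /* Split into seconds and remainder so ticks * 1000000 cannot overflow
     * for large deltas. */
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * 1000000u + (rem * 1000000u) / freq_hz;
}

#define NUM_BUCKETS 4

/* Inclusive upper bounds in microseconds (hypothetical values):
 * bucket 0: <= 50, bucket 1: <= 200, bucket 2: <= 600, bucket 3: above 600. */
const uint64_t bucket_limit_us[NUM_BUCKETS - 1] = { 50, 200, 600 };

typedef struct {
    uint64_t count[NUM_BUCKETS]; /* number of samples per bucket */
    uint64_t max_us;             /* worst sample seen so far */
} latency_hist;

/* Record one latency sample. Constant time and no allocation, so the same
 * logic could in principle be called from a DPC if the histogram lives in
 * non-paged memory and is only touched by one processor at a time. */
void hist_add(latency_hist *h, uint64_t us)
{
    size_t i = 0;
    while (i < NUM_BUCKETS - 1 && us > bucket_limit_us[i]) {
        i++;
    }
    h->count[i]++;
    if (us > h->max_us) {
        h->max_us = us;
    }
}
```

Dumping the counters periodically (for example from an IOCTL) would show whether the long calls are rare outliers or cluster around particular system activity, which can then be correlated with a trace from the performance tools.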

Mark_Roddy · February 10, 2015, 12:01pm

The Windows performance tools are pretty good, if a bit on the gigantic
learning curve side.

http://www.microsoft.com/en-us/download/details.aspx?id=39982

Mark Roddy

Alex_Grig · February 11, 2015, 1:21pm

Run the kernel in uniprocessor mode and see whether KeSetEvent consistently takes little time, or takes excessive time more often. If it takes little time, then the 600 us delay on multi-processor happens because of having to issue an IPI (inter-processor interrupt) or corral processors. If it happens more often, it’s because of some ISR pre-empting you. Windows 7 doesn’t take a dispatcher lock anymore.