IRQL changes to 2 upon IOCTL request from user level driver

If the app sends an IOCTL from two threads at the same time, MyIoctlHandler will be invoked concurrently on both threads. If MyQueue is a sequential queue, you will only be called for one request at a time.
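A minimal sketch of the difference, assuming the usual EvtDeviceAdd setup (CreateDefaultQueue and the device handle are illustrative names, not KMDF requirements; MyIoctlHandler is the handler from this thread):

#include <ntddk.h>
#include <wdf.h>

EVT_WDF_IO_QUEUE_IO_DEVICE_CONTROL MyIoctlHandler;

NTSTATUS
CreateDefaultQueue(WDFDEVICE device)
{
    WDF_IO_QUEUE_CONFIG queueConfig;
    WDFQUEUE queue;

    /* Sequential: the framework presents one request at a time.
       With WdfIoQueueDispatchParallel instead, MyIoctlHandler may be
       invoked concurrently for IOCTLs sent from multiple threads.    */
    WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig,
                                           WdfIoQueueDispatchSequential);
    queueConfig.EvtIoDeviceControl = MyIoctlHandler;

    return WdfIoQueueCreate(device, &queueConfig,
                            WDF_NO_OBJECT_ATTRIBUTES, &queue);
}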

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Thursday, August 30, 2012 12:46 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IRQL changes to 2 upon IOCTL request from user level driver

Dear Members,

When I tried to set SynchronizationScope = WdfSynchronizationScopeNone, the IRQL was 0 when the IOCTL arrived.

According to the documentation:
The framework does not synchronize the object’s event callback functions, so the callback functions might run concurrently on a multiprocessor system.

My queue has only one callback function: MyQueue.EvtIoDeviceControl = MyIoctlHandler. All other callbacks are null.

In this case, should I have a problem using WdfSynchronizationScopeNone?
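For reference, this is roughly how I set it (a sketch; the attributes, deviceInit, and device variables are illustrative):

WDF_OBJECT_ATTRIBUTES attributes;

WDF_OBJECT_ATTRIBUTES_INIT(&attributes);
/* No framework lock is taken around the callbacks, so
   EvtIoDeviceControl may run concurrently on several processors. */
attributes.SynchronizationScope = WdfSynchronizationScopeNone;

status = WdfDeviceCreate(&deviceInit, &attributes, &device);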

Thanks,
Zvika



xxxxx@gmail.com wrote:

> In this case, should I have a problem using WdfSynchronizationScopeNone?

Look, NO ONE can answer this question for you. There is no single
setting that can magically handle all your potential synchronization
problems. YOU have to understand how your driver uses its data and its
hardware. If you have sections of code that would be dangerous if run on
two (or three or eight) processors at exactly the same time, then you
need to protect those sections. You either do that yourself or with the
framework’s help. If you don’t have any sections like that, then you
don’t need synchronization.
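For example, a shared counter in your device context could be protected “with the framework’s help” like this (a sketch; DEVICE_CONTEXT and StatsLock are illustrative names, not anything your driver must use):

typedef struct _DEVICE_CONTEXT {
    WDFSPINLOCK StatsLock;   /* guards PacketCount, nothing else */
    ULONG       PacketCount;
} DEVICE_CONTEXT, *PDEVICE_CONTEXT;

/* Created once, e.g. in EvtDeviceAdd:
       status = WdfSpinLockCreate(WDF_NO_OBJECT_ATTRIBUTES,
                                  &devContext->StatsLock);        */

VOID
RecordPacket(PDEVICE_CONTEXT devContext)
{
    WdfSpinLockAcquire(devContext->StatsLock);  /* raises to DISPATCH_LEVEL */
    devContext->PacketCount++;
    WdfSpinLockRelease(devContext->StatsLock);
}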

And, by the way, this situation is not unique to drivers. This
situation is exactly the same in a multithreaded application. You
need to be able to identify potential synchronization problems to be a
21st Century programmer.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> You need to be able to identify potential synchronization problems to be a 21st Century programmer.

Actually, the current trend seems to be exactly the opposite, i.e. designing languages (like, for example, Google’s Go) that hide all the concurrency-related stuff behind language constructs instead of exposing it directly to the programmer. Needless to say, these high-level toy languages are going to be much more popular than C. Therefore, the proportion of programmers who understand synch-related issues is going to get smaller and smaller - it looks like in the not-so-distant future this kind of knowledge will be limited to those who deal with OS and compiler design…

Anton Bassov

Actually, you should never protect code from being concurrently executed.
What you must do is prevent data from being modified concurrently. If you
lock code, then the code cannot be executed even if it is working on
thread-unique data. There is NEVER a conflict when the same code is
executed on multiple cores; there is only a conflict when multiple cores
try to access the same DATA, even if from different sections of code.

The rule I used to teach was “lock the smallest amount of data for the
shortest possible time”. This is a good touchstone for design.
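A sketch of that idea in plain WDM (STAT_BLOCK is an illustrative name): the lock lives with the data it protects, and the same code can run on every core at once as long as each invocation works on a different block.

#include <ntddk.h>

typedef struct _STAT_BLOCK {
    KSPIN_LOCK Lock;    /* protects Count, and only Count */
    ULONG      Count;
} STAT_BLOCK, *PSTAT_BLOCK;

VOID
BumpCount(PSTAT_BLOCK Block)
{
    KIRQL oldIrql;

    KeAcquireSpinLock(&Block->Lock, &oldIrql);   /* lock the data...   */
    Block->Count++;                              /* ...touch it...     */
    KeReleaseSpinLock(&Block->Lock, oldIrql);    /* ...and let go fast */

    /* Contention exists only when two cores pass the same block;
       the code itself is never the thing being locked.           */
}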
joe


The problem with synchronization “magic” is that unless you have a program
that can analyze it, the possibilities of deadlock are extensive and
incomprehensible.

You might check out Dean Sutherland’s PhD dissertation from Carnegie
Mellon University; his analysis program identified massive synchronization
problems in Java code that was either thought to be perfect or known to
deadlock.
joe


> The problem with synchronization “magic” is that unless you have a program
> that can analyze it, the possibilities of deadlock are extensive and
> incomprehensible.

It looks like they don’t teach it anymore. Even Microsoft employs people who don’t know better than to wait for a thread exit in a DLL_THREAD_DETACH handler (see jscript9.dll and ieui.dll). I had Internet Explorer hanging on me, and in a debugger I saw a thread hung in __exitthread at WaitForSingleObject… The saddest part is that nobody seems to ever review code written by low-level drones at Microsoft…
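To make the mistake concrete, a stripped-down sketch of the anti-pattern (g_Worker is an illustrative handle; this is not the actual jscript9 code): the exiting thread runs DllMain for DLL_THREAD_DETACH while holding the loader lock, and the thread it waits on cannot finish exiting without taking that same lock.

#include <windows.h>

static HANDLE g_Worker;   /* hypothetical worker thread handle */

BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved)
{
    UNREFERENCED_PARAMETER(instance);
    UNREFERENCED_PARAMETER(reserved);

    if (reason == DLL_THREAD_DETACH && g_Worker != NULL) {
        /* BUG: this thread holds the loader lock right now, and the
           worker must acquire it to finish exiting -> classic deadlock. */
        WaitForSingleObject(g_Worker, INFINITE);
    }
    return TRUE;
}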

And the rule I currently teach is “Make it work first, then make it work fast.” That usually means starting with the “big lock model” and, at some point in the future if performance deficits become evident, decomposing the “big lock” into smaller locks that are used by things that actually need to run in parallel. This helps ensure functional correctness and timely code completion, and it avoids unnecessary optimization. In 8 out of 10 cases the “big lock model” is all anyone ever needs.
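In KMDF terms, the “big lock model” is roughly device-wide synchronization scope; a sketch (deviceInit, deviceAttributes, and device are illustrative names):

WDF_OBJECT_ATTRIBUTES deviceAttributes;

WDF_OBJECT_ATTRIBUTES_INIT(&deviceAttributes);
/* One framework-managed lock serializes the I/O event callbacks of
   every queue parented to this device: the KMDF "big lock".        */
deviceAttributes.SynchronizationScope = WdfSynchronizationScopeDevice;

status = WdfDeviceCreate(&deviceInit, &deviceAttributes, &device);

/* If profiling later shows contention, narrow the scope (per-queue,
   or WdfSynchronizationScopeNone plus explicit locks) one piece at
   a time.                                                          */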

BTW: Apparently, this is “the thread that won’t die”…

Zzzzzzzz,

Peter
OSR

One of the problems we found with the “big lock” model was that unless the
performance problems are over-the-top, it never gets split, even if
performance is bad. There’s always something more critical to do. Then,
when splitting it up, the deadlock problem creeps in, but the incremental
decomposition into locks means that it is harder to look at the overall
design and see what is messed up.

The problem with the 8 out of 10 is that the ninth really, really needed the
lock split, but nobody could be broken loose to do it, and the cost of
regression testing is too high, and it doesn’t matter that it screws up
the realtime audio, because the end user shouldn’t be trying to do
real-time audio while our device is being used, and we’ll fix it in some
future release, probably.

I’ve been in the middle of too many of these discussions in the last 40
years.
joe


I agree. That’s exactly what happens in the real world.

I look at it a different way, however: In my world, what I hear you saying is “unless the performance problems are bad enough that somebody NEEDS to fix them, they don’t get fixed.”

To me, that’s precisely as it should be. It’s great, in fact. It’s a self-correcting model.

PLUS, it’s better to have a simple-to-maintain Big Lock implementation than a complex (perhaps NEEDlessly complex) mini-lock model, unless that mini-lock model is ABSOLUTELY necessary to achieve the requisite performance.

How many times does one design and implement the complex mini-lock model only to not REALLY need it? This results in wasted time, effort, testing, and introduces the risk of maintenance problems down the road… all when you could have had a nice simple Big Lock model and all would be well and easily maintained. Even IF the end-result is performance that’s sub-optimal.

What I hear you saying HERE is “if somebody doesn’t understand the overall design/implementation of the solution, it’s hard to decompose the Big Lock into multiple locks.”

Agreed! Unless you understand the overall design/implementation of the solution, it’s bloody unlikely that you’ll be able to figure out what the performance problem is, and designing the smaller locking strategy will be difficult. So, to me again, this is precisely as it should be. If the performance problem is significant enough, somebody with the requisite chops will have to fix it. If not… no effort needed, as we previously determined.

PLUS… don’t forget: The Big Lock model is easy to understand and much easier to implement and test in the first place. Which decreases the development cycle time and increases the likelihood that the implementation will be correct, leading to shorter time-to-market.

In my experience, premature optimization is *never* a good thing. At best, it leads to designs based on guesswork. At worst, it wastes time, increases complexity, and solves perceived problems that never needed to be solved in the first place.

So… “Make it work first, then make it work fast.”

Peter
OSR

“xxxxx@osr.com” wrote in message news:xxxxx@ntdev:

>
> In my experience, premature optimization is never a good thing. At best, it leads to designs based on guesswork. At worst, it wastes time, increases complexity, and solves perceived problems that never needed to be solved in the first place.
>
> So… “Make it work first, then make it work fast.”
>

AMEN!!!

I have fixed performance problems in multiple drivers over the years, and
the ones with premature optimization are always the worst. You get folks
believing that the only reason X does not have a performance problem is
“we designed from the beginning to handle X”. When I radically propose
writing a simple driver and then profiling it to find the problems, they
object. When I do win the battle, what they optimized for has never come
out in the top 5 of the driver’s actual performance problems. And even
when it is a performance problem, their solution is not a good one once
all the bigger bottlenecks are addressed.

I keep wishing Microsoft would add some tracing capabilities to Windows
so that lock latency could be measured per lock. There are ways to do
this with WPP tracing, but it is a pain in the ass to do explicitly for a
driver. I suspect that if we had the capability, a lot of the “I need a
gazillion locks to make Windows fast” claims would be debunked.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

>In my experience, premature optimization is *never* a good thing. At best, it leads to designs based on guesswork.

“Big lock” could also be considered “premature optimization”. Optimization of development time, that is.

Sounds like a product development opportunity, Don :-)

Peter
OSR

>I keep wishing Microsoft would add some tracing capabilities to Windows so that lock latency could be measured per lock.

With some #define wizardry, you can redefine KSPIN_LOCK and the corresponding functions for your driver, and collect that information easily, using TSC for timekeeping purposes. Since it’s all at DISPATCH_LEVEL, TSC will be reliable.
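A sketch of the trick (the wrapper name and counters are illustrative, and only the acquire side is shown):

#include <ntddk.h>
#include <intrin.h>

LONG64 g_LockWaitCycles;   /* total TSC cycles spent waiting */
LONG64 g_LockAcquires;     /* number of acquisitions         */

FORCEINLINE VOID
MyAcquireSpinLockAtDpcLevel(_Inout_ PKSPIN_LOCK SpinLock)
{
    ULONG64 start = __rdtsc();

    KeAcquireSpinLockAtDpcLevel(SpinLock);

    InterlockedAdd64(&g_LockWaitCycles, (LONG64)(__rdtsc() - start));
    InterlockedIncrement64(&g_LockAcquires);
}

/* Defined after the wrapper so the wrapper's own call still reaches
   the real function; every later call site in the driver is now
   redirected through the instrumented version.                     */
#define KeAcquireSpinLockAtDpcLevel MyAcquireSpinLockAtDpcLevel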

Yep, you can do it for spinlocks up to a point, but when you want to look
at all locks, things get painful quickly.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


On 04-Sep-2012 21:06, xxxxx@broadcom.com wrote:

>> I keep wishing Microsoft would add some tracing capabilities to Windows so that lock latency could be measured per lock.
>
> With some #define wizardry, you can redefine KSPIN_LOCK and the corresponding functions for your driver, and collect that information easily, using TSC for timekeeping purposes. Since it’s all at DISPATCH_LEVEL, TSC will be reliable.

Maybe it is possible to (ab)use the support that exists for hypervisors?
IIRC enlightened guests do not spin on a spinlock but yield to the host
instead.

– pa

> enlightened guests do not spin on a spinlock but yield to the host instead.

That’s what I was telling my date, but no, she was such an unyielding beotch.