Question about mouse service callback routine scenarios

(Sorry, I couldn’t find a better title)

The mouse class service callback routine can transfer more than one mouse packet at a time.
According to Doron Holan, you should “never ever assume you will be called with only one PMOUSE_INPUT_DATA”.

But what are the circumstances that might lead to multiple packets?

I have tried provoking such behaviour by simulating excessive system load without success.
Is it safe to assume that multiple packets will only occur under exceptional resource demand or can there be other, non-critical scenarios?

If a packet is “put back” into the queue and coincidentally a new packet arrives, is a certain order guaranteed (old before new, new before old)?

Furthermore, can I assume the service callback routine will always be called with at least one packet (excluding the last dummy)?

Thanks for your valuable replies!

It depends on the mouse port driver below. In its current
implementation, mouhid only sends one packet at a time, but this could
change in the future. I8042prt (ps2) can send many packets at a time.
Multiple packets can happen under normal circumstances, for instance
if the device interrupts for one packet and, before the DPC for the ISR
can run, interrupts for another packet. Why do you care? Are you looking
at how to test your code or to avoid the problem? Actually the easiest
way to test multiple-packet logic is to set *NumberOfPacketsConsumed
to 0 in one call to the ServiceCallback for a ps2 device, move the mouse,
and then on the next ServiceCallback you will get more than one packet.

Yes, order is maintained. If you do not consume a packet, it will be
presented back to you in your ServiceCallback later before any other new
packets. I suggest you study the WDK samples for i8042prt and kbdclass.
They show all of this behavior in the very last detail.
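
A minimal user-mode sketch of the test trick and the ordering described above, not real driver code. MousePacket, PortQueue, QueuePacket and DrainQueue are hypothetical stand-ins: the real callback also receives a device object and works on PMOUSE_INPUT_DATA pointers, and the real queue lives inside the port driver. The toy callback refuses everything on its first call, so the second call sees the old packet followed by the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for MOUSE_INPUT_DATA; only the fields used here. */
typedef struct {
    long LastX;
    long LastY;
} MousePacket;

int    g_calls = 0;       /* how many times the callback has run */
size_t g_last_count = 0;  /* packets offered in the most recent call */
long   g_first_dx = 0;    /* LastX of the first packet in that call */

/* Hypothetical filter callback: consumes nothing on its first call so the
   queue below has to present the same packet again later. */
void TestServiceCallback(MousePacket *start, MousePacket *end,
                         unsigned long *consumed)
{
    g_last_count = (size_t)(end - start);
    g_first_dx = start->LastX;
    g_calls++;
    if (g_calls == 1) {
        *consumed = 0;                         /* leave the packet queued */
        return;
    }
    *consumed = (unsigned long)(end - start);  /* take everything */
}

/* Toy model of a port driver queue (an assumption, not i8042prt's code). */
#define QUEUE_CAP 16
typedef struct {
    MousePacket items[QUEUE_CAP];
    size_t count;
} PortQueue;

void QueuePacket(PortQueue *q, long dx, long dy)
{
    if (q->count < QUEUE_CAP) {
        q->items[q->count].LastX = dx;
        q->items[q->count].LastY = dy;
        q->count++;
    }
}

/* Offer all queued packets, oldest first, then drop the consumed ones. */
void DrainQueue(PortQueue *q)
{
    unsigned long consumed = 0;
    size_t i;

    if (q->count == 0)
        return;                                /* nothing to report */
    TestServiceCallback(q->items, q->items + q->count, &consumed);
    assert(consumed <= q->count);
    for (i = consumed; i < q->count; i++)      /* keep leftovers in order */
        q->items[i - consumed] = q->items[i];
    q->count -= consumed;
}
```

Queueing one packet, draining, queueing a second packet and draining again makes the second callback see two packets, with the unconsumed one first.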

You will never get called with no data, why would you be called with no
data? That makes absolutely no sense.

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hushmail.com
Sent: Friday, June 22, 2007 5:51 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Question abot mouse service callback routine scenarios


Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Doron Holan wrote:

For instance, if the device interrupts for one packet
and before the DPC for ISR can run,
interrupts for another packet.

PS/2 mouse devices have a maximum sample rate of 200 Hz.
Thus, ISR & DPC have about 5 ms to complete until the next packet could arrive.

Correct me if I’m wrong, but I wouldn’t call that a normal circumstance.

Doron Holan wrote:

Why do you care?

I’m currently in the final optimization stage of development.

My filter driver conditionally timestamps the incoming mouse packets and modifies them in accordance with these timestamps.
I know it’s bad, but I need KeQueryPerformanceCounter for millisecond granularity (the only alternative, ExSetTimerResolution, is even more evil ;-) ).

The naive approach is to walk through the MOUSE_INPUT_DATA array, calling QPC and doing related computations with every iteration again and again.
This is obviously not the most efficient solution for more than one packet, but otherwise I’d have to do heavy branching to avoid superfluous calculations, thus impairing the source code’s readability and exposing myself to possible branch mispredictions.
In other words, I try to keep the execution flow as linear and simple as possible.

Now, I have been thinking: What if multiple packets can only occur under heavy system load anyways?
After all, it’s just a mouse I’m dealing with and in case of an overloaded system, an inconsistent mouse feel would be one of the user’s least worries.

Following that assumption, I’d only process the last packet (leaving any others untouched) and have very lean, uncomplicated code, which would furthermore happen to be the most efficient method for the most likely scenario of only one packet at a time.

Therefore, I was also asking for the packet order and the guarantee of at least one mouse packet in the service callback routine (InputDataEnd - 1 could otherwise point to invalid memory).

Ironically, all this doesn’t make much of a performance difference in the real world and is more of a perfectionist ambition.
It’s all about balancing pros and cons, which I am not the very best at.

I believe you are wrong ;). Remember, DPC dequeueing is not
deterministic and the mouse DPC could be stuck behind many other DPCs
on the machine (trust me, I have seen this happen). For instance, under
a high network load, the NIC’s DPC can squeeze out the mouse DPC.

> Ironically, all this doesn’t make much of a performance difference in
> the real world and is more of a perfectionist ambition.
> It’s all about balancing pros and cons, which I am not the very best
> at.

You kind of made my point for me, you are overoptimizing for a path that
does not require such optimization. There is a lot more processing that
occurs after you let the mouse packet go up to the RIT, you are probably
just noise in the overall time it takes to convert a mouse hw event into
a windows message. Instead of calling QPC for every packet, call it
once at the top of the function and use the same timestamp for each
packet. The counter is most likely not going to change while your
routine is executing, and if it does, the change is so slight it is not
worth the extra overhead.
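
A rough sketch of that suggestion in plain user-mode C. FakeCounter is a hypothetical stand-in for KeQueryPerformanceCounter, and MousePacket and StampAll are illustrative names only. The point is simply that the counter is read once per callback, not once per packet.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for MOUSE_INPUT_DATA; only the fields used here. */
typedef struct {
    long LastX;
    long LastY;
} MousePacket;

typedef unsigned long long (*ReadCounterFn)(void);

int g_counter_reads = 0;  /* lets a caller check how often the counter ran */

/* Hypothetical counter; a driver would call KeQueryPerformanceCounter. */
unsigned long long FakeCounter(void)
{
    g_counter_reads++;
    return 1000ULL + (unsigned long long)g_counter_reads;
}

/* Stamp every packet in [start, end) with one shared counter value.
   stamps must have room for (end - start) entries.
   Returns the number of packets stamped. */
size_t StampAll(const MousePacket *start, const MousePacket *end,
                unsigned long long *stamps, ReadCounterFn read_counter)
{
    size_t n = (size_t)(end - start);
    size_t i;
    unsigned long long now;

    assert(read_counter != NULL);
    if (n == 0)
        return 0;              /* nothing to stamp, so skip the read */
    now = read_counter();      /* one read for the whole batch */
    for (i = 0; i < n; i++)
        stamps[i] = now;
    return n;
}
```

With three packets in the array, StampAll reads the counter once and all three stamps come out equal.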

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hushmail.com
Sent: Saturday, June 23, 2007 9:50 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Question abot mouse service callback routine
scenarios


Doron Holan wrote:

Instead of calling QPC for every packet, call it
once at the top of the function and use the same timestamp for each
packet. The counter is most likely not going to change while your
routine is executing, and if it does, the change is so slight it is not
worth the extra overhead.

I also expect, and actually want, the same tick counts.

I assume the MOUSE_INPUT_DATA array can only contain one “new” packet, whereas the possible other ones are outdated by at least the minimum latency.

So, whether or not I timestamp these old packets, their timestamps will be wrong. I can only estimate them by subtracting the minimum latency, which I also happen to keep track of.

I only timestamp packets which contain relative motion.

Doron Holan wrote:

You kind of made my point for me, you are overoptimizing for a path that
does not require such optimization.

I agree, but it still bugs me from a theoretical perspective.

I once asked here about floating-point vs. fixed-point arithmetic and consequently did my own performance tests.

The Intel guys told me about the superiority of today’s FPUs (for such calculations), but there was effectively no significant performance difference.

In this case, I have chosen floating-point math as its features then outweigh its drawbacks.

On my primary machine, KeSaveFloatingPointState and KeRestoreFloatingPointState only take ~ 183 ns together, if you can trust the time measurement.

> KeSaveFloatingPointState and KeRestoreFloatingPointState
> only take ~ 183 ns together, if you can trust the time measurement.

Well, no matter how precise your timer is, you can fully trust it only if maskable interrupts on the CPU are disabled while your code runs (or if you run your code at the highest possible IRQL). Trying to obtain high-precision timing results at IRQL < DISPATCH_LEVEL with interrupts enabled is a pointless exercise, because a context switch may occur while your target code runs. If you run your code at DISPATCH_LEVEL, the results are more precise, but still imperfect because interrupts can still occur…

Anton Bassov

Anton Bassov wrote:

Trying to obtain high-precision timing
results at IRQL< DISPATCH_LEVEL with interrupts enabled is just a pointless
exercise in itself, because of possibility of context switches while your
target code runs.

Thanks for the remark, I’ll keep it in mind next time.

I performed this test at PASSIVE_LEVEL and ran 1 billion (1000000000) iterations.
Of course, the loop itself also has a small overhead, but “not much” is all the precision I need in this case.

Doron Holan wrote:

You will never get called with no data, why would you be called with no
data? That makes absolutely no sense.

I would never make any assumptions about whether you can get called with
no data or with multiple packets. I’ve seen too many stupid human tricks
pulled with mouse filter drivers (and have pulled a few myself) to ever
trust anything about them.

It’s entirely possible that some weird mouse filter might call the
callback with no packets, either due to a coding bug (or a premature
optimization like what the OP is proposing), or some kind of mistaken
idea that it would keep the system awake.

You have little choice but to assume that mouse filters won’t pass
completely invalid pointers (though broken ones can and will, of
course)… but you should assume they will do anything that they can do
within the letter of the function call specification. Because they will.

And besides: Write your code to be robust and clear. Only optimize based
on profiling data. Never optimize based on theory, because you’re not
smart enough to know whether it will help or hurt (N.B.: that’s not
directed at anyone in particular… no one is smart enough :-).

Assume any unknown mouse filter is trying to mess you up, because most
of them are written by yahoos with very little kernel experience (hey,
that would include my first Windows driver 10 years ago :-).



Ray
(If you want to reply to me off list, please remove “spamblock.” from my
email address)

Ray Trent wrote:

I would never make any assumptions about whether you can get called with
no data or with multiple packets. I’ve seen too many stupid human tricks
pulled with mouse filter drivers (and have pulled a few myself) to ever
trust anything about them.

That’s another issue I have trouble deciding on.

Monolithic/“hybrid” kernels are based on the principle of trust as all kernel modules share the same privilege level, which actually leaves no other choice.

You would never trust other filter drivers attached to the same device stack, but where do you make the distinction?

As you already said, they could not only supply an (effectively) empty array, but could instead point to entirely unrelated memory, corrupt the IRPs you are happily passing down or, worst of all, modify your driver itself.

In this context, the idea of complete absence of trust is unenforceable and thus absurd.

My point was that you have to trust that they will pass “valid” data
buffers, etc. You have to trust that they will at least obey the letter
of the documentation.

But *don’t* assume they will behave the same way that Microsoft does, or
even in what you think is a “sensible” way.

In this case, *all* buffer sizes are allowed by the documentation, so
assume that mouse filters probably will send callbacks with 0, 1, or
many packets in them (possibly very many… allow for overflowing
whatever local buffer you might have for packets and do what mouclass
does in this case: return saying that you only consumed part of the
packets).
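
As a hedged illustration of that advice, here is a small user-mode sketch. LocalBuffer, LOCAL_CAP and DefensiveCallback are made-up names, and MousePacket is a simplified stand-in for MOUSE_INPUT_DATA. It accepts 0, 1, or many packets, copies only what fits into the local buffer, and reports the rest as unconsumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for MOUSE_INPUT_DATA; only the fields used here. */
typedef struct {
    long LastX;
    long LastY;
} MousePacket;

#define LOCAL_CAP 4  /* hypothetical size of the filter's own buffer */

typedef struct {
    MousePacket items[LOCAL_CAP];
    size_t count;
} LocalBuffer;

/* Copy as many packets as fit and report how many were taken.
   An empty or inverted range is treated as "nothing to do". */
void DefensiveCallback(LocalBuffer *buf, const MousePacket *start,
                       const MousePacket *end, unsigned long *consumed)
{
    size_t offered, room, take;

    assert(buf != NULL && consumed != NULL);
    *consumed = 0;
    if (start == NULL || end == NULL || end <= start)
        return;                               /* zero packets offered */

    offered = (size_t)(end - start);
    room = LOCAL_CAP - buf->count;
    take = offered < room ? offered : room;   /* never overflow the buffer */

    memcpy(buf->items + buf->count, start, take * sizeof *start);
    buf->count += take;
    *consumed = (unsigned long)take;          /* partial consumption is fine */
}
```

Offering six packets to an empty four-slot buffer consumes four; the caller is expected to present the remaining two again later.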



Ray
(If you want to reply to me off list, please remove “spamblock.” from my
email address)

I have made a decision and chosen an optimized, but proper way.

First, I will run a loop through the whole array, which will conditionally set bit flags (e.g., MOUSE_LASTX_PRESENT).
Then I will check the bit field, conditionally call QPC (etc.) and run a second loop to do the actual work.
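
Roughly, the two-pass idea could look like the following user-mode sketch. The SEEN_MOTION_* flags, ScanPackets, ProcessPackets and FakeCounter are all illustrative assumptions (FakeCounter stands in for KeQueryPerformanceCounter), and the second loop only counts motion packets as a placeholder for the real work.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for MOUSE_INPUT_DATA; only the fields used here. */
typedef struct {
    long LastX;
    long LastY;
} MousePacket;

#define SEEN_MOTION_X 0x1u  /* hypothetical flag: some packet has LastX != 0 */
#define SEEN_MOTION_Y 0x2u  /* hypothetical flag: some packet has LastY != 0 */

int g_counter_reads = 0;

/* Hypothetical counter; a driver would call KeQueryPerformanceCounter. */
unsigned long long FakeCounter(void)
{
    g_counter_reads++;
    return 5000ULL;
}

/* Pass 1: walk the whole array and collect what kinds of data it holds. */
unsigned ScanPackets(const MousePacket *start, const MousePacket *end)
{
    unsigned seen = 0;
    const MousePacket *p;

    for (p = start; p < end; p++) {
        if (p->LastX != 0) seen |= SEEN_MOTION_X;
        if (p->LastY != 0) seen |= SEEN_MOTION_Y;
    }
    return seen;
}

/* Pass 2 runs only if pass 1 found motion; the counter is read once.
   Returns the number of motion packets handled. */
size_t ProcessPackets(const MousePacket *start, const MousePacket *end,
                      unsigned long long *stamp_out)
{
    size_t handled = 0;
    const MousePacket *p;
    unsigned seen = ScanPackets(start, end);

    assert(stamp_out != NULL);
    if (seen == 0)
        return 0;                       /* no motion: skip the counter read */

    *stamp_out = FakeCounter();         /* single read for the whole batch */
    for (p = start; p < end; p++) {
        if (p->LastX != 0 || p->LastY != 0)
            handled++;                  /* placeholder for the actual work */
    }
    return handled;
}
```

A batch without motion never touches the counter, while a batch with motion reads it exactly once no matter how many packets it contains.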

Thanks to all; you have been a great help!