Windows Interrupt Latency Measurements... Anybody?

Does anybody have recent (last few years) Windows interrupt latency measurements that they’d be willing to share?

I don’t have a PCIe bus analyzer (hopefully next year… sigh) and haven’t seen any real measurements in quite a while.

We’re working with a client on a sort of high precision timing device, and having an idea of the interrupt latencies involved will help us with the overall timing budget.

Thanks for whatever info you can share,

Peter
OSR

“Interrupt latency” is a term somewhat overloaded.

Wikipedia: “interrupt latency is the time that elapses from when an
interrupt is generated to when the source of the interrupt is serviced”
(Hardware ->CPU->ISR)

MSDN: “interrupt latency refers primarily to the software interrupt handling
latencies; that is, the amount of time that elapses from the time that an
external interrupt arrives at the processor until the time that the
interrupt processing begins”
(ISR -> DPC or ISR->DPC->Process)

The second definition usually gives numbers orders of magnitude higher than
the first definition. You can measure ISR -> DPC -> Process latencies with
LatencyMon.

//Daniel

By “civilians”, sure.

So, to be clear: I mean, specifically, actual interrupt latency: the time from when the device signals an interrupt to when that device’s driver’s ISR services the interrupt. Hence my reference to not having a PCIe Bus Analyzer.

While LatencyMon is a terrific tool (thank you for developing it, it’s simple enough for my ham radio buddies to use and sort-of understand), XPERF and WPA are a bit more flexible in terms of detailing ISR to DPC latency for driver devs.

Peter
OSR

So the MSDN definition is the “civilian” one. I noticed the reference to the
bus analyzer but what really put me in doubt was “Windows” in “Windows
interrupt latency measurements”.

Do you care about hardware interrupt latencies if those are just in the
nanosecond range? Or are you looking for proof to back up that claim?

//Daniel


Well, yeah. It’s the definition I’d expect to be used by users and audio engineers. Not device and driver developers.

I’m looking for actual measured values. These will vary by OS, depending on the overhead of dispatching the interrupt. Hence my question.

Hullo? You see that it’s ME posting this question… not some kid asking for their school project?

I asked, so I care. I wouldn’t have asked if I didn’t care.

But to be clear for anyone who has doubts: I am looking for recent measurements that detail the time from a PCIe interrupt being asserted by the hardware to that interrupt being serviced in the driver’s ISR. On Windows.

I am well aware that the numbers are likely to be somewhere between 100ns and 1us… depending on whether a register access is required to satisfy the interrupt.

I am seeking to (a) verify my understanding with recent measured numbers, and (b) narrow down the range a bit.

I don’t have a PCIe analyzer available, and I don’t want to resort to cycle counting my way through the interrupt path (before somebody suggests that… gad, I need an intern I could assign that to).

Peter
OSR

If it’s custom hardware, and it’s still at the FPGA stage, you could have a hardware clock counter on it which resets at interrupt time. You would then read it from your ISR and/or DPC. If you need multiple ISRs (MSI-X), you can have multiple registers that latch from an always-running counter.
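
In ISR terms, the sketch below is roughly what that looks like. The register offsets, clock rate, and the GetDeviceExtension accessor are invented for illustration, not taken from any real device:

// Hypothetical layout: the device latches its free-running counter into
// REG_IRQ_LATCH at the moment it asserts the interrupt.
#define REG_IRQ_LATCH   0x10            // counter captured at interrupt assertion
#define REG_COUNTER     0x14            // live free-running counter, same clock
#define COUNTER_HZ      125000000ULL    // e.g. a 125 MHz user clock in the FPGA

typedef struct _DEVICE_EXTENSION {
    PUCHAR Bar0;                        // BAR0, mapped in EvtDevicePrepareHardware
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

BOOLEAN EvtInterruptIsr(WDFINTERRUPT Interrupt, ULONG MessageId)
{
    PDEVICE_EXTENSION devExt =
        GetDeviceExtension(WdfInterruptGetDevice(Interrupt));   // your WDF context accessor
    ULONG latched, now;
    ULONG64 latencyNs;

    UNREFERENCED_PARAMETER(MessageId);

    latched = READ_REGISTER_ULONG((PULONG)(devExt->Bar0 + REG_IRQ_LATCH));
    now     = READ_REGISTER_ULONG((PULONG)(devExt->Bar0 + REG_COUNTER));

    // Interrupt-to-ISR latency in nanoseconds (wrap handling omitted).
    latencyNs = ((ULONG64)(now - latched) * 1000000000ULL) / COUNTER_HZ;

    // ... record latencyNs, acknowledge the device, queue the DPC ...
    return TRUE;
}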

That is a terrific idea.

I happen to have a PCIe FPGA device from an eval kit sitting around doing nothing. An excellent candidate for this use.

If nobody’s collected such numbers for a while, that’s a definite possibility.

Thanks,

Peter
OSR

Cycle counting loses anyway, because it doesn’t take into account effects
like the cache; for a frequently-interrupting device, there’s a good
chance the code it needs is in the I-cache, on some architectures.

It is not just the version of Windows; it is very sensitive to the
platform (the chipset on the motherboard).

A friend who needed such measurements and lacked a bus analyzer, but
needed to know latency-to-app, had the app open the LPT port. He hooked
an oscilloscope to a line of the LPT port, made some measurements to get an
estimate of the time required to get a byte from the app to the port, then
hooked another oscilloscope input to the external line to the card
that generated the interrupt. Sadly, after subtracting out the number he
was using for app-to-LPT, he was getting numbers in the range of hundreds
of milliseconds. There were many things he could have done, had he
understood Windows, to reduce this, but I only heard about it a year after
the experiment.

So, if you have some way to put something into your ISR that will
generate an external signal, you could adapt this approach. That’s
about the only way I know of to get this number without a bus analyzer.
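
On a machine that still has a parallel port, that signal can be as crude as banging a bit on the LPT data register from the ISR. The port address below is the classic legacy default and would have to be checked against what the firmware actually assigned:

#define LPT1_DATA_PORT 0x378    // classic legacy address; verify against assigned resources

BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID ServiceContext)
{
    UNREFERENCED_PARAMETER(Interrupt);
    UNREFERENCED_PARAMETER(ServiceContext);

    // Raise D0 as early as possible; the scope measures the delta between
    // the card's interrupt line and this edge.
    WRITE_PORT_UCHAR((PUCHAR)(ULONG_PTR)LPT1_DATA_PORT, 0x01);

    // ... normal interrupt servicing for the device ...

    WRITE_PORT_UCHAR((PUCHAR)(ULONG_PTR)LPT1_DATA_PORT, 0x00);
    return TRUE;
}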

The last time I needed to compute interrupt latency was pre-Windows, on a
286, and cycle counting still worked.

joe


Hi Peter,

to get an idea I ran a test on a PCIe card (PCIe x1 interface via a TI DM647 DSP -> XIO2001 PCIe-to-PCI bridge). The DM647 test program triggers an interrupt and measures the time until the interrupt bit in the corresponding register gets acknowledged by the PC-side ISR. The time is measured on the DSP side and contains a little more than you asked for (e.g. parts of the PC-side ISR), but to get an idea…:

WinXP SP3 on a Core i5 660: 4 µs (sometimes 3 µs)
Linux 3.2.0-37 64bit on a Core i5 660: 5 µs (sometimes 6 µs)
Win7 64bit on an old Pentium D (“family 15 model 6”): 4 µs (sometimes 5 µs)

I’d say that on my two test PCs the time from the physical interrupt signal until the ISR actually starts is in the range of 3-4 µs.
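
Roughly, the DSP side of the test is just the loop below. The names, addresses and helper are invented placeholders, not the real DM647 registers, but the shape is the same:

#include <stdint.h>

/* Placeholders; the real DM647 register addresses and timer access differ. */
#define IRQ_STATUS_REG     ((volatile uint32_t *)0xA0000010)
#define TIMER_TICKS_PER_US 100u                 /* ticks of the free-running timer per us */
extern uint32_t read_timer(void);               /* read the free-running DSP timer */

uint32_t measure_irq_latency_us(void)
{
    uint32_t t_start = read_timer();

    *IRQ_STATUS_REG = 1;                        /* raise the interrupt toward the PC */

    while (*IRQ_STATUS_REG != 0)                /* PC-side ISR acknowledges by clearing it */
        ;                                       /* polling granularity adds to the result */

    return (read_timer() - t_start) / TIMER_TICKS_PER_US;
}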

bye,

Thomas

> Hullo? You see that it’s ME posting this question… not some kid asking for their school project?

Sorry for sounding like that, Peter. What also caused me to doubt was the fact
that, as we all know, an ISR can get preempted by a higher-level interrupt, an
IPI, or an SMI, all of which can dramatically change the numbers. But perhaps you
were not interested in worst-case latencies at all.

//Daniel

Thanks, Thomas.

3us-4us is a *lot* longer than I expected. Perhaps the register accesses account for this difference… I’d have to think about it.

But that’s definitely good data to have.

Many thanks to you for taking the time to reply with your results.

For what it’s worth, when we get our PCIe analyzer I’ll post some actual measurements.

Peter
OSR

>3us-4us is a *lot* longer than I expected.

I’ve seen 100,000 legacy-style interrupts per second on a single CPU (Xeon class) without seriously bogging it down.

You know what I’d *really* like, in a perfect world? I’d like a graph over time of interrupt latencies on a series of real, working, machines running S12R2.

I just don’t think I’ll get that unless I collect it myself :-(

I figured that validating that my understanding is not off by an order of magnitude is a good place to start.

Very helpful would be common best-case and worst-case timings, collected on a real running system.

I mentioned this was for a high precision timing device. I would *wish* for as much deterministic behavior as possible for this device, with regard to interrupt latency. I don’t need high interrupt rates… What I need is some predictability to ensure that my PID loop, when properly tuned, won’t be thrashing about.

Peter
OSR

Hi Peter,

> 3us-4us is a *lot* longer than I expected.

this is interesting feedback from you. It was just a short test today… Until now I had always thought of “about 4us”, so I didn’t question the result, but it seems that I am wrong here. It seems that my DM647-side timestamp calculation is part of that long time, and polling this register was done at too-long intervals.

I did the test again and directly used the DM647 timer registers instead of a higher-level function to calculate a timestamp, and also reduced the polling interval. It seems that the time is shorter. I used a DM647 timer running at about 117 MHz now - so about 117 increments per us. I see that it usually takes about 80-110 increments until the register gets reset - so a bit less than 1us. A few interrupts took longer, about 800 increments or about 7us. Another thing is that reading the timer register twice in a row already shows 28 increments. Since I read the timer before raising the interrupt and again after detecting the acknowledge, the real result will be even shorter… I did the test only on Linux for now; it should be similar on Windows. Again, it was just a short test and most likely the values are wrong again ;-)
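
Back-of-envelope from those numbers (just dividing by the 117 MHz tick rate):

80..110 increments at ~117 increments/us  ≈ 0.68 - 0.94 us  (raw)
minus the ~28 increments of read overhead ≈ 0.44 - 0.70 us  (corrected)
the rare ~800-increment outliers          ≈ 6.8 us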

bye,

Thomas

Thank you again, Thomas.

> so a bit less than 1us

Ah, so THAT’s closer to what I expected. Not that the values NEED to be what I expect, of course. That’s the entire purpose of my posting here.

Once again, I really appreciate your taking the time to collect and provide this information. Very helpful.

Peter
OSR

In your posting you said “I did the test only on Linux now”.

It seems like the number Windows driver developers (and maybe Linux driver developers too) likely care most about is the time from the interrupt signal (which I suppose in MSI-X terms means the memory write to the interrupt controller) to when the DPC can process the interrupt. Windows ISRs generally don’t do much work in the ISR; all the real work happens in the DPC. For example, a NIC driver will get the interrupt, but packets typically are not pulled off the completion status ring until the DPC, where they can be indicated up to the transport.

If it’s about a uSec from interrupt to ISR, I could easily imagine it’s 2-4 uSec from interrupt to DPC (and a LOT of variation up to double and triple digit uSecs due to DPCs ahead of you). It would be interesting to see real measured numbers, maybe queuing and running a DPC is a fraction of a uSec, but maybe not. I suppose the DPC queuing/dequeuing part is easy to measure, but there also is the post ISR processing which I could imagine is similar to the pre ISR processing. Knowing typical distributions of the latency would also be interesting.
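
Wiring up at least the ISR-to-DPC part of that measurement in a KMDF driver is straightforward. A minimal sketch, assuming the context type below has been attached to the device object, and ignoring the case where a second interrupt fires before the DPC runs:

typedef struct _LATENCY_CONTEXT {
    LARGE_INTEGER IsrTimestamp;     // QPC value captured in the ISR
} LATENCY_CONTEXT, *PLATENCY_CONTEXT;

WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(LATENCY_CONTEXT, GetLatencyContext)

BOOLEAN EvtInterruptIsr(WDFINTERRUPT Interrupt, ULONG MessageId)
{
    PLATENCY_CONTEXT ctx = GetLatencyContext(WdfInterruptGetDevice(Interrupt));

    UNREFERENCED_PARAMETER(MessageId);

    ctx->IsrTimestamp = KeQueryPerformanceCounter(NULL);

    // ... read/clear the device's interrupt status here ...
    WdfInterruptQueueDpcForIsr(Interrupt);
    return TRUE;
}

VOID EvtInterruptDpc(WDFINTERRUPT Interrupt, WDFOBJECT AssociatedObject)
{
    PLATENCY_CONTEXT ctx = GetLatencyContext(WdfInterruptGetDevice(Interrupt));
    LARGE_INTEGER freq;
    LARGE_INTEGER now = KeQueryPerformanceCounter(&freq);
    LONGLONG isrToDpcUs;

    UNREFERENCED_PARAMETER(AssociatedObject);

    // ISR-to-DPC latency in microseconds for this interrupt.
    isrToDpcUs = ((now.QuadPart - ctx->IsrTimestamp.QuadPart) * 1000000) /
                 freq.QuadPart;

    // ... accumulate min/max/histogram of isrToDpcUs, then do the real work ...
}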

I did do a test a year or so ago while writing some driver-private worker thread code, which, as I remember, showed I could queue and dequeue a work item in about 0.2 uSec (around 5 million work requests/sec, on a single core, when the work did not cause the thread to suspend). The test was basically to queue a work item that queued another work item, and see how fast the loop spins.
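
With stock I/O work items instead of a private worker thread, the same ping-pong measurement would look roughly like this (absolute numbers would differ; the struct and counts are illustrative):

typedef struct _PINGPONG {
    PIO_WORKITEM  WorkItem;         // allocated once with IoAllocateWorkItem
    volatile LONG Remaining;        // e.g. start at 5,000,000
    KEVENT        Done;
} PINGPONG, *PPINGPONG;

IO_WORKITEM_ROUTINE PingPongRoutine;

VOID PingPongRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    PPINGPONG p = (PPINGPONG)Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    if (InterlockedDecrement(&p->Remaining) > 0) {
        // Re-queue ourselves; the sustained loop rate approximates the
        // per-item queue + dequeue cost.
        IoQueueWorkItem(p->WorkItem, PingPongRoutine, DelayedWorkQueue, p);
    } else {
        KeSetEvent(&p->Done, IO_NO_INCREMENT, FALSE);
    }
}

// Harness: take KeQueryPerformanceCounter before the first IoQueueWorkItem,
// KeWaitForSingleObject on Done, take QPC again, and divide by the starting
// value of Remaining.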

I also did some measurements of ETW trace events, and concluded you could do about a million ETW events/sec (as I remember, on 1 core, if you didn’t mind using 100% of that core). These were basically back-to-back ETW event writes, and the trace timestamps showed about a uSec between each event.

I know it used to be that if you had more than about 75K interrupts/sec, the checked OS decided your device was broken and bugchecked. I’m curious what the value is today, on W2K12r2. I also know there is an API call to query the DPC watchdog about the percentage of the DPC time limit it has left, so you can tune things like how many requests you process in each DPC invocation.
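
The watchdog query is KeQueryDpcWatchdogInformation; from memory the usage is roughly the following, though the exact field semantics should be checked against wdm.h:

KDPC_WATCHDOG_INFORMATION wdi;
ULONG percentLeft = 100;            // assume no limit unless told otherwise

if (NT_SUCCESS(KeQueryDpcWatchdogInformation(&wdi)) && wdi.DpcTimeLimit != 0) {
    // DpcTimeCount is, as I recall, the time remaining for the current DPC,
    // in the same (tick) units as DpcTimeLimit.
    percentLeft = (wdi.DpcTimeCount * 100) / wdi.DpcTimeLimit;
}

// When percentLeft drops below some threshold (say 25), stop pulling work off
// the ring, re-queue the DPC, and let it continue in the next invocation.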

It would be great to have a table of how long many common operations take, on typical processors, for those of us who like doing back-of-envelope performance estimates.

Jan


Many years ago, I used to give a lecture on performance measurement. I
agree with you; one way to get this in a somewhat simpler fashion is to
keep a mean and standard deviation over a large sample size. There is a
transformation that requires only three variables: N, the number of
samples, the sum of the samples S (mean is S/N for N > 0), and the sum of
the squares of the samples. Unfortunately, I and my statistics text are
separated by a couple floors right now. I’ll try to remember to post the
equation next time it and I are together. What made this interesting was
that when I did performance measurements, I had a standard deviation an
order of magnitude larger than the mean. Turns out, in that OS, context
swap time was being charged to the running process instead of to the
kernel (to give you a hint about how far back this was, the timer
resolution was 20us, which allowed no more than 10 instructions per
tick…). I identified it by keeping the running mean and variance, and
any reading > 3 sigma was recorded as an anomaly. Sure enough, every 60
ms I got a gigantic sample; 60ms was a timeslice, and from there it was
easy for the OS maintainers to track down the problem and fix it. Note
this means you can detect how often you get an anomalously large value
(over a sample size of, say, N=1,000,000 or larger), sort the anomalies into
k buckets by sigma-multiple, and, as I did, not count samples that were too
large as contributing to the mean/s.d. values. So I could say something like
“The typical processing time is n us, with 5% of the samples > n1 us, and of
those, 0.1% anomalously large (> w s.d. from the nominal mean).” So
if 99.99% of your samples are ~1us, and of your 1,000,000 samples, 2 are
really bad, this tells you how confident you can be in your timestamps.
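
For reference, the usual textbook form of that transformation (with the caveat that the sum-of-squares version is numerically touchy for very large N) is:

mean     = S / N
variance = (SS - S*S/N) / (N - 1)      // SS = sum of the squares of the samples
sigma    = sqrt(variance)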

Some years ago (about 2003) I was given a driver that needed precise
timestamps on its samples. The idea being that if we read a packet of N
samples, at T0, the next timestamp should be very close to T0+t(N) where
t(N) is the time required to read N samples. The jitter between the
expected values was hundreds of milliseconds. T values were a
KeQueryPerformanceCounter reading. I was asked to rewrite the driver to
reduce the jitter. The driver was a mess, and nothing short of throwing
it out and rewriting it could have saved it. But I got the jitter down to
the desired < 100us by rewriting the app instead. Out of sample sizes of
1,000,000, we had two or sometimes three “rejectable” jitter values, all <
50ms but > 1ms. The client decided that was acceptable. Average jitter
was 80us, and never exceeded 100us except for the “gigantic” values, which
were all > 1ms. There was nothing between 100us and the smallest
rejectable value, whose exact value I no longer recall but it was always >
5ms.

Sometimes you just have to accept big latencies. But knowing when to
reject a sample can frequently be more useful than achieving perfection.
It depends on the problem domain.
joe


The DPC time is relevant only if you have a sustained burst of interrupts.
In this case, if Tp, the total time required in the DPC, is >= Tiar, the
inter-arrival time of elements in the DPC queue, then the queue size will grow
without bound, as will the latencies. If Tiar is larger than Tp, however,
the problem does not exist. It gets really interesting for multimodal
Tiar, or “bursty data”. In that case, we can usually establish an upper
bound on queue size, and an upper bound on total latency.
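
To put toy numbers on it (steady arrivals assumed):

Tiar = 10 us, Tp = 12 us  ->  each arrival adds 2 us of backlog; the DPC queue,
                              and hence the latency, grows without bound.
Tiar = 10 us, Tp =  8 us  ->  utilization is Tp/Tiar = 0.8; the queue stays
                              bounded, but bursts ride on top of that 80% load.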

But for a device that has a very small Tiar, even if the Tp is less than
Tiar, the limiting factor is how fast the ISR can process the data. This
would be expressed as Til + Tip, the base interrupt latency time plus the
interrupt processing time. That may impact the feasibility more than Tp,
at least for bursty data (good grief! I haven’t worried about this stuff
since I passed the Operations Research qualifier back in 1971!).

Ultimately, it means that for some devices you can separate the Tp
concerns (read: DPC latency) from the Til+Tip concerns (read: latency,
interrupt-to-ISR + ISR-to-IRET time). The interrupt-to-IRET time can, in
some cases, dominate the whole problem, because this places an upper bound
on the interrupt frequency, so knowing interrupt-to-ISR time may be only
half the problem if Til+Tip+Tiret ~ Tiar, even if Til+Tip < Tiar. (Oh,
for the ability to use subscripts) and Tiar is multimodal.

However, at this remove, I doubt if I could come up with the necessary
math to demonstrate expected values given the expected multimodal
distributions. 42 years ago, I could do better than be able to identify
an integral sign correctly two times out of three (and these days, only on
clear days, with a tailwind. And like a dog chasing a car, I’m not sure
what I’d do with one after I caught it)
joe


“Jan Bottorff” wrote in message news:xxxxx@ntdev…

> If it’s about a uSec from interrupt to ISR, I could easily imagine it’s 2-4 uSec from interrupt to DPC (and a LOT of variation up to double and triple digit uSecs due to DPCs ahead of you). It would be interesting to see real measured numbers, maybe queuing and running a DPC is a fraction of a uSec, but maybe not. I suppose the DPC queuing/dequeuing part is easy to measure, but there also is the post ISR processing which I could imagine is similar to the pre ISR processing. Knowing typical distributions of the latency would also be interesting.

Sorry for the plug again but LatencyMon will tell you that. It does
“interrupt to process” latency measuring which includes the additional step
of waking up an idle real-time priority thread to respond to an event set in
that DPC routine. When you look at the details it will also tell you
what part of those intervals are ISR->DPC latencies without including the
last step of waking up the user thread. Of course this doesn’t include
anything from before the interrupt handler started executing.

//Daniel

It is also possible by storing the count, mean, and variance, and computing
the standard deviation on demand. This method is better when you need
ongoing calculations and frequent checks (i.e. was the completion time for
this IRP significantly different from the normal completion time). Also note
that the assumption made here and by Joe of a normal distribution is
inherently bad, except from the point of view of computational complexity.
By definition the distribution will not be normal, because it has a minimum
value (the systemic bias of the uncontended best case), is designed to
deliver service as fast as possible, and has no upper bound.
Unfortunately, any statistical model I have tried that fits better is vastly
more complex to compute, and the increase in accuracy has never been worth
the extra complexity - especially when the analysis must be done in a timely
manner.

Some comments from relevant sections of source I have used

// save and compute new mean
// m1 = m0 + (xn - m0)/n

// compute new variance
// s1 = s0 + (xn - m0) * (xn - m1)

// calculate the standard deviation from the variance & datum count
// use n - 1 because we are evaluating a sample not a population
// sigma = sqrt(sn/(n - 1))
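
Stitched together, those comments amount to something like the following (a minimal sketch of the running-statistics update, not the actual source):

#include <math.h>

typedef struct {
    unsigned long long n;   // datum count
    double mean;            // running mean (m)
    double s;               // running sum of squared deviations (s)
} RUNNING_STATS;

static void stats_add(RUNNING_STATS *st, double x)
{
    double m0 = st->mean;

    st->n += 1;
    // m1 = m0 + (xn - m0)/n
    st->mean = m0 + (x - m0) / (double)st->n;
    // s1 = s0 + (xn - m0) * (xn - m1)
    st->s += (x - m0) * (x - st->mean);
}

static double stats_stddev(const RUNNING_STATS *st)
{
    // sigma = sqrt(sn/(n - 1)); n - 1 because we evaluate a sample, not a population
    return (st->n > 1) ? sqrt(st->s / (double)(st->n - 1)) : 0.0;
}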
