ISR and DPC times

Hello,

I’ve written a little test driver which uses the HPET to generate interrupts at intervals of 0.5 millisecs, and I’m measuring these intervals directly inside my ISR. So my ISR only reads the HPET’s main counter, computes the difference to the preceding interrupt, and programs the next interrupt (as the timer I’m using doesn’t support periodic mode). My ISR is called at IRQL 0x0b.
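
Schematically, the ISR does no more than this (a simplified sketch; the names are placeholders: g_HpetBase is the HPET register block I map elsewhere with MmMapIoSpace, g_TicksPerHalfMs is 0.5 millisecs converted via the HPET’s COUNTER_CLK_PERIOD field, and the timer is assumed edge-triggered, so no status bit is cleared here):

```
#include <wdm.h>

#define HPET_MAIN_COUNTER          0x0F0
#define HPET_TIMER_COMPARATOR(n)   (0x108 + 0x20 * (n))
#define MY_TIMER                   2      /* the comparator this driver claimed */

extern volatile UCHAR *g_HpetBase;        /* mapped with MmMapIoSpace elsewhere */
extern ULONGLONG g_TicksPerHalfMs;        /* 500 us / COUNTER_CLK_PERIOD */

static ULONGLONG g_LastCount;
static ULONGLONG g_LastDelta;             /* the interval being measured */

BOOLEAN MyHpetIsr(PKINTERRUPT Interrupt, PVOID Context)
{
    UNREFERENCED_PARAMETER(Interrupt);
    UNREFERENCED_PARAMETER(Context);

    volatile ULONGLONG *mainCount =
        (volatile ULONGLONG *)(g_HpetBase + HPET_MAIN_COUNTER);
    volatile ULONGLONG *comparator =
        (volatile ULONGLONG *)(g_HpetBase + HPET_TIMER_COMPARATOR(MY_TIMER));

    ULONGLONG now = *mainCount;
    g_LastDelta = now - g_LastCount;      /* time since the previous interrupt */
    g_LastCount = now;

    /* The timer has no periodic mode, so re-arm it for 0.5 ms from now. */
    *comparator = now + g_TicksPerHalfMs;

    return TRUE;
}
```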

Now I’m watching my ISR being called correctly at intervals of approx. 0.5 millisecs for a long time (several minutes), but sometimes the interval is above 2 millisecs. When I watch Resplendence Latency Monitor, I can see my driver having the highest ISR execution time (65 microseconds), and sometimes the monitor also reports an interrupt-to-process latency of about 2000 microsecs right at the moment I see this high interval in my ISR.

Is there a chance to find the reason for these random high intervals?

Thanks and best regards,
Johannes F.

Hi,

Thanks for your reply. Meanwhile I think this delay is the result of sharing the interrupt (22) with the HDAudio driver. But at the moment I don’t succeed in using an alternate interrupt (20 or 21) or a message-signaled interrupt. I tried to allocate these (20 or 21) in the driver’s .inf LogConfig section, but then I cannot start my driver anymore (Code 12). And when I disable the HDAudio driver, my call to IoConnectInterrupt() fails with STATUS_INVALID_PARAMETER. I’ll have to learn more about interrupts ;-)

Thanks and regards,
Johannes F.

xxxxx@Freyberger.de wrote:

> Is there a chance to find the reason for these random high intervals?

There are tools to help you measure ISR and DPC times.
https://msdn.microsoft.com/en-us/library/windows/hardware/ff545764.aspx

However, you need to remember that there is nothing you can do about
this. Windows is NOT a real-time operating system. There are no
guarantees of interrupt latency. If personal safety depends on you
getting 500us timer interrupts forever, then you need to choose a
different operating system.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Why are you meddling with the interrupts belonging to another piece of HW?

wrote in message news:xxxxx@ntdev…
> Meanwhile I think this delay is the result of sharing the interrupt (22) with the HDAudio driver. But at the moment I don’t succeed in using an alternate interrupt (20 or 21) or a message-signaled interrupt…

On 04-Feb-2015 07:55, Maxim S. Shatskih wrote:

> Why are you meddling with the interrupts belonging to another piece of HW?

What else can you recommend to receive interrupts from HPET?

– pa

Hi,

As my HPET supports FSB (Front Side Bus) interrupts, meanwhile I think it would be best to use these in order to avoid sharing interrupts. I’m assuming that FSB interrupts correspond to the message-based interrupts of IoConnectInterruptEx, but I can’t find any information on how the two 32-bit fields that configure an FSB interrupt in the HPET (the location the FSB interrupt message should be written to, and the value that is written during the FSB interrupt message) correspond to the values in PIO_INTERRUPT_MESSAGE_INFO, nor about the lines needed in my inf file for these interrupts.
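
My current guess, as an untested sketch: each IO_INTERRUPT_MESSAGE_INFO_ENTRY returned by IoConnectInterruptEx (CONNECT_MESSAGE_BASED) carries a MessageAddress and MessageData, and those would go into the two halves of the timer’s Tn_FSB_INT_ROUTE register (register offsets per the IA-PC HPET spec 1.0a; HpetBase and TimerIndex are placeholders):

```
#include <wdm.h>

#define HPET_TIMER_CONF(n)       (0x100 + 0x20 * (n))   /* Tn_CONF_CAP */
#define HPET_TIMER_FSB_ROUTE(n)  (0x110 + 0x20 * (n))   /* Tn_FSB_INT_ROUTE */
#define HPET_TN_FSB_EN_CNF       (1ULL << 14)           /* route via FSB/MSI */
#define HPET_TN_FSB_INT_DEL_CAP  (1ULL << 15)           /* timer supports it */

NTSTATUS EnableHpetFsbInterrupt(
    volatile UCHAR *HpetBase,                  /* mapped register block */
    ULONG TimerIndex,
    PIO_INTERRUPT_MESSAGE_INFO_ENTRY Entry)    /* from IoConnectInterruptEx */
{
    volatile ULONGLONG *conf =
        (volatile ULONGLONG *)(HpetBase + HPET_TIMER_CONF(TimerIndex));
    volatile ULONGLONG *route =
        (volatile ULONGLONG *)(HpetBase + HPET_TIMER_FSB_ROUTE(TimerIndex));

    if ((*conf & HPET_TN_FSB_INT_DEL_CAP) == 0) {
        return STATUS_NOT_SUPPORTED;           /* no FSB delivery on this timer */
    }

    /* Bits 63:32 = address the message is written to, bits 31:0 = value. */
    *route = ((ULONGLONG)Entry->MessageAddress.LowPart << 32) |
             (ULONGLONG)Entry->MessageData;

    *conf |= HPET_TN_FSB_EN_CNF;               /* switch the timer to FSB mode */
    return STATUS_SUCCESS;
}
```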

Thanks and regards,
Johannes F.

Can you explain the overall goal you’re trying to accomplish? Are you trying to write a driver for the HPET as an ultimate goal? Because IIRC, this device is reserved for use by Windows and you’re not supposed to fool with it.

Or, is this a lab experiment… and you’re just using the HPET as a convenient source of interrupts?

Peter
OSR
@OSRDrivers

Hi all,

Is Windows really using all the HPET timers available? To my knowledge it uses just a few. So why not use the unused HPET counters/timers?

As Windows is (IMHO) using the HPET via the HAL DLL, wouldn’t it be possible to even “steal” the HPET with a boot driver? Or at least with a custom HAL?

Just brainstorming :-)

IMO Windows developers are handicapped enough. If we look at Linux, that OS is squeezing microsecond timing/latency out of the same hardware where Windows only delivers milliseconds.

Unfortunately all the Windows scheduler/timing functions are millisecond-based. So I can imagine that using the HPET together with a real hardware interrupt could be a way to go sub-millisecond (if the rest of the system behaves well in terms of ISR/DPC latency).

Am I writing nonsense?

Cheers
Axel

What the OS uses today is not guaranteed to be the same as what the OS uses tomorrow (or used to use in the releases of yesterday past).

Not to mention what would happen if some other software decided to play the same tricks, trying to wrest control over the HPET, and an unfortunate user installed both pieces of software on their machine.

Windows is not a real-time operating system. That being said, there is a coordinated high resolution timer facility present in Windows 8.1 / Windows Server 2012 R2 (https://msdn.microsoft.com/en-us/library/windows/hardware/dn265188(v=vs.85).aspx ). Notwithstanding ISRs and DPCs, which are a serious consideration for any real-time endeavor, ExQueryTimerResolution will tell you the best timing you can expect on a given system with that facility, which is tied to the clock source used.

(The machine I sent this mail from had a minimum resolution of 500 microseconds with the default time source.)
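
For illustration, a minimal sketch of using that facility (documented wdm.h calls; the 500 microsecond period is just the example figure above, and the system will clamp it to what ExQueryTimerResolution reports):

```
#include <wdm.h>

/* EXT_CALLBACK: runs at DISPATCH_LEVEL when the timer fires. */
static VOID TimerTick(PEX_TIMER Timer, PVOID Context)
{
    UNREFERENCED_PARAMETER(Timer);
    UNREFERENCED_PARAMETER(Context);
    /* periodic work goes here */
}

NTSTATUS StartHighResTimer(VOID)
{
    ULONG maximum, minimum, current;   /* all in 100 ns units */

    /* What can the clock source on this machine actually deliver? */
    ExQueryTimerResolution(&maximum, &minimum, &current);

    PEX_TIMER timer = ExAllocateTimer(TimerTick, NULL, EX_TIMER_HIGH_RESOLUTION);
    if (timer == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /* Negative due time = relative; 5000 * 100 ns = 500 us period. */
    ExSetTimer(timer, -5000LL, 5000, NULL);
    return STATUS_SUCCESS;
}
```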

  • S (Msft)


I’m very interested in this topic, and spent an inordinate amount of time worrying about it almost exactly a year ago.

> Am I writing nonsense?

In the sense that you didn’t answer my questions, and rather wrote a set of conclusions that are apparently based on random experiments… yes.

In the sense that one needs to be concerned with interrupt latency and the distribution of such latencies over time… no.

> … that OS is squeezing microsecond timing/latency out of the same hardware where Windows only delivers milliseconds.

Well, that’s a poor conclusion based on I’m not sure what.

By actual measurement last year, on a commodity system, I measured the latency from the time a device generates an MSI to the time the driver’s ISR is entered to be *significantly* less than a microsecond, most typically 200 or so nanoseconds. The distribution of latencies is not a normal distribution and is usually in fact “multi-modal.”

The amount of time the ISR takes is influenced by the number (and type) of register accesses it makes (touching just ONE register on the device costs you almost a microsecond).

Measured latencies from ISR to DPC (what would be the “bottom half” in Linux) are the most variable. I’ve measured latencies from ISR to DPC from 600 nanoseconds (seriously) to 7 milliseconds (not a typo).

In my experience, there’s absolutely no basic problem with Windows device-to-ISR, or ISR-to-DPC, latencies. The ONLY time I’ve seen problems, or aberrant timings, was when one or more device drivers “misbehaved”, either intentionally or unintentionally (or the hardware was old or a very poor match for a general purpose, demand-paging OS).

When you have an operating system that supports such a broad range of hardware, with devs of various levels of knowledge and ability writing drivers, you tend to have a lot of variability.

Peter
OSR
@OSRDrivers

Hi,

thanks for the interesting discussion on this thread.

  • What’s my goal: for (LAN) streaming purposes I want to send UDP packets with the lowest possible jitter, but without additional hardware. The low jitter is needed for hardware devices which receive such streams but provide only a very small receive jitter buffer of a few milliseconds.

  • At the moment I’m sending my packets from a high-priority kernel thread at passive level using WSK. But sometimes this thread is blocked by DPCs. In my experience, the majority of PCs out there normally behave quite well and don’t show DPCs > 1 millisec… for a certain amount of time. But only very few PCs stay below 1 millisec of DPC time in 24/7 usage.

  • I’ve already tried the new high resolution timer introduced in W8.1, and ExQueryTimerResolution also reported 500 usecs as the minimum timer resolution on all machines so far. But in the end the timer callback interval was never below 0.97 msecs, and the max values were about 1.8 msecs.

  • So I decided to give the HPET a chance. So far, on all the machines where I’ve tested my driver, not a single one of the HPET timers was activated. I’d be interested in situations where Windows actually activates one of these timers. The main counter is probably used for QueryPerformanceCounter if you’ve set useplatformclock, but I’m not planning to manipulate the main counter. And before utilizing one of the timers, I can check whether it’s already active to avoid trouble (see the sketch below).
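
The check I have in mind, as an untested sketch (Tn_INT_ENB_CNF bit per the HPET spec; a timer whose interrupt enable bit is clear is presumably unused by Windows):

```
#include <wdm.h>

#define HPET_TIMER_CONF(n)   (0x100 + 0x20 * (n))
#define HPET_TN_INT_ENB_CNF  (1ULL << 2)

BOOLEAN HpetTimerLooksFree(volatile UCHAR *HpetBase, ULONG TimerIndex)
{
    volatile ULONGLONG *conf =
        (volatile ULONGLONG *)(HpetBase + HPET_TIMER_CONF(TimerIndex));
    return (*conf & HPET_TN_INT_ENB_CNF) == 0;   /* nobody enabled its interrupt */
}
```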

Best regards,
Johannes

So let’s assume you are able to connect an HPET timer to your ISR; then what? You can’t send a network packet with WSK from the ISR; you will need to queue a DPC and send from there, and the numbers Peter just gave don’t suggest there will be a high probability, over a longer time period, of the DPC always running within a short, limited latency. DPCs can run for “a while”: for example, you get a blob of network receive packets, which will be processed in a DPC, with some constraint on the number of packets processed. There were features added to limit the amount of processing on received packets, as it was degrading multimedia playback latency.

So, unless you are prepared to take over the NIC hardware, so you can put packets on its send ring from your ISR, I’m not sure HPET interrupts will help you accomplish your low-jitter UDP packets. For an embedded use, you could dedicate a specific NIC to your use, and send from an ISR. Some NICs have their own timers, so you might pick a NIC with a timer, attach to the NIC’s interrupt, and send your UDP packet from your code written to control that NIC. I believe source code for a Realtek NIC is available in open source repositories.

It seems like the core of your problem is that you need to run code in a DPC (DISPATCH_LEVEL) with limited timing latency, which will not be possible. Other DPCs will not necessarily live by your latency constraints, and as DPCs are not preemptible, you have to wait for the completion of any running DPC, even if you queue your DPC at the head of the DPC queue. I suppose a potentially useful OS feature would be a way for a driver to declare it owns DPC/ISR processing for a specific core, such that that driver never has to share the DPC queue. Anytime you start getting into processor affinity control, you start getting complexity from how power should be managed. Say there was a way to own a core: does that core always run at maximum frequency, which is clearly bad for power conservation?
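
To be concrete about the knobs you do have today: you can ask for head-of-queue insertion and target a specific core, but neither preempts a DPC that is already running. A sketch using only documented calls (MyDpcRoutine is a placeholder):

```
#include <wdm.h>

static KDPC g_Dpc;

static KDEFERRED_ROUTINE MyDpcRoutine;

static VOID MyDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Context);
    UNREFERENCED_PARAMETER(Arg1);
    UNREFERENCED_PARAMETER(Arg2);
    /* the send-from-DPC work would go here */
}

VOID SetupLowLatencyDpc(VOID)
{
    KeInitializeDpc(&g_Dpc, MyDpcRoutine, NULL);

    /* Insert at the head of the DPC queue rather than the tail... */
    KeSetImportanceDpc(&g_Dpc, HighImportance);

    /* ...and always deliver on core 1, away from core 0's clock/DPC traffic. */
    KeSetTargetProcessorDpc(&g_Dpc, 1);
}

/* From the ISR: KeInsertQueueDpc(&g_Dpc, NULL, NULL); */
```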

Out of curiosity, can you send network packets from an ISR on Linux?

For a specific system, with specific hardware, you might successfully adjust the system to not have DPC delays longer than you require, but on an arbitrary system, the DPC latency will be whatever it is. You might be better off using a $35 embedded controller that can dedicate all its resources just to sending out your UDP packets on schedule.

Jan


Weeeelll… I think Mr. Bottorff has it about right. You can solve this problem for a specific system configuration, with specific hardware. I’ve done it. But you’re unlikely to be able to ensure you have highly deterministic packet transmit, with low jitter, on a general purpose system running Windows.

Sending your packets from a high priority thread running at IRQL PASSIVE_LEVEL doesn’t seem like a great plan to me. I’m also not sure that using WSK is such a great idea. But if you’re really sold on WSK, I’m not sure why you wouldn’t be calling WskSend at IRQL DISPATCH_LEVEL, driven by a high resolution timer you’ve set with WdfTimerCreate or ExSetTimer and their associates.

Sending packets in response to a high resolution kernel timer callback, which runs at IRQL DISPATCH_LEVEL, should eliminate a great deal of your latency concerns. Now, granted… you DO have to worry about DPCs that might be queued in front of yours. But I, personally, would want to measure this to see how well it performs.
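
To make the shape of that concrete: a sketch only, with socket setup, MDL construction, and IRP lifetime elided (g_Socket is assumed to be an already-bound WSK datagram socket, and g_Irp an IRP you allocated with IoAllocateIrp and reuse each tick):

```
#include <ntddk.h>
#include <wsk.h>

extern PWSK_SOCKET g_Socket;       /* bound WSK_FLAG_DATAGRAM_SOCKET */
extern WSK_BUF     g_WskBuf;       /* MDL-described UDP payload */
extern SOCKADDR_IN g_RemoteAddr;   /* destination */
extern PIRP        g_Irp;          /* IoAllocateIrp(1, FALSE), reused */

static IO_COMPLETION_ROUTINE SendComplete;

static NTSTATUS SendComplete(PDEVICE_OBJECT Dev, PIRP Irp, PVOID Ctx)
{
    UNREFERENCED_PARAMETER(Dev);
    UNREFERENCED_PARAMETER(Irp);
    UNREFERENCED_PARAMETER(Ctx);
    return STATUS_MORE_PROCESSING_REQUIRED;   /* we own the IRP and reuse it */
}

/* EXT_CALLBACK: fires at DISPATCH_LEVEL from the high resolution timer. */
static VOID SendTick(PEX_TIMER Timer, PVOID Context)
{
    UNREFERENCED_PARAMETER(Timer);
    UNREFERENCED_PARAMETER(Context);

    const WSK_PROVIDER_DATAGRAM_DISPATCH *dispatch =
        (const WSK_PROVIDER_DATAGRAM_DISPATCH *)g_Socket->Dispatch;

    IoReuseIrp(g_Irp, STATUS_SUCCESS);
    IoSetCompletionRoutine(g_Irp, SendComplete, NULL, TRUE, TRUE, TRUE);

    /* WskSendTo is callable at IRQL <= DISPATCH_LEVEL. */
    dispatch->WskSendTo(g_Socket, &g_WskBuf, 0,
                        (PSOCKADDR)&g_RemoteAddr, 0, NULL, g_Irp);
}
```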

Do you have ANY tolerance for lost packets? IOW, is this an isochronous type of connection you’re driving, where you can occasionally stand to lose a packet in exchange for timely delivery of previous and following packets? I would think THAT should work.

Honestly, based on my experience, I’d be more concerned with the delay between WskSend and the receiving system getting the packet, than the time to originate the message.

I think you’re down a major rat hole with the whole HPET thing. You’d be foolish to ship a solution that includes “Think I’ll grab and reprogram an HPET and paaaaarty!” While it’s semi-sensible to hack around certain things, there are limits… and that’s outside of the limits of anything that’s supportable outside of a homework assignment.

Peter
OSR
@OSRDrivers

Hi,

OK, I think I’ve understood the limits of sending packets at DISPATCH_LEVEL in terms of timing, and also that using the HPET could probably bring only a small improvement compared to the new high resolution timer. Nevertheless I’d like to test it, even if it’s just for homework, for learning, and for understanding the HPET’s FSB interrupts and their relation to Windows message-based interrupts… if there is any relation. For sure I’m a beginner in kernel development and interrupts, but I’d be interested in learning a little bit more.

Are there any sources on the web which could help me? Googling for “ioconnectinterruptex fsb” doesn’t show many hits.

Thanks and regards,
Johannes F.

You don’t get any hits from Google because you’re doing something that’s not only not documented, it’s not architecturally appropriate.

Windows, as a system, manages device resources. It is responsible for arbitrating, assigning, and managing (in a general way) all the hardware resources in the system.

The FSB is an approximate architectural *location* within a computer design (not unlike “bedroom” within a house design). There are individual devices that may be attached to the FSB. To control one of these devices, you would need to write a driver that claims control of it by specifying the device’s VID/PID in an INF file.

If you would like to return to the practical matter of how to build a product, I’ll be happy to help you. But over the years on this forum I’ve developed a firm “I don’t help with homework” policy (both literal homework – we get a number of such requests – and figurative as in your case).

Peter
OSR
@OSRDrivers

> If you really want to learn about network latency issues, I suggest that you spend more time looking into network devices and protocols rather than HPETs

I’m speaking about uncompressed media streaming in a local gigabit or 10-gig network via UDP. As far as I know, there will very rarely be jitter > 100 usec added by “the network”, even if you have a cascade of up to four switches.

So my impression is that the (Windows) PC is the device which adds by far the biggest jitter… compared to switches, embedded devices sending packets from an FPGA, or even Linux PCs using a realtime extension. And I’m still wondering about the never-ending tip to use a different OS… in the end, Windows is the platform used by most of our customers, and they actually do media processing on their Windows PCs. What I’m trying to do is find the best way to let these PCs be a first-class participant in a local professional streaming network of embedded media devices etc., without the need for specialized hardware.

Best regards,
Johannes F.

That’s PROBABLY correct, assuming low network load and given that you added:

> … in a local gigabit or 10-gig network … even if you have a cascade of up to four switches.

> So my impression is that the (Windows) PC is the device which adds by far the biggest jitter…

I’ll definitely agree with you.

Well, as engineers, we like to use the best tool for the job. You *can* use a pair of pliers to remove a bolt, but a properly sized wrench (spanner) will get the job done faster, easier, and with no chance of damaging the bolt.

In this case, you’re taking a general purpose computer system that uses a fully preemptible kernel, with a sophisticated on-access paging virtual memory system that supports paging of OS components (that’s likely to incur page faults at THE most inopportune times relative to your bitstream), and attempting to make it do a job for which it’s not really designed.

I’m not saying you can’t do a reasonable job of it… You can do a PRETTY good job with enough effort, for MOST environments. But the overall system environment, from the netcard to the graphics card to all sorts of other devices and their specific driver versions, will be the limiting factor in how successful you can be and how consistently you can be that successful.

I remember “fixing” a problem years ago where the FIFO on the device could buffer no more than 1 ms of data (this was on, like, a Pentium Pro or something)… it often takes a lot of clever work. An IRQL PASSIVE_LEVEL thread running at high scheduling priority is definitely not going to do it. And trying to fix the problem with a driver for the HPET is a bit like trying to remove that bolt in my previous example with a linear shaped charge. It is certainly elegant, and it is amusing, but it’s wwwwaaaaaayyy more complex than you need for the job. And dangerous, too.

Peter
OSR
@OSRDrivers