Re: [ntdev] IRQL Priority Problems

Well if this is avionics, then you can probably afford to abuse the host system - regardless of the OS you use!

Sent from Surface Pro

From: Brad Aswegan
Sent: Monday, April 13, 2015 6:16 PM
To: Windows System Software Devs Interest List

If I had my choice, I would have developed this on Linux, like the majority of our products that use this communication standard. It’s not my choice: we’re replacing a legacy product and it’s Windows-based.

The good news is that the environment of devices we need to communicate with is pretty much static, and we know every single one of them. This protocol is old and it’s not being used anymore, at least not on newer devices. They just use Ethernet. This is avionics though, so the existing devices will never be retired. This one device just can’t seem to handle it if there is any latency between words… pretty silly. But yes, this problem originates with the host and our driver. Our problem is that the previous version of this product works with the device, so we’ll get a lot of blowback over this (despite the new version working with 99% of the devices in use).

And yes, this is PCI… you would be terrified but we have to go through a PCIe-to-PCI bridge and then a PCI-to-local bus bridge. That’s how old this stuff is, designed to work in a different age of computing.

Thanks much for the advice,

Brad

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Phil Barila
Sent: Monday, April 13, 2015 4:48 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IRQL Priority Problems

As noted in your response to SNoone, you have made it appear to work most of the time. And if you can identify the device that is pushing you out of your timing spec, you can probably mitigate that by disabling that device, or taking some other approach.

I do hope you can resolve the issue with your one “device”. I suspect it’s actually the host system. If you are selling this as a general purpose thing, without controlling the host environment, it’s going to occur again, no matter how hard you try to sort out the issue with the single thing you’re currently chasing.

I hate to say it, but as long as your architecture doesn’t fit your environment, you’re always going to be looking over your shoulder at this kind of issue.

Since you mention interrupts, I assume that means a PCI device? If so, you might be able to make it work just by moving the PCI devices around to get a different interrupt, or different enumeration order, so your ISR gets invoked before that of the problem device.

Phil

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Brad Aswegan
Sent: Monday, April 13, 2015 3:29 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IRQL Priority Problems

I will have to look into the capabilities of the chipset we’re using. Off the top of my head, I’m not sure it does this kind of interrupting. Originally, we took a similar approach but ran into a race condition if you ran things fast enough. I’ve also considered just pushing data into the buffer as quickly as possible using a non-interrupt driven approach. Oh, the buffer isn’t full? Well, here’s more data. After we figured out what was causing us so many problems before (hyperthreading), it did not look like such measures would be necessary.

I really, really don’t want to make such a major change to the driver at this point. Testing passes on the majority of target devices. A change such as this would force us to retest on every target. This could delay the release of the product by months and we’re under the gun right now. This issue with one silly device is holding up everything.

Brad


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer



The dedicated thread approach, coupled with high IRQL, is called hijacking a core, and it is valid if you control everything about your environment.
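As a rough illustration of the feeding loop such a dedicated thread would run, here is a user-mode simulation in C. Everything in it is hypothetical: sim_fifo, FIFO_CAPACITY and feed_fifo are made-up names, and the array stands in for the device's transmit buffer. In a real driver the buffer would be reached through device registers, and the loop would presumably run in a system thread pinned to one processor; none of that is shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer depth; the real device's depth is not given in the thread. */
#define FIFO_CAPACITY 8u

/* Stand-in for the hardware transmit buffer (assumption for illustration only). */
struct sim_fifo {
    uint16_t words[FIFO_CAPACITY];
    size_t level;              /* number of words currently queued */
};

static int fifo_full(const struct sim_fifo *f)
{
    return f->level >= FIFO_CAPACITY;
}

/*
 * Push words from src into the buffer until it is full or the source
 * is exhausted. Returns the number of words actually written, so the
 * caller can resume from src + written on the next pass.
 */
size_t feed_fifo(struct sim_fifo *f, const uint16_t *src, size_t count)
{
    size_t written = 0;

    while (written < count && !fifo_full(f)) {
        f->words[f->level++] = src[written++];
    }
    return written;
}
```

The feeder thread would call feed_fifo repeatedly, advancing its source pointer by the return value each time, so the buffer is topped up whenever there is room rather than only after it has drained.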

Sent from Surface Pro

From: xxxxx@hotmail.com
Sent: Tuesday, April 14, 2015 12:59 PM
To: Windows System Software Devs Interest List

If I’m understanding your issue correctly, the undesired behavior is that there is too much latency between subsequent transmit WORDs due to an underrun condition.

The sequence of events is:
(1) receive a buffer empty interrupt
(2) Run your DPC to put more data in the buffer
The I/O with hardware in the DPC has very slow transactions.
The transmitter is able to transmit the entire contents of the buffer before you can provide more
data to transmit. This introduces latency between subsequent WORDs.

It may seem that responding to the interrupt more quickly would be a suitable solution, but really the underlying issue is that you need to keep the buffer from emptying. One approach would be to look for an alternate interrupt condition that would notify you when the buffer is ‘almost’ empty. This would give you time to add more data to the buffer before the transmitter drains it. What you really need to figure out is just how long your I/O transactions with the device are taking. Then you need to instrument how quickly your transmitter can drain the buffer. If you know these two things, you can then figure out just how low you can allow the buffer to get before you are approaching your underrun condition. If you don’t know this information and need a shotgun approach, your best option is likely to abandon the interrupt-driven model and fill the buffer in a separate thread. You could give the thread an arbitrary wait interval; its sole responsibility is to keep feeding that buffer.
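To make the "how low can the buffer get" arithmetic concrete, here is a small sketch in C. The function name low_watermark_words and its parameters are hypothetical, and the numbers in the note after it are invented for illustration; the real values would have to come from measuring your own hardware.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the lowest fill level (in words) at which a
 * refill must begin so that the transmitter does not run dry.
 *
 * refill_latency_ns: measured worst-case time from the notification
 *                    until the first new word lands in the buffer
 * word_time_ns:      measured time the transmitter takes to send one word
 * margin_words:      extra safety margin chosen by the developer
 *
 * Returns 0 if word_time_ns is 0, since no threshold can be derived.
 */
uint32_t low_watermark_words(uint64_t refill_latency_ns,
                             uint64_t word_time_ns,
                             uint32_t margin_words)
{
    if (word_time_ns == 0) {
        return 0;
    }

    /* Words drained while the refill is still in flight, rounded up. */
    uint64_t drained = (refill_latency_ns + word_time_ns - 1) / word_time_ns;

    return (uint32_t)(drained + margin_words);
}
```

With an assumed refill latency of 50,000 ns, an assumed word time of 10,000 ns and a margin of 2 words, the function returns 7: the refill has to start while at least 7 words are still queued. If the hardware cannot signal at that level, that is an argument for the polling thread instead.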

