Re: [ntdev] Re: [ntdev] IRQL Priority Problems

Someone wrote C++ for an 8086? That language didn’t even exist until that processor was about a decade old.

That aside, the application must be quite simplistic and probably not that delicate. It must be single-threaded, and the code size restrictions of that era also preclude the possibility that it is millions of lines long. Undoubtedly it relies on very specific binary layouts and probably has timing assumptions that are invalid in any preemptive multitasking OS. The good news is that even modern PCI-based hardware can probably serve requests two orders of magnitude faster than the authors expected, and the inter-instruction latency they designed for might be close to the preemption latency on proper hardware. Remember when 16 ns RAM was so fast we could hardly believe it was possible! And remember those 6 MHz memory cores! Fortunately, paper tape was never something I had to worry about.

Sent from Surface Pro

From: Brad Aswegan
Sent: Monday, April 13, 2015 8:13 PM
To: Windows System Software Devs Interest List

Abuse doesn’t even describe what we want to do to this thing right about now.

The user mode application that actually uses the driver is a delicate little snowflake as well. The core C++ code was written (not kidding you) on an 8086. Everyone is terrified to touch it or start from scratch.

Also, correction, I meant 100+ microseconds.

From: xxxxx@lists.osr.com [xxxxx@lists.osr.com] on behalf of Marion Bond [xxxxx@hotmail.com]
Sent: Monday, April 13, 2015 7:02 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: [ntdev] IRQL Priority Problems

Well if this is avionics, then you can probably afford to abuse the host system - regardless of the OS you use!

Sent from Surface Pro

From: Brad Aswegan
Sent: Monday, April 13, 2015 6:16 PM
To: Windows System Software Devs Interest List

If I had my choice, I would have developed this on Linux, like the majority of our products that use this communication standard. It’s not my choice: we’re replacing a legacy product and it’s Windows-based.

The good news, the environment of devices we need to communicate with is pretty much static and we know every single one of them. This protocol is old and it’s not being used anymore, at least not on newer devices. They just use Ethernet. This is avionics though, so the existing devices will never be retired. This one device just can’t seem to handle it if there is any latency between words… pretty silly. But yes, this problem originates with the host and our driver. Our problem being that the previous version of this product works with the device, so we’ll get a lot of blowback over this (despite the new version working with 99% of the devices in use).

And yes, this is PCI… you would be terrified but we have to go through a PCIe-to-PCI bridge and then a PCI-to-local bus bridge. That’s how old this stuff is, designed to work in a different age of computing.

Thanks much for the advice,

Brad

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Phil Barila
Sent: Monday, April 13, 2015 4:48 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IRQL Priority Problems

As noted in your response to SNoone, you have made it appear to work most of the time. And if you can identify the device that is pushing you out of your timing spec, you can probably mitigate that by disabling that device, or taking some other approach.

I do hope you can resolve the issue with your one “device”. I suspect it’s actually the host system. If you are selling this as a general purpose thing, without controlling the host environment, it’s going to occur again, no matter how hard you try to sort out the issue with the single thing you’re currently chasing.

I hate to say it, but as long as your architecture doesn’t fit your environment, you’re always going to be looking over your shoulder at this kind of issue.

Since you mention interrupts, I assume that means a PCI device? If so, you might be able to make it work just by moving the PCI devices around to get a different interrupt, or different enumeration order, so your ISR gets invoked before that of the problem device.

Phil

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Brad Aswegan
Sent: Monday, April 13, 2015 3:29 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IRQL Priority Problems

I will have to look into the capabilities of the chipset we’re using. Off the top of my head, I’m not sure it does this kind of interrupting. Originally, we took a similar approach but ran into a race condition if you ran things fast enough. I’ve also considered just pushing data into the buffer as quickly as possible using a non-interrupt driven approach. Oh, the buffer isn’t full? Well, here’s more data. After we figured out what was causing us so many problems before (hyperthreading), it did not look like such measures would be necessary.

I really, really don’t want to make such a major change to the driver at this point. Testing passes on the majority of target devices. A change such as this would force us to retest on every target. This could delay the release of the product by months and we’re under the gun right now. This issue with one silly device is holding up everything.

Brad


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer


I’m sure you have had fun - and I’m also sure that you know that you will ultimately fail at making the magic timing happen every single time. You can succeed at making it work nearly every time.

If it is for a live avionics system I would be very concerned with your approach, but if you just have to deal with old hardware and have a reasonable failure path (something other than a buffer underrun/timing error leads to unrecoverable engine failure at 10,000 ft) you will be okay.

As I was reminding one of my colleagues today, testing cannot prove the absence of bugs, but only their presence. The fact that you run a test 1,000,000 times in a row and don’t see a failure does not mean that the code is correct. Hence the reason we generally juxtapose standard debugging and testing with static analysis as a code review technique - they are independent and good at finding errors that the other will miss. But I digress.
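
To put a rough number on the million-run remark: a clean run only bounds the failure rate, it does not drive it to zero. The sketch below assumes independent runs with a fixed per-run failure probability, which is a strong assumption for timing bugs (they tend to depend on system load), and the function name is made up for illustration.

```python
def failure_rate_upper_bound(clean_trials, confidence=0.95):
    """Largest per-run failure probability that is still consistent, at the
    given confidence, with zero failures in `clean_trials` independent runs."""
    # Zero failures in n runs has probability (1 - p) ** n.
    # Solve (1 - p) ** n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


bound = failure_rate_upper_bound(1_000_000)
print(f"95% upper bound on failure rate: {bound:.2e} per run")
```

Under those assumptions, a million clean runs still leaves room for a failure rate of roughly three in a million per run.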

If you really must make it work every time and the consequence is life- or safety-critical, then you cannot use Windows or *nix-based operating systems. Then you would want to rely on the physical properties of the hardware, and there are various designs possible.

If you just need to make it work well enough, and you control the hardware, then hijack a core and abuse it.

Sent from Surface Pro

From: xxxxx@teledyne.com
Sent: Monday, April 13, 2015 9:10 PM
To: Windows System Software Devs Interest List

Truthfully, it might be C. It’s wrapped up into a C++ application now. You are absolutely correct about the timing problems with this code. You can’t even insert a debug statement into it without causing failure. It’s written to be deterministic: single-threaded, synchronous, and built around specific timings. It is almost antithetical to modern operating systems.

I’m not responsible for the user mode application beyond getting all these timing messes to work flawlessly somehow. This has been a fun adventure in debugging the thing that doesn’t want to be debugged.


Hijacking your only core is equivalent to simply doing the work and stalling in your ISR. In that case, Peter’s hybrid approach is probably the best choice.

Sent from Surface Pro

From: Brad Aswegan
Sent: Tuesday, April 14, 2015 6:28 PM
To: Windows System Software Devs Interest List

I’m actually on a conference call discussing this now. We really appreciate all the suggestions. We did look into other possible interrupting options, but this chip ONLY supports a FIFO_EMPTY interrupt. We have approximately 360 microseconds after the empty interrupt to put more data into the buffer before that final word is done being transmitted.

We only have a single core processor as we disabled hyperthreading. Hyperthreading was causing lots and lots of problems with timings and our very deterministic application. Is it possible to “hijack” the only core we have? When we still had hyperthreading enabled, I tried reassigning our driver to the second “core” and it appears the framework still executes all ISRs on the primary “core”.

Cheers,

Brad

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Marion Bond
Sent: Tuesday, April 14, 2015 5:20 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: [ntdev] IRQL Priority Problems

The dedicated thread approach, coupled with high IRQL, is called hijacking a core and is valid if you control everything about your environment.

Sent from Surface Pro

From: xxxxx@hotmail.com
Sent: Tuesday, April 14, 2015 12:59 PM
To: Windows System Software Devs Interest List

If I’m understanding your issue correctly, the undesired behavior is that there is too much latency between subsequent transmit WORDs due to an underrun condition.

The sequence of events is:
(1) Receive a buffer empty interrupt.
(2) Run your DPC to put more data in the buffer.
The I/O with the hardware in the DPC has very slow transactions. The transmitter is able to transmit the entire contents of the buffer before you can provide more data to transmit. This introduces latency between subsequent WORDs.

It may seem that responding to the interrupt more quickly would be a suitable solution, but really the underlying issue is that you need to keep the buffer from emptying. One approach would be to look for an alternate interrupt condition that would notify you when the buffer is ‘almost’ empty. This would give you time to add more data to the buffer before the transmitter drains it. What you really need to figure out is just how long your I/O transactions with the device are taking. Then you need to instrument how quickly your transmitter can drain the buffer. If you know these two things you can then figure out just how low you can allow the buffer to get before you are approaching your underrun condition. If you don’t know this information and need a shotgun approach, your best option is likely to abandon the interrupt-driven model and fill the buffer in a separate thread. You could give the thread an arbitrary wait interval; its sole responsibility would be to keep feeding that buffer.
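
As a rough illustration of that arithmetic, here is a minimal Python sketch. The 360 microsecond word time is borrowed from the figure quoted elsewhere in this thread and treated as the time to transmit one word, which is an assumption. The latency and write-cost numbers, and the function name, are hypothetical placeholders to be replaced with measured values.

```python
import math


def min_refill_threshold(word_time_us, service_latency_us, write_time_us):
    """Fewest words that must still be queued when the refill is requested
    so that the first new word lands before the buffer drains.

    word_time_us       -- time the transmitter needs to send one word
    service_latency_us -- worst-case delay from the request to the first
                          line of the refill routine
    write_time_us      -- time one register write to the device takes
    """
    # The first new word arrives this long after the refill was requested.
    time_to_first_word = service_latency_us + write_time_us
    # Words drained while we wait; round up, a partial word still counts.
    return math.ceil(time_to_first_word / word_time_us)


WORD_TIME_US = 360.0  # assumed: one word time, from the figure in this thread
LATENCY_US = 500.0    # hypothetical worst-case interrupt-to-DPC latency
WRITE_US = 5.0        # hypothetical cost of one write through the bridges

threshold = min_refill_threshold(WORD_TIME_US, LATENCY_US, WRITE_US)
print(f"refill must start with at least {threshold} word(s) still queued")
```

If the hardware can only signal an empty buffer, the threshold is fixed by the device rather than chosen by the driver, so the only remaining knobs are the latency and the write cost.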


I agree with Peter - if the protocol is that slow, shove a few words in via the ISR to prevent underrun and then do the rest in a DPC. On a single-core machine, with a timing-sensitive application, mucking with the system timer resolution could be just as bad as the problems you had with hyperthreading.
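
A toy timing model of why the hybrid helps, in Python rather than driver code. It is only a sketch: the 360 microsecond word time comes from the figure quoted in this thread, while every latency and write cost below is a hypothetical number, and the model ignores everything else the system might be doing.

```python
def underruns(isr_latency_us, dpc_latency_us, write_us, word_us, isr_words):
    """Return True if the transmitter runs dry after a FIFO-empty interrupt.

    Time zero is the interrupt. The word already in the shifter finishes
    at word_us; each word written before the line goes idle extends that
    point by one more word time.
    """
    now = isr_latency_us   # ISR starts executing
    runs_dry_at = word_us  # final queued word is done here
    for _ in range(isr_words):
        now += write_us    # one slow register write from the ISR
        if now > runs_dry_at:
            return True
        runs_dry_at += word_us
    # The DPC starts some time after the ISR returns and writes its first word.
    now += dpc_latency_us + write_us
    return now > runs_dry_at


WORD_US = 360.0   # assumed word time, from the figure in this thread
ISR_LAT = 20.0    # hypothetical interrupt latency
DPC_LAT = 400.0   # hypothetical worst-case ISR-to-DPC latency
WRITE_US = 5.0    # hypothetical cost of one device write

print("DPC only underruns:", underruns(ISR_LAT, DPC_LAT, WRITE_US, WORD_US, 0))
print("ISR writes 2 words:", underruns(ISR_LAT, DPC_LAT, WRITE_US, WORD_US, 2))
```

The price of the hybrid is that the ISR spends `isr_words * write_us` at device IRQL, so the number of words written there is best kept as small as the measured DPC latency allows.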

Sent from Surface Pro

From: xxxxx@osr.com
Sent: Tuesday, April 14, 2015 10:50 PM
To: Windows System Software Devs Interest List

Sorry… But I *really* don’t like your chosen solution.

What version of the OS are you using?

How are you going to use the timer? Wait on it, or have it queue a DPC? Either way, are you not screwed?

Don’t fix it until you’ve profiled it, and understand PRECISELY what the problem is.

Peter
OSR
@OSRDrivers


Well, the how isn’t that hard.

He changed the system timer resolution to 1 ms and set a timer to run every 2/5 ms; in the callback he stuffs as many words as will fit into his hardware.

On account of the slow hardware he is working with, he manages with this design.
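
A back-of-the-envelope check on when a polling design like this can hold up, as a Python sketch. The FIFO depth and the jitter figure are hypothetical, since the thread gives neither; only the 360 microsecond word time is taken from the discussion, and it is assumed here to be the time to transmit one word.

```python
def max_safe_period_us(fifo_depth_words, word_time_us, worst_jitter_us):
    """Longest timer period that still refills the FIFO before it drains,
    assuming every callback tops the FIFO up completely."""
    # A full FIFO lasts depth * word time; the callback may run late by
    # up to worst_jitter_us, so that margin comes off the period.
    return fifo_depth_words * word_time_us - worst_jitter_us


FIFO_DEPTH = 16     # hypothetical FIFO depth in words
WORD_US = 360.0     # assumed word time, from the figure in this thread
JITTER_US = 1000.0  # hypothetical worst-case lateness of the callback

limit = max_safe_period_us(FIFO_DEPTH, WORD_US, JITTER_US)
for period_us in (1000.0, 3000.0, 6000.0):  # hypothetical candidate periods
    verdict = "ok" if period_us <= limit else "underrun risk"
    print(f"period {period_us:.0f} us: {verdict} (limit {limit:.0f} us)")
```

The slower the link and the deeper the FIFO, the more slack the timer has, which fits the observation that the slow hardware is what makes the design workable.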

Sent from Surface Pro

From: xxxxx@osr.com
Sent: Thursday, April 16, 2015 10:49 AM
To: Windows System Software Devs Interest List

I am very happy this worked for you. I am also seriously struggling to understand how it could! This either points to a lack of knowledge on my part in terms of your original problem or Windows internals.

But, again, I’m very happy it’s worked out!

Peter
OSR
@OSRDrivers

