IoConnectInterrupt fails

I have developed a WDM driver for a PCI card. For some reason, IoConnectInterrupt fails on a specific slot: it returns STATUS_INVALID_PARAMETER. At first I just thought there was something wrong with the motherboard. You see, if I move the PCI card to another PCI slot, the card works just fine and IoConnectInterrupt returns STATUS_SUCCESS.

Now I just found out that one of our customers is having the same problem. However, this customer only has one PCI slot, which means the customer cannot move the PCI card to another slot.

Here’s the weird part: we have another PCI card of a different type that works just fine in that slot. I compared the parameters passed to IoConnectInterrupt for the card that does not work and the one that does, and they are the same. There are two different drivers, but the Start Device initialization of the two drivers is basically the same.

I am using the translated info rather than the raw info from the PCM_PARTIAL_RESOURCE_LIST during Start Device processing.

I have searched the internet and found suggestions about what might be causing the problem, but nothing so far has fixed it.

Engineering says there is no difference between the cards. We think it might be an OS issue, but we are not really sure. Maybe it is the motherboard, but again there is no proof.

All the parameters to IoConnectInterrupt are the same for each card in that specific slot. It is just that one card works and the other one does not.

If I move the card to another slot, both cards work just fine.

Any ideas or suggestions?

Are you using all of the values from the resource descriptor, or are you modifying any of them (DIRQL, affinity, etc.)?

d


From: xxxxx@lists.osr.com [xxxxx@lists.osr.com] on behalf of xxxxx@getntds.com [xxxxx@getntds.com]
Sent: Friday, September 25, 2009 12:26 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] IoConnectInterrupt fails

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

You could use a PCI bus analyzer on the “bad” slot and see whether your card is properly assigned its PCI resources, including the interrupt.

Igor Sharovar

Doron Holan:

I do not modify any of the resources.

John
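For readers following along, passing the translated values through unmodified could look roughly like the kernel-mode sketch below. It is only an illustration under assumptions, not the code from either driver in this thread: the function name ConnectFromTranslated and its parameters are hypothetical, and it assumes a single interrupt resource in the first full descriptor of the translated list received with IRP_MN_START_DEVICE.

```c
#include <wdm.h>

/* Hypothetical helper: connect the first interrupt found in the translated
   resource list, using the descriptor values exactly as the PnP manager
   supplied them (no changes to vector, IRQL, mode, sharing or affinity). */
NTSTATUS ConnectFromTranslated(
    PCM_RESOURCE_LIST Translated,   /* Parameters.StartDevice.AllocatedResourcesTranslated */
    PKSERVICE_ROUTINE Isr,
    PVOID IsrContext,
    PKINTERRUPT *InterruptObject)
{
    PCM_PARTIAL_RESOURCE_LIST list;
    ULONG i;

    if (Translated == NULL || Translated->Count == 0) {
        return STATUS_DEVICE_CONFIGURATION_ERROR;
    }

    list = &Translated->List[0].PartialResourceList;

    for (i = 0; i < list->Count; i++) {
        PCM_PARTIAL_RESOURCE_DESCRIPTOR d = &list->PartialDescriptors[i];

        if (d->Type != CmResourceTypeInterrupt) {
            continue;
        }

        return IoConnectInterrupt(
            InterruptObject,
            Isr,
            IsrContext,
            NULL,                                   /* no caller-supplied spin lock */
            d->u.Interrupt.Vector,                  /* translated vector */
            (KIRQL)d->u.Interrupt.Level,            /* Irql */
            (KIRQL)d->u.Interrupt.Level,            /* SynchronizeIrql */
            (d->Flags & CM_RESOURCE_INTERRUPT_LATCHED) ? Latched : LevelSensitive,
            (BOOLEAN)(d->ShareDisposition == CmResourceShareShared),
            d->u.Interrupt.Affinity,                /* ProcessorEnableMask */
            FALSE);                                 /* FloatingSave */
    }

    /* No interrupt descriptor was assigned to the device. */
    return STATUS_DEVICE_CONFIGURATION_ERROR;
}
```

Comparing the arguments built this way between the working and failing cards, for example with a debugger breakpoint just before the call, is one way to confirm that nothing differs on the driver side.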

Igor Sharovar:

How would I know whether the assigned PCI resources, including the interrupt, are OK? Remember, the other card that works in that bad slot gets the same resources as the card that doesn’t work in that slot.
The hardware engineer wants to call Microsoft. I don’t think it’s a good idea. They will probably say there’s something wrong with the motherboard or my driver. I really don’t think it’s a driver issue. The code that gets the resources and calls IoConnectInterrupt is basically the same in both drivers.

Weird!
John

In my experience, a PCI bus analyzer has helped me a couple of times to identify a problem. You can see whether there are any errors in PCI transactions. In particular, check the transactions during boot, when the BIOS assigns PCI resources to your card; it is the BIOS that assigns these resources, not Windows. I am not saying that a PCI bus analyzer will solve your problem, but at least you would see whether your PCI resources are assigned properly. It would also be a good exercise to compare the “good” cards with the card that does not behave as you expect. A PCI bus analyzer is an expensive tool, but it is possible to rent one for a while. I strongly believe that any company that develops PCI-based devices must have such a tool.

Igor Sharovar

Igor wrote:

_It’s actually not uncommon that HW doesn’t act in an expected fashion for SW – indeed, it’s amazing that a PC even boots at *all* sometimes! … a few things to verify with your HW fellow before you pop the MS button …

  • Go get a copy of the PCI bus spec book and make *sure* that every bit in the PCI config region has been mapped to an address *and* that there is a valid value in that region, even if the spec says the area is “optional”. Engineers, especially with smaller FPGA’s, tend to put memory “holes” where they don’t think they need them (such as in “optional” areas) to save gate counts …
  • Make *sure* that *every* address line has been properly traced to the bus *and* is controlled, not just left to “float”. Again, HW engineers will try to save pin counts by tying all of a range of address lines together or will leave some of them unconnected/ tied to some common ground or trace. This is especially important for the sort of Heisenbug that you’re seeing: imagine that one address line is a “floater”, which means it can have an indeterminate state on power up. Suppose that on one particular PCI slot the bridge vendor put a pulldown/ pullup on that line (which means it will force the line one way or another) and on another slot (for some reason) that pullup/ pulldown isn’t there. That means when you pop the card into the “first” slot the address line is pulled to a value, the OS makes the read and all is good. When the card goes in the “second” slot, though, it’s a coin toss what value the address line has and the OS may or may not get a good read … ouch! [I make the HW people knock out pushups when I find floating address lines, it’s that important]
  • Before you pull the analyzer, see if it’s a physical configuration problem, i.e. heat sensitive or RFI sensitive: does the problem happen on powerup (cold start) or after the device has been running for a while? Are there any components near the “bad” slot that could be generating EMI that aren’t next to the “good” slot? Again, PCI is very finicky about EMI, and if an engineer isn’t careful enough about shielding the traces, the card can again be getting bad reads.

Finally, as suggested earlier, the analyzer is the next step: make a capture of a “good” cycle and one of a “bad” cycle and see what is different …

Good luck!_

Craig Howard:

Thanks for the info. I will talk with the Engineer this morning about your suggestion. However, I do believe the address lines are not floating.

I do have a PCI Bus analyzer. I’ll check that out.

Thanks again.

Craig Howard:

What you suggest turns out to be a Xilinx thing; Xilinx is what the engineers use to produce the firmware. We do set our Vendor ID/Device ID and the range of addresses we use. We also indicate the type of PCI device we are, which we set to PCI Communication Device. The rest of the PCI configuration is handled by the Xilinx PCI core, which was fully qualified by Xilinx to meet the PCI specification.

This PCI device is actually a PMC device attached to a PMC-to-PCI adapter. I just found out that our PMC-to-PCI adapters, although they work, do not completely follow the rules of the PCI spec. The “good” card is also a PMC device, and I am using the same PMC-to-PCI adapter on both the “good” card and the “bad” one.

We do have a lot of other PMC and PCI adapters. I’m going to find out if any of the other cards behave the same way as the “bad” card.

Craig Howard:

I have put all of our PCI adapters into that slot and they all work, including the PMC/104+ adapters using a PCI carrier card.

Using a PCI bus analyzer, I did see something. The BIOS writes 0x100 to the command register in PCI config space. All the good cards return 0x106; the bad card returns 0x100. The engineers are now thinking that the bus controller is not getting loaded fast enough. We are investigating.
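For reference, the two values can be decoded against the standard bit layout of the PCI Command register (configuration space offset 0x04). The sketch below is a small user-mode C illustration only; the names pci_cmd_describe and pci_cmd_missing_bits are made up for this example, and it takes the reported “100” to be hexadecimal.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Selected bits of the 16-bit PCI Command register. */
struct pci_cmd_bit {
    uint16_t mask;
    const char *name;
};

static const struct pci_cmd_bit pci_cmd_bits[] = {
    { 0x0001u, "IoSpace" },
    { 0x0002u, "MemorySpace" },
    { 0x0004u, "BusMaster" },
    { 0x0040u, "ParityErrorResponse" },
    { 0x0100u, "SerrEnable" },
    { 0x0400u, "InterruptDisable" },
};

/* Writes the names of the bits set in value into out, separated by '|'. */
static void pci_cmd_describe(uint16_t value, char *out, size_t out_len)
{
    size_t used = 0;
    size_t i;

    if (out_len == 0) {
        return;
    }
    out[0] = '\0';

    for (i = 0; i < sizeof(pci_cmd_bits) / sizeof(pci_cmd_bits[0]); i++) {
        int n;

        if ((value & pci_cmd_bits[i].mask) == 0) {
            continue;
        }
        n = snprintf(out + used, out_len - used, "%s%s",
                     used ? "|" : "", pci_cmd_bits[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            break; /* output buffer too small; stop at what fits */
        }
        used += (size_t)n;
    }
}

/* Returns the bits that are set in good but clear in observed. */
static uint16_t pci_cmd_missing_bits(uint16_t good, uint16_t observed)
{
    return (uint16_t)(good & ~observed);
}
```

Under that layout, 0x106 has Memory Space, Bus Master and SERR# Enable set, while 0x100 has only SERR# Enable, so pci_cmd_missing_bits(0x0106, 0x0100) yields 0x0006: the Memory Space and Bus Master bits are the ones that differ on the bad card.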

No luck in trying to figure out why IoConnectInterrupt fails. We switched to a quad-core machine and are having the same problem. One slot causes IoConnectInterrupt to fail. Move the card down a slot and IoConnectInterrupt does not fail and the adapter gets installed.

We just can’t figure out what is wrong with this adapter. The other (“good”) adapters always work.

Does anybody have any ideas about what might be happening here? We are using the Phoenix BIOS. Could that be the problem?

John, … umm … you already *know* what is happening with the bad slot – you mentioned it earlier, you are sporadically getting an “A” when you should be getting a “B”. Right now everyone is doing the “duck and cover” to avoid answering the real question, “why are we getting an A when we were supposed to get a B” … more importantly, since timing issues are like roaches in the kitchen, “if we are getting an A when we are supposed to be getting a B then can we trust we’re getting a C later”? You’re fortunate that you’re getting this sort of error where it can be (reasonably) easily detected – imagine if this were in a DMA transfer of user data …

  • It has nothing to do with a quad, dual, octo or whatever core – this is purely a PCIe bus transaction, it doesn’t care what processor is driving it
  • It has nothing to do with the Phoenix BIOS – the PCIe microcode is in the NorthBridge, and moreover has been tested with trillions of PCIe operations across hundreds of thousands of computers

Look to the symptoms you’ve reported: a) it occurs sporadically and b) only happens in a specific slot. That says either EMI problems or timing in the PCIe (Xilinx) core. I would have your engineers take a really hard look at the timing they are using with the Xilinx core, specifically how closely they are cutting the rise and fall times of the engine inputs (which is where most sporadic timing errors manifest). Make test runs of 10,000 transactions for each slot and graph how many pass and how many fail (your logic analyzer can do this). Put the adapter on a big long PCIe extender cable and re-run the tests for each slot. Do as much as possible to characterize the problem …

Like I said, everyone is in “duck and cover” mode right now – have your hardware engineers prove that it’s “not their stuff” before they are allowed to point at BIOS’s or quad cores …

Good Luck!

Craig:
Thanks for your response.
I found the problem. It had nothing to do with hardware. We verified that the BIOS was doing its thing correctly. We installed Linux and ran the same driver (the Linux version) and it worked with no problems.

I started to suspect there was something wrong with Windows XP. I was using the older DDK version, 3790.1830 (WDM). I downloaded the latest version 7600.16385.0 (WDK) and recompiled the driver. That did not fix the interrupt problem. Then, I decided to use IoConnectInterruptEx instead of IoConnectInterrupt. I modified the code and recompiled the driver. IT WORKED!! What a surprise. Why the other drivers worked and this one didn’t, I do not care. I am glad I got this to work. Now I can send the updated driver to our customer to see if it fixes their problem.

Fixing this problem did give me more work to do: I now have a lot of deprecated functions I must convert to the new way of doing things. I will also modify the other drivers, which previously had no problems connecting interrupts, to use the new way.
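For anyone making the same conversion, a fully specified IoConnectInterruptEx call might look roughly like the sketch below. It is a hedged illustration, not the code from this driver: the function name ConnectInterruptEx and its parameters are hypothetical, and it assumes the caller has already located the translated interrupt descriptor. As I read the WDK documentation, drivers that must also run on pre-Vista systems such as XP include Iointex.h after Wdm.h and link against Iointex.lib.

```c
#include <wdm.h>
/* For pre-Vista targets, the WDK documentation says to also include
   <iointex.h> here and to link the driver with Iointex.lib. */

/* Hypothetical helper: connect one interrupt using CONNECT_FULLY_SPECIFIED,
   taking every value from the translated descriptor unchanged. */
NTSTATUS ConnectInterruptEx(
    PDEVICE_OBJECT Pdo,
    PCM_PARTIAL_RESOURCE_DESCRIPTOR Translated,
    PKSERVICE_ROUTINE Isr,
    PVOID IsrContext,
    PKINTERRUPT *InterruptObject)
{
    IO_CONNECT_INTERRUPT_PARAMETERS params;

    RtlZeroMemory(&params, sizeof(params));
    params.Version = CONNECT_FULLY_SPECIFIED;

    params.FullySpecified.PhysicalDeviceObject = Pdo;
    params.FullySpecified.InterruptObject = InterruptObject;
    params.FullySpecified.ServiceRoutine = Isr;
    params.FullySpecified.ServiceContext = IsrContext;
    params.FullySpecified.SpinLock = NULL;
    params.FullySpecified.SynchronizeIrql = (KIRQL)Translated->u.Interrupt.Level;
    params.FullySpecified.FloatingSave = FALSE;
    params.FullySpecified.ShareVector =
        (BOOLEAN)(Translated->ShareDisposition == CmResourceShareShared);
    params.FullySpecified.Vector = Translated->u.Interrupt.Vector;
    params.FullySpecified.Irql = (KIRQL)Translated->u.Interrupt.Level;
    params.FullySpecified.InterruptMode =
        (Translated->Flags & CM_RESOURCE_INTERRUPT_LATCHED) ? Latched : LevelSensitive;
    params.FullySpecified.ProcessorEnableMask = Translated->u.Interrupt.Affinity;

    return IoConnectInterruptEx(&params);
}
```

The extra input compared with IoConnectInterrupt is the physical device object; the vector, IRQL, mode, sharing and affinity values are the same ones the older call takes.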

What OS are you making the API call on? If it is pre-Vista, IoConnectInterruptEx just thunks down to IoConnectInterrupt.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@getntds.com
Sent: Thursday, October 08, 2009 9:10 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IoConnectInterrupt fails

Using Windows XP Pro.

I was surprised that it worked, I truly wasn’t expecting the good results but I’ll take it.

Interesting. I would step through the Ex call and see what it is doing differently to make the call succeed.

d

Sent from my phone with no t9, all spilling mistakes are not intentional.

-----Original Message-----
From: xxxxx@getntds.com
Sent: Thursday, October 08, 2009 2:27 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IoConnectInterrupt fails

I would love to step through the EX call to see what’s going on. I just don’t have the time. I have other projects to work on.