PCIe card randomly not detected at boot time

We are seeing an intermittent issue with our PCIe card not being detected at boot time on an XP system. The system comes up and there is no trace of the PCIe card. It is not listed in Device Manager, and selecting “Scan for hardware changes” does not cause it to be detected; it is as if the card doesn’t exist. Looking at the PCI info in windbg and using siv32x, I can see no trace of the card at all. Rebooting the system makes the issue go away: the card is always detected after the next restart, and then the problem won’t happen again for days or weeks. We have never been able to reproduce it on demand. Just once in a blue moon, the card isn’t there after the system is restarted.

It doesn’t seem to me that this could be a driver issue. It is a PnP driver, so since the card is never detected, no attempt is ever made to load the driver. Additionally, I believe we have previously seen the Intel NIC on the same system go undetected, although that hasn’t happened in a very long time and it is not clear that it is related. We have a logic analyzer hooked up right now on one of the systems where we have seen the issue, and we are waiting for the problem to reproduce to see if we can capture anything interesting. Beyond that, I am at a loss for what to look at. I thought I would throw this out there in case anyone has come across a similar issue or has any advice on how to diagnose it.

Any ideas would be greatly appreciated.

Thanks,
Sherri

You didn’t mention the hardware platform/BIOS. Dell had a serious problem with their bridges at one time, as have others from time to time. Pay close attention when the system is powered up/restarted. Do the BIOS banners and card listings show the card on the boots where the OS fails to detect it? Does the card exhibit similar behavior when installed in a completely different computer?

This sounds to me very much like a BIOS and/or bridge problem, and there is nothing the OS can do about that. I would suggest checking with the motherboard manufacturer to see if they have a new BIOS and whether any of the release notes mention such a problem. If it’s a major brand, like Dell, you may be SOL, as they seem much less responsive to these kinds of issues than the MoBo manufacturers.

Just a few thoughts…

xxxxx@yahoo.com wrote:

From: xxxxx@yahoo.com
To: “Windows System Software Devs Interest List”
Subject: [ntdev] PCIe card randomly not detected at boot time
Date: Tue, 15 Jun 2010 10:16:15 -0400 (EDT)


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Thanks very much for the response. It is an HP Z800, and we have updated to the latest available BIOS. I’m not sure what you mean by banners and card listings. The problem is that we can’t know there is a problem until the system has already booted, and since right now the problem only reproduces every couple of weeks or so, we can’t really watch every boot. Even if we could, I am not sure what we would be looking for. If there were some information we could be logging, that would be great, but I don’t know what that would be…

Trying it in another system is a good idea, though there are some complications with that which I have been working on. First I need to reproduce the problem on the Z800 offline (i.e. not connected to our equipment). I have tried to do this by setting the system to reboot repeatedly and check for the card, but I have run into frequent hangs at boot time, which of course may or may not be related to this issue, so I am trying to narrow that down. The situation is further complicated because we actually have a pair of cards installed together: both have basically the same hardware but a different FPGA and a different connection to our equipment. So I need to narrow down whether it matters if both are installed or just one (or even neither), which is difficult to do when I can’t get the problem to reproduce.
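For an unattended reboot loop like the one described above, the check itself can be kept very simple so that every completed boot leaves a record. The sketch below is only an illustration under stated assumptions, not the harness used here: it assumes a hypothetical collection step has already produced the list of (vendor ID, device ID) pairs the OS enumerated, and `CARD_A`/`CARD_B` are placeholder IDs. Tagging each boot with its hardware configuration (both cards, one card) lets the failure rate be compared per configuration as boots accumulate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DeviceId = Tuple[int, int]  # (vendor_id, device_id)

# Placeholder IDs (hypothetical); substitute the real IDs of the two cards.
CARD_A: DeviceId = (0x1234, 0x0001)
CARD_B: DeviceId = (0x1234, 0x0002)

@dataclass
class BootResult:
    boot_number: int
    hw_config: str                        # e.g. "both cards", "card A only"
    missing: List[DeviceId] = field(default_factory=list)

def check_boot(boot_number: int, hw_config: str,
               enumerated: List[DeviceId], expected: List[DeviceId]) -> BootResult:
    """Record which expected devices the OS did not enumerate on this boot."""
    present = set(enumerated)
    missing = [dev for dev in expected if dev not in present]
    return BootResult(boot_number, hw_config, missing)

def failure_rate(results: List[BootResult], hw_config: str) -> Optional[float]:
    """Fraction of boots in one hardware configuration that lost any card."""
    runs = [r for r in results if r.hw_config == hw_config]
    if not runs:
        return None
    return sum(1 for r in runs if r.missing) / len(runs)

# Made-up boot records for illustration.
results = [
    check_boot(1, "both cards", [CARD_A, CARD_B], [CARD_A, CARD_B]),
    check_boot(2, "both cards", [CARD_B], [CARD_A, CARD_B]),
    check_boot(3, "card A only", [CARD_A], [CARD_A]),
]
print(failure_rate(results, "both cards"))   # 0.5
print(failure_rate(results, "card A only"))  # 0.0
```

A boot that hangs never reaches the check at all, so a harness built this way would also need to count a missing record as its own outcome rather than silently dropping it.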

A logic analyzer may not be the most efficient tool for this problem. A PCIe bus analyzer does a better job here. Hook it up and watch how the card responds to config cycles during enumeration (both BIOS and OS).
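One way to read an exported capture is to pull out only the configuration completions addressed to the card and look at their status sequence. The record layout below is hypothetical, since analyzer export formats vary by vendor; the status names SC (Successful Completion), UR (Unsupported Request), CRS (Configuration Request Retry Status) and CA (Completer Abort) are the completion statuses defined by the PCIe specification. In this reading, a long run of CRS before the first SC would point at a device that is slow to become ready, while no traffic at all to the card on its link may suggest the link never trained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BDF = Tuple[int, int, int]  # (bus, device, function)

@dataclass
class ConfigCompletion:
    # Hypothetical record layout for an exported analyzer trace.
    time_ns: int
    bdf: BDF        # target of the configuration request
    offset: int     # configuration-space register offset
    status: str     # "SC", "UR", "CRS" or "CA"

def status_history(trace: List[ConfigCompletion], bdf: BDF) -> List[str]:
    """Completion statuses seen for one bus/device/function, in trace order."""
    return [c.status for c in trace if c.bdf == bdf]

def first_success(trace: List[ConfigCompletion], bdf: BDF) -> Optional[ConfigCompletion]:
    """First successful completion for `bdf`, or None if it never succeeded."""
    for c in trace:
        if c.bdf == bdf and c.status == "SC":
            return c
    return None

# Made-up trace: the card (hypothetically at bus 3) answers CRS twice, then succeeds.
card = (3, 0, 0)
trace = [
    ConfigCompletion(1_000, card, 0x00, "CRS"),
    ConfigCompletion(2_000, (0, 28, 0), 0x00, "SC"),
    ConfigCompletion(3_000, card, 0x00, "CRS"),
    ConfigCompletion(4_000, card, 0x00, "SC"),
]
print(status_history(trace, card))         # ['CRS', 'CRS', 'SC']
print(first_success(trace, card).time_ns)  # 4000
```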

good luck,
Calvin

On 6/15/2010 7:16 AM, xxxxx@yahoo.com wrote:

It doesn’t seem to me like this would be an issue with the driver – it is a PnP driver, so, since the card is never detected, no attempt would be made to load the driver.

PCIExpress has some rather rigid rules about the timing of the
enumeration process at boot time. If, for example, your device is
trying to load an FPGA from an EEPROM before it can respond to config
space cycles, and that process takes too long, the root complex will
simply shut you down, game over.
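To make that timing argument concrete: the PCIe Base Specification has software wait at least 100 ms after a Conventional Reset before sending configuration requests to a device below a Downstream Port that runs at 5.0 GT/s or slower, and if the PCIe endpoint lives inside the FPGA, a card still loading its bitstream generally cannot even train its link by then. The toy model below assumes, as a simplification, that the firmware probes the Vendor ID only at fixed times and treats an all-ones read as “no device”; the probe schedule and the ready-time samples are invented. It shows how a load time that usually lands just inside the window but occasionally drifts past it could produce a rare, non-repeatable miss.

```python
NO_DEVICE = 0xFFFF  # all-ones: nothing answered the configuration read

def vendor_id_at(t_ms: float, ready_ms: float, vendor_id: int) -> int:
    """Vendor ID a config read returns t_ms after reset release (simplified model).

    Assumes the card answers only once its FPGA has loaded and the link is up
    at ready_ms; before that the read comes back as all ones.
    """
    return vendor_id if t_ms >= ready_ms else NO_DEVICE

def enumerated(ready_ms: float, vendor_id: int, probe_times_ms=(100,)) -> bool:
    """True if any probe in the (hypothetical) firmware schedule sees the card."""
    return any(vendor_id_at(t, ready_ms, vendor_id) != NO_DEVICE for t in probe_times_ms)

def miss_rate(ready_samples_ms, vendor_id: int = 0x1234, probe_times_ms=(100,)) -> float:
    """Fraction of boots on which the card would be missed."""
    misses = sum(1 for r in ready_samples_ms
                 if not enumerated(r, vendor_id, probe_times_ms))
    return misses / len(ready_samples_ms)

# Invented FPGA ready times (ms) over eight boots: usually inside the window, once past it.
samples = [82, 85, 88, 84, 86, 83, 87, 104]
print(miss_rate(samples))  # 0.125
```

In this model the only lever the card itself controls is its own ready time, which is consistent with the point above about FPGA load time.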

This is a horribly difficult problem to chase down. I’m not aware of
any good way to chase it without using a PCIExpress bus analyzer, and
they don’t come cheap. A logic analyzer MIGHT be of some help, if you
have access to the insides of your PCIExpress IP.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

>We are seeing an intermittent issue with our PCIe card not being detected at boot time on an XP system. The system comes up, and no trace of the PCIe card to be found – looking in Device Manager the card is not listed.
It is likely a hardware problem. If your device is not listed in Device Manager, it means that the Windows PnP subsystem, or more specifically the PnP Manager, could not correctly retrieve information from the device’s PCI Configuration registers. I would recommend using a PCI bus analyzer, as Calvin suggested. If the problem is relatively easy to reproduce, capture with the analyzer from the very beginning of boot. You should see the configuration transactions that read the PCI Configuration registers of the installed devices.
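The IDs Windows uses to identify a PCI function come from the first dword of its configuration header: the Vendor ID in the low 16 bits and the Device ID in the high 16 bits. Those two values are what appear in the PCI\VEN_xxxx&DEV_xxxx hardware ID shown in Device Manager. The sketch below decodes such a dword as it might be read from a capture or a debugger dump. An all-ones read means nothing answered, and 0xFFFF0001 is what a root complex returns for this read when CRS Software Visibility is enabled and the device replied with a retry status. The sample values are made up.

```python
from typing import Optional, Tuple

def decode_id_dword(dword: int) -> Tuple[str, Optional[int], Optional[int]]:
    """Split config-space dword 0 into (state, vendor_id, device_id)."""
    vendor_id = dword & 0xFFFF
    device_id = (dword >> 16) & 0xFFFF
    if vendor_id == 0xFFFF:
        return ("absent", None, None)   # no function responded to the read
    if dword == 0xFFFF0001:
        return ("retry", None, None)    # CRS reported with Software Visibility enabled
    return ("present", vendor_id, device_id)

def hardware_id(vendor_id: int, device_id: int) -> str:
    """Shortest of the PCI hardware IDs Windows builds for a function."""
    return f"PCI\\VEN_{vendor_id:04X}&DEV_{device_id:04X}"

for raw in (0xFFFFFFFF, 0xFFFF0001, 0x56781234):
    state, ven, dev = decode_id_dword(raw)
    print(hex(raw), state, hardware_id(ven, dev) if state == "present" else "-")
```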

Igor Sharovar

Thanks very much for all the responses.

Sorry – I misspoke. We have a PCIe analyzer hooked up, not a logic analyzer. Glad to hear that is the right way to go.

On 06/15/2010 05:33 PM, xxxxx@yahoo.com wrote:

Not sure what you mean by banners and card listings?

If you switch the BIOS boot settings (from “quiet” or “silent” or
“fast”) to “normal”, the BIOS should normally output a screen page with
all detected devices - e.g. graphics cards, drives and other PCI(e)
devices. => card listings

Also often PCI(e) devices have own init code that prints some banner
(e.g. graphics cards do this regularly). => banners

(Suppressing these text-output messages and having a fancy boot logo is
all well and good, but sometimes it is better to see the POST and BIOS
boot messages.)