Is your device implemented such that the firmware has control over config cycles? That is, can the firmware ask the hardware to return a config-cycle retry status until the firmware is ready?
I’ve seen PCIe devices that take many seconds for their firmware to boot. Technically, if I remember the details of the PCIe spec correctly, a device is allowed to return a retry status (Configuration Request Retry Status, CRS) to a config read, in which case the root complex does not complete the read but retries the config cycle until it gets something other than a retry status. I was amazed to see the PCIe spec says a device can return this retry status FOREVER and still be conforming to the spec (WTF). I have a fuzzy memory that there are some limitations on when a device is allowed to do this, perhaps only after power-on. Look at the PCIe spec for Configuration Request Retry Status.
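To make the stall concrete, here is a conceptual sketch (in C, with invented names; a real root complex does this in hardware, not code) of what a root complex without any special CRS handling effectively does on a config read:

    /* Conceptual sketch only: cfg_read_tlp() and its status codes are
     * invented stand-ins for what the hardware does internally. */
    #include <stdint.h>

    enum cpl_status { CPL_OK, CPL_CRS /* Configuration Request Retry Status */ };
    struct cpl { enum cpl_status status; uint32_t data; };

    extern struct cpl cfg_read_tlp(int bus, int dev, int fn, int reg);

    uint32_t config_read(int bus, int dev, int fn, int reg)
    {
        for (;;) {
            struct cpl c = cfg_read_tlp(bus, dev, fn, reg);
            if (c.status != CPL_CRS)
                return c.data;
            /* The spec lets the device answer CRS forever, so this loop,
             * and the CPU waiting on the read, can spin indefinitely. */
        }
    }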
I know there was once a spec that said devices had to be ready within 100ms (the PCI spec, I believe, but not the PCIe spec), but if you have frozen the processor, time stops as far as that CPU is concerned, so with some hardware-designer handwaving this long PCIe retry doesn’t violate the 100ms requirement. Of course, time only stops for the CPU doing the config read, so other cores may wonder why one core no longer responds to interrupts or gets any work done. I’m guessing the PCIe spec folks assumed this would only happen at system power-on, and that the BIOS only ever used one core at that point. In the case of system power-on, perhaps stalling the boot by 1000ms is not too evil.
I don’t personally think this is a sane hardware design, as it basically causes the processor to freeze on a config read during initial BIOS PCIe enumeration. Some Intel processors (and perhaps others) have optional handling in the root complex that can translate repeated retry status into a synthesized read value (CRS Software Visibility; if I remember right, the Vendor ID comes back as the special value 0x0001). There is then a spec that says if your first read of a device’s config space (Vendor/Device ID) comes back with that value, you should interpret it as meaning the device is not ready to chat yet. This has the huge advantage that the CPU is still running BIOS code, so if, for example, the device doesn’t become ready after some timeout (30 seconds?), the BIOS can declare the device broken or not present. If you don’t have this logic in the root complex and BIOS, and you do something like flash new firmware that crashes at power-up, your system just powers on and locks, with no message or indication of what its problem is. A LOT of systems don’t implement this, but I assume some do.
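With that root-complex support, the polling loop in the BIOS or OS looks roughly like this; a minimal sketch in the spirit of the Linux kernel’s pci_bus_wait_crs(), where read_config_dword() and msleep() are assumed platform hooks:

    #include <stdbool.h>
    #include <stdint.h>

    #define CRS_VENDOR_ID 0x0001  /* Vendor ID the root complex synthesizes for CRS */

    extern uint32_t read_config_dword(int bus, int dev, int fn, int reg); /* assumed hook */
    extern void msleep(int ms);                                           /* assumed hook */

    /* Poll the Vendor/Device ID dword until the device stops answering CRS,
     * or the timeout expires and we declare it broken or absent. */
    static bool wait_for_device(int bus, int dev, int fn, int timeout_ms)
    {
        int waited = 0, delay = 1;
        while ((read_config_dword(bus, dev, fn, 0x00) & 0xffff) == CRS_VENDOR_ID) {
            if (waited >= timeout_ms)
                return false;      /* never became ready; flag it, keep booting */
            msleep(delay);
            waited += delay;
            delay *= 2;            /* back off, as the Linux code does */
        }
        return true;
    }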
Some PCIe hardware devices have a “firmware ready” control that causes them to return config retry status until the firmware says it’s ready. The strategy is that the power-on EEPROM configures the hardware to return config retries until the firmware is ready. For example, I believe some/many PCIe bridge chips support this, so if you have a bridge in your device, firmware gets to say when it will accept config requests. A problem with this is that some devices, when reset or after a firmware update, will then stall config cycles, locking up the OS-initiated PCIe tree enumeration. I’ve known a number of devices that do this. Again, the possible fix for the silly PCIe spec is for the root complex not to lock the CPU when config retries happen. If you search the Linux kernel’s PCIe bus handling, there are code comments that talk about this.
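On the device side the pattern is trivial; a hypothetical sketch (register and bit names invented for illustration):

    /* Hypothetical device firmware: the power-on EEPROM straps the CRS gate,
     * and firmware clears it once it can service config requests. */
    #define PCIE_CTRL    0x1000u        /* invented control register offset */
    #define CRS_GATE_BIT (1u << 0)      /* invented "return CRS" gate bit */

    extern unsigned read_reg(unsigned off);            /* assumed MMIO hooks */
    extern void write_reg(unsigned off, unsigned val);

    void firmware_ready(void)
    {
        write_reg(PCIE_CTRL, read_reg(PCIE_CTRL) & ~CRS_GATE_BIT);
    }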
Your device might comply with the PCIe spec, and still have this problematic behavior.
While I’m on a rant about silly hardware behavior: the SMBus spec does some handwaving about potentially conflicting addresses, but has no details on which addresses should be avoided. I had some hardware that caused the system not to boot when you added memory to all DRAM slots. I eventually tracked the issue down to the fact that many desktop and single-socket server boards share the chipset’s only SMBus/I2C controller between the DRAM config EEPROMs (the SPD data) and the PCIe SMBus pins. What then happens is that at power-on, some PCIe cards by default use the same SMBus/I2C address as the DRAM config EEPROMs, the BIOS can’t read the DRAM config to program the memory controllers, and the system either locks or gives a beep code. Larger servers have multiple I2C controllers, with the PCIe SMBus pins on a different bus than the DRAM, so you never see this conflict. I’ve seen cards from multiple vendors with this problem. The workaround is to put some tape over the SMBus pins on the offending card (doable, but intricate) so its SMBus interface is disabled, preventing the conflict with DRAM. As far as I can tell, these devices conform to the SMBus specs, and still fail on some/many systems. This mostly causes compatibility issues with cards intended for servers when they are inserted into desktop or single-socket machines.
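If you want to check for a card squatting on those addresses from a Linux box, a quick i2c-dev probe of the SPD range (0x50-0x57) works; run it with the suspect card installed and then removed, and compare (this assumes /dev/i2c-0 is the chipset SMBus; adjust to taste):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/i2c-dev.h>

    int main(void)
    {
        int fd = open("/dev/i2c-0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }
        for (int addr = 0x50; addr <= 0x57; addr++) {
            char byte;
            if (ioctl(fd, I2C_SLAVE, addr) < 0)
                continue;
            /* A successful 1-byte read means *something* ACKed this address. */
            if (read(fd, &byte, 1) == 1)
                printf("device ACKed at 0x%02x\n", addr);
        }
        close(fd);
        return 0;
    }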
Jan
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@pisem.net
Sent: Friday, February 2, 2018 1:28 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] PCIe re-enumeration on Windows
Hi
I have a PCIe card whose PCIe configuration time is longer than 100ms, and on some hardware it is not detectable by Windows until a soft reset.
I understand that the device should be re-designed to comply with the PCIe specification, but my task is to make the existing cards work on Windows 7/8/10.
If I just run a new HW enumeration from the Device Manager, it cannot find the card.
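(For reference, the same rescan can be requested programmatically; a minimal cfgmgr32 sketch, linked against cfgmgr32.lib:)

    #include <windows.h>
    #include <cfgmgr32.h>
    #include <stdio.h>

    /* Ask the PnP manager to rescan from the root devnode, the same thing
     * Device Manager's "Scan for hardware changes" does. */
    int main(void)
    {
        DEVINST root;
        if (CM_Locate_DevNodeW(&root, NULL, CM_LOCATE_DEVNODE_NORMAL) != CR_SUCCESS ||
            CM_Reenumerate_DevNode(root, 0) != CR_SUCCESS) {
            fprintf(stderr, "rescan request failed\n");
            return 1;
        }
        return 0;
    }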
So my question is: is it possible to force the OS to detect such a bad device?
As I understand it, PCIe is supposed to support hot-plug, so why not…
Thanks in advance!