PCIe re-enumeration on Windows

Nikolay Member Posts: 55
Hi

I have a PCIe card whose PCIe configuration time is longer than 100 ms, and on some hardware Windows does not detect it until a soft reset.
I understand that the device should be redesigned to comply with the PCIe specification, but my task is to make the existing cards work on Windows 7/8/10.

If I just run a scan for new hardware from Device Manager, it cannot find the card.

So my question is: is it possible to force the OS to detect such a bad device?
As I understand it, PCIe should support hot-plug, so why not...

Thanks in advance!

Comments

  • OSR_Community_User Member Posts: 110,218
    You could try turning on Extended Synch for the particular port the card is on. You can also extend the sync timeout values from 1000 ns to 10000 ns. Either change the values in AMIBCP and reflash the altered BIOS (risky), or use the RU utility to change the specific register values in your BIOS that correspond to the setup program or the platform IntelSetup modules, by changing their IFR values.

    > On Feb 2, 2018, at 4:27 PM, xxxxx@pisem.net <xxxxx@lists.osr.com> wrote:
    >
    > ---
    > NTDEV is sponsored by OSR
    >
    > Visit the list online at: <http://www.osronline.com/showlists.cfm?list=ntdev>
    >
    > MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
    > Details at <http://www.osr.com/seminars>
    >
    > To unsubscribe, visit the List Server section of OSR Online at <http://www.osronline.com/page.cfm?name=ListServer>
  • Dzmitry_Altukhou Member - All Emails Posts: 16
    I'm afraid the only way to solve your problem and make the device comply
    with the PCIe standard is to redesign the device.
    Device Manager does not re-enumerate existing peripherals itself; it
    just retrieves this information from the BIOS.
    The BIOS firmware performs the enumeration routinely while the PC boots.
    That's why your attempt to scan for connected Plug and Play devices has
    no effect.
    As for hot-plug, you need a special motherboard that supports this
    feature. Regular motherboards do not support it, since a special chipset
    is required for this mechanism.

    BR,
    Dzmitry

  • Jan_Bottorff Member - All Emails Posts: 469
    Is your device implemented such that firmware has control over config cycles? Or can firmware ask the hardware to send config cycle retry status until firmware is ready?

    I've seen PCIe devices that take many seconds for the firmware to boot. Technically, if I remember the details of the PCIe spec correctly, a device is allowed to return a retry status to a config read cycle, and the root complex does not complete the config read but retries the config cycle until it gets something other than a retry status. I was amazed to see that the PCIe spec says a device can return this retry status FOREVER and still conform to the spec (WTF). I have a fuzzy memory that there are some limitations on when a device is allowed to do this, perhaps only at power-on. Look in the PCIe spec for Configuration Request Retry Status (CRS).

    I know there was once some spec that said devices had to be ready in 100 ms (I believe the PCI spec, but not the PCIe spec), but if you have frozen the processor, time stops as far as the CPU is concerned, so with some hardware-designer handwaving this long PCIe retry is not violating the 100 ms requirement. Of course, time only stops for the CPU doing the config read, so other cores may wonder why one core no longer responds to interrupts or gets any work done. I'm guessing the PCIe spec folks assumed this would only happen at system power-on, when the BIOS never used more than one core. In the case of system power-on, perhaps stalling the boot by 1000 ms is not too evil.

    I don't personally think this is a sane hardware design, as it basically causes the processor to freeze on a config read during initial BIOS PCIe enumeration. Some Intel processors (and perhaps others) have some optional handling in the root complex that can translate repeated config retry status into returning a value of 0. There is then a spec that says if you first read a device's config space (VID/DID) and it comes back 0, you should interpret that as meaning the device is not ready to chat yet. This has the huge advantage that the CPU is still running BIOS code, so if, for example, the device doesn't become ready after some timeout (30 seconds?), the BIOS can declare the device broken or not present. If you don't have this logic in the root complex and BIOS, and you do something like flash new firmware that crashes at power-up, you are in the situation where your system just powers on and locks, with no message or indication of what its problem is. A LOT of systems don't implement this, but I assume some do.

    Some PCIe hardware devices have a "firmware ready" control bit that causes them to return config retry status until the firmware says it's ready. The strategy is that the power-on EEPROM configures the hardware to return config retry until the firmware is ready. For example, I believe some/many PCIe bridge chips support this, so if you have a bridge in your device, firmware gets to say when it will accept config requests. A problem with this is that some devices, when reset or after a firmware update, will then stall config cycles, locking the OS-initiated PCIe tree enumeration. I've known a number of devices that do this. Again, the possible solution to the silly PCIe spec is for the root complex to not lock the CPU when config retries happen. If you search the Linux kernel for PCIe bus handling, there are code comments that talk about this.

    Your device might comply with the PCIe spec, and still have this problematic behavior.
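
To make the "root complex hands back a placeholder instead of stalling" idea concrete, here is a minimal Python sketch against a simulated device. The PCIe Base Specification calls the mechanism CRS Software Visibility, and the placeholder it defines for a Vendor ID read is 0x0001 (the post above recalls it as 0). Everything else is hypothetical: `FakeDevice`, `wait_for_device`, the Vendor ID 0x1234, the poll interval and the timeouts are invented for illustration, and no real config-space access is performed.

```python
CRS_VENDOR_ID = 0x0001   # placeholder Vendor ID defined for CRS Software Visibility
NO_DEVICE = 0xFFFF       # all ones: nothing answered the config read


class FakeDevice:
    """Hypothetical stand-in for a slow card: answers CRS until its firmware is up."""

    def __init__(self, ready_after_ms, vendor_id=0x1234):
        self.ready_after_ms = ready_after_ms
        self.vendor_id = vendor_id

    def read_vendor_id(self, now_ms):
        # Before the firmware is ready, the read completes with the CRS placeholder.
        return self.vendor_id if now_ms >= self.ready_after_ms else CRS_VENDOR_ID


def wait_for_device(device, timeout_ms, poll_ms=10):
    """Poll the Vendor ID on a simulated clock; return (vendor_id, elapsed_ms).

    Returns (None, elapsed_ms) if the slot is empty or the device never leaves
    CRS, so the caller can report it as broken instead of hanging.
    """
    for now_ms in range(0, timeout_ms + 1, poll_ms):
        vid = device.read_vendor_id(now_ms)
        if vid == NO_DEVICE:
            return None, now_ms      # nothing there, no point in waiting
        if vid != CRS_VENDOR_ID:
            return vid, now_ms       # firmware is up, enumeration can continue
    return None, timeout_ms          # still in CRS at the deadline


if __name__ == "__main__":
    print(wait_for_device(FakeDevice(ready_after_ms=250), timeout_ms=1000))
    print(wait_for_device(FakeDevice(ready_after_ms=60000), timeout_ms=30000))
```

The point of the design is that the polling loop, not the hardware, owns the deadline: a card whose firmware never comes up yields a "not present or broken" result after the timeout rather than a locked machine.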

    While I'm on a rant about silly hardware behavior: the SMBus spec does some handwaving about potentially conflicting addresses, but has no details on which addresses should be avoided. I had some hardware that caused the system to not boot when you added memory to all DRAM slots. I eventually tracked the issue down to the fact that many desktop and single-socket server boards share the only SMBus/I2C controller on the chipset between the DRAM I2C config EEPROMs and the PCIe SMBus pins. What then happens is that at power-on, some PCIe cards by default use the same SMBus/I2C address as the system DRAM config info, the BIOS can't read the DRAM config to program the DRAM controllers, and the system either locks or gives a beep code. Larger servers have multiple I2C controllers, and the PCIe SMBus pins are then on a different bus than the DRAM, so you never have this conflict. I've seen cards from multiple vendors with this problem. The workaround is to put some tape over the SMBus pins on the offending card (doable, but intricate), so its SMBus interface is disabled, preventing a conflict with DRAM. As far as I can tell, these devices conform to the SMBus specs, and still fail on some/many systems. This mostly causes compatibility issues for cards intended for servers but inserted into desktop or single-socket server machines.
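
The clash described in this paragraph boils down to a set intersection on a shared bus segment. A minimal Python sketch, assuming the conventional 7-bit SPD EEPROM addresses 0x50 to 0x57 for DIMM slots 0 to 7; the card names and their default addresses are invented for illustration and do not describe any real product.

```python
# 7-bit SMBus addresses conventionally used by DIMM SPD EEPROMs (slots 0-7).
SPD_ADDRESSES = set(range(0x50, 0x58))


def find_conflicts(populated_dimm_slots, card_addresses):
    """Return {card_name: [clashing addresses]} for cards on the DRAM SMBus segment.

    populated_dimm_slots: iterable of slot indexes (0-7) that hold a DIMM.
    card_addresses: dict mapping a (hypothetical) card name to the 7-bit
    addresses the card answers on by default.
    """
    # Only populated slots have an SPD EEPROM answering on the bus.
    in_use = {0x50 + slot for slot in populated_dimm_slots} & SPD_ADDRESSES
    conflicts = {}
    for name, addrs in card_addresses.items():
        clash = sorted(in_use.intersection(addrs))
        if clash:
            conflicts[name] = clash
    return conflicts


if __name__ == "__main__":
    cards = {"card_a": [0x53], "card_b": [0x2C]}   # made-up defaults
    print(find_conflicts([0, 1], cards))           # two DIMMs installed
    print(find_conflicts([0, 1, 2, 3], cards))     # all four slots filled
```

With these made-up values the card only collides once slot 3 is populated, which matches the symptom of a board that boots with some DIMMs installed and fails when every slot is filled.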

    Jan

  • Nikolay Member Posts: 55
    > Is your device implemented such that firmware has control over config cycles?
    > Or firmware can ask the hardware to send config cycle retries status until firmware is ready

    Yes, I think the FW can control the config cycles and send retries.
    Actually, I wanted to avoid any FW modifications if possible, but it looks like there is no
    other choice.

    Thanks for your help
  • Peter_Viscarola_(OSR) Administrator Posts: 6,794
    > While I'm on a rant about silly hardware behavior. The SMBUS spec does some
    > handwaving about potential conflicting addresses,

    There's not really much to complain about here. Slave addresses on SMBus (and its cousin I2C) are fixed. These buses are not intended to be "plug and play", where you can randomly add stuff and expect it to work. The host controller doesn't know what slaves are connected, and can't even talk to a slave unless it's given a specific slave address. So, whoever talks to the slave needs to know and provide the address.

    Now... things are different on I2C and SPI buses that are under the control of SpbCx in Windows 8.x and later... but those bus instances aren't the ones you're talking about.

    Peter
    OSR
    @OSRDrivers

  • Pavel_A Member Posts: 2,643
    > I have PCIe card that has PCIe configuration time longer than 100ms, and on some
    > hardware it is not detectable by Windows until soft reset.

    But on some hardware Windows will wait? So is the timeout defined in Windows' PCI driver or in the BIOS?

    > So my question is can it be possible to enforce the OS to detect such bad device?

    Well, if "the OS" is not Windows... it's easy :)

    Those who deal with bring-ups of PCIe devices do this all the time. Models running on things like Cadence's Palladium or Protium may take several minutes to start. As Jan noted, BIOSes often do not mind if some device starts slowly. It is very helpful to be able to plug these models into a standard PC. Of course, a real finished device _should_ start promptly. But I don't remember doing a chip bring-up under Windows in the last 10 years.


    -- pa
  • Don_Burn Member - All Emails Posts: 1,625
    In my case it depends on what you mean by a chip bring-up. I got hired
    to port a board that was shipping on Linux to Windows. It had two FPGAs, one
    at revision 4 and the other at revision 2. By the time we had debugged all the
    PCI problems that Linux ignored, and the timing problems that appeared because
    we were faster, the initial beta went out with revision 35 and revision 15.
    Technically not a board bring-up, but it might as well have been. Note: for
    the beta we were careful, since we knew we could still cause the board to hang
    and then self-destruct by overheating.


    Don Burn
    Windows Driver Consulting
    Website: http://www.windrvr.com



    -----Original Message-----
    From: xxxxx@lists.osr.com
    [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@fastmail.fm
    Sent: Sunday, February 11, 2018 2:59 PM
    To: Windows System Software Devs Interest List <xxxxx@lists.osr.com>
    Subject: RE: [ntdev] PCIe re-enumeration on Windows
