Re: Driver PCI returned invalid ID for a child device

For those who are interested in the solution:

It had nothing to do with our own driver, but with a PLX PCIe switch
chip. Each board contains such a chip, which also has a small EEPROM
in which you can program a serial number. In our case we
had multiple boards (and thus multiple PLX PCIe switch chips) without a
serial number programmed in the EEPROM of the PLX chip. After programming a
serial number, the error message disappeared from the event log. It looks
like modern systems are stricter in checking this kind of data,
because on older systems there was no issue.

Best regards,

Frank van Eijkelenburg

On 10-05-16 17:30, xxxxx@technolution.nl wrote:

If I take the first six entries from the device manager:

PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&9F0CD7D&0&00300012
PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&34359174&0&00480012
PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&20DDCEC0&0&00400008
PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&A3066DB&0&00280018
PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&14CC3038&0&00280010
PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&3650DC51&0&00400018
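
A small sketch of how the entries above can be compared: it splits each device instance path into the hardware-ID fields and the trailing instance part, which makes it easy to see that the boards share identical hardware IDs and differ only in the instance part. The function name `split_instance_path` is made up for illustration, and no meaning is assigned to the sub-fields of the instance part.

```python
import re

# Pattern for the hardware-ID portion of a PCI device instance path.
_PCI_ID = re.compile(
    r"^PCI\\VEN_(?P<ven>[0-9A-F]{4})&DEV_(?P<dev>[0-9A-F]{4})"
    r"&SUBSYS_(?P<subsys>[0-9A-F]{8})&REV_(?P<rev>[0-9A-F]{2})$"
)

def split_instance_path(path):
    """Split 'PCI\\VEN_...\\<instance>' into ID fields plus the instance part."""
    device_id, _, instance = path.rpartition("\\")
    match = _PCI_ID.match(device_id)
    if match is None:
        raise ValueError(f"not a PCI device instance path: {path!r}")
    fields = match.groupdict()
    fields["instance"] = instance
    return fields

# Two of the entries listed above.
paths = [
    r"PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&9F0CD7D&0&00300012",
    r"PCI\VEN_1187&DEV_7F01&SUBSYS_7F011187&REV_00\6&34359174&0&00480012",
]
parsed = [split_instance_path(p) for p in paths]

# Same vendor/device/subsystem/revision, different instance part.
same_hardware = all(
    {k: v for k, v in p.items() if k != "instance"}
    == {k: v for k, v in parsed[0].items() if k != "instance"}
    for p in parsed
)
print(same_hardware, [p["instance"] for p in parsed])
```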


NTDEV is sponsored by OSR

Visit the list online at: http://www.osronline.com/showlists.cfm?list=ntdev

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Thank you, Mr. van Eijkelenburg, for taking the time to post back here and report the resolution to this issue.

Can you please clarify one detail for us: Did your device not have a serial number capability, or did it have the capability but no serial number programmed (either all zeros or all 0xFF)?

I’d be very surprised to hear that the (optional) serial number capability is now somehow required. I’ve never met a board that implements it, and if something somewhere is now trying to ensure that capability is present…
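
For anyone who wants to check which of these cases applies to a device, here is a minimal sketch that walks the PCIe extended capability list in a raw configuration-space dump and looks for the Device Serial Number capability (extended capability ID 0x0003, with the serial in the two dwords after the header). It assumes you already have the 4 KB config space as bytes; how you obtain it is out of scope. The function names and the sample values are made up for illustration.

```python
import struct

DSN_CAP_ID = 0x0003    # Device Serial Number extended capability ID
EXT_CAP_START = 0x100  # extended capabilities start here in config space

def find_serial_number(cfg):
    """Return the 64-bit serial from a config-space dump, or None if absent."""
    offset = EXT_CAP_START
    visited = set()  # guards against a malformed, looping capability list
    while EXT_CAP_START <= offset <= len(cfg) - 4 and offset not in visited:
        visited.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable config space
        if (header & 0xFFFF) == DSN_CAP_ID and offset + 12 <= len(cfg):
            lo, hi = struct.unpack_from("<II", cfg, offset + 4)
            return (hi << 32) | lo
        offset = (header >> 20) & 0xFFC  # next-capability pointer
    return None

def classify(serial):
    """Map the result onto the cases asked about above."""
    if serial is None:
        return "no serial number capability"
    if serial in (0, 0xFFFFFFFFFFFFFFFF):
        return "capability present, serial not programmed"
    return "capability present, serial programmed"

# Synthetic dump: one unrelated capability at 0x100 chaining to a DSN at 0x140.
cfg = bytearray(4096)
struct.pack_into("<I", cfg, 0x100, 0x0001 | (1 << 16) | (0x140 << 20))
struct.pack_into("<III", cfg, 0x140, DSN_CAP_ID | (1 << 16), 0x12345678, 0x9ABCDEF0)

serial = find_serial_number(cfg)
print(hex(serial), "->", classify(serial))
```

Note that this only tells you whether a serial is present and programmed; a programmed value can still be a factory default shared by several devices.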

Peter
OSR
@OSRDrivers

Hi Peter,

My colleague investigated the issue and as far as I understand it was the
PEX8624 device on our board which had the capability for a serial number
and also had a valid (default) serial number present. However, this serial
number must be unique within a single system.

In our case, we had multiple boards within a single system and for all
PEX8624 devices the serial number was the same (the default manufacturer
serial number).

You can override the default serial number by writing to the EEPROM of the
PEX8624 device.
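
As a rough illustration of the check that would have caught this, the sketch below groups devices by serial number and reports any serial that appears more than once in the same system. The location strings and serial values are hypothetical placeholders, not values from our boards, and reading the serials from the hardware is assumed to have been done elsewhere.

```python
from collections import defaultdict

def duplicate_serials(devices):
    """Return {serial: [locations]} for serials shared by more than one device."""
    groups = defaultdict(list)
    for location, serial in devices:
        groups[serial].append(location)
    return {s: locs for s, locs in groups.items() if len(locs) > 1}

# Hypothetical system: two switches still carry the same default serial.
devices = [
    ("04:00.0", 0x1111),
    ("07:00.0", 0x1111),
    ("0a:00.0", 0x2222),
]

for serial, locations in duplicate_serials(devices).items():
    print(f"serial {serial:#x} is shared by {', '.join(locations)}")
```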

Best regards,

Frank van Eijkelenburg

On Fri, Jun 17, 2016 at 3:28 PM, wrote:

> Thank you, Mr. van Eijkelenburg, for taking the time to post back here and
> report the resolution to this issue.
>
> Can you please clarify one detail for us: Did your device not have a
> serial number capability, or did it have the capability but no serial
> number programmed (either all zeros or all 0xFF)?
>
> I’d be very surprised to hear that the (optional) serial number
> capability is now somehow required. I’ve never met a board that implements
> it, and if something somewhere is now trying to ensure that capability is
> present…
>
> Peter
> OSR
> @OSRDrivers

If you cannot provide a truly unique serial number, then just switch it off entirely.

You can go without the serial number. For instance, most modems (like EV-DO or LTE) in my experience do not have one, and they work fine. Yes, if you plug the modem into another USB port, a new devnode is created for it with a new COM%d number, but the thing still works, and the modern Windows modem stack can find the modem to dial out even without the COM%d number.

But providing a “serial number” which is not actually unique is the way to disaster. Older Windows versions simply crash if two devices with the same “serial number” are detected.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Thank you again, Mr. van Eijkelenburg … That’s very helpful to know.

Peter
OSR
@OSRDrivers

I’m an R&D driver engineer working at Avago/Broadcom on PLX-based products. (PLX was acquired by Avago two years ago; Avago acquired Broadcom this spring, and the name on the building is now Broadcom Limited, which is now classified as the third-largest semiconductor company. Avago had also acquired LSI Logic and Emulex a while back, and was originally a spinoff from HP’s semiconductor group.) I’ll send an email to make sure the PLX technical support engineers are aware of this bridge serial number requirement for Windows OSs. It might not be a requirement for PCIe spec compliance or for working with Linux. From a card manufacturing point of view, ensuring that each PCIe bridge has a unique serial number burned into the configuration EEPROM sounds like a requirement that could easily get missed (or ignored to reduce board cost).

My understanding is Windows would like every PCIe device to have a unique serial number, and this is not always the case.


Starting in about Windows Server 2012, the OS PCIe bus driver tries to use PCIe serial numbers as a unique value for PnP instance ID generation. In previous versions, a hash of the location in the PCIe device tree (like the BDF) was used. When software control over the PCIe hierarchy started happening (I think in the 2009 time frame), using a PnP ID based on the location in the PCIe bus hierarchy became a problem. For example, suppose you have a server with two partitionable NIC cards, and you set the firmware configuration on the first NIC to partition it into four devices instead of one. When the system is rebooted, the PnP instance IDs of the second NIC have changed, because the first NIC now uses four PCIe bus IDs instead of one and the second NIC is now at different BDFs. Nothing has been physically moved. When the NIC’s PnP instance ID changes, all the settings associated with that NIC can no longer be found, which is especially catastrophic if the system was iSCSI booting from that second NIC.

So, the solution to the changing PCIe BDF was to make the PCIe bus driver use the device serial number as the PnP instance ID. The bus hierarchy could change, but the serial number stayed the same. One side effect of this was that if your NIC goes bad and you plug in a replacement, the new card has a different serial number and so generates a different PnP instance ID, and your iSCSI-booted system no longer boots because it can’t find the NIC settings. I believe a workaround for this was also added (there was a Server 2012 hotfix, and I believe Server 2012 R2 had the fix out of the box), so that if a new PCIe serial number shows up, the PCIe bus driver attempts to match the BDF and VID/DID, and use the old instance ID.

So the new PCIe instance ID generation logic also has to deal with the case where a device has a serial number, but it duplicates another device’s serial number. My understanding is that the PCIe bus driver has to calculate a repeatable unique value for the PnP instance ID, which I believe is prefixed with a hash of the parent bus PnP instance IDs. The result of this algorithm is that PCIe devices get a repeatable instance ID, although if you uninstall the parent bridge drivers and let the OS rediscover them, the instance IDs are not guaranteed to be the same as before. If your PCIe device has a unique serial number, it doesn’t matter what the instance IDs of the parent bridges are. If your device does not have a serial number, or has a duplicate one, the parent instance IDs do matter to the generation of your device instance ID. There is a KMDF WHQL test that removes the parent bridges and can trigger this behavior.
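
To make the difference concrete, here is a toy model of the two cases. It is emphatically not the algorithm the Windows PCI bus driver uses (I don’t know its details); the function name, ID formats, and the use of CRC32 as the hash are all assumptions chosen only to show why a unique serial decouples a device’s instance ID from its parent, while a missing or duplicate serial does not.

```python
import zlib

def instance_id(parent_id, dev_func, serial, serial_is_unique):
    """Toy model only: prefer a unique serial, else derive the ID from the parent."""
    if serial is not None and serial_is_unique:
        # Serial-based: independent of where the device sits in the hierarchy.
        return f"SN-{serial:016X}"
    # Location-based fallback: prefixed with a hash of the parent's instance ID.
    parent_hash = zlib.crc32(parent_id.encode("ascii")) & 0xFFFFFFFF
    return f"{parent_hash:08X}&{dev_func:02X}"

# The same device before and after its parent bridge was re-enumerated
# and received a different (hypothetical) instance ID.
before_parent, after_parent = "bridge-A", "bridge-B"

unique_before = instance_id(before_parent, 0x08, 0x1234, True)
unique_after = instance_id(after_parent, 0x08, 0x1234, True)

dup_before = instance_id(before_parent, 0x08, 0x1234, False)
dup_after = instance_id(after_parent, 0x08, 0x1234, False)

print("unique serial stable:   ", unique_before == unique_after)
print("duplicate serial stable:", dup_before == dup_after)
```

With a unique serial the ID survives the parent change; with a duplicate (or no) serial it follows the parent, which is the situation the WHQL test mentioned above provokes.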

PnP instance ID generation has been evolving, and I have not heard whether Win 10/Server 2016 have changed it. If anybody has corrections or additions, please do add to this thread.

This is also part of the reason Thunderbolt support is difficult. If you plug in a Thunderbolt device, it participates in the PCIe bus hierarchy enumeration, and can cause changes to the BDF, and as a result to the PnP instance ID, of devices later in the tree. To correctly support Thunderbolt dynamically, the OS would need to do PCIe bus ID rebalancing, which I vaguely remember being partially supported in Win 10. A problem is that if a boot-critical device needs to have its PCIe bus ID reassigned, you’re stuck: you can’t disable it, so you can’t reassign bus IDs, so you can’t open up bus ID space for the hot-plugged Thunderbolt device.

I assume NVMe storage devices may be introducing issues in PCIe enumeration, as it seems reasonable to want NVMe devices you can partition, just like you want NICs you can partition. You definitely want hot-pluggable NVMe storage devices.

Some hardware engineers are oblivious about how OSs need to dynamically cope with hot plug devices, so are unaware of the need for PCIe serial numbers. Linux and Windows also handle device enumeration a little differently, so a product that works fine on Linux may not work fine on Windows.

Jan

PCIe switches don’t need a serial number, and had better not have one, because they don’t need to carry persistent settings in the registry. It’s the endpoint devices that need a serial number.

One PCIe device I worked on had a bus driver which enumerated child devices, such as network devices and storage adapters, off a single PCIe function. The cards had unique serial numbers. But the bus driver used its bus/device/function to generate the child instance IDs (instead of using its own instance ID as the base), and that caused breakage whenever its bus number changed.