PCIe device disappears from bus after being disabled, but only if drivers are installed?

Hi all, I’m currently bringing up some custom video controller hardware under Windows 10 and am running into an issue that has made driver development incredibly annoying. Every time the device is disabled (either manually or when deploying a new version of the driver to debug), it completely disappears from the PCI bus. Attempts to re-enable it fail with errors indicating that the device no longer exists, and I have to do a full reboot to get it to come back. This means every deploy-and-debug cycle involves at least one restart, by which point the driver is already installed and I end up missing the startup process.

The hardware in question is a Xilinx/AMD K26 SOM (FPGA) on a custom carrier card that exposes the PCIe interface. From what I can tell, none of the PCIe endpoint state in the hardware changes during the disable process: the device still thinks it’s connected, but Windows does not. Note that I can successfully flash Xilinx’s PCIe DMA bitstream to the device and use their Windows drivers to communicate with it, so I’m fairly confident that the actual PCIe interface and hardware work.

Now, if I completely remove the drivers and reboot the machine, the device shows up as a generic Video Controller (due to the class code settings of its Physical Function), and I can disable and re-enable it over and over without issue. This leads me to believe that something about my driver is causing the device to disappear completely when it is disabled.

Note that this driver, at the moment, is completely barebones: it does basically nothing but map the various PCI BARs presented by the device and then read some registers through those mappings. I can get Visual Studio to attach to the remote machine, but I have thus far been unable to get it to stop at a breakpoint in the driver so I can step through and see what it’s doing. The driver IS logging trace messages that I can view using TraceView, and they show it progressing through both device startup and shutdown as I would expect.

Somewhere, after the driver is done trying to clean up the device, Windows seems to be completely removing the device from the bus and disposing of it. The only way to get it back is to restart.

Does this make any kind of sense? Granted, I should probably spend some time getting the debugger working properly (perhaps forgoing Visual Studio’s wrapper and just sticking with WinDbg directly?) so I can see if anything odd is happening as I step through the driver, but this behavior is still a bit confusing to me.

Please forgive me if these questions seem too basic. I’ve made enough dumb mistakes to know that sometimes even a dumb question from the outside can be the trigger that leads to the answer.

What makes you say the device drops off the bus? Are you looking at Device Manager? Have you looked at the “Devices by connection” view, just in case it is re-enumerating under a different class? The fact that the Xilinx driver can talk to it means it must be in Device Manager somewhere; it’s just not where you’re looking.

Do the setupapi logs in \Windows\INF (e.g. setupapi.dev.log) have anything interesting? Have you looked at the error log?

After much head scratching and digging through the HDL, I realized that my design was not properly responding to PCIe transactions at all: requests were left hanging open on the bus and never completed. Once the driver was uninstalled, the PCIe root port found that it could no longer communicate with the device (because the device was not completing non-posted TLPs), assumed the device must no longer exist, and, when Windows asked, reported that there was no longer a device at that bus location.

Once I fixed the issue in the hardware design, everything behaves as I’d expect: I can disable and re-enable the device, and the driver does what it should. I’m still having issues reading memory from the device, but that’s down to what’s happening on the device itself and isn’t related to what the driver is doing.

A silly mistake, to be sure. PCIe is complicated!

PCIe is complicated!

That is the understatement of the week. I’m sure most technical people don’t realize how dang much traffic there is even on an idle PCIe bus. The exchange of credits and status reports goes on almost continuously.
