x1 PCIe in x4 slot confusion

Hi all,

We have a x1 PCIe IO card we have been shipping for some time (this is a closed system, by the way), and we realized recently that it does not work in x4 slots. It shows up in the Windows Device Manager with the correct vendor/product ID, but Windows cannot find the driver (yes, we tested our board in a x1 slot and another device in the x4 slot, and both work).

I thought that the whole PCIe cross-slot compatibility system was one of those things that Just Works, but I am now starting to think that there is something crafty I need to do in the INF file, or possibly something in the (Altera) PCIe core. The first few pages of Google have not pointed to anything blindingly obvious for either possibility.

Any thoughts?

Best,

Tom

It doesn’t load the driver… or the device doesn’t show up in Device Manager at all?

There is certainly nothing you need to do in your INF related to this.

It should “just work”… but the BIOS might be uncooperative. Have you tried this (your x1 in a x4 slot) on multiple machines, by different vendors?

Peter

Hi Peter,

Thanks for the feedback.

It does not appear that our driver gets loaded (at least a debugPrint in the load entry point does not get hit).

However, the device does appear in Device Manager. I can see the vendor/product ID and they are correct, but it has errors saying it cannot find a driver. That made me think there were some additional incantations I needed in the INF file to let it know that this driver was the right one. Of course, I could not find documentation on any such incantations, which makes sense because you are saying there are none.

We did try on a couple of different machines (all Dells of differing vintage) and the behavior is a little different on them. Some lock up (which makes me think there is a problem in the PCIe core configuration that is choking the BIOS/Bus driver), but others do what I am describing above.

But this is a solid data point I was looking for. Given that it should all work just fine, I will go back and double-check to make sure we are not doing something stupid in our test matrix and make sure my guys are getting cozy enough with WinDbg to be certain the driver is not getting loaded.

Best,

Tom

Have you run the PCIe compliance test (config space validation etc) and signal validation?

When plugged into a Linux (sorry!) machine, are there any interesting kernel messages?

Hi Pavel,

That would be a “no” on the PCIe compliance test, I expect. I am out of my depth on the specific test you are mentioning; is this a software or hardware (logic analyzer) test? [Edit: I just Googled this and see you probably mean PCI-SIG compliance testing. No, we did not do this. PCI-SIG is very expensive by my standards, even just for the VEN_ID, so I did not bother with the rest. Perhaps a mistake.]

My EE did just plug it into a Linux box today when we were pondering this. The error he got was something to the effect of Invalid Class ID 0xfffff (this is off the top of my head). I took that to mean that we were not reporting a PCI specific device class properly which did not sound like the end of the world.

Thanks and all the best,

Tom

It’s been a long time, Tom.

Can you show us exactly what the hardware ID lines in your INF look like? The PnP hwid matching shouldn’t change based on slot.
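To illustrate why the slot should not matter, here is a minimal Python sketch of hardware ID matching. It uses a simplified subset of the PCI identifier forms and a made-up INF fragment (VEN_1234/DEV_5678 and the section names are hypothetical), so treat it as a model of the idea, not of the real PnP ranking. Note that nothing in the identifier strings encodes link width.

```python
import re

def pci_ids(vendor, device, subsys, revision):
    """Build a simplified list of PCI identifier strings, most specific first."""
    base = f"PCI\\VEN_{vendor:04X}&DEV_{device:04X}"
    return [
        f"{base}&SUBSYS_{subsys:08X}&REV_{revision:02X}",
        f"{base}&SUBSYS_{subsys:08X}",
        f"{base}&REV_{revision:02X}",
        base,
    ]

def inf_hardware_ids(inf_text):
    """Collect every PCI\\VEN_... token in INF text, ignoring ';' comments."""
    ids = []
    for line in inf_text.splitlines():
        line = line.split(";", 1)[0]
        ids += re.findall(r"PCI\\VEN_[0-9A-Za-z_&]+", line)
    return [i.upper() for i in ids]

def best_match(device_ids, inf_ids):
    """Return the most specific device ID that the INF also lists, or None."""
    for candidate in device_ids:
        if candidate in inf_ids:
            return candidate
    return None

# Hypothetical INF fragment, for illustration only.
SAMPLE_INF = r"""
[Standard.NTamd64]
%Card.Desc% = Card_Install, PCI\VEN_1234&DEV_5678  ; hypothetical IDs
"""

if __name__ == "__main__":
    device = pci_ids(0x1234, 0x5678, 0x00011234, 0x01)
    print(best_match(device, inf_hardware_ids(SAMPLE_INF)))
```

If the INF lists the wrong vendor or device ID, `best_match` returns `None` no matter which slot the card sits in.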

I don’t know if the Class ID 0xfffff is the reason for the problem you encountered. The company I’m working for also manufactures some PCI cards (Radio clocks, GPS/GNSS receivers, IRIG time code receivers) for which there is no predefined Class ID, so we use Class ID 0x0880 (system peripheral). This has been working for 20 years now, and current PCIe x1 devices also work properly in x4 and other slots.
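For reference, a small Python sketch of how the class code sits in the standard config header (programming interface at offset 0x09, subclass at 0x0A, base class at 0x0B). The header bytes below are built by hand for illustration. The all-ones check is there because an all-ones value is what a config read typically returns when a device does not respond, so a value like the one Tom quotes may point at failed reads and not at a wrong class.

```python
def decode_class(config):
    """Return (base_class, subclass, prog_if) from a PCI config header."""
    return config[0x0B], config[0x0A], config[0x09]

def looks_unresponsive(config):
    """True if the first 16 header bytes are all 0xFF (typical of a failed read)."""
    return all(b == 0xFF for b in config[:0x10])

def make_header(base_class, subclass, prog_if=0x00):
    """Build a 64-byte header with only the class code filled in (illustrative)."""
    header = bytearray(64)
    header[0x09] = prog_if
    header[0x0A] = subclass
    header[0x0B] = base_class
    return bytes(header)

if __name__ == "__main__":
    # 0x0880 split into base class 0x08 and subclass 0x80.
    base, sub, prog = decode_class(make_header(0x08, 0x80))
    print(f"base=0x{base:02X} subclass=0x{sub:02X} prog_if=0x{prog:02X}")
    print(looks_unresponsive(bytes([0xFF] * 64)))
```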

Martin

Are you SURE your driver isn’t loading? As opposed to loading and, say, returning an error at one of its early entry points? Yes, I read what you said about the DbgPrint not being seen… but… Do you have the kernel debugger set up? Do that, if not. Enable an exception breakpoint for when your driver is loaded (see the “sxe ld” command).

If the device is enumerated by PCI, and subsequently appears in Device Manager, I have never seen a situation where a properly matched driver won’t be loaded. Are you sure the VID and DID and (everything else) match? Do you see anything relevant in the Event Log (OK, that’s a stretch, but we can always hope)? If PCI.sys hated your device, in all the cases I’ve ever seen, that device does not appear in Device Manager.

If you do “update driver” in Device Manager for your device, and manually select the device, what happens? What does it say in setupapi.dev.log? On this exact system, if you put the card in a x1 slot does the driver load?
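A rough Python sketch for pulling the relevant parts out of setupapi.dev.log. It assumes that sections start with lines beginning `>>>  [` and that error lines begin with `!!!`; the path and the hardware ID fragment below are placeholders, so adjust both for your system.

```python
import os

def sections(log_text):
    """Split log text into sections, each starting at a '>>>  [' line."""
    current = []
    for line in log_text.splitlines():
        if line.startswith(">>>  [") and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current

def errors_for(log_text, needle):
    """Return '!!!' lines from every section that mentions needle."""
    hits = []
    wanted = needle.lower()
    for sec in sections(log_text):
        if any(wanted in line.lower() for line in sec):
            hits += [line for line in sec if line.startswith("!!!")]
    return hits

if __name__ == "__main__":
    # Assumed default location; the hardware ID fragment is hypothetical.
    path = r"C:\Windows\INF\setupapi.dev.log"
    if os.path.exists(path):
        with open(path, errors="replace") as f:
            for line in errors_for(f.read(), "VEN_1234&DEV_5678"):
                print(line)
```

An empty result only means no error lines were logged for that ID; the full section is still worth reading by hand.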

Peter

Hmm … as @Peter said, put a hard breakpoint right after the opening brace of DriverEntry and see if WinDbg hits it; if it does, then we know the driver image is being loaded from the store. If it gets that far, put hard breakpoints at DeviceCreate and PrepareHardware and again see if you get a hit.

It would also be instructive to look at the setupapi.dev.log file for your driver installation, to see if anything odd is happening.

To get a bit more info …

  • Is this a KMDF (PnP) driver or a WDM (KernelService) one?
  • Are you enabling/using MSI interrupts in the .inf?
  • On the machines that are failing, you can step the slot down to a x1 by using one of these [ https://www.amazon.com/Kingwin-Powered-Adapter-Flexible-Extension/dp/B07QBF2X6C/ref=sr_1_9?dchild=1&keywords=bitcoin+mining+pci+riser&qid=1635261615&sr=8-9 ] … if the card works with this in the x4 slot then it’s another datapoint, to see if the problem is only with x4 slots
  • Double-check the PCI config space as well as the ECAM space (with Linux if needed) and make sure that every field is properly filled out … if you don’t have a valid value, use 0xDEAD or something identifiable
  • Test on earlier OSes, specifically Win10 and Win7 … MS tightens things up every OS revision, and this might be an example of something tolerated in an earlier version that isn’t tolerated now

The fact that you’re getting a VID/PID in all cases means that the OS is able to read the PCI config space and pull that info, but it’s unable to proceed further due to something it’s missing or is not finding acceptable, which in my (admittedly limited) experience comes down to interrupt resources (asking for MSI and it’s not there or done incorrectly) and BAR regions …
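If you can get a raw dump of the config space (on Linux, the `config` file under `/sys/bus/pci/devices/<address>/`, which usually needs root to read past the first 64 bytes), a short Python sketch like the one below can walk the capability list and report whether MSI is advertised and what link width was negotiated. The offsets follow the standard PCI/PCIe layout; the example bytes in the usage section are built by hand, not taken from a real card.

```python
CAP_MSI, CAP_PCIE, CAP_MSIX = 0x05, 0x10, 0x11

def capabilities(config):
    """Yield (cap_id, offset) for each entry in the PCI capability list."""
    status = int.from_bytes(config[0x06:0x08], "little")
    if not status & 0x10:          # Status bit 4: capability list present
        return
    ptr = config[0x34] & 0xFC      # Capabilities pointer, dword aligned
    seen = set()                   # Guard against a looping list
    while ptr >= 0x40 and ptr not in seen and ptr + 1 < len(config):
        seen.add(ptr)
        yield config[ptr], ptr
        ptr = config[ptr + 1] & 0xFC

def link_widths(config):
    """Return (max_width, negotiated_width) from the PCIe capability, or None."""
    for cap_id, off in capabilities(config):
        if cap_id == CAP_PCIE and off + 0x14 <= len(config):
            link_cap = int.from_bytes(config[off + 0x0C:off + 0x10], "little")
            link_sta = int.from_bytes(config[off + 0x12:off + 0x14], "little")
            return (link_cap >> 4) & 0x3F, (link_sta >> 4) & 0x3F
    return None

def example_config():
    """Hand-built 256-byte config space: MSI at 0x50, PCIe (x1) at 0x70."""
    cfg = bytearray(256)
    cfg[0x06] = 0x10                       # capability list present
    cfg[0x34] = 0x50
    cfg[0x50], cfg[0x51] = CAP_MSI, 0x70
    cfg[0x70], cfg[0x71] = CAP_PCIE, 0x00
    cfg[0x7C] = 0x11                       # Link Capabilities: width 1
    cfg[0x82] = 0x11                       # Link Status: width 1
    return bytes(cfg)

if __name__ == "__main__":
    cfg = example_config()
    print([hex(cap_id) for cap_id, _ in capabilities(cfg)])
    print(link_widths(cfg))
```

A x1 card in a x4 slot should still report a negotiated width of 1; anything else in that field would be worth a closer look at the core configuration.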

How much control do you have with the IP? Is it burned onto the card, or can you flash it?

Hi Tim,

Yes, it has been a long time. I hope all is well with you.

Funny you should ask about the INF files. We have 2 INF files for this device, one for Win7 and one for Win10 (https://community.osr.com/discussion/291262/driver-signing-on-windows-7-and-10).

In principle these INFs should be identical but for the CAT= line, but it was reported to me today that by some incredible case of knuckleheadery, they are not. Indeed, the Win10 one is very wrong (incorrect VID/PID, no MSI interrupt stuff). I am having trouble working out how this ever worked in any slot on Win10, but this certainly could be the entire problem.
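A check along these lines would have caught the mismatch. This is only a Python sketch: it assumes the catalog line uses the `CatalogFile` directive (optionally with a platform suffix), strips `;` comments naively, and the INF text in the usage section is made up.

```python
import difflib
import re

CATALOG = re.compile(r"^\s*CatalogFile(\.\w+)?\s*=", re.IGNORECASE)

def normalized(inf_text):
    """Drop comments, blank lines and CatalogFile lines from INF text."""
    out = []
    for line in inf_text.splitlines():
        line = line.split(";", 1)[0].strip()   # naive: ignores ';' inside quotes
        if line and not CATALOG.match(line):
            out.append(line)
    return out

def inf_diff(a_text, b_text):
    """Return the lines that differ between two INFs, ignoring the catalog."""
    delta = difflib.ndiff(normalized(a_text), normalized(b_text))
    return [line for line in delta if line.startswith(("- ", "+ "))]

if __name__ == "__main__":
    # Hypothetical INF contents, for illustration only.
    win7 = "[Version]\nCatalogFile=card_w7.cat\n[Mfg]\n%D%=Install, PCI\\VEN_1234&DEV_5678\n"
    win10 = "[Version]\nCatalogFile=card_w10.cat\n[Mfg]\n%D%=Install, PCI\\VEN_1234&DEV_9999\n"
    for line in inf_diff(win7, win10):
        print(line)
```

An empty list means the two files agree on everything except the catalog reference.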

I was not able to get to the bottom of this today but hopefully will be able to tomorrow and then we can see where we stand.

With any luck the legacy of this thread will be “no magic with PCIe width compatibility, just don’t be a knucklehead”.

Best,

Tom

Hi Craig, Peter, and Martin,

Thanks for the ideas and comments.

The optimist in me says the INF file is the problem.

However, the pragmatist says there is more to the story because the INF file in question should not have worked at all. The VID/PID was wrong and the MSI stuff was missing, so the card should never have worked on any Win10 system - but it most certainly did. So I have a feeling we will be running through your lists before I am out of the woods, but I need to clear up this INF thing before I risk wasting everyone’s time.

All the best,

Tom

Definitely run through the list, but it could still very well be the INF file. You don’t need an INF for the OS to recognize that there’s something on the bus and have it show up in Device Manager, just a PCIe config region for the OS to get a VID/PID from. There’s a (vaguely) related recent thread here [ https://community.osr.com/discussion/293149/inf-file-creates-unknown-device-in-device-manager#latest ], which I commented in about halfway through.

Check the other items, but I’m 95% sure that it’s a crummy INF file. You should be able to use the same INF file for Win7 and Win10, so simply swapping the two and attempting to access the card will give another datapoint on the list …

It’s the INF. I’d even place a bet on it.

Peter

Ding, ding, ding - we have a winner.

So in the end there were two bugs that canceled each other out. First bug was that the W10 INF was completely broken. Second bug was that our installer was copying both the W7 and W10 INFs onto the machines. So Windows 10 was pulling the W7 INF and masking the W10 INF error.

I think we got lucky with the signing for a while, but when MS changed the rules again the W7 cat would not validate any more, and that was when we found the installer bug. I am not certain why the W10 rigs did not fail immediately after we fixed that bug, but I suspect some copies of the old INF/drivers were floating around on the machines in the OEM.inf store or something.

Anyway, fixing the INF makes it all happy again. The PCIe width stuff really does work transparently as advertised. This reinforces the old maxim of “If it’s not on fire, it’s the software”.

Thanks to everyone for all the help. I will try to come up with a less mundane problem in the future.

Best,

Tom