PCI device read crash

Albert · August 24, 2020, 9:20pm

unsigned short addrPort;   // has some value in it
unsigned short dataPort;   // has some value
unsigned long pciReg;      // has some value in it

KIRQL currentIrql;
KeRaiseIrql(DISPATCH_LEVEL, &currentIrql);   // affects this CPU only
__outdword(addrPort, pciReg);                // write the register address to the address port
unsigned long Val = __indword(dataPort);     // read the selected register from the data port
KeLowerIrql(currentIrql);

// do something with Val;

Is this code pattern correct for accessing a PCI device? We sometimes (rarely) see crashes:

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff80254d0a470, Address of the instruction which caused the bugcheck
Arg3: ffffdb815d8e0920, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

Since it is an access violation, will adding a __try/__except block around the in and out instructions mitigate the blue screen? What is the right way to program this?

Tim_Roberts · August 24, 2020, 9:55pm

A critical section or a spin lock would be a better choice than arbitrarily raising the IRQL. What you’re doing does not prevent code in another CPU from also raising its IRQL and executing the exact same code.

You should be using READ_PORT_ULONG and WRITE_PORT_ULONG instead of the intrinsics. Do you really have I/O ports, and not memory-mapped registers? That’s hard to imagine. I/O port access can be hundreds of times slower than memory-mapped I/O, and it’s a limited resource. There shouldn’t be any I/O ports in any 21st Century designs.

However, that won’t cause an access violation. EXACTLY what instruction triggers the blue screen? Is it possible you were already at dispatch?

Albert · August 24, 2020, 10:26pm

@Tim_Roberts said:
A critical section or a spin lock would be a better choice than arbitrarily raising the IRQL. What you’re doing does not prevent code in another CPU from also raising its IRQL and executing the exact same code.

Agree, that stood out to me too (I inherited this code).

You should be using READ_PORT_ULONG and WRITE_PORT_ULONG instead of the intrinsics. Do you really have I/O ports, and not memory-mapped registers? That’s hard to imagine. I/O port access can be hundreds of times slower than memory-mapped I/O, and it’s a limited resource. There shouldn’t be any I/O ports in any 21st Century designs.

This tries to read PCI config space; I don’t think that has MMIO registers.

However, that won’t cause an access violation. EXACTLY what instruction triggers the blue screen? Is it possible you were already at dispatch?

8: kd> u fffff80254d0a470
Drv!AcquireSpiBarPhysicalAddress+0xd0:
fffff80254d0a470 0fb7d3          movzx   edx,bx
fffff80254d0a473 ed              in      eax,dx
fffff80254d0a474 440f22c1        mov     cr8,rcx
fffff80254d0a478 418906          mov     dword ptr [r14],eax
fffff80254d0a47b 85c0            test    eax,eax
fffff80254d0a47d 740b            je      fffff80254d0a48a
fffff80254d0a47f 48c70604000000  mov     qword ptr [rsi],4
fffff80254d0a486 33c0            xor     eax,eax

8: kd> r
Last set context:
rax=000000008000fd10 rbx=0000000000000cfc rcx=0000000000000000
rdx=0000000000000cf8 rsi=ffff800cc7183c28 rdi=000000008000fd10
rip=fffff80254d0a470 rsp=ffffa4803a577620 rbp=0000000000000002
 r8=ffff800ca46c6500  r9=0000000000000004 r10=fffff8020cc03d30
r11=0000000000000000 r12=ffff800cc7183c28 r13=ffff800cc7183bf0
r14=ffff800ca46c6500 r15=0000000000000004
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040206
Drv!AcquireSpiBarPhysicalAddress+0xd0:
fffff80254d0a470 0fb7d3          movzx   edx,bx
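For anyone decoding the register dump: under PCI Configuration Mechanism #1, the 32-bit value written to port 0xCF8 packs an enable bit (bit 31), the bus (bits 23:16), device (bits 15:11), function (bits 10:8) and a dword-aligned register offset (bits 7:2). The sketch below is plain, portable C that only does the bit arithmetic and touches no hardware; `Cf8Address`, `cf8_encode` and `cf8_decode` are hypothetical names. Applied to rax=000000008000fd10 it gives bus 0, device 31, function 5, offset 0x10, i.e. the first BAR of that function, which is consistent with the routine name AcquireSpiBarPhysicalAddress (on many recent Intel chipsets, device 31 function 5 is the SPI controller, though that is an assumption here, not something the thread confirms).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: pure bit arithmetic on a Mechanism #1 address, no port I/O. */
typedef struct {
    unsigned bus;      /* bits 23:16 */
    unsigned device;   /* bits 15:11 */
    unsigned function; /* bits 10:8  */
    unsigned offset;   /* bits 7:2, dword-aligned register offset */
} Cf8Address;

/* Build the value that would be written to 0xCF8; bit 31 is the enable bit. */
static uint32_t cf8_encode(Cf8Address a)
{
    assert(a.bus < 256 && a.device < 32 && a.function < 8 && a.offset < 256);
    return 0x80000000u
         | ((uint32_t)a.bus << 16)
         | ((uint32_t)a.device << 11)
         | ((uint32_t)a.function << 8)
         | ((uint32_t)a.offset & 0xFCu);
}

/* Split a 0xCF8 value back into its fields (the enable bit is ignored). */
static Cf8Address cf8_decode(uint32_t value)
{
    Cf8Address a;
    a.bus      = (value >> 16) & 0xFFu;
    a.device   = (value >> 11) & 0x1Fu;
    a.function = (value >> 8) & 0x07u;
    a.offset   = value & 0xFCu;
    return a;
}
```

Round-tripping the decoded fields through cf8_encode gives back 0x8000fd10, so the dump value is a well-formed config address, not garbage.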

Tim_Roberts · August 25, 2020, 12:11am

I wondered if this was trying for config space. This code cannot be made safe. The configuration space does not belong to you, it belongs to the PCI bus driver. There is nothing you can do to synchronize with that driver. Code like this is inherently dangerous. If you are trying to read your OWN config space, you can do that through the PNP request IRP_MN_READ_CONFIG, or by fetching the BUS_INTERFACE_STANDARD interface, although all of the BARs are passed to you at IRP_MN_START_DEVICE time. What is this code trying to do?

Regardless of that, however, the crash confuses me. The GP fault clearly occurred at the “out” instruction, which is the one immediately before the code you showed. “out” never causes a GP fault at ring 0.

Albert · August 25, 2020, 1:30am

@Tim_Roberts said:
I wondered if this was trying for config space. This code cannot be made safe. The configuration space does not belong to you, it belongs to the PCI bus driver. There is nothing you can do to synchronize with that driver. Code like this is inherently dangerous. If you are trying to read your OWN config space, you can do that through the PNP request IRP_MN_READ_CONFIG, or by fetching the BUS_INTERFACE_STANDARD interface, although all of the BARs are passed to you at IRP_MN_START_DEVICE time. What is this code trying to do?

It tries to read the SPI ROM. The HAL APIs seem to mask it off, so this driver apparently goes around the HAL to enumerate the nodes and figure it out by itself.

Regardless of that, however, the crash confuses me. The GP fault clearly occurred at the “out” instruction, which is the one immediately before the code you showed. “out” never causes a GP fault at ring 0.

If I am reading it right, it can:
https://www.felixcloutier.com/x86/out
https://www.felixcloutier.com/x86/in

The question is: is this type of exception catchable by SEH in the kernel?

Tim_Roberts · August 25, 2020, 3:41am

If I am reading it right, it can:

Then you aren’t reading it right. It will GP fault if the current privilege level is greater than IOPL. CPL is the low-order 2 bits of CS, which as expected are 0 in your dump, and you can also see that IOPL is 0. It can’t fault in kernel mode.

Are you running in a VM? Maybe your hypervisor is enforcing restricted access to the CF8/CFC mechanism.

When you say “SPI ROM”, are you talking about accessing what PCI calls the “option ROM”? The one that’s pointed to in PCI Configuration Space and used to be located in low memory? If so, you’re screwed. If secure boot is enabled in your BIOS, and many systems enable it, then option ROMs are not allowed. The system will suppress them, because it’s a security flaw. You can’t access it. It’s a dead feature.
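Tim’s privilege rule can be checked against Albert’s dump with a little bit arithmetic: CPL is the low two bits of the CS selector, and IOPL is bits 12 and 13 of EFLAGS. The sketch below is hypothetical portable C (the helper names are made up, not Windows or Intel APIs). It simplifies one detail: per the Intel SDM, when CPL > IOPL in protected or 64-bit mode the processor consults the I/O permission bitmap in the TSS and raises #GP only if the port’s bit is set, so `io_needs_bitmap_check` reports whether that check is reached at all. It models only the architectural check and says nothing about a hypervisor intercepting the access, which is the VM scenario Tim asks about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CPL is held in the low two bits of the CS selector. */
static unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* Hypothetical helper: IOPL occupies EFLAGS bits 12 and 13. */
static unsigned iopl_from_eflags(uint64_t eflags)
{
    return (unsigned)((eflags >> 12) & 0x3u);
}

/* True when IN/OUT would have to consult the TSS I/O permission bitmap
   (and could then #GP). When CPL <= IOPL the privilege check simply passes. */
static bool io_needs_bitmap_check(uint16_t cs, uint64_t eflags)
{
    return cpl_from_cs(cs) > iopl_from_eflags(eflags);
}
```

With cs=0010 and efl=00040206 from the dump, CPL and IOPL are both 0, so the function returns false; a ring-3 selector such as 0x33 would return true.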

Albert · August 25, 2020, 5:48am

@Tim_Roberts said:

If I am reading it right, it can:

Then you aren’t reading it right. It will GP fault if the current privilege level is greater than IOPL. CPL is the low order 2 bits of CS, which as expected are 0 in your dump, and you can also see that IOPL is 0. It can’t fault in kernel mode.

Yes, after speaking with you, I realized that.

5: kd> .formats cs
Evaluate expression:
Hex: 00000000`00000010
Decimal: 16
Octal: 0000000000000000000020
Binary: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010000
Chars: …
Time: Wed Dec 31 16:00:16 1969
Float: low 2.24208e-044 high 0
Double: 7.90505e-323

Are you running in a VM? Maybe your hypervisor is enforcing restricted access to the CF8/CFC mechanism.

No, these are on various hardware like Lenovo and HP. I think we are getting somewhere here, though: googling, I found that both of these have BIOS protection technology in some of their high-end devices. Perhaps that is enabled; I need to ask the customers who see the exceptions to help us here.

When you say “SPI ROM”, are you talking about accessing what PCI calls the “option ROM”? The one that’s pointed to in PCI Configuration Space and used to be located in low memory? If so, you’re screwed. If secure boot is enabled in your BIOS, and many systems do so, then option ROMs are not allowed. The system will suppress them, because it’s a security flaw. You can’t access it. It’s a dead feature.

Yes, we realized early on (as I am told) that the HAL APIs mask much of it out by design. This is an old driver, perhaps it is time to retire it.

Mark_Roddy · August 25, 2020, 12:10pm

So any access method that uses a ‘write address, read data’ sequence has to guarantee that the write-read sequence is atomic. As your driver is not the owner of the SPI, it cannot make that guarantee without doing horrible things. So every now and then your address write is going to interleave with some other SPI address write, and bad things will happen.

Mark Roddy
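Mark’s point can be made concrete with a deterministic model of an index/data register pair. The sketch below is a hypothetical simulation in portable C with no real port I/O: `FakeConfigPorts`, `write_address` and `read_data` stand in for the OUT to 0xCF8 and the IN from 0xCFC, and the "other agent" stands in for, say, the PCI bus driver on another CPU. Raising IRQL on your own CPU does not stop that second write, and a private lock would not either, because the other agent never takes it.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_REG_COUNT 4u

/* Hypothetical model of a shared index/data register pair (not real hardware). */
typedef struct {
    uint32_t address_latch;          /* last value "written to the address port" */
    uint32_t regs[FAKE_REG_COUNT];   /* what "the data port" returns per address */
} FakeConfigPorts;

static void write_address(FakeConfigPorts *p, uint32_t index)
{
    assert(index < FAKE_REG_COUNT);
    p->address_latch = index;
}

static uint32_t read_data(const FakeConfigPorts *p)
{
    return p->regs[p->address_latch];
}

/* The intended sequence: nothing runs between the two steps. */
static uint32_t read_uninterrupted(FakeConfigPorts *p, uint32_t mine)
{
    write_address(p, mine);
    return read_data(p);
}

/* Same sequence, but another agent rewrites the latch in between. */
static uint32_t read_interleaved(FakeConfigPorts *p, uint32_t mine, uint32_t theirs)
{
    write_address(p, mine);     /* our write to the address port */
    write_address(p, theirs);   /* someone else's write lands here */
    return read_data(p);        /* our read now returns THEIR register */
}
```

The interleaved read silently returns the register the other agent selected, with no fault of any kind, which is why this class of bug tends to show up as rare, hard-to-reproduce bad data.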

Albert · August 25, 2020, 6:54pm

@Mark_Roddy said:
As your driver is not the
owner of the SPI, it cannot make that guarantee without doing horrible
things. So every now and then your address write is going to interleave
with some other SPI address write and bad things will happen.

Hi Mark, is there a legitimate way to take ownership of the SPI?

Mark_Roddy · August 25, 2020, 10:57pm

Take a look at
https://docs.microsoft.com/en-us/samples/microsoft/windows-driver-samples/spbtesttool/

Mark Roddy

Albert · August 25, 2020, 11:43pm

@Mark_Roddy said:
Take a look at
https://docs.microsoft.com/en-us/samples/microsoft/windows-driver-samples/spbtesttool/

Mark Roddy

This is for the SPI bus, but not the SPI flash ROM on the South Bridge. SPB can’t access the South Bridge, AFAIK, unless there is I2C.

Tim_Roberts · August 26, 2020, 4:36am

Well, let me point out that you have still never said what you are actually trying to do. The word “SPI” is overloaded.

Albert · August 26, 2020, 7:06am

@Tim_Roberts said:
Well, let me point out that you have still never said what you are actually trying to do. The word “SPI” is overloaded.

I apologize. I am trying to read the SPI ROM.

I think I figured out the problem though.

This is a Windows Secure Core feature. In the latest Windows, if secure Biometrics is enabled and the ACPI SDEV table is present, then all accesses to PCI config space (R/W) through CFC/CF8 will be blocked by the Secure Kernel.

Peter_Viscarola_OSR · August 26, 2020, 12:05pm

all accesses to PCI config space (R/W) through CFC/CF8 will be blocked by the Secure Kernel

I did not know that. Do you have a pointer to docs that say this?

That’s an excellent feature, if true.

Peter

Mark_Roddy · August 26, 2020, 12:31pm

It seems odd to me, as of course PCI config space operations are being performed by device enumeration bus drivers, and for pci* that continues to use CF8. But if the OP is using those sequences directly, that suffers from the same write-read atomic sequence requirement. The driver can use the IRP read/write config space commands to get the data if their driver is on the right device stack.
Mark Roddy

Albert · August 26, 2020, 6:26pm

@Mark_Roddy said:
It seems odd to me as of course pci config space operations are being
performed by device enumeration bus drivers, and for pci* that continues to
use cf8. But if the op is using those sequences directly that suffers from
the same write-read atomic sequence requirement. The driver can use the IRP
RW config space commands to get the data if their driver is on the right
device stack.
Mark Roddy

I don’t think there is any driver for the SPI ROM, is there? It is a hidden device, and not even shown.

Albert · August 26, 2020, 8:08pm

@“Peter_Viscarola_(OSR)” said:
That’s an excellent feature, if true.

The Intel instruction manual says OUT cannot fault in ring 0. This behavior violates that, and people who wrote code before the latest version of Windows have no clue why they are crashing. BIOS manufacturers’ code is also crashing simply by trying to read the PCI BAR register/config space. One can argue that it breaks compatibility.

Peter_Viscarola_OSR · August 26, 2020, 8:15pm

One can argue that It breaks compatibility.

It breaks compat for something you’re not supposed to be doing in the first place. “That’s what you get for violating the rules” I say.

Peter