PCI device read crash

Albert Member - All Emails Posts: 450

unsigned short addrPort = // has some value in it
unsigned short dataPort = // has some value
unsigned long pciReg = // has some value in it

KIRQL currentIrql;
KeRaiseIrql(DISPATCH_LEVEL, &currentIrql);
__outdword(addrPort, pciReg);
unsigned long Val = __indword(dataPort);
KeLowerIrql(currentIrql);

// do something with Val;

Is this code pattern correct for accessing a PCI device? We sometimes (rarely) see crashes:

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff80254d0a470, Address of the instruction which caused the bugcheck
Arg3: ffffdb815d8e0920, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

Since it is an access violation, will adding a __try...__except block around the in and out instructions mitigate the blue screen? What is the right way to program this?

Comments

  • Tim_Roberts Member - All Emails Posts: 13,695

    A critical section or a spin lock would be a better choice than arbitrarily raising the IRQL. What you're doing does not prevent code in another CPU from also raising its IRQL and executing the exact same code.

    You should be using READ_PORT_ULONG and WRITE_PORT_ULONG instead of the intrinsics. Do you really have I/O ports, and not memory-mapped registers? That's hard to imagine. I/O port access can be hundreds of times slower than memory-mapped I/O, and it's a limited resource. There shouldn't be any I/O ports in any 21st Century designs.

    However, that won't cause an access violation. EXACTLY what instruction triggers the blue screen? Is it possible you were already at dispatch?

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Albert Member - All Emails Posts: 450

    @Tim_Roberts said:
    A critical section or a spin lock would be a better choice than arbitrarily raising the IRQL. What you're doing does not prevent code in another CPU from also raising its IRQL and executing the exact same code.

    Agree, that stood out to me too (I inherited this code).

    You should be using READ_PORT_ULONG and WRITE_PORT_ULONG instead of the intrinsics. Do you really have I/O ports, and not memory-mapped registers? That's hard to imagine. I/O port access can be hundreds of times slower than memory-mapped I/O, and it's a limited resource. There shouldn't be any I/O ports in any 21st Century designs.

    This tries to read PCI config space; I don't think that is exposed through MMIO registers.

    However, that won't cause an access violation. EXACTLY what instruction triggers the blue screen? Is it possible you were already at dispatch?

    8: kd> u fffff80254d0a470
    Drv!AcquireSpiBarPhysicalAddress+0xd0:
    fffff80254d0a470 0fb7d3          movzx   edx,bx
    fffff80254d0a473 ed              in      eax,dx
    fffff80254d0a474 440f22c1        mov     cr8,rcx
    fffff80254d0a478 418906          mov     dword ptr [r14],eax
    fffff80254d0a47b 85c0            test    eax,eax
    fffff80254d0a47d 740b            je
    fffff80254d0a47f 48c70604000000  mov     qword ptr [rsi],4
    fffff80254d0a486 33c0            xor     eax,eax

    8: kd> r
    Last set context:
    rax=000000008000fd10 rbx=0000000000000cfc rcx=0000000000000000
    rdx=0000000000000cf8 rsi=ffff800cc7183c28 rdi=000000008000fd10
    rip=fffff80254d0a470 rsp=ffffa4803a577620 rbp=0000000000000002
     r8=ffff800ca46c6500  r9=0000000000000004 r10=fffff8020cc03d30
    r11=0000000000000000 r12=ffff800cc7183c28 r13=ffff800cc7183bf0
    r14=ffff800ca46c6500 r15=0000000000000004
    iopl=0         nv up ei pl nz na po nc
    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b  efl=00040206
    Drv!AcquireSpiBarPhysicalAddress+0xd0:
    fffff80254d0a470 0fb7d3          movzx   edx,bx
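
For readers following the dump: rdx holds 0xcf8 and rbx holds 0xcfc, the conventional address and data ports for PCI configuration access, and rax holds 0x8000fd10. Assuming rax is the value being written to the address port, and assuming the standard PCI Configuration Mechanism #1 bit layout, it can be decoded with a small user-mode helper. The type and function names below are hypothetical and are not taken from the driver in question; this is only a sketch for interpreting the register value, not a way to touch hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Configuration Mechanism #1 address (value written to port 0xCF8). */
typedef struct {
    unsigned enable;    /* bit 31: enable configuration cycle */
    unsigned bus;       /* bits 23..16 */
    unsigned device;    /* bits 15..11 */
    unsigned function;  /* bits 10..8 */
    unsigned reg;       /* bits 7..2: dword-aligned byte offset into config space */
} PciCfgAddr;

/* Split a 32-bit address-port value into its fields. */
PciCfgAddr pci_cfg_decode(uint32_t value)
{
    PciCfgAddr a;
    a.enable   = (value >> 31) & 0x1u;
    a.bus      = (value >> 16) & 0xFFu;
    a.device   = (value >> 11) & 0x1Fu;
    a.function = (value >> 8)  & 0x7u;
    a.reg      = value & 0xFCu;
    return a;
}

/* Build an address-port value with the enable bit set. */
uint32_t pci_cfg_encode(unsigned bus, unsigned device, unsigned function, unsigned reg)
{
    assert(bus < 256 && device < 32 && function < 8);
    assert(reg < 256 && (reg & 0x3u) == 0);   /* offset must be dword aligned */
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)device << 11) |
           ((uint32_t)function << 8) | (uint32_t)reg;
}
```

Under those assumptions, 0x8000fd10 decodes to bus 0, device 31, function 5, offset 0x10, which is the first BAR slot in a type 0 configuration header and is consistent with the function name AcquireSpiBarPhysicalAddress.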

  • Tim_Roberts Member - All Emails Posts: 13,695

    I wondered if this was trying for config space. This code cannot be made safe. The configuration space does not belong to you, it belongs to the PCI bus driver. There is nothing you can do to synchronize with that driver. Code like this is inherently dangerous. If you are trying to read your OWN config space, you can do that through the PNP request IRP_MN_READ_CONFIG, or by fetching the BUS_INTERFACE_STANDARD interface, although all of the BARs are passed to you at IRP_MN_START_DEVICE time. What is this code trying to do?

    Regardless of that, however, the crash confuses me. The GP fault clearly occurred at the "out" instruction, which is the one immediately before the code you showed. "out" never causes a GP fault at ring 0.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Albert Member - All Emails Posts: 450

    @Tim_Roberts said:
    I wondered if this was trying for config space. This code cannot be made safe. The configuration space does not belong to you, it belongs to the PCI bus driver. There is nothing you can do to synchronize with that driver. Code like this is inherently dangerous. If you are trying to read your OWN config space, you can do that through the PNP request IRP_MN_READ_CONFIG, or by fetching the BUS_INTERFACE_STANDARD interface, although all of the BARs are passed to you at IRP_MN_START_DEVICE time. What is this code trying to do?

    It tries to read the SPI ROM. The HAL APIs seem to mask it off, so this driver tries to go around the HAL, enumerate the nodes, and figure it out by itself.

    Regardless of that, however, the crash confuses me. The GP fault clearly occurred at the "out" instruction, which is the one immediately before the code you showed. "out" never causes a GP fault at ring 0.

    If I am reading it right, it can:
    https://www.felixcloutier.com/x86/out
    https://www.felixcloutier.com/x86/in

    The question is: is this type of exception catchable by SEH in the kernel?

  • Tim_Roberts Member - All Emails Posts: 13,695

    If I am reading it right, it can:

    Then you aren't reading it right. It will GP fault if the current privilege level is greater than IOPL. CPL is the low order 2 bits of CS, which as expected are 0 in your dump, and you can also see that IOPL is 0. It can't fault in kernel mode.

    Are you running in a VM? Maybe your hypervisor is enforcing restricted access to the CF8/CFC mechanism.

    When you say "SPI ROM", are you talking about accessing what PCI calls the "option ROM"? The one that's pointed to in PCI Configuration Space and used to be located in low memory? If so, you're screwed. If secure boot is enabled in your BIOS, and many systems do so, then option ROMs are not allowed. The system will suppress them, because it's a security flaw. You can't access it. It's a dead feature.

    Tim Roberts
    Providenza & Boekelheide, Inc.
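
Tim's CPL/IOPL check can be reproduced from the values in the register dump (cs=0010, efl=00040206). The sketch below is plain user-mode C with hypothetical function names. It assumes the usual x86 layout: CPL is the low two bits of CS and IOPL is bits 13..12 of EFLAGS. It models only the privilege comparison; when CPL is greater than IOPL the processor additionally consults the I/O permission bitmap, which is not modeled here.

```c
#include <stdint.h>

/* Current privilege level: low two bits of the CS selector. */
unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* I/O privilege level: bits 13..12 of EFLAGS. */
unsigned iopl_from_eflags(uint32_t eflags)
{
    return (eflags >> 12) & 0x3u;
}

/* Returns 1 when IN/OUT is subject to a privilege check (CPL > IOPL),
 * 0 when the instruction is allowed on privilege grounds alone. */
int io_privilege_check_applies(uint16_t cs, uint32_t eflags)
{
    return cpl_from_cs(cs) > iopl_from_eflags(eflags);
}
```

With the dump values, both CPL and IOPL come out as 0, so the architectural privilege check does not apply. That is why the fault has to come from something outside the instruction's normal ring 0 behavior.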

  • Albert Member - All Emails Posts: 450

    @Tim_Roberts said:

    If I am reading it right, it can:

    Then you aren't reading it right. It will GP fault if the current privilege level is greater than IOPL. CPL is the low order 2 bits of CS, which as expected are 0 in your dump, and you can also see that IOPL is 0. It can't fault in kernel mode.

    Yes, after speaking with you, I realized that.

    5: kd> .formats cs
    Evaluate expression:
    Hex: 00000000`00000010
    Decimal: 16
    Octal: 0000000000000000000020
    Binary: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010000
    Chars: ........
    Time: Wed Dec 31 16:00:16 1969
    Float: low 2.24208e-044 high 0
    Double: 7.90505e-323

    Are you running in a VM? Maybe your hypervisor is enforcing restricted access to the CF8/CFC mechanism.

    No, these are on various hardware from vendors like Lenovo and HP. I think we are getting somewhere, though: searching online, I found that both vendors have BIOS protection technology in some of their high-end devices. Perhaps it is enabled; I need to ask the customers who hit the exceptions to help us here.

    When you say "SPI ROM", are you talking about accessing what PCI calls the "option ROM"? The one that's pointed to in PCI Configuration Space and used to be located in low memory? If so, you're screwed. If secure boot is enabled in your BIOS, and many systems do so, then option ROMs are not allowed. The system will suppress them, because it's a security flaw. You can't access it. It's a dead feature.

    Yes, we realized early on (as I am told) that the HAL APIs mask much of it out by design. This is an old driver, perhaps it is time to retire it.

  • Mark_Roddy Member - All Emails Posts: 4,375
    via Email
    So any access method that uses a 'write address, read data' sequence has to guarantee that the write-read sequence is atomic. As your driver is not the owner of the SPI, it cannot make that guarantee without doing horrible things. So every now and then your address write is going to interleave with some other SPI address write and bad things will happen.

    Mark Roddy
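
The interleaving Mark describes can be illustrated with a toy model of an index/data register pair. Everything here is hypothetical (the FakeIndexedDevice type, its four registers, the function names) and it touches no real hardware; it only shows why a shared address latch makes the two-step sequence unsafe when a second writer can slip in between the steps.

```c
#include <stdint.h>

/* Toy model: one shared address latch selecting one of four registers. */
typedef struct {
    uint32_t addr_latch;   /* last value written to the "address port" */
    uint32_t regs[4];      /* registers selected by the latch */
} FakeIndexedDevice;

/* Step 1 of the sequence: latch the index of the register to read. */
void fake_write_addr(FakeIndexedDevice *d, uint32_t index)
{
    d->addr_latch = index % 4;
}

/* Step 2 of the sequence: read whatever the latch currently selects. */
uint32_t fake_read_data(const FakeIndexedDevice *d)
{
    return d->regs[d->addr_latch];
}

/* Client A performs both steps with nothing in between. */
uint32_t read_without_interleaving(FakeIndexedDevice *d, uint32_t a_index)
{
    fake_write_addr(d, a_index);
    return fake_read_data(d);
}

/* Client B's address write lands between client A's two steps. */
uint32_t read_with_interleaving(FakeIndexedDevice *d, uint32_t a_index, uint32_t b_index)
{
    fake_write_addr(d, a_index);   /* A: write address */
    fake_write_addr(d, b_index);   /* B: write address, interleaved */
    return fake_read_data(d);      /* A: read data, latch now selects B's register */
}
```

In the interleaved case client A silently receives the contents of the register client B selected. A lock inside one driver only helps if every other writer of the address port takes the same lock, which is the guarantee a driver that does not own the resource cannot make.
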
  • Albert Member - All Emails Posts: 450

    @Mark_Roddy said:
    As your driver is not the owner of the SPI, it cannot make that guarantee without doing horrible things. So every now and then your address write is going to interleave with some other SPI address write and bad things will happen.

    Hi Mark, is there a legitimate way to take ownership of the SPI?

  • Albert Member - All Emails Posts: 450

    This is for the SPI bus, but not the SPI flash ROM on the south bridge. SPB can't access the south bridge AFAIK, unless there is I2C.

  • Tim_Roberts Member - All Emails Posts: 13,695

    Well, let me point out that you have still never said what you are actually trying to do. The word "SPI" is overloaded.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Albert Member - All Emails Posts: 450

    @Tim_Roberts said:
    Well, let me point out that you have still never said what you are actually trying to do. The word "SPI" is overloaded.

    I apologize. I am trying to read the SPI ROM.

    I think I figured out the problem though.

    This is a Windows Secure Core feature. In the latest Windows, if secure Biometrics is enabled and the ACPI SDEV table is present, then all accesses to PCI config space (R/W) through CFC/CF8 will be blocked by the Secure Kernel.

  • Peter_Viscarola_(OSR) Administrator Posts: 8,160

    all accesses to PCI config space (R/W) through CFC/CF8 will be blocked by the Secure Kernel

    I did not know that. Do you have a pointer to docs that say this?

    That’s an excellent feature, if true.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Mark_Roddy Member - All Emails Posts: 4,375
    via Email
    It seems odd to me, as of course PCI config space operations are being performed by device enumeration bus drivers, and for pci* that continues to use CF8. But if the OP is using those sequences directly, that suffers from the same write-read atomic sequence requirement. The driver can use the IRP read/write config space commands to get the data if their driver is on the right device stack.
    Mark Roddy
  • Albert Member - All Emails Posts: 450

    @Mark_Roddy said:
    It seems odd to me as of course pci config space operations are being
    performed by device enumeration bus drivers, and for pci* that continues to
    use cf8. But if the op is using those sequences directly that suffers from
    the same write-read atomic sequence requirement. The driver can use the IRP
    RW config space commands to get the data if their driver is on the right
    device stack.
    Mark Roddy

    I don't think there is any driver for the SPI ROM, is there? It is a hidden device, and not even shown.

  • Albert Member - All Emails Posts: 450

    @Peter_Viscarola_(OSR) said:
    That’s an excellent feature, if true.

    The Intel instruction manual says that out cannot fault in ring 0. This behavior violates that, and people who wrote code before the latest version of Windows have no clue why they are crashing. BIOS manufacturers are also crashing simply by trying to read the PCI BAR register/config space. One can argue that it breaks compatibility.

  • Peter_Viscarola_(OSR) Administrator Posts: 8,160

    One can argue that it breaks compatibility.

    It breaks compat for something you're not supposed to be doing in the first place. "That's what you get for violating the rules" I say.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers
