My $0.02 on PCI(e) devices and surprise removal.
If you surprise remove a PCI(e) device, the device memory regions on the PCI
bus will start causing master aborts when accessed. On a read, you can
detect this by checking the PCI status register or noticing you read back
all 1’s. The PCI spec defines this behavior, although the PCIe 2.0 spec just
seems to imply behavior will be the same as PCI without actually saying what
the behavior will be. The manuals for PCIe to processor bridge chips may
define the behavior of reading and getting a master abort (really a PCI-e
response timeout). Hardware that has all 1’s as valid register values is
broken, as you would always need to read the PCI status register after every
read to determine if there was an error.
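For hardware that does have all 1’s as a legal value, one workaround is to
confirm with a second read of a register that can never legitimately be all
1’s. A minimal sketch in plain C, assuming a hypothetical register layout
(REG_ID and REG_COUNTER are made-up names, not from any real device) and an
ordinary pointer standing in for the mapped BAR:

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_ONES 0xFFFFFFFFu

/* Hypothetical layout: REG_ID is assumed never to read as all 1's on a
 * working device, while REG_COUNTER may legitimately hold all 1's. */
enum { REG_ID = 0, REG_COUNTER = 1 };

/* Returns false if the device appears to be gone, true otherwise.
 * The second read only happens in the rare all-1's case. */
static bool read_counter(const volatile uint32_t *regs, uint32_t *out)
{
    uint32_t value = regs[REG_COUNTER];

    if (value == ALL_ONES && regs[REG_ID] == ALL_ONES) {
        return false; /* both reads aborted: treat the device as removed */
    }
    *out = value;
    return true;
}
```

This costs an extra bus read only when the first read comes back as all 1’s,
but it is still a workaround for a register design that should have avoided
the ambiguity in the first place.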
Also note that if a device has PCI bus error interrupts enabled, these will
not get delivered synchronously with the invalid access, and it will be many
processor cycles later that you get notified of the error in the past. This
is one excellent argument for why mapping device registers into a user mode
app is not such a great idea, as people writing user mode code often will
not understand these subtle interactions with the hardware, and may assume
that a “memory read” has to either succeed or get an exception, which will
not be the case for mapped device memory. Long ago, processor architectures
were a lot less asynchronous, and you could deliver a “bus fault exception”
to a processor to indicate a hardware access had failed, just like the
processor can deliver a memory fault exception when a virtual address is not
valid. Those days are long gone on modern high speed processors, although
perhaps some of the embedded processors still can treat bus faults as
synchronous processor exceptions.
As surprise removal can happen at any moment, you should always test read
values for all 1’s, indicating the device is gone. A test of a constant
against a processor register will not have much performance impact. I’ve
seen hardware where the driver reads back a device register and uses it as
an index into memory (like a ring buffer index). This is especially nasty on
surprise removal: if you don’t validate the value, you will index into an
invalid virtual address, causing a system crash. Writing to an invalid PCI
bus address often causes less damage, and just doesn’t do anything. You
might also get WHEA errors when the PCI master abort happens.
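The ring buffer index case can be sketched like this. It is plain C with an
ordinary pointer standing in for the mapped register, and RING_ENTRIES and
read_ring_index are hypothetical names for illustration, not from any real
driver:

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_DEVICE_GONE 0xFFFFFFFFu /* what a master-aborted read returns */
#define RING_ENTRIES    256u        /* assumed ring size for this sketch */

/* Reads the hardware ring index and validates it before the caller uses it.
 * Returns false (leaving *index_out untouched) if the device looks removed
 * or the value is out of range for the ring. */
static bool read_ring_index(const volatile uint32_t *reg, uint32_t *index_out)
{
    uint32_t value = *reg;

    if (value == REG_DEVICE_GONE) {
        return false; /* all 1's: device is probably gone */
    }
    if (value >= RING_ENTRIES) {
        return false; /* never index memory with an unchecked value */
    }
    *index_out = value;
    return true;
}
```

The range check alone would also reject all 1’s here, but keeping the explicit
test lets the caller tell "device gone" apart from "device misbehaving" if it
wants to.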
One way to simulate surprise removal for testing is to write to the parent
bridge control registers. This can make a device drop off the bus, which is
what happens if you physically remove it.
When you receive the surprise removal PnP IRP, your device is already gone,
so you can’t touch device registers (although you might try a device
reset, such as a config space FLR, to get the device back to a clean state).
The issue is what state the device is in if, say, the connector has
intermittent contact; drivers like to work with devices in a known state. You
should then unmap the virtual addresses which point at invalid PCI bus
addresses. The OS will not unmap those virtual addresses for you.
Correctly handling surprise removal is one of those sticky little edge cases
that can happen and that drivers sometimes don’t handle so well. Some might argue
that if the device vanishes, all bets of stability are off, and if the OS
crashes it’s the hardware’s fault. I’d argue that if a driver can prevent a
system crash caused by malfunctioning hardware, it should do so.
Jan
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-422606-xxxxx@lists.osr.com] On Behalf Of xxxxx@hcl.in
Sent: Tuesday, August 24, 2010 6:28 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] surprise removal of an AHCI controller
Consider I have a PCIe hot-pluggable AHCI controller. I am removing the
controller from PCIe slot during I/O (not a graceful hot-plug as defined
by PCI spec - presuming the hardware supports surprise removal).
I guess, PCI bus layer would invalidate memory mapped addresses upon
device removal.
- If the driver is executing IdeHwStartIo or servicing an
interrupt(IdeHwInterrupt), what will happen during accessing registers
through memory mapped address?
- PCI will notify the ATAport about the device removal. ATAport will call
IdeHwControl with action IdeStop. Is that correct?