surprise removal of an AHCI controller

Consider I have a PCIe hot-pluggable AHCI controller. I am removing the controller from the PCIe slot during I/O (not a graceful hot-plug as defined by the PCI spec, presuming the hardware supports surprise removal).

I guess the PCI bus layer would invalidate the memory-mapped addresses upon device removal.

  1. If the driver is executing IdeHwStartIo or servicing an interrupt (IdeHwInterrupt) at that time, what will happen when it accesses registers through the memory-mapped addresses?

  2. PCI will notify ATAport about the device removal. ATAport will call IdeHwControl with action IdeStop, followed by AtaAdapterControl with action IdeStop. Is that correct?
    a. One of the old MS presentations on ATAport mentions “per channel, only one channel entry point will be running at a time”. If this still holds, IdeHwControl will not be called until HwStartIo or HwInterrupt returns. In that case, if the StartIo and Interrupt routines are accessing hardware registers, how would they exit gracefully?

  3. In IdeHwControl and AtaAdapterControl, what should a miniport driver do? There is no hardware to stop, and all memory allocations/mappings are controlled by ATAport, so it would take care of those.

My $0.02 on PCI(e) devices and surprise removal.

If you surprise remove a PCI(e) device, the device memory regions on the PCI
bus will start causing master aborts when accessed. On a read, you can
detect this by checking the PCI status register or noticing you read back
all 1’s. The PCI spec defines this behavior, although the PCIe 2.0 spec just
seems to imply behavior will be the same as PCI without actually saying what
the behavior will be. The manuals for PCIe-to-processor bridge chips may
define what happens when a read gets a master abort (on PCIe, really a
completion timeout). Hardware for which all 1’s is a valid register value is
broken, as you would then need to read the PCI status register after every
read to determine whether there was an error.
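A minimal sketch of the "read back all 1’s" test, in plain user-mode C so it can be compiled and tried outside a driver. The cheap compare runs on every read; only when it hits does the code pay for a second check. As that second check it uses a common variant (re-reading the Vendor ID from config space, which reads back as 0xFFFF when no function responds) rather than the PCI status register mentioned above. The config-space hook `read_vendor_id_fn` and the `sim_*` helpers are hypothetical stand-ins, not a real Windows or ATAport API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit read returns when the transaction is aborted
   (PCI master abort, or PCIe completion timeout on most platforms). */
#define ALL_ONES_32 0xFFFFFFFFu

/* Vendor ID read back from config space when no function responds. */
#define INVALID_VENDOR_ID 0xFFFFu

/* Hypothetical hook that reads the 16-bit Vendor ID (config offset 0x00).
   A real driver would go through its bus/port driver's config-space access. */
typedef uint16_t (*read_vendor_id_fn)(void *ctx);

/* Returns true if an MMIO read value indicates the device has vanished. */
static bool device_appears_removed(uint32_t mmio_value,
                                   read_vendor_id_fn read_vid, void *ctx)
{
    if (mmio_value != ALL_ONES_32)
        return false;            /* hot path: one compare against a constant */
    if (read_vid == NULL)
        return true;             /* cannot confirm; assume the device is gone */
    /* Rule out a register that legitimately reads as all 1's. */
    return read_vid(ctx) == INVALID_VENDOR_ID;
}

/* Simulation only: a fake config space that holds just the Vendor ID. */
struct sim_config { uint16_t vendor_id; };

static uint16_t sim_read_vendor_id(void *ctx)
{
    return ((const struct sim_config *)ctx)->vendor_id;
}
```

The second read only happens on the rare all-1’s value, so the normal I/O path costs a single compare, in line with the point that testing against a constant is cheap.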

Also note that if a device has PCI bus error interrupts enabled, these will
not be delivered synchronously with the invalid access; you may be notified
of the error many processor cycles after the access that caused it. This
is one excellent argument for why mapping device registers into a user-mode
app is not such a great idea, as people writing user-mode code often will
not understand these subtle interactions with the hardware, and may assume
that a “memory read” has to either succeed or get an exception, which will
not be the case for mapped device memory. Long ago, processor architectures
were a lot less asynchronous, and you could deliver a “bus fault exception”
to a processor to indicate a hardware access had failed, just like the
processor can deliver a memory fault exception when a virtual address is not
valid. Those days are long done on modern high speed processors, although
perhaps some of the embedded processors still can treat bus faults as
synchronous processor exceptions.

As surprise removal can happen at any moment, you should always test read
values for all 1’s, indicating the device is gone. A test of a constant
against a processor register will not have much performance impact. I’ve
seen hardware where you read back a device register and use it as an index
into memory (like a ring buffer index). This is especially nasty on surprise
removal: if you don’t validate the value, you will now index into an
invalid virtual address, causing a system crash. Writing to an invalid PCI
bus address often causes less damage, and just doesn’t do anything. You
might also get WHEA errors when the PCI master abort happens.
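A sketch of validating a device-supplied ring index before it is used to address memory. The ring layout, `RING_ENTRIES`, and `fetch_completion` are hypothetical; the point is only that the value read from the device is checked before it becomes an array index.

```c
#include <stdint.h>

/* Hypothetical completion ring in host memory; the device reports its
   position by writing an index into an MMIO register. */
#define RING_ENTRIES 32u

struct completion_ring {
    uint32_t entries[RING_ENTRIES];
};

enum index_check { INDEX_OK, INDEX_DEVICE_GONE, INDEX_OUT_OF_RANGE };

/* Check an index read back from the device before dereferencing with it.
   After surprise removal the read returns 0xFFFFFFFF, which would index far
   past the end of the ring if used unchecked. */
static enum index_check fetch_completion(const struct completion_ring *ring,
                                         uint32_t hw_index, uint32_t *out_entry)
{
    if (hw_index == 0xFFFFFFFFu)
        return INDEX_DEVICE_GONE;    /* all 1's: device most likely removed */
    if (hw_index >= RING_ENTRIES)
        return INDEX_OUT_OF_RANGE;   /* flaky link or misbehaving hardware */
    *out_entry = ring->entries[hw_index];
    return INDEX_OK;
}
```

The all-1’s case is also caught by the bounds check; it is tested separately here only so the caller can tell "device gone" apart from "bad value" and start removal handling.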

One way to simulate surprise removal for testing is to write to the parent
bridge control registers. This can make a device drop off the bus, which is
what happens if you physically remove it.
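One way to do this on PCIe, sketched under assumptions: set the Link Disable bit (bit 4 of the Link Control register, offset 0x10 within the PCI Express capability) on the parent root port or switch downstream port. The message above does not say which bridge register Jan uses; Secondary Bus Reset in the type 1 Bridge Control register is another option. To keep the sketch runnable, config space is a simulated 256-byte array, and `struct sim_cfg`, `cfg_read16`, and `cfg_write16` are hypothetical; a real test tool would use the platform's config-space access.

```c
#include <stdint.h>

/* Offsets and bits from the PCI / PCI Express specifications. */
#define PCI_STATUS_OFFSET        0x06u
#define PCI_STATUS_CAP_LIST      0x0010u  /* Status bit 4: capability list present */
#define PCI_CAP_PTR_OFFSET       0x34u    /* Capabilities Pointer */
#define PCI_CAP_ID_PCIE          0x10u    /* PCI Express Capability ID */
#define PCIE_LINK_CONTROL        0x10u    /* Link Control, relative to the capability */
#define PCIE_LNKCTL_LINK_DISABLE 0x0010u  /* Link Control bit 4 */

/* Simulated config space of the parent port (hypothetical). */
struct sim_cfg { uint8_t b[256]; };

static uint16_t cfg_read16(const struct sim_cfg *c, unsigned off)
{
    return (uint16_t)(c->b[off] | (c->b[off + 1] << 8));  /* little-endian */
}

static void cfg_write16(struct sim_cfg *c, unsigned off, uint16_t v)
{
    c->b[off] = (uint8_t)(v & 0xFFu);
    c->b[off + 1] = (uint8_t)(v >> 8);
}

/* Walk the capability list; return the PCIe capability offset, or 0. */
static unsigned find_pcie_cap(const struct sim_cfg *c)
{
    if (!(cfg_read16(c, PCI_STATUS_OFFSET) & PCI_STATUS_CAP_LIST))
        return 0;
    unsigned ptr = c->b[PCI_CAP_PTR_OFFSET] & 0xFCu;
    for (int guard = 0; ptr != 0 && guard < 48; ++guard) {  /* bounded walk */
        if (c->b[ptr] == PCI_CAP_ID_PCIE)
            return ptr;
        ptr = c->b[ptr + 1] & 0xFCu;                       /* next pointer */
    }
    return 0;
}

/* Set (disable != 0) or clear Link Disable on the parent port; while set, the
   child device drops off the link much as if it had been pulled. */
static int set_link_disable(struct sim_cfg *parent, int disable)
{
    unsigned cap = find_pcie_cap(parent);
    if (cap == 0 || cap + PCIE_LINK_CONTROL + 1 >= sizeof parent->b)
        return -1;
    uint16_t lnkctl = cfg_read16(parent, cap + PCIE_LINK_CONTROL);
    if (disable)
        lnkctl |= PCIE_LNKCTL_LINK_DISABLE;
    else
        lnkctl &= (uint16_t)~PCIE_LNKCTL_LINK_DISABLE;
    cfg_write16(parent, cap + PCIE_LINK_CONTROL, lnkctl);
    return 0;
}
```

Link Disable is only meaningful on downstream-facing ports, so this has to target the parent port and not the AHCI controller itself.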

When you receive the surprise-removal PnP IRP, your device is already gone,
so you can’t touch device registers (although you might try a device reset,
such as a config-space FLR, to get the device back to a clean state).
The issue is what state the device is in if, say, the connector has
intermittent contact; drivers like to work with devices in a known state. You
should then unmap the virtual addresses that point at invalid PCI bus
addresses. The OS will not unmap those virtual addresses for you.
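A sketch of a "device gone" latch that every register access goes through, so that StartIo and interrupt paths stop touching hardware once the first all-1’s read is seen, and the surprise-removal path can then drop the mapping. `struct hba_state`, `hba_read`, `hba_write`, and `hba_on_surprise_removal` are hypothetical names. `regs` stands in for the address a WDM driver would get from MmMapIoSpace, and the unmap step (MmUnmapIoSpace in such a driver) is only simulated. The sketch assumes no register legitimately reads as all 1’s, and it omits the synchronization a real driver needs to make sure nothing is still inside `hba_read`/`hba_write` when the mapping goes away.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-controller state. */
struct hba_state {
    volatile uint32_t *regs;   /* mapped register block (e.g. the AHCI ABAR) */
    atomic_bool removed;       /* latched once the device is known to be gone */
};

/* All register reads go through here. After removal, no MMIO is attempted and
   all 1's is returned, so callers see the same value they would from dead hardware. */
static uint32_t hba_read(struct hba_state *h, size_t reg_index)
{
    if (atomic_load(&h->removed) || h->regs == NULL)
        return 0xFFFFFFFFu;
    uint32_t v = h->regs[reg_index];
    if (v == 0xFFFFFFFFu)
        atomic_store(&h->removed, true);  /* latch: device most likely gone */
    return v;
}

/* Writes are simply dropped once the device is gone. */
static void hba_write(struct hba_state *h, size_t reg_index, uint32_t v)
{
    if (!atomic_load(&h->removed) && h->regs != NULL)
        h->regs[reg_index] = v;
}

/* Surprise-removal path: stop all register access, then drop the mapping.
   In this simulation, clearing the pointer stands in for the real unmap. */
static void hba_on_surprise_removal(struct hba_state *h)
{
    atomic_store(&h->removed, true);
    h->regs = NULL;
}
```

With every access funneled through one place, a StartIo or interrupt routine that is mid-flight when the device disappears gets all 1’s back, can treat that as "no HBA", and returns without further hardware access.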

Correctly handling surprise removal is one of those sticky little edge cases
that can happen and that drivers sometimes don’t handle so well. Some might argue
that if the device vanishes, all bets of stability are off, and if the OS
crashes it’s the hardware’s fault. I’d argue that if a driver can prevent a
system crash caused by malfunctioning hardware, it should do so.

Jan

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-422606-xxxxx@lists.osr.com] On Behalf Of xxxxx@hcl.in
Sent: Tuesday, August 24, 2010 6:28 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] surprise removal of an AHCI controller

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Mark Roddy

On Tue, Aug 24, 2010 at 9:39 PM, wrote:

> 2. PCI will notify the ATAport about the device removal.

Not on all platforms. In fact, on most platforms the PCI bus driver will be
clueless, and it is up to your function driver to detect the unplug and kick
the bus driver.

Thanks, Jan, for clearly explaining the PCIe perspective of surprise H/W removal.

Mark,

Please clarify which platforms you mean. If a PCIe device is unplugged, won’t the PCI driver know about it through presence detect?

In my case, if the miniport detects the surprise removal (through a register read of all 1’s), how does it notify ATAport? There is an ATAport library routine, AtaPortReportBusChange, but it kicks off an ATA rescan, which is not the scenario here. How do I notify ATAport that there is no controller?

xxxxx@hcl.in wrote:

> Please clarify on platforms. If a PCIe device is plug out, won’t the PCI driver know about it through presence detect?

Only if the root complex reports it properly, and if the PCI bus driver
is looking for it. On Windows XP, for example, the PCI driver does not
expect devices to come and go. XP has no knowledge of PCI Express at all.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

First report the bus change, and then start flunking everything
with SRB_STATUS_NO_HBA. Eventually the port driver will clue up and convince
PCI that the device (HBA) is gone, and PnP will run and all will be good, or,
more likely, everything will be a pile of smoking rubble.
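A simulated sketch of the order suggested above: once the HBA is known to be gone, report the change once, complete everything still in flight with a "no HBA" status, and fail every new request without touching registers. Everything here is hypothetical: the `SIM_SRB_STATUS_*` values are placeholders (a real miniport uses the SRB_STATUS_* constants from the WDK’s srb.h), and `sim_report_bus_change` merely stands in for the port-driver notification; the real AtaPortReportBusChange signature is not reproduced. Note that Maxim’s reply says AtaPortReportBusChange concerns the connected LUNs rather than the controller, so that step may be unnecessary.

```c
#include <stdbool.h>

/* Placeholder status codes for this simulation only. */
enum sim_srb_status { SIM_SRB_STATUS_SUCCESS = 1, SIM_SRB_STATUS_NO_HBA = 2 };

/* Heavily simplified request and channel state (hypothetical). */
struct sim_srb { int status; bool completed; };
struct sim_channel { bool hba_gone; bool bus_change_reported; };

/* Stand-in for the port-driver notification. */
static void sim_report_bus_change(struct sim_channel *ch)
{
    ch->bus_change_reported = true;
}

static void sim_complete(struct sim_srb *srb, int status)
{
    srb->status = status;
    srb->completed = true;
}

/* Called once the miniport concludes the HBA is gone (e.g. an all-1's read). */
static void on_hba_gone(struct sim_channel *ch, struct sim_srb *pending[], unsigned count)
{
    if (!ch->hba_gone) {
        ch->hba_gone = true;
        sim_report_bus_change(ch);          /* step 1: report the change */
    }
    for (unsigned i = 0; i < count; ++i)    /* step 2: flunk what is in flight */
        if (!pending[i]->completed)
            sim_complete(pending[i], SIM_SRB_STATUS_NO_HBA);
}

/* Returns true if the request was handed to the (simulated) hardware. */
static bool sim_start_io(struct sim_channel *ch, struct sim_srb *srb)
{
    if (ch->hba_gone) {
        sim_complete(srb, SIM_SRB_STATUS_NO_HBA);  /* no register access */
        return false;
    }
    /* ... build the command and ring the doorbell here ... */
    return true;
}
```

Completing outstanding requests directly, instead of waiting for them to time out, also covers the case raised later in the thread of commands stuck at the miniport with no device left to interrupt.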

Mark Roddy

On Fri, Aug 27, 2010 at 8:57 PM, wrote:


> In my case, if miniport detects (through register read of all 1s) the surprise removal, how does it notify
> the ATAport? There is a ATAport library routine - AtaPortReportBusChange.

No, this is about the connected LUNs and not the controller itself.

> It kick starts the ATA rescan. This is not the scenario here. How do I notify ATAport that there is no
> controller?

You do not need to.

pci.sys and PnP will do this for you.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Thanks, Tim, Mark and Maxim. I was away for some time, so I couldn’t respond to this. Now I am clear on the role of the miniport driver in surprise removal.

Suppose the ATAport miniport chooses to avoid anything that would slow down the normal I/O path and does its surprise-removal detection only in the error paths. There might then be commands outstanding at the miniport with no device left to respond or interrupt.

These requests should time out at ATAport, and error handling for these commands should kick in. I am not able to find any information on ATAport error handling in the WDK documentation or elsewhere.

Could you tell me about the ATAport error handling mechanism, particularly command timeouts? Or give me some pointers on where to look?

> Could you tell about the ATAport error handling mechanism, particularly command timeout?

When PnP detects the surprise removal, it will fail all queued commands with some error status.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com