Is there an inherent race condition in this event?
My handler notes the event in a flag.
All hw access routines will check the flag on entry.
But then if the event comes after the check, the driver would attempt to access h/w that’s
no longer there?
Yes, this race exists.
d
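A minimal user-mode sketch of why the flag can only be a hint, not a guarantee. All names here (`rm_device_t`, `note_surprise_remove`, `read_reg_checked`) are hypothetical, the register window is simulated with plain memory, and it assumes the register being read can never legitimately hold all ones. The idea is that every access validates what it got back instead of trusting the check made on entry:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; names are illustrative only. */
typedef struct {
    volatile uint32_t *regs;  /* mapped register window (plain memory in this sketch) */
    atomic_bool removed;      /* set by the surprise-remove handler or by a failed read */
} rm_device_t;

/* Called from the surprise-remove handler: record the event in the flag. */
void note_surprise_remove(rm_device_t *dev)
{
    atomic_store(&dev->removed, true);
}

/*
 * Read a register that is assumed never to hold 0xFFFFFFFF when the
 * device is present. Returns false if the device is known, or is
 * discovered by this read, to be gone.
 */
bool read_reg_checked(rm_device_t *dev, unsigned index, uint32_t *out)
{
    if (atomic_load(&dev->removed))
        return false;               /* fast path; the flag is only a hint */

    uint32_t value = dev->regs[index];  /* removal may still race with this read */
    if (value == UINT32_MAX) {
        atomic_store(&dev->removed, true);  /* remember what we just learned */
        return false;
    }

    *out = value;
    return true;
}
```

The flag check narrows the window but cannot close it, so the caller has to handle a `false` return on every access rather than assume the hardware is there.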
From: xxxxx@yahoo.com
Sent: 1/14/2015 4:22 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] PNP Surprise Remove
—
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
How does software surprise remove work?
When I physically remove my adapter, surprise remove handling works reliably.
But when WHCK simulates surprise remove, the system hangs after the unload of my driver.
It seems there are conflicting rules: on one hand I should not touch hardware after the surprise remove
notification; on the other hand, my interrupt is still asserted and hangs the system such that neither
KD can break into the system nor a keyboard-initiated crash dump works. It looks like my surprise-removed adapter can still assert interrupts.
Whether you can touch hardware after a surprise remove has happened depends on your device. But you shouldn’t hang expecting it to respond. For PCI devices the simplest thing is to try a register read of something that shouldn’t give back 0xff[ff[ffff]] and see what you get back.
Surprise remove is, unfortunately, not limited to physical removal of hardware. Errors in PNP state transitions in the drivers above you could cause you to see a surprise remove.
So poke away at your registers, but be careful. That’s generally a good rule - a while (register != 1); loop is always a bad idea in a driver.
-p
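A minimal sketch of the alternative to a `while (register != 1);` loop, written as plain user-mode C so it can be tried outside a driver. The names (`wait_result_t`, `wait_for_ready`), the status register, the ready mask and the poll limit are all hypothetical, and it assumes the status register never legitimately reads all ones:

```c
#include <stdint.h>

typedef enum { WAIT_OK, WAIT_TIMEOUT, WAIT_DEVICE_GONE } wait_result_t;

/*
 * Poll a status register for a ready bit, but give up after max_polls
 * attempts and bail out at once if the read comes back all ones,
 * which is what a read from a removed PCI device typically returns.
 */
wait_result_t wait_for_ready(volatile const uint32_t *status_reg,
                             uint32_t ready_mask,
                             unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t status = *status_reg;

        if (status == UINT32_MAX)
            return WAIT_DEVICE_GONE;  /* nothing is answering on the bus */
        if (status & ready_mask)
            return WAIT_OK;

        /* A real driver would stall briefly between polls here,
         * for example with KeStallExecutionProcessor. */
    }
    return WAIT_TIMEOUT;              /* the loop always terminates */
}
```

Callers then have to handle three outcomes instead of one, which is the point: the hang is replaced by an error path.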
Thanks, Peter.
So by the time the PnP event reaches my driver, the PCI BAR mappings have not been removed, and attempts
at accessing them will not result in access violations.
If I can attempt to clear/mask interrupts, that is good news.
I don’t need to read registers, as long as I can write/clear them.
I don’t believe the windows for your hardware will be reassigned until after your driver has been torn down.
-p
xxxxx@yahoo.com wrote:
> So by the time, PnpEvent comes to my driver, PCI BAR mappings are not removed, and attempts
> at accessing them will not result in Access Violations.

Windows never removes any mappings. That’s only done by your driver,
with a call to MmUnmapIoSpace.

> If I can attempt to clear/mask interrupts, that is good news.
> Don’t need to read registers, as long as I can write/clear them.

Even if the hardware goes away, on most platforms you can still touch
the physical addresses. Just make sure you don’t loop to wait for a bit
to go to 0…
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
You really can’t rely on turning off interrupts in your surprise remove
handling. Instead all of your isr and dpc routines need to be coded so that
they do not have potentially non-terminating loops waiting for hardware
registers to change state. What Peter W said upthread.
Mark Roddy
> You really can’t rely on turning off interrupts in your surprise remove handling.
You have to deassert interrupts, to avoid an interrupt storm. To do that, you have to at least try to turn them off in your device.
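Both points can hold at once: try to mask interrupts, but never depend on that having worked. A rough user-mode sketch, assuming a hypothetical register layout (`IRQ_REG_ENABLE`, `IRQ_REG_STATUS`), a write-1-to-clear status register, and a status register that never legitimately reads all ones. None of these names or offsets come from a real device:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices, for illustration only. */
enum { IRQ_REG_ENABLE = 0, IRQ_REG_STATUS = 1 };

/*
 * Best-effort mask: one write and one read-back, no waiting.
 * Returns true only if the device confirmed the mask.
 */
bool try_mask_interrupts(volatile uint32_t *regs)
{
    regs[IRQ_REG_ENABLE] = 0;
    return regs[IRQ_REG_ENABLE] == 0;  /* all ones or stale bits: not confirmed */
}

/*
 * ISR-style handler with no loops. Returns true only if this device
 * raised the interrupt and it was acknowledged.
 */
bool handle_interrupt(volatile uint32_t *regs)
{
    uint32_t status = regs[IRQ_REG_STATUS];

    if (status == 0 || status == UINT32_MAX)
        return false;                  /* not ours, or the device is gone */

    regs[IRQ_REG_STATUS] = status;     /* acknowledge (write-1-to-clear assumed) */
    return true;
}
```

If `try_mask_interrupts` returns false, the handler still terminates on every invocation, so an unconfirmed mask cannot turn into a spin inside the driver.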
The docs at
https://msdn.microsoft.com/en-us/library/windows/hardware/ff546699(v=vs.85).aspx
spell out what needs to happen on a surprise removal PnP IRP. Among
other things it says:
"In response to IRP_MN_SURPRISE_REMOVAL, a driver must do the following,
in the listed order:
1. Determine if the device has been removed.
   The driver must always attempt to determine if the device is still
   connected. If it is, the driver must attempt to stop the device and
   disable it.
2. Release the device’s hardware resources (interrupts, I/O ports, memory
   registers, and DMA channels).
3. In a parent bus driver, power down the bus slot if the driver is capable
   of doing so. Call PoSetPowerState to notify the power manager. For
   additional information, see Power Management.
…"
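The first two steps of that ordering can be sketched in plain user-mode C. This is a simulation under stated assumptions, not WDM code: `sr_device_t`, `on_surprise_removal` and the register indices are hypothetical, the "is it still there" test assumes an ID register that never reads all ones, and the two assignments at the end merely stand in for calls such as IoDisconnectInterrupt and MmUnmapIoSpace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices, for illustration only. */
enum { SR_REG_ID = 0, SR_REG_CONTROL = 1 };

typedef struct {
    volatile uint32_t *regs;    /* NULL once the mapping has been released */
    bool interrupt_connected;
    bool removed;               /* checked by hardware access routines */
} sr_device_t;

void on_surprise_removal(sr_device_t *dev)
{
    /* Step 1: determine whether the device still responds. */
    bool present = dev->regs != NULL &&
                   dev->regs[SR_REG_ID] != UINT32_MAX;

    dev->removed = true;        /* stop new hardware accesses either way */

    /* Still step 1: if it is connected, attempt to stop and disable it. */
    if (present)
        dev->regs[SR_REG_CONTROL] = 0;

    /* Step 2: release hardware resources. */
    dev->interrupt_connected = false;   /* stands in for IoDisconnectInterrupt */
    dev->regs = NULL;                   /* stands in for MmUnmapIoSpace */
}
```

The handler takes the same path whether or not the device answers; only the single write to the control register is conditional.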
For KMDF drivers, the page at
https://msdn.microsoft.com/en-us/library/windows/hardware/ff540913(v=vs.85).aspx
makes it look like KMDF has a pretty thin pass-through of the surprise
removal IRP, although I have to admit the docs don’t seem super clear on
whether other KMDF event functions will get called on surprise removal, on the
assumption that those other event functions already know how to do parts of the
device shutdown (like the power state handling?).
One could argue that surprise removal code can’t make the same assumptions
about the device state as a normal device shutdown path, so perhaps needs
unique code. On the other hand, a device might disappear at any moment, so
if the normal shutdown code assumes the device is present, there are
vulnerable time windows when the driver might get into trouble if a device
vanishes at the right instant. This leads to the conclusion that normal
shutdown code should cope with the device being gone or in a broken state,
which would suggest there is no need for different surprise removal code.
Since ISRs and DPCs can happen asynchronously from the passive-level
surprise removal notification, all driver code at all times needs to be
prepared for a device vanishing, and the surprise removal IRP is just a
notification that happens when the device wasn’t so busy.
On PCI devices, I once traced the surprise removal IRPs down into the PCI
bus driver, and as I remember it did things like set the PCI command
register to 0, which should inhibit the device from communicating to the
system anymore. I don’t remember if the bus driver tried to do an FLR (PCI
function level reset), although a major purpose of FLR is to allow generic
PCI bus drivers to shut up/disable/reset a device they know nothing about
internally. I’ve unfortunately seen PCI devices that didn’t correctly
respond to FLR requests, even though they claimed to support it.
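For reference, a small sketch of what a PCI command register value of 0 means in terms of the standard low-order bits (bit 0 I/O space, bit 1 memory space, bit 2 bus master). The macro and function names are made up for illustration and are not the WDK definitions, and this does not claim to reproduce what the Windows PCI bus driver actually does:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low-order bits of the PCI command register (config offset 0x04).
 * Macro names are hypothetical, chosen for this sketch. */
#define CMD_IO_SPACE      (1u << 0)  /* device responds to I/O accesses */
#define CMD_MEMORY_SPACE  (1u << 1)  /* device responds to memory (BAR) accesses */
#define CMD_BUS_MASTER    (1u << 2)  /* device may initiate transactions, e.g. DMA */

/* True if the device will still decode accesses to its memory BARs. */
bool can_decode_mmio(uint16_t command)
{
    return (command & CMD_MEMORY_SPACE) != 0;
}

/* True if the device is still allowed to initiate memory transactions. */
bool can_initiate_dma(uint16_t command)
{
    return (command & CMD_BUS_MASTER) != 0;
}
```

With the register written to 0, both helpers return false: the device no longer decodes its BARs and can no longer master the bus.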
Handling a device that vanishes is one of those areas where it would be
real handy to know EXACTLY what the bus driver does, either with explicit
documentation, or with open source code.
I see responses from three highly experienced kernel developers who gave
somewhat different answers, which makes me think the docs on surprise
removal processing need improving. The three responses included:
- So poke away at your registers, but be careful
- You really can’t rely on turning off interrupts in your surprise remove
  handling. Instead all of your isr and dpc routines need to be coded so
  that they do not have potentially non-terminating loops
- You have to deassert interrupts, to avoid an interrupt storm. To do
  that, you have to at least try to turn them off in your device
All of these answers seem like good advice, but yet to me seem like all
are based on the lack of a specific documented contract between the
hardware, OS, and driver developers, and more of an empirically determined
strategy to cope with past/current devices and OS versions. These
strategies may get us to 99.9990% reliability, but if we can get to
99.9991% reliability simply by some conversations between hardware, OS,
and driver developers, and some tech writers packaging up the wisdom
learned, that seems like a good thing.
Jan
Yes my point is that you cannot *rely* on turning off interrupts in
surprise remove to avoid dealing with issues in other code paths that can
have non-terminating loops waiting for hardware state changes that aren’t
going to happen.
Mark Roddy
On Mon, Jan 26, 2015 at 6:40 PM, Jan Bottorff wrote:
> All of these answers seem like good advice, but yet to me seem like all
> are based on the lack of a specific documented contract between the
> hardware
There is a very simple contract regarding a specific hardware device and
its existence in the system - it can cease to exist at any time. Any other
assumption is faulty.
Mark Roddy