Reset PCIe device

OSR_Community_User · May 13, 2011, 6:36am

Hi

We have a PCIe board that uses Freescale’s P2020 cpu. This
includes a PCI endpoint interface which is used for the connection
to the PC. There is no separate PCI chip like PLX or so. We run
programs on the cpu and it can happen that it completely crashes
(as with every cpu). So there are times when a complete reset
of the cpu is necessary. However as the PCIe interface is also
in this chip this is also reset.

For now I have worked around this by reading all the PCI
registers, then issuing the reset and then restoring the registers,
currently done in user mode. This could also be done in the
kernel driver but I think is still a suboptimal solution.

Is there a way to stop the device (so Windows will not access
it anymore), issuing a reset (probably also solved onboard as a
reaction to the stopping) and then restarting the device again? I
have seen postings and other hints on the internet that discourage
this kind of thing but I have no other choice. It seems I can’t
just send an IRP_MN_STOP_DEVICE as this might interfere
with the PnP manager.

So is there an “official” way to solve this in a WDM driver that
should work on Vista and 7, if possible also XP? I already have
a working driver that can handle interrupts but I’m new to
driver development so I’d appreciate a sample that shows how to
do this.

Thanks a lot.

bye Fabi

OSR_Community_User · May 13, 2011, 8:36am

Hi Fabi,

have a look at this:

http://msdn.microsoft.com/en-us/library/ff553315(VS.85).aspx

best

Kerem

OSR_Community_User · May 13, 2011, 9:01am

At 08:36 13.05.2011 -0400, xxxxx@arcor.de wrote:

> Hi Fabi,
>
> have a look at this:
>
> http://msdn.microsoft.com/en-us/library/ff553315(VS.85).aspx

I have already seen this but these functions are used for installers.
Additionally some of the SetupDi* functions need admin rights
to be executed. But my reset should work from a normal application,
possibly with support from the driver.

Do you use these functions yourself, do they work for this case?

Thanks

bye Fabi

OSR_Community_User · May 13, 2011, 9:23am

Hi,

you can use them from your application as well. There is no limitation, except that you must be an admin or a user with admin rights. First you call SetupDiSetClassInstallParams with SP_PROPCHANGE_PARAMS, with StateChange set to DICS_PROPCHANGE, on the selected device, and then SetupDiCallClassInstaller. The selected device will be restarted. Changes made to hardware need admin rights. Have a look at this example from Microsoft, the well-documented devcon:

http://support.microsoft.com/kb/311272/en-us

> Do you use these functions yourself, do they work for this case?

Yes,…

What kind of device is that you want to reset/access?

best

Kerem

OSR_Community_User · May 13, 2011, 9:58am

At 09:22 13.05.2011 -0400, you wrote:

> you can use them from your application as well, there is no limitation, except you must be an admin or user with admin rights: First you do a SetupDiSetClassInstallParams with SP_PROPCHANGE_PARAMS with StateChange set to DICS_PROPCHANGE on the selected device and then a SetupDiCallClassInstaller,…the selected device will be restarted! Change made to hardware needs admin rights.

I was wondering if there’s another way, e.g. with power modes or
just anything that will make Windows “look the other way”
while the device is being reset, but that doesn’t need admin rights.

> Have a look at this example from Microsoft, the well documented and developed devcon:
>
> http://support.microsoft.com/kb/311272/en-us

I have already looked into the source code in the DDK and tried
it out, but I had problems selecting my card. I’ll try some more.

>> Do you use these functions yourself, do they work for this case?
>
> Yes,…
>
> What kind of device is that you want to reset/access?

It’s a general controller that can control a range of peripheral
components like digital/analog signals and motor drives, used
for machine control systems. Something like this with a new cpu:

(only in German)
http://indel.ch/ftp/News/Deutsch/INFO-PCIe.pdf

So the master reset should be possible as part of the normal
software development phase. Editing sources, compiling,
resetting the controller and loading the software should be
possible from the IDE without admin rights.

Thanks

bye Fabi

Doron_Holan · May 13, 2011, 10:21am

I think you are looking at this too hard. If PCI.sys has nothing to do with the reset, there is no reason for a PnP stop to be sent. Windows will not arbitrarily touch your hardware, only your driver does, so everything you need to do is in the driver. Define a reset IOCTL and when it is received, your driver drains all I/O, stops processing new I/O (i.e. gains exclusive access) and resets the device.
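
The sequence described here (reject new I/O, let in-flight I/O finish, then touch the hardware) can be modelled outside the kernel. What follows is only a minimal, single-threaded sketch in portable C. The names `io_gate`, `gate_begin_io`, `gate_end_io`, `gate_begin_reset` and `gate_end_reset` are hypothetical and not part of any Windows API. A real WDM driver would additionally need a spin lock or interlocked operations around the counter and an event for the reset path to wait on; the WDM remove-lock routines (`IoAcquireRemoveLock`, `IoReleaseRemoveLock`) follow a similar counting idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not a Windows structure. */
struct io_gate {
    int  pending;    /* requests currently being processed */
    bool resetting;  /* true while a reset is requested or running */
};

/* Call at the start of every request. Returns false if the request
   must be rejected because a reset is in progress. */
static bool gate_begin_io(struct io_gate *g)
{
    if (g->resetting)
        return false;
    g->pending++;
    return true;
}

/* Call when a request completes. Returns true if this was the last
   in-flight request and a reset is waiting for it. */
static bool gate_end_io(struct io_gate *g)
{
    g->pending--;
    return g->resetting && g->pending == 0;
}

/* Call from the reset IOCTL handler. Closes the gate and returns true
   if the device is already idle; otherwise the caller has to wait
   until gate_end_io() reports the last completion. */
static bool gate_begin_reset(struct io_gate *g)
{
    g->resetting = true;
    return g->pending == 0;
}

/* Call after the hardware is ready again; reopens the gate. */
static void gate_end_reset(struct io_gate *g)
{
    g->resetting = false;
}
```

In this model the reset handler calls `gate_begin_reset`, waits if it returned false, performs the hardware reset, and finally calls `gate_end_reset`. Requests arriving in between fail `gate_begin_io` and would be completed with an error status.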

d

sent from my phone


Maxim_S_Shatskih · May 15, 2011, 4:01pm

> Is there a way to stop the device (so Windows will not access
> it anymore), issuing a reset (probably also solved onboard as a
> reaction to the stopping) and then restarting the device again?

For Ethernet, yes for sure.

For storage miniports, IIRC also yes.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

OSR_Community_User · May 16, 2011, 5:28am

At 14:21 13.05.2011 +0000, you wrote:

> I think you are looking at this too hard. If PCI.sys has nothing to do with the reset, there is no reason for a PnP stop to be sent. Windows will not arbitrarily touch your hardware, only your driver does, so everything you need to do is in the driver. Define a reset IOCTL and when it is received, your driver drains all I/O, stops processing new I/O (i.e. gains exclusive access) and resets the device.

I would like this reset function to return only after the device is ready
again. So this needs a waiting time of let’s say 100 ms (I don’t know
the exact time yet). Can I do this with KeDelayExecutionThread?
Or would that stop the system, so that I need to do it with a kernel
thread and e.g. a timer?

Something like

NTSTATUS ChipReset(…)
{
    // read all pci registers

    // issue reset

    // wait some time
    KeDelayExecutionThread(…)

    // restore all pci registers
}

But I’m not sure about draining the IO. Do I need to flush the DPC queue?
Or is there a command that will wait for IOs to complete? If it is in a
sample you can point me there, I just don’t know what to search for now.

Thanks

bye Fabi

Doron_Holan · May 16, 2011, 3:27pm

In a KMDF driver, you have higher-level building blocks to quiet your driver. In WDM, you have to do all of this yourself with lower-level primitives and state tracking.

As for the wait, you can initialize and set a KTIMER and then wait on it at passive level.

d

sent from my phone


OSR_Community_User · May 17, 2011, 9:47am

At 19:27 16.05.2011 +0000, you wrote:

>>I think you are looking at this too hard. If PCI.sys has nothing to do with the reset, there is no reason for a pnp stop to be sent. Windows will not arbitrarily tough your hw, only your driver does, so everything you need to do is in the driver. Define a reset IOCTL and when it is received, your driver drains all io, stops processing new io (is gains exclusive access) and resets the device.
>
>I would like this reset function to return only after the device is ready
>again. So this needs a waiting time of let’s say 100ms (don’t know
>the exact time yet). Can I do this with KeDelayExecutionThread?
>Or would that stop the system and I need to do it with a kernel
>thread and a e.g. a timer?
>…
>But I’m not sure about draining the IO. Do I need to flush the DPC queue?
>Or is there a command that will wait for IOs to complete? If it is in a
>sample you can point me there, I just don’t know what to search for now.

> In a KMDF driver, you have higher-level building blocks to quiet your driver. In WDM, you have to do all of this yourself with lower-level primitives and state tracking.

As mentioned I’m new to kernel drivers so I can’t make too much out of this.
Do you mean I need to do book-keeping on all IRPs that are pending and
wait until they are all completed? Is there an example in the DDK or on the
net that I can peek at?

> As for the wait, you can initialize and set a KTIMER and then wait on it at passive level.

If I have a normal DeviceIoControl command for a reset then this
already runs at IRQL 0 (PASSIVE_LEVEL). So is there a difference if I call
KeDelayExecutionThread or wait on a timer?

Thanks

bye Fabi

Doron_Holan · May 17, 2011, 10:41am

KeDelayExecutionThread will accomplish the same. Just don’t use KeStallExecutionProcessor.
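
For reference, `KeDelayExecutionThread` takes its interval in 100-nanosecond units, and a negative value means "relative to now". `KeStallExecutionProcessor`, by contrast, busy-waits and is meant only for delays of a few microseconds. The conversion for the 100 ms mentioned above can be sketched in portable C; the helper name `relative_interval_100ns` is hypothetical, and the kernel call appears only in a comment because it cannot run in user mode.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delay in milliseconds to the relative-interval convention
   used by kernel wait routines: 100-nanosecond units, negative for
   "relative to now". */
static int64_t relative_interval_100ns(int64_t milliseconds)
{
    return -(milliseconds * 10000);  /* 1 ms = 10,000 units of 100 ns */
}

/* In the driver, at IRQL <= APC_LEVEL, the value would be used roughly
   like this (kernel-only, not compiled here):

       LARGE_INTEGER interval;
       interval.QuadPart = relative_interval_100ns(100);
       KeDelayExecutionThread(KernelMode, FALSE, &interval);
*/
```

With this convention a 100 ms delay is the value -1000000.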

d

sent from my phone


OSR_Community_User · May 18, 2011, 4:29am

At 14:40 17.05.2011 +0000, you wrote:

> KeDelayExecutionThread will accomplish the same. Just don’t use KeStallExecutionProcessor.

I now execute the device reset from a kernel function. However I still get
a BSOD on some computers, it works on others. Here’s the dump from
the last crash. My driver doesn’t even show up in the call stack. Instead
pci! is there. Is there something my driver should handle that is now
forwarded to the base driver?

Kernel base = 0xfffff80002e65000 PsLoadedModuleList = 0xfffff800030a2e50

BugCheck 124, {4, fffffa80062b2038, 0, 0}

Probably caused by : hardware

2: kd> !analyze -v

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: fffffa80062b2038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:

BUGCHECK_STR: 0x124_4
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: a

STACK_TEXT:
fffff88002f8ca78 fffff80002e2e903 : 0000000000000124 0000000000000004 fffffa80062b2038 0000000000000000 : nt!KeBugCheckEx
fffff88002f8ca80 fffff80002feb593 : 0000000000000001 fffffa8006289b10 0000000000000000 fffffa80062891b0 : hal!HalBugCheckSystem+0x1e3
fffff88002f8cac0 fffff88000f7eaff : fffffa8000000750 fffffa8006289b10 0000000000000000 fffffa80062b1010 : nt!WheaReportHwError+0x263
fffff88002f8cb20 fffff88000f7e526 : 0000000000000000 fffff88002f8cc70 fffffa80054b4c00 fffff88002f8cbf0 : pci!ExpressRootPortAerInterruptRoutine+0x27f
fffff88002f8cb80 fffff80002ed153c : fffff88002f64180 fffff88002f8cc01 fffffa80054b4c00 0000000000000001 : pci!ExpressRootPortInterruptRoutine+0x36
fffff88002f8cbf0 fffff80002eddec2 : fffff88002f64180 fffff88000000002 0000000000000002 fffff80000000000 : nt!KiInterruptDispatch+0x16c
fffff88002f8cd80 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiIdleLoop+0x32

STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: hardware
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: X64_0x124_4_PCIEXPRESS
BUCKET_ID: X64_0x124_4_PCIEXPRESS

2: kd> !errrec 0xfffffa80062b2038

Common Platform Error Record @ fffffa80062b2038

Record Id : 01cc15230444d405
Severity : Fatal (1)
Length : 672
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 5/18/2011 7:46:18
Flags : 0x00000000

===============================================================================
Section 0 : PCI Express

Descriptor @ fffffa80062b20b8
Section @ fffffa80062b2148
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : Recoverable

Port Type : Root Port
Version : 1.1
Command/Status: 0x4010/0x0506
Device Id :
VenId:DevId : 8086:340c
Class code : 030400
Function No : 0x00
Device No : 0x05
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ fffffa80062b217c
Device Caps : 00008021 Role-Based Error Reporting: 1
Device Ctl : 0127 ur FE NF CE
Dev Status : 0003 ur fe NF CE
Root Ctl : 0008 fs nfs cs

AER Information @ fffffa80062b21b8
Uncorrectable Error Status : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 00000000 00000000 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00

===============================================================================
Section 1 : Processor Generic

Descriptor @ fffffa80062b2100
Section @ fffffa80062b2218
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational

Proc. Type : x86/x64
Instr. Set : x64
CPU Version : 0x00000000000106a5
Processor ID : 0x0000000000000002

I already tried to make a SIO debug connection from another computer but
for some reason it never connected. Maybe a problem with the COM port.
Can anybody tell me what the reason for the crash is?

Thanks

bye Fabi

Tim_Roberts · May 18, 2011, 12:55pm

Fabian Cenedese wrote:

> At 14:40 17.05.2011 +0000, you wrote:
>> KeDelayExecutionThread will accomplish the same. Just don’t use KeStallExecutionProcessor.
> I now execute the device reset from a kernel function. However I still get
> a BSOD on some computers, it works on others. Here’s the dump from
> the last crash. My driver doesn’t even show up in the call stack. Instead
> pci! is there. Is there something my driver should handle that is now
> forwarded to the base driver?
> …
> I already tried to make a SIO debug connection from another computer but
> for some reason it never connected. Maybe a problem with the COM port.
> Can anybody tell me what the reason for the crash is?

Yes. Your PCIExpress device has triggered a hardware fault, like a
parity error or a PCIe protocol violation.

How is the reset being handled by the hardware? There are strict rules
for how a device pops off and pops back on to a PCIExpress bus. The
timing and the protocol exchanges are very clearly spelled out. If you
are just killing the power to your device, then you are committing a
PCIExpress protocol violation, and you will get a BSOD.
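
The error record in the dump narrows this down: the AER "Uncorrectable Error Status" value is 00000020, and the only capitalised flag on that line is SD. A minimal sketch of decoding that register in portable C follows. It assumes the bit positions of the AER Uncorrectable Error Status register as given in the PCI Express Base Specification; the table and the function name `aer_decode_uncorrectable` are hypothetical helpers, not part of WinDbg or any Windows API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct aer_bit { unsigned bit; const char *name; };

/* Bit positions assumed from the PCIe AER Uncorrectable Error Status
   register layout. */
static const struct aer_bit aer_uncorrectable_bits[] = {
    { 4, "Data Link Protocol Error"},
    { 5, "Surprise Down Error"},
    {12, "Poisoned TLP"},
    {13, "Flow Control Protocol Error"},
    {14, "Completion Timeout"},
    {15, "Completer Abort"},
    {16, "Unexpected Completion"},
    {17, "Receiver Overflow"},
    {18, "Malformed TLP"},
    {19, "ECRC Error"},
    {20, "Unsupported Request Error"},
};

/* Writes the names of all set bits into out, separated by "; ".
   Returns the number of recognised bits that were set, or -1 if no
   usable output buffer was supplied. */
static int aer_decode_uncorrectable(uint32_t status, char *out, size_t out_size)
{
    size_t n = sizeof aer_uncorrectable_bits / sizeof aer_uncorrectable_bits[0];
    int count = 0;

    if (out == NULL || out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (status & (UINT32_C(1) << aer_uncorrectable_bits[i].bit)) {
            size_t used = strlen(out);
            snprintf(out + used, out_size - used, "%s%s",
                     count ? "; " : "", aer_uncorrectable_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Under these assumptions the status value 0x00000020 from the dump decodes to a single flag, Surprise Down Error, which matches the SD marker in the `!errrec` output: the root port saw the link to the device go down unexpectedly.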

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

OSR_Community_User · May 19, 2011, 3:19am

At 09:55 18.05.2011 -0700, you wrote:

Fabian Cenedese wrote:
> At 14:40 17.05.2011 +0000, you wrote:
>> KeDelayExecutionThread will accomplish the same. Just don’t use KeStallExecutionProcessor.
> I now execute the device reset from a kernel function. However I still get
> a BSOD on some computers, it works on others. Here’s the dump from
> the last crash. My driver doesn’t even show up in the call stack. Instead
> pci! is there. Is there something my driver should handle that is now
> forwarded to the base driver?
> …
> I already tried to make a SIO debug connection from another computer but
> for some reason it never connected. Maybe a problem with the COM port.
> Can anybody tell me what the reason for the crash is?

> Yes. Your PCIExpress device has triggered a hardware fault, like a
> parity error or a PCIe protocol violation.
>
> How is the reset being handled by the hardware? There are strict rules
> for how a device pops off and pops back on to a PCIExpress bus. The
> timing and the protocol exchanges are very clearly spelled out. If you
> are just killing the power to your device, then you are committing a
> PCIExpress protocol violation, and you will get a BSOD.

I know that resetting a device out of the blue is not correct, that’s why
I asked in my first mail how I can disable or shut down a device before
the reset gets issued.

> Is there a way to stop the device (so Windows will not access
> it anymore), issuing a reset (probably also solved onboard as a
> reaction to the stopping) and then restarting the device again?

The only solution so far was using SetupDi* functions like here:

http://www.osronline.com/showThread.CFM?link=128995

However I can’t believe that this is the only way and that there are
no “official” Kernel/Pci/Power/Whatever functions to deregister a
PCIe device and then restart it again. That’s what plug’n’play is
about, no?

I will next have a go at the solution above but I’d appreciate if someone
could mention other things to try out.

Thanks

bye Fabi

OSR_Community_User · May 19, 2011, 10:45am

>> Yes. Your PCIExpress device has triggered a hardware fault, like a
>> parity error or a PCIe protocol violation.
>>
>> How is the reset being handled by the hardware? There are strict rules
>> for how a device pops off and pops back on to a PCIExpress bus. The
>> timing and the protocol exchanges are very clearly spelled out. If you
>> are just killing the power to your device, then you are committing a
>> PCIExpress protocol violation, and you will get a BSOD.
>
> I know that resetting a device out of the blue is not correct, that’s why
> I asked in my first mail how I can disable or shut down a device before
> the reset gets issued.
>
>> Is there a way to stop the device (so Windows will not access
>> it anymore), issuing a reset (probably also solved onboard as a
>> reaction to the stopping) and then restarting the device again?
>
> The only solution so far was using SetupDi* functions like here:
>
> http://www.osronline.com/showThread.CFM?link=128995

I tried this now and encountered some issues:

Admin rights needed, otherwise I get an access denied error. OK,
I think if necessary then we could make it mandatory to run this
application with admin rights though it’s not very nice.

The application has to be 64bit. Of course our app is still 32bit as
we also target 32bit systems. But then I get 0xe0000235 which
is ERROR_IN_WOW64. So I have to create a 64bit tool for that.

Even then it didn’t work fully. Disabling seems to have worked, but
enabling again resulted in

“Windows cannot load the device driver for this hardware because a previous instance of the device driver is still in memory. (Code 38)
You need to restart your computer before the changes you made to this device will take effect.”

Which is kind of what I wanted to prevent by just resetting the device.
So I guess I need to go “for something completely different”…

Thanks

bye Fabi

Tim_Roberts · May 19, 2011, 1:07pm

Fabian Cenedese wrote:

> The only solution so far was using SetupDi* functions like here:
>
> http://www.osronline.com/showThread.CFM?link=128995
>
> However I can’t believe that this is the only way and that there are
> no “official” Kernel/Pci/Power/Whatever functions to deregister a
> PCIe device and then restart it again. That’s what plug’n’play is
> about, no?

No. A device that needs a power reset at full speed is broken. PnP is
about letting your driver know when some outside influence has resulted
in the loss of your device, such as an unplug. Now, I suppose you could
try to argue that you’re just simulating an unplug/replug, but in that
case your driver is going to have to go away and let another driver take
over. It cannot survive that operation.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim_Roberts · May 19, 2011, 1:10pm

Fabian Cenedese wrote:

> I tried this now and encountered some issues:
>
> Admin rights needed, otherwise I get an access denied error. OK,
> I think if necessary then we could make it mandatory to run this
> application with admin rights though it’s not very nice.
>
> The application has to be 64bit. Of course our app is still 32bit as
> we also target 32bit systems. But then I get 0xe0000235 which
> is ERROR_IN_WOW64. So I have to create a 64bit tool for that.

Yes, on 64-bit Windows a few of the more complicated SetupDi operations
only work from a 64-bit application.

> Even then it didn’t work fully. Disabling seems to have worked, but
> enabling again resulted in
>
> “Windows cannot load the device driver for this hardware because a previous instance of the device driver is still in memory. (Code 38)
> You need to restart your computer before the changes you made to this device will take effect.”
>
> Which is kind of what I wanted to prevent by just resetting the device.
> So I guess I need to go “for something completely different”…

Yes, when you reset the device, it drops off the bus. The bus driver
notifies PnP of this, which then needs to tear down your driver stack so
it can rebuild it when the device returns. Your driver instance cannot
survive this operation.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Pavel_A1 · May 19, 2011, 9:43pm

“Fabian Cenedese” wrote in message news:xxxxx@ntdev…

“Tim Roberts” wrote in message news:xxxxx@ntdev…
…
>> “Windows cannot load the device driver for this hardware because a
>> previous instance of the device driver is still in memory. (Code 38)
>> You need to restart your computer before the changes you made to this
>> device will take effect.”
>>
>> Which is kind of what I wanted to prevent by just resetting the device.
>> So I guess I need to go “for something completely different”…
>
> Yes, when you reset the device, it drops off the bus. The bus driver
> notifies PnP of this, which then needs to tear down your driver stack so
> it can rebuild it when the device returns. Your driver instance cannot
> survive this operation.

This disproves the earlier assumption that the PCI bus driver won’t notice
if a device is reset.
Maybe this depends on the chipset. Newer chipsets are well aware of PCIe
hotplug.

This of course complicates the software. But it is still nothing unusual.
Disabling the device via SetupDi or Device Manager is the wrong direction
for handling this;
let PnP do its magic and do not stand in its way.
The user-mode apps must quickly release any handles of the removed instance,
and the driver may (or may not) unload before re-detection of the device.

Regards,
–pa

OSR_Community_User · May 20, 2011, 5:17am

>> However I can’t believe that this is the only way and that there are
>> no “official” Kernel/Pci/Power/Whatever functions to deregister a
>> PCIe device and then restart it again. That’s what plug’n’play is
>> about, no?
>
> No. A device that needs a power reset at full speed is broken.

Seems then like it’s a mistake to create a chip with inbuilt PCI
endpoint if there’s no way to reset it independently.

> PnP is
> about letting your driver know when some outside influence has resulted
> in the loss of your device, such as an unplug. Now, I suppose you could
> try to argue that you’re just simulating an unplug/replug, but in that
> case your driver is going to have to go away and let another driver take
> over. It cannot survive that operation.

I could live with that. But do I need to unload it manually (and if so how)
or will it be unloaded automatically upon the reset of the device?

>> Yes, when you reset the device, it drops off the bus. The bus driver
>> notifies PnP of this, which then needs to tear down your driver stack so
>> it can rebuild it when the device returns. Your driver instance cannot
>> survive this operation.
>
> This disproves the earlier assumption that the PCI bus driver won’t notice if a device is reset.
> Maybe this depends on the chipset. Newer chipsets are well aware of PCIe hotplug.
>
> This of course complicates the software. But it is still nothing unusual.
> Disabling the device via SetupDi or Device Manager is the wrong direction for handling this;
> let PnP do its magic and do not stand in its way.
> The user-mode apps must quickly release any handles of the removed instance,
> and the driver may (or may not) unload before re-detection of the device.

I will try now closing of handles and if this doesn’t work then unloading of
the driver. But first I need to find a way to init a reset if I have no contact
anymore.

Thanks both of you.

bye Fabi

OSR_Community_User · May 23, 2011, 8:50am

>> Yes, when you reset the device, it drops off the bus. The bus driver
>> notifies PnP of this, which then needs to tear down your driver stack so
>> it can rebuild it when the device returns. Your driver instance cannot
>> survive this operation.
>
> This disproves the earlier assumption that the PCI bus driver won’t notice if a device is reset.
> Maybe this depends on the chipset. Newer chipsets are well aware of PCIe hotplug.
>
> This of course complicates the software. But it is still nothing unusual.
> Disabling the device via SetupDi or Device Manager is the wrong direction for handling this;
> let PnP do its magic and do not stand in its way.
> The user-mode apps must quickly release any handles of the removed instance,
> and the driver may (or may not) unload before re-detection of the device.

I now tried something different. We added a reset switch to our device
so we can issue a reset even if there’s no contact to it (Windows side).
I then rebooted freshly, closed all programs and uninstalled the device.
This should unload the driver, right? When I then flipped the switch I got
a “black screen of death”: everything went black and the machine did not
react anymore.

The minidump looks exactly the same as with driver loaded:

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: fffffa80062ca038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:

OVERLAPPED_MODULE: Address regions for ‘nvlddmkm’ and ‘nvlddmkm.sys’ overlap
BUGCHECK_STR: 0x124_4
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: a

STACK_TEXT:
fffff88002f8ca78 fffff80002e2c903 : 0000000000000124 0000000000000004 fffffa80062ca038 0000000000000000 : nt!KeBugCheckEx
fffff88002f8ca80 fffff80002fe9593 : 0000000000000001 fffffa80062b1b10 0000000000000000 fffffa80062b11b0 : hal!HalBugCheckSystem+0x1e3
fffff88002f8cac0 fffff88000d5faff : fffffa8000000750 fffffa80062b1b10 0000000000000000 fffffa80062c9010 : nt!WheaReportHwError+0x263
fffff88002f8cb20 fffff88000d5f526 : 0000000000000000 fffff88002f8cc70 fffffa80054d5c00 fffff88002f8cbf0 : pci!ExpressRootPortAerInterruptRoutine+0x27f
fffff88002f8cb80 fffff80002ecf53c : fffff88002f64180 fffff88002f8cc70 fffffa80054d5c00 0000000000000001 : pci!ExpressRootPortInterruptRoutine+0x36
fffff88002f8cbf0 fffff80002edbec2 : fffff88002f64180 fffff88000000002 0000000000000002 fffff80000000000 : nt!KiInterruptDispatch+0x16c
fffff88002f8cd80 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiIdleLoop+0x32

STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: hardware
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: X64_0x124_4_PCIEXPRESS
BUCKET_ID: X64_0x124_4_PCIEXPRESS

Do I have any chance of doing a hot plug unplug in case I need to issue
a hard reset? Is there a way to assert the PCIe PERST signal?

Thanks

bye Fabi
