PCI config space uninitialized after resuming from S3

Do you have a PCIe trace captured while the system is going in and out of
S3? If yes, post it somewhere so that we can take a look.

Calvin

On Thu, Mar 28, 2013 at 10:59 AM, Jay Talbott wrote:

> The issue is knowing if the spec non-compliance is the root cause of this
> particular issue, or if there’s something else going on that will need to
> be addressed in the next revision of the silicon. They don’t want to respin
> the silicon to fix one thing and find out that this particular problem
> still exists.
>
> > -----Original Message-----
> > From: xxxxx@lists.osr.com [mailto:bounce-529986-
> > xxxxx@lists.osr.com] On Behalf Of Tim Roberts
> > Sent: Thursday, March 28, 2013 10:56 AM
> > To: Windows System Software Devs Interest List
> > Subject: Re: [ntdev] PCI config space uninitialized after resuming from S3
> >
> > Jay Talbott wrote:
> > >
> > > Anybody else have any ideas before I go eat another one of my MSDN
> > > support incidents?
> > >
> >
> > If (A) you know your hardware is not spec compliant, and (B) you have a
> > hacky but acceptable workaround, then why are you spending any more time
> > and money on this? Microsoft is not going to invest very much time
> > chasing down an issue on non-compliant hardware. Put it away and enter
> > a bug report in your internal database so that the next round of the
> > hardware gets fixed.
> >
> > –
> > Tim Roberts, xxxxx@probo.com
> > Providenza & Boekelheide, Inc.
> >
> >
> > —
> > NTDEV is sponsored by OSR
> >
> > OSR is HIRING!! See http://www.osr.com/careers
> >
> > For our schedule of WDF, WDM, debugging and other seminars visit:
> > http://www.osr.com/seminars
> >
> > To unsubscribe, visit the List Server section of OSR Online at
> > http://www.osronline.com/page.cfm?name=ListServer

This device uses an ASIC? Oh, boy… here I had been assuming the device was using some weirdo IP in an FPGA for the PCIe interface.

Peter
OSR

> The code doesn’t want to attempt to restore the state of a device that is no longer present.

Won’t PCI.SYS report the device change and exclude the PDO from QUERY_DEVICE_RELATIONS then?

That’s what I expected. But that’s not what it does. The device remains in
Device Manager. The driver’s EvtDeviceD0Entry gets called. Everything
proceeds normally as it should when the system wakes from sleep… except
that the PCI bus driver doesn’t restore the device’s PCI config data. As far
as everything else in the system is concerned, the device should be ready to
resume operation. But obviously without the PCI config data getting
restored, bad things happen when attempting to resume communication with the
device.

As I previously mentioned, my workaround is to have the driver handle the
PCI config save and restore, which works, but only masks the underlying
issue.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-529997-
xxxxx@lists.osr.com] On Behalf Of xxxxx@broadcom.com
Sent: Thursday, March 28, 2013 3:51 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] PCI config space uninitialized after resuming from S3

> The code doesn’t want to attempt to restore the state of a device that is no longer present.

Won’t PCI.SYS report the device change and exclude the PDO from
QUERY_DEVICE_RELATIONS then?



The first prototypes were based on an FPGA that included a PCIe interface /
DMA engine IP that was provided by a 3rd party. However, the performance of
the FPGA was too slow, so they created a custom ASIC that integrated their
specific hardware technology with the PCIe interface / DMA engine. The
resulting ASIC performs much faster than the FPGA. However, this means that
silicon respins are required to fix any hardware issues. I got involved
later in the game after the ASIC silicon was already designed, so I’m just
living with the way things are and providing whatever solutions I can in the
driver to work around the hardware issues.

That being said, they’d like to understand the root cause of this issue so
that it can be corrected in the next spin of the silicon if it’s a hardware
issue.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-529995-
xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Thursday, March 28, 2013 2:49 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] PCI config space uninitialized after resuming from S3

This device uses an ASIC? Oh, boy… here I had been assuming the device
was using some weirdo IP in an FPGA for the PCIe interface.

Peter
OSR



If there’s a chance that the hardware is broken, all bets are off
(a.k.a. Undefined Behavior).
Basically your question now is, how robustly pci.sys behaves if the
device is “less than 100%” compliant.
We can only guess that pci.sys probably was not tested against all
imaginable sorts of broken hardware so there may be some undriven paths.
So what then?
Has your company ever done formal PCI spec tests on this device?
– pa

On 29-Mar-2013 01:11, Jay Talbott wrote:

That’s what I expected. But that’s not what it does. The device remains in
Device Manager. The driver’s EvtDeviceD0Entry gets called. Everything
proceeds normally as it should when the system wakes from sleep… except
that the PCI bus driver doesn’t restore the device’s PCI config data. As far
as everything else in the system is concerned, the device should be ready to
resume operation. But obviously without the PCI config data getting
restored, bad things happen when attempting to resume communication with the
device.

As I previously mentioned, my workaround is to have the driver handle the
PCI config save and restore, which works, but only masks the underlying
issue.


There’s only one known non-compliance, and we don’t know if that’s
contributing to the problem at hand or not. What we are trying to determine
at this point is what’s the root cause of the problem so that it can get
fixed in a future revision of my client’s hardware. If ultimately it’s a
result of the known non-compliance, then we know that fixing that will also
resolve this problem. But if it’s being caused by something else, we want to
know what it is so that it too can be corrected.

According to my client, the PCIe interface / DMA engine IP that was obtained
from a 3rd party passed the PCI compliance, but they admitted they had not
done their own PCI compliance testing once they integrated that IP into
their custom ASIC. I personally recommended that they take their hardware to
the next PCI compliance workshop coming up in April, but whether they do
that or not is out of my hands.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-530001-
xxxxx@lists.osr.com] On Behalf Of Pavel A.
Sent: Thursday, March 28, 2013 4:28 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] PCI config space uninitialized after resuming from S3

If there’s a chance that the hardware is broken, all bets are off
(a.k.a. Undefined Behavior).
Basically your question now is, how robustly pci.sys behaves if the
device is “less than 100%” compliant.
We can only guess that pci.sys probably was not tested against all
imaginable sorts of broken hardware so there may be some undriven paths.
So what then?
Has your company ever done formal PCI spec tests on this device?
– pa


The failing device isn’t the only one behind a given bridge, is it? Just checking.

I’ve asked this before, but since we’re grasping at straws I’ll ask it again.

What, exactly, do you see on the PCIe bus trace for your device? What accesses are performed on your device around the power state transitions? What do you see at the D0->D3 transition? What do you see on the D3->D0 transition?

Specifically… do you see the state of the device being saved on D0->D3?

Do you see the VID/DID being read on D3->D0, and coming back correctly?

Do you see any additional accesses on D3->D0?

Note that Mr. Guan – who I know to be a very clever engineer with an excellent hardware background – has asked you to post the analyzer log someplace so he (and whoever else may have time) can take a look. You might want to do that (or at least acknowledge his offer).

It *really* seems to me that you could discern a lot from the analyzer log…

Peter
OSR

(This problem has me intrigued… can you tell?)

You know what really bothers me about this problem? The report that a BIOS change made one system work that didn’t work previously.

Assuming this is a correct report, this leads me to believe that … duh… the BIOS must have something to do with the problem. And… hmmm… what the BIOS *does* control is whether the system utilizes PCI Express Native Control. When enabled, this *will* result in PCI.SYS using native power management and checking for hot-plug and stuff.

I don’t know if any of that matters… but I don’t know of anything else that could make a difference.

Do you have the before and after BIOS revisions where updating the BIOS fixed the problem? If so, dump them and see what changed. Look for the _OSC method… what does it return?

If not, can you compare the BIOS on various systems – working and non-working – to see if there’s a difference based on _OSC?
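As a rough aid for that comparison, here is a small Python sketch that decodes the control field returned by _OSC for a PCI host bridge and lists what changed between two BIOS revisions. The bit assignments follow my reading of the PCI Firmware Specification; the two sample values are hypothetical and not taken from any real BIOS, and whether a difference here actually changes what PCI.SYS does on resume is only the hypothesis being tested.

```python
# Control-field bits of _OSC for PCI host bridges (third DWORD of the
# capabilities buffer), as I understand the PCI Firmware Specification.
OSC_CONTROL_BITS = {
    0: "PCI Express Native Hot Plug",
    1: "SHPC Native Hot Plug",
    2: "PCI Express Native Power Management Events",
    3: "PCI Express Advanced Error Reporting",
    4: "PCI Express Capability Structure",
}


def decode_osc_control(control):
    """Return the names of the features the firmware granted to the OS."""
    return [name for bit, name in OSC_CONTROL_BITS.items() if control & (1 << bit)]


def diff_osc_control(before, after):
    """Map each feature whose ownership changed to 'granted' or 'withheld'."""
    changed = before ^ after
    return {
        name: ("granted" if after & (1 << bit) else "withheld")
        for bit, name in OSC_CONTROL_BITS.items()
        if changed & (1 << bit)
    }


# Hypothetical values, for illustration only.
old_bios = 0x00  # firmware keeps control of everything
new_bios = 0x15  # hot plug, PME, and capability structure handed to the OS
print(decode_osc_control(new_bios))
print(diff_osc_control(old_bios, new_bios))
```

If the working and non-working BIOS revisions return the same control value, _OSC is probably not the variable that matters.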

Hey… does this device support XP? Does the device *ever* fail on an XP system?

Peter
OSR

> You know what really bothers me about this problem? The report that a BIOS change made one system work that didn’t work previously.

There is that option in some BIOSes that says “OS is PnP”: if you say yes, it configures only the minimum number of PCI devices needed to boot, and if you say no, it configures the whole PCI tree. Isn’t there then something about Windows leaving the PCI configuration alone for devices that are configured by the BIOS? Perhaps if the device is configured by the BIOS before boot, Windows assumes (incorrectly) that the BIOS will also configure the device during power state transitions. I could imagine that updating the BIOS changes the default for “OS is PnP”, since that option is really a leftover from long ago.

Just an idea that came to me via the PDP (psychic debugging protocol). Looking at actual PCIe traces would certainly be better.

Jan

Alright, I’m finally going to jump in here. I have some experience in
this area.

As has been pointed out, the problem is that the OS is not reading back
the same Vendor ID / Device ID on wake. This can be caused by several
things.

  1. If link training on the PCIe bus to the card is not complete by the
    time the host comes along with the read of the VID/DID, the bridge leading
    to the card may respond with an Unsupported Request (UR). That is going
    to return all F’s on the Config Read, which of course doesn’t match the
    previous one. The config space will not be restored.

  2. If the device is not yet ready to respond to Config accesses at the
    time the Config Read is sent down, then, if it’s there, it should be
    responding with Config Retry. This is most often used when the device has
    firmware that needs to run to initialize some PCI registers before the
    BIOS or OS scans the bus. These Config Reads should be retried, but I
    don’t know for how long. If the card is doing anything else and the Config
    Read times out, it is going to return all F’s on the Config Read, which of
    course doesn’t match the previous VID/DID. Again, the config space will not
    be restored.

  3. The device may be slow responding to Config Reads in certain
    circumstances because it’s busy with something. How long will the host
    wait for the read to complete? Well that depends. I have seen most
    systems wait as much as about 17 milliseconds. However, we have a certain
    Intel platform that waits only 800 us. Now, the card in question where we
    were having a problem took 5 milliseconds to respond to the Config Read of
    the MSI-X Control register after the OS slammed down writes to the 1024
    MSI-X vectors because there was a FIFO on the card that filled up with all
    those memory writes. The Config Read could not be responded to until the
    chip processed all the memory writes in the FIFO. Furthermore we observed
    that if the card was placed in a certain slot in that motherboard, things
    worked (no timeout). We got a BIOS update and sure enough now the card
    worked in all but one slot. In the slot where the card didn’t work the
    system still timed out the Config Read in 800 us and in the other slots it
    did not time out the request. We tried to get Intel to fix the System
    BIOS but they said that they wouldn’t do anything because the motherboard
    was EOL. We only saw this problem on this one system and we have tested
    it on many others.
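The three cases above boil down to what the ID read returns on wake. The following Python sketch is only a model of that decision, not anything taken from pci.sys: it classifies a Vendor ID / Device ID read-back and polls a simulated device a limited number of times. The IDs are hypothetical. The 0x0001 Vendor ID for Config Retry is, to my understanding of the PCI Express Base Specification, what the root complex reports when CRS Software Visibility is enabled; whether and how long the OS retries is exactly the unknown in case 2.

```python
ALL_ONES = 0xFFFF       # what a failed or timed-out config read returns
CRS_VENDOR_ID = 0x0001  # assumed Vendor ID for Config Retry Status (CRS visibility on)


def classify_id_read(saved, current):
    """Compare the (vendor_id, device_id) saved before sleep with the pair read on wake."""
    vid, _did = current
    if vid == ALL_ONES:
        return "no-response"
    if vid == CRS_VENDOR_ID:
        return "retry"
    return "match" if current == saved else "mismatch"


def wait_for_device(read_ids, saved, attempts):
    """Poll read_ids() up to `attempts` times; return (classification, reads used)."""
    result = "no-response"
    for n in range(1, attempts + 1):
        result = classify_id_read(saved, read_ids())
        if result in ("match", "mismatch"):
            return result, n
    return result, attempts


def make_slow_device(ids, not_ready_reads):
    """Simulate a device that answers with all F's for its first few reads."""
    state = {"reads": 0}

    def read_ids():
        state["reads"] += 1
        if state["reads"] <= not_ready_reads:
            return (ALL_ONES, ALL_ONES)
        return ids

    return read_ids


saved = (0x1234, 0x5678)  # hypothetical IDs
print(wait_for_device(make_slow_device(saved, 2), saved, attempts=1))
print(wait_for_device(make_slow_device(saved, 2), saved, attempts=5))
```

A host that reads once and gives up sees "no-response" and, by the reasoning above, would skip the restore; a host that polls a few more times sees a match. A bus trace would show which of these the real system does.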

As others have said, get a PCIe trace of the system coming out of sleep.
Otherwise you’re just wasting bandwidth.

Oh, and I don’t think you have told us what the “known noncompliance” is.

-----Original Message-----
From: Jay Talbott (sent by xxxxx@lists.osr.com)
Sent: 03/28/2013 09:14 PM
Reply-To: Windows System Software Devs Interest List
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] PCI config space uninitialized after resuming from S3

There’s only one known non-compliance, and we don’t know if that’s
contributing to the problem at hand or not. What we are trying to determine
at this point is what’s the root cause of the problem so that it can get
fixed in a future revision of my client’s hardware. If ultimately it’s a
result of the known non-compliance, then we know that fixing that will also
resolve this problem. But if it’s being caused by something else, we want to
know what it is so that it too can be corrected.

According to my client, the PCIe interface / DMA engine IP that was obtained
from a 3rd party passed the PCI compliance, but they admitted they had not
done their own PCI compliance testing once they integrated that IP into
their custom ASIC. I personally recommended that they take their hardware to
the next PCI compliance workshop coming up in April, but whether they do
that or not is out of my hands.


My client has two varieties of boards, one with a single device that
connects directly to the PCIe slot connector, and another with two devices
that uses a PLX PCIe switch between the devices and the PCIe slot connector.
The problem has been observed on both versions of the board. I happen to be
testing on a two chip version of the board.

So, to answer your question, for the single chip version of the board, the
chip itself would be the only device behind the PCIe root port connected to
the slot into which the board is inserted, and for the dual chip version of
the board, each chip is the only device behind each of the downstream ports
of the PLX PCIe switch.

I don’t personally have a PCIe bus analyzer, which is why I haven’t (yet)
posted a log. My client, however, is going to capture one with theirs, and
hopefully I can then post a log. At this point I’m still waiting to get a
log from them. However, what I do have at this point is the log of debug
output from the checked build of pci.sys.

Here is an edited debug log with all the entries relevant to the two devices
in my system:

// This is before sleeping
00000031 0.00099816 PCI: PDO(b=0x3, d=0x0, f=0x0)<-QUERY_POWER

00000033 0.00100412 PCI: PDO(b=0x4, d=0x0, f=0x0)<-QUERY_POWER

00000133 0.00537817 PCI: PDO(b=0x3, d=0x0, f=0x0)<-SET_POWER

00000135 0.00539108 PCI: PDO(b=0x4, d=0x0, f=0x0)<-SET_POWER

// These reads are actually the driver capturing the PCI config space in EvtDeviceD0Exit
00000139 0.00545862 PCI: External Config Read
00000140 0.00546292 PCI: Config Read - (3,0,0) offset 0 length 100 buffer FFFFF88000E81440
00000141 0.00557449 PCI: PDO(b=0x3, d=0x0, f=0x0)<-SET_POWER

00000157 0.00579928 PCI: External Config Read
00000158 0.00580259 PCI: Config Read - (4,0,0) offset 0 length 100 buffer FFFFF88000E81440
00000159 0.00590853 PCI: PDO(b=0x4, d=0x0, f=0x0)<-SET_POWER

// This is after the system wakes
00001666 34.98833466 PCI: PDO(b=0x3, d=0x0, f=0x0)<-SET_POWER

00001668 34.98833847 PCI: PDO(b=0x4, d=0x0, f=0x0)<-SET_POWER

00001707 34.98848343 PCI: PDO(b=0x3, d=0x0, f=0x0)<-SET_POWER

00001709 34.98848724 PCI: Config Read - (3,0,0) offset 0 length 2 buffer FFFFF880009E88A0
00001711 34.98849106 PCI: Config Read - (3,0,0) offset 2 length 2 buffer FFFFF880009E88A2
00001713 34.98850250 PCI: Config Read - (3,0,0) offset 8 length 1 buffer FFFFF880009E88A4
00001715 34.98850632 PCI: Config Read - (3,0,0) offset 9 length 1 buffer FFFFF880009E88A5
00001716 34.98851013 PCI: PDO(b=0x4, d=0x0, f=0x0)<-SET_POWER

00001718 34.98851013 PCI: Config Read - (3,0,0) offset a length 1 buffer FFFFF880009E88A6
00001719 34.98851013 PCI: Config Read - (4,0,0) offset 0 length 2 buffer FFFFF88003CE28A0
00001721 34.98851395 PCI: Config Read - (3,0,0) offset b length 1 buffer FFFFF880009E88A7
00001722 34.98851776 PCI: Config Read - (4,0,0) offset 2 length 2 buffer FFFFF88003CE28A2
00001724 34.98851776 PCI: Config Read - (3,0,0) offset e length 1 buffer FFFFF880009E88AC
00001725 34.98851776 PCI: Config Read - (4,0,0) offset 8 length 1 buffer FFFFF88003CE28A4
00001728 34.98852539 PCI: Config Read - (4,0,0) offset 9 length 1 buffer FFFFF88003CE28A5
00001730 34.98852921 PCI: Config Read - (4,0,0) offset a length 1 buffer FFFFF88003CE28A6
00001732 34.98853683 PCI: Config Read - (4,0,0) offset b length 1 buffer FFFFF88003CE28A7
00001734 34.98854446 PCI: Config Read - (4,0,0) offset e length 1 buffer FFFFF88003CE28AC
00001875 35.13643265 PCI: Config Read - (3,0,0) offset 0 length 2 buffer FFFFF880031CB578
00001877 35.13643646 PCI: Config Read - (3,0,0) offset 0 length 2 buffer FFFFF880031CB588
00001878 35.13644409 PCI: Config Read - (3,0,0) offset 2 length 2 buffer FFFFF880031CB58A
00001879 35.13645172 PCI: Config Read - (3,0,0) offset 8 length 1 buffer FFFFF880031CB58C
00001880 35.13645554 PCI: Config Read - (3,0,0) offset 9 length 1 buffer FFFFF880031CB58D
00001881 35.13645935 PCI: Config Read - (3,0,0) offset a length 1 buffer FFFFF880031CB58E
00001882 35.13646698 PCI: Config Read - (3,0,0) offset b length 1 buffer FFFFF880031CB58F
00001883 35.13647079 PCI: Config Read - (3,0,0) offset e length 1 buffer FFFFF880031CB594
00001884 35.13647842 PCI: Config Read - (3,0,0) offset 2c length 2 buffer FFFFF880031CB590
00001885 35.13648224 PCI: Config Read - (3,0,0) offset 2e length 2 buffer FFFFF880031CB592
00001929 35.13666153 PCI: PDO(b=0x3, d=0x0, f=0x0)<-QUERY_DEVICE_RELATIONS
00001936 35.13668442 PCI: Config Read - (4,0,0) offset 0 length 2 buffer FFFFF880031CB578
00001938 35.13669205 PCI: Config Read - (4,0,0) offset 0 length 2 buffer FFFFF880031CB588
00001939 35.13669586 PCI: Config Read - (4,0,0) offset 2 length 2 buffer FFFFF880031CB58A
00001940 35.13670349 PCI: Config Read - (4,0,0) offset 8 length 1 buffer FFFFF880031CB58C
00001941 35.13670731 PCI: Config Read - (4,0,0) offset 9 length 1 buffer FFFFF880031CB58D
00001942 35.13671112 PCI: Config Read - (4,0,0) offset a length 1 buffer FFFFF880031CB58E
00001943 35.13671494 PCI: Config Read - (4,0,0) offset b length 1 buffer FFFFF880031CB58F
00001944 35.13671875 PCI: Config Read - (4,0,0) offset e length 1 buffer FFFFF880031CB594
00001945 35.13672638 PCI: Config Read - (4,0,0) offset 2c length 2 buffer FFFFF880031CB590
00001946 35.13673019 PCI: Config Read - (4,0,0) offset 2e length 2 buffer FFFFF880031CB592
00001990 35.13692856 PCI: PDO(b=0x4, d=0x0, f=0x0)<-QUERY_DEVICE_RELATIONS

// These writes are actually the driver restoring the PCI config space in EvtDeviceD0Entry (this is my workaround)
00001992 35.99947357 PCI: External Config Write
00001993 35.99948120 PCI: Config Write - (3,0,0) offset 0 length 100 buffer FFFFF88000E81440
00001994 35.99949265 PCI: External Config Write
00001995 35.99958038 PCI: Config Write - (4,0,0) offset 0 length 100 buffer FFFFF88000E81440

Note that other than my driver reading the PCI config space, there are no
other reads of PCI config space by pci.sys prior to the system sleeping.
The same is true if the driver is uninstalled, so it’s not like it’s simply
caching the data that was read by my driver. The PCI config state is thus
not being saved by pci.sys when going from D0->D3.
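For anyone who wants to check that claim against a longer capture, here is a hedged Python sketch that tallies the config accesses in a log of this shape, with each entry on one line. It rests on a few assumptions of mine: that offsets and lengths are printed in hex, that an "External Config" line marks the next access as coming from a driver rather than from pci.sys itself, and that the offsets map to the standard PCI configuration header fields named in the table. The sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Standard PCI configuration header offsets that show up in the log.
HEADER_FIELDS = {
    0x00: "Vendor ID",
    0x02: "Device ID",
    0x08: "Revision ID",
    0x09: "Class Code (programming interface)",
    0x0A: "Class Code (subclass)",
    0x0B: "Class Code (base class)",
    0x0E: "Header Type",
    0x2C: "Subsystem Vendor ID",
    0x2E: "Subsystem ID",
}

ACCESS = re.compile(
    r"PCI: Config (Read|Write) - \((\w+),(\w+),(\w+)\)\s+"
    r"offset\s+(\w+)\s+length\s+(\w+)"
)


def summarize(lines):
    """Return {(bus, dev, func): [(origin, kind, offset, length), ...]}."""
    summary = defaultdict(list)
    external = False
    for line in lines:
        if "PCI: External Config" in line:
            external = True  # assumption: the next access was requested by a driver
            continue
        m = ACCESS.search(line)
        if not m:
            continue
        bdf = tuple(int(x, 16) for x in m.group(2, 3, 4))
        offset, length = int(m.group(5), 16), int(m.group(6), 16)
        origin = "driver" if external else "pci.sys"
        summary[bdf].append((origin, m.group(1), offset, length))
        external = False
    return dict(summary)


sample = [
    "00001709 34.98848724 PCI: Config Read - (3,0,0) offset 0 length 2 buffer FFFFF880009E88A0",
    "00001711 34.98849106 PCI: Config Read - (3,0,0) offset 2 length 2 buffer FFFFF880009E88A2",
    "00001884 35.13647842 PCI: Config Read - (3,0,0) offset 2c length 2 buffer FFFFF880031CB590",
    "00001992 35.99947357 PCI: External Config Write",
    "00001993 35.99948120 PCI: Config Write - (3,0,0) offset 0 length 100 buffer FFFFF88000E81440",
]

for bdf, accesses in summarize(sample).items():
    for origin, kind, offset, length in accesses:
        field = HEADER_FIELDS.get(offset, hex(offset))
        print(bdf, origin, kind, field, length)
```

Under those assumptions, every access that pci.sys itself makes in the log is a read of an identification field, and the only writes are the driver's.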

Upon awakening, pci.sys reads a number of fields from the PCI config space
for each device, but does not perform any writes to restore their PCI config
state. The only writes are those done by my driver in EvtDeviceD0Entry where
the driver actually restores the PCI config state as a workaround to the
fact that it’s not getting restored by pci.sys.
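To make the shape of the workaround concrete, here is a toy Python model of it. This is not driver code: the class and method names are hypothetical stand-ins for the device and for the driver's D0 exit and D0 entry callbacks, the register values are made up, and the model simplifies by resetting the whole 0x100-byte image to zero on power loss, whereas real read-only fields such as the IDs would survive.

```python
CONFIG_SPACE_SIZE = 0x100  # the log shows 0x100-byte accesses at offset 0


class SimulatedDevice:
    """Stand-in for a PCI function whose config space resets when power is removed."""

    def __init__(self, programmed):
        self.config = bytearray(programmed)

    def lose_power(self):
        # Simplification: everything returns to zero.
        self.config = bytearray(CONFIG_SPACE_SIZE)


class DriverModel:
    """Mirrors the workaround: capture on D0 exit, write back on D0 entry."""

    def __init__(self, device):
        self.device = device
        self.saved = None

    def d0_exit(self):
        self.saved = bytes(self.device.config)

    def d0_entry(self):
        if self.saved is not None:
            self.device.config[:] = self.saved


# Hypothetical programmed state: memory space and bus mastering enabled, BAR0 assigned.
programmed = bytearray(CONFIG_SPACE_SIZE)
programmed[0x04] = 0x06
programmed[0x10:0x14] = (0xF0000000).to_bytes(4, "little")

device = SimulatedDevice(programmed)
driver = DriverModel(device)

driver.d0_exit()     # going to sleep: driver captures config space
device.lose_power()  # S3: device forgets its programming
print(device.config[0x04])          # command register is cleared
driver.d0_entry()    # waking: driver writes the saved image back
print(device.config == programmed)  # programming is back
```

The model also shows why this only masks the issue: the device ends up configured either way, so nothing in the driver reveals why the bus driver skipped its own restore.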

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-530010-
xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Thursday, March 28, 2013 8:04 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] PCI config space uninitialized after resuming from S3

The failing device isn’t the only one behind a given bridge, is it? Just
checking.

I’ve asked this before, but since we’re grasping at straws I’ll ask it
again.

What, exactly, do you see on the PCIe bus trace for your device? What
accesses are performed on your device around the power state transitions?
What do you see at the D0->D3 transition? What do you see on the D3->D0
transition?

Specifically… do you see the state of the device being saved on D0->D3?

Do you see the VID/DID being read on D3->D0, and coming back correctly?

Do you see any additional accesses on D3->D0?

Note that Mr. Guan – who I know to be a very clever engineer with an
excellent hardware background – has asked you to post the analyzer log
someplace so he (and whoever else may have time) can take a look. You
might want to do that (or at least acknowledge his offer).

It *really* seems to me that you could discern a lot from the analyzer
log…

Peter
OSR



I knew I could get you intrigued on this one…

I agree that the change in the BIOS added an unexpected twist. And when they
reverted to the prior BIOS rev, the problem returned. What was even
more strange about the results on that particular system was (as reported to
me), they could only reproduce the problem in Vista, and not in Win7 or
Win8, whereas I can repro it in all three versions of Windows on my test
system.

I will see if I can get a copy of the BIOS binaries for analysis of the _OSC
methods.

For the most part, my client has otherwise had trouble reproducing this
problem on any of their other systems (mostly high end workstations), which
is why they probably never caught it in all of their testing. Fortunately,
the test system I am currently using will repro it 100% of the time.

The driver for the device does not support XP, but there’s certainly no
reason why I can’t boot XP on a system that has the card installed if I take
the time to do an install of XP. The behavior is the same even if no driver
is installed, except that of course the driver isn’t there to restore the
PCI config space itself.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-530011-
xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Thursday, March 28, 2013 8:21 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] PCI config space uninitialized after resuming from S3

(This problem has me intrigued… can you tell?)

You know what really bothers me about this problem? The report that a
BIOS change made one system work that didn’t work previously.

Assuming this is a correct report, this leads me to believe that … duh… the
BIOS must have something to do with the problem. And… hmmm… what
the BIOS *does* control is whether the system utilizes PCI Express Native
Control. When enabled, this *will* result in PCI.SYS using native power
management and checking for hot-plug and stuff.

I don’t know if any of that matters… but I don’t know of anything else that
could make a difference.

Do you have the before and after BIOS revisions where updating the BIOS
fixed the problem? If so, dump them and see what changed. Look for the
_OSC method… what does it return?

If not, can you compare the BIOS on various systems – working and non-
working – to see if there’s a difference based on _OSC?

Hey… does this device support XP? Does the device *ever* fail on an XP
system?

Peter
OSR



Actually, I never said it wasn’t reading back the same Vendor ID / Device ID
on wake. What I said was that it wasn’t restoring the PCI config space on
wake.

I fully agree at this point that we need a PCIe analyzer trace to really get
to the bottom of what’s going on here.

And, the known non-compliance is that the device does not support ASPM. We
have no idea if this is contributing to this problem or not.
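For reference, what a device advertises about ASPM can be read from the ASPM Support field, bits 11:10 of the Link Capabilities register in its PCI Express capability structure. The Python sketch below decodes that field using the encodings in recent revisions of the PCI Express Base Specification, as I understand them; earlier revisions treated some of these encodings as reserved. The register values are hypothetical, since the thread does not say what this device reports.

```python
ASPM_SUPPORT = {
    0b00: "No ASPM support",
    0b01: "L0s",
    0b10: "L1",
    0b11: "L0s and L1",
}


def aspm_support(link_capabilities):
    """Decode the ASPM Support field (bits 11:10) of the Link Capabilities register."""
    return ASPM_SUPPORT[(link_capabilities >> 10) & 0b11]


# Hypothetical register values, for illustration only.
for value in (0x0011, 0x0411, 0x0811, 0x0C11):
    print(hex(value), aspm_support(value))
```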

Thanks for your insights. I’ll take that into account once I have an
analyzer log to look at.

From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@attotech.com
Sent: Friday, March 29, 2013 7:35 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] PCI config space uninitialized after resuming from S3

Alright, I’m finally going to jump in here. I have some experience in this
area.

As has been pointed out, the problem is that the OS is not reading back the
same Vendor ID / Device ID on wake. This can be caused by several things.

  1. If link training on the PCIe bus to the card is not complete by the time
    the host comes along with the read of the VID/DID, the bridge leading to the
    card may respond with an Unsupported Request (UR). That is going to return
    all F’s on the Config Read, which of course doesn’t match the previous one.
    The config space will not be restored.

  2. If the device is not yet ready to respond to Config accesses at the time
    the Config Read is sent down, then, if it’s there, it should be responding
    with Config Retry. This is most often used when the device has firmware that
    needs to run to initialize some PCI registers before the BIOS or OS scans
    the bus. These Config Reads should be retried, but I don’t know for how
    long. If the card is doing anything else and the Config Read times out, it
    is going to return all F’s on the Config Read, which of course doesn’t match
    the previous VID/DID. Again, the config space will not be restored.

  3. The device may be slow responding to Config Reads in certain
    circumstances because it’s busy with something. How long will the host wait
    for the read to complete? Well that depends. I have seen most systems wait
    as much as about 17 milliseconds. However, we have a certain Intel platform
    that waits only 800 us. Now, the card in question where we were having a
    problem took 5 milliseconds to respond to the Config Read of the MSI-X
    Control register after the OS slammed down writes to the 1024 MSI-X vectors
    because there was a FIFO on the card that filled up with all those memory
    writes. The Config Read could not be responded to until the chip processed
    all the memory writes in the FIFO. Furthermore we observed that if the card
    was placed in a certain slot in that motherboard, things worked (no
    timeout). We got a BIOS update and sure enough now the card worked in all
    but one slot. In the slot where the card didn’t work the system still timed
    out the Config Read in 800 us and in the other slots it did not time out the
    request. We tried to get Intel to fix the System BIOS but they said that
    they wouldn’t do anything because the motherboard was EOL. We only saw this
    problem on this one system and we have tested it on many others.
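
The three cases above can be sketched as a small classifier for what software sees when it re-reads the ID dword on wake. This is an illustration only, not how pci.sys is implemented. The callback type, the enum, and the scripted fake device are all hypothetical. It assumes a failed read comes back as all F's, and that Config Retry is visible to software as Vendor ID 0x0001, which is only the case when CRS Software Visibility is enabled (otherwise the root complex retries in hardware).

```c
#include <stdint.h>

/* Hypothetical callback: returns the dword at config offset 0
 * (Vendor ID in the low 16 bits, Device ID in the high 16 bits). */
typedef uint32_t (*read_id_dword_fn)(void *ctx);

typedef enum {
    ID_MATCH,        /* same VID/DID as before sleep */
    ID_NO_RESPONSE,  /* all F's: UR, timeout, or link not up yet */
    ID_STILL_RETRY,  /* device kept answering with Config Retry */
    ID_MISMATCH      /* something answered, but with different IDs */
} id_result;

/* Re-read the ID dword up to max_attempts times and classify the outcome.
 * A real implementation would also delay between attempts. */
id_result probe_ids(read_id_dword_fn read_id, void *ctx,
                    uint32_t saved_id, unsigned max_attempts)
{
    id_result last = ID_NO_RESPONSE;
    for (unsigned i = 0; i < max_attempts; i++) {
        uint32_t id = read_id(ctx);
        uint16_t vid = (uint16_t)(id & 0xFFFFu);
        if (vid == 0xFFFFu) { last = ID_NO_RESPONSE; continue; }
        if (vid == 0x0001u) { last = ID_STILL_RETRY; continue; }
        return (id == saved_id) ? ID_MATCH : ID_MISMATCH;
    }
    return last;
}

/* Scripted fake device for illustration: replays a list of replies,
 * then keeps repeating the last one. */
typedef struct {
    const uint32_t *replies;
    unsigned count;
    unsigned next;
} fake_dev;

uint32_t fake_read_id(void *ctx)
{
    fake_dev *d = (fake_dev *)ctx;
    if (d->count == 0) return 0xFFFFFFFFu;
    if (d->next < d->count) return d->replies[d->next++];
    return d->replies[d->count - 1];
}
```

With a script of all F's, then a retry, then the real IDs, the result depends entirely on how many attempts the host allows. That is the same effect as the 800 us versus 17 ms timeout difference described in item 3.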

As others have said, get a PCIe trace of the system coming out of sleep.
Otherwise you’re just wasting bandwidth.

Oh, and I don’t think you have told us what the “known noncompliance” is.

From: “Jay Talbott”
Sent by: xxxxx@lists.osr.com
Sent: 03/28/2013 09:14 PM
Please respond to: “Windows System Software Devs Interest List”
To: “Windows System Software Devs Interest List”
Subject: RE: [ntdev] PCI config space uninitialized after resuming from S3

There’s only one known non-compliance, and we don’t know if that’s
contributing to the problem at hand or not. What we are trying to determine
at this point is what’s the root cause of the problem so that it can get
fixed in a future revision of my client’s hardware. If ultimately it’s a
result of the known non-compliance, then we know that fixing that will also
resolve this problem. But if it’s being caused by something else, we want to
know what it is so that it too can be corrected.

According to my client, the PCIe interface / DMA engine IP that was obtained
from a 3rd party passed PCI compliance testing, but they admitted they had not
done their own PCI compliance testing once they integrated that IP into
their custom ASIC. I personally recommended that they take their hardware to
the next PCI compliance workshop coming up in April, but whether they do
that or not is out of my hands.

> -----Original Message-----
> From: xxxxx@lists.osr.com [mailto:bounce-530001-
> xxxxx@lists.osr.com] On Behalf Of Pavel A.
> Sent: Thursday, March 28, 2013 4:28 PM
> To: Windows System Software Devs Interest List
> Subject: Re:[ntdev] PCI config space uninitialized after resuming from S3
>
> If there’s a chance that the hardware is broken, all bets are off
> (a.k.a. Undefined Behavior).
> Basically your question now is, how robustly pci.sys behaves if the
> device is “less than 100%” compliant.
> We can only guess that pci.sys probably was not tested against all
> imaginable sorts of broken hardware so there may be some undriven paths.
> So what then?
> Has your company ever done formal PCI spec tests on this device?
> – pa
>
> On 29-Mar-2013 01:11, Jay Talbott wrote:
> > That’s what I expected. But that’s not what it does. The device remains in
> > Device Manager. The driver’s EvtDeviceD0Entry gets called. Everything
> > proceeds normally as it should when the system wakes from sleep… except
> > that the PCI bus driver doesn’t restore the device’s PCI config data. As far
> > as everything else in the system is concerned, the device should be ready to
> > resume operation. But obviously without the PCI config data getting
> > restored, bad things happen when attempting to resume communication with the
> > device.
> >
> > As I previously mentioned, my workaround is to have the driver handle the
> > PCI config save and restore, which works, but only masks the underlying
> > issue.
> >
> >> -----Original Message-----
> >> From: xxxxx@lists.osr.com [mailto:bounce-529997-
> >> xxxxx@lists.osr.com] On Behalf Of xxxxx@broadcom.com
> >> Sent: Thursday, March 28, 2013 3:51 PM
> >> To: Windows System Software Devs Interest List
> >> Subject: RE:[ntdev] PCI config space uninitialized after resuming from S3
> >>
> >>> The code doesn’t want to attempt to restore the state of a device that is
> >>> no longer present.
> >>
> >> Won’t PCI.SYS report the device change and exclude the PDO from
> >> QUERY_DEVICE_RELATIONS then?
> >>

Thanks for the idea. However, it appears that pci.sys is restoring the PCI
config space for all devices in the tree except these devices, which really
points to there being something unique about my client’s hardware that is
contributing to the problem. The point is to try to identify and understand
the root source of the problem so they can fix it in a future hardware
respin.

In the meantime, they are planning on using my workaround in the driver as a
fix.
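
For readers wondering what such a workaround looks like in outline, here is a minimal sketch, not the actual driver code. It snapshots the 64-byte config header before sleep and writes it back on wake only if the device still reports the same VID/DID. All names here (`cfg_ops`, `cfg_save`, `cfg_restore`, the fake device) are hypothetical. In a real Windows driver the accessors would typically wrap the bus interface's GetBusData/SetBusData, and capability structures beyond the header (MSI/MSI-X, PCIe) would need the same treatment.

```c
#include <stdint.h>
#include <stdbool.h>

#define CFG_HEADER_DWORDS 16u  /* 64-byte type 0 header */
#define CFG_ID_DWORD       0u  /* Vendor ID / Device ID */
#define CFG_CMD_DWORD      1u  /* Command (low 16) / Status (high 16) */

/* Hypothetical config-space accessors, indexed by dword. */
typedef struct {
    uint32_t (*read32)(void *ctx, unsigned dword_index);
    void (*write32)(void *ctx, unsigned dword_index, uint32_t value);
    void *ctx;
} cfg_ops;

typedef struct {
    uint32_t dw[CFG_HEADER_DWORDS];
    bool valid;
} cfg_snapshot;

/* Call before the device leaves D0. */
void cfg_save(const cfg_ops *ops, cfg_snapshot *snap)
{
    for (unsigned i = 0; i < CFG_HEADER_DWORDS; i++)
        snap->dw[i] = ops->read32(ops->ctx, i);
    snap->valid = true;
}

/* Call on return to D0. Returns false and writes nothing if there is no
 * snapshot or the device does not report the saved VID/DID. */
bool cfg_restore(const cfg_ops *ops, const cfg_snapshot *snap)
{
    if (!snap->valid)
        return false;
    if (ops->read32(ops->ctx, CFG_ID_DWORD) != snap->dw[CFG_ID_DWORD])
        return false;
    /* BARs and the rest of the header first. */
    for (unsigned i = 2; i < CFG_HEADER_DWORDS; i++)
        ops->write32(ops->ctx, i, snap->dw[i]);
    /* Command register last, so decoding is enabled only after the BARs
     * are back. Status is written as zero to avoid clearing its
     * write-1-to-clear bits. */
    ops->write32(ops->ctx, CFG_CMD_DWORD,
                 snap->dw[CFG_CMD_DWORD] & 0x0000FFFFu);
    return true;
}

/* Fake device for illustration: plain register storage. */
typedef struct { uint32_t regs[CFG_HEADER_DWORDS]; } fake_cfg;

uint32_t fake_read32(void *ctx, unsigned i)
{
    return ((fake_cfg *)ctx)->regs[i];
}

void fake_write32(void *ctx, unsigned i, uint32_t v)
{
    ((fake_cfg *)ctx)->regs[i] = v;
}
```

The ID check before restoring mirrors the behavior discussed earlier in the thread: nothing is written back to a device that does not identify itself as the one that went to sleep.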

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-530029-
xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
Sent: Friday, March 29, 2013 2:05 AM
To: Windows System Software Devs Interest List
Subject: RE: RE:[ntdev] PCI config space uninitialized after resuming from S3

> You know what really bothers me about this problem? The report that a
> BIOS change made one system work that didn’t work previously.

There is that option in some BIOSes that says “OS is PnP”. If you say yes, it
only configures the minimum number of PCI devices needed to boot, and if you
say no, it configures the whole PCI tree. Isn’t there then something about
Windows leaving the PCI configuration alone for devices that are configured
by the BIOS? Perhaps if the device is configured by the BIOS before boot,
Windows makes some assumption (incorrectly) that the BIOS will also configure
the device during power state transitions. I could imagine updating the BIOS
changes the default for “OS is PnP”, since that was really a leftover from
long ago.

Just an idea that came to me via the PDP (psychic debugging protocol).
Looking at actual PCIe traces would certainly be better.

Jan


Can you print the dump of the config descriptors for us?

It’s taking an awfully long time to get that trace…

Zzzzzzzzz…

Peter
OSR