Glad to be able to help.
> Now, I am sure that the problem is caused by hardware. It is FPGA
Based on my experience at the fabless chip companies I have worked for,
it’s good practice to find as many problems as possible in the
pre-silicon stage. Try to break your FPGA in every creative way you can
and get the problems fixed. Once the chip has taped out, it’s extremely
expensive to fix silicon by spinning another revision. It can easily
cost half a million to a million dollars - ouch! On the other hand, we
don’t find all the problems on the FPGA, nor can we generally run the
FPGA at the full clock rate, so back-end and timing closure issues can’t
be identified. (Board-level design companies usually don’t have the
trouble chip companies have.) Another problem with FPGAs is that the
signal level is weak. Most of the time it can’t even drive the bus
analyzer, especially on PCI Express, and the machine may boot only one
time out of ten.
> Thank you for your advice to use a PCI Bus analyzer. It really helped
> me.
A bus analyzer is my best friend for bus-level problems once you have
exhausted everything you can do with a PC and a s/w debugger. It’s a
must for hardcore performance tuning at the I/O level. Plus, it teaches
you exactly how the bus system works, better than any book or seminar.
I’d suggest every PCI driver programmer have one on hand.
Unfortunately, most of the products I have worked with are either
soldered onto the motherboard or integrated into the north bridge in
production form, so there is no more tracing.
Calvin Guan (DDK MVP)
Sr. Staff Engineer
NetXtreme NTX Miniport
Broadcom Corporation, Irvine CA 92618
Connecting Everything(r)
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-260425-
xxxxx@lists.osr.com] On Behalf Of Igor Sharovar
Sent: Tuesday, August 29, 2006 2:00 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Problem of accessing PCI device

Calvin,
My device behaves exactly as you described. It asserts DEVSEL# but never
drives TRDY# and a PCI master goes into a forever retry cycle.

Now, I am sure that the problem is caused by hardware. It is FPGA not a
standard memory. I also have another PCI mapped memory. It is SDRAM and
it works fine.

Thank you for your advice to use a PCI Bus analyzer. It really helped
me.

Igor
“Calvin (Hao) Guan” wrote in message news:xxxxx@ntdev…
> Igor:
>
> You got me interested in the good old PCI stuff:)
>
> Once the target has asserted DEVSEL#, Master Abort can NOT take place.
> (Sorry, I made an inaccurate statement in my previous post. I haven’t
> worked on the PCI technology for a while).
>
> Master Abort can happen if:
> 1) The target does not drive the DEVSEL# after the master has asserted
> FRAME# for 5 PCI clk cycles, OR
> 2) The master’s latency timer expired.
>
> Does your device drive TRDY# during the first (probably also the
> last) data phase when C/BE# is being driven? If it doesn’t but is
> still driving DEVSEL#, this is called a “WAIT state” inserted by the
> target. The PCI master will then retry until the target finishes the
> transaction by de-asserting TRDY# and DEVSEL#. If the target doesn’t
> do that, nor does it initiate a “target abort”, the bus will never go
> to the idle state and the machine locks up. Has the target even
> attempted to terminate the transaction by driving STOP#? (PCI-X would
> use split transactions to improve bus efficiency.)
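
To make the cases in the quoted text concrete, here is a minimal C
sketch that classifies a single PCI read from what an analyzer trace
shows. It is a simplified model, not a cycle-accurate one. The type and
function names (pci_trace, pci_outcome, classify_read) are hypothetical,
the 5-clock DEVSEL# limit is simply the figure quoted above, and only
the DEVSEL# timeout condition for Master Abort is modeled.

```c
/* Hypothetical summary of one PCI read as seen on a bus analyzer.
 * Each field is the clock, counted from FRAME# assertion, at which the
 * target first asserted the signal, or -1 if it never did. */
typedef struct {
    int devsel_clk;
    int trdy_clk;
    int stop_clk;
} pci_trace;

typedef enum {
    PCI_MASTER_ABORT,       /* nobody claimed the transaction in time */
    PCI_DATA_TRANSFERRED,   /* target asserted TRDY# */
    PCI_TARGET_TERMINATED,  /* target asserted STOP# without TRDY# */
    PCI_BUS_HUNG            /* claimed, then neither TRDY# nor STOP# */
} pci_outcome;

/* Assumption: the 5-clock figure comes from the message above. */
#define DEVSEL_TIMEOUT_CLKS 5

pci_outcome classify_read(const pci_trace *t)
{
    /* No DEVSEL# inside the window: the master gives up on its own. */
    if (t->devsel_clk < 0 || t->devsel_clk > DEVSEL_TIMEOUT_CLKS)
        return PCI_MASTER_ABORT;

    /* The transaction was claimed; did the target ever move data? */
    if (t->trdy_clk >= 0)
        return PCI_DATA_TRANSFERRED;

    /* Claimed with no data: did the target at least ask to stop? */
    if (t->stop_clk >= 0)
        return PCI_TARGET_TERMINATED;

    /* DEVSEL# held with no TRDY# and no STOP#: the case in this thread. */
    return PCI_BUS_HUNG;
}
```

A trace with DEVSEL# at clock 1 and no TRDY# or STOP# at all, which is
what the analyzer showed here, comes out as PCI_BUS_HUNG.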
>
> You need to fix the chip, or at least have your h/w folks figure out
> why the device is not responding to an Mrd.
>
> The “PCIe completion timer” is used on the “PCI Express” bus. (PCI-X
> has a similar concept for split transactions.) PCIe is a serial bus
> on which endpoints use TLPs and DLLPs to talk instead of the
> traditional PCI bus signals such as FRAME, GNT, DEVSEL, IRDY, TRDY,
> C/BE, STOP. The requester (similar to a master) will start a timer
> each time it makes a non-posted request towards the completer
> (similar to a target). If the completion timer expires before the
> completer has completed the request, a completion timeout for that
> request occurs and the requester will signal the error to the
> chipset. If the completer completes the request AFTER the completion
> timeout has occurred, then it’s a fatal error. If it “completed with
> data”, then it’s a nightmare.
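
The completion-timer behaviour described above can be sketched in a few
lines of C. This is only an illustration of the three cases (completed
in time, timed out, completed late). Real requesters implement the
timer in hardware, and the names here (np_request, cpl_status,
classify_completion) and the time units are hypothetical, not taken
from the PCI Express specification.

```c
typedef enum {
    CPL_OK,       /* completion arrived before the timer expired */
    CPL_TIMEOUT,  /* timer expired and no completion ever arrived */
    CPL_LATE      /* completion arrived after the timeout was declared */
} cpl_status;

/* One outstanding non-posted request (for example a memory read). */
typedef struct {
    long long issued_at;  /* time the request was sent, arbitrary units */
    long long timeout;    /* completion timeout interval, same units */
} np_request;

/* completion_at is the arrival time of the completion,
 * or -1 if the completer never answered. */
cpl_status classify_completion(const np_request *req, long long completion_at)
{
    if (completion_at < 0)
        return CPL_TIMEOUT;

    if (completion_at - req->issued_at <= req->timeout)
        return CPL_OK;

    /* The requester has already reported a timeout for this request,
     * so a completion showing up now is the bad case from the text. */
    return CPL_LATE;
}
```

For a request issued at time 100 with a timeout of 50, a completion at
120 is CPL_OK, one at 151 is CPL_LATE, and no completion at all is
CPL_TIMEOUT.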
>
> Depending on how the chipset is configured and the way the platform
> handles it, a completion timeout could lead to a beautiful NMI where
> the kb command would show a stack back trace in the middle of
> nowhere:)
>
> Good luck and keep us posted
>
> Calvin Guan (DDK MVP)
> Sr. Staff Engineer
> NetXtreme NTX Miniport
> Broadcom Corporation, Irvine CA 92618
> Connecting Everything(r)
>
> > -----Original Message-----
> > From: xxxxx@lists.osr.com [mailto:bounce-260280-
> > xxxxx@lists.osr.com] On Behalf Of Igor Sharovar
> > Sent: Monday, August 28, 2006 1:18 PM
> > To: Windows System Software Devs Interest List
> > Subject: Re:[ntdev] Problem of accessing PCI device
> >
> > Calvin,
> > I ran a bus analyzer and found what happened.
> >
> > The driver initiates a PCI transaction and the PCI master sends a
> > PCI read request.
> >
> > The target asserts DEVSEL# but never returns data. The state of
> > C/BE never changes.
> >
> > The PCI master continues to retry and the driver never finishes
> > its read command.
> >
> > The manual of the PCI bridge says that a PCI Master Abort is
> > generated when no target responds with PCIDevSel within the
> > required time-out window. In my case the target responds with
> > DevSel, but it doesn’t finish the transaction. I couldn’t find in
> > the manual any other conditions for generating a Master Abort.
> >
> > Does it mean that I can’t solve this problem in my device?
> >
> > You mentioned a PCIe completion timer. What is that?
> >
> >
> >
> > Igor
> >
> >
> >
> > “Calvin (Hao) Guan” wrote in message
> > news:xxxxx@ntdev…
> > If a read transaction is not claimed by any PCI agent, the
> > initiating agent will return all-ones so everyone is happy (except
> > for your driver of course). However, if it’s claimed by an agent
> > and it’s not doing things right, many issues can occur depending
> > on how the involved agents handle the transaction.
> >
> > In the PCI case, if the target has claimed the transaction (by
> > responding to #FRAME and asserting #TRDY) but does not respond
> > later during the entire transaction, then regardless of whether an
> > additional timeout facility (such as the PCIe completion timer) is
> > available, the master will at least attempt to terminate the
> > transaction due to expiration of the internal Latency Timer by
> > initiating a master abort cycle. The master will de-assert the
> > #GNT, #FRAME, #IRDY signals in order. In response to a Master
> > Abort, the target is supposed to de-assert #DEVSEL and #TRDY. If
> > the target is completely dead, then the master abort cannot take
> > place and the entire system will lock up; you don’t even get an
> > NMI.
> >
> > The target can also initiate a termination during the transaction,
> > which is even more complicated.
> >
> > Many funny things can happen when the bus protocol is broken.
> > Different bus technologies (pci,pci-x,pci-e,agp) have their own
> > protocols and vendors may have their own implementations.
> >
> > A bus analyzer (or logic analyzer in some cases) is your best
> > friend if the device is not soldered onto the motherboard or
> > integrated into the chipset.
> >
> >
> > Calvin Guan (DDK MVP)
> > Sr. Staff Engineer
> > NetXtreme NTX Miniport
> > Broadcom Corporation
> > Connecting Everything(r)
> >
> > > -----Original Message-----
> > > From: xxxxx@lists.osr.com [mailto:bounce-258846-
> > > xxxxx@lists.osr.com] On Behalf Of Igor Sharovar
> > > Sent: Thursday, August 10, 2006 2:16 PM
> > > To: Windows System Software Devs Interest List
> > > Subject: Re:[ntdev] Problem of accessing PCI device
> > >
> > > Mark,
> > > By ‘dies after a couple of seconds’ I mean that the device stops
> > > working. We are trying to figure out what exactly happened, but
> > > I don’t really care whether the device works or not.
> > >
> > > I am worried about how the Windows driver responds to the PCI
> > > device failure. It just cannot perform a simple PCI memory read
> > > operation.
> > >
> > > I will try to run a PCI Bus analyzer. I hope it will help.
> > >
> > > Thank you
> > >
> > > “Roddy, Mark” wrote in message news:xxxxx@ntdev…
> > > You will get 0xFFFFFFFF if the device is not present but there
> > > really is no guarantee about what happens if the device is
> > > present but malfunctioning. What exactly do you mean by ‘dies
> > > after a couple of seconds’? Perhaps a PCI bus analyzer would
> > > help.
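
The all-ones case mentioned above is the one failure a driver can check
for in software. Below is a minimal C sketch of such a check. It is
written against a hypothetical read callback (reg_read_fn) so it can be
compiled outside the kernel. In a real driver the callback would
presumably wrap something like READ_REGISTER_ULONG on the mapped BAR.
fake_read32 is a stand-in for hardware and the helper names are
assumptions. Note that this does nothing for a device that claims the
cycle and then never completes it, because the read never returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value a read returns when no agent claims the cycle. */
#define PCI_ALL_ONES 0xFFFFFFFFu

/* Hypothetical 32-bit register read callback. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

/* Returns false when the read came back as all ones, i.e. the device
 * is probably absent. Pick a register that can never legitimately
 * hold 0xFFFFFFFF, otherwise the check gives false alarms. */
bool device_looks_present(reg_read_fn read32, void *ctx, uint32_t offset)
{
    return read32(ctx, offset) != PCI_ALL_ONES;
}

/* Stand-in for hardware: returns whatever value ctx points at. */
uint32_t fake_read32(void *ctx, uint32_t offset)
{
    (void)offset;  /* the fake device has a single register */
    return *(const uint32_t *)ctx;
}
```

With fake_read32 and a context holding 0xFFFFFFFF the check reports the
device as absent; with any other value it reports it as present.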
> > >
> > > -----Original Message-----
> > > From: xxxxx@lists.osr.com
> > > [mailto:xxxxx@lists.osr.com] On Behalf Of Igor Sharovar
> > > Sent: Thursday, August 10, 2006 3:18 PM
> > > To: Windows System Software Devs Interest List
> > > Subject: [ntdev] Problem of accessing PCI device
> > >
> > > Hello,
> > >
> > > I have a problem accessing the memory of a PCI device.
> > >
> > > Here is my scenario.
> > >
> > > The PCI device starts during power-on of the PC but dies after a
> > > couple of seconds.
> > >
> > > The driver for the PCI device starts later. It gets a PNP
> > > START_DEVICE message and maps the PCI memory. When the driver
> > > tries to read this mapped memory, it never returns from the read
> > > operation. I used both READ_REGISTER_XXXX and direct access to
> > > memory.
> > >
> > > I don’t really understand what happened. I would expect that
> > > even if the PCI device memory is not available (as in my case,
> > > when the device is dead), the driver should get some value, for
> > > example 0xFFFFFFFF.
> > >
> > > I would appreciate any suggestions and help.
> > >
> > >
> > >
> > > Igor Sharovar
> > >
> > >
> > >
> > > —
> > > Questions? First check the Kernel Driver FAQ at
> > > http://www.osronline.com/article.cfm?id=256
> > >
> > > To unsubscribe, visit the List Server section of OSR Online at
> > > http://www.osronline.com/page.cfm?name=ListServer