Windows dies after several DMA operations

Please forgive me if this is not the appropriate place to ask this kind of
question.

I developed a driver to manage DMA operations. Three devices are involved in
my DMA system: a PLX 9056 on the PCI bus, a local device, and an FPGA in
between. The driver is based on PLX SDK 4.1, but we only support
scatter/gather list DMA in demand mode. DMA works fine initially, but after
a number of transfers (10 to 100), Windows locks up completely. I can’t even
break in from the remote kernel debugger (WinDbg). The point at which
Windows dies appears to be random. The lockup also happens more readily when
the DMA size is large, 10 MB or more.

The advice I am seeking is: what should I do to isolate the problem? What
could cause Windows to lock up like this: software, or more likely hardware?

Thanks,

zhong

I’ve seen this kind of lockup happen often enough when our graphics chip
mistreated the PCI bus or the bridge. We used a logic analyzer to figure out
what was going on.

As an alternative, you can get hold of a design of a PCI board that has a
switch attached to a wire: you push the switch and it generates an NMI, and
that’s sometimes enough to get the debugger to react. One of our guys here
wired a couple of those boards for us. You can find it at

http://www.microsoft.com/whdc/system/CEC/dmpsw.mspx

It works very nicely with SoftICE and BoundsChecker, and chances are it’ll
work with WinDbg too. Hope this helps!

Alberto.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com]On Behalf Of
xxxxx@exgate.tek.com
Sent: Thursday, April 29, 2004 11:58 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Windows dies after several DMA operations

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

If you can’t break in with WinDbg, the PCI bus is most likely locked up
(i.e. a hardware failure). Technically the CPU could keep running after the
PCI bus locks up (as long as the CPU wasn’t the cause of the lock), but
sooner or later the CPU will try to access the PCI bus. If not sooner, that
happens when it tries to fetch the packet from the serial port that is sent
when you try to break in with WinDbg ;-).

These are hard to debug unless you have some sort of logic analyzer that
you can hook up to the PCI bus. A dedicated PCI bus analyzer is of course
the best option, as it will interpret the bus traffic for you far better
than a generic logic analyzer. Still, as long as you (or a colleague) have
some understanding of how the PCI bus works, you should be able to analyze
it with a simple logic analyzer, provided it can sample at 33 MHz or more.
You don’t need all the address and data pins to figure things out; a dozen
or two will do. [In fact, you can probably get away with a 4-probe scope if
you’re really clever and desperate, but a few hundred dollars’ worth of
instrument rental should get you an LA that will give you plenty more
info.] I’m not an expert on PCI, but I’ve worked with some people who
are/were, and it’s not TOO complicated to figure out which access hung the
system…
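
To illustrate why a dozen or so signals can be enough, here is a minimal
sketch in C of how a capture of just FRAME#, IRDY#, TRDY# and C/BE#[3:0]
might be post-processed to find the access that never finished. The sample
layout (pci_sample_t), the function names, and the convention that 1 means
"asserted" are assumptions made for this example and do not correspond to any
particular analyzer's export format. The command names follow the standard
PCI C/BE# encoding used during the address phase. The sketch ignores corner
cases such as dual address cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sample per PCI clock. Hypothetical layout: the control signals are
 * active-low on the bus, but are stored here as 1 = asserted. */
typedef struct {
    uint8_t frame; /* FRAME# asserted: a transaction is in progress */
    uint8_t irdy;  /* IRDY# asserted: initiator ready */
    uint8_t trdy;  /* TRDY# asserted: target ready */
    uint8_t cbe;   /* C/BE#[3:0]: bus command during the address phase */
} pci_sample_t;

/* Map the 4-bit bus command seen in the address phase to its name. */
const char *pci_command_name(uint8_t cbe)
{
    static const char *const names[16] = {
        "Interrupt Acknowledge", "Special Cycle",
        "I/O Read",              "I/O Write",
        "Reserved",              "Reserved",
        "Memory Read",           "Memory Write",
        "Reserved",              "Reserved",
        "Configuration Read",    "Configuration Write",
        "Memory Read Multiple",  "Dual Address Cycle",
        "Memory Read Line",      "Memory Write and Invalidate"
    };
    return names[cbe & 0x0F];
}

/* Return the sample index of the address phase of the last transaction
 * that started but never returned the bus to idle (FRAME# and IRDY# both
 * deasserted), or -1 if the capture ends with the bus idle. */
long find_unfinished_transaction(const pci_sample_t *s, size_t n)
{
    long start = -1;
    int prev_frame = 0;

    assert(s != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (s[i].frame && !prev_frame)
            start = (long)i;          /* first clock with FRAME#: address phase */
        if (!s[i].frame && !s[i].irdy)
            start = -1;               /* bus idle: previous transaction is done */
        prev_frame = s[i].frame;
    }
    return start;
}
```

If the function returns a non-negative index, pci_command_name() applied to
the cbe value of that sample gives the type of the access that was still
outstanding when the capture ended. A long run of clocks after that index
with IRDY# asserted and TRDY# never asserted would suggest a target that
stopped responding.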


Mats
