Minimally invasive

OSR_Community_User · February 7, 2005, 6:16pm

It has been many years since I used an ICE on an x86 system.

The question I have for the assembled wisdom here is: what do you use when
faced with an intractable problem?

In my particular case, turning on the kernel debugger (e.g. /debugport=1394
/channel=33) causes my problem to go away.

So the question I have for the geniuses here is: If you can’t get hold of an
ICE (too expensive) and you can’t get the debugger to replicate the problem,
then what do you do to debug?

Clearly, one thing that can be done is to do a KeBugCheck and get a static
dump when a certain condition happens in your code. But what do you do if
the condition occurs elsewhere?

A static dump doesn’t help in this case because a driver really doesn’t
maintain much in the way of state information.

One of the things that I do is create an in-memory buffer that collects
certain state information. File and line number are logged. This helps
localize what is happening when the crash occurs. When the crash occurs,
one can scan the MEMORY.DMP file for the human readable trace buffer.
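
As a rough illustration of the "scan the dump" step, here is a minimal sketch in portable C that searches a byte image for a marker string placed at the start of the trace buffer. The marker text `TRACEBUF>>` and the function name `find_marker` are made up for this example; in practice the image would be the contents of MEMORY.DMP read from disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature written at the start of the in-memory trace buffer. */
#define TRACE_MARKER "TRACEBUF>>"

/*
 * Return the offset of the first occurrence of `marker` in `image`,
 * or (size_t)-1 if it is not present. A dump is binary data with
 * embedded NUL bytes, so strstr() cannot be used; memcmp() is used instead.
 */
static size_t find_marker(const unsigned char *image, size_t image_len,
                          const char *marker)
{
    assert(image != NULL && marker != NULL);

    size_t marker_len = strlen(marker);
    if (marker_len == 0 || marker_len > image_len)
        return (size_t)-1;

    for (size_t i = 0; i + marker_len <= image_len; i++) {
        if (memcmp(image + i, marker, marker_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

A unique, unlikely marker keeps false hits down; the human-readable records then follow directly after the returned offset.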

Writing to a memory buffer is better than writing to a log file because the
timing is better. It is worse than writing to a log file because sometimes
you crash with a system hang.

Speaking of MEMORY.DMP, does anyone know the rules under which MEMORY.DMP is
generated from PAGEFILE.SYS? Plenty of times I have seen memory being written
out at crash time, yet MEMORY.DMP is not created.

Specifically, the following does not seem to work on my system.

(1) Boot into Windows on partition C:.
(2) Create a crash dump (C:\Windows\pagefile.sys)
(3) Boot into Windows on partition D:.
(4) Delete c:\WINDOWS\system32\drivers\baddriver.sys
(5) Boot into Windows on partition C:
(6) C:\WINDOWS\MEMORY.DMP isn’t there (or it is an old copy)

What seems to work consistently for me is:
(1) Boot into Windows on partition C:.
(2) Create a crash dump (C:\Windows\pagefile.sys)
(3) Boot into Windows safe mode on partition C:.
(4) Shut down safe mode Windows.
(5) Boot into Windows on partition D:.
(6) C:\WINDOWS\MEMORY.DMP is there and can be examined by WinDbg, etc.
(7) Delete c:\WINDOWS\system32\drivers\baddriver.sys
(8) Boot into Windows on partition C: so that one can remove (using the
various SetupDi services) and reinstall the bad driver.

What I really want (in the absence of an ICE) is to be able to use the
debug registers to trap certain conditions in other people’s code.

Is this doable in a reasonable amount of code?

Does anyone else have suggestions on how to do minimally invasive kernel
mode debugging?

Ralph Shnelvar

Michal_Vodicka-2 · February 7, 2005, 9:05pm

Ralph Shnelvar wrote:

> Clearly, one thing that can be done is to do a KeBugCheck and get a static
> dump when a certain condition happens in your code. But what do you do if
> the condition occurs elsewhere?

There are two possibilities: either the condition causes a bugcheck and you have a dump, or you need to invoke the BSOD manually. That is easy: enable the system crash from the keyboard. It is described in the WinDbg docs and usually works even on a hung system.

> A static dump doesn’t help in this case because a driver really doesn’t
> maintain much in the way of state information.
>
> One of the things that I do is create an in-memory buffer that collects
> certain state information. File and line number are logged. This helps
> localize what is happening when the crash occurs. When the crash occurs,
> one can scan the MEMORY.DMP file for the human readable trace buffer.

Basically, you are describing traces, and yes, this is the way to go. ETW (search the docs and list archives for details) is the MS-supported way, and IIRC there is a KD extension which can extract trace info from a memory dump. Another possibility is to use DbgPrint and the Sysinternals DebugView utility, which is also able to extract the trace buffer from a memory dump.

There are cases when a problem can’t be reproduced with traces enabled because of the changed timing. This usually happens with race conditions, which are among the worst problems to debug. It is necessary to make traces as fast as possible. Recently I had such a problem, which was reproducible about once per week under high stress on the other side of the world. With standard traces enabled it disappeared, of course. So I changed the trace macros, which normally use DbgPrint, to store the info (RDTSC result, line, file, trace parameters) in fixed-size records in a big memory buffer. The user was able to reproduce the problem with this driver version and forced a dump, and I finally found the problem there. Please note that all that was necessary was to change the existing trace macros; no other code changes were needed.
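
A minimal sketch of such a fixed-size-record trace buffer follows, written in portable C11 so it can be tried outside the kernel. It is not the code described above: the names `TraceRecord`, `trace_write`, `TRACE2` and the buffer size are invented for illustration, `trace_timestamp` is a stand-in for reading RDTSC, and a real driver would keep the buffer in nonpaged memory and would likely use the kernel's interlocked primitives instead of `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TRACE_SLOTS 1024u   /* hypothetical size; must be a power of two */
static_assert((TRACE_SLOTS & (TRACE_SLOTS - 1u)) == 0,
              "TRACE_SLOTS must be a power of two");

/* One fixed-size record: no formatting is done at trace time. */
typedef struct {
    uint64_t    timestamp;  /* RDTSC result in the real driver */
    const char *file;       /* __FILE__ literal; only the pointer is stored */
    uint32_t    line;       /* __LINE__ */
    uint32_t    arg0, arg1; /* trace parameters */
} TraceRecord;

static TraceRecord g_trace[TRACE_SLOTS];
static atomic_uint g_trace_next;   /* total number of records ever written */

/* Portable stand-in for RDTSC: a monotonically increasing counter. */
static uint64_t trace_timestamp(void)
{
    static atomic_uint_fast64_t fake_clock;
    return atomic_fetch_add(&fake_clock, 1);
}

/* Claim the next slot atomically, then fill it; old records are overwritten. */
static void trace_write(const char *file, uint32_t line,
                        uint32_t arg0, uint32_t arg1)
{
    unsigned idx = atomic_fetch_add(&g_trace_next, 1u) & (TRACE_SLOTS - 1u);
    TraceRecord *r = &g_trace[idx];

    r->timestamp = trace_timestamp();
    r->file = file;
    r->line = line;
    r->arg0 = arg0;
    r->arg1 = arg1;
}

/* The existing trace macro is redirected here instead of to DbgPrint. */
#define TRACE2(a, b) \
    trace_write(__FILE__, __LINE__, (uint32_t)(a), (uint32_t)(b))
```

Because each call is one atomic increment plus a few stores, the timing disturbance is far smaller than with formatted output. The newest record in a dump is the one at index `(g_trace_next - 1) & (TRACE_SLOTS - 1)`.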

> Writing to a memory buffer is better than writing to a log file because the
> timing is better. It is worse than writing to a log file because sometimes
> you crash with a system hang.

Not if you enable the BSOD from the keyboard. The only situation where it didn’t work for me was a system hang during shutdown. Fortunately, the problem also occurred during suspend, where it worked.

> Speaking of MEMORY.DMP, does anyone know the rules under which MEMORY.DMP is
> generated from PAGEFILE.SYS? Plenty of times I have seen memory being written
> out at crash time, yet MEMORY.DMP is not created.
>
> Specifically, the following does not seem to work on my system.
>
> (1) Boot into Windows on partition C:.
> (2) Create a crash dump (C:\Windows\pagefile.sys)
> (3) Boot into Windows on partition D:.
> (4) Delete c:\WINDOWS\system32\drivers\baddriver.sys
> (5) Boot into Windows on partition C:
> (6) C:\WINDOWS\MEMORY.DMP isn’t there (or it is an old copy)

I guess you are mixing up the pagefile and the memory dump file. The pagefile is stored in the drive root, so it is c:\pagefile.sys and not c:\windows\pagefile.sys. Memory content is saved to the pagefile on BSOD and copied to the dump file on the next boot. If the OS on partition D: is configured to use the pagefile on partition C:, the dump is of course copied (or destroyed) during step 3. Maybe even if it is not configured to use the pagefile on C:.

> What I really want (in the absence of an ICE) is to be able to use the
> debug registers to trap certain conditions in other people’s code.
>
> Is this doable in a reasonable amount of code?

Never tried it manually; when I (rarely) need it, I use the SoftICE debugger.
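
For a sense of what the manual route involves, here is a small sketch in C of the bookkeeping part only: computing the DR7 control value that arms one of the four x86 hardware breakpoints (the address itself goes into DR0..DR3). The helper name `dr7_enable` and the enum names are invented for this example. Actually loading the registers takes privileged instructions on every processor, and catching the resulting debug exception is a separate problem that is not shown; a kernel debugger or SoftICE may also be using the same registers.

```c
#include <assert.h>
#include <stdint.h>

/* R/Wn field: what kind of access triggers breakpoint n. */
enum { DR_EXECUTE = 0, DR_WRITE = 1, DR_IO = 2, DR_READWRITE = 3 };

/* LENn field: size of the watched location (note the 8/4 byte encoding). */
enum { DR_LEN1 = 0, DR_LEN2 = 1, DR_LEN8 = 2, DR_LEN4 = 3 };

/*
 * Return `dr7` with breakpoint `slot` (0..3) armed.
 *   bits 2n, 2n+1        : Ln / Gn (local / global enable)
 *   bits 16+4n .. 17+4n  : R/Wn
 *   bits 18+4n .. 19+4n  : LENn
 * Execute breakpoints are expected to use DR_LEN1.
 */
static uint32_t dr7_enable(uint32_t dr7, unsigned slot,
                           unsigned rw, unsigned len)
{
    assert(slot < 4 && rw < 4 && len < 4);

    /* Clear the old enable bits and the old R/W + LEN nibble for this slot. */
    dr7 &= ~((0xFu << (16 + 4 * slot)) | (0x3u << (2 * slot)));

    dr7 |= 1u << (2 * slot + 1);               /* Gn: global enable */
    dr7 |= (uint32_t)rw  << (16 + 4 * slot);   /* R/Wn */
    dr7 |= (uint32_t)len << (18 + 4 * slot);   /* LENn */
    return dr7;
}
```

For example, `dr7_enable(0, 0, DR_WRITE, DR_LEN4)` gives the value that traps 4-byte writes to the address held in DR0.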

> Does anyone else have suggestions on how to do minimally invasive kernel
> mode debugging?

In my experience traces are the best way. Traces should be a natural part of the code; adding them only when you need to find something is unpleasant work. They can almost completely remove the need for a debugger; I debug all my code (kernel, user, embedded) purely using traces. It is much more comfortable than using a debugger and makes it possible to debug real-time problems. And when there is a problem somewhere, there is no need to reproduce it on my machine. I just send instructions on how to enable and capture the related traces and later simply analyze the log. 95% of problems can be found within 5 minutes of reading the log.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

OSR_Community_User · February 8, 2005, 11:31am

Have you tried SoftICE or serial port debugging instead of FireWire?

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@nsisoftware.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

OSR_Community_User · February 8, 2005, 1:20pm

Ivan:

My copy of Soft-Ice is far too old to be useful. Worse, I have severe budget
constraints.

I have not tried serial port debugging. That’s a good idea.

Ralph Shnelvar

OSR_Community_User · February 8, 2005, 1:43pm

Have you considered the possibility of hardware defects? I had a
similarly elusive problem in my code (I changed the code, the problem
appeared; I changed it back, the problem went away) that I chased for a
couple of weeks. Finally I started considering hardware. I swapped the
RIMMs (Rambus was popular at Intel back then) and the problem went away.
Swapped them back, and the problem reappeared. I subsequently replaced the
RIMMs, of course.

It’s uncommon, but not impossible.

Phil

Philip D. Barila
Seagate Technology LLC
(720) 684-1842
