A Multiprocessing problem

Hi all,
I have a NIC device driver that works fine on single-processor machines, and I want to improve it so that it also works on SMP machines. I’ve used the NdisDprAcquireSpinLock, NdisDprReleaseSpinLock and NdisMSynchronizeWithInterrupt functions for this purpose. But my driver hangs after running for a long time; the CPU is stuck in a TEST_AND_SET_BIT loop, probably inside the NdisMSynchronizeWithInterrupt machine code.
Now I have two problems:
First, how can I find my mistake in the synchronization process?
Second, and much more important for my debugging: I want to use SoftIce to debug my code as I did previously on a single-processor machine, but after a clean installation of SoftIce on the target machine, and before enabling the driver, I can’t activate SoftIce through the Ctrl-D hotkey; instead a “blue screen” appears and the system halts. Does SoftIce 4.05 work on SMP machines, or does it need some particular setting?
Thanks

NdisDprAcquireSpinLock() may only be called if you already know that the
IRQL == DISPATCH_LEVEL.

My first inclination is to just tell you to change those calls to
NdisAcquireSpinLock/NdisReleaseSpinLock.

However, I will just simply ask, why have you chosen to use
NdisDprXxxSpinLock() instead of NdisXxxSpinLock() in your case? Do you know
for certain that this is the correct choice?
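
To make the difference concrete, here is a minimal user-mode model in C, not driver code. The model_* names, the single global g_irql and the preemption test are assumptions made up for illustration. What they imitate is how the NDIS documentation describes the two pairs: the plain acquire raises to DISPATCH_LEVEL and its release restores the previous IRQL, while the Dpr variant leaves the IRQL untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified IRQL values; the names mirror the kernel's, the model is ours. */
enum { PASSIVE_LEVEL = 0, DISPATCH_LEVEL = 2 };

static int g_irql = PASSIVE_LEVEL;   /* IRQL of the single modeled CPU */

typedef struct {
    bool held;
    int  old_irql;   /* saved by the raising acquire, restored by its release */
} model_lock;

/* Models NdisAcquireSpinLock: raise to DISPATCH_LEVEL, then take the lock. */
static void model_acquire(model_lock *l) {
    assert(g_irql <= DISPATCH_LEVEL);
    l->old_irql = g_irql;
    g_irql = DISPATCH_LEVEL;
    l->held = true;
}

/* Models NdisReleaseSpinLock: drop the lock, then restore the saved IRQL. */
static void model_release(model_lock *l) {
    l->held = false;
    g_irql = l->old_irql;
}

/* Models NdisDprAcquireSpinLock: takes the lock and leaves the IRQL alone,
   because the caller promises it is already at DISPATCH_LEVEL. */
static void model_dpr_acquire(model_lock *l) { l->held = true; }

/* Models NdisDprReleaseSpinLock. */
static void model_dpr_release(model_lock *l) { l->held = false; }

/* The scheduler can preempt a thread only below DISPATCH_LEVEL, so a holder
   for which this returns true can be switched out while still owning the lock. */
static bool holder_is_preemptible(const model_lock *l) {
    return l->held && g_irql < DISPATCH_LEVEL;
}
```

Starting at PASSIVE_LEVEL, model_acquire leaves the holder non-preemptible, while model_dpr_acquire leaves holder_is_preemptible() true: the owner can be switched out with the lock held, and anything else that then spins on that lock may spin for a very long time.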

As for your second problem I can only say that we stood around and watched
SoftIce get lowered into its grave quite some time ago. Use Windbg - for a
whole host of reasons that I will not cite since you can (and should have)
searched this list for the keyword “SoftIce” and found them all (and more).

Good Luck,
Dave Cattley

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@BasamadCo.com
Sent: Saturday, August 29, 2009 5:40 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] A Multiproccessing problem

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I should have also included this recommendation in my first reply:

Turn on Verifier for your driver and (equally important) for NDIS.SYS. Make
sure to enable IRQL checking and deadlock detection. Then sit back and
watch the bugchecks come out (and they will).

Of course, this will all be much more productive if you use Windbg.

Good Luck,
Dave Cattley

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of David R. Cattley
Sent: Saturday, August 29, 2009 7:43 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] A Multiproccessing problem

And the good news for spinlock deadlocks is that they are generally
pretty easy to figure out as all the perps and victims are generally
on one processor or another executing code, with a lock or locks held
and/or spinning waiting for one of those locks.

Driver verifier and windbg are your best tools here.

!running -it

one of my new favorite windbg commands.

Mark Roddy

Hi, thank you David and Mark for your comments and hints.

I don’t know Driver Verifier, but I guess that it is part of WinDbg (if it isn’t, please correct me).
Anyway, I’m trying to set up a WinDbg environment. It seems this is not so simple: I must set up both a target and a host computer for kernel-mode debugging, and I have no experience with WinDbg. I will test my driver and report the results as soon as possible.

regards
Alireza

http://support.microsoft.com/kb/244617 would be a good start.

James

Thanks James.

I’ll use it.

Alireza

Driver verifier is a separate tool included with the WDK, not windbg.

In addition to the link James provided, here’s one to the actual WDK docs:

http://msdn.microsoft.com/en-us/library/ms792872.aspx

While driver verifier is a great tool, windbg is definitely the place to start, though the learning curve does indeed suck.

In any case, a Word document called (something like) ‘kernel_debugging_tutorial.doc’ can be found in the root of your windbg installation. I can’t say that I’ve ever really read it, but it’s the only introduction to using windbg for kernel debugging that I can think of off the top of my head.

There’s no easy way to start learning this, but windbg is a great tool, I think, once you get used to it, its bizarre ways, and its truly quirky UI; to be sure, the docs don’t make that easy.

What’s your target scenario - serial/1394/usb, os version, physical machine/vmware/hyper-v?

Good luck,

mm

> Driver verifier is a separate tool included with the WDK, not windbg.

I think it’s actually part of any windows install. It’s definitely
present on all the brand-new-no-additional-software-installed
installations of Windows XP and 2003 that I’ve seen. Not sure about
2008, 7, or ‘home’ versions of any of those operating systems though.

> While driver verifier is a great tool, windbg is definitely the place to
> start, though the learning curve does indeed suck.

Using the full power of windbg is certainly a long path to learn, but
simply turning on the verifier and attaching the debugger is an
incredibly useful thing to do. In the event of a crash or a break it
even says that ‘!analyze -v’ is probably what you want to do next. If the
cause of the OP’s error is a ‘simple’ deadlock or bad caller IRQL and
the verifier picks it up, ‘!analyze -v’ might tell him all he needs to
know.

James

I found Driver Verifier; as James said, it is part of the OS.
My NIC device driver works fine for a long time, but after a long (and random) interval it hits a deadlock, which is why I want to debug my code.
The driver was developed for Windows 2000 on x86 machines.
I use a null-modem cable between the target and host computers.

Alireza

> My NIC device driver works fine for a long time, but after a long (and random) interval
> it hits a deadlock, which is why I want to debug my code.

I am afraid a debugger is not particularly useful here - when it comes to synch-related issues the most reliable approach is thorough code analysis. My very first suggestion is that you use NdisDprxxx at low IRQL…

Anton Bassov

Hmmm… I have had just the opposite experience. Nothing wrong with
code analysis, and it certainly can help find inappropriate lock
usage, but a spinlock deadlock is very easy to analyze with a
debugger.

Mark Roddy

Anton,

I hope you meant to say

“My very first suggestion is that *NOT* use NdisDprxxx at low IRQL …”

Cheers,
Dave Cattley

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hotmail.com
Sent: Sunday, August 30, 2009 10:35 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] A Multiproccessing problem

David,

> I hope you meant to say “My very first suggestion is that *NOT* use NdisDprxxx at low IRQL …”

I meant to say “My very first suggestion is that the reason for a deadlock in this particular case lies with using NdisDprxxx at low IRQL” …

Anton Bassov

> a spinlock deadlock is very easy to analyze with a debugger.

… when it reveals itself. For example, if you “forget” to release a spinlock, the whole thing is more than likely to reveal itself straight away, so a debugger will be of great help here.

However, things are not necessarily that simple when it comes to random bugs. For example, if you acquire a spinlock at low IRQL with a Dpr function and hold it only briefly, it may take quite a while before you are “lucky” enough to get a context switch precisely while you hold the spinlock. IIRC, this particular bug gets caught pretty easily under Verifier. But consider, for example, two pieces of code that occasionally do nested acquisition in opposite orders, which leads to a deadlock from time to time: it may take hours and hours to reproduce the deadlock, and Verifier is not going to help you here either, because your code, despite being logically faulty, is technically correct…

Anton Bassov

Actually, verifier does catch the lock order escalation errors. It builds a
graph of lock acquisition and as soon as an acquisition violates the learned
escalation hierarchy, it bugchecks.
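
A rough user-mode sketch of that idea in C, assuming nothing about how Verifier is actually implemented: the lock_graph type, the graph_* functions and the small fixed limits are hypothetical. It records which lock was held when another was taken and refuses an acquisition that would close a cycle in the learned order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* hypothetical limits for this sketch */
#define MAX_HELD  8

typedef struct {
    bool before[MAX_LOCKS][MAX_LOCKS]; /* before[a][b]: a was held when b was taken */
    int  held[MAX_HELD];               /* locks currently held by the modeled thread */
    int  nheld;
} lock_graph;

static void graph_init(lock_graph *g) { memset(g, 0, sizeof *g); }

/* Depth-first search: is there a learned ordering path from 'from' to 'to'? */
static bool reaches(const lock_graph *g, int from, int to, bool *seen) {
    if (from == to) return true;
    seen[from] = true;
    for (int n = 0; n < MAX_LOCKS; n++)
        if (g->before[from][n] && !seen[n] && reaches(g, n, to, seen))
            return true;
    return false;
}

/* Record an acquisition. Returns false if it would invert an order learned
   earlier (or re-take a lock that is already held); a real checker would
   stop the system at that point instead of returning. */
static bool graph_acquire(lock_graph *g, int lock) {
    assert(lock >= 0 && lock < MAX_LOCKS && g->nheld < MAX_HELD);
    for (int i = 0; i < g->nheld; i++) {
        bool seen[MAX_LOCKS] = { false };
        /* We are about to add the edge held[i] -> lock. If 'lock' already
           reaches held[i], that new edge would close a cycle. */
        if (reaches(g, lock, g->held[i], seen))
            return false;
    }
    for (int i = 0; i < g->nheld; i++)
        g->before[g->held[i]][lock] = true;
    g->held[g->nheld++] = lock;
    return true;
}

/* Forget that 'lock' is held; the learned ordering edges are kept. */
static void graph_release(lock_graph *g, int lock) {
    for (int i = 0; i < g->nheld; i++)
        if (g->held[i] == lock) {
            g->held[i] = g->held[--g->nheld];
            return;
        }
}
```

Run one path that takes lock 0 then lock 1, release both, then run a second path that takes lock 1 then lock 0: the last graph_acquire returns false even though the two paths never overlapped in time, which is why an order inversion can be flagged without waiting hours for the real deadlock.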

Dave Cattley

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hotmail.com
Sent: Sunday, August 30, 2009 12:47 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] A Multiproccessing problem

+1

Driver verifier/!deadlock/chk build is a good approach for this problem, I think. In fact, apart from reverse engineering, this is one of the few problems where I reach for the debugger first rather than only after my other options have failed.

In the presence of multicore/multiprocessors, I think that the code review approach is a difficult one.

mm

A deadlock is generally trivial to diagnose with a debugger either on
a live (deadlocked) system or through a dump file. Verifier will catch
cycles. Yes of course one has to reproduce the failure for the
debugger to be an effective tool and verifier has to actually observe
a cycle to catch it. That is the nature of these tools. Code review is
helpful for many things, but I’ve rarely caught a lock hierarchy
violation through a code audit or review, while I’ve caught quite a
few through dump analysis. I guess your experience is different.

Mark Roddy

I have reviewed my code several times, but the error is very subtle and not simple to find. Therefore I feel that using Driver Verifier is a new way forward. Yesterday I tried Driver Verifier, and when I enabled a certain part of the code a “blue screen” appeared with SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION with P4=0x31 (MSDN: “A driver attempted to free pool at an incorrect IRQL.”).
In addition, I must enable deadlock detection in Driver Verifier using its manager (verifier.exe). I’m currently reading some articles to learn how to do that.
Alireza

After launching verifier, choose ‘Create custom settings (for code
developers)’ and then ‘Select individual settings from a full list’; there
you can enable or disable any test you want.

But I would first find out where your memory problem is. If you are
indeed corrupting memory, then you could well be overwriting a
spinlock’s (or other synchronisation object’s) data - you may be chasing
the wrong problem if you look for the deadlock first.

James