Debugging DRIVER_IRQL_NOT_LESS_OR_EQUAL

OSR_Community_User · February 26, 2006, 3:23am

Thomas F. Divine wrote:

Below are extracts from working NDIS IM driver code (PCAUSA Packet
Redirector) that indicates user-mode packet “up the stack”. The missing
pieces aren’t really important.

In this particular implementation the user-mode app calls the driver at
IRP_MJ_WRITE and the I/O method is DO_BUFFERED_IO, which simplifies some
things.

No spin lock is in this path at all. Offhand, I don’t know why one would
be needed.

Yes, I also don’t understand the use of the SpinLock…

Tell me if I’m wrong, but on a single processor machine a SpinLock is
implemented in raising the IRQL to DISPATCH_LEVEL since no other thread
may be running… Am I correct?

–

Edouard

Thomas_F_Divine · February 26, 2006, 4:18am

“Edouard A.” wrote in message
news:xxxxx@ntdev…
> Thomas F. Divine wrote:
>> Below are extracts from working NDIS IM driver code (PCAUSA Packet
>> Redirector) that indicates user-mode packet “up the stack”. The missing
>> pieces aren’t really important.
>>
>> In this particular implementation the user-mode app calls the driver at
>> IRP_MJ_WRITE and the I/O method is DO_BUFFERED_IO, which simplifies some
>> things.
>>
>> No spin lock is in this path at all. Offhand, I don’t know why one would
>> be needed.
>
> Yes, I also don’t understand the use of the SpinLock…
>
> Tell me if I’m wrong, but on a single processor machine a SpinLock is
> implemented in raising the IRQL to DISPATCH_LEVEL since no other thread
> may be running… Am I correct?
>
On both single and multi-processor machines a sipnlock raises the IRQL to
DISPATCH_LEVEL. This prevents other threads on the same processor from
accessing the protected data.

In the kernel many operations are performed in “arbitrary thread context” at
DISPATCH_LEVEL and even higher as the results of interrupts and other
events. On MP machines spinlocks provide additional features that protect
the data from being accessed by another processor as long as the lock is
held.

Thomas F. Divine

Soren_Dreijer · February 26, 2006, 6:00am

Regarding the lock, during initialization of the driver I allocate an
adapter-specific context area where I store things such as the adapter
handle (which was supplied in MiniportInitialize() ), statistics, IRP
safe-cancel queues and the like.

I just assumed I had to use spin locks whenever accessing this area from my
functions, as another function could possibly be using it too. But maybe I
am wrong?
Currently, I have a global struct, NIC_GLOBAL_DATA, which contains a pointer
to the aforementioned adapter context area and a spin lock to synchronize
access to it.

Oh, and I should probably add that I intend only a single instance of the
driver to be running at a time. In other words, only a single network
adapter using my driver.

“Thomas F. Divine” wrote in message news:xxxxx@ntdev…
> Below are extracts from working NDIS IM driver code (PCAUSA Packet
> Redirector) that indicates user-mode packet “up the stack”. The missing
> pieces aren’t really important.
>
> In this particular implementation the user-mode app calls the driver at
> IRP_MJ_WRITE and the I/O method is DO_BUFFERED_IO, which simplifies some
> things.
>
> No spin lock is in this path at all. Offhand, I don’t know why one would
be
> needed.
>
> Thomas F. Divine, Windows DDK MVP
> http://www.pcausa.com
>
>
> //
> // Handle Writes On Virtual/Lower Adapter Differently
> // --------------------------------------------------
> // Virtual Adapter - Write data is indicated “up the stack” as if
> // the data had been received from the network.
> // Lower Adapter - Write data is sent “down the stack” to the lower
> // MAC driver to be placed on the wire.
> //
> if( pW32NOpenContext->m_bIsVirtualAdapter )
> {
> NDIS_HANDLE MacReceiveContext = NULL;
> PUCHAR HeaderBuffer;
> UINT HeaderBufferSize;
> PUCHAR LookAheadBuffer;
> UINT LookAheadBufferSize;
>
> HeaderBuffer = pIrp->AssociatedIrp.SystemBuffer;
> HeaderBufferSize = pAdapt->HeaderSize;
>
> LookAheadBuffer = &HeaderBuffer[HeaderBufferSize];
> LookAheadBufferSize = TotalPacketLength - HeaderBufferSize;
>
> switch( pAdapt->VirtualAdapterMediaType )
> {
> case NdisMedium802_3:
> NdisMEthIndicateReceive(
> pAdapt->MiniportHandle,
> MacReceiveContext,
> HeaderBuffer,
> HeaderBufferSize,
> LookAheadBuffer,
> LookAheadBufferSize,
> LookAheadBufferSize // PacketSize == LookAheadBufferSize
> );
> break;
>
> case NdisMedium802_5:
> NdisMTrIndicateReceive(
> pAdapt->MiniportHandle,
> MacReceiveContext,
> HeaderBuffer,
> HeaderBufferSize,
> LookAheadBuffer,
> LookAheadBufferSize,
> LookAheadBufferSize // PacketSize == LookAheadBufferSize
> );
> break;
>
> case NdisMediumFddi:
> NdisMFddiIndicateReceive(
> pAdapt->MiniportHandle,
> MacReceiveContext,
> HeaderBuffer,
> HeaderBufferSize,
> LookAheadBuffer,
> LookAheadBufferSize,
> LookAheadBufferSize // PacketSize == LookAheadBufferSize
> );
> break;
>
> default:
> ASSERT(0);
> break;
> }
>
> CLReceiveComplete( (NDIS_HANDLE )pAdapt );
>
> nBytesWritten = TotalPacketLength;
>
> //
> // Complete The IRP
> //
> pIrp->IoStatus.Status = Status;
> pIrp->IoStatus.Information = nBytesWritten;
>
> IoCompleteRequest(pIrp, IO_NETWORK_INCREMENT);
>
> return( Status );
> }
> else
> {
> }
>
>

Maxim_S_Shatskih · February 26, 2006, 6:16am

> Tell me if I’m wrong, but on a single processor machine a SpinLock is

implemented in raising the IRQL to DISPATCH_LEVEL since no other thread
may be running…

“Since no other DPC can be running”. Raising to DISPATCH_LEVEL prevents any
DPCs from running on this CPU, and there is no other CPUs.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim_S_Shatskih · February 26, 2006, 6:18am

> Currently, I have a global struct, NIC_GLOBAL_DATA, which contains a pointer

Bad idea. What if you have several NICs?

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Soren_Dreijer · February 26, 2006, 6:46am

First of all, I don’t.

And why wouldn’t that work if I had several NICs? The pointer is protected
by a spin lock.

On another note, assume I have only a single NIC. Is it possible for one
routine to be halted and another one in the driver to be run? I mean, do I
need to synchronize access to the adapter context area if I only have a
single NIC?

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
> > Currently, I have a global struct, NIC_GLOBAL_DATA, which contains a
pointer
>
> Bad idea. What if you have several NICs?
>
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com
>
>
>

Maxim_S_Shatskih · February 26, 2006, 7:39am

> And why wouldn’t that work if I had several NICs? The pointer is protected

by a spin lock.

Is it OK logically to have a single piece of global data for several
independent NICs?

routine to be halted and another one in the driver to be run? I mean, do I
need to synchronize access to the adapter context area if I only have a
single NIC?

Depends upon whether your miniport is serialized.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Soren_Dreijer · February 26, 2006, 9:47am

> Is it OK logically to have a single piece of global data for several

independent NICs?
I guess it depends on what data you’re storing. But yeah, you’re right. the
adapter handle, statistics, etc. are NIC dependent and hence each NIC should
have its own context area. However, as I’m still only aiming for a single
NIC at any instant, I guess it’s not a big problem?

Depends upon whether your miniport is serialized.
It’s deserialized.

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
> > And why wouldn’t that work if I had several NICs? The pointer is
protected
> > by a spin lock.
>
> Is it OK logically to have a single piece of global data for several
> independent NICs?
>
> > routine to be halted and another one in the driver to be run? I mean, do
I
> > need to synchronize access to the adapter context area if I only have a
> > single NIC?
>
> Depends upon whether your miniport is serialized.
>
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com
>
>
>

Don_Burn_1 · February 26, 2006, 9:51am

It is a bad idea to assume a single anything in Windows, I’ve seen a lot of
drivers crash when the second device is installed. Unless there is a strong
reason to not support more than one, do not assume a singleton for anything.

–
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Remove StopSpam from the email to reply

“Soren Dreijer” wrote in message news:xxxxx@ntdev…
>> Is it OK logically to have a single piece of global data for several
>> independent NICs?
> I guess it depends on what data you’re storing. But yeah, you’re right.
> the
> adapter handle, statistics, etc. are NIC dependent and hence each NIC
> should
> have its own context area. However, as I’m still only aiming for a single
> NIC at any instant, I guess it’s not a big problem?
>
>> Depends upon whether your miniport is serialized.
> It’s deserialized.

Maxim_S_Shatskih · February 26, 2006, 10:07am

> NIC at any instant, I guess it’s not a big problem?

But is is better to implement all properly from the very beginning.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Soren_Dreijer · February 26, 2006, 10:08am

You’re probably right. So what would a multi-instance solution look like
then?

And what about the synchronizing issue. I mean, should I allow only a single
routine access to the adapter context area at any instant, or what should I
do? Consider the scenario where the driver is servicing a write dispatch
IRP, and in the middle of that a packet is transmitted on the adapter and
MiniportSendPackets() is called. Both (might) need access to the adapter
context…

“Don Burn” wrote in message news:xxxxx@ntdev…
> It is a bad idea to assume a single anything in Windows, I’ve seen a lot
of
> drivers crash when the second device is installed. Unless there is a
strong
> reason to not support more than one, do not assume a singleton for
anything.
>
>
> –
> Don Burn (MVP, Windows DDK)
> Windows 2k/XP/2k3 Filesystem and Driver Consulting
> Remove StopSpam from the email to reply
>
>
> “Soren Dreijer” wrote in message news:xxxxx@ntdev…
> >> Is it OK logically to have a single piece of global data for several
> >> independent NICs?
> > I guess it depends on what data you’re storing. But yeah, you’re right.
> > the
> > adapter handle, statistics, etc. are NIC dependent and hence each NIC
> > should
> > have its own context area. However, as I’m still only aiming for a
single
> > NIC at any instant, I guess it’s not a big problem?
> >
> >> Depends upon whether your miniport is serialized.
> > It’s deserialized.
>
>
>

Soren_Dreijer · February 26, 2006, 10:10am

Right. And such a solution would be…?

I’m just looking for best-practice guidelines, not actual code, but a hint
in the right direction :]

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
> > NIC at any instant, I guess it’s not a big problem?
>
> But is is better to implement all properly from the very beginning.
>
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com
>
>
>

Soren_Dreijer · February 26, 2006, 10:40am

Update:
I avoided holding a spin lock when calling NdisMEthIndicateReceive() and
that fixed the problem.

I’m am still very interested in what a robust solution for managing the
adapter context area for multiple NICs would look like. Also, the
synchronization question still stands :]

Thanks everyone so far!

“Don Burn” wrote in message news:xxxxx@ntdev…
> It is a bad idea to assume a single anything in Windows, I’ve seen a lot
of
> drivers crash when the second device is installed. Unless there is a
strong
> reason to not support more than one, do not assume a singleton for
anything.
>
>
> –
> Don Burn (MVP, Windows DDK)
> Windows 2k/XP/2k3 Filesystem and Driver Consulting
> Remove StopSpam from the email to reply
>
>
> “Soren Dreijer” wrote in message news:xxxxx@ntdev…
> >> Is it OK logically to have a single piece of global data for several
> >> independent NICs?
> > I guess it depends on what data you’re storing. But yeah, you’re right.
> > the
> > adapter handle, statistics, etc. are NIC dependent and hence each NIC
> > should
> > have its own context area. However, as I’m still only aiming for a
single
> > NIC at any instant, I guess it’s not a big problem?
> >
> >> Depends upon whether your miniport is serialized.
> > It’s deserialized.
>
>
>

OSR_Community_User · February 26, 2006, 11:44am

> Are you allocating memory with ExAllocatePoolWithTag() ?

If yes, is the ascii value of each character in the tag
between 0 and 127 ? If not, this could corrupt the pool.

Uhhh, can you provide some docs on why this is bad? I believe you, but what
code assumes that bits 7, 15, 23, and 31 of the pool tag are zero??

– arlie

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Edouard A.
Sent: Saturday, February 25, 2006 1:56 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Debugging DRIVER_IRQL_NOT_LESS_OR_EQUAL

S?ren Dreijer wrote:

Update:
I tried manually calling the write-method 128 times in a row without
problems.

The crash dumps still don’t indicate that my driver is at fault.
Instead the stack trace reveals:

f6604c38 80647152 000000c9 00000005 81666800 nt!KeBugCheckEx+0x19
f6604c70 8058b076 82086fd8 00000000 82086f68 nt!IovCallDriver+0xdf
f6604c84 8058dfc2 81666800 82086f68 ff964218
nt!IopSynchronousServiceTail+0x5e
f6604d38 804da140 00000f2c 00000f24 00000000 nt!NtWriteFile+0x5de
f6604d38 7ffe0304 00000f2c 00000f24 00000000 nt!KiSystemService+0xc4
01b8fa00 00000000 00000000 00000000 00000000
SharedUserData!SystemCallStub+0x4

So the problem is definitely upon calling WriteFile(). I just don’t
understand what is triggering the bug check…

Are you allocating memory with ExAllocatePoolWithTag() ? If yes, is the
ascii value of each character in the tag between 0 and 127 ? If not, this
could corrupt the pool.

–

EA

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Thomas_F_Divine · February 26, 2006, 11:53am

“Søren Dreijer” wrote in message news:xxxxx@ntdev…
> Update:
> I avoided holding a spin lock when calling NdisMEthIndicateReceive() and
> that fixed the problem.
>
> I’m am still very interested in what a robust solution for managing the
> adapter context area for multiple NICs would look like. Also, the
> synchronization question still stands :]
>
> Thanks everyone so far!
>
In looking over this thread what I see is 1.) a very bright programmer 2.)
has a good insignt into protential problems, 3.) is unfamiliar with the
Windows kernel-mode software environment but 4.) is in too big of a hurry…

It is important for you to take the time to study Windows driver
fundamentals. Available resources include the DDK, books, training courses
and the DDK sample drivers.

Back to your original problem…

Elimination of holding the spinlock apparently “fixed” your original
problem.,

But, it there still need for some sort of "protection’ on that structure?

Of course there is…

For one thing, during some parts of your code execution NDIS is not aware
that you are doing anything. NDIS could, for example, choose a very bad
moment to disable your adapter. This sort of thing could result in your
per-nic structure being deleted - probably at the worst time. So, you must
take steps to insure that your per-nic instance data continues to exist
while you are using it. This problem concerns the “lifetime” of objects, and
in the kernel it is usually solved by using “reference counting”.

Several of the DDK samples illustrate reference counting. Basically you add
a reference counter to a structure whose existance you must insure. A
spinlock-protected function is provided to increment the refcount to
indicate that you are using it. A spinlock-protected function is used to
decrement the regerence count - AND this function is used to “hide” the code
that deletes the structure. When the refcount goes (typically) to zero, the
decerementing function will delete the structure.

Your original spinlock protection was held too long. It would be replaced by
code that 1.) grabs the spinlock, 2.) insures that the info actually exists,
3.) increments info’s refcount if it exists and 4.) releases the spinlock.
This is a very brief operation. Having done this (successfully) you now KNOW
that the per-nic instance data will continue to exist until you are done
with it. You would (while holding the spinlock…) decrement the reference
count after you completed your call to NdisMEthIndicateReceive. If you made
some sort of longer-duration asynchronous operation, you would hold on the
the spinlock until the operation is completed.

If you account for multiple nics, the solution is typically to use a linked
list of some sort. The global spinlock protects the list head. You may have
additional spinlocks associated with each individual NIC instance data so
you can, for example, safely operate on one NIC’s data without limiting
access to other NIC’s data.

Finally, take a look at other kernel-mode functions that are extremely
helpful. In particular, look at the “Interlocked” functions.

Good luck,

Thomas F. Divine, Windows DDK MVP
http://www.pcausa.com

Don_Burn_1 · February 26, 2006, 12:12pm

I can’t say anything about 7, 15, or 23, but bit 31 is used to indicate that
the memory must be freed with ExFreePoolWithTag with a matching tag to the
allocate. See the #define for PROTECTED_POOL in the ddk includes.

–
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Remove StopSpam from the email to reply

“Arlie Davis” wrote in message
news:xxxxx@ntdev…
> Are you allocating memory with ExAllocatePoolWithTag() ?
> If yes, is the ascii value of each character in the tag
> between 0 and 127 ? If not, this could corrupt the pool.

Uhhh, can you provide some docs on why this is bad? I believe you, but what
code assumes that bits 7, 15, 23, and 31 of the pool tag are zero??

– arlie

Soren_Dreijer · February 26, 2006, 12:44pm

> In looking over this thread what I see is 1.) a very bright programmer 2.)

has a good insignt into protential problems, 3.) is unfamiliar with the
Windows kernel-mode software environment but 4.) is in too big of a
hurry…

Thank you, and yeah, there are so many things to take into account when
you’re new to the WDM that it sometimes seems too overwhelming to carefully
read every little detail. That, of course, backfires later on. Sometimes you
need to get dirt on your hand too, though. Thank God for the existence of
people such as those in this newsgroup :]

But, it there still need for some sort of "protection’ on that structure?

Of course there is…
…

So, what you’re saying is that everytime I need to access the adapter
context area from a routine, I need to make use of reference counting etc.
to make sure the memory is still valid. That makes perfect sense as I
wouldn’t want to free the memory when it’s in use.
What about multiple routines accessing the structure at the same time?
According to your example below, one increments the refcount before using
the structure and decrements it after use. If data exists in the structure
which are used by more than one routine, would you then add a spin lock (or
similar) to guard that specific data?

Currently, I do make use of refcounting to see when the last NIC that uses
my driver shuts down (utilizing NdisInterlockedIncrement() /
NdisInterlockedDecrement() to perform the operations in an atomic fashion).

Regarding supporting multiple NICs, if you keep the adapter instances on a
linked list, how does each instance know which linked node belongs to them?
I guess I’m missing something here…

Thank you so very much for the elaborate reply!

“Thomas F. Divine” wrote in message news:xxxxx@ntdev…
>
> “Søren Dreijer” wrote in message news:xxxxx@ntdev…
> > Update:
> > I avoided holding a spin lock when calling NdisMEthIndicateReceive() and
> > that fixed the problem.
> >
> > I’m am still very interested in what a robust solution for managing the
> > adapter context area for multiple NICs would look like. Also, the
> > synchronization question still stands :]
> >
> > Thanks everyone so far!
> >
> In looking over this thread what I see is 1.) a very bright programmer 2.)
> has a good insignt into protential problems, 3.) is unfamiliar with the
> Windows kernel-mode software environment but 4.) is in too big of a
hurry…
>
> It is important for you to take the time to study Windows driver
> fundamentals. Available resources include the DDK, books, training courses
> and the DDK sample drivers.
>
> Back to your original problem…
>
> Elimination of holding the spinlock apparently “fixed” your original
> problem.,
>
> But, it there still need for some sort of "protection’ on that structure?
>
> Of course there is…
>
> For one thing, during some parts of your code execution NDIS is not aware
> that you are doing anything. NDIS could, for example, choose a very bad
> moment to disable your adapter. This sort of thing could result in your
> per-nic structure being deleted - probably at the worst time. So, you must
> take steps to insure that your per-nic instance data continues to exist
> while you are using it. This problem concerns the “lifetime” of objects,
and
> in the kernel it is usually solved by using “reference counting”.
>
> Several of the DDK samples illustrate reference counting. Basically you
add
> a reference counter to a structure whose existance you must insure. A
> spinlock-protected function is provided to increment the refcount to
> indicate that you are using it. A spinlock-protected function is used to
> decrement the regerence count - AND this function is used to “hide” the
code
> that deletes the structure. When the refcount goes (typically) to zero,
the
> decerementing function will delete the structure.
>
> Your original spinlock protection was held too long. It would be replaced
by
> code that 1.) grabs the spinlock, 2.) insures that the info actually
exists,
> 3.) increments info’s refcount if it exists and 4.) releases the spinlock.
> This is a very brief operation. Having done this (successfully) you now
KNOW
> that the per-nic instance data will continue to exist until you are done
> with it. You would (while holding the spinlock…) decrement the reference
> count after you completed your call to NdisMEthIndicateReceive. If you
made
> some sort of longer-duration asynchronous operation, you would hold on the
> the spinlock until the operation is completed.
>
> If you account for multiple nics, the solution is typically to use a
linked
> list of some sort. The global spinlock protects the list head. You may
have
> additional spinlocks associated with each individual NIC instance data so
> you can, for example, safely operate on one NIC’s data without limiting
> access to other NIC’s data.
>
> Finally, take a look at other kernel-mode functions that are extremely
> helpful. In particular, look at the “Interlocked” functions.
>
> Good luck,
>
> Thomas F. Divine, Windows DDK MVP
> http://www.pcausa.com
>
>

Thomas_F_Divine · February 26, 2006, 1:07pm

“Søren Dreijer” wrote in message news:xxxxx@ntdev…
<snip…>

> So, what you’re saying is that everytime I need to access the adapter
> context area from a routine, I need to make use of reference counting etc.
> to make sure the memory is still valid. That makes perfect sense as I
> wouldn’t want to free the memory when it’s in use.

You must use logic here. You must understand what is going on and what you
must do to prevent problems. There really isn’t a simple rule here. You must
give a think.

> What about multiple routines accessing the structure at the same time?
> According to your example below, one increments the refcount before using
> the structure and decrements it after use. If data exists in the structure
> which are used by more than one routine, would you then add a spin lock
> (or
> similar) to guard that specific data?

It is fine if multiple routines on multiple CPUs reference the same
refcount-protected memory. Only when the LAST refcount is removed can the
item be deleted.

So, refcounting deals with object lifetime - but is not all that must be
accounted for.

Yes, it is not uncommon to have approaches like this:

1.) Spinlock-protected global adapter list. Entries are per-adapter context.
2.) Spinlock(s) in each per-adapter context to protect fields, counters and
flags related to binding/unbinding etc.
3.) Spinlock(s) in each per-adapter context to protect queues pf items (IRPs
or packets?). Can use quick interlocked list routines here.

Others.

>
> Currently, I do make use of refcounting to see when the last NIC that uses
> my driver shuts down (utilizing NdisInterlockedIncrement() /
> NdisInterlockedDecrement() to perform the operations in an atomic
> fashion).
>
> Regarding supporting multiple NICs, if you keep the adapter instances on a
> linked list, how does each instance know which linked node belongs to
> them?
> I guess I’m missing something here…
>

Generally you keep the necessary handles in your per-nic context structure.
When an operation is initiated you must have some information (desired
handle of some sort) to look for. Make search routine to grab approbriate
lock, walk the linked list. If handle match is found, bump refcount and
return pointer to item to caller after releasing spinlock. If search routine
is successful, caller has pointer to item AND knows that a refcount has been
added to it (Must release sometime, of course…).

Thomas F. Divine, Windows DDK MVP
http://www.pcausa.com

> Thank you so very much for the elaborate reply!
>
> “Thomas F. Divine” wrote in message
> news:xxxxx@ntdev…
>>
>> “Søren Dreijer” wrote in message
>> news:xxxxx@ntdev…
>> > Update:
>> > I avoided holding a spin lock when calling NdisMEthIndicateReceive()
>> > and
>> > that fixed the problem.
>> >
>> > I’m am still very interested in what a robust solution for managing the
>> > adapter context area for multiple NICs would look like. Also, the
>> > synchronization question still stands :]
>> >
>> > Thanks everyone so far!
>> >
>> In looking over this thread what I see is 1.) a very bright programmer
>> 2.)
>> has a good insignt into protential problems, 3.) is unfamiliar with the
>> Windows kernel-mode software environment but 4.) is in too big of a
> hurry…
>>
>> It is important for you to take the time to study Windows driver
>> fundamentals. Available resources include the DDK, books, training
>> courses
>> and the DDK sample drivers.
>>
>> Back to your original problem…
>>
>> Elimination of holding the spinlock apparently “fixed” your original
>> problem.,
>>
>> But, it there still need for some sort of "protection’ on that structure?
>>
>> Of course there is…
>>
>> For one thing, during some parts of your code execution NDIS is not aware
>> that you are doing anything. NDIS could, for example, choose a very bad
>> moment to disable your adapter. This sort of thing could result in your
>> per-nic structure being deleted - probably at the worst time. So, you
>> must
>> take steps to insure that your per-nic instance data continues to exist
>> while you are using it. This problem concerns the “lifetime” of objects,
> and
>> in the kernel it is usually solved by using “reference counting”.
>>
>> Several of the DDK samples illustrate reference counting. Basically you
> add
>> a reference counter to a structure whose existance you must insure. A
>> spinlock-protected function is provided to increment the refcount to
>> indicate that you are using it. A spinlock-protected function is used to
>> decrement the regerence count - AND this function is used to “hide” the
> code
>> that deletes the structure. When the refcount goes (typically) to zero,
> the
>> decerementing function will delete the structure.
>>
>> Your original spinlock protection was held too long. It would be replaced
> by
>> code that 1.) grabs the spinlock, 2.) insures that the info actually
> exists,
>> 3.) increments info’s refcount if it exists and 4.) releases the
>> spinlock.
>> This is a very brief operation. Having done this (successfully) you now
> KNOW
>> that the per-nic instance data will continue to exist until you are done
>> with it. You would (while holding the spinlock…) decrement the
>> reference
>> count after you completed your call to NdisMEthIndicateReceive. If you
> made
>> some sort of longer-duration asynchronous operation, you would hold on
>> the
>> the spinlock until the operation is completed.
>>
>> If you account for multiple nics, the solution is typically to use a
> linked
>> list of some sort. The global spinlock protects the list head. You may
> have
>> additional spinlocks associated with each individual NIC instance data so
>> you can, for example, safely operate on one NIC’s data without limiting
>> access to other NIC’s data.
>>
>> Finally, take a look at other kernel-mode functions that are extremely
>> helpful. In particular, look at the “Interlocked” functions.
>>
>> Good luck,
>>
>> Thomas F. Divine, Windows DDK MVP
>> http://www.pcausa.com
>>
>>
>
>
></snip…>

Soren_Dreijer · February 26, 2006, 1:30pm

> It is fine if multiple routines on multiple CPUs reference the same

refcount-protected memory. Only when the LAST refcount is removed can the
item be deleted.

What I was hinting at were possible race-conditions.

I guess it’s time to go through the driver and find possible sources of
errors. Apparently I need to add much more protection to literally
everything :]

“Thomas F. Divine” wrote in message news:xxxxx@ntdev…
>
> “Søren Dreijer” wrote in message news:xxxxx@ntdev…
> <snip…>
>
> > So, what you’re saying is that everytime I need to access the adapter
> > context area from a routine, I need to make use of reference counting
etc.
> > to make sure the memory is still valid. That makes perfect sense as I
> > wouldn’t want to free the memory when it’s in use.
>
> You must use logic here. You must understand what is going on and what you
> must do to prevent problems. There really isn’t a simple rule here. You
must
> give a think.
>
> > What about multiple routines accessing the structure at the same time?
> > According to your example below, one increments the refcount before
using
> > the structure and decrements it after use. If data exists in the
structure
> > which are used by more than one routine, would you then add a spin lock
> > (or
> > similar) to guard that specific data?
>
> It is fine if multiple routines on multiple CPUs reference the same
> refcount-protected memory. Only when the LAST refcount is removed can the
> item be deleted.
>
> So, refcounting deals with object lifetime - but is not all that must be
> accounted for.
>
> Yes, it is not uncommon to have approaches like this:
>
> 1.) Spinlock-protected global adapter list. Entries are per-adapter
context.
> 2.) Spinlock(s) in each per-adapter context to protect fields, counters
and
> flags related to binding/unbinding etc.
> 3.) Spinlock(s) in each per-adapter context to protect queues pf items
(IRPs
> or packets?). Can use quick interlocked list routines here.
>
> Others.
>
> >
> > Currently, I do make use of refcounting to see when the last NIC that
uses
> > my driver shuts down (utilizing NdisInterlockedIncrement() /
> > NdisInterlockedDecrement() to perform the operations in an atomic
> > fashion).
> >
> > Regarding supporting multiple NICs, if you keep the adapter instances on
a
> > linked list, how does each instance know which linked node belongs to
> > them?
> > I guess I’m missing something here…
> >
>
> Generally you keep the necessary handles in your per-nic context
structure.
> When an operation is initiated you must have some information (desired
> handle of some sort) to look for. Make search routine to grab approbriate
> lock, walk the linked list. If handle match is found, bump refcount and
> return pointer to item to caller after releasing spinlock. If search
routine
> is successful, caller has pointer to item AND knows that a refcount has
been
> added to it (Must release sometime, of course…).
>
> Thomas F. Divine, Windows DDK MVP
> http://www.pcausa.com
>
>
> > Thank you so very much for the elaborate reply!
> >
> > “Thomas F. Divine” wrote in message
> > news:xxxxx@ntdev…
> >>
> >> “Søren Dreijer” wrote in message
> >> news:xxxxx@ntdev…
> >> > Update:
> >> > I avoided holding a spin lock when calling NdisMEthIndicateReceive()
> >> > and
> >> > that fixed the problem.
> >> >
> >> > I’m am still very interested in what a robust solution for managing
the
> >> > adapter context area for multiple NICs would look like. Also, the
> >> > synchronization question still stands :]
> >> >
> >> > Thanks everyone so far!
> >> >
> >> In looking over this thread what I see is 1.) a very bright programmer
> >> 2.)
> >> has a good insignt into protential problems, 3.) is unfamiliar with the
> >> Windows kernel-mode software environment but 4.) is in too big of a
> > hurry…
> >>
> >> It is important for you to take the time to study Windows driver
> >> fundamentals. Available resources include the DDK, books, training
> >> courses
> >> and the DDK sample drivers.
> >>
> >> Back to your original problem…
> >>
> >> Elimination of holding the spinlock apparently “fixed” your original
> >> problem.,
> >>
> >> But, it there still need for some sort of "protection’ on that
structure?
> >>
> >> Of course there is…
> >>
> >> For one thing, during some parts of your code execution NDIS is not
aware
> >> that you are doing anything. NDIS could, for example, choose a very bad
> >> moment to disable your adapter. This sort of thing could result in your
> >> per-nic structure being deleted - probably at the worst time. So, you
> >> must
> >> take steps to insure that your per-nic instance data continues to exist
> >> while you are using it. This problem concerns the “lifetime” of
objects,
> > and
> >> in the kernel it is usually solved by using “reference counting”.
> >>
> >> Several of the DDK samples illustrate reference counting. Basically you
> > add
> >> a reference counter to a structure whose existance you must insure. A
> >> spinlock-protected function is provided to increment the refcount to
> >> indicate that you are using it. A spinlock-protected function is used
to
> >> decrement the regerence count - AND this function is used to “hide” the
> > code
> >> that deletes the structure. When the refcount goes (typically) to zero,
> > the
> >> decerementing function will delete the structure.
> >>
> >> Your original spinlock protection was held too long. It would be
replaced
> > by
> >> code that 1.) grabs the spinlock, 2.) insures that the info actually
> > exists,
> >> 3.) increments info’s refcount if it exists and 4.) releases the
> >> spinlock.
> >> This is a very brief operation. Having done this (successfully) you now
> > KNOW
> >> that the per-nic instance data will continue to exist until you are
done
> >> with it. You would (while holding the spinlock…) decrement the
> >> reference
> >> count after you completed your call to NdisMEthIndicateReceive. If you
> > made
> >> some sort of longer-duration asynchronous operation, you would hold on
> >> the
> >> the spinlock until the operation is completed.
> >>
> >> If you account for multiple nics, the solution is typically to use a
> > linked
> >> list of some sort. The global spinlock protects the list head. You may
> > have
> >> additional spinlocks associated with each individual NIC instance data
so
> >> you can, for example, safely operate on one NIC’s data without limiting
> >> access to other NIC’s data.
> >>
> >> Finally, take a look at other kernel-mode functions that are extremely
> >> helpful. In particular, look at the “Interlocked” functions.
> >>
> >> Good luck,
> >>
> >> Thomas F. Divine, Windows DDK MVP
> >> http://www.pcausa.com
> >>
> >>
> >
> >
> >
>
></snip…>

Thomas_F_Divine · February 26, 2006, 2:21pm

“Søren Dreijer” wrote in message news:xxxxx@ntdev…
>> It is fine if multiple routines on multiple CPUs reference the same
>> refcount-protected memory. Only when the LAST refcount is removed can the
>> item be deleted.
>
> What I was hinting at were possible race-conditions.
>
> I guess it’s time to go through the driver and find possible sources of
> errors. Apparently I need to add much more protection to literally
> everything :]
After a little study you will probably find that it is fairly systematic and
not really that intrusive.

Do review some of the later DDK samples. Although there are a lot of good
devs at MS, Eliyas Yakub has worked on many of the NDIS (and other) DDK
samples, and I know he does a good job of making samples that deal with edge
conditions - like PnP and MPHalt. Some of the later DDK samples include some
fairly sophisticated debug aid techniques for tracking memory allocations,
etc.

Good luck,

Thomas