We came across some odd wording in the DDK for ExInterlockedRemoveHeadList:
If the caller uses only ExInterlocked…List routines to manipulate
the list, these routines can be called from a single IRQL that is <= DIRQL.
I am only using the ExInterlocked…List routines, so that seems to mean
that I could call them only in my ISR, only in my DPC, or only at
PASSIVE_LEVEL, but no combination of them. That’s just odd.
I looked at the assembly, and it (at least on the single-processor NT
kernel) is just an indirect call to ExfInterlockedRemoveHeadList, which does
a CLI (disable interrupts) as the very first instruction. So, it seems that
it should be safe to use at any IRQL or combination of IRQLs. So, what
could that IRQL restriction really mean?
Does anyone have any insight on that?
What I’m trying to track down is a very rare bluescreen that we’ve seen (of
which we suspect our drivers):
*** STOP: 0x0000000A (0x00000004,0x000000FF,0x00000001,0x8013E2E4)
IRQL_NOT_LESS_OR_EQUAL*** Address 8013e2e4 has base at 80100000 - ntoskrnl.exe
Where that address turns out to be the instruction
ntoskrnl!ExfInterlockedRemoveHeadList+000C (mov [edx+04],ecx). By looking at the
assembly and trying it out, I’ve found that if I manually set
ListHead->Flink->Blink to NULL, I can duplicate that bluescreen. So, it
looks like corruption of the list pointers, and re-entrancy issues seem a
likely candidate.
Thanks in advance for any assistance!
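For anyone following the crash analysis, here is a minimal user-mode sketch of the unlink step, plus a sanity check that could be run on the links before the head is removed. It is not the kernel's code: `ListEntry`, `head_links_are_sane`, and `checked_remove_head` are hypothetical names, the struct only mirrors the two-pointer LIST_ENTRY layout, and no locking or interrupt masking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode stand-in for the kernel's LIST_ENTRY (same two-pointer layout). */
typedef struct ListEntry {
    struct ListEntry *Flink;   /* forward link */
    struct ListEntry *Blink;   /* back link */
} ListEntry;

void init_list_head(ListEntry *head) {
    head->Flink = head;
    head->Blink = head;
}

void insert_tail(ListEntry *head, ListEntry *entry) {
    ListEntry *last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

/* True if the first entry's links are non-NULL and mutually consistent,
 * i.e. unlinking it will not write through a bad pointer. */
bool head_links_are_sane(const ListEntry *head) {
    assert(head != NULL);
    const ListEntry *first = head->Flink;
    if (first == NULL || first->Blink != head) {
        return false;
    }
    const ListEntry *next = first->Flink;
    return next != NULL && next->Blink == first;
}

/* Remove-head that refuses to touch a list whose links look corrupted.
 * Returns NULL if the list is empty or fails the sanity check. */
ListEntry *checked_remove_head(ListEntry *head) {
    if (head->Flink == head) {
        return NULL;                 /* empty list */
    }
    if (!head_links_are_sane(head)) {
        return NULL;                 /* corrupted links: do not unlink */
    }
    ListEntry *first = head->Flink;
    ListEntry *next = first->Flink;
    head->Flink = next;
    next->Blink = head;              /* a NULL 'next' would make this store hit a near-zero address */
    return first;
}
```

On 32-bit x86 the back link sits at offset 4, so a store to a Blink field through a NULL entry pointer lands at address 0x00000004, which would be consistent with the first bugcheck parameter above. A check like this in a debug build might help catch the corruption closer to where it happens.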
There is a strange IRQL there: 0xFF.
Best regards,
Andrei Zlate-Podani
Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256
You are currently subscribed to ntdev as: xxxxx@bitdefender.com
To unsubscribe send a blank email to xxxxx@lists.osr.com
Taed Wynnell wrote:
> I looked at the assembly, and it (at least on the single-processor NT
> kernel) is just an indirect call to ExfInterlockedRemoveHeadList, which
> does a CLI (disable interrupts) as the very first instruction. So, it
> seems that it should be safe to use at any IRQL or combination of
> IRQLs. So, what could that IRQL restriction really mean?
Among other things, it means that you should not write your code around the
“MPness” of the platform.
On an MP system the function is indeed going to acquire the spinlock
indicated by the second parameter, so you had better respect the admonition
to use the same IRQL for all calls to this routine for a specific queue. In
general, this would be DIRQL because the motivation for using this mechanism
is to manipulate a queue from both your ISR and your DPC or DispatchXXX
routines.
Note also that unlike RemoveHeadList, ExInterlockedRemoveHeadList returns
NULL on an empty list, not the address of the head of the list. (This would
be in the ‘Consistency is the last refuge of the unimaginative’ school of
API design.)
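To make the difference in empty-list behavior concrete, here is a small user-mode model of the two return conventions. It is only a sketch: `Entry`, `model_remove_head`, `model_interlocked_remove_head`, and `drain` are hypothetical names, and the "interlocked" model deliberately takes no spinlock, since only the return value is being illustrated.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-in for LIST_ENTRY. */
typedef struct Entry {
    struct Entry *Flink;
    struct Entry *Blink;
} Entry;

void list_init(Entry *head) {
    assert(head != NULL);
    head->Flink = head;
    head->Blink = head;
}

void list_push_tail(Entry *head, Entry *e) {
    Entry *last = head->Blink;
    e->Flink = head;
    e->Blink = last;
    last->Flink = e;
    head->Blink = e;
}

/* Models RemoveHeadList: an unconditional unlink. On an empty list the
 * "first entry" is the head itself, so the head's address comes back. */
Entry *model_remove_head(Entry *head) {
    Entry *first = head->Flink;
    Entry *next = first->Flink;
    head->Flink = next;
    next->Blink = head;
    return first;
}

/* Models only the return convention of ExInterlockedRemoveHeadList:
 * NULL when the list is empty. Locking is omitted in this sketch. */
Entry *model_interlocked_remove_head(Entry *head) {
    if (head->Flink == head) {
        return NULL;
    }
    return model_remove_head(head);
}

/* Drains the list using the NULL-on-empty convention; returns the count. */
size_t drain(Entry *head) {
    size_t count = 0;
    while (model_interlocked_remove_head(head) != NULL) {
        count++;
    }
    return count;
}
```

The practical point is that a loop written against one convention will misbehave against the other: comparing the result with the list head never terminates a drain loop built on the NULL-returning routine, and comparing with NULL never terminates one built on the plain macro.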
Given your description, which seems to imply that you are testing on a UP
machine, I’d guess that you do not have a re-entrancy/concurrency problem -
cli avoids that - but instead you have a bug in one of your list processing
routines that is causing the corruption. If instead you are only seeing this
on an MP machine, and you are not following the IRQL rules, then indeed you
have a concurrency problem that should be easily resolved by only using
DIRQL when calling this routine.
“Andrei Zlate-Podani” wrote…
> There is a strange IRQL there 0xFF.
Yeah, I noticed that as well. My guess was that the BugCheckEx code was
putting 0xFF since interrupts were disabled and the IRQL was actually still
PASSIVE_LEVEL (I verified this in my debugger). Perhaps not and that’s
another clue?
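Since the bugcheck parameters keep coming up, here is a small sketch that pulls them apart, assuming the usual layout for STOP 0x0000000A (referenced address, IRQL, access type in bit 0, faulting instruction address). The struct and function names are made up for illustration, and the "small address" threshold is just a rule of thumb, not anything the kernel defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the four STOP 0x0000000A parameters (hypothetical struct). */
typedef struct {
    uint32_t referenced_address;  /* parameter 1: memory that was referenced */
    uint32_t irql;                /* parameter 2: IRQL reported at the time */
    bool     is_write;            /* parameter 3, bit 0: 0 = read, 1 = write */
    uint32_t instruction;         /* parameter 4: address of the faulting instruction */
} BugcheckA;

BugcheckA decode_bugcheck_a(const uint32_t params[4]) {
    assert(params != NULL);
    BugcheckA decoded;
    decoded.referenced_address = params[0];
    decoded.irql = params[1];
    decoded.is_write = (params[2] & 1u) != 0;
    decoded.instruction = params[3];
    return decoded;
}

/* Offset of an address from the base of the image that contains it. */
uint32_t image_offset(uint32_t address, uint32_t image_base) {
    return address - image_base;
}

/* Heuristic: an address inside the first page usually means a field was
 * accessed through a NULL pointer (the address is then the field offset). */
bool looks_like_null_field_access(uint32_t referenced_address) {
    return referenced_address < 0x1000u;
}
```

Fed the values from the STOP line in the original post, this reports a write to address 0x00000004 with a reported IRQL of 0xFF, from an instruction 0x3E2E4 bytes into the image based at 80100000. The write to offset 4 through a NULL pointer fits the mov [edx+04],ecx instruction; what the 0xFF actually reflects is still an open question in this thread.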
“Mark Roddy” wrote…
> Given your description, which seems to imply that you are testing on a UP
> machine, I’d guess that you do not have a re-entrancy/concurrency problem -
> cli avoids that - but instead you have a bug in one of your list processing
> routines that is causing the corruption.
Thanks. Yes, I am only working on a uniprocessor. However, I couldn’t
imagine any bug in my list processing usage that could cause the Flink/Blink
pointers to get messed up since they are completely managed by the
ExInterlocked routines. And it’s only happened twice in the last year, so
it’s clearly something very rare (hence my re-entrancy thinking)… We also
run special pool on most of our development/test machines, and we haven’t
found a bug of that variety in quite a while, so I don’t immediately suspect
memory corruption either.
Can you explain the logic behind allowing this function to be called from
one single IRQL only? Why does it matter which IRQL the previous caller was
running at, given that the manipulation is always serialized with
interrupts disabled and a spinlock acquired?
Normally, acquiring a spinlock raises the IRQL to DISPATCH_LEVEL, so it is
understandable that if you manipulate the list yourself, you cannot use this
function above DISPATCH_LEVEL: it may preempt your custom list manipulation,
and the ExInterlocked… routine may then deadlock waiting for a spinlock that
has already been acquired. But what is the problem if you use only the
ExInterlocked… routines to manipulate the list?
Thanks,
Daniel
Good point. With the cli instruction there really is no chance that you will
deadlock due to re-acquisition on the same CPU, regardless of the IRQL.
Offhand I can’t come up with a reason for this restriction, other than that
the cli in the MP version is an artifact of the implementation, not
a documented feature. Perhaps it was intentionally added at some point due
to widespread violation of the IRQL rule?
=====================
Mark Roddy
This is a bug in the docs. I’m already in touch with the writer here to get
them fixed, based on this thread.
(You can’t mix interlocked and non-interlocked operations on the list on MP
machines, no matter what the IRQL.)
Ravi