Producer at DPC,Consumer at helper thread,puzzled

I’m developping a TDI Client.
I TdiBuildSetEventHandler(ClientEventReceiveDatagram,…);

In my design,It’s one Producer and one Consumer.

in my InitFunc:
{
//…
//Init a spinlock
//Init a double linklist
KeInitializeSemaphore(pSemaphore,0,1);//count is 0,limit is 1
//…
}

In my EventReceiveDatagramHandler:
{
//…
//copy the datagram to a node
//add the node to a double-linklist, sync by spinlock
KeReleaseSemaphore(pSemaphore,SEMAPHORE_INCREMENT,1,FALSE);//release 1
//…
}

In my worker thread,It will handle the double linklist
{
//…
while (TRUE)
{
//…
KeWaitForSingleObject(pSemaphore, Executive, KernelMode, FALSE, NULL);
//get and remove the oldest node from double-linklist,sync by spinlock
//handle the data fo the node
//…
}
}

In my test,everything is OK.
But,when I read the DDK carefully,I find a problem,it puzzled me.
In document from DDK:
Releasing a semaphore object causes the semaphore count to be augmented by the value of the Adjustment parameter. If the resulting value is greater than the limit of the semaphore object, the count is not adjusted and an exception, STATUS_SEMAPHORE_LIMIT_EXCEEDED, is raised.
oh,my god!It’s not same as classic OS.(I think semaphore’s count has no limit in classic OS).

I think it’s trouble:
My EventReceiveDatagramHandler is called at DPC LEVEL,and my worker thread is in PASSIVE LEVEL.
When there are very many datagram received,my EventReceiveDatagramHandler will be called continually,and my worker thread has no chance to execute?
my EventReceiveDatagramHandler will be called once,and return to the UDP-driver;there are many datagram,so UDP-driver call my EventReceiveDatagramHandler again…my worker thread has no chance to execute.
So,I will KeReleaseSemaphore,and the resulting value is greater than the limit,crash!

But,in my test,everything is OK!
What the key theory?My analyse is right or wrong?
If I’m right,what should I do in this situation?
If I’m wrong,why?

thanks ~I wan to know WHY,not just HOW.Thanks very much~

oh,I test my driver again,and many many datagram flush to it.
BSOD
STATUS_SEMAPHORE_LIMIT_EXCEEDED

So,I’m right.It will be BSOD.
What should I do?
Producer at DPC level,consumer at PASSIVE LEVEL(work thread).

a) don’t use semaphores.
b) see (a).

Signal the worker thread only when the list transitions from the empty
state. Use an event.
If you are actually an infinite unlimited bandwidth no flow control
data producer then you do indeed have a design problem.

Mark Roddy

On Fri, Dec 10, 2010 at 8:49 AM, wrote:
> oh,I test my driver again,and many many datagram flush to it.
> BSOD
> STATUS_SEMAPHORE_LIMIT_EXCEEDED
>
> So,I’m right.It will be BSOD.
> What should I do?
> Producer at DPC level,consumer at PASSIVE LEVEL(work thread).
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

> Releasing a semaphore object causes the semaphore count to be augmented by the value of the

Adjustment parameter.

I would just throw away this bizarre semaphore object (the ivory-tower-like theoretical thing nearly never useful in real code) and use the event instead.

Producer code:
lock
if( IsListEmpty(List) )
KeSetEvent(Event)
InsertTailList
unlock

Consumer code:
for(;:wink:
{
KeWaitForSingleObject
lock
if( !IsListEmpty(List) )
{
RemoveHeadList
if( IsListEmpty )
KeResetEvent
}
unlock
}


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

thanks ~~~~
But really,I simplify my design in my post.
The cause I try to use semaphore:
There is one producer(EventReceiveDatagramHandler),and there are many consumer(the number of work threads==the number of CPUs).
according to classic theory,the semaphore is the best for this.But …

I think it’s a very common problem in driver developping,what’s the solution?Thanks

So what is the semaphore limiting? You cannot use it as an unbounded
counter, I think you’ve already learned that. You’ve limited the
number of worker threads to nCPUs. You have a spinlock to control
access to this N-1 bottlenecked queue. So what is the purpose of this
semaphore?

Ordering doesn’t seem to be an issue if you intend to have N threads
concurrently processing N requests, even if they picked them up in
some determined order. I vote for N lists.

Mark Roddy

On Fri, Dec 10, 2010 at 1:48 PM, wrote:
> thanks ~~~~
> But really,I simplify my design in my post.
> The cause I try to use semaphore:
> There is one producer(EventReceiveDatagramHandler),and there are many consumer(the number of work threads==the number of CPUs).
> according to classic theory,the semaphore is the best for this.But …
>
> I think it’s a very common problem in driver developping,what’s the solution?Thanks
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

not N Lists.
all datagram in one lists because they are equal to me.
If I use N lists, N threads is meaningless.I want speed,just speed,not order these datagram.
Of course,I can balance it in producer,it will speed it:
a datagram will insert the Counter%nCPUs list.and one work thread to one list.

That’s it.
not SMP,it is flow balance.

> I would just throw away this bizarre semaphore object (the ivory-tower-like theoretical thing nearly

never useful in real code) and use the event instead.

Well, semaphore in itself is not some kind of " ivory-tower-like theoretical thing" but quite useful construct
( BTW, what is known as "synch event"under Windows is just a binary semaphore anyway). The problem is that Windows happens to be the system that is…well, “not-so-well-suited for using classical semaphores”, so to say…

Anton Bassov

“If I use N lists, N threads is meaningless” - why? One list per
worker thread, one worker thread per cpu, one producer pushing
requests onto those lists as fast as it can.

perhaps we are saying the same thing and it is lost in translation?

Mark Roddy

On Fri, Dec 10, 2010 at 3:00 PM, wrote:
> not N Lists.
> all datagram in one lists because they are equal to me.
> If I use N lists, N threads is meaningless.I want speed,just speed,not order these datagram.
> Of course,I can balance it in producer,it will speed it:
> a datagram will insert the Counter%nCPUs list.and one work thread to one list.
>
> That’s it.
> not SMP,it is flow balance.
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

On Dec 10, 2010, at 6:19 AM, Maxim S. Shatskih wrote:

> Releasing a semaphore object causes the semaphore count to be augmented by the value of the
> Adjustment parameter.

I would just throw away this bizarre semaphore object (the ivory-tower-like theoretical thing nearly never useful in real code) and use the event instead.

Semaphore does have its use. Well still now, and way back when rail road network was switched by hand :slight_smile:

But the main point is - what is the pattern of access ?

OP said, one link list !

Semaphore is the best pattern, when there are some finite number of INDEPENDENT resources to be assigned to a humongous number of contending customers. A car rental place, for example.

Here depending on the access pattern, the whole list could be one single resource. If the access pattern is destruction/modification/update and just query. Then multiple-read single write is the pattern to be used.

On the other hand, if it is that anyone can modify/destruct/update. A binary lock of appropriate nature is needed.

-pro

> Semaphore is the best pattern, when there are some finite number of INDEPENDENT resources

to be assigned to a humongous number of contending customers. A car rental place, for example

Actually, the OP’s situation is a classical example of the one when semaphore is the most appropriate construct. The problem is that Windows has …well, let’s say " very specific" understanding of semaphore usage, effectively turning the entire concept into something almost unusable in the real life. No wonder Max speaks of them as of “the ivory-tower-like theoretical thing nearly never useful in real code” and offers to use events.

Assuming that the OP want to have a single producer and predefined number of consumers, he can provide some sort of custom implementation of counting semaphore on top of a binary one that can be done approx. the following way (‘event’ is synch event initialized to non-signaled state; ‘flag’ is local variable in producer and any instance of consumer; and ‘count’ is global variable):

Producer:

flag=0;

KeAcquireSpinlock(&spinlock,& irql);

AddItemToTheList();

count++;

if(count
KeReleaseSpinLock(&spinlock, irql);

if(flag)KeSetEvent(&event, …);

Consumer:

while(1)
{

KeWaitForSingleObject(&event,…);

KeAcquireSpinlock(&spinlock,& irql);

Item=RemoveItemFromTheList();

KeReleaseSpinLock(&spinlock, irql);

while(1)
{

DoEverythingThatHasToBeDoneWithRemovedItem(Item);

flag=0;

KeAcquireSpinlock(&spinlock,& irql);

count–;

if(count
else Item=RemoveItemFromTheList();

KeReleaseSpinLock(&spinlock, irql);

if(flag)break;

}

}

A bit convoluted approach, but this is Windows programming,after all…

> On the other hand, if it is that anyone can modify/destruct/update. A binary lock of appropriate
> nature is needed.

Sorry, but semaphore is not really a synchronization construct, although it can be used for this purpose in some situations. However, in some situations an additional lock is needed. In the OP’s one, once producer runs in atomic context and is not allowed to block, spinlock is needed…

Anton Bassov

I still believe there is no need to use semaphore. Reasons I mentioned earlier and I will explain, just need some time to put all these together. I’ve your code and Max’s code. I will look at them and come up with (if any issues), I see. In fact both code examples don’t look that what I’m thiniking …

Couple key points I will stress are –

  1. The resources are not bounded. Baring available memory. Or at least not deterministic number of them.

  2. Do we need to signal from the producer to the customer. I think not. But for yielding, if not done by OS may need it.

  3. There is that “if the list is not empty or not” is the only thing needed with an appropriate binary lock

Finally (a)fairness (b) starvation

Personally I don’t think Windows APIs are all that convoluted. It is some of us with lack of clear understanding of what to shoot for makes it harder than required. Here us may not include you or Max. But it sure includes OP, and
perhaps me too…

I just like the idea “What is/are absouletly requried and why” !

Assumption: There is just one link list. There is one or more consumer.

Best,
-pro

On Dec 12, 2010, at 8:18 AM, xxxxx@hotmail.com wrote:

> Semaphore is the best pattern, when there are some finite number of INDEPENDENT resources
> to be assigned to a humongous number of contending customers. A car rental place, for example

Actually, the OP’s situation is a classical example of the one when semaphore is the most appropriate construct. The problem is that Windows has …well, let’s say " very specific" understanding of semaphore usage, effectively turning the entire concept into something almost unusable in the real life. No wonder Max speaks of them as of “the ivory-tower-like theoretical thing nearly never useful in real code” and offers to use events.

Assuming that the OP want to have a single producer and predefined number of consumers, he can provide some sort of custom implementation of counting semaphore on top of a binary one that can be done approx. the following way (‘event’ is synch event initialized to non-signaled state; ‘flag’ is local variable in producer and any instance of consumer; and ‘count’ is global variable):

Producer:

flag=0;

KeAcquireSpinlock(&spinlock,& irql);

AddItemToTheList();

count++;

if(count>
>
> KeReleaseSpinLock(&spinlock, irql);
>
> if(flag)KeSetEvent(&event, …);
>
>
>
>
>
> Consumer:
>
>
> while(1)
> {
>
> KeWaitForSingleObject(&event,…);
>
> KeAcquireSpinlock(&spinlock,& irql);
>
> Item=RemoveItemFromTheList();
>
> KeReleaseSpinLock(&spinlock, irql);
>
>
>
> while(1)
> {
>
> DoEverythingThatHasToBeDoneWithRemovedItem(Item);
>
> flag=0;
>
> KeAcquireSpinlock(&spinlock,& irql);
>
> count–;
>
> if(count>
> else Item=RemoveItemFromTheList();
>
> KeReleaseSpinLock(&spinlock, irql);
>
> if(flag)break;
>
> }
>
>
> }
>
>
> A bit convoluted approach, but this is Windows programming,after all…
>
>
>
>
>> On the other hand, if it is that anyone can modify/destruct/update. A binary lock of appropriate
>> nature is needed.
>
> Sorry, but semaphore is not really a synchronization construct, although it can be used for this purpose in some situations. However, in some situations an additional lock is needed. In the OP’s one, once producer runs in atomic context and is not allowed to block, spinlock is needed…
>
>
> Anton Bassov
>
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> I still believe there is no need to use semaphore.

Well, as long as you mean Windows implementation of it I fully agree with you (although you could replace
binary semaphore in my code with the counting one - this is just the question of taste). However, if we were
speaking about a “classicall” implementation of semaphore it would be quite useful.More on it below

  1. There is that “if the list is not empty or not” is the only thing needed with an appropriate binary lock .

There is just no need for having more worker threads in the running state than currently outstanding requests that these workers are meant to process, right. This is why semaphore would be just a perfect construct in this situation if Windows handled it the wayl other OSes do. Once it does not, we either have to find a way (convoluted one) to implement it ourselves or to take the approach Max does. In his implementation you may end up with a situation when multiple threads contend for a spinlock…only in order to find out that they have nothing to do at the moment and go blocking after having released spinlock (because, judging from his code, this is notification event, rather than synchronization one, so that it does not get reset when waiting thread gets released). I am already not mentioning calls to KeSetEvent() and KeResetEvent() while holding a spinlock, which does not seem to be particularly efficient approach either…

Anton Bassov

You may have point here, that I will look later ( and will open up a thread on ntTalk ).

At a very minimum, all it is needed is an appropriate lock ( otherwise crash sooner or later )

Then there is a need for signaling, otherwise consumers will be polling for work and will not find any work - hence go thru the scheduler cycle of bring it up, then put to wait for nothing productive. The question would be when an event is signalled, is there a fairness about who gets to wake up first… (Also event signaling should be
after releasing the lock )

-pro

other than that, it is fairly simple implementation for OP, once thought thru!
On Dec 12, 2010, at 10:51 AM, xxxxx@hotmail.com wrote:

> I still believe there is no need to use semaphore.

Well, as long as you mean Windows implementation of it I fully agree with you (although you could replace
binary semaphore in my code with the counting one - this is just the question of taste). However, if we were
speaking about a “classicall” implementation of semaphore it would be quite useful.More on it below

> 3) There is that “if the list is not empty or not” is the only thing needed with an appropriate binary lock .

There is just no need for having more worker threads in the running state than currently outstanding requests that these workers are meant to process, right. This is why semaphore would be just a perfect construct in this situation if Windows handled it the wayl other OSes do. Once it does not, we either have to find a way (convoluted one) to implement it ourselves or to take the approach Max does. In his implementation you may end up with a situation when multiple threads contend for a spinlock…only in order to find out that they have nothing to do at the moment and go blocking after having released spinlock (because, judging from his code, this is notification event, rather than synchronization one, so that it does not get reset when waiting thread gets released). I am already not mentioning calls to KeSetEvent() and KeResetEvent() while holding a spinlock, which does not seem to be particularly efficient approach either…

Anton Bassov


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> according to classic theory,the semaphore is the best for this.

For me, semaphore is just plain unconvinient.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> Assuming that the OP want to have a single producer and predefined number of consumers, he can

provide some sort of custom implementation of counting semaphore

No need to ever count anything.

The pseudo-code provided by me above in this conversation is the easiest and thus the best way of doing this.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

>problem is that Windows happens to be the system that is

Windows kernel has the queue object - KeRemoveQueue etc.

It is a combination of event and list head, where event is maintained signaled when the list is not empty and non-signaled when the list is empty.

This object suits the task 100%. My pseudo-code above is the implementation of such queue.

The Windows queue is used as the internals to implement a) IOCPs b) IoQueueWorkItem (more so, there is user mode QueueUserWorkItem which actually uses IOCPs i.e. kernel queue).

Unfortunately, it is undocumented and thus people are forced to re-implement it.

In user mode, though, you still can uses this queue by Get/PostQueuedCompletionStatus.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

>that it does not get reset when waiting thread gets released). I am already not mentioning calls to

KeSetEvent() and KeResetEvent() while holding a spinlock

Any semaphore implementation would do the same.

You can “optimize” the things a bit by using dispatcher spinlock to guard the list - i.e. KeSetEvent with “retain the lock” parameter set to TRUE. But this will require global lock to be held for all queues in the system, which is maybe even worse.

If one is really smart - he/she can possibly implement this queue as lock-free structure, but I have doubts it will provide noticeable gain - the spinlock contention here is small, and lock-free requires InterlockedXxx anyway.

Queued spinlocks are also good.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

>problem is that Windows happens to be the system that is…

One more thing to add: regardless of the semaphore implementation (which is bad in Windows only because Limit is mandatory, limit-less semaphores could be useful) you will need to guard the list itself with some lock.

So, you have 2 locks - 1 dispatcher lock in the semaphore implementation, 2 to guard the list.

The only solution to avoid this is to use the dispatcher lock to guard the queue itself, I expect KeInsert/RemoveQueue to do this.

But this queue is not a semaphore.

So yes, the semaphore itself is useless object in practice. Yes, it serializes access to some items in some container, but the container itself must also be guarded by the lock, so, 2 locks.

Yes, passing the semaphore really means “there are free objects in the container”, but there is no information on what of these objects are free.

To get this information, you will also need to have the second lock (or guard the container with the dispatcher lock), and, in this case, there is no need in a semaphore, notification event with the meaning of “there is at least 1 free object” will suite.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Windows is not Unix, Windows semaphores are not posix semaphores.

– pa