Question about Workitems.

Hi,

Does anybody know whether either of the work item functions, IoAllocateWorkItem() or IoQueueWorkItem(), increments the reference count on the corresponding device object, and so ensures the object cannot be deleted before the work item worker routine runs? If they do, then presumably IoFreeWorkItem() decrements it.

Any help much appreciated.

Chris Kelly

IoQueueWorkItem references the underlying device object.

Peter
OSR

IoQueueWorkItem increments the reference count. IoAllocateWorkItem does
not.

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I don’t know about DEVICE_OBJECT, but the one thing I can tell you for sure (I learnt it the hard way) is that your driver may get unloaded while a workitem is still outstanding. If that happens, you BSOD with an error telling you that your driver got unloaded without having completed outstanding operations. To deal with the problem that you have described, you can just manually increment the refcount on your target DEVICE_OBJECT before queueing a workitem (you can call ObReferenceObject() at DISPATCH_LEVEL), and make your workitem routine decrement it. However, that does not solve the problem I have described. Please note that a non-zero refcount just keeps DEVICE_OBJECT and DRIVER_OBJECT in RAM, but it does not prevent DrvUnload() from getting called - after it gets called, your driver is, for practical purposes, already gone…

In other words, be careful when dealing with workitems…

Anton Bassov

> you can just manually increment refcount on your target
> DEVICE_OBJECT before queueing a workitem (you can call ObReferenceObject() at
> DISPATCH_LEVEL), and make your workitem routine decrement it

Not good enough, and, IIRC (Oney?), it was a bug in NT: if the work item has
already decremented the refcount but has not returned just yet - and this can be true
only for a really short period of time - you are back to square one.

I want to make sure the distinction is clear: DrvUnload being called and
the image being unloaded are 2 separate events, like Anton alluded to.
DrvUnload is called when the last of your device objects has been
deleted via IoDeleteDevice. Deletion of a device does not necessarily
mean that its ref count is zero, just that the device has been marked as
deleted. The image itself will only be unloaded from memory when the
last reference count of the driver’s driver or device objects reaches
zero. As soon as it hits zero, the image can be unloaded before the
call to ObDereferenceObject returns. So if you hold on to the last
reference to your driver within your driver, you cannot return after
the call to ObDereferenceObject, because you will be executing code that
is no longer loaded.

The whole point of the Io work items vs the Ex work items is that this
problem is taken care of for you by the kernel. You need an entity
outside of your driver to hold onto the last reference so that the code
after the ObDereferenceObject call is still valid.
IoSetCompletionRoutineEx fulfils the same goal, except for completion
routines.

Anton, the problems you state below are associated with Ex work items,
not Io work items. Are you sure that you have seen the BSOD with an io
work item pending?

d

>> you can just manually increment refcount on your target
>> DEVICE_OBJECT before queueing a workitem (you can call ObReferenceObject() at
>> DISPATCH_LEVEL), and make your workitem routine decrement it
>
> Not good enough, and, IIRC (Oney?), it was a bug in NT: if the work item has
> already decremented the refcount but has not returned just yet - and this can be
> true only for a really short period of time - you are back to square one.

As follows from Peter’s and Mark’s posts, this step is simply unnecessary. However, if it were necessary, the approach I suggested would work fine, at least in the context of the question the OP asked. Certainly, you would have to call ObDereferenceObject() as the very last step (actually, not call but jump), plus set up the stack in such a way that execution goes directly to the address your workitem routine is supposed to return control to (unfortunately, since the x64 compiler does not support inline assembly, you would have to write a separate ASM routine if you want your code to work on both x86 and x64). In any case, all this “hackery” is simply not needed here - according to Peter’s and Mark’s posts, the system takes care of everything itself.

I just changed the subject and pointed out that protecting yourself against the deletion of the target DEVICE_OBJECT is just a part of the problem. If DrvUnload() gets invoked while some work items are still queued, the operations that these outstanding workitems have to perform may well be problematic - after all, DrvUnload() is supposed to be the final stage of a driver’s lifetime, so that after DrvUnload() returns, your driver is, for practical purposes, already gone, even if its image is still in RAM. In my experience, waiting on synch events in DrvUnload() does not solve the problem…

To solve a problem like that, you just have to redesign your driver in such a way that DrvUnload() gets called only if no workitems are outstanding. Sometimes that is easy, but sometimes it is not.

Anton Bassov

Doron,

> Anton, the problems you state below are associated with Ex work items,
> not Io work items. Are you sure that you have seen the BSOD with an io
> work item pending?

I am absolutely sure it was an IoAllocateWorkItem() - IoQueueWorkItemEx() sequence, and I was using it in my NDIS 6 LWF …

Anton Bassov

Yes, there is still the problem that the work item may run after DrvUnload
has run, and the environment the work item is running in must be able to
handle this. If you don’t want this to happen, you must track the work
items you allocate and queue and wait for them to drain before deleting
the device object. KMDF does this, for instance.

d
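
A minimal sketch of the kind of tracking described above, written as a portable C11 user-mode model rather than real driver code. Every name in it (WORKITEM_TRACKER, TrackerOnQueue and so on) is hypothetical, and the choice to start the count at 1 and drop that bias on the removal path is an assumption of this sketch, not something stated in the thread. In a driver, the counter would presumably be maintained with InterlockedIncrement()/InterlockedDecrement(), and the "drained" flag would be a KEVENT that the removal path waits on before calling IoDeleteDevice().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; none of these names come from the thread. */
typedef struct {
    atomic_long outstanding; /* queued-but-unfinished work items, plus a bias of 1 */
    atomic_bool drained;     /* stands in for a KEVENT signalled when the count hits 0 */
} WORKITEM_TRACKER;

static void TrackerInit(WORKITEM_TRACKER *t)
{
    atomic_init(&t->outstanding, 1);  /* the bias is dropped by TrackerBeginDrain() */
    atomic_init(&t->drained, false);
}

/* Drop one count; whoever drops the last one signals "drained". */
static void TrackerRelease(WORKITEM_TRACKER *t)
{
    long prev = atomic_fetch_sub(&t->outstanding, 1);
    assert(prev > 0);                 /* releasing more than was taken is a bug */
    if (prev == 1)
        atomic_store(&t->drained, true);
}

/* Call immediately before queueing a work item. */
static void TrackerOnQueue(WORKITEM_TRACKER *t)
{
    long prev = atomic_fetch_add(&t->outstanding, 1);
    assert(prev > 0);                 /* queueing after the count hit zero is a bug */
}

/* Call as the last step of the work item routine. */
static void TrackerOnComplete(WORKITEM_TRACKER *t)
{
    TrackerRelease(t);
}

/* Call on the removal path, before deleting the device object.
 * Returns true if nothing is outstanding; otherwise the caller must wait
 * until "drained" becomes true (a driver would block on the event here). */
static bool TrackerBeginDrain(WORKITEM_TRACKER *t)
{
    TrackerRelease(t);
    return atomic_load(&t->drained);
}
```

The bias means the count cannot reach zero, and so "drained" cannot be signalled, until the removal path has asked for a drain, even if work items come and go many times before that.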

I have a hard time believing that. The whole point of the Io routines
is that they keep the image loaded until the queued work item has
returned back to the NT work item scheduling code. I have debugged work
items that were holding the last reference on the driver, and things
unwound properly without a bugcheck (and yes, it was under verifier).

d

Doron,

> I have a hard time believing that.

Believe it or not, this was happening under Vista RTM in January 2007. Perhaps we have just accidentally discovered a bug in Vista?

Anton Bassov

Thanks guys - this exchange has been really useful - I much appreciate the trouble you have taken. It seems to me the safest way (other than using KMDF - which I will use for this driver eventually) is to do as Doron suggests: track the workitems and only delete the device object when there are none outstanding.

Chris Kelly

Anton, it could be an OS bug. It could also be a driver bug. This is the first reported instance of an Io work item being claimed as still pending after the driver has unloaded. For instance, you could be getting into a race where you are incrementing the object ref count from 0->1; in this situation Io work items do not protect you. You should look at your work item code, especially the one that the OS claimed was still queued when you unloaded, to make sure you do not hit this type of situation. A repro with a callstack/dump would be a great help in figuring out what the issue is here.

thx
d
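
To illustrate the 0->1 race mentioned above: once a count has reached zero, teardown may already be under way, so incrementing it again does not bring the object back. The sketch below is a portable C11 model on a driver-private counter, not on the object manager's own reference count. The names TryAcquire and Release are hypothetical, and the compare-and-swap loop is just one common way to refuse the 0->1 transition, not something prescribed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is still non-zero. */
static bool TryAcquire(atomic_long *count)
{
    long cur = atomic_load(count);
    while (cur > 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(count, &cur, cur + 1))
            return true;
    }
    return false;  /* the count already hit zero: do not resurrect it */
}

/* Drop a reference; returns true if this was the last one. */
static bool Release(atomic_long *count)
{
    long prev = atomic_fetch_sub(count, 1);
    assert(prev > 0);  /* releasing more than was acquired is a bug */
    return prev == 1;
}
```

Under this model, code that wants to queue more work calls TryAcquire() first and simply skips the work when it fails, instead of assuming the increment always succeeds.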

Doron,

Basically, the situation was as follows:

I had an unloadable LWF that was queueing workitems upon sends and receives of data. IoAllocateWorkItem() was using the DEVICE_OBJECT that was created by NdisRegisterDeviceEx(), and IoQueueWorkItemEx() was called in the context of the FilterSendNetBufferLists() and FilterReceiveNetBufferLists() routines.

Under normal circumstances, it all worked fine - I could unload my driver without a problem. However, when I was doing it during a large upload/download (i.e. when the flow of incoming and outgoing packets was really heavy), I was crashing…

At this point I had 2 hypotheses:

  1. My FilterSendNetBufferListsComplete() or FilterReturnNetBufferLists() was being invoked after the driver had already been unloaded.

  2. My workitem routine was being invoked after the driver had already been unloaded.

Therefore, I commented out the lines that call IoQueueWorkItemEx(), and introduced reference counting on packets and an “IsUnloading” flag in the filter extension (I took advantage of the fact that a driver cannot get unloaded until the filter enters the paused state). I hope it is needless to say that all access to the refcount and the IsUnloading flag was guarded by a spinlock.

Unless the “IsUnloading” flag was set, my FilterSendNetBufferLists() and FilterReceiveNetBufferLists() incremented the total count before sending or indicating packets (in the latter case, only if NDIS_STATUS_RESOURCES was not set), and FilterSendNetBufferListsComplete() and FilterReturnNetBufferLists() decremented it. FilterPause() checked the count - if it was non-zero, it set the “IsUnloading” flag and returned NDIS_STATUS_PENDING. If the “IsUnloading” flag was set, my FilterSendNetBufferLists() and FilterReceiveNetBufferLists() did not proceed with sending or indicating packets, and checked the total count. If it was zero, they called NdisFPauseComplete() before returning, and, at this point, my filter entered the paused state.

At this point everything started working perfectly well - I could unload the driver without a problem, no matter how busy the network traffic was. However, when I uncommented the lines that queued a workitem, I was crashing again.

Therefore, I extended the same counting logic to workitems - I incremented the count before queueing a workitem, and the workitem routine decremented it, plus I made sure that NdisFPauseComplete() could get called if and only if the refcount for workitems was zero. At this point, again, everything started working perfectly well - I could unload the driver without a problem, no matter how busy the network traffic was.

Anton Bassov
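
The counting scheme described above can be modelled roughly as follows. This is a portable C11 sketch, not NDIS code: FILTER_STATE, BeginOperation, EndOperation and RequestPause are hypothetical names, an atomic_flag stands in for the spinlock, and a single counter covers both packets and workitems. One deliberate simplification, which is an assumption of this sketch: the pause is completed from the decrement path (EndOperation reports when the caller should complete it) rather than from the send/receive handlers as in the description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the filter extension. */
typedef struct {
    atomic_flag lock;        /* stands in for the spinlock */
    long        inFlight;    /* packets in flight plus queued workitems */
    bool        isUnloading; /* set once a pause is pending */
} FILTER_STATE;

static void Lock(FILTER_STATE *f)   { while (atomic_flag_test_and_set(&f->lock)) { } }
static void Unlock(FILTER_STATE *f) { atomic_flag_clear(&f->lock); }

static void FilterStateInit(FILTER_STATE *f)
{
    atomic_flag_clear(&f->lock);
    f->inFlight = 0;
    f->isUnloading = false;
}

/* Send/receive path, or just before queueing a workitem.
 * Returns false if a pause is pending and no new work may start. */
static bool BeginOperation(FILTER_STATE *f)
{
    Lock(f);
    bool ok = !f->isUnloading;
    if (ok)
        f->inFlight++;
    Unlock(f);
    return ok;
}

/* Completion/return path, or the end of the workitem routine.
 * Returns true if the caller should now complete the pending pause
 * (a real filter would call NdisFPauseComplete() after dropping the lock). */
static bool EndOperation(FILTER_STATE *f)
{
    Lock(f);
    assert(f->inFlight > 0);
    f->inFlight--;
    bool completePause = f->isUnloading && f->inFlight == 0;
    Unlock(f);
    return completePause;
}

/* Pause handler. Returns true if the pause can complete immediately,
 * false if it must be pended (NDIS_STATUS_PENDING in a real filter). */
static bool RequestPause(FILTER_STATE *f)
{
    Lock(f);
    bool idle = (f->inFlight == 0);
    if (!idle)
        f->isUnloading = true;  /* as described: set only when work is outstanding */
    Unlock(f);
    return idle;
}
```

With this shape, a pause requested while two operations are in flight is pended, further BeginOperation() calls are refused, and the second EndOperation() is the one that reports the pause can be completed.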