The whole benefit of STATUS_PENDING comes when the caller uses
overlapped I/O or “completion ports”, in which case the I/O manager doesn’t
wait for completion and returns control to the caller, so the thread can
continue its work. But that’s not the case for paging I/O, where the thread
cannot continue until the page fault is satisfied. So this I/O must be
“synchronous” for that thread, and the I/O manager will wait for the request
to complete.
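For contrast, here is a minimal user-mode sketch of what STATUS_PENDING buys
an overlapped caller (assumptions: the file was opened with
FILE_FLAG_OVERLAPPED, and OverlappedRead is an illustrative name, not
anything from your code):

    #include <windows.h>

    /* ReadFile returns immediately with ERROR_IO_PENDING; the thread is
       free to do other work until it chooses to wait on the result. */
    BOOL OverlappedRead(HANDLE hFile, void *buf, DWORD len, DWORD *bytesRead)
    {
        OVERLAPPED ov = {0};
        BOOL ok;

        ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL); /* manual-reset */
        if (ov.hEvent == NULL)
            return FALSE;

        if (!ReadFile(hFile, buf, len, NULL, &ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            CloseHandle(ov.hEvent);
            return FALSE;   /* failed outright, nothing pending */
        }

        /* ... the thread continues its job while the I/O is in flight ... */

        ok = GetOverlappedResult(hFile, &ov, bytesRead, TRUE); /* wait here */
        CloseHandle(ov.hEvent);
        return ok;
    }

A thread taking a page fault has no such option: it is stalled inside the
fault until the read completes, so pending the I/O buys it nothing.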
Let’s step back for a moment. As I understand it, the mechanism you use to
communicate between KM and UM is like this (a sketch of the UM side follows
the list):
- You have a KEVENT object shared across KM and UM
- In UM you have a thread that waits on this event
- In KM you have a queue of driver requests, and:
a) after you add a request to the queue, you signal the event
b) once awakened, the UM thread sends a [pull] ioctl, which pulls a request
from the queue and copies the request data into the ioctl’s output buffer
c) UM handles that request and sends another [complete] ioctl that
signals completion of the request.
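A minimal user-mode sketch of that pump (IOCTL_PULL_REQUEST, FILTER_REQUEST,
HandleRequest and PumpRequests are all hypothetical stand-ins for your own
names; the essential point is the drain loop after each wakeup):

    #include <windows.h>

    /* Hypothetical request payload and pull ioctl code. */
    typedef struct _FILTER_REQUEST { ULONG Id; /* ... */ } FILTER_REQUEST;
    #define IOCTL_PULL_REQUEST \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

    void HandleRequest(FILTER_REQUEST *req); /* does the work, sends [complete] */

    void PumpRequests(HANDLE hDevice, HANDLE hEvent)
    {
        FILTER_REQUEST req;
        DWORD bytes;

        for (;;) {
            WaitForSingleObject(hEvent, INFINITE);

            /* Drain until the driver reports an empty queue; never assume
               one wakeup == one request. */
            while (DeviceIoControl(hDevice, IOCTL_PULL_REQUEST,
                                   NULL, 0, &req, sizeof(req), &bytes, NULL)
                   && bytes != 0) {
                HandleRequest(&req);
            }
        }
    }

With the drain loop in place, a spurious wakeup is harmless and a coalesced
one loses nothing; it must be paired with careful set/clear handling on the
driver side, which is where things usually go wrong (more on that below).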
As I understand it, the lost ioctl you’re talking about is the complete
ioctl, right? What I suspect is that there was no pull ioctl for that
request, and that’s the real reason you don’t see the complete ioctl for it.
With multiple ioctls going on under heavy load, you may overlook the single
case where a pull ioctl is lost, and what you see as a successful group of
ioctls in the log could be ioctls related to other requests.
One of the reasons the pull ioctl could be lost is the way you set/clear
that shared event. For example:
- Request 1 is added to the queue and the event is set.
- The UM thread awakens.
- Request 2 is added to the queue and the event is set.
- UM clears the event, pulls request 1 and completes it.
- UM goes back to wait and… waits, leaving request 2 stranded, because the
last operation on the event was a “clear”.
It’s just the simplest illustration of how a pull ioctl can be lost. The
sequence of events can differ, but if you clear the event in UM while
setting it in KM, you’re exposed to the risk of clearing it just after KM
has set it. Even if you do both the set and the clear in KM, you have to
design some locking to make sure the event is not cleared while there are
still requests in the queue (see the sketch below).
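A minimal kernel-mode sketch of that set/clear-under-lock pattern
(FILTER_QUEUE, FILTER_REQUEST, QueueRequest and PullRequest are hypothetical
names; the invariant is that the event state and the queue state only change
together, under the same lock):

    #include <ntddk.h>

    typedef struct _FILTER_REQUEST {
        LIST_ENTRY Entry;
        /* ... payload ... */
    } FILTER_REQUEST, *PFILTER_REQUEST;

    typedef struct _FILTER_QUEUE {
        KSPIN_LOCK Lock;
        LIST_ENTRY Head;
        KEVENT     Event;   /* notification event, shared with UM */
    } FILTER_QUEUE, *PFILTER_QUEUE;

    VOID QueueRequest(PFILTER_QUEUE q, PFILTER_REQUEST req)
    {
        KIRQL irql;
        KeAcquireSpinLock(&q->Lock, &irql);
        InsertTailList(&q->Head, &req->Entry);
        KeSetEvent(&q->Event, IO_NO_INCREMENT, FALSE);
        KeReleaseSpinLock(&q->Lock, irql);
    }

    PFILTER_REQUEST PullRequest(PFILTER_QUEUE q)
    {
        KIRQL irql;
        PFILTER_REQUEST req = NULL;

        KeAcquireSpinLock(&q->Lock, &irql);
        if (!IsListEmpty(&q->Head)) {
            req = CONTAINING_RECORD(RemoveHeadList(&q->Head),
                                    FILTER_REQUEST, Entry);
        }
        if (IsListEmpty(&q->Head)) {
            KeClearEvent(&q->Event);   /* clear only when truly empty */
        }
        KeReleaseSpinLock(&q->Lock, irql);
        return req;
    }

With the clear done under the same lock that guards the list, the window
described above simply cannot open.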
Of course, it’s just a guess (based on my own experience) and the reality
could be different. You may simply have a bug in the ioctl handlers that
needs to be debugged. I just wanted to eliminate the possibility of
“known issues”…
-----Original Message-----
From: Bill Wilhelm [mailto:xxxxx@endwell.net]
Sent: Wednesday, April 28, 2004 9:41 AM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] private ioctl to filter fails under load
Vladimir,
Thanks for the input. I am indeed concerned about the possibility of moving
the wait higher in the stack, though I would think that the I/O Manager
would be designed for that and not become a chokepoint on a STATUS_PENDING
return. Isn’t that the point of STATUS_PENDING? If not, what good is it?
The lost ioctl is indeed lost… The Ring 3 app made the first ioctl call to
fetch the data that the filter signalled was available; I see that call
complete and my filter acts on it. The Ring 3 app ‘thinks’ it has made the
second call (to return the response) and, in fact, it did receive a
successful rc from the system. The problem is that my filter entry point for
that ioctl never got called. I can see this via my log, via KdPrints, via
breakpoint, and via the fact that the waiting worker thread times out
without being awakened. The event I use to signal from the ioctl to the
waiting worker thread is a private event, initialized in the worker thread,
unique to the context it is working on, and signalled from within my own
program, not from Ring 3; no clearing is done, nor is any necessary. The
problem is that the ioctl never got delivered to me, even though the system
told the Ring 3 app that the ioctl completed successfully, and so the
resulting signal to the worker thread never got issued. This is further
supported by the fact that the check data I should be copying out to the
Ring 3 app in that second ioctl is not correct, again despite the success rc
from DeviceIoControl().
That is the puzzle. Why does the system tell the caller of
DeviceIoControl() that it succeeded when it clearly did not?
The user mode program is not accessing the file for which the outstanding
read is ‘pending’. It needs only consult some of its own internal data
before issuing the response. Further, the app never ‘signals’ the filter,
except by calling an ioctl. I laid things out that way so as to eliminate
the chance of missing a UM-generated signal. Only the filter performs a
KeSetEvent() to tell the UM program to come and get some data.
Bill
“Vladimir Chtchetkine” wrote in message news:xxxxx@ntfsd…
Bill: first of all, I’m not sure that your model is safe, because by
returning STATUS_PENDING you simply “move” the wait higher in the stack. The
calling thread will still be blocked while you call your UM component.
Depending on what you do in UM you might be safe, but you might also end up
deadlocking, because the FileObject for which you see the paging read is
almost certainly locked. If you do the same with paging writes, things are
even worse, because there are many more locks that could be held…
Anyway, regarding the “hole” in your ioctl, I would suggest taking a look at
the queue of filter requests to make sure that it’s empty when you see a
“lost” ioctl. It could be (I recall having the same problem myself) that
under heavy load you get into a condition where you clear the event while
there are still filter request(s) in the queue. As I remember, event-based
KM-UM communication is not as simple as just queuing the request and then
signaling the event. Take a look at who is setting the event (I presume, the
driver) and who is clearing it. If it’s UM that clears it, then chances are
you’re missing some requests in the driver queue. So I would suggest looking
in this direction…
-----Original Message-----
From: Bill Wilhelm [mailto:xxxxx@endwell.net]
Sent: Wednesday, April 28, 2004 5:42 AM
To: Windows File Systems Devs Interest List
Subject: [ntfsd] private ioctl to filter fails under load
Here’s a really odd one… My filter has a Ring 3 companion component.
During certain filtered file operations, the filter signals the Ring 3
piece, which fetches some data from the filter via an ioctl and then
provides a reply with a different ioctl call. On receipt of the second
ioctl, the filter signals itself to restart the thread that was waiting on
the Ring 3 reply.
I heeded the warning about waiting in kernel mode on Ring 3 computation by
creating a worker thread specifically to handle the case of PAGING_IO reads
(cannot wait, right?). In this case, the Read entry point winds up returning
STATUS_PENDING and the worker thread, when awakened, completes the
processing and the IRP later.
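For reference, a minimal kernel-mode sketch of that pend-and-complete
pattern (QueueIrpForWorker is a hypothetical stand-in for however the IRP is
handed to the worker; error handling elided):

    #include <ntddk.h>

    VOID QueueIrpForWorker(PIRP Irp);  /* hypothetical hand-off to the worker */

    NTSTATUS FilterRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        UNREFERENCED_PARAMETER(DeviceObject);
        IoMarkIrpPending(Irp);    /* required before returning STATUS_PENDING */
        QueueIrpForWorker(Irp);
        return STATUS_PENDING;
    }

    /* Called by the worker thread once the Ring 3 reply has arrived. */
    VOID CompletePendedRead(PIRP Irp, NTSTATUS Status, ULONG_PTR BytesDone)
    {
        Irp->IoStatus.Status = Status;
        Irp->IoStatus.Information = BytesDone;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
    }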
This works great 99.9% of the time. When the system is under load (multiple
target files being filtered simultaneously) it fails. What happens is that
my worker thread never receives the signal from the ioctl and so times out.
My logs indicate that the second ioctl was, in fact, never received, yet the
Ring 3 companion log showed a successful return from the ioctl.
I made a patch to the Ring 3 component to check for a specific DWORD value
returned by the filter handling the second ioctl. When the error occurs,
that value is wrong.
So… Ring 3 made the ioctl call and its return code was success, but the
filter driver never received the ioctl and times out. While the timeout
interval is elapsing, I see NO additional file system I/O of any kind
through the filter and, other than mouse movement, the system is
unresponsive. When the timeout ends, the system returns to normal. This
always happens when a PAGING_IO read is being processed.
Any ideas where my ioctl is going/hanging up? I’m stymied.
Thanks,
Bill