private ioctl to filter fails under load

Here’s a really odd one… My filter has a Ring 3 companion component.
During certain filtered file operations, the filter signals the Ring 3
piece, which fetches some data from the filter via an ioctl and then
provides a reply with a different ioctl call. On receipt of the second
ioctl, the filter signals itself to restart the thread that was waiting on
the Ring 3 reply.

I heeded the warning about waiting in kernel mode on Ring 3 computation by
creating a worker thread specifically to handle the case of PAGING_IO reads
(cannot wait, right?). In this case, the Read entry point winds up
returning STATUS_PENDING, and the worker thread, when awakened, finishes
the processing and completes the IRP later.
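
In outline, the paging-read path looks roughly like this (a sketch only;
names such as g_WorkQueue and FilterRead are illustrative, not my actual
code):

    #include <ntifs.h>

    /* Illustrative names only. A dedicated worker thread, created at
       DriverEntry with PsCreateSystemThread, waits on g_WorkQueueEvent. */
    extern LIST_ENTRY     g_WorkQueue;      /* protected by g_WorkQueueLock */
    extern KSPIN_LOCK     g_WorkQueueLock;
    extern KEVENT         g_WorkQueueEvent;
    extern PDEVICE_OBJECT g_LowerDevice;    /* set at attach time */

    NTSTATUS FilterRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        KIRQL oldIrql;

        UNREFERENCED_PARAMETER(DeviceObject);

        if (Irp->Flags & IRP_PAGING_IO) {
            /* Mark the IRP pending BEFORE the worker can complete it,
               since this routine returns STATUS_PENDING. */
            IoMarkIrpPending(Irp);

            KeAcquireSpinLock(&g_WorkQueueLock, &oldIrql);
            InsertTailList(&g_WorkQueue, &Irp->Tail.Overlay.ListEntry);
            KeReleaseSpinLock(&g_WorkQueueLock, oldIrql);

            /* Wake the worker; it signals Ring 3, waits for the reply,
               and completes the IRP at PASSIVE_LEVEL. */
            KeSetEvent(&g_WorkQueueEvent, IO_NO_INCREMENT, FALSE);
            return STATUS_PENDING;
        }

        /* Non-paging reads pass straight through. */
        IoSkipCurrentIrpStackLocation(Irp);
        return IoCallDriver(g_LowerDevice, Irp);
    }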

This works great 99.9% of the time. When the system is under load
(multiple target files being filtered simultaneously) it fails. What
happens is that my worker thread never receives the signal from the ioctl
and so times out. My logs indicate that the second ioctl was, in fact,
never received, yet the Ring 3 companion log shows a successful return
from the ioctl.

I made a patch to the Ring 3 component to check for a specific DWORD value
returned by the filter handling the second ioctl. When the error occurs,
that value is wrong.

So… Ring 3 made the ioctl call and its return code was success, but the
filter driver never received the ioctl, so the wait times out. While the timeout
interval is elapsing, I see NO additional file system I/O of any kind
through the filter and, other than mouse movement, the system is
unresponsive. When the timeout ends, the system returns to normal. This
always happens when a PAGING_IO read is being processed.

Any ideas where my ioctl is going/hanging up? I’m stymied.

Thanks,
Bill

Bill: first of all, I’m not sure that your model is safe, because by
returning STATUS_PENDING you simply “move” the wait higher in the stack.
The calling thread will still be blocked while you call your UM
component. Depending on what you do in UM you might be safe, but you
might also end up deadlocking, because the FileObject for which you see
the paging read is almost certainly locked. If you do the same with
paging writes, things are even worse, because many more locks could be
held…
Anyway, regarding a “hole” in your ioctl, I would suggest taking a look
at the queue of filter requests to make sure that it’s empty when you
see a “lost” ioctl. It could be (I recall having the same problem
myself) that under heavy load you get into a condition where you clear
the event while there are still filter request(s) in the queue. As I
remember, event-based KM-UM communication is not as simple as just
queuing the request and then signaling the event. Take a look at who is
setting the event (the driver, I presume) and who is clearing it. If it
is UM that clears it, then chances are you’re missing some requests in
the driver queue. So, I would suggest looking in this direction…

Vladimir,
Thanks for the input. I am indeed concerned about the possibility of moving
the wait higher in the stack, though I would think that the I/O Manager
would be designed for that and not become a chokepoint on a STATUS_PENDING
return. Isn’t that the point of STATUS_PENDING? If not, what good is it?

The lost ioctl is indeed lost… The Ring 3 app made the first ioctl call
to fetch the data that the filter signalled was available; I see that call
complete and my filter acts on it. The Ring 3 app ‘thinks’ that it has made
the second call (to return the response) and, in fact, it did receive a
successful rc from the system. The problem is that my filter entry point
for that ioctl never got called. I can see this via my log, via KdPrints,
via breakpoint, and via the fact that the waiting worker thread times out
without awaking. The event I use to signal from the ioctl to the waiting
worker thread is a private event, initialized in the worker thread, unique
to the context it is working on, and signalled from within my own program,
not from Ring 3 - no clearing is done, nor is any necessary. The problem is
that the ioctl never got delivered to me, even though the system told the
Ring 3 app that the ioctl was completed successfully, and so the resulting
signal to the worker thread never got issued. This is further supported by
the fact that the check data I should be copying out in that second ioctl to
the Ring 3 app is not correct, again despite the success rc from
DeviceIoControl(). That is the puzzle. Why does the system tell the caller
of DeviceIoControl() that it succeeded when it clearly did not?

The user mode program is not accessing the file for which the outstanding
read is ‘pending’. It need only consult some of its own internal data
before issuing the response. Further, the app never ‘signals’ the filter,
except by calling an ioctl. I laid things out that way so as to eliminate
the chance of missing a UM-generated signal. Only the filter performs a
KeSetEvent() to tell the UM program to come and get some data.
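
In outline, the arrangement is something like this (a sketch only; the
structure and routine names are illustrative, not my actual code):

    #include <ntifs.h>

    /* Illustrative structure. One context per outstanding request;
       because each event is private to its context and waited on exactly
       once, no clearing is ever needed. */
    typedef struct _PENDING_REQUEST {
        LIST_ENTRY Link;
        ULONG      RequestId;
        KEVENT     ReplyEvent;   /* set only by the reply-ioctl handler */
        /* ...request and reply payload... */
    } PENDING_REQUEST, *PPENDING_REQUEST;

    /* Worker thread, per request: */
    NTSTATUS WaitForRing3Reply(PPENDING_REQUEST Req)
    {
        LARGE_INTEGER timeout;

        KeInitializeEvent(&Req->ReplyEvent, NotificationEvent, FALSE);
        /* ...queue Req, then KeSetEvent() the shared "data available"
           event so the Ring 3 app comes and pulls it... */

        timeout.QuadPart = -30LL * 10 * 1000 * 1000;  /* 30 s, relative */

        /* STATUS_TIMEOUT from this wait is the failure under load. */
        return KeWaitForSingleObject(&Req->ReplyEvent, Executive,
                                     KernelMode, FALSE, &timeout);
    }

    /* Reply-ioctl handler, after finding the matching context: */
    VOID SignalReply(PPENDING_REQUEST Req)
    {
        KeSetEvent(&Req->ReplyEvent, IO_NO_INCREMENT, FALSE);
    }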

Bill

Is the IRP completing before you mark it pending? This is a typical problem
people do not account for in this sort of filtering…

Just a thought.

Jamey

Jamey,
Good question. I have two answers…

  1. How would that interfere with the delivery of a private ioctl from my
    user mode companion to my filter?

  2. I don’t think so. Since I’m doing decryption, I have to IoCallDriver()
    first. But… I set a completion routine which returns
    STATUS_MORE_PROCESSING_REQUIRED and signals back to the read dispatch
    routine, where I determine whether to do decryption or not (the pattern
    is sketched below). As long as I’ve returned properly from the
    completion routine, it’s not possible for the IRP to be completed yet,
    is it?
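
Roughly, the shape of that path is this (a sketch only, with illustrative
names, and simplified to a plain forward-and-wait; the real paging path
hands off to the worker instead of waiting inline):

    #include <ntifs.h>

    static NTSTATUS ReadDownComplete(PDEVICE_OBJECT DeviceObject,
                                     PIRP Irp, PVOID Context)
    {
        UNREFERENCED_PARAMETER(DeviceObject);
        UNREFERENCED_PARAMETER(Irp);

        /* Halt completion here; the IRP cannot finish completing
           until we explicitly complete it ourselves. */
        KeSetEvent((PKEVENT)Context, IO_NO_INCREMENT, FALSE);
        return STATUS_MORE_PROCESSING_REQUIRED;
    }

    NTSTATUS ForwardAndMaybeDecrypt(PDEVICE_OBJECT LowerDevice, PIRP Irp)
    {
        KEVENT   done;
        NTSTATUS status;

        KeInitializeEvent(&done, NotificationEvent, FALSE);
        IoCopyCurrentIrpStackLocationToNext(Irp);
        IoSetCompletionRoutine(Irp, ReadDownComplete, &done,
                               TRUE, TRUE, TRUE);
        IoCallDriver(LowerDevice, Irp);
        KeWaitForSingleObject(&done, Executive, KernelMode, FALSE, NULL);

        /* ...examine/decrypt the read data here... */

        status = Irp->IoStatus.Status;   /* capture before completing */
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return status;
    }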

Thanks,
Bill

The whole benefit of STATUS_PENDING comes when the caller uses
overlapped I/O and “completion ports”, in which case the I/O mgr doesn’t
wait for completion and returns control to the caller so the thread can
continue its job. But that’s not the case for paging I/O, where the
thread cannot continue until the page fault is satisfied. So this I/O
must be “synchronous” for this thread, and the I/O mgr will wait for the
request to be completed.
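
For contrast, here is the user-mode case where STATUS_PENDING does pay
off (a sketch; the device name and IOCTL code are invented for
illustration):

    #include <windows.h>
    #include <stdio.h>

    /* Invented device name and IOCTL code, for illustration only. */
    #define IOCTL_MYFILTER_PULL \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

    int main(void)
    {
        BYTE       buf[512];
        DWORD      bytes = 0;
        OVERLAPPED ov = { 0 };

        HANDLE h = CreateFileW(L"\\\\.\\MyFilter",
                               GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);

        /* On an overlapped handle, STATUS_PENDING surfaces as
           ERROR_IO_PENDING and the calling thread keeps running. */
        if (!DeviceIoControl(h, IOCTL_MYFILTER_PULL, NULL, 0,
                             buf, sizeof(buf), &bytes, &ov)) {
            if (GetLastError() != ERROR_IO_PENDING)
                return 1;
            /* ...the thread is free to do other work here... */
            GetOverlappedResult(h, &ov, &bytes, TRUE);  /* now wait */
        }

        printf("pulled %lu bytes\n", bytes);
        CloseHandle(ov.hEvent);
        CloseHandle(h);
        return 0;
    }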

Let’s go back one step. As I understand it, the mechanism that you use to
communicate between KM and UM is like this:

  1. You have a KEVENT object shared across KM and UM
  2. In UM you have a thread that waits on this event
  3. In KM you have a queue of driver requests, and:
    a) after you add a request to the queue, you signal the event
    b) once awakened, the UM thread sends a [pull] ioctl, which pulls the
    request from the queue and copies the request data into the ioctl’s
    output buffer
    c) UM handles that request and sends another [complete] ioctl that
    signals completion of the request.

As I understand it, the lost ioctl that you’re talking about is the
complete ioctl, right? What I suspect is that there was no pull ioctl for
that request, and that’s the real reason why you don’t see the complete
ioctl for it. With multiple ioctls going on under heavy load, you may
overlook the single case where a pull ioctl is lost. And what you see as a
successful group of ioctls in the log could be ioctls related to other
requests. One of the reasons why a pull ioctl could be lost is the way you
set/clear that shared event. For example:

  1. Request 1 is added to the queue and the event is set.
  2. The UM thread awakens.
  3. Request 2 is added to the queue and the event is set.
  4. UM clears the event, pulls request 1, and completes it.
  5. UM goes back to wait and… keeps waiting, because the last operation
    on the event was “clear”.

It’s just the simplest illustration of how a pull ioctl can be lost. The
sequence of events can be different, but if you clear the event in UM
while setting it in KM, you’re exposed to the risk of clearing it just
after KM has set it. Even if you do both set and clear in KM, you have to
design some locking mechanism to make sure that the event is not cleared
while there are still requests in the queue.
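
To be concrete, here is the kind of race-proof pump loop I mean on the UM
side (a sketch; the handle, IOCTL codes, and REQUEST/REPLY types are
invented, and it assumes the driver’s pull handler returns zero bytes
when its queue is empty):

    #include <windows.h>

    /* Invented types and names, for illustration only. */
    #define IOCTL_PULL     CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, \
                                    METHOD_BUFFERED, FILE_ANY_ACCESS)
    #define IOCTL_COMPLETE CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, \
                                    METHOD_BUFFERED, FILE_ANY_ACCESS)

    typedef struct { ULONG Id; BYTE Data[256]; } REQUEST;
    typedef struct { ULONG Id; ULONG CheckValue; } REPLY;

    static HANDLE g_Filter;        /* opened with CreateFile elsewhere */
    static HANDLE g_RequestEvent;  /* UM handle to the shared event */

    static void HandleRequest(const REQUEST *Rq, REPLY *Rp)
    {
        Rp->Id = Rq->Id;           /* consult internal data, build reply */
        Rp->CheckValue = 0x1234;   /* the check DWORD goes back to KM */
    }

    DWORD WINAPI PumpThread(LPVOID Unused)
    {
        REQUEST req;
        REPLY   reply;
        DWORD   bytes;

        (void)Unused;
        for (;;) {
            WaitForSingleObject(g_RequestEvent, INFINITE);

            /* Drain the queue completely on every wake-up, so a KM
               SetEvent that raced with a clear cannot strand a request.
               (With a SynchronizationEvent, UM never clears at all.) */
            for (;;) {
                bytes = 0;
                if (!DeviceIoControl(g_Filter, IOCTL_PULL, NULL, 0,
                                     &req, sizeof(req), &bytes, NULL)
                    || bytes == 0)
                    break;         /* queue empty; go back to waiting */

                HandleRequest(&req, &reply);
                DeviceIoControl(g_Filter, IOCTL_COMPLETE,
                                &reply, sizeof(reply), NULL, 0,
                                &bytes, NULL);
            }
        }
    }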

Of course it’s just a guess (based on my own experience) and the reality
could be different. You may simply have a bug in the ioctl handlers that
needs to be debugged :-) I just wanted to eliminate the possibility of
“known issues”…

You understand the mechanism correctly. It is the response ioctl, not the
pull ioctl, which is getting lost. Thanks again for making me think, but I
don’t think that explains it either…

When the system enters this ‘hung’ mode waiting for the expiration of the
worker thread’s timeout, the last entry in my log and KdPrint trail is the
“pull”. It happened, and I can see the results of it. The last entry in
the user mode companion app’s log is the reply (the one I never see).
Debug info emitted from that reply call to DeviceIoControl indicates that
the rc was SUCCESS, but the returned data clearly did not come from my
filter’s matching ioctl entry point (because I never saw the ioctl into my
filter).

What’s troubling, and makes me think there’s something to the first point
in your first reply, is that, while the worker thread’s timeout is
expiring, I see NO other file system activity passing through my filter.
The stress case we’re running is a simultaneous launch of six different
apps against six different files of interest to my filter. I would expect
to see other FS activity even if one thread/file is ‘waiting’.

Well, then only the last resort remains: debugging, debugging, debugging :-)
Nothing rings a bell anymore… Except exception :-) Could it be that
you have a __try/__except’ed ioctl handler in your driver that still
returns STATUS_SUCCESS even if an exception has occurred?
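
I mean an ioctl handler shaped like this (illustrative names; the marked
assignment is the one that must not be missing):

    #include <ntifs.h>

    /* Illustrative only. If the marked line were missing, the caller of
       DeviceIoControl() would see success while the handler's real work
       never happened. */
    NTSTATUS HandleReplyIoctl(PIRP Irp, PVOID OutBuf, ULONG OutLen,
                              const VOID *ReplyData)
    {
        NTSTATUS status = STATUS_SUCCESS;

        __try {
            ProbeForWrite(OutBuf, OutLen, sizeof(ULONG));  /* may raise */
            RtlCopyMemory(OutBuf, ReplyData, OutLen);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            status = GetExceptionCode();   /* <-- must not be missing */
        }

        Irp->IoStatus.Status = status;
        Irp->IoStatus.Information = NT_SUCCESS(status) ? OutLen : 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return status;
    }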

Sounds like the new and improved Steve Ballmer chant… Debugging,
debugging, debugging. ;-) That’s what I’ve been doing, but I’m running out
of ideas, hence my query to the group… I am very wary of what we are
doing, even though it works so well most of the time. I would suspect that
this kind of problem would ring a few bells somewhere… Tony?

Gotta give a no on the exception idea. The only relevant __try/__except in
there is not in the failing path at all (plus I log it and would see it
anyway).
