Deadlock on kernel mode thread (IRP_MJ_CREATE hook) waiting for user mode

Hi,
I created a filter device that hooks create/cleanup requests and waits for
a user-mode pool of threads to process the file (fopen/fread/fclose). In
kernel mode I call KeWaitForSingleObject(event1,…), and event1 is
signaled when the user-mode thread has finished the job. It works fine, but
sometimes the code waits forever in the kernel-mode hook routine (event1
is never signaled), waiting for the user-mode thread to finish the job. In
fact, the user-mode thread never returns from the fopen() call.
Of course, my hook routine skips processing the file when it sees the PID
of my own user-mode process, so there is no problem with reentrancy.
What is happening, and how can this be avoided?
Thanks in advance!

Assuming the event mechanism you implemented for user<->kernel signalling is
correct most of the time (there is a good article on this in OSR’s NT Insider),
what is the reason that you may have to wait forever? Can you pick some
worst-case time to wait, then time out and handle the error if the wait
times out?

-prokash
----- Original Message -----
From: “Areana Mere”
To: “Windows File Systems Devs Interest List”
Sent: Tuesday, September 23, 2003 9:11 AM
Subject: [ntfsd] Deadlock on kernel mode thread (IRP_MJ_CREATE hook) waiting
for user mode


deadlock.

Hi,

Of course I wait with a timeout, but this does not help me. I do not want to
leave any file unprocessed.

If I remember correctly, in user space you are using thread pooling (more
than one precreated thread), and in the kernel you are waiting for threads to
exit and be signalled (assuming your KeWaitFor*() is keyed on thread ID!).

I think there is a bug lurking in your design. It would be better if you
just posted the design (not necessarily the code, if it is proprietary …);
someone will point you in the right direction in no time…

As Nick already mentioned, get the stack trace of the blocking thread from
WinDbg (or maybe the stack traces of the process with the most threads); then
you would probably have an answer quite fast.

For now:

  1. How are you waiting on the kernel side? On thread termination? If so, how do
    you pass the thread ID?

-prokash
----- Original Message -----
From: “Areana Mere”
To: “Windows File Systems Devs Interest List”
Sent: Wednesday, September 24, 2003 2:05 AM
Subject: [ntfsd] Re: Deadlock on kernel mode thread (IRP_MJ_CREATE hook)
waiting for user mode

> Hi,
>
> Thanks for the advice. I will do that. I am now able to reproduce the bug,
> but still cannot solve it. It happens every time I save a
> Microsoft Office file (.doc, .xls). It looks like a lock inside the
> OS kernel code. I found that by putting debug traces into all the
> functions I call.
> I noticed that many people use kernel <-> user synchronization in the
> IRP_MJ_CREATE hook routine. Has anybody else run into my problem?

Hi,

In kernel mode I am doing:

NTSTATUS IRP_MJ_CREATE_hook(…){

    thread_number = choose_user_thread_to_process();
    MUTEXACQUIRE( mutex1[thread_number] );

    // store the file name, to be retrieved via DeviceIoControl (DIOC)
    store_file_name(thread_number, filename);

    // signal a kernel/user shared event
    signal_scan_to_user_mode(thread_number);

    // negative value = relative timeout
    time = RtlConvertLongToLargeInteger(-MY_TIME);
    status = KeWaitForSingleObject(wait_event[thread_number], Executive, KernelMode,
                                   FALSE, &time);
    if (status == STATUS_TIMEOUT)
    {
        DbgPrint("WRONG!!!\n");
    }
    MUTEXRELEASE( mutex1[thread_number] );

}
The same goes for cleanup.

In user mode I wait for the shared event and get the file name using a
DeviceIoControl. When I am done, I send a DeviceIoControl in which my
driver simply sets the event wait_event[thread_number] corresponding
to the number of the service thread that processed the file.

This works fine, except that I get STATUS_TIMEOUT for some files opened over
the network and some files opened by MS Office. Those files open instantly
when my filter is not started.
I saw here that many people use this approach. I am sure that they ran into
this problem too; I do not believe they did not test for it. The question is
whether they shipped their products with this timeout or really solved the
problem.
As for me, I want this timeout never to occur. Please note that I’ve tried
different values for MY_TIME (1 sec. to 20 sec.).

Regards.
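
A side note on the timeout value used above: KeWaitForSingleObject takes its timeout in 100-nanosecond units, and a negative value means "relative to now". The post does not say how MY_TIME is defined, so the following is only a minimal sketch of the unit conversion in plain C; the helper name relative_timeout_100ns is hypothetical and not part of the original driver.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): converts a timeout in
 * milliseconds into the signed 64-bit value a kernel wait expects.
 * Units are 100 ns; a negative value means "relative to now". */
int64_t relative_timeout_100ns(int64_t milliseconds)
{
    const int64_t ticks_per_ms = 10000; /* 1 ms = 10,000 units of 100 ns */
    return -(milliseconds * ticks_per_ms);
}
```

In a driver the result would be assigned to the QuadPart member of the LARGE_INTEGER passed to the wait, for example time.QuadPart = relative_timeout_100ns(20000) for 20 seconds. If MY_TIME was given in some other unit, the wait would be much shorter or longer than intended, which is worth ruling out before chasing the deadlock itself.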

First of all, I don’t think it is a good idea to wait for an
event while you’re holding a mutex.
Second, where and how do you clear the events (especially the one that is
used in UM)? Consider this scenario, where your driver’s thread is T1 and
your UM thread is T2:

  1. T1 sets the event in the driver and goes to wait. T1 is suspended.
  2. The UM thread (T2) wakes up and sends the IOCTL.
  3. In the IOCTL handler, T2 signals the event on which your driver waits. T1
    gets activated, and T2 is suspended (very likely). T1 releases the mutex and
    returns from the hook.
  4. By that time there is another request that needs to be sent to UM, and
    it is going to rely on T2 for handling. So another driver thread (T3)
    goes through your IRP_MJ_CREATE_hook routine, sets the UM event and goes to
    wait (T3 gets suspended).
  5. T2 gets activated and… If you clear T2’s event at that point, T2 is
    going to wait on it “forever” and T3 is going to wake up on timeout.

So, look carefully at how you set/clear the events…
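
The five steps above can be replayed deterministically with a toy model. This is only a sketch in plain C: toy_event is a bare flag standing in for the real event object, the function names are made up for illustration, and nothing here is taken from the poster's driver or service. It contrasts a manual-reset style event that T2 clears late (step 5) with an auto-reset style event whose signal is consumed by the wait itself.

```c
#include <stdbool.h>

/* Toy stand-in for an event object: just a flag (hypothetical, not KEVENT). */
typedef struct { bool signaled; } toy_event;

static void toy_set(toy_event *e)   { e->signaled = true; }
static void toy_clear(toy_event *e) { e->signaled = false; }

/* Auto-reset style wait: succeeds once per signal and consumes it. */
static bool toy_wait_autoreset(toy_event *e)
{
    if (!e->signaled)
        return false;
    e->signaled = false;
    return true;
}

/* Manual-reset style wait: observes the flag without consuming it. */
static bool toy_wait_manual(const toy_event *e) { return e->signaled; }

/* Replays steps 1-5 in order. Returns true if T2 still sees T3's request. */
bool t2_sees_second_request(bool manual_reset_with_late_clear)
{
    toy_event scan_request = { false };

    toy_set(&scan_request);                      /* 1. T1 signals, then waits */

    if (manual_reset_with_late_clear)            /* 2. T2 wakes up            */
        (void)toy_wait_manual(&scan_request);    /*    (signal left set)      */
    else
        (void)toy_wait_autoreset(&scan_request); /*    (signal consumed)      */

    /* 3. T2 completes T1's request via IOCTL (not modeled), gets preempted. */

    toy_set(&scan_request);                      /* 4. T3 signals, then waits */

    if (manual_reset_with_late_clear) {
        toy_clear(&scan_request);                /* 5. late clear wipes out   */
        return toy_wait_manual(&scan_request);   /*    T3's signal as well    */
    }
    return toy_wait_autoreset(&scan_request);    /* 5. T3's signal is intact  */
}
```

With the late clear, T3's request is lost, so T2 sleeps and T3 can only leave its wait by timing out. When the wait itself consumes the signal, T3's request survives. Whether this is what actually happens in the poster's code depends on how the shared event is created and reset, which the thread does not show.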

-----Original Message-----
From: Areana Mere [mailto:xxxxx@yahoo.com]
Sent: Thursday, September 25, 2003 12:53 AM
To: Windows File Systems Devs Interest List
Subject: [ntfsd] Re: Deadlock on kernel mode thread (IRP_MJ_CREATE hook)
waiting for user mode


The information is certainly far from complete!

  1. What is the purpose of having a mutex to guard execution? A mutex is
    recursive, meaning that if it is already owned by the thread, there is no
    blocking wait.

  2. How is thread_number chosen? What if the KeWait*() further down
    is waiting on itself? I mean, who is supposed to signal that element
    of the event array? If it is the same thread from user mode, we are
    stuck.

  3. Who creates these event handles, and how are they made accessible to
    the other side? Are these events thread IDs (so that the signalled state
    means the thread exited properly)?

I would recommend reading as much as possible from the OSR NT Insider articles to
simplify the design.

Also read the debugging articles (in NT Insider) to find the exact cause of the
deadlock.

Also, only one event (the shared event) to signal the user side? What is that
for?

It seems the user threads were designed to have separation of
concerns, but I am not sure.

Finally, even if it works, I am afraid it will not get the benefit of a
threaded design if the mutex is not clearly thought out.

In particular, look at:

NT Insider Jan 1997 article on synchronization
NT Insider Jan-Feb 2002 on synchronization
NT Insider Jan-Feb 2001 (De)bugging deadlocks

Also, IIRC, there is an extension DLL with WinDbg 6.2.**** for deadlock
detection/analysis…
-prokash
----- Original Message -----
From: “Vladimir Chtchetkine”
To: “Windows File Systems Devs Interest List”
Sent: Thursday, September 25, 2003 9:01 AM
Subject: [ntfsd] Re: Deadlock on kernel mode thread (IRP_MJ_CREATE hook)
waiting for user mode
