ZwCreateFile in worker thread sometimes hangs

Hi,

In my driver I have registered a Load Image notification callback with PsSetCreateProcessNotifyRoutine.
In this callback I open the corresponding file to scan for VERSION_INFO etc. On windows 7 when this callback is called and the file is e.g. a DLL APCs are disabled. I then start a system worker thread to do the opening and short scan of the file.
This works fine except on some machines especially during boot of Windows. At some point (this varies) after login ZwCreateFile just hangs and does not return. And it does not hang on the same file every time and not at the same place in the login. Sometimes it only gets to the “Welcome” screen. Other times the desktop gets loaded.

Som of my code:
In the load image notify:

if (KeAreAllApcsDisabled()) {
pwork = IoAllocateWorkItem(device_obj);
KeInitializeEvent(&fi.kevent, NotificationEvent, FALSE);
fi.pFilename = pFilename;
IoQueueWorkItem(pwork, GetImageInfoWork, DelayedWorkQueue, &fi);
KeWaitForSingleObject(&fi.kevent, Executive, KernelMode, FALSE, NULL);

In GetImageInfoWork:
InitializeObjectAttributes(&attr, pFilename, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);
stat = ZwCreateFile(&hfile, FILE_READ_DATA | SYNCHRONIZE, &attr, &iostat, NULL, 0, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_OPEN, FILE_NON_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT, NULL, 0);

Any ideas?

Can you use PsSetCreateProcessNotifyRoutineEx instead?

The CreateProcessNotifyRoutineEx callback is in the context of the actual
new process instead of the context of the creating thread. It also offers on
a silver platter the full image path in a form that can be used to open the
image file. See MSDN documentation for CreateProcessNotifyRoutineEx.

I have found using CreateProcessNotifyRoutineEx to be much more “reliable”
than SetCreateProcessNotifyRoutine.

FWIW. Your mileage may vary.

Thomas F. Divine

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@yahoo.dk
Sent: Tuesday, June 4, 2013 10:35 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] ZwCreateFile in worker thread sometimes hangs

Hi,

In my driver I have registered a Load Image notification callback with
PsSetCreateProcessNotifyRoutine.
In this callback I open the corresponding file to scan for VERSION_INFO etc.
On windows 7 when this callback is called and the file is e.g. a DLL APCs
are disabled. I then start a system worker thread to do the opening and
short scan of the file.
This works fine except on some machines especially during boot of Windows.
At some point (this varies) after login ZwCreateFile just hangs and does not
return. And it does not hang on the same file every time and not at the same
place in the login. Sometimes it only gets to the “Welcome” screen. Other
times the desktop gets loaded.

Som of my code:
In the load image notify:

if (KeAreAllApcsDisabled()) {
pwork = IoAllocateWorkItem(device_obj);
KeInitializeEvent(&fi.kevent, NotificationEvent, FALSE);
fi.pFilename = pFilename;
IoQueueWorkItem(pwork, GetImageInfoWork, DelayedWorkQueue, &fi);
KeWaitForSingleObject(&fi.kevent, Executive, KernelMode, FALSE, NULL);

In GetImageInfoWork:
InitializeObjectAttributes(&attr, pFilename, OBJ_CASE_INSENSITIVE |
OBJ_KERNEL_HANDLE, NULL, NULL);
stat = ZwCreateFile(&hfile, FILE_READ_DATA | SYNCHRONIZE, &attr,
&iostat, NULL, 0, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_OPEN,
FILE_NON_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT, NULL, 0);

Any ideas?


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

What does the entire work item look like up until the point you open the file? Maybe you are disabling apcs there too. What is the call stack of the hung zw call?

d

Bent from my phone


From: xxxxx@yahoo.dkmailto:xxxxx
Sent: ?6/?4/?2013 7:36 AM
To: Windows System Software Devs Interest Listmailto:xxxxx
Subject: [ntdev] ZwCreateFile in worker thread sometimes hangs

Hi,

In my driver I have registered a Load Image notification callback with PsSetCreateProcessNotifyRoutine.
In this callback I open the corresponding file to scan for VERSION_INFO etc. On windows 7 when this callback is called and the file is e.g. a DLL APCs are disabled. I then start a system worker thread to do the opening and short scan of the file.
This works fine except on some machines especially during boot of Windows. At some point (this varies) after login ZwCreateFile just hangs and does not return. And it does not hang on the same file every time and not at the same place in the login. Sometimes it only gets to the “Welcome” screen. Other times the desktop gets loaded.

Som of my code:
In the load image notify:

if (KeAreAllApcsDisabled()) {
pwork = IoAllocateWorkItem(device_obj);
KeInitializeEvent(&fi.kevent, NotificationEvent, FALSE);
fi.pFilename = pFilename;
IoQueueWorkItem(pwork, GetImageInfoWork, DelayedWorkQueue, &fi);
KeWaitForSingleObject(&fi.kevent, Executive, KernelMode, FALSE, NULL);

In GetImageInfoWork:
InitializeObjectAttributes(&attr, pFilename, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);
stat = ZwCreateFile(&hfile, FILE_READ_DATA | SYNCHRONIZE, &attr, &iostat, NULL, 0, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_OPEN, FILE_NON_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT, NULL, 0);

Any ideas?


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer</mailto:xxxxx></mailto:xxxxx>

Thomas: Thanks for your suggestion. I didn’t try that but will give it a try!

Doron: Not much:
local vars…
if (pFilename == NULL || pFilename->Length == 0) {
DbgPrint(“GetImageInfoInternal filename NULL or length = 0!!!\n”);
return FALSE;
}
DbgPrint(“RBX: GetImageInfo %wZ\nIRQL = %d !Appcs = %d\n”, pFilename, KeGetCurrentIrql(), KeAreAllApcsDisabled());
pbuf = (PUCHAR) ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG);
if (pbuf) {
InitializeObjectAttributes(&attr, pFilename, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);
ZwCreateFile… etc.

And yes the DbgPrint correctly reports IRQL = 0 and Apcs as not disabled.

Downside of Ps…Ex is that it was introduced at Vista SP1. No joy on Vista
RTM.

Also Ps…Ex requires building and signing using “integrity checks”. If not
built and signed this somewhat different way driver will not load.

Thomas

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@yahoo.dk
Sent: Tuesday, June 4, 2013 11:08 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] ZwCreateFile in worker thread sometimes hangs

Thomas: Thanks for your suggestion. I didn’t try that but will give it a
try!

Doron: Not much:
local vars…
if (pFilename == NULL || pFilename->Length == 0) {
DbgPrint(“GetImageInfoInternal filename NULL or length = 0!!!\n”);
return FALSE;
}
DbgPrint(“RBX: GetImageInfo %wZ\nIRQL = %d !Appcs = %d\n”, pFilename,
KeGetCurrentIrql(), KeAreAllApcsDisabled());
pbuf = (PUCHAR) ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG);
if (pbuf) {
InitializeObjectAttributes(&attr, pFilename, OBJ_CASE_INSENSITIVE |
OBJ_KERNEL_HANDLE, NULL, NULL);
ZwCreateFile… etc.

And yes the DbgPrint correctly reports IRQL = 0 and Apcs as not disabled.


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Thomas, now I had a chance to look at your suggestion. My problem is when processes start or are loaded but when DLL’s are loaded.

The line should be
My problem is not when processes start or are loaded but when DLL’s are loaded.

Does it consistently hang when a DLL is being loaded, or only sometimes?
If only sometimes, is there a pattern that you can infer? For example, if
a DLL is being loaded for the first time, does it hang, or does it hang
only if the DLL has already been loaded? Is it feasible to do a debug
print of each file name as your code is invoked?

I have not the slightest idea why it hangs, but I’d try to discover
patterns. Also, if you have a debugger connected, use !locks to see what
is locked by whom. Otherwise, you are trying to be a psychic, and tat’s
not productive. Also, look for cascading patterns of sequences of
dependent DLLs being loaded. Not sure what, if anything, this would
reveal, but it’s what I’d be trying.

Also note that you might be encountering a race condition that can result
in deadlock. If it appears to be no pattern, consider the problems of
deadlock. Also, note that DLLs can be loaded concurrently and in any oder
by multiple cores, which again opens the possibility of deadlock, and
nondeterministic deadlock at that. If there is a way to determine what
core you are running on, this is potentially useful information, but is
always suspect from a passive level thread because the thread can be
rescheduled between the query and the print, and thus may acquire a lock
from a different core than it clais to be running on. But !locks will
tell you the truth.

AFAIK, only KMUTEX has recursive-acquisition semantics; KFAST_MUTEX
definitely says it does not, and I think ERESOURCE is also not recusive.
So in addition to generic deadlock, it is possible you could be looking at
a recursive acquisition of a lock which does not support this. The file
system is full of all kinds of locks, so it may not e your code that is at
fault directly, but your code could be triggering either a deadlock or
recursive acquisition in the file system. The fact that your passive
thread is calling ZwCreateFile, in my frst guess, increases the odds of
this.

I can only say that for a problem like this I would try to learn as much
as possible about what is going on inside it and its interactions with the
fike system. Also, I would want to enable Driver Verifier on it, te
driver above it, the driver below it, and the top-level file system driver
invoked by ZwCreateFile. Make sure deadlck-detction is enabled in DV.
Combined with soe trace output of what files are being intercepted, this
might help narrow down the search.
joe

The line should be
My problem is not when processes start or are loaded but when DLL’s are
loaded.


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I’ve tried to protect this section of the code with different mutexes and critical regions. Makes no difference.

It is always a DLL but not all DLLs. I usually get to the logon screen and enter my password and often it gets past “Welcome” and into the desktop before it locks.

In my last run and trace it hangs when svchost.exe is loading a DLL that also was loaded earlier by another instance of svchost.exe and went through my code then without problems.

!locks revealed one lock this time but often there are none held:
Resource @ nt!IopDriverLoadResource (0xfffff80002cd3a20) Exclusively owned
Contention Count = 1
Threads: fffffa8003cf2b50-01<*>
9503 total locks, 1 locks currently held

Doron asked you to post the call stack of the hanging thread, which you
still haven’t done.

-scott
OSR

wrote in message news:xxxxx@ntdev…

I’ve tried to protect this section of the code with different mutexes and
critical regions. Makes no difference.

It is always a DLL but not all DLLs. I usually get to the logon screen and
enter my password and often it gets past “Welcome” and into the desktop
before it locks.

In my last run and trace it hangs when svchost.exe is loading a DLL that
also was loaded earlier by another instance of svchost.exe and went through
my code then without problems.

!locks revealed one lock this time but often there are none held:
Resource @ nt!IopDriverLoadResource (0xfffff80002cd3a20) Exclusively
owned
Contention Count = 1
Threads: fffffa8003cf2b50-01<*>
9503 total locks, 1 locks currently held

I am not sure how to get the call stack of the hung ZwCreateFile.

ZwCreateFile called directly from the work item callback very early in the function as shown in one of my previous posts.

We don’t need the code, we need the call stack from the debugger. When it
hangs, break into the debugger and execute the following command:

!stacks 2

Search the output for your code, you should see your call stack somewhere in
the System process.

-scott
OSR

wrote in message news:xxxxx@ntdev…

I am not sure how to get the call stack of the hung ZwCreateFile.

ZwCreateFile called directly from the work item callback very early in the
function as shown in one of my previous posts.

Break into the debugger, run !stacks (see MSDN) where you can filter the output to just your driver with the correct flags

d

Bent from my phone


From: xxxxx@yahoo.dkmailto:xxxxx
Sent: ?6/?5/?2013 9:27 AM
To: Windows System Software Devs Interest Listmailto:xxxxx
Subject: RE:[ntdev] ZwCreateFile in worker thread sometimes hangs

I am not sure how to get the call stack of the hung ZwCreateFile.

ZwCreateFile called directly from the work item callback very early in the function as shown in one of my previous posts.


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer</mailto:xxxxx></mailto:xxxxx>

Thanks. Need to learn something new every day. :slight_smile:

My driver is rbx. I see several blocks like this:

4.0022b0 fffffa80057f3060 ffffcce7 Blocked nt!KiSwapContext+0x7a
nt!KiCommitThreadWait+0x1d2
nt!KeWaitForSingleObject+0x19f
nt!AlpcpSignalAndWait+0x8f
nt!AlpcpReceiveSynchronousReply+0x46
nt!AlpcpProcessSynchronousRequest+0x33d
nt!LpcpRequestWaitReplyPort+0x9c
nt!LpcRequestWaitReplyPort+0x19
eamonm+0x6072
eamonm+0x5425
eamonm+0x841b
fltmgr!FltpPerformPreCallbacks+0x2f7
fltmgr!FltpPassThroughInternal+0x4a
fltmgr!FltpCreate+0x293
nt!IopParseDevice+0x14d3
nt!ObpLookupObjectName+0x588
nt!ObOpenObjectByName+0x306
nt!IopCreateFile+0x2bc
nt!NtCreateFile+0x78
nt!KiSystemServiceCopyEnd+0x13
nt!KiServiceLinkage
rbx!GetImageInfoInternal+0x11b
rbx!GetImageInfoThread+0x1d
nt!PspSystemThreadStartup+0x5a
nt!KxStartSystemThread+0x16

and this, which would be when a DLL loads as Apcs are disabled and I start a worker thread:

4.0022c0 fffffa800580e060 ffffccd5 Blocked nt!KiSwapContext+0x7a
nt!KiCommitThreadWait+0x1d2
nt!KeWaitForSingleObject+0x19f
nt!AlpcpSignalAndWait+0x8f
nt!AlpcpReceiveSynchronousReply+0x46
nt!AlpcpProcessSynchronousRequest+0x33d
nt!LpcpRequestWaitReplyPort+0x9c
nt!LpcRequestWaitReplyPort+0x19
eamonm+0x6072
eamonm+0x5425
eamonm+0x841b
fltmgr!FltpPerformPreCallbacks+0x2f7
fltmgr!FltpPassThroughInternal+0x4a
fltmgr!FltpCreate+0x293
nt!IopParseDevice+0x14d3
nt!ObpLookupObjectName+0x588
nt!ObOpenObjectByName+0x306
nt!IopCreateFile+0x2bc
nt!NtCreateFile+0x78
nt!KiSystemServiceCopyEnd+0x13
nt!KiServiceLinkage
rbx!GetImageInfoInternal+0x11b
rbx!GetImageInfoThread+0x1d
nt!PspSystemThreadStartup+0x5a
nt!KxStartSystemThread+0x16

Looks like eamonm is a part of ESET NOD32 Antivirus. You should ask them why they are blocking your threads.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.dk
Sent: Wednesday, June 5, 2013 11:29 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] ZwCreateFile in worker thread sometimes hangs

and this, which would be when a DLL loads as Apcs are disabled and I start a worker thread:

4.0022c0 fffffa800580e060 ffffccd5 Blocked nt!KiSwapContext+0x7a
nt!KiCommitThreadWait+0x1d2
nt!KeWaitForSingleObject+0x19f
nt!AlpcpSignalAndWait+0x8f
nt!AlpcpReceiveSynchronousReply+0x46
nt!AlpcpProcessSynchronousRequest+0x33d
nt!LpcpRequestWaitReplyPort+0x9c
nt!LpcRequestWaitReplyPort+0x19
eamonm+0x6072
eamonm+0x5425
eamonm+0x841b
fltmgr!FltpPerformPreCallbacks+0x2f7
fltmgr!FltpPassThroughInternal+0x4a
fltmgr!FltpCreate+0x293
nt!IopParseDevice+0x14d3
nt!ObpLookupObjectName+0x588
nt!ObOpenObjectByName+0x306
nt!IopCreateFile+0x2bc
nt!NtCreateFile+0x78
nt!KiSystemServiceCopyEnd+0x13
nt!KiServiceLinkage
rbx!GetImageInfoInternal+0x11b
rbx!GetImageInfoThread+0x1d
nt!PspSystemThreadStartup+0x5a
nt!KxStartSystemThread+0x16


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Interesting. This particular test machine is running ESET NOD32…

Moving more code out of the callback and to a seperate thread solved it so far, also on machines with other AVs that also made trouble.

Thanks for your help guys. Learned a lot from it.