Known Event Synchronization Issues on Windows XP, Possibly Related to .NET Interop?

Hi,

I’m only posting this after narrowing the offending code down to the minimum (about 20 lines) that consistently reproduces the problem. I have no doubt that the event synchronization code in Windows is one of the most heavily tested and vetted parts of the kernel, seeing as pretty much everything relies on it.

Some background (including probably irrelevant details, just in case): I have a C++ session 0 application that launches a .NET application in the currently logged-on user’s session. This C# application loads a C++ DLL via .NET interop. The DLL contains some very basic synchronization code (linked below) that consistently, after a random length of time, deadlocks on every Windows XP machine, but not on Windows 7.

In the code sample below, whenever the variable “_count” is equal to 0, the manual-reset event is set. It remains set until _count is no longer zero, i.e. the variable being zero and the event being set are equivalent (an if-and-only-if relationship).
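The pastebin code is not reproduced in this thread, so the following is only a hypothetical, portable sketch of the invariant described above. It swaps the Win32 CRITICAL_SECTION and manual-reset event for std::mutex, std::condition_variable and a bool flag standing in for the event's signaled state; ZeroCountEvent and its member functions are invented names. Because every change to _count and to the flag happens under the same lock, "count is zero" and "event is set" cannot drift apart.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical portable analogue of a counter paired with a manual-reset
// event: the "event" (_signaled) is set exactly when _count == 0.
class ZeroCountEvent {
public:
    // Starts at zero, so the "event" starts signaled.
    ZeroCountEvent() : _count(0), _signaled(true) {}

    void increment() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (++_count == 1) {
            _signaled = false;  // like ResetEvent(): count just left zero
        }
    }

    void decrement() {
        std::lock_guard<std::mutex> lock(_mutex);
        assert(_count > 0 && "decrement without matching increment");
        if (--_count == 0) {
            _signaled = true;   // like SetEvent(): count just returned to zero
            _cv.notify_all();   // a manual-reset event releases every waiter
        }
    }

    // Like WaitForSingleObject(hEvent, INFINITE) on the manual-reset event.
    void wait_for_zero() {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _signaled; });
    }

    bool is_signaled() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _signaled;
    }

    int count() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _count;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _count;      // guarded by _mutex, so no interlocked ops are needed
    bool _signaled;  // guarded by _mutex
};
```

Keeping the flag and the counter under one lock is what makes the if-and-only-if relationship hold; if the event were set or reset outside the lock, a waiter could observe the two out of step.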

My problem: I find the process deadlocked, and in the debugger it is stuck in a WaitForSingleObject() on the HANDLE for the manual-reset event in question; also, the destructor has not been called. However, the “_count” variable is zero, which *should* never happen.

I hate posting code to this list; I know all of you have better things to do than trawl through useless and irrelevant code, so I’ve tried to minimize it as much as possible. Shorter code should also be easier to validate mentally.

The sample code: http://pastebin.ca/1979497
(just so that the whitespace isn’t mangled - no need to make it harder than it already is to peruse the sample code!)

The more I think about it, the more I realize that I shouldn’t expect it to work. What follows is just random musing; please correct me if I’m wrong.

Synchronization events are, by and large, kernel objects. References to them would be stored in the PEB/TEB in the kernel. Depending on how I call the native APIs to create these objects, it’s certainly logical that something would go amiss, seeing as .NET threads and .NET processes aren’t in the same class as normal, native threads and processes. To the best of my knowledge, the .NET counterparts do not have the same entries in the same TEB/PEB entries in the kernel, so it would logically follow that native APIs that interact with the TEB or PEB objects will not function 100% the same when called via .NET interop? Unless, of course, the .NET virtual machine creates threads and processes differently in the case of interop?

I assume if I call “CreateThread” via Interop from within a .NET thread, a true, native thread with a proper TEB will be made… But what if I use Win32 synchronization APIs directly from a .NET thread? Makes me wonder if there’s more to the .NET ports of synchronization objects than a straight wrapper function…

You are overthinking things. The event object is stored in the handle table (in the PEB), not the TEB. As long as you are in the same .NET process (possibly in different app domains), all threads use the same handle table.

d

Sent from a phone with no keyboard

-----Original Message-----
From: xxxxx@NeoSmart.net
Sent: November 02, 2010 4:15 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Known Event Synchronization Issues on Windows XP, Possibly Related to .NET Interop?



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

xxxxx@NeoSmart.net wrote:

> The more I think about it, the more I realize that I shouldn’t expect it to work. What follows is just random musing; please correct me if I’m wrong.

> Synchronization events are, by and large, kernel objects. References to them would be stored in the PEB/TEB in the kernel. Depending on how I call the native APIs to create these objects, it’s certainly logical that something would go amiss, seeing as .NET threads and .NET processes aren’t in the same class as normal, native threads and processes.

That’s not true. A thread is a thread. The .NET CLR is just a large,
fancy layer on top of the Win32 API.

> To the best of my knowledge, the .NET counterparts do not have the same entries in the same TEB/PEB entries in the kernel,

Again, that’s not true. A thread is a thread. All processes have PEB
entries. All threads have TEB entries. .NET puts its own scaffolding
on top of those structures, but the structures are there.

> …so it would logically follow that native APIs that interact with the TEB or PEB objects will not function 100% the same when called via .NET interop? Unless, of course, the .NET virtual machine creates threads and processes differently in the case of interop?

I don’t think it’s really accurate to call .NET a “virtual machine”.
After .NET does its JIT compile, you have a plain vanilla Win32 process
running compiled code. .NET happens to provide a huge run-time library
for that code, but it’s not as sophisticated as a virtual machine.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

xxxxx@NeoSmart.net wrote:


> In the code sample below, whenever the variable “_count” is equal to 0, the manual-reset event is set. It remains set until _count is no longer zero, i.e. the variable being zero and the event being set are equivalent (an if-and-only-if relationship).

> My problem: I find the process deadlocked, and in the debugger it is stuck in a WaitForSingleObject() on the HANDLE for the manual-reset event in question; also, the destructor has not been called. However, the “_count” variable is zero, which *should* never happen.

I don’t see any holes in this. How do you manage this object’s
lifetime? It must be used by multiple threads, since you have someone
waiting while others are bumping the counts. Are you quite sure that
the object cannot be used before its constructor runs and completes?
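To illustrate the lifetime question, here is a hypothetical portable sketch of one lifetime discipline: the object is fully constructed before any thread can reach it, and every thread is joined before it is destroyed. It swaps the CRITICAL_SECTION and manual-reset event for std::mutex and std::condition_variable; ZeroCountEvent and run_workers are invented names, not taken from the original code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical portable stand-in for the counter + manual-reset event pair.
class ZeroCountEvent {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(_mutex);
        ++_count;
    }
    void decrement() {
        std::lock_guard<std::mutex> lock(_mutex);
        assert(_count > 0);
        if (--_count == 0) _zero.notify_all();
    }
    void wait_for_zero() {
        std::unique_lock<std::mutex> lock(_mutex);
        _zero.wait(lock, [this] { return _count == 0; });
    }
    int count() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _count;
    }

private:
    std::mutex _mutex;
    std::condition_variable _zero;
    int _count = 0;
};

// Lifetime discipline: construct before any thread exists, join every thread
// before the object goes out of scope. Returns the final count.
int run_workers(int workers, int iterations) {
    ZeroCountEvent event;  // constructor completes before threads are started
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&event, iterations] {
            for (int j = 0; j < iterations; ++j) {
                event.increment();
                event.decrement();
            }
        });
    }
    for (auto& t : threads) t.join();  // after this, no other thread uses `event`
    event.wait_for_zero();             // returns at once: all work is finished
    return event.count();
}  // destructor runs here, strictly after every user has been joined
```

If a thread could reach the object before its constructor finished, or after its destructor ran, the lock and the counter would be garbage, which could produce exactly the "count is zero but the wait never returns" symptom described above.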


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> Synchronization events are, by and large, kernel objects. References to them would be stored in the PEB/TEB in the kernel.

No, in the handle table, which is neither TEB nor PEB, but yes, it’s per-process.

> …would go amiss, seeing as .NET threads and .NET processes aren’t in the same class as normal, native threads and processes.

They are the same from the kernel’s point of view.

> …proper TEB will be made… But what if I use Win32 synchronization APIs directly from a .NET thread?

Must work.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> The sample code: http://pastebin.ca/1979497

The code is OK. Note some things:

  1. Interlocked operations are not needed; the critical section provides everything that is needed.
  2. The object must not be used before it is constructed or after it is destroyed. Can you guarantee this?
  3. There is probably some bug in the .NET part of the code. I would put this code into a separate DLL, loaded by both managed and unmanaged processes, that exports a sane API for this kind of object.
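For point 3, one hypothetical shape for such an API is an opaque-handle C interface, which is the easiest kind of interface to call through P/Invoke. The names (ZeroCounter, zc_create, zc_increment, and so on) are invented, and the internals use std::mutex and std::condition_variable rather than the original CRITICAL_SECTION and event. On Windows the functions would additionally have to be exported from the DLL (for example with __declspec(dllexport) or a .def file), which this portable sketch omits.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <new>

// Hypothetical object behind the opaque handle; callers never see its fields.
struct ZeroCounter {
    std::mutex mutex;
    std::condition_variable zero;
    int count = 0;
};

extern "C" {

// Returns nullptr on allocation failure instead of throwing across the ABI.
ZeroCounter* zc_create() { return new (std::nothrow) ZeroCounter(); }

// Caller must guarantee that no other thread still uses the handle.
void zc_destroy(ZeroCounter* zc) { delete zc; }

void zc_increment(ZeroCounter* zc) {
    std::lock_guard<std::mutex> lock(zc->mutex);
    ++zc->count;
}

void zc_decrement(ZeroCounter* zc) {
    std::lock_guard<std::mutex> lock(zc->mutex);
    assert(zc->count > 0);
    if (--zc->count == 0) zc->zero.notify_all();
}

// Blocks until the count is zero (returns at once if it already is).
void zc_wait_for_zero(ZeroCounter* zc) {
    std::unique_lock<std::mutex> lock(zc->mutex);
    zc->zero.wait(lock, [zc] { return zc->count == 0; });
}

int zc_count(ZeroCounter* zc) {
    std::lock_guard<std::mutex> lock(zc->mutex);
    return zc->count;
}

}  // extern "C"
```

Keeping construction and destruction behind explicit create/destroy calls makes the object's lifetime visible to both the managed and the unmanaged caller, which is exactly what point 2 asks about.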

Also note that, for such a construct, you can consider using SignalObjectAndWait with your own critical section implementation - signal the event inside your critical section and wait for the main event. See the web pages on “Win32 events vs. UNIX condvars” topic.
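SignalObjectAndWait is Win32-specific. The portable C++ counterpart of the condvar approach is std::condition_variable::wait, which releases the mutex and blocks as one atomic step and reacquires the mutex before returning, so a notification cannot be lost between checking the count and starting to wait. The sketch uses that instead of SignalObjectAndWait; PendingWork, wait_until_idle, finish_one and demo are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state: a count of outstanding items plus a condvar.
struct PendingWork {
    std::mutex mutex;
    std::condition_variable zero;
    int count = 0;
};

// Waiter: tests the predicate while holding the lock; wait() then releases
// the lock and blocks atomically, and re-tests the predicate on every wakeup.
void wait_until_idle(PendingWork& w) {
    std::unique_lock<std::mutex> lock(w.mutex);
    w.zero.wait(lock, [&w] { return w.count == 0; });
}

// Signaler: changes the state and notifies while holding the same lock.
void finish_one(PendingWork& w) {
    std::lock_guard<std::mutex> lock(w.mutex);
    assert(w.count > 0);
    if (--w.count == 0) w.zero.notify_all();
}

// Starts a worker that finishes `items` items, waits for it to reach zero,
// and returns the count the waiter observed after waking.
int demo(int items) {
    PendingWork w;
    w.count = items;
    std::thread worker([&w, items] {
        for (int i = 0; i < items; ++i) finish_one(w);
    });
    wait_until_idle(w);
    int observed;
    {
        std::lock_guard<std::mutex> lock(w.mutex);
        observed = w.count;
    }
    worker.join();
    return observed;
}
```

The predicate loop matters: the waiter never trusts a wakeup on its own, it re-checks the shared count under the lock, which is the property that a bare Win32 event wait does not give you.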


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

In addition to what has already been mentioned:

How do you instantiate your class? Is it a global object? If so, does your DLL link with the CRT? The CRT is responsible for calling the constructors of global objects before main/DllMain is reached.
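If the object is a namespace-scope global, one alternative (not what the reply prescribes) is to construct it on first use through a function-local static, so its construction no longer depends on CRT startup ordering. This assumes a C++11 compiler: only from C++11 on is initialization of a function-local static guaranteed to be thread-safe, and MSVC implemented that guarantee in Visual Studio 2015, so it does not apply to a 2010-era toolchain. The accessor must still never be called from DllMain. SharedCounter and instance() are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the synchronization object living in the DLL.
class SharedCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(_mutex);
        ++_count;
    }
    void decrement() {
        std::lock_guard<std::mutex> lock(_mutex);
        assert(_count > 0);
        --_count;
    }
    int count() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _count;
    }

private:
    std::mutex _mutex;
    int _count = 0;
};

// Constructed on first call rather than during CRT startup. Since C++11 the
// initialization of a function-local static happens exactly once, even if
// several threads call instance() concurrently.
SharedCounter& instance() {
    static SharedCounter counter;
    return counter;
}
```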

Moreover, since this is a DLL, are you sure it is initialized properly? That is, in DllMain you should not call anything except functions from kernel32.dll, you should not call LoadLibrary for other DLLs there, etc.

You MUST delete the critical section in your destructor.

Also, it does NOT make sense to protect destruction by the critical section. By definition, the object should only be destroyed when it’s not accessed from outside.
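One way to make that guarantee structural is shared ownership: every thread that uses the object holds a std::shared_ptr to it, so the destructor can only run once the last user has released its reference, and the destructor needs no lock. This is a hypothetical portable sketch; Tracker, g_destroyed and run are invented names, and g_destroyed exists only so the behaviour can be checked.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counter of destructor runs, used only to observe behaviour.
std::atomic<int> g_destroyed{0};

class Tracker {
public:
    void touch() {
        std::lock_guard<std::mutex> lock(_mutex);  // protects use, not teardown
        ++_uses;
    }
    // No lock: when the last shared_ptr releases the object, no other thread
    // can still be using it, so there is nothing left to protect.
    ~Tracker() { ++g_destroyed; }

private:
    std::mutex _mutex;
    int _uses = 0;
};

// Returns how many Tracker objects were destroyed after all workers finished.
int run(int workers) {
    assert(workers >= 0);
    g_destroyed = 0;
    std::vector<std::thread> threads;
    {
        auto tracker = std::make_shared<Tracker>();
        for (int i = 0; i < workers; ++i) {
            // Each thread's lambda holds its own copy, keeping the object alive.
            threads.emplace_back([tracker] { tracker->touch(); });
        }
    }  // this scope drops its reference; workers may still hold theirs
    for (auto& t : threads) t.join();
    return g_destroyed.load();
}
```

Whichever thread releases the last reference runs the destructor, and by construction no other reference exists at that point.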