Unique process id

> You’re not saying that there is one correct answer to a given problem, right?
> There is *never* one correct answer to an engineering problem. There are merely
> choices among correct answers which optimize different factors to obtain the
> desired results. So, it’s not “an exact science” in MY book.

Well, in mechanical engineering, “strength of materials” is pretty much an exact science, but when you need to design a load-bearing part, you can arrive at different solutions, depending on whether you optimize for mass, cost, or other factors.

> loud. Joe has a tendency to be extremely literal (to the point of

I could express my opinion on Joe, but for me, it would be impolite.

He is one of us forum participants, and surely a valuable one.

“And the one without sin, let him throw the first stone at her” (inexact quote from the Gospel).


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> No that is not what I meant. Perhaps I should have said “engineering discipline” or at least
> a process that can be formalized.

Good luck…

Surely there is some research being done on formally correct OS kernels (you can check the NICTA site for more info), but this is just research. AFAIK, the most they have achieved so far is developing a fully verified L4 microkernel. BTW, AFAIK, software verification normally requires use of pure functional languages. Do you want to write all your system software in Haskell and friends, or would you still prefer good old C for this purpose?

Furthermore, the most that verification can do is to prove that a given implementation
correctly implements a certain abstract mathematical specification. However, who says that this specification in itself cannot be erroneous?

Anton Bassov

> developing software for humans is a different practice with different rules
> from developing software for devices.

…and developing software for Goa’uld or Ori would be yet a third discipline, and for Oma Desala or Tolkien’s Valar a fourth :-)


Maxim S. Shatskih

> required putting up a dialog that said “waiting for network response” and
> I later added a progress bar that counted down 70 seconds (and recorded

TortoiseSVN is still such.

> executed. But they’re not important anyway (and if you believe this, my
> uncle was in the Nigerian army and I need your help in getting his assets
> out of the country).

…and my sister was a personal helper of Michael Khodorkovsky and I need your help in getting his assets out of Russia

:-)))

> I once had a client insist that C++ collections were unreliable because
> they leaked memory (I was being asked to recode my solution because I had
> used std::map). I said this was simply not true, and he countered by
> presenting me with “proof”, a program that leaked massive amounts of heap
> space when it had used C++ collections. In less than five minutes I found
> an exit() call buried in an error path, and eventually found a couple
> dozen. They used aalloc compulsively, “because using malloc() gave us the
> same problems”.

Too bad some people have too much white noise in their DNA (the usual conclusion on such issues).

> It’s easy to write code that gives the illusion of working. It’s a lot
> harder to write code that is actually correct, and will remain so for
> years. Key is to not commit certain errors

+100


Maxim S. Shatskih

>> That thread could not be broken out of the gethostbyname (if I’ve
>> remembered the call correctly)
>> function even by a call to ExitThread.

I just wonder how a thread may be blocking and calling ExitThread() at
the same time…

So do I. This was over 10 years ago, and I misremembered what they were
doing. But at this point, I can’t re-create what they were doing; it was
something they were trying to do, forcing a clean thread termination from
within the kernel for a blocked thread (they had a source license, and I
didn’t have access, and they couldn’t show me what they were doing because
I wasn’t an employee). They had done something in a “device” driver that
called some kernel functions. So I only had an hour’s exposure to the
problem, and clearly have forgotten a lot.

OK, I can attribute it to a typo, but look below:

If you make an exit() call the calling process terminates, and, at this
point, freeing memory becomes just unnecessary, right? In any case, you
will never reach the point of memory leak becoming evident, for
understandable reasons…

Yeah, I saw that just as I hit “send”, and it was too late to fix it; I
meant “ExitThread”, but the fingers apparently had a life of their own,
disconnected from the forebrain.

All your statements seem to be related strictly to ExitThread() - indeed,
you have a good chance of leaving the address space in an inconsistent state
unless you know what you are doing. However, ExitProcess() just terminates
the caller, and, at this point, all cleanup tasks become
meaningless. Therefore, unless you happen to use IPC synch constructs
across the process boundary or do something that may affect the behavior
of other processes in the system, what all this has to do with
ExitProcess() is just beyond me…

Anton Bassov


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

> wrote in message news:xxxxx@ntdev…
>>
>> Some years ago, we had to do DNS name resolution on program startup, so it
>> could re-establish communication. There were no asynchronous DNS calls
>> then, so I did it all in a background thread, with an unobtrusive progress
>> bar displayed on the status bar. An attempt to shut the program down
>> required putting up a dialog that said “waiting for network response” and
>> I later added a progress bar that counted down 70 seconds (and recorded
>> the time right before the call so it was mostly accurate). That thread
>> could not be broken out of the gethostbyname (if I’ve remembered the call
>> correctly) function even by a call to ExitThread.
>>
>
> In our example we are dealing with a problem. A problem that is caused by a
> synchronous API that is blocking for way too long. I consider it just an
> easy way out to put it in the face of the end user by means of a waiting
> dialog so that now it becomes HIS problem instead. Like Windows Live Mail,
> it sucks. Like fighting over sand, it doesn’t mean you should, just because
> you can. And if the end user is savvy enough he will almost certainly kill
> off the hanging app with the task manager. That is worse than terminating
> the process in a controlled manner.
>
> In the above example, when it’s time to say goodbye for a process I don’t
> put up a dialog box telling the user he should quietly wait. What I do is
> set a flag in my program so that
>> notify someone the thread has cleanly terminated
> is no longer required and will be ignored if it happens thereafter
> regardless. So terminating the entire process at this point, after the
> necessary cleanup has been done, is the only right thing to do in my book.
> The kernel will clean up all outstanding resources that I do not care about.
> Call that erroneous if you like, but if you still do, consider that there
> are synchronous API calls that can block for much longer than 70 seconds.
>
> If you want to carve a rule in stone that says “TerminateThread is evil and
> should never be used” you can count me in. TerminateThread can leave your
> program in an undefined state and will leak. It is not even acceptable as
> a last resort when the program exits because it can potentially deadlock.
>
> If you want to carve a rule in stone that says “TerminateProcess is evil and
> should never be used” then I say hold on a second.
>
Actually, it is extremely evil. Consider the following (very real) case
(which I had to fix)

The subroutine library was written by people who had no concept of
robustness. If they found, for example, an error in the database index
file (this was a library that allowed direct access to dBase IV
databases), they just called exit(-1). When I asked why, they said “if
the index is messed up, we can’t continue”. Fine. But we were working
with an app that did realtime data collection while processing report
creation in a separate thread. So we had the main GUI thread, the data
collection thread, and the report-printing thread. When they called
exit(-1), the process terminated, losing irreplaceable data. All I wanted
to do was shut down the report thread if there was an error in the
indexing. Since this was for a liquid CO2 chromatograph, it could be used
in contexts such as forensic analysis. I was called in because it had
actually crashed, losing valuable and non-reproducible evidence (by the
way, CO2 CAN be a liquid, at 10,000psi). I did not write the program, but
fortunately they had purchased the library in source form. Fixing it was
a real trial because there was no error-propagation path (why bother, when
you’re going to exit?) so it was not easy to change hundreds of little
subroutines from void to DWORD (actually, ERROR_TYPE, which was typedefed
to DWORD) so the error was propagated back.
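
The kind of change described above, replacing an exit() buried in a library with an error code that travels back up the call chain, might look roughly like this. It is a minimal sketch in portable C, not the actual library: ERROR_TYPE is typedef'd to unsigned long as a stand-in for DWORD, and the function names, error codes and index check are all hypothetical.

```c
#include <stddef.h>

/* Stand-in for the DWORD-based ERROR_TYPE mentioned above (assumption). */
typedef unsigned long ERROR_TYPE;

#define ERR_NONE       0UL   /* hypothetical error codes */
#define ERR_BAD_INDEX  1UL

/* Hypothetical index header; a real dBase index is far more involved. */
typedef struct {
    int entry_count;
} db_index;

/* Before: a void function that called exit(-1) on a bad index.
   After: report the problem and let the caller decide what to do. */
ERROR_TYPE check_index(const db_index *idx)
{
    if (idx == NULL || idx->entry_count < 0)
        return ERR_BAD_INDEX;
    return ERR_NONE;
}

/* Every function on the path forwards the error instead of swallowing it. */
ERROR_TYPE lookup_record(const db_index *idx, int key, int *slot)
{
    ERROR_TYPE err = check_index(idx);
    if (err != ERR_NONE)
        return err;
    *slot = key;             /* placeholder for the real lookup */
    return ERR_NONE;
}

/* Only the report gives up; the rest of the process is never touched.
   Returns 1 if the report was produced, 0 if it was abandoned. */
int run_report(const db_index *idx)
{
    int slot = 0;
    if (lookup_record(idx, 42, &slot) != ERR_NONE)
        return 0;
    return 1;
}
```

With this shape, the report thread can stop on a damaged index while the data collection thread keeps running, which is the behavior that was wanted in the first place.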

Far too many programmers were taught, either by their professor or books,
that the proper response to a serious error is to exit the program. So
when we were using dbx (the Berkeley predecessor of gdb), any time it got
uncomfortable, it exited. One of our staff found there were several
hundred exit() calls in it, and he meticulously replaced each one with
code that would simply return an error. Before he did this, it was simply
unusable.

So yes, ExitProcess is evil, and should never be used. Because most
(perhaps all) instances I have seen of its use were actually incorrect,
resulting in programs that simply disappeared in a puff of greasy blue
smoke. Have you ever been on the wrong side of a tech support call from
an important customer, trying to find out what happened? “It just
disappeared. No dialog box, no debug output, just gone” and then try to
reproduce it? I have been, and it was not a comfortable place to be. I
solved the problem by finding all the exit() calls and telling the
responsible programmer that he had a week to replace every one of them
with a sane recovery mechanism, and exiting the program was not a
permissible option. The feedback after product release was “This release
is so much better! It no longer crashes!” (Lots of users had the
problem, and attributed it to poor quality control and poor testing; only
one of them called, and the VP sales called me to take the call, because
all the tech support people had gone to lunch).

It is every bit as evil as ExitThread.

> My opinion is that kernel programming should be treated as an exact science
> in which we have to be pragmatic, correct, unequivocal and 100% loyal to our
> principles. Because if we don’t, everybody knows the consequences.
>
> Programming in ring 3 in my book can be considered a craftsmanship at least
> in part because it offers us the luxury to weigh in various factors other
> than sterile academic correctness such as reasonability, user experience and
> the human factor.

Human factors are critical. The program does not own the right to exit in
case of an error (I’m talking primarily about GUI programs here); that
right is granted solely to the user.

I first hit this when Pascal was still a new toy. The problem was that
the programmer had used some function that read a number from the console,
but if you typed an ill-formed number (such as “1…23”, “cat”, or “1r6”
[note r and e are adjacent keys on the U.S. keyboard layout, and perhaps
others]), the read-number routine just exited the program, instead of
returning an error code so the caller could display the message “Your
number ‘1r6’ is not a valid number, please re-enter it”. He had written a
program to help teachers enter grades, so if they had 50 students in the
class and made an error typing a number for the 50th student, the program
exited, bang, and the poor user had to start over. So I showed how to write
code that parsed a number (using an FSM) and issued the error message for
ill-formed numbers.
(Yes, this was in the Bad Old Days of Teletype input and the prompt-and-read
paradigm, which made it that much worse.) Note that _ttoi() will quietly
return 1 for “1…23” and “1r6”, and 0 for “cat”, which is equally unacceptable.
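
A finite-state parser of the kind mentioned above fits in a few lines of C. This is an illustrative sketch, not the original Pascal code: the state names, the is_valid_number function and the accepted grammar (optional sign, digits, optional fraction, optional exponent) are assumptions made for the example.

```c
#include <stdbool.h>
#include <ctype.h>

/* FSM accepting: [sign] digits [ '.' digits ] [ e|E [sign] digits ] */
typedef enum {
    S_START, S_SIGN, S_INT, S_DOT, S_FRAC,
    S_EXP, S_EXP_SIGN, S_EXP_INT, S_ERROR
} num_state;

/* One transition: current state plus one input character gives the next state. */
static num_state step(num_state s, char c)
{
    bool digit = isdigit((unsigned char)c) != 0;
    bool sign  = (c == '+' || c == '-');
    bool e     = (c == 'e' || c == 'E');

    switch (s) {
    case S_START:    return digit ? S_INT : sign ? S_SIGN : S_ERROR;
    case S_SIGN:     return digit ? S_INT : S_ERROR;
    case S_INT:      return digit ? S_INT : c == '.' ? S_DOT : e ? S_EXP : S_ERROR;
    case S_DOT:      return digit ? S_FRAC : S_ERROR;
    case S_FRAC:     return digit ? S_FRAC : e ? S_EXP : S_ERROR;
    case S_EXP:      return digit ? S_EXP_INT : sign ? S_EXP_SIGN : S_ERROR;
    case S_EXP_SIGN: return digit ? S_EXP_INT : S_ERROR;
    case S_EXP_INT:  return digit ? S_EXP_INT : S_ERROR;
    default:         return S_ERROR;
    }
}

/* Returns true only if the whole string is a well-formed number.
   The caller decides what to do on false, for example re-prompt. */
bool is_valid_number(const char *text)
{
    num_state s = S_START;
    for (; *text != '\0' && s != S_ERROR; ++text)
        s = step(s, *text);
    return s == S_INT || s == S_FRAC || s == S_EXP_INT;
}
```

Because the function only reports validity, the input loop stays in control: on a false return it can print the "not a valid number" message and ask again, and nothing already entered is lost.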

>
> //Daniel
>
>
> “Now some of us build and some of us teach,
> some of us build and some of us teach.
> And some of us kill what some of us eat.
> That is a fact of life.”
>
I teach how to build. How to build buildings that won’t fall down, that
have restrooms on every floor, and water fountains, and other nice
features (we used to live in an apartment building with an architect, and
I learned a lot listening to him…such as don’t put the food service
counter in a location that causes a line to form that blocks access to the
doors, or the restrooms). The hardest problem to teach newly-minted
programmers is that “it works” is not sufficient. Third-order effects can
kill you.

I once had a debate in which one programmer said, of his device driver,
“if I see something seriously wrong, I just call BugCheck()” and thought
not only did that make sense, but defended his choice. So I said,
“Suppose you had a guest at your house. He uses the bathroom, and finds
you are out of toilet paper and don’t have a fresh roll in arm’s reach.
So his response is to burn down your house. Right. Your driver is a
guest in the operating system, and is not entitled to burn the house down
for trivial problems. And all problems must be considered trivial from
the viewpoint of the operating system”.
joe

“There are nine and sixty ways,
of writing tribal lays.
And every single one of them
is right.” [Rudyard Kipling]

There are lots of “right” ways to do things, but there are a MASSIVELY
large number of /wrong/ ways. BugCheck() when the device has a parity
error, for example.

Wait, wait, wait, wait, wait…

You’re not saying that there is one correct answer to a given problem,
right? There is *never* one correct answer to an engineering problem.
There are merely choices among correct answers which optimize different
factors to obtain the desired results. So, it’s not “an exact science” in
MY book.

Peter
OSR



“xxxxx@flounder.com” wrote in message news:xxxxx@ntdev:

>
> I once had a debate in which one programmer said, of his device driver,
> “if I see something seriously wrong, I just call BugCheck()” and thought
> not only did that make sense, but defended his choice. So I said,
> “Suppose you had a guest at your house. He uses the bathroom, and finds
> you are out of toilet paper and don’t have a fresh roll in arm’s reach.
> So his response is to burn down your house. Right. Your driver is a
> guest in the operating system, and is not entitled to burn the house down
> for trivial problems. And all problems must be considered trivial from
> the viewpoint of the operating system”.

This is one of the challenges still out there. I know a lot of driver
developers for another OS who allocate memory and never check for NULL
being returned. Their logic is “I need the memory; this is a serious
flaw if I don’t get it, and the best thing to do is crash the system in a
way that makes it easy to know where the problem was!” Of course, I’ve
seen the variant in a number of Windows drivers where they test the
return from the allocation and, on failure, just call KeBugCheck.

There is almost never a justification for calling BugCheck. Having the
driver fail the request, or if need be go into a mode where it fails all
requests, is better than crashing the system.
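
The alternative described above, failing the one request or switching to a mode that fails every request, can be modelled in portable C. This is a user-mode sketch only, not driver code: device_state, handle_request, enter_failed_mode and the status codes are hypothetical stand-ins, and a malloc-compatible function pointer stands in for a kernel pool allocation so that the failure path can be exercised.

```c
#include <stddef.h>
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical status codes standing in for NTSTATUS values. */
typedef enum { REQ_OK, REQ_NO_MEMORY, REQ_DEVICE_FAILED } req_status;

typedef struct {
    bool  fail_all;                /* set once the device is unusable */
    void *(*alloc)(size_t size);   /* malloc-compatible, injectable for testing */
} device_state;

/* Handle one request. An allocation failure fails this request only. */
req_status handle_request(device_state *dev, size_t buffer_size)
{
    if (dev->fail_all)
        return REQ_DEVICE_FAILED;  /* degraded mode: refuse, do not crash */

    void *buffer = dev->alloc(buffer_size);
    if (buffer == NULL)
        return REQ_NO_MEMORY;      /* caller sees an error; system stays up */

    /* ... the real work with the buffer would go here ... */
    free(buffer);
    return REQ_OK;
}

/* Called when something unrecoverable is detected (hypothetical hook). */
void enter_failed_mode(device_state *dev)
{
    dev->fail_all = true;
}

/* Allocator that always fails, used to exercise the error path. */
void *always_fail_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

In both failure cases the caller gets a status it can act on, and a later request can still succeed once memory is available again.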

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

> …and my sister was a personal helper of Michael Khodorkovsky and I need your
> help in getting his assets out of Russia

Voldemor Voldemorovitch would like to have a word with you…

Yes, Don, but we are well-trained in the idea of making sure systems are
RELIABLE and have minimal crashes and downtime. This is not a criterion
for software goodness that most programmers have. I consider the approach
of “OK, let’s just crash the whole OS if the pointer comes back NULL” to
be in the wannabes-aspiring-to-become-incompetent-amateurs school of
programming. I once spent a year making sure the OS could not crash; it
was the OS for our multiprocessor hardware back in 1975. I spent most of
1976 working around both hardware failures and undiagnosable software
failures, and ended up with an absolutely bulletproof system that
“crashed” about once a day, but recovered without having to take the whole
system down. Most of my effort was in re-creating the service state from
first principles, and doing the equivalent of failing all the IRPs that
were pending. This is a lot harder than it sounds, and one programmer
said “But communication to the user keyboard and screen simply cannot
fail!” and I asked him to point out where this guarantee appeared in the
system documentation (which I had written). He had made an unfortunate
assumption. And I told him that if his apps failed because he wasn’t
looking at completion status, it was not MY problem. The protocol had
been documented for several years; all I did in 1975 was rewrite the
documentation to be coherent.

Unix programmers seem more prone to this attitude than most others. Key
is to recognize that ANY resource request can fail, and what are you going
to do about it when it does fail. First answer: Do your best, in spite of
that, to fulfill the request (e.g., build partial MDLs if the buffers are
too large). Second answer: fail “gracefully” so the app gets an error
return, but the driver doesn’t wedge because of (for example) an
uncompleted IRP or blocked IRP queue. There is no third answer. And any
answer that leads to system shutdown is so far beyond acceptable that
“totally unacceptable” is an understatement.

In my Advanced System Programming course, I tell the students to put a
5-second timeout on semaphores and an infinite timeout on mutexes, and to
complete the assignment using these specs. Then I ask them, “Tell me what
you’d do if you got a timeout or WAIT_ABANDONED on a Mutex”. They
invariably are totally clueless. Simplified, the flow is

WaitForSingleObject(semaphore, 5000);
// if the above returns an error, return FALSE, indicating nothing dequeued
WaitForSingleObject(mutex, 5000);
// What do you do if it returns WAIT_ABANDONED? What does this mean?
// What do you do if the timeout happens?

They NEVER put a ReleaseSemaphore call in their failed-mutex recovery
path! And they have no idea of the implications of WAIT_ABANDONED (which
was one of the many problems I had to deal with in making my code robust
in the presence of unreliable hardware and infrastructure).
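
The missing recovery step can be shown without any Windows dependency. The sketch below is a single-threaded simulation, not Win32 code: sim_semaphore, sim_mutex, sim_queue and dequeue are hypothetical, and the three outcomes stand in for the WaitForSingleObject results WAIT_OBJECT_0, WAIT_TIMEOUT and WAIT_ABANDONED (the abandoned-mutex result). The point it tries to illustrate is that a successful semaphore wait has already claimed an item, so a failed mutex wait has to give that claim back, which is the ReleaseSemaphore call. The reset policy for a corrupt queue is one possible choice, assumed for the example.

```c
#include <stdbool.h>

#define QUEUE_CAP 8

/* Wait outcomes, mirroring WAIT_OBJECT_0, WAIT_TIMEOUT and WAIT_ABANDONED. */
typedef enum { W_ACQUIRED, W_TIMED_OUT, W_ABANDONED } wait_outcome;

/* Single-threaded stand-ins for the real kernel objects (assumptions). */
typedef struct { int count; } sim_semaphore;   /* count == items queued */
typedef struct { bool held_elsewhere; bool abandoned; } sim_mutex;

typedef struct {
    sim_semaphore items;
    sim_mutex     lock;
    int           data[QUEUE_CAP];
    int           head, length;
} sim_queue;

static wait_outcome wait_semaphore(sim_semaphore *s)
{
    if (s->count == 0) return W_TIMED_OUT;
    s->count--;                    /* a successful wait claims one item */
    return W_ACQUIRED;
}

static wait_outcome wait_mutex(sim_mutex *m)
{
    if (m->held_elsewhere) return W_TIMED_OUT;
    if (m->abandoned) { m->abandoned = false; return W_ABANDONED; }
    return W_ACQUIRED;
}

/* Hypothetical sanity check run after inheriting an abandoned mutex. */
static bool queue_is_sane(const sim_queue *q)
{
    return q->head >= 0 && q->head < QUEUE_CAP &&
           q->length > 0 && q->length <= QUEUE_CAP;
}

bool dequeue(sim_queue *q, int *out)
{
    if (wait_semaphore(&q->items) != W_ACQUIRED)
        return false;              /* nothing claimed, nothing to undo */

    wait_outcome r = wait_mutex(&q->lock);
    if (r == W_TIMED_OUT) {
        q->items.count++;          /* the ReleaseSemaphore step: give the item back */
        return false;
    }
    if (r == W_ABANDONED && !queue_is_sane(q)) {
        /* We own the mutex, but the previous owner died mid-update.
           Assumed policy: discard the contents so count and length agree. */
        q->head = 0;
        q->length = 0;
        q->items.count = 0;
        return false;              /* real code would ReleaseMutex before returning */
    }

    *out = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->length--;
    return true;                   /* real code would ReleaseMutex here as well */
}
```

Without the increment in the timeout branch, the semaphore would report one item fewer than the queue holds, and that item would never be dequeued.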
joe
