A basic question regarding IoCompletion

I have a requirement whereby our filter driver (a disk filter) needs to
update certain metadata upon every IRP_MJ_WRITE it receives. The
requirement is that the in-flight IRP would be paused until the
corresponding metadata is written to the disk, and only then be allowed to
hit the disk itself.

So for every IO, we would be generating another IO. Thus we need to roll
our own IRPs in the write handler of the disk filter.

For the prototype, I want to implement the metadata writer as a
synchronous operation. That is, my function WriteMetaData would create an
IRP, fill up the necessary fields, set up a WriteMetaDataCompletion routine
and then do an IoCallDriver. If IoCallDriver returns STATUS_PENDING, we
wait on an event infinitely. In WriteMetaDataCompletion, we set the event
and thus WriteMetaData completes.
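That synchronous prototype might be sketched as follows. This is a sketch only: it assumes PASSIVE_LEVEL, a metadata buffer allocated from nonpaged pool, and a lower device pointer supplied by the caller; most error handling is elided. Note that because the completion routine returns STATUS_MORE_PROCESSING_REQUIRED and always fires, the waiter should wait unconditionally rather than only on STATUS_PENDING.

```c
typedef struct _META_CONTEXT {
    KEVENT   Event;
    NTSTATUS Status;
} META_CONTEXT;

static NTSTATUS
WriteMetaDataCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    META_CONTEXT *ctx = (META_CONTEXT *)Context;

    UNREFERENCED_PARAMETER(DeviceObject);
    ctx->Status = Irp->IoStatus.Status;
    KeSetEvent(&ctx->Event, IO_NO_INCREMENT, FALSE);
    /* We allocated this IRP ourselves, so stop completion processing here;
     * the waiting thread frees the IRP and MDL. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}

NTSTATUS
WriteMetaData(PDEVICE_OBJECT LowerDevice, PVOID Buffer, ULONG Length,
              LARGE_INTEGER ByteOffset)
{
    META_CONTEXT ctx;
    PIRP irp;
    PIO_STACK_LOCATION sp;

    KeInitializeEvent(&ctx.Event, NotificationEvent, FALSE);

    irp = IoAllocateIrp(LowerDevice->StackSize, FALSE);
    if (irp == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    irp->MdlAddress = IoAllocateMdl(Buffer, Length, FALSE, FALSE, NULL);
    if (irp->MdlAddress == NULL) {
        IoFreeIrp(irp);
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    MmBuildMdlForNonPagedPool(irp->MdlAddress);  /* Buffer is nonpaged pool */

    sp = IoGetNextIrpStackLocation(irp);
    sp->MajorFunction = IRP_MJ_WRITE;
    sp->Parameters.Write.Length = Length;
    sp->Parameters.Write.ByteOffset = ByteOffset;

    IoSetCompletionRoutine(irp, WriteMetaDataCompletion, &ctx,
                           TRUE, TRUE, TRUE);
    (VOID) IoCallDriver(LowerDevice, irp);

    /* The completion routine always runs, so wait unconditionally. */
    KeWaitForSingleObject(&ctx.Event, Executive, KernelMode, FALSE, NULL);

    IoFreeMdl(irp->MdlAddress);
    IoFreeIrp(irp);
    return ctx.Status;
}
```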

This much I am clear about; my doubts are:

  1. Assuming that we get an IRP_MJ_WRITE from above, and inside it I call
    WriteMetaData (which is a blocking call), do I still need to mark the write
    IRP as pending?

  2. If we want to make a better design, and make both the metadata write
    and the original IRP completion asynchronous, then is there a way to
    *guarantee* that the metadata write will hit the disk before the original
    IRP? Can IO_PRIORITY_HINT be of any help to us? Or perhaps making our
    metadata IRP a paging IO?

thanks

B

I think that your design where metadata writes are synchronous is broken, since writes can come in to the disk at IRQL > PASSIVE_LEVEL, which fundamentally requires you to be async in your metadata writes. Assuming you could get away with synchronous metadata writes, you would not need to return STATUS_PENDING in the IO dispatch routine, because you have not pended the original write IRP in any way (just delayed it in the calling thread).

d
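A sketch of the asynchronous shape this suggests: pend the original write, send the metadata IRP, and forward the original only from the metadata IRP's completion routine, which may run at DISPATCH_LEVEL. BuildMetaDataIrp, FILTER_EXTENSION, and the pool tag are hypothetical names, and failure paths are elided.

```c
typedef struct _META_DONE_CTX {
    PIRP           OriginalIrp;
    PDEVICE_OBJECT LowerDevice;
} META_DONE_CTX;

static NTSTATUS MetaDataDone(PDEVICE_OBJECT DeviceObject, PIRP metaIrp,
                             PVOID Context);

NTSTATUS
FilterDispatchWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    FILTER_EXTENSION *ext = DeviceObject->DeviceExtension;   /* assumed layout */
    PIRP metaIrp = BuildMetaDataIrp(ext->LowerDevice, Irp);  /* assumed helper */
    META_DONE_CTX *ctx;

    ctx = ExAllocatePoolWithTag(NonPagedPool, sizeof(*ctx), 'tMdW');
    ctx->OriginalIrp = Irp;
    ctx->LowerDevice = ext->LowerDevice;

    IoMarkIrpPending(Irp);               /* now we really have pended it */
    IoSetCompletionRoutine(metaIrp, MetaDataDone, ctx, TRUE, TRUE, TRUE);
    IoCallDriver(ext->LowerDevice, metaIrp);
    return STATUS_PENDING;
}

static NTSTATUS
MetaDataDone(PDEVICE_OBJECT DeviceObject, PIRP metaIrp, PVOID Context)
{
    META_DONE_CTX *ctx = (META_DONE_CTX *)Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    /* May run at DISPATCH_LEVEL, so no waiting here. A real driver would
     * check metaIrp->IoStatus.Status and fail the original write on error. */
    IoSkipCurrentIrpStackLocation(ctx->OriginalIrp);
    IoCallDriver(ctx->LowerDevice, ctx->OriginalIrp);

    IoFreeMdl(metaIrp->MdlAddress);
    IoFreeIrp(metaIrp);
    ExFreePoolWithTag(ctx, 'tMdW');
    return STATUS_MORE_PROCESSING_REQUIRED;
}
```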


Doron,

thanks for the prompt reply. Yes, I have read that at the volume/disk level
one can get IRPs at > PASSIVE_LEVEL. But so far, I am unable to see any such
IOs. This prototype runs on all operating systems from XP till Win 7, both
client and server, on varied hardware, and in some RAID configurations
also.

I have implemented asserts in the write and the completion path to stop if
IRQL > DISPATCH_LEVEL, but so far they have not been hit.

Can you please give me a scenario as to how we can generate this condition
in the lab or the real world?

B


Install any 3rd party volume filter; most of them send IO down at DISPATCH_LEVEL.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Asserts do not stop execution, even if they fail; they only print out
messages. And that only in debug builds.
joe


I have not studied file system filter drivers, but the word “hint” usually
translates as “here is some information that suggests that, if you feel like
it, you might want to take advantage of this information”, which is quite
different from an option that says “do it THIS way”. So I’d be careful of
depending on anything called “hint” having any guaranteed effect in the
wide world (as opposed to the one machine you are developing on, with the
current OS version, service pack, hot fixes, type of disk, current version
of the driver, the vendor-specific driver for your card, and, for all I
know, the current radial separation of Mars and Jupiter, expressed in
radians).

joe


In your metadata write, you will want to include the Force Unit Access bit. Otherwise, the underlying hardware is free to cache or reorder as it sees fit. Be aware that some hardware lies and ignores this bit, but users of such hardware can hardly blame your software for that.
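As a sketch (metaIrp here is the hypothetical filter-allocated metadata IRP from the discussion above), the usual way to ask the storage stack for this is the SL_WRITE_THROUGH flag on the next stack location, which the lower storage drivers translate into the SCSI FUA bit:

```c
/* Sketch: request write-through for the metadata IRP. The target stack
 * (and the hardware) must honor SL_WRITE_THROUGH for this to reach the
 * disk as Force Unit Access. */
PIO_STACK_LOCATION sp = IoGetNextIrpStackLocation(metaIrp);

sp->MajorFunction = IRP_MJ_WRITE;
sp->Flags |= SL_WRITE_THROUGH;   /* becomes the FUA bit at the SCSI layer */
```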


xxxxx@flounder.com wrote:

Asserts do not stop execution, even if they fail; they only print out
messages. And that only in debug builds.

I would have argued the reverse. Asserts (in both user and kernel code)
trigger a debug breakpoint. If a debugger is attached, execution is
stopped because the debugger fires up. If a debugger is not attached,
execution is stopped by an uncaught exception (in user mode) or a
bugcheck (in kernel mode).


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

ASSERT in the kernel mode without a debugger attached just continues the execution, unlike an explicit breakpoint.

I just checked the documentation. In kernel mode, if a debugger is
attached, a breakpoint is taken, but there is no suggestion that any
exception is taken.

I consider calls like this which terminate execution to be designed by
irresponsible children. ASSERT is used only during development and is
never part of a deliverable product. And the presumption that my program
can terminate execution safely at random points in time has no foundation;
in a well-designed world, control returns to me and I either continue and
get a termination or continue with recovery. If I code continue with
recovery, an ASSERT macro that terminates execution is a complete
disaster.

There are few things more amateurish than seeing code of the form

ASSERT(p != NULL);
int n = *p;

I always write

ASSERT(p != NULL);
if(p == NULL)
recover

“recover” might be
return FALSE;
throw new CInternalError(__FILE__, __LINE__);

as typical examples. The exception is caught, the error is logged, the
transaction is aborted, modified state is rendered consistent (rollback)
and the program continues to run. The only possible way a program is
allowed to exit is if the user requests it to terminate; no mechanism that
terminates execution is permissible.

I remember the amateurish code of cdb, the Berkeley Unix debugger. At the
slightest error, it did exit(1). So I’m an hour into the debug session,
I’ve finally seen the conditions that trigger the bug, I’m trying to
figure out how the values got that way, I ask for a stack backtrace.
Boom! I’m looking at a nearly blank screen which is showing the shell
prompt. The debugger exited, terminating my debug session!

We had our local expert work on fixing this. He fixed over 200 bugs that
led to these conditions, and turned the exit() calls into longjmps (OK,
this was C in 1982) so the debugger would not exit. The next day we
received a new cdb distribution tape (FTP? Tapes were faster!) which
claimed to have fixed over 200 bugs. So our programmer, in great dismay
that he’d wasted a week, diffed the sources. He announced the next day
that the overlap (intersection) of the bug fixes was [drum roll] 3! And
they still exited the debugger if anything seemed wrong.

I remember arguing with one programmer about putting a KeBugCheckEx call in
a driver. He thought that if the user app sent down a bad IOCTL code,
this was a valid response. So I asked him, “Suppose you have a guest in
your house. He discovers that there is an insufficiency of toilet paper.
What do YOU think the correct recovery should be: (a) look for a new roll
on the back of the toilet (b) burn down your house?” I pointed out to him
that his driver was a guest in the OS, and since, in fact, it was not a
file system driver, any errors represented (a) hardware failures, in which
case he should recover gracefully (b) driver coding errors, in which case
he should recover gracefully or (c) user errors, in which case he should
recover gracefully. The only time you are permitted to burn the house
down is if you find your host is a mad scientist who will, very shortly,
release a highly-contagious variant of pneumonic plague on the world, that
he created in his basement lab. Or your file system driver detects some
impossible state which can only mean that further attempts to use it would
cause even more damage. But crashing the system because the app sent a
bad IOCTL to a DAC device was not an appropriate response.

He appeared to be unconvinced.
joe


> I always write
>
> ASSERT(p != NULL);
> if(p == NULL)
> recover

I second this.

You’re absolutely, fully correct.

ASSERT is evil due to a) its moronic way of providing an error message and b) its moronic way of recovering from the error.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

xxxxx@flounder.com wrote:

I just checked the documentation. In kernel mode, if a debugger is
attached, a breakpoint is taken, but there is no suggestion that any
exception is taken.

You’re right, I was wrong. I tried it. ASSERT without the debugger
sends a DbgPrint message but does not blue screen.

I was thinking about embedded hard-coded breakpoints, which DO cause a
blue screen.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

While I agree that ASSERT is at best a shortcut for proper error handling
code and at worst something quite deadly, the statement

‘The only possible way a program is allowed to exit is if the user requests
it to terminate; no mechanism that terminates execution is permissible.’

I can’t quite agree with. Certainly, this is quite correct when building a
single threaded DOS application, but many systems and programming paradigms
have the concept of panic or master alarm and in some cases ABEND is the
only sane course of action. We see posts from many who want to continue
after memory corruption, or unhandled KM exceptions and we try our best to
dissuade them because their task is impossible. Sometimes, the same is true
in a UM app too and there is no possible way to continue safely. Failure of
RevertToSelf or HeapUnlock is one example, and another would be corrupted
state in some multi-threaded designs. You could argue that designs and APIs
that can result in unrecoverable failures ought not be used, and certainly
to minimize the number of unrecoverable failure paths in a design is a
worthwhile objective, but some activities simply require designs that can
have unrecoverable failures.


“m” wrote in message news:xxxxx@ntdev…

While I agree that ASSERT is at best a shortcut for proper error handling
code and at worst something quite deadly…

Sorry, don’t agree with this at all (nor do I agree with Max’s, er,
assertion, that, “ASSERT is evil”). For me, ASSERTs are great provided that
they’re used as they are intended to be used. They can make the assumed
environment of a particular routine explicit to future maintainers of the
code (including the original author), which makes things that much more
future proof.

For example, say I have an I/O event processing callback that does some
validation of the incoming buffer then calls a helper routine:

VOID
MyIoEventCallback(PVOID Buffer, ULONG BufferLen)
{
    if (BufferLen < MIN_BUFFER_LEN) {
        // fail request
        return;
    }

    DoStuff(Buffer, BufferLen);
}

Then, in DoStuff I ASSERT that the buffer passed in meets the minimum
requirements:

VOID
DoStuff(PVOID BufferParam, ULONG BufferLenParam)
{
    ASSERT(BufferLenParam >= MIN_BUFFER_LEN);
    // ...
}

This helps me in two ways. First, if DoStuff grows to DoMoreStuff and gets
called from multiple places, it’s clear that the code was originally written
with a restriction on the incoming buffer size. Second, if I’m doing a code
review I get some quick insight into the runtime environment of this
function without having to track down every reference to it (which brings up
the issue of incorrect ASSERTs, but that’s a different problem).

Or how about a helper function written with the assumption that a lock is
held when it’s called? Why would ASSERTing that the appropriate lock is held
be evil? Or ASSERTing the IRQL restrictions on a particular routine?
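As a sketch of that last kind of contract ASSERT (FILTER_CONTEXT and the helper name are made up; spin lock ownership itself is not directly queryable, but its IRQL consequence is):

```c
/* Hypothetical helper whose contract is "caller holds Ctx->SpinLock".
 * Holding a spin lock implies running at DISPATCH_LEVEL, so the IRQL
 * side of the contract can be asserted directly. */
VOID
HelperCalledWithLockHeld(FILTER_CONTEXT *Ctx)
{
    ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
    /* ... touch state protected by Ctx->SpinLock ... */
}
```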

Admittedly, more and more of this can be done with the SAL annotations, though
I find ASSERTs to be clear, easy, and useful. Of course, if you have someone
using ASSERTs as their only method of error handling then you’re doomed,
though that should be dealt with through your coding guidelines and should
not pass any reasonable code review.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com

“m” wrote in message news:xxxxx@ntdev…

While I agree that ASSERT is at best a shortcut for proper error handing
code and at worst something quite deadly, the statement

‘The only possible way a program is allowed to exit is if the user requests
it to terminate; no mechanism that terminates execution is permissible.’

I can’t quite agree with. Certainly, this is quite correct when building a
single threaded DOS application, but many systems and programming paradigms
have the concept of panic or master alarm and in some cases ABEND is the
only sane course of action. We see posts from many who want to continue
after memory corruption, or unhandled KM exceptions and we try our best to
dissuade them because their task is impossible. Sometimes, the same is true
in a UM app too and there is no possible way to continue safely. Failure of
RevertToSelf or HeapUnlock is one example, and another would be corrupted
state in some multi-threaded designs. You could argue that designs and APIs
that can result in unrecoverable failures ought not be used, and certainly
to minimize the number of unrecoverable failure paths in a design is a
worthwhile objective, but some activities simply require designs that can
have unrecoverable failures.

wrote in message news:xxxxx@ntdev…

I just checked the documentation. In kernel mode, if a debugger is
attached, a breakpoint is taken, but there is no suggestion that any
exception is taken.

I consider calls like this which terminate execution to be designed by
irresponsible children. ASSERT is used only during development and is
never part of a deliverable product. And the presumption that my program
can terminate execution safely at random points in time has no foundation;
in a well-designed world, control returns to me and I either continue and
get a termination, or continue with recovery. If I have coded the recovery
path, an ASSERT macro that terminates execution is a complete disaster.

There are few things more amateurish than seeing code of the form

ASSERT(p != NULL);
int n = *p;

I always write

ASSERT(p != NULL);
if(p == NULL)
recover

“recover” might be
return FALSE;
throw new CInternalError(__FILE__, __LINE__);

as typical examples. The exception is caught, the error is logged, the
transaction is aborted, modified state is rendered consistent (rollback)
and the program continues to run. The only possible way a program is
allowed to exit is if the user requests it to terminate; no mechanism that
terminates execution is permissible.
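The assert-then-recover idiom above might look like this in C; SOFT_ASSERT and ReadFirstValue are hypothetical names, and the macro deliberately reports rather than terminates, in keeping with the argument here:

```c
#include <assert.h>
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* A non-terminating assert in the spirit of the pattern above: it reports
 * the broken invariant but leaves recovery to the code that follows.
 * (Name and behavior are illustrative, not a real WDK macro.) */
#define SOFT_ASSERT(expr) \
    ((expr) ? (void)0 : (void)fprintf(stderr, "Assert failed: %s\n", #expr))

/* The check documents the invariant in debug output; the if() provides
 * the recovery path that ships in every build. */
static bool ReadFirstValue(const int *p, int *out)
{
    SOFT_ASSERT(p != NULL);   /* development-time diagnostic */
    if (p == NULL) {          /* recovery path present in every build */
        return false;
    }
    *out = *p;
    return true;
}
```

The caller sees an ordinary failure return, logs it, rolls back, and keeps running; nothing in the path can tear the process down.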

I remember the amateurish code of cdb, the Berkeley Unix debugger. At the
slightest error, it did exit(1); So I’m an hour into the debug session,
I’ve finally seen the conditions that trigger the bug, I’m trying to
figure out how the values got that way, I ask for a stack backtrace.
Boom! I’m looking at a nearly blank screen which is showing the shell
prompt. The debugger exited, terminating my debug session!

We had our local expert work on fixing this. He fixed over 200 bugs that
led to these conditions, and turned the exit() calls into longjmps (OK,
this was C in 1982) so the debugger would not exit. The next day we
received a new cdb distribution tape (FTP? Tapes were faster!) which
claimed to have fixed over 200 bugs. So our programmer, in great dismay
that he’d wasted a week, diffed the sources. He announced the next day
that the overlap (intersection) of the bug fixes was [drum roll] 3! And
they still exited the debugger if anything seemed wrong.

I remember arguing with one programmer about putting a BugCheckEx call in
a driver. He thought that if the user app sent down a bad IOCTL code,
this was a valid response. So I asked him, “Suppose you have a guest in
your house. He discovers that there is an insufficiency of toilet paper.
What do YOU think the correct recovery should be: (a) look for a new roll
on the back of the toilet (b) burn down your house?” I pointed out to him
that his driver was a guest in the OS, and since, in fact, it was not a
file system driver, any errors represented (a) hardware failures, in which
case he should recover gracefully (b) driver coding errors, in which case
he should recover gracefully or (c) user errors, in which case he should
recover gracefully. The only time you are permitted to burn the house
down is if you find your host is a mad scientist who will, very shortly,
release a highly-contagious variant of pneumonic plague on the world, that
he created in his basement lab. Or your file system driver detects some
impossible state which can only mean that further attempts to use it would
cause even more damage. But crashing the system because the app sent a
bad IOCTL to a DAC device was not an appropriate response.

He appeared to be unconvinced.
joe

xxxxx@flounder.com wrote:
> Asserts do not stop execution, even if they fail; they only print out
> messages. And that only in debug builds.

I would have argued the reverse. Asserts (in both user and kernel code)
trigger a debug breakpoint. If a debugger is attached, execution is
stopped because the debugger fires up. If a debugger is not attached,
execution is stopped by an uncaught exception (in user mode) or a
bugcheck (in kernel mode).


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Boy, that’s a term I haven’t heard in decades: ABEND. It was a mistake in
the IBM world, and it remains a mistake. Note that any application that
has a user interface must not exit unless the user asks it to exit;
otherwise, your tech support people get a phone call that says “The app
just disappeared”, which is less than useful. As soon as the attitude
exists that exiting is acceptable, somebody will code the exit operation
instead of doing the job right and recovering, and then some components of
the app will interfere with the reliability and robustness of other
components. So
all the files are closed, but what is the impact of this on the live data
collection? I’ve not seen one instance in the last 40 years where exiting
a program made sense. I remember trying to integrate a math library and
deal with things like sqrt(N) for N < 0. That was in 1968, and I’ve never
seen an improvement; the correct solution is for the library to raise an
exception, so the caller can intercept it. Or provide for an
application-specific error handler function, in which I can decide what to
do. But I once used a database library which, if it discovered a
corrupted index file, just exited the program, thus shutting off realtime
data collection that had nothing to do with the background query. Nor is
this only a single-threaded concern; multithreaded apps make exiting even
more wrong as a form of behavior. I need that thread to stay live, and I
don’t care that some other thread took a sqrt(-1) and has an error; the
app must not stop.

When shutting everything down is the only course of action, for a GUI app
this means disabling all menu items except File>Exit and Help. I’ve never
seen anything go this bad in years, because I never allow it to get that
far.
joe


This is what I would call a shortcut for error handling. The long hand
would be something like

#ifdef _DEBUG_OR_SOME_OTHER_DEF
if(!(condition))
{
    RaiseErrorInSomeWayThatMakesSenseForThisEnvironment(message);
}
#endif

or

if(!(condition))
{
#ifdef _DEBUG_OR_SOME_OTHER_DEF
    RaiseErrorInSomeWayThatMakesSenseForThisEnvironment(message);
#else
    GracefulFailureOfSomeKind();
#endif
}

And, as you say, they are ideal for verifying assumptions during testing.
They are not an error handling method, and if used as such can be deadly,
especially
in environments where exceptions are deadly.

In my opinion, error handling is one of the most difficult aspects of
programming for learners and an area where there is wide disagreement on
what constitutes good practice. Personally, for C code, I prefer the goto
‘ladder’ approach for control flow through functions with multiple failure
points as it provides a clear egress path with no duplication of code (see
example below) but others will prefer nested if statements, stack frames or
one of several other styles. The important point is that, as Joe says, they
can fail gracefully with rollback after detecting a problem. What I object
to is the absolute terms in which he advocates this without considering what
should be done in a case like the second example

bool InitSomeStruct(SOME_STRUCT* pSS)
{
pSS->pBuffer1 = AllocateSomeBuffer();
if(pSS->pBuffer1 == NULL)
{
goto abort1;
}

pSS->pBuffer2 = AllocateSomeBuffer();
if(pSS->pBuffer2 == NULL)
{
goto abort2;
}
return true;

//abort3:
// FreeSomeBuffer(pSS->pBuffer2);
abort2:
FreeSomeBuffer(pSS->pBuffer1);
abort1:
return false;
}
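The ladder above can be exercised end to end with stub allocators; everything below (the scripted failure counter in particular) is demo scaffolding of mine, not part of the pattern itself:

```c
#include <assert.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct {
    void *pBuffer1;
    void *pBuffer2;
} SOME_STRUCT;

/* Stub allocator whose failure point can be scripted for the demo. */
static int g_allocs_before_failure = -1;   /* -1 => never fail */

static void *AllocateSomeBuffer(void)
{
    if (g_allocs_before_failure == 0) return NULL;
    if (g_allocs_before_failure > 0) g_allocs_before_failure--;
    return malloc(16);
}

static void FreeSomeBuffer(void *p) { free(p); }

/* Same goto ladder as above: each failure label unwinds exactly the
 * resources acquired before the failing step, in reverse order. */
bool InitSomeStruct(SOME_STRUCT *pSS)
{
    pSS->pBuffer1 = AllocateSomeBuffer();
    if (pSS->pBuffer1 == NULL) goto abort1;

    pSS->pBuffer2 = AllocateSomeBuffer();
    if (pSS->pBuffer2 == NULL) goto abort2;

    return true;

abort2:
    FreeSomeBuffer(pSS->pBuffer1);
abort1:
    return false;
}
```

If the second allocation fails, control drops through abort2 (freeing pBuffer1) into abort1; adding a third resource means adding one allocation, one check, and one label, with no duplicated cleanup.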

void ProcessSomeRequest(REQUEST_DESCRIPTOR* pRequest)
{
// a request has been received from a user and is described by pRequest
if(!ImpersonateRequestUser(pRequest))
{
// send back some error
return;
}
TRY
{
DoSomethingToProcessTheRequest(pRequest);
}
FINALLY
{
if(!RevertToSelf())
{
// Now what? This call should never fail, but since it has, we can
// conclude that either:
// 1) there is memory corruption
// 2) some code inside DoSomethingToProcessTheRequest has allowed a
//    hacker to call RevertToSelf already
// 3) the host OS is broken in some other way
// It is not safe to continue executing in this unknown security context
// because we can’t know what the thread will do next, and it is not safe
// in general to simply exit the thread because we know that the memory
// space has been compromised in some way.
// In UM, attempt ExitProcess; in KM, KeBugCheck; in an embedded system,
// raise whatever panic signal the environment defines and let the
// hardware reboot. This is an unrecoverable error and an appropriate
// time to ABEND.
}
}
}

“Scott Noone” wrote in message news:xxxxx@ntdev…

“m” wrote in message news:xxxxx@ntdev…

While I agree that ASSERT is at best a shortcut for proper error handling
code and at worst something quite deadly…

Sorry, don’t agree with this at all (nor do I agree with Max’s, er,
assertion, that, “ASSERT is evil”). For me, ASSERTs are great provided that
they’re used as they are intended to be used. They can make the assumed
environment of a particular routine explicit to future maintainers of the
code (including the original author), which makes things that much more
future proof.

For example, say I have an I/O event processing callback that does some
validation of the incoming buffer then calls a helper routine:

{
if (BufferLen < MIN_BUFFER_LEN) {
// fail request
return;
}

DoStuff(Buffer, BufferLen);
}

Then, in DoStuff I ASSERT that the buffer passed in meets the minimum
requirements:

DoStuff
{
ASSERT(BufferLenParam >= MIN_BUFFER_LEN);
}

This helps me in two ways. First, if DoStuff grows to DoMoreStuff and gets
called from multiple places, it’s clear that the code was originally written
with a restriction on the incoming buffer size. Second, if I’m doing a code
review I get some quick insight into the runtime environment of this
function without having to track down every reference to it (which brings up
the issue of incorrect ASSERTs, but that’s a different problem).

Or how about a helper function written with the assumption that a lock is
held when it’s called? Why would ASSERTing that the appropriate lock is held
be evil? Or ASSERTing the IRQL restrictions on a particular routine?

Admittedly, more and more of this can be done with the SAL notations, though
I find ASSERTs to be clear, easy, and useful. Of course if you have someone
using ASSERTs as their only method of error handling then you’re doomed,
though that should be dealt with through your coding guidelines and not pass
any reasonable code review.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com


Not quite. The correct solution in the second case is to do the graceful
failure, no matter what, and then, if there is an error, raise the
condition. C exceptions are not as elegant or friendly as C++ exceptions,
but they are not an unreasonable way to handle this situation, providing
you make sure that all callers have try/except frames that will get laid
down (it might be the caller of the caller of your caller, but each stage
that has to rollback state needs one).

I did a kernel component in 1975 that went from an MTBF of 45 minutes
(doing the equivalent of BugCheckEx) to an MTBF which was indefinite (we
ran for six solid weeks, 24/7, zero downtime, until we had a campus-wide
power failure). About once a day, the exception condition for total
failure was raised, and the recovery code came into play. The recovery
code was almost as complex as the kernel component itself.
joe
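The exit()-to-longjmp rework Joe describes can be sketched with a toy command loop; fatal_error, run_command, and the rollback counter below are all illustrative stand-ins, not the actual cdb fix:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* The top of the command loop lays down a recovery point; deep failure
 * paths jump back to it (after rolling back state) instead of
 * terminating the process. */
static jmp_buf g_recovery_point;
static int g_rollbacks = 0;

static void fatal_error(void)
{
    g_rollbacks++;                 /* stand-in for real rollback work */
    longjmp(g_recovery_point, 1);  /* was: exit(1) */
}

static void deep_operation(bool fail)
{
    if (fail) {
        fatal_error();   /* unwinds to the command loop; session survives */
    }
}

/* Returns true if the operation completed, false if it was recovered. */
bool run_command(bool fail)
{
    if (setjmp(g_recovery_point) != 0) {
        return false;    /* resumed here after longjmp; keep running */
    }
    deep_operation(fail);
    return true;
}
```

The essential discipline, as Joe notes, is that every frame holding state that must be rolled back needs its own recovery hook; the longjmp itself only gets control back to a known-good point.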


“Scott Noone” wrote in message news:xxxxx@ntdev…

“m” wrote in message news:xxxxx@ntdev…
>While I agree that ASSERT is at best a shortcut for proper error handing
>code and at worst something quite deadly…

Sorry, don’t agree with this at all (nor do I agree with Max’s, er,
assertion, that, “ASSERT is evil”). For me, ASSERTs are great provided
that
they’re used as they are intended to be used. They can make the assumed
environment of a particular routine explicit to future maintainers of the
code (including the original author), which makes things that much more
future proof.

For example, say I have an I/O event processing callback that does some
validation of the incoming buffer then calls a helper routine:

{
if (BufferLen < MIN_BUFFER_LEN) {
// fail request
return;
}

DoStuff(Buffer, BufferLen);
}

Then, in DoStuff I ASSERT that the buffer passed in meets the minimum
requirements:

DoStuff
{
ASSERT(BufferLenParam >= MIN_BUFFER_LEN);
}

This helps me in two ways. First, if DoStuff grows to DoMoreStuff and gets
called from multiple places, it’s clear that the code was originally
written
with a restriction on the incoming size buffer. Second, if I’m doing a
code
review I get some quick insight into the runtime environment of this
function without having to track down every reference to it (which brings
up
the issue of incorrect ASSERTs, but that’s a different problem).

Or how about a helper function written with the assumption that a lock is
held when it’s called? Why would ASSERTing that the appropriate lock is
held
be evil? Or ASSERTing the IRQL restrictions on a particular routine?

Admittedly, more and more of this can be done with the SAL notations,
though
I find ASSERTs to be clear, easy, and useful. Of course if you have
someone
using ASSERTs as their only method of error handling then you’re doomed,
though that should be dealt with through your coding guidelines and not
pass
any reasonable code review.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com

“m” wrote in message news:xxxxx@ntdev…

While I agree that ASSERT is at best a shortcut for proper error handing
code and at worst something quite deadly, the statement

‘The only possible way a program is allowed to exit is if the user
requests
it to terminate; no mechanism that terminates execution is permissible.’

I can’t quite agree with. Certainly, this is quite correct when building
a
single threaded DOS application, but many systems and programming
paradigms
have the concept of panic or master alarm and in some cases ABEND is the
only sane course of action. We see posts from many who want to continue
after memory corruption, or unhandled KM exceptions and we try our best to
dissuade them because their task is impossible. Sometimes, the same is
true
in a UM app too and there is no possible way to continue safely. Failure
of
RevertToSelf or HeapUnlock is one example, and another would be corrupted
state in some multi-threaded designs. You could argue that designs and
APIs
that can result in unrecoverable failures ought not be used, and certainly
to minimize the number of unrecoverable failure paths in a design is a
worthwhile objective, but some activities simply require designs that can
have unrecoverable failures.

wrote in message news:xxxxx@ntdev…

I just checked the documentation. In kernel mode, if a debugger is
attached, a breakpoint is taken, but there is no suggestion that any
exception is taken.

I consider calls like this which terminate execution to be designed by
irresponsible children. ASSERT is used only during development and is
never part of a deliverable product. And the presumption that my program
can terminate execution safely at random points in time has no foundation;
in a well-designed world, control returns to me and I either continue and
get a termination or continue with recovery. If I code continue with
recovery, an ASSERT macro that terminates execution is a complete
disaster.

There are few things more amateurish than seeing code of the form

ASSERT(p != NULL);
int n = *p;

I always write

ASSERT(p != NULL);
if(p == NULL)
recover

“recover” might be
return FALSE;
throw new CInternalError(FILE, LINE);

as typical examples. The exception is caught, the error is logged, the
transaction is aborted, modified state is rendered consistent (rollback)
and the program continues to run. The only possible way a program is
allowed to exit is if the user requests it to terminate; no mechanism that
terminates execution is permissible.

I remember the amateurish code of cdb, the Berkley Unix debugger. At the
slightest error, it did exit(1); So I’m an hour into the debug session,
I’ve finally seen the conditions that trigger the bug, I’m trying to
figure out how the values got that way, I ask for a stack backtrace.
Boom! I’m looking at a nearly blank screen which is showing the shell
prompt. The debugger exited, terminating my debug session!

We had our local expert work on fixing this. He fixed over 200 bugs that
led to these conditions, and turned the exit() calls into longjmps (OK,
this was C in 1982) so the debugger would not exit. The next day we
received a new cdb distribution tape (FTP? Tapes were faster!) which
claimed to have fixed over 200 bugs. So our programmer, in great dismay
that he’d wasted a week, diffed the sources. He announced the next day
that the overlap (intersection) of the bug fixes was [drum roll] 3! And
they still exited the debugger if anything seemed wrong.

I remember arguing with one programmer about putting a BugCheckEx call in
a driver. He thought that if the user app sent down a bad IOCTL code,
this was a valid response. So I asked him, “Suppose you have a guest in
your house. He discovers that there is an insufficiency of toilet paper.
What do YOU think the correct recovery should be: (a) look for a new roll
on the back of the toilet (b) burn down your house?” I pointed out to him
that his driver was a guest in the OS, and since, in fact, it was not a
file system driver, any errors represented (a) hardware failures, in which
case he should recover gracefully (b) driver coding errors, in which case
he should recover gracefully or (c) user errors, in which case he should
recover gracefully. The only time you are permitted to burn the house
down is if you find your host is a mad scientist who will, very shortly,
release a highly-contagious variant of pneumonic plague on the world, that
he created in his basement lab. Or your file system driver detects some
impossible state which can only mean that further attempts to use it would
cause even more damage. But crashing the system because the app sent a
bad IOCTL to a DAC device was not an appropriate response.

He appeared to be unconvinced.
joe

> xxxxx@flounder.com wrote:
>> Asserts do not stop execution, even if they fail; they only print out
>> messages. And that only in debug builds.
>
> I would have argued the reverse. Asserts (in both user and kernel code)
> trigger a debug breakpoint. If a debugger is attached, execution is
> stopped because the debugger fires up. If a debugger is not attached,
> execution is stopped by an uncaught exception (in user mode) or a
> bugcheck (in kernel mode).
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

> “m” wrote in message news:xxxxx@ntdev…

>While I agree that ASSERT is at best a shortcut for proper error handling
>code and at worst something quite deadly…

Sorry, don’t agree with this at all (nor do I agree with Max’s, er,
assertion, that, “ASSERT is evil”). For me, ASSERTs are great provided
that they’re used as they are intended to be used. They can make the
assumed environment of a particular routine explicit to future maintainers
of the code (including the original author), which makes things that much
more future-proof.

Scott, I absolutely agree with you. How many times has someone called a
function with a NULL pointer, even though NULL is not a valid value? How
many times have they NOT called a function with a NULL pointer, but
written another one entirely, not realizing the first function finds NULL
perfectly acceptable?

ASSERTs really do help the documentation process; in particular, because
they are “live”, they tend not to get out of date like textual comments.

But, as you point out later, if the only use of ASSERTs is for debugging,
the programmer is doomed.
joe

For example, say I have an I/O event processing callback that does some
validation of the incoming buffer then calls a helper routine:

{
    if (BufferLen < MIN_BUFFER_LEN) {
        // fail request
        return;
    }

    DoStuff(Buffer, BufferLen);
}

Then, in DoStuff I ASSERT that the buffer passed in meets the minimum
requirements:

DoStuff
{
    ASSERT(BufferLenParam >= MIN_BUFFER_LEN);
}
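Put together, the two fragments above might look like this as a compilable sketch (DoStuff, MIN_BUFFER_LEN, and the validation shape come from the example; HandleRequest, the parameter types, and the use of the CRT assert are stand-ins I’ve invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BUFFER_LEN 16

/* Helper: the assert documents the contract the caller already enforced */
static void DoStuff(const unsigned char *Buffer, size_t BufferLen)
{
    assert(BufferLen >= MIN_BUFFER_LEN);   /* contract, not error handling */
    /* ... operate on Buffer ... */
    (void)Buffer;
}

/* Caller: the real validation happens here, before the helper is invoked */
static int HandleRequest(const unsigned char *Buffer, size_t BufferLen)
{
    if (Buffer == NULL || BufferLen < MIN_BUFFER_LEN) {
        return 0;                          /* fail the request */
    }
    DoStuff(Buffer, BufferLen);
    return 1;
}
```

The point of the sketch is the division of labor: the caller handles the error; the helper merely documents, live, the environment it was written to assume.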

This helps me in two ways. First, if DoStuff grows to DoMoreStuff and gets
called from multiple places, it’s clear that the code was originally
written with a restriction on the incoming buffer size. Second, if I’m
doing a code review I get some quick insight into the runtime environment
of this function without having to track down every reference to it (which
brings up the issue of incorrect ASSERTs, but that’s a different problem).

Or how about a helper function written with the assumption that a lock is
held when it’s called? Why would ASSERTing that the appropriate lock is
held
be evil? Or ASSERTing the IRQL restrictions on a particular routine?
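A user-mode sketch of that kind of lock-held/IRQL assertion (KIRQL, the level constants, and the lock state here are stand-ins invented so the fragment compiles outside a driver; a real driver would use KeGetCurrentIrql and its actual lock bookkeeping from the WDK):

```c
#include <assert.h>

/* Stand-ins so this compiles outside the kernel; in a real driver these
   would be the genuine IRQL and spin-lock ownership state from wdm.h. */
typedef unsigned char KIRQL;
#define PASSIVE_LEVEL  0
#define DISPATCH_LEVEL 2

static KIRQL g_CurrentIrql   = PASSIVE_LEVEL;
static int   g_CacheLockHeld = 0;

/* Acquiring a spin lock raises IRQL to DISPATCH_LEVEL; model that here */
static void AcquireCacheLock(void)
{
    g_CacheLockHeld = 1;
    g_CurrentIrql = DISPATCH_LEVEL;
}

static void ReleaseCacheLock(void)
{
    g_CacheLockHeld = 0;
    g_CurrentIrql = PASSIVE_LEVEL;
}

/* Helper that only makes sense with the lock held: say so, live, in code */
static void FlushCacheLocked(void)
{
    assert(g_CacheLockHeld);                 /* caller must hold the lock */
    assert(g_CurrentIrql == DISPATCH_LEVEL); /* and be at the right IRQL */
    /* ... walk and flush the cache ... */
}
```

The asserts cost nothing in free builds, but any future caller who forgets the lock gets caught immediately in checked builds, which is exactly the documentation value being argued for here.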

Admittedly, more and more of this can be done with the SAL annotations,
though I find ASSERTs to be clear, easy, and useful. Of course, if you
have someone using ASSERTs as their only method of error handling then
you’re doomed, though that should be dealt with through your coding
guidelines and should not pass any reasonable code review.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com

“m” wrote in message news:xxxxx@ntdev…

While I agree that ASSERT is at best a shortcut for proper error handling
code and at worst something quite deadly, the statement

‘The only possible way a program is allowed to exit is if the user
requests it to terminate; no mechanism that terminates execution is
permissible.’

I can’t quite agree with. Certainly, this is quite correct when building a
single-threaded DOS application, but many systems and programming
paradigms have the concept of a panic or master alarm, and in some cases
ABEND is the only sane course of action. We see posts from many who want
to continue after memory corruption, or unhandled KM exceptions, and we
try our best to dissuade them because their task is impossible. Sometimes
the same is true in a UM app too, and there is no possible way to continue
safely. Failure of RevertToSelf or HeapUnlock is one example, and another
would be corrupted state in some multi-threaded designs. You could argue
that designs and APIs that can result in unrecoverable failures ought not
be used, and certainly minimizing the number of unrecoverable failure
paths in a design is a worthwhile objective, but some activities simply
require designs that can have unrecoverable failures.

wrote in message news:xxxxx@ntdev…

I just checked the documentation. In kernel mode, if a debugger is
attached, a breakpoint is taken, but there is no suggestion that any
exception is taken.

I consider calls like this which terminate execution to be designed by
irresponsible children. ASSERT is used only during development and is
never part of a deliverable product. And the presumption that my program
can terminate execution safely at random points in time has no foundation;
in a well-designed world, control returns to me and I either continue on
to termination or continue with recovery. If I have coded a recovery
path, an ASSERT macro that terminates execution is a complete disaster.

There are few things more amateurish than seeing code of the form

ASSERT(p != NULL);
int n = *p;

I always write

ASSERT(p != NULL);
if(p == NULL)
    recover

“recover” might be

    return FALSE;
    throw new CInternalError(__FILE__, __LINE__);

as typical examples. The exception is caught, the error is logged, the
transaction is aborted, modified state is rendered consistent (rollback)
and the program continues to run. The only possible way a program is
allowed to exit is if the user requests it to terminate; no mechanism that
terminates execution is permissible.
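That assert-then-recover pattern can be sketched in plain C (SOFT_ASSERT and ReadCounter are invented for illustration; a non-aborting assert stands in here for a debug-build ASSERT so the recovery path can actually be exercised):

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* A soft assert for this sketch: reports the broken contract but never
   terminates, so the recovery code below gets a chance to run. */
#define SOFT_ASSERT(e) \
    do { if (!(e)) fprintf(stderr, "assert failed: %s\n", #e); } while (0)

/* Development-time tripwire plus release-quality recovery */
static int ReadCounter(const int *p, int *valueOut)
{
    SOFT_ASSERT(p != NULL);     /* catches the bug during development */
    if (p == NULL) {            /* recovery: report failure, keep running */
        return 0;               /* caller logs it and rolls back */
    }
    *valueOut = *p;
    return 1;
}
```

The assert and the if() look redundant, but they serve different masters: the first tells the developer a contract was broken; the second keeps the shipped program alive when it happens anyway.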


> Sorry, don’t agree with this at all (nor do I agree with Max’s, er,
> assertion, that, “ASSERT is evil”).

I mean that ASSERT must be replaced by the code of your own, which does something like this:

  • logs “Internal error” message to Windows log
  • returns STATUS_INVALID_PARAMETER

No message boxes, no breakpoints, no crashes.
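One hypothetical shape for such a replacement macro (VALIDATE, SetBufferSize, and the stubbed NTSTATUS values are invented here so the fragment compiles outside the WDK):

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* NTSTATUS stand-ins so this compiles outside the WDK */
typedef long NTSTATUS;
#define STATUS_SUCCESS           ((NTSTATUS)0x00000000L)
#define STATUS_INVALID_PARAMETER ((NTSTATUS)0xC000000DL)

/* In place of ASSERT: log "Internal error", fail the call, keep running.
   (In a driver the fprintf would be an event-log or trace write.) */
#define VALIDATE(expr)                                               \
    do {                                                             \
        if (!(expr)) {                                               \
            fprintf(stderr, "Internal error: %s (%s:%d)\n",          \
                    #expr, __FILE__, __LINE__);                      \
            return STATUS_INVALID_PARAMETER;                         \
        }                                                            \
    } while (0)

static NTSTATUS SetBufferSize(void *buffer, unsigned long length)
{
    VALIDATE(buffer != NULL);
    VALIDATE(length >= 16);
    /* ... do the work ... */
    (void)buffer;
    return STATUS_SUCCESS;
}
```

No message box, no breakpoint, no crash: the broken invariant is recorded and the failure is surfaced to the caller as an ordinary status code.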

> Admittedly, more and more of this can be done with the SAL notations

OPTIONAL documentation macro is enough in most cases :)


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> you make sure that all callers have try/except frames that will get laid
> down

This is not simpler and not more elegant than the return-error-code-based approach.

SEH only makes sense in managed languages, where everything is automatically destroyed in a proper way.

Well, you can use C++ in a mode where SEH will make sense, but then you must put all rollback/recovery code into the destructors of C++ wrapper classes, thus using the RAII paradigm.

In other cases, SEH is just an evil mess.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com