The new wil library for C++ code in drivers

@“Jeffrey_Tippet_[MSFT]” said:

On the topic of error checking & logging, I did a lot of very similar stuff using WPP macros in my previous workplace. Is it the direction you’re looking for? Every place I’ve worked in defined “logging” differently.

wil’s logging is configurable. The full story is documented here. The TL;DR is that you set some #defines to control whether errors get printed to DbgPrint, TraceLogging, or nowhere. You’d presumably have the noisy mode enabled for debug builds, and lower the verbosity for retail.
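
To give a flavor of the configuration, here is a minimal user-mode-style sketch of hooking wil’s failure callback (the kernel-mode setup differs, per the docs; InitErrorLogging is just a hypothetical name, and I’m assuming the hooks from wil/result_macros.h):

    #include <windows.h>
    #include <wil/result_macros.h>

    void InitErrorLogging()
    {
        wil::SetResultLoggingCallback([](wil::FailureInfo const& failure) noexcept
        {
            wchar_t message[2048];
            wil::GetFailureLogString(message, ARRAYSIZE(message), failure);
            OutputDebugStringW(message); // or DbgPrintEx / TraceLoggingWrite, as configured
        });
    }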

I would have liked to see WPP in there – I’m a fan of WPP. But WPP is not particularly friendly to being included in someone else’s macro; it wants to own the top-level macro. So I’m doubtful that it’s technically possible to squeeze WPP into the wil error-handling framework. Well, at least without changing it so much it becomes some sort of WPP 2.0… :wink:
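
For context, normal WPP usage looks something like this (TRACE_DRIVER is a hypothetical flag). The WPP preprocessor scans the source for the trace call itself and generates the .tmh header, which is why DoTraceMessage can’t easily be buried inside someone else’s macro:

    #include "example.tmh"   // generated by the WPP preprocessor at build time

    void TraceCreateFailure(NTSTATUS status)
    {
        // The WPP scanner must see this call spelled out at the call site.
        DoTraceMessage(TRACE_DRIVER, "Create failed, status %!STATUS!", status);
    }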

Oh wow, I only dived into the code some hours after replying, and only then saw the scope. Indeed, it could take more than a single weekend to add WPP support :wink:

I imagined WPP would be the standard for kernel due to performance and security (erm, obfuscation) considerations, so it could be on by default when deployed. If I’m missing some obfuscation ability in TraceLogging I’ll be delighted to get a push in the right direction.

In any case, this is great work. I’m excited to see modern C++ idioms in Kernel space.

Nobody (at this point, at least) is really pushing wil outside of Microsoft, but there’s some hazy dream of a future where every win32 developer just knows these macros and types, and it would look weird to manually spell out the error-checking or handle-closing.

For me it looks like a nightmare. Macros which influence code flow (contain return or goto) are IMO plain evil. They make code shorter but less readable, and in turn less robust/reliable. Call me a dinosaur, but I prefer when the code is readable like a book and the code flow is explicitly expressed. Here you’re trying to express it in the macro name, which is an interesting idea, but it leads to longer macro names and makes this problem worse:

I think the RETURN_IF_NTSTATUS_FAIL creates too much noise and emphasis on the error path. I’d rather see the main function not buried in an argument, and a goto.

I’ve seen too much code plagued by REQUIRE-like macros, and it is far less readable than the longer version which handles errors explicitly, for this reason: you see just a chain of REQUIREs, and what the code is really doing is hard to decipher.

You’re one of my favorite dinosaurs :slight_smile:

mm

Creating a fossil museum? :wink:

Ad hominem attacks & reddit-style hyperbole have no place in a community of professional engineers.

Totally agree and I would never do that. A bigger problem I see is people who carelessly post false or misleading information in a forum dedicated to professionals helping each other. Faulty content has no place here and should be called into question, for the benefit of the poster that made it and even more for others that read it. There is nothing personal about setting the record straight.

RETURN_IF_NTSTATUS_FAILED(ZwCreateFile(. . . ));

My first reaction when I saw this was that it’s extremely ugly, that it breaks a lot of rules, and that there’s no way I would ever want to see it in code. But thinking about it some more this is really just a poor man’s exception. Almost a throw statement if you will. If you look at it in that light it then becomes a thinking point on how it might be useful.

But thinking about it some more this is really just a poor man’s exception. Almost a throw statement if you will.

Well, a throw() statement is just a language construct that one may use or misuse the way they wish. Let’s say we have a function that may throw an exception, and this function may allocate resources and/or lock semaphores before doing so. As I have said already, I would personally prefer to do all the cleanup in the same function in case of a failure. The use of a throw() statement does not preclude me from doing so in any way, does it? My callee can do its “internal” cleanup before throwing the exception, and let the caller do its “external” cleanup in a catch() block.

However, RETURN_IF_NTSTATUS_FAILED() does not seem to offer me any possibility to structure my code this way, does it?
Therefore, it does not really look like a throw() statement, at least from my perspective. OTOH, someone who would prefer to handle all the cleanup in a caller’s catch() block may, indeed, think of it as just some kind of throw() statement.
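
Just to make the contrast concrete, here is a sketch (DoWork is a purely hypothetical worker): with an explicit check, the callee does its “internal” cleanup before bailing out, while the bare macro returns immediately, so the same allocation would leak unless it were held by an RAII wrapper:

    NTSTATUS DoWork(PVOID buffer); // hypothetical

    NTSTATUS CalleeExplicit()
    {
        PVOID buffer = ExAllocatePoolWithTag(NonPagedPoolNx, 64, 'xmpl');
        if (buffer == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        NTSTATUS status = DoWork(buffer);
        if (!NT_SUCCESS(status))
        {
            ExFreePoolWithTag(buffer, 'xmpl'); // "internal" cleanup first...
            return status;                     // ...then let the caller do the "external" part
        }

        ExFreePoolWithTag(buffer, 'xmpl');
        return STATUS_SUCCESS;
    }

    NTSTATUS CalleeWithMacro()
    {
        PVOID buffer = ExAllocatePoolWithTag(NonPagedPoolNx, 64, 'xmpl');
        if (buffer == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        RETURN_IF_NTSTATUS_FAILED(DoWork(buffer)); // early return: buffer leaks here

        ExFreePoolWithTag(buffer, 'xmpl');
        return STATUS_SUCCESS;
    }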

In other words, this is, again, just a question of personal taste and preferences…


Anton Bassov

Totally agree and I would never do that.

Bullshit. Just so we’re clear, regardless of what you think, if I think you do it one more time, I will put you on moderation.

Are we clear, Mr. Rourke?

Peter

(two weeks later…)

Anybody spent time looking at this from a driver perspective?

I’m about to start a new project, and I wandered over to GitHub to look into this a bit. You know, “no time like the present” to play around with some cool new stuff.

I am perhaps too stuck in my ways to understand or appreciate this, but… I don’t get it.

a) It’s just not clear to me how much of this applies to Kernel Mode – Yes, I can see that registry, token_helpers, filesystem, and result don’t allow inclusion in kernel-mode code. But… the other stuff is good to go?

b) It’s not even clear to me when stuff actually DOES apply to kernel-mode code, what that GETS me in real life or why on earth I’d want to use it. I’ll take Mr. Tippet’s example above:

NTSTATUS Example()
{
    NTSTATUS status;
    wil::unique_wdf_common_buffer my_buffer;

    status = WdfCommonBufferCreate(..., &my_buffer);
    if (STATUS_SUCCESS != status)
        return status;

    if (. . . something else fails . . .)
        return STATUS_UNSUCCESSFUL; // no common buffer leak; WdfObjectDelete happens here

    return STATUS_SUCCESS;
}

Common Buffers are typically long-lived – you allocate them during initialization, you parent them on your WDFDEVICE, and they go away because the WDFDEVICE goes away (or because you explicitly delete them just before your WDFDEVICE is destructed). So… why would I want a Common Buffer deleted when leaving a function? I’m not picking on Common Buffer… but I see a whole LIST of similar structures: DMA Enabler, DMA Transaction, etc. These structures aren’t simply used within a function and returned… their entire POINT is to be created and used “across time” in your driver.

So… I feel like I must be missing something. Pray tell: What am I missing?

Peter

@“Peter_Viscarola_(OSR)” said:

Why would I want a Common Buffer deleted when leaving a function? […] These structures aren’t simply used within a function and returned… their entire POINT is to be created and used “across time” in your driver.

So… I feel like I must be missing something. Pray tell: What am I missing?

Peter

That may be a contrived example, but there are certainly a lot of things that should apply in kernel mode.

I didn’t spend a lot of time looking at the specifics of what already exists, but it’s certainly a direction I think is positive.

In my previous place of work, we relied heavily on RAII wrappers for locks, mutexes and such. We also had C++ std-compatible memory wrappers (unique and shared ptr) and a kstd list, which reduced leaks and ensured clear memory ownership. Once most system objects were wrapped behind C++ classes, we could provide a user-mode implementation and run unit tests on our logic.
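
As an illustration, a minimal sketch of that kind of RAII guard for a kernel spin lock (SpinLockGuard is a hypothetical name, not from any particular library):

    class SpinLockGuard
    {
    public:
        explicit SpinLockGuard(KSPIN_LOCK& lock) : m_lock(lock)
        {
            KeAcquireSpinLock(&m_lock, &m_oldIrql); // raises to DISPATCH_LEVEL
        }
        ~SpinLockGuard()
        {
            KeReleaseSpinLock(&m_lock, m_oldIrql);  // releases on every exit path
        }
        SpinLockGuard(const SpinLockGuard&) = delete;
        SpinLockGuard& operator=(const SpinLockGuard&) = delete;

    private:
        KSPIN_LOCK& m_lock;
        KIRQL m_oldIrql;
    };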

If you look at the examples at https://github.com/microsoft/Network-Adapter-Class-Extension/tree/release_1903/rtl/inc you can see most objects have kernel-mode as well as user-mode implementations, which probably allows running the same logic in user mode too.

I see where you’re coming from regarding the difficulty of loving RETURN_IF_NTSTATUS_FAIL, and it is verbose and unusual.
However, there are certainly functions in my code where the entire function body is a dozen lines such as

status = Function(.....);
if (!NT_SUCCESS(status))
{
    Log(....., status);
    goto failed;
}

status = Function2(.....);
if (!NT_SUCCESS(status))
{
    Log(....., status);
    goto failed;
}

PVOID ptr = ExAlloc(....);
if (!ptr)
{
    status = STATUS_INSUFFICIENT_RESOURCES;
    Log(....., status);
    goto failed;
}

status = Function3(.....);
if (!NT_SUCCESS(status))
{
    Log(....., status);
    goto failed;
}

and then the failed label has a long list of

if (thing)
   Cleanup(thing);

if (thing2)
   ObRelease(thing2);

if (ptr)
   ExFreePool(ptr);

...

The use of similar macros, as well as RAII objects, could reduce the amount of code spent on error handling, leading to:

   RETURN_ON_ERROR(Function(...));
   RETURN_ON_ERROR(Function2(...));
   PVOID ptr = ExAlloc(....);
   RETURN_IF_NULL(ptr);
   RETURN_ON_ERROR(Function3(...));

Advantages would be:

  1. No cleanup / failed label. No leaked resources on error exit path.
  2. Code is short and you see only the business logic (IDE syntax highlighting helps); error handling is declarative and unified

Obviously this style is very much a matter of taste.
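
For the curious, one hedged guess at how such a macro might be written (illustrative only, not wil’s actual definition; Log is the same hypothetical routine as above):

    #define RETURN_ON_ERROR(expr)                         \
        do {                                              \
            const NTSTATUS status_ = (expr);              \
            if (!NT_SUCCESS(status_)) {                   \
                Log("%s failed: 0x%08X", #expr, status_); \
                return status_;                           \
            }                                             \
        } while (0)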

Yeah, that example is contrived. Here’s a real-world example from the OS: https://github.com/microsoft/Network-Adapter-Class-Extension/blob/848d4db6ed4a48756c7a84a6f97d59cf275b33f7/cx/sys/nxqueueapi.cpp#L47

The function doesn’t have any explicit cleanup code – everything happens “automatically”, because allocations that need to be cleaned up are stored in a wil::unique_wdf_object. So the error handling macros just say if (failed) return statuscode; instead of the C-style if (failed) goto Cleanup;. That netadapter code is equivalent to:

NTSTATUS NetTxQueueCreate()
{
    NTSTATUS Status;
    WDFOBJECT Object = NULL;

    if (STATUS_SUCCESS != (Status = WdfObjectCreate(. . ., &Object)))
        goto Cleanup;

    if (STATUS_SUCCESS != (Status = TxQueueInitialize(. . .)))
        goto Cleanup;

    if (STATUS_SUCCESS != (Status = OtherStuff(. . .)))
        goto Cleanup;

    InitContext.CreatedQueueObject = Object;
    Object = NULL;
    Status = STATUS_SUCCESS;

Cleanup:
    if (Object != NULL)
        WdfObjectDelete(Object);
    return Status;
}

My point is that deleting a WDF object isn’t just a problem invented by C++. Even if you write in C, you might still need to manually delete a WDF object in an error path. Yes, WDF will clean all things up eventually. But if you parent everything to your device object, then the cleanup won’t happen until the device is unloaded, which, for some types of devices, is indistinguishable from a permanent memory leak. (If you’re writing a driver for a simple USB gadget, maybe you don’t need to worry about error handling. But if you’re writing hardened OS components, you need to make sure that there’s not some error path that just accumulates an unbounded number of dead WDF objects.)

Switching to a different error handling idiom warps your brain a bit, and it won’t feel right at first. The only advice I can give there is that, if you’re going to try a new error handling style, commit to it. Don’t try to mix two error handling styles (goto-cleanup and C++ RAII, or whatever.)

One caveat. Since WDF forces you to use its cleanup idiom, you’re going to be forced to mix the WDF ownership model with the cleanup idiom you use for the rest of the code. (The example above demonstrates that: in the success case, the WDFOBJECT will get deleted by WDF when its parent is deleted. In a failure case, the WDFOBJECT is manually deleted by the goto-cleanup idiom. So there’s 2 very different ways this thing can get deleted.)

For a codebase that wants to primarily use C++ RAII, the same problem exists. I use something like this to bridge the gap:

    WDF_OBJECT_ATTRIBUTES attributes;
    WDF_OBJECT_ATTRIBUTES_INIT(&attributes); // initialize the struct before setting fields
    attributes.EvtDestroyCallback = [](WDFOBJECT obj) { MyGetContext(obj)->~MyContext(); };

which uses a gratuitous lambda, just to annoy people who aren’t already neck-deep in C++ :wink:
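
For completeness, a hedged sketch of what the full bridge might look like (MyContext and MyGetContext are the hypothetical names from the snippet above; kernel builds may need to supply their own placement operator new):

    struct MyContext
    {
        wil::unique_wdf_object Something; // members that need real construction/destruction
    };
    WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(MyContext, MyGetContext);

    NTSTATUS CreateDeviceWithContext(PWDFDEVICE_INIT deviceInit)
    {
        WDF_OBJECT_ATTRIBUTES attributes;
        WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attributes, MyContext);
        attributes.EvtDestroyCallback = [](WDFOBJECT obj) { MyGetContext(obj)->~MyContext(); };

        WDFDEVICE device;
        NTSTATUS status = WdfDeviceCreate(&deviceInit, &attributes, &device);
        if (NT_SUCCESS(status))
        {
            new (MyGetContext(device)) MyContext(); // run the C++ constructor in place
        }
        return status;
    }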

Thanks, Mr. Sirotnikov and Mr. Tippet. Much appreciated.

The real-world example was exactly what I needed. I was missing the (key) concept that, in this new paradigm, if you want the object to persist beyond the current scope, you explicitly transfer its ownership. Sort of an important concept, that. It’s effectively the reverse of the “old way”, which is: we assume the object will be retained when we leave the current scope, and if we want to release it we need to do so explicitly before we leave.
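
Roughly, the transfer looks like this (my sketch; DEVICE_CONTEXT and SomethingElse are hypothetical, and I’m assuming wil’s unique_wdf_common_buffer exposes release() like its other unique wrappers):

    struct DEVICE_CONTEXT            // hypothetical device context
    {
        WDFDMAENABLER   DmaEnabler;
        WDFCOMMONBUFFER CommonBuffer;
    };

    NTSTATUS SomethingElse();        // hypothetical

    NTSTATUS CreateAndStash(DEVICE_CONTEXT* ctx)
    {
        wil::unique_wdf_common_buffer buffer;
        RETURN_IF_NTSTATUS_FAILED(WdfCommonBufferCreate(
            ctx->DmaEnabler, 4096, WDF_NO_OBJECT_ATTRIBUTES, &buffer));

        RETURN_IF_NTSTATUS_FAILED(SomethingElse()); // fails? buffer is auto-deleted

        ctx->CommonBuffer = buffer.release();       // success: ownership moves to the context
        return STATUS_SUCCESS;
    }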

While I recognize this pattern could potentially reduce some leaks perhaps brought about by sloppy cut/paste, I sadly find myself struggling to get enthused enough about it to foist it on my colleagues.

But if you’re writing hardened OS components, you need to make sure that there’s not some error path that just accumulates an unbounded number of dead WDF objects.

Yes, yes… sure, obviously. You’re preaching to the converted here, Mr. Tippet. I didn’t say, or at least I didn’t intend to say, “parent everything on the WDFDEVICE and you’ll never have to worry about it.” As you said, “deleting a WDF object isn’t just a problem invented by C++” – but outside of WDF Objects that are designed to be “used and discarded” (WDFSTRING comes to mind), it’s reasonably rare to call WdfObjectDelete. If I can’t allocate my WDFCOMMONBUFFER or my WDFDMAENABLER, I’m going to fail to load my driver. I can’t believe any other driver is going to do anything different. If I can’t build a WDFDMATRANSACTION for a given Request, I’m going to fail the Request. That seems eminently reasonable to me.

Anyhow, it seems like this is only really “worth it” if you go the whole way and do:

   RETURN_ON_ERROR(Function(...));
   RETURN_ON_ERROR(Function2(...));
   PVOID ptr = ExAlloc(....);
   RETURN_IF_NULL(ptr);
   RETURN_ON_ERROR(Function3(...));

as Mr. Sirotnikov illustrated, and he was pleasant enough to note that “this style is very much a matter of taste” – and that is definitely not to my taste. When my eyes scan that code above quickly, all I see is a whole lot of “RETURN_ON_ERROR” AND “RETURN_IF_NULL” … with the actual WORK the code does buried in some arguments somewhere. Ah… this is why variety is the spice of life.

Thank you both for the help. It is now clear that I don’t need to concern myself with “wil” for the present time. Thus, I am now free to return to banging my fucking head against the wall trying to make clang-format (as packaged in VS 2019) actually do what I want it to do. Because, you know, that’s something easy to accomplish.

Peter

Personally I think the macros and the C++ objects and wrappers are orthogonal. You can still avoid much of the cleanup code and the error handling by using RAII alone, and I found that using OO concepts often helps work with better abstractions and keeps the code readable. The macros, I agree, are an issue. It’s not easy for me either, and I haven’t committed to it. At the same time I also don’t like the if (error) { … 4 lines of logging and cleanup } blocks of error handling that bloat short methods into monsters. So I’m willing to give the macros a shot.

blocks of error handling that bloat short methods into monsters

Hmmmm… but I would argue one wants to see all that code. It’s there, you’re responsible for making it work… might as well have it nice and clear and “in front of your face”, no??

Peter

When it has meaningful logic, sure. But plenty of times it’s just the routine of log, set error, and jump to cleanup, and it adds up. I guess maybe the golden path would be status = Func(…); if (status) LOG_AND_RETURN(…)

I see your point, I really do. And that’s a LOT better, in my view, than burying the call AND the error handling behind one big macro. But, I’ve gotta tell you, even this suggestion doesn’t make me happy. If I inherit this code from you, it leaves me with questions:

a) What status actually gets returned? (We can assume it’s “status” but… do we know that for sure?)

b) What flag and level is associated with this particular log output?

I am, perhaps, irretrievably old-school when it comes to such things.

If your point of view is that you want to minimize the intrusiveness of a lot of the boilerplate error handling, I can see that. But I’d maintain that this code, while certainly not particularly interesting code, is still code that we own, that we have to make work correctly, and that some poor bastard in the future has to maintain. I’m in favor of everything being nice, clear, and explicit. Heck, I’m the guy who puts parens in operations where the precedence order is “obvious” – I most often favor separating the declaration of a local variable from its initialization “for the sake of clarity.” Even the code that Mr. Tippet previously showed:

    if (STATUS_SUCCESS != (Status = WdfObjectCreate(. . ., &Object)))
        goto Cleanup;

makes me absolutely nuts. I have to think about what this code says. Why not just be explicit and clear:

    Status = WdfObjectCreate(. . ., &Object);

    if (Status != STATUS_SUCCESS) {
        goto Cleanup;
    }

Doesn’t that scan faster and take less brain-power to decipher?

(I’m ignoring the issue of whether we should be checking for STATUS_SUCCESS specifically here, I’m not trying to pick on the example)

I recently worked on the port of a Linux driver project, where everything in the original Linux code was a freakin’ macro… and macros within macros. It sure made the code attractive to look at, but it was a complete nightmare to understand what it was actually doing at a detail level.

Again, let me be quick to acknowledge that there are good engineers who think differently on this topic.

Peter

Even the code that Mr. Tippet previously showed […] makes me absolutely nuts.

I hate that particular style too. I had pulled some filthy tricks to squeeze the code into less vertical space, as I’m aware that my posts here already consume a lot of space. And as you know, newspapers charge by the column-centimeter or column-inch.

I’m ignoring the issue of whether we should be checking for STATUS_SUCCESS specifically here

I’ve gotten in that habit, actually. I’ve gradually come around to the conclusion that all the other “success” codes are weird special cases that should be immediately handled and not bubbled up through general code. E.g., what would it mean for a generic middleware routine to return STATUS_ALERTED or STATUS_TIMEOUT? In some of my code, I actually have an error-checking macro that expands out to an assertion that the status is either exactly STATUS_SUCCESS or NT_ERROR(x) != FALSE. I’d hate for a STATUS_PENDING to come flying out of some IRP handler and bubble up through layers that weren’t expecting to see it.
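
Something along these lines (a sketch, not the actual macro from my code):

    // Require an unambiguous status: exactly STATUS_SUCCESS, or a true error.
    // Keeps "success-with-information" codes like STATUS_PENDING or
    // STATUS_TIMEOUT from bubbling through generic layers unnoticed.
    #define ASSERT_UNAMBIGUOUS_STATUS(s) \
        NT_ASSERT((s) == STATUS_SUCCESS || NT_ERROR(s) != FALSE)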

@“Peter_Viscarola_(OSR)” said:

If I inherit this code from you, it leaves me with questions:

a) What status actually gets returned? (We can assume it’s “status” but… do we know that for sure?)

b) What flag and level is associated with this particular log output?

You’re right, of course. It should be clear from the name and parameters what is logged where, and what the return value is. I composed that example in haste just to make the point of how a more concise, declarative style could look. On a side note, I really appreciate the resulting discussion.

Peter_Viscarola_(OSR) wrote:

If I inherit this code from you, it leaves me with questions:

To be honest, I’m guessing that if almost ANY of us got hold of a bunch of code from almost any of the others, we’d gasp in horror. And yet, somehow, work gets done…

I have to think about what this code says. Why not just be explicit and clear

What about a compromise?

if ((Status = WdfObjectCreate(. . ., &Object)) != STATUS_SUCCESS) {
    goto Cleanup;
}

This is my way. I see the advantage of switching the left and right sides of the condition, but for me it is negligible, and the advantage of seeing the real action first prevails.

To be honest, I’m guessing that if almost ANY of us got hold of a bunch of code from almost any of the others, we’d gasp in horror.

My experience is different. I have to work with a lot of others’ code and yet usually don’t gasp in horror. Sometimes I do, but that is because the code is miserable. For example, I don’t like the standard MS WDK style but can still easily read it. Currently I’m working with RTOS sources which use their own rather unusual style and are still clean and readable. All that matters is consistency, and whether the developer was lazy or not. Using “clever” macros to make code shorter is a sign of laziness to me. TANSTAAFL: the price is code readability, and it is a bad price. The code should be readable like a book, descriptive enough to read without skipping back and forth trying to decode what the macros really do. Such code is longer and takes more time to write, but it pays off many times later.