Just found a memory leak. How should I have found it?

Another engineer brought a bug to my attention after flooding millions of broadcast packets at my KMDF NIC driver. His system was running out of memory and would eventually bluescreen. But if he unloaded my driver before bluescreening, all the memory came back and the system would continue to run fine.

I found the bug by systematically shutting off parts of my driver until the remaining few hundred lines of code could be hand analyzed. It turned out that a WdfWorkItem was being created and scheduled, but it wasn’t being deleted in the called subroutine.

The lucky part was that the RX routine wasn’t even supposed to schedule that work item. A flag was being set inadvertently, which caused the scheduling. If I hadn’t accidentally set that flag I might never have found the leak, because it’s a seldom-called routine and WDF cleans itself up so well that nothing ever complained about unfreed memory.

How do we find memory leaks like this when WDF just magically cleans everything up for us?

Since I don’t put tags on allocations from WDF, how could I have seen that I had millions of WdfWorkItems allocated in my driver?

There are many ways to find a gradual memory leak. One obvious way
is to tag; another is to watch the performance statistics.
And there are other tools and kernel debugger commands to look through…

But what do you mean by WDF just magically cleans everything??? What is
everything??? Every time I try to rely on magic, magically the magic
fails. I’ve got to be freaking unlucky :)

-pro

Tag all memory allocations and include pool monitoring in your test
matrix. And tagging memory means not using one tag for all
allocations, it means using one tag per allocation object type, for
some value of one and some value of allocation object type, such that
figuring out which of your 374 different allocation cases is actually
leaking is trivial.

Mark Roddy
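
To make the "one tag per allocation object type" advice concrete, here is a minimal user-mode sketch of per-tag accounting. It is only a model, not kernel code: `tagged_alloc`, `tagged_free`, `tag_live`, and the `TAG` macro are hypothetical names invented for this illustration, and in a real driver the tag would go to the kernel pool allocator and be read back with a pool monitor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack four characters into one 32-bit tag (hypothetical helper). */
#define TAG(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define MAX_TAGS 16

typedef struct {
    uint32_t tag;   /* which object type this row describes */
    size_t   live;  /* allocations not yet freed */
    size_t   bytes; /* bytes currently outstanding */
} TagStat;

/* Hidden header placed in front of every allocation. */
typedef union {
    struct { uint32_t tag; size_t size; } info;
    max_align_t align; /* keeps the caller's pointer suitably aligned */
} Header;

static TagStat g_stats[MAX_TAGS]; /* zero-initialized: every counter starts at 0 */
static size_t  g_ntags;

/* Find the row for a tag, creating it on first use. */
static TagStat *stat_for(uint32_t tag)
{
    for (size_t i = 0; i < g_ntags; ++i)
        if (g_stats[i].tag == tag)
            return &g_stats[i];
    assert(g_ntags < MAX_TAGS);
    g_stats[g_ntags].tag = tag;
    return &g_stats[g_ntags++];
}

static void *tagged_alloc(size_t size, uint32_t tag)
{
    Header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.tag = tag;
    h->info.size = size;
    TagStat *s = stat_for(tag);
    s->live++;
    s->bytes += size;
    return h + 1; /* caller sees the memory just past the header */
}

static void tagged_free(void *p)
{
    if (p == NULL)
        return;
    Header *h = (Header *)p - 1;
    TagStat *s = stat_for(h->info.tag);
    s->live--;
    s->bytes -= h->info.size;
    free(h);
}

/* Number of outstanding allocations carrying this tag. */
static size_t tag_live(uint32_t tag)
{
    return stat_for(tag)->live;
}
```

With a separate tag per object type, a count that only ever grows for one tag (say a hypothetical `TAG('W','k','I','t')` for work items) points straight at the leaking allocation site.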

Two things you can use:

1. !wdfkd.wdfhandle <WDFDEVICE handle> 0xf0
   This will dump the entire object tree under your WDFDEVICE. With such a huge leak, waiting for the output to finish might take a long time, but along the way you would realize you were leaking pretty fast.

2. KMDF will synthesize a pool tag value for you (unless you specify one at WdfDriverCreate time) based on the first 4 letters of your driver’s name (unless it starts with “wdf”, in which case we skip over the first 3 chars). KMDF will then use that tag for all objects you allocate. You can then use !poolused to view your pool usage.

d
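
The tag rule described in this reply can be sketched as a small user-mode function, which may help when guessing which tag to look for in !poolused output. This is only a model of the rule as stated in the post, not KMDF source: `synth_pool_tag` is a hypothetical name, and the case-insensitive "wdf" match and space padding for short names are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modeling the rule from the post: take the first four
 * letters of the driver name, skipping a leading "wdf". Writes a
 * NUL-terminated 4-character tag into `tag`.
 */
static void synth_pool_tag(const char *driver_name, char tag[5])
{
    assert(driver_name != NULL && tag != NULL);

    const char *p = driver_name;

    /* Skip a leading "wdf". Matching case-insensitively is an assumption. */
    if (tolower((unsigned char)p[0]) == 'w' &&
        tolower((unsigned char)p[1]) == 'd' &&
        tolower((unsigned char)p[2]) == 'f') {
        p += 3;
    }

    /* Copy up to four characters of what remains. */
    size_t i = 0;
    for (; i < 4 && p[i] != '\0'; ++i)
        tag[i] = p[i];

    /* Padding short names with spaces is an assumption, not from the post. */
    for (; i < 4; ++i)
        tag[i] = ' ';

    tag[4] = '\0';
}
```

Under this model a driver named "mynic.sys" would show up under the tag "myni", and one named "wdfsample" under "samp".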

Mark,

That is exactly what I should have done. I can’t tell you how many different web search variations I used trying to figure out how to dump the WDF allocation tree. I figured because of the magic cleanup it must be WDF, and I knew there must be a way of dumping the entire tree or individual types, but danged if I could find out how.

As for the pool tagging, do you mean to say that WDF will put all allocations under the same tag? I kept trying to figure out whether there was an object property I could fill in before the allocation to make WDF tag different types. I never looked for a global setting for the entire driver.

Clay

> But what do you mean by WDF just magically cleans everything??? What is
> everything??? Every time I try to rely on magic, magically the magic
> fails. I’ve got to be freaking unlucky :)
>
> -pro

If you write your driver with WDF, then every allocation (queues, memory, work items, strings, etc.) takes a parent (driver or device) as part of its creation parameters. WDF then keeps what I assume are linked lists from your root driver object down through your devices and ultimately to every object you’ve ever allocated (with WDF). When you delete any object, WDF also deletes all of the children under that object. You can manually delete the children before deleting the parent, but if you miss one, WDF will delete it for you. Which means that if you have a memory leak as I did, WDF will mask it by cleaning up your mess (my mess).
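
The masking effect described above can be sketched with a toy parent/child object tree in plain user-mode C. This is a model for illustration only, not how KMDF is implemented: `Obj`, `obj_create`, `obj_delete`, and `g_live` are hypothetical names, and the sibling-list layout is an assumption (the post itself only guesses at linked lists).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy framework object: each object knows its parent and its children. */
typedef struct Obj {
    struct Obj *parent;
    struct Obj *first_child;
    struct Obj *next_sibling;
    struct Obj *prev_sibling;
} Obj;

static size_t g_live; /* objects currently allocated, across the whole tree */

/* Create an object and link it at the head of the parent's child list. */
static Obj *obj_create(Obj *parent)
{
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL)
        return NULL;
    o->parent = parent;
    if (parent != NULL) {
        o->next_sibling = parent->first_child;
        if (parent->first_child != NULL)
            parent->first_child->prev_sibling = o;
        parent->first_child = o;
    }
    ++g_live;
    return o;
}

/* Delete an object; any children still attached are deleted first. */
static void obj_delete(Obj *o)
{
    assert(o != NULL);

    /* Each recursive call unlinks the child, so first_child advances. */
    while (o->first_child != NULL)
        obj_delete(o->first_child);

    /* Unlink this object from its parent's child list. */
    if (o->prev_sibling != NULL)
        o->prev_sibling->next_sibling = o->next_sibling;
    else if (o->parent != NULL)
        o->parent->first_child = o->next_sibling;
    if (o->next_sibling != NULL)
        o->next_sibling->prev_sibling = o->prev_sibling;

    --g_live;
    free(o);
}
```

If a receive path calls `obj_create(device)` per packet and never calls `obj_delete` on the result, `g_live` grows without bound while the device exists, yet deleting the device brings it back to zero, which matches the symptom in the original post: memory returns on unload and nothing reports a leak.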

Mark, I got your and Doron’s responses mixed up. What Doron said is what I was ultimately looking for: how do I get WDF to put pool tags on my allocations?

When I wrote WDM drivers, I definitely did what you suggest.

Hmmm… so if you have a bunch of objects that you know that your code
should manually clean up as they should have a short lifespan, could you
parent them to a dummy object and then at shutdown time list out
everything still attached to the dummy object?

James

KMDF does not expose the ability to walk the object hierarchy. As for the first part of the question, grouping objects under a dummy object (WDFOBJECT) and just deleting the dummy object to delete everything underneath: yes, that is a good practice. It keeps things simple and still lets you delete individual objects earlier if needed.

d

You cannot assign a tag per object that you allocate. You have one tag per driver. You can tag objects on your own with an object context, but that obviously does not tie into !poolused.

d

wrote in message news:xxxxx@ntdev…
> Another engineer brought a bug to my attention after flooding millions of
> broadcast packets at my KMDF nic driver. His system was running out of
> memory and would eventually bluescreen. But if he unloaded my driver
> before bluescreening, all the memory came back and the system would
> continue to run fine.
> …

This is not a memory leak. It is an object leak: a bug in managing the lifetime of objects that you create, or that other layers (WDF) create on your behalf. The WDF layer correctly frees everything it owns when it is torn down (and properly tags its allocations).

The tags only help to get the total memory usage; you need to create some instrumentation to track higher-level things. The common tactic to avoid uncontrolled build-up of objects (and the memory associated with them) is to divide and conquer. Define “activities” or “flows” that own the allocations created in their context. These “activities” should have a limited (or at least deterministic) lifetime. Next, give yourself some means to know how many of these things exist at any time, and why they exist. This helps to detect leaks: when memory usage goes up while the number of “activities” doesn’t, there is probably a leak. Moreover, if you still have some leaks, this pattern helps to localize the damage: a leak will live only as long as its parent object lives, and the latter should have a limited lifetime by design. So even when some leaks remain, they won’t become show stoppers.

For example, in your case of a network (NDIS?) driver, activities can be defined as the round trip of a received packet from its creation to its return to the pool, and the round trip of a Tx packet to completion of the send.

The bottom line is that a framework (such as WDF) can’t structure the “business logic” of the app, even if it manages lifetime for its own objects and instruments low-level allocations. This is the responsibility of the developer.


- pa
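
The "activities" bookkeeping suggested in this post could look roughly like the following user-mode sketch. All names here (`Accounting`, `activity_begin`, `looks_like_leak`, and so on) are hypothetical, and the leak test is only one possible heuristic, assuming each in-flight activity should own at most a known number of objects. In a real driver the counters would also need interlocked updates, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* The two activities suggested for a network driver. */
typedef enum { ACT_RX_PACKET, ACT_TX_PACKET, ACT_KIND_COUNT } ActivityKind;

typedef struct {
    size_t live[ACT_KIND_COUNT]; /* activities currently in flight, per kind */
    size_t owned_objects;        /* objects created on behalf of activities */
} Accounting;

static void activity_begin(Accounting *a, ActivityKind k)
{
    assert(a != NULL && k < ACT_KIND_COUNT);
    a->live[k]++;
}

static void activity_end(Accounting *a, ActivityKind k)
{
    assert(a != NULL && k < ACT_KIND_COUNT && a->live[k] > 0);
    a->live[k]--;
}

static void object_created(Accounting *a)
{
    assert(a != NULL);
    a->owned_objects++;
}

static void object_destroyed(Accounting *a)
{
    assert(a != NULL && a->owned_objects > 0);
    a->owned_objects--;
}

/*
 * Heuristic: more objects outstanding than the in-flight activities could
 * legitimately own means objects are outliving their activity.
 */
static int looks_like_leak(const Accounting *a, size_t max_objects_per_activity)
{
    size_t activities = 0;
    for (int k = 0; k < ACT_KIND_COUNT; ++k)
        activities += a->live[k];
    return a->owned_objects > activities * max_objects_per_activity;
}
```

Checking `looks_like_leak` periodically during a stress test (for example, the broadcast flood from the original post) would flag the work-item leak while the driver is still loaded, instead of relying on teardown to hide it.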