
ExAllocatePoolWithTag returns null

beforeyouknow Member Posts: 21
edited April 2022 in NTDEV

Sometimes ExAllocatePoolWithTag returns null and causes a blue screen (only on Windows 8), but I'm sure my computer has enough memory. Below is a screenshot from PoolMon; I don't know whether the data is normal.

  • My driver's entry in PoolMon (captured while the driver was loaded normally, without a blue screen): [png]

  • All drivers sorted by pool usage (same normal-load capture): [image]

  • Memory information from the crash dump:

3: kd> !vm
Page File: \??\C:\pagefile.sys
  Current:  15204352 Kb  Free Space:  14981688 Kb
  Minimum:  15204352 Kb  Maximum:     25165824 Kb
Page File: \??\C:\swapfile.sys
  Current:    262144 Kb  Free Space:    262136 Kb
  Minimum:    262144 Kb  Maximum:     12519760 Kb

Physical Memory:          2086627 (    8346508 Kb)
Available Pages:          1289719 (    5158876 Kb)
ResAvail Pages:           1846724 (    7386896 Kb)
Locked IO Pages:                0 (          0 Kb)
Free System PTEs:        33478485 (  133913940 Kb)
Modified Pages:             11024 (      44096 Kb)
Modified PF Pages:          10291 (      41164 Kb)
Modified No Write Pages:       92 (        368 Kb)
NonPagedPool Usage:          7474 (      29896 Kb)
NonPagedPoolNx Usage:       45003 (     180012 Kb)
NonPagedPool Max:         3989780 (   15959120 Kb)
PagedPool  0:               36872 (     147488 Kb)
PagedPool  1:               13400 (      53600 Kb)
PagedPool  2:                4244 (      16976 Kb)
PagedPool  3:                4137 (      16548 Kb)
PagedPool  4:                4213 (      16852 Kb)
PagedPool Usage:            62866 (     251464 Kb)
PagedPool Maximum:      100663296 (  402653184 Kb)
Processor Commit:             819 (       3276 Kb)
Session Commit:             15039 (      60156 Kb)
Syspart SharedCommit 0
Shared Commit:             317588 (    1270352 Kb)
Special Pool:                   0 (          0 Kb)
Kernel Stacks:              62272 (     249088 Kb)
Pages For MDLs:             11536 (      46144 Kb)
Pages For AWE:                  0 (          0 Kb)
NonPagedPool Commit:            0 (          0 Kb)
PagedPool Commit:           62904 (     251616 Kb)
Driver Commit:              14733 (      58932 Kb)
Boot Commit:                    0 (          0 Kb)
System PageTables:              0 (          0 Kb)
VAD/PageTable Bitmaps:       4798 (      19192 Kb)
ProcessLockedFilePages:         0 (          0 Kb)
Pagefile Hash Pages:            0 (          0 Kb)
Sum System Commit:         489689 (    1958756 Kb)
Total Private:             656494 (    2625976 Kb)
Misc/Transient Commit:      90741 (     362964 Kb)
Committed pages:          1236924 (    4947696 Kb)
Commit limit:             5887715 (   23550860 Kb)

I checked Ken_Johnson but couldn't come to a conclusion, because I use C++ STL containers (vector/map/set...), which may indeed cause a lot of memory fragmentation.


Even under normal circumstances (plenty of free memory), ExAllocatePoolWithTag may still return nullptr at some point. Maybe I should pre-allocate a large block (50 MB/100 MB/...) of non-paged memory and then "malloc" and "free" out of that big pool? Since I've already overloaded new/delete this is easy to do; the only thing needed is a proper algorithm to make sure allocation and deallocation work correctly.

Post edited by beforeyouknow on

Comments

  • beforeyouknow Member Posts: 21
    edited April 2022
    • Memory usage from the crash dump (tag ?ikm):
    3: kd> !poolused 1 
    ....
     Sorting by Tag
    
                                NonPaged                                         Paged
     Tag       Allocs       Frees      Diff         Used       Allocs       Frees      Diff         Used
    
     1NVD           0           0         0            0            1           1         0            0    UNKNOWN pooltag '1NVD', please update pooltag.txt
     2UuQ           4           0         4        16384            0           0         0            0    UNKNOWN pooltag '2UuQ', please update pooltag.txt
     ?ikm   255346783   255245471    101312     26744640            0           0         0            0    UNKNOWN pooltag '?ikm', please update pooltag.txt
     ?zyx         480         450        30          960            0           0         0            0    UNKNOWN pooltag '?zyx', please update pooltag.txt
     ACHA      142556      142452       104        18304            0           0         0            0    UNKNOWN pooltag 'ACHA', please update pooltag.txt
     ACPI           4           4         0            0            0           0         0            0    UNKNOWN pooltag 'ACPI', please update pooltag.txt
     AFGp           1           1         0            0            0           0         0            0    UNKNOWN pooltag 'AFGp', please update pooltag.txt
     ALPC      398139      396773      1366       769504            0           0         0            0    ALPC port objects , Binary: nt!alpc
    
  • Mark_Roddy Member - All Emails Posts: 4,628
    edited April 2022

    What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
    Regardless, you need to handle allocation failures rather than just crashing.
    Also use a 4 character tag.
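[Editor's sketch] The "handle allocation failures" point can be illustrated in user-mode C++: the backing allocator can return null, and an overloaded operator new or factory has to surface that to the caller instead of crashing. `Record`, `TryCreate`, and `Destroy` are made-up illustrative names; `malloc`/`free` stand in for ExAllocatePoolWithTag/ExFreePoolWithTag:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Illustrative type; in the real driver this would be the tracked process info.
struct Record {
    int pid;
    explicit Record(int p) : pid(p) {}
};

// Returns nullptr instead of throwing when allocation fails, the way
// ExAllocatePoolWithTag does in kernel mode.
Record* TryCreate(int pid) {
    void* mem = std::malloc(sizeof(Record));  // stand-in for ExAllocatePoolWithTag
    if (!mem) {
        return nullptr;  // caller must handle this, e.g. fail with STATUS_INSUFFICIENT_RESOURCES
    }
    return new (mem) Record(pid);  // placement-new on the raw block
}

void Destroy(Record* r) {
    if (!r) return;
    r->~Record();
    std::free(r);  // stand-in for ExFreePoolWithTag
}
```

The point is only that every call site checks for null; the allocator itself is incidental.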

  • Tim_Roberts Member - All Emails Posts: 14,563

    You haven't literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Tim_Roberts said:
    You haven't literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

    I save their information (process name/path/id/etc.) from the process/thread/module callbacks; you can think of it as an anti-virus security driver. It uses std::make_shared (non-paged) a lot. Yes, they are only freed when the driver is unloaded (I've checked that all "new" memory is "delete"d at unload), but there shouldn't be 4 billion allocations and frees at runtime, which is weird.


    • allocs: 4283973658
    • frees: 4283782403
    • diff: 191255
      so diff = allocs - frees

    Can I read the diff field like this: the current number of unfreed allocations is 191255?


    It's just that the number of allocations and frees above is far too high. In that case, even when there is enough memory, does ExAllocatePoolWithTag still have a chance of returning 0? Is that right?

  • beforeyouknow Member Posts: 21

    @Mark_Roddy said:
    What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
    Regardless, you need to handle allocation failures rather than just crashing.
    Also use a 4 character tag.

    Thanks for your suggestion. I looked at ExInitializeLookasideListEx and found that it initializes a list of fixed-size entries. In that case, since the C++ objects/structs created via new have different sizes, it seems I need to initialize lookaside lists for the different sizes.

  • Mark_Roddy Member - All Emails Posts: 4,628

    @beforeyouknow said:
    in such a case the c++ objects/structs via new will have different sizes, then it seems
    that I need to initialize the lookaside list pointers of different sizes

    Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.
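[Editor's sketch] The "template it once" idea can be outlined in user-mode C++: one templated cache, instantiated per type, plays the role a per-type lookaside list plays in the kernel. The names and the cache depth are illustrative, not WDK APIs, and `malloc`/`free` stand in for the real pool allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// One instantiation per allocated type; each type gets its own small cache
// of previously freed blocks, all of the same size (sizeof(T)).
template <typename T, std::size_t Depth = 64>
class TypedCache {
    void* free_[Depth];    // cached blocks available for reuse
    std::size_t count_ = 0;
public:
    ~TypedCache() {
        while (count_) std::free(free_[--count_]);
    }
    void* Alloc() {
        if (count_) return free_[--count_];  // fast path: reuse a cached block
        return std::malloc(sizeof(T));       // slow path: hit the real allocator
    }
    void Free(void* p) {
        if (count_ < Depth) free_[count_++] = p;  // keep it for the next Alloc
        else std::free(p);                        // cache full: give it back
    }
};
```

A kernel version would wrap ExAllocateFromLookasideListEx/ExFreeToLookasideListEx instead of maintaining its own array, but the per-type template structure is the same.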

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Mark_Roddy said:
    Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.

    I will try this.


    I looked at lookaside lists again, and it seems ExAllocateFromLookasideListEx can also return null, so in fact I would still need to reserve a large memory pool in advance to handle the empty case.

  • Doron_Holan Member - All Emails Posts: 10,756
    You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.
    d
  • beforeyouknow Member Posts: 21

    @Doron_Holan said:
    You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.

    Yes, now it seems I have to handle the null case anyway.

  • Mark_Roddy Member - All Emails Posts: 4,628

    Yes, now it seems I have to handle the null case anyway.

    yes always.

    But the lookaside lists will mitigate fragmentation and you don't have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    then consider replacing or extending it with your own version

    This.

    Those who've been here a while: Insert my standard grumble here, please.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Mark_Roddy said:
    But the lookaside lists will mitigate fragmentation and you don't have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

    I tried allocating a large chunk of memory ahead of time and using a spinlock to synchronize access to it; that gave me very high CPU usage and then the system would crash.

    static volatile LONG reslock = 0;
    typedef volatile LONG MY_LOCK;

    void mySpin_Lock(MY_LOCK* lock) {
        // test-and-set, then spin on a plain read until the lock looks free
        while (_InterlockedExchange((volatile LONG*)lock, 1) != 0) {
            while (*lock) {
                _mm_pause();
            }
        }
    }

    void mySpinUnlock(MY_LOCK* lock) {
        *lock = 0;  // plain volatile store: no release barrier, unlike the kernel's spinlock APIs
    }

    void* stlnew(unsigned size) {
        mySpin_Lock(&reslock);
        void* mem = myMemoryPoolnew(size);  // my own big-pool allocator
        mySpinUnlock(&reslock);
        return mem;
    }
    
    

    If I use lookaside lists, the best version I can think of is to manage lookaside lists for blocks of different sizes, e.g. 8/16/32/64/128/512/1024/2048/4096 bytes; larger requests would be handled manually with ExAllocatePoolWithTag, with checks for the case where 0 is returned.
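[Editor's sketch] The size-class routing described above might look like this in outline: round each request up to a power-of-two bucket (8 through 4096 bytes here, purely illustrative) and use the bucket index to pick a lookaside list, falling back to the general allocator for oversized requests:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMinShift = 3;   // smallest class: 8 bytes
constexpr std::size_t kMaxShift = 12;  // largest class: 4096 bytes

// Returns the size-class index (0..9) a request of `size` bytes maps to,
// or -1 if it exceeds the largest class and must go straight to the
// general allocator (ExAllocatePoolWithTag in the kernel version).
int BucketFor(std::size_t size) {
    if (size == 0) size = 1;
    for (std::size_t s = kMinShift; s <= kMaxShift; ++s) {
        if (size <= (std::size_t(1) << s)) return int(s - kMinShift);
    }
    return -1;  // too big for any per-size lookaside list
}
```

An allocator front end would then keep one lookaside list per bucket index and route each request through `BucketFor`, still checking every path for a null return.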

  • beforeyouknow Member Posts: 21
    edited April 2022

    I wrapped ExAllocatePoolWithTag in the same spinlock and it didn't crash, so maybe the problem is in my own big-memory-pool management.

    void* stlnew(unsigned size) {
        mySpin_Lock(&reslock);
        void* mem = ExAllocatePoolWithTag(NonPagedPool, size, KERNEL_STL_POOL_TAG);
        mySpinUnlock(&reslock);
        return mem;
    }
    
    
  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    Why in the name of heaven would you even attempt to write your own spin lock code??

    peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21

    @Peter_Viscarola_(OSR) said:
    Why in the name of heaven would you even attempt to write your own spin lock code??

    peter

    since I got huge CPU spikes and crashes using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    since I got huge cpu boosts and crashes using KeAcquireInStackQueuedSpinLock

    With all due respect, if KeAcquireInStackQueuedSpinLock -- when used correctly -- were unstable or responsible for any bad behavior whatsoever, then the entire operating system would be in big trouble. It is widely used internally in Windows.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • MBond2 Member Posts: 564

    Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in-box implementation is going to be at least as good as anything that you can do. And probably better than what you will do.
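[Editor's sketch] For illustration only, here is what a compare-exchange acquire path looks like in user-mode C++ with std::atomic; kernel code should of course use the KeAcquire* routines rather than rolling its own:

```cpp
#include <atomic>
#include <cassert>

// Test-and-set lock built on compare-exchange, with an unlocked-state
// spin before each acquisition attempt (test-and-test-and-set).
class TasLock {
    std::atomic<int> state_{0};  // 0 = free, 1 = held
public:
    void Lock() {
        for (;;) {
            int expected = 0;
            // Attempt to move 0 -> 1; acquire ordering on success.
            if (state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
            // Spin on plain loads until the lock looks free again,
            // so we don't hammer the cache line with writes.
            while (state_.load(std::memory_order_relaxed) != 0) { /* pause */ }
        }
    }
    bool TryLock() {
        int expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire);
    }
    void Unlock() {
        // Release store, not a plain write: later readers must see
        // everything written inside the critical section.
        state_.store(0, std::memory_order_release);
    }
};
```

Note the explicit release on unlock, which the hand-rolled `*lock = 0;` earlier in the thread lacks.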

  • Phil_Barila Member - All Emails Posts: 165

    @beforeyouknow said:
    since I got huge CPU spikes and crashes using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

    Did you try to cheat it and pass the PKLOCK_QUEUE_HANDLE as a pointer to a block of memory, instead of the address of a local on the stack? You can't do that, you have to declare it as a local:

    KLOCK_QUEUE_HANDLE handle;
    ...
    KeAcquireInStackQueuedSpinLock(lock, &handle);
    

    At least, I had to do that 15 years ago ... Haven't been playing in the Windows kernel in a while, but we got the behavior you described when we tried using memory that was not in the stack.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    @Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21

    @MBond2 said:
    Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in box implementation is going to at least as good as anything that you can do. And probably better than what you will do

    Yes, I used KeAcquireInStackQueuedSpinLock just to verify whether I had been using it incorrectly, and finally found it was just a management problem in my large memory pool.

  • Phil_Barila Member - All Emails Posts: 165

    @Peter_Viscarola_(OSR) said:
    @Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

    Peter

    Aww, thanks! I like what I'm doing, but sometimes I miss working in KM. Returning to it sometime in the future is not out of the question.

  • Phil_Barila Member - All Emails Posts: 165

    @beforeyouknow said:
    Yes, I use KLOCK_QUEUE_HANDLE handle; as a local variable. I checked the official documentation as well as [microsoft/Windows-driver-samples/network/trans/inspect/sys/inspect.c#L453](https://github.com/microsoft/Windows-driver-samples/blob/9e1a643093cac60cd333b6d69abc1e4118a12d63/network/trans/inspect/sys/inspect.c#L453); they all keep the KLOCK_QUEUE_HANDLE on the stack. Unless Microsoft is deliberately playing hide-and-seek with us, it should be fine as a temporary variable on the stack. But KSPIN_LOCK seems to be required to be statically declared as a global variable.

    I wonder if you deleted that, since it's no longer visible here? Glad to hear that your issues were not with how you were declaring the handle.

    You don't need to (and probably don't want to) make the KSPIN_LOCK a static global. It should live in the scope closest to the need, but be visible widely enough to cover all accesses. I prefer to "hide" such locking/unlocking inside accessor methods that lock/access/unlock on behalf of the caller.

    Way back in the day I wrote a memory pool that "extended" the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can't remember, but I probably used an in-stack queued spinlock to guard access.
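[Editor's sketch] Phil's scheme can be outlined in user-mode C++ as a stash with low/high watermarks; `malloc`/`free` stand in for the backing lookaside list, and the watermark values and names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Keeps a local stash of same-size blocks: refills from the backing
// allocator when it runs low, and releases back when it gets too full.
class WatermarkPool {
    std::vector<void*> stash_;
    std::size_t blockSize_, low_, high_;
public:
    WatermarkPool(std::size_t blockSize, std::size_t low, std::size_t high)
        : blockSize_(blockSize), low_(low), high_(high) {
        Refill();  // pre-allocate up to the high watermark at startup
    }
    ~WatermarkPool() {
        for (void* p : stash_) std::free(p);
    }
    void Refill() {  // top the stash back up to the high watermark
        while (stash_.size() < high_) {
            void* p = std::malloc(blockSize_);  // stand-in for the lookaside
            if (!p) break;                      // the backing allocator can fail too
            stash_.push_back(p);
        }
    }
    void* Alloc() {
        if (stash_.size() <= low_) Refill();  // running low: ask for more
        if (stash_.empty()) return nullptr;   // still empty: propagate failure
        void* p = stash_.back();
        stash_.pop_back();
        return p;
    }
    void Free(void* p) {
        if (stash_.size() >= high_) std::free(p);  // close to overflowing: release
        else stash_.push_back(p);
    }
    std::size_t Size() const { return stash_.size(); }
};
```

A kernel version would guard `Alloc`/`Free` with a spinlock (or an in-stack queued spinlock, as Phil suggests) and use the lookaside routines for refill and release.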

  • rdmsr Member Posts: 5

    @Phil_Barila said:
    Way back in the day I wrote a memory pool that "extended" the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can't remember, but I probably used an in-stack queued spinlock to guard access.

    I've found that guarding large memory pools with spinlocks inevitably leads to CPU spikes.

  • beforeyouknow Member Posts: 21

    @rdmsr said:
    I've found that guarding large memory pools with spinlocks inevitably leads to CPU spikes.

    That's what spinlocks do. I'm currently trying to solve this with a lookaside list first; thanks, everyone.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    The question, really, is whether the contested acquisition cost matters enough in your application for you to try to fix it. Usually, it does not. Premature optimization is a hallmark of poor engineering.

    It all depends on what you’re allocating, how frequent you expect the allocations to be, and whether you value rapid uncontested or contested lock acquisition. There is no lock that is “free.” Also important is how the memory you’re allocating is used, of course. Before the cost of “spinning” you might want to worry about allocating memory that’s “near” in the NUMA sense.

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member MODERATED Posts: 5,281

    Why in the name of heaven would you even attempt to write your own spin lock code??

    Well, according to Mr. Kyler, what the OP does here is "the only proper spinlock implementation in existence", namely, a tight polling loop of interlocked operations...

    On a serious note, there are some (admittedly rare) situations when a custom spinlock implementation may indeed be beneficial. For example, consider a scenario where you have multiple queues (say, holding work items that have to be processed), each protected by a spinlock, and your goal is to ensure that all these queues get emptied as quickly as possible. Furthermore, these queues happen to be accessed in a code path that gets frequently executed by all CPUs in the system, so contention for the locks may be high.

    In such a case, "optimised" spinlock versions like in-stack queued locks or ticket locks are going to be, in actuality, suboptimal. Why? Because these locks oblige you, by the very definition of queued locks, to keep spinning until the target lock is acquired, without giving you a chance to yield and do something else instead. However, in a situation like that you may want, instead of waiting for the lock on queue A to become available, to check whether you can acquire the lock on queue B, C, D or E, so that you can process that queue straight away. Otherwise, you may be "surprised" to discover that, by the time you have acquired the target lock, the particular queue it guards is already empty, because all the items in it have already been processed by the previous lock owners. This situation is going to repeat itself with one lock after another. Certainly, it is not necessarily the same CPU that gets unlucky every time, but the proportion of time spent idle-spinning by all CPUs in the system as a whole will unquestionably grow significantly, which means you will be unable to utilise all the processing resources optimally.

    Therefore, in such a case it would be better to use "classical" spinlocks based upon test-and-set. Instead of spinning in an outer loop until the spinlock for a given queue gets released, it would make more sense to go and check whether the lock for some other queue is available, which, in turn, implies that you would be better off with a custom spinlock implementation.
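[Editor's sketch] The multi-queue idea above, sketched in user-mode C++ with try-acquire semantics (single-threaded and purely illustrative; names are made up):

```cpp
#include <atomic>
#include <cassert>
#include <deque>

// A work queue guarded by a simple test-and-set lock.
struct Queue {
    std::atomic<int> lock{0};  // 0 = free, 1 = held
    std::deque<int> items;
    bool TryLock() { int e = 0; return lock.compare_exchange_strong(e, 1); }
    void Unlock()  { lock.store(0, std::memory_order_release); }
};

// Instead of spinning on one queue's lock, try each lock in turn and
// drain the first queue we manage to acquire. Returns the number of
// items processed, or -1 if every lock was held by someone else.
int DrainAnyQueue(Queue* queues, int n) {
    for (int i = 0; i < n; ++i) {
        if (queues[i].TryLock()) {
            int processed = int(queues[i].items.size());
            queues[i].items.clear();  // "process" the items
            queues[i].Unlock();
            return processed;
        }
    }
    return -1;  // all busy: the caller can yield or retry rather than idle-spin
}
```

This is exactly the pattern a queued lock forbids: once you join its queue you are committed to that one lock, whereas a test-and-set lock lets you abandon the attempt and move on.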

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    You’ve been quiet for ages, and your first post back… is a necropost?

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member MODERATED Posts: 5,281

    ou’ve been quiet for ages, and your first post back… is a necropost?

    Sorry, it was just an accident. The thing is, I had just received an "I'm crying from laughing so hard" message from one of my old NTDEV contacts, with a screenshot of the OP's "custom implementation of a spinlock" and a link to this thread. I could not resist making a post... and only then looked at the dates. Sorry for that.

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    Therefore, I could not resist and made a post

    Well, I can certainly understand that. :-)

    Peter Viscarola
    OSR
    @OSRDrivers
