ExAllocatePoolWithTag returns NULL

  • Memory usage shown in the crash dump (tag '?ikm'):
3: kd> !poolused 1 
....
 Sorting by Tag

                            NonPaged                                         Paged
 Tag       Allocs       Frees      Diff         Used       Allocs       Frees      Diff         Used

 1NVD           0           0         0            0            1           1         0            0	UNKNOWN pooltag '1NVD', please update pooltag.txt
 2UuQ           4           0         4        16384            0           0         0            0	UNKNOWN pooltag '2UuQ', please update pooltag.txt
 ?ikm   255346783   255245471    101312     26744640            0           0         0            0	UNKNOWN pooltag '?ikm', please update pooltag.txt
 ?zyx         480         450        30          960            0           0         0            0	UNKNOWN pooltag '?zyx', please update pooltag.txt
 ACHA      142556      142452       104        18304            0           0         0            0	UNKNOWN pooltag 'ACHA', please update pooltag.txt
 ACPI           4           4         0            0            0           0         0            0	UNKNOWN pooltag 'ACPI', please update pooltag.txt
 AFGp           1           1         0            0            0           0         0            0	UNKNOWN pooltag 'AFGp', please update pooltag.txt
 ALPC      398139      396773      1366       769504            0           0         0            0	ALPC port objects , Binary: nt!alpc

What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
Regardless, you need to handle allocation failures rather than just crashing.
Also use a 4 character tag.

You haven’t literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

@Tim_Roberts said:
You haven’t literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

I save their information (process name/path/ID/etc.) in the process/thread/module callbacks; you can think of it as an anti-virus security driver. It uses std::make_shared (non-paged) a lot. Yes, the objects are only freed when the driver is unloaded (I've checked that every "new" is matched by a "delete" at unload), but there shouldn't be anywhere near 4 billion allocations and frees at runtime, which is weird.


  • allocs: 4283973658
  • frees: 4283782403
  • diff: 191255
    so diff = allocs - frees = 4283973658 - 4283782403 = 191255

Can I understand the diff field like this: the current number of outstanding (unfreed) allocations is 191255?


Or is it just that the number of allocations and frees shown above is too high? In that case, even if there is enough memory overall, there is still a chance that ExAllocatePoolWithTag returns NULL. Is that right?

@Mark_Roddy said:
What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
Regardless, you need to handle allocation failures rather than just crashing.
Also use a 4 character tag.

Thanks for your suggestion. I checked ExInitializeLookasideListEx and found that it initializes a list of fixed-size entries. In that case, since the C++ objects/structs allocated via new have different sizes, it seems that I need to initialize separate lookaside lists for the different sizes.

@beforeyouknow said:
since the C++ objects/structs allocated via new have different sizes, it seems
that I need to initialize separate lookaside lists for the different sizes

Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.
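
For illustration, a minimal sketch of that idea, assuming the driver is built as kernel C++ against the WDK; the TypedLookaside name and the 'kSTL' tag are placeholders, not anything from this thread:

#include <ntddk.h>

// One LOOKASIDE_LIST_EX per type T, sized with sizeof(T), written once as a template.
template <typename T>
class TypedLookaside {
    LOOKASIDE_LIST_EX m_list;
public:
    NTSTATUS Init() {
        // Depth is reserved and must be zero; every entry is sizeof(T) bytes.
        return ExInitializeLookasideListEx(&m_list, nullptr, nullptr,
                                           NonPagedPoolNx, 0,
                                           sizeof(T), 'kSTL', 0);
    }
    void Destroy() { ExDeleteLookasideListEx(&m_list); }

    T* Allocate() {
        // Can still return nullptr under memory pressure; the caller must check.
        return static_cast<T*>(ExAllocateFromLookasideListEx(&m_list));
    }
    void Free(T* p) {
        if (p != nullptr) {
            ExFreeToLookasideListEx(&m_list, p);
        }
    }
};

Each class that gets allocated would then own one TypedLookaside<T> instance, which is the "its own allocator" part of the suggestion.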

@Mark_Roddy said:

@beforeyouknow said:
since the C++ objects/structs allocated via new have different sizes, it seems
that I need to initialize separate lookaside lists for the different sizes

Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.

I will try this.


I checked the lookaside list again, and it seems that ExAllocateFromLookasideListEx can also return NULL. So in fact I would still need to allocate a large memory pool in advance to make sure I can handle the out-of-memory case.

You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.
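
As a concrete illustration of that contract, here is a minimal sketch; PROCESS_RECORD and CreateProcessRecord are made-up names for this example, and only the KERNEL_STL_POOL_TAG name is taken from code later in the thread:

#include <ntddk.h>

#define KERNEL_STL_POOL_TAG 'ltsK'   // illustrative 4-character tag value

typedef struct _PROCESS_RECORD {
    HANDLE ProcessId;
    // ... whatever else the process-notify callback captures ...
} PROCESS_RECORD;

NTSTATUS CreateProcessRecord(HANDLE ProcessId, PROCESS_RECORD** Record)
{
    PROCESS_RECORD* rec = (PROCESS_RECORD*)ExAllocatePoolWithTag(
        NonPagedPoolNx, sizeof(*rec), KERNEL_STL_POOL_TAG);

    if (rec == NULL) {
        // The allocator is allowed to fail: propagate the failure
        // instead of dereferencing a NULL pointer and crashing.
        *Record = NULL;
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    RtlZeroMemory(rec, sizeof(*rec));
    rec->ProcessId = ProcessId;
    *Record = rec;
    return STATUS_SUCCESS;
}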

@Doron_Holan said:
You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.

Yes, now it seems I have to handle the null case anyway.

Yes, now it seems I have to handle the null case anyway.

Yes, always.

But the lookaside lists will mitigate fragmentation and you don’t have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

then consider replacing or extending it with your own version

This.

Those who’ve been here a while: Insert my standard grumble here, please.

Peter

@Mark_Roddy said:

Yes, now it seems I have to handle the null case anyway.

Yes, always.

But the lookaside lists will mitigate fragmentation and you don’t have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

I tried allocating a large chunk of memory ahead of time and then using a spin lock to synchronize access to it, but that gave me very high CPU usage and then the system would crash.

static volatile LONG reslock = 0;
typedef volatile LONG MY_LOCK;
void mySpin_Lock(MY_LOCK* lock) {
    while (_InterlockedExchange((volatile LONG*)lock, 1) != 0) {
        while (*lock) {
            _mm_pause();
        }
    }
}

void mySpinUnlock(MY_LOCK* lock) {
    *lock = 0;
}

void* stlnew(unsigned size) {
    mySpin_Lock(&reslock);
    void* mem = myMemoryPoolnew(size);
    mySpinUnlock(&reslock);
    return mem;
}


If I use lookaside lists, the best version I can think of is to manage a set of lookaside lists for different block sizes, e.g. 8/16/32/64/128/512/1024/1096/4096 bytes; anything bigger would be handled manually with ExAllocatePoolWithTag and checked for the case where NULL is returned.
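
A minimal sketch of that routing, with illustrative size classes and names (the lookaside lists would be set up elsewhere with ExInitializeLookasideListEx, and the free path would also need to remember which list a block came from):

enum { SIZE_CLASS_COUNT = 9 };

// Illustrative size classes; requests larger than the last class fall through to the pool.
static const SIZE_T g_sizeClasses[SIZE_CLASS_COUNT] = { 8, 16, 32, 64, 128, 512, 1024, 2048, 4096 };
static LOOKASIDE_LIST_EX g_lookasides[SIZE_CLASS_COUNT];

void* stlnew(unsigned size)
{
    for (ULONG i = 0; i < SIZE_CLASS_COUNT; i++) {
        if (size <= g_sizeClasses[i]) {
            // May still return NULL under memory pressure; caller must check.
            return ExAllocateFromLookasideListEx(&g_lookasides[i]);
        }
    }
    // Oversized requests go straight to the pool, also checked for NULL by the caller.
    return ExAllocatePoolWithTag(NonPagedPoolNx, size, KERNEL_STL_POOL_TAG);
}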

I put the same spin lock around ExAllocatePoolWithTag and that didn't crash, so maybe the problem is in my own large memory pool management:

void* stlnew(unsigned size) {
    mySpin_Lock(&reslock);
    void* mem = ExAllocatePoolWithTag(NonPagedPool, size, KERNEL_STL_POOL_TAG);
    mySpinUnlock(&reslock);
    return mem;
}

Why in the name of heaven would you even attempt to write your own spin lock code??

peter

@“Peter_Viscarola_(OSR)” said:
Why in the name of heaven would you even attempt to write your own spin lock code??

peter

Since I got huge CPU spikes and crashes when using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

Since I got huge CPU spikes and crashes when using KeAcquireInStackQueuedSpinLock

With all due respect, if KeAcquireInStackQueuedSpinLock – when used correctly – was unstable or responsible for any bad behavior whatsoever, then the entire operating system would be in big trouble. It is widely used internally in Windows.

Peter

Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in-box implementation is going to be at least as good as anything that you can do, and probably better than what you will do.
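
For reference only, a sketch of what a compare-exchange acquire path looks like, applied to the mySpin_Lock/mySpinUnlock routines posted above. Like the original, it does not raise IRQL and is not something to ship; it only illustrates the point:

void mySpin_Lock(volatile LONG* lock)
{
    for (;;) {
        // Only the 0 -> 1 transition takes the lock.
        if (InterlockedCompareExchange(lock, 1, 0) == 0) {
            return;
        }
        // Spin read-only until the lock looks free, then retry the compare-exchange.
        while (*lock != 0) {
            YieldProcessor();
        }
    }
}

void mySpinUnlock(volatile LONG* lock)
{
    // Release with an interlocked store so the unlock acts as a full barrier,
    // instead of the plain '*lock = 0' in the original.
    InterlockedExchange(lock, 0);
}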

@beforeyouknow said:

@“Peter_Viscarola_(OSR)” said:
Why in the name of heaven would you even attempt to write your own spin lock code??

peter

Since I got huge CPU spikes and crashes when using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

Did you try to cheat it and pass the PKLOCK_QUEUE_HANDLE as a pointer to a block of memory, instead of the address of a local on the stack? You can’t do that, you have to declare it as a local:

KLOCK_QUEUE_HANDLE handle;
...
KeAcquireInStackQueuedSpinLock(lock, &handle);

At least, I had to do that 15 years ago … I haven't been playing in the Windows kernel in a while, but we got the behavior you described when we tried using memory that was not on the stack.
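
For completeness, a minimal sketch of the documented pattern, with the handle as a stack local and the matching release; the lock and the list it protects are made up for this example:

static KSPIN_LOCK g_poolLock;     // KeInitializeSpinLock(&g_poolLock) at init time
static LIST_ENTRY g_freeList;     // whatever the lock actually protects

PLIST_ENTRY RemoveFirstFreeBlock(void)
{
    KLOCK_QUEUE_HANDLE handle;    // must be a local on the calling thread's stack
    PLIST_ENTRY entry;

    KeAcquireInStackQueuedSpinLock(&g_poolLock, &handle);
    entry = IsListEmpty(&g_freeList) ? NULL : RemoveHeadList(&g_freeList);
    KeReleaseInStackQueuedSpinLock(&handle);   // release takes only the handle

    return entry;
}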

@Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

Peter

@MBond2 said:
Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in-box implementation is going to be at least as good as anything that you can do, and probably better than what you will do.

Yes, I used KeAcquireInStackQueuedSpinLock just to verify whether I was using it incorrectly, and I finally found that it was just a management problem in my large memory pool.