
ExAllocatePoolWithTag returns null

beforeyouknow Member Posts: 21
edited April 2022 in NTDEV

Sometimes ExAllocatePoolWithTag returns null and causes a blue screen (only on Windows 8), but I'm sure my computer has enough memory. Below is a screenshot from PoolMon; I don't know whether the data is normal.

  • My driver's entry in PoolMon (captured while the driver was loaded normally, without a blue screen): [png]

  • All drivers sorted by pool usage (same normal-load capture): [image]

  • Memory information from the crash dump:

3: kd> !vm
Page File: \??\C:\pagefile.sys
  Current:  15204352 Kb  Free Space:  14981688 Kb
  Minimum:  15204352 Kb  Maximum:     25165824 Kb
Page File: \??\C:\swapfile.sys
  Current:    262144 Kb  Free Space:    262136 Kb
  Minimum:    262144 Kb  Maximum:     12519760 Kb

Physical Memory:          2086627 (    8346508 Kb)
Available Pages:          1289719 (    5158876 Kb)
ResAvail Pages:           1846724 (    7386896 Kb)
Locked IO Pages:                0 (          0 Kb)
Free System PTEs:        33478485 (  133913940 Kb)
Modified Pages:             11024 (      44096 Kb)
Modified PF Pages:          10291 (      41164 Kb)
Modified No Write Pages:       92 (        368 Kb)
NonPagedPool Usage:          7474 (      29896 Kb)
NonPagedPoolNx Usage:       45003 (     180012 Kb)
NonPagedPool Max:         3989780 (   15959120 Kb)
PagedPool  0:               36872 (     147488 Kb)
PagedPool  1:               13400 (      53600 Kb)
PagedPool  2:                4244 (      16976 Kb)
PagedPool  3:                4137 (      16548 Kb)
PagedPool  4:                4213 (      16852 Kb)
PagedPool Usage:            62866 (     251464 Kb)
PagedPool Maximum:      100663296 (  402653184 Kb)
Processor Commit:             819 (       3276 Kb)
Session Commit:             15039 (      60156 Kb)
Syspart SharedCommit 0
Shared Commit:             317588 (    1270352 Kb)
Special Pool:                   0 (          0 Kb)
Kernel Stacks:              62272 (     249088 Kb)
Pages For MDLs:             11536 (      46144 Kb)
Pages For AWE:                  0 (          0 Kb)
NonPagedPool Commit:            0 (          0 Kb)
PagedPool Commit:           62904 (     251616 Kb)
Driver Commit:              14733 (      58932 Kb)
Boot Commit:                    0 (          0 Kb)
System PageTables:              0 (          0 Kb)
VAD/PageTable Bitmaps:       4798 (      19192 Kb)
ProcessLockedFilePages:         0 (          0 Kb)
Pagefile Hash Pages:            0 (          0 Kb)
Sum System Commit:         489689 (    1958756 Kb)
Total Private:             656494 (    2625976 Kb)
Misc/Transient Commit:      90741 (     362964 Kb)
Committed pages:          1236924 (    4947696 Kb)
Commit limit:             5887715 (   23550860 Kb)

I checked Ken_Johnson but couldn't come to a conclusion, because I use C++ STL containers (vector/map/set...), which may indeed cause a lot of memory fragmentation.


Even under normal circumstances (plenty of free memory), ExAllocatePoolWithTag may still return nullptr at some point. Maybe I should pre-allocate a large block (50 MB/100 MB/...) of non-paged memory and then "malloc" and "free" out of that big pool? Since I've already overloaded new/delete this is easy to do; the only thing needed is a proper algorithm to make sure allocation and deallocation work correctly.

Post edited by beforeyouknow on

Comments

  • beforeyouknow Member Posts: 21
    edited April 2022
    • Memory usage from the crash dump (tag ?ikm):
    3: kd> !poolused 1 
    ....
     Sorting by Tag
    
                                NonPaged                                         Paged
     Tag       Allocs       Frees      Diff         Used       Allocs       Frees      Diff         Used
    
     1NVD           0           0         0            0            1           1         0            0    UNKNOWN pooltag '1NVD', please update pooltag.txt
     2UuQ           4           0         4        16384            0           0         0            0    UNKNOWN pooltag '2UuQ', please update pooltag.txt
     ?ikm   255346783   255245471    101312     26744640            0           0         0            0    UNKNOWN pooltag '?ikm', please update pooltag.txt
     ?zyx         480         450        30          960            0           0         0            0    UNKNOWN pooltag '?zyx', please update pooltag.txt
     ACHA      142556      142452       104        18304            0           0         0            0    UNKNOWN pooltag 'ACHA', please update pooltag.txt
     ACPI           4           4         0            0            0           0         0            0    UNKNOWN pooltag 'ACPI', please update pooltag.txt
     AFGp           1           1         0            0            0           0         0            0    UNKNOWN pooltag 'AFGp', please update pooltag.txt
     ALPC      398139      396773      1366       769504            0           0         0            0    ALPC port objects , Binary: nt!alpc
    
  • Mark_Roddy Member - All Emails Posts: 4,628
    edited April 2022

    What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
    Regardless, you need to handle allocation failures rather than just crashing.
    Also use a 4 character tag.
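[Editor's sketch] The "handle allocation failures" point can be illustrated in user-mode C++: the backing allocator can return null, and an overloaded operator new or factory has to surface that to the caller instead of crashing. `Record`, `TryCreate`, and `Destroy` are made-up illustrative names; `malloc`/`free` stand in for ExAllocatePoolWithTag/ExFreePoolWithTag:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Illustrative type; in the real driver this would be the tracked process info.
struct Record {
    int pid;
    explicit Record(int p) : pid(p) {}
};

// Returns nullptr instead of throwing when allocation fails, the way
// ExAllocatePoolWithTag does in kernel mode.
Record* TryCreate(int pid) {
    void* mem = std::malloc(sizeof(Record));  // stand-in for ExAllocatePoolWithTag
    if (!mem) {
        return nullptr;  // caller must handle this, e.g. fail with STATUS_INSUFFICIENT_RESOURCES
    }
    return new (mem) Record(pid);  // placement-new on the raw block
}

void Destroy(Record* r) {
    if (!r) return;
    r->~Record();
    std::free(r);  // stand-in for ExFreePoolWithTag
}
```

The point is only that every call site checks for null; the allocator itself is incidental.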

  • Tim_Roberts Member - All Emails Posts: 14,563

    You haven't literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Tim_Roberts said:
    You haven't literally done 4 billion memory allocations and frees from non-paged pool, have you? What on earth are you doing?

    I save their information (process name/path/id/etc.) from the process/thread/module callbacks; you can think of it as an anti-virus security driver. It uses std::make_shared (non-paged) a lot. Yes, they are only freed when the driver is unloaded (I've checked that all "new" memory is "delete"d at unload), but there shouldn't be 4 billion allocations and frees at runtime, which is weird.


    • allocs: 4283973658
    • frees: 4283782403
    • diff: 191255
      so diff = allocs - frees

    Can I read the diff field like this: the current number of unfreed allocations is 191255?


    It's just that the number of allocations and frees above is far too high. In that case, even when there is enough memory, does ExAllocatePoolWithTag still have a chance of returning 0? Is that right?

  • beforeyouknow Member Posts: 21

    @Mark_Roddy said:
    What you ought to do is integrate ExXxxLookasideListEx with your stl container allocator(s) rather than writing your own heap cache.
    Regardless, you need to handle allocation failures rather than just crashing.
    Also use a 4 character tag.

    Thanks for your suggestion. I looked at ExInitializeLookasideListEx and found that it initializes a list of fixed-size entries. In that case, since the C++ objects/structs created via new have different sizes, it seems I need to initialize lookaside lists for the different sizes.

  • Mark_Roddy Member - All Emails Posts: 4,628

    @beforeyouknow said:
    in such a case the c++ objects/structs via new will have different sizes, then it seems
    that I need to initialize the lookaside list pointers of different sizes

    Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.
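[Editor's sketch] The "template it once" idea can be outlined in user-mode C++: one templated cache, instantiated per type, plays the role a per-type lookaside list plays in the kernel. The names and the cache depth are illustrative, not WDK APIs, and `malloc`/`free` stand in for the real pool allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// One instantiation per allocated type; each type gets its own small cache
// of previously freed blocks, all of the same size (sizeof(T)).
template <typename T, std::size_t Depth = 64>
class TypedCache {
    void* free_[Depth];    // cached blocks available for reuse
    std::size_t count_ = 0;
public:
    ~TypedCache() {
        while (count_) std::free(free_[--count_]);
    }
    void* Alloc() {
        if (count_) return free_[--count_];  // fast path: reuse a cached block
        return std::malloc(sizeof(T));       // slow path: hit the real allocator
    }
    void Free(void* p) {
        if (count_ < Depth) free_[count_++] = p;  // keep it for the next Alloc
        else std::free(p);                        // cache full: give it back
    }
};
```

A kernel version would wrap ExAllocateFromLookasideListEx/ExFreeToLookasideListEx instead of maintaining its own array, but the per-type template structure is the same.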

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Mark_Roddy said:
    Sure, each class or struct that gets allocated has its own allocator. If you are clever you could template the allocator code and just write it once.

    I will try this.


    I looked at lookaside lists again, and it seems ExAllocateFromLookasideListEx can also return null, so in fact I would still need to reserve a large memory pool in advance to handle the empty case.

  • Doron_Holan Member - All Emails Posts: 10,756
    You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.
    d
  • beforeyouknow Member Posts: 21

    @Doron_Holan said:
    You have to handle a failed memory allocation. Plain and simple. You can try to be as complicated and baroque as you want, but you are still not handling the underlying condition that is a part of the allocator contract.

    Yes, now it seems I have to handle the null case anyway.

  • Mark_Roddy Member - All Emails Posts: 4,628

    Yes, now it seems I have to handle the null case anyway.

    yes always.

    But the lookaside lists will mitigate fragmentation and you don't have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    then consider replacing or extending it with your own version

    This.

    Those who've been here a while: Insert my standard grumble here, please.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21
    edited April 2022

    @Mark_Roddy said:
    But the lookaside lists will mitigate fragmentation and you don't have to allocate huge blocks ahead of time. The only downside is that the implementation gives you no control at all over the size of its free list, and may in fact trim your free list under memory pressure conditions. So I suggest using the existing look-aside list implementation first, see if it resolves the issue, and then consider replacing or extending it with your own version. (Hint: extending it is trivial.)

    I tried allocating a large chunk of memory ahead of time and using a spinlock to synchronize access to it; that gave me very high CPU usage and then the system would crash.

    static volatile LONG reslock = 0;
    typedef volatile LONG MY_LOCK;

    void mySpin_Lock(MY_LOCK* lock) {
        // test-and-set, then spin on a plain read until the lock looks free
        while (_InterlockedExchange((volatile LONG*)lock, 1) != 0) {
            while (*lock) {
                _mm_pause();
            }
        }
    }

    void mySpinUnlock(MY_LOCK* lock) {
        *lock = 0;  // plain volatile store: no release barrier, unlike the kernel's spinlock APIs
    }

    void* stlnew(unsigned size) {
        mySpin_Lock(&reslock);
        void* mem = myMemoryPoolnew(size);  // my own big-pool allocator
        mySpinUnlock(&reslock);
        return mem;
    }
    
    

    If I use lookaside lists, the best version I can think of is to manage lookaside lists for blocks of different sizes, e.g. 8/16/32/64/128/512/1024/2048/4096 bytes; larger requests would be handled manually with ExAllocatePoolWithTag, with checks for the case where 0 is returned.
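[Editor's sketch] The size-class routing described above might look like this in outline: round each request up to a power-of-two bucket (8 through 4096 bytes here, purely illustrative) and use the bucket index to pick a lookaside list, falling back to the general allocator for oversized requests:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMinShift = 3;   // smallest class: 8 bytes
constexpr std::size_t kMaxShift = 12;  // largest class: 4096 bytes

// Returns the size-class index (0..9) a request of `size` bytes maps to,
// or -1 if it exceeds the largest class and must go straight to the
// general allocator (ExAllocatePoolWithTag in the kernel version).
int BucketFor(std::size_t size) {
    if (size == 0) size = 1;
    for (std::size_t s = kMinShift; s <= kMaxShift; ++s) {
        if (size <= (std::size_t(1) << s)) return int(s - kMinShift);
    }
    return -1;  // too big for any per-size lookaside list
}
```

An allocator front end would then keep one lookaside list per bucket index and route each request through `BucketFor`, still checking every path for a null return.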

  • beforeyouknow Member Posts: 21
    edited April 2022

    I wrapped ExAllocatePoolWithTag in the same spinlock and it didn't crash, so maybe the problem is in my own big-memory-pool management.

    void* stlnew(unsigned size) {
        mySpin_Lock(&reslock);
        void* mem = ExAllocatePoolWithTag(NonPagedPool, size, KERNEL_STL_POOL_TAG);
        mySpinUnlock(&reslock);
        return mem;
    }
    
    
  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    Why in the name of heaven would you even attempt to write your own spin lock code??

    peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21

    @Peter_Viscarola_(OSR) said:
    Why in the name of heaven would you even attempt to write your own spin lock code??

    peter

    since I got huge CPU spikes and crashes using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    since I got huge cpu boosts and crashes using KeAcquireInStackQueuedSpinLock

    With all due respect, if KeAcquireInStackQueuedSpinLock -- when used correctly -- were unstable or responsible for any bad behavior whatsoever, then the entire operating system would be in big trouble. It is widely used internally in Windows.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • MBond2 Member Posts: 564

    Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in-box implementation is going to be at least as good as anything that you can do. And probably better than what you will do.
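[Editor's sketch] For illustration only, here is what a compare-exchange acquire path looks like in user-mode C++ with std::atomic; kernel code should of course use the KeAcquire* routines rather than rolling its own:

```cpp
#include <atomic>
#include <cassert>

// Test-and-set lock built on compare-exchange, with an unlocked-state
// spin before each acquisition attempt (test-and-test-and-set).
class TasLock {
    std::atomic<int> state_{0};  // 0 = free, 1 = held
public:
    void Lock() {
        for (;;) {
            int expected = 0;
            // Attempt to move 0 -> 1; acquire ordering on success.
            if (state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
            // Spin on plain loads until the lock looks free again,
            // so we don't hammer the cache line with writes.
            while (state_.load(std::memory_order_relaxed) != 0) { /* pause */ }
        }
    }
    bool TryLock() {
        int expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire);
    }
    void Unlock() {
        // Release store, not a plain write: later readers must see
        // everything written inside the critical section.
        state_.store(0, std::memory_order_release);
    }
};
```

Note the explicit release on unlock, which the hand-rolled `*lock = 0;` earlier in the thread lacks.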

  • Phil_Barila Member - All Emails Posts: 165

    @beforeyouknow said:
    since I got huge CPU spikes and crashes using KeAcquireInStackQueuedSpinLock, I tried to write a simple one.

    Did you try to cheat it and pass the PKLOCK_QUEUE_HANDLE as a pointer to a block of memory, instead of the address of a local on the stack? You can't do that, you have to declare it as a local:

    KLOCK_QUEUE_HANDLE handle;
    ...
    KeAcquireInStackQueuedSpinLock(lock, &handle);
    

    At least, I had to do that 15 years ago ... Haven't been playing in the Windows kernel in a while, but we got the behavior you described when we tried using memory that was not in the stack.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    @Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • beforeyouknow Member Posts: 21

    @MBond2 said:
    Well, a spin lock is a very easy thing to write, but you can't write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn't try - the in box implementation is going to at least as good as anything that you can do. And probably better than what you will do

    Yes, I used KeAcquireInStackQueuedSpinLock just to verify whether I had been using it incorrectly, and finally found it was just a management problem in my large memory pool.

  • Phil_Barila Member - All Emails Posts: 165

    @Peter_Viscarola_(OSR) said:
    @Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

    Peter

    Aww, thanks! I like what I'm doing, but sometimes I miss working in KM. Returning to it sometime in the future is not out of the question.

  • Phil_Barila Member - All Emails Posts: 165

    @beforeyouknow said:
    Yes, I use KLOCK_QUEUE_HANDLE handle; as a local variable. I checked the official documentation as well as [microsoft/Windows-driver-samples/network/trans/inspect/sys/inspect.c#L453](https://github.com/microsoft/Windows-driver-samples/blob/9e1a643093cac60cd333b6d69abc1e4118a12d63/network/trans/inspect/sys/inspect.c#L453); they all keep the KLOCK_QUEUE_HANDLE on the stack. Unless Microsoft is deliberately playing hide-and-seek with us, it should be fine as a temporary variable on the stack. But KSPIN_LOCK seems to be required to be statically declared as a global variable.

    I wonder if you deleted that, since it's no longer visible here? Glad to hear that your issues were not with how you were declaring the handle.

    You don't need to (and probably don't want to) make the KSPIN_LOCK a static global. It should live in the scope closest to the need, but be visible widely enough to cover all accesses. I prefer to "hide" such locking/unlocking inside accessor methods that lock/access/unlock on behalf of the caller.

    Way back in the day I wrote a memory pool that "extended" the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can't remember, but I probably used an in-stack queued spinlock to guard access.
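[Editor's sketch] Phil's scheme can be outlined in user-mode C++ as a stash with low/high watermarks; `malloc`/`free` stand in for the backing lookaside list, and the watermark values and names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Keeps a local stash of same-size blocks: refills from the backing
// allocator when it runs low, and releases back when it gets too full.
class WatermarkPool {
    std::vector<void*> stash_;
    std::size_t blockSize_, low_, high_;
public:
    WatermarkPool(std::size_t blockSize, std::size_t low, std::size_t high)
        : blockSize_(blockSize), low_(low), high_(high) {
        Refill();  // pre-allocate up to the high watermark at startup
    }
    ~WatermarkPool() {
        for (void* p : stash_) std::free(p);
    }
    void Refill() {  // top the stash back up to the high watermark
        while (stash_.size() < high_) {
            void* p = std::malloc(blockSize_);  // stand-in for the lookaside
            if (!p) break;                      // the backing allocator can fail too
            stash_.push_back(p);
        }
    }
    void* Alloc() {
        if (stash_.size() <= low_) Refill();  // running low: ask for more
        if (stash_.empty()) return nullptr;   // still empty: propagate failure
        void* p = stash_.back();
        stash_.pop_back();
        return p;
    }
    void Free(void* p) {
        if (stash_.size() >= high_) std::free(p);  // close to overflowing: release
        else stash_.push_back(p);
    }
    std::size_t Size() const { return stash_.size(); }
};
```

A kernel version would guard `Alloc`/`Free` with a spinlock (or an in-stack queued spinlock, as Phil suggests) and use the lookaside routines for refill and release.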

  • rdmsr Member Posts: 5

    @Phil_Barila said:
    Way back in the day I wrote a memory pool that "extended" the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can't remember, but I probably used an in-stack queued spinlock to guard access.

    I've found that guarding large memory pools with spinlocks inevitably leads to CPU spikes.

  • beforeyouknow Member Posts: 21

    @rdmsr said:
    I've found that guarding large memory pools with spinlocks inevitably leads to CPU spikes.

    That's what spinlocks do. I'm currently trying to solve this with a lookaside list first; thanks, everyone.

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    The question, really, is whether the contested acquisition cost matters enough in your application for you to try to fix it. Usually, it does not. Premature optimization is a hallmark of poor engineering.

    It all depends on what you’re allocating, how frequent you expect the allocations to be, and whether you value rapid uncontested or contested lock acquisition. There is no lock that is “free.” Also important is how the memory you’re allocating is used, of course. Before the cost of “spinning” you might want to worry about allocating memory that’s “near” in the NUMA sense.

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member MODERATED Posts: 5,281

    Why in the name of heaven would you even attempt to write your own spin lock code??

    Well, according to Mr. Kyler, what the OP does here is "the only proper spinlock implementation in existence", namely, a tight polling loop of interlocked operations...

    On a serious note, there are some (admittedly rare) situations when a custom spinlock implementation may indeed be beneficial. For example, consider a scenario where you have multiple queues (say, holding work items that have to be processed), each protected by a spinlock, and your goal is to ensure that all these queues get emptied as quickly as possible. Furthermore, these queues happen to be accessed in a code path that gets frequently executed by all CPUs in the system, so contention for the locks may be high.

    In such a case, "optimised" spinlock versions like in-stack queued locks or ticket locks are going to be, in actuality, suboptimal. Why? Because these locks oblige you, by the very definition of queued locks, to keep spinning until the target lock is acquired, without giving you a chance to yield and do something else instead. However, in a situation like that you may want, instead of waiting for the lock on queue A to become available, to check whether you can acquire the lock on queue B, C, D or E, so that you can process that queue straight away. Otherwise, you may be "surprised" to discover that, by the time you have acquired the target lock, the particular queue it guards is already empty, because all the items in it have already been processed by the previous lock owners. This situation is going to repeat itself with one lock after another. Certainly, it is not necessarily the same CPU that gets unlucky every time, but the proportion of time spent idle-spinning by all CPUs in the system as a whole will unquestionably grow significantly, which means you will be unable to utilise all the processing resources optimally.

    Therefore, in such a case it would be better to use "classical" spinlocks based upon test-and-set. Instead of spinning in an outer loop until the spinlock for a given queue gets released, it would make more sense to go and check whether the lock for some other queue is available, which, in turn, implies that you would be better off with a custom spinlock implementation.
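[Editor's sketch] The multi-queue idea above, sketched in user-mode C++ with try-acquire semantics (single-threaded and purely illustrative; names are made up):

```cpp
#include <atomic>
#include <cassert>
#include <deque>

// A work queue guarded by a simple test-and-set lock.
struct Queue {
    std::atomic<int> lock{0};  // 0 = free, 1 = held
    std::deque<int> items;
    bool TryLock() { int e = 0; return lock.compare_exchange_strong(e, 1); }
    void Unlock()  { lock.store(0, std::memory_order_release); }
};

// Instead of spinning on one queue's lock, try each lock in turn and
// drain the first queue we manage to acquire. Returns the number of
// items processed, or -1 if every lock was held by someone else.
int DrainAnyQueue(Queue* queues, int n) {
    for (int i = 0; i < n; ++i) {
        if (queues[i].TryLock()) {
            int processed = int(queues[i].items.size());
            queues[i].items.clear();  // "process" the items
            queues[i].Unlock();
            return processed;
        }
    }
    return -1;  // all busy: the caller can yield or retry rather than idle-spin
}
```

This is exactly the pattern a queued lock forbids: once you join its queue you are committed to that one lock, whereas a test-and-set lock lets you abandon the attempt and move on.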

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    You’ve been quiet for ages, and your first post back… is a necropost?

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member MODERATED Posts: 5,281

    ou’ve been quiet for ages, and your first post back… is a necropost?

    Sorry, it was just an accident. The thing is, I had just received an "I'm crying from laughing so hard" message from one of my old NTDEV contacts, with a screenshot of the OP's "custom implementation of a spinlock" and a link to this thread. I could not resist making a post... and only then looked at the dates. Sorry for that.

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 9,077

    Therefore, I could not resist and made a post

    Well, I can certainly understand that. :-)

    Peter Viscarola
    OSR
    @OSRDrivers
