ExAllocatePoolWithTag returns null

beforeyouknow · April 22, 2022, 6:19am

@MBond2 said:
Well, a spin lock is a very easy thing to write, but you can’t write an acquire path with InterlockedExchange - you have to use InterlockedCompareExchange. And you shouldn’t try - the in-box implementation is going to be at least as good as anything that you can do. And probably better than what you will do.
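As a rough illustration of the compare-exchange acquire path mentioned in the quote, here is a minimal user-mode sketch in C11. It is an assumption-laden analogue, not kernel code: `atomic_compare_exchange_strong_explicit` stands in for InterlockedCompareExchange, and the `demo_*` names are hypothetical. In a driver the in-box spin locks remain the sensible choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode spin lock: state 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

void demo_lock_init(demo_spinlock *l)
{
    atomic_init(&l->state, 0);
}

/* One acquire attempt: swap 0 -> 1 only if the lock is currently free. */
bool demo_try_acquire(demo_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

void demo_acquire(demo_spinlock *l)
{
    while (!demo_try_acquire(l)) {
        /* Spin on a plain load until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) != 0) {
        }
    }
}

void demo_release(demo_spinlock *l)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Spinning on a load between attempts keeps the waiting CPUs from hammering the cache line with interlocked writes while the lock is held.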

Yes, I just switched to KeAcquireInStackQueuedSpinLock to verify whether I had been using it incorrectly, and eventually found that the problem was simply in how I managed my large memory pool.

Phil_Barila · April 22, 2022, 1:15pm

@“Peter_Viscarola_(OSR)” said:
@Phil_Barila … Great insight/idea! (I miss seeing you here these days, Mr. Barila!)

Peter

Aww, thanks! I like what I’m doing, but sometimes I miss working in KM. Returning to it sometime in the future is not out of the question.

Phil_Barila · April 22, 2022, 1:30pm

@beforeyouknow said:
Yes, I use KLOCK_QUEUE_HANDLE handle; as a local variable. I checked the official documentation as well as [microsoft/Windows-driver-samples//network/trans/inspect/sys/inspect.c#L453](https://github.com/microsoft/Windows-driver-samples/blob/9e1a643093cac60cd333b6d69abc1e4118a12d63/network/trans/inspect/sys/inspect.c#L453); they all keep the KLOCK_QUEUE_HANDLE on the stack, so unless Microsoft is deliberately playing hide and seek with us, it should be fine as a temporary variable on the stack. But the KSPIN_LOCK should have to be statically declared as a global variable.

I wonder if you deleted that, since it’s no longer visible here? Glad to hear that your issues were not with how you were declaring the handle.

You don’t need to (and probably don’t want to) make the KSPIN_LOCK a static global. It should be in the scope that is closest to the need, but widely visible enough to cover all accesses. I prefer to “hide” such locking/unlocking inside accessor methods that lock/access/unlock on behalf of the caller.
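A minimal sketch of that accessor idea, written as user-mode C11 so it can be compiled anywhere. Everything here is an assumption for illustration: the `guarded_counter` type and `counter_*` functions are hypothetical, and the `atomic_bool` is only a stand-in for a KSPIN_LOCK stored next to the data it guards. In a driver, the lock/unlock helpers would instead wrap KeAcquireInStackQueuedSpinLock and KeReleaseInStackQueuedSpinLock with a local KLOCK_QUEUE_HANDLE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical structure: the lock lives beside the data it protects
   instead of being a static global. */
typedef struct {
    atomic_bool locked;  /* stand-in for a KSPIN_LOCK member */
    long value;          /* guarded by 'locked' */
} guarded_counter;

void counter_init(guarded_counter *c)
{
    assert(c != 0);
    atomic_init(&c->locked, false);
    c->value = 0;
}

/* Lock helpers are internal details; callers use only the accessors. */
void counter_lock(guarded_counter *c)
{
    while (atomic_exchange_explicit(&c->locked, true, memory_order_acquire)) {
    }
}

void counter_unlock(guarded_counter *c)
{
    atomic_store_explicit(&c->locked, false, memory_order_release);
}

/* Accessor: lock, modify, unlock on behalf of the caller. */
long counter_add(guarded_counter *c, long delta)
{
    counter_lock(c);
    c->value += delta;
    long result = c->value;
    counter_unlock(c);
    return result;
}

/* Accessor: lock, read, unlock on behalf of the caller. */
long counter_get(guarded_counter *c)
{
    counter_lock(c);
    long result = c->value;
    counter_unlock(c);
    return result;
}
```

Because callers only ever see `counter_add` and `counter_get`, there is no way for them to forget a release or to touch the data without holding the lock.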

Way back in the day I wrote a memory pool that “extended” the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can’t remember, but I probably used an in-stack queued spinlock to guard access.
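The shape of such a pool might look roughly like the following user-mode C11 sketch. It is not the original implementation: `malloc`/`free` stand in for the lookaside list, the `atomic_bool` stands in for the spin lock, and all names and sizes (`block_pool`, `POOL_CAPACITY`, `POOL_PREALLOC`, `BLOCK_SIZE`) are hypothetical. It also simplifies the watermarks to "empty" and "full" rather than "low" and "close to overflowing".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define POOL_CAPACITY 8   /* hypothetical: most blocks the cache may hold */
#define POOL_PREALLOC 4   /* hypothetical: blocks allocated at startup */
#define BLOCK_SIZE    64  /* hypothetical block size in bytes */

typedef struct {
    atomic_bool locked;           /* stand-in for the spin lock */
    void *blocks[POOL_CAPACITY];  /* cached free blocks */
    int count;                    /* number of cached blocks */
} block_pool;

/* Stand-ins for the lookaside list's allocate and free routines. */
void *backing_alloc(void) { return malloc(BLOCK_SIZE); }
void backing_free(void *b) { free(b); }

void pool_lock(block_pool *p)
{
    while (atomic_exchange_explicit(&p->locked, true, memory_order_acquire)) {
    }
}

void pool_unlock(block_pool *p)
{
    atomic_store_explicit(&p->locked, false, memory_order_release);
}

/* Pre-allocate the startup blocks; on failure, call pool_destroy(). */
bool pool_init(block_pool *p)
{
    atomic_init(&p->locked, false);
    p->count = 0;
    while (p->count < POOL_PREALLOC) {
        void *b = backing_alloc();
        if (b == NULL) {
            return false;
        }
        p->blocks[p->count++] = b;
    }
    return true;
}

/* Take a cached block, or ask the backing allocator when the cache is empty. */
void *pool_get(block_pool *p)
{
    void *b = NULL;
    pool_lock(p);
    if (p->count > 0) {
        b = p->blocks[--p->count];
    }
    pool_unlock(p);
    return b != NULL ? b : backing_alloc();
}

/* Cache the block, or hand it back to the backing allocator when full. */
void pool_put(block_pool *p, void *b)
{
    assert(b != NULL);
    bool cached = false;
    pool_lock(p);
    if (p->count < POOL_CAPACITY) {
        p->blocks[p->count++] = b;
        cached = true;
    }
    pool_unlock(p);
    if (!cached) {
        backing_free(b);
    }
}

int pool_cached(block_pool *p)
{
    pool_lock(p);
    int n = p->count;
    pool_unlock(p);
    return n;
}

/* Not thread-safe: call only once no other thread uses the pool. */
void pool_destroy(block_pool *p)
{
    while (p->count > 0) {
        backing_free(p->blocks[--p->count]);
    }
}
```

The calls into the backing allocator happen outside the lock, so the lock is held only for a few array operations.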

rdmsr · April 23, 2022, 1:00am

@Phil_Barila said:

Way back in the day I wrote a memory pool that “extended” the lookaside by pre-allocating a bunch of blocks at startup, and then asking the lookaside for more when my pool got low, and released back to the lookaside when my pool was getting close to overflowing. Worked great! I can’t remember, but I probably used an in-stack queued spinlock to guard access.

I’ve found that securing large memory pools with spinlocks will inevitably lead to CPU spikes.

beforeyouknow · April 23, 2022, 12:03pm

@rdmsr said:

I’ve found that securing large memory pools with spinlocks will inevitably lead to CPU spikes.

That’s what spinlocks do. I’m currently trying to solve this by using a lookaside list first. Thanks, everyone.

Peter_Viscarola_OSR · April 23, 2022, 2:17pm

The question, really, is whether the contested acquisition cost matters enough in your application for you to try to fix it. Usually, it does not. Premature optimization is a hallmark of poor engineering.

It all depends on what you’re allocating, how frequent you expect the allocations to be, and whether you value rapid uncontested or contested lock acquisition. There is no lock that is “free.” Also important is how the memory you’re allocating is used, of course. Before the cost of “spinning” you might want to worry about allocating memory that’s “near” in the NUMA sense.

anton_bassov · May 15, 2022, 12:45am

Why in the name of heaven would you even attempt to write your own spin lock code??

Well, according to Mr. Kyler, what the OP does here is “the only proper spinlock implementation in existence”, namely, a tight polling loop of interlocked operations…

On a serious note, there are some (admittedly rare) situations when a custom spinlock implementation may indeed be beneficial. For example, consider a scenario where you have multiple queues (for example, holding work items that have to be processed), each protected by a spinlock, and your goal is to ensure that all these queues get emptied as quickly as possible. Furthermore, these queues happen to be accessed in a code path that gets executed frequently by all CPUs in the system, so contention for the locks may be high.

In such a case, “optimised” spinlock versions like in-stack queued locks or ticket locks are, in actuality, going to be suboptimal. Why? Because these locks oblige you, by the very definition of queued locks, to keep on spinning until the target lock gets acquired, without giving you a chance to yield and do something else instead. However, in a situation like that you may want, instead of waiting until the lock on queue A becomes available, to check whether you can acquire the lock on queue B, C, D or E so that you can process it straight away. Otherwise, you may be “surprised” to discover that, by the time you have acquired the target lock, the particular queue that it guards is already empty, because all the items in it have already been processed by the previous lock owners. This situation is going to repeat itself with one lock after another. Certainly, it is not necessarily going to be the same CPU that gets unlucky all the time, but the proportion of time spent in idle spinning by all CPUs in the system as a whole will unquestionably grow significantly, which means you will be unable to utilise all the processing resources in an optimal way.

Therefore, in such a case it would be better to use the “classical” spinlocks that are based upon test-and-set. Instead of spinning in an outer loop until the spinlock on a given queue gets released, it would make more sense for you to go and check whether the lock on some other queue is available, which, in turn, implies that you would be better off with a custom spinlock implementation.
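To make the idea concrete, here is a minimal user-mode C11 sketch of trying each queue's test-and-set lock once and moving on when it is busy. It is only an illustration under stated assumptions: `work_queue`, `QUEUE_COUNT` and the functions below are hypothetical names, and a queue is reduced to a count of pending items.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_COUNT 4  /* hypothetical number of work queues */

typedef struct {
    atomic_bool locked;  /* classic test-and-set lock */
    int pending;         /* work items left; guarded by 'locked' */
} work_queue;

/* Test-and-set: the exchange returns the previous value,
   so 'false' means the lock was free and is now ours. */
bool queue_try_lock(work_queue *q)
{
    return !atomic_exchange_explicit(&q->locked, true, memory_order_acquire);
}

void queue_unlock(work_queue *q)
{
    atomic_store_explicit(&q->locked, false, memory_order_release);
}

void queues_init(work_queue qs[QUEUE_COUNT], int items_each)
{
    assert(items_each >= 0);
    for (int i = 0; i < QUEUE_COUNT; i++) {
        atomic_init(&qs[i].locked, false);
        qs[i].pending = items_each;
    }
}

/* One pass over the queues: take one item from the first queue whose
   lock is free and which still has work. Returns that queue's index,
   or -1 if every queue was either busy or empty on this pass. */
int process_one(work_queue qs[QUEUE_COUNT])
{
    for (int i = 0; i < QUEUE_COUNT; i++) {
        if (!queue_try_lock(&qs[i])) {
            continue;  /* busy: do not spin, try the next queue */
        }
        bool took = qs[i].pending > 0;
        if (took) {
            qs[i].pending--;
        }
        queue_unlock(&qs[i]);
        if (took) {
            return i;
        }
    }
    return -1;
}
```

A queued or ticket lock has no equivalent of the `continue` branch: once a CPU has joined the queue for a lock, it has to wait its turn.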

Anton Bassov

Peter_Viscarola_OSR · May 15, 2022, 2:33am

You’ve been quiet for ages, and your first post back… is a necropost?

anton_bassov · May 15, 2022, 10:36am

You’ve been quiet for ages, and your first post back… is a necropost?

Sorry - it was just an accident. The thing is, I just received an “I’m like crying from laughing so hard” message from one of my old NTDEV contacts, with a screenshot of the OP’s “custom implementation of a spinlock” and a link to this thread. Therefore, I could not resist and made a post… and then looked at the dates. Sorry for that.

Anton Bassov

Peter_Viscarola_OSR · May 16, 2022, 2:16am

Therefore, I could not resist and made a post

Well, I can certainly understand that.