Safety of using ExInterlockedPopEntrySList?

Simeon_Zlatarev · September 18, 2016, 12:15am

To my knowledge, pre-Windows 8 x64 implementations of SList use 9-bit sequence numbers in the SLIST_HEADER. This means that 512 operations can complete concurrently (without progress from particular thread) until an ABA problem potentially manifests. I wonder whether, depending on the number of threads and physical cores, this couldn’t plausibly occur. To further complicate, the kernel could run on a vcpu, creating time discontinuities.

I would like to ask:

Does the Windows scheduler protect against ABA by, e.g., restarting interlocked operation upon preemption?
Is there some protection against hypervisor interference?
In the light of the above concerns, is SList on a pre-Windows 8 x64 deployment really safe for all workloads?

I would have speculated that per-thread kernel allocator behavior was factored in for the ABA avoidance, but the primitives are in the Win32 API as well and any driver can employ custom pool allocator.

Alex_Grig · September 18, 2016, 11:38pm

The mitigating factor is that even if a collision occurs (which is very unlikely event), the probability that the seq# returns to the same value is 1/512. It’s not just that the API fails if the counter overflows. It only fails if it returns to the same number.

Simeon_Zlatarev · September 19, 2016, 4:14am

Thanks. This is what I meant as well. 1/512 is not waterproof. For a system corruption nonetheless.

Yes, the pointers must coincide. But let’s assume that this happens periodically with the right workload. If the pointer incident occurs 1024 times, the 1/512 chance becomes 86% chance. What is necessary is that the allocations must be reused and the contending threads must span *at least* 512 numbers due to scheduling decisions or hypervisor preemption, but again - why not?

Slava_Imameev · September 19, 2016, 8:44am

Hi,

I never heard about any fixup code for SList in the scheduler. There is a fixup code in the page fault handler to restart SList’s ***PopEntrySList if an invalid address was hit when accessing Header->NextEntry->Next.

SList has a user mode implementation similar to kernel mode one. In case of user mode page faults are likely to occur more often, so delays in user mode’s ****PopEntrySList are more unpredictable. The page fault handler contains the code to fixup the user mode Header->NextEntry->Next invalid access as well.

Simeon_Zlatarev · September 19, 2016, 3:01pm

Thanks Slava. An MS blog here describes it:
http://blogs.microsoft.co.il/sasha/2012/08/11/windows-memory-managers-preferential-treatment-of-access-faults-in-the-interlocked-singly-linked-list-pop-implementation/
In fact, the author of the blog, Sasha Goldshtein, is ntdev member. He referred to it in another post, but the link is out of date.

As you described it yourself, it is a different kind of fixup. Not for a roll-over protection.

Slava_Imameev · September 19, 2016, 3:17pm

This type of protection requires modification to either a scheduler or returning from interrupt, the latter is more appropriate for this task. As I said I never heard or observed such code.

Slava_Imameev · September 19, 2016, 4:08pm

I looked at the code again and found that interrupt processing code has a fixup for SList . There is a routine KiCheckForSListAddress. This routine is called at DISPATCH_LEVEL before returning from an interrupt and it fixes the EIP(RIP for x64) of a trap frame to restart SList pop operation if interrupt happened inside ExInterlockedPopEntrySList. So when an interrupt processing code returns execution to an interrupted code the code resumes at the beginning of ExInterlockedPopEntrySList ( namely ExpInterlockedPopEntrySListResume ).

kd> uf KiCheckForSListAddress
nt!KiCheckForSListAddress:
82acbdf1 0fb7416c movzx eax,word ptr [ecx+6Ch]
82acbdf5 8b5168 mov edx,dword ptr [ecx+68h]
82acbdf8 6683f808 cmp ax,8
82acbdfc 7511 jne nt!KiCheckForSListAddress+0x1e (82acbe0f) Branch

nt!KiCheckForSListAddress+0xd:
82acbdfe b8f4dda882 mov eax,offset nt!ExpInterlockedPopEntrySListResume (82a8ddf4)
82acbe03 3bd0 cmp edx,eax
82acbe05 7222 jb nt!KiCheckForSListAddress+0x38 (82acbe29) Branch

nt!KiCheckForSListAddress+0x16:
82acbe07 81fa1fdea882 cmp edx,offset nt!ExpInterlockedPopEntrySListEnd (82a8de1f)
82acbe0d eb15 jmp nt!KiCheckForSListAddress+0x33 (82acbe24) Branch

nt!KiCheckForSListAddress+0x1e:
82acbe0f 6683f81b cmp ax,1Bh
82acbe13 7514 jne nt!KiCheckForSListAddress+0x38 (82acbe29) Branch

nt!KiCheckForSListAddress+0x24:
82acbe15 a1ac69bb82 mov eax,dword ptr [nt!KeUserPopEntrySListResume (82bb69ac)]
82acbe1a 3bd0 cmp edx,eax
82acbe1c 720b jb nt!KiCheckForSListAddress+0x38 (82acbe29) Branch

nt!KiCheckForSListAddress+0x2d:
82acbe1e 3b15a469bb82 cmp edx,dword ptr [nt!KeUserPopEntrySListEnd (82bb69a4)]

nt!KiCheckForSListAddress+0x33:
82acbe24 7703 ja nt!KiCheckForSListAddress+0x38 (82acbe29) Branch

nt!KiCheckForSListAddress+0x35:
82acbe26 894168 mov dword ptr [ecx+68h],eax

nt!KiCheckForSListAddress+0x38:
82acbe29 c3 ret Branch

Simeon_Zlatarev · September 19, 2016, 5:48pm

Wow, Slava. Thanks. This seems to be it. Certainly fixes the OS scheduler problem.

MS also seems to have a “hypercall” interface for Hyper-V - their version of paravirutalization support. Supposedly the guest OS can, if Hyper-V friendly, get some vcpu scheduling/awareness. Whether this will work with the KiCheckForSListAddress trick - I have no idea, since I am not at all familiar with the api.

Even if the hypercall interface of Hyper-V somehow assists with KiCheckForSListAddress, third-party virtualization products may disrupt it. It depends on how they schedule or if they try to emulate Hyper-V. Additionally, I believe that the hypercall api for scheduling was introduced in Windows 2008. So XP/2003 may be unaware.

Thanks again.