ExInterlockedXxx functionality

I’ve been reviewing the WRK, and ReactOS to see how interlocked SLIST works. I’m trying to tidy up my XLIST. I see that on entry to these functions, in 32 bit mode, the sequence PUSHFD, _disable(), grab spinlock, do work, POPFD. The spinlock is grabbed in the same way as if you had called KeAcquireSpinlockAtDpcLevel(); IRQL is not raised, &c. So, I threw together this bit of code:

#define EFLAGS_IF_MASK 0x00000200
#define EFLAGS_IF_SHIFT 9

PXLIST_ENTRY InterlockedInsertTailXList(PXLIST_HEADER List,
	PXLIST_ENTRY Entry, PKSPIN_LOCK Lock) {
#if (1)
	unsigned int Flags = GetCallersEflags();
	_disable();
	KeAcquireSpinLockAtDpcLevel(Lock);
#else
	KLOCK_QUEUE_HANDLE LockHandle;
	KeAcquireInStackQueuedSpinLock(Lock, &LockHandle);
#endif
	InsertTailXList(List, Entry);
#if (1)
	KeReleaseSpinLockFromDpcLevel(Lock);
	if (Flags & EFLAGS_IF_MASK) {
		_enable();
	}
#else
	KeReleaseInStackQueuedSpinLock(&LockHandle);
#endif
	return Entry;
}

I know that there are amd64 interlocked instructions for just about everything (xor, &c.), and I’ll get around to that at some point if they are practical. I am just tinkering at the moment. Although pushfq and popfq are valid in 64 bit, Visual Studio coughs at __asm pushfq and __asm popfq. In 32 bit mode, all is OK using __asm pushfd and __asm popfd. I threw the above code together.

How bad is this?

Sorry, I need to understand how better to post code in this new web interface.

I’ve been reviewing the WRK, and ReactOS

/rolls eyes

I don’t like seeing the WRK license terms violated, and I find ReactOS… perplexing. Let’s leave it at that.

Visual Studio coughs at __asm pushfq and __asm popfq

Well, you can’t inline assembler in the 64-bit C compiler. It’s not allowed.

Is that your issue?

How bad is this?

I’m not sure what question you’re asking, really.

I need to understand how better to post code in this new web interface.

I fixed it for you. You have two choices for formatting:

  • Write the Markdown yourself
  • Paste what you want and then select it, and apply the appropriate format from the ribbon.

Either works pretty well – but there can be odd, unanticipated, side-effect at times.

Although pushfq and popfq are valid in 64 bit, Visual Studio coughs at __asm pushfq and __asm popfq.

IIRC, if you compile to x86_64, you cannot use inline asm in .c files with either MSFT or Intel C compilers (although you can still use it with GCC). However, with MSFT and Intel compilers you have to put all your asm code in a separate .s file if you want to build a 64-bit executable. Therefore, what your compiler “coughs at” is the very __asm statement in a .c file…

Anton Bassov

I guess my question is this: Is my code acceptable? It is processor
specific, but so is SLIST. I want to emulate how SLIST does it’s
interlocking, and it is not just simply acquiring and releasing the
spinlock using documented APIs.

As for WRK, I stumbled across it on GitHub, and use it as a resource. It’s
been there for quite a long time. Should I avoid it? My guess is that if MS
had a real problem with it on GitHub, then they would have removed it. They
own GitHub after all.

On Tue, Nov 20, 2018 at 1:54 PM anton_bassov
wrote:

> OSR https://community.osr.com/
> anton_bassov commented on ExInterlockedXxx functionality
>
> > Although pushfq and popfq are valid in 64 bit, Visual Studio coughs at
> asm pushfq and asm popfq.
>
> IIRC, if you compile to x86_64, you cannot use inline asm in .c files
> with either MSFT or Intel C compilers (although you can still use it with
> GCC). However, with MSFT and Intel compilers you have to put all your asm
> code in a separate .s file if you want to build a 64-bit executable.
> Therefore, what your compiler “coughs at” is the very __asm statement in a
> .c file…
>
> Anton Bassov
>
> –
> Reply to this email directly or follow the link below to check it out:
> https://community.osr.com/discussion/comment/291515#Comment_291515
>
> Check it out:
> https://community.osr.com/discussion/comment/291515#Comment_291515
>

I guess my question is this: Is my code acceptable? It is processor specific, but so is SLIST.

Well, everything is acceptable as long as it is not buggy, but, to be honest, the whole thing seems (at least to me - probably, I just don’t understand something) unnecessarily complex and convoluted. For example what about the following schematic code (I assume the list that works on FIFO basis)

struct slist_entry { struct slist_entry * next;};

struct slist {spinlock_t lock; struct slist_entry * first; struct slist_entry * last;};

void slist_init(struct slist *list)
{
list->first=list->last=NULL;
spinlock_init(&list->lock);
}

void slist_add (struct slist *list, struct slist_entry * new_entry)
{
new_entry->next=NULL;

spin_lock(&list->lock);

if(!list->first) list->first=list->last=new_entry;

else {list->last->next=new_entry; list->last=new_entry;}

spin_unlock(&list->lock);

}

void slist_remove (struct slist *list, struct slist_entry ** ptr)
{

spin_lock(&list->lock);

*ptr=list->first;
list->first=list->first->next;
if(!list->first) list->last=NULL;

spin_unlock(&list->lock);

}

This one seems to be straightforward and arch-agnostic - the only arch-specific part is a spinlock,which is wrapped in arch-independent constructs. In other words( unless I am just missing something, of course), you seem to be adding some unnecessary complications…

Anton Bassov

As you can see in the code, I was using an InStackQueueSpinlock, and that
code works in my code. I want to make XLIST a drop-in replacement for
SLIST, and LIST_ENTRY where a stack, queue, or dequeue are required. Every
dequeue and many queue implementations I’ve seen uses a double linked list.
The double linked list adds the niceness of removing elements from within
the list, but those features are still outside the definition of these data
structures.

  • A queue needs to add to the tail, and remove from the head.
  • A dequeue needs to add to the head, and remove from the tail; and visa
    versa.

So, if this is the scope of the need, it can be implemented using one less
pointer with an xor chain. You can also emulate a stack by adding to the
front, and removing from the front, and visa versa keeping the node size
the same as if you were using SLIST. So, I think implementing an XLIST is
useful. I’d like it to be as flexible as possible, and something that can
be used anywhere you can use the interlocked SLIST and interlocked
LIST_ENTRY functions to implement stack, queue, and dequeue.

Like I said, I am just tinkering, and having some fun; and thought you
folks might be interested in my poking around.

On Tue, Nov 20, 2018 at 3:50 PM anton_bassov
wrote:

> OSR https://community.osr.com/
> anton_bassov commented on ExInterlockedXxx functionality
>
> > I guess my question is this: Is my code acceptable? It is processor
> specific, but so is SLIST.
>
> Well, everything is acceptable as long as it is not buggy, but, to be
> honest, the whole thing seems (at least to me - probably, I just don’t
> understand something) unnecessarily complex and convoluted. For example
> what about the following schematic code (I assume the list that works on
> FIFO basis)
>
> struct slist_entry { struct slist_entry * next;};
>
> struct slist {spinlock_t lock; struct slist_entry * first; struct
> slist_entry * last;};
>
> void slist_init(struct slist *list)
>
> {
>
> list->first=list->last=NULL;
>
> spinlock_init(&list->lock);
>
> }
>
> void slist_add (struct slist *list, struct slist_entry * new_entry)
>
> {
>
> new_entry->next=NULL;
>
> spin_lock(&list->lock);
>
> if(!list->first) list->first=list->last=new_entry;
>
> else {list->last->next=new_entry; list->last=new_entry;}
>
> spin_unlock(&list->lock);
>
> }
>
> void slist_remove (struct slist *list, struct slist_entry ** ptr)
>
> {
>
> spin_lock(&list->lock);
>
> *ptr=list->first;
>
> list->first=list->first->next;
>
> if(!list->first) list->last=NULL;
>
> spin_unlock(&list->lock);
>
> }
>
> This one seems to be straightforward and arch-agnostic - the only
> arch-specific part is a spinlock,which is wrapped in arch-independent
> constructs. In other words( unless I am just missing something, of course),
> you seem to be adding some unnecessary complications…
>
> Anton Bassov
>
> –
> Reply to this email directly or follow the link below to check it out:
> https://community.osr.com/discussion/comment/291526#Comment_291526
>
> Check it out:
> https://community.osr.com/discussion/comment/291526#Comment_291526
>

Every dequeue and many queue implementations I’ve seen uses a double linked list.

  • A queue needs to add to the tail, and remove from the head.
  • A dequeue needs to add to the head, and remove from the tail; and visa versa.

Well, but then you can you the XOR-linked list that you had presented to us few weeks ago - when it comes to adding to/removing from the ends it seems to be really efficient, so that it seems to be a perfect queue/dequeue that does not require the target structures that you are linking to even know about getting linked. Furthermore, once you don’t need to store a link pointer in the structure itself you can have exactly the same instance of a structure on multiple lists…

In other words,XOR-linked list seems to be superior to a simple SLIST in absolutely all respects…

Anton Bassov

I guess my question is this: Is my code acceptable? It is processor specific, but so is SLIST.

Well, if your stated goal is to emulate what Interlocked SLIST does, then it must be acceptable, assuming it’s doing what SLIST does. Doing anything else wouldn’t meet that goal.

If your question is a collegial “How do we all feel, in general, about a architecture-specific implementations for the sake of optimization?” I would answer that they are, IMHO, rarely worthwhile … this being 2018 and not 1988 (when interlocked SLIST was invented, at least notionally). The implementation of Interlocked SLIST (and it’s brother) are notable in that they can be used to share data at any IRQL, and thus can be used for sharing data between your ISR and DPC. I really don’t see any other use for them.

Peter

it seems to be a perfect queue/dequeue that does not require the target structures that you are linking to even know about
getting linked. Furthermore, once you don’t need to store a link pointer in the structure itself you can have exactly the same
instance of a structure on multiple lists…

Ooops, guys, I have to retract the above statement - it looks like I spouted nonsense yet another time, and, on this particular occasion, did it on a truly grand-scale. If it could only work the above way…well, then it would be just a perpetual-engine-grade compressor algorithm that allowed to compress infinite amounts of arbitrary data to a couple of pointers. Certainly you still need to embed a link in a target structure…

Anton Bassov