Want to create a segment selector I can use

Hi –

I am developing a Forth-based virtual machine that is able to bootstrap
itself into existence from within some host program. It relies on the host
to allocate for it a 1 MB piece of memory and then proceeds to write Intel
opcodes to that memory for a basic 32-instruction machine and a rudimentary
parser, after which it parses and executes an increasingly powerful set of
Forth language instructions to build a more capable processing environment.
For the basic machine I take over the eax, ebx, ecx, edx, esi, edi and ebp
registers.

This has been entirely adequate up to now, but now that I am implementing
threads, I find that I need another register to give me access to per-thread
data structures. I can’t use the stack, code or data segment registers, but
I do notice that the es segment register is sitting there unused. What I
would like to do is load a segment selector into it that will
ultimately point me back to a piece of memory that is part of my original 1
MB. So I would like to create a Local Descriptor Table entry for each
thread whose base address points to a data structure allocated by the Forth
code already running. I figure that if I can get that to work, then I can
just use a segment override together with the other registers I already use
to read from and write to my thread-owned data and let the OS worry about
switching my tasks.

I can’t really use the stack for this purpose, since I’ve taken over ebp and
thus don’t really have a stack frame per se to work with. But I think this
approach I’ve outlined above should work pretty well, if my reading of the
Intel docs is correct. It may also give me the benefit of being able to
install one of my machines (the basic execution engine is really pretty
small) into code running at the kernel level and do driver work with it –
providing I don’t discover some show-stopper along the way that nixes the
idea.

After searching for the better part of a day, I can’t find sample code
anywhere that will show me how to allocate new segment selectors and install
them in the LDT. It looks like it should be as simple as getting the LDT
pointer from the Local Descriptor Table Register, creating a new selector
structure (there’s enough code out there that shows how to do that) and then
finding an available slot to stuff it into. It’s this last step that leaves
me a little perplexed. I can’t find anywhere any sort of protocol for doing
this in a safe and well-behaved way. I suppose I could just find the first
entry whose contents are still uninitialized and put my selector there. But
how do I know that some other process won’t come and trash it out from under
me? And how would that other process know that my selector slot is already
being used – especially if it does something icky like loading its own
selectors into a statically determined slot?
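
To make the "creating a new selector structure" step concrete, here is a minimal sketch in portable C of how an 8-byte data-segment descriptor and the 16-bit selector that refers to it are laid out, following the layout in the Intel manuals. It only packs bits and does not touch the real LDT. The names descriptor_t, make_data_descriptor and make_selector are made up for illustration, and the choice of a byte-granular, 32-bit, read/write data segment is an assumption about what a per-thread block would need.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte x86 segment descriptor, as stored in the GDT or an LDT. */
typedef struct {
    uint8_t bytes[8];
} descriptor_t;

/* Build a present, 32-bit, byte-granular, read/write data descriptor.
 * base  : linear address where the segment starts
 * limit : last valid offset (20 bits, counted in bytes because G = 0)
 * dpl   : descriptor privilege level, 0..3 (3 = user mode) */
static descriptor_t make_data_descriptor(uint32_t base, uint32_t limit,
                                         unsigned dpl)
{
    descriptor_t d;
    assert(limit <= 0xFFFFFu);
    assert(dpl <= 3u);
    d.bytes[0] = (uint8_t)(limit & 0xFFu);          /* limit 7:0   */
    d.bytes[1] = (uint8_t)((limit >> 8) & 0xFFu);   /* limit 15:8  */
    d.bytes[2] = (uint8_t)(base & 0xFFu);           /* base 7:0    */
    d.bytes[3] = (uint8_t)((base >> 8) & 0xFFu);    /* base 15:8   */
    d.bytes[4] = (uint8_t)((base >> 16) & 0xFFu);   /* base 23:16  */
    /* Access byte: P=1, DPL, S=1 (code/data), type=0010 (data, writable). */
    d.bytes[5] = (uint8_t)(0x80u | (dpl << 5) | 0x10u | 0x02u);
    /* Flags nibble: G=0, D/B=1, L=0, AVL=0; low nibble is limit 19:16. */
    d.bytes[6] = (uint8_t)(0x40u | ((limit >> 16) & 0x0Fu));
    d.bytes[7] = (uint8_t)((base >> 24) & 0xFFu);   /* base 31:24  */
    return d;
}

/* A selector is index * 8, plus the table-indicator bit, plus the RPL. */
static uint16_t make_selector(unsigned index, int use_ldt, unsigned rpl)
{
    assert(index < 8192u);
    assert(rpl <= 3u);
    return (uint16_t)((index << 3) | (use_ldt ? 4u : 0u) | rpl);
}
```

With this layout, make_selector(1, 1, 3) yields 0x0F, which is the value that would go into es to reference LDT slot 1 from ring 3. Actually installing the descriptor is a separate, OS-specific step that this sketch deliberately leaves out.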

So my questions are two: First, am I totally out to lunch here or should I
in theory (or even better, in practice) be able to do what I am suggesting?
And second, how do I do this in a safe way? I guess a third question would
be, do I have to be at privilege level 0 to write to the LDT and if so, is
there any way I can put myself there from a user-mode program?

I would greatly appreciate some guidance from those more knowledgeable and
mighty than myself.

Thanks much,

Mike

One relatively easy way to implement Forth threads is to do it inside your
Forth system itself. The context switching is easy: all you have to do is
switch the Forth stack, although it may help to split your dictionary into
two, a global and a local one, so that each FThread has its own local dict
that links upwards to the global dict. The task switcher is peanuts to
implement if you do it within your Forth system: a thread is just another
Forth stack and possibly another local dictionary. I find this approach a
lot cheaper than to try to mess with OS-level structures.
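
The in-VM approach described here can be sketched roughly as follows. This is only an illustration in C under stated assumptions, not anyone's actual implementation: each task is nothing more than its own data stack, switching is cooperative and round robin, and the names ftask_t, fsched_t and the ft_* functions are hypothetical. The per-task local dictionary is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CELLS 64
#define MAX_TASKS   4

/* Hypothetical per-task context: a Forth thread is just its own stack. */
typedef struct {
    int32_t cells[STACK_CELLS];
    size_t  depth;   /* number of cells currently in use */
    int     active;  /* nonzero once the slot holds a live task */
} ftask_t;

typedef struct {
    ftask_t tasks[MAX_TASKS];
    size_t  current; /* index of the running task */
} fsched_t;

/* Start with task 0 active and every stack empty. */
static void ft_init(fsched_t *s)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        s->tasks[i].depth = 0;
        s->tasks[i].active = 0;
    }
    s->tasks[0].active = 1;
    s->current = 0;
}

/* Claim a free slot for a new task; returns its index, or -1 if full. */
static int ft_spawn(fsched_t *s)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (!s->tasks[i].active) {
            s->tasks[i].active = 1;
            s->tasks[i].depth = 0;
            return (int)i;
        }
    }
    return -1;
}

/* Cooperative switch: move to the next active task, round robin. */
static size_t ft_yield(fsched_t *s)
{
    for (size_t step = 1; step <= MAX_TASKS; step++) {
        size_t next = (s->current + step) % MAX_TASKS;
        if (s->tasks[next].active) {
            s->current = next;
            break;
        }
    }
    return s->current;
}

/* Push and pop always act on whichever task is current. */
static void ft_push(fsched_t *s, int32_t v)
{
    ftask_t *t = &s->tasks[s->current];
    assert(t->depth < STACK_CELLS);
    t->cells[t->depth++] = v;
}

static int32_t ft_pop(fsched_t *s)
{
    ftask_t *t = &s->tasks[s->current];
    assert(t->depth > 0);
    return t->cells[--t->depth];
}
```

The point of the design is that a context switch touches only one index; nothing at the OS or hardware level is involved.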

We did this way back in 1981, on Sperry DCPs running Telcon and on Sperry
1100s running Exec 8, and it worked wonders. No need to fiddle with the
hardware, it’s all within the VM.

The problem of writing to the LDT and such like things is that it works fine
provided you have full control of the system. However, if you’re running
within Windows, you don’t know when the OS is going to take over your
hardware and mess with it, or worse, assume something that you forgot to do
or that you just couldn’t do. You can manage to seize full control from
Windows (well, we do that here at Numega, hence it’s possible), but it’s a
big and complex job. So I find it best to implement threading inside your
Forth VM and plug it into Windows somehow, without bothering its
mechanisms. Once you do that, the Forth environment itself will allow you to
take things over seamlessly, for as long as you want, especially if your
Forth has a decent compiling facility.

Hope this helps!

Alberto.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com]On Behalf Of Michael Clagett
Sent: Thursday, June 10, 2004 2:05 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Want to create a segment selector I can use

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@compuware.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it.

Well, this is quite interesting :).

For the OP: I could be really off the mark at present, but the LDT is
per process IIRC. So if you are in the right process context and you
create a segment selector in an unused slot, there is still more to do: the
memory it refers to has to be mapped properly. If all of that is done
properly, then when a context switch happens the new mapping should not
affect it. The only requirement is that the space you are after lies in the
pageable, per-process area. Address space mapping is per process, so I
suppose we don’t need to worry about any collision there due to thread
switching within a process.

I don’t know of any way to execute on a user stack at ring 0 (especially on
NT+), but if you catch it in the kernel in a way that guarantees you are in
your process, you capture the VM of the process, although you are executing
on a kernel stack.

-pro

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com]On Behalf Of Moreira, Alberto
Sent: Thursday, June 10, 2004 6:56 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Want to create a segment selector I can use

I believe it’s better to implement the Forth stack by hand and separate from
the machine stack. The best machine to do that was the Motorola 6809,
because it had separate machine and user stacks: once in my remote past I
implemented a “Tiny Forth” for the 6809 that ran inside a 2K EPROM and had a
fair amount of functionality, and it had pretty close to the whole of the
Forth 79 standard inside a 4K EPROM. The Motorola 68K was good too, because
of its post-increment and pre-decrement addressing modes. So the Forth
stack becomes just a nonpaged memory buffer, and it operates separately from
the system stack, which runs on the machine stack. The problem with
intertwining the Forth user stack with the machine stack is that you never
know who’s going to have pushed stuff onto it at what time! You think you
have your operands on the stack when you do a ROT, and you’re instead
messing up somebody else’s interrupt stack…
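
As a rough illustration of a Forth data stack kept in its own buffer, here is a small C sketch. It is an assumption-laden example rather than the Tiny Forth code mentioned here: the stack grows downward so that push and pop mirror pre-decrement and post-increment addressing, and the names dstack_t and ds_* are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DS_CELLS 32

/* A data stack living in an ordinary buffer, apart from the machine stack. */
typedef struct {
    int32_t  cells[DS_CELLS];
    int32_t *sp;  /* points at the top item; one past the end when empty */
} dstack_t;

static void ds_init(dstack_t *s)
{
    s->sp = s->cells + DS_CELLS;
}

static size_t ds_depth(const dstack_t *s)
{
    return (size_t)((s->cells + DS_CELLS) - s->sp);
}

/* Push is a pre-decrement store. */
static void ds_push(dstack_t *s, int32_t v)
{
    assert(s->sp > s->cells);
    *--s->sp = v;
}

/* Pop is a post-increment load. */
static int32_t ds_pop(dstack_t *s)
{
    assert(ds_depth(s) > 0);
    return *s->sp++;
}

/* ROT ( a b c -- b c a ): bring the third item to the top. */
static void ds_rot(dstack_t *s)
{
    assert(ds_depth(s) >= 3);
    int32_t c = s->sp[0];
    int32_t b = s->sp[1];
    int32_t a = s->sp[2];
    s->sp[2] = b;
    s->sp[1] = c;
    s->sp[0] = a;
}
```

Because nothing else ever pushes onto this buffer, ROT always sees the three operands the Forth code put there.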

Alberto.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com]On Behalf Of Prokash Sinha
Sent: Thursday, June 10, 2004 10:48 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Want to create a segment selector I can use

Sure, since the OP’s trying this in Forth, and you have already given your expert input :).

I was stressing the x86/NT combination. If I wanted to overlay or inject some code in user space (.txt), I don’t think it would be that difficult. Underlying assumption: maybe not for production.

-pro

Google for ZwSetLdtEntries and NtSetLdtEntries.
These functions are also documented in Gary Nebbett’s
book “Windows NT/2000 Native API Reference”.

You can call NtSetLdtEntries from user mode, as shown here:
http://vx.netlux.org/lib/vzo13.html

BTW, there was a major security hole with this call until the recent Windows
security patch: expand-down segments were not validated correctly.

Dmitriy Budko, VMware

Hi –

Thanks for the suggestions, Alberto, Sinha and Dmitriy. Alberto, the
reason I was gravitating towards leveraging the OS’s task management is that
I am already interacting with the OS fairly heavily, in that I am trying to
use the OS’s windowing facility, and it seemed pretty worthwhile and useful
to give a top-level window its own message loop. I suppose there’s no real
reason why I can’t use a single message loop for all my windows, but I do
find that somewhat constricting. It seems more natural to me to stay as
close as I can to the C++ model that I am most familiar with.

The Forth machine design I’m using is based on the more recent work of
Charles Moore (as documented by Jeff Fox and others), and the 32 virtual
machine instruction primitives I use are based on Moore’s Machine Forth,
with some twists, however. In place of Machine Forth’s 24-bit address
space, which was designed to fit their hardware environment, I use a full
32-bit address space so as to be able to integrate more naturally with
Windows and Linux and with the Intel processor itself. I am, however,
following their basic scheme, which means that I embed multiple 5-bit
instruction codes in a single Intel 32-bit word (I get to have six slots
instead of their original four because of my longer word length). This ends
up being extremely efficient both in terms of code dictionary space usage
and in terms of memory accesses, since I can hold a number of VM
instructions in a single register at a time. Following their model, I
implement separate data and return stacks, avoiding the hardware stack
completely and using the eax register for top of data stack and the edi
register for top of return stack; the rest of each of these stacks is
implemented in its own circular buffer in memory.
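
The slot packing described here could look something like the following C sketch. The bit order is an assumption made for the example (slot 0 in the lowest five bits, the top two bits unused), since the post does not say how the slots are arranged, and pack_slots and slot_at are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS      5u
#define SLOTS_PER_WORD 6u
#define SLOT_MASK      0x1Fu

/* Pack six 5-bit opcodes into one 32-bit word.
 * Assumed layout: slot 0 sits in bits 0..4, slot 5 in bits 25..29,
 * and the top two bits stay zero. */
static uint32_t pack_slots(const uint8_t ops[SLOTS_PER_WORD])
{
    uint32_t word = 0;
    for (unsigned i = 0; i < SLOTS_PER_WORD; i++) {
        assert(ops[i] <= SLOT_MASK);
        word |= (uint32_t)ops[i] << (i * SLOT_BITS);
    }
    return word;
}

/* Extract the opcode held in one slot of a packed word. */
static uint8_t slot_at(uint32_t word, unsigned slot)
{
    assert(slot < SLOTS_PER_WORD);
    return (uint8_t)((word >> (slot * SLOT_BITS)) & SLOT_MASK);
}
```

An inner interpreter built on this would fetch one word, then shift and mask its way through the six slots without going back to memory, which is where the saving in memory accesses comes from.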

In addition, I have implemented a variety of addressing schemes. These
include 20-bit internal Forth VM addresses that can be embedded inside a
32-bit word along with whatever instruction uses them sitting in instruction
slot one, as well as 32-bit Intel addresses for calling OS functions and
interacting with the hosting program, which so far has been C++ but
which theoretically could be any programming language that can write bytes
to memory and jump directly to assembler code. (The embedded 20-bit
addresses, by the way, permit a pretty efficient direct threading model, with
the bulk of Forth code word parameter lists consisting of sequences of
direct calls to code inside the Forth code dictionary.)
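
One way to picture an instruction word carrying an embedded 20-bit address is the C sketch below. The exact bit positions are an assumption for illustration only (opcode in bits 0..4, address in bits 5..24); the post only says that the instruction sits in slot one and shares the word with the address. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OPCODE_BITS 5u
#define OPCODE_MASK 0x1Fu
#define ADDR_MASK   0xFFFFFu  /* 20 bits, enough to span a 1 MB VM */

/* Assumed layout: opcode in bits 0..4, 20-bit VM address in bits 5..24. */
static uint32_t encode_with_addr(uint8_t opcode, uint32_t vm_addr)
{
    assert(opcode <= OPCODE_MASK);
    assert(vm_addr <= ADDR_MASK);
    return (uint32_t)opcode | (vm_addr << OPCODE_BITS);
}

/* Recover the opcode from an encoded word. */
static uint8_t word_opcode(uint32_t word)
{
    return (uint8_t)(word & OPCODE_MASK);
}

/* Recover the embedded VM-relative address from an encoded word. */
static uint32_t word_addr(uint32_t word)
{
    return (word >> OPCODE_BITS) & ADDR_MASK;
}
```

A 20-bit field covers exactly the 1 MB region the host allocates, so every call target inside the dictionary fits in a single word next to its call opcode.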

The only reason I’ve gone to such length in describing this here (aside from
the fact that I’ve been working on this for about six or seven months in
spare hours without a single person to talk with about it who can understand
a word I’m saying) is that I wanted to give you a flavor of what would
challenge me in managing multitasking completely inside the machine itself.
A given thread’s state includes such things as whether it is currently in
Forth or Intel addressing mode, what dictionary stack it is using, what
numeric base is currently in effect, what the contents of its data and
return stacks are, and other such things. The thing that’s tricky is that I
have implemented the underlying 32-instruction VM in machine code (by
literally writing the code bytes to memory as I boot everything up), and a
number of these most basic operations get the current base address (either
the beginning of the Forth VM space if it’s in Forth addressing mode or the
beginning of the host process’s space if it’s in Intel addressing mode) from
a memory variable and add this to whatever address is passed to them via the
Forth data or return stacks (or in the machine’s address register; yes,
Moore’s design also has an address register for writing to and from memory
locations).

Thus very thread-state-specific information is used in the coremost
functioning of the virtual machine itself. This is why the idea of
leveraging the operating system’s ability to just switch register sets in
and out is so attractive to me. That, together with the idea of avoiding
having to integrate my own thread management with OS windowing message
loops, is a very powerful attraction indeed. Also, I have on numerous
occasions (as have many others) cursed the relative dearth of registers
available in the Intel architecture. I have every register but ebx and ecx
dedicated to a core and essential virtual machine purpose, so it’s not easy
to think of how to switch contexts within the scope of my virtual machine.
So you can see why I was attracted to the idea of having all my
thread-specific state sitting in a data structure pointed to by the es
register, so that the core machine-language implementation could use it
completely transparently.
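
To show what such a per-thread block might hold, here is a hedged C sketch of the state listed in this message together with the base-plus-offset address calculation. The struct layout, the field names and the effective_address helper are all hypothetical; in the real machine this block would be reached through a segment override rather than through a C pointer.

```c
#include <assert.h>
#include <stdint.h>

#define VM_SIZE (1u << 20)  /* the 1 MB region allocated by the host */

typedef enum { MODE_FORTH, MODE_INTEL } addr_mode_t;

/* Hypothetical per-thread state block, one instance per thread. */
typedef struct {
    addr_mode_t mode;         /* current addressing mode */
    uintptr_t   vm_base;      /* start of the Forth VM space */
    uintptr_t   host_base;    /* start of the host process space */
    unsigned    numeric_base; /* e.g. 10 or 16 */
} thread_state_t;

/* Add the base for the current mode to an address taken from the stacks. */
static uintptr_t effective_address(const thread_state_t *ts, uint32_t addr)
{
    if (ts->mode == MODE_FORTH) {
        assert(addr < VM_SIZE);  /* VM-relative addresses stay inside 1 MB */
        return ts->vm_base + addr;
    }
    return ts->host_base + addr;
}
```

Whichever mechanism locates the block (a segment register, a spare general register, or a pointer swapped by an in-VM task switcher), the primitives only need this one lookup to become thread-aware.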

Alberto, given everything contained in this long, cathartic diatribe,
do you still think I would be better off managing threads myself? This is
not a rhetorical question; I really would like your opinion, since I’m
certain you have infinitely more practical experience in all of this than I
do and are almost certain to be in better touch with the difficulties that
lie ahead of me. I had sort of imagined that my OS threads would not be
unlike any old app’s OS threads and that the issues would be comparable to
any multi-threaded app’s issues. The whole reason I’m interested in the es
register is that (unlike the fs and gs registers, for example) it strikes me
as being part of that core set of registers that an application is really
supposed to be able to use for itself. And given that I’m talking about a
single selector per thread pointing to a data structure of limited size
(although I guess that’s probably pretty irrelevant) in the Local Descriptor
Table, which is supposed to be dedicated to an individual process, I was
hoping that it wouldn’t be an overly risky thing to take on.

Thank you very much for slogging through all this with me, and I of course
would appreciate any insights you (or any of the other brains that frequent
this site) could throw my way.

Regards,

Mike

On Thu, 10 Jun 2004 02:05:01 -0400, Michael Clagett
wrote:

Note that on the 32-bit i386 architecture, both Win32 and 32-bit OS/2
specify that at all times ES, DS, CS and SS refer to the same memory,
while (this is the clincher) FS refers to the per-thread state structure,
which is mostly identical for Win32 and OS/2.

So no need for your own selector, FS already does the job…

The typical way of using FS for your own data is to call TlsAlloc() to
allocate, for your own process-wide exclusive use, one of 64
pointer-sized fields contained inside that FS: structure. In slow
code you would pass that index to the TlsSetValue and TlsGetValue calls,
but the Win32 ABI also specifies how to do it with inline assembler.

For code like yours, the index (allocated at process startup) would be
patched directly into the generated virtual machine instructions, to
avoid loading it into a register. Typical code would be

OPCODE FS:[0E10h + 4 * index]

0E10h is XP specific, I do not recall the official inline version.
On Win95, the correct constant can be found as FS:[2Ch] - FS:[18h],
but FS:[2Ch] is not set on XP.
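
As a rough sketch of that patching step, the following C routine writes the bytes for `mov eax, fs:[0E10h + 4 * index]` into a code buffer. The names `emit_load_tls_slot` and `TLS_SLOTS_OFFSET` are invented for illustration, the 0E10h constant is simply the XP value quoted above and is assumed rather than discovered at run time, and EAX is used only because it has a short encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the TLS slot array inside the FS: structure, as quoted above
   for XP; treated here as an assumption, not a portable constant. */
#define TLS_SLOTS_OFFSET 0x0E10u

/* Hypothetical helper: writes `mov eax, fs:[TLS_SLOTS_OFFSET + 4*index]`
   at `code` and returns the number of bytes written (always 6). */
size_t emit_load_tls_slot(uint8_t *code, uint32_t index)
{
    uint32_t disp = TLS_SLOTS_OFFSET + 4u * index;

    code[0] = 0x64;                      /* FS segment override prefix */
    code[1] = 0xA1;                      /* MOV EAX, moffs32 */
    code[2] = (uint8_t)(disp);           /* displacement, little-endian */
    code[3] = (uint8_t)(disp >> 8);
    code[4] = (uint8_t)(disp >> 16);
    code[5] = (uint8_t)(disp >> 24);
    return 6;
}
```

Because the displacement is a constant baked into the instruction, the same generated code reaches a different slot value on every thread without tying up a general-purpose register for the base.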

One optimization that is not XP-specific (and apparently documented),
compared to most official inline FS references: DS:[FS:[018h] + x] == FS:[x]

In kernel mode, FS: refers to the current CPU, not the current thread,
but something in the thread switching code at least virtualizes FS:[0]

A very alternative method is used by the old thread implementation on
Linux:

Allocate your stacks at nicely aligned addresses such as N * 2MB, then
reserve the first part of the stack for thread data. That data can then
be found at negative offsets from (ESP | (2MB-1)).
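
To make the arithmetic concrete, here is a small C sketch of that lookup. It assumes each stack occupies one 2 MB region aligned on a 2 MB boundary, with the thread data at the very top of the region. `STACK_REGION`, `struct thread_data` and `thread_data_from_sp` are made-up names, the two fields are only placeholders, and the stack pointer is passed in as a plain integer rather than read from ESP.

```c
#include <stdint.h>

#define STACK_REGION ((uintptr_t)2 * 1024 * 1024)   /* 2 MB, power of two */

/* Hypothetical per-thread block kept at the top of each stack region. */
struct thread_data {
    uint32_t numeric_base;
    uint32_t addressing_mode;
};

/* Round `sp` up to the last byte of its aligned region, step one past it,
   then back off by the size of the block: a negative offset from the top. */
uintptr_t thread_data_from_sp(uintptr_t sp)
{
    uintptr_t region_top = (sp | (STACK_REGION - 1)) + 1;
    return region_top - sizeof(struct thread_data);
}
```

Any stack pointer inside the same 2 MB region yields the same address, so no extra register or selector is needed to find the block.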

#include <disclaimer.h>

Sounds like you are after an (almost) pure virtual machine, if I understand
correctly.
I am guessing this based on your own stack design, instruction packing,
etc.

If that is correct, then I would recommend a fairly recent book on this
subject:
“Virtual Machine Design & Implementation in C/C++” by Bill Blunden. I was
briefly looking at it when I was poking at the J9 (Java) ports from IBM. I
did not get into much detail, but in my opinion it is a good reference.

If this is just noise, please ignore.

-pro

Jakob –

Thanks so much; that’s brilliant. I had already tried using the TLS
functions, but was trying to work with addresses and unfortunately the
linear address of each thread’s TLS variables is different. So since I
generate the machine code on startup, I was only able to generate the static
address of my original thread. It never occurred to me to use the FS
register to point to the TLS directly. I read about the fact that FS is
used for this purpose, but only went so far as to say “Ooohhh! Better not
use that register.” But this is great. The offset can be encoded in the
instruction and I can just address the same slot with every thread. No
mess, no fuss and it seems pretty kosher.

The only problem I can see encountering is when I get around to bringing
this over to Linux. Everybody over there is bitching about how Wine uses
the FS segment (gee, I wonder why). So either I do something uncool or I
find another way to do it when I get there.

Thanks very much. I’ll try this and see if it works.

Regards,

Mike

> this over to Linux. Everybody over there is bitching about how Wine uses

Wine has a reputation of being very buggy.
Port to native Linux instead.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

On Fri, 11 Jun 2004 21:34:53 -0400, Michael Clagett
wrote:

>
> The only problem I can see encountering is when I get around to bringing
> this over to Linux. Everybody over there is bitching about how Wine uses
> the FS segment (gee, I wonder why). So either I do something uncool or I
> find another way to do it when I get there.
>

Look closely at the pthreads implementation on Linux; it has its own
similar code. In one version (there are several competing implementations!)
it does this (but this is OT for ntdev…):

>>
>> A very alternative method is used by the old thread implementation on
>> Linux:
>>
>> Allocate your stacks at nicely aligned address such as N * 2MB, then
>> reserve the first part of the stack for thread data. That data can then
>> be found at negative offsets from (ESP | (2MB-1)).
>>


#include <disclaimer.h>