assembler.mi

dnl $ Id: $
dnl Copyright{2000,2022}: Albert van der Horst, HCC FIG Holland by GNU Public License

The forthfile({ciasdis}) assembler is described in this manual,
because the assembler that is in the lab file is compatible.
The idea is that you test code with ciasdis relying on comprehensive
consistency checking.
Then you can use the assembler in the blocks, and the real
debugging can begin.
This chapter is about the assembler itself;
the information about which registers are used in ciforth is
contained in its assembler source.


@menu
* A2:: Introduction
* A3:: Reliability
* A4:: Principle of operation
* A5:: The 8080 assembler
* A6:: Opcode sheets
* A7:: Details about the 80386 instructions
* A8:: 16, 32 and 64 bits code and segments
* A9:: Difference with the built in assembler
* AA:: A rant about redundancy
* AB:: Reference opcodes Intel 386
* ABA:: Reference opcodes Pentium-only
* AC:: The dreaded SIB byte
* AE:: An incomplete and irregular guide to the instruction mnemonics.
* AF:: Assembler Errors
@end menu
@node A2, A3, AF, Assembler
@section Introduction

Via forthurl({http://home.hccnet.nl/a.w.m.van.der.horst/forthassembler.html})
you can find a couple of assemblers, to complement the generic
ciforth system.
The assemblers are not part of the thisforth
package, and must be fetched separately.

They are based on the postit/fixup principle, an original and novel
design to accommodate reverse engineering.
The assembler that is present in the blocks
is code compatible, but is less sophisticated,
especially as regards error detection.
This assembler is automatically loaded in its 16 or 32 bit form,
such that it is appropriate for adding small code definitions to the
system at hand.
The background information given here applies equally to that assembler.

_BOOTED_({{On this stand alone version of ciforth, you only
have the assembler in blocks, that is not documented
separately.}})
A useful technique is to develop code _BOOTED_({{in a hosted system,}})
using the full assembler.
Then, with code that at least contains valid instructions, enter
the debugging phase with the assembler from the library.

forthbreak
_BITS64_({The assembler is usable for 64 bit lina.
The assembler in forthfile({forth.lab}) is usable without reserve
for instructions
that do not require a prefix. Furthermore
the forthcode({REX,}) instruction
that makes the operand size 64 bits is provided, plus
a 64 bit version of forthcode({NEXT,}).
These additions are sufficient to make the floating point library
assemble under 64 bits, as floating point instructions do not require
a prefix, nor does the instruction forthcode({LEA,}).})

The following files comprise the great assembler.
forthbreak
forthfile({ass.frt})  : the 80-line 8086 assembler (no error detection), a prototype.
forthbreak
forthfile({as6809s.frt})  : a small 6809 assembler (no error detection).
forthbreak
forthfile({asgen.frt}) : generic part of postit/fixup assembler
forthbreak
forthfile({as80.frt})  : 8080 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi86.frt})  : 8086 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi386.frt}) : 80386 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({aspentium.frt}) : general Pentium non-386 instructions, requires forthfile({asgen.frt})
forthbreak
forthfile({asalpha.frt}) : DEC Alpha assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi6809.frt}) : 6809 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({ps.frt})     : generate opcode sheets
forthbreak
forthfile({p0.asi386.ps}) : first byte opcode for asi386 assembler
forthbreak
forthfile({p0F.asi386.ps}) : two byte opcode for same that start with 0F.
forthbreak
forthfile({test.mak}) : makefile, i.e. with targets for opcode sheets.
forthbreak

The relevant assembler present in
_BITS16_({forth.lab is equivalent to asgen.frt plus asi86.frt})
_BITS32_({forth.lab is equivalent to asgen.frt plus asi386.frt plus aspentium.frt})
but without error detection.

The forthfile({asi386.frt}) (containing the full 80386 instruction set) is in
many respects non-compliant with Intel syntax. The instruction
mnemonics are redesigned for the benefit of reverse engineering.
There is a one to one correspondence between mnemonics and
machine instructions. In principle this would require a
monumental amount of documentation, comparable to parts of
Intel's architecture manuals. Not to mention the amount of work
to check this. I circumvent this. Opcode sheets for this
assembler are generated by tools automatically, and you can ask
interactively how a particular instruction can be completed.
This is a viable alternative to using manuals, if not more
practical. (Of course someone has to write up the descriptions;
I am happy Intel has done that.)

So look at my opcode sheets. If you think an instruction would be
what you want, use forthcode({SHOW:}) to find out how it is
to be completed.
If you are at all familiar with the opcodes,
most of the time you can understand what your options are.
If not, compare with an Intel opcode sheet, and look up the instruction
that sits in the same place. If you don't understand them, you can still
experiment in a Forth to find out.

The assembler in the Library Addressable
by Blocks (block file) hasn't the advanced features of disassembly,
completion and error detection.
It is intended for incidental use, to speed up a crucial word.
But the code is fully compatible,
so you can develop using the full assembler.

@node A3, A4, A2, Assembler
@section Reliability

I skimped on write up. I didn't skimp on testing.
All full assemblers, like
forthfile({asi386.frt}) and forthfile({aspentium.frt}),
are tested in this way:

forthenumerate
forthitem All instructions are generated.
(Because this uses the same
mechanism as checking during entry, it is most unlikely that you will get an
instruction assembled that is not in this set.)

forthitem They are assembled.

forthitem They are disassembled again and compared with
the original code, which must be the same.

forthitem They are disassembled by a different tool (e.g. GNU's objdump),
and the output is compared with 3. This has been done manually,
just once.
forthendenumerate

This leaves room for a defect of the following type:
A valid instruction is rejected or has been totally overlooked.

But opcode maps reveal their Terra Incognita relentlessly. So I
am quite confident to promise a bottle of good Irish whiskey to
the first one to come up with a defect in this assembler.

The full set of instructions, with all operand combinations, sits
in a file for reference. This is all barring the 256-way forthsamp({SIB})
construction and prefixes, or combinations thereof. Including those would
explode this approach beyond the practical.
Straightforward generation of all instructions
is also not
practical for the Alpha with 32K register combinations per
instruction. This is solved by defining ``interesting'' registers
that are used as examples
and leaving out opcode-operand combinations with uninteresting
registers.


@node A4, A5, A3, Assembler
@section Principle of operation

In making an assembler for the Pentium it turns out that
the in-between step of creating defining words for each type
of assembly gets in the way. There are just too many of them.

MASM heavily overloads its instructions, in particular forthsamp({MOV}) .
Once I used to criticise Intel because they had an unpleasant to use
instruction set with forthsamp({MOV}) forthsamp({MVR}) and forthsamp({MVI}) for move instructions.
In hindsight I find the use of different opcodes correct.
(I mean they are really different instructions; it might have been
better if they weren't. But an assembler must live up to the truth.)
Where the Intel folks really go overboard is with the disambiguation of
essentially ambiguous constructs, by things such as forthsamp({OFFSET}) forthsamp({BYTE PTR})
 forthsamp({ASSUME}) . You can no longer find out what the instruction means by itself.
forthbreak
A simple example to illustrate this problem is
forthexample({        INC [BX]})
Are we to increment the byte or the word at BX?
Intel's solution is forthsamp({INC BYTE PTR [BX]}).
The INC instruction in this (the mod/rm) incarnation has
a size bit. Here we require that this bit be filled in
explicitly, either with forthsamp({ X| }) or forthsamp({ B| }).
Failing to do so is a fatal error.
This results in the rule:
if an instruction doesn't determine the operand
size (some do, like forthcode({LEA,}) ),
then a size fixup is needed: forthsamp({ X| }) or forthsamp({ B| }) .
forthbreak
In this assembler this looks like
forthexample({        INC, B| ZO| [BX] })
This completely unambiguously determines the actual machine code.

These are the phases in which this assembler handles an instruction:

forthitemize
forthitem
POSTIT phase:
forthcode({MOV,}) assembles a two byte instruction with holes.
forthitem
FIXUP phase:
forthcode({X|}) or forthcode({B|}) fits in one of the holes left.
Other fixups determine registers and addressing mode.
forthitem
COMMA phase:
First check whether the fixups have filled up all holes.
Then add addresses (or offsets) and/or immediate data,
using e.g. forthcode({IL,}) or forthcode({L,}).
forthitem
Check whether all commaers, requested either by postits or fixups,
are present.
This check is actually executed by the next postit
prior to assembling, or by forthcode({END-CODE}).
forthenditemize
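
The phases can be mimicked in a few lines of Python. This is only a toy
model, not the code of asgen.frt: the class and method names are invented
for illustration, and it handles one-byte opcodes only. The two encodings
used, the 8080 register move (01 ddd sss) and move immediate (00 ddd 110),
are the real ones.

```python
# Toy model of the postit / fixup / comma phases (hypothetical names).
REG = dict(B=0, C=1, D=2, E=3, H=4, L=5, M=6, A=7)   # 8080 register codes

class Assembler:
    def __init__(self):
        self.code = bytearray()   # finished instructions
        self.insn = bytearray()   # instruction under construction
        self.holes = 0            # opcode bits still to be fixed up
        self.pending = 0          # data bytes still to be comma-ed in

    def close(self):
        """The final check: run by the next postit, or at the end."""
        if self.holes:
            raise ValueError("holes left: %02X" % self.holes)
        if self.pending:
            raise ValueError("%d data byte(s) missing" % self.pending)
        self.code += self.insn
        self.insn = bytearray()

    def postit(self, opcode, holes, data_bytes=0):
        """POSTIT phase: lay down an opcode that still has holes."""
        self.close()
        self.insn = bytearray([opcode])
        self.holes = holes
        self.pending = data_bytes

    def fixup(self, bits, mask):
        """FIXUP phase: OR bits into a hole; the mask must fit free holes."""
        if mask & ~self.holes:
            raise ValueError("fixup does not fit a free hole")
        self.insn[0] |= bits
        self.holes &= ~mask

    def comma(self, byte):
        """COMMA phase: only allowed once all holes are filled."""
        if self.holes:
            raise ValueError("holes left: %02X" % self.holes)
        if not self.pending:
            raise ValueError("unexpected data")
        self.insn.append(byte)
        self.pending -= 1

asm = Assembler()
asm.postit(0x40, 0x3F)                 # register move, two register holes
asm.fixup(REG["A"] << 3, 0x38)         # destination A
asm.fixup(REG["B"], 0x07)              # source B
asm.postit(0x06, 0x38, data_bytes=1)   # move immediate, one hole, one byte
asm.fixup(REG["A"] << 3, 0x38)
asm.comma(0x12)
asm.close()
print(asm.code.hex())                  # 783e12
```

Leaving out a fixup or a commaer makes the next postit (or the final
close) fail, which is the kind of consistency checking described above.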

Doesn't this system lay a burden on the programmer? Yes.
He has to know exactly what he is doing.
But assembly programming is dancing on a rope. The Intel syntax tries
to hide from you where the rope is. A bad idea. There is no such thing as
assembly programming for dummies.

An advantage is that you are more
aware of what instructions are there,
because you see the duplicates.

Now if you are serious, you have to study the forthfile({asgen.frt}) and
forthfile({as80.frt}) sources.
You had better get your feet wet with forthfile({as80.frt})
before you attack the Pentium.
forthsamp({SIB}) is handled as an instruction within an instruction,
clever, but hard to understand.
It deviates somewhat from the phases explained here.

Another invention in this assembler is the forthdefi({family of instructions}).
Assembler instructions are grouped into families with identical fixups, and
an increment for the opcodes.
These are defined as a group by a single execution of a defining word.
For each group there is one opportunity to get the opcode wrong;
formerly that was for each opcode.

@node A5, A6, A4, Assembler
@section The 8080 assembler

The 8080 assembler takes no less space than Cassady's.
(In the end the postit-fixup makes the Pentium assembler
more compact, but not the 8080.)
But... the regularities are much more apparent.
It is much more difficult to make a mistake with the
code for the forthsamp({ADD}) and forthsamp({ADI}) instructions.
This principle makes it possible to write a disassembler that is independent
of the instruction information, one that will also work for the 8086.
A typical family is formed by the 8 immediate-operand instructions, with an
increment of 08.
forthexample({ 08 C6 8 1FAMILY, ADI ACI SUI SBI ANI XRI ORI CPI })
The bottom line is: the assembler proper now takes 22 lines of code.
Furthermore the ``call conditional'' and ``return conditional''
instructions were missing. This became apparent as soon as I printed the
opcode sheets.
For me this means turning ``jump conditional'' into a family.
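
What the family definition above amounts to can be sketched in Python.
The function name is hypothetical, and the sketch computes only the
opcodes; the real defining word also gives every member the same fixup
and data information.

```python
def family(increment, base, names):
    """Pair each name with an opcode: base, base + increment, and so on."""
    return [(name, base + i * increment) for i, name in enumerate(names)]

# The eight 8080 immediate-operand instructions: base C6, increment 08.
IMMEDIATE = family(0x08, 0xC6, "ADI ACI SUI SBI ANI XRI ORI CPI".split())

for name, opcode in IMMEDIATE:
    print("%s %02X" % (name, opcode))
```

This prints C6, CE, D6, DE, E6, EE, F6 and FE for the eight names, in
agreement with the 8080 opcode table. Only the base and the increment
can be wrong, not each opcode separately.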

@node A6, A7, A5, Assembler
@section Opcode sheets

The makefile for the assembler project contains facilities to
generate opcode sheets directly from the instruction sets,
such as forthfile({asi386.ps}).
For the opcode sheets featuring an n-byte prefix you must
pass the forthsamp({PREFIX}) to make and a forthsamp({MASK}) that covers the prefix and
the byte opcode, e.g. forthsamp({make asi386.ps MASK=FFFF PREFIX=0F})
The opcode sheets forthfile({p0.asi386.ps}) and forthfile({p0F.asi386.ps}) are
already part of the distribution and can be
printed on a PostScript printer or viewed with e.g. forthsamp({gv}).

Compare the opcode sheets with Intel's to get an overview of what I have done
to the instruction set. In essence I have re-engineered it to make it reverse
assemblable, i.e. from a disassembly you can regenerate the machine code.
This is forthemph({not}) true for Intel's instruction set, e.g. Intel has the same opcode for
forthsamp({MOV, X| T| AX'| R| BX| })     and
forthsamp({MOV, X| F| BX'| R| AX|}).

To get a reminder of what instructions there are, type
forthcode({SHOW-OPCODES}) . If you are a bit familiar with the
opcodes you are almost there. If you want to know the
precise instruction format of e.g. forthcode({IMUL|AD,}) just
type forthsamp({SHOW: IMUL|AD,}) . You can also type
forthcode({SHOW-ALL,}) but that takes a lot of time and is more
intended for test purposes. The most useful of them all is forthcode({??})
that for a partially completed instruction shows all possible completions.

@node A7, A8, A6, Assembler
_VERBOSE_({

@section Details about the 80386 instructions

Read the introductory comment of forthfile({asgen.frt}) for how the assembler
keeps track of the state, using the forthcode({BI}) forthcode({BY}) forthcode({BA}) tallies.
forthenumerate
forthitem
A word ending in forthkey({,}) is an ``opcode'' and reserves place in the
dictionary.
It stands for one assembler instruction.
The start of the instruction is kept and there is a bitfield (the tally) for
all bits that belong to the instruction, if only mentally. These bits are
put as comment in front of the instruction and they are considered filled
in.
The opcode also determines the instruction length.
forthitem
A fixup mostly ends in forthkey({|}).
It forthcode({OR})s in some bits
in an already assembled instruction. Again there is a mask in front
of fixups and in using the fixup these bits are considered to be filled
in.
A fixup cannot touch data before the start of the latest instruction.
Some addressing mode fixups do not have forthkey({|}) in them.
This is in order to adhere more closely to conventions regarding those
addressing modes.
This much can be said. You can be sure that a word containing
forthkey({[}) and/or forthkey({]}) is a fixup, that it
is addressing mode related and that the addressing is indirect.
forthitem
Families can be constructed from instructions or fixups with the
same tally bit fields, provided the instructions differ by a fixed increment.
The tallies also contain information about data and addresses following.
These fields must be the same too.
forthitem
The part before a possible forthkey({|}) in an instruction, excluding an
optional trailing I, is the opcode. The same opcode indeed defines the same action.
forthitem
The part after forthkey({|}) in an instruction may be
considered a built in fixup where irregularity forbids to use a
real fixup. An X stands for xell or natural data width. This is
16 bit for a 16 bit assembler and 32 bit for a 32 bit
assembler. These can be overruled with forthcode({ AS:, }) dnl
applying to forthcode({DX|}) and forthcode({MEM|}) and with
forthcode({ OS:, }) applying to data required where there is
an I suffix.
The commaers always reveal their true width.
It is either forthcode({IW,}) or forthcode({IL,}) .
forthitem
Width fixups determine the data width : forthcode({X|})
(xell or natural data width 16/32 ) or forthcode({B|}) ( 8 bit) unless
implied.
Offset fixups determine the offset or address width : forthcode({XO|})
(xell or natural data width 16/32 ) or forthcode({BO|}) ( 8 bit) or forthcode({ZO|}) .
forthitem
Instructions ending in forthkey({I}) have an immediate data field after all
fixups. This can be either
forthcode({IB,}) forthcode({IW,}) forthcode({IL,}) forthcode({IQ,}) ( 8 16 32 64 bit).
If there are width fixups they should correspond with the data.
forthitem
Instructions ending in the forthsamp({|SG}) built-in fixup
(segments) require forthcode({SG,}) (which is always 16 bits).
For Xells in the presence of width overrules,
the programmer should carefully insert forthcode({W,}) or
forthcode({L,}) whatever appropriate.
forthitem
With r/m you can
have offsets (for forthcode({BO|}) and forthcode({XO|}) ) that
must be assembled using forthcode({B,}) or forthcode({L,}) but
mind the previous point.
forthitem
If an instruction with r/m has one register, it is always the target,
i.e. it is modified.
forthitem
Instructions with r/m can have a register instead of memory, indicated
by the normal fixups forthcode({AX|}) etc.
forthitem
If instructions with r/m have two registers, the second one is indicated
by a prime such as forthcode({AX'|}).
Stated differently, if an instruction can handle two general
registers, the one that cannot be replaced by a memory reference gets a prime.
forthitem
If forthcode({T|}) or forthcode({F|}) are present they apply to the
primed register.
forthcode({T|}) ``to'' means that the primed register is modified.
Absent those the primed register is the one that is modified, e.g.
in forthcode({LEA,}) .
forthitem
At the start of an instruction the mask of the previous instruction
plus fixup should add up non-overlappingly to a full field.
Offsets and immediate data should have been comma-ed in in order as required.
This is diagnosed in the great assembler.
forthitem
Instructions ending in forthsamp({ :, }) are prefixes and are considered in their own
right. They have no fixups.
forthitem
The Scaled Index Byte is handled internally in the following way:
The fixup forthcode({SIB|}) closes the previous instruction (i.e.
fill up its bit field), but possible immediate data and offsets are kept.
Then forthcode({SIB,}) starts a new instruction.
The user merely needs to use a fixup with an unbalanced opening square
bracket such as forthcode({[AX}), that handles this transparently.
forthitem The forthcode({SET,}) instruction unfortunately requires a duplicate of the
forthcode({O|}) etc. fixups of the forthcode({J,}) and forthcode({J|X,}) instructions,
called forthcode({O'|}) etc.
forthitem
Similarly,
some single byte instructions require forthcode({X'|}) and
forthcode({B'|}) instead of forthcode({X|}) and forthcode({B|}) dnl
that are used for the ubiquitous instructions with r/m.
(FIXME! This probably is remedied in the first release of ciasdis. )
forthendenumerate

This is the way the disassembler works.

forthenumerate
forthitem
Find the first instruction that agrees with the data at the
program counter. Tally the bits. The instruction's length follows from
the instruction, as does the presence of address offsets and immediate
data. In the current implementation the search follows dictionary links.
The dictionary must be organized such that the correct
instruction is found first.
If two instructions agree with the data,
in general the one that covers the most bits must be found first.
forthitem
Find the first fixup that agrees with untallied bits.
Note that opcode and previous fixups may have set bits in the
forthcode({BAD}) variable.
Any fixups that set a bit in forthcode({BAD}) that would
result in a conflict are not considered.
forthitem
If not all bits have been tallied, go to step 2, searching the dictionary
from where we left off.
forthitem
Disassemble the address offsets and immediate data, in accordance with
the instruction. Length is determined from fixups and prefix bytes.
The commaers that were used to assemble the data have an associated
execution token to disassemble the data.
This is used to advantage to change the representation from
program counter relative to absolute,
or look up and show the name for a label.
forthendenumerate
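
The four steps can be sketched in Python for a toy instruction set. The
tables below hold just two 8080 instructions; the fixup names and the
table layout are invented for this sketch and are not those of as80.frt.
The search order of the real dictionary is replaced by the order of the
lists.

```python
# (name, opcode bits, mask of the bits fixed by the opcode, data bytes)
OPCODES = [("MVI,", 0x06, 0xC7, 1), ("MOV,", 0x40, 0xC0, 0)]
REGS = "B C D E H L M A".split()
# (name, bits, mask): destination fixups use bits 3..5, source bits 0..2.
FIXUPS = [("%s'|" % r, n << 3, 0x38) for n, r in enumerate(REGS)]
FIXUPS += [("%s|" % r, n, 0x07) for n, r in enumerate(REGS)]

def disassemble(code, pc=0):
    """Return the text of one instruction and the next program counter."""
    byte = code[pc]
    # Step 1: the first opcode that agrees with the data; tally its bits.
    name, bits, tally, ndata = next(
        op for op in OPCODES if (byte & op[2]) == op[1])
    words = [name]
    # Steps 2 and 3: fixups that agree with the still untallied bits.
    for fname, fbits, fmask in FIXUPS:
        if not (fmask & tally) and (byte & fmask) == fbits:
            words.append(fname)
            tally |= fmask
    if tally != 0xFF:
        raise ValueError("bits left untallied")
    # Step 4: immediate data, as announced by the opcode.
    for b in code[pc + 1 : pc + 1 + ndata]:
        words.append("%02X IB," % b)
    return " ".join(words), pc + 1 + ndata

print(disassemble(bytes([0x78]))[0])          # MOV, A'| B|
print(disassemble(bytes([0x3E, 0x12]))[0])    # MVI, A'| 12 IB,
```

Note that the more specific opcode (the one that covers the most bits)
comes first in the list, as required in step 1.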
 })_END_({_VERBOSE_})

@node A8, A9, A7, Assembler
@section 16, 32 and 64 bits code and segments

The built-in assembler would be cumbersome to use
without the help of forthfile({ciasdis}), the great assembler.
Not only are the instructions checked as explained before;
from version 2.0.0 on, the interplay between segment size and
instruction size is checked as well.

In fixups forthcode({X}) is used to mean Xell, or the natural word length.
This is 16 bits for 16 bits segments, 32 bits for 32 bits segments and
64 bits for 64 bits segments.
Likewise in PostIt-FixUp forthcode({AX}) means Intel's forthcode({AX}) for 16 bits
segments, forthcode({EAX}) for 32 bits segments and forthcode({RAX}) for
64 bits segments.

The description of 16 or 32 bits in the Intel manuals is messy.
These are the rules.
forthenumerate
forthitem
In real mode all sizes are 16 bits.
forthitem
In protected mode the size of an address or Xell
offset agrees with the size of the code segment.
forthitem
In protected mode the size of an immediate data Xell agrees with
the size of the applicable data segment.
Mostly this is the data segment, but it may be the stack segment
or some extra segment in the presence of segment override prefixes.
forthitem
In all previous cases the code length can be swapped between
16 and 32 bits by a code length override prefix forthcode({OS:}),
the data length by a data length override prefix forthcode({AS:}).
forthendenumerate

The 16 bit indexing in a 32 bit assembler has separate fixups,
that all end in a forthkey({%})-sign.

In comma-ing, you must always select the proper one, commaers
contain either forthkey({C}), forthkey({W}), forthkey({L}) or
forthkey({Q}) for 1, 2, 4 or 8 byte widths.

After the directive forthcode({BITS-16}) code is generated for and checked
against 16 bit code and data segments.
After the directive forthcode({BITS-32}) code is generated for and checked
against 32 bit code and data segments.
After the directive forthcode({BITS-64}) code is generated for and checked
against 64 bit code and data segments.

In a 16 bits segment the following commaers must be used:
 forthcode({W,}) forthcode({IW,}) forthcode({(RW,)}) and forthcode({RW,}) .

In a 32 bits segment the following commaers must be used:
 forthcode({L,}) forthcode({IL,}) forthcode({(RL,)}) and forthcode({RL,}) .

In a 64 bits segment the following commaers must be used:
 forthcode({L,}) forthcode({IL,}) forthcode({(RL,)}) and
forthcode({RL,}) and occasionally forthcode({IQ,}).

The prefix forthcode({OS:}) switches the following opcode to use
forthcode({IL,}) instead of forthcode({IW,}) and vice versa.
Similarly the prefix forthcode({AS:}) switches between
forthcode({W,}) and forthcode({L,}), or between forthcode({RW,}) and
forthcode({RL,}).

While mixing modes,
whenever you get error messages and
you are sure you know better than the assembler,
put forthcode({!TALLY}) before the word that gives the error
messages.
This will forthdefi({override the error detection}).
Proper use of the BITS-xx directives makes this largely unnecessary,
but it can be needed if you use e.g. an extra segment forthcode({ES|})dnl
that is 16 bits in an otherwise 32 bits environment.

In 64 bits mode instructions that contain an immediate address
differ from 32 bits mode.
Those addresses are specified relative to the program counter, not absolute.
Consequently the forthcode({MEM|}) fixup leads to an error message, and
instead forthcode({REL|}) must be used with either a forthcode({RL,}) or
an forthcode({(RL,)}) commaer.
Absolute 64-bits addresses are nowhere present in the instruction set,
as they are not really useful.

The great assembler enforces all these rules.

AMD took advantage of the fact that Intel instructions are available
in a short and long form, e.g. forthcode({INC|X,}) and forthcode({INC, X|}).
The short form is hijacked, so forthcode({DEC|X, AX|}) becomes
forthcode({REX,}) .
All immediate data and offsets are sign-extended from 32 to 64 bits
in 64 bits code, with the rationale that full 64 bit is rarely useful.
The result is that 32 and 64 bit code looks the same.
In the rare case that a 64 bit value is needed, forthcode({MOVI|X,})  is
hijacked and replaced with forthcode({MOVI|Q,}) .
(Remember forthcode({MOVI, X|}) is a duplicate.)
So only instructions involving ghost registers representing
integers and memory storage
are different between 32 and 64 bits.
That is all, and a 64-bit assembler is practically accommodated in full.
Bottom line: the assembler built into forth.lab is adequate to assemble
the floating point wordset.

We need the 64-bit related prefix 0x48 to force the size to 64 bit in all
cases where a register is mentioned in the instruction.
Floating point instructions don't use regular registers and need not use
this prefix unless e.g. forthcode({[SP}) is used.
The three least significant bits in the 0x4#  switch the registers
(possible in three positions) to the ghost registers.
Such prefixes are present in forthfile({ciasdis}),
but in the lab file only forthcode({REX,}) is available.


There is more to say about forthdefi({ghost}) registers
in using forthfile({ciasdis}) itself.
They appear instead of the regular
registers, e.g. forthcode({AX}) is turned into forthcode({R8}) .
We make a distinction between instructions with possibly two
register operands, and the others. The first class is called modr/m
in the Intel and AMD literature.
A two operand instruction always has a primary register that has
a prime like forthcode({AX'|}) and forthcode({T|}) forthcode({F|}) apply
to that register.
(The other operand may be a register, or indirect such as
sib or memory address.)
If you learned the distinction and use of primed and unprimed registers
it is easy:

forthenumerate
forthitem
  forthcode({'}) applies to primed registers, turns forthcode({AX'|}) to forthcode({R8'|})
forthitem
forthcode({]}) applies to index registers in sib instructions, turns forthcode({AX']}) to forthcode({R8']})
forthitem
  forthcode({N}) bit applies to all other registers.
forthendenumerate

The remaining cases of register use are
forthenumerate
forthitem
     - unprimed register like forthcode({BX|}) .
forthitem
     - indirect like forthcode({[BX]}) .
forthitem
     - base register in sib like forthcode({[AX}) .
forthendenumerate

In summary we get
  forthcode({ Q: QN: Q]: QN]: Q': QN': Q']: QN']: }) for possible
 prefixes, that switch at the same time to 64 bits.
Similar prefixes are available with E , if you want the 32
  bit ghost registers.
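
How these names could map onto the prefix byte is sketched below in
Python. The function is hypothetical. The text above only states that
0x48 forces 64 bits and that the three least significant bits select
ghost registers; the assignment of the individual bits assumed here
follows the standard REX layout (0100WRXB) and the order in which the
prefixes are listed.

```python
def rex_prefix(name):
    """Map a name such as QN': to a prefix byte (assumed bit layout)."""
    flags = name.rstrip(":")
    byte = 0x40
    if flags[0] == "Q":
        byte |= 0x08          # W bit: 64 bit operand size
    elif flags[0] != "E":
        raise ValueError("prefix must start with Q or E")
    if "'" in flags:
        byte |= 0x04          # R bit: the primed register is a ghost
    if "]" in flags:
        byte |= 0x02          # X bit: the sib register marked with ]
    if "N" in flags:
        byte |= 0x01          # B bit: all other registers
    return byte

NAMES = "Q: QN: Q]: QN]: Q': QN': Q']: QN']:".split()
for name in NAMES:
    print("%-6s %02X" % (name, rex_prefix(name)))
```

Under these assumptions the eight names come out as 48 up to 4F, in the
order listed, and the corresponding E prefixes as 40 up to 47.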

Note that
an unprimed register cannot be combined with sib (scaled
indexing) in any way, which would signify conflict between
forthcode({AX|}) and forthcode({[AX}) .

Note that most assemblers would conflate
forthcode({MOV, MOVI, MOVI|X, }) etc,
instructions and would not allow for such an easy explanation.

@node A9, AA, A8, Assembler
@section The built in assembler

From within ciforth one can load an assembler from the installed LAB
library by the command forthcode({WANT ASSEMBLERi86 }).
Automatically a 32 bit assembler is loaded
if the Forth itself is 32/64 bits and a 16 bit assembler for the
16 bit forths.
This is a simplified version with no error checking and
no provisions for 16/32 bit mixing.
(Those are not needed, because you can mix with impunity.)
This assembler is now (since 5.0.0) fully compatible with the large
file-based one.

forthemph({Consequently you can take a debugged program and run it
through the LAB assembler. })

forthemph({The built in assembler has no error
checking.})

forthemph({IMPORTANT NOTE: The 5.169 version and later may contain
assembler code in the LAB file that has not yet been converted.
This code largely relates to a booting version;
it will be updated as soon as I have a booting version in a binary
form available.
})

@node AA, AB, A9, Assembler
_VERBOSE_({

@section A rant about redundancy

You could complain about redundancy in postit-fixup assemblers.
But there is an advantage to that: it helps detect invalid
combinations of instruction parts. They look bad at first
sight. What about forthbreak forthsamp({MOV, B| T| [BX+SI] R|
AX|}) forthbreak forthsamp({MOV,}) needs two operands but there
is no primary operand in sight. forthcode({[BX+SI]}) would not
qualify, and not even forthcode({BX|}) because the primary
operand should be marked with a prime. forthbreak
forthsamp({MOV, X| T| BX| AX|}) looks bad because you know
forthcode({BX|}) and forthcode({AX|}) work on the same bit
fields, so it is easy to remember you need the prime.
forthcode({T|}) and forthcode({F|}) refer to the primary
operands, so gone is the endless confusion about what is the
destination of the move. forthbreak forthsamp({MOV, X| T| BX'|
R| AL|}) looks bad, because forthcode({AL|}) could not
possibly qualify as an X register. forthbreak forthsamp({MOV,
X| T| BX'| AX|}) looks bad, because soon you will adopt the
habit that one of the 8 main registers always must be preceded
by forthcode({T|}) forthcode({F|}) or forthcode({R|}) .
forthbreak forthsamp({MOV, X| T| BX'| R| AX|}) looks right but
you still can code forthsamp({MOV, AX| BX'| R| T| X|}) if you
prefer your fixups in alphabetic order. (A nice rule for those
Code Standard Police out there?).

And yes forthsamp({ES: OS:
MOV, X| T| DI'| XO| [BP +8* AX] FFFFF800 L,}) though
being correct, and in a logical order, still looks bad, because
it forthemph({is}) bad in the sense that the Pentium design went
overboard in complication. (This example is from the built-in assembler;
the one in forthfile({asi386.frt}) redefines forthcode({[BP}) c.s.
to get rid of the forthcode({SIB|,}) instruction.)
forthbreak
First remark: let's assume this is
32 bit code (because otherwise there
would not be a forthcode({SIB,}), surely?).
forthbreak
There are 3 sizes involved :
forthitemize
forthitem
The size of the data transported: this is always the forthsamp({X}) as
in forthcode({X|}) .
Then the first forthcode({X|}) changes its meaning to 16 bit, because
of the forthcode({OS:}) prefix.
forthitem
The fixups related to address offsets forthcode({XO|}) and forthcode({L,}) must
agree and are 32 bits because you are in a 32 bits segment and this
was not overridden.
forthitem
The offset (in forthsamp({+AX]}) ) is counted in 64 bits.
Apparently, the forthsamp({DI}) is fetched from two cell records.
forthenditemize
And... by the way, the data is placed in the extra segment.
Add a bit of awareness of the cost of the instructions in execution time
and take care of the difference between the Pentium processors MMX and III
and what not and you will see that assembly programming is not for the faint
of heart. The forthsamp({ASSUME}) of the MASM assembler buys you
nothing but inconvenience.
 })_END_({_VERBOSE_})

@node AB, ABA, AA, Assembler
_VERBOSE_({
@section Reference opcodes, Intel 386

Table one contains all the opcodes used in forthfile({asi386.frt}) in alphabetic order,
with forthkey({|}) sorted before any letter.
The opcodes that lift the assembler to the level of the Pentium are listed separately
in table 3, in order not to make the tables overly long.
All opcodes on the first position are the same as Intel opcodes,
barring the bar.
Note that sometimes parts that are integrated in the opcodes in Intel
mnemonics are a separate fixup in the Postit-Fixup assembler.
Examples are the condition codes in jumps.

You can use it in two ways.

forthitemize
forthitem
You want the opcode for some known Intel opcode.
forthbreak
Look it up in the first column. One of the opcodes on that
line is what you want. To
pick the right one, consider the extensions that are explained
in table 2. Exception: forthsamp({PUSHI}) is not on the line with forthsamp({PUSH}) .
Sometimes you have to trim built in size designators, e.g. you
look up forthsamp({LODSW}) but you are stuck at forthcode({LODS}) , so that's it.
With forthsamp({ SHOW: LODS, }) you can see what the operands look like.
forthitem
You want to know what a POSTIT/FIXUP code does. Look it up in the table;
in the first word on the line you should recognize an Intel opcode. For example you have
forthcode({ CALLFAROI, })
That is at the line with forthcode({CALL,}) . So the
combination of operands for forthcode({CALLFAROI,}) are to be
found in the description for forthsamp({CALL}) in the Intel
manuals.
forthenditemize

Note. Some things are ugly. forthcode({LDS,}) should be
forthcode({L|DS,}) . I would replace forthcode({MOV|FA,}) by
forthcode({STA,}) and forthcode({MOV|TA,}) by forthcode({LDA, }) . But
that would make the cross referencing more problematic. Note. The
meanings of the operands for forthsamp({JMP}) and forthsamp({JMPFAR})
are totally different. So my suffixes are different.

Table 1. Opcode cross reference.

@table @var
forthitem AAA,
forthitem AAD,
forthitem AAM,
forthitem AAS,
forthitem ADC, ADCI, ADCI|A, ADCSI,
forthitem ADD, ADDI, ADDI|A, ADDSI,
forthitem AND, ANDI, ANDI|A, ANDSI,
forthitem ARPL,
forthitem AS:,
forthitem BOUND,
forthitem BSF,
forthitem BSR,
forthitem BT, BTI,
forthitem BTC, BTCI,
forthitem BTR, BTRI,
forthitem BTS, BTSI,
forthitem CALL, CALLFAR, CALLFAROI, CALLO,
forthitem CBW,
forthitem CLC,
forthitem CLD,
forthitem CLI,
forthitem CLTS,
forthitem CMC,
forthitem CMP, CMPI, CMPI|A,
forthitem CMPS, CMPSI,
forthitem CPUID,
forthitem CS:,
forthitem CWD,
forthitem DAA,
forthitem DAS,
forthitem DEC, DEC|X,
forthitem DIV|AD,
forthitem DS:,
forthitem ENTER,
forthitem ES:,
forthitem FS:,
forthitem GS:,
forthitem HLT,
forthitem IDIV|AD,
forthitem IMUL, IMUL|AD, IMULI, IMULSI,
forthitem INC, INC|X,
forthitem INS,
forthitem INT, INT3, INTO,
forthitem IN|D, IN|P,
forthitem IRET,
forthitem J, J|X, (Intel Jcc)
forthitem JCXZ,
forthitem JMP, {JMPFAR,} JMPFAROI, JMPO, JMPS,
forthitem LAHF,
forthitem LAR,
forthitem LDS,
forthitem LEA,
forthitem LEAVE,
forthitem LES,
forthitem LFS,
forthitem LGDT,
forthitem LGS,
forthitem LIDT,
forthitem LLDT,
forthitem LMSW,
forthitem LOCK,
forthitem LODS,
forthitem LOOP, LOOPNZ, LOOPZ,
forthitem LSL,
forthitem LSS,
forthitem LTR,
forthitem MOV, MOV|CD, MOV|FA, MOV|SG, MOV|TA,
forthitem MOVI, MOVI|B, MOVI|X,
forthitem MOVS,
forthitem MOVSX|B, MOVSX|W,
forthitem MOVZX|B, MOVZX|W,
forthitem MUL|AD,
forthitem NEG,
forthitem NOT,
forthitem OR, ORI, ORI|A, ORSI,
forthitem OS:,
forthitem OUTS,
forthitem OUT|D, OUT|P,
forthitem POP, POP|ALL, POP|DS, POP|ES, POP|FS, POP|GS, POP|SS, POP|X,
forthitem POPF,
forthitem PUSH, PUSH|ALL, PUSH|CS, PUSH|DS, PUSH|ES, PUSH|FS, PUSH|GS, PUSH|SS, PUSH|X,
forthitem PUSHF,
forthitem PUSHI|B, PUSHI|X,
forthitem RCL, RCLI,
forthitem RCR, RCRI,
forthitem REPNZ,
forthitem REPZ,
forthitem RET+, RET, RETFAR+, RETFAR,
forthitem ROL, ROLI,
forthitem ROR, RORI,
forthitem SAHF,
forthitem SAR, SARI,
forthitem SBB, SBBI, SBBI|A, SBBSI,
forthitem SCAS,
forthitem SET,   (Intel SETcc)
forthitem SGDT,
forthitem SHL, SHLI,
forthitem SHLD|C, SHLDI,
forthitem SHR, SHRI,
forthitem SHRD|C, SHRDI,
forthitem SIDT,
forthitem SLDT,
forthitem SMSW,
forthitem SS:,
forthitem STC,
forthitem STD,
forthitem STI,
forthitem STOS,
forthitem STR,
forthitem SUB, SUBI, SUBI|A, SUBSI,
forthitem TEST, TESTI, TESTI|A,
forthitem VERR,
forthitem VERW,
forthitem WAIT,
forthitem XCHG,
forthitem XCHG|AX,
forthitem XLAT,
forthitem XOR, XORI, XORI|A, XORSI,
forthitem ~SIB,
@end table

 Table 2 Suffixes, not separated by a forthkey({|})
@table @var
forthitem I       : Immediate operand
forthitem SI      : Sign extended immediate operand
forthitem FAR     : Far (sometimes combined with OI)
forthitem O       : Operand
forthitem OI      : Operand indirect
@end table

 })_END_({_VERBOSE_})


@node ABA, AC, AB, Assembler
_VERBOSE_({
@section Reference opcodes, Pentium only.

Table three contains all the opcodes present in forthfile({asipentium.frt})
in alphabetic order,
with forthkey({|}) sorted before any letter.
All opcodes on the first position are the same as Intel opcodes,
barring the bar.
Note that again sometimes parts that are integrated in the opcodes in Intel
mnemonics are a separate fixup in the Postit-Fixup assembler.

You can use it in the same way as the Intel 386 table.
But there are far fewer instances where the opcodes do not agree exactly with
Intel's.
Memory operands are specified in the same way for floating point
instructions.
But in those instructions
register operands are always floating point registers.


There is at most one register specified in a floating point
instruction.
For a two-register operation forthcode({ST0}) is always implicit.
In that case it is normally the first operand, as per forthsamp({ST0-ST1}).
forthsamp({a|}) (abnormal operation) means forthcode({ST0})
is the second operand as per forthsamp({ST1-ST0}).
Also normally forthcode({ST0}) gets the result.
forthsamp({m|}) (modified) means that the explicit register gets modified
instead.

And don't forget! forthsamp({SHOW: <opcode>}) is your friend.

Table 3. Opcode cross reference. Pentium-only.

@table @var
forthitem BSWAP,
forthitem CMPXCHG,
forthitem CMPXCHG8B,
forthitem F2XM1,
forthitem FABS,
forthitem FADD,
forthitem FADDP,
forthitem FBLD,
forthitem FBSTP,
forthitem FCHS,
forthitem FCLEX,
forthitem FCOM,
forthitem FCOMP,
forthitem FCOMPP,
forthitem FCOS,
forthitem FDECSTP,
forthitem FDIV,
forthitem FDIVP,
forthitem FFREE,
forthitem FIADD,
forthitem FICOM,
forthitem FICOMP,
forthitem FIDIV,
forthitem FILD, FILD|64,
forthitem FIMUL,
forthitem FINCSTP,
forthitem FINIT,
forthitem FIST,
forthitem FISTP, FISTP|64,
forthitem FISUB,
forthitem FLD, FLD|e,
forthitem FLD1,
forthitem FLDCW,
forthitem FLDENV,
forthitem FLDL2E,
forthitem FLDL2T,
forthitem FLDLG2,
forthitem FLDLN2,
forthitem FLDPI,
forthitem FLDZ,
forthitem FMUL,
forthitem FMULP,
forthitem FNOP,
forthitem FPATAN,
forthitem FPREM,
forthitem FPREM1,
forthitem FPTAN,
forthitem FRNDINT,
forthitem FRSTOR,
forthitem FSAVE,
forthitem FSCALE,
forthitem FSIN,
forthitem FSINCOS,
forthitem FSQRT,
forthitem FST, FST|u,
forthitem FSTCW,
forthitem FSTENV,
forthitem FSTP, FSTP|e, FSTP|u,
forthitem FSTSW,
forthitem FSTSW|AX,
forthitem FSUB,
forthitem FSUBP,
forthitem FTST,
forthitem FUCOM,
forthitem FUCOMP,
forthitem FUCOMPP,
forthitem FXAM,
forthitem FXCH,
forthitem FXTRACT,
forthitem FYL2X,
forthitem FYL2XP1,
forthitem INVD,
forthitem INVLPG,
forthitem Illegal-1,
forthitem Illegal-2,
forthitem RDMSR,
forthitem RDTSC,
forthitem RSM,
forthitem WBINVD,
forthitem WRMSR,
forthitem XADD,
@end table

The fixups for floating point are in lower case to make
some distinction with the regular instructions.
There is one fixup that conflicts with an uppercase
fixup: forthdefi({n|}) .

Table 4. Fixups and their meanings, Pentium-only.
@table @var
forthitem ST0|    : Register name
forthitem ST1|
forthitem ST2|
forthitem ST3|
forthitem ST4|
forthitem ST5|
forthitem ST6|
forthitem ST7|
forthitem s|      : Single (32 bit)
forthitem d|      : Double (64 bit)
forthitem m|      : Explicit register is modified
forthitem u|      : Explicit register is unmodified, result to ST0
forthitem n|      : ST0 is first operand (normal)
forthitem a|      : ST0 is second operand (abnormal)
forthitem |16     : Int width in memory.
forthitem |32     : Int width in memory.
@end table

 })_END_({_VERBOSE_})


@node AC, AE, ABA, Assembler
@section The dreaded SIB byte

If you ask for the operands of a memory instruction (one of the
simple ones is LGDT,) then, instead of all the forthdefi({scaled index byte})
(forthdefi({SIB})) possibilities, you see
forthsamp({LGDT, BO| ~SIB| 14 SIB,, 18, B,})
This loads the global descriptor table from an address
described by a sib-byte of 14 and an offset of 18.

The forthsamp({~SIB| 14 SIB,,}) may be replaced by any sib-specification of
the kind forthsamp({[AX +2* SI]}).
You can ask for a reminder of the 256 possibilities by
forthsamp({SHOW: ~SIB,})

The SIB constituents are not normal fixups.
They must always appear between the normal fixups and
the commaers, and the first must be the base register,
the one with opening bracket,
such as forthcode({[AX}).

Error-prone as that may seem, the great assembler only accepts
correct instructions.
Instructions are verbose, but they are hard to misinterpret.

 Table 5 SIB-byte fixups.
@table @var
forthitem [AX   : Base register
forthitem [CX   : Base register
forthitem [DX   : Base register
forthitem [BX   : Base register
forthitem [SP   : Base register
forthitem [BP   : Base register
forthitem [MEM  : Base memory
forthitem [SI   : Base register
forthitem [DI   : Base register
forthitem +1*   : Scale by 1 byte.
forthitem +2*   : Scale by 2 bytes.
forthitem +4*   : Scale by 4 bytes.
forthitem +8*   : Scale by 8 bytes.
forthitem AX]   : Scaled index
forthitem CX]   : Scaled index
forthitem DX]   : Scaled index
forthitem BX]   : Scaled index
forthitem 0]    : No index
forthitem BP]   : Scaled index
forthitem SI]   : Scaled index
forthitem DI]   : Scaled index

@end table
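
As an illustration only, the following Python sketch shows how the three
kinds of fixups in the table above could map onto the fields of the SIB byte
as Intel lays it out: two bits of scale, three bits of scaled index and
three bits of base.
It is not code from ciasdis.
The names REGISTERS, NO_INDEX, register_number and sib_byte are made up
for this example, and the register numbering is assumed to be the standard
Intel one, which agrees with the order of the table.

```python
# Illustration only, not part of ciasdis: pack a SIB byte the Intel way.
# REGISTERS, register_number and sib_byte are hypothetical names.
REGISTERS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
NO_INDEX = 4   # value of the middle field that means "no index", cf. 0]

def register_number(name):
    """Return the 3 bit Intel number of a register name."""
    for number, candidate in enumerate(REGISTERS):
        if candidate == name:
            return number
    raise ValueError("unknown register: " + name)

def sib_byte(base, scale=1, scaled=None):
    """Return the SIB byte for base + scale * scaled."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    scale_field = scale.bit_length() - 1        # 1,2,4,8 -> 0,1,2,3
    if scaled is None:
        middle_field = NO_INDEX
    elif scaled == "SP":
        raise ValueError("SP cannot be a scaled index")
    else:
        middle_field = register_number(scaled)
    return (scale_field << 6) | (middle_field << 3) | register_number(base)

print(hex(sib_byte("AX", 2, "SI")))   # the [AX +2* SI] example
```

For the specification forthsamp({[AX +2* SI]}) this sketch prints 0x70.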

_VERBOSE_({
For the curious:

Explanation of
forthsamp({LGDT, BO| ~SIB| 10 SIB,, 14, B,})
This way of specifying a sib-byte
would be perfectly legal, had I not hidden those words.
It shows what is going on: the instruction is completed by ~SIB|
telling the assembler that a comma-er forthcode({SIB,,}) is required.

Instead of the comma-er we use a forthcode({~SIB,}) instruction.
This specifies in fact a one byte opcode with three fields,
exemplified by forthsamp({[AX +2* SI]}) (and
again you might say forthsamp({+2* SI] [AX}) with the same meaning).
At the same time it is a comma-er in the sense that it reports
that the demand for a sib-commaer is fulfilled.

Many subtleties are involved to get right the error detection and the
disassembly.
 })_END_({_VERBOSE_})

@node AE, AF, AC, Assembler
_VERBOSE_({
@section An incomplete and irregular guide to the instruction mnemonics.

The following is an attempted overview of the suffixes and fixups
used. It may be of some help for using the assembler because it gives
some idea of some of the names.
It doesn't contain all mnemonics (you have to consult an Intel manual
anyway), just a few of them that I find hard to remember.

It also doesn't contain all fixups, only those that are particularly
hard or irregular.
Neither does it contain fixups that are
part of a forthdefi({SIB}) byte (treated elsewhere).

So beware!

Note that some of the instructions are Pentium-only and as yet
not present in forthfile({asi386.frt}).

Be careful with fixups that end in a % (such as forthcode({ [BP+IS]%})).
They are to be used in incidental 16 bits code, so in 16 bits code segments
or for instructions preceded by an address size override prefix.

The forthdefi({primed registers}) have a prime after the register name
such as forthcode({AX'|}) , compared to forthcode({AX|}).
Some opcodes allow two operands and then always one of them is a
primed register.
Whether the primed register is a source or destination is explicitly
covered by forthcode({T|}) and forthcode({F|}) ,
forthemph({not}) by any order in which the operands appear.

The primed conditions such as forthcode({Z'|}) have a different
reason.
Those cannot be the same as the unprimed ones,
because they occur at a different place in the opcode,
though I would prefer them to be.

Some instructions
forthbreak
CPUID: CPU Identification
forthbreak
L :   Load Full Pointer
forthbreak
LLDT: Load Local Descriptor Table Register
forthbreak
LGDT: Load Global Descriptor Table Register
forthbreak
LIDT: Load Interrupt Descriptor Table Register
forthbreak
LTR:  Load Task Register
forthbreak
LMSW: Load Machine Status Word
forthbreak
RDTSC: Read from Time Stamp Counter
forthbreak
RDMSR: Read from Model Specific Register
forthbreak
SHLD: Double Precision Shift Left
forthbreak
SHRD: Double Precision Shift Right
forthbreak
SLDT: Store Local Descriptor Table Register
forthbreak
SMSW: Store Machine Status Word
forthbreak
VERR: Verify a Segment for Reading or Writing
forthbreak
WRMSR: Write to Model Specific Register
forthbreak
forthbreak
Suffixes of the opcode, i.e. part of the opcode word.
forthbreak
|ALL : All
forthbreak
|CD : Control/Debug register
forthbreak
|FS : Replaces FS| in irregular opcodes.
forthbreak
|GS : Replaces GS| in irregular opcodes.
forthbreak
|AD : Implicit A and Double result.
forthbreak
|C  : Implicit C (count)
forthbreak
forthbreak
Items in Fixups.
forthbreak
Y| : Yes, Use the condition straight
forthbreak
N| : No, Use the condition inverted
forthbreak
O| : Overflow
forthbreak
C| : Carry
forthbreak
Z| : Zero
forthbreak
CZ| : C || Z (unsigned <= )
forthbreak
S| : Sign ( <0 )
forthbreak
P| : Parity (even)
forthbreak
L| :  S != O (signed < )
forthbreak
LE| : L || Z (signed <= )
forthbreak
T| : To (primed or special register)
forthbreak
F| : From (primed or special register)
forthbreak
V| : Variable number (in shifts)
forthbreak
1| : Just shift by 1.
forthbreak
ZO| : Zero Offset
forthbreak
BO| : Byte Offset
forthbreak
XO| : Xell Offset
forthbreak
Items in Commaers.
Note that in commaers, there is never an forthkey({X}).
You always have to choose between forthkey({W}) for 16 bits
and forthkey({L}) for 32 bits or forthkey({Q}) for 64 bits.
forthbreak
OW,    Obligatory word
forthbreak
(RL,)  Cell relative to IP
forthbreak
(RW,)  Cell relative to IP
forthbreak
(RB,)  Byte relative to IP
forthbreak
SG,   Segment: word
forthbreak
P,    Port number : byte
forthbreak
IS,   Single obligatory  byte
forthbreak
IL,   immediate data : cell
forthbreak
IW,   immediate data : cell
forthbreak
IB,   immediate data : byte
forthbreak
L,    address/offset data : cell
forthbreak
W,    address/offset data : cell
forthbreak
B,    address/offset data : byte
forthbreak
SIB,, Scaled index byte, an instruction within an instruction
forthbreak
OB, : Obligatory byte
forthbreak
OW, : Obligatory word (=16bits)
forthbreak

There are also forthcode({RB, RW, RL,}) based on
forthcode({(RB,) (RW,) (RL,)}).
They comma in an amount relative to the program counter
based on an absolute address,
such that you can use labels.
These are used preferably, and are made to appear
in the disassemblies.
Otherwise no labels could appear in disassemblies.
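
The arithmetic behind such a relative commaer can be sketched in Python.
This is an illustration only, not code from ciasdis.
The name relative_displacement is made up, and the sketch assumes that the
operand is the last part of the instruction, so that the displacement counts
from the address just past the operand.

```python
# Illustration only, not part of ciasdis: turn an absolute target
# address into a displacement relative to the end of the operand.
# relative_displacement is a hypothetical name for this sketch.

def relative_displacement(target, operand_address, size):
    """Displacement to store in a size-byte operand at operand_address.

    Assumes the operand is the last part of the instruction, so the
    program counter points just past it when the jump is taken.
    """
    next_instruction = operand_address + size
    displacement = target - next_instruction
    limit = 1 << (8 * size - 1)
    if not -limit <= displacement < limit:
        raise ValueError("target out of range for this operand size")
    return displacement

# A short backward jump: opcode at hex 100, one byte operand at hex 101,
# jumping back to hex 0F0.
print(relative_displacement(0x0F0, 0x101, 1))
```

The example prints -18, which is what would have to be commaed in as a byte.
A target that does not fit in the operand size raises an error instead of
being truncated silently.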

 })_END_({_VERBOSE_})

@node AF, A2, AE, Assembler
@section  Assembler Errors

Errors are identified by a number. They are globally unique, so
assembler error numbers do not overlap with other ciforth error numbers,
or errors returned from operating system calls.
_VERBOSE_({Of course the error numbers are given in decimal, always.})

The errors whose message starts with forthsamp({AS:}) are used by the PostIt FixUp assembler
in the file forthfile({asgen.frt}). forthxref({Errors}) for other errors.

forthitemize
forthitem
forthsamp({ciforth ERROR # 26 : AS: PREVIOUS INSTRUCTION INCOMPLETE})

You left holes in the instruction before the current one, i.e.
one or more fixups like forthcode({X|}) are missing. Or you forgot
to supply data required by the opcode like forthcode({OW,}) .
_VERBOSE_({With forthcode({??}) you can see what completions of your opcode
are possible.})

forthitem
forthsamp({ciforth ERROR # 27 : AS: INSTRUCTION PROHIBITED IRREGULARLY})

The instruction you try to assemble would have been legal, if Intel
had not made an exception just for this combination. This situation
is handled by special code, to issue just this error.
(This is rare, most situations are handled by bad bits, resulting
in different errors.)

forthitem
forthsamp({ciforth ERROR # 28 : AS: UNEXPECTED FIXUP/COMMAER})

You try to complete an opcode by fixups (like forthcode({X|}))
or comma-ers (like forthcode({OW,}) ) in a way that conflicts
with what you specified earlier. So the fixup/comma-er word at
which this error is detected conflicts with either the opcode,
or one of the other fixups/comma-ers.
dnl FIXME This explanation is the same as the following.
_VERBOSE_({For example specifying both a forthcode({SI'|}) and a forthcode({DI'|}) operand
for a forthcode({LEA,}) opcode.})

forthitem
 forthsamp({ciforth ERROR # 29 : AS: DUPLICATE FIXUP/UNEXPECTED COMMAER})

You try to complete an opcode by fixups (like forthcode({X|})  dnl
) or comma-ers (like forthcode({OW,}) ) in a way that conflicts
with what you specified earlier. So the fixup/comma-er word at
which this error is detected conflicts with either the opcode,
or one of the other fixups/comma-ers.
dnl FIXME This explanation is the same as the previous.
_VERBOSE_({For example forthcode({B|}) (byte size) with a forthcode({LEA,}) opcode.})

forthitem
forthsamp({ciforth ERROR # 30 : AS: COMMAERS IN WRONG ORDER})

The opcode requires more than one data item to be comma-ed in, such as
immediate data and an address. However you put them in the wrong order.
Use forthcode({SHOW:}) .

forthitem
 forthsamp({ciforth ERROR # 31 : AS: DESIGN ERROR, INCOMPATIBLE MASK})

This signals an internal inconsistency in the assembler itself.
If you are using an assembler supplied with ciforth, you can report
this as a defect (``bug'').
The remainder of this explanation is intended for the writers
of assemblers.
The bits that are filled in by an assembler word are outside
of the area where it is supposed to fill bits in. The latter
are specified separately by a mask.

forthitem
 forthsamp({ciforth ERROR # 32 : AS: PREVIOUS OPCODE PLUS FIXUPS INCONSISTENT})

The total instruction with opcode, fixups and data is ``bad''.
Somewhere there are parts that are conflicting. This may be another
one of the irregularities of the Intel instruction set. Or the
forthcode({BAD}) data was preset with bits to indicate that you
want to prohibit this instruction on this processor, because it
is not implemented. Investigate forthcode({BAD}) for two consecutive bits
that are up, and inspect the meaning of each of the two bits.
forthenditemize
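
The mask check behind error 31 and the inspection of forthcode({BAD})
suggested for error 32 can be sketched in Python.
This is an illustration only, not code from forthfile({asgen.frt}).
The names fits_mask and adjacent_set_bits are made up.
Whether the real assembler pairs up only particular bit positions is not
stated here, so the second function simply reports every pair of
neighbouring bits that are both up.

```python
# Illustration only, not part of ciasdis. All names are hypothetical.

def fits_mask(bits, mask):
    """True if every bit set in bits lies inside mask."""
    return bits & ~mask == 0

def adjacent_set_bits(bad):
    """Positions n where bits n and n+1 of bad are both set."""
    both = bad & (bad >> 1)
    positions = []
    n = 0
    while both:
        if both & 1:
            positions.append(n)
        both >>= 1
        n += 1
    return positions

print(fits_mask(0x18, 0x38))        # bits stay inside the mask
print(fits_mask(0x48, 0x38))        # bit 6 is outside the mask
print(adjacent_set_bits(0b0110))    # bits 1 and 2 are both up
```

The three lines print True, False and [1] respectively.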