dnl $ Id: $
dnl Copyright{2000,2022}: Albert van der Horst, HCC FIG Holland by GNU Public License
The forthfile({ciasdis}) assembler is described in this manual,
because the assembler that is in the lab file is compatible.
The idea is that you test code with ciasdis relying on comprehensive
consistency checking.
Then you can use the assembler in the blocks and the real
debugging can begin.
This chapter is about the assembler itself;
the information about which registers are used in ciforth is
contained in its assembler source.
@menu
* A2:: Introduction
* A3:: Reliability
* A4:: Principle of operation
* A5:: The 8080 assembler
* A6:: Opcode sheets
* A7:: Details about the 80386 instructions
* A8:: 16, 32 and 64 bits code and segments
* A9:: Difference with the built in assembler
* AA:: A rant about redundancy
* AB:: Reference opcodes Intel 386
* ABA:: Reference opcodes Pentium-only
* AC:: The dreaded SIB byte
* AE:: An incomplete and irregular guide to the instruction mnemonics.
* AF:: Assembler Errors
@end menu
@node A2, A3, AF, Assembler
@section Introduction
Via forthurl({http://home.hccnet.nl/a.w.m.van.der.horst/forthassembler.html})
you can find a couple of assemblers, to complement the generic
ciforth system.
The assemblers are not part of the thisforth
package, and must be fetched separately.
They are based on the postit/fixup principle, an original and novel
design to accommodate reverse engineering.
The assembler that is present in the blocks
is code compatible, but is less sophisticated,
especially as regards error detection.
This assembler is automatically loaded in its 16 or 32 bit form,
such that it is appropriate for adding small code definitions to the
system at hand.
The background information given here applies equally to that assembler.
_BOOTED_({{On this stand alone version of ciforth, you only
have the assembler in blocks, which is not documented
separately.}})
A useful technique is to develop code _BOOTED_({{in a hosted system,}})
using the full assembler.
Then, with code that at least contains only valid instructions, enter
the debugging phase with the assembler from the library.
forthbreak
_BITS64_({The assembler is usable for 64 bit lina.
The assembler in forthfile({forth.lab}) is without reserve
usable for instructions
that do not require a prefix. Furthermore
the forthcode({REX,}) instruction
that makes the operand size 64 bits is provided, plus
a 64 bit version of forthcode({NEXT,}).
These additions are sufficient to make the floating point library
assemble under 64 bits as floating point instructions do not require
a prefix, nor does the instruction forthcode({LEA,}).})
The following files comprise the great assembler.
forthbreak
forthfile({ass.frt}) : the 80-line 8086 assembler (no error detection), a prototype.
forthbreak
forthfile({as6809s.frt}) : a small 6809 assembler (no error detection).
forthbreak
forthfile({asgen.frt}) : generic part of postit/fixup assembler
forthbreak
forthfile({as80.frt}) : 8080 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi86.frt}) : 8086 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi386.frt}) : 80386 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({aspentium.frt}) : general Pentium non-386 instructions, requires forthfile({asgen.frt})
forthbreak
forthfile({asalpha.frt}) : DEC Alpha assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({asi6809.frt}) : 6809 assembler, requires forthfile({asgen.frt})
forthbreak
forthfile({ps.frt}) : generate opcode sheets
forthbreak
forthfile({p0.asi386.ps}) : first byte opcode for asi386 assembler
forthbreak
forthfile({p0F.asi386.ps}) : two byte opcode for same that start with 0F.
forthbreak
forthfile({test.mak}) : makefile, e.g. with targets for opcode sheets.
forthbreak
The relevant assembler present in
_BITS16_({forth.lab is equivalent to asgen.frt plus asi86.frt})
_BITS32_({forth.lab is equivalent to asgen.frt plus asi386.frt plus aspentium.frt})
but without error detection.
The forthfile({asi386.frt}) (containing the full 80386 instruction set) is in
many respects non-compliant with Intel syntax. The instruction
mnemonics are redesigned for the benefit of reverse engineering.
There is a one to one correspondence between mnemonics and
machine instructions. In principle this would require a
monumental amount of documentation, comparable to parts of
Intel's architecture manuals. Not to mention the amount of work
to check this. I circumvent this. Opcode sheets for this
assembler are generated by tools automatically, and you can ask
interactively how a particular instruction can be completed.
This is a viable alternative to using manuals, if not more
practical. (Of course someone has to write up the descriptions;
I am happy Intel has done that.)
So look at my opcode sheets. If you think an instruction would be
what you want, use forthcode({SHOW:}) to find out how it is
to be completed.
If you are at all familiar with the instruction set,
most of the time you can understand what your options are.
If not, compare with an Intel opcode sheet, and look up the instruction
that sits in the same place. If you don't understand them, you can still
experiment in a Forth to find out.
The assembler in the Library Addressable
by Blocks (block file) lacks the advanced features of disassembly,
completion and error detection.
It is intended for incidental use, to speed up a crucial word.
But the code is fully compatible,
so you can develop using the full assembler.
@node A3, A4, A2, Assembler
@section Reliability
I skimped on write up. I didn't skimp on testing.
All full assemblers, like
forthfile({asi386.frt}) and forthfile({aspentium.frt}),
are tested in this way:
forthenumerate
forthitem All instructions are generated.
(Because this uses the same
mechanism as checking during entry, it is most unlikely that you will get an
instruction assembled that is not in this set.)
forthitem They are assembled.
forthitem They are disassembled again and compared with
the original code, which must be the same.
forthitem They are disassembled by a different tool (e.g. GNU's objdump),
and the output is compared with 3. This has been done manually,
just once.
forthendenumerate
This leaves room for a defect of the following type:
A valid instruction is rejected or has been totally overlooked.
But opcode maps reveal their Terra Incognita relentlessly. So I
am quite confident to promise a bottle of good Irish whiskey to
the first one to come up with a defect in this assembler.
The full set of instructions, with all operand combinations, sits
in a file for reference. This is all barring the 256-way forthsamp({SIB})
construction and prefixes, or combinations thereof. Including those would
push this approach beyond the practical.
Straightforward generation of all instructions
is also not
practical for the Alpha with 32K register combinations per
instruction. This is solved by defining ``interesting'' registers
that are used as examples
and leaving out opcode-operand combinations with uninteresting
registers.
@node A4, A5, A3, Assembler
@section Principle of operation
In making an assembler for the Pentium it turns out that
the in-between step of creating defining words for each type
of assembly gets in the way. There are just too many of them.
MASM heavily overloads its instructions, in particular forthsamp({MOV}) .
Once I used to criticise Intel because they had an unpleasant to use
instruction set with forthsamp({MOV}) forthsamp({MVR}) and forthsamp({MVI}) for move instructions.
In hindsight I find the use of different opcodes correct.
(I mean they are really different instructions, it might have been
better if they weren't. But an assembler must live up to the truth.)
Where the Intel folks really go overboard is with the disambiguation of
essentially ambiguous constructs, by things such as forthsamp({OFFSET}) forthsamp({BYTE PTR})
forthsamp({ASSUME}) . You can no longer find out what the instruction means by itself.
forthbreak
A simple example to illustrate this problem is
forthexample({ INC [BX]})
Are we to increment the byte or the word at BX?
Intel's solution is forthsamp({INC BYTE PTR [BX]}).
The INC instruction in this (the mod/rm) incarnation has
a size bit. Here we require that this bit be filled in
explicitly, either with forthsamp({ X| }) or forthsamp({ B| }).
Failing to do so is a fatal error.
This results in the rule:
if an instruction doesn't determine the operand
size (some do, like forthcode({LEA,}) ),
then a size fixup is needed: forthsamp({ X| }) or forthsamp({ B| }) .
forthbreak
In this assembler this looks like
forthexample({ INC, B| ZO| [BX] })
This completely unambiguously determines the actual machine code.
These are the phases in which this assembler handles an instruction:
forthitemize
forthitem
POSTIT phase:
forthcode({MOV,}) assembles a two byte instruction with holes.
forthitem
FIXUP phase:
forthcode({X|}) or forthcode({B|}) fits in one of the holes left.
Other fixups determine registers and addressing mode.
forthitem
COMMA phase:
First check whether the fixups have filled up all holes.
Then add addresses (or offsets) and/or immediate data,
using e.g. forthcode({IL,}) or forthcode({L,})
forthitem
Check whether all commaers, requested either by postit's or fixup's
are present.
This check is actually executed by the next postit
prior to assembling, or by forthcode({END-CODE}).
forthenditemize
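The bookkeeping behind these phases can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the code of forthfile({asgen.frt}): the class, its fields and the masks are hypothetical, and only the resulting bytes follow the standard x86 encoding of a register to register move.

```python
# Toy model of the postit / fixup / comma bookkeeping (not the asgen.frt code).
# An instruction under construction is a bit pattern plus a mask of open holes.

class Instruction:
    def __init__(self, name, bits, holes, commaers):
        self.name = name
        self.bits = bits               # bits fixed so far, opcode byte lowest
        self.holes = holes             # bits still to be supplied by fixups
        self.pending = list(commaers)  # data items still to be comma-ed in
        self.data = []

    def fixup(self, name, bits, mask):
        """FIXUP phase: OR bits into holes; refuse to touch filled bits."""
        if mask & ~self.holes:
            raise ValueError(name + " overlaps bits that are already filled")
        self.bits |= bits
        self.holes &= ~mask
        return self

    def comma(self, kind, value):
        """COMMA phase: all holes must be filled before data is appended."""
        if self.holes:
            raise ValueError("holes left in " + self.name)
        if not self.pending or self.pending[0] != kind:
            raise ValueError("unexpected commaer " + kind)
        self.pending.pop(0)
        self.data.append(value)
        return self

    def finish(self):
        """The check that the next postit or END-CODE would perform."""
        if self.holes or self.pending:
            raise ValueError(self.name + " is incomplete")
        return self.bits


# POSTIT: a two byte MOV with the direction bit, the size bit and the
# whole second byte left open.
mov = Instruction("MOV,", 0x0088, 0xFF03, [])
mov.fixup("X|", 0x0001, 0x0001).fixup("T|", 0x0002, 0x0002)
mov.fixup("AX'|", 0x0000, 0x3800).fixup("R|", 0xC000, 0xC000)
mov.fixup("BX|", 0x0300, 0x0700)
encoded = mov.finish().to_bytes(2, "little")
print(encoded.hex())
```

Because every fixup carries its own mask, the order of the fixups does not matter, and two fixups that aim at the same field are caught at once.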
Doesn't this system lay a burden on the programmer? Yes.
He has to know exactly what he is doing.
But assembly programming is dancing on a rope. The Intel syntax tries
to hide from you where the rope is. A bad idea. There is no such thing as
assembly programming for dummies.
An advantage is that you are more
aware of what instructions are there.
Because you see the duplicates.
Now if you are serious, you have to study the forthfile({asgen.frt}) and
forthfile({as80.frt}) sources.
You had better get your feet wet with forthfile({as80.frt})
before you attack the Pentium.
forthsamp({SIB}) is handled as an instruction within an instruction,
clever, but hard to understand.
It deviates somewhat from the phases explained here.
Another invention in this assembler is the forthdefi({family of instructions}).
Assembler instructions are grouped into families with identical fixups, and
an increment for the opcodes.
These are defined as a group by a single execution of a defining word.
For each group there is one opportunity to get the opcode wrong;
formerly that was for each opcode.
@node A5, A6, A4, Assembler
@section The 8080 assembler
The 8080 assembler takes no less space than Cassady's.
(In the end the postit-fixup makes the Pentium assembler
more compact, but not the 8080.)
But... The regularities are much more apparent.
It is much more difficult to make a mistake with the
code for the forthsamp({ADD}) and forthsamp({ADI}) instructions.
This principle makes it possible to write a disassembler that is independent
of the instruction information, one that will work for the 8086.
A typical family are the 8 immediate-operand instructions, with an
increment of 08.
forthexample({ 08 C6 8 1FAMILY, ADI ACI SUI SBI ANI XRI ORI CPI })
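What such a family definition amounts to can be shown with a small Python sketch. The function name and its argument order (increment, base, count, names) are chosen here to mirror the example line; this is an illustration of the arithmetic, not the definition of forthsamp({1FAMILY,}).

```python
# Sketch of what a family definition computes: one base opcode, one increment.
# Mirrors the arguments of the example above: increment 08, base C6, count 8.

def family(increment, base, count, names):
    """Pair each of count consecutive opcodes with a mnemonic."""
    opcodes = [base + i * increment for i in range(count)]
    return list(zip(names.split(), opcodes))


immediates = family(0x08, 0xC6, 8, "ADI ACI SUI SBI ANI XRI ORI CPI")
for name, opcode in immediates:
    print(name, "%02X" % opcode)
```

The only numbers a human has to get right are the base and the increment; the other seven opcodes follow.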
The bottom line is : the assembler proper now takes 22 lines of code.
Furthermore the ``call conditional'' and ``return conditional''
instructions were missing. This became apparent as soon as I printed the
opcode sheets.
For me this means turning ``jump conditional'' into a family.
@node A6, A7, A5, Assembler
@section Opcode sheets
The makefile for the assembler project contains facilities to
generate opcode sheets directly from the instruction sets,
such as forthfile({asi386.ps}).
For the opcode sheets featuring an n-byte prefix you must
pass the forthsamp({PREFIX}) to make and a forthsamp({MASK}) that covers the prefix and
the byte opcode, e.g. forthsamp({make asi386.ps MASK=FFFF PREFIX=0F}) .
The opcode sheets forthfile({p0.asi386.ps}) and forthfile({p0F.asi386.ps}) are
already part of the distribution and can be
printed on a PostScript printer or viewed with e.g. forthsamp({gv}).
Compare the opcode sheets with Intel's to get an overview of what I have done
to the instruction set. In essence I have re-engineered it to make it reverse
assemblable, i.e. from a disassembly you can regenerate the machine code.
This is forthemph({not}) true for Intel's instruction set, e.g. Intel has the same opcode for
forthsamp({MOV, X| T| AX'| R| BX| }) and
forthsamp({MOV, X| F| BX'| R| AX|}).
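The two byte sequences behind this pair can be computed with a short Python sketch. It assumes only the standard x86 layout (opcode 100010dw followed by the mod, reg and r/m fields); the helper name and the register table are made up for the illustration.

```python
# Two machine encodings for what Intel syntax writes as one line, MOV AX,BX.
# Standard x86 layout: opcode 100010dw, then mod(2) reg(3) r/m(3).

REG = dict(AX=0, CX=1, DX=2, BX=3)   # register numbers in reg and r/m fields


def mov_reg_reg(primed, to_primed, other):
    """Encode a xell-sized MOV between the primed register and another one."""
    direction = 0x02 if to_primed else 0x00          # T| versus F|
    opcode = 0x88 | direction | 0x01                 # 0x01 is the size bit (X|)
    modrm = 0xC0 | (REG[primed] << 3) | REG[other]   # 0xC0: register mode (R|)
    return bytes([opcode, modrm])


to_ax = mov_reg_reg("AX", True, "BX")      # MOV, X| T| AX'| R| BX|
from_bx = mov_reg_reg("BX", False, "AX")   # MOV, X| F| BX'| R| AX|
print(to_ax.hex(), from_bx.hex())
```

Both results move BX into AX, yet the bytes differ. A disassembler that prints only the Intel mnemonic loses the information needed to regenerate the original bytes.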
To get a reminder of what instructions there are, type
forthcode({SHOW-OPCODES}) . If you are a bit familiar with the
opcodes you are almost there. If you want to know the
precise instruction format of e.g. forthcode({IMUL|AD,}) just
type forthsamp({SHOW: IMUL|AD,}) . You can also type
forthcode({SHOW-ALL,}) but that takes a lot of time and is more
intended for test purposes. The most useful of them all is forthcode({??})
that for a partially completed instruction shows all possible completions.
@node A7, A8, A6, Assembler
_VERBOSE_({
@section Details about the 80386 instructions
Read the introductory comment of forthfile({asgen.frt}) for how the assembler
keeps track of the state, using the forthcode({BI}) forthcode({BY}) forthcode({BA}) tallies.
forthenumerate
forthitem
A word ending in forthkey({,}) is an ``opcode'' and reserves place in the
dictionary.
It stands for one assembler instruction.
The start of the instruction is kept and there is a bitfield (the tally) for
all bits that belong to the instruction, if only mentally. These bits are
put as comment in front of the instruction and they are considered filled
in.
The opcode also determines the instruction length.
forthitem
A fixup mostly ends in forthkey({|}).
It forthcode({OR})s in some bits
in an already assembled instruction. Again there is a mask in front
of fixups and in using the fixup these bits are considered to be filled
in.
A fixup cannot touch data before the start of the latest instruction.
Some addressing mode fixups do not have forthkey({|}) in them.
This is in order to adhere more closely to conventions regarding those
addressing modes.
This much can be said: you can be sure that a word containing
forthkey({[}) and/or forthkey({]}) is a fixup, that it
is addressing mode related and that the addressing is indirect.
forthitem
Families can be constructed from instructions or fixups with the
same tally bit fields, provided the instructions differ by a fixed increment.
The tallies also contain information about data and addresses following.
These fields must be the same too.
forthitem
The part before a possible forthkey({|}) in an instruction -- but excluding an
optional trailing I -- is the opcode. Instructions sharing an opcode indeed perform the same action.
forthitem
The part after forthkey({|}) in an instruction may be
considered a built in fixup where irregularity forbids the use of a
real fixup. An X stands for xell or natural data width. This is
16 bit for a 16 bit assembler and 32 bit for a 32 bit
assembler. These can be overruled with forthcode({ AS:, }) dnl
applying to forthcode({DX|}) and forthcode({MEM|}) and with
forthcode({ OS:, }) applying to data required where there is
an I suffix.
The commaers always reveal their true width.
It is either forthcode({IW,}) or forthcode({IL,}) .
forthitem
Width fixups determine the data width : forthcode({X|})
(xell or natural data width 16/32 ) or forthcode({B|}) ( 8 bit) unless
implied.
Offset fixups determine the offset or address width : forthcode({XO|})
(xell or natural data width 16/32 ) or forthcode({BO|}) ( 8 bit) or forthcode({ZO|}) .
forthitem
Instructions ending in forthkey({I}) have an immediate data field after all
fixups. This can be either
forthcode({IB,}) forthcode({IW,}) forthcode({IL,}) forthcode({IQ,}) ( 8 16 32 64 bit).
If there are width fixups they should correspond with the data.
forthitem
Instructions ending in the forthsamp({|SG}) built-in fixup
(segments) require forthcode({SG,}) (which is always 16 bits).
For Xells in the presence of width overrules,
the programmer should carefully insert forthcode({W,}) or
forthcode({L,}), whichever is appropriate.
forthitem
With r/m you can
have offsets (for forthcode({BO|}) and forthcode({XO|}) ) that
must be assembled using forthcode({B,}) or forthcode({L,}) but
mind the previous point.
forthitem
If an instruction with r/m has one register, it is always the target,
i.e. it is modified.
forthitem
Instructions with r/m can have a register instead of memory, indicated
by the normal fixups forthcode({AX|}) etc.
forthitem
If instructions with r/m have two registers, the second one is indicated
by a prime such as forthcode({AX'|}).
Stated differently, if an instruction can handle two general
registers, the one that cannot be replaced by a memory reference gets a prime.
forthitem
If forthcode({T|}) or forthcode({F|}) are present they apply to the
primed register.
forthcode({T|}) ``to'' means that the primed register is modified.
Absent those, the primed register is the one that is modified, e.g.
in forthcode({LEA,}) .
forthitem
At the start of an instruction the mask of the previous instruction
plus fixup should add up non-overlappingly to a full field.
Offsets and immediate data should have been comma-ed in, in the order required.
This is diagnosed in the great assembler.
forthitem
Instructions ending in forthsamp({ :, }) are prefixes and are considered in their own
right. They have no fixups.
forthitem
The Scaled Index Byte is handled internally in the following way:
The fixup forthcode({SIB|}) closes the previous instruction (i.e.
fills up its bit field), but possible immediate data and offsets are kept.
Then forthcode({SIB,}) starts a new instruction.
The user merely needs to use a fixup with an unbalanced opening square
bracket such as forthcode({[AX}), that handles this transparently.
forthitem The forthcode({SET,}) instruction unfortunately requires a duplicate of the
forthcode({O|}) etc. fixups of the forthcode({J,}) and forthcode({J|X,}) instructions,
called forthcode({O'|}) etc.
forthitem
Similarly,
some single byte instructions require forthcode({X'|}) and
forthcode({B'|}) instead of forthcode({X|}) and forthcode({B|}) dnl
that are used for the ubiquitous instructions with r/m.
(FIXME! This probably is remedied in the first release of ciasdis. )
forthendenumerate
This is the way the disassembler works.
forthenumerate
forthitem
Find the first instruction that agrees with the data at the
program counter. Tally the bits. The instruction's length follows from
the instruction, as does the presence of address offsets and immediate
data. In the current implementation the search follows dictionary links.
The dictionary must be organized such that the correct
instruction is found first.
If two instructions agree with the data,
in general the one that covers the most bits must be found first.
forthitem
Find the first fixup that agrees with untallied bits.
Note that opcode and previous fixups may have set bits in the
forthcode({BAD}) variable.
Any fixups that set a bit in forthcode({BAD}) that would
result in a conflict are not considered.
forthitem
If not all bits have been tallied, go to 2, searching the dictionary
from where we left off.
forthitem
Disassemble the address offsets and immediate data, in accordance with
the instruction. Length is determined from fixups and prefix bytes.
The commaers that were used to assemble the data have an associated
execution token to disassemble the data.
This is used to advantage to change the representation from
program counter relative to absolute,
or look up and show the name for a label.
forthendenumerate
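The first three steps can be tried out with a toy table-driven disassembler in Python. This is a sketch under my own assumptions: the tables hold a single instruction and a handful of fixups, the masks are mine, and the forthcode({BAD}) mechanism and the commaers of step 4 are left out.

```python
# Toy table-driven disassembler for one instruction (illustrative tables only).
# Each entry is (name, bits, mask) over a 16 bit word, opcode byte lowest.

OPCODES = [("MOV,", 0x0088, 0x00FC)]
FIXUPS = [
    ("B|", 0x0000, 0x0001), ("X|", 0x0001, 0x0001),
    ("F|", 0x0000, 0x0002), ("T|", 0x0002, 0x0002),
    ("AX'|", 0x0000, 0x3800), ("BX'|", 0x1800, 0x3800),
    ("R|", 0xC000, 0xC000),
    ("AX|", 0x0000, 0x0700), ("BX|", 0x0300, 0x0700),
]


def disassemble(code):
    word = int.from_bytes(code[:2], "little")
    # Step 1: first opcode that agrees with the data; tally its bits.
    for name, bits, mask in OPCODES:
        if (word & mask) == bits:
            parts, tallied = [name], mask
            break
    else:
        raise ValueError("no instruction matches")
    # Steps 2 and 3: one pass over the fixups, skipping tallied bits.
    for name, bits, mask in FIXUPS:
        if not (mask & tallied) and (word & mask) == bits:
            parts.append(name)
            tallied |= mask
    if tallied != 0xFFFF:
        raise ValueError("bits left untallied")
    return " ".join(parts)


print(disassemble(bytes([0x8B, 0xC3])))
print(disassemble(bytes([0x89, 0xD8])))
```

Note that the loop knows nothing about the instruction set; everything specific sits in the tables, which is why the order of the table entries matters.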
})_END_({_VERBOSE_})
@node A8, A9, A7, Assembler
@section 16, 32 and 64 bits code and segments
The built-in assembler would be cumbersome to use
without the help of forthfile({ciasdis}), the great assembler.
Not only are the instructions checked as explained before;
from version 2.0.0 on, the interplay between segment size and
instruction size is checked as well.
In fixups forthcode({X}) is used to mean Xell, or the natural word length.
This is 16 bits for 16 bits segments, 32 bits for 32 bits segments and
64 bits for 64 bits segments.
Likewise in PostIt-FixUp forthcode({AX}) means Intel's forthcode({AX}) for 16 bits
segments, forthcode({EAX}) for 32 bits segments and forthcode({RAX}) for
64 bits segments.
The description of 16 or 32 bits in the Intel manuals is messy.
These are the rules.
forthenumerate
forthitem
In real mode all sizes are 16 bits.
forthitem
In protected mode the size of an address or Xell
offset agrees with the size of the code segment.
forthitem
In protected mode the size of an immediate data Xell agrees with
the size of the applicable data segment.
Mostly this is the data segment, but it may be the stack segment
or some extra segment in the presence of segment override prefixes.
forthitem
In all previous cases the code length can be swapped between
16 and 32 bits by a code length override prefix forthcode({AS:}),
the data length by a data length override prefix forthcode({OS:}).
forthendenumerate
The 16 bit indexing in a 32 bit assembler has separate fixups,
that all end in a forthkey({%})-sign.
In comma-ing, you must always select the proper one, commaers
contain either forthkey({C}), forthkey({W}), forthkey({L}) or
forthkey({Q}) for 1, 2, 4 or 8 byte widths.
After the directive forthcode({BITS-16}) code is generated for and checked
against 16 bit code and data segments.
After the directive forthcode({BITS-32}) code is generated for and checked
against 32 bit code and data segments.
After the directive forthcode({BITS-64}) code is generated for and checked
against 64 bit code and data segments.
In a 16 bit segment the following commaers must be used:
forthcode({W,}) forthcode({IW,}) forthcode({(RW,)}) and forthcode({RW,}) .
In a 32 bit segment the following commaers must be used:
forthcode({L,}) forthcode({IL,}) forthcode({(RL,)}) and forthcode({RL,}) .
In a 64 bit segment the following commaers must be used:
forthcode({L,}) forthcode({IL,}) forthcode({(RL,)}) and
forthcode({RL,}) and occasionally forthcode({IQ,}).
The prefix forthcode({OS:}) switches the following opcode to use
forthcode({IL,}) instead of forthcode({IW,}) and vice versa.
Similarly the prefix forthcode({AS:}) switches between
forthcode({W,}) and forthcode({L,}), or between forthcode({RW,}) and
forthcode({RL,}).
While mixing modes,
whenever you get error messages and
you are sure you know better than the assembler,
put forthcode({!TALLY}) before the word that gives the error
messages.
This will forthdefi({override the error detection}).
Proper use of the BITS-xx directives makes this largely unnecessary,
but it can be needed if you use e.g. an extra segment forthcode({ES|}) dnl
that is 16 bits in an otherwise 32 bits environment.
In 64 bits mode instructions that contain an immediate address
differ from 32 bits mode.
Those addresses are specified relative to the program counter, not absolute.
Consequently the forthcode({MEM|}) fixup leads to an error message, and
instead forthcode({REL|}) must be used with either a forthcode({RL,}) or
an forthcode({(RL,)}) commaer.
Absolute 64-bits addresses are nowhere present in the instruction set,
as they are not really useful.
The great assembler enforces all these rules.
AMD took advantage of the fact that Intel instructions are available
in a short and long form, e.g. forthcode({INC|X,}) and forthcode({INC, X|}).
The short form is hijacked, so forthcode({DEC|X, AX|}) becomes
forthcode({REX,}) .
All immediate data and offsets are sign-extended from 32 to 64 bits
in 64 bits code, with the rationale that full 64 bit is rarely useful.
The result is that 32 and 64 bit code look the same.
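For readers who want to see what the processor does with such a 32 bit field, here is a minimal Python sketch of the sign extension. The function name is made up; the arithmetic is the usual two's complement rule.

```python
# Sign extension of a 32 bit immediate or offset to 64 bits.

def sign_extend_32_to_64(value):
    """Interpret the low 32 bits as signed and widen to a 64 bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:          # sign bit set: fill the upper half with ones
        value |= 0xFFFFFFFF00000000
    return value


print("%016X" % sign_extend_32_to_64(0xFFFFF800))
print("%016X" % sign_extend_32_to_64(0x00000800))
```

A small negative offset thus stays a small negative offset, which is why the 32 bit commaers suffice in nearly all 64 bit code.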
In the rare case that a 64 bit value is needed, forthcode({MOVI|X,}) is
hijacked and replaced with forthcode({MOVI|Q,}) .
(Remember forthcode({MOVI, X|}) is a duplicate.)
So only instructions involving ghost registers representing
integers and memory storage
are different between 32 and 64 bits.
That is all, and a 64-bit assembler is practically accommodated in full.
Bottom line: the assembler built into forth.lab is adequate to assemble
the floating point wordset.
We need the 64-bit related prefix 0x48 to force the size to 64 bit in all
cases where a register is mentioned in the instruction.
Floating point instructions don't use regular registers and need not use
this prefix unless e.g. forthcode({[SP}) is used.
The three least significant bits in the 0x4# switch the registers
(possible in three positions) to the ghost registers.
Such prefixes are present in forthfile({ciasdis}),
but in the lab file only forthcode({REX,}) is available.
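The layout of these prefix bytes can be written down in a few lines of Python. The sketch follows the standard AMD64 definition of the prefix (binary 0100WRXB); the function name is made up, and no claim is made here about which forthfile({ciasdis}) prefix word corresponds to which byte.

```python
# The 0x4# prefix bytes of 64 bit mode: binary 0100WRXB.

def rex_prefix(w=0, r=0, x=0, b=0):
    """Build the prefix byte from its four flag bits.

    w: operand size becomes 64 bits
    r: extends the reg field of the mod/rm byte
    x: extends the index field of the sib byte
    b: extends the r/m field, or the base field of the sib byte
    """
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


print("%02X" % rex_prefix(w=1))                  # size only
print("%02X" % rex_prefix(w=1, r=1, x=1, b=1))   # size plus all three register bits
```

With only the size bit set the result is 0x48, the byte mentioned above; in 32 bit code that same byte is the short form of DEC on the A register.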
There is more to say about forthdefi({ghost}) registers
in using forthfile({ciasdis}) itself.
They appear instead of the regular
registers, e.g. forthcode({AX}) is turned into forthcode({R8}) .
We make a distinction between instructions with possibly two
register operands, and the others. The first class is called modr/m
in Intel and AMD literature.
A two operand instruction always has a primary register that has
a prime like forthcode({AX'|}) and forthcode({T|}) forthcode({F|}) apply
to that register.
(The other operand may be a register, or indirect such as
sib or memory address.)
If you learned the distinction and use of primed and unprimed registers
it is easy:
forthenumerate
forthitem
forthcode({'}) applies to primed registers, turns forthcode({AX'|}) to forthcode({R8'|})
forthitem
forthcode({]}) applies to index registers in sib-instructions, turns forthcode({AX']}) to forthcode({R8']})
forthitem
forthcode({N}) bit applies to all other registers:
forthendenumerate
The remaining cases of register use are
forthenumerate
forthitem
unprimed register like forthcode({BX|}) .
forthitem
indirect like forthcode({[BX]}) .
forthitem
base register in sib like forthcode({[AX}) .
forthendenumerate
In summary we get
forthcode({ Q: QN: Q]: QN]: Q': QN': Q']: QN']: }) for possible
prefixes, that switch at the same time to 64 bits.
Similar prefixes are available with E , if you want the 32
bit ghost registers.
Note that
an unprimed register cannot be combined with sib (scaled
indexing) in any way, which would signify conflict between
forthcode({AX|}) and forthcode({[AX}) .
Note that most assemblers would conflate
forthcode({MOV, MOVI, MOVI|X, }) etc,
instructions and would not allow for such an easy explanation.
@node A9, AA, A8, Assembler
@section The built in assembler
From within ciforth one can load an assembler from the installed LAB
library by the command forthcode({WANT ASSEMBLERi86 }).
Automatically a 32 bit assembler is loaded
if the Forth itself is 32/64 bits and a 16 bit assembler for the
16 bit forths.
This is a simplified version with no error checking and
no provisions for 16/32 bit mixing.
(Those are not needed, because you can mix with impunity.)
This assembler is now (since 5.0.0) fully compatible with the large
file-based one.
forthemph({Consequently you can take a debugged program and run it
through the LAB assembler. })
forthemph({The built in assembler has no error
checking.})
forthemph({IMPORTANT NOTE: The 5.169 version and later may contain
assembler code in the LAB file that has not yet been converted.
This code largely relates to a booting version;
it will be updated as soon as I have a booting version in a binary
form available.
})
@node AA, AB, A9, Assembler
_VERBOSE_({
@section A rant about redundancy
You could complain about redundancy in postit-fixup assemblers.
But there is an advantage to that: it helps detect invalid
combinations of instruction parts. They look bad at first
sight. What about
forthbreak
forthsamp({MOV, B| T| [BX+SI] R| AX|})
forthbreak
forthsamp({MOV,}) needs two operands but there
is no primary operand in sight. forthcode({[BX+SI]}) would not
qualify, and not even forthcode({BX|}) because the primary
operand should be marked with a prime.
forthbreak
forthsamp({MOV, X| T| BX| AX|}) looks bad because you know
forthcode({BX|}) and forthcode({AX|}) work on the same bit
fields, so it is easy to remember you need the prime.
forthcode({T|}) and forthcode({F|}) refer to the primary
operands, so gone is the endless confusion about what is the
destination of the move.
forthbreak
forthsamp({MOV, X| T| BX'| R| AL|}) looks bad, because forthcode({AL|}) could not
possibly qualify as an X register.
forthbreak
forthsamp({MOV, X| T| BX'| AX|}) looks bad, because soon you will adopt the
habit that one of the 8 main registers always must be preceded
with forthcode({T|}) forthcode({F|}) or forthcode({R|}) .
forthbreak forthsamp({MOV, X| T| BX'| R| AX|}) looks right but
you still can code forthsamp({MOV, AX| BX'| R| T| X|}) if you
prefer your fixups in alphabetic order. (A nice rule for those
Code Standard Police out there?).
And yes forthsamp({ES: OS:
MOV, X| T| DI'| XO| [BP +8* AX] FFFFF800 L,}) , though
being correct and in a logical order, still looks bad, because
it forthemph({is}) bad in the sense that the Pentium design went
overboard in complication. (This example is from the built-in assembler;
the one in forthfile({asi386.frt}) redefines forthcode({[BP}) and friends
to get rid of the forthcode({SIB|,}) instruction.)
forthbreak
First remark: let's assume this is
32 bit code (because otherwise there
would not be a forthcode({SIB,}) , surely?).
forthbreak
There are 3 sizes involved:
forthitemize
forthitem
The size of the data transported. This is always the forthsamp({X}) as
in forthcode({X|}) .
Here the first forthcode({X|}) changes its meaning to 16 bit, because
of the forthcode({OS:}) prefix.
forthitem
The size of the address offsets. The fixups forthcode({XO|}) and forthcode({L,}) must
agree and are 32 bits, because you are in a 32 bit segment and this
was not overridden.
forthitem
The size of the indexed items. The offset (in forthsamp({+8* AX]}) ) is counted in units of 64 bits.
Apparently, the forthsamp({DI}) is fetched from two cell records.
forthenditemize
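As a rough illustration of the address arithmetic in this example, here is a
minimal Python sketch. It only models the offset calculation for
forthsamp({[BP +8* AX] FFFFF800 L,}) ; the function name and the sample register
values are made up for the illustration, and segment handling is left out.

```python
MASK32 = 0xFFFFFFFF

def effective_offset(bp, ax, disp32=0xFFFFF800, scale=8):
    # Hypothetical helper, not part of ciforth: offset = BP + scale*AX + disp.
    # The 32 bit displacement is read as a signed number: FFFFF800 is -2048.
    if disp32 & 0x80000000:
        disp = disp32 - 0x100000000
    else:
        disp = disp32
    # The result wraps around at 32 bits, as in a 32 bit segment.
    return (bp + scale * ax + disp) & MASK32

# Sample values, chosen only for the illustration.
print(hex(effective_offset(0x10000, 3)))   # 0x10000 + 24 - 2048, prints 0xf818
```

The scale factor of 8 is what makes the index count in units of 64 bits.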
And, by the way, the data resides in the extra segment.
Add a bit of awareness of the cost of the instructions in execution time,
take care of the differences between the Pentium processors (MMX, III
and what not), and you will see that assembly programming is not for the faint
of heart. The forthsamp({ASSUME}) of the MASM assembler buys you
nothing but inconvenience.
})_END_({_VERBOSE_})
@node AB, ABA, AA, Assembler
_VERBOSE_({
@section Reference opcodes, Intel 386
Table one contains all the opcodes used in forthfile({asi386.frt}) in alphabetic order,
with forthkey({|}) sorted after any letter (plain ASCII order).
The opcodes that lift the assembler to the level of the Pentium are listed separately
in table 3, in order not to make the tables overly long.
All opcodes in the first position are the same as Intel opcodes,
barring the bar.
Note that sometimes parts that are integrated in the opcodes in Intel
mnemonics are a separate fixup in the Postit-Fixup assembler.
Examples are the condition codes in jumps.
You can use the table in two ways.
forthitemize
forthitem
You want the opcode for some known Intel opcode.
forthbreak
Look it up in the first column. One of the opcodes on that
line is what you want. To
pick the right one, consider the extensions that are explained
in table 2. Exception: forthsamp({PUSHI}) is not on the line with forthsamp({PUSH}) .
Sometimes you have to trim built in size designators, e.g. you
look up forthsamp({LODSW}) but only find forthcode({LODS,}) , so that is the one.
With forthsamp({ SHOW: LODS, }) you can see what the operands look like.
forthitem
You want to know what a postit/fixup code does. Look it up in the table;
in the first word on the line you should recognize an Intel opcode. For example you have
forthcode({ CALLFAROI, }) .
That is on the line with forthcode({CALL,}) . So the
combination of operands for forthcode({CALLFAROI,}) is to be
found in the description for forthsamp({CALL}) in the Intel
manuals.
forthenditemize
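The second way of using the table can be sketched in a few lines of Python.
This is only an illustration: the rows are copied from Table 1, but the helper
name is hypothetical and nothing like it is part of ciforth.

```python
# A few rows copied from Table 1. The first word on each row is the
# Intel opcode, barring the bar.
TABLE_1_EXCERPT = [
    "CALL, CALLFAR, CALLFAROI, CALLO,",
    "DIV|AD,",
    "LODS,",
    "MOV, MOV|CD, MOV|FA, MOV|SG, MOV|TA,",
]

def intel_opcode_for(postit, rows=TABLE_1_EXCERPT):
    # Hypothetical helper: find the row that lists the postit code and
    # return the Intel opcode to look up in the Intel manuals.
    for row in rows:
        words = row.split()
        if postit in words:
            first = words[0].rstrip(",")   # drop the trailing comma
            return first.split("|")[0]     # drop the bar and what follows
    return None                            # not in this excerpt

print(intel_opcode_for("CALLFAROI,"))   # prints CALL
print(intel_opcode_for("DIV|AD,"))      # prints DIV
```

The lookup is by whole word, so forthcode({CALLO,}) and forthcode({CALLFAROI,})
both lead to the same Intel description.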
Note. Some things are ugly. forthcode({LDS,}) should be
forthcode({L|DS,}) . I would replace forthcode({MOV|FA,}) by
forthcode({STA,}) and forthcode({MOV|TA,}) by forthcode({LDA, }) . But
that would make the cross referencing more problematic. Note. The
meanings of the operands for forthsamp({JMP}) and forthsamp({JMPFAR})
are totally different. So my suffixes are different.
Table 1. Opcode cross reference.
@table @var
forthitem AAA,
forthitem AAD,
forthitem AAM,
forthitem AAS,
forthitem ADC, ADCI, ADCI|A, ADCSI,
forthitem ADD, ADDI, ADDI|A, ADDSI,
forthitem AND, ANDI, ANDI|A, ANDSI,
forthitem ARPL,
forthitem AS:,
forthitem BOUND,
forthitem BSF,
forthitem BSR,
forthitem BT, BTI,
forthitem BTC, BTCI,
forthitem BTR, BTRI,
forthitem BTS, BTSI,
forthitem CALL, CALLFAR, CALLFAROI, CALLO,
forthitem CBW,
forthitem CLC,
forthitem CLD,
forthitem CLI,
forthitem CLTS,
forthitem CMC,
forthitem CMP, CMPI, CMPI|A,
forthitem CMPS, CMPSI,
forthitem CPUID,
forthitem CS:,
forthitem CWD,
forthitem DAA,
forthitem DAS,
forthitem DEC, DEC|X,
forthitem DIV|AD,
forthitem DS:,
forthitem ENTER,
forthitem ES:,
forthitem FS:,
forthitem GS:,
forthitem HLT,
forthitem IDIV|AD,
forthitem IMUL, IMUL|AD, IMULI, IMULSI,
forthitem INC, INC|X,
forthitem INS,
forthitem INT, INT3, INTO,
forthitem IN|D, IN|P,
forthitem IRET,
forthitem J, J|X, (Intel Jcc)
forthitem JCXZ,
forthitem JMP, {JMPFAR,} JMPFAROI, JMPO, JMPS,
forthitem LAHF,
forthitem LAR,
forthitem LDS,
forthitem LEA,
forthitem LEAVE,
forthitem LES,
forthitem LFS,
forthitem LGDT,
forthitem LGS,
forthitem LIDT,
forthitem LLDT,
forthitem LMSW,
forthitem LOCK,
forthitem LODS,
forthitem LOOP, LOOPNZ, LOOPZ,
forthitem LSL,
forthitem LSS,
forthitem LTR,
forthitem MOV, MOV|CD, MOV|FA, MOV|SG, MOV|TA,
forthitem MOVI, MOVI|B, MOVI|X,
forthitem MOVS,
forthitem MOVSX|B, MOVSX|W,
forthitem MOVZX|B, MOVZX|W,
forthitem MUL|AD,
forthitem NEG,
forthitem NOT,
forthitem OR, ORI, ORI|A, ORSI,
forthitem OS:,
forthitem OUTS,
forthitem OUT|D, OUT|P,
forthitem POP, POP|ALL, POP|DS, POP|ES, POP|FS, POP|GS, POP|SS, POP|X,
forthitem POPF,
forthitem PUSH, PUSH|ALL, PUSH|CS, PUSH|DS, PUSH|ES, PUSH|FS, PUSH|GS, PUSH|SS, PUSH|X,
forthitem PUSHF,
forthitem PUSHI|B, PUSHI|X,
forthitem RCL, RCLI,
forthitem RCR, RCRI,
forthitem REPNZ,
forthitem REPZ,
forthitem RET+, RET, RETFAR+, RETFAR,
forthitem ROL, ROLI,
forthitem ROR, RORI,
forthitem SAHF,
forthitem SAR, SARI,
forthitem SBB, SBBI, SBBI|A, SBBSI,
forthitem SCAS,
forthitem SET, (Intel SETcc)
forthitem SGDT,
forthitem SHL, SHLI,
forthitem SHLD|C, SHLDI,
forthitem SHR, SHRI,
forthitem SHRD|C, SHRDI,
forthitem SIDT,
forthitem SLDT,
forthitem SMSW,
forthitem SS:,
forthitem STC,
forthitem STD,
forthitem STI,
forthitem STOS,
forthitem STR,
forthitem SUB, SUBI, SUBI|A, SUBSI,
forthitem TEST, TESTI, TESTI|A,
forthitem VERR,
forthitem VERW,
forthitem WAIT,
forthitem XCHG,
forthitem XCHG|AX,
forthitem XLAT,
forthitem XOR, XORI, XORI|A, XORSI,
forthitem ~SIB,
@end table
Table 2. Suffixes, not separated by a forthkey({|})
@table @var
forthitem I : Immediate operand
forthitem SI : Sign extended immediate operand
forthitem FAR : Far (sometimes combined with OI)
forthitem O : Operand
forthitem OI : Operand indirect
@end table
})_END_({_VERBOSE_})
@node ABA, AC, AB, Assembler
_VERBOSE_({
@section Reference opcodes, Pentium only.
Table three contains all the opcodes present in forthfile({asipentium.frt})
in alphabetic order,
with forthkey({|}) sorted after any letter (plain ASCII order).
All opcodes on the first position are the same as Intel opcodes,
barring the bar.
Note that again sometimes parts that are integrated in the opcodes in Intel
mnemonics are a separate fixup in the Postit-Fixup assembler.
You can use it in the same way as the Intel 386 table.
But there are far fewer instances where the opcodes do not agree exactly with
Intel's.
Memory operands are specified in the same way for floating point
instructions.
But in those instructions
register operands are always floating point registers.
There is at most one register specified in a floating point
instruction.
For two register operations forthcode({ST0}) is always implicit.
Normally it is the first operand, as in forthsamp({ST0-ST1}).
forthsamp({a|}) (abnormal operation) means forthcode({ST0})
is the second operand, as in forthsamp({ST1-ST0}).
Also, normally forthcode({ST0}) gets the result.
forthsamp({m|}) (modified) means that the explicit register gets modified
instead.
And don't forget! forthsamp({SHOW: <opcode>}) is your friend.
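The operand conventions above can be modelled in Python, taking subtraction
as the example. This is a sketch of the prose only, under the assumption that
forthsamp({a|}) and forthsamp({m|}) act independently; the function and its
arguments are hypothetical and say nothing about the actual encoding.

```python
def fsub_model(st, i, abnormal=False, modified=False):
    # st is a list of floats modelling the register stack, st[0] being ST0.
    # i is the index of the one explicit register.
    if abnormal:
        first, second = st[i], st[0]    # a| : ST0 is the second operand
    else:
        first, second = st[0], st[i]    # normal: ST0 is the first operand
    result = list(st)
    target = i if modified else 0       # m| : explicit register gets the result
    result[target] = first - second
    return result

print(fsub_model([5.0, 2.0], 1))                  # [3.0, 2.0]
print(fsub_model([5.0, 2.0], 1, abnormal=True))   # [-3.0, 2.0]
print(fsub_model([5.0, 2.0], 1, modified=True))   # [5.0, 3.0]
```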
Table 3. Opcode cross reference. Pentium-only.
@table @var
forthitem BSWAP,
forthitem CMPXCHG,
forthitem CMPXCHG8B,
forthitem F2XM1,
forthitem FABS,
forthitem FADD,
forthitem FADDP,
forthitem FBLD,
forthitem FBSTP,
forthitem FCHS,
forthitem FCLEX,
forthitem FCOM,
forthitem FCOMP,
forthitem FCOMPP,
forthitem FCOS,
forthitem FDECSTP,
forthitem FDIV,
forthitem FDIVP,
forthitem FFREE,
forthitem FIADD,
forthitem FICOM,
forthitem FICOMP,
forthitem FIDIV,
forthitem FILD, FILD|64,
forthitem FIMUL,
forthitem FINCSTP,
forthitem FINIT,
forthitem FIST,
forthitem FISTP, FISTP|64,
forthitem FISUB,
forthitem FLD, FLD|e,
forthitem FLD1,
forthitem FLDCW,
forthitem FLDENV,
forthitem FLDL2E,
forthitem FLDL2T,
forthitem FLDLG2,
forthitem FLDLN2,
forthitem FLDPI,
forthitem FLDZ,
forthitem FMUL,
forthitem FMULP,
forthitem FNOP,
forthitem FPATAN,
forthitem FPREM,
forthitem FPREM1,
forthitem FPTAN,
forthitem FRNDINT,
forthitem FRSTOR,
forthitem FSAVE,
forthitem FSCALE,
forthitem FSIN,
forthitem FSINCOS,
forthitem FSQRT,
forthitem FST, FST|u,
forthitem FSTCW,
forthitem FSTENV,
forthitem FSTP, FSTP|e, FSTP|u,
forthitem FSTSW,
forthitem FSTSW|AX,
forthitem FSUB,
forthitem FSUBP,
forthitem FTST,
forthitem FUCOM,
forthitem FUCOMP,
forthitem FUCOMPP,
forthitem FXAM,
forthitem FXCH,
forthitem FXTRACT,
forthitem FYL2X,
forthitem FYL2XP1,
forthitem INVD,
forthitem INVLPG,
forthitem Illegal-1,
forthitem Illegal-2,