1989 final year project
steveloughran committed Sep 1, 2014
1 parent 5476692 commit e53813b
Showing 48 changed files with 7,171 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -10,3 +10,7 @@

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

# Tex artifacts
*.dvi
*.aux
3 changes: 3 additions & 0 deletions papers/urisc/README.md
@@ -0,0 +1,3 @@
# the Ultimate RISC

This is my final year undergraduate paper from 1989; the formal specification and implementation of a microprocessor which implemented one instruction, MOVE. The specification was done in a version of Standard ML which allowed for the specification of temporal logic, yet allowed pure ML to be used, ML which could then be interpreted. The ALU was so defined, then translated to FPGA form. The rest of the system was built from 74-series ICs and hand-wired.
16 changes: 16 additions & 0 deletions papers/urisc/abstract.tex
@@ -0,0 +1,16 @@
%abstract 28/4/89 sal
% %W%
\begin{abstract}
Computer Design is a highly competitive field, where there is much interest in the
possibility of designing high performance computers quickly, cheaply and reliably.
The cost and performance requirements may be satisfied by RISC architectures, and the application
of Formal Methods to hardware design promises new levels of quality.
This report is a description of a final year project, the design, formal specification and implementation of the Ultimate RISC, a single instruction computer.

The project was a demonstration that such a computer is simple enough to be designed and built within a single year, although
the implementation does suffer from some limitations.

The concept of a single instruction computer is discussed in general,
concluding that, whilst it is a fast and compact design which could be useful in some applications, the memory bandwidth it requires limits its performance.

\end{abstract}
114 changes: 114 additions & 0 deletions papers/urisc/alu.tex
@@ -0,0 +1,114 @@
\chapter{The ALU}

\section{Design}
For a computer to be capable of effective operation, it needs the ability to perform
actual processing of data.
It has long been known that the ability to compare two items and act upon the result is sufficient for effective computation
---Turing Machines are based around this concept.
It would therefore have been possible to build a basic comparison unit, and rely
on software to derive mathematical and logical operations.
This would have been unreasonably inefficient.
All realistic computers have hardware dedicated to evaluation of these functions.
These Arithmetic and Logic Units ({\bf ALU}) normally perform at least integer addition, subtraction and the standard boolean functions of two variables.
More powerful units are capable of high speed multiplication, or even manipulate
floating point numbers.


At the start of the project I was offered the possibility of using a
single chip 64-bit floating point ALU from AMD (AM29C327) \cite{amd:uprogramming,amd:29c300}.
This would have produced impressive performance figures, but I
decided that it would have been unworkable, since it was
designed for a triple data bus and needed 31 bits of control
information every cycle. A single bus system would have been unable to use this
device effectively.

Instead I designed a very simple ALU, since this made
formal specification possible. The unit was
built from eight bit sliced TTL ICs, each of which operates on four bits.
When connected together via a two level
carry lookahead generator, they perform operations on 32-bit words.
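
The slice-plus-lookahead arrangement can be sketched in software. The following is a minimal Python illustration, not the report's ML specification: the names `slice_gp` and `add32` are hypothetical, the generate and propagate signals are modelled as active-high booleans, and the carries are computed in a loop, whereas the hardware lookahead units evaluate the equivalent expanded equations in parallel.

```python
MASK4 = 0xF  # each slice handles four bits


def slice_gp(a4, b4):
    """Return (generate, propagate) for one 4-bit slice, active-high."""
    total = a4 + b4
    generate = total > MASK4    # slice carries out even with carry-in 0
    propagate = total == MASK4  # slice carries out only if carry-in is 1
    return generate, propagate


def add32(a, b, carry_in=False):
    """Add two 32-bit words from eight 4-bit slices and lookahead carries."""
    a_slices = [(a >> (4 * i)) & MASK4 for i in range(8)]
    b_slices = [(b >> (4 * i)) & MASK4 for i in range(8)]

    # Every carry depends only on the G/P signals and the external carry,
    # never on another slice's sum outputs.
    carries = [bool(carry_in)]
    for a4, b4 in zip(a_slices, b_slices):
        g, p = slice_gp(a4, b4)
        carries.append(g or (p and carries[-1]))

    result = 0
    for i, (a4, b4) in enumerate(zip(a_slices, b_slices)):
        result |= ((a4 + b4 + carries[i]) & MASK4) << (4 * i)
    return result, carries[8]
```

For example, `add32(0x0000FFFF, 1)` propagates a carry through four slices and returns `(0x00010000, False)`.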

This
is sufficient for many purposes, except that the ability to shift
a word right was needed in iterative multiplication and division algorithms.

The result of the ALU had to be stored until re-used in later instructions.
The state of this result,
whether zero or negative, needed to be obtained in a form
which could be passed to the Skip register.
Arithmetic overflow and carry flags were also desirable,
detecting results too large to be represented in 32 bits.

\section{Implementation}

The design of the ALU is shown in figure~\ref{figure:alu}.

An Accumulator stores the output of the ALU between operations.
This can be read as a memory location.
The contents of the Accumulator are also used as one of the inputs to the ALU,
so only one other argument needs to be supplied per operation.
This accumulator is built out of four SSRS, so can be read directly by the host.

A number of bit sliced ALUs were available with built in accumulator registers.
For example,
the AMD AM2901 \cite{amd:logic} or the TTL 74F681 ALU bit slices
would have provided enhanced performance with fewer components and less wiring.
Using these would have prevented the host examining the Accumulator directly.
Instead I used 74F381 ALU/function generators in my design.
These only perform basic operations ---addition, subtraction, and, or, exclusive or, preset and clear.
Three control signals are used to select a function.
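
As an illustration of how three select lines choose among these eight operations, here is a small Python sketch. The function name `alu_function` is hypothetical, and the particular select codes used below are an assumption for illustration only; the real assignment should be taken from the 74F381 data sheet. Subtraction is modelled as addition of the complement plus the carry input, which matches the later remark that the carry flag must be set before a subtraction.

```python
MASK32 = 0xFFFFFFFF


def alu_function(select, a, b, carry_in=0):
    """Apply one of eight functions to 32-bit words; return (result, carry_out).

    The select codes are illustrative assumptions, not a datasheet copy.
    """
    if select == 0b000:                      # clear
        return 0, 0
    if select == 0b111:                      # preset (all ones)
        return MASK32, 0
    if select == 0b100:                      # exclusive or
        return a ^ b, 0
    if select == 0b101:                      # or
        return a | b, 0
    if select == 0b110:                      # and
        return a & b, 0
    if select == 0b011:                      # a plus b
        total = a + b + carry_in
    elif select == 0b010:                    # a minus b (needs carry_in = 1)
        total = a + (~b & MASK32) + carry_in
    elif select == 0b001:                    # b minus a (needs carry_in = 1)
        total = b + (~a & MASK32) + carry_in
    else:
        raise ValueError("select must be a 3-bit value")
    return total & MASK32, total >> 32
```

With the carry input set, `alu_function(0b010, 10, 3, carry_in=1)` yields a result of 7; with it clear the result is one less, which is why the flag has to be prepared before subtracting.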


Between the outputs of the ALU ICs and the Accumulator is a bank of five PALS.
Normally these pass the result straight through, each PAL checking if the bits passed through it are all zero or not.
They can also be instructed to shift the result ---including the carry flag--- one bit to the right;
this shifting is controlled by a one bit signal.
This post shifting allows a normal operation to be combined with a shift, to make unusual functions such as `subtract and divide by two'.
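
The effect of the post shift can be sketched in Python by treating the carry flag and the 32-bit result as one 33-bit value. This is an illustrative model only: the name `post_shift` is hypothetical, and the choice that the bit shifted out becomes the new carry follows my reading of the simulation listing rather than anything stated in this chapter. The example uses an add followed by a shift, since the carry polarity after a subtraction is not modelled here.

```python
MASK32 = 0xFFFFFFFF


def post_shift(result, carry, shift):
    """Optionally shift the 33-bit value (carry, result) one bit right.

    Returns (new_result, new_carry). With shift false the values pass
    straight through, as the PALs normally do.
    """
    if not shift:
        return result & MASK32, carry & 1
    shifted = ((carry & 1) << 31) | ((result & MASK32) >> 1)
    return shifted, result & 1  # the bit shifted out becomes the carry


def add_and_halve(a, b):
    """Add two words and shift right, keeping the 33rd bit of the sum."""
    total = a + b
    result, _ = post_shift(total & MASK32, total >> 32, shift=True)
    return result
```

Because the carry is shifted into bit 31, `add_and_halve(0xFFFFFFFF, 1)` gives `0x80000000` rather than zero.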

The results of the five
zero tests along with other signals are fed to another PAL, which
produces values for a Condition Code register ({\bf CC}), constructed from a Shadow Serial Register.
The PAL generates a zero flag when all five slices of the result are zero.
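
A minimal Python sketch of this two-stage zero detection follows. The split into four 7-bit groups and one 4-bit group is an assumption inferred from the simulation listing, the names `slice_zero_flags` and `zero_flag` are hypothetical, and the unshifted case is the only one modelled.

```python
# Assumed widths of the five PAL groups, least significant first.
SLICE_WIDTHS = (7, 7, 7, 7, 4)


def slice_zero_flags(word):
    """Return one zero flag per PAL group of a 32-bit word."""
    flags = []
    offset = 0
    for width in SLICE_WIDTHS:
        group = (word >> offset) & ((1 << width) - 1)
        flags.append(group == 0)
        offset += width
    return flags


def zero_flag(word):
    """The condition-code PAL asserts zero only if every group is zero."""
    return all(slice_zero_flags(word))
```

A single set bit anywhere in the word clears exactly one group flag, and therefore the combined flag.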

\begin{figure}
\vspace{20cm}
\caption{The ALU}
\label{figure:alu}
\end{figure}

\subsubsection{Overflow}

An arithmetic overflow is where a signed number's sign changes due to too large an addition, subtraction or shift.

My design of an ALU does not detect signed overflow, despite the original intent to do so.
I had originally
acquired equations from my CS3 notes to detect overflows using a PAL.
While specifying the system I realised
these equations only detected overflow on signed addition.
To detect overflow in a multi-function ALU, one must compare the carry between
bit 30 and bit 31 of the result with the most significant bit, an overflow occurring if the two differ.
This cannot be done with the 74F381 bit-sliced devices, as this carry is internal.
I have discovered that AMD make a special most-significant-slice version of this bit-sliced ALU which does detect overflows internally.
The result of this check would however become confused if shifting was performed after the operation, so would not always be reliable.
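
For reference, the usual textbook formulation of signed overflow for addition compares the carry into bit 31 with the carry out of bit 31. The Python sketch below illustrates that rule; it is not the equations from the notes mentioned above, and the name `add_with_overflow` is hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK31 = 0x7FFFFFFF


def add_with_overflow(a, b, carry_in=0):
    """Add two 32-bit words; return (result, signed_overflow)."""
    # Carry into bit 31: add only the low 31 bits and look at bit 31.
    carry_into_msb = ((a & MASK31) + (b & MASK31) + carry_in) >> 31
    # Carry out of bit 31: bit 32 of the full sum.
    total = a + b + carry_in
    carry_out_of_msb = total >> 32
    return total & MASK32, bool(carry_into_msb ^ carry_out_of_msb)
```

Adding 1 to `0x7FFFFFFF` flips the sign and is reported as an overflow, whereas adding 1 to `0xFFFFFFFF` (minus one) wraps to zero without one.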

Note that even if the ALU did produce an overflow flag, the software would still have to check it after every operation. A number of
implementations of languages do not do this because of the overhead this entails;
APM Pascal and Standard ML are two such implementations.

\subsubsection{Memory Interface}

Seventeen addresses are allocated to the ALU, as shown in table~\ref{table:memory}.
One of these addresses returns the current value of the Accumulator whenever it is read.
The remaining sixteen addresses all apply a different function between the accumulator and the word moved to the selected address.
This is accomplished by wiring address bus lines directly to the ALU and the PALS.

It is not possible to load the
accumulator directly, but a two instruction sequence clears it and then adds a
number to the now empty accumulator.

Condition code flag manipulation is supported:
reading any of the sixteen function addresses returns one of the condition code flags in the least significant bit.
These results can be passed directly to the Skip register for conditional branching.
Before performing a subtraction the carry flag has to be set to true, while
for other operations the flag has to be cleared.
An address is provided to enable this; when it is written to, the least significant bit is passed to the carry flag.
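
The two instruction load sequence can be sketched as a pair of writes to a memory-mapped ALU. This Python model is illustrative only: the class `AluPort`, the function `load_accumulator` and the three address offsets are hypothetical, since the real values are in the memory table, which is not reproduced here.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical offsets inside the ALU's address block.
ALU_CLEAR = 0x00  # writing here clears the accumulator
ALU_ADD = 0x03    # writing here adds the written word to the accumulator
ALU_ACC = 0x10    # reading here returns the accumulator


class AluPort:
    """A tiny model of the ALU as seen from the bus."""

    def __init__(self):
        self.acc = 0

    def write(self, address, word):
        """A MOVE whose destination is one of the ALU function addresses."""
        if address == ALU_CLEAR:
            self.acc = 0
        elif address == ALU_ADD:
            self.acc = (self.acc + word) & MASK32
        else:
            raise ValueError("address not modelled in this sketch")

    def read(self, address):
        """A MOVE whose source is the accumulator address."""
        if address == ALU_ACC:
            return self.acc
        raise ValueError("address not modelled in this sketch")


def load_accumulator(alu, value):
    """Load a value using only MOVEs: clear, then add."""
    alu.write(ALU_CLEAR, 0)
    alu.write(ALU_ADD, value)
```

Whatever the accumulator held before, `load_accumulator(alu, 42)` leaves `alu.read(ALU_ACC)` equal to 42.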
54 changes: 54 additions & 0 deletions papers/urisc/aluL.tex
@@ -0,0 +1,54 @@
The components can be combined to specify exactly how the ALU should behave.
This could, if desired, be verified against the mathematics of a subset of integers.
\begin{verbatim}
(* ALU.L
=====
The definition of the ALU as a whole.
3/1/89 sal: TTL part only; not the PALS
*)
(* The carry look-ahead unit *)
(* built out of three TTL devices *)
val carrylookahead #( g0,g1,g2,g3,g4,g5,g6,g7,
p0,p1,p2,p3,p4,p5,p6,p7,
c,c3,c7,c11,c15,c19,c23,c27,carry_out)=
SN74F182 #(g0,g1,g2,g3,p0,p1,p2,p3,c,c3,c7,c11,G0,P0)
/\
SN74F182 #(g4,g5,g6,g7,p4,p5,p6,p7,c15,c19,c23,c27,G1,P1)
/\
SN74F182 #(G0,G1,true,true,P0,P1,true,true,c,c15,carry_out,
false,true,true);
(* the TTL portion of the ALU *)
(* eight four bit slices connected together via the carry generator *)
(* a= Input a
b= Input b
c= carry in
s2-s0= control signals
carry_out= carry out
f=evaluated function
*)
val ttlALU # (a b c s2 s1 s0 carry_out f)=
(a7,a6,a5,a4,a3,a2,a1,a0)==split a /\
(b7,b6,b5,b4,b3,b2,b1,b0)==split b /\
carrylookahead #( g0,g1,g2,g3,g4,g5,g6,g7,
p0,p1,p2,p3,p4,p5,p6,p7,
c,c3,c7,c11,c15,c19,c23,c27,carry_out) /\
SN74F381#(a0,b0,c,p0,g0,s2,s1,s0,f0) /\
SN74F381#(a1,b1,c3,p1,g1,s2,s1,s0,f1) /\
SN74F381#(a2,b2,c7,p2,g2,s2,s1,s0,f2) /\
SN74F381#(a3,b3,c11,p3,g3,s2,s1,s0,f3) /\
SN74F381#(a4,b4,c15,p4,g4,s2,s1,s0,f4) /\
SN74F381#(a5,b5,c19,p5,g5,s2,s1,s0,f5) /\
SN74F381#(a6,b6,c23,p6,g6,s2,s1,s0,f6) /\
SN74F381#(a7,b7,c27,p7,g7,s2,s1,s0,f7) /\
split f=(f7,f6,f5,f4,f3,f2,f1,f0);
\end{verbatim}
178 changes: 178 additions & 0 deletions papers/urisc/aluS.tex
@@ -0,0 +1,178 @@
This describes the operation of the ALU.
Based upon the lambda specification, the components have been redescribed as
functions, and combined to describe the entire ALU's operation.
\begin{verbatim}
(* alu.SIM v1.8 *)
(* ======= =====*)
(* Simulation of the ALU *)
(* the ALU bit slice component*)
fun SN74F182 g0 g1 g2 g3 p0 p1 p2 p3 c= (* c1 c2 c3 G P *)
let
val G =(g3 && (p3 ||| g2 )
&& (p3 ||| p2 ||| g1 )
&& (p3 ||| p2 ||| p1 ||| g0))
and P=p0 ||| p1 ||| p2 ||| p3
and c1= not(g0 && (p1 ||| (~ c)))
and c2=not(g1 && (p0 ||| (g0 && p1 ||| (~ c))))
and c3= not(g2 && (p2 ||| (g1 && p0 |||
(g0 && p1 ||| (~ c)))))
in
(c1,c2,c3,G,P)
end;
(* the lookahead carry generator *)
fun carrylookahead g0 g1 g2 g3 g4 g5 g6 g7
p0 p1 p2 p3 p4 p5 p6 p7 c=
let
val (c3,c7,c11,G0,P0)=SN74F182 g0 g1 g2 g3 p0 p1 p2 p3 c
and G1=(g7 && (p7 ||| g6 )
&& (p7 ||| p6 ||| g5 )
&& (p7 ||| p6 ||| p5 ||| g4))
and P1=p4 ||| p5 ||| p6 ||| p7
in
let
val (c15,carry_out,_,_,_)=SN74F182 G0 G1 true true
P0 P1 true true
c
in
let val (c19,c23,c27,_,_) = SN74F182 g4 g5 g6 g7
p4 p5 p6 p7
c15
in
(c3,c7,c11,c15,c19,c23,c27,carry_out)
end
end
end;
(* the TTL part of the ALU *)
fun ttlALU a b c s2 s1 s0=
let
val (a7,a6,a5,a4,a3,a2,a1,a0)=split a
and (b7,b6,b5,b4,b3,b2,b1,b0)=split b
in
let
val (c3,c7,c11,c15,c19,c23,c27,carry_out)=
carrylookahead
(generate a0 b0)
(generate a1 b1)
(generate a2 b2)
(generate a3 b3)
(generate a4 b4)
(generate a5 b5)
(generate a6 b6)
(generate a7 b7)
(propagate a0 b0)
(propagate a1 b1)
(propagate a2 b2)
(propagate a3 b3)
(propagate a4 b4)
(propagate a5 b5)
(propagate a6 b6)
(propagate a7 b7)
c
in (carry_out, merge
(applyALU a7 b7 c27 s2 s1 s0)
(applyALU a6 b6 c23 s2 s1 s0)
(applyALU a5 b5 c19 s2 s1 s0)
(applyALU a4 b4 c15 s2 s1 s0)
(applyALU a3 b3 c11 s2 s1 s0)
(applyALU a2 b2 c7 s2 s1 s0)
(applyALU a1 b1 c3 s2 s1 s0)
(applyALU a0 b0 c s2 s1 s0))
end
end;
(* the shift pal program for the four least significant PALS *)
fun ALU_SHIFT_PAL_fn shift f7 (f6,f5,f4,f3,f2,f1,f0)=
let val z= (~shift && ~f0 && ~f1 && ~f2 && ~f3 && ~f4
&& ~f5 && ~f6 ) |||
(shift && ~f1 && ~f2 && ~f3 && ~f4 && ~f5
&& ~f6 && ~f7 )
and h0= ~shift && f0 ||| shift && f1
and h1= ~shift && f1 ||| shift && f2
and h2= ~shift && f2 ||| shift && f3
and h3= ~shift && f3 ||| shift && f4
and h4= ~shift && f4 ||| shift && f5
and h5= ~shift && f5 ||| shift && f6
and h6= ~shift && f6 ||| shift && f7
in
(z,(h6,h5,h4,h3,h2,h1,h0))
end;
(* the shift PAL program for the most significant PAL *)
fun ALU_SHIFT_PAL_fn_2 shift d0 c f31 f30 f29 f28
=
let val h28= ( ~shift && f28 ||| shift && f29)
and h29 = ~shift && f29 ||| shift && f30
and h30 = ~shift && f30 ||| shift && f31
and h31 = ~shift && f31 ||| shift && c
and carry_out = ~shift && c ||| shift && d0
and z = ~shift && ~f28 && ~f29 && ~f30 && ~f31
|||
shift && ~f29 && ~f30 && ~f31 && c
in
(z,carry_out,(h31,h30,h29,h28))
end;
(* the condition code generation *)
fun ALU_CC_PAL_fn shift z0 z1 z2 z3 z4 carry_in data0 addr4=
let
val z=z0 && z1 && z2 && z3 && z4
and c=carry_in && addr4 ||| data0 && ~addr4
in
(z,c)
end;
(* the complete ALU *)
(* takes the current ALU state, the data on the bus and the
value of the address bus to return an updated ALU state*)
fun alu (alustate:ALUstate) d a=
let
val (c,f)=ttlALU (get_acc alustate)
d
(get_carry alustate)
(addressBit2 a)
(addressBit1 a)
(addressBit0 a)
and shift=addressBit3 a
in
let val ((f31,f30,f29,f28),f3,f2,f1,f0)=
split7 f in
let val (z0,h0)=ALU_SHIFT_PAL_fn shift (dataBit7 f) f0
and (z1,h1)=ALU_SHIFT_PAL_fn shift (dataBit14 f) f1
and (z2,h2)=ALU_SHIFT_PAL_fn shift (dataBit21 f) f2
and (z3,h3)=ALU_SHIFT_PAL_fn shift (dataBit28 f) f3
and (z4,c2,(h31,h30,h29,h28))=
ALU_SHIFT_PAL_fn_2 shift
(dataBit0 f) c f31 f30 f29 f28
in
let val h = merge7 (h31,h30,h29,h28)
h3 h2 h1 h0
in
let val (z,carry)=
ALU_CC_PAL_fn shift
z0 z1 z2 z3 z4 c
(dataBit31 d)
(addressBit4 a)
in
({acc=h,
z=z,
n=dataBit31 h,
v=carry,
carry=carry}:ALUstate)
end
end
end
end
end;
\end{verbatim}