version 1.2
Copyright © Department of Computer Science Rochester Institute of Technology
All Rights Reserved
This document is intended to be a quick introduction to (or a brief review of) RTL, the Register Transfer Language, which is used to describe CPU organization in high-level terms. It is not intended to be a complete treatment of RTL, but should be sufficient to introduce you to the major elements of RTL. For a more complete description, consult one of the many digital design textbooks; a good one is Computer System Architecture, M. Morris Mano, Third Edition, Prentice-Hall, 1993.
We begin with some definitions of terms we'll use:
To fully describe a digital system, we must specify:
We will use RTL expressions in combination with block-level architectural drawings to do this.
RTL expressions are made up of elements which describe the registers being manipulated, and the micro-ops being performed on them.
Here are the basic components of RTL expressions:
Note that these are flexible - e.g., parentheses could be used to indicate bit positions in registers.
Registers are named. Register names consist of a sequence of uppercase alphabetic characters, possibly followed by one or more numeric characters. Sometimes we'll refer to the entire contents of registers; at other times we'll just refer to portions of them.
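The naming rule can be checked mechanically with a regular expression. This is a small illustrative sketch; the pattern and the function name `is_register_name` are hypothetical, written only to mirror the rule stated above.

```python
import re

# Hypothetical check of the naming rule: one or more uppercase letters,
# optionally followed by one or more digits (e.g. AC, R1, MAR, R12).
REGISTER_NAME = re.compile(r"[A-Z]+[0-9]*")


def is_register_name(name: str) -> bool:
    """Return True if `name` follows the RTL register-naming rule."""
    return REGISTER_NAME.fullmatch(name) is not None


for candidate in ["AC", "R1", "MAR", "R12", "r1", "1R", "R1A"]:
    print(candidate, is_register_name(candidate))
```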
Registers come in many different forms. We represent them with boxes in architectural drawings, and label them with their RTL names. Sometimes we'll label portions of registers, or indicate their bit positions. Here are some examples:
Micro-ops come in four essential varieties: Data transfer, Arithmetic, Logic/bit manipulation, and Shift. We'll discuss each in turn.
Transfer micro-ops are used to move (actually, copy) information from one place to another within the computer. This transfer can occur in one of several ways:
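Simple transfer: R2 ← R1 copies the contents of register R1 into register R2 on one clock pulse.
Simultaneous transfer: R1 ← R2, R2 ← R1 exchanges the contents of R1 and R2 on one clock pulse (a transfer which can inherently be done in parallel).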
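The key property of a simultaneous transfer is that every source is read before any destination changes, which is exactly what happens when all registers load on the same clock edge. A minimal sketch, assuming registers are modeled as a Python dictionary; the function `clock` is hypothetical.

```python
def clock(registers: dict[str, int], transfers: list[tuple[str, str]]) -> dict[str, int]:
    """Apply a set of (target, source) transfers on one clock pulse.

    All sources are read from the values held *before* the pulse,
    so the transfers happen in parallel, as they would in hardware.
    """
    new_values = {target: registers[source] for target, source in transfers}
    updated = dict(registers)
    updated.update(new_values)
    return updated


regs = {"R1": 5, "R2": 9}
# Simple transfer: R2 <- R1
print(clock(regs, [("R2", "R1")]))                # {'R1': 5, 'R2': 5}
# Simultaneous transfer: R1 <- R2, R2 <- R1 (a swap in a single pulse)
print(clock(regs, [("R1", "R2"), ("R2", "R1")]))  # {'R1': 9, 'R2': 5}
```

Reading all sources first is what makes the swap work without a temporary register; a sequential program would need one.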
Sometimes, we want to control when the transfer occurs; we can do this by structuring the RTL expression to indicate the controlling condition:
This is often written with a control function specifying when the load takes place. In general, control functions are combinational (Boolean) expressions which are implemented using combinational synthesis. In this example, we might use:
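A conditional transfer written P: R2 ← R1 loads R2 from R1 only when the control function P is 1 at the clock pulse. The sketch below models that behavior; the control function P = T1 AND x is a hypothetical example of a combinational expression, and both function names are illustrative.

```python
def control_p(t1: bool, x: bool) -> bool:
    """Hypothetical control function P = T1 AND x (pure combinational logic)."""
    return t1 and x


def conditional_transfer(registers: dict[str, int], p: bool,
                         target: str, source: str) -> dict[str, int]:
    """P: target <- source. The load happens only if P is 1 at the clock pulse."""
    updated = dict(registers)
    if p:
        updated[target] = registers[source]
    return updated


regs = {"R1": 7, "R2": 0}
print(conditional_transfer(regs, control_p(True, False), "R2", "R1"))  # {'R1': 7, 'R2': 0}
print(conditional_transfer(regs, control_p(True, True), "R2", "R1"))   # {'R1': 7, 'R2': 7}
```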
We can also indicate that combinations of transfers are to be performed:
To transfer data using a bus: connect the output of the source register to the bus; connect the input of the target register to the bus; when the clock pulse arrives, the transfer occurs. You can even transfer the source to multiple targets by connecting the bus to the inputs of each target, but the controlling logic gets a bit complex.
Connections between the registers and the bus can be made in several ways. The two most common are the use of multiplexor banks whose outputs are the bus lines themselves (common select inputs are used to pass the data bits of the chosen register onto the bus), and the use of tri-state buffers to connect the register outputs to the bus lines (which allow us to connect bit i of every register's output to bus line i, but control which of those connections is "active" through control lines).
On the other end of the bus, routing the bus contents into one of a set of
registers is trivial - just connect the bus lines to the corresponding
inputs of all the registers, and use a decoder to provide
control inputs to the registers.
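Both ends of the bus can be sketched together: a multiplexor picks which register drives the bus, and a decoder asserts the load input of exactly one destination register. This is a behavioral sketch under the assumption that registers are numbered 0..n-1; the names `decoder` and `bus_transfer` are hypothetical.

```python
def decoder(code: int, n_outputs: int) -> list[int]:
    """Return a one-hot list: output `code` is 1, all others are 0."""
    return [1 if i == code else 0 for i in range(n_outputs)]


def bus_transfer(registers: list[int], source_sel: int, dest_sel: int) -> list[int]:
    """Move registers[source_sel] to registers[dest_sel] over a shared bus."""
    # Multiplexor: the select inputs choose which register drives the bus.
    bus = registers[source_sel]
    # Decoder: exactly one register gets its load input asserted.
    load = decoder(dest_sel, len(registers))
    # On the clock pulse, only the enabled register takes the bus value.
    return [bus if enabled else value for value, enabled in zip(registers, load)]


regs = [0x0011, 0x0022, 0x0033, 0x0044]
print([hex(r) for r in bus_transfer(regs, source_sel=2, dest_sel=0)])
# ['0x33', '0x22', '0x33', '0x44']
```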
We represent bus transfers in two ways in RTL expressions:
Explicitly:
RTL expressions for a read operation, assuming the use of an address register:
RTL expressions for a write operation, assuming the use of a data register:
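Using the common textbook forms DR ← M[AR] for a read and M[AR] ← DR for a write (assumed here, together with a memory of 4096 16-bit words like the machine described later), the two operations can be sketched as follows. The `Memory` class is hypothetical.

```python
WORDS = 4096      # assumed memory size
MASK = 0xFFFF     # assumed 16-bit word


class Memory:
    def __init__(self) -> None:
        self.cells = [0] * WORDS

    def read(self, ar: int) -> int:
        """Read operation: DR <- M[AR]"""
        return self.cells[ar]

    def write(self, ar: int, dr: int) -> None:
        """Write operation: M[AR] <- DR (value truncated to one word)"""
        self.cells[ar] = dr & MASK


mem = Memory()
ar, dr = 0x123, 0xBEEF
mem.write(ar, dr)
dr = mem.read(ar)
print(hex(dr))  # 0xbeef
```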
Minimally, a CPU typically provides addition, subtraction, increment, and decrement operations in its ALU (arithmetic-logic unit). It may also provide multiplication and division (which involve more complex hardware); these two are often implemented as collections of operations rather than as single micro-ops, so we won't discuss them here. Often, we find that subtraction is implemented as addition of the two's complement of the second operand. This allows both addition and subtraction to be implemented with just an adder.
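The claim that subtraction needs only an adder can be checked directly: A - B equals A plus the two's complement of B (the one's complement plus 1), with the carry out of the top bit discarded. A small sketch, assuming a 16-bit word; the function names are illustrative.

```python
BITS = 16
MASK = (1 << BITS) - 1


def add(a: int, b: int) -> int:
    """The only arithmetic hardware assumed: an n-bit adder (carry out discarded)."""
    return (a + b) & MASK


def subtract(a: int, b: int) -> int:
    """A - B computed as A + (one's complement of B) + 1, using only the adder."""
    twos_complement_b = add(~b & MASK, 1)
    return add(a, twos_complement_b)


print(subtract(5, 3))       # 2
print(hex(subtract(3, 5)))  # 0xfffe, i.e. -2 in 16-bit two's complement
```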
Here are some common arithmetic operations which we might find in a simple ALU:
Implementation of these can be through separate circuits (e.g., addition/subtraction through one circuit, complements through a second, increment/decrement through an adder with inputs +1 and -1, etc.), or with a single arithmetic circuit which provides a collection of possible results, one of which is selected through a control input.
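The single-circuit approach can be sketched as one adder whose second input is chosen by two select lines, plus a carry-in. The select encoding below is modeled on the arithmetic circuit in Mano's textbook, but treat it as an assumption; `arithmetic_circuit` is a hypothetical name.

```python
BITS = 16
MASK = (1 << BITS) - 1


def arithmetic_circuit(s1: int, s0: int, cin: int, a: int, b: int) -> int:
    """One adder whose second input is chosen by S1 S0, plus a carry-in.

    S1 S0 | second input | cin = 0      | cin = 1
    ------+--------------+--------------+--------------------
     0 0  | B            | A + B        | A + B + 1
     0 1  | B'           | A + B'       | A + B' + 1 (A - B)
     1 0  | all zeros    | A (transfer) | A + 1 (increment)
     1 1  | all ones     | A - 1        | A (transfer)
    """
    second = [b, ~b & MASK, 0, MASK][(s1 << 1) | s0]
    return (a + second + cin) & MASK


print(arithmetic_circuit(0, 0, 0, 10, 3))  # 13 (A + B)
print(arithmetic_circuit(0, 1, 1, 10, 3))  # 7  (A - B)
print(arithmetic_circuit(1, 0, 1, 10, 0))  # 11 (increment)
print(arithmetic_circuit(1, 1, 0, 10, 0))  # 9  (decrement)
```

Every result comes out of the same adder; only the adder's second input and carry-in change, which is why a single control input can select among them.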
Logic micro-ops are like arithmetic, but treat each bit of the register(s) separately. Commonly, computers implement a basic set of logic operations (AND, OR, NOT, XOR); we can even use the common symbols for some of them (e.g., the one's complement of A is A', and the XOR of A with B is A ⊕ B). However, we use alternate symbols for AND and OR to avoid confusion with arithmetic operations (AND is ∧, OR is ∨).
Thus, in P + Q: R1 ← R2 + R3, R4 ← R5 ∨ R6, the first + is an OR in a Boolean control function, the second + is the arithmetic ADD operation, and the ∨ is a logic OR operation.
For two input variables, there are four possible combinations of input values, and thus four outputs which must be specified. For each output, there are two possible results; thus, there are sixteen possible output combinations for two input variables, with values 0000 through 1111. We identify these as F0 through F15, where the index is the result pattern interpreted as an unsigned binary number; thus, 0000 is F0, 1010 is F10, and 1111 is F15.
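The numbering can be made concrete by reading each 4-bit pattern as a truth table. The sketch assumes the pattern lists the outputs for (x, y) = 00, 01, 10, 11 with the 00 output in the most significant bit (the ordering used in Mano's table); the functions `f` and `truth_table` are illustrative.

```python
def f(i: int, x: int, y: int) -> int:
    """Output of F_i for one-bit inputs x and y.

    Assumed ordering: bit 3 of i is the output for (x, y) = 00,
    bit 2 for 01, bit 1 for 10, and bit 0 for 11.
    """
    row = (x << 1) | y  # 0..3
    return (i >> (3 - row)) & 1


def truth_table(i: int) -> list[int]:
    return [f(i, x, y) for x in (0, 1) for y in (0, 1)]


print(truth_table(1))   # [0, 0, 0, 1] -> F1 is x AND y
print(truth_table(6))   # [0, 1, 1, 0] -> F6 is x XOR y
print(truth_table(7))   # [0, 1, 1, 1] -> F7 is x OR y
print(truth_table(10))  # [1, 0, 1, 0] -> F10 is NOT y
```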
Because all of these can be generated by combining one or both of the input values with AND, OR, XOR, and NOT micro-ops in some fashion, we often find that only those four operations are actually implemented:
[Figure: one bit of the logic circuit]
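A four-operation logic unit can be sketched as four parallel results with a 4-to-1 selection, applied to all bits of the word at once. The select encoding (00 AND, 01 OR, 10 XOR, 11 complement) is an assumption modeled on Mano's logic circuit. The helper `f2` shows how one of the other sixteen functions, F2 = x AND NOT y, is built from two of the implemented operations.

```python
BITS = 16
MASK = (1 << BITS) - 1


def logic_unit(s1: int, s0: int, a: int, b: int) -> int:
    """Select one of the four implemented logic micro-ops.

    Assumed encoding, S1 S0: 00 -> A AND B, 01 -> A OR B,
                             10 -> A XOR B, 11 -> NOT A
    """
    results = [a & b, a | b, a ^ b, ~a & MASK]
    return results[(s1 << 1) | s0]


def f2(a: int, b: int) -> int:
    # F2 (pattern 0010, x AND NOT y) from two implemented operations:
    # complement B first, then AND the result with A.
    not_b = logic_unit(1, 1, b, 0)
    return logic_unit(0, 0, a, not_b)


print(bin(logic_unit(1, 0, 0b1100, 0b1010)))  # 0b110
print(bin(f2(0b1100, 0b1010)))                # 0b100
```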
Finally, we have the shift micro-ops. As the name implies, these move the information in a register by one bit position. Shifts come in three varieties: logical (the value shifted in is a 0); arithmetic (the value shifted in is 0 for left shift, the duplicate of the high-order bit for right shift; maintains sign of resulting number); and circular (the value shifted in is whatever was shifted out of the other end). Here are the corresponding RTL micro-ops:
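The three kinds of shift differ only in which bit is shifted in. A sketch on a 16-bit register, with function names borrowed from the usual shl/shr/ashl/ashr/cil/cir micro-op symbols (treat the names as illustrative):

```python
BITS = 16
MASK = (1 << BITS) - 1
MSB = 1 << (BITS - 1)


def shl(r: int) -> int:
    # Logical left: a 0 is shifted in on the right.
    return (r << 1) & MASK


def shr(r: int) -> int:
    # Logical right: a 0 is shifted in on the left.
    return r >> 1


def ashl(r: int) -> int:
    # Arithmetic left: a 0 is shifted in, same bit pattern as shl.
    return (r << 1) & MASK


def ashr(r: int) -> int:
    # Arithmetic right: the high-order (sign) bit is duplicated.
    return (r >> 1) | (r & MSB)


def cil(r: int) -> int:
    # Circular left: the bit shifted out on the left re-enters on the right.
    return ((r << 1) & MASK) | (r >> (BITS - 1))


def cir(r: int) -> int:
    # Circular right: the bit shifted out on the right re-enters on the left.
    return (r >> 1) | ((r & 1) << (BITS - 1))


for fn in (shl, shr, ashl, ashr, cil, cir):
    print(fn.__name__, hex(fn(0x8001)))
```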
Next, we'll look at the use of RTL and architectural diagrams to represent the structure and operation of a simple computer system. The processor we consider will be a simple accumulator-based system having a 16-bit word and 4096 (2¹²) words of memory.
Because our word size is 16 bits, instructions must fit within that space; as we need 12 bits to represent a memory address, we have only 4 bits remaining to indicate the opcode and any other per-instruction information we need. We'll use a 3-bit opcode, and use the remaining bit to distinguish between direct and indirect memory addressing.
Here is our basic instruction layout:
When the I (indirect) bit is 0, the value in the address field is the actual address of the operand (direct addressing). When I is 1, the address field contains the address of an indirect word, which in turn contains the actual operand address (indirect addressing).
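Splitting an instruction word into its fields and resolving the effective address can be sketched as below. The bit positions (I in bit 15, opcode in bits 14-12, address in bits 11-0) are an assumption consistent with the 1 + 3 + 12 split described above; the function names are hypothetical.

```python
def decode(word: int) -> tuple[int, int, int]:
    """Split a 16-bit instruction into (I, opcode, address).

    Assumed layout: bit 15 = I, bits 14-12 = opcode, bits 11-0 = address.
    """
    i_bit = (word >> 15) & 0x1
    opcode = (word >> 12) & 0x7
    address = word & 0xFFF
    return i_bit, opcode, address


def effective_address(word: int, memory: list[int]) -> int:
    i_bit, _opcode, address = decode(word)
    if i_bit == 0:
        return address               # direct: the field is the operand address
    # Indirect: the word at `address` holds the operand address; only its
    # low 12 bits are kept, since an address needs just 12 bits.
    return memory[address] & 0xFFF


memory = [0] * 4096
memory[0x050] = 0x0300
print(decode(0xA050))                         # (1, 2, 80)
print(hex(effective_address(0x2050, memory)))  # 0x50  (direct)
print(hex(effective_address(0xA050, memory)))  # 0x300 (indirect)
```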
Our simple machine will have the following registers. Sizes are determined directly from the intended use of that register - e.g., a register which only holds addresses will be the size of an address.
To simplify the connection logic between the registers, we'll use a common bus structure. The bus will be 16 bits wide (size of a word of memory); we'll need to connect six registers and memory to it, so we'll need three selection inputs to the bus controller logic.
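The bus select logic can be sketched as a table from select code to source. Seven sources need (7 - 1).bit_length() = 3 select bits. The codes 2 (PC) and 7 (memory) agree with the bus select values used in the fetch discussion; the remaining assignments and the register names AR, DR, AC, IR, and TR are assumptions following Mano's basic computer, and code 0 is taken to mean "no source".

```python
# Assumed select codes; 0 selects no source.
BUS_SOURCES = {1: "AR", 2: "PC", 3: "DR", 4: "AC", 5: "IR", 6: "TR", 7: "M"}


def select_bits_needed(n_sources: int) -> int:
    """Smallest number of select inputs that can address n_sources."""
    return (n_sources - 1).bit_length()


def bus_value(select: int, regs: dict[str, int], memory: list[int]) -> int:
    """Value placed on the bus for a given select code."""
    source = BUS_SOURCES[select]
    if source == "M":
        return memory[regs["AR"]]  # memory is addressed by AR
    return regs[source]


memory = [0] * 4096
memory[5] = 0x1234
regs = {"AR": 5, "PC": 9}
print(select_bits_needed(7))                 # 3
print(hex(bus_value(7, regs, memory)))       # 0x1234
```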
This structure allows us to move information between the major elements of the CPU. Registers are connected to the bus as follows:
The ALU takes two 16-bit inputs and one 8-bit input, and produces 17 output bits (one to E, 16 to the AC). ALU inputs come from the AC (allows complementation, and use of AC contents in arithmetic/logic/shift operations), from DR (ditto), and from INPR (input is moved through the ALU into the AC).
This design also allows for multiple parallel hardware operations - e.g.,
information can be moved onto/from the bus while the ALU is performing an
operation.
One potential problem with our instruction word is that there are only
three bits available for the opcode.
Normally, this would limit us to a total of eight instructions, which is
too few to provide a meaningful set of operations.
However, we can take advantage of the fact that not all instructions
require the use of a memory operand.
If we can design the instruction decoding logic in the CU properly, we can arrange to use a single opcode for all these instructions, and use the 12 bits in the address field to distinguish between the operations. We can even use the I field here - as there is no memory operand, using indirect addressing doesn't make any sense, so we can use that additional bit to help extend the instruction set. We'll use opcode 7 to represent all the "extended" instructions, leaving opcodes 0 through 6 for the "memory reference" instructions. Further, we'll use the I bit to distinguish between "register" instructions and "i/o" instructions.
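The resulting classification can be sketched as follows, assuming I in bit 15 and the opcode in bits 14-12; `instruction_class` is a hypothetical helper.

```python
def instruction_class(word: int) -> str:
    """Classify a 16-bit instruction word (assumed layout: I | opcode | address)."""
    i_bit = (word >> 15) & 1
    opcode = (word >> 12) & 0b111
    if opcode != 0b111:
        return "memory-reference (indirect)" if i_bit else "memory-reference (direct)"
    # Opcode 7: there is no memory operand, so the I bit selects the
    # sub-class and the 12-bit address field selects the operation.
    return "i/o" if i_bit else "register"


for w in (0x2050, 0xA050, 0x7800, 0xF800):
    print(hex(w), instruction_class(w))
```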
Although our simple instruction set is complete, it is fairly inefficient -
there are no subtract, multiply, divide, or other similar operations.
All of these operations can be performed by combining the existing instructions in different sequences (subtraction, for example, by loading the second operand into the AC, forming its two's complement with CMA and INC, and then using ADD to add the first operand); while a more complete set would have made programming easier, it would also have increased the complexity of the machine to the point where it would be much harder to understand.
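The subtraction sequence can be traced with a tiny accumulator model. This sketch implements only LDA, CMA, INC, and ADD, ignores the carry flip-flop E, and uses a dictionary for memory; the function `run` is hypothetical.

```python
from typing import Optional

MASK = 0xFFFF


def run(program: list[tuple[str, Optional[int]]], memory: dict[int, int]) -> int:
    """Execute a tiny subset of the instruction set; return the final AC."""
    ac = 0
    for op, addr in program:
        if op == "LDA":      # AC <- M[addr]
            ac = memory[addr]
        elif op == "CMA":    # AC <- AC' (one's complement)
            ac = ~ac & MASK
        elif op == "INC":    # AC <- AC + 1
            ac = (ac + 1) & MASK
        elif op == "ADD":    # AC <- AC + M[addr] (carry into E ignored here)
            ac = (ac + memory[addr]) & MASK
        else:
            raise ValueError(f"unsupported instruction {op}")
    return ac


memory = {0x100: 25, 0x101: 7}
# Compute M[0x100] - M[0x101]: negate the second operand, then add the first.
program = [("LDA", 0x101), ("CMA", None), ("INC", None), ("ADD", 0x100)]
print(run(program, memory))  # 18
```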
We next need to consider the structure of the CU. In essence, the CU is a large sequential circuit. It must be able to sequence the execution of micro-ops, as well as fetch, decode, and execute instructions. All CPU registers are controlled by a single master clock. While all registers are connected to this clock, none will change state unless the other control inputs are present to initiate the change at the arrival of a clock pulse.
The two major data sources on which CU decisions are based are decoded
information from the instruction being executed, and timing information
which allows the sequencing of micro-ops to support instruction execution.
Instruction decoding is handled quite easily.
Once an instruction is moved into the IR, the opcode can be taken from that register and fed into a 3x8 decoder to generate separate output signals (D0 through D7) for each of the eight possible opcode values. Similarly, the I bit can be moved into a special one-bit register, I, for later use.
Timing signals will be generated by feeding the output of an N-bit sequence counter, SC, into an Nx2^N decoder; this will generate 2^N timing signals (T0 through T(2^N - 1)) which will be used in control functions for micro-ops to limit their operation to particular units of time.
The number of timing signals required depends on the longest sequence of micro-ops required for any single instruction. We'll use a clearable counter, and add a micro-op to the system's repertoire which "resets" the counter to 0 (SC ← 0); we'll use this at the end of each instruction's micro-op sequence to avoid having the system move through all 2^N timing values, which speeds up the execution of instructions.
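The counter-plus-decoder timing generator can be sketched as a small class: the decoder output is one-hot, and asserting the clear input on a clock pulse returns the counter to T0. The class name and the instruction that "ends at T2" are hypothetical.

```python
class SequenceCounter:
    """N-bit sequence counter SC feeding an N x 2^N decoder."""

    def __init__(self, n_bits: int = 4) -> None:
        self.n_bits = n_bits
        self.value = 0

    def timing_signals(self) -> list[int]:
        """Decoder output: T[k] is 1 only when SC == k."""
        return [1 if k == self.value else 0 for k in range(1 << self.n_bits)]

    def clock(self, clear: bool = False) -> None:
        """One clock pulse: SC <- 0 if clear is asserted, else SC <- SC + 1."""
        if clear:
            self.value = 0
        else:
            self.value = (self.value + 1) % (1 << self.n_bits)


sc = SequenceCounter(4)
trace = []
for _ in range(5):
    trace.append(sc.timing_signals().index(1))  # which T is active
    sc.clock(clear=(sc.value == 2))             # pretend each instruction ends at T2
print(trace)  # [0, 1, 2, 0, 1]
```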
Here's a block diagram of the basic CU, assuming a four-bit counter as the timing sequence generator:
Our CPU's fetch/execute cycle will look like this:
Fetch an instruction from memory.
Increment the PC so that it points to the next instruction.
Decode the instruction.
If the instruction specifies indirect addressing, retrieve the indirect word from memory.
Execute the instruction.
The fetch and decode phases of the cycle can be specified by the following RTL expressions:
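A behavioral sketch of the fetch and decode steps follows. T0 (AR ← PC) and T1 (IR ← M[AR], PC ← PC + 1) match the timing walkthrough below; the T2 step (decode IR(12-14) into D0..D7, AR ← IR(0-11), I ← IR(15)) is assumed from the standard formulation of this machine, and the dictionary-based state is purely illustrative.

```python
def fetch_and_decode(state: dict, memory: list[int]) -> None:
    """Run the fetch (T0, T1) and decode (T2) micro-ops on `state` in place."""
    # T0: AR <- PC
    state["AR"] = state["PC"]
    # T1: IR <- M[AR], PC <- PC + 1 (both on the same clock pulse)
    state["IR"], state["PC"] = memory[state["AR"]], (state["PC"] + 1) & 0xFFF
    # T2 (assumed): D0..D7 <- decode IR(12-14), AR <- IR(0-11), I <- IR(15)
    opcode = (state["IR"] >> 12) & 0x7
    state["D"] = [1 if k == opcode else 0 for k in range(8)]
    state["AR"] = state["IR"] & 0xFFF
    state["I"] = (state["IR"] >> 15) & 1


memory = [0] * 4096
memory[0x010] = 0xA050          # I = 1, opcode 2, address 0x050
state = {"PC": 0x010}
fetch_and_decode(state, memory)
print(hex(state["PC"]), hex(state["AR"]), state["I"], state["D"].index(1))  # 0x11 0x50 1 2
```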
The logic to implement this will look something like the following:
At time T0, we must move the PC into the AR; thus, the bus select inputs must be 010, and we must enable the LD input of AR. The transfer (PC onto the bus, bus into AR) occurs at the next clock pulse, which also advances SC to 1.
At time T1, we must move memory into the IR (bus select 111, LD input of IR enabled); also, we must increment PC (INR input of PC enabled). The next clock pulse triggers both actions, and advances SC to 2.
Note that in all cases there will be other control functions affecting the bus select inputs and register control lines, hence the extra input lines to the OR gates.
We feed IR(12-14) into a 3x8 decoder, which produces outputs D0 through D7. These control signals will directly indicate the opcode to be executed.
There are four basic paths through the instruction execution portion of the
CU logic: register; i/o; memory direct; and memory indirect.
The path is chosen based on the opcode and I values. Memory reference instructions differ only in the determination of the effective address for the operand. For indirect references, one additional clock pulse is required to load the contents of the indirect word (the operand's actual address) into AR, so that the operand retrieval logic will find the final EA in AR. This is handled by either doing that move at time T3, or doing nothing at all (for direct addressing).
Thus, we will execute particular instructions under the following
conditions:
Each instruction's micro-op sequence ends with SC ← 0 to reset the timer to T0; this moves the CU back to the fetch phase after the execution of the previous instruction.
If D7 is true, the opcode must be 111, and thus we have either a register or an I/O instruction. If I is 0, the combination D7 I' T3 indicates a register instruction; we represent this in a control function as r (r = D7 I' T3).
Register instructions use the
field to specify the particular
operation to be performed.
All these operations are implemented in terms of the basic arithmetic and
logic operations of the ALU, or the control inputs of the various
registers, as follows:
S is a one-bit register which controls the INR input of SC; initially, it contains 1 (causing SC to count time units), but changing it to 0 effectively prevents the computer from moving beyond the current time unit, therefore halting it.
Indirect addressing has already been handled by the time T4 arrives (for indirect references, at time T3 we executed the micro-op AR ← M[AR], which moved the contents of the indirect word into the AR), so the AR contains the final EA of the operand. To retrieve the operand, then, we need to perform DR ← M[AR] for those instructions which require the operand. These are AND, ADD, LDA, and ISZ; thus, the full RTL expression is (D0 + D1 + D2 + D6)T4: DR ← M[AR].
Individual instructions are implemented as follows:
The BSA instruction copies the return address into the location specified by the operand, and then effects a transfer to the word following that one by incrementing AR and then moving it into the PC (which actually causes the transfer of control).
This means that each subprogram must begin with a data word; for a
subprogram which is at location 100, the actual first instruction of the
subprogram must be at 101, as 100 will be used to hold the return address.
Returning from the subprogram is done with a BUN instruction whose operand
is the location containing the return address.
The operand is referenced indirectly - that is, the BUN refers to the return address data word, but has the I bit turned on; this means that the CU will interpret the return address word as an indirect word which actually contains the final EA of the operand. That will cause the actual return address to be used as the final EA; the BUN moves that into the PC, which effects the return.
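The call and return mechanism can be traced with a small model. The split of BSA into a T4 step (M[AR] ← PC, AR ← AR + 1) and a T5 step (PC ← AR) is an assumption following the usual formulation; `bsa` and `bun_indirect` are hypothetical helpers, and AR is assumed to already hold the effective address when `bsa` runs.

```python
def bsa(state: dict, memory: list[int]) -> None:
    """BSA, with AR already holding the effective address."""
    # T4: M[AR] <- PC, AR <- AR + 1
    memory[state["AR"]] = state["PC"]
    state["AR"] = (state["AR"] + 1) & 0xFFF
    # T5: PC <- AR (the transfer of control); SC <- 0 would end the instruction
    state["PC"] = state["AR"]


def bun_indirect(address: int, state: dict, memory: list[int]) -> None:
    """BUN with I = 1: the word at `address` holds the final EA."""
    state["AR"] = memory[address] & 0xFFF  # T3: AR <- M[AR] (indirect)
    state["PC"] = state["AR"]              # T4: PC <- AR


memory = [0] * 4096
# Caller at location 20 executes BSA 100; PC has already been incremented to 21.
state = {"PC": 21, "AR": 100}
bsa(state, memory)
print(memory[100], state["PC"])  # 21 101 (return address saved, subprogram starts at 101)
bun_indirect(100, state, memory)
print(state["PC"])               # 21 (back in the caller)
```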
The ISZ instruction requires the longest timing sequence - seven units. We could, therefore, get by with only a three-bit sequence counter (counting T0 through T7); however, we'd still need to implement SC ← 0 at the end of each sequence.
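The seven-unit count can be traced in a sketch of ISZ's execute phase. The T4 to T6 micro-ops (DR ← M[AR]; DR ← DR + 1; M[AR] ← DR, and if DR = 0 then PC ← PC + 1, SC ← 0) are assumed from the usual formulation of this machine; `isz` is a hypothetical helper that also returns the timing slots used.

```python
def isz(state: dict, memory: list[int]) -> list[str]:
    """ISZ with AR holding the effective address; returns the timing slots used."""
    steps = ["T0", "T1", "T2", "T3"]          # fetch, decode, (indirect) EA
    state["DR"] = memory[state["AR"]]         # T4: DR <- M[AR]
    steps.append("T4")
    state["DR"] = (state["DR"] + 1) & 0xFFFF  # T5: DR <- DR + 1
    steps.append("T5")
    memory[state["AR"]] = state["DR"]         # T6: M[AR] <- DR,
    if state["DR"] == 0:                      #     if DR = 0 then PC <- PC + 1,
        state["PC"] = (state["PC"] + 1) & 0xFFF
    steps.append("T6")                        #     SC <- 0
    return steps


memory = [0] * 4096
memory[200] = 0xFFFF  # -1: incrementing gives 0, so the next instruction is skipped
state = {"AR": 200, "PC": 51}
steps = isz(state, memory)
bits_needed = (len(steps) - 1).bit_length()
print(len(steps), state["PC"], bits_needed)  # 7 52 3
```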
Also, having extra timing slots allows us to expand the operation of the computer - we could, for instance, increase the word size to allow for a larger opcode field; we could then add an instruction which might require more than seven time slots to execute.
Our computer must also have i/o and interrupt control instructions. For the sake of time, we won't discuss them here, but will present them in later discussions.
Here is the full control function/micro-op description of the computer:
The basic computer consists of these components:
We need control signals for the following:
control inputs of nine registers
read and write inputs of memory
set, clear, and complement the seven flip-flops
bus selection inputs
control inputs of arithmetic/logic circuits
Design of registers requires that we scan the control table to locate all
micro-ops which modify the contents of that register.
The
,
,
and
functions are implemented as follows:
is modified in only one situation:
Micro-op:
is a bit more complicated:
From this, we can derive:
Implementation of
control logic:
Implementation of
and ALU connections:
Connections to
would be designed in a similar fashion.