RIT

Computer Architecture: Design Using RTL[1]

version 1.2

Copyright © Department of Computer Science Rochester Institute of Technology
All Rights Reserved

1. Introduction to RTL

This document is a quick introduction to (or a brief review of) RTL, the Register Transfer Language, which is used to describe CPU organization in high-level terms. It is not a complete treatment of RTL, but should be sufficient to introduce you to its major elements. For a more complete description, consult one of many digital design textbooks; a good one is Computer System Architecture, M. Morris Mano, Third Edition, Prentice-Hall, 1993.

2. Basic Definitions

We begin with some definitions of terms we'll use:

* A digital system is a collection of digital hardware modules, each of which has been designed (using combinational and sequential synthesis techniques) to perform a specific task. Examples of modules are registers, counters, arithmetic elements, etc.
* Modules are interconnected via common data paths (routes on which information is moved) and control paths (routes on which control signals are moved). Data paths are typically just ``wires''; control paths are often connections between control logic (combinational and/or sequential) which generates control signals and modules which need those signals in order to operate.
* Each module is capable of performing one or more simple operations - e.g., load a value, increment, decrement, or shift. These are called micro-operations, or just micro-ops.
* Digital modules (often just called ``registers'', for simplicity) are defined by their information contents and the set of micro-ops they perform.

To fully describe a digital system, we must specify:

  1. Its collection of modules, and each module's registers and functions;
  2. The sequences of micro-ops performed on the contents of modules; and
  3. The control functions which cause the micro-op sequences to be performed.

We will use RTL expressions in combination with block-level architectural drawings to do this.

3. Register Transfer Language

RTL expressions are made up of elements which describe the registers being manipulated, and the micro-ops being performed on them.

Here are the basic components of RTL expressions:

[Image not available: table of the basic components of RTL expressions]

Note that these are flexible - e.g., parens could be used to indicate bit positions in registers.

3.1. Register Representation

Registers are named. Register names consist of a sequence of uppercase alphabetic characters, possibly followed by one or more numeric characters. Sometimes we'll refer to the entire contents of registers; at other times we'll just refer to portions of them.

Registers come in many different forms. We represent them with boxes in architectural drawings, and label them with their RTL names. Sometimes we'll label portions of registers, or indicate their bit positions. Here are some examples:

[Image not available: example register diagrams]

3.2. Common Micro-Ops

Micro-ops come in four essential varieties: Data transfer, Arithmetic, Logic/bit manipulation, and Shift. We'll discuss each in turn.

3.2.1. Micro-ops - Transfer

Transfer micro-ops are used to move (actually, copy) information from one place to another within the computer. This transfer can occur in one of several ways:

Parallel

Parallel transfer is typically used for transfers between registers. Example: transfer the entire contents of one register into another on one clock pulse (a transfer which can inherently be done in parallel):

[Image not available: RTL expression and diagram for the parallel transfer]

Sometimes, we want to control when the transfer occurs; we can do this by structuring the RTL expression to indicate the controlling condition:

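[Image not available: RTL expression for the conditional transfer]

This is often written with a control function specifying when the load takes place. In general, control functions are combinational (Boolean) expressions which are implemented using combinational synthesis. In this example, we might use:

[Image not available: the transfer written with a control function]

We can also indicate that combinations of transfers are to be performed:

[Image not available: RTL expression combining several transfers]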
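As a rough illustration of a controlled parallel transfer, the following Python sketch models registers as integers and applies a set of transfers on one clock pulse only when the control condition is true. The register names `A` and `B`, the function name `clock_pulse`, and the dictionary representation are assumptions made for this example, not part of RTL itself.

```python
def clock_pulse(regs, control, transfers):
    """Apply all (dest, src) transfers at once if the control condition holds."""
    if not control:
        return dict(regs)      # control function is 0: no register changes
    old = dict(regs)           # every source is sampled before any write
    new = dict(regs)
    for dest, src in transfers:
        new[dest] = old[src]
    return new

regs = {"A": 0x3C, "B": 0x00}
held = clock_pulse(regs, control=False, transfers=[("B", "A")])
loaded = clock_pulse(regs, control=True, transfers=[("B", "A")])
swapped = clock_pulse(regs, control=True, transfers=[("A", "B"), ("B", "A")])
```

Sampling all sources before writing is what lets a combination of transfers, such as the swap in the last line, behave as it would in hardware, where every register loads on the same clock edge.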

Serial

Serial transfer is used to specify that a collection of bits are to be moved, but that the transfer is to occur one bit at a time. We sometimes use this to move data using shift registers; the transfer is done on successive clock pulses, with a circular shift (rotate) register as the source, and a shift register as the destination:

[Image not available: RTL expression and diagram for the serial transfer]
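The serial transfer described above can be sketched in Python as one loop iteration per clock pulse. This is only an illustrative model: the shift direction (right), the 4-bit width, and the function name `serial_transfer` are assumptions chosen for the example.

```python
def serial_transfer(src, dst, width):
    """Move src into dst one bit per clock pulse; src rotates, so it is preserved."""
    mask = (1 << width) - 1
    for _ in range(width):
        bit = src & 1                                        # bit leaving the source
        src = ((src >> 1) | (bit << (width - 1))) & mask     # circular shift right
        dst = ((dst >> 1) | (bit << (width - 1))) & mask     # shift right, bit enters at the top
    return src, dst

src, dst = serial_transfer(0b1011, 0b0000, 4)
```

After `width` pulses the destination holds a copy of the source, and because the source is rotated rather than shifted, it ends up with its original contents.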

Bus

Parallel and serial transfers are hard to implement when the number of possible transfers goes up (e.g., with eight registers and the ability to transfer from any one to any other, there are 56 possible transfers). Often, we'll use a bus to simplify this. A bus consists of a set of parallel data lines, equal in number to the number of bits which are to be transferred; inherently, this is a parallel transfer:

[Image not available: registers connected by a common bus]

To transfer data using a bus: connect the output of the source register to the bus; connect the input of the target register to the bus; when the clock pulse arrives, the transfer occurs. You can even transfer the source to multiple targets by connecting the bus to the inputs of each target, but the controlling logic gets a bit complex.

Connections between the registers and the bus can be made in several ways. The two most common are multiplexor banks whose outputs are the bus lines themselves (common select inputs pass the data bits of the chosen register onto the bus), and tri-state buffers which connect the register outputs to the bus lines (these allow us to connect bit i of every register's output to bus line i, while control lines determine which of those connections is ``active''). On the other end of the bus, routing the bus contents into one of a set of registers is trivial - just connect the bus lines to the corresponding inputs of all the registers, and use a decoder to drive the registers' load control inputs.
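To make the bus idea concrete, here is a minimal Python sketch. It first counts the point-to-point transfers needed without a bus, then models a bus transfer as a multiplexer selecting the source and a decoder enabling one target's load input. The function names and the list-of-integers register model are assumptions for illustration.

```python
def possible_transfers(n):
    """Number of ordered (source, target) pairs among n registers."""
    return n * (n - 1)

def bus_transfer(regs, select, load_index):
    """Multiplexer puts regs[select] on the bus; a decoder enables one load input."""
    bus = regs[select]                                     # mux output drives the bus lines
    load = [i == load_index for i in range(len(regs))]     # one-hot decoder output
    return [bus if enabled else r for r, enabled in zip(regs, load)]

count = possible_transfers(8)
after = bus_transfer([10, 20, 30, 40], select=2, load_index=0)
```

With eight registers, `count` reproduces the 56 transfers mentioned earlier, while the bus version needs only one select value and one load index per transfer.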

We represent bus transfers in two ways in RTL expressions:

Explicitly:

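[Image not available: RTL expression for an explicit bus transfer]

Implicitly (assumes existence of bus):

[Image not available: RTL expression for an implicit bus transfer]
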
Memory

While memory transfers are similar to register transfers, we usually identify them differently. Specifically, memory to register transfers are called read operations, while register to memory transfers are called write operations. Both require specification of the memory location to be used (which can be done through a special register or a special bus) and a storage location which will hold the result of a read or which holds the data to be written.

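RTL expressions for a read operation, assuming the use of an address register:

[Image not available: RTL expressions for a memory read]

RTL expressions for a write operation, assuming use of a data register:

[Image not available: RTL expressions for a memory write]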

3.2.2. Micro-ops - Arithmetic and Logic

Minimally, a CPU typically provides addition, subtraction, increment, and decrement operations in its ALU (arithmetic-logic unit). It may also provide multiplication and division (which involve more complex hardware); these two are often implemented as collections of operations rather than as single micro-ops, so we won't discuss them here. Often, we find that subtraction is implemented as addition of the two's complement of the second operand. This allows both addition and subtraction to be implemented with just an adder.

Here are some common arithmetic operations which we might find in a simple ALU:

[Image not available: table of common arithmetic micro-ops]

Implementation of these can be through separate circuits (e.g., addition/subtraction through one circuit, complements through a second, increment/decrement through an adder with inputs +1 and -1, etc.), or with a single arithmetic circuit which provides a collection of possible results, one of which is selected through a control input.
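The single-circuit option can be sketched as one adder whose inputs are chosen by a control input. The Python below is a simplified model under stated assumptions: a 16-bit width, string-valued control inputs, and the function names `adder` and `arithmetic_unit` are all chosen for this example.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def adder(a, b, carry_in):
    """Model a WIDTH-bit parallel adder; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH

def arithmetic_unit(op, a, b=0):
    """Select one result with a control input; every case reuses the same adder."""
    if op == "add":
        return adder(a, b, 0)
    if op == "sub":                    # a + (one's complement of b) + 1
        return adder(a, ~b & MASK, 1)
    if op == "inc":
        return adder(a, 0, 1)
    if op == "dec":                    # adding all ones is the same as adding -1
        return adder(a, MASK, 0)
    raise ValueError(f"unknown operation: {op}")

results = {op: arithmetic_unit(op, 25, 7)[0] for op in ("add", "sub", "inc", "dec")}
```

The subtraction case is the two's complement trick mentioned above: complementing the second operand and setting the carry-in to 1 adds its two's complement.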

Logic micro-ops are like arithmetic, but treat each bit of the register(s) separately. Commonly, computers implement a basic set of logic operations (AND, OR, NOT, XOR); we can even use the common symbols for some of them (e.g., for one's complement and XOR). However, we use alternate symbols for AND and OR to avoid confusion with arithmetic operations (AND is ∧, OR is ∨). Thus, in

[Image not available: RTL expression mixing a control function, an addition, and a logic OR]

the first + is an OR in a Boolean control function, the second + is the arithmetic ADD operation, and the ∨ is a logic OR operation.

For two input variables, there are four possible combinations of input values, and thus four outputs which must be specified. For each output, there are two possible results; thus, there are sixteen (2^4) possible output combinations for two input variables, with values 0000 through 1111. We identify these as F_i, where i is the result pattern interpreted as an unsigned binary number; thus, 0000 is F_0, 1010 is F_10, and 1111 is F_15.

Because all of these can be generated by combining one or both of the input values with AND, OR, XOR, and NOT micro-ops in some fashion, we often find that only those four operations are actually implemented:

One bit:

[Image not available: one-bit logic circuit]

Function table:

[Image not available: function table for the logic circuit]
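The numbering of the sixteen two-input functions can be checked with a short Python sketch. It assumes the truth-table rows are listed in the order (0,0), (0,1), (1,0), (1,1) and that the first row supplies the most significant bit of the result pattern; that ordering, and the helper name `function_index`, are assumptions for this example.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))    # (0,0), (0,1), (1,0), (1,1)

def function_index(f):
    """Read f's four outputs as a 4-bit unsigned number (first row is the MSB)."""
    index = 0
    for x, y in INPUTS:
        index = (index << 1) | (f(x, y) & 1)
    return index

named = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "NOT x": lambda x, y: 1 - x,
}
indices = {name: function_index(f) for name, f in named.items()}
total_functions = 2 ** len(INPUTS)
```

Under that ordering, AND has the result pattern 0001, OR has 0111, XOR has 0110, and the complement of the first input has 1100.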

3.2.3. Micro-ops - Shift

Finally, we have the shift micro-ops. As the name implies, these move the information in a register by one bit position. Shifts come in three varieties: logical (the value shifted in is a 0); arithmetic (the value shifted in is 0 for left shift, the duplicate of the high-order bit for right shift; maintains sign of resulting number); and circular (the value shifted in is whatever was shifted out of the other end). Here are the corresponding RTL micro-ops:

[Image not available: table of shift micro-ops]
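The three shift varieties can be compared side by side in Python. This is a sketch for an 8-bit register; the width and the function names are assumptions chosen for the example, and the arithmetic left shift is omitted because it shifts in a 0 just as the logical left shift does.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
MSB = 1 << (WIDTH - 1)

def logical_left(r):
    return (r << 1) & MASK                          # 0 enters at the right

def logical_right(r):
    return r >> 1                                   # 0 enters at the left

def arithmetic_right(r):
    return (r >> 1) | (r & MSB)                     # high-order bit is duplicated

def circular_left(r):
    return ((r << 1) & MASK) | (r >> (WIDTH - 1))   # bit shifted out re-enters at the right

def circular_right(r):
    return (r >> 1) | ((r & 1) << (WIDTH - 1))      # bit shifted out re-enters at the left

r = 0b10010011
shifted = {f.__name__: f(r) for f in (logical_left, logical_right,
                                      arithmetic_right, circular_left, circular_right)}
```

For this particular value the arithmetic and circular right shifts coincide, because the bit shifted out and the sign bit are both 1.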

4. Using RTL to Represent a Simple Computer

Next, we'll look at the use of RTL and architectural diagrams to represent the structure and operation of a simple computer system. The processor we consider will be a simple accumulator-based system having a 16-bit word and 4096 (2^12) words of memory.

4.1. Instruction Representation

Because our word size is 16 bits, instructions must fit within that space; as we need 12 bits to represent a memory address, we have only 4 bits remaining to indicate the opcode and any other per-instruction information we need. We'll use a 3-bit opcode, and use the remaining bit to distinguish between direct and indirect memory addressing.

Here is our basic instruction layout:

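[Image not available: instruction layout showing the indirect bit, the 3-bit opcode, and the 12-bit address field]

When the indirect bit is 0, the value in the address field is the actual address of the operand (direct addressing). When the indirect bit is 1, the address field contains the address of an indirect word, which in turn will contain the actual operand address (indirect addressing).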
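The two addressing modes can be sketched in Python. The bit positions used here (indirect bit in bit 15, opcode in bits 14-12, address in bits 11-0) are an assumption for illustration, since only the field widths are given in the text; the function names and the dictionary memory model are also hypothetical.

```python
def decode(word):
    """Split a 16-bit instruction word into (indirect, opcode, address)."""
    indirect = (word >> 15) & 0x1     # assumed position of the indirect bit
    opcode = (word >> 12) & 0x7       # assumed position of the 3-bit opcode
    address = word & 0xFFF            # low 12 bits hold the address
    return indirect, opcode, address

def effective_address(word, memory):
    """Direct: the address field itself. Indirect: the word stored at that address."""
    indirect, _, address = decode(word)
    return memory[address] & 0xFFF if indirect else address

memory = {0x300: 0x0456}
direct_ea = effective_address(0x1300, memory)      # indirect bit is 0
indirect_ea = effective_address(0x9300, memory)    # indirect bit is 1
```

Both instruction words carry the same address field; only the indirect bit decides whether location 0x300 holds the operand or the operand's address.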

4.2. Register Structure

Our simple machine will have the following registers. Sizes are determined directly from the intended use of that register - e.g., a register which only holds addresses will be the size of an address.

[Image not available: table of registers and their sizes]

To simplify the connection logic between the registers, we'll use a common bus structure. The bus will be 16 bits wide (size of a word of memory); we'll need to connect six registers and memory to it, so we'll need three selection inputs to the bus controller logic.

[Image not available: common bus structure]

This structure allows us to move information between the major elements of the CPU. Registers are connected to the bus as follows:

[Image not available: table of register connections to the bus]

The ALU takes two 16-bit inputs and one 8-bit input, and produces 17 output bits (one to a one-bit register, 16 to AC). ALU inputs come from AC (allows AC complementation, and use of AC contents in arithmetic/logic/shift operations), from the data register (ditto), and from the input register (input is moved through the ALU into AC). This design also allows for multiple parallel hardware operations - e.g., information can be moved onto/from the bus while the ALU is performing an operation.

4.3. Instruction Set

One potential problem with our instruction word is that there are only three bits available for the opcode. Normally, this would limit us to a total of eight instructions, which is too few to provide a meaningful set of operations. However, we can take advantage of the fact that not all instructions require the use of a memory operand. If we design the instruction decoding logic in the CU properly, we can arrange to use a single opcode for all these instructions, and use the 12 bits in the address field to distinguish between the operations. We can even use the indirect bit here - as there is no memory operand, using indirect addressing doesn't make any sense, so we can use that additional bit to help extend the instruction set.

We'll use opcode 7 to represent all the ``extended'' instructions, leaving opcodes 0 through 6 for the ``memory reference'' instructions. Further, we'll use the indirect bit to distinguish between ``register'' instructions and ``i/o'' instructions.

[Image not available: instruction set table]

Although our simple instruction set is complete, it is fairly inefficient - there are no subtract, multiply, divide, or other similar operations. All of these operations can be performed by combining the existing instructions in different sequences (for example, subtraction by moving the second operand to the accumulator, using CMA and INC on it, and then using ADD); while a more complete set would have made programming easier, it would also have increased the complexity of the machine to the point where it would no longer be easy to understand.
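The subtraction sequence mentioned above can be traced with a small Python sketch of an accumulator. The semantics given to LDA, CMA, INC, and ADD here (load, complement, increment, and add to the accumulator) are the usual meanings of those mnemonics and are assumed for this example; the `run` function, the program encoding, and the memory addresses are hypothetical.

```python
MASK = 0xFFFF

def run(program, memory):
    """Execute a list of (mnemonic, address) pairs on a 16-bit accumulator."""
    ac = 0
    for op, addr in program:
        if op == "LDA":
            ac = memory[addr]                   # load accumulator from memory
        elif op == "CMA":
            ac = ~ac & MASK                     # one's complement of the accumulator
        elif op == "INC":
            ac = (ac + 1) & MASK                # increment the accumulator
        elif op == "ADD":
            ac = (ac + memory[addr]) & MASK     # add a memory word to the accumulator
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return ac

memory = {0x100: 25, 0x101: 7}
program = [("LDA", 0x101), ("CMA", None), ("INC", None), ("ADD", 0x100)]
difference = run(program, memory)
```

Four instructions are needed to compute 25 - 7, which illustrates the trade-off between a small instruction set and program length.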

4.4. The Control Unit

We next need to consider the structure of the CU. In essence, the CU is a large sequential circuit. It must be able to sequence the execution of micro-ops, as well as fetch, decode, and execute instructions. All CPU registers are controlled by a single master clock. While all registers are connected to this clock, none will change state unless the other control inputs are present to initiate the change at the arrival of a clock pulse.

The two major data sources on which CU decisions are based are decoded information from the instruction being executed, and timing information which allows the sequencing of micro-ops to support instruction execution. Instruction decoding is handled quite easily. Once an instruction is moved into the instruction register, the opcode can be taken from that register and fed into a 3x8 decoder to generate a separate output signal for each of the eight possible opcode values. Similarly, the indirect bit can be moved into a special one-bit register for later use.

Timing signals will be generated by feeding the output of an N-bit sequence counter into an N x 2^N decoder; this will generate 2^N timing signals which will be used in control functions for micro-ops to limit their operation to particular units of time. The number of timing signals required depends on the longest sequence of micro-ops required for any single instruction. We'll use a clearable counter, and add a micro-op to the system's repertoire which ``resets'' the counter to 0; we'll use this at the end of each instruction's micro-op sequence to avoid having the system move through all 2^N timing values, which will speed up execution of instructions.
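The timing generator can be modeled in a few lines of Python. This is a behavioral sketch only; the class name `SequenceCounter` and its method names are assumptions for the example.

```python
class SequenceCounter:
    """N-bit counter whose value is decoded into 2**N one-hot timing signals."""

    def __init__(self, bits):
        self.bits = bits
        self.value = 0

    def timing_signals(self):
        """Decoder output: exactly one signal is 1 at any time."""
        return [int(i == self.value) for i in range(2 ** self.bits)]

    def clock(self, clear=False):
        """Advance on a clock pulse, or return to 0 when clear is asserted."""
        self.value = 0 if clear else (self.value + 1) % (2 ** self.bits)

sc = SequenceCounter(bits=3)
sc.clock()
sc.clock()
at_time_2 = sc.timing_signals()
sc.clock(clear=True)            # end of an instruction: skip the unused time units
after_clear = sc.timing_signals()
```

Clearing the counter after the last micro-op of an instruction returns the machine to the first time unit immediately instead of stepping through the remaining decoder outputs.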

Here's a block diagram of the basic CU, assuming a four-bit counter as the timing sequence generator:

[Image not available: block diagram of the basic CU]

4.5. The Fetch/Execute Cycle

Our CPU's fetch/execute cycle will look like this:

  1. Fetch an instruction from memory.
  2. Increment the program counter so that it points to the next instruction.
  3. Decode the instruction.
  4. If the instruction specifies indirect addressing, retrieve the indirect word from memory.
  5. Execute the instruction.

The fetch and decode phases of the cycle can be specified by the following RTL expressions:

[Image not available: RTL expressions for the fetch and decode phases]

The logic to implement this will look something like the following:

[Image not available: logic diagram for the fetch phase]

At time 0, we must move the program counter into the address register; thus, the bus select inputs must be 010, and we must enable the load input of the address register. The transfer occurs at the next clock pulse, which also advances the sequence counter to 1.

At time 1, we must move memory into the instruction register (bus select 111, load input of the instruction register enabled); also, we must increment the program counter (its increment input enabled). The next clock pulse triggers both actions, and advances the sequence counter to 2.

Note that in all cases there will be other control functions affecting the bus select inputs and register control lines, hence the extra input lines to the OR gates.
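The two fetch time steps can be traced with a brief Python sketch. The register roles used here (`pc` for the program counter, `ar` for the address register, `ir` for the instruction register) and the dictionary-based state are assumptions made for illustration; the sketch models only the register contents, not the bus select or load signals.

```python
def fetch(state, memory):
    """Run the two fetch time steps and return the updated register state."""
    s = dict(state)
    # First time step: copy the program counter into the address register.
    s["ar"] = s["pc"]
    # Second time step: read memory into the instruction register and
    # increment the program counter; both occur on the same clock pulse.
    s["ir"] = memory[s["ar"]]
    s["pc"] = (s["pc"] + 1) & 0xFFF    # 12-bit address wraps at 4096
    return s

memory = {0x020: 0x1300}
state = fetch({"pc": 0x020, "ar": 0, "ir": 0}, memory)
```

After the fetch, the instruction word is available for decoding and the program counter already points at the next instruction.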

4.6. Decoding and Executing the Instruction

We feed the opcode bits of the instruction register into a 3x8 decoder, which produces eight outputs, one for each opcode value (0 through 7). These control signals will directly indicate the opcode to be executed.

There are four basic paths through the instruction execution portion of the CU logic: register; i/o; memory direct; and memory indirect. The path is chosen based on the opcode and the indirect bit. Memory reference instructions differ only in the determination of the effective address for the operand. For indirect references, one additional clock pulse is required (to move the address held in the indirect word to the address register, so that the actual operand retrieval logic will find the final EA there). This is handled by either doing that move in the time unit reserved for it, or doing nothing at all. Thus, we will execute particular instructions under the following conditions:

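[Image not available: table of conditions under which each class of instruction executes]

* Note that the ``execution'' steps all end by clearing the sequence counter, which resets the timer to 0; this moves the CU back to the fetch phase after the execution of the previous instruction.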

4.6.1. Instructions - Register Reference

If the decoder output for opcode 7 is true, the opcode must be 111, and thus we have either a register or an I/O instruction. If the indirect bit is also 0, the instruction is a register instruction; the control function for register instructions combines these two conditions.

Register instructions use the address field to specify the particular operation to be performed. All these operations are implemented in terms of the basic arithmetic and logic operations of the ALU, or the control inputs of the various registers, as follows:

[Image not available: table of register-reference instructions and their micro-ops]

A one-bit register controls the counting input of the sequence counter; initially, it contains 1 (causing the sequence counter to count time units), but changing it to 0 effectively prevents the computer from moving beyond the current time unit, therefore halting it.

4.6.2. Instructions - Memory Reference

Indirect addressing has already been handled by the time execution begins (for an indirect reference, we have already executed the micro-op which moved the indirect address into the address register), so the address register contains the final EA of the operand.

To retrieve the operand, then, we need to read memory at that address for those instructions which require the operand. These are AND, ADD, LDA, and ISZ; thus, the full RTL expression is

[Image not available: RTL expression for the operand fetch]

Individual instructions are implemented as follows:

[Image not available: table of memory-reference instructions and their micro-ops]

The BSA instruction copies the return address into the address specified by the operand, and then effects a transfer to the word following that one by incrementing the address register and then moving it into the program counter (which actually causes the transfer of control). This means that each subprogram must begin with a data word; for a subprogram which is at location 100, the actual first instruction of the subprogram must be at 101, as 100 will be used to hold the return address.

Returning from the subprogram is done with a BUN instruction whose operand is the location containing the return address. The operand is referenced indirectly - that is, the BUN refers to the return address data word, but has the indirect bit turned on; this means that the CU will interpret the return address word as an indirect word which actually contains the final EA of the operand. That will cause the actual return address to be used as the final EA; the BUN moves that into the program counter, which effects the return.
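The call and return mechanism can be followed in a short Python sketch. It models only the effect of the two instructions on memory and on the program counter; the function names `bsa` and `bun_indirect`, the dictionary memory, and the example addresses are assumptions for illustration.

```python
def bsa(pc, memory, target):
    """Store the return address at target, then continue at target + 1."""
    memory[target] = pc
    return (target + 1) & 0xFFF

def bun_indirect(memory, target):
    """Indirect branch: the word at target holds the address to jump to."""
    return memory[target] & 0xFFF

memory = {}
return_address = 21                                  # instruction following the call
pc_in_subprogram = bsa(return_address, memory, 100)  # subprogram's data word is at 100
pc_after_return = bun_indirect(memory, 100)
```

Because the return address is stored in the subprogram's own first word, a second call made before the first returns would overwrite it; this simple scheme therefore does not support recursion.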

The ISZ instruction requires the longest timing sequence - seven units. We could, therefore, get by with only a three-bit sequence counter, which provides eight time units; however, we'd still need to reset the counter at the end of each sequence.

Also, having extra timing slots allows us to expand the operation of the computer - we could, for instance, increase the word size to allow for a larger opcode field; we could then add an instruction which might require more than seven time slots to execute.

4.6.3. Instructions - I/O and Interrupt

Our computer must also have i/o and interrupt control instructions. For the sake of time, we won't discuss them here, but will represent them in later discussions.

4.7. The Complete Machine

Here is the full control function/micro-op description of the computer:

[Image not available: full table of control functions and micro-ops]

4.8. Design

The basic computer consists of these components:

[Image not available: list of components of the basic computer]

We need control signals for the following:

control inputs of nine registers
read and write inputs of memory
set, clear, and complement the seven flip-flops
bus selection inputs
control inputs of arithmetic/logic circuits

4.8.1. Design of Control Logic

Designing a register requires that we scan the control table to locate all micro-ops which modify the contents of that register. The register's three control functions are implemented as follows:

[Image not available: implementation of the register control functions]

4.8.2. Design of TR

* TR is modified in only one situation:

Micro-op:

[Image not available: the micro-op which modifies TR]

Implementation diagram:

[Image not available: TR implementation diagram]

4.8.3. Design of AC

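AC is a bit more complicated:

[Image not available: micro-ops which modify AC]

From this, we can derive:

[Image not available: control functions derived for AC]

Implementation of AC control logic:

[Image not available: AC control logic diagram]

Implementation of AC and ALU connections:

[Image not available: AC and ALU connection diagram]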

Connections to .gif would be designed in a similar fashion.


January 23, 1999 at 6:26 PM