RISC Design Issues

The major issues in RISC design can be roughly summarized as follows:

  • Analyze the applications to identify the key operations;
  • Design an optimal data path to execute these key operations;
  • Using the devised optimal data path, design appropriate instructions;
  • Add new instructions only if they do not slow down the machine.

Finally, the same process should be repeated for the other resources within the CPU, such as cache memory, the MMU, the FPU, and co-processors.

RISC Instruction Set

  • The call instruction, in essence the procedure call, is probably the most time-consuming operation in a compiled high-level language program. Different RISC processors, however, implement it efficiently in many different ways within the constraints of their underlying architectures. Two significant aspects of this instruction are handled differently from one RISC processor to another: the number of parameters and variables (both local and global) that a procedure deals with, and the depth of nesting.
  • Branch instructions are available in the instruction set of all RISC processors, but they differ from one processor to another in the features they provide. They are always designed to speed up execution on the underlying pipeline organisation, with the assistance of a compiler that optimizes the generated object code at compile time.

  • Read and write instructions are normally used to read and write the special registers.
  • Save and restore instructions are sometimes provided as separate instructions in some RISC processors. They usually manipulate only the register window and the stack pointer.

Apart from these common ones, there are several other instructions, including some special ones, that differ in both nature and implementation from one RISC processor to another.

RISC Instruction Format

The basic instruction format used in a generic RISC machine, as shown in Figure 9.1, consists of:

  1. A 7-bit opcode
  2. Two 5-bit register fields (DEST and SOURCE)
  3. A mode bit (I): I = 0, not immediate; I = 1, immediate
  4. A condition code bit (C): C = 0, don't set condition code; C = 1, set condition code
  5. A 13-bit OFFSET field

For an ordinary instruction such as ADD, the operands depend on the I bit. If I = 0, this is called register addressing: one operand is taken from the source register, the second operand is taken from the register specified by the low-order 5 bits of the OFFSET field, and the result is stored in the destination register. If I = 1, the second operand is a 13-bit constant, giving immediate addressing.
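The field extraction described above can be sketched in Python. This is a minimal illustration, not a definitive layout: the bit order (opcode, C, DEST, SOURCE, I, OFFSET from high to low) is an assumption, since Figure 9.1 is not reproduced here, and the names `Instruction`, `encode`, `decode` and `second_operand` are hypothetical. The 13-bit constant is treated as unsigned for simplicity.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int  # 7 bits
    c: int       # 1 bit: set condition code if 1
    dest: int    # 5 bits
    source: int  # 5 bits
    i: int       # 1 bit: immediate mode if 1
    offset: int  # 13 bits


def encode(opcode, c, dest, source, i, offset):
    """Pack the six fields into one 32-bit word (assumed bit order)."""
    return ((opcode & 0x7F) << 25 | (c & 1) << 24 | (dest & 0x1F) << 19
            | (source & 0x1F) << 14 | (i & 1) << 13 | (offset & 0x1FFF))


def decode(word):
    """Split a 32-bit word back into its fields."""
    return Instruction(
        opcode=(word >> 25) & 0x7F,
        c=(word >> 24) & 1,
        dest=(word >> 19) & 0x1F,
        source=(word >> 14) & 0x1F,
        i=(word >> 13) & 1,
        offset=word & 0x1FFF,
    )


def second_operand(instr, regs):
    """I = 1: the 13-bit constant; I = 0: register named by the low 5 OFFSET bits."""
    if instr.i:
        return instr.offset
    return regs[instr.offset & 0x1F]
```

The six field widths sum to 7 + 1 + 5 + 5 + 1 + 13 = 32 bits, so every instruction fits in one fixed-size word.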

RISC Addressing Mode

The addressing modes follow from two facts: the low-order 5 bits of the OFFSET field specify a register (2^5 = 32 registers), and Register 0 is hardwired to the constant 0.

  • Indexed addressing: The OFFSET is added to the source register to form the effective memory address.
  • Register indirect: If OFFSET is 0, indexed addressing reduces to register indirect addressing.
  • Direct addressing: If Register 0 (hardwired to 0) is specified as the source register, the 13-bit OFFSET field alone gives the address. This allows direct addressing of the bottom 8K (2^13) of memory, which is useful for accessing global variables.
  • Other modes: They can be constructed at runtime by building an address in a register and then using register indirect or indexed addressing.
  • PC-relative conditional JUMP: This is realized by concatenating the three low-order fields (13 + 1 + 5 = 19 bits) to form a 19-bit signed offset; the DEST field then specifies the condition.

Figure 9.1 Generalized RISC instruction format.
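The address arithmetic behind these modes can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the order in which the three low-order fields are concatenated (SOURCE, I, OFFSET from high to low) is an assumption, and so is the choice to add the offset to the address of the jump itself.

```python
def read_reg(regs, n):
    """Register 0 always reads as 0, whatever the list holds at index 0."""
    return 0 if n == 0 else regs[n]


def effective_address(regs, source, offset):
    """Indexed addressing: source register plus the 13-bit OFFSET.

    offset == 0 -> register indirect
    source == 0 -> direct addressing of the bottom 8K of memory
    """
    return read_reg(regs, source) + (offset & 0x1FFF)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def jump_target(pc, source, i, offset):
    """PC-relative jump: 5 + 1 + 13 bits joined into a 19-bit signed offset."""
    raw = (source & 0x1F) << 14 | (i & 1) << 13 | (offset & 0x1FFF)
    return pc + sign_extend(raw, 19)
```

With `regs[4] = 0x1000`, calling `effective_address(regs, 4, 8)` is indexed addressing, `effective_address(regs, 4, 0)` is register indirect, and `effective_address(regs, 0, 0x1FFF)` reaches the top of the directly addressable 8K region.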

Register Windows: The Large Register File

RISC architecture, by virtue of its guiding philosophy, always provides a large number of physically small registers that form register files. One of the main objectives of using such a large set of registers is to hold the most frequently accessed operands in registers, thereby reducing as far as possible the number of accesses to slower main memory and so improving performance.

Registers are mostly used to hold the operands of executing programs (procedures). When control passes to another procedure on a procedure call, all the registers used by the calling procedure need to be saved so that they can be reused. On return from the called procedure, the saved registers need to be restored (loaded back into the corresponding registers) so that the calling procedure can continue from the point where it left off. This saving and restoring is time-consuming and often keeps overall performance below the desired level.

To get rid of this overhead, the large number of registers available in the processor is logically divided into multiple small sets, each called a register window, and one window is assigned to each procedure. At any given time, a program sees only the particular register window allocated to the procedure it is executing. A typical procedure call passes only a few parameters, which are adequately accommodated by the small, fixed number of registers in a window. The net effect is that on a procedure call the processor simply switches from one register window to another, rather than saving and restoring all registers individually in memory, thereby avoiding the heavy overhead of that activity.

This concept is illustrated in Figure 9.2. Each register window consists of a fixed number of registers divided into three predefined, fixed-size sections. Consider the procedure at level L. Its in-registers hold the parameters passed down from the procedure that called it (level L - 1), and also hold the results to be passed back up to that caller. Local registers hold the local variables of the procedure. Out-registers are used to exchange parameters and results with the next lower level (level L + 1). Note that the out-registers at one level are physically the same as the in-registers at the next lower level. This use of overlapped register windows, first introduced by the Berkeley RISC architecture, is perhaps one of its most striking features: parameters are passed without any actual movement of data. Except for this overlap, the registers used at two consecutive levels are always physically distinct.


Figure 9.2 Three overlapping register windows.
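The overlap between the out-registers of one level and the in-registers of the next can be sketched with a small model. The sizes used here (8 in, 8 local and 8 out registers per window, 4 windows) are assumptions chosen for illustration, as the text does not fix them, and the class `WindowedRegisterFile` is hypothetical. The model is linear rather than circular, so it simply refuses to go past the last window.

```python
N_IN = N_LOCAL = N_OUT = 8       # assumed section sizes
STRIDE = N_IN + N_LOCAL          # registers that are new at each level


class WindowedRegisterFile:
    def __init__(self, n_windows=4):
        self.n_windows = n_windows
        # Each window adds STRIDE registers; the last one also needs its outs.
        self.phys = [0] * (n_windows * STRIDE + N_OUT)
        self.cwp = 0             # current window pointer (level)

    def _index(self, kind, n):
        if not 0 <= n < 8:
            raise IndexError("register number out of range")
        start = {"in": 0, "local": N_IN, "out": STRIDE}[kind]
        return self.cwp * STRIDE + start + n

    def read(self, kind, n):
        return self.phys[self._index(kind, n)]

    def write(self, kind, n, value):
        self.phys[self._index(kind, n)] = value

    def call(self):
        """Procedure call: move to the next window, no data is copied."""
        if self.cwp + 1 >= self.n_windows:
            raise OverflowError("no free register window")
        self.cwp += 1

    def ret(self):
        """Procedure return: move back to the caller's window."""
        if self.cwp == 0:
            raise IndexError("no caller window")
        self.cwp -= 1
```

A caller writes a parameter into `out` register 0 and calls; the callee then finds the same value in its `in` register 0 because both names map to the same physical register, while the `local` sections of the two levels stay distinct.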

Since the depth of nested procedure activations rarely varies beyond a small range, and is observed to remain bounded over considerable periods of time, the number of register windows provided in the processor design is kept limited. These windows hold only the most recent procedure activations; older activations are saved in memory and restored later at an appropriate time. In an actual implementation, the register windows can be arranged in a circular organisation (as in SPARC processors; see Figure 9.5, available on the website). Two notable examples where this approach has been successfully implemented in practice are the Sun SPARC architecture and the IA-64 architecture used in Intel's Itanium processor.
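The behaviour of a limited set of windows can be sketched at the level of whole activations. This is a simplified model under stated assumptions, not the SPARC mechanism itself: the class `WindowStack` and its counters are hypothetical, each window is treated as an opaque frame, and the overlap between adjacent windows is ignored.

```python
from collections import deque


class WindowStack:
    def __init__(self, n_windows):
        self.n_windows = n_windows
        self.resident = deque()  # activations held in registers, oldest first
        self.memory = []         # older activations saved in memory
        self.spills = 0          # windows saved to memory
        self.fills = 0           # windows restored from memory

    def call(self, frame):
        """Enter a procedure; save the oldest window if none is free."""
        if len(self.resident) == self.n_windows:
            self.memory.append(self.resident.popleft())
            self.spills += 1
        self.resident.append(frame)

    def ret(self):
        """Leave a procedure; restore the caller's window if it was saved."""
        frame = self.resident.pop()
        if not self.resident and self.memory:
            self.resident.append(self.memory.pop())
            self.fills += 1
        return frame
```

As long as the nesting depth stays within `n_windows`, `spills` and `fills` remain zero, which is the case the limited window count is designed for; memory traffic appears only when the call depth exceeds the number of windows.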

Register File and Cache Memory

Register files serve as temporary storage for the most frequently used operands and so appear almost cache-like, but they deliver operands considerably faster. One of the main reasons is that registers are tightly connected to the ALU, whereas on-chip caches are comparatively loosely coupled. Although optimized cache organisation and today's multilevel on-chip caches give the CPU rather fast operation, the two resources serve somewhat different purposes, and which one is more profitable depends on the situation. Each addresses the domain best suited to its inherent characteristics. That is why the choice between a large window-based register file and a cache is not straightforward. Nevertheless, their effective usage can be compared in terms of their capabilities in the different situations commonly encountered during the execution of applications.

A brief comparison of the use of the register file and cache memory in different situations is given on the website: http://routledge.com/9780367255732.

Comparison between RISCs and CISCs

The architectures of RISC and CISC machines belong to two different worlds. The conventional CISC machine has continuously upgraded and enhanced its sophisticated microprogramming to overcome the limitations of its functional components. This offers a highly versatile facility, but at the cost of continuously increasing size and complexity, which in turn made the interpreter steadily bigger and slower and consumed more chip area. This has also adversely affected the design of the chips, which are basically made of silicon transistors with relatively slow switching times. In fact, CISC complexity came to public notice when a design flaw affecting the floating-point division instruction of the Pentium was discovered in 1994. The cost of this bug to Intel, including the replacement of Pentium chips already installed in PCs, was about $475 million.

RISC machines, on the other hand, started from a different perspective. A RISC machine is essentially a computer with a small number of vertical microinstructions that are executed directly by the hardware with no intervening interpreter. Optimizing compilers for RISC machines, in effect, generate this microcode directly, at an acceptable level of quality, with the necessary cooperation of the hardware designers. The fundamental differences between these two classes of machines are summarized in Table 9.1.

The major architectural distinctions between a typical RISC processor and a typical CISC processor are given on the website: http://routledge.com/9780367255732.


Table 9.1 Salient Characteristics of CISC and RISC Architectures

| Characteristic | Complex Instruction Set Computers (CISC) | Reduced Instruction Set Computers (RISC) |
|---|---|---|
| Instruction set size | Large set of instructions. | Small set of instructions, mostly register-based. |
| Instruction formats | Variable formats, 16-64 bits per instruction. | Fixed format. |
| Addressing modes | Normally 12-24. | Limited to 3-5. |
| General-purpose registers (GPRs) | Mostly 8 to 24 GPRs. | Large number of GPRs, ranging from 32 to 192. |
| Cache design and usage | Unified cache; recent trend towards split caches for instructions and data. | Mostly split caches: separate caches for instructions and data. |
| Processor speed (clock rate and CPI) | 33-66 MHz initially, a few GHz or more in recent releases; CPI mostly between 2 and 15. | 33-66 MHz initially, a few GHz or more in recent releases; CPI mostly between 1.5 and 2. |
| Memory references | Many instructions may access slower memory multiple times per instruction. | Memory is accessed only by load/store instructions. |
| Software attributes | Complexity lies in microprogram development. | Complexity lies in compiler optimization. |
| CPU control design | Mostly microcoded, but recent CISC processors also use hardwired control. | Almost entirely hardwired control, without control memory. |
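To see how the clock rate and CPI figures combine, the standard relation CPU time = instruction count × CPI / clock rate can be sketched as below. The instruction counts are hypothetical numbers chosen purely for illustration (a RISC program is assumed to need more, simpler instructions); only the CPI ranges and the 33-66 MHz clock range come from the comparison above.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


# Hypothetical workload at the same 50 MHz clock on both machines.
cisc_seconds = cpu_time(1_000_000, cpi=6.0, clock_hz=50e6)
risc_seconds = cpu_time(1_500_000, cpi=1.5, clock_hz=50e6)
```

Under these assumed counts the RISC program executes 50% more instructions yet still finishes sooner, because its lower CPI more than offsets the larger instruction count. Different assumed counts would give a different outcome.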
