Stack Organisation

One of the useful features that are included in the modern macro-architecture of CPU in most computers is a stack or last-in-first-out (LIFO) list. Block-structured high-level programming languages are normally implemented in such a way that when a procedure or function is exited, the storage it had been using for local variables is temporarily stored and then released, and the easiest way to achieve this goal is by using a data structure called a stack. A stack consists of data items (words, characters, bits, etc.) stored in consecutive order in a portion of a large memory or a stack that can be organised as a collection of a finite number of memory words or registers. Several operations are defined on stacks, and the most important ones are operations/or the insertion of elements called PUSH (or pushdown) and/or the deletion of elements called POP (or pop-up, i.e. removing one item so that the stack pops up). The items are stored in such a manner that the item stored last is the first item retrieved (LIFO). Most of the time, the stack is partially filled with stack elements and the remainder is available for stack growth. Three addresses are thus needed to keep track of the status of the stack for proper operation, namely SP, stack base, and stack limit, and these are often stored in CPU registers.

Stack Pointer (SP) associated with each stack is a register or memory word that contains the address of the top of the stack (TOS). If an item is appended to or deleted from the stack, the pointer is incremented or decremented to contain the address of the new top of the stack (TOS). The number of elements in the stack, or length of the stack, is variable. If n is the number of bits available in SP, then the maximum number of locations that SP can point to is 2”. This is the maximum size of the stack with this SP. Stack base contains the address of the bottom location in the reserved block. If the contents of the SP are equal to the contents of the stack base, the stack is empty. If an attempt is made to POP when the stack is empty, an error is reported. Stack limit contains the other end (upper end) of the reserved area for the stack. If the contents of the SP are equal to the contents


A typical stack organisation.

of the stack limit, the stack is full. If an attempt is made to PUSH when the block is fully utilized for the stack, an error is reported. Most computers do not provide hardware to check for stack overflow (full stack) or underflow (empty stack). The stack limits can be checked by using two processor registers: one to hold the upper limit (stack limit) and the other to hold the lower limit (stack base). After a push operation, SP is incremented and is compared with the upper-limit register, and after a pop operation, SP is decremented and is compared with the lower-limit register. Figure 3.2, however, depicts a block diagram showing all these elements. In some stack implementations, the top two stack elements are often stored in two separate registers to speed up stack operations, as shown in Figure 3.2. In this case, the SP is the third element of the stack. One of the distinct advantages of using stack, which is already implemented in stack-organised computers and also in other computers with this feature, is that it always provides relatively shorter machine instructions with no addresses (zero-address instruction) that summarily save both CPU time and necessary memory space for machine instructions - one of the vital criteria of a CPU design. For more details, see stack-organised computer, given in later section.

A brief detail of the stack organisation with an example showing the stack operations is provided with figure in the website:

Generalized Structure of CPU

Keeping this register organisation in view, an overall structure of a generalized CPU, in a slightly detailed form considering some typical basic elements of the ALU and control unit, is shown in Figure 3.3. Here, the program control unit and data-processing unit have separately illustrated with their respective individual processing components and their interconnections as well as the interconnections between these two units as a whole.


A schematic block diagram of a CPU with general-register organisation.

The interconnections path used between the components and also between these two units is called the internal processor bus, which is required to transfer data between various registers and the ALU during the actual computation or processing of data. It is interesting to observe the similarity between the internal structure of the computer as a whole and the internal structure of the CPU. In both the cases, there is a small collection of major resources (computer: CPU, I/O, memory; and CPU: ALU, control unit, registers) connected by data paths.

As shown in Figure 3.1, the CPU communicates directly with the main memory, which is a high-capacity, multichip random-access memory having relatively low speed than CPU via the system bus. This speed disparity between processor and main memory is truly an inherent problem till today, even with tremendous advancement in technological evolution and revolution. That is why, most computers nowadays use a comparatively high-speed, smaller-capacity cache memory placed in between the CPU and the main memory in order to reduce the repeated slower main memory visits required by the CPU. The CPU communicates directly with I/O devices, or I/O modules connected with I/O devices, in much the same way as it communicates with external memory via the system bus.

CPU Operation: Instruction Execution

The CPU executes each instruction in a series of small steps:

  • 1. Fetch the current instruction pointed by the PC from memory into the IR;
  • 2. Change the PC to point to the next instruction in memory;
  • 3. Decode the instruction just fetched, to determine its type;
  • 4. If the instruction uses data in memory, determine where they are;
  • 5. Fetch the data, if any, into internal CPU registers Data Register (DR);
  • 6. Execute the instruction;
  • 7. Store the result in the appropriate place;
  • 8. Go to step 1 to begin executing the next instruction.

This sequence of steps (micro-operations) is central to the operations of all types of CPUs and is frequently referred to as the fetch-decode-execute cycle or instruction cycle. However, the exact sequence of events during an instruction cycle depends to a large extent on the design of the processor. Moreover, not every step in the instruction cycle is needed by each instruction for its execution. In fact, the sequence of steps to be followed for the execution of an instruction may vary, which entirely depends on the type of instruction being fetched. A check for pending interrupt requests is also usually included in the instruction cycle.

During an instruction cycle, the action of the CPU is defined by the sequence of microoperations it executes. The time required by the CPU to execute the shortest well-defined micro-operation is the CPU cycle time or clock period T clock, which is a basic unit of time for measuring CPU actions. It should be noted that the number of CPU cycles required to completely process an instruction varies with the instruction type as well as to the extent to which the processing of an individual instructions can be overlapped. At this moment, we are strictly assuming that each instruction in sequence is fetched, decoded, and finally executed.

A brief detail of the instruction cycle along with a flow chart of the steps being followed is given in the website:

Instruction Set

Each operation of a CPU is determined by the instruction(s) it executes. These primitive instructions of the computer are referred to as machine instructions or computer instructions. The collection of these different instructions that the CPU can execute is referred to as the CPU's instruction set that precisely categorizes a CPU and signifies its capability and versatility. This set should be complete, efficient, and above all, easy to use; fully satisfy the performance requirements of the computer; and be directly executed by the electronic circuits built in the form of gates.

Machine Instruction Elements

Each machine instruction specifies some particular operation, and hence, each instruction must contain sufficient information so that the CPU can execute the operation embedded in the instruction. The elements of a machine instruction are as follows:

  • Operation code tells what action (operation) to be performed (e.g. ADD LOAD, etc.), and is specified by a unique binary code, known as operation code or opcode;
  • Source operand are those that are input for the operation;
  • Result operand The operation may produce a result after execution;
  • Mode reference specifies the means (way) by which the operand can be accessed and obtained;
  • Next instruction reference provides the CPU with the information of the next instruction to be fetched when the execution of the current instruction is completed.

Source operand and result operand, however, can be obtained in one of three places, namely main memory or virtual memory, CPU register, and I/O device module.

Instruction Formats and Design Criteria

Each (machine) instruction is divided into groups of bits called the fields, indicating the constituent elements forming the instruction. The layout of the instruction, or the way the instruction is expressed, is known as the instruction format. It is usually depicted in the form of a rectangular box symbolizing the bits of the instruction as they appear in memory words or in a control register. Three different types of instruction format exist: each must include an opcode (implicitly or explicitly), and zero, one, or more operands, as is shown in Figure 3.4.

Opcodes are generally expressed in symbolic representation (called a mnemonic). Each explicit operand is referenced using one of the addressing modes (to be discussed later), and the format must include implicitly or explicitly the addressing mode for each operand. Operands residing in memory are specified by their memory address. Operands residing in processor registers are specified with a register address, and lesser number of bits is required to address a register operand than an operand residing in memory. A register address in a general-purpose register computer is a binary number of k-bits that defines one of 2k registers in the CPU. If a machine has 16 (16 = 24) registers, then 4 bits are required to address an individual register.

The design of an instruction format is a major issue and equally complex to realize due to the presence of many deciding factors, such as processor complexity, number of available processor registers, processor speed, memory characteristics and size, memory organisation, memory-transfer rate, bus structures and similar others, that need to be taken into account. The designer thus always negotiates with a lot of competing and conflicting factors while developing machine instructions and its formats, in which each factor appears to be equally significant. Therefore, it is the trade-off that will ultimately decide which factor will receive more importance to fit in the underlying design to achieve the ultimate target. Accordingly, a compromise may be needed on some factors, to strike a reasonable balance in between. It is thus found really critical to decide the effective instruction format length that can be efficiently utilized. In fact, an instruction set used in a computer will normally have a variety of instruction formats of different lengths to realize the ultimate objectives.


A simple instruction format with three different types: (a) zero operand (b) one operand (c) more operands.

A brief detail of instruction formats and design criteria is given in the website: http://

< Prev   CONTENTS   Source   Next >