 # Fixed-Point Arithmetic

Four basic arithmetic operations for fixed-point as well as for floating-point numbers are addition, subtraction, multiplication, and division. The arithmetic algorithms and the related logic circuits needed to implement these arithmetic operations are the main focus of this section and onwards. It has been shown that two n-bit signed numbers expressed in 2's complement representation can be added/subtracted using n-bit binary addition/subtraction, FIGURE 7.4

Representative block diagram of ALU with inputs and outputs.

treating the sign bit the same as the other bits. In other words, a logic circuit that is designed to add/subtract unsigned binary numbers can also be used equally well to add/subtract signed numbers in 2's complement form. The time needed by an ALU to perform an arithmetic operation often influences the performance of the processor. Addition/subtraction in this regard takes relatively lesser time than multiplication and division which require more complex circuitry than either addition or subtraction operation. That is why, some modern techniques are used in today's advanced computers that can perform all arithmetic operations at a comparatively higher speed.

Addition and subtraction for fixed-point numbers are the most fundamental operations that are found in the instruction set of every computer since inception. The elementary scheme to accomplish addition and subtraction is shown in the form of a basic circuit in Figure 7.5, which consists of a binary adder as central element that executes addition/ subtraction operation with two unsigned integers sent from two registers X and Y, and then produces the result with an overflow indication. The result after computation may be stored in one of these registers or in a third register (depending on the design). The overflow, if happened, is usually stored in a 1-bit flag (for 0 = no overflow, 1 = overflow). For subtraction, the subtrahend, i.e. here the content of register Y, is passed through a 2's complementer so that the 2's complement of its content can be presented to the adder. Appropriate signals from the control unit are here required to ensure whether the contents of register Y will need to go through the complementer or not, which again depends on whether the operation is subtraction or addition, decoded by the control unit in advance. To implement all these, the required hardware can, however, be designed in many different ways that involve various trade-offs between operating speed and related hardware cost. FIGURE 7.5

Schematic block diagram with hardware components for addition and subtraction.

In general, the addition of two n-bit numbers X and Y is performed by subdividing the numbers into stages X, and Y, of length n, where n > «,> 1. X, and Y, are added separately, and the resulting partial sums are then combined to form the overall sum. The formation of this sum, however, involves assimilation of carry bits generated by the partial additions. However, a 1-bit adder (bit-by-bit addition), also called a full adder, can be directly implemented in various ways that can be modified even to work as a full subtracter, since subtraction of a number (Y) from another number (X) can be defined as: Thus, the subtraction can also be done using a full adder.

### Subtracters

The «-bit adders can operate addition correctly on both unsigned and positive numbers such as X and Y, since the 0 sign bit of a positive number has the same effect as a leading 0 in an unsigned number. To add negative numbers (they have las the sign bit in 2's complement representation), like adding -X to Y, is equivalent to subtracting X from Y. Similarly, subtracting -X from -Y is equivalent to adding X to -Y. So the ability to add negative numbers implies the ability to perform subtraction.

A brief detail of different types of adders and subtracters with their respective figures is given in the website: http://routledge.com/9780367255732.

Classical adders suffer from several major drawbacks: one of these is that they cause too much delay in developing their outputs, which are further increased while implementing overflow detection. To minimize all these inherent shortcomings, one approach is to use an enhanced logic-gate network structure (larger than the usual) that ultimately must speed up the generation of carry signals. In other words, this approach must provide some means to compute the input carry beforehand, which can be arrived at any stage i directly, similar to carry-like signals obtained from all the preceding stages i - 1, i - 2,..., 0, rather than waiting for usual production of actual carries that ripple slowly from stage to stage. Adders that use this principle are called carry-lookahead adders.

The basic principle lying behind an «-bit CLA is that it is formed from n stages, each of which is basically a full adder, but modified by replacing its carry output line c, at stage i by two auxiliary signals called g, (generate) and p, (propagate), respectively. Let z, be the sum, and c,be the carry-out of two 1-bit numbers x,and y,at stage i. Then, by suitable mathematical calculation, it can be shown that the generate function g, and the propagate function p, at stage i can be defined only in terms of x, and y, at the stage i without depending on the value of the carry of its just previous stage c, _Moreover, all gi and p, functions for all i's in an я-bit adder can be formed independently and in parallel in one logic-gate delay after the X and Y vectors are applied to the inputs of the adder. Similarly, the output function z, at the stage i can also be defined in terms of x, and y, at that stage i.

Although a substantial improvement is thus achieved by this design in reducing critical gate delays, the number of gates, however, grows almost in proportion to n2 as n increases. In contrast, the number of gates in a two-level adder of the sum-of-products type grows exponentially with n, while the number of gates in a ripple-carry adder grows linearly with n. Moreover, the complexity of the carry-generation logic embedded in the CLA, including its gate count, its maximum fan-in, and its maximum fan-out, increases steadily with the increase in n. Moreover, it is found that the last AND gate and the OR gate require a fan-in of i + 2 in generating c,. For c3 in the 4-bit adder, a fan-in of 5 is thus required. This is not only almost about the limit of practical gates, but consideration of cost in practice is also a matter of concern that often limits n in a single CLA module not to exceed four or so.

More details of CLA with mathematical calculations and corresponding figures are given in the website: http://routledge.com/9780367255732.

CLA, no doubt, is relatively enriched with several distinct advantages compared to all of its contemporary counterparts. But the inherent shortcomings as already explained encompassed with its embedded carry-lookahead logic often limit n in the addition/subtraction operation of и-bit integers to keep и bound within four or so. That is why, this adder design cannot be directly extended to longer operand (increased value of «) sizes.

One possible solution in this regard may be to improvise a method by combining ripple-carry propagation and carry-lookahead approaches together to let carry signals be handled in a reasonably efficient way. This approach, however, can be easily exploited to design larger adders of this kind needed to execute add instructions with longer operand sizes, say up to 64 bits. If we replace « number of 1-bit (full) adder stages in the «-bit ripple- carry design with и number of L-bit CLAs, we obtain an nk-bit adder. For example, four 4-bit adders, each one of a 4-bit carry-lookahead circuit, can be then connected in ripple- carry design to realize a relatively larger 16-bit adder. Similar approach can be employed with eight 4-bit CLAs to be connected to form a 32-bit adder.

The detail of this approach with related figure is given in the website: http://routledge. com/9780367255732.

The inherent drawbacks of CLA, or of any derived adder based on this principle, often limit и in the addition/subtraction operation of и-bit integer that keeps « bound within a small range. In practice, arithmetic operation often requires longer operand

In this example, four decimal numbers are taken for addition. Firstly, all the digits in the unit place (first column) are added that produce a sum of 9 and a carry digit of 1 written in the second line shown by arrow (<—) shifted left by one position. Similarly, all the digits in the tens place are added that produce a sum of 0 and a carry digit of 2 which is written in a similar way like the addition of digits in the unit place. Likewise, all the digits in the hundreds place are added that produce a sum of 8 and a carry digit of 1 which is written in a similar way as mentioned above. These summations (column-wise) can be then executed in parallel to produce a sum vector of 809 and a carry vector of 121, since no propagation of carry is required from the addition of digits in units place to the addition process of digits in tens place, and similarly from the tens place to the hundreds place, and so on. When the addition of all digits present in the operands is over, thereby producing a sum vector and a carry vector, the sum vector and the shifted carry vector are then simply added in the conventional way using either CPA or CLA that produces the final result.

Assume that the CSA here takes three binary numbers I, J, and K, and produces two outputs, namely the sum vector S and the carry vector C. The sum vector S and the carry vector C, however, can be obtained using the following expressions: and Here, all logical operations are performed bit-wise. FIGURE 7.6

An illustration of carry-save addition technique with decimal numbers. FIGURE 7.7

A representative scheme of a two-stage CSA.

The final arithmetic sum of three inputs, i.e. Z = I + J + K, is obtained by adding the two generated outputs S and C, i.e. Z = I + J + K = C + S, using a CPA or CL A.

In general, an и-bit CSA consists of n disjoint full adders. As shown in Figure 7.7, its input is three n-bit numbers to be added, while the output consists of the и sum bits forming a word S, and the и carry bits forming a word C. Unlike the adders discussed so far, there is no carry propagation within the individual adders. The outputs S and C can be fed into another и-bit CSA where, as shown in Figure 7.7, they can be added to a fourth и-bit number L. It is to be noted that the carry connections are shifted to the left to correspond to normal carry propagation. In this way, m numbers can be added using a tree-like network of CSAs to produce a result of the form (S, C). However, to obtain the final sum, S and C must be added by a conventional adder (CPA or CLA) with usual carry propagation.

Carry-save addition mechanism appears to be highly conducive and well suited to pipelined implementation. Multistage CSA circuit can now be built up in the form of a Wallace tree (see Chapter 8) to handle these groups (all groups in one stage) as just explained, and can be then implemented as a sequence of substages within the multiplier circuit built in the execute stage in a pipelined architecture. The type of operands used can also, however, be modified easily to handle floating-point numbers. The input significands (imantissa) are then processed in a fixed-point multiplier pipeline. The exponents are combined by a separate fixed-point adder, and a normalization circuit can be then included at the appropriate position in the execute stage within the pipeline.