2 Text Books: Reference Books: Carl Hamacher, Zvonko Vranesic, Safwat Zaky: Computer Organization, 5th Edition, Tata McGraw Hill, 2002.Reference Books:Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Naraig Manjikian : Computer Organization and Embedded Systems, 6th Edition, Tata McGraw Hill, 2012.William Stallings: Computer Organization & Architecture, 9th Edition, Pearson, 2015.
3 Objective Number and character representations Addition and subtraction of binary numbersAdder and subtractor circuitsHigh-speed adders based on carry-lookahead logic circuitsThe Booth algorithm for multiplication of signed numbersHigh-speed multipliers based on carry-save additionLogic circuits for divisionArithmetic operations on floating-point numbers conforming to the IEEE standard
5 Signed Integer 3 major representations: Sign and magnitude One’s complementTwo’s complementAssumptions:4-bit machine word16 different values can be representedRoughly half are positive, half are negative5
6 Sign and Magnitude Representation High order bit is sign: 0 = positive (or zero), 1 = negativeThree low order bits is the magnitude: 0 (000) thru 7 (111)Number range for n bits = +/-2n-1 -1Two representations for 06
7 One’s Complement Representation Subtraction implemented by addition & 1's complementStill two representations of 0! This causes some problemsSome complexities in addition7
8 Two’s Complement Representation like 1's compexcept shiftedone positionclockwiseOnly one representation for 0One more negative number than positive number8
10 Addition and Subtraction – 2’s Complement 4+ 37010000110111-4+ (-3)-71100110111001If carry-in to the highorder bit =carry-out then ignorecarryif carry-in differs fromcarry-out then overflow4- 310100110110001-4+ 3-1110000111111Simpler addition scheme makes two’s complement the most commonchoice for integer number systems within digital systems10
11 2’s-Complement Add and Subtract Operations 5-()2+34761(a)(c)(e)(f)(g)(h)(i)(j)6-()24+37(b)(d)4+()2-385Figure 's-complement Add and Subtract operations.11
12 Overflow - Add two positive numbers to get a negative number or two negative numbers to get a positive number-1+0-1+0-21111-20000+111110000+1111000011110-3-30001+2+21101110100100010-4-41100+300111100+30011-5-5101110110100+40100+41010-610100101-60101+5+51001100101100110-7+6-71000+6011110000111-8+7-8+75 + 3 = -8= +712
13 Overflow Conditions53-8-7-27OverflowOverflow527-3-5-8No overflowNo overflowOverflows can occur when the sign of the two operands is the same.Overflow occurs if the sign of the result is different from the sign of the operands.Overflow when carry-in to the high-order bit does not equal carry out13
14 Sign Extension Task: Rule: Given w-bit signed integer x Convert it to w+k-bit integer with same valueRule:Make k copies of sign bit:X = xw–1 ,…, xw–1 , xw–1 , xw–2 ,…, x0w• • •XX k copies of MSBkw14
15 Sign Extension Example short int x = ;int ix = (int) x;short int y = ;int iy = (int) y;DecimalHexBinaryx152133B 6DixB 6Dy-15213C4 93iyFF FF C4 9315
17 Addition/subtraction of signed Numbers =c+11xyCarry-inSumCarry-outÅ+At the ith stage:Input:ci is the carry-inOutput:si is the sumci+1 carry-out to (i+1)ststate137+YExample:1=Legend for stageiXZ+ 6xysCarry-outc+1Carry-in
18 Addition logic for a single stage SumCarryyixcs1+cFull adderci+1(FA)isiFull Adder (FA): Symbol for the complete circuitfor a single stage of addition.
19 n-bit adder Cascade n full adder (FA) blocks to form a n-bit adder. Carries propagate or ripple through this cascade, n-bit ripple carry adder.FAyn1-xcsMost significant bit(MSB) positiony1xsFAcn-FAc1yxsLeast significant bit(LSB) positionCarry-in c0 into the LSB position provides a convenient way toperform subtraction.
20 K n-bit adderK n-bit numbers can be added by cascading k n-bit adders.ckns1-()yxbitadderynxs21-bitadderyxn-bitc1sadderEach n-bit adder forms a block, so this is cascading of blocks.Carries ripple or propagate through blocks, Blocked Ripple Carry Adder
21 n-bit subtractorRecall X – Y is equivalent to adding 2’s complement of Y to X.2’s complement is equivalent to 1’s complement + 1.X−Y=X+ Y +12’s complement of positive and negative numbers is computed similarly.FAyn1-xcsMost significant bit(MSB) positiony1xsFAcn-FA1cyxsLeast significant bit(LSB) position
22 n-bit adder/subtractor (contd..) yyyn-11Add/Subcontrolxxxn-11cn-bit adderncsssn-11Add/sub control = 0, addition.Add/sub control = 1, subtraction.
23 Detecting overflowsOverflows can only occur when the sign of the two operands is the same.Overflow occurs if the sign of the result is different from the sign of the operands.Recall that the MSB represents the sign.xn-1, yn-1, sn-1 represent the sign of operand x, operand y and result sum s respectively.Circuit to detect overflow can be implemented by the following logic expressions:
24 Computing the add time x0 y0 Consider 0th stage: FAConsider 0th stage:c1 is available after 2 gate delays.s1 is available after 1 gate delay.SumCarryyicxiixysiciici+1icixiyi
25 Computing the add time (contd..) Cascade of 4 Full Adders, or a 4-bit adderx0y0s2FAs1c2s0c1c3c0s3c4s0 available after 1 gate delays, c1 available after 2 gate delays.s1 available after 3 gate delays, c2 available after 4 gate delays.s2 available after 5 gate delays, c3 available after 6 gate delays.s3 available after 7 gate delays, c4 available after 8 gate delays.For an n-bit adder, sn-1 is available after 2n-1 gate delayscn is available after 2n gate delays.
26 Fast addition xi yi ci si Pi Gi B cell Recall the equations: Second equation can be written as:We can write:Gi is called generate function and Pi is called propagate functionGi and Pi are computed only from xi and yi and not ci, thus they can be computed in one gate delay after X and Y are applied to the inputs of an n-bit adder.Ci+1 = 1 only when Generate= 1 and Propagate = 0 or 1So we can modify the Pi
27 Carry lookaheadAll carries can be obtained 3 gate delays after X, Y and c0 are applied.-One gate delay for Pi and Gi-Two gate delays in the AND-OR circuit for ci+1All sums can be obtained 1 gate delay after the carries are computed.Independent of n, n-bit addition requires only 4 gate delays.This is called Carry Lookahead adder.
28 Carry-lookahead adder Carry-lookahead logicB cells3PGc21.4xy4-bitcarry-lookaheadadderGic.PsxyB cellB-cell for a single stage
29 Carry lookahead adder (contd..) Performing n-bit addition in 4 gate delays independent of n is good only theoretically because of fan-in constraints.Last AND gate and OR gate require a fan-in of (n+1) for a n-bit adder.For a 4-bit adder (n=4) fan-in of 5 is required.Practical limit for most gates.In order to add operands longer than 4 bits, we can cascade 4-bit Carry-Lookahead adders. Cascade of Carry-Lookahead adders is called Blocked Carry-Lookahead adder.Fan-in is the number of inputs a gate can handle.ci+1=GP-2.....c0
31 Blocked Carry-Lookahead adder Carry-out from a 4-bit block can be given as:Rewrite this as:Subscript I denotes the blocked carry lookahead and identifies the block.Cascade 4 4-bit adders, c16 can be expressed as:
32 Blocked Carry-Lookahead adder Carry-lookahead logic4-bit adders15-12P3IGc122811-8147-43-016xy.After xi, yi and c0 are applied as inputs:- Gi and Pi for each stage are available after 1 gate delay.- PI is available after 2 and GI after 3 gate delays.- All carries are available after 5 gate delays.- c16 is available after 5 gate delays.- s15 which depends on c12 is available after 8 (5+3) gate delays(Recall that for a 4-bit carry lookahead adder, the last sum bit isavailable 3 gate delays after all inputs are available)
34 Multiplication of unsigned numbers Product of 2 n-bit numbers is at most a 2n-bit number.Unsigned multiplication can be viewed as addition of shiftedversions of the multiplicand.
35 Multiplication of unsigned numbers (contd..) We added the partial products at end.Alternative would be to add the partial products at each stage.Rules to implement multiplication are:If the ith bit of the multiplier is 1, shift the multiplicand and add the shifted multiplicand to the current value of the partial product.Hand over the partial product to the next stageValue of the partial product at the start stage is 0.
36 Multiplication of unsigned numbers Typical multiplication cellith multiplier bitcarry incarry outjth multiplicand bitBit of incoming partial product (PPi)Bit of outgoing partial product (PP(i+1))FA
37 Typical multiplication cell The main component in each cell is a full adder, FA.The AND gate in each cell determines whether a multiplicand bit, mj , is added to the incoming partial-product bit, based on the value of the multiplier bit, qi.Each row i, where 0 ≤ i ≤ 3, adds the multiplicand (appropriately shifted) to the incoming partial product, PPi, to generate the outgoing partial product, PP(i + 1), if qi = 1.If qi = 0, PPi is passed vertically downward unchanged.PP0 is all 0s, and PP4 is the desired product.The multiplicand is shifted left one position per row by the diagonal signal path.
38 Combinatorial array multiplier Multiplicandm321qp4567PP1PP2PP3(PP0),Product is: p7,p6,..p0Multiplicand is shifted by displacing it through an array of adders.
39 Combinatorial array multiplier (contd..) Combinatorial array multipliers are:Extremely inefficient.Have a high gate count for multiplying numbers of practical size such as 32-bit or 64-bit numbers.Perform only one function, namely, unsigned integer product.Improve gate efficiency by using a mixture of combinatorial array techniques and sequential techniques requiring less combinational logic.
40 Sequential multiplication Recall the rule for generating partial products:If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to the current partial product.Multiplicand has been shifted left when added to the partial product.However, adding a left-shifted multiplicand to an unshifted partial product is equivalent to adding an unshifted multiplicand to a right-shifted partial product.
42 Sequential Circuit Multiplier Registers A and Q are shift registers, concatenated as shown. Together, they hold partial product PPi while multiplier bit qi generates the signal Add/No add.This signal causes the multiplexer MUX to select 0 when qi = 0, or to select the multiplicand M when qi = 1, to be added to PPi to generate PP(i + 1).The product is computed in n cycles. The partial product grows in length by one bit per cycle from the initial vector, PP0, of n 0s in register A.The carry-out from the adder is stored in flip-flop C, shown at the left end of register A.At the start, the multiplier is loaded into register Q, the multiplicand into register M, and C and A are cleared to 0. At the end of each cycle, C, A, and Q are shifted right one bit position to allow for growth of the partial product as the multiplier is shifted out of register Q.Because of this shifting, multiplier bit qi appears at the LSB position of Q to generate the Add/No add signal at the correct time, starting with q0 during the first cycle, q1 during the second cycle, and so on.After they are used, the multiplier bits are discarded by the right-shift operation. Note that the carry-out from the adder is the leftmost bit of PP(i + 1), and it must be held in the C flip-flop to be shifted right with the contents of A and Q.After n cycles, the high-order half of the product is held in register A and the low-order half is in register Q.
45 Signed Multiplication Considering 2’s-complement signed operands, what will happen to (-13)(+11) if following the same method of unsigned multiplication?111(-13)111(+11)111111111111111Sign extension isshown in blue11111111111(-143)Sign extension of negative multiplicand.
46 Signed Multiplication For a negative multiplier, a straightforward solution is to form the 2’s-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.This is possible because complementation of both operands does not change the value or the sign of the product.A technique that works equally well for both negative and positive multipliers – Booth algorithm.
47 Booth AlgorithmIn general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.111111111+1-1+1-1+1-1+1-1+1-1Treats both positive and negative 2’s-complement operands uniformlyAlgorithm (scanning multiplier right to left)0 to 1 => add −1 times shifted multiplicand1 to 0 => add +1 times shifted multiplicand0 to 0 or 1 to 1 => add 0
50 Booth AlgorithmConsider in a multiplication, the multiplier is positive , how many appropriately shifted versions of the multiplicand are added in a standard procedure?1111+1+1+1+1111111111111111111111
51 Booth AlgorithmSince = – , if we use the expression to the right, what will happen?1111+1-12's complement of1111111111the multiplicand111111111
52 Booth Algorithm Booth multiplication with a negative multiplier. 1 1 1 111(+13)111X111(-6)-1+1-1111111111111111111111(-78)Booth multiplication with a negative multiplier.
53 Booth Algorithm Booth multiplier recoding table. Multiplier V ersion of multiplicandselected by bitiBitiBiti-1XM1+1XM11XM11XMBooth multiplier recoding table.
54 Booth Algorithm Best case – a long string of 1’s (skipping over 1s) Worst case – 0’s and 1’s are alternating11111111Worst-casemultiplier+1-1+1-1+1-1+1-1+1-1+1-1+1-1+1-1111111111Ordinarymultiplier-1+1-1+1-1+1-111111111Goodmultiplier+1-1+1-1
56 Bit-Pair Recoding of Multipliers Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).Sign extensionImplied 0 to right of LSB11111+1112(a) Example of bit-pair recoding derived from Booth recoding
57 Bit-Pair Recoding of Multipliers -4x 5-20x+1 +1+= -20
58 Bit-Pair Recoding of Multipliers Multiplier bit-pairMultiplier bit on the rightMultiplicandselected at positionii+1ii1X M1+1X M1+1X M11+2X M12X M111X M111X M111X M(b) Table of multiplicand selection decisions
59 Bit-Pair Recoding of Multipliers 111-1+1-1111111111111111111(+13)111(-6)111111(-78)111-1-21111111111111111111Figure Multiplication requiring only n/2 summands.
60 Carry-Save Addition of Summands CSA speeds up the addition process.P7P6P5P4P3P2P1P0
61 Carry-Save Addition of Summands (Cont.,) P7P6P5P4P3P2P1P0
62 Carry-Save Addition of Summands (Cont.,) Consider the addition of many summands, we can:Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delayGroup all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delayContinue with this process until there are only two vectors remainingThey can be added in a RCA or CLA to produce the desired product
63 Carry-Save Addition of Summands 1111(45)MX111111(63)Q1111A1111B1111C1111D1111E1111F111111(2,835)ProductFigure A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.
64 1111Mx111111QA1111B1111C11111111S11111C11111D1111E1111FS111121111C21111S11111C11111S2111111S3111C31111C21111111S4+111C4111111ProductFigure The multiplication example from Figure 6.17 performed usingcarry-save addition.
67 Longhand Division Steps Position the divisor appropriately with respect to the dividend and performs a subtraction.If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
68 Circuit Arrangement Control Sequencer Shift left a0 an an-1 qn-1 q0 Dividend QAQuotientSettingN+1 bitadderAdd/SubtractControlSequencerm0mn-1Divisor MFigure Circuit arrangement for binary division.
69 Restoring Division Shift A and Q left one binary position Subtract M from A, and place the answer back in AIf the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1Repeat these steps n times
70 Examples 1 Initially 1 1 1 Shift 1 Subtract 1 1 1 1 First cycle Set q 111Shift1Subtract1111First cycleSetq1111Restore1111Shift1Subtract1111Setq11111Second cycleRestore111Shift1Subtract1111Setq1Third cycleShift11Subtract11111Setq11111Fourth cycleRestore1111RemainderQuotientFigure A restoring-division example.
71 Nonrestoring Division Avoid the need for restoring A after an unsuccessful subtraction.Any idea?Step 1: (Repeat n times)If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.Step2: If the sign of A is 1, add M to A
74 FractionsIf b is a binary vector, then we have seen that it can be interpreted asan unsigned integer by:V(b) = b b bn b b0.20This vector has an implicit binary point to its immediate right:b31b30b b1b implicit binary pointSuppose if the binary vector is interpreted with the implicit binary point isjust left of the sign bit:implicit binary point .b31b30b b1b0The value of b is then given by:V(b) = b b b b b0.2-32
75 Range of fractions The value of the unsigned binary fraction is: V(b) = b b b b b0.2-32The range of the numbers represented in this format is:In general for a n-bit binary fraction (a number with an assumed binarypoint at the immediate left of the vector), then the range of values is:
76 Scientific notationPrevious representations have a fixed point. Either the point is to the immediate right or it is to the immediate left. This is called Fixed point representation.Fixed point representation suffers from a drawback that the representation can only represent a finite range (and quite small) range of numbers.A more convenient representation is the scientific representation, wherethe numbers are represented in the form:Components of these numbers are:Mantissa (m), implied base (b), and exponent (e)
77 Significant digitsA number such as the following is said to have 7 significant digitsFractions in the range 0.0 to need about 24 bits of precision(in binary). For example the binary fraction with 24 1’s:=Not every real number between 0 and can be representedby a 24-bit fractional number.The smallest non-zero number that can be represented is:= x 10-8Every other non-zero number is constructed in increments of this value.
78 Sign and exponent digits In a 32-bit number, suppose we allocate 24 bits to represent a fractionalmantissa.Assume that the mantissa is represented in sign and magnitude format,and we have allocated one bit to represent the sign.We allocate 7 bits to represent the exponent, and assume that theexponent is represented as a 2’s complement integer.There are no bits allocated to represent the base, we assume that thebase is implied for now, that is the base is 2.Since a 7-bit 2’s complement number can represent values in the range-64 to 63, the range of numbers that can be represented is:x < = | x | <= x 263In decimal representation this range is:x < = | x | <= x 1018
79 A sample representation Sign Exponent Fractional mantissabit24-bit mantissa with an implied binary point to the immediate left7-bit exponent in 2’s complement form, and implied base is 2.
80 Normalization Consider the number: x = 0.0004056781 x 1012 If the number is to be represented using only 7 significant mantissa digits,the representation ignoring rounding is:x = x 1012If the number is shifted so that as many significant digits are brought into7 available slots:x = x 109 = x 1012Exponent of x was decreased by 1 for every left shift of x.A number which is brought into a form so that all of the available mantissadigits are optimally used (this is different from all occupied which maynot hold), is called a normalized number.Same methodology holds in the case of binary mantissas(10110) x 28 = (10) x 25
81 Normalization (contd..) A floating point number is in normalized form if the most significant1 in the mantissa is in the most significant bit of the mantissa.All normalized floating point numbers in this system will be of the form:0.1xxxxx xxRange of numbers representable in this system, if every number must benormalized is:0.5 x <= | x | < 1 x 263
82 Normalization, overflow and underflow The procedure for normalizing a floating point number is:Do (until MSB of mantissa = = 1)Shift the mantissa left (or right)Decrement (increment) the exponent by 1end doApplying the normalization procedure to:x 2-62gives:x 2-65But we cannot represent an exponent of –65, in trying to normalize thenumber we have underflowed our representation.Applying the normalization procedure to:x 263gives:x 264This overflows the representation.
83 Changing the implied base So far we have assumed an implied base of 2, that is our floating pointnumbers are of the form:x = m 2eIf we choose an implied base of 16, then:x = m 16eThen:y = (m.16) .16e-1 (m.24) .16e-1 = m . 16e = xThus, every four left shifts of a binary mantissa results in a decrease of 1in a base 16 exponent.Normalization in this case means shifting the mantissa until there is a 1 inthe first four bits of the mantissa.
84 Excess notationRather than representing an exponent in 2’s complement form, it turns out to be more beneficial to represent the exponent in excess notation.If 7 bits are allocated to the exponent, exponents can be represented in the range of -64 to +63, that is:-64 <= e <= 63Exponent can also be represented using the following coding called as excess-64:E’ = Etrue + 64In general, excess-p coding is represented as:E’ = Etrue + pTrue exponent of -64 is represented as 00 is represented as 6463 is represented as 127This enables efficient comparison of the relative sizes of two floating point numbers.
85 IEEE notationIEEE Floating Point notation is the standard representation in use. There are two representations:- Single precision.- Double precision.Both have an implied base of 2.Single precision:- 32 bits (23-bit mantissa, 8-bit exponent in excess-127 representation)Double precision:- 64 bits (52-bit mantissa, 11-bit exponent in excess-1023 representation)Fractional mantissa, with an implied binary point at immediate left.Sign Exponent Mantissaor or 52
86 IEEE notation Represent 1259.12510 in single precision Step 1 :Convert decimal number to binary format1259(10)= (2)Fractional Part0.125 (10)=0.001 (2)Binary number = =Step 2: Normalize the number= x 210Step3: Single precision format:For a given number S=0,E=10 and M=Bias for single precision format is = 127E’=E+127=10+127=137 (10) = (2)• Number in single precision format….0
87 Peculiarities of IEEE notation Floating point numbers have to be represented in a normalized form tomaximize the use of available mantissa digits.In a base-2 representation, this implies that the MSB of the mantissa isalways equal to 1.If every number is normalized, then the MSB of the mantissa is always 1.We can do away without storing the MSB.IEEE notation assumes that all numbers are normalized so that the MSBof the mantissa is a 1 and does not store this bit.So the real MSB of a number in the IEEE notation is either a 0 or a 1.The values of the numbers represented in the IEEE single precisionnotation are of the form:(+,-) 1.M x 2(E - 127)The hidden 1 forms the integer part of the mantissa.Note that excess-127 and excess-1023 (not excess-128 or excess-1024) are used to represent the exponent.
88 Exponent fieldIn the IEEE representation, the exponent is in excess-127 (excess-1023)notation.The actual exponents represented are:-126 <= E <= and <= E <= 1023not-127 <= E <= and <= E <= 1024This is because the IEEE uses the exponents -127 and 128 (and and1024), that is the actual values 0 and 255 to represent special conditions:- Exact zero- Infinity
89 Floating point arithmetic Addition:x x 106 = x x 108 = x 108Multiplication:x 108 x 1.19 x 106 = ( x 1.19 ) x 10(8+6)Division:x 108 / 1.19 x = ( / 1.19 ) x 10(8-6)Biased exponent problem:If a true exponent e is represented in excess-p notation, that is as e+p.Then consider what happens under multiplication:a. 10(x + p) * b. 10(y + p) = (a.b). 10(x + p + y +p) = (a.b). 10(x +y + 2p)Representing the result in excess-p notation implies that the exponentshould be x+y+p. Instead it is x+y+2p.Biases should be handled in floating point arithmetic.
90 Floating point arithmetic: ADD/SUB rule Choose the number with the smaller exponent.Shift its mantissa right until the exponents of both the numbers are equal.Add or subtract the mantissas.Determine the sign of the result.Normalize the result if necessary and truncate/round to the number of mantissa bits.Note: This does not consider the possibility of overflow/underflow.
91 Floating point arithmetic: MUL rule Add the exponents.Subtract the bias.Multiply the mantissas and determine the sign of the result.Normalize the result (if necessary).Truncate/round the mantissa of the result.
92 Floating point arithmetic: DIV rule Subtract the exponentsAdd the bias.Divide the mantissas and determine the sign of the result.Normalize the result if necessary.Truncate/round the mantissa of the result.Note: Multiplication and division does not require alignment of themantissas the way addition and subtraction does.
93 Guard bitsWhile adding two floating point numbers with 24-bit mantissas, we shiftthe mantissa of the number with the smaller exponent to the right untilthe two exponents are equalized.This implies that mantissa bits may be lost during the right shift (that is,bits of precision may be shifted out of the mantissa being shifted).To prevent this, floating point operations are implemented by keepingguard bits, that is, extra bits of precision at the least significant endof the mantissa.The arithmetic on the mantissas is performed with these extra bits ofprecision.After an arithmetic operation, the guarded mantissas are:- Normalized (if necessary)- Converted back by a process called truncation/rounding to a 24-bitmantissa.
94 Truncation/rounding Straight chopping: Von Neumann rounding: Rounding: The guard bits (excess bits of precision) are dropped.Von Neumann rounding:If the guard bits are all 0, they are dropped.However, if any bit of the guard bit is a 1, then the LSB of the retained bit is set to 1.Rounding:If there is a 1 in the MSB of the guard bit then a 1 is added to the LSB of the retained bits.
95 Rounding Rounding is evidently the most accurate truncation method. However,Rounding requires an addition operation.Rounding may require a renormalization, if the addition operation de-normalizes the truncated number.IEEE uses the rounding method.rounds to= which must be renormalized to
96 Implementing Floating-Point Operations-1 The hardware implementation of floating-point operations designed using logic circuitry. and can also be implemented by software routines.In either case, the computer must be able to convert input and output from and to the user’s decimal representation of numbers.In many general-purpose processors, floating-point operations are available at the machine-instruction level, implemented in hardware.
97 Implementing Floating-Point Operations-2 Step 1: Compare the exponent for sign bit using 8bit subtractor Sign is sent to SWAP unit to decide on which number to be sent to SHIFTER unit.Step2: The exponent of the result is determined in two way multiplexer depending on the sign bit from step1Step3: Control logic determines whether mantissas are to be added or subtracted. Depending on sign of the operand. There are many combinations are possible here, that depends on sign bits, exponent values of the operand.Step4: Normalization of the result depending on the leading zeros, and some special case like 1.xxxxx operands. Where result is 1x.xxx and X = -1, therefore will increase the exponent value.
98 Implementing Floating-Point Operations-3 Example Add single precision floating point numbers A and B, where A= H and B = 42A00000H. Solution Step 1 : Represent numbers in single precision format A = ….0 B = ….0 Exponent for A = =137 Therefore actual exponent = (Bias) =10 Exponent for B = = 133 Therefore actual exponent = (Bias) = 6 With difference 4. Hence its mantissa is shifted right by 4 bits as shown below Step 2: Shift mantissa Shifted mantissa of B = …0 Step 3: Add mantissa Mantissa of A = …0 Mantissa of B = …0 Mantissa of result = …0 As both numbers are positive, sign of the result is positive Result = …0 = H