Published by Neil Mounsey. Modified about 1 year ago.

1
MULTIPLIERS
Multipliers, Booth’s Multiplier, Floating Point Arithmetic

2
WHY MULTIPLIERS?
Used in a lot of DSP applications
  Vector product, matrix multiplication
  Convolution
  Filtering (tap filters, FIR, …)
“At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expense-paid visits to conferences in USA) to be won from inventing new forms of multiplier”
Alan Clements, The Principles of Computer Hardware, 1986

3
BASIC ARITHMETIC AND THE ALU
Now
  Integer multiplication
  Booth’s algorithm
  Floating point representation
  Floating point addition, multiplication
Floating point is not crucial for the project

4
MULTIPLICATION
Flashback to 3rd grade
  Multiplier, multiplicand, partial products, final sum
  Base 10: 8 x 9 = 72
  PP: the partial products sum to 72
How wide is the result?
  log(n x m) = log(n) + log(m)
  32b x 32b = 64b result
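The grade-school procedure can be sketched in a few lines of Python. This is only an illustration for unsigned operands; the function names `partial_products` and `multiply` and the `width` parameter are assumptions made for the example, not part of the slides.

```python
def partial_products(multiplicand, multiplier, width=32):
    """Return one partial product per multiplier bit (unsigned operands)."""
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # Each row is the multiplicand shifted left by i, or zero.
        rows.append((multiplicand << i) if bit else 0)
    return rows


def multiply(multiplicand, multiplier, width=32):
    """Sum the partial products; the result fits in 2 * width bits."""
    return sum(partial_products(multiplicand, multiplier, width))
```

With 4-bit operands, `partial_products(8, 9, 4)` gives one row per multiplier bit, and multiplying the largest 32-bit operands gives a product that needs exactly 64 bits.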

5
COMBINATIONAL MULTIPLIER
Generating partial products
  2:1 mux based on multiplier[i] selects multiplicand or 0x0
  32 partial products (!)
Summing partial products
  Build Wallace tree of CSAs

6
COMBINATIONAL MULTIPLIER: IDEA
Use an array of AND gates to generate the partial products in parallel
(Figure labels: multiplicand, multiplier, LSB)

7
COMBINATIONAL MULTIPLIER: ADDING PARTIAL PRODUCTS

8
COMBINATIONAL MULTIPLIER: CRITICAL PATH(S)
A lot of critical paths, all with the same delay (AND gates not shown)
(Figure: M x N multiplier built from rows of half adders (HA) and full adders (FA), with Critical Path 1 and Critical Path 2 marked)
Delay = (M + N - 2) t_carry + (N - 1) t_sum + t_AND

9
COMBINATIONAL MULTIPLIER: LAYOUT
Better floorplan for compact layout: send partial products diagonally
Results in better area
(Figure: array of HA and FA cells; AND gates, and hence the first row, not shown)

10
CARRY SAVE ADDER
A + B => S
Save carries: A + B => S, C_out
Use C_in: A + B + C => S1, S2 (3# to 2# in parallel)
Used in combinational multipliers by building a Wallace tree
(Figure: CSA block with inputs a, b, c and outputs c, s)
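A carry save adder can be modeled bitwise as a 3:2 compressor. The sketch below works on Python integers; the name `carry_save_add` is an assumption for illustration only.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    # Per-bit sum with the carries left out.
    partial_sum = a ^ b ^ c
    # Per-bit majority gives the carries, which belong one position higher.
    carries = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carries
```

No carry propagates between bit positions, so every bit is computed in parallel. A single ordinary addition of the two outputs then gives `a + b + c`.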

11
WALLACE TREE
(Figure: operands a, b, c, d, e, f reduced through a tree of CSAs)

12
MULTICYCLE MULTIPLIERS
Combinational multipliers
  Very hardware-intensive
  Integer multiply relatively rare
  Not the right place to spend resources
Multicycle multipliers
  Iterate through bits of multiplier
  Conditionally add shifted multiplicand

13
MULTIPLIER (F4.25)

14
MULTIPLIER (F4.26)

15
MULTIPLIER IMPROVEMENTS
Do we really need a 64-bit adder?
  No, since low-order bits are not involved
  Hence, just use a 32-bit adder
  Shift product register right on every step
Do we really need a separate multiplier register?
  No, since low-order bits of 64-bit product are initially unused
  Hence, just store multiplier there initially
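Both improvements can be sketched together: the multiplier starts in the low half of the product register, the adder only touches the high half, and the register shifts right each step. This is a minimal software model for unsigned operands; `shift_add_multiply` and `width` are names assumed for the example.

```python
def shift_add_multiply(multiplicand, multiplier, width=32):
    """Multicycle unsigned multiply with a shared product/multiplier register."""
    mask = (1 << width) - 1
    multiplicand &= mask
    # The low half of the product register initially holds the multiplier.
    product = multiplier & mask
    for _ in range(width):
        if product & 1:
            # width-bit add into the high half; the adder's carry-out lands
            # one bit above the register and is recovered by the shift below.
            product += multiplicand << width
        # Shift right: drops the consumed multiplier bit.
        product >>= 1
    return product
```

After `width` steps every multiplier bit has been shifted out and the register holds the full double-width product.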

16
MULTIPLIER (F4.31)

17
MULTIPLIER (F4.32)

18
SIGNED MULTIPLICATION
Recall
  For p = a x b, if exactly one of a, b is negative, then p < 0
  If a < 0 and b < 0, then p > 0
  Hence sign(p) = sign(a) xor sign(b)
Hence
  Convert multiplier, multiplicand to positive numbers with (n-1) bits
  Multiply positive numbers
  Compute sign, convert product accordingly
Or,
  Perform sign-extension on shifts for F4.31 design
  Right answer falls out
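The first approach (multiply magnitudes, then fix the sign) can be sketched as follows. The helper names `unsigned_multiply` and `signed_multiply` are assumptions for illustration, and Python integers stand in for the registers.

```python
def unsigned_multiply(x, y):
    """Shift-and-add multiply of two non-negative integers."""
    product = 0
    while y:
        if y & 1:
            product += x
        x <<= 1
        y >>= 1
    return product


def signed_multiply(a, b):
    """Multiply magnitudes, then apply sign(p) = sign(a) xor sign(b)."""
    negative = (a < 0) != (b < 0)
    magnitude = unsigned_multiply(abs(a), abs(b))
    return -magnitude if negative else magnitude
```

One caveat for fixed-width hardware: the most negative n-bit value has a magnitude that needs n bits rather than n-1, so a real design has to treat that operand specially.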

19
BOOTH’S ENCODING
Recall grade school trick
  When multiplying by 9:
    Multiply by 10 (easy, just shift digits left)
    Subtract once
  E.g. m x 9 = m x (10 – 1) = 10m – m
  Converts addition of six partial products to one shift and one subtraction
Booth’s algorithm applies same principle
  Except no ‘9’ in binary, just ‘1’ and ‘0’
  So, it’s actually easier!

20
BOOTH’S ENCODING
Search for a run of ‘1’ bits in the multiplier
  E.g. ‘0110’ has a run of 2 ‘1’ bits in the middle
  Multiplying by ‘0110’ (6 in decimal) is equivalent to multiplying by 8 and subtracting twice the multiplicand, since 6 x m = (8 – 2) x m = 8m – 2m
Hence, iterate right to left and:
  Subtract multiplicand from product at the first ‘1’ of a run
  Add multiplicand to product at the first ‘0’ after the run
  Don’t do either for ‘1’ bits in the middle

21
BOOTH’S ALGORITHM
Current bit  Bit to right  Explanation             Operation
1            0             Begins run of ‘1’       Subtract
1            1             Middle of run of ‘1’    Nothing
0            1             End of a run of ‘1’     Add
0            0             Middle of a run of ‘0’  Nothing
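The four table rows translate directly into a loop over the multiplier bits. The sketch below is one possible software model: `booth_multiply` and `width` are assumed names, the multiplier is read as a `width`-bit two's complement value, and the multiplicand is an ordinary signed Python integer.

```python
def booth_multiply(multiplicand, multiplier, width=32):
    """Radix-2 Booth multiply; multiplier is width-bit two's complement."""
    bits = multiplier & ((1 << width) - 1)
    product = 0
    previous = 0  # implicit 0 to the right of the LSB
    for i in range(width):
        current = (bits >> i) & 1
        if current == 1 and previous == 0:
            # Beginning of a run of 1s: subtract the shifted multiplicand.
            product -= multiplicand << i
        elif current == 0 and previous == 1:
            # End of a run of 1s: add the shifted multiplicand.
            product += multiplicand << i
        # 00 and 11: middle of a run, nothing to do.
        previous = current
    return product
```

A negative multiplier has a run of 1s that never ends inside the word, so the final add is simply never performed. That is why the same loop handles signed multipliers without a separate sign step.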

22
BOOTH’S ENCODING
Really just a new way to encode numbers
  Normally positionally weighted as 2^n
  With Booth, each position has a sign bit
  Can be extended to multiple bits
(Table: a binary value alongside its 1-bit Booth and 2-bit Booth encodings)

23
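BOOTH’S EXAMPLE
Negative multiplicand: -6 x 6 = 1010 x 0110; 0110 in Booth’s encoding is +0-0
Hence, one partial product per Booth digit (LSB first):
  digit 0, weight 1: 0
  digit -, weight 2: -(-6) x 2 = +12
  digit 0, weight 4: 0
  digit +, weight 8: (-6) x 8 = -48
Final Sum: -36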

24
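BOOTH’S EXAMPLE
Negative multiplier: -6 x -2 = 1010 x 1110; 1110 in Booth’s encoding is 00-0
Hence, one partial product per Booth digit (LSB first):
  digit 0, weight 1: 0
  digit -, weight 2: -(-6) x 2 = +12
  digit 0, weight 4: 0
  digit 0, weight 8: 0
Final Sum: 12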

25
MODIFIED BOOTH
Booth 2 modified to produce at most n/2+1 partial products.
Algorithm (for unsigned numbers):
1. Pad the LSB with one zero.
2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
3. Divide the multiplier into overlapping groups of 3 bits.
4. Determine partial product scale factor from modified Booth 2 encoding table.
5. Compute the multiplicand multiples.
6. Sum partial products.

26
Spring 2006 EE VLSI Design II - © Kia Bazargan
MODIFIED BOOTH MULTIPLIER: IDEA (CONT.)
Can encode the digits by looking at three bits at a time
Booth recoding table:
i+1  i  i-1  add
0    0  0     0*M
0    0  1     1*M
0    1  0     1*M
0    1  1     2*M
1    0  0    –2*M
1    0  1    –1*M
1    1  0    –1*M
1    1  1     0*M
Must be able to add multiplicand times –2, –1, 0, 1 and 2
Since Booth recoding got rid of 3’s, generating partial products is not that hard (shifting and negating)
[©Hauck]

27
MODIFIED BOOTH
Example (n = 8 bits, unsigned):
1. Pad LSB with 1 zero
2. n is even, so pad the MSB with two zeros
3. Form 3-bit overlapping groups; for n = 8 we have 5 groups
Padded multiplier: 0 0 Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0
Groups (MSB to LSB): [0 0 Y7] [Y7 Y6 Y5] [Y5 Y4 Y3] [Y3 Y2 Y1] [Y1 Y0 0]
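The padding and grouping steps, together with the recoding table, can be sketched as below. The names `BOOTH2_TABLE`, `booth2_digits` and `booth2_multiply` are assumptions for this example; the table entries are the scale factors from the recoding table, indexed by the three bits (i+1, i, i-1).

```python
# Scale factor for each 3-bit group (i+1, i, i-1).
BOOTH2_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth2_digits(multiplier, n):
    """Radix-4 Booth digits (LSB first) of an unsigned n-bit multiplier."""
    msb_pad = 2 if n % 2 == 0 else 1                 # step 2: MSB padding
    padded = (multiplier & ((1 << n) - 1)) << 1      # step 1: zero below LSB
    total_bits = n + 1 + msb_pad                     # always odd
    digits = []
    for low in range(0, total_bits - 2, 2):          # step 3: overlapping groups
        group = (padded >> low) & 0b111
        digits.append(BOOTH2_TABLE[group])           # step 4: table lookup
    return digits


def booth2_multiply(multiplicand, multiplier, n):
    """Steps 5 and 6: form the multiples and sum them with weights 4**k."""
    return sum((d * multiplicand) << (2 * k)
               for k, d in enumerate(booth2_digits(multiplier, n)))
```

For an 8-bit multiplier this yields 5 digits, each in {-2, -1, 0, 1, 2}, so every partial product is a shift and/or negation of the multiplicand.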

28
2-BITS/CYCLE MODIFIED BOOTH MULTIPLIER
For every pair of multiplier bits
  If Booth’s encoding is ‘-2’: shift multiplicand left by 1, then subtract
  If Booth’s encoding is ‘-1’: subtract
  If Booth’s encoding is ‘0’: do nothing
  If Booth’s encoding is ‘1’: add
  If Booth’s encoding is ‘2’: shift multiplicand left by 1, then add

29
2 BITS/CYCLE MODIFIED BOOTH’S
Current  Previous  Operation     Explanation
00       0         +0; shift 2   [00] => +0, [00] => +0; 2x(+0)+(+0) = +0
00       1         +M; shift 2   [00] => +0, [01] => +M; 2x(+0)+(+M) = +M
01       0         +M; shift 2   [01] => +M, [10] => -M; 2x(+M)+(-M) = +M
01       1         +2M; shift 2  [01] => +M, [11] => +0; 2x(+M)+(+0) = +2M
10       0         -2M; shift 2  [10] => -M, [00] => +0; 2x(-M)+(+0) = -2M
10       1         -M; shift 2   [10] => -M, [01] => +M; 2x(-M)+(+M) = -M
11       0         -M; shift 2   [11] => +0, [10] => -M; 2x(+0)+(-M) = -M
11       1         +0; shift 2   [11] => +0, [11] => +0; 2x(+0)+(+0) = +0
1-bit Booth: 00 => +0; 01 => +M; 10 => -M; 11 => +0

30
WALLACE TREE: IDEA
Idea: divide & conquer
Why add the k numbers one by one?
A tree structure gives logarithmic delay

31
WALLACE TREE EXAMPLE
Delay = 4 CSA + 1 CLA

32
WALLACE TREE: STRUCTURE FOR 7 K-BIT NUMBERS
(Figure: seven k-bit operands reduced through levels of K-bit CSAs, each labeled with the bit ranges of its inputs and outputs, followed by a final K-bit CPA)

33
WALLACE TREE: TIMING
At each step, the number of operands reduces to 2/3
  n k-bit numbers -> CSA -> (2/3) n numbers -> CSA -> (2/3)^2 n numbers -> CSA -> ...
  (2/3)^h n = 2 after h levels
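The level-by-level reduction can be modeled in software to see how many CSA levels a given operand count needs. This is a sketch on non-negative Python integers; `wallace_sum` and the local `carry_save_add` helper are names assumed for the example, and the grouping policy (take operands in threes, pass leftovers through) is one simple choice among several.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: three operands in, sum word and carry word out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(operands):
    """Return (total, csa_levels) for a list of non-negative integers."""
    operands = list(operands)
    levels = 0
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3
        reduced = []
        # Every group of three operands becomes two.
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*operands[i:i + 3]))
        # One or two leftover operands pass through to the next level.
        reduced.extend(operands[full:])
        operands = reduced
        levels += 1
    # Final carry-propagate addition of the remaining operands.
    return sum(operands), levels
```

Seven operands take four CSA levels in this model, followed by one carry-propagate add.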

34
WALLACE TREE: TIMING (CONT.)
Delay depends on height h
  h = O(log n)
  Logarithmic delay
Max # N of k-bit numbers that can be added using a Wallace tree of height h
(Table: h versus N)
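The largest operand count for each height follows from the 3-to-2 reduction: one level turns n operands into n minus floor(n/3), so a tree of height h can absorb at most floor(3/2 x N(h-1)) operands, starting from N(0) = 2. The sketch below computes that recurrence; `max_operands` is a name assumed for the example.

```python
def max_operands(height):
    """Largest number of operands a Wallace tree of the given height can add."""
    n = 2  # height 0: two operands go straight to the final adder
    for _ in range(height):
        n = (3 * n) // 2
    return n
```

Calling `[max_operands(h) for h in range(10)]` lists the capacity for heights 0 through 9.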

35
FLOATING POINT

36
Want to represent larger range of numbers
  Fixed point (integer): -2^(n-1) … (2^(n-1) – 1)
How? Sacrifice precision for range by providing exponent to shift relative weight of each bit position
  Similar to scientific notation (a significand times a power of ten)
  Cannot specify every discrete value in the range, but can span much larger range

37
FLOATING POINT
Still use a fixed number of bits
  Sign bit S, exponent E, significand F
  Value: (-1)^S x F x 2^E
IEEE 754 standard
                  Size  Exponent  Significand  Range
Single precision  32b   8b        23b          2 x 10^(+/-38)
Double precision  64b   11b       52b          2 x 10^(+/-308)
Field layout: S | E | F

38
FLOATING POINT EXPONENT
Exponent specified in biased or excess notation
Why? To simplify sorting
  Sign bit is MSB to ease sorting
  2’s complement exponent:
    Large numbers have positive exponent
    Small numbers have negative exponent
    Sorting does not follow naturally

39
EXCESS OR BIASED EXPONENT
Value: (-1)^S x F x 2^(E-bias)
  SP: bias is 127
  DP: bias is 1023
(Table columns: Exponent, 2’s Compl, Excess)
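The sorting argument can be checked with Python's `struct` module, which exposes the single-precision bit pattern. This is a small sketch: `single_bits` and `biased_exponent` are names assumed for the example, and the ordering property is only demonstrated for non-negative values.

```python
import struct


def single_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision, as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def biased_exponent(x):
    """The stored 8-bit exponent field E (true exponent plus 127)."""
    return (single_bits(x) >> 23) & 0xFF


# For non-negative values, comparing bit patterns as unsigned integers
# gives the same order as comparing the numbers themselves.
values = [6.0e20, 0.25, 3.0, 1.5e-10, 1.0]
by_bits = sorted(values, key=single_bits)
```

Here `by_bits` comes out in ascending numeric order, and the stored exponent of 1.0 is the bias itself.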

40
FLOATING POINT NORMALIZATION
S,E,F representation allows more than one representation for a particular value, e.g. 1.0 x 10^5 = 0.1 x 10^6 = 10.0 x 10^4
  This makes comparison operations difficult
  Prefer to have a single representation
Hence, normalize by convention:
  Only one digit to the left of the floating point
  In binary, that digit must be a 1
  Since leading ‘1’ is implicit, no need to store it
  Hence, obtain one extra bit of precision for free
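Decoding a normalized single-precision value shows the implicit leading 1 being restored. This is a sketch that only handles normal numbers (stored exponent 1 to 254); `decode_normal_single` is a name assumed for the example.

```python
import struct


def decode_normal_single(x):
    """Return (sign, significand, exponent) of a normal single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the bias
    fraction = bits & 0x7FFFFF               # the 23 stored bits
    significand = 1.0 + fraction / 2 ** 23   # restore the implicit leading 1
    return sign, significand, exponent
```

For instance, 6.0 is stored as significand 1.5 with exponent 2, since 1.5 x 2^2 = 6.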

41
FP OVERFLOW/UNDERFLOW
FP Overflow
  Analogous to integer overflow
  Result is too big to represent
  Means exponent is too big
FP Underflow
  Result is too small to represent
  Means exponent is too small (too negative)
Both can raise an exception under IEEE754

42
IEEE754 SPECIAL CASES
Single Precision        Double Precision        Value
Exponent  Significand   Exponent  Significand
0         0             0         0             +/- 0
0         nonzero       0         nonzero       denormalized
1-254     anything      1-2046    anything      fp number
255       0             2047      0             +/- infinity
255       nonzero       2047      nonzero       NaN (Not a Number)
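The single-precision half of the table maps onto a short classifier over the raw 32-bit pattern. `classify_single` is a name assumed for this sketch; the sign bit is ignored because it does not affect the category.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern by exponent and significand."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "fp number"  # exponent 1-254, any significand
```

The double-precision case would be the same with an 11-bit exponent field (all ones is 2047) and a 52-bit fraction.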

43
FP ROUNDING
Rounding is important
  Small errors accumulate over billions of ops
FP rounding hardware helps
  Compute extra guard bit beyond 23/52 bits
  Further, compute additional round bit beyond that
    Multiply may result in leading 0 bit; normalize shifts guard bit into product, leaving round bit for rounding
  Finally, keep sticky bit that is set whenever ‘1’ bits are “lost” to the right
    Differentiates between exactly 0.5 and a value slightly greater than 0.5
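One way to see how the three extra bits are used is round-to-nearest, ties-to-even on an integer significand that carries extra low-order bits. This is a sketch under assumptions: `round_nearest_even` is a hypothetical helper, the value is already normalized, and at least two extra bits are present so that guard, round and sticky can all be formed.

```python
def round_nearest_even(value, extra_bits):
    """Drop extra_bits low bits of value, rounding to nearest, ties to even."""
    if extra_bits < 2:
        raise ValueError("need at least a guard bit and a round bit")
    kept = value >> extra_bits
    guard = (value >> (extra_bits - 1)) & 1        # first bit beyond the result
    round_bit = (value >> (extra_bits - 2)) & 1    # second bit beyond the result
    # Sticky is the OR of everything to the right of the round bit.
    sticky = int((value & ((1 << (extra_bits - 2)) - 1)) != 0)
    # Above one half: round up. Exactly one half: round up only if kept is odd.
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1
    return kept
```

The caller still has to renormalize if the increment carries out of the top bit.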

44
FLOATING POINT ADDITION
Just like grade school
  First, align decimal points
  Then, add significands
  Finally, normalize result
(Worked example: the significands are added at a common exponent of 10^2, and the sum is then normalized to 10^3)
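The align, add, normalize sequence can be sketched in binary for two positive operands given as (significand, exponent) pairs with value significand x 2^exponent. Several things here are assumptions for illustration: the name `fp_add`, the default precision of 24 bits, and the use of exact wide integers for alignment. Hardware instead shifts the smaller operand right and keeps guard, round and sticky bits.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, precision=24):
    """Add two positive normalized values; returns (significand, exponent)."""
    # Make (sig_a, exp_a) the operand with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Align: express both significands at the smaller exponent.
    wide = (sig_a << (exp_a - exp_b)) + sig_b      # add significands
    exponent = exp_b
    # Normalize: bring the sum back to `precision` bits, rounding to even.
    extra = wide.bit_length() - precision
    if extra > 0:
        remainder = wide & ((1 << extra) - 1)
        half = 1 << (extra - 1)
        wide >>= extra
        exponent += extra
        if remainder > half or (remainder == half and wide & 1):
            wide += 1
            if wide >> precision:                  # rounding carried out
                wide >>= 1
                exponent += 1
    return wide, exponent
```

With 24-bit significands, 1.5 is `(0xC00000, -23)` and 2.5 is `(0xA00000, -22)`; their sum normalizes to `(0x800000, -21)`, which is 4.0.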

45
FP ADDER (F4.45)

46
FP MULTIPLICATION
Sign: P_S = A_S xor B_S
Exponent: P_E = A_E + B_E
  Due to bias/excess, must subtract bias
    e = e1 + e2
    E = e + 1023 = e1 + e2 + 1023
    E = (E1 – 1023) + (E2 – 1023) + 1023
    E = E1 + E2 – 1023
Significand: P_F = A_F x B_F
  Standard integer multiply (23b or 52b + g/r/s bits)
  Use Wallace tree of CSAs to sum partial products

47
FP MULTIPLICATION
Compute sign, exponent, significand
Normalize
  Shift left, right by 1
Check for overflow, underflow
Round
Normalize again (if necessary)
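The steps can be sketched for single precision (bias 127 rather than the 1023 used for double precision). This is a simplified model, not a full IEEE 754 multiplier: `fp_multiply_bits`, `to_bits` and `to_float` are assumed names, both inputs and the result must be normal numbers, and rounding is to nearest with ties to even.

```python
import struct

BIAS = 127  # single precision


def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_multiply_bits(a_bits, b_bits):
    """Multiply two normal single-precision patterns; result must be normal."""
    sign = (a_bits >> 31) ^ (b_bits >> 31)
    # E = E1 + E2 - bias
    exponent = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - BIAS
    # Restore the implicit leading 1 and do an integer multiply.
    sig_a = (a_bits & 0x7FFFFF) | 0x800000
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b                  # 47 or 48 bits
    # Normalize: a product in [2, 4) needs one extra right shift.
    extra = 23
    if product >> 47:
        extra = 24
        exponent += 1
    significand = product >> extra
    remainder = product & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and significand & 1):
        significand += 1
        if significand >> 24:                # normalize again if necessary
            significand >>= 1
            exponent += 1
    # Check for overflow, underflow (this sketch does not produce them).
    if not 1 <= exponent <= 254:
        raise OverflowError("result exponent outside the normal range")
    return (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)
```

A quick usage check is `to_float(fp_multiply_bits(to_bits(1.5), to_bits(2.0)))`, which gives 3.0.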
