Multipliers Multipliers Booth’s Multiplier Floating Point Arithmetic.

Presentation on theme: "Multipliers Multipliers Booth’s Multiplier Floating Point Arithmetic."— Presentation transcript:

1 Multipliers Multipliers Booth’s Multiplier Floating Point Arithmetic

2 Why Multipliers?
Used in a lot of DSP applications:
  Vector product, matrix multiplication
  Convolution
  Filtering (tap filters, FIR, …)
  ...
“At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expense-paid visits to conferences in USA) to be won from inventing new forms of multiplier”
Alan Clements, The Principles of Computer Hardware, 1986

3 Basic Arithmetic and the ALU
Now:
  Integer multiplication
  Booth’s algorithm
  Floating point representation
  Floating point addition, multiplication
Floating point is not crucial for the project

4 Multiplication
Flashback to 3rd grade: multiplier, multiplicand, partial products, final sum
Base 10: 8 x 9 = 72; the binary partial products sum to the same 72
How wide is the result?
  log(n x m) = log(n) + log(m)
  32b x 32b = 64b result

5 Combinational Multiplier
Generating partial products
  2:1 mux based on multiplier[i] selects multiplicand or 0x0
  32 partial products (!)
Summing partial products
  Build Wallace tree of CSAs

6 Combinational Multiplier: Idea
Use an array of AND gates to generate the partial products in parallel
[Figure: AND-gate array fed by the multiplicand and multiplier bits, LSBs marked]
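The AND-gate array can be modeled in software as a rough sketch: each multiplier bit gates a full copy of the multiplicand, and row i carries weight 2^i. The function name `partial_products` and the `width` parameter are illustrative choices, not taken from the slides.

```python
def partial_products(multiplicand: int, multiplier: int, width: int) -> list:
    """Return one weighted partial product per multiplier bit, LSB first."""
    mask = (1 << width) - 1
    multiplicand &= mask
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # ANDing every multiplicand bit with multiplier bit i yields either
        # the multiplicand or zero (the 2:1 mux view of the same thing).
        row = multiplicand if bit else 0
        rows.append(row << i)          # row i is weighted by 2**i
    return rows


rows = partial_products(9, 8, 4)
print(rows, sum(rows))                 # the rows sum to the product 8 x 9
```

In hardware all rows are produced in parallel; the loop here only enumerates them.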

7 Combinational Multiplier: Adding PProds
[Figure: array of half adders (HA) and full adders (FA) summing the partial products of X3..X0 and Y3..Y0 into Z7..Z0]

8 Combinational Multiplier: Critical Path(s)
A lot of critical paths: same delay (AND gates not shown)
M = # of multiplier bits, N = # of multiplicand bits
Delay = (M+N-2) t_carry + (N-1) t_sum + t_AND
Does it make sense to use faster adders for the last row? No! You have to change ALL the critical paths simultaneously
[Figure: M x N multiplier array of HA and FA cells with Critical Path 1 and Critical Path 2 marked]
VLSI Design II – © Kia Bazargan

9 Combinational Multiplier: Layout
Better floorplan for compact layout: send partial products diagonally
Results in better area (AND gates and hence the first row not shown)
M = # of multiplier bits, N = # of multiplicand bits
[Figure: HA and FA cells arranged in the diagonal floorplan]

10 Carry Save Adder
A + B => S
Save carries: A + B => S, Cout
Use Cin: A + B + C => S1, S2 (3# to 2# in parallel)
Used in combinational multipliers by building a Wallace Tree
[Figure: CSA block with inputs a, b, c and outputs c (carry) and s (sum)]
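A minimal software model of the 3-to-2 step, assuming non-negative Python integers stand in for the bit vectors. The name `carry_save_add` is illustrative. Every bit position is an independent full adder, so no carry ripples across the word.

```python
def carry_save_add(a: int, b: int, c: int):
    """Compress three operands into two with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                               # per-bit sum output
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, one position up
    return sum_word, carry_word


s, c = carry_save_add(5, 6, 7)
print(s, c, s + c)   # the final s + c is the only carry-propagating addition
```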

11 Wallace Tree
[Figure: six operands a through f reduced by four CSAs]

12 Multicycle Multipliers
Combinational multipliers
  Very hardware-intensive
  Integer multiply relatively rare
  Not the right place to spend resources
Multicycle multipliers
  Iterate through bits of multiplier
  Conditionally add shifted multiplicand

13 Multiplier (F4.25)

14 Multiplier (F4.26)

15 Multiplier Improvements
Do we really need a 64-bit adder?
  No, since low-order bits are not involved
  Hence, just use a 32-bit adder
  Shift product register right on every step
Do we really need a separate multiplier register?
  No, since low-order bits of 64-bit product are initially unused
  Hence, just store multiplier there initially
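The improved datapath can be sketched as a loop over one shared register. This is an illustrative model for unsigned operands, not the exact F4.31 hardware: `multiply_shared_register` and `width` are made-up names, and a Python integer stands in for the product register plus the adder's carry-out bit.

```python
def multiply_shared_register(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Unsigned shift-and-add multiply with the multiplier stored in the low half."""
    mask = (1 << width) - 1
    multiplicand &= mask
    product = multiplier & mask        # low half = multiplier, high half = 0
    for _ in range(width):
        if product & 1:                # current multiplier bit sits in the LSB
            # Only the upper half takes part in the add (the narrow adder).
            product += multiplicand << width
        product >>= 1                  # shift product right on every step
    return product


print(multiply_shared_register(8, 9, width=4))
```

Each right shift discards a multiplier bit that has already been used and frees one more bit for the growing product.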

16 Multiplier (F4.31)

17 Multiplier (F4.32)

18 Signed Multiplication
Recall: for p = a x b
  If exactly one of a<0, b<0 holds, then p < 0
  If a<0 and b<0, then p > 0
  Hence sign(p) = sign(a) xor sign(b)
Hence:
  Convert multiplier, multiplicand to positive numbers with (n-1) bits
  Multiply positive numbers
  Compute sign, convert product accordingly
Or:
  Perform sign-extension on shifts for F4.31 design
  Right answer falls out
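The first option (multiply magnitudes, then fix the sign) might look like the following sketch. The function name and the `width` parameter are assumptions for illustration; the inner loop is a plain unsigned shift-and-add.

```python
def signed_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply via magnitudes; the sign is sign(a) xor sign(b)."""
    negative = (a < 0) != (b < 0)          # xor of the two sign bits
    mag_a, mag_b = abs(a), abs(b)          # convert both operands to positive
    product = 0
    for i in range(width):                 # unsigned multiply of the magnitudes
        if (mag_b >> i) & 1:
            product += mag_a << i
    return -product if negative else product


print(signed_multiply(-6, 6, width=4), signed_multiply(-6, -2, width=4))
```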

19 Booth’s Encoding
Recall grade school trick. When multiplying by 9:
  Multiply by 10 (easy, just shift digits left)
  Subtract once
  E.g. m x 9 = m x (10 – 1) = 10m – m
  Converts addition of six partial products to one shift and one subtraction
Booth’s algorithm applies same principle
  Except no ‘9’ in binary, just ‘1’ and ‘0’
  So, it’s actually easier!

20 Booth’s Encoding
Search for a run of ‘1’ bits in the multiplier
  E.g. ‘0110’ has a run of 2 ‘1’ bits in the middle
  Multiplying by ‘0110’ (6 in decimal) is equivalent to multiplying by 8 and subtracting twice the multiplicand, since 6 x m = (8 – 2) x m = 8m – 2m
Hence, iterate right to left and:
  Subtract multiplicand from product at first ‘1’
  Add multiplicand to product after last ‘1’
  Don’t do either for ‘1’ bits in the middle

21 Booth’s Algorithm
Current bit   Bit to right   Explanation              Operation
1             0              Begins run of ‘1’        Subtract
1             1              Middle of run of ‘1’     Nothing
0             1              End of a run of ‘1’      Add
0             0              Middle of a run of ‘0’   Nothing
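The current-bit / bit-to-right rule can be sketched as a recoding into digits -1, 0, +1 followed by a sum of shifted multiplicands. The names `booth_digits` and `booth_multiply` are illustrative. Python's arbitrary-precision integers stand in for sign-extended registers, so negative operands work directly.

```python
def booth_digits(multiplier: int, width: int) -> list:
    """Recode a two's-complement multiplier into digits in {-1, 0, +1}, LSB first."""
    digits = []
    previous = 0                             # implicit 0 to the right of the LSB
    for i in range(width):
        current = (multiplier >> i) & 1
        # current,previous: 10 -> -1 (subtract), 01 -> +1 (add), 00 and 11 -> 0
        digits.append(previous - current)
        previous = current
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Sum the signed, shifted copies of the multiplicand selected by the digits."""
    return sum(d * (multiplicand << i)
               for i, d in enumerate(booth_digits(multiplier, width)))


print(booth_digits(6, 4))           # 0110 recoded, LSB first
print(booth_multiply(-6, 6, 4))
```

Read MSB first, the digits for 0110 are +1, 0, -1, 0, that is 8 - 2.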

22 Booth’s Encoding
Really just a new way to encode numbers
  Normally positionally weighted as 2^n
  With Booth, each position has a sign bit
  Can be extended to multiple bits
The same value in each encoding:
  0  1  1  0    Binary
  +1 0  -1 0    1-bit Booth
  +2 -2         2-bit Booth

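23 Booth’s Example
Negative multiplicand: -6 x 6 = -36
1010 x 0110; 0110 in Booth’s encoding is +0-0
Hence the partial products, right to left, are the multiplicand x 0, x –1, x 0 and x +1, each shifted to its bit position
Final sum: 12 – 48 = -36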

24 Booth’s Example
Negative multiplier: -6 x -2 = 12
1010 x 1110; 1110 in Booth’s encoding is 00-0
Hence the only nonzero partial product is the multiplicand x –1, shifted left one position
Final sum: 12

25 Modified Booth
Booth 2 modified to produce at most n/2+1 partial products.
Algorithm (for unsigned numbers):
  1. Pad the LSB with one zero.
  2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
  3. Divide the multiplier into overlapping groups of 3 bits.
  4. Determine the partial product scale factor from the modified Booth 2 encoding table.
  5. Compute the multiplicand multiples.
  6. Sum the partial products.

26 Modified Booth Multiplier: Idea (cont.)
Can encode the digits by looking at three bits at a time
Booth recoding table:
  i+1  i  i-1   add
   0   0   0     0*M
   0   0   1     1*M
   0   1   0     1*M
   0   1   1     2*M
   1   0   0    –2*M
   1   0   1    –1*M
   1   1   0    –1*M
   1   1   1     0*M
Must be able to add multiplicand times –2, -1, 0, 1 and 2
Since Booth recoding got rid of 3’s, generating partial products is not that hard (shifting and negating)
[©Hauck] Spring 2006 EE VLSI Design II - © Kia Bazargan

27 Modified Booth
Example: (n=4-bits unsigned)
  Pad LSB with 1 zero
  n is even, so pad the MSB with two zeros
  Form 3-bit overlapping groups; for n=8 we have 5 groups
[Figure: multiplier bits Y7 ... Y0 with the padding zeros and the overlapping 3-bit groups]
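The padding and grouping steps for an unsigned multiplier can be sketched as follows. The table is the standard radix-4 Booth recoding, keyed by the bits (i+1, i, i-1). The function names and the parameter `n` are illustrative assumptions.

```python
# Key: bits (i+1, i, i-1) read as a 3-bit number; value: multiple of the multiplicand.
BOOTH2_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def modified_booth_digits(multiplier: int, n: int) -> list:
    """Radix-4 digits (LSB first) of an n-bit unsigned multiplier."""
    msb_pad = 2 if n % 2 == 0 else 1               # two zeros if n is even, else one
    padded = (multiplier & ((1 << n) - 1)) << 1    # pad the LSB with one zero
    total_bits = n + 1 + msb_pad                   # MSB padding is implicit zeros
    groups = (total_bits - 1) // 2                 # overlapping 3-bit groups
    return [BOOTH2_TABLE[(padded >> (2 * j)) & 0b111] for j in range(groups)]


def modified_booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Sum the partial products; group j carries weight 4**j."""
    digits = modified_booth_digits(multiplier, n)
    return sum(d * multiplicand * 4 ** j for j, d in enumerate(digits))


print(modified_booth_digits(0xFF, 8))        # five groups for n = 8
print(modified_booth_multiply(13, 0xFF, 8))
```

Adjacent groups share one bit, which is why an 8-bit multiplier produces five digits rather than four.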

28 2-bits/cycle Modified Booth Multiplier
For every pair of multiplier bits:
  If Booth’s encoding is ‘-2’: shift multiplicand left by 1, then subtract
  If Booth’s encoding is ‘-1’: subtract
  If Booth’s encoding is ‘0’: do nothing
  If Booth’s encoding is ‘1’: add
  If Booth’s encoding is ‘2’: shift multiplicand left by 1, then add

29 2 bits/cycle Modified Booth’s
1-bit Booth: 00 => +0; 01 => +M; 10 => -M; 11 => +0
Current  Previous  Operation       Explanation
00       0         +0; shift 2     [00] => +0, [00] => +0; 2x(+0)+(+0)=+0
00       1         +M; shift 2     [00] => +0, [01] => +M; 2x(+0)+(+M)=+M
01       0         +M; shift 2     [01] => +M, [10] => -M; 2x(+M)+(-M)=+M
01       1         +2M; shift 2    [01] => +M, [11] => +0; 2x(+M)+(+0)=+2M
10       0         -2M; shift 2    [10] => -M, [00] => +0; 2x(-M)+(+0)=-2M
10       1         -M; shift 2     [10] => -M, [01] => +M; 2x(-M)+(+M)=-M
11       0         -M; shift 2     [11] => +0, [10] => -M; 2x(+0)+(-M)=-M
11       1         +0; shift 2     [11] => +0, [11] => +0; 2x(+0)+(+0)=+0
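The explanation column can be checked mechanically: each 2-bit operation is twice the 1-bit Booth digit of the upper pair plus the 1-bit digit of the lower pair. This is a small sketch with illustrative names (`ONE_BIT_BOOTH`, `two_bit_booth`).

```python
# (current bit, bit to its right) -> multiple of M under 1-bit Booth
ONE_BIT_BOOTH = {(0, 0): 0, (0, 1): +1, (1, 0): -1, (1, 1): 0}


def two_bit_booth(hi: int, lo: int, previous: int) -> int:
    """Combine two 1-bit Booth digits: 2 x digit(hi, lo) + digit(lo, previous)."""
    return 2 * ONE_BIT_BOOTH[(hi, lo)] + ONE_BIT_BOOTH[(lo, previous)]


for hi in (0, 1):
    for lo in (0, 1):
        for previous in (0, 1):
            print(f"{hi}{lo} {previous} -> {two_bit_booth(hi, lo, previous):+d}M")
```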

30 Wallace Tree: Idea
Idea: divide & conquer
  Why add the k numbers one by one?
  Tree structure -> logarithmic
For now, let’s assume we are going to add 7 6-bit numbers, which are NOT partial products and hence not shifted. What’s the fastest way to add them?

31 Wallace Tree Example
Circles represent digits
Boxes show FAs
Diagonal lines correspond to (Sum, Carry) pairs generated by the FA cells in the previous stage (same color)
Dotted box is some carry propagate adder (e.g., CLA)
Delay = 4 CSA + 1 CLA
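A word-level sketch of the reduction, assuming non-negative integers as operands: every level groups the operands in threes, replaces each group with a (sum, carry) pair, and passes leftovers through. The names `carry_save_add` and `wallace_sum` are illustrative, and Python's `sum` stands in for the final carry-propagate adder.

```python
def carry_save_add(a: int, b: int, c: int):
    """3-to-2 compressor: a + b + c == sum_word + carry_word."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(operands):
    """Reduce the operands with CSA levels; return (total, number_of_csa_levels)."""
    operands = list(operands)
    levels = 0
    while len(operands) > 2:
        next_layer = []
        full_groups = len(operands) // 3
        for g in range(full_groups):
            next_layer.extend(carry_save_add(*operands[3 * g:3 * g + 3]))
        next_layer.extend(operands[3 * full_groups:])   # leftovers pass through
        operands = next_layer
        levels += 1
    return sum(operands), levels        # last step models the CPA (e.g. a CLA)


print(wallace_sum([63] * 7))            # seven 6-bit numbers
```

For seven operands the count goes 7, 5, 4, 3, 2, which is the four CSA levels in the delay figure.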

32 Wallace Tree: Structure for 7 k-bit Numbers
[Figure: five k-bit CSAs arranged in levels feeding a final k-bit CPA; labels such as [1,k] and [0,k-1] give the bit ranges of the carry and sum outputs at each level]

33 Wallace Tree: Timing
At each step, # of operands reduces to 2/3
  n k-bit numbers -> (2/3) n numbers -> (2/3)^2 n -> ...
  After h levels: (2/3)^h n = 2

34 Wallace Tree: Timing (cont.)
Delay depends on height h
  h = O(log n) -> logarithmic delay
Max # N of k-bit numbers that can be added using a Wallace tree of height h:
  h   N      h   N      h   N
  0   2      7   28     14  474
  1   3      8   42     15  711
  2   4      9   63     16  1066
  3   6      10  94     17  1599
  4   9      11  141    18  2398
  5   13     12  211    19  3597
  6   19     13  316    20  5395
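The table values follow a simple recurrence. The slides do not state it, so treat it as an inferred reading: a tree of height 0 is just the final adder with 2 inputs, and each extra CSA level lets every 2 outputs come from 3 inputs, giving N(h) = floor(3 N(h-1) / 2). The name `max_operands` is illustrative.

```python
def max_operands(height: int) -> int:
    """Largest operand count a Wallace tree with this many CSA levels can reduce to 2."""
    n = 2                       # height 0: the carry-propagate adder takes 2 operands
    for _ in range(height):
        n = (3 * n) // 2        # one more 3-to-2 level in front
    return n


for h in range(0, 21):
    print(h, max_operands(h))
```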

35 Floating point

36 Floating Point
Want to represent larger range of numbers
  Fixed point (integer): -2^(n-1) … (2^(n-1) – 1)
How? Sacrifice precision for range by providing exponent to shift relative weight of each bit position
Similar to scientific notation: (significand) x 10^23
Cannot specify every discrete value in the range, but can span much larger range

37 Floating Point
Still use a fixed number of bits
  Sign bit S, exponent E, significand F
  Value: (-1)^S x F x 2^E
IEEE 754 standard (fields S, E, F):
                     Size   Exponent   Significand   Range
  Single precision   32b    8b         23b           2x10^(+/-38)
  Double precision   64b    11b        52b           2x10^(+/-308)

38 Floating Point Exponent
Exponent specified in biased or excess notation
Why? To simplify sorting
  Sign bit is MSB to ease sorting
  2’s complement exponent:
    Large numbers have positive exponent
    Small numbers have negative exponent
    Sorting does not follow naturally

39 Excess or Biased Exponent
[Table: exponent bit patterns compared under 2’s complement and excess-127, covering -127, -126, ..., +127]
Value: (-1)^S x F x 2^(E-bias)
  SP: bias is 127
  DP: bias is 1023

40 Floating Point Normalization
S,E,F representation allows more than one representation for a particular value, e.g. 1.0 x 10^5 = 0.1 x 10^6 = 10.0 x 10^4
  This makes comparison operations difficult
  Prefer to have a single representation
Hence, normalize by convention:
  Only one digit to the left of the floating point
  In binary, that digit must be a 1
  Since leading ‘1’ is implicit, no need to store it
  Hence, obtain one extra bit of precision for free
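To make the S, E, F fields concrete, here is a sketch that unpacks a single-precision value with Python's standard `struct` module and rebuilds it from the biased exponent and the implicit leading 1. It only handles normalized numbers, and `decode_single` is an illustrative name.

```python
import struct


def decode_single(x: float):
    """Split an IEEE 754 single into (S, E, F) and rebuild a normalized value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # stored in excess-127
    fraction = bits & 0x7FFFFF              # 23 stored bits, leading 1 not stored
    significand = 1 + fraction / 2 ** 23    # put the implicit 1 back
    value = (-1) ** sign * significand * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value


print(decode_single(-6.5))                  # -6.5 = -1.625 x 2^2
```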

41 FP Overflow/Underflow
FP Overflow
  Analogous to integer overflow
  Result is too big to represent
  Means exponent is too big
FP Underflow
  Result is too small to represent
  Means exponent is too small (too negative)
Both can raise an exception under IEEE754

42 IEEE754 Special Cases
Single Precision         Double Precision         Value
Exponent   Significand   Exponent   Significand
0          0             0          0             0
0          nonzero       0          nonzero       denormalized
1-254      anything      1-2046     anything      fp number
255        0             2047       0             infinity
255        nonzero       2047       nonzero       NaN (Not a Number)

43 FP Rounding
Rounding is important
  Small errors accumulate over billions of ops
FP rounding hardware helps
  Compute extra guard bit beyond 23/52 bits
  Further, compute additional round bit beyond that
    Multiply may result in leading 0 bit, normalize shifts guard bit into product, leaving round bit for rounding
  Finally, keep sticky bit that is set whenever ‘1’ bits are “lost” to the right
    Differentiates between 0.5 and

44 Floating Point Addition
Just like grade school
  First, align decimal points
  Then, add significands
  Finally, normalize result
Example
               Operand          Aligned
               9.997 x 10^2     9.997     x 10^2
               4.631 x 10^-1    0.004631  x 10^2
  Sum                           10.001631 x 10^2
  Normalized                    1.0001631 x 10^3
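The three steps can be traced with exact decimal arithmetic. This is a sketch built on Python's standard `decimal` module: `fp_add` is an illustrative name, and the result is not rounded back to a fixed significand width, which a real adder would do.

```python
from decimal import Decimal


def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b; return (significand, exponent)."""
    sig_a, sig_b = Decimal(sig_a), Decimal(sig_b)
    # 1. Align: make the operand with the larger exponent the reference.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)      # shift the smaller operand right
    # 2. Add significands.
    total, exponent = sig_a + sig_b, exp_a
    # 3. Normalize to one nonzero digit left of the point.
    while abs(total) >= 10:
        total, exponent = total.scaleb(-1), exponent + 1
    while total != 0 and abs(total) < 1:
        total, exponent = total.scaleb(1), exponent - 1
    return total, exponent


print(fp_add("9.997", 2, "4.631", -1))
```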

45 FP Adder (F4.45)

46 FP Multiplication
Sign: Ps = As xor Bs
Exponent: PE = AE + BE
  Due to bias/excess, must subtract bias
  e = e1 + e2
  E = e + 1023 = e1 + e2 + 1023
  E = (E1 – 1023) + (E2 – 1023) + 1023
  E = E1 + E2 – 1023
Significand: PF = AF x BF
  Standard integer multiply (23b or 52b + g/r/s bits)
  Use Wallace tree of CSAs to sum partial products

47 FP Multiplication
Compute sign, exponent, significand
Normalize
  Shift left, right by 1
  Check for overflow, underflow
Round
Normalize again (if necessary)
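A simplified, field-level sketch of these steps for double precision, using the standard `struct` module. It assumes normalized inputs and an in-range result, truncates instead of rounding, and skips the overflow and underflow checks, so it illustrates the sign, exponent and normalize steps rather than a full IEEE 754 multiplier. The names `fields` and `fp_multiply` are illustrative.

```python
import struct

BIAS = 1023                                  # double-precision exponent bias
FRACTION_MASK = (1 << 52) - 1


def fields(x: float):
    """Return (sign, biased exponent, fraction) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & FRACTION_MASK


def fp_multiply(a: float, b: float) -> float:
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    sign = sa ^ sb                           # Ps = As xor Bs
    exponent = ea + eb - BIAS                # E = E1 + E2 - 1023
    # Integer multiply of the significands with their implicit leading 1s.
    product = ((1 << 52) | fa) * ((1 << 52) | fb)
    if product >> 105:                       # significand product in [2, 4)
        product >>= 1                        # normalize: shift right by 1
        exponent += 1
    fraction = (product >> 52) & FRACTION_MASK   # drop implicit 1, truncate low bits
    bits = (sign << 63) | (exponent << 52) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


print(fp_multiply(1.5, 2.5), fp_multiply(3.0, -0.75))
```

Proper rounding would keep the guard, round and sticky bits from the discarded low half of the product and could require a second normalization.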
