# Multipliers: Multipliers, Booth’s Multiplier, Floating Point Arithmetic


Why Multipliers?
- Used in a lot of DSP applications
  - Vector product, matrix multiplication
  - Convolution
  - Filtering (tap filters, FIR, …)
  - ...

“At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expense-paid visits to conferences in USA) to be won from inventing new forms of multiplier”
(Alan Clements, The Principles of Computer Hardware, 1986)

Basic Arithmetic and the ALU
- Now
  - Integer multiplication
  - Booth’s algorithm
  - Floating point representation
  - Floating point addition, multiplication
- Floating point is not crucial for the project

Multiplication
- Flashback to 3rd grade: multiplier, multiplicand, partial products, final sum
- Base 10: 8 x 9 = 72; the binary partial products (PP) add up to the same 72
- How wide is the result?
  - log(n x m) = log(n) + log(m)
  - 32b x 32b = 64b result
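The partial-product view of multiplication can be sketched in a few lines of Python. This is an illustrative sketch rather than anything taken from the slides: the name `partial_products` is hypothetical, and the operands reuse the 8 x 9 example in 4-bit binary.

```python
def partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """Return one partial product per multiplier bit, LSB first."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(width)
    ]


pps = partial_products(0b1000, 0b1001, width=4)  # 8 x 9
product = sum(pps)                               # final sum of the partial products
print(pps, product)

# Result width: the largest 32b x 32b product still fits in 64 bits.
widest = (2**32 - 1) * (2**32 - 1)
print(widest.bit_length())
```

Each set multiplier bit contributes a shifted copy of the multiplicand; each clear bit contributes zero.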

Combinational Multiplier
- Generating partial products
  - 2:1 mux based on multiplier[i] selects multiplicand or 0x0
  - 32 partial products (!)
- Summing partial products
  - Build Wallace tree of CSA

Combinational Multiplier: Idea
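- Use an array of AND gates to generate the partial products in parallel

[Figure: AND-gate array combining multiplicand and multiplier bits, LSBs marked]

[Figure: 4 x 4 array multiplier built from half adders (HA) and full adders (FA); inputs X3..X0 and Y3..Y0, outputs Z7..Z0]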

Combinational Multiplier: Critical Path(s)
- A lot of critical paths with the same delay (AND gates not shown)
- [Figure: M x N multiplier array of HA and FA cells with Critical Path 1 and Critical Path 2 marked]
- M = # of multiplier bits, N = # of multiplicand bits
- Does it make sense to use faster adders for the last row? No! You have to change ALL the critical paths simultaneously
- Delay = $(M+N-2)\,t_{carry} + (N-1)\,t_{sum} + t_{AND}$

VLSI Design II – © Kia Bazargan

Combinational Multiplier: Layout
- Better floorplan for compact layout: send partial products diagonally
- Results in better area (AND gates, and hence the first row, not shown)
- [Figure: array of HA and FA cells; M = # of multiplier bits, N = # of multiplicand bits]

Carry Save Adder
- A + B => S
- Save carries: A + B => S, Cout
- Use Cin: A + B + C => S1, S2 (3# to 2# in parallel)
- Used in combinational multipliers by building a Wallace Tree

[Figure: CSA block with inputs a, b, c and outputs c (carry) and s (sum)]
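The 3-to-2 reduction can be sketched on Python integers, treating each bit position as an independent full adder. This is a minimal sketch; the function name `carry_save_add` is an assumption, not something named in the slides.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair with no carry propagation."""
    sum_word = a ^ b ^ c                             # per-bit sum of a full adder
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carry, one position up
    return sum_word, carry_word


s1, s2 = carry_save_add(5, 6, 7)
print(s1, s2, s1 + s2)  # one ordinary add finishes the job
```

No bit position waits on its neighbor, which is why the delay of one CSA level does not depend on operand width.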

Wallace Tree
[Figure: four CSAs arranged as a tree reducing the six inputs a through f]

Multicycle Multipliers
- Combinational multipliers
  - Very hardware-intensive
  - Integer multiply relatively rare
  - Not the right place to spend resources
- Multicycle multipliers
  - Iterate through bits of multiplier
  - Conditionally add shifted multiplicand

Multiplier (F4.25)
[Figure F4.25]

Multiplier (F4.26)
[Figure F4.26]

Multiplier Improvements
- Do we really need a 64-bit adder?
  - No, since low-order bits are not involved
  - Hence, just use a 32-bit adder
  - Shift product register right on every step
- Do we really need a separate multiplier register?
  - No, since low-order bits of 64-bit product are initially unused
  - Hence, just store multiplier there initially
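Both improvements can be modeled together as a software sketch for unsigned operands. The name `shift_add_multiply` and the `width` parameter are assumptions for illustration; a Python integer stands in for the product register plus the adder's carry-out.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Unsigned multicycle multiply with the multiplier stored in the product's low half."""
    mask = (1 << width) - 1
    product = multiplier & mask  # low half: multiplier, high half: zero
    for _ in range(width):
        if product & 1:
            # width-bit add into the upper half; the carry-out lands in bit 2*width
            product += (multiplicand & mask) << width
        product >>= 1            # shift the product register right each step
    return product


print(shift_add_multiply(8, 9, width=4))
print(shift_add_multiply(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF * 0xFFFFFFFF)
```

Each right shift discards a multiplier bit that has already been examined and frees one more bit for the growing product.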

Multiplier (F4.31)
[Figure F4.31]

Multiplier (F4.32)
[Figure F4.32]

Signed Multiplication
- Recall
  - For p = a x b, if exactly one of a, b is negative, then p < 0
  - If a < 0 and b < 0, then p > 0
  - Hence sign(p) = sign(a) xor sign(b)
- Hence
  - Convert multiplier, multiplicand to positive numbers with (n-1) bits
  - Multiply positive numbers
  - Compute sign, convert product accordingly
- Or,
  - Perform sign-extension on shifts for F4.31 design
  - Right answer falls out
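The first approach (multiply magnitudes, then fix the sign) can be sketched as follows. The function names are hypothetical, and the inner loop is a plain shift-and-add stand-in for whatever unsigned multiplier the hardware provides.

```python
def unsigned_multiply(a: int, b: int, width: int) -> int:
    """Shift-and-add multiply of two non-negative operands."""
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i
    return product


def signed_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply magnitudes, then apply sign(p) = sign(a) xor sign(b)."""
    negative = (a < 0) != (b < 0)
    magnitude = unsigned_multiply(abs(a), abs(b), width)
    return -magnitude if negative else magnitude


print(signed_multiply(-6, 6), signed_multiply(-6, -2))
```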

Booth’s Encoding
- Recall grade school trick
  - When multiplying by 9:
    - Multiply by 10 (easy, just shift digits left)
    - Subtract once
  - E.g. m x 9 = m x (10 – 1) = 10m – m
  - Converts addition of six partial products to one shift and one subtraction
- Booth’s algorithm applies same principle
  - Except no ‘9’ in binary, just ‘1’ and ‘0’
  - So, it’s actually easier!

Booth’s Encoding
- Search for a run of ‘1’ bits in the multiplier
  - E.g. ‘0110’ has a run of 2 ‘1’ bits in the middle
  - Multiplying by ‘0110’ (6 in decimal) is equivalent to multiplying by 8 and subtracting twice the multiplicand, since 6 x m = (8 – 2) x m = 8m – 2m
- Hence, iterate right to left and:
  - Subtract multiplicand from product at first ‘1’
  - Add multiplicand to product after last ‘1’
  - Don’t do either for ‘1’ bits in the middle

Booth’s Algorithm

| Current bit | Bit to right | Explanation | Operation |
|---|---|---|---|
| 1 | 0 | Begins run of ‘1’ | Subtract |
| 1 | 1 | Middle of run of ‘1’ | Nothing |
| 0 | 1 | End of a run of ‘1’ | Add |
| 0 | 0 | Middle of a run of ‘0’ | Nothing |
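The four cases of Booth's algorithm collapse into one subtraction per bit: the recoded digit is (bit to right) minus (current bit). A minimal sketch, with the hypothetical name `booth_recode`, returning digits LSB first:

```python
def booth_recode(multiplier: int, width: int) -> list[int]:
    """Return 1-bit Booth digits (+1, 0, -1), LSB first, for a width-bit multiplier."""
    digits = []
    previous = 0  # implied bit to the right of the LSB
    for i in range(width):
        current = (multiplier >> i) & 1
        digits.append(previous - current)  # 10 -> -1, 01 -> +1, 00 and 11 -> 0
        previous = current
    return digits


print(booth_recode(0b0110, 4))
print(booth_recode(0b1110, 4))
```

Weighting digit i by 2^i and summing gives back the multiplier read as a two's complement number, which is why the same loop handles negative multipliers.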

Booth’s Encoding
- Really just a new way to encode numbers
  - Normally positionally weighted as $2^n$
  - With Booth, each position has a sign bit
  - Can be extended to multiple bits
- Example, the value 6:
  - Binary: 0 1 1 0
  - 1-bit Booth: +1 0 -1 0
  - 2-bit Booth: +2 -2

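Booth’s Example
- Negative multiplicand: -6 x 6 = -36
- 1010 x 0110; 0110 in Booth’s encoding is +0-0
- Hence, from the LSB digit upward (multiplicand sign-extended to 8 bits):
  - 1111 1010 x 0 = 0000 0000
  - 1111 1010 x -1, shifted left by 1 = 0000 1100
  - 1111 1010 x 0 = 0000 0000
  - 1111 1010 x +1, shifted left by 3 = 1101 0000
- Final sum: 1101 1100 (-36)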

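Booth’s Example
- Negative multiplier: -6 x -2 = 12
- 1010 x 1110; 1110 in Booth’s encoding is 00-0
- Hence, from the LSB digit upward (multiplicand sign-extended to 8 bits):
  - 1111 1010 x 0 = 0000 0000
  - 1111 1010 x -1, shifted left by 1 = 0000 1100
  - The two upper digits are 0 and contribute nothing
- Final sum: 0000 1100 (12)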
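Both examples can be checked with a short software sketch of 1-bit Booth multiplication. The name `booth_multiply` is an assumption; the multiplicand is passed as a signed Python integer and the multiplier as its 4-bit two's complement pattern.

```python
def booth_multiply(multiplicand: int, multiplier_bits: int, width: int) -> int:
    """1-bit Booth multiply; multiplier_bits is a width-bit two's complement pattern."""
    product = 0
    previous = 0  # implied bit to the right of the LSB
    for i in range(width):
        current = (multiplier_bits >> i) & 1
        if current == 1 and previous == 0:    # run of 1s begins: subtract
            product -= multiplicand << i
        elif current == 0 and previous == 1:  # run of 1s ended: add
            product += multiplicand << i
        previous = current
    return product


print(booth_multiply(-6, 0b0110, 4))  # -6 x 6
print(booth_multiply(-6, 0b1110, 4))  # -6 x -2
```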

Modified Booth
- Booth 2 modified to produce at most n/2+1 partial products.
- Algorithm (for unsigned numbers):
  1. Pad the LSB with one zero.
  2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
  3. Divide the multiplier into overlapping groups of 3 bits.
  4. Determine the partial product scale factor from the modified Booth 2 encoding table.
  5. Compute the multiplicand multiples.
  6. Sum the partial products.

Modified Booth Multiplier: Idea (cont.)
- Can encode the digits by looking at three bits at a time
- Must be able to add multiplicand times -2, -1, 0, 1 and 2
- Since Booth recoding got rid of 3’s, generating partial products is not that hard (shifting and negating)
- Booth recoding table:

| i+1 | i | i-1 | add |
|---|---|---|---|
| 0 | 0 | 0 | 0*M |
| 0 | 0 | 1 | 1*M |
| 0 | 1 | 0 | 1*M |
| 0 | 1 | 1 | 2*M |
| 1 | 0 | 0 | -2*M |
| 1 | 0 | 1 | -1*M |
| 1 | 1 | 0 | -1*M |
| 1 | 1 | 1 | 0*M |

[©Hauck] Spring 2006 EE VLSI Design II - © Kia Bazargan

Modified Booth Example: (n = 8 bits, unsigned)
- Pad LSB with 1 zero: Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0
- n is even, so pad the MSB with two zeros: 0 0 Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0
- Form 3-bit overlapping groups; for n = 8 we have 5 groups: (Y1 Y0 0), (Y3 Y2 Y1), (Y5 Y4 Y3), (Y7 Y6 Y5), (0 0 Y7)
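The padding and grouping steps can be sketched in Python for an unsigned n-bit multiplier. The names `BOOTH2_TABLE` and `modified_booth_digits` are hypothetical; Python integers already behave as if padded with zeros above the MSB, so only the LSB pad is explicit.

```python
# Modified Booth (radix-4) digit for each 3-bit group, indexed by (i+1, i, i-1).
BOOTH2_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def modified_booth_digits(multiplier: int, n: int) -> list[int]:
    """Return the n//2 + 1 radix-4 digits of an unsigned n-bit multiplier, LSB group first."""
    padded = multiplier << 1  # pad the LSB with one zero
    digits = []
    for group in range(n // 2 + 1):
        triplet = (padded >> (2 * group)) & 0b111  # groups overlap by one bit
        digits.append(BOOTH2_TABLE[triplet])
    return digits


digits = modified_booth_digits(0b11111111, n=8)
print(digits, sum(d * 4**g for g, d in enumerate(digits)))
```

Group g carries weight 4^g, so the weighted sum of the digits reproduces the original multiplier.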

2-bits/cycle Modified Booth Multiplier
- For every pair of multiplier bits
  - If Booth’s encoding is ‘-2’: shift multiplicand left by 1, then subtract
  - If Booth’s encoding is ‘-1’: subtract
  - If Booth’s encoding is ‘0’: do nothing
  - If Booth’s encoding is ‘1’: add
  - If Booth’s encoding is ‘2’: shift multiplicand left by 1, then add
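A software sketch of the 2-bits-per-cycle loop for an unsigned multiplier follows. `BOOTH2_DIGIT` and `modified_booth_multiply` are hypothetical names, and the digit lookup assumes the standard radix-4 recoding of each overlapping 3-bit group.

```python
# Radix-4 Booth digit indexed by the 3-bit group (two current bits, one previous bit).
BOOTH2_DIGIT = [0, 1, 1, 2, -2, -1, -1, 0]


def modified_booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply by an unsigned n-bit multiplier, retiring two multiplier bits per step."""
    product = 0
    padded = multiplier << 1  # implied zero to the right of the LSB
    for step in range(n // 2 + 1):
        digit = BOOTH2_DIGIT[(padded >> (2 * step)) & 0b111]
        shifted = multiplicand << (2 * step)  # weight of this bit pair
        if digit == 2:
            product += shifted << 1  # shift left by 1, then add
        elif digit == 1:
            product += shifted
        elif digit == -1:
            product -= shifted
        elif digit == -2:
            product -= shifted << 1  # shift left by 1, then subtract
    return product


print(modified_booth_multiply(13, 0b1011, n=4))
```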

2 bits/cycle Modified Booth’s
1-bit Booth: 00 => +0; 01 => +M; 10 => -M; 11 => +0

| Current | Previous | Operation | Explanation |
|---|---|---|---|
| 00 | 0 | +0; shift 2 | [00] => +0, [00] => +0; 2x(+0)+(+0)=+0 |
| 00 | 1 | +M; shift 2 | [00] => +0, [01] => +M; 2x(+0)+(+M)=+M |
| 01 | 0 | +M; shift 2 | [01] => +M, [10] => -M; 2x(+M)+(-M)=+M |
| 01 | 1 | +2M; shift 2 | [01] => +M, [11] => +0; 2x(+M)+(+0)=+2M |
| 10 | 0 | -2M; shift 2 | [10] => -M, [00] => +0; 2x(-M)+(+0)=-2M |
| 10 | 1 | -M; shift 2 | [10] => -M, [01] => +M; 2x(-M)+(+M)=-M |
| 11 | 0 | -M; shift 2 | [11] => +0, [10] => -M; 2x(+0)+(-M)=-M |
| 11 | 1 | +0; shift 2 | [11] => +0, [11] => +0; 2x(+0)+(+0)=+0 |
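The 2-bit table is two 1-bit Booth steps combined, and a short sketch can derive it mechanically. The names `ONE_BIT_BOOTH` and `two_bit_digit` are assumptions for illustration.

```python
# 1-bit Booth digit keyed by (current bit, bit to its right).
ONE_BIT_BOOTH = {(0, 0): 0, (0, 1): +1, (1, 0): -1, (1, 1): 0}


def two_bit_digit(high: int, low: int, previous: int) -> int:
    """Combine two 1-bit Booth digits: 2 x digit(high, low) + digit(low, previous)."""
    return 2 * ONE_BIT_BOOTH[(high, low)] + ONE_BIT_BOOTH[(low, previous)]


table = {
    (high, low, previous): two_bit_digit(high, low, previous)
    for high in (0, 1) for low in (0, 1) for previous in (0, 1)
}
for (high, low, previous), digit in table.items():
    print(f"{high}{low} {previous} -> {digit:+d}M")
```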

Wallace Tree: Idea
- Idea: divide & conquer
- Why add the k numbers one by one? Tree structure → logarithmic delay
- For now, let’s assume we are going to add 7 6-bit numbers, which are NOT partial products and hence not shifted
- What’s the fastest way to add them?

Wallace Tree Example
- Circles represent digits
- Boxes show FAs
- Diagonal lines correspond to (Sum, Carry) pairs generated by the FA cells in the previous stage (same color)
- Dotted box is some carry propagate adder (e.g., CLA)
- Delay = 4 CSA + 1 CLA
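The reduction can be sketched in software: group operands in threes, replace each group with its (sum, carry) pair, and repeat until two operands remain. The names `carry_save_add` and `wallace_sum` are hypothetical, and the grouping order is one simple choice rather than the exact wiring of the slide's figure.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3-to-2 reduction: bitwise sum and left-shifted majority carry."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(operands: list[int]) -> tuple[int, int]:
    """Add non-negative operands with CSA levels; return (total, number of CSA levels)."""
    level = list(operands)
    csa_levels = 0
    while len(level) > 2:
        full_groups = len(level) // 3
        next_level = []
        for g in range(full_groups):
            next_level.extend(carry_save_add(*level[3 * g:3 * g + 3]))
        next_level.extend(level[3 * full_groups:])  # leftovers pass through unchanged
        level = next_level
        csa_levels += 1
    return sum(level), csa_levels  # final carry-propagate add


numbers = [13, 7, 63, 1, 42, 0, 29]  # seven 6-bit numbers
total, levels = wallace_sum(numbers)
print(total, levels)
```

For seven operands the loop runs four times (7 → 5 → 4 → 3 → 2), matching the 4 CSA + 1 CLA delay quoted above.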

Wallace Tree: Structure for 7 k-bit Numbers
[Figure: five k-bit CSAs in four levels followed by one k-bit CPA; bit-range labels such as [1,k] and [0,k-1] mark the carry and sum words passed between levels, and the final result spans bits [0] through [k+2]]

Wallace Tree: Timing
- At each step, # of operands reduces to 2/3
  - n k-bit numbers enter the first level of CSAs
  - $(2/3)\,n$ numbers remain after level 1
  - $(2/3)^2\,n$ remain after level 2
  - ...
  - After h levels, $(2/3)^h\,n = 2$

Wallace Tree: Timing (cont.)
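- Delay depends on height h
- $h = O(\log n)$ → logarithmic delay
- Max # N of k-bit numbers that can be added using a Wallace tree of height h:

| h | N | h | N | h | N |
|---|---|---|---|---|---|
| 0 | 2 | 7 | 28 | 14 | 474 |
| 1 | 3 | 8 | 42 | 15 | 711 |
| 2 | 4 | 9 | 63 | 16 | 1066 |
| 3 | 6 | 10 | 94 | 17 | 1599 |
| 4 | 9 | 11 | 141 | 18 | 2398 |
| 5 | 13 | 12 | 211 | 19 | 3597 |
| 6 | 19 | 13 | 316 | 20 | 5395 |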
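The table follows from a simple recurrence: a tree of height 0 handles 2 operands, and each extra CSA level multiplies the capacity by 3/2, rounded down. A short sketch, with the hypothetical name `max_operands`:

```python
def max_operands(height: int) -> int:
    """Largest number of operands a Wallace tree of the given height can reduce to two."""
    n = 2  # height 0: the two operands go straight to the carry-propagate adder
    for _ in range(height):
        n = (3 * n) // 2  # every 2 outputs of a level can come from 3 inputs
    return n


print([max_operands(h) for h in range(21)])
```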

Floating point

Floating Point
- Want to represent larger range of numbers
  - Fixed point (integer): $-2^{n-1} \ldots (2^{n-1}-1)$
- How?
  - Sacrifice precision for range by providing exponent to shift relative weight of each bit position
  - Similar to scientific notation: a significand times a power of ten such as $10^{23}$
  - Cannot specify every discrete value in the range, but can span much larger range

Floating Point
- Still use a fixed number of bits
  - Sign bit S, exponent E, significand F
  - Value: $(-1)^S \times F \times 2^E$
- IEEE 754 standard (fields S, E, F)

| Format | Size | Exponent | Significand | Range |
|---|---|---|---|---|
| Single precision | 32b | 8b | 23b | $2 \times 10^{\pm 38}$ |
| Double precision | 64b | 11b | 52b | $2 \times 10^{\pm 308}$ |
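The single-precision field layout can be inspected with Python's standard `struct` module. This is a minimal sketch; `single_precision_fields` is a hypothetical helper, and the exponent it returns is the raw stored 8-bit field.

```python
import struct


def single_precision_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to single precision, into (S, E, F) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, as stored
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction


print(single_precision_fields(1.0))
print(single_precision_fields(-2.5))
```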

Floating Point Exponent
- Exponent specified in biased or excess notation
- Why? To simplify sorting
  - Sign bit is MSB to ease sorting
  - With a 2’s complement exponent:
    - Large numbers have positive exponent
    - Small numbers have negative exponent
    - Sorting does not follow naturally

Excess or Biased Exponent
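| Exponent | 2’s Compl | Excess-127 |
|---|---|---|
| -127 | 1000 0001 | 0000 0000 |
| -126 | 1000 0010 | 0000 0001 |
| … | … | … |
| +127 | 0111 1111 | 1111 1110 |

- Value: $(-1)^S \times F \times 2^{(E - bias)}$
  - SP: bias is 127
  - DP: bias is 1023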

Floating Point Normalization
- S,E,F representation allows more than one representation for a particular value, e.g. $1.0 \times 10^5 = 0.1 \times 10^6 = 10.0 \times 10^4$
  - This makes comparison operations difficult
  - Prefer to have a single representation
- Hence, normalize by convention:
  - Only one nonzero digit to the left of the floating point
  - In binary, that digit must be a 1
  - Since leading ‘1’ is implicit, no need to store it
  - Hence, obtain one extra bit of precision for free
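The implicit leading 1 and the bias can be seen by rebuilding a value from its stored fields. This sketch assumes single precision (bias 127, 23 fraction bits); the helper name `decode_normalized_single` is hypothetical, and only normalized numbers are handled.

```python
import struct


def decode_normalized_single(x: float) -> float:
    """Rebuild a normalized single-precision value from its S, E, F fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if not 1 <= exponent <= 254:
        raise ValueError("not a normalized number")
    significand = 1 + fraction / 2**23  # the leading 1 is implicit, not stored
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


print(decode_normalized_single(0.15625))  # stored as 1.25 x 2^-3
print(decode_normalized_single(-6.0))     # stored as -1.5 x 2^2
```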

FP Overflow/Underflow
- FP Overflow
  - Analogous to integer overflow
  - Result is too big to represent
  - Means exponent is too big
- FP Underflow
  - Result is too small to represent
  - Means exponent is too small (too negative)
- Both can raise an exception under IEEE754

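IEEE754 Special Cases

| Single Precision Exponent | Single Precision Significand | Double Precision Exponent | Double Precision Significand | Value |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | nonzero | 0 | nonzero | denormalized |
| 1-254 | anything | 1-2046 | anything | fp number |
| 255 | 0 | 2047 | 0 | infinity |
| 255 | nonzero | 2047 | nonzero | NaN (Not a Number) |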

FP Rounding
- Rounding is important
  - Small errors accumulate over billions of ops
- FP rounding hardware helps
  - Compute extra guard bit beyond 23/52 bits
  - Further, compute additional round bit beyond that
    - Multiply may result in leading 0 bit; normalize shifts guard bit into product, leaving round bit for rounding
  - Finally, keep sticky bit that is set whenever ‘1’ bits are “lost” to the right
    - Differentiates between exactly 0.5 and anything slightly above 0.5
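How the three extra bits drive a rounding decision can be sketched on integers. The slides do not name a rounding mode, so round-to-nearest with ties to even is an assumption here, as is the function name `round_nearest_even`.

```python
def round_nearest_even(value: int, extra_bits: int) -> int:
    """Drop extra_bits low bits of value, rounding to nearest with ties to even."""
    if extra_bits < 2:
        raise ValueError("this sketch expects at least a guard bit and a round bit")
    kept = value >> extra_bits
    guard = (value >> (extra_bits - 1)) & 1      # first bit beyond the kept ones
    round_bit = (value >> (extra_bits - 2)) & 1  # next bit beyond the guard
    sticky = int((value & ((1 << (extra_bits - 2)) - 1)) != 0)  # OR of everything lower
    above_half = guard and (round_bit or sticky)
    exactly_half = guard and not (round_bit or sticky)
    if above_half or (exactly_half and (kept & 1)):
        kept += 1
    return kept


print(round_nearest_even(0b1011_100, 3))  # tie, kept value is odd: rounds up
print(round_nearest_even(0b1010_100, 3))  # tie, kept value is even: stays
print(round_nearest_even(0b1010_101, 3))  # sticky set: above one half, rounds up
```

Without the sticky bit, the last two cases would be indistinguishable.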

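FP Addition
- Just like grade school
  - First, align decimal points
  - Then, add significands
  - Finally, normalize result
- Example

| | Operand | Aligned |
|---|---|---|
| | $9.997 \times 10^{2}$ | $9.997 \times 10^{2}$ |
| | $4.631 \times 10^{-1}$ | $0.004631 \times 10^{2}$ |
| Sum | | $10.001631 \times 10^{2}$ |
| Normalized | | $1.0001631 \times 10^{3}$ |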
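The align, add, normalize sequence can be sketched in decimal with exact fractions. The function name `fp_add_decimal` is hypothetical, and the sketch keeps every digit, whereas real hardware would round the result to a fixed significand width.

```python
from fractions import Fraction


def fp_add_decimal(sig_a: Fraction, exp_a: int, sig_b: Fraction, exp_b: int):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b; return a normalized (significand, exponent)."""
    # Align: rewrite the operand with the smaller exponent at the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b / Fraction(10) ** (exp_a - exp_b)
    total, exponent = sig_a + sig_b, exp_a  # add significands
    # Normalize: exactly one nonzero digit left of the point.
    while abs(total) >= 10:
        total, exponent = total / 10, exponent + 1
    while total != 0 and abs(total) < 1:
        total, exponent = total * 10, exponent - 1
    return total, exponent


significand, exponent = fp_add_decimal(Fraction("9.997"), 2, Fraction("4.631"), -1)
print(float(significand), exponent)
```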

FP Multiplication
- Sign: $P_s = A_s \oplus B_s$
- Exponent: $P_E = A_E + B_E$
  - Due to bias/excess, must subtract bias (double precision shown, bias 1023)
  - $e = e_1 + e_2$
  - $E = e + 1023 = e_1 + e_2 + 1023$
  - $E = (E_1 - 1023) + (E_2 - 1023) + 1023$
  - $E = E_1 + E_2 - 1023$
- Significand: $P_F = A_F \times B_F$
  - Standard integer multiply (23b or 52b + g/r/s bits)
  - Use Wallace tree of CSAs to sum partial products

FP Multiplication
- Compute sign, exponent, significand
- Normalize
  - Shift left, right by 1
- Check for overflow, underflow
- Round
- Normalize again (if necessary)
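The whole multiplication sequence can be sketched for single precision using integer operations on the bit fields. Several things are assumptions here: all function names are hypothetical, only normalized operands and results are handled, rounding is to nearest with ties to even, and the exact remainder stands in for guard/round/sticky bits. With normalized inputs the product significand lies in [1, 4), so only a right shift by 1 is ever needed.

```python
import struct

BIAS = 127


def float_to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_multiply(a_bits: int, b_bits: int) -> int:
    """Multiply two normalized single-precision bit patterns."""
    sign = (a_bits >> 31) ^ (b_bits >> 31)  # Ps = As xor Bs
    exp_a, exp_b = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    if not (1 <= exp_a <= 254 and 1 <= exp_b <= 254):
        raise ValueError("sketch handles normalized operands only")
    exponent = exp_a + exp_b - BIAS           # add exponents, subtract one bias
    sig_a = (a_bits & 0x7FFFFF) | (1 << 23)   # restore the implicit leading 1
    sig_b = (b_bits & 0x7FFFFF) | (1 << 23)
    product = sig_a * sig_b                   # 47 or 48 bits
    shift = 23
    if product >> 47:                         # product >= 2.0: normalize right by 1
        shift, exponent = 24, exponent + 1
    kept = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                             # round
        if kept >> 24:                        # rounding carried out: normalize again
            kept, exponent = kept >> 1, exponent + 1
    if not 1 <= exponent <= 254:
        raise OverflowError("exponent overflow or underflow")
    return (sign << 31) | (exponent << 23) | (kept & 0x7FFFFF)


result = fp_multiply(float_to_bits(1.5), float_to_bits(2.5))
print(bits_to_float(result))
```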