CSE575 Multiplication.1 © MJIrwin, PSU, 2005 Computer Arithmetic CSE 575 Computer Arithmetic Spring 2005 Mary Jane Irwin (www.cse.psu.edu/~mji)


1 CSE575 Multiplication.1 © MJIrwin, PSU, 2005 Computer Arithmetic CSE 575 Computer Arithmetic Spring 2005 Mary Jane Irwin (www.cse.psu.edu/~mji)

2 Remaining Lecture Schedule
  Date     Topic                          Lecturer     Reading
  Mar 15   Introduction, number repr      Dr. Irwin    Chp 1
  Mar 17   Local project design review    Theo T.
  Mar 22   Global project review          Dr. Vijay
  Mar 24   Global project review          Dr. Vijay
  Mar 29   Addition                       Dr. Irwin    Chp 2
  Apr 1    Redundant repr & its uses      Dr. Irwin
  Apr 5    Multiplication                 Dr. Irwin    Chp 4
  Apr 7    Local/Global project review    Dr. Vijay
  Apr 12   Division                       Dr. Irwin    Chp 5
  Apr 14   Flt point repr & operation     Dr. Irwin    Chp 8
  Apr 19   Function evaluation            Dr. Irwin    Chp 10, 11
  Apr 21   Final global project review    Dr. Vijay
  Apr 26   Other # systems                Dr. Irwin
  Apr 28   Final global project review    Dr. Vijay

3 Review: Binary Adders
[Figure: taxonomy of synchronous word parallel adders]
• Ripple carry adders (RCA)
• Carry prop min adders: signed-digit adders, residue adders
• Fast carry prop adders (CPAs): Manchester carry chain, carry select, carry lookahead, prefix, cond. sum, carry skip
• Complexity labels shown in the figure: T = O(n), A = O(n); T = O(1), A = O(n); T = O(log n), A = O(n log n); T = O(√n), A = O(n); T = O(n), A = O(n)

4 Multioperand Addition
• Addition of more than two numbers
  » vector inner products
  » computing averages
• The sum of k n-bit operands needs $\log_2(k 2^n - k + 1) \le n + \log_2 k$ bits
[Figure: operands X0 ... X6, each n bits wide, stacked k high above the Sum]

5 Serial Implementation
[Figure: operands X_j (n bits each) feed a CPA that accumulates into a partial sum register of n + log k bits]
• $T_{\text{serial-multiadd}} = O(k \log(n + \log k)) = O(k \log n + k \log\log k)$
• Addition time grows superlinearly with k when n is fixed and logarithmically with n for a fixed k.
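The register width on this slide can be checked numerically. The sketch below is only an illustration, assuming unsigned operands; the function names `sum_width`, `width_bound` and `serial_multiadd` are hypothetical and do not come from the lecture.

```python
def sum_width(n: int, k: int) -> int:
    """Bits needed to hold the sum of k unsigned n-bit operands."""
    max_sum = k * (2**n - 1)            # every operand at its maximum value
    return max_sum.bit_length()         # ceil(log2(max_sum + 1))


def width_bound(n: int, k: int) -> int:
    """The slide's register width n + log k, with log k rounded up."""
    return n + (k - 1).bit_length()     # (k-1).bit_length() == ceil(log2 k)


def serial_multiadd(operands, n: int) -> int:
    """Accumulate the operands one per cycle in an (n + log k)-bit register."""
    mask = (1 << width_bound(n, len(operands))) - 1
    partial_sum = 0
    for x in operands:                  # one CPA pass per operand
        partial_sum = (partial_sum + x) & mask
    return partial_sum


print(sum_width(8, 7), width_bound(8, 7))
print(serial_multiadd([255] * 7, 8))
```

Because the register is masked to n + log k bits, the final assertion-style check (sum equals the true total) only holds if that width really is sufficient.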

6 Multiply
• Binary multiplication as repeated additions
[Figure: n-bit multiplicand D times n-bit multiplier Q; partial product array n rows high; double precision product P of 2n bits]

7 A Serial Implementation
[Figure: multiplicand register D (n bits); multiplier register Q supplying a 1-bit add/no add control; 2n-b CPA; partial product register P (2n bits)]
• $T_{\text{serial-multiply}} = O(n \log(2n))$
• Multiplication time grows superlinearly with n (when using a log time adder)

8 Shift & Add Multiplication
• Left shift and add
  » Partial products accumulated from bottom to top
  » Requires a 2n-bit adder
• Right shift and add
  » Partial products accumulated from top to bottom
  » Only requires an n-bit adder
  » Sign extend ‘icand on right shift; premultiply ‘icand by $2^n$ to offset effect of right shifts (integer operands only)

9 Right Shift & Add Multiplier
[Figure: multiplicand register D; multiplier register Q supplying the add/no add control; n-b CPA (n bits) with add/subt control; (partial) product register P, initialized to 0, shifted right together with Q]
• $T_{\text{serial-multiply}} = O(n \log n)$ with a log time CPA, or $O(n^2)$ with a ripple carry adder
• Multiplication time grows superlinearly with n.
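A minimal software sketch of the right shift and add datapath, assuming unsigned n-bit operands (no add/subt control or sign extension). The name `right_shift_add_multiply` is hypothetical; the P and Q registers are modeled as Python integers.

```python
def right_shift_add_multiply(d: int, q: int, n: int) -> int:
    """Multiply unsigned n-bit d by unsigned n-bit q with an n-bit adder."""
    p = 0                                   # (partial) product register, upper half
    for _ in range(n):
        if q & 1:                           # add/no add control from the 'ier lsb
            p += d                          # n-bit add; the carry out is bit n of p
        # Shift P || Q right one bit: the lsb of P moves into the msb of Q.
        q = (q >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | q                     # double precision product in P || Q


print(right_shift_add_multiply(13, 11, 4))
```

After n iterations the multiplier bits have all been shifted out of Q and replaced by the low half of the product.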

10 Signed Multiplication
• So far we have, for $Q = (q_0 . q_1 q_2 q_3 \ldots q_{n-1})$ with $q_0$ the sign bit,
  $P_0 = 0$
  $P_1 = \tfrac{1}{2}(P_0 + q_{n-1} D)$
  $P_2 = \tfrac{1}{2}(P_1 + q_{n-2} D)$
  ...
  $P_{i+1} = \tfrac{1}{2}(P_i + q_{n-i-1} D) = \left(\sum_{j=1}^{i+1} q_{n-j}\, 2^{-(i+2-j)}\right) D$
• So $P_{n-1} = \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D = Q * D$ when $q_0 = 0$

12 Negative (2s’C) Multiplicand
• As long as we sign extend the ‘icand our scheme works fine
• But what if both ‘icand and ‘ier are negative?

            1 0 0 1 1    D = -13
          * 0 1 0 1 1    Q = +11
    -----------------
    1 1 1 1 1 0 0 1 1    (sign extend)
    1 1 1 1 0 0 1 1
    0 ... 0
    1 1 0 0 1 1
    -----------------
    1 0 1 1 1 0 0 0 1    P = -143

13 Negative (2s’C) Multiplier
• Recall for 2s’c
  $D = -d_0 2^0 + \sum_{j=1}^{n-1} d_j 2^{-j}$ and $Q = -q_0 2^0 + \sum_{j=1}^{n-1} q_j 2^{-j}$
  and what we have computed so far is
  $P_{n-1} = \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D$
  what we want is
  $P = Q * D = -q_0 2^0 D + \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D$
• So the correction factor for 2s’c is $P = P_{n-1} - q_0 D$

15 Negative (2s’C) Multiplier Example

            1 0 0 1 1    D = -13
          * 1 1 0 1 0    Q = -6
    -----------------
    0 ... 0
      1 1 1 0 0 1 1
    0 ... 0
      1 0 0 1 1
    -----------------
    1 0 1 1 1 1 1 1 0    P_{n-1}
  - 1 0 0 1 1            (correction: subtract q_0 D)
    -----------------
    0 0 1 0 0 1 1 1 0    P = +78
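The correction step P = P_{n-1} - q_0 D can be sketched as follows. This is an illustration only: it assumes integer (not fractional) operands given as n-bit two's complement bit patterns, and it relies on Python's unbounded integers in place of explicit sign extension. The names `to_signed` and `twos_complement_multiply` are hypothetical.

```python
def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement integer."""
    sign = (pattern >> (n - 1)) & 1
    return pattern - (1 << n) if sign else pattern


def twos_complement_multiply(d_pattern: int, q_pattern: int, n: int) -> int:
    """Shift-and-add over the 'ier magnitude bits, then correct for its sign bit."""
    d = to_signed(d_pattern, n)             # a negative 'icand is handled by sign extension
    p = 0
    for i in range(n - 1):                  # all 'ier bits except the sign bit q_0
        if (q_pattern >> i) & 1:
            p += d << i
    if (q_pattern >> (n - 1)) & 1:          # q_0 = 1: subtract the 'icand at sign-bit weight
        p -= d << (n - 1)
    return p


print(twos_complement_multiply(0b10011, 0b11010, 5))   # D = -13, Q = -6
print(twos_complement_multiply(0b10011, 0b01011, 5))   # D = -13, Q = +11
```

The two calls correspond to the slide examples with a negative and a positive multiplier.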

16 Other Negative Multipliers
• 1s’C: $P = P_{n-1} - q_0 D + 2^{-(n-1)} q_0 D$
  » adder must do 1s’C addition (end-around carry, EAC)
  » sign extend ‘icand
  » initialize $P_0$ as $q_0 D$ (rather than clearing the register)
  » do an optional subtraction as a last step
• SM: $|P| = P_{n-1}$ and $p_{\text{sign}} = q_0 \oplus d_0$
  » strip off the sign bits and do unsigned multiplication (so no corrections and no sign extensions)
  » sign of the product is the xor of ‘ier and ‘icand sign bits

17 Lower Bound on Multiplication
• Winograd’s lower bound on multiplication of two n-digit d-valued numbers is $t \ge \lceil \log 2n \rceil$
• Mult can be done as the addition of the log representation of two numbers
  $a * b = c \iff \log a + \log b = \log c$
  but the data representation is nonstandard

18 Faster Serial Multiplication
• Use a fast (log n time) CPA
• Bypass addition cycle when ‘ier bit is 0
  » Zero detect and barrel shift
    - Detect strings of zeros in the ‘ier and shift 1, 2, 3, …, n-1 places right in one cycle
• Use higher radix multiplication
  » Multiplier recoding to simplify multiple formation
  » CSAs to form multiples

19 Carry Save Adder (CSA)
• A carry save adder is nothing more than a full adder with the carries saved rather than propagated!
• Also called a (3,2) counter
[Figure: a single full adder (FA)]

20 Carry Save Word Adder
• A 6-bit CSA reduces three 6-bit inputs to one 6-bit output and one 7-bit output
[Figure: six FAs side by side forming a 6-b CSA]
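A word-wide CSA is easy to model with bitwise operations. The sketch below assumes unsigned operands held in Python integers; `carry_save_add` is a hypothetical name, not something from the lecture.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to a sum word and a carry word, with no carry propagation."""
    sum_word = x ^ y ^ z                               # per-bit full adder sum outputs
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carries, saved one position up
    return sum_word, carry_word


s, c = carry_save_add(0b101101, 0b011011, 0b110110)
print(bin(s), bin(c), s + c)
```

For three 6-bit inputs the sum word fits in 6 bits and the carry word in 7 bits, matching the slide; a CPA is still needed to add the two outputs together.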

21 Radix 4 Multiply
• Radix 4 multiply involves half as many additions, so runs twice as fast, where
  $P_{i+1} = \tfrac{1}{4}(P_i + (q_{n-i-1} \| q_{n-i-2}) D)$ with $P_0 = 0$
  and the final partial product is $\left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D = Q * D$
[Figure: n-bit multiplicand D, n-bit multiplier Q, partial product array n/2 rows high, 2n-bit double precision product P]

22 Forming the Multiples
• Need the multiples 0*D, 1*D, 2*D, 3*D
• All are easy except 3*D
  » compute it via an addition (3D = 2D + 1D) every cycle
    - too slow!
  » precompute it and store it in a register
  » use a CSA to form the multiples
  » replace 3D with a 4D (a carry into the next higher multiplier digit) and a -D, i.e., recode the multiplier, so you don’t need it
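The last option (replace 3D by 4D - D) can be sketched as a radix 4 loop in which a digit of 3 becomes -1 plus a carry into the next digit. This is an illustration assuming an unsigned n-bit multiplier; `radix4_multiply` is a hypothetical name.

```python
def radix4_multiply(d: int, q: int, n: int) -> int:
    """Multiply d by unsigned n-bit q, two 'ier bits per step, never forming 3D."""
    p = 0
    carry = 0
    pos = 0
    while pos < n:
        digit = ((q >> pos) & 0b11) + carry     # radix 4 digit plus incoming carry: 0..4
        if digit >= 3:                          # 3 -> -1 and 4 -> 0, each with a carry out
            digit -= 4
            carry = 1
        else:
            carry = 0
        p += (digit * d) << pos                 # only the multiples -D, 0, D, 2D are used
        pos += 2
    p += (carry * d) << pos                     # a final carry contributes one more D
    return p


print(radix4_multiply(13, 0b111111, 6))
```

The digit actually applied in each step is always in {-1, 0, 1, 2}, so the only multiples needed are shifts and a negation of D.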

23 Using a CSA to Form Multiples
[Figure: ‘icand register D and ‘ier register Q; an n+1-b CSA combines the selected multiples (0 or 1D, 0 or 2D) with the (partial) product P, initialized to 0; an n+2-b CPA with add/subt control produces the new n+2 bit partial product]
• Shift P || Q right 2 bits each iteration

24 Recoding the Multiplier
• Recall for radix 4, Q=[0,1,2,3] can be recoded into Q=[-2,-1,0,1,2]
• This recoding has to be accomplished so that the algebraic value ($Q = -q_0 + \sum q_j r^{-j}$ in RC) of the ‘ier is unchanged
  digits: ... $q_{j-1}$ $q_j$ $q_{j+1}$ ...
  $r^{-(j-1)} q_{j-1} + r^{-j} q_j = r^{-(j-1)} (q_{j-1} + 1) + r^{-j} (q_j - r)$
  (add a unit to digit j-1, subtract r from digit j)

25 Goals of Recoding
• Maximize the number of zero’s
  0111 1111 → 1000 000(-1)
  or
• Eliminate the possibility of a 1 1 or (-1)(-1) digit pairing
  0111 0111 → 100(-1) 100(-1) → 1000 (-1)00(-1)

26 Recoding
• With mode digit, $m_j$, and recoded digit, $q_j'$
  $r^{-(j-1)} q_{j-1}' + r^{-j} q_j' + r^{-(j+1)} q_{j+1}' = r^{-(j-1)}(q_{j-1} + m_{j-1}) + r^{-j}(q_j - r m_{j-1} + m_j) + r^{-(j+1)}(q_{j+1} - r m_j)$
• So that $q_j' = q_j - r m_{j-1} + m_j$

27 Recoding, Con’t
• And
  $Q' = \sum q_j' r^{-j} = \sum r^{-j}(q_j - r m_{j-1} + m_j)$
  $\quad = \sum r^{-j} q_j + \sum r^{-j}(-r m_{j-1} + m_j)$
  $\quad = \sum r^{-j} q_j - r^0 m_0 + r^{-1} m_1 - r^{-1} m_1 + \ldots + r^{-(n-1)} m_{n-1}$
  $\quad = -m_0 + \sum r^{-j} q_j + r^{-(n-1)} m_{n-1}$
• So if $m_{n-1} = 0$ and $m_0 = q_0$ then the recoding works for RC notation and the choices for $m_j$ (j = 1, 2, …, n-2) are arbitrary!!

28 Recoding Table
• In binary $q_j' = q_j - 2 m_{j-1} + m_j$
• Given $m_j$ from the previous step, when $q_j$ is sensed pick $m_{j-1}$

                      q_j'
  m_j   q_j    m_{j-1}=0   m_{j-1}=1
   0     0         0          -2
   0     1         1          -1
   1     0         1          -1
   1     1         2           0

29 Recoding Families
• Canonical (Booth’s)
• Differentiating
• Nonrestoring
• Modified Booth’s
Annotations on the slide: uses $q_{j-1}$ and $q_j$; uses $q_{j-1}$, $q_j$ and $q_{j+1}$

30 Multiplier Recoding Schemes

                         Canonical        Differentiate     Nonrestore
  m_j  q_{j-1}  q_j    m_{j-1}  q_j'    m_{j-1}  q_j'    m_{j-1}  q_j'
   0     0       0        0      0         0      0         0      0
   0     1       0        0      0         0      0         0      0
   0     0       1        0      1         1     -1         1     -1
   0     1       1        1     -1         1     -1         0      1
   1     0       0        0      1         0      1         1     -1
   1     1       0        1     -1         0      1         0      1
   1     0       1        1      0         1      0         1      0
   1     1       1        1      0         1      0         1      0

31 Multiplier Recoding Schemes (annotated)
• Row annotations added to the recoding schemes table: middle of string of 0’s; middle of string of 1’s; isolated 1; start string of 1’s; start string of 0’s; isolated 0

33 Canonical (Booth’s) Recoding
  Q  = 0 1 0 0 1 0 1 1
  mode digits: m_8 = 0, m_7 = 1, m_6 = 1, m_5 = 1, m_4 = 1, m_3 = 0, m_2 = 0, m_1 = 0, m_0 = 0
  Q’ = 0 1 0 1 0 -1 0 -1
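The example can be reproduced with a short left-directed (lsd first) loop. This sketch applies the binary recoding rule q_j' = q_j - 2 m_{j-1} + m_j with the mode digit chosen as the majority of the incoming mode digit, the current bit and the next more significant bit. It assumes an n-bit two's complement multiplier and indexes bits lsd first, unlike the slides; `canonical_recode` and `digits_value` are hypothetical names.

```python
def canonical_recode(q: int, n: int) -> list[int]:
    """Canonically recode an n-bit two's complement pattern; digits returned msd first."""
    def bit(i: int) -> int:
        return (q >> min(i, n - 1)) & 1         # positions above the msd repeat the sign bit

    digits = []
    m = 0                                       # mode digit entering the lsd is 0
    for i in range(n):                          # serial, lsd to msd
        b, b_next = bit(i), bit(i + 1)
        m_next = (m & b) | (m & b_next) | (b & b_next)
        digits.append(b + m - 2 * m_next)       # recoded digit in {-1, 0, 1}
        m = m_next
    return digits[::-1]


def digits_value(digits: list[int]) -> int:
    """Value of a signed-digit binary number given msd first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


recoded = canonical_recode(0b01001011, 8)
print(recoded, digits_value(recoded))
```

Because each mode digit depends on the previous one, the loop cannot be evaluated in parallel, which is the serial property noted on the facts slide.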

35 Canonical Recoding Facts
• Every two nonzero recoded digits are separated by at least one zero digit (in binary), so Q=[-2,-1,0,1,2] and no 3D to deal with
  » Proof:
    - From the table, $m_{i-1} = m_i q_i \mid m_i q_{i-1} \mid q_{i-1} q_i$ and $!m_{i-1} = !q_i\, !q_{i-1} \mid !m_i\, !q_{i-1} \mid !m_i\, !q_i$
    - and $q_i' = q_i\, !m_i \mid !q_i\, m_i$ and $!q_i' = q_i m_i \mid !q_i\, !m_i$ (here $q_i'$ stands for "recoded digit i is nonzero")
    - If no two successive digits are nonzero, it must be true that $q_{i-1}' \,\&\, q_i' = 0$
    - So $q_{i-1}' \,\&\, q_i' = (q_{i-1}\, !m_{i-1} \mid !q_{i-1}\, m_{i-1})(q_i\, !m_i \mid !q_i\, m_i)$
    - And substituting terms gives $q_{i-1}' \,\&\, q_i' = (q_{i-1}\, !q_i\, !m_i \mid !q_{i-1}\, q_i\, m_i)(q_i\, !m_i \mid !q_i\, m_i) = 0$
• Produces a multiplier with the most zeros
• It is a left-directed (serial) recoding

38 Differentiating Recoding
  Q  = 0 1 0 0 1 0 1 1
  mode digits: m_8 = 0, m_7 = 1, m_6 = 1, m_5 = 0, m_4 = 1, m_3 = 0, m_2 = 0, m_1 = 1, m_0 = 0
  Q’ = 1 -1 0 1 -1 1 0 -1

39 Differentiating Recoding Facts
• Because of the pairing of rows, the recoding is independent of $q_{j-1}$, and $m_{j-1} = q_j$ so $m_j = q_{j+1}$
• So the recoding can be based upon just $q_j$ and $q_{j+1}$ to recode $q_j$

  m_j (= q_{j+1})   q_j   m_{j-1}   q_j'
        0            0       0        0
        0            1       1       -1
        1            0       0        1
        1            1       1        0

40 More Differentiating Facts
• The recoding can be done lsd first (left directed) OR msd first (right directed) OR in parallel

  Q                    = 0  1  0  0  1  0  1  1
  M (Q shifted left)   = 1  0  0  1  0  1  1  0
  Q’                   = 1 -1  0  1 -1  1  0 -1

• Successive nonzero recoded digits are always of opposite sign, so Q=[-2,-1,0,1,2] and still no 3D to have to deal with
• Also gives an n/2 versus n height partial product array

  Q’ (binary digits)   = 1 -1  0  1 -1  1  0 -1
  Q’ (radix 4 digits)  =    1     1    -1    -1
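Because each recoded digit depends only on q_j and q_{j+1}, the whole recoding is a digitwise difference M - Q. A minimal sketch, assuming an n-bit two's complement multiplier and lsd-first bit indexing in the code; `differentiating_recode` and `digits_value` are hypothetical names.

```python
def differentiating_recode(q: int, n: int) -> list[int]:
    """Differentiating recoding of an n-bit pattern; digits returned msd first."""
    bits = [(q >> i) & 1 for i in range(n)]             # bits[0] is the lsd
    shifted = [0] + bits[:-1]                           # M: Q shifted left with a 0 appended
    digits = [m - b for m, b in zip(shifted, bits)]     # every position is independent
    return digits[::-1]


def digits_value(digits: list[int]) -> int:
    """Value of a signed-digit binary number given msd first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


recoded = differentiating_recode(0b01001011, 8)
print(recoded, digits_value(recoded))
```

No mode digit is carried between positions, so the list comprehension could be evaluated in any order, which is what makes this recoding usable in parallel multipliers.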

41 Modified Booth’s Recoding
• Modified Booth’s recoding has the same goal as differentiating, to have a recoding scheme that is parallel and that allows a radix 4 multiply without 3D
• Instead of a mode digit, it uses three adjacent bits of Q to do the recoding and recodes two bits at a time (instead of one)
• Successive nonzero recoded digits are always of opposite sign, so Q=[-2,-1,0,1,2]

43 Modified Booth’s Scheme

  q_{j-1}  q_j  q_{j+1}    q_{j-1}'  q_j'    annotation
     0      0      0           0      0      run of zeros
     0      0      1           0      1      end of string of ones
     0      1      0           0      1      isolated one
     0      1      1           1      0      end of string of ones
     1      0      0          -1      0      start of string of ones
     1      0      1          -1      1      isolated zero
     1      1      0           0     -1      start of string of ones
     1      1      1           0      0      run of ones
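Each row of the table corresponds to one radix 4 digit with value 2 q_{j-1}' + q_j'. The sketch below computes that digit directly from three adjacent bits and uses it in a radix 4 multiply. It assumes an even n, an n-bit two's complement multiplier pattern, a signed Python integer multiplicand, and lsd-first bit indexing in the code; the function names are hypothetical.

```python
def modified_booth_recode(q: int, n: int) -> list[int]:
    """Radix 4 modified Booth digits of an n-bit two's complement pattern, msd first."""
    assert n % 2 == 0, "this sketch assumes an even number of bits"

    def bit(i: int) -> int:
        return (q >> i) & 1 if i >= 0 else 0    # a 0 is appended below the lsd

    digits = []
    for i in range(0, n, 2):                    # each digit looks at bits i+1, i, i-1
        digits.append(bit(i - 1) + bit(i) - 2 * bit(i + 1))
    return digits[::-1]


def booth_multiply(d: int, q: int, n: int) -> int:
    """Multiply signed integer d by the n-bit two's complement pattern q."""
    p = 0
    for k, digit in enumerate(reversed(modified_booth_recode(q, n))):
        p += (digit * d) << (2 * k)             # multiples are only -2D, -D, 0, D, 2D
    return p


print(modified_booth_recode(0b01001011, 8))
print(booth_multiply(-13, 0b111010, 6))         # -13 times -6
```

The n/2 digits are independent of one another, so all multiple selections can be generated at once.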

44 Recoding Hardware Comparison
• How do differentiating and modified Booth’s compare wrt time, complexity, power?
[Figure: recoding logic applied to Q = 0 1 0 0 1 0 1 1 (with a 0 appended) for differentiating and for modified Booth’s]

45 Right Shift & Add Multiplier
[Figure: ‘icand register supplying D and !D; ‘ier register Q feeding a recode block that selects -2D, -1D, 0, 1D or 2D; n+2-b CPA with add/subt control; (partial) product register P (n+2 bits), initialized to 0]
• Shift P || Q right 2 bits each iteration
• $T_{\text{serial-multiply}} = O((n/2) \log n)$

46 Multiplier Recoding Schemes (nonrestoring annotations)
• Annotations added to the Nonrestore columns of the recoding schemes table: low order 0’s only; never?

48 Nonrestoring Recoding
  Q  = 0 1 0 0 1 0 1 1
  mode digits: m_8 = 0, m_7 = 0, m_6 = 1, m_5 = 0, m_4 = 1, m_3 = 1, m_2 = 0, m_1 = 1, m_0 = 0
  Q’ = 1 -1 1 -1 -1 1 -1 1

49 Nonrestoring Recoding Facts
• The msd does not conform to the rules; it is overridden by the termination condition that $m_0$ must agree with the sign of the ‘ier
• Gives a recoded digit set of Q’=[-3,-1,1,3] (lsd could also be Q’=[-2,0,2])
• It is a left-directed (serial) recoding
• It corresponds to the inverse of nonrestoring division, so could be useful in helping to determine the relationship between multiply and divide

51 Higher Radix Multiply
• Does recoding work for radix 8? radix 16? radix 32?
• Only choice is to form or pre-form multiples of D

  Digit set   “Hard” multiples
  -7 to 7     many (-3,3, -5,5, -6,6, -7,7)    max. redundant
  -6 to 6     many (-3,3, -5,5, -6,6)
  -5 to 5     some (-3,3, -5,5)
  -4 to 4     few (-3,3)                       min. redundant

52 Multiply Operation Review
[Figure: n-bit multiplicand (D) times n-bit multiplier (Q); partial product array (ppa), which can be formed in parallel; 2n-bit double precision product (P = Q*D)]

53 Parallel Multiplication
• In a parallel multiplier
  » Can use a multiplier recoding scheme to cut the height of the ppa in half (from n bits high to n/2 bits high)
    - must be able to form the ppa in parallel (so must use either modified Booth’s or differentiating recoding)
  » Reduce the height of the ppa to two rows in parallel with a tree of fast adders
  » Use a fast CPA to do the final add

54 Full Tree Multiplier Structure
[Figure: Q (‘ier) drives multiple forming circuits, each selecting between D and 0; their outputs enter a partial product reduction tree, followed by a fast CPA producing P (product)]
• Use multiplier recoding to reduce the height of the tree to n/2

55 Tree Reduction Techniques
• CSA ((3,2) counters) trees
  » Wallace - row reduction
    - combine partial product bits as early as possible
    - fastest possible design, shorter CPA
  » Dadda - column reduction
    - combine partial product bits as late as possible
    - cheaper CSA tree, wider CPA
• Other counter trees
• SDA trees
[Figure: a row of FAs forming an n-b CSA]

56 Full CSA (Wallace) Multiplier Tree
[Figure: CSA tree for a k-bit ‘ier (k = 7) and n-bit ‘icand, built from n-b, n+1-b and n+3-b CSAs feeding an n+3-b CPA; bracketed labels such as [n+5,6] and [n-1,0] mark the bit ranges carried on each connection]
• $_CSA-multiply = O((k-2)n) $_CSA + n $_CPA
• T_CSA-multiply = O(tree height + T_CPA) = O(log k + log n)
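The row reduction can be imitated in software by repeatedly passing groups of three rows through word-wide CSAs until two rows remain. This is a sketch under simplifying assumptions (unsigned operands, full-width rows, greedy grouping by threes); it does not model the per-level bit widths shown in the figure, and all names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Word-wide (3,2) counter: three rows in, sum row and carry row out."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def csa_tree_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Reduce the rows to at most two, returning the rows and the number of CSA levels."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3          # rows that form complete triples
        next_rows = []
        for i in range(0, grouped, 3):               # all CSAs of a level work in parallel
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[grouped:])             # leftover rows pass to the next level
        rows = next_rows
        levels += 1
    return rows, levels


def csa_tree_multiply(d: int, q: int, k: int) -> tuple[int, int]:
    """Multiply d by an unsigned k-bit q; the final addition stands in for the CPA."""
    partial_products = [(d << i) if (q >> i) & 1 else 0 for i in range(k)]
    rows, levels = csa_tree_reduce(partial_products)
    return sum(rows), levels


print(csa_tree_multiply(0b1011011, 0b1101101, 7))
```

For k = 7 rows the greedy grouping goes 7, 5, 4, 3, 2, i.e., four CSA levels.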

57 4x4 Tree Reduction
• Wallace tree: 4 FAs + 6 HAs + 5-bit CPA
• Dadda tree: 2 FAs + 4 HAs + 6-bit CPA
[Figure: column-height diagrams of the Wallace and Dadda reductions of the 4x4 partial product array]

58 6x6 Tree Reduction
• Wallace tree: 16 FAs + 13 HAs + 8-bit CPA
• Dadda tree: 15 FAs + 5 HAs + 10-bit CPA
[Figure: column-height diagrams of the Wallace and Dadda reductions of the 6x6 partial product array]

59 Maximum Inputs for CSA Trees
• The maximum number, n, of inputs that can be reduced to two outputs by an h-level CSA tree is $n(h) = \lfloor 3n(h-1)/2 \rfloor$
• Giving an upper bound of $n(h) \le 2(3/2)^h$ and a lower bound of $n(h) > 2(3/2)^{h-1}$

  Max inputs   # CSA levels
       6            3
       9            4
      13            5
      19            6
      28            7
      42            8
      63            9
      94           10
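The table follows directly from the recurrence. A small sketch, assuming n(0) = 2 (two rows need no CSA level); `max_csa_inputs` is a hypothetical name.

```python
def max_csa_inputs(levels: int) -> int:
    """Largest number of rows an h-level CSA tree can reduce to two."""
    n = 2                                   # n(0) = 2
    for _ in range(levels):
        n = (3 * n) // 2                    # n(h) = floor(3 n(h-1) / 2)
    return n


for h in range(3, 11):
    print(max_csa_inputs(h), h)
```

Each level turns every three rows into two, so the recurrence simply inverts that 3:2 ratio, rounding down.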

60 Log Reduction Trees: Traditional Wallace and Dadda Approaches
• Give an irregular structure, making design and layout quite difficult
• Connections and signal paths of varying lengths lead to signal skew and increased glitching, impacting both performance and power consumption
• Is there an approach better for VLSI layout we can use?

61 Reduction with Counters
• General parallel counters
  » add up the 1’s in a k-bit column, outputting a log k wide count
    - completely utilized: (3;2) (7;3) (15;4) (5,5;4) (2,2,2,3;5)
    - partially utilized: (10;4) (3,3,3,3;6)
• Specialized parallel counters
  » add up the 1’s in a k-bit column plus j internal carry-ins, outputting j internal carry-outs and a 2 wide count
    - (4;2) (7;2) (11;2)

62 (7,3) Counter
• Built out of (3,2) counters
[Figure: (3,2) counters wired into a (7,3) counter]

64 (4,2) Counter
• Also built out of two (3,2) counters, but with 1 internal carry-in and 1 internal carry-out
[Figure: two (3,2) counters wired into a (4,2) counter]

65 Tiling (4,2) Counters
• Tiles with neighboring (4,2) counters
• Reduces columns four high to columns only two high
  » Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
[Figure: adjacent (4,2) counters, each made of two (3,2) counters, with the internal carry passed between neighbors]
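A bit-level sketch of a (4,2) counter built from two full adders, and of a tiled row of them reducing four words to two. It assumes unsigned operands and adds one extra column to absorb the last internal carry; the names `full_adder`, `counter_4_2` and `reduce_4_to_2` are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter on single bits: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """(4,2) counter: four column bits plus an internal carry-in."""
    s1, cout = full_adder(x1, x2, x3)       # cout does not depend on cin: no ripple
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout                   # x1+x2+x3+x4+cin == s + 2*(carry + cout)


def reduce_4_to_2(a: int, b: int, c: int, d: int, width: int) -> tuple[int, int]:
    """Tile (4,2) counters across the columns, reducing four rows to two."""
    sum_word = carry_word = 0
    cin = 0
    for i in range(width + 1):              # extra top column takes the last internal carry
        bits = [(w >> i) & 1 for w in (a, b, c, d)]
        s, carry, cout = counter_4_2(*bits, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)      # carry output has weight 2^(i+1)
        cin = cout                          # internal carry feeds the neighboring counter
    return sum_word, carry_word


print(reduce_4_to_2(0b1011, 0b1101, 0b0111, 0b1110, 4))
```

The loop is written sequentially for clarity; in hardware the internal carry passes through only one neighboring counter, so all columns settle in constant time.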

68 4x4 PPA Reduction
• Fast 4x4 multiplication using (4,2) counters
• How would you lay it out?
[Figure: multiplicand and multiplier form the partial product array, which five (4,2) counters reduce to a two-row array feeding a 5-bit CPA that produces the 8-bit double precision product]

70 8x8 PPA Reduction
• How many (4,2) counters minimum are needed to reduce it to 2 rows?
  » Answer: 24, feeding a 12-bit fast CPA
[Figure: ‘icand and ‘ier, the 8x8 partial product array and the reduced partial product array]

71 8x8 PPA Reduction
• How many (4,2) counters are needed in the Wallace tree?
  » Answer: 31 (two rows of nine (4,2) counters each, then one row of thirteen (4,2) counters), feeding a 13-bit fast CPA
[Figure: ‘icand and ‘ier, the 8x8 partial product array and the reduced partial product array]

72 An 8x8 Multiplier Layout
• How should it be laid out?
[Figure: multiplicand and multiplier inputs; rows of nine (4,2) counters; a row of thirteen (4,2) counters; 13-bit CPA]

73 A Better 8x8 Multiplier Layout
• One that focuses on wires instead of gates
[Figure: multiplicand and multiple selection signals (‘ier) drive the multiple generators; rows of nine (4,2) counters; a row of thirteen (4,2) counters; CPA]

74 A 16x16 Multiplier Layout
[Figure: multiplicand and multiple selection signals (‘ier) drive the multiple generators; (4,2) counter slices; CPA]

75 Pipelining
• Divide computation into stages that take approximately the same time
• Separate stages with pipeline latches to isolate them
• Run clock at rate determined by slowest stage
  » Much faster clock
• Longer latency: time from input of particular inputs to output of corresponding result
• Big bandwidth win if doing lots of (independent) multiplies in a row: after pipeline fill, one result is generated every clock cycle

76 A Pipelined Version
[Figure: multiplicand and multiple selection signals (‘ier) feed (4,2) counter slices, followed by a CPA]
• Pipeline latches on counter slice output

77 (7,2) Counter
• Built out of (3,2) counters with 2 carry-ins and 2 carry-outs
[Figure: (3,2) counters with carries from the (i-1) and (i-2) slices and carries to the (i+1) and (i+2) slices]

79 Tiling (7,2) Counters
[Figure: adjacent (7,2) counters built from (3,2) counters]

80 (11,2) Counter
• For each outgoing carry there is a corresponding incoming carry that was generated after the same delay: balanced delay tree
• Delay of five (3,2) counter levels
[Figure: (3,2) counters labeled by level, 1 through 5]

81 Tiling (11,2) Counters
[Figure: three adjacent (11,2) counters, each with (3,2) counters labeled by level, 1 through 5]

82 Counter Reduction Trees
• May be more CSA levels in the slice tree than in Wallace or Dadda trees
• However, regular interconnect with shorter wires gives more efficient and faster layouts with less glitching
• Can be combined with ‘ier recoding to reduce the PP array height by half
  » may not pay off! (additional CSAs for reducing n rather than n/2 could be less complex than the recoding logic when wiring and layout irregularity are taken into account)

83 Signed Tree Multipliers
• Sign extend each partial product (and do final correction subtraction) to width of final product
[Figure: partial product rows x x x ... preceded by their sign extension bits a a a ..., b b b ..., d d d ...; annotation: signs can be removed]

84 Baugh Wooley Multiplier
Operands q0 q1 q2 q3 and d0 d1 d2 d3 (index 0 is the sign bit).

Unsigned partial product array (columns are bit weights 2^6 ... 2^0):

    2^6     2^5     2^4     2^3     2^2     2^1     2^0
                            d0q3    d1q3    d2q3    d3q3
                    d0q2    d1q2    d2q2    d3q2
            d0q1    d1q1    d2q1    d3q1
    d0q0    d1q0    d2q0    d3q0

Baugh Wooley array (columns are bit weights 2^7 ... 2^0):

    2^7     2^6     2^5     2^4     2^3     2^2     2^1     2^0
                                    d0!q3   d1q3    d2q3    d3q3
                            d0!q2   d1q2    d2q2    d3q2
                    d0!q1   d1q1    d2q1    d3q1
            d0q0    !d1q0   !d2q0   !d3q0
            !d0     0       0       d0
    1       !q0     0       0       q0

85 Baugh Wooley Multiplier Example
• 1 1 1 0 (-2) times 1 1 0 1 (-3) gives 0 0 0 0 0 1 1 0 (+6)
[Figure: the Baugh Wooley array evaluated bit by bit for these operands; the individual rows are not recoverable from the transcript]
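The idea of replacing negatively weighted partial product bits by complemented bits plus constants can be sketched in a few lines. Note the assumptions: this is the commonly used "modified" Baugh Wooley arrangement (complement the cross terms that involve exactly one sign bit, then add 1s at weights 2^n and 2^(2n-1)), not the exact array drawn on the slide, and it indexes bits lsd first. The name `baugh_wooley_multiply` is hypothetical.

```python
def baugh_wooley_multiply(d: int, q: int, n: int) -> int:
    """Multiply two n-bit two's complement patterns using only positive bit terms."""
    top = n - 1                                     # index of the sign bits
    total = 0
    for i in range(n):
        for j in range(n):
            term = ((d >> i) & 1) & ((q >> j) & 1)  # AND-gate partial product bit
            if (i == top) != (j == top):            # exactly one sign bit involved
                term ^= 1                           # complement instead of subtracting
            total += term << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))          # constant correction terms
    total &= (1 << (2 * n)) - 1                     # keep the 2n-bit product
    if total >> (2 * n - 1):                        # reinterpret as two's complement
        total -= 1 << (2 * n)
    return total


print(baugh_wooley_multiply(0b1110, 0b1101, 4))     # -2 times -3
```

All array entries are nonnegative, so the reduction tree needs no sign extension or subtraction.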

86 Partial Tree Multipliers
• If full tree is too expensive, do h passes through a smaller tree
[Figure: an (h+2)-input reduction tree; its sum and carry outputs hold the upper part of the cumulative PP in stored carry form and are fed back into the tree; an h bit adder produces the lower part of the cumulative PP]

87 Multiply Operation
[Figure: multiplicand (D), multiplier (Q), partial product array processed in groups of h = 4 rows, double precision product (P = Q*D)]

88 Radix 16 Partial Tree Multiply
[Figure: Q (‘ier) selects among the multiples D, 2D, 4D, 8D (or 0); a CSA tree produces sum and carry, which are shifted 4 bits per iteration; a 4 bit RCA delivers 4 bits to the lower half of the PP]

89 Pipelined Tree Multiplier
[Figure: Q (‘ier) drives multiple forming circuits, each selecting between D and 0; partial product reduction tree with pipeline latches; fast CPA producing P (product)]

90 Pipelined Partial Tree Multipliers
• Feed back sum and carry into middle of (h+2) reduction tree
[Figure: an h-input reduction tree followed by a latch and a CSA; sum and carry hold the upper part of the cumulative PP in stored carry form; an h bit adder produces the lower part of the cumulative PP]

91 Twin Beat Multiplier
[Figure: two copies of a pipelined radix-8 recoder/selector supplied with D and 3D, each followed by a CSA producing sum and carry]

92 Key References
• Baugh, Wooley, A two’s complement parallel array multiplication algorithm, IEEE Trans. Computers, 22:1045-1047, 1973.
• Booth, A signed binary multiplication technique, Quarterly Journal Mechanics and Applied Math, 4(2):236-240, June 1951.
• Ciminiera, Montuschi, Carry-save multiplication schemes without final addition, IEEE Trans. Computers, 45(9):1050-1055, 1996.
• Dadda, Some schemes for parallel multipliers, Alta Frequenza, 34:349-356, 1965.
• Dadda, On parallel digital multipliers, Alta Frequenza, 45:574-580, 1976.
• Robertson, Two’s complement multiplication in binary parallel computers, IRE Trans. Electronic Computers, 4(3):118-119, Sept. 1955.
• Santoro, Horowitz, A pipelined 64x64b iterative array multiplier, Proc. of SSCC, pp. 35-36, Feb 1988.
• Stenzel, Kubitz, A compact high-speed parallel multiplication scheme, IEEE Trans. on Computers, C-26:948-957, 1977.
• Swartzlander, Parallel counters, IEEE Trans. on Computers, 22(11):1021-1024, 1973.
• Wallace, A suggestion for a fast multiplier, IEEE Trans. on Electronic Computers, 13:14-17, 1964.
• Zuras, McAllister, Balanced delay trees and combinatorial division in VLSI, IEEE J. SSC, 21:814-819, 1986.

