CSE575 Multiplication.1 © MJIrwin, PSU, 2005 Computer Arithmetic CSE 575 Computer Arithmetic Spring 2005 Mary Jane Irwin (www.cse.psu.edu/~mji)

Remaining Lecture Schedule
Mar 15  Introduction, number repr      Dr. Irwin   Chp 1
Mar 17  Local project design review    Theo T.
Mar 22  Global project review          Dr. Vijay
Mar 24  Global project review          Dr. Vijay
Mar 29  Addition                       Dr. Irwin   Chp 2
Apr 1   Redundant repr & its uses      Dr. Irwin
Apr 5   Multiplication                 Dr. Irwin   Chp 4
Apr 7   Local/Global project review    Dr. Vijay
Apr 12  Division                       Dr. Irwin   Chp 5
Apr 14  Flt point repr & operation     Dr. Irwin   Chp 8
Apr 19  Function evaluation            Dr. Irwin   Chp 10, 11
Apr 21  Final global project review    Dr. Vijay
Apr 26  Other # systems                Dr. Irwin
Apr 28  Final global project review    Dr. Vijay

Review: Binary Adders
[Figure: taxonomy of synchronous word-parallel adders. Carry-propagate adders (CPAs) include ripple carry adders (RCA), Manchester carry chain, carry skip, carry select, conditional sum, carry lookahead, and prefix adders; carry-propagation-minimizing adders include signed-digit and residue adders. Time/area complexities shown: T = O(n), A = O(n); T = O(1), A = O(n); T = O(log n), A = O(n log n); T = O(√n), A = O(n).]

Multioperand Addition
- Addition of more than two numbers
  - vector inner products
  - computing averages
[Figure: k = 7 operands X0 … X6, each n bits wide, summed into one result.]
The sum of k n-bit operands is at most $k(2^n - 1)$, so it needs $\lceil \log_2(k2^n - k + 1) \rceil \le n + \lceil \log_2 k \rceil$ bits.

Serial Implementation
[Figure: an n-bit CPA adds each operand X_j in turn into a partial sum register of $n + \lceil \log_2 k \rceil$ bits.]
$T_{\text{serial-multiadd}} = O(k \log(n + \log k)) = O(k \log n + k \log\log k)$
Addition time grows superlinearly with k when n is fixed, and logarithmically with n for a fixed k.

Multiply
- Binary multiplication as repeated additions
[Figure: the n-bit multiplicand D times the n-bit multiplier Q forms an n-row partial product array, which sums to the 2n-bit double-precision product P.]

A Serial Implementation
[Figure: each cycle, one bit of the multiplier register Q drives the add/no-add control, and a 2n-bit CPA adds the n-bit multiplicand register D into the 2n-bit partial product register P.]
$T_{\text{serial-multiply}} = O(n \log(2n))$: multiplication time grows superlinearly with n (when using a log-time adder).

Shift & Add Multiplication
- Left shift and add
  - partial products accumulated from bottom to top
  - requires a 2n-bit adder
- Right shift and add
  - partial products accumulated from top to bottom
  - only requires an n-bit adder
  - sign extend the ‘icand on each right shift; premultiply the ‘icand by $2^n$ to offset the effect of the right shifts (integer operands only)

Right Shift & Add Multiplier
[Figure: under add/no-add control from the low bit of the multiplier register Q, an n-bit CPA (with add/subtract control) adds the multiplicand register D, or 0, into the (partial) product register P.]
$T_{\text{serial-multiply}} = O(n \log n)$ with a log-time adder, or $O(n^2)$ with a ripple-carry adder: multiplication time grows superlinearly with n.
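
The right shift & add scheme can be sketched in Python for unsigned operands. This is a minimal sketch, assuming n-bit unsigned integers; `right_shift_add_multiply` is a hypothetical name, and Python integers stand in for the P and Q registers. Each cycle the low bit of Q decides whether D is added into P, then P || Q shifts right one place, so only an n-bit add (plus its carry-out) is ever needed.

```python
def right_shift_add_multiply(d: int, q: int, n: int) -> int:
    """Unsigned n-bit x n-bit right shift & add multiply (illustrative sketch)."""
    mask = (1 << n) - 1
    assert 0 <= d <= mask and 0 <= q <= mask, "operands must be unsigned n-bit values"
    p = 0        # (partial) product register P: the upper half of the product
    low = q      # multiplier register Q: refills from the top with bits shifted out of P
    for _ in range(n):
        if low & 1:      # add/no-add control: the current low-order multiplier bit
            p += d       # n-bit add; any carry-out lands in bit n of p
        # shift P || Q right one place: the low bit of P moves into the top of Q
        low = (low >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | low   # P holds the high n bits, Q the low n bits


if __name__ == "__main__":
    print(right_shift_add_multiply(13, 11, 4))  # 143
```

With a ripple-carry adder each of the n cycles costs O(n) time, which is where the O(n^2) bound comes from; a log-time adder gives O(n log n).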

Signed Multiplication
- So far we have, for the multiplier $Q = q_0.q_1 q_2 q_3 \ldots q_{n-1}$ (where $q_0$ is the sign bit):
$P_0 = 0$
$P_1 = \tfrac{1}{2}(P_0 + q_{n-1} D)$
$P_2 = \tfrac{1}{2}(P_1 + q_{n-2} D)$
…
$P_{i+1} = \tfrac{1}{2}(P_i + q_{n-i-1} D) = \left(\sum_{j=1}^{i+1} q_{n-j} 2^{-(i+2-j)}\right) D$
- So $P_{n-1} = \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D = Q \cdot D$ when $q_0 = 0$

Negative (2s’C) Multiplicand
- As long as we sign extend the ‘icand, our scheme works fine
- But what if both the ‘icand and the ‘ier are negative?
Example: D = -13, Q = +11, P = -143 (the ‘icand is sign extended on every right shift)

Negative (2s’C) Multiplier
- Recall that for 2s’C, $D = -d_0 + \sum_{j=1}^{n-1} d_j 2^{-j}$ and $Q = -q_0 + \sum_{j=1}^{n-1} q_j 2^{-j}$, and what we have computed so far is $P_{n-1} = \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D$
- What we want is $P = Q \cdot D = -q_0 D + \left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D$
- So the correction for 2s’C is $P = P_{n-1} - q_0 D$

Negative (2s’C) Multiplier Example
D = -13, Q = -6: the shift & add steps accumulate $P_{n-1}$ from the positively weighted ‘ier bits, and the final correction $P = P_{n-1} - q_0 D$ gives P = +78.
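
The signed scheme can be sketched as follows. This minimal Python sketch assumes integer operands instead of the slides' fractions, so the bits leaving the partial product on each right shift are collected in `lo` rather than dropped; Python's arithmetic `>>` plays the role of sign extending the ‘icand, and the sign bit of Q is handled by the correction $P = P_{n-1} - q_0 D$. The function name is hypothetical.

```python
def signed_shift_add_multiply(d: int, q: int, n: int) -> int:
    """2's complement right shift & add multiply with a final correction step.

    d: any Python int (sign extension comes free because >> is arithmetic).
    q: taken as an n-bit 2's complement pattern; its top bit is the sign q0.
    """
    qbits = q & ((1 << n) - 1)
    hi = 0   # partial product P_i, kept sign extended
    lo = 0   # bits shifted out of P_i (the low end of the product)
    for i in range(n - 1):             # the n-1 positively weighted 'ier bits, lsb first
        if (qbits >> i) & 1:
            hi += d                    # add the (sign-extended) 'icand
        lo |= (hi & 1) << i            # bit leaving P_i becomes product bit i
        hi >>= 1                       # arithmetic shift: the "halve" step of P_{i+1}
    if (qbits >> (n - 1)) & 1:         # sign bit q0 has weight -2^(n-1)
        hi -= d                        # correction: P = P_{n-1} - q0 * D
    return hi * (1 << (n - 1)) + lo


if __name__ == "__main__":
    print(signed_shift_add_multiply(-13, -6, 4))   # 78
    print(signed_shift_add_multiply(-13, 11, 5))   # -143
```

The two prints reproduce the slide examples: -13 × -6 = +78 with a 4-bit ‘ier, and -13 × 11 = -143 with a 5-bit ‘ier.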

Other Negative Multipliers
- 1s’C: $P = P_{n-1} - q_0 D + 2^{-(n-1)} q_0 D$
  - adder must do 1s’C addition (end-around carry, EAC)
  - sign extend the ‘icand
  - initialize $P_0$ as $q_0 D$ (rather than clearing the register)
  - do an optional subtraction as a last step
- SM: $|P| = P_{n-1}$ and $p_{\text{sign}} = q_0 \oplus d_0$
  - strip off the sign bits and do unsigned multiplication (so no corrections and no sign extensions)
  - the sign of the product is the XOR of the ‘ier and ‘icand sign bits

Lower Bound on Multiplication
- Winograd’s lower bound on the time to multiply two n-digit, d-valued numbers is $t \ge \lceil \log 2n \rceil$
- Multiplication can be done as the addition of the logarithmic representations of the two numbers, $a \times b = c \iff \log a + \log b = \log c$, but the data representation is nonstandard

Faster Serial Multiplication
- Use a log n fast CPA
- Bypass the addition cycle when the ‘ier bit is 0
  - zero detect and barrel shift: detect strings of zeros in the ‘ier and shift 1, 2, 3, …, n-1 places right in one cycle
- Use higher radix multiplication
  - multiplier recoding to simplify multiple formation
  - CSAs to form multiples

Carry Save Adder (CSA)
- A carry save adder is nothing more than a full adder with the carries saved rather than propagated!
- Also called a (3,2) counter
[Figure: a full adder (FA).]

Carry Save Word Adder
- A 6-bit CSA reduces three 6-bit inputs to one 6-bit output and one 7-bit output
[Figure: six FAs side by side form a 6-bit CSA.]
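
A word-level CSA is just n independent full adders. This hypothetical Python helper (`carry_save_add` is an illustrative name) shows the 6-bit case: the sum word stays 6 bits wide, while the carry word, saved one place to the left, needs 7 bits.

```python
from typing import Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """Word-level CSA: one (3,2) counter per bit position, no carry propagation.

    Returns (sum_word, carry_word) with sum_word + carry_word == a + b + c.
    For w-bit inputs the sum word is w bits and the carry word is w+1 bits.
    """
    sum_word = a ^ b ^ c                                # per-bit sum output
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, saved one place left
    return sum_word, carry_word


if __name__ == "__main__":
    s, c = carry_save_add(0b101101, 0b011011, 0b110110)  # 45, 27, 54
    print(s, c, s + c)                                    # 0 126 126
```

In the example every bit position sees exactly two 1's, so the sum word is 0 and the whole value ends up in the 7-bit carry word.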

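Radix 4 Multiply
- Radix-4 multiply retires two ‘ier bits per step, so it needs half as many additions and (if forming the multiple costs no extra time) runs about twice as fast:
$P_{i+1} = \tfrac{1}{4}\left(P_i + (q_{n-2i-2}\,q_{n-2i-1})_2 \cdot D\right)$ with $P_0 = 0$, where $(q_{n-2i-2}\,q_{n-2i-1})_2$ is the radix-4 digit formed by the next two ‘ier bits; after the last step the product register holds $\left(\sum_{j=1}^{n-1} q_j 2^{-j}\right) D = Q \cdot D$
[Figure: the partial product array of the multiplicand D and multiplier Q is only n/2 rows high; the double-precision product P is 2n bits.]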

Forming the Multiples
- Need the multiples 0·D, 1·D, 2·D, 3·D
- All are easy except 3·D
  - compute it via an addition (3D = 2D + 1D) every cycle: too slow!
  - precompute it and store it in a register
  - use a CSA to form the multiples
  - replace 3D with 4D (a carry into the next higher multiplier digit) and -D: recode the multiplier so you don’t need it
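
Precomputing 3D is the simplest of the listed options. The sketch below assumes unsigned operands, an even n, and a hypothetical `radix4_multiply` name; it forms 0D, 1D, 2D and 3D once, then retires one radix-4 digit (two ‘ier bits) per addition. For brevity it shifts the selected multiple left rather than shifting P right, which produces the same sum.

```python
def radix4_multiply(d: int, q: int, n: int) -> int:
    """Unsigned radix-4 multiply: two 'ier bits (one radix-4 digit) per addition.

    Assumes n is even and 0 <= d, q < 2**n. The multiples 0D..3D are formed once
    before the loop; 3D costs one addition and is then held like a register value.
    """
    assert n % 2 == 0, "sketch assumes an even number of 'ier bits"
    multiples = [0, d, d << 1, (d << 1) + d]   # 0D, 1D, 2D (a shift), 3D (precomputed)
    p = 0
    for i in range(0, n, 2):                    # n/2 iterations instead of n
        digit = (q >> i) & 0b11                 # next radix-4 digit of the 'ier, lsd first
        p += multiples[digit] << i              # one addition per radix-4 digit
    return p


if __name__ == "__main__":
    print(radix4_multiply(13, 11, 4))  # 143
```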

Using a CSA to Form Multiples
[Figure: an (n+1)-bit CSA combines the partial product with the selected 0/1D and 0/2D terms from the ‘icand D; an (n+2)-bit CPA, under add/subtract control, forms the next (n+2)-bit (partial) product P; P || Q shifts right 2 bits each iteration.]

Recoding the Multiplier
- Recall that for radix 4, the digit set [0, 1, 2, 3] can be recoded into [-2, -1, 0, 1, 2]
- This recoding has to be accomplished so that the algebraic value of the ‘ier is unchanged ($Q = -q_0 + \sum_{j=1}^{n-1} q_j r^{-j}$ in RC):
… $q_{j-1}\ q_j\ q_{j+1}$ …
$r^{-(j-1)} q_{j-1} + r^{-j} q_j = r^{-(j-1)}(q_{j-1} + 1) + r^{-j}(q_j - r)$
(add a unit at position j-1; subtract r at position j)

Goals of Recoding
- Maximize the number of zeros, or
- Eliminate the possibility of a 1 1 or 1̄ 1̄ digit pairing (1̄ = -1), which would require a ±3D multiple

Recoding
- With mode digit $m_j$ and recoded digit $q'_j$:
$r^{-(j-1)} q'_{j-1} + r^{-j} q'_j + r^{-(j+1)} q'_{j+1} = r^{-(j-1)}(q_{j-1} + m_{j-1}) + r^{-j}(q_j - r m_{j-1} + m_j) + r^{-(j+1)}(q_{j+1} - r m_j)$
- So that $q'_j = q_j - r m_{j-1} + m_j$

Recoding, Con’t
- And
$Q' = \sum_{j=1}^{n-1} q'_j r^{-j} = \sum_{j=1}^{n-1} r^{-j}(q_j - r m_{j-1} + m_j) = \sum_{j=1}^{n-1} r^{-j} q_j + \sum_{j=1}^{n-1} r^{-j}(-r m_{j-1} + m_j)$
$= \sum_{j=1}^{n-1} r^{-j} q_j - r^0 m_0 + r^{-1} m_1 - r^{-1} m_1 + \dots + r^{-(n-1)} m_{n-1}$
$= -m_0 + \sum_{j=1}^{n-1} r^{-j} q_j + r^{-(n-1)} m_{n-1}$
- So if $m_{n-1} = 0$ and $m_0 = q_0$, the recoding works for RC notation, and the choices for $m_j$ (j = 1, 2, …, n-2) are arbitrary!!

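Recoding Table
- In binary, $q'_j = q_j - 2m_{j-1} + m_j$
- Given $m_j$ from the previous step, when $q_j$ is sensed pick $m_{j-1}$:

m_j  q_j | q'_j if m_{j-1} = 0 | q'_j if m_{j-1} = 1
 0    0  |          0          |         -2
 0    1  |          1          |         -1
 1    0  |          1          |         -1
 1    1  |          2          |          0

Entries of ±2 are not binary digits, so $m_{j-1}$ is forced to 0 when $m_j = q_j = 0$ and to 1 when $m_j = q_j = 1$; in the other two rows the choice is free, and the recoding families differ in how they make it.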

Recoding Families
- Canonical (Booth’s)
- Differentiating
- Nonrestoring
- Modified Booth’s
(The first three use $q_{j-1}$ and $q_j$; modified Booth’s uses $q_{j-1}$, $q_j$, and $q_{j+1}$.)

Multiplier Recoding Schemes
[Table: the chosen $m_{j-1}$ and resulting $q'_j$ for canonical, differentiating, and nonrestoring recoding, as a function of $m_j$, $q_{j-1}$, and $q_j$. Row annotations: middle of string of 0’s, middle of string of 1’s, isolated 1, start of string of 1’s, start of string of 0’s, isolated 0.]

Canonical (Booths) Recoding
[Worked example: a 9-bit ‘ier recoded one digit at a time, with mode digits running from $m_8 = 0$ down to $m_0 = 0$.]

Canonical Recoding Facts
- Every two nonzero recoded digits are separated by at least one zero digit (in binary), so each pair of digits forms a radix-4 digit in [-2, -1, 0, 1, 2] and there is no 3D to deal with
- Produces a multiplier with the most zeros
- It is a left-directed (serial) recoding
Proof of the first fact (here $q'_i$ stands for "$q'_i$ is nonzero"; ! is complement, | is OR, & and juxtaposition are AND):
- From the table, m_{i-1} = m_i q_i | m_i q_{i-1} | q_{i-1} q_i, and q'_i = q_i !m_i | !q_i m_i
- So !m_{i-1} = !q_i !q_{i-1} | !m_i !q_{i-1} | !m_i !q_i, and !q'_i = q_i m_i | !q_i !m_i
- If no two successive digits are nonzero, it must be true that q'_{i-1} & q'_i = 0
- Now q'_{i-1} & q'_i = (q_{i-1} !m_{i-1} | !q_{i-1} m_{i-1})(q_i !m_i | !q_i m_i)
- Substituting terms gives q'_{i-1} & q'_i = (q_{i-1} !q_i !m_i | !q_{i-1} q_i m_i)(q_i !m_i | !q_i m_i) = 0, since every product term contains a variable and its complement
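
Canonical recoding can be sketched directly from the mode-digit equations. This Python sketch applies the rule m_{j-1} = m_j q_j | m_j q_{j-1} | q_{j-1} q_j (a majority function) and $q'_j = q_j - 2m_{j-1} + m_j$, using lsb-first list indices, so `bits[i]` is the slides' $q_{n-1-i}$. How the sign position is closed off is an assumption: instead of insisting that $m_0 = q_0$, the leftover $m_0 - q_0$ is emitted as a final digit, which keeps the value equal to Q for every 2's complement input. The function names are hypothetical.

```python
from typing import List


def canonical_recode(q: int, n: int) -> List[int]:
    """Canonical recoding of an n-bit 2's complement 'ier; digits lsd first, in {-1, 0, 1}."""
    bits = [(q >> i) & 1 for i in range(n)]      # bits[i] is the slides' q_{n-1-i}
    digits = []
    m = 0                                        # m_{n-1} = 0 at the lsd end
    for i in range(n - 1):                       # the positively weighted positions
        # m_{j-1} = m_j q_j | m_j q_{j-1} | q_{j-1} q_j, i.e. majority(m, bit, next bit)
        m_next = 1 if m + bits[i] + bits[i + 1] >= 2 else 0
        digits.append(bits[i] - 2 * m_next + m)  # q'_j = q_j - 2 m_{j-1} + m_j
        m = m_next
    digits.append(m - bits[n - 1])               # sign position: leftover mode digit minus q_0
    return digits


def digit_value(digits: List[int]) -> int:
    """Algebraic value of an lsd-first radix-2 signed-digit vector."""
    return sum(dg * (1 << i) for i, dg in enumerate(digits))


if __name__ == "__main__":
    print(canonical_recode(0b0111, 4))   # [-1, 0, 0, 1], i.e. 8 - 1
```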

Differentiating Recoding
[Worked example: a 9-bit ‘ier recoded one digit at a time, with mode digits running from $m_8 = 0$ down to $m_0 = 0$.]

Differentiating Recoding Facts
- Because of the pairing of rows, the recoding is independent of $q_{j-1}$: $m_{j-1} = q_j$, and so $m_j = q_{j+1}$
- So the recoding can be based upon just $q_j$ and $q_{j+1}$: $q'_j = q_j - 2m_{j-1} + m_j = q_{j+1} - q_j$

q_j  q_{j+1} (= m_j) | m_{j-1} | q'_j
 0         0         |    0    |  0
 0         1         |    0    |  1
 1         0         |    1    | -1
 1         1         |    1    |  0

More Differentiating Facts
- The recoding can be done lsd first (left directed) OR msd first (right directed) OR in parallel: digit by digit, Q' = M - Q, where M is Q shifted left one place
- Successive nonzero recoded digits are always of opposite sign, so pairs of digits form radix-4 digits in [-2, -1, 0, 1, 2], and there is still no 3D to deal with
- Used as a radix-4 recoding, it also gives an n/2-row rather than an n-row partial product array
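
Because each digit depends only on two adjacent ‘ier bits, differentiating recoding is a one-line parallel operation. This Python sketch uses lsb-first indices (digit i = bit i-1 minus bit i, which is $q'_j = q_{j+1} - q_j$ in the slides' msd-first indexing); the function name is hypothetical.

```python
from typing import List


def differentiating_recode(q: int, n: int) -> List[int]:
    """Differentiating recoding of an n-bit 2's complement 'ier, digits lsd first.

    With m_{j-1} = q_j each digit is (M - Q) at that position, where M is Q shifted
    left one place. No digit depends on another digit, so all n can be formed in parallel.
    """
    bits = [(q >> i) & 1 for i in range(n)]
    return [(bits[i - 1] if i > 0 else 0) - bits[i] for i in range(n)]


if __name__ == "__main__":
    print(differentiating_recode(0b01101100, 8))   # [0, 0, -1, 0, 1, -1, 0, 1]
```

Pairing adjacent digits gives $2q'_{2k+1} + q'_{2k} = b_{2k-1} + b_{2k} - 2b_{2k+1}$ in lsb-first bit terms, which is exactly the three-bit rule that modified Booth’s recoding applies directly.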

Modified Booth’s Recoding
- Modified Booth’s recoding has the same goal as differentiating recoding: a recoding scheme that is parallel and that allows a radix-4 multiply without 3D
- Instead of a mode digit, it uses three adjacent bits of Q to do the recoding, and it recodes two bits at a time (instead of one)
- Successive nonzero recoded digits are always of opposite sign, so the radix-4 digits lie in [-2, -1, 0, 1, 2]

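Modified Booth’s Scheme
The radix-4 digit is $2q'_{j-1} + q'_j = -2q_{j-1} + q_j + q_{j+1}$:

q_{j-1} q_j q_{j+1} | q'_{j-1} q'_j | digit | meaning
   0     0     0    |    0     0    |   0   | run of zeros
   0     0     1    |    0     1    |  +1   | end of string of ones
   0     1     0    |    1    -1    |  +1   | isolated one
   0     1     1    |    1     0    |  +2   | end of string of ones
   1     0     0    |   -1     0    |  -2   | start of string of ones
   1     0     1    |   -1     1    |  -1   | isolated zero
   1     1     0    |    0    -1    |  -1   | start of string of ones
   1     1     1    |    0     0    |   0   | run of ones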
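
The eight rows of the modified Booth’s table can be regenerated mechanically. In this sketch the radix-4 digit is $-2q_{j-1} + q_j + q_{j+1}$, and the digit pair $(q'_{j-1}, q'_j)$ is assumed to be the corresponding pair of differentiating digits $(q_j - q_{j-1},\ q_{j+1} - q_j)$; `booth_pair` and `MEANING` are hypothetical names.

```python
from itertools import product
from typing import Tuple

# Slide annotations, keyed by (q_{j-1}, q_j, q_{j+1}); q_{j+1} is the bit just below the pair.
MEANING = {
    (0, 0, 0): "run of zeros",
    (0, 0, 1): "end of string of ones",
    (0, 1, 0): "isolated one",
    (0, 1, 1): "end of string of ones",
    (1, 0, 0): "start of string of ones",
    (1, 0, 1): "isolated zero",
    (1, 1, 0): "start of string of ones",
    (1, 1, 1): "run of ones",
}


def booth_pair(q_prev: int, q_cur: int, q_next: int) -> Tuple[int, int, int]:
    """Recode the pair (q_{j-1}, q_j) looking at q_{j+1}; returns (q'_{j-1}, q'_j, radix-4 digit)."""
    hi = q_cur - q_prev          # differentiating digit for position j-1
    lo = q_next - q_cur          # differentiating digit for position j
    return hi, lo, 2 * hi + lo   # equals -2*q_{j-1} + q_j + q_{j+1}


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        hi, lo, digit = booth_pair(*bits)
        print(bits, (hi, lo), f"{digit:+d}", MEANING[bits])
```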

Recoding Hardware Comparison
- How do differentiating and modified Booth’s recoding compare with respect to time, complexity, and power?
[Figure: recoding logic driven by Q for differentiating vs. modified Booth’s recoding.]

Right Shift & Add Multiplier
[Figure: recode logic examines the low ‘ier bits in Q and selects -2D, -1D, 0, 1D, or 2D from the ‘icand D (or its complement !D, via add/subtract control); an (n+2)-bit CPA adds it into the (n+2)-bit (partial) product P, and P || Q shifts right 2 bits each iteration.]
$T_{\text{serial-multiply}} = O((n/2) \log n)$
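
Putting the recoder in front of the right shift & add datapath gives the radix-4 multiplier sketched below. This is a minimal Python sketch under stated assumptions: an even n, an n-bit 2's complement ‘ier, and integer arithmetic standing in for the (n+2)-bit CPA and the multiple selection; `booth_radix4_multiply` is a hypothetical name.

```python
def booth_radix4_multiply(d: int, q: int, n: int) -> int:
    """Signed multiply using modified Booth (radix-4) recoding, two 'ier bits per cycle.

    Assumes n is even; q is an n-bit 2's complement pattern and d any int.
    Each of the n/2 cycles adds one of -2D, -D, 0, D, 2D, then shifts P || Q right 2.
    """
    assert n % 2 == 0, "sketch assumes an even number of 'ier bits"
    qbits = q & ((1 << n) - 1)
    hi = lo = 0
    below = 0                                 # the implicit 0 to the right of the lsb
    for k in range(n // 2):
        b0 = (qbits >> (2 * k)) & 1
        b1 = (qbits >> (2 * k + 1)) & 1
        digit = -2 * b1 + b0 + below          # recoded digit in {-2, -1, 0, 1, 2}
        below = b1                            # overlap bit for the next triple
        # in hardware 2D is a 1-bit shift and -D a complement plus carry-in;
        # here plain integer arithmetic stands in for the adder
        hi += digit * d
        lo |= (hi & 0b11) << (2 * k)          # two product bits retire per cycle
        hi >>= 2                              # arithmetic shift keeps the sign
    return hi * (1 << n) + lo


if __name__ == "__main__":
    print(booth_radix4_multiply(-13, -6, 4))   # 78
```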

Multiplier Recoding Schemes
[Table: the recoding table for canonical, differentiating, and nonrestoring recoding as a function of $m_j$, $q_{j-1}$, and $q_j$, with nonrestoring annotations "low order 0’s only" and "never?".]

Nonrestoring Recoding
[Worked example: a 9-bit ‘ier recoded one digit at a time, with mode digits running from $m_8 = 0$ down to $m_0$.]

Nonrestoring Recoding Facts
- The msd does not conform to the rules; it is overridden by the termination condition that $m_0$ must agree with the sign of the ‘ier
- Gives a recoded digit set of Q’ = [-3, -1, 1, 3] (the lsd could also be from [-2, 0, 2])
- It is a left-directed (serial) recoding
- It corresponds to the inverse of nonrestoring division, so it could be useful in determining the relationship between multiply and divide

Higher Radix Multiply
- Does recoding work for radix 8? radix 16? radix 32?
- Only choice is to form or pre-form multiples of D
Radix-8 digit sets:
  -7 to 7: many “hard” multiples (±3, ±5, ±6, ±7), maximally redundant
  -6 to 6: many “hard” multiples (±3, ±5, ±6)
  -5 to 5: some “hard” multiples (±3, ±5)
  -4 to 4: few “hard” multiples (±3), minimally redundant
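
The radix-8 case can be checked with a short experiment. The sketch below extends the overlapping-bits idea of modified Booth’s recoding to radix 8 (four bits examined, three retired per digit), which yields the minimally redundant digit set [-4, 4]; `booth_radix8_digits` and `is_hard` are hypothetical names, and "hard" is taken to mean any multiple that is not 0 or ± a power of two.

```python
from typing import List


def booth_radix8_digits(q: int, n: int) -> List[int]:
    """Radix-8 Booth-style recoding of an n-bit 2's complement 'ier (n a multiple of 3).

    Each digit looks at three new bits plus the overlap bit below them, giving
    digits in [-4, 4], the minimally redundant radix-8 set. Digits are lsd first.
    """
    assert n % 3 == 0
    qbits = q & ((1 << n) - 1)
    digits, below = [], 0
    for k in range(n // 3):
        b0, b1, b2 = ((qbits >> (3 * k + i)) & 1 for i in range(3))
        digits.append(-4 * b2 + 2 * b1 + b0 + below)
        below = b2
    return digits


def is_hard(multiple: int) -> bool:
    """'Hard' multiple: nonzero and not +/- a power of two, so forming it needs an adder."""
    a = abs(multiple)
    return a != 0 and (a & (a - 1)) != 0


if __name__ == "__main__":
    used = {dg for q in range(-256, 256) for dg in booth_radix8_digits(q, 9)}
    print(sorted(used))                             # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
    print(sorted(m for m in used if is_hard(m)))    # [-3, 3]
```

Over all 9-bit 2's complement multipliers, ±3 are the only hard multiples needed, matching the -4 to 4 row above.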

Multiply Operation Review
[Figure: the n-bit multiplicand (D) times the n-bit multiplier (Q) forms the partial product array (ppa), which can be formed in parallel, and sums to the 2n-bit double-precision product (P = Q × D).]

Parallel Multiplication
- In a parallel multiplier:
  - can use a multiplier recoding scheme to cut the height of the ppa in half (from n rows to n/2 rows); the ppa must be formed in parallel, so use either modified Booth’s or differentiating recoding
  - reduce the height of the ppa to two rows in parallel with a tree of fast adders
  - use a fast CPA to do the final add

Full Tree Multiplier Structure
[Figure: multiple-forming circuits driven by the multiplicand D and the ‘ier Q feed a partial product reduction tree, followed by a fast CPA that produces the product P. Multiplier recoding can reduce the height of the tree to n/2.]

Tree Reduction Techniques
- CSA ((3,2) counter) trees
  - Wallace: row reduction
    - combine partial product bits as early as possible
    - fastest possible design, shorter CPA
  - Dadda: column reduction
    - combine partial product bits as late as possible
    - cheaper CSA tree, wider CPA
- Other counter trees
- SDA trees
[Figure: an n-bit CSA built from FAs.]

Full CSA (Wallace) Multiplier Tree
[Figure: a Wallace tree for a k = 7 bit ‘ier and an n-bit ‘icand: n-bit, (n+1)-bit, and (n+3)-bit CSAs reduce the seven partial products to two words for an (n+3)-bit CPA; labels such as [n-1,0], [n,1], …, [n+7,4] give the bit range each word occupies.]
$\$_{\text{CSA-multiply}} = O((k-2)n)\,\$_{\text{CSA}} + n\,\$_{\text{CPA}}$
$T_{\text{CSA-multiply}} = O(\text{tree height} + T_{\text{CPA}}) = O(\log k + \log n)$
for a k-bit ‘ier and an n-bit ‘icand.
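
The Wallace grouping rule can be simulated at the word level. This sketch (hypothetical names, nonnegative operands assumed) passes every complete group of three rows through a CSA at each level and lets leftover rows fall through, counting levels and CSAs as it goes.

```python
from typing import List, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """One word-level carry save adder: three words in, sum and carry words out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands: List[int]) -> Tuple[List[int], int, int]:
    """Reduce k >= 2 operand words to two words, Wallace style.

    At every level each complete group of three rows goes through a CSA and any
    leftover rows pass straight down. Returns (two words, CSA levels, CSA count).
    """
    rows = list(operands)
    levels = csas = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt: List[int] = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
            csas += 1
        nxt.extend(rows[full:])
        rows = nxt
        levels += 1
    return rows, levels, csas


if __name__ == "__main__":
    ops = [0b1011 << i for i in range(7)]        # k = 7 shifted rows, as for a 7-bit 'ier
    rows, levels, csas = wallace_reduce(ops)
    print(levels, csas, sum(rows) == sum(ops))   # 4 5 True
```

For k = 7 it uses 4 levels and k - 2 = 5 CSAs; in general each CSA removes exactly one row, which is where the (k-2) factor in the cost expression comes from.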

4x4 Tree Reduction
- Wallace tree: 4 FAs + 6 HAs + 5-bit CPA
- Dadda tree: 3 FAs + 3 HAs + 6-bit CPA
[Figure: dot diagrams showing the column heights of the 4x4 partial product array (1 2 3 4 3 2 1) at each reduction stage for both trees.]

6x6 Tree Reduction
- Wallace tree: 16 FAs + 13 HAs + 8-bit CPA
- Dadda tree: 15 FAs + 5 HAs + 10-bit CPA
[Figure: dot diagrams showing the column heights of the 6x6 partial product array (1 2 3 4 5 6 5 4 3 2 1) at each reduction stage for both trees.]

Maximum Inputs for CSA Trees
[Table: maximum number of inputs vs. number of CSA levels.]
The maximum number n(h) of inputs that can be reduced to two outputs by an h-level CSA tree is $n(h) = \lfloor 3n(h-1)/2 \rfloor$ (with n(0) = 2), giving an upper bound of $n(h) \le 2(3/2)^h$ and a lower bound of $n(h) > 2(3/2)^{h-1}$.
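
The recurrence is easy to tabulate. This sketch assumes n(0) = 2 (with no CSA levels, two operands go straight to the CPA); `max_csa_inputs` is a hypothetical name.

```python
def max_csa_inputs(h: int) -> int:
    """n(h): the most operands an h-level CSA tree can reduce to two."""
    n = 2                    # zero levels: two operands go straight to the CPA
    for _ in range(h):
        n = 3 * n // 2       # n(h) = floor(3 n(h-1) / 2)
    return n


if __name__ == "__main__":
    print([max_csa_inputs(h) for h in range(10)])   # [2, 3, 4, 6, 9, 13, 19, 28, 42, 63]
```

So 7 operands need 4 levels (6 < 7 ≤ 9), and a 63-operand tree needs only 9 levels.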

Log Reduction Trees: Traditional Wallace and Dadda Approaches
- Give an irregular structure, making design and layout quite difficult
- Connections and signal paths of varying lengths lead to signal skew and increased glitching, impacting both performance and power consumption
- Is there an approach better suited to VLSI layout?

Reduction with Counters
- General parallel counters
  - add up the 1’s in a k-bit column, outputting a $\lceil \log_2(k+1) \rceil$-bit count
    - completely utilized: (3;2), (7;3), (15;4), (5,5;4), (2,2,2,3;5)
    - partially utilized: (10;4), (3,3,3,3;6)
- Specialized parallel counters
  - add up the 1’s in a k-bit column plus j internal carry-ins, outputting j internal carry-outs and a 2-bit-wide count
    - (4;2), (7;2), (11;2)

(7,3) Counter
- Built out of (3,2) counters
[Figure: (3,2) counters combining seven input bits into a 3-bit count.]
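
One way to wire the (7,3) counter from (3,2) counters is sketched below; the arrangement of four (3,2) counters is an assumption about the figure, and the function names are hypothetical. The exhaustive check confirms that the three outputs encode the number of 1's among the seven inputs.

```python
from itertools import product
from typing import Sequence, Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """(3,2) counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(x: Sequence[int]) -> Tuple[int, int, int]:
    """(7,3) counter from four (3,2) counters; returns the count as bits (c2, c1, c0)."""
    s1, k1 = full_adder(x[0], x[1], x[2])
    s2, k2 = full_adder(x[3], x[4], x[5])
    c0, k3 = full_adder(s1, s2, x[6])      # all weight-1 signals meet here
    c1, c2 = full_adder(k1, k2, k3)        # the three weight-2 carries
    return c2, c1, c0


if __name__ == "__main__":
    for x in product((0, 1), repeat=7):
        c2, c1, c0 = counter_7_3(x)
        assert 4 * c2 + 2 * c1 + c0 == sum(x)
    print("all 128 input patterns counted correctly")
```

The longest path passes through three (3,2) counters: the first pair in parallel, then the weight-1 counter, then the weight-2 counter.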

(4,2) Counter
- Also built out of two (3,2) counters, but with 1 internal carry-in and 1 internal carry-out
[Figure: two cascaded (3,2) counters.]

Tiling (4,2) Counters
- Tiles with neighboring (4,2) counters
- Reduces columns four high to columns only two high
  - internal carry-in at the same “level” (i.e., bit position weight) as the internal carry-out
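
A (4,2) counter and its tiling can be sketched as follows (hypothetical names). The key property is that cout depends only on x1, x2 and x3, never on cin, so when counters are tiled across bit positions each lateral carry travels exactly one position and never ripples.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """(3,2) counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> Tuple[int, int, int]:
    """(4,2) counter from two (3,2) counters.

    Returns (sum, carry, cout) with x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    cout depends only on x1..x3, so it never waits for cin.
    """
    s, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s, x4, cin)
    return total, carry, cout


def compress_4_to_2(a: int, b: int, c: int, d: int, width: int) -> Tuple[int, int]:
    """Tile width+1 (4,2) counters across bit positions: four words in, two words out."""
    sum_word = carry_word = 0
    cin = 0
    for i in range(width + 1):
        bits = [(w >> i) & 1 for w in (a, b, c, d)]
        s, carry, cout = counter_4_2(*bits, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)
        cin = cout        # lateral carry into the neighbouring counter
    return sum_word, carry_word


if __name__ == "__main__":
    for x in product((0, 1), repeat=5):
        s, carry, cout = counter_4_2(*x)
        assert sum(x) == s + 2 * (carry + cout)
    s, c = compress_4_to_2(13, 7, 9, 15, 4)
    print(s + c)   # 44
```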

4x4 PPA Reduction
- Fast 4x4 multiplication using (4,2) counters
- How would you lay it out?
[Figure: the 4x4 partial product array of the multiplicand and multiplier is reduced by five (4,2) counters to a two-row array, which a 5-bit CPA adds to form the 8-bit double-precision product.]

8x8 PPA Reduction
- How many (4,2) counters, at minimum, are needed to reduce the 8x8 partial product array to 2 rows?
- Answer: 24, feeding a 12-bit fast CPA
[Figure: ‘icand, ‘ier, partial product array, and reduced partial product array.]

8x8 PPA Reduction
[Figure: two rows of nine (4,2) counters each, followed by one row of thirteen (4,2) counters, reduce the partial product array for a 13-bit fast CPA.]
- How many (4,2) counters are needed in the Wallace tree? Answer: 31 (9 + 9 + 13)

An 8x8 Multiplier Layout
- How should it be laid out?
[Figure: multiplicand and multiplier feed two rows of nine (4,2) counters, then a row of thirteen (4,2) counters and a 13-bit CPA.]

A Better 8x8 Multiplier Layout
- One that focuses on wires instead of gates
[Figure: multiple generators driven by the multiplicand and the multiple selection signals (‘ier), then two rows of nine (4,2) counters, thirteen (4,2) counters, and the CPA.]

A 16x16 Multiplier Layout
[Figure: multiple generators driven by the multiplicand and the multiple selection signals (‘ier), a (4,2) counter slice per bit position, and the CPA.]

Pipelining
- Divide the computation into stages that take approximately the same time
- Separate stages with pipeline latches to isolate them
- Run the clock at a rate determined by the slowest stage
  - much faster clock
- Longer latency: the time from input of particular operands to output of the corresponding result
- Big bandwidth win if doing lots of (independent) multiplies in a row: after the pipeline fills, one result is generated every clock cycle

A Pipelined Version
[Figure: the (4,2) counter slice multiplier (multiple selection signals from the ‘ier, multiplicand, CPA) with pipeline latches on the counter slice outputs.]

(7,2) Counter
- Built out of (3,2) counters, with 2 carry-ins (from the i-1 and i-2 slices) and 2 carry-outs (to the i+1 and i+2 slices)
[Figure: (7,2) counter slice built from (3,2) counters.]

Tiling (7,2) Counters
[Figure: neighboring (7,2) counter slices, built from (3,2) counters, tiled together.]

(11,2) Counter
- For each outgoing carry there is a corresponding incoming carry that was generated after the same delay: a balanced delay tree
- Delay of five (3,2) counter levels

Tiling (11,2) Counters
[Figure: neighboring (11,2) counter slices tiled together.]

Counter Reduction Trees
- There may be more CSA levels in the slice tree than in Wallace or Dadda trees
- However, regular interconnect with shorter wires gives more efficient and faster layouts with less glitching
- Can be combined with ‘ier recoding to reduce the PP array height by half
  - may not pay off! (the additional CSAs for reducing n rather than n/2 rows could be less complex than the recoding logic once wiring and layout irregularity are taken into account)

Signed Tree Multipliers
- Sign extend each partial product (and do the final correction subtraction) to the width of the final product
[Figure: each partial product row (a…, b…, d…) is padded to the full product width with copies of its sign bit; these sign bits can be removed.]

Baugh Wooley Multiplier
[Figure: 4x4 partial product arrays for D = d_0 d_1 d_2 d_3 and Q = q_0 q_1 q_2 q_3, with sign bits d_0 and q_0. Left: the ordinary array of terms d_i q_j. Right: the Baugh-Wooley array, in which the terms d_0 q_j (j = 1, 2, 3) become d_0 !q_j, the terms d_i q_0 (i = 1, 2, 3) become !d_i q_0, d_0 q_0 is kept, and the correction bits !d_0, d_0, 1, !q_0, q_0 are added.]

Baugh Wooley Multiplier Example
[Figure: the Baugh-Wooley array applied to a pair of 4-bit 2s’C operands.]
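
The Baugh-Wooley array can be checked numerically. This Python sketch uses lsb-first bit indices (bit n-1 is the sign, $d_0$ or $q_0$ on the slide) and places the correction bits at the weights of the standard Baugh-Wooley formulation: the two sign bits at $2^{n-1}$, their complements at $2^{2n-2}$, and a constant 1 at $2^{2n-1}$. Those weights are an assumption about where the slide places them; the exhaustive test confirms that this arrangement yields the correct 2's complement product. The function name is hypothetical.

```python
def baugh_wooley_multiply(a: int, b: int, n: int) -> int:
    """Sum the (original) Baugh-Wooley partial product array for n-bit 2's complement a, b.

    Every term in the array has a positive weight; the result is read as a 2n-bit
    2's complement number.
    """
    mask = (1 << n) - 1
    ab = [((a & mask) >> i) & 1 for i in range(n)]
    bb = [((b & mask) >> i) & 1 for i in range(n)]
    s = n - 1                                             # index of the sign bits
    total = (ab[s] * bb[s]) << (2 * n - 2)                # sign x sign term
    for i in range(s):
        for j in range(s):
            total += (ab[i] & bb[j]) << (i + j)           # ordinary magnitude terms
    for j in range(s):
        total += (ab[s] & (1 - bb[j])) << (j + s)         # sign of a times complemented b bits
    for i in range(s):
        total += ((1 - ab[i]) & bb[s]) << (i + s)         # complemented a bits times sign of b
    total += (ab[s] + bb[s]) << s                         # correction bits at weight 2^(n-1)
    total += ((1 - ab[s]) + (1 - bb[s])) << (2 * n - 2)   # complemented signs at 2^(2n-2)
    total += 1 << (2 * n - 1)                             # the constant 1
    total &= (1 << (2 * n)) - 1                           # keep the 2n-bit product
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


if __name__ == "__main__":
    print(baugh_wooley_multiply(-5, 3, 4))   # -15
```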

Partial Tree Multipliers
- If a full tree is too expensive, feed the partial products through a smaller tree h at a time, over several passes
[Figure: an (h+2)-input reduction tree takes h new partial products plus the fed-back sum and carry (the upper part of the cumulative PP, in stored-carry form); each pass, an h-bit adder produces the next bits of the lower part of the cumulative PP.]

Multiply Operation
[Figure: the partial product array of the multiplicand (D) and multiplier (Q), processed in groups of h = 4 rows, summing to the double-precision product (P = Q × D).]

Radix 16 Partial Tree Multiply
[Figure: four ‘ier bits per pass select among D, 2D, 4D, and 8D; a CSA tree adds them to the fed-back sum and carry, shifted 4 bits per pass, and a 4-bit RCA sends the retired low-order bits to the lower half of the PP.]

Pipelined Tree Multiplier
[Figure: the full tree multiplier (multiple-forming circuits driven by D and the ‘ier Q, partial product reduction tree, fast CPA producing P) with pipeline latches inserted in the tree.]

Pipelined Partial Tree Multipliers
- Feed the sum and carry back into the middle of the (h+2)-input reduction tree
[Figure: an h-input reduction tree feeds a latched CSA that also receives the fed-back sum and carry (the upper part of the cumulative PP, in stored-carry form); an h-bit adder produces the lower part of the cumulative product.]

Twin Beat Multiplier
[Figure: two pipelined radix-8 recoder/selector stages, each with D and precomputed 3D inputs and a CSA with sum and carry feedback.]

Key References
- Baugh, Wooley, A two’s complement parallel array multiplication algorithm, IEEE Trans. Computers, 22.
- Booth, A signed binary multiplication technique, Quarterly Journal Mechanics and Applied Math, 4(2), June.
- Ciminiera, Montuschi, Carry-save multiplication schemes without final addition, IEEE Trans. Computers, 45(9).
- Dadda, Some schemes for parallel multipliers, Alta Frequenza, 34.
- Dadda, On parallel digital multipliers, Alta Frequenza, 45.
- Robertson, Two’s complement multiplication in binary parallel computers, IRE Trans. Electronic Computers, 4(3), Sept.
- Santoro, Horowitz, A pipelined 64x64b iterative array multiplier, Proc. of SSCC, Feb.
- Stenzel, Kubitz, A compact high-speed parallel multiplication scheme, IEEE Trans. on Computers, C-26.
- Swartzlander, Parallel counters, IEEE Trans. on Computers, 22(11).
- Wallace, A suggestion for a fast multiplier, IEEE Trans. on Electronic Computers, 13:14-17.
- Zuras, McAllister, Balanced delay trees and combinatorial division in VLSI, IEEE J. SSC, 21, 1986.