VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California

VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Prof. V.G. OklobdzijaVLSI Arithmetic2 Introduction Digital Computer Arithmetic belongs to Computer Architecture, however, it is also an aspect of logic design. The objective of Computer Arithmetic is to develop appropriate algorithms that are utilizing available hardware in the most efficient way. Ultimately, speed, power and chip area are the most often used measures, making a strong link between the algorithms and technology of implementation.

Prof. V.G. OklobdzijaVLSI Arithmetic3 Basic Operations Addition Multiplication Multiply-Add Division Evaluation of Functions Multi-Media

Addition of Binary Numbers

Prof. V.G. OklobdzijaVLSI Arithmetic5 Addition of Binary Numbers Full Adder. The full adder is the fundamental building block of most arithmetic circuits: The sum and carry outputs are described as: Full Adder C in C out sisi aiai bibi

Prof. V.G. OklobdzijaVLSI Arithmetic6 Addition of Binary Numbers Propagate Generate InputsOutputs cici aiai bibi sisi c i+1 00000 00110 01010 01101 10010 10101 11001 11111

Prof. V.G. OklobdzijaVLSI Arithmetic7 Full-Adder Implementation Full Adder operations is defined by equations: One-bit adder could be implemented as shown Carry-Propagate: and Carry-Generate g i

Prof. V.G. OklobdzijaVLSI Arithmetic8 High-Speed Addition One-bit adder could be implemented more efficiently because MUX is faster

Prof. V.G. OklobdzijaVLSI Arithmetic9 The Ripple-Carry Adder

Prof. V.G. OklobdzijaVLSI Arithmetic10 The Ripple-Carry Adder From Rabaey

Prof. V.G. OklobdzijaVLSI Arithmetic11 Inversion Property From Rabaey

Prof. V.G. OklobdzijaVLSI Arithmetic12 Minimize Critical Path by Reducing Inverting Stages From Rabaey

Prof. V.G. OklobdzijaVLSI Arithmetic13 Ripple Carry Adder Carry-Chain of an RCA implemented using multiplexer from the standard cell library: Critical Path Oklobdzija, ISCAS’88

Prof. V.G. OklobdzijaVLSI Arithmetic14 Manchester Carry-Chain Realization of the Carry Path Simple and very popular scheme for implementation of carry signal path

Prof. V.G. OklobdzijaVLSI Arithmetic15 Original Design T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Prof. V.G. OklobdzijaVLSI Arithmetic16 Manchester Carry Chain (CMOS) Kilburn, et al, IEE Proc, 1959. Implement P with pass-transistors Implement G with pull-up, kill (delete) with pull-down Use dynamic logic to reduce the complexity and speed up

Prof. V.G. OklobdzijaVLSI Arithmetic17 Pass-Transistor Realization in DPL

Prof. V.G. OklobdzijaVLSI Arithmetic18 Carry-Skip Adder MacSorley, Proc IRE 1/61 Lehman, Burla, IRE Trans on Comp, 12/61

Prof. V.G. OklobdzijaVLSI Arithmetic19 Carry-Skip Adder Bypass From Rabaey

Prof. V.G. OklobdzijaVLSI Arithmetic20 Carry-Skip Adder: N-bits, k-bits/group, r=N/k groups

Prof. V.G. OklobdzijaVLSI Arithmetic21 Carry-Skip Adder k

Prof. V.G. OklobdzijaVLSI Arithmetic22 Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Prof. V.G. OklobdzijaVLSI Arithmetic23 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Prof. V.G. OklobdzijaVLSI Arithmetic24 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) 1 1 3 3 4 4 5 5 6  =9 Any-point-to-any-point delay = 9  as compared to 12  for CSKA

Prof. V.G. OklobdzijaVLSI Arithmetic25 Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Prof. V.G. OklobdzijaVLSI Arithmetic26 Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Delay model:

Prof. V.G. OklobdzijaVLSI Arithmetic27 Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Variable Group Length Oklobdzija, Barnes, Arith’85

Prof. V.G. OklobdzijaVLSI Arithmetic28 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Variable Block Lengths No closed form solution for delay It is a dynamic programming problem

Prof. V.G. OklobdzijaVLSI Arithmetic29 Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Prof. V.G. OklobdzijaVLSI Arithmetic30 Delay Comparison: Variable Block Adder VBA- Multi-Level CLA VBA

Prof. V.G. OklobdzijaVLSI Arithmetic31 Fan-Out Dependency

Prof. V.G. OklobdzijaVLSI Arithmetic32 Fan-In Dependency

Prof. V.G. OklobdzijaVLSI Arithmetic33 Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Prof. V.G. OklobdzijaVLSI Arithmetic34

Prof. V.G. OklobdzijaVLSI Arithmetic35 Carry-Lookahead Adder (Weinberger and Smith) A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.

Prof. V.G. OklobdzijaVLSI Arithmetic36 Carry-Lookahead Adder (Weinberger and Smith)

Prof. V.G. OklobdzijaVLSI Arithmetic37 Carry-Lookahead Adder One gate delay  to calculate p, g One  to calculate P and two for G Three gate delays To calculate C 4(j+1) Compare that to 8  in RCA !

Prof. V.G. OklobdzijaVLSI Arithmetic38 Carry-Lookahead Adder (Weinberger and Smith) Additional two gate delays C 16 will take a total of 5  vs. 32  for RCA !

Prof. V.G. OklobdzijaVLSI Arithmetic39 32-bit Carry Lookahead Adder

Prof. V.G. OklobdzijaVLSI Arithmetic40 Carry-Lookahead Adder (Weinberger and Smith: original derivation )

Prof. V.G. OklobdzijaVLSI Arithmetic41 Carry-Lookahead Adder (Weinberger and Smith: original derivation )

Prof. V.G. OklobdzijaVLSI Arithmetic42 Carry-Lookahead Adder (Weinberger and Smith) please notice the similarity with Parallel-Prefix Adders !

Prof. V.G. OklobdzijaVLSI Arithmetic43 Carry-Lookahead Adder (Weinberger and Smith) please notice the similarity with Parallel-Prefix Adders !

Delay Optimized CLA B. Lee, V. G. Oklobdzija Journal of VLSI Signal Processing, Vol.3, No.4, October 1991

Prof. V.G. OklobdzijaVLSI Arithmetic45 Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) Fixed groups and levels (b.) variable-sized groups, fixed levels (c.) variable-sized groups and fixed levels (d.) variable-sized groups and levels

Prof. V.G. OklobdzijaVLSI Arithmetic46 Two-Levels of Logic Implementation of the Carry Block

Prof. V.G. OklobdzijaVLSI Arithmetic47 Two-Levels of Logic Implementation of the Carry-Lookahead Block

Prof. V.G. OklobdzijaVLSI Arithmetic48 Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

Prof. V.G. OklobdzijaVLSI Arithmetic49 Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)

Prof. V.G. OklobdzijaVLSI Arithmetic50 Delay Optimized CLA: Lee-Oklobdzija ‘91 Delay: Two-level BCLA Delay: Three-level BCLA

Prof. V.G. OklobdzijaVLSI Arithmetic51 Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) 2-level BCLA  =8.5nS (b.) 3-level BCLA  =8.9nS

Motorola: CLA Implementation Example A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

Prof. V.G. OklobdzijaVLSI Arithmetic53 Critical path in Motorola's 64-bit CLA

Prof. V.G. OklobdzijaVLSI Arithmetic54 Motorola's 64-bit CLA conventional PG Block

Prof. V.G. OklobdzijaVLSI Arithmetic55 Motorola's 64-bit CLA Modified PG Block Intermediate propagate signals P i:0 are generated to speed-up C 3

Ling’s Adder Huey Ling, “High-Speed Binary Adder” IBM Journal of Research and Development, Vol.5, No.3, 1981.

Prof. V.G. OklobdzijaVLSI Arithmetic57 Ling Adder Variation of CLA: Ling, IBM J. Res. Dev, 5/81 Ling’s equations:

Prof. V.G. OklobdzijaVLSI Arithmetic58 Ling Adder Ling’s equation Doran, Trans on Comp 9/88 Propagates information on two bits

Prof. V.G. OklobdzijaVLSI Arithmetic59 Ling Adder Conventional: Ling:

Prof. V.G. OklobdzijaVLSI Arithmetic60 S. Naffziger, ISSCC’96

Prof. V.G. OklobdzijaVLSI Arithmetic71 Results: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘ 96 0.5u Technology Speed: 0.930 nS Nominal process, 80C, V=3.3V

ConditionalSum Adder J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.

Prof. V.G. OklobdzijaVLSI Arithmetic73 Conditional Sum Adder

Prof. V.G. OklobdzijaVLSI Arithmetic74 ConditionalSum Adder

Carry-Select Adder O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34

Prof. V.G. OklobdzijaVLSI Arithmetic76 Carry-Select Adder O.J. Bedrij, IBM Poughkeepsie, 1962

Prof. V.G. OklobdzijaVLSI Arithmetic77 Carry-Select Adder Addition under assumption of C in =0 and C in =1.

Prof. V.G. OklobdzijaVLSI Arithmetic78 Carry Select Adder: combining two 32-b VBAs in select mode Delay =  VBA32 +  MUX

Addition Under Non-equal Signal Arrival Profile Assumption P. Stelling, V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol.14, No.3, December 1996

Prof. V.G. OklobdzijaVLSI Arithmetic80 Signal Arrival Profile form the Parallel Multiplier Partial-Product Recuction Tree

Prof. V.G. OklobdzijaVLSI Arithmetic81 Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June, 1995

Prof. V.G. OklobdzijaVLSI Arithmetic82 Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June, 1995

Performing Multiply-Add Operation in the Multiply Time P. Stelling, V. G. Oklobdzija, " Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5 - 9, 1997.

Prof. V.G. OklobdzijaVLSI Arithmetic93 Final Adder: Implementation

Recurrence Solver Based Adders Koggie and Stone, IEEE Trans on Computers, August 1973 Bilgory and Gajski, 18 th DAC, 1981 Brent and Kung, IEEE Trans on Computers, March 1982

Prof. V.G. OklobdzijaVLSI Arithmetic98 Recurrence Solver Based Adders 1973, Koggie and Stone published a general recurrence scheme for parallel computation 1979, Brent and Kung published Tech. Report on regular layout for parallel adders 1980, Guibas and Vuillemin, developed a layout scheme based on recurrence equation for addition 1980, Ladner and Fisher published “parallel prefix computation”, Jo of ACM 1981, Bilgory and Gajski published a paper on recurrence structures for automatic cell generation

Prof. V.G. OklobdzijaVLSI Arithmetic99 Recurrence Solver Based Adders They are based on recurrence equation for P,G (what is new there since Weinberger ?!!): Or:and

Prof. V.G. OklobdzijaVLSI Arithmetic100 Recurrence Solver Based Adders

Prof. V.G. OklobdzijaVLSI Arithmetic101 Carry-Lookahead Adder (Weinberger and Smith) Just to remind you ! please notice the similarity with Parallel-Prefix Adders !

Multiplexer Based Adder Farooqui and Oklobdzija 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999

Prof. V.G. OklobdzijaVLSI Arithmetic103 Multiplexer Based Adder Based on the realization that MUX circuit is faster than a logic gate due to its transmission gate implementation. Based on Carry-Lookahead method (W-S), or recurrence solver.

Prof. V.G. OklobdzijaVLSI Arithmetic104 Multiplexer Based Adder A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.

Prof. V.G. OklobdzijaVLSI Arithmetic107 Multiplexer Based Adder A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999. Results in a very fast structure 7-MUX delays for a 64-b adder Delay using standard cell 0.25u, 2.5V, 25 o C : Adder Size (bits) Delay (pS) 8625 16665 32710 64903

Prof. V.G. OklobdzijaVLSI Arithmetic108 DEC "Alpha" 21064 Adder Combination: –8-bit tapered pre-discharged Manchester Carry Chains, with C in = 0 and C in = 1 –32-bit LSB Carry Lookahead Adder –32-bit MSB Conditional-Sum Adder –Carry-Select on most significant 32-bits –Latches in the middle: pipelined addition

Prof. V.G. OklobdzijaVLSI Arithmetic109 DEC "Alpha" 21064 Adder

Prof. V.G. OklobdzijaVLSI Arithmetic110 DEC "Alpha" 21064 Adder: Results The first 200MHz processor Built using 0.75u technology V=3.3V, 30W Pipelined (two-latches) allowing 5nS throughput and 10nS latency

Conclusion VLSI Implementation of Addition

Prof. V.G. OklobdzijaVLSI Arithmetic112 Conclusion: VLSI Implementation of Addition Currently, implementation parameters are not reflected in algorithms used for development Layout and wire delays effects are largely neglected and this is becoming intolerable in the next generation of technology Transistor sizing has a large effect which can out weight the algorithm There is a great disconnect between algorithm and implementation New rules and measures of goodness are needed

Multiplication Parallel Multiplier Implementation

Prof. V.G. OklobdzijaVLSI Arithmetic114 Multiplication Algorithm: for j=0,....,n-1 initially p(n)=XY after n steps

Prof. V.G. OklobdzijaVLSI Arithmetic115 Parallel Multipliers

Prof. V.G. OklobdzijaVLSI Arithmetic116 4:2 Compressor

Prof. V.G. OklobdzijaVLSI Arithmetic117 Re-designed 4:2 Compressor with 3 XOR Delay C in I1 I2 I3 I4 0 1 S C C out

A Method for Generation of Fast Parallel Multipliers by Vojin G. Oklobdzija David Villeger Simon S. Liu Electrical and Computer Engineering University of California Davis

Idea !!!!!

Prof. V.G. OklobdzijaVLSI Arithmetic122 Three-Dimensional optimization Method: TDM (Oklobdzija, Villeger, Liu, 1996)

Method

Computer Tools

Prof. V.G. OklobdzijaVLSI Arithmetic130 Algorithm for Automatic Generation of Partial Product Array. Initialize: Form 2N-1 lists Li ( i = 0, 2N-2 ) each consisting of pi elements where: p i = i+1 for i £ N-1 and p i = 2N-1-i for i N An element of a list Li ( j = 0,...,pi-1 ) is a pair: i where: nj : is a unique node identifying name  j : is a delay associated with that node representing a delay of a signal arriving to the node nj with respect to some reference point. For i = 0,1 and 2N-2: connect nodes from the corresponding lists Li directly to the CPA.

Prof. V.G. OklobdzijaVLSI Arithmetic131 For i=2 to i=2N-3 {Partial Product Array Generation} Begin For if length of Li is even Then Begin If sort the elements of Li in ascending order by the values of delay  j connect an HA to the first 2 elements of Li starting with the slowest input Ds =max {  A+  A-s,  B+  B-s} Dc =max {  A+  A-c,  B+  B-c} remove 2 elements from Li insert the pair into Li insert the pair into Li+1 decrement the length of Li increment the length of Li+1 End If;

while length of Li > 3 Begin While sort the elements of Li in ascending order by the values of delay  j connect an FA to the first 3 elements of Li starting with the slowest input of the FA: Ds =max {  A+  A-s,  B+  B-s,  Ci+  Ci-s} Dc = max {  A+  A-c,  B+  B-c,  Ci+  Ci-c} remove 3 elements from Li insert the pair into Li insert the pair into Li+1 subtract 2 from the length of Li increment the length of Li+1 End While; sort the elements of Li connect an FA to the last 3 nodes of Li connect the S and C to the bit i and i+1 of the CPA End For; End Method;

Competing Approaches

Prof. V.G. OklobdzijaVLSI Arithmetic138 Organization of Hitachi's DPL multiplier

Prof. V.G. OklobdzijaVLSI Arithmetic139 Hitachi's 4:2 compressor structure

Prof. V.G. OklobdzijaVLSI Arithmetic140 DPL multiplexer circuit

RECOMENDATIONS

Prof. V.G. OklobdzijaVLSI Arithmetic142 Conclusion 1.The key to improving multiplier speed was in optimizing interconnections, not the compressor circuit (as it was believed for so long). 2.With the increase in wire delay it is important to make a connection between layout topology and algorithm for optimal interconnection of the PPRT. 3.Using one of the “fast adders” (CLA) as a final adder was acutally counterproductive. A simple final adder, but optimized for the signal arrival profile yields better results with less hardware. 4.It is possible to further optimize the PPRT and FA so that Multiply-Add operation (fused) can be performed in multiply time. 5.For the larger size multipliers / adders (as used in cryptography) the optimization procedures (described) yields even better results. See: http://www.ece.ucdavis.edu/acsel/Publications.html

Prof. V.G. OklobdzijaVLSI Arithmetic143 Read This ! 1.E. Swartzlander, "Computer Arithmetic". Vol. 1&2, IEEE Computer Society Press, 1990. 2.K. Hwang, "Computer Arithmetic : Principles, Architecture and Design", John Wiley and Sons, 1979. 3.M. Ercegovac, “Digital Systems and Hardware/Firmware Algorithms”, Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons, 1985. 4.A. Chandrakasan, W. Bowhill, F Fox, Editors, "Design of High Performance Microprocessors Circuits", IEEE Press, July 2000. 5.V. G. Oklobdzija, “High-Performance System Design: Circuits and Logic”, IEEE Press, July 1999. Also: http://www.ece.ucdavis.edu/acsel/Publications.html

Prof. V.G. OklobdzijaVLSI Arithmetic144 THE END

Hollywood

VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California

Similar presentations

Presentation on theme: "VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California

Similar presentations

Presentation on theme: "VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California"— Presentation transcript:

Similar presentations

About project

Feedback