VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California

Introduction
Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design. The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way. Speed, power and chip area are the most commonly used measures, which creates a strong link between the algorithms and the implementation technology.

Basic Operations
- Addition
- Multiplication
- Multiply-Add
- Division
- Evaluation of Functions
- Multi-Media

Addition of Binary Numbers

Addition of Binary Numbers: Full Adder
The full adder is the fundamental building block of most arithmetic circuits. It takes the operand bits a_i and b_i and the carry-in c_i (C_in), and produces the sum s_i and the carry-out c_(i+1) (C_out). The sum and carry outputs are described as:
s_i = a_i XOR b_i XOR c_i
c_(i+1) = a_i b_i + c_i (a_i + b_i)

Addition of Binary Numbers: Truth Table
[Truth table: inputs c_i, a_i, b_i; outputs s_i, c_(i+1); rows marked as carry propagate or carry generate.]

Full-Adder Implementation
The full-adder operation is defined by the same equations, written in terms of the carry-propagate signal p_i = a_i XOR b_i and the carry-generate signal g_i = a_i b_i:
s_i = p_i XOR c_i
c_(i+1) = g_i + p_i c_i
[Figure: one-bit adder implementation.]
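The propagate/generate form of the full adder can be checked with a few lines of Python. This is only a behavioral sketch on 0/1 integers; the function name `full_adder` and the "kill" label for rows where neither signal is active are illustrative choices, not taken from the slides.

```python
def full_adder(a, b, c_in):
    """One-bit full adder on 0/1 integers; returns (sum, carry_out)."""
    p = a ^ b                   # carry-propagate: pass the incoming carry along
    g = a & b                   # carry-generate: produce a carry regardless of c_in
    s = p ^ c_in
    c_out = g | (p & c_in)
    return s, c_out


# Print the truth table and mark how each row treats the carry.
for c_in in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            s, c_out = full_adder(a, b, c_in)
            kind = "generate" if a & b else ("propagate" if a ^ b else "kill")
            print(c_in, a, b, "->", s, c_out, kind)
```

Each printed row satisfies a + b + c_in = s + 2 * c_out, which is the defining property of the cell.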

High-Speed Addition
A one-bit adder can be implemented more efficiently with a multiplexer in the carry path, because a MUX is faster than an equivalent logic gate.

The Ripple-Carry Adder
[Figures: ripple-carry adder. From Rabaey.]

Inversion Property
[Figure. From Rabaey.]

Minimize Critical Path by Reducing Inverting Stages
[Figure. From Rabaey.]

Ripple-Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library. (Oklobdzija, ISCAS'88)
[Figure: carry chain with the critical path marked.]
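A behavioral sketch of a ripple-carry adder whose carry chain is a row of 2:1 multiplexers, assuming the usual arrangement in which the propagate signal is the select input. The helper names `mux` and `ripple_carry_add` are hypothetical, and the model says nothing about the actual cell-level circuit.

```python
def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def ripple_carry_add(x, y, n, c_in=0):
    """Add two n-bit integers bit by bit; returns (n-bit sum, carry_out)."""
    carry = c_in
    total = 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        p = a ^ b
        total |= (p ^ carry) << i
        # p = 1: the incoming carry is passed on.
        # p = 0: a == b, so the carry-out equals a (generate if 1, kill if 0).
        carry = mux(p, a, carry)
    return total, carry


print(ripple_carry_add(0b1011, 0b0110, 4))
```

The carry visits every bit position in turn, which is why the carry chain is the critical path of this adder.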

Manchester Carry Chain: Realization of the Carry Path
A simple and very popular scheme for implementing the carry signal path.

Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Manchester Carry Chain (CMOS)
Kilburn et al., IEE Proc.
- Implement P with pass transistors.
- Implement G with a pull-up, and kill (delete) with a pull-down.
- Use dynamic logic to reduce the complexity and speed up the chain.

Pass-Transistor Realization in DPL

Carry-Skip Adder
MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE Trans. on Comp., 12/61

Carry-Skip Adder: Bypass
[Figure. From Rabaey.]

Carry-Skip Adder: N bits, k bits/group, r = N/k groups

Carry-Skip Adder k
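A behavioral sketch of a carry-skip adder with N bits split into r = N/k groups of k bits. It models only the logic: when every bit of a group propagates, the group carry-out is taken from the group carry-in through the bypass. The function name `carry_skip_add` and the returned skip counter are illustrative additions.

```python
def carry_skip_add(x, y, n, k, c_in=0):
    """Carry-skip addition of two n-bit integers with k-bit groups.

    Returns (n-bit sum, carry_out, number of groups that were bypassed).
    """
    assert n % k == 0, "n must be a multiple of the group size k"
    total = 0
    carry = c_in
    skipped = 0
    for base in range(0, n, k):
        group_in = carry
        group_p = 1                       # AND of all propagate bits in the group
        for i in range(base, base + k):
            a = (x >> i) & 1
            b = (y >> i) & 1
            p = a ^ b
            group_p &= p
            total |= (p ^ carry) << i
            carry = (a & b) | (p & carry)  # ripple inside the group
        if group_p:
            carry = group_in              # bypass: carry-in goes straight to carry-out
            skipped += 1
    return total, carry, skipped


print(carry_skip_add(0b0101, 0b1010, 4, 4, c_in=1))
```

In hardware the bypass shortens the path a carry must travel through fully propagating groups; in this model it changes nothing functionally, which the exhaustive check against ordinary addition confirms.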

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry Chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
The any-point-to-any-point delay is 9Δ, compared with 12Δ for the carry-skip adder (CSKA).

Carry-Chain Block Size Determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Delay model: [equation missing]

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable group length. Oklobdzija, Barnes, Arith'85

Carry Chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
- Variable block lengths.
- There is no closed-form solution for the delay.
- It is a dynamic programming problem.
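To see why block sizes matter, the worst-case carry path can be evaluated for a given list of block sizes. The delay model below is an assumption made for illustration and is not the model from the slides: one unit per bit rippled inside a block and one unit per block skipped. The function name and the example block sizes are hypothetical.

```python
def worst_case_delay(block_sizes):
    """Worst-case carry delay, in unit delays, under the assumed model.

    A carry generated at the bottom of block i ripples through block i,
    skips every block strictly between i and j, then ripples through block j.
    """
    worst = max(block_sizes)              # carry created and absorbed in one block
    r = len(block_sizes)
    for i in range(r):
        for j in range(i + 1, r):
            delay = block_sizes[i] + (j - i - 1) + block_sizes[j]
            worst = max(worst, delay)
    return worst


uniform = [4] * 8                         # 32 bits in equal 4-bit blocks
variable = [2, 3, 4, 5, 6, 5, 4, 3]       # 32 bits, small blocks at both ends

print(worst_case_delay(uniform), worst_case_delay(variable))
```

Under this simplified model the equal-size arrangement gives 14 units and the tapered one gives 11, which shows the direction of the effect only; the figures of 12Δ and 9Δ quoted above come from the authors' own delay model.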

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Comparison: Variable Block Adder
[Figure: VBA versus multi-level CLA.]

Fan-Out Dependency

Fan-In Dependency

Carry-Lookahead Adder (Weinberger and Smith)
A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition", National Bureau of Standards, Circ. 591, p. 3-12, 1958.

Carry-Lookahead Adder
- One gate delay Δ to calculate p and g.
- One Δ to calculate P and two for G.
- Three gate delays to calculate C_4(j+1).
- Compare that with 8Δ in an RCA.

Carry-Lookahead Adder (Weinberger and Smith)
With two additional gate delays, C_16 takes a total of 5Δ, versus 32Δ for an RCA.
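A behavioral sketch of a two-level carry-lookahead adder: 4-bit groups produce group P and G, and a second lookahead level produces C_4, C_8, C_12 and C_16 from them. Every carry is written as a flattened sum of products of p, g and the carry-in rather than from the previous carry. The function names are hypothetical, and the model does not count gate delays.

```python
def lookahead_carries(p, g, c_in):
    """Return [c_1, ..., c_n], each as a flattened sum of products.

    c_(k+1) = g_k + p_k g_(k-1) + ... + p_k ... p_1 g_0 + p_k ... p_0 c_in
    """
    carries = []
    for k in range(len(p)):
        c = 0
        prefix = 1                        # running product p_k ... p_(m+1)
        for m in range(k, -1, -1):
            c |= prefix & g[m]
            prefix &= p[m]
        c |= prefix & c_in
        carries.append(c)
    return carries


def cla16(x, y, c_in=0):
    """16-bit two-level CLA; returns (16-bit sum, carry_out)."""
    a = [(x >> i) & 1 for i in range(16)]
    b = [(y >> i) & 1 for i in range(16)]
    p = [ai ^ bi for ai, bi in zip(a, b)]
    g = [ai & bi for ai, bi in zip(a, b)]
    group_p, group_g = [], []
    for j in range(0, 16, 4):
        pj, gj = p[j:j + 4], g[j:j + 4]
        group_p.append(pj[0] & pj[1] & pj[2] & pj[3])
        group_g.append(lookahead_carries(pj, gj, 0)[-1])
    # Second level: carries into each group, plus the final carry-out.
    group_c = [c_in] + lookahead_carries(group_p, group_g, c_in)
    total = 0
    for k, j in enumerate(range(0, 16, 4)):
        bit_c = [group_c[k]] + lookahead_carries(p[j:j + 4], g[j:j + 4], group_c[k])[:3]
        for m in range(4):
            total |= (p[j + m] ^ bit_c[m]) << (j + m)
    return total, group_c[4]


print(cla16(0x1234, 0x0FF0))
```

Because group P and G do not depend on any carry, they can be formed in parallel for all groups, which is what lets C_16 arrive after a fixed number of logic levels instead of after 16 ripple stages.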

32-bit Carry-Lookahead Adder

Carry-Lookahead Adder (Weinberger and Smith: original derivation)

Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with parallel-prefix adders!

Delay Optimized CLA B. Lee, V. G. Oklobdzija Journal of VLSI Signal Processing, Vol.3, No.4, October 1991

Delay Optimized CLA: Lee-Oklobdzija '91
(a) Fixed groups and levels
(b) Variable-sized groups, fixed levels
(c) Variable-sized groups and fixed levels
(d) Variable-sized groups and levels

Two-Levels of Logic Implementation of the Carry Block

Two-Levels of Logic Implementation of the Carry-Lookahead Block

Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

Three-Levels of Logic Implementation of the Carry-Lookahead Block (restricted fan-in)

Delay Optimized CLA: Lee-Oklobdzija '91
Delay: two-level BCLA
Delay: three-level BCLA

Delay Optimized CLA: Lee-Oklobdzija '91
(a) 2-level BCLA: Δ = 8.5 ns
(b) 3-level BCLA: Δ = 8.9 ns

Motorola: CLA Implementation Example A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

Critical Path in Motorola's 64-bit CLA

Motorola's 64-bit CLA: Conventional PG Block

Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals P_(i:0) are generated to speed up C_3.

Ling's Adder
Huey Ling, "High-Speed Binary Adder", IBM Journal of Research and Development, Vol. 25, No. 3, 1981.

Ling Adder
Variation of CLA: Ling, IBM J. Res. Dev., 5/81
Ling's equations: [equations missing]

Ling Adder
Ling's equation (Doran, Trans. on Comp., 9/88) propagates information on two bits.

Ling Adder
Conventional: [equation missing]
Ling: [equation missing]
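The commonly stated form of Ling's recurrence can be checked numerically. With g_i = a_i b_i and t_i = a_i + b_i, the pseudo-carry H_(i+1) = c_(i+1) + c_i satisfies H_(i+1) = g_i + t_(i-1) H_i, and the true carry is recovered as c_(i+1) = t_i H_(i+1). This is a sketch of that textbook formulation, not a reconstruction of the equations on the slides; the convention H_0 = c_0 and the function names are assumptions.

```python
def carries_conventional(a, b, c0=0):
    """Carries c_0..c_n from c_(i+1) = g_i + t_i c_i (bit lists, LSB first)."""
    c = [c0]
    for ai, bi in zip(a, b):
        c.append((ai & bi) | ((ai | bi) & c[-1]))
    return c


def carries_ling(a, b, c0=0):
    """Pseudo-carries H_0..H_n and the carries recovered from them."""
    g = [ai & bi for ai, bi in zip(a, b)]
    t = [ai | bi for ai, bi in zip(a, b)]
    h = [c0]                              # assumed convention: H_0 = c_0
    for i in range(len(a)):
        carry_in = h[0] if i == 0 else (t[i - 1] & h[i])
        h.append(g[i] | carry_in)         # H_(i+1) = g_i + t_(i-1) H_i
    c = [c0] + [t[i] & h[i + 1] for i in range(len(a))]
    return h, c


a_bits = [1, 0, 1, 1]                     # LSB first
b_bits = [1, 1, 0, 1]
print(carries_conventional(a_bits, b_bits))
print(carries_ling(a_bits, b_bits))
```

For four bits with c_0 = 0, the conventional carry expands to c_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0, while the pseudo-carry is H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0, which has fewer literals per term.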

S. Naffziger, ISSCC'96
[Figures.]

Results: S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC '96
- Technology: [...] µ
- Speed: [...] ns
- Nominal process, 80 °C, V = 3.3 V

Conditional-Sum Adder
J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, 1960.

Conditional-Sum Adder
[Figures.]

Carry-Select Adder
O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, June 1962.

Carry-Select Adder
O. J. Bedrij, IBM Poughkeepsie, 1962

Carry-Select Adder
Addition is performed under both assumptions, C_in = 0 and C_in = 1, and the correct result is selected once the actual carry is known.

Carry-Select Adder: Combining Two 32-b VBAs in Select Mode
Delay = Δ_VBA32 + Δ_MUX
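A behavioral sketch of carry-select addition on a 64-bit word split into two 32-bit halves. The upper half is added twice, once for each possible carry-in, and the carry out of the lower half selects the result. The helper `add_block` stands in for whatever 32-bit adder is used (a VBA on the slide); both function names are hypothetical.

```python
def add_block(x, y, n, c_in):
    """Stand-in for an n-bit block adder; returns (n-bit sum, carry_out)."""
    total = x + y + c_in
    return total & ((1 << n) - 1), total >> n


def carry_select_add(x, y, n=64, c_in=0):
    """Carry-select addition with one select stage at the midpoint."""
    half = n // 2
    mask = (1 << half) - 1
    lo_sum, lo_carry = add_block(x & mask, y & mask, half, c_in)
    hi_x, hi_y = x >> half, y >> half
    hi_if_0 = add_block(hi_x, hi_y, n - half, 0)   # computed in parallel in hardware
    hi_if_1 = add_block(hi_x, hi_y, n - half, 1)
    hi_sum, c_out = hi_if_1 if lo_carry else hi_if_0   # the select multiplexer
    return (hi_sum << half) | lo_sum, c_out


print(carry_select_add(0xFFFFFFFF, 1))
```

Since all three block additions can run at the same time, the total delay is one block-adder delay plus one multiplexer delay, matching the expression above.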

Addition Under Non-equal Signal Arrival Profile Assumption P. Stelling, V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol.14, No.3, December 1996

Signal Arrival Profile from the Parallel Multiplier Partial-Product Reduction Tree

Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June 1995
[Figures.]

Performing Multiply-Add Operation in the Multiply Time
P. Stelling, V. G. Oklobdzija, "Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5-9, 1997.

Final Adder: Implementation
[Figures.]

Recurrence Solver Based Adders
Kogge and Stone, IEEE Trans. on Computers, August 1973
Bilgory and Gajski, 18th DAC, 1981
Brent and Kung, IEEE Trans. on Computers, March 1982

Recurrence Solver Based Adders
- 1973: Kogge and Stone published a general recurrence scheme for parallel computation.
- 1979: Brent and Kung published a technical report on a regular layout for parallel adders.
- 1980: Guibas and Vuillemin developed a layout scheme based on the recurrence equation for addition.
- 1980: Ladner and Fischer published "Parallel Prefix Computation", Journal of the ACM.
- 1981: Bilgory and Gajski published a paper on recurrence structures for automatic cell generation.

Recurrence Solver Based Adders
They are based on the recurrence equation for P and G (what is new there since Weinberger?).
[Equations: recurrence for P and G.]

Recurrence Solver Based Adders
[Figure.]

Carry-Lookahead Adder (Weinberger and Smith)
Just to remind you: please notice the similarity with parallel-prefix adders!
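A behavioral sketch of a parallel-prefix adder in the Kogge-Stone arrangement. It uses the standard group recurrence G = G_hi + P_hi G_lo and P = P_hi P_lo, with the combining distance doubling at every level. That this matches the exact form shown on the slides is an assumption, and the function name is hypothetical.

```python
def kogge_stone_add(x, y, n):
    """Parallel-prefix addition of two n-bit integers (carry-in = 0).

    Returns (n-bit sum, carry_out).
    """
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    half_sum = p[:]                       # a_i XOR b_i, needed for the final sum
    d = 1
    while d < n:                          # ceil(log2 n) prefix levels
        g_next, p_next = g[:], p[:]
        for i in range(d, n):
            # Combine the span ending at i with the span ending at i - d.
            g_next[i] = g[i] | (p[i] & g[i - d])
            p_next[i] = p[i] & p[i - d]
        g, p = g_next, p_next
        d *= 2
    # g[i] is now G_(i:0), the carry out of bit i.
    carries = [0] + g[:-1]
    total = sum((half_sum[i] ^ carries[i]) << i for i in range(n))
    return total, g[-1]


print(kogge_stone_add(200, 100, 8))
```

Each level applies the same combining step used for group P and G in carry-lookahead, which is the similarity the slide points out.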

Multiplexer Based Adder Farooqui and Oklobdzija 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999

Multiplexer Based Adder
- Based on the realization that a MUX circuit is faster than a logic gate, due to its transmission-gate implementation.
- Based on the carry-lookahead method (Weinberger-Smith), or a recurrence solver.

Multiplexer Based Adder
A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int'l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
[Figures.]

Multiplexer Based Adder: Results
- Results in a very fast structure: 7 MUX delays for a 64-b adder.
- Delay using standard cells at 0.25 µ, 2.5 V, 25 °C: [Table: adder size (bits) versus delay (ps); values missing.]

DEC "Alpha" Adder
Combination:
- 8-bit tapered pre-discharged Manchester carry chains, with C_in = 0 and C_in = 1
- 32-bit LSB carry-lookahead adder
- 32-bit MSB conditional-sum adder
- Carry-select on the most significant 32 bits
- Latches in the middle: pipelined addition

DEC "Alpha" Adder
[Figure.]

DEC "Alpha" Adder: Results
- The first 200 MHz processor
- Built using 0.75 µ technology
- V = 3.3 V, 30 W
- Pipelined (two latches), allowing 5 ns throughput and 10 ns latency

Conclusion: VLSI Implementation of Addition
- Currently, implementation parameters are not reflected in the algorithms used for development.
- Layout and wire-delay effects are largely neglected, and this is becoming intolerable in the next generation of technology.
- Transistor sizing has a large effect, which can outweigh the choice of algorithm.
- There is a great disconnect between algorithm and implementation.
- New rules and measures of goodness are needed.

Multiplication Parallel Multiplier Implementation

Multiplication Algorithm
[Equation: recurrence for the partial product p(j), for j = 0, ..., n-1, with its initial value; p(n) = XY after n steps.]
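A sketch of one standard add-and-shift recurrence that reaches p(n) = XY after n steps. It assumes p(0) = 0 and p(j+1) = (p(j) + x_j * 2^n * Y) / 2, which is the usual right-shift formulation; whether the slide used exactly this form is an assumption, and the function name is hypothetical.

```python
def multiply_shift_add(x, y, n):
    """Multiply two n-bit unsigned integers with the add-and-shift recurrence."""
    p = 0                                  # assumed initial value p(0) = 0
    for j in range(n):
        x_j = (x >> j) & 1                 # multiplier bit examined in step j
        p = (p + x_j * (y << n)) >> 1      # add the aligned multiplicand, then shift right
    return p


print(multiply_shift_add(13, 11, 4))
```

The right shift is exact at every step, because before step n every term of the partial product still carries at least one factor of two.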

Parallel Multipliers

4:2 Compressor

Re-designed 4:2 Compressor with 3 XOR Delay
[Figure: inputs I1, I2, I3, I4 and C_in; multiplexers with 0/1 select; outputs S, C and C_out.]
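A behavioral sketch of a 4:2 compressor built from XOR gates and 2:1 multiplexers, with three XORs on the longest path to S. This is one common arrangement that fits the input and output names listed above; that it is the exact circuit on the slide is an assumption, and the helper names are hypothetical.

```python
def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def compressor_4_2(i1, i2, i3, i4, c_in):
    """4:2 compressor on 0/1 integers; returns (S, C, C_out).

    Invariant: i1 + i2 + i3 + i4 + c_in == S + 2 * (C + C_out).
    """
    x12 = i1 ^ i2
    x34 = i3 ^ i4
    x = x12 ^ x34
    s = x ^ c_in                 # third XOR on the critical path
    c_out = mux(x12, i1, i3)     # majority of (i1, i2, i3); independent of c_in
    c = mux(x, i4, c_in)         # majority of (i1^i2^i3, i4, c_in)
    return s, c, c_out


print(compressor_4_2(1, 0, 1, 0, 1))
```

Because C_out does not depend on C_in, compressors can be chained along a row without creating a ripple path between them.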

A Method for Generation of Fast Parallel Multipliers
Vojin G. Oklobdzija, David Villeger, Simon S. Liu
Electrical and Computer Engineering, University of California, Davis

Idea

Three-Dimensional Optimization Method: TDM (Oklobdzija, Villeger, Liu, 1996)

Method

Computer Tools

Algorithm for Automatic Generation of Partial Product Array

Initialize:
  Form 2N-1 lists L_i (i = 0, ..., 2N-2), each consisting of p_i elements, where
    p_i = i+1 for i ≤ N-1 and p_i = 2N-1-i for i ≥ N.
  An element j of a list L_i (j = 0, ..., p_i - 1) is a pair <n_j, δ_j>, where
    n_j is a unique node-identifying name, and
    δ_j is the delay associated with that node: the arrival time of the signal at node n_j with respect to some reference point.

For i = 0, 1 and 2N-2:
  connect the nodes from the corresponding lists L_i directly to the CPA.

For i = 2 to i = 2N-3 {Partial Product Array Generation}
Begin For
  If length of L_i is even Then
  Begin If
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an HA to the first 2 elements of L_i, starting with the slowest input
    D_s = max {δ_A + δ_(A-s), δ_B + δ_(B-s)}
    D_c = max {δ_A + δ_(A-c), δ_B + δ_(B-c)}
    remove 2 elements from L_i
    insert the pair <sum node, D_s> into L_i
    insert the pair <carry node, D_c> into L_(i+1)
    decrement the length of L_i
    increment the length of L_(i+1)
  End If;
  While length of L_i > 3
  Begin While
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an FA to the first 3 elements of L_i, starting with the slowest input of the FA
    D_s = max {δ_A + δ_(A-s), δ_B + δ_(B-s), δ_Ci + δ_(Ci-s)}
    D_c = max {δ_A + δ_(A-c), δ_B + δ_(B-c), δ_Ci + δ_(Ci-c)}
    remove 3 elements from L_i
    insert the pair <sum node, D_s> into L_i
    insert the pair <carry node, D_c> into L_(i+1)
    subtract 2 from the length of L_i
    increment the length of L_(i+1)
  End While;
  sort the elements of L_i
  connect an FA to the last 3 nodes of L_i
  connect the S and C to the bit i and i+1 of the CPA
End For;
End Method;
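A simplified Python sketch of the same delay-driven idea. It differs from the listing above in several labeled ways: it uses full adders only, it reduces every column until at most two signals remain for the CPA, and the full-adder delays are assumed constants in XOR-delay units. Each signal also carries its logic value so the result can be checked against the product. All names and delay values are hypothetical.

```python
# Assumed full-adder delays (hypothetical values, in XOR-delay units).
AB_TO_S, AB_TO_C = 2, 2      # from inputs A, B (the slow inputs)
CI_TO_S, CI_TO_C = 1, 1      # from input Ci (the fast input)


def reduce_partial_products(x, y, n):
    """Compress the n x n partial-product array to at most 2 signals per column.

    Each signal is a pair (arrival_delay, value). Returns the list of columns.
    """
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            cols[i + j].append((0, bit))          # all partial products arrive at time 0
    i = 0
    while i < len(cols):
        while len(cols[i]) > 2:
            cols[i].sort(key=lambda e: e[0])      # earliest-arriving signals first
            (da, a), (db, b), (dc, c) = cols[i][:3]
            del cols[i][:3]
            # The two earliest signals take the slow inputs, the third takes Ci.
            d_s = max(da + AB_TO_S, db + AB_TO_S, dc + CI_TO_S)
            d_c = max(da + AB_TO_C, db + AB_TO_C, dc + CI_TO_C)
            if i + 1 == len(cols):
                cols.append([])
            cols[i].append((d_s, a ^ b ^ c))
            cols[i + 1].append((d_c, (a & b) | (a & c) | (b & c)))
        i += 1
    return cols


cols = reduce_partial_products(13, 11, 4)
print([max((d for d, _ in col), default=0) for col in cols])   # arrival profile at the CPA
print(sum(v << i for i, col in enumerate(cols) for _, v in col))
```

The first printed list is the signal arrival profile seen by the final adder under the assumed delays, and the second value is the product recovered from the two remaining rows.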


Competing Approaches

Organization of Hitachi's DPL Multiplier

Hitachi's 4:2 Compressor Structure

DPL Multiplexer Circuit

RECOMMENDATIONS

Conclusion
1. The key to improving multiplier speed was optimizing the interconnections, not the compressor circuit (as was believed for so long).
2. With the increase in wire delay, it is important to connect layout topology and algorithm for optimal interconnection of the PPRT.
3. Using one of the "fast adders" (CLA) as the final adder was actually counterproductive. A simple final adder optimized for the signal arrival profile yields better results with less hardware.
4. It is possible to further optimize the PPRT and FA so that a fused multiply-add operation can be performed in multiply time.
5. For larger multipliers and adders (as used in cryptography), the optimization procedures described yield even better results.
See:

Read This!
1. E. Swartzlander, "Computer Arithmetic", Vol. 1 & 2, IEEE Computer Society Press.
2. K. Hwang, "Computer Arithmetic: Principles, Architecture and Design", John Wiley and Sons.
3. M. Ercegovac, "Digital Systems and Hardware/Firmware Algorithms", Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons.
4. A. Chandrakasan, W. Bowhill, F. Fox, Editors, "Design of High Performance Microprocessors Circuits", IEEE Press, July [...].
5. V. G. Oklobdzija, "High-Performance System Design: Circuits and Logic", IEEE Press, July [...].
Also:

THE END
