1 Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic Sabyasachi Das University of Colorado, Boulder Sunil.

Presentation transcript:

1 Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic Sabyasachi Das University of Colorado, Boulder Sunil P. Khatri Texas A&M University

2 What is an Adder?
- IC block that performs addition of 2 data signals
- Well-known logic architectures
- Often part of other arithmetic components, like Sum-of-Products, Multiplier, etc.
- Computationally intensive and occupies large area
- Wide usage in almost all digital designs

3 Overview of an adder
$$\begin{array}{r} a_7\ a_6\ a_5\ a_4\ a_3\ a_2\ a_1\ a_0 \\ +\ b_7\ b_6\ b_5\ b_4\ b_3\ b_2\ b_1\ b_0 \\ \hline S_8\ S_7\ S_6\ S_5\ S_4\ S_3\ S_2\ S_1\ S_0 \end{array}$$
For each bit $i$ ($i = 0$ to $n-1$):
$S_i = a_i \oplus b_i \oplus \mathit{Carry}_i$
$\mathit{Carry}_{i+1} = (a_i \cdot b_i) + (b_i \cdot \mathit{Carry}_i) + (\mathit{Carry}_i \cdot a_i)$
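
The per-bit sum and carry equations on this slide can be exercised directly. The following is a minimal Python sketch, not taken from the presentation; the function name `ripple_add` and the default width of 8 bits are assumptions made for illustration, with `^`, `&` and `|` standing for XOR, AND and OR.

```python
def ripple_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit values bit by bit using the slide's sum and carry equations."""
    carry = 0    # Carry_0: assume no carry into the least significant bit
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        si = ai ^ bi ^ carry                              # S_i
        carry = (ai & bi) | (bi & carry) | (carry & ai)   # Carry_{i+1}
        result |= si << i
    return result | (carry << n)   # the final carry becomes S_n
```

Each carry depends on the previous one, so this serial form needs n dependent steps; parallel-prefix adders target exactly that chain.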

4 Introduction to Parallel-Prefix Adder
- Fast family of adders
- Computes $\mathit{Carry}_i$ for each bit $i$ in a tree structure
- Several different flavors are available
- Brent-Kung and Kogge-Stone are very popular

5 Generate and Propagate for a Bit
- For each bit $i$ of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit: $G_i = a_i \cdot b_i$
- For each bit $i$ of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit: $P_i = a_i \oplus b_i$
- Generate and Propagate concept is extendable to blocks comprising multiple bits

6 Generate and Propagate for Blocks
If two blocks (comprising one or more bits) have the GP value-pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:
$G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$
$P_{left,right} = P_{left} \cdot P_{right}$
This operation is performed by a carry-operator or o-operator.
[Figure: carry-operator with inputs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ and output $(G_{left,right}, P_{left,right})$]
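
The carry-operator is small enough to write out in full. The sketch below is illustrative only; `bit_gp` and `carry_op` are hypothetical names, and each GP pair is modeled as a Python tuple `(G, P)` of 0/1 values.

```python
def bit_gp(ai: int, bi: int) -> tuple[int, int]:
    """Per-bit generate and propagate: G_i = a_i AND b_i, P_i = a_i XOR b_i."""
    return ai & bi, ai ^ bi


def carry_op(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """Combine a more significant block (left) with a less significant block (right)."""
    g_left, p_left = left
    g_right, p_right = right
    return g_left | (p_left & g_right), p_left & p_right


# Bits 1 and 0 of a = 0b11, b = 0b01: bit 0 generates, bit 1 propagates,
# so the 2-bit block generates a carry.
block = carry_op(bit_gp(1, 0), bit_gp(1, 1))
```

The operator is associative, which is what allows a tree to group the blocks in any order as long as left-to-right order is preserved.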

7 Kogge-Stone (KS) Adder
- Parallel prefix, fast architecture: $\log_2 n$ levels
- Requires large area: $(n \log_2 n - n + 1)$ cells
[Figure: 8-bit Kogge-Stone prefix network with inputs GP$_0$ to GP$_7$ and carry outputs C$_1$ to C$_8$]
Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
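
To make the level and cell counts concrete, here is a small behavioral model of a Kogge-Stone prefix network in Python. It is a sketch under stated assumptions, not the presentation's implementation: the names `carry_op` and `kogge_stone` are hypothetical, there is no carry-in, and `n` is assumed to be a power of two.

```python
def carry_op(left, right):
    """Carry-operator: left is the more significant block."""
    (g_l, p_l), (g_r, p_r) = left, right
    return g_l | (p_l & g_r), p_l & p_r


def kogge_stone(a: int, b: int, n: int):
    """Return (carries, levels, cells); carries[i] is the carry into bit i+1."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    levels = cells = 0
    dist = 1
    while dist < n:
        nxt = list(gp)
        for i in range(dist, n):          # every bit with a neighbor dist away
            nxt[i] = carry_op(gp[i], gp[i - dist])
            cells += 1
        gp = nxt
        dist *= 2                         # span doubles at each level
        levels += 1
    return [g for g, _ in gp], levels, cells
```

For `n = 8` the model uses 3 levels and 7 + 6 + 4 = 17 cells, matching $\log_2 n$ and $n \log_2 n - n + 1$.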

8 Brent-Kung (BK) Adder
- Parallel prefix architecture: $(2 \log_2 n - 2)$ levels
- Optimized for area: $(2n - 2 - \log_2 n)$ cells
[Figure: 8-bit Brent-Kung prefix network with inputs GP$_0$ to GP$_7$ and carry outputs C$_1$ to C$_8$]
Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
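
The same kind of behavioral model can be written for a Brent-Kung network. This Python sketch is an illustration under assumptions: `carry_op` and `brent_kung` are hypothetical names, there is no carry-in, `n` is a power of two and at least 4, and "levels" is measured as the longest chain of operators feeding any output.

```python
def carry_op(left, right):
    """Carry-operator: left is the more significant block."""
    (g_l, p_l), (g_r, p_r) = left, right
    return g_l | (p_l & g_r), p_l & p_r


def brent_kung(a: int, b: int, n: int):
    """Return (carries, levels, cells); carries[i] is the carry into bit i+1."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    depth = [0] * n      # operator depth of the value currently held at each bit
    cells = 0

    def combine(i, j):
        nonlocal cells
        gp[i] = carry_op(gp[i], gp[j])
        depth[i] = max(depth[i], depth[j]) + 1
        cells += 1

    d = 1
    while d < n:                          # up-sweep: build power-of-two blocks
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:                         # down-sweep: fill in remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    return [g for g, _ in gp], max(depth), cells
```

For `n = 8` this gives 4 levels and 11 cells, in line with $2 \log_2 n - 2$ and $2n - 2 - \log_2 n$.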

9 Our Proposed Approach
- Use 2-input XOR and AND gates to compute $G_i$ and $P_i$ values
- Use triple-carry operator in parallel-prefix tree to compute $\mathit{Carry}_i$ values
- Use $P_i$ and $\mathit{Carry}_i$ to compute final $\mathit{Sum}_i$ values
[Figure: block diagram. 2 inputs, then G and P Generator (for each bit), then Parallel-Prefix Tree using Triple-Carry operator, then Computation of Final Sum values, then outputs]

10 Generate and Propagate for a Bit
- In our approach, we use the traditional way of computing the Generate ($G_i$) and Propagate ($P_i$) for each bit:
  $G_i = a_i \cdot b_i$
  $P_i = a_i \oplus b_i$
- If $G_i$ is equal to 1, that indicates a $\mathit{Carry}_{i+1}$ signal equal to 1'b1 (logic-1) is generated from the $i$-th bit
- If $P_i$ is equal to 1, that indicates the $\mathit{Carry}_i$ gets fed to the $\mathit{Carry}_{i+1}$ signal

11 Triple-Carry Operator
If three blocks (or bits) have the GP value-pairs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block generates a Carry only if
- Left block generates a Carry, OR
- Middle block generates a Carry and Left block propagates that, OR
- Right block generates a Carry and both Middle and Left blocks propagate that Carry.
The combined block propagates only if each of the three blocks propagates the input Carry.

12 Triple-Carry Operator
If three blocks (consisting of one or more bits) have the GP value-pairs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:
$G_{left,right} = G_{left} + (P_{left} \cdot G_{mid}) + (P_{left} \cdot P_{mid} \cdot G_{right})$
$P_{left,right} = P_{left} \cdot P_{mid} \cdot P_{right}$
This operation is performed by a triple-carry operator or o3-operator.
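
One way to sanity-check the triple-carry equations is to confirm that a single o3-operator gives the same result as two chained carry-operators. The Python sketch below does this exhaustively; `carry_op` and `carry_op3` are hypothetical names, and GP pairs are modeled as `(G, P)` tuples of 0/1 values.

```python
from itertools import product


def carry_op(left, right):
    """Traditional carry-operator (o): left is the more significant block."""
    (g_l, p_l), (g_r, p_r) = left, right
    return g_l | (p_l & g_r), p_l & p_r


def carry_op3(left, mid, right):
    """Triple-carry operator (o3) as given by the equations on this slide."""
    (g_l, p_l), (g_m, p_m), (g_r, p_r) = left, mid, right
    g = g_l | (p_l & g_m) | (p_l & p_m & g_r)
    p = p_l & p_m & p_r
    return g, p


# Compare o3 against o applied twice for all 4 * 4 * 4 input combinations.
PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]
o3_matches_two_o = all(
    carry_op3(l, m, r) == carry_op(l, carry_op(m, r))
    for l, m, r in product(PAIRS, repeat=3)
)
```

The check confirms functional equivalence only; the delay and area of an o3 cell relative to an o cell depend on the gate-level implementation.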

13 Triple-Carry Operator
- Typically, delay of a triple-carry operator is about 110% to 130% of the delay of a traditional carry-operator.
- Typically, area of a triple-carry operator is about 150% to 180% of the area of a traditional carry-operator.
[Figure: triple-carry operator with inputs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$ and output $(G_{left,right}, P_{left,right})$]

14 Proposed Parallel-Prefix Network
- In the 1st level (or topmost level) of the parallel-prefix tree network, we use the maximum number of triple-carry operators to combine groups of three: GP$_{3k}$, GP$_{3k+1}$ and GP$_{3k+2}$ ($k$ starts from zero).
- In the quadrant closest to the LSB, we use the traditional carry-operator exclusively.
- In the quadrant closest to the MSB, we use our proposed triple-carry operator extensively.
- In the middle two quadrants, we use both the carry-operator and the triple-carry operator in a timing-driven fashion.
- We restrict the fanout of each operator to 5.

15 Proposed Parallel-Prefix Network
- The critical path primarily goes through the bits near the MSB.
  - We instantiate more triple-carry operators along the critical path and in the bits near the MSB.
  - This reduces the depth along the critical path of the parallel-prefix computation tree.
  - The delay of the o3 operator is about 110%-130% of the delay of the o operator.
- Bits near the LSB are typically less critical and have less depth.
  - We instantiate more traditional carry operators in the bits near the LSB.
  - This saves area occupied by the parallel-prefix computation tree.
  - The area of the o3 operator is about 150%-180% of the area of the o operator.
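
A rough back-of-the-envelope calculation shows why a slower but wider operator can still shorten the critical path. The Python sketch below is a deliberately simplified, hypothetical model and not the paper's timing-driven construction: it compares a balanced tree built only from o operators with one built only from o3 operators, ignores fanout and wiring, and takes the o3 delay factor as a parameter from the 110%-130% range quoted on the slide. The names `levels` and `relative_delay` are assumptions.

```python
def levels(n: int, radix: int) -> int:
    """Number of operator levels needed to merge n GP pairs in a balanced tree."""
    count, span = 0, 1
    while span < n:
        span *= radix
        count += 1
    return count


def relative_delay(n: int, o3_delay_factor: float) -> float:
    """Delay of an all-o3 tree divided by the delay of an all-o tree (o delay = 1)."""
    return levels(n, 3) * o3_delay_factor / levels(n, 2)


# Merging 27 GP pairs: 5 levels of o versus 3 levels of o3.
ratio_27 = relative_delay(27, 1.3)   # uses the pessimistic end of the delay range
```

Under these assumptions the all-o3 tree for 27 inputs is still shorter in delay even at the 130% end of the range, because three slower levels replace five faster ones.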

16 Proposed Parallel-Prefix Network
For an example of the 24-bit adder, please refer to the paper.
[Figure: proposed 16-bit parallel-prefix network with inputs GP$_0$ to GP$_{15}$ and carry outputs C$_1$ to C$_{16}$]

17 Computation of Final Sum Values
- At the output of the parallel-prefix computation tree, $G_{i,0}$ and $P_{i,0}$ (for each bit $i$) values are produced.
- By definition, if $G_{i,0}$ is equal to 1'b1 (logic-1), then a carry gets fed to the $(i+1)$-th bit. Hence,
  $\mathit{Carry}_{i+1} = G_{i,0}$
- $\mathit{Sum}_{i+1}$ is computed by using the following equation:
  $\mathit{Sum}_{i+1} = P_{i+1} \oplus \mathit{Carry}_{i+1} = P_{i+1} \oplus G_{i,0}$
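
The final-sum step can be sketched end to end in Python. This is an illustration only: the name `prefix_add` is hypothetical, the group generate values $G_{i,0}$ are produced here by a simple serial scan rather than by any particular prefix tree, and the cases $\mathit{Sum}_0 = P_0$ (no carry-in) and $\mathit{Sum}_n = G_{n-1,0}$ (carry out) are assumptions added to complete the adder.

```python
def prefix_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit values using Sum_{i+1} = P_{i+1} XOR G_{i,0}."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # G_i = a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # P_i = a_i XOR b_i

    # group_g[i] models G_{i,0}, the generate of the block spanning bits i..0.
    group_g = [g[0]]
    for i in range(1, n):
        group_g.append(g[i] | (p[i] & group_g[i - 1]))

    total = p[0]                                  # Sum_0: no carry into bit 0
    for i in range(n - 1):
        total |= (p[i + 1] ^ group_g[i]) << (i + 1)   # Sum_{i+1}
    return total | (group_g[n - 1] << n)          # carry out as the top sum bit
```

Any prefix network that yields the same $G_{i,0}$ values could replace the serial scan without changing the sum logic.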

18 Delay Results
On average, our approach produces an adder that is about 23% faster than the BK adder and about 0.5% faster than the KS adder.

19 Area Results
On average, our approach produces an adder that is about 9% larger than the BK adder and about 30% smaller than the KS adder.

20 Summary
- Triple-carry operator combines GP values of 3 blocks
- Use triple-carry operator in the parallel-prefix computation tree to reduce delay of the critical path
- Use traditional carry-operator in non-timing-critical paths to reduce the overall area
- Our approach is 0.5% faster than KS and 23% faster than BK
- Our approach is 29% smaller than KS and 9% larger than BK

21 Thank you