1 Generation of Optimal Bit-Width Topology of Fast Hybrid Adder in a Parallel Multiplier Sabyasachi Das Synplicity Inc. Sunil P. Khatri Texas A&M University.

Presented by David Pan, UT Austin

2 What is a Multiplier? An IC block that performs the multiplication operation. Multipliers have well-known logic architectures, are computationally intensive, and are widely used in DSP, graphics, and microprocessors.

3 Structure of Multiplier The multiplier block consists of three parts, listed in data-flow order: the Partial Product Generator (PPGen), the Partial Product Reduction Tree (PPRT), and the final Carry-Propagation Adder (CPA).
[Figure: Inputs → PPGen → PPRT → final CPA → Output]
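The three-part data flow can be illustrated with a small functional model (logic only, no timing). This is a sketch under stated assumptions, not the authors' design: it assumes an unsigned n×n multiplier with a plain AND-array PPGen (no Booth recoding), a PPRT built from 3:2 carry-save adders grouped Wallace-style, and a final CPA modeled as ripple-carry. All function names are hypothetical.

```python
def pp_gen(a: int, b: int, n: int) -> list[int]:
    """PPGen: one shifted copy of a per set bit of b (unsigned, no Booth recoding)."""
    return [(a << j) if (b >> j) & 1 else 0 for j in range(n)]


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder applied bitwise to whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def pprt(rows: list[int]) -> list[int]:
    """PPRT: apply 3:2 compressors level by level until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3  # rows that form complete groups of 3
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[full:])  # 0-2 leftover rows pass through unchanged
        rows = nxt
    return rows


def cpa(x: int, y: int, width: int) -> int:
    """Final CPA, modeled here (assumption) as a bit-level ripple-carry adder."""
    carry, out = 0, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        out |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return out


def multiply(a: int, b: int, n: int) -> int:
    """n x n unsigned multiply: PPGen -> PPRT -> CPA."""
    rows = pprt(pp_gen(a, b, n)) + [0, 0]  # pad so the CPA always gets two operands
    return cpa(rows[0], rows[1], 2 * n)


print(multiply(13, 11, 4))  # 143
```

The PPRT only ever produces two rows in redundant (sum, carry) form; the CPA is the single place where a full carry propagation happens, which is why its architecture matters so much for multiplier delay.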

4 Final Adder in a Multiplier Frequently used adder architectures are: ripple-carry, which is area-efficient but slow, yet timing-efficient when the inputs have skewed arrival times; parallel-prefix (Brent-Kung, Kogge-Stone), which is faster but requires more area; and carry-select, which has a large area overhead (often >100%) but gives better delay when the $C_{in}$ signal arrives late.

5 3-stage Hybrid Adder Multipliers exhibit a characteristic arrival-time pattern at the inputs of the CPA: the low-order bits arrive early, the middle bits arrive last, and the high-order bits arrive somewhat earlier than the middle bits. A hybrid adder matched to this pattern produces the best result for multipliers and outperforms every stand-alone architecture (Stelling et al., "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier," Journal of VLSI Signal Processing, 1996).

6 3-Stage Hybrid Adder There are many possible configurations $(w_{rpl}, w_{bk}, w_{cs})$ for an $n$-bit adder, subject to $w_{rpl} + w_{bk} + w_{cs} = n$. Exhaustive exploration is not feasible because of its huge runtime. How can the best configuration be identified?
[Figure: SubAdder 1 (Ripple, width $w_{rpl}$) | SubAdder 2 (Brent-Kung, width $w_{bk}$) | SubAdder 3 (Carry-Select, width $w_{cs}$)]
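To make the three-sub-adder topology concrete, here is a functional (logic-only, no timing) sketch of the hybrid adder for a given $(w_{rpl}, w_{bk})$, with $w_{cs} = n - w_{rpl} - w_{bk}$. Assumptions not stated in the slides: the ripple block sits at the LSB end, SubAdder 2 is a textbook Brent-Kung prefix network (up-sweep then down-sweep), and the two precomputed halves inside the carry-select block are ripple-carry. The function names are hypothetical.

```python
def bits(x: int, lo: int, w: int) -> list[int]:
    """Bits lo .. lo+w-1 of x, LSB first."""
    return [(x >> (lo + i)) & 1 for i in range(w)]


def ripple(a: list[int], b: list[int], cin: int) -> tuple[list[int], int]:
    """Ripple-carry sub-adder on bit lists; returns (sum bits, carry-out)."""
    s, c = [], cin
    for ai, bi in zip(a, b):
        s.append(ai ^ bi ^ c)
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def brent_kung(a: list[int], b: list[int], cin: int) -> tuple[list[int], int]:
    """Brent-Kung parallel-prefix sub-adder; returns (sum bits, carry-out)."""
    m = len(a)
    if m == 0:
        return [], cin
    p = [ai ^ bi for ai, bi in zip(a, b)]  # bit propagate
    G = [ai & bi for ai, bi in zip(a, b)]  # bit generate
    P = p[:]
    G[0] |= P[0] & cin  # fold the carry-in into bit 0

    def combine(i: int, j: int) -> None:  # (G,P)[i] <- (G,P)[i] o (G,P)[j]
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    d = 1
    while d < m:  # up-sweep: build group (G,P) over spans of size 2d
        for i in range(2 * d - 1, m, 2 * d):
            combine(i, i - d)
        d *= 2
    d //= 2
    while d >= 1:  # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, m, 2 * d):
            combine(i, i - d)
        d //= 2
    carries = [cin] + G[:-1]  # carry into each bit position
    return [pi ^ ci for pi, ci in zip(p, carries)], G[-1]


def hybrid_add(a: int, b: int, n: int, w_rpl: int, w_bk: int) -> int:
    """n-bit 3-stage hybrid adder: Ripple | Brent-Kung | Carry-Select (LSB to MSB)."""
    w_cs = n - w_rpl - w_bk
    if min(w_rpl, w_bk, w_cs) < 0:
        raise ValueError("widths must be nonnegative and sum to n")
    s1, c1 = ripple(bits(a, 0, w_rpl), bits(b, 0, w_rpl), 0)
    s2, c2 = brent_kung(bits(a, w_rpl, w_bk), bits(b, w_rpl, w_bk), c1)
    hi_a, hi_b = bits(a, w_rpl + w_bk, w_cs), bits(b, w_rpl + w_bk, w_cs)
    s3_0, c3_0 = ripple(hi_a, hi_b, 0)  # precomputed assuming carry-in 0
    s3_1, c3_1 = ripple(hi_a, hi_b, 1)  # precomputed assuming carry-in 1
    s3, c3 = (s3_1, c3_1) if c2 else (s3_0, c3_0)  # mux selected by SubAdder 2's carry-out
    out = s1 + s2 + s3 + [c3]
    return sum(bit << i for i, bit in enumerate(out))


print(hybrid_add(200, 100, 8, 2, 4))  # 300
```

Every valid split produces the same sum; only the delay differs, which is exactly why choosing $(w_{rpl}, w_{bk}, w_{cs})$ is a pure timing-optimization problem.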

7 Identification of Optimal Topology: Width of the Ripple Adder At every bit $i$, compute $T(C_{i+1})$ and check whether $T(C_{i+1}) \le T(a_{i+1})$ or $T(C_{i+1}) \le T(b_{i+1})$, i.e., whether the ripple carry into bit $i+1$ arrives no later than the later of that bit's two operands, $T(C_{i+1}) \le \max(T(a_{i+1}), T(b_{i+1}))$. If the check passes, set $w_{rpl} = i+1$. Otherwise, continue checking until 3 consecutive bits fail the check (hill climbing). Return the resulting $w_{rpl}$ as the ripple-adder width.
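The hill-climbing check can be sketched as follows. The delay model is an assumption, not the authors' timing engine: each bit's carry-out is taken to be available a fixed `d_carry` after the latest of its carry-in and its two operands, and `t_cin` is the arrival time of $C_0$. The function name and parameters are hypothetical.

```python
def ripple_width(t_a: list[float], t_b: list[float], d_carry: float,
                 t_cin: float = 0.0, patience: int = 3) -> int:
    """Hill-climbing estimate of w_rpl from per-bit operand arrival times.

    Assumed delay model: T(C_{i+1}) = max(T(C_i), T(a_i), T(b_i)) + d_carry.
    """
    n = len(t_a)
    t_c = t_cin          # T(C_0)
    w_rpl, fails = 0, 0
    for i in range(n - 1):
        t_c = max(t_c, t_a[i], t_b[i]) + d_carry         # T(C_{i+1})
        if t_c <= max(t_a[i + 1], t_b[i + 1]):           # carry is not the late signal
            w_rpl, fails = i + 1, 0
        else:
            fails += 1
            if fails == patience:                        # 3 consecutive failures: stop
                break
    return w_rpl


# Hypothetical arrival profile: LSBs arrive one unit apart, then the profile flattens.
arr = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5]
print(ripple_width(arr, arr, d_carry=1))  # 5
```

Allowing up to three failures before stopping lets the search step over a local dip in the arrival profile: with `arr = [0, 1, 1, 4, 5, 6]` the check fails at bit 2 but passes again afterwards, so the sketch returns 5 rather than 1.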

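8 Delay of the Hybrid Adder
$$T_{hybrid} = \max\left(T_{s2},\; T_{co2} + D_{mx},\; T_{s3} + D_{mx}\right)$$
where $T_{s2}$ is the arrival time of SubAdder 2's sum outputs, $T_{co2}$ the arrival time of its carry-out (which drives the carry-select multiplexers), $T_{s3}$ the arrival time of SubAdder 3's precomputed sums, and $D_{mx}$ the multiplexer delay.
[Figure: SubAdder 1 (Ripple, $w_{rpl}$) | SubAdder 2 (Brent-Kung, $w_{bk}$) | SubAdder 3 (Carry-Select, $w_{cs}$), annotated with $T_{s2}$, $T_{co2} + D_{mx}$, and $T_{s3} + D_{mx}$]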
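The delay expression is a three-way maximum, so it is useful to report which term is critical as well as its value. A minimal sketch; the function name and the numbers in the example are hypothetical, not measured data.

```python
def hybrid_delay(t_s2: float, t_co2: float, t_s3: float, d_mx: float) -> tuple[float, str]:
    """T_hybrid = max(T_s2, T_co2 + D_mx, T_s3 + D_mx), plus the name of the critical term."""
    terms = {
        "T_s2": t_s2,                   # sum outputs of SubAdder 2 (Brent-Kung)
        "T_co2 + D_mx": t_co2 + d_mx,   # SubAdder 2 carry-out through the select mux
        "T_s3 + D_mx": t_s3 + d_mx,     # SubAdder 3 precomputed sums through the mux
    }
    critical = max(terms, key=terms.get)
    return terms[critical], critical


# Made-up arrival times, for illustration only.
print(hybrid_delay(10.0, 9.0, 8.0, 1.5))  # (10.5, 'T_co2 + D_mx')
```

Knowing which term dominates indicates which sub-adder is on the critical path for a candidate configuration.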

9 Identification of Optimal Topology: Width of the BK and Carry-Select Adders Initial configuration: $w_{bk} = 2^p$, where $p = \lfloor \log_2(n - w_{rpl}) \rfloor$, and $w_{cs} = n - w_{bk} - w_{rpl}$. Example: if $n = 32$ and $w_{rpl} = 7$, then $n - w_{rpl} = 25$, $p = 4$, $w_{bk} = 16$, and $w_{cs} = 9$. Iterative approach: estimate the delay of a configuration and explore in the appropriate direction (similar to binary search).
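The initial configuration and one possible reading of the "binary-search-like" exploration can be sketched as below. The slides do not give the exact search rule, so the step-halving scheme here is an assumption: starting from the initial $w_{bk}$, try moving $w_{bk}$ by ±step, accept a move only if the estimated delay improves, and halve the step when neither direction helps. `delay` stands for a hypothetical per-configuration delay estimator (for example, one built on the $T_{hybrid}$ formula); all names are illustrative.

```python
from typing import Callable


def initial_config(n: int, w_rpl: int) -> tuple[int, int]:
    """w_bk = 2^floor(log2(n - w_rpl)), w_cs = n - w_bk - w_rpl (requires w_rpl < n)."""
    budget = n - w_rpl
    w_bk = 1 << (budget.bit_length() - 1)  # largest power of two <= budget
    return w_bk, budget - w_bk


def refine_bk_width(n: int, w_rpl: int,
                    delay: Callable[[int], float]) -> tuple[int, int, float]:
    """Step-halving search over w_bk (an assumed interpretation of the slide)."""
    budget = n - w_rpl
    w_bk, _ = initial_config(n, w_rpl)
    best = delay(w_bk)
    step = max(w_bk // 2, 1)
    while step >= 1:
        for cand in (w_bk - step, w_bk + step):
            if 1 <= cand <= budget:
                d = delay(cand)
                if d < best:             # move in the improving direction
                    w_bk, best = cand, d
                    break
        else:                            # neither direction improved: narrow the step
            step //= 2
    return w_bk, budget - w_bk, best


print(initial_config(32, 7))  # (16, 9)
# Toy delay model (hypothetical), minimized at w_bk = 12:
print(refine_bk_width(32, 7, lambda w: (w - 12) ** 2))  # (12, 13, 0)
```

Each accepted move strictly lowers the estimated delay, so the loop terminates; the number of delay estimates grows roughly with $\log_2(n)$ when the delay is well behaved in $w_{bk}$, which is what makes this far cheaper than sweeping all configurations.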

10 Results For different adder widths, our approach always found the best configuration in a very short runtime. Runtime example for a 32-bit adder: trying all 561 possible configurations takes hours, whereas our approach takes 4-18 minutes and always computes the best configuration.
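The figure of 561 configurations for a 32-bit adder is consistent with counting every split $(w_{rpl}, w_{bk}, w_{cs})$ of nonnegative widths with $w_{rpl} + w_{bk} + w_{cs} = n$, which gives $\binom{n+2}{2}$ configurations. Whether the original experiment allowed zero-width sub-adders is an assumption here; the sketch below just checks the arithmetic.

```python
from math import comb


def count_configs(n: int) -> int:
    """Number of (w_rpl, w_bk, w_cs) with nonnegative integer widths summing to n."""
    return sum(1 for w_rpl in range(n + 1) for w_bk in range(n + 1 - w_rpl))


print(count_configs(32), comb(34, 2))  # 561 561
```

The count grows quadratically in $n$, and each configuration needs its own synthesis and timing run, which is why exhaustive exploration costs hours.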

11 Results With this approach, it becomes feasible to use this powerful hybrid-adder architecture during synthesis (~12% faster adder).

12 Thank you