1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer Science and Engineering Depart. University of California,

Slides:



Advertisements
Similar presentations
Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
Advertisements

EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
EE141 Adder Circuits S. Sundar Kumar Iyer.
ELEN 468 Lecture 261 ELEN 468 Advanced Logic Design Lecture 26 Interconnect Timing Optimization.
Mohamed Younis CMCS 411, Computer Architecture 1 CMCS Computer Architecture Lecture 7 Arithmetic Logic Unit February 19,
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
Leakage and Dynamic Glitch Power Minimization Using MIP for V th Assignment and Path Balancing Yuanlin Lu and Vishwani D. Agrawal Auburn University ECE.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
Yuanlin Lu Intel Corporation, Folsom, CA Vishwani D. Agrawal
UNIVERSITY OF MASSACHUSETTS Dept
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
1 CS 140 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris.
1 A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products Sabyasachi Das University of Colorado, Boulder Sunil P. Khatri.
EECS Components and Design Techniques for Digital Systems Lec 18 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
VLSI Arithmetic Adders Prof. Vojin G. Oklobdzija University of California
1 Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic Sabyasachi Das University of Colorado, Boulder Sunil.
Introduction to CMOS VLSI Design Lecture 11: Adders
Chapter # 5: Arithmetic Circuits Contemporary Logic Design Randy H
Interconnect Optimizations
Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification.
1 UCSD VLSI CAD Laboratory ISQED-2009 Revisiting the Linear Programming Framework for Leakage Power vs. Performance Optimization Kwangok Jeong, Andrew.
VLSI Arithmetic Adders & Multipliers
Fall 06, Sep 14 ELEC / Lecture 5 1 ELEC / (Fall 2006) Low-Power Design of Electronic Circuits (Formerly ELEC / )
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Interconnect Timing Optimization II.
Lecture 17: Adders.
Lecture 12b: Adders. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 17: Adders 2 Generate / Propagate  Equations often factored into G and P  Generate and.
Digital Integrated Circuits© Prentice Hall 1995 Combinational Logic COMBINATIONAL LOGIC.
Chapter 5 Arithmetic Logic Functions. Page 2 This Chapter..  We will be looking at multi-valued arithmetic and logic functions  Bitwise AND, OR, EXOR,
1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.
Adders. Full-Adder The Binary Adder Express Sum and Carry as a function of P, G, D Define 3 new variable which ONLY depend on A, B Generate (G) = AB.
Parallel Prefix Adders A Case Study
Introduction to CMOS VLSI Design Lecture 11: Adders David Harris Harvey Mudd College Spring 2004.
Bar Ilan University, Engineering Faculty
Aug Shift Operations Source: David Harris. Aug Shifter Implementation Regular layout, can be compact, use transmission gates to avoid threshold.
Chapter 6-2 Multiplier Multiplier Next Lecture Divider
VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California
Abdullah Aldahami ( ) Feb26, Introduction 2. Feedback Switch Logic 3. Arithmetic Logic Unit Architecture a.Ripple-Carry Adder b.Kogge-Stone.
Chapter # 5: Arithmetic Circuits
Chapter 6-1 ALU, Adder and Subtractor
Arithmetic Building Blocks
New Modeling Techniques for the Global Routing Problem Anthony Vannelli Department of Electrical and Computer Engineering University of Waterloo Waterloo,
Chapter 14 Arithmetic Circuits (I): Adder Designs Rev /12/2003
Kwangsoo Han, Andrew B. Kahng, Hyein Lee and Lutong Wang
4. Combinational Logic Networks Layout Design Methods 4. 2
On-Chip Interconnect Trend and Design Optimization Chung-Kuan Cheng UC San Diego, La Jolla, CA.
Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
Block p and g Generators. Carry Determination as Prefix Computations Two Contiguous (or Overlapping) Blocks (g’, p’) and (g’’, p’’) Merged Block (g, p)
On the Relation between SAT and BDDs for Equivalence Checking Sherief Reda Rolf Drechsler Alex Orailoglu Computer Science & Engineering Dept. University.
Basics of Energy & Power Dissipation
Jianhua Liu1, Yi Zhu1, Haikun Zhu1, John Lillis2, Chung-Kuan Cheng1
Unrolling Carry Recurrence
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
Conditional-Sum Adders Parallel Prefix Network Adders
EE466: VLSI Design Lecture 13: Adders
Digital Integrated Circuits 2e: Chapter Copyright  2002 Prentice Hall PTR, Adapted by Yunsi Fei ECE 300 Advanced VLSI Design Fall 2006 Lecture.
Adding the Superset Adder to the DesignWare IP Library
1 Fundamentals of Computer Science Combinational Circuits.
Static Timing Analysis
EE415 VLSI Design. Read 4.1, 4.2 COMBINATIONAL LOGIC.
An Exact Algorithm for Difficult Detailed Routing Problems Kolja Sulimma Wolfgang Kunz J. W.-Goethe Universität Frankfurt.
Philip Brisk 2 Paolo Ienne 2 Hadi Parandeh-Afshar 1,2 1: University of Tehran, ECE Department 2: EPFL, School of Computer and Communication Sciences Improving.
An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003 Rev /05/2003.
High Computation Mahendra Sharma. Hybrid number representation The hybrid number representations proposed are capable of bounding the maximum length of.
1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003.
Conditional-Sum Adders Parallel Prefix Network Adders
Presentation transcript:

1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer Science and Engineering Depart. University of California, San Diego

2 Outline Prefix Adder Problem Background & Previous Work Extensions: High-radix, Ling Our Work Area/Timing/Power Models Mixed-Radix (2,3,4) Adders ILP Formulation Experimental Results Future Work

Prefix Adder – Challenges Logical Levels Wire Tracks Fanouts Area Physical placement Detail routing New Design Scope Timing Gate Cap Wire Cap Gate sizing Buffer insertion Signal slope Input arrival time Output require time Increasing impact of physical design and concern of power. Power Static power Dynamic power Power gating Activity Probability

4 Binary Addition Input: two n-bit binary numbers and, one bit carry-in Output: n-bit sum and one bit carry out Prefix Addition: Carry generation & propagation

5 Prefix Addition – Formulation Pre- processing: Post- processing: Prefix Computation:

:13:14:1 Prefix Adder – Prefix Structure Graph gp i pipi G [i:0] sisi bibi aiai GP [i, j] GP [j-1, k] GP [i, k] gp generator sum generator GP cell Pre- processing Post- processing Prefix Computation

7 Previous Works – Classical prefix adders :13:14:16:17:18:15:1 Brent-Kung: Logical levels: 2log 2 n–1 Max fanouts: 2 Wire tracks: :13:14:16:17:18:15:1 Kogge-Stone: Logical levels: log 2 n Max fanouts: 2 Wire tracks: n/ :13:14:16:17:18:15:1 Sklansky: Logical levels: log 2 n Max fanouts: n/2 Wire tracks: 1

8 High-Radix Adders Each cell has more than two fan-in ’ s Pros: less logic levels 6 levels (radix-2) vs. 3 levels (radix-4) for 64-bit addition Cons: larger delay and power in each cell

9 Radix-3 Sklansky & Kogge- Stone Adder David Harris, “ Logical Effort of Higher Valency Adders ”

10 Ling Adders Pre- processing: Post- processing: Prefix Computation: Prefix Ling

11 An 8-bit Ling Adder

12 Area Model Distinguish physical placement from logical structure, but keep the bit-slice structure. Logical view Physical view Bit position Logical level Bit position Physical level Compact placement

13 Timing Model Cell delay calculation: Effort DelayIntrinsic Delay Logical Effort Electrical Effort = Cout/Cin = (fanouts+wirelength) / size Intrinsic properties of the cell

14 Power Model Total power consumption: Dynamic power + Static Power Static power: leakage current of device P sta = *#cells Dynamic power: current switching capacitance P dyn =   C load  is the switching probability  = j (j is the logical level*) * Vanichayobon S, etc, “Power-speed Trade-off in Parallel Prefix Circuits”

15 ILP Formulation Overview Structure variables: GP cells Connections (wires) Physical positions Capacitance variables: Gate cap Vertical wire cap Horizontal wire cap Timing variables: Input arrival time Output arrival time Power Objective ILP ILOG CPLEX Optimal Solution

16 Integer Linear Programming (ILP) ILP: Linear Programming with integer variables. Difficulties and techniques: Constraints are not linear Linearize using pseudo linear constraints Search Space too large Reduce search space Search is slow Add redundant constraints to speedup

17 ILP – Integer Linear Programming Linear Programming: linear constraints, linear objective, fractional variables. Integer Linear Programming: Linear Programming with integer variables. LP Optimal Constraints ILP Optimal

ILP – Pseudo-Linear Constraint Minimize: x 3 Subject to: x 1  300 x 2  500 x 3 = min(x 1, x 2 ) Minimize: x 3 Subject to: x 1  300 x 2  500 x 3  x 1 x 3  x 2 x 3  x 1 – 1000 b 1 (1) x 3  x 2 – 1000 (1 – b 1 ) (2) b 1 is binary Problem: ILP formulation: LP objective: 0 ILP objective: 300 A constraint is called pseudo-linear if it’s not effective until some integer variables are fixed. Pseudo-linear constraints mostly arise from IF/ELSE scenarios binary decision variables are introduced to indicate true or false.

19 ILP Solver Search Procedure 0 Root (all vars are fractional) b1b1 b2b2 b3b3 b4b4 2 ‘0’‘1’ infeasible ‘0’‘1’ 3 feasible (current best) infeasible Minimize F(b 1, b 2, b 3, b 4, f 1,…) b i is binary It is VERY helpful if ILP objective is close to LP objective 2 ‘0’‘1’ 4 ‘0’‘1’ ‘0’ ‘1’ Cut 2 Bound (Smallest candidate)

Interval Adjacency Constraint (column id, logic level)

21 Linearization for Interval Adjacency Constraint Linearize Pseudo Linear Left interval bound equal to column index

22 Search Space Reduction Ling ’ s adder: separate odd and even bits Double the bit-width we are able to search

23 Redundant Constraints Cell (i,j) is known to have logic level j before wire connection Assume load is MinLoad (fanout=1 with minimum wire length): Cell (i,j) has a path of length j-1 Assume each cell along the path has MinLoad

24 Experiments – 16-bit Uniform Timing

25

26 Min-Power Radix-2 Adder (delay= 22, power = 45.5FO4 )

27 Min-Power Radix-2&4 Adder (delay=18, power = 29.75FO4 ) Radix-2 CellRadix-4 Cell

28 Min-Power Mixed-Radix Adder (delay=20, power = 28.0FO4) Radix-2 CellRadix-4 CellRadix-3 Cell

29 Experiments – 16-bit Non- uniform Time (Mixed Radix) ILP is able to handle non-uniform timings Ling adders are most superior in increasing arrival time – faster carries

Increasing Arrival Time (delay=35.5, power = 27.0FO4 ) 30

Decreasing Arrival Time (delay=34.5, power = 30.5FO4) 31

Convex Arrival Time (delay=35.9, power = 32.4FO4 ) 32

Increasing Required Time (delay=34.5, power = 30.5FO4) 33

Decreasing Required Time (delay=36.5, power = 32.5FO4) 34

Convex Required Time (delay=36.5, power = 32.5FO4) 35

36 Experiments – 64-bit Hierarchical Structure (Mixed-Radix) Handle high bit-width applications 16x4 and 8x8

37 Experiments – 64-bit Hierarchical Structure TSL: a 64-bit high-radix three-stage Ling adder V. Oklobdzija and B. Zeydel, “ Energy-Delay Characteristics of CMOS Adders ”, in High-Performance Energy-Efficient Microprocessor Design, pp , 2006

38 ASIC Implementation - Results 64-bit hierarchical design (mixed-radix) by ILP vs. fast carry look-ahead adder by Synopsys Design Compiler TSMC 90nm standard cell library was used

39 Future Work ILP formulation improvement Expected to handle 32 or 64 bit applications without hierarchical scheme Optimizing other computer arithmetic modules Comparator, Multiplier

40 Q & A Thank You!