1 A Method for Fast Delay/Area Estimation EE219b Semester Project Mike Sheets May 16, 2000.

Presentation transcript:

1 A Method for Fast Delay/Area Estimation EE219b Semester Project Mike Sheets May 16, 2000

2 Overview
Problem statement
Proposed solution
– Constant delay paradigm
– Zero-slack algorithm
Implementation
– Incorporation into SIS
– Library characterization
– Results
Conclusions
Future work

3 Problem Statement
Given a Boolean network, estimate its area when implemented under particular required-time constraints
– Estimation should be fast and reasonably accurate
Examine how technology-independent logic optimization affects the estimation

4 Area/Delay Models
Constant area (traditional) model
– Composed of discretely sized gates with constant area
– Mapping involves calculating delay as a function of load
Constant delay model
– Composed of mathematical functions relating area to size
– Mapping involves calculating size (area) as a function of load
[Figure: Constant Area Model, gate ND2X1 driving load C_L]
– Area = constant from library
– Size = constant from library
– Delay = d_int + k*C_L
[Figure: Constant Delay Model, gate ND2 driving load C_L]
– Area = A_int + A_slope*size
– Size = k*C_L/(Delay – d_int)
– Delay = constant
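The two constant-delay relations above can be evaluated directly. The following is a minimal sketch, not the project's code: the function names `gate_size` and `gate_area` are hypothetical, and the guard against a delay target at or below d_int is an added assumption, since the size formula has no positive solution in that case.

```python
def gate_size(k: float, c_load: float, delay: float, d_int: float) -> float:
    """Size = k*C_L / (Delay - d_int), the constant delay model from the slide."""
    if delay <= d_int:
        # Assumption: a target at or below the intrinsic delay cannot be met.
        raise ValueError("delay target must exceed the intrinsic delay d_int")
    return k * c_load / (delay - d_int)


def gate_area(a_int: float, a_slope: float, size: float) -> float:
    """Area = A_int + A_slope*size, the linear area model from the slide."""
    return a_int + a_slope * size


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
size = gate_size(k=2.0, c_load=3.0, delay=5.0, d_int=2.0)
area = gate_area(a_int=1.0, a_slope=0.5, size=size)
```

With these illustrative numbers the gate needs size 6/3 = 2, and the area follows from the line A_int + A_slope*size.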

5 Zero Slack Algorithm
Given input arrival times {a_i} and output required times {r_k}, assign gate delays as follows:
1. Initialize all internal required/arrival times to "unknown"
2. Select the path(s) with the minimum value of (r_k - a_i)/l_p, where l_p is the length of the path in number of gates
   1. For each node from primary inputs to primary outputs
      1. Calculate all the (a_i, l_i) pairs from all fanin edges
      2. Discard dominated pairs, save the union of the undominated pairs
   2. When all primary outputs are reached, calculate the minimum (r_k - a_i)/l_p
3. Assign the delay of each gate in the selected path(s) to this minimum
4. Update arrival and required times for all fanin and fanout edges of newly assigned delays
5. Repeat steps 2-4 until all gates are assigned delays
[Figure: node n with fanin nodes n1, n2, labeled (a1, l1), (a2, l2), and fanout nodes n3, n4, labeled (r3, l3), (r4, l4)]
Pair domination defined: pair (a_i, l_i) dominates (a_j, l_j) if a_i ≥ a_j and l_i ≥ l_j
If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is "faster" than necessary.
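The pair-pruning step can be sketched as follows. This is an illustration under a stated assumption: the comparison symbols in the domination definition did not survive in the transcript, so the code assumes a pair dominates another when both its arrival time and its path length are greater than or equal. The names `dominates` and `prune_dominated` are hypothetical.

```python
def dominates(p: tuple, q: tuple) -> bool:
    """True if pair p = (arrival, length) dominates q.

    Assumption: domination means arrival_p >= arrival_q and length_p >= length_q,
    i.e. p is at least as late and at least as long as q.
    """
    return p[0] >= q[0] and p[1] >= q[1]


def prune_dominated(pairs):
    """Return the undominated (arrival, length) pairs, sorted for readability."""
    unique = set(pairs)  # identical pairs are kept once
    return sorted(
        p for p in unique
        if not any(q != p and dominates(q, p) for q in unique)
    )


# (1.0, 1) is dominated by (3.0, 2); the other two pairs are incomparable.
kept = prune_dominated([(3.0, 2), (1.0, 1), (2.0, 4)])
```

Only the undominated pairs need to be carried forward to the next node, which is what keeps the number of candidate paths small.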

6 Faster Approximation
Select an allowable slack threshold s_thresh (if zero, the algorithm yields the same result as the previous one)
1. Compute the forward level l_j and arrival time a_j of all nodes in the network using a forward trace
2. Compute the reverse level k_j and required time r_j of all nodes in the network using a backward trace
3. Update the delay of every node as d_j = d_j + (r_j - a_j)/(l_j + k_j)
4. While the slack of any node exceeds s_thresh, repeat steps 1-3
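The loop above can be sketched on a small netlist. This is a minimal illustration rather than the SIS implementation, and it rests on several assumptions not stated on the slide: l_j counts gates on the longest path from a primary input up to and including gate j, k_j counts gates after j on the longest path to an output, arrival and required times are measured at gate outputs, all delays start at zero, every gate has at least one fanin, and every gate without fanout has a required time. The levels depend only on structure, so they are computed once instead of on every pass. The function name `distribute_slack` and the dictionary-based netlist format are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def distribute_slack(fanins, pi_arrival, po_required, s_thresh=1e-6, max_iters=1000):
    """Assign gate delays by repeatedly spreading each node's slack over its path."""
    # Gates in topological order (primary inputs appear only as fanin names).
    order = [n for n in TopologicalSorter(fanins).static_order() if n in fanins]
    fanouts = {g: [] for g in fanins}
    for g, ins in fanins.items():
        for i in ins:
            if i in fanouts:
                fanouts[i].append(g)

    # Forward level l_j (includes j) and reverse level k_j (excludes j).
    fwd, rev = {}, {}
    for g in order:
        fwd[g] = 1 + max(fwd.get(i, 0) for i in fanins[g])
    for g in reversed(order):
        rev[g] = max((1 + rev[f] for f in fanouts[g]), default=0)

    delay = {g: 0.0 for g in fanins}
    for _ in range(max_iters):
        # Step 1: forward trace for arrival times at gate outputs.
        arr = {}
        for g in order:
            arr[g] = delay[g] + max(
                pi_arrival[i] if i in pi_arrival else arr[i] for i in fanins[g]
            )
        # Step 2: backward trace for required times at gate outputs.
        req = {}
        for g in reversed(order):
            cands = [req[f] - delay[f] for f in fanouts[g]]
            if g in po_required:
                cands.append(po_required[g])
            req[g] = min(cands)
        # Step 4: stop once no node has slack above the threshold.
        slack = {g: req[g] - arr[g] for g in order}
        if max(slack.values()) <= s_thresh:
            break
        # Step 3: d_j = d_j + (r_j - a_j) / (l_j + k_j).
        for g in order:
            delay[g] += slack[g] / (fwd[g] + rev[g])
    return delay


# Three-gate chain: input "a" arrives at time 0, output "g3" is required at time 6.
chain = {"g1": ["a"], "g2": ["g1"], "g3": ["g2"]}
delays = distribute_slack(chain, {"a": 0.0}, {"g3": 6.0})
```

On the chain, every gate sees slack 6 on a path of three gates, so a single pass assigns a delay of 2 to each gate and the second pass finds zero slack. The `max_iters` cap is an added safeguard for small thresholds.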

7 Incorporation into SIS
[Flow diagram]
Inputs: technology library (read_library); estimation library, derived from the technology library by manual analysis (read_estim); BLIF network (read_blif)
Technology-independent optimization: script.algebraic, script.boolean, etc.
Technology-dependent optimization: map, giving area
Fast delay/area estimation: estimate, giving an area/delay tradeoff curve

8 Library Characterization
A commercial standard cell library may contain multiple gates that implement the same equation
Each gate in the library has characteristics:
– Size
– Delays from all input pins to the output pin for all transitions and several loads
– Capacitance for all input pins
– Maximum load
– Area
We need estimation parameters for each class of gates (i.e., gates with the same equation):
– Intrinsic gate delay (d_int)
– Drive factor (k)
– Area line y-intercept (A_int)
– Area line slope (A_slope)
– Input capacitance line y-intercept (c_int)
– Input capacitance line slope (c_slope)

9 Inverter Characterization (1)
Inverter delay scales linearly with load/size
– Slope is k
– Y-intercept is d_int

10 Inverter Characterization (2)
Inverter area scales linearly with size
– Slope is A_slope
– Y-intercept is A_int
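Both inverter characterizations amount to fitting a straight line and reading off its slope and y-intercept. The sketch below uses an ordinary least-squares fit and also reports the coefficient of determination. The least-squares choice, the function name `fit_line`, and the sample data are assumptions for illustration; the slides do not say how the trend lines were computed, and the numbers are not from the actual library.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope*x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A line cannot be fitted through fewer than two distinct x values.
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # If all y values are equal, the flat line fits exactly; report 1.0.
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical delay measurements versus load/size for one gate class.
load_over_size = [1.0, 2.0, 3.0, 4.0]
delay = [2.5, 4.5, 6.5, 8.5]
k, d_int, r2 = fit_line(load_over_size, delay)
```

The same routine applied to (size, area) points would give A_slope and A_int.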

11 Characterization Issues
Requires at least two gates per class in the library
Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination)
Further research shows the reason is the CMOS implementation (below)
Future work might replace the linear model with a piece-wise linear model for more accuracy
[Figure: NAND-gate CMOS schematic for smaller sizes; NAND-gate CMOS schematic for larger sizes]

12 Estimation Library
These issues are evident in the table
– OAI31 and OAI32 have A_slope of 0.0, meaning that the two cells in the library had the same area
– NOR3 and NOR4 had poor coefficients of determination
– Many gates in the library had only one size

13 Estimation Modes
Sweep mode
– User specifies a range of required times to sweep (possibly only one) and a step size
– Estimation starts with the largest required time and steps down until the network fails the zero-slack algorithm (i.e., negative slack is encountered)
Binary search mode
– Used to find the minimum possible required time (period) given infinite area
– Starts at a user-specified maximum and performs a binary search until a pass limit is reached
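The two modes can be sketched as thin drivers around an estimator. This is an illustration only: `sweep`, `min_required_time`, and the toy estimator are hypothetical names, the estimator is assumed to return `None` when the network fails (negative slack), and the binary search assumes feasibility is monotonic, meaning that if a required time passes then every larger one passes too.

```python
def sweep(estimate, t_max, t_min, step):
    """Step the required time down from t_max; stop at the first failure."""
    results = []
    i = 0
    while t_max - i * step >= t_min - 1e-12:
        t = t_max - i * step  # computed from the index to avoid accumulated error
        area = estimate(t)
        if area is None:      # assumed failure signal from the estimator
            break
        results.append((t, area))
        i += 1
    return results


def min_required_time(feasible, t_max, passes):
    """Binary search for the smallest passing required time, limited to `passes`."""
    if not feasible(t_max):
        return None
    lo, hi = 0.0, t_max       # invariant: hi passes, lo is not known to pass
    for _ in range(passes):
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy estimator: area grows as the required time approaches a 2.0 limit.
def toy_estimate(t):
    return 10.0 / (t - 2.0) if t > 2.0 else None


curve = sweep(toy_estimate, t_max=5.0, t_min=1.0, step=1.0)
t_best = min_required_time(lambda t: t >= 3.0, t_max=8.0, passes=20)
```

The list returned by `sweep` is the raw data for an area/delay tradeoff curve, and the pass limit bounds the binary search error to t_max divided by 2 to the power of `passes`.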

14 Experimentation
Combinational logic benchmarks of various sizes
– MCNC c17, c880, c1908, c3540
Sequential logic benchmarks of various sizes
– Interpretation of required time is clock period (assuming all flip-flops are clocked synchronously)
– MCNC s713, s838, s953, s1196, s1238, s1423
Tested four scripts
– script.none (no optimization), script.algebraic, script.boolean, script.rugged

15 Tradeoff Curves
Sweep mode allows multiple required times (clock periods) to be easily tabulated

16 Sensitivity to Optimization Script
When delay is non-critical (i.e., as required time approaches infinity)
– Area within 20% of no optimization
– Variation between optimization scripts mostly under 10%

17 Conclusions
Sometimes more optimization yields worse results
As required times become smaller, more paths become critical, requiring larger sizes (area)
– Area increases quickly before failure
From the benchmarks shown, estimation is relatively insensitive to technology-independent optimization with infinite required times

18 Possible Future Work
Accuracy
– Relate estimated areas to actual areas from a good mapping using the full technology library
– Use more complex delay equations to handle different rise/fall times
– Modify the algorithm to handle the case where a primary input cannot drive the required load
Characterization
– Revise characterization to support piece-wise linear functional forms
– Automate the process so only the actual technology library is required as an input
Mapping
– Examine how various mapping options affect estimation
– Use buffered fanout trees (Touati) after sizing gates
Speed
– Compare speed of total estimation procedure to traditional flow
Power estimation