ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time. Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National.

Presentation transcript:

1 ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time
Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University)
University of Wisconsin-Madison

2 Outline
- Background
  - Motivation and contribution
  - Literature overview
- ClockTune algorithm
  - Problem formulation
  - ClockTune algorithm overview
  - Optimality and complexity analysis
- Experimental results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

3 Motivation
- Clock skew → cycle time penalty
  - Start with a zero-skew clock tree
  - Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC '90])
- The clock tree is power-hungry (30% in Intel McKinley, 0.18 µm / 1 GHz / 130 W)
  - P = fCV^2
  - Minimize switching capacitance (wiring area)
- Stability affects design convergence
  - Allow incremental refinement to accommodate local changes
- Interconnect delay dominates total delay
  - Wire-sizing is effective in reducing interconnect delay

4 Motivation
- Non-convex zero-skew constraints
  - No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime
- Hence, a good clock tree wire-sizing algorithm can
  - Minimize delay and power
  - Guarantee optimality and runtime
  - Have good stability

5 Contribution
- First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem
- Provides a complete (sampled) solution set of delay/power/area trade-off information for design planning
- Efficient pseudo-polynomial runtime (a 6170-branch clock tree in 6 minutes, within 1% of optimality)
- Runtime vs. optimality trade-off
- Incremental clock re-balancing to speed up design convergence

6 Literature Overview
- "Reliable non-zero skew clock tree using wire width optimization", Pillage, et al. [DAC '93]
  - Iteratively optimizes skew and delay using adjoint sensitivity analysis
  - Aimed at reliable clock trees under process variation
- Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD '92]
  - Bottom-up merging segment construction, top-down embedding
- Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD '00]
  - Handles simultaneous routing, buffer insertion, and wire-sizing
  - Merging segment set (MSS): a set of line samples of a merging region
  - No optimality guarantee
  - The size of the MSS grows exponentially
- "Process variation aware clock tree routing", Lu, et al. [ISPD '03]
  - Based on DME/BST

7 Outline
- Background
  - Motivation and contribution
  - Literature overview
- ClockTune algorithm
  - Problem formulation
  - ClockTune algorithm overview
  - Optimality and complexity analysis
- Experimental results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

8 Problem formulation
- min-ZSWS (Zero Skew Wire Sizing) problem
  - Given a clock routing, minimize [objective formula missing from the transcript] s.t. [constraint formula missing from the transcript], where P_i, P_j are paths from v to leaf nodes i and j
- Zero-skew constraints are non-convex constraints
  - No known algorithm solves the problem optimally in polynomial runtime
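To make the zero-skew constraint concrete, here is a minimal Python sketch that evaluates sink delays of a sized clock tree and reports the skew. It assumes the Elmore delay model and a wire whose resistance scales as length/width and whose capacitance scales as length×width. The slides do not show their delay model, so both are assumptions. The `Node` class, the helper names, and the value of `C0` are hypothetical. `R0` reuses the number 0.03 from the experimental setup without its units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

R0 = 0.03   # unit wire resistance (number taken from the setup slide, units assumed)
C0 = 2e-4   # unit wire capacitance (hypothetical value, for illustration only)

@dataclass
class Node:
    name: str
    length: float = 0.0   # length of the wire from the parent to this node
    width: float = 1.0    # width of that wire
    load: float = 0.0     # sink capacitance (non-zero for leaves)
    children: List["Node"] = field(default_factory=list)

def wire_r(n: Node) -> float:
    return R0 * n.length / n.width

def wire_c(n: Node) -> float:
    return C0 * n.length * n.width

def subtree_cap(n: Node) -> float:
    # Capacitance of the wire into n plus everything below n.
    return wire_c(n) + n.load + sum(subtree_cap(c) for c in n.children)

def leaf_delays(n: Node, upstream: float = 0.0,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    out = {} if out is None else out
    downstream = subtree_cap(n) - wire_c(n)
    # Elmore term: wire resistance times (half its own cap + downstream cap).
    d = upstream + wire_r(n) * (wire_c(n) / 2 + downstream)
    if not n.children:
        out[n.name] = d
    for c in n.children:
        leaf_delays(c, d, out)
    return out

def skew(root: Node) -> float:
    delays = leaf_delays(root).values()
    return max(delays) - min(delays)
```

With two identical branches the skew is zero. Widening only one of them lowers that branch's delay and breaks the balance, which is why the widths cannot be chosen independently.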

9 DC region approach
- Clock Delay and wiring Capacitance are the top concerns
- Define f : R^N → R^2 such that f_Y(w) = Delay(T_v(w)), f_X(w) = Capacitance(T_v(w))
  - DC region of node v: the projection of the feasible region onto R^2
  - Choose a d-c pair from the DC region
[Figure: the feasible region and its projection, the DC region]
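As a small illustration of the projection, the sketch below maps sampled wire widths of a single branch to (capacitance, delay) pairs. It assumes an Elmore-style delay and a width-proportional wire capacitance. The function name `dc_samples` and the default constants are hypothetical, and only the width range 1 to 4 is borrowed from the experimental setup.

```python
from typing import List, Tuple

def dc_samples(length: float, load: float,
               w_min: float = 1.0, w_max: float = 4.0, q: int = 8,
               r0: float = 0.03, c0: float = 2e-4) -> List[Tuple[float, float]]:
    """Return (capacitance, delay) pairs for q evenly spaced wire widths."""
    points = []
    for k in range(q):
        w = w_min + (w_max - w_min) * k / (q - 1)
        cap = c0 * length * w + load                          # f_X(w)
        delay = (r0 * length / w) * (c0 * length * w / 2 + load)  # f_Y(w)
        points.append((cap, delay))
    return points
```

For one branch the image of the width range is a curve: a wider wire costs more capacitance and buys less delay. With several wires tied together by zero-skew constraints, the image becomes a two-dimensional region, which is what the DC region describes.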

10 ClockTune algorithm overview
- Phase 1: construct DC regions bottom-up for every node
- Phase 2: top-down embedding after the delay/power trade-off
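The two phases can be sketched as a small dynamic program. This is not the ClockTune implementation; it is a simplified stand-in that bins delays, keeps the lowest-capacitance choice per bin, and merges children only when their delay bins match, so residual skew of up to one bin per merge level can remain. The Elmore delay model, the `Node` class, `WIDTHS`, `STEP`, and the constants are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

R0, C0 = 0.03, 2e-4             # illustrative unit resistance / capacitance
WIDTHS = [1.0, 2.0, 3.0, 4.0]   # hypothetical width samples in [w_m, w_M]
STEP = 0.02                     # hypothetical delay-bin size

@dataclass
class Node:
    name: str
    length: float = 0.0         # wire from the parent to this node
    load: float = 0.0           # sink capacitance (leaves)
    children: List["Node"] = field(default_factory=list)
    table: Dict[int, tuple] = field(default_factory=dict)  # bin -> (cap, delay, width, child_bin)
    width: float = 0.0          # filled in by the top-down phase

def bottom_up(node: Node) -> None:
    """Phase 1: build a sampled delay/capacitance table for every node."""
    if node.children:
        for c in node.children:
            bottom_up(c)
        # Children may only be merged at a delay bin they all reach.
        shared = set.intersection(*(set(c.table) for c in node.children))
        base = {b: (sum(c.table[b][0] for c in node.children),
                    max(c.table[b][1] for c in node.children)) for b in shared}
    else:
        base = {None: (node.load, 0.0)}
    node.table = {}
    for key, (cap, delay) in base.items():
        for w in WIDTHS:
            r = R0 * node.length / w
            cw = C0 * node.length * w
            d = delay + r * (cw / 2 + cap)   # Elmore delay through this wire
            c = cap + cw
            b = round(d / STEP)
            # Keep only the cheapest (lowest capacitance) entry per delay bin.
            if b not in node.table or c < node.table[b][0]:
                node.table[b] = (c, d, w, key)

def top_down(node: Node, b: int) -> None:
    """Phase 2: commit wire widths for the delay bin chosen at the root."""
    _, _, node.width, key = node.table[b]
    for c in node.children:
        top_down(c, key)

# Two sinks with different loads under one root.
a = Node("a", length=100.0, load=0.05)
b = Node("b", length=100.0, load=0.10)
root = Node("root", children=[a, b])
bottom_up(root)
top_down(root, min(root.table))   # pick the smallest-delay bin
```

After the bottom-up pass, `root.table` plays the role of a sampled delay/capacitance trade-off: each key is a reachable delay bin and each value records the cheapest capacitance found for it. Choosing a different key before calling `top_down` selects a different trade-off point without redoing phase 1.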

11 Optimality analysis
- Embeddings that do not fall on the delay samples are omitted
  - Propagated error
  - Delay sampling error
  - Wire width sampling error (detailed in the paper)

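12 Optimality analysis
- Error is bounded [bound formula missing from the transcript]
  - The bound is stated in terms of the delay sampling resolution (subscript d), the wire width sampling resolution (subscript w), and constants (k and one more whose symbol is missing) related to l, r_0, c_0, w_m, w_M …
- Generally speaking, the error is reduced by about half when the resolution is doubled
[Figure: error vs. resolution]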

13 Optimality/runtime trade-off
- Controlling the sampling resolution trades off optimality against runtime and memory

14 Complexity analysis
- Runtime
  - Bottom-up phase takes O(n p max(p, q))
  - Top-down phase takes O(np)
  - Overall: O(n p max(p, q))
- Memory: O(np)
- where
  - n: number of nodes of the clock tree
  - p: number of delay samples taken at each node
  - q: number of wire width samples taken at each level-2 node
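The bounds above can be turned into a quick scaling check. The helper below is hypothetical and returns only the asymptotic terms as operation counts with all constant factors dropped, so it is useful for comparing settings and not for predicting seconds or megabytes.

```python
from typing import Tuple

def clocktune_cost(n: int, p: int, q: int) -> Tuple[int, int, int]:
    """Asymptotic terms from the complexity slide, constants omitted."""
    bottom_up = n * p * max(p, q)   # O(n p max(p, q))
    top_down = n * p                # O(np)
    memory = n * p                  # O(np)
    return bottom_up, top_down, memory

base = clocktune_cost(n=1000, p=256, q=256)
fine = clocktune_cost(n=1000, p=512, q=512)
runtime_ratio = fine[0] / base[0]   # bottom-up work grows 4x
memory_ratio = fine[2] / base[2]    # memory grows 2x
```

Under these terms, doubling both sampling resolutions quadruples the dominant bottom-up work while only doubling memory, and for fixed p and q both grow linearly in n.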

15 Outline
- Background
  - Motivation and contribution
  - Related works
  - Problem formulation
- ClockTune algorithm
  - Design space projection
  - Algorithm overview
  - Optimality and complexity analysis
- Experimental results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

16 Experimental setup
- ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory
- Benchmarks r1 – r5 from Tsay et al. [ICCAD '91]
- Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 µm
- ClockTune uses w_m = 1 µm, w_M = 4 µm
- p: number of delay samples taken at every node
- q: number of wire width samples taken at every level-2 node
- r_0 = 0.03, c_0 = 2 [unit symbol lost]/µm^2

17 Runtime and memory usage
- Runtime and memory usage are linear in the problem size when p, q are fixed
- Within 1% of optimality when p, q = 256 (runtime < 6 minutes, memory ~ 64 MB)
[Table, p, q = 256. Columns: benchmark (r1 to r5), # sink nodes, # branches, Runtime (s), Memory (MB), Optimality (%). The numeric entries are missing from the transcript.]

18 Optimality results
- Optimality
  - Error below 1% with p = q = 256
  - Error is reduced to about half when the resolution is doubled

19 Power/Delay trade-off
[Figure: delay vs. capacitance trade-off curve for benchmark r5. Capacitance 0.2~1.1 nF, delay 5~150 ns. End points labeled "Minimum power" and "Minimum delay". 15:1 delay:power trade-off.]

20 Incremental refinement
- The DC region captures the design space
  - Enables incremental refinement

21 Conclusion & Future Work
- Provides a zero-skew clock tree wire-sizing algorithm which
  - Minimizes delay and area ε-optimally
  - Guarantees pseudo-polynomial runtime and memory usage
  - Provides delay/power trade-off information to designers
  - Speeds up design convergence by allowing clock tree re-balancing with minimum changes
- Future work
  - Better delay model
  - Buffer insertion/sizing capability

22 Thank you !