© KLMH Lienig 1 Impact of Local Interconnects and a Tree Growing Algorithm for Post-Grid Clock Distribution Jiayi Xiao.

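Paper Titles
- R. S. Shelar and M. Patyra, “Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor” [2]
- R. S. Shelar, “Routing With Constraints for Post-Grid Clock Distribution in Microprocessors” [1]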

Impact of Interconnects on Timing in RLS/SDP Blocks
- Local interconnects are believed to have a major impact on timing, power, and routability in VLSI design
- Interconnects generally affect timing and power in the following ways:
  - Wire delay
  - Cell delay degradation
  - Slope degradation
  - Delay of the repeaters they require

Simulation Environment
- A 45nm IA processor (Nehalem)
- 86 RTL-to-layout synthesis (RLS) blocks
- 133 semi-manually designed structured datapath (SDP) blocks
- Focus on the impact of local (intra-block) interconnects
- Intra-standard-cell wire delay (poly and M1) is not considered
Source:

Wire Delay
- Wire delay contributes 6% and 5% of the delay on the worst internal paths in RLS and SDP blocks, respectively
Figure 1: Wire delay on the worst internal paths in (a) synthesized and (b) semi-manually designed datapath blocks

Cell Delay and Slope Degradation
- Often regarded as secondary effects of interconnects
- No longer secondary in nanometer technologies
- The impact is 7% (13%) in RLS blocks and 5% (9%) in SDP blocks
Figure 2: Slack improvement percentage due to setting R and C of interconnects to 0 on the worst internal paths in (a) synthesized and (b) semi-manually designed datapath blocks
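Figure 2 isolates interconnect effects by setting the R and C of interconnects to 0. The sketch below shows why that experiment captures more than wire delay, under a toy Elmore-style stage model that is an assumption here, not the delay model used in [2]: zeroing R and C removes both the wire's own RC delay and the extra delay the driving cell incurs from charging the wire capacitance. Slope degradation is not modeled, and `Stage`, `stage_delay`, and `interconnect_breakdown` are hypothetical names with made-up values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    """One driver + net stage of a timing path (toy values, hypothetical)."""
    intrinsic_ps: float  # cell delay at zero load
    drive_kohm: float    # effective driver resistance
    wire_kohm: float     # total wire resistance of the net
    wire_ff: float       # total wire capacitance of the net
    load_ff: float       # input-pin capacitance of the receiver


def stage_delay(s: Stage, with_interconnect: bool = True) -> float:
    """Elmore-style stage delay in ps (kOhm * fF = ps); slope effects are ignored."""
    rw, cw = (s.wire_kohm, s.wire_ff) if with_interconnect else (0.0, 0.0)
    cell = s.intrinsic_ps + s.drive_kohm * (cw + s.load_ff)  # driver sees wire + pin load
    wire = rw * (cw / 2 + s.load_ff)                          # distributed-RC wire delay
    return cell + wire


def interconnect_breakdown(path: List[Stage]) -> Dict[str, float]:
    """Split path delay into wire delay and wire-induced cell-delay degradation."""
    total = sum(stage_delay(s) for s in path)
    no_rc = sum(stage_delay(s, with_interconnect=False) for s in path)  # R = C = 0
    wire = sum(s.wire_kohm * (s.wire_ff / 2 + s.load_ff) for s in path)
    degradation = total - no_rc - wire  # extra cell delay caused by wire capacitance
    return {
        "total_ps": total,
        "wire_pct": 100 * wire / total,
        "cell_degradation_pct": 100 * degradation / total,
    }


# Usage with two made-up stages
path = [Stage(20, 2.0, 0.5, 10, 2), Stage(15, 1.0, 1.0, 20, 3)]
print(interconnect_breakdown(path))
```

In this toy model the degradation term equals the driver resistance times the wire capacitance, summed over stages, which is exactly the part of the delay that disappears along with the wire when R and C are zeroed.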

Repeater Count due to Interconnects
- Repeaters make up 43% and 30% of the cells in RLS and SDP blocks, respectively
Figure 3: Number of repeaters vs. cell count in (a) synthesized and (b) semi-manually designed datapath blocks

Overall Interconnect Delay Impact Including Repeater Delays
- Interconnects, including repeater delays, contribute 33% and 20% of the delay in RLS and SDP blocks, respectively
- Repeaters alone contribute 21% and 10%, respectively
Figure 4: Repeater delay percentage vs. slack on the worst critical paths in (a) synthesized and (b) semi-manually designed datapath blocks

Power in Clock Interconnects and Buffers for RLS Blocks
- Clock trees, including sequentials, account for 70% of the total dynamic power while making up only 20% of the cell count
- Clock buffers and nets account for 16% of the dynamic power while making up only about 2% of the cell count
Figure 5: Distribution of dynamic power in clock trees inside synthesized blocks (chart segments: 16%, 14%, 40%)

Power in Clock Interconnects and Buffers for SDP Blocks
- Clock trees, including sequentials, account for 35% of the total dynamic power while making up about 10% of the cell count
Figure 6: Distribution of dynamic power in clock trees inside SDP blocks (chart segments: 5%, 6%, 23%)
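The gap between cell-count share and power share on the two clock slides follows from the standard switching-power relation P = αCV²f: clock nets toggle every cycle (activity α = 1), while data nets typically switch far less often. A minimal sketch, in which the capacitance split, data activity factor, supply voltage, and frequency are all hypothetical and not taken from [2], shows how a small fraction of the switched capacitance can dominate dynamic power.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f (short-circuit and leakage ignored)."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


VDD_V, FREQ_HZ = 1.0, 3e9  # hypothetical operating point

# Hypothetical split: clock nets hold 20% of the switched capacitance but toggle every
# cycle (alpha = 1); data nets hold 80% with an assumed activity of 0.1.
power_w = {
    "clock": dynamic_power_w(1.0, 20e-12, VDD_V, FREQ_HZ),
    "data": dynamic_power_w(0.1, 80e-12, VDD_V, FREQ_HZ),
}
total_w = sum(power_w.values())
shares_pct = {name: 100 * p / total_w for name, p in power_w.items()}
print(shares_pct)  # clock share is about 71% despite holding only 20% of the capacitance
```

Because the activity factor multiplies capacitance directly, lowering clock activity (for example through the clock gating used in these blocks) is as effective as removing clock capacitance.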

Impact on Dynamic Power in Combinational Logic for RLS Blocks
- Combinational logic dissipates about 27% of the dynamic power
- Repeaters (44% of the cell count) account for 30% of the dynamic power in combinational logic cells
Figure 7: Distribution of dynamic power in combinational logic for synthesized blocks. (a) Distribution into combinational logic cells and interconnect (panel values: 32%, 68%). (b) Distribution of dynamic power in combinational logic cells into repeaters and other cells (panel values: 30%, 70%).

Impact on Dynamic Power in Combinational Logic for SDP Blocks
- Combinational logic dissipates about 65% of the dynamic power
- Repeaters (27% of the cell count) account for 20% of the dynamic power in combinational logic cells
Figure 8: Distribution of dynamic power in combinational logic for SDP blocks. (a) Distribution into combinational logic cells and interconnect (panel values: 47%, 53%). (b) Distribution of dynamic power in combinational logic cells into repeaters and other cells (panel values: 20%, 80%).

Conclusion
- Local interconnects contribute more than 1/3 of the delay in RLS blocks and about 1/5 in SDP blocks
- RLS blocks have 13 percentage points more repeaters than SDP blocks (43% vs. 30% of cells)
- Some so-called secondary effects, such as cell delay and slope degradation, are no longer second order
- Local interconnects contribute 27% and 35% of the dynamic power in RLS and SDP blocks, respectively
- The percentage is even higher in clock trees
- Power dissipation due to clock interconnects and buffers is a big problem

Routing With Constraints for Post-Grid Clock Distribution

- In high-performance microprocessors, clocks are distributed using hybrid networks: a global grid followed by buffered, gated trees
Figure 9: (a) Global clock distribution to layout areas, employing spines for the grid. (b) Clock distribution from grid wires to sequentials in a block inside a layout area

- Clock routing from the global grid to the buffered, gated local clock trees is known as post-grid clock distribution
- Traditionally, this routing is done manually, often using the nearest-source heuristic and an up-sizing strategy
- The manual approach is very time-consuming and yields non-optimal results
Figure 10: (a) Global clock routing topology, with horizontal grid wires and tracks for routing in both directions. (b) Routing solution.
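The nearest-source heuristic mentioned above can be sketched as follows, assuming grid wires are horizontal segments (as in Figure 10) and that each sink, such as a local clock buffer, is routed independently to the closest point on any grid wire, measured in Manhattan wirelength. Up-sizing, delay constraints, and routing-track capacity are ignored, and the function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Wire = Tuple[int, int, int]  # (x_lo, x_hi, y): a horizontal grid wire


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_source_routing(sinks: List[Point], wires: List[Wire]) -> Tuple[Dict[Point, Point], int]:
    """Route every sink straight to its closest grid point; return taps and total wirelength."""
    routes: Dict[Point, Point] = {}
    total = 0
    for s in sinks:
        # On a horizontal wire, the closest point clamps x to the wire span and takes its y.
        taps = [(min(max(s[0], lo), hi), y) for lo, hi, y in wires]
        tap = min(taps, key=lambda t: manhattan(s, t))
        routes[s] = tap
        total += manhattan(s, tap)
    return routes, total


routes, total = nearest_source_routing([(5, 3), (6, 4), (15, 8)], [(0, 20, 0), (0, 20, 10)])
print(routes, total)
```

Each sink is handled in isolation, so nearby sinks never share wire; that independence is what makes the heuristic easy to apply by hand and also what leaves wirelength on the table.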

Problem Formulation

Tree Growing Algorithm

Figure 12: Comparison between (b) the nearest-source heuristic and (c) the tree-growing algorithm
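The slides describing the tree-growing algorithm did not survive extraction, so the following is only a generic Prim-style illustration of why growing a tree can beat the nearest-source heuristic compared in Figure 12. It is not the constrained algorithm of [1]. Sinks are attached one at a time, each to whichever is closer: a grid wire or an already-connected sink, so nearby sinks share wire. It assumes horizontal grid-wire segments and Manhattan distance, attaches only to sink points (not mid-wire Steiner points), ignores delay, slew, and track constraints, and uses hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Wire = Tuple[int, int, int]  # (x_lo, x_hi, y): a horizontal grid wire


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_tap(p: Point, wires: List[Wire]) -> Point:
    """Closest point to p on any horizontal grid wire (Manhattan metric)."""
    taps = [(min(max(p[0], lo), hi), y) for lo, hi, y in wires]
    return min(taps, key=lambda t: manhattan(p, t))


def grow_tree(sinks: List[Point], wires: List[Wire]) -> Tuple[Dict[Point, Point], int]:
    """Prim-style growth: repeatedly attach the sink closest to the current tree.

    O(n*m) initialization plus O(n^2) growth for n sinks and m grid wires.
    """
    # best[s] = (distance to tree, attachment point); initially the tree is just the grid
    best: Dict[Point, Tuple[int, Point]] = {}
    for s in sinks:
        tap = nearest_tap(s, wires)
        best[s] = (manhattan(s, tap), tap)
    parent: Dict[Point, Point] = {}
    total = 0
    while best:
        s = min(best, key=lambda k: best[k][0])
        d, attach = best.pop(s)
        parent[s] = attach
        total += d
        for o in best:  # the newly connected sink may now be a closer attachment point
            d_new = manhattan(o, s)
            if d_new < best[o][0]:
                best[o] = (d_new, s)
    return parent, total


sinks = [(5, 8), (6, 9), (7, 10)]
grid = [(0, 20, 0)]
parent, tree_wl = grow_tree(sinks, grid)
direct_wl = sum(manhattan(s, nearest_tap(s, grid)) for s in sinks)  # nearest-source baseline
print(parent, tree_wl, direct_wl)
```

In the usage example, growing a tree needs 12 units of wire versus 27 for direct nearest-source connections. The price is a longer grid-to-sink path for sinks attached later, which is why a practical router has to enforce delay and slew constraints while the tree grows.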

Algorithm Complexity and Delay Model

Simulation Results

Conclusion

References
[1] R. S. Shelar, “Routing With Constraints for Post-Grid Clock Distribution in Microprocessors,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 2, pp , Feb.
[2] R. S. Shelar and M. Patyra, “Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor,” in Proc. Int. Symp. Physical Design (ISPD), Mar. 2010, pp
[3] R. Kumar and G. Hinton, “A family of 45nm IA processors,” in Proc. Int. Solid-State Circuits Conf. (ISSCC), pp. 58–59, Feb.