We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byYasmine Sidell
Modified about 1 year ago
© KLMH Lienig 1 Impact of Local Interconnects and a Tree Growing Algorithm for Post-Grid Clock Distribution Jiayi Xiao
© KLMH Lienig Paper Titles 2
© KLMH Lienig Impact of Interconnects on Timing in RLS/SDP Blocks Local interconnects are believed to cause major impact on timing, power and routability in VLSI design Interconnects generally affect the timing and power in the following ways Wire delay Cell delay degradation Slope degradation Delays due to the repeaters 3
© KLMH Lienig Simulation Environment A 45nm IA processor (Nehalem) 86 RTL-to-layout synthesis (RLS) blocks 133 Semi-manually designed structured data path(SDP) blocks Focus on the impact of local (intra-block) interconnects No consideration for intra-standard-cell wire delay (Poly and M1) 4 Source:
© KLMH Lienig Wire Delay The wire delay contributes to 6% and 5%, respectively, in RLS and SDP blocks 5 Figure 1: Wire-delay on the worst internal paths in (a) synthesized and in (b) semi-manually designed datapath blocks
© KLMH Lienig Cell Delay and Slope Degradation Often known as secondary effects due to interconnects Not secondary anymore in nanometer technology It is 7% (13%) and 5% (9%) in both RLS and SDP blocks 6 Figure 2: Slack improvement percentage due to setting R and C of interconnects to 0 on the worst internal paths in (a) synthesized and in (b) semi-manually designed datapath blocks
© KLMH Lienig Repeater Count due to Interconnects 43% and 30% repeaters, respectively, in RLS and SDP blocks. 7 Figure 3: Number of repeaters vs. cell count in (a) synthesized and in (b) semi-manually designed datapath blocks
© KLMH Lienig Overall Interconnect Delay Impact Including Repeater Delays Wires contribute to 33% and 20% of the delay in RLS and SDP blocks Repeaters contribute to 21% and 10%, respectively 8 Figure 4: Repeater delay percentage vs. slack on the worst critical paths in (a) synthesized and in (b) semi-manually designed datapath blocks
© KLMH Lienig Power In Clock Interconnects and Buffers for RLS Blocks The clock trees including sequentials contribute to 70% of total dynamic power Only 20% of the cell count Clock buffers and nets contribute 16% of the dynamic power Only about 2% of the cell count 9 16% 14% 40% Figure 5: Distribution of dynamic power in clock trees inside synthesized blocks
© KLMH Lienig Power In Clock Interconnects and Buffers for SDP Blocks The clock trees including sequentials contribute to 35% of total dynamic power About 10% of the cell count 10 5% 6% 23% Figure 6: Distribution of dynamic power in clock trees inside SDP blocks
© KLMH Lienig Impact to Dynamic Power in Combinational Logic for RLS Blocks The combinational logic dissipates about 27% of the dynamic power The repeaters (44% of cell count) constitute 30% of dynamic power 11 32% 68% 30% 70% Figure 7: Distribution of dynamic power in combinational logic for synthesized blocks. (a) Distribution into combinational logic cells and interconnect. (b) Distribution of dynamic power in combinational logic cells into that in repeaters and other cells.
© KLMH Lienig Impact to Dynamic Power in Combinational Logic for SDP Blocks The combinational logic dissipates about 65% of the dynamic power The repeaters (27% of cell count) constitute 20% of dynamic power 12 47% 53% 20% 80% (a) (b) Figure 8: Distribution of dynamic power in combinational logic for SDP blocks. (a) Distribution into combinational logic cells and interconnect. (b) Distribution of dynamic power in combinational logic cells into that in repeaters and other cells.
© KLMH Lienig Conclusion Local interconnects contribute to more that 1/3 of the delay in RLS blocks and about 1/5 in SDP blocks RLS blocks have 13% more repeater than SDP blocks Some secondary effects are not second order Local interconnects contribute 27% and 35% to the dynamic power in RLS and SDP blocks The percentage is even higher in clock trees Power dissipation due to clock interconnects and buffers is a big problem 13
© KLMH Lienig Routing With Constraints for Post-Grid Clock Distribution 14
© KLMH Lienig Routing With Constraints for Post-Grid Clock Distribution In high-performance microprocessors, clocks are distributed employing hybrid networks, containing global grid followed by buffered gated trees 15 Figure 9: (a) Global clock distribution to layout areas employing spines for grid. (b) Clock distribution from grid wires to sequentials in a block inside layout area
© KLMH Lienig Routing With Constraints for Post-Grid Clock Distribution Clock routing from global grid to buffered gated local clock tree is known as post-grid clock distribution Traditionally, this routing is carried out manually Often employing the nearest source heuristic and up-sizing strategy Very time-consuming and non-optimal 16 Figure 10: Global clock routing topology, with horizontal grid wires and tracks for routing in both directions. (b) Routing solution.
© KLMH Lienig Problem Formulation 17
© KLMH Lienig Tree Growing Algorithm 18
© KLMH Lienig Tree Growing Algorithm, Continued 19
© KLMH Lienig Tree Growing Algorithm, Continued 20
© KLMH Lienig Tree Growing Algorithm, Continued 21
© KLMH Lienig Tree Growing Algorithm, Continued 22 Figure 12: Comparison between (b) the nearest source heuristic and (c) tree-growing algorithm
© KLMH Lienig Algorithm Complexity and Delay Model 23
© KLMH Lienig Simulation Results 24
© KLMH Lienig Conclusion 25
© KLMH Lienig References  R. S. Shelar, “Routing With Constraints for Post-Grid Clock Distribution in Microprocessors,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol 29, no. 2, pp , Feb  R. S. Shelar and M. Patyra, “Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor,” in Proc. Int. Symp. Physical Design (ISPD), Mar. 2010, pp  R. Kumar and G. Hinton. “A family of 45nm IA processors,” in International Solid-State Circuits Conference, pages 58–59, February
STRUCTURED ASIC’s Dan Lander Haru Yamamoto Shane Erickson (EE 201A Spring 2004)
Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
EE201 Spring 2004 Group: Wilber L. Duran Duo (Steve) Liu Multilevel Routing.
Ch. 12 Routing in Switched Networks Routing in Circuit Switched Networks Routing –The process of selecting the path through the switched network.
Slide_1 A Survey of Architectural Design and Implementation Tradeoffs in Network on Chip Systems Dan Marconett Next-Generation Networking Systems Lab University.
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Computer Organization, Bus Structure. Bus Structure A communication pathway connecting two or more devices When a word of data is transferred between.
Some Unsolved Mathematical Problems in Systems Area Networking Mark Stewart M elbourne O perations Re search.
Networks-on-Chip. Seminar contents The Premises Homogenous and Heterogeneous Systems- on-Chip and their interconnection networks The Network-on-Chip.
A Construction of Locality-Aware Overlay Network: mOverlay and Its Performance Found in: IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 22, NO.
1 CSE 380 Computer Operating Systems Instructor: Insup Lee University of Pennsylvania Fall 2003 Lecture Notes: Multiprocessors (updated version)
Analysis of Floorplanning Algorithm in EDA Tools By: RENISHKUMAR V. LADANI M.TECH-2003, DA-IICT GANDHINAGAR GUIDE: PROF. ASHOK AMIN CO-GUIDE: PROF. AMIT.
Multipath Routing for Video Delivery over Bandwidth-Limited Networks S.-H. Gary Chan Jiancong Chen Department of Computer Science Hong Kong University.
Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'',
EE 201A/EE298 Modeling and Optimization for VLSI Layout Instructor: Lei He
Networks on Chip: Router Microarchitecture & Network Topologies Daniel U. Becker, James Chen, Nan Jiang, Prof. William J. Dally Concurrent VLSI Architecture.
SCSC 311 Information Systems: hardware and software.
EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC.
THERMAL-AWARE BUS-DRIVEN FLOORPLANNING PO-HSUN WU & TSUNG-YI HO Department of Computer Science and Information Engineering, National Cheng Kung University.
Constraint Driven I/O Planning and Placement for Chip-package Co-design Jinjun Xiong, Yiuchung Wong, Egino Sarto, Lei He University of California, Los.
CSE 413: Computer Networks Md. Kamrul Hasan Assistant Professor and Chairman Dept. of Computer and Communication Engineering Patuakhali Science and Technology.
Lecture 4: Direct and Indirect Interconnection Networks for Distributed- Memory Multiprocessors Shantanu Dutt Univ. of Illinois at Chicago.
Address comments to FPGA Area Reduction by Multi-Output Sequential Resynthesis Yu Hu 1, Victor Shih 2, Rupak Majumdar 2 and Lei He 1 1.
Comparison Of Network On Chip Topologies Ahmet Salih BÜYÜKKAYHAN Fall.
A Comparison of Mechanisms for Improving TCP Performance over Wireless Links Published In IEEE/ACM TRANSACTIONS ON NETWORKING, VOL.5 NO.6,DECEMBER 1997.
Ch. 12 Routing in Switched Networks Routing in Packet Switched Networks Routing Algorithm Requirements –Correctness –Simplicity –Robustness--the.
Position-based Routing in Ad Hoc Networks Brad Stephenson A presentation submitted in partial fulfillment of the requirements of the course ECSE 6962.
Scalable Routing In Delay Tolerant Networks Mohammad Reza Faghani 1.
1. Interconnection Networks ◦ Terminology, basics, examples Optical Interconnects ◦ Motivation ◦ Architecture examples for Data Centers and HPC ◦
© 2016 SlidePlayer.com Inc. All rights reserved.