Xing Wei, Wai-Chung Tang, Yu-Liang Wu, Department of Computer Science and Engineering, The Chinese University of Hong Kong

Presentation transcript:

Mountain-Mover: An Intuitive Logic Shifting Heuristic for Improving Timing Slack Violating Paths
Xing Wei, Wai-Chung Tang, Yu-Liang Wu, Department of Computer Science and Engineering, The Chinese University of Hong Kong
Cliff Sze, Charles Alpert, IBM Austin Research Center, Burnet Road, Austin, TX {csze,
ASPDAC 2013

Outline: Introduction; The Mountain Mover Heuristic; Preliminaries and problem definition; Post-placement timing optimization using logic rewiring; Experimental results; Conclusion

Introduction Traditionally, logic synthesis is done prior to physical synthesis, which includes placement, routing, and placement-based timing optimization [2]. However, under modern sub-micron VLSI technology, wire delay dominates the total path delay. For industrial designs, many critical paths are observed to fail timing closure even after physical-synthesis optimizations such as path straightening, buffering, Vt optimization, and gate sizing.

Introduction Increasing wire delay greatly increases the importance of the cell placement problem. As a result, the placement engine has become the key player for timing optimization during the physical-synthesis stage of the design flow. In fact, multiple rounds of placement and timing optimization have to be iterated to identify the true timing-critical paths, which then require an incremental placement engine to straighten them. Thus, post-placement logic synthesis is a must in order to further improve the path delay.

The Mountain Mover Heuristic The goal is a heuristic that makes decisions based on the whole timing picture during post-placement logic synthesis, which avoids many futile logic restructurings and keeps the flow efficient. We build a slack distribution graph to guide the logic rewiring technique so that logic resources are shifted efficiently. A slack distribution graph is a mapping from the locations of circuit cells to their slack values.
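The mapping described above can be pictured with a small sketch. It is only an illustration: the grid coordinates, the function names `build_slack_graph` and `most_critical_location`, and the choice to keep the worst slack when several cells share a location are assumptions, not details taken from the slides.

```python
from typing import Dict, Iterable, Tuple

Location = Tuple[int, int]


def build_slack_graph(cells: Iterable[Tuple[int, int, float]]) -> Dict[Location, float]:
    """Map each (x, y) location to a slack value.

    Assumption: if several cells fall on the same location (for example
    after binning), the worst (smallest) slack is kept.
    """
    graph: Dict[Location, float] = {}
    for x, y, slack in cells:
        loc = (x, y)
        graph[loc] = min(slack, graph.get(loc, slack))
    return graph


def most_critical_location(graph: Dict[Location, float]) -> Location:
    """Return the location with the smallest slack (the peak of the 'mountain')."""
    return min(graph, key=graph.get)


# Hypothetical cells: (x, y, slack)
cells = [(0, 0, 0.4), (1, 0, -0.2), (1, 1, -0.9), (2, 1, 0.1), (1, 1, -0.3)]
graph = build_slack_graph(cells)
peak = most_critical_location(graph)
```

With a structure like this, the most violating region can be looked up directly, which is the kind of whole-picture view the heuristic relies on.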

The Mountain Mover Heuristic An example of such a graph for an industrial customer design, created by an industrial EDA tool [13].

The Mountain Mover Heuristic

Preliminaries and problem definition The circuit delay is defined as the largest signal delay among all paths from any primary input (PI) to any primary output (PO). The slack of a pin/gate is the difference (gap) between its required arrival time and its arrival time; a negative slack indicates a timing violation. The slack of a path is the difference (gap) between the timing constraint and the path delay. The slack of a wire (i, j) is the worst slack between i and j. The worst negative slack (WNS) of the circuit is the largest violation among all slacks in the circuit.
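A minimal numeric sketch of these definitions follows. It assumes the usual sign convention (slack = required arrival time minus arrival time, so negative slack is a violation) and reports a WNS of 0 when nothing violates; the pin names and timing values are made up for illustration.

```python
def slack(arrival: float, required: float) -> float:
    """Slack of a pin/gate: required arrival time minus arrival time."""
    return required - arrival


def worst_negative_slack(slacks) -> float:
    """WNS: the most negative slack, or 0.0 if no slack is negative (assumption)."""
    return min(0.0, min(slacks))


# Hypothetical pins: name -> (arrival time, required arrival time)
pins = {"a": (2.0, 3.0), "b": (5.0, 4.5), "c": (7.0, 6.0)}
pin_slacks = {name: slack(at, rat) for name, (at, rat) in pins.items()}
wns = worst_negative_slack(pin_slacks.values())
```

Here pin `c` arrives one time unit late, so it sets the WNS of this toy example.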

Logic rewiring Logic rewiring [14] is a circuit restructuring technique that replaces a certain wire, the target wire (TW), with another wire, the alternative wire (AW), without changing the functionality of the circuit. The addition and removal of wires are sometimes accompanied by changes in the logic gates corresponding to the AW and TW.

An example of logic rewiring

Problem definition Given a logic netlist, its physical placement, and the timing constraints, the restructuring-based post-placement timing optimization problem is to maximize the worst slack of the circuit by restructuring the circuit locally while (i) preserving its functionality and (ii) keeping the placement free of overlaps.

Post-placement timing optimization using logic rewiring The overall framework: (1) Compute the slack of each gate/wire to construct the slack distribution graph and the slack mountain. (2) Run the rewiring engine [11] to remove the target wire, guided by the slack distribution graph. (3) Perform an overlap-free incremental placement after the logic restructuring to keep the placement legal. (4) Run STA to confirm the effectiveness of the selected AW in reducing WNS.
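The framework can be read as an accept-if-better loop. The sketch below is a toy rendering of that reading, not the authors' implementation: every callable (`pick_target`, `propose`, `commit`, `legalize`, `sta_wns`) is a hypothetical stand-in for the slack analysis, the rewiring engine, the incremental placer, and STA, and the "circuit" is just a tuple of slack values.

```python
def rewiring_flow(state, pick_target, propose, commit, legalize, sta_wns, max_iters=10):
    """Repeatedly try alternative wires and keep one only if STA confirms a better WNS."""
    wns = sta_wns(state)
    for _ in range(max_iters):
        if wns >= 0:
            break  # no violating path left
        tw = pick_target(state)
        if tw is None:
            break
        improved = False
        for aw in propose(state, tw):
            trial = legalize(commit(state, tw, aw))  # rewire, then keep placement legal
            trial_wns = sta_wns(trial)               # confirm the effect with STA
            if trial_wns > wns:
                state, wns, improved = trial, trial_wns, True
                break
        if not improved:
            break
    return state, wns


# Toy stand-ins (all hypothetical): the state is a tuple of slacks.
def sta_wns(state):
    return min(0.0, min(state))

def pick_target(state):
    worst = min(range(len(state)), key=lambda i: state[i])
    return worst if state[worst] < 0 else None

def propose(state, tw):
    return [-0.5, 0.75]  # candidate slack changes, standing in for AW candidates

def commit(state, tw, aw):
    new = list(state)
    new[tw] += aw
    return tuple(new)

def legalize(state):
    return state  # placement is trivially legal in this toy model

final_state, final_wns = rewiring_flow((-1.0, 0.5), pick_target, propose, commit, legalize, sta_wns)
```

In the toy run the first candidate is rejected each round because it worsens the WNS, and the second is accepted, which mirrors the confirm-with-STA step of the framework.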

Rule of Thumb: Boundary First The key idea of our scheme is to identify a local slack mountain around the most critical path and to shift logic from the boundaries of the slack mountain first, to avoid getting stuck at a local minimum.

Shift Logic to Less Critical Area When we attempt to remove a target wire, the logic rewiring engine usually suggests more than one AW that can be chosen as the alternative logic. We use the slack distribution graph to guarantee that the original logic is shifted from the critical area to a less critical area.

Shift Logic to Less Critical Area (xg, yg): the gradient value of TW in the slack distribution graph. (dx, dy): the distance vector between TW and AW.

Shift Logic to Less Critical Area The first inequality ensures that the alternative logic is located in the less critical area. The second inequality ensures that the alternative logic is located along the gradient direction of TW.
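The two inequalities themselves are not reproduced in this transcript. The sketch below therefore encodes one plausible reading, stated as an assumption: (1) the slack at the AW location is larger than the slack at the TW location, and (2) the distance vector (dx, dy) has a positive dot product with the gradient (xg, yg), taking the gradient to point toward increasing slack. The function name and sample data are hypothetical.

```python
def shifts_to_less_critical(gradient, tw_loc, aw_loc, slack_at):
    """Check an AW candidate against the two assumed conditions.

    gradient: (xg, yg), gradient of the slack distribution graph at TW
    tw_loc, aw_loc: (x, y) locations of the target and alternative wires
    slack_at: callable returning the slack at a location
    """
    xg, yg = gradient
    dx, dy = aw_loc[0] - tw_loc[0], aw_loc[1] - tw_loc[1]
    less_critical = slack_at(aw_loc) > slack_at(tw_loc)   # assumed condition 1
    along_gradient = xg * dx + yg * dy > 0                # assumed condition 2
    return less_critical and along_gradient


# Hypothetical slack values at three locations
slack_map = {(0, 0): -0.8, (2, 1): -0.1, (-1, 0): 0.3}
ok_forward = shifts_to_less_critical((1.0, 0.5), (0, 0), (2, 1), slack_map.get)
ok_backward = shifts_to_less_critical((1.0, 0.5), (0, 0), (-1, 0), slack_map.get)
```

The second candidate has better slack than TW but lies against the gradient direction, so under this reading it is rejected.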

Incremental Placement The incremental placement places the replacement gate at the same location as the gate it replaces. If the gate change would cause an overlap, the original gate is not replaced; instead, a new gate such as AND2 or OR2 is inserted into the circuit while keeping the placement free of overlaps.

Slack Estimation We define the local worst negative slack (LWNS) and the local total negative slack (LTNS) of a set of gates G as below to help evaluate the AW candidates. pt(·) denotes the set of primary-output gates that belong to the fanout cone of a gate gi in G.

Slack Estimation

We adopt an AW if: (1) the LWNS of the rewired circuit is no worse than that of the original circuit, and (2) the AW gains the most LTNS improvement among all AW candidates. The first condition guarantees that the delay of the rewired circuit does not degrade. The second condition helps reduce the delay of all paths.
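The adoption rule can be sketched as a filter followed by an argmax. Since the LWNS and LTNS formulas are not reproduced in this transcript, the sketch assumes LWNS is the most negative slack over the affected primary outputs and LTNS is the sum of their negative slacks; requiring a strictly positive LTNS gain is also an added assumption. Candidate names and slack values are invented.

```python
def lwns(po_slacks):
    """Assumed LWNS: most negative slack over the affected primary outputs (0 if none)."""
    return min(0.0, min(po_slacks))


def ltns(po_slacks):
    """Assumed LTNS: sum of the negative slacks over the affected primary outputs."""
    return sum(s for s in po_slacks if s < 0)


def choose_aw(original, candidates):
    """Return the candidate that keeps LWNS no worse and improves LTNS the most."""
    base_lwns, base_ltns = lwns(original), ltns(original)
    best, best_gain = None, 0.0
    for name, slacks in candidates.items():
        if lwns(slacks) < base_lwns:
            continue  # condition 1 violated: local worst slack got worse
        gain = ltns(slacks) - base_ltns
        if gain > best_gain:  # condition 2: largest LTNS improvement
            best, best_gain = name, gain
    return best


# Hypothetical primary-output slacks before rewiring and for each AW candidate
original = [-1.0, -0.5, 0.25]
candidates = {
    "aw1": [-1.25, 0.0, 0.25],   # worse LWNS, rejected
    "aw2": [-0.75, -0.5, 0.25],  # LTNS gain 0.25
    "aw3": [-0.5, -0.25, 0.0],   # LTNS gain 0.75
}
chosen = choose_aw(original, candidates)
```

Because only the fanout cones of the rewired gates are evaluated, a check like this can rank candidates without a full-circuit timing update.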

Shifting logic for timing optimization

Experimental result The experiments are based on the IWLS 2005 benchmarks; initial placements were generated by Cadence SoC Encounter 10.1. The Cadence 180 nm generic library is used, and circuit timing is estimated by STA with the same library.

Experimental result

Conclusions We proposed a novel framework that uses logic rewiring to eliminate or improve slack-violating paths. Extensive experiments on the IWLS 2005 benchmarks show that the scheme improves circuit delay by 14.1% on average and achieves a 7X speedup compared to a recent work. We believe that the scheme can be integrated with other strategies, including both logical and physical techniques, to gain a further delay reduction.