
Placement and Routing Algorithms

2 FPGA Placement & Routing

3 Placement
Placement:
- Determines the amount of interconnect in the design
  - Interconnect has now become the bottleneck of circuit performance
- Has a large impact on performance and routability
  - The impact is even greater in FPGAs, because connections pass through programmable switches

4 Placement Impact on Routing

5 Simulated Annealing [Bazargan]

6 Simulated Annealing

7 Search Space Cost

8 Simulated Annealing

9 Temperature Reduction Function

10 Cost Decrease

11 Accepted Moves

12 SA Parameters
The quality of results is highly dependent on the parameters:
- Initial temperature
- Final temperature
- Inner-loop criterion
- Cooling schedule
- Move function
- Cost function
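The parameters listed above map directly onto a generic annealing loop. The sketch below is a minimal, illustrative Python version, not the lecture's implementation. It assumes three things: geometric cooling (the temperature is multiplied by alpha after each temperature step), a fixed number of moves per temperature as the inner-loop criterion, and the standard Metropolis acceptance rule. All function names, parameter names, default values and the toy six-cell chain problem are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(
    initial: S,
    cost: Callable[[S], float],
    move: Callable[[S, random.Random], S],
    t_initial: float = 10.0,    # initial temperature (assumed value)
    t_final: float = 0.01,      # final temperature: stop once T drops below it
    moves_per_temp: int = 100,  # inner-loop criterion: fixed move count
    alpha: float = 0.9,         # cooling schedule: T <- alpha * T
    seed: int = 0,
) -> Tuple[S, float]:
    """Generic annealer; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_initial
    while t > t_final:
        for _ in range(moves_per_temp):
            candidate = move(current, rng)  # move function proposes a neighbor
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: always accept improvements; accept uphill
            # moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost


# Hypothetical toy placement: cells 0..5 in a row of 6 slots, with a
# two-pin net between each pair of consecutive cells (i, i + 1).
def chain_wirelength(order: List[int]) -> float:
    slot = {cell: s for s, cell in enumerate(order)}
    return float(sum(abs(slot[i] - slot[i + 1]) for i in range(len(order) - 1)))


def swap_two_cells(order: List[int], rng: random.Random) -> List[int]:
    i, j = rng.sample(range(len(order)), 2)
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order


start = [0, 3, 5, 1, 4, 2]  # initial wirelength 14; the optimum is 5
best, best_cost = simulated_annealing(start, chain_wirelength, swap_two_cells)
```

The function returns the best solution seen instead of the final state. This is a common safeguard, because the walk may still accept uphill moves late in the schedule.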

13 Placement Algorithm: VPR
VPR:
- Uses simulated annealing
- Temperature updating:
  - Decreases the temperature faster when the move acceptance rate is very high or very low
  - Spends more time in the most productive temperature region (i.e., when a significant portion of moves, but not all, are being accepted)
- Uses a linear congestion model
  - The cost function can be computed as efficiently as the traditional half-perimeter bounding-box cost
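Below is a minimal sketch of the two VPR ideas on this slide, under stated assumptions. The acceptance-rate thresholds and cooling factors are the values commonly reported for VPR's adaptive schedule; treat them as illustrative. The cost function is the plain half-perimeter bounding-box wirelength (HPWL). As described in the VPR literature, the linear congestion model additionally normalizes each net's x and y spans by the average channel capacity over its bounding box; this sketch omits that step. The function names and the example placement are hypothetical.

```python
from typing import Dict, List, Tuple


def vpr_style_alpha(accept_rate: float) -> float:
    """Cooling factor chosen from the fraction of moves accepted at the
    current temperature; the next temperature is alpha * T."""
    if accept_rate > 0.96:
        return 0.5   # almost every move accepted: T is too high, cool fast
    if accept_rate > 0.8:
        return 0.9
    if accept_rate > 0.15:
        return 0.95  # productive region: cool slowly and spend time here
    return 0.8       # few moves accepted: little left to gain, cool faster


def bounding_box_cost(nets: List[List[str]],
                      pos: Dict[str, Tuple[int, int]]) -> int:
    """Sum over all nets of the half-perimeter of the pins' bounding box."""
    total = 0
    for pins in nets:
        xs = [pos[block][0] for block in pins]
        ys = [pos[block][1] for block in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical placement of three blocks on a grid, and two nets.
placement = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [["a", "b"], ["a", "b", "c"]]
wirelength = bounding_box_cost(nets, placement)  # (3 + 1) + (3 + 4) = 11
```

Each net costs one pass over its pins, which is why a bounding-box-style cost is cheap enough to re-evaluate incrementally for every proposed move.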

14 Timing-Driven VPR
Timing-driven VPR:
- Adds a timing cost term to the objective function
- Timing cost = sum, over all connections in the design, of each connection's delay weighted by its timing criticality
- Criticality of a connection:
  - Depends on its timing slack
- Every accepted move may change the delays of a number of connections
  - This changes the slack distribution
  - Static timing analysis would be required to recompute all slacks after each accepted move
  - That is too costly in run time
  - Instead, slacks are updated after a number of moves (typically at the end of each temperature iteration)
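One way to make the timing cost concrete follows the form commonly described for timing-driven VPR. Criticality is 1 - slack / D_max, where D_max is the longest path delay. Each connection then contributes delay * criticality ** crit_exp to the timing cost. The exponent crit_exp, the function names and the (delay, slack) data are assumptions for illustration. Slacks are treated as fixed inputs from the last timing analysis, matching the slide's point that they are refreshed only periodically.

```python
from typing import List, Tuple


def criticality(slack: float, d_max: float) -> float:
    """1 for a zero-slack connection, decreasing toward 0 as slack grows."""
    return 1.0 - slack / d_max


def timing_cost(connections: List[Tuple[float, float]],
                d_max: float, crit_exp: float = 1.0) -> float:
    """Sum over connections of delay * criticality ** crit_exp.

    connections: (delay, slack) pairs taken from the last static timing
    analysis; they are not recomputed here after each move.
    """
    return sum(delay * criticality(slack, d_max) ** crit_exp
               for delay, slack in connections)


conns = [(2.0, 0.0), (1.0, 5.0)]  # hypothetical (delay, slack) pairs
cost_linear = timing_cost(conns, d_max=10.0)               # 2*1 + 1*0.5
cost_sharp = timing_cost(conns, d_max=10.0, crit_exp=8.0)  # 2*1 + 1*0.5**8
```

Raising crit_exp concentrates the cost on near-critical connections. With crit_exp = 8, the connection with 5 units of slack contributes only 0.5 ** 8, about 0.004.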

15 Timing-Driven VPR
Timing-driven VPR:
- Two terms in the objective function
  - Requires careful scaling
- Self-normalization:
  - The changes in timing cost and wirelength caused by a move are scaled by the total timing cost and total wirelength at the end of the previous temperature iteration
- Very good results
- Widely used in the FPGA research community
Problem:
- TD VPR minimizes the weighted delays of ALL connections, where the weight of a connection depends on its slack
- It ignores path sharing, which is important:
  - A connection appearing in many critical paths should be given a higher weight
- Path sharing is difficult to capture:
  - An exponential number of paths can pass through a connection, each with a different timing criticality
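The self-normalization can be written as a single cost change per move. In this sketch, lam is a hypothetical trade-off weight between the two terms; the slide does not name one. The previous-iteration totals are passed in explicitly, and all names and numbers are illustrative.

```python
def normalized_delta_cost(d_timing: float, d_wiring: float,
                          prev_timing: float, prev_wiring: float,
                          lam: float = 0.5) -> float:
    """Self-normalized cost change of one move.

    prev_timing / prev_wiring: total timing cost and total wirelength at the
    end of the previous temperature iteration. lam: hypothetical weight.
    A positive result means the move is uphill.
    """
    return (lam * d_timing / prev_timing
            + (1.0 - lam) * d_wiring / prev_wiring)


# Hypothetical move: timing cost +1 (out of 10), wirelength -2 (out of 100).
delta = normalized_delta_cost(1.0, -2.0, prev_timing=10.0, prev_wiring=100.0)
```

Each change is divided by its own running total, so both terms become dimensionless fractions. Neither term dominates just because timing and wirelength use different units. In the example, the move worsens timing by 10% and improves wirelength by 2%, giving 0.5 * 0.1 - 0.5 * 0.02 = 0.04.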

16 PathFinder/VPR
A new enhancement: Ken Eguro and Scott Hauck, “Enhancing Timing-Driven FPGA Placement for Pipelined Netlists,” DAC, 2008.

17 A Novel Net Weighting Algorithm for Timing-Driven Placement
Tim (Tianming) Kong, ICCAD 2002

18 Net-Weighting Approaches
Net weighting for timing-driven placement:
- Very popular in industry and academia
Advantages:
- Low complexity
- High flexibility
- Ease of implementation

19 Net-Weighting Approaches
Basic idea:
- Assign higher weights to nets that are more timing-critical
- A good net-weighting algorithm should assign net weights based on path analysis
- Can be used with any length-based placement algorithm
Results (compared with the weighting algorithm of VPR):
- Longest path delay reduced by up to 38.8%, and by 15.6% on average
- No runtime overhead
- Only a 4.1% increase in total wirelength

20 Net-Weighting Approaches
Timing-driven placement:
- Usually employs a static timing analysis engine to compute the delay of the longest path in a circuit
- The results are then used by the placement optimization engine to minimize the longest path delay
Longest path delay:
- Computed by propagating arrival times iteratively from each node's fanins to the node (figure: nodes s1, s2, s3, s4 driving node t)
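The iterative arrival-time computation can be sketched as one pass in topological order. ARR(v) is the maximum over fanins u of ARR(u) + d(u,v), and the longest path delay is the largest arrival time. This minimal sketch makes three assumptions: the circuit is a combinational DAG (register boundaries already cut), primary inputs have zero arrival time, and edge delays come as a dictionary. It uses Python's standard graphlib module (3.9+). The names and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple


def arrival_times(delays: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """ARR(v) = max over fanins u of ARR(u) + d(u, v); ARR = 0 at inputs."""
    preds: Dict[str, Set[str]] = {}
    for u, v in delays:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    arr: Dict[str, float] = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        arr[v] = max((arr[u] + delays[(u, v)] for u in preds[v]), default=0.0)
    return arr


# Hypothetical circuit graph: edge (u, v) -> delay of that connection.
delays = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "d"): 3.0, ("c", "d"): 1.0}
arr = arrival_times(delays)
longest_path_delay = max(arr.values())  # path a -> b -> d
```

Each edge is examined once, so the pass is linear in circuit size. This is what makes repeated timing analysis during placement affordable at all.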

21 Net-Weighting Approaches
Main idea:
- If two critical paths share a common segment, the edges in the common segment should receive higher weights
- Path counting is a general way to assign net weights that takes this effect into account
Two path-counting algorithms:
- Critical path counting
- Accurate all-path counting

22 Critical Path Counting
After timing analysis:
- Compute the total number of critical paths passing through each edge
For each pin p, define two variables:
- F(p): the number of different critical paths that start at PI elements and terminate at p
- B(p): the number of different critical paths that start at PO elements and terminate at p, if all signal flow directions are reversed

23 Critical Path Counting
- The number of different critical paths passing through an edge (s,t):
  - GP(s,t) = F(s) * B(t)
- Weights are assigned to nets in proportion to GP(s,t)
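Here is a minimal sketch of critical path counting using this slide's definitions. It assumes that timing analysis has already selected the critical edges (the threshold for "critical" is not specified here), that the critical subgraph is acyclic, and that primary inputs and outputs are given as sets. The function name and the example graph are hypothetical. F is accumulated in topological order and B in reverse topological order, so the counts take linear time instead of enumerating paths.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def critical_path_counts(critical_edges: List[Edge], pis: Set[str],
                         pos: Set[str]) -> Dict[Edge, int]:
    """Return GP(s,t) = F(s) * B(t) for every critical edge (s,t)."""
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    nodes = set(pis) | set(pos)
    for s, t in critical_edges:
        preds[t].append(s)
        succs[s].append(t)
        nodes.update((s, t))
    order = list(TopologicalSorter({n: preds[n] for n in nodes}).static_order())

    # F(p): critical paths that start at any PI and end at p.
    f = {n: (1 if n in pis else 0) for n in nodes}
    for p in order:
        for s in preds[p]:
            f[p] += f[s]

    # B(p): critical paths from p to any PO (PO-to-p in the reversed graph).
    b = {n: (1 if n in pos else 0) for n in nodes}
    for p in reversed(order):
        for t in succs[p]:
            b[p] += b[t]

    return {(s, t): f[s] * b[t] for s, t in critical_edges}


# Hypothetical critical subgraph: two PIs feed s, and t fans out to two POs.
edges = [("pi1", "s"), ("pi2", "s"), ("s", "t"), ("t", "po1"), ("t", "po2")]
gp = critical_path_counts(edges, pis={"pi1", "pi2"}, pos={"po1", "po2"})
```

The example has the same shape as the case on slide 27: two critical paths reach s and two leave t, so GP(s,t) = 2 * 2 = 4. Net weights can then be set in proportion to these counts.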

24 Critical Path Counting Example
F(t) computations (figure: circuit graph from PI to PO)

25 Critical Path Counting Example
B(t) computations (figure: circuit graph from PI to PO)

26 Pseudo code

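27 Critical Path Counting Example
- For the selected edge (s,t), F(s) = 2 critical paths reach s and B(t) = 2 critical paths continue from t
- GP(s,t) = F(s) * B(t) = 2 * 2 = 4 critical paths pass through (s,t)
(figure: circuit graph from PI to PO containing edge (s,t))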

28 Accurate All Path Counting
Define two variables:
- Forward local slack for edge e(s,t): Fs(s,t)
- Backward local slack for edge e(s,t)
Define a discount function:
- D(x) = a^(-x/T), where a is a constant (the form used in the example on slide 31)
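The formulas on this slide did not survive extraction. Below is a reconstruction written to be consistent with the worked example on slide 31, which uses D = a^(-Fs/T) and F(t) = D1*F(s1) + ... + D4*F(s4). The forward local slack is taken to be the gap between t's arrival time and the arrival through edge (s,t). The backward quantities, where REQ denotes required times and d(s,t) the edge delay, are written by symmetry. They are an assumption, not a transcription.

```latex
\begin{aligned}
F_s(s,t) &= \mathrm{ARR}(t) - \mathrm{ARR}(s) - d(s,t) \\
B_s(s,t) &= \mathrm{REQ}(t) - d(s,t) - \mathrm{REQ}(s) \\
D(x)     &= a^{-x/T} \\
F(t)     &= \sum_{s \in \mathrm{fanin}(t)} D\big(F_s(s,t)\big)\, F(s) \\
B(s)     &= \sum_{t \in \mathrm{fanout}(s)} D\big(B_s(s,t)\big)\, B(t)
\end{aligned}
```

A zero-slack edge has D = 1, so all-path counting then reduces to the plain sums of critical path counting. Edges with slack discount each path they carry.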

29 Example (figure: example graph starting from a PI)

30 Discount function
Objective: put a smaller weight on paths with larger slack
(figure: D plotted against x, with the D axis marked at 1 and 1/a)

31 Accurate All Path Counting
F(t) computations, assuming a = 2, T = 10 and D = a^(-Fs/T):
- Figure: node t with fanins s1, s2, s3, s4, annotated with arrival times ARR = 1.2, 1.4, 1.3, (value missing) and 2.8
- Forward local slacks: Fs(s1,t) = 0.6, Fs(s2,t) = 0.4, Fs(s3,t) = 0, Fs(s4,t) = 0
- Discounts: D1 = 0.959, D2 = 0.972, D3 = 1, D4 = 1
- F(t) = D1*F(s1) + D2*F(s2) + D3*F(s3) + D4*F(s4)
Because Fs(s1,t) is larger, D1 is lower.
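The forward pass of accurate all-path counting can be sketched in Python under the following assumptions. F = 1 at primary inputs, and F(t) is the sum of D(Fs(s,t)) * F(s) over fanins. The local slack Fs(s,t) = ARR(t) - ARR(s) - d(s,t) is inferred from the example values, not stated on the slide. The slide gives neither edge delays nor the fourth arrival time, so the delays and ARR(s4) = 1.8 below are hypothetical; they were chosen so the local slacks come out as 0.6, 0.4, 0 and 0. Treating s1 to s4 as primary inputs is also an assumption. With a = 2 and T = 10, the resulting discounts are close to the slide's D1 = 0.959 and D2 = 0.972.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def discount(x: float, a: float, t_norm: float) -> float:
    """D(x) = a ** (-x / T): 1 at zero slack, 1/a at x = T."""
    return a ** (-x / t_norm)


def forward_all_path_counts(delays: Dict[Edge, float],
                            input_arrival: Dict[str, float],
                            a: float = 2.0, t_norm: float = 10.0
                            ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (ARR, F) with F(t) = sum over fanins s of D(Fs(s,t)) * F(s)."""
    preds: Dict[str, Set[str]] = {}
    for s, t in delays:
        preds.setdefault(s, set())
        preds.setdefault(t, set()).add(s)
    arr: Dict[str, float] = {}
    f: Dict[str, float] = {}
    for t in TopologicalSorter(preds).static_order():
        if not preds[t]:  # primary input (assumed F = 1)
            arr[t] = input_arrival.get(t, 0.0)
            f[t] = 1.0
            continue
        arr[t] = max(arr[s] + delays[(s, t)] for s in preds[t])
        f[t] = 0.0
        for s in preds[t]:
            # Forward local slack; clamp tiny negative float round-off to 0.
            slack = max(0.0, arr[t] - arr[s] - delays[(s, t)])
            f[t] += discount(slack, a, t_norm) * f[s]
    return arr, f


# Hypothetical delays; ARR(s4) = 1.8 is made up (missing on the slide).
delays = {("s1", "t"): 1.0, ("s2", "t"): 1.0, ("s3", "t"): 1.5, ("s4", "t"): 1.0}
input_arrival = {"s1": 1.2, "s2": 1.4, "s3": 1.3, "s4": 1.8}
arr, f = forward_all_path_counts(delays, input_arrival)
```

Unlike critical path counting, near-critical fanins still contribute here, just at a discount. In this example, F(t) is about 3.93 instead of the 4 an undiscounted count would give.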

32 Pseudo code:

33 References
- [Chen06] D. Chen, J. Cong, and P. Pan, “FPGA Design Automation: A Survey,” Foundations and Trends in Electronic Design Automation, vol. 1, no. 3, pp. 195–330, 2006.
- T. Kong, “A Novel Net Weighting Algorithm for Timing-Driven Placement,” ICCAD, 2002.
- [Bazargan] K. Bazargan, lecture slides.