A Semi-Persistent Clustering Technique for VLSI Circuit Placement Charles J. Alpert 1, Andrew Kahng 2, Gi-Joon Nam 1, Sherief Reda 2 and Paul G. Villarrubia.

Slides:



Advertisements
Similar presentations
Capo: Robust and Scalable Open-Source Min-cut Floorplacer Jarrod A. Roy, David A. Papa,Saurabh N. Adya, Hayward H. Chan, James F. Lu, Aaron N. Ng, Igor.
Advertisements

Buffer and FF Insertion Slides from Charles J. Alpert IBM Corp.
3D-STAF: Scalable Temperature and Leakage Aware Floorplanning for Three-Dimensional Integrated Circuits Pingqiang Zhou, Yuchun Ma, Zhouyuan Li, Robert.
Natarajan Viswanathan Min Pan Chris Chu Iowa State University International Symposium on Physical Design April 6, 2005 FastPlace: An Analytical Placer.
Meng-Kai Hsu, Sheng Chou, Tzu-Hen Lin, and Yao-Wen Chang Electronics Engineering, National Taiwan University Routability Driven Analytical Placement for.
A Size Scaling Approach for Mixed-size Placement Kalliopi Tsota, Cheng-Kok Koh, Venkataramanan Balakrishnan School of Electrical and Computer Engineering.
FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model FastPlace: Efficient Analytical Placement.
Placer Suboptimality Evaluation Using Zero-Change Transformations Andrew B. Kahng Sherief Reda VLSI CAD lab UCSD ECE and CSE Departments.
Intrinsic Shortest Path Length: A New, Accurate A Priori Wirelength Estimator Andrew B. KahngSherief Reda VLSI CAD Laboratory.
APLACE: A General and Extensible Large-Scale Placer Andrew B. KahngSherief Reda Qinke Wang VLSICAD lab University of CA, San Diego.
Layer Assignment Algorithm for RLC Crosstalk Minimization Bin Liu, Yici Cai, Qiang Zhou, Xianlong Hong Tsinghua University.
Minimum Spanning Tree Algorithms
VLSI Layout Algorithms CSE 6404 A 46 B 65 C 11 D 56 E 23 F 8 H 37 G 19 I 12J 14 K 27 X=(AB*CD)+ (A+D)+(A(B+C)) Y = (A(B+C)+AC+ D+A(BC+D)) Dr. Md. Saidur.
Power-Aware Placement
Chapter 2 – Netlist and System Partitioning
Architecture and Details of a High Quality, Large-Scale Analytical Placer Andrew B. Kahng, Sherief Reda and Qinke Wang VLSI CAD Lab University of California,
Supply Voltage Degradation Aware Analytical Placement Andrew B. Kahng, Bao Liu and Qinke Wang UCSD CSE Department {abk, bliu,
Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported.
Reconfigurable Computing (EN2911X, Fall07)
Placement Feedback: A Concept and Method for Better Min-Cut Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La.
1 A Tale of Two Nets: Studies in Wirelength Progression in Physical Design Andrew B. Kahng Sherief Reda CSE Department University of CA, San Diego.
Can Recursive Bisection Alone Produce Routable Placements? Andrew E. Caldwell Andrew B. Kahng Igor L. Markov Supported by Cadence.
An Algebraic Multigrid Solver for Analytical Placement With Layout Based Clustering Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng, Bo Yao, Zhengyong Zhu.
Accurate Pseudo-Constructive Wirelength and Congestion Estimation Andrew B. Kahng, UCSD CSE and ECE Depts., La Jolla Xu Xu, UCSD CSE Dept., La Jolla Supported.
Circuit Simulation Based Obstacle-Aware Steiner Routing Yiyu Shi, Paul Mesa, Hao Yu and Lei He EE Department, UCLA Partially supported by NSF Career Award.
Partitioning 1 Outline –What is Partitioning –Partitioning Example –Partitioning Theory –Partitioning Algorithms Goal –Understand partitioning problem.
Partitioning Outline –What is Partitioning –Partitioning Example –Partitioning Theory –Partitioning Algorithms Goal –Understand partitioning problem –Understand.
ECE 250 Algorithms and Data Structures Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo,
Chapter 3: Cluster Analysis  3.1 Basic Concepts of Clustering  3.2 Partitioning Methods  3.3 Hierarchical Methods The Principle Agglomerative.
CS745: Register Allocation© Seth Copen Goldstein & Todd C. Mowry Register Allocation.
Escape Routing For Dense Pin Clusters In Integrated Circuits Mustafa Ozdal, Design Automation Conference, 2007 Mustafa Ozdal, IEEE Trans. on CAD, 2009.
Surface Simplification Using Quadric Error Metrics Michael Garland Paul S. Heckbert.
Charles Kime & Thomas Kaminski © 2004 Pearson Education, Inc. Terms of Use (Hyperlinks are active in View Show mode) Terms of Use Lecture 12 – Design Procedure.
CRISP: Congestion Reduction by Iterated Spreading during Placement Jarrod A. Roy†‡, Natarajan Viswanathan‡, Gi-Joon Nam‡, Charles J. Alpert‡ and Igor L.
TSV-Aware Analytical Placement for 3D IC Designs Meng-Kai Hsu, Yao-Wen Chang, and Valerity Balabanov GIEE and EE department of NTU DAC 2011.
March 20, 2007 ISPD An Effective Clustering Algorithm for Mixed-size Placement Jianhua Li, Laleh Behjat, and Jie Huang Jianhua Li, Laleh Behjat,
1 Wire Length Prediction-based Technology Mapping and Fanout Optimization Qinghua Liu Malgorzata Marek-Sadowska VLSI Design Automation Lab UC-Santa Barbara.
Analytic Placement. Layout Project:  Sending the RTL file: −Thursday, 27 Farvardin  Final deadline: −Tuesday, 22 Ordibehesht  New Project: −Soon 2.
Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.
10/25/ VLSI Physical Design Automation Prof. David Pan Office: ACES Lecture 3. Circuit Partitioning.
Jason Cong‡†, Guojie Luo*†, Kalliopi Tsota‡, and Bingjun Xiao‡ ‡Computer Science Department, University of California, Los Angeles, USA *School of Electrical.
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 6: Detailed Routing © KLMH Lienig 1 What Makes a Design Difficult to Route Charles.
Session 10: The ISPD2005 Placement Contest. 2 Outline  Benchmark & Contest Introduction  Individual placement presentation  FastPlace, Capo, mPL, FengShui,
Register Placement for High- Performance Circuits M. Chiang, T. Okamoto and T. Yoshimura Waseda University, Japan DATE 2009.
Quadratic VLSI Placement Manolis Pantelias. General Various types of VLSI placement  Simulated-Annealing  Quadratic or Force-Directed  Min-Cut  Nonlinear.
1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 3 Partitioning Mustafa Ozdal Computer Engineering Department, Bilkent University Mustafa.
Optimality, Scalability and Stability study of Partitioning and Placement Algorithms Jason Cong, Michail Romesis, Min Xie UCLA Computer Science Department.
Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful Jiri Bittner 1, Michael Wimmer 1, Harald Piringer 2, Werner Purgathofer 1 1 Vienna.
International Symposium on Physical Design San Diego, CA April 2002ER UCLA UCLA 1 Routability Driven White Space Allocation for Fixed-Die Standard-Cell.
2/22/2016© Hal Perkins & UW CSEP-1 CSE P 501 – Compilers Register Allocation Hal Perkins Winter 2008.
CprE566 / Fall 06 / Prepared by Chris ChuPartitioning1 CprE566 Partitioning.
-1- UC San Diego / VLSI CAD Laboratory Optimization of Overdrive Signoff Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li and Siddhartha Nath Tuck-Boon Chan,
May Mike Drob Grant Furgiuele Ben Winters Advisor: Dr. Chris Chu Client: IBM IBM Contact – Karl Erickson.
Hypergraph Partitioning With Fixed Vertices Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov UCLA Computer Science Department
RTL Design Flow RTL Synthesis HDL netlist logic optimization netlist Library/ module generators physical design layout manual design a b s q 0 1 d clk.
CSE 373: Data Structures and Algorithms Lecture 21: Graphs V 1.
Shortest Path -Prim’s -Djikstra’s. PRIM’s - Minimum Spanning Tree -A spanning tree of a graph is a tree that has all the vertices of the graph connected.
SRAM Yield Rate Optimization EE201C Final Project
Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts.
APLACE: A General and Extensible Large-Scale Placer
Iterative Deletion Routing Algorithm
Data Structures and Algorithms
Outline This topic covers Prim’s algorithm:
Design Hierarchy Guided Multilevel Circuit Partitioning
A Semi-Persistent Clustering Technique for VLSI Circuit Placement
Birch presented by : Bahare hajihashemi Atefeh Rahimi
CSE 373 Data Structures and Algorithms
(via graph coloring and spilling)
More Graphs Lecture 19 CS2110 – Fall 2009.
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

A Semi-Persistent Clustering Technique for VLSI Circuit Placement Charles J. Alpert 1, Andrew Kahng 2, Gi-Joon Nam 1, Sherief Reda 2 and Paul G. Villarrubia 1 1 IBM Corp. 2 Department of CSE, UCSD

2 bigblue4 design from ISPD2005 Suite

3 Implications in Placement  Scalability  Tractability  Runtime vs. quality trade-off  SoC (System-on-Chip) designs  Mixed-size objects  White space

4 Problem Statement  What is the most effective and efficient clustering strategy for analytic placement?  Quality of solution  CPU time

5 Clustering Concept A D E F C B Cluster A with its “closest neighbor” A D E F C B AC D E F B Update the circuit netlist Clustering Score Function: d(u, v) =  w ij  conn(u,v) [ size(u) + size(v) ] k

6 Clustering Literature  Tremendous amounts of research here  Edge-Coarsening (EC)  First-Choice (FC)  Edge-Separability (ESC)  Peak-Clustering  Etc…  General drawbacks  Clique transformation  Edge weight discrepancy  Pass-based iteration  Lack of global clustering view

7 Best-Choice Clustering  Avoid clique transformation  Avoid pass-based iterations  More global view of clustering sequence  Priority-queue management  Lazy-update speed-up technique  Area-controlled balanced clustering

8 Best-Choice Clustering 1.Initialize the priority-queue PQ: - For each cell u: calculate its clustering score c with its closest neighbor v. - Insert the pair (u, v) into PQ based on their cost c. 2.Until the target cell number is reached: - Pick the top of the heap (m, n) - Cluster (m, n) into a new object mn; update the netlist - Calculate mn closest neighbor k; insert (mn, k) into PQ - Recalculate the clustering cost of all the neighbors to m and n

9 Best-Choice Example  Assume  N-pin net weight = 1 / (n-1)  Each object size = 1  Timing criticality is 1 for all nets E F C B D A

10 Best-Choice Example B=1/2 A=1/2 D=1 C=1 D=3/4 D=1/2 E F C B D A E F CD B A B=1/2 CD=2/3 B=2/3 F=1/2 E=1/2

11 Best-Choice Example E F BCD A BDC=3/8 F=1/2 A=3/8 E=1/2 EF BCD A BCD=3/10 A=3/8 BCD=3/8

12 Best-Choice Example EF ABCD EF=1/3 ABCD=1/3 ABCDEF  clustering_score = 2.875

13 Best-Choice Clustering Summary  Globally optimal clustering sequence via priority-queue data structure  Produce better quality of results  Clustering framework  Arbitrary clustering score function can be plugged in

14 Best-Choice Clustering  Clustering score distribution 1)First-choice (FC) :  clustering_score = )Best-choice (BC) :  clustering_score = (1) (2)

15 Lazy Update Speed-up Technique Priority Queue PQ Top of the PQ Node A Observations: 1.Node A might be updated a number of times before making it to the top of the PQ (if ever), but the last update is what determines its final position in PQ 2.Statistics indicate than in 96% of our updating steps, updating node A score pushes A down in PQ

16 Lazy Update Speed-up Technique Until the target cell number is reached: - Pick the top of the heap (m, n) - If (m, n) is invalid then - recalculate m closest neighbor n’ and insert (m, n’) in the heap else - Cluster (m, n) into a new object mn; update the netlist - Calculate mn closest neighbor k; insert (mn, k) in the heap - Mark all neighbors of m and n invalid Main Idea: Wait until A gets to the top of the priority-queue and then update its score if necessary

17 Lazy Update Runtime Charateristic Note: Practically no impact to solution quality

18 Experiments  IBM CPLACE  Analytic placement algorithm  Semi-persistent clustering paradigm  Up-front clustering  Selective unclustering during main global placement  Full unclustering before detailed placement  Order-of-magnitude reduction by clustering  Industrial ASIC designs  Size ranges from 56K to 880K placeable objects

19 Placement Results w/ Clustering  Average 4.3% WL improvement over EC  BC is x8.76 slower than EC

20 No Clustering vs. BC+Lazy Clustering WL(%)CPUCL-CPU% AL(270K)2.09% % BL(276K)-4.28% % CL(351K)3.27% % DL(426K)0.87% % EL(456K)1.59% % FL(880K)1.41% % AD(389K)8.23% % BD(285K)-0.34% % CD(56K)-0.36% % Avg.1.39% %

21 Conclusions  Globally optimal clustering sequence framework  Independent of clustering scoring function  Better clustering sequence  Allow significant placement speed-up  Almost no loss of quality of solution  Size control via clustering scoring function  Effective for dense design

22 Future Work  Handling fixed blocks during clustering  Ignoring nets connected to fixed objects  Ignoring pins connected to fixed objects  Including fixed blocks during clustering  Etc….  No visible improvement at the moment

23 Cluster Size Control Results StandardAutomatic MaxAvgWL%MaxAvgWL% AD BD CD d(u, v) =  w ij  conn(u,v) [ size(u) + size(v) ] k Standard : k = 1 Automatic: k =  size(u) + size(v) /   where  = expected avg. size