EXPLORING HIGH THROUGHPUT COMPUTING PARADIGM FOR GLOBAL ROUTING Yiding Han, Dean Michael Ancajas, Koushik Chakraborty, and Sanghamitra Roy Electrical and.

Slides:

Advertisements

Similar presentations

Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.

Advertisements

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.

Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao.

Wen-Hao Liu1, Yih-Lang Li, and Cheng-Kok Koh Department of Computer Science, National Chiao-Tung University School of Electrical and Computer Engineering,

Natarajan Viswanathan Min Pan Chris Chu Iowa State University International Symposium on Physical Design April 6, 2005 FastPlace: An Analytical Placer.

A Size Scaling Approach for Mixed-size Placement Kalliopi Tsota, Cheng-Kok Koh, Venkataramanan Balakrishnan School of Electrical and Computer Engineering.

Shuai Li and Cheng-Kok Koh School of Electrical and Computer Engineering, Purdue University West Lafayette, IN, Mixed Integer Programming Models.

Ripple: An Effective Routability-Driven Placer by Iterative Cell Movement Xu He, Tao Huang, Linfu Xiao, Haitong Tian, Guxin Cui and Evangeline F.Y. Young.

Requirements on the Execution of Kahn Process Networks Marc Geilen and Twan Basten 11 April 2003 /e.

© KLMH Lienig Multi-Threaded Collision Aware Global Routing Bounded Length Maze Routing.

MCFRoute: A Detailed Router Based on Multi- Commodity Flow Method Xiaotao Jia, Yici Cai, Qiang Zhou, Gang Chen, Zhuoyuan Li, Zuowei Li.

1 BoxRouter: A New Global Router Based on Box Expansion and Progressive ILP Minsik Cho and David Z. Pan ECE Dept. Univ. of Texas at Austin DAC 2006, July.

Processing Rate Optimization by Sequential System Floorplanning Jia Wang 1, Ping-Chih Wu 2, and Hai Zhou 1 1 Electrical Engineering & Computer Science.

A Resource-level Parallel Approach for Global-routing-based Routing Congestion Estimation and a Method to Quantify Estimation Accuracy Wen-Hao Liu, Zhen-Yu.

Metal Layer Planning for Silicon Interposers with Consideration of Routability and Manufacturing Cost W. Liu, T. Chien and T. Wang Department of CS, NTHU,

POLAR 2.0: An Effective Routability-Driven Placer Chris Chu Tao Lin.

SyNAR: Systems Networking and Architecture Group Symbiotic Jobscheduling for a Simultaneous Multithreading Processor Presenter: Alexandra Fedorova Simon.

WISCAD – VLSI Design Automation GRIP: Scalable 3-D Global Routing using Integer Programming Tai-Hsuan Wu, Azadeh Davoodi Department of Electrical and Computer.

Chih-Hung Lin, Kai-Cheng Wei VLSI CAD 2008

Introduction to Routing. The Routing Problem Apply after placement Input: –Netlist –Timing budget for, typically, critical nets –Locations of blocks and.

MGR: Multi-Level Global Router Yue Xu and Chris Chu Department of Electrical and Computer Engineering Iowa State University ICCAD

A Topology-based ECO Routing Methodology for Mask Cost Minimization Po-Hsun Wu, Shang-Ya Bai, and Tsung-Yi Ho Department of Computer Science and Information.

Area-I/O Flip-Chip Routing for Chip-Package Co-Design Progress Report 方家偉、張耀文、何冠賢 The Electronic Design Automation Laboratory Graduate Institute of Electronics.

Authors: Jia-Wei Fang,Chin-Hsiung Hsu,and Yao-Wen Chang DAC 2007 speaker: sheng yi An Integer Linear Programming Based Routing Algorithm for Flip-Chip.

A Parallel Integer Programming Approach to Global Routing Tai-Hsuan Wu, Azadeh Davoodi Department of Electrical and Computer Engineering Jeffrey Linderoth.

Global Routing. Global routing:  Sequential −One net at a time  Concurrent −Order-independent −ILP 2.

CRISP: Congestion Reduction by Iterated Spreading during Placement Jarrod A. Roy†‡, Natarajan Viswanathan‡, Gi-Joon Nam‡, Charles J. Alpert‡ and Igor L.

Parallel Applications Parallel Hardware Parallel Software IT industry (Silicon Valley) Users Efficient Parallel CKY Parsing on GPUs Youngmin Yi (University.

Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:

TSV-Aware Analytical Placement for 3D IC Designs Meng-Kai Hsu, Yao-Wen Chang, and Valerity Balabanov GIEE and EE department of NTU DAC 2011.

1 Distributed Process Scheduling: A System Performance Model Vijay Jain CSc 8320, Spring 2007.

Solving Hard Instances of FPGA Routing with a Congestion-Optimal Restrained-Norm Path Search Space Keith So School of Computer Science and Engineering.

VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5: Global Routing © KLMH Lienig 1 EECS 527 Paper Presentation High-Performance.

Archer: A History-Driven Global Routing Algorithm Mustafa Ozdal Intel Corporation Martin D. F. Wong Univ. of Illinois at Urbana-Champaign Mustafa Ozdal.

UC San Diego / VLSI CAD Laboratory Incremental Multiple-Scan Chain Ordering for ECO Flip-Flop Insertion Andrew B. Kahng, Ilgweon Kang and Siddhartha Nath.

New Modeling Techniques for the Global Routing Problem Anthony Vannelli Department of Electrical and Computer Engineering University of Waterloo Waterloo,

1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.

StreamX10: A Stream Programming Framework on X10 Haitao Wei School of Computer Science at Huazhong University of Sci&Tech.

April 26, CSE8380 Parallel and Distributed Processing Presentation Hong Yue Department of Computer Science & Engineering Southern Methodist University.

Jason Cong‡†, Guojie Luo*†, Kalliopi Tsota‡, and Bingjun Xiao‡ ‡Computer Science Department, University of California, Los Angeles, USA *School of Electrical.

VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 6: Detailed Routing © KLMH Lienig 1 What Makes a Design Difficult to Route Charles.

ARCHER:A HISTORY-DRIVEN GLOBAL ROUTING ALGORITHM Muhammet Mustafa Ozdal, Martin D. F. Wong ICCAD ’ 07.

Design of a High-Throughput Low-Power IS95 Viterbi Decoder Xun Liu Marios C. Papaefthymiou Advanced Computer Architecture Laboratory Electrical Engineering.

Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05 Scaling and Parallelizing.

A Negotiated Congestion based Router for Simultaneous Escape Routing Q.Ma, T.Yan and Martin D.F. Wong Department of Electrical and Computer Engineering.

ILP-Based Inter-Die Routing for 3D ICs Chia-Jen Chang, Pao-Jen Huang, Tai-Chen Chen, and Chien-Nan Jimmy Liu Department of Electrical Engineering, National.

Parallel Routing for FPGAs based on the operator formulation

Routability-driven Floorplanning With Buffer Planning Chiu Wing Sham Evangeline F. Y. Young Department of Computer Science & Engineering The Chinese University.

Static Process Scheduling

CDP Tutorial 3 Basics of Parallel Algorithm Design uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison.

FPGA CAD 10-MAR-2003.

High-Performance Global Routing with Fast Overflow Reduction Huang-Yu Chen, Chin-Hsiung Hsu, and Yao-Wen Chang National Taiwan University Taiwan.

Outline Motivation and Contributions Related Works ILP Formulation

Congestion Analysis for Global Routing via Integer Programming Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer.

1 ROAD : An Order-Impervious Optimal Detailed Router Hasan Arslan, Shantanu Dutt Electrical & Computer Eng. University of Illinois at Chicago ICCD 2003.

Jamie Unger-Fink John David Eriksen.  Allocation and Scheduling Problem  Better MPSoC optimization tool needed  IP and CP alone not good enough  Communication.

XGRouter: high-quality global router in X-architecture with particle swarm optimization Frontiers of Computer Science, 2015, 9(4):576–594 Genggeng LIU,

Accelerating K-Means Clustering with Parallel Implementations and GPU Computing Janki Bhimani Miriam Leeser Ningfang Mi

Computer Science and Engineering Parallelizing Feature Mining Using FREERIDE Leonid Glimcher P. 1 ipdps’04 Scaling and Parallelizing a Scientific Feature.

Optimizing Distributed Actor Systems for Dynamic Interactive Services

VLSI Physical Design Automation

Parallel Programming By J. H. Wang May 2, 2017.

Parallel Algorithm Design

Multi-Commodity Flow Based Routing

2 University of California, Los Angeles

Verilog to Routing CAD Tool Optimization

Chin Hau Hoo, Akash Kumar

CSE8380 Parallel and Distributed Processing Presentation

PERFORMANCE MEASURES. COMPUTATIONAL MODELS Equal Duration Model:  It is assumed that a given task can be divided into n equal subtasks, each of which.

Presentation transcript:

EXPLORING HIGH THROUGHPUT COMPUTING PARADIGM FOR GLOBAL ROUTING Yiding Han, Dean Michael Ancajas, Koushik Chakraborty, and Sanghamitra Roy Electrical and Computer Engineering, Utah State University

Outline  Introduction  Overview of GPU-CPU global routing  Enabling net level parallelism in gloval routing  Scheduler  Implementation  Results  Conclusion

Introduction  Global routing problem (GRP) is one of the most computationally intensive processes in VLSI design.  Sequential global routing  maze routing  negotiation- based rip-up and reroute  Concurrent global routing  Integer Linear Programming (ILP)

Overview of GPU-CPU global routing  Objective  First is the minimization of the sum of overflows among all edges.  Second is to minimize the total wirelength of routing all the nets  Third is the minimization of the total runtime needed to obtain a solution

GPU-CPU global routing design flow

Overview of GPU-CPU global routing  Global Routing Parallelization  The negotiation-based RRR scheme is strongly dependent on the routing order  Routing nets simultaneously regardless of their order might cause degradation in solution quality and performance.  Examining the dependencies among nets, and extracting concurrent nets from the ordered task queue

Enabling net level parallelism in global routing  Two main approaches in parallelizing the global routing problem  The routing map can be partitioned into smaller regions and solved in a bottom-up approach using parallel threads  Individual nets can be divided across many threads and routed simultaneously which called Net Level Concurrency (NLC).

Challenge in Parallelization of Global Routing  Must ensure that threads have current usage information of the shared resources

Achieving NLC  Collision-awareness alone cannot guarantee reaping performance benefits by employing NLC on high throughput platforms  Concurrent routing will cramp limited resources to specific threads, sacrificing performance and solution quality

Achieving NLC

 Recover substantial performance by allowing threads to concurrently route in different congested regions.  The key to achieving high throughput in NLC is to identify congested regions and co-schedule nets that have minimal resource collisions

Scheduler Nets Data Dependency  Routing order for Rip-up and Re-Route (RRR)  The order is derived by a comparison of area and aspect ratio of the nets.  nets covering larger area with a smaller aspect ratio  Independent subnets  subnets with no overlapping bounding region, as these can be routed simultaneously.

Scheduler Net Dependency Construction  Tile Coloring  Finding Available Concurrency  Tile Recoloring

Scheduler Net Dependency Construction  D>C>F>B>E>G>A  Dependency level  score the dependency level as the number of routing regions that one subnet invades

Scheduler Net Dependency Construction  2 nd iteration  C>F>B>E>G

Scheduler Implementation and Optimization  The scheduler thread and the global router threads execute in parallel with a producer- consumer relationship  Reduce the problem size using the identified congestion region  A task window is used to restrict the number of nets being examined for concurrency in each iteration

Implementation

Congested Region Identification

Results

Conclusion  Present a hybrid GPU-CPU high throughput computing environment as a scalable alternative to the traditional CPU based router.  Detailed simulation results show an average of 4X speedup over NTHU-Route 2.0 with negligible loss in solution quality