1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.

Presentation transcript:

1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department of Computer Science and Engineering University of California, San Diego

2 Outline  Introduction  Design Flow, Formulation & Algorithms  Example: Blue Gene/L Packaging Overview Models & Constraints  Experiments Benchmark Instances Generated Instances  Conclusion & Future Work

3 Interconnection Networks  Interconnection networks become a more critical factor than computing or memory modules (W. Dally, HPCA 2007 Keynote Speech)  Popular network topologies: Hypercube (SGI Origin2000) 2D torus (Cray X1) 3D torus (Cray T3E and XT3, IBM Blue Gene/L) Crossbar (NEC Earth Simulator) Folded Clos (Cray BlackWidow) Fat tree, flattened butterfly, Etc.

4 Our Work  We propose a design methodology that selects the topology with the lowest average latency Design flow is fully automated Physical constraints can be specified by users Efficient multi-commodity flow algorithm to evaluate candidate topologies Efficiency demonstrated using the Blue Gene/L packaging framework

5 Design Flow [Diagram: Delay Models, Topology Pool, Communication Patterns, Physical Constraints → MCF Evaluation Solver → Best Topology]

6 Multi-Commodity Flow (MCF)  Graph G(V,E)  K commodities, each with a source, a sink, and a demand amount d(k)  Each edge e has a capacity u(e)  Each edge e has a weight w(e)  Minimum Cost MCF: route d(k) units of each commodity k under the capacity constraints so as to minimize $\sum_{e \in E} w(e) f(e)$, where f(e) is the flow routed on edge e
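
The definitions on this slide can be written out as a linear program. This is a sketch under assumptions that are not in the original: the graph is treated as directed, $f_k(e)$ denotes the flow of commodity $k$ on edge $e$, $s_k$ and $t_k$ are its source and sink, and $\delta^+(v)$, $\delta^-(v)$ are the edges leaving and entering node $v$.

```latex
\begin{aligned}
\min \quad & \sum_{e \in E} w(e)\, f(e) \\
\text{s.t.} \quad
& f(e) = \sum_{k=1}^{K} f_k(e) \le u(e)
  && \forall e \in E \\
& \sum_{e \in \delta^+(v)} f_k(e) - \sum_{e \in \delta^-(v)} f_k(e) =
  \begin{cases}
    d(k)  & v = s_k \\
    -d(k) & v = t_k \\
    0     & \text{otherwise}
  \end{cases}
  && \forall v \in V,\ k = 1,\dots,K \\
& f_k(e) \ge 0
  && \forall e \in E,\ k = 1,\dots,K
\end{aligned}
```

The first constraint is the capacity limit u(e), and the second forces each commodity to deliver exactly d(k) units from its source to its sink.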

7 Map Supercomputer Performance Evaluation to MCF Problem  Nodes – processors  Edges – interconnection links  Commodities – communications  Demands – communication bandwidth (injection rate)  Flow amount – wires assignments  Capacity constraints – physical constraints (wires, pins, board dim)  Edge weight – unit latency (unit power)

8 An Example on Maximum Concurrent Flow  Two commodities: s1->t1, s2->t2, both have demand d(1)=d(2)=1  Optimal throughput = 1.5
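
For reference, the maximum concurrent flow problem can be stated as the following linear program. This is a sketch of the standard formulation, not a reconstruction of the slide's figure (the example graph is not preserved in the transcript, so the value 1.5 is not derived here). The symbols $\lambda$, $f_k(e)$, $s_k$, $t_k$, $\delta^+(v)$ and $\delta^-(v)$ are notation assumed for illustration.

```latex
\begin{aligned}
\max \quad & \lambda \\
\text{s.t.} \quad
& \sum_{k=1}^{K} f_k(e) \le u(e)
  && \forall e \in E \\
& \sum_{e \in \delta^+(v)} f_k(e) - \sum_{e \in \delta^-(v)} f_k(e) =
  \begin{cases}
    \lambda\, d(k)  & v = s_k \\
    -\lambda\, d(k) & v = t_k \\
    0               & \text{otherwise}
  \end{cases}
  && \forall v \in V,\ k = 1,\dots,K \\
& f_k(e) \ge 0,\ \lambda \ge 0
\end{aligned}
```

Here $\lambda$ is the throughput: the largest common factor by which every demand d(k) can be scaled while all commodities are still routed simultaneously within the edge capacities.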

9 Approximation Algorithms  Duality theory in LP: for a maximization problem, any primal feasible value P and any dual feasible value D bracket the optimal value OPT, i.e. $P \le \mathrm{OPT} \le D$  Increase P and decrease D iteratively until the duality gap is small enough

10 Blue Gene/L: An Example Midplane: 8x8x8 Torus
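
As a baseline for the comparisons that follow, the hop-count properties of an 8x8x8 torus can be checked with a short script. This is a minimal sketch, assuming unit-length links and shortest-path routing; the function names are hypothetical and not part of the original design flow.

```python
from collections import deque


def torus_neighbors(node, dims):
    """Yield the neighbors of `node` in a torus with the given dimensions."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(node)
            nxt[axis] = (node[axis] + step) % size  # wrap-around link
            yield tuple(nxt)


def hop_counts(source, dims):
    """Breadth-first search: hop distance from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in torus_neighbors(cur, dims):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


dims = (8, 8, 8)
# A torus is vertex-transitive, so one BFS from any node is representative.
dist = hop_counts((0, 0, 0), dims)
num_nodes = len(dist)
num_links = num_nodes * 2 * len(dims) // 2   # 6 links per node, each shared by 2 nodes
diameter = max(dist.values())
avg_hops = sum(dist.values()) / (num_nodes - 1)  # average over all other nodes

print(num_nodes, num_links, diameter, round(avg_hops, 3))
```

The average hop count is a topology-only figure; it ignores congestion, which is what the MCF evaluation is meant to capture.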

11 Assumptions  We follow the same hierarchical structure: midplane – node card – compute card  The properties of the boards (dimensions, # layers, dielectric) remain unchanged  We seek better topologies than the existing 3D torus to implement the networks in the midplane

12 Topology Generation  Generate 8-node 1D topologies and duplicate them to each row and column  Topologies are isomorph-free and have a maximum degree bound for each node [Table: # isomorph-free topologies]
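
The idea of an isomorph-free pool with a degree bound can be illustrated by brute force on very small graphs. This is a sketch only, not the authors' generator: it assumes the topologies must be connected (the slide does not say so), uses hypothetical function names, and canonicalizes by trying every relabeling, which is feasible for 4 or 5 nodes but far too slow for the 8-node case on the slide.

```python
from itertools import combinations, permutations


def canonical_form(n, edges):
    """Smallest relabeled edge list over all node permutations (brute force)."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[a], perm[b]))) for a, b in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return best


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def enumerate_topologies(n, max_degree):
    """Connected n-node graphs, one per isomorphism class, degree <= max_degree."""
    all_edges = list(combinations(range(n), 2))
    classes = set()
    for r in range(n - 1, len(all_edges) + 1):  # a connected graph needs n-1 edges
        for edges in combinations(all_edges, r):
            degree = [0] * n
            for a, b in edges:
                degree[a] += 1
                degree[b] += 1
            if max(degree) > max_degree or not is_connected(n, edges):
                continue
            classes.add(canonical_form(n, edges))
    return sorted(classes)


# With 4 nodes and degree at most 2, only the path and the ring remain.
for topology in enumerate_topologies(4, 2):
    print(topology)
```

Keeping one representative per isomorphism class avoids evaluating the same topology repeatedly under different node labelings.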

13 Node Card Graph Model Horizontal: Strongly Connected; Vertical: Generated Topology

14 Midplane Graph Model Coteus et al., “Packaging the Blue Gene/L Supercomputer,” IBM J of Res & Dev, Vol. 43, pp

15 Experiment 1: Benchmark Instances  NAS Parallel Benchmarks (121/128 processes) Benchmark source code Compiled with Intel Trace Collector & Analyzer Executable Run on multi-processor machines Output Simulated annealing placement Traffic Patterns Task placement Our design flow Best topology

16 Benchmark Characteristics  Communication Pattern: MG

17 Results  Optimal: each instance has different topology  Aggregate: one topology for all instances  3D Torus: 3D torus topology

18 Experiment 2: Generated Instances  Randomly generated communications Scalar values that represent the bandwidth demand between each pair of nodes More general, time independent  Control Parameters # communication demands: O(n) pairs Communication amount: uniform traffic, varied case by case (different congestion levels)
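
A generator matching the two control parameters might look like the sketch below. It is an illustration under assumptions, not the authors' tool: the function name and parameters are hypothetical, "O(n) pairs" is modeled as a fixed number of pairs per node, and "uniform traffic" is read as every chosen pair sharing one demand value that is changed between instances to set the congestion level.

```python
import random


def generate_instance(num_nodes, pairs_per_node, demand, seed=None):
    """Return {(source, sink): demand} for randomly chosen distinct node pairs."""
    rng = random.Random(seed)
    all_pairs = [
        (s, t)
        for s in range(num_nodes)
        for t in range(num_nodes)
        if s != t
    ]
    # O(n) communication demands: pairs_per_node * num_nodes distinct pairs.
    chosen = rng.sample(all_pairs, pairs_per_node * num_nodes)
    # Uniform traffic: every chosen pair requests the same bandwidth.
    return {pair: demand for pair in chosen}


# A lightly loaded and a heavily loaded instance over the same set of pairs.
light = generate_instance(num_nodes=64, pairs_per_node=2, demand=0.1, seed=0)
heavy = generate_instance(num_nodes=64, pairs_per_node=2, demand=0.9, seed=0)
print(len(light), len(heavy))
```

Reusing the seed while changing only the demand keeps the communication pairs fixed, so differences in the result come from the congestion level alone.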

19 Latency & Throughput Tradeoffs Distribution: 40% / 50% / 10%

20 Topologies with Different Injection Rates With larger injection rate, more (red) links are needed to go through the cut between 4 and 5, in order to reduce the number of hops

21 Conclusion  A design flow for interconnection network synthesis Fully automated Explores a large design space Efficient evaluation algorithm  Future work Power consumption Accurate simulation

22 Q&A Thank you!