
Advancing Supercomputer Performance Through Interconnection Topology Synthesis. Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng


1. Advancing Supercomputer Performance Through Interconnection Topology Synthesis
Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng
Department of Computer Science and Engineering
University of California, San Diego

2. Outline
• Introduction
• Design Flow, Formulation & Algorithms
• Example: Blue Gene/L Packaging
  - Overview
  - Models & Constraints
• Experiments
  - Benchmark Instances
  - Generated Instances
• Conclusion & Future Work

3. Interconnection Networks
• Interconnection networks are becoming a more critical factor than computing or memory modules (W. Dally, HPCA 2007 keynote speech)
• Popular network topologies:
  - Hypercube (SGI Origin2000)
  - 2D torus (Cray X1)
  - 3D torus (Cray T3E and XT3, IBM Blue Gene/L)
  - Crossbar (NEC Earth Simulator)
  - Folded Clos (Cray BlackWidow)
  - Fat tree, flattened butterfly, etc.

4. Our Work
• We propose a design methodology that selects the topology minimizing the average latency
  - The design flow is fully automated
  - Physical constraints can be specified by users
  - An efficient multi-commodity flow algorithm evaluates each candidate topology
• We demonstrate its efficiency using the Blue Gene/L packaging framework

5. Design Flow
[Figure: Topology Pool, Communication Patterns and Physical Constraints -> MCF Evaluation Solver (with Delay Models) -> Best Topology]
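The selection loop in the design flow can be sketched as follows. This is a hypothetical illustration, not the authors' tool: a topology is assumed to be a list of undirected links, the user's physical constraints are reduced to a single per-node degree limit (standing in for pin or wire budgets), and `evaluate` is a placeholder callback for the MCF evaluation solver plus delay models, with the communication patterns captured inside it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed representation: a topology is a list of undirected links (u, v).
Topology = List[Tuple[int, int]]


def max_degree(topology: Topology) -> int:
    """Largest number of links incident to any single node."""
    degree: Dict[int, int] = {}
    for u, v in topology:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values(), default=0)


def select_best_topology(
    pool: Dict[str, Topology],
    evaluate: Callable[[Topology], float],
    degree_limit: int,
) -> Optional[str]:
    """Return the name of the constraint-satisfying topology with the lowest latency.

    `evaluate` is a hypothetical stand-in for the MCF evaluation solver;
    `degree_limit` is a hypothetical stand-in for the physical constraints.
    Returns None if no topology in the pool satisfies the constraint.
    """
    best_name, best_latency = None, float("inf")
    for name, topology in pool.items():
        if max_degree(topology) > degree_limit:
            continue  # violates the physical constraint: skip without evaluating
        latency = evaluate(topology)
        if latency < best_latency:
            best_name, best_latency = name, latency
    return best_name
```

Filtering on constraints before calling `evaluate` reflects the idea that the (expensive) solver only needs to run on candidates that can actually be packaged.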

6. Multi-Commodity Flow (MCF)
• Graph G(V, E)
• K commodities; each has a source, a sink and a demand amount d(k)
• Each edge e has a capacity u(e)
• Each edge e has a weight w(e)
• Minimum-cost MCF: route d(k) units of each commodity k under the capacity constraints $f(e) \le u(e)$ and minimize $\sum_{e \in E} w(e) f(e)$, where f(e) is the total flow routed on edge e
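To make the minimum-cost MCF definition concrete, the sketch below checks a given path-based routing against the constraints and computes its cost. It does not solve the MCF problem (that requires an LP or a dedicated algorithm); it only evaluates one candidate routing. The function name `route_cost` and the routing format are assumptions for illustration, and paths are not checked for actually connecting each source to its sink.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def route_cost(
    capacity: Dict[Edge, float],
    weight: Dict[Edge, float],
    demand: Dict[int, float],
    routing: Dict[int, List[Tuple[List[Edge], float]]],
) -> Tuple[bool, float]:
    """Return (feasible, cost) for a path-based routing.

    routing[k] lists (path, amount) pairs for commodity k; each path is a list
    of directed edges of G. Feasible means every commodity ships exactly d(k)
    units and every edge satisfies f(e) <= u(e). Cost is sum_e w(e) * f(e).
    """
    flow: Dict[Edge, float] = {e: 0.0 for e in capacity}  # f(e)
    for k, d in demand.items():
        shipped = 0.0
        for path, amount in routing.get(k, []):
            shipped += amount
            for e in path:
                flow[e] += amount
        if abs(shipped - d) > 1e-9:
            return False, float("inf")  # commodity k not routed exactly d(k) units
    if any(flow[e] > capacity[e] + 1e-9 for e in capacity):
        return False, float("inf")  # some capacity constraint is violated
    return True, sum(weight[e] * flow[e] for e in capacity)
```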

7. Mapping Supercomputer Performance Evaluation to an MCF Problem
• Nodes – processors
• Edges – interconnection links
• Commodities – communications
• Demands – communication bandwidth (injection rate)
• Flow amount – wire assignments
• Capacity constraints – physical constraints (wires, pins, board dimensions)
• Edge weight – unit latency (or unit power)
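The mapping above can be written down as a small translation routine. This is a hedged sketch under simplifying assumptions that the slides do not state: every link gets the same number of wires and the same unit latency, each undirected link becomes two directed edges that each receive the full wire count, and the traffic pattern is a dictionary of pairwise bandwidth demands. `MCFInstance` and `build_instance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MCFInstance:
    capacity: Dict[Tuple[int, int], float]      # u(e): wires available on each link
    weight: Dict[Tuple[int, int], float]        # w(e): unit latency of each link
    commodities: List[Tuple[int, int, float]]   # (source, sink, demand d(k))


def build_instance(
    links: List[Tuple[int, int]],
    wires_per_link: float,
    link_latency: float,
    traffic: Dict[Tuple[int, int], float],
) -> MCFInstance:
    """Translate a candidate topology and a traffic pattern into an MCF instance."""
    capacity: Dict[Tuple[int, int], float] = {}
    weight: Dict[Tuple[int, int], float] = {}
    for u, v in links:
        for e in ((u, v), (v, u)):  # one directed edge per direction (assumption)
            capacity[e] = wires_per_link
            weight[e] = link_latency
    # Each positive demand between two distinct processors becomes a commodity.
    commodities = [(s, t, d) for (s, t), d in traffic.items() if d > 0 and s != t]
    return MCFInstance(capacity, weight, commodities)
```

Swapping `link_latency` for a per-link power figure would give the "unit power" variant of the edge weight.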

8. An Example of Maximum Concurrent Flow
• Two commodities, s1->t1 and s2->t2, each with demand d(1) = d(2) = 1
• Optimal throughput = 1.5
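For reference, the standard maximum concurrent flow problem maximizes the common fraction λ of every demand that can be routed simultaneously. It is written below in the notation of the MCF slide; the path set $\mathcal{P}_k$ (all $s_k \to t_k$ paths) and the path flows $x_p$ are introduced here for the formulation and do not appear in the slides.

$$
\begin{aligned}
\max\ & \lambda \\
\text{s.t. } & \sum_{p \in \mathcal{P}_k} x_p \ge \lambda\, d(k) && \forall k = 1, \dots, K \\
& f(e) = \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k :\, e \in p} x_p \le u(e) && \forall e \in E \\
& x_p \ge 0 && \forall p
\end{aligned}
$$

Under this definition the total flow routed across all commodities is $\lambda \sum_{k} d(k)$.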

9. Approximation Algorithms
• LP duality: for a maximization problem, the value P of any primal feasible solution and the value D of any dual feasible solution bound the optimum as $P \le \mathrm{OPT} \le D$
• Increase P and decrease D iteratively until the duality gap is small enough
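One well-known primal-dual scheme of this kind is the Garg and Könemann algorithm for maximum concurrent flow; the slides do not say which algorithm the authors use, so the sketch below is an illustration of the general idea, not their implementation. Edge lengths l(e) act as dual variables: flow is pushed along current shortest paths (raising the primal value) while the lengths grow multiplicatively, and the loop stops once the dual quantity $\sum_e u(e) l(e)$ reaches 1. Dividing the accumulated flow by $\log_{1+\epsilon}((1+\epsilon)/\delta)$ guarantees a feasible flow. All demands are assumed positive and every source must differ from its sink.

```python
import heapq
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def shortest_path(adj: Dict[int, List[int]], length: Dict[Edge, float],
                  s: int, t: int) -> List[Edge]:
    """Dijkstra's algorithm under the current lengths; returns s->t as a list of edges."""
    dist = {s: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break  # first pop of t carries its final distance
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + length[(u, v)]
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        raise ValueError(f"no path from {s} to {t}")
    path, v = [], t
    while v != s:
        path.append((prev[v], v))
        v = prev[v]
    return path[::-1]


def max_concurrent_flow(capacity: Dict[Edge, float],
                        commodities: List[Tuple[int, int, float]],
                        eps: float = 0.1) -> Tuple[float, Dict[Edge, float]]:
    """Approximate lambda and a feasible edge flow (Garg-Konemann style sketch)."""
    adj: Dict[int, List[int]] = {}
    for u, v in capacity:
        adj.setdefault(u, []).append(v)
    m = len(capacity)
    delta = (m / (1 - eps)) ** (-1 / eps)
    length = {e: delta / cap for e, cap in capacity.items()}  # dual variables l(e)
    routed = [0.0] * len(commodities)                         # primal flow per commodity
    edge_flow = {e: 0.0 for e in capacity}

    def dual_value() -> float:
        # sum_e u(e) l(e); the algorithm stops once this reaches 1
        return sum(capacity[e] * length[e] for e in capacity)

    while dual_value() < 1:  # each pass of this loop is one phase
        for j, (s, t, d) in enumerate(commodities):
            remaining = d
            while remaining > 1e-12 and dual_value() < 1:
                path = shortest_path(adj, length, s, t)
                c = min([remaining] + [capacity[e] for e in path])  # bottleneck
                remaining -= c
                routed[j] += c
                for e in path:
                    edge_flow[e] += c
                    length[e] *= 1 + eps * c / capacity[e]  # grow the dual variables
    # Every edge carries less than u(e) * log_{1+eps}((1+eps)/delta) units,
    # so dividing by that factor yields a feasible flow.
    scale = math.log((1 + eps) / delta, 1 + eps)
    lam = min(routed[j] / d for j, (_, _, d) in enumerate(commodities)) / scale
    return lam, {e: f / scale for e, f in edge_flow.items()}
```

A smaller `eps` tightens the approximation at the cost of more iterations.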

10. Blue Gene/L: An Example
[Figure: midplane, an 8x8x8 torus]
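As a baseline for the topologies explored later, the sketch below computes basic properties of the 8x8x8 midplane torus, assuming minimal (shortest-path) routing with wraparound links in every dimension. It reports 512 nodes, 6 links per node, a diameter of 12 hops and an average distance of 3072/511 ≈ 6.01 hops between distinct nodes. The function names are illustrative.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))


def torus_stats(dims=(8, 8, 8)):
    nodes = list(product(*(range(k) for k in dims)))
    origin = nodes[0]
    # A torus is vertex-transitive, so distances from one node represent every node.
    dists = [torus_hops(origin, v, dims) for v in nodes]
    return {
        "nodes": len(nodes),
        "degree": 2 * len(dims),  # valid when every dimension has size >= 3
        "diameter": max(dists),
        "avg_hops": sum(dists) / (len(nodes) - 1),  # average over distinct pairs
    }
```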

11. Assumptions
• We follow the same hierarchical structure: midplane – node card – compute card
• The properties of the boards (dimensions, number of layers, dielectric) remain unchanged
• We seek better topologies than the existing 3D torus to implement the networks within the midplane

12. Topology Generation
• Generate 8-node 1D topologies and duplicate them across every row and column
• The topologies are isomorph-free and have a maximum degree bound for each node
[Table: number of isomorph-free topologies]
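A brute-force version of isomorph-free generation with a degree bound is sketched below for small node counts. It enumerates every edge subset, discards those exceeding the degree bound, and keeps one representative per isomorphism class using a canonical form (the lexicographically smallest relabelled edge set over all node permutations). This is an illustrative technique, not necessarily the authors' method: for 8 nodes there are 2^28 edge subsets, so a practical tool would rely on a dedicated generator such as nauty's geng. The optional connectivity filter is an assumption; the slide does not say whether disconnected topologies are allowed.

```python
from itertools import combinations, permutations


def canonical(n, edges):
    """Smallest relabelled edge set over all permutations; equal for isomorphic graphs."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best


def is_connected(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def isomorph_free_topologies(n, max_degree, connected_only=False):
    """One representative n-node graph per isomorphism class with all degrees <= max_degree."""
    all_pairs = list(combinations(range(n), 2))
    classes = {}
    for mask in range(1 << len(all_pairs)):
        edges = [p for i, p in enumerate(all_pairs) if mask >> i & 1]
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        if max(degree) > max_degree:
            continue  # violates the per-node degree bound
        if connected_only and not is_connected(n, edges):
            continue
        classes.setdefault(canonical(n, edges), edges)  # keep first member of each class
    return list(classes.values())
```

For example, there are 11 graphs on 4 nodes up to isomorphism, and 7 of them have maximum degree at most 2.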

13. Node Card Graph Model
[Figure: horizontal direction strongly connected; vertical direction uses the generated topology]

14. Midplane Graph Model
Coteus et al., "Packaging the Blue Gene/L Supercomputer," IBM J. of Res. & Dev., Vol. 43, pp

15. Experiment 1: Benchmark Instances
• NAS Parallel Benchmarks (121/128 processes)
• Benchmark source code -> compiled with Intel Trace Collector & Analyzer -> executable -> run on multi-processor machines -> output -> simulated annealing placement -> traffic patterns and task placement -> our design flow -> best topology
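The simulated annealing placement step can be illustrated with the sketch below. The slides do not give its cost function, move set or schedule, so all of these are assumptions: the cost is the traffic-weighted hop count $\sum_{i,j} \mathrm{traffic}_{ij} \cdot \mathrm{hops}(\mathrm{slot}_i, \mathrm{slot}_j)$, a move swaps the slots of two tasks, the temperature decays geometrically, and `ring_hops` is a stand-in network distance. The default parameters are arbitrary.

```python
import math
import random


def ring_hops(a, b, n):
    """Hop distance between slots a and b on an n-node ring (stand-in network)."""
    d = abs(a - b)
    return min(d, n - d)


def placement_cost(place, traffic, hops):
    """Traffic-weighted hop count of a placement (place[task] = processor slot)."""
    n = len(place)
    return sum(traffic[i][j] * hops(place[i], place[j])
               for i in range(n) for j in range(n) if traffic[i][j])


def anneal_placement(traffic, hops, iters=20000, t0=10.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    n = len(traffic)
    place = list(range(n))  # start from the identity placement
    cost = placement_cost(place, traffic, hops)
    best, best_cost = place[:], cost
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        place[i], place[j] = place[j], place[i]  # propose swapping two tasks
        new_cost = placement_cost(place, traffic, hops)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost  # accept (always if not worse, sometimes if worse)
            if cost < best_cost:
                best, best_cost = place[:], cost
        else:
            place[i], place[j] = place[j], place[i]  # reject: undo the swap
        t *= cooling  # geometric cooling schedule
    return best, best_cost
```

Accepting some uphill moves early on lets the search escape local minima; as the temperature falls it behaves more and more like greedy descent, while `best` always holds the lowest-cost placement seen.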

16. Benchmark Characteristics
[Figure: communication pattern of MG]

17. Results
• Optimal: a different topology for each instance
• Aggregate: one topology for all instances
• 3D Torus: the existing 3D torus topology
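The difference between the "Optimal" and "Aggregate" strategies can be expressed as two selection rules over a table of evaluated latencies. In this hypothetical sketch, `latency[name][i]` is the average latency of topology `name` on instance i; using the lowest mean latency as the aggregate criterion is an assumption, since the slide only says one topology is used for all instances.

```python
from typing import Dict, List


def optimal_per_instance(latency: Dict[str, List[float]]) -> List[str]:
    """'Optimal': for each instance, the topology with the lowest latency."""
    names = list(latency)
    num_instances = len(next(iter(latency.values())))
    return [min(names, key=lambda name: latency[name][i]) for i in range(num_instances)]


def aggregate_choice(latency: Dict[str, List[float]]) -> str:
    """'Aggregate': a single topology for all instances (lowest mean latency here)."""
    return min(latency, key=lambda name: sum(latency[name]) / len(latency[name]))
```

By construction, the per-instance choice is never worse on any single instance than the aggregate choice; the aggregate choice trades some latency for a single design.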

18. Experiment 2: Generated Instances
• Randomly generated communications
  - Scalar values representing the bandwidth demand between each pair of nodes
  - More general and time-independent
• Control parameters
  - Number of communication demands: O(n) pairs
  - Communication amount: uniform traffic within a case, varied from case to case (different congestion levels)
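A generator matching the control parameters above might look like the sketch below: it draws a linear number of distinct (source, sink) pairs and gives each the same bandwidth demand, with the demand value setting the congestion level of that case. The parameter names `pairs_per_node` and `demand` are hypothetical, and uniform random pair selection is an assumption.

```python
import random
from typing import Dict, Tuple


def random_traffic(n: int, pairs_per_node: int, demand: float,
                   seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Return pairs_per_node * n distinct random (source, sink) pairs, all with equal demand."""
    rng = random.Random(seed)
    num_pairs = pairs_per_node * n  # O(n) communication demands
    if num_pairs > n * (n - 1):
        raise ValueError("more pairs requested than distinct ordered pairs exist")
    traffic: Dict[Tuple[int, int], float] = {}
    while len(traffic) < num_pairs:
        s, t = rng.sample(range(n), 2)  # distinct source and sink
        traffic[(s, t)] = demand        # uniform traffic within this case
    return traffic
```

Sweeping `demand` while keeping the pairs fixed (same `seed`) produces a family of cases that differ only in congestion level.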

19. Latency & Throughput Tradeoffs
[Figure: distribution 40% / 50% / 10%]

20. Topologies with Different Injection Rates
• With a larger injection rate, more links (shown in red) must cross the cut between nodes 4 and 5 in order to reduce the number of hops
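Counting the links that cross the cut between nodes 4 and 5 is a one-line computation. The sketch assumes the 8-node 1D topology is labelled 1 to 8 as in the figure and is given as a list of undirected links; a link crosses the cut when exactly one endpoint lies in {1, 2, 3, 4}.

```python
from typing import Iterable, Set, Tuple


def cut_links(links: Iterable[Tuple[int, int]], left: Set[int]) -> int:
    """Number of links with exactly one endpoint in `left`, i.e. crossing the cut."""
    return sum((u in left) != (v in left) for u, v in links)


# A plain ring 1-2-...-8-1 has two links across the cut; adding the hypothetical
# express links (1, 5) and (4, 8) raises that to four.
ring = [(i, i % 8 + 1) for i in range(1, 9)]
with_express = ring + [(1, 5), (4, 8)]
```

More links across the cut both raise the bandwidth available between the two halves and shorten paths between them.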

21. Conclusion
• A design flow for interconnection network synthesis
  - Fully automated
  - Explores a large design space
  - Efficient evaluation algorithm
• Future work
  - Power consumption
  - Accurate simulation

22. Q&A
Thank you!

