Department of Biomedical Informatics
Dynamic Load Balancing (Repartitioning) & Matrix Partitioning
Ümit V. Çatalyürek
Associate Professor, Department of Biomedical Informatics and Department of Electrical & Computer Engineering, The Ohio State University
Workshop on Combinatorial Scientific Computing & Petascale Simulations 2008, June 10-13, 2008, Santa Fe, NM

OSU's CSCAPES Contributions
- Load balancing
  - Parallel static load balancing
  - Parallel dynamic load balancing
- Parallel graph coloring
  - Distance-1 coloring
  - Distance-2 coloring (talk by Bozdag, Friday morning)
- Parallel matrix partitioning
- Parallel matrix ordering
Umit Catalyurek, "Dynamic Load Bal. & Matrix Partitioning", CSCAPES Workshop, June 10, 2008

Roadmap
- Dynamic Load Balancing
  - Motivation
  - Background: Classification of Repartitioning Techniques; Graph and Hypergraph Approaches
  - New Hypergraph Model for Dynamic Load Balancing
  - Parallel Multilevel Hypergraph Partitioning with Fixed Vertices
  - Experimental Results & Summary
- Matrix Partitioning
  - 1D Hypergraph-based Methods: Row-wise and Column-wise
  - 2D Hypergraph-based Methods: Fine-grain, Jagged-Like, Checkerboard
  - Experimental Results & Summary

Partitioning and Load Balancing
Goal: assign data to processors to
- minimize application runtime
- maximize utilization of computing resources
Metrics:
- minimize processor idle time (balance workloads)
- keep inter-processor communication costs low
Impacts the performance of a wide range of simulations:
- Adaptive mesh refinement
- Contact detection
- Particle simulations
- Linear solvers & preconditioners (Ax = b)

Dynamic Load Balancing/Repartitioning
Applications whose workload or locality changes during the simulation require dynamic load balancing (a.k.a. repartitioning), e.g.:
- Adaptive mesh refinement
- Particle methods
- Contact detection
Repartitioning has an additional cost: moving data from the old decomposition to the new one.
executionT = #iter x (computationT + communicationT) + repartT + migrationT
where #iter is the number of iterations between repartitionings.
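
The cost model above decides whether repartitioning pays off: a better-balanced decomposition only helps if the per-iteration savings over #iter iterations exceed repartT + migrationT. A minimal Python sketch of that trade-off; the function names and the timings in the usage lines are hypothetical illustrations, not measurements from these experiments.

```python
def execution_time(n_iter, t_comp, t_comm, t_repart=0.0, t_migration=0.0):
    """executionT = #iter x (computationT + communicationT) + repartT + migrationT."""
    return n_iter * (t_comp + t_comm) + t_repart + t_migration


def should_repartition(n_iter, old, new, t_repart, t_migration):
    """Return True if switching to the new decomposition lowers total time.

    old, new: (t_comp, t_comm) per-iteration costs under the current and the
    proposed decomposition (hypothetical inputs supplied by the caller).
    """
    keep = execution_time(n_iter, *old)  # stay put: no repartitioning, no migration
    switch = execution_time(n_iter, *new, t_repart, t_migration)
    return switch < keep


# Hypothetical numbers: repartitioning balances computation but costs 15 time units.
print(should_repartition(100, (1.5, 0.2), (1.0, 0.25), 5.0, 10.0))  # True
print(should_repartition(10, (1.5, 0.2), (1.0, 0.25), 5.0, 10.0))   # False
```

The longer the phase until the next repartitioning (#iter), the more the one-time repartitioning and migration costs are amortized.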

Classification of Dynamic Load Balancing Approaches [figure]

Graph and Hypergraph Partitioning
Community
- Graphs: load balancing (highly successful for PDE problems)
- Hypergraphs: VLSI; recently, computational science
Model
- Graphs: vertices = computation/data; edge = relationship between computation/data (bi-directional)
- Hypergraphs: vertices = computation/data; hyperedge = dependency on data elements (multi-way)
Goal
- Graphs: evenly distribute vertex weight while minimizing the weight of cut edges
- Hypergraphs: evenly distribute vertex weight while minimizing cut size
Algorithms
- Graphs: Kernighan, Lin, Simon, Hendrickson, Leland, Kumar, Karypis, et al.
- Hypergraphs: Kernighan, Schweikert, Fiduccia, Mattheyses, Sanchis, Alpert, Kahng, Hauck, Borriello, Çatalyürek, Aykanat, Karypis, et al.
Serial partitioners
- Graphs: Chaco (SNL), Jostle (U. Greenwich), METIS (U. Minn.), Party (U. Paderborn), Scotch (U. Bordeaux)
- Hypergraphs: hMETIS (Karypis), PaToH (Çatalyürek), Mondriaan (Bisseling)
Parallel partitioners
- Graphs: ParMETIS (U. Minn.), PJostle (U. Greenwich)
- Hypergraphs: Zoltan PHG (Sandia), Parkway (Trifunovic)
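
To make the graph-partitioning goal concrete (evenly distribute vertex weight while minimizing the weight of cut edges), here is a small Python sketch that evaluates both quantities for a given assignment. The function names and the toy graph are hypothetical; tools such as METIS and ParMETIS optimize these objectives heuristically rather than by brute-force evaluation.

```python
def edge_cut(edges, part):
    """Total weight of edges whose endpoints are assigned to different parts.

    edges: iterable of (u, v, weight); part: dict mapping vertex -> part id.
    """
    return sum(w for u, v, w in edges if part[u] != part[v])


def part_loads(vertex_weight, part, k):
    """Sum of vertex weights assigned to each of the k parts."""
    loads = [0] * k
    for v, w in vertex_weight.items():
        loads[part[v]] += w
    return loads


# A 4-cycle 0-1-2-3-0 with unit weights, split into two parts.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, part))                          # 2
print(part_loads({v: 1 for v in range(4)}, part, 2))  # [2, 2]
```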

Impact of Hypergraph Models (Where a Graph Is Not Sufficient)
- Greater expressiveness → greater applicability
  - Structurally non-symmetric systems: circuits, biology
  - Rectangular systems: linear programming, least-squares methods
  - Non-homogeneous, highly connected topologies: circuits, nanotechnology, databases
  - Multiple models for partitioning at different granularities: owner-computes, fine-grain, checkerboard/Cartesian, Mondriaan
- Accurate communication model → lower application communication costs
[Figures: partitioned hypergraph and graph examples over parts P1-P4; Mondriaan partitioning, courtesy of Rob Bisseling]
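
Why the graph model's communication estimate is only approximate: assuming row-wise SpMV y = Ax on a structurally symmetric matrix, with x_u owned by the part that owns row u, the graph edge cut charges a vector entry once per cut edge, even when several neighbors sit in the same remote part and one transfer would suffice. A hypothetical Python sketch contrasting the two counts:

```python
def graph_model_volume(adj, part):
    """Graph-model estimate: each cut edge {u, v} is charged one word per
    direction (x_u sent to part[v], and x_v sent to part[u])."""
    cut = {frozenset((u, v)) for u in adj for v in adj[u] if part[u] != part[v]}
    return 2 * len(cut)


def actual_volume(adj, part):
    """Words actually sent: x_u goes once to each *distinct* other part that
    owns a neighbor of u, no matter how many neighbors live there."""
    return sum(len({part[v] for v in adj[u]} - {part[u]}) for u in adj)


# Vertex 0 (part 0) has two neighbors, both in part 1.
adj = {0: [1, 2], 1: [0], 2: [0]}
part = {0: 0, 1: 1, 2: 1}
print(graph_model_volume(adj, part))  # 4
print(actual_volume(adj, part))       # 3
```

The hypergraph model measures the second quantity exactly, which is what "accurate communication model" refers to.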

Hypergraph Model
$\lambda_i$: number of parts that hyperedge $e_i$ connects
$\mathrm{Cut} = \sum_i w(e_i)\,(\lambda_i - 1)$ = total communication volume
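
The cut metric above (often called the connectivity-1 metric) is easy to evaluate for a given partition: each hyperedge costs its weight times one less than the number of parts it touches. A minimal Python sketch; the function name and the toy hypergraph are hypothetical, and unit weights are assumed when none are given.

```python
def connectivity_cut(hyperedges, part, weights=None):
    """Connectivity-1 cut: sum of w(e_i) * (lambda_i - 1) over hyperedges,
    where lambda_i is the number of distinct parts touched by e_i's pins."""
    total = 0
    for i, pins in enumerate(hyperedges):
        lam = len({part[v] for v in pins})
        w = 1 if weights is None else weights[i]
        total += w * (lam - 1)
    return total


nets = [[0, 1, 2], [2, 3], [0, 3]]
part = {0: 0, 1: 1, 2: 2, 3: 2}
print(connectivity_cut(nets, part))             # 2 + 0 + 1 = 3
print(connectivity_cut(nets, part, [1, 5, 2]))  # 2 + 0 + 2 = 4
```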

Hypergraph Repartitioning
- Start with the application hypergraph.
- Add one partition vertex for each partition, plus migration edges connecting each application vertex to the partition vertex of its current partition.
- Weight the hyperedges:
  - Migration edge weight = size of the application object (migration size)
  - Application edge weight = size of the communication elements
- Scale the application edge weights by α ≈ number of application communications between repartitions (#iter).
- Perform hypergraph partitioning with the partition vertices "fixed" to their partitions.
This mirrors executionT = #iter x (computationT + communicationT) + repartT + migrationT: the scaled application edges model #iter x communicationT, and the migration edges model migrationT.
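
A minimal Python sketch of the construction above, showing why a single cut captures both costs: with partition vertices fixed, a migration edge is cut (and its weight paid) exactly when its application vertex leaves its old part, so the connectivity-1 cut equals alpha x communication volume + migration volume. All names and the toy data are hypothetical; this only evaluates the objective, whereas a real partitioner (such as the Zoltan implementation described here) minimizes it under a balance constraint, presumably with zero-weight partition vertices so they do not affect balance (an assumption).

```python
def build_repartition_hypergraph(app_nets, app_net_w, vertex_size, old_part, k, alpha):
    """Augment an application hypergraph for repartitioning (sketch).

    Partition vertices are labelled ("P", p) and must stay fixed in part p.
    Returns (nets, weights, fixed), where fixed maps partition vertex -> part.
    """
    nets, weights = [], []
    for pins, w in zip(app_nets, app_net_w):
        nets.append(list(pins))
        weights.append(alpha * w)                # communication, scaled by alpha
    for v, size in vertex_size.items():
        nets.append([v, ("P", old_part[v])])     # migration net
        weights.append(size)                     # paid only if v leaves its old part
    fixed = {("P", p): p for p in range(k)}
    return nets, weights, fixed


def total_cost(nets, weights, new_part, fixed):
    """Connectivity-1 cut of the augmented hypergraph = alpha*comm + migration."""
    part = {**new_part, **fixed}
    return sum(w * (len({part[v] for v in pins}) - 1) for pins, w in zip(nets, weights))


nets, weights, fixed = build_repartition_hypergraph(
    app_nets=[[0, 1], [1, 2], [2, 3]], app_net_w=[1, 1, 1],
    vertex_size={0: 4, 1: 4, 2: 4, 3: 4},
    old_part={0: 0, 1: 0, 2: 0, 3: 1}, k=2, alpha=10)
print(total_cost(nets, weights, {0: 0, 1: 0, 2: 1, 3: 1}, fixed))  # 10*1 + 4 = 14
print(total_cost(nets, weights, {0: 0, 1: 0, 2: 0, 3: 1}, fixed))  # 10*1 + 0 = 10
```

The old (unbalanced, 3 vs. 1) assignment has the lower cut here; it is the balance constraint that forces the partitioner toward the balanced one, which pays one migration.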

Implementation of Hypergraph Repartitioning
- Implemented in the Zoltan toolkit
- Based on the parallel multilevel hypergraph partitioner with recursive bisection (IPDPS'06)
- Automatically constructs the augmented hypergraph … with added capability for handling “fixed vertices.”
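
With recursive bisection, a vertex fixed to a final part must also be fixed to the matching side at every bisection level, while free vertices remain unconstrained. A minimal Python sketch under an assumed floor-midpoint split of the part range; Zoltan's actual splitting rule and data structures are not reproduced here, and the function name is hypothetical.

```python
def bisection_sides(p, k):
    """Side (0 = left, 1 = right) that a vertex fixed to final part p must take
    at each level of recursive bisection of parts [0, k).

    Assumes each level splits the current part range at its floor midpoint.
    """
    if not 0 <= p < k:
        raise ValueError("fixed part must be in [0, k)")
    lo, hi, sides = 0, k, []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if p < mid:          # part p lies in the left half [lo, mid)
            sides.append(0)
            hi = mid
        else:                # part p lies in the right half [mid, hi)
            sides.append(1)
            lo = mid
    return sides


print(bisection_sides(2, 4))  # [1, 0]
print(bisection_sides(0, 3))  # [0]
```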

Experimental Results
Experiments on:
- OSU-RI cluster: 64 compute nodes connected with InfiniBand; dual 2.4 GHz AMD Opteron processors with 8 GB RAM
- Sandia Thunderbird cluster: 4,480 compute nodes connected with InfiniBand; dual 3.6 GHz Intel EM64T processors with 6 GB RAM
Zoltan v3 (alpha) hypergraph partitioner and ParMETIS v3.1 graph partitioner
Test problems:
- 2DLipid: density functional theory; 4K x 4K; 5.6M nonzeros
- Xyce: ASIC Stripped; 680K x 680K; 2.3M nonzeros
- Cage14: DNA electrophoresis; 1.5M x 1.5M; 27M nonzeros
[Figures: Xyce ASIC Stripped; Cage Electrophoresis]

Communication Volume [plots: 2DLipid, Xyce, Cage14]
- Hypergraph is better.
- Zoltan-repart trades communication for migration to minimize total cost.
- Partition-from-scratch methods are comparable for large α (#iter).

Dynamic Graph: Partitioning Time on T-bird (Sandia Thunderbird) [plots: 2DLipid, Cage14, Xyce]

Summary of Dynamic Load Balancing
- A novel hypergraph model for dynamic load balancing: a single hypergraph that incorporates both the application's communication volume and the data migration cost
- Performs better than or comparably to graph-based dynamic load balancing
- A parallel dynamic load balancing tool, essential for petascale applications, that scales similarly to graph-based tools
Future work
- There is always room for improvement: speed and/or quality
- Direct k-way refinement

Matrix Partitioning
Hypergraph models for sparse-matrix partitioning:
- 1D: row-wise, column-wise
- 2D: fine-grain, jagged-like, checkerboard
Serial tool: PaToH & Matlab interface
[Figures: Matrix Partitioning; Partitioned Matrix Display]

1D Partitioning
M x N matrices with K processors; worst case:
- Total volume = (K-1) x N words (row-wise) or (K-1) x M words (column-wise)
- Total number of messages = K x (K-1)
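
A minimal sketch of how the actual row-wise cost can be measured for a given partition, assuming SpMV y = Ax in which the rows of A are partitioned and each x_j has a single owner. A dense matrix with one row per part reaches the worst-case bounds (K-1) x N words and K x (K-1) messages. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def rowwise_expand_cost(nonzeros, row_part, x_owner):
    """Volume and message count for the x-expand step of row-wise SpMV y = Ax.

    nonzeros: iterable of (i, j) positions; row_part: row -> part;
    x_owner: column index j -> part that owns x_j.
    """
    needed = defaultdict(set)                 # j -> parts whose rows use x_j
    for i, j in nonzeros:
        needed[j].add(row_part[i])
    volume = sum(len(parts - {x_owner[j]}) for j, parts in needed.items())
    messages = {(x_owner[j], p) for j, parts in needed.items()
                for p in parts if p != x_owner[j]}
    return volume, len(messages)


# Dense 4 x 4 matrix, K = 4, one row and one x entry per part: the worst case.
nz = [(i, j) for i in range(4) for j in range(4)]
own = {r: r for r in range(4)}
print(rowwise_expand_cost(nz, own, own))  # (12, 12) = ((K-1) x N, K x (K-1))
```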

2D Partitioning: Jagged-Like
M x N matrices with K = P x Q processors; worst case:
- Total volume = (K-P) x N + (Q-1) x M
- Total number of messages = K x (K-Q) + K x (Q-1) = K x (K-1)
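
Expanding the message count shows that, with K = P x Q, the jagged-like worst case involves exactly as many messages as 1D partitioning:

```latex
K(K-Q) + K(Q-1) = K^2 - KQ + KQ - K = K^2 - K = K(K-1)
```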

2D Partitioning: Checkerboard
M x N matrices with K = P x Q processors; worst case:
- Total volume = (P-1) x N + (Q-1) x M
- Number of messages = P+Q-2 per processor, i.e., K x (P+Q-2) in total
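
To compare the schemes side by side, the sketch below plugs the slide formulas into one function. It reports messages per processor, taking P + Q - 2 for checkerboard (each processor exchanges with at most P - 1 processors in its processor column and Q - 1 in its processor row) and K x (K-1) / K = K - 1 for 1D and jagged-like. These are worst-case bounds, not measured values; the function name is hypothetical.

```python
def worst_case_bounds(m, n, p, q):
    """Worst-case (total volume, messages per processor) from the slide formulas
    for an M x N matrix on K = P x Q processors."""
    k = p * q
    return {
        "1D row-wise": ((k - 1) * n, k - 1),
        "1D column-wise": ((k - 1) * m, k - 1),
        "jagged-like": ((k - p) * n + (q - 1) * m, k - 1),
        "checkerboard": ((p - 1) * n + (q - 1) * m, p + q - 2),
    }


for name, (vol, msgs) in worst_case_bounds(1000, 1000, 4, 4).items():
    print(f"{name:15s} volume={vol:6d} msgs/proc={msgs}")
```

For a 1000 x 1000 matrix on 16 = 4 x 4 processors, checkerboard bounds the volume at 6,000 words and 6 messages per processor, versus 15,000 words and 15 messages for the other three schemes.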

cage5 [figure]

Experimental Results
- Tested 1,413 matrices (out of 1,877) from the University of Florida (UFL) Sparse Matrix Collection with #rows >= 500, #columns >= 500, and #nonzeros < 10,000,000
- K-way partitioning for K = 4, 16, 64, and 256, only if 50 x K <= max{#rows, #columns} (i.e., at least 50 rows or columns per part)
- Partitioning instance = matrix & K
- For each partitioning instance we run the row-wise (RW), column-wise (CW), jagged-like (JL), checkerboard (CH), and fine-grain (FG) methods
- Linux cluster: 64 nodes with dual 2.4 GHz Opteron CPUs and 8 GB RAM

Experimental Results: Total Communication Volume
[Performance profiles: All Instances (4040); Square Symmetric (2231)]
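
The results are reported as performance profiles. Assuming the usual definition (for each method, the fraction of instances on which its metric is within a factor τ of the best method on that instance), a small Python sketch follows; the method labels reuse the slides' abbreviations, and the numbers are made up for illustration.

```python
import math


def performance_profile(results, taus):
    """Fraction of instances on which each method is within a factor tau of the
    best method (lower metric is better).

    results: dict method -> list of metric values, aligned by instance.
    Returns dict method -> [rho(tau) for tau in taus].
    """
    methods = list(results)
    n = len(results[methods[0]])
    best = [min(results[m][i] for m in methods) for i in range(n)]
    profile = {}
    for m in methods:
        ratios = []
        for i in range(n):
            if best[i] == 0:  # e.g., zero communication volume on this instance
                ratios.append(1.0 if results[m][i] == 0 else math.inf)
            else:
                ratios.append(results[m][i] / best[i])
        profile[m] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile


# Hypothetical volumes for two methods on three instances.
prof = performance_profile({"RW": [10, 20, 30], "FG": [12, 10, 30]}, [1.0, 1.5, 2.0])
print(prof)  # RW: 2/3, 2/3, 1.0; FG: 2/3, 1.0, 1.0
```

A method whose curve rises fastest toward 1 is best most often, and within a small factor of the best otherwise.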

Experimental Results: Total Communication Volume
[Performance profiles: Square Non-symmetric (1102); Rectangular (707)]
Rectangular instances: N > M (662) → CW better than RW; M > N (45)

Experimental Results: Total Number of Messages [figure]

Experimental Results: Execution Time [figure]

Summary of Matrix Partitioning
- Hypergraph models for matrix partitioning
  - Well... some are not new, but they have not been adopted by applications yet. Why? (An information-dissemination problem? Tools?)
  - More hypergraph-based methods are being developed: the corner model; hybrid Mondriaan with fine-grain
- Matlab interface to PaToH for matrix partitioning
  - Currently supports RW, CW, JL, CH, FG
  - Will be available soon
- Work in progress: parallel matrix partitioning via Zoltan

Thanks
Contact Info:
Also: