PaGrid: A Mesh Partitioner for Computational Grids. Virendra C. Bhavsar, Professor and Dean, Faculty of Computer Science, UNB, Fredericton.


PaGrid: A Mesh Partitioner for Computational Grids. Virendra C. Bhavsar, Professor and Dean, Faculty of Computer Science, UNB, Fredericton. This work was done in collaboration with Sili Huang and Dr. Eric Aubanel.

Outline Introduction Background PaGrid Mesh Partitioner Experimental Results Conclusion

Advanced Computational Research Laboratory Virendra C. Bhavsar

ACRL Facilities

ACEnet Project ACEnet (Atlantic Computational Excellence Network) is Atlantic Canada's entry into this national fabric of HPC facilities. A partnership of seven institutions: UNB, MUN, MTA, Dalhousie, StFX, SMU, and UPEI. ACEnet was awarded $9.9M by the CFI in March; the project will be worth nearly $28M.

Mesh Partitioning Problem
[Figure: (a) a heat distribution problem on an enlarged metal plate, where the value h_i,j at grid point (i, j) depends on its neighbors h_i-1,j, h_i+1,j, h_i,j-1, and h_i,j+1; (b) the corresponding application graph]

Mesh Partitioning Problem Map the mesh onto the processors so as to minimize the inter-processor communication cost, while balancing the computational load among the processors.
[Figure: (a) a homogeneous system graph of four processors p0, p1, p2, p3; (b) a partition produced by homogeneous partitioning. Cut edges: p0: 8, p1: 8, p2: 8, p3: 8; total cut edges: 16]

Computational Grids The slide is from the Centre for Unified Computing, University College Cork, Ireland.

Computational Grid Applications Computational fluid dynamics, computational mechanics, bioinformatics, and condensed matter physics simulation. The slides are from Fluent.com, University of California San Diego, George Washington University, and Ohio State University.

A Computational Grid Model Computational Grids are heterogeneous in both processors and networks.
[Figure: a processor graph of two clusters, Cluster 1 (p0 to p3) and Cluster 2 (p4 to p9), joined by an inter-cluster link]

Mesh Partitioning Problem
[Figure: (a) a processor graph; (b) the optimal partition with a homogeneous partitioner: total cut edges 16, total communication cost 40; (c) the optimal partition with a heterogeneous partitioner: total cut edges 24, total communication cost 32]
The total communication cost weights each cut edge by the mapping cost between the processors its endpoints are assigned to: T_comm = sum over cut edges (u, v) of |(u, v)| * W[map(u)][map(v)].

Background Generic Multilevel Partitioning Algorithm The slide is from the CEPBA-IBM Research Institute, Spain.

Background Coarsening phase  Matching and contraction, using the Heavy Edge Matching heuristic [2]. [Figure: matched vertices v1 and v2 are contracted into a single coarser vertex u]
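The matching step can be sketched as follows. This is a minimal illustration of the Heavy Edge Matching heuristic, assuming the application graph is stored as an adjacency dict {vertex: {neighbor: edge_weight}} (a representation chosen here for brevity, not PaGrid's actual data structure).

```python
import random

def heavy_edge_matching(adj):
    """Return a matching as a dict vertex -> partner (or itself if unmatched)."""
    matched = {}
    order = list(adj)
    random.shuffle(order)              # HEM visits vertices in random order
    for u in order:
        if u in matched:
            continue
        # match u with the unmatched neighbor joined by the heaviest edge
        candidates = [(w, v) for v, w in adj[u].items() if v not in matched]
        if candidates:
            _, v = max(candidates)
            matched[u] = v
            matched[v] = u
        else:
            matched[u] = u             # u stays unmatched this round
    return matched
```

Each matched pair is then contracted into one coarser vertex whose weight is the sum of the pair's weights, which is what drives the coarsening phase.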

Background Refinement (Uncoarsening Phase)  Kernighan-Lin/Fiduccia-Mattheyses (KL-FM) refinement Refines partitions under a load balance constraint. Computes a gain for each candidate vertex. At each step, a single vertex is moved to a different subdomain. Vertices with negative gains are allowed to migrate.  Greedy refinement Similar to KL-FM refinement, but vertices with negative gains are not allowed to move.
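The gain computation and the greedy variant can be sketched as below, for a 2-way partition. All names and the adjacency-dict representation are illustrative assumptions; the gain of moving a vertex is the weight of its edges crossing to the other part minus the weight of the edges it keeps inside its own part.

```python
def gain(adj, part, v):
    """Cut-weight reduction obtained by moving v to the other part."""
    other = sum(w for u, w in adj[v].items() if part[u] != part[v])
    same = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return other - same

def greedy_refine(adj, part, max_imbalance=1):
    """Repeatedly move positive-gain vertices, respecting the balance constraint."""
    moved = True
    while moved:
        moved = False
        for v in sorted(adj):
            src, dst = part[v], 1 - part[v]
            sizes = [sum(1 for u in part if part[u] == p) for p in (0, 1)]
            # the move must keep the partition within the allowed imbalance
            if sizes[dst] + 1 - (sizes[src] - 1) > max_imbalance:
                continue
            if gain(adj, part, v) > 0:   # greedy: only positive gains move
                part[v] = dst
                moved = True
    return part
```

KL-FM differs from this greedy pass in that it also tentatively accepts negative-gain moves and keeps the best prefix of the move sequence, which lets it climb out of local minima.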

Background (Computational) Load balancing  To balance the load among the processors  A small imbalance can lead to a better partition. Diffusion-based Flow Solutions  Determine how much load should be transferred among processors
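The idea behind a diffusion-based flow solution can be sketched as follows: in each sweep, every processor exchanges a fixed fraction of the load difference with each of its neighbors in the processor graph, so the loads flow toward the uniform average. This is a generic illustration (names and the step size alpha are assumptions), not PaGrid's actual scheme.

```python
def diffuse(neighbors, load, alpha=0.25, sweeps=200):
    """One first-order diffusion iteration on a processor graph."""
    load = dict(load)
    for _ in range(sweeps):
        flow = {p: 0.0 for p in load}
        for p in neighbors:
            for q in neighbors[p]:
                # move a fraction of the load difference across edge (p, q)
                flow[p] += alpha * (load[q] - load[p])
        for p in load:
            load[p] += flow[p]
    return load
```

Because each edge contributes antisymmetric flow to its two endpoints, the total load is conserved; on a connected graph the loads converge to the average.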

Mesh Partitioning Tools  METIS (Karypis and Kumar, 1995)  JOSTLE (Walshaw, 1997)  CHACO (Hendrickson and Leland, 1994)  PART (Chen and Taylor, 1996)  SCOTCH (Pellegrini, 1994)  PARTY (Preis and Diekmann, 1996)  MiniMax (Kumar, Das, and Biswas, 2002)

METIS A widely used partitioning tool, developed by Karypis and Kumar (1995). Uses the multilevel partitioning algorithm.  Heavy Edge Matching for the coarsening phase  Greedy refinement algorithm Does not consider network heterogeneity.

JOSTLE A heterogeneous partitioner, developed by Walshaw (1997). Uses the multilevel partitioning algorithm.  Heavy Edge Matching  KL-type refinement algorithm Does not factor in the ratio of communication time to computation time.

PaGrid Mesh Partitioner Grid System Modeling Refinement Cost Function KL-type Refinement Estimated Execution Time Load Balancing

Grid System Modeling A Grid system containing a set of processors P connected by a set of edges C is modeled as a weighted processor graph S. Vertex weight = relative computational power: if p0 is twice as powerful as p1 and |p1| = 0.5, then |p0| = 1. Path length = cumulative edge weight along the shortest path. A weighted matrix W of size |P| x |P| is constructed, where W[p][q] is the shortest path length between processors p and q.
[Figure: a three-processor example with edge weights |(p0, p1)| = 1 and |(p1, p2)| = 2, giving path lengths |(p0, p1)| = 1, |(p1, p2)| = 2, |(p0, p2)| = 3, and the corresponding weighted matrix W]
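The weighted matrix W can be computed with any all-pairs shortest-path algorithm. A minimal sketch using Floyd-Warshall, assuming the processor graph is given as a dict of weighted edges (the representation is an assumption made for illustration):

```python
def build_w(procs, edges):
    """W[p][q] = shortest path length between processors p and q."""
    INF = float("inf")
    W = {p: {q: (0 if p == q else INF) for q in procs} for p in procs}
    for (p, q), w in edges.items():
        W[p][q] = W[q][p] = min(W[p][q], w)     # undirected processor graph
    for k in procs:                             # Floyd-Warshall relaxation
        for i in procs:
            for j in procs:
                if W[i][k] + W[k][j] < W[i][j]:
                    W[i][j] = W[i][k] + W[k][j]
    return W
```

On the three-processor example from the slide, the edges (p0, p1) with weight 1 and (p1, p2) with weight 2 give W[0][2] = 3 via the path p0-p1-p2.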

Refinement Cost Function Given a processor mapping cost matrix W, the total mapping cost for a partition is
cost = sum over edges (u, v) of |(u, v)| * W[map(u)][map(v)],
where map(u) is the processor that vertex u is mapped to.
[Figure: vertices u and v of a cut edge mapped to processors among p0, p1, p2, p3]
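This total mapping cost can be sketched directly. The names here are illustrative assumptions: `pi` maps each vertex to its processor, `edges` is a dict of weighted application edges, and `W` is the processor mapping cost matrix (shortest-path lengths in the processor graph).

```python
def mapping_cost(edges, pi, W):
    """Sum each edge's weight times the mapping cost between its endpoints' processors."""
    return sum(w * W[pi[u]][pi[v]] for (u, v), w in edges.items())
```

Edges whose endpoints land on the same processor contribute nothing (W[p][p] = 0), so the function penalizes cut edges in proportion to how far apart the two processors are in the network.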

Multilevel Partitioning Algorithm Coarsening phase.  Heavy Edge Matching  Iterate until the number of vertices in the coarsest graph equals the given number of processors. Initial partitioning phase.  Assign each vertex to a processor, while minimizing the cost function. Uncoarsening phase.  Load balancing based on vertex weights  KL-type refinement algorithm. Load balancing based on estimated execution time.

Estimated Execution Time Load Balancing Input is the final partition after the refinement stage. Tries to improve the quality of the final partition in terms of estimated execution time. The execution time for a processor is the sum of the time required for computation and the time required for communication; execution time is a more accurate metric for the quality of a partition. Uses a KL-type algorithm.

Estimated Execution Time Load Balancing For a processor p with one of its edges (p, q) in the processor graph, let cut(p, q) be the total weight of the application edges cut between the subdomains mapped to p and q. The estimated execution time for processor p is then
T(p) = load(p) / power(p) + sum over edges (p, q) of cut(p, q) * |(p, q)|,
and the estimated execution time of the application is the maximum over all processors:
T_app = max over p of T(p).
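The per-processor estimate described above can be sketched as follows. All names (`power`, `work`, `link`, `cut`) are illustrative assumptions, not PaGrid's actual data structures: `work[p] / power[p]` is the computation term, and each bundle of cut edges contributes its weight times the cost of the corresponding processor-graph edge.

```python
def exec_time(p, power, work, link, cut):
    """Estimated execution time of processor p: computation plus communication."""
    comp = work[p] / power[p]
    comm = sum(c * link[e] for e, c in cut.items() if p in e)
    return comp + comm

def app_exec_time(procs, power, work, link, cut):
    # the application finishes when the slowest processor finishes
    return max(exec_time(p, power, work, link, cut) for p in procs)
```

Minimizing the maximum of these per-processor times, rather than the total cut weight alone, is what lets this load-balancing stage trade a larger cut for a shorter overall run.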

Experimental Results Test application graphs Grid system graphs Comparison with METIS and JOSTLE

Test Application Graphs
 598a: 3D finite element mesh (Submarine I)
 3D finite element mesh (Parafoil)
 m14b: 3D finite element mesh (Submarine II)
 auto: 3D finite element mesh (GM Saturn)
 mrng2: (description not available)
|V| is the total number of vertices and |E| is the total number of edges in the graph.

Grid Systems
 32-processor Grid system
 64-processor Grid system

Metrics
 Total Communication Cost
 Maximum Estimated Execution Time

Total Communication Cost 32-processor Grid System

Total Communication Cost Average values of Total Communication Cost of PaGrid are similar to those of METIS. Average values of Total Communication Cost of PaGrid are slightly worse than those of Jostle.

Maximum Estimated Execution Time 32-processor Grid System

Maximum Estimated Execution Time The minimum and average values of Execution Time for PaGrid are always lower than for Jostle and METIS, except for graph mrng2, where PaGrid is slightly worse than METIS. Even though the results of PaGrid are worse than those of Jostle in terms of average Total Communication Cost, PaGrid's Estimated Execution Time Load Balancing produces a lower average Execution Time than Jostle in all cases.

Total Communication Cost 64-processor Grid System

Total Communication Cost Average values of Total Communication Cost of PaGrid are better than those of METIS in most cases, except for graph mrng2 (because of its low |E|/|V| ratio). Average values of Total Communication Cost of PaGrid are much worse than those of Jostle in three of the five test application graphs.

Maximum Estimated Execution Time 64-processor Grid System

Maximum Estimated Execution Time The difference between PaGrid and Jostle is amplified:  even though the results of PaGrid are much worse than those of Jostle in terms of average Total Communication Cost, the minimum and average values of Execution Time for PaGrid are much lower than for Jostle. The minimum Estimated Execution Times for PaGrid are always much lower than for METIS, and the average Execution Times for PaGrid are almost always lower than those of METIS, except for application graph mrng2.

Conclusion There is a pressing need for a mesh partitioner that considers the heterogeneity of the processors and networks in a computational Grid environment; current partitioning tools provide only limited solutions. PaGrid: a heterogeneous mesh partitioner.  Considers both processor and network heterogeneity.  Uses a multilevel graph partitioning algorithm.  Incorporates load balancing based on estimated execution time. Experimental results indicate that load balancing based on estimated execution time improves the quality of partitions.

Future Work The cost function can be modified to be based on estimated execution time. Algorithms can be developed to address the repartitioning problem. Parallelization of PaGrid.

Publications
 S. Huang, E. Aubanel, and V. C. Bhavsar, "PaGrid: A Mesh Partitioner for Computational Grids", Journal of Grid Computing, 18 pages, in press.
 S. Huang, E. Aubanel, and V. Bhavsar, "Mesh Partitioners for Computational Grids: a Comparison", in V. Kumar, M. Gavrilova, C. Tan, and P. L'Ecuyer (eds.), Computational Science and Its Applications, Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, pp. 60-68, 2003.

Questions ?