MA/CS 471 Lecture 15, Fall 2002 Introduction to Graph Partitioning.


Graph (or Mesh) Partitioning
 We have so far implemented a finite element Poisson solver.
 The implementation is serial and not immediately suited to parallel computing.
 We have started to make the algorithm more suitable by switching from the LU factorization approach for solving the linear system to a conjugate gradient (iterative) algorithm, which does not have the same bottlenecks to parallel computation.

Next Step Towards Parallelism
 Now that we have made sure there are no intrinsically serial computation steps in the system solve, we are free to divide the work between processes.
 We will proceed by deciding which finite element triangle goes to which processor.

Mesh Partitioning
 So far, I have supplied files specifying which triangle goes to which processor.
 These files were generated using pmetis.
 pmetis is a serial program; however, Karypis has also written a parallel version that can be used as a library.
 That library is called ParMETIS.
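If you prefer to generate the partition vector in code rather than read a pmetis output file, a minimal sketch using the serial METIS 4.x C API might look as follows. METIS_PartMeshDual is a METIS 4 entry point, but the surrounding function and array names here are illustrative, and the exact API version is an assumption:

```c
#include <stdlib.h>
#include <metis.h>   /* METIS 4.x: idxtype is typedef'd to int */

/* Partition a triangular mesh into nparts pieces by partitioning the
   dual graph (one dual vertex per triangle, edges between neighbours).
   elmts[3*e + i] is the i-th vertex of triangle e, 0-indexed.
   On return, epart[e] is the rank that owns triangle e. */
void partition_mesh(int ne, int nn, idxtype *elmts, int nparts,
                    idxtype *epart)
{
    int etype   = 1;   /* element type 1 = triangles in METIS 4.x */
    int numflag = 0;   /* arrays use 0-based numbering            */
    int edgecut;       /* output: number of dual-graph edges cut  */
    idxtype *npart = malloc(nn * sizeof(idxtype));  /* nodal partition */

    METIS_PartMeshDual(&ne, &nn, elmts, &etype, &numflag,
                       &nparts, &edgecut, epart, npart);

    free(npart);       /* we only need the element partition here */
}
```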

(Slides 5–9 contain only figures; no transcript text.)

Team Project Continued
 Now we are ready to progress towards making the serial Poisson solver work in parallel.
 This task divides into a number of steps:
 Conversion of umDriver, umMESH, umStartUp, umMatrix and umSolve
 Adding a routine to read in a partition file (or call ParMETIS to obtain a partition vector)

umDriver Modification
 This code should now initialize MPI
 It should be modified to find the number of processes and the local process ID (stored in your struct/class)
 It should call the umPartition routine
 It should finalize MPI before exiting
 A minimal skeleton is sketched below
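A sketch of that skeleton; the MPI calls are standard, while umPartition and the surrounding names are the (hypothetical) routine names used in these slides:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID    */

    /* ... read the elements owned by this rank, set up, and solve,
       e.g.  myElmts = umPartition("mesh.part", rank, size, &nlocal);
       then umMESH, umStartUp, umSolve ...                          */

    MPI_Finalize();
    return 0;
}
```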

umPartition
 This code should read in a partition from file
 The input should be the name of the partition file, the current process ID (rank), and the number of processes (size)
 The output should be the list of elements belonging to this process (see the sketch below)
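One possible sketch, assuming the partition file stores one integer per element giving the owning rank (the format pmetis writes); the function and variable names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a pmetis-style partition file (one rank number per element) and
   return the global indices of the elements owned by 'rank'.  The count
   of owned elements is returned through *nlocal.  'size' is unused here
   but kept to match the interface described in the slides. */
int *umPartition(const char *partfile, int rank, int size, int *nlocal)
{
    FILE *fp = fopen(partfile, "r");
    int cap = 128, n = 0, e = 0, p;
    int *mine = malloc(cap * sizeof(int));

    if (!fp) { fprintf(stderr, "cannot open %s\n", partfile); exit(1); }

    while (fscanf(fp, "%d", &p) == 1) {
        if (p == rank) {
            if (n == cap) mine = realloc(mine, (cap *= 2) * sizeof(int));
            mine[n++] = e;    /* global element e belongs to this rank */
        }
        ++e;
    }
    fclose(fp);

    *nlocal = n;
    return mine;
}
```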

umMESH Modifications
 This routine should now be fed the partition, which determines which elements it should read in from the .neu input mesh file
 You should replace the elmttoelmt part with code that goes through the .neu file, reads which element/face lies on the boundary, and uses this to mark each node as known or unknown
 Each process should send the list of its “known” vertices’ global numbers to every other process, so that all nodes can be correctly identified as lying on the boundary or not (see the exchange sketch below)
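A sketch of that all-to-all exchange, assuming each rank has collected the global numbers of its known (boundary) vertices in myKnown[0..myCount-1]; all names are illustrative:

```c
#include <stdlib.h>
#include <mpi.h>

/* Gather every rank's list of "known" (boundary) global node numbers so
   that each process ends up with the complete list. */
int *exchange_known(int *myKnown, int myCount, int size, int *totalCount)
{
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    int *allKnown;
    int i, total = 0;

    /* First learn how many entries each rank will contribute */
    MPI_Allgather(&myCount, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }
    allKnown = malloc(total * sizeof(int));

    /* Then exchange the lists themselves */
    MPI_Allgatherv(myKnown, myCount, MPI_INT,
                   allKnown, counts, displs, MPI_INT, MPI_COMM_WORLD);

    free(counts); free(displs);
    *totalCount = total;
    return allKnown;
}
```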

umStartUp Modification
 Remains largely unchanged (depending on how you read in umVertX, umVertY, and elmttonode).

umMatrix Modification
 This routine should be modified so that, instead of creating the mat matrix, it is fed a vector vec and returns mat*vec
 IT SHOULD NOT STORE THE GLOBAL MATRIX AT ALL!!
 For debugging, I strongly suggest creating a new routine (umMatrixOP) and comparing its output against using umMatrix to build the matrix and multiply a test vector (a matrix-free sketch follows below)
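A sketch of the matrix-free operator, assuming each local element stores its elemental stiffness matrix and a local-to-global node map; the array names are illustrative, not the actual course code:

```c
/* Matrix-free y = A*x: accumulate each element's local stiffness
   contribution directly, never assembling the global matrix A. */
void umMatrixOP(int nelmts, int nnodes_per_elmt,
                double ***ke,        /* ke[e][i][j]: local stiffness  */
                int **elmttonode,    /* local-to-global node numbers  */
                const double *x, double *y, int nglobal)
{
    int e, i, j;

    for (i = 0; i < nglobal; ++i) y[i] = 0.0;

    for (e = 0; e < nelmts; ++e)
        for (i = 0; i < nnodes_per_elmt; ++i)
            for (j = 0; j < nnodes_per_elmt; ++j)
                y[elmttonode[e][i]] += ke[e][i][j] * x[elmttonode[e][j]];

    /* In parallel, contributions at nodes shared between processes
       must still be summed across ranks after this loop. */
}
```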

umSolve Modification
 The major change here is the replacement of umAinvB with a call to your own conjugate gradient solver
 Note: the rhs vector is filled here with a global gather of the elemental contributions, so this will have to be modified because some elements now live on other processes (one simple approach is sketched below)
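One simple, if communication-heavy, way to complete that gather: let every rank accumulate only its own elements' contributions into a globally indexed array, then sum across ranks. A sketch under that assumption (assemble_rhs is a hypothetical helper, not part of the course code):

```c
#include <stdlib.h>
#include <mpi.h>

/* Sum the per-rank partial right-hand sides so every process sees the
   fully assembled rhs.  Fine for a first version; a neighbour-to-
   neighbour exchange of only the shared nodes scales better. */
void assemble_rhs(double *rhs, int nglobal)
{
    int i;
    double *tmp = malloc(nglobal * sizeof(double));

    MPI_Allreduce(rhs, tmp, nglobal, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (i = 0; i < nglobal; ++i) rhs[i] = tmp[i];

    free(tmp);
}
```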

umCG Modification
 umCG is the routine which should take a rhs and return an approximate solution using CG
 Each step of the CG algorithm needs to be analyzed to determine its inter-process data dependencies
 The matrix*vector steps require a certain amount of data exchange between processes
 The dot products require an allreduce
 I strongly suggest setting up the exchange sequence once, before the iterations start (see the sketch below)
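A sketch of the CG loop with its two communication points, assuming the unknowns are distributed so that each entry is owned by exactly one rank (so local partial dot products don't double-count shared nodes), and that matvec already performs any shared-node data exchange; all names are illustrative:

```c
#include <math.h>
#include <stdlib.h>
#include <mpi.h>

/* Parallel dot product: local partial sum, then a global reduction. */
static double pdot(const double *a, const double *b, int n)
{
    double local = 0.0, global;
    int i;
    for (i = 0; i < n; ++i) local += a[i] * b[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}

/* Unpreconditioned CG for A*x = b; matvec(v, Av) applies the operator
   (including its data exchange) and is assumed given. */
void umCG(void (*matvec)(const double *, double *),
          const double *b, double *x, int n, double tol, int maxit)
{
    double *r  = malloc(n * sizeof(double));
    double *p  = malloc(n * sizeof(double));
    double *Ap = malloc(n * sizeof(double));
    double rho, rho_old, alpha, beta;
    int i, it;

    for (i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; p[i] = r[i]; }
    rho = pdot(r, r, n);

    for (it = 0; it < maxit && sqrt(rho) > tol; ++it) {
        matvec(p, Ap);                     /* needs data exchange */
        alpha = rho / pdot(p, Ap, n);      /* needs an allreduce  */
        for (i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        rho_old = rho;
        rho = pdot(r, r, n);               /* needs an allreduce  */
        beta = rho / rho_old;
        for (i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    }

    free(r); free(p); free(Ap);
}
```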

Work Partition
 Here’s the deal: there are approximately six unequal chunks of work to be done.
 I suggest the following split of the code among team members:
 umDriver, umCG
 umPartition, umSolve
 umMESH, umStartUp
 umMatrixOP
 However, you are free to choose.
 Try to minimize the amount of data stored redundantly on multiple processes (but do not make the task too difficult by sharing nothing at all).

Discussion and Project Write-Up
 This is a little tricky, so now is the time to form a plan and ask any questions.
 The project is due on Tuesday 22nd October.
 As usual I need a complete write-up. It should include parallel timings and speed-up tests (i.e. for a fixed grid, find the wall clock time of umCG for Nprocs = 2, 4, 6, 8, 10, 12, 14, 16 and compare in a graph; see the timing sketch below).
 Test the code to make sure it gives the same results (up to convergence tolerance) as the serial code.
 Profile your code using upshot.
 Include pictures showing the partition (use a different colour per partition) and the parallel solution.
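For the wall-clock measurements, one minimal pattern uses MPI_Wtime around the solver call, with a barrier so all ranks start together; time_solve and the solve callback are hypothetical helpers:

```c
#include <stdio.h>
#include <mpi.h>

/* Time one solve: synchronize, time, and report the slowest rank,
   which is what determines the observed wall clock time. */
double time_solve(void (*solve)(void), int rank)
{
    double t0, t1, elapsed, maxElapsed;

    MPI_Barrier(MPI_COMM_WORLD);       /* start everyone together */
    t0 = MPI_Wtime();
    solve();                           /* e.g. the umCG call      */
    t1 = MPI_Wtime();

    elapsed = t1 - t0;
    MPI_Reduce(&elapsed, &maxElapsed, 1, MPI_DOUBLE, MPI_MAX,
               0, MPI_COMM_WORLD);

    if (rank == 0) printf("umCG wall clock: %g s\n", maxElapsed);
    return maxElapsed;
}
```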