Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming in C with MPI and OpenMP
Michael J. Quinn

Chapter 13: Finite Difference Methods

Outline
• Ordinary and partial differential equations
• Finite difference methods
• Vibrating string problem
• Steady state heat distribution problem

Ordinary and Partial Differential Equations
• Ordinary differential equation: equation containing derivatives of a function of one variable
• Partial differential equation: equation containing derivatives of a function of two or more variables
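A concrete pair of examples (illustrative only, not taken from the slides) makes the distinction visible: an ODE involves derivatives with respect to a single independent variable, a PDE with respect to several.

    \[ \frac{dy}{dt} = -k\,y \quad\text{(ODE: one independent variable, } t\text{)} \]
    \[ \frac{\partial u}{\partial t} = \alpha\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) \quad\text{(PDE: independent variables } t, x, y\text{)} \]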

Examples of Phenomena Modeled by PDEs
• Air flow over an aircraft wing
• Blood circulation in the human body
• Water circulation in an ocean
• Bridge deformations as it carries traffic
• Evolution of a thunderstorm
• Oscillations of a skyscraper hit by an earthquake
• Strength of a toy

Model of Sea Surface Temperature in Atlantic Ocean
Courtesy MICOM group at the Rosenstiel School of Marine and Atmospheric Science, University of Miami

Solving PDEs
• Finite element method
• Finite difference method (our focus)
  – Converts PDE into matrix equation
  – Result is usually a sparse matrix
  – Matrix-based algorithms represent matrices explicitly
  – Matrix-free algorithms represent matrix values implicitly (our focus)

Linear Second-order PDEs
• Linear second-order PDEs are of the form sketched below, where the coefficients A–H are functions of x and y only
• Elliptic PDEs: B² − AC < 0
• Parabolic PDEs: B² − AC = 0
• Hyperbolic PDEs: B² − AC > 0
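The slide shows the general form as an image; a standard form consistent with the B² − AC discriminant used above is the following (the coefficient naming is a reconstruction and may differ slightly from the book's A–H labeling):

    \[ A\frac{\partial^2 u}{\partial x^2} + 2B\frac{\partial^2 u}{\partial x\,\partial y} + C\frac{\partial^2 u}{\partial y^2} + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu + G = 0 \]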

Difference Quotients

Formulas for 1st and 2nd Derivatives
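These two slides present their content as figures. The standard centered-difference approximations that finite difference methods of this kind rely on (a reconstruction, not copied from the slides) are:

    \[ f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}, \qquad f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}, \]

both with truncation error O(h²).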

Vibrating String Problem
• Vibrating string modeled by a hyperbolic PDE
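The hyperbolic PDE in question is the one-dimensional wave equation (the symbol used for the wave speed is an assumption; the book may use a different letter):

    \[ \frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}, \]

where u(x, t) is the displacement of the string at position x and time t.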

Solution Stored in 2-D Matrix
• Each row represents state of string at some point in time
• Each column shows how position of string at a particular point changes with time

Discrete Space, Time Intervals Lead to 2-D Matrix

Heart of Sequential C Program
u[j+1][i] = 2.0*(1.0-L)*u[j][i] + L*(u[j][i+1] + u[j][i-1]) - u[j-1][i];
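For context, a minimal sequential driver around this statement might look like the sketch below. The mesh dimensions, the definition of L, and the handling of initial conditions are illustrative assumptions, not code from the book:

    #define M 1000            /* number of time steps (assumed)               */
    #define N 1000            /* number of points along the string (assumed)  */

    double u[M][N];           /* u[j][i]: displacement at time step j, point i */

    /* L is assumed to be (c * dt / dx)^2 for wave speed c, time step dt, spacing dx */
    void vibrating_string(double L)
    {
        /* rows 0 and 1 are assumed already filled from the initial conditions */
        for (int j = 1; j < M - 1; j++)        /* march forward in time            */
            for (int i = 1; i < N - 1; i++)    /* interior points; endpoints fixed */
                u[j+1][i] = 2.0*(1.0-L)*u[j][i]
                          + L*(u[j][i+1] + u[j][i-1])
                          - u[j-1][i];
    }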

Parallel Program Design
• Associate primitive task with each element of matrix
• Examine communication pattern
• Agglomerate tasks in same column
• Static number of identical tasks
• Regular communication pattern
• Strategy: agglomerate columns, assign one block of columns to each task

Result of Agglomeration and Mapping

Communication Still Needed
• Initial values (in lowest row) are computed without communication
• Values in black cells cannot be computed without access to values held by other tasks

Ghost Points
• Ghost points: memory locations used to store redundant copies of data held by neighboring processes
• Allocating ghost points as extra columns simplifies parallel algorithm by allowing same loop to update all cells

Matrices Augmented with Ghost Points
Lilac cells are the ghost points.

Communication in an Iteration
In this iteration the process is responsible for computing the values of the yellow cells.

Computation in an Iteration
In this iteration the process is responsible for computing the values of the yellow cells. The striped cells are the ones accessed as the yellow cell values are computed.
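A sketch of the per-iteration exchange the last two figures depict, for a process that owns a block of columns and keeps one ghost column on each side. The array layout, variable names, and the use of MPI_Sendrecv are illustrative assumptions:

    #include <mpi.h>

    /* cur points at the current time step's local values: indices 1..n_local
       are owned columns, index 0 and index n_local+1 are the ghost columns.  */
    void exchange_ghost_columns(double *cur, int n_local,
                                int left, int right, MPI_Comm comm)
    {
        /* send rightmost owned value right, receive left ghost value */
        MPI_Sendrecv(&cur[n_local], 1, MPI_DOUBLE, right, 0,
                     &cur[0],       1, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
        /* send leftmost owned value left, receive right ghost value */
        MPI_Sendrecv(&cur[1],           1, MPI_DOUBLE, left,  1,
                     &cur[n_local + 1], 1, MPI_DOUBLE, right, 1,
                     comm, MPI_STATUS_IGNORE);
    }

Passing MPI_PROC_NULL as the left or right rank for the processes at the ends makes the corresponding transfers no-ops, so the same code works for every process.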

Complexity Analysis
• Computation time per element is constant, so sequential time complexity per iteration is Θ(n)
• Elements divided evenly among processes, so parallel computational complexity per iteration is Θ(n / p)
• During each iteration a process with an interior block sends two messages and receives two messages, so communication complexity per iteration is Θ(1)

Isoefficiency Analysis
• Sequential time complexity: Θ(n)
• Parallel overhead: Θ(p)
• Isoefficiency relation: n ≥ Cp
• To maintain the same level of efficiency, n must increase at the same rate as p
• If M(n) = n², algorithm has poor scalability
• If matrix of 3 rows rather than m rows is used, M(n) = n and system is perfectly scalable
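Spelling out the scalability-function step behind the last two bullets (following the isoefficiency framework used earlier in Quinn; C is a generic constant):

    \[ n \ge Cp, \qquad M(n) = n^2 \;\Rightarrow\; \frac{M(Cp)}{p} = \frac{C^2 p^2}{p} = C^2 p, \]

which grows with p, so per-process memory grows without bound; keeping only 3 rows gives M(n) = n and M(Cp)/p = C, a constant.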

Replicating Computations
• If only one value is transmitted, communication time is dominated by message latency
• We can reduce the number of communications by replicating computations
• If we send two values instead of one, we can advance the simulation two time steps before another communication
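A rough accounting of the trade-off, using λ for message latency and β for bandwidth (this bookkeeping is an illustration, not taken from the slides): with k ghost columns per side, a process communicates only every k iterations, so its per-iteration, per-side communication cost falls from

    \[ \lambda + \frac{1}{\beta} \quad\text{to}\quad \frac{1}{k}\left(\lambda + \frac{k}{\beta}\right) = \frac{\lambda}{k} + \frac{1}{\beta}, \]

at the price of roughly k(k−1)/2 redundantly computed cells per side per exchange; the optimal k balances the reduced latency term against this extra computation.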

Replicating Computations
Without replication:
With replication:

Communication Time vs. Number of Ghost Points

Steady State Heat Distribution Problem
(figure labels: Steam, Ice bath)

Solving the Problem
• Underlying PDE is the Poisson equation
• This is an example of an elliptic PDE
• Will create a 2-D grid
• Each grid point represents value of steady-state solution at particular (x, y) location in plate
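The two-dimensional Poisson equation is

    \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y); \]

when the source term f is identically zero it reduces to Laplace's equation, which governs the steady-state temperature in the interior of the plate.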

Heart of Sequential C Program
w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4.0;
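A minimal sequential Jacobi sweep built around this statement might look like the following sketch; the grid size, the convergence measure, and the copy-back step are illustrative assumptions, not the book's code:

    #include <math.h>

    #define N 1000                  /* grid points per side (assumed)   */

    double u[N][N], w[N][N];        /* current and next iterates        */

    /* One Jacobi sweep: each interior point becomes the average of its
       four neighbors; the boundary rows and columns hold fixed values. */
    double jacobi_sweep(void)
    {
        double diff = 0.0;          /* largest change seen this sweep   */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) {
                w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4.0;
                double d = fabs(w[i][j] - u[i][j]);
                if (d > diff) diff = d;
            }
        for (int i = 1; i < N - 1; i++)      /* copy next iterate back  */
            for (int j = 1; j < N - 1; j++)
                u[i][j] = w[i][j];
        return diff;                /* caller repeats sweeps until diff < tolerance */
    }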

Parallel Algorithm 1
• Associate primitive task with each matrix element
• Agglomerate tasks in contiguous rows (rowwise block striped decomposition)
• Add rows of ghost points above and below rectangular region controlled by process
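Because each ghost row is contiguous in a row-major C array, the per-iteration exchange needs no packing. A sketch under assumed names and layout (not the book's code):

    #include <mpi.h>

    /* local holds (local_rows + 2) rows of n doubles each; row 0 and row
       local_rows + 1 are the ghost rows.                                  */
    void exchange_ghost_rows(double *local, int local_rows, int n,
                             int above, int below, MPI_Comm comm)
    {
        /* send first owned row up, receive bottom ghost row from below */
        MPI_Sendrecv(&local[1 * n],                n, MPI_DOUBLE, above, 0,
                     &local[(local_rows + 1) * n], n, MPI_DOUBLE, below, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send last owned row down, receive top ghost row from above   */
        MPI_Sendrecv(&local[local_rows * n],       n, MPI_DOUBLE, below, 1,
                     &local[0],                    n, MPI_DOUBLE, above, 1,
                     comm, MPI_STATUS_IGNORE);
    }

The topmost and bottommost processes pass MPI_PROC_NULL for the missing neighbor.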

Example Decomposition
16 × 16 grid divided among 4 processors

Complexity Analysis
• Sequential time complexity: Θ(n²) each iteration
• Parallel computational complexity: Θ(n² / p) each iteration
• Parallel communication complexity: Θ(n) each iteration (two sends and two receives of n elements)

Isoefficiency Analysis
• Sequential time complexity: Θ(n²)
• Parallel overhead: Θ(pn)
• Isoefficiency relation: n² ≥ Cnp ⇒ n ≥ Cp
• This implementation has poor scalability
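As with the vibrating string, the poor-scalability verdict follows from the scalability function (C a generic constant):

    \[ n \ge Cp, \qquad M(n) = n^2 \;\Rightarrow\; \frac{M(Cp)}{p} = \frac{C^2 p^2}{p} = C^2 p, \]

so the memory needed per process grows linearly with p.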

Parallel Algorithm 2
• Associate primitive task with each matrix element
• Agglomerate tasks into blocks that are as square as possible (checkerboard block decomposition)
• Add rows of ghost points to all four sides of rectangular region controlled by process

Example Decomposition
16 × 16 grid divided among 16 processors

Implementation Details
• Using ghost points around 2-D blocks requires extra copying steps
• Ghost points for left and right sides are not in contiguous memory locations
• An auxiliary buffer must be used when receiving these ghost point values
• Similarly, a buffer must be used when sending a column of values to a neighboring process
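A sketch of the buffered column exchange for one side (layout and names are assumptions); MPI_Type_vector with a stride is a common alternative to explicit packing, but the buffered approach the slide describes looks roughly like this:

    #include <mpi.h>

    /* block is rows x cols doubles in row-major order, including one ghost
       column on each side (column 0 and column cols-1).                     */
    void send_right_edge(double *block, int rows, int cols,
                         int right, MPI_Comm comm)
    {
        double buf[rows];                      /* C99 variable-length array   */
        for (int i = 0; i < rows; i++)         /* pack rightmost owned column */
            buf[i] = block[i * cols + (cols - 2)];
        MPI_Send(buf, rows, MPI_DOUBLE, right, 0, comm);
    }

    void recv_right_ghosts(double *block, int rows, int cols,
                           int right, MPI_Comm comm)
    {
        double buf[rows];
        MPI_Recv(buf, rows, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < rows; i++)         /* unpack into right ghost column */
            block[i * cols + (cols - 1)] = buf[i];
    }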

Complexity Analysis
• Sequential time complexity: Θ(n²) each iteration
• Parallel computational complexity: Θ(n² / p) each iteration
• Parallel communication complexity: Θ(n / √p) each iteration (four sends and four receives of n / √p elements each)

Isoefficiency Analysis
• Sequential time complexity: Θ(n²)
• Parallel overhead: Θ(n √p)
• Isoefficiency relation: n² ≥ Cn√p ⇒ n ≥ C√p
• This system is perfectly scalable
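Spelling out the scalability function once more:

    \[ n \ge C\sqrt{p}, \qquad M(n) = n^2 \;\Rightarrow\; \frac{M(C\sqrt{p})}{p} = \frac{C^2 p}{p} = C^2, \]

a constant, so per-process memory stays bounded no matter how many processes are used.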

Summary (1/4)
• PDEs used to model behavior of a wide variety of physical systems
• Realistic problems yield PDEs too difficult to solve analytically, so scientists solve them numerically
• Two most common numerical techniques for solving PDEs:
  – finite element method
  – finite difference method

Summary (2/4)
• Finite difference methods
  – Matrix-based methods store the matrix explicitly
  – Matrix-free implementations store the matrix implicitly
• We have designed and analyzed parallel algorithms based on matrix-free implementations

Summary (3/4)
• Linear second-order PDEs
  – Elliptic (e.g., the steady-state heat equation solved in this chapter)
  – Hyperbolic (e.g., the wave equation behind the vibrating string problem)
  – Parabolic (e.g., the time-dependent heat equation)
• Parabolic PDEs are typically solved by methods that are not as amenable to parallelization

Summary (4/4)
• Ghost points store copies of values held by other processes
• Explored increasing number of ghost points and replicating computation in order to reduce number of message exchanges
• Optimal number of ghost points depends on characteristics of parallel system