Parallel Simulation of Continuous Systems: A Brief Introduction


Parallel Simulation of Continuous Systems: A Brief Introduction
Oct. 19, 2005, CS6236 Lecture

Background
Computer simulations: discrete models and continuous models
Sample applications of continuous systems:
- Civil engineering: building construction
- Aerospace engineering: aircraft design
- Mechanical engineering: machining
- Systems biology: heart simulations
- Computer engineering: semiconductor simulations

Outline
- Mathematical models and methods
- Parallel algorithm methodology
- Some active research areas

Mathematical Models
- Ordinary/partial differential equations
  - Laplace equation: ∇²u = 0
  - Heat (diffusion) equation: ∂u/∂t = α ∇²u
- Steady-state vs. time-dependent problems
- Convert into a discrete problem through numerical discretization:
  - Finite difference methods: structured grids
  - Finite element methods: local basis functions
  - Spectral methods: global basis functions
  - Finite volume methods: conservation laws

Example: 1-D Laplace Equation
Laplace equation in one dimension, y'' = 0 on (0, 1), with boundary conditions y(0) = a, y(1) = b
Finite difference approximation with Jacobi iteration: each sweep replaces every interior value by the average of its two neighbors, y_i ← (y_{i-1} + y_{i+1}) / 2
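The Jacobi update above can be sketched in a few lines of serial Python (an illustrative sketch, not code from the original slides): repeated sweeps drive the interior values toward the straight line between the two boundary values, which is the exact solution of y'' = 0.

```python
# Serial Jacobi iteration for the 1-D Laplace equation y'' = 0 on (0, 1)
# with boundary conditions y(0) = a, y(1) = b.
def jacobi_1d(a, b, n, sweeps):
    """Jacobi sweeps on n interior points; returns the full grid of n+2 values."""
    y = [0.0] * (n + 2)
    y[0], y[n + 1] = a, b
    for _ in range(sweeps):
        z = y[:]  # all updates read the previous sweep's values
        for i in range(1, n + 1):
            z[i] = (y[i - 1] + y[i + 1]) / 2
        y = z
    return y

grid = jacobi_1d(0.0, 1.0, 9, 2000)
# After enough sweeps the interior values approach i / (n + 1),
# the linear interpolant between the boundary values.
```

Jacobi converges slowly (its error decays by roughly a factor of cos(π/(n+1)) per sweep), which is why the later slides focus on making each sweep cheap and parallel rather than on reducing the sweep count.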

Example: 2-D Laplace Equation
Laplace equation in two dimensions, ∂²u/∂x² + ∂²u/∂y² = 0, with boundary conditions given on all four sides
The analogous Jacobi iteration replaces each interior point by the average of its four neighbors: u_{i,j} ← (u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}) / 4
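A minimal serial sketch of the 2-D case (again illustrative, not from the slides): the boundary rows and columns are held fixed while interior points are repeatedly averaged over their four neighbors.

```python
# Jacobi iteration for the 2-D Laplace equation on an (n+2) x (n+2) grid:
# each interior point becomes the average of its four neighbors; the
# boundary rows and columns are never modified.
def jacobi_2d(u, sweeps):
    n = len(u) - 2
    for _ in range(sweeps):
        z = [row[:] for row in u]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                z[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4
        u = z
    return u

# Example boundary data: u = 1 on the top edge, u = 0 on the other three.
n = 9
u0 = [[0.0] * (n + 2) for _ in range(n + 2)]
u0[0] = [1.0] * (n + 2)  # top boundary row
u = jacobi_2d(u0, 500)
# By symmetry the converged value at the center of the grid is 0.25.
```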

Parallel Programming Model
- Parallel computation: two or more tasks executing concurrently
- A task encapsulates a sequential program and local memory
- Tasks can be mapped to processors in various ways, including multiple tasks per processor

Performance Considerations
- Load balance: work divided evenly among tasks
- Concurrency: work done simultaneously
- Overhead: work not present in the serial computation
  - Communication
  - Synchronization
  - Redundant work
  - Speculative work

Example: 1-D Laplace Equation
Define n tasks, one for each y_i
Program for task i, i = 1, ..., n:
    initialize y_i
    for k = 1, ...
        if i > 1, send y_i to task i-1
        if i < n, send y_i to task i+1
        if i < n, recv y_{i+1} from task i+1
        if i > 1, recv y_{i-1} from task i-1
        y_i = (y_{i-1} + y_{i+1}) / 2
    end

Design Methodology
- Partitioning (decomposition): decompose the problem into fine-grained tasks to maximize potential parallelism
- Communication: determine the communication pattern among tasks
- Agglomeration: combine tasks into coarser-grained tasks, if necessary, to reduce communication requirements or other costs
- Mapping: assign tasks to processors, subject to the tradeoff between communication cost and concurrency

Types of Partitioning
- Domain decomposition: partition the data
  - Example: grid points in a 1-, 2-, or 3-D mesh
- Functional decomposition: partition the computation
  - Example: components in a climate model (atmosphere, ocean, land, etc.)

Example: Domain Decomposition
A 3-D mesh can be partitioned along any combination of one, two, or all three of its dimensions.
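Partitioning one dimension of a mesh among p tasks is a small exercise in its own right when the dimension does not divide evenly. A sketch (illustrative, with a hypothetical helper name) that keeps block sizes within one point of each other:

```python
# Block domain decomposition along one mesh dimension: split n grid points
# among p tasks as evenly as possible (block sizes differ by at most one).
def block_partition(n, p):
    """Return half-open (start, end) index ranges, one per task, covering 0..n-1."""
    base, rem = divmod(n, p)
    ranges, start = [], 0
    for task in range(p):
        size = base + (1 if task < rem else 0)  # first rem tasks get one extra point
        ranges.append((start, start + size))
        start += size
    return ranges

print(block_partition(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

A 2-D or 3-D decomposition simply applies the same split independently along each partitioned dimension.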

Partitioning Checklist
- Identify at least an order of magnitude more tasks than processors in the target parallel system
- Avoid redundant computation or storage
- Make tasks reasonably uniform in size
- The number of tasks, rather than the size of each task, should grow as the problem size increases

Communication Issues
- Latency and bandwidth
- Routing and switching
- Contention, flow control, and aggregate bandwidth
- Collective communication
  - One-to-many: broadcast, scatter
  - Many-to-one: gather, reduction, scan
  - All-to-all
  - Barrier
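Collective operations like reduction are typically organized as a tree rather than a sequential chain, so p values combine in about log2(p) rounds instead of p-1 steps. A serial simulation of a binomial-tree reduction (an illustrative sketch; real implementations live inside libraries such as MPI):

```python
# Simulate a many-to-one tree reduction: at each round, task i combines the
# value held by task i + stride, and the stride doubles, so p values are
# reduced in ceil(log2 p) rounds.
def tree_reduce(values, op):
    """Return the reduced result that task 0 would hold."""
    vals = list(values)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals), 2 * stride):
            if i + stride < len(vals):
                # "Task i" receives from "task i + stride" and combines.
                vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
    return vals[0]

total = tree_reduce(range(8), lambda a, b: a + b)  # → 28, in 3 rounds
```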

Communication Checklist
- Communication should be reasonably uniform across tasks in frequency and volume
- As localized as possible
- Concurrent
- Overlapped with computation, if possible
- Not inhibiting concurrent execution of tasks

Agglomeration
- Communication is proportional to the surface area of a subdomain, whereas computation is proportional to its volume
- Higher-dimensional decompositions therefore have a more favorable communication-to-computation ratio
- Increasing task sizes reduces communication but also reduces potential concurrency and flexibility
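The surface-to-volume argument can be made concrete with a back-of-the-envelope comparison (an illustrative sketch with hypothetical helper names; p is assumed to be a perfect cube for the block case):

```python
# Communication-to-computation ratio per task for an N x N x N grid split
# among p tasks: surface area stands in for communication volume, subdomain
# volume for computation.
def slab_ratio(N, p):
    # 1-D decomposition: slabs of shape N x N x (N/p), two N x N faces each.
    surface = 2 * N * N
    volume = N * N * (N / p)
    return surface / volume

def block_ratio(N, p):
    # 3-D decomposition: cubes of side N / p^(1/3), six faces each.
    side = N / round(p ** (1 / 3))
    return (6 * side * side) / (side ** 3)

N, p = 64, 64
# slab: 2p/N = 2.0 units of communication per grid point of computation;
# block: 6/side = 6/16 = 0.375 — the 3-D decomposition communicates far less.
```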

Example: Agglomeration
Define p tasks, each with n/p of the y_i's
Program for task j, j = 1, ..., p:
    initialize y_l, ..., y_h
    for k = 1, ...
        if j > 1, send y_l to task j-1
        if j < p, send y_h to task j+1
        if j < p, recv y_{h+1} from task j+1
        if j > 1, recv y_{l-1} from task j-1
        for i = l to h
            z_i = (y_{i-1} + y_{i+1}) / 2
        end
        y = z
    end

Example: Overlapping Communication and Computation
Program for task j, j = 1, ..., p:
    initialize y_l, ..., y_h
    for k = 1, ...
        if j > 1, send y_l to task j-1
        if j < p, send y_h to task j+1
        for i = l+1 to h-1
            z_i = (y_{i-1} + y_{i+1}) / 2
        end
        if j < p, recv y_{h+1} from task j+1
        z_h = (y_{h-1} + y_{h+1}) / 2
        if j > 1, recv y_{l-1} from task j-1
        z_l = (y_{l-1} + y_{l+1}) / 2
        y = z
    end
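The agglomerated scheme can be mimicked serially to check its logic (an illustrative sketch, not the message-passing program itself): each of p "tasks" owns a block of the grid plus one ghost cell at each end, and every sweep first refreshes the ghosts from the neighboring blocks, standing in for the sends and receives above, before updating its own points.

```python
# Serial simulation of the agglomerated 1-D Jacobi scheme: p blocks of
# m = n/p points each, with one ghost cell per block end.
def block_jacobi(a, b, n, p, sweeps):
    assert n % p == 0
    m = n // p
    blocks = [[0.0] * (m + 2) for _ in range(p)]
    blocks[0][0], blocks[-1][m + 1] = a, b  # physical boundary values
    for _ in range(sweeps):
        # "Ghost exchange": copy neighbors' edge values (all pre-sweep).
        for j in range(p):
            if j > 0:
                blocks[j][0] = blocks[j - 1][m]
            if j < p - 1:
                blocks[j][m + 1] = blocks[j + 1][1]
        # Local Jacobi update on owned points, reading a pre-sweep snapshot.
        for j in range(p):
            old = blocks[j][:]
            for i in range(1, m + 1):
                blocks[j][i] = (old[i - 1] + old[i + 1]) / 2
    # Flatten owned points back into one global grid.
    return [a] + [blocks[j][i] for j in range(p) for i in range(1, m + 1)] + [b]

grid = block_jacobi(0.0, 1.0, 8, 4, 2000)
```

The result matches the unpartitioned Jacobi iteration exactly, which is the point: agglomeration changes who does the work and what is communicated, not the numerics.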

Mapping
- Two basic strategies for assigning tasks to processors:
  - Place tasks that can execute concurrently on different processors
  - Place tasks that communicate frequently on the same processor
- Problem: these two strategies often conflict
- In general, finding the optimal solution to this tradeoff is NP-complete, so heuristics are used to find a reasonable compromise
- Dynamic vs. static strategies
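One classic mapping heuristic, when only load balance matters, is the longest-processing-time (LPT) rule: assign each task, largest first, to the currently least-loaded processor. A sketch (illustrative, with hypothetical names; real mappers also weigh communication):

```python
import heapq

# Greedy LPT mapping: tasks sorted by decreasing cost, each placed on the
# least-loaded processor (tracked with a min-heap keyed on load).
def greedy_map(task_costs, p):
    """Return (per-processor loads, task -> processor assignment)."""
    heap = [(0.0, proc) for proc in range(p)]  # (load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for task, cost in sorted(enumerate(task_costs), key=lambda t: -t[1]):
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + cost, proc))
    loads = [0.0] * p
    for task, proc in assignment.items():
        loads[proc] += task_costs[task]
    return loads, assignment

loads, _ = greedy_map([7, 5, 4, 3, 1], 2)
# Here the heuristic happens to balance the work perfectly: both loads are 10.
```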

Mapping Issues
- Partitioning
- Granularity
- Mapping
- Scheduling
- Load balancing
  - Particularly challenging for irregular problems
  - Some software tools: Metis, Chaco, Zoltan, etc.

Example: Atmosphere Model
- Partitioning: grid points in a 3-D finite difference model
  - Typically yields 10^5 to 10^7 tasks
- Communication
  - 9-point stencil horizontally and 3-point stencil vertically
  - Physics computations in vertical columns
  - Global operations to compute total mass

Other Equations
- Heat (diffusion) equation: ∂u/∂t = α ∂²u/∂x²
- Laplace equation: ∂²u/∂x² + ∂²u/∂y² = 0
- Advection equation: ∂u/∂t + c ∂u/∂x = 0
- Wave equation: ∂²u/∂t² = c² ∂²u/∂x²
- Classification of second-order equations: parabolic, hyperbolic, and elliptic
- Methods for time-dependent equations
  - Explicit vs. implicit
  - Finite-difference, finite-volume, finite-element

CFL Condition for Stability
- Necessary condition for explicit schemes, named after Courant, Friedrichs, and Lewy
- The computational domain of dependence must contain the physical domain of dependence
- For the advection equation, this implies the time step must satisfy |c| Δt / Δx ≤ 1
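In practice the CFL condition is used to choose the time step from the grid spacing and wave speed. A small sketch for the 1-D advection equation (illustrative helper names, not from the slides):

```python
# CFL condition for the 1-D advection equation u_t + c u_x = 0 with an
# explicit scheme: the numerical domain of dependence must contain the
# physical one, i.e. the Courant number |c| dt / dx must be <= 1.
def max_stable_dt(c, dx):
    """Largest time step allowed by the CFL condition."""
    return dx / abs(c)

def courant_number(c, dt, dx):
    """|c| dt / dx; stability requires this to be <= 1."""
    return abs(c) * dt / dx

dt = max_stable_dt(c=2.0, dx=0.01)   # → 0.005
nu = courant_number(2.0, dt, 0.01)   # → 1.0, exactly at the stability limit
```

Production codes usually multiply `dt` by a safety factor below 1 rather than running exactly at the limit.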

Active Research Areas
Discrete-event simulation (DES) of continuous systems

Active Research Areas
- Coupling of different physics
  - Load balancing
  - Different mathematical models
  - Continuous vs. discrete techniques
- Load balancing
  - Manager-worker model
  - Irregular/unstructured problems
  - Dynamic load balancing

Summary
- Mathematical models for continuous systems
  - Ordinary and partial differential equations
  - Finite difference, finite volume, and finite element methods
- Parallel algorithm design
  - Partitioning
  - Communication
  - Agglomeration
  - Mapping
- Active research areas

References
- I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
- A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003
- M. J. Quinn, Parallel Computing: Theory and Practice, McGraw-Hill, 1994
- K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988