Parallel Port Example
April 24, 2002

Introduction
The objective of this lecture is to go over a simple problem that illustrates the use of the MPI library to parallelize a partial differential equation (PDE) solver. The Laplace problem is a simple PDE found at the core of many applications. More elaborate problems often have the same communication structure that we will discuss in this class, so we will use this example to introduce the fundamentals of how these communication patterns appear in more complex PDE problems. This lecture demonstrates message-passing techniques, among them how to:
- Distribute work
- Distribute data
- Communicate: since each processor has its own memory, the data is not shared, and communication becomes important
- Synchronize

Laplace Equation
The Laplace equation is given below. We want to find T(x,y) on the interior of the domain, subject to prescribed boundary conditions:
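The equation appeared as an image on the original slide; written out, the two-dimensional Laplace equation for T(x,y) is (the specific boundary values shown on the slide are not reproduced here):

    \nabla^{2} T \;=\; \frac{\partial^{2} T}{\partial x^{2}} + \frac{\partial^{2} T}{\partial y^{2}} \;=\; 0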

Laplace Equation
To find an approximate solution to the equation, define a square mesh (or grid) consisting of points.

The Point Jacobi Iteration
The method known as point Jacobi iteration calculates the value of T(i,j) as the average of the old values of T at the four neighboring points:
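The update formula was likewise an image on the original slide; for the Laplace problem the standard point Jacobi update, with the superscript n denoting the iteration number, is:

    T^{\,n+1}_{i,j} \;=\; \frac{1}{4}\left( T^{\,n}_{i-1,j} + T^{\,n}_{i+1,j} + T^{\,n}_{i,j-1} + T^{\,n}_{i,j+1} \right)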

The Point Jacobi Iteration
The iteration is repeated until the solution converges. If we want to solve for T on a grid of 1000 x 1000 interior points, the array itself needs to be of dimension 1002 x 1002, since the update of T(i,j) requires the values of T at i-1, i+1, j-1, and j+1.

Serial Code Implementation
In the following, NR = number of rows and NC = number of columns (excluding the boundary rows and columns). The serial implementation of the Jacobi iteration is:

Serial Version – C
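The C listing that appeared on the slides above was not captured in this transcript. The following is an illustrative sketch of a serial point Jacobi solver, not the original SDSC code; the grid size, boundary values, iteration cap, and tolerance are assumptions:

    /* Illustrative serial point Jacobi solver (not the original SDSC listing).
       Grid size, boundary values, iteration cap and tolerance are assumed. */
    #include <math.h>
    #include <stdio.h>

    #define NR 1000          /* interior rows                 */
    #define NC 1000          /* interior columns              */
    #define MAX_ITER 1000    /* assumed cap on iterations     */
    #define TOL 1.0e-4f      /* assumed convergence tolerance */

    /* the arrays carry one boundary row/column on each side */
    static float t[NR + 2][NC + 2], told[NR + 2][NC + 2];

    int main(void)
    {
        /* assumed boundary condition: bottom edge held at 100, the rest at 0 */
        for (int j = 0; j <= NC + 1; j++)
            t[NR + 1][j] = 100.0f;

        for (int iter = 1; iter <= MAX_ITER; iter++) {
            float dt = 0.0f;

            /* save the old solution */
            for (int i = 0; i <= NR + 1; i++)
                for (int j = 0; j <= NC + 1; j++)
                    told[i][j] = t[i][j];

            /* point Jacobi update of the interior points */
            for (int i = 1; i <= NR; i++)
                for (int j = 1; j <= NC; j++) {
                    t[i][j] = 0.25f * (told[i - 1][j] + told[i + 1][j]
                                     + told[i][j - 1] + told[i][j + 1]);
                    if (fabsf(t[i][j] - told[i][j]) > dt)
                        dt = fabsf(t[i][j] - told[i][j]);
                }

            if (dt < TOL) {
                printf("converged after %d iterations, dt = %g\n", iter, dt);
                break;
            }
        }
        return 0;
    }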

Serial Version - Fortran

Parallel Version: Example Using 4 Processors
Recall that in the serial case the grid boundaries were:

Simplest Decomposition for Fortran Code

Simplest Decomposition for Fortran Code
A better distribution from the point of view of communication optimization is the following:
The program has a “local” view of data. The programmer has to have a “global” view of data.

Simplest Decomposition for C Code

Simplest Decomposition for C Code
In the parallel case, we break the grid up across 4 processors. There is only one set of boundary values, but when we distribute the data, each processor needs an extra row above and below its block to hold the data it receives from its neighbors.
The program has a “local” view of data. The programmer has to have a “global” view of data.

Include Files
Fortran (always declare all variables):
    implicit none
    include 'mpif.h'
Initialization and clean up (always check the error codes):
    call MPI_Init(ierr)
    call MPI_Finalize(ierr)
C:
    #include "mpi.h"
Initialization and clean up (always check the return codes):
    stat = MPI_Init(&argc, &argv);
    stat = MPI_Finalize();
Note: check for MPI_SUCCESS, for example in Fortran:
    if (ierr .ne. MPI_SUCCESS) then
        ! do error processing
    endif

Initialization
Serial version:
Parallel version:
Just for simplicity, we will distribute rows in C and columns in Fortran; this is easier because arrays are stored by rows in C (row-major order) and by columns in Fortran (column-major order).
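This storage-order point is what makes the row-wise decomposition convenient in C: the NC interior values of a row are contiguous in memory and can be handed to MPI_Send as a single buffer, whereas a column would need a derived datatype (for example MPI_Type_vector) or explicit packing. The line below is a fragment for illustration only; t, i, NC, dest, and tag are assumed to be defined as in the surrounding discussion:

    /* send the NC interior values of local row i to neighbor 'dest' */
    MPI_Send(&t[i][1], NC, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);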

Parallel Version: Boundary Conditions (Fortran Version)
We need to know the MYPE number and how many PEs we are using; each processor will work on different data depending on MYPE. Here are the boundary conditions in the serial code, where NCL = local number of columns, NCL = NC/NPROC (the Fortran version distributes columns).

Parallel C Version: Boundary Conditions
We need to know the MYPE number and how many PEs we are using; each processor will work on different data depending on MYPE. Here are the boundary conditions in the serial code, where NRL = local number of rows, NRL = NR/NPROC.
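A minimal sketch of how each PE can derive which global rows it owns from MYPE and NPROC. This is illustrative only: the variable names and the assumption that NR divides evenly by the number of PEs are not taken from the original template.

    #include <stdio.h>
    #include <mpi.h>

    #define NR 1000

    int main(int argc, char *argv[])
    {
        int npes, mype;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &mype);

        int NRL   = NR / npes;            /* local number of rows      */
        int first = mype * NRL + 1;       /* first global interior row */
        int last  = first + NRL - 1;      /* last global interior row  */
        printf("PE %d of %d owns global rows %d..%d\n", mype, npes, first, last);

        MPI_Finalize();
        return 0;
    }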

Processor Information
Fortran:
    Number of processors:  call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr)
    Processor number:      call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr)
C:
    Number of processors:  stat = MPI_Comm_size(MPI_COMM_WORLD, &npes);
    Processor number:      stat = MPI_Comm_rank(MPI_COMM_WORLD, &mype);

Maximum Number of Iterations
Only one PE has to do the I/O (usually PE 0). PE 0 (the root PE) then broadcasts niter to all the others using the collective operation MPI_Bcast. In both the Fortran and the C call, the number of elements is how many values we are passing, in this case only one: niter.
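The actual broadcast calls appeared as images on the original slides and were not captured. Below is a minimal C sketch of the pattern described above; the fallback value of 100 is an assumption:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int mype, niter = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &mype);

        if (mype == 0) {                       /* only the root PE reads input */
            if (scanf("%d", &niter) != 1)
                niter = 100;                   /* assumed fallback value */
        }

        /* broadcast one integer (niter) from PE 0 to every PE */
        MPI_Bcast(&niter, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("PE %d: niter = %d\n", mype, niter);
        MPI_Finalize();
        return 0;
    }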

Main Loop
for (iter = 1; iter <= NITER; iter++) {
    Do averaging (each PE averages from 1 to 250)
    Copy T into Told
    Send values down
    Send values up
    Receive values from above
    Receive values from below
    (find the max change)
    Synchronize
}
The four send/receive steps are where we use MPI communication calls: we need to exchange data between processors.

Parallel Template: Send data up
Once the new T values have been calculated:
SEND: all processors except processor 0 send their “first” row (in C) to their neighbor above (mype - 1).

Parallel Template: Send data down
SEND: all processors except the last one send their “last” row to their neighbor below (mype + 1).

Parallel Template: Receive from above
RECEIVE: all processors except PE 0 receive from their neighbor above and unpack the data into row 0.

Parallel Template: Receive from below
RECEIVE: all processors except processor (NPES-1) receive from the neighbor below and unpack the data into the last row.
Example: PE 1 receives two messages; there is no guarantee of the order in which they will be received.
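The four communication steps described on the last few slides, collected into a small self-contained C sketch. This is an illustrative stand-in, not the original template: the message tags, the tiny grid size, and the variable names are assumptions, and the blocking send-before-receive ordering relies on the messages being small enough to be buffered (the MPI_Sendrecv variation shown later avoids that caveat).

    #include <stdio.h>
    #include <mpi.h>

    #define NC  8      /* small width so the demo output stays readable (assumed) */
    #define NRL 4      /* local interior rows per PE (assumed) */

    int main(int argc, char *argv[])
    {
        float t[NRL + 2][NC + 2];
        int npes, mype;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &mype);

        /* fill the local block with this PE's rank so the exchange is visible */
        for (int i = 0; i < NRL + 2; i++)
            for (int j = 0; j < NC + 2; j++)
                t[i][j] = (float)mype;

        /* send first interior row up (to mype-1), except on PE 0 */
        if (mype != 0)
            MPI_Send(&t[1][1], NC, MPI_FLOAT, mype - 1, 10, MPI_COMM_WORLD);

        /* send last interior row down (to mype+1), except on the last PE */
        if (mype != npes - 1)
            MPI_Send(&t[NRL][1], NC, MPI_FLOAT, mype + 1, 20, MPI_COMM_WORLD);

        /* receive from above into ghost row 0, except on PE 0 */
        if (mype != 0)
            MPI_Recv(&t[0][1], NC, MPI_FLOAT, mype - 1, 20, MPI_COMM_WORLD, &status);

        /* receive from below into ghost row NRL+1, except on the last PE */
        if (mype != npes - 1)
            MPI_Recv(&t[NRL + 1][1], NC, MPI_FLOAT, mype + 1, 10, MPI_COMM_WORLD, &status);

        printf("PE %d: ghost rows now hold %g (above) and %g (below)\n",
               mype, t[0][1], t[NRL + 1][1]);
        MPI_Finalize();
        return 0;
    }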

Parallel Template (C)

Parallel Template (Fortran)

Variations
    if (mype != 0) {
        up = mype - 1;
        MPI_Send(t, NC, MPI_FLOAT, up, UP_TAG, comm);
    }
Alternatively:
    up = mype - 1;
    if (mype == 0) up = MPI_PROC_NULL;
    MPI_Send(t, NC, MPI_FLOAT, up, UP_TAG, comm);

Variations
    if (mype .ne. 0) then
        left = mype - 1
        call MPI_Send(t, NC, MPI_REAL, left, L_TAG, comm, ierr)
    endif
Alternatively:
    left = mype - 1
    if (mype .eq. 0) left = MPI_PROC_NULL
    call MPI_Send(t, NC, MPI_REAL, left, L_TAG, comm, ierr)
Note: you may also MPI_Recv from MPI_PROC_NULL.

Variations
Send and receive at the same time: MPI_Sendrecv( … )
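A sketch of how the "send down / receive from above" pair could be collapsed into one call. The variables are assumed to be those of the exchange sketch above, and DOWN_TAG is an illustrative tag name, not taken from the original template; MPI_PROC_NULL neighbors let the edge PEs use the same call with no special cases:

    int up   = (mype == 0)        ? MPI_PROC_NULL : mype - 1;
    int down = (mype == npes - 1) ? MPI_PROC_NULL : mype + 1;
    const int DOWN_TAG = 20;          /* illustrative tag value */

    /* send the last interior row down; receive into ghost row 0 from above */
    MPI_Sendrecv(&t[NRL][1], NC, MPI_FLOAT, down, DOWN_TAG,
                 &t[0][1],   NC, MPI_FLOAT, up,   DOWN_TAG,
                 MPI_COMM_WORLD, &status);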

Finding Maximum Change
Each PE can find its own maximum change dt.
To find the global change dtg in C:
    MPI_Reduce(&dt, &dtg, 1, MPI_FLOAT, MPI_MAX, PE0, comm);
To find the global change dtg in Fortran:
    call MPI_Reduce(dt, dtg, 1, MPI_REAL, MPI_MAX, PE0, comm, ierr)
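Note that after MPI_Reduce only the root PE holds dtg. If every PE needs the global maximum (for example, so each one can test convergence locally), MPI_Allreduce can be used instead; the line below is a sketch, not part of the original code:

    /* every PE receives the global maximum change */
    MPI_Allreduce(&dt, &dtg, 1, MPI_FLOAT, MPI_MAX, MPI_COMM_WORLD);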

Domain Decomposition

Data Distribution I: Domain Decomposition I
- All processors have the entire T array.
- Each processor works on its own part, TW, of T.
- After every iteration, all processors broadcast their TW to all other processors.
- Increased memory.
- Increased operations.

Data Distribution II: Domain Decomposition II
- Each processor has only its sub-grid.
- Communicate boundary values only.
- Reduced memory.
- Reduced communication.
- Have to keep track of neighbors in two directions.

Exercise
1. Copy the following parallel templates into your /tmp directory on jaromir:
   /tmp/training/laplace/laplace.t3e.c
   /tmp/training/laplace/laplace.t3e.f
2. These are template files; your job is to go into the sections marked "<<<<<<" in the source code and add the necessary statements so that the code will run on 4 PEs. Useful Web reference for this exercise: to view a list of all MPI calls, with syntax and descriptions, access the Message Passing Interface Standard at:
3. To compile the program after you have modified it, rename the new programs laplace_mpi_c.c and laplace_mpi_f.f and execute:
   cc  -o laplace_mpi_c laplace_mpi_c.c -lmpi
   f90 -o laplace_mpi_f laplace_mpi_f.f -lmpi

Exercise
4. To run:
   echo 200 | mpprun -n 4 ./laplace_mpi_c
   echo 200 | mpprun -n 4 ./laplace_mpi_f
5. You can check your program against the solutions laplace_mpi_c.c and laplace_mpi_f.f.

Source Codes
The following are the C and Fortran templates that you need to parallelize for the Exercise.
laplace.t3e.c

Source Codes
laplace.t3e.f