Comp 422: Parallel Programming Lecture 8: Message Passing (MPI)

Explicit Parallelism
Same thing as multithreading for shared memory.
Explicit parallelism is more common with message passing.
– User has explicit control over processes.
– Good: control can be used for performance benefit.
– Bad: user has to deal with it.

Distributed Memory - Message Passing
[Figure: processes proc1, proc2, proc3, ..., procN, each with its own memory mem1, mem2, mem3, ..., memN, connected by a network.]

Distributed Memory - Message Passing
A variable x, a pointer p, or an array a[] refers to a different memory location depending on the processor.
In this course, we discuss message passing as a programming model (it can run on any hardware).

What does the user have to do?
This is what we said for shared memory:
– Decide how to decompose the computation into parallel parts.
– Create (and destroy) processes to support that decomposition.
– Add synchronization to make sure dependences are covered.
Is the same true for message passing?

Another Look at SOR Example
    for some number of timesteps/iterations {
        for( i=0; i<n; i++ )
            for( j=0; j<n; j++ )
                temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                                      grid[i][j-1] + grid[i][j+1] );
        for( i=0; i<n; i++ )
            for( j=0; j<n; j++ )
                grid[i][j] = temp[i][j];
    }
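The slide's loop is pseudocode: as written, i-1 and j-1 fall outside the array when i or j is 0. A minimal runnable sketch follows, assuming a row-major n x n array whose outermost rows and columns are fixed boundary values that are never updated. The names sor_sweep and sor_run are illustrative and do not come from the lecture.

```c
#include <assert.h>
#include <string.h>

/* One sweep over an n x n grid stored row-major.
 * Only interior points are updated, so every neighbor access
 * (i-1, i+1, j-1, j+1) stays inside the array. */
void sor_sweep(const double *grid, double *temp, int n)
{
    assert(n >= 3);
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            temp[i * n + j] = 0.25 * (grid[(i - 1) * n + j] + grid[(i + 1) * n + j] +
                                      grid[i * n + j - 1] + grid[i * n + j + 1]);
}

/* Run `iters` sweeps; the boundary values are fixed and never change. */
void sor_run(double *grid, double *temp, int n, int iters)
{
    /* Copy once so temp also carries the boundary values. */
    memcpy(temp, grid, (size_t)n * n * sizeof *grid);
    for (int t = 0; t < iters; t++) {
        sor_sweep(grid, temp, n);
        memcpy(grid, temp, (size_t)n * n * sizeof *grid);
    }
}
```

For a 3 x 3 grid whose boundary is all 1 and whose center is 0, one call to sor_run with iters = 1 sets the center to 1.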

Shared Memory
[Figure: proc1, proc2, proc3, ..., procN all operate on the shared grid and temp arrays.]

Message-Passing Data Distribution (only middle processes)
[Figure: each middle process holds its own block of grid and temp.]

Is this going to work?
Same code as we used for shared memory:
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ )
            temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1] );
No, we need extra boundary elements for grid.
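One common way to provide the extra boundary elements is to store each local block with one additional "ghost" row above and one below. The sketch below assumes that layout for a middle process, uses local indices, and leaves columns 0 and n-1 as fixed boundary values. The name sor_sweep_local and the ghost-row layout are assumptions for illustration, not taken from the slides.

```c
#include <assert.h>

/* Local block with `rows` owned rows, stored with two extra ghost rows:
 *   row 0           copy of the upper neighbor's last owned row
 *   rows 1..rows    rows owned by this process
 *   row rows+1      copy of the lower neighbor's first owned row
 * Both arrays are (rows + 2) x n, row-major. */
void sor_sweep_local(const double *grid, double *temp, int rows, int n)
{
    assert(rows >= 1 && n >= 3);
    for (int i = 1; i <= rows; i++)          /* owned rows only */
        for (int j = 1; j < n - 1; j++)      /* skip the fixed left/right columns */
            temp[i * n + j] = 0.25 * (grid[(i - 1) * n + j] + grid[(i + 1) * n + j] +
                                      grid[i * n + j - 1] + grid[i * n + j + 1]);
}
```

The ghost rows are only read here; filling them with the neighbors' current values is a separate communication step.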

Data Distribution (only middle processes)
[Figure: grid and temp blocks held by the middle processes.]

Is this going to work?
Same code as we used for shared memory:
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ )
            temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1] );
No, on the next iteration we need boundary elements from our neighbors.

Data Communication (only middle processes)
[Figure: proc2 and proc3 exchange boundary rows of grid.]

Is this now going to work?
Same code as we used for shared memory:
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ )
            temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1] );
No, we need to translate the indices.

Index Translation
    for( i=0; i<n/p; i++ )
        for( j=0; j<n; j++ )
            temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1] );
Remember, all variables are local.
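Index translation is just an offset by the first row a process owns. The helpers below sketch the mapping, assuming the block distribution used later in the lecture, where process rank owns global rows starting at (rank * n) / p. The function names are hypothetical.

```c
#include <assert.h>

/* First global row owned by `rank` when n rows are split over p processes. */
int first_row(int rank, int n, int p)
{
    assert(p > 0 && 0 <= rank && rank < p);
    return (rank * n) / p;
}

/* Local row index -> global row index, for the given rank. */
int local_to_global(int li, int rank, int n, int p)
{
    return first_row(rank, n, p) + li;
}

/* Global row index -> local row index, for the given rank. */
int global_to_local(int gi, int rank, int n, int p)
{
    return gi - first_row(rank, n, p);
}

/* Rank that owns global row gi: the largest r with (r * n) / p <= gi. */
int owner_of(int gi, int n, int p)
{
    assert(p > 0 && 0 <= gi && gi < n);
    return ((gi + 1) * p - 1) / n;
}
```

With n = 8 and p = 4, process 2 owns global rows 4 and 5, so its local row 1 is global row 5.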

Index Translation is Optional
Allocate the full arrays on each processor.
Leave indices alone.
Higher memory use.
Sometimes necessary (see later).

What does the user need to do?
Divide the program into parallel parts.
Create and destroy processes to run those parts.
Partition and distribute the data.
Communicate data at the right time.
(Sometimes) perform index translation.
Still need to do synchronization?
– Sometimes, but it often goes hand in hand with data communication.

Message Passing Systems
Provide process creation and destruction.
Provide message passing facilities (send and receive, in various flavors) to distribute and communicate data.
Provide additional synchronization facilities.

MPI (Message Passing Interface)
Is the de facto message passing standard.
Available on virtually all platforms, including freely available implementations (MPICH).
Succeeded earlier message passing systems such as PVM, which is now outdated.

MPI Process Creation/Destruction
MPI_Init( &argc, &argv )
– Initiates a computation. Takes the addresses of main's argc and argv.
MPI_Finalize()
– Terminates a computation.

MPI Process Identification
MPI_Comm_size( comm, &size )
– Determines the number of processes.
MPI_Comm_rank( comm, &pid )
– pid is the process identifier (rank) of the caller, in the range 0 to size-1.

MPI Basic Send
MPI_Send( buf, count, datatype, dest, tag, comm )
– buf: address of send buffer
– count: number of elements
– datatype: data type of send buffer elements
– dest: process id of destination process
– tag: message tag (ignore for now)
– comm: communicator (ignore for now)

MPI Basic Receive
MPI_Recv( buf, count, datatype, source, tag, comm, &status )
– buf: address of receive buffer
– count: size of receive buffer in elements
– datatype: data type of receive buffer elements
– source: source process id or MPI_ANY_SOURCE
– tag and comm: ignore for now
– status: status object

MPI Matrix Multiply (w/o Index Translation)
    int main( int argc, char *argv[] )
    {
        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
        MPI_Comm_size( MPI_COMM_WORLD, &p );
        from = (myrank * n)/p;
        to = ((myrank+1) * n)/p;
        /* Data distribution */ ...
        /* Computation */ ...
        /* Result gathering */ ...
        MPI_Finalize();
    }
Note (Willy Zwaenepoel): the initialization of a and b is missing.
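The from/to computation gives each process a contiguous, half-open range of rows. The sketch below wraps it in a helper and also counts the elements in a block, which shows that the message size n*n/p used on the following slides is only exact when p divides n. The names row_range, block_range, and block_elems are hypothetical.

```c
#include <assert.h>

typedef struct { int from, to; } row_range;   /* half-open range [from, to) */

/* Rows owned by `rank` when n rows are split over p processes. */
row_range block_range(int rank, int n, int p)
{
    assert(p > 0 && 0 <= rank && rank < p);
    row_range r = { (rank * n) / p, ((rank + 1) * n) / p };
    return r;
}

/* Number of matrix elements in the rows owned by `rank` (n elements per row). */
int block_elems(int rank, int n, int p)
{
    row_range r = block_range(rank, n, p);
    return (r.to - r.from) * n;
}
```

For n = 10 and p = 3 the ranges are [0, 3), [3, 6), and [6, 10), so the last process owns 40 elements while n*n/p evaluates to 33 in integer arithmetic.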

MPI Matrix Multiply (w/o Index Translation)
    /* Data distribution */
    if( myrank != 0 ) {
        MPI_Recv( &a[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
        MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
    } else {
        for( i=1; i<p; i++ ) {
            /* process i owns the rows starting at (i*n)/p */
            MPI_Send( &a[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
            MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
        }
    }

MPI Matrix Multiply (w/o Index Translation)
    /* Computation */
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ ) {
            c[i][j] = 0;
            for( k=0; k<n; k++ )
                c[i][j] += a[i][k]*b[k][j];
        }

MPI Matrix Multiply (w/o Index Translation)
    /* Result gathering */
    if( myrank != 0 )
        MPI_Send( &c[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
    else
        for( i=1; i<p; i++ )
            /* store process i's rows at their global position */
            MPI_Recv( &c[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );

MPI Matrix Multiply (with Index Translation)
    int main( int argc, char *argv[] )
    {
        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
        MPI_Comm_size( MPI_COMM_WORLD, &p );
        from = (myrank * n)/p;
        to = ((myrank+1) * n)/p;
        /* Data distribution */ ...
        /* Computation */ ...
        /* Result gathering */ ...
        MPI_Finalize();
    }
Note (Willy Zwaenepoel): the initialization of a and b is missing.

MPI Matrix Multiply (with Index Translation)
    /* Data distribution */
    if( myrank != 0 ) {
        /* the local block starts at local index 0 */
        MPI_Recv( &a, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
        MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
    } else {
        for( i=1; i<p; i++ ) {
            /* process i owns the rows starting at (i*n)/p */
            MPI_Send( &a[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
            MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
        }
    }

MPI Matrix Multiply (with Index Translation)
    /* Computation */
    for( i=0; i<n/p; i++ )
        for( j=0; j<n; j++ ) {
            c[i][j] = 0;
            for( k=0; k<n; k++ )
                c[i][j] += a[i][k]*b[k][j];
        }
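The index-translated computation can be checked without any communication. The sketch below assumes each process stores only its own rows of a and c (local indices 0 to rows-1) plus the full matrix b, all row-major. The function name matmul_rows is illustrative.

```c
#include <assert.h>

/* c_local = a_local * b, where a_local and c_local hold only this
 * process's `rows` rows and b is the full n x n matrix. */
void matmul_rows(const int *a_local, const int *b, int *c_local, int rows, int n)
{
    assert(rows >= 0 && n > 0);
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += a_local[i * n + k] * b[k * n + j];
            c_local[i * n + j] = sum;
        }
}
```

A process that owns the single row {0, 1} of a 2 x 2 matrix a and multiplies by b = {{5, 6}, {7, 8}} obtains the row {7, 8}.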

MPI Matrix Multiply (with Index Translation)
    /* Result gathering */
    if( myrank != 0 )
        MPI_Send( &c, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
    else
        for( i=1; i<p; i++ )
            /* store process i's rows at their global position */
            MPI_Recv( &c[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );

Running an MPI Program
mpirun
– Interacts with a daemon process on the hosts.
– Causes a Unix process to be run on each of the hosts.
– Currently: only runs in interactive mode on our Itanium (batch mode blocked by ssh).