1 CS4402 – Parallel Computing Lecture 2 MPI – Getting Started. MPI – Point to Point Communication.


1 CS4402 – Parallel Computing Lecture 2 MPI – Getting Started. MPI – Point to Point Communication.

2 What is MPI? M P I = Message Passing Interface. An interface specification: MPI is a specification for the developers and users of message passing libraries. By itself, it is NOT a library - rather, it specifies what such a library should be. Simply stated, the goal of the Message Passing Interface is to provide a widely used standard for writing message passing programs. The interface attempts to be: practical, portable, efficient, flexible. Interface specifications have been defined for C/C++ and Fortran programs, with unofficial bindings for other languages such as Java.

3 Some History: MPI resulted from the efforts of numerous individuals and groups over a two-year period between 1992 and 1994. Late 1980s – early 1990s: recognition of the need for a standard arose. April 1992: the basic features essential to a standard message passing interface were discussed, and a working group was established to continue the standardization process; a preliminary draft proposal was developed subsequently. November 1992: the MPI draft proposal (MPI-1) from ORNL was presented; the group adopted procedures and an organization to form the MPI Forum. November 1993: Supercomputing '93 conference - the draft MPI standard was presented. May 1994: the final version of MPI-1 was released. 1995-1997: MPI-2 was developed.

4 Programming Model: SPMD MPI lends itself to most (if not all) distributed memory parallel programming models. Distributed Memory: Originally, MPI was targeted for distributed memory systems. Shared Memory: As shared memory systems became more popular, MPI implementations for these platforms appeared. Hybrid: MPI is now used on just about any common parallel architecture including massively parallel machines, SMP clusters, workstation clusters and heterogeneous networks. All parallelism is explicit: the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs. The number of tasks dedicated to run a parallel program is static. New tasks cannot be dynamically spawned during run time. (MPI-2 addresses this issue).

5 C Coding – Recalling Some Facts. Structure of a C program: 1. Include all headers. 2. Declare all functions. 3. Define all functions, including main. Simple facts: 1. Declarations come at the start of a block. 2. The usual statement syntax applies. 3. Several important headers: stdio.h, stdlib.h, etc.
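
A minimal (non-MPI) illustration of this layout, not taken from the slides; the helper function square() is made up for the example:

/* 1. headers */
#include <stdio.h>
#include <stdlib.h>

/* 2. declarations */
int square(int x);

/* 3. definitions, including main */
int main(void)
{
    int n = 5;                          /* declarations come first in the block */
    printf("%d squared is %d\n", n, square(n));
    return EXIT_SUCCESS;
}

int square(int x)
{
    return x * x;
}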

6 MPI - Getting Started. MPI Header: required for all programs/routines which make MPI library calls: #include "mpi.h". Sometimes other MPI-related headers must be included as well. MPI Functions: Format: rc = MPI_Xxxxx(parameter, ...). Example: rc = MPI_Bsend(&buf,count,type,dest,tag,comm). Error code: returned as "rc"; equal to MPI_SUCCESS if the call succeeded.
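
A small sketch, not from the original slides, of checking an MPI return code. Note that with the default error handler (MPI_ERRORS_ARE_FATAL) a failing call normally aborts the program before the check is reached, so explicit checks like this are mainly illustrative:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rc, rank;

    MPI_Init(&argc, &argv);

    rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rc is MPI_SUCCESS on success */
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "MPI_Comm_rank failed with error code %d\n", rc);

    MPI_Finalize();
    return 0;
}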

7 MPI - Getting Started. General MPI Program Structure:
- include declarations (the MPI header)
- the main function:
  - initialise the MPI environment
  - get the MPI basic elements: size, rank, etc.
  - do the parallel work:
    - acquire the local data for processor rank
    - perform the computation of the data
  - terminate the MPI environment
- some other functions

8 MPI Programs

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[])
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // the parallel computation of processor rank:
    //   - get the data from somewhere
    //   - process the data for processor rank

    MPI_Finalize();
    return 0;
}

9 Environment Management Routines. Initialize/terminate the MPI environment, query information about it, etc. MPI_Init: initializes the MPI execution environment. MPI_Init(&argc, &argv), where argc and argv are the arguments of main(). MPI_Abort: terminates all MPI processes associated with the communicator. MPI_Abort(comm, errorcode). MPI_Wtime: returns the elapsed wall-clock time in seconds on the calling processor. MPI_Wtime(). MPI_Finalize: terminates the MPI execution environment. MPI_Finalize().
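
A short sketch, not from the slides, showing the typical use of MPI_Wtime to time a region of work; the loop is just placeholder computation:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, i;
    double start, elapsed, sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    start = MPI_Wtime();                 /* wall-clock time before the work */
    for (i = 0; i < 1000000; i++)        /* placeholder work */
        sum += (double)i;
    elapsed = MPI_Wtime() - start;       /* elapsed seconds on this processor */

    printf("Process %d: sum = %f, time = %f s\n", rank, sum, elapsed);

    MPI_Finalize();
    return 0;
}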

10 Environment Management Routines. MPI_Comm_size: determines the number of processes in a communicator. MPI_Comm_size(comm, &size). MPI_Comm_rank: determines the rank of the calling process within the communicator. MPI_Comm_rank(comm, &rank). MPI_Get_processor_name: returns the processor name. MPI_Get_processor_name(name, &resultlength).

11 Communicators: MPI_COMM_WORLD. A communicator is a set of processes that can communicate with each other. Most MPI routines require a communicator. MPI_COMM_WORLD is the default communicator, containing all processes. Within a communicator each process has a rank.

12 Hello World

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[])
{
    int rank, size;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Get_processor_name(processor_name, &namelen);

    printf("called on %s\n", processor_name);
    printf("Hello World from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

13
hello]$ ls
hellos  hellos.c  hellos.o  Makefile
hello]$ mpirun -np 4 hellos
called on cuc100.ucc.ie
called on cuc104.ucc.ie
called on cuc106.ucc.ie
called on cuc108.ucc.ie
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4

14 Simple Structures – All Processors Work

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[])
{
    int rank, size;
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // describe the processing to be done by processor rank:
    //   1. identify the local data by using rank
    //   2. process the local data

    MPI_Finalize();
    return 0;
}

15 Case Study: Count Prime Numbers. Some facts about the lab program:
- The first n odd numbers 2*i+1 are tested, for i = 0, 1, 2, ..., n-1.
- With size processors, each gets n/size numbers to test.
- Block partition of the odd numbers onto processors:
  - Proc 0: 0, 1, 2, 3, ..., n/size-1
  - Proc 1: n/size, n/size+1, ..., 2*n/size-1
  - Proc 2: 2*n/size, 2*n/size+1, ..., 3*n/size-1
  - Proc rank gets: rank*(n/size), ..., (rank+1)*(n/size)-1

16 Count Primes

#include <stdio.h>
#include "mpi.h"

// n (the number of odd values to test) and isPrime() are assumed to be
// defined elsewhere in the lab program.

int main( int argc, char* argv[])
{
    int rank, size, i, count = 0;
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    time = MPI_Wtime();
    // LOCAL DATA: the block of indices owned by processor rank
    for (i = rank*n/size; i < (rank+1)*n/size; i++)
        if (isPrime(2*i+1))
            count++;
    time = MPI_Wtime() - time;

    printf("Processor %d finds %d primes in %lf\n", rank, count, time);

    MPI_Finalize();
    return 0;
}

17 Count Primes. Cyclic partition of the odd numbers onto processors:
- Proc 0: 0, size, 2*size, ...
- Proc 1: 1, size+1, 2*size+1, ...
- Proc rank gets: rank, rank+size, rank+2*size, ...

for (i = rank; i < n; i += size)
    if (isPrime(2*i+1))
        count++;
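
For completeness, a sketch of the cyclic version in the same skeleton as slide 16; the value of n and the trial-division isPrime() below are illustrative additions, not taken from the lab program:

#include <stdio.h>
#include "mpi.h"

/* simple trial-division primality test, supplied only to make the sketch runnable */
int isPrime(int m)
{
    int d;
    if (m < 2) return 0;
    for (d = 2; d * d <= m; d++)
        if (m % d == 0) return 0;
    return 1;
}

int main(int argc, char* argv[])
{
    int rank, size, i, count = 0;
    int n = 100000;                    /* number of odd values to test (illustrative) */
    double time;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    time = MPI_Wtime();
    for (i = rank; i < n; i += size)   /* cyclic partition: rank, rank+size, ... */
        if (isPrime(2*i + 1))
            count++;
    time = MPI_Wtime() - time;

    printf("Processor %d finds %d primes in %lf s\n", rank, count, time);

    MPI_Finalize();
    return 0;
}

The cyclic partition tends to balance the load better than the block partition, because testing larger numbers takes longer and the block scheme gives all of the largest numbers to the last processor.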

18 Simple Structures – One Processor Works

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[])
{
    int rank, size;
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0)
    {
        // SERIAL PART: describe the processing to be done by processor 0
    }

    MPI_Finalize();
    return 0;
}

19 P2P Communication. MPI P2P operations involve message passing between two different MPI processes. The sender calls MPI_Send and the receiver calls MPI_Recv. The code should look like:

if (rank == sender || rank == receiver) {
    if (rank == sender)
        MPI_Send(...);
    else if (rank == receiver)
        MPI_Recv(...);
}
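
A minimal sketch of this pattern with the blanks filled in; the ranks, tag and integer payload are made-up values for illustration (run with at least two processes):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, value;
    int sender = 0, receiver = 1, tag = 0;    /* illustrative choices */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == sender) {
        value = 123;                          /* made-up payload */
        MPI_Send(&value, 1, MPI_INT, receiver, tag, MPI_COMM_WORLD);
    } else if (rank == receiver) {
        MPI_Recv(&value, 1, MPI_INT, sender, tag, MPI_COMM_WORLD, &status);
        printf("Process %d received %d from process %d\n", rank, value, sender);
    }

    MPI_Finalize();
    return 0;
}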

20 Types of P2P Communication. Different types of send and receive routines exist for different purposes:
- blocking send / blocking receive;
- synchronous send;
- non-blocking send / non-blocking receive;
- buffered send;
- combined send/receive;
- "ready" send.
Blocking: the call returns only after the operation has completed locally and the buffer can safely be reused. Synchronous: blocking plus a handshake with the receiver. Non-blocking: the call returns immediately but must be backed up by an MPI wait or test - make the non-blocking call, do some other computation, then wait for or test the call's completion.

21 Blocking

22 Non-Blocking

23 Synchronous

24 Envelope Details. Each P2P operation carries envelope details.
Blocking send: MPI_Send(buffer, count, type, dest, tag, comm)
Blocking receive: MPI_Recv(buffer, count, type, source, tag, comm, status)
Non-blocking send: MPI_Isend(buffer, count, type, dest, tag, comm, request)
Non-blocking receive: MPI_Irecv(buffer, count, type, source, tag, comm, request)
buffer - the address of the message
count - the number of elements to send or receive
type - the MPI datatype of the elements
dest - the rank of the destination process
source - the rank of the source process; wild card MPI_ANY_SOURCE
tag - the message tag/id; wild card MPI_ANY_TAG
comm - the communicator
status - general information about the received message
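
A small sketch, not in the original slides, showing the wild cards and the status fields in use: every non-root process sends one integer to rank 0, which accepts any source and any tag and then inspects the envelope with MPI_Get_count, status.MPI_SOURCE and status.MPI_TAG:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size, data, count, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != 0) {
        data = rank * 10;                      /* illustrative payload */
        MPI_Send(&data, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
    } else {
        for (i = 1; i < size; i++) {
            /* accept any sender and any tag, then inspect the envelope */
            MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);
            printf("root got %d (%d element(s)) from rank %d, tag %d\n",
                   data, count, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}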

25 MPI Data Types
MPI_CHAR            signed char
MPI_SHORT           signed short int
MPI_INT             signed int
MPI_LONG            signed long int
MPI_UNSIGNED_CHAR   unsigned char
MPI_UNSIGNED_SHORT  unsigned short int
MPI_UNSIGNED        unsigned int
MPI_UNSIGNED_LONG   unsigned long int
MPI_FLOAT           float
MPI_DOUBLE          double
MPI_LONG_DOUBLE     long double
MPI_LOGICAL         logical

26 Basic Blocking Operations
MPI_Send - basic send routine; returns only after the application buffer in the sending task is free for reuse. MPI_Send(&buf, count, datatype, dest, tag, comm)
MPI_Recv - receive a message and block until the requested data is available. MPI_Recv(&buf, count, datatype, source, tag, comm, &status)
MPI_Ssend - synchronous blocking send. MPI_Ssend(&buf, count, datatype, dest, tag, comm)
MPI_Bsend - buffered blocking send. MPI_Bsend(&buf, count, datatype, dest, tag, comm)
MPI_Rsend - blocking ready send. MPI_Rsend(&buf, count, datatype, dest, tag, comm)
All of these send variants are matched on the receiving side by MPI_Recv.
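
MPI_Bsend additionally requires the user to attach a buffer with MPI_Buffer_attach before sending. A rough sketch, assuming a single small integer message between ranks 0 and 1 (run with at least two processes):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, value = 7;                               /* illustrative payload */
    int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);    /* room for one int message */
    void *buffer = malloc(bufsize);
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Buffer_attach(buffer, bufsize);            /* hand MPI the send buffer */
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Buffer_detach(&buffer, &bufsize);          /* waits until the buffered message has left */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    free(buffer);
    MPI_Finalize();
    return 0;
}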

27 Basic Non-Blocking Operations
MPI_Isend - immediate send; must be followed by MPI_Wait or MPI_Test. MPI_Isend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Irecv - immediate receive. MPI_Irecv(&buf, count, datatype, source, tag, comm, &request)
MPI_Issend - immediate synchronous send; MPI_Wait() or MPI_Test() indicates when the destination process has received the message. MPI_Issend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Ibsend - non-blocking buffered send. MPI_Ibsend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Irsend - non-blocking ready send. MPI_Irsend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Test - checks the status of a specified non-blocking send or receive operation.
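
A sketch, not from the slides, of the usual non-blocking pattern: post MPI_Isend/MPI_Irecv, do independent computation, then complete both requests with MPI_Waitall. The payloads and the placeholder loop are illustrative; only ranks 0 and 1 take part:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size, other, sendval, recvval, i;
    double local = 0.0;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size >= 2 && rank < 2) {            /* only ranks 0 and 1 exchange */
        other   = 1 - rank;
        sendval = rank + 100;               /* illustrative payload */

        /* post the communication, then do independent work while it proceeds */
        MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

        for (i = 0; i < 100000; i++)        /* placeholder computation */
            local += (double)i;

        MPI_Waitall(2, reqs, stats);        /* only now is it safe to reuse sendval
                                               and to read recvval */
        printf("rank %d received %d (local work = %f)\n", rank, recvval, local);
    }

    MPI_Finalize();
    return 0;
}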

28 Simple Ping-Pong Example

29 Simple Ping-Pong Example

30 Simple Ping-Pong Example. The ping-pong computation works with the following elements:
- Only two processors are involved; the rest are idle.
- Processor 0 does:
  1. Prepare the message.
  2. Send the message to Processor 1.
  3. Receive the message from Processor 1.
- Processor 1 does:
  1. Prepare the message.
  2. Receive the message from Processor 0.
  3. Send the message to Processor 0.

31
// MPI program to ping-pong between Processor 0 and Processor 1
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg, outmsg;
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        dest = source = 1; outmsg = 'x';
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    }
    else if (rank == 1) {
        dest = source = 0; outmsg = 'y';
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d \n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);

    MPI_Finalize();
    return 0;
}

32
pingpong]$ make
/usr/local/mpich/bin/mpicc -c pingpong.c
/usr/local/mpich/bin/mpicc -o pingpong pingpong.o -lm
pingpong]$ mpirun -np 2 pingpong
Task 0 received the char x
Task 0: Received 1 char(s) from task 1 with tag 1
Task 1 received the char y
Task 1: Received 1 char(s) from task 0 with tag 1

33 All-to-Root as P2P Communication

34 All-to-Root as P2P Communication. The all-to-root computation involves:
- Each processor rank sends its message to the root.
- Processor 0 then: for size times, receives a message from processor source.
What is the overall execution time?

35
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int rank;                  /* rank of process */
    int size;                  /* number of processes */
    int source, dest;
    int tag = 50;
    char message[100];         /* storage for the message */
    MPI_Status status;         /* return status for receive */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sprintf(message, "Greetings from process %d!", rank);
    dest = 0;
    /* use strlen(message)+1 to include '\0' */
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

    if (rank == 0) {
        for (source = 0; source < size; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();
    return 0;
}

36
fireP0]$ make
/usr/local/mpich/bin/mpicc -c fire.c
/usr/local/mpich/bin/mpicc -o fire fire.o -lm
fireP0]$ mpirun -np 5 fire
Here it is Process 0
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!

37 Ring Communication. How can every processor come to know all the values? How many values does each processor have to receive? How can this be achieved? A circular process: each processor repeats, size times:
- send the value to the right;
- receive a value from the left;
- store or process the received value.

38 Ring Communication (diagram: six processors in a ring, initially holding the values a, b, c, d, e, f)

39 Ring Communication (diagram: after one shift each processor also holds the value received from its left neighbour: a,f  b,a  c,b  d,c  e,d  f,e)

40
#include <stdio.h>
#include "mpi.h"

#define tag 100

int main (int argc, char *argv[])
{
    int rank, size;
    int right, left;
    int ibuff, obuff, sum, i;
    MPI_Status recv_status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = rank + 1;
    if (right == size) right = 0;
    left = rank - 1;
    if (left == -1) left = size - 1;

    sum = 0;
    obuff = rank;
    for (i = 0; i < size; i++) {
        MPI_Send(&obuff, 1, MPI_INT, right, tag, MPI_COMM_WORLD);
        MPI_Recv(&ibuff, 1, MPI_INT, left, tag, MPI_COMM_WORLD, &recv_status);
        // storebuff[(rank-i)%n] = obuff;
        sum = sum + ibuff;
        obuff = ibuff;
    }

    printf("\t Processor %d: \t Sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}

41
ring]$ make
/usr/local/mpich/bin/mpicc -c ring.c
/usr/local/mpich/bin/mpicc -o ring ring.o -lm
ring]$ mpirun -np 5 ring
Processor 0: Sum = 10
Processor 1: Sum = 10
Processor 3: Sum = 10
Processor 4: Sum = 10
Processor 2: Sum = 10
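
The send-then-receive ordering in the loop on slide 40 relies on MPI buffering the small messages; for large messages the blocking sends could deadlock. The combined send/receive mode mentioned on slide 20 avoids this. Below is a sketch, not from the original slides, of the same ring written with MPI_Sendrecv; it computes the same sums as the version above:

#include <stdio.h>
#include "mpi.h"

#define TAG 100

int main(int argc, char *argv[])
{
    int rank, size, right, left, i;
    int ibuff, obuff, sum = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    obuff = rank;
    for (i = 0; i < size; i++) {
        /* send to the right and receive from the left in a single call,
           so correctness does not depend on MPI buffering the sends */
        MPI_Sendrecv(&obuff, 1, MPI_INT, right, TAG,
                     &ibuff, 1, MPI_INT, left,  TAG,
                     MPI_COMM_WORLD, &status);
        sum += ibuff;
        obuff = ibuff;
    }

    printf("\t Processor %d: \t Sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}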

42 References: 1. LLNL MPI Tutorial – Sections on P2P communication. 2. Wilkinson Book – Sections on P2P Communication.