Parallel Processing - MPI


1 Parallel Processing - MPI
MPI Programming
Introduction to MPI: Basic Functions
MPI applications using Point-to-Point communication
Collective Communications
Common MPI algorithms

2 Parallel Processing - MPI
What is MPI? The Message Passing Interface (MPI) is a standard implementation of the message-passing model of parallel computing: a library of functions and macros that support parallel execution using message passing. MPI was developed as a standard for parallel programming by the MPI Forum, a group of researchers from industry and academia, in 1994. MPI supports message-passing programming in Fortran, C and C++. The MPI Forum revised the original standard (MPI-1) and announced the MPI-2 standard in 1997. MPI is implemented on a wide variety of platforms, ranging from shared-memory multiprocessors to networks of workstations and even single-processor machines.

3 Parallel Processing - MPI
How does MPI work? A parallel program consists of a number of processes. Each process works on some data in its local memory and uses purely local variables; there is no mechanism for a process to directly access the memory of another process. Sharing of data between processes takes place by exchanging messages. Processes do not necessarily run on different processors. MPI supports the Single Program Multiple Data (SPMD) model: all processing nodes execute the same program on different sets of data, but they do not necessarily execute the same instructions of the program.

4 The MPI basic functions:
MPI has over 120 functions, but most MPI programs can be written using only six fundamental functions. Four of these are used to initialize, manage and terminate MPI.
MPI_Init(&argc, &argv): This function initiates the MPI computation. The two arguments argc and argv are pointers to the main program's arguments.
MPI_Finalize(): This function marks the end of the MPI computation in a program. It cleans up any unfinished MPI operations, such as pending receives that were never completed. No MPI functions are allowed after this call.
MPI_Comm_rank(comm, p_id): This function returns in p_id the rank of the calling process within the group of communicator comm. A communicator is a collection of processes that can send messages to each other. For basic programs, the only communicator needed is MPI_COMM_WORLD, which is predefined in MPI.
MPI_Comm_size(comm, size): This function returns in size the number of processes in the communicator comm.

5 The MPI_Send and MPI_Recv Functions:
The other two functions provide point-to-point communication. The MPI_Send() and MPI_Recv() pair of functions passes a message from one process to another. Their format is:
MPI_Send(s_msg, s_count, datatype, dest, tag, comm)
MPI_Recv(r_msg, r_count, datatype, srce, tag, comm, status)
This pair of functions passes the message pointed to by s_msg from the process with rank srce to the buffer pointed to by r_msg of the process with rank dest. The message contains a number of items equal to s_count, of the type specified by datatype. The size of the receive buffer r_msg, specified by r_count, can be greater than the actual size of the message. The tag is used to distinguish between multiple messages passed between the same pair of processes.
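As a minimal sketch of how this pair is used (the value 99 and the tag 10 are made up for the example), the following program passes one integer from process 0 to process 1:
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, data = 0, tag = 10;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        data = 99;                                             /* Value to pass to process 1 */
        MPI_Send(&data, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("Process 1 received %d\n", data);
    }
    MPI_Finalize();
}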

6 A simple “Hello World” MPI Program:
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);      /* Initialize MPI */
    printf("Hello World\n");
    MPI_Finalize();              /* Terminate MPI */
}
The MPI header file (mpi.h) contains definitions and function prototypes that are imported via the include statement. The MPI_Init() and MPI_Finalize() functions return an error code indicating whether the call was successful. Each process executes a copy of the entire code. Thus, when run on two processing nodes, the output of this program will be:
Hello World
Hello World
(provided that every process, not only the root process, has access to the display; see slide 9)

7 Another “Hello World” MPI Program:
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, size;
    MPI_Init(&argc, &argv);                        /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);        /* Get my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* Get # of processes */
    printf("Processor %d of %d: Hello World\n", myrank, size);
    MPI_Finalize();                                /* Terminate MPI */
}
Each process executes a copy of the entire code. Thus, when run on four processors, the output of this program will be (in some order):
Processor 2 of 4: Hello World
Processor 1 of 4: Hello World
Processor 3 of 4: Hello World
Processor 0 of 4: Hello World
(again, provided that every process, not only the root process, has access to the display)

8 A Final “Hello World” MPI Program:
#include <stdio.h>
#include <string.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, size, source, dest;
    int tag = 50;                                  /* Tag for messages */
    char msg[100];                                 /* Storage for the message */
    MPI_Status status;                             /* Return status for receive */
    MPI_Init(&argc, &argv);                        /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);        /* Get my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* Get # of processes */
    if (myrank != 0) {
        sprintf(msg, "Greetings from Processor %d", myrank);
        dest = 0;
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {                                       /* My rank = 0 */
        for (source = 1; source < size; source++) {
            MPI_Recv(msg, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", msg);
        }
    }
    MPI_Finalize();
}

9 A Final “Hello World” MPI Program:
The first two programs will display the specified messages only if all processes can produce output. This is not the case in most MPI systems: in many systems only the root processing node has a keyboard and a display. In the last program, all processes send a message to the root process (process 0). The root process receives the messages and displays them locally. The display of the root node will look as shown below.
Greetings from Processor 1
Greetings from Processor 2
Greetings from Processor 3

10 A Summation MPI Program:
#include <stdio.h>
#include <mpi.h>
#define N 4                                        /* Size of the data array */
int arr[N] = {21, 32, 25, 41};                     /* Every process has its own copy of the data */
void main(int argc, char *argv[])
{
    int myrank, size, i, sum = 0, temp;
    MPI_Status status;                             /* Return status for receive */
    MPI_Init(&argc, &argv);                        /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);        /* Get my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* Get # of processes */
..... cont. on next page

11 A Summation MPI Program: (Cont.)
    if (myrank == 0) {                             /* The following is executed only by process 0 */
        for (i = myrank; i < N; i = i + size)
            sum = sum + arr[i];                    /* Calculate partial sum for process 0 */
        for (i = 1; i < size; i++) {               /* Read partial sum from all other processes */
            MPI_Recv(&temp, 1, MPI_INT, i, i, MPI_COMM_WORLD, &status);
            sum = sum + temp;                      /* Add partial sum from other process to global sum */
        }
        printf("\n The sum is %d", sum);           /* Print the result */
    } else {                                       /* myrank != 0: executed by all processes except process 0 */
        for (i = myrank; i < N; i = i + size)
            sum = sum + arr[i];                    /* Calculate partial sum */
        MPI_Send(&sum, 1, MPI_INT, 0, myrank, MPI_COMM_WORLD);   /* ... and send it to process 0 */
    }
    MPI_Finalize();
}

12 A Summation MPI Program: (Cont.)
The previous program has two problems:
It assumes that each process has a copy of the array arr[].
It does not exploit spatial locality, i.e. it does not make efficient use of the cache, since each process strides through the array with a step equal to the number of processes.
A better approach is shown below:

13 A Summation MPI Program: (With Send/Recv data distribution)
In the program below the root process (process 0):
Reads the data from a file into the array arr[],
Distributes the array to the rest of the processes (each process i gets N/size elements, starting from element i*N/size),
Computes its local sum,
Collects the local sums from the other processes and computes the global sum.
The rest of the processes (processes 1 to size-1):
Receive their part of the array arr[] from the root process,
Compute their local sum,
Send the local sum to the root process.
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, size, sum = 0, i, temp, N = 10000;   /* N is the array size */
    int arr[N];                                      /* Data array (workers use only the first N/size slots) */
    MPI_Status status;                               /* Return status for receive */
    MPI_Init(&argc, &argv);                          /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);          /* Get my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);            /* Get # of processes */

14 A Summation MPI Program: (With Send/Recv data distribution)
    if (myrank == 0) {                             /* The following is executed only by process 0 */
        get_data(filename, arr);                   /* Copy data from the data file to arr[] (get_data and filename assumed defined elsewhere) */
        for (i = 1; i < size; i++)                 /* Distribute arr[] to all other processes */
            MPI_Send(arr + i*(N/size), N/size, MPI_INT, i, i, MPI_COMM_WORLD);
        for (i = 0; i < N/size; i++)
            sum = sum + arr[i];                    /* Calculate partial sum for process 0 */
        for (i = 1; i < size; i++) {               /* Read partial sum from all other processes */
            MPI_Recv(&temp, 1, MPI_INT, i, i, MPI_COMM_WORLD, &status);
            sum = sum + temp;                      /* Add partial sum from other process to global sum */
        }
        printf("\n The sum is %d", sum);           /* Print the result */
    } else {                                       /* myrank != 0: executed by all processes except process 0 */
        MPI_Recv(arr, N/size, MPI_INT, 0, myrank, MPI_COMM_WORLD, &status);
        for (i = 0; i < N/size; i++)
            sum = sum + arr[i];                    /* Calculate partial sum */
        MPI_Send(&sum, 1, MPI_INT, 0, myrank, MPI_COMM_WORLD);   /* ... and send it to process 0 */
    }
    MPI_Finalize();
}

15 Collective Communications:
In most MPI applications one process, usually process 0, must read the input data and then distribute it to the rest of the processes for further processing. At the end of the program, process 0 must collect the local result from each process and perform some operations to compute the final result. If we are allowed to use only simple send and receive functions, then during the data distribution and data collection phases of the program process 0 does most of the work, while the rest of the processes sit idle waiting for their turn to receive the input data or to send their local result to process 0. This kind of programming is inefficient and highly undesirable, since most of the processes are idle for a considerable amount of time.

16 Collective Communications:
Collective communication allows all processes to participate evenly in the distribution and collection of data. For example, consider a case where process 0 has to send some data to 7 other processes. With ordinary MPI_Send and MPI_Recv operations this task needs 7 communication cycles. By using a broadcast operation the same task can be completed in only 3 cycles, because in each cycle every process that already holds the data forwards it to one process that does not (1, then 2, then 4 processes send simultaneously). The implementation of collective communication operations can vary on different machines; for example, a broadcast operation on Ethernet can be completed in a single cycle using the network's hardware broadcast.
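As an illustration of this tree pattern, here is a minimal, hypothetical sketch of a broadcast built only from point-to-point calls (real MPI_Bcast implementations vary; the value 123 and the tag 0 are made up for the example):
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, size, step, data = 0;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (myrank == 0) data = 123;                   /* Value to be broadcast from the root */
    /* In cycle k, the 2^k processes that already hold the data each forward it
       to one process that does not, so 8 processes need only 3 cycles. */
    for (step = 1; step < size; step = step * 2) {
        if (myrank < step && myrank + step < size)
            MPI_Send(&data, 1, MPI_INT, myrank + step, 0, MPI_COMM_WORLD);
        else if (myrank >= step && myrank < 2 * step)
            MPI_Recv(&data, 1, MPI_INT, myrank - step, 0, MPI_COMM_WORLD, &status);
    }
    printf("Process %d now has the value %d\n", myrank, data);
    MPI_Finalize();
}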

17 The Broadcast operation:
MPI_Bcast(message, count, datatype, root, comm): Broadcasts the contents of message from the root process to all other processes. Every process in the communicator comm must call MPI_Bcast with the same root.

18 Summation using Broadcast:
The following program uses the broadcast operation to send a copy of the array to all processes and then find the sum of the array.
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, numofprocs, dvals[10000], low_index, high_index, x, i;
    int mysum = 0;
    int sum = 0;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numofprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0)
        get_data(filename, dvals);                 /* Copy data from the data file to dvals[] (get_data and filename assumed defined elsewhere) */
    MPI_Bcast(dvals, 10000, MPI_INT, 0, MPI_COMM_WORLD);   /* Send the whole array to all processes */
    x = 10000 / numofprocs;
    low_index = myrank * x;
    high_index = low_index + x;
    for (i = low_index; i < high_index; i++)
        mysum += dvals[i];                         /* Calculate partial sum */
    if (myrank != 0)
        MPI_Send(&mysum, 1, MPI_INT, 0, myrank, MPI_COMM_WORLD);   /* Send partial sum to root */
    else {
        sum = mysum;                               /* Start with the root's own partial sum */
        for (i = 1; i < numofprocs; i++) {
            MPI_Recv(&mysum, 1, MPI_INT, i, i, MPI_COMM_WORLD, &status);
            sum += mysum;                          /* Calculate global sum */
        }
        printf("The sum is %d.\n", sum);
    }
    MPI_Finalize();
}

19 The Scatter operation:
MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm): The root process splits the contents of sendbuf into segments consisting of sendcount items each and distributes them, in rank order, to all the processes (including itself). Each process stores the segment it receives in recvbuf.

20 Parallel Processing - MPI
The Gather operation: MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm): Each process sends the contents of sendbuf to the root process. The root process concatenates the received data in process rank order and stores it in recvbuf.

21 Summation using Scatter and Gather:
The following program uses the scatter operation to distribute the array to all processes and then the gather operation to collect the partial sums.
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, numofprocs, dvals[10000], x, i;
    int mysum = 0;
    int sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numofprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0)
        get_data(filename, dvals);                 /* Copy data from the data file to dvals[] (get_data and filename assumed defined elsewhere) */
    x = 10000 / numofprocs;
    int lvals[x];                                  /* Local slice of the array */
    int psums[numofprocs];                         /* Partial sums, used only at the root */
    MPI_Scatter(dvals, x, MPI_INT, lvals, x, MPI_INT, 0, MPI_COMM_WORLD);
    for (i = 0; i < x; i++)
        mysum += lvals[i];                         /* Calculate partial sum */
    MPI_Gather(&mysum, 1, MPI_INT, psums, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* Send partial sums to root */
    if (myrank == 0) {
        for (i = 0; i < numofprocs; i++)
            sum += psums[i];                       /* Calculate global sum */
        printf("The sum is %d.\n", sum);
    }
    MPI_Finalize();
}

22 Parallel Processing - MPI
The Reduce operation: MPI_Reduce(operand, result, count, datatype, operation, root, comm): The reduction function specified by the operation parameter is performed on the operand of each process, and the result is stored in the result buffer of the root process.

23 Other Reduction operations:
Reduction operations that combine reduction with data movement are also available, for example MPI_Allreduce (reduction followed by a broadcast of the result to all processes) and MPI_Reduce_scatter (reduction followed by a scatter of the result).
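For instance, with MPI_Allreduce every process, not just the root, ends up with the global sum. A minimal sketch (the partial-sum value here is made up for the example):
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, mysum, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    mysum = myrank + 1;                            /* Stand-in for a locally computed partial sum */
    /* The reduction and the distribution of its result happen in one collective call */
    MPI_Allreduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Process %d: the global sum is %d\n", myrank, sum);
    MPI_Finalize();
}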

24 Summation using Scatter and Reduce:
The following program uses the scatter operation and the reduction (sum) operation to find the sum of an array.
#include <stdio.h>
#include <mpi.h>
void main(int argc, char *argv[])
{
    int myrank, numofprocs;
    int dvals[10000], x, i;
    int mysum = 0;
    int sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numofprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0)
        get_data(filename, dvals);                 /* Copy data from the data file to dvals[] (get_data and filename assumed defined elsewhere) */
    x = 10000 / numofprocs;
    int lvals[x];                                  /* Local slice of the array */
    MPI_Scatter(dvals, x, MPI_INT, lvals, x, MPI_INT, 0, MPI_COMM_WORLD);
    for (i = 0; i < x; i++)
        mysum += lvals[i];                         /* Calculate partial sum */
    MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);   /* Combine partial sums at the root */
    if (myrank == 0)
        printf("The sum is %d.\n", sum);
    MPI_Finalize();
}

