
1 9-2.1 “Grid-enabling” applications Part 2 Using Multiple Grid Computers to Solve a Single Problem MPI © 2010 B. Wilkinson/Clayton Ferner. Spring 2010 Grid computing course. slides9-2.ppt Modification date: Feb 26, 2010

2 9-2.2 Using Parallel Programming Techniques Grid computing offers the possibility of using multiple computers on a single problem to decrease execution time. The potential of using multiple computers collectively has been well known (at least since the 1950s) and is obvious. Suppose a problem is divided into p equal parts, with each part executing on a separate computer at the same time. The overall time to execute the problem using p computers simultaneously would then be 1/p of the time on a single computer. This is the ideal situation - in practice, problems cannot usually be divided into equal parts that can be executed simultaneously.
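The ideal case can be stated as the usual speedup formula (added here for reference, using standard definitions rather than anything on the slide): with t_s the execution time on one computer and t_p the time on p computers,

S(p) = \frac{t_s}{t_p}, \qquad t_p \ge \frac{t_s}{p} \;\Rightarrow\; S(p) \le p

so p is an upper bound on the achievable speedup.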

3 9-2.3 Message Passing Programming Computers on a Grid are connected through the Internet or a high-performance distributed network. This suggests a programming model similar to that often used in cluster computing. In cluster computing, computers are connected through a local network with the computers physically nearby. For such a structure, a convenient approach is to have processes on different computers communicate through message passing. Message passing is generally done using library routines. The programmer inserts message-passing routines into their code.

4 9-2.4 Message passing concept using library routines Fig 9.9

5 2.5 Multiple Program, Multiple Data (MPMD) model Figure: a separate source file for each processor, each compiled to suit its processor; a different program is executed by each processor (Processor 0 ... Processor p - 1).

6 2.6 Single Program Multiple Data (SPMD) model Figure: one source file compiled into executables to suit each processor (Processor 0 ... Processor p - 1); the same program is executed by each processor. Control statements select different parts for each processor to execute.

7 2.7 Message Passing Routines for Clusters Late 1980s - Parallel Virtual Machine (PVM) developed. Became very popular. Mid 1990s - Message-Passing Interface (MPI) standard defined. Based upon the message-passing parallel programming model. Both provide a set of user-level libraries for message passing. Used with sequential programming languages (C, C++,...).

8 2.8 MPI (Message Passing Interface) Message passing library standard developed by group of academics and industrial partners to foster more widespread use and portability. Defines routines, not implementation. Several free implementations exist.

9 9-2.9 MPI version 1 finalized in 1994. Version 2 of MPI introduced in 1997 with a greatly expanded scope. MPI version 1 has about 126 routines; MPI version 2 has about 156 routines. The MPI-2 standard can seem daunting. However, in most applications only a few MPI routines are actually needed. It has been suggested that many MPI programs can be written using only about six different routines.

10 9-2.10 Basic MPI Programming Model Single program multiple data (SPMD) model. Each process executes the same code. The user writes a single program, compiled to suit the processors, and all processes are started together.

11 9-2.11 MPI communicator MPI introduces the concept of a communication domain or context it calls a communicator (a non-obvious name). It defines the scope of a communication, and initially all processes are enrolled in a communicator called MPI_COMM_WORLD. The communicator appears in all MPI message-passing routines. It enables one to constrain message passing to be between agreed processes. It also enables library routines written by others to be incorporated into a program without conflicts, by assigning different communicators in the library routines. Many MPI programs, however, do not need additional communicators.

12 In MPI, processes within a communicator are given a number called a rank, starting from zero onwards. The program uses control constructs, typically IF statements, to direct processes to perform specific actions. Example:
if (rank == 0) ... /* rank 0 process does this */
if (rank == 1) ... /* rank 1 process does this */
2.12

13 Master-Slave approach Usually the computation is constructed as a master-slave model. Example:
if (rank == 0) ... /* master does this */
else ... /* slaves do this */
2.13

14 Static process creation All executables started together. Done when one starts the compiled programs. Normal MPI way. 2.14

15 2.15 Multiple Program Multiple Data (MPMD) Model with Dynamic Process Creation Figure: Process 1 calls spawn(), which starts execution of Process 2 at that point in time. One processor executes the master process. Other processes are started from within the master process. Available in MPI-2. Might find applicability if one does not know initially how many processes are needed. Does have a process-creation overhead.
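In MPI-2 the spawn operation is MPI_Comm_spawn(). A minimal sketch of a master spawning workers is given below; the executable name "worker" and the count of 4 are illustrative, not taken from the slides.

#include "mpi.h"

int main(int argc, char *argv[]) {
   MPI_Comm workers;   /* intercommunicator connecting master to spawned processes */
   MPI_Init(&argc, &argv);
   /* Start 4 copies of the executable "worker" (name is illustrative) */
   MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                  0, MPI_COMM_WORLD, &workers, MPI_ERRCODES_IGNORE);
   /* ... communicate with the workers through the intercommunicator "workers" ... */
   MPI_Finalize();
   return 0;
}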

16 9-2.16 Sample MPI "Hello world" program (Fig 9.10). Will look at the routines in more detail later.

#include <stdio.h>
#include <string.h>
#include "mpi.h"

main(int argc, char **argv) {
   char message[20];
   int i, rank, size, type = 99;
   MPI_Status status;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   if (rank == 0) {
      strcpy(message, "Hello, world");
      for (i = 1; i < size; i++)
         MPI_Send(message, 13, MPI_CHAR, i, type, MPI_COMM_WORLD);
   } else
      MPI_Recv(message, 20, MPI_CHAR, 0, type, MPI_COMM_WORLD, &status);
   printf("Message from process =%d : %.13s\n", rank, message);
   MPI_Finalize();
}

17 9-2.17 The message "Hello, world" is sent from the master process (rank = 0) to the other processes (rank != 0). Then, all processes execute a printf statement. In MPI, standard output is automatically redirected from remote computers to the user's console. Final result:
Message from process =1 : Hello, world
Message from process =0 : Hello, world
Message from process =2 : Hello, world
Message from process =3 : Hello, world
... except the order of the messages might be different - it will depend upon how the processes are scheduled.

18 9-2.18 Very convenient model for many applications in which number of processes needed is fixed. Typically, number of processes set to be same as number of available processor cores as generally less efficient to run multiple processes on a single processor core. Possible to start processes dynamically from within a process in MPI version 2. Single program multiple data model (SPMD)

19 9-2.19 Each process will have a copy of variables named in source program. Can cause some confusion for the programmer. For example, suppose source program has a variable i declared. Variable i will appear in each process, although each is distinct and holds a value determined by the local process. Single program multiple data model (SPMD)
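As a small illustration (added here, not on the slide), each process sets its own copy of i to a different value:

int rank, i;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
i = 100 * rank;                            /* each process has its own copy of i */
printf("Process %d: i = %d\n", rank, i);   /* a different value printed by each process */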

20 9-2.20 Setting up Message Passing Environment Usually computers to be used specified in a file, called a hostfile or machines file. An example of a machines file is coit-grid01.uncc.edu coit-grid02.uncc.edu coit-grid03.uncc.edu coit-grid04.uncc.edu coit-grid05.uncc.edu If a machines file not specified, default machines file used or it may be that program will only run on a single computer.

21 9-2.21 Message routing between computers is typically done by daemon processes installed on the computers that form the "virtual machine". Figure: on each workstation, the application program (executable) communicates through a daemon process; messages are sent through the network. Can have more than one process running on each computer - depends upon the MPI implementation. MPI daemon processes must be running before the user submits programs.

22 9-2.22 Process Creation and Execution - Compiling
Using the SPMD model, a single program is written and compiled to include the MPI libraries, typically using an MPI script mpicc, i.e.,
mpicc myProg.c -o myProg
mpicc typically uses the cc compiler, and the cc options are available.

23 9-2.23 Executing the program
Once compiled, the program can be executed. MPI-1 does not specify implementation details at all. A common command to execute a program is mpirun, where the -np option specifies the number of processes. Example:
mpirun -np 4 myProg
Command line arguments for myProg can be added after myProg and can be accessed by the program.

24 2.24 Executing MPI program on multiple computers Create a file called say “machines” containing the list of machines, say: coit-grid01.uncc.edu coit-grid02.uncc.edu coit-grid03.uncc.edu coit-grid04.uncc.edu coit-grid05.uncc.edu

25 2.25 mpirun -machinefile machines -n 4 prog
would run prog with four processes. Each process would execute on one of the machines in the list. MPI would cycle through the list of machines, giving processes to machines. (Can also specify the number of processes on a particular machine by adding that number after the machine name.)
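The exact syntax for attaching a process count to a machine name depends on the MPI implementation; as an assumption, the MPICH-style machinefile form with a colon would look like:

coit-grid01.uncc.edu:2
coit-grid02.uncc.edu:2
coit-grid03.uncc.edu:1

which would allow up to two processes on each of the first two machines and one on the third.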

26 Initialization MPI_Init() Before any MPI routine can be called from within the program, the program must call MPI_Init() once (and only once) to initialize MPI environment. MPI_Init() supplied with command line arguments that the implementation can use. 9-2.26

27 Termination MPI_Finalize() The program must call MPI_Finalize() to terminate the MPI environment. One typically does not place anything after MPI_Finalize(), not even a return, and certainly no MPI routine can be called after it. 9-2.27

28 Therefore, every MPI program has the structure:

main (int argc, char *argv[]) {
   MPI_Init(&argc, &argv);      /* initialize MPI */
   ...
   MPI_Finalize();              /* terminate MPI */
}
9-2.28

29 Finding process rank MPI_Comm_rank() In most MPI programs, necessary to find the rank of a process within a communicator. Can be found from routine MPI_Comm_rank(). MPI_Comm_rank(MPI_COMM_WORLD,&myid); 9-2.29

30 Finding number of processes MPI_Comm_size() Often, also necessary to know how many processes there are in a communicator. Can be found from the routine MPI_Comm_size(). MPI_Comm_size(MPI_COMM_WORLD,&numprocs); 9-2.30

31 A simple program using these routines is:

#include <stdio.h>
#include "mpi.h"

main (int argc, char *argv[]) {
   int myid, numprocs;
   MPI_Init(&argc, &argv);                     /* initialize MPI */
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);       /* rank */
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);   /* number of processes */
   printf("I am %d of %d\n", myid+1, numprocs);
   MPI_Finalize();                             /* terminate MPI */
}
9-2.31

32 All output is redirected to the console, so the output at the console will be:
I am 3 of 4
I am 1 of 4
I am 4 of 4
I am 2 of 4
but in any order, as the order of execution on different computers is indeterminate. 9-2.32

33 Note in the previous slide, each process executes the same code. Typically though a master-slave approach is used. Then one might have:

main (int argc, char *argv[]) {
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* rank */
   if (myrank == 0)
      master();
   else
      slave();
   MPI_Finalize();
}

where master() and slave() are executed by the master process and slave processes, respectively. 9-2.33

34 MPI Message-Passing Routines A brief overview 9-2.34

35 9-2.35 Data transfer: MPI point-to-point message passing using MPI_Send() and MPI_Recv() library calls. Fig 9.11

36 9-2.36 Parameters of (blocking) send:
MPI_Send(buf, count, datatype, dest, tag, comm)
buf - address of send buffer
count - number of items to send
datatype - datatype of each item
dest - rank of destination process
tag - message tag
comm - communicator

37 9-2.37 Parameters of (blocking) receive:
MPI_Recv(buf, count, datatype, src, tag, comm, status)
buf - address of receive buffer
count - maximum number of items to receive
datatype - datatype of each item
src - rank of source process
tag - message tag
comm - communicator
status - status after operation

38 9-2.38 Example To send an integer x from process 0 to process 1:

MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
   int x;
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   int x;
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}

39 Message Tags Used to differentiate between messages being received at different times for different purposes. A tag is an integer that is carried with the message. Only messages with the tag specified in MPI_Recv() are accepted by the receive routine. Wild card for tag: MPI_ANY_TAG - the receive will match any message from the specified process in the communicator. Wild card for source: MPI_ANY_SOURCE - messages from any process in the communicator are accepted. 9-2.39
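A short sketch of using the wildcards (added for illustration; the variables here are not from the slides) - the actual source and tag of the received message can afterwards be read from the status argument:

int data;
MPI_Status status;
MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
printf("Received %d from process %d with tag %d\n",
       data, status.MPI_SOURCE, status.MPI_TAG);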

40 Semantics of MPI_Send() and MPI_Recv() If a process waits until message transmission is complete before moving on, this is called blocking. MPI_Send() - Non-blocking - the message may not have reached its destination, but the process can continue in the knowledge that the message is safely on its way. MPI_Recv() - Blocking - returns when the message has been received and the data collected. Will cause the process to stall until the message is received. 2.40

41 Semantics of MPI_Send() and MPI_Recv() Data is buffered, meaning that after returning, any local variables used can be altered without affecting message transfer. Other versions of MPI_Send() and MPI_Recv() have different semantics, i.e. blocking versus non-blocking, buffered versus non-buffered. 2.41

42 MPI broadcast operation MPI_Bcast() 9-2.42 Fig 9.12

43 MPI_Bcast parameters 2a.43
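For reference (added here; the parameter details are in the figure), the call has the form

MPI_Bcast(buf, count, datatype, root, comm)

where buf is the address of the send/receive buffer, count the number of items, datatype the datatype of each item, root the rank of the root (broadcasting) process, and comm the communicator. For example, every process calls the same routine and process 0 supplies the value of n:

int n;
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* called by all processes; root is rank 0 */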

44 Basic MPI scatter operation MPI_Scatter() 9-2.44 Fig 9.13

45 2a.45 MPI scatter parameters scount refers to the number of items sent to each process. Normally, scount is the same as rcount, and stype the same as rtype. Somewhat strangely, the send and receive types and counts are specified separately and could be different.
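For reference (added here; the full parameter table is in the figure), the call has the form

MPI_Scatter(sendbuf, scount, stype, recvbuf, rcount, rtype, root, comm)

Only the root's sendbuf is significant; every process, including the root, receives rcount items into its own recvbuf. The example on the next slide shows a typical call.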

46 Sending a group of contiguous elements to each process In the following, 100 contiguous elements are sent to each process:

main (int argc, char *argv[]) {
   int size, *sendbuf, recvbuf[100];        /* for each process */
   MPI_Init(&argc, &argv);                  /* initialize MPI */
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   sendbuf = (int *)malloc(size*100*sizeof(int));
   MPI_Scatter(sendbuf, 100, MPI_INT, recvbuf, 100, MPI_INT, 0, MPI_COMM_WORLD);
   MPI_Finalize();                          /* terminate MPI */
}
9-2.46

47 Scattering contiguous groups of elements to each process 9-2.47 Fig 9.14

48 Gather operation MPI_Gather() 9-2.48 Fig 9.15

49 Gather parameters 2a.49

50 2a.50 Gather Example To gather items from a group of processes into process 0, using dynamically allocated memory in the root process:

int data[10];                             /* data to be gathered from processes */
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
   MPI_Comm_size(MPI_COMM_WORLD, &grp_size);        /* find group size */
   buf = (int *)malloc(grp_size*10*sizeof(int));    /* allocate memory */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);
…
MPI_Gather() gathers from all processes, including the root; the receive count (10 here) is the number of items received from each process, not the total.

51 Reduce operation MPI_Reduce() with addition 9-2.51 Fig 9.16

52 Reduce parameters 2a.52
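For reference (added here; the parameter table is in the figure), the call has the form MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm), where op is a reduction operation such as MPI_SUM, MPI_MAX, or MPI_MIN. A small sketch (variable names are illustrative) summing one integer from every process into process 0:

int x = myrank;         /* value contributed by this process (illustrative) */
int result;             /* significant only on the root */
MPI_Reduce(&x, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);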

53 Implementation of collective operations MPI_Bcast(), MPI_Scatter(), MPI_Gather(), and MPI_Reduce() Not defined in the MPI standard, except that their effects are equivalent to using a multitude of point-to-point sends and receives. Same semantics as point-to-point blocking MPI_Send()/MPI_Recv() communications - all processes return when locally complete and do not wait for each other. Collective routines would be expected to be much more efficient than the equivalent MPI_Send()/MPI_Recv() sequences. 9-2.53

54 Synchronization MPI_Barrier() A related purpose for message passing is to synchronize processes so that communication continues from a specific point. MPI offers a specific synchronization operation, MPI_Barrier(), which prevents any process that calls the routine from returning until all processes have called it. 9-2.54 Fig 9.17
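A common use (an illustrative sketch, not from the slide) is to line processes up before timing a section of code with MPI_Wtime():

double start, elapsed;
MPI_Barrier(MPI_COMM_WORLD);          /* no process proceeds until all have arrived */
start = MPI_Wtime();
/* ... timed computation and communication ... */
MPI_Barrier(MPI_COMM_WORLD);
elapsed = MPI_Wtime() - start;        /* elapsed wall-clock time in seconds */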

55 Synchronous message-passing routines MPI_Ssend() 9-2.55 MPI_Ssend() does not continue until the message has been received (blocking). Can be used with MPI_Recv() to effect a synchronous point-to-point message transfer. Note, it is much better to avoid synchronization if possible, as it implies that processes are waiting for each other.

56 Grid-Enabled MPI 9-2.56 MPI uses networked computers with IP addresses so it would seem reasonable to extend MPI to include geographically distributed networked computers. There have been several projects to implement MPI across geographically distributed computers connected on the Internet. In fact, MPI alone could be used.

57 Message passing across geographically distributed computers 9-2.57 Fig 9.18

58 MPICH abstract device interface 9-2.58 Fig 9.19 MPICH-1 separates implementation of MPI routines from implementation of communication methods.

59 In MPICH, different implementations of the abstract device interface are provided for different systems, e.g. for workstation clusters, for shared-memory systems. MPICH-G: MPICH modified to operate across a Grid platform. Includes an abstract device for Globus version 2. Perhaps the first attempt at porting MPI to a Grid and one of the first projects that directly used the Globus infrastructure; it indeed proved the utility of the Globus toolkit. 9-2.59

60 9-2.60 Comparing MPICH and MPICH-G (a) Using regular MPICH to run a remote job. Fig 9.20 (a)

61 9-2.61 Comparing MPICH and MPICH-G (b) Using MPICH-G to run a remote job Fig 9.20 (b)

62 9-2.62 Apart from the early MPICH-G/MPICH-G2 project, other projects have provided tools to run MPI programs across a geographically distributed Grid. They do not necessarily use Globus software. Examples:
PACX-MPI (Parallel Computer Extension - MPI) - can encrypt messages using SSL.
GridMPI™ - an open-source project. Avoided a security layer altogether for performance. The project investigators argue that HPC Grid sites often use their own networks, so security is not an issue. They found that MPI programs scale up to round-trip communication latencies of about 20 ms - about 500 miles for 1-10 Gbits/sec networks, the distance between major sites in Japan (where the project was developed) - and within that range MPI programming is feasible.

63 9-2.63 Grid Enabling MPI Programs In both GridMPI and PACX-MPI, the idea is not to have to alter MPI programs at all, and to simply run MPI programs that would run on a local cluster. That does not preclude modifying the programs to take into account the geographically distributed nature of the compute resources.

64 9-2.64 Grid Enabling MPI Programs The most notable aspects of running MPI programs on a Grid are:
- Long and indeterminate communication latency
- Difficulty and effect of global synchronization
- Effect of failures (fault tolerance)
- Heterogeneous computers (usually)
- Need for communicating processes to be running simultaneously

65 9-2.65 Mostly, the focus is on ameliorating the effects of latency. One basic concept - overlapping communications with computations while waiting for a communication to complete - comes directly from regular parallel programming.

66 9-2.66 Overlapping communications with computations (send before receive) Fig 9.21
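One way to obtain this overlap (a sketch using the standard non-blocking routines; buffer and function names are illustrative and these calls are not given on the slide) is to post the communication early, compute, and wait only when the data is needed:

MPI_Request req;
MPI_Status status;
MPI_Irecv(recvbuf, count, MPI_INT, src, tag, MPI_COMM_WORLD, &req);  /* post receive early */
do_local_work();                      /* computation that does not need recvbuf */
MPI_Wait(&req, &status);              /* block only when the received data is needed */
/* recvbuf can now be used */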

67 9-2.67 Quiz (Multiple choice)

68 9-2.68 What is MPI?
(a) A standard for user-level message passing library routines
(b) An implementation of message passing routines
(c) A portal interface for message passing Grid applications
(d) A portlet interface for message passing Grid applications
SAQ 9-4

69 9-2.69 What is an MPI communicator?
(a) A talkative MPI program
(b) A process that sends messages
(c) An MPI compiler
(d) A way of defining a set of processes that can communicate between themselves
SAQ 9-5

70 9-2.70 What type of routine is the MPI_Recv() routine?
(a) Synchronous
(b) Blocking
(c) Non-blocking
(d) Asynchronous
SAQ 9-7

71 Questions 9-2.71

