Parallel Processing Javier Delgado

1 Parallel Processing Javier Delgado
Grid Enablement of Scientific Applications, Professor S. Masoud Sadjadi

2 Outline
Why parallel processing; Overview; The Message Passing Interface (MPI): Introduction, Basics, Examples; OpenMP; Alternatives to MPI

3 Why parallel processing?
Computationally intensive scientific applications: hurricane modelling, bioinformatics, high-energy physics. Physical limits of a single processor. There are many open problems in science that require massive computational power to solve, and many new areas, such as bioinformatics, have emerged recently. As discussed in class, there are physical limits to how fast a single processor can go; even if there were not, we are still many years away from a single processor powerful enough to solve these problems.

4 Types of Parallel Processing
Shared memory, e.g. a multiprocessor computer. Distributed memory, e.g. a compute cluster.

5 Shared Memory
Advantages: no explicit message passing; fast. Disadvantages: scalability; synchronization. Since all processors are in the same box, the programmer does not need to pass messages as in a distributed system; in many cases a simple pragma statement takes care of everything. Also, since all load balancing is done on one machine, it is fast. However, as more processors/cores are added, simultaneous access to memory can lead to bus saturation, and synchronization becomes a problem if multiple cores read and write the same area of memory.

6 Distributed Memory
Advantages: each processor has its own memory; usually more cost-effective. Disadvantages: more programmer involvement; slower.

7 Combination of Both
Emerging trend; the best and worst of both worlds. As processors themselves scale out instead of up, we end up with a combination of shared memory and distributed memory.

8 Outline
Why parallel processing; Overview; The Message Passing Interface (MPI): Introduction, Basics, Examples; OpenMP; Alternatives to MPI

9 Message Passing
Standard for distributed-memory systems; networked workstations can communicate. De facto specification: the Message Passing Interface (MPI). Free MPI implementations: MPICH, OpenMPI, LAM-MPI. “Specification” is highlighted because MPI is not really an implementation; it is a specification of what implementations should do. Several implementations are available today.

10 MPI Basics
Design virtues: defines communication, but not its hardware; expressive; performance. Concepts: no adding/removing of processors during computation; the same program runs on all processors; Single Program, Multiple Data (SPMD) or Multiple Instruction, Multiple Data (MIMD); processes are identified by “rank”. MPI specifies the communication directives that the system allows, but it does not tie them to any particular hardware implementation, whether you are using Ethernet, Myrinet, or even shared-memory systems (although by default this is disabled in most implementations, as far as I know). It is designed so that programs can be written with a minimal subset of the specified functions, while many powerful functions are provided for optimal performance and programming power. Since it is an open standard, a lot of thought went into its design; it is optimized for parallel programs, and it works with other compiler optimizations since standard system compilers are used. The number of nodes doing computation stays constant, which allows an easier implementation and is generally safe for jobs that complete in a reasonable amount of time, on servers, and not in a “dangerous environment”. One of the main problems of grid computing, which Marlon will cover in a later lecture, is that this assumption does not hold. MPI programs consist of a single executable that runs on all participating nodes; sometimes the same instructions are carried out on different data, other times different instructions are carried out on the data. Each process determines its role from program logic; the master node is the entry point. Core commands: Init, Send, Receive, Finalize.
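The core commands above are enough for a complete MPI program. As a minimal sketch (not from the original slides), the following hello-world shows the SPMD model: every process runs the same executable, learns its rank and the total number of processes, and branches on its rank to determine its role.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

    if (rank == 0)
        printf("Master: %d processes total\n", size);
    else
        printf("Worker %d reporting in\n", rank);

    MPI_Finalize();                          /* shut down MPI */
    return 0;
}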

11 Communication Types
Standard; Synchronous (blocking send); Ready; Buffered (asynchronous). For non-blocking communication: MPI_Wait (block until complete), MPI_Test (true/false). At the heart of MPI is message passing, in other words sending (and receiving) messages, and here we begin to see the flexibility MPI provides. Standard mode may behave synchronously or asynchronously; the underlying implementation tries to make the best decision. Synchronous mode requires the call to block until a matching receive is posted at the destination node. Ready mode assumes the destination node is ready, so the send completes even if the receiver was not ready, which can be dangerous. Buffered mode copies the message into a local buffer so the send can return immediately. With non-blocking calls, other work can be performed while the message is transferring; to effectively turn them back into blocking calls, MPI_Wait (or one of its variants) can be used, and to test whether a transfer has completed, MPI_Test (or one of its variants) may be used.
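As an illustrative sketch (not part of the original slides), the program below posts a non-blocking receive with MPI_Irecv, polls it once with MPI_Test (other work could be overlapped at that point), and then calls MPI_Wait to guarantee completion before using the data.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value = 0, flag = 0;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* standard-mode send */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request); /* returns immediately */

        MPI_Test(&request, &flag, &status);   /* true/false: has the message arrived yet? */
        if (!flag) {
            /* other useful work could be overlapped with the transfer here */
        }

        MPI_Wait(&request, &status);          /* block until the message has arrived */
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes; ranks other than 0 and 1 simply pass through.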

12 Message Structure
On the Send side, a message carries the data itself (variable name, data length, data type), a destination, a tag, and a communication context; on the Recv side, the matching data length and data type, a status, a tag, and a communication context. Naturally, for things like Send/Receive and Wait/Test to work, there needs to be a way of identifying messages: various parameters describing the data being transferred must be specified, and a tag is issued to differentiate messages. Since tags are user-generated and collisions are possible, there is a need for contexts as well; contexts are system-generated.
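These fields map directly onto the arguments of the point-to-point calls. A small sketch (the variable names are illustrative, not from the slides) showing where each piece of the message structure appears in MPI_Send and MPI_Recv:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, tag = 0;
    double value = 3.14;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /*       data,   length, type,       destination, tag, context (communicator)  */
        MPI_Send(&value, 1,      MPI_DOUBLE, 1,           tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /*       data,   length, type,       source, tag, context,        status       */
        MPI_Recv(&value, 1,      MPI_DOUBLE, 0,      tag, MPI_COMM_WORLD, &status);
        printf("received %f\n", value);
    }

    MPI_Finalize();
    return 0;
}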

13 Data Types and Functions
MPI uses its own types for consistency: MPI_INT, MPI_CHAR, etc. All functions are prefixed with “MPI_”: MPI_Init, MPI_Send, MPI_Recv, etc.

14 Our First Program: Numerical Integration
Objective: calculate the area under f(x) = x². Outline: define variables; initialize MPI; determine the subset of the problem to calculate; perform the calculation; collect information (at the master); send information (slaves); finalize. Problem: determine the area under the curve f(x) = x² between x = [2, 5], using a 50-rectangle resolution.

15 Our First Program
Download Link:

16 Variable Declarations
#include "mpi.h"
#include <stdio.h>

/* problem parameters (values filled in from the problem statement: f(x) = x^2 over [2, 5], 50 rectangles) */
#define f(x) ((x) * (x))
#define numberRects 50
#define lowerLimit 2.0
#define upperLimit 5.0

int main( int argc, char * argv[] )
{
    /* MPI variables */
    int dest, noProcesses, processId, src, tag;
    MPI_Status status;

    /* problem variables */
    int i;
    double area, x, height, lower, width, total, range;
    ...


20 MPI Initialization
int main( int argc, char * argv[] )
{
    ...
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);

This is the same main; it is repeated so you realize we are still in it.

21 Calculation
int main( int argc, char * argv[] )
{
    ...
    /* adjust problem size for subproblem */
    range = (upperLimit - lowerLimit) / noProcesses;
    width = range / numberRects;
    lower = lowerLimit + range * processId;

    /* calculate area for subproblem */
    area = 0.0;
    for (i = 0; i < numberRects; i++)
    {
        x = lower + i * width + width / 2.0;
        height = f(x);
        area = area + width * height;
    }

22 Sending and Receiving
int main( int argc, char * argv[] )
{
    ...
    tag = 0;
    if (processId == 0)    /* MASTER */
    {
        total = area;
        for (src = 1; src < noProcesses; src++)
        {
            MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
            total = total + area;
        }
        fprintf(stderr, "The area from %f to %f is: %f\n", lowerLimit, upperLimit, total);
    }
    else                   /* WORKER (i.e. compute node) */
    {
        dest = 0;
        MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }

Using “0” as the destination is good since there will always be a process with rank 0. If you are going to be testing code on a single-processor system, as is often the case, this is especially applicable.

23 Finalizing
int main( int argc, char * argv[] )
{
    ...
    MPI_Finalize();
    return 0;
}
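Not on the original slides, but for completeness: with a typical MPI installation (MPICH or Open MPI), the complete program would be compiled with the wrapper compiler and launched with the MPI process launcher, e.g. mpicc integration.c -o integration followed by mpiexec -n 4 ./integration; the source file name and the process count here are illustrative.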

24 Communicators
MPI_COMM_WORLD: all processes involved. What if different workers have different tasks? MPI_COMM_WORLD is the default. A simple example: one process acts as a random number generator that distributes unique numbers to the other nodes, while the master node sends the compute tasks to the rest of the nodes. In this case you could have a communicator called “WORKER”: when an MPI call is given WORKER as its communicator, only the “WORKER” processes are involved.
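A hypothetical sketch of how such a communicator could be created with MPI_Comm_split (the name worker_comm and the choice of rank 1 as the random-number generator are illustrative, not from the slides): every process except rank 1 ends up in the worker communicator, and collective calls given that communicator involve only those processes.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, color, worker_rank;
    MPI_Comm worker_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* color 0 = workers, color 1 = the random-number generator (rank 1) */
    color = (rank == 1) ? 1 : 0;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &worker_comm);

    if (color == 0) {
        MPI_Comm_rank(worker_comm, &worker_rank);   /* rank within the worker group */
        printf("World rank %d is worker rank %d\n", rank, worker_rank);
    }

    MPI_Comm_free(&worker_comm);
    MPI_Finalize();
    return 0;
}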

25 Additional Functions
Data management: MPI_Bcast (broadcast). Collective computation: min, max, sum, AND, etc. Benefits: abstraction; optimized. As mentioned earlier, many complete MPI programs can be created with the six basic functions. However, for optimal performance and development time it is sometimes necessary to use other functions. These functions still use send and receive internally, but they provide abstraction and are internally optimized for performance. I can't go over everything here, but these are a couple of examples. A typical one is data management, and the most common case is the broadcast, which is used to send something (for example, a constant variable) to all participating nodes. Another example is a collective computation function: if you calculate different subsets of a problem on different nodes and need to get the sum of them all, a sum function is provided.
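As an illustrative sketch (not from the slides), the two kinds of functions could be used together like this: the root broadcasts a parameter to all participating nodes with MPI_Bcast, each process computes a partial result, and MPI_Reduce with MPI_SUM collects the total at the root. The partial-result computation here is a placeholder.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, noProcesses;
    double param = 0.0;
    double partial, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);

    if (rank == 0)
        param = 50.0;          /* only the root knows the parameter initially */

    /* send the parameter from rank 0 to all participating nodes */
    MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* each process computes its share (placeholder computation) */
    partial = param / noProcesses;

    /* collective computation: sum all partial results at rank 0 */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Total: %f\n", total);

    MPI_Finalize();
    return 0;
}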

26 Typical Problems
Designing; debugging; scalability. The first two are existing problems in computer science in general; the fact that you are dealing with a distributed environment merely makes them even bigger problems. Scalability is the new problem: since the programs must deal with communication overhead, it is usually difficult to achieve a nearly linear speedup as processors are added.

27 Scalability Analysis
Definition: estimation of the resource (computation and communication) requirements of a program as the problem size and/or the number of processors increases. Requires knowledge of communication time; assumes otherwise idle nodes; ignores the data requirements of each node. When performing scalability analysis, we need to know the propagation time of messages in order to make an estimate. We also assume that the nodes are not performing any other computation or communication; in other words, 100 percent of their resources are devoted to the task at hand. Lastly, we ignore the fact that as the problem size increases, so does the likelihood of having to use virtual memory, which can have a profound effect on computation time.

28 Simple Scalability Example
Tcomm = time to send a message. Tcomm = s + rn, where s = start-up time, r = time to send a single byte (i.e. 1/bandwidth), and n = size of the data type (int, double, etc.).
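A quick worked example (the numbers are illustrative assumptions, not from the slides): with a start-up time of s = 50 µs and r = 0.01 µs per byte (roughly 100 MB/s of bandwidth), sending a single 8-byte double costs

Tcomm = s + rn = 50 µs + 0.01 µs/byte × 8 bytes ≈ 50.08 µs

so for small messages the start-up time dominates, which is why sending a few large messages is usually better than sending many small ones.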

29 Simple Scalability Example
Matrix multiplication of two square matrices of size n x n. The first matrix is broadcast to all nodes. Cost for the rest: Computation: n multiplications and (n - 1) additions per cell, i.e. n^2 x (2n - 1) = 2n^3 - n^2 floating-point operations. Communication: send n elements to a worker node and return the resulting n elements to the master node (2n); after doing this for each column of the result matrix: n x 2n = 2n^2.

30 Simple Scalability Example
Therefore, we get the following ratio of communication to computation. As n becomes very large, the ratio approaches 1/n, so this problem is not severely affected by communication overhead.
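The ratio itself was an equation on the original slide and did not survive the transcript; reconstructing it from the counts on the previous slide gives

communication / computation = 2n^2 / (2n^3 - n^2) = 2 / (2n - 1)

which behaves like 1/n for large n, consistent with the statement above.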

31 References
slides/allslides.html. High Performance Linux Clusters, by Joseph D. Sloan, O'Reilly Press. Using MPI, second edition, by Gropp, Lusk, and Skjellum, MIT Press.

