Introduction to Parallel Programming with C and MPI at MCSR Part 1 The University of Southern Mississippi April 8, 2010.

What is a Supercomputer? Loosely speaking, it is a “large” computer with an architecture that has been optimized for solving bigger problems faster than a conventional desktop, mainframe, or server computer. - Pipelining - Parallelism (lots of CPUs or Computers)

Supercomputers at MCSR: mimosa -253 CPU Intel Linux Cluster – Pentium 4 -Distributed memory – 500MB – 1GB per node -Gigabit Ethernet

Supercomputers at MCSR: redwood -224 CPU Memory Supercomputer -Intel Itanium 2 -Shared Memory: 1GB per node

Supercomputers at MCSR: sequoia -46 node Linux Cluster -8 cores (CPUs) per node = 368 cores total -2 GB memory per core (16 GB per node) -Shared memory intra-node -Distributed memory inter-node -Intel Xeon processors

What is Parallel Computing? Using more than one computer (or processor) to complete a computational problem

How May a Problem be Parallelized? Data Decomposition Task Decomposition

Models of Parallel Programming
  Message Passing Computing
    –Processes coordinate and communicate results via calls to message passing library routines
    –Programmers “parallelize” algorithm and add message calls
    –At MCSR, this is via MPI programming with C or Fortran
      Sweetgum – Origin 2800 Supercomputer (128 CPUs)
      Mimosa – Beowulf Cluster with 253 Nodes
      Redwood – Altix 3700 Supercomputer (224 CPUs)
  Shared Memory Computing
    –Processes or threads coordinate and communicate results via shared memory variables
    –Care must be taken not to modify the wrong memory areas
    –At MCSR, this is via OpenMP programming with C or Fortran on sweetgum

Message Passing Computing at MCSR
  Process Creation
  Manager and Worker Processes
  Static vs. Dynamic Work Allocation
  Compilation
  Models
  Basics
  Synchronous Message Passing
  Collective Message Passing
  Deadlocks
  Examples

Message Passing Process Creation
  Dynamic
    –one process spawns other processes & gives them work
    –PVM
    –More flexible
    –More overhead - process creation and cleanup
  Static
    –Total number of processes determined before execution begins
    –MPI

Message Passing Processes Often, one process will be the manager, and the remaining processes will be the workers Each process has a unique rank/identifier Each process runs in a separate memory space and has its own copy of variables

Message Passing Work Allocation
  Manager Process
    –Does initial sequential processing
    –Initially distributes work among the workers, statically or dynamically
    –Collects the intermediate results from workers
    –Combines into the final solution
  Worker Process
    –Receives work from, and returns results to, the manager
    –May distribute work amongst themselves (decentralized load balancing)

Message Passing Compilation
  Compile/link programs w/ message passing libraries using regular (sequential) compilers
  Fortran MPI example: include 'mpif.h'
  C MPI example: #include "mpi.h"

Message Passing Models
  SPMD – Single Program/Multiple Data
    –Single version of the source code used for each process
    –Manager executes one portion of the program; workers execute another; some portions executed by both
    –Requires one compilation per architecture type
    –MPI
  MPMD – Multiple Program/Multiple Data
    –One source code for the manager; another for the workers
    –Each must be compiled separately
    –PVM

Message Passing Basics
  Each process must first establish the message passing environment
  Fortran MPI example:
    integer ierror
    call MPI_INIT (ierror)
  C MPI example:
    MPI_Init(&argc, &argv);

Message Passing Basics
  Each process has a rank, or id number
    –0, 1, 2, … n-1, where there are n processes
  With SPMD, each process must determine its own rank by calling a library routine
  Fortran MPI example:
    integer comm, rank, ierror
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  C MPI example:
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

Message Passing Basics
  Each process has a rank, or id number
    –0, 1, 2, … n-1, where there are n processes
  Each process may use a library call to determine how many total processes it has to play with
  Fortran MPI example:
    integer comm, size, ierror
    call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  C MPI example:
    MPI_Comm_size(MPI_COMM_WORLD, &size);

Message Passing Basics
  Each process has a rank, or id number
    –0, 1, 2, … n-1, where there are n processes
  Once a process knows the size, it also knows the ranks (id #’s) of the other processes, and can send a message to, or receive a message from, any of them.
  C example:
    MPI_Send(buf, count, datatype, dest, tag, comm);
    MPI_Recv(buf, count, datatype, source, tag, comm, &status);
  In both calls, buf, count, and datatype describe the DATA; dest (or source), tag, and comm form the ENVELOPE. MPI_Recv also fills in status. The Fortran versions take an additional trailing ierror argument.

MPI Send and Receive Arguments
  Buf: starting location of data
  Count: number of elements
  Datatype: MPI_INT, MPI_FLOAT, MPI_CHAR… in C (MPI_INTEGER, MPI_REAL, MPI_CHARACTER… in Fortran)
  Destination: rank of process to whom msg being sent
  Source: rank of sender from whom msg being received, or MPI_ANY_SOURCE
  Tag: integer chosen by program to indicate type of message, or MPI_ANY_TAG
  Communicator: id’s the process team, e.g., MPI_COMM_WORLD
  Status: the result of the call (such as the # data items received)

Synchronous Message Passing
  Message calls may be blocking or nonblocking
  Blocking (Synchronous) Send
    –Waits to return until the message has been received by the destination process
    –This synchronizes the sender with the receiver
    –MPI_Ssend() is a synchronous blocking send
  Nonblocking Send
    –Return is immediate, without regard for whether the message has been transferred to the receiver
    –DANGER: Sender must not change the variable containing the old message before the transfer is done.
    –MPI_Isend() is nonblocking

Synchronous Message Passing
  Locally Blocking Send
    –The message is copied from the send parameter variable to an intermediate buffer in the calling process
    –Returns as soon as the local copy is complete
    –Does not wait for receiver to transfer the message from the buffer
    –Does not synchronize
    –The sender’s message variable may safely be reused immediately
    –MPI_Send() typically behaves this way for small messages; the MPI standard also allows it to wait for the matching receive, so portable code should not rely on buffering

Synchronous Message Passing
  Blocking Receive
    –The call waits until a message matching the given tag has been received from the specified source process.
    –MPI_Recv() is blocking.
  Nonblocking Receive
    –If this process has a qualifying message waiting, retrieves that message and returns
    –If no messages have been received yet, returns anyway
    –Used if the receiver has other work it can be doing while it waits
    –Status tells the receiver whether the message was received
    –MPI_Irecv() is nonblocking
    –MPI_Wait() and MPI_Test() can be used to periodically check to see if the message is ready, and finally wait for it, if desired
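A nonblocking exchange along these lines might look like the sketch below. It assumes at least two processes; the polling counter is a stand-in for whatever useful work the receiver has, and the value 42 and the tag are arbitrary.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    const int TAG = 0;  /* arbitrary tag chosen for this example */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2 && rank == 0) {
        int value = 42;
        MPI_Request req;
        MPI_Isend(&value, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD, &req);
        /* value must not be modified until the send has completed */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (size >= 2 && rank == 1) {
        int value = 0, done = 0;
        long polls = 0;
        MPI_Request req;
        MPI_Irecv(&value, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD, &req);
        while (!done) {
            polls++;  /* placeholder for other useful work */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }
        /* value is only safe to read once MPI_Test reports completion */
        printf("rank 1 received %d after %ld polls\n", value, polls);
    }

    MPI_Finalize();
    return 0;
}
```

The number of polls printed depends on timing and will vary from run to run.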

Collective Message Passing
  Broadcast
    –Sends a message from one to all processes in the group
  Scatter
    –Distributes each element of a data array to a different process for computation
  Gather
    –The reverse of scatter…retrieves data elements into an array from multiple processes

Collective Message Passing w/MPI
  MPI_Bcast() Broadcast from root to all other processes
  MPI_Gather() Gather values from a group of processes
  MPI_Scatter() Scatter a buffer in parts to a group of processes
  MPI_Alltoall() Send data from all processes to all processes
  MPI_Reduce() Combine values from all processes into a single value
  MPI_Reduce_scatter() Combine values from all processes and scatter the results

Message Passing Deadlock
  Deadlock can occur when all critical processes are waiting for messages that never come, or waiting for buffers to clear out so that their own messages can be sent
  Possible Causes
    –Program/algorithm errors
    –Message and buffer sizes
  Solutions
    –Order operations more carefully
    –Use nonblocking operations
    –Add debugging output statements to your code to find the problem

Sample PBS Script
sequoia% vi example.pbs
#!/bin/bash
#PBS -l nodes=4 # Mimosa
#PBS -l ncpus=4 # Redwood
#PBS -l ncpus=4 # Sequoia
#PBS -l cput=0:5:0 # Request 5 minutes of CPU time
#PBS -N example
cd $PWD
rm *.pbs.[eo]*
icc -lmpi -o add_mpi.exe add_mpi.c #Sequoia
mpiexec -n 4 add_mpi.exe #Sequoia
sequoia% qsub example.pbs
sequoia.mcsr.olemiss.edu

PBS: Querying Jobs

MPI Programming Exercises
  Hello World
    –sequential
    –parallel (w/MPI and PBS)
  Add the prime numbers in an Array of numbers
    –sequential
    –parallel (w/MPI and PBS)

Log in to sequoia & get workshop files
  A. Use secure shell to log in from your PC to hpcwoods: ssh
  B. Use secure shell to log in from hpcwoods to your training account on sequoia: ssh ssh
  C. Copy workshop files into your home directory by running: /usr/local/apps/ppro/prepare_mpi_workshop

Examine, compile, and execute hello.c

Examine hello_mpi.c

Examine hello_mpi.c Add the preprocessor directive to include the header file for the MPI library calls.

Examine hello_mpi.c Add function call to initialize the MPI environment

Examine hello_mpi.c Add function call to find out how many parallel processes there are.

Examine hello_mpi.c Add function call to find out which process this is – the MPI process ID of this process.

Examine hello_mpi.c Add IF structure so that the manager/boss process can do one thing, and everyone else (the workers/servants) can do something else.

Examine hello_mpi.c All processes, whether manager or worker, must finalize MPI operations.
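Put together, the steps above lead to a program of roughly the following shape. This is a sketch of what such a file could look like, not the workshop's actual hello_mpi.c; the messages printed are assumptions.

```c
#include <stdio.h>
#include <mpi.h>   /* header for the MPI library calls */

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);                /* initialize the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes are there? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process is this one? */

    if (rank == 0) {
        /* manager: rank 0 by convention in this sketch */
        printf("Hello from the manager; %d processes in total\n", size);
    } else {
        /* workers: every other rank */
        printf("Hello from worker %d of %d\n", rank, size);
    }

    MPI_Finalize();  /* every process, manager or worker, must finalize */
    return 0;
}
```

Because all processes write to the same standard output, the order of the lines may differ between runs.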

Compile hello_mpi.c Why won’t this compile? You must link to the MPI library. Compile it.

Run hello_mpi.exe On 1 CPU On 2 CPUs On 4 CPUs

hello_mpi.pbs

Submit hello_mpi.pbs

Examine, compile, and execute add_mpi.c

Examine add_mpi.pbs

Submit PBS Script: add_mpi.pbs

Examine Output and Errors add_mpi.c

Determine Speedup

Determine Parallel Efficiency

How Could Speedup/Efficiency Improve?

What Happens to Results When MAXSIZE Not Evenly Divisible by n?

Exercise 1: Change Code to Work When MAXSIZE is Not Evenly Divisible by n

Exercise 2: Change Code to Improve Speedup