Lecture 2: Part II Message Passing Programming: MPI


Lecture 2: Part II Message Passing Programming: MPI
Outline: Introduction to MPI; MPI programming; Running an MPI program; Architecture of MPICH

Message Passing Interface (MPI)

What is MPI? A message passing library specification: a message-passing model, not a compiler specification, not a specific product. Designed for parallel computers, clusters and heterogeneous networks. Full-featured.

Why use MPI? (1) Message passing is now mature as a programming paradigm: well understood, an efficient match to hardware, and proven in many applications.

Why use MPI? (2) Full range of desired features: modularity, access to peak performance, portability, heterogeneity, subgroups, topologies, performance measurement tools.

Who designed MPI?
Vendors: IBM, Intel, TMC, SGI, Meiko, Cray, Convex, nCUBE, ...
Library writers: PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, DP (HKU), PM (Japan), AM (Berkeley), FM (HPVM at Illinois)
Application specialists and consultants

Vendor-Supported MPI
HP-MPI: Hewlett-Packard, Convex SPP
MPI-F: IBM SP1/SP2
Hitachi/MPI: Hitachi
SGI/MPI: SGI PowerChallenge series
MPI/DE: NEC
INTEL/MPI: Intel Paragon (iCC lib)
T.MPI: Telmat Multinode
Fujitsu/MPI: Fujitsu AP1000
EPCC/MPI: Cray & EPCC, T3D/T3E

Public-Domain MPI
MPICH: Argonne National Lab. & Mississippi State Univ.
LAM: Ohio Supercomputer Center
MPICH/NT: Mississippi State University
MPI-FM: Illinois (Myrinet)
MPI-AM: UC Berkeley (Myrinet)
MPI-PM: RWCP, Japan (Myrinet)
MPI-CCL: California Institute of Technology

Public-Domain MPI
CRI/EPCC MPI: Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E)
MPI-AP: Australian National University, CAP Research Program (AP1000)
W32MPI: Illinois, Concurrent Systems
RACE-MPI: Hughes Aircraft Co.
MPI-BIP: INRIA, France (Myrinet)

Communicator Concept in MPI A communicator identifies the process group and context with respect to which an operation is to be performed.

Communicator (2) The figure shows four communicators, one nested within another. The same process can exist in different communicators, but processes in different communicators cannot communicate with each other.

Features of MPI (1) General: communicators combine context and group for message security. Predefined communicator: MPI_COMM_WORLD.

Features of MPI (2) Point-to-point communication: structured buffers and derived data types, heterogeneity. Modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered.

Features of MPI (3) Collective communication: both built-in and user-defined collective operations; a large number of data movement routines; subgroups defined directly or by topology. E.g., broadcast, barrier, reduce, scatter, gather, all-to-all, ...

MPI Programming

Writing MPI programs MPI comprises 125 functions. Many parallel programs can be written with just 6 basic functions.

Six basic functions (1)
MPI_INIT: initiate an MPI computation
int MPI_Init(int *argc, char ***argv)
MPI_FINALIZE: terminate a computation
int MPI_Finalize(void)

Six basic functions (2) MPI_COMM_SIZE Determine number of processes in a communicator MPI_COMM_RANK Determine the identifier of a process in a specific communicator

int MPI_Comm_size(MPI_Comm comm, int *size)
int MPI_Comm_rank(MPI_Comm comm, int *rank)

Six basic functions (3) MPI_SEND: send a message to another process. MPI_RECV: receive a message from another process.

int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
tag distinguishes different types of messages; dest is a rank in comm.

int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
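
After a receive, the status argument can be inspected to find out who actually sent the message and with which tag. A minimal C sketch of this (the function name and the use of the MPI_ANY_SOURCE / MPI_ANY_TAG wildcards are illustrative choices, not taken from the slides):

    #include <stdio.h>
    #include "mpi.h"

    /* Sketch: receive one int from any sender, then inspect the status. */
    void recv_any_example(void)
    {
        int value;
        MPI_Status status;

        /* Accept a message from any source with any tag. */
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        /* The status structure records the actual sender and tag. */
        printf("got %d from rank %d with tag %d\n",
               value, status.MPI_SOURCE, status.MPI_TAG);
    }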

A simple program

Program main
begin
  MPI_INIT()                              Initiate computation
  MPI_COMM_SIZE(MPI_COMM_WORLD, count)    Find the number of processes
  MPI_COMM_RANK(MPI_COMM_WORLD, myid)     Find the ID of the current process
  print("I am ", myid, " of ", count)     Each process prints its own output
  MPI_FINALIZE()                          Shut down
end
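
A C version of this program (a sketch; the variable names follow the pseudocode):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, count;

        MPI_Init(&argc, &argv);                   /* initiate computation      */
        MPI_Comm_size(MPI_COMM_WORLD, &count);    /* number of processes       */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* ID of the current process */
        printf("I am %d of %d\n", myid, count);   /* each process prints       */
        MPI_Finalize();                           /* shut down                 */
        return 0;
    }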

Result (with 4 processes; the output order is arbitrary):
I am 3 of 4
I am 1 of 4
I am 0 of 4
I am 2 of 4

Point-to-Point Communication The basic point-to-point communication operators are send and receive. (Figure: the sender's buffer is transmitted to the receiver's buffer.)

Another simple program (2 nodes)

.....
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid = 0                              I'm process 0!
  MPI_SEND("Zero",…,…,1,…,…)
  MPI_RECV(words,…,…,1,…,…,…)
else                                     I'm process 1!
  MPI_RECV(words,…,…,0,…,…,…)
  MPI_SEND("One",…,…,0,…,…)
END IF
print("Received from ", words)
......
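
A C sketch of the same exchange (the message lengths, the tag value and the receive-buffer size are assumptions, since the slide elides those arguments):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid;
        char words[16];                 /* receive buffer; size is an assumption */
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        if (myid == 0) {                /* I'm process 0! */
            MPI_Send("Zero", 5, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(words, 16, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else {                        /* I'm process 1! */
            MPI_Recv(words, 16, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send("One", 4, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
        printf("Received from %s\n", words);

        MPI_Finalize();
        return 0;
    }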

Process 0: MPI_SEND("Zero",…,…,1,…,…) sends "Zero" to process 1, then MPI_RECV(words,…,…,1,…,…,…) sets up a buffer and waits for the message from process 1.
Process 1: MPI_RECV(words,…,…,0,…,…,…) sets up a buffer and waits for the message from process 0; once "Zero" has been received into words (the buffer), MPI_SEND("One",…,…,0,…,…) sends "One" back to process 0.
When process 0 has received "One", both processes execute print("Received from ", words).

Result
Process 0: Received from One
Process 1: Received from Zero

Collective Communication (1) Communication that involves a group of processes. (Figure: one sender's buffer is transmitted to the buffers of several receivers.)

Collective Communication (2) Three types:
Barrier: MPI_BARRIER
Data movement: MPI_BCAST, MPI_GATHER, MPI_SCATTER
Reduction operations: MPI_REDUCE

Barrier MPI_BARRIER is used to synchronize execution of a group of processes: no process can pass the barrier until every process in the group has reached it, after which all of them continue together.

int MPI_Barrier(MPI_Comm comm)
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

Data movement (1) MPI_BCAST: one single process sends the same data to all other processes, itself included. (Figure: process 0 broadcasts "FACE" and all four processes end up holding "FACE".)
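
A small C sketch of this broadcast (the 4-character buffer and the root of 0 mirror the figure; they are not prescribed by MPI):

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid;
        char buf[5] = "";               /* will hold "FACE" on every process */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        if (myid == 0)
            strcpy(buf, "FACE");        /* only the root has the data initially */

        /* Every process, root included, calls MPI_Bcast with the same root. */
        MPI_Bcast(buf, 5, MPI_CHAR, 0, MPI_COMM_WORLD);

        printf("process %d has %s\n", myid, buf);
        MPI_Finalize();
        return 0;
    }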

Data movement (2) MPI_GATHER: every process (the root process included) sends its data to one process, which stores the contributions in rank order. (Figure: processes 0-3 hold "F", "A", "C", "E"; after the gather, the root holds "FACE".)

int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

Examples 1) Gather 100 ints from every process in the group to the root:

MPI_Comm comm;
int root, myrank, gsize, *rbuf, sendbuf[100];
... ...
MPI_Comm_size(comm, &gsize);
rbuf = (int *) malloc(gsize * 100 * sizeof(int));
MPI_Gather(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

At the root, rbuf holds gsize blocks of 100 ints, one block per rank.

2) The same gather, but only the root allocates the receive buffer:

MPI_Comm comm;
int root, myrank, gsize, *rbuf, sendbuf[100];
... ...
MPI_Comm_rank(comm, &myrank);
if (myrank == root) {
    MPI_Comm_size(comm, &gsize);
    rbuf = (int *) malloc(gsize * 100 * sizeof(int));
}
MPI_Gather(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Data movement (3) MPI_SCATTER: a process sends out a message, which is split into several equal parts, and the i-th portion is sent to the i-th process. (Figure: the root holds "FACE"; processes 0-3 receive "F", "A", "C", "E" respectively.)

int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
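
A C sketch that mirrors the figure, scattering one character of "FACE" to each of four processes (the assumption of exactly four processes and root 0 comes from the figure):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, nprocs;
        char sendbuf[4] = {'F', 'A', 'C', 'E'};   /* only meaningful at the root */
        char recvchar;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);   /* assumed to be 4 here */

        /* The root sends one char to each process; every process receives one. */
        MPI_Scatter(sendbuf, 1, MPI_CHAR, &recvchar, 1, MPI_CHAR, 0, MPI_COMM_WORLD);

        printf("process %d received %c\n", myid, recvchar);
        MPI_Finalize();
        return 0;
    }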

Data movement (4) MPI_REDUCE (e.g., find the maximum value): combine a value from each process, using a specified operation, and return the combined value to one process. (Figure: the values 8, 9, 3, 7 held by processes 0-3 are reduced with max, and the root receives 9.)

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
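
A C sketch of the max reduction from the figure (the per-process values 8, 9, 3, 7 and the root of 0 are taken from the figure; everything else is an illustrative choice):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid;
        int values[4] = {8, 9, 3, 7};   /* one value per process, as in the figure */
        int myvalue, maxvalue;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        myvalue = values[myid % 4];

        /* Combine every process's value with MPI_MAX; the result lands on rank 0. */
        MPI_Reduce(&myvalue, &maxvalue, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

        if (myid == 0)
            printf("maximum value = %d\n", maxvalue);

        MPI_Finalize();
        return 0;
    }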

Predefined operations:
MPI_MAX (maximum), MPI_MIN (minimum), MPI_SUM (sum), MPI_PROD (product),
MPI_LAND (logical and), MPI_BAND (bit-wise and),
MPI_LOR (logical or), MPI_BOR (bit-wise or),
MPI_LXOR (logical exclusive or), MPI_BXOR (bit-wise exclusive or),
MPI_MAXLOC (maximum and its location), MPI_MINLOC (minimum and its location)
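
MPI_MAXLOC and MPI_MINLOC return both the extreme value and the rank that supplied it, using a predefined pair datatype such as MPI_DOUBLE_INT. A small C sketch (the per-process values are made up for illustration):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid;
        struct { double value; int rank; } in, out;   /* layout matches MPI_DOUBLE_INT */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        in.value = (double) ((myid * 7) % 5);   /* some per-process value (illustrative) */
        in.rank  = myid;                        /* tag the value with the owner's rank  */

        /* MPI_MAXLOC keeps the largest value and the rank that contributed it. */
        MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

        if (myid == 0)
            printf("max value %.1f came from rank %d\n", out.value, out.rank);

        MPI_Finalize();
        return 0;
    }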

Example program (1) Calculating the value of π by numerical integration. (The formula, an image in the original slide, is presumably the standard one used by the cpi example: π = ∫ from 0 to 1 of 4/(1+x²) dx, approximated by summing the areas of n rectangles.)

Example program (2)

......
MPI_BCAST(numprocs, …, …, 0, …)                  Broadcast the number of processes
for (i = myid + 1; i <= n; i += numprocs)        Each process calculates its assigned intervals
    compute the area for each interval and accumulate the result in the process's local sum
MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)         Sum up all the areas
if (myid == 0)                                   Print the result
    output result
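
A complete C sketch close to the classic cpi example distributed with MPICH (the interval count n, broadcasting n rather than numprocs, and the output format are assumptions of this sketch):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, numprocs, i, n = 10000;     /* n = number of intervals (assumed) */
        double h, x, sum, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* Process 0 broadcasts the problem size to everyone. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Each process sums the areas of its own subset of intervals. */
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* Sum the partial results on process 0 and print there. */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }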

=3.141... Start calculation! OK! OK! Calculated by process 0

MPICH - A Portable Implementation of MPI Argonne National Laboratory

What is MPICH??? The first complete and portable implementation of the full MPI standard. 'CH' stands for "Chameleon", a symbol of adaptability and portability. It contains a programming environment for working with MPI programs, including a portable startup mechanism and libraries.

How can I install it??? Install the package mpich.tar.gz into a directory. Use './configure' and 'make >& make.log' to choose the appropriate architecture and device and to compile the library. Syntax: ./configure -device=DEVICE -arch=ARCH_TYPE, where ARCH_TYPE specifies the type of machine to be configured and DEVICE specifies which communication device the system will use, e.g. ch_p4 (TCP/IP).

How to run an MPI Program Edit mpich/util/machines/machines.xxxx to contain the names of machines of architecture xxxx. For example, for the computers mercury, venus, earth and mars, the file should be in the format:
mercury
venus
earth
mars

How to run an MPI Program Include "mpi.h" in the source program. Compile the program with the 'mpicc' command, e.g. mpicc -c foo.c. Use 'mpirun' to run an MPI program; mpirun determines the environment in which the program runs.

How to run an MPI Program mpirun -np 4 a.out: runs a.out with four processes (for massively parallel processors). mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program: runs program on 2 sun4s and 3 rs6000s, with the local machine being a sun4 (multiple architectures).

MPIRUN (1) How do you start an MPI program? Use mpirun. Example: #mpirun -np 4 cpi starts four processes of cpi.

MPIRUN (2) What does mpirun do?
1. Reads the arguments that specify the environment of the MPI program: i) how many processes should be started; ii) on which machines the MPI program will be started; iii) which device will be used (e.g. ch_p4).
2. Distributes the processes over the machines on which they will run.
3. Records the resulting distribution in the PI???? file.

MPIRUN (3) Example, assuming the ch_p4 device: #mpirun -np 4 cpi. 1. mpirun knows 4 processes need to be started. 2. mpirun reads the machines file to find which machines can be used. 3. The ch_p4 device is used if no device argument is given on the command line.

MPIRUN (4) 4. Splits the tasks and saves them in the PI???? file. File format: <hostname> <no. of procs> <program>
genius.cs.hku.hk 0 cpi
eagle.cs.hku.hk 1 cpi
dragon.cs.hku.hk 1 cpi
virtue.cs.hku.hk 1 cpi
5. Starts the processes on the remote machines using "rsh".

Architecture of MPICH

Structure of MPICH (figure): the portable MPI API library sits on top of the MPICH abstract device interface (ADI); the ADI is implemented over the MPICH channel interface, which in turn maps onto low-level layers such as sockets (TCP/IP), shared memory, or vendor-designed devices.

MPICH - Abstract Device Interface The interface between the high-level MPI library and the low-level device. It manages message packaging and buffering policies and handles heterogeneous communication. Four sets of functions: 1. Specify send or receive of a message. 2. Data movement between the API and the hardware. 3. Manage lists of pending messages. 4. Provide information about the execution environment.

MPICH - The Channel Interface (1) The interface transfers data from one process's address space to another's. Information is divided into two parts: the message envelope and the data. It includes five functions: MPID_SendControl, MPID_RecvAnyControl, MPID_ControlMsgAvail (envelope information); MPID_SendChannel, MPID_RecvFromChannel (data).

MPICH - The Channel Interface (2) The channel interface chooses a data exchange mechanism according to the size of the message. Data exchange mechanisms implemented: Short, Eager, Rendezvous, Get.

Protocol - Short This mechanism handles the smallest messages: the data is delivered inside the message envelope itself.

(Figure: short protocol data transfer. The control message carrying the data reaches the receiver; if no matching MPI_Recv has been posted yet, the data is stored in a buffer until MPI_Recv is called.)

Protocol - Eager Data is sent to the destination immediately. The receiver must allocate some space to store the data locally. It is the default choice in MPICH. It is not suitable for large amounts of data transfer.

(Figure: eager protocol data transfer. The data is pushed to the receiver together with the control messages and saved in receiver-side buffers; the buffers can fill up if many messages arrive before the matching MPI_Recv calls.)

Protocol - Rendezvous Data is sent to the destination only when requested. To use it, add -use_rndv to the './configure' command. No buffering is required.

(Figure: rendezvous protocol data transfer. The sender's control message waits at the receiver until a matching MPI_Recv is posted and a request is sent back; only then is the data transferred.)

Protocol - Get In this protocol, data is read directly by the receiver: data is transferred directly from one process's memory to another's. Highest performance, but it requires shared memory or remote memory operations.

(Figure: get protocol data transfer. The receiver directly accesses the sender's shared memory and copies the data from there into its own memory.)

Conclusion

MPI-1.1 (June 1995) MPI 1.1 does not provide: process management, remote memory transfers, active messages, threads, virtual shared memory.

MPI-2 (July 1997) Extensions to MPI: process creation and management, one-sided communication, extended collective operations, external interfaces, I/O, additional language bindings.