MPI Jakub Yaghob

Literature and references Books Gropp W., Lusk E., Skjellum A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, 2014 Gropp W., Hoefler T., Thakur R., Lusk E.: Using Advanced MPI: Modern Features of the Message-Passing Interface, MIT Press, 2014 References MPI Forum (standard) Cornell Virtual Workshop

What is MPI? Message Passing Interface A library of functions MPI-1 standard (1994) MPI-2 standard (1997) MPI-3 standard (2012)

MPI-1 standard MPI-1 standard (1994) Specifies the names, calling sequences, and results of subroutines and functions to be called from Fortran 77 and C, respectively. All implementations of MPI must conform to these rules, thus ensuring portability. MPI programs should compile and run on any platform that supports the MPI standard The detailed implementation of the library is left to individual vendors, who are thus free to produce optimized versions for their machines Implementations of the MPI-1 standard are available for a wide variety of platforms

MPI-2 standard Additional features not present in MPI-1 Tools for parallel I/O C++ and Fortran 90 bindings Dynamic process management

MPI-3 standard Nonblocking collective communication One-sided communication Removed the C++ bindings Added Fortran 2008 bindings

Goals The primary goals Provide source code portability MPI programs should compile and run as-is on any platform Allow efficient implementations across a range of architectures MPI also offers A great deal of functionality, including a number of different types of communication, special routines for common collective operations, and the ability to handle user-defined data types and topologies Support for heterogeneous parallel architectures

Goals – cont. Some things explicitly outside of MPI-1 Explicit shared-memory operations The precise mechanism for launching an MPI program Platform dependent Dynamic process management Included in MPI-2 Debugging Parallel I/O Included in MPI-2 Operations that require more OS support Interrupt-driven receives

Why (not) use MPI? You should use MPI when you need to Write portable parallel code Achieve high performance in parallel programming Handle a problem that involves irregular or dynamic data relationships that do not fit well into the data-parallel environment (High-Performance Fortran) You should not use MPI when you Can achieve sufficient performance and portability using a data-parallel or shared-memory approach Can use a pre-existing library of parallel routines Don’t need parallelism at all

Library calls Classes of library calls Initialize, manage, and terminate communications Communication between pairs of processes Communication operations among groups of processes Arbitrary data types

Hello world!

#include <mpi.h>                       /* include */
#include <stdio.h>

int main(int argc, char **argv)
{
    int err;                           /* returned error */
    err = MPI_Init(&argc, &argv);      /* naming convention: MPI_Xxx */
    printf("Hello world!\n");
    err = MPI_Finalize();
    return 0;
}

Initializing MPI int MPI_Init(int *argc, char ***argv); Must be called as the first MPI routine Establishes the MPI environment

Terminating MPI int MPI_Finalize(void); The last MPI routine Cleans up all MPI data structures, cancels incomplete operations Must be called by all processes Otherwise the program will appear to hang

Datatypes Variables normally declared as C/Fortran types MPI type names used as arguments in MPI routines Hides the details of representation Automatic translation between representations in a heterogeneous environment Arbitrary data types built from the basic types

Basic datatypes

MPI datatype         C type
MPI_CHAR             char (printable)
MPI_SIGNED_CHAR      signed char (integer)
MPI_UNSIGNED_CHAR    unsigned char (integer)
MPI_SHORT            signed short int
MPI_INT              signed int
MPI_LONG             signed long int
MPI_UNSIGNED_SHORT   unsigned short int
MPI_UNSIGNED         unsigned int
MPI_UNSIGNED_LONG    unsigned long int
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double

Basic datatypes – cont.

MPI datatype   C type
MPI_WCHAR      wchar_t
MPI_INT8_T     int8_t
MPI_UINT8_T    uint8_t
MPI_INT16_T    int16_t
MPI_UINT16_T   uint16_t
MPI_INT32_T    int32_t
MPI_UINT32_T   uint32_t
MPI_INT64_T    int64_t
MPI_UINT64_T   uint64_t
MPI_BYTE       (none)
MPI_PACKED     (none)

Special datatypes MPI provides several special datatypes MPI_Comm – a communicator MPI_Status – a structure with several fields of status information for MPI calls MPI_Datatype – a datatype MPI_Request – a nonblocking operation MPI_Aint – an address in memory

Communicators A communicator – a handle representing a group of processes that can communicate with one another Processes can communicate only if they share a communicator There can be many communicators A process can be a member of a number of different communicators Processes numbered sequentially (0-based) Rank of the process  Different ranks in different communicators A basic communicator MPI_COMM_WORLD All processes

Getting communicator information Rank int MPI_Comm_rank(MPI_Comm comm, int *rank); Determines the rank of the calling process in the given communicator Size int MPI_Comm_size(MPI_Comm comm, int *size); Determines the number of processes in the given communicator
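A minimal sketch of both calls (my addition, not from the original slides); it assumes the usual MPI C toolchain, e.g. compiling with mpicc and launching with mpirun:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    /* Rank of this process and total number of processes in MPI_COMM_WORLD */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}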

Point-to-point communication One process sends a message and another process receives it Active participation from the processes on both sides The source and destination processes operate asynchronously The source process may complete sending a message long before the destination process receives the message The destination process may initiate receiving a message that has not yet been sent

Message Two parts Envelope Source – the sending process Destination – the receiving process Communicator – a group of processes to which both processes belong Tag – classify messages Message body Buffer – the message data  An array datatype[count] Datatype – the type of the message data Count – the number of items of type datatype in buffer

Sending a message Blocking send int MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm); All arguments are input arguments Returns an error code Possible behaviors The message may be copied into an MPI internal buffer and transferred to its destination later The message may be left where it is, in the program’s variables, until the destination process is ready to receive it  Minimizes copying and memory use

Receiving a message Blocking receive int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status); The message envelope arguments determine what messages can be received The source wildcard MPI_ANY_SOURCE The tag wildcard MPI_ANY_TAG It is an error when the message contains more data than the receiving process is prepared to accept The sender and receiver must use the same message datatype Not checked, undefined behavior Status contains the source, the tag, and the actual count
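A minimal point-to-point sketch (my addition): rank 0 sends four ints to rank 1; the tag value 99 and the buffer size are illustrative, and at least two processes are assumed.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, data[4] = {1, 2, 3, 4};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Envelope: destination 1, tag 99, MPI_COMM_WORLD; body: 4 ints */
        MPI_Send(data, 4, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("Received %d %d %d %d from rank %d\n",
               data[0], data[1], data[2], data[3], status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}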

Status MPI_Status structure Predefined members MPI_SOURCE, MPI_TAG, MPI_ERROR int MPI_Get_count(const MPI_Status *status, MPI_Datatype datatype, int *count) Getting the number of elements in the message Datatype should be the same as in MPI_Recv, MPI_Probe, etc.
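A hedged sketch of receiving a message whose length the receiver does not know in advance: MPI_Probe fills the status, MPI_Get_count converts it into an element count, and the buffer is allocated accordingly (tag 7 and the message contents are made up; run with at least two processes).

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int msg[3] = {10, 20, 30};          /* length unknown to the receiver */
        MPI_Send(msg, 3, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int count, *buf;
        /* Probe first, then read the element count from the status */
        MPI_Probe(0, 7, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        buf = (int *)malloc(count * sizeof(int));
        MPI_Recv(buf, count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Received %d ints, last = %d\n", count, buf[count - 1]);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}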

Derived types Constructors MPI_Type_contiguous A contiguous sequence of values in memory MPI_Type_vector Several sequences evenly spaced but not consecutive in memory MPI_Type_hvector Identical to VECTOR, except the distance between successive blocks is in bytes Elements of some other type are interspersed in memory with the elements of interest MPI_Type_indexed Sequences that may vary both in length and in spacing Arbitrary parts of a single array MPI_Type_hindexed Similar to INDEXED, except that the locations are specified in bytes Arbitrary parts of arbitrary arrays, all have the same type
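A small sketch of MPI_Type_vector (my addition): one column of a row-major 4x4 matrix of doubles is 4 blocks of 1 element, with successive blocks 4 elements apart, so a single send of the derived type moves the whole column (commit and free are described two slides below; run with at least two processes).

#include <mpi.h>

int main(int argc, char **argv)
{
    double a[4][4];                 /* row-major 4x4 matrix */
    MPI_Datatype column;
    int rank, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4 blocks, 1 double per block, successive blocks 4 doubles apart */
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (i = 0; i < 4; i++)
            for (j = 0; j < 4; j++)
                a[i][j] = 10.0 * i + j;
        MPI_Send(&a[0][1], 1, column, 1, 0, MPI_COMM_WORLD);   /* column 1 */
    } else if (rank == 1) {
        MPI_Recv(&a[0][1], 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}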

Derived types Address int MPI_Address(void* location, MPI_Aint *address); The address of a location in memory General constructor int MPI_Type_struct(int count, int *array_of_blocklengths, MPI_Aint *array_of_displacements, MPI_Datatype *array_of_types, MPI_Datatype *newtype);
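A sketch of the general constructor applied to a C struct. The slide lists the MPI-1 names MPI_Address and MPI_Type_struct; the code below uses their MPI-2/MPI-3 replacements MPI_Get_address and MPI_Type_create_struct, which take the same kinds of arguments. The particle struct is purely illustrative.

#include <mpi.h>

struct particle { double x, y, z; int id; };

int main(int argc, char **argv)
{
    struct particle p = {1.0, 2.0, 3.0, 7};
    MPI_Datatype ptype;
    int          blocklens[2] = {3, 1};             /* 3 doubles, then 1 int */
    MPI_Aint     disps[2], base;
    MPI_Datatype types[2] = {MPI_DOUBLE, MPI_INT};

    MPI_Init(&argc, &argv);

    /* Member displacements relative to the start of the struct */
    MPI_Get_address(&p, &base);
    MPI_Get_address(&p.x, &disps[0]);
    MPI_Get_address(&p.id, &disps[1]);
    disps[0] -= base;
    disps[1] -= base;

    MPI_Type_create_struct(2, blocklens, disps, types, &ptype);
    MPI_Type_commit(&ptype);

    /* ptype now describes one struct particle and can be used in sends/receives */

    MPI_Type_free(&ptype);
    MPI_Finalize();
    return 0;
}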

Derived types in pictures [Figure: memory layouts of the Vector (count, blocklength, stride), Indexed (count, v_blk_len[], v_disp[]), and Struct (count, v_blk_len[], v_disp[], type[]) constructors]

Derived types Commit int MPI_Type_commit(MPI_Datatype *datatype); Must be called before using the datatype in a communication Free int MPI_Type_free(MPI_Datatype *datatype); Deallocates the datatype Any pending communication using the datatype will complete normally Datatypes derived from the freed datatype are not affected

Collective communication Communication among all processes in a group Set of collective communication routines Hide implementation details The most efficient algorithm Collective communication calls do not use tags Associated by order of program execution The programmer must ensure that all processes execute the same collective communication calls and execute them in the same order

Barrier synchronization Barrier int MPI_Barrier(MPI_Comm comm); Blocks the calling process until all processes in the group call this function Use it when some processes cannot proceed until other processes have completed their computation E.g. the master process reads the data and transmits them to the workers
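A small sketch of a typical barrier use (my addition): synchronizing all processes before and after a phase so that MPI_Wtime measures the whole group.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Nobody starts the timed phase until everybody has reached this point */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    /* ... computation or communication phase ... */

    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("Phase took %f s\n", t1 - t0);

    MPI_Finalize();
    return 0;
}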

Broadcast int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm); Broadcasts a message from the process with rank root to all processes of the group, itself included Called by all members of group using the same arguments for comm, root
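A broadcast sketch (my addition): the root supplies a parameter (here simply hard-coded) and broadcasts it; every rank calls MPI_Bcast with the same root and communicator.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 42;                 /* e.g. read from a file or the command line */

    /* After the call, every process has n == 42 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Rank %d sees n = %d\n", rank, n);

    MPI_Finalize();
    return 0;
}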

Broadcast [Figure: before, only p0 holds a; after MPI_Bcast, p0–p3 all hold a]

Reduction int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm); Collects data from each process Reduces these data to a single value using an operation Stores the reduced result on the root process A set of predefined operations or user defined MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MINLOC, MPI_MAXLOC
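A minimal reduction sketch (my addition): each process contributes its rank and rank 0 receives the sum.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Reduce one int per process with MPI_SUM onto rank 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}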

Reduction [Figure: p0–p3 hold a0–a3; after MPI_Reduce, the root additionally holds r = a0 op a1 op a2 op a3]

Gather int MPI_Gather(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm) int MPI_Gatherv(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype, int root, MPI_Comm comm) All-to-one communication The receive arguments are only meaningful to the root process Each process (including the root) sends the contents of the send buffer to the root The root process receives the messages and stores them in rank order
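A gather sketch (my addition): each process sends one int; only the root allocates the receive buffer, and the values arrive in rank order.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, myval;
    int *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    myval = 100 + rank;                           /* one value per process */
    if (rank == 0)
        all = (int *)malloc(size * sizeof(int));  /* receive buffer, root only */

    /* all[i] on the root comes from rank i */
    MPI_Gather(&myval, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("all[%d] = %d\n", i, all[i]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}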

Gather [Figure: p0–p3 hold a0–a3; after MPI_Gather, the root holds a0, a1, a2, a3 in rank order]

Allgather int MPI_Allgather(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm) int MPI_Allgatherv(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype, MPI_Comm comm) Conceptually, the data are gathered as in MPI_Gather and then broadcast to all processes No root process specified Send and receive arguments meaningful to all processes
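The same pattern with MPI_Allgather (my addition): every process needs the receive buffer and there is no root argument.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, myval;
    int *all;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    myval = 100 + rank;
    all = (int *)malloc(size * sizeof(int));   /* needed on every process */

    /* Like MPI_Gather followed by a broadcast of the result */
    MPI_Allgather(&myval, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    printf("Rank %d sees all[%d] = %d\n", rank, size - 1, all[size - 1]);
    free(all);

    MPI_Finalize();
    return 0;
}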

Allgather [Figure: p0–p3 hold a0–a3; after MPI_Allgather, every process holds a0, a1, a2, a3]

Scatter int MPI_Scatter(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm) int MPI_Scatterv(const void* sendbuf, const int sendcounts[], const int displs[], MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm) One-to-all communication Different data are sent from the root process to each process in rank order The send arguments are only meaningful to the root process
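A scatter sketch (my addition): only the root fills the send buffer; rank i receives element i.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, mine;
    int *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                    /* send buffer meaningful on root only */
        data = (int *)malloc(size * sizeof(int));
        for (int i = 0; i < size; i++)
            data[i] = 10 * i;
    }

    /* Rank i receives data[i] from the root */
    MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Rank %d got %d\n", rank, mine);

    if (rank == 0)
        free(data);

    MPI_Finalize();
    return 0;
}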

Scatter [Figure: the root p0 holds a, b, c, d; after MPI_Scatter, p0–p3 hold a, b, c, d respectively]

Alltoall int MPI_Alltoall(const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm) int MPI_Alltoallv(const void* sendbuf, const int sendcounts[], const int sdispls[], MPI_Datatype sendtype, void* recvbuf, const int recvcounts[], const int rdispls[], MPI_Datatype recvtype, MPI_Comm comm) int MPI_Alltoallw(const void* sendbuf, const int sendcounts[], const int sdispls[], const MPI_Datatype sendtypes[], void* recvbuf, const int recvcounts[], const int rdispls[], const MPI_Datatype recvtypes[], MPI_Comm comm) Scatters data to other processes, gather data from them Matrix transposition
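An all-to-all sketch (my addition): each process sends one int to every process; element j of the send buffer goes to rank j, and element i of the receive buffer comes from rank i, which is the transposition pattern in the figure below.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    int *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = (int *)malloc(size * sizeof(int));
    recvbuf = (int *)malloc(size * sizeof(int));
    for (int j = 0; j < size; j++)
        sendbuf[j] = 100 * rank + j;       /* element j is destined for rank j */

    /* recvbuf[i] ends up holding element 'rank' of process i's send buffer */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    printf("Rank %d received %d from rank 0\n", rank, recvbuf[0]);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}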

Alltoall [Figure: p0 holds a0–a3, p1 holds b0–b3, p2 holds c0–c3, p3 holds d0–d3; after MPI_Alltoall, pi holds ai, bi, ci, di (a matrix transposition)]

Other collective operations MPI_Allreduce Combines the elements of each process’s input buffer Stores the combined value in the receive buffer of all group members MPI_Scan, MPI_Exscan A prefix reduction on data distributed across the group MPI_Reduce_scatter Combines MPI_Reduce and MPI_Scatter
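A short sketch (my addition) contrasting MPI_Allreduce and MPI_Scan on one int per process: every rank gets the total, and rank i gets the inclusive prefix sum over ranks 0..i.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, total, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every process gets the sum of all ranks */
    MPI_Allreduce(&rank, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    /* Process i gets the sum of ranks 0..i */
    MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: total = %d, prefix = %d\n", rank, total, prefix);

    MPI_Finalize();
    return 0;
}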