Lecture 2: Part II Message Passing Programming: MPI

- Introduction to MPI
- MPI programming
- Running an MPI program
- Architecture of MPICH

Message Passing Interface (MPI)

What is MPI?
- A message-passing library specification
  - message-passing model
  - not a compiler specification
  - not a specific product
- For parallel computers, clusters and heterogeneous networks
- Full-featured

Why use MPI? (1)
Message passing is now mature as a programming paradigm:
- well understood
- efficient match to hardware
- many applications

Why use MPI? (2)
Full range of desired features:
- modularity
- access to peak performance
- portability
- heterogeneity
- subgroups
- topologies
- performance measurement tools

Who designed MPI?
- Vendors: IBM, Intel, TMC, SGI, Meiko, Cray, Convex, Ncube, ...
- Library writers: PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, DP (HKU), PM (Japan), AM (Berkeley), FM (HPVM at Illinois)
- Application specialists and consultants

Vendor-Supported MPI
- HP-MPI: Hewlett Packard; Convex SPP
- MPI-F: IBM SP1/SP2
- Hitachi/MPI: Hitachi
- SGI/MPI: SGI PowerChallenge series
- MPI/DE: NEC
- INTEL/MPI: Intel Paragon (iCC lib)
- T.MPI: Telmat Multinode
- Fujitsu/MPI: Fujitsu AP1000
- EPCC/MPI: Cray & EPCC, T3D/T3E

Cho-Li Wang

Public-Domain MPI
- MPICH: Argonne National Lab. & Mississippi State Univ.
- LAM: Ohio Supercomputer Center
- MPICH/NT: Mississippi State University
- MPI-FM: Illinois (Myrinet)
- MPI-AM: UC Berkeley (Myrinet)
- MPI-PM: RWCP, Japan (Myrinet)
- MPI-CCL: California Institute of Technology

Public-Domain MPI (continued)
- CRI/EPCC MPI: Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E)
- MPI-AP: Australian National University, CAP Research Program (AP1000)
- W32MPI: Illinois, Concurrent Systems
- RACE-MPI: Hughes Aircraft Co.
- MPI-BIP: INRIA, France (Myrinet)

Communicator Concept in MPI
A communicator identifies the process group and the context with respect to which a communication operation is to be performed.

Communicator (2)
- The same process can exist in different communicators.
- Processes that do not share a communicator cannot communicate with each other.
[Figure: four communicators, one of them nested within another, each containing several processes]

Features of MPI (1)
General
- Communicators combine context and group for message security
- Predefined communicator: MPI_COMM_WORLD

Features of MPI (2)
Point-to-point communication
- Structured buffers and derived data types, heterogeneity
- Modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered

Features of MPI (3)
Collective communication
- Both built-in and user-defined collective operations
- Large number of data movement routines
- Subgroups defined directly or by topology
- E.g., broadcast, barrier, reduce, scatter, gather, all-to-all, ...

MPI Programming

Writing MPI programs MPI comprises 125 functions
Many parallel programs can be written with just 6 basic functions

Six basic functions (1)
- MPI_INIT: initiate an MPI computation
    int MPI_Init(int *argc, char ***argv)
- MPI_FINALIZE: terminate a computation
    int MPI_Finalize(void)

Six basic functions (2)
- MPI_COMM_SIZE: determine the number of processes in a communicator
- MPI_COMM_RANK: determine the identifier (rank) of a process in a specific communicator

    int MPI_Comm_size(MPI_Comm comm, int *size)
    int MPI_Comm_rank(MPI_Comm comm, int *rank)

Six basic functions (3)
- MPI_SEND: send a message from one process to another process
- MPI_RECV: receive a message sent by another process

    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- tag distinguishes different types of messages
- dest is a rank in comm

    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

A simple program

    Program main
    begin
        MPI_INIT()                               Initiate computation
        MPI_COMM_SIZE(MPI_COMM_WORLD, count)     Find the number of processes
        MPI_COMM_RANK(MPI_COMM_WORLD, myid)      Find the process ID of the current process
        print("I am ", myid, " of ", count)      Each process prints its output
        MPI_FINALIZE()                           Shut down
    end

Result (4 processes; the order of the lines is not deterministic)
    I am 3 of 4
    I am 1 of 4
    I am 0 of 4
    I am 2 of 4

Point-to-Point Communication
The basic point-to-point communication operators are send and receive.
[Figure: the sender's buffer is sent, transmitted, and received into the receiver's buffer]

Another simple program (2 nodes)

    …..
    MPI_COMM_RANK(MPI_COMM_WORLD, myid)
    if myid=0                               I'm process 0!
        MPI_SEND("Zero",…,…,1,…,…)
        MPI_RECV(words,…,…,1,…,…,…)
    else                                    I'm process 1!
        MPI_RECV(words,…,…,0,…,…,…)
        MPI_SEND("One",…,…,0,…,…)
    END IF
    print("Received from ",words)
    ……

Process 0:
1. MPI_SEND("Zero",…,…,1,…,…): send "Zero" to process 1
2. MPI_RECV(words,…,…,1,…,…,…): set up a buffer and wait for the message from process 1
3. "One" is received into words (buffer)
4. print("Received from ",words)

Process 1:
1. MPI_RECV(words,…,…,0,…,…,…): set up a buffer and wait for the message from process 0
2. "Zero" is received into words (buffer)
3. MPI_SEND("One",…,…,0,…,…): send "One" to process 0
4. print("Received from ",words)

Result
- Process 0: Received from One
- Process 1: Received from Zero

Collective Communication (1)
Communication that involves a group of processes.
[Figure: one sender's buffer is sent and transmitted to the buffers of several receivers]

Collective Communication (2)
Three types:
- Barrier: MPI_BARRIER
- Data movement: MPI_BCAST, MPI_GATHER, MPI_SCATTER
- Reduction operations: MPI_REDUCE

Barrier: MPI_BARRIER
Used to synchronize the execution of a group of processes.
[Figure: processes that reach the barrier early wait ("Wait for us!", "We can't go on!"); once all processes have arrived ("We're together!"), the barrier disappears and they all continue ("Let's go!")]

    int MPI_Barrier(MPI_Comm comm)

    int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

Data movement (1) MPI_BCAST
One single process sends the same data to all other processes, itself included.
[Figure: process 0 holds FACE; after BCAST, processes 0 to 3 all hold FACE]

Data movement (2) MPI_GATHER
Every process (including the root process) sends its own data to one process, the root, which stores the pieces in rank order.
[Figure: processes 0 to 3 hold F, A, C and E; after GATHER, the root holds FACE]

    int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

Examples
1) Gather 100 ints from every process in the group to the root

    MPI_Comm comm;
    int root, myrank, *rbuf, gsize, sendbuf[100];
    … …
    MPI_Comm_size(comm, &gsize);
    rbuf = (int *) malloc(gsize*100*sizeof(int));    /* in C++: rbuf = new int[gsize*100] */
    MPI_Gather(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

[Figure: rbuf at the root holds consecutive blocks of 100 ints, one block per process]

2) The same gather, with the receive buffer allocated only at the root

    MPI_Comm comm;
    int root, myrank, *rbuf, gsize, sendbuf[100];
    … …
    MPI_Comm_rank(comm, &myrank);
    if (myrank == root) {
        MPI_Comm_size(comm, &gsize);
        rbuf = (int *) malloc(gsize*100*sizeof(int));
    }
    MPI_Gather(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Data movement (3) MPI_SCATTER
A process sends out a message, which is split into several equal parts, and the i-th portion is sent to the i-th process.
[Figure: the root holds FACE; after SCATTER, processes 0 to 3 hold F, A, C and E]

    int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)

Data movement (4) MPI_REDUCE (e.g., find maximum value)
Combines the values of each process, using a specified operation, and returns the combined value to one process.
[Figure: processes 0 to 3 hold 8, 9, 3 and 7; after REDUCE with max, the result 9 is returned to one process]

    int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

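Predefined operations
- MPI_MAX: maximum
- MPI_MIN: minimum
- MPI_SUM: sum
- MPI_PROD: product
- MPI_LAND: logical and
- MPI_BAND: bit-wise and
- MPI_LOR: logical or
- MPI_BOR: bit-wise or
- MPI_LXOR: logical exclusive or
- MPI_BXOR: bit-wise exclusive or
- MPI_MAXLOC: maximum value and its location
- MPI_MINLOC: minimum value and its location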

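Example program (1) Calculating the value of π by numerical integration: the areas of n intervals are summed. [The formula shown on the slide is not preserved in the transcript.]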

Example program (2)

    ……
    MPI_BCAST(numprocs, …, …, 0, …)                  Broadcast the number of processes
    for (i = myid + 1; i <= n; i += numprocs)        Each process calculates its specified areas
        compute the area for each interval
        accumulate the result in the process's program data (sum)
    MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)         Sum up all the areas
    if (myid == 0)                                   Print the result
        Output result

=3.141... Start calculation! OK! OK! Calculated by process 0

MPICH - A Portable Implementation of MPI
Argonne National Laboratory

What is MPICH???
- The first complete and portable implementation of the full MPI standard.
- 'CH' stands for "Chameleon", a symbol of adaptability and portability.
- It contains a programming environment for working with MPI programs.
- It includes a portable startup mechanism and libraries.

How can I install it???
- Unpack the package mpich.tar.gz into a directory.
- Use './configure' and 'make >& make.log' to choose the appropriate architecture and device and to compile the files.
- Syntax: ./configure -device=DEVICE -arch=ARCH_TYPE
  - ARCH_TYPE: specifies the type of machine to be configured
  - DEVICE: specifies what kind of communication device the system will use
    - ch_p4 (TCP/IP)

How to run an MPI Program
- Edit mpich/util/machines/machines.XXXX to contain the names of the machines of architecture XXXX.
- The file should list the machine names in the following format:
    mercury
    venus
    earth
    mars
[Figure: four computers named mercury, venus, mars and earth]

- Include "mpi.h" in the source program.
- Compile the program with the command 'mpicc', e.g. mpicc -c foo.c
- Use 'mpirun' to run an MPI program. mpirun determines the environment in which the program runs.

- mpirun -np 4 a.out
  Runs a.out on four processors (for massively parallel processors).
- mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program
  Runs a program on 2 sun4s and 3 rs6000s, with the local machine being a sun4 (multiple architectures).

MPIRUN (1) How do you start an MPI program? Use mpirun. Example:
#mpirun -np 4 cpi
This starts four processes of cpi.

MPIRUN (2) What does mpirun do?
1. Read the arguments to specify the environment of the MPI program:
   i) how many processes should be started
   ii) on which machines the MPI program will be started
   iii) which device will be used (e.g. ch_p4)
2. Split the processes among the machines on which they will run
3. Record the result of the split in the PI???? file

MPIRUN (3) Example
Suppose the ch_p4 device is used: #mpirun -np 4 cpi
1. mpirun knows that 4 processes need to be started
2. mpirun reads the machines file to find the machines on which they can run
3. The ch_p4 device will be used if no device argument is given in the command

MPIRUN (4)
4. Split the tasks and save them in the PI???? file. File format:
    <hostname> <no. of proc.> <program>
    genius.cs.hku.hk cpi
    eagle.cs.hku.hk cpi
    dragon.cs.hku.hk cpi
    virtue.cs.hku.hk cpi
5. Start the processes on the remote machines by using "rsh"

Architecture of MPICH

Structure of MPICH
[Figure: layered structure, from top to bottom]
- MPI portable API library
- MPICH Abstract Device Interface
- MPICH Channel Interface
- Low-level layers: sockets (TCP/IP), shared memory, vendor design

MPICH - Abstract Device Interface
- Interface between the high-level MPI library and the low-level device.
- Manages message packaging and buffering policies, and handles heterogeneous communication.
- 4 sets of functions:
  1. Specify the send or receive of a message.
  2. Move data between the API and the hardware.
  3. Manage lists of pending messages.
  4. Provide information about the execution environment.

MPICH - The Channel Interface (1)
- The interface transfers data from one process's address space to another's.
- Information is divided into two parts: the message envelope and the data.
- It includes five functions:
  - envelope information: MPID_SendControl, MPID_RecvAnyControl, MPID_ControlMsgAvail
  - data information: MPID_SendChannel, MPID_RecvFromChannel

MPICH - The Channel Interface (2)
- The Channel Interface selects a data exchange mechanism according to the size of the message.
- Data exchange mechanisms implemented: Short, Eager, Rendezvous, Get

Protocol - Short
- Used for the shortest messages.
- The data is delivered within the message envelope.

[Figure: Short Protocol Data Transfer. Control messages carrying the data reach the receiver and are stored in a buffer until MPI_Recv is called.]

Protocol - Eager
- Data is sent to the destination immediately.
- The receiver must allocate some space to store the data locally.
- It is the default choice in MPICH.
- It is not suitable for transferring large amounts of data.

[Figure: Eager Protocol Data Transfer. The sender sends MPI_Control messages and data (Data1 to Data4) immediately; the receiver saves them in a buffer until the MPI_Recv calls arrive, and the buffer can become full.]

Protocol - Rendezvous
- Data is sent to the destination only when requested.
- To use it, add -use_rndv to the './configure' command.
- No buffering is required.

[Figure: Rendezvous Protocol Data Transfer. The sender sends MPI_Control and waits; once the receiver calls MPI_Recv and a match is found, the receiver sends MPI_Request and the data is then transferred.]

Protocol - Get
- In this protocol, data is read directly by the receiver.
- Data is transferred directly from one process's memory to another's.
- Highest performance.
- Requires shared memory or remote memory operations.

[Figure: Get Protocol Data Transfer. The receiver wants to get data from the sender: it directly accesses the sender's shared memory and copies the data from there into its own memory.]

Conclusion

MPI-1.1 (June 95)
MPI 1.1 doesn't provide:
- process management
- remote memory transfers
- active messages
- threads
- virtual shared memory

MPI-2 (July 97)
Extensions to MPI:
- process creation and management
- one-sided communications
- extended collective operations
- external interface
- I/O
- additional language bindings
