
1 Parallel Computing—Higher-level concepts of MPI

2 MPI—Presentation Outline
Communicators, Groups, and Contexts
Collective Communication
Derived Datatypes
Virtual Topologies

3 Communicators, groups, and contexts
MPI provides a higher-level abstraction for building parallel libraries:
- Safe communication space
- Group scope for collective operations
- Process naming
Communicators + groups provide:
- Process naming (instead of IP address + ports)
- Group scope for collective operations
Contexts provide:
- Safe communication

4 What are communicators?
A data structure that contains groups (and thus processes).
Why is it useful?
- Process naming: ranks serve as names for application programmers, which is easier than IP address + ports
- Supports group (collective) communication as well as point-to-point communication
There are two types of communicators:
- Intracommunicators: communication within a group
- Intercommunicators: communication between two groups (the groups must be disjoint)

5 What are contexts?
A unique integer: an additional tag carried on messages.
Each communicator has a distinct context that provides a safe communication universe:
- A context is agreed upon by all processes when a communicator is built
Intracommunicators have two contexts:
- One for point-to-point communications
- One for collective communications
Intercommunicators also have two contexts (explained in the coming slides).

6 Intracommunicators
- Contain one group
- Allow point-to-point and collective communication between processes within this group
- Communicators can only be built from existing communicators: MPI.COMM_WORLD is the first intracommunicator to start with
- Creation of an intracommunicator is a collective operation: all processes in the existing communicator must call it for it to execute successfully
- Intracommunicators can have process topologies: Cartesian or graph

7 Creating new Intracommunicators

MPI.Init(args);
int[] incl1 = {0, 3};                               // world ranks to include in the new group
Group grp1 = MPI.COMM_WORLD.Group();                // group of all processes
Group grp2 = grp1.Incl(incl1);                      // subgroup containing ranks 0 and 3
Intracomm newComm = MPI.COMM_WORLD.Create(grp2);    // collective over MPI.COMM_WORLD
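
A brief usage sketch for the code above: since Create is collective, every process makes the call, but only the members of grp2 should subsequently use newComm. The membership test via Group.Rank() returning MPI.UNDEFINED follows the usual MPI convention and is an assumption here, as is the example payload.

// Hedged usage sketch: only members of grp2 (world ranks 0 and 3 above) use newComm.
if (grp2.Rank() != MPI.UNDEFINED) {
    int[] token = new int[1];
    if (newComm.Rank() == 0) {
        token[0] = 42;                       // rank 0 of the new communicator fills the buffer
    }
    newComm.Bcast(token, 0, 1, MPI.INT, 0);  // broadcast only within the two-process subgroup
}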

8 How do processes agree on a context for new Intracommunicators?
- Each process has a static context variable which is incremented whenever an Intracomm is created
- Each process increments this variable and sends it to all the other processes
- The maximum integer is agreed upon as the context
- An existing communicator's context is used for sending the "context agreement" messages
What about MPI.COMM_WORLD? It is safe anyway, because it is the first intracommunicator and there is no chance of conflicts.
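
The agreement step amounts to a maximum reduction over every process's proposed context value. A minimal illustrative sketch of that idea in mpiJava follows; the variable names are hypothetical, and a real implementation would exchange the values over the existing communicator's own context rather than literally calling Allreduce.

// Illustrative sketch of context agreement: each process proposes a value and
// all processes adopt the maximum.
int myNextContext = 5;                          // e.g. this process's incremented static counter
int[] proposed = new int[] { myNextContext };
int[] agreed   = new int[1];
MPI.COMM_WORLD.Allreduce(proposed, 0, agreed, 0, 1, MPI.INT, MPI.MAX);
// agreed[0] now holds the same (maximum) context value on every process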

9 Intercommunicators
- Contain two groups: a local group (the local process belongs to this group) and a remote group
- The two groups must be disjoint
- Only allow point-to-point communication
- Intercommunicators cannot have process topologies
Next slide: how to create intercommunicators.

10 Creating intercommunicators

MPI.Init(args);
int[] incl2 = {0, 2, 4, 6};                        // even world ranks
int[] incl3 = {1, 3, 5, 7};                        // odd world ranks
Group grp1 = MPI.COMM_WORLD.Group();
int rank = MPI.COMM_WORLD.Rank();
Group grp2 = grp1.Incl(incl2);
Group grp3 = grp1.Incl(incl3);
// Comm.Create is collective over MPI.COMM_WORLD, so every process calls both
Intracomm comm1 = MPI.COMM_WORLD.Create(grp2);     // intracommunicator of the even ranks
Intracomm comm2 = MPI.COMM_WORLD.Create(grp3);     // intracommunicator of the odd ranks
Intercomm icomm = null;
if (rank == 0 || rank == 2 || rank == 4 || rank == 6) {
    icomm = MPI.COMM_WORLD.Create_intercomm(comm1, 0, 1, 56);
} else {
    icomm = MPI.COMM_WORLD.Create_intercomm(comm2, 1, 0, 56);
}

11 Creating intercomms …
The arguments to the Create_intercomm method are:
- Local communicator (which contains the current process)
- local_leader (rank)
- remote_leader (rank)
- Tag for the messages sent during context selection
But if the groups are disjoint, how can they communicate? That is where a peer communicator is required: at least the local_leader and the remote_leader are part of this peer communicator. In the last figure, MPI.COMM_WORLD is the peer communicator, and processes 0 and 1 (ranks relative to MPI.COMM_WORLD) are the leaders of their respective groups.

12 Selecting contexts for intercomms
An intercommunicator has two contexts:
- send_context (used for sending messages)
- recv_context (used for receiving messages)
In an intercommunicator, processes in the local group can only send messages to the remote group.
How is the context agreed upon?
- Each group decides its own context
- The leaders (local and remote) exchange the contexts their groups agreed upon
- The greater of the two is selected as the context

13 Figure: eight processes (0–7) in COMM_WORLD, divided into Group1 and Group2, each group with its own local ranks.

14 MPI—Presentation Outline
Point to Point Communication
Communicators, Groups, and Contexts
Collective Communication
Derived Datatypes
Virtual Topologies

15 Collective communications
Provided as a convenience for application developers:
- Save significant development time
- Efficient algorithms may be used
- Stable (tested)
- Built on top of point-to-point communications
These operations include Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Scan, and Allgather, plus versions that allow displacements between the data (e.g. Scatterv, Gatherv). A usage sketch follows below.
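
As a concrete illustration of two of these operations, here is a minimal mpiJava-style sketch in which the root scatters one integer to each process and then gathers the (locally incremented) values back. The buffer names, sizes, and values are illustrative assumptions, not taken from the slides.

// Hedged sketch of Scatter and Gather (mpiJava-style signatures assumed).
int size = MPI.COMM_WORLD.Size();
int rank = MPI.COMM_WORLD.Rank();
int root = 0;

int[] sendbuf  = new int[size];       // significant only at the root
int[] myvalue  = new int[1];
int[] gathered = new int[size];       // significant only at the root

if (rank == root) {
    for (int i = 0; i < size; i++) sendbuf[i] = i * 10;   // one value per process
}

// Every process receives one int from the root's sendbuf
MPI.COMM_WORLD.Scatter(sendbuf, 0, 1, MPI.INT, myvalue, 0, 1, MPI.INT, root);

myvalue[0] += 1;                      // do some local work

// The root collects one int back from every process
MPI.COMM_WORLD.Gather(myvalue, 0, 1, MPI.INT, gathered, 0, 1, MPI.INT, root);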

16 [Image from the MPI standard document illustrating broadcast, scatter, gather, allgather, and alltoall]

17 Reduce collective operations
Predefined reduction operations: MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX, MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR, MPI.MINLOC, MPI.MAXLOC
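
A minimal sketch of a reduction in mpiJava style, assuming the standard Reduce signature; each process contributes its rank and the root ends up with the sum. The buffer names are illustrative.

// Hedged sketch: sum every process's rank at the root using MPI.SUM.
int rank = MPI.COMM_WORLD.Rank();
int root = 0;

int[] contribution = new int[] { rank };
int[] total = new int[1];             // significant only at the root

MPI.COMM_WORLD.Reduce(contribution, 0, total, 0, 1, MPI.INT, MPI.SUM, root);
if (rank == root) {
    System.out.println("Sum of ranks = " + total[0]);
}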

18 A Typical Barrier() Implementation
- Eight processes, which form a single group
- Each process exchanges an integer 4 times
- Overlaps communications well (a sketch of this style of barrier follows below)
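
To illustrate how a barrier can be built on top of point-to-point messages, here is a hedged sketch of a dissemination-style barrier in mpiJava. It is not necessarily the exact algorithm pictured on the slide (it needs log2(N) rounds rounded up, so 3 exchanges for 8 processes), but it shows the idea of pairwise integer exchanges.

// Hedged sketch of a dissemination barrier built from Sendrecv calls.
int size = MPI.COMM_WORLD.Size();
int rank = MPI.COMM_WORLD.Rank();
int[] token = new int[] { rank };
int[] dummy = new int[1];

for (int step = 1; step < size; step <<= 1) {        // rounds of 1, 2, 4, ...
    int dest = (rank + step) % size;                  // partner to notify
    int src  = (rank - step + size) % size;           // partner to wait for
    MPI.COMM_WORLD.Sendrecv(token, 0, 1, MPI.INT, dest, 99,
                            dummy, 0, 1, MPI.INT, src, 99);
}
// When the loop finishes, every process has (transitively) heard from all others.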

19 Intracomm.Bcast( … )
Sends data from one process to all the other processes (a usage sketch follows below).
Code from adlib, a communication library for HPJava; its current implementation is based on an n-ary tree:
- Limitation: broadcasts only from rank = 0
- Tree generated dynamically
- Cost: O(log2(N))
MPICH 1.2.5 uses a linear algorithm: cost O(N). MPICH2 has much-improved algorithms. LAM/MPI uses n-ary trees (limitation: broadcast from rank = 0).
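
A minimal usage sketch of Bcast in mpiJava style; the buffer contents are illustrative.

// Hedged sketch: rank 0 broadcasts a small integer array to every process.
int rank = MPI.COMM_WORLD.Rank();
int[] data = new int[4];

if (rank == 0) {
    for (int i = 0; i < data.length; i++) data[i] = i + 1;   // root fills the buffer
}

// After this call, every process's data array holds {1, 2, 3, 4}.
MPI.COMM_WORLD.Bcast(data, 0, data.length, MPI.INT, 0);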

20 A Typical Broadcast Implementation

21 MPI—Presentation Outline
Point to Point Communication
Communicators, Groups, and Contexts
Collective Communication
Derived Datatypes
Virtual Topologies

22 MPI Datatypes
What kind (type) of data can be sent using MPI messaging? Basically two types:
- Basic (primitive) datatypes
- Derived datatypes

23 MPI Basic Datatypes
MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED_CHAR, MPI_UNSIGNED_SHORT, MPI_UNSIGNED_LONG, MPI_UNSIGNED, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_BYTE

24 Derived Datatypes
Besides the basic datatypes, it is possible to communicate heterogeneous and non-contiguous data using derived datatypes (a construction sketch follows below):
- Contiguous
- Indexed
- Vector
- Struct
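
As an illustration, here is a hedged mpiJava-style sketch that builds a vector derived datatype describing one column of a 4x4 double matrix stored row-major in a flat array, then transfers that column between two processes. The method names (Vector on the old datatype, Commit) follow the mpiJava binding as I recall it, and the buffer layout and tag are assumptions.

// Hedged sketch: a derived "vector" datatype picking out one column of a
// 4x4 row-major matrix (4 blocks of 1 element each, stride 4).
int n = 4;
double[] matrix = new double[n * n];           // row-major storage

Datatype column = MPI.DOUBLE.Vector(n, 1, n);  // count, blocklength, stride
column.Commit();                               // must commit before use

int rank = MPI.COMM_WORLD.Rank();
if (rank == 0) {
    // send column 2 (its first element sits at offset 2) to process 1, tag 7
    MPI.COMM_WORLD.Send(matrix, 2, 1, column, 1, 7);
} else if (rank == 1) {
    MPI.COMM_WORLD.Recv(matrix, 2, 1, column, 0, 7);
}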

25 MPI—Presentation Outline
Point to Point Communication
Communicators, Groups, and Contexts
Collective Communication
Derived Datatypes
Virtual Topologies

26 Virtual topologies
- Used to arrange processes in a geometric shape
- Virtual topologies have no necessary connection with the physical layout of the machines, although an implementation is free to exploit the underlying machine architecture
- Virtual topologies can be assigned to the processes in an Intracommunicator
MPI provides:
- Cartesian topology
- Graph topology

27 Cartesian topology: mapping four processes onto a 2x2 grid
Each process is assigned a coordinate:
- Rank 0: (0,0)
- Rank 1: (1,0)
- Rank 2: (0,1)
- Rank 3: (1,1)
Uses:
- Calculate a rank from a grid position
- Calculate a grid position from a rank
- Easier to locate the ranks of neighbours
Applications may have communication patterns with lots of messaging between immediate neighbours. A creation sketch follows below.
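
A sketch of creating and querying a 2x2 Cartesian topology. The method and field names used here (Create_cart, Coords, Rank(int[]), Shift, ShiftParms.rank_source/rank_dest) are my recollection of the mpiJava binding and should be treated as assumptions rather than confirmed API.

// Hedged sketch: 2x2 Cartesian topology over 4 processes.
int[] dims = {2, 2};
boolean[] periods = {false, true};   // axis 0 non-periodic, axis 1 periodic (see next slide)
boolean reorder = false;

Cartcomm cart = MPI.COMM_WORLD.Create_cart(dims, periods, reorder);

int myRank = cart.Rank();
int[] myCoords = cart.Coords(myRank);          // grid position from rank
int neighbour = cart.Rank(new int[] {1, 0});   // rank from grid position

// Neighbours along the periodic axis 1: wraps around at the edges.
ShiftParms shift = cart.Shift(1, 1);
int up = shift.rank_dest;
int down = shift.rank_source;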

28 Periods in Cartesian topology
- Axis 1 (y-axis) is periodic: processes in the top and bottom rows have valid neighbours towards the top and bottom respectively (the grid wraps around)
- Axis 0 (x-axis) is non-periodic: processes in the right and left columns have undefined neighbours towards the right and left respectively

29 Graph topology

30 Doing Matrix Multiplication using MPI
Just to give you an idea of how MPI-based applications are designed …

31 Basically how it works!
[1 0 2]   [0 1 0]   [2 3 2]
[2 1 0] x [0 0 1] = [0 2 1]
[0 2 2]   [1 1 1]   [2 2 4]

32 Matrix Multiplication M x N

..
int rank = MPI.COMM_WORLD.Rank();
int size = MPI.COMM_WORLD.Size();
if (master_mpi_process) {
    initialize matrices M and N
    for (int i = 1; i < size; i++) {
        send rows of matrix M to process `i'
    }
    broadcast matrix N to all non-zero processes
    for (int i = 1; i < size; i++) {
        receive rows of the resultant matrix from process `i'
    }
    ..
    print results
    ..
} else {
    receive rows of matrix M
    call broadcast to receive matrix N
    compute the matrix multiplication for the sub-matrix (done in parallel)
    send the resultant rows back to the master process
}
..
A more concrete, runnable sketch of this structure follows below.
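
Here is a hedged, minimal mpiJava sketch of the master/worker structure outlined above. The matrix sizes, the identity-matrix contents of N, the tags, and the assumption that the rows of M divide evenly among the workers are all illustrative choices, not taken from the slides.

import mpi.*;

public class MatMult {
    static final int ROWS = 4, COLS = 4;      // small illustrative sizes

    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();
        int workers = size - 1;               // assumes at least one worker
        int myRows = ROWS / workers;          // assumes ROWS % workers == 0

        double[] N = new double[COLS * COLS];            // full matrix N on every process
        double[] mRows = new double[myRows * COLS];      // a strip of M
        double[] cRows = new double[myRows * COLS];      // corresponding strip of the result

        if (rank == 0) {
            double[] M = new double[ROWS * COLS];
            for (int i = 0; i < M.length; i++) M[i] = i;                            // initialize M
            for (int i = 0; i < N.length; i++) N[i] = (i % COLS == i / COLS) ? 1 : 0; // N = identity

            for (int w = 1; w < size; w++)                          // send strips of M to workers
                MPI.COMM_WORLD.Send(M, (w - 1) * myRows * COLS, myRows * COLS, MPI.DOUBLE, w, 10);

            MPI.COMM_WORLD.Bcast(N, 0, N.length, MPI.DOUBLE, 0);    // broadcast N

            double[] C = new double[ROWS * COLS];
            for (int w = 1; w < size; w++)                          // collect result strips
                MPI.COMM_WORLD.Recv(C, (w - 1) * myRows * COLS, myRows * COLS, MPI.DOUBLE, w, 20);

            System.out.println("C[0][0] = " + C[0]);
        } else {
            MPI.COMM_WORLD.Recv(mRows, 0, mRows.length, MPI.DOUBLE, 0, 10);
            MPI.COMM_WORLD.Bcast(N, 0, N.length, MPI.DOUBLE, 0);

            for (int i = 0; i < myRows; i++)                        // local multiply of the strip
                for (int j = 0; j < COLS; j++) {
                    double sum = 0;
                    for (int k = 0; k < COLS; k++)
                        sum += mRows[i * COLS + k] * N[k * COLS + j];
                    cRows[i * COLS + j] = sum;
                }

            MPI.COMM_WORLD.Send(cRows, 0, cRows.length, MPI.DOUBLE, 0, 20);
        }
        MPI.Finalize();
    }
}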

