CSS490 Group Communication and MPI (Textbook Ch3), Winter 2004
Instructor: Munehiro Fukuda


1 CSS490 Group Communication and MPI (Winter, 2004)
  Textbook Ch3
  Instructor: Munehiro Fukuda
  These slides were compiled from the course textbook, the reference books, and the instructor's original materials.

2 Group Communication
  Communication types:
    - One-to-many: broadcast
    - Many-to-one: synchronization, collective communication
    - Many-to-many: gather and scatter
  Group addressing:
    - Using a special network address: IP Class D and UDP
    - Emulating a broadcast with one-to-one communication:
        - Performance drawback on bus-type networks
        - Simpler for switching-based networks
  Semantics:
    - Send-to-all, bulletin-board semantics
    - 0-, 1-, m-out-of-n, all-reliable

3 Atomic Multicast
  Send-to-all semantics and all-reliable
  Simple emulation:
    - A repetition of one-to-one communication with acknowledgment
  What if a receiver fails?
    - Time-out retransmission
  What if a sender fails before all receivers receive a message?
    - All receivers forward the message to the same group.
    - A receiver discards the second and any later copies of the message.
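The forward-and-discard rule above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not code from the course: the names Receiver and multicastWithForwarding are hypothetical, the network is modelled as an in-memory queue, and acknowledgments and time-outs are left out.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// One group member (hypothetical type). It remembers the message ids it has
// already seen so that the second and later copies can be discarded.
struct Receiver {
    std::set<int> seen;          // ids of messages already delivered
    std::vector<int> delivered;  // delivery log, in delivery order
};

// Simulates a multicast of message `msgId` whose sender crashed after
// reaching only the receivers listed in `reached`. Every receiver that gets
// a first copy delivers it and forwards it to the rest of the group.
// Returns the group state once no copies are left in flight.
std::vector<Receiver> multicastWithForwarding(std::size_t groupSize,
                                              const std::vector<std::size_t>& reached,
                                              int msgId) {
    std::vector<Receiver> group(groupSize);
    std::queue<std::pair<std::size_t, int> > inFlight;  // (destination, message id)
    for (std::size_t r : reached)
        if (r < groupSize) inFlight.push(std::make_pair(r, msgId));

    while (!inFlight.empty()) {
        const std::pair<std::size_t, int> copy = inFlight.front();
        inFlight.pop();
        Receiver& rcv = group[copy.first];
        if (!rcv.seen.insert(copy.second).second) continue;  // duplicate: discard
        rcv.delivered.push_back(copy.second);                // first copy: deliver
        for (std::size_t other = 0; other < groupSize; ++other)
            if (other != copy.first)
                inFlight.push(std::make_pair(other, copy.second));  // forward to the group
    }
    return group;
}
```

In this model, if the sender reaches at least one receiver before failing, every member ends up delivering the message exactly once; if it reaches none, nobody delivers it.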

4 Message Ordering
  R1 and R2 receive m1 and m2 in a different order!
  Some message ordering is required:
    - Absolute ordering
    - Consistent ordering
    - Causal ordering
    - FIFO ordering
  [Figure: senders S1 and S2, receivers R1 and R2, messages m1 and m2]

5 Absolute Ordering
  Rule: mi must be delivered before mj if Ti < Tj
  Implementation:
    - A clock synchronized among machines
    - A sliding time window: delivery is committed for messages whose timestamp falls in this window
  Example: distributed simulation
  Drawbacks:
    - Too strict a constraint
    - No absolutely synchronized clock
    - No guarantee to catch all tardy messages
  [Figure: messages mi and mj with timestamps Ti < Tj]
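One possible reading of the sliding-window idea is sketched below. The slide does not spell out the mechanism, so the details are assumptions: a receiver buffers messages, commits delivery of everything older than "now minus window" in timestamp order, and rejects a tardy message whose timestamp has already been committed. Msg, LaterTimestamp, and WindowedDelivery are hypothetical names.

```cpp
#include <limits>
#include <queue>
#include <string>
#include <vector>

struct Msg {
    long timestamp;    // Ti, taken from the (assumed) synchronized clock
    std::string body;
};

// Comparator that puts the message with the smallest timestamp on top.
struct LaterTimestamp {
    bool operator()(const Msg& a, const Msg& b) const { return a.timestamp > b.timestamp; }
};

class WindowedDelivery {
public:
    explicit WindowedDelivery(long window)
        : window_(window), committedUpTo_(std::numeric_limits<long>::min()) {}

    // Buffers a message. Returns false for a tardy message, i.e. one whose
    // timestamp is not later than a point that has already been committed.
    bool receive(const Msg& m) {
        if (m.timestamp <= committedUpTo_) return false;
        pending_.push(m);
        return true;
    }

    // Commits, in timestamp order, every buffered message with
    // timestamp <= now - window.
    std::vector<Msg> commit(long now) {
        const long cutoff = now - window_;
        std::vector<Msg> out;
        while (!pending_.empty() && pending_.top().timestamp <= cutoff) {
            out.push_back(pending_.top());
            pending_.pop();
        }
        if (cutoff > committedUpTo_) committedUpTo_ = cutoff;
        return out;
    }

private:
    long window_;
    long committedUpTo_;
    std::priority_queue<Msg, std::vector<Msg>, LaterTimestamp> pending_;
};
```

A wider window catches more late arrivals but delays every delivery, which is one way to see why tardy messages cannot all be caught.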

6 Consistent Ordering
  Rule: messages are received in the same order at every receiver (regardless of their timestamps).
  Implementation:
    - A message is sent to a sequencer, assigned a sequence number, and finally multicast to the receivers
    - A receiver retrieves messages in increasing sequence-number order
  Example: replicated database updates
  Drawback: a centralized algorithm
  [Figure: messages mi and mj with timestamps Ti < Tj]
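The sequencer scheme can be sketched as follows. This is a minimal in-memory illustration, assuming a single sequencer that numbers messages from 0 and receivers that hold back anything arriving ahead of its turn; Sequenced, Sequencer, and OrderedReceiver are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct Sequenced {
    unsigned long seq;   // global sequence number assigned by the sequencer
    std::string body;
};

// The centralized sequencer: stamps each message with the next number.
class Sequencer {
public:
    Sequenced stamp(const std::string& body) {
        Sequenced s = {next_, body};
        ++next_;
        return s;
    }
private:
    unsigned long next_ = 0;
};

// A receiver that delivers strictly in sequence-number order.
class OrderedReceiver {
public:
    void receive(const Sequenced& m) {
        if (m.seq < expected_) return;      // already delivered: ignore
        holdBack_[m.seq] = m.body;          // park until its turn comes
        std::map<unsigned long, std::string>::iterator it;
        while ((it = holdBack_.find(expected_)) != holdBack_.end()) {
            delivered_.push_back(it->second);
            holdBack_.erase(it);
            ++expected_;
        }
    }
    const std::vector<std::string>& delivered() const { return delivered_; }
private:
    unsigned long expected_ = 0;
    std::map<unsigned long, std::string> holdBack_;
    std::vector<std::string> delivered_;
};
```

Two receivers that see the stamped messages arrive in different network orders still produce identical delivery logs, at the cost of routing every message through one process.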

7 Causal Ordering
  Rule: happened-before relation
    - If $e_i^k, e_i^l \in h$ and $k < l$, then $e_i^k \to e_i^l$
    - If $e_i = \mathrm{send}(m)$ and $e_j = \mathrm{receive}(m)$, then $e_i \to e_j$
    - If $e \to e'$ and $e' \to e''$, then $e \to e''$
  Implementation: use of a vector message
  Example: distributed file system
  Drawbacks:
    - Vector as an overhead
    - Broadcast assumed
  [Figure: senders S1 and S2, receivers R1, R2, and R3, messages m1 to m4; from R2's viewpoint, m1 → m2]

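8 Vector Message
  A message carrying vector S from source i is delivered at a receiver holding vector R when:
    - S[i] = R[i] + 1, where i is the source id
    - S[j] ≤ R[j] for every j ≠ i
  Otherwise the message is delayed.
  [Figure: Sites A to D with vectors (2,1,1,0), (1,1,1,0), and (2,1,0,0) and a message vector (3,1,1,0); receivers are marked delivered or delayed]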
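The two conditions translate directly into a small check. The sketch below assumes that S is the vector carried by the message, R is the receiver's current vector, and sites are indexed from 0; canDeliver is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns true when a message stamped with vector S, sent by process
// `source`, may be delivered at a receiver whose current vector is R.
bool canDeliver(const std::vector<int>& S, const std::vector<int>& R, std::size_t source) {
    if (S.size() != R.size() || source >= S.size()) return false;
    for (std::size_t j = 0; j < S.size(); ++j) {
        if (j == source) {
            if (S[j] != R[j] + 1) return false;  // must be the next message from the source
        } else if (S[j] > R[j]) {
            return false;                        // sender has seen something the receiver has not
        }
    }
    return true;
}
```

Applied to the numbers on the slide, and assuming the sender owns the first component: S = (3,1,1,0) passes against R = (2,1,1,0), fails against R = (1,1,1,0) because an earlier message from the source is still missing, and fails against R = (2,1,0,0) because the receiver has not yet seen a message from the third site.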

9 FIFO Ordering
  Rule: messages are received in the same order as they were sent.
  Implementation: messages are assigned a sequence number
  Example: TCP
  This is the weakest ordering.
  [Figure: sender S, receiver R, Router 1 and Router 2, messages m1 to m4]
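Unlike consistent ordering, FIFO ordering only needs a counter per sender. A minimal sketch, assuming each sender numbers its own messages from 0 and the receiver holds back anything that arrives early (FifoReceiver is a hypothetical name):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

class FifoReceiver {
public:
    // Accepts message number `seq` from `sender`; delivers it, and any
    // held-back successors from the same sender, once it is next in line.
    void receive(int sender, unsigned seq, const std::string& body) {
        unsigned& next = nextSeq_[sender];   // value-initialized to 0 on first use
        if (seq < next) return;              // old duplicate: ignore
        holdBack_[std::make_pair(sender, seq)] = body;
        std::map<std::pair<int, unsigned>, std::string>::iterator it;
        while ((it = holdBack_.find(std::make_pair(sender, next))) != holdBack_.end()) {
            delivered_.push_back(it->second);
            holdBack_.erase(it);
            ++next;
        }
    }
    const std::vector<std::string>& delivered() const { return delivered_; }
private:
    std::map<int, unsigned> nextSeq_;                            // next expected number per sender
    std::map<std::pair<int, unsigned>, std::string> holdBack_;   // early arrivals
    std::vector<std::string> delivered_;
};
```

Messages from different senders are never held back for each other here, which is why this ordering is weaker than the others.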

10 Why High-Level Message Passing Tools?
  Data formatting
    - Data formatted into appropriate types at user level
  Non-blocking communication
    - Polling and interrupt handled at system call level
  Process addressing
    - Inflexible hardwired addressing with machine id + local id
  Group communication
    - Group server implemented at user level
    - Broadcasting simulated by a repetition of one-to-one communication

11 PVM and MPI
  PVM: Parallel Virtual Machine
    - Developed in the 80's
    - The pioneer library to provide high-level message passing functions
    - The PVM daemon process takes care of message transfer for user processes in the background
  MPI: Message Passing Interface
    - Defined in the 90's
    - The specification of high-level message passing functions
    - Several implementations available: MPICH, LAM/MPI
    - Library functions directly linked to user programs (no background daemons)
  The detailed difference is shown by: PVMvsMPI.ps

12 Getting Started with MPI
  Website:
  Creating a hostfile:
      mfukuda]$ vi hosts
      uw
      uw
      uw
      uw
  Compile a source program:
      mfukuda]$ mpiCC source.cpp -o myProg
  Run the executable file:
      mfukuda]$ mpirun -np 4 myProg args

13 Program Using MPI
    #include <iostream>
    #include "mpi++.h"
    using namespace std;

    int main(int argc, char *argv[]) {
        MPI::Init(argc, argv);                   // Start MPI computation
        int rank = MPI::COMM_WORLD.Get_rank();   // Process ID (from 0 to #processes - 1)
        int size = MPI::COMM_WORLD.Get_size();   // # participating processes
        cout << "Hello World! I am " << rank << " of " << size << endl;
        MPI::Finalize();                         // Finish MPI computation
        return 0;
    }

14 MPI_Send and MPI_Recv
    void MPI::COMM_WORLD.Send(
        void*          message,    /* in */
        int            count,      /* in */
        MPI::Datatype  datatype,   /* in */
        int            dest,       /* in */
        int            tag)        /* in */

    void MPI::COMM_WORLD.Recv(
        void*          message,    /* out */
        int            count,      /* in */
        MPI::Datatype  datatype,   /* in */
        int            source,     /* in; MPI::ANY_SOURCE accepts any sender */
        int            tag,        /* in */
        MPI::Status&   status)     /* out; can be omitted */

  MPI::Datatype = CHAR, SHORT, INT, LONG, UNSIGNED_CHAR, UNSIGNED_SHORT, UNSIGNED, UNSIGNED_LONG, FLOAT, DOUBLE, LONG_DOUBLE, BYTE, PACKED (each prefixed with MPI::)
  MPI::Status reports the source, tag, and error code of the received message: Get_source(), Get_tag(), Get_error()

15 MPI_Send and MPI_Recv
    #include <iostream>
    #include "mpi++.h"
    using namespace std;

    int main(int argc, char *argv[]) {
        int tag0 = 0;
        MPI::Init(argc, argv);                    // Start MPI computation
        int rank = MPI::COMM_WORLD.Get_rank();
        if (rank == 0) {                          // rank 0: sender
            int loop = 3;
            // 13 = the 12 characters of the text plus the terminating '\0'
            MPI::COMM_WORLD.Send("Hello World!", 13, MPI::CHAR, 1, tag0);
            MPI::COMM_WORLD.Send(&loop, 1, MPI::INT, 1, tag0);
        } else if (rank == 1) {                   // rank 1: receiver
            int loop;
            char msg[13];
            MPI::COMM_WORLD.Recv(msg, 13, MPI::CHAR, 0, tag0);
            MPI::COMM_WORLD.Recv(&loop, 1, MPI::INT, 0, tag0);
            for (int i = 0; i < loop; i++)
                cout << msg << endl;
        }                                         // other ranks neither send nor receive
        MPI::Finalize();                          // Finish MPI computation
        return 0;
    }

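16 Message Ordering in MPI
  - FIFO ordering: messages from one source to one destination with the same tag are received in the order they were sent
  - Reordering with tags: the destination can receive messages in a different order by asking for a specific tag
  [Figure: source-to-destination message streams with tag = 1, tag = 2, tag = 3]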
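The effect of tags on ordering can be illustrated without MPI. The sketch below is a plain in-memory model, not MPI code: Envelope and Mailbox are hypothetical names, and recv simply returns the earliest arrived message carrying the requested tag.

```cpp
#include <deque>
#include <string>

struct Envelope {
    int tag;
    std::string body;
};

// Models the destination's queue of messages from one source.
class Mailbox {
public:
    // Records a message in arrival order.
    void arrive(int tag, const std::string& body) {
        Envelope e = {tag, body};
        arrived_.push_back(e);
    }

    // Takes the earliest arrived message with the given tag.
    // Returns false (and leaves `out` untouched) if there is none.
    bool recv(int tag, std::string& out) {
        for (std::deque<Envelope>::iterator it = arrived_.begin(); it != arrived_.end(); ++it) {
            if (it->tag == tag) {
                out = it->body;
                arrived_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::deque<Envelope> arrived_;
};
```

Within one tag the arrival order is preserved, while asking for tag 3 before tag 1 lets the destination overtake earlier messages.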

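17 MPI_Bcast
    void MPI::COMM_WORLD.Bcast(
        void*          message,    /* in at the root, out at every other rank */
        int            count,      /* in */
        MPI::Datatype  datatype,   /* in */
        int            root)       /* in */

  Example: rank 2 broadcasts msg to ranks 0, 1, 3, and 4; every rank makes the same call:
    MPI::COMM_WORLD.Bcast( &msg, 1, MPI::INT, 2);
  [Figure: Rank 0 to Rank 4, with rank 2 as the root of the broadcast]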

18 MPI_Reduce
    void MPI::COMM_WORLD.Reduce(
        void*          operand,    /* in */
        void*          result,     /* out */
        int            count,      /* in */
        MPI::Datatype  datatype,   /* in */
        MPI::Op        operator,   /* in */
        int            root)       /* in */

  MPI::Op =
    - MPI::MAX (maximum), MPI::MIN (minimum), MPI::SUM (sum), MPI::PROD (product)
    - MPI::LAND (logical and), MPI::BAND (bitwise and)
    - MPI::LOR (logical or), MPI::BOR (bitwise or)
    - MPI::LXOR (logical xor), MPI::BXOR (bitwise xor)
    - MPI::MAXLOC (location of maximum), MPI::MINLOC (location of minimum)

  Example: ranks 0 to 4 hold 15, 10, 12, 8, and 4; after the call below, rank 2 holds the sum 49:
    MPI::COMM_WORLD.Reduce( &msg, &result, 1, MPI::INT, MPI::SUM, 2);
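The arithmetic of the example can be checked with a plain C++ fold. This is not MPI code: foldOperands is a hypothetical helper that combines one operand per rank with a binary operator, the way a reduction combines them before handing the result to the root.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Combines one operand per rank with the binary operator `op`.
// Precondition: at least one rank contributes an operand.
template <typename Op>
int foldOperands(const std::vector<int>& operands, Op op) {
    assert(!operands.empty());
    return std::accumulate(operands.begin() + 1, operands.end(), operands.front(), op);
}
```

With the operands 15, 10, 12, 8, and 4, foldOperands(values, std::plus<int>()) gives 15 + 10 + 12 + 8 + 4 = 49, matching the slide; passing std::multiplies<int>() instead corresponds to MPI::PROD.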

19 MPI_Allreduce
    void MPI::COMM_WORLD.Allreduce(
        void*          operand,    /* in */
        void*          result,     /* out */
        int            count,      /* in */
        MPI::Datatype  datatype,   /* in */
        MPI::Op        operator)   /* in */

  Same arguments as Reduce but without a root: every rank receives the result.

20 Exercises (No turn-in)
  1. Consider an application requiring both one-to-many and many-to-one communication.
  2. Consider an application requiring atomic multicast.
  3. Assume that four processes communicate with one another in causal ordering. Their current vectors are shown below. If Process A sends a message, which processes can receive it immediately?

       Process A     Process B     Process C     Process D
       3, 5, 2, 1    2, 5, 2, 1    3, 5, 2, 1    3, 4, 2, 1

  4. Consider the pros and cons of PVM's daemon-based and MPI's library linking-based message passing.
  5. Why can MPI maintain FIFO ordering?

