CSS490 Group Communication and MPI (Winter 2004)
Textbook Ch. 3
Instructor: Munehiro Fukuda
These slides were compiled from the course textbook, the reference books, and the instructor's original materials.

Group Communication
Communication types:
  One-to-many: broadcast
  Many-to-one: synchronization, collective communication
  Many-to-many: gather and scatter
Group addressing:
  Using a special network address: IP Class D and UDP
  Emulating a broadcast with one-to-one communication: a performance drawback on bus-type networks, but simpler for switch-based networks
Semantics:
  Send-to-all and bulletin-board semantics
  0-, 1-, m-out-of-n, and all-reliable delivery

Atomic Multicast
Send-to-all semantics and all-reliable delivery
Simple emulation: a repetition of one-to-one communication with acknowledgment
  What if a receiver fails? Time-out and retransmission
  What if a sender fails before all receivers receive the message? All receivers forward the message to the same group, and a receiver discards the 2nd and following copies.
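Below is a minimal sketch, not from the slides, of the receiver-side logic just described: deliver and forward the first copy of each message, and discard later copies. Message, send(), deliver(), and the group vector are assumed placeholders rather than part of the course code.

  #include <iostream>
  #include <set>
  #include <string>
  #include <vector>

  struct Message { long id; std::string body; };

  // Assumed transport primitive and application upcall (placeholders).
  void send(int dest, const Message& m) { /* one-to-one send with acknowledgment */ }
  void deliver(const Message& m) { std::cout << "deliver: " << m.body << std::endl; }

  std::set<long> seen;                               // ids already handled here

  void onReceive(const Message& m, const std::vector<int>& group, int self) {
    if (seen.count(m.id)) return;                    // 2nd or later copy: discard
    seen.insert(m.id);
    for (int member : group)                         // forward first, so the multicast
      if (member != self) send(member, m);           // completes even if the sender crashed
    deliver(m);                                      // hand the message to the application
  }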

Message Ordering
R1 and R2 receive m1 and m2 in a different order!
Some message ordering is required:
  Absolute ordering
  Consistent ordering
  Causal ordering
  FIFO ordering
(Figure: senders S1 and S2 multicast m1 and m2; receivers R1 and R2 deliver them in different orders.)

Absolute Ordering
Rule: mi must be delivered before mj if Ti < Tj
Implementation:
  A clock synchronized among machines
  A sliding time window used to commit delivery of messages whose timestamps fall within this window
Example: distributed simulation
Drawbacks:
  Too strict a constraint
  No absolutely synchronized clock
  No guarantee of catching all tardy messages
(Figure: messages mi and mj with timestamps Ti and Tj, Ti < Tj.)

Consistent Ordering
Rule: messages are received in the same order at all receivers (regardless of their timestamps).
Implementation:
  Each message is sent to a sequencer, assigned a sequence number, and finally multicast to the receivers
  A receiver delivers messages in increasing sequence-number order
Example: replicated database update
Drawback: a centralized algorithm
(Figure: messages mi and mj with timestamps Ti and Tj, Ti < Tj, delivered in the same order everywhere.)
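As a rough illustration of the sequencer idea, here is a hedged sketch (not from the slides); multicast() is an assumed placeholder, and the receiver simply holds back messages until all smaller sequence numbers have arrived.

  #include <iostream>
  #include <map>
  #include <string>

  struct Message { long seq; std::string body; };

  // Sequencer side: stamp each submitted message with the next global number.
  long nextSeq = 0;
  void multicast(const Message& m) { /* send to every group member */ }
  void onSubmit(const std::string& body) { multicast(Message{ nextSeq++, body }); }

  // Receiver side: deliver strictly in sequence-number order.
  long expected = 0;
  std::map<long, Message> pending;                   // out-of-order messages

  void onReceive(const Message& m) {
    pending[m.seq] = m;
    while (pending.count(expected)) {                // deliver any in-order prefix
      std::cout << "deliver: " << pending[expected].body << std::endl;
      pending.erase(expected++);
    }
  }

Because every receiver uses the same sequence numbers, all receivers deliver in the same order; the cost is that the sequencer is a central point of failure and a bottleneck.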

Causal Ordering
Rule: the happened-before relation (e_i^k is the k-th event at process i)
  If e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l
  If e_i = send(m) and e_j = receive(m), then e_i → e_j
  If e → e' and e' → e'', then e → e''
Implementation: use of a vector message
Example: distributed file system
Drawbacks:
  The vector is an overhead
  Broadcast is assumed
(Figure: S1 and S2 multicast m1-m4 to R1, R2, and R3; from R2's viewpoint, m1 → m2.)

Vector Message
Delivery condition at a receiver with vector R for a message carrying vector S from source i:
  S[i] = R[i] + 1, where i is the source id
  S[j] ≤ R[j], for every j ≠ i
(Figure: sites A-D exchange messages with vectors such as (1,1,1,0), (2,1,0,0), (2,1,1,0), and (3,1,1,0); a message whose vector does not yet satisfy the condition is delayed, then delivered once it does.)
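A minimal sketch (an assumed helper, not part of the slides) of this delivery test: S is the vector carried by the message, R the receiver's current vector, and src the sender's id.

  #include <vector>

  bool canDeliver(const std::vector<int>& S, const std::vector<int>& R, int src) {
    if (S[src] != R[src] + 1)                        // must be exactly the next message from src
      return false;
    for (std::size_t j = 0; j < S.size(); j++)
      if ((int)j != src && S[j] > R[j])              // a causally earlier message is still missing
        return false;
    return true;                                     // on delivery, the receiver sets R[src] = S[src]
  }

A message that fails the test is delayed and re-tested after each subsequent delivery.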

FIFO Ordering
Rule: messages from the same sender are received in the order they were sent.
Implementation: messages are assigned a sequence number
Example: TCP
This is the weakest ordering.
(Figure: m1-m4 travel from sender S to receiver R over routers 1 and 2 and are still delivered in send order.)

Why High-Level Message-Passing Tools?
Data formatting: data must be formatted into appropriate types at the user level
Non-blocking communication: polling and interrupts must be handled at the system-call level
Process addressing: inflexible hardwired addressing with machine id + local id
Group communication: a group server must be implemented at the user level, and broadcasting is simulated by a repetition of one-to-one communication

PVM and MPI
PVM: Parallel Virtual Machine
  Developed in the 1980s
  The pioneering library to provide high-level message-passing functions
  The PVM daemon process takes care of message transfer for user processes in the background
MPI: Message Passing Interface
  Defined in the 1990s
  A specification of high-level message-passing functions
  Several implementations available: MPICH, LAM/MPI
  Library functions are linked directly to user programs (no background daemons)
The detailed differences are shown in: PVMvsMPI.ps

Getting Started with MPI
Website:
Creating a hostfile:
  mfukuda]$ vi hosts
  uw uw uw uw
Compile a source program:
  mfukuda]$ mpiCC source.cpp -o myProg
Run the executable file:
  mfukuda]$ mpirun -np 4 myProg args

Program Using MPI
  #include <iostream>
  #include "mpi++.h"
  using namespace std;

  int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);                      // Start MPI computation
    int rank = MPI::COMM_WORLD.Get_rank();      // Process ID (from 0 to #processes - 1)
    int size = MPI::COMM_WORLD.Get_size();      // # participating processes
    cout << "Hello World! I am " << rank << " of " << size << endl;
    MPI::Finalize();                            // Finish MPI computation
  }

MPI_Send and MPI_Recv
  int MPI::COMM_WORLD.Send(
      void*          message    /* in */,
      int            count      /* in */,
      MPI::Datatype  datatype   /* in */,
      int            dest       /* in */,
      int            tag        /* in */ )
  int MPI::COMM_WORLD.Recv(
      void*          message    /* out */,
      int            count      /* in */,
      MPI::Datatype  datatype   /* in */,
      int            source     /* in */,   /* or MPI::ANY_SOURCE */
      int            tag        /* in */,
      MPI::Status*   status     /* out */ ) /* can be omitted */
MPI::Datatype: CHAR, SHORT, INT, LONG, UNSIGNED_CHAR, UNSIGNED_SHORT, UNSIGNED, UNSIGNED_LONG, FLOAT, DOUBLE, LONG_DOUBLE, BYTE, PACKED
MPI::Status fields: MPI_SOURCE, MPI_TAG, MPI_ERROR

MPI_Send and MPI_Recv
  #include <iostream>
  #include "mpi++.h"
  using namespace std;

  int main(int argc, char *argv[]) {
    int tag0 = 0;
    MPI::Init(argc, argv);                            // Start MPI computation
    if (MPI::COMM_WORLD.Get_rank() == 0) {            // rank 0...sender
      int loop = 3;
      MPI::COMM_WORLD.Send( "Hello World!", 12, MPI::CHAR, 1, tag0 );
      MPI::COMM_WORLD.Send( &loop, 1, MPI::INT, 1, tag0 );
    } else {                                          // rank 1...receiver
      int loop;
      char msg[12];
      MPI::COMM_WORLD.Recv( msg, 12, MPI::CHAR, 0, tag0 );
      MPI::COMM_WORLD.Recv( &loop, 1, MPI::INT, 0, tag0 );
      for (int i = 0; i < loop; i++)
        cout << msg << endl;
    }
    MPI::Finalize();                                  // Finish MPI computation
  }

Message Ordering in MPI
FIFO ordering within each data type
Messages can be reordered with a tag within each data type
(Figure: two source-to-destination streams; tags 1, 2, and 3 let the receiver pick messages out of arrival order.)
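A small hedged example of this tag-based reordering, in the same style as the earlier programs (run with mpirun -np 2; it relies on the tiny messages being buffered eagerly by the MPI implementation, which is typical but not guaranteed):

  #include <iostream>
  #include "mpi++.h"

  int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int a = 1, b = 2, c = 3, x;
    int rank = MPI::COMM_WORLD.Get_rank();
    if (rank == 0) {                                  // sender: tags 1, 2, 3 in order
      MPI::COMM_WORLD.Send(&a, 1, MPI::INT, 1, 1);
      MPI::COMM_WORLD.Send(&b, 1, MPI::INT, 1, 2);
      MPI::COMM_WORLD.Send(&c, 1, MPI::INT, 1, 3);
    } else if (rank == 1) {                           // receiver: picks tag 3 first
      MPI::COMM_WORLD.Recv(&x, 1, MPI::INT, 0, 3);
      std::cout << "tag 3 received first: " << x << std::endl;
      MPI::COMM_WORLD.Recv(&x, 1, MPI::INT, 0, 1);
      MPI::COMM_WORLD.Recv(&x, 1, MPI::INT, 0, 2);
    }
    MPI::Finalize();
  }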

MPI_Bcast
  int MPI::COMM_WORLD.Bcast(
      void*          message   /* in/out */,
      int            count     /* in */,
      MPI::Datatype  datatype  /* in */,
      int            root      /* in */ )
Example: MPI::COMM_WORLD.Bcast( &msg, 1, MPI::INT, 2 );
(Figure: rank 2 is the root; ranks 0-4 all end up with the same msg.)
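A hedged usage sketch matching the figure (the value 99 is arbitrary): run with mpirun -np 5, and after the call every rank holds rank 2's value.

  #include <iostream>
  #include "mpi++.h"

  int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int msg = 0;
    if (MPI::COMM_WORLD.Get_rank() == 2)
      msg = 99;                                       // only the root's value matters
    MPI::COMM_WORLD.Bcast(&msg, 1, MPI::INT, 2);      // root = rank 2
    std::cout << "rank " << MPI::COMM_WORLD.Get_rank()
              << " now has msg = " << msg << std::endl;
    MPI::Finalize();
  }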

MPI_Reduce
  int MPI::COMM_WORLD.Reduce(
      void*          operand   /* in */,
      void*          result    /* out */,
      int            count     /* in */,
      MPI::Datatype  datatype  /* in */,
      MPI::Op        operator  /* in */,
      int            root      /* in */ )
MPI::Op: MPI::MAX (maximum), MPI::MIN (minimum), MPI::SUM (sum), MPI::PROD (product), MPI::LAND (logical and), MPI::BAND (bitwise and), MPI::LOR (logical or), MPI::BOR (bitwise or), MPI::LXOR (logical xor), MPI::BXOR (bitwise xor), MPI::MAXLOC (max location), MPI::MINLOC (min location)
Example: MPI::COMM_WORLD.Reduce( &msg, &result, 1, MPI::INT, MPI::SUM, 2 );
(Figure: ranks 0-4 contribute 15, 10, 12, 8, and 4; root rank 2 receives the sum 49.)
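A hedged sketch reproducing the figure's numbers (run with mpirun -np 5; the per-rank operands are the values shown above):

  #include <iostream>
  #include "mpi++.h"

  int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int values[] = { 15, 10, 12, 8, 4 };              // operands for ranks 0-4
    int rank = MPI::COMM_WORLD.Get_rank();
    int msg = values[rank % 5], result = 0;
    MPI::COMM_WORLD.Reduce(&msg, &result, 1, MPI::INT, MPI::SUM, 2);
    if (rank == 2)                                    // only the root gets the sum (49)
      std::cout << "sum at root: " << result << std::endl;
    MPI::Finalize();
  }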

MPI_Allreduce
  int MPI::COMM_WORLD.Allreduce(
      void*          operand   /* in */,
      void*          result    /* out */,
      int            count     /* in */,
      MPI::Datatype  datatype  /* in */,
      MPI::Op        operator  /* in */ )
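Allreduce is Reduce without a root: every rank receives the result. A brief hedged sketch (each rank contributes rank + 1, an arbitrary choice):

  #include <iostream>
  #include "mpi++.h"

  int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();
    int msg = rank + 1, total = 0;
    MPI::COMM_WORLD.Allreduce(&msg, &total, 1, MPI::INT, MPI::SUM);
    std::cout << "rank " << rank << " sees total " << total << std::endl;
    MPI::Finalize();
  }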

Exercises (no turn-in)
1. Consider an application requiring both one-to-many and many-to-one communication.
2. Consider an application requiring atomic multicast.
3. Assume that four processes communicate with one another in causal ordering. Their current vectors are shown below. If process A sends a message, which processes can receive it immediately?
   Process A: 3, 5, 2, 1   Process B: 2, 5, 2, 1   Process C: 3, 5, 2, 1   Process D: 3, 4, 2, 1
4. Consider the pros and cons of PVM's daemon-based and MPI's library-linking-based message passing.
5. Why can MPI maintain FIFO ordering?