Group Communication
Robbert van Renesse
CS614, Tuesday, Feb 20, 2001

Distributed programming is hard
– True concurrency
– No shared memory
– No locks
– Host failures and recoveries
– Network failures
– Too many scenarios to wrap your brain around
– Coordination is hard to achieve

Kinds of distributed apps
– Replicated services
– Parallel computing
– Factory floor control
– Management services
– Cluster services
– Distributed games
– …

Commonality
Each requires coordination between distributed, possibly flaky components over a possibly flaky network. Each involves a dynamic group of processes communicating with one another.

Basic operations
– JoinGroup("group", event-handler)
  – Events: point-to-point and multicast messages; member Join and Leave (Crash) events
– SendP2P("group", member-id, message)
– Multicast("group", message)
– LeaveGroup("group")

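Programming example
    /* Chat client: every line read from stdin is multicast to the "chat" group. */
    int main(void) {
        char buf[BUF_SIZE];
        /* group_t is a placeholder for whatever handle JoinGroup returns */
        group_t group = JoinGroup("chat", EventHandler);
        while (fgets(buf, sizeof buf, stdin) != NULL) {
            Multicast(group, buf);
        }
        return 0;
    }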

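Programming example, cont’d
    /* Called by the group layer for every event delivered to this member.
       event_t is a placeholder type for the event record. */
    void EventHandler(event_t event) {
        switch (event.type) {
        case VIEW:                   /* the membership changed */
            printf("New view: ");
            PrintView(event.view);   /* placeholder helper: C's printf has no %v conversion */
            printf("\n");
            break;
        case DATA:                   /* a chat line multicast by event.source */
            printf("%s: %s", event.source, event.data);
            break;
        }
    }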

What can go wrong?
– Message M gets delivered to X but not to Y: lack of agreement on delivery
– M1 gets delivered before M2 at X, but the other way around at Y: lack of order on delivery
– X thinks Z is up, while Y thinks Z is down: lack of agreement on membership
⇒ lack of coordination ⇒ the programmer has to consider many scenarios

Example 1: Replicated service
Updates are multicast.
– Lack of agreement on delivery can cause an update to be applied at only some of the replicas.
– Lack of order on delivery can cause updates to be applied in different orders at different replicas.
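Here is a minimal sketch of the ordering problem, which is not taken from the slides. Two replicas of an integer receive the same two updates, "add 10" and "multiply by 2", but deliver them in different orders. The type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical update kinds for a replicated integer. */
typedef enum { ADD, MUL } op_kind;
typedef struct { op_kind kind; int arg; } update;

/* Apply one update deterministically to a replica's state. */
static int apply(int state, update u) {
    return u.kind == ADD ? state + u.arg : state * u.arg;
}

/* Apply updates in the order this replica delivered them. */
static int replay(int state, const update *delivered, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        state = apply(state, delivered[i]);
    return state;
}
```

Both replicas start from 1 and neither loses a message. Replica X applies add, then multiply, and ends at 22. Replica Y applies multiply, then add, and ends at 12.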

Example 2: Parallel computation
Membership ⇒ partitioning of the task. Lack of agreement on membership ⇒ different processes partition the task differently ⇒ too little or too much work is done.
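This hypothetical sketch shows how disagreement on membership leads to too little or too much work. Each member claims task i when i mod (view size) equals its rank in its own view. The rank-based rule and all names are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical rule: the member at position `rank` in a view of
   `view_size` members processes every task i with i % view_size == rank. */
static int owns(int task, int rank, int view_size) {
    assert(view_size > 0 && rank >= 0 && rank < view_size);
    return task % view_size == rank;
}

/* Count how often each task 0..n_tasks-1 is processed when member m
   applies the rule to its own local view, described by ranks[m] and sizes[m]. */
static void coverage(int n_tasks, int n_members, const int *ranks,
                     const int *sizes, int *times_done) {
    for (int t = 0; t < n_tasks; t++) {
        times_done[t] = 0;
        for (int m = 0; m < n_members; m++)
            if (owns(t, ranks[m], sizes[m]))
                times_done[t]++;
    }
}
```

Suppose X believes the view is {X, Y}, while Y and Z believe it is {X, Y, Z}. For tasks 0 to 5, nobody does task 3, and tasks 2 and 4 are each done twice. When all three agree on the view, every task is done exactly once.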

Making life easier
Reduce the number of possible scenarios by supporting network protocols that guarantee:
– Agreement on message delivery
– Order of message delivery
– Agreement on membership updates
– Order of membership updates
This results in fewer things to think about.

Some terminology
– Group: a set of processes
– Member: a process in the group
– View: a uniquely identified set of members, as seen by one or more group members
  – A view should approach the set of reachable members
  – Members install new views over time
  – Each view installed by a group member always includes that member
  – A member never installs the same view twice
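The view rules above can be sketched as checks that a member performs when it installs a view. The sketch assumes, hypothetically, that view ids increase monotonically. That is one simple way to make "never install the same view twice" checkable. The struct layout and names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MEMBERS 16

/* Hypothetical view: a unique id plus a member set.
   Assumption: ids are issued in increasing order. */
typedef struct {
    int id;
    int n;
    int members[MAX_MEMBERS];
} view;

static bool contains(const view *v, int member) {
    assert(v->n <= MAX_MEMBERS);
    for (int i = 0; i < v->n; i++)
        if (v->members[i] == member)
            return true;
    return false;
}

/* State kept by one group member. */
typedef struct {
    int self;
    int last_view_id;   /* -1 before the first view */
    view current;
} member_state;

/* Install `next` only if it includes this member and is new. */
static bool install_view(member_state *m, const view *next) {
    if (!contains(next, m->self))
        return false;
    if (next->id <= m->last_view_id)
        return false;
    m->current = *next;
    m->last_view_id = next->id;
    return true;
}
```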

Events
A group member observes the following events:
– Join()
– Leave()
– Crash()
– View-Change(view)
– Send-Multicast(msg)
– Receive-Multicast(msg, sender)
(We ignore multiple groups and point-to-point traffic for the rest of today.)

Graphical representation
[Figure: timeline with TIME on the horizontal axis and one line per member; marked events: message loss, crash, join, view change]

Traces and properties
– Trace: a history of events
  – E.g.: X sends msg4, Y receives msg3, X gets view3, Y receives msg4, X gets view3, …
– Property: a predicate on potential traces
  – E.g.: messages are delivered in the order they were sent
  – E.g.: messages sent are eventually delivered to all correct processes
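A property is a predicate on traces, so a safety property such as FIFO order can be written as a function over a recorded trace. The event encoding below is a hypothetical one chosen for this sketch. Each message is named by its sender and by its per-sender send position k.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8

/* Hypothetical trace encoding. */
typedef enum { SEND, RECEIVE } event_kind;
typedef struct {
    event_kind kind;
    int process;   /* process at which the event occurs */
    int sender;    /* sender of the message */
    int k;         /* position at which the sender sent it: 0, 1, 2, ... */
} event;

/* FIFO as a predicate: at every process, messages from a given sender
   are received in the order that sender sent them. */
static bool fifo_holds(const event *trace, int n) {
    int last[MAX_PROCS][MAX_PROCS];   /* last k received at p from s */
    for (int p = 0; p < MAX_PROCS; p++)
        for (int s = 0; s < MAX_PROCS; s++)
            last[p][s] = -1;
    for (int i = 0; i < n; i++) {
        const event *e = &trace[i];
        if (e->kind != RECEIVE)
            continue;
        assert(e->process < MAX_PROCS && e->sender < MAX_PROCS);
        if (e->k <= last[e->process][e->sender])
            return false;
        last[e->process][e->sender] = e->k;
    }
    return true;
}
```

The second example property, eventual delivery, cannot be checked this way on a finite trace. It is a liveness property, and any finite prefix can still be extended so that the missing deliveries occur.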

Protocols
Properties are implemented by protocols; each protocol is a layer of software.
– The syntax is the same for each layer: Join(), Send-Multicast(), …, so layers snap together like Lego blocks
– The semantics differ: unreliable → reliable, unordered → ordered, …

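Protocols may be layered
[Figure: example stack with a Token layer on top of a Seqno layer; the Seqno layer turns unreliable delivery into FIFO delivery, and the Token layer turns FIFO into total order]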

For example: reliability
– Property: a message that is sent is eventually delivered to all correct processes
– Protocol: ack/timeout/retransmission
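This is a minimal stop-and-wait sketch of ack/timeout/retransmission. Loss is simulated by a fixed, hypothetical loss pattern so that the behaviour is deterministic. Real timers, sequence numbers for duplicate suppression, and multiple receivers are left out. Each loop iteration stands for one timeout period.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lossy link: lost[i] says whether the i-th transmission
   attempt is dropped; attempts beyond the pattern always succeed. */
typedef struct {
    const bool *lost;
    int len;        /* length of the loss pattern */
    int attempts;   /* attempts made so far */
} lossy_link;

static bool transmit(lossy_link *l) {
    bool dropped = l->attempts < l->len && l->lost[l->attempts];
    l->attempts++;
    return !dropped;
}

/* Send, wait one timeout for the ack, retransmit on timeout. The data
   message and its ack both cross lossy links. Returns the number of data
   transmissions, or -1 if max_tries is exhausted. */
static int send_reliably(lossy_link *data, lossy_link *ack, int max_tries) {
    assert(max_tries >= 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (transmit(data) && transmit(ack))
            return attempt;   /* ack arrived before the timeout */
        /* timeout expired: retransmit */
    }
    return -1;
}
```

If the first data message and the first ack are both lost, the sender succeeds on its third transmission.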

For example: total order
– Property: if two processes deliver the same two messages, they deliver them in the same order
– Protocol: centralized sequencer, or rotating token, or …
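Here is a sketch of the centralized-sequencer option. It assumes reliable, but possibly reordering, delivery underneath. The sequencer stamps each message with a global sequence number. Each receiver holds a message back until all lower-numbered messages have been delivered. The names are hypothetical, and sequencer failure is not handled.

```c
#include <assert.h>

#define MAX_MSGS 32

/* Hypothetical centralized sequencer: hands out global sequence numbers. */
typedef struct { int next; } sequencer;

static int stamp(sequencer *s) { return s->next++; }

/* Receiver: a hold-back buffer that delivers in global seqno order. */
typedef struct {
    int expected;             /* next global seqno to deliver */
    int held[MAX_MSGS];       /* payloads held back, indexed by seqno */
    int present[MAX_MSGS];    /* 1 if arrived but not yet delivered */
    int delivered[MAX_MSGS];  /* delivery log, in order */
    int n_delivered;
} receiver;

static void on_arrival(receiver *r, int seqno, int payload) {
    assert(seqno >= 0 && seqno < MAX_MSGS);
    if (seqno < r->expected)
        return;   /* duplicate of an already delivered message */
    r->held[seqno] = payload;
    r->present[seqno] = 1;
    while (r->expected < MAX_MSGS && r->present[r->expected]) {
        r->present[r->expected] = 0;
        r->delivered[r->n_delivered++] = r->held[r->expected];
        r->expected++;
    }
}
```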

Other examples
Property                      Protocol
FIFO order                    Sequence number
Bound on resource use         Flow control
Confidentiality               Encryption
Integrity                     Checksum
Consensus on membership       Membership
Failures are detected         Heartbeat
…                             …
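As an illustration of one row, integrity via a checksum, here is a 16-bit one's-complement checksum in the style of the Internet checksum (RFC 1071). The slides do not say which checksum a real stack would use. This is only one example of the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, complemented. */
static uint16_t checksum16(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len % 2)
        sum += (uint32_t)data[len - 1] << 8;   /* pad an odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);    /* fold carries back in */
    return (uint16_t)~sum;
}
```

The sender puts the checksum in its header. The receiver recomputes it and drops the message on a mismatch. With RFC 1071's example bytes 00 01 f2 03 f4 f5 f6 f7, the result is 0x220d.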

Dependencies
Total ordering protocols typically depend on reliable delivery ⇒ layer the ordering protocol on top of the reliability protocol.

Toolkits here at Cornell
Horus and Ensemble are both protocol stack toolkits, each supplying dozens of protocol layers for group communication. Plug’n’play allows applications to choose just the protocols they require, rather than a one-size-fits-all stack. The result is good performance and flexibility.

Typical protocol stack (top to bottom)
– TOTAL ORDERING
– MEMBERSHIP
– FLOW CONTROL
– RETRANSMISSION
– CHECKSUM
– UDP/IP

Each layer adds a header
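One common way to implement "each layer adds a header" is to reserve space in front of the payload. Going down the stack, each layer prepends its header. Going up, the matching layer strips it again. This buffer layout is an assumption for illustration, not a description of Horus or Ensemble internals.

```c
#include <assert.h>
#include <string.h>

#define BUF_CAP 256

/* Hypothetical message buffer: the payload sits at the end of `bytes`,
   and headers are prepended into the free space in front of it. */
typedef struct {
    unsigned char bytes[BUF_CAP];
    size_t start;   /* index of the first valid byte */
} msg;

static void msg_init(msg *m, const void *payload, size_t len) {
    assert(len <= BUF_CAP);
    m->start = BUF_CAP - len;
    memcpy(m->bytes + m->start, payload, len);
}

/* Going down the stack: a layer pushes its header in front. */
static void push_header(msg *m, const void *hdr, size_t len) {
    assert(len <= m->start);
    m->start -= len;
    memcpy(m->bytes + m->start, hdr, len);
}

/* Going up the stack: a layer pops the header its peer pushed. */
static void pop_header(msg *m, void *hdr, size_t len) {
    assert(m->start + len <= BUF_CAP);
    memcpy(hdr, m->bytes + m->start, len);
    m->start += len;
}
```

Headers nest like a stack. The lowest layer's header is outermost on the wire, so on receipt that layer strips its header first.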

Extreme example app: replicated state machine
Model of a replicated service:
– Each replica is a state machine
– The initial state is the same
– They receive the same update messages
– They receive them in the same order
⇒ keeps the replicas in the same state

What we want
[Figure: timeline diagram; label: State Transfer]

What we don’t want
[Figure: timeline diagram; labels: no order, message loss, inconsistent view]

What we need: virtual synchrony
Introduced by Ken Birman / Isis project
– Agreement and ordering of messages
– State transfer
– Failure detection
– Discovery of new members
– Consensus on views

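Example: replicated integer
    int X;   /* the replicated integer */

    int main(void) {
        Join(EventHandler);
        for (;;) {
            /* Wait for a request; client and request types are left abstract, as on the slide. */
            client = receive(&ClientRequest);
            switch (ClientRequest.type) {
            case ReadInt:                     /* reads are served from the local copy */
                reply(client, X);
                break;                        /* without this break, a read falls through into the write case */
            case WriteInt:                    /* writes are multicast; EventHandler applies them at every replica */
                multicast(ClientRequest.value);
                break;
            }
        }
    }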

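Example cont’d
    /* Returns the state for GetState, 0 otherwise. event_t is a placeholder type. */
    int EventHandler(event_t event) {
        switch (event.type) {
        case View:          /* nothing to do on a view change */
            break;
        case Update:        /* a multicast write, delivered in the same order at every replica */
            X = event.value;
            break;
        case GetState:      /* state transfer: an existing member supplies its state */
            return X;
        case PutState:      /* state transfer: a joining member installs that state */
            X = event.state;
            break;
        }
        return 0;
    }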

What is a correct process?
Many properties talk about correct processes, but what is one? Process X calls process Y correct if
– Y is currently in X’s view, and
– Y will be in X’s next view
⇒ correctness is relative to views!

Reliability revisited A message sent by X in view V is delivered to all correct members (from X’s perspective), and possibly to some incorrect members in V as well… If you don’t want the latter part, there’s something called “uniform” or “safe” delivery, which is a much more expensive property.

View consensus
A message can only be delivered to members in one and the same view ⇒ members must agree on views. Also, a member cannot receive messages sent in a view other than its current view.

Failure detection
A crashed or unavailable member is eventually removed from views ⇒ this guarantees some form of progress. (Without it, every member would be correct, and the reliability protocol would block forever trying to deliver messages to faulty members.)
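Failure detection by heartbeats, from the property/protocol table, can be sketched as a timeout on the last heartbeat heard from each member. The time units, timeout value, and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MEMBERS 8

/* Hypothetical heartbeat detector: a member not heard from within
   `timeout` time units is suspected and becomes a candidate for
   removal from the next view. */
typedef struct {
    int n;
    long last_heard[MAX_MEMBERS];
    long timeout;
} detector;

static void on_heartbeat(detector *d, int member, long now) {
    assert(member >= 0 && member < d->n);
    d->last_heard[member] = now;
}

static bool suspected(const detector *d, int member, long now) {
    assert(member >= 0 && member < d->n);
    return now - d->last_heard[member] > d->timeout;
}
```

A timeout cannot tell a crashed member from a slow one, so a live member may be suspected. That fits the view-based notion of correctness above. Once a suspected member is removed from the view, it is no longer counted as correct.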

Message orderings
Several orderings are optional:
– Unordered
– FIFO: messages from the same sender are delivered in the order they were sent
– Causal: message delivery respects Lamport’s causality relation
– Total: as before
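The slides do not prescribe an implementation for causal ordering. One standard technique uses vector timestamps to check whether a message can be delivered yet. Here vc[k] counts the messages from process k delivered locally. A sender increments its own entry before stamping a multicast. N_PROCS is fixed for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define N_PROCS 3

/* A message from `sender` with timestamp ts is deliverable once it is
   the next message from that sender and everything it causally depends
   on has already been delivered locally. */
static bool causally_deliverable(const int vc[N_PROCS], int sender,
                                 const int ts[N_PROCS]) {
    for (int k = 0; k < N_PROCS; k++) {
        if (k == sender) {
            if (ts[k] != vc[k] + 1)
                return false;
        } else if (ts[k] > vc[k]) {
            return false;
        }
    }
    return true;
}

/* Record local delivery of a message from `sender`. */
static void deliver(int vc[N_PROCS], int sender) {
    assert(sender >= 0 && sender < N_PROCS);
    vc[sender]++;
}
```

Example: P0 multicasts m1 with timestamp [1,0,0], and P1 delivers m1 and then multicasts m2 with [1,1,0]. If m2 reaches P2 first, P2 holds it back until m1 has been delivered.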

View ordering
Total: if two processes both deliver views V1 and V2, they do so in the same order. Without it, the definition of correctness would not make much sense…

Why is all this good?
– The number of possible scenarios has been reduced significantly
– Between two views, you can pretend there is no message loss and members do not crash
– State transfer is usually a very easy way to deal with joins and recoveries

Useful for lots of stuff
– Replication
– Leader election: who’s going to be responsible for some external event
  – Primary-backup
– Partitioning work: consensus on views allows parallel computations to split up the work
– Distributed games: consensus on participants’ views of the virtual world
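Because all members agree on the view, leader election can be as simple as applying the same deterministic rule to the view. The "smallest member id wins" rule below is an assumption for illustration; any deterministic function of the view would do.

```c
#include <assert.h>

/* Hypothetical rule: the member with the smallest id in the current
   view is the leader. Since views are agreed on, every member computes
   the same answer without exchanging further messages. */
static int leader(const int *members, int n) {
    assert(n > 0);
    int best = members[0];
    for (int i = 1; i < n; i++)
        if (members[i] < best)
            best = members[i];
    return best;
}
```

In primary-backup, if the leader crashes, the next view excludes it and every member computes the same new primary. For example, member 4 leads the view {12, 4, 9}; once 4 has left, the new view {12, 9} yields 9.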