Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.

Slides:



Advertisements
Similar presentations
CS425/CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
Advertisements

1 CS 194: Distributed Systems Process resilience, Reliable Group Communication Scott Shenker and Ion Stoica Computer Science Division Department of Electrical.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST Prof Philippas Tsigas Distributed Computing and Systems Research Group.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
CS542 Topics in Distributed Systems Diganta Goswami.
Multicast Fundamentals n The communication ways of the hosts n IP multicast n Application level multicast.
Page 1 Introduction Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
UNIT-IV Computer Network Network Layer. Network Layer Prepared by - ROHIT KOSHTA In the seven-layer OSI model of computer networking, the network layer.
Lab 2 Group Communication Andreas Larsson
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
1 Version 3 Module 8 Ethernet Switching. 2 Version 3 Ethernet Switching Ethernet is a shared media –One node can transmit data at a time More nodes increases.
Group Communication Phuong Hoai Ha & Yi Zhang Introduction to Lab. assignments March 24 th, 2004.
Slide Set 15: IP Multicast. In this set What is multicasting ? Issues related to IP Multicast Section 4.4.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
Internet Networking Spring 2002
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Delivery, Forwarding, and Routing
MULTICASTING Network Security.
Spanning Tree and Multicast. The Story So Far Switched ethernet is good – Besides switching needed to join even multiple classical ethernet networks Routing.
© J. Liebeherr, All rights reserved 1 IP Multicasting.
CSE679: Multicast and Multimedia r Basics r Addressing r Routing r Hierarchical multicast r QoS multicast.
Connecting LANs, Backbone Networks, and Virtual LANs
Chapter 22 Network Layer: Delivery, Forwarding, and Routing
© Janice Regan, CMPT 128, CMPT 371 Data Communications and Networking Multicast routing.
Lecture 2 TCP/IP Protocol Suite Reference: TCP/IP Protocol Suite, 4 th Edition (chapter 2) 1.
22.1 Chapter 22 Network Layer: Delivery, Forwarding, and Routing Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Multicast Routing Protocols NETE0514 Presented by Dr.Apichan Kanjanavapastit.
Dec 4, 2007 Reliable Multicast Group Neelofer T. CMSC 621.
Multicasting. References r Note: Some slides come from the slides associated with this book: “Mastering Computer Networks: An Internet Lab Manual”, J.
Broadcast and Multicast. Overview Last time: routing protocols for the Internet  Hierarchical routing  RIP, OSPF, BGP This time: broadcast and multicast.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Chapter 22 Network Layer: Delivery, Forwarding, and Routing Part 5 Multicasting protocol.
Lab 2 Group Communication Farnaz Moradi Based on slides by Andreas Larsson 2012.
Group Communication Group oriented activities are steadily increasing. There are many types of groups:  Open and Closed groups  Peer-to-peer and hierarchical.
© J. Liebeherr, All rights reserved 1 Multicast Routing.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
IM NTU Distributed Information Systems 2004 Replication Management -- 1 Replication Management Yih-Kuen Tsay Dept. of Information Management National Taiwan.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
© J. Liebeherr, All rights reserved 1 IP Multicasting.
Routing and Routing Protocols
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Hwajung Lee. A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types of groups:
Totally Ordered Broadcast in the face of Network Partitions [Keidar and Dolev,2000] INF5360 Student Presentation 4/3-08 Miran Damjanovic
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
1 IP Multicasting Relates to Lab 10. It covers IP multicasting, including multicast addressing, IGMP, and multicast routing.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
4: Network Layer4-1 Chapter 4: Network Layer Last time: r Internet routing protocols m RIP m OSPF m IGRP m BGP r Router architectures r IPv6 Today: r IPv6.
Multicasting  A message can be unicast, multicast, or broadcast. Let us clarify these terms as they relate to the Internet.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Multicast Communications
Failure detection The design of fault-tolerant systems will be easier if failures can be detected. Depends on the 1. System model, and 2. The type of failures.
Fault Tolerance (2). Topics r Reliable Group Communication.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
22.1 Network Layer Delivery, Forwarding, and Routing.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
1 CMPT 471 Networking II Multicasting © Janice Regan,
Kapitel 19: Routing. Kapitel 21: Routing Protocols
Multicast Outline Multicast Introduction and Motivation DVRMP.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
EEC 688/788 Secure and Dependable Computing
Distributed Systems CS
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EE 122: Lecture 13 (IP Multicast Routing)
Implementing Multicast
Optional Read Slides: Network Multicast
Distributed Systems CS
Presentation transcript:

Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types of groups:  Open group (anyone can join, customers of Walmart )  Closed groups (membership is closed, class of 2000 )  Peer-to-peer group (all have equal status, graduate students of CS department, members in a videoconferencing / netmeeting )  Hierarchical groups (one or more members are distinguished from the rest. President and the employees of a company, distance learning ).

Major issues Various forms of multicast to communicate with the members. Examples are  Atomic multicast  Ordered multicast Dynamic groups  How to correctly communicate when the membership constantly changes?  Keeping track of membership changes

Atomic multicast A multicast is atomic, when the message is delivered to every correct member, or to no member at all. In general, processes may crash, yet the atomicity of the multicast is to be guaranteed. How can we implement atomic multicast?

Basic vs. reliable multicast Basic multicast does not consider failures. Reliable multicast tolerates (certain kinds of) failures. Three criteria for basic multicast : Liveness. Each process must receive every message Integrity. No spurious message received No duplicate. Accepts exactly one copy of a message

Reliable atomic multicast Sender’s programReceiver’s program i:=0; if m is new → do i ≠ n →accept it; send message to member [i];multicast m ; i:= i+1 [] m is duplicate → discard m od fi Tolerates process crashes. Why does it work?

Multicast support in networks Multicast is an extremely important operation in networking, and there are numerous protocols for multicast. Sometimes, certain features available in the infrastructure of a network simplify the implementation of multicast. Examples are Multicast on an ethernet LAN IP multicast for wide area networks

IP Multicast IP multicast is a bandwidth-conserving technology where the router reduces traffic by replicating a single stream of information and forwarding them to multiple clients. Very popular for for streaming content distribution. Sender sends a single copy to a special multicast IP address (Class D) that represents a group, where other members register.

IP Multicast Internet radio. BBC has encouraged UK-based Internet service providers to adopt multicast-addressable services in their networks by providing BBC Radio at higher quality than is available via their unicast-addressed services. Instant movie or TV. Netflix uses a version of IP multicast for the instant streaming of their movies at homes through the Blu-ray Disc players Distance learning. Involves live transmission of the course material to a large number of students at geographically distributed locations.

Distribution trees Routers maintain & update distribution trees whenever members join / leave a group. The shared tree is also called core-based tree. All multicasts are Routed via the Rendezvous point. Too much load on routers. Application layer multicast overcomes this. source Rendezvous point Source tree Shared tree

Distribution trees The routers have to know about group composition. As groups memberships change, routers have to be updated. Maintaining data about group composition and replicating the copy for each create too much load on routers. Application layer multicast overcomes this. The responsibility of multicast is left to the applications layer. Each multicast is implemented as a series of unicasts, and their routes are carefully planned to reduce the stress on the links.

Application Layer Multicast Two examples of application layer multicast with host 0 as the source. (a)The stress on the link from host 0 to router A is 3. (b) The stress on no link exceeds 2.

Reverse Path Forwarding RPF is a multicast protocol that uses both source and destination addresses. It guarantees loop-free forwarding of multicast packets to their destinations. It builds a shortest path tree for each source node. Each recipient of a packet from a source node  looks up source address in its own routing table  compares the entry port with receiving interface ( the packet must be received from the port that the router would use to forward the return packet. This prevents possible looping as in flooding, IP spoofing etc.)  if wrong interface (other than the shortest path port), drop for each outgoing interface with group members downstream, forward the packet.

Reverse Path Forwarding The blue link denotes the first hop for the return path to the source. The messages represented by the red arrows will be dropped source

Ordered multicasts: Basic versions only Total order multicast. Every member must receive all updates in the same order. Example: consistent update of replicated data on servers Causal order multicast. If a, b are two updates and a happened before b, then every member must accept a before accepting b. Example: implementation of a bulletin board. Local order (a.k.a. S ingle source FIFO ). Example: video distribution, distance learning using “push technology.”

Implementing total order multicast First method. Basic multicast using a sequencer { The sequencer S} define seq: integer (initially 0} do receive m → multicast (m, seq); seq := seq+1; deliver m od sequencer

Implementing total order multicast Second method. Basic multicast without a sequencer. Uses the idea of 2PC (two-phase commit) Each process will deliver the messages from the senders in the order (p, r, q)

Implementing total order multicast Step 1. Sender i sends (m, ts) to all Step 2. Receiver j saves it in a holdback queue, and sends an ack (a, ts) Step 3. Sender receive all acks, and pick the largest ts. Then send (m, ts, commit) to all. Step 4. Receiver removes it from the holdback queue and delivers m in the ascending order of timestamps. Why does it work?

Implementing causal order multicast Basic multicast only. Uses vector clocks. Recipient i will deliver a message from j iff 1. VC j (j) = LC j (i) + 1 {LC = local vector clock} 2. ∀ k: k≠j :: VC k (j) ≤ LC k (i) VC = incoming vector clock LC = Local vector clock Note the slight difference in the implementation of the vector clocks

Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes are required to receive the messages from all correct processes in the group. Multicasts by faulty processes will either be received by every correct process, or by none at all.

A theorem on reliable multicast Theorem. In an asynchronous distributed system, total order reliable multicasts cannot be implemented when even a single process undergoes a crash failure. (Hint) The implementation will violate the FLP impossibility result. Complete the arguments!

Scalable Reliable Multicast IP multicast or application layer multicast provides unreliable datagram service. Reliability requires the detection of the message omission followed by retransmission. This can be done using ack. However, for large groups (as in distance learning applications or software distribution) scalability is a major problem. The reduction of acknowledgements and retransmissions is the main contribution in Scalable Reliable Multicasts (SRM) (Floyd et. al).

Scalable Reliable Multicast If omission failures are rare, then then instead of using ACK, receivers will only report the non-receipt of messages using NACK. If several members of a group fail to receive a message, then each such member waits for a random period of time before sending its NACK. This helps to suppress redundant NACKs. Sender multicasts the missing copy only once. Use of cached copies in the network and selective point-to-point retransmission further reduces the traffic.

Scalable Reliable Multicast Receiving processes have to detect the non receipt of messages from the source. Each member periodically broadcasts a sessions message containing the largest sequence number received by it. Other members figure out which messages they did not receive.

Scalable Reliable Multicast Source sending m[0], m[1], m[2] … Missed m[7] and sent NACK Missed m[7] and sent NACK Missed m[7] and sent NACK m[7] cached here m[7] cached here m[7]

Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent views can lead to problems. Example: Four members (0,1,2,3) will send out 144 s. Assume that 3 left the group but only 2 knows about it. So, 0 will send 144/4 = 36 s (first quarter 1-36) 1 will send 144/4 = 36 s (second quarter 37-71) 2 will send 144/3 = 48 s (last one-third ) 3 has left. The mails will not be delivered!

Dealing with open groups Views can change unpredictably, and no member may have exact information about who joined and who left at any given time. These views and their changes should propagate in the same order to all members.

Dealing with open groups Example. Current view (of all processes) v 0 (g) = {0, 1, 2, 3}. Let 1, 2 leave and 4 join the group concurrently. This view change can be serialized in many ways: {0,1,2,3}, {0,1,3} {0,3,4}, OR {0,1,2,3}, {0,2,3}, {0,3}, {0,3,4}, OR {0,1,2,3}, {0,3}, {0,3,4} To make sure that every member observe these changes in the same order, changes in the view should be sent via total order multicast.

View propagation {Process 0}: v 0 (g); v 0 (g) = {0,1,2,3}, send m1,... ; v 1 (g); send m2, send m3; v 1 (g) = {0,1,3}, v 2 (g) ; {Process 1}: v 2 (g) = {0,3,4} v 0 (g); send m4, send m5; v 1 (g); send m6; v 2 (g)...;

View delivery guidelines Rule 1. If a process j joins and continues its membership in a group g that already contains a process i, then eventually j appears in all views delivered by process i. Rule 2. If a process j permanently leaves a group g that contains a process i, then eventually j is excluded from all views delivered by process i.

View-synchronous communication Rule. With respect to each message, all correct processes have the same view. m sent in view V ⇒ m received in view V This is also known as virtual synchrony

View-synchronous communication Agreement. If a correct process k delivers a message m in v i (g ) before delivering the next view v i+1 (g), then every correct process j ∈ v i (g) ∩ v i+1 (g) must deliver m before delivering v i+1 (g). Integrity. If a process j delivers a view v i (g ), then v i (g ) must include j. Validity. If a process k delivers a message m in view v i (g) and another process j ∈ v i (g) does not deliver that message m, then the next view v i+1 (g) delivered by k must exclude j. v i (g ) v i+1 (g), m v i (g ) v i+1 (g), m Sender k Receiver j

Example Let process 1 deliver m and then crash. Possibility 1. No one delivers m, but each delivers the new view {0,2,3}. Possibility 2. Processes 0, 2, 3 deliver m and then deliver the new view {0,2,3} Possibility 3. Processes 2, 3 deliver m and then deliver the new view {0,2,3} but process 0 first delivers the view {0,2,3} and then delivers m. Are these acceptable? {0,1,2,3}{0,2,3} m m m Possibility 3

Overview of Transis What is Transis? A group communication system developed by Danny Dolev and his group at the Hebrew University of Jerusalem. (see Important objectives of Transis are to develop a framework for Computer Supported Cooperative Work (CSCW) applications, such as multi-media and desktop conferencing, as well as a scalable Video-on- demand service.

Overview of Transis (2) What is Transis? Deals with open group Supports scalable reliable multicast Tackles network partition. Allows the partitions to continue working and later restore consistency upon merger

Overview of Transis (3)

Overview of Transis (4) FIFO (single source) Causal mode (maintains causal order) Agreed mode (maintains total order that does not conflict with the causal order) Safe mode (Delivers a message only after all processes in the current configuration have acknowledged its reception) If safe message M is delivered in a configuration that includes process P, then P will deliver M unless it crashes.

Overview of Transis (5) Each partition assumes that the machines in the other partition have failed, and maintains virtual synchrony within its own partition only. After repair, consistency is restored in the entire system. Dealing with partition

Dealing with partitions Assume that when A was sending a safe message M, the configuration changed to {A, B, C} → {A, B}, {C}. Case 1. All but C sent ack to A, B. Now, to deliver M, A, B must receive the new view {A,B} first. A B C M ack ? M Extended virtual synchrony

Dealing with partitions Assume A was sending a safe message M, and the configuration changed to {A, B, C} → {A, B}, {C}. Case 2. If C ack’ed M and also received acks from A and B before the partition, then C will deliver M before it receives the new view {C}. A B C M ack M Otherwise, C will ignore message M as spurious without contradicting any guarantee.

Continuing operation in spite of a partition Electronic Town Hall A B C D E Polling booth Votes are counted manually at each booth. The count of each vote is multicast to every other booth, so that the latest count is displayed at each booth. Apparently, if a single wire breaks, the counting will stop. However, it is silly. Counting could easily continue at the individual booths, and the results could be merged later. This is the Transis approach: supporting operations within partitions whenever possible. Polling booth