NETWORK ALGORITHMS
Presenter: Kurchi Subhra Hazra

Agenda
- Basic algorithms such as leader election
- Consensus in distributed systems
- Replication and fault tolerance in distributed systems
- GFS as an example of a distributed system

Network Algorithms
A distributed system is a collection of entities where
- each entity is autonomous, asynchronous and failure-prone,
- entities communicate through unreliable channels,
- in order to perform some common function.
Network algorithms enable such distributed systems to effectively perform these "common functions".

Global State in Distributed Systems
We want to estimate a "consistent" global state of a distributed system.
Required for determining whether the system is deadlocked or has terminated, and for debugging.
Two approaches:
1. Centralized - all processes and channels report to a central process
2. Distributed - the Chandy-Lamport algorithm

Chandy-Lamport Algorithm
Based on marker messages M.
On receiving M over channel c:
If own state is not yet recorded:
  a) Record own state
  b) Start recording the state of all incoming channels
  c) Send marker messages on all outgoing channels
Else:
  a) Record the state of c (the messages received on c since recording started)
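A minimal Python sketch of this marker rule, assuming per-process channel ids and simple state/send helpers (all names here are illustrative, not from the original slides):

```python
# Hypothetical sketch of the Chandy-Lamport marker rule for one process.
# state_fn() returns the process's local state; send_fn(channel, msg) delivers a message.

class SnapshotProcess:
    def __init__(self, in_channels, out_channels, state_fn, send_fn):
        self.in_channels = set(in_channels)
        self.out_channels = list(out_channels)
        self.state_fn, self.send_fn = state_fn, send_fn
        self.recorded_state = None   # own recorded state
        self.channel_state = {}      # channel id -> messages recorded on it
        self.recording = set()       # channels currently being recorded

    def on_marker(self, channel=None):
        """channel=None means this process initiates the snapshot itself."""
        if self.recorded_state is None:
            self.recorded_state = self.state_fn()            # a) record own state
            if channel is not None:
                self.channel_state[channel] = []             # state(c) is empty
            self.recording = self.in_channels - {channel}    # b) start recording other channels
            for c in self.out_channels:
                self.send_fn(c, "MARKER")                    # c) forward markers
        else:
            self.recording.discard(channel)                  # record state of c: stop recording it

    def on_message(self, channel, msg):
        """Application messages arriving while a channel is being recorded."""
        if channel in self.recording:
            self.channel_state.setdefault(channel, []).append(msg)
```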

Chandy-Lamport Algorithm - Example (three processes P1, P2, P3; a and b are application messages in flight)
1. P1 initiates the snapshot: records its state (S1), sends markers to P2 and P3, turns on recording for channels Ch21 and Ch31
2. P2 receives the marker over Ch12, records its state (S2), sets state(Ch12) = {}, sends markers to P1 and P3, turns on recording for channel Ch32
3. P1 receives the marker over Ch21, sets state(Ch21) = {a}
4. P3 receives the marker over Ch13, records its state (S3), sets state(Ch13) = {}, sends markers to P1 and P2, turns on recording for channel Ch23
5. P2 receives the marker over Ch32, sets state(Ch32) = {b}
6. P3 receives the marker over Ch23, sets state(Ch23) = {}
7. P1 receives the marker over Ch31, sets state(Ch31) = {}
Taken from CS 425/UIUC/Fall 2009

Leader Election
Suppose you want to
- elect a master server out of n servers
- elect a coordinator among different mobile systems
Common leader election algorithms:
- Ring election
- Bully election
Two requirements:
- Safety (the process with the best attribute value is elected)
- Liveness (the election terminates)

Ring Election
Processes are organized in a ring.
A process starts an election by sending a message clockwise to the next process in the ring, carrying its id and attribute value.
The next process checks the election message:
a) If its own attribute value is greater, it replaces the id and value in the message with its own and forwards it.
b) If its own attribute value is less, it simply passes the message on.
c) If the message carries its own id, it declares itself the leader and passes on an "elected" message.
What happens when a node fails?
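A minimal sketch of the per-process message-handling rule in Python, assuming for simplicity that a process's id doubles as its attribute value and that `forward` sends to the clockwise neighbour (both assumptions are for illustration only):

```python
# Hypothetical ring-election handler for one process; messages are plain dicts.

def on_ring_message(my_id, msg, forward):
    if msg["type"] == "ELECTION":
        if my_id > msg["id"]:
            forward({"type": "ELECTION", "id": my_id})  # own value is better: replace it
        elif my_id < msg["id"]:
            forward(msg)                                # pass the stronger candidate along
        else:
            forward({"type": "ELECTED", "id": my_id})   # own id came back: we are the leader
    elif msg["type"] == "ELECTED" and msg["id"] != my_id:
        forward(msg)                                    # propagate the election result
```

On the slide's question: a failed node silently breaks the ring, so practical variants detect the failure (e.g. by timeout), repair the ring by skipping the failed node, and re-run the election.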

Ring Election - Example Taken from CS 425/UIUC/Fall 2009

Ring Election - Example Taken from CS 425/UIUC/Fall 2009

Bully Algorithm Best case and worst case scenarios Taken from CS 425/UIUC/Fall 2009

Consensus
A set of n processes/systems attempts to "agree" on some information.
Each process Pi begins in the undecided state and proposes a value vi ∈ D.
The Pi communicate by exchanging values.
Each Pi sets its decision value di and enters the decided state.
Requirements:
1. Termination: eventually all correct processes decide, i.e., each correct process sets its decision variable
2. Agreement: the decision value of all correct processes is the same
3. Integrity: if all correct processes proposed v, then any correct decided process has di = v

2 Phase Commit Protocol (2PC)
Useful in distributed transactions to perform an atomic commit.
Atomic commit: a set of distinct changes applied in a single operation.
Suppose A transfers $300 from A's account to B's account:
A = A - 300
B = B + 300
Either both operations take effect or neither does; this must be guaranteed for consistency.
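A minimal 2PC coordinator sketch in Python for the transfer above. The `Participant` class and its method names are assumptions for illustration; a real implementation also logs decisions and handles timeouts:

```python
# Hypothetical 2PC sketch: the coordinator commits only if every participant votes yes.

class Participant:
    def __init__(self, name, willing=True):
        self.name, self.willing = name, willing
    def can_commit(self, txn):             # phase 1: vote
        return self.willing
    def do_commit(self, txn):              # phase 2: commit
        print(f"{self.name}: COMMIT {txn}")
    def do_abort(self, txn):               # phase 2: abort
        print(f"{self.name}: ABORT {txn}")

def two_phase_commit(txn, participants):
    votes = [p.can_commit(txn) for p in participants]   # phase 1: CanCommit?
    if all(votes):                                       # phase 2: doCommit / doAbort
        for p in participants:
            p.do_commit(txn)
        return "committed"
    for p in participants:
        p.do_abort(txn)
    return "aborted"

# The $300 transfer touches both accounts atomically: either both apply or neither.
print(two_phase_commit("transfer $300 A->B",
                       [Participant("A (debit 300)"), Participant("B (credit 300)")]))
```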

2 Phase Commit Protocol
What happens if the coordinator and a participant fail after doCommit?

Issue with 2PC
The coordinator sends CanCommit? to participants A and B.

Issue with 2PC
Participants A and B reply Yes.

Issue with 2PC
The coordinator sends doCommit. A crashes; the coordinator crashes; B commits.
A new coordinator cannot know whether A had committed.

3 Phase Commit Protocol (3PC)
Uses an additional stage (preCommit) between the vote and the final commit.

3PC Cont…
Message flow between the coordinator and cohorts 1-3:
canCommit -> ack -> preCommit -> ack -> commit

3PC Cont…
Why is this better?
2PC: execute the transaction when everyone is willing to COMMIT it.
3PC: execute the transaction when everyone knows it will COMMIT.
But 3PC is expensive, and timeouts are triggered by slow machines.
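A sketch of the extra round, using the same illustrative style as the 2PC example above (the `Cohort` class and its method names are assumptions; timeouts and recovery are omitted). The point of preCommit is that every cohort learns the decision before anyone commits:

```python
# Hypothetical 3PC coordinator sketch (no timeouts or crash recovery shown).

class Cohort:
    def __init__(self, name): self.name = name
    def can_commit(self, txn): return True               # phase 1 vote
    def pre_commit(self, txn): return True                # phase 2 acknowledgement
    def do_commit(self, txn): print(f"{self.name}: COMMIT {txn}")
    def do_abort(self, txn): print(f"{self.name}: ABORT {txn}")

def three_phase_commit(txn, cohorts):
    if not all(c.can_commit(txn) for c in cohorts):       # canCommit?
        for c in cohorts: c.do_abort(txn)
        return "aborted"
    if not all(c.pre_commit(txn) for c in cohorts):       # preCommit: spread the decision
        for c in cohorts: c.do_abort(txn)
        return "aborted"
    for c in cohorts:                                     # commit: everyone already knows
        c.do_commit(txn)
    return "committed"

print(three_phase_commit("txn", [Cohort("Cohort 1"), Cohort("Cohort 2"), Cohort("Cohort 3")]))
```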

Paxos Protocol
A consensus algorithm.
Important safety conditions:
- Only one value is chosen
- Only a proposed value is chosen
Important liveness conditions:
- Some proposed value is eventually chosen
- Once a value is chosen, a process can eventually learn the value
Nodes act as Proposers, Acceptors and Learners.

Paxos Protocol – Phase 1
The proposer selects a number n for a proposal with value v and sends a Prepare message to the acceptors.
Reaching a majority of acceptors is enough; not every acceptor has to respond.
Each acceptor responds with an acknowledgement reporting the highest n it has seen.

Paxos Protocol – Phase 2
A majority of acceptors agree on proposal n with value v.

Paxos Protocol – Phase 2
Once a majority of acceptors agree on proposal n with value v, the proposer sends an Accept request and the acceptors accept it.
What if v is null?
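A minimal single-decree acceptor sketch in Python covering both phases, following the rules of Lamport's "Paxos Made Simple" (proposal numbers are assumed globally unique; learners and the proposer's retry logic are omitted):

```python
# Hypothetical single-instance Paxos acceptor.

class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest proposal number promised so far
        self.accepted_n = -1      # number of the proposal actually accepted, if any
        self.accepted_v = None

    def on_prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted proposal so the proposer must reuse its value.
            return ("promise", self.accepted_n, self.accepted_v)
        return ("reject", self.promised_n, None)

    def on_accept(self, n, v):
        """Phase 2b: accept unless a higher-numbered prepare has been promised since."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return ("accepted", n, v)
        return ("reject", self.promised_n, None)
```

On the question above: if none of the promises from the majority reports a previously accepted value (v is null), the proposer is free to pick its own value; otherwise it must re-propose the value of the highest-numbered accepted proposal. This rule is what keeps "only one value is chosen" true.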

Paxos Protocol Cont…
What if an arbitrary number of proposers are allowed?
Two proposers P and Q can pre-empt each other at the acceptors: P's proposal n1 in round 1 is superseded by Q's proposal n2 in round 2.

Paxos Protocol Cont…
What if an arbitrary number of proposers are allowed?
P and Q keep issuing higher-numbered proposals (n3 in round 3, n4 in round 4, ...) and no value is ever chosen.
To ensure progress, use a distinguished proposer.

Paxos Protocol Contd…
Some issues:
a) How to choose the proposer?
b) How do we ensure a unique n?
c) Expensive protocol
d) No primary if a distinguished proposer is used
Originally used by Paxons to run their part-time parliament.

Replication
Replication is important for:
1. Fault tolerance
2. Load balancing
3. Increased availability
Requirements:
1. Transparency
2. Consistency

Failure in Distributed Systems An important consideration in every design decision Fault detectors should be : a) Complete – should be able to detect a fault when it occurs b) Accurate – Does not raise false positives

Byzantine Faults
Arbitrary messages and transitions.
Causes: e.g., software bugs, malicious attacks.
Byzantine agreement problem: "Can a set of concurrent processes achieve coordination in spite of the faulty behavior of some of them?"
The concurrent processes could be replicas in a distributed system.

Practical Byzantine Fault Tolerance (PBFT)

PBFT Cont…
Message phases between the client C and replicas R1-R4 (R1 is the primary replica): request, pre-prepare, prepare, commit, reply.
The client blocks and waits for f+1 replies.
A replica is prepared after accepting 2f prepares.
Execution happens after 2f+1 commits.
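The quorum sizes above follow from tolerating f Byzantine replicas out of n = 3f + 1. A small arithmetic sketch (no networking, just the thresholds named on the slide; the dictionary keys are illustrative):

```python
# PBFT quorum arithmetic for n = 3f + 1 replicas.

def pbft_thresholds(f):
    n = 3 * f + 1
    return {
        "replicas (n)": n,
        "client reply quorum": f + 1,   # f+1 matching replies cannot all come from faulty replicas
        "prepared after": 2 * f,        # 2f prepares matching the pre-prepare
        "execute after": 2 * f + 1,     # 2f+1 commits
    }

print(pbft_thresholds(1))   # tolerating a single faulty replica needs 4 replicas
```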

PBFT Cont…
The algorithm provides:
-> Safety, by guaranteeing linearizability; pre-prepare and prepare ensure a total order on messages.
-> Liveness, by providing for a view change when the primary replica fails. Here, synchrony is assumed.
How do we know the value of f a priori?

Google File System
Revisits traditional file system design assumptions:
1. Component failures are the norm
2. Multi-GB files are common
3. Files are mutated by appending new data
4. Relaxed consistency model

GFS Architecture
(Architecture diagram: leader election / replication; the master maintains metadata - namespace, chunk metadata, etc.)

GFS – Relaxed Consistency

GFS – Design Issues
Single master
Rationale: keep things simple
Problems:
1. Increasing volume of underlying storage -> increase in metadata
2. Clients not as fast as the master server -> the master server became a bottleneck
Current: multiple masters per data center
Ref:

GFS – Design Issues
Replication of chunks:
a) Replication across racks - the default replica count is 3
b) Allowing concurrent changes to the same file -> in retrospect, they would rather have a single writer
c) The primary replica serializes mutations to chunks - no consensus protocol is run before applying mutations to the chunks
Ref:

THANK YOU