Distributed Algorithms: Asynch R/W SM Computability Eli Gafni, UCLA Summer Course, CRI, Haifa U, Israel.


Computational Models Serial Von Neumann –Turing Machines, RAM –PRAM Interleaving –ND Turing Machine - there exists an accepting path –Asynch Distributed - must be correct for all paths

Message-Passing Interleaving N processors, each with its own program of the type “upon receiving message X in state Z do Y”. Y may involve sending messages to certain other processors. A special message type ``start’’ can be received (or acted upon) only as the first message.

MP Cont’ed ``Configuration’’ - messages in the ``ether’’ destined to processors + states of processors. ``Next configuration’’ - choose a message from the ``ether’’ and deliver it to its destination processor. Run the ``upon’’ procedure: change the processor’s state, and place new messages in the ``ether’’. Initial configuration - a start message in the ``ether’’ for each processor. Processors in initial state.
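
The configuration/next-configuration semantics above can be made executable. The following is a minimal sketch, not from the slides; the names (Processor, run, ping) are illustrative. The scheduler repeatedly picks an arbitrary message from the ether and delivers it, honoring the rule that ``start’’ must be a processor’s first message.

```python
import random

class Processor:
    def __init__(self, pid, upon):
        self.pid = pid
        self.state = "initial"
        self.started = False
        self.upon = upon  # upon(state, msg) -> (new_state, [(dest_pid, msg), ...])

    def receive(self, msg):
        if msg == "start":
            self.started = True
        self.state, outgoing = self.upon(self.state, msg)
        return outgoing

def run(procs):
    # Initial configuration: a ``start'' message in the ether for each processor.
    ether = [(pid, "start") for pid in procs]
    while ether:
        # ``start'' may only be acted upon as a processor's first message.
        deliverable = [i for i, (d, m) in enumerate(ether)
                       if m == "start" or procs[d].started]
        # Next configuration: deliver an arbitrary deliverable message,
        # run ``upon'', and place the newly sent messages in the ether.
        dest, msg = ether.pop(random.choice(deliverable))
        ether.extend(procs[dest].receive(msg))
    return {pid: p.state for pid, p in procs.items()}
```

Any program whose message exchange is finite terminates under every interleaving; correctness of a distributed algorithm in this model means the final states are acceptable for all such interleavings.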

Full-Information The history of a processor determines its state. History contains more info than ``state.’’ Processors’ programs are ``common knowledge.’’ W.l.o.g., when interested in computability rather than efficiency, a message is the full history of the sender.

Computability (Problems/Tasks) Each processor eventually halts with an ``output’’. Some n-tuples of outputs are valid, some are not. To prevent ``default’’ solutions, the valid output tuples are parametrized by the set of processors that ``started.’’

Fail-Free Model Every message is eventually delivered, and all processors eventually respond. All problems are solvable: –Receive/send messages to/from all –Determine the set of ``starters’’ –Apply the default output for that set of starters

Faults Communication: ``channel’’ goes down. Processor: Fail-Stop, Byzantine. Many other faults possible: Discuss.

Communication Synchronous: Proceed by ``rounds’’. Every message sent at the beginning of a round is received by the end of the round. Assume 2 processors / 2 channels, and one of the channels may fail to deliver.

Consensus Procs vote ``attack/retreat.’’ If both vote the same and there is no communication failure, both eventually decide their vote. Else they both eventually decide either ``attack’’ or ``retreat.’’

2 procs synch cons w/ a single channel failure per round: Impossible. Cannot do it with no communication. Cannot do it in 1 round: –A_1 that does not receive must decide A, since its view is compatible with the no-fault run with A_2. –Similarly R_2 must decide R. –Say w.l.o.g. the R_2/A_1 no-fault run decides R. –Fail the message to A_1 to get a contradiction.

Impossibility Cont’ed In general, A_1 has to decide A in a run with A_2, whether it receives or does not receive the last message. So it must have committed to A at the end of the previous round, and so has A_2; hence the last round is unnecessary. Contradiction.

N>2 No alg even for N procs when N-1 channels may fail - reduce from 2 procs: one proc emulates N-1 of the procs and the other emulates the remaining one. Alg when fewer than N-1 channels fail: –Send input to all, receive –Send all inputs received to all, receive –Repeat until you have heard the inputs of all: decide. –(prove liveness…)
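
A round-by-round flooding algorithm of this flavor can be sketched in code. The version below is illustrative, not the slides’ exact algorithm: it runs in the synchronous crash model of the later slides, with at most one crashing proc whose messages in its crash round reach only a chosen subset, and it decides the minimum input heard of after t+1 rounds.

```python
def flooding_consensus(inputs, t, crash_round=-1, crashed=-1, reaches=()):
    # inputs: one value per proc. For illustration, at most one proc crashes,
    # in round `crash_round`; its messages that round reach only `reaches`.
    n = len(inputs)
    known = [{inputs[i]} for i in range(n)]   # values each proc has heard of
    alive = set(range(n))
    for rnd in range(t + 1):                  # t+1 rounds suffice for t crashes
        msgs = {p: set(known[p]) for p in alive}
        for q in range(n):
            for p in alive:
                if p == crashed and rnd == crash_round and q not in reaches:
                    continue                  # message lost in the crash round
                known[q] |= msgs[p]
        if rnd == crash_round:
            alive.discard(crashed)
    # decide on, say, the minimum value heard of
    return {p: min(known[p]) for p in alive}
```

Even when the crashing proc’s input reaches only one survivor, the extra round lets that survivor relay it, so all survivors decide the same value; this is the ``clean round’’ phenomenon discussed below.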

From Synch to ``Asynch’’ What if the faults are fewer than N-1 in each round, but the set of faulty channels may ``jump around’’? The correctness of the prev alg does not depend on the faults being static.

Proc fail-stop failure What if synch, no comm failure, but procs fail-stop? What if at most t can fail-stop? Within t+1 rounds there is a ``clean’’ round, after which the view of all inputs is shared by all.

What if a single failure that ``Jumps’’? In each round some or all messages from a SINGLE proc may not be delivered. If N=2 we obviously cannot do cons, since this emulates the channel failure. What if N=3? ---- suspense.

SWMR Async Shared Memory n procs p_1,…,p_n and n cells C_1,…,C_n. Proc p_k writes to C_k and can read all cells, one at a time. Configuration: each proc is at a state that enables writing its cell or reading some cell. A proc is chosen; it reads or writes and changes state; then another is chosen, etc.

Properties of SM If all procs write and then read all cells in arbitrary order, then each p_k returns the set S_k of procs it has seen write: –What is the property of the sets that makes them a realization of a SM execution? –At least one proc sees all; continue inductively –Fat Immediate Snapshots
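
The write-then-scan pattern above is easy to simulate. The sketch below (names illustrative) models each proc as a generator yielding one write or read step at a time, with a seeded scheduler choosing who steps next; in every interleaving each proc sees itself, and the proc whose write happens to be last sees everyone.

```python
import random

def write_then_scan(k, n):
    yield ("write", k)           # p_k writes its id to its own cell C_k
    seen = set()
    for j in range(n):           # then reads all cells, one at a time
        v = yield ("read", j)
        if v is not None:
            seen.add(v)
    return seen                  # S_k: the set of procs p_k has seen write

def run_swmr(n, seed=0):
    cells = [None] * n
    rng = random.Random(seed)
    procs = {}
    for k in range(n):
        it = write_then_scan(k, n)
        procs[k] = (it, next(it))           # each proc's pending step
    results = {}
    while procs:
        k = rng.choice(sorted(procs))       # scheduler picks who steps next
        it, (op, arg) = procs[k]
        try:
            if op == "write":
                cells[k] = arg
                nxt = it.send(None)
            else:                           # "read": answer with cell `arg`
                nxt = it.send(cells[arg])
        except StopIteration as e:
            results[k] = e.value
            del procs[k]
            continue
        procs[k] = (it, nxt)
    return results
```

Note that the resulting sets need not be pairwise comparable: reading cells one at a time allows two scans to miss each other in both directions, which is why the stronger snapshot properties on the next slide need a construction of their own.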

Immediate Snapshots p_k \in S_k (a proc reads itself) S_k \subseteq S_j or vice versa If p_j \in S_k then S_j \subseteq S_k Can you implement IS in SWMR SM? Atomic Snapshot is IS without the last property IS \subseteq AS
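
The three properties above can be checked mechanically for a candidate family of sets. This small checker (illustrative, not from the slides) takes a dict mapping each pid k to S_k:

```python
def is_immediate_snapshot(S):
    for k, Sk in S.items():
        if k not in Sk:                      # self-inclusion: p_k in S_k
            return False
    for k, Sk in S.items():
        for j, Sj in S.items():
            if not (Sk <= Sj or Sj <= Sk):   # comparability
                return False
            if j in Sk and not (Sj <= Sk):   # immediacy: p_j in S_k => S_j in S_k
                return False
    return True
```

For example, {1: {1,2}, 2: {1,2,3}, 3: {1,2,3}} satisfies self-inclusion and comparability, so it is a legal atomic-snapshot outcome, yet it violates immediacy (2 is in S_1 but S_2 is not contained in S_1), which witnesses that the inclusion IS \subseteq AS is strict.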

How do you take AS in the middle of a computation rather than as a task? Use sequence numbers with each new value written; double scan until success. AS as a model - the read operation returns the whole memory rather than a single cell.
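
The double-scan idea can be sketched as follows (class and method names illustrative): each write bumps a per-cell sequence number, and scan() collects the whole memory twice, retrying until the two collects agree, at which point the second collect is a consistent snapshot. This simple version is only obstruction-free; the full wait-free atomic-snapshot construction additionally embeds a scan inside every write, which is beyond this sketch.

```python
class SnapshotMemory:
    def __init__(self, n):
        self.cells = [(0, None)] * n          # (sequence number, value)

    def write(self, k, value):
        seq, _ = self.cells[k]
        self.cells[k] = (seq + 1, value)      # fresh seq number per write

    def scan(self):
        while True:
            first = list(self.cells)          # first collect
            second = list(self.cells)         # second collect
            if first == second:               # double scan succeeded:
                return [v for _, v in second] # no write was interleaved
```

The sequence numbers are what make the equality test sound: without them, a cell rewritten with its old value between the two collects (an ABA) would go undetected.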