Fault tolerance and related issues in distributed computing. Shmuel Zaks, GSSI - Feb 2016.

Presentation transcript:


Haifa

CS, Technion

Part 0: Distributed computing – an overview: basic notions; seminar focus: from lower bounds, via impossibility, to fault tolerance and self-stabilization.
Part 1: Lower bounds
Part 2: Computing in spite of faults - impossibility of consensus
Part 3: Detecting faults - the snapshot algorithm
Part 4: Self-stabilization - self-recovery from faults

Part 0: An overview
Part 1: Lower bounds
Part 2: Computing in spite of faults
Part 3: Detecting faults
Part 4: Self-stabilization

A. The model. Communication network: processors, communication, problem.

Anonymous

Unique identities (figure: nodes labeled 12, a, e, 6, c)

Message passing: communication lines (channels); topology; communication. (Figure: network with nodes a, b, c, d, e.)

(message passing) Directed, undirected.

(message passing) Message delivery mechanism: fifo; reliable, no faults; finite, arbitrary delay; queues of messages.

(message passing) Distributed algorithm, protocol: send a message, receive a message, do local computation. Execution.
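These three primitives can be sketched as a minimal event-driven simulation. This is an illustrative sketch only: the `Process`/`Network` names and the ping/pong exchange are assumptions, not part of the seminar.

```python
from collections import deque

class Process:
    """A processor: reacts to a received message with local computation
    and possibly sends messages of its own."""
    def __init__(self, pid):
        self.pid = pid
        self.log = []                        # record of received messages

    def on_receive(self, sender, msg, net):
        self.log.append((sender, msg))       # do local computation
        if msg == "ping":
            net.send(self.pid, sender, "pong")   # send a message

class Network:
    """Asynchronous message passing: a queue of undelivered messages."""
    def __init__(self, processes):
        self.procs = {p.pid: p for p in processes}
        self.queue = deque()

    def send(self, src, dst, msg):
        self.queue.append((src, dst, msg))

    def run(self):
        while self.queue:                    # deliver until quiescence
            src, dst, msg = self.queue.popleft()
            self.procs[dst].on_receive(src, msg, self)   # receive a message

a, b = Process("a"), Process("b")
net = Network([a, b])
net.send("a", "b", "ping")
net.run()
```

An execution is then simply the delivery order chosen by the queue; different delivery orders give the different executions of the asynchronous model.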

Shared memory. (Figure: processors a, b, c, d, e communicating through registers R1, R2, R3, R4, R5.)

(shared memory) Read/write registers A, B.

Synchronization: synchronous, asynchronous.

(synchronization) Asynchronous model: a message sent from i at network time t arrives at j at time t+??? (the delay is finite but arbitrary).

(synchronization) Synchronous model: a message sent from i at network time t arrives at j at time t+d.

(synchronization) Synchronous model: unique execution, rounds. Asynchronous model: many executions.

(synchronization) The combinations: {synchronous, asynchronous} x {shared memory, message passing}.

Asynchronous model: for correctness, for upper bound analysis. Synchronous model: for lower bound analysis.

Topology: Ring.

(Topology) Clique.

(Topology) General.

(Topology) Why simple networks? They enable the understanding of many design issues. In existing general networks, assume a virtual simple network (e.g. a ring) is implemented.

Complexity measures. Synchronous system: time. Asynchronous system: communication. Measures: communication (messages, bits); time (synchronous time, longest chain, bounded delay).

Parallel vs. distributed computing. Parallel computing: given a problem … (ex: sorting). Distributed computing: given a network … (ex: broadcast).

(Parallel vs. distributed computing) Parallel computing: time vs. number of processors. Distributed computing: number of messages. Complexity goals: efficiency for parallel computing, correctness for distributed computing.

b. Problems. A problem (task) specifies an input/output relation for the processors P1, P2, P3, … Examples: leader election (one processor outputs yes, the others no), consensus.

Issues: design and analysis of algorithms; impossibility, lower bounds; fault tolerance.

Problems: broadcast; snapshot; consensus; shortest path, maximal flow; leader election, breaking symmetry, maximum finding, spanning tree, center; termination; deadlock.

Example: broadcast. (Figure: network with nodes a, b, c, d, e, f.)

Broadcast: bfs (breadth-first search).

Broadcast: dfs (depth-first search).

Message complexity: each edge carries exactly one message in each direction, so the message complexity is 2|E|.

Time complexity: synchronous time 2|E|; longest chain 2|E|; bounded delay 2|E|.

pi (propagation of information), shout-echo.

Algorithm pi (propagation of information)
Initiator: send m to each neighbour; stop.
Any other processor, on first receiving m along edge e: send m on all edges except e; stop.

pi Theorem: The following holds for every execution of the pi algorithm:
A processor forwards the message m at most once.
The execution terminates.
Each processor receives the message m.
The edges on which processors first receive m form a spanning tree.
The message complexity is 2|E|-|V|+1.
The time complexity …
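A small simulation makes the message-complexity claim concrete. This is a sketch; the six-node graph below is an arbitrary example, not one from the seminar.

```python
from collections import deque

def pi_broadcast(graph, root):
    """Flood m from root; return the message count and the tree edges."""
    parent = {}                           # first-receipt edges
    received = {root}
    messages = 0
    queue = deque()
    for v in graph[root]:                 # initiator: send m to each neighbour
        queue.append((root, v))
        messages += 1
    while queue:
        u, v = queue.popleft()            # v receives m along edge (u, v)
        if v not in received:             # act only on the first receipt
            received.add(v)
            parent[v] = u
            for w in graph[v]:
                if w != u:                # send m on all edges except e
                    queue.append((v, w))
                    messages += 1
    return messages, parent

# an arbitrary connected graph with |V| = 6 and |E| = 8
graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b", "e"],
         "d": ["b", "e", "f"], "e": ["c", "d", "f"], "f": ["d", "e"]}
msgs, parent = pi_broadcast(graph, "a")
```

The count matches the theorem: the root sends deg(root) messages and every other node sends deg(v)-1, which sums to 2|E|-|V|+1 (here 11), and the 5 first-receipt edges span all 6 nodes.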

pif (propagation of information with feedback), shout-echo.

c. In this seminar. Distributed algorithms: “positive” results (design, analysis, upper bounds) and “negative” results (lower bounds, impossibility).

Part 1: Lower bounds. Leader election: given inputs to P1, P2, P3, …, exactly one processor outputs yes and the others output no.

Leader election: message passing, asynchronous.

We’ll see: a lower bound of Ω(n log n) messages.
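For contrast with the bound, here is a sketch of the classic Chang-Roberts election on a unidirectional ring, which elects the maximum id with O(n log n) messages on average but O(n^2) in the worst case (Hirschberg-Sinclair matches the bound with O(n log n) worst-case messages). The ids below are arbitrary, and the round-based simulation is an illustrative assumption.

```python
# Chang-Roberts: every node sends its id clockwise; ids smaller than the
# receiver's own are swallowed, larger ones are forwarded; the id that
# survives a full circle belongs to the leader.
def chang_roberts(ids):
    n = len(ids)
    # round 0: node i's id is in transit to its clockwise neighbour
    in_transit = [[ids[i]] for i in range(n)]
    messages = n
    leader = None
    while leader is None:
        nxt = [[] for _ in range(n)]
        for i in range(n):
            r = (i + 1) % n                  # receiver of what node i sent
            for v in in_transit[i]:
                if v == ids[r]:
                    leader = v               # own id came full circle
                elif v > ids[r]:
                    nxt[r].append(v)         # forward the larger id
                    messages += 1
        in_transit = nxt
    return leader, messages

leader, msgs = chang_roberts([3, 7, 2, 9, 4])
```

On a ring with ids in decreasing clockwise order, almost every id travels far before being swallowed, which is where the O(n^2) worst case comes from.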

Lower bound and fault tolerance: usually all processors need to compute some function, giving a lower bound of Ω(|E|) messages.

Part 2: Computing in spite of faults. Consensus: the processors P1, P2, P3, …, each with an input, must agree on a common output.

Consensus: message passing, asynchronous. We’ll see: the impossibility of reaching consensus.

Part 3: Detecting faults. Snapshot.

We’ll see: the snapshot algorithm.
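Assuming the algorithm meant is the classic Chandy-Lamport snapshot for FIFO channels, here is a minimal sketch; the money-transfer scenario, the class name, and the two-process run are all illustrative assumptions. Markers cut the computation into a consistent global state: the recorded balances plus the recorded in-flight transfers conserve the total amount.

```python
from collections import deque

MARKER = "marker"

class ChandyLamport:
    """Snapshot with markers on FIFO channels. Local state is an amount
    of money and messages are transfers, so a consistent recorded global
    state must conserve the total amount."""
    def __init__(self, balances, channels):
        self.state = dict(balances)
        self.chan = {c: deque() for c in channels}   # FIFO channels
        self.recorded = {}        # pid -> recorded local state
        self.chan_state = {}      # channel -> messages recorded in flight
        self.marker_in = set()    # channels on which the marker arrived

    def transfer(self, u, v, amount):
        self.state[u] -= amount
        self.chan[(u, v)].append(amount)

    def record(self, p):
        """p records its state and puts a marker on every outgoing channel."""
        self.recorded[p] = self.state[p]
        for (u, v), q in self.chan.items():
            if u == p:
                q.append(MARKER)

    def deliver(self, u, v):
        msg = self.chan[(u, v)].popleft()
        if msg == MARKER:
            self.marker_in.add((u, v))
            self.chan_state.setdefault((u, v), [])
            if v not in self.recorded:
                self.record(v)               # first marker: record now
        else:
            self.state[v] += msg
            if v in self.recorded and (u, v) not in self.marker_in:
                # delivered after v recorded but before the marker: it was
                # in flight at snapshot time, so it is channel state
                self.chan_state.setdefault((u, v), []).append(msg)

snap = ChandyLamport({"p": 5, "q": 3}, [("p", "q"), ("q", "p")])
snap.transfer("p", "q", 2)   # 2 is in flight from p to q
snap.record("p")             # p initiates the snapshot
snap.transfer("q", "p", 1)   # this transfer crosses the snapshot cut
snap.deliver("p", "q")       # q receives the 2
snap.deliver("p", "q")       # q receives the marker and records itself
snap.deliver("q", "p")       # p receives the 1: goes into the channel state
snap.deliver("q", "p")       # p receives q's marker: snapshot complete
```

The recorded state (p: 3, q: 4, plus the 1 in flight from q to p) sums to 8, the total amount in the initial configuration, even though no instant of the actual execution looked exactly like it.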

Part 4: Self-stabilization. Example: clock synchronization.


Let’s try …

But …

We’ll see: self-stabilizing algorithms, proofs and performance analysis.
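As a sketch of the flavour of such algorithms (this max-plus-one rule is an assumption for illustration, not necessarily the seminar's algorithm): in a synchronous network, if every processor repeatedly sets its clock to one plus the largest clock in its closed neighbourhood, then whatever the initial corrupted values, all clocks agree after diameter-many rounds and stay in agreement.

```python
# Max-plus-one rule: in each synchronous round every processor reads its
# neighbours' clocks and sets its own to one past the largest it can see.
def step(graph, clocks):
    return {v: 1 + max(clocks[v], *(clocks[u] for u in graph[v]))
            for v in graph}

ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
clocks = {0: 7, 1: 0, 2: 12, 3: 3, 4: 5}   # arbitrary corrupted clocks
for _ in range(2):                          # diameter of the 5-ring is 2
    clocks = step(ring, clocks)
# all clocks now agree, whatever the initial corruption
```

After t rounds each clock equals t plus the maximum initial value within distance t, so once t reaches the diameter every clock shows the same value: convergence from any starting state, which is exactly the self-stabilization property.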