Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.

Slides:

Advertisements

Similar presentations

Fault Tolerance. Basic System Concept Basic Definitions Failure: deviation of a system from behaviour described in its specification. Error: part of.

Advertisements

Timed Distributed System Models  A. Mok 2014 CS 386C.

Impossibility of Distributed Consensus with One Faulty Process

DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST Prof Philippas Tsigas Distributed Computing and Systems Research Group.

6.852: Distributed Algorithms Spring, 2008 Class 7.

Distributed Systems Overview Ali Ghodsi

An evaluation of ring-based algorithms for the Eventually Perfect failure detector class Joachim Wieland Mikel Larrea Alberto Lafuente The University of.

Failure detector The story goes back to the FLP’85 impossibility result about consensus in presence of crash failures. If crash can be detected, then consensus.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS by Jyrki Nummenmaa.

UPV / EHU Efficient Eventual Leader Election in Crash-Recovery Systems Mikel Larrea, Cristian Martín, Iratxe Soraluze University of the Basque Country,

Byzantine Generals Problem: Solution using signed messages.

Failure Detectors. Can we do anything in asynchronous systems? Reliable broadcast –Process j sends a message m to all processes in the system –Requirement:

Ordering and Consistent Cuts Presented By Biswanath Panda.

CS 582 / CMPE 481 Distributed Systems Fault Tolerance.

 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.

Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )

Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.

Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison-Wesley Copyright © George Coulouris, Jean Dollimore, Tim.

Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.

1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.

Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.

Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 2 – Distributed Systems.

 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Impossibility.

Aran Bergman, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Recitation 5: Reliable.

Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach.

Systems of Distributed systems Module 2 - Distributed algorithms Teaching unit 2 – Properties of distributed algorithms Ernesto Damiani University of Bozen.

Distributed Systems Terminating Reliable Broadcast Prof R. Guerraoui Distributed Programming Laboratory.

 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.

Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de.

1 Principles of Reliable Distributed Systems Recitation 7 Byz. Consensus without Authentication ◊S-based Consensus Spring 2008 Alex Shraer.

Composition Model and its code. bound:=bound+1.

Consensus and Related Problems Béat Hirsbrunner References G. Coulouris, J. Dollimore and T. Kindberg "Distributed Systems: Concepts and Design", Ed. 4,

Distributed Systems – CS425/CSE424/ECE428 – Fall Nikita Borisov — UIUC1.

1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:

Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.

9/14/20151 Lecture 18: Distributed Agreement CSC 469H1F / CSC 2208H1F Fall 2007 Angela Demke Brown.

Failure detection and consensus Ludovic Henrio CNRS - projet OASIS Distributed Algorithms.

Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.

Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.

Chapter 14 Asynchronous Network Model by Mikhail Nesterenko “Distributed Algorithms” by Nancy A. Lynch.

Consensus and Its Impossibility in Asynchronous Systems.

Distributed System Models (Fundamental Model). Architectural Model Goal Reliability Manageability Adaptability Cost-effectiveness Service Layers Platform.

University of Tampere, CS Department Distributed Commit.

1 © R. Guerraoui Regular register algorithms R. Guerraoui Distributed Programming Laboratory lpdwww.epfl.ch.

Agenda Fail Stop Processors –Problem Definition –Implementation with reliable stable storage –Implementation without reliable stable storage Failure Detection.

Prepared By: Md Rezaul Huda Reza

CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.

Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.

Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.

Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.

UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department

Building Dependable Distributed Systems, Copyright Wenbing Zhao

SysRép / 2.5A. SchiperEté The consensus problem.

1 © R. Guerraoui Distributed algorithms Prof R. Guerraoui Assistant Marko Vukolic Exam: Written, Feb 5th Reference: Book - Springer.

Alternating Bit Protocol S R ABP is a link layer protocol. Works on FIFO channels only. Guarantees reliable message delivery with a 1-bit sequence number.

Distributed Agreement. Agreement Problems High-level goal: Processes in a distributed system reach agreement on a value Numerous problems can be cast.

Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.

Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.

1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.

EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University

Distributed Consensus

Agreement Protocols CS60002: Distributed Systems

Outline Announcements Fault Tolerance.

EEC 688/788 Secure and Dependable Computing

Distributed systems Reliable Broadcast

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Distributed systems Consensus

Distributed Systems Terminating Reliable Broadcast

M. Mock and E. Nett and S. Schemmer

Presentation transcript:

Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems

Reliable distributed services Example 1: reliable broadcast –Ensure that a message sent to a group of processes is received (delivered) by all or none Example 2: atomic commit –Ensure that the processes reach a common decision on whether to commit or abort a transaction

Underlying services Processes (abstracting computers) Channels (abstracting networks) Failure detectors (abstracting time)

Processes (1) The distributed system is made of a finite set of processes: each process models a sequential program Processes are denoted by p 1,..p N or p, q, r Processes have unique identities and know each other Every pair of processes is connected by a link through which the processes exchange messages

Processes (2) A process executes a step at every tick of its local clock: –a step consists of a local computation (local event) and message exchanges with other processes (global event) One message is delivered from/sent to a process per step

Processes (3) The program of a process is made of a finite set of modules (or components) organized as a software stack Modules within the same process interact by exchanging events upon event do –// something –trigger

Modules of a process

Approach Specifications –What is the service?  i.e., the problem ~ liveness + safety Assumptions –What is the model?  i.e., the power of the adversary Algorithms –How do we implement the service? What cost?

Liveness and safety (1) Safety is a property which states that nothing bad should happen Liveness is a property which states that something good should happen –Any specification can be expressed in terms of liveness and safety properties (Lamport and Schneider)

Liveness and safety (2) Example: tell the truth –Having to say something is liveness –Not lying is safety

Specifications Example 1: reliable broadcast –Ensure that a message sent to a group of processes is received (delivered) by all or none Example 2: atomic commit –Ensure that the processes reach a common decision on whether to commit or abort a transaction

Processes (1) A process either executes the algorithm assigned to it (steps) or fails Two kinds of failures are mainly considered –Omissions: the process omits to send messages it is supposed to send (distracted) –Arbitrary: the process sends messages it is not supposed to send (malicious or Byzantine)

Processes (2) Crash-stop: a more specific case of omissions –A process that omits a message to a process, omits all subsequent messages to all processes (permanent distraction): it crashes

Processes (3) By default, we shall assume a crash-stop model throughout this course; that is, unless specified otherwise: processes fail only by crashing (no recovery) A correct process is a process that does not fail (that does not crash)

Processes/Channels Processes communicate by message passing through communication channels Messages are uniquely identified and the message identifier includes the sender’s identifier

Fair-loss links Fair loss: if a message is sent infinitely often by p i to p j, and neither p i or p j crashes, then m is delivered infinitely often by p j Finite duplication: if a message is sent a finite number of times by p i to p j, it is delivered a finite number of times by p j No creation: no message is delivered unless it was sent

Stubborn links Stubborn delivery: if a process p i sends a message m to a correct process p j, and p i does not crash, then p j delivers m an infinite number of times No creation: No message is delivered unless it was sent

Reliable (Perfect) links Properties –Validity: if p i and p j are correct, then every message sent by p i to p j is eventually delivered by p j –No duplication: no message is delivered (to a process) more than once –No creation: no message is delivered unless it was sent

Reliable links We shall assume reliable links (also called perfect) throughout this course (unless specified otherwise) Roughly speaking, reliable links ensure that messages exchanged between correct processes are not lost

Failure detection (1) A failure detector is a distributed oracle that provides processes with suspicions about crashed processes It is implemented using (i.e., it encapsulates) timing assumptions According to the timing assumptions, the suspicions can be accurate or not

Failure detection (2) A failure detector module is defined by events and properties Events –Indication: Properties –Completeness –Accuracy

Failure detection (3) Perfect –Strong completeness: eventually, every process that crashes is permanently suspected by every correct process –Strong accuracy: no process is suspected before it crashes Eventually perfect –Strong completeness –Eventual strong accuracy: eventually, no correct process is ever suspected

Failure detection (4) Implementation –Processes periodically exchange heartbeat messages –A process sets a timeout based on worst case round trip of a message exchange –A process suspects another process if it timeouts that process –A process that delivers a message from a suspected process revises its suspicion and increases its time-out

Timing assumptions Synchronous –Processing: the time it takes for a process to execute a step is bounded and known –Delays: there is a known upper bound limit on the time it takes for a message to be received –Clocks: the drift between a local clock and the global real time clock is bounded and known Eventually synchronous: the timing assumptions hold eventually Asynchronous: no assumption

Algorithms (1)

Algorithms (2)

The rest; for every abstraction We assume a crash-stop system with a perfect failure detector (fail-stop) –We give algorithms We try to make a weaker assumption –We revisit the algorithms FINE