Computer Science 425/ECE 428/CSE 424 Distributed Systems (Fall 2009) Lecture 20 Self-Stabilization Reading: Chapter from Prof. Ghosh’s book Klara Nahrstedt.



Acknowledgement
The slides during this semester are based on ideas and material from the following sources:
– Slides prepared by Professors M. Harandi, J. Hou, I. Gupta, N. Vaidya, Y-Ch. Hu, S. Mitra.
– Slides from Professor S. Ghosh’s course at University of Iowa.

Administrative
MP2 posted October 5, 2009, on the course website
– Deadline November 6 (Friday)
– Demonstration, 4-6pm, 11/6/2009
– Tutorial for MP2 planned for October 28 evening if students send questions to TA by October 25. Send requests for what you would like to hear in the tutorial.

Plan for Today
Motivation for Self-Stabilization
Self-Stabilization Concepts/Definitions
Dijkstra’s stabilization of mutual exclusion in unidirectional ring
Chord’s stabilization protocol
Stabilizing graph coloring

Motivation
As the number of computing elements in distributed systems increases, failures become more common
Fault tolerance (FT) should be automatic, without external intervention
Two kinds of fault tolerance
– masking: application layer does not see faults, e.g., redundancy and replication
– non-masking: system deviates, deviation is detected and then corrected, e.g., feedback, roll back and recovery
Self-stabilization is a general technique for non-masking FT in distributed systems

Self-stabilization
Technique for spontaneous healing
Guarantees eventual safety following failures
Feasibility demonstrated by E. Dijkstra (CACM ’74)

Configurations of Distributed Systems
Two classes of configurations (or behaviors)
– Legitimate configuration
» In a non-reactive system, it is represented by an invariant over the global state of the system. Example: legal state of network routing: no cycle in a route between a pair of nodes
» In a reactive system, it is determined by a state predicate and by behavior. Example: in a token ring, the configuration is legitimate when (i) there is exactly one token in the network; (ii) in an infinite behavior of the system, each process receives the token infinitely often.
– Illegitimate configuration
» Example: if a process grasps the token but does not release it, then the first criterion of the legitimate configuration is true, but the second criterion is not satisfied, hence the configuration becomes illegitimate.

Self-stabilizing systems
Recover from any initial configuration to a legitimate configuration in a bounded number of steps, as long as the code is not corrupted
Assumption: failures affect the state (and data) but not the program, since the program itself executes the self-stabilization
Such systems can be deployed ad hoc, and are guaranteed to function properly in bounded time
Guarantee fault tolerance when the mean time between failures (MTBF) >> mean time to recovery (MTTR)
– Stabilization provides a solution when failures are infrequent and temporary malfunctions are acceptable

Reasons for illegal configurations
Transient failures perturb the global state. The ability to spontaneously recover from any initial state implies that no initialization is ever required.
– Example: disappearance of the only circulating token in a token ring; data corruption due to radio interference or power supply variations
Topology changes: the topology of the network changes at run time when a node crashes or a new node is added to the system
– Example: peer-to-peer networks and their churn rate (dynamic networks); see the stabilization protocol in Chord
Environmental changes: the environment of a program may change without notice
– Example: traffic lights in a city may run different programs depending on the volume and distribution of traffic. If the system runs the “early morning program” during the afternoon rush hours, we have an illegal configuration.

Self-stabilizing systems
Self-stabilizing systems exhibit non-masking fault tolerance
They satisfy the following two criteria
– Convergence: regardless of the initial state, the system eventually returns to a legal configuration
– Closure: once in a legal configuration, the system continues in a legal configuration unless a failure or perturbation corrupts data memory
[Figure: state diagram with a fault transition; L: legitimate configuration, Non L: illegitimate configuration]

Example 1: Stabilizing mutual exclusion in unidirectional ring
Consider a unidirectional ring of processes.
Legal configuration = exactly one token in the ring
Desired “normal” behavior: a single token circulates in the ring
Only the node that holds the token can access the critical region!

Dijkstra’s stabilizing mutual exclusion

Example execution

Example stabilizing execution
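
A stabilizing execution can be reproduced with a short simulation. The sketch below assumes the standard K-state formulation of Dijkstra's algorithm: each process p_i holds a value x[i] in 0..K-1; p_0 may move when x[0] equals x[N-1] and then increments x[0] modulo K; every other p_i may move when x[i] differs from x[i-1] and then copies x[i-1]. A process that may move is said to hold a token. The function names, the random scheduler, and the values of N and K are illustrative assumptions, not part of the lecture.

```python
import random


def has_token(x, i):
    """Process i holds a token (is allowed to move) in configuration x."""
    if i == 0:
        return x[0] == x[-1]      # p_0 compares with p_{N-1}
    return x[i] != x[i - 1]       # p_i compares with p_{i-1}


def fire(x, i, k):
    """Return the configuration after process i executes its rule."""
    y = list(x)
    if i == 0:
        y[0] = (x[0] + 1) % k     # p_0 is the only process that creates new values
    else:
        y[i] = x[i - 1]           # every other process copies its predecessor
    return y


def token_holders(x):
    """Indices of all processes that currently hold a token."""
    return [i for i in range(len(x)) if has_token(x, i)]


def run_until_legal(x, k, rng, max_steps=10_000):
    """Let a randomly chosen token holder move until exactly one token remains."""
    steps = 0
    while len(token_holders(x)) != 1:
        if steps == max_steps:
            raise RuntimeError("did not stabilize within max_steps")
        x = fire(x, rng.choice(token_holders(x)), k)
        steps += 1
    return x, steps


if __name__ == "__main__":
    rng = random.Random(2009)
    n, k = 6, 6                   # assumed ring size and number of states (K >= N)
    start = [rng.randrange(k) for _ in range(n)]
    legal, steps = run_until_legal(start, k, rng)
    print(start, "->", legal, "after", steps, "moves")
```

Starting from an arbitrary (possibly corrupted) list of values, the loop ends in a configuration with a single token, and further moves keep exactly one token circulating.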

Why does it work?
1. At any configuration, at least one process can make a move (has a token)
– suppose p_1, …, p_{N-1} cannot make a move
– then x[N-1] = x[N-2] = … = x[0]
– then p_0 can make a move
2. The set of legal configurations is closed under all moves
3. The total number of possible moves never increases across successive configurations
– any move by p_i either enables a move for p_{i+1} or none at all
4. Every illegal configuration C converges to a legal configuration in a finite number of moves
– there must be a value, say v, that does not appear in C
– except for p_0, none of the processes create new values
– p_0 takes infinitely many steps, and therefore eventually sets x[0] = v
– all other processes copy value v and a legal configuration is reached in N-1 steps
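
For a small ring, the four claims can be checked by brute force over every configuration. The sketch below again assumes the standard K-state rules (p_0 increments modulo K when x[0] = x[N-1]; every other p_i copies x[i-1] when the two differ). The sizes N = 4 and K = 4 and all function names are assumptions chosen so that the K^N configurations can be enumerated quickly. It is a sanity check on one small instance, not a proof.

```python
from functools import lru_cache
from itertools import product

N, K = 4, 4  # assumed small ring: 4 processes, 4 states each


def enabled(x):
    """Indices of processes that can make a move in configuration x."""
    return [i for i in range(N)
            if (x[0] == x[-1] if i == 0 else x[i] != x[i - 1])]


def fire(x, i):
    """Configuration reached when process i makes its move."""
    y = list(x)
    y[i] = (x[0] + 1) % K if i == 0 else x[i - 1]
    return tuple(y)


def check_claims_1_to_3():
    for x in product(range(K), repeat=N):
        movers = enabled(x)
        assert movers                          # claim 1: someone can always move
        for i in movers:
            after = enabled(fire(x, i))
            assert len(after) <= len(movers)   # claim 3: tokens never increase
            if len(movers) == 1:
                assert len(after) == 1         # claim 2: legal stays legal
    return True


@lru_cache(maxsize=None)
def worst_case_moves(x):
    """Length of the longest schedule from x to a legal configuration."""
    movers = enabled(x)
    if len(movers) == 1:
        return 0
    return 1 + max(worst_case_moves(fire(x, i)) for i in movers)


def check_claim_4():
    """Finite result means every schedule from every configuration converges."""
    return max(worst_case_moves(x) for x in product(range(K), repeat=N))


if __name__ == "__main__":
    print("claims 1-3 hold:", check_claims_1_to_3())
    print("worst-case moves to a legal configuration:", check_claim_4())
```

The recursion in `worst_case_moves` terminates only if no schedule can stay among illegal configurations forever, so a finite answer from `check_claim_4` confirms claim 4 for this ring size.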

Putting it all together
Legal configuration = a configuration with a single token
Perturbations or failures take the system to configurations with multiple tokens
– e.g., the mutual exclusion property may be violated
Within a finite number of steps, if no further failures occur, the system returns to a legal configuration

P2P Systems - Chord Search
[Figure: Chord ring with m=7 and peers N16, N32, N45, N80, N96, N112; all “arrows” are RPCs]
Query: who has cnn.com/index.html? (hashes to K42)
The file cnn.com/index.html with key K42 is stored at N45
At node n, send the query for key k to the largest successor/finger entry < k (all arithmetic mod 2^m); if none exists, send the query to successor(n) (lookup in finger table)
N45 > K42, hence take the successor
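
The routing rule can be traced with a small sketch over the ring from the slide (m = 7, peers N16, N32, N45, N80, N96, N112). It assumes the usual Chord finger definition, where finger i of node n is the first peer at or after n + 2^i (mod 2^m), and it assumes the query starts at N80. The function names are illustrative, and the finger tables are computed from global knowledge of the peer set purely for the demonstration.

```python
M = 7
RING = 2 ** M                      # identifiers live in 0 .. 2^m - 1
NODES = sorted([16, 32, 45, 80, 96, 112])


def in_open(a, x, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b          # interval wraps around 0


def in_half_open(a, x, b):
    """True if x lies in the clockwise interval (a, b]."""
    return x == b or in_open(a, x, b)


def successor(ident):
    """First peer at or after ident, going clockwise."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]                # wrapped past the largest identifier


def finger_table(n):
    """Assumed standard definition: finger i points to successor(n + 2^i)."""
    return [successor((n + 2 ** i) % RING) for i in range(M)]


def lookup(start, key):
    """Route a query for key from peer start; return (owner, path of peers)."""
    n, path = start, [start]
    while True:
        succ = successor((n + 1) % RING)
        if in_half_open(n, key, succ):
            path.append(succ)      # the successor is responsible for the key
            return succ, path
        nxt = succ                 # fall back to successor(n) if no finger fits
        for f in reversed(finger_table(n)):
            if in_open(n, f, key): # largest finger that still precedes the key
                nxt = f
                break
        n = nxt
        path.append(n)


if __name__ == "__main__":
    print(finger_table(80))        # [96, 96, 96, 96, 96, 112, 16]
    print(lookup(80, 42))          # (45, [80, 16, 32, 45])
```

With these assumptions the query for K42 travels N80, N16, N32 and ends at N45. At N32 no finger precedes K42, and since N45 > K42 the successor is taken.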

Stabilization Protocol in Chord
Chord has to deal with peer churn, i.e., topological changes!
Maintaining finger tables alone is expensive when nodes join and leave dynamically
Chord therefore separates correctness from performance goals via stabilization protocols
Basic stabilization protocol
– Keep successor pointers correct!
– Then use them to correct finger tables

Search under peer failures
[Figure: same Chord ring (m=7) and query for K42, stored at N45; X marks failures]
Lookup fails (N16 does not know N45)

Search under peer failures
One solution: maintain r multiple successor entries; in case of failure, use the successor entries

Search under peer failures (2)
[Figure: same Chord ring (m=7) and query for K42, stored at N45; X marks failures]
Lookup fails (N45 is dead)

Search under peer failures (2)
One solution: replicate file/key at r successors and predecessors (K42 replicated)

New peers joining
[Figure: Chord ring with m=7 and peers N16, N32, N45, N80, N96, N112; N40 joins between N32 and N45]
1. N40 learns that N45 is its successor
2. N45 updates its info about predecessor to be N40
3. N32 runs stabilizer and asks N45 for predecessor
4. N45 returns N40
5. N32 updates its info about successor to be N40
6. N32 notifies N40 that N32 is its predecessor
Stabilization protocol: N40 periodically talks to neighbors to update own finger table
Peers also keep info about their predecessors to deal with dynamics
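
The six steps can be replayed with a minimal sketch of the successor/predecessor repair. It assumes the stabilize/notify formulation commonly used to describe Chord. The `Peer` class and the helpers `make_ring` and `ring_order` are hypothetical names introduced only for this illustration. As a simplification, the joining peer is handed its successor directly instead of finding it through a lookup, and finger tables are left out.

```python
def between(a, x, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Peer:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def join(self, successor):
        """Step 1: the new peer only learns its successor."""
        self.predecessor = None
        self.successor = successor

    def stabilize(self):
        """Steps 3-6: ask the successor for its predecessor, adopt it if closer."""
        candidate = self.successor.predecessor
        if candidate is not None and between(self.id, candidate.id, self.successor.id):
            self.successor = candidate
        self.successor.notify(self)

    def notify(self, peer):
        """Steps 2 and 6: accept peer as predecessor if it is a better fit."""
        if self.predecessor is None or between(self.predecessor.id, peer.id, self.id):
            self.predecessor = peer


def make_ring(ids):
    """Build an already-stable ring with correct successor/predecessor pointers."""
    peers = [Peer(i) for i in sorted(ids)]
    for k, p in enumerate(peers):
        p.successor = peers[(k + 1) % len(peers)]
        p.predecessor = peers[k - 1]
    return {p.id: p for p in peers}


def ring_order(start):
    """Follow successor pointers once around the ring."""
    ids, p = [start.id], start.successor
    while p is not start:
        ids.append(p.id)
        p = p.successor
    return ids


if __name__ == "__main__":
    ring = make_ring([16, 32, 45, 80, 96, 112])
    n40 = Peer(40)
    n40.join(ring[45])       # step 1
    n40.stabilize()          # step 2: N45 now records N40 as predecessor
    ring[32].stabilize()     # steps 3-6
    print(ring_order(ring[16]))   # [16, 32, 40, 45, 80, 96, 112]
```

Until N32 runs its stabilizer, following successor pointers from N16 still skips N40. The ring only includes the new peer after steps 3 to 6, which is why the stabilizer has to run periodically.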

New peers joining (2)
[Figure: Chord ring with m=7 and peers N16, N32, N40, N45, N80, N96, N112]
N40 may need to copy some files/keys from N45 (files with fileid between 32 and 40): K34, K38

Chord Stabilization Protocol
Concurrent peer joins, leaves, failures might cause loopiness of pointers, and failure of lookups
– Chord peers periodically run a stabilization algorithm that checks and updates pointers and keys
– Ensures non-loopiness of fingers, eventual success of lookups and O(log(N)) lookups
– [TechReport on Chord webpage] defines weak and strong stability
– Each stabilization round at a peer involves a constant number of messages
– Strong stability takes stabilization rounds (!)

Stabilizing graph coloring Simple coloring problem and algorithm

Graph coloring problem

Simple coloring algorithm

Correctness of simple coloring (SC)
Each action resolves the color of a node w.r.t. its neighbors
Once a node gets a distinct color, it never changes its color
Each node changes color at most once, so the algorithm terminates after at most N-1 steps
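
The termination argument can be tried out on a small graph. The sketch below assumes one common form of the simple coloring rule: with maximum degree D, the palette has D+1 colors, and a node that shares a color with some neighbor switches to a color not used by any neighbor. It also assumes a central scheduler that lets one node act at a time. The function names, the example graph, and the choice of the smallest free color are illustrative assumptions.

```python
def in_conflict(graph, color, v):
    """True if node v shares its color with at least one neighbor."""
    return any(color[u] == color[v] for u in graph[v])


def simple_coloring(graph, color):
    """Repair an arbitrary coloring; return (proper coloring, number of steps)."""
    color = dict(color)
    max_degree = max(len(neighbors) for neighbors in graph.values())
    palette = range(max_degree + 1)          # D + 1 colors always suffice
    steps = 0
    while True:
        conflicted = [v for v in graph if in_conflict(graph, color, v)]
        if not conflicted:
            return color, steps
        v = conflicted[0]                    # central scheduler: one node per step
        used = {color[u] for u in graph[v]}
        color[v] = min(c for c in palette if c not in used)
        steps += 1


if __name__ == "__main__":
    # Assumed example: a ring of five nodes, all starting with the same color.
    ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
    start = {v: 0 for v in ring}
    final, steps = simple_coloring(ring, start)
    print(final, "after", steps, "steps")
```

Because a node always picks a color that differs from all of its neighbors, and later moves by those neighbors avoid its color in turn, it never needs to move again. That is why the step count stays below the number of nodes.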

Properties of SC

Summary
What is self-stabilization?
Self-stabilizing systems
Simple coloring problem and algorithm