Chapter 18 Distributed Control Algorithms Copyright © 2008.

Operating Systems, by Dhananjay Dhamdhere

Introduction
Operation of Distributed Control Algorithms
Correctness of Distributed Control Algorithms
Distributed Mutual Exclusion
Distributed Deadlock Handling
Distributed Scheduling Algorithms
Distributed Termination Detection
Election Algorithms
Practical Issues in Using Distributed Control Algorithms

OS Control Functions in a Distributed Environment
Special features of distributed OS control functions
  – Mutual exclusion
      Involves synchronization of processes in different computers
  – Deadlock handling
      Deadlocks may involve use of resources in different hosts
  – Scheduling
      Perform load balancing for comparable loading of computers
  – Termination detection
      Check whether all processes of a computation, which may operate in different computers, have completed
  – Election
      Elect coordinator for a privileged function

Operation of Distributed Control Algorithms (continued)
A distributed control algorithm operates in parallel with its clients, so that it can respond readily to events related to its service
  – Each process has a control part and a basic part

Correctness of Distributed Control Algorithms
Algorithm correctness has two facets:
  – Liveness: eventually performs correct actions, i.e., without indefinite delays
  – Safety: does not perform wrong actions
Proving correctness of a distributed algorithm is a complex task

Correctness of Distributed Control Algorithms (continued)

Distributed Mutual Exclusion
A Permission-Based Algorithm
Token-Based Algorithms

A Permission-Based Algorithm
Ricart and Agrawala algorithm: critical section (CS) entry in FCFS order
  – Requires 2 × (n − 1) messages per CS entry in a system of n processes: n − 1 requests and n − 1 replies
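The message count can be checked with a small simulation. This is a minimal sketch, not the book's algorithm listing: it assumes all requests are issued at the same moment with given timestamps, that a single FIFO queue stands in for the network, and that a process leaves the CS as soon as it enters. The names `Process` and `simulate` are illustrative.

```python
from collections import deque

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.request = None   # (timestamp, pid) while a request is pending
        self.replies = 0      # replies received for the pending request
        self.deferred = []    # requesters whose replies are being withheld

def simulate(n, timestamps):
    """timestamps maps a requesting pid to the timestamp of its request.
    Returns (order of CS entry, total messages sent)."""
    procs = [Process(i) for i in range(n)]
    network, sent = deque(), 0
    for pid, ts in timestamps.items():
        procs[pid].request = (ts, pid)
    for pid in timestamps:
        for other in range(n):
            if other != pid:
                network.append(("request", pid, other, procs[pid].request))
                sent += 1
    order = []
    while network:
        kind, src, dst, stamp = network.popleft()
        p = procs[dst]
        if kind == "request":
            # Defer the reply only if our own pending request is older.
            if p.request is not None and p.request < stamp:
                p.deferred.append(src)
            else:
                network.append(("reply", dst, src, None))
                sent += 1
        else:
            p.replies += 1
            if p.replies == n - 1:        # all permissions held: enter the CS
                order.append(dst)
                p.request = None          # leave the CS immediately
                for q in p.deferred:      # now release the withheld replies
                    network.append(("reply", dst, q, None))
                    sent += 1
                p.deferred.clear()
    return order, sent
```

With three processes that all request, entry follows timestamp order and the total is 3 × 2 × (3 − 1) = 12 messages.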


Maekawa Algorithm
Each process has a request set of processes; it seeks the permission of only the processes in its request set (R_i represents the request set of process P_i)
  – Correctness is ensured through the following rules:
      For all P_i: P_i is included in R_i
      For all P_i, P_j: R_i ∩ R_j is non-empty
  – The algorithm requires 2 × √n messages per CS entry for requests and replies when each request set holds about √n processes; release messages add about √n more
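One simple way to build request sets that satisfy both rules is to place the processes on a square grid and give each process its own row and column. This is a sketch under that assumption, not Maekawa's optimal construction: the grid sets have 2√n − 1 members rather than about √n. The function names are illustrative.

```python
import math
from itertools import combinations

def grid_request_sets(n):
    """Request set of process i = its row plus its column in a k x k grid."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes n is a perfect square")
    sets = []
    for i in range(n):
        row, col = divmod(i, k)
        same_row = {row * k + c for c in range(k)}
        same_col = {r * k + col for r in range(k)}
        sets.append(same_row | same_col)
    return sets

def satisfies_maekawa_rules(sets):
    # Rule 1: every P_i belongs to its own request set R_i.
    self_included = all(i in r for i, r in enumerate(sets))
    # Rule 2: every pair of request sets has a non-empty intersection.
    overlapping = all(a & b for a, b in combinations(sets, 2))
    return self_included and overlapping
```

Any two grid sets intersect because the row of one always crosses the column of the other, which is why two processes can never both collect all their permissions at once.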

Token-Based Algorithms for Mutual Exclusion
A token represents the privilege to use a CS
  – Only a process possessing the token can enter a CS
      Safety of the algorithm follows from this rule
      Liveness: must ensure the token eventually reaches a (requesting) process
These algorithms use abstract system models
  – Edges represent the paths used to pass control messages
  – For example: abstract ring and tree topologies

An Algorithm Employing the Ring Topology
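The slide's figure is not available, so the following is only a sketch of one simple ring variant, in which the token travels around the ring in one direction and a process enters the CS when the token reaches it. The algorithm in the book may differ, for example by sending explicit requests to the token holder. The name `token_ring` and its parameters are illustrative.

```python
def token_ring(n, wants_cs, start=0):
    """Pass the token around a ring of n processes, starting at `start`.
    Returns (order in which requesters enter the CS, token hops used)."""
    pending = set(wants_cs)
    if any(p < 0 or p >= n for p in pending):
        raise ValueError("requesters must be ring positions 0..n-1")
    order, hops, holder = [], 0, start
    while pending:
        if holder in pending:
            order.append(holder)         # only the token holder enters the CS
            pending.remove(holder)
        if pending:
            holder = (holder + 1) % n    # pass the token along the out-edge
            hops += 1
    return order, hops
```

Safety holds because there is a single token; liveness holds because the token visits every ring position within n hops.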

Raymond’s Token-Based Algorithm
Features of the algorithm
  – The algorithm uses an abstract inverted tree to reduce the number of messages. It has three invariants:
      P_holder, the process holding the token, is at the root of the tree
      Each process in the system belongs to the tree
      Each process other than P_holder has only one edge, which points to its parent in the tree
      Thus, each process has a path that ends at P_holder
  – Each process has a local request queue
      When it receives a request, it puts the requestor’s id in the queue
      When it makes a request, it puts its own id in the queue

Raymond’s Algorithm
Uses an abstract inverted tree as the system model
[Figure: token is transferred to P_3]
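The tree invariants and the request queues can be sketched as follows. This is a simplified, single-threaded model in which message delivery is a direct method call; the class names, the `asked` flag, and the example tree are assumptions made for illustration and are not taken from the slides.

```python
from collections import deque

class Node:
    def __init__(self, pid, parent):
        self.pid = pid
        self.parent = parent    # None marks P_holder, the root of the tree
        self.queue = deque()    # local request queue of process ids
        self.asked = False      # a request was already sent to the parent
        self.in_cs = False

class RaymondTree:
    def __init__(self, parents):
        # parents maps each process id to its parent id (None for the holder).
        self.nodes = {pid: Node(pid, par) for pid, par in parents.items()}
        self.entered = []       # order in which processes entered the CS

    def holder(self):
        return next(p for p, n in self.nodes.items() if n.parent is None)

    def root_of(self, pid):
        while self.nodes[pid].parent is not None:
            pid = self.nodes[pid].parent
        return pid

    def request_cs(self, pid):
        self._enqueue(self.nodes[pid], pid)   # a requester queues its own id

    def release_cs(self, pid):
        node = self.nodes[pid]
        node.in_cs = False
        self._step(node)

    def _enqueue(self, node, requester):
        node.queue.append(requester)
        self._step(node)

    def _step(self, node):
        # The holder serves the head of its queue when the CS is free.
        if node.parent is None and not node.in_cs and node.queue:
            head = node.queue.popleft()
            if head == node.pid:
                node.in_cs = True
                self.entered.append(node.pid)
            else:
                node.parent = head            # reverse the edge: token moves
                receiver = self.nodes[head]
                receiver.parent = None
                receiver.asked = False
                self._step(receiver)
        # A non-holder with waiting requests asks its parent, but only once.
        if node.parent is not None and node.queue and not node.asked:
            node.asked = True
            self._enqueue(self.nodes[node.parent], node.pid)
```

For example, with `RaymondTree({1: None, 2: 1, 3: 1, 4: 3})`, calling `request_cs(4)` moves the token from process 1 through process 3 to process 4, and every other process still has a parent path ending at the new holder.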


Distributed Deadlock Handling
Deadlock detection, prevention, and avoidance approaches studied earlier use state information
  – Problems arise in extending these approaches to a distributed system
Approaches in distributed systems:
  – Detection: centralized and distributed
  – Prevention
No special techniques for distributed deadlock avoidance have been discussed in OS literature
Model used in this section:
  – SISR (single-instance, single-request) model of resource allocation

Problems in Centralized Deadlock Detection
Steps in centralized deadlock detection:
  – Collect the wait-for graphs (WFGs) of all nodes at a central node
  – Superimpose them to form a merged WFG
  – Use a conventional deadlock detection algorithm
Problem: can lead to phantom deadlocks, i.e., deadlocks that are declared but do not exist
  – Violates the safety property in deadlock detection
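The three steps can be sketched as below, assuming each node reports its local WFG as a dictionary that maps a waiting process to the set of processes it waits for. The function names and the representation are illustrative.

```python
def merge_wfgs(local_wfgs):
    """Superimpose the local WFGs collected from all nodes."""
    merged = {}
    for wfg in local_wfgs:
        for waiter, holders in wfg.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def has_cycle(wfg):
    """Conventional detection for the SISR case: depth-first cycle search."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(p):
        colour[p] = GREY                      # p is on the current DFS path
        for q in wfg.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY or (state == WHITE and visit(q)):
                return True                   # back edge: a cycle exists
        colour[p] = BLACK
        return False

    return any(colour.get(p, WHITE) == WHITE and visit(p) for p in list(wfg))
```

If the local WFGs are recorded at different times, the merged graph can contain an edge that has already disappeared at its node, and `has_cycle` then reports a phantom deadlock.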

Distributed Deadlock Detection
Key issues in deadlock detection:
  – A cycle indicates a deadlock in an SISR system
  – A knot indicates a deadlock in an MISR (multiple-instance, single-request) system
Distributed deadlock detection approach:
  – Cycles and knots are detected through joint actions of nodes in the system
  – Every node can detect and declare a deadlock
Two such algorithms:
  – Diffusion computation-based algorithm
  – Edge chasing algorithm

Diffusion Computation-Based Deadlock Detection
Diffusion computation: used to collect info about nodes
  – Diffusion phase
      A computation that has originated in one node spreads to other nodes
      A control message called a query is sent along each edge
      The first query received by a node is called an engaging query. On receiving it, the node sends queries along all its out-edges
  – Information collection phase
      Each node sends information in response to each query
      It sends a dummy reply for a non-engaging query
      It collects information from all replies it received, adds its own information, and sends the result as the reply to the engaging query. We call it an engaging reply
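The two phases can be sketched with a recursive traversal, where a function call stands for a query and its return value for the reply. This is a simplification: real queries and replies are asynchronous messages. Here the collected information is just the set of node ids reached, and the names `diffuse` and `stats` are illustrative.

```python
def diffuse(graph, initiator):
    """graph maps a node to the list of nodes on its out-edges.
    Returns (information collected by the initiator, message statistics)."""
    engaged = {initiator}
    stats = {"queries": 0, "dummy_replies": 0}

    def on_query(node):
        stats["queries"] += 1
        if node in engaged:                 # non-engaging query
            stats["dummy_replies"] += 1
            return set()                    # dummy reply carries no information
        engaged.add(node)                   # engaging query: spread further
        collected = {node}                  # the node's own information
        for nxt in graph.get(node, ()):
            collected |= on_query(nxt)
        return collected                    # engaging reply

    info = {initiator}
    for nxt in graph.get(initiator, ()):
        info |= on_query(nxt)
    return info, stats
```

Each edge reachable from the initiator carries exactly one query and one reply, so the computation ends once the initiator has received a reply on each of its out-edges.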

Diffusion Computation-Based Algorithm
Algorithm 18.4 was proposed by Chandy, Misra, and Haas (1983)
  – Works for SISR and MISR systems
Example:
  – P_2, P_3 are blocked. P_1 becomes blocked and sends a query but does not receive a reply because P_4 is not blocked
  – P_4 requests a resource held by P_1, becomes blocked and sends a query. It would receive a reply and declare a deadlock

Diffusion Computation-Based Algorithm (continued)

Mitchell–Merritt Algorithm for Distributed Deadlock Detection
It is an edge chasing algorithm: control messages are sent over WFG edges to detect cycles
  – A provision is made to ensure that the cycle has not been broken before it was detected
Each process is assigned a public label and a private label
  – The labels are identical when a process is created
  – The public label of a process changes when it gets blocked on a resource request
  – It also changes when it waits for a process having a larger public label
A wait-for edge with a specific relation between public and private labels of its source and destination processes indicates presence of a deadlock

Mitchell–Merritt Algorithm
Rules of the algorithm:
[Figure: public label and private label of each process]
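Because the rules figure did not survive, the sketch below follows the block, transmit, and detect rules as the algorithm is commonly presented, which may differ in detail from the book's figure. Labels are modelled as (counter, pid) pairs so that a freshly created label is unique; that encoding and all names are assumptions. Running the rules in rounds replaces the asynchronous control messages of the real algorithm.

```python
class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.public = self.private = (0, pid)   # identical at creation
        self.waits_for = None

def block(u, v):
    """u blocks on v: both labels of u become a fresh value that is larger
    than the old public labels of u and v."""
    fresh = (max(u.public[0], v.public[0]) + 1, u.pid)
    u.public = u.private = fresh
    u.waits_for = v

def transmit(u):
    """u copies the public label of the process it waits for, if larger."""
    v = u.waits_for
    if v is not None and v.public > u.public:
        u.public = v.public
        return True
    return False

def detects_deadlock(u):
    """u's own label has travelled round a cycle and come back to it."""
    v = u.waits_for
    return v is not None and u.public == u.private == v.public

def run(procs):
    """Apply the rules in rounds; return the pid that declares a deadlock,
    or None when no label changes any more."""
    while True:
        for p in procs:
            if detects_deadlock(p):
                return p.pid
        if not any([transmit(p) for p in procs]):
            return None
```

In a cycle, the largest public label propagates backwards along the wait-for edges until it returns to the process whose private label it matches, and only that process declares the deadlock.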

Distributed Deadlock Prevention
Cycles are prevented as follows:
  – A pair (local time, node id) is used to time-stamp creation of a process
  – When process P_i requests a resource allocated to P_j, the time-stamps of P_i and P_j are used to decide whether P_i can wait for P_j
Two approaches
  – Wait-or-die
      P_i is allowed to wait if it is older than P_j; otherwise, it is killed
  – Wound-or-wait
      P_i is allowed to wait if it is younger than P_j; otherwise, P_j is killed
A killed process retains its original time-stamp if restarted
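Both rules reduce to one comparison of time-stamps. The sketch below assumes that a time-stamp is a (local time, node id) tuple and that a smaller tuple means an older process; the function names and return strings are illustrative.

```python
def wait_or_die(requester_ts, holder_ts):
    """P_i (requester) may wait only if it is older than P_j (holder)."""
    return "wait" if requester_ts < holder_ts else "kill requester"

def wound_or_wait(requester_ts, holder_ts):
    """P_i (requester) may wait only if it is younger than P_j (holder)."""
    return "wait" if requester_ts > holder_ts else "kill holder"

# Time-stamps are (local time, node id); the node id breaks ties.
older, younger = (5, 1), (9, 2)
print(wait_or_die(older, younger))      # older requester waits
print(wound_or_wait(older, younger))    # older requester wounds the holder
```

In both schemes every wait-for edge points in one direction of age (older to younger in wait-or-die, younger to older in wound-or-wait), so no cycle can form.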

Distributed Scheduling Algorithms
A distributed scheduling algorithm balances computational loads in the nodes
  – Uses process migration
Apply a threshold to decide if a node is heavily or lightly loaded

Distributed Scheduling Algorithms (continued)
Migration may be preemptive or nonpreemptive
Stability is an important issue
  – An unstable algorithm may lead to a situation similar to thrashing
      A process is transferred very often; does not make progress
Algorithms can be sender- or receiver-initiated
  – A heavily loaded node is a sender
  – A lightly loaded node is a receiver
  – A symmetrically initiated algorithm contains features of both
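A single balancing step with a threshold might look like the sketch below. It assumes that load is the number of processes in a node and that one process moves from the most loaded sender to the least loaded receiver; the name `migrate_once` and the policy are illustrative, not taken from the slides.

```python
def migrate_once(loads, threshold):
    """loads maps a node name to its load.
    Returns (new loads, (source, destination)) or (loads, None)."""
    loads = dict(loads)                   # leave the caller's data untouched
    senders = [n for n, load in loads.items() if load > threshold]
    receivers = [n for n, load in loads.items() if load < threshold]
    if not senders or not receivers:
        return loads, None                # nothing to balance
    src = max(senders, key=loads.get)     # heavily loaded node acts as sender
    dst = min(receivers, key=loads.get)   # lightly loaded node acts as receiver
    loads[src] -= 1
    loads[dst] += 1
    return loads, (src, dst)
```

Because a receiver must be strictly below the threshold, one transfer cannot push it above the threshold, which avoids moving the same process straight back.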

Distributed Scheduling Algorithms (continued)


Distributed Termination Detection
When a process terminates, the OS frees its resources
  – This approach is not adequate for distributed systems
Processes of a distributed computation execute in different nodes of a distributed system. They should be terminated when all of them have completed their tasks
  – These processes perform work assigned to them
      A process is active when it is performing work, and passive when it has no work
      Work is assigned to a process through a message
  – A passive process becomes active on receiving a message
  – Distributed termination condition (DTC):
      All processes of a distributed computation are passive
      No basic messages are in transit

Distributed Termination Detection (continued)
Credit-distribution-based termination detection
  – Every process is assigned a numerical credit
      It sends some of its credit in each message
  – When a process becomes passive, it sends its credit to a collector process
  – The distributed computation is known to have terminated when the credit accumulated by the collector equals C, the total credit assigned to the computation
Diffusion computation-based termination detection
  – Each process that becomes passive initiates a diffusion computation to determine if DTC holds
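The credit scheme can be sketched with exact fractions so that no credit is lost to rounding. The sketch assumes that the whole credit C starts in one process and that a sender puts half of its credit into each message; the class and method names are illustrative.

```python
from fractions import Fraction

class Computation:
    def __init__(self, pids, total=Fraction(1)):
        self.total = total                         # C, the total credit
        self.collected = Fraction(0)               # credit held by the collector
        self.credit = {p: Fraction(0) for p in pids}

    def start(self, pid):
        self.credit[pid] = self.total              # the initiator holds all of C

    def send_work(self, src):
        """An active process sends a message; returns the credit it carries."""
        share = self.credit[src] / 2
        self.credit[src] -= share
        return share

    def receive_work(self, dst, share):
        self.credit[dst] += share                  # receiver becomes active

    def become_passive(self, pid):
        self.collected += self.credit[pid]         # hand credit to the collector
        self.credit[pid] = Fraction(0)

    def terminated(self):
        return self.collected == self.total
```

Credit travelling in a message is held neither by a process nor by the collector, so the collector cannot reach C while a basic message is in transit, which matches both parts of the DTC.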

Distributed Termination Detection (continued)

Election Algorithms
Critical system functions are assigned to a coordinator (for the function)
  – E.g., replacing a lost token in a token-based algorithm
The coordinator is typically the highest-priority process in the set
A process that finds that the coordinator is not responding assumes it has failed
  – Initiates an election algorithm
The algorithm chooses the highest-priority nonfailed process as the new coordinator
  – Then, announces its id to all nonfailed processes

Election Algorithms for Unidirectional Ring Topologies
Links in the ring are assumed to be FIFO channels
Assumption: the control part of a failed process continues to function and forwards received messages along its out-edge
Election is performed by obtaining the ids of all nonfailed processes and electing the highest-priority process
Two types of messages:
  – “elect me” and “new coordinator”
O(n²) messages, or 3n − 1 messages (worst case) in the refined version

Election Algorithms
Algorithms for the ring topology
  – Algorithm 1
      1. Process P_i initiates by sending an (“elect me”, P_i) message
      2. Process P_j receiving an (“elect me”, P_i) message sends an (“elect me”, P_j) message and then forwards P_i’s message
      3. P_i receives back its own message after receiving the message of every other process; it elects the highest-priority process as leader and sends a “new coordinator” message to inform the others
  – Algorithm 2: Refinement of Algorithm 1
      In Step 2, P_j sends its own message if its priority is higher than P_i’s; otherwise, it forwards P_i’s message
      Only the highest-priority process would get back its own message
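The refinement (Algorithm 2) can be sketched as a single message walking around the ring. The sketch assumes distinct priorities and no failed processes, and it counts only “elect me” messages, not the final “new coordinator” announcement. The name `ring_election` is illustrative.

```python
def ring_election(priorities, initiator):
    """priorities[i] is the priority of the process at ring position i.
    Returns (position of the elected process, "elect me" messages sent)."""
    n = len(priorities)
    candidate = initiator     # process whose id the circulating message carries
    pos = initiator
    elect_msgs = 0
    while True:
        pos = (pos + 1) % n   # message moves along the out-edge
        elect_msgs += 1
        if pos == candidate:
            # A process got back its own message: it has the highest priority.
            return pos, elect_msgs
        if priorities[pos] > priorities[candidate]:
            candidate = pos   # P_j replaces the message with its own
```

For `priorities = [3, 1, 4, 2]` and initiator 0, the initiator's message is replaced at position 2, whose own message then travels the full ring, giving 6 “elect me” messages.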

Election Algorithm
Bully algorithm
  1. Initiator P_i sends (“elect me”, P_i) messages to all higher-priority processes and starts a time-out interval T_1
      a. If a time-out occurs, it sends a “new coordinator” message to lower-priority processes
      b. If it receives a “don’t you dare” message from a higher-priority process P_j, it starts another time-out interval T_2
          i. If a time-out occurs, it starts a new election
  2. If a process P_j receives an “elect me” message from a lower-priority process
      a. It sends a “don’t you dare” message to its sender
      b. It starts a new election by sending (“elect me”, P_j) messages
Requires O(n²) messages per election
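The steps above can be sketched without real timers by treating "no live higher-priority process" as a time-out of T_1. The sketch assumes that a larger id means higher priority, that the initiator has not failed, and that no process fails during the election, so T_2 never expires. The name `bully` and the message accounting are illustrative.

```python
def bully(alive, initiator):
    """alive maps a process id to True (nonfailed) or False (failed).
    Returns (id of the new coordinator, messages sent)."""
    started = set()           # processes that have already begun an election
    pending = [initiator]
    coordinator, messages = None, 0
    while pending:
        p = pending.pop()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in alive if q > p]
        messages += len(higher)                  # ("elect me", p) messages
        live_higher = [q for q in higher if alive[q]]
        messages += len(live_higher)             # "don't you dare" replies
        if live_higher:
            pending.extend(live_higher)          # each starts its own election
        else:
            # T_1 expires with no reply: p announces itself as coordinator.
            coordinator = p
            messages += sum(1 for q in alive if q < p)   # "new coordinator"
    return coordinator, messages
```

For example, `bully({1: True, 2: True, 3: True, 4: True, 5: False}, 2)` elects process 4, the highest-priority nonfailed process.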

Practical Issues in Using Distributed Control Algorithms
Resource Management
Process Migration

Resource Management
The name server in each node must be updated when resources are added
  – Solution: use an arrangement of name servers as in the domain name service (DNS)
      Only the name server of a domain needs to be updated when a resource is added

Process Migration
Reasons for process migration:
  – To achieve load balancing
  – To reduce network traffic involved in utilizing a remote resource
  – To provide availability of services when a node has to be shut down for maintenance

Process Migration
Difficulties
  – Process state is distributed in various data structures of the OS
  – Process ids may change due to migration
      Process ids are used in interprocess communication
      Solution: use global process ids as in Sun cluster
  – Delivery of messages requires a special provision
      A node receiving a message would redirect it if the destination process has migrated out of it
      This residual state causes poor reliability
      Alternatively, all processes may be informed when a process is migrated
      This requires a complex protocol

Summary
Actions of a distributed control algorithm are performed in many nodes of the system
  – Two aspects of correctness are liveness and safety
Distributed system models: physical and logical
Examples of distributed control algorithms:
  – Distributed mutual exclusion: e.g., token based
  – Distributed deadlock detection: diffusion computation
  – Distributed scheduling (to balance load)
  – Distributed termination: e.g., credit-based
  – Election (highest priority process wins)