Mutual Exclusion A condition in which there is a set of processes, only one of which is able to access a given resource or perform a given function at any time

Centralized Systems
In a single-machine system, mutual exclusion can be achieved via: test-and-set, semaphores, messages, or monitors.

Distributed Mutual Exclusion
Assume there is agreement on how a resource is identified, so the identifier can be passed with requests. The goal is an algorithm that allows a process to obtain exclusive access to a resource.

Distributed Mutual Exclusion: approaches
- Centralized algorithm
- Token ring algorithm
- Distributed algorithm
- Decentralized algorithm

Centralized algorithm
Mimic a single-processor system: one process is elected as coordinator C. To use resource R, a process P (1) sends request(R) to C and waits for a response, (2) on receiving grant(R) accesses the resource, and (3) sends release(R) when done.

Centralized algorithm
If another process has already claimed the resource, the coordinator does not reply until the release: it maintains a queue and services requests in FIFO order. [Figure: two processes request R concurrently; C grants the first and queues the second, granting it only after the first sends release(R).]

Centralized algorithm
Benefits: fair (all requests processed in order); easy to implement, understand, and verify.
Problems: a process cannot distinguish being blocked from a dead coordinator, and the centralized server can become a bottleneck.
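
As an illustration (not from the slides), the coordinator's grant/queue/release behavior can be sketched in a few lines of Python; the class and method names are hypothetical:

```python
from collections import deque

class Coordinator:
    """Coordinator for one resource: grant if free, else queue (FIFO)."""
    def __init__(self):
        self.holder = None       # process currently holding the resource
        self.queue = deque()     # blocked requesters, in FIFO order

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "grant"       # grant(R) sent immediately
        self.queue.append(pid)
        return "wait"            # no reply: requester stays blocked

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder       # next process granted, if any

c = Coordinator()
assert c.request("P1") == "grant"
assert c.request("P2") == "wait"     # queued behind P1
assert c.release("P1") == "P2"       # FIFO service: P2 granted next
```

Note that the blocked requester simply receives no reply, which is exactly why it cannot tell a busy resource from a dead coordinator.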

Token Ring algorithm
Assume a known group of processes on which some ordering can be imposed. Construct a logical ring in software; each process communicates only with its neighbor. [Figure: ring of processes P0 through P5.]

Token Ring algorithm
Initialization: process P0 gets the token for resource R. The token circulates around the ring, from Pi to P(i+1) mod N. When a process acquires the token, it checks whether it needs to enter its critical section: if not, it sends the token to its neighbor; if so, it accesses the resource and holds the token until done.

Token Ring algorithm
- Only one process at a time holds the token, so mutual exclusion is guaranteed.
- The order is well defined, so starvation cannot occur.
- If the token is lost (e.g., because its holder died), it must be regenerated.
- Does not guarantee FIFO ordering of requests.
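
A minimal single-threaded simulation of the circulating token (names are hypothetical; a real implementation passes the token in messages and must handle token loss):

```python
class TokenRing:
    """Token for resource R circulates P0 -> P1 -> ... -> P(N-1) -> P0."""
    def __init__(self, n):
        self.n = n
        self.token_at = 0        # initialization: P0 holds the token

    def step(self, wants_cs):
        """One hop: the holder enters its CS if it wants to (here the
        CS is instantaneous), then forwards the token to its neighbor."""
        entered = self.token_at if wants_cs(self.token_at) else None
        self.token_at = (self.token_at + 1) % self.n
        return entered

ring = TokenRing(4)
entered = [ring.step(lambda i: i == 2) for _ in range(4)]
assert entered == [None, None, 2, None]   # only P2 enters, on the third hop
```

Even when nobody wants the critical section, the token keeps circulating, which is the algorithm's idle-time cost.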

Ricart & Agrawala algorithm
A distributed algorithm using reliable multicast and logical clocks. A process that wants to enter its critical section composes a message containing its identifier (machine ID, process ID), the name of the resource, and a timestamp (totally ordered Lamport time). It sends the request to all processes in the group, waits until everyone gives permission, and then enters the critical section and uses the resource.

Ricart & Agrawala algorithm
When a process receives a request:
- If the receiver is not interested, it sends OK to the sender.
- If the receiver is in its critical section, it does not reply and adds the request to a queue.
- If the receiver has just sent a request as well, it compares the timestamps of the received and sent messages; the earliest wins. If the receiver loses, it sends OK; if it wins, it does not reply and queues the request.
When done with the critical section, a process sends OK to all queued requests.

Ricart & Agrawala algorithm
Drawbacks: N points of failure and a lot of messaging traffic. Still, it demonstrates that a fully distributed algorithm is possible.
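
The receiver-side rule above can be sketched as a pure function; `on_request` and the state labels are hypothetical names, and timestamps are (Lamport clock, process id) pairs so that comparison is total:

```python
def on_request(state, own_ts, req_ts):
    """Receiver's rule: reply "OK" now, or "defer" until leaving the CS.
    own_ts is None when the receiver has no outstanding request."""
    if state == "idle":
        return "OK"                        # not interested: reply at once
    if state == "in_cs":
        return "defer"                     # queue the request until CS exit
    # state == "requesting": the earliest timestamp wins
    return "OK" if req_ts < own_ts else "defer"

assert on_request("idle", None, (5, 2)) == "OK"
assert on_request("in_cs", None, (5, 2)) == "defer"
assert on_request("requesting", (3, 1), (5, 2)) == "defer"  # receiver wins
assert on_request("requesting", (7, 1), (5, 2)) == "OK"     # sender wins
```

Because the (clock, id) pairs are totally ordered, two simultaneous requesters always disagree-free agree on a single winner.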

Lamport’s Mutual Exclusion
Each process maintains a request queue containing mutual exclusion requests. Requesting the critical section: process Pi sends request(i, Ti) to all nodes (Ti is its Lamport time) and places the request on its own queue. When a process Pj receives a request, it returns a timestamped ack.

Lamport’s Mutual Exclusion
Entering the critical section (accessing the resource): Pi may enter once (1) it has received a message (ack or release) from every other process with a timestamp larger than Ti, and (2) its request has the earliest timestamp in its queue. Difference from Ricart-Agrawala: everyone always responds, with no hold-back; a process decides to enter based on whether its own request is the earliest in its queue.

Lamport’s Mutual Exclusion
Releasing the critical section: the process removes the request from its own queue and sends a timestamped release message. When a process receives a release message, it removes that sender's request from its queue; this may cause its own entry to have the earliest timestamp in the queue, enabling it to enter the critical section.
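
The two entry conditions can be captured in a small predicate (a sketch: `may_enter` and its argument shapes are assumptions, not the slides' notation):

```python
def may_enter(own_req, queue, latest_seen):
    """own_req: (ts, pid). queue: this process's request queue, a list
    of (ts, pid) pairs that includes own_req. latest_seen: pid ->
    timestamp of the latest message (ack or release) received from
    each other process."""
    earliest = min(queue) == own_req
    heard_from_all = all(ts > own_req[0] for ts in latest_seen.values())
    return earliest and heard_from_all

# P2's request (1,2) is earliest and both peers have sent later messages:
assert may_enter((1, 2), [(1, 2), (2, 1)], {1: 3, 3: 2})
# P1's request (2,1) is not the earliest in its queue, so P1 waits:
assert not may_enter((2, 1), [(1, 2), (2, 1)], {2: 4, 3: 3})
```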

Characteristics of Decentralized Algorithms
- No machine has complete information about the system state.
- Machines make decisions based only on local information.
- Failure of one machine does not ruin the algorithm.
- There is no implicit assumption that a global clock exists.

Decentralized Algorithm
Based on the peer-to-peer Distributed Hash Table (DHT) system structure previously introduced: object names are hashed to find the successor node that will store them. Here, we assume that n replicas of each object are stored.

Mutual Exclusion Algorithms
- Non-token based: a site/process can enter a critical section when an assertion (condition) becomes true. The algorithm must ensure that the assertion is true at only one site/process at a time.
- Token based: a unique token (a known, unique message) is shared among the cooperating sites/processes; the possessor of the token may access the critical section. Conditions such as loss of the token, crash of the token holder, and the possibility of multiple tokens must be handled.

General System Model
At any instant, a site may have several requests for the critical section (CS); they are queued up and serviced one at a time. Site states: requesting the CS, executing the CS, or idle (neither requesting nor executing). Requesting: blocked until access is granted; no additional CS requests may be made. Executing: using the CS. Idle: activity is outside the CS; in token-based approaches, an idle site can hold the token.

Mutual Exclusion: Requirements
- Freedom from deadlock: two or more sites should not endlessly wait on conditions/messages that never become true/arrive.
- Freedom from starvation: no indefinite waiting.
- Fairness: the order of CS executions follows the order of the CS requests (assuming equal priority).
- Fault tolerance: recognize faults, reorganize, and continue (e.g., after loss of the token).

Performance
- Number of messages per CS invocation: should be minimized.
- Synchronization delay, i.e., the time between one site leaving the CS and the next site entering it: should be minimized.
- Response time: the interval between the transmission of a request message and the requesting site's exit from the CS.
- System throughput, i.e., the rate at which the system executes CS requests: should be maximized. If sd is the synchronization delay and E the average CS execution time, system throughput = 1 / (sd + E).
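
Plugging illustrative numbers (an assumption, not values from the slides) into throughput = 1 / (sd + E):

```python
sd_ms, E_ms = 2, 8                   # hypothetical: sd = 2 ms, E = 8 ms
throughput = 1000 / (sd_ms + E_ms)   # CS executions per second
assert throughput == 100.0
# Halving E to 4 ms would raise throughput to 1000/6, about 167 per second.
```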

Performance metrics
[Figure: timeline showing the synchronization delay, from the last site's CS exit to the next site's CS entry, and the response time, from the request's arrival to the site's CS exit, with E the CS execution time.]

Performance ...
Low load: no more than one request at a given point in time. High load: always a pending mutual exclusion request at a site. Best case (low load): round-trip message delay plus execution time, 2T + E. Worst case: high load. Message traffic is low at low loads and high at high loads; average performance matters when load conditions fluctuate widely.

Simple Solution
A control site grants permission for CS execution: a site sends a REQUEST message to the control site, and the controller grants access one request at a time. Synchronization delay: 2T, since a site releases the CS by sending a message to the controller, which then sends permission to another site. System throughput: 1/(2T + E); if the synchronization delay were reduced to T, throughput would double. The controller can become a bottleneck, and congestion can occur.

Non-token Based Algorithms
Notation: Si is site i; Ri is Si's request set, containing the ids of all sites from which Si must receive permission before entering the CS. Non-token-based approaches use timestamps to order CS requests; smaller timestamps get priority over larger ones.
Lamport's Algorithm: Ri = {S1, S2, ..., Sn}, i.e., all sites. A request queue, ordered by timestamp, is maintained at each Si. Assumption: messages are delivered in FIFO order.

Lamport’s Algorithm
Requesting the CS: Si sends REQUEST(tsi, i), where (tsi, i) is the request timestamp, and places the request in request_queue_i. On receiving the message, Sj sends a timestamped REPLY to Si and places Si's request in request_queue_j.
Executing the CS: Si enters once it has received a message with a timestamp larger than (tsi, i) from every other site and its own request is at the top of request_queue_i.
Releasing the CS: on exiting, Si sends a timestamped RELEASE message to all sites in its request set; on receiving a RELEASE, Sj removes Si's request from its queue.

Lamport’s Algorithm…
Performance: 3(N-1) messages per CS invocation, namely (N-1) REQUEST, (N-1) REPLY, and (N-1) RELEASE messages. Synchronization delay: T.
Optimization: suppress REPLY messages. For example, if Sj receives a REQUEST from Si after sending its own REQUEST with a timestamp higher than Si's, Sj need not send a REPLY. This reduces the message count to between 2(N-1) and 3(N-1).

Lamport’s Algorithm: Example
[Figure, steps 1-2: S1 broadcasts request (2,1) and S2 broadcasts request (1,2); each of S1, S2, S3 queues both requests in timestamp order, and S2, whose request (1,2) is earliest, enters the CS.]

Lamport’s: Example…
[Figure, step 3: S2 leaves the CS and broadcasts a RELEASE; each site removes (1,2) from its queue, leaving (2,1) at the head, so S1 enters the CS.]

Ricart-Agrawala Algorithm
Requesting the CS: Si sends a timestamped REQUEST message. Sj sends a REPLY to Si if Sj is neither requesting nor executing the CS, or if Sj is requesting and Si's timestamp is smaller than Sj's own; otherwise the reply is deferred.
Executing the CS: Si enters after it has received a REPLY from all sites in its request set.
Releasing the CS: Si sends a REPLY to all deferred requests. In effect, a site's entry is blocked only by requests with smaller timestamps.

Ricart-Agrawala: Performance
2(N-1) messages per CS execution: (N-1) REQUEST plus (N-1) REPLY. Synchronization delay: T.
Optimization: a REPLY from Sj authorizes Si to access the CS repeatedly until Sj sends a REQUEST and Si answers with a REPLY. A site then requests permission from a dynamically varying set of sites, so a CS execution needs between 0 and 2(N-1) messages.

Ricart-Agrawala: Example
[Figure, steps 1-2: S1 sends REQUEST (2,1) and S2 sends REQUEST (1,2); since S2's timestamp is smaller, S2 collects replies and enters the CS while deferring its reply to S1.]

Ricart-Agrawala: Example…
[Figure, step 3: on leaving the CS, S2 sends the deferred REPLY to S1, and S1 enters the CS.]

Distributed Deadlock

Distributed Deadlock
Problem definition: permanent blocking of a set of processes that either compete for system resources or communicate with each other. No node has complete, up-to-date knowledge of the entire distributed system, and message transfers between processes take unpredictable delays.

Distributed Deadlock
A distributed system consists of a number of sites connected by a network; each site maintains some of the resources of the system. Processes with globally unique identifiers run on the system and make resource requests to a controller; there is one controller per site, and if the resource is local, the process makes the request of its local controller. The controller at each site could maintain a wait-for graph (WFG) on the process requests that it knows about: the local WFG. However, each site's WFG could be cycle-free and yet the distributed system could be blocked. This is called global deadlock.

System Model
- The system has only reusable resources.
- Access to resources is exclusive, and there is only one copy of each resource.
- A process is either running or blocked. Running: the process has all the resources it needs. Blocked: the process is waiting on one or more resources.

Types of Deadlocks
- Resource deadlocks: a process needs multiple resources for an activity (AND condition). Deadlock occurs if each process in a set requests resources held by another process in the same set, and each must receive all of its requested resources to proceed.
- Communication deadlocks: processes wait to communicate with other processes in a set (OR condition). Each process in the set is waiting on another process's message, and no process in the set will initiate a message until it receives the message for which it is waiting.

Graph Models
Nodes of the graph are processes; edges are pending requests or assignments of resources. Wait-for graph (WFG): P1 -> P2 implies P1 is waiting for a resource from P2. Transaction-wait-for graph (TWF): the WFG used in databases. Deadlock: a directed cycle in the graph, e.g., P1 -> P2 -> P1.

Graph Models
Wait-for graph (WFG): P1 -> P2 implies P1 is waiting for a resource from P2. [Figure: the same relation drawn with the resources R1 and R2 shown explicitly between P1 and P2.]

Deadlock in resource allocation
Conditions for deadlock in resource allocation:
- Mutual exclusion: the resource can be used by only one process at a time.
- Hold and wait: a process holds a resource while waiting for other resources.
- No preemption: a process cannot be preempted to free up a resource.
- Circular wait: a closed cycle of processes forms, where each process holds one or more resources needed by the next process in the cycle.
Strategies: prevent the formation of a circular wait, or detect the potential or actual occurrence of one. Types of algorithms: deadlock prevention, deadlock avoidance, deadlock detection.
Special issues in distributed systems: resources are distributed across many sites, and the control processes that mediate access to resources do not have complete, up-to-date information on the global state of the system.
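
The circular-wait condition is exactly a directed cycle in the wait-for graph, so detection reduces to cycle search. A depth-first sketch (the dict-of-sets WFG encoding is an assumption for illustration):

```python
def has_deadlock(wfg):
    """Directed-cycle search over a wait-for graph given as
    {process: set of processes it waits for}; every process is a key."""
    state = {p: 0 for p in wfg}        # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(p):
        state[p] = 1
        for q in wfg[p]:
            if state[q] == 1 or (state[q] == 0 and dfs(q)):
                return True            # back edge: a circular wait exists
        state[p] = 2
        return False
    return any(state[p] == 0 and dfs(p) for p in wfg)

assert has_deadlock({"P1": {"P2"}, "P2": {"P1"}})        # P1 -> P2 -> P1
assert not has_deadlock({"P1": {"P2"}, "P2": set()})     # a chain, no cycle
```

In a single-site system this check is straightforward; the distributed difficulty discussed below is that no site holds the whole graph.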

Deadlock in Resource Allocation: Deadlock handling strategies
(1) Deadlock prevention
1.a Prevent the circular-wait condition by defining a linear ordering of resource types; a process can be assigned resources only according to the linear ordering. Disadvantages: resources cannot be requested in the order in which they are needed, and resources are held longer than necessary.
1.b Prevent the hold-and-wait condition by requiring a process to acquire all needed resources before starting execution. Disadvantages: inefficient use of resources, reduced concurrency, the process can become deadlocked during the initial resource acquisition, and the future needs of a process cannot always be predicted.
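
The linear-ordering rule of 1.a is what lock-ordering disciplines implement in practice. A minimal sketch (the helper name and lock list are hypothetical) using Python threading locks, ordered by their index in a global list:

```python
import threading

# One global order over all locks: their index in this list.
locks = [threading.Lock() for _ in range(3)]

def acquire_in_order(needed):
    """Acquire the locks whose indices are in `needed`, lowest index
    first; with every thread following this rule, no wait cycle can form."""
    order = sorted(needed)
    for i in order:
        locks[i].acquire()
    return order                       # the acquisition order actually used

got = acquire_in_order({2, 0})
assert got == [0, 2]                   # always 0 before 2, never the reverse
for i in got:
    locks[i].release()
```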

Deadlock prevention (cont.)
1.c Use of timestamps. Example: give each database transaction the timestamp of its creation. The circular-wait condition is avoided by comparing timestamps, which yields a strict ordering of transactions: the transaction with the earlier timestamp always wins. With e(T) the creation timestamp of transaction T, and T2 requesting a resource held by T1:
"Wait-die" method: if e(T2) < e(T1), T2 waits; otherwise T2 is killed (dies).
"Wound-wait" method: if e(T2) < e(T1), T1 is killed (wounded); otherwise T2 waits.

Wait-die Vs. Wound-wait
Wait-die: if an old process wants a resource held by a young process, the old one waits; if a young process wants a resource held by an old process, the young process is killed. Observation: the young process, after being killed, will start up again and may be killed again; this cycle can repeat many times before the old one releases the resource.
Once we assume the existence of transactions, we can do something previously forbidden: take resources away from running processes. When a conflict arises, instead of killing the process making the request, we can kill the resource owner. Without transactions, killing a process might have severe consequences; with transactions, these effects vanish when the transaction dies.
Wound-wait (allowing preemption and "ancestor worship"): if an old process wants a resource held by a young process, the old one preempts it; the young process is wounded and killed, then restarts and waits. If a young process wants a resource held by an old process, the young process waits.
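
The two rules differ only in who yields. As pure functions over creation timestamps (smaller timestamp = older transaction; function names are hypothetical):

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester (smaller timestamp) waits; a
    younger requester is killed ("dies") and restarts later."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester preempts ("wounds") the younger
    holder; a younger requester waits."""
    return "wound" if ts_requester < ts_holder else "wait"

assert wait_die(1, 2) == "wait"     # old wants what young holds: old waits
assert wait_die(2, 1) == "die"      # young wants what old holds: young dies
assert wound_wait(1, 2) == "wound"  # old preempts the young holder
assert wound_wait(2, 1) == "wait"   # young waits for the old holder
```

In both schemes the older transaction never aborts, so repeated restarts cannot starve it, and timestamps are kept across restarts so every transaction eventually becomes "old enough" to win.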

(2) Deadlock avoidance
A decision is made dynamically before allocating a resource: the resulting global system state is checked, and the allocation is allowed only if that state is safe. Disadvantages:
- Every site has to maintain the global state of the system (extensive storage and communication overhead).
- Different sites may concurrently determine that a state is safe while the global state is unsafe: verification of a safe global state by different sites must be mutually exclusive.
- Large overhead to run the check for every allocation (a distributed system may have a large number of processes and resources).
Conclusion: deadlock avoidance is impractical in distributed systems.

(3) Deadlock Detection
Principle of operation: detection of a cycle in the WFG proceeds concurrently with normal operation. Requirements for detection and resolution algorithms:
- Detection: the algorithm must detect all existing deadlocks in finite time and should not report non-existent (phantom) deadlocks.
- Resolution (recovery): all existing wait-for dependencies in the WFG must be removed, i.e., roll back one or more deadlocked processes and give their resources to other blocked processes.
Observation: deadlock detection is the most popular strategy for handling deadlocks in distributed systems.

Algorithms for Distributed Deadlock Detection
Control for distributed deadlock detection can be centralized, distributed, or hierarchical.
a.1 Centralized deadlock detection algorithms: a central control site constructs the global WFG and searches for cycles. The control site maintains the WFG either continuously (with every assignment) or only when running deadlock detection (asking all sites for WFG updates). Disadvantages: single point of failure and congestion.
a.2 The completely centralized algorithm: all sites request and release resources by sending the corresponding messages to the control site, which updates the WFG on each request/release. For every new request edge added to the WFG, the control site checks for deadlock. Alternative: each site maintains its own WFG and updates the control site periodically or on request.

3) Deadlock Detection (cont.)
b. Hierarchical deadlock detection algorithms: sites are organized in a tree with one site at the root. Each node (except the leaf nodes) has information about its dependent nodes. A deadlock is detected by the node that is the common ancestor of all sites whose resource allocations conflict, i.e., at the lowest possible level.

3) Deadlock Detection (cont.)
c. Distributed deadlock detection algorithms
Principles: all sites are responsible for detecting a global deadlock; the global state graph is distributed over many sites, several of which participate in detection; detection is initiated when a process is suspected to be deadlocked. Advantages: no single point of failure, no congestion. Disadvantage: difficult to implement (no shared memory).
Types of algorithms:
- Path-pushing algorithms: each node builds a WFG based on local information and information from other sites, detects and resolves local deadlocks, and transmits deadlock information to other sites in the form of waiting paths.
- Edge-chasing algorithms: special messages (probes) are sent along the edges of the WFG to detect a cycle. When a blocked process receives a probe, it resends it on its outgoing WFG edges; when a process receives a probe it initiated, it declares a deadlock.
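
A sequential sketch of the edge-chasing idea: in a real system the probes are messages between sites, but here the WFG is a local dict and `edge_chase` is a hypothetical name used only for illustration:

```python
def edge_chase(wfg, initiator):
    """The suspected process sends a probe along each outgoing WFG edge;
    a blocked receiver forwards it along its own outgoing edges; if the
    initiator ever receives its own probe, a cycle (deadlock) exists.
    wfg: {process: set of processes it waits for}."""
    frontier = list(wfg.get(initiator, ()))
    forwarded = set()
    while frontier:
        p = frontier.pop()
        if p == initiator:
            return True                        # probe returned: deadlock
        if p not in forwarded:
            forwarded.add(p)                   # blocked process forwards probe
            frontier.extend(wfg.get(p, ()))
    return False

assert edge_chase({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}, "P1")
assert not edge_chase({"P1": {"P2"}, "P2": set()}, "P1")
```

Because each process forwards a given probe at most once, the message count is bounded by the number of WFG edges.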