CS377 slide 1: Today: Distributed Coordination
- Previous class: Distributed File Systems
  - Issues:
    - Naming strategies: absolute names, mount points (logical client-server connection), global names
    - File caching: in memory, on local disk
    - Cache update policies: write-back, write-through
  - Case study: Sun Microsystems NFS
- Today: distributed coordination

CS377 slide 2: What is distributed coordination?
- In previous lectures we discussed various mechanisms to synchronize the actions of processes on one machine
  - Mutual exclusion: semaphores, locks, monitors
  - Ways of dealing with deadlocks:
    - Ignoring them
    - Detection (let deadlocks occur, detect them, and try to recover)
    - Prevention (statically make deadlock structurally impossible)
    - Avoidance (avoid deadlock by allocating resources carefully)
- These mechanisms have been centralized
- Distributed coordination can be seen as a generalization of these mechanisms to distributed systems

CS377 slide 3: Event Ordering
- Being able to order events is important for synchronization, e.g., we need to be able to specify that a resource can only be used after it has been granted
- In a centralized system, it is always possible to determine the order of events
  - This is because all processes share a common clock and memory
- In a distributed system there is no common clock
  - It is therefore sometimes impossible to tell which of two events occurred first

CS377 slide 4: Event order: The Happened-Before Relation
- Happened-before is denoted with an arrow, e.g. A->B
  - If A and B are events in the same process, and A was executed before B, then A->B
  - If A is the event of sending a message and B is the event of receiving that message, then A->B
  - If A->B and B->C, then A->C
- If two events A and B are not related by the -> relation (neither A->B nor B->A), then these events were executed concurrently
  - We don't know which of these two events happened first

CS377 slide 5: Example: Space-time diagram of three distributed processes
[Figure: space-time diagram with events p0-p3, q0-q3, and r0-r3, one row of events per process]

CS377 slide 6: Example cont.
- Ordered events:
  - p0->q1; r0->q3; q2->r3; q0->p3
  - And also p0->q3 (as p0->q1 AND q1->q3) ...
- Concurrent events:
  - q0 and p2
  - r0 and q2
  - p1 and q2
- Since neither event of a concurrent pair affects the other, it is NOT important to know which one happened first
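The example can be checked mechanically. The sketch below computes the happened-before relation as the transitive closure of program order plus message edges. It assumes that the four relations listed on the slide (p0->q1, r0->q3, q2->r3, q0->p3) are exactly the send/receive arrows of the diagram, which the transcript does not confirm; the function names are illustrative.

```python
# Events of each process in program order (labels taken from the slide).
processes = {
    "p": ["p0", "p1", "p2", "p3"],
    "q": ["q0", "q1", "q2", "q3"],
    "r": ["r0", "r1", "r2", "r3"],
}

# ASSUMPTION: these are the message (send -> receive) edges of the diagram.
messages = [("p0", "q1"), ("r0", "q3"), ("q2", "r3"), ("q0", "p3")]


def happened_before(processes, messages):
    """Return a dict mapping each event to the set of events it precedes."""
    events = [e for seq in processes.values() for e in seq]
    hb = {e: set() for e in events}
    # Rule 1: program order within one process.
    for seq in processes.values():
        for earlier, later in zip(seq, seq[1:]):
            hb[earlier].add(later)
    # Rule 2: a send happens before the matching receive.
    for send, receive in messages:
        hb[send].add(receive)
    # Rule 3: transitivity (Warshall-style closure).
    for k in events:
        for a in events:
            if k in hb[a]:
                hb[a] |= hb[k]
    return hb


def concurrent(hb, a, b):
    """Two distinct events are concurrent if neither precedes the other."""
    return a != b and b not in hb[a] and a not in hb[b]


hb = happened_before(processes, messages)
print("q3" in hb["p0"])             # p0 -> q3 via q1
print(concurrent(hb, "q0", "p2"))   # listed as concurrent on the slide
```

Under the stated assumption, the three concurrent pairs on the slide are also reported as concurrent by this closure.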

CS377 slide 7: Implementation of Event Ordering
- We would need either a COMMON CLOCK or PERFECTLY SYNCHRONIZED CLOCKS to determine event ordering in distributed systems
  - Unfortunately, neither is available!
- How can we define the happened-before relation WITHOUT physical clocks in distributed systems?

CS377 slide 8: Implementation of Event Ordering
- We define a logical clock, LCi, for each process Pi
- We associate a timestamp with each event: the value of the process's logical clock when the event occurs
- We advance a logical clock when a message is received, to account for slower logical clocks, i.e., if A sends a message to B and B's clock is not ahead of the message's timestamp, we advance LC(B) to LC(A) + 1
- Now we can meet the global-ordering requirement: if A->B then A's timestamp < B's timestamp
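A minimal sketch of such a logical clock follows. It uses the common max-based formulation of the rule (the receiver sets its clock to one more than the larger of its own value and the message timestamp), which matches the slide's rule whenever the receiver is behind. The class and method names are illustrative, not taken from the lecture.

```python
class LogicalClock:
    """Logical clock for one process (a sketch, not the lecture's code)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; its timestamp travels with the message."""
        return self.tick()

    def receive(self, message_timestamp):
        """Advance past the sender's timestamp if this clock is behind."""
        self.time = max(self.time, message_timestamp) + 1
        return self.time


a, b = LogicalClock(), LogicalClock()
a.tick()
a.tick()
ts_send = a.send()            # A's clock is ahead of B's
ts_recv = b.receive(ts_send)  # B jumps past the message timestamp
print(ts_send, ts_recv)
```

With this rule the receive event always gets a larger timestamp than the matching send event, which is the property the last bullet relies on.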

CS377 slide 9: Mutual Exclusion
- How can we provide mutual exclusion across distributed processes?
- 1. Centralized approach
  - One of the processes acts as coordinator
  - To enter a critical section, a process sends a Request message to the coordinator and waits for a Reply message
  - If there is already a process in the critical section, the coordinator queues the request
  - To leave the critical section, the process must send a Release message
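The coordinator's side of this protocol can be sketched as below. Message passing is replaced by direct method calls to keep the example short, and the FIFO queue is an assumption (the slide only says that requests are queued); all names are illustrative.

```python
from collections import deque


class Coordinator:
    """Grants the critical section to one process at a time."""

    def __init__(self):
        self.holder = None      # process currently in the critical section
        self.waiting = deque()  # queued requests, oldest first (assumed FIFO)

    def request(self, pid):
        """Handle Request. Returns True if a Reply is sent right away."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Handle Release. Returns the process that now gets a Reply, if any."""
        if pid != self.holder:
            raise ValueError(f"{pid} does not hold the critical section")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coordinator = Coordinator()
print(coordinator.request("P1"))  # granted immediately
print(coordinator.request("P2"))  # queued, no Reply yet
print(coordinator.release("P1"))  # P2 is next to receive a Reply
```

Each critical-section entry costs three messages in this scheme (Request, Reply, Release), which is the small overhead the approach is credited with.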

CS377 slide 10: Centralized approach for mutual exclusion
- Advantages:
  - Relatively small overhead
  - Ensures mutual exclusion
  - If scheduling is fair, no starvation occurs
- Disadvantages:
  - The coordinator can fail
  - A new coordinator must then be ELECTED
  - Once the new coordinator is elected, it must poll all the processes to reconstruct the request queue

CS377 slide 11: Fully Distributed approach for mutual exclusion
- A far more complicated solution
- When a process Pi wants to enter its critical section, it generates a new timestamp TS and sends a message Request(Pi, TS) to all processes
- A process can enter its critical section once it has received Reply messages from all other processes
- A process Pj may not reply immediately:
  - Because it is already in its critical section
  - Because it wants to enter its critical section itself: it compares the timestamp of its own request with TS, and if its own timestamp is smaller, the Reply is deferred
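The decision Pj makes when a request arrives can be sketched as a single function. Breaking timestamp ties by process id is an assumption added here so that the order is total; the slide does not say how ties are resolved. The state names are illustrative.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"      # not interested in the critical section
    WANTED = "wanted"  # has sent its own Request and is waiting for Replies
    HELD = "held"      # currently inside the critical section


def should_defer(state, own_request, incoming_request):
    """Decide whether Pj defers its Reply to an incoming Request.

    Requests are (timestamp, process_id) tuples, so Python's tuple
    comparison orders by timestamp first and process id second
    (ASSUMPTION: ties are broken by process id).
    """
    if state is State.HELD:
        return True  # reply only after leaving the critical section
    if state is State.WANTED:
        return own_request < incoming_request  # own request is older
    return False  # not competing: reply immediately


print(should_defer(State.WANTED, (5, 2), (8, 1)))  # own request is older
print(should_defer(State.WANTED, (9, 2), (8, 1)))  # incoming request is older
```

Deferred replies are sent when the process leaves its critical section, so the request with the smallest timestamp always collects all Replies first.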

CS377 slide 12: Fully Distributed Approach
- Advantages:
  - Mutual exclusion is ensured
  - Starvation free (entry is scheduled by timestamp)
  - Deadlock free
- Disadvantages:
  - All processes must know each other
  - If one process fails, the system collapses
  - Needs continuous monitoring of the state of all processes to detect when one process fails
- Suitable for a small number of processes

CS377 slide 13: Token-Passing approach to mutual exclusion
- A token (a special type of message) circulates among all processes
- Processes are logically organized in a ring
- Only the process holding the token may enter its critical section; if a process does not need to enter a critical section, it passes the token to its neighbor
- Advantages: in a highly loaded system a single message per critical-section entry may be enough; starvation free ...
- Disadvantages:
  - If a process fails, a new logical ring must be established
  - In a system with low contention (few processes want to enter their critical sections), the number of messages per critical-section entry can be very large
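The message cost under high and low load can be illustrated with a toy simulation of one lap of the token. This is a sketch under simplifying assumptions: processes are numbered 0..n-1 around the ring, no process fails, and a process enters and leaves its critical section while it holds the token.

```python
def circulate(ring_size, wants, start=0):
    """Pass the token once around the ring.

    `wants` is the set of process ids that want to enter their critical
    section. Returns (order of critical-section entries, messages sent).
    """
    entries = []
    messages = 0
    holder = start
    for _ in range(ring_size):
        if holder in wants:
            entries.append(holder)        # critical section runs here
        holder = (holder + 1) % ring_size  # pass the token to the neighbor
        messages += 1
    return entries, messages


# High load: every process wants in, so one message per entry.
print(circulate(5, {0, 1, 2, 3, 4}))
# Low load: one entry still costs a full lap of messages.
print(circulate(5, {3}))
```

In the low-load case the token keeps circulating even when nobody wants it, so the cost per entry grows with the ring size.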

CS377 slide 14: Deadlock handling with deadlock prevention
- Deadlock avoidance is not practical: it requires information about resource usage ahead of time, which is rarely available
- Deadlock prevention
  - We can use the local algorithms with modifications
  - For example, we can use the resource-ordering technique (ensuring that resources are acquired in order), but first we need to define a global ordering among resources
  - Newer techniques use timestamp ordering, where Pi requests a resource currently held by Pj:
    - The wait-die scheme
      - Non-preemptive technique
      - If the TS of Pi is smaller than the TS of Pj (Pi is older), then Pi can wait for the resource. Otherwise Pi must be rolled back (restarted).
    - The wound-wait scheme
      - Preemptive technique
      - The opposite of wait-die: Pi waits if its TS is larger than Pj's (Pi is younger); otherwise Pj is rolled back and the resource is preempted from Pj
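The two schemes differ only in what happens to which process, as the sketch below shows. It assumes timestamps are unique, that a smaller timestamp means an older process, and that the requester is Pi and the current holder is Pj; the returned strings are illustrative labels.

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits, a younger one dies."""
    if ts_requester < ts_holder:
        return "requester waits"
    return "requester rolled back"


def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    if ts_requester > ts_holder:
        return "requester waits"
    return "holder rolled back"


# Pi has timestamp 5 (older), Pj has timestamp 10 (younger) and holds the resource.
print(wait_die(5, 10))     # older requester is allowed to wait
print(wound_wait(5, 10))   # older requester preempts the younger holder
print(wait_die(10, 5))     # younger requester is rolled back
print(wound_wait(10, 5))   # younger requester waits
```

In both schemes a process only ever waits in one direction of the timestamp order (older for younger in wait-die, younger for older in wound-wait), so a cycle of waiting processes cannot form.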

CS377 slide 15: Deadlock handling with deadlock detection
- Deadlock prevention may preempt resources even if no deadlock has occurred!
- Deadlock detection is based on so-called wait-for graphs
  - A wait-for graph shows the resource-allocation state
  - A cycle in the wait-for graph represents a deadlock
[Figure: local wait-for graphs at Site A and Site B, over processes P1-P5]

CS377 slide 16: Global wait-for graphs
- To show that there is NO DEADLOCK, it is not enough to show that there is no cycle in any local graph
- We need to construct the global wait-for graph
  - It is the union of all local graphs
[Figure: global wait-for graph over processes P1-P5]
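A small sketch shows why local checks are not enough. The two local graphs below are hypothetical (the edges in the slide's figure did not survive in this transcript): each is acyclic on its own, but their union contains a cycle. A graph is represented as a dict mapping a process to the set of processes it waits for.

```python
def union_graphs(*local_graphs):
    """Merge local wait-for graphs into one global graph."""
    merged = {}
    for local in local_graphs:
        for waiter, holders in local.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_cycle(graph):
    """Depth-first search; a back edge to an active node means a cycle."""
    state = {}  # node -> "active" (on the current path) or "done"

    def visit(node):
        state[node] = "active"
        for nxt in graph.get(node, ()):
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# HYPOTHETICAL local graphs, not the ones from the slide's figure.
site_a = {"P1": {"P2"}, "P2": {"P3"}}
site_b = {"P3": {"P4"}, "P4": {"P2"}}

print(has_cycle(site_a), has_cycle(site_b))     # neither site sees a cycle
print(has_cycle(union_graphs(site_a, site_b)))  # P2 -> P3 -> P4 -> P2
```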

CS377 slide 17: How to construct this global wait-for graph?
- Centralized approach:
  - The graph is maintained in ONE process: the deadlock-detection coordinator
  - Since there is communication delay in the system, we have two types of graphs:
    - Real wait-for graph: the real but unknown state of the system
    - Constructed wait-for graph: an approximation generated by the coordinator during the execution of its algorithm
- When is the wait-for graph constructed?
  1. Whenever a local edge is inserted or removed, a message is sent to the coordinator
  2. Periodically
  3. Whenever the coordinator invokes the cycle-detection algorithm
- What happens if a cycle is detected?
  - The coordinator selects a victim and notifies all processes

CS377 slide 18: Centralized approach deadlock detection
- False cycles may exist in the constructed global wait-for graph
  - Messages are delayed and may reach the coordinator in a different order than they were sent; e.g., if a remove-edge message arrives after an add-edge message that was sent later, the coordinator briefly sees both edges and may find a cycle that never existed
- There is a centralized deadlock-detection algorithm based on Option 3 that guarantees that all deadlocks are detected and no false deadlocks are reported
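The effect of message reordering can be reproduced with a small replay. The scenario is hypothetical: initially P1 waits for P2 and P3 waits for P1; then the edge P1->P2 is removed and, afterwards, the edge P2->P3 is added. If the coordinator receives the two messages in the opposite order, it sees a cycle that the real graph never contained.

```python
def has_cycle(edges):
    """True if the directed graph given as a set of (src, dst) edges has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)

    def reachable(start):
        """Nodes reachable from `start` over one or more edges."""
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return seen

    return any(node in reachable(node) for node in graph)


def replay(initial_edges, messages):
    """Apply add/remove messages in order; report cycle status after each one."""
    edges = set(initial_edges)
    status = []
    for op, edge in messages:
        if op == "add":
            edges.add(edge)
        else:
            edges.discard(edge)
        status.append(has_cycle(edges))
    return status


# HYPOTHETICAL scenario, not taken from the slides.
initial = {("P1", "P2"), ("P3", "P1")}
sent_order = [("remove", ("P1", "P2")), ("add", ("P2", "P3"))]
arrival_order = list(reversed(sent_order))

print(replay(initial, sent_order))     # real history: never a cycle
print(replay(initial, arrival_order))  # reordered: a false cycle appears
```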

CS377 slide 19: Summary
- Event ordering in distributed systems
- Various approaches for mutual exclusion in distributed systems
  - Centralized approach
  - Token-based approach
  - Fully distributed approach
- Deadlock prevention and detection
  - Global wait-for graph
