
1 Distributed Systems Principles and Paradigms Chapter 05 Synchronization

2 Communication & Synchronization
Why do processes communicate in a DS?
– To exchange messages
– To synchronize processes
Why do processes synchronize in a DS?
– To coordinate access to shared resources
– To order events

3 Time, Clocks and Clock Synchronization
Time
– Why is time important in a DS?
– E.g., the UNIX make utility (see Fig. 5-1)
Clocks (timers)
– Physical clocks
– Logical clocks (introduced by Leslie Lamport)
– Vector clocks (introduced by Colin Fidge)
Clock Synchronization
– How do we synchronize clocks with real-world time?
– How do we synchronize clocks with each other?

4 Physical Clocks (1/3)
Problem: clock skew – clocks gradually drift out of sync and give different values.
Solution: Coordinated Universal Time (UTC):
– Successor to GMT (Greenwich Mean Time).
– Based on the transition frequency of the cesium-133 atom (very accurate).
– At present, real time is taken as the average of some 50 cesium clocks around the world – International Atomic Time.
– A leap second is introduced from time to time to compensate for the fact that days are getting longer.
UTC is broadcast via shortwave radio (accuracy ± 1 msec) and satellite (Geostationary Operational Environmental Satellite, GOES, accuracy ± 0.5 msec).
Question: Does this solve all our problems? Don't we now have a global timing mechanism?

5 Physical Clocks (2/3)
Problem: even with a UTC receiver somewhere in the system, we still have to distribute its time to each machine.
Basic principle:
– Every machine has a timer that generates an interrupt H times per second (typically H = 60).
– There is a clock in machine p that ticks on each timer interrupt. Denote the value of that clock by C_p(t), where t is UTC time.
– Ideally, we have for each machine p that C_p(t) = t, or, in other words, dC/dt = 1.
Theoretically, a timer with H = 60 should generate 216,000 ticks per hour. In practice, the relative error of modern timer chips is about 10⁻⁵, i.e., between 215,998 and 216,002 ticks per hour.

6 Physical Clocks (3/3)
Goal: never let two clocks in the system differ by more than δ time units => synchronize at least every δ/2ρ seconds, where ρ is the maximum drift rate.
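Not from the slides – the formula as a one-liner in Python, with made-up example numbers:

def resync_interval(delta, rho):
    """Seconds between synchronizations so that two clocks, each drifting
    at most rho, never differ by more than delta."""
    return delta / (2 * rho)

print(resync_interval(delta=1e-3, rho=1e-5))   # 50.0 s for 1 ms max skew, 10^-5 drift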

7 Clock Synchronization Principles
Principle I: every machine asks a time server for the accurate time at least once every δ/2ρ seconds (see Fig. 5-5). But you need an accurate measure of the round-trip delay, including interrupt handling and processing of incoming messages.
Principle II: let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time. You'll probably get every machine in sync. Note: you don't even need to propagate UTC time (why not?)

8 Clock Synchronization Algorithms
The Berkeley algorithm
– The time server periodically polls every machine for its time.
– The received times are averaged, and each machine is notified of the amount by which it should adjust its time.
– A centralized algorithm; see Fig. 5-6.
Decentralized algorithm
– Every machine broadcasts its time at the start of each fixed-length resynchronization interval.
– Each machine averages the values from all other machines (possibly discarding the highest and lowest values first).
Network Time Protocol (NTP)
– The most popular protocol, used by machines on the Internet.
– Uses an algorithm that combines centralized and distributed elements.

9 Network Time Protocol (NTP)
A protocol for synchronizing the clocks of computers over packet-switched, variable-latency data networks (e.g., the Internet).
NTP uses UDP port 123 as its transport. It is designed particularly to resist the effects of variable latency.
NTPv4 can usually maintain time to within 10 milliseconds (1/100 s) over the public Internet, and can achieve accuracies of 200 microseconds (1/5000 s) or better in local area networks under ideal conditions.
Visit the following URL to understand NTP in more detail: http://en.wikipedia.org/wiki/Network_Time_Protocol

10 The Happened-Before Relationship
Problem: we first need a notion of ordering before we can order anything.
The happened-before relation on the set of events in a distributed system is the smallest relation satisfying:
– If a and b are two events in the same process, and a comes before b, then a → b (a happened before b).
– If a is the sending of a message, and b is the receipt of that message, then a → b.
– If a → b and b → c, then a → c (transitivity).
Note: if two events, x and y, happen in different processes that do not exchange messages, then they are said to be concurrent.
Note: this introduces a partial ordering of events in a system with concurrently operating processes.

11 Logical Clocks (1/2)
Problem: how do we maintain a global view of the system's behavior that is consistent with the happened-before relation?
Solution: attach a timestamp C(e) to each event e, satisfying the following properties:
– P1: if a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).
– P2: if a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).
Problem: how do we attach a timestamp to an event when there's no global clock? => maintain a consistent set of logical clocks, one per process.

12 Logical Clocks (2/2)
Each process Pi maintains a local counter Ci and adjusts it according to the following rules:
(1) For any two successive events within Pi, Ci is incremented by 1.
(2) Each time a message m is sent by process Pi, the message carries a timestamp Tm = Ci.
(3) Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max(Cj, Tm) + 1.
Property P1 is satisfied by (1); property P2 by (2) and (3). This is known as Lamport's algorithm.
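Not from the slides – a minimal runnable sketch of rules (1)–(3) in Python; the class and method names are my own:

class LamportClock:
    """Lamport's logical clock for one process (rules 1-3 above)."""

    def __init__(self):
        self.counter = 0

    def local_event(self):
        self.counter += 1          # rule (1): tick for each local event
        return self.counter

    def send(self):
        self.counter += 1          # sending is itself an event
        return self.counter        # attach this value as Tm

    def receive(self, tm):
        self.counter = max(self.counter, tm) + 1   # rule (3)
        return self.counter

For example, if process Pi sends at Ci = 1 and Pj (still at 0) receives, Cj becomes max(0, 1) + 1 = 2, so C(send) < C(receive), satisfying property P2.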

13 Logical Clocks – Example
Fig. 5-7. (a) Three processes, each with its own clock; the clocks run at different rates. (b) Lamport's algorithm corrects the clocks.

14 Exercise: assign Lamport's logical clock values for all the events in the timing diagram. Assume that each process's local clock is set to 0 initially.
[Timing diagram: events a–l across processes P1, P2, P3]

15 From the timing diagram, what can you say about the following events?
– between a and b: a → b
– between b and f: b → f
– between e and k: concurrent
– between c and h: concurrent
– between k and h: k → h
[Timing diagram: events a–l with their Lamport clock values across P1, P2, P3]

16 Total Ordering with Logical Clocks
Problem: two events can still be assigned the same logical time. Avoid this by attaching a process number to each event: Pi timestamps event e as Ci(e).i.
Then Ci(a).i happened before Cj(b).j if and only if:
1: Ci(a) < Cj(b); or
2: Ci(a) = Cj(b) and i < j.
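Not from the slides – in code this is just lexicographic comparison of (clock, process number) pairs; a tiny illustrative sketch:

def happened_before_total(ts_a, ts_b):
    """ts = (lamport_clock, process_number); Python compares tuples
    lexicographically, which is exactly conditions 1 and 2 above."""
    return ts_a < ts_b

assert happened_before_total((2, 9), (3, 1))   # condition 1: smaller clock wins
assert happened_before_total((3, 1), (3, 2))   # condition 2: tie broken by process number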

17 Example: Totally-Ordered Multicast (1/2)
Problem: we sometimes need to guarantee that concurrent updates to a replicated database are seen in the same order everywhere:
– Process P1 adds $100 to an account (initial value: $1000).
– Process P2 increments the account by 1%.
– There are two replicas.
Outcome: in the absence of proper synchronization, replica #1 will end up with $1111, while replica #2 ends up with $1110.

18 Example: Totally-Ordered Multicast (2/2)
Process Pi timestamps message msg_i and sends it to all others. The message itself is also put into a local queue, queue_i.
Any incoming message at Pj is queued in queue_j according to its timestamp.
Pj passes a message msg_i to its application only when:
(1) msg_i is at the head of queue_j, and
(2) for each process Pk, there is a message msg_k in queue_j with a larger timestamp.
Note: we are assuming that communication is reliable and FIFO-ordered.
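Not from the slides – a sketch of the queueing and delivery test, assuming messages carry totally-ordered (clock, sender) timestamps as on slide 16; heapq and all names are my own choices:

import heapq

class TotalOrderQueue:
    """Holds multicast messages until conditions (1) and (2) above hold."""

    def __init__(self, process_ids):
        self.procs = set(process_ids)      # all processes in the group
        self.queue = []                    # min-heap of (clock, sender, payload)

    def enqueue(self, clock, sender, payload):
        heapq.heappush(self.queue, (clock, sender, payload))

    def deliverable(self):
        """Pop and return every message that may go to the application."""
        out = []
        while self.queue:
            clock, sender, _ = self.queue[0]            # head of the queue
            later = {s for c, s, _ in self.queue if (c, s) > (clock, sender)}
            if self.procs - {sender} <= later:          # condition (2)
                out.append(heapq.heappop(self.queue))   # condition (1): head
            else:
                break
        return out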

19 Fidge's Logical Clocks
With Lamport's clocks, one cannot directly compare the timestamps of two events to determine their precedence relationship:
– if a → b, then C(a) < C(b);
– but if C(a) < C(b), it could be that a → b or that a and b are concurrent;
– e.g., events e and b in the earlier example figure: C(e) = 1 and C(b) = 2, thus C(e) < C(b), yet e did not happen before b (they are concurrent).
The main problem is that a single integer clock cannot order both events within a process and events in different processes.
Colin Fidge developed an algorithm that overcomes this problem. Fidge's clock is represented as a vector [c1, c2, …, cn] with an integer clock value for each process (ci contains the clock value of process i).

20 Fidge's Algorithm
Fidge's logical clock is maintained as follows:
1: Initially, all clock values are set to the smallest value (e.g., zero).
2: The local clock value is incremented at least once before each primitive event in a process.
3: The current value of the entire logical clock vector is attached to every outgoing message.
4: Values in the timestamp vectors are never decremented.
5: Upon receiving a message, the receiver sets each entry in its local timestamp vector to the maximum of the two corresponding values in the local vector and the received remote vector. The element corresponding to the sender is a special case: it is set to one greater than the value received, but only if the local value is not greater than that received.

21 Get r_vector from the received msg sent by process q;
if l_vector[q] ≤ r_vector[q] then l_vector[q] := r_vector[q] + 1;
for i := 1 to n do l_vector[i] := max(l_vector[i], r_vector[i]);

Timestamps attached to events are compared as follows: e_p → f_q iff T(e_p)[p] < T(f_q)[p], where e_p denotes an event e occurring in process p, T(e_p) denotes the timestamp vector of event e_p, and the i-th element of T(e_p) is written T(e_p)[i].
This means event e_p happened before event f_q if and only if process q received a direct or indirect message from p, and that message was sent after e_p had occurred. If e_p and f_q are in the same process (i.e., p = q), the local elements of their timestamps reflect their order of occurrence within the process.
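Not from the slides – a runnable sketch of rules 1–5 and the comparison, for n processes numbered 0..n-1; names are my own. Note that the receipt of a message is itself an event, so the receiver also ticks its own entry (rule 2):

class FidgeClock:
    """Fidge's vector clock for process `pid` out of `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                  # rule 1: smallest value everywhere

    def event(self):
        self.v[self.pid] += 1             # rule 2: tick before each event
        return list(self.v)

    def send(self):
        return self.event()               # rule 3: attach the whole vector

    def receive(self, r, sender):         # r = received vector (rule 5)
        self.v[self.pid] += 1             # the receipt is itself an event (rule 2)
        if self.v[sender] <= r[sender]:
            self.v[sender] = r[sender] + 1    # sender's entry: one greater
        self.v = [max(a, b) for a, b in zip(self.v, r)]   # rules 4-5: elementwise max
        return list(self.v)

def happened_before(t_e, p, t_f):
    """e_p -> f_q iff T(e_p)[p] < T(f_q)[p] (the comparison above)."""
    return t_e[p] < t_f[p]

Replaying the diagram on slide 23: when P2, at [0,1,0], receives the message event b sent with vector [2,0,0], receive([2,0,0], sender=0) yields [3,2,0], which is exactly event f's timestamp.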

22 Exercise: assign both Lamport's and Fidge's logical clock values for all the events in the timing diagram. Assume that each process's logical clock is set to 0 initially.
[Timing diagram: events a–l across processes P1, P2, P3]

23 Solution (Lamport value, then Fidge vector, for each event):
P1: a = 1 [1,0,0]; b = 2 [2,0,0]; c = 3 [3,0,0]; d = 4 [4,0,0]
P2: e = 1 [0,1,0]; f = 3 [3,2,0]; g = 4 [3,3,3]; h = 5 [3,4,3]; i = 6 [5,5,3]
P3: j = 1 [0,0,1]; k = 2 [0,0,2]; l = 3 [0,0,3]

24 The diagram above shows both the Lamport timestamp (an integer) and the Fidge timestamp (a vector of integers) for each event.
– Lamport clocks: 2 < 5 since b → h, but 3 < 4 even though c and g are concurrent.
– Fidge clocks: f → h since 2 < 4 is true; b → h since 2 < 3 is true; h did not happen before a since 4 < 0 is false; c and h are concurrent since both (3 < 3) and (4 < 0) are false.

25 Exercise: assign both Lamport's and Fidge's logical clock values for all the events in the timing diagram. Assume that each process's logical clock is set to 0 initially.
[Timing diagram: events a–o across processes P1, P2, P3, P4]

26 From the timing diagram, what can you say about the following events?
1. between b and n:
2. between b and o:
3. between m and g:
4. between c and h:
5. between c and l:
6. between j and g:
7. between k and i:
8. between j and h:

27 READING
Reference: Colin Fidge, "Logical Time in Distributed Computing Systems," IEEE Computer, vol. 24, no. 8, pp. 28–33, August 1991.

28 Global State (1/3)
Basic idea: sometimes you want to collect the current state of a distributed computation, called a distributed snapshot. It consists of all local states plus the messages currently in transit.
Important: a distributed snapshot should reflect a consistent state: if we have recorded that a process P received a message, then we should also have recorded that the sender actually sent it.

29 Global State (2/3)
Note: any process P can initiate taking a distributed snapshot.
– P starts by recording its own local state.
– P then sends a marker along each of its outgoing channels.
– When Q receives a marker through channel C, its action depends on whether it has already recorded its local state:
  – Not yet recorded: it records its local state, and sends the marker along each of its outgoing channels.
  – Already recorded: the marker on C indicates that the channel's state should be recorded: all messages received on C after Q recorded its own state and before this marker arrived.
– Q is finished when it has received a marker along each of its incoming channels.
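Not from the slides – a sketch of one process's marker handling, with message transport abstracted behind callbacks; all names are my own:

MARKER = "MARKER"

class SnapshotProcess:
    """State kept by one process for the snapshot algorithm above."""

    def __init__(self, incoming_channels):
        self.recorded_state = None
        self.open_channels = set(incoming_channels)  # channels still being recorded
        self.channel_log = {}                        # channel -> in-transit messages

    def record(self, local_state, send_marker_on_all_outgoing):
        # record own state, then send a marker on every outgoing channel
        self.recorded_state = local_state
        self.channel_log = {c: [] for c in self.open_channels}
        send_marker_on_all_outgoing()

    def on_message(self, channel, msg, local_state, send_marker_on_all_outgoing):
        if msg == MARKER:
            if self.recorded_state is None:           # first marker seen
                self.record(local_state, send_marker_on_all_outgoing)
            self.open_channels.discard(channel)       # this channel's state is complete
            return not self.open_channels             # True when the snapshot is done
        if self.recorded_state is not None and channel in self.open_channels:
            self.channel_log[channel].append(msg)     # message was in transit
        return False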

30 Global State (3/3)
(a) Organization of a process and channels for a distributed snapshot.
(b) Process Q receives a marker for the first time and records its local state.
(c) Q records all incoming messages.
(d) Q receives a marker on its incoming channel and finishes recording the state of that channel.

31 Election Algorithms
Principle: many distributed algorithms require some process to act as coordinator. The question is how to select this special process dynamically.
Note: in many systems the coordinator is chosen by hand (e.g., file servers, DNS servers). This leads to centralized solutions with a single point of failure.
Question: if a coordinator is chosen dynamically, to what extent can we speak of a centralized or distributed solution?
Question: is a fully distributed solution, i.e., one without a coordinator, always more robust than any centralized/coordinated solution?

32 Election by Bullying (1/2)
Principle: each process has an associated priority (weight); the process with the highest priority should always be elected as coordinator.
Issue: how do we find the heaviest process?
– Any process can start an election by sending an election message to all other processes (assuming you don't know the weights of the others).
– If a process P_heavy receives an election message from a lighter process P_light, it sends a take-over message to P_light; P_light is out of the race.
– If a process doesn't get a take-over message back, it wins, and sends a victory message to all other processes.

33 Election by Bullying (2/2)
Question: we're assuming something very important here – what?
Assumption: each process knows the process numbers of all other processes.
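Not from the slides – a compressed sketch of the bully algorithm from one process's point of view; responds(p) stands in for "send an election message to p and receive a take-over reply", and process numbers double as weights:

def bully_election(my_id, all_ids, responds):
    """Sketch of the bully algorithm at one process; names are my own."""
    heavier = [p for p in all_ids if p > my_id]
    if not any(responds(p) for p in heavier):
        return my_id          # nobody heavier answered: I win, broadcast VICTORY
    # a heavier process took over and runs its own election; eventually
    # the heaviest live process announces victory (simplified here):
    return max(p for p in heavier if responds(p))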

34 Election in a Ring
Principle: process priority is obtained by organizing the processes into a (logical) ring; the process with the highest priority should be elected as coordinator.
– Any process can start an election by sending an election message to its successor. If the successor is down, the message is passed to the next successor.
– Whenever the message is passed on, the sender adds itself to a list carried in the message. When the message gets back to the initiator, every live process has had a chance to make its presence known.
– The initiator then sends a coordinator message around the ring containing the list of all living processes; the one with the highest priority becomes coordinator. See Fig. 5-12.
Question: does it matter if two processes initiate an election?
Question: what happens if a process crashes during the election?
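Not from the slides – a sketch of one election round on the ring; successor(p) and is_alive(p) abstract the ring topology and message delivery, and names are my own:

def ring_election(initiator, successor, is_alive):
    """Sketch of ring election; process ids double as priorities."""
    members = [initiator]
    p = successor(initiator)
    while p != initiator:                 # the election message circulates once
        if is_alive(p):
            members.append(p)             # each live process adds itself to the list
        p = successor(p)
    coordinator = max(members)            # highest priority wins
    return coordinator                    # then announced in a coordinator message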

35 Mutual Exclusion
Problem: a number of processes in a distributed system want exclusive access to some resource.
Basic solutions:
– Via a centralized server.
– Completely distributed, with no topology imposed.
– Completely distributed, making use of a (logical) ring.
Centralized: really simple – a coordinator grants permission to enter the critical region, queues requests that arrive while the resource is held, and grants the next queued request upon release.

36 Mutual Exclusion: Ricart & Agrawala
Principle: the same as Lamport's algorithm, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:
– the receiving process has no interest in the shared resource; or
– the receiving process is waiting for the resource but has lower priority (determined by comparing timestamps).
In all other cases the reply is deferred (see the algorithm on pg. 267).
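Not from the slides – the reply decision at one process, sketched with (clock, process-id) request timestamps so comparisons are total; the state names are my own:

def on_request(my_state, my_request_ts, incoming_ts):
    """Ricart & Agrawala reply rule at one process.
    my_state is 'RELEASED', 'WANTED', or 'HELD'."""
    if my_state == "RELEASED":
        return "REPLY"                        # no interest: grant immediately
    if my_state == "WANTED" and incoming_ts < my_request_ts:
        return "REPLY"                        # requester has higher priority than us
    return "DEFER"                            # queue the reply until we release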

37 Mutual Exclusion: Token Ring Algorithm
Essence: organize the processes in a logical ring, and let a token be passed between them. Only the process holding the token is allowed to enter the critical region (if it wants to).

38 Distributed Transactions
– The transaction model
– Classification of transactions
– Concurrency control

39 The Transaction Model (1)
Updating a master tape is fault tolerant.
Question: what happens if the computer operation fails? => Both tapes are rewound and the job is restarted from the beginning, without any harm being done.

40 The Transaction Model (2)
Figure 5-18. Example primitives for transactions.

Primitive           Description
BEGIN_TRANSACTION   Mark the start of a transaction
END_TRANSACTION     Terminate the transaction and try to commit
ABORT_TRANSACTION   Kill the transaction and restore the old values
READ                Read data from a file, a table, or otherwise
WRITE               Write data to a file, a table, or otherwise

41 The Transaction Model (3)
(a) Transaction to reserve three flights commits.
(b) Transaction aborts when the third flight is unavailable.

(a) BEGIN_TRANSACTION
      reserve BOS -> JFK;
      reserve JFK -> ICN;
      reserve SEL -> KPO;
    END_TRANSACTION

(b) BEGIN_TRANSACTION
      reserve BOS -> JFK;
      reserve JFK -> ICN;
      reserve SEL -> KPO;  full => ABORT_TRANSACTION

42 ACID Properties of Transactions
Atomic – to the outside world, the transaction happens indivisibly.
Consistent – the transaction does not violate system invariants.
Isolated – concurrent transactions do not interfere with each other.
Durable – once a transaction commits, the changes are permanent.

43 Nested Transactions
– Constructed from a number of subtransactions.
– The top-level transaction may create children that run in parallel with one another, to gain performance or to simplify programming.
– Each of these children is called a subtransaction and may itself have one or more subtransactions.
– When any transaction or subtransaction starts, it is conceptually given a private copy of all data in the entire system, which it may manipulate as it wishes.
– If it aborts, its private space is destroyed; if it commits, its private space replaces the parent's space.
– If the top-level transaction aborts, all the changes made in the subtransactions must be wiped out.

44 Distributed Transactions
– Transactions involving subtransactions that operate on data distributed across multiple machines.
– Separate distributed algorithms are needed to handle the locking of data and the committing of the entire transaction.

45 Implementing Transactions
1. Private workspace: give a process a private workspace (i.e., a copy of all the data it has access to) when it begins a transaction.
2. Writeahead log: files are modified in place, but before any block is changed, a record is written to a log telling:
– which transaction is making the change,
– which file and block are being changed,
– what the old and new values are.
Only after the log has been written successfully is the change made to the file.
Question: why is a log needed? => for "rollback" if necessary.
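Not from the slides – a toy writeahead log over an in-memory dict (a real implementation writes each log record to stable storage before touching the data); all names are my own:

store = {"x": 0, "y": 0}
log = []   # records of (transaction_id, key, old_value, new_value)

def tx_write(tx_id, key, new_value):
    log.append((tx_id, key, store[key], new_value))  # write the log first...
    store[key] = new_value                           # ...then modify in place

def rollback(tx_id):
    # undo this transaction's changes, newest first, using the logged old values
    for t, key, old, _new in reversed(log):
        if t == tx_id:
            store[key] = old

tx_write("T1", "x", 1)
tx_write("T1", "y", 2)
rollback("T1")
assert store == {"x": 0, "y": 0}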

46 Private Workspace
(a) The file index and disk blocks for a three-block file.
(b) The situation after a transaction has modified block 0 and appended block 3.
(c) After committing.

47 Writeahead Log
(a) A transaction; (b)–(d) the log before each statement is executed.

(a)  x = 0;  y = 0;
     BEGIN_TRANSACTION;
       x = x + 1;
       y = y + 2;
       x = y * y;
     END_TRANSACTION;

(b) Log: [x = 0 / 1]
(c) Log: [x = 0 / 1] [y = 0 / 2]
(d) Log: [x = 0 / 1] [y = 0 / 2] [x = 1 / 4]

48 Concurrency Control (1)
The goal of concurrency control is to allow multiple transactions to be executed simultaneously, while the final result is the same as if all transactions had run sequentially.
Fig. 5-23. General organization of managers for handling transactions.

49 Concurrency Control (2)
General organization of managers for handling distributed transactions.

50 Serializability (1)
(a)–(c) Three transactions T1, T2, and T3; (d) possible schedules.

(a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
(b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
(c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION

(d)
Schedule 1: x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 2: x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 3: x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal

Question: why is Schedule 3 illegal? (It leaves x = 5, a result no serial execution of T1, T2, T3 can produce, so it is not serializable.)

51 Serializability (2)
Two operations conflict if they operate on the same data item and at least one of them is a write:
– read-write conflict: exactly one of the operations is a write;
– write-write conflict: both operations are writes.
Concurrency control algorithms can generally be classified by how read and write operations are synchronized:
– using locking, or
– explicitly ordering operations using timestamps.

52 Two-Phase Locking (1)
Fig. 5-26. Two-phase locking.
In two-phase locking (2PL), the scheduler first acquires all the locks a transaction needs during the growing (first) phase, and then releases them during the shrinking (second) phase: once any lock has been released, no further lock may be acquired. See the rules on pg. 284.
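Not from the slides – a sketch of the 2PL rule itself, with lock-manager details elided and names of my own choosing:

class TwoPhaseLockingTxn:
    """Enforces 2PL: all acquires (growing phase) precede all releases
    (shrinking phase)."""

    def __init__(self):
        self.held = set()
        self.shrinking = False    # flips to True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after a release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True     # the growing phase is over for good
        self.held.discard(item)

Strict 2PL (next slide) is then the special case in which release is never called until the transaction commits or aborts.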

53 Two-Phase Locking (2)
Fig. 5-27. Strict two-phase locking.
In strict two-phase locking, the shrinking phase does not take place until the transaction has finished running and has either committed or aborted.

54 READING: Read Chapter 5

