
1 CS 294-8 Time, Clocks, and Snapshots in Distributed Systems
http://www.cs.berkeley.edu/~yelick/294

2 Agenda
Concurrency models
– Partial orders, state machines
Correctness conditions
– Serializability, …
– Safety and liveness
Clock synchronization
Bag of tricks for distributed alg's
– Timestamps, markers, …

3 Common Approach #1
Reasoning about a concurrent system: partially order the events whose ordering you can observe
– Messages, memory operations, events
Consider all possible topological sorts to serialize
Each of those serial "histories" must be "correct."

4 Happens Before Relation
A system is a set of processes
Each process is a sequence of events (a total ordering on its own events)
Happens before, denoted ->, is defined as the smallest relation such that:
– 1) if a and b are in the same process and a comes before b, then a -> b
– 2) if a is a message send and b is the matching receive, then a -> b
– 3) if a -> b and b -> c, then a -> c (transitivity)
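
To make the definition concrete, here is a minimal Python sketch (mine, not from the slides) that builds the happens-before relation from per-process event sequences and send/receive pairs by closing rules 1 and 2 under transitivity:

def happens_before(process_events, messages):
    """Compute the happens-before relation as a set of (a, b) pairs.

    process_events: one list of event names per process (program order)
    messages: iterable of (send_event, receive_event) pairs
    Event names are assumed unique across processes (my assumption,
    not stated on the slide).
    """
    hb = set()
    # Rule 1: program order within each process
    for events in process_events:
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                hb.add((a, b))
    # Rule 2: each send happens before its matching receive
    hb.update(messages)
    # Rule 3: transitive closure
    changed = True
    while changed:
        changed = False
        for (a, b) in list(hb):
            for (c, d) in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb

# Two processes: p sends message m, q receives it
p_events = ["p1", "send_m", "p3"]
q_events = ["q1", "recv_m", "q3"]
hb = happens_before([p_events, q_events], [("send_m", "recv_m")])
assert ("p1", "q3") in hb       # ordered through the message
assert ("p3", "q1") not in hb   # these two remain concurrent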

5 Happens Before Example
(Slide shows a timeline diagram of events across processes.)
What if processes are multithreaded?
How do we determine the events?
– Send message, receive message, what else?
– What about reads/writes? Nonblocking operations?
Does this help if we're trying to reason about a program or an algorithm?

6 Common Approach #2
Reasoning about a concurrent system: view the system as a state machine that produces serial executions
– Invocation/response events, send/receive messages (with an explicit model of the network in between)
Each of the global serial executions must be "correct."

7 State Machine Approach
A distributed system is:
– A finite set of processes
– A finite set of channels (network links)
A channel is:
– Reliable, order-preserving, and with infinite buffer space
A process is:
– A set of states, with one initial state
– A set of events
Models differ on specific details

8 Comparison of Models
Which approach (partial orders vs. state machines) is better?
Is one lower level than the other?
Is one for shared memory and the other for message passing?

9 Common Notions of Correctness
Serializability, strong serializability
Sequential consistency, total store ordering, weak ordering, …
Linearizability (used in wait-free constructions)
All of these are based on the idea that operations (transactions) must appear to happen in some serial order
Which of these (or others) are useful?

10 Variations on Correctness
Why are there so many notions?
– Do all processes observe the same "serial" order, or can some see different views?
– Are the specifications of each operation deterministic? Can processes see different "correct" behaviors?
– Are all operations executed by a process ordered? E.g., read x -> read y doesn't matter.
– Is the observed order consistent with real time?

11 Closed World Assumption
Most of these correctness ideas assume that all communication is part of the system. Anomalies come from:
– Phone calls between users
– A second "external" network
How does one prevent these anomalies?

12 Clock Condition
A logical clock assigns a timestamp C(a) to each event a.
Clock Condition: for any events a, b:
– If a -> b then C(a) < C(b)
To implement a clock:
– Increment the clock between each pair of events within a process
– When you send a message, append the current time
– When you receive a message, set your own clock to max(message timestamp, my current time) + 1
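
A minimal Python sketch of these clock rules (the class and method names are my own, not from the slides):

class LamportClock:
    """Logical clock following the three rules on the slide."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Increment between events within a process
        self.time += 1
        return self.time

    def send(self):
        # Stamp an outgoing message with the current time
        self.time += 1
        return self.time  # timestamp carried by the message

    def receive(self, msg_timestamp):
        # Advance past both the sender's view and our own
        self.time = max(msg_timestamp, self.time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t = p.send()          # p sends a message stamped t
q.receive(t)          # q's clock jumps past t
assert q.time > t     # so the receive is ordered after the send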

13 Mutual Exclusion
Lamport defines a total order => as the clock ordering with ties broken by process ID
"Clocks" are not synchronized a priori
– They become loosely synchronized within the algorithm
This total order is used in his mutual exclusion algorithm
– How useful is this algorithm?
– Is the specification useful?
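
For concreteness, the tie-breaking total order can be read as a comparison of (clock value, process ID) pairs; a small sketch under that reading:

def total_order(event_a, event_b):
    """Return True if event_a => event_b under the total order.

    Each event is a (clock_value, pid) pair; clock values come from
    logical clocks, and ties are broken by process ID.
    """
    clock_a, pid_a = event_a
    clock_b, pid_b = event_b
    return (clock_a, pid_a) < (clock_b, pid_b)

assert total_order((3, 0), (3, 1))       # same clock value, lower PID wins
assert not total_order((4, 0), (3, 1))   # smaller clock value comes first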

14 Clock Synchronization
Two notions of synchronization:
– External: within some bound of UTC (Coordinated Universal Time)
– Internal: within some bound of the other clocks in the system
Huge literature on clock synchronization under various models (e.g., PODC)
– Perfect synchronization is impossible in general due to message delays
– Many algorithms synchronize clocks with high probability (within some bound)
Are synchronized clocks useful? For fault tolerance or in general?

15 Clock Synchronization Algorithms
Cristian [89] describes a centralized time server approach
– Set time = t_server + t_round_trip/2
– Not fault tolerant to server failure
Berkeley algorithm [89]:
– Master polls slaves and records round-trip times to estimate their local times
– Averages the "non-faulty" clocks
– Sends a delta (not a new time) to each slave
– Master can be re-elected
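
A minimal sketch of Cristian's estimate, assuming a caller-supplied request_server_time function that performs the RPC (that helper is hypothetical, not part of the slide):

import time

def cristian_time(request_server_time):
    """Estimate the current time per the slide's rule:
    new time = t_server + round_trip / 2.

    request_server_time is a function that queries the time server
    and returns the server's clock reading.
    """
    t_send = time.monotonic()
    t_server = request_server_time()
    t_recv = time.monotonic()
    round_trip = t_recv - t_send
    return t_server + round_trip / 2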

16 Clock Synchronization Algorithms
Network Time Protocol (NTP) [Mills 95], meant for the wide area:
– Uses statistical techniques to filter clock data
– Servers can reconfigure after faults
– Protected (authentication is used)
Design:
– Primary servers are connected to a radio clock
– Secondary servers synchronize with primaries, with distance from a primary defining the strata
– Modes: multicast, procedure call, symmetric
– Accuracy of 1-10s of milliseconds reported
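
The slide does not show the arithmetic, but the standard NTP offset/delay computation from the four exchange timestamps looks like this:

def ntp_offset_delay(t0, t1, t2, t3):
    """Standard NTP clock offset and round-trip delay.

    t0: client transmit time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay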

17 Stable Properties
Termination, garbage, and deadlock are stable properties: once true, they remain true "forever."
Snapshot algorithms are useful for detecting stable properties.
What properties related to faults are stable?

18 Safety and Liveness
A safety property says:
– Informally: nothing bad ever happens
– Formally: any violation of a safety property can be observed on a finite prefix of the execution
– This is a partial correctness condition
A liveness property says:
– Informally: something good eventually happens
– Formally: it is a property of an infinite execution that cannot be checked on finite prefixes
– Combined with safety, this gives a total correctness condition
What about real-time constraints?

19 Snapshot Algorithm
Each process records its own state; processes that share a channel coordinate to record the channel's state.
(Slide shows two processes p and q connected by channels c and c'.)
Example: single-token conservation system, with states s0 = no token and s1 = has token.
If the token is in p when p snapshots itself, then moves into c before q snapshots itself and the channels, the recorded global state shows 2 copies of the token.

20 Snapshot Algorithm
Idea: mark a point in the message stream
– p sends one marker on channel c after recording its state and before sending any further message on c
– When q receives the marker on c, it either:
  Records its current state and records c as empty (if q had not yet recorded its state), or
  Keeps its previously recorded state and records the state of c as all messages received on c after q recorded its state and before the marker arrived
– Works for n strongly connected processes
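
A sketch of one process's side of the marker rules, with class and channel plumbing invented for illustration (the real algorithm is distributed across all processes and channels):

MARKER = "MARKER"

class SnapshotProcess:
    """One process's side of the marker rules on the slide."""

    def __init__(self, name, incoming, outgoing, state):
        self.name = name
        self.state = state
        self.incoming = incoming          # names of incoming channels
        self.outgoing = outgoing          # outgoing channel objects with .send()
        self.recorded_state = None
        self.channel_state = {}           # channel name -> recorded messages
        self.recording = set()            # channels still being recorded

    def start_snapshot(self):
        self._record_and_send_markers()

    def on_message(self, channel, msg):
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker seen: record own state; this channel is empty
                self._record_and_send_markers()
                self.channel_state[channel] = []
            # Stop recording this channel: its state is whatever arrived
            # after we recorded our state and before this marker
            self.recording.discard(channel)
        else:
            if channel in self.recording:
                self.channel_state[channel].append(msg)
            # ... normal application handling of msg goes here

    def _record_and_send_markers(self):
        self.recorded_state = self.state
        self.recording = set(self.incoming)
        for ch in self.incoming:
            self.channel_state.setdefault(ch, [])
        for ch in self.outgoing:
            ch.send(MARKER)   # marker precedes any later message on ch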

21 Related Algorithms
Dijkstra-Scholten termination detection
– Create an implicit spanning tree: the sender of your first message is your parent
– Keep track of children awaiting termination and signal your parent when they (and you) are done
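
A sketch of the simplified tree-based scheme the slide describes (it omits the per-message acknowledgements of the full Dijkstra-Scholten algorithm; names and hooks are mine, and the network wiring that would call them is not shown):

class TerminationNode:
    """Simplified termination-detection node following the slide."""

    def __init__(self, name, is_root=False):
        self.name = name
        self.is_root = is_root
        self.parent = None
        self.children = set()
        self.active = is_root          # the root starts the computation

    def on_receive(self, sender):
        if not self.active and self.parent is None and not self.is_root:
            self.parent = sender       # sender of first message is the parent
            sender.children.add(self)
        self.active = True

    def on_become_idle(self):
        self.active = False
        self._try_signal_done()

    def on_child_done(self, child):
        self.children.discard(child)
        self._try_signal_done()

    def _try_signal_done(self):
        # Signal the parent only when this node is idle and all children are done
        if not self.active and not self.children:
            if self.parent is not None:
                self.parent.on_child_done(self)
                self.parent = None
            elif self.is_root:
                print("termination detected at", self.name)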

22 Backup Example Slides

23 Using State Machines to Model Parallel Systems
A large parallel application is built from a set of communicating objects
The correctness argument can be divided into two problems:
– the correctness of each of the objects
– the correctness of the system that uses the objects
Both can be modeled as state machines
Examples: a distributed hash table, a distributed task queue

24 Atomicity
In any attempt to reason about concurrency, we need to determine the level of atomicity
– What is the smallest indivisible event?
  reads and writes to memory (usually)
  basic arithmetic and conditional operations
  message sends/receives
Often too complex to do all reasoning at this level
– Group a set of low-level events within a function/method into a higher-level atomic event
This leads to multi-level, modular proofs

25 Specifications and Implementations
The first step in reasoning about correctness is having a specification of the desired behavior
– What sequences of operations (inputs and results) are legal?
A specification and an implementation may be written
– in the same formal language
– in two different languages
We concentrate on the behaviors produced and avoid syntax here

26 Correctness of Serial Types
A sequential ADT may be specified by pre/post conditions on the operations
– insert(s, e)   pre: true     post: s = s + { e }
– remove(s, e)   pre: e in s   post: s = s - { e }
An implementation is correct if, given any sequence of these operations, each meets its specification
(Slide shows a timeline of insert and remove operations.)
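
As a sketch, the same spec written as assertions on a simple set-valued ADT (the assertion style is my choice, not the slide's):

class SetADT:
    """Sequential set ADT with the slide's pre/post conditions as assertions."""

    def __init__(self):
        self.s = set()

    def insert(self, e):
        # pre: true
        old = set(self.s)
        self.s.add(e)
        assert self.s == old | {e}           # post: s = s + { e }

    def remove(self, e):
        assert e in self.s                   # pre: e in s
        old = set(self.s)
        self.s.discard(e)
        assert self.s == old - {e}           # post: s = s - { e }

adt = SetADT()
adt.insert(3)
adt.remove(3)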

27 Linearizability
Taken from "Linearizability: A Correctness Condition for Concurrent Objects" by Herlihy and Wing, TOPLAS, July 1990.
Intuition behind the correctness condition:
– each operation should appear instantaneous
– the order of nonconcurrent operations should be preserved
Using this intuition, examine some histories and determine if they are acceptable
Example: a queue ADT with
– enqueue (E) and
– dequeue (D) operations

28 Some Queue Histories
(Slide shows timelines for three concurrent histories; the overlap of operations is what makes each acceptable or not.)
History 1, acceptable:     A: q.E(x), q.D(y), q.E(z);  B: q.E(y), q.D(x)
History 2, not acceptable: A: q.E(x), q.D(y);          B: q.E(y)
History 3, not acceptable: A: q.E(x), q.D(y);          B: q.E(y), q.D(y)

29 Execution Model Details
A concurrent system is a
– collection of sequential threads
– communicating through shared objects
Each object has a
– unique name
– a type, which defines the set of legal operations
Formally, each operation execution is 2 events:
– an invocation event: <object op(args) process>
– a response event: <object term(results) process>
– where term is a termination condition: OK, exception, etc.
– an invocation and response match if they have the same process name and the same object name

30 Histories
Assume invocation and response events are totally ordered (although operations may overlap)
A history is a finite sequence of invocation and response events
An invocation is pending in a history if no matching response follows it
Given a history H, complete(H) is the maximal subsequence of H consisting only of invocations and their matching responses
A history H is sequential if:
– the first event in H is an invocation
– each invocation (except possibly the last) is immediately followed by a matching response
A history that is not sequential is concurrent
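
A sketch of these definitions over a history encoded as a list of ("inv"/"res", process, object, operation) tuples (the encoding is mine, chosen for illustration):

def matching(inv, res):
    # An invocation and a response match if they have the same
    # process name and the same object name.
    return inv[0] == "inv" and res[0] == "res" and inv[1:3] == res[1:3]

def complete(history):
    """Maximal subsequence of invocations with their matching responses."""
    kept = set()
    pending = {}
    for i, ev in enumerate(history):
        if ev[0] == "inv":
            pending[(ev[1], ev[2])] = i
        elif (ev[1], ev[2]) in pending:
            kept.add(pending.pop((ev[1], ev[2])))
            kept.add(i)
    return [ev for i, ev in enumerate(history) if i in kept]

def is_sequential(history):
    """First event is an invocation and each invocation (except possibly
    the last event) is immediately followed by its matching response."""
    i = 0
    while i < len(history):
        if history[i][0] != "inv":
            return False
        if i + 1 == len(history):
            return True              # a single pending final invocation is allowed
        if not matching(history[i], history[i + 1]):
            return False
        i += 2
    return True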

31 Ordering in Histories
A history H induces an irreflexive partial order <H:
– op1 <H op2 iff the response for op1 appears before the invocation for op2 in H
(Slide shows the same history two ways: a timeline in which op1 and op3 complete before op2 and op4 begin, and the non-graphical event sequence op1_inv, op3_inv, op1_res, op3_res, op2_inv, op4_inv, op2_res, op4_res; here op1 < op2, op1 < op4, op3 < op2, and op3 < op4.)

32 Linearizability
A process subhistory H|P (H at P) is the subsequence of all events in H whose process name is P
Two histories H1 and H2 are equivalent if for all processes P, H1|P = H2|P
A history H is linearizable if it can be extended (by appending zero or more response events) to a history H' such that
– L1: complete(H') is equivalent to some legal sequential history S
– L2: <H is a subset of <S
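
A brute-force sketch of checking L1 and L2 for a complete history of queue operations, where each operation carries its invocation and response times (the representation and helper names are mine):

from itertools import permutations

def legal_queue_behavior(ops):
    """Check FIFO queue semantics for a sequence of ("E", v) / ("D", v) ops."""
    q = []
    for kind, val in ops:
        if kind == "E":
            q.append(val)
        elif not q or q.pop(0) != val:
            return False
    return True

def is_linearizable(operations):
    """Brute-force check of a complete history of queue operations.

    Each operation is (invocation_time, response_time, kind, value).
    We look for a sequential order (L1) that respects the real-time
    partial order <H (L2) and is a legal queue behavior.
    """
    for order in permutations(operations):
        # L2: if b finished before a began in real time, b must precede a,
        # so a candidate order with a before b is rejected.
        respects_time = all(
            not (b[1] < a[0])
            for i, a in enumerate(order) for b in order[i + 1:]
        )
        if respects_time and legal_queue_behavior([(o[2], o[3]) for o in order]):
            return True
    return False

# A enqueues x, then y is enqueued strictly later; dequeuing y first is not linearizable
history = [(0, 1, "E", "x"), (2, 3, "E", "y"), (4, 5, "D", "y")]
assert not is_linearizable(history)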

33 Observations on Linearizability
Linearizability is stronger than sequential consistency
Linearizability is composable, which is nice for building large systems of component objects
Intuitively, linearizability states that each operation must appear to take effect at some point between its invocation and response
Is it too strong for large-scale machines and distributed data structures?

34 Relaxed Consistency Model
In the Multipol project, we used a weaker notion of correctness
Each operation was divided into an invocation part and a completion part
– like put/sync and get/sync in Split-C
– e.g., E.update_mesh & E.sync_update
– e.g., T.insert_element & T.sync_insert
This worked well for several example problems
Data structures do not compose well
– need more than 2 phases in some cases

35 Correctness
An implementation is a set of histories in which events of two objects
– a representation object, REP
– an abstract object, ABS
are interleaved in a constrained way. For each history H in the implementation:
– 1) the subhistories H|REP and H|ABS are well-formed
– 2) for each process P, each REP operation in H|P lies within an abstract operation in H|P
– e.g., q.enq_inv(x), then lock(q.l), q.back++, q.dat[back]=x, unlock(q.l), then q.enq_res/OK
An implementation is correct if H|ABS is linearizable

36 Safety and Liveness Examples
Examples of safety properties
– There are no race conditions
– This queue is never empty
– One thread never accesses another thread's data
– No arithmetic operation in the program ever overflows
Examples of liveness properties
– The program eventually terminates
– The scheduler is "fair": every enabled thread eventually gets to run
– The scheduler is "fair": a thread enabled infinitely often will get to run infinitely often
– The set will eventually contain the Gröbner basis of the input

37 Proving Safety and Liveness
Techniques for proving safety properties are relatively straightforward
– sometimes many states and cases for a particular system
– extends ideas from reasoning about sequential code
Techniques for proving liveness are much more difficult
– often we want something stronger than liveness as well, such as a time bound
– some of the formal frameworks make proofs of particular liveness properties, e.g., fairness, easier
– often a liveness property is ensured using a stronger safety property

38 Methods for Proving Correctness (Safety)
In serial implementations, the subset of REP values that are legal representations of ABS values are those that satisfy the representation invariant
– I: REP -> BOOL, a predicate on REP values
The meaning of a legal representation is defined by an abstraction function
– A: REP -> ABS
In sequential programs, the abstract operations may go through a set of states in which the invariant I is not true

39 Existence of Abstraction Functions
Three cases arise in trying to prove that an abstraction function exists in a concurrent system:
– The function can be defined directly on the REP state
– A history variable is needed to record a past event
– A prophecy variable is needed to record a future event
(Alternatively, you may use an abstraction relation.)
(Slide shows REP states t and t' mapping to abstract values A(t) and A(t').)

40 Example 1: Locked Queue
Given a queue implementation containing
– integers back, front
– an array of values, items
– a lock, l

Enq = proc (q: queue, x: item)    // ignoring buffer overflow
  lock(q.l)
  i: int = q.back++               // allocate new slot
  q.items[i] = x                  // fill it
  unlock(q.l)

Deq = proc (q: queue) returns (item) signals empty
  lock(q.l)
  if (q.back == q.front)
    unlock(q.l)                   // release the lock before signaling
    signal empty
  else
    q.front++
    ret: item = q.items[q.front]
    unlock(q.l)
    return(ret)

41 Simple Abstraction Function
The abstraction function maps the elements in items[front…back] to the abstract queue value
The proof is straightforward
The lock prevents the "interesting" cases

42 Example 2: Statistical DB Type
Given a "statistical DB" spec with the operations
– add(x): add a new number, x, to the DB
– size(): report the number of elements in the DB
– mean(): report the mean of all elements in the DB
– variance(): report the variance of all elements in the DB
A straightforward implementation keeps the set of values
A more compact one uses only three values:
– integer count, initially 0      // number of elements
– float sum, initially 0          // sum of elements
– float sumSquare, initially 0    // sum of squares of all elements
Implementations of the operations are straightforward
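
A sketch of the compact implementation, using the identity variance = E[x^2] - E[x]^2 (the slide leaves the operation bodies implicit):

class StatDB:
    """Compact statistical DB keeping only count, sum, and sumSquare."""

    def __init__(self):
        self.count = 0
        self.sum = 0.0
        self.sumSquare = 0.0

    def add(self, x):
        self.count += 1
        self.sum += x
        self.sumSquare += x * x

    def size(self):
        return self.count

    def mean(self):
        return self.sum / self.count

    def variance(self):
        m = self.mean()
        return self.sumSquare / self.count - m * m

db = StatDB()
for x in (2.0, 4.0, 6.0):
    db.add(x)
assert db.mean() == 4.0
assert abs(db.variance() - 8 / 3) < 1e-9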

43 Need for History Variables
The problem with verifying this example is that the specification contains more information than the implementation
Proof idea:
– add a variable items to the representation state (for the proof only)
– the implementation may update items
– it may not use items to compute operations
The abstraction function, A, for the augmented DB simply maps the items field to the abstract state
Need to prove the representation invariants:
– count = items.size
– sum = sum({x | x in items})
– sumSquare = sum({x^2 | x in items})

44 Example 3: Queue Implementation
Given a queue implementation containing
– an integer, back
– an array of values, items

Enq = proc (q: queue, x: item)     // ignoring buffer overflow
  i: int = INC(q.back)             // allocate new slot, atomic
  STORE(q.items[i], x)             // fill it

Deq = proc (q: queue) returns (item)
  while true do
    range: int = READ(q.back) - 1
    for i: int in 1.. range do
      x: item = SWAP(q.items[i], null)
      if x != null then return(x)

45 Queue Example Notes
Several atomic operations are assumed here
– STORE, SWAP, INC
– These may or may not be supported on given hardware, which would change the proof
The Deq operation starts at the front end of the queue
– slots already dequeued will show up as nulls
– slots not yet filled will also be nulls
– it picks the first non-empty slot
– it repeats the scan until it finds an element, waiting for an enqueue to happen if necessary
There are many inefficiencies, such as the lack of a head pointer; the example is meant to illustrate the proof technique

46 Need for Prophecy Variables
Representation invariant
– to be useful, it must be true at every point in an execution, i.e., at the points observable by other concurrent operations
– the sequential case was much weaker
Abstraction function: consider the execution
  Enq(x) A
  Enq(y) B
  INC(q.back) A
  OK(1) A
  INC(q.back) B
  OK(2) B
  STORE(q.items[2], y) B
  OK() B
For this execution, there is no way of defining an abstraction function without predicting the future, i.e., whether x or y will be dequeued first

47 Abstraction Relation
An alternate approach to using history and prophecy variables is to use an abstraction relation, rather than a function
Queue example:
  History        Linearized values
  (start)        {[]}
  Enq(x) A       {[], [x]}
  Enq(y) B       {[], [x], [y], [x,y], [y,x]}
  OK() B         {[y], [x,y], [y,x]}
  OK() A         {[x,y], [y,x]}
  Deq() C        {[x], [y], [x,y], [y,x]}
  OK(x) C        {[y]}

48 Key Idea in Queue Proof
Lemma: If Enq(x), Enq(y), Deq(x), and Deq(y) are complete operations of H such that x's Enq precedes y's Enq, then y's Deq does not precede x's Deq.
Proof: Suppose this is not true. Pick a linearization and let q_i and q_j be the queue values following the Deq operations of x and y respectively. From the assumption that j < i, q_{j-1} = [y, …, x, …], which implies that y is enqueued before x, a contradiction.

49 Relaxed Consistency Model
In the Multipol project, we used a weaker notion of correctness
Each operation was divided into an invocation part and a completion part
– like put/sync and get/sync in Split-C
– e.g., E.update_mesh & E.sync_update
– e.g., T.insert_element & T.sync_insert
This worked well for several example problems
Data structures do not compose well
– need more than 2 phases in some cases

50 Generalizing Linearizability to Relaxed Consistency
The generalization to split-phase operations is straightforward
Each logical operation has:
– op_start
– op_complete
In the formal model, each of these has invocation and response events (like other operations)
The order <H is defined by one operation's op_complete finishing before another operation's op_start begins
Informally, the operation must appear to take place atomically between its two phases
Additional generalization: different processes may see different total orders

