CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Set 12: Causality
CSCE 668, Fall 2011, Prof. Jennifer Welch

Logical Clocks Motivation
In an asynchronous system, we often cannot tell which of two events occurred before the other.
[Figure: Example A and Example B, each with processors p0 and p1 and messages m0 and m1.]
In Example A, processors cannot tell which message was sent first. Probably not important.
In Example B, processors can tell which message was sent first. Might be important.
Let's try to determine relative ordering of some (not all) events.

Happens Before Partial Order
Given an execution, computation event a happens before computation event b, denoted a → b, if
1. a and b occur at the same processor and a precedes b, or
2. a results in sending m and b includes receipt of m, or
3. there exists a computation event c such that a → c and c → b (transitive closure).

Happens Before Partial Order
Happens before means that information can flow from a to b, i.e., that a might cause b.
[Figure: events a, b, c, d at processors p0 and p1 with messages m0 and m1. Relations shown: a → b, b → c, c → d, a → c, a → d, b → d.]

Concurrent Events
If a does not happen before b, and b does not happen before a, then a and b are concurrent, denoted a || b.

Happens Before Example
Rule 1: a → b, c → d → e → f, g → h → i
Rule 2: a → d, g → e, f → i
Rule 3: a → e, c → i, …
Concurrent: h || e, …

Logical Clocks
Logical clocks are values assigned to events to provide some information about the order in which events happen.
The goal is to assign an integer L(e) to each computation event e in an execution such that if a → b, then L(a) < L(b).

Logical Timestamps Algorithm
- Each pi keeps a counter (logical timestamp) Li, initially 0.
- Every message that pi sends is timestamped with the current value of Li.
- At each step, pi increases Li so that it is greater than both its previous value and the timestamps on all messages received at this step.
- If a is an event at pi, then assign L(a) to be the value of Li at the end of a.
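The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LamportProcess` and the method `step` are made up for this sketch, the counter is advanced by taking the maximum and adding 1 (the smallest choice the rule allows), and the event layout (a, b at p0; c, d, e, f at p1; g, h, i at p2; messages from a to d, g to e, and f to i) is inferred from Rules 1 and 2 of the Happens Before Example.

```python
from dataclasses import dataclass


@dataclass
class LamportProcess:
    pid: int
    clock: int = 0  # L_i, initially 0

    def step(self, received=()):
        """Take one computation step.

        `received` holds the timestamps of the messages delivered in this step.
        The returned value is the timestamp of the event; it is also the
        timestamp carried by any message sent in this step.
        """
        # max + 1 is one way to exceed the old value and every received timestamp
        self.clock = max([self.clock, *received]) + 1
        return self.clock


p0, p1, p2 = (LamportProcess(i) for i in range(3))
L = {}
L["a"] = p0.step()
L["b"] = p0.step()
L["c"] = p1.step()
L["d"] = p1.step([L["a"]])   # d receives the message sent at a
L["g"] = p2.step()
L["e"] = p1.step([L["g"]])   # e receives the message sent at g
L["f"] = p1.step()
L["h"] = p2.step()
L["i"] = p2.step([L["f"]])   # i receives the message sent at f
print(L)
```

Under these assumptions the computed values agree with the ones quoted in the example slide, such as L(a) = 1, L(e) = 3, L(f) = 4 and L(i) = 5.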

Logical Timestamps Example
[Figure: the example execution with logical timestamps 1 through 5 on its events.]
a → b : L(a) = 1 < 2 = L(b)
f → i : L(f) = 4 < 5 = L(i)
a → e : L(a) = 1 < 3 = L(e)
etc.

Getting a Total Order
If a total order is required, break ties using ids.
In the example, L(a) = (1,0), L(c) = (1,1), etc.
Timestamps are ordered lexicographically.
In the example, L(a) < L(c).
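As a small illustration of the tie-breaking rule, Python tuples already compare lexicographically, so (timestamp, id) pairs can be sorted directly. The pairs for a and c come from the slide; the pairs for g, b and d are assumptions derived from the logical timestamps and the processor ids of the running example.

```python
# (logical timestamp, processor id) pairs
stamps = {
    "a": (1, 0),  # from the slide
    "c": (1, 1),  # from the slide
    "g": (1, 2),  # assumed: L(g) = 1 at p2
    "b": (2, 0),  # assumed: L(b) = 2 at p0
    "d": (2, 1),  # assumed: L(d) = 2 at p1
}

# Tuples are compared component by component, so ties on the
# timestamp are broken by the processor id.
order = sorted(stamps, key=stamps.get)
print(order)
```

Any two distinct events receive distinct pairs, because two events at the same processor never share a timestamp.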

Drawback of Logical Clocks
a → b implies L(a) < L(b), but L(a) < L(b) does not necessarily imply a → b.
In the previous example, L(g) = 1 and L(b) = 2, but g does not happen before b.
The reason is that "happens before" is a partial order, but logical clock values are integers, which are totally ordered.

Vector Clocks
Generalize logical clocks to provide non-causality information as well as causality information.
Implement with values drawn from a partially ordered set instead of a totally ordered set.
Assign a value V(e) to each computation event e in an execution such that a → b if and only if V(a) < V(b).

Vector Timestamps Algorithm
- Each pi keeps an n-vector Vi, initially all 0's.
- Entry j in Vi is pi's estimate of how many steps pj has taken.
- Every msg pi sends is timestamped with the current value of Vi.
- At every step, increment Vi[i] by 1.
- When receiving a message with vector timestamp T, update Vi's components j ≠ i so that Vi[j] = max(T[j], Vi[j]).
- If a is an event at pi, then assign V(a) to be the value of Vi at the end of a.
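A minimal Python sketch of these rules follows. The class name `VectorProcess` and the method `step` are hypothetical, and the event layout (a, b at p0; c, d, e, f at p1; g, h, i at p2; messages from a to d, g to e, and f to i) is an assumption inferred from the Happens Before Example.

```python
class VectorProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n  # V_i, initially all 0's

    def step(self, received=()):
        """Take one computation step.

        `received` holds the vector timestamps of the messages delivered in
        this step. Returns V(e) for the event, which is also the timestamp of
        any message sent in this step.
        """
        self.v[self.pid] += 1                 # increment own entry at every step
        for t in received:
            for j, tj in enumerate(t):
                if j != self.pid:             # merge only components j != i
                    self.v[j] = max(self.v[j], tj)
        return tuple(self.v)


p0, p1, p2 = (VectorProcess(i, 3) for i in range(3))
V = {}
V["a"] = p0.step()
V["b"] = p0.step()
V["c"] = p1.step()
V["d"] = p1.step([V["a"]])
V["g"] = p2.step()
V["e"] = p1.step([V["g"]])
V["f"] = p1.step()
V["h"] = p2.step()
V["i"] = p2.step([V["f"]])
print(V)
```

With this layout the sketch gives V(g) = (0,0,1) and V(b) = (2,0,0), the two values the example slide compares.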

Manipulating Vector Timestamps
Let V and W be two n-vectors of integers.
- Equality: V = W iff V[i] = W[i] for all i. Example: (3,2,4) = (3,2,4)
- Less than or equal: V ≤ W iff V[i] ≤ W[i] for all i. Example: (2,2,3) ≤ (3,2,4) and (3,2,4) ≤ (3,2,4)
- Less than: V < W iff V ≤ W but V ≠ W. Example: (2,2,3) < (3,2,4)
- Incomparable: V || W iff !(V ≤ W) and !(W ≤ V). Example: (3,2,4) || (4,1,4)

Manipulating Vector Timestamps
The partial order on n-vectors just defined is not the same as lexicographic ordering.
Lexicographic ordering is a total order on vectors.
Consider (3,2,4) vs. (4,1,4) in the two approaches.
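The comparison rules for vector timestamps, and their difference from lexicographic order, can be checked with a short Python sketch. The function names `leq`, `lt` and `concurrent` are chosen here for illustration; the vectors are the ones used in the examples above.

```python
def leq(v, w):
    """V <= W iff every component of V is <= the matching component of W."""
    return all(x <= y for x, y in zip(v, w))


def lt(v, w):
    """V < W iff V <= W but V != W."""
    return leq(v, w) and v != w


def concurrent(v, w):
    """V || W iff neither V <= W nor W <= V."""
    return not leq(v, w) and not leq(w, v)


print(leq((2, 2, 3), (3, 2, 4)))         # componentwise comparison
print(concurrent((3, 2, 4), (4, 1, 4)))  # incomparable in the partial order
print((3, 2, 4) < (4, 1, 4))             # Python's built-in tuple order is lexicographic
```

The last line shows the contrast: the built-in tuple comparison orders (3,2,4) before (4,1,4), while the partial order leaves the two vectors incomparable.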

Vector Timestamps Example
[Figure: the example execution labeled with vector timestamps (1,0,0), (2,0,0), (0,1,0), (1,2,0), (1,3,1), (1,4,1), (0,0,1), (0,0,2), (1,4,3).]
V(g) = (0,0,1) and V(b) = (2,0,0), which are incomparable.
Compare with logical clocks L(g) = 1 and L(b) = 2.

Correctness of Vector Timestamps
Theorem (6.5 & 6.6): Vector timestamps implement vector clocks.
Proof: First, show a → b implies V(a) < V(b).
Case 1: a and b both occur at pi, a first. Since Vi increases at each step, V(a) < V(b).

Correctness of Vector Timestamps
Case 2: a occurs at pi and causes m to be sent, while b occurs at pj and includes the receipt of m.
During b, pj updates its vector timestamp in such a way that V(a) ≤ V(b).
pi's estimate of the number of steps taken by pj is never an over-estimate.
Since m is not received before it is sent, pi's estimate of the number of steps taken by pj when a occurs is less than the number of steps taken by pj when b occurs.
So V(a)[j] < V(b)[j]. Thus V(a) < V(b).

Correctness of Vector Timestamps
Case 3: There exists c such that a → c and c → b. By induction (from Cases 1 and 2) and transitivity of <, V(a) < V(b).
Next, show V(a) < V(b) implies a → b.
This is equivalent to showing that !(a → b) implies !(V(a) < V(b)).

Correctness of Vector Timestamps
Suppose a occurs at pi, b occurs at pj, and a does not happen before b.
Let V(a)[i] = k.
Since a does not happen before b, there is no chain of messages from pi to pj originating at pi's k-th step or later and ending at pj before b.
Thus V(b)[i] < k = V(a)[i].
Thus !(V(a) < V(b)).

Size of Vector Timestamps
Vector timestamps are big:
- n components in each one
- values in the components grow without bound
Is there a more efficient way to implement vector clocks?
Answer is NO, at least under some conditions.

Vector Clock Size Lower Bound
Theorem (6.9): Any implementation of vector clocks using vectors of real numbers requires vectors of length n (number of processors).
Proof: For any value of n, consider this execution:
[Figure: the execution, not reproduced in this transcript.]

Example Bad Execution
For n = 4:
[Figure: the bad execution for n = 4, not reproduced in this transcript.]

Vector Clock Size Lower Bound
Claim 1: a_{i+1} || b_i for all i (with wraparound).
Proof: Since each proc. does all sends before any receives, there is no transitivity. Also p_{i+1} does not send to p_i.
Claim 2: a_{i+1} → b_j for all j ≠ i.
Proof: If j = i+1, obvious. If j ≠ i+1, then p_{i+1} sends to p_j.
[Figure: the message from p_{i+1} to p_j.]
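The two claims can be checked mechanically on a concrete execution. The sketch below is an illustration under assumptions, because the figure defining the execution is not available here: each p_i is assumed to send one message to every processor except itself and p_{i-1}, then to receive the messages addressed to it one per step; a_i is taken to be p_i's first send event and b_i its last receive event. Vector timestamps are used as the test for "happens before", which relies on the correctness theorem above. The function names are hypothetical.

```python
def leq(v, w):
    return all(x <= y for x, y in zip(v, w))


def lt(v, w):
    return leq(v, w) and v != w


def concurrent(v, w):
    return not leq(v, w) and not leq(w, v)


def bad_execution(n):
    """Return (a, b): vector timestamps of the assumed events a_i and b_i."""
    if n < 3:
        raise ValueError("need n >= 3 so that every processor sends and receives")
    clocks = [[0] * n for _ in range(n)]
    inbox = [[] for _ in range(n)]   # timestamps of messages addressed to each processor
    a = [None] * n
    b = [None] * n
    # Phase 1: every processor performs all of its sends, one per step.
    for i in range(n):
        for j in range(n):
            if j in (i, (i - 1) % n):
                continue             # p_i sends neither to itself nor to p_{i-1}
            clocks[i][i] += 1
            stamp = tuple(clocks[i])
            if a[i] is None:
                a[i] = stamp         # a_i: first send event of p_i
            inbox[j].append(stamp)
    # Phase 2: every processor receives its messages, one per step.
    for i in range(n):
        for stamp in inbox[i]:
            clocks[i][i] += 1
            for j in range(n):
                if j != i:
                    clocks[i][j] = max(clocks[i][j], stamp[j])
        b[i] = tuple(clocks[i])      # b_i: last receive event of p_i
    return a, b


def claims_hold(n):
    a, b = bad_execution(n)
    claim1 = all(concurrent(a[(i + 1) % n], b[i]) for i in range(n))
    claim2 = all(lt(a[(i + 1) % n], b[j])
                 for i in range(n) for j in range(n) if j != i)
    return claim1 and claim2


print(claims_hold(4))
```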

Vector Clock Size Lower Bound
Suppose, for contradiction, that there is a way to implement vector clocks with k-vectors of reals, where k < n.
By Claim 1, a_{i+1} || b_i
=> V(a_{i+1}) and V(b_i) are incomparable
=> V(a_{i+1}) is larger than V(b_i) in some coordinate h(i)
=> this defines a function h : {0,…,n-1} → {0,…,k-1}

Vector Clock Size Lower Bound
Since k < n, the function h is not 1-1. So there exist distinct i and j such that h(i) = h(j). Let r be this common value of h.
[Figure: for each i, V(a_{i+1}) is larger than V(b_i) in component h(i). Two of these components are the same, say h(i) = h(j) = r.]

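Vector Clock Size Lower Bound
[Figure: the chain of comparisons in component r. V(a_{j+1}) is larger than V(b_j) in component r; V(a_{i+1}) is ≤ V(b_j) in all components, since a_{i+1} → b_j; V(a_{i+1}) is larger than V(b_i) in component r. So V(a_{j+1}) is larger than V(b_i) in component r, which contradicts a_{j+1} → b_i.]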

Vector Clock Size Lower Bound
So V(a_{i+1}) is larger than V(b_i) in coordinate r, and V(a_{j+1}) is larger than V(b_j) in coordinate r also.
V(a_{j+1})[r] > V(b_j)[r]      by def. of r
              ≥ V(a_{i+1})[r]  by Claim 2 (a_{i+1} → b_j) and correctness of V
              > V(b_i)[r]      by def. of r
Thus !(V(a_{j+1}) < V(b_i)), contradicting Claim 2 (a_{j+1} → b_i) and the assumed correctness of V.

Application of Causality: Consistent Cuts
Consider an asynchronous message passing system with
- FIFO message delivery per channel
- at most one msg received per computation step
Number the computation steps of each processor 1,2,3,…
A cut of an execution is K = (k0,…,kn-1), where ki indicates the number of computation steps taken by pi.

Consistent Cuts
In a consistent cut K = (k0,…,kn-1), if step s of pj happens before step ki of pi, then s ≤ kj.
[Figure: an execution of p0 and p1 with some cuts marked.]
(1,3) and (2,4) are consistent.
(3,6) is inconsistent: step 4 by p0 happens before step 6 of p1, but 4 is greater than 3.
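One way to test a cut is to use the standard equivalent formulation of the definition: a cut is consistent exactly when every message received inside the cut was also sent inside the cut. The sketch below assumes that formulation. Since the figure is not reproduced, the execution is a hypothetical one with a single message, sent at step 4 of p0 and received at step 6 of p1, chosen to agree with the statements on the slide; the function name `is_consistent` is also an assumption.

```python
def is_consistent(cut, messages):
    """Check a cut against a list of messages.

    cut[i] is the number of steps of p_i included in the cut.
    Each message is (sender, send_step, receiver, recv_step).
    """
    return all(
        send_step <= cut[sender]
        for sender, send_step, receiver, recv_step in messages
        if recv_step <= cut[receiver]   # only messages received inside the cut matter
    )


# Hypothetical execution: p0's step 4 sends a message that p1 receives in step 6.
messages = [(0, 4, 1, 6)]

for cut in [(1, 3), (2, 4), (3, 6)]:
    print(cut, is_consistent(cut, messages))
```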

Finding a Recent Consistent Cut
Problem Version 1: Processors are all given a cut K and must find a maximal consistent cut that is ≤ K.
Application: Logging-based crash recovery.
- Procs periodically write their state to stable storage.
- When a proc recovers from a crash, it tries to recover to the latest logged state, but needs to coordinate with other procs.

Vector Clocks Solution
Implement vector clocks using vector timestamps appended to application msgs.
Store the vector clock of each computation step in a local array store[1,…].
When pi is given input cut K:
    for x := K[i] downto 1 do
        if store[x] ≤ K then return x
    return x    (entry for pi of global answer)
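The per-processor search can be sketched in Python as follows. The function name `recent_cut_entry`, the use of list index 0 as an unused placeholder, and the return value 0 when no stored step fits are assumptions of this sketch. The execution is hypothetical: p0 takes 4 steps, p1 takes 6 steps, and a message sent at step 4 of p0 is received at step 6 of p1.

```python
def leq(v, w):
    """Componentwise V <= W on vector timestamps."""
    return all(x <= y for x, y in zip(v, w))


def recent_cut_entry(store, i, K):
    """Return p_i's entry of a maximal consistent cut that is <= K.

    store[x] is the vector clock of p_i's step x (index 0 is unused).
    """
    for x in range(K[i], 0, -1):
        if leq(store[x], K):
            return x
    return 0  # assumption: fall back to the initial state if no step fits


# Vector clocks of the hypothetical execution, one array per processor.
store0 = [None] + [(x, 0) for x in range(1, 5)]
store1 = [None] + [(0, x) for x in range(1, 6)] + [(4, 6)]

K = (3, 6)
answer = (recent_cut_entry(store0, 0, K), recent_cut_entry(store1, 1, K))
print(answer)
```

In this execution step 6 of p1 depends on step 4 of p0, which lies outside K, so p1 backs up by one step while p0 keeps its entry.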

What About Channel State?
Processor states are not sufficient to capture the entire system state.
Messages in transit must be calculated.
The solution here requires
- additional storage (number of messages)
- additional computation at recovery time (involving replaying the original execution to capture messages sent but not received)

Another Take on Recent Consistent State
Problem Version 2: A subset of procs initiate (at arbitrary times) trying to find a consistent cut that includes the state of at least one of the initiators when it started.
Called a distributed snapshot.
Snapshot info can be collected at one proc. and then analyzed.
Application: termination detection

Marker Algorithm
Instead of adding extra information on each application message, insert control messages ("markers") into the channels.
Code for pi:
    initially answer = -1 and num = 0
    when application msg arrives:
        num++; do application action
    when marker arrives or when initiating snapshot:
        if answer = -1 then
            answer := num    // pi's part of final answer
            send marker to all neighbors

What About Channel States?
pi records the sequence of msgs received from pj between the time pi records its answer and the time pi gets the marker from pj.
These are the msgs in transit from pj to pi in the cut returned by the algorithm.
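The marker rule and the channel recording described above can be sketched together as a small single-threaded simulation. Everything named here (`Node`, `record`, `deliver`, `in_transit`, the two-processor scenario and the messages "x" and "y") is an assumption made for illustration. FIFO channels are modeled with `collections.deque`.

```python
from collections import deque

MARKER = object()  # control message, distinct from any application message


class Node:
    def __init__(self, pid, neighbors, channels):
        self.pid, self.neighbors, self.channels = pid, neighbors, channels
        self.num = 0                  # application msgs received so far
        self.answer = -1              # p_i's part of the final answer
        self.marker_seen = set()      # neighbors whose marker has arrived
        self.in_transit = {q: [] for q in neighbors}  # recorded channel states

    def send(self, dst, msg):
        self.channels[(self.pid, dst)].append(msg)

    def record(self):
        """Run on initiating a snapshot or on a marker arrival."""
        if self.answer == -1:
            self.answer = self.num
            for q in self.neighbors:
                self.send(q, MARKER)

    def deliver(self, src):
        """Receive the next message on the FIFO channel from src."""
        msg = self.channels[(src, self.pid)].popleft()
        if msg is MARKER:
            self.record()
            self.marker_seen.add(src)
        else:
            self.num += 1
            # Received after recording but before src's marker: in transit.
            if self.answer != -1 and src not in self.marker_seen:
                self.in_transit[src].append(msg)


channels = {(0, 1): deque(), (1, 0): deque()}
p0 = Node(0, [1], channels)
p1 = Node(1, [0], channels)

p0.send(1, "x")    # application message from p0 to p1
p1.send(0, "y")    # application message from p1 to p0
p0.record()        # p0 initiates the snapshot
p0.deliver(1)      # p0 receives "y" after recording its answer
p1.deliver(0)      # p1 receives "x"
p1.deliver(0)      # p1 receives the marker and records its answer
p0.deliver(1)      # p0 receives p1's marker

print((p0.answer, p1.answer), p0.in_transit, p1.in_transit)
```

In this run p0 records its answer before receiving anything, so "y" is recorded as in transit from p1 to p0, while "x" reaches p1 before the marker and is counted in p1's answer.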