CPSC 668 Set 16: Distributed Shared Memory. CPSC 668 Distributed Algorithms and Systems, Fall 2006, Prof. Jennifer Welch.

Presentation transcript:

Slide 1: CPSC 668 Distributed Algorithms and Systems, Fall 2006, Prof. Jennifer Welch

Slide 2: Distributed Shared Memory
A model for inter-process communication.
Provides the illusion of shared variables on top of message passing.
Shared memory is often considered a more convenient programming platform than message passing.
Formally, give a simulation of the shared memory model on top of the message passing model.
We'll consider the special case of
  – no failures
  – only read/write variables to be simulated

Slide 3: Shared Memory Issues
A process will invoke a shared memory operation at some time.
The simulation algorithm running on the same node will execute some code, possibly involving exchanges of messages.
Eventually the simulation algorithm will inform the process of the result of the shared memory operation.
So shared memory operations are not instantaneous!
  – Operations (invoked by different processes) can overlap
What should be returned by operations that overlap other operations?
  – defined by a memory consistency condition

Slide 4: Sequential Specifications
Each shared object has a sequential specification: it specifies the behavior of the object in the absence of concurrency.
The object supports operations
  – invocations
  – matching responses
The specification is the set of sequences of operations that are legal.

Slide 5: Sequential Spec for R/W Registers
Operations are reads and writes.
Invocations are read_i(X) and write_i(X,v).
Responses are return_i(X,v) and ack_i(X).
A sequence of operations is legal iff each read returns the value of the latest preceding write.
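The legality condition can be checked mechanically. The Python sketch below is illustrative only: the tuple encoding of operations and the name `is_legal` are assumptions made for this example, and a read with no preceding write is assumed to return the initial value 0 (the initial value used in the later examples).

```python
from typing import List, Tuple

# An operation is (kind, variable, value), e.g. ("write", "X", 1).
# For a read, `value` is the value the read returned.
Op = Tuple[str, str, int]


def is_legal(ops: List[Op], initial: int = 0) -> bool:
    """True iff each read returns the value of the latest preceding write
    to the same variable (or `initial` if no write precedes it)."""
    memory = {}
    for kind, var, value in ops:
        if kind == "write":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True


legal_example = is_legal([("write", "X", 1), ("read", "X", 1)])
illegal_example = is_legal([("write", "X", 1), ("write", "X", 2), ("read", "X", 1)])
```

Here `legal_example` is a sequence in which the read sees the latest write, and `illegal_example` is one in which the read returns an overwritten value.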

Slide 6: Memory Consistency Conditions
Consistency conditions tie together the sequential specification with what happens in the presence of concurrency.
We will study two well-known conditions:
  – linearizability
  – sequential consistency
We will only consider read/write registers, in the absence of failures.

Slide 7: Definition of Linearizability
Suppose σ is a sequence of invocations and responses.
  – an invocation is not necessarily immediately followed by its matching response
σ is linearizable if there exists a permutation π of all the operations in σ (in π, each invocation is immediately followed by its matching response) s.t.
  – π|X is legal (satisfies the sequential spec) for all X, and
  – if the response of operation O_1 occurs in σ before the invocation of operation O_2, then O_1 occurs in π before O_2 (π respects the real-time order of non-concurrent operations in σ).

Slide 8: Linearizability Examples
Suppose there are two shared variables, X and Y, both initially 0.
[Figure: timelines for p_0 and p_1.
  p_0: write(X,1) ack(X), then read(Y) return(Y,1)
  p_1: write(Y,1) ack(Y), then read(X) return(X,1)]
Is this sequence linearizable? Yes (the green triangles in the figure mark the linearization order).
What if p_1's read returns 0? No (see the arrow in the figure).
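The definition can be tested by brute force on small histories. The sketch below is not part of the slides: the `Op` record, the function names, and in particular the invocation and response times are assumptions, chosen so that p_0's write of X finishes before p_1's read of X begins (the exact timing of the figure is not recoverable from the transcript).

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    proc: int
    kind: str    # "read" or "write"
    var: str
    value: int   # value written, or value returned by the read
    inv: float   # assumed real time of the invocation
    resp: float  # assumed real time of the response


def is_legal(order, initial=0):
    memory = {}
    for op in order:
        if op.kind == "write":
            memory[op.var] = op.value
        elif memory.get(op.var, initial) != op.value:
            return False
    return True


def respects_real_time(order):
    # Violation: an operation placed later in the permutation had already
    # responded before an operation placed earlier was invoked.
    return not any(order[j].resp < order[i].inv
                   for i in range(len(order))
                   for j in range(i + 1, len(order)))


def is_linearizable(history):
    return any(is_legal(p) and respects_real_time(p)
               for p in permutations(history))


def example(read_x_value):
    return [
        Op(0, "write", "X", 1, inv=0, resp=2),
        Op(1, "write", "Y", 1, inv=1, resp=3),
        Op(1, "read", "X", read_x_value, inv=5, resp=7),
        Op(0, "read", "Y", 1, inv=6, resp=8),
    ]


first_case = is_linearizable(example(read_x_value=1))
second_case = is_linearizable(example(read_x_value=0))
```

Under these assumed times, `first_case` corresponds to the "Yes" answer and `second_case` to the "No" answer. The search is exponential in the number of operations, so it is only suitable for tiny examples.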

Slide 9: Definition of Sequential Consistency
Suppose σ is a sequence of invocations and responses.
σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.
  – π|X is legal (satisfies the sequential spec) for all X, and
  – if the response of operation O_1 occurs in σ before the invocation of operation O_2 at the same process, then O_1 occurs in π before O_2 (π respects the real-time order of operations by the same process in σ).

Slide 10: Sequential Consistency Examples
Suppose there are two shared variables, X and Y, both initially 0.
[Figure: timelines for p_0 and p_1.
  p_0: write(X,1) ack(X), then read(Y) return(Y,1)
  p_1: write(Y,1) ack(Y), then read(X) return(X,0)]
Is this sequence sequentially consistent? Yes (the green numbers in the figure give the permutation).
What if p_0's read returns 0? No (see the arrows in the figure).
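Sequential consistency depends only on each process's own order of operations, so both cases of this example can be checked without any timing information. The following sketch searches all interleavings that preserve per-process order; the encoding and the function name are assumptions made for illustration.

```python
def is_sequentially_consistent(per_proc, initial=0):
    """per_proc[i] is the list of (kind, var, value) operations of p_i,
    in the order p_i performed them."""

    def search(positions, memory):
        if all(pos == len(seq) for pos, seq in zip(positions, per_proc)):
            return True  # every operation has been placed legally
        for p, seq in enumerate(per_proc):
            pos = positions[p]
            if pos == len(seq):
                continue
            kind, var, value = seq[pos]
            nxt = positions[:p] + (pos + 1,) + positions[p + 1:]
            if kind == "write":
                if search(nxt, {**memory, var: value}):
                    return True
            elif memory.get(var, initial) == value:
                if search(nxt, memory):
                    return True
        return False

    return search(tuple(0 for _ in per_proc), {})


# The history on the slide: p_0 reads Y = 1, p_1 reads X = 0.
slide_history = [
    [("write", "X", 1), ("read", "Y", 1)],
    [("write", "Y", 1), ("read", "X", 0)],
]
# The variant in which p_0's read returns 0.
variant = [
    [("write", "X", 1), ("read", "Y", 0)],
    [("write", "Y", 1), ("read", "X", 0)],
]
slide_is_sc = is_sequentially_consistent(slide_history)
variant_is_sc = is_sequentially_consistent(variant)
```

One witness for the first history is write(Y,1), read(X,0), write(X,1), read(Y,1). In the variant, each read must precede the other process's write while following its own process's write, which is a cycle, so no permutation exists.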

Slide 11: Specification of Linearizable Shared Memory Comm. System
Inputs are invocations on the shared objects.
Outputs are responses from the shared objects.
A sequence σ is in the allowable set iff
  – Correct Interaction: each proc. alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Linearizability: σ is linearizable

Slide 12: Specification of Sequentially Consistent Shared Memory
Inputs are invocations on the shared objects.
Outputs are responses from the shared objects.
A sequence σ is in the allowable set iff
  – Correct Interaction: each proc. alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Sequential Consistency: σ is sequentially consistent

Slide 13: Algorithm to Implement Linearizable Shared Memory
Uses totally ordered broadcast as the underlying communication system.
Each proc keeps a replica for each shared variable.
When read request arrives:
  – send bcast msg containing request
  – when own bcast msg arrives, return value in local replica
When write request arrives:
  – send bcast msg containing request
  – upon receipt, each proc updates its replica's value
  – when own bcast msg arrives, respond with ack
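The steps above can be sketched as a small single-threaded simulation. Everything here is a simplifying assumption rather than the course's implementation: `TotalOrderBroadcast` is a toy in which the order of `send` calls is the total order, each process pulls messages with `recv`, operations run one at a time, and variables are assumed to start at 0.

```python
class TotalOrderBroadcast:
    """Toy totally ordered broadcast: a single global log, delivered to
    every process in log order (hypothetical helper for this sketch)."""

    def __init__(self, n):
        self.log = []
        self.cursor = [0] * n  # next log index to deliver to each process

    def send(self, msg):
        self.log.append(msg)

    def recv(self, pid):
        msg = self.log[self.cursor[pid]]  # IndexError if nothing is pending
        self.cursor[pid] += 1
        return msg


class LinearizableProcess:
    def __init__(self, pid, tob):
        self.pid, self.tob = pid, tob
        self.replica = {}  # local copy of every shared variable

    def _bcast_and_wait(self, msg):
        self.tob.send(msg)
        while True:
            sender, kind, var, value = self.tob.recv(self.pid)
            if kind == "write":
                self.replica[var] = value  # every receiver applies writes
            if sender == self.pid:
                return  # our own broadcast has arrived

    def write(self, var, value):
        self._bcast_and_wait((self.pid, "write", var, value))
        return "ack"

    def read(self, var):
        self._bcast_and_wait((self.pid, "read", var, None))
        return self.replica.get(var, 0)


tob = TotalOrderBroadcast(3)
p0, p1, p2 = (LinearizableProcess(i, tob) for i in range(3))
ack = p0.write("X", 1)
r1 = p1.read("X")
r2 = p2.read("X")  # p2 must receive p0's write before its own read message
```

Because a read waits for its own broadcast, p_2 cannot answer until it has applied every write ordered before that broadcast, so both reads return the written value.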

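Slide 14: The Simulation
[Figure: layered architecture. The users of read/write shared memory send read/write invocations to, and receive return/ack responses from, the simulation processes alg_0, …, alg_{n-1}. Each alg_i uses to-bc-send and to-bc-recv to communicate with the Totally Ordered Broadcast layer. The alg_i processes together with Totally Ordered Broadcast form the Shared Memory.]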

Slide 15: Correctness of Linearizability Algorithm
Consider any admissible execution α of the algorithm
  – underlying totally ordered broadcast behaves properly
  – users interact properly
Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.

Slide 16: Correctness of Linearizability Algorithm
Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.
Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
  – π is legal: because all the operations are consistently ordered by the TO bcast.
  – π respects real-time order of operations: if O_1 finishes before O_2 begins, O_1's bcast is ordered before O_2's bcast.

Slide 17: Why is Read Bcast Needed?
The bcast done for a read causes no changes to any replicas; it just delays the response to the read.
Why is it needed?
Let's see what happens if we remove it.

Slide 18: Why Read Bcast is Needed
[Figure: timelines for p_0, p_1, p_2. A write(1) is broadcast with to-bc-send; one read returns 1 and another read returns 0.]

Slide 19: Algorithm for Sequential Consistency
The linearizability algorithm, without doing a bcast for reads:
Uses totally ordered broadcast as the underlying communication system.
Each proc keeps a replica for each shared variable.
When read request arrives:
  – immediately return the value stored in the local replica
When write request arrives:
  – send bcast msg containing request
  – upon receipt, each proc updates its replica's value
  – when own bcast msg arrives, respond with ack
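A minimal sketch of this variant follows, under the same kind of simplifying assumptions as any toy simulation: the `TotalOrderBroadcast` class, the pull-style `deliver_pending` method, and the initial value 0 are all hypothetical choices for illustration. The scenario shows why the result is sequentially consistent but not linearizable: a process that has not yet received a write still answers reads from its stale replica.

```python
class TotalOrderBroadcast:
    """Toy totally ordered broadcast: one global log, delivered in log
    order to each process (hypothetical helper for this sketch)."""

    def __init__(self, n):
        self.log = []
        self.cursor = [0] * n

    def send(self, msg):
        self.log.append(msg)

    def has_pending(self, pid):
        return self.cursor[pid] < len(self.log)

    def recv(self, pid):
        msg = self.log[self.cursor[pid]]
        self.cursor[pid] += 1
        return msg


class SCProcess:
    def __init__(self, pid, tob):
        self.pid, self.tob = pid, tob
        self.replica = {}

    def deliver_pending(self):
        # Apply every write that has been ordered but not yet received here.
        while self.tob.has_pending(self.pid):
            _, var, value = self.tob.recv(self.pid)
            self.replica[var] = value

    def read(self, var):
        # Local and immediate: no broadcast, no waiting.
        return self.replica.get(var, 0)

    def write(self, var, value):
        self.tob.send((self.pid, var, value))
        while True:  # wait for our own broadcast, applying writes on the way
            sender, v, val = self.tob.recv(self.pid)
            self.replica[v] = val
            if sender == self.pid:
                return "ack"


tob = TotalOrderBroadcast(3)
p0, p1, p2 = (SCProcess(i, tob) for i in range(3))
p0.write("X", 1)
p1.deliver_pending()      # p1 has received the write
fresh = p1.read("X")
stale = p2.read("X")      # p2 has not received it yet
```

If p_2's read starts after p_1's read has finished, returning the old value violates real-time order, yet a legal permutation that respects each process's own order still exists (place p_2's read before the write).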

Slide 20: Correctness of SC Algorithm
Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order, which preserves the per-proc. order of writes.
Lemma (9.4): If p_i writes Y and later reads X, then p_i's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

Slide 21: Correctness of the SC Algorithm (Theorem 9.5)
Why does SC hold?
Given any admissible execution α, we must come up with a permutation π of the shared memory operations that
  – is legal and
  – respects the per-proc. ordering of operations

Slide 22: The Permutation π
Insert all writes into π in their to-bcast order.
Consider each read R in α in the order of invocation:
  – suppose R is a read by p_i of X
  – place R in π immediately after the later of
    1. the operation by p_i that immediately precedes R in α, and
    2. the write that R "read from" (the write that caused the latest update of p_i's local copy of X preceding the response for R)
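The construction can be sketched directly. The representation below is an assumption made for illustration: operations are named by strings, each read is described by a hypothetical triple (name, previous operation of the same process, write it read from), and `None` stands for "no such operation" (for instance, a read of the initial value). The slide does not specify how ties are broken between reads that share an anchor, so this sketch simply inserts each read directly after its anchor.

```python
def build_permutation(writes_in_to_order, reads_in_invocation_order):
    """Build pi: writes in to-bcast order, then each read inserted
    immediately after the later of its two anchors."""
    pi = list(writes_in_to_order)
    for name, prev_op, reads_from in reads_in_invocation_order:
        anchors = [pi.index(op) for op in (prev_op, reads_from)
                   if op is not None]
        # With no anchor at all, the read goes to the front of pi.
        pi.insert(max(anchors, default=-1) + 1, name)
    return pi


# Hypothetical execution: W1 = write(X,1) by p0, W2 = write(X,2) by p1,
# ordered W1 then W2 by the broadcast.
writes = ["W1", "W2"]
reads = [
    ("R1", None, "W1"),  # p2 reads X = 1 (no earlier operation by p2)
    ("R3", "W2", "W2"),  # p1 reads X = 2 right after its own write
    ("R2", "R1", "W2"),  # p2 reads X = 2 after its earlier read R1
]
pi = build_permutation(writes, reads)
```

In the resulting order, every read directly follows or is separated only by reads from the write it read from, and each process's operations appear in the order that process issued them.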

Slide 23: Permutation Example
[Figure: timelines for p_0, p_1, p_2. The operations shown are write(1) and write(2), each with a to-bc-send and an ack, and two reads, one returning 2 and one returning 1. The permutation is given by green numbers in the figure.]

Slide 24: Permutation π Respects Per Proc. Ordering
For a specific proc:
Relative ordering of two writes is preserved by Lemma 9.3.
Relative ordering of two reads is preserved by the construction of π.
If write W precedes read R in exec. α, then W precedes R in π by construction.
Suppose read R precedes write W in α. Show the same is true in π.

Slide 25: Permutation π Respects Ordering
Suppose R and W are swapped in π:
  – There is a read R' by p_i that equals or precedes R in α
  – There is a write W' that equals W or follows W in the to-bcast order
  – And R' "reads from" W'.
But:
  – R' finishes before W starts in α, and
  – updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W
  – so R' cannot read from W'.
α|p_i: … R' … R … W …
π: … W … W' … R' … R …

Slide 26: Permutation π is Legal
Consider some read R by p_i and some write W s.t. R reads from W in α.
Suppose, in contradiction, that some other write W' falls between W and R in π:
π: … W … W' … R …
Why does R follow W' in π?

Slide 27: Permutation π is Legal
Case 1: R follows W' in π because W' is also by p_i and R follows W' in α.
The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
Thus R does not read from W, contradiction.

Slide 28: Permutation π is Legal
Case 2: R follows W' in π due to some operation O by p_i s.t.
  – O precedes R in α, and
  – O is placed between W' and R in π
π: … W … W' … O … R …
Case 2.1: O is a write.
The update for W' at p_i precedes the update for O at p_i in α (Lemma 9.3).
The update for O at p_i precedes p_i's local read for R in α (Lemma 9.4).
So R does not read from W, contradiction.

Slide 29: Permutation π is Legal
π: … W … W' … O' … O … R …
Case 2.2: O is a read.
A recursive argument shows that there exists a read O' by p_i (which might equal O) that
  – reads from W' in α and
  – appears in π between W' and O
The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
The update for W' at p_i precedes the local read for O' at p_i in α (otherwise O' would not read from W').
Recall that O' equals or precedes O (from above) and O precedes R (by assumption for Case 2) in α.
Thus R cannot read from W, contradiction.

Slide 30: Performance of SC Algorithm
Read operations are implemented "locally", without requiring any inter-process communication.
Thus reads can be viewed as "fast": the time between invocation and response is that needed for some local computation.
Time for writes is the time for delivery of one totally ordered broadcast (depends on how to-bcast is implemented).

Slide 31: Alternative SC Algorithm
It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast and has the reverse performance:
  – writes are local/fast (even though bcasts are sent, the writer doesn't wait for them to be received)
  – reads can require waiting for some bcasts to be received
Like the previous SC algorithm, this one does not implement linearizable shared memory.

Slide 32: Time Complexity for DSM Algorithms
One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.
The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received.
The sequential consistency algorithm required D time for writes and C time for reads, where C is the time for doing some local computation.
Can we do better?
To answer this question, we need some kind of timing model.

Slide 33: Timing Model
Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).
Assume that every message has delay in the range [d-u, d].
Claim: Totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).

Slide 34: Time and Clocks in Layered Model
Timed execution: associate an occurrence time with each node input event.
Times of other events are "inherited" from the time of the triggering node input
  – recall the assumption that local processing time is negligible.
Model hardware clocks as before: they run at the same rate as real time, but are not synchronized.
Notions of view, timed view, and shifting are the same:
  – Shifting Lemma still holds (relates h/w clocks and msg delays between original and shifted execs)

Slide 35: Lower Bound for SC
Let T_read = worst-case time for a read to complete.
Let T_write = worst-case time for a write to complete.
Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, T_read + T_write ≥ d.

Slide 36: SC Lower Bound Proof
Consider any SC simulation with T_read + T_write < d.
Let X and Y be two shared variables, both initially 0.
Let α_0 be an admissible execution whose top layer behavior is write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)
  – write begins at time 0, read ends before time d
  – every msg has delay d
Why does α_0 exist?
  – The alg. must respond correctly to any sequence of invocations.
  – Suppose the user at p_0 wants to do a write, immediately followed by a read.
  – By SC, the read must return 0.
  – By assumption, the total elapsed time is less than d.

Slide 37: SC Lower Bound Proof
Similarly, let α_1 be an admissible execution whose top layer behavior is write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)
  – write begins at time 0, read ends before time d
  – every msg has delay d
α_1 exists for a similar reason.
Now merge p_0's timed view in α_0 with p_1's timed view in α_1 to create an admissible execution α'.
But α' is not SC, contradiction!

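Slide 38: SC Lower Bound Proof
[Figure: three timelines over the time interval 0 to d.
  α_0: p_0 performs write(X,1) then read(Y,0); p_1 performs no operations.
  α_1: p_1 performs write(Y,1) then read(X,0); p_0 performs no operations.
  α': p_0 performs write(X,1) then read(Y,0), and p_1 performs write(Y,1) then read(X,0).]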

Slide 39: Linearizability Write Lower Bound
Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, T_write ≥ u/2.
Proof: Consider any linearizable simulation with T_write < u/2.
Let α be an admissible exec. whose top layer behavior is: p_1 writes 1 to X, p_2 writes 2 to X, p_0 reads 2 from X.
Shift α to create an admissible exec. in which p_1's and p_2's writes are swapped, causing p_0's read to violate linearizability.

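Slide 40: Linearizability Write Lower Bound
[Figure: execution α on a time axis marked 0, u/2, u, with timelines for p_0, p_1, p_2: p_1 does write 1, p_2 does write 2, and p_0 does read 2. A second panel shows the delay pattern among p_0, p_1, p_2, using the delays d - u/2, d, and d - u.]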

Slide 41: Linearizability Write Lower Bound
[Figure: the shifted execution on a time axis marked 0, u/2, u, with timelines for p_0, p_1, p_2 showing write 1, write 2, and read 2. Shift p_1 by u/2 and shift p_2 by -u/2. In the resulting delay pattern every delay is d or d - u.]
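The admissibility of the shifted execution can be checked with a few lines of arithmetic. In this sketch, shifting p_i by x_i moves each of its events from time t to t + x_i, so a message from p_i to p_j with delay δ ends up with delay δ - x_i + x_j. The assignment of the original delays to process pairs (d from p_1 to p_2, d - u from p_2 to p_1, d - u/2 elsewhere) is an assumption consistent with the delay values listed on the slides, and the concrete values of d and u are arbitrary.

```python
from fractions import Fraction


def shift_delays(delays, x):
    """delays[(i, j)] is the delay of messages from p_i to p_j;
    x[i] is the amount by which p_i's events are shifted."""
    return {(i, j): delta - x[i] + x[j] for (i, j), delta in delays.items()}


d, u = Fraction(10), Fraction(4)  # arbitrary example values with u <= d

# Assumed delay pattern of the original execution.
delays = {(i, j): d - u / 2 for i in range(3) for j in range(3) if i != j}
delays[(1, 2)] = d
delays[(2, 1)] = d - u

# Shift p_1 by u/2 and p_2 by -u/2; p_0 is not shifted.
x = {0: Fraction(0), 1: u / 2, 2: -u / 2}
shifted = shift_delays(delays, x)

admissible = all(d - u <= delta <= d for delta in shifted.values())
```

Under these assumptions every shifted delay is either d or d - u, so the shifted execution stays within the allowed range [d-u, d] while the two writes change places in real time.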

Slide 42: Linearizability Read Lower Bound
Approach is similar to the write lower bound.
Assume in contradiction there is an algorithm with T_read < u/4.
Identify a particular execution:
  – fix a pattern of read and write invocations, occurring at particular times
  – fix the pattern of message delays
Shift this execution to get one that is
  – still admissible
  – but not linearizable

Slide 43: Linearizability Read Lower Bound
Original execution:
p_1 reads X and gets 0 (old value).
Then p_2 starts writing 1 to X.
When the write is done, p_0 reads X and gets 1 (new value).
Also, during the write, p_0 and p_1 alternate reading X.
At some point, the reads stop getting the old value (0) and start getting the new value (1).

Slide 44: Linearizability Read Lower Bound
Set all delays in this execution to be d - u/2.
Now shift p_2 earlier by u/2.
Verify that the result is still admissible (every delay either stays the same or becomes d or d - u).
But in the shifted execution, the sequence of values read is 0, 0, …, 0, 1, 0, 1, 1, …, 1.

Slide 45: Linearizability Read Lower Bound
[Figure: two panels with timelines for p_0, p_1, p_2, showing alternating reads that return 0 and then 1 while p_2 performs write 1. The second panel shows the execution after a shift of u/2.]