1. CSCE 668 Distributed Algorithms and Systems
Fall 2011
Prof. Jennifer Welch
Set 16: Distributed Shared Memory

2. Distributed Shared Memory

- A model for inter-process communication.
- Provides the illusion of shared variables on top of message passing.
- Shared memory is often considered a more convenient programming platform than message passing.
- Formally, we give a simulation of the shared memory model on top of the message passing model.
- We'll consider the special case of:
  - no failures
  - only read/write variables to be simulated

3. The Simulation

[Figure: layered architecture. Users of read/write shared memory issue read/write invocations and receive return/ack responses from local algorithm instances alg_0, ..., alg_{n-1}; the instances communicate via send/recv over the underlying Message Passing System, and together present the Shared Memory interface.]

4. Shared Memory Issues

- A process invokes a shared memory operation (read or write) at some time.
- The simulation algorithm running on the same node executes some code, possibly involving exchanges of messages.
- Eventually the simulation algorithm informs the process of the result of the shared memory operation.
- So shared memory operations are not instantaneous!
- Operations (invoked by different processes) can overlap.
- What values should be returned by operations that overlap other operations?
  - defined by a memory consistency condition

5. Sequential Specifications

- Each shared object has a sequential specification: it specifies the behavior of the object in the absence of concurrency.
- The object supports operations:
  - invocations
  - matching responses
- The specification is the set of sequences of operations that are legal.

6. Sequential Spec for R/W Registers

- Each operation has two parts, invocation and response.
- A read operation has invocation read_i(X) and response return_i(X,v) (the subscript i indicates the invoking process).
- A write operation has invocation write_i(X,v) and response ack_i(X).
- A sequence of operations is legal iff each read returns the value of the latest preceding write.
- Ex: [write_0(X,3) ack_0(X)] [read_1(X) return_1(X,3)]
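
As a small illustration, here is a minimal sketch in Python of the legality check for a single (concurrency-free) sequence; the tuple encoding and names are ours, not from the slides.

INIT = 0  # later slides take every shared variable to be initially 0

def is_legal(ops):
    """ops: a single sequence (no concurrency) of operations encoded as
    ('write', var, value) or ('read', var, value) tuples."""
    latest = {}  # var -> value written by the latest preceding write
    for kind, var, value in ops:
        if kind == 'write':
            latest[var] = value
        elif latest.get(var, INIT) != value:
            return False  # this read does not return the latest write
    return True

# The slide's example: write_0(X,3) ack_0(X), then read_1(X) return_1(X,3).
assert is_legal([('write', 'X', 3), ('read', 'X', 3)])
assert not is_legal([('write', 'X', 3), ('read', 'X', 0)])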

7. Memory Consistency Conditions

- Consistency conditions tie together the sequential specification with what happens in the presence of concurrency.
- We will study two well-known conditions:
  - linearizability
  - sequential consistency
- We will only consider read/write registers, in the absence of failures.

8. Definition of Linearizability

- Suppose σ is a sequence of invocations and responses for a set of operations.
  - An invocation is not necessarily immediately followed by its matching response; there can be concurrent, overlapping operations.
- σ is linearizable if there exists a permutation π of all the operations in σ (in which each invocation is immediately followed by its matching response) such that:
  - π|X is legal (satisfies the sequential spec) for every variable X, and
  - if the response of operation O_1 occurs in σ before the invocation of operation O_2, then O_1 occurs in π before O_2 (π respects the real-time order of non-overlapping operations in σ).
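
The definition translates directly into a brute-force checker: try every permutation π and test legality plus real-time order. Exponential in the number of operations, so purely illustrative; the Op record and its encoding are our own, not from the slides.

from itertools import permutations
from dataclasses import dataclass

@dataclass
class Op:
    kind: str     # 'read' or 'write'
    var: str
    value: int
    inv: float    # invocation time
    resp: float   # response time

def linearizable(ops, init=0):
    def legal(seq):
        latest = {}
        for op in seq:
            if op.kind == 'write':
                latest[op.var] = op.value
            elif latest.get(op.var, init) != op.value:
                return False
        return True

    for pi in permutations(ops):
        order = {id(op): k for k, op in enumerate(pi)}
        # pi must respect the real-time order of non-overlapping ops
        respects = all(not (o1.resp < o2.inv
                            and order[id(o1)] > order[id(o2)])
                       for o1 in ops for o2 in ops)
        if respects and legal(pi):
            return True
    return False

# First scenario of slide 9 (the times are made up to match the picture):
ops = [Op('write', 'X', 1, 0, 2), Op('read', 'Y', 1, 3, 5),   # p_0
       Op('write', 'Y', 1, 1, 4), Op('read', 'X', 1, 5, 7)]   # p_1
assert linearizable(ops)
# p_1's read returning 0 instead is not linearizable:
ops[3] = Op('read', 'X', 0, 5, 7)
assert not linearizable(ops)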

9. Linearizability Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Figure: p_0 executes write(X,1) ack(X) and then read(Y) return(Y,1); p_1 concurrently executes write(Y,1) ack(Y) and then read(X) return(X,1).]

Is this sequence linearizable? Yes: one valid order is write(X,1), write(Y,1), read(Y), read(X).
What if p_1's read returns 0? No: read(X) returning 0 would have to precede write(X,1) in the permutation, but write(X,1) finishes before read(X) begins, so real-time order is violated.

10. Definition of Sequential Consistency

- Suppose σ is a sequence of invocations and responses for some set of operations.
- σ is sequentially consistent if there exists a permutation π of all the operations in σ such that:
  - π|X is legal (satisfies the sequential spec) for every variable X, and
  - if the response of operation O_1 occurs in σ before the invocation of operation O_2 at the same process, then O_1 occurs in π before O_2 (π respects the real-time order of operations by the same process in σ).
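
Sequential consistency weakens only the ordering requirement: π need only respect the order of operations at the same process. A self-contained sketch mirroring the linearizability checker above, with an Op record that now carries a proc field (again our own encoding):

from itertools import permutations
from dataclasses import dataclass

@dataclass
class Op:
    proc: int     # invoking process
    kind: str     # 'read' or 'write'
    var: str
    value: int
    inv: float    # invocation time
    resp: float   # response time

def sequentially_consistent(ops, init=0):
    def legal(seq):
        latest = {}
        for op in seq:
            if op.kind == 'write':
                latest[op.var] = op.value
            elif latest.get(op.var, init) != op.value:
                return False
        return True

    for pi in permutations(ops):
        order = {id(op): k for k, op in enumerate(pi)}
        # pi need only respect per-process order
        respects = all(not (o1.proc == o2.proc and o1.resp < o2.inv
                            and order[id(o1)] > order[id(o2)])
                       for o1 in ops for o2 in ops)
        if respects and legal(pi):
            return True
    return False

# First scenario of slide 11: sequentially consistent, although the
# corresponding check on slide 9 shows it would not be linearizable.
ops = [Op(0, 'write', 'X', 1, 0, 2), Op(0, 'read', 'Y', 1, 3, 5),
       Op(1, 'write', 'Y', 1, 1, 4), Op(1, 'read', 'X', 0, 5, 7)]
assert sequentially_consistent(ops)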

11. Sequential Consistency Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Figure: p_0 executes write(X,1) ack(X) and then read(Y) return(Y,1); p_1 concurrently executes write(Y,1) ack(Y) and then read(X) return(X,0).]

Is this sequence sequentially consistent? Yes: one valid order is write(Y,1), read(X), write(X,1), read(Y).
What if p_0's read also returns 0? No: legality forces each read of 0 to precede the other process's write, while process order puts each write before its own process's read; these requirements form a cycle, so no single permutation satisfies them.

12. Specification of Linearizable Shared Memory Communication System

- Inputs are invocations on the shared objects.
- Outputs are responses from the shared objects.
- A sequence σ is in the allowable set iff:
  - Correct Interaction: each process alternates invocations and matching responses
  - Liveness: each invocation has a matching response
  - Linearizability: σ is linearizable

13. Specification of Sequentially Consistent Shared Memory

- Inputs are invocations on the shared objects.
- Outputs are responses from the shared objects.
- A sequence σ is in the allowable set iff:
  - Correct Interaction: each process alternates invocations and matching responses
  - Liveness: each invocation has a matching response
  - Sequential Consistency: σ is sequentially consistent

14. Algorithm to Implement Linearizable Shared Memory

- Uses totally ordered broadcast as the underlying communication system.
- Each process keeps a replica of each shared variable.
- When a read request arrives:
  - send a bcast msg containing the request
  - when own bcast msg arrives, return the value in the local replica
- When a write request arrives:
  - send a bcast msg containing the request
  - upon receipt, each process updates its replica's value
  - when own bcast msg arrives, respond with ack
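
A minimal sketch of this algorithm in Python, assuming a totally ordered broadcast service that delivers every message to every process (including the sender) in one global order. The to_bc_send/to_bc_recv plumbing, the response callbacks, and all names are assumptions of this sketch, not part of the slides.

from collections import deque

class LinearizableDSM:
    """One instance runs at each process, below the user and above the
    totally ordered broadcast layer."""

    def __init__(self, my_id, to_bc_send, init=0):
        self.my_id = my_id
        self.to_bc_send = to_bc_send   # hands a msg to the TO bcast layer
        self.replica = {}              # local replica of each shared variable
        self.init = init               # initial value of every variable
        self.pending = deque()         # response callbacks owed locally, FIFO

    # ---- invocations arriving from the local user ----
    def read(self, var, respond):
        self.pending.append(respond)
        self.to_bc_send(('read', var, self.my_id))

    def write(self, var, value, respond):
        self.pending.append(respond)
        self.to_bc_send(('write', var, value, self.my_id))

    # ---- deliveries from the TO bcast layer (same order everywhere) ----
    def to_bc_recv(self, msg):
        if msg[0] == 'write':
            _, var, value, sender = msg
            self.replica[var] = value          # every replica applies the write
            if sender == self.my_id:
                self.pending.popleft()('ack')  # now respond to the local write
        else:                                  # 'read'
            _, var, sender = msg
            if sender == self.my_id:           # only the reader responds
                self.pending.popleft()(self.replica.get(var, self.init))

Both operations wait for their own broadcast to come back, so each costs one broadcast delivery (the D bound of slide 33). The read's broadcast changes no replica; slides 18 and 19 show why it is still needed.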

15. The Simulation

[Figure: the same layered architecture as slide 3, but the algorithm instances alg_0, ..., alg_{n-1} now use to-bc-send/to-bc-recv over a Totally Ordered Broadcast service instead of point-to-point send/recv.]

16. Correctness of Linearizability Algorithm

- Consider any admissible execution α of the algorithm in which:
  - the underlying totally ordered broadcast behaves properly
  - the users interact properly (alternate invocations and responses)
- Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.

17. Correctness of Linearizability Algorithm

- Liveness (every invocation has a response): follows from the Liveness property of the underlying totally ordered broadcast.
- Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
  - π is legal: all the operations are consistently ordered by the TO bcast.
  - π respects the real-time order of operations: if O_1 finishes before O_2 begins, then O_1's bcast is ordered before O_2's bcast.

18. Why is the Read Bcast Needed?

- The bcast done for a read causes no changes to any replicas; it just delays the response to the read.
- Why is it needed?
- Let's see what happens if we remove it.

19. Why Read Bcast is Needed

[Figure: p_0 performs write(1), to-bc-sending the update. p_1's read returns 1 because the update has already been delivered at p_1, but p_2's later read returns 0 because the update has not yet reached p_2. The read returning 0 follows the read returning 1 in real time, so without the read broadcast the result is not linearizable.]

20. Algorithm for Sequential Consistency

The linearizability algorithm, without doing a bcast for reads:
- Uses totally ordered broadcast as the underlying communication system.
- Each process keeps a replica of each shared variable.
- When a read request arrives:
  - immediately return the value stored in the local replica
- When a write request arrives:
  - send a bcast msg containing the request
  - upon receipt, each process updates its replica's value
  - when own bcast msg arrives, respond with ack
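
Relative to the sketch after slide 14, the change is confined to the read path; a hypothetical subclass makes this explicit (our own code, not from the slides):

class SequentiallyConsistentDSM(LinearizableDSM):
    def read(self, var, respond):
        # answer immediately from the local replica: no broadcast and
        # no waiting, so reads are "fast" (slide 31)
        respond(self.replica.get(var, self.init))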

21. Correctness of SC Algorithm

Lemma (9.3): The local copies at each process take on all the values appearing in write operations, in the same order, which preserves the order of non-overlapping writes (in particular, the per-process order of writes is preserved).

Lemma (9.4): If p_i writes Y and later reads X, then p_i's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

22. Correctness of the SC Algorithm

(Theorem 9.5) Why does SC hold?
- Given any admissible execution α, we must come up with a permutation π of the shared memory operations that:
  - is legal, and
  - respects the per-process ordering of operations.

23. The Permutation π

- Insert all writes into π in their to-bcast order.
- Consider each read R in α in the order of invocation:
  - suppose R is a read by p_i of X
  - place R in π immediately after the later of:
    1. the operation by p_i that immediately precedes R in α, and
    2. the write that R "read from" (the write that caused the latest update of p_i's local copy of X preceding the response for R)
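
A small sketch of this construction; the input encoding (operation objects plus two maps keyed by id() of the read) is our own, not from the slides.

def build_pi(writes_in_bcast_order, reads_in_invocation_order,
             prev_op_same_proc, read_from):
    """prev_op_same_proc[id(r)]: the operation by r's process that
    immediately precedes r in the execution, or None.
    read_from[id(r)]: the write that r read from, or None if r read
    the initial value."""
    pi = list(writes_in_bcast_order)        # writes in to-bcast order
    for r in reads_in_invocation_order:
        pos = 0                             # front of pi if no anchor exists
        for a in (prev_op_same_proc[id(r)], read_from[id(r)]):
            if a is not None:
                # identity search for a's current slot in pi
                i = next(k for k, op in enumerate(pi) if op is a)
                pos = max(pos, i + 1)
        pi.insert(pos, r)                   # just after the later anchor
    return pi

Reads are processed in invocation order, so the preceding operation by the same process is already in pi when each read is placed.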

24. Permutation Example

[Figure: an execution with two writes, write(1) and write(2), each acked and to-bc-sent, and two reads returning 2 and 1 respectively; the constructed permutation π is indicated in the original slide by the numbers 1, 3, 4, 2 attached to the operations.]

25. Permutation π Respects Per-Process Ordering

For a specific process:
- The relative ordering of two writes is preserved, by Lemma 9.3.
- The relative ordering of two reads is preserved by the construction of π.
- If write W precedes read R in execution α, then W precedes R in π by construction.
- Suppose read R precedes write W in α. Show the same is true in π.

26. Permutation π Respects Ordering

- Suppose in contradiction R and W are swapped in π:
  π: … W … W' … R' … R …
  α|p_i: R' … R … W
  - there is a read R' by p_i that equals R or precedes R in π,
  - there is a write W' that equals W or follows W in the to-bcast order, and
  - R' "reads from" W'.
- But:
  - R' finishes before W starts in α, and
  - updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W,
  - so R' cannot read from W'.

27. Permutation π is Legal

- Consider some read R of X by p_i and some write W such that R reads from W in α.
- Suppose in contradiction some other write W' to X falls between W and R in π:
  π: … W … W' … R …
- Why does R follow W' in π?

28. Permutation π is Legal

Case 1: W' is also by p_i. Then R follows W' in π because R follows W' in α.
- The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
- Thus R does not read from W, contradiction.

29. Permutation π is Legal

Case 2: W' is not by p_i. Then R follows W' in π due to some operation O, also by p_i, such that:
- O precedes R in α, and
- O is placed between W' and R in π.
  π: … W … W' … O … R …
Consider the earliest such O.

Case 2.1: O is a write (not necessarily to X).
- The update for W' at p_i precedes the update for O at p_i in α (Lemma 9.3).
- The update for O at p_i precedes p_i's local read for R in α (Lemma 9.4).
- So R does not read from W, contradiction.

30. Permutation π is Legal

Case 2.2: O is a read.
  π: … W … W' … O … R …
- By the construction of π, O must read X and in fact read from W' (otherwise O would not be placed after W').
- The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
- The update for W' at p_i precedes the local read for O at p_i in α (otherwise O would not read from W').
- Thus R cannot read from W, contradiction.

31. Performance of SC Algorithm

- Read operations are implemented "locally", without requiring any inter-process communication.
- Thus reads can be viewed as "fast": the time between invocation and response is only that needed for some local computation.
- The time for a write is the time for delivery of one totally ordered broadcast (which depends on how to-bcast is implemented).

32. Alternative SC Algorithm

- It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast with the reverse performance:
  - writes are local/fast (bcasts are still sent, but the writer does not wait for them to be received)
  - reads can require waiting for some bcasts to be received
- Like the previous SC algorithm, this one does not implement linearizable shared memory.

33. Time Complexity for DSM Algorithms

- One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.
- The linearizability algorithm requires D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received.
- The sequential consistency algorithm requires D time for writes and 0 time for reads, since we are assuming the time for local computation is negligible.
- Can we do better? To answer this question, we need some kind of timing model.

34. Timing Model

- Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).
- Assume that every message has delay in the range [d-u, d].
- Claim: totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).

35. Time and Clocks in the Layered Model

- Timed execution: associate an occurrence time with each node input event.
- Times of other events are "inherited" from the time of the triggering node input
  - recall the assumption that local processing time is negligible.
- Model hardware clocks as before: they run at the same rate as real time, but are not synchronized.
- The notions of view, timed view, and shifting are the same as before:
  - the Shifting Lemma still holds (it relates h/w clocks and msg delays between the original and shifted executions).

36. Lower Bound for SC

Let T_read = worst-case time for a read to complete.
Let T_write = worst-case time for a write to complete.

Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, T_read + T_write ≥ d.

37. SC Lower Bound Proof

- Consider any SC simulation with T_read + T_write < d.
- Let X and Y be two shared variables, both initially 0.
- Let α_0 be an admissible execution whose top layer behavior is:
  write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)
  - the write begins at time 0, the read ends before time d
  - every msg has delay d
- Why does α_0 exist?
  - The algorithm must respond correctly to any sequence of invocations.
  - Suppose the user at p_0 wants to do a write, immediately followed by a read.
  - By SC, the read must return 0.
  - By assumption, the total elapsed time is less than d.

38. SC Lower Bound Proof

[Figure α_0: a timeline from 0 to d; p_0 performs write(X,1) followed by read(Y,0), all completing before time d; p_1 takes no steps.]

39. SC Lower Bound Proof

- Similarly, let α_1 be an admissible execution whose top layer behavior is:
  write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)
  - the write begins at time 0, the read ends before time d
  - every msg has delay d
- α_1 exists for a similar reason.

40. SC Lower Bound Proof

[Figure: execution α_0 (p_0 does write(X,1) then read(Y,0) before time d) shown above execution α_1 (p_1 does write(Y,1) then read(X,0) before time d).]

41. SC Lower Bound Proof

- Now merge p_0's timed view in α_0 with p_1's timed view in α_1 to create an admissible execution α'.
  - The merge works because every message has delay d and both processes finish before time d, so neither process receives any message before its operations complete; each process's view is unchanged.
- But α' is not SC (both reads return 0, the scenario ruled out on slide 11), contradiction!

42. SC Lower Bound Proof

[Figure: α_0 and α_1 as before, and below them the merged execution α', in which p_0 performs write(X,1) read(Y,0) while p_1 concurrently performs write(Y,1) read(X,0).]

43. Linearizability Write Lower Bound

Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, T_write ≥ u/2.

Proof: Consider any linearizable simulation with T_write < u/2.
- Let α be an admissible execution whose top layer behavior is: p_1 writes 1 to X, p_2 writes 2 to X, p_0 reads 2 from X.
- Shift α to create an admissible execution in which p_1's and p_2's writes are swapped, causing p_0's read to violate linearizability.

44. Linearizability Write Lower Bound

[Figure: execution α on a timeline from 0 to u. p_1 performs "write 1" during [0, u/2], p_2 performs "write 2" during [u/2, u], and p_0 then reads 2. The figure labels the links with the delay pattern d - u/2, d, and d - u.]

45. Linearizability Write Lower Bound

[Figure: the same execution with p_1 shifted later by u/2 and p_2 shifted earlier by u/2; the figure labels the resulting delays d, d - u, d, d. Now "write 2" occupies [0, u/2] and "write 1" occupies [u/2, u], so write 2 completely precedes write 1, yet p_0 still reads 2.]
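
As a sanity check of admissibility (our own arithmetic, not on the slides): with the Shifting Lemma convention that shifting each p_k by s_k changes the delay of a message from p_i to p_j from δ to δ + s_j - s_i, take s_0 = 0, s_1 = u/2, s_2 = -u/2, and assume the delay pattern of the previous figure (d from p_1 to p_2, d - u from p_2 to p_1, d - u/2 on all other links):

\begin{aligned}
p_1 \to p_2 &: d + s_2 - s_1 = d - u\\
p_2 \to p_1 &: (d - u) + s_1 - s_2 = d\\
p_1 \to p_0,\; p_0 \to p_2 &: (d - u/2) - u/2 = d - u\\
p_0 \to p_1,\; p_2 \to p_0 &: (d - u/2) + u/2 = d
\end{aligned}

Every delay lands in [d - u, d], so the shifted execution is admissible; but write 2 now completely precedes write 1, so linearizability would force p_0's read to return 1, not 2.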

46. Linearizability Read Lower Bound

- The approach is similar to the write lower bound.
- Assume in contradiction there is an algorithm with T_read < u/4.
- Identify a particular execution:
  - fix a pattern of read and write invocations, occurring at particular times
  - fix the pattern of message delays
- Shift this execution to get one that is:
  - still admissible
  - but not linearizable

47. Linearizability Read Lower Bound

Original execution:
- p_1 reads X and gets 0 (the old value).
- Then p_0 starts writing 1 to X.
- When the write is done, p_0 reads X and gets 1 (the new value).
- Also, during the write, p_1 and p_2 alternate reading X.
- At some point, the reads stop getting the old value (0) and start getting the new value (1).

48. Linearizability Read Lower Bound

- Set all delays in this execution to be d - u/2.
- Now shift p_2 earlier by u/2.
- Verify that the result is still admissible (every delay either stays the same or becomes d or d - u).
- But in the shifted execution, the sequence of values read is 0, 0, …, 0, 1, 0, 1, 1, …, 1: a read returning the new value is followed by a later, non-overlapping read returning the old value, which is not linearizable.

49. Linearizability Read Lower Bound

[Figure: p_0 performs write 1 while p_1 and p_2 alternate reads of X; in the original execution the reads return 0, …, 0 and then 1, …, 1. After shifting p_2 earlier by u/2, a read of 0 by p_2 moves past a read of 1 by p_1, so the read sequence becomes 0, …, 0, 1, 0, 1, …, 1.]

