1 Principles of Reliable Distributed Systems Lecture 10: Atomic Shared Memory Objects and Shared Memory Emulations Spring 2007 Prof. Idit Keidar.

2 Material Attiya and Welch, Distributed Computing –Ch. 9 & 10 Nancy Lynch, Distributed Algorithms –Ch. 13 & 17 Linearizability slides adapted from Maurice Herlihy

3 Shared Memory Model All communication through shared memory! –No message-passing. Shared memory registers/objects. Accessed by processes with ids 1,2,… Note: we have two types of entities: objects and processes.

4 Motivation Multiprocessor architectures with shared memory Multi-threaded programs Distributed shared memory (DSM) Abstraction for message passing systems –We will see how to emulate shared memory in message passing systems. –We will see how to use shared memory for consensus and state machine replication.

5 Linearizability Semantics for Concurrent Objects

6 FIFO Queue: Enqueue Method [Diagram: a process invokes q.enq(·) on the queue]

7 FIFO Queue: Dequeue Method [Diagram: a process invokes q.deq() on the queue and receives an item]

8 Sequential Objects Each object has a state –Usually given by a set of fields –Queue example: sequence of items Each object has a set of methods –Only way to manipulate state –Queue example: enq and deq methods

9 Methods Take Time [Timeline: a method call spans an interval, from its invocation – q.enq(…) at 12:00 – to its response – void at 12:01]

10 Split Method Calls into Two Events Invocation –method name & args –q.enq(x) Response –result or exception –q.enq(x) returns void –q.deq() returns x –q.deq() throws empty

11 A Single Process (Thread) Sequence of events First event is an invocation Alternates matching invocations and responses This is called a well-formed interaction

12 Concurrent Methods Take Overlapping Time [Timeline: method calls by different processes overlap in time]

13 Concurrent Objects What does it mean for a concurrent object to be correct? What is a concurrent FIFO queue? –FIFO means strict temporal order –Concurrent means ambiguous temporal order Help!

14 Sequential Specifications Precondition, say for q.deq( … ) –Queue is non-empty Postcondition: –Returns & removes first item in queue You got a problem with that?

15 Concurrent Specifications Naïve approach –Object has n methods –Must specify O(n 2 ) possible interactions –Maybe more If the queue is empty and then enq begins and deq begins after enq(x) begins but before enq(x) ends then … Linearizability: same as it ever was

16 Linearizability Each method should –“Take effect” Effect defined by the sequential specification –Instantaneously Take 0 time –Between its invocation and response events.

17 Linearization A linearization of a concurrent execution σ is –A sequential execution Each invocation is immediately followed by its response Satisfies the object’s sequential specification –Looks like σ Responses to all invocations are the same as in σ –Preserves real-time order Each invocation-response pair occurs between the corresponding invocation and response in σ
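To make the definition concrete, here is a brute-force Python sketch (illustrative, not from the lecture) that decides whether a small history over a read/write register has a linearization: it searches for a total order of the operations that respects real-time precedence and is legal for a sequential register with initial value 0.

```python
from itertools import permutations

# Each operation: (name, arg, result, invoke_time, response_time).
# A history is linearizable if some total order of its operations
# (1) respects real-time order: if A responds before B is invoked,
#     then A precedes B, and
# (2) is legal for a sequential register: every read returns the
#     value of the latest preceding write (initially 0).

def respects_real_time(order, ops):
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            # b may not have responded before a was invoked
            if ops[b][4] < ops[a][3]:
                return False
    return True

def legal_register(order, ops):
    value = 0
    for idx in order:
        name, arg, result = ops[idx][0], ops[idx][1], ops[idx][2]
        if name == "write":
            value = arg
        elif result != value:  # a read must return the current value
            return False
    return True

def linearizable(ops):
    ids = range(len(ops))
    return any(respects_real_time(p, ops) and legal_register(p, ops)
               for p in permutations(ids))

# write(1) over [0,2] concurrent with a read over [1,3] returning 1
h1 = [("write", 1, None, 0, 2), ("read", None, 1, 1, 3)]
# same shape, but the read returns 5, which was never written
h2 = [("write", 1, None, 0, 2), ("read", None, 5, 1, 3)]
```

The search is exponential in the number of operations, which is fine for illustrating the definition but not for real histories.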

18 Linearizability and Atomicity A concurrent execution that has a linearization is linearizable. An object that has only linearizable executions is atomic.

19 Why Linearizability? “Religion”, not science Scientific justification: –Facilitates reasoning –Nice mathematical properties Common-sense justification –Preserves real-time order –Matches my intuition (sorry about yours)

20 Example [Timeline diagram: q.enq(x), q.enq(y), q.deq(x), q.deq(y)]

21 Example [Timeline diagram: q.enq(x), q.enq(y), q.deq(y)]

22 Example [Timeline diagram: q.enq(x), q.deq(x)]

23 Example [Timeline diagram: q.enq(x), q.enq(y), q.deq(y), q.deq(x)]

24 Read/Write Variable Example [Timeline diagram: read(1), write(0), write(1), read(0)]

25 Read/Write Variable Example [Timeline diagram: read(1), write(0), write(1), write(2), read(1)]

26 Read/Write Variable Example [Timeline diagram: read(1), write(0), write(1), write(2), read(2)]

27 Concurrency How much concurrency does linearizability allow? When must a method invocation block? Focus on total methods –defined in every state –why?

28 Concurrency Question: when does linearizability require a method invocation to block? Answer: never Linearizability is non-blocking

29 Non-Blocking Theorem If a method invocation ⟨A q.inv()⟩ is pending in a linearizable history H, then there exists a response ⟨A q:res()⟩ such that H + ⟨A q:res()⟩ is legal

30 Note on Non-Blocking A given implementation of linearizability may be blocking The property itself does not mandate it –for every pending invocation, there is always a possible return value that does not violate linearizability –the implementation may not always know it…

31 Atomic Objects An object is atomic if all of its concurrent executions are linearizable What if we want an atomic operation on multiple objects?

32 Serializability A transaction is a finite sequence of method calls A history is serializable if –transactions appear to execute serially Strictly serializable if –order is compatible with real-time Used in databases

33 Serializability is Blocking [Diagram: x.read(0), y.read(0), x.write(1), y.write(1)]

34 Comparison Serializability appropriate for –fault-tolerance –multi-step transactions Linearizability appropriate for –single objects –multiprocessor synchronization

35 Critical Sections Easy way to implement linearizability –take sequential object –make each method a critical section Like synchronized methods in Java™ Problems? –Blocking –No concurrency
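The critical-section recipe can be sketched in Python: take a sequential queue and run every method under one lock, analogous to Java synchronized methods. The class name is illustrative.

```python
import threading
from collections import deque

# "Critical section" approach to linearizability: each method takes
# effect while the lock is held, so the order in which calls acquire
# the lock is a valid linearization order. The cost is that calls
# block and no two methods ever run concurrently.
class CoarseQueue:
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def enq(self, x):
        with self._lock:
            self._items.append(x)

    def deq(self):
        with self._lock:
            if not self._items:
                raise IndexError("empty")
            return self._items.popleft()
```

This is correct but blocking and admits no concurrency, exactly the problems the slide lists.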

36 Linearizability Summary Linearizability –Operation takes effect instantaneously between invocation and response Uses sequential specification –No O(n 2 ) interactions Non-Blocking –Never required to pause method call Granularity matters

37 Atomic Register Emulation in a Message-Passing System [ Attiya, Bar-Noy, Dolev ]

38 Distributed Shared Memory (DSM) Can we provide the illusion of atomic shared-memory registers in a message- passing system? In an asynchronous system? Where processes can fail?

39 Liveness Requirement Wait-freedom (wait-free termination): every operation by a correct process p completes in a finite number of p’s steps Regardless of steps taken by other processes –In particular, the other processes may fail or take any number of steps between p’s steps –But p must be given a chance to take as many steps as it needs. (Fairness).

40 Register Holds a value Can be read Can be written Interface: –int read(); /* returns a value */ –void write(int v); /* returns ack */

41 Take I: Failure-Free Case Each process keeps a local copy of the register Let’s try state machine replication –Step 1: Implement atomic broadcast (how?) Recall: atomic broadcast service interface: –broadcast(m) –deliver(m)

42 Emulation with Atomic Broadcast (Failure-Free) Upon client request ( read / write ), –Broadcast the request Upon deliver write request –Write to local copy of register –If from local client, return ack to client Upon deliver read request –If from local client, return local register value to client Homework questions: –Show that the emulated register is atomic –Is broadcasting reads required for atomicity?
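As a sanity check, the failure-free emulation can be sketched in Python, with the atomic broadcast simulated by a single shared log that every replica applies in the same total order. The class and function names are illustrative, not from the lecture.

```python
# State machine replication of a register: every request is
# "broadcast" (appended to one totally ordered log) and applied by
# every replica in the same order, so all local copies agree.
class Replica:
    def __init__(self):
        self.value = 0  # local copy of the register

    def deliver(self, op, arg):
        # Applied in the same delivery order at every replica.
        if op == "write":
            self.value = arg
            return "ack"
        return self.value  # op == "read"

log = []                                  # stands in for atomic broadcast
replicas = [Replica() for _ in range(3)]

def invoke(op, arg=None):
    log.append((op, arg))                             # broadcast the request
    results = [r.deliver(op, arg) for r in replicas]  # deliver everywhere
    return results[0]                                 # local replica responds
```

In a real system the replicas are separate processes and the total order comes from an atomic broadcast protocol; here the shared log plays that role.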

43 What If Processes Can Crash? Does the same solution work?

44 ABD: Fault-Tolerant Emulation [ Attiya, Bar-Noy, Dolev ] Assumes up to f<n/2 processes can fail Main ideas: –Store value at majority of processes before write completes –read from majority –read intersects write, hence sees latest value

45 Take II: 1-Reader 1-Writer (SRSW) Single-reader – there is only one process that can read from the register Single-writer – there is only one process that can write to the register The reader and writer are just 2 processes; –The other n-2 processes are there to help

46 Trivial Solution? Writer simply sends message to reader –When does it return ack ? –What about failures? We want a wait-free solution: –if the reader (writer) fails, the writer (reader) should be able to continue writing (reading)

47 SRSW Algorithm: Variables At each process: –x, a copy of the register –t, initially 0, unique tag associated with latest write

48 SRSW Algorithm Emulating Write To perform write(x,v) –choose tag > t –set x ← v; t ← tag –send (“write”, v, t) to all Upon receive (“write”, v, tag) –if (tag > t) then set x ← v; t ← tag fi –send (“ack”, v, tag) to writer When writer receives (“ack”, v, t) from majority (counting an ack from itself too) –return ack to client

49 SRSW Algorithm Emulating Read To perform read(x) –send (“read”) to all Upon receive (“read”) –send (“read-ack”, x, t) to reader When reader receives (“read-ack”, v, tag) from majority (including local values of x and t) –choose value v associated with largest tag –store these values in x,t –return x
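The write and read protocols above can be condensed into a single-threaded Python sketch, assuming n = 5 replicas and modeling "hear from a majority" by sampling an arbitrary majority-sized subset. The replica layout and names are illustrative.

```python
import random

# ABD-style SRSW sketch: each replica holds a tag and a copy of the
# register. Any two majorities intersect, so a read's majority
# contains at least one replica that stored the latest write's tag.
N = 5
MAJORITY = N // 2 + 1
replicas = [{"t": 0, "x": 0} for _ in range(N)]

def write(v, tag):
    # Writer phase: store (tag, v) at a majority, then return ack.
    for r in random.sample(replicas, MAJORITY):
        if tag > r["t"]:
            r["t"], r["x"] = tag, v

def read():
    # Reader phase: collect (tag, value) from a majority and return
    # the value carried by the highest tag.
    acks = [(r["t"], r["x"]) for r in random.sample(replicas, MAJORITY)]
    return max(acks)[1]
```

Because the single writer chooses strictly increasing tags, a tag uniquely determines a value, so taking the maximum (tag, value) pair is safe.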

50 Does This Work? Only possible overlap is between read and write –why? When a read does not overlap any write – –it reads at least one copy that was written by the latest write (why?) –this copy has the highest tag (why?) What is the linearization order when there is overlap? What if 2 read s overlap the same write ?

51 Example [Timeline diagram: read(1), read(?), write(1)]

52 Wait-Freedom Only waiting is for majority of responses There is a correct majority All correct processes respond to all requests –Respond even if the tag is smaller

53 Take III: n-Reader 1-Writer (MRSW) n-reader – all the processes can read Does the previous solution work? What if 2 read s by different processes overlap the same write ?

54 Example [Timeline diagram: read(1), read(?), write(1)]

55 MRSW Algorithm Extending the Read When reader receives (“read-ack”, v, tag) from majority –choose value v associated with largest tag –store these values in x,t –send (“propagate”, x, t) to all (except writer) Upon receive (“propagate”, v, tag) from process i –if (tag > t) then set x ← v; t ← tag fi –send (“prop-ack”, x, t) to process i When reader receives (“prop-ack”, v, tag) from majority (including itself) –return x
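The extra propagate phase can be sketched as follows, reusing the replica layout of the SRSW sketch (a list of tag/value dictionaries; names are illustrative). Before returning, the reader writes the (tag, value) it chose back to a majority, so any later read's majority must see a tag at least this high.

```python
import random

# Write-back phase of the multi-reader read: without it, two
# overlapping reads could return new-then-old values for the same
# concurrent write, violating linearizability.
N = 5
MAJORITY = N // 2 + 1
replicas = [{"t": 0, "x": 0} for _ in range(N)]

def propagate(tag, value):
    for r in random.sample(replicas, MAJORITY):
        if tag > r["t"]:
            r["t"], r["x"] = tag, value
```

After propagation, the chosen value is stored at a majority, which is exactly the state a completed write would have left.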

56 The Complete Read [Diagram: Phase 1 – the reader sends (“read”) to servers S1…Sn and collects (“read-ack”, v, t) from a majority; Phase 2 (multi-reader only) – it sends (“propagate”, v, t), collects (“prop-ack”) from a majority, then returns]

57 Take IV: n-Reader n-Writer (MRMW) n-writer – all the processes can write to the register Does the previous solution work?

58 Playing Tag What if two writers use the same tag for writing different values? Need to ensure unique tags –That’s easy: break ties, e.g., by process id What if a later write uses a smaller tag than an earlier one? –Must be prevented (why?)

59 MRMW Algorithm Extending the Write To perform write(x,v) –send (“query”) to all Upon receive (“query”) from i –send (“query-ack”, t) to i When writer receives (“query-ack”, tag) from majority (counting its own tag) –choose unique tag > all received tags –continue as in 1-writer algorithm What if another writer chooses a higher tag before write completes?
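One standard way to get unique, monotonically increasing tags is to make a tag a (counter, writer id) pair compared lexicographically; the query phase supplies the counters seen so far. The helper name below is hypothetical, not from the lecture.

```python
# Multi-writer tag choice: (counter, pid) pairs are compared
# lexicographically, so two writers that pick the same counter are
# ordered by process id, guaranteeing uniqueness.
def next_tag(received_tags, my_id):
    """Choose a unique tag greater than every tag seen in the query phase.

    received_tags: (counter, pid) pairs collected from a majority.
    """
    counter = max((c for c, _ in received_tags), default=0)
    return (counter + 1, my_id)
```

Note the slide's caveat still applies: another writer may pick a higher tag before this write completes; the algorithm tolerates that because replicas only ever overwrite with strictly larger tags.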

60 The Complete Write [Diagram: Phase 1 (multi-writer only) – the writer sends (“query”) to servers S1…Sn and collects (“query-ack”, t) from a majority; Phase 2 – it sends (“write”, v, t), collects (“ack”) from a majority, then returns ack]

61 How Long Does it Take? The write emulation –Single-writer: 2 rounds (steps) –Multi-writer: 4 rounds (steps) The read emulation –Single-reader: 2 rounds (steps) –Multi-reader: 4 rounds (steps)

62 What if A Majority Can Fail? You guessed it! Homework question

63 Can We Emulate Every Atomic Object the Same Way?

64 Difference from Consensus Works even if the system is completely asynchronous In Paxos, there is no progress when there are multiple leaders Here, there is always progress – multiple writers can write concurrently –One will prevail (Which?)