Distributed Algorithms (22903)


Distributed Algorithms (22903): The wait-free hierarchy and the universality of consensus. Lecturer: Danny Hendler. This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch.

Formally: the Consensus Object
Supports a single operation: decide.
Each process pi calls decide with some input vi from some domain. decide returns a value from the same domain.
The following requirements must be met:
    Agreement: In any execution E, all decide operations must return the same value.
    Validity: The value returned by each operation must equal one of the inputs.

Wait-free consensus can be solved easily by compare&swap
Compare&swap(b, old, new), executed atomically:
    v := read from b
    if (v = old) { b := new; return success }
    else return failure
How?
Machines listed on the slide: Motorola 680x0, IBM 370, Sun SPARC, 80x86, MIPS, PowerPC, DEC Alpha
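One possible answer to "How?" is sketched below in Python. This is an illustration, not the slide's own code: Python has no hardware compare&swap, so the hypothetical class `CASRegister` simulates atomicity with a lock, and the sketch assumes that `None` (the slide's null) is never a legal input. Each process tries once to swap its input into an initially-null register and then returns whatever the register holds.

```python
import threading

class CASRegister:
    """Simulated compare&swap register (assumption: a lock stands in for hardware atomicity)."""
    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        # Atomically: if the register holds `old`, store `new` and report success.
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

class CASConsensus:
    """Consensus object for any number of processes; None must not be used as an input."""
    def __init__(self):
        self._decision = CASRegister(None)

    def decide(self, v):
        # Only the first compare&swap succeeds; every caller then reads the winner's value.
        self._decision.compare_and_swap(None, v)
        return self._decision.read()
```

Each call performs one compare&swap and one read, so it finishes in a bounded number of its own steps regardless of what other processes do.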

Would this consensus algorithm from reads/writes work?
Initially decision=null
Decide(v) ; code for pi, i=0,1
    if (decision = null)
        decision := v
        return v
    else return decision
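The question can be explored with a small deterministic simulation. The sketch below is illustrative only: the generator `decide` and the scheduler `run` are hypothetical helpers, and each `yield` marks a point where the other process may take a step. It shows one interleaving in which both processes read null before either writes, so agreement is violated.

```python
def decide(shared, v):
    """The slide's Decide(v) as a generator; each yield is a possible context switch."""
    seen = shared["decision"]        # read the shared register
    yield
    if seen is None:
        shared["decision"] = v       # write own input
        yield
        return v
    return seen

def run(schedule, inputs):
    """Run the processes step by step in the order given by `schedule`."""
    shared = {"decision": None}
    procs = [decide(shared, v) for v in inputs]
    results = [None] * len(inputs)
    finished = set()
    for i in schedule:
        if i in finished:
            continue
        try:
            next(procs[i])
        except StopIteration as stop:
            results[i] = stop.value  # value returned by Decide
            finished.add(i)
    return results

# Both processes read null before either writes: they decide different values.
print(run([0, 1, 0, 1, 0, 1], ["a", "b"]))  # ['a', 'b']
# A sequential run happens to agree, which is why the bug is easy to miss.
print(run([0, 0, 0, 1, 1], ["a", "b"]))     # ['a', 'a']
```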

A proof that wait-free consensus for 2 or more processes cannot be solved by registers.

A FIFO queue
Supports 2 operations:
    q.enqueue(x): returns ack
    q.dequeue: returns the first item in the queue, or empty if the queue is empty

FIFO queue + registers can implement 2-process consensus
Initially Q=<0> and Prefer[i]=null, i=0,1
Decide(v) ; code for pi, i=0,1
    Prefer[i] := v
    qval := Q.deq()
    if (qval = 0) then return v
    else return Prefer[1-i]
Corollary: There is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers.
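The Decide(v) procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name `QueueConsensus2` is hypothetical, a lock around a `deque` stands in for the atomic FIFO queue, and the string "empty" plays the role of the queue's empty response.

```python
import threading
from collections import deque

class QueueConsensus2:
    """2-process consensus from one FIFO queue and two registers (process ids 0 and 1)."""
    def __init__(self):
        self._queue = deque([0])          # Q = <0>
        self._qlock = threading.Lock()    # assumption: simulates an atomic queue
        self._prefer = [None, None]       # Prefer[0], Prefer[1]

    def _deq(self):
        with self._qlock:
            return self._queue.popleft() if self._queue else "empty"

    def decide(self, i, v):
        self._prefer[i] = v               # announce own input before touching the queue
        if self._deq() == 0:              # first to dequeue wins
            return v
        return self._prefer[1 - i]        # the loser adopts the winner's input
```

The loser can safely read `Prefer[1-i]` because the winner wrote its preference before dequeuing, and the loser only sees an empty queue after that dequeue.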

A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)

The wait-free hierarchy
We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers.
The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. It is defined to be infinity if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y from instances of X and registers in a system with more than m processes.

The wait-free hierarchy (cont’d)
Object type                        Consensus number
registers                          1
FIFO queue, stack, test-and-set    2
…
compare-and-swap                   ∞

The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
An algorithm is lock-free if it guarantees that some operation terminates after some finite total number of steps performed by processes.
The lock-freedom progress property is weaker than wait-freedom.

Universal constructions
Given the sequential specification of any object, implement a linearizable wait-free concurrent version of it:
    A lock-free construction using CAS
    A lock-free construction using consensus
    A wait-free construction using consensus
    A bounded-memory wait-free construction using consensus

A lock-free universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
    inv       ; the operation invocation, including its parameters
    new-state ; the new state of the object, after applying the operation
    response  ; the response of the operation
}
Figure: Head pointing into a list of opr records, each with fields inv, new-state, response.

A lock-free universal algorithm using CAS (cont’d)
Figure: Head pointing into a list of opr records (inv, new-state, response) that ends at the anchor record, whose new-state=init.
Initially Head points to the anchor record. Head.new-state is initialized with the implemented object’s initial state.
When inv occurs
    point := new opr, point.inv := inv
    repeat
        h := Head
        <point.new-state, point.response> := apply(inv, h.new-state)
    until compare&swap(Head, h, point) = success
    return point.response
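A minimal Python sketch of this construction follows, under several assumptions that are not in the slides: the class names are hypothetical, a lock simulates the compare&swap on Head, object states are immutable values, and `apply(inv, state)` is a user-supplied function returning the pair `(new_state, response)`.

```python
import threading

class Opr:
    """One operation record: invocation, resulting state, and response."""
    def __init__(self, inv=None, new_state=None, response=None):
        self.inv = inv
        self.new_state = new_state
        self.response = response

class LockFreeUniversal:
    def __init__(self, init_state, apply):
        self._apply = apply                       # apply(inv, state) -> (new_state, response)
        self._head = Opr(new_state=init_state)    # the anchor record
        self._cas_lock = threading.Lock()         # assumption: simulates an atomic CAS

    def _cas_head(self, old, new):
        with self._cas_lock:
            if self._head is old:
                self._head = new
                return True
            return False

    def invoke(self, inv):
        point = Opr(inv=inv)
        while True:
            h = self._head
            # Compute the outcome of the operation on top of the state seen in h.
            point.new_state, point.response = self._apply(inv, h.new_state)
            if self._cas_head(h, point):          # succeeds only if Head is still h
                return point.response

# Usage: a fetch-and-increment counter whose state is an int.
counter = LockFreeUniversal(0, lambda inv, s: (s + 1, s))
print(counter.invoke("inc"))  # 0
print(counter.invoke("inc"))  # 1
```

A failed compare&swap means some other operation was threaded in the meantime, which is why the construction is lock-free but an individual caller may retry indefinitely.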

A lock-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq       ; the operation’s sequence number (register)
    inv       ; the operation invocation, including its parameters (register)
    new-state ; the new state of the object, after applying the operation (register)
    response  ; the response of the operation, including its return value (register)
    after     ; a pointer to the next record (consensus object)
}
Figure: Head pointing into a list of opr records (seq, inv, new-state, response, after); the anchor record has seq=1, inv=null, new-state=init, response=null.

A lock-free universal algorithm using consensus (cont’d)
Initially all Head entries point to the anchor record.
When inv occurs
    point := new opr, point.inv := inv
    for j=0 to n-1 ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    repeat
        win := decide(Head[i].after, point) ; try to thread your operation
        win.seq := Head[i].seq+1
        <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
        Head[i] := win ; point to the following record
    until win = point
    return point.response
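The pseudocode above might be rendered in Python roughly as follows. This is a sketch under assumptions that go beyond the slides: the class names are hypothetical, the `Consensus` object is simulated with a lock, object states are immutable, `apply(inv, state)` is deterministic and returns `(new_state, response)`, and each of the n processes calls `invoke` with its own id `i`.

```python
import threading

class Consensus:
    """Simulated consensus object: the first proposed value wins."""
    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value

class Opr:
    def __init__(self, inv=None):
        self.seq = 0
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()      # decides which record follows this one

class LockFreeConsensusUniversal:
    def __init__(self, n, init_state, apply):
        self._n = n
        self._apply = apply
        anchor = Opr()
        anchor.seq = 1
        anchor.new_state = init_state
        self._head = [anchor] * n     # Head[0..n-1]

    def invoke(self, i, inv):
        head = self._head
        point = Opr(inv)
        for j in range(self._n):      # find a record with the maximum sequence number
            if head[j].seq > head[i].seq:
                head[i] = head[j]
        while True:
            win = head[i].after.decide(point)   # try to thread own operation
            win.seq = head[i].seq + 1
            win.new_state, win.response = self._apply(win.inv, head[i].new_state)
            head[i] = win                       # move to the following record
            if win is point:
                return point.response
```

Several processes may fill in the fields of the same winning record, but because `apply` is deterministic they all write identical values.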

A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq       ; the operation’s sequence number (register)
    inv       ; the operation invocation, including its parameters (register)
    new-state ; the new state of the object, after applying the operation (register)
    response  ; the response of the operation, including its return value (register)
    after     ; a pointer to the next record (consensus object)
}
We add a helping mechanism: an Announce array whose entries point to opr records (seq, inv, new-state, response, after).
When performing operation with sequence number j, try to help process (j mod n).

A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs
    Announce[i] := new opr, Announce[i].inv := inv, Announce[i].seq := 0
    for j=0 to n-1 ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq+1) mod n ; ID of process with priority
        if Announce[priority].seq = 0 ; if help is needed
            then point := Announce[priority] ; help the other process
            else point := Announce[i] ; perform own operation
        win := decide(Head[i].after, point)
        <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
        win.seq := Head[i].seq+1
        Head[i] := win
    return Announce[i].response
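The helping mechanism can be sketched in Python as below. As with any illustration of this kind, the details are assumptions rather than the lecture's code: class names are hypothetical, the `Consensus` object is simulated with a lock, states are immutable, `apply` is deterministic, and process ids are 0..n-1. One small deviation from the pseudocode: `Announce[priority]` is read once into a local variable, so the record that was checked is the record that gets proposed.

```python
import threading

class Consensus:
    """Simulated consensus object: the first proposed value wins."""
    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value

class Opr:
    def __init__(self, inv=None):
        self.seq = 0                  # 0 means "not threaded yet"
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()

class WaitFreeUniversal:
    def __init__(self, n, init_state, apply):
        self._n = n
        self._apply = apply
        anchor = Opr()
        anchor.seq = 1
        anchor.new_state = init_state
        self._head = [anchor] * n
        self._announce = [anchor] * n

    def invoke(self, i, inv):
        head, announce, n = self._head, self._announce, self._n
        announce[i] = Opr(inv)                     # announce own operation, seq = 0
        for j in range(n):                         # find the maximum sequence number
            if head[j].seq > head[i].seq:
                head[i] = head[j]
        while announce[i].seq == 0:
            priority = (head[i].seq + 1) % n       # id of the process with priority
            candidate = announce[priority]
            # Help the prioritized process if its operation is still pending.
            point = candidate if candidate.seq == 0 else announce[i]
            win = head[i].after.decide(point)
            win.new_state, win.response = self._apply(win.inv, head[i].new_state)
            win.seq = head[i].seq + 1              # written last: marks the record as done
            head[i] = win
        return announce[i].response
```

Because sequence number j gives priority to process (j mod n), a pending operation is proposed by every active process once the list reaches a position with its id, which is the intuition behind the wait-freedom claim.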

A proof that the universal algorithm using consensus is wait-free

A bounded-memory wait-free universal algorithm using consensus
What is the number of records needed by the algorithm? Unbounded!
The following algorithm uses a bounded number of records:
    Each process allocates records from its private pool.
    A record is recycled once we’re sure it will not be referenced anymore.
We don’t need this mechanism if we use a language with a GC (such as Java).

A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
    No process trying to thread record (k+n+1) or higher will write record k.
    After all the processes that thread records k…k+n terminate, record k can be freed.
    When process p finishes threading record m, it releases records m-1…m-n.
    After record k is released by the operations threading records k+1…k+n, it can be recycled.

A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq            ; the operation’s sequence number (register)
    inv            ; the operation invocation, including its parameters (register)
    new-state      ; the new state of the object, after applying the operation (register)
    response       ; the response of the operation, including its return value (register)
    after          ; a pointer to the next record (consensus object)
    before         ; a pointer to the previous record
    released[1..n] ; initially true
}
Figure: Head pointing into a doubly linked list of opr records (seq, inv, new-state, response, before, after) that ends at the anchor record.

A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs
    point := a free record from private pool, point.inv := inv, point.seq := 0
    for r:=1 to n do point.released[r] := false
    Announce[i] := point
    for j=0 to n-1 ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq+1) mod n ; ID of process with priority
        if Announce[priority].seq = 0 ; if help is needed
            then point := Announce[priority] ; help the other process
            else point := Announce[i] ; perform own operation
        win := decide(Head[i].after, point)
        <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
        win.before := Head[i]
        win.seq := Head[i].seq+1
        Head[i] := win
    temp := Announce[i].before
    for r:=1 to n do
        if temp <> anchor then
            before-temp := temp.before, temp.released[r] := true, temp := before-temp
    return Announce[i].response

How many records are required by the algorithm?
Each incomplete operation may waste n distinct records.
There may be up to n incomplete operations.
At any point in time, there are up to n^2 non-recyclable records.
All non-recyclable records may belong to the same process!
Each pool should have O(n^2) records; O(n^3) total records are needed.