Failure detectors, shared memory/message passing Hugues Fauconnier LIAFA, Université Denis Diderot.

Outline  Introduction Objectives and context  Objects and shared memory Shared memory, linearizability Wait-free implementation Universality of consensus  Message passing Failure detectors Shared memory implementation Shared object implementation Consensus hierarchy and failure detectors  Conclusion(s)

Introduction and context  Possible vs. impossible (FLP)  Shared memory vs. communication by message exchange  Shared objects: comparison and hierarchy:  is a test-and-set more powerful than a compare-and-swap?  Towards transactions

Introduction…  Failure detectors: minimal failure detector and comparison (necessary and sufficient knowledge about failures), hierarchy of problems: Consensus  agreement on a value; Registers; Mutual exclusion; The weakest of the weakest  k-set consensus (agreement on at most k values)

Shared memory  Set of processes p1, …, pn (process = sequential thread)  Processes are asynchronous: a step can take an arbitrary (finite) time  Processes communicate through shared data structures (objects), for example: shared memory, test-and-set, queue, ...

Objects:  an object is defined by its type, e.g.: the type of R is atomic register  the type of the object defines a set of possible states and a set of primitive operations, e.g.: the state of the register is the value stored, the primitives are read() and write(v)  processes access objects through primitive operations

Objects:  we consider here only atomic objects a sequential specification defines the behavior of the object (a transition system) linearizability (=atomicity)  operations of concurrent processes may overlap, but each operation appears to take effect instantaneously between its invocation and its response: the operation appears to be atomic crashes:  if a process crashes between an invocation and the corresponding response the operation completes or aborts  every invocation by correct processes terminates

Example: atomic register  States : the value stored (  initially)  Operations: read() and write(v)  Sequential specification: read() returns the value stored write(v) changes the state of the register (the new state is v)  Linearizability: each time interval between a request / answer of an operation can be reduced to a point such that the history of read/write satisfies the specification

Atomic register  With only one writer, linearizability is here equivalent to: a read returns the last value written  if a read is concurrent with a write, the read returns either the previously written value or the value of a concurrent write()  if a read operation r precedes another read operation r', then r' cannot return a value written before the one returned by r  This can be generalized to multi-writer atomic registers

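Linearizable [Figure: two histories on a register, each with the operations Write 1 and Write 0 and one read; the read returns 0 in the first history and 1 in the second. Both histories are linearizable.]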

Linearizable? [Figure: a history with Write 1 on a register holding 0, in which Read 1 is followed by Read 0: impossible.]
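The single-writer characterization of linearizability can be checked mechanically on small histories. The following is a minimal brute-force sketch, not taken from the slides: the tuple layout `(kind, value, invoke_time, response_time)`, the function name `is_linearizable` and the initial value 0 are assumptions made for illustration.

```python
from itertools import permutations

def is_linearizable(history, initial=0):
    """Brute-force linearizability check for a read/write register.

    history: list of (kind, value, invoke_time, response_time) tuples,
    where kind is "read" or "write". For a read, value is the value returned.
    The initial register value is an assumption of this sketch.
    """
    n = len(history)
    for order in permutations(range(n)):
        position = {op: i for i, op in enumerate(order)}
        # Real-time order: if a responds before b is invoked, a must come first.
        respects_time = all(
            not (history[a][3] < history[b][2] and position[a] > position[b])
            for a in range(n) for b in range(n)
        )
        if not respects_time:
            continue
        # Replay the candidate order against the sequential specification.
        state, ok = initial, True
        for i in order:
            kind, value, _, _ = history[i]
            if kind == "write":
                state = value
            elif value != state:
                ok = False
                break
        if ok:
            return True
    return False
```

A read concurrent with a write may return either the old or the new value, but two sequential reads may not go "back in time":

```python
concurrent_old = [("write", 1, 0, 10), ("read", 0, 2, 4)]
inversion = [("write", 1, 0, 10), ("read", 1, 1, 3), ("read", 0, 4, 6)]
print(is_linearizable(concurrent_old), is_linearizable(inversion))
```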

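Another example  consensus: sequential specification [Figure: transition system. From the initial state, propose(0) leads to a state in which every propose(*) returns decide(0), and propose(1) leads to a state in which every propose(*) returns decide(1).]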

Another example  RMW (read-modify-write): RMW(r: register, f: function) returns value { previous := r; r := f(r); return previous }  from RMW we get test-and-set, swap, compare-and-swap.

Implementation  Given some objects O1, …, Om and processes p1, …, pn, is it possible to implement another object O? Wait-free implementation:  the implementation is correct (in an intuitive sense)  every invocation from correct processes terminates  moreover a correct process can always terminate its invocation with only its own steps (on objects O1, …, Om)

Wait-free  Wait-free implementation: as each process can always finish the work alone, a wait-free implementation tolerates any number of crash failures: a very strong requirement!

Wait-free implementations  Consider k-consensus (i.e. consensus between k processes)  Let the consensus number for object X be the largest k such that k-consensus can be implemented with X and atomic registers  (clearly if consensus number for O is strictly greater than consensus number for O', there is no implementation for O using only O')

Wait-free implementations  Results: registers have consensus number 1 (FLP); test-and-set has consensus number 2; … for each n there are objects with consensus number n

Example  FIFO queue: decide(v) returns val prefer[P]:=v if deq(q) =  then return prefer[P] else return prefer[Q] With FIFO and registers it is possible to get 2-consensus but not 3-consensus
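A runnable sketch of 2-consensus from a FIFO queue and registers follows. The slide does not show how the queue is initialized or which value the dequeue is compared with; this sketch assumes the classical construction in which the queue initially holds a "WIN" token followed by a "LOSE" token. The names `QueueConsensus`, `WIN` and `LOSE` are illustrative, and the lock only simulates the atomicity of deq.

```python
import threading
from collections import deque

class QueueConsensus:
    """2-process consensus from one FIFO queue and two registers (sketch)."""

    def __init__(self):
        # Assumed initial content: the first dequeuer gets "WIN".
        self._queue = deque(["WIN", "LOSE"])
        self._lock = threading.Lock()
        self._prefer = [None, None]  # one register per process (ids 0 and 1)

    def _deq(self):
        with self._lock:  # simulates an atomic dequeue
            return self._queue.popleft()

    def decide(self, pid, v):
        self._prefer[pid] = v          # announce own proposal first
        if self._deq() == "WIN":       # first to dequeue wins
            return self._prefer[pid]
        return self._prefer[1 - pid]   # loser adopts the winner's proposal
```

The loser can safely read the other process's register because the winner wrote its proposal before dequeuing.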

Results  Universality of consensus (Herlihy): n-consensus is universal in a system of n processes: every object shared by n processes can be (wait-free) implemented with n-consensus and registers  (principle of the proof: with the help of n-consensus, processes agree on the history of the object)

Plan  Objects shared memory model linearizability wait-free implementation Main results: universality of consensus  Message passing failure detectors shared memory implementation object implementation Consensus Hierarchy with failure detectors  Conclusion

Message passing  The previous results prove that generally (at least) objects with consensus number >1 cannot be implemented with only registers  Instead of sharing data structures it is interesting to consider message passing models message passing: processes don't share data but can send and receive messages (Note that message passing could be defined in the previous general framework– communication channels are then the shared data structures)

Message passing model  Processes communicate by messages  Communication is asynchronous (no bound on communication delays)  Communication is point-to-point and reliable  Processes can fail by crashing  Message passing models are suitable and natural for networks  (shared objects models are more suitable for hardware)

Message passing  In message passing it is interesting to implement objects: objects are easier to work with, and some objects are natural in message passing models (e.g. registers, consensus)

Atomic register: practical point of view  Data server  Ensure safety properties If a value is written it is available (even if the writer disappears) When a process ends its write() then all next read() will return this value (or a value written later) –note that the writer knows when the write ends

Shared register implementation  With only one reader and one writer and a majority of correct processes (sketch): for the k-th write  to write(v): the writer sends (v,k) to all processes and waits to receive an "ack" from a majority of processes.  to read(): the reader queries all processes and waits to receive an answer (v,k) from a majority of processes; the value read is the value with the greatest k  when a process receives (v,k) from the writer it stores (v,k) and then sends an "ack" to the writer  when a process receives a query from the reader it answers with the stored (v,k).
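The protocol above can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions: message exchange is replaced by direct method calls, a crashed process simply never answers, and the names `Replica`, `Writer` and `read` are hypothetical. A real writer or reader would block until a majority answers; here a missing majority raises an error instead.

```python
class Replica:
    """One process storing the latest (value, k) pair it has received."""

    def __init__(self):
        self.stored = (None, 0)
        self.crashed = False

    def on_write(self, v, k):
        if self.crashed:
            return None  # a crashed process never answers
        # Keeping only newer pairs is a safeguard added in this sketch.
        if k > self.stored[1]:
            self.stored = (v, k)
        return "ack"

    def on_query(self):
        return None if self.crashed else self.stored

class Writer:
    def __init__(self, replicas):
        self.replicas = replicas
        self.k = 0  # sequence number of the current write

    def write(self, v):
        self.k += 1
        acks = sum(1 for r in self.replicas if r.on_write(v, self.k) == "ack")
        if acks <= len(self.replicas) // 2:
            raise RuntimeError("no majority of acks")

def read(replicas):
    answers = [a for a in (r.on_query() for r in replicas) if a is not None]
    if len(answers) <= len(replicas) // 2:
        raise RuntimeError("no majority of answers")
    # Return the value carrying the greatest sequence number k.
    return max(answers, key=lambda pair: pair[1])[0]
```

With five replicas, two crashes are tolerated; a third crash leaves no majority and both operations would wait forever in the real protocol.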

It works…  because: by the majority assumption there is always at least one process that participates in both the last write and the read.  then the read returns the last written value  (but this implementation is not really atomic: if the writer crashes during a write, subsequent reads could return either the previous value or the new one. It is not very difficult to fix: the reader always returns the value with the maximal timestamp it has seen so far)  (some classical algorithms make it possible to implement general atomic registers from atomic registers with one reader and one writer)

Implementation issues  in message passing there is no implementation of consensus (even if at most one process can crash)  the implementation of registers needs to have a majority of correct processes

Then … failure detectors  The impossibility results come from crashes (without failures all these problems are easy to solve).  Then: add oracles giving (possibly unreliable) information about crashes. What information about crashes of processes makes it possible to solve the problem? What information about crashes is needed?

Failure detectors  distributed "oracle" F: at each time t a process can ask the failure detector and gets an answer  (generally the answer is a list of processes suspected to be dead)  the output is not the same at each process the output of failure detector F depends only on the history of crashes (not on the states of processes).  Example: perfect failure detector output: lists of suspected processes  if p is in the list for q then p is crashed  if p is crashed then p will eventually belong to the list of suspected processes of q
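As an illustration of an output that depends only on the history of crashes, here is a toy model of a perfect failure detector. The function name `perfect_failure_detector`, the `crash_time` map and the per-process detection `delay` are assumptions of this sketch; it models the oracle's specification and is not an implementation in an asynchronous system.

```python
def perfect_failure_detector(crash_time, delay):
    """Build the output function of a toy perfect failure detector.

    crash_time: maps each faulty process to the time at which it crashes
                (correct processes are absent from the map).
    delay:      maps each observer q to a finite, non-negative detection delay.
    """
    def output(q, t):
        # q suspects p only once p has crashed (accuracy), and does so
        # forever after crash_time[p] + delay[q] (completeness).
        return {p for p, c in crash_time.items() if c + delay[q] <= t}
    return output
```

Different observers may see different lists at the same time, but no list ever contains a process that has not crashed:

```python
fd = perfect_failure_detector({"p1": 3}, {"p2": 2, "p3": 5})
print(fd("p2", 5), fd("p3", 5))
```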

Failure detector comparison  Reduction: Failure detector F is weaker than failure detector F' (F≤F') if F can be implemented from F' ≤ defines a partial order

Minimal Failure Detector  Given a problem P, F is a minimal failure detector for P if and only if: with the help of F, P can be solved; and if F' makes it possible to solve P, then F ≤ F'  Then if F is a minimal failure detector for P: F encapsulates the information about crashes needed to solve P

Minimal Failure Detector  Why look for the minimal failure detector? find the needed information about crashes compare problems: if the minimal failure detector for P is weaker than the minimal failure detector for P' then P is easier than P' (from a practical point of view the knowledge of the minimal failure detector helps to find the assumptions on the underlying system to solve the problem)

Then to implement objects:  In message passing, for each object O find the minimal failure detector to implement O; from the comparison between these failure detectors we get a hierarchy on these objects  Then we get 2 hierarchies on objects: the consensus number as defined before, and the minimal failure detector needed for the object

S-register  Begin with registers (consensus number = 1). An S-register is an atomic register that only processes in S can read or write (but all processes may participate in its implementation)

Weakest failure detector  with a majority of correct processes atomic registers can be implemented without failure detector  but without a majority of correct processes? Failure detector Σ

Failure detector Σ_S  Σ_S(p,t) (output for process p of failure detector Σ_S at time t) is a list of trusted processes. (q ∈ Σ_S(p,t) means that p considers that q is not dead at time t)  Intersection: for all processes p, q in S and for all times t, t': Σ_S(p,t) ∩ Σ_S(q,t') is not empty (at least one process is trusted by both p and q)  Completeness: there is a time t such that for each correct process p in S and for each time t' > t, Σ_S(p,t') contains only correct processes

Remarks  with a majority of correct processes Σ_S can be implemented in asynchronous systems.  Σ_S gives a kind of quorum system (a quorum system is a family of sets such that two elements of the family always have a non-empty intersection).
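The reason a majority of correct processes is enough is that any two majorities intersect, so the family of majorities is a quorum system. A short check of this fact, with illustrative helper names `majorities` and `pairwise_intersect`:

```python
from itertools import combinations

def majorities(processes):
    """All subsets containing strictly more than half of the processes."""
    processes = list(processes)
    n = len(processes)
    return [set(c)
            for size in range(n // 2 + 1, n + 1)
            for c in combinations(processes, size)]

def pairwise_intersect(family):
    """True if every two sets of the family share at least one element."""
    return all(a & b for a in family for b in family)
```

For comparison, the family of all subsets of size exactly n/2 (for even n) is not a quorum system, since a set and its complement are disjoint.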

Theorem  Σ_S is the weakest failure detector to implement an S-register: sufficient part: adapt the previous algorithm; necessary part: more difficult…

S-Consensus  S is a set of processes  S-consensus: processes in S propose values and have to (irrevocably) decide. The decision has to ensure:  Validity: the decision value has been proposed  Agreement: if p and q decide, they decide the same value  Termination: every correct process eventually decides
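The three properties can be stated as a small checker over the outcome of one finished run. This is a sketch for illustration only: the function name `check_consensus` and the dictionaries `proposals`, `decisions` and the set `correct` are hypothetical, and termination can only be checked here in the sense that every correct process has decided by the end of the observed run.

```python
def check_consensus(proposals, decisions, correct):
    """Check validity, agreement and termination on one finished run.

    proposals: maps each process in S to the value it proposed.
    decisions: maps each process that decided to its decision value.
    correct:   the set of processes in S that did not crash.
    """
    validity = all(d in proposals.values() for d in decisions.values())
    agreement = len(set(decisions.values())) <= 1
    termination = all(p in decisions for p in correct)
    return {"validity": validity,
            "agreement": agreement,
            "termination": termination}
```

Note that agreement also constrains processes that decide and then crash: they appear in `decisions` even if they are not in `correct`.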

Ω_S  Ω_S(p,t) (output for p of failure detector Ω_S at time t) is a process (the leader)  Eventual leader election: there is a time t and a correct process l such that for every correct process p in S and for all times t' > t, Ω_S(p,t') = l  intuitively: after some time all processes agree on the same leader forever

Theorem  Σ_S * Ω_S is the weakest failure detector for S-consensus. (Σ_S * Ω_S outputs both Σ_S and Ω_S)

For the proof  (necessary condition) Adaptation of the proof of Chandra, Hadzilacos and Toueg: from an S-consensus algorithm using a failure detector, implement Ω_S. With reliable broadcast and S-consensus, implement an S-register (then use the previous theorem)

For the proof: sufficient condition. Code of a process in S:
  forever
    C := 1 + r mod n
    send (Coord, v, r) to C
    wait until (One, *, r) is received from C or C is suspected by Ω_S
    if received (One, w, r) then FromCoord := w else FromCoord := undef
    send (Keep, FromCoord, r) to all
    wait until (Two, *, r) is received from all processes in Σ_S
    if only one value v was received then
      decide this value v
      send (decide, v) to all
      stop
    else if only 2 values (w and undef) were received then v := w

Code of all processes  When (Coord,*,k) is received for the first time (let (Coord,x,k) be this message): send (One,x,k) to all processes in S  When (Keep,*,k) is received for the first time (let (Keep,x,k) be this message): send (Two,x,k) to all processes in S

k-consensus  k-consensus = consensus between any subset of k processes  Result: for 2 ≤ k ≤ n, the weakest failure detector for k-consensus is Σ*Ω

Proof (idea):  consider the case k=2  From the previous results: the weakest failure detector for 2-consensus is the set of Σ_S * Ω_S for all subsets S with 2 elements

Proof  From these Σ_S (S ranging over the subsets with two elements) atomic registers can be implemented, then we get Σ  From these Ω_S (S ranging over the subsets with two elements) it is possible to implement Ω: let G=(X,E) be the graph where X is the set of processes, and (p,q) ∈ E if there is x such that q is an eventual leader for Ω_{p,x}. Consider the strongly connected components of G: there is a unique sink connected component and this sink contains (eventually) only correct processes.

[Figure: the graph G; an edge from p to q means that p has q as leader; the sink component is highlighted.]

Proof (sketch)  From this we deduce an algorithm for Ω: all processes approximate this graph and compute the sink; the output of the emulated failure detector is this sink. Eventually, this sink contains only correct processes (then extract the same leader from this sink). Then we get Ω
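The "compute the sink" step can be illustrated on a fixed graph. The sketch below finds the sink strongly connected components by plain reachability, which is adequate for a handful of processes; the function names and the deterministic rule "take the smallest process identifier in the sink" are assumptions of this sketch, not details given on the slides.

```python
def reachable(graph, start):
    """Set of nodes reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def sink_components(graph):
    """Strongly connected components with no edge leaving them."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {u: reachable(graph, u) for u in nodes}
    sinks = set()
    for u in nodes:
        # The component of u: nodes v with u -> v and v -> u.
        component = frozenset(v for v in reach[u] if u in reach[v])
        # It is a sink when nothing outside the component is reachable from u.
        if reach[u] == component:
            sinks.add(component)
    return sinks

def extract_leader(graph):
    """Pick the same leader everywhere: smallest identifier in the sink."""
    (sink,) = sink_components(graph)  # assumes the sink is unique
    return min(sink)
```

In the emulation each process would recompute this on its current approximation of the graph, so the outputs only need to agree eventually.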

Corollary  If the consensus number of atomic object T is 2, then:  The weakest failure detector for T is Σ*Ω  Every failure detector implementing T implements any object.  (in other words, T is universal for all n)

Corollary  Concerning message passing models with failure detectors, there are only two classes of objects: no consensus, k=1 (atomic registers, Σ); k>1, then consensus for every n (Σ*Ω)

Conclusion  In shared memory, objects are given (by hardware) and can be compared by consensus number, for example: no implementation of compare-&-swap with test-&-set and registers  In message passing with failure detectors, objects are implemented and there are (essentially) two classes (with consensus and without consensus): all objects with consensus number > 1 are equivalent!  Implementation with shared objects and implementation in message passing with failure detectors are not the same!

Conclusion…  Transactional memory: abortable objects; making sequences of code atomic  The failure detector hierarchy: shared memory vs. message passing  k-set agreement: the weakest of the weakest?
