
1 Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments
Mikel Larrea
Distributed Systems Group, University of the Basque Country, UPV/EHU

2 Context and Seminal Papers (UPV / EHU, Mikel Larrea, Mannheim, May 2011)
In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value
[FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985
[CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996
[CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996

3 Motivation

4 Motivation++ (Zurich, July 2010)

5 Crash Failure Detectors [CT96]

6 Strengthening Completeness

7 Guest Stars: P and Omega
P (the Eventually Perfect class, written ◇P in [CT96]): strong completeness, eventual strong accuracy
–Eventually every process that crashes is permanently suspected by every correct process
–There is a time after which correct processes are not suspected by any correct process
Omega satisfies the following property:
–There is a time after which all the correct processes always trust the same correct process
What is a correct process?
–It depends on the failure model :-)
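The two properties of P can be illustrated with a small timeout-based detector. This is a minimal sketch, not an algorithm from the talk: the class name, the logical clock, the heartbeat schedule and the "add 1 to the timeout after a false suspicion" rule are all assumptions made for illustration. A slow but correct process is suspected once and then trusted forever, while a crashed process ends up permanently suspected.

```python
class EventuallyPerfectDetector:
    """Suspicion list kept by one monitoring process (illustrative only)."""

    def __init__(self, peers, initial_timeout=2):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        self.last_heard[peer] = now
        if peer in self.suspected:
            # False suspicion: retract it and be more patient from now on.
            self.suspected.discard(peer)
            self.timeout[peer] += 1

    def tick(self, now):
        # Suspect every peer that has been silent for longer than its timeout.
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


def simulate():
    """Hypothetical run: q is correct but slow, r crashes after tick 5."""
    fd = EventuallyPerfectDetector(["q", "r"], initial_timeout=2)
    history = []
    for now in range(1, 21):
        if now % 4 == 0:      # q sends a heartbeat every 4 ticks
            fd.on_heartbeat("q", now)
        if now <= 5:          # r sends one per tick, then crashes
            fd.on_heartbeat("r", now)
        history.append((now, fd.tick(now)))
    return fd, history


fd, history = simulate()
print(history[2])    # tick 3: q is wrongly suspected
print(fd.suspected)  # at the end only the crashed process r is suspected
```

In this run the timeout for q grows once, after which q is never suspected again (eventual strong accuracy), and r stays suspected from tick 8 on (strong completeness).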

8 FD-based Consensus

9 Fault-tolerant Architecture

10 Outline
Part I: Crash Environments
–(Near-) Communication-efficient algorithms for P
–Communication-optimal algorithms for P
Part II: Crash-Recovery Environments
–Implementing Omega with/without stable storage
–Communication-efficient algorithms for Omega
–From Omega to P
–Fault-tolerant aggregator election and data aggregation in wireless sensor networks
Part III: Omission Environments
–Secure failure detection and consensus in TrustedPals
–Communication-efficient algorithm for P

11 Part I: P in Crash Environments
Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland

12 The First P Algorithm [CT96]

13 Part I. Summary of Results
Efficient implementations of P
–Nearly communication-efficient algorithms (n+C links are used forever)
  Q-based, transformations
–Communication-efficient algorithms (n links)
  Pure ring-based, optimizations
Optimal implementations of P
–Communication-optimal algorithms (C links)
  RBcast-based, one-to-one, one-to-all
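The link counts in this summary can be compared with a little arithmetic. The sketch below assumes that n is the total number of processes and C the number of correct ones; the slide does not define either symbol. The all-to-all baseline, where every correct process keeps sending to the other n-1 processes, is also an assumption added for comparison. The function name is hypothetical.

```python
def links_used_forever(n, correct):
    """Links that keep carrying messages forever, per algorithm family.

    n: total number of processes; correct: assumed meaning of C.
    """
    if not 0 < correct <= n:
        raise ValueError("need 0 < correct <= n")
    return {
        # Assumed baseline: each correct process heartbeats all others.
        "all-to-all baseline": correct * (n - 1),
        "nearly communication-efficient (n + C)": n + correct,
        "communication-efficient (n)": n,
        "communication-optimal (C)": correct,
    }


for name, links in links_used_forever(n=10, correct=7).items():
    print(f"{name}: {links}")
```

With 10 processes of which 7 are correct, the four rows come out as 63, 17, 10 and 7 links.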

14 Reliable Broadcast [CT96]
“All correct processes deliver the same set of messages”
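The quoted property can be sketched with a message-diffusion scheme in the spirit of [CT96]: the first time a process receives a message it relays it to everyone else and only then delivers it. The simulation below is an illustration under stated assumptions, not code from the talk: links are reliable, the network is a single FIFO queue, and the function name and the crash_after_sends parameter are hypothetical.

```python
from collections import deque


def reliable_broadcast(processes, sender, message, crash_after_sends=None):
    """Simulate diffusion-based reliable broadcast of one message.

    If crash_after_sends is an integer, the sender completes only that
    many point-to-point sends and then crashes; None means no crash.
    """
    delivered = {p: [] for p in processes}
    crashed = set()
    network = deque()

    # The sender sends to every other process first, then to itself.
    dests = [p for p in processes if p != sender] + [sender]
    if crash_after_sends is not None:
        dests = dests[:crash_after_sends]
        crashed.add(sender)
    for dest in dests:
        network.append((dest, message))

    while network:
        dest, msg = network.popleft()
        if dest in crashed or msg in delivered[dest]:
            continue  # crashed processes receive nothing; ignore duplicates
        # First receipt: relay to all other processes, then deliver.
        for other in processes:
            if other != dest:
                network.append((other, msg))
        delivered[dest].append(msg)
    return delivered, crashed


delivered, crashed = reliable_broadcast(
    ["p", "q", "r", "s"], sender="p", message="m", crash_after_sends=1
)
print(delivered)  # q, r and s all deliver "m" although p reached only q
```

Relaying before delivering is what makes the scheme tolerate a sender that crashes mid-broadcast: any process that delivers has already passed the message on.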

15 P in Crash Environments
[WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007
[LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008

16 Part II: Omega in Crash-Recovery Environments
Joint work with José Javier Astrain, Ernesto Jiménez, Cristian Martín, Iratxe Soraluze

17 Part II. Summary of Results
Redefinition of Omega
–Take into account unstable processes
–Take into account the availability of stable storage
Implementation of Omega
–With and without stable storage
–Efficient algorithms
From Omega to P
Fault-tolerant aggregator election and data aggregation in wireless sensor networks

18 From Omega to P

19 Part III: P in Omission Environments
Joint work with Roberto Cortiñas, Felix Freiling, Marjan Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze

20 Part III. Summary of Results
Reduction from Byzantine to omission
–Processes are equipped with tamper-proof security modules (e.g., smartcards)
  Actually, omission + buffering/timing attacks
Omission models
–send | receive | general
–permanent | transient
–non-selective | selective

21 Part III. Summary of Results
Impossibility result
–P is impossible to implement in the (transient) general omission model
Redefinition and implementation of P
–In-connected and out-connected processes
–All-to-all communication, sequence numbers, connectivity matrix
P-based Consensus
–Termination: every in-connected process eventually decides
–Adaptation of Chandra-Toueg’s algorithm
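How a connectivity matrix might be used to classify processes can be sketched as a reachability computation. The slide does not give the definitions, so the following is one plausible reading and an assumption throughout: a process is treated as in-connected if messages from some correct process reach it (possibly through relays), and as out-connected if its messages reach some correct process. The matrix layout, function names and example are all hypothetical.

```python
def reachable(matrix, start):
    """Indices whose owners receive start's messages, directly or via relays.

    matrix[i][j] is True when messages sent by i arrive at j.
    """
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, ok in enumerate(matrix[i]):
            if ok and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def classify(matrix, correct):
    """Return (in_connected, out_connected) under the assumed definitions."""
    n = len(matrix)
    reach = [reachable(matrix, i) for i in range(n)]
    in_connected = {p for p in range(n) if any(p in reach[q] for q in correct)}
    out_connected = {p for p in range(n) if any(q in reach[p] for q in correct)}
    return in_connected, out_connected


T, F = True, False
matrix = [
    [F, T, T, F],  # 0 (correct): its messages reach 1 and 2
    [T, F, F, F],  # 1 (correct): its messages reach 0
    [F, F, F, F],  # 2: send omissions, nothing it sends arrives
    [F, T, F, F],  # 3: receive omissions, but its messages reach 1
]
in_conn, out_conn = classify(matrix, correct={0, 1})
print(sorted(in_conn), sorted(out_conn))
```

In this example process 2 can still learn decisions but cannot influence them, while process 3 is in the opposite situation.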

22 Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments
Mikel Larrea
Distributed Systems Group, University of the Basque Country, UPV/EHU
Thank you!
mikel.larrea@ehu.es

