Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments
Mikel Larrea
Distributed Systems Group, University of the Basque Country, UPV/EHU

Mikel Larrea, Mannheim, May 2011

Context and Seminal Papers
In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value.
[FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985.
[CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996.
[CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996.
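The Consensus definition on this slide can be read as three checkable properties of a finished run: termination, agreement and validity. The sketch below is only an illustration of that reading, not code from the talk. The function name `check_consensus` and its three arguments are hypothetical, and irrevocability is modelled simply by recording at most one decision per process.

```python
from typing import Dict, Hashable, Set


def check_consensus(proposals: Dict[str, Hashable],
                    decisions: Dict[str, Hashable],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check one finished run against the consensus properties.

    proposals: value proposed by each process (hypothetical input format).
    decisions: the single, irrevocable decision of each process that decided.
    correct:   processes that are correct in this run.
    """
    # Values decided by correct processes only, as in the slide's wording.
    correct_decisions = {decisions[p] for p in correct if p in decisions}
    return {
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
        # Agreement: correct processes never decide different values.
        "agreement": len(correct_decisions) <= 1,
        # Validity: every decided value was proposed by some process.
        "validity": set(decisions.values()) <= set(proposals.values()),
    }
```

For example, with proposals `{"p1": "a", "p2": "b"}`, a run in which both correct processes decide `"a"` satisfies all three properties, while a run in which they decide `"a"` and `"b"` violates agreement.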

Motivation

Motivation++ (Zurich, July 2010)

Crash Failure Detectors [CT96]

Strengthening Completeness

Guest Stars: P and Omega
P: strong completeness, eventual strong accuracy
– Eventually every process that crashes is permanently suspected by every correct process
– There is a time after which correct processes are not suspected by any correct process
Omega satisfies the following property:
– There is a time after which all the correct processes always trust the same correct process
What is a correct process?
– It depends on the failure model :-)
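The two properties of P, and the way an Omega-style leader can be read off P's output in the crash model, can be sketched with heartbeats and timeouts that grow after each false suspicion. This is a minimal illustration under assumptions, not one of the algorithms presented in the talk: the class name, the integer logical clock and the `initial_timeout` parameter are all hypothetical, and the "smallest unsuspected identifier" rule is a standard reduction that assumes crash failures only.

```python
class EventuallyPerfectDetector:
    """Heartbeat-based sketch of P as seen by one process (crash model)."""

    def __init__(self, me, peers, initial_timeout=2):
        self.me = me
        # Per-peer timeout, increased whenever a suspicion proves wrong.
        self.timeout = {q: initial_timeout for q in peers}
        self.last_heard = {q: 0 for q in peers}
        self.suspected = set()

    def on_heartbeat(self, q, now):
        """Record a heartbeat from q received at logical time `now`."""
        self.last_heard[q] = now
        if q in self.suspected:
            # False suspicion: trust q again and be more patient next time,
            # which is what eventually yields strong accuracy.
            self.suspected.discard(q)
            self.timeout[q] += 1

    def tick(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)

    def leader(self):
        """Omega-style output: smallest process not currently suspected."""
        trusted = (set(self.timeout) - self.suspected) | {self.me}
        return min(trusted)
```

A crashed peer stops sending heartbeats, so it is eventually suspected forever (completeness). A slow but correct peer is suspected only finitely often if message delays are eventually bounded, because its timeout keeps growing.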

FD-based Consensus

Fault-tolerant Architecture

Outline
Part I: Crash Environments
– (Near-) Communication-efficient algorithms for P
– Communication-optimal algorithms for P
Part II: Crash-Recovery Environments
– Implementing Omega with/without stable storage
– Communication-efficient algorithms for Omega
– From Omega to P
– Fault-tolerant aggregator election and data aggregation in wireless sensor networks
Part III: Omission Environments
– Secure failure detection and consensus in TrustedPals
– Communication-efficient algorithm for P

UPV / EHU Part I: P in Crash Environments Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland

The First P Algorithm [CT96]

Part I. Summary of Results
Efficient implementations of P
– Nearly communication-efficient algorithms (n+C links are used forever)
  Q-based, transformations
– Communication-efficient algorithms (n links)
  Pure ring-based, optimizations
Optimal implementations of P
– Communication-optimal algorithms (C links)
  RBcast-based, one-to-one, one-to-all

Reliable Broadcast [CT96]
“All correct processes deliver the same set of messages”
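The property quoted on this slide is usually obtained by relaying: a process that receives a message for the first time forwards it to everyone before delivering it. The simulation below is a sketch of that idea over reliable links, not the code used in the talk. The function name, the in-memory `network` queue and the `crash_after` knob (the sender crashes after reaching that many destinations) are hypothetical and exist only to make the example self-contained.

```python
from collections import deque


def reliable_broadcast(processes, sender, message, crash_after=None):
    """Simulate relay-based reliable broadcast over reliable links.

    crash_after: hypothetical knob; the sender crashes after sending to this
    many destinations (None means the sender never crashes).
    Returns {process: delivered messages} for the processes that survive.
    """
    crashed = set()
    delivered = {p: set() for p in processes}
    network = deque()  # in-flight (destination, message) pairs

    def send_to_all(src, limit=None):
        others = [q for q in processes if q != src]
        for q in others[:limit]:
            network.append((q, message))

    # The sender sends to everyone (possibly crashing midway), then delivers.
    send_to_all(sender, crash_after)
    if crash_after is None:
        delivered[sender].add(message)
    else:
        crashed.add(sender)

    while network:
        dest, msg = network.popleft()
        if dest in crashed or msg in delivered[dest]:
            continue  # crashed processes and duplicates are ignored
        send_to_all(dest)         # relay first ...
        delivered[dest].add(msg)  # ... then deliver
    return {p: delivered[p] for p in processes if p not in crashed}
```

Even if the sender reaches a single process before crashing, that process relays the message, so every surviving process ends up with the same delivered set. If the sender crashes before reaching anyone, all surviving processes consistently deliver nothing.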

P in Crash Environments
[WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007.
[LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008.

Part II: Omega in Crash-Recovery Environments
Joint work with José Javier Astrain, Ernesto Jiménez, Cristian Martín, Iratxe Soraluze

Part II. Summary of Results
Redefinition of Omega
– Take into account unstable processes
– Take into account the availability of stable storage
Implementation of Omega
– With and without stable storage
– Efficient algorithms
From Omega to P
Fault-tolerant aggregator election and data aggregation in wireless sensor networks

From Omega to P

Part III: P in Omission Environments
Joint work with Roberto Cortiñas, Felix Freiling, Marjan Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze

Part III. Summary of Results
Reduction from Byzantine to omission
– Processes are equipped with tamper-proof security modules (e.g., smartcards)
  Actually, omission + buffering/timing attacks
Omission models
– send | receive | general
– permanent | transient
– non-selective | selective

Part III. Summary of Results
Impossibility result
– P is impossible to implement in the (transient) general omission model
Redefinition and implementation of P
– In-connected and out-connected processes
– All-to-all communication, sequence numbers, connectivity matrix
P-based Consensus
– Termination: every in-connected process eventually decides
– Adaptation of Chandra-Toueg’s algorithm
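The slide does not define in-connected and out-connected processes, so the sketch below rests on an assumed reading: a process is in-connected if it is correct or reliably receives from some in-connected process, and out-connected if it is correct or reliably sends to some out-connected process. Under that assumption, both sets can be computed from a connectivity matrix by a simple fixed-point iteration. The function name and the boolean matrix `reliable`, where `reliable[i][j]` means that messages from `i` to `j` are eventually received, are hypothetical.

```python
def connected_sets(reliable, correct):
    """Compute (in_connected, out_connected) from a connectivity matrix.

    reliable[i][j] is True if messages sent by i to j are eventually
    received (assumed meaning of the matrix, not taken from the slides).
    correct is the set of indices of correct processes.
    """
    n = len(reliable)
    in_conn, out_conn = set(correct), set(correct)
    changed = True
    while changed:  # iterate until no process can be added to either set
        changed = False
        for p in range(n):
            # p hears reliably from a process that is already in-connected.
            if p not in in_conn and any(reliable[q][p] for q in in_conn):
                in_conn.add(p)
                changed = True
            # p reliably reaches a process that is already out-connected.
            if p not in out_conn and any(reliable[p][q] for q in out_conn):
                out_conn.add(p)
                changed = True
    return in_conn, out_conn
```

With four processes where 0 and 1 are correct, process 2 only receives (from 0) and process 3 only sends (to 1), this yields `{0, 1, 2}` as in-connected and `{0, 1, 3}` as out-connected, which matches the intuition behind the termination property stated above for in-connected processes.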

Thank you!
Mikel Larrea, mikel.larrea@ehu.es
