Linear Time Byzantine Self- Stabilizing Clock Synchronization Ariel Daliot 1, Danny Dolev 1, Hanna Parnas 2 1 School of Engineering and Computer Science,

Presentation transcript:

Linear Time Byzantine Self-Stabilizing Clock Synchronization
Ariel Daliot (1), Danny Dolev (1), Hanna Parnas (2)
(1) School of Engineering and Computer Science, (2) Department of Neurobiology and the Otto Loewi Center for Cellular and Molecular Neurobiology, The Hebrew University of Jerusalem, Israel
This research is supported in part by Intel COMM Grant - Internet Network/Transport Layer & QoS Environment (IXA)

Lecture Outline
What is “Pulse Synchronization”?
Examples of pulse synchronization in nature
A biologically inspired pulse synchronization algorithm for distributed computer networks
Efficient Byzantine Self-Stabilizing clock synchronization above pulse synchronization

The target is to synchronize pulses from any state and any faults
[Figure: pulse trains of the nodes plotted against time t, moving from an arbitrary state to a synchronized state in which the pulses recur once per cycle]

Self-Stabilizing “Pulse Synchronization”
Convergence: Starting from an arbitrary state s, the system reaches a synchronized state in finite time
Closure: If s is a synchronized state of the system at real time t_0, then for every real time t ≥ t_0:
1. The system state at time t is a synchronized state
2. «Linear Envelope»: for every correct node p, a[t − t_0] + b ≤ ψ_p(t_0, t) ≤ g[t − t_0] + h
ψ_p(t_1, t_2) is the number of pulses a correct node p invoked during a real-time interval [t_1, t_2] within which p was continuously correct
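The linear envelope condition can be checked mechanically on a recorded pulse trace. The following is a minimal sketch, not part of the original slides: the function names, the sample trace and the envelope constants a, b, g, h are all hypothetical values chosen for illustration.

```python
from bisect import bisect_left, bisect_right

def pulses_in_interval(pulse_times, t1, t2):
    """psi_p(t1, t2): number of pulses invoked in the closed interval [t1, t2].

    pulse_times must be sorted in increasing order.
    """
    return bisect_right(pulse_times, t2) - bisect_left(pulse_times, t1)

def within_linear_envelope(pulse_times, t0, t, a, b, g, h):
    """Check a*(t - t0) + b <= psi_p(t0, t) <= g*(t - t0) + h."""
    count = pulses_in_interval(pulse_times, t0, t)
    elapsed = t - t0
    return a * elapsed + b <= count <= g * elapsed + h

# Hypothetical trace: one pulse every 10 time units, starting at t = 0.
trace = [10.0 * k for k in range(20)]

# Hypothetical envelope: between 0.05 and 0.2 pulses per time unit,
# with one pulse of slack on either side.
ok = within_linear_envelope(trace, t0=0.0, t=100.0, a=0.05, b=-1.0, g=0.2, h=1.0)
print(ok)
```

The lower bound rules out a node that stops pulsing, and the upper bound rules out a node that pulses arbitrarily fast, which is how the envelope limits the pulse frequency.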

“Pulse Synchronization”, in distributed computer systems The computers are required to: Invoke regular pulses in tight synchrony Synchronize from ANY state (self-stabilization) Limit the pulse frequency Tolerate a number of node and network faults Examples of other synchronization problems: “Firing Squad”, “Clock Synchronization”, etc.

Fault Models
Many problems are trivial with no faults; some are unsolvable with a single fault (e.g. Byzantine Generals)
Common fault models:
Crash/Link/Message faults
Byzantine failures (“malicious” faults)
–Usually proven to require n>3f to tolerate f faults
–Not solvable for some problems
Transient faults (system in arbitrary state or total chaos)
–Requires Self-Stabilizing algorithms in order to overcome
–Not solvable for some problems (Clock Synchronization)

Self-Stabilization Addresses the situation when ALL nodes can concurrently be faulty for a limited period of time A SS algorithm realizes its task once the system is back within the assumption boundaries Is orthogonal to Byzantine failures, i.e. these are uncorrelated fault models Byzantine algorithms typically focus on limiting the influence of faulty nodes once the task has been realized Self-stabilizing algorithms focus on realizing the task following a “catastrophic” state

Synchrony phenomena in biology The phenomenon of synchronization is displayed by many biological systems –Synchronized flashing of the male malaccae fireflies –Oscillations of the neurons in the circadian pacemaker, determining the day-night rhythm –Crickets that chirp in unison –Coordinated mass spawning in corals –Audience clapping together after a “good” performance We were inspired by the pacemaker network in the cardiac ganglion of lobsters

Synchrony phenomena in biology Synchronization typically attained in spite of: –Inherent difference among the biological elements –High levels of noise from external sources or from the biological elements Displays high degree of “Self-Stabilization”

Cardiac ganglion of the lobster (Sivan, Dolev & Parnas, 2000)
Four interneurons tightly synchronize their pulses in order to give the heart its optimal pulse rate (though one is enough for activation)
Able to adjust the synchronized firing pace, up to a certain bound (e.g. while escaping a predator)
[Figure: spike trains of the motor neurons]

Cardiac ganglion of the lobster (Sivan, Dolev & Parnas, 2000) Must not fire out of synchrony for prolonged times in spite of –Noise –Single neuron death –Inherent variations in the firing rate –Firing frequency regulating Neurohormones –Temperature change The vitality of the cardiac ganglion suggests it has evolved to be optimized for –Fault tolerance –Re-synchronization from any state (“self-stabilization”) –Tight synchronization –Fast re-synchronization

We present a Byzantine self-stabilizing pulse synchronization algorithm which works similarly to the cardiac pacemaker
Input to the algorithm: n, f (f < n/3) and cycle
Output of the algorithm: tells the node when to invoke a pulse
The algorithm is proven to solve the pulse synchronization problem

The neurobiological principles «borrowed»
Endogenous Periodic Spiking
Time-dependent Refractory Function, R
Summation
Nodes communicate with Messages, which are then summed to generate the Counter that is sent in the message and compared to the Refractory Function

“Pulse Synchronization” Algorithm

Algorithm Analysis and Complexity
[Figure: pulse trains over time t converging from an arbitrary initial state to synchronized pulses. Annotations: arbitrary initial state; tightness = d (near optimal); f < n/3; log(n) cycles; ½-1 cycle]

Algorithm Complexity
Tolerates f concurrent Byzantine faults, f < n/3
Self-stabilizing: synchronizes from an arbitrary state
Faults can cause the synchronized pulse frequency to be up to twice the original frequency
Synchronization is VERY tight, d time units; theoretical bound is …
–The best non-stabilizing Byzantine CS reaches d time units
–Self-Stabilizing Byzantine CS reaches (n-f)·d
Time complexity is O(log n)
–Best Self-stabilizing Byzantine Digital Clock Synchronization converges in expected (n-f)·n^(6(n-f)) pulses
–Non-stabilizing Byzantine Clock Synchronization uses O(f) to O(n^2) time

A related problem – real-time-Clock Synchronization (rCS)
There exist γ, t_0, ν, a and b such that for all t ≥ t_0:
Agreement (precision). For any correct nodes p, q: |C_p(t) − C_q(t)| ≤ γ
Validity (accuracy). For every correct node p: (1+ν)^(-1)·t + a ≤ C_p(t) ≤ (1+ν)·t + b
Optimal precision is d·(1 − 1/n)
Optimal accuracy is ν = …
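The two rCS conditions translate directly into predicates over the clock readings of the correct nodes at one real time t. This is a small sketch for illustration only; the function names, the sample readings and the constants γ, ν, a, b are hypothetical.

```python
def agreement_holds(clocks, gamma):
    """Agreement (precision): all correct clocks lie within gamma of each other."""
    return max(clocks) - min(clocks) <= gamma

def validity_holds(clock, t, nu, a, b):
    """Validity (accuracy): t / (1 + nu) + a <= C_p(t) <= (1 + nu) * t + b."""
    return t / (1.0 + nu) + a <= clock <= (1.0 + nu) * t + b

# Hypothetical readings of three correct nodes at real time t = 1000.
t = 1000.0
correct_clocks = [999.2, 1000.1, 1000.6]

precise = agreement_holds(correct_clocks, gamma=2.0)
accurate = all(validity_holds(c, t, nu=0.001, a=-1.0, b=1.0) for c in correct_clocks)
print(precise, accurate)
```

Agreement compares clocks only with each other, while validity compares each clock with real time, which is why a set of clocks can be precise without being accurate.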

real-time-Clock Synchronization rCS has two additional constraints over pulse synchronization: –The pulses have labels (“the time”) –The time needs to approximate real time Most Byzantine rCS use the following principles: –At every time the computers exchange clock values –They operate some function on the received values (which seeks to neutralize the effect of the Byzantine values and set the clocks close to each other)

Clock Synchronization simple example… All computers exchange clocks at time X They must reach time X at roughly the same time Assume some computer receives the values: [435, 2, 2, 3, 2, 935, 3, 3, 2, 2, 6984] Throw away the values that look bad Do some statistical function on what is left and set the clock accordingly It can be proven this leaves all the computers with very close clock values that are good estimates of real time
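The "throw away the bad values, then apply a statistical function" step can be sketched as follows. The slide does not say which function is used; this example assumes a fault-tolerant midpoint (discard the f lowest and f highest values and take the midpoint of what remains), which is one common choice. The function name and the value f = 3 are assumptions for illustration.

```python
def fault_tolerant_midpoint(values, f):
    """Discard the f smallest and f largest values, return the midpoint of the rest.

    Assumes at most f of the received values come from Byzantine nodes
    and that n > 3f values were received.
    """
    n = len(values)
    if n <= 3 * f:
        raise ValueError("need n > 3f values to tolerate f faults")
    trimmed = sorted(values)[f:n - f]
    return (trimmed[0] + trimmed[-1]) / 2

# The multiset from the slide: 11 values, so up to f = 3 may be faulty.
received = [435, 2, 2, 3, 2, 935, 3, 3, 2, 2, 6984]
new_clock = fault_tolerant_midpoint(received, f=3)
print(new_clock)
```

Trimming f values from each end guarantees that every surviving value lies within the range of values sent by correct nodes, so the outliers 435, 935 and 6984 cannot pull the result away.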

real-time-Clock Synchronization impossibility result with no external time source => This works only if the clocks initially have close values => Which implies rCS cannot be solved when all clocks hold arbitrary times =>Which means there is no self-stabilizing algorithm for rCS I.e. if the clocks are initially far apart they cannot both synchronize AND estimate real time => Internal rCS assume clocks are initially synched

FAB8 AIT WSR - ww16-17/2003 Main outages and issues: …. synchronize problem in VAX - On WW Impact: 8 CW SC's unable to introduce lots for a period of 4 hr and 15 min.( from 23:00 until 03:15). Root cause: - the job which synchronize the time between the VAXes failed on 22:00 (Thursday Night) and created gaps between the machines clocks. This gap caused the remotes which worked with CW* to get to loop status, with error message of FCM message. Solution: Time synchronized. Helpdesk will get alerts when this job will fail again.

A Distributed System according to Lamport “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”

Applicability of logical clocks Many algorithms depending on clock synchronization actually only require synchronized logical clocks E.g.: TDMA, Kerberos tickets, DHCP leases, global snapshots, data base time stamps and many others Why then not use a self-stabilizing Byzantine clock synchronization algorithm that synchronizes logical time?

Self-Stabilizing Byzantine Clock Synchronization
Because the best previously known self-stabilizing Byzantine clock synchronization algorithm converges in expected (n-f)·n^(6(n-f)) time! (Dolev-Welch, 95)
The difficulty lies in the fact that:
–the initial clock values can differ arbitrarily
–there is no agreed time for exchanging the values and setting the clock according to the values received
–the clocks can wrap around

Clock Synchronization using synchronized pulses We assume no outside source of real-time At every pulse exchange clock values and operate some clock adjustment function on the received multiset If clocks were initially close to real time then they will stay close to real time If not then the clocks will proceed synchronously close to logical time This scheme yields a Byzantine self-stabilizing clock synchronization algorithm with convergence time, accuracy and precision on the order of existing rCS

The Byzantine Self-Stabilizing Clock Synchronization Algorithm
At “pulse” event
Begin
  Clock := ET;
  Wait for every correct node to invoke a pulse;
  ET := SS-Strong-Byz-Agreement((ET + cycle) mod M);
End
ET - Expected Time of next pulse
cycle - Expected elapsed logical time until next pulse
M - Bound on clock value
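The pulse-driven clock update can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not the authors' implementation: all nodes are correct and pulse simultaneously, and the hypothetical `agree` function merely picks the most common proposal as a stand-in for SS-Strong-Byz-Agreement (it is not a Byzantine agreement protocol).

```python
from collections import Counter

def agree(proposals):
    """Placeholder for SS-Strong-Byz-Agreement (NOT Byzantine-tolerant):
    returns the most common proposed value."""
    return Counter(proposals).most_common(1)[0][0]

def run_pulses(initial_et, cycle, M, pulses):
    """Run the 'at pulse event' step for every node, once per pulse.

    Returns the clock values of all nodes after each pulse.
    """
    et = list(initial_et)
    history = []
    for _ in range(pulses):
        clocks = list(et)                           # Clock := ET
        proposals = [(e + cycle) % M for e in et]   # (ET + cycle) mod M
        decided = agree(proposals)                  # every node adopts one value
        et = [decided] * len(et)
        history.append(clocks)
    return history

# Hypothetical start: one node holds an arbitrary ET value (93).
history = run_pulses([7, 7, 93, 7], cycle=10, M=100, pulses=3)
for clocks in history:
    print(clocks)
```

The point of the sketch is the structure: once pulses are synchronized, a single agreement per cycle is enough to bring an arbitrary ET value back in line, and the modulo keeps the clock within its bound M.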

The Byzantine Self-Stabilizing Pulse Synchronization Algorithm
if (cycle_countdown = 0) then
  send “Propose-Pulse” message to all;
if (received f+1 distinct “Propose-Pulse” messages) then
  send “Propose-Pulse” message to all;
if (received n-f distinct “Propose-Pulse” messages) then
  invoke “pulse” event;
  cycle_countdown := cycle;
  flush “Propose-Pulse” message counter;
  ignore “Propose-Pulse” messages for 2d(1+…) time;
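The three threshold rules can be sketched as the state machine of a single node. This is an illustrative simplification, not the authors' code: the class and method names are hypothetical, time is counted in discrete ticks, a node's own proposal is assumed to be delivered back to it like any other message, and the final "ignore messages for a while" rule is omitted because its duration depends on a symbol lost from the transcript.

```python
class PulseNode:
    """One node's state for the Propose-Pulse rules (timing is simplified)."""

    def __init__(self, n, f, cycle):
        if n <= 3 * f:
            raise ValueError("requires n > 3f")
        self.n, self.f, self.cycle = n, f, cycle
        self.cycle_countdown = cycle
        self.proposers = set()   # distinct senders seen since the last pulse
        self.proposed = False    # whether this node already sent its proposal
        self.pulses = 0

    def _propose(self):
        """Return True if a Propose-Pulse should be broadcast now."""
        if self.proposed:
            return False
        self.proposed = True
        return True

    def on_tick(self):
        """Local timer step; returns True when Propose-Pulse must be sent."""
        if self.cycle_countdown > 0:
            self.cycle_countdown -= 1
        return self.cycle_countdown == 0 and self._propose()

    def on_propose(self, sender):
        """Handle a Propose-Pulse message; returns (send_propose, pulsed)."""
        self.proposers.add(sender)
        # f+1 distinct proposals include at least one from a correct node.
        send = len(self.proposers) >= self.f + 1 and self._propose()
        # n-f distinct proposals: invoke the pulse and reset the round.
        pulsed = len(self.proposers) >= self.n - self.f
        if pulsed:
            self.pulses += 1
            self.cycle_countdown = self.cycle
            self.proposers.clear()
            self.proposed = False
        return send, pulsed

node = PulseNode(n=4, f=1, cycle=5)
events = [node.on_propose(s) for s in ("a", "a", "b", "c")]
print(events, node.pulses)
```

The f+1 threshold lets a slow node join a pulse that some correct node has already proposed, while the n-f threshold prevents the f faulty nodes from triggering a pulse on their own.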

The Self-Stabilizing Byzantine Strong Agreement Algorithm Any Strong Byzantine Agreement algorithm can be used Agreement and validity is not ensured until the pulses synchronize Self-stabilization is supported by counting recovering nodes as correct only following cycle+time-for-BA of correct behavior We use a slightly modified version of the Toueg, Perry and Srikanth (1987) Strong Byzantine Agreement algorithm It has the advantage of “early stopping”: if all correct nodes start with identical values then termination is within 2 rounds Hence, during continuous correct system behavior clock synchronization is maintained with very little overhead

Comparison chart (* = global pulse)
Algorithm | Self-Stabilizing/Byzantine | Precision | Accuracy | Convergence Time | Messages
PULSE-CS | SS+BYZ | 2d + O(…) | … | cycle + 2(2f+5)d | O(nf^2)
NOADJUST-CS | SS+BYZ | 2d + O(…) | … | cycle | O(n^2)
DHSS | BYZ | d + O(…) | (f+1)d + O(…) | 2(f+1)d | O(n^2)
LL-APPROX | BYZ | 5… + O(…) | … + O(…) | d + O(…) | O(n^2)
DW-SYNCH* | SS+BYZ | 0 | 0 | M·2^(2(n-f)) | n^2·M·2^(2(n-f))
DW-BYZ-SS | SS+BYZ | 4(n-f)… + O(…) | (n-f)… + O(…) | O(n^O(n)) | …
PT-SYNC* | SS | 0 | 0 | 4n^2 | O(n^2)

The teaching of Pythagoras “Evolution is the law of life” “Unity is the law of God” “Number is the law of the universe”

“Everything in the universe is subject to predictable progressive cycles”. Pythagoras,

Future research: Noise tolerance of reciprocal cyclic/oscillatory behavior
Things are not static, events happen again and again…
Almost everything displays some degree of oscillatory behavior
Examples: earth rotation, the moon spins around its axis with exactly the same period as its period of rotation around the earth, yearly seasons, day-night, human metabolic cycle, electron rotation, moods, fashion, culture, hormone levels, animal population levels, menstruation period, virus epidemics, animal migration, trends in politics, stocks, etc.
The phenomena have a reciprocal effect on each other
What mechanisms prevent them from entering a chaotic mess?
Refractoriness can be used to explain this point in simple terms
It is a previously unused element in non-linear dynamics

Questions?

“ Synchronization is rewarding… ”

Related Problems
Digital Clock Synchronization
–Agreement on pulse counters, with or without a global pulse
Clock Synchronization
–Common notion of real time, high precision and accuracy
Phase Clocks
–Agreement on pulse counters in asynchronous settings
Synchronized Rates
–Clocks progress at approximately the same rate, the times may differ
Firing Squad
–All nodes enter the same state in step k after a process has initialized fire
Pulse Synchronization
–Precise synchronization of regular pulses, slack linear envelope accuracy

“Pulse Synchronization” algorithm (n, f, cycle)
1. Send a Message consisting of the Counter (“Pulse”)
2. Sum the messages received from the other computers within a «time window» and update the Counter accordingly
3. If (Counter ≥ current Threshold) Then Goto 1 Else Goto 2

“Linear Envelope”

The hunting of the R_k
Given cycle, f and n, the whole problem is solved by solving the set of linear constraints on R
How many equations (partitions) are there?
It turns out there is no closed-form answer to this question (integer partitioning); the number is denoted P(k)
Asymptotically equals … e.g. P(50) = … P(1000) = …
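Although P(k) has no closed form, it is easy to compute for concrete k, which shows how quickly the number of constraints grows. The sketch below is not from the slides: it counts partitions with a standard dynamic program and compares the result with the Hardy-Ramanujan asymptotic estimate exp(π·sqrt(2k/3)) / (4k·sqrt(3)), which is assumed here to be the asymptotic formula the slide refers to.

```python
import math

def partition_count(k):
    """P(k): number of ways to write k as an unordered sum of positive integers."""
    # ways[m] = number of partitions of m using only the parts considered so far.
    ways = [1] + [0] * k
    for part in range(1, k + 1):
        for total in range(part, k + 1):
            ways[total] += ways[total - part]
    return ways[k]

def hardy_ramanujan(k):
    """Asymptotic estimate of P(k) for large k."""
    return math.exp(math.pi * math.sqrt(2 * k / 3)) / (4 * k * math.sqrt(3))

for k in (10, 50, 100, 1000):
    print(k, partition_count(k), f"{hardy_ramanujan(k):.3e}")
```

The growth is exponential in sqrt(k), which is why enumerating one constraint per partition quickly becomes impractical.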

Still hunting R_k
No computer can solve this exponential set of constraints or do Gaussian elimination on it
Some other method for reaching a solution (if one exists) is sought