Download presentation
Presentation is loading. Please wait.
1
State Machine Replication Project Presentation Ido Zachevsky Marat Radan Supervisor: Ittay Eyal Winter Semester 2010
2
Goals Learn and understand Paxos and Python. Design program for fault-tolerant distributed system using the Paxos algorithm. Test on a real internet scale system, Planet- Lab.
3
The Problem – Distributed Storage Using Distributed Algorithms on a network has many advantages It also has many problems This project focuses on the Synchronization Problem
4
Synchronization The task: Successfully issue a state machine which involves all the computers of a network All the computers need to be in sync regarding the Current State and the Next States. All the computers need to know the transitions.
5
Problems? Can any computer choose the next state? What if a computer disconnects ungracefully? What if a message is delayed due to congestion? Other problems… Solution: Use a dedicated algorithm
6
A Solution – Paxos Keeping the Safety requirements ensures an agreed-upon value, by all computers, is chosen Keeping the Liveness requirements ensures a value will be chosen
7
Paxos - Background Paxos Made Simple Leslie Lamport 01 Nov 2001 Paxos Made Live
8
Principles The system consists of three agent classes: – Proposers – Acceptors – Learners Some of them distinguished Communicate via messages
9
Principles – continued A single computer – a Leader – is in charge Decision cycle in two phases: 1.A majority must promise to commit to a recent proposal. 2.Once a majority has committed, all computers are informed of the Decision.
10
Safety requirements Only a value that has been proposed may be chosen, Only a single value is chosen, and A process never learns that a value has been chosen unless it actually has been.
11
Liveness requirements Some proposed value is eventually chosen. A process can eventually learn the value which has been chosen.
12
Implementing a State Machine Collection of servers, each implementing a state machine. The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm. A pre-decided set of commands is necessary.
13
Planet-Lab Planet-Lab is a global research network that supports the development of new network services. Understanding the system is required Monitoring is necessary – Generally, implemented via NSSL-lab.
14
Project Design Chosen language for implementation: Python Network framework: Twisted Matrix Implementation stages: – Single Decision on NSSL – Multiple Decisions on NSSL – Single Decision on Planet-Lab – Multiple Decisions on Planet-Lab
16
Implementation Use Cases – Acceptor disconnects? – Leader disconnects? At which stage? – Acceptor message fails to deliver?
17
Implementation Leader Election – In fact an inherent part of the algorithm Output and monitoring – Actual output not visible in general – Only via monitoring
18
Flow 1.Register Nodes 2.Verify and install necessary files 3.Upload 4.Initiate Monitor 5.Run and wait for activity 6.Review results
19
Implementation – File Structure
20
Results Everything works at the NSSL In Real-Life, not necessarily Communication phenomena – messages arriving unordered, in large chunks, etc. Works well for up to 20-30 Nodes Use cases tested in Lab
21
Conclusions Preliminary work needed to understand Twisted Matrix and Planet-Lab Dealing with network problems – SSH Tunnel instead of “real” monitoring Requirements fulfilled
22
Further work Optimize networking protocol – Improve client-server interface – Inefficient startup – N(N-1) for N machines Partition Decision processes – Only few nodes decide each resolution
23
Thank you
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.