State Machine Replication: Project Presentation. Ido Zachevsky, Marat Radan. Supervisor: Ittay Eyal. Winter Semester 2010.

Goals: Learn and understand Paxos and Python. Design a program for a fault-tolerant distributed system using the Paxos algorithm. Test it on a real Internet-scale system, Planet-Lab.

The Problem – Distributed Storage: Using distributed algorithms on a network has many advantages, but it also has many problems. This project focuses on the synchronization problem.

Synchronization: The task is to run a state machine that involves all the computers of a network. All the computers need to be in sync regarding the current state and the next states, and all of them need to know the transitions.

Problems? Can any computer choose the next state? What if a computer disconnects ungracefully? What if a message is delayed due to congestion? Other problems… Solution: Use a dedicated algorithm

A Solution – Paxos: Keeping the safety requirements ensures that the chosen value is agreed upon by all computers. Keeping the liveness requirements ensures that a value will be chosen.

Paxos - Background: "Paxos Made Simple", Leslie Lamport, 01 Nov 2001. "Paxos Made Live".

Principles: The system consists of three agent classes: Proposers, Acceptors, and Learners. Some of them are distinguished. Agents communicate via messages.

Principles – continued: A single computer, the Leader, is in charge. The decision cycle has two phases:
1. A majority must promise to commit to a recent proposal.
2. Once a majority has committed, all computers are informed of the Decision.
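The two phases above can be sketched in memory, following the single-decision description in "Paxos Made Simple". This is a minimal illustration rather than the project's code: the names `Acceptor`, `prepare`, `accept`, and `propose` are assumptions, messages are replaced by direct method calls, and failures are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical acceptor holding the state needed for one decision."""
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both phases as the leader; return the chosen value, or None."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: a majority must promise.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a value, the leader must adopt the
    # one with the highest ballot instead of its own value.
    prior = [acc for acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    # Phase 2: the value is chosen once a majority accepts it.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "A")` chooses `"A"`, and a later `propose(acceptors, 2, "B")` also returns `"A"`, because the second leader learns in phase 1 that a value was already accepted.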

Safety requirements: Only a value that has been proposed may be chosen; only a single value is chosen; and a process never learns that a value has been chosen unless it actually has been.

Liveness requirements: Some proposed value is eventually chosen. A process can eventually learn the value which has been chosen.

Implementing a State Machine: A collection of servers, each implementing a state machine. The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm. A pre-decided set of commands is necessary.
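As an illustration of the rule that the i-th command is the value chosen by the i-th Paxos instance, here is a small sketch of a replica that applies decided commands strictly in index order. The counter state machine, the `COMMANDS` table, and the `Replica` class are hypothetical and only stand in for the pre-decided command set the slide mentions.

```python
from typing import Callable, Dict

# Hypothetical pre-decided command set for a simple counter state machine.
COMMANDS: Dict[str, Callable[[int], int]] = {
    "inc": lambda state: state + 1,
    "dec": lambda state: state - 1,
    "reset": lambda state: 0,
}


class Replica:
    """Applies the command chosen by instance i only after instances 0..i-1."""

    def __init__(self) -> None:
        self.state = 0
        self.decided: Dict[int, str] = {}  # instance index -> chosen command
        self.next_index = 0                # first instance not yet applied

    def learn(self, index: int, command: str) -> None:
        """Record the decision of one Paxos instance and apply what is ready."""
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {command!r}")
        self.decided[index] = command
        # Apply consecutive decisions; stop at the first undecided instance.
        while self.next_index in self.decided:
            self.state = COMMANDS[self.decided[self.next_index]](self.state)
            self.next_index += 1
```

Because commands are applied by instance index rather than by arrival time, two replicas that learn the same decisions in different orders end in the same state.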

Planet-Lab: Planet-Lab is a global research network that supports the development of new network services. Understanding the system is required. Monitoring is necessary (generally implemented via NSSL-lab).

Project Design:
Chosen language for implementation: Python
Network framework: Twisted Matrix
Implementation stages:
– Single Decision on NSSL
– Multiple Decisions on NSSL
– Single Decision on Planet-Lab
– Multiple Decisions on Planet-Lab

Implementation: Use Cases
– Acceptor disconnects?
– Leader disconnects? At which stage?
– Acceptor message fails to deliver?

Implementation:
Leader Election
– In fact an inherent part of the algorithm
Output and monitoring
– Actual output not visible in general
– Only via monitoring

Flow:
1. Register Nodes
2. Verify and install necessary files
3. Upload
4. Initiate Monitor
5. Run and wait for activity
6. Review results

Implementation – File Structure

Results: Everything works at the NSSL. In real life, not necessarily. Communication phenomena: messages arriving unordered, in large chunks, etc. Works well for up to Nodes. Use cases tested in the lab.
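Messages arriving "in large chunks" is typical of stream transports, where one read may contain several messages or only part of one. The slides do not say how the project handled this; the sketch below shows one common approach, length-prefixed framing, with the names `encode` and `FrameDecoder` chosen here for illustration. Twisted's `twisted.protocols.basic` module offers receivers such as `Int32StringReceiver` that perform similar framing.

```python
import struct

# 4-byte big-endian length prefix in front of every message (an assumption).
HEADER = struct.Struct("!I")


def encode(payload: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles whole messages from arbitrarily sized chunks of a stream."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, data: bytes) -> list:
        """Add received bytes; return every message that is now complete."""
        self._buf += data
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(self._buf[HEADER.size:end])
            self._buf = self._buf[end:]
        return messages
```

Framing only restores message boundaries. Handling messages that arrive out of order would additionally need something like a ballot or sequence number inside each message.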

Conclusions: Preliminary work was needed to understand Twisted Matrix and Planet-Lab. Dealing with network problems: an SSH tunnel instead of “real” monitoring. Requirements fulfilled.

Further work:
Optimize networking protocol
– Improve client-server interface
– Inefficient startup: N(N-1) for N machines
Partition Decision processes
– Only few nodes decide each resolution
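To make the N(N-1) startup cost concrete, the sketch below assumes it counts one connection per ordered pair of machines, that is, every machine opens a connection to every other machine. That reading is an assumption, and the function names are hypothetical.

```python
from itertools import permutations


def startup_connections(nodes):
    """List every ordered (source, destination) pair in a full mesh."""
    return list(permutations(nodes, 2))


def connection_count(n: int) -> int:
    """Closed form for the number of ordered pairs among n machines."""
    return n * (n - 1)
```

The count grows quadratically: 5 machines need 20 connections under this assumption, while 50 machines need 2450.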

Thank you