Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
Authored by: Seth Gilbert and Nancy Lynch
Presented by: Karl Smith

Introduction
- CAP: Consistency, Availability, Partition tolerance
- Three desirable and expected properties of real-world services
- Brewer conjectured that it is impossible to guarantee all three at once

ACID
- Most web services attempt to provide strongly consistent data
- Most use ACID databases: Atomic, Consistent, Isolated, Durable
- Web services also need fault tolerance: they must handle crashing nodes and network partitions

Formal Model

Atomic Data Objects
- Data should maintain atomic consistency
- There must exist a total order on all operations such that each operation looks as if it were completed at a single instant
- This is not the same as the Atomic requirement in ACID
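As a rough illustration (not from the slides or the paper), the Python sketch below checks a candidate total order against the single-instant rule above: every read must return the value written by the most recent preceding write. All names are hypothetical.

```python
# Hypothetical helper: replay a proposed total order against a single-copy
# register and check that every read returns the latest preceding write.

def atomically_consistent(total_order, initial_value=0):
    current = initial_value
    for op, value in total_order:
        if op == "write":
            current = value
        elif op == "read" and value != current:
            return False
    return True

# A total order that explains the history write(1), read()->1, read()->1:
print(atomically_consistent([("write", 1), ("read", 1), ("read", 1)]))  # True
# No total order can explain a read of 2 when only 1 was ever written:
print(atomically_consistent([("write", 1), ("read", 2)]))               # False
```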

Available Data Objects
- Every request received by a non-failing node in the system must result in a response
- There is no bound on how long the response may take
- This is difficult because, even during severe network failures, every request must terminate
- Brewer originally required only that almost all requests receive a response; the formal model simplifies this to all requests

Partition Tolerance
- When the network is partitioned, all messages sent from nodes in one partition to nodes in another partition are lost
- This is what makes the problem hard:
  - Every response must be atomic even though arbitrary messages might not be delivered
  - Every node must respond even though arbitrary messages may be lost
- No failure other than a total network failure is allowed to cause incorrect responses

Different Networks
- Asynchronous networks
  - There is no clock
  - Nodes must make decisions based only on the messages received and local computation
- Partially synchronous networks
  - Every node has a clock
  - All clocks increase at the same rate
  - Clocks may not be synchronized with each other

Asynchronous Networks: Impossible
- It is impossible to provide both of the following in all fair executions (including those in which messages are lost):
  - Availability
  - Atomic consistency
- Proven by contradiction

Impossibility Proof
- Let the system consist of two nodes (G1, G2) in separate partitions, such that all messages between G1 and G2 are lost
- If a write occurs on G1 and a later read occurs on G2, then G2 cannot return the data written by the write on G1
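A minimal simulation of this scenario, under assumed names (the Node class and its methods are illustrative, not the paper's notation): both nodes stay available, yet the read on G2 returns the stale initial value, so the execution cannot be atomically consistent.

```python
# Hypothetical simulation of the proof scenario: all replication messages
# between the two partitions are lost, so G2 can never learn of G1's write.

class Node:
    def __init__(self, name, initial_value):
        self.name = name
        self.value = initial_value

    def write(self, value):
        # The node applies the write locally; any message it would send to
        # the other partition is lost, so replication never happens.
        self.value = value
        return "ok"

    def read(self):
        # To stay available, the node must answer from local state only.
        return self.value

g1 = Node("G1", initial_value=0)
g2 = Node("G2", initial_value=0)

g1.write(1)        # the write completes on G1's side of the partition
stale = g2.read()  # a later read on G2's side

# G2 returns 0, not 1: the run is available but not atomically consistent.
assert stale == 0
print("G2 returned", stale, "after G1 accepted a write of 1")
```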

Partial Solutions
- No solution exists that meets all three requirements, but any two can be accommodated

Atomic & Partition Tolerant
- Trivial solution: ignore all requests (never respond, giving up availability)
- Alternate solution: each data object is hosted on a single node, and all actions involving that object are forwarded to the node hosting it
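A hedged sketch of the forwarding approach (HostNode and ForwardingNode are invented names): answers stay atomic because a single node orders all operations on the object, but a request from a partition that cannot reach that node never completes.

```python
# Illustrative sketch, not from the paper: one node owns each object and all
# reads/writes are forwarded to it; a partition blocks requests instead of
# returning a possibly stale answer.

class HostNode:
    def __init__(self):
        self.store = {}

    def apply(self, op, key, value=None):
        if op == "write":
            self.store[key] = value
            return "ok"
        return self.store.get(key)

class ForwardingNode:
    def __init__(self, host, link_up=True):
        self.host = host
        self.link_up = link_up

    def request(self, op, key, value=None):
        if not self.link_up:
            # Partitioned from the host: the request cannot complete
            # (modelled here as an exception), so availability is lost.
            raise TimeoutError("host unreachable; request does not complete")
        return self.host.apply(op, key, value)

host = HostNode()
replica = ForwardingNode(host, link_up=True)
replica.request("write", "x", 42)
print(replica.request("read", "x"))   # 42: only the host decides the order

replica.link_up = False               # a partition occurs
try:
    replica.request("read", "x")
except TimeoutError as e:
    print("during a partition:", e)
```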

Atomic & Available
- If no partitions occur, it is clearly possible to provide atomic, available data
- Systems that run on intranets and LANs are examples of these algorithms

Available & Partition Tolerant
- The service can return the initial value for all requests
- Alternatively, the system can provide weakened consistency; web caches work this way

Partially Synchronous Networks: Still Impossible
- It is impossible to provide both of the following in all fair executions (including those in which messages are lost):
  - Availability
  - Atomic consistency
- Proven by contradiction
- (This is very similar to the asynchronous case)

Impossibility Proof
- Let the system consist of two nodes (G1, G2) in separate partitions, such that all messages between G1 and G2 are lost
- If a write occurs on G1 and a later read occurs on G2, then G2 cannot return the data written by the write on G1
- (This seems familiar)

Weaker Consistency Conditions
- By allowing stale data to be returned when messages are lost, it is possible to maintain a weaker form of consistency
- Delayed-t consistency: operations are required to be atomically ordered only when there has been an interval between them in which all messages were delivered

Definition (Delayed-t Consistency)
1. P is a partial order that orders all write operations, and orders all read operations with respect to the write operations.
2. The value returned by every read operation is exactly the one written by the previous write operation in P (or the initial value, if there is no such previous write in P).
3. The order in P is consistent with the order of read and write requests submitted at each node.
4. (Atomicity) If all messages in the execution are delivered, and an operation θ completes before an operation Φ begins, then Φ does not precede θ in the partial order P.
5. (Weakly Consistent) Assume there exists an interval of time longer than t in which no messages are lost. Further, assume an operation θ completes before the interval begins, and another operation Φ begins after the interval ends. Then Φ does not precede θ in the partial order P.

Read (at node A)
1. A sends a request to C for the most recent value.
2. If A receives a response from C, it saves the value and sends it to the client.
3. If A concludes that a message was lost (i.e. a timeout occurs), it returns the value with the highest sequence number received from C (see the New Value slide), or the initial value if no value has yet been received from C.
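A rough Python sketch of these read steps, with message loss simulated by a coin flip; the class and method names (LossyChannel, ReplicaA) are assumptions, not the paper's pseudocode.

```python
import random

class LossyChannel:
    """Delivers a read request to the central node C, or 'loses' it."""
    def __init__(self, central_value, central_seq, loss_rate=0.5):
        self.central_value, self.central_seq = central_value, central_seq
        self.loss_rate = loss_rate

    def request_latest(self):
        if random.random() < self.loss_rate:
            return None                       # stands in for a timeout at A
        return self.central_seq, self.central_value

class ReplicaA:
    def __init__(self, channel, initial_value=None):
        self.channel = channel
        self.best_seq, self.best_value = 0, initial_value  # highest seq seen so far

    def read(self):
        reply = self.channel.request_latest()              # step 1: ask C
        if reply is not None:                              # step 2: got a response
            seq, value = reply
            if seq > self.best_seq:
                self.best_seq, self.best_value = seq, value
            return value
        return self.best_value                             # step 3: timeout -> best known value

a = ReplicaA(LossyChannel(central_value="v7", central_seq=7))
print([a.read() for _ in range(5)])   # mixes "v7" with the stale/initial value
```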

Write (at node A)
1. A sends a message to C with the new value.
2. If A receives an acknowledgement from C, then A sends an acknowledgement to the client and stops.
3. If A concludes a message was lost (i.e. a timeout occurs), then A sends an acknowledgement to the client anyway.
4. If A has not yet received an acknowledgement from C, then A resends the message to C with the new value.
5. If A concludes a message was lost (i.e. a timeout occurs), A repeats step 4 within t − 4·t_timeout seconds.
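A similar sketch of the write steps; the names (CentralStub, write) and the loss model are assumptions, not the paper's code. Note how the client is acknowledged even on a timeout, which is what keeps the node available.

```python
import random

T = 10.0            # the delayed-t bound from the slides (illustrative value)
T_TIMEOUT = 1.0     # per-message timeout (illustrative value)

class CentralStub:
    """Stands in for the central node C; acknowledges writes that get through."""
    def __init__(self, loss_rate=0.6):
        self.loss_rate = loss_rate
        self.value = None

    def send_new_value(self, value):
        if random.random() < self.loss_rate:
            return False            # A observes this as a timeout
        self.value = value
        return True                 # acknowledgement reaches A

def write(central, value):
    acked_client = False
    while True:
        acked = central.send_new_value(value)        # steps 1 and 4: (re)send to C
        if acked:
            print("ack to client (C confirmed)")     # step 2
            return
        if not acked_client:
            print("ack to client (timeout)")         # step 3: stay available anyway
            acked_client = True
        # step 5: retry roughly every t - 4*t_timeout seconds (sleep omitted here)
        assert T - 4 * T_TIMEOUT > 0

write(CentralStub(), "v8")
```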

New Value (at the central node C)
1. C increments its sequence number by 1.
2. C sends the new value and the sequence number to every node.
3. If C concludes a message was lost (i.e. a timeout occurs), then C resends the value and sequence number to the missing node within t − 2·t_timeout seconds.
4. C repeats step 3 until every node has acknowledged the value.
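A sketch of how the central node might carry out these steps, resending until every replica acknowledges; Replica and CentralC are illustrative names, and the timing bound appears only as a comment.

```python
import random

class Replica:
    def __init__(self, name, loss_rate=0.5):
        self.name, self.loss_rate = name, loss_rate
        self.seq, self.value = 0, None

    def deliver(self, seq, value):
        if random.random() < self.loss_rate:
            return False                  # message lost; C sees a timeout
        self.seq, self.value = seq, value
        return True                       # acknowledgement

class CentralC:
    def __init__(self, replicas):
        self.replicas = replicas
        self.seq = 0

    def new_value(self, value):
        self.seq += 1                                     # step 1
        pending = set(self.replicas)                      # step 2: send to every node
        while pending:                                    # steps 3-4: resend until acked
            pending = {r for r in pending if not r.deliver(self.seq, value)}
            # In the paper's timing, each resend happens within t - 2*t_timeout seconds.

replicas = [Replica(f"N{i}") for i in range(3)]
CentralC(replicas).new_value("v9")
print([(r.name, r.seq, r.value) for r in replicas])   # all eventually hold (1, "v9")
```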

Conclusion
- The paper proves that consistency, availability, and partition tolerance cannot all be provided at once
- Any two of the three properties can be maintained
- In a partially synchronous network it is possible to achieve a compromise between consistency and availability