EECS 498 Introduction to Distributed Systems Fall 2017

Slides:

Advertisements

Similar presentations

Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky David G. Andersen * Princeton, Intel Labs, CMU Dont Settle for Eventual : Scalable Causal Consistency.

Advertisements

Case Study - Amazon. Amazon r Amazon has many Data Centers r Hundreds of services r Thousands of commodity machines r Millions of customers at peak times.

Replication Management. Motivations for Replication Performance enhancement Increased availability Fault tolerance.

Causal Consistency Without Dependency Check Messages Willy Zwaenepoel.

Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.

Distributed Systems Fall 2011 Gossip and highly available services.

Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.

Fault Tolerance via the State Machine Replication Approach Favian Contreras.

Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.

IM NTU Distributed Information Systems 2004 Replication Management -- 1 Replication Management Yih-Kuen Tsay Dept. of Information Management National Taiwan.

CSE 486/586 CSE 486/586 Distributed Systems Consistency Steve Ko Computer Sciences and Engineering University at Buffalo.

CSE 486/586 Distributed Systems Consistency --- 3

Primary-Backup Replication COS 418: Distributed Systems Lecture 5 Kyle Jamieson.

Primary-Backup Replication

CSE 486/586 Distributed Systems Consistency --- 2

Nomadic File Systems Uri Moszkowicz 05/02/02.

Distributed Systems – Paxos

CPS 512 midterm exam #1, 10/7/2016 Your name please: ___________________ NetID:___________ /60 /40 /10.

EEC 688/788 Secure and Dependable Computing

Strong Consistency & CAP Theorem

Strong Consistency & CAP Theorem

Replication State Machines via Primary-Backup

Replication Middleware for Cloud Based Storage Service

Strong Consistency & CAP Theorem

View Change Protocols and Reconfiguration

EECS 498 Introduction to Distributed Systems Fall 2017

Replication and Consistency

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

CSE 486/586 Distributed Systems Consistency --- 3

Implementing Consistency -- Paxos

EECS 498 Introduction to Distributed Systems Fall 2017

Replication and Consistency

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

Replication Improves reliability Improves availability

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

EEC 688/788 Secure and Dependable Computing

Fault-tolerance techniques RSM, Paxos

PERSPECTIVES ON THE CAP THEOREM

EECS 498 Introduction to Distributed Systems Fall 2017

EECS 498 Introduction to Distributed Systems Fall 2017

IS 651: Distributed Systems Fault Tolerance

Scalable Causal Consistency

Lecture 21: Replication Control

Replication State Machines via Primary-Backup

Fault-Tolerant State Machine Replication

EECS 498 Introduction to Distributed Systems Fall 2017

Causal Consistency and Two-Phase Commit

Replication State Machines via Primary-Backup

Distributed Systems CS

Replicated state machine and Paxos

View Change Protocols and Reconfiguration

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Strong Consistency & CAP Theorem

Global Distribution.

Lecture 21: Replication Control

Chain Replication lecture

CSE 486/586 Distributed Systems Consistency --- 2

Implementing Consistency -- Paxos

CSE 486/586 Distributed Systems Consistency --- 3

Replication and Consistency

Presentation transcript:

EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha

Consistency vs. availability Replica can execute client requests only if in same partition as majority What consistency can we offer if we want any replica to always serve client requests? Replica Replica Replica Replica Replica October 18, 2017 EECS 498 – Lecture 12

Eventual Consistency If no new updates, all replicas eventually converge to same state Apply updates in same order at all replicas in a manner that preserves causal ordering Use Lamport clocks Don’t want to wait to apply an update until clock progressed at all nodes Apply update immediately; roll back and re-apply later October 18, 2017 EECS 498 – Lecture 12

Eventual Consistency Never know whether some write from the past may yet reach your node So all entries in log must be tentative forever All nodes must store entire log forever To commit updates and garbage collect them Rely on highly available primary to order updates Order chosen by primary must preserve causality October 18, 2017 EECS 498 – Lecture 12

Committing Updates At any replica: Upon sync with primary Stable state Log of tentatively ordered updates (order based on Lamport clock timestamps) Upon sync with primary Receive updates in order Apply updates to stable state and prune log October 18, 2017 EECS 498 – Lecture 12

Replicated State Machines Lamport/vector clock based RSM Cannot progress if any replica is unavailable Eventually consistent even if intermittent connectivity Primary backup replication Can replace primary/backup upon failure Unavailable until failed replica is replaced Paxos based RSM Available as long as majority in same partition October 18, 2017 EECS 498 – Lecture 12

Consistency Spectrum Read-after- write Eventual Causal Sequential Linearizability Consistency Ease of programming Latency Availability October 18, 2017 EECS 498 – Lecture 12

Web Services in the Cloud VM VM VM Storage Service October 18, 2017 EECS 498 – Lecture 12

Example Scenario Photo sharing web service deployed in AWS Use S3 for storage Photo URL  Image data Album name  <List of photo URLs> User name  (# of photos, <List of album names>) Problems application must cope with due to eventual consistency of storage? October 18, 2017 EECS 498 – Lecture 12

Problems with Eventual Consistency Lack of referential integrity Total # of photos != sum of # of photos in each album Updates may get lost Initialize album1 and add photo1 Read album1’s content and add photo2 Read album1’s content and add photo3 Album1 may eventually have only photo1 and photo3 Why is Amazon S3 still useful at all? Read-after-write consistency for initial PUT Most data is write-once October 18, 2017 EECS 498 – Lecture 12

Design Challenge Goals: Minimize latency for reads and writes Total ordering of writes Okay to respond to reads with stale data but minimize occurrences Data center A Data center B Data center C Webservers Webservers Webservers Storage Storage Storage October 18, 2017 EECS 498 – Lecture 12

Facebook Architecture October 18, 2017 EECS 498 – Lecture 12

Consistency at Facebook 0.0004% of reads violate linearizability Despite system not designed for linearizability! Implication: Whether a system offers linearizable consistency cannot be ascertained via measurements Must reason about system’s design October 18, 2017 EECS 498 – Lecture 12

PACELC If partition, Else, Choose availability vs. consistency Else, Choose latency vs. consistency Externally visible effects may not reveal if system chooses latency over consistency R R R R R October 18, 2017 EECS 498 – Lecture 12

Topics for mid-term Logical clocks Global snapshot for coordinated checkpointing Primary-backup replication External consistency; viewservice; split brain Consistency: linearizable, causal, eventual Tradeoffs vs. availability and latency Case studies: GFS, Chubby, ZooKeeper October 18, 2017 EECS 498 – Lecture 12