Logic and Lattices for Distributed Programming
Neil Conway, UC Berkeley
Joint work with: Peter Alvaro, Peter Bailis, David Maier, Bill Marczak, Joe Hellerstein, Sriram Srinivasan
Basho Chats #004, June 27, 2012

Programming

Distributed Programming

Dealing with Disorder
Introduce order:
–Paxos, Zookeeper, Two-Phase Commit, …
–"Strong Consistency"
Tolerate disorder:
–Correct behavior in the face of many possible network orders
–Typical goal: replicas converge to the same final state ("Eventual Consistency")

Eventual Consistency: popular, but hard to program.

Help developers build reliable programs on top of eventual consistency

This Talk
1. Theory: CRDTs, Lattices, and CALM
2. Practice: Programming with Lattices; Case Study: KVS

(Diagram) Client 0 and Client 1 both read Students = {Alice, Bob}. Client 0 writes {Alice, Bob, Dave}; Client 1 writes {Alice, Bob, Carol}. The replicas now hold different values. How to resolve?

Problem: Replicas perceive different event orders.
Goal: Same final state at all replicas.
Solution: Commutative operations ("merge functions").

(Diagram) Merge = set union: after exchanging updates, both clients converge to Students = {Alice, Bob, Carol, Dave}.

Commutative Operations
Used by Dynamo, Riak, Bayou, etc. Formalized as CRDTs: Convergent and Commutative Replicated Data Types
–Shapiro et al., INRIA (2011)
–Based on join semilattices
–Merge is commutative, associative, and idempotent
Practical libraries: Statebox, Knockbox
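As a concrete illustration, here is a minimal grow-only set CRDT in plain Ruby (a sketch, not the Statebox or Knockbox API): merge is set union, which is commutative, associative, and idempotent, so replicas converge regardless of delivery order or duplication.

    require 'set'

    # Grow-only set CRDT: the only operations are add and merge.
    class GSet
      attr_reader :elements

      def initialize(elements = Set.new)
        @elements = Set.new(elements)
      end

      def add(x)
        GSet.new(@elements | [x])
      end

      # Merge = set union: commutative, associative, idempotent.
      def merge(other)
        GSet.new(@elements | other.elements)
      end
    end

    r1 = GSet.new(%w[Alice Bob]).add("Dave")   # Client 0's replica
    r2 = GSet.new(%w[Alice Bob]).add("Carol")  # Client 1's replica

    # Both merge orders converge to {Alice, Bob, Carol, Dave}.
    r1.merge(r2).elements == r2.merge(r1).elements  # => true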

(Diagram) Three lattices growing over time:
–Set (merge = union); "growth" = larger sets
–Integer (merge = max); "growth" = larger numbers
–Boolean (merge = or); "growth" = false → true

(Diagram) Initially Students = {Alice, Bob, Carol, Dave} and Teams = { }. Client 0 reads Students and writes team pairs Teams = {⟨…⟩, ⟨…⟩}, one of which includes Dave; concurrently, Client 1 removes Dave, leaving Students = {Alice, Bob, Carol}. After replica synchronization, both replicas have Students = {Alice, Bob, Carol} and Teams = {⟨…⟩, ⟨…⟩}: a team still references the removed student.

(Diagram) Same operations, different order: Client 0 reads Students = {Alice, Bob, Carol} (after Dave's removal) and therefore writes no team containing Dave, so Teams remains { }. After synchronization: Students = {Alice, Bob, Carol}, Teams = { }. Nondeterministic outcome!

Possible Solution: Wrap both replicated values in a single complex CRDT

Goal: Compose larger application using “safe” mappings between simple lattices

(Diagram) Composing lattices via monotone functions over time:
–Set (merge = union) → Integer (merge = max), via the monotone function size()
–Integer (merge = max) → Boolean (merge = or), via the monotone threshold test size() >= 5
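A minimal sketch of this pipeline in plain Ruby (the names and the threshold of 5 are illustrative): as the set grows, size() can only increase, and the threshold can only flip from false to true, never back.

    require 'set'

    size_of = ->(s) { s.size }    # monotone: set -> max
    quorum  = ->(n) { n >= 5 }    # monotone: max -> bool

    votes = Set.new
    %w[a b c d e].each do |voter|
      votes << voter
      puts "#{size_of.call(votes)} votes, quorum? #{quorum.call(size_of.call(votes))}"
    end
    # The boolean output flips from false to true exactly once and never retracts.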

Monotonicity in Practice
"The more you know, the more you know": never retract previous outputs ("mistake-free").
Typical patterns:
–immutable data
–accumulating knowledge over time
–threshold tests ("if" without "else")

Monotonicity and Determinism
Agents strictly learn more knowledge over time. Because the program is monotone, a different learning order yields the same final outcome. Result: the program is deterministic!

A program is confluent if it produces the same results regardless of network nondeterminism

CALM Analysis (Consistency As Logical Monotonicity)
1. All monotone programs are confluent.
2. Monotonicity admits a simple syntactic test.
Result: a simple static analysis for eventual consistency.

Handling Non-Monotonicity
… is not the focus of this talk. Basic choices:
1. Nodes agree on an event order using a coordination protocol (e.g., Paxos).
2. Allow nondeterministic outcomes; if needed, compensate and apologize.

Putting It Into Practice
What we'd like:
–Collection of agents
–No shared state (→ message passing)
–Computation over arbitrary lattices

Bloom
–Organization: collection of agents
–Communication: message passing
–State: relations (sets)
–Computation: relational rules over sets (Datalog, SQL)

Bloom → Bloom^L
–Organization: collection of agents (unchanged)
–Communication: message passing (unchanged)
–State: relations (sets) → lattices
–Computation: relational rules over sets (Datalog, SQL) → functions over lattices

Quorum Vote in Bloom^L

    QUORUM_SIZE = 5
    RESULT_ADDR = "example.org"

    class QuorumVote        # annotated Ruby class
      include Bud

      state do              # program state
        # communication interfaces
        channel :vote_chn, [:@addr, :voter_id]
        channel :result_chn, [:@addr]
        # lattice state declarations
        lset  :votes        # merge function for set lattice: union
        lmax  :vote_cnt
        lbool :got_quorum
      end

      bloom do              # program logic
        votes      <= vote_chn {|v| v.voter_id}               # accumulate votes into set
        vote_cnt   <= votes.size                              # map set -> max
        got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)             # map max -> bool
        result_chn <~ got_quorum.when_true { [RESULT_ADDR] }  # threshold test on bool
      end
    end
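Running the class locally might look like the following sketch; the IP and port here are arbitrary assumptions, and run_fg starts the Bud runtime in the foreground so the bloom rules are evaluated each timestep as votes arrive on vote_chn.

    require 'bud'

    # Hypothetical local run; requires `gem install bud`.
    program = QuorumVote.new(:ip => "127.0.0.1", :port => 12345)
    program.run_fg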

Builtin Lattices

Name  | Description                     | ⊥         | a ⊔ b           | Sample monotone functions
lbool | threshold test                  | false     | a ∨ b           | when_true() → v
lmax  | increasing number               | −∞        | max(a, b)       | gt(n) → lbool; +(n) → lmax; −(n) → lmax
lmin  | decreasing number               | +∞        | min(a, b)       | lt(n) → lbool
lset  | set of values                   | ∅         | a ∪ b           | intersect(lset) → lset; product(lset) → lset; contains?(v) → lbool; size() → lmax
lpset | non-negative set                | ∅         | a ∪ b           | sum() → lmax
lbag  | multiset of values              | ∅         | a ∪ b           | mult(v) → lmax; +(lbag) → lbag
lmap  | map from keys to lattice values | empty map | pointwise merge | at(v) → any-lat; intersect(lmap) → lmap

Case Study

Goal: Provably eventually consistent key-value store (KVS)
Assumption: Map keys to lattice values (i.e., values do not decrease)
Solution: Use a map lattice
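A minimal sketch of the map-lattice merge in plain Ruby (illustrative, not the Bloom^L lmap implementation): replicas merge key-by-key, applying the value lattice's own merge (set union here) for keys present on both sides.

    require 'set'

    # Pointwise merge of two map-lattice replicas.
    def lmap_merge(a, b)
      a.merge(b) { |_key, va, vb| va | vb }  # value lattice merge = union
    end

    r1 = { "students" => Set.new(%w[Alice Bob Dave]) }
    r2 = { "students" => Set.new(%w[Alice Bob Carol]),
           "teams"    => Set.new }

    lmap_merge(r1, r2)
    # => both keys present; "students" is the union {Alice, Bob, Dave, Carol}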

(Diagram) Replica 1 and Replica 2 timelines: each key maps to a nested lattice value.

(Diagram) A new K/V pair is added at one replica.

(Diagram) The value in an extant K/V pair "grows".

(Diagram) Replica synchronization: the two maps are merged.

Goal: Provably eventually consistent KVS that stores arbitrary values
Solution: Assign a version to each key-value pair. Each replica stores increasing versions, not increasing values.

Object Versions in Dynamo/Riak
1. Each KV pair has a vector clock version.
2. Given two versions of a KV pair, prefer the one with the strictly greater version.
3. If the versions are incomparable, invoke a user-defined merge function.

Vector Clock: map from node IDs → logical clocks. Solution: use a map lattice.
Logical Clock: increasing counter. Solution: use an increasing-int lattice.
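A sketch of a vector clock built this way in plain Ruby (the class and method names are illustrative): merge is pointwise max, and comparison is the usual partial order, so two clocks can be incomparable (concurrent).

    # Vector clock as a map lattice: node ID -> increasing counter.
    class VClock
      attr_reader :clocks

      def initialize(clocks = {})
        @clocks = Hash.new(0).merge(clocks)  # missing entries default to 0
      end

      def tick(node)
        VClock.new(@clocks.merge(node => @clocks[node] + 1))
      end

      def merge(other)
        VClock.new(@clocks.merge(other.clocks) { |_n, a, b| [a, b].max })
      end

      def <=(other)
        @clocks.all? { |n, c| c <= other.clocks[n] }
      end

      def dominates?(other)  # strictly greater in the partial order
        other <= self && !(self <= other)
      end
    end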

Version-Value Pairs
Each stored value is a pair ⟨version, value⟩; the pair is itself a lattice. Its merge prefers the pair with the strictly greater version and merges both components when the versions are concurrent:

    # fst = version (vector clock), snd = value (user-supplied lattice)
    class Pair
      attr_reader :fst, :snd

      def initialize(fst, snd)
        @fst, @snd = fst, snd
      end

      def merge(o)
        if fst.dominates?(o.fst)       # self.fst > o.fst
          self
        elsif o.fst.dominates?(fst)    # self.fst < o.fst
          o
        else                           # incomparable: merge both components
          Pair.new(fst.merge(o.fst), snd.merge(o.snd))
        end
      end
    end
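For example, using the VClock sketch above (the values here are sets, so the value merge is union; all names are illustrative):

    require 'set'

    a = Pair.new(VClock.new("r1" => 2, "r2" => 1), Set.new(%w[x]))
    b = Pair.new(VClock.new("r1" => 1, "r2" => 1), Set.new(%w[y]))
    a.merge(b)   # a's version strictly dominates: the result is a

    c = Pair.new(VClock.new("r1" => 1, "r2" => 2), Set.new(%w[y]))
    a.merge(c)   # concurrent: clocks merged pointwise, values unioned to {x, y}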

(Diagram) Replica 1 and Replica 2 timelines, now storing ⟨vector clock, value⟩ pairs.

(Diagram) A write increases the version, NOT necessarily the value.

(Diagram) On synchronization, R1's strictly greater version replaces R2's version.

(Diagram) A new write occurs at Replica 2.

(Diagram) Concurrent writes at both replicas: the versions are incomparable!

(Diagram) The vector clocks are merged automatically; the values are merged via the user's lattice (as in Dynamo).

Lattice Composition in KVS (diagram)
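The composition the diagram depicts can be sketched in a few lines, under the same assumptions as the previous sketches: the store is a map lattice from key to ⟨vector clock, value lattice⟩ pairs, and replica synchronization is a single pointwise merge.

    # KVS replica state: Hash from key -> Pair(VClock, value lattice).
    # Replica synchronization = one lattice merge, applied key by key.
    def kvs_merge(a, b)
      a.merge(b) { |_key, pa, pb| pa.merge(pb) }
    end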

Conclusion
–Dealing with EC: many event orders → order-independent ("disorderly") programs
–Lattices: disorderly state
–Monotone functions: disorderly computation
–Monotone Bloom: lattices + monotone functions for safe distributed programming

Questions Welcome
Please try Bloom! Or: gem install bud

Backup Slides

Lattices
⟨S, ⊔, ⊥⟩ is a bounded join semilattice iff:
–S is a partially ordered set
–⊔ is a binary operator ("least upper bound"): for all x, y ∈ S, x ⊔ y = z where x ≤_S z, y ≤_S z, and there is no z′ ≠ z ∈ S such that x ≤_S z′, y ≤_S z′, and z′ ≤_S z. ⊔ is associative, commutative, and idempotent.
–⊥ is the "least" element in S (∀x ∈ S: ⊥ ⊔ x = x)
Example: increasing integers – S = ℤ, ⊔ = max, ⊥ = −∞

Monotone Functions
f : S → T is a monotone function iff ∀a, b ∈ S : a ≤_S b ⇒ f(a) ≤_T f(b)
Example: size(Set) → Increasing-Int
–size({A, B}) = 2
–size({A, B, C}) = 3

From Datalog → Lattices

                     | Datalog (Bloom)                          | Bloom^L
State                | relations                                | lattices
Example values       | [["red", 1], ["green", 2]]               | set: ["red", "green"]; map: {"red" => 1, "green" => 2}; counter: 5; condition: false
Computation          | rules over relations                     | functions over lattices
Monotone computation | monotone rules                           | monotone functions
Program semantics    | fixpoint of rules (stratified semantics) | fixpoint of functions (stratified semantics)

Bloom Operational Model (diagram)

Quorum Vote in Bloom

    QUORUM_SIZE = 5
    RESULT_ADDR = "example.org"

    class QuorumVote
      include Bud

      state do
        # communication
        channel :vote_chn, [:@addr, :voter_id]
        channel :result_chn, [:@addr]
        table   :votes, [:voter_id]   # persistent storage
        scratch :cnt, [] => [:cnt]    # transient storage
      end

      bloom do
        votes <= vote_chn {|v| [v.voter_id]}          # accumulate votes
        cnt   <= votes.group(nil, count(:voter_id))
        # send message when quorum reached: NOT (set) monotonic!
        result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
      end
    end

Current Status
Writeups:
–Bloom^L: UCB Tech Report
–Bloom/CALM: CIDR’11, website
Lattice runtime: available as a git branch, to be merged soon-ish
Examples and case studies: KVS, shopping carts, causal delivery
Under development: MDCC, concurrent editing