1 The CAP Theorem Tomer Gabel, Wix BuildStuff 2014

2 Credits
–Originally a talk by Yoav Abrahami (Wix)
–Based on “Call Me Maybe” by Kyle “Aphyr” Kingsbury

3 Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance

4 Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance – pick two!

5 By Example
I want this book!
–I add it to the cart
–Then continue browsing
There’s only one copy in stock!

6 By Example
I want this book!
–I add it to the cart
–Then continue browsing
There’s only one copy in stock!
… and someone else just bought it.

7 Consistency

8 Consistency: Defined
In a consistent system, all participants see the same value at the same time.
“Do you have this book in stock?”

9 Consistency: Defined
If our book store is an inconsistent system:
–Two customers may buy the book
–But there’s only one item in inventory!
We’ve just violated a business constraint.
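A minimal sketch of that violation (not from the talk; the book title, thread names and timing are illustrative): two buyers both pass the stock check before either decrement lands, much like two replicas that each still see one copy in stock.

import threading, time

stock = {"dune": 1}          # only one copy in inventory
sold_to = []

def buy(customer):
    if stock["dune"] > 0:            # both buyers may pass this check...
        time.sleep(0.01)             # ...while the "write" is still in flight
        stock["dune"] -= 1
        sold_to.append(customer)

buyers = [threading.Thread(target=buy, args=(name,)) for name in ("alice", "bob")]
for t in buyers: t.start()
for t in buyers: t.join()

print(sold_to)        # usually ['alice', 'bob'] – one book, two sales
print(stock["dune"])  # -1: the business constraint is violated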

10 Availability

11 Availability: Defined
An available system:
–Is reachable
–Responds to requests (within SLA)
Availability does not guarantee success!
–The operation may fail
–“This book is no longer available”

12 Availability: Defined
What if the system is unavailable?
–I complete the checkout
–And click on “Pay”
–And wait
–And wait some more
–And…
Did I purchase the book or not?!

13

14 Partition Tolerance: Defined
Partition: one or more nodes are unreachable.
This is distinct from a dead node… but observably the same.
No practical system runs on a single node, so all systems are susceptible!

15 “The Network is Reliable”
Delays, drops, reordering and duplication all happen in IP networks.
To a client, delays and drops are indistinguishable.
Perfect failure detection is provably impossible 1 !
1 “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson

16 Partition Tolerance: Reified
External causes:
–Bad network config
–Faulty equipment
–Scheduled maintenance
Even software causes partitions:
–Bad network config
–GC pauses
–Overloaded servers
Plenty of war stories!
–Netflix
–Twilio
–GitHub
–Wix :-)
Some hard numbers 1 :
–5.2 failed devices/day
–59K lost packets/day
–Adding redundancy only improves things by 40%
1 “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al.

17 “Proving” CAP

18 In Pictures
Let’s consider a simple system:
–Service A writes values
–Service B reads values
–Values are replicated between nodes
These are “ideal” systems – bug-free, predictable.
(Diagram: Node 1 holds value V0 and serves A; Node 2 holds value V0 and serves B.)

19 In Pictures
“Sunny day scenario”:
–A writes a new value V1
–The value is replicated to Node 2
–B reads the new value

20 In Pictures
What happens if the network drops?
–A writes a new value V1
–Replication fails
–B still sees the old value
–The system is inconsistent

21 In Pictures
A possible mitigation is synchronous replication:
–A writes a new value V1
–It cannot be replicated, so the write is rejected
–Both A and B still see V0
–The system is logically unavailable
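The three scenarios above can be condensed into a toy simulation (illustrative Python, not any real replication protocol): asynchronous replication stays available but goes inconsistent during a partition, while synchronous replication stays consistent by rejecting the write.

class Node:
    def __init__(self, value="V0"):
        self.value = value

class Pair:
    """Two replicated nodes; `partitioned` simulates the network dropping."""
    def __init__(self):
        self.node1, self.node2 = Node(), Node()
        self.partitioned = False

    def write_async(self, value):
        # Accept locally, replicate best-effort: always available.
        self.node1.value = value
        if not self.partitioned:
            self.node2.value = value
        return True

    def write_sync(self, value):
        # Only accept if replication can happen: always consistent.
        if self.partitioned:
            return False                      # logically unavailable
        self.node1.value = self.node2.value = value
        return True

pair = Pair()
pair.partitioned = True

pair.write_async("V1")
print(pair.node1.value, pair.node2.value)            # V1 V0 -> available but inconsistent

accepted = pair.write_sync("V2")
print(accepted, pair.node1.value, pair.node2.value)  # False V1 V0 -> consistent but unavailable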

22 What does it all mean?

23 The network is not reliable.
Distributed systems must handle partitions.
Any modern system runs on more than one node… and is therefore distributed.
Ergo, you have to choose:
–Consistency over availability
–Availability over consistency

24 Granularity
Real systems comprise many operations:
–“Add book to cart”
–“Pay for the book”
Each has different properties.
It’s a spectrum, not a binary choice!
(Spectrum: Consistency ←→ Availability, with checkout at the consistency end and the shopping cart at the availability end.)

25 CAP IN THE REAL WORLD
Kyle “Aphyr” Kingsbury – breaking consistency guarantees since 2013

26 PostgreSQL
Traditional RDBMS:
–Transactional
–ACID compliant
Primarily a CP system:
–Writes against a master node
“Not a distributed system” – except with a client at play!

27 PostgreSQL
Writes are a simplified 2PC:
–Client votes to commit
–Server validates the transaction
–Server stores the changes
–Server acknowledges the commit
–Client receives the acknowledgement

28 PostgreSQL
But what if the ack is never received?
The commit is already stored… but the client has no indication!
The system is in an inconsistent state.

29 PostgreSQL
Let’s experiment!
5 clients write to a PostgreSQL instance; we then drop the server from the network.
Results:
–1000 writes
–950 acknowledged
–952 survivors

30 So what can we do?
1. Accept false negatives
–May not be acceptable for your use case!
2. Use idempotent operations
3. Apply unique transaction IDs
–Query the state after the partition is resolved (see the sketch below)
These strategies apply to any RDBMS.
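A sketch of strategy 3 (illustrative only – sqlite3 stands in for “any RDBMS”, and the table and column names are made up): the client generates the transaction ID before writing, so a lost acknowledgement can be resolved by querying once the partition heals.

import sqlite3, uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (tx_id TEXT PRIMARY KEY, book TEXT)")

tx_id = str(uuid.uuid4())      # generated by the client *before* the write

try:
    conn.execute("INSERT INTO purchases (tx_id, book) VALUES (?, ?)", (tx_id, "dune"))
    conn.commit()
    # ...imagine the ack is lost right here: the client has no indication...
except sqlite3.OperationalError:
    pass                        # connection dropped mid-flight

# After the partition is resolved, the transaction ID answers
# “did I purchase the book or not?” without blindly retrying the write.
row = conn.execute("SELECT 1 FROM purchases WHERE tx_id = ?", (tx_id,)).fetchone()
print("committed" if row else "safe to retry")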

31 MongoDB
A document-oriented database.
Availability/scale via replica sets:
–Client writes to a master node
–Master replicates writes to n replicas
User-selectable consistency guarantees.

32 MongoDB
When a partition occurs:
–If the master is in the minority, it is demoted
–The majority promotes a new master…
–… selected by the highest optime

33 MongoDB
The cluster “heals” after partition resolution:
–The “old” master rejoins the cluster
–Acknowledged minority writes are reverted!

34 MongoDB
Let’s experiment!
Set up a 5-node MongoDB cluster; 5 clients write to the cluster.
We then partition the cluster… and restore it to see what happens.

35 MongoDB
With write concern unacknowledged:
–The server does not ack writes (except at the TCP level)
–The default prior to November 2012
Results:
–6000 writes
–5700 acknowledged
–3319 survivors
–42% data loss!

36 MongoDB
With write concern acknowledged:
–The server acknowledges writes (after store)
–The default guarantee
Results:
–6000 writes
–5900 acknowledged
–3692 survivors
–37% data loss!

37 MongoDB
With write concern replica-acknowledged:
–The client specifies a minimum number of replicas
–The server acks after writing to those replicas
Results:
–6000 writes
–5695 acknowledged
–3768 survivors
–33% data loss!

38 MongoDB
With write concern majority:
–For an n-node cluster, requires acknowledgement from a majority of nodes (more than n/2)
–Also called “quorum”
Results:
–6000 writes
–5700 acknowledged
–5701 survivors
–2 false positives :-(
–No data loss

39 So what can we do?
1. Keep calm and carry on
–As Aphyr puts it, “not all applications need consistency”
–Have a reliable backup strategy
–… and make sure you drill restores!
2. Use write concern majority (see the sketch below)
–And take the performance hit
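A sketch of strategy 2 with the pymongo driver (the connection string, database and collection names are placeholders, and a running replica set is assumed): opting in to write concern majority so that acknowledged writes survive a partition.

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client.shop.get_collection(
    "orders", write_concern=WriteConcern(w="majority", wtimeout=5000)
)

# The insert is acknowledged only once a majority of nodes hold the write,
# so it cannot be rolled back when a partitioned minority master rejoins.
orders.insert_one({"book": "dune", "qty": 1})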

40 The prime suspects
Aphyr’s Jepsen tests include:
–Redis
–Riak
–ZooKeeper
–Kafka
–Cassandra
–RabbitMQ
–etcd (and Consul)
–Elasticsearch
If you’re considering any of them, go read his posts.
In fact, go read his posts regardless: http://aphyr.com/tags/jepsen

41

42 Immutable Data
Immutable (adj.): “Unchanging over time or unable to be changed.”
Meaning:
–No deletes
–No updates
–No merge conflicts
–Replication is trivial

43 Idempotence
An idempotent operation can be applied one or more times with the same effect.
Enables retries.
Not always possible:
–Side effects are key
–Consider: payments
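A toy illustration (made-up names, not a real payment API) of why idempotence enables retries: replaying the keyed operation is harmless, while replaying the blind one is not.

balance = {"alice": 100}
ledger = {}

def charge(amount):
    # NOT idempotent: applying it twice (e.g. retrying after a lost ack) double-charges
    balance["alice"] -= amount

def record_payment(payment_id, amount):
    # Idempotent: keyed by a payment ID, so a retry has no additional effect
    ledger[payment_id] = amount

charge(10); charge(10)          # a blind retry
print(balance["alice"])         # 80 – Alice paid twice

record_payment("pay-42", 10)
record_payment("pay-42", 10)    # safe to retry
print(ledger)                   # {'pay-42': 10}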

44 Eventual Consistency
A design which prefers availability… but guarantees that clients will eventually see consistent reads.
Consider git:
–Always available locally
–Converges via push/pull
–Human conflict resolution

45 Eventual Consistency
The system expects data to diverge… and includes mechanisms to regain convergence:
–Partial ordering to minimize conflicts
–A merge function to resolve conflicts

46 Vector Clocks
A technique for partial ordering:
–Each node has a logical clock
–The clock increases on every write
–Track the last observed clocks for each item
–Include this vector on replication
When neither the observed nor the inbound vector descends from the other, we have a conflict.
This lets us know when history diverged.

47 Vector Clocks: Example 1
1 “Why Vector Clocks Are Hard”, Justin Sheehy on the Basho blog
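Since the example figure did not survive the transcript, here is a minimal vector-clock sketch (node names and function names are illustrative): per-node counters, a dominance check, and a comparison that flags diverged histories as conflicts.

def increment(clock, node):
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if clock a has observed everything clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a): return "equal"
    if descends(a, b): return "a after b"
    if descends(b, a): return "b after a"
    return "conflict"            # neither descends from the other: history diverged

base  = increment({}, "node1")          # {'node1': 1}
left  = increment(base, "node1")        # node1 writes again
right = increment(base, "node2")        # node2 writes concurrently

print(compare(left, base))              # a after b – an ordinary causal update
print(compare(left, right))             # conflict – a merge function must decide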

48 CRDTs
Commutative Replicated Data Types 1
A CRDT is a data structure that:
–Eventually converges to a consistent state
–Guarantees no conflicts on replication
1 “A comprehensive study of Convergent and Commutative Replicated Data Types”, Shapiro et al.

49 CRDTs
CRDTs provide specialized semantics:
–G-Counter: a monotonically increasing counter
–PN-Counter: also supports decrements
–G-Set: a set that only supports adds
–2P-Set: supports removals, but only once per element
OR-Sets are particularly useful:
–Keep track of both additions and removals
–Can be used for shopping carts
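A minimal G-Counter sketch (the simplest of the types above; replica names are illustrative): each replica increments only its own slot, and merging takes the element-wise maximum, so all replicas converge regardless of delivery order.

def g_increment(counter, replica, amount=1):
    counter = dict(counter)
    counter[replica] = counter.get(replica, 0) + amount
    return counter

def g_merge(a, b):
    # Element-wise max is commutative, associative and idempotent,
    # which is what guarantees conflict-free convergence.
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def g_value(counter):
    return sum(counter.values())

# Two replicas increment independently while partitioned...
a = g_increment(g_increment({}, "replica-a"), "replica-a")
b = g_increment({}, "replica-b")

# ...and converge to the same total once they exchange state, in either order.
print(g_value(g_merge(a, b)), g_value(g_merge(b, a)))   # 3 3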

50 Questions? Complaints?

51 WE’RE DONE HERE!
Thank you for listening.
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Aphyr’s “Call Me Maybe” blog posts: http://aphyr.com/tags/jepsen

