Prophecy: Using History for High-Throughput Fault Tolerance


1 Prophecy: Using History for High-Throughput Fault Tolerance
Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University

2 Non-crash failures happen
Model as Byzantine (malicious), assuming independent failures. Non-crash failures are hard to model; the Byzantine model is one way to capture them.

3 Mask Byzantine faults
Given this failure model, we then try to mask faults so the client doesn't see them. [Diagram: clients → service]

4 Mask Byzantine faults
Replication masks faults, but throughput suffers. [Diagram: clients → replicated service]

5 Mask Byzantine faults
The replicated service should also provide linearizability (strong consistency). [Diagram: clients → replicated service]

6 Byzantine fault tolerance (BFT)
Problems with existing BFT systems:
- Low throughput
- Modifies clients
- Long-lived sessions

7 Prophecy
High throughput + good consistency. No free lunch:
- Read-mostly workloads
- Slightly weakened consistency

8 Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
D-Prophecy and Prophecy address these problems.

9 Traditional BFT reads
Every replica executes the read against the application, and the client checks that the responses agree. [Diagram: clients → replica group; Agree?]

10 A cache solution
Put a cache in front of the application at each replica. [Diagram: clients → replica group (cache + application); Agree?]

11 A cache solution
Problems: huge cache, invalidation. The cache can become as large as the application's disk. [Diagram: clients → replica group; Agree?]

12 A compact cache
Cache request/response pairs:
req1 → resp1
req2 → resp2
req3 → resp3
[Diagram: clients → replica group]

13 A compact cache
Cache sketches of requests and responses instead of full values:
sketch(req1) → sketch(resp1)
sketch(req2) → sketch(resp2)
sketch(req3) → sketch(resp3)
[Diagram: clients → replica group]
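The compact cache of sketches can be sketched in a few lines. This is an illustrative stand-in, not the paper's code: the choice of hash function and the 8-byte truncation are assumptions.

```python
import hashlib

def sketch(data: bytes) -> bytes:
    # Compact digest standing in for the sketch function
    # (truncated SHA-256 here; the real hash choice is an assumption).
    return hashlib.sha256(data).digest()[:8]

# The compact cache maps sketch(request) -> sketch(response),
# so its size is independent of how large requests and responses are.
cache: dict = {}

def record(request: bytes, response: bytes) -> None:
    # Remember the (sketched) response observed for this request.
    cache[sketch(request)] = sketch(response)

record(b"GET /index.html", b"<html>...</html>")
```

Because only fixed-size digests are stored, the cache stays small no matter how large the cached webpages are.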

14 A sketcher
Each replica runs a sketcher in front of the application. That solves the big-cache problem; to solve the invalidation problem, we make sure to get the full response from one replica. [Diagram: clients → replica group (sketcher + application)]

15 A sketcher
[Diagram: clients → replica group; the sketchers hold sketches, one replica returns the full webpage]

16 Executing a read
Fast, load-balanced reads. Suppose the responses don't match: how could this happen? [Diagram: clients → replica group; sketch vs. webpage; Agree?]

17 Executing a read
[Diagram: clients → replica group; sketch vs. webpage; Agree?]

18 Executing a read
The sketcher acts as a key-value store; the replica group is a replicated state machine. [Diagram: clients → replica group]

19 Executing a read
Maintain a fresh cache. [Diagram: clients → replica group; sketch vs. webpage; Agree?]
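The read path above can be sketched as follows. This is a hypothetical simplification (one randomly chosen replica per fast read, and a majority vote standing in for full BFT agreement), not the system's exact protocol:

```python
import hashlib
import random

def sketch(data: bytes) -> bytes:
    # Stand-in sketch function (truncated SHA-256; an assumption).
    return hashlib.sha256(data).digest()[:8]

cache = {}  # sketch(request) -> sketch(response)

def fast_read(request, replicas):
    # Fast path: ask ONE replica, and accept its response only if its
    # sketch matches the cached sketch.
    replica = random.choice(replicas)
    response = replica(request)
    if cache.get(sketch(request)) == sketch(response):
        return response  # fast, load-balanced read
    # Slow path: full agreement across all replicas
    # (majority vote here, a simplified stand-in for BFT agreement).
    responses = [r(request) for r in replicas]
    agreed = max(set(responses), key=responses.count)
    cache[sketch(request)] = sketch(agreed)  # maintain a fresh cache
    return agreed
```

A mismatch triggers the slow path and refreshes the stored sketch, which is how the cache stays fresh without an explicit invalidation protocol.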

20 Did we achieve linearizability?
NO!

21 Executing a read
[Diagram: clients → replica group; sketch vs. webpage]

22 Executing a read
[Diagram: clients → replica group; sketch vs. webpage; Agree?]

23 Executing a read
Fast reads may be stale. [Diagram: clients → replica group]

24 Load balancing
Pr(k stale) = g^k. [Diagram: clients → replica group; Agree?]
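Reading the formula off the slide: if each fast read independently lands on a stale replica with probability g (the independence assumption and the interpretation of g are ours), then k consecutive fast reads are all stale with probability g^k:

```python
def prob_k_stale(g: float, k: int) -> float:
    # Probability that k consecutive load-balanced fast reads are ALL
    # stale, assuming each is independently stale with probability g.
    return g ** k

# Even a modest g shrinks quickly: with g = 0.1, three consecutive
# fast reads are all stale only about one time in a thousand.
```

This is why load-balancing reads across replicas drives the chance of repeated staleness toward zero.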

25 D-Prophecy vs. BFT
Traditional BFT: each replica executes the read; linearizability.
D-Prophecy: one replica executes the read; "delay-once" linearizability.
We call this system D-Prophecy.

26 Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
We've solved the throughput problem for read requests at the cost of slightly weakened consistency, but the other two problems remain, and they are bad for Internet services.

27 Key-exchange overhead
With sessions of only 8 requests, throughput drops to 1/10 of its maximum. [Chart: key-exchange overhead; 11%, 3%]

28 Internet services
Applying the current design to Internet services leaves two problems: high per-session overhead and modified clients. Both can be solved by placing a proxy between the clients and the replica group. [Diagram: clients → replica group]

29 A proxy solution
Consolidate the per-replica sketchers into a single proxy sketcher. [Diagram: clients → proxy sketcher → replica group]

30 A proxy solution
The sketcher must be fail-stop. [Diagram: clients → trusted sketcher → replica group]

31 A proxy solution
The sketcher must be fail-stop, but we already trust middleboxes, and the sketcher is small and simple: our implementation required fewer than 3000 LOC. [Diagram: clients → trusted sketcher → replica group]

32 Prophecy: executing a read
Fast, load-balanced reads. [Diagram: a client sends request q to the trusted sketcher, which checks the replica's response against the stored sketch s(q)]

33 Prophecy
Fast reads may be stale. [Diagram: clients → trusted sketcher → replica group]

34 Delay-once linearizability
No assumptions about organization of system state or consistency model of replica group

35 Delay-once linearizability
Read-after-write property. Example history: W, R, W, W, R, R, W, R. No assumptions about the organization of system state or the consistency model of the replica group.


37 Example application
Uploading embarrassing photos:
1. Remove colleagues from ACL
2. Upload photos
3. (Refresh)
Weak consistency may reorder these updates; delay-once preserves their order. (Eventual consistency: updates may appear in different orders at different replicas.)
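A tiny illustration of why order preservation matters here. The model is hypothetical: a delay-once read sees some prefix of the ordered write history, while a weakly consistent read may see the writes applied in either order.

```python
from itertools import permutations

writes = ["remove colleagues from ACL", "upload photos"]

# Delay-once: a (possibly stale) read reflects a PREFIX of the
# ordered write history -- never a reordering.
delay_once_views = [writes[:i] for i in range(len(writes) + 1)]

# Weak/eventual consistency: a replica might apply the writes in
# either order, so a read could observe the photos while the ACL
# change has not yet been applied.
weak_views = [list(p)[:i] for p in permutations(writes)
              for i in range(len(writes) + 1)]

dangerous = ["upload photos"]  # photos visible, ACL not yet updated
assert dangerous not in delay_once_views
assert dangerous in weak_views
```

Under delay-once the colleagues may briefly see an older page, but never the photos without the ACL change.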

38 Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
D-Prophecy and Prophecy address these problems.

39 Implementation
Modified PBFT (stable and complete, competitive with Zyzzyva et al.); C++ with Tamer async I/O.
- Sketcher: 2000 LOC
- PBFT library: 1140 LOC
- PBFT client: 1000 LOC

40 Evaluation
- Prophecy vs. proxied-PBFT (proxied systems)
- D-Prophecy vs. PBFT (non-proxied systems)

41 Evaluation
Prophecy vs. proxied-PBFT (proxied systems). We will study:
- Performance on "null" workloads
- Performance with a real replicated service
- Where the system bottlenecks, and how to scale

42 Basic setup
[Diagram: 100 clients → concurrent sketcher → PBFT replica group]

43 Fraction of failed fast reads
Alexa top sites: < 15%. A transition ratio of 1 means every read is preceded by a write; a transition ratio of 0 means reads are only preceded by other reads.

44 Small benefit on null reads

45 Apache webserver setup
[Diagram: clients → sketcher → replica group running Apache]

46 Large benefit on real workload
Concurrent reads of a 1-byte webpage. [Chart: 3.7x and 2.0x throughput improvements]

47 Benefit grows with work
94s (Apache) Null workloads are misleading!

48 Benefit grows with work

49 Single sketcher bottlenecks
Concurrent reads of a 1-byte webpage

50 Scaling out
The first tier does load balancing; the second tier does consistent hashing. This is no more complex than how you'd scale out and partition a system today.
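The slide only names the technique; as a sketch of what the second tier might look like, here is a minimal consistent-hash ring (the node names, virtual-node count, and hash choice are all made up for illustration):

```python
import hashlib
from bisect import bisect

class Ring:
    """Minimal consistent-hash ring for partitioning sketch state
    across second-tier sketchers (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` points on the ring for smoother balance.
        self._ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash owns the key.
        i = bisect(self._points, self._h(key)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["sketcher-a", "sketcher-b", "sketcher-c"])
```

Adding or removing a sketcher moves only the keys adjacent to its ring points, which is what makes scale-out incremental.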

51 Scales linearly with replicas
Concurrent reads of a 1-byte webpage

52 Summary
Prophecy is good for Internet services: fast, load-balanced reads. D-Prophecy is good for traditional services. Prophecy scales linearly while PBFT stays flat.
Limitations:
- Read-mostly workloads (measurement study corroborates)
- Delay-once linearizability (useful for many apps)

53 Thank You

54 Additional slides

55 Transitions Prophecy good for read-mostly workloads
Are transitions rare in practice?

56 Measurement study Alexa top sites
Access main page every 20 sec for 24 hrs

57 Mostly static content

58 Mostly static content
[Chart: < 15% of transitions differ]

59 Dynamic content
Rabin fingerprinting on transitions: 43% differ by a single contiguous change ("change" = insertion/deletion/substitution, i.e. edit distance 1). Sampling 4000 of these, over half were due to:
- Load-balancing directives
- Random IDs in links and function parameters
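Fingerprinting detects such localized changes by hashing sliding windows of the page. The polynomial rolling hash below is a simplified stand-in for Rabin fingerprinting (the constants are arbitrary; the study used real Rabin fingerprints):

```python
BASE, MOD = 257, (1 << 61) - 1  # arbitrary constants for illustration

def window_fingerprints(data: bytes, w: int) -> list:
    """Fingerprint every w-byte window of `data` in O(len(data)) time
    by rolling the hash forward one byte at a time."""
    if len(data) < w:
        return []
    fp = 0
    for b in data[:w]:
        fp = (fp * BASE + b) % MOD
    out = [fp]
    top = pow(BASE, w - 1, MOD)  # weight of the byte leaving the window
    for i in range(w, len(data)):
        fp = ((fp - data[i - w] * top) * BASE + data[i]) % MOD
        out.append(fp)
    return out
```

Two page versions that differ by one contiguous edit share almost all of their window fingerprints, which is how single-change transitions can be isolated cheaply.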

