
Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell.


1 Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell

2 Definitions
Byzantine fault \'biz-ən-tēn folt\ n (1982): a failure of a system component that produces arbitrary behavior
Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006): methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
BFI \bē-ef-'ī\ n (2006): Byzantine fault isolation
Farsite \'fär-sīt\ n (2000): serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs

3 Talk Outline
Context – Farsite system
Why BFT doesn't scale
Farsite's use of multiple BFT groups
The need for isolating Byzantine faults
Formal system specification
BFI in Farsite

4 Farsite System
[Figure: desktop machines labeled client and server]

5 Farsite System – Metadata
[Figure: clients and a BFT group that manages the metadata for users]

6 Farsite System – Metadata
[Figure: clients and a BFT group that manages the metadata for users]
Using a Byzantine agreement protocol, assign sequence numbers to messages
– Prepare-commit among 2T + 1 servers
– T = tolerable faults, R = count of replicas, R > 3T
Deterministically update metadata
Reply to client
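The constraint R > 3T and the 2T + 1 prepare-commit quorum can be tabulated for the group sizes that appear later in the talk (4, 7, and 10 members). This is a minimal sketch, assuming each group is sized R = 3T + 1 (true for 4, 7, and 10), so the largest tolerable T is (R - 1) // 3; the function names are illustrative.

```python
def max_faults(replicas: int) -> int:
    """Largest T satisfying R > 3T, i.e. R >= 3T + 1."""
    return (replicas - 1) // 3


def quorum(replicas: int) -> int:
    """Servers that must take part in prepare-commit: 2T + 1 (assumes R = 3T + 1)."""
    return 2 * max_faults(replicas) + 1


for r in (4, 7, 10):
    print(f"R={r}: tolerates T={max_faults(r)}, quorum={quorum(r)}")
# R=4: tolerates T=1, quorum=3
# R=7: tolerates T=2, quorum=5
# R=10: tolerates T=3, quorum=7
```

Tolerating one more fault costs three more replicas, which is why simply enlarging a group is an expensive way to buy fault tolerance.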

7 The Cost of BFT Groups
                  single server    4-member BFT group
computation       ×1               ×4
messages          2                32
message delays    2                5
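The slides do not say how the message counts were tallied. One accounting that reproduces them is a PBFT-style normal-case exchange (request, pre-prepare, prepare, commit, reply), assuming every replica, including the primary, multicasts prepare and commit to the other R - 1 replicas. Under that assumption, the sketch below gives 32 messages for a 4-member group and 200 for a 10-member group, and five message delays versus two for a single server. The function and constant names are hypothetical.

```python
# Assumed PBFT-style phases; each phase adds one message delay.
PHASES = ("request", "pre-prepare", "prepare", "commit", "reply")


def normal_case_messages(r: int) -> int:
    """Messages per operation among r replicas, under the accounting assumed above."""
    request = 1              # client -> primary
    pre_prepare = r - 1      # primary -> each backup
    prepare = r * (r - 1)    # assumed: every replica -> every other replica
    commit = r * (r - 1)     # every replica -> every other replica
    reply = r                # every replica -> client
    return request + pre_prepare + prepare + commit + reply


print(normal_case_messages(4))   # 32
print(normal_case_messages(10))  # 200
print(len(PHASES))               # 5
```

The quadratic prepare and commit terms dominate, so per-operation cost grows much faster than group size.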

8 Throughput vs. Scale
[Plot: throughput multiple (0 to 7) vs. machine count (1 to 7) for ideal, typical, flat, and BFT scaling]

9 Workload Sharing
[Figure: workload shared between client and server]

10 BFT at Scale

11 Multiple BFT Groups

12 Tree of BFT Groups

13 [Figure: namespace tree rooted at /, with entries including users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, and bin]

14 Delegation to New Group
[Figure: the same namespace tree, with part of it delegated to a new BFT group]

15 Pathname Resolution
[Figure: the same namespace tree, resolving the path /users/Alice/code/C#/bar]
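The tree-of-groups idea from slides 12 through 15 can be modeled with a small mapping from path prefixes to the BFT group that manages each subtree: delegation records a new owner for a subtree, and resolution walks the path from the root, collecting each group it has to consult. This is a hypothetical sketch, not Farsite's protocol; the group names `G0`, `G1`, `G2`, the class `Namespace`, and the chosen delegation points are illustrative assumptions.

```python
def split(path: str) -> tuple:
    """'/users/Alice' -> ('users', 'Alice'); '/' -> ()."""
    return tuple(part for part in path.split("/") if part)


class Namespace:
    """Hypothetical model: which BFT group manages which subtree."""

    def __init__(self, root_group: str):
        # Path prefix (tuple of components) -> group managing that subtree.
        self.owners = {(): root_group}

    def delegate(self, path: str, group: str) -> None:
        """Hand the subtree rooted at `path` to a new group."""
        self.owners[split(path)] = group

    def resolve(self, path: str) -> list:
        """Groups consulted, root first; the last one manages the named file."""
        parts = split(path)
        chain = []
        for i in range(len(parts) + 1):
            group = self.owners.get(parts[:i])
            if group is not None and (not chain or chain[-1] != group):
                chain.append(group)
        return chain


ns = Namespace("G0")
ns.delegate("/users", "G1")
ns.delegate("/users/Alice/code", "G2")
print(ns.resolve("/users/Alice/code/C#/bar"))  # ['G0', 'G1', 'G2']
print(ns.resolve("/public"))                   # ['G0']
```

Each lookup depends on every group along the chain, which is why a single faulty group near the root could affect many files unless faults are isolated.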

16 Machine Failures at Scale

17 Group Failures at Scale

18 System Failure at Scale

19 Quantitative Fault Analysis
Example system
– File system distributed among interacting BFT groups
Simplifying assumptions
– Files are partitioned evenly among BFT groups
– Machine failures are independent
Machine fault probability = 0.001
Evaluate: operational fault rate
– Probability that an operation on a randomly selected file exhibits a fault

20 Operational Faults vs. System Scale
[Plot: operational fault rate (log scale, 10^-7 to 10^0) vs. system scale (count of BFT groups, 1 to 100,000, log scale) for BFT 4, BFT 7, and BFT 10 without BFI, and for BFT 4 with ideal BFI, tree (4) BFI, and tree (16) BFI; annotated values: 6 × 10^-6, 0.45, 6 × 10^-6, 3 × 10^-5]
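Under slide 19's assumptions (independent machine faults with probability 0.001), a binomial model recovers two of the plot's annotated values. This is a sketch with stated assumptions. A group of R members is taken to exhibit a fault when more than T = (R - 1) // 3 members fail. Without BFI, an operation is assumed faulty if any of the G groups is faulty, following the conclusion that a single faulty group can corrupt the whole system. With ideal BFI, only the file's own group matters. The function names are hypothetical, and the tree BFI curves are not modeled.

```python
from math import comb


def group_fault_prob(r: int, p: float) -> float:
    """P(more than T of r independent members fail), with T = (r - 1) // 3."""
    t = (r - 1) // 3
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(t + 1, r + 1))


def op_fault_no_bfi(groups: int, q: float) -> float:
    """Without isolation: the operation is faulty if any group is faulty."""
    return 1 - (1 - q) ** groups


def op_fault_ideal_bfi(q: float) -> float:
    """With ideal isolation: only the file's own group matters."""
    return q


p = 0.001
q4 = group_fault_prob(4, p)
print(f"{q4:.2e}")                            # 5.99e-06
print(f"{op_fault_ideal_bfi(q4):.2e}")        # 5.99e-06
print(f"{op_fault_no_bfi(100_000, q4):.2f}")  # 0.45
```

For a 4-member group the dominant term is C(4, 2)p² ≈ 6 × 10^-6. Without isolation, that small per-group rate compounds to roughly 0.45 across 100,000 groups.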

21 BFI versus no BFI

22
                        4-member BFT groups    10-member BFT groups
                        with BFI               without BFI
computation             ×4                     ×10
messages                32                     200
throughput reduction    60%                    84%

23 BFI via Formal Specification
[Diagram: a semantic spec and a distributed-system spec, each with state and actions, related by refinement; labels "+ faults", "Improved!", and "NEW" mark the specs after Byzantine faults are added]

24 Farsite Semantic Spec
[Figure: user-visible state: a namespace tree (/, code, tools, src, bin, emacs, C++; files a.h, a.cpp, a.exe, a.obj, cl.exe), open handles, and pending operations (open, read, move)]

25 Farsite Distributed-System Spec

26 Farsite Refinement
[Figure: the same namespace tree, open handles, and pending operations (del, read, move)]

27 Actions are State Transitions
[Figure: state consisting of the namespace tree (/, a.cpp), open handles, and pending operations]

28 Proving Refinement Inductively
[Figure: state consisting of the namespace tree (/, a.cpp), open handles, and pending operations]

29 Refinement with Byzantine Faults
[Figure: the namespace tree (/, code, tools, src, bin, emacs, C++; files a.h, a.cpp, a.exe, a.obj, cl.exe), open handles, and pending operations (del, read, move)]

30 Refinement with Byzantine Faults
[Figure: the namespace tree (/, code, tools, src, bin, emacs, C++; files a.h, a.cpp, a.exe, a.obj, cl.exe), open handles, and pending operations (del, read, move)]

31 Semantic Fault Specification
[Figure: namespace tree (/, code, tools, src, bin, emacs, C++, foo, bar; files a.h, a.cpp, a.exe, a.obj, cl.exe) in which a tainted file's contents ("Hello world") are overwritten with garbage]
Safety
– A tainted file may have arbitrary contents and attributes
– A tainted file may appear not to be linked into the namespace
– A tainted file may pretend not to have children it actually has
– A tainted file may pretend to have children that do not exist
– A tainted file may pretend another tainted file is a child or parent
Liveness
– Operations involving a tainted file may not complete

32 Distributed-System Improvements
Maintain redundant info across BFT group boundaries
Augment messages with info that justifies correctness
Ensure unambiguous chains of authority over data
Carefully order messages and state updates for operations involving multiple BFT groups

33 Summary of BFI Methodology
Formally specify your system
– Semantic spec: user's view of system
– Distributed-system spec: designer's view of system
– Refinement interprets distributed-system spec in semantic terms
Modify distributed-system spec to express Byzantine faults
Simultaneously
– Strategically weaken semantic spec to describe faults
– Improve distributed-system spec to quarantine faults
Refinement lets you know when you are done
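The methodology above can be illustrated with a toy model checked by brute force. Everything in the sketch is hypothetical: the two-file namespace, `OWNER`, `FAULTY_GROUPS`, and all helper names are illustrative assumptions, not Farsite's TLA+ specification. Exhaustively exploring a tiny state space also stands in for the inductive refinement proof the slides describe. The "distributed-system spec" adds a message-delivery step that the refinement mapping hides. A Byzantine group may overwrite files it can reach. The check shows three things: the strict semantic spec cannot describe the fault; a spec weakened so that tainted files may change arbitrarily does describe it; and that weakened spec stops holding once a faulty group can reach a file it does not own.

```python
from collections import deque

FILES = ("a", "b")   # hypothetical files, indexed 0 and 1
OWNER = (0, 1)       # group OWNER[i] stores file i
FAULTY_GROUPS = {1}  # assume group 1 is Byzantine
VALUES = (0, 1)


def with_value(store, f, v):
    """Copy of the store tuple with file index f set to v."""
    new = list(store)
    new[f] = v
    return tuple(new)


def concrete_steps(state, faulty_reach):
    """Successors of (store, pending op, delivered flag) in the toy distributed spec."""
    store, op, delivered = state
    if op is None:
        for f in range(len(FILES)):
            for v in VALUES:
                yield (store, (f, v), False)              # client issues a write
    elif not delivered:
        yield (store, op, True)                            # request reaches the owning group
    else:
        f, v = op
        yield (with_value(store, f, v), None, False)       # group applies and replies
    for f in faulty_reach:
        for v in VALUES:
            yield (with_value(store, f, v), op, delivered)  # Byzantine group writes garbage


def abstract(state):
    """Refinement mapping: users see contents and the pending op, not delivery."""
    store, op, _ = state
    return (store, op)


def semantic_step_ok(old, new, tainted):
    """Is old -> new a stutter or a legal step of the (possibly weakened) semantic spec?"""
    (c0, p0), (c1, p1) = old, new
    if old == new:
        return True                                        # stutter
    if p0 is None and p1 is not None:
        return c0 == c1                                    # issue an operation
    if p0 is not None and p1 is None:
        f, v = p0
        return c1 == with_value(c0, f, v)                  # complete the operation
    changed = {FILES[i] for i in range(len(FILES)) if c0[i] != c1[i]}
    return p0 == p1 and changed <= tainted                 # tainted files may change freely


def refines(tainted, faulty_reach=None):
    """Check every reachable concrete step against the semantic spec."""
    if faulty_reach is None:
        # Isolated design: a faulty group can touch only the files it stores.
        faulty_reach = [f for f, g in enumerate(OWNER) if g in FAULTY_GROUPS]
    init = ((0, 0), None, False)
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        for t in concrete_steps(s, faulty_reach):
            if not semantic_step_ok(abstract(s), abstract(t), tainted):
                return False
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True


print(refines(frozenset()))                             # False: strict spec
print(refines(frozenset({"b"})))                        # True: weakened spec, isolated fault
print(refines(frozenset({"b"}), faulty_reach=[0, 1]))   # False: fault leaks into "a"
```

The third check plays the role of "refinement lets you know when you are done". While it fails, either the semantic spec must be weakened further or the distributed design must be changed to keep the fault quarantined.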

34 Conclusions
BFT groups have negative throughput scaling
Scalable systems can be built from multiple BFT groups
System scale increases the probability of non-maskable Byzantine faults
If faults are not isolated, a single faulty group can corrupt the entire system
BFI is a methodology for isolating Byzantine faults
BFI uses formal system specification
BFI improves fault tolerance without hurting throughput, unlike increasing BFT group size

35 Contact Information
JohnDo@microsoft.com
Howell@microsoft.com
http://research.microsoft.com/farsite

36 Backup Slides

37 Farsite Spec Stats
Semantic specification
– 1800 lines of TLA+
– 114 definitions
Distributed-system specification
– 11,500 lines of TLA+
– 775 definitions
Why so big?
– Windows file-system semantics are complex
– Scalability and strong consistency
– Byzantine fault isolation

