Towards Automatically Checking Thousands of Failures with Micro-Specifications Haryadi S. Gunawi, Thanh Do †, Pallavi Joshi, Joseph M. Hellerstein, Andrea.

1 Towards Automatically Checking Thousands of Failures with Micro-Specifications
Haryadi S. Gunawi, Thanh Do †, Pallavi Joshi, Joseph M. Hellerstein, Andrea C. Arpaci-Dusseau †, Remzi H. Arpaci-Dusseau †, Koushik Sen
University of California, Berkeley
† University of Wisconsin, Madison

2 Cloud Era
Solve bigger human problems
Use clusters of thousands of machines

3 Failures in the Cloud
“The future is a world of failures everywhere” - Garth Gibson
“Recovery must be a first-class operation” - Raghu Ramakrishnan
“Reliability has to come from the software” - Jeffrey Dean

6 Why Is Failure Recovery Hard?
Testing is not advanced enough against complex failures
–Diverse, frequent, and multiple failures
–Facebook photo loss
Recovery is underspecified
–Need to specify failure recovery behaviors
–Customized well-grounded protocols
Example: Paxos Made Live – An Engineering Perspective [PODC ’07]

7 Our Solutions
FTS (“FATE”) – Failure Testing Service
–New abstraction for failure exploration
–Systematically exercise 40,000 unique combinations of failures
DTS (“DESTINI”) – Declarative Testing Specification
–Enable concise recovery specifications
–We have written 74 checks (3 lines / check)
Note: Names have changed since the paper

8 Summary of Findings
Applied FATE and DESTINI to three cloud systems: HDFS, ZooKeeper, Cassandra
Found 16 new bugs
Reproduced 74 bugs
Problems found
–Inconsistency
–Data loss
–Rack awareness broken
–Unavailability

9 Outline: Introduction, FATE, DESTINI, Evaluation, Summary

10 [Figure: write pipeline among nodes M, C, 1, 2, 3 (and 4) with failures X1, X2, X3. Stage labels: Alloc. Req., Setup Stage, Data Transfer Stage. Panels: No failures; Setup Stage Recovery: recreate fresh pipeline; Data Transfer Stage Recovery: continue on surviving nodes; Bug in Data Transfer Stage Recovery.]
Failures at DIFFERENT STAGES lead to DIFFERENT FAILURE BEHAVIORS
Goal: Exercise different failure recovery paths

11 FATE
A failure injection framework
–Targets I/O points
–Systematically explores failures
–Supports multiple failures
New abstraction of failure scenario
–Remember injected failures
–Increase failure coverage

12 Failure ID
Fields             Values
Static
  Func. call       OutputStream.read()
  Source file      BlockReceiver.java
Dynamic
  Stack trace      …
Domain specific
  Source           Node 2
  Destination      Node 3
  Net. message     Data Packet
Failure type       Crash After
Hash

13 How Do Developers Build a Failure ID?
FATE intercepts all I/Os
Use AspectJ to collect information at every I/O point
–I/O buffers (e.g., file buffer, network buffer)
–Target I/O (e.g., file name, IP address)
Reverse engineer for domain-specific information
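As a rough illustration of the failure ID abstraction, the sketch below groups the static, domain-specific, and failure-type fields from the Failure ID slide into one record and derives a stable hash from them. The class name `FailureID`, its field names, and the use of SHA-256 are assumptions made for this example, not FATE's actual implementation; the dynamic stack trace field is left out for brevity.

```python
from dataclasses import dataclass, replace
import hashlib


@dataclass(frozen=True)
class FailureID:
    """Hypothetical record mirroring the fields on the Failure ID slide."""
    func_call: str      # static: intercepted I/O call
    source_file: str    # static: file containing the call site
    source: str         # domain specific: sending node
    destination: str    # domain specific: receiving node
    net_message: str    # domain specific: kind of message on the wire
    failure_type: str   # failure to inject at this I/O point

    def hash_id(self) -> str:
        """Short digest of all fields; equal field values give equal hashes."""
        joined = "|".join([self.func_call, self.source_file, self.source,
                           self.destination, self.net_message,
                           self.failure_type])
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


fid = FailureID("OutputStream.read()", "BlockReceiver.java",
                "Node 2", "Node 3", "Data Packet", "Crash After")
same = FailureID("OutputStream.read()", "BlockReceiver.java",
                 "Node 2", "Node 3", "Data Packet", "Crash After")
other = replace(fid, destination="Node 1")  # same call site, different target
```

Hashing the whole record is one way a failure service could remember which failures it has already injected: two I/O events with identical fields collapse to the same ID, while a change in any field (here, the destination) yields a new one.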

14 Failure ID (same table as slide 12)

15 Exploring Failure Space
[Figure: failure points A, B, C along the pipeline among nodes M, C, 1, 2, 3. Single failures: Exp #1: A, Exp #2: B, Exp #3: C. Failure combinations: AB, AC, BC.]
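The exploration on this slide can be approximated by enumerating single failures first and then pairs. The sketch below is a simplified, static enumeration under the assumption that all failure points are known up front; the function name `failure_experiments` is hypothetical, and a real failure service may instead discover further failure points only after injecting an earlier one.

```python
from itertools import combinations


def failure_experiments(failure_points, max_failures):
    """Yield every combination of 1..max_failures distinct failure points."""
    for k in range(1, max_failures + 1):
        for combo in combinations(failure_points, k):
            yield combo


# Failure points A, B, C as labeled on the slide, with up to two failures.
experiments = list(failure_experiments(["A", "B", "C"], 2))
```

With three failure points and at most two failures per run, this produces the three single-failure experiments followed by the pairs AB, AC, and BC.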

16 Outline: Introduction, FATE, DESTINI, Evaluation, Summary

17 DESTINI
Enable concise recovery specifications
Check if expected behaviors match actual behaviors
Important elements:
–Expectations
–Facts
–Failure Events
–Check Timing
Interpose network and disk protocols

18 Writing specifications
“Violation if expectation is different from actual facts”
violationTable() :- expectationTable(), NOT-IN actualTable();
Datalog syntax:
:-  derivation
,   AND

19 [Figure: correct recovery vs. incorrect recovery after failure X, among nodes M, C, 1, 2, 3]
expectedNodes(Block, Node): (B, Node 1), (B, Node 2)
actualNodes(Block, Node): (B, Node 1), (B, Node 2)
incorrectNodes(Block, Node): (empty)
incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N);

20 [Figure: correct recovery vs. incorrect recovery after failure X, among nodes M, C, 1, 2, 3]
BUILD EXPECTATIONS
expectedNodes(Block, Node): (B, Node 1), (B, Node 2)
CAPTURE FACTS
actualNodes(Block, Node): (B, Node 1)
incorrectNodes(Block, Node): (B, Node 2)
incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N);
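The incorrectNodes rule reads as a set difference: every expected (block, node) pair that is missing from the actual facts is a violation. A minimal Python sketch of that reading, using the table contents from the two slides above, could look like this; the helper name `not_in` is an assumption for illustration, and DESTINI itself evaluates Datalog rules rather than Python.

```python
def not_in(expected, actual):
    """Rows of `expected` that do not appear in `actual` (Datalog NOT-IN)."""
    return {row for row in expected if row not in actual}


expected_nodes = {("B", "Node 1"), ("B", "Node 2")}

# Correct recovery: both surviving nodes hold the block.
actual_after_correct_recovery = {("B", "Node 1"), ("B", "Node 2")}
# Incorrect recovery: only Node 1 holds the block.
actual_after_incorrect_recovery = {("B", "Node 1")}

no_violation = not_in(expected_nodes, actual_after_correct_recovery)
violation = not_in(expected_nodes, actual_after_incorrect_recovery)
```

Here `no_violation` is empty, while `violation` contains the single pair for Node 2, matching the incorrectNodes table on the slide.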

21 Building Expectations
expectedNodes(B, N) :- getBlockPipe(B, N);
[Figure: Client asks Master “Give me list of nodes for B”; Master replies [Node 1, Node 2, Node 3]]
expectedNodes(Block, Node): (B, Node 1), (B, Node 2), (B, Node 3)

22 Updating Expectation
DEL expectedNodes(B, N) :- fateCrashNode(N), writeStage(B, Stage), Stage == “Data Transfer”, expectedNodes(B, N);
expectedNodes(Block, Node): (B, Node 1), (B, Node 2), (B, Node 3)
writeStage: “Client receives all acks from setup stage” → enter Data Transfer stage
setupAcks (B, Pos, Ack) :- cdpSetupAck (B, Pos, Ack);
goodAcksCnt (B, COUNT ) :- setupAcks (B, Pos, Ack), Ack == ’OK’;
nodesCnt (B, COUNT ) :- pipeNodes (B, , N, );
writeStage (B, Stg) :- nodesCnt (NCnt), goodAcksCnt (ACnt), NCnt == ACnt, Stg := “Data Transfer”;
Precise failure events
→ Different stages → different recovery behaviors → different specifications
→ FATE and DESTINI must work hand in hand
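The stage-dependent update above can be sketched in two steps: derive the write stage from the setup acks, then delete the crashed node from the expectation only when the crash happens in the data transfer stage. The function names `write_stage` and `on_crash`, the `"Setup"` label for the earlier stage, and the dictionary of acks are assumptions for this example; they paraphrase the Datalog rules rather than reproduce DESTINI's evaluation.

```python
def write_stage(pipeline_nodes, setup_acks):
    """Return "Data Transfer" once every pipeline node acked OK, else "Setup".

    Mirrors the writeStage rule: compare the node count with the good-ack count.
    """
    good_acks = sum(1 for ack in setup_acks.values() if ack == "OK")
    return "Data Transfer" if good_acks == len(pipeline_nodes) else "Setup"


def on_crash(expected_nodes, block, crashed_node, stage):
    """Mirror the DEL rule: drop the crashed node only during data transfer."""
    if stage == "Data Transfer":
        return expected_nodes - {(block, crashed_node)}
    return set(expected_nodes)


pipeline = ["Node 1", "Node 2", "Node 3"]
expected = {("B", n) for n in pipeline}

# All three nodes acked the setup, so the write is in the data transfer stage.
stage = write_stage(pipeline, {"Node 1": "OK", "Node 2": "OK", "Node 3": "OK"})
after_crash = on_crash(expected, "B", "Node 3", stage)

# One ack still missing: the write is in the setup stage and this rule does not fire.
early_stage = write_stage(pipeline, {"Node 1": "OK", "Node 2": "OK"})
unchanged = on_crash(expected, "B", "Node 3", early_stage)
```

The same crash therefore changes the expectation in one stage and leaves it untouched in the other, which is why the specification needs to know precisely when the failure was injected.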

23 Capture Facts
actualNodes(B, N) :- blocksLocation(B, N, Gs), latestGenStamp(B, Gs);
blocksLocation(B, N, Gs): (B, Node 1, 2), (B, Node 2, 1), (B, Node 3, 1)
latestGenStamp(B, Gs): (B, 2)
actualNodes(Block, Node): (B, Node 1)
[Figure: correct recovery vs. incorrect recovery after failure X; block replicas labeled B_gs2 and B_gs1]
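The actualNodes rule is a join on block and generation stamp: a node counts as holding the block only if its replica carries the latest stamp. The sketch below applies that join to the table contents on this slide; representing `latestGenStamp` as a dictionary and the function name `actual_nodes` are assumptions made for illustration.

```python
def actual_nodes(blocks_location, latest_gen_stamp):
    """Keep (block, node) pairs whose replica has the block's latest stamp."""
    return {(block, node)
            for (block, node, gen_stamp) in blocks_location
            if latest_gen_stamp.get(block) == gen_stamp}


blocks_location = {("B", "Node 1", 2), ("B", "Node 2", 1), ("B", "Node 3", 1)}
latest_gen_stamp = {"B": 2}

facts = actual_nodes(blocks_location, latest_gen_stamp)
```

Only Node 1 carries generation stamp 2, so it is the only node in the resulting facts; the stale replicas on Node 2 and Node 3 are filtered out.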

24 Violation and Check-Timing
expectedNodes(Block, Node): (B, Node 1), (B, Node 2)
actualNodes(Block, Node): (B, Node 1)
incorrectNodes(Block, Node): (B, Node 2)
incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N), cnpComplete(B);
There is a point in time where recovery is ongoing, thus specifications are violated
Need precise events to decide when the check should be done
–In this example, upon block completion
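Check timing can be pictured as gating the set difference on a completion event, so that a transient mismatch during recovery is not reported. In the sketch below, the set `completed_blocks` stands in for the cnpComplete(B) event and the function name `incorrect_nodes` is hypothetical; both are assumptions for this example.

```python
def incorrect_nodes(expected, actual, completed_blocks):
    """Report missing (block, node) pairs only for blocks already completed."""
    return {(block, node)
            for (block, node) in expected - actual
            if block in completed_blocks}


expected = {("B", "Node 1"), ("B", "Node 2")}
actual = {("B", "Node 1")}

# Recovery still ongoing: block B is not complete, so nothing is reported yet.
during_recovery = incorrect_nodes(expected, actual, completed_blocks=set())
# Block B completed with Node 2 still missing: now it is a violation.
after_completion = incorrect_nodes(expected, actual, completed_blocks={"B"})
```

The facts are identical in both calls; only the presence of the completion event decides whether the mismatch counts as a violation.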

25 Rules
r1  incorrectNodes (B, N) :- cnpComplete (B), expectedNodes (B, N), NOT-IN actualNodes (B, N);
r2  pipeNodes (B, Pos, N) :- getBlkPipe (UFile, B, Gs, Pos, N);
r3  expectedNodes (B, N) :- getBlkPipe (UFile, B, Gs, Pos, N);
r4  DEL expectedNodes (B, N) :- fateCrashNode (N), pipeStage (B, Stg), Stg == 2, expectedNodes (B, N);
r5  setupAcks (B, Pos, Ack) :- cdpSetupAck (B, Pos, Ack);
r6  goodAcksCnt (B, COUNT ) :- setupAcks (B, Pos, Ack), Ack == ’OK’;
r7  nodesCnt (B, COUNT ) :- pipeNodes (B, , N, );
r8  pipeStage (B, Stg) :- nodesCnt (NCnt), goodAcksCnt (ACnt), NCnt == ACnt, Stg := 2;
r9  blkGenStamp (B, Gs) :- dnpNextGenStamp (B, Gs);
r10 blkGenStamp (B, Gs) :- cnpGetBlkPipe (UFile, B, Gs, , );
r11 diskFiles (N, File) :- fsCreate (N, File);
r12 diskFiles (N, Dst) :- fsRename (N, Src, Dst), diskFiles (N, Src, Type);
r13 DEL diskFiles (N, Src) :- fsRename (N, Src, Dst), diskFiles (N, Src, Type);
r14 fileTypes (N, File, Type) :- diskFiles(N, File), Type := Util.getType(File);
r15 blkMetas (N, B, Gs) :- fileTypes (N, File, Type), Type == metafile, Gs := Util.getGs(File);
r16 actualNodes (B, N) :- blkMetas (N, B, Gs), blkGenStamp (B, Gs);
Capture Facts, Build Expectation from IO events
→ No need to interpose internal functions
Specification Reuse
→ For the first check, #rules : #checks is 16:1
→ Overall, #rules : #checks ratio is 3:1

26 Outline: Introduction, FATE, DESTINI, Evaluation, Summary

27 Evaluation
FATE: 3900 lines, DESTINI: 1200 lines
Applied FATE and DESTINI to three cloud systems
–HDFS, ZooKeeper, Cassandra
40,000 unique combinations of failures
Found 16 new bugs, reproduced 74 bugs
74 recovery specifications
–3 lines / check

28 Bugs found
Reduced availability and performance
Data loss due to multiple failures
Data loss in log recovery protocol
Data loss in append protocol
Rack awareness property is broken

29 Conclusion
FATE explores multiple failures systematically
DESTINI enables concise recovery specifications
FATE and DESTINI: a unified framework
–Testing recovery specifications requires a failure service
–Failure service needs recovery specifications to catch recovery bugs

30 Thank you!
QUESTIONS?
The Advanced Systems Laboratory
Berkeley Orders of Magnitude
Download our full TR paper from these websites

31 New Challenges
Exponential growth of multiple failures
–FATE exercised 40,000 failure combinations in 80 hours
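To get a feel for these numbers, the sketch below computes the average time per experiment implied by the slide (80 hours for 40,000 combinations) and counts how many combinations exist as the number of simultaneous failures grows. The figure of 100 failure points is a made-up illustration, not a number from the talk, and the function name `combos_up_to` is hypothetical.

```python
from math import comb


def combos_up_to(points, max_failures):
    """Number of ways to pick 1..max_failures distinct failure points."""
    return sum(comb(points, k) for k in range(1, max_failures + 1))


# Average wall-clock time per experiment implied by the slide's figures.
seconds_per_experiment = 80 * 3600 / 40_000

# Hypothetical system with 100 failure points.
one_failure = combos_up_to(100, 1)
two_failures = combos_up_to(100, 2)
three_failures = combos_up_to(100, 3)
```

The slide's figures work out to about 7.2 seconds per experiment. In the hypothetical 100-point system, allowing two failures per run raises the count from 100 to 5,050, and allowing three raises it to 166,750, which is why exhaustive exploration of multiple failures quickly becomes expensive.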

32 DESTINI vs. Related Work
Framework     # Checks   Lines/check
D3S           10         53
Pip           44         43
WiDS          15         22
P2 Monitor    11         12
DESTINI       74         3

33 FATE Architecture
[Figure labels: HDFS, Java SDK, Failure Surface, Failure Server, Filters, Fail / No Fail?, Workload Driver]
Workload Driver:
while (server injects new failureIDs) {
  runWorkload(); // e.g., hdfs.write
}
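The workload driver loop can be sketched as follows: the driver reruns the workload for as long as the failure server still finds a failure ID it has not injected before. The class `FailureServer`, the functions `run_workload` and `drive`, and the string failure IDs are all hypothetical stand-ins for this example; the sketch injects a single failure per run and ignores filters and multiple failures.

```python
class FailureServer:
    """Remembers injected failure IDs and answers fail / no fail per I/O."""

    def __init__(self):
        self.injected = set()  # failure IDs exercised in earlier runs
        self.target = None     # failure ID chosen for the current run

    def begin_run(self):
        self.target = None

    def should_fail(self, failure_id):
        # Inject at most one not-yet-exercised failure per run.
        if self.target is None and failure_id not in self.injected:
            self.target = failure_id
            self.injected.add(failure_id)
            return True
        return False


def run_workload(server, io_points):
    """Hypothetical workload: walk its I/O points, stop at an injected failure."""
    for point in io_points:
        if server.should_fail(point):
            return point
    return None


def drive(io_points):
    """Rerun the workload while the server still injects new failure IDs."""
    server = FailureServer()
    history = []
    while True:
        server.begin_run()
        failed_at = run_workload(server, io_points)
        if failed_at is None:  # nothing new was injected: exploration is done
            break
        history.append(failed_at)
    return history


runs = drive(["setup", "transfer-1", "transfer-2"])
```

Each run fails at the first I/O point that has not been exercised yet, so three runs cover the three points and a fourth, failure-free run ends the loop.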

34 DESTINI
[Figure labels: DESTINI, N, D, C, FATE]
stateY(..) :- cnpEv(..), state(X);

35 Current State of the Art
Failure exploration
→ Rarely deals with multiple failures
→ Or uses a random approach
System specifications
→ Unit test checking: cumbersome
→ WiDS, Pip: not integrated with a failure service

36 [Figure: pipeline among nodes M, C, 1, 2, 3 (and 4) with failures X1, X2, X3. Panels: No failures; Recovery 1: recreate fresh pipeline; Recovery 2: continue on surviving nodes; Bug in recovery 2.]
Failure IDs shown:
Static: InputStream.read(); Domain: Src: Node 1, Dest: Node 2, Type: Data Transfer
Static: InputStream.read(); Domain: Src: Node 2, Dest: Node 3, Type: Data Transfer
Static: InputStream.read(); Domain: Src: Node 1, Dest: Node 2, Type: Setup

