
1 Presenter: Chi-Hung Lu

2 Problems
- Distributed applications are hard to validate
  - Application state is distributed across many distinct execution environments
  - Protocols involve complex interactions among a collection of networked machines
  - Applications need to handle failures ranging from network problems to crashing nodes
  - Intricate sequences of events can trigger complex errors as a result of mishandled corner cases

3 Approaches
- Logging-based debugging: X-Trace, Bi-directional Distributed BackTracker (BDB), Pip
- Deterministic replay: WiDS, Friday, Jockey
- Model checking: MaceMC

4 R. Fonseca et al., NSDI 07

5 Problem Description
- It is difficult to diagnose the source of a problem in an Internet application
- Current network diagnostic tools each focus on one particular protocol
- They do not share information about the application among the user, the service, and the network operators

6 Examples
- traceroute
  - Can locate IP connectivity problems
  - Cannot reveal proxy or DNS failures
- HTTP monitoring suite
  - Can locate application problems
  - Cannot diagnose routing problems

7-10 Examples
[Figure: User, DNS Server, Proxy, Web Server]

11 X-Trace
- An integrated tracing framework
- Records the network paths that were taken
- X-Trace is invoked when an application task is initiated
- X-Trace metadata carrying a task identifier is inserted into the request
- The metadata is propagated down to lower layers through protocol interfaces

12 Task Tree
- X-Trace tags all network operations resulting from a particular task with the same task identifier
- The task tree is the set of network operations connected with an initial task
- The task tree can be reconstructed after collecting trace data from reports
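The reconstruction step can be pictured with a minimal sketch: group the collected reports by task identifier, then link each operation to its parent. The report fields (`task_id`, `op_id`, `parent_id`) and the function names are assumptions made for this illustration, not X-Trace's actual report format.

```python
from collections import defaultdict

def build_task_trees(reports):
    """Group reports by task ID; map each parent operation to its children."""
    trees = defaultdict(lambda: defaultdict(list))
    for report in reports:
        trees[report["task_id"]][report["parent_id"]].append(report["op_id"])
    return {task_id: dict(children) for task_id, children in trees.items()}

def walk(tree, parent=None, depth=0):
    """Yield (depth, op_id) pairs depth-first, starting from the root (parent None)."""
    for op_id in tree.get(parent, []):
        yield depth, op_id
        yield from walk(tree, op_id, depth + 1)

# Hypothetical reports, deliberately out of order and mixing two tasks.
reports = [
    {"task_id": 7, "op_id": "server", "parent_id": "proxy"},
    {"task_id": 7, "op_id": "client", "parent_id": None},
    {"task_id": 9, "op_id": "other", "parent_id": None},
    {"task_id": 7, "op_id": "proxy", "parent_id": "client"},
]

trees = build_task_trees(reports)
for depth, op_id in walk(trees[7]):
    print("  " * depth + op_id)
```

Because every report carries the same task identifier, reports may arrive in any order and from any layer and can still be stitched into one tree.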

13 An example of the task tree: a simple HTTP request through a proxy

14 X-Trace Components
- Data
- X-Trace metadata
- Network path
- Task tree
- Report
- Reconstruct task tree

15-16 Propagation of X-Trace Metadata: the propagation of X-Trace metadata through the task tree

17 The X-Trace metadata
Field        Usage
Flags        Bits that specify which of the three optional components are present
TaskID       A unique integer ID
TreeInfo     ParentID, OpID, EdgeType
Destination  Specifies the address that X-Trace reports should be sent to
Options      Accommodates future extension mechanisms
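To make the field list concrete, here is a minimal sketch of the metadata as a Python data class. The bit positions, Python types, the `propagate` helper, and the edge-type strings are all assumptions for illustration; the slide does not give the wire layout.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical bit positions: the slide only says that the flags mark
# which of the three optional components are present.
HAS_TREE_INFO = 0b001
HAS_DESTINATION = 0b010
HAS_OPTIONS = 0b100

@dataclass(frozen=True)
class XTraceMetadata:
    task_id: int
    tree_info: Optional[Tuple[int, int, str]] = None  # (ParentID, OpID, EdgeType)
    destination: Optional[str] = None                 # where reports should be sent
    options: Optional[bytes] = None                   # room for future extensions

    @property
    def flags(self) -> int:
        """Derive the flag bits from which optional components are set."""
        bits = 0
        if self.tree_info is not None:
            bits |= HAS_TREE_INFO
        if self.destination is not None:
            bits |= HAS_DESTINATION
        if self.options is not None:
            bits |= HAS_OPTIONS
        return bits

def propagate(metadata, new_op_id, edge_type):
    """Return metadata for the next operation: same task, parent = previous OpID."""
    parent_id = metadata.tree_info[1] if metadata.tree_info else 0
    return replace(metadata, tree_info=(parent_id, new_op_id, edge_type))

root = XTraceMetadata(task_id=42, tree_info=(0, 1, "NEXT"),
                      destination="reports.example.org")
child = propagate(root, new_op_id=2, edge_type="DOWN")
print(child)
```

The task identifier is copied unchanged at every hop, while the TreeInfo triple records how the new operation hangs off the previous one.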

18-19 Operation of X-Trace Metadata

20-22 X-Trace Report Architecture

23 Usage Scenario (1): Web request and recursive DNS queries

24 Usage Scenario (2): A request fault annotated with user input

25 Usage Scenario (3): A client and a server communicate over the I3 overlay network

26-28 Usage Scenario (3): Internet Indirect Infrastructure (I3)

29 Usage Scenario (3): Tree for normal operation

30 Usage Scenario (3): The receiver host fails

31 Usage Scenario (3): The middlebox process crashes

32 Usage Scenario (3): The middlebox host fails

33 Discussion
- Report loss
- Non-tree request structures
- Partial deployment
- Managing report traffic
- Security considerations

34 X. Liu et al., NSDI 07

35 Problem Description
- Log mining is both labor-intensive and fragile
- Latent bugs are often distributed across multiple nodes
- Logs reflect incomplete information about an execution
- Distributed applications are non-deterministic

36 Goals
- Efficiently verify application properties
- Provide fairly complete information about an execution
- Reproduce the buggy runs deterministically and faithfully

37 Approach
- Log the actual execution of a distributed system
- Apply predicate checking in a centralized simulator over a run that is either driven by testing scripts or replayed from logs
- Output a violation report along with message traces
- An execution is interpreted as a sequence of events, which are dispatched to the corresponding handling routines

38 Components
- A versatile scripting language
  - Allows a developer to refine system properties into straightforward assertions
- A checker
  - Inspects for violations

39 Architecture: components of WiDS Checker

40 Architecture
- Reproduce real runs
  - Log all non-deterministic events using Lamport's logical clock
- Check user-defined predicates
  - A versatile scripting language specifies the system states being observed and the predicates for invariants and correctness
- Screen out false alarms with auxiliary information
  - For liveness properties
- Trace root causes using a visualization tool

41 Programming with WiDS
- WiDS APIs are mostly member functions of the WiDSObject class
- The WiDS runtime maintains an event queue to buffer pending events and dispatches them to the corresponding handling routines
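The event-queue model described here can be sketched in a few lines. This is not the WiDS API: the `Runtime` class, its method names, and the event kinds are hypothetical, and the sketch is in Python purely for illustration.

```python
from collections import deque

class Runtime:
    """Toy single-threaded runtime: buffer pending events, dispatch to handlers."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def register(self, kind, handler):
        self.handlers[kind] = handler

    def post(self, kind, payload):
        self.queue.append((kind, payload))

    def run(self):
        # Events are handled strictly one at a time, in FIFO order.
        while self.queue:
            kind, payload = self.queue.popleft()
            self.handlers[kind](payload)

runtime = Runtime()
trace = []

def on_message(payload):
    trace.append(("message", payload))
    runtime.post("timer", payload + 1)  # a handler may enqueue further events

def on_timer(payload):
    trace.append(("timer", payload))

runtime.register("message", on_message)
runtime.register("timer", on_timer)
runtime.post("message", 1)
runtime.run()
print(trace)
```

Funnelling every event through a single queue is what makes an execution look like a sequence of events, which in turn is what logging and replay rely on.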

42 Enabling Replay
- Logging
  - Log all WiDS nondeterminism
  - Redirect OS calls and log the results
  - Embed a Lamport clock in each outgoing message
- Checkpoint
  - Support partial replay
  - Save the WiDS process context
- Replay
  - Start from the beginning or from a checkpoint
  - Replay events in serialized Lamport order
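The following sketch shows how Lamport timestamps give a serialized replay order across nodes. The log record shape `(timestamp, node, event)` and the tie-break by node name are assumptions for this example; the slide does not say how WiDS breaks ties.

```python
class LamportClock:
    """Standard Lamport logical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event or a send; return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, message_time) + 1
        return self.time

clock_a, clock_b = LamportClock(), LamportClock()

# Each node logs (timestamp, node, event) independently.
log_a = [(clock_a.tick(), "A", "start")]
stamp = clock_a.tick()                      # timestamp embedded in the message
log_a.append((stamp, "A", "send m1"))

log_b = [(clock_b.tick(), "B", "start")]
log_b.append((clock_b.receive(stamp), "B", "recv m1"))

# Serialized replay order: sort by timestamp, break ties by node name (assumed).
replay_order = sorted(log_a + log_b, key=lambda record: (record[0], record[1]))
for record in replay_order:
    print(record)
```

Sorting by timestamp guarantees that every send is replayed before its matching receive, even though the two nodes logged their events separately.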

43 Checker
- Observe memory state
- Define states and evaluate predicates
  - Refresh database for each event
  - Maintain history
  - Re-evaluate modified predicates
- Auxiliary information for violations
  - Liveness properties are only guaranteed to be true eventually
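A minimal sketch of per-event predicate checking follows, using a toy lock protocol. The event format, the `apply_event` function, and the predicate name are invented for illustration, and the real checker re-evaluates only predicates whose inputs changed, which this sketch does not attempt.

```python
def check_run(events, apply_event, predicates):
    """Replay events one by one; after each, evaluate every predicate."""
    state = {}
    violations = []
    for index, event in enumerate(events):
        apply_event(state, event)  # refresh the observed state
        for name, predicate in predicates.items():
            if not predicate(state):
                violations.append((index, name))
    return violations

def apply_event(state, event):
    """Toy state update: track which nodes currently hold the lock."""
    node, action = event
    holders = state.setdefault("holders", set())
    if action == "acquire":
        holders.add(node)
    elif action == "release":
        holders.discard(node)

# Safety property: at most one node holds the lock at any time.
predicates = {
    "single_holder": lambda state: len(state.get("holders", ())) <= 1,
}

events = [("A", "acquire"), ("B", "acquire"), ("A", "release")]
violations = check_run(events, apply_event, predicates)
print(violations)
```

A safety predicate like this one can be flagged the moment it fails. A liveness property cannot, because it only has to hold eventually, which is why the checker needs auxiliary information to screen out false alarms.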


47 Visualization Tools: message flow graph

48 Evaluation: benchmark and result summary

49 Performance: running time for evaluating predicates

50 Logging Overhead: percentage of logging time

51 Discussion
- The system is debugged by those who developed it
- Bugs are hunted by those who are intimately familiar with the system

