0 Differential Provenance: Better Network Diagnostics with Reference Events
Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou, Boon Thau Loo
University of Pennsylvania, Georgetown University

1 Motivation: Finding the root cause of a symptom
Networks can (and frequently do!) have bugs.
Example: Software-defined networks.
We need a good debugger!
[Figure: traffic from the Internet arriving at the wrong server (!?!) because of an overly specific flow entry. Labels: /24, /24, Internet, Bob, Web server 1, Web server 2, DPI.]

2 Debugging networks with provenance
Typical debuggers tell us what happened:
NetSight: Packet histories
Y!: Network provenance
Key benefit: Rich explanation of what, when, and why.
[Figure: packet P travels A, B, C. Provenance nodes: C received packet; B sent packet; B received packet; Rule match on B; Rule installed by controller; A sent packet; A received packet; Rule match on A; Incoming packet at controller.]
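A provenance tree like the one in the figure can be sketched as a small recursive data structure. This is only an illustration: the class name `ProvNode` is hypothetical, and the parent/child edges below are an assumption reconstructed from the node labels, since the transcript does not preserve the arrows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvNode:
    """One event in a provenance tree; `causes` holds its direct causes."""
    event: str
    causes: List["ProvNode"] = field(default_factory=list)

    def size(self) -> int:
        # Number of events in the subtree rooted at this node.
        return 1 + sum(c.size() for c in self.causes)

    def render(self, depth: int = 0) -> str:
        # Indented text view: each cause is printed below its effect.
        lines = ["  " * depth + self.event]
        for c in self.causes:
            lines.append(c.render(depth + 1))
        return "\n".join(lines)


# Assumed edges: the root is the observed event, leaves are base events.
tree = ProvNode("C received packet", [
    ProvNode("B sent packet", [
        ProvNode("B received packet", [
            ProvNode("A sent packet", [
                ProvNode("A received packet"),
                ProvNode("Rule match on A"),
            ]),
        ]),
        ProvNode("Rule match on B", [
            ProvNode("Rule installed by controller", [
                ProvNode("Incoming packet at controller"),
            ]),
        ]),
    ]),
])

print(tree.render())
print("nodes:", tree.size())
```

Reading the tree from the root downward answers "why did this happen?" one causal step at a time.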

3 Problem: Explanation can be too big!
The problem: Finding the root cause in a large provenance tree.
[Figure: a large provenance tree. The root is the symptom "Packet arrives at wrong server"; the root cause is a faulty rule deep in the tree, "Rule 7: Next-hop=port2".]

4 Key insight: Use reference events!
Remember that some packets were routed correctly.
The same things should have happened to all packets!
Key insight: If we have both a (bad) symptom and a (good) reference, we only need to reason about the differences between them!
[Figure labels: Bob, Web server 1, Web server 2, DPI.]

5 A new debugger
Bob collects both a bad symptom and a good reference.
Bob sends both events to the debugger.
Debugger generates provenance, outputs difference.
Ideally, there is only one diff: the root cause!
[Figure: Bob sends a fault and a reference to the debugger, which answers "Field 3 of config entry 4 is wrong!"]

6 Outline
Motivation: Network diagnostics
  Background
  Key insight
  A new debugger
Differential provenance
  Are references typically available?
  Strawman approach
  Our approach
Initial results
Conclusion

7 Are references typically available?
Survey: Posts on the ‘Outages’ mailing list in Sept-Dec 2014.
64 posts related to diagnostics.
42/64 (66%) posts involve both a fault and some reference.
Examples:
Some DNS servers have stale records, but others are good.
Probes sometimes fail, sometimes succeed.
More examples in the paper.

8 Strawman solution
A strawman solution: Pick out different nodes in trees.
Bad provenance: 201 nodes
Reference provenance: 156 nodes
Naïve diff: 278 nodes!
[Figure: Bad provenance - Reference provenance = ?]

9 Why does the strawman not work?
Observation: The diff can be larger than the individual trees.
Reason #1: Differences that “do not matter”
E.g., timestamps, packet payloads, etc.
Reason #2: “Butterfly effect”
A small difference can change later events drastically!
[Figure label: Faulty rule]
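To make the observation concrete, here is a minimal sketch of one possible naive diff: treat each tree as a set of events and report every event that appears in only one of them. The slides do not spell out how the strawman compares nodes, so the set-based definition, the `naive_diff` name, and the event tuples below are all assumptions for illustration.

```python
from typing import Set, Tuple

# Hypothetical event encoding: (what happened, where, timestamp).
Event = Tuple[str, str, int]


def naive_diff(bad: Set[Event], ref: Set[Event]) -> Set[Event]:
    """Events that occur in exactly one of the two provenance trees."""
    return (bad - ref) | (ref - bad)


bad = {
    ("packet received", "S1", 10),
    ("rule 7 matched", "S1", 11),
    ("packet sent on port 2", "S1", 12),
    ("packet received", "server2", 13),
}
ref = {
    ("packet received", "S1", 20),
    ("rule 5 matched", "S1", 21),
    ("packet sent on port 1", "S1", 22),
    ("packet received", "server1", 23),
}

diff = naive_diff(bad, ref)
print(len(bad), len(ref), len(diff))
```

In this toy input the timestamps alone make every event unique to its tree, so the diff contains all eight events even though each tree has only four. That is Reason #1 in miniature; the events downstream of the rule match would still differ without timestamps, which is Reason #2.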

10 Differential provenance
Approach: Change past events, and think about what could have happened.
(1) Find some early ‘differences’ in the trees.
(2) Change the faulty node to a correct equivalent.
(3) Use replay to determine what would have happened.
(4) Output the set of changes that align the trees.
[Figure: Bad provenance and Reference provenance go in. Output: Rule 7: change port; Rule 9: change range.]
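The four steps can be sketched on a toy forwarding table. This is not the system from the talk: the real debugger works on provenance trees and replays the network, whereas the sketch below compares plain forwarding paths. The `replay` and `differential` functions, the switch names, and the two traffic classes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical config: (switch, traffic class) -> next hop.
Rules = Dict[Tuple[str, str], str]
Change = Tuple[str, Optional[str], Optional[str]]  # (switch, old hop, new hop)


def replay(rules: Rules, cls: str, start: str = "S1") -> List[str]:
    """Step (3): recompute the path a packet of class `cls` would take."""
    path = [start]
    while (path[-1], cls) in rules and len(path) < 32:  # cap guards against loops
        path.append(rules[(path[-1], cls)])
    return path


def differential(rules: Rules, bad: str, ref: str) -> List[Change]:
    rules = dict(rules)           # work on a copy; never edit the live config
    target = replay(rules, ref)   # what should have happened
    changes: List[Change] = []
    for _ in range(len(target) + 1):
        current = replay(rules, bad)
        if current == target:
            break
        # Step (1): earliest point where the two paths diverge.
        i = 0
        while i < min(len(current), len(target)) and current[i] == target[i]:
            i += 1
        switch = target[i - 1]
        old = rules.get((switch, bad))
        # Step (2): change the faulty entry to its correct equivalent.
        if i < len(target):
            rules[(switch, bad)] = target[i]
            changes.append((switch, old, target[i]))
        else:
            del rules[(switch, bad)]
            changes.append((switch, old, None))
    return changes                # Step (4): the changes that align the paths


rules = {
    ("S1", "ref"): "S2", ("S2", "ref"): "S3", ("S3", "ref"): "server1",
    ("S1", "bad"): "S2", ("S2", "bad"): "S6", ("S3", "bad"): "server1",
    ("S6", "bad"): "server2",
}
changes = differential(rules, bad="bad", ref="ref")
print(changes)  # [('S2', 'S6', 'S3')]
```

The bad path differs from the reference path in two hops (S6 and server2), yet a single change at S2 aligns them. Fixing the earliest difference and replaying, instead of diffing the final results, is what keeps the output small.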

11 Technical challenges
Challenge #1: Where do we start?
Heuristics: Change early events, minimum changes…
E.g., prefer changing 1 event over changing 1000 events.
Challenge #2: How should we make the change?
Approach: Think about what should have happened.
E.g., packet should go to switch 2, not 1.
Challenge #3: Irrelevant differences?
Approach: Equivalence relations between events.
E.g., IPs [...] and [...].
See paper for more details.
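One way to picture an equivalence relation between events is a canonical form that drops fields that "do not matter" and maps interchangeable values to a shared placeholder. The sketch below is an assumption for illustration, not the paper's definition: the field names, the `canonical` function, and the choice of which prefixes count as equivalent are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, Tuple

# Assumption: these fields never matter when comparing two events.
IGNORED_FIELDS = {"timestamp", "payload"}

# Assumption: sources in these prefixes should be treated alike.
CLIENT_PREFIXES = [
    ipaddress.ip_network("4.3.2.0/24"),
    ipaddress.ip_network("4.3.3.0/24"),
]


def canonical(event: Dict[str, object],
              prefixes: Iterable = CLIENT_PREFIXES) -> Tuple:
    """Reduce an event to the fields that matter for comparison."""
    out = {}
    for key, value in event.items():
        if key in IGNORED_FIELDS:
            continue
        if key == "src_ip":
            addr = ipaddress.ip_address(str(value))
            if any(addr in net for net in prefixes):
                value = "<client>"  # all equivalent sources collapse to one token
        out[key] = value
    return tuple(sorted(out.items()))


bad_event = {"type": "recv", "switch": "S1", "src_ip": "4.3.3.1",
             "timestamp": 17, "payload": "GET /a"}
ref_event = {"type": "recv", "switch": "S1", "src_ip": "4.3.2.1",
             "timestamp": 42, "payload": "GET /b"}

print(canonical(bad_event) == canonical(ref_event))  # True
```

Two events are then considered equivalent when their canonical forms are equal, so timestamps, payloads, and the exact client address no longer show up as differences.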

12 Setup
Platform: RapidNet
SDN: 6 switches, 2 servers
The symptom: misrouted packets from [...]/24
The reference: packets from [...]/24
[Figure labels: Overly specific flow entry, 4.3.2.0/24, 4.3.3.0/24, Internet, Web server 1, DPI.]

13 Initial results
Naïve diff: compares the fault (201 nodes) with the reference (156 nodes).
Differential provenance: Rule 7: next hop should be port 1, not 2!
Differential provenance finds a single node (the faulty rule) to be the root cause!

14 Conclusion
Debugging networks is hard.
Need good debuggers!
Provenance can find the causes of an event.
Problem: Explanation can be too detailed.
Idea: Use reference events.
Sufficient to find the (few) differences to the observed symptom.
New debugger based on differential provenance.
Result: Very precise diagnostics.
Ideally, can identify a single root cause!
Thanks!

