Challenges in Making Tomography Practical Yiyi Huang, Georgia Tech Nick Feamster, Georgia Tech Renata Teixeira, LIP6 Christophe Diot, Thomson.

1 Challenges in Making Tomography Practical Yiyi Huang, Georgia Tech Nick Feamster, Georgia Tech Renata Teixeira, LIP6 Christophe Diot, Thomson

2 Problem
Network operators need to detect and isolate faults quickly, before customers complain.
Plenty of alarms already exist:
– SNMP traps
– Active probes
– Anomaly detection systems
Unfortunately, these alarms do not help operators locate and eliminate the faults that disrupt end-to-end paths.

3 Network Tomography to the Rescue
Send end-to-end probes through the network
Monitor paths for differences in reachability
Infer location of reachability problem from these differences
(Figure labels: Monitor, x, y, Targets)

4 Some Problems
Scalability vs. speed: Detection must be fast
Ambiguity: Losses are one-way, but we don't always have access to both ends of the path
Lack of synchronization: Different monitors see different conditions
Dynamics: Topology can change, loss can be transient

5 Doppler: Making Tomography Practical
Fast, scalable detection
– Solution: Monitor selection algorithm to reduce the number of monitors and targets so that cycle times are fast
Transient packet loss
– Solution: Triggered confirmation of failed paths
One-way losses
– Solution: New algorithm based on IP spoofing
Dynamic routing
– Solution: Periodic snapshots of the network topology
Controlled evaluation on VINI, plus limited wide-area experiments.

6 Fast, Scalable Detection
Select monitors and targets to satisfy two conditions:
– All interfaces are covered (or diagnosable)
– The number of monitors is small enough to ensure a short round time
Two goals:
– Coverage: When a failure occurs, the system detects it. Every interface is covered by at least one path.
– Diagnosability: When a failure occurs, the system locates it. Every interface is covered by a unique set of paths.
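The two goals on this slide can be checked mechanically for any candidate set of probe paths. The following is a minimal sketch, not Doppler's implementation: it assumes each path is given as a list of interface names keyed by a hypothetical path id, and the function names are illustrative.

```python
from collections import defaultdict


def path_signatures(paths):
    """Map each interface to the frozenset of path ids that traverse it."""
    sig = defaultdict(set)
    for pid, interfaces in paths.items():
        for iface in interfaces:
            sig[iface].add(pid)
    return {iface: frozenset(pids) for iface, pids in sig.items()}


def check_goals(paths, all_interfaces):
    """Return (uncovered, ambiguous).

    uncovered: interfaces on no path (coverage is violated).
    ambiguous: groups of interfaces that share exactly the same set of
               paths (diagnosability is violated for those groups).
    """
    sig = path_signatures(paths)
    uncovered = set(all_interfaces) - set(sig)
    groups = defaultdict(list)
    for iface, pids in sig.items():
        groups[pids].append(iface)
    ambiguous = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    return uncovered, ambiguous


if __name__ == "__main__":
    # Hypothetical topology: three interfaces probed, one never probed.
    paths = {"p1": ["a", "b"], "p2": ["b", "c"]}
    print(check_goals(paths, ["a", "b", "c", "d"]))
```

An empty `uncovered` set corresponds to the coverage goal, and an empty `ambiguous` list corresponds to the diagnosability goal.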

7 Offline Path Selection: Diagnosability
Step 1: Compute the set of paths that cover all interfaces (greedy set cover heuristic)
Step 2: Compute the hitting set for each interface
Step 3: Build equivalence classes for interfaces with a common hitting set
– For each interface in a class with more than one interface, find a path that crosses only that interface
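The three steps can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: it assumes the "hitting set" of an interface is the set of selected paths that traverse it, it breaks ties by path id, and all names are hypothetical.

```python
from collections import defaultdict


def greedy_path_cover(paths, interfaces):
    """Step 1: greedy set cover. Repeatedly pick the path that covers the
    most still-uncovered interfaces. Returns (chosen ids, uncovered)."""
    uncovered = set(interfaces)
    chosen = []
    while uncovered and paths:
        # sorted() makes tie-breaking deterministic (first id wins).
        best = max(sorted(paths), key=lambda p: len(uncovered & set(paths[p])))
        gained = uncovered & set(paths[best])
        if not gained:
            break  # remaining interfaces lie on no available path
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


def refine_for_diagnosability(paths, chosen):
    """Steps 2 and 3: group interfaces by the selected paths that hit them,
    then try to split every group that has more than one interface."""
    chosen = list(chosen)
    hit = defaultdict(set)  # interface -> selected paths crossing it
    for pid in chosen:
        for iface in paths[pid]:
            hit[iface].add(pid)
    classes = defaultdict(set)  # common hitting set -> interfaces
    for iface, pids in hit.items():
        classes[frozenset(pids)].add(iface)
    for members in classes.values():
        if len(members) < 2:
            continue
        for iface in sorted(members):
            others = members - {iface}
            # Look for an unused path crossing iface but none of the others.
            for pid in sorted(paths):
                crossed = set(paths[pid])
                if pid not in chosen and iface in crossed and not (others & crossed):
                    chosen.append(pid)
                    break
    return chosen


if __name__ == "__main__":
    paths = {"p1": {"a", "b"}, "p2": {"a"}}
    cover, missing = greedy_path_cover(paths, {"a", "b"})
    print(refine_for_diagnosability(paths, cover), missing)
```

Greedy set cover is a heuristic, so the selected set is small but not guaranteed to be minimal.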

8 Detection, Confirmation, Correlation
Periodic (once per 5 minutes) topology snapshot from all monitors to all destinations keeps track of the underlying topology before the failure
Detection: Periodic probes (once per cycle time) detect failure
Confirmation: When a probe is lost, the monitor sends three additional probes. If all three are lost, the path is determined to have failed.
Correlation: Paths that fail within 10 seconds of one another are grouped.
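The confirmation and correlation rules can be sketched in a few lines. This is a hedged illustration only: `send_probe` is a hypothetical callable that returns True when a reply arrives, and the grouping rule assumes that "within 10 seconds of one another" means a gap of at most 10 seconds from the previous failure in the group, which the slide does not specify.

```python
def confirm_failure(send_probe, extra_probes=3):
    """Declare the path failed only if every additional probe is lost.

    send_probe is an assumed callable returning True on a reply.
    Probing stops at the first reply, since the path is then alive.
    """
    return all(not send_probe() for _ in range(extra_probes))


def correlate(failures, window=10.0):
    """Group (timestamp_seconds, path_id) failures.

    A failure joins the current group when it occurred at most `window`
    seconds after the previous failure; otherwise it starts a new group.
    """
    groups = []
    for ts, path in sorted(failures):
        if groups and ts - groups[-1][-1][0] <= window:
            groups[-1].append((ts, path))
        else:
            groups.append([(ts, path)])
    return groups


if __name__ == "__main__":
    replies = iter([False, False, False])  # three lost probes
    print(confirm_failure(lambda: next(replies)))
    print(correlate([(0.0, "p1"), (4.0, "p2"), (30.0, "p3")]))
```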

9 Disambiguating One-Way Losses: Spoofing
Monitor sends a request to the spoofer to send a probe
The probe carries the monitor's IP address as its source address
If the reply reaches the monitor, the reverse path is working
(Figure labels: M, Spoofer, T. The spoofer sends a spoofed packet with the source address of M.)
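The inference behind the spoofed probe can be written as a small decision function. This sketch covers only the reasoning step, not packet generation, and the names are hypothetical. It also assumes the spoofer's own path to the target is working, which the slide does not state.

```python
from enum import Enum


class Verdict(Enum):
    PATH_OK = "round trip works"
    FORWARD_FAILURE = "monitor-to-target direction is failing"
    INCONCLUSIVE = "reverse direction (or both directions) may be failing"


def classify_loss(direct_reply_seen, spoofed_reply_seen):
    """Classify a loss between monitor M and target T.

    direct_reply_seen:  M probed T directly and received a reply.
    spoofed_reply_seen: the spoofer probed T with M's source address and
                        T's reply arrived at M.
    """
    if direct_reply_seen:
        return Verdict.PATH_OK
    if spoofed_reply_seen:
        # T's reply reached M, so T-to-M works; the loss must be on M-to-T.
        return Verdict.FORWARD_FAILURE
    # No reply either way: the reverse path may be down, but a failure on
    # the spoofer-to-T path cannot be excluded from this evidence alone.
    return Verdict.INCONCLUSIVE


if __name__ == "__main__":
    print(classify_loss(direct_reply_seen=False, spoofed_reply_seen=True))
```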

10 Identification: NetDiagnoser
Binary network tomography algorithm [Dhamdhere et al.]
Input: hosts, destinations, topology before the failure
Output: Set of possible locations for the fault
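To make the input and output concrete, here is a simplified binary tomography sketch. It is not the NetDiagnoser algorithm itself: it assumes every link on a working path is healthy, and it uses a greedy heuristic to pick a small set of links that explains all failed paths. Function and variable names are hypothetical.

```python
def locate_faults(failed_paths, working_paths):
    """Return a small set of links that explains every failed path.

    Each path is an iterable of link names taken from the topology
    snapshot recorded before the failure.
    """
    # Links on any working path are assumed healthy.
    healthy = set().union(*working_paths)
    suspects = [set(p) - healthy for p in failed_paths]
    unexplained = list(range(len(suspects)))
    hypothesis = set()
    while unexplained:
        counts = {}
        for i in unexplained:
            for link in suspects[i]:
                counts[link] = counts.get(link, 0) + 1
        if not counts:
            break  # leftover failed paths have no suspect link
        # Greedy step: blame the link shared by the most unexplained paths.
        best = max(sorted(counts), key=counts.get)
        hypothesis.add(best)
        unexplained = [i for i in unexplained if best not in suspects[i]]
    return hypothesis


if __name__ == "__main__":
    failed = [["a", "b", "c"], ["d", "b", "e"]]
    working = [["a", "x"], ["d", "e"]]
    print(locate_faults(failed, working))
```

In the usage example, link `b` is the only link shared by both failed paths and absent from every working path, so it is the single suspect returned.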

11 Evaluation of Detection Algorithms
Controlled experiments on the VINI testbed
– Emulated copy of Abilene network on wide-area paths
– Probing strategy emulates the paths that would be probed in monitor selection algorithm
– Compare reduced set of paths to aggressive measurement approach
Varied failure location and duration
– Duration varied from 5 to 80 seconds
– Test repeated for each failed link
Measure detection and false alarm rates
Preliminary experiments using data from real-world networks

12 Detection: Scale and Speed
Compute reduction in the number of paths required to achieve coverage and diagnosability
– Reduction from about 27,000 paths to 151 paths
For real-world networks, compute corresponding reduction in cycle time
– Reduction from about 3.5 minutes to < 5 seconds

13 Single-Link Failures
More selective probing identifies more of the shorter link failures (due to shorter cycle time)
Also results in fewer false alarms

14 Single-Node Failures
Similar results to single-link failures
– Selective measurements result in faster detection, fewer false alarms

15 Does Failure Confirmation Reduce the Total Number of Alarms?
Confirmation reduces the number of reported failures by > 35%
Correlation further reduces the number of alarms (by about a factor of 10)

16 How Quickly Can Doppler Identify Failures?
Answer: Roughly 20 seconds using the reduced set of paths
Two main components
– Detection/Confirmation: Time from when the failure was injected to the time Doppler could detect and confirm the failure
– Correlation: Time to group failures and construct the reachability matrix

17 Detection and Confirmation Delay
Most failures are detected within 3-5 seconds

18 Correlation Delay
Reducing the number of paths to probe significantly reduces total correlation time

19 Summary
Making tomography practical is challenging
– Asynchronous measurements
– Scale and speed
– Changing topologies
– Ambiguity about forward and reverse paths
Doppler: Set of techniques to address many of these problems
Current analysis is still performed offline
– Many additional challenges remain to coordinate online measurements
