Minimizing Probing Cost for Detecting Interface Failures: Algorithms and Scalability Analysis Hung Nguyen (Univ. of Adelaide, Australia) Renata Teixeira.

1 Minimizing Probing Cost for Detecting Interface Failures: Algorithms and Scalability Analysis
Hung Nguyen (Univ. of Adelaide, Australia), Renata Teixeira (UPMC, France), Patrick Thiran (EPFL, Switzerland), Christophe Diot (Thomson, France)

2 The Internet is great, but problems happen
[Figure: a path from the UoA network through Net1, Net2, and Net3]
Is my connection OK? Is the server up? Is the problem in one of the networks along the path?
How can we automatically detect and identify problems?

3 Current alarms are not enough
Network equipment already raises many alarms
  ◦ SNMP traps
  ◦ Anomaly detection systems
But alarms may not reflect users' experience
  ◦ Hard to map users' complaints to alarms
  ◦ A problem may not raise an alarm
[Figure: routers A, B, C, D; C wrongly filters packets to a /24 prefix]

4 Active monitoring system to detect faults
Network admins often resort to active measurements
  ◦ Active monitoring servers inside their network
  ◦ Subscription to a third-party monitoring service, e.g., Keynote or RIPE TTM
Challenge
  ◦ Faults are rare events, so we cannot continuously overload the network or end users' machines to detect them

5 Problem definition
[Figure: monitors M1, M2 and target hosts T1, T2, T3; the subscriber network contains routers A, B, C, D]
Goal: detect failures of any interface in the subscriber's network with minimum probing overhead

6 Simple solution: coverage problem
[Figure: monitors M1, M2; targets T1, T2, T3; routers A, B, C, D]
Instead of probing all paths, select the minimum set of paths that covers all interfaces in the subscriber's network
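Choosing the minimum set of covering paths is an instance of minimum set cover, which is NP-hard, so a common practical approach is the greedy heuristic. The sketch below is a hypothetical illustration, not the authors' algorithm: the toy `PATHS` instance (which monitor-to-target path crosses which of A, B, C, D) is invented and only loosely modeled on the slide's figure.

```python
# Hypothetical toy instance: subscriber interfaces crossed by each monitor->target path.
PATHS = {
    ("M1", "T1"): {"A", "C"},
    ("M1", "T2"): {"A", "B"},
    ("M2", "T2"): {"B", "D"},
    ("M2", "T3"): {"C", "D"},
}


def greedy_path_cover(paths):
    """Greedily pick paths until every interface lies on some chosen path.

    Greedy set cover gives a logarithmic approximation; it is not
    guaranteed to find the true minimum.
    """
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        # Take the path crossing the most still-uncovered interfaces
        # (ties go to the first such path in dict order).
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        chosen.append(best)
        uncovered -= paths[best]
    return chosen
```

On this toy instance the greedy choice is M1→T1 followed by M2→T2: two probed paths instead of four.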

7 The coverage solution doesn't detect all types of failures
Detects full-stop failures
  ◦ Failures that affect all packets that traverse the faulty interface, e.g., interface or router crashes, fiber cuts, bugs
But not path-specific failures
  ◦ Failures that affect only a subset of the paths that cross the faulty interface, e.g., router misconfigurations
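To see why a cover misses path-specific failures, this hedged sketch checks detection on the same hypothetical toy instance (invented for illustration). A full-stop failure at an interface is seen by any probed path that crosses it. A path-specific failure is seen only if one of the affected paths is itself probed.

```python
# Hypothetical toy instance: subscriber interfaces crossed by each monitor->target path.
PATHS = {
    ("M1", "T1"): {"A", "C"},
    ("M1", "T2"): {"A", "B"},
    ("M2", "T2"): {"B", "D"},
    ("M2", "T3"): {"C", "D"},
}


def detects_full_stop(probed, iface, paths=PATHS):
    """A full-stop failure drops every packet through `iface`,
    so any probed path crossing it notices."""
    return any(iface in paths[p] for p in probed)


def detects_path_specific(probed, affected):
    """A path-specific failure drops packets only on the `affected` paths,
    so it is noticed only if one of those exact paths is probed."""
    return any(p in affected for p in probed)


# A two-path cover of all four interfaces.
cover = [("M1", "T1"), ("M2", "T2")]

# Every full-stop failure is caught:
print(all(detects_full_stop(cover, i) for i in "ABCD"))  # prints True
# A misconfiguration at C that only breaks M2 -> T3 is missed,
# even though the probed path M1 -> T1 also crosses C:
print(detects_path_specific(cover, {("M2", "T3")}))  # prints False
```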

8 New formulation of the failure detection problem
Simultaneously select the frequency at which to probe each path
  ◦ Low-frequency probing of each path can still achieve high-frequency probing of each interface
[Figure: monitors M1, M2; targets T1, T2, T3; routers A, B, C, D; annotated "1 every 9 mins" and "1 every 3 mins"]
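Assume, for illustration only, that three paths cross the same interface and each is probed once every 9 minutes. The interface then receives probes at the summed rate of 1/9 + 1/9 + 1/9 = 1/3 per minute. Whether the worst-case gap between probes is actually 3 minutes depends on how the three schedules are offset. The helper names and the `(period, offset)` schedule model below are hypothetical.

```python
from fractions import Fraction


def interface_rate(path_periods):
    """Probe rate (probes per minute) seen at an interface:
    the per-path rates 1/T simply add up."""
    return sum(Fraction(1, T) for T in path_periods)


def max_gap(schedules, horizon):
    """Largest gap (minutes) between consecutive probes at the interface.

    schedules: list of (period, offset) pairs, one per path crossing it;
    path probes are sent at offset, offset + period, ... up to `horizon`.
    """
    times = sorted(
        offset + k * period
        for period, offset in schedules
        for k in range(horizon // period + 1)
        if offset + k * period <= horizon
    )
    return max(b - a for a, b in zip(times, times[1:]))
```

With offsets 0, 3, and 6, `max_gap([(9, 0), (9, 3), (9, 6)], 90)` is 3. If all three monitors start at offset 0, the aggregate rate is unchanged but the gap grows back to 9 minutes. This is why staggered periodic probing requires coordination among monitors.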

9 Properties of the solution
Probe minimization for failure detection is no longer NP-hard
  ◦ An optimal solution can be found using linear programming
Needs synchronization among monitors
  ◦ Monitors need to collaborate to probe an interface
Alternative probabilistic solution with Poisson probes avoids the synchronization overhead
[Figure: same topology, annotated "1 every 9 mins" and "1 every 3 mins"]
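A short sketch of why Poisson probing removes the need for coordination. Assume each monitor independently probes its path as a Poisson process with rate λ_P. The probes reaching an interface then form a Poisson process whose rate Λ is the sum of the λ_P. A failure lasting d is missed with probability exp(-Λd), so a miss probability of at most ε needs Λ ≥ ln(1/ε)/d. The function names and the Monte Carlo check are illustrative assumptions and are not taken from the slides.

```python
import math
import random


def miss_probability(total_rate, duration):
    """P(no Poisson probe crosses the interface during a failure of `duration`)."""
    return math.exp(-total_rate * duration)


def required_rate(duration, epsilon):
    """Smallest aggregate probe rate whose miss probability is <= epsilon."""
    return math.log(1 / epsilon) / duration


def simulate_miss(rates, duration, trials=20000, seed=1):
    """Monte Carlo check: each path is probed independently (no synchronization).

    By memorylessness, the wait from failure start to each path's next probe
    is exponential with that path's rate; the failure is missed if even the
    earliest of these probes arrives after the failure ends.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        first = min(rng.expovariate(r) for r in rates)
        misses += first > duration
    return misses / trials
```

With three paths at 1/9 probes per minute and a 3-minute failure, the miss probability is exp(-1), roughly 0.37. By contrast, staggered periodic probes at the same rate would always catch such a failure. Dropping synchronization therefore costs a higher probing rate for the same detection guarantee.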

10 Scaling law of probing cost
Probing cost (number of probes sent per second) scales almost linearly with the size of the subscriber's network
  ◦ In our inferred Internet graphs
For a random power-law graph, probing cost is a linear function of the number of nodes (n)
Bounded by the isometric path number of a graph, i(G)
For other graphs:
  Graph       i(G)
  Cycle       2n/(n+1)
  Complete    n/2
  Hypercube   n/log n
  Grid        n/2
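For comparison, here is a simple lower bound on i(G) for a connected graph. An isometric (shortest) path has at most diam(G) + 1 vertices, so covering all n vertices needs at least n / (diam(G) + 1) such paths. For an odd cycle this gives 2n/(n+1), and for a complete graph it gives n/2. The BFS-based helpers and the graph constructors below are illustrative assumptions, not code from the paper.

```python
from collections import deque


def eccentricity(adj, src):
    """Largest BFS distance from `src` (graph assumed connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)


def isometric_path_lower_bound(adj):
    """n / (diam + 1): each isometric path covers at most diam + 1 vertices."""
    return len(adj) / (diameter(adj) + 1)


# Hypothetical test graphs as adjacency lists.
def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def hypercube(d):
    return {v: [v ^ (1 << b) for b in range(d)] for v in range(2 ** d)}
```

For the hypercube with n = 2^d vertices, the diameter is d = log n, so the bound is n/(log n + 1).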

11 Evaluation
Paths obtained using traceroutes
  ◦ From 750 PlanetLab nodes to 3,000 DNS servers
  ◦ From 12 RON nodes to 60,000 targets
Subscriber networks are probed ASes
  ◦ Map IPs to ASes using Mao et al.'s technique
  ◦ 1,366 ASes in PlanetLab
  ◦ 6,517 ASes in RON
Compute probing costs while varying parameters
  ◦ Set of paths, failure durations, subscriber's network
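This sketch shows one way to turn traceroute output into a per-AS "subscriber network" instance. It assumes that each responding hop IP is an interface and that an IP belongs to an AS by longest-prefix match against a small table. The evaluation used Mao et al.'s technique instead; the plain lookup, the `PREFIX_TO_AS` table (documentation prefixes and private AS numbers), and both function names are hypothetical.

```python
import ipaddress

# Hypothetical prefix-to-AS table; a stand-in for a real IP-to-AS mapping.
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("198.51.100.128/25"): 64502,
}


def ip_to_as(ip):
    """Longest-prefix match over the toy table; None if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_AS if addr in net]
    if not matches:
        return None
    return PREFIX_TO_AS[max(matches, key=lambda net: net.prefixlen)]


def subscriber_view(traces, asn):
    """Restrict traceroute paths to one AS treated as the subscriber network.

    traces: {path_id: [hop IP, ...]}, with '*' for unanswered hops.
    Returns {path_id: set of hop IPs inside `asn`}; paths that never
    enter the AS are dropped.
    """
    view = {}
    for path_id, hops in traces.items():
        inside = {h for h in hops if h != "*" and ip_to_as(h) == asn}
        if inside:
            view[path_id] = inside
    return view
```

The resulting path-to-interface sets have the same shape as the input a coverage or rate-selection step would consume.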

12 Probing costs vs. size of subscriber network in PlanetLab
[Figure: path-specific failure duration = 1000 sec; full-stop failure duration = 1 sec]

13 Summary
Practical formulation of the failure detection problem
  ◦ Incorporates both full-stop and path-specific failures
Solution minimizes probing cost
  ◦ Using linear programming
Inferred Internet graphs are among the most expensive to probe
  ◦ Probing cost scales almost linearly with network size
Next step
  ◦ Deploy a system based on these probing techniques

14 Probing costs
[Figure: path-specific failure duration = 2 sec; full-stop failure duration = 1 sec]

15 Varying failure durations
[Figure: full-stop failure duration = 10 sec; regions labeled "Path-specific failures dominate the cost" and "Full-stop failures dominate the cost"]

16 Probing costs vs. size of subscriber network in RON
[Figure: path-specific failure duration = 1000 sec; full-stop failure duration = 1 sec]

