A victim-centric peer-assisted framework for monitoring and troubleshooting routing problems.


1 A victim-centric peer-assisted framework for monitoring and troubleshooting routing problems

2 How to Monitor? Four schemes
How to monitor? | Pros | Cons
Monitor devices such as routers | No overhead | Device status does not directly translate into user perceived performance
Monitor BGP updates | No overhead; Know what happens in other network | Do not see some data-plane anomaly
Monitor flow-level traffic | No overhead; Real traffic; Witness direct impact of failures | Do not witness failures directly
Active probing | Witness direct impact of failures | Extra overhead; May not mimic the real traffic

3 What Constraint to Monitor? Network meets ISP’s goals: resource utilization; routing goes as specified by policy; … Network meets users’ goals: reachability (the most fundamental end-to-end property; easy to define and formulate); delay and loss (less easy to define and formulate); application level, e.g. bulk transfer and VoIP (depends on reachability, delay, loss, etc.).

4 Our Monitor Scheme. Monitor reachability using active probing: focus on reachability; use ping (no need for remote cooperation); trade off between probing efficiency and probing coverage (challenges). Disclaimers: do not monitor delay or loss; do not consider ISP’s goals.
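
The probing scheme on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tool: the function names `ping_once` and `probe_targets`, the `attempts` parameter, and the use of the Linux iputils flags `-c` (count) and `-W` (reply timeout in seconds) are all assumptions. Raising `attempts` or the number of targets increases coverage at the cost of probing overhead, which is the trade-off the slide mentions.

```python
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request using the system ping.

    Assumes Linux iputils flags: -c <count>, -W <timeout in seconds>.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping hung or is not installed: treat as unreachable.
        return False
    return result.returncode == 0


def probe_targets(
    targets: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
    attempts: int = 3,
) -> Dict[str, bool]:
    """Mark a target reachable if any of `attempts` probes succeeds."""
    return {t: any(probe(t) for _ in range(attempts)) for t in targets}
```

Passing a different `probe` callable makes it possible to try the logic without sending real packets, for example `probe_targets(["a", "b"], probe=lambda h: h == "a")`.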

5 Troubleshooting -- Next Step to Monitoring. Goal of troubleshooting: localize the root cause. How local? Depends on the nature of the cause. Purpose of troubleshooting: for a local root cause, pinpoint the problem and fix it; for a remote root cause, contact the responsible networks to solve the problem, or bypass the faulty network.

6 Localize the Root Cause. [Figure: three ASes (AS 1, AS 2, AS 3) with links a->b, b->c, c->d, m->n, m->l, n->l, x->y, y->z, z->x, drawn along a topology dimension and a protocol dimension.] Protocol dimension: forwarding paths (those who do forwarding), control plane, physical and link layers, firewalls (those who prevent forwarding). The cause can be localized at the protocol level, at the link level, at the AS level, or at both AS and protocol level.

7 Troubleshooting: Three building blocks. Tool: traceroute, ping, NetFlow, looking glass, etc. Data (generated by the tools): end-to-end reachability, BGP updates, traffic profile, etc. Brain (the intelligent part, usually the network operator): digests the data, makes inferences, leverages dependencies, and draws on past experience. The brain is the key to troubleshooting, and it is a hard problem.

8 What Can We Do to Improve? Improve the tool: promote cooperation among networks (traceroute -> resilient remote traceroute; BGP feed -> resilient remote BGP feed). Improve the automation of the brain: unify previous work.

9 Automatic Brain. It is a challenging problem: faults may occur at multiple levels, and it involves machine learning. Example work: enterprise network services, SIGCOMM’07, by Paramvir Bahl et al.

10 Dependency Graph Approach. Decompose a large system into components and infer the dependencies among them. A depends on B: if B fails, A fails. This leads to a hierarchy of dependencies, the dependency graph (like a Makefile). Given a set of observations on some components (for example, F, H, and X work but G fails), infer the status of the other components using the dependencies, and finally locate the root-cause component.
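
The inference step can be illustrated with a small Python sketch. It relies only on the rule stated on the slide (if B fails, every A that depends on B fails), so anything a working component depends on must itself be working. The graph shape, the hidden components A, B, and C, and the single-root-cause assumption behind the set intersection are hypothetical choices made for this example; only the observations (F, H, X work, G fails) come from the slide.

```python
from typing import Dict, Iterable, List, Set


def transitive_deps(graph: Dict[str, List[str]], node: str) -> Set[str]:
    """Return `node` plus everything it depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


def root_cause_candidates(
    graph: Dict[str, List[str]],
    working: Iterable[str],
    failed: Iterable[str],
) -> Set[str]:
    """Components that could explain every failure (single root cause assumed)."""
    # Contrapositive of "if B fails, A fails": if A works, B works.
    known_good: Set[str] = set()
    for w in working:
        known_good |= transitive_deps(graph, w)

    candidates = None
    for f in failed:
        suspects = transitive_deps(graph, f) - known_good
        candidates = suspects if candidates is None else candidates & suspects
    return candidates or set()


# Hypothetical graph: each key depends on the components in its list.
example_graph = {"F": ["A"], "G": ["A", "B"], "H": ["C"], "X": ["C"]}
suspects = root_cause_candidates(example_graph, working=["F", "H", "X"], failed=["G"])
```

Here A is cleared because F works, so the remaining suspects are B and G itself.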

11 Dependency Graph Example 1. Multi-tier dependency graph. Diagnoses multi-level faults but needs automated construction. [From Paramvir Bahl et al., SIGCOMM’07]

12 Dependency Graph Example 2. Flat dependency graph. Diagnoses simple faults. [From Ramana Kompella et al., INFOCOM’07]

13 Trade-off in Decomposition. The granularity of decomposition determines how specific the troubleshooting is. Fine-grained decomposition: the advantage is that it is more specific; the disadvantage is that the graph is more complex, so constructing and solving it is challenging. Coarse-grained decomposition: the advantage is that the graph is simple, so constructing and solving it is less challenging; the disadvantage is that it is less specific.

14 Dependency Graph Regarding Internet Routing
Path p->q before failure: IP hops: u_0, u_1, …, u_n; AS hops: N_0, N_1, …, N_m
Legend: A -> B means A depends on B
[Figure: dependency graph with the following nodes]
p can ping q
p can send packets to q
q can send packets to p
Forwarding path p->q is OK
Physical path p->q is OK
Control plane info is correctly propagated
Link u_i->u_{i+1} is up
AS N_i has correct route
AS N_i imports routes of prefix p N_{i+1}
…

15 Dependency Graph Regarding Internet Routing (cont.). Accounts for three common root causes: link/router failure; router misconfiguration leading to a missing route (i.e., the router does not import the route); router misconfiguration or attack leading to prefix hijacking. Locates the root cause topology-wise, and also distinguishes among the three root causes. Reasonably specific.
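
As a rough illustration of how such a graph could be built from one measured path, the Python sketch below turns a list of IP hops and AS hops into dependency statements of the kind named on the previous slide. The edge layout (forwarding path depends on the physical path and on the control plane; the physical path depends on each link; the control plane depends on each AS importing the route from the next AS) is one reading of the slide's figure, and the function name, the `prefix` parameter, and the "from" direction of the import are assumptions.

```python
from typing import Dict, List


def build_routing_dependencies(
    p: str,
    q: str,
    ip_hops: List[str],
    as_hops: List[str],
    prefix: str,
) -> Dict[str, List[str]]:
    """Map each condition to the conditions it depends on (assumed layout)."""
    send = f"{p} can send packets to {q}"
    fwd = f"forwarding path {p}->{q} is OK"
    phys = f"physical path {p}->{q} is OK"
    ctrl = f"control plane info for {prefix} is correctly propagated"

    graph: Dict[str, List[str]] = {send: [fwd], fwd: [phys, ctrl]}
    # One leaf per IP-level link u_i -> u_{i+1}.
    graph[phys] = [f"link {a}->{b} is up" for a, b in zip(ip_hops, ip_hops[1:])]
    # One leaf per AS-level adjacency N_i, N_{i+1}.
    graph[ctrl] = [
        f"AS {a} imports routes of prefix {prefix} from AS {b}"
        for a, b in zip(as_hops, as_hops[1:])
    ]
    return graph


def leaf_conditions(graph: Dict[str, List[str]]) -> List[str]:
    """Conditions with no further dependencies: the candidate root causes."""
    return [dep for deps in graph.values() for dep in deps if dep not in graph]
```

The leaves returned by `leaf_conditions` correspond to link failures and missing imports; modeling prefix hijacking would need an additional kind of node that this sketch leaves out.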

16 Recent Work on Network Troubleshooting
INFOCOM’07, Detection and Localization of Network Black Holes, by Ramana R. Kompella et al. Automates the “brain”. Considers only physical failure. Mainly for intra-domain. Flat dependency graph.
CoNEXT’07, NetDiagnoser: Troubleshooting network unreachabilities using end-to-end probes and routing data, by Amogh Dhamdhere et al. Automates the “brain”. Considers both physical failure and control plane fault. For inter-domain. Flat dependency graph.
SIGCOMM’07, Automating Cross-layer Diagnosis of Enterprise Wireless Networks, by Cheng et al. Improves the “tool”. Measures and infers various delays in a wireless environment.
SIGCOMM’07, Towards Highly Reliable Enterprise Network Services Via Inference of Multi-level Dependencies, by Paramvir Bahl et al. Automates the “brain”. Mainly for enterprise networks and services. Deals with multi-level faults. Automatically generates a multi-tier dependency graph.

17 NetDiagnoser: Overview. Troubleshoots unreachability. Fault assumption: link failure, and router misconfiguration causing partial link failure (in particular BGP export filter misconfiguration). Deals with filtered traceroute. More comprehensive than previous work. Infrastructure: sensors, all pairwise traceroutes. Mechanisms: binary tomography; per-neighbor logical links modeling the control plane; combining BGP withdrawal messages.
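
To give a feel for binary tomography, here is a simplified greedy sketch in Python. It is not NetDiagnoser's actual algorithm, which the slides do not spell out; it only shows the general idea: links seen on working paths are considered good, and the remaining links are blamed greedily until every failed path contains at least one blamed link. The function name and the representation of a path as a list of `(from, to)` link tuples are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Link = Tuple[str, str]


def binary_tomography(
    working_paths: Sequence[Sequence[Link]],
    failed_paths: Sequence[Sequence[Link]],
) -> List[Link]:
    """Greedily pick a small set of links that explains all failed paths."""
    # Any link on a working path is assumed to be up.
    good: Set[Link] = {link for path in working_paths for link in path}

    # For each failed path, keep only the links that could still be at fault.
    unexplained = [set(path) - good for path in failed_paths]
    unexplained = [links for links in unexplained if links]

    blamed: List[Link] = []
    while unexplained:
        counts = Counter(link for links in unexplained for link in links)
        # Blame the link shared by the most unexplained paths;
        # sorting first makes tie-breaking deterministic.
        best = max(sorted(counts), key=counts.get)
        blamed.append(best)
        unexplained = [links for links in unexplained if best not in links]
    return blamed


working = [[("a", "b"), ("b", "c")]]
failed = [[("a", "b"), ("b", "d")], [("x", "b"), ("b", "d")]]
blamed_links = binary_tomography(working, failed)
```

In this example both failed paths share the link `("b", "d")`, and `("a", "b")` is exonerated by the working path, so the single link `("b", "d")` is blamed.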

18 NetDiagnoser: Logical Links

19 NetDiagnoser: Dependency Assumption
Path p->q: IP hops: u_0, u_1, …, u_n; AS hops: N_0, N_1, …, N_m
Legend: A -> B means A depends on B
[Figure: dependency graph with the following nodes]
p can send packets to q
Forwarding path p->q is OK
Physical path p->q is OK
Control plane info is correctly propagated
Link u_i->u_{i+1} is up
AS N_{i+1} exports prefix q to AS N_i

