Presentation on theme: "Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala, Santosh Vempala, Megan Elmore."— Presentation transcript:
2 Internet Availability "It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. … It should be: 1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today. 2. …" (Stanford University, Clean-Slate Design for the Internet) OK for … and the Web, but what about: E911 service, air traffic control, …
3 Work to do… "It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be: 1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today." Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 nines. More critical (or at least availability-centric) applications on the Internet. At the same time, the Internet is getting more difficult to debug –Scale, complexity, disconnection, etc.
9 Idea: Backup/Multipath For intradomain routing: –IP and MPLS fast re-route –Packet deflections [Yang 2006] –ECMP, NotVia, Loop-Free Alternates [Cisco] For interdomain routing: –MIRO [Rexford 2006] Problem: –Scale: Protecting against arbitrary failures requires storing lots of state, exchanging lots of messages –Control: End systems can't signal when they think a path has failed
10 Backup Paths: Promise and Problems Bad: If a link fails on each of the two paths, s is disconnected from t. Want: End systems remain connected unless the underlying graph has a cut. (Figure: paths between s and t)
11 Path Splicing: Main Idea Step 1 (Generate slices): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration. Step 2 (Splice end-to-end paths): Allow traffic to switch between instances at any node in the protocol. Compute multiple forwarding trees per destination. Allow packets to switch slices midstream. (Figure: spliced paths between s and t)
12 Outline Path Splicing for Intradomain Routing –Generating slices –Constructing paths –Forwarding –Recovery Evaluation –Reliability and recovery –Stretch –Effects on traffic Path Splicing for Interdomain Routing Ongoing: Prototype and Deployment Paths
13 Generating Slices Goal: Each instance provides different paths. Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight –Two schemes: Uniform and degree-based (Figure: base graph and perturbed graph between s and t)
14 How to Perturb the Link Weights? Uniform: Perturbation is a function of the initial weight of the link Degree-based: Perturbation is a linear function of the degrees of the incident nodes –Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
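The two perturbation schemes can be sketched as follows. This is only an illustration: the slide does not give the exact formulas, so the additive form w' = w + U(0, range), the parameter names `scale`, `a`, and `b`, and the toy five-link topology are all assumptions.

```python
import random

def degrees(edges):
    """Count how many links touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def perturb_uniform(weights, scale, rng):
    """Uniform scheme (assumed form): w' = w + U(0, scale * w)."""
    return {e: w + rng.uniform(0.0, scale * w) for e, w in weights.items()}

def perturb_degree_based(weights, a, b, rng):
    """Degree-based scheme (assumed form): the perturbation range grows
    linearly with the degrees of the two incident nodes."""
    deg = degrees(weights)
    out = {}
    for (u, v), w in weights.items():
        factor = a + b * (deg[u] + deg[v])
        out[(u, v)] = w + rng.uniform(0.0, factor * w)
    return out

# Hypothetical toy topology: two s-t paths plus a cross link a-b.
weights = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0,
           ("b", "t"): 1.0, ("a", "b"): 1.0}
rng = random.Random(0)
slice_uniform = perturb_uniform(weights, 0.5, rng)
slice_degree = perturb_degree_based(weights, 0.0, 0.1, rng)
```

Calling either function k times with different random draws would give k perturbed weight sets, one per slice. In the degree-based variant, links attached to high-degree nodes get a wider perturbation range, which matches the stated intuition of deflecting traffic away from nodes it would cross by default.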
15 Constructing Paths Goal: Allow multiple instances to co-exist. Mechanism: Virtual forwarding tables (Figure: per-slice forwarding tables, Slice 1 and Slice 2, each with columns dst and next-hop)
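One way to picture the virtual forwarding tables is one shortest-path next-hop table per slice. The sketch below is an assumption about the construction (plain Dijkstra on each slice's weights), with hypothetical helper names `next_hops` and `build_tables` and hand-picked weights so that the two slices disagree at s.

```python
import heapq

def next_hops(weights, dst):
    """Dijkstra from dst over undirected links; returns {node: next hop toward dst}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {dst: 0.0}
    hop = {}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                hop[v] = u  # v forwards to u to reach dst
                heapq.heappush(heap, (nd, v))
    return hop

def build_tables(slices, destinations):
    """tables[slice_id][node][dst] -> next hop in that slice."""
    tables = []
    for weights in slices:
        fib = {}
        for dst in destinations:
            for node, nh in next_hops(weights, dst).items():
                fib.setdefault(node, {})[dst] = nh
        tables.append(fib)
    return tables

# Two hypothetical slices of the same topology with different weights.
slice_1 = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 2.0, ("b", "t"): 2.0}
slice_2 = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
tables = build_tables([slice_1, slice_2], ["t"])
```

Here s forwards toward t via a in the first slice and via b in the second, so the two tables co-exist on the same router and differ only in which one a packet selects.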
16 Forwarding Traffic Packet has shim header with forwarding bits Routers use lg(k) bits to index forwarding tables –Shift bits after inspection To access different (or multiple) paths, end systems simply change the forwarding bits –Incremental deployment is trivial –Persistent loops cannot occur Various optimizations are possible
17 Forwarding: Putting It Together End system sets forwarding bits in packet header –Forwarding bits specify slice to be used at any hop. Router examines/shifts bits, and forwards. (Figure: packet forwarded from s to t)
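The examine/shift/forward step can be sketched as below. The header is modeled as a plain integer, and two details are assumptions not stated on the slides: the low-order bits are consumed first, and once the bits are exhausted (all zero) the packet stays in slice 0. The tables are hypothetical.

```python
import math

def forward(tables, src, dst, bits, max_hops=16):
    """Walk a packet from src to dst, consuming lg(k) header bits per hop."""
    k = len(tables)
    width = max(1, math.ceil(math.log2(k)))  # lg(k) bits index k tables
    mask = (1 << width) - 1
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (bits & mask) % k          # examine the current bits
        bits >>= width                        # shift bits after inspection
        node = tables[slice_id][node][dst]    # look up the next hop in that slice
        path.append(node)
    return path

# Hypothetical per-slice tables: tables[slice_id][node][dst] -> next hop.
tables = [
    {"s": {"t": "a"}, "a": {"t": "t"}, "b": {"t": "a"}},  # slice 0
    {"s": {"t": "b"}, "a": {"t": "b"}, "b": {"t": "t"}},  # slice 1
]

path_default = forward(tables, "s", "t", 0b00)  # slice 0 at every hop
path_spliced = forward(tables, "s", "t", 0b01)  # slice 1 at s, then slice 0
path_slice_1 = forward(tables, "s", "t", 0b11)  # slice 1 at both hops
```

Changing only the integer `bits` yields three different s-to-t paths over the same two tables, which is the sense in which end systems reach different paths by changing the forwarding bits.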
18 Recovery Mechanisms End-system recovery –Switch slices at every hop with probability 0.5 Network-based recovery –Router switches to a random slice if next hop is unreachable –Continue for a fixed number of hops until destination is reached
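Both recovery mechanisms can be sketched in a few lines. The function names, the per-hop list representation of the chosen slices, and the rule of trying the remaining slices in random order are assumptions made for illustration; the fixed hop budget mentioned above is not modeled here.

```python
import random

def reroll_slices(hop_slices, k, rng, p_switch=0.5):
    """End-system recovery: at each hop, switch to a different slice
    with probability p_switch (0.5 on the slide)."""
    new = []
    for s in hop_slices:
        if k > 1 and rng.random() < p_switch:
            s = rng.choice([x for x in range(k) if x != s])
        new.append(s)
    return new

def pick_next_hop(tables, node, dst, slice_id, failed_links, rng):
    """Network-based recovery: use the requested slice if its next hop is
    reachable, otherwise fall back to the other slices in random order."""
    others = [s for s in range(len(tables)) if s != slice_id]
    rng.shuffle(others)
    for sid in [slice_id] + others:
        hop = tables[sid].get(node, {}).get(dst)
        if hop is not None and frozenset((node, hop)) not in failed_links:
            return hop, sid
    return None, None  # no slice offers a working next hop

# Hypothetical per-slice tables and a single failed link s-a.
tables = [
    {"s": {"t": "a"}, "a": {"t": "t"}, "b": {"t": "a"}},
    {"s": {"t": "b"}, "a": {"t": "b"}, "b": {"t": "t"}},
]
rng = random.Random(1)
failed = {frozenset(("s", "a"))}
hop, used_slice = pick_next_hop(tables, "s", "t", 0, failed, rng)
```

With the s-a link down, the router at s deflects the packet into the second slice and forwards it to b instead.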
19 Availability Evaluation: Two Aspects Reliability: Connectivity in the routing tables should approach that of the underlying graph –If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic from s reaching t. Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
20 Availability Evaluation A definition for reliability Does path splicing improve reliability? –How close can splicing get to the best possible reliability (i.e., that of the underlying graph)? Can path splicing enable fast recovery? –Can end systems (or intermediate nodes) find alternate paths fast enough?
21 Reliability Definition Reliability: the probability that, upon failing each edge with probability p, the graph remains connected. Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p. The underlying graph has an underlying reliability (and reliability curve) –Goal: Reliability of routing system should approach that of the underlying graph.
22 Reliability Curve: Illustration (Figure: fraction of source-dest pairs disconnected versus probability of link failure (p); annotation: better reliability) More edges available to end systems -> Better reliability
23 Experimental Setup Evaluation on two topologies –GEANT (Real) and Sprint (Rocketfuel) Compute base graph by taking the union of k perturbed graphs Remove an edge from the base graph with probability p Compute number of pairs that could reach one another (average over 1,000 trials)
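The trial procedure can be sketched as a small Monte Carlo loop. This is a simplified illustration, not the evaluation code: links are treated as undirected, the four-node ring and chain are hypothetical stand-ins for the GEANT and Sprint topologies, and the function names are made up.

```python
import random
from itertools import combinations

def components(nodes, edges):
    """Union-find: map every node to a representative of its component."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}

def disconnected_fraction(nodes, edges, p, trials, rng):
    """Average fraction of node pairs disconnected when each edge
    is removed independently with probability p."""
    pairs = list(combinations(nodes, 2))
    total = 0
    for _ in range(trials):
        alive = [e for e in edges if rng.random() >= p]
        comp = components(nodes, alive)
        total += sum(1 for u, v in pairs if comp[u] != comp[v])
    return total / (trials * len(pairs))

nodes = ["a", "b", "c", "d"]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # more edges available
chain = [("a", "b"), ("b", "c"), ("c", "d")]             # a single tree
rng = random.Random(42)
ring_curve = [disconnected_fraction(nodes, ring, p, 1000, rng) for p in (0.0, 0.1, 1.0)]
chain_curve = [disconnected_fraction(nodes, chain, p, 1000, rng) for p in (0.0, 0.1, 1.0)]
```

Evaluating the function over a range of p values gives one reliability curve per edge set. Passing the union of the slices' edges and the full underlying graph would give the two curves being compared.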
24 Reliability Approaches Optimal Sprint (Rocketfuel) topology, degree-based perturbations; 1,000 trials; p indicates the probability that an edge was removed from the base graph. Reliability approaches optimal. Average stretch is only 1.3.
25 Simple Recovery Strategies Work Well Which paths can be recovered within 5 trials? –Sequential trials: 5 round-trip times –…but trials could also be made in parallel. Recovery approaches maximum possible. Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
26 Significant Novelty for Modest Stretch Novelty: difference in nodes in a perturbed shortest path from the original shortest path. Example (paths from s to d): Novelty = 1 – (1/3) = 2/3, where 1/3 is the fraction of edges on the short path shared with the long path.
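The worked example can be reproduced with a short function. It follows the edge-based calculation used in the example (one minus the fraction of the short path's edges that also appear on the long path); the node names other than s and d are hypothetical.

```python
def path_edges(path):
    """Undirected edges along a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def novelty(original, perturbed):
    """1 - (edges of the original path shared with the perturbed path
    / edges on the original path)."""
    orig = path_edges(original)
    shared = orig & path_edges(perturbed)
    return 1 - len(shared) / len(orig)

short_path = ["s", "a", "b", "d"]      # original shortest path, 3 edges
long_path = ["s", "a", "c", "e", "d"]  # perturbed path, shares only s-a
example = novelty(short_path, long_path)
```

The two paths share one of the short path's three edges, so `example` evaluates to 1 – (1/3) = 2/3, up to floating-point rounding.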
27 Summary: Splicing Can Improve Availability Reliability: Connectivity in the routing tables should approach that of the underlying graph –Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them –Result: Splicing ~ 10 trees achieves near-optimal reliability. Recovery: In case of failure, nodes should quickly be able to discover a new path –Approach: End nodes randomly select new bits –Result: Recovery within 5 trials approaches best possible.
28 Does Splicing Create Loops? Persistent loops are avoidable –In the simple scheme, path bits are exhausted from the header –Never switching back to the same slice. Transient loops can still be a problem because they increase end-to-end delay (stretch) –Longer end-to-end paths –Wasted capacity –Two-hop loops do occur (around 1 in 100 trials for k=2, more for higher values of k), but can be avoided with the mechanisms above
29 Interactions with Traffic Maximum utilization unaffected
30 Path Splicing for Interdomain Routing Observation: Many routers already learn multiple alternate routes to each destination. Idea: Use the bits to index into these alternate routes at an AS's ingress and egress routers. Required new functionality: –Storing multiple entries per prefix –Indexing into them based on packet headers –Selecting the best k routes for each destination (Figure: default and alternate routes to destination d; splice paths at ingress and egress routers)
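The three pieces of required functionality can be sketched together. Everything concrete here is an assumption for illustration: ranking routes by AS-path length stands in for the full BGP decision process, and the prefix, AS numbers (documentation ranges), router names, and function names are hypothetical.

```python
def best_k_routes(routes, k):
    """Keep the k best routes; here 'best' means shortest AS path (assumed)."""
    return sorted(routes, key=lambda r: len(r["as_path"]))[:k]

def select_route(rib, prefix, bits, k):
    """Use the forwarding bits to index into the stored routes for a prefix."""
    candidates = best_k_routes(rib[prefix], k)
    return candidates[bits % len(candidates)]

# Multiple entries stored per prefix.
rib = {
    "192.0.2.0/24": [
        {"as_path": (64501, 64502, 64510), "egress": "r2"},
        {"as_path": (64500, 64510), "egress": "r1"},
        {"as_path": (64503, 64504, 64505, 64510), "egress": "r3"},
    ]
}

default_route = select_route(rib, "192.0.2.0/24", bits=0, k=2)
alternate_route = select_route(rib, "192.0.2.0/24", bits=1, k=2)
```

With k = 2, bit value 0 selects the default route and bit value 1 selects the alternate, while the third learned route is never used.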
31 Experimental Setup 2,500-node policy-annotated AS graph Use C-BGP to compute routes on base graph Remove each inter-AS edge with probability p Test connectivity between a random subset of AS pairs Compute base reliability without policy restrictions
32 Interdomain Splicing: Reliability 2-slice deployment approaches best possible
33 Incremental Deployment Partial deployment provides some gains
34 Ongoing Work Software implementation –Click Element –PlanetLab/VINI deployment Extension to Cisco Multi-Topology Routing –IETF draft in-progress
35 Open Questions and Ongoing Work How does splicing interact with traffic engineering? Sources controlling traffic? What are the best mechanisms for generating slices and recovering paths? Can splicing eliminate dynamic routing?
36 Conclusion Simple: Forwarding bits provide access to different paths through the network Scalable: Exponential increase in available paths, linear increase in state Stable: Fast recovery does not require fast routing protocols