Improving Internet Availability with Path Splicing. Murtaza Motiwala, Nick Feamster, Santosh Vempala
2 Availability It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be: 1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today. 2. …
3 Availability of Other Services Carrier Airlines (2002 FAA Fact Book) –41 accidents, 6.7M departures –99.9993% availability 911 Phone service (1993 NRIC report +) –29 minutes per year per line –99.994% availability Std. Phone service (various sources) –53+ minutes per line per year –99.99+% availability
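The availability percentages on this slide follow from simple ratios. The sketch below shows the arithmetic under two assumptions that the slide does not state: that airline availability is counted per departure, and that phone availability is counted as uptime minutes over a 365-day year. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def availability_from_events(failures, attempts):
    """Fraction of attempts that did not fail (e.g. accidents per departure)."""
    return 1 - failures / attempts


def availability_from_downtime(minutes_down_per_year):
    """Fraction of the year a line was usable, given its yearly downtime."""
    return 1 - minutes_down_per_year / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"airlines: {100 * availability_from_events(41, 6.7e6):.5f}%")
    print(f"911:      {100 * availability_from_downtime(29):.4f}%")
    print(f"phone:    {100 * availability_from_downtime(53):.4f}%")
```

The results land close to the figures quoted on the slide; small differences come from rounding versus truncation.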
4 Can the Internet Be Always On? Various studies (Paxson, etc.) show the Internet is at about 2.5 nines More critical (or at least availability-centric) applications on the Internet At the same time, the Internet is getting more difficult to debug –Increasing scale, complexity, disconnection, etc. Is it possible to get to 5 nines of availability? If so, how?
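To make the gap between 2.5 nines and 5 nines concrete, the sketch below converts a number of nines into yearly downtime. It assumes the common reading that n nines means an unavailability of 10^-n, which the slide does not define explicitly; the function name is illustrative.

```python
def downtime_minutes_per_year(nines):
    """Yearly downtime implied by an availability of `nines` nines.

    Assumes n nines means unavailability 10**-n and a 365-day year.
    """
    unavailability = 10 ** (-nines)
    return unavailability * 365 * 24 * 60


if __name__ == "__main__":
    for n in (2.5, 3, 4, 5):
        print(f"{n} nines -> {downtime_minutes_per_year(n):.1f} minutes/year")
```

Under this reading, 2.5 nines is more than a day of downtime per year, while 5 nines is a little over five minutes.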
5 Availability: Two Aspects Reliability: Connectivity in the routing tables should approach that of the underlying graph –If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will carry traffic from s to t Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
6 Where Today's Protocols Stand Reliability: Routing protocols are single path. –When a link or node failure occurs, routers must recompute new paths to each destination –Approach: Compute backup paths –Challenge: Many possible failure scenarios! Recovery: Today's Internet routing protocols recompute paths after a failure –Meanwhile, packets are dropped, reordered, etc. –Approach: Switch to a backup when a failure occurs –Challenge: Must quickly discover a new working path
7 Multipath: Promise and Problems Bad: If a link fails on each of the two paths, s is disconnected from t Want: End systems remain connected unless the failures form a cut in the underlying graph
8 Path Splicing: Main Idea Step 1 (Perturbations): Run multiple instances of the routing protocol, each with a slightly perturbed version of the configuration Step 2 (Slicing): Allow traffic to switch between instances at any node in the network Compute multiple forwarding trees per destination. Allow packets to switch slices midstream.
10 Mechanism #1: Perturbations Goal: Each instance provides different paths Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight –Two schemes: Uniform and degree-based [Figure: Base Graph between s and t (link weights 3, 3, 3) and Perturbed Graph between s and t (link weights 3.5, 4, 5, 1.5, 1.25)]
11 How to Perturb the Link Weights? Uniform: Perturbation is a function of the initial weight of the link Degree-based: Perturbation is a linear function of the degrees of the incident nodes –Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
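The two perturbation schemes can be sketched as follows. The slides do not give the exact formulas, so everything here is an assumption: each new weight is the original weight plus a random amount of at most `strength * weight`, and for the degree-based scheme `strength` grows linearly from `a` to `b` with the sum of the degrees of the link's endpoints. The function names and parameters are illustrative.

```python
import random


def perturb_uniform(weights, scale=1.0, rng=random):
    """Uniform scheme (assumed form): w' = w + U(0, scale * w) for every link."""
    return {e: w + rng.uniform(0, scale * w) for e, w in weights.items()}


def perturb_degree_based(weights, a=0.0, b=3.0, rng=random):
    """Degree-based scheme (assumed form): links between high-degree nodes
    receive larger perturbations, which steers some slices away from hubs."""
    degree = {}
    for u, v in weights:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    sums = {e: degree[e[0]] + degree[e[1]] for e in weights}
    lo, hi = min(sums.values()), max(sums.values())
    perturbed = {}
    for e, w in weights.items():
        # Linear interpolation of the perturbation strength between a and b.
        frac = 0.0 if hi == lo else (sums[e] - lo) / (hi - lo)
        strength = a + (b - a) * frac
        perturbed[e] = w + rng.uniform(0, strength * w)
    return perturbed


if __name__ == "__main__":
    base = {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "t"): 3.0, ("a", "b"): 3.0}
    print(perturb_uniform(base, rng=random.Random(1)))
    print(perturb_degree_based(base, rng=random.Random(1)))
```

Running a shortest-path computation on each perturbed copy of the weights would then yield one routing instance (slice) per copy.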
12 Mechanism #2: Network Slicing Goal: Allow multiple instances to co-exist Mechanism: Virtual forwarding tables [Figure: topology with nodes s, a, b, c, t and one forwarding table per slice with columns dst and next-hop; Slice 1: t via a, Slice 2: t via c]
13 Forwarding Traffic Packet has shim header with forwarding bits Routers use lg(k) bits to index forwarding tables –Shift bits after inspection To access different (or multiple) paths, end systems simply change the forwarding bits –Incremental deployment is trivial –Persistent loops cannot occur
14 Putting It Together End system sets forwarding bits in packet header Forwarding bits specify slice to be used at any hop Router: examines/shifts forwarding bits, and forwards
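The forwarding steps above can be sketched as a small simulation. This is a minimal illustration, not the prototype's code: the `TABLES` contents, the node names, and the choice to fall back to slice 0 once the bits are exhausted are all assumptions.

```python
# TABLES[node][slice][destination] -> next hop (hypothetical two-slice network).
TABLES = {
    "s": [{"t": "a"}, {"t": "c"}],
    "a": [{"t": "t"}, {"t": "b"}],
    "b": [{"t": "t"}, {"t": "t"}],
    "c": [{"t": "t"}, {"t": "t"}],
}


def forward(tables, src, dst, bits, k, max_hops=16):
    """Follow a packet hop by hop; each router reads lg(k) bits, then shifts."""
    width = max(1, (k - 1).bit_length())  # bits needed to name one of k slices
    mask = (1 << width) - 1
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (bits & mask) % k   # inspect the low-order bits
        bits >>= width                 # shift so the next hop sees fresh bits
        node = tables[node][slice_id][dst]
        path.append(node)
    return path


if __name__ == "__main__":
    for bits in (0b00, 0b01, 0b10):
        print(f"{bits:02b} -> {forward(TABLES, 's', 't', bits, k=2)}")
```

Changing only the header bits is enough to reach a different path: in this toy network, `0b00` stays in slice 0, `0b01` takes slice 1 at the first hop, and `0b10` takes slice 1 at the second hop.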
15 A Definition Motivated by Reliability Reliability: the probability that, upon failing each edge with probability p, the graph remains connected Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p The underlying graph has an underlying reliability (and reliability curve) –Goal: Reliability of routing system should approach that of the underlying graph.
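A reliability curve for the underlying graph can be estimated by Monte Carlo simulation, as sketched below. The slides do not describe the evaluation code, so the function names, the union-find connectivity check, and the default trial count are assumptions made for illustration.

```python
import random
from itertools import combinations


def connected_pairs_fraction(nodes, edges):
    """Fraction of node pairs joined by some path, computed with union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    pairs = list(combinations(nodes, 2))
    return sum(1 for a, b in pairs if find(a) == find(b)) / len(pairs)


def reliability_curve(nodes, edges, probs, trials=1000, seed=0):
    """For each p, fail every edge independently with probability p and
    average the fraction of pairs that stay connected over all trials."""
    rng = random.Random(seed)
    curve = {}
    for p in probs:
        total = 0.0
        for _ in range(trials):
            alive = [e for e in edges if rng.random() >= p]
            total += connected_pairs_fraction(nodes, alive)
        curve[p] = total / trials
    return curve


if __name__ == "__main__":
    ring = ["a", "b", "c", "d"]
    links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(reliability_curve(ring, links, [0.0, 0.05, 0.1, 0.2]))
```

To evaluate a routing system rather than the raw graph, the same loop would test reachability through the surviving forwarding-table entries instead of through all surviving edges.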
16 Reliability Curve: Illustration [Figure: x-axis: probability of link failure (p); y-axis: fraction of source-dest pairs disconnected; lower curves indicate better reliability] More edges available to end systems -> Better reliability
17 Reliability Approaches Optimal Sprint (Rocketfuel) topology 1,000 trials p indicates probability edge was removed from base graph Reliability approaches optimal Average stretch is only 1.3 Sprint topology, degree-based perturbations
18 Recovery is Fast Which paths can be recovered within 5 trials? –Sequential trials: 5 round-trip times –…but trials could also be made in parallel Recovery approaches maximum possible Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
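Recovery by retrying with fresh forwarding bits can be sketched as below. This is a toy model under several assumptions: the `TABLES` network is hypothetical, the end system draws 8 random header bits per trial, and the simulation checks failed links directly, whereas a real end system would only observe whether its packets get through.

```python
import random

# TABLES[node][slice][destination] -> next hop (hypothetical two-slice network).
TABLES = {
    "s": [{"t": "a"}, {"t": "c"}],
    "a": [{"t": "t"}, {"t": "b"}],
    "b": [{"t": "t"}, {"t": "t"}],
    "c": [{"t": "t"}, {"t": "t"}],
}


def walk(tables, src, dst, bits, k, max_hops=16):
    """Per-hop slice selection: read lg(k) bits, shift, look up the next hop."""
    width = max(1, (k - 1).bit_length())
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (bits & ((1 << width) - 1)) % k
        bits >>= width
        node = tables[node][slice_id][dst]
        path.append(node)
    return path


def recover(tables, src, dst, failed, k, trials=5, header_bits=8, seed=None):
    """Try up to `trials` random bit strings; return (attempt, path) for the
    first path that reaches dst without crossing a failed link, else None."""
    rng = random.Random(seed)
    for attempt in range(1, trials + 1):
        path = walk(tables, src, dst, rng.getrandbits(header_bits), k)
        links = set(zip(path, path[1:]))
        if path[-1] == dst and not (links & failed):
            return attempt, path
    return None


if __name__ == "__main__":
    print(recover(TABLES, "s", "t", failed={("s", "a")}, k=2, seed=3))
```

Sequential trials cost one round-trip time each, which is where the slide's bound of 5 round-trip times comes from; issuing the trials in parallel would trade extra packets for lower latency.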
19 Stretch is Bounded Stretch: How much longer is the path taken by packets than the optimal path? –Stretch is bounded in one slice by amount of perturbation –…but what about the stretch of spliced paths? –As long as significant progress (a large fraction of the distance to d) is achieved at each hop, stretch is bounded Implication: Loops are rare.
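Stretch can be computed as the cost of the path actually taken divided by the cost of the shortest path between the same endpoints. The sketch below assumes this ratio definition (the slide describes stretch only informally) and uses a hypothetical four-node graph.

```python
import heapq


def shortest_cost(graph, src, dst):
    """Dijkstra's algorithm over graph[u][v] = link weight."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def stretch(graph, path):
    """Cost of `path` relative to the shortest path between its endpoints."""
    cost = sum(graph[u][v] for u, v in zip(path, path[1:]))
    return cost / shortest_cost(graph, path[0], path[-1])


if __name__ == "__main__":
    g = {
        "s": {"a": 1, "c": 2},
        "a": {"s": 1, "t": 1},
        "c": {"s": 2, "t": 2},
        "t": {"a": 1, "c": 2},
    }
    print(stretch(g, ["s", "a", "t"]), stretch(g, ["s", "c", "t"]))
```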
20 Summary: Splicing Improves Availability Reliability: Connectivity in the routing tables should approach that of the underlying graph –Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them. –Result: Splicing ~ 10 trees achieves near-optimal reliability Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path –Approach: End nodes randomly select new bits. –Result: Recovery within five trials approaches best possible.
21 Open Questions and Future Work How does splicing interact with traffic engineering? Sources controlling traffic? What are the best mechanisms for reliability and recovery? What changes are required to today's routers to make splicing possible? Can splicing eliminate dynamic routing?
22 Variation: BGP Splicing Observation: Many routers already learn multiple alternate routes to each destination. Idea: Use the forwarding bits to index into these alternate routes at an AS's ingress and egress routers. Required new functionality: –Storing multiple entries per prefix –Indexing into them based on packet headers –Selecting the best k routes for each destination [Figure: default and alternate routes toward destination d; paths are spliced at ingress and egress routers]
23 Conclusion Simple: Forwarding bits provide access to different paths through the network Scalable: Exponential increase in available paths, linear increase in state Stable: Fast recovery does not require fast routing protocols No modifications to existing routing protocols http://www.cc.gatech.edu/~feamster/papers/splicing-hotnets.pdf
25 History: Network Embedding Given: virtual (V) and physical (P) network –Topology, constraints, etc. Problem: find the appropriate mapping onto available physical resources (nodes and edges) Idea: Define a virtual graph G′ onto which G can be embedded A link in G can be mapped to multiple links in G′ How to forward traffic over multiple links in G′? …
26 Possible Applications/Future Work Fast recovery from poorly performing paths Data transfer with easy multi-path –Overlay networks, CDNs, etc. –Transfer of video with multiple description Security applications Spatial diversity in wireless networks
27 Significant Novelty for Modest Stretch Novelty: difference in nodes in a perturbed shortest path from the original shortest path Example: Novelty: 1 – (1/3) = 2/3, where 1/3 is the fraction of edges on the short path shared with the long path [Figure: a short path and a long path between s and d]
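The worked example on this slide can be reproduced with a few lines. The sketch follows the example's edge-based calculation (one minus the fraction of the original path's edges that the perturbed path reuses), although the slide's definition speaks of nodes; the node names in the sample paths are hypothetical.

```python
def novelty(original_path, perturbed_path):
    """1 - (edges of the original path reused by the perturbed path)
           / (edges of the original path). Links are treated as undirected."""
    orig = {frozenset(e) for e in zip(original_path, original_path[1:])}
    pert = {frozenset(e) for e in zip(perturbed_path, perturbed_path[1:])}
    return 1 - len(orig & pert) / len(orig)


if __name__ == "__main__":
    short = ["s", "a", "b", "d"]        # 3 edges
    long_ = ["s", "a", "c", "e", "d"]   # reuses only the edge s-a
    print(novelty(short, long_))        # 1 - 1/3
```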
28 Related Work Pre-Computed Backup Paths –Multi-Topology Routing –Multiple Router Configuration –MPLS Fast Reroute End-Node Controlled Traffic –Source routing –Routing deflections Multipath routing (ECMP, MIRO, etc.) IGP link-weight optimization Measurement of path diversity and multihoming Layer-3 VPNs
29 Other Properties Scalable –Exponential increase in paths, linear increase in state Fast recovery from underlying failures Automatic tuning (e.g., for traffic engineering) –Perturbations achieve property of automatically spreading traffic across different links –Standard link-weight optimization is potentially brittle in the face of link failures Incrementally deployable
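The scalability claim can be illustrated with simple counting. Assuming k slices, a path of h hops, and one forwarding entry per destination per slice (all assumptions made for this sketch), a router stores k times the usual state while a packet can choose among up to k^h slice sequences. Distinct sequences may map to the same physical path, so k^h is an upper bound.

```python
def splicing_scale(k, hops, destinations):
    """Rough state and path counts for k slices (illustrative upper bounds)."""
    return {
        "forwarding_entries_per_router": k * destinations,  # linear in k
        "max_slice_sequences": k ** hops,                   # exponential in hops
    }


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, splicing_scale(k, hops=4, destinations=100))
```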
30 Prototype Implementation Click and Quagga on PL-VINI –http://www.vini-veritas.net/ [Figure: prototype architecture; labels: Classifier, Control Plane, Forwarding Table, Daemon]
31 Loops, Reconsidered Problem: Potential for loops between ASes –AS-level loops can be longer than intra-AS loops Two possible approaches –Detection: routers mark packets and determine that packets have traversed the same AS twice –Prevention: Exploit common routing policies to ensure that packets are only deflected along valley-free paths
32 Preventing Inter-AS Loops with Policy Observation: inter-AS loops inherently involve a traversal that violates the valley-free property Constraints: 1. once a down deflection has occurred, do not deflect 2. only allow one across deflection Possible relaxation: allow a limited number of violations, specified by source
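The two constraints can be expressed as a small admission check that a router might apply before deflecting a packet. This is one possible reading of the slide, not a specified mechanism: it assumes each packet carries the list of deflection directions taken so far ("up" toward a provider, "across" to a peer, "down" toward a customer), and the function name is hypothetical.

```python
DIRECTIONS = {"up", "across", "down"}


def may_deflect(deflections_so_far, candidate):
    """Return True if a further deflection in direction `candidate` is allowed."""
    if candidate not in DIRECTIONS:
        raise ValueError(f"unknown direction: {candidate}")
    # Constraint 1: once a down deflection has occurred, do not deflect again.
    if "down" in deflections_so_far:
        return False
    # Constraint 2: only one across deflection is allowed.
    if candidate == "across" and "across" in deflections_so_far:
        return False
    return True


if __name__ == "__main__":
    print(may_deflect([], "across"))          # first across deflection
    print(may_deflect(["across"], "across"))  # second across deflection
    print(may_deflect(["down"], "up"))        # any deflection after a down
```

The relaxation mentioned on the slide could be modeled by letting the source supply a small budget of permitted violations instead of returning False immediately.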
33 Possible Application: Routing Security One Idea: Mitigating BGP route hijacks –End systems or routers learn multiple routes to each destination –Alternate paths at any intermediate point along the path can be tested by twiddling bits in the header Service that uses splicing to alternate paths to discover several valid ones
34 Definitions of Path Diversity Connectivity: Minimum number of edges whose failure disconnects the graph (min cut) Expansion: Intuitively, small cuts disconnect small groups of nodes from the graph
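The connectivity definition (min cut) can be computed with unit-capacity max-flow, since the number of edge-disjoint paths between two nodes equals the size of the smallest edge cut separating them. The sketch below uses BFS augmenting paths on an undirected graph; the function names are illustrative.

```python
from collections import deque


def st_edge_connectivity(edges, s, t):
    """Minimum number of edges whose removal separates s from t."""
    cap, adj = {}, {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # undirected: capacity 1 each way
            cap[(a, b)] = cap.get((a, b), 0) + 1
            adj.setdefault(a, set()).add(b)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(nodes, edges):
    """Global min cut: smallest s-t connectivity from one fixed node."""
    first = nodes[0]
    return min(st_edge_connectivity(edges, first, t) for t in nodes[1:])


if __name__ == "__main__":
    ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(edge_connectivity(["a", "b", "c", "d"], ring))
```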
35 Design Goals Reachability: allow endpoints to communicate High Diversity: expose paths to end hosts that survive failures –Capacity: the total available data rate between each source-destination pair should be high –Fault tolerance: the number of disjoint paths should be high, and the network should remain connected under failures Low Stretch: paths should not be too circuitous Scalability: scale to a large number of networks, destinations, routers, etc. Today's routing protocols do not exploit the diversity of the underlying network graph