Presentation on theme: "Nick Feamster Georgia Tech"— Presentation transcript:
1 Path Splicing. Nick Feamster, Georgia Tech. Joint work with Murtaza Motiwala, Santosh Vempala, Megan Elmore.
2 Internet Availability: OK for email and the Web, but what about E911 service, air traffic control, …? Stanford University Clean-Slate Design for the Internet: “It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. … It should be: Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today. …”
3 Work to do… Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”. More “critical” (or at least availability-centric) applications are on the Internet. At the same time, the Internet is getting more difficult to debug: scale, complexity, disconnection, etc. “It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be: 1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.”
8 Threats to Availability: Natural disasters; physical failures (node, link); router software bugs; misconfiguration; mis-coordination; denial-of-service (DoS) attacks; changes in traffic patterns (e.g., flash crowd); …
9 Idea: Backup/Multipath. For intradomain routing: IP and MPLS fast re-route; packet deflections [Yang 2006]; ECMP, NotVia, Loop-Free Alternates [Cisco]. For interdomain routing: MIRO [Rexford 2006]. Problems. Scale: Protecting against arbitrary failures requires storing lots of state and exchanging lots of messages. Control: End systems can’t signal when they think a path has “failed”.
10 Backup Paths: Promise and Problems. Bad: If a link fails on each of the two paths, s is disconnected from t. Want: End systems remain connected unless the underlying graph has a cut.
11 Path Splicing: Main Idea. Compute multiple forwarding trees per destination. Allow packets to switch slices midstream. Step 1 (Generate slices): Run multiple instances of the routing protocol, each with a slightly perturbed version of the configuration. Step 2 (Splice end-to-end paths): Allow traffic to switch between instances at any node.
12 Outline. Path Splicing for Intradomain Routing: generating slices; constructing paths; forwarding; recovery. Evaluation: reliability and recovery; stretch; effects on traffic. Path Splicing for Interdomain Routing. Ongoing: Prototype and Deployment Paths.
13 Generating Slices. Goal: Each instance provides different paths. Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight. Two schemes: uniform and degree-based. [Figure: “Base” graph and perturbed graph between s and t, with link weights.]
14 How to Perturb the Link Weights? Uniform: Perturbation is a function of the initial weight of the link. Degree-based: Perturbation is a linear function of the degrees of the incident nodes. Intuition: Deflect traffic away from nodes that traffic would tend to pass through by default.
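The two perturbation schemes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `scale`, `a`, and `b` parameters, and the exact formulas are assumptions chosen only to match the slide's description (uniform: a function of the link's own weight; degree-based: a linear function of the incident nodes' degrees).

```python
import random

def degrees(edges):
    """Count the degree of each node from an iterable of undirected (u, v) edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def perturb_uniform(weights, scale=1.0, rng=None):
    """Assumed uniform scheme: add a random amount in [0, scale * w] to each weight w."""
    rng = rng or random.Random()
    return {e: w + rng.uniform(0, scale * w) for e, w in weights.items()}

def perturb_degree_based(weights, a=0.0, b=3.0, rng=None):
    """Assumed degree-based scheme: perturbation strength grows linearly from a to b
    with the sum of the degrees of the link's two endpoints."""
    rng = rng or random.Random()
    deg = degrees(weights)
    sums = {e: deg[e[0]] + deg[e[1]] for e in weights}
    lo, hi = min(sums.values()), max(sums.values())
    out = {}
    for e, w in weights.items():
        frac = (sums[e] - lo) / (hi - lo) if hi > lo else 0.0
        strength = a + (b - a) * frac
        out[e] = w + strength * rng.uniform(0, w)
    return out

def make_slices(weights, k, perturb, seed=0):
    """Return k weight maps: the unperturbed base plus k - 1 perturbed copies."""
    rng = random.Random(seed)
    return [dict(weights)] + [perturb(weights, rng=rng) for _ in range(k - 1)]
```

Each returned weight map would then be fed to an ordinary shortest-path computation to produce one slice's forwarding tree per destination.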
15 Constructing Paths. Goal: Allow multiple instances to co-exist. Mechanism: Virtual forwarding tables. [Figure: topology with nodes s, a, b, c, t; per-slice forwarding table with columns dst and next-hop for Slice 1 and Slice 2.]
16 Forwarding Traffic. Packet has a shim header with forwarding bits. Routers use lg(k) bits to index forwarding tables, then shift the bits after inspection. To access different (or multiple) paths, end systems simply change the forwarding bits. Incremental deployment is trivial. Persistent loops cannot occur. Various optimizations are possible.
17 Forwarding: Putting It Together. End system sets forwarding bits in packet header. Forwarding bits specify the slice to be used at each hop. Router examines and shifts the bits, then forwards.
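The forwarding steps above can be sketched in a few lines. This is an illustrative model only: the table layout `tables[slice][node][dst]`, the example topology, and the fallback to slice 0 once the bits are used up are all assumptions, not details given in the slides.

```python
from math import ceil, log2

# Hypothetical per-slice forwarding tables: TABLES[slice][node][dst] -> next hop.
TABLES = [
    {"s": {"t": "a"}, "a": {"t": "t"}, "b": {"t": "a"}, "c": {"t": "t"}},  # slice 0
    {"s": {"t": "b"}, "a": {"t": "c"}, "b": {"t": "c"}, "c": {"t": "t"}},  # slice 1
]

def forward(tables, src, dst, bits, k, max_hops=32):
    """Follow the forwarding bits hop by hop and return the list of nodes visited.

    At each hop the router reads the low lg(k) bits to pick a slice, shifts them
    out, and forwards using that slice's table. Once all bits are consumed the
    value is 0, so forwarding continues in slice 0 (an assumed default).
    """
    width = max(1, ceil(log2(k)))
    mask = (1 << width) - 1
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (bits & mask) % k   # examine the bits
        bits >>= width                 # shift after inspection
        node = tables[slice_id][node][dst]
        path.append(node)
    return path
```

With these example tables, `forward(TABLES, "s", "t", 0b10, 2)` uses slice 0 at s and slice 1 at a, so the packet is spliced from one tree onto the other midstream.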
18 Recovery Mechanisms. End-system recovery: Switch slices at every hop with probability 0.5. Network-based recovery: Router switches to a random slice if the next hop is unreachable; continue for up to a fixed number of hops, or until the destination is reached. Network-based recovery works almost as well as the end-system recovery scheme. The network-based scheme can fail to find a path when switching leads into a dead end.
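A rough sketch of the network-based scheme, under stated assumptions: the table layout `tables[slice][node][dst]`, the example topology, and the choice to try the remaining slices in random order (the slide says only "a random slice") are all hypothetical.

```python
import random

# Hypothetical per-slice forwarding tables: TABLES[slice][node][dst] -> next hop.
TABLES = [
    {"s": {"t": "a"}, "a": {"t": "t"}, "b": {"t": "a"}, "c": {"t": "t"}},  # slice 0
    {"s": {"t": "b"}, "a": {"t": "c"}, "b": {"t": "c"}, "c": {"t": "t"}},  # slice 1
]

def network_recovery(tables, failed, src, dst, k, max_hops=16, rng=None):
    """Forward in the current slice; when the next-hop link is in `failed`
    (a set of frozenset({u, v}) links), switch to another slice at that router.
    Returns the path taken, or None on a dead end or when max_hops runs out."""
    rng = rng or random.Random()
    node, slice_id, path = src, 0, [src]
    for _ in range(max_hops):
        if node == dst:
            return path
        nxt = tables[slice_id][node][dst]
        if frozenset((node, nxt)) in failed:
            others = [s for s in range(k) if s != slice_id]
            rng.shuffle(others)
            for s in others:
                cand = tables[s][node][dst]
                if frozenset((node, cand)) not in failed:
                    slice_id, nxt = s, cand
                    break
            else:
                return None  # every slice's next hop is down: dead end
        node = nxt
        path.append(node)
    return path if node == dst else None
```

For example, `network_recovery(TABLES, {frozenset(("s", "a"))}, "s", "t", 2)` detours through slice 1 at s and still reaches t.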
19 Availability Evaluation: Two Aspects. Reliability: Connectivity in the routing tables should approach that of the underlying graph. If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will deliver traffic from s to t. Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path.
20 Availability Evaluation. A definition for reliability. Does path splicing improve reliability? How close can splicing get to the best possible reliability (i.e., that of the underlying graph)? Can path splicing enable fast recovery? Can end systems (or intermediate nodes) find alternate paths fast enough?
21 Reliability Definition. Reliability: the probability that, upon failing each edge with probability p, the graph remains connected. Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p. The underlying graph has an underlying reliability (and reliability curve). Goal: Reliability of the routing system should approach that of the underlying graph.
22 Reliability Curve: Illustration. [Figure: fraction of source-dest pairs disconnected (y-axis) vs. probability of link failure p (x-axis); annotation: “Better reliability”.] More edges available to end systems -> better reliability.
23 Experimental Setup. Evaluation on two topologies: GEANT (real) and Sprint (Rocketfuel). Compute base graph by taking the union of k perturbed graphs. Remove each edge from the base graph with probability p. Compute the number of pairs that could reach one another (average over 1,000 trials).
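The experiment described above amounts to a Monte Carlo estimate of one point on the reliability curve. The sketch below is a simplification and not the authors' evaluation code: it treats the spliced graph as a set of undirected edges (real forwarding trees are directed toward a destination), and the function names and parameters are assumptions.

```python
import random
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def fraction_disconnected(nodes, edges, p, trials=1000, seed=0):
    """Estimate the fraction of node pairs that are disconnected when each
    edge is removed independently with probability p."""
    rng = random.Random(seed)
    pairs = list(combinations(nodes, 2))
    disconnected = 0
    for _ in range(trials):
        parent = {n: n for n in nodes}
        for u, v in edges:
            if rng.random() >= p:  # the edge survives this trial
                parent[find(parent, u)] = find(parent, v)
        disconnected += sum(find(parent, a) != find(parent, b) for a, b in pairs)
    return disconnected / (trials * len(pairs))
```

Calling this for a range of p values, once with the edges available through k slices and once with all edges of the underlying graph, would give the two curves being compared.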
24 Reliability Approaches Optimal. Sprint (Rocketfuel) topology, degree-based perturbations; 1,000 trials; p indicates the probability that an edge was removed from the base graph. Reliability approaches optimal. Average stretch is only 1.3.
25 Simple Recovery Strategies Work Well. Which paths can be recovered within 5 trials? Sequential trials: 5 round-trip times… but trials could also be made in parallel. Recovery approaches the maximum possible. Adding a few more slices improves recovery beyond the best possible reliability with fewer slices.
26 Significant Novelty for Modest Stretch. Novelty: difference in nodes in a perturbed shortest path from the original shortest path, measured as 1 minus the fraction of edges on the short path shared with the long path. Example (paths from s to d sharing one of three edges): Novelty = 1 - (1/3) = 2/3.
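The novelty calculation can be written out directly. The two example paths below are hypothetical (the slide's figure did not survive extraction); they are chosen so that the short path shares one of its three edges with the long path, reproducing the 2/3 on the slide.

```python
def edges_of(path):
    """Return the set of undirected edges along a path given as a list of nodes."""
    return {frozenset(e) for e in zip(path, path[1:])}

def novelty(short_path, long_path):
    """1 minus the fraction of the short path's edges that also lie on the long path."""
    short = edges_of(short_path)
    return 1 - len(short & edges_of(long_path)) / len(short)

# Hypothetical paths: they share only the edge s-a.
original = ["s", "a", "b", "d"]
perturbed = ["s", "a", "c", "e", "d"]
print(novelty(original, perturbed))
```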
27 Summary: Splicing Can Improve Availability. Reliability: Connectivity in the routing tables should approach that of the underlying graph. Approach: Overlay trees generated using random link-weight perturbations; allow traffic to switch between them. Result: Splicing ~10 trees achieves near-optimal reliability. Recovery: In case of failure, nodes should quickly be able to discover a new path. Approach: End nodes randomly select new bits. Result: Recovery within 5 trials approaches best possible.
28 Does Splicing Create Loops? Persistent loops are avoidable: in the simple scheme, path bits are exhausted from the header; never switching back to the same [slice]. Transient loops can still be a problem because they increase end-to-end delay (“stretch”): longer end-to-end paths; wasted capacity. Two-hop loops do occur (around 1 in 100 trials for k=2, more for higher values of k), but can be avoided with the mechanisms above.
29 Interactions with Traffic. Maximum utilization unaffected.
30 Path Splicing for Interdomain Routing. Observation: Many routers already learn multiple alternate routes to each destination. Idea: Use the bits to index into these alternate routes at an AS’s ingress and egress routers. [Figure: default and alternate routes to destination d.] Splice paths at ingress and egress routers. Required new functionality: storing multiple entries per prefix; indexing into them based on packet headers; selecting the “best” k routes for each destination.
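The three pieces of required functionality can be illustrated with a toy route table. Everything here is an assumption for illustration: the route records, the field names, the documentation prefix, and the ranking rule. Ranking by higher local preference and then shorter AS path follows the first steps of the usual BGP decision process, but the slides do not say how the “best” k routes are chosen.

```python
def best_k(routes, k):
    """Keep the k most preferred routes: higher local_pref first, then shorter AS path."""
    ranked = sorted(routes, key=lambda r: (-r["local_pref"], len(r["as_path"])))
    return ranked[:k]

def select_route(fib, prefix, bits):
    """Use the packet's forwarding bits to index into the stored routes for a prefix."""
    entries = fib[prefix]
    return entries[bits % len(entries)]

# Hypothetical routes learned for one prefix.
ROUTES = [
    {"next_hop": "A", "local_pref": 100, "as_path": [1, 2, 3]},
    {"next_hop": "B", "local_pref": 100, "as_path": [4, 3]},
    {"next_hop": "C", "local_pref": 200, "as_path": [5, 6, 7, 3]},
]
FIB = {"192.0.2.0/24": best_k(ROUTES, 2)}  # multiple entries stored per prefix
```

Here bits value 0 selects the default route and bits value 1 selects the alternate, so a sender can move off a failed interdomain path by changing the bits alone.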
31 Experimental Setup. 2,500-node policy-annotated AS graph. Use C-BGP to compute routes on the base graph. Remove each inter-AS edge with probability p. Test connectivity between a random subset of AS pairs. Compute base reliability without policy restrictions.
32 Interdomain Splicing: Reliability. 2-slice deployment approaches best possible.
33 Incremental Deployment. Partial deployment provides some gains.
34 Ongoing Work. Software implementation: Click element; PlanetLab/VINI deployment. Extension to Cisco Multi-Topology Routing. IETF draft in progress.
35 Open Questions and Ongoing Work. How does splicing interact with traffic engineering? Sources controlling traffic? What are the best mechanisms for generating slices and recovering paths? Can splicing eliminate dynamic routing?
36 Conclusion. Simple: Forwarding bits provide access to different paths through the network. Scalable: Exponential increase in available paths, linear increase in state. Stable: Fast recovery does not require fast routing protocols.