Outline of Talk Convergence Strategy and Motivation Solution Taxonomy Existing Solution Space Summary
Traditional convergence strategy Switch to new as fast as you can independently Required for failures –Strategy optimized for this case –traffic to failed element is lost –so speed is essential Used for everything else –common method –traffic can be lost due to loops Fast-Reroute prevents traffic loss due to failure but loops can still cause loss.
Micro-Loop Properties Independent decisions can cause micro-loops. Loops may occur between pairs of nodes or cycles of nodes. Duration depends on relative time to update FIBs. –Implementation differences –Number of affected destinations –Propagation time Loss due to Loop duration may be longer (an order of magnitude) than Loss during the Fast Reroute failover.
Controlled convergence Made feasible for failure case by fast reroute –Traffic is not lost so can afford to take time –Can use common method for both failure and management change events Traditional convergence optimized for failure case without fast-reroute. We can do better… (but keep traditional as safe fall-back for single failure assumption violation.)
Method Comparison NameProsCons Delay (in FIB compute/install Inc. Cost Change Only local router needs to support Unacceptable delay (Bounded, but can be hours!) Unacceptably high (Bounded by max metric used) Synch. FIB Install Seems simple…Couples NTP to Routing. Implementation is complicated - variances may cause very short loop. Minimal (1) Ordered SPFs Control plane only.SRLGs require destination-based decision. High (Bounded by network diameter) Path Locking Deals with SRLGs & uncorrelated changes. Various depending on sub-method Completeness depends on additional forwarding mechanisms. Small (3)
Incremental cost change A change in a link cost of x can only cause loops whose “cyclic” cost is <=x Minimum cycle is 2 (1 in each direction) Hence cost change of 1 can never cause a loop. Where minimum cycle is larger, larger increments can be used. Once cost reaches cost of alternate path no more loops possible. No Cooperation Required – But Can Take Hours
Synchronized FIB swap Network synchronized change-over at predetermined time –Signal/determine time to change –Network Synchronized Time (NTP is there) Either Two FIBs for fast swap –Substantial hardware implications Or FIB update “fast-enough” from change-over time. Dependent on NTP Conceptually simple with minimal signalling – NTP dependency & implementation concerns
Ordering by signalling alone On change, tell old primary neighbors to wait for you Wait for all neighbors as instructed, install FIB, and tell your old primary neighbors. Assumes a single non-SRLG failure –Otherwise communication per destination is required No Estimation Required for FIB Compute/Install - Require Reliable Fast Signalling and Non-Trivial Protocol Extensions
Ordered FIB changes For any isolated link/node change Determine “safe” ordering for FIB installation –bad news: update from edge to failure, –good news: update from change to edge Each router computes its “rank” with respect to the change. Delays for a number of worst-case FIB compute/install times proportional to its rank.
Computing the ordering Single Reverse SPF rooted at change node –Use old SPT to determine relevant node For bad news:- count maximum depth of sub-tree below you For good news:- count maximum hops to change
Delay Proportional to Network Diameter For Good News, rSPF gives necessary depth. For Bad News, rSPF is overly pessimistic for some topologies. Strategies to reduce unnecessary delay –Prune rSPF by only considering the branch across the failure – but still too pessimistic. –Run SPF rooted at edge nodes to correctly prune them – but doesn’t scale. –Compare rSPFs before and after failure Avoids all micro-loops and requires single FIB install. Delay dependent on network diameter so may be unacceptable.
Signalling optimization to Reduce delay Use actual FIB compute/install instead of worst- case –In many cases, actual delay is 0 b/c no change needed. Signal to parents in rSPF when –Nothing to do, or –Completed FIB changes Can change FIB when received signal from all children (or when delay expires) Only an optimization –Loss of signals falls back to delay based
SRLG Concerns Diverse failures may require mutually incompatible ordering Different orderings for individual destination sets may help Need Rules to merge multiple rSPFs
Ordered SPF Summary No forwarding changes required. No signalling required at time of change. Complete prevention of loops for isolated node or link changes. Requires cooperation from all routers May delay re-convergence for tens of seconds (unless optional signalling used) SRLGs require per destination delays and may delay re-convergence more.
Path Locking Framework Obtain a fixed convergence delay regardless of network. Avoid ordering issue by providing transitional paths. Handles SRLGs Different methods to –Determine/Create transitional paths –Direct traffic to use transitional paths Standard trade-off of complexity versus coverage. 1.Tunnels for Transitional Paths 2.Safe Neighbors for Transitional Next-Hops 3.Marked Packets to Use Transitional Topology 4.U-turn Packets to Use New Topology
Time-Line of Convergence Change Discovery Time – At this point, all routers know about the change. Routers install transitional path support. –For some methods, immediately start use of self- determined transitional paths. Use Transitional Paths Time (1 worst-case FIB compute/install later) – All routers use transitional paths, if available, and new primary next-hops otherwise. Lock to New Topology Time (1 worst-case FIB compute/install later) – All routers use new primary next-hops. All micro-loops avoided if a transitional path always exists.
Create Tunnels Requires tunnel computation/creation at topology change Old topology Locking –Tunnel to the upstream side of the failure –Single tunnel for all affected destinations (if link/node failure). New topology locking –Tunnel to first unaffected router on new primary path Tunnels provide a transitional path that can traverse non-supporting routers. Non-supporting routers can only loop locally originated traffic.
Safe Neighbors Find a safe neighbor to use as a transitional next-hop. –Safety condition is a neighbor that is loop-free on old topology and a downstream path on new topology. If two neighboring routers don’t have a safe neighbor, a micro-loop can form on that link. Analysis of real topologies shows pretty good coverage. Local micro-loops possible with non-supporting routers.
Packet Marking Can mark packets to force forwarding according to a particular topology. Topology can be new or old. All marking starts at the Use Transitional Paths Time If using new topology, traffic on new topology after 1+ worst-case FIB compute/install delay.
U-turn Packet Create transitional next-hop by directing U-turn packets to the new primary next-hops. At Use Transitional Paths Time, send traffic to new primaries (potentially explicitly marked as U-turn packets). If implicitly determined U-turn packets, doesn’t require marking. Explicit method for signalling support of U-turns
Lots of Possibilities… What are important criteria? –Time to be converged Affects single failure assumption Network Stability Ballpark requirement is 10s –Simplicity –Support for SRLGs –No additional mechanisms beyond IP (but coverage may suffer…) –Common additional mechanisms for this and IPFRR advanced methods. –Should also work for LDP
Conclusions & Next Steps Incremental Cost Change is impractical. Synchronized FIB Swap – what is the implementation complexity? Implications of coupling NTP to routing? Ordered SPF – long delay and poor SRLG support. Is that enough to be an issue? Path Locking –Seem most promising –Many possibilities to get similar results Please send suggestions and comments to the list. This solution set may not be complete.