Routing Convergence and the Impact of Scale Dan Massey Colorado State University
26 October email@example.com
Internet Routing and BGP
- Internet divided into Autonomous Systems
  - Large scale implies that maintaining the entire topology at a router is not feasible.
- BGP is the inter-AS routing protocol.
  - Router stores the AS path to a destination.
  - Path allows router to apply policies.
- How quickly does BGP converge after a change?
  - Can BGP continue to scale with more growth?
  - Do we need BGP changes or a new protocol?

BGP Path Exploration
[Figure: topology with nodes A (dest.), B, C, D, E, F, G, H, I, Z; path labels (B A), (C B A), (E D B A), (H G F E A)]
Z's candidate paths, step by step:
  ( ) (C B A) (E D B A) (I H G F A)
  ( ) (E D B A) (I H G F A)
  ( ) (I H G F A)
Obsolete paths: (C B A), (E D B A)
If Z knew [B A] failed, it could've avoided the obsolete paths.

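The exploration sequence on this slide can be replayed with a small sketch. It assumes that Z simply prefers the shortest remaining candidate and that the withdrawals arrive one at a time in the order shown; both are simplifications for illustration (real BGP selection is policy-driven), and the names `best_path`, `candidates`, and `withdrawals` are hypothetical.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, ...]

def best_path(candidates: List[Path]) -> Optional[Path]:
    """Return the shortest candidate path, or None if no candidates remain.

    Shortest-path selection is an assumption of this sketch; real BGP
    applies policy before path length.
    """
    return min(candidates, key=len) if candidates else None

# Z's candidate paths to destination A, as listed on the slide.
candidates: List[Path] = [
    ("B", "A"),
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]

# Assumed arrival order of withdrawals after link [B A] fails.
withdrawals: List[Path] = [
    ("B", "A"),
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
]

explored: List[Optional[Path]] = []
for withdrawn in withdrawals:
    candidates.remove(withdrawn)             # drop the withdrawn route
    explored.append(best_path(candidates))   # Z falls back to the next best

for path in explored:
    print(path)
```

Under these assumptions Z steps through (C B A) and (E D B A), both of which still depend on the failed link, before settling on (I H G F A).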
Path Exploration and Policy
- Internet does not select the shortest path
  - Policies limit the number of potential paths.
  - Especially at high-level tiers.
- Example: Due to routing policy, AS-X (lower tier) sees more alternate paths than AS-Z (tier-1).
  - Via multiple providers
  - Via peers
[Figure: topology with nodes Z, X, Y, W, P1, P2]

Impact of Topology Growth
- Denser connectivity => more alternate paths
- Impact depends on policies and tier
  - Lower-tier nodes see more slow convergence
[Table: beacon prefix 18.104.22.168/24, Jan 2, 2004 vs. Dec 2, 2004; columns: RV peer (AS#), #updates, #paths, #updates, #paths; labels: MRAI off, MRAI on]
1239 (tier1)444374 12216288711 2914 (tier1)10662797 35571021919839

Convergence Improvements
- MRAI Timer (Deployed Now)
  - Require minimum time between updates
  - Typically 30 seconds
- Assertion Checking (Proposed in INFOCOM '02)
  - Signal policy or topological failure in some cases
  - Discard routes that include the failed subpath
- Ghost Flushing (Proposed in INFOCOM '03)
  - When the MRAI timer delays an update, send a withdrawal
- Attach Failure Notification (INFOCOM '05, CompNet '05)
  - Explicitly list the cause of the failure

MRAI Rate-Limiting Timer
Minimum Route Advertisement Interval (MRAI) timer: within M=30 seconds, at most one announcement from A to B.
[Figure: timeline (time=0, time=30, time=60) of A's path changes and the msgs from A to B; labels P1, P2, P4]
Impact:
  a. suppress transient changes
  b. delay convergence

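A minimal sketch of the rate limit described on this slide, assuming the sender holds only the most recent path while the timer runs and announces it when the timer expires. The function name `mrai_schedule` and the change times 5, 12, 20, and 40 are hypothetical; only the M=30 interval and the P1..P5 labels come from the slides.

```python
from typing import List, Optional, Tuple

def mrai_schedule(changes: List[Tuple[int, str]],
                  interval: int = 30) -> List[Tuple[int, str]]:
    """Return (send_time, path) announcements from A to B.

    `changes` is a time-sorted list of (time, path) changes at A.
    At most one announcement is sent per `interval` seconds; changes
    that arrive while the timer runs overwrite the held update.
    """
    sent: List[Tuple[int, str]] = []
    next_allowed: Optional[int] = None   # earliest time of next announcement
    pending: Optional[str] = None        # latest path held back by the timer
    for t, path in changes:
        # Timer expired before this change: flush the held update first.
        if pending is not None and next_allowed is not None and next_allowed <= t:
            sent.append((next_allowed, pending))
            next_allowed += interval
            pending = None
        if next_allowed is None or t >= next_allowed:
            sent.append((t, path))       # timer idle: announce immediately
            next_allowed = t + interval
        else:
            pending = path               # timer running: hold latest path only
    if pending is not None and next_allowed is not None:
        sent.append((next_allowed, pending))
    return sent

# Hypothetical change times; the slide only fixes time=0, 30, 60.
changes = [(0, "P1"), (5, "P2"), (12, "P3"), (20, "P4"), (40, "P5")]
sent = mrai_schedule(changes)
print(sent)
```

With these assumed times, P2 and P3 are never announced (transient changes suppressed), while P4 and P5 each reach B later than the change itself (convergence delayed).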
MRAI and Ghost Flushing
- MRAI prevents removal of stale information
- Suppose P1 to P5 are increasingly worse
- Neighbor believes P1 is still available until time 30
[Figure: timeline (time=0, time=30, time=60) of A's path changes and the msgs from A to B; labels P1, P2, P4, with withdrawals marked "w"]
Ghost Flushing: if the change is to a longer path and MRAI applies, send a withdraw.

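The ghost flushing rule can be sketched on top of the same timer. This is an illustration under assumptions, not the published algorithm: "worse" is modeled as a longer AS path, withdrawals are taken to be exempt from the MRAI limit, and the AS names, change times, and the function name `ghost_flushing_schedule` are hypothetical.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, ...]
Msg = Tuple[int, str, Optional[Path]]   # (time, "announce" | "withdraw", path)

def ghost_flushing_schedule(changes: List[Tuple[int, Path]],
                            interval: int = 30) -> List[Msg]:
    """MRAI-limited announcements plus immediate withdrawals.

    When a change to a longer path is held back by the timer, a
    withdrawal is sent at once so the neighbor stops using the stale
    shorter path.
    """
    msgs: List[Msg] = []
    next_allowed: Optional[int] = None
    pending: Optional[Path] = None
    advertised: Optional[Path] = None    # what the neighbor currently believes
    for t, path in changes:
        if pending is not None and next_allowed is not None and next_allowed <= t:
            msgs.append((next_allowed, "announce", pending))
            advertised = pending
            next_allowed += interval
            pending = None
        if next_allowed is None or t >= next_allowed:
            msgs.append((t, "announce", path))
            advertised = path
            next_allowed = t + interval
        else:
            pending = path
            # Ghost flushing: flush the stale, shorter path right away.
            if advertised is not None and len(path) > len(advertised):
                msgs.append((t, "withdraw", None))
                advertised = None
    if pending is not None and next_allowed is not None:
        msgs.append((next_allowed, "announce", pending))
    return msgs

# Hypothetical paths of increasing length standing in for P1..P5.
p1: Path = ("A", "X")
p2: Path = ("A", "Q", "X")
p3: Path = ("A", "R", "Q", "X")
p4: Path = ("A", "S", "R", "Q", "X")
p5: Path = ("A", "T", "S", "R", "Q", "X")
changes = [(0, p1), (5, p2), (12, p3), (20, p4), (40, p5)]
msgs = ghost_flushing_schedule(changes)
for m in msgs:
    print(m)
```

In this run the neighbor learns at time 5, rather than time 30, that P1 is gone, at the cost of two extra withdrawal messages.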
Root Cause Notification
- The node that detects the failure attaches the root cause to the msg
- Other nodes copy the root cause to outgoing messages
[Figure: topology with nodes A, B, C, D, E, F, G, H, I, Z; path labels (B A), (C B A), (E D B A), (H G F E A); message "( ), [B A] failure"]
Z's candidate paths: ( ) (C B A) (E D B A) (I H G F A)
The first msg is enough for Z to remove all the obsolete paths.

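A minimal sketch of how Z might use the root cause, assuming the notification names the failed link as an ordered pair of adjacent ASes. The helper names `contains_link` and `apply_root_cause` are hypothetical and do not come from any BGP implementation.

```python
from typing import List, Tuple

Path = Tuple[str, ...]
Link = Tuple[str, str]

def contains_link(path: Path, link: Link) -> bool:
    """True if `link` appears as two adjacent hops in `path`."""
    return any((path[i], path[i + 1]) == link for i in range(len(path) - 1))

def apply_root_cause(candidates: List[Path], failed_link: Link) -> List[Path]:
    """Drop every candidate path that traverses the failed link."""
    return [p for p in candidates if not contains_link(p, failed_link)]

# Z's candidate paths from the slide, before the failure of [B A].
z_candidates: List[Path] = [
    ("B", "A"),
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]

remaining = apply_root_cause(z_candidates, ("B", "A"))
print(remaining)
```

A single message carrying [B A] removes (B A), (C B A), and (E D B A) together, so Z moves straight to (I H G F A) without exploring the obsolete paths.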
Fail-down Simulation Results
Fail-down: destination becomes unreachable
[Figure: simulation results; legend: Ghost Flushing, Assertion, BGP, Root Cause Notification]

Fail-over Simulation Results
Fail-over: nodes switch to worse paths
[Figure: simulation results; legend: Ghost Flushing, Assertion, BGP, Root Cause Notification]
Implication: more redundancy means faster T_long convergence

Conclusions? (Not Yet!)
- Root Cause Approach is Clear Winner
  - But several non-trivial deployment problems
  - Not immediately clear we could standardize it.
- Ghost Flushing Does Well in Fail-down
  - Easily incrementally deployed
  - But may not work well in Fail-over
- MRAI Timer Only
  - Leaves us with current convergence problems
  - And the network is getting larger...
  - And other complications in large systems...

Damping Analysis
[Figure: legend: simulation, calculation, no damping]
Convergence Updates Trigger Damping Policies!
(could fix if we damped the RCN rather than just updates)

But What About Packets?
Improving packet delivery is the ultimate goal
[Figure: legend: Ghost Flushing, Assertion, BGP, Root Cause Notification]

Conclusions
- Root Cause Approach Adds Many Benefits
  - Convergence, damping, packet delivery, diagnosis, ...
- New Routing Designs Should Include RCN
  - Should be a required part of new routing protocols
- Can RCN Be Added to BGP?
  - Not clear given existing complications
  - To be continued in IRTF Routing Research Group
    - Encourage interested researchers to join