Delayed Internet Routing Convergence due to Flap Dampening
Z. Morley Mao, Ramesh Govindan, Randy Katz, George Varghese

Slow Internet routing convergence
  BGP is a path-vector protocol
    Convergence can be O(n!)
    Multi-homed fail-over time is linear in the length of the longest backup path
    Can take up to 15 minutes
  Why so slow?
    Protocol effects: path-vector protocol
    Flap damping can delay convergence!
  Unexpected interference between two mechanisms of the routing protocol
  Goal: study this interaction and propose a solution that eliminates it

What is route flap dampening?
  Specified in RFC 2439; widely deployed
  Goals:
    Reduce router processing load caused by instability
    Prevent sustained routing oscillations
    Do so without sacrificing convergence times for well-behaved routes
  Parameters: penalty, half-life, suppress limit, reuse limit, maximum suppressed time

How does flap dampening work?
  [Figure: penalty plotted against time, exponentially decayed, with the suppress limit and the reuse limit marked]
  RIPE-229 recommendation:
    Don't damp until the fourth flap
    /24 or longer prefixes: max outage = min outage = 60 min
    /22, /23 prefixes: max outage = 45 min, min outage = 30 min
    Other prefixes: max outage = 30 min, min outage = 10 min
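
The exponential decay described above can be sketched in a few lines. This is a minimal illustration, not a vendor implementation: the function names are hypothetical, and the half-life, reuse limit, and starting penalty are assumed values chosen only to make the arithmetic easy to follow.

```python
import math

def decayed_penalty(penalty, elapsed_s, half_life_s):
    """Penalty remaining after elapsed_s seconds of exponential decay."""
    return penalty * 0.5 ** (elapsed_s / half_life_s)

def seconds_until_reuse(penalty, reuse_limit, half_life_s, max_suppress_s=None):
    """Time for the penalty to decay to the reuse limit.

    Returns 0 if the penalty is already at or below the limit. If
    max_suppress_s is given, the wait is capped at that value, mirroring
    the 'maximum suppressed time' parameter.
    """
    if penalty <= reuse_limit:
        return 0.0
    wait = half_life_s * math.log2(penalty / reuse_limit)
    if max_suppress_s is not None:
        wait = min(wait, max_suppress_s)
    return wait

# Illustrative values only (assumptions, not taken from the slides).
HALF_LIFE_S = 15 * 60
REUSE_LIMIT = 750.0

after_one_half_life = decayed_penalty(3000.0, HALF_LIFE_S, HALF_LIFE_S)
wait_s = seconds_until_reuse(3000.0, REUSE_LIMIT, HALF_LIFE_S)
print(after_one_half_life, wait_s / 60)
```

With these assumed numbers, a penalty of 3000 needs two half-lives (30 minutes) to fall to the reuse limit, which shows how a short burst of updates can turn into a long outage.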

Route withdrawal convergence process
  Example topology: [figure: four fully connected nodes 1, 2, 3, 4]
  Assume node 1 has a route to a destination and then withdraws it.
  Notation: x->[y,z]W is a withdrawal sent from x to y and z; A[241] is an announcement carrying path 2-4-1.

  Stage  Msg processed   Msg queued
  0                      1->[2,3,4]W
  1      1->2W           1->[3,4]W, 2->[3,4]A[241]
  2      1->3W           1->4W, 2->[3,4]A[241], 3->[2,4]A[341]
  3      1->4W           2->[3,4]A[241], 3->[2,4]A[341], 4->[2,3]A[431]
  4      4->2A[431]      2->[3,4]A[241], 3->[2,4]A[341], 4->[3]A[431]
  5      4->3A[431]      2->[3,4]A[241], 3->[2,4]A[341]
  6      3->2A[341]      2->[3,4]A[241], 3->[4]A[341]
  7      3->4A[341]      2->[3,4]A[241]
  8      2->3A[241]      2->[4]A[241]
  9      2->4A[241]

  MinRouteAdver timer expires: 4->[2,3]W, 3->[2,4]A[3241], 2->[3,4]A[2431]
  … (omitted)
  Note: in responding to the withdrawal from node 1, node 3 sends out 3 messages:
    3->[2,4]A[341], 3->[2,4]A[3241], 3->[2,4]W

Interaction between flap damping and convergence
  Assume a node 5 is attached to node 3, and that after node 1 withdraws the route, node 1 announces it again
  Node 5 can suppress the route from node 3!
  A single flap is multiplied by 3, triggering route suppression
  Convergence is further delayed!
  Example topology: [figure: nodes 1, 2, 3, 4 as before, with node 5 attached to node 3]
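
The amplification can be replayed with a toy damper. The sketch below is an assumption-laden illustration: the `Damper` class, the fixed penalty of 1000 per update, the suppress limit of 3000, the 15-minute half-life, and the 30-second spacing between updates are all hypothetical values, chosen so that suppression starts at the fourth flap. Reuse is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Damper:
    """Toy per-route damper; every update counts as one flap (assumption)."""
    half_life_s: float = 900.0      # assumed half-life
    suppress_limit: float = 3000.0  # assumed suppress limit
    flap_penalty: float = 1000.0    # assumed penalty per update
    penalty: float = 0.0
    last_s: float = 0.0
    suppressed: bool = False

    def on_update(self, now_s: float) -> bool:
        # Decay the accumulated penalty, then charge the new update.
        elapsed = now_s - self.last_s
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_s = now_s
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress_limit:
            self.suppressed = True
        return self.suppressed

def replay(times_s):
    """Feed update arrival times to a fresh damper; return suppression state after each."""
    damper = Damper()
    return [damper.on_update(t) for t in times_s]

# Node 5 behind node 3 sees A[341], A[3241], W, and then the re-announcement.
via_node3 = replay([0, 30, 60, 90])
# A neighbor that sees only the withdrawal and the re-announcement.
direct = replay([0, 30])
print(via_node3, direct)
```

Under these assumptions the fourth update pushes the penalty past the suppress limit, so node 5 suppresses the route even though node 1 flapped only once, while the neighbor that sees just two updates does not.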

Data analysis
  Is the toy topology realistic?
    Exchange points often have clique topologies
    There are usually multiple backup paths
  Evidence found in data analysis of real BGP updates
  Example (from RIPE):
    BGP4MP|1009757425|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 701|IGP|202.12.29.64|0|0||NAG||
    BGP4MP|1009757478|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 1 701|IGP|202.12.29.64|0|0||NAG||
    BGP4MP|1009757505|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 7176 1 701|IGP|202.12.29.64|0|0||NAG||
    BGP4MP|1009757531|W|202.12.29.64|4608|199.5.187.0/24
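
The pattern in the trace above (progressively longer AS paths followed by a withdrawal) can be extracted with a short parser. This is a sketch that assumes the pipe-separated layout is type, timestamp, A/W flag, peer IP, peer AS, prefix, and then the AS path for announcements; `parse_update` and the dictionary keys are hypothetical names.

```python
TRACE = """\
BGP4MP|1009757425|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757478|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 1 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757505|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 7176 1 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757531|W|202.12.29.64|4608|199.5.187.0/24
"""

def parse_update(line):
    """Split one pipe-separated update line into the fields used here."""
    fields = line.split("|")
    kind = fields[2]
    return {
        "timestamp": int(fields[1]),
        "kind": kind,                      # "A" = announcement, "W" = withdrawal
        "prefix": fields[5],
        # Withdrawals carry no AS path.
        "as_path": fields[6].split() if kind == "A" else [],
    }

updates = [parse_update(line) for line in TRACE.splitlines() if line]
path_lengths = [len(u["as_path"]) for u in updates]
gaps_s = [b["timestamp"] - a["timestamp"] for a, b in zip(updates, updates[1:])]

for u in updates:
    print(u["timestamp"], u["kind"], u["prefix"], " ".join(u["as_path"]))
print("path lengths:", path_lengths)
print("gaps (s):", gaps_s)
```

The AS path grows from 4 to 5 to 6 hops before the final withdrawal, with updates arriving 53, 27, and 26 seconds apart, so the whole sequence spans under two minutes.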

Simulations/Analysis
  Simulation using SSFnet
  Topologies
    Toy topologies, e.g., cliques
    Real AS graphs with commercial relationships
  Analysis
    Impact of flap damping on convergence
    Properties of topologies that trigger this effect
    Effect of policies
    Effect of provider selection and connectivity decisions

Proposed solution
  Redefine what counts as a flap
    Currently, any route change is considered a flap
    New definition: a flap has to change the direction of the route's degree of preference (dop) value, relative to the previous flap
    Keep two additional bits (about the dop comparison): 00: undefined, 01: equal, 10: better, 11: worse
  Convergence flap properties
    Increasing AS path lengths
    Route value keeps increasing
  The solution is currently being evaluated using trace-driven simulation!
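
One possible reading of the new flap definition is sketched below. It is an interpretation, not the authors' implementation: the class name is hypothetical, AS path length stands in for the dop value (shorter is better), a withdrawal is treated as the worst possible route, and an update counts as a flap only when the comparison flips between better and worse.

```python
import math

# Two-bit comparison state from the slide.
UNDEFINED, EQUAL, BETTER, WORSE = 0b00, 0b01, 0b10, 0b11

class SelectiveFlapCounter:
    """Counts an update as a flap only when the preference direction reverses."""

    def __init__(self):
        self.prev_cost = None        # cost of the previous route (lower is better)
        self.direction = UNDEFINED   # result of the previous comparison
        self.flaps = 0

    def on_update(self, as_path):
        """as_path is a list of hops, or None for a withdrawal."""
        cost = math.inf if as_path is None else len(as_path)
        if self.prev_cost is None:   # first route seen: nothing to compare with
            self.prev_cost = cost
            return False
        if cost < self.prev_cost:
            new_direction = BETTER
        elif cost > self.prev_cost:
            new_direction = WORSE
        else:
            new_direction = EQUAL
        # A flap is a reversal: better after worse, or worse after better.
        is_flap = {self.direction, new_direction} == {BETTER, WORSE}
        if is_flap:
            self.flaps += 1
        self.direction = new_direction
        self.prev_cost = cost
        return is_flap

# Convergence after a single withdrawal: routes only get worse, then recover once.
convergence = SelectiveFlapCounter()
for path in ([3, 1], [3, 4, 1], [3, 2, 4, 1], None, [3, 1]):
    convergence.on_update(path)

# A genuinely unstable route that alternates between up and down.
unstable = SelectiveFlapCounter()
for path in ([3, 1], None, [3, 1], None, [3, 1]):
    unstable.on_update(path)

print(convergence.flaps, unstable.flaps)
```

In this sketch the monotonically worsening updates produced during convergence are not charged, so the single flap at node 1 counts once, while the alternating route still accumulates a flap on every reversal.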

Conclusion/Future work
  Route flap damping can interfere with BGP route convergence
    Trades off convergence for stability
  Interesting thought exercises:
    Tradeoffs between convergence and stability
    Flap damping
      How to infer the causes of flaps
      How to avoid damping legitimate updates
  Challenges:
    Internet topology is less hierarchical
    Multi-homing is growing

