Eliminating Packet Loss Caused by BGP Convergence Nate Kushman Srikanth Kandula, Dina Katabi, and Bruce Maggs.

Slides:



Advertisements
Similar presentations
Multihoming and Multi-path Routing
Advertisements

Multihoming and Multi-path Routing
Part IV: BGP Routing Instability. March 8, BGP routing updates  Route updates at prefix level  No activity in “steady state”  Routing messages.
Network Layer: Internet-Wide Routing & BGP Dina Katabi & Sam Madden.
1 Experimental Study of Internet Stability and Wide-Area Backbone Failure Craig Labovitz, Abha Ahuja Merit Network, Inc Presented by Changchun Zou.
© J. Liebeherr, All rights reserved 1 Border Gateway Protocol This lecture is largely based on a BGP tutorial by T. Griffin from AT&T Research.
Fundamentals of Computer Networks ECE 478/578 Lecture #18: Policy-Based Routing Instructor: Loukas Lazos Dept of Electrical and Computer Engineering University.
Consensus Routing: The Internet as a Distributed System John P. John, Ethan Katz-Bassett, Arvind Krishnamurthy, and Thomas Anderson Presented.
1 BGP Anomaly Detection in an ISP Jian Wu (U. Michigan) Z. Morley Mao (U. Michigan) Jennifer Rexford (Princeton) Jia Wang (AT&T Labs)
1 Interdomain Routing Protocols. 2 Autonomous Systems An autonomous system (AS) is a region of the Internet that is administered by a single entity and.
By Hitesh Ballani, Paul Francis, Xinyang Zhang Slides by Benson Luk for CS 217B.
Transient BGP Loops Do they matter, and what can be done about them? Nate Kushman MIT/Akamai Srikanth Kandula, Dina Katabi and John Wroclawski.
1 Measurement of Highly Active Prefixes in BGP Ricardo V. Oliveira, Rafit Izhak-Ratzin, Beichuan Zhang, Lixia Zhang GLOBECOM’05.
Routing So how does the network layer do its business?
1 An Experimental Analysis of BGP Convergence Time Timothy Griffin AT&T Research & Brian Premore Dartmouth College.
BGP: Inter-Domain Routing Protocol Noah Treuhaft U.C. Berkeley.
Delayed Internet Routing Convergence Craig Labovitz, Abha Ahuja, Abhijit Bose, Farham Jahanian Presented By Harpal Singh Bassali.
Dynamics of Hot-Potato Routing in IP Networks Renata Teixeira (UC San Diego) with Aman Shaikh (AT&T), Tim Griffin(Intel),
Computer Networking Lecture 10: Inter-Domain Routing
Inherently Safe Backup Routing with BGP Lixin Gao (U. Mass Amherst) Timothy Griffin (AT&T Research) Jennifer Rexford (AT&T Research)
1 Interdomain Routing Policy Reading: Sections plus optional reading COS 461: Computer Networks Spring 2008 (MW 1:30-2:50 in COS 105) Jennifer Rexford.
Interdomain Routing and the Border Gateway Protocol (BGP) Reading: Section COS 461: Computer Networks Spring 2011 Mike Freedman
Computer Networks Layering and Routing Dina Katabi
EQ-BGP: an efficient inter- domain QoS routing protocol Andrzej Bęben Institute of Telecommunications Warsaw University of Technology,
Network Sensitivity to Hot-Potato Disruptions Renata Teixeira (UC San Diego) with Aman Shaikh (AT&T), Tim Griffin(Intel),
9/15/2015CS622 - MIRO Presentation1 Wen Xu and Jennifer Rexford Department of Computer Science Princeton University Chuck Short CS622 Dr. C. Edward Chow.
CS 3700 Networks and Distributed Systems Inter Domain Routing (It’s all about the Money) Revised 8/20/15.
On AS-Level Path Inference Jia Wang (AT&T Labs Research) Joint work with Z. Morley Mao (University of Michigan, Ann Arbor) Lili Qiu (University of Texas,
Reducing Transient Disconnectivity using Anomaly-Cognizant Forwarding Andrey Ermolinskiy, Scott Shenker University of California – Berkeley and ICSI.
Jennifer Rexford Fall 2014 (TTh 3:00-4:20 in CS 105) COS 561: Advanced Computer Networks BGP.
More on Internet Routing A large portion of this lecture material comes from BGP tutorial given by Philip Smith from Cisco (ftp://ftp- eng.cisco.com/pfs/seminars/APRICOT2004.
BGP topics to be discussed in the next few weeks: –Excessive route update –Routing instability –BGP policy issues –BGP route slow convergence problem –Interaction.
Routing Convergence Dan Massey Colorado State University.
A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance Feng Wang 1, Zhuoqing Morley Mao 2 Jia Wang 3, Lixin Gao 1,
On Understanding of Transient Interdomain Routing Failures Feng Wang, Lixin Gao, Jia Wang, and Jian Qiu Department of Electrical and Computer Engineering.
R-BGP: Staying Connected in a Connected World Nate Kushman Srikanth Kandula, Dina Katabi, and Bruce Maggs.
Evolving Toward a Self-Managing Network Jennifer Rexford Princeton University
D1 - 08/12/2015 Requirements for planned maintenance of BGP sessions draft-dubois-bgp-pm-reqs-02.txt
Yaping Zhu with: Jennifer Rexford (Princeton University) Aman Shaikh and Subhabrata Sen (ATT Research) Route Oracle: Where Have.
An internet is a combination of networks connected by routers. When a datagram goes from a source to a destination, it will probably pass through many.
Routing Protocols Brandon Wagner.
Securing BGP Bruce Maggs. BGP Primer AT&T /8 Sprint /16 CMU /16 bmm.pc.cs.cmu.edu Autonomous System Number Prefix.
BGP Routing Stability of Popular Destinations Jennifer Rexford, Jia Wang, Zhen Xiao, and Yin Zhang AT&T Labs—Research Florham Park, NJ All flaps are not.
A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance Feng Wang 1, Zhuoqing Morley Mao 2 Jia Wang 3, Lixin Gao 1,
1 Chapter 4: Internetworking (IP Routing) Dr. Rocky K. C. Chang 16 March 2004.
Border Gateway Protocol (BGP) (Bruce Maggs and Nick Feamster)
1 Effective Diagnosis of Routing Disruptions from End Systems Ying Zhang Z. Morley Mao Ming Zhang.
Securing BGP Bruce Maggs. BGP Primer AT&T /8 Sprint /16 CMU /16 bmm.pc.cs.cmu.edu Autonomous System Number Prefix.
Traffic-aware Inter-Domain Routing for Improved Internet Routing Stability Zhenhai Duan Florida State University 1.
CSci5221: Inter-Domain Routing Convergence Issues and Improvements 1 Inter-Domain Routing Convergence Issues, Impacts and Improvements Inter-Domain Routing.
1 Internet Routing: BGP Routing Convergence Jennifer Rexford Princeton University
CS 3700 Networks and Distributed Systems
BGP Routing Stability of Popular Destinations
CS 3700 Networks and Distributed Systems
Jian Wu (University of Michigan)
Border Gateway Protocol
COMP 3270 Computer Networks
COS 561: Advanced Computer Networks
(How the routers’ tables are filled in)
Department of Computer and IT Engineering University of Kurdistan
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
BGP Interactions Jennifer Rexford
COS 461: Computer Networks
Transient BGP Loops Do they matter, and what can be done about them?
BGP Instability Jennifer Rexford
Computer Networks Protocols
Presentation transcript:

Eliminating Packet Loss Caused by BGP Convergence Nate Kushman Srikanth Kandula, Dina Katabi, and Bruce Maggs

BGP Convergence Causes You Packet Loss Route changes cause up to 30% packet loss for more than 2 minutes [Labovitz00] Even for domains dual homed to tier 1 providers, a failover event can cause multiple loss bursts, and one loss burst can last for up to 20s [Wang06] Popular and unpopular prefixes experience losses due to BGP convergence [Wang05] 50% of VoIP disruptions are highly correlated with BGP updates [Kushman06] The Problem:

What Kind of Solution Do We Want? Eliminate packet loss during BGP convergence An adopting ISP protects itself and its customers from loss even if no other ISP cooperates

Transient Path Loss Problem Nobody offers Patrick an alternate path MIT AT&T Avi Patrick Sprint All of Patrick’s providers are using him to get to MIT

Transient Path Loss Problem MIT AT&T Avi Patrick Sprint Link Down LOSS! Patrick knows no alternate path to MIT Patrick drops AT&T’s and Avi’s packets to MIT, and his own

Transient Path Loss Problem MIT AT&T Avi Patrick Sprint Link Down Eventually, Patrick withdraws path from AT&T and Avi AT&T and Avi stop sending packets to Patrick

Transient Path Loss Problem MIT AT&T Avi Patrick Sprint Link Down Eventually, Patrick withdraws path from AT&T and Avi AT&T and Avi stop sending packets to Patrick AT&T announces the Sprint path to Patrick  Traffic flows Temporary Packet Loss Significant loss happens in today’s Internet, even when connected to Tier 1s

How do we solve Patrick’s problem? Tell Patrick a failover path before the link fails rather than after it, as is often the case in current BGP

AT&T advertises to Patrick “AT&T  Sprint  MIT” as a failover path Link Fails  Patrick immediately sends traffic on failover path MIT AT&T Avi Patrick Sprint Link Down No Loss ! Help Patrick Help You!

Our Solution: Two Simple Rules Routing Rule: Each router advertises only one failover path and only to the next hop router on its primary path Forwarding Rule: When routers receive packets from the next-hop interface for their primary path they forward them along the failover path

Guarantee: A router is guaranteed to see no BGP-caused packet loss during convergence, if it will have a valley-free path to the destination at convergence

Helps Even When Deployed in a Single AS Patrick Joe R1 R2 To MIT Link Down R1 offers Failover path to R2 R2 U-turns packets – back to R1 No Loss ! Currently Patrick drops packets even if he knows an alternate path MIT

Experimental Results Router-Level Simulation over the full Internet o AS-graph from Routeviews and RIPE BGP Data o Use inference algorithms to annotate links with customer-provider or peer relationships o Add border routers based on the connections to other AS o Used internal MRAI of 5s and external MRAI of 30s For each experiment: o Random destination o Take down a Random Link o Find the duration of packet loss for routers using the down link which have a path after convergence o Run for 1000 Randomly Chosen Links and Destinations

Fraction of Routers Duration of Packet Loss in Seconds ~20% See 30s or More of Packet Loss Significant Benefit Running Only in AT&T

Fraction of Routers Duration of Packet Loss in Seconds Significant Benefit Running Only in AT&T Setting MRAI to 0 still leaves Significant Packet Loss Twice the Number of Updates for Both AT&T and Customers

Fraction of Routers Duration of Packet Loss in Seconds Significant Benefit Running Only in AT&T Less than 3% if only AT&T adopts

Fraction of Routers Duration of Packet Loss in Seconds Running Everywhere Eliminates All Packet Loss Full Benefit Once Running Everywhere

Little Additional Overhead 312K 325K Less than 5% more updates network wide

Conclusion Offer a failover path only to next-hop neighbor Eliminates packet loss resulting from BGP convergence An adopting ISP reduces loss even if no other ISP cooperates Simple Mechanism Solves Problem Deployable