Nick Feamster MIT Robust Internet Routing.

Slides:



Advertisements
Similar presentations
Practical Searches for Stability in iBGP
Advertisements

Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)
OSPF 1.
Introduction to IP Routing Geoff Huston. Routing How do packets get from A to B in the Internet? A B Internet.
BGP01 An Examination of the Internets BGP Table Behaviour in 2001 Geoff Huston Telstra.
Network Monitoring System In CSTNET Long Chun China Science & Technology Network.
Improving Internet Availability. Some Problems Misconfiguration Miscoordination Efficiency –Market efficiency –Efficiency of end-to-end paths Scalability.
Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)
Networking Research Nick Feamster CS Nick Feamster Ph.D. from MIT, Post-doc at Princeton this fall Arriving January 2006 –Here off-and-on until.
Data-Plane Accountability with In-Band Path Diagnosis Murtaza Motiwala, Nick Feamster Georgia Tech Andy Bavier Princeton University.
Internet Availability Nick Feamster Georgia Tech.
Nick Feamster Research Interest: Networked Systems Arriving January 2006 Likely teaching CS 7260 in Spring 2005 Here off-and-on until then. works.
Multihoming and Multi-path Routing
Network Operations Nick Feamster
Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)
1 Resonance: Dynamic Access Control in Enterprise Networks Ankur Nayak, Alex Reimers, Nick Feamster, Russ Clark School of Computer Science Georgia Institute.
Network Operations Nick Feamster
Network Operations Research Nick Feamster
Multihoming and Multi-path Routing
Address-based Route Reflection Ruichuan Chen (MPI-SWS) Aman Shaikh (AT&T Labs - Research) Jia Wang (AT&T Labs - Research) Paul Francis (MPI-SWS) CoNEXT.
MPLS VPN.
© 2006 Cisco Systems, Inc. All rights reserved. MPLS v MPLS VPN Technology Introducing the MPLS VPN Routing Model.
Route Optimisation RD-CSY3021.
Multihoming and Multi-path Routing CS 7260 Nick Feamster January
1 Incentive-Compatible Interdomain Routing Joan Feigenbaum Yale University Vijay Ramachandran Stevens Institute of Technology Michael Schapira The Hebrew.
Deployment of MPLS VPN in Large ISP Networks
COS 461 Fall 1997 Routing COS 461 Fall 1997 Typical Structure.
© J. Liebeherr, All rights reserved 1 Border Gateway Protocol This lecture is largely based on a BGP tutorial by T. Griffin from AT&T Research.
Fundamentals of Computer Networks ECE 478/578 Lecture #18: Policy-Based Routing Instructor: Loukas Lazos Dept of Electrical and Computer Engineering University.
1 Interdomain Routing Protocols. 2 Autonomous Systems An autonomous system (AS) is a region of the Internet that is administered by a single entity and.
Towards a Logic for Wide-Area Internet Routing Nick Feamster and Hari Balakrishnan M.I.T. Computer Science and Artificial Intelligence Laboratory Kunal.
S ufficient C onditions to G uarantee P ath V isibility Akeel ur Rehman Faridee
Slide -1- February, 2006 Interdomain Routing Gordon Wilfong Distinguished Member of Technical Staff Algorithms Research Department Mathematical and Algorithmic.
Interdomain Routing Establish routes between autonomous systems (ASes). Currently done with the Border Gateway Protocol (BGP). AT&T Qwest Comcast Verizon.
Inherently Safe Backup Routing with BGP Lixin Gao (U. Mass Amherst) Timothy Griffin (AT&T Research) Jennifer Rexford (AT&T Research)
Economic Incentives in Internet Routing Jennifer Rexford Princeton University
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research Joint work with Lixin Gao.
Nick Feamster Interdomain Routing Correctness and Stability.
Towards a Logic for Wide- Area Internet Routing Nick Feamster Hari Balakrishnan.
I-4 routing scalability Taekyoung Kwon Some slides are from Geoff Huston, Michalis Faloutsos, Paul Barford, Jim Kurose, Paul Francis, and Jennifer Rexford.
1 Computer Communication & Networks Lecture 22 Network Layer: Delivery, Forwarding, Routing (contd.)
© 2009 Cisco Systems, Inc. All rights reserved. ROUTE v1.0—6-1 Connecting an Enterprise Network to an ISP Network BGP Attributes and Path Selection Process.
CS 3700 Networks and Distributed Systems Inter Domain Routing (It’s all about the Money) Revised 8/20/15.
Jennifer Rexford Fall 2014 (TTh 3:00-4:20 in CS 105) COS 561: Advanced Computer Networks BGP.
Towards an Internet that “Never Fails” Hari Balakrishnan MIT Joint work with Nick Feamster, Scott Shenker, Mythili Vutukuru.
A Firewall for Routers: Protecting Against Routing Misbehavior1 June 26, A Firewall for Routers: Protecting Against Routing Misbehavior Jia Wang.
CSci5221: BGP Policies1 Inter-Domain Routing: BGP, Routing Policies, etc. BGP Path Selection and Policy Routing Stable Path Problem and Policy Conflicts.
Doing Don’ts: Modifying BGP Attributes within an Autonomous System Luca Cittadini, Stefano Vissicchio, Giuseppe Di Battista Università degli Studi RomaTre.
CS 3700 Networks and Distributed Systems
CS 3700 Networks and Distributed Systems
New Directions in Routing
Border Gateway Protocol
COS 561: Advanced Computer Networks
Interdomain Traffic Engineering with BGP
Introduction to Internet Routing
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
Backbone Networks Mike Freedman COS 461: Computer Networks
BGP Policies Jennifer Rexford
COS 561: Advanced Computer Networks
BGP Security Jennifer Rexford Fall 2018 (TTh 1:30-2:50 in Friend 006)
COS 461: Computer Networks
COS 561: Advanced Computer Networks
BGP Instability Jennifer Rexford
Computer Networks Protocols
Presentation transcript:

Nick Feamster MIT Robust Internet Routing

Design of Large, Federated Systems Distributed systems with independent, competing entities driven by self-interest Must cooperate to provide service Internet routing, peer-to-peer systems, federated computing infrastructures,… Key problem: Specifying and guaranteeing correct operation

Internet Routing Large-scale: Thousands of autonomous networks Self-interest: Independent economic and performance objectives But, must cooperate for global connectivity Comcast AT&T Sprint Level3 MIT The Internet

Internet Routing Protocol: BGP Route Advertisement Autonomous Systems Session Traffic

Configuration Defines Behavior How traffic enters and leaves the network –Load balance –Traffic engineering –Primary/backup paths Which neighboring networks can send traffic –Defines business relationships and contracts How routers within the network learn routes –Scaling and performance Provides flexibility for realizing operational goals FlexibilityComplexity

Most Important Goal: Correctness Configuration is difficult. Operators make mistakes. –Complex policies –Configuration is distributed across routers Each network independently configured –Unintended policy interactions Mistakes happen! Unfortunately… Why?

Today: Stimulus-Response Unacceptable programming environment Mistakes cause downtime Mistakes often not immediately apparent What happens if I tweak this policy…?

Catastrophic Configuration Faults …a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint. -- news.com, April 25, 1997Sprint Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes. -- wired.com, January 25, 2001 WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue." -- cnn.com, October 3, 2002 "A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration). -- dslreports.com, February 23, 2004

Operator Mailing List (NANOG) Date: Mon, 18 Oct :15: Subject: Level 3 US east coast "issues" Level 3 experiencing widespread "unspecified routing issues" on the US east coast. Master ticket Anyone have more specific information? Date: Mon, 18 Oct :20: (EDT) Subject: Re: Level 3 US east coast "issues" Level 3 is currently experiencing a backbone outage causing routing instability and packet loss. We are working to restore and will be sending hourly updates…

Operator Mailing List Note: Only includes problems openly discussed on this list. Compare: 83 power outages, 1 fire

Problem Guarantee correctness of the global routing system. Examine only local configurations.

Contributions Correctness specification and constraints for global Internet routing rcc (router configuration checker) –Static configuration analysis tool for fault detection –Used by network operators (including large backbone networks) Analysis of real-world network configurations from 17 autonomous systems

rcc Design Intermediate Representation Correctness Specification Constraints Faults Distributed router configurations

Challenges Defining a correctness specification Deriving verifiable constraints from specification Analyzing complex, distributed configuration Verifying correctness with local (per-AS) information

Correctness Specification Path Visibility For each usable path, a corresponding route advertisement must be available Route Validity For each available route, there must exist a corresponding usable path Safety For any given set of configurations distributed across routers in different ASes, a stable path must exist, and the protocol must converge to it

Factoring Routing Configuration Ranking: route selection Dissemination: internal route advertisement Filtering: route advertisement Customer Competitor Primary Backup

Path Visibility If every router learns a route for every usable path, then path visibility is satisfied. A usable path: - Reaches the destination - Corresponds to the path that packets take when using that route - Conforms to the policies of the routers on that path Possible path visibility faults Dissemination Filtering - Partition in session-level graph that disseminates routes - Filtering routes for prefixes for usable paths

Path Visibility: Internal BGP (iBGP) Default: dont re-advertise iBGP-learned routes. Complete propagation requires full mesh iBGP. Doesnt scale. Route reflection improves scaling. Client: re-advertise as usual. Route reflector: reflect non-client routes to all clients, client routes to non-clients and other clients. iBGP Route reflector

Path Visibility: iBGP Signaling R W X Z Y Route reflectors Clients No route to destination. Debugging nightmare!

Path Visibility: iBGP Signaling R W X Z Route reflectors Clients Y Theorem. Suppose the iBGP reflector-client relationship graph contains no cycles. Then, path visibility is safisfied if, and only if, the set of routers that are not route reflector clients forms a full mesh. Condition is easy to check with static analysis.

Path Visibility Faults in Practice 11 networks 133 routers 420 sessions Analysis of configuration from 17 ASes

Route Validity If every route that a router learns corresponds to a usable path, then route validity is satisfied. A usable path: - Reaches the destination - Corresponds to the path that packets take when using that route - Conforms to the policies of the routers on that path Possible route validity faults Filtering - Unintentionally providing transit service - Advertising routes that violate higher-level policy - Originating routes for private (or unowned) address space Dissemination - Loops and deflections along internal routing path

Route Validity: Consistent Export Rules of settlement-free peering: –Advertise routes at all peering points –Advertised routes must have equal AS path length Sprint AT&T Enables hot potato routing. equally good routes

Route Validity: Consistent Export Malice/deception iBGP signaling partition Inconsistent export policy neighbor route-map PEER permit 10 set prepend 123 neighbor route-map PEER permit 10 set prepend Possible Causes Neighbor AS Export ExportClause Prepend Policy normalization makes comparison easy.

Route Validity Faults in Practice 233 Sessions 117 Sessions 6 Sessions 45 Sessions 196 Sessions Analysis of configuration from 17 ASes

Safety If the routing protocol computes a stable path assignment for every possible initial state and message ordering, then the protocol is safe. Depends on the interactions of rankings and filters of multiple ASes. Challenge: Guarantee safety with only local information.

Safety: Policy Interactions Problem: Local rankings of each network may conflict For any possible path assignment, some node will always want to switch to a better path.

Expressiveness vs. Safety Restricting ranking and filtering can guarantee safety. No dispute wheel implies safety. Whats known No dispute wheel is a global condition. Systems with dispute wheels can be safe

Constraints for Safety We show Dispute ring implies no safety under filtering. First necessary condition for safety. Safe No Dispute Ring Safe under Filtering No Dispute Wheel

Given no restrictions on filtering or topology, what are the local restrictions on rankings to guarantee safety under filtering? Safety, Autonomy, Expressiveness? Define local. We introduce a verifier. A local verifier should have three properties: –Operates on the rankings of a single network. –Permutation invariance: answer independent of node labels. –Scale invariance: answer independent of new nodes. Local Verifier Rankings (from single network) Accept/Reject

Examples of Local Verifiers Accept next-hop rankings. Reject others. –Captures most routing policies –Problem: resulting system not guaranteed to be safe (Operationally significant.) Routing based on shortest hop count. –Guarantees safety under filtering –Problem: too restrictive Is there a middle ground?

Results: Shortest Paths is It Theorem. Permitting paths of length n+2 over paths of length n can violate safety under filtering. Theorem. Permitting paths of length n+1 over paths of length n can result in a dispute wheel.

Proof Idea: Build a Dispute Ring Any pair of (n,n+1)-hop paths defines a (partial) permutation. Add nodes, apply permutation invariance. Add new nodes and rankings for new paths. Verifier must accept some new ranking (scale invariance) hops 2-hops add 6,

Safety, Autonomy, Expressiveness: Pick Two Next-hop rankings not guaranteed to be safe. Shortest hop count is safe under filtering. Preferring paths of length n+2 over paths of length n can violate safety under filtering. Preferring paths of length n+1 over paths of length n can result in a dispute wheel.

rcc Implementation PreprocessorParser Verifier Distributed router configurations Relational Database (mySQL) Conditions Faults

Operator Feedback Thats wicked! -- Nicolas Strina, ip-man.net Thanks again for a great tool. -- Paul Piecuch, IT Manager...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors. -- Joe Provo, rcn.com I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers…a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now. -- Arnaud Le Tallanter, clara.net

Summary: Faults across ASes Route Validity Path Visibility

rcc: Summary of Contributions Correctness specification for Internet routing –Path visibility –Route validity –Safety Static analysis of routing configuration –Global correctness guarantees with only local checks New results on global stability Analysis of 17 real-world networks Practical and research significance –Downloaded by over sixty operators.

Take-away Lessons Distributed configuration is a bad idea. Intra-AS route dissemination is too complex. Safety, Autonomy, Expressiveness: Pick Two Possibilities for using shortest paths routing that preserve expressiveness? Ongoing work on Routing Control Platform

Research Interests Internet and Enterprise Networking Anonymous Communication Distributed Communication/C omputation CorrectnessInternet routingWireless routing PredictabilityInternet Traffic Engineering SecurityData plane security, VPNs Mix-net analysisWeb client authentication Fault Diagnosis and Monitoring Anomaly detectionSpam forensics PerformanceEnd-to-end Internet path failures Censorship- resistant Web browsing Signal processing in sensor networks Designing, building, modeling, and measuring distributed systems where multiple autonomous entities must cooperate to provide a service.

Research Interests Internet routing Public wireless networks Anonymous communication systems Distributed communication and computation Measuring, modeling, designing, and building distributed systems where multiple autonomous entities must cooperate to provide a service. Examples Challenges Correct, robust operation Predictability Security Fault monitoring Troubleshooting Performance This talk

Research Areas Routing: Internet and Enterprise Networks –Protocol design [HotNets 2004] –Traffic engineering [CCR 2003, SIGMETRICS 2004] –Network architectures [FDNA 2004, NSDI 2005] –Routing instability and end-to-end performance [SIGMETRICS 2003] Network Security –Censorship-resistant communication [Usenix Security 2002] –Web client authentication [Usenix Security 2001] –Anomaly detection [IMC 2004] Adaptive Streaming Media –Real-time video transcoding [ICIP 1999] –Network protocols for selective retransmission [PV 2002, PV 2003]

Possible Reactions The protocol is buggy. –What to fix? –New protocol would have to be just as flexible Configuration language is too low-level. –Flaws in todays configuration languages? –Unclear what primitives the language should permit

Example: Bogon Routes Feamster et al., An Empirical Study of Bogon Route Advertisements. ACM CCR, January 2005.

Inconsistent Export in Practice Feamster et al., BorderGuard: Detecting Cold Potatoes from Peers. ACM IMC, October 2004.

Recall: Internet Routing Policy Ranking: route selectionFiltering: route advertisement Ranking: Allows each router to select best route Useful for primary/backup, traffic engineering, etc. Filtering: Controls which neighboring networks learn routes Useful for implementing contracts

Which faults does rcc detect? Like static analysis of software, neither complete nor sound. Faults found by rcc Latent faults Potentially active faults End-to-end failures

Many Demands on Internet Routing Operation on an Internet scale –Millions of destinations –Thousands of independent networks –Hundreds of routers per network Implementation of complex policies –Transit, peering, paid peering –Primary/backup –Traffic engineering Guaranteed convergence Performance –Fast convergence Security Routing Table Entries Source: Geoff Huston

Path Visibility: Blackhole Date: Thu, 18 Jul :05: (EDT) From: Chad Oleary Subject: Re: problems with 701 To: We're starting to see the same issues with UUNet, again. Anyone else seeing this? Trying to reach Qwest... traceroute to ( ), 30 hops max, 38 byte packets 1 esc-lp2-gw.e-solutionscorp.com ( ) ms ms ms Serial2-10.GW1.TPA2.ALTER.NET ( ) ms ms ms at XL4.ATL1.ALTER.NET ( ) ms ms ms 4 0.so XL2.ATL5.ALTER.NET ( ) ms ms ms 5 POS7-0.BR2.ATL5.ALTER.NET ( ) ms ms ms 6 * * * 7 * * * …