
15744 - Fall 2004, Lecture 3: Inter-Domain Routing: BGP, Overlay Routing, Multihoming. Vyas Sekar. Based on slides from Srini Seshan and Tim Griffin.


1 15744 - Fall 2004, Lecture 3
Inter-Domain Routing: BGP, Overlay Routing, Multihoming
Vyas Sekar
Based on slides from Srini Seshan and Tim Griffin

2 Readings
Assigned
–[MIT notes] – Overview of BGP / interdomain routing
–[Gao01] – Inferring AS relationships
–[Lab00] – BGP convergence
–[S+99] – Suboptimality in Internet routing
Optional
–[Griffin01] – BGP tutorial
–[SARK01] – Characterizing the Internet hierarchy from multiple vantage points
–[APMS+04] – A comparison of overlay routing and multihoming
–[AWBKM] – Resilient Overlay Networks
–[GW02] – iBGP configuration

3 Outline
Need for hierarchical routing
BGP
–ASes, Policies
–BGP Attributes
–BGP Path Selection
–iBGP
–Inferring AS relationships
Problems with BGP
–Convergence
–Suboptimal routing
Overlay Routing and Multihoming
Summary

4 Routing Hierarchies
Flat routing doesn’t scale
–Each node cannot be expected to have routes to every destination (or destination network)
Key observation
–Less information is needed as the distance to the destination increases
Two radically different approaches to routing
–The area hierarchy
–The landmark hierarchy (discussed under routing alternatives)

5 Areas
Divide network into areas
–Areas can have nested sub-areas
–Constraint: no path between two sub-areas of an area can exit that area
Hierarchically address nodes in a network
–Sequentially number top-level areas
–Sub-areas of an area are labeled relative to that area
–Nodes are numbered relative to the smallest containing area

6 Routing
Within area
–Each node has routes to every other node
Outside area
–Each node has routes for other top-level areas only
–Inter-area packets are routed to the nearest appropriate border router
Can result in sub-optimal paths

7 Path Sub-optimality
[Figure: areas 1, 2 and 3 with sub-areas 1.1, 1.2, 2.1, 2.2, 3.1, 3.2 and nodes 1.2.1, 2.2.1, 3.2.1; a 3-hop red path vs. a 2-hop green path between the start and end nodes]

8 A Logical View of the Internet
National (Tier 1 ISP)
–“Default-free” with global reachability info
–E.g.: AT&T, UUNET, Sprint
Regional (Tier 2 ISP)
–Regional or country-wide
–E.g.: Pacific Bell
Local (Tier 3 ISP)
–E.g.: Telerama DSL
[Figure: Tier 1, Tier 2 and Tier 3 layers joined by customer/provider links]

9 Landmark Routing: Basic Idea
Source wants to reach LM0[a], whose address is c.b.a:
–Source can see LM2[c], so sends packet towards c
–Entering LM1[b] area, first router diverts packet to b
–Entering LM0[a] area, packet delivered to a
–Not shortest path
–Packet does not necessarily follow specified landmarks
–No policy routing
–Not source routing
Why
–Small routing tables
–Dynamic, self-configuring algorithms

10 Outline
Need for hierarchical routing
BGP
–ASes, Policies
–BGP Attributes
–BGP Path Selection
–iBGP
–Inferring AS relationships
Problems with BGP
–Convergence
–Suboptimal routing
Overlay Routing and Multihoming
Summary

11 Autonomous Systems (ASes)
Autonomous Routing Domain (ARD)
–Glued together by a common administration, policies, etc.
Autonomous system: a specific case of an ARD
–An ARD is a concept, whereas an AS is an actual entity that participates in routing
–Has a unique 16-bit ASN assigned to it and typically participates in inter-domain routing
Examples:
–MIT: 3, CMU: 9
–AT&T: 7018, 6341, 5074, …
–UUNET: 701, 702, 284, 12199, …
–Sprint: 1239, 1240, 6211, 6242, …
How do ASes interconnect to provide global connectivity?
How does routing information get exchanged?

12 Nontransit vs. Transit ASes
A nontransit AS might be a corporate or campus network. Could be a “content provider”
Traffic NEVER flows from ISP 1 through NET A to ISP 2 (at least not intentionally!)
[Figure: NET A connected to ISP 1 and ISP 2, with IP traffic arrows]

13 Customers and Providers
Customer pays provider for access to the Internet
[Figure: provider/customer links carrying IP traffic]

14 The Peering Relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$
[Figure: ASes A, B and C joined by peer and customer/provider links, marking which traffic is allowed and which is NOT allowed]

15 Peering Wars
Peer
–Reduces upstream transit costs
–Can increase end-to-end performance
–May be the only way to connect your customers to some part of the Internet (“Tier 1”)
Don’t Peer
–You would rather have customers
–Peers are usually your competition
–Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world!
Peering agreements are often confidential.

16 Routing in the Internet
Link state or distance vector?
–No universal metric – policy decisions
Problems with distance-vector:
–Bellman-Ford algorithm may not converge
Problems with link state:
–Metrics used by routers are not the same – loops
–LS database too large – entire Internet
–May expose policies to other AS’s

17 Solution: Distance Vector with Path
Each routing update carries the entire path
Loops are detected as follows:
–When an AS gets a route, it checks whether the AS is already in the path
–If yes, reject route
–If no, add self and (possibly) advertise route further
Advantage:
–Metrics are local: the AS chooses the path, the protocol ensures no loops
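The loop check above can be sketched in a few lines. This is only an illustration, not router code: the function name `receive_route` and the list-of-ASNs representation of the path are assumptions made for the example.

```python
from typing import List, Optional


def receive_route(my_asn: int, as_path: List[int]) -> Optional[List[int]]:
    """Process one path-vector update (hypothetical helper, not a real BGP API).

    as_path lists the ASes the announcement has already traversed,
    nearest AS first. Returns the path this AS would advertise further,
    or None if the route is rejected because it would form a loop.
    """
    if my_asn in as_path:
        # Our own AS number is already in the path: accepting would loop.
        return None
    # Otherwise add ourselves in front before (possibly) re-advertising.
    return [my_asn] + as_path
```

For instance, AS 300 receiving the path `[200, 100]` would advertise `[300, 200, 100]`, while AS 200 receiving `[300, 200, 100]` would reject it.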

18 BGP-4
BGP = Border Gateway Protocol
Is a policy-based routing protocol
Is the EGP of today’s global Internet
Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes.
1989: BGP-1 [RFC 1105]
–Replacement for EGP (1984, RFC 904)
1990: BGP-2 [RFC 1163]
1991: BGP-3 [RFC 1267]
1995: BGP-4 [RFC 1771]
–Support for Classless Interdomain Routing (CIDR)

19 BGP Operations (Simplified)
Establish session on TCP port 179
Exchange all active routes
Exchange incremental updates
While connection is ALIVE, exchange route UPDATE messages
[Figure: BGP session between AS1 and AS2]

20 Interconnecting BGP Peers
BGP uses TCP to connect peers
Advantages:
–Simplifies BGP
–No need for periodic refresh: routes are valid until withdrawn, or the connection is lost
–Incremental updates
Disadvantages:
–Congestion control on a routing protocol?
–Inherits TCP vulnerabilities!
–Poor interaction during high load

21 Four Types of BGP Messages
Open: Establishes a peering session.
Keep Alive: Handshake at regular intervals.
Notification: Shuts down a peering session.
Update: Announces new routes or withdraws previously announced routes.
announcement = prefix + attribute values

22 Policy with BGP
BGP provides capability for enforcing various policies
Policies are not part of BGP: they are provided to BGP as configuration information
BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s
Import policy
–What to do with routes learned from neighbors?
–Selecting best path
Export policy
–What routes to announce to neighbors?
–Depends on relationship with neighbor

23 Examples of BGP Policies
A multi-homed AS refuses to act as transit
–Limit path advertisement
A multi-homed AS can become transit for some AS’s
–Only advertise paths to some AS’s
–E.g.: a Tier-2 provider multi-homed to Tier-1 providers
An AS can favor or disfavor certain AS’s for traffic transit from itself

24 Export Policy
An AS exports only best paths to its neighbors
–Guarantees that once the route is announced the AS is willing to transit traffic on that route
To Customers
–Announce all routes learned from peers, providers and customers, and self-origin routes
To Providers
–Announce routes learned from customers and self-origin routes
To Peers
–Announce routes learned from customers and self-origin routes
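The three export rules above reduce to a small predicate. The sketch below is one hedged way to write it down; the `Rel` enum and `should_export` are names invented for this example, and real routers express the same rules through vendor-specific filters.

```python
from enum import Enum


class Rel(Enum):
    """Relationship of a neighbor to this AS (illustrative names)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"
    SELF = "self"  # route originated by this AS itself


def should_export(learned_from: Rel, neighbor: Rel) -> bool:
    """Return True if a best route learned from `learned_from` may be
    announced to `neighbor`, following the export rules on the slide."""
    if neighbor is Rel.CUSTOMER:
        # Customers get everything: peer, provider, customer and own routes.
        return True
    # Providers and peers only get customer routes and self-origin routes.
    return learned_from in (Rel.CUSTOMER, Rel.SELF)
```

Under these rules a route learned from one provider is never exported to another provider or to a peer, which is what keeps a customer AS from accidentally providing transit.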

25 Import Routes
[Figure: routes imported from peers, providers and customers, labeled as provider route, customer route, peer route and ISP route]

26 Export Routes
[Figure: provider routes, customer routes, peer routes and ISP routes exported to peers, customers and providers; filters block some of the routes]

27 BGP Route Processing
Receive BGP Updates
Apply Import Policies (apply policy = filter routes & tweak attributes)
Best Route Selection, based on attribute values
Best Route Table
Apply Export Policies (apply policy = filter routes & tweak attributes)
Transmit BGP Updates
Install forwarding entries for best routes in the IP Forwarding Table
Policy is open-ended programming, constrained only by the vendor configuration language

28 BGP UPDATE Message
List of withdrawn routes
Network layer reachability information
–List of reachable prefixes
Path attributes
–Origin
–Path
–Metrics
All prefixes advertised in a message have the same path attributes

29 Path Selection Criteria
Information based on path attributes
Attributes + external (policy) information
Examples:
–Hop count
–Policy considerations: preference for AS, presence or absence of certain AS
–Path origin
–Link dynamics

30 Important BGP Attributes
Local Preference
AS-Path
MED
Next hop

31 LOCAL PREF
Local (within an AS) mechanism to provide relative priority among BGP routers
[Figure: routers R1 to R4 in AS 256 connected by I-BGP, R5 in AS 200, neighboring AS 100 and AS 300; one exit is set to Local Pref = 500 and the other to Local Pref = 800]

32 LOCAL PREF – Common Uses
Handle routes advertised to multi-homed transit customers
–Should use direct connection (multihoming typically has a primary/backup arrangement)
Peering vs. transit
–Prefer to use peering connection, why?
In general, customer > peer > provider
–Use LOCAL PREF to ensure this

33 AS_PATH
List of traversed AS’s
Useful for loop checking and for path-based route selection (length, regexp)
[Figure: chain of AS 100, AS 200, AS 300 and AS 500; AS 100 originates 180.10.0.0/16 and AS 200 originates 170.10.0.0/16; AS 500 sees 180.10.0.0/16 with AS path 300 200 100 and 170.10.0.0/16 with AS path 300 200]

34 Multi-Exit Discriminator (MED)
Hint to external neighbors about the preferred path into an AS
–Non-transitive attribute
–Different AS’s choose different scales
Used when two AS’s connect to each other in more than one place

35 MED
Typically used when two ASes peer at multiple locations
Hint to R1 to use R3 over R4 link
Cannot compare AS40’s values to AS30’s
[Figure: routers R1 to R4 across AS 10, AS 30 and AS 40; prefix 180.10.0.0 is advertised with MED = 120, MED = 200 and MED = 50]

36 MED
MED is typically used in provider/subscriber scenarios
It can lead to unfairness if used between ISPs because it may force one ISP to carry more traffic:
–ISP1 ignores MED from ISP2
–ISP2 obeys MED from ISP1
–ISP2 ends up carrying traffic most of the way
[Figure: ISP1 and ISP2 interconnected at SF and NY]

37 Other Attributes
ORIGIN
–Source of route (IGP, EGP, other)
NEXT_HOP
–Address of next hop router to use
Check out http://www.cisco.com for full explanation
Question: with so many choices/attributes, how to select routes?

38 Route Selection Process
Enforce relationships
–Highest Local Preference
Traffic engineering
–Shortest ASPATH
–Lowest MED
–i-BGP < e-BGP
–Lowest IGP cost to BGP egress
Throw up hands and break ties
–Lowest router ID
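The ordered tie-breaking above maps naturally onto a lexicographic sort key. The sketch below is a simplified illustration under stated assumptions: the `Route` fields are invented for the example, and MED is compared across all candidates, whereas the MED slides note that values from different neighbor ASes are not comparable.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Route:
    """Candidate route for one prefix (field names are illustrative)."""
    local_pref: int
    as_path: List[int]
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int


def preference_key(r: Route) -> Tuple[int, int, int, int, int, int]:
    """Smaller tuple = more preferred, in the order listed on the slide."""
    return (
        -r.local_pref,                   # highest local preference first
        len(r.as_path),                  # then shortest AS path
        r.med,                           # then lowest MED (simplified, see lead-in)
        0 if r.learned_via_ebgp else 1,  # then prefer e-BGP over i-BGP
        r.igp_cost,                      # then lowest IGP cost to the egress
        r.router_id,                     # finally lowest router ID
    )


def best_route(candidates: List[Route]) -> Route:
    """Pick the most preferred candidate."""
    return min(candidates, key=preference_key)
```

Because local preference comes first in the key, a longer AS path with a higher local preference still wins, which is how the relationship rules override the traffic-engineering steps.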

39 Internal vs. External BGP
BGP can be used by R3 and R4 to learn routes
How do R1 and R2 learn routes?
Option 1: Inject routes in IGP
–Only works for small routing tables
Option 2: Use I-BGP
[Figure: routers R1, R2, R3 and R4 across AS1 and AS2, with an E-BGP session between R3 and R4]

40 Internal BGP (I-BGP)
Same messages as E-BGP
Different rules about re-advertising prefixes:
–Prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but
–Prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor
–Reason: no AS PATH within the same AS and thus danger of looping.

41 Internal BGP (I-BGP)
R3 can tell R1 and R2 prefixes from R4
R3 can tell R4 prefixes from R1 and R2
R3 cannot tell R2 prefixes from R1
R2 can only find these prefixes through a direct connection to R1
Result: I-BGP routers must be fully connected (via TCP)!
–Contrast with E-BGP sessions that map to physical links
[Figure: routers R1, R2, R3 and R4 across AS1 and AS2, with I-BGP sessions inside the AS and an E-BGP session between R3 and R4]
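As a rough illustration of the re-advertisement rule and of why the resulting full mesh grows quickly, consider the sketch below. Both function names and the string labels "ebgp"/"ibgp" are assumptions made for the example.

```python
def may_readvertise(learned_via: str, send_via: str) -> bool:
    """I-BGP re-advertisement rule from the slides (hypothetical helper).

    learned_via / send_via are "ebgp" or "ibgp". A prefix learned from one
    I-BGP neighbor must not be passed to another I-BGP neighbor; every
    other combination is allowed.
    """
    for kind in (learned_via, send_via):
        if kind not in ("ebgp", "ibgp"):
            raise ValueError(f"unknown session type: {kind}")
    return not (learned_via == "ibgp" and send_via == "ibgp")


def full_mesh_sessions(n_routers: int) -> int:
    """Number of I-BGP sessions when every pair of routers is connected."""
    return n_routers * (n_routers - 1) // 2
```

With 4 routers the mesh needs 6 sessions, but with 100 routers it needs 4950, which is the quadratic growth that motivates route reflectors.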

42 Route Reflector
Mesh does not scale
Each RR passes only best routes, no longer N^2 scaling problem
[Figure: route reflector (RR) receiving an eBGP update and sending iBGP updates]

43 Policy Impact
Different relationships: Transit, Peering
Export policies → selective export
“Valley-free” routing
–Number links as (+1, 0, -1) for customer-to-provider, peer and provider-to-customer
–Any path should contain only a sequence of +1, followed by at most one 0, followed by a sequence of -1
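The valley-free condition can be checked with a small state machine over the link labels. This is a minimal sketch; the function name and the list-of-integers input are assumptions for illustration.

```python
from typing import List


def is_valley_free(links: List[int]) -> bool:
    """Check the valley-free pattern: +1..., then at most one 0, then -1...

    links holds one label per hop along the path:
    +1 customer-to-provider, 0 peer-to-peer, -1 provider-to-customer.
    """
    UPHILL, AFTER_PEER, DOWNHILL = 0, 1, 2
    phase = UPHILL
    for label in links:
        if label == +1:
            if phase != UPHILL:
                return False   # climbing again after a peer or downhill link
        elif label == 0:
            if phase != UPHILL:
                return False   # a second peer link, or a peer link after descending
            phase = AFTER_PEER
        elif label == -1:
            phase = DOWNHILL   # once descending, only -1 is acceptable
        else:
            raise ValueError(f"unexpected link label: {label}")
    return True
```

For example, `[+1, +1, 0, -1]` passes, while `[+1, -1, +1]` fails because the path goes down and then back up (a valley).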

44 How to infer AS relationships?
Can we infer relationships from the AS graph?
–From routing information
–From size of ASes / AS topology graph
–From multiple views and route announcements
[Gao01]
–Three-pass heuristic
–Data from University of Oregon RouteViews
[SARK01]
–Data from multiple vantage points
–Formulate TOR problem
–Heuristic for solving the relationship assignment

45 [Gao00] Basic Algorithm
Phase 1: Identify the degrees of the ASes from the tables
Phase 2: Annotate edges with “transit” relation
–AS u transits traffic for AS v if it provides its provider/peer routes to v.
Phase 3: Identify P2C, C2P, Sibling edges
–P2C if and only if u transits for v and v does not transit for u; Sibling otherwise
–Peering relationship?
Refined Algorithm: another parameter L

46 How does Phase 2 work?
Notion of valley-free routing
–Each AS path can be: Uphill; Downhill; Uphill – Downhill; Uphill – P2P; P2P – Downhill; Uphill – P2P – Downhill
How to identify Uphill/Downhill
–Heuristic: identify the highest-degree AS as the end of the uphill path (path starts from source)
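The degree heuristic can be illustrated with a short sketch that splits one AS path at its highest-degree AS. It is a simplification of the Phase 2 idea, not the paper's full algorithm; `split_path`, the `degree` dictionary and the tie-breaking rule (first maximum wins) are assumptions for the example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def split_path(as_path: List[int], degree: Dict[int, int]) -> Tuple[List[Edge], List[Edge]]:
    """Split an AS path (source first) into uphill and downhill edges.

    The AS with the highest degree is taken as the top of the path.
    Edges before it are treated as uphill (customer-to-provider) and
    edges after it as downhill (provider-to-customer).
    """
    top = max(range(len(as_path)), key=lambda i: degree[as_path[i]])
    uphill = list(zip(as_path[:top], as_path[1:top + 1]))
    downhill = list(zip(as_path[top:], as_path[top + 1:]))
    return uphill, downhill
```

For the path 1, 2, 3, 4 where AS 3 has the largest degree, the edges (1, 2) and (2, 3) are marked uphill and (3, 4) downhill; each such edge would then contribute a "transit" annotation.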

47 Observations from [Gao00]
Heuristic to identify top provider does not work
Algorithm inferences verified from sources within AT&T
Majority are P2C, few peering, few sibling
–Are peering edges few because of the dataset used?
–Sibling relationships are becoming more common: mergers, takeovers, backup relationships
AS relationships are often complex: inferred relationships are “dominant commercial relationships”

48 Questions
Is the inter-AS relationship
–Prefix based
–Customer based
–Independent?
Is the degree-based heuristic valid?
Are peering relationships underestimated?
How useful is inferring the relationships?
–Policy violations
–Anomaly detection?
–Is this information revealed against the providers’ will?

49 BGP Inefficiencies, Overlays and Multihoming
Aditya Akella

50 BGP Complexity
BGP is a very complicated protocol
–Too many knobs
–Need to accommodate (sub-optimal) ISP policies
–Requires complex, human configuration
For all its complexity, BGP offers no guarantees
–Performance??
–Reliability??
–Correctness??
–Reachability??
All of BGP’s complexity begets… Headache!

51 BGP Pitfalls and Problems
–Misconfiguration
–Convergence
–Performance
–Reliability
–Stability
–Security
–And the list goes on…

52 Favorite Scapegoat!
[Figure: BGP and the networking community]

53 Misconfiguration [Mahajan02Sigcomm]
Origin misconfiguration: accidentally inject routes for prefixes into global BGP tables
Type | Old route | New route
Self deaggregation (failure to aggregate) | a.b.0.0/16 X Y Z | a.b.c.0/24 X Y Z
Related origin (likely connected to the network; human error) | a.b.0.0/16 X Y Z | a.b.0.0/16 X Y; a.b.0.0/16 X Y Z O; a.b.c.0/24 X Y; a.b.c.0/24 X Y Z O
Foreign origin (address space hijack!) | a.b.0.0/16 X Y Z | a.b.0.0/16 X Y O; a.b.c.0/24 X Y O; e.f.g.h/i X Y O
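The three rows of the table suggest a simple rule of thumb for labeling a new announcement against an existing one. The sketch below encodes only what the table's examples show and is an assumption, not the classification procedure of the paper: the function name, the "same origin" fallback label, and the use of concrete prefixes such as 10.1.0.0/16 are all illustrative. The case of a completely new prefix (e.f.g.h/i) with no old route is not handled.

```python
import ipaddress
from typing import List


def classify_origin_change(old_prefix: str, old_path: List[str],
                           new_prefix: str, new_path: List[str]) -> str:
    """Label a new route relative to an old one (heuristic sketch).

    Paths list ASes with the origin AS last, as in "X Y Z" on the slide.
    """
    old_net = ipaddress.ip_network(old_prefix)
    new_net = ipaddress.ip_network(new_prefix)
    old_origin, new_origin = old_path[-1], new_path[-1]

    if new_origin == old_origin:
        # Same origin announcing a more specific piece of its own block.
        if new_net != old_net and new_net.subnet_of(old_net):
            return "self deaggregation"
        return "same origin"
    # New origin sits on the old path, or the old origin sits on the new path.
    if new_origin in old_path or old_origin in new_path:
        return "related origin"
    return "foreign origin"
```

With the old route 10.1.0.0/16 via X Y Z, a new 10.1.2.0/24 via X Y Z is labeled self deaggregation, a new 10.1.0.0/16 via X Y is a related origin, and a new 10.1.0.0/16 via X Y O is a foreign origin.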

54 Misconfiguration
Export misconfiguration: export route to a peer in violation of policy
Export | Policy violation
Provider → AS → Provider | Route exported to provider was imported from a provider
Provider → AS → Peer | Route exported to peer was imported from a provider
Peer → AS → Provider | Route exported to provider was imported from a peer
Peer → AS → Peer | Route exported to peer was imported from a peer

55 Interesting Observations
Origin misconfig
–72% of new routes may be misconfig
–11-13% of misconfig incidents affect connectivity (pings and e-mail checks)
–Self de-aggregation is the main cause
Export misconfig
–Up to 500 misconfiguration incidents per day
–All forms are prevalent, although provider-AS-provider is more likely

56 Effects and Causes
Effects
–Routing load
–Connectivity disruption
–Extra traffic
–Policy violation
Causes (Origin misconfig)
–Router vendor software bugs: announce and withdraw routes on reboot
–Reliance on upstream filtering
–New configuration not saved to stable storage (separate command and no autosave!)
–Hijacks of address spaces
–Forgot to install filter
–Human operators and poor interface
Export Misconfig
–Intended policy: provide transit to C through link A-C
–Configured policy: export all routes originated by C to P1 and P2
–Correct policy: export only when AS path is “C”
[Figure: ASes P1, P2, A and C]

57 BGP Convergence [Labovitz00Sigcomm]
Conventional beliefs
–Path vector converges faster than traditional DV (eliminates the count-to-infinity problem)
–Internet path restoration takes on the order of tens of seconds
Convergence
–Recovery after a fault may take as much as ten minutes
–A single routing fault could result in multiple announcements and withdrawals
–Loss and RTT around times of faults are much worse
Upon route withdrawal, explore paths of increasing length
–In the worst case, could explore n! paths
–Depends on which messages are processed and when
A limit on the interval between update messages could reduce the number of messages
–Forces all outstanding messages to be processed

58 IBGP Problems [Griffin02Sigcomm]
Route reflectors could impose signaling and forwarding anomalies → instability!
R_i is a reflector for C_i (updates sent between R_i and C_i, i = 1, 2)
R_i is a BGP router announcing P_i into the network
C_i will only know about P_i and take it as its best path
But the shortest path from C_i to P_i is C_i, C_(i+1), R_i, and this causes a forwarding loop!

59 End-to-End Routing Behavior [Paxson96Sigcomm]
Large-scale routing behavior as seen by end-hosts, based on analysis of traceroutes
Pathologies: persistent routing loops, routing failures and long connectivity outages
Stability: 9% of routes changed every tens of minutes, 30% about every 6 hrs and 68% took a few days
Symmetry: more than half of paths probed were asymmetric at router level

60 Inefficiencies in BGP & Internet Routing
Route convergence and oscillations
Poor reliability
–No way to exploit redundancy in Internet paths
Inefficiency: sub-optimal RTTs and throughputs
–What are some of the causes?
–Policies in routing: inter-domain and intra-domain
–Lack of direct routes, “sparseness” of the Internet graph

61 Inefficiency of Routes [Spring03Sigcomm]
Three classes of reasons for poor performance (“inflation”)
Intra-domain topology and policy
–Topology: no direct link between all cities
–Routing policy: “shortest paths” may be avoided due to engineering
ISP peering
–Peering topology: limited peering between ISPs
–Peering policy: hot-potato routing or early-exit routing
Inter-domain
–Topology: AS graph is sparse
–Inter-domain policies: policies are policies

62 Path Inflation Summary

63 Performance: End-to-End Perspective
From an end-to-end view…
–Is there a way of extracting better performance?
–Is there scope?
–How do we realize this?
Scope: Savage99, Akella03, Akella04
Reality: UW’s “Detour” system, MIT’s RON, Akamai’s SureRoute, CMU’s Route Control

64 Quantifying Performance Loss [Savage99Sigcomm]
Measure round trip time (RTT) and loss rate between pairs of hosts
Alternate path characteristics
–30-55% of hosts had lower latency
–10% of alternate routes have 50% lower latency
–75-85% have lower loss rates
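Given pairwise RTT measurements, looking for a better alternate path through one intermediate host can be sketched as below. This is an illustration only: the nested-dictionary layout, the function name, and the restriction to a single intermediate host are assumptions, and the two-leg RTT is approximated by adding the legs.

```python
from typing import Dict, Optional, Tuple

RttTable = Dict[str, Dict[str, float]]


def best_alternate(rtt: RttTable, src: str, dst: str) -> Tuple[Optional[str], float]:
    """Find the intermediate host giving the lowest src -> mid -> dst RTT.

    rtt[a][b] is the measured RTT from host a to host b.
    Returns (intermediate host, RTT), or (None, direct RTT) when no
    one-hop detour beats the direct path.
    """
    best_mid, best_rtt = None, rtt[src][dst]
    for mid, first_leg in rtt[src].items():
        if mid == dst:
            continue
        second_leg = rtt.get(mid, {}).get(dst)
        if second_leg is None:
            continue  # no measurement for the second leg
        if first_leg + second_leg < best_rtt:
            best_mid, best_rtt = mid, first_leg + second_leg
    return best_mid, best_rtt
```

If the direct A to B path measures 100 ms while A to C takes 30 ms and C to B takes 40 ms, the detour through C (70 ms) would be reported as the better alternate.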

65 Bandwidth Estimation
RTT & loss for multi-hop path
–RTT by addition
–Loss either worst hop or combination of hops – why?
–Large number of flows → combination of probabilities
–Small number of flows → worst hop
Bandwidth calculation
–TCP bandwidth is based primarily on loss and RTT
70-80% of paths have better bandwidth
10-20% of paths have 3x improvement
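The composition rules above can be written out as small helpers. The throughput function assumes the simplified steady-state TCP model of Mathis et al., throughput ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p); the slide only says that TCP bandwidth depends primarily on loss and RTT, so the exact model and all function names here are assumptions for illustration.

```python
import math
from typing import Sequence


def path_rtt(hop_rtts: Sequence[float]) -> float:
    """RTT of a multi-hop path, estimated by adding the hop RTTs."""
    return sum(hop_rtts)


def path_loss_combined(hop_losses: Sequence[float]) -> float:
    """Loss when hops drop packets independently: 1 - prod(1 - p_i)."""
    delivered = 1.0
    for p in hop_losses:
        delivered *= (1.0 - p)
    return 1.0 - delivered


def path_loss_worst(hop_losses: Sequence[float]) -> float:
    """Loss taken as that of the worst hop."""
    return max(hop_losses)


def tcp_throughput(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Approximate TCP throughput in bytes/s (assumed Mathis-style model)."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)
```

Under this model, halving the RTT doubles the estimated throughput, while cutting the loss rate by a factor of four also doubles it, which is why both measurements matter when comparing alternate paths.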

66 Possible Sources of Alternate Paths
A few really good or bad AS’s
–No, benefit of top ten hosts not great
Better congestion or better propagation delay?
–How to measure? Propagation = 5th percentile of delays
–Both contribute to improvement of performance

67 Overlay Networks
Basic idea:
–Treat multiple hops through IP network as one hop in overlay network
–Run routing protocol on overlay nodes
Why?
–For performance, as the Savage99 paper showed
–For efficiency: can make core routers very simple (e.g. CSFQ); also aids deployment (e.g. active networks)
–For functionality: can provide new features such as multicast, active processing

68 Future of Overlay
Application-specific overlays
–Why should overlay nodes only do routing?
Caching
–Intercept requests and create responses
Transcoding
–Changing content of packets to match available bandwidth
Peer-to-peer applications

69 Overlay Challenges
“Routers” no longer have complete knowledge about the links they are responsible for
How do you build an efficient overlay?
–Probably don’t want all N^2 links – which links to create?
–Without direct knowledge of the underlying topology, how to know what’s nearby and what is efficient?
Do we need overlays for performance?

70 Number of Route Choices
Flexible control of end-to-end path → many route choices
BGP: one path via each ISP → choices linked to #ISPs
Few more route choices…?
[Figure: multiple candidate paths vs. a single path vs. multiple BGP paths]

71 Route Selection Mechanism
BGP: simple, coarse metrics such as least AS hops, policy
Overlays: complex, performance-oriented selection
“Multihoming route control”: sophisticated, smart selection among multiple BGP routes
[Figure: best performing path vs. least-AS-hops, policy-compliant path vs. current best performing BGP path]

72 Overlay Routing vs Multihoming Route Control [Akella04Sigcomm]
[Figure: performance comparison among 1-multihoming, k-multihoming, 1-overlays and k-overlays, with pairs marked “>” or “~”]

73 Overlay Routing vs. Multihoming Route Control
Aspect | Route Control | Overlay Routing
Cost | Connectivity fees (Sprint $$, Genuity $$, ATT $$) | Connectivity fees + overlay fee (ATT $$, overlay provider $$)
Operational issues | /18 netblock: announce /20 sub-blocks to ISPs; if all multihomed ends do this → routing table expansion | Overlay node forces intermediate ISP to provide transit → bad interactions with policies
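The routing table expansion mentioned for route control is easy to see with concrete numbers. The sketch below uses Python's standard `ipaddress` module; the block 10.0.0.0/18 and the function name are purely illustrative assumptions, since the slide does not name a specific netblock.

```python
import ipaddress
from typing import List


def deaggregate(block: str, new_prefix: int) -> List[str]:
    """Split a netblock into equal sub-blocks of the given prefix length."""
    network = ipaddress.ip_network(block)
    return [str(sub) for sub in network.subnets(new_prefix=new_prefix)]
```

A single /18 becomes four /20 announcements (10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20 and 10.0.48.0/20 in this example), so each multihomed site that does this contributes four table entries instead of one.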

74 Summary
Route control similar to overlay routing for most practical purposes
Overlays very useful for deploying functionality
–Multicast, VPNs, QoS, security
But overlays may be overrated for end-to-end performance and resilience
Don’t abandon BGP: there’s still hope of extracting good performance and availability

