
Internet Routing: Measurement, Modeling, and Analysis Dr. Jia Wang AT&T Labs Research Florham Park, NJ 07932, USA





2 Internet Routing: Measurement, Modeling, and Analysis Dr. Jia Wang jiawang@research.att.com AT&T Labs Research Florham Park, NJ 07932, USA http://www.research.att.com/~jiawang/ Prof. Zhuoqing Morley Mao zmao@umich.edu Department of EECS University of Michigan Ann Arbor, MI 48109, USA http://www.eecs.umich.edu/~zmao/ ACM Sigmetrics 2005 Tutorial

3 2 Outline 1. Overview of inter-domain routing 2. Measuring inter-domain paths 3. BGP measurement 4. BGP modeling Our opinions should not be taken to represent AT&T policies

4 Part I: Overview of Inter-domain Routing

5 4 Internet  Loose cooperative effort of Internet Service Providers (ISPs)  E.g., AT&T, Sprint, UUNet, AOL  Best effort service  Connectedness  Anyone connected to the Internet can exchange traffic with anyone else connected to the Internet

6 5 Internet routing  Control plane: exchange routes over routing sessions  Data plane: forward traffic  Fail over to alternate route [Figure: IP traffic across the Internet between rusty.cs.berkeley.edu (IP=169.229.62.116, Prefix=169.229.0.0/16) and www.cnn.com (IP=64.236.16.52, Prefix=64.236.16.0/20)]

7 6 Internet routing domain  Autonomous routing domain  Network devices under the same technical and administrative control  Common routing policy  E.g., ISPs, enterprise networks  Autonomous system  Autonomous routing domain with an AS number (ASN)  AS numbers: 16-bit integers  Public AS numbers: 1 – 64511  Private AS numbers: 64512 – 65535  Examples  AT&T: 7018, 6431, …  Sprint: 1239, 1240, …  MIT: 3

8 7 More than 20,000 ASes today [Figure: IP traffic between Berkeley and CNN across the Internet; autonomous systems shown include Calren, Level3, GNN, Qwest, Sprint, UUnet, and AT&T; node types: university, company, business ISP]

9 8 Internet routing architecture [Figure: IP traffic between Berkeley and CNN via Calren, Level3, and GNN; labels: inter-domain routing, intra-domain routing]

10 9 Intra-domain routing  Run within a certain network infrastructure  Optimize routes taken between points within a network  Interior Gateway Protocols (IGPs)  Metrics based  OSPF (Open Shortest Path First)  RIP (Routing Information Protocol)  IS-IS (Intermediate System to Intermediate System)

11 10 Inter-domain routing  Run between networks  Provide full connectivity of the entire Internet  Exterior Gateway Protocol (EGP)  Policy based  BGP (Border Gateway Protocol)

12 11 Link state protocols  Examples: OSPF, IS-IS  Based on Dijkstra’s shortest path computation  Each router periodically floods immediate reachability information to other routers  Fast convergence  High communication and computation overhead  Not scalable for large networks  Requires periodic refreshes

13 12 Vectoring protocols  Distance vs. Path Vector  Distance: hop count (RIP)  Path: entire path (BGP)  Helps identify loops  Supports policy-based routing based on path  Minimal communication overhead  Takes longer to converge, i.e., in proportion to the maximum path length

14 13 Link state vs. vectoring
Type | Link state | Vectoring
IGP | OSPF, IS-IS | RIP
EGP | | BGP
BGP is a path vector protocol

15 14 Classful addressing  IPv4: 32 bits  Five classes of networks
Class | Address | Mask | # of networks | # of hosts
A | 0* | 255.0.0.0 | 128 | ~16M
B | 10* | 255.255.0.0 | 16384 | 65535
C | 110* | 255.255.255.0 | ~2.1M | 255
D | Used for multicast
E | Reserved and currently unused
Improve scaling factor of routing in the Internet => classless

16 15 CIDR: Classless Inter-domain Routing (RFC1519)  No implicit mask based on the class of the network  Explicit masks passed in the routing protocol  Allow aggregation and hierarchical routing
IP address: 12.70.0.0 = 00001100 01000110 00000000 00000000
Mask: 255.255.252.0 = 11111111 11111111 11111100 00000000
CIDR representation: 12.70.0.0/22 (the first 22 bits are the network prefix, the remaining 10 bits the host identifier)

17 16 Address aggregation Internet 12.70.1.0/24 12.70.2.0/24 12.70.3.0/24 12.70.0.0/24 ISP A ISP B 12.70.0.0/22 12.71.0.0/16
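The aggregation on this slide can be checked with a minimal sketch, assuming Python's standard `ipaddress` module is an acceptable stand-in for what a router does when it summarizes contiguous prefixes. The variable names are illustrative and not part of the tutorial.

```python
import ipaddress

# The four contiguous /24 prefixes shown on the slide.
specifics = [
    ipaddress.ip_network(p)
    for p in ("12.70.0.0/24", "12.70.1.0/24", "12.70.2.0/24", "12.70.3.0/24")
]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregated = list(ipaddress.collapse_addresses(specifics))
print(aggregated)
```

Because the four /24s share their first 22 bits, they collapse into the single prefix 12.70.0.0/22 that the slide shows being announced.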

18 17 Routing and forwarding  Routing  The decision process of choosing optimal path that is consistent with the administrative or technical policy  Forwarding  The act of receiving a packet, doing a lookup, and copying a packet to the next hop

19 18 Classless forwarding [Figure: IP traffic from 135.120.0.1 to 12.70.0.20 across the Internet; next hops 10.20.0.1, 10.20.1.1, 10.20.128.1, 10.20.128.10]
Prefix | Next hop
12.70.0.0/24 | 10.20.0.1
12.70.0.0/16 | 10.20.1.1
12.0.0.0/8 | 10.20.128.1
0.0.0.0 | 10.20.128.10
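Classless forwarding picks the most specific (longest) matching prefix. The sketch below illustrates that rule with the slide's table; it is a linear scan for clarity, not how a router implements lookups, and it assumes the slide's `0.0.0.0` entry is the default route `0.0.0.0/0`. The function name `lookup` is hypothetical.

```python
import ipaddress

# Forwarding table from the slide; "0.0.0.0/0" is assumed for the default entry.
FORWARDING_TABLE = [
    ("12.70.0.0/24", "10.20.0.1"),
    ("12.70.0.0/16", "10.20.1.1"),
    ("12.0.0.0/8", "10.20.128.1"),
    ("0.0.0.0/0", "10.20.128.10"),
]


def lookup(destination, table=FORWARDING_TABLE):
    """Return the next hop of the longest prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, next_hop))
    # The default route guarantees at least one match for any IPv4 address.
    return max(matches)[1]


print(lookup("12.70.0.20"))   # matches /24, /16, /8 and the default
print(lookup("135.120.0.1"))  # matches only the default route
```

The destination 12.70.0.20 from the slide matches all four entries, and the /24 wins, so the packet goes to 10.20.0.1.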

20 19 Inter-domain routing with CIDR support  BGP-4 [RFC1771]  De facto EGP  Carry routing information between ASes  Path vector protocol  Policy based routing  Run on top of TCP for reliability  Basic operations  Set up BGP session  Exchange all candidate routes  Send incremental updates

21 20 Establish BGP session  Establish neighboring session between 12.10.0.1 and 12.10.0.2 over TCP port 179
Routing tables of the two peers before the session:
Prefix | Next hop
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
Prefix | Next hop
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1

22 21 Exchange all candidate routes (between 12.10.0.1 and 12.10.0.2)
Routing tables after the exchange:
Prefix | Next hop
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1
Prefix | Next hop
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
Routes sent in each direction:
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1

23 22 Send incremental updates (between 12.10.0.1 and 12.10.0.2)
Prefix | Next hop
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1
Prefix | Next hop
135.120.0.0/24 | 10.128.0.1
68.35.0.0/16 | 10.192.1.1
12.70.0.0/24 | 10.20.0.1
12.9.0.0/16 | 10.20.1.1
Update sent: Withdraw 12.9.0.0/16

24 23 BGP messages  OPEN: set up a peering session  UPDATE: announce new routes or withdraw previously announced routes  NOTIFICATION: shut down a peering session  KEEPALIVE: confirm active connection at regular interval

25 24 Internal vs. external BGP Internet I-BGP E-BGP AS A AS B AS C E-BGP update I-BGP update I-BGP update

26 25 Scaling I-BGP for large AS  Route reflectors  Confederations E-BGP update RR Only best paths being sent by RR AS 1000 EBGP IBGP AS 65010 AS 65020

27 26 Establish connectivity [Figure: prefix 135.120.0.0/16 propagated across AS 1, AS 2, and AS 3 over EBGP, IBGP, and EBGP sessions; router addresses 12.10.0.1, 12.10.0.2, 12.10.0.5, 12.10.0.6]
Prefix | Next hop | AS path
135.120.0.0/16 | 12.10.0.1 | 1
Prefix | Next hop | AS path
135.120.0.0/16 | 12.10.0.5 | 2 1
Prefix | Next hop | AS path
135.120.0.0/16 | 12.10.0.1 | 1

28 27 IGP and BGP working together [Figure: same topology as the previous slide, with IGP next hop 10.10.0.1 and link subnet 12.10.0.0/30]
BGP table:
Prefix | Next hop | AS path
135.120.0.0/16 | 12.10.0.1 | 1
Prefix | Next hop | AS path
135.120.0.0/16 | 12.10.0.1 | 1
Forwarding table (BGP next hop resolved through the IGP):
Prefix | Next hop
12.10.0.0/30 | 10.10.0.1
135.120.0.0/16 | 10.10.0.1

29 28 Policy routing ISP1 ISP4ISP3 Cust1Cust2 ISP2 traffic Connectivity DOES NOT imply reachability! Policy determines how traffic can flow on the Internet

30 29 BGP routing process  Routes received from peers  Apply input policy  Select best route  Best routes (installed in the routing table and forwarding table)  Apply output policy  Routes advertised to peers BGP is not shortest path routing!

31 30 Best route selection  Highest local preference  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Tie breaking rules

32 31 Best route selection  Highest local preference  To enforce economical relationships between domains  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Tie breaking rules

33 32 Best route selection  Highest local preference  Shortest AS path  Compare the quality of routes, assuming shorter AS-path length is better  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Tie breaking rules

34 33 Best route selection  Highest local preference  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  To implement “cold potato” routing between neighboring domains  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Tie breaking rules

35 34 Best route selection  Highest local preference  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Prefer EBGP routes to IBGP routes  Lowest I-BGP cost to E-BGP egress  Tie breaking rules

36 35 Best route selection  Highest local preference  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Prefer routes via the nearest IGP neighbor  To implement “hot potato” routing  Tie breaking rules

37 36 Best route selection  Highest local preference  Shortest AS path  Lowest MED (Multi-Exit-Discriminator)  I-BGP < E-BGP  Lowest I-BGP cost to E-BGP egress  Tie breaking rules  Router ID based: lowest router ID  Age based: oldest route
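The decision steps listed on these slides can be sketched as a lexicographic comparison. This is a simplified illustration, not a vendor's implementation: it assumes MED is comparable across all candidates (routers normally compare MED only between routes from the same neighboring AS), it omits the age-based tie-break, and the `Route` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int           # higher is better
    as_path: Tuple[int, ...]  # shorter is better
    med: int                  # lower is better
    from_ebgp: bool           # E-BGP routes preferred over I-BGP routes
    igp_cost: int             # lower cost to the egress is better (hot potato)
    router_id: int            # lowest router ID breaks remaining ties


def preference_key(route):
    """Smaller tuples are preferred; fields follow the order on the slides."""
    return (
        -route.local_pref,
        len(route.as_path),
        route.med,
        0 if route.from_ebgp else 1,
        route.igp_cost,
        route.router_id,
    )


def best_route(candidates):
    return min(candidates, key=preference_key)


short_path = Route("135.120.0.0/16", 100, (2, 1), 0, True, 10, 1)
preferred = Route("135.120.0.0/16", 200, (4, 3, 2, 1), 0, False, 50, 2)
print(best_route([short_path, preferred]))
```

In the example the route with the longer AS path wins because its local preference is higher, which is why the slides stress that BGP is not shortest path routing.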

38 37 BGP route propagation  Not all possible routes propagate  Commercial relationships determine policies for  Route import  Route selection  Route export

39 38 Typical AS relationships  Provider-customer  Customer pays the provider for transit  Peer-peer  Typically exchange their respective customers’ traffic for free  Siblings  Mutual transit agreement  Provide connectivity to the rest of the Internet for each other

40 39 AS relationships translate into BGP export rules  Export to a provider or a peer  Allowed: its routes and routes of its customers and siblings  Disallowed: routes learned from other providers or peers  Export to a customer or a sibling  Allowed: its routes, the routes of its customers and siblings, and routes learned from its providers and peers
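The export rules above reduce to a small predicate. The sketch below assumes relationships are encoded as the strings shown; the function name `may_export` and the label `"self"` for locally originated routes are illustrative choices.

```python
def may_export(learned_from, export_to):
    """Decide whether a route may be exported under the rules on the slide.

    learned_from: "self", "customer", "sibling", "provider", or "peer"
    export_to:    "customer", "sibling", "provider", or "peer"
    """
    if export_to in ("customer", "sibling"):
        # Customers and siblings receive every route.
        return True
    # Providers and peers receive only own, customer, and sibling routes.
    return learned_from in ("self", "customer", "sibling")


print(may_export("peer", "provider"))     # False: no transit between provider and peer
print(may_export("provider", "customer")) # True: customers get full connectivity
```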

41 40 Which AS paths are legal?  Valley-free:  After traversing a provider-customer or peer-peer edge, cannot traverse a customer-provider or peer-peer edge  Invalid path: >= 2 peer links, downhill-uphill, downhill-peer, peer-uphill
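The valley-free rule can be checked with a single pass over the edge types of a path. A minimal sketch, assuming each hop is labeled `"up"` (customer-provider), `"peer"`, or `"down"` (provider-customer) and that sibling edges are ignored; the function name is hypothetical.

```python
def is_valley_free(edges):
    """Return True if the edge sequence is uphill*, at most one peer, downhill*."""
    descending = False  # set once a peer or downhill edge has been traversed
    for edge in edges:
        if edge not in ("up", "peer", "down"):
            raise ValueError(f"unknown edge type: {edge}")
        if descending and edge in ("up", "peer"):
            return False  # downhill-uphill, downhill-peer, peer-uphill, or peer-peer
        if edge in ("peer", "down"):
            descending = True
    return True


print(is_valley_free(["up", "peer", "down"]))  # valid
print(is_valley_free(["down", "up"]))          # a valley
```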

42 41 Example of valley-free paths X X [1 2 3], [1 2 6 3] are valley-free [1 4 3], [1 4 5 3] are not valley free

43 42 Inferring AS relationships  Identify the AS-level hierarchy of Internet  Not shortest path routing  Predict AS-level paths  Traffic engineering  Understand the Internet better  Correlate with and interpret BGP update  Identify BGP misconfigurations  E.g., errors in BGP export rules

44 43 Existing approaches  On inferring Autonomous Systems Relationships in the Internet, by L. Gao, IEEE Global Internet, 2000.  Characterizing the Internet hierarchy from multiple vantage points, by L. Subramanian, S. Agarwal, J. Rexford, and R. Katz, IEEE Infocom, 2002.  Computing the Types of the Relationships between Autonomous Systems, by G. Battista, M. Patrignani, and M. Pizzonia, IEEE Infocom, 2003.  On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.

45 44 Policy routing causes path inflation  End-to-end paths are significantly longer than necessary  Why?  Topology and routing policy choices within an ISP, between pairs of ISPs, and across the global Internet  Peering policies and interdomain routing lead to significant inflation  Interdomain path inflation is due to the lack of BGP policy mechanisms for conveniently engineering good paths across ISPs

46 45 Path inflation  Based on [Mahajan03]  Comparing actual Internet paths with hypothetical “direct” link

47 Part II: Measuring Inter-domain Forwarding Paths

48 47 Why do we care?  Characterize end-to-end network paths  Latency  Capacity  Link utilization  Loss rate  Diagnose routing anomalies  Forwarding loops, blackholes, routing changes, unexpected paths, main component of end-to-end latency  Discover Internet topology  Server placement

49 48 Key challenge  Need to understand how packets flow through the Internet without real-time access to proprietary routing data from each domain.  Identify accurate packet forwarding paths  Characterize the performance metrics of each hop along the paths

50 49 Existing approaches  With access to the source  AS-level traceroute  Towards an Accurate AS-Level Traceroute Tool, by Z. Mao, J. Rexford, J. Wang, and R. Katz, ACM Sigcomm, 2003.  Scalable and Accurate Identification of AS-Level Forwarding Paths, by Z. Mao, D. Johnson, J. Rexford, J. Wang, and R. Katz, IEEE Infocom, 2004.  Without access to the source  Routescope  On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.

51 50 AS-Level Traceroute  Traceroute gives IP level forwarding path  IP address of the router interfaces on a forwarding path  RTT statistics for each hop along the way

52 51 Traceroute from AT&T Research to www.cnn.com
traceroute to cnn.com (64.236.24.12), 30 hops max, 40 byte packets
 1 oden (135.207.16.1) 1 ms 1 ms 1 ms
 2 * * *
 3 attlr-gate (192.20.225.1) 2 ms 2 ms 2 ms
 4 12.119.155.157 (12.119.155.157) 3 ms 4 ms 4 ms
 5 gbr6-p52.n54ny.ip.att.net (12.123.192.18) 4 ms 4 ms 4 ms
 6 tbr2-p012401.n54ny.ip.att.net (12.122.11.29) 4 ms (ttl=249!) 5 ms (ttl=249!) 5 ms (ttl=249!)
 7 ggr2-p390.n54ny.ip.att.net (12.123.3.62) 4 ms 5 ms 4 ms
 8 att-gw.ny.aol.net (192.205.32.218) 4 ms 4 ms 4 ms
 9 bb2-nye-P1-0.atdn.net (66.185.151.66) 4 ms 4 ms 4 ms
10 bb2-vie-P8-0.atdn.net (66.185.152.201) 13 ms (ttl=245!) 12 ms (ttl=245!) 12 ms (ttl=245!)
11 bb1-vie-P11-0.atdn.net (66.185.152.206) 10 ms 10 ms 10 ms
12 bb1-cha-P7-0.atdn.net (66.185.152.28) 20 ms 20 ms 20 ms
13 bb1-atm-P6-0.atdn.net (66.185.152.182) 25 ms 25 ms 25 ms
14 pop1-atl-P4-0.atdn.net (66.185.136.17) 25 ms (ttl=243!) 24 ms (ttl=243!) 24 ms (ttl=243!)
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Destination unreachable! Who is responsible for the forwarding problem?

53 52 Need to know Inter-domain level path  Obtain AS level paths  BGP AS path  Traceroute AS path

54 53 BGP AS path  Forwarding path: data traffic  Signaling path: control traffic [Figure: AS A, AS B, AS C, with prefix d in AS C; announcements d: path=[C] and d: path=[B C]; resulting table entry: Prefix d, AS path A B C] Is BGP AS path the answer? No!

55 54 BGP AS path is not the answer  Requires timely access to BGP data  Signaling path may differ from forwarding path  Route aggregation and filtering  Routing anomalies: e.g., deflections, loops [Griffin2002]  BGP misconfigurations: e.g., incorrect AS prepending Two paths may differ precisely when operators most need accurate data to diagnose a problem!

56 55 Traceroute AS path  Obtain IP level path using traceroute  Map IP addresses to ASes [Figure: source to destination through router hops a, b, c, d, e across AS A, AS B, AS C, AS D] Is traceroute AS path the answer? No!

57 56 Traceroute AS path is not the answer  Identifying ASes along forwarding path is surprisingly difficult!  Internet route registry  Origin AS in BGP routes

58 57 Internet route registry  Whois database  E.g. NANOG traceroute, prtraceroute  Out-of-date, incomplete  Address allocation to customers  Acquisition, mergers, break-ups

59 58 Origin AS in BGP routes  Last AS in the AS path for each prefix  More accurate and complete than whois data
Prefix | AS path
d | A B C
… | …
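To make the origin-AS approach concrete, the sketch below builds an IP-to-AS table from BGP routes (origin = last AS in the path), maps traceroute hops by longest prefix match, and collapses consecutive duplicates into an AS-level path. It is an illustration of the naive mapping whose limitations the next slide lists, not the authors' tool. All prefixes, AS numbers other than 7018, and function names are made up for the example, and unmapped or unresponsive hops are simply skipped.

```python
import ipaddress


def origin_table(bgp_routes):
    """Map each prefix to its origin AS, the last AS in the AS path."""
    return {ipaddress.ip_network(p): path[-1] for p, path in bgp_routes}


def ip_to_as(ip, table):
    """Longest-prefix match of ip against the table; None if unmapped."""
    addr = ipaddress.ip_address(ip)
    covering = [net for net in table if addr in net]
    if not covering:
        return None
    return table[max(covering, key=lambda net: net.prefixlen)]


def traceroute_as_path(hops, table):
    """Convert IP-level hops into an AS-level path, merging repeated ASes."""
    path = []
    for hop in hops:
        asn = None if hop == "*" else ip_to_as(hop, table)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path


# Hypothetical BGP table: (prefix, AS path as seen at the vantage point).
routes = [
    ("12.0.0.0/8", (64500, 7018)),
    ("12.70.0.0/16", (64500, 7018, 64501)),
    ("192.0.2.0/24", (64500, 64502)),
]
table = origin_table(routes)
hops = ["12.1.1.1", "12.1.2.1", "*", "12.70.0.20", "192.0.2.9", "203.0.113.5"]
print(traceroute_as_path(hops, table))
```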

60 59 Limitations of BGP origin AS  Multiple Origin AS (MOAS)  Multi-homing  Misconfiguration  Internet eXchange Points (IXPs)  Infrastructure addresses may not be advertised  Not required to be announced publicly  Security concerns  Addresses announced by someone else  Statically routed customers  Shared equipment at the boundary between ASes Need accurate IP-to-AS mapping!

61 60 Accurate AS-level traceroute Combine BGP and traceroute data to find a better answer!

62 61 Assumptions  IP-to-AS mapping  Mappings from BGP tables are mostly correct  Mappings change slowly  BGP paths and forwarding paths mostly match  About 70% of BGP paths and traceroute paths match

63 62 BGP path and traceroute path could differ!  Inaccurate IP-to-AS mapping  Traceroute problems  Legitimate mismatches

64 63 BGP path and traceroute path could differ!  Inaccurate IP-to-AS mapping  Internet eXchange Points (IXPs)  Sibling ASes  Unannounced infrastructure addresses  Traceroute problems  Legitimate mismatches

65 64 Internet eXchange Points (IXPs)  Shared infrastructure connected to multiple service providers  Exchange BGP routes and data traffic  May have its own AS number or announced by participating ASes  Dedicated BGP sessions between pairs of participating ASes  E.g., Mae-East, Mae-West, PAIX.

66 65 IXPs cause extra AS hop  Extra AS hop in traceroute path  Large number of fan-in and fan-out ASes  Non-transit AS, small address block, likely MOAS [Figure: traceroute AS path vs. BGP AS path among ASes A, B, C, D, E, F, G]

67 66 Sibling ASes  Single organization owns and manages multiple ASes  May share address space  Cause extra AS hop  Large fan-in and fan-out for the “sibling AS pair” [Figure: traceroute AS path vs. BGP AS path among ASes A, B, C, D, E, F, G, H]

68 67 Unannounced infrastructure addresses  ASes do not necessarily announce infrastructure via BGP  Lead to “unmapped” addresses  Sometimes fall into supernet announced by AS’s provider or sibling

69 68 Unannounced infrastructure addresses [Figure: AS A, AS B, AS C; cases 1. A,C 2. A 3. B,A 4. A,C,A]  Extra AS hop in traceroute path  Missing AS hop in traceroute path  Substitute AS hop  AS loop in traceroute path

70 69 BGP path and traceroute path could differ!  Inaccurate IP-to-AS mapping  Traceroute problems  Forwarding path changing during traceroute  Interface numbering at AS boundaries  ICMP response refers to outgoing interface  Legitimate mismatches

71 70 Forwarding path changing during traceroute  Route flaps between A B C and A D E  AS hop B is substituted by AS D in the traceroute path [Figure: AS A, AS B, AS C, AS D, AS E]

72 71 Interface numbering at AS boundaries  Missing AS hop B in traceroute path [Figure: AS A, AS B, AS C; traceroute shows AS A, AS C]

73 72 ICMP response refers to outgoing interface  Extra AS hop B in traceroute path [Figure: AS A, AS B, AS C; ICMP message]

74 73 BGP path and traceroute path could differ!  Inaccurate IP-to-AS mapping  Traceroute problems  Legitimate mismatches  Route aggregation and filtering  Routing anomalies, e.g., deflections

75 74 Route aggregation/filtering  Extended traceroute path due to filtering by AS B [Figure: AS A, AS B, AS C; routes 8.0.0.0/8 with AS path B C, 8.0.0.0/8 with AS path C, 8.64.0.0/16 with AS path C D]

76 75 Mismatch patterns and causes
Columns: Extra AS | Miss AS | AS Loop | Subst AS | Other
Marks per cause, as extracted: IXP: X; Sibling ASes: XXXX; Unannounced IP: XXXX; Aggregation/filtering: X; Inter-AS interface: XX; ICMP source address: XXXX; Routing anomaly: XXXXX

77 76 BGP and traceroute data collection
For each location:  Traceroute paths from multiple locations, mapped to traceroute AS paths using initial mappings from the origin AS of a large set of BGP tables  Local BGP paths (ignoring unstable paths)  Compare traceroute AS paths with local BGP paths
Combine all locations:  Look for known causes of mismatches (e.g., IXP, sibling ASes)  Edit IP-to-AS mappings (a single change explaining a large number of mismatches)

78 77 Measurement setup  Eight vantage points  Upstream providers: US-centric tier-1 ISPs  Sweep all routable IP address space  About 200,000 IP addresses, 160,000 prefixes, 15,000 destination ASes

79 78 Preprocessing BGP paths  Discard prefixes with BGP paths containing  Routing changes based on BGP updates  Private AS numbers (64512 - 65535)  Empty AS paths (local destinations)  AS loops from misconfiguration  AS SET instead of AS sequence  Less than 1% of prefixes affected

80 79 Preprocessing traceroute paths  Resolving incomplete traceroute paths  Unresolved hops within a single AS map to that AS  Unmapped hops between ASes  Try to match to a neighboring AS using DNS, Whois  Trim unresponsive (*) hops at the end  Compare with the beginning of local BGP paths  MOAS at the end of paths  Assume multi-homing without BGP  Validation using AT&T router configurations  More than 98% of cases validated

81 80 Initial IP-to-AS Mapping
Metric | Whois | Combined BGP tables | Resolving incompletes
Match | 44.7% | 73.2% | 78.0%
Mismatch | 29.4% | 8.3% | 9.0%
Ratio | 1.5 | 8.8 | 9.0

82 81 Heuristics to improve mappings  Overall modification to mappings  10% of IP-to-AS mappings modified  25 IXPs identified  28 pairs of sibling ASes found  1150 of the /24 prefixes shared
Metric | IXP | Sibling ASes | Unannounced address space
Match | 84.4% | 85.9% | 90.6%
Mismatch | 8.7% | 7.8% | 3.5%
Ratio | 9.7 | 11.0 | 26.0

83 82 Systematic optimization  Dynamic-programming and iterative improvement  Initial IP-to-AS mapping derived from BGP routing tables  Identify a small number of modifications that significantly improve the match rate.  95% match ratio, less than 3% changes, very robust

84 83 Optimization results
Input mapping | Mismatch
Full initial mapping | 5.23%
Heuristically optimized mapping | 3.08%
Omit 10% initial mapping | 6.57%
Omit 4 probing sources | 6.34%
Omit probing destinations (one probe per unique BGP path) | 7.12%

85 84 AS-level path inference  Without access to the source  Challenges  Asymmetric routes: 60%  Complicated routing policies  Multihomed networks Find the shortest policy path that conforms with AS relationships

86 85 Routescope  Assumptions  Explicit AS relationship  Peer-peer  Provider-customer  Shortest policy AS path preferred  Valley-free  Uniform routing policy within an AS  AS destination based uniform routing  Stability These assumptions are mostly correct.

87 86 AS path inference algorithm  Compose AS graph based on BGP tables  Infer AS relationship  Classify edges based on AS relationship  Customer-provider (UP) link  Provider-customer (DOWN) link  Peering (FLAT) link  Compute the shortest policy path conforming to the “valley-free” rule using a modified Dijkstra’s algorithm  Infer the first AS hop if multiple paths are returned
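The shortest-policy-path step can be sketched as a search over (AS, phase) states, where the phase records whether the path has already gone flat or downhill. The sketch below uses breadth-first search with unit hop cost as a stand-in for the modified Dijkstra's algorithm named on the slide; it is not the Routescope implementation. The topology, AS numbers, and helper names are hypothetical.

```python
from collections import deque


def add_customer_provider(links, customer, provider):
    links[(customer, provider)] = "up"    # customer-provider (UP) link
    links[(provider, customer)] = "down"  # provider-customer (DOWN) link


def add_peering(links, a, b):
    links[(a, b)] = "flat"  # peering (FLAT) link in both directions
    links[(b, a)] = "flat"


def shortest_valley_free_path(links, src, dst):
    """Return the fewest-hop valley-free AS path from src to dst, or None."""
    neighbors = {}
    for (u, v), kind in links.items():
        neighbors.setdefault(u, []).append((v, kind))
    start = (src, False)  # (AS, already past the top of the path?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descending = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in neighbors.get(node, []):
            if descending and kind != "down":
                continue  # after a FLAT or DOWN link only DOWN links are legal
            nxt_state = (nxt, descending or kind in ("flat", "down"))
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


links = {}
add_customer_provider(links, 10, 1)   # AS 10 buys transit from AS 1
add_customer_provider(links, 20, 2)   # AS 20 buys transit from AS 2
add_peering(links, 1, 2)
add_customer_provider(links, 30, 10)  # AS 30 is multihomed to 10 and 20
add_customer_provider(links, 30, 20)
print(shortest_valley_free_path(links, 10, 20))
```

In this toy graph the two-hop path 10, 30, 20 exists physically but forms a valley through the shared customer, so the search returns the three-hop path through the providers instead.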

88 87 AS path inference accuracy
Source | Total | Match | Match length | Exact match | Shorter | Longer
AS7018 (tier-1) | 18085 | 82% | 83% | 35% | 17% | 0%
AS2152 (tier-2) | 11990 | 64% | | 10% | 35% | 0%
AS8121 (tier-3) | 15757 | 16% | 27% | 3% | 69% | 4%
All BGP gateways | 2457 | 70% | 73% | 30% | 22% | 4%
US BGP gateways | 1907 | 60% | 62% | 27% | 34% | 4%
If the first hop is known, 15% of mismatches can be eliminated.

89 88 First hop inference  Gather candidate first-hop ASes from S by launching traceroutes to S from multiple vantage points  Identify the transition point T that is likely to be on the path from S to D by testing hop_count(S,T) + hop_count(T,D) = hop_count(S,D) [Figure: source AS S, candidate transition points T1 and T2 in AS T1 and AS T2, AS C, destination AS D] Only have access to D

90 89 Hop count inference  Hop_count(S,T) = hop_count(T,S)  To infer hop_count(H,D): H = T or S  Send ping packet to H  Guess the initial TTL value TTL0 set by H  Get TTL value TTL1 in ICMP response packet received from H  Hop_count(H,D) = TTL0 - TTL1 + 1  Common value for TTL0:  32 (Win95/98/Me)  64 (Linux, Compaq Tru64)  128 (Win NT/2000/XP)  255 (most UNIX systems)
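The TTL-based estimate above can be written out as a short sketch. It assumes the host's initial TTL is the smallest common value that is not below the observed TTL, which is a heuristic reading of "guess the initial TTL value" rather than something the slide specifies; the function names are illustrative.

```python
# Common initial TTL values listed on the slide.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Pick the smallest common initial TTL that is >= the observed TTL."""
    for ttl0 in COMMON_INITIAL_TTLS:
        if observed_ttl <= ttl0:
            return ttl0
    raise ValueError(f"TTL out of range: {observed_ttl}")


def infer_hop_count(observed_ttl):
    """Hop_count(H,D) = TTL0 - TTL1 + 1, following the slide's formula."""
    return guess_initial_ttl(observed_ttl) - observed_ttl + 1


def is_likely_transition_point(hops_s_t, hops_t_d, hops_s_d):
    """First-hop test: T lies on the S-to-D path if the hop counts add up."""
    return hops_s_t + hops_t_d == hops_s_d


print(infer_hop_count(243))  # initial TTL guessed as 255
print(infer_hop_count(52))   # initial TTL guessed as 64
```

The guess fails when a path is long enough to cross one of the common values, for example a host with initial TTL 128 seen more than 64 hops away.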

91 90 Improvement with known first AS hop
Source | Total | Match length | Improvement
AS7018 (tier-1) | 18085 | 86% | 3%
AS2152 (tier-2) | 11990 | 76% | 12%
AS8121 (tier-3) | 15757 | 48% | 21%
US BGP gateways | 1907 | 70% | 8%
All BGP gateways | 2457 | 88% | 15%

92 91 Possible causes of inaccuracy  Complicated AS relationships: 15% of paths  Two consecutive FLAT links  DOWN link followed by a FLAT link  FLAT link followed by UP link  Routing policies  Shortest path vs. customer routes  Inconsistent advertisement to different peering locations  BGP tie-breaking rules  AS prepending: > 28% of ASes

93 Part III: BGP Measurement

94 93 BGP routing updates  Route updates at prefix level  No activity in “steady state”  Routing messages indicate changes, no refreshes

95 94 Internet routing instability  Large # of BGP updates  Failures  Policy changes  Redundant messages  Routing instability  Route keeps changing, e.g., routes keep going up and down

96 95 Implications  Router overhead  Transient delay and loss  Unreachable hosts  High loss rate  High jitter  Long delays  Significant packet reordering  Poor predictability of traffic flow How do we know if the instability is due to routing or network congestion?

97 96 Measure BGP stability  First work by Labovitz et al.  Methodology  Collect routing messages from five public exchange points  BGP information considered  AS path  Next hop: next hop to reach a network  Two routes are the same if they have the same AS path and next hop  Other attributes (e.g., MED, communities) ignored  Focus on forwarding path stability

98 97 Measurement methodology

99 98 BGP information exchange  Announcements: a router has either  Learned of a new route, or  Made a policy decision that it prefers a new route  Withdrawals: a router concludes that a network is no longer reachable  Explicit: associated with the withdrawal message  Implicit: (in effect an announcement) when a route is replaced as a result of an announcement message  In steady state BGP updates should be only the result of infrequent policy changes  BGP is stateful, requires no refreshes  Update rate: indication of network stability

100 99 Example of delayed convergence
Example topology: nodes 1, 2, 3, 4 and destination d
Assuming node 1 has a route to a destination, and it withdraws the route:
Stage | Msg processed | Msg queued
0 | | 1->{2,3,4}W
1 | 1->{2,3,4}W | 2->{3,4}A[241], 3->{2,4}A[341], 4->{2,3}A[431]
2 | 2->{3,4}A[241] | 3->{2,4}A[341], 4->{2,3}A[431]
3 | 3->{2,4}A[341] | 4->{2,3}A[431], 4->{2,3}W
4 | 4->{2,3}A[431] | MinRouteAdver timer expires: 4->{2,3}W, 3->{2,4}A[3241], 2->{3,4}A[2431]
… (omitted)
9 | 3->{2,4}W |
Routes held per stage at nodes 2, 3, 4 (as extracted): stage 0: [1] [1] [1]; stage 1: [41] [31]; stage 4: [431] [241] --; stage 9: --
Note: In response to a withdrawal from 1, node 3 sends out 3 messages: 3->{2,4}A[341], 3->{2,4}A[3241], 3->{2,4}W

101 100 Types of inter-domain routing updates  Forwarding instability  may reflect topology changes  Policy fluctuations (routing instability)  may reflect changes in routing policy information  Pathological updates  redundant updates that are neither routing nor forwarding instability  Instability  forwarding instability and policy fluctuation  change forwarding path

102 101 Routing successive events (instability)  WADiff  W: a route is explicitly withdrawn as it becomes unreachable  A: is later replaced with an alternative route  Forwarding instability  AADiff  A: a route is implicitly withdrawn  A: then replaced by an alternative route as the original route becomes unavailable or a new preferred route becomes available  Forwarding instability

103 102 Routing successive events (pathological instability)  WADup  W: a route is explicitly withdrawn  A: then reannounced later  forwarding instability or pathological behavior  AADup  A: a route is implicitly withdrawn  A: then replaced with a duplicate of the original route  pathological behavior or policy fluctuation  WWDup  The repeated transmission of BGP withdrawals for a prefix that is currently unreachable (pathological behavior)
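The update taxonomy on the last two slides can be sketched as a classifier over the update stream for one prefix from one peer. This is an illustrative reading of the categories, not Labovitz et al.'s code: a route is identified by its (AS path, next hop) pair as in the methodology slide, a withdrawal is represented as `None`, and updates with no earlier state to compare against are labeled plain `"A"` or `"W"`, which is an assumption of this sketch.

```python
def classify_updates(updates):
    """Label each update as WADiff, AADiff, WADup, AADup, WWDup, A, or W.

    updates: sequence of None (withdrawal) or (as_path, next_hop) tuples.
    """
    labels = []
    last_route = None  # most recently announced route, kept across withdrawals
    reachable = None   # None until the first update reveals the state
    for update in updates:
        if update is None:
            # A withdrawal for an already withdrawn prefix is a duplicate.
            labels.append("WWDup" if reachable is False else "W")
            reachable = False
            continue
        if last_route is None or reachable is None:
            label = "A"  # nothing earlier to compare with
        else:
            kind = "AA" if reachable else "WA"
            label = kind + ("Dup" if update == last_route else "Diff")
        labels.append(label)
        last_route = update
        reachable = True
    return labels


r1 = ((7018, 1239, 3130), "12.10.0.1")  # hypothetical routes
r2 = ((7018, 2914, 3130), "12.10.0.1")
print(classify_updates([r1, r2, None, None, r2, r2, None, r1]))
```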

104 103 Measurement findings: overview  Year 2000  BGP updates more than one order of magnitude larger than expected  Routing information dominated by pathological updates  Implementation problems  BGP self-synchronization  Unconstrained routing policies

105 104 Routing problem findings  Implementation problems  Redundant updates  Routers do not maintain the history of the announcements sent to neighbors  Self-synchronization  BGP routers exchange information simultaneously  may lead to periodic link/router failures  Unconstrained routing policies may lead to persistent route oscillations

106 105 Instability measurement  Instability and redundant updates exhibit strong correlation with load  (30-second, 24-hour, and seven-day periods)  Instability usually exhibits high frequency  Pathological updates exhibit both high and low frequencies

107 106 Non-localized instability  No single AS dominates instability statistics  No correlation between size of AS and its impact on instability statistics  There is no small set of paths that dominate instability statistics

108 107 Measurement conclusions  Routing in the Internet exhibits many undesirable behaviors  Instability over a wide range of time scales  Asymmetric routes  Network outages  Problem seems to worsen  Many problems are due to software bugs or inefficient router architectures

109 108 Lessons  Even after decades of experience, routing in the Internet is not a solved problem  This attests to the difficulty and complexity of building distributed algorithms in the Internet, i.e., in a heterogeneous environment with products from various vendors  Simple protocols may increase the chance of being  Understood  Implemented right

110 109 Better understanding of BGP dynamics  Difficulties  Multiple administrative domains  Unknown information (policies, topologies)  Unknown operational practices  Ambiguous protocol specs Proposal: a controlled active measurement infrastructure for continuous BGP monitoring – BGP Beacons.

111 110 What is a BGP Beacon?  An unused, globally visible prefix with known Announce/Withdrawal schedule  For long-term, public use

112 111 Who will benefit from BGP Beacon?  Researchers: study BGP dynamics  To calibrate and interpret BGP updates  To study convergence behavior  To analyze routing and data plane interaction  Network operators  Serve to debug reachability problems  Test effects of configuration changes:  E.g., flap damping setting

113 112 Related work  Differences from Labovitz’s “BGP fault-injector”  Long-term, publicly documented  Varying advertisement schedule  Beacon sequence number (AGG field)  Enabler for much research in routing dynamics  RIPE RIS Beacons  Set up at 9 exchange points

114 113 Active measurement infrastructure [Figure: BGP Beacon #1 (198.133.206.0/24) in a stub AS sends route updates through its upstream providers and ISPs into the Internet; many observation points: 1. Oregon RouteViews, 2. RIPE, 3. AT&T, 4. Verio, 5. MIT, 6. Berkeley]

115 114 Deployed PSG Beacons
Prefix | Src AS | Start date | Upstream provider AS | Beacon host | Beacon location
198.133.206.0/24 | 3130 | 8/10/02 | 2914, 1239 | Randy Bush | WA, US
192.135.183.0/24 | 5637 | 9/4/02 | 3701, 2914 | Dave Meyer | OR, US
203.10.63.0/24 | 1221 | 9/25/02 | 1221 | Geoff Huston | Australia
198.32.7.0/24 | 3944 | 10/24/02 | 2914, 8001 | Andrew Partan | MD, US
192.83.230.0/24 | 3130 | 06/12/03 | 2914, 1239 | Randy Bush | WA, US

116 115 Deployed PSG Beacons  B1, 2, 3, 5:  Announced and withdrawn with a fixed period (2 hours) between updates  1st daily ANN: 3:00AM GMT  1st daily WD: 1:00AM GMT  B4: varying period  B5: fail-over experiments  Software available at: http://www.psg.com/~zmao

117 116 Beacon 5 schedule  Live host behind the beacon for data analysis  Study fail-over behavior for multi-homed customers

118 117 Beacon terminology  Input signal: Beacon-injected change  3:00:00 GMT: Announce (A0)  5:00:00 GMT: Withdrawal (W)  Beacon prefix: 198.133.206.0/24  Output signal: 5:00:10 A1, 5:00:40 W, 5:01:10 A2  Signal length: number of updates in output signal (3 updates)  Signal duration: time between first and last update in the signal (5:00:10 - 5:01:10, 60 seconds)  Inter-arrival time: time between consecutive updates [Figure: Beacon AS, Internet, observation points RouteView and AT&T]
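The three metrics defined above can be computed directly from the update timestamps of one output signal. A minimal sketch, assuming timestamps arrive as zero-padded `HH:MM:SS` strings within a single day; the function name `signal_stats` is hypothetical.

```python
from datetime import datetime


def signal_stats(timestamps):
    """Return (signal length, signal duration in s, inter-arrival times in s)."""
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in timestamps)
    length = len(times)
    duration = (times[-1] - times[0]).total_seconds()
    inter_arrival = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return length, duration, inter_arrival


# Output signal from the slide: A1, W, A2.
print(signal_stats(["05:00:10", "05:00:40", "05:01:10"]))
```

For the slide's example this gives a signal length of 3 updates, a duration of 60 seconds, and two inter-arrival times of 30 seconds each.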

119 118 Process Beacon data  Identify output signals, ignore external events  Data cleaning  Anchor prefix as reference  Same origin AS as beacon prefix  Statically nailed down  Minimize interference between consecutive input signals  Beacon period is set to 2 hours  Time stamp and sequence number  Attach additional information in the BGP updates  Make use of a transitive attribute: Aggregator fields

120 119 Beacon data cleaning process  Goal  Clearly identify updates associated with injected routing change  Discard beacon events influenced by external routing changes

121 120 Cumulative Beacon statistics: significant noise  Current observation points:  111 peers: RIPE, Route-View, Berkeley, MIT, MIT-RON nodes, ATT-Research, AT&T, AMS-IXP, Verio

122 121 Cumulative Beacon statistics: significant noise  Example response to ANN-beacon at peer p  R1: ASpath= 286 209 1 3130 3927  R2: ASpath= 286 209 2914 3130 3927  100 events: 20 see R1 then R2 (out-signal length = 2), 80 see only R2 (out-signal length = 1)  No. transient routes = 2  Avg expansion: 2*0.2 + 1*0.8 = 1.2
Beacon | Max no. transient routes | Max ANN-out-signal length | Max WD-out-signal length | Max ANN-avg expansion | Max WD-avg expansion
1 | 186 | 11 | 14 | 9.7 | 11.2
2 | 179 | 9 | 15 | 7.0 | 10.8
3 | 117 | 16 | 13 | 5.8 | 11.4
4 | 307 | 18 | 15 | 8.8 | 16.3

123 122 Cisco vs. Juniper update rate-limiting Known last-hop Cisco and Juniper routers from the same AS and location Average signal length: average number of updates observed for a single beacon-injected change

124 123 “Cisco-like” last-hop routers (sec) Linear increase in signal duration with respect to signal length Slope = 30 seconds Due to Cisco’s default rate-limiting setting

125 124 (sec) “Juniper-like” last-hop routers Signal duration relatively stable wrt increase in signal length Shorter signal duration compared to “Cisco-like” last-hops

126 125 Route flap damping  A mechanism to punish unstable routes by suppressing them  Reduce router processing load due to instability  Prevent sustained routing oscillations  Do not sacrifice convergence times for well-behaved routes It is conjectured that a single announcement can cause route suppression.

127 126 RFC2439: Route flap damping  Scope  Inbound external routes  Per neighbor, per destination  Penalty  Flap: route change  Increases for each flap  Decays exponentially [Figure: exponentially decayed penalty vs. time (min), Cisco default setting; suppress threshold 2000, reuse threshold 750]
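The penalty mechanism can be illustrated with a small simulation. This is only a sketch: the suppress (2000) and reuse (750) thresholds come from the slide, while the per-flap increment of 1000 and the 15-minute half-life are assumptions about the default configuration, and the one-minute sampling step is a simplification.

```python
PENALTY_PER_FLAP = 1000.0    # assumption: penalty added per flap
SUPPRESS_THRESHOLD = 2000.0  # from the slide
REUSE_THRESHOLD = 750.0      # from the slide
HALF_LIFE_MIN = 15.0         # assumption: decay half-life in minutes

def damping_trace(flap_times_min, horizon_min):
    """Sample (minute, penalty, suppressed) once per minute."""
    flaps = sorted(flap_times_min)
    penalty, suppressed, next_flap = 0.0, False, 0
    trace = []
    for minute in range(horizon_min + 1):
        if minute > 0:
            penalty *= 0.5 ** (1.0 / HALF_LIFE_MIN)  # exponential decay
        while next_flap < len(flaps) and flaps[next_flap] <= minute:
            penalty += PENALTY_PER_FLAP               # each flap adds penalty
            next_flap += 1
        if penalty > SUPPRESS_THRESHOLD:
            suppressed = True
        elif penalty < REUSE_THRESHOLD:
            suppressed = False
        trace.append((minute, penalty, suppressed))
    return trace

# Three flaps one minute apart, observed for 40 minutes.
trace = damping_trace([0, 1, 2], horizon_min=40)
suppressed_minutes = [m for m, _, s in trace if s]
print(suppressed_minutes[0], suppressed_minutes[-1])
```

Under these assumed parameters the third flap pushes the penalty above the suppress threshold at minute 2, and the route stays suppressed until the penalty decays below the reuse threshold at minute 32.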

128 127 Strong evidence for withdrawal- and announcement- triggered suppression. Route flap damping analysis

129 128 Distinguish between announcement and withdrawal Summary: WD-triggered suppression is more likely than ANN-triggered suppression Cisco: overall more likely to trigger suppression than Juniper (AAAW pattern) Juniper: more aggressive for the AWAW pattern

130 129 Convergence analysis Summary: Withdrawals converge slower than announcements Most beacon events converge within 3 minutes

131 130 Output signal duration [Figure: axis ticks at 30, 60, 90, 120]

132 131 Beacon 1’s upstream change Single-homed (AS2914) Multi-homed (AS1,2914) Multi-homed (AS1239, 2914)

133 132 Beacon for identifying router behavior Beacon 2 seen from RouteView data Rate-limiting timer  30 second Different rate-limiting behavior: Cisco vs. Juniper

134 133 Inter-arrival time analysis

135 134 Inter-arrival time modeling  Geometric distribution (body):  Update rate-limiting behavior: every 30 sec  Prob(missing update train) independent of how many already missed  Mass at 1:  Discretization of timestamps for times<1  Shifted exponential distribution (tail):  Most likely due to route flap damping

136 135 Motivation C BR C C C AS1 AS2AS3 destination A B C D Failure Disruption Congestion Mitigation AS4 source A backbone network is vulnerable to routing changes that occur in other domains.

137 136 Goal  Identify important routing anomalies  Lost reachability  Persistent flapping  Large traffic shifts Contributions: Build a tool to identify a small number of important routing disruptions from a large volume of raw BGP updates in real time. Use the tool to characterize routing disruptions in an operational network

138 137 Capturing Routing Changes C BR C CPE BGP Monitor C BR C C C C C C C C C iBGP eBGP Updates Best routes A large operational network (8/16/2004 – 10/10/2004)

139 138 Challenges  Large volume of BGP updates  Millions daily, very bursty  Too much for an operator to manage  Different from root-cause analysis  Identify changes and their effects  Focus on actionable events rather than diagnosis  Diagnose causes in/near the AS

140 139 System Architecture BGP Updates (10^6) -> BGP Update Grouping -> Events (10^5), Persistent Flapping Prefixes (10^1) -> Event Classification -> “Typed” Events -> Event Correlation -> Clusters (10^3), Frequent Flapping Prefixes (10^1) -> Traffic Impact Prediction (with Netflow Data) -> Large Disruptions (10^1) From millions of updates to a few dozen reports

141 140 Grouping BGP Updates into Events Challenge: A single routing change  leads to multiple update messages  affects routing decisions at multiple routers Approach: Group together all updates for a prefix with inter-arrival < 70 seconds. Flag prefixes with changes lasting > 10 minutes. [Diagram: BGP Updates -> BGP Update Grouping -> Events, Persistent Flapping Prefixes]

142 141 Grouping Thresholds  Based on our understanding of BGP and data analysis  Event timeout: 70 seconds  2 * MRAI timer + 10 seconds  98% inter-arrival time < 70 seconds  Convergence timeout: 10 minutes  BGP usually converges within a few minutes  99.9% events < 10 minutes
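The grouping rule and its two thresholds can be sketched as follows. This is an offline simplification, not the tool's real-time implementation: updates are assumed to be `(timestamp_in_seconds, prefix)` pairs, and the function name `group_updates` is hypothetical.

```python
EVENT_TIMEOUT = 70          # seconds: 2 * MRAI timer (30 s) + 10 s
CONVERGENCE_TIMEOUT = 600   # seconds: 10 minutes

def group_updates(updates):
    """Group per-prefix updates into events; flag long-lasting prefixes.

    Returns (events, persistent) where events is a list of
    (prefix, start, end, n_updates) and persistent is a set of prefixes.
    """
    by_prefix = {}
    for ts, prefix in sorted(updates):
        by_prefix.setdefault(prefix, []).append(ts)

    events = []
    for prefix, stamps in by_prefix.items():
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev < EVENT_TIMEOUT:
                count += 1                      # same event continues
            else:
                events.append((prefix, start, prev, count))
                start, count = ts, 1            # a new event begins
            prev = ts
        events.append((prefix, start, prev, count))

    # An event lasting longer than the convergence timeout marks the prefix.
    persistent = {p for p, s, e, _ in events if e - s > CONVERGENCE_TIMEOUT}
    return events, persistent

# Illustrative input: one quiet prefix, one prefix updating every 60 s for 11 minutes.
updates = [(0, "10.0.0.0/8"), (30, "10.0.0.0/8"), (60, "10.0.0.0/8"), (200, "10.0.0.0/8")]
updates += [(i * 60, "192.0.2.0/24") for i in range(12)]
events, persistent = group_updates(updates)
print(events, persistent)
```

In this made-up input the first prefix yields two events (three updates, then one), while the second prefix forms a single 660-second event and is flagged as persistently flapping.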

143 142 Persistent Flapping Prefixes  Types of persistent flapping  Conservative damping parameters (78.6%)  Protocol oscillations due to MED (18.3%)  Unstable interfaces or BGP sessions (3.0%) A surprising finding: 15.2% of updates were caused by persistent-flapping prefixes even though flap damping is enabled.

144 143 Example: Unstable eBGP Session ISP Peer Customer E C E B E A E D p  Flap damping parameters are session-based  Damping not implemented for iBGP sessions

145 144 Event Classification Challenge: Major concerns in network management  Changes in reachability  Heavy load of routing messages on the routers  Change of flow of the traffic through the network Event Classification Event Classification Events “Typed” Events, e.g., Loss/Gain of Reachability Solution: classify events by severity of their impact

146 145 Event Category – “No Disruption” ISP E A p E B E C E E AS 2 E D AS 1 No Traffic Shift “No Disruption”: no border routers have any traffic shift. (50.3%)

147 146 Event Category – “Internal Disruption” ISP E A p E B E C E E AS 2 E D AS 1 Internal Traffic Shift “Internal Disruption”: all traffic shifts are internal. (15.6%)

148 147 Event Category – “Single External Disruption” ISP E A p E B E C E E AS 2 E D AS 1 external Traffic Shift “Single External Disruption”: only one of the traffic shifts is external (20.7%)

149 148 Statistics on Event Classification
Category | Events | Updates
No Disruption | 50.3% | 48.6%
Internal Disruption | 15.6% | 3.4%
Single External Disruption | 20.7% | 7.9%
Multiple External Disruption | 7.4% | 18.2%
Loss/Gain of Reachability | 6.0% | 21.9%
 First 3 categories have significant day-to-day variations  Updates per event depend on the type of event and the number of affected routers

150 149 Event Correlation Challenge: A single routing change  affects multiple destination prefixes Event Correlation Event Correlation “Typed” Events Clusters Solution: group the same-type, close-occurring events

151 150 EBGP Session Reset  Caused most of “single external disruption” events  Check if the number of prefixes using that session as the best route changes dramatically  Validation with Syslog router report (95%) time Number of prefixes session failure session recovery

152 151 Hot-Potato Changes  Hot-Potato Changes  Caused “internal disruption” events  Validation with OSPF measurement (95%) [Teixeira et al – SIGMETRICS’ 04] ISP P E A E B E C 10119 “Hot-potato routing” = route to closest egress point

153 152 Traffic Impact Prediction Challenge: Routing changes have different impacts on the network, depending on the popularity of the destinations Solution: weigh each cluster by traffic volume [Diagram: Clusters + Netflow Data -> Traffic Impact Prediction -> Large Disruptions]

154 153 Traffic Impact Prediction  Traffic weight  Per-prefix measurement from netflow  10% of prefixes account for 90% of traffic  Traffic weight of a cluster  the sum of the “traffic weight” of its prefixes  A small number of large clusters have large traffic weight  Mostly session resets and hot-potato changes
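The cluster weighting described here is a straightforward sum over prefixes. A minimal sketch, assuming per-prefix traffic shares are already available as a dictionary; the prefixes, shares, and cluster names below are invented for illustration and are not measurements from the study.

```python
def cluster_weight(cluster, prefix_weight):
    """Traffic weight of a cluster = sum of the weights of its prefixes."""
    return sum(prefix_weight.get(prefix, 0.0) for prefix in cluster)

def rank_clusters(clusters, prefix_weight):
    """Return (name, weight) pairs sorted from heaviest to lightest."""
    weighted = [(name, cluster_weight(prefixes, prefix_weight))
                for name, prefixes in clusters.items()]
    return sorted(weighted, key=lambda item: item[1], reverse=True)

# Made-up per-prefix traffic shares (fractions of total traffic).
prefix_weight = {"192.0.2.0/24": 0.5, "198.51.100.0/24": 0.25, "203.0.113.0/24": 0.125}
clusters = {
    "session-reset": ["192.0.2.0/24", "198.51.100.0/24"],
    "single-prefix": ["203.0.113.0/24"],
}
ranking = rank_clusters(clusters, prefix_weight)
print(ranking)
```

Sorting by weight puts the few heavy clusters first, which is where an operator would look for large disruptions.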

155 154 Performance Evaluation  Memory  Static memory: “current routes”, 600 MB  Dynamic memory: “clusters”, 300 MB  Speed  99% of intervals of 1 second of updates can be processed within 1 second  Occasional execution lag  Every interval of 70 seconds of updates can be processed within 70 seconds Measurements were based on a 900 MHz CPU

156 155 Conclusion of BGP Troubleshooting Tool  BGP troubleshooting system  Fast, online fashion  Operators’ concerns (reachability, flapping, traffic)  Significant information reduction  millions of updates -> a few dozen large disruptions  Uncovered important network behavior  Hot-Potato changes  Session resets  Persistent-flapping prefixes

157 Part IV BGP Modeling

158 157 BGP Is Not Guaranteed to Converge!  BGP is not guaranteed to converge to a stable routing. Policy inconsistencies can lead to “livelock” protocol oscillations.  Goal:  Design a simple, tractable and complete model of BGP  Example application: sufficient condition to guarantee convergence.

159 158 BGP is Solving What Problem?  X can  aid in the design of policy analysis algorithms and heuristics,  aid in the analysis and design of BGP and extensions,  help explain some BGP routing anomalies,  provide a fun way of thinking about the protocol Underlying problem Shortest Paths Distributed means of computing a solution. X? RIP, OSPF, IS-IS BGP

160 159 Separate Dynamic and Static Semantics  Static semantics:  BGP policies  Stable Paths Problem  Dynamic semantics:  BGP  SPVP  SPVP: Simple Path Vector Protocol  A distributed algorithm for solving Stable Paths Problem

161 160 What is the Stable Paths Problem? Example:  A graph of nodes and edges,  Node 0, called the origin,  For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.  A ranking of permitted paths at each node. Null path is always least preferred. 2 5 5 2 1 0 0 2 1 0 2 0 1 3 0 1 0 3 0 4 2 0 4 3 0 3 4 2 1 most preferred … least preferred (not null)

162 161 A Solution to SPP  A solution is an assignment of permitted paths to each node such that  node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,  each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors

163 162 A Solution to SPP  A solution need not represent a shortest path tree or a spanning tree 5 5 2 1 0 0 2 1 0 2 0 1 3 0 1 0 3 0 4 2 0 4 3 0 3 4 2 1

164 163 There can be Multiple Solutions to an SPP First solution 102 1 2 0 1 0 102102 2 1 0 2 0 1 2 0 1 0 2 1 0 2 0 1 2 0 1 0 2 1 0 2 0 Second solution DISAGREE

165 164 Multiple Solutions Can Occur Due to Recovery: 1023 1 0 1 2 3 0 2 3 0 2 1 0 3 2 1 0 3 0 1 02 3 1 02 3 Remove primary linkRestore primary link 1 0 1 2 3 0 2 3 0 3 1 0 3 2 1 0 3 0 primary link backup link

166 165 Ranking BGP Paths  Highest local preference  Shortest AS path length  Origin: IGP<EGP<INCOMPLETE  Lowest MED value  EBGP preferred over IBGP  Lowest IGP cost  Tie breaking
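The ranking steps can be expressed as a lexicographic sort key. The sketch below is illustrative only: the `Route` fields are hypothetical names, it assumes the standard rule that routes learned over eBGP win over routes learned over iBGP, it uses a router ID as the final tie-breaker (an assumption), and it compares MED across all routes even though real routers usually compare MED only between routes from the same neighboring AS.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}

@dataclass(frozen=True)
class Route:
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int  # hypothetical final tie-breaker

def preference_key(route):
    """Smaller key = more preferred; fields follow the slide's order."""
    return (
        -route.local_pref,                    # highest local preference
        len(route.as_path),                   # shortest AS path
        ORIGIN_RANK[route.origin],            # IGP < EGP < INCOMPLETE
        route.med,                            # lowest MED (simplified)
        0 if route.learned_via_ebgp else 1,   # eBGP before iBGP (assumed)
        route.igp_cost,                       # lowest IGP cost
        route.router_id,                      # tie breaking
    )

def best_route(routes):
    return min(routes, key=preference_key)

a = Route(100, (7018, 3), "IGP", 0, True, 10, 1)
b = Route(200, (1239, 1, 3), "IGP", 0, False, 5, 2)
c = Route(100, (7018, 3), "IGP", 0, False, 10, 3)
print(best_route([a, b, c]) == b, best_route([a, c]) == a)
```

Route `b` wins on local preference despite its longer AS path; between `a` and `c`, which tie on the first four steps, the eBGP-learned route `a` is chosen.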

167 166 Bad Gadget: No Solution 2 0 3 1 2 1 0 2 0 1 3 0 1 0 3 2 0 3 0 4 Stage 1: 1: [10] 2: [210] 3: [30] Stage 2: 1:[130] 2:[20] 3:[320] Back to stage 1

168 167 Bad Gadget: No Solution 2 0 3 1 2 1 0 2 0 1 3 0 1 0 3 2 0 3 0 4 Stage 1: 1: [10] 2: [20] 3: [320] Stage 2: 1:[130] 2:[210] 3:[30] Back to stage 1

169 168 Has A Solution, But Can Get Trapped: 102 1 2 0 1 0 2 1 0 2 0 3456 5 3 1 0 5 6 3 1 2 0 5 3 1 2 0 6 3 1 0 6 4 3 1 2 0 6 3 1 2 0 4 3 1 0 4 5 3 1 2 0 4 3 1 2 0 3 1 0 3 1 2 0 As with DISAGREE, this part has two distinct solutions This part has a solution only when node 1 is assigned the direct path (1 0).


171 170 How To Solve An SPP?  Exponential complexity  Just enumerate all path assignments, And check stability of each….  NP-complete  3-SAT can be reduced to SPP
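The brute-force approach (enumerate all path assignments and check the stability of each) can be sketched for tiny instances. Assumptions in this sketch: an SPP instance is a dictionary mapping each non-origin node to its permitted non-null paths in decreasing order of preference, the null path is the empty tuple, and the two instances encode DISAGREE and the three-node BAD GADGET ranking as listed on the earlier slides. The function names are hypothetical.

```python
from itertools import product

def best_choice(u, permitted, assignment):
    """Highest-ranked permitted path of u consistent with its next hop."""
    for path in permitted[u]:
        if assignment[path[1]] == path[1:]:
            return path
    return ()  # null path

def solve_spp(permitted):
    """Enumerate every assignment and keep the stable ones (exponential)."""
    nodes = sorted(permitted)
    options = [list(permitted[u]) + [()] for u in nodes]
    solutions = []
    for combo in product(*options):
        assignment = dict(zip(nodes, combo))
        assignment[0] = (0,)  # the origin's path is itself
        if all(best_choice(u, permitted, assignment) == assignment[u] for u in nodes):
            solutions.append({u: assignment[u] for u in nodes})
    return solutions

DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 2, 0), (3, 0)]}

print(len(solve_spp(DISAGREE)), len(solve_spp(BAD_GADGET)))
```

On these encodings the search finds two stable assignments for DISAGREE and none for BAD GADGET, matching the slides; the number of assignments grows exponentially with the number of nodes, so this only works for toy examples.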

172 171 Distributed Algorithms to Solve SPP  OSPF-like  Distributed topology, path ranks  Solve SPP locally  Exponential worst case  How to avoid loops if multiple solutions exist?  RIP-like:  Pick the best path from neighbors’ paths  Tell neighbors about changes  Can diverge  Not guaranteed to find a solution even if it exists  No bound on convergence time

173 172 SPVP Protocol  Pick the best path available at any time process spvp[u] { receive P from w -> { rib-in(u <= w) := u P; if rib(u) != best(u) { rib(u) := best(u); foreach v in peers(u) { send rib(u) to v } } } }
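The protocol's behavior can be explored with a toy simulation. This sketch makes strong simplifying assumptions that are not on the slide: messages are delivered instantly and reliably, and nodes are activated one at a time in a fixed round-robin order, each re-selecting its best path from its neighbors' current paths. The instance encoding and function names are hypothetical.

```python
def best_path(u, permitted, current):
    """Best permitted path of u given the paths its neighbors hold now."""
    for path in permitted[u]:
        if current[path[1]] == path[1:]:
            return path
    return ()  # null path

def run_spvp(permitted, max_rounds=100):
    """Round-robin activation; report convergence or a repeated state."""
    nodes = sorted(permitted)
    current = {u: () for u in nodes}
    current[0] = (0,)  # the origin announces itself
    seen = set()
    for _ in range(max_rounds):
        state = tuple(current[u] for u in nodes)
        if state in seen:
            return "oscillates", None  # same state at a round start: a cycle
        seen.add(state)
        changed = False
        for u in nodes:
            new = best_path(u, permitted, current)
            if new != current[u]:
                current[u] = new       # rib(u) := best(u); neighbors see it
                changed = True
        if not changed:
            return "converged", {u: current[u] for u in nodes}
    return "undecided", None

DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 2, 0), (3, 0)]}

print(run_spvp(DISAGREE))
print(run_spvp(BAD_GADGET)[0])
```

Under this particular schedule DISAGREE settles on one of its two solutions, while BAD GADGET alternates between the two stages shown on the earlier slide and never stabilizes. Other activation orders can behave differently for DISAGREE, which is why the schedule is stated as an assumption.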

174 173 SPVP and SPP  SPVP wanders around assignment space SPP SolvableSPVP Can Diverge must converge must diverge

175 174 A sufficient condition for sanity If an instance of SPP has an acyclic dispute digraph, then  Static (SPP): solvable, unique solution, predictable restoration, all sub-problems uniquely solvable  Dynamic (SPVP): safe (can’t diverge), robust with respect to link/node failures

176 175 Dispute Digraph Example 21 0 4 3 1 3 0 1 0 2 1 0 2 0 4 2 0 4 3 0 3 4 2 0 3 0 3 4 2 0 2 1 0 2 01 0 3 0 4 3 01 3 0 4 2 0 BAD GADGET II CYCLE

177 176 Dispute Wheels u_0 u_1 u_2 u_i u_(i+1) u_k Q_0 Q_1 Q_2 Q_k Q_(i+1) Q_i R_0 R_1 R_i R_k At u_i, rank of Q_i is less than or equal to rank of R_iQ_(i+1) There exists a dispute wheel iff there exists a cycle in the dispute digraph

178 177 Dispute Wheel Example 21 0 3 1 2 3 0 1 2 0 1 0 2 3 1 0 2 3 0 2 0 3 1 2 0 3 1 0 3 0 21 0 3 1 3 2

179 178 A Dynamic Solution  Extend SPVP with a history attribute,  A route’s history contains a path in the dispute digraph that “explains” how the route was obtained,  A route history will contain a dispute cycle if and only if a policy dispute is dynamically realized.  If a route’s history contains a cycle, then suppress it ….

