Presentation on theme: "Shivkumar Kalyanaraman 1 Routing: Overview and Key Protocols Shivkumar Kalyanaraman Rensselaer Polytechnic Institute Based in part."— Presentation transcript:
Shivkumar Kalyanaraman 1 Routing: Overview and Key Protocols Shivkumar Kalyanaraman Rensselaer Polytechnic Institute Based in part upon slides of Prof. Raj Jain (OSU), S. Keshav (Cornell), J. Kurose (U Mass), Noel Chiappa (MIT), Tim Griffin (AT&T), Ion Stoica (UCB),
Shivkumar Kalyanaraman 2 Routing vs Forwarding vs Bridging Distance vector vs Link state routing Addressing and Routing: Scalability OSPF, RIP protocols Inter-domain Routing Issues BGP protocol Overview
Shivkumar Kalyanaraman 3 Routing vs. Forwarding Forwarding: select an output port based on destination address and routing table Data-plane function Often implemented in hardware Routing: process by which routing table is built.. … so that the series of local forwarding decisions takes the packet to the destination with high probability, and …(reachability condition) … the path chosen/resources consumed by the packet is efficient in some sense… (optimality and filtering condition) Control-plane function Implemented in software
Shivkumar Kalyanaraman 4 Forwarding Table Can display forwarding table using “netstat -rn” Sometimes called “routing table” Destination Gateway Flags Ref Use Interface UH lo U 2 13 fa U le U 2 25 qaa U 3 0 le0 default UG
Shivkumar Kalyanaraman 5 Interconnection Devices H H B H H Router Extended LAN =Broadcast domain LAN= Collision Domain Network Datalink Physical Transport Router Bridge/Switch Repeater/Hub Gateway Application Network Datalink Physical Transport Application
Shivkumar Kalyanaraman 6 Routing problem q Collect, process, and condense global state into local forwarding information q Global state q inherently large q dynamic q hard to collect q Hard issues: q consistency, completeness, scalability q Impact of resource needs of sessions
Shivkumar Kalyanaraman 7 Consistency q Defn: A series of independent local forwarding decisions must lead to connectivity between any desired (source, destination) pair in the network. q If the states are inconsistent, the network is said not to have “converged” to steady state (I.e. is in a transient state) q Inconsistency leads to loops, wandering packets etc q In general a part of the routing information may be consistent while the rest may be inconsistent. q Large networks => inconsistency is a scalability issue. q Consistency can be achieved in two ways: q Fully distributed approach: a consistency criterion or invariant across the states of adjacent nodes q Signaled approach: the signaling protocol sets up local forwarding information along the path.
Shivkumar Kalyanaraman 8 Completeness q Defn: The network as a whole and every node has sufficient information to be able to compute all paths. q In general, with more information available locally, routing algorithms tend to converge faster, because the chances of inconsistency reduce. q But this means that more distributed state must be collected at each node and processed. q The demand for completeness also limits the scalability of the algorithm. q Since both consistency and completeness pose scalability problems, large networks have to be structured hierarchically and abstract entire networks as a single node.
Shivkumar Kalyanaraman 9 Internet Routing Model q 2 key features: q Dynamic routing q Intra- and Inter-AS routing, AS = locus of admin control q Internet organized as “autonomous systems” (AS). q AS is internally connected q Interior Gateway Protocols (IGPs) within AS. q Eg: RIP, OSPF, HELLO q Exterior Gateway Protocols (EGPs) for AS to AS routing. q Eg: EGP, BGP-4
Shivkumar Kalyanaraman 10 Dynamic Routing Model
Shivkumar Kalyanaraman 11 Intra-AS and Inter-AS routing inter-AS, intra-AS routing in gateway A.c network layer link layer physical layer a b b a a C A B d Gateways: perform inter-AS routing amongst themselves perform intra-AS routers with other routers in their AS A.c A.a C.b B.a c b c
Shivkumar Kalyanaraman 12 Intra-AS and Inter-AS routing: Example Host h2 a b b a a C A B d c A.a A.c C.b B.a c b Host h1 Intra-AS routing within AS A Inter-AS routing between A and B Intra-AS routing within AS B
Shivkumar Kalyanaraman 13 Basic Dynamic Routing Methods q Source-based: source gets a map of the network, q source finds route, and either q signals the route-setup (eg: ATM approach) q encodes the route into packets (inefficient) q Link state routing: per-link information q Get map of network (in terms of link states) at all nodes and find next-hops locally. q Maps consistent => next-hops consistent q Distance vector: per-node information q At every node, set up distance signposts to destination nodes (a vector) q Setup this by peeking at neighbors’ signposts.
Shivkumar Kalyanaraman 14 DV & LS: consistency criterion q The subset of a shortest path is also the shortest path between the two intermediate nodes. q Corollary: q If the shortest path from node i to node j, with distance D(i,j) passes through neighbor k, with link cost c(i,k), then: D(i,j) = c(i,k) + D(k,j) i k j c(i,k) D(k,j)
Shivkumar Kalyanaraman 15 Distance Vector DV = Set (vector) of Signposts, one for each destination
Shivkumar Kalyanaraman 16 Distance Vector (DV) Approach Consistency Condition: D(i,j) = c(i,k) + D(k,j) q The DV (Bellman-Ford) algorithm evaluates this recursion iteratively. q In the m th iteration, the consistency criterion holds, assuming that each node sees all nodes and links m- hops (or smaller) away from it (i.e. an m-hop view) A E D CB Example network A E D CB A’s 2-hop view (After 2 nd Iteration) A E B 7 1 A’s 1-hop view (After 1 st iteration)
Shivkumar Kalyanaraman 17 Distance Vector (DV) Example q A’s distance vector D(A,*): q After Iteration 1 is: [0, 7, INFINITY, INFINITY, 1] q After Iteration 2 is: [0, 7, 8, 3, 1] q After Iteration 3 is: [0, 7, 5, 3, 1] q After Iteration 4 is: [0, 6, 5, 3, 1] A E D CB Example network A E D CB A’s 2-hop view (After 2 nd Iteration) A E B 7 1 A’s 1-hop view (After 1 st iteration)
Shivkumar Kalyanaraman 18 Link State (LS) Approach q The link state (Dijkstra) approach is iterative, but it pivots around destinations j, and their predecessors k = p(j) q Observe that an alternative version of the consistency condition holds for this case: D(i,j) = D(i,k) + c(k,j) q Each node i collects all link states c(*,*) first and runs the complete Dijkstra algorithm locally. i k j c(k,j) D(i,k)
Shivkumar Kalyanaraman 19 Dijkstra’s algorithm: example Step set N A AD ADE ADEB ADEBC ADEBCF D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) infinity 2,D D(F),p(F) infinity 4,E A E D CB F The shortest-paths spanning tree rooted at A is called an SPF-tree
Shivkumar Kalyanaraman 20 q Topology information is flooded within the routing domain q Best end-to-end paths are computed locally at each router. q Best end-to-end paths determine next-hops. q Based on minimizing some notion of distance q Works only if policy is shared and uniform q Examples: OSPF, IS-IS q Each router knows little about network topology q Only best next-hops are chosen by each router for each destination network. q Best end-to-end paths result from composition of all next- hop choices q Does not require any notion of distance q Does not require uniform policies at all routers q Examples: RIP, BGP Link StateVectoring Summary: Distributed Routing Techniques
Shivkumar Kalyanaraman 21 RIP: Routing Information Protocol Uses hop count as metric (max: 16 is infinity) Tables (vectors) “advertised” to neighbors every 30 s. Each advertisement: upto 25 entries No advertisement for 180 sec: neighbor/link declared dead routes via neighbor invalidated new advertisements sent to neighbors (Triggered updates) neighbors in turn send out new advertisements (if tables changed) link failure info quickly propagates to entire net poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
Shivkumar Kalyanaraman 22 RIPv1 Problems (Continued) q Split horizon/poison reverse does not guarantee to solve count-to-infinity problem q 16 = infinity => RIP for small networks only! q Slow convergence q Broadcasts consume non-router resources q RIPv1 does not support subnet masks (VLSMs) q No authentication
Shivkumar Kalyanaraman 23 RIPv2 q Why ? Installed base of RIP routers q Provides: q VLSM support q Authentication q Multicasting q “Wire-sharing” by multiple routing domains, q Tags to support EGP/BGP routes. q Uses reserved fields in RIPv1 header. q First route entry replaced by authentication info.
Shivkumar Kalyanaraman 24 Link State Protocols q Key: Create a network “map” at each node. q 1. Node collects the state of its connected links and forms a “Link State Packet” (LSP) q 2. Flood LSP => reaches every other node in the network and everyone now has a network map. q 3. Given map, run Dijkstra’s shortest path algorithm (SPF) => get paths to all destinations q 4. Routing table = next-hops of these paths. q 5. Hierarchical routing: organization of areas, and filtered control plane information flooded.
Shivkumar Kalyanaraman 25 Hello: Packet Format
Shivkumar Kalyanaraman 26 Topology Dissemination q A.k.a LSP distribution q 1. Flood LSPs on links except incoming link q Require at most 2E transfers for n/w with E edges q 2. Sequence numbers to detect duplicates q Why? Routers/links may go down/up q Issue: wrap-around, larger sequence number is not the most recent!
Shivkumar Kalyanaraman 29 Topology Dissemination (Continued) q Checksum field: q Drop packet if in error, get retransmission from neighbor q Age field (similar to TTL) q Number of seconds since LSA originated q Periodically incremented after acceptance q Originating router refreshes LSA after 30 min q Delete if Age = MaxAge q Low age field + large seq # => that LSA is flapping or frequently changing …
Shivkumar Kalyanaraman 30 Recovering from a partition On partition, LSP databases can get out of synch Databases described by database descriptor records Routers on each side of a newly restored link talk to each other to update databases (determine missing and out-of- date LSPs) => selective synchronization
Shivkumar Kalyanaraman 31 Inter-Domain Routing: Big Picture Large ISP Dial-Up ISP Access Network Small ISP Stub Large number of diverse networks
Shivkumar Kalyanaraman 32 Requirements for Inter-AS Routing q Should scale for the size of the global Internet. q Focus on reachability, not optimality q Use address aggregation techniques to minimize core routing table sizes and associated control traffic q At the same time, it should allow flexibility in topological structure (eg: don’t restrict to trees etc) q Allow policy-based routing between autonomous systems q Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes) q Fully distributed routing (as opposed to a signaled approach) is the only possibility. q Extensible to meet the demands for newer policies.
Shivkumar Kalyanaraman 33 Who speaks Inter-AS routing? R border routerinternal router BGP R2R1R3 AS1 AS2 Two types of routers Border router(Edge), Internal router(Core) Two border routers of different ASes will have a BGP session
Shivkumar Kalyanaraman 34 Customers and Providers Customer pays provider for access to the Internet provider customer IP traffic provider customer
Shivkumar Kalyanaraman 35 Nontransit vs. Transit ASes ISP 1 ISP 2 Nontransit AS might be a corporate or campus network. Could be a “content provider” NET A Traffic NEVER flows from ISP 1 through NET A to ISP 2 Internet Service providers (ISPs) have transit networks
Shivkumar Kalyanaraman 36 The Peering Relationship peer customerprovider Peers provide transit between their respective customers Peers do not provide transit between peers Peers (often) do not exchange $$$ traffic allowed traffic NOT allowed
Shivkumar Kalyanaraman 37 BGP-4 q BGP = Border Gateway Protocol q Is a Policy-Based routing protocol q Is the de facto EGP of today’s global Internet q Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes : BGP-1 [RFC 1105] –Replacement for EGP (1984, RFC 904) 1990 : BGP-2 [RFC 1163] 1991 : BGP-3 [RFC 1267] 1995 : BGP-4 [RFC 1771] –Support for Classless Interdomain Routing (CIDR)
Shivkumar Kalyanaraman 38 BGP Operations (Simplified) Establish session on TCP port 179 Exchange all active routes Exchange incremental updates AS1 AS2 While connection is ALIVE exchange route UPDATE messages BGP session
Shivkumar Kalyanaraman 39 Four Types of BGP Messages q Open : Establish a peering session. q Keep Alive : Handshake at regular intervals. q Notification : Shuts down a peering session. q Update : Announcing new routes or withdrawing previously announced routes. announcement = prefix + attributes values
Shivkumar Kalyanaraman 40 Two Types of BGP Neighbor Relationships External Neighbor (eBGP) in a different Autonomous Systems Internal Neighbor (iBGP) in the same Autonomous System AS1 AS2 eBGP iBGP iBGP is routed (using IGP!)
Shivkumar Kalyanaraman 41 I-BGP and E-BGP R border router internal router R1 AS1 R4 R5 B AS3 E-BGP R2R3 A AS2 announce B IGP: Interior Gateway Protocol. Examples: IS-IS, OSPF I-BGP IGP
Shivkumar Kalyanaraman 42 IBGP vs EBGP q I-BGP nodes: typically ABRs, or other nodes where default routes terminate q I-BGP peering sessions between every pair of routers within an AS: full mesh. A B D C Physical link IBGP session AS1
Shivkumar Kalyanaraman 44 AS Confederations q Divide and conquer: Divides a large AS into sub- ASs AS R1 R2 Sub-AS
Shivkumar Kalyanaraman 45 Service Provider Global Internet Routing Mesh …...…… …...……. Address Aggregation: CIDR Service Provider Global Internet Routing Mesh …...…… /16 Inter-domain Routing With CIDR Inter-domain Routing Without CIDR
Shivkumar Kalyanaraman 46 RFC 1519: Classless Inter-Domain Routing (CIDR) IP Address : IP Mask: Address Mask for hostsNetwork Prefix Pre-CIDR: Network ID ended on 8-, 16, 24- bit boundary CIDR: Network ID can end at any bit boundary Usually written as /15, a.k.a “supernetting”
Shivkumar Kalyanaraman 47 Longest Prefix Match (Classless) Forwarding Destination = payload PrefixInterfaceNext Hop / ATM 5/0/ / /23attached Ethernet 0/1/3 Serial 1/0/ IP Forwarding Table / ATM 5/0/9 even better OK better best!
Shivkumar Kalyanaraman 48 What is Routing Policy q Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes) q Public description of the relationship between external BGP peers q Can also describe internal BGP peer relationship q Eg: Who are my BGP peers q What routes are q Originated by a peer q Imported from each peer q Exported to each peer q Preferred when multiple routes exist q What to do if no route exists?
Shivkumar Kalyanaraman 49 BGP Route Processing Best Route Selection Apply Import Policies Best Route Table Apply Export Policies Install forwarding Entries for best Routes. Receive BGP Updates Best Routes Transmit BGP Updates Apply Policy = filter routes & tweak attributes Based on Attribute Values IP Forwarding Table Apply Policy = filter routes & tweak attributes
Shivkumar Kalyanaraman 50 Policy Implementation Flow Main BGP RIB Adj RIB Out Outgo- ing Adj RIB In Incom- ing Main RIB/ FIB IGPs Static & HW Info
Shivkumar Kalyanaraman 51 Import and Export Policies q For inbound traffic q Filter outbound routes q Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection q For outbound traffic q Filter inbound routes q Tweak attributes on inbound routes to influence best route selection outbound routes inbound routes inbound traffic outbound traffic In general, an AS has more control over outbound traffic
Shivkumar Kalyanaraman 52 BGP Policy Knob: Attributes Value Code Reference ORIGIN [RFC1771] 2 AS_PATH [RFC1771] 3 NEXT_HOP [RFC1771] 4 MULTI_EXIT_DISC [RFC1771] 5 LOCAL_PREF [RFC1771] 6 ATOMIC_AGGREGATE [RFC1771] 7 AGGREGATOR [RFC1771] 8 COMMUNITY [RFC1997] 9 ORIGINATOR_ID [RFC2796] 10 CLUSTER_LIST [RFC2796] 11 DPA [Chen] 12 ADVERTISER [RFC1863] 13 RCID_PATH / CLUSTER_ID [RFC1863] 14 MP_REACH_NLRI [RFC2283] 15 MP_UNREACH_NLRI [RFC2283] 16 EXTENDED COMMUNITIES [Rosen] reserved for development From IANA: We will cover a subset of these attributes Not all attributes need to be present in every announcement
Shivkumar Kalyanaraman 53 UPDATE message in BGP q Primary message between two BGP speakers. q Used to advertise/withdraw IP prefixes (NLRI) q Path attributes field : unique to BGP q Apply to all prefixes specified in NLRI field q Optional vs Well-known; Transitive vs Non-transitive Withdrawn Routes Length 2 octets Withdrawn Routes (variable length) Total Path Attributes Length Path Attributes (variable length) Network Layer Reachability Info. (NLRI: variable length)
Shivkumar Kalyanaraman 54 Path Attributes: ORIGIN q ORIGIN: q Describes how a prefix came to BGP at the origin AS q Prefixes are learned from a source and “injected” into BGP: q Directly connected interfaces, manually configured static routes, dynamic IGP or EGP q Values: q IGP (EGP): Prefix learnt from IGP (EGP) q INCOMPLETE: Static routes
Shivkumar Kalyanaraman 55 Path Attributes: AS-PATH q List of ASs thru which the prefix announcement has passed. AS on path adds ASN to AS-PATH q Eg: /16 originates at AS1 and is advertised to AS3 via AS2. q Eg: AS-SEQUENCE: “ ” q Used for loop detection and path selection AS1 (100) AS2 (200) AS3 (15) /16
Shivkumar Kalyanaraman 56 Traffic Often Follows ASPATH AS 4AS 3 AS 2 AS /16 ASPATH = IP Packet Dest =
Shivkumar Kalyanaraman 57 … But It Might Not AS 4AS 3 AS 2 AS /16 ASPATH = IP Packet Dest = AS /25 ASPATH = /25 AS 2 filters all subnets with masks longer than / /16 ASPATH = 1 From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5
Shivkumar Kalyanaraman 58 Shorter AS-PATH Doesn’t Mean Shorter # Hops AS 4 AS 3 AS 2 AS 1 BGP says that path 4 1 is better than path Duh!
Shivkumar Kalyanaraman 59 Path Attributes: NEXT-HOP q Next-hop: node to which packets must be sent for the IP prefixes. May not be same as peer. q UPDATE for , NEXT-HOP= BGP Speakers Not a BGP Speaker
Shivkumar Kalyanaraman 60 Recursive Lookup q If routes (prefix) are learnt thru iBGP, NEXT-HOP is the iBGP router which originated the route. q Note: iBGP peer might be several IP-level hops away as determined by the IGP q Hence BGP NEXT-HOP is not the same as IP next- hop q BGP therefore checks if the “NEXT-HOP” is reachable through its IGP. q If so, it installs the IGP next-hop for the prefix q This process is known as “recursive lookup” – the lookup is done in the control-plane (not data-plane) before populating the forwarding table. q Example in next slide
Shivkumar Kalyanaraman 61 Forwarding Table Join EGP with IGP For Connectivity AS 1AS / EGP /16 destinationnext hop /30 destinationnext hop /16 Next Hop = / /16 destinationnext hop /
Shivkumar Kalyanaraman 62 Local Preference AS1 AS2 MED Load-Balancing Knobs in BGP q LOCAL-PREF: outbound traffic, local preference (box- level knob) q MED: Inbound-traffic, typically from the same ISP (link- level knob)
Shivkumar Kalyanaraman 63 Path Attribute: LOCAL-PREF q Locally configured indication about which path is preferred to exit the AS in order to reach a certain network. Default value = 100. Higher is better.
Shivkumar Kalyanaraman 64 Attributes: MULTI-EXIT Discriminator q Also called METRIC or MED Attribute. Lower is better q AS1:multihomed customer. q AS2 (provider) includes MED to AS1 q AS1 chooses which link (NEXTHOP) to use q Eg: traffic to AS3 can go thru Link1, and AS2 thru Link2 AS1 AS2 AS3 AS4 Link A Link B
Shivkumar Kalyanaraman 65 MEDs Can Export Internal Instability Heavy Content Web Farm /24 MED = /24 MED = 56 OR FLAP
Shivkumar Kalyanaraman 66 ASPATH Padding: Shed inbound traffic Padding will (usually) force inbound traffic from AS 1 to take primary link AS /24 ASPATH = customer AS 2 provider /24 backupprimary /24 ASPATH = 2
Shivkumar Kalyanaraman 67 Deaggregation + Multihoming AS 1 customer AS 2 provider /8 AS 3 provider /16 If AS 1 does not announce the more specific prefix, then most traffic to AS 2 will go through AS 3 because it is a longer match AS 2 is “punching a hole” in the CIDR block of AS 1=> subverts CIDR
Shivkumar Kalyanaraman 68 CIDR at Work, No load balancing ISP3 AS / /16 Link A Link B ISP /11 ISP / / /16 PrefixNext Hop ORIGIN AS /11ISP /10ISP2 Table at ISP3
Shivkumar Kalyanaraman 69 CIDR Subverted for Load Balancing ISP3 AS / /16 Link A Link B ISP /11 ISP / /24, / /24, /16 PrefixNext Hop ORIGIN AS /11ISP /10ISP /24ISP1AS /24ISP2AS1 Table at ISP3
Shivkumar Kalyanaraman 70 How Can Routes be Colored? BGP Communities A community value is 32 bits By convention, first 16 bits is ASN indicating who is giving it an interpretation community number Community Attribute = a list of community values. (So one route can belong to multiple communities) RFC 1997 (August 1996) Used within and between ASes The set of ASes must agree on how to interpret the community value Very powerful BECAUSE it has no (predefined) meaning Two reserved communities no_advertise 0xFFFFFF02: don’t pass to BGP neighbors no_export = 0xFFFFFF01: don’t export out of AS
Shivkumar Kalyanaraman 71 Communities Example q 1:100 q Customer routes q 1:200 q Peer routes q 1:300 q Provider Routes q To Customers q 1:100, 1:200, 1:300 q To Peers q 1:100 q To Providers q 1:100 AS 1 ImportExport
Shivkumar Kalyanaraman 72 BGP Route Selection Process q If NEXTHOP is inaccessible do not consider the route. q Prefer largest LOCAL-PREF q If same LOCAL-PREF prefer the shortest AS-PATH. q If all paths are external prefer the lowest ORIGIN code (IGP
Shivkumar Kalyanaraman 73 Route Selection Summary Highest Local Preference Shortest ASPATH Lowest MED i-BGP < e-BGP Lowest IGP cost to BGP egress Lowest router ID traffic engineering Enforce relationships Throw up hands and break ties
Shivkumar Kalyanaraman 75 Large BGP Tables Considered Harmful Routing tables must store best routes and alternate routes Burden can be large for routers with many alternate routes (route reflectors for example) Routers have been known to die Increases CPU load, especially during session reset
Shivkumar Kalyanaraman 76 Summary q Routing Concepts q DV and LS algorithms q RIP, OSPF, BGP