1 Network Layer4-1 Chapter 4 Network Layer
A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
- If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
- If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2007 J.F. Kurose and K.W. Ross, All Rights Reserved
Computer Networking: A Top Down Approach, 4th edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2007.

2 Network Layer4-2 Chapter 4: Network Layer
Chapter goals:
- understand principles behind network layer services:
  - network layer service models
  - forwarding versus routing
  - how a router works
  - routing (path selection)
  - dealing with scale
  - advanced topics: IPv6, mobility
- instantiation, implementation in the Internet

3 Network Layer4-3 Chapter 4: Network Layer
- 4.1 Introduction
- 4.2 Virtual circuit and datagram networks
- 4.3 What’s inside a router
- 4.4 IP: Internet Protocol
  - Datagram format
  - IPv4 functions
  - ICMP
  - IPv6
- 4.5 Routing algorithms
  - Link state
  - Distance Vector
  - Hierarchical routing
- 4.6 Routing in the Internet
  - RIP
  - OSPF
  - BGP
- 4.7 Broadcast and multicast routing

4 Network Layer4-4 Network layer
- transport segment from sending to receiving host
- on sending side, encapsulates segments into datagrams
- on receiving side, delivers segments to transport layer
- network layer protocols in every host, router
- router examines header fields in all IP datagrams passing through it
[Figure: end hosts run the full stack (application, transport, network, data link, physical); each router on the path runs only network, data link, physical]

5 Network Layer4-5 Network layer functions
- Connection setup
  - connection-oriented, host-to-host connection
  - datagram
- Delivery semantics
  - Unicast, broadcast, multicast, anycast
  - In-order, any-order
- Security
  - secrecy, integrity, authenticity
- Demux to upper layer
  - next protocol
  - Can be either transport or network (tunneling)
- Quality-of-service
  - provide predictable performance
- Fragmentation
  - break up packets based on data-link layer properties
- Routing
  - path selection and packet forwarding
- Addressing
  - flat vs. hierarchical
  - global vs. local
  - variable vs. fixed length

7 Network Layer4-7 Network service model
Combining the functions into a particular network
Q: What service model for the “channel” transporting datagrams from sender to receiver?
Example services for individual datagrams:
- guaranteed delivery
- guaranteed delivery with less than 40 msec delay
Example services for a flow of datagrams:
- in-order datagram delivery
- guaranteed minimum bandwidth to flow
- restrictions on changes in inter-packet spacing (jitter)

8 Network Layer4-8 Network layer connection and connection-less service
- Datagram network provides network-layer connectionless service
- VC network provides network-layer connection service
  - Analogous to the transport-layer services, but on a host-to-host basis with an in-network implementation

9 Network Layer4-9 Connection-oriented virtual circuits
- Circuit abstraction
  - Examples: ATM, frame relay, X.25, phone network
  - Model
    - call setup and signaling for each call before data can flow
    - guaranteed performance during call
    - call teardown and signaling to remove call
  - Network support
    - each packet carries circuit identifier (not destination host ID)
    - every router on source-dest path maintains “state” for each passing circuit
    - link, router resources (bandwidth, buffers) allocated to VC to guarantee circuit-like performance
[Figure: VC setup between two hosts: 1. Initiate call, 2. Incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data]

10 Network Layer4-10 Connectionless datagram service
- Postal service abstraction (Internet)
  - Model
    - no call setup or teardown at network layer
    - no service guarantees
  - Network support
    - no state within network on end-to-end connections
    - packets forwarded based on destination host ID
    - packets between same source-dest pair may take different paths
[Figure: datagram exchange between two hosts: 1. Send data, 2. Receive data]

11 Network Layer4-11 Datagram or VC network: why?
Internet
- data exchange among computers
  - “elastic” service, no strict timing req.
- “smart” end systems (computers)
  - can adapt, perform control, error recovery
  - simple inside network, complexity at “edge”
- many link types
  - different characteristics
  - uniform service difficult
ATM
- evolved from telephony
- human conversation:
  - strict timing, reliability requirements
  - need for guaranteed service
- “dumb” end systems
  - telephones
  - complexity inside network
  - only network provider can deploy new services!

12 Network Layer4-12 Network layer service models
Guarantees?
Network Architecture | Service Model | Bandwidth          | Loss | Order | Timing | Congestion feedback
Internet             | best effort   | none               | no   | no    | no     | no (inferred via loss)
ATM                  | CBR           | constant rate      | yes  | yes   | yes    | no congestion
ATM                  | VBR           | guaranteed rate    | yes  | yes   | yes    | no congestion
ATM                  | ABR           | guaranteed minimum | no   | yes   | no     | yes
ATM                  | UBR           | none               | no   | yes   | no     | no

13 Network Layer4-13 Adding circuits to the Internet
- Intserv, Diffserv, RSVP
  - At the end of course if time permits
  - Chapter 7 in book

15 Network Layer4-15 The Internet Network layer
Host, router network layer functions (between the transport layer, TCP and UDP, above and the link and physical layers below):
- Routing protocols
  - path selection
  - RIP, OSPF, BGP
- IP protocol
  - addressing conventions
  - datagram format
  - packet handling conventions
- ICMP protocol
  - error reporting
  - router “signaling”
- forwarding table

17 Network Layer4-17 IP datagram format
Header fields (each row of the header is 32 bits wide):
- ver: IP protocol version number
- head. len: header length (stored in 32-bit words; 20 bytes when there are no options)
- type of service: “type” of data
- length: total datagram length (bytes)
- 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
- time to live: max number of remaining hops (decremented at each router)
- upper layer: upper layer protocol to deliver payload to
- Internet checksum (over the header)
- 32 bit source IP address
- 32 bit destination IP address
- Options (if any): e.g. timestamp, record route taken, specify list of routers to visit
- data (variable length, typically a TCP or UDP segment)
How much overhead with TCP?
- 20 bytes of TCP
- 20 bytes of IP
- = 40 bytes + app layer overhead

18 Network Layer4-18 IP header
- Version
  - Currently at 4, next version 6
- Header length
  - Length of header (20 bytes plus options)
- Type of Service
  - Typically ignored
  - Replaced by DiffServ and ECN
- Length
  - Total length of the datagram or fragment (header plus payload), in bytes
- Identification
  - To match up with other fragments
- Flags
  - Don’t fragment flag
  - More fragments flag
- Fragment offset
  - Where this fragment lies in entire IP datagram
  - Measured in 8-octet units (13-bit field)

19 Network Layer4-19 IP header (cont)
- Time to live
  - Ensure packets exit the network
- Protocol
  - Demultiplexing to higher layer protocols (TCP, UDP, SCTP)
- Header checksum
  - Ensures some degree of header integrity
  - Relatively weak: 16 bit
- Source IP, Destination IP (32 bit addresses)
- Options
  - E.g. source routing, record route, etc.
  - Performance issues
  - Poorly supported
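
The header fields listed on the last two slides can be pulled out of a raw packet with a few shifts and masks. The following is a minimal sketch, assuming a 20-byte header without options; the function name `parse_ipv4_header` and the sample bytes are illustrative and not part of the original slides.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                     # high 4 bits
        "header_len_bytes": (ver_ihl & 0x0F) * 4,    # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # 13-bit field, 8-octet units
        "ttl": ttl,
        "protocol": proto,                           # 6 = TCP, 17 = UDP, 1 = ICMP
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical sample header: a TCP packet from 172.16.10.99 to 172.16.10.12
sample = bytes.fromhex("4500003c1c4640004006b1e6ac100a63ac100a0c")
hdr = parse_ipv4_header(sample)
print(hdr)
```

The version and header length share one byte, and the three flag bits share a 16-bit word with the fragment offset, which is why the masks are needed.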

21 Network Layer4-21 Recall network layer functions
- How does IPv4 support...
  - Connection setup
  - Delivery semantics
  - Security
  - Demux to upper layer
  - Quality-of-service
  - Fragmentation
  - Addressing
  - Routing

22 Network Layer4-22 IP connection setup
- Hourglass design
- No support for network layer connections
  - Unreliable datagram service
  - Out-of-order delivery possible
  - Connection semantics only at higher layer
  - Compare to ATM and phone network…

23 Network Layer4-23 IP delivery semantics
- No reliability guarantees
  - Loss
- No ordering guarantees
  - Out-of-order delivery possible
- Unicast mostly
  - IP broadcast (255.255.255.255) not forwarded
  - IP multicast supported, but not widely used
    - 224.0.0.0 to 239.255.255.255

24 Network Layer4-24 IP security
- Weak support for integrity
  - IP checksum
    - IP has a header checksum, leaves data integrity to TCP/UDP
    - Catches errors within a router or bridge that are not detected by link layer
    - Incrementally updated as routers change fields: http://www.rfc-editor.org/rfc/rfc1141.txt
  - No support for secrecy, authenticity
- IPsec
  - Retrofit IP network layer with encryption and authentication
  - http://www.rfc-editor.org/rfc/rfc2411.txt

25 Network Layer4-25 Internet checksum (review)
Goal: detect “errors” (e.g., flipped bits) in transmitted packet. The same algorithm is used for the IP header checksum and for the TCP and UDP checksums.
Sender:
- treat contents as sequence of 16-bit integers (see TCP checksum)
- checksum: addition (1’s complement sum) of the contents, then complemented
- sender puts checksum value into the checksum field (e.g., the UDP checksum field)
Receiver:
- compute checksum of received segment
- check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless?
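
The sender and receiver steps above can be sketched in a few lines. This is an illustrative implementation, assuming the data is padded with a zero byte when its length is odd; the name `internet_checksum` and the sample header bytes are assumptions, not taken from the slides.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sender: compute over a header whose checksum field (bytes 10-11) is zero.
header = bytes.fromhex("4500003c1c46400040060000ac100a63ac100a0c")
csum = internet_checksum(header)
filled = header[:10] + csum.to_bytes(2, "big") + header[12:]

# Receiver: recomputing over the header including the checksum yields 0 if intact.
ok = internet_checksum(filled) == 0
corrupted = bytes([filled[0] ^ 0x01]) + filled[1:]
detected = internet_checksum(corrupted) != 0
print(hex(csum), ok, detected)
```

A single flipped bit is always caught; two errors that cancel in the sum (for example, swapped 16-bit words) are not, which is the "maybe errors nonetheless" caveat.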

26 Network Layer4-26 IP demux to upper layer
- http://www.rfc-editor.org/rfc/rfc1700.txt
  - Protocol type field
    - 1 = ICMP
    - 2 = IGMP
    - 3 = GGP
    - 4 = IP in IP
    - 6 = TCP
    - 8 = EGP
    - 9 = IGP
    - 17 = UDP
    - 29 = ISO-TP4
    - 80 = ISO-IP
    - 88 = IGRP
    - 89 = OSPFIGP
    - 94 = IPIP
  - http://www.rfc-editor.org/rfc/rfc2003.txt

27 Network Layer4-27 IP quality of service
- IP originally had “type-of-service” (TOS) field to eventually support quality of service
  - Not used, ignored by most routers
- Mid 90s
  - Integrated services (intserv) and RSVP signalling
  - Per-flow end-to-end QoS support
    - Per-flow signaling
    - Per-flow network resource allocation (*FQ, *RR scheduling algorithms)
    - Setup and match flows on connection ID

28 Network Layer4-28 IP quality of service
- RSVP
  - http://www.rfc-editor.org/rfc/rfc2205.txt
  - Provides end-to-end signaling to network elements
  - General purpose protocol for signaling information
  - Not used now on a per-flow basis to support intserv, but being reused for diffserv
- intserv
  - Defines service model (guaranteed, controlled-load)
    - http://www.rfc-editor.org/rfc/rfc2210.txt
    - http://www.rfc-editor.org/rfc/rfc2211.txt
    - http://www.rfc-editor.org/rfc/rfc2212.txt
  - Dozens of scheduling algorithms to support these services
    - WFQ, W2FQ, STFQ, Virtual Clock, DRR, etc.

29 Network Layer4-29 IP quality of service
- Why did RSVP, intserv fail?
  - Complexity
    - Scheduling
    - Routing (pinning routes)
    - Per-flow signaling overhead
  - Lack of scalability
    - Per-flow state
  - Economics
    - Providers with no incentive to deploy
    - SLA, end-to-end billing issues
  - QoS a weak-link property
    - Requires every device on an end-to-end basis to support flow

30 Network Layer4-30 IP quality of service
- Now it’s diffserv…
  - Use the “type-of-service” bits as a priority marking
  - http://www.rfc-editor.org/rfc/rfc2474.txt
  - http://www.rfc-editor.org/rfc/rfc2475.txt
  - http://www.rfc-editor.org/rfc/rfc2597.txt
  - http://www.rfc-editor.org/rfc/rfc2598.txt
  - Core network relatively stateless
  - AF: Assured forwarding (drop precedence)
  - EF: Expedited forwarding (strict priority handling)

31 Network Layer4-31 IP Fragmentation & Reassembly
- network links have MTU (max. transfer unit): largest possible link-level frame
  - different link types, different MTUs
- large IP datagram (can be 64KB) “fragmented” within network
  - one datagram becomes several datagrams
  - IP header on each fragment
  - IP identifier and offset fields to identify and order fragments
[Figure: fragmentation (in: one large datagram, out: 3 smaller datagrams), followed by reassembly]

32 Network Layer4-32 IP Fragmentation & Reassembly
- Where to do reassembly?
  - End nodes
    - avoids unnecessary work
  - Dangerous to do at intermediate nodes
    - Buffer space
    - Must assume single path through network
    - May be re-fragmented later on in the route again
[Figure: fragmentation (in: one large datagram, out: 3 smaller datagrams), followed by reassembly]

33 Network Layer4-33 IP Fragmentation and Reassembly
Example
- 4000 byte datagram
- MTU = 1500 bytes
One large datagram becomes several smaller datagrams:
            | ID | offset | fragflag | length
original    | x  | 0      | 0        | 4000
fragment 1  | x  | 0      | 1        | 1500
fragment 2  | x  | 185    | 1        | 1500
fragment 3  | x  | 370    | 0        | 1040
- 1480 bytes in data field of each full-size fragment
- offset = 1480/8 = 185
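
The numbers in this example can be reproduced with a short calculation. The sketch below assumes a 20-byte header with no options on every fragment; the function name `fragment` and its dictionary keys are illustrative only.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split a datagram of total_length bytes into fragments that fit the MTU."""
    payload = total_length - header_len
    # Data per fragment must be a multiple of 8 because offsets count 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_data, payload - sent)
        more = sent + size < payload          # "more fragments" flag
        fragments.append({
            "offset": sent // 8,
            "fragflag": int(more),
            "length": size + header_len,
        })
        sent += size
    return fragments

frags = fragment(4000, 1500)
for f in frags:
    print(f)
```

Only the last fragment has fragflag 0, and its length is the 1020 leftover data bytes plus the 20-byte header.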

34 Network Layer4-34 Fragmentation is Harmful
- Uses resources poorly
  - Forwarding costs per packet
  - Best if we can send large chunks of data
  - Worst case: packet just bigger than MTU
- Poor end-to-end performance
  - Loss of a fragment makes other fragments useless
- Reassembly is hard
  - Buffering constraints

35 Network Layer4-35 Fragmentation
- Path MTU Discovery
  - Remove fragmentation from the network
  - Mandatory in IPv6
    - Network layer does no fragmentation
  - Hosts dynamically discover smallest MTU of path
    - http://www.rfc-editor.org/rfc/rfc1191.txt
    - Algorithm:
      - Initialize MTU to MTU for first hop
      - Send datagrams with Don’t Fragment bit set
      - If ICMP “pkt too big” msg, decrease MTU
    - What happens if path changes?
      - Periodically (>5 mins, or >1 min after previous increase), increase MTU
    - Some routers will return proper MTU

36 Network Layer4-36 Fragmentation
- References
  - Colleen Shannon, David Moore, and k claffy (CAIDA, UC San Diego), “Characteristics of Fragmented IP Traffic on Internet Links,” ACM SIGCOMM Internet Measurement Workshop 2001. http://www.aciri.org/vern/sigcomm-imeas-2001.program.html
  - C. A. Kent and J. C. Mogul, “Fragmentation considered harmful,” in Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology, pp. 390-401, Aug. 1988. http://www.research.compaq.com/wrl/techreports/abstracts/87.3.html

37 Network Layer4-37 IP Addressing
- IP address:
  - 32-bit identifier for host/router interface
- interface: connection between host, router and physical link
  - routers typically have multiple interfaces
  - host may have multiple interfaces
  - IP addresses associated with interface, not host, router
- 223.1.1.1 = 11011111 00000001 00000001 00000001 (octets 223, 1, 1, 1)
[Figure: a router joining three LANs; interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]

38 Network Layer4-38 IP Addressing
- IP address:
  - network part (high order bits)
  - host part (low order bits)
- What’s a network?
  - all interfaces that can physically reach each other without intervening router
  - each interface shares the same network part of IP address
[Figure: network consisting of 3 IP networks, one per LAN (for IP addresses starting with 223, first 24 bits are network address); interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]

39 Network Layer4-39 Subnets
How to find the networks (subnets)?
- Detach each interface from router, host
- Create “islands” of isolated networks
- Each isolated network is called a subnet
- Notation:
  - Interfaces on a subnet share identical bits as prefix
  - Bits identified by mask
    - 255.255.255.0: machine addresses all begin with the same 24 bits
    - Also denoted by /24
[Figure: three subnets 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24. Subnet mask: /24]
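
The prefix and mask notation can be explored with Python's standard `ipaddress` module. This is a small sketch using the addresses from the figure; the variable names are illustrative.

```python
import ipaddress

# An interface address together with its /24 mask
iface = ipaddress.ip_interface("223.1.1.1/24")
subnet = iface.network                      # the subnet this interface belongs to

bits = format(int(iface.ip), "032b")        # the 32-bit identifier as a bit string
same_subnet = ipaddress.ip_address("223.1.1.4") in subnet
other_subnet = ipaddress.ip_address("223.1.2.9") in subnet

print(subnet, subnet.netmask)               # prefix form and dotted mask form
print(bits)
print(same_subnet, other_subnet)
```

Two interfaces are on the same subnet exactly when their addresses agree in the bits covered by the mask, here the first 24.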

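40 Network Layer4-40 Subnets
How many?
[Figure: routers interconnecting several LANs and router-to-router links. Interface addresses shown: 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.6; 223.1.3.1, 223.1.3.2, 223.1.3.27; 223.1.7.0, 223.1.7.1; 223.1.8.0, 223.1.8.1; 223.1.9.1, 223.1.9.2]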

41 Network Layer4-41 How do networks get IP addresses?
- Total IP address size: 4 billion
- Initially one large class (8-bit network, 24-bit host)
  - ISP given an 8-bit network number to manage
  - Each router keeps track of each network (2^8 = 256 routes)
  - Each network has 16 million hosts
  - Problem: one size does not fit all
- Classful addressing
  - Accommodate smaller networks (LANs)
  - Class A: 128 networks, 16M hosts
  - Class B: 16K networks, 64K hosts
  - Class C: 2M networks, 256 hosts
  - Total routes potentially > 2,113,664 routes!
Class | High Order Bits | Format
A     | 0               | 7 bits of net, 24 bits of host (/8)
B     | 10              | 14 bits of net, 16 bits of host (/16)
C     | 110             | 21 bits of net, 8 bits of host (/24)

42 Network Layer4-42 IP address classes
Class | Leading bits | Layout (32 bits)                  | Address range
A     | 0            | 8-bit network ID, 24-bit host ID  | 1.0.0.0 to 127.255.255.255
B     | 10           | 16-bit network ID, 16-bit host ID | 128.0.0.0 to 191.255.255.255
C     | 110          | 24-bit network ID, 8-bit host ID  | 192.0.0.0 to 223.255.255.255
D     | 1110         | Multicast addresses               | 224.0.0.0 to 239.255.255.255
E     | 1111         | Reserved for experiments          |
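
Because the class is encoded in the leading bits, it can be read off the first octet alone. A minimal sketch follows; the function name `address_class` is an assumption made for illustration.

```python
def address_class(addr: str) -> str:
    """Return the classful category (A to E) of a dotted-decimal IPv4 address."""
    first_octet = int(addr.split(".")[0])
    if first_octet < 128:      # leading bit 0
        return "A"
    if first_octet < 192:      # leading bits 10
        return "B"
    if first_octet < 224:      # leading bits 110
        return "C"
    if first_octet < 240:      # leading bits 1110 (multicast)
        return "D"
    return "E"                 # leading bits 1111 (reserved)

for a in ("10.0.0.1", "131.252.120.50", "223.1.1.1", "224.0.0.1"):
    print(a, address_class(a))
```

This is the whole of the classful lookup rule: once the class is known, the network part is the first 8, 16, or 24 bits.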

43 Network Layer4-43 Special IP Addresses
- Private addresses
  - http://www.rfc-editor.org/rfc/rfc1918.txt
  - Class A: 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 prefix)
  - Class B: 172.16.0.0 - 172.31.255.255 (172.16.0.0/12 prefix)
  - Class C: 192.168.0.0 - 192.168.255.255 (192.168.0.0/16 prefix)
- 127.0.0.1: local host (a.k.a. the loopback address)
- 255.255.255.255
  - IP broadcast to local hardware that must not be forwarded
  - http://www.rfc-editor.org/rfc/rfc919.txt
- 0.0.0.0
  - IP address of unassigned host (BOOTP, ARP, DHCP)
  - Default route advertisement

44 Network Layer4-44 IP Addressing Problem #1 (1984)
- Inefficient use of address space
  - Class A (rarely given out, sparse usage)
  - Class B = 64K hosts
    - Very few LANs have close to 64K hosts
    - Electrical/LAN limitations, performance or administrative reasons
    - e.g., class B net allocated enough addresses for 64K hosts, even if only 2K hosts in that network
  - Need simple/address-efficient way to get multiple “networks”
    - Reduce the number of addresses that are assigned, but not used
- Subnet addressing
  - http://www.rfc-editor.org/rfc/rfc917.txt
  - Split large address ranges into multiple smaller ones (subnets)
  - Dramatically increases potential number of routes!

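45 Network Layer4-45 Subnetting
- Variable length subnet masks
  - Subnet a class B address space into several chunks
[Figure: a class B address (Network | Host) is re-divided into Network | Subnet | Host; the mask is 1111..1111 over the network and subnet bits and 0000..0000 over the host bits]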

46 Network Layer4-46 Subnetting Example
- Assume an organization was assigned a class B address 150.100
- Assume it has < 100 hosts per subnet
  - How many host bits do we need?
    - Seven
  - What is the network mask?
    - 11111111 11111111 11111111 10000000
    - 255.255.255.128 or /25
  - How many subnets of this size can be created within this address space?
  - List them

47 Network Layer4-47 Subnetting Example
- Assume an organization was assigned a class B address 150.100
- Assume it has < 100 hosts per subnet
  - How many host bits do we need?
    - Seven
  - What is the network mask?
    - 11111111 11111111 11111111 10000000
    - 255.255.255.128 or /25
  - How many subnets of this size can be created within this address space?
    - 512 (/16 = 2^16 hosts, /25 = 2^7 hosts, so 2^16 / 2^7 = 2^9 = 512)
  - List them
    - 150.100.0.0/25 (…00000000.0*******)
    - 150.100.0.128/25 (…00000000.1*******)
    - 150.100.1.0/25 (…00000001.0*******)
    - 150.100.1.128/25 (…00000001.1*******)
    - …
    - 150.100.255.0/25 (…11111111.0*******)
    - 150.100.255.128/25 (…11111111.1*******)
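
The count and the list can be checked with the standard `ipaddress` module. This is a short sketch; the variable names are illustrative.

```python
import ipaddress

block = ipaddress.ip_network("150.100.0.0/16")

# Carve the class B block into /25 subnets (7 host bits each).
subnets = list(block.subnets(new_prefix=25))

print(len(subnets))                  # how many /25 subnets fit in the /16
print(subnets[0], subnets[1])        # first two subnets
print(subnets[-2], subnets[-1])      # last two subnets
print(subnets[0].netmask, subnets[0].num_addresses)
```

Each /25 holds 128 addresses, which covers the requirement of fewer than 100 hosts per subnet. The same call with `prefixlen_diff=4` instead of `new_prefix` splits a block into 16 equal parts.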

48 Network Layer4-48 Subnetting Example
- Split the following network into 16 equal subnetworks
  - 131.252.128.0/17

49 Network Layer4-49 Subnetting Example
- Split the following network into 16 equal subnetworks
  - 131.252.128.0/17
    - 10000011.11111100.10000000.00000000
  - Split into 16 parts using next 4 significant bits
    - 10000011.11111100.10000000.00000000
    - 10000011.11111100.10001000.00000000
    - 10000011.11111100.10010000.00000000
    - 10000011.11111100.10011000.00000000
    - etc.
  - Solution
    - 131.252.128.0/21
    - 131.252.136.0/21
    - 131.252.144.0/21
    - etc.

50 Network Layer4-50 IP Address Problem #2 (1991)
- Address space depletion
  - In danger of running out of classes A and B
  - Class A
    - very few in number, IANA frugal in giving them out
  - Class B
    - subnetting only applied to new allocations of class B
    - existing class B networks sparsely populated
    - people refuse to give it back
  - Class C
    - plenty available, but too small for most domains
- Supernetting
  - Assign multiple consecutive class C blocks as one block
  - Allows class C usage while limiting number of routes used
  - http://www.rfc-editor.org/rfc/rfc1338.txt

51 Network Layer4-51 IP Address Problem #2 (1991)
- Example
  - Combine the following class C networks into one larger network
    - 131.252.0.0/24 = .00000000.*
    - 131.252.1.0/24 = .00000001.*
    - 131.252.2.0/24 = .00000010.*
    - 131.252.3.0/24 = .00000011.*
    - 131.252.4.0/24 = .00000100.*
    - 131.252.5.0/24 = .00000101.*
    - 131.252.6.0/24 = .00000110.*
    - 131.252.7.0/24 = .00000111.*
  - Answer: 131.252.0.0/21

52 Network Layer4-52 IP Address Problem #3 (1991)
- Explosion of routes
  - Subnetting class B
  - Increasing use of class C explodes # of routes
- Remove classes
  - Classless Inter-Domain Routing (CIDR)
  - Arbitrary aggregation of contiguous addresses
  - http://www.rfc-editor.org/rfc/rfc1518.txt
  - http://www.rfc-editor.org/rfc/rfc1519.txt

53 Network Layer4-53 IP addressing: CIDR
- Original classful addressing
  - Use class structure (A, B, C) to determine network ID for route lookup
- CIDR: Classless InterDomain Routing
  - Do not use classes to determine network ID
  - network portion of address of arbitrary length
  - route format: a.b.c.d/x, where x is # bits in network portion of address
  - Example: 200.23.16.0/23 = 11001000 00010111 0001000 | 0 00000000 (23-bit network part | 9-bit host part)

54 Network Layer4-54 CIDR
- Assign any range of addresses to network
  - Use common part of address as network number
  - e.g., addresses 192.4.16.* to 192.4.31.* have the first 20 bits in common. Thus, we use this as the network number
  - netmask is /20; /xx is valid for almost any xx
  - 192.4.16.0/20
- Enables more efficient usage of address space (and router tables)
  - More on how this impacts routing later…

55 Network Layer4-55 CIDR example
- Consider the following set of /24 networks
  - 194.252.10.0/24
  - 194.252.11.0/24
  - 194.252.12.0/24
  - 194.252.13.0/24
  - 194.252.14.0/24
  - 194.252.15.0/24
  - 194.252.16.0/24
  - 194.252.17.0/24
- Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?

56 Network Layer4-56 CIDR example
- Consider the following set of /24 networks
  - 194.252.10.0/24 = .00001010.*
  - 194.252.11.0/24 = .00001011.*
    - these two combine into 194.252.10.0/23
  - 194.252.12.0/24 = .00001100.*
  - 194.252.13.0/24 = .00001101.*
  - 194.252.14.0/24 = .00001110.*
  - 194.252.15.0/24 = .00001111.*
    - these four combine into 194.252.12.0/22
  - 194.252.16.0/24 = .00010000.*
  - 194.252.17.0/24 = .00010001.*
    - these two combine into 194.252.16.0/23
- Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?
  - Three: 194.252.10.0/23, 194.252.12.0/22, 194.252.16.0/23
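
The aggregation on this slide can be reproduced with `ipaddress.collapse_addresses` from the Python standard library, which merges adjacent networks wherever they form an aligned block. A brief sketch, with illustrative variable names:

```python
import ipaddress

# The eight /24 networks 194.252.10.0 through 194.252.17.0
nets = [ipaddress.ip_network(f"194.252.{i}.0/24") for i in range(10, 18)]

# Merge neighbours whenever two equal-size blocks share a parent prefix.
aggregated = list(ipaddress.collapse_addresses(nets))

for prefix in aggregated:
    print(prefix)
```

The range cannot collapse into one prefix because it does not start on a multiple of 8 in the third octet; the block starting at 10 can only pair with 11.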

57 Network Layer4-57 CIDR example
- Consider the following set of /24 networks
  - 194.252.0.0/24
  - 194.252.1.0/24
  - 194.252.2.0/24
  - 194.252.3.0/24
  - 194.252.4.0/24
  - 194.252.5.0/24
  - 194.252.6.0/24
  - 194.252.7.0/24
- Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?

58 Network Layer4-58 CIDR example
- Consider the following set of /24 networks
  - 194.252.0.0/24 = .00000000.*
  - 194.252.1.0/24 = .00000001.*
  - 194.252.2.0/24 = .00000010.*
  - 194.252.3.0/24 = .00000011.*
  - 194.252.4.0/24 = .00000100.*
  - 194.252.5.0/24 = .00000101.*
  - 194.252.6.0/24 = .00000110.*
  - 194.252.7.0/24 = .00000111.*
  - Networks 1 through 7 alone would need three prefixes: 194.252.1.0/24, 194.252.2.0/23, 194.252.4.0/22
- Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?
  - With 194.252.0.0/24 included, all eight networks share their first 21 bits, so a single prefix suffices: 194.252.0.0/21

59 Network Layer4-59 CIDR route aggregation
Hierarchical addressing allows efficient advertisement of routing information:
- Fly-By-Night-ISP serves Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23)
  - It advertises to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”
- ISPs-R-Us advertises to the Internet: “Send me anything with addresses beginning 199.31.0.0/16”

60 Network Layer4-60 CIDR route aggregation
- ISP X given 16 class C networks: 200.23.16.* to 200.23.31.* (or 200.23.16/20)
- Large company: 200.23.16.0/21
  - 200.23.16.0/24, 200.23.17.0/24, 200.23.18.0/24, 200.23.19.0/24
  - 200.23.20.0/24, 200.23.21.0/24, 200.23.22.0/24, 200.23.23.0/24
- Medium company: 200.23.24.0/22
  - 200.23.24.0/24, 200.23.25.0/24, 200.23.26.0/24, 200.23.27.0/24
- Small company: 200.23.28.0/23
  - 200.23.28.0/24, 200.23.29.0/24
- Tiny company: 200.23.30.0/24
Routing table at the adjacent ISP router:
Route        | Interface
200.23.16/20 | 1
Routing table at ISP X (interface 1 faces the adjacent ISP):
Route        | Interface
200.23.16/21 | 2
200.23.24/22 | 3
200.23.28/23 | 4
200.23.30/24 | 5

61 Network Layer4-61 CIDR Shortcomings
- Customer selecting a new provider
  - Renumbering required
[Figure: Provider 1 (201.10.0.0/21) with customer blocks 201.10.0.0/22, 201.10.4.0/24, 201.10.5.0/24, 201.10.6.0/23; Provider 2 (199.31.0.0/16)]

62 Network Layer4-62 CIDR shortcomings
- Multi-homing
  - ISPs-R-Us has a more specific route to Organization 1
[Figure: Fly-By-Night-ISP serves Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) and advertises “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us advertises “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”]

63 Network Layer4-63 Getting IP addresses
Q: How does a network get IP addresses?
A: An organization gets allocated a portion of its provider ISP’s address space
- ISPs get it from ICANN: Internet Corporation for Assigned Names and Numbers
  - Allocates addresses, manages DNS, resolves disputes
- Customers get sub-blocks from ISPs
ISP's block    | 11001000 00010111 00010000 00000000 | 200.23.16.0/20
Organization 0 | 11001000 00010111 00010000 00000000 | 200.23.16.0/23
Organization 1 | 11001000 00010111 00010010 00000000 | 200.23.18.0/23
Organization 2 | 11001000 00010111 00010100 00000000 | 200.23.20.0/23
...            | ...                                 | ...
Organization 7 | 11001000 00010111 00011110 00000000 | 200.23.30.0/23

64 Network Layer4-64 CIDR and IP route lookup (forwarding)
- IP routing
  - Done only based on destination IP address
  - Lookup route in forwarding table
- Classful IP Route Lookup
  - In the early days, address classes made it easy
    - A: 0 | 7 bit network | 24 bit host (16M each)
    - B: 10 | 14 bit network | 16 bit host (64K)
    - C: 110 | 21 bit network | 8 bit host (256)
  - Address would specify prefix for forwarding table
  - Simple lookup

65 Network Layer4-65 Classful IP forwarding
- www.pdx.edu address 131.252.120.50
  - Class B address: route prefix is 131.252
  - Lookup 131.252 in class B forwarding table
  - Prefix: part of address that really matters for routing
- Forwarding table contains
  - List of prefix entries
  - A few fixed prefix lengths (8/16/24)
- Large tables
  - 2 million class C networks
  - Sites with multiple class C networks have multiple route entries at every router

66 Network Layer4-66 CIDR and IP forwarding
- CIDR advantages
  - Saves space in route tables
  - Makes more efficient use of address space
    - ISP allocated 8 class C chunks, 201.10.0.0 to 201.10.7.255
      - 201.10.0.0/24, 201.10.1.0/24, 201.10.2.0/24, 201.10.3.0/24
      - 201.10.4.0/24, 201.10.5.0/24, 201.10.6.0/24, 201.10.7.0/24
    - Replace 8 class C entries with 1 combined entry
      - First 21 bits are network number
      - Written as 201.10.0.0/21
  - Routing protocols carry prefix length with destination network address

67 Network Layer4-67 CIDR and IP forwarding
- CIDR disadvantage
  - Makes route lookup more complex
    - CIDR fundamentally changes route lookup algorithm
    - Before CIDR
      - Separate class A/B/C route tables, each with O(1) lookup
      - Table lookup based on class (A, B, C)
    - After CIDR
      - One table containing many prefix lengths
      - Must find the most specific route that matches the destination IP address in packet
      - Must match against all routes simultaneously via longest prefix match

68 Network Layer4-68 Longest prefix matching
Prefix Match               | Link Interface
11001000 00010111 00010    | 0
11001000 00010111 00011000 | 1
11001000 00010111 00011    | 2
otherwise                  | 3
Examples
- DA: 11001000 00010111 00010110 10100001  Which interface?
- DA: 11001000 00010111 00011000 10101010  Which interface?
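
The two example lookups can be worked through with a simple linear scan that keeps the longest matching prefix. This is a sketch for illustration only (real routers use the tree structures discussed later); the names `TABLE`, `DEFAULT_INTERFACE`, and `lookup` are assumptions.

```python
# Forwarding table from the slide: (prefix bits, link interface)
TABLE = [
    ("11001000 00010111 00010".replace(" ", ""), 0),
    ("11001000 00010111 00011000".replace(" ", ""), 1),
    ("11001000 00010111 00011".replace(" ", ""), 2),
]
DEFAULT_INTERFACE = 3   # the "otherwise" row

def lookup(dest: str) -> int:
    """Return the interface of the longest prefix that matches dest."""
    bits = dest.replace(" ", "")
    best_len, best_iface = -1, DEFAULT_INTERFACE
    for prefix, iface in TABLE:
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_iface = len(prefix), iface
    return best_iface

first = lookup("11001000 00010111 00010110 10100001")
second = lookup("11001000 00010111 00011000 10101010")
print(first, second)
```

The second address matches both the 21-bit prefix for interface 2 and the 24-bit prefix for interface 1; the longer match wins.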

69 Network Layer4-69 CIDR example
Routing to the network
- Packet to 10.1.1.3 arrives from the provider
- Path is R2 – R1 – H1 – H2
[Figure: network with routers R1, R2 and hosts H1 to H4. Networks: 10.1.1/24, 10.1.2/24, 10.1.3/24, 10.1.8/24, 10.1.16/24, 10.1.1.2/31. Interface addresses: 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4, 10.1.2.1, 10.1.2.2, 10.1.3.1, 10.1.3.2, 10.1.8.1, 10.1.8.4, 10.1.16.1]

70 Network Layer4-70 CIDR example
Subnet routing: routing table at R2
Destination    | Next Hop | Interface
127.0.0.1      |          | lo0
Default or 0/0 | provider | 10.1.16.1
10.1.8.0/24    |          | 10.1.8.1
10.1.2.0/24    |          | 10.1.2.1
10.1.0.0/22    | 10.1.2.2 | 10.1.2.1
10.1.16.0/24   |          | 10.1.16.1
- Packet to 10.1.1.3
- Matches 10.1.0.0/22
[Figure: same network topology as on slide 69]

71 Network Layer4-71 CIDR example
Subnet routing: routing table at R1
Destination    | Next Hop | Interface
127.0.0.1      |          | lo0
Default or 0/0 | 10.1.2.1 | 10.1.2.2
10.1.3.0/24    |          | 10.1.3.1
10.1.1.0/24    |          | 10.1.1.1
10.1.2.0/24    |          | 10.1.2.2
10.1.1.2/31    | 10.1.1.4 | 10.1.1.1
- Packet to 10.1.1.3
- 10.1.1.3 matches both 10.1.1.0/24 and 10.1.1.2/31; use longest prefix match
- Matches 10.1.1.2/31
[Figure: same network topology as on slide 69]

72 Network Layer4-72 CIDR example
Subnet routing: routing table at H1
Destination    | Next Hop | Interface
127.0.0.1      |          | lo0
Default or 0/0 | 10.1.1.1 | 10.1.1.4
10.1.1.0/24    |          | 10.1.1.4
10.1.1.2/31    |          | 10.1.1.2
- Packet to 10.1.1.3
- 10.1.1.3 matches both 10.1.1.0/24 and 10.1.1.2/31; use longest prefix match
- Direct route via 10.1.1.2/31
[Figure: same network topology as on slide 69]

73 Network Layer4-73 Longest-prefix matching
- Algorithms and data structures for CIDR-based IP forwarding
  - Ruiz-Sanchez, Biersack, Dabbous, “Survey and Taxonomy of IP Address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001
    - Binary tree
    - Multi-bit tree
    - LC tree
    - Lulea tree
    - Full expansion/compression
    - Binary search on prefix lengths
    - Binary range search
    - Multiway range search
    - Multiway range trees
    - Binary search on hash tables (Waldvogel – SIGCOMM 97)

74 Network Layer4-74 Binary tree
- Data structure to support longest-prefix match for forwarding
- Bit-wise traversal from left to right
  - Continue as far as possible while keeping track of deepest match
Route Prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111*
Example: 000000
Example: 101000
[Figure: binary tree with branches labelled 0 and 1; routes A to I are marked at the nodes where their prefixes end]
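
A binary tree of this kind can be sketched as follows. The class and function names (`TrieNode`, `build`, `lookup`) are illustrative assumptions; the prefixes are the ones listed on the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # "0" or "1" -> TrieNode
        self.route = None   # route label if a prefix ends at this node


def build(prefixes):
    """Insert every prefix bit by bit, marking the node where it ends."""
    root = TrieNode()
    for route, bits in prefixes.items():
        node = root
        for b in bits:
            node = node.children.setdefault(b, TrieNode())
        node.route = route
    return root


def lookup(root, addr):
    """Walk left to right as far as possible, remembering the deepest match."""
    node, best = root, root.route
    for b in addr:
        node = node.children.get(b)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best


PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}
root = build(PREFIXES)
print(lookup(root, "000000"))  # first example address on the slide
print(lookup(root, "101000"))  # second example address on the slide
```

For 101000 the walk passes through the node for 1* (route D), continues to 10, and then stops because no prefix starts with 101, so the deepest match recorded so far is returned.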

75 Network Layer4-75 Path-compressed binary tree r Eliminate single branch point nodes m Saves unnecessary memory lookups m Branches labelled by bit to examine m Continue as far as possible while keeping track of deepest match r Variants include PATRICIA and BSD trees Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A 0 10 0 00 1 1 11 1 BCDEFGHI 0 Bit=3Bit=2 Bit=3 Bit=4 Bit=1 Example: 010100 x

76 Network Layer4-76 Example #2 r Create a binary tree that implements the following forwarding table Route Prefixes A 0* B 00010* C 00011* D *

77 Network Layer4-77 Example #2: Binary tree Route Prefixes A 0* B 00010* C 00011* D * A 0 0 0 1 B 0 CD

78 Network Layer4-78 Example #2 r Create a path-compressed binary tree that implements the following forwarding table Route Prefixes A 0* B 00010* C 00011* D *

79 Network Layer4-79 Example #2: Path-compressed binary tree Route Prefixes A 0* B 00010* C 00011* D * A 0 B 0 C Bit=1 Bit=5 1 D

80 Network Layer4-80 Multi-bit trees
- Problem with all single-bit trees
  - Still incur too many memory accesses per lookup
  - Lookup done a single bit at a time
  - CPUs access 32 bits at a time
- Multi-bit trees
  - Compare multiple bits at a time
  - Stride = number of bits being examined
  - Reduces memory accesses
  - Increases memory required
    - Forces table expansion for prefixes falling in between strides
  - Two types
    - Variable stride multi-bit trees
    - Fixed stride multi-bit trees
- Most route entries are Class C
  - Optimize “stride” based on this

81 Network Layer4-81 Variable stride multi-bit tree r Single level has variable stride lengths Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A 01 01 00011011 ADDBCCE 00011011 GFIH 00011011 Route for C expanded/duplicated Stride either 1 or 2 bits

82 Network Layer4-82 Fixed stride multi-bit tree r Single level has equal strides Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A 000001010011100101110111 AA 000110110001101100011011 CEDDDBFFGHGHII
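
The table expansion that a fixed stride forces can be sketched for the first level only. This is an illustration under assumptions: a stride of 3 bits is taken from the eight first-level entries on the slide, `expand_level` is a made-up helper, and the second-level tables (for B and F to I) are not built.

```python
from itertools import product

PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}


def expand_level(prefixes, stride):
    """Map each stride-bit pattern to the longest prefix (length <= stride) covering it."""
    table = {}
    for combo in product("01", repeat=stride):
        bits = "".join(combo)
        best = None
        for route, prefix in prefixes.items():
            if len(prefix) <= stride and bits.startswith(prefix):
                if best is None or len(prefix) > len(prefixes[best]):
                    best = route
        table[bits] = best
    return table


# Short prefixes such as A (0*) and D (1*) are duplicated into several slots.
for bits, route in expand_level(PREFIXES, 3).items():
    print(bits, route)
```

Prefixes longer than the stride do not appear here; in a full structure the slots 010, 110, and 111 would also hold pointers to second-level tables.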

83 Network Layer4-83 Issues r Scaling m IPv6? r Stride choice m Tuning stride to route table

84 Network Layer4-84 IP Address Problem #4 (1994)
- Even with CIDR, address space running out
  - IPv6 still being developed, a long way from being deployed
- Network Address Translation (NAT)
  - Alternate solution to address space depletion problem
    - Kludge (but useful)
  - Sits between your network and the Internet
  - Dynamically assigns source address from a pool of available addresses
    - “Statistically multiplex” address usage
    - Each machine gets unique, external IP address out of pool
    - Replaces local, private, network-layer source IP addresses with global IP addresses
  - Has a pool of global IP addresses (fewer than the number of hosts on your network)

85 Network Layer4-85 NAT Illustration
Operation: Source (S) wants to talk to Destination (D):
- Create Sg-Sp mapping
- Replace Sp with Sg for outgoing packets
- Replace Sg with Sp for incoming packets
[Figure: private network (P) and global Internet (G) joined by a NAT holding a pool of global IP addresses; a packet with destination Dg, source Sp, and data leaves the NAT with destination Dg, source Sg, and data]

86 Network Layer4-86 IP addressing and NAT
- What if we only have one IP address?
  - Add port translation to NAT
    - Sometimes referred to as NAPT (Network Address Port Translator)
  - Both addresses and ports are translated
    - Translates Paddr + flow info to Gaddr + new flow info
    - Uses TCP/UDP port numbers
  - Potentially thousands of simultaneous connections with one global IP address
    - 16-bit port-number field: up to 65,536 port values, so roughly 60,000 simultaneous connections with a single WAN-side address!

87 Network Layer4-87 NAT with port translation 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 138.76.29.7 local network (e.g., home network) 10.0.0/24 rest of Internet Datagrams with source or destination in this network have 10.0.0/24 address for source, destination (as usual) All datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers

88 Network Layer4-88 NAT r Advantages m range of addresses not needed from ISP: just a small set of IP addresses for all devices m can change addresses of devices in local network without notifying outside world m can change ISP without changing addresses of devices in local network m devices inside local net not explicitly addressable, visible by outside world (a security plus).

89 Network Layer4-89 NAT
Implementation: NAT router must:
- outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
  - remote clients/servers will respond using (NAT IP address, new port #) as destination addr.
- remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
- incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table

90 Network Layer4-90 NAT example
NAT translation table
WAN side addr         LAN side addr
138.76.29.7, 5001     10.0.0.1, 3345
……
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80 (S: 10.0.0.1, 3345 D: 128.119.40.186, 80)
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table (S: 138.76.29.7, 5001 D: 128.119.40.186, 80)
3: Reply arrives dest. address: 138.76.29.7, 5001 (S: 128.119.40.186, 80 D: 138.76.29.7, 5001)
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345 (S: 128.119.40.186, 80 D: 10.0.0.1, 3345)
[Figure: local network with addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4 and NAT router with WAN-side address 138.76.29.7]
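
The four steps above can be traced with a small translation-table sketch. It is a simplification under stated assumptions: the class name `Napt` and its methods are invented for the example, new ports are handed out sequentially starting at 5001 to match the slide, and mapping timeouts and per-protocol tables are ignored.

```python
class Napt:
    """Toy NAT translation table keyed on (IP address, port) pairs."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.out = {}   # (LAN ip, LAN port) -> WAN port
        self.back = {}  # WAN port -> (LAN ip, LAN port)

    def outgoing(self, src, dst):
        """Rewrite the source of an outgoing datagram, creating a mapping if needed."""
        if src not in self.out:
            self.out[src] = self.next_port
            self.back[self.next_port] = src
            self.next_port += 1
        return (self.wan_ip, self.out[src]), dst

    def incoming(self, src, dst):
        """Rewrite the destination of an incoming datagram; None if no mapping exists."""
        ip, port = dst
        if ip != self.wan_ip or port not in self.back:
            return None
        return src, self.back[port]


nat = Napt("138.76.29.7")
# Steps 1 and 2: host 10.0.0.1 sends to the web server, NAT rewrites the source.
print(nat.outgoing(("10.0.0.1", 3345), ("128.119.40.186", 80)))
# Steps 3 and 4: the reply arrives and NAT rewrites the destination.
print(nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001)))
```

A datagram arriving for a port with no table entry returns `None`, which is the "no inbound connections" behaviour discussed on the following slides.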

91 Network Layer4-91 NAT is controversial r Routers should only process up to layer 3 m violates network transparency key feature that allows one to deploy any application without coordinating with network infrastructure m implicit assumption that network header is unchanged in network m address shortage should instead be solved by IPv6 r Other problems m No inbound connections Must be taken into account by app designers, eg, P2P applications m Some protocols carry addresses e.g., FTP carries addresses in text What is the problem? m Encryption

92 Network Layer4-92 NAT problem #1: traversal
- Incoming connections
  - client wants to connect to server with address 10.0.0.1
  - server address 10.0.0.1 local to LAN (client can’t use it as destination addr)
  - only one externally visible NATted address: 138.76.29.7
- solution 1: statically configure NAT to forward incoming connection requests at given port to server
  - e.g., (138.76.29.7, port 2500) always forwarded to 10.0.0.1 port 25000
  - Or use DMZ host
[Figure: external client trying to reach server 10.0.0.1 behind NAT router 138.76.29.7 (LAN side 10.0.0.4)]

93 Network Layer4-93 NAT problem #1: traversal
- solution 2: Universal Plug and Play (UPnP) Internet Gateway Device (IGD) Protocol. Allows NATted host to:
  - learn public IP address (138.76.29.7)
  - enumerate existing port mappings
  - add/remove port mappings (with lease times)
  i.e., automate static NAT port map configuration
[Figure: host 10.0.0.1 talking IGD to NAT router 138.76.29.7 (LAN side 10.0.0.4)]

94 Network Layer4-94 NAT problem #1: traversal
- solution 3: relaying (used in Skype)
  - NATted server establishes connection to relay
  - External client connects to relay
  - relay bridges packets between the two connections
1. connection to relay initiated by NATted host
2. connection to relay initiated by client
3. relaying established
[Figure: host 10.0.0.1 behind NAT router 138.76.29.7, external client, and relay]

95 Network Layer4-95 NAT problem #2: loss of transparency r Breaks applications that assume network does not modify packets r Prevents new applications that make the same assumption r Example m ftp, NAT, and PORT command

96 Network Layer4-96 ftp, NAT and PORT command r Normal FTP mode m Server has port 20, 21 reserved m Client initiates control connection to port 21 on server m Client allocates port X for data connection m Client passes its IP address and the data connection port (X) in a PORT command to server m Server parses PORT command and initiates connection from its own port 20 to the client on port X r What if client is behind a NAT device?

97 Network Layer4-97 ftp, NAT and PORT command r Problem m ftp server connects to a private IP address! 192.168.0.1 192.168.0.2 Packet #1 SrcIP=192.168.0.1 SrcPort=1312 DstIP=131.252.220.66 DstPort=21 ------------------- PORT command “Connect to me at IP=192.168.0.1 Port=20” NAPT translator ExternalIP=129.95.50.3 Packet #1 after NAPT SrcIP=129.95.50.3 SrcPort=2000 DstIP=131.252.220.66 DstPort=21 -------------------- PORT command “Connect to me at IP=192.168.0.1 Port=20”

98 Network Layer4-98 ftp, NAT and PORT command
- Solution #1
  - Modify packets at NAT
    - NAT must capture outgoing connections destined for port 21
    - Looks for PORT command and translates address/port payload
      - http://www.practicallynetworked.com/support/linksys_ftp_port.htm
    - What if NAT doesn’t parse PORT command correctly?
    - What if ftp server is running on a different port than 21?
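
A rough sketch of the payload rewrite that Solution #1 requires. It assumes the standard FTP wire format `PORT h1,h2,h3,h4,p1,p2` (port = p1*256 + p2); the function name `rewrite_port` and the example values are illustrative, and a real NAT would also have to adjust TCP sequence numbers when the rewritten line changes length, which is not shown.

```python
import re

# One PORT command line as sent on the FTP control connection.
PORT_RE = re.compile(r"^PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r\n$")


def rewrite_port(line, external_ip, external_port):
    """Replace the private address/port in a PORT command with the NAT's own."""
    if PORT_RE.match(line) is None:
        return line  # not a PORT command: pass through unchanged
    h = external_ip.split(".")
    p1, p2 = divmod(external_port, 256)
    return "PORT {},{},{},{},{},{}\r\n".format(h[0], h[1], h[2], h[3], p1, p2)


# Client 192.168.0.1 announces port 20; NAT substitutes 129.95.50.3 port 2001.
print(repr(rewrite_port("PORT 192,168,0,1,0,20\r\n", "129.95.50.3", 2001)))
```

The need to parse and modify application data in the network is exactly the loss of transparency the next slides criticize.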

99 Network Layer4-99 ftp, NAT and PORT command r Need to rewrite points to bigger problem! m Loss of network transparency m Network must modify application data in order for application to run correctly! 192.168.0.1 192.168.0.2 Packet #1 SrcIP=192.168.0.1 SrcPort=1312 DstIP=131.252.220.66 DstPort=21 ------------------- PORT command “Connect to me at IP=192.168.0.1 Port=20” NAPT translator ExternalIP=129.95.50.3 Packet #1 after NAPT SrcIP=129.95.50.3 SrcPort=2000 DstIP=131.252.220.66 DstPort=21 -------------------- PORT command “Connect to me at IP=129.95.50.3 Port=2001”

100 Network Layer4-100 ftp, NAT, and PORT command
- Solution #2
  - Passive (PASV) mode
    - Client initiates control connection to port 21 on server
    - Client enables “Passive” mode by sending a PASV command
    - Server replies with the IP address and port the client should use for the subsequent data connection (a port chosen by the server, not necessarily port 20)
    - Client initiates data connection by connecting to specified port on server
  - Most web browsers do PASV-mode ftp

101 Network Layer4-101 ftp, NAT, and PORT command r PASV mode transfers 192.168.0.1 192.168.0.2 NAPT translator ExternalIP=129.95.50.3 After PASV command SrcIP=131.252.220.66 SrcPort=21 DstIP=129.95.50.3 DstPort=2000 -------------------- PORT command “Connect to me at IP=131.252.220.66 Port=20”

102 Network Layer4-102 ftp, NAT, and PORT command r Solution #2 m What if server is behind a NAT device? See client issues m What if both client and server are behind NAT devices? Problem Similar to P2P xfers and Skype –See IETF STUN WG

104 Network Layer4-104 ICMP: Internet Control Message Protocol
- Essentially a network-layer protocol for passing control messages
- used by hosts & routers to communicate network-level information
  - error reporting: unreachable host, network, port, protocol
  - echo request/reply (used by ping)
- network-layer “above” IP:
  - ICMP msgs carried in IP datagrams
- ICMP message: type, code plus first 8 bytes of IP datagram causing error
- http://www.rfc-editor.org/rfc/rfc792.txt
Type  Code  Description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header

105 Network Layer4-105 ICMP and traceroute
- What do “real” Internet delay & loss look like?
- Traceroute program: provides delay measurement from source to router along end-end Internet path towards destination. For all i:
  - sends three packets that will reach router i on path towards destination
  - router i will return packets to sender
  - sender times interval between transmission and reply.
[Figure: 3 probes sent to each router on the path]

106 Network Layer4-106 ICMP and traceroute
- Source sends series of UDP segments to dest
  - First has TTL=1
  - Second has TTL=2, etc.
  - Unlikely port number
- When nth datagram arrives to nth router:
  - Router discards datagram
  - And sends to source an ICMP message (type 11, code 0)
  - Message includes name of router & IP address
- When ICMP message arrives, source calculates RTT
- Traceroute does this 3 times
Stopping criterion
- UDP segment eventually arrives at destination host
- Destination returns ICMP “port unreachable” packet (type 3, code 3)
- When source gets this ICMP, stops.
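
The probing loop and its stopping criterion can be simulated without touching the network. This is only a model: `traceroute` here is a made-up function that walks a list of hop names, and real implementations send UDP segments and read ICMP replies from a raw socket, which is not shown.

```python
TTL_EXPIRED = (11, 0)       # ICMP type 11, code 0
PORT_UNREACHABLE = (3, 3)   # ICMP type 3, code 3


def traceroute(path, max_ttl=30):
    """path: router names in order, ending with the destination host.

    Returns a list of (ttl, replying node, (icmp type, icmp code)).
    """
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for i, node in enumerate(path):
            if i == len(path) - 1:
                # Probe reached the destination: nothing listens on the unlikely port.
                reply = (node, PORT_UNREACHABLE)
                break
            remaining -= 1  # each router decrements the TTL
            if remaining == 0:
                reply = (node, TTL_EXPIRED)  # router discards the datagram
                break
        hops.append((ttl, reply[0], reply[1]))
        if reply[1] == PORT_UNREACHABLE:
            break  # stopping criterion
    return hops


for hop in traceroute(["cs-gw", "border1", "fantasia.eurecom.fr"]):
    print(hop)
```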

107 Network Layer4-107 Examples
traceroute: gaia.cs.umass.edu to www.eurecom.fr
 1 cs-gw (128.119.240.254) 1 ms 1 ms 2 ms
 2 border1-rt-fa5-1-0.gw.umass.edu (128.119.3.145) 1 ms 1 ms 2 ms
 3 cht-vbns.gw.umass.edu (128.119.3.130) 6 ms 5 ms 5 ms
 4 jn1-at1-0-0-19.wor.vbns.net (204.147.132.129) 16 ms 11 ms 13 ms
 5 jn1-so7-0-0-0.wae.vbns.net (204.147.136.136) 21 ms 18 ms 18 ms
 6 abilene-vbns.abilene.ucaid.edu (198.32.11.9) 22 ms 18 ms 22 ms
 7 nycm-wash.abilene.ucaid.edu (198.32.8.46) 22 ms 22 ms 22 ms
 8 62.40.103.253 (62.40.103.253) 104 ms 109 ms 106 ms
 9 de2-1.de1.de.geant.net (62.40.96.129) 109 ms 102 ms 104 ms
10 de.fr1.fr.geant.net (62.40.96.50) 113 ms 121 ms 114 ms
11 renater-gw.fr1.fr.geant.net (62.40.103.54) 112 ms 114 ms 112 ms
12 nio-n2.cssi.renater.fr (193.51.206.13) 111 ms 114 ms 116 ms
13 nice.cssi.renater.fr (195.220.98.102) 123 ms 125 ms 124 ms
14 r3t2-nice.cssi.renater.fr (195.220.98.110) 126 ms 126 ms 124 ms
15 eurecom-valbonne.r3t2.ft.net (193.48.50.54) 135 ms 128 ms 133 ms
16 194.214.211.25 (194.214.211.25) 126 ms 128 ms 126 ms
17 * * *
18 * * *
19 fantasia.eurecom.fr (193.55.113.142) 132 ms 128 ms 136 ms
Annotations:
- Hop 1: three delay measurements from gaia.cs.umass.edu to cs-gw.cs.umass.edu
- Hops 7 to 8: trans-oceanic link
- * means no response (probe lost, router not replying)

108 Network Layer4-108 Try it
- Some routers labeled with airport code of city they are located in
  - traceroute www.yahoo.com
    - Packets go to SEA, back to PDX, SJC
  - traceroute www.oregonlive.com
    - Packets go to SMF, SFO, SJC, NYC, EWR.
  - traceroute www.uoregon.edu
    - Packets go to Pittock block to Eugene
  - traceroute www.lclark.edu
    - Packets go to SEA and back to PDX

110 Network Layer4-110 IPv6 r Redefine functions of IP (version 4) m What changes should be made in…. IP addressing IP delivery semantics IP quality of service IP security IP routing IP fragmentation IP error detection

111 Network Layer4-111 IPv6 r Initial motivation: 32-bit address space soon to be completely allocated (est. 2008) r Additional motivation: m Remove ancillary functionality Speed processing/forwarding m Add missing, but essential functionality header changes to facilitate QoS new “anycast” address: route to “best” of several replicated servers IPv6 datagram format: m fixed-length 40 byte header m no fragmentation allowed

112 Network Layer4-112 IPv6 Header (Cont) Priority: identify priority among datagrams in flow Flow Label: identify datagrams in same “flow.” (concept of“flow” not well defined). Next header: identify next protocol for data

113 Network Layer4-113 IPv6 Changes r Scale – addresses are 128bit m Header size? r Simplification m Removes infrequently used parts of header m 40 byte fixed header vs. 20+ byte variable header r IPv6 removes checksum m IPv4 checksum = provide extra protection on top of data- link layer and below transport layer m End-to-end principle Is this necessary? IPv6 answer =>No m Relies on upper layer protocols to provide integrity m Reduces processing time at each hop

114 Network Layer4-114 IPv6 Changes r IPv6 eliminates fragmentation m Requires path MTU discovery r ICMPv6: new version of ICMP m additional message types, e.g. “Packet Too Big” r Protocol field replaced by next header field m Unify support for protocol demultiplexing as well as option processing r Option processing m Options allowed, but only outside of header, indicated by “Next Header” field m Options header does not need to be processed by every router Large performance improvement Makes options practical/useful

115 Network Layer4-115 IPv6 Changes r TOS replaced with traffic class octet m Support QoS via DiffServ r FlowID field m Help soft state systems, accelerate flow classification m Maps well onto TCP connection or stream of UDP packets on host-port pair r Additional requirements m Support for security m Support for mobility m Easy auto-configuration

116 Network Layer4-116 Transition From IPv4 To IPv6
- Not all routers can be upgraded simultaneously
  - no “flag days”
  - How will the network operate with mixed IPv4 and IPv6 routers?
- Two proposed approaches:
  - Dual Stack: some routers with dual stack (v6, v4) can “translate” between formats
  - Tunneling: IPv6 carried as payload in an IPv4 datagram among IPv4 routers

117 Network Layer4-117 Tunneling A B E F IPv6 tunnel Logical view: Physical view: A B E F IPv6 IPv4

118 Network Layer4-118 Tunneling
Logical view: A B E F (IPv6), with a tunnel between B and E
Physical view: A B (IPv6), C D (IPv4), E F (IPv6)
A-to-B: IPv6 (Flow: X, Src: A, Dest: F, data)
B-to-C: IPv6 inside IPv4 (Src: B, Dest: E, carrying Flow: X, Src: A, Dest: F, data)
D-to-E: IPv6 inside IPv4 (Src: B, Dest: E, carrying Flow: X, Src: A, Dest: F, data)
E-to-F: IPv6 (Flow: X, Src: A, Dest: F, data)

119 Network Layer4-119 Dual Stack Approach r Dual-stack router translates b/w v4 and v6 m v4 addresses have special v6 equivalents m Issue: how to translate “FlowField” of v6 ?

121 Network Layer4-121 Two Key Network-Layer Functions r forwarding: move packets from router’s input to appropriate router output r routing: determine route taken by packets from source to dest. m routing algorithms analogy: r routing: process of planning trip from source to dest r forwarding: process of getting through single interchange

122 Network Layer4-122 Interplay between routing, forwarding
- Previously: Forward based on forwarding table
- Q: How to generate forwarding tables? Routing algorithms and protocols
Local forwarding table (produced by the routing algorithm):
header value   output link
0100           3
0101           2
0111           2
1001           1
[Figure: router with output links 1, 2, 3; value in arriving packet’s header is 0111]

123 Network Layer4-123 Who handles IP routing functions? m Source (IP source routing) m Network edge devices m Network routers

124 Network Layer4-124 Source Routing r IP source route option m Packet carries path to destination Entire path (strict) Partial path (loose) Attach list of IP addresses within header r Router processing m Examine first step in directions m Increment pointer offset in header m Forward to step m Copy entire source route header on fragmentation

125 Network Layer4-125 Source Routing Example
[Figure: sender, routers R1, R2, R3 (ports 1 to 4 each), receiver; the packet carries R1/R2/R3, then R2/R3, then R3 as it moves along the path]

126 Network Layer4-126 Source Routing r Advantages m Switches can be very simple and fast r Disadvantages m Variable (unbounded) header size m Sources must know or discover topology (e.g., failures) r Typical use m Ad-hoc networks (DSR) m Machine room networks (Myrinet)

127 Network Layer4-127 Network edge device routing r Virtual circuits, tag switching r Connection setup phase m Map IP route into appropriate label, wavelength, circuit at the network edge m Switch on label, wavelength, circuit ID in core m ATM, MPLS, lambda switching r In-network processing m Lookup flow ID – simple table lookup m Potentially replace flow ID with outgoing flow ID m Forward to output port

128 Network Layer4-128 Virtual Circuits Examples
[Figure: sender edge, routers R1, R2, R3 (ports 1 to 4 each), receiver edge; table entries 1,5 → 3,7; 1,7 → 4,2; 2,2 → 3,6; packet labels 5, 7, 2, 6 on successive links]
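
The label swapping in this figure can be sketched as a chain of table lookups. The assignment of each entry to R1, R2, or R3 and the incoming ports are assumptions made for the illustration, since the figure layout did not survive in the transcript; `VC_TABLES` and `forward` are made-up names.

```python
# Per-router VC table: (incoming port, incoming label) -> (outgoing port, outgoing label).
# Which router holds which entry is assumed, not taken from the slide.
VC_TABLES = {
    "R1": {(1, 5): (3, 7)},
    "R2": {(1, 7): (4, 2)},
    "R3": {(2, 2): (3, 6)},
}


def forward(router, in_port, in_label):
    """Simple table lookup: no prefix matching, just an exact-match index."""
    return VC_TABLES[router][(in_port, in_label)]


label = 5  # label put on the packet by the sender edge
trace = []
for router, in_port in [("R1", 1), ("R2", 1), ("R3", 2)]:
    out_port, label = forward(router, in_port, label)
    trace.append((router, out_port, label))
print(trace)
```

Each router replaces the flow ID with the outgoing one, so the label only has to be unique per link rather than network-wide.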

129 Network Layer4-129 Virtual Circuits
- Advantages
  - More efficient lookup (simple table lookup)
    - Easier for hardware implementations
  - More flexible (different path for each flow)
  - Can reserve bandwidth at connection setup
- Disadvantages
  - Still need to route connection setup request
  - More complex failure recovery – must recreate connection state
- Typical uses
  - ATM – combined with fixed-size cells
  - MPLS – tag switching for IP networks

130 Network Layer4-130 IP Datagrams on Virtual Circuits r Challenge – when to setup connections m At bootup time – permanent virtual circuits (PVC) Large number of circuits m For every packet transmission Connection setup is expensive m For every connection What is a connection? How to route connectionless traffic? m Based on traffic VC for long-lived flows Normal IP forwarding for all other flows

131 Network Layer4-131 Network routers (Global IP addresses)
- Hop-by-hop forwarding based on destination IP carried by packet
  - Each packet has destination IP address
  - Each router has forwarding table of destination IP → next hop IP address
  - IP route table calculated in network routers
- Most prevalent way to route on the Internet
  - Distributed routing algorithm for calculating forwarding tables

132 Network Layer4-132 Global Address Example Receiver Packet R Sender 2 3 4 1 2 3 4 1 2 3 4 1 R2 R3 R1 R R R  3 R  4 R  3 R

133 Network Layer4-133 Global Addresses r Advantages m Simple error recovery r Disadvantages m Every router knows about every destination Potentially large tables m All packets to destination take same route

134 Network Layer4-134 Comparison Source RoutingGlobal Addresses Header SizeWorstOK – Large address Router Table SizeNone Number of hosts (prefixes) Forward OverheadBestPrefix matching Virtual Circuits OK (larger than global if IP payload) Number of circuits Good (table index) Setup OverheadNone Error RecoveryTell all hostsTell all routers Connection Setup Tell all routers, Tear down circuit and re-route

135 Network Layer4-135 Routing protocols Graph abstraction for routing algorithms: r Graph: G = (N,E) m N=graph nodes (routers) A, B, C, D, E, F m E=graph edges (links) (A,B), (A,D), (A,C), (B,C), (B,D), (C,D), (C,E), (C,F), (D,E), (E,F) Cost associated with edge –Delay, $, congestion r Routing algorithms find minimum cost paths through graph Goal: determine “good” path (sequence of routers) thru network from source to dest. A E D CB F 2 2 1 3 1 1 2 5 3 5

136 Network Layer4-136 Routing Algorithm classification Global or decentralized information? Global: r all routers have complete topology, link cost info r “link state” algorithms Decentralized: r router knows physically-connected neighbors, link costs to neighbors r iterative process of computation, exchange of info with neighbors r “distance vector” algorithms

138 Network Layer4-138 A Link-State Routing Algorithm
Dijkstra’s algorithm
- net topology, link costs known to all nodes
  - accomplished via “link state broadcast”
  - all nodes have same info
- computes least cost paths from one node (“source”) to all other nodes
  - gives forwarding table for that node
- iterative: after k iterations, know least cost path to k dest.’s

139 Network Layer4-139 Dijkstra’s algorithm r Start condition m Each node assumed to know state of links to its neighbors r Step 1: Link state broadcast m Each node broadcasts its local link states to all other nodes m Reliable flooding mechanism r Step 2: Shortest-path tree calculation m Each node locally computes shortest paths to all other nodes from global state m Dijkstra’s shortest path tree (SPT) algorithm

140 Network Layer4-140 Link state broadcast r Link State Packets (LSPs) to broadcast state to all nodes r Periodically, each node creates a link state packet containing: m Node ID m List of neighbors and link cost m Sequence number m Time to live (TTL) m Node outputs LSP on all its links

141 Network Layer4-141 Link state broadcast
- Reliable Flooding
  - When node J receives LSP from node K
    - If LSP is the most recent LSP from K that J has seen so far, J saves it in database and forwards a copy on all links except link LSP was received on
    - Otherwise, discard LSP
  - How to tell more recent
    - Use sequence numbers
      - Same method as sliding window protocols
      - Needed to avoid stale information from flood
      - Problem: sequence number wrap-around
        - Addressed algorithmically using lollipop sequence numbering
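
The flooding rule can be sketched with plain sequence-number comparison. This is a simplified model: the class `FloodNode` and the tuple layout of an LSP are assumptions for the example, delivery is a direct method call rather than a lossy link, and TTL handling and sequence-number wrap-around are left out.

```python
class FloodNode:
    def __init__(self, name):
        self.name = name
        self.links = []  # directly connected FloodNode objects
        self.lsdb = {}   # originating node ID -> (sequence number, neighbor costs)

    def receive(self, lsp, from_node=None):
        """lsp is (origin ID, sequence number, {neighbor: link cost})."""
        origin, seq, neighbors = lsp
        known = self.lsdb.get(origin)
        if known is not None and known[0] >= seq:
            return  # not more recent than what we have: discard
        self.lsdb[origin] = (seq, neighbors)
        for n in self.links:
            if n is not from_node:  # all links except the one it arrived on
                n.receive(lsp, from_node=self)


# Triangle topology A-B-C; A originates an LSP by handing it to itself.
a, b, c = FloodNode("A"), FloodNode("B"), FloodNode("C")
a.links, b.links, c.links = [b, c], [a, c], [a, b]
a.receive(("A", 1, {"B": 1, "C": 1}))
print(b.lsdb, c.lsdb)
```

Duplicates that arrive over the second path are dropped by the sequence-number check, which is also what makes the flood terminate.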

142 Network Layer4-142 Shortest-path tree calculation
Notation:
- c(x,y): link cost from node x to y; = ∞ if not direct neighbors
- D(v): current value of cost of path from source to dest. v
- p(v): predecessor node along path from source to v
- N': set of nodes whose least cost path definitively known

143 Network Layer4-143 Dijkstra’s Algorithm
Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'
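
A runnable sketch of the pseudocode above, using a priority queue in place of the linear "find w" scan. The edge list is the one given on the graph-abstraction slide; the cost attached to each edge is an assumption taken from the usual textbook figure, because the transcript only preserves the costs as an unordered string of numbers. The names `dijkstra` and `first_hop` are made up for the example.

```python
import heapq

# Undirected link costs; the cost-to-edge assignment is assumed.
LINKS = {("A", "B"): 2, ("A", "D"): 1, ("A", "C"): 5, ("B", "C"): 3, ("B", "D"): 2,
         ("C", "D"): 3, ("C", "E"): 1, ("C", "F"): 5, ("D", "E"): 1, ("E", "F"): 2}


def neighbors(links):
    graph = {}
    for (x, y), cost in links.items():
        graph.setdefault(x, {})[y] = cost
        graph.setdefault(y, {})[x] = cost
    return graph


def dijkstra(graph, source):
    """Return (D, p): least cost and predecessor for every reachable node."""
    dist, pred, done = {source: 0}, {}, set()   # done plays the role of N'
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)              # w not in N' with minimum D(w)
        if w in done:
            continue
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost              # D(v) = min(D(v), D(w) + c(w,v))
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


def first_hop(pred, source, dest):
    """Follow predecessors back from dest to find the neighbor of source to use."""
    node = dest
    while pred[node] != source:
        node = pred[node]
    return node


dist, pred = dijkstra(neighbors(LINKS), "A")
for dest in "BCDEF":
    print(dest, dist[dest], ("A", first_hop(pred, "A", dest)))
```

Under these assumed costs only B is reached over link (A,B); every other destination leaves A over link (A,D).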

144 Network Layer4-144 Shortest-path tree calculation (Dijkstra’s algorithm example)
Update rule applied at each step: D(v) = min( D(v), D(w) + c(w,v) )
[Figure, repeated across slides 144 to 149: graph with nodes A, B, C, D, E, F and link costs]

150 Network Layer4-150 Dijkstra’s algorithm example A E D CB F Resulting shortest-path tree from A: B D E C F (A,B) (A,D) destination link Resulting forwarding table in A:

151 Network Layer4-151 Link state algorithm characteristics
- Computation overhead
  - n nodes
  - each iteration: need to check all nodes, w, not in N'
    - n*(n+1)/2 comparisons: O(n**2)
    - more efficient implementations possible: O(n log(n))
- Space requirements
  - Size of LSDB
- Bandwidth requirements
  - Reliable flooding O(N*E)
- Stability
  - Consistent LSDBs required for loop-free paths
  - Packet from C → A may loop around BDC if B knows about failure and C & D do not
[Figure: nodes A, B, C, D with link costs; the failed link is marked X]

152 Network Layer4-152 Link-state algorithm issues Oscillations possible: r e.g., link cost = amount of carried traffic r Example: path to A flaps as traffic routed clockwise and counter-clockwise r Common problem in load-based link metrics m A. Khanna and J. Zinky, "The Revised ARPANET Routing Metric," in ACM SIGCOMM, 1989, pp. 45--46. A D C B 1 1+e e 0 e 1 1 0 0 A D C B 2+e 0 0 0 1+e 1 A D C B 0 2+e 1+e 1 0 0 A D C B 2+e 0 e 0 1+e 1 initially … recompute routing … recompute

154 Network Layer4-154 Distance vector routing algorithms r Variants used in m Early ARPAnet m RIP (intra-domain routing protocol) m BGP (inter-domain routing protocol) r Distributed next hop computation m “Gossip with immediate neighbors until you find the best route” m Best route is achieved when there are no more changes r Unit of information exchange m Vector of distances to destinations

155 Network Layer4-155 Distance Vector Algorithm
Bellman-Ford algorithm (1957)
Define $D_x(y)$ := cost of least-cost path from x to y
Then $D_x(y) = \min_v \{ c(x,v) + D_v(y) \}$
where min is taken over all neighbors v of x

156 Network Layer4-156 Bellman-Ford example
Clearly, $D_v(z) = 5$, $D_x(z) = 3$, $D_w(z) = 3$
B-F equation says:
$D_u(z) = \min \{ c(u,v) + D_v(z),\ c(u,x) + D_x(z),\ c(u,w) + D_w(z) \} = \min \{ 2 + 5,\ 1 + 3,\ 5 + 3 \} = 4$
Node that achieves minimum is next hop in shortest path ➜ forwarding table
[Figure: graph with nodes u, v, w, x, y, z and link costs]

157 Network Layer4-157 Bellman-Ford
- Update distance information iteratively
  - Start with link table (as with Dijkstra), calculate distance table iteratively
  - Distance table data structure
    - table of known distances and next hops kept per node
    - row for each possible destination
    - column for each directly-attached neighbor to node
Distance table at node E (rows: destination; columns: cost to destination via neighbor)
$D^E()$   A    B    D
A         1    14   5
B         7    8    5
C         6    9    4
D         4    11   2
[Figure: graph with nodes A, B, C, D, E and link costs 7, 8, 1, 2, 1, 2]

158 Network Layer4-158 Bellman-Ford algorithm
- Centralized version
For node i:
  while there is a change in D
    for all k not neighbor of i
      for each j neighbor of i
        D^i(k,j) = c(i,j) + D^j(k,*)
        if D^i(k,j) < D^i(k,*) {
          D^i(k,*) = D^i(k,j)
          H^i(k) = j
        }
Notation:
- $D^X(Y,Z)$ = distance from X to Y, via Z as next hop = $c(X,Z) + \min_w \{ D^Z(Y,w) \}$
- $D^X(Y,*)$ = minimum known distance from X to Y
- $H^X(Y)$ = next hop node from X to Y
[Figure: node i with neighbors j and j' (link costs c(i,j), c(i,j')) and destinations k, k'; distance entries D^j(k,*), D^j'(k,*), D^i(k,*)]

159 Network Layer4-159 Distance table example for node E
$D^E(C,D) = c(E,D) + \min_w \{ D^D(C,w) \} = 2 + 2 = 4$
$D^E(A,D) = c(E,D) + \min_w \{ D^D(A,w) \} = 2 + 3 = 5$
$D^E(A,B) = c(E,B) + \min_w \{ D^B(A,w) \} = 8 + 6 = 14$ (loop!)
Distance table at node E (rows: destination; columns: cost to destination via neighbor)
$D^E()$   A    B    D
A         1    14   5
B         7    8    5
C         6    9    4
D         4    11   2
[Figure: graph with nodes A, B, C, D, E and link costs 7, 8, 1, 2, 1, 2]
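
The whole distance table at E can be recomputed from the definition $D^E(Y,Z) = c(E,Z) + \min_w \{ D^Z(Y,w) \}$. In this sketch the assignment of the six link costs to edges is an assumption (the figure layout is lost in the transcript), chosen because it is consistent with the three worked values on this slide; `all_pairs`, `link`, and `distance_table` are made-up helper names.

```python
INF = float("inf")
NODES = "ABCDE"
# Assumed cost-to-edge assignment for the five-node figure.
COST = {("A", "B"): 7, ("A", "E"): 1, ("B", "E"): 8,
        ("B", "C"): 1, ("C", "D"): 2, ("D", "E"): 2}


def link(cost, x, y):
    """Direct link cost c(x,y), or infinity if x and y are not neighbors."""
    return cost.get((x, y), cost.get((y, x), INF))


def all_pairs(nodes, cost):
    """Least-cost distance between every pair of nodes (Floyd-Warshall)."""
    d = {(x, y): (0 if x == y else link(cost, x, y)) for x in nodes for y in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return d


def distance_table(node, nodes, cost):
    """Entry (dest, via) = c(node, via) + least cost from via to dest."""
    d = all_pairs(nodes, cost)
    vias = [n for n in nodes if link(cost, node, n) < INF]
    return {(dest, via): link(cost, node, via) + d[via, dest]
            for dest in nodes if dest != node for via in vias}


table = distance_table("E", NODES, COST)
for dest in "ABCD":
    print(dest, [table[dest, via] for via in "ABD"])
```

The entry for destination A via B comes out as 14 because B's own least-cost path to A runs back through E, which is the loop flagged on the slide.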

160 Network Layer4-160 Distance table gives forwarding table
Distance table at node E (rows: destination; columns: cost to destination via neighbor)
$D^E()$   A    B    D
A         1    14   5
B         7    8    5
C         6    9    4
D         4    11   2
Routing table $H^X(Y)$ for destinations A B C D (outgoing link to use, cost): A,1 D,5 D,4

161 Network Layer4-161 Distributed Bellman-Ford
- Make Bellman algorithm distributed (Ford-Fulkerson 1962)
  - Each node i has distance vector estimates to other nodes
  - Iterate
    - Each node sends around and recalculates D[i,*]
    - When a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation:
      $D_x(y) \leftarrow \min_v \{ c(x,v) + D_v(y) \}$ for each node $y \in N$
    - If estimates change, broadcast entire table to neighbors
      - continues until no nodes exchange info.
      - self-terminating: no “signal” to stop
  - D[i,*] eventually converges to shortest distance

162 Network Layer4-162 Distributed Bellman-Ford overview
Asynchronous:
- "triggered updates"
  - no need to exchange info/iterate in lock step!
Iterative: a local iteration is triggered
- when local link costs change
- when a neighbor sends a message that its least cost path to some node has changed
Distributed:
- nodes communicate only with directly-attached neighbors
- each node notifies neighbors only when its least cost path to any destination changes
  - neighbors then notify their neighbors if necessary
Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute distance table
  if least cost path to any dest has changed, notify neighbors

163 Network Layer4-163 Distributed Bellman-Ford algorithm
At all nodes, X:
1  Initialization:
2    for all adjacent nodes v:
3      D_X(*,v) = infinity    /* the * operator means "for all rows" */
4      D_X(v,v) = c(X,v)
5    for all destinations, y
6      send min_w D_X(y,w) to each neighbor    /* w over all X's neighbors */

164 Network Layer4-164 Distributed Bellman-Ford algorithm
8  loop
9    wait (until I see a link cost change to neighbor V
10         or until I receive update from neighbor V)
11
12   if (c(X,V) changes by d)
13     /* change cost to all dest's via neighbor V by d */
14     /* note: d could be positive or negative */
15     for all destinations y: D_X(y,V) = D_X(y,V) + d
16
17   else if (update received from V wrt destination Y)
18     /* shortest path from V to some Y has changed */
19     /* V has sent a new value for its min_w D_V(Y,w) */
20     /* call this received new value "newval" */
21     for the single destination Y: D_X(Y,V) = c(X,V) + newval
22
23   if we have a new min_w D_X(Y,w) for any destination Y
24     send new value of min_w D_X(Y,w) to all neighbors
25
26 forever
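
The pseudocode above can be sketched as a small event-driven class. This is an illustration, not the authors' implementation: the class name `DVNode` and its methods are hypothetical, the destination list must include every neighbor, and actually sending messages is left out. Each handler returns the destinations whose minimum changed, which is what lines 23-24 would advertise.

```python
import math


class DVNode:
    """Distance table of one node X: table[dest][via] = cost to dest via neighbor."""

    def __init__(self, name, link_costs, destinations):
        self.name = name
        self.cost = dict(link_costs)  # c(X,v) for each neighbor v
        # Lines 2-4: everything is infinite except the direct links.
        self.table = {y: {v: math.inf for v in self.cost} for y in destinations}
        for v, c in self.cost.items():
            self.table[v][v] = c

    def vector(self):
        """min_w D_X(y,w) for every destination y."""
        return {y: min(row.values()) for y, row in self.table.items()}

    def on_link_cost_change(self, v, new_cost):
        """Lines 12-15: shift every entry via neighbor v by the change d."""
        before = self.vector()
        d = new_cost - self.cost[v]
        self.cost[v] = new_cost
        for row in self.table.values():
            row[v] += d
        return self._changed(before)

    def on_update(self, v, dest, newval):
        """Lines 17-21: neighbor v reports a new least cost to dest."""
        before = self.vector()
        self.table[dest][v] = self.cost[v] + newval
        return self._changed(before)

    def _changed(self, before):
        """Lines 23-24: the minima that would be sent to all neighbors."""
        after = self.vector()
        return {y: after[y] for y in after if after[y] != before[y]}
```

For example, a node X with links to Y (cost 2) and Z (cost 7) that hears from Y a cost of 1 to Z would report a new minimum of 3 for Z.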

165 Network Layer4-165 Analyzing Distributed Bellman-Ford
- Continuously send local distance tables of best known routes to all neighbors until your table converges
  - Computation diffuses until all nodes converge
  - Will computation converge quickly and deterministically?
    - Not all the time; pathological cases are possible (count-to-infinity)
    - Several algorithms for minimizing such cases

166 Network Layer4-166 DBF example
Initial distance vectors (~ = infinity):
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   ~   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   ~   2   0
[Figure: five-node topology with links A-B 7, A-E 1, B-C 1, B-E 8, C-D 2, D-E 2]

167 Network Layer4-167 DBF example
What is the new distance table at E after E receives D's routes?
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   ~   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   ~   2   0
[Figure: five-node topology A, B, C, D, E]

168 Network Layer4-168 DBF example
What is the new distance table at E after E receives D's routes?
Cost to C is updated from ~ to 4
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   ~   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   4   2   0
[Figure: five-node topology A, B, C, D, E]

169 Network Layer4-169 DBF example
What is the new distance table at A after A receives B's routes?
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   ~   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   4   2   0
[Figure: five-node topology A, B, C, D, E]

170 Network Layer4-170 DBF example
What is the new distance table at A after A receives B's routes?
Cost to C is updated from ~ to 8, cost to E unchanged
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   8   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   4   2   0
[Figure: five-node topology A, B, C, D, E]

171 Network Layer4-171 DBF example
What is the new distance table at A after A receives E's routes?
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   8   ~   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   4   2   0
[Figure: five-node topology A, B, C, D, E]

172 Network Layer4-172 DBF example
What is the new distance table at A after A receives E's routes?
Cost to C is updated from 8 to 5, cost to D updated from ~ to 3
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   7   5   3   1
  B              7   0   1   ~   8
  C              ~   1   0   2   ~
  D              ~   ~   2   0   2
  E              1   8   4   2   0
[Figure: five-node topology A, B, C, D, E]

173 Network Layer4-173 DBF example
And so on, until final distances....
  Info at node   Distance to node
                 A   B   C   D   E
  A              0   6   5   3   1
  B              6   0   1   3   5
  C              5   1   0   2   4
  D              3   3   2   0   2
  E              1   5   4   2   0
[Figure: five-node topology A, B, C, D, E]

174 Network Layer4-174 DBF example
E's routing table (cost to destination via each next hop):
  dest   next hop A   next hop B   next hop D
  A          1            14           5
  B          7             8           5
  C          6             9           4
  D          4            11           2
[Figure: five-node topology A, B, C, D, E]

175 Network Layer4-175 DBF (another example)
$D_X(Y,Z) = c(X,Z) + \min_w \{D_Z(Y,w)\} = 7 + 1 = 8$
$D_X(Z,Y) = c(X,Y) + \min_w \{D_Y(Z,w)\} = 2 + 1 = 3$
[Figure: three-node topology with links X-Y cost 2, Y-Z cost 1, X-Z cost 7]

176 Network Layer4-176 DBF (another example)
See book for explanation of this example
[Figure: three-node topology X, Y, Z with link costs 1, 2, 7]

177 Network Layer4-177 DBF (good news example)
Link cost changes:
- node detects local link cost change
- updates distance table (line 15)
- if cost change in least cost path, notify neighbors (lines 23,24)
- fast convergence
[Figure: nodes X, Y, Z with link costs 4, 1, 50; the cost-4 link changes to 1]

178 Network Layer4-178 DBF (good news example)
"good news travels fast"
- t0: y detects link-cost change, updates its DV, informs neighbors.
- t1: z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV.
- t2: y receives z's update and updates its distance table. y's least costs do not change and hence y does not send any message to z.
- algorithm terminates
[Figure: nodes x, y, z with link costs 4, 1, 50; the cost-4 link changes to 1]

179 Network Layer4-179 DBF (count-to-infinity example)
Link cost changes:
- good news travels fast
- bad news travels slow: "count to infinity" problem!
  - alternate route implicitly used link that changed
  - algorithm continues on!
[Figure: nodes X, Y, Z with link costs 4, 1, 50; the cost-4 link changes to 60]

180 Network Layer4-180 How are loops caused?
- Observation 1:
  - Y's metric to X increases
- Observation 2:
  - Z picks Y as next hop to X
  - But, the implicit path from Z to X includes itself!

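181 Network Layer4-181 DBF: (count-to-infinity example)
[Figure: nodes A, B, C with links A-B cost 1, B-C cost 1, A-C cost 25; an X marks the failing link]
Routing tables (dest: cost):
  at A:  B: 1   C: 2
  at B:  A: 1   C: 1
  at C:  A: 2   B: 1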

182 Network Layer4-182 DBF: (count-to-infinity example)
C Sends Routes to B
[Figure: nodes A, B, C with links B-C cost 1, A-C cost 25]
Routing tables (dest: cost; ~ = infinity):
  at A:  B: 1   C: 2
  at B:  A: ~   C: 1
  at C:  A: 2   B: 1

183 Network Layer4-183 DBF: (count-to-infinity example)
B Updates Distance to A
[Figure: nodes A, B, C with links B-C cost 1, A-C cost 25]
Routing tables (dest: cost):
  at A:  B: 1   C: 2
  at B:  A: 3   C: 1
  at C:  A: 2   B: 1

184 Network Layer4-184 DBF: (count-to-infinity example)
B Sends Routes to C
[Figure: nodes A, B, C with links B-C cost 1, A-C cost 25]
Routing tables (dest: cost):
  at A:  B: 1   C: 2
  at B:  A: 3   C: 1
  at C:  A: 4   B: 1

185 Network Layer4-185 DBF: (count-to-infinity example)
C Sends Routes to B
[Figure: nodes A, B, C with links B-C cost 1, A-C cost 25]
Routing tables (dest: cost):
  at A:  B: 1   C: 2
  at B:  A: 5   C: 1
  at C:  A: 4   B: 1
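
The exchange between B and C can be played forward to see where the counting stops. This sketch assumes the figure shows links B-C of cost 1 and A-C of cost 25, that the A-B link has failed, and that C starts with a stale cost of 2 to A (via B). The function name is hypothetical.

```python
def count_to_infinity(cost_bc=1, cost_ac=25, c_to_a=2):
    """Replay the B/C exchange after the A-B link fails.

    Returns the successive (B's cost to A, C's cost to A) pairs until
    neither estimate changes any more.
    """
    history = []
    b_to_a = None
    while True:
        # B has lost its direct link, so it can only reach A through C.
        new_b = cost_bc + c_to_a
        # C chooses between its direct link to A and the route through B.
        new_c = min(cost_ac, cost_bc + new_b)
        if (new_b, new_c) == (b_to_a, c_to_a):
            return history
        b_to_a, c_to_a = new_b, new_c
        history.append((b_to_a, c_to_a))


if __name__ == "__main__":
    for b_cost, c_cost in count_to_infinity():
        print(f"B -> A: {b_cost}   C -> A: {c_cost}")
```

The first pairs reproduce the values on the slides (B: 3, C: 4, B: 5); the estimates then keep climbing by two per exchange until C's direct link of cost 25 finally becomes the cheaper option.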

186 Network Layer4-186 Solutions to looping
- Split horizon
  - Do not advertise route to X to an adjacent neighbor if your route to X goes through that neighbor
  - If C routes through B to get to A, C does not advertise (C=>A) route to B.
- Poisoned reverse
  - Advertise an infinite distance route to X to an adjacent neighbor if your route to X goes through that neighbor
  - If C routes through B to get to A, C advertises to B that its distance to A is infinity

187 Network Layer4-187 Split-horizon with poisoned reverse
If Z routes through Y to get to X:
- Z tells Y its (Z's) distance to X is infinite (so Y won't route to X via Z)
- will this completely solve count to infinity problem?
[Figure: nodes X, Y, Z with link costs 4, 1, 50; the cost-4 link changes to 60. Annotations: "route to X through Y goes thru Z: poison it!", "new route to X not involving Y", "can now select and advertise route to X via Z", "algorithm terminates"]
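
A minimal sketch of how a node might build a per-neighbor advertisement with poisoned reverse. The route-table shape (`dest -> (next_hop, cost)`) and the function name are assumptions for illustration; the example costs assume Z reaches X through Y at cost 5.

```python
import math


def advertisement_for(neighbor, routes):
    """Build the distance vector to send to one neighbor.

    routes maps dest -> (next_hop, cost). Any route whose next hop is the
    neighbor itself is advertised back to it with infinite cost.
    """
    advert = {}
    for dest, (next_hop, cost) in routes.items():
        if dest == neighbor:
            continue  # no need to tell a neighbor how to reach itself
        advert[dest] = math.inf if next_hop == neighbor else cost
    return advert


# Z's routes: X is reached through Y, Y is directly attached.
z_routes = {"X": ("Y", 5), "Y": ("Y", 1)}
```

Plain split horizon would simply omit the poisoned entries instead of advertising them as infinite.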

188 Network Layer4-188 Solutions to looping
- Split horizon with poisoned reverse
  - Works for two node loops
  - Does not work for loops with more nodes
[Figure: nodes A, B, C, D joined by four links of cost 1; an X marks one link]

189 Network Layer4-189 Other solutions to looping
- Route poisoning
  - Advertise infinite cost on a route to everyone (not just next hop) when lowest cost route increases
  - Gets rid of stale information throughout network
  - Used in conjunction with Path Holddown
- Path Holddown
  - Freeze route for a fixed time
    - Do not switch to an alternate while route poisoning is happening
    - In our example, A and B delay changing and advertising new routes
    - A and B both set route to D to infinity after single step
  - Configuring holddown delay
    - Delay too large: Slow convergence
    - Delay too small: Count-to-infinity more probable

190 Network Layer4-190 Other solutions to looping
- Path vector
  - Select loop-free paths
  - Each route advertisement carries entire path
  - If a router sees itself in path, it rejects the route
  - BGP does it this way
  - Space proportional to diameter of network

191 Network Layer4-191 Looping
- Do solutions completely eliminate loops?
  - No! Transient loops are still possible
  - Why? Because implicit path information may be stale
  - See this in BGP convergence
- Only way to fix this
  - Ensure that you have up-to-date information by explicitly querying

192 Network Layer4-192 Comparing link-state vs. distance vector
- Communication costs
- Processing costs
- Optimality
- Stability
  - Convergence time
  - Loop freedom
  - Oscillation damping

193 Network Layer4-193 Link State vs. Distance Vector
Message complexity, network bandwidth
- LS: with n nodes, E links, O(nE) msgs sent
  - Send info about your neighbors to everyone
  - Small messages broadcast globally
- DV: exchange between neighbors only
  - Send everything you know to your neighbors
  - Large messages, but transfers only to neighbors

194 Network Layer4-194 Link State vs. Distance Vector
Speed of Convergence
- LS: $O(n^2)$ algorithm requires O(nE) msgs
  - Faster: can forward LSPs before processing
  - Single SPT calculation
- DV: convergence time varies
  - Fast with triggered updates
  - count-to-infinity problem
  - may be routing loops

195 Network Layer4-195 Link State vs. Distance Vector
Space requirements:
- LS: maintains entire topology
- DV: maintains only neighbor state
  - path vector maintains routes proportional to network diameter

196 Network Layer4-196 Link State vs. Distance Vector
Robustness:
- LS
  - Can be made robust since sources are aware of alternate paths within topology
- DV
  - Can advertise incorrect paths to all destinations
  - Incorrect calculation can spread to entire network

198 Network Layer4-198 Hierarchical Routing
Our routing study thus far: idealization
- all routers identical
- network "flat"
... not true in practice
scale: with 200 million destinations:
- can't store all dest's in routing tables!
- routing table exchange would swamp links!
- Flat routing does not scale
administrative autonomy
- internet = network of networks
- each network admin may want to control routing in its own network

199 Network Layer4-199 Routing Hierarchies
- Key observation
  - Need less information with increasing distance to destination
  - Hierarchical routing
    - saves table size
    - reduces update traffic
    - allows routing to scale
- Two radically different approaches
  - The area hierarchy
  - The landmark hierarchy
    - Covered in advanced topics at end of course...

200 Network Layer4-200 Areas
- Divide network into areas
  - Areas can have nested sub-areas
    - No path between two sub-areas of an area can exit that area
  - Within area, each node has routes to every other node
  - Outside area
    - Each node has routes for other top-level areas only (not nodes within those areas)
    - Inter-area packets are routed to nearest appropriate border router

201 Network Layer4-201 Internet Routing Hierarchy
- Internet areas called "autonomous systems" (AS)
  - administrative autonomy
- routers in same AS run same routing protocol
  - "intra-AS" routing protocol (IGP)
  - Each AS can run its own intra-AS routing protocol
Border routers
  - Special routers in AS that directly link to another AS
  - Responsible for routing to destinations outside AS
    - run intra-AS routing protocol with all other routers in AS
    - run inter-AS routing protocol or exterior gateway protocol (EGP) with other gateway routers in other AS's

202 Network Layer4-202 Internet Routing Hierarchy
Border router A.c
  - Routing protocols
    - Inter-AS externally
    - Intra-AS internally
  - Forwarding table configured by both
[Figure: ASs A, B, C with routers a-d; border routers A.a, A.c, B.a, C.b; the network, link, and physical layers and the forwarding table inside A.c]

203 Network Layer4-203 Why different Intra- and Inter-AS routing?
Policy:
- Intra-AS: single administrative policy
  - No policy decisions needed, performance dominates
  - Focus on performance
- Inter-AS: ISP wants control over how its traffic is routed and who routes through its net.
  - Policy and monetary factors dominate over performance

204 Network Layer4-204 Inter-AS tasks
- Suppose router in AS1 receives datagram for destination outside of AS1
  - router should forward packet to gateway router, but which one?
AS1 must:
1. learn which dests are reachable through AS2, which through AS3
2. propagate this reachability info to all routers in AS1
Job of inter-AS routing!
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), AS3 (3a, 3b, 3c)]

205 Network Layer4-205 Example: Setting forwarding table in router 1d
- suppose AS1 learns (via inter-AS protocol) that subnet x is reachable via AS3 (gateway 1c) but not via AS2.
- inter-AS protocol propagates reachability info to all internal routers.
- router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c.
  - installs forwarding table entry (x,I)
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), AS3 (3a, 3b, 3c); subnet x reachable beyond AS3]

206 Network Layer4-206 Example: Choosing among multiple ASes
- now suppose AS1 learns from inter-AS protocol that subnet x is reachable from AS3 and from AS2.
- to configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.
  - this is also the job of inter-AS routing protocol!
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), AS3 (3a, 3b, 3c); subnet x reachable beyond both AS2 and AS3]

207 Network Layer4-207 Example: Choosing among multiple ASes
- Cost-based selection
1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. Choose the gateway that has the smallest least cost
4. Determine from forwarding table the interface I that leads to least-cost gateway. Enter (x,I) in forwarding table
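
The four steps can be sketched as follows. The gateway names reuse the figure's router labels, but the costs and interface names (`I1`, `I2`) are made-up values for illustration, and both function names are hypothetical.

```python
def choose_gateway(gateways, cost_to_gateway, interface_to_gateway):
    """Steps 2-4: pick the gateway with the smallest intra-AS least cost."""
    best = min(gateways, key=lambda g: cost_to_gateway[g])
    return best, interface_to_gateway[best]


def install_route(forwarding_table, prefix, gateways,
                  cost_to_gateway, interface_to_gateway):
    """Enter (x, I) in the forwarding table for the chosen gateway."""
    _, interface = choose_gateway(gateways, cost_to_gateway, interface_to_gateway)
    forwarding_table[prefix] = interface
    return forwarding_table


# Hypothetical view from router 1d: subnet x is reachable via gateways 1c and 1b.
gateways_for_x = ["1c", "1b"]
intra_as_cost = {"1c": 1, "1b": 2}          # assumed least-cost path costs
interface_toward = {"1c": "I1", "1b": "I2"}  # assumed outgoing interfaces
```

Note that the choice uses only the cost inside the local AS; the cost of the rest of the path to x plays no part in it.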

208 Network Layer4-208 AS Categories
- Stub: an AS that has only a single connection to one other AS; carries only local traffic.
- Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic
- Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)

209 Network Layer4-209 AS categories example
[Figure: three panels labeled Stub, Multi-homed, and Transit, each showing how AS1, AS2, and AS3 are connected]

210 Network Layer4-210 Path Sub-optimality
3 hop red path vs. 2 hop green path
[Figure: areas 1, 2, 3 with sub-areas 1.1, 1.2, 2.1, 2.2, 3.1, 3.2 and nodes 1.2.1, 2.2.1, 3.2.1; start and end nodes marked]

212 Network Layer4-212 Intra-AS Routing
- Also known as Interior Gateway Protocols (IGP)
- Most common Intra-AS routing protocols:
  - RIP: Routing Information Protocol
    - Distance-vector
  - OSPF: Open Shortest Path First
    - Link-state
  - IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
    - Distance-vector

213 Network Layer4-213 RIP (Routing Information Protocol)
- Distance vector algorithm
  - Distance metric: # of hops (max = 15 hops)
  - Vectors exchanged every 30 sec and when triggered
  - Static update period leads to synchronization problems
  - Split horizon with poisoned reverse
- Included in BSD-UNIX Distribution in 1982
  - RIP-2 in 1993 adds prefix mask for CIDR
From router A to subnets:
  destination   hops
  u             1
  v             2
  w             2
  x             3
  y             3
  z             2
[Figure: routers A, B, C, D connecting subnets u, v, w, x, y, z]

214 Network Layer4-214 RIP: Example
Routing table in D:
  Destination Network   Next Router   Num. of hops to dest.
  w                     A             2
  y                     B             2
  z                     B             7
  x                     --            1
  ...                   ...           ...
[Figure: routers A, B, C, D and networks w, x, y, z]

215 Network Layer4-215 RIP: Example
Advertisement from A to D:
  Dest   Next   hops
  w      -      1
  x      -      1
  z      C      4
  ...    ...    ...
Routing table in D (the entry for z changes after the advertisement):
  Destination Network   Next Router   Num. of hops to dest.
  w                     A             2
  y                     B             2
  z                     B → A         7 → 5
  x                     --            1
  ...                   ...           ...
[Figure: routers A, B, C, D and networks w, x, y, z]
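
How D arrives at the new entry for z can be sketched as below. This is a simplified illustration of the usual distance-vector rule, not code from any RIP implementation: the table shape and the function name are assumptions, and the constant 16 is the "infinite" hop count these slides give for RIP.

```python
INFINITY = 16  # hop count treated as unreachable


def process_advertisement(table, neighbor, advert):
    """Merge a neighbor's advertisement into a routing table.

    table maps dest -> (next_router, hops); advert maps dest -> hops as
    reported by the neighbor. A route is replaced when the neighbor offers a
    shorter path, or when the neighbor already is the next router (its news
    is then authoritative, even if worse).
    """
    for dest, hops in advert.items():
        candidate = min(hops + 1, INFINITY)
        current = table.get(dest)
        if current is None or candidate < current[1] or current[0] == neighbor:
            table[dest] = (neighbor, candidate)
    return table


# Routing table in D before the advertisement (None = directly attached).
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}
# Advertisement from A: hop counts from A's point of view.
advert_from_a = {"w": 1, "x": 1, "z": 4}
```

Applying the advertisement leaves w, x, and y untouched and moves z from (B, 7) to (A, 5), matching the updated table.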

216 Network Layer4-216 RIP: Link Failure and Recovery
If no advertisement heard after 180 sec, neighbor/link declared dead
  - routes via neighbor invalidated
  - new advertisements sent to neighbors
  - neighbors in turn send out new advertisements (if tables changed)
  - link failure info quickly propagates to entire net
  - poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

217 Network Layer4-217 RIP Table processing
- RIP routing tables managed by application-level process called routed (route daemon)
- advertisements sent in UDP packets, periodically repeated
[Figure: two routers, each with the stack physical, link, network (IP), transport (UDP); the routed process on top of UDP maintains the forwarding table]

218 Network Layer4-218 IGRP (Interior Gateway Routing Protocol)
- CISCO proprietary; successor of RIP (mid 80s)
  - Distance Vector, like RIP
  - several cost metrics (delay, bandwidth, reliability, load etc)
  - 90 sec update with triggered updates
  - Split horizon
    - V1: path holddown
    - V2: route poisoning
  - routing updates are carried directly in IP datagrams (IP protocol 9), not over TCP or UDP
  - EIGRP
    - Loop-free routing via DUAL (based on diffused computation)
    - CIDR support

220 Network Layer4-220 OSPF (Open Shortest Path First)
- "open": publicly available
- Uses Link State algorithm
  - LS packet dissemination
  - Topology map at each node
  - Route computation using Dijkstra's algorithm
- Advertisements disseminated to entire AS (via flooding)
  - Carried in OSPF messages directly over IP (rather than TCP or UDP)

221 Network Layer4-221 OSPF "advanced" features (not in RIP)
- Security: all OSPF messages authenticated (to prevent malicious intrusion)
- Multiple same-cost paths allowed (only one path in RIP)
- Integrated uni- and multicast support:
  - Multicast OSPF (MOSPF) uses same topology data base as OSPF
- Hierarchical OSPF in large domains.

222 Network Layer4-222 Hierarchical OSPF
- two-level hierarchy: local area, backbone.
  - Link-state advertisements only in area
  - each node has detailed area topology; it knows only the direction (shortest path) to nets in other areas.
- area border routers: "summarize" distances to nets in own area, advertise to other area border routers.
- backbone routers: run OSPF routing limited to backbone.
- boundary routers: connect to other AS's.

223 Network Layer4- 223 Hierarchical OSPF

225 Network Layer4-225 History
- Mid-80s: EGP (Exterior Gateway Protocol)
  - Used in original ARPAnet
  - Reachability protocol (no shortest path)
    - Single bit for reachability information
  - Topology restricted to a tree (no cycles allowed)
    - ARPA-managed packet switches at top of tree
  - Unacceptable once Internet grew to multiple independent backbones
- Result: BGP development

226 Network Layer4-226 BGP
- BGP (Border Gateway Protocol): the de facto standard
- BGP provides each AS a means to:
  1. Get subnet reachability information from neighbor ASs.
  2. Propagate reachability information to routers within AS.
  3. Determine "good" routes to subnets based on reachability information and policy.
- Allows a subnet to advertise its existence to rest of the Internet: "I am here"
  - What if a subnet lies about who it is?
  - Recent route hijackings

227 Network Layer4-227 Inter-AS routing: BGP
- Link state or distance vector?
  - Problems with distance-vector:
    - Bellman-Ford algorithm may not converge
  - More problems with link state:
    - Everyone sees every link
      - LS database too large: entire Internet
      - Can't easily control who uses the network (i.e. an ISP may want to hide particular links from being used by others, but link states are broadcast)
    - Metric used by routers not the same: loops
      - No universal routing metric
      - Policy drives routing decisions
- Result: BGP is a distance-vector style protocol (a path-vector protocol)

228 Network Layer4-228 BGP
- Path Vector protocol:
  - BGP advertisements to neighbors (peers) contain entire path (i.e., sequence of ASs) to a destination
    - E.g., Gateway X sends its path to dest. Z:
      - Path (X,Z) = X,Y1,Y2,Y3,…,Z
  - When AS gets route, check if AS already in path
    - If yes, reject route
    - If no, add self and (possibly) advertise route further
  - Allows for policy application (different metrics)
    - Metrics are local: AS chooses path, protocol ensures no loops
    - Supports CIDR aggregation (BGP4)
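
The loop check is a membership test on the advertised path. In this sketch AS identifiers are plain strings and the function name is hypothetical; real BGP carries AS numbers in the AS-PATH attribute and applies import policy as well.

```python
def receive_route(my_as, prefix, as_path):
    """Apply the path-vector rule to an incoming advertisement.

    Returns None if my_as already appears in the path (accepting it would
    form a loop); otherwise returns the route with my_as added in front,
    ready to be advertised further.
    """
    if my_as in as_path:
        return None
    return prefix, [my_as] + list(as_path)


if __name__ == "__main__":
    # Path (X,Z) = X, Y1, Y2, Y3, Z as advertised by gateway X.
    path = ["X", "Y1", "Y2", "Y3", "Z"]
    print(receive_route("Y2", "Z", path))  # rejected: Y2 is already on the path
    print(receive_route("W", "Z", path))   # accepted, W added in front
```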

229 Network Layer4-229 BGP basics
- Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions
  - Note that BGP sessions do not correspond to physical links.
  - Two types: eBGP and iBGP
    - eBGP between gateways
    - iBGP from gateway to internal routers of an AS
- AS2 advertises a prefix to AS1
  - AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.
  - AS2 can aggregate prefixes in its advertisement
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), AS3 (3a, 3b, 3c), with eBGP sessions between ASs and iBGP sessions inside each AS]

230 Network Layer4-230 Distributing reachability info
- With eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1.
- 1c can then use iBGP to distribute this new prefix reach info to all routers in AS1
- 1b can then re-advertise the new reach info to AS2 over the 1b-to-2a eBGP session
- When router learns about a new prefix, it creates an entry for the prefix in its forwarding table.
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), AS3 (3a, 3b, 3c), with eBGP sessions between ASs and iBGP sessions inside each AS]

231 Network Layer4-231 Path attributes & BGP routes
- advertised prefix includes BGP attributes.
  - prefix + attributes = "route"
- two important attributes:
  - AS-PATH: contains ASs through which prefix advertisement has passed: e.g., AS 67, AS 17
  - NEXT-HOP: indicates specific internal-AS router to next-hop AS. (may be multiple links from current AS to next-hop-AS)
- when gateway router receives route advertisement, uses import policy to accept/decline.

232 Network Layer4-232 BGP messages
- Exchanged using TCP.
  - Advantages:
    - Simplifies BGP
    - No need for periodic refresh: routes are valid until withdrawn, or the connection is lost
    - Incremental updates
  - Disadvantages
    - BGP TCP spoofing attack
    - Congestion control on a routing protocol?
    - Poor interaction during high load (Code Red)

233 Network Layer4-233 BGP messages
- Example messages
  - OPEN: opens TCP connection to peer and authenticates sender
  - UPDATE: advertises new path (or withdraws old)
  - KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
  - NOTIFICATION: reports errors in previous msg; also used to close connection

234 Network Layer4-234 Policy with BGP
- BGP provides capability for enforcing various policies
- Policies are not part of BGP: they are provided to BGP as configuration information
- BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS's

235 Network Layer4-235 Path Selection Criteria
- Path attributes + external (policy) information
- Examples:
  - Hop count
  - Policy considerations
    - Preference for AS
    - Presence or absence of certain AS
  - Path origin (rejecting false routes)
  - Link dynamics
  - Early-exit
    - Hot-potato routing for transit packets

236 Network Layer4-236 Examples of BGP Policies
- A multi-homed AS refuses to act as transit
  - Limit path advertisement
- A multi-homed AS can become transit for some AS's
  - Only advertise paths to some AS's
- An AS can favor or disfavor certain AS's for traffic transit from itself

237 Network Layer4-237 BGP routing policy
- A, B, C are provider networks
- X, W, Y are customers (of provider networks)
- X is dual-homed: attached to two networks
  - X does not want to route from B via X to C
  - ... so X will not advertise to B a route to C

238 Network Layer4-238 BGP routing policy (2)
- A advertises to B the path AW
- B advertises to X the path BAW
- Should B advertise to C the path BAW?
  - No! B gets no "revenue" for routing CBAW since neither W nor C are B's customers
  - B wants to force C to route to W via A
  - B wants to route only to/from its customers!
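
B's export decision can be written as a one-line rule. This is a deliberate simplification of "route only to/from its customers": the function name and the representation of a path as a list of network names are assumptions, and real BGP policy is set through router configuration rather than code like this.

```python
def should_advertise(customers, neighbor, path):
    """Advertise a path only if it earns revenue.

    A provider carries traffic when either the neighbor receiving the
    advertisement or the destination at the end of the path is its customer.
    """
    destination = path[-1]
    return neighbor in customers or destination in customers


# Provider B has a single customer, X, and has learned the path B-A-W.
b_customers = {"X"}
path_baw = ["B", "A", "W"]
```

Under this rule B advertises BAW to its customer X but not to C, as the slide concludes.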

239 Network Layer4-239 Network Layer summary
- Service model
- Network-layer functions
- Instantiation on the Internet
  - Delivery model
  - Addressing
  - Forwarding
  - Routing

240 Network Layer4- 240 Extra slides

241 Network Layer4-241 Getting a datagram from source to dest.
Classful routing example
IP datagram: | misc fields | source IP addr | dest IP addr | data |
- datagram remains unchanged, as it travels source to destination
- addr fields of interest here
Routing table in A:
  Dest. Net.   next router   Nhops
  223.1.1                    1
  223.1.2      223.1.1.4     2
  223.1.3      223.1.1.4     2
[Figure: hosts A, B, E and a router on three networks; addresses shown: 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]

242 Network Layer4-242 Getting a datagram from source to dest.
Starting at A, given IP datagram addressed to B:
- look up net. address of B
- find B is on same net. as A
- link layer will send datagram directly to B inside link-layer frame
  - B and A are directly connected
IP datagram: | misc fields | 223.1.1.1 | 223.1.1.3 | data |
Routing table in A:
  Dest. Net.   next router   Nhops
  223.1.1                    1
  223.1.2      223.1.1.4     2
  223.1.3      223.1.1.4     2
[Figure: hosts A, B, E and a router on networks 223.1.1, 223.1.2, 223.1.3]

243 Network Layer4-243 Getting a datagram from source to dest.
Starting at A, dest. E:
  - look up network address of E
  - E on different network
    - A, E not directly attached
  - routing table: next hop router to E is 223.1.1.4
  - link layer sends datagram to router 223.1.1.4 inside link-layer frame
  - datagram arrives at 223.1.1.4
  - continued…..
IP datagram: | misc fields | 223.1.1.1 | 223.1.2.2 | data |
Routing table in A:
  Dest. Net.   next router   Nhops
  223.1.1                    1
  223.1.2      223.1.1.4     2
  223.1.3      223.1.1.4     2
[Figure: hosts A, B, E and a router on networks 223.1.1, 223.1.2, 223.1.3]
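
Host A's decision in these two cases can be sketched as a table lookup on the network part of the destination. The sketch assumes the classful convention used in the example (the first three octets name the network) and uses `None` for the empty next-router column; the function name is hypothetical.

```python
# Routing table in A as shown on the slide: network -> (next router, hops).
ROUTING_TABLE_A = {
    "223.1.1": (None, 1),         # directly attached network
    "223.1.2": ("223.1.1.4", 2),
    "223.1.3": ("223.1.1.4", 2),
}


def next_hop(dest_ip, table=ROUTING_TABLE_A):
    """Return the address the link layer should deliver the frame to."""
    network = ".".join(dest_ip.split(".")[:3])
    router, _hops = table[network]
    # On the local network the datagram goes straight to the destination.
    return dest_ip if router is None else router


if __name__ == "__main__":
    print(next_hop("223.1.1.3"))  # B: same network, delivered directly
    print(next_hop("223.1.2.2"))  # E: handed to router 223.1.1.4
```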

244 Network Layer4-244 Getting a datagram from source to dest.
Arriving at 223.1.1.4, destined for 223.1.2.2
  - look up network address of E
  - E on same network as router's interface 223.1.2.9
    - router, E directly attached
  - link layer sends datagram to 223.1.2.2 inside link-layer frame via interface 223.1.2.9
  - datagram arrives at 223.1.2.2!!! (hooray!)
IP datagram: | misc fields | 223.1.1.1 | 223.1.2.2 | data |
Routing table in router:
  Dest. network   next router   Nhops   interface
  223.1.1         -             1       223.1.1.4
  223.1.2         -             1       223.1.2.9
  223.1.3         -             1       223.1.3.27
[Figure: hosts A, B, E and a router on networks 223.1.1, 223.1.2, 223.1.3]

245 Network Layer4-245 Issues in Router Table Size
- One entry for every host on the Internet
  - 100M entries
- One entry for every LAN
  - Every host on LAN shares prefix
  - Still too many
- One entry for every organization
  - Every host in organization shares prefix
  - Requires careful address allocation
  - What constitutes an "organization"?

246 Network Layer4-246 Binary tree
Route prefixes:
  A  0*
  B  01000*
  C  011*
  D  1*
  E  100*
  F  1100*
  G  1101*
  H  1110*
  I  1111*
[Figure: binary tree with branches labeled 0 and 1; each route sits at the node reached by following the bits of its prefix]
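
A lookup in such a tree walks the address bit by bit and remembers the last route it passed, which yields the longest matching prefix. The sketch below builds a simple binary trie from the prefixes listed on the slide; the class and function names are my own, and addresses are given as bit strings for readability.

```python
class TrieNode:
    """One node of the binary tree: up to two children and an optional route."""

    def __init__(self):
        self.children = {}
        self.route = None


def build_trie(prefixes):
    """prefixes maps route name -> bit string (the part before the '*')."""
    root = TrieNode()
    for route, bits in prefixes.items():
        node = root
        for bit in bits:
            node = node.children.setdefault(bit, TrieNode())
        node.route = route
    return root


def lookup(root, address_bits):
    """Follow the address bits and return the last route seen on the way."""
    node, best = root, None
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best


PREFIXES = {
    "A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
    "F": "1100", "G": "1101", "H": "1110", "I": "1111",
}
```

An address starting 1011 matches only prefix D, while one starting 1000 matches both D and E and is resolved to the longer prefix E.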

247 Network Layer4-247 NAT example #2
- Use the source port field (of TCP or UDP) along with pool of IP addresses
  - Example: single, globally routable external IP address
Hosts 192.168.0.1 and 192.168.0.2 behind NAPT translator (ExternalIP=129.95.50.3):
  Packet #1: SrcIP=192.168.0.1 SrcPort=1312 DstIP=131.252.220.66 DstPort=21
  Packet #2: SrcIP=192.168.0.2 SrcPort=1312 DstIP=131.252.220.66 DstPort=21

248 Network Layer4-248 NAT example #2
Hosts 192.168.0.1 and 192.168.0.2 behind NAPT translator (ExternalIP=129.95.50.3):
  Packet #1:            SrcIP=192.168.0.1 SrcPort=1312 DstIP=131.252.220.66 DstPort=21
  Packet #2:            SrcIP=192.168.0.2 SrcPort=1312 DstIP=131.252.220.66 DstPort=21
  Packet #1 after NAPT: SrcIP=129.95.50.3 SrcPort=2000 DstIP=131.252.220.66 DstPort=21
  Packet #2 after NAPT: SrcIP=129.95.50.3 SrcPort=2001 DstIP=131.252.220.66 DstPort=21

249 Network Layer4-249 NAT example #2
Hosts 192.168.0.1 and 192.168.0.2 behind NAPT translator (ExternalIP=129.95.50.3):
  Reply #1: SrcIP=131.252.220.66 SrcPort=21 DstIP=129.95.50.3 DstPort=2000
  Reply #2: SrcIP=131.252.220.66 SrcPort=21 DstIP=129.95.50.3 DstPort=2001

250 Network Layer4-250 NAT example #2
Hosts 192.168.0.1 and 192.168.0.2 behind NAPT translator (ExternalIP=129.95.50.3):
  Reply #1:            SrcIP=131.252.220.66 SrcPort=21 DstIP=129.95.50.3 DstPort=2000
  Reply #2:            SrcIP=131.252.220.66 SrcPort=21 DstIP=129.95.50.3 DstPort=2001
  Reply #1 after NAPT: SrcIP=131.252.220.66 SrcPort=21 DstIP=192.168.0.1 DstPort=1312
  Reply #2 after NAPT: SrcIP=131.252.220.66 SrcPort=21 DstIP=192.168.0.2 DstPort=1312
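
The translation in this example boils down to two lookup tables, one per direction. The sketch below is a simplification: the class name and the choice of 2000 as the first external port are taken from the example or invented for illustration, and a real translator would also key on the protocol and expire idle mappings.

```python
class Napt:
    """Toy NAPT translator with a single external address."""

    def __init__(self, external_ip, first_port=2000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.out = {}   # (internal ip, internal port) -> external port
        self.back = {}  # external port -> (internal ip, internal port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, allocating a port if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.external_ip, self.out[key]

    def inbound(self, dst_port):
        """Map the destination port of a reply back to the internal host."""
        return self.back.get(dst_port)


if __name__ == "__main__":
    napt = Napt("129.95.50.3")
    print(napt.outbound("192.168.0.1", 1312))  # packet #1
    print(napt.outbound("192.168.0.2", 1312))  # packet #2
    print(napt.inbound(2001))                  # reply #2
```

Both hosts use source port 1312, so the external port is the only field that lets the translator tell their replies apart.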

251 Network Layer4-251 Link-state broadcasts: Wrapped sequence numbers
- Wrapped sequence numbers
  - 0-N where N is large
  - If difference between numbers is large, assume a wrap
  - A is older than B if
    - A < B and |A-B| < N/2, or
    - A > B and |A-B| > N/2
- What about new nodes or rebooted nodes that are out of sync with sequence number space?
  - Lollipop sequence (Perlman 1983)
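
The comparison rule translates directly into code. The function name is hypothetical and N = 256 in the examples is an arbitrary choice.

```python
def is_older(a, b, n):
    """Wrapped comparison: True if sequence number a is older than b.

    A small gap is taken at face value; a gap larger than n/2 is assumed
    to come from the counter wrapping around.
    """
    diff = abs(a - b)
    return (a < b and diff < n / 2) or (a > b and diff > n / 2)


if __name__ == "__main__":
    print(is_older(5, 10, 256))   # True: 5 precedes 10
    print(is_older(250, 3, 256))  # True: 3 was issued after the counter wrapped
    print(is_older(10, 5, 256))   # False
```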

252 Network Layer4-252 Lollipop sequence numbers
- Divide sequence number space
- Special negative sequence for recovering from reboot
  - New and rebooted nodes use negative sequence numbers
  - Upon receipt of negative number, other nodes inform these nodes of current "up-to-date" sequence number
- A older than B if
  - A < 0 and A < B
  - A > 0, A < B and (B – A) < N/4
  - A > 0, A > B and (A – B) > N/4
[Figure: lollipop-shaped number space running from -N/2 up to 0, then around a circle up to N/2 - 1]
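
A literal transcription of the three conditions, offered as a sketch rather than a complete implementation: the function name is hypothetical, N = 256 is arbitrary, and cases the slide does not list (such as A equal to 0) simply return False here.

```python
def lollipop_is_older(a, b, n):
    """True if a is older than b under the three rules on the slide."""
    if a < 0:
        # Negative numbers form the straight "stick" of the lollipop.
        return a < b
    if a > 0:
        if a < b:
            return (b - a) < n / 4
        if a > b:
            return (a - b) > n / 4
    return False


if __name__ == "__main__":
    print(lollipop_is_older(-5, 3, 256))   # True: a rebooted node's number is older
    print(lollipop_is_older(10, 20, 256))  # True: small gap, ordinary ordering
    print(lollipop_is_older(100, 10, 256)) # True: large gap, assumed wrap
```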

253 Network Layer4-253 Distance Vector Algorithm
- $D_x(y)$ = estimate of least cost from x to y
- Distance vector: $D_x = [D_x(y) : y \in N]$
- Node x knows cost to each neighbor v: c(x,v)
- Node x maintains $D_x = [D_x(y) : y \in N]$
- Node x also maintains its neighbors' distance vectors
  - For each neighbor v, x maintains $D_v = [D_v(y) : y \in N]$

254 Network Layer4-254
[Figure: distance tables at nodes x, y, z over time for the topology x-y cost 2, y-z cost 1, x-z cost 7]
Each table has rows "from x / from y / from z" and columns "cost to x, y, z":
  time     node x table              node y table              node z table
  start    0 2 7 / ∞ ∞ ∞ / ∞ ∞ ∞     ∞ ∞ ∞ / 2 0 1 / ∞ ∞ ∞     ∞ ∞ ∞ / ∞ ∞ ∞ / 7 1 0
  step 1   0 2 3 / 2 0 1 / 7 1 0
$D_x(y) = \min\{c(x,y) + D_y(y),\ c(x,z) + D_z(y)\} = \min\{2+0,\ 7+1\} = 2$
$D_x(z) = \min\{c(x,y) + D_y(z),\ c(x,z) + D_z(z)\} = \min\{2+1,\ 7+0\} = 3$

255 Network Layer4-255
[Figure: distance tables at nodes x, y, z over time for the topology x-y cost 2, y-z cost 1, x-z cost 7]
Each table has rows "from x / from y / from z" and columns "cost to x, y, z":
  time     node x table              node y table              node z table
  start    0 2 7 / ∞ ∞ ∞ / ∞ ∞ ∞     ∞ ∞ ∞ / 2 0 1 / ∞ ∞ ∞     ∞ ∞ ∞ / ∞ ∞ ∞ / 7 1 0
  step 1   0 2 3 / 2 0 1 / 7 1 0     0 2 7 / 2 0 1 / 7 1 0     0 2 7 / 2 0 1 / 3 1 0
  step 2   0 2 3 / 2 0 1 / 3 1 0     0 2 3 / 2 0 1 / 3 1 0     0 2 3 / 2 0 1 / 3 1 0
$D_x(y) = \min\{c(x,y) + D_y(y),\ c(x,z) + D_z(y)\} = \min\{2+0,\ 7+1\} = 2$
$D_x(z) = \min\{c(x,y) + D_y(z),\ c(x,z) + D_z(z)\} = \min\{2+1,\ 7+0\} = 3$

257 Network Layer4-257 VC implementation
A VC consists of:
1. Path from source to destination
2. VC numbers, one number for each link along path
3. Entries in forwarding tables in routers along path
- Packet belonging to VC carries a VC number.
- VC number must be changed on each link.
  - New VC number comes from forwarding table

258 Network Layer4-258 Forwarding table
Forwarding table in northwest router:
  Incoming interface   Incoming VC #   Outgoing interface   Outgoing VC #
  1                    12              3                    22
  2                    63              1                    18
  3                    7               2                    17
  1                    97              3                    87
  …                    …               …                    …
Routers maintain connection state information!
[Figure: routers with interfaces numbered 1, 2, 3; VC numbers 12, 22, 32 on successive links of the path]
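
Forwarding on a virtual circuit is an exact-match lookup on the pair (incoming interface, incoming VC number). The sketch below loads the four rows shown for the northwest router; the function name is hypothetical.

```python
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
VC_TABLE = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward(in_interface, in_vc, table=VC_TABLE):
    """Return the outgoing interface and the VC number to write into the packet."""
    return table[(in_interface, in_vc)]


if __name__ == "__main__":
    out_interface, out_vc = forward(1, 12)
    print(f"send on interface {out_interface} with VC number {out_vc}")
```

Because each entry exists only while its circuit is set up, this table is the per-connection state the slide refers to.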

259 Network Layer4- 259 Forwarding table
Destination Address Range                        Link Interface
11001000 00010111 00010000 00000000
  through                                        0
11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000
  through                                        1
11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000
  through                                        2
11001000 00010111 00011111 11111111

otherwise                                        3
4 billion possible entries
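A range-based lookup over this table might be sketched as follows. One assumption is labeled in the code: interface 2's range is taken to start at 11001000 00010111 00011001 00000000 so that it does not overlap interface 1's range. The helper names `bits`, `RANGES` and `lookup` are illustrative.

```python
def bits(s):
    """Parse a space-separated 32-bit binary address into an integer."""
    return int(s.replace(" ", ""), 2)


RANGES = [
    (bits("11001000 00010111 00010000 00000000"),
     bits("11001000 00010111 00010111 11111111"), 0),
    (bits("11001000 00010111 00011000 00000000"),
     bits("11001000 00010111 00011000 11111111"), 1),
    # Assumption: this range starts at ...00011001 so it does not overlap interface 1.
    (bits("11001000 00010111 00011001 00000000"),
     bits("11001000 00010111 00011111 11111111"), 2),
]
DEFAULT_INTERFACE = 3  # the "otherwise" row


def lookup(addr):
    """Return the link interface whose address range contains addr."""
    for low, high, iface in RANGES:
        if low <= addr <= high:
            return iface
    return DEFAULT_INTERFACE


print(lookup(bits("11001000 00010111 00010110 10100001")))  # 0
```

A linear scan is only workable because the table has three ranges; with one entry per destination the table would need about 4 billion rows, which is why real routers aggregate addresses into prefixes.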

260 Network Layer4- 260 RIP Table example (continued)
Router: giroflee.eurocom.fr
Destination           Gateway           Flags  Ref   Use     Interface
--------------------  ----------------  -----  ----  ------  ---------
127.0.0.1             127.0.0.1         UH     0     26492   lo0
192.168.2.            192.168.2.5       U      2     13      fa0
193.55.114.           193.55.114.6      U      3     58503   le0
192.168.3.            192.168.3.5       U      2     25      qaa0
224.0.0.0             193.55.114.6      U      3     0       le0
default               193.55.114.129    UG     0     143454
- Three attached class C networks (LANs)
- Router only knows routes to attached LANs
- Default router used to “go up”
- Route multicast address: 224.0.0.0
- Loopback interface (for debugging)

261 Network Layer4-261 DUAL
- Diffusing Update Algorithm
  - Garcia-Luna-Aceves 1989
  - Goal: Avoid transient loops in DV and LS algorithms
    - Similar in flavor to route poisoning and path holddown
  - 2 ideas
    - A path shorter than current path cannot contain a loop
    - Based on diffusing computation (Dijkstra-Scholten 1980)
      - Wait until computation completes before changing routes in response to a new update
      - Similar to path-holddown
  - 3 kinds of messages
    - Update, query, reply
  - 2 states for routers
    - Active (queries outstanding), passive

262 Network Layer4- 262 DUAL
On update
  if (lower cost)
    adopt
  else if (higher cost) {
    if (from next hop) {
      if (any path exists < old length from next hop)
        switch path
      else
        freeze route
        send query to all neighbors except next hop
        go into active
        wait for reply from all neighbors
        update route
        return to passive
    }
    send reply to all querying neighbors
  }

263 Network Layer4- 263 Hierarchical routing example
[Figure: regions 1 to 5 with sub-regions 1.1, 1.2, 2.1, 2.2, 2.2.1, 3.1, 3.2, 4.1, 4.2, 5.1, 5.2; links are labeled EGP or IGP.]

264 Network Layer4- 264 Inter-AS routing
- EGP
- BGP

265 Network Layer4- 265 BGP route selection
- Router may learn about more than 1 route to some prefix. Router must select route.
- Elimination rules:
  1. Local preference value attribute: policy decision
  2. Shortest AS-PATH
  3. Closest NEXT-HOP router: hot potato routing
  4. Additional criteria
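The first three elimination rules can be read as a lexicographic comparison, sketched below. This is an illustration under stated assumptions, not the full BGP decision process: the `Route` fields, the convention that a higher local preference wins, and the use of an intra-AS cost to stand in for "closest NEXT-HOP" are all hypothetical modeling choices, and rule 4 (additional criteria) is left out.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    local_pref: int     # assumption: higher value is preferred
    as_path: tuple      # ASs the advert passed through, e.g. ("AS67", "AS17")
    next_hop_cost: int  # assumption: intra-AS cost to reach the NEXT-HOP router


def select_route(routes):
    """Apply rules 1-3 in order: local preference, AS-PATH length, NEXT-HOP cost."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop_cost),
    )


candidates = [
    Route("x", 100, ("AS2", "AS17", "AS67"), 1),
    Route("x", 100, ("AS3", "AS67"), 5),
    Route("x", 90, ("AS4",), 1),
]
best = select_route(candidates)
print(best.as_path)  # ('AS3', 'AS67')
```

The third candidate has the shortest AS-PATH but loses at rule 1; the first two tie on local preference, so rule 2 picks the two-hop path even though its next hop is farther away.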

266 Network Layer4- 266 Path attributes & BGP routes
- When advertising a prefix, advert includes BGP attributes.
  - prefix + attributes = “route”
- Two important attributes:
  - AS-PATH: contains the ASs through which the advert for the prefix passed: AS 67 AS 17
  - NEXT-HOP: Indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hop-AS.)
- When gateway router receives route advert, uses import policy to accept/decline.

267 Network Layer4- 267 Interconnected ASes
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c); inset shows the intra-AS routing algorithm and the inter-AS routing algorithm both feeding the forwarding table.]
- Forwarding table is configured by both intra- and inter-AS routing algorithm
  - Intra-AS sets entries for internal dests
  - Inter-AS & Intra-AS sets entries for external dests

268 Network Layer4- 268 Inter-AS tasks
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c).]
- Suppose router in AS1 receives datagram for which dest is outside of AS1
  - Router should forward packet towards one of the gateway routers, but which one?
AS1 needs:
1. to learn which dests are reachable through AS2 and which through AS3
2. to propagate this reachability info to all routers in AS1
Job of inter-AS routing!

269 Network Layer4- 269 Example: Setting forwarding table in router 1d
- Suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 (gateway 1c) but not from AS2.
- Inter-AS protocol propagates reachability info to all internal routers.
- Router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c.
- Puts in forwarding table entry (x,I).

270 Network Layer4- 270 Example: Choosing among multiple ASes
- Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2.
- To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.
- This is also the job of the inter-AS routing protocol!
- Hot potato routing: send packet towards closest of two routers.
Steps:
1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. Hot potato routing: choose the gateway that has the smallest least cost
4. Determine from forwarding table the interface I that leads to least-cost gateway. Enter (x,I) in forwarding table
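The hot potato choice made by router 1d can be sketched in a few lines. The gateway names, costs and interface labels below are made-up inputs for illustration; in a real router the costs would come from the intra-AS protocol and the interfaces from the existing forwarding table.

```python
def hot_potato(gateway_costs, interface_to):
    """Pick the gateway with the smallest least-cost path and the interface toward it.

    gateway_costs: {gateway: least cost from this router to that gateway}
    interface_to:  {gateway: local interface on the least-cost path to it}
    """
    gateway = min(gateway_costs, key=gateway_costs.get)
    return gateway, interface_to[gateway]


# Hypothetical view from router 1d: subnet x is reachable via gateways 1b and 1c.
gateway_costs = {"1b": 5, "1c": 2}
interface_to = {"1b": "I2", "1c": "I1"}

forwarding_table = {}
gateway, iface = hot_potato(gateway_costs, interface_to)
forwarding_table["x"] = iface  # enter (x, I) in the forwarding table
print(gateway, forwarding_table)  # 1c {'x': 'I1'}
```

Only the cost inside AS1 is considered, so the packet leaves the AS as quickly as possible regardless of the cost of the remaining path.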

271 Network Layer4-271 Distance Vector in Practice
- RIP and RIP2
  - Uses split-horizon/poison reverse
- BGP
  - Propagates entire path
  - Path also used for effecting policies

272 Network Layer4- 272 BGP path selection
- router may learn about more than 1 route to some prefix. Router must select route.
- elimination rules:
  1. local preference value attribute: policy decision
  2. shortest AS-PATH
  3. closest NEXT-HOP router: hot potato routing
  4. additional criteria


274 Network Layer4- 274 Broadcast Routing
- deliver packets from source to all other nodes
- source duplication is inefficient
- source duplication: how does source determine recipient addresses?
[Figure: routers R1 to R4 in two panels, “source duplication” and “in-network duplication”, with labels “duplicate creation/transmission” and “duplicate”.]

275 Network Layer4- 275 In-network duplication
- flooding: when node receives broadcast packet, sends copy to all neighbors
  - Problems: cycles & broadcast storm
- controlled flooding: node only broadcasts packet if it hasn’t broadcast same packet before
  - Node keeps track of packet ids already broadcast
  - Or reverse path forwarding (RPF): only forward packet if it arrived on shortest path between node and source
- spanning tree
  - No redundant packets received by any node
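Controlled flooding with per-node packet ids can be simulated with a small sketch. The four-node topology (a triangle A, B, C plus a leaf D) is a made-up example, and the function name and return values are illustrative.

```python
from collections import deque


def controlled_flood(graph, source, pkt_id):
    """Flood pkt_id from source; each node re-broadcasts a given id at most once."""
    seen = {node: set() for node in graph}  # packet ids each node already broadcast
    seen[source].add(pkt_id)
    queue = deque((source, nbr) for nbr in graph[source])
    transmissions = 0
    while queue:
        sender, node = queue.popleft()
        transmissions += 1
        if pkt_id in seen[node]:
            continue  # duplicate: drop it, do not broadcast again
        seen[node].add(pkt_id)
        # Copy to every neighbor except the one the packet came from.
        queue.extend((node, nbr) for nbr in graph[node] if nbr != sender)
    reached = {n for n in graph if pkt_id in seen[n]}
    return reached, transmissions


# Hypothetical topology containing the cycle A-B-C.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
reached, transmissions = controlled_flood(graph, "A", pkt_id=1)
print(sorted(reached), transmissions)  # ['A', 'B', 'C', 'D'] 5
```

The cycle no longer causes a broadcast storm, but two of the five transmissions are still redundant copies between B and C, which is the waste a spanning tree removes.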

276 Network Layer4- 276 Spanning Tree
- First construct a spanning tree
- Nodes forward copies only along spanning tree
[Figure: nodes A, B, c, D, E, F, G; (a) Broadcast initiated at A, (b) Broadcast initiated at D.]

277 Network Layer4- 277 Spanning Tree: Creation
- Center node
- Each node sends unicast join message to center node
  - Message forwarded until it arrives at a node already belonging to spanning tree
[Figure: nodes A, B, c, D, E, F, G; (a) Stepwise construction of spanning tree, steps 1 to 5, (b) Constructed spanning tree.]
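The join procedure can be sketched as follows. The links in the slide's figure are not recoverable from the transcript, so the topology, the choice of E as center and the join order are all assumptions; unicast routes toward the center are approximated with hop-count shortest paths found by breadth-first search.

```python
from collections import deque


def next_hops_toward(graph, center):
    """BFS from the center: parent[n] is n's unicast next hop toward the center."""
    parent = {center: None}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


def build_spanning_tree(graph, center, join_order):
    """Each node's join travels toward the center until it meets the tree."""
    parent = next_hops_toward(graph, center)
    in_tree = {center}
    tree_edges = set()
    for node in join_order:
        while node not in in_tree:
            tree_edges.add(frozenset((node, parent[node])))  # path becomes a branch
            in_tree.add(node)
            node = parent[node]
    return tree_edges


# Hypothetical seven-node topology with E as the center node.
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "E", "F"],
    "D": ["B", "E"], "E": ["C", "D", "F", "G"], "F": ["C", "E", "G"],
    "G": ["E", "F"],
}
tree = build_spanning_tree(graph, "E", ["F", "B", "A", "C", "D", "G"])
print(len(tree))  # 6
```

B's join passes through C, which is not yet on the tree, so both B-C and C-E are grafted; when C's own turn comes it is already a member and adds nothing.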

278 Multicast Routing: Problem Statement
- Goal: find a tree (or trees) connecting routers having local mcast group members
  - tree: not all paths between routers used
  - source-based: different tree from each sender to rcvrs
  - shared-tree: same tree used by all group members
[Figure: two panels, “Shared tree” and “Source-based trees”.]

279 Approaches for building mcast trees
Approaches:
- source-based tree: one tree per source
  - shortest path trees
  - reverse path forwarding
- group-shared tree: group uses one tree
  - minimal spanning (Steiner)
  - center-based trees
…we first look at basic approaches, then specific protocols adopting these approaches

280 Shortest Path Tree
- mcast forwarding tree: tree of shortest path routes from source to all receivers
  - Dijkstra’s algorithm
[Figure: source S and routers R1 to R7, tree links numbered 1 to 6. Legend: router with attached group member; router with no attached group member; link used for forwarding, where i indicates the order in which the algorithm added the link.]
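A shortest path tree built with Dijkstra's algorithm, including the order in which links are added, might look like the sketch below. The link costs in the slide's figure are not recoverable, so the small weighted topology here is a made-up example, and the function returns the tree to every router rather than pruning it to routers with group members.

```python
import heapq


def shortest_path_tree(graph, source):
    """Dijkstra from source; returns parent links, distances and link-add order."""
    dist = {source: 0}
    parent = {}
    order = []  # (upstream, downstream) links in the order they join the tree
    done = set()
    heap = [(0, source, None)]
    while heap:
        d, node, via = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already finalized node
        done.add(node)
        if via is not None:
            parent[node] = via
            order.append((via, node))
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr, node))
    return parent, dist, order


# Hypothetical topology: S is the source, costs are illustrative.
graph = {
    "S": {"R1": 1, "R2": 4},
    "R1": {"S": 1, "R2": 2},
    "R2": {"S": 4, "R1": 2, "R3": 1},
    "R3": {"R2": 1},
}
parent, dist, order = shortest_path_tree(graph, "S")
print(order)  # [('S', 'R1'), ('R1', 'R2'), ('R2', 'R3')]
```

The `order` list plays the role of the link numbers in the figure legend: links are added in increasing distance from the source.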

281 Reverse Path Forwarding
- rely on router’s knowledge of unicast shortest path from it to sender
- each router has simple forwarding behavior:
if (mcast datagram received on incoming link on shortest path back to sender)
  then flood datagram onto all outgoing links
  else ignore datagram
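The per-router RPF check can be written as a single function. The data layout is an assumption: links are named after the neighbor they lead to, `unicast_next_hop[router][source]` is the router's unicast next hop toward the source, and "all outgoing links" is interpreted as every link except the one the datagram arrived on.

```python
def rpf_forward(router, arrival_link, source, unicast_next_hop, links):
    """Return the links to flood on, or [] when the datagram fails the RPF check."""
    if arrival_link != unicast_next_hop[router][source]:
        return []  # not on the shortest path back to the sender: ignore datagram
    return [link for link in links[router] if link != arrival_link]


# Hypothetical state at router R4, whose shortest path back to source S is via R2.
unicast_next_hop = {"R4": {"S": "R2"}}
links = {"R4": ["R2", "R3", "R6"]}

print(rpf_forward("R4", "R2", "S", unicast_next_hop, links))  # ['R3', 'R6']
print(rpf_forward("R4", "R3", "S", unicast_next_hop, links))  # []
```

The router needs no per-packet state: the check relies only on the unicast routing table it already maintains.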

282 Reverse Path Forwarding: example
- result is a source-specific reverse SPT
  - may be a bad choice with asymmetric links
[Figure: source S and routers R1 to R7. Legend: router with attached group member; router with no attached group member; datagram will be forwarded; datagram will not be forwarded.]

283 Reverse Path Forwarding: pruning
- forwarding tree contains subtrees with no mcast group members
  - no need to forward datagrams down subtree
  - “prune” msgs sent upstream by router with no downstream group members
[Figure: source S and routers R1 to R7, with three prune messages marked P. Legend: router with attached group member; router with no attached group member; prune message; links with multicast forwarding.]

284 Shared-Tree: Steiner Tree
- Steiner Tree: minimum cost tree connecting all routers with attached group members
- problem is NP-complete
- excellent heuristics exist
- not used in practice:
  - computational complexity
  - information about entire network needed
  - monolithic: rerun whenever a router needs to join/leave

285 Center-based trees
- single delivery tree shared by all
- one router identified as “center” of tree
- to join:
  - edge router sends unicast join-msg addressed to center router
  - join-msg “processed” by intermediate routers and forwarded towards center
  - join-msg either hits existing tree branch for this center, or arrives at center
  - path taken by join-msg becomes new branch of tree for this router

286 Center-based trees: an example
Suppose R6 chosen as center:
[Figure: routers R1 to R7 with join paths numbered 1, 2, 3. Legend: router with attached group member; router with no attached group member; path order in which join messages generated.]

287 Internet Multicasting Routing: DVMRP
- DVMRP: distance vector multicast routing protocol, RFC1075
- flood and prune: reverse path forwarding, source-based tree
  - RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers
  - no assumptions about underlying unicast
  - initial datagram to mcast group flooded everywhere via RPF
  - routers not wanting group: send upstream prune msgs

288 DVMRP: continued…
- soft state: DVMRP router periodically (1 min.) “forgets” that branches are pruned:
  - mcast data again flows down unpruned branch
  - downstream router: reprune or else continue to receive data
- routers can quickly regraft to tree
  - following IGMP join at leaf
- odds and ends
  - commonly implemented in commercial routers
  - Mbone routing done using DVMRP

289 Tunneling
Q: How to connect “islands” of multicast routers in a “sea” of unicast routers?
- mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram
- normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router
- receiving mcast router unencapsulates to get mcast datagram
[Figure: two panels, “physical topology” and “logical topology”.]

290 PIM: Protocol Independent Multicast
- not dependent on any specific underlying unicast routing algorithm (works with all)
- two different multicast distribution scenarios:
Dense:
- group members densely packed, in “close” proximity.
- bandwidth more plentiful
Sparse:
- # networks with group members small wrt # interconnected networks
- group members “widely dispersed”
- bandwidth not plentiful

291 Consequences of Sparse-Dense Dichotomy:
Dense:
- group membership by routers assumed until routers explicitly prune
- data-driven construction of mcast tree (e.g., RPF)
- bandwidth and non-group-router processing profligate
Sparse:
- no membership until routers explicitly join
- receiver-driven construction of mcast tree (e.g., center-based)
- bandwidth and non-group-router processing conservative

292 PIM - Dense Mode
flood-and-prune RPF, similar to DVMRP but
- underlying unicast protocol provides RPF info for incoming datagram
- less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm
- has protocol mechanism for router to detect it is a leaf-node router

293 PIM - Sparse Mode
- center-based approach
- router sends join msg to rendezvous point (RP)
  - intermediate routers update state and forward join
- after joining via RP, router can switch to source-specific tree
  - increased performance: less concentration, shorter paths
[Figure: routers R1 to R7; labels “join”, “all data multicast from rendezvous point”, “rendezvous point”.]

294 PIM - Sparse Mode
sender(s):
- unicast data to RP, which distributes down RP-rooted tree
- RP can extend mcast tree upstream to source
- RP can send stop msg if no attached receivers
  - “no one is listening!”
[Figure: routers R1 to R7; labels “join”, “all data multicast from rendezvous point”, “rendezvous point”.]


296 Network Layer4- 296 Router Architecture Overview
Two key router functions:
- Routing
  - Determine route taken by packets from source to destination
  - Run protocol (RIP, OSPF, BGP)
    - Generate forwarding table from routing algorithms
    - Algorithms based on either LS or DV
- Forwarding
  - Process of moving packets from input port to output port
  - Lookup forwarding table given information in packet
  - Switch/forward datagrams from incoming to outgoing link based on route

297 Network Layer4- 297 What Does a Router Look Like?
- Routing processor/controller
  - Handles routing protocols, error conditions
- Line cards
  - Network interface cards
- Forwarding engine
  - Fast path routing (hardware vs. software)
- Backplane
  - Switch or bus interconnect

298 Network Layer4- 298 Typical mode of operation
- Packet arrives at inbound line card
- Header transferred to forwarding engine
- Forwarding engine determines output interface given a table initialized by routing processor
- Forwarding engine signals result to line card
- Packet copied to outbound line card

299 Network Layer4- 299 Routing Processor
- Runs routing protocol
- Uploads forwarding table to forwarding engines
  - Forwarding engines with two forwarding tables to allow easy switchover (double buffering)
- Typically performs “slow-path” processing
  - ICMP error messages
  - IP option processing
  - IP fragmentation
  - IP multicast packets

300 Network Layer4- 300 Input Port Functions
Physical layer: bit-level reception
Data link layer: e.g., Ethernet (see chapter 5)
Decentralized switching:
- given datagram dest., lookup output port using forwarding table in input port memory
- goal: complete input port processing at ‘line speed’
- queuing: if datagrams arrive faster than forwarding rate into switch fabric

301 Network Layer4-301 Input Port Queuing
- Fabric slower than input ports combined => queuing may occur at input queues
- Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward
- queueing delay and loss due to input buffer overflow!

302 Network Layer4- 302 Input Port Queuing
- Possible solution
  - Virtual output buffering
    - Maintain a per-output buffer at each input
    - Solves head of line blocking problem
    - Each of the MxN input buffers places a bid for its output
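The per-output buffering idea can be sketched for a single input port. The class and method names are illustrative, and the scheduler that arbitrates between bids from different inputs is not modeled.

```python
from collections import deque


class VirtualOutputQueues:
    """One input port that keeps a separate FIFO for every output port."""

    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output):
        self.queues[output].append(packet)

    def bids(self):
        """Outputs this input currently wants to send to (non-empty queues)."""
        return [out for out, queue in enumerate(self.queues) if queue]

    def dequeue(self, output):
        return self.queues[output].popleft()


voq = VirtualOutputQueues(num_outputs=2)
voq.enqueue("p0", output=0)  # arrives first, destined for a busy output
voq.enqueue("p1", output=1)  # arrives second, destined for an idle output

# With a single FIFO, p1 would wait behind p0. Here the input bids for both
# outputs, and p1 can be sent even while output 0 is unavailable.
print(voq.bids())      # [0, 1]
print(voq.dequeue(1))  # p1
```

With M inputs and N outputs this layout needs MxN queues in total, which is the buffer count the slide refers to.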

303 Network Layer4- 303 Forwarding Engine
- Two major components
  - Lookup logic/software
    - Data structures and algorithms to lookup route table
    - See previous section on IP route lookup
  - Caches
    - Small, fast memory storing recent lookups
  - Alternatives
    - Hardware-support
    - Hints

304 Network Layer4- 304 Caches
- Leverage temporal locality
- Many packets to same destination
  - Long flows help, short flows do not
- Similar to idea behind IP switching (ATM/MPLS) where long-lived flows map into single label
- Example
  - Partridge et al., “A 50-Gb/s IP Router”, IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998.
  - 8KB L1 Icache: holds full forwarding code
  - 96KB L2 cache: forwarding table cache
  - 16MB L3 cache: full forwarding table x 2, double buffered for updates

305 Network Layer4- 305 Alternatives
- Lookup via content addressable memory (CAM)
  - Hardware based route lookup
  - Input = tag, output = value associated with tag
  - Requires exact match with tag
    - Multiple cycles (1 per prefix length searched) with single CAM
    - Multiple CAMs (1 per prefix) searched in parallel
  - Ternary CAM
    - 0, 1, don’t care values in tag match
    - Priority (i.e. longest prefix) by order of entries in CAM
- “Spatial caching” via protocol acceleration
  - Add clue (5 bits) to IP header
  - Indicate where IP lookup ended on previous node (Bremler-Barr SIGCOMM 99)
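The ternary CAM behavior can be modeled in software as an ordered list of patterns. This is only a functional sketch: a real TCAM compares all entries in parallel in hardware, whereas the loop below scans them one by one. The 8-bit addresses, the `x` don't-care symbol and the entries are made up to keep the example short.

```python
def tcam_lookup(entries, addr):
    """Return the value of the first matching entry, or None if nothing matches.

    entries: list of (pattern, value); a pattern uses '0', '1' and 'x' (don't care).
    Entries are stored longest prefix first, so the first hit is the longest match.
    """
    for pattern, value in entries:
        if all(p in ("x", a) for p, a in zip(pattern, addr)):
            return value
    return None


# Hypothetical 8-bit table, ordered by decreasing prefix length.
entries = [
    ("11001xxx", 1),  # 5-bit prefix
    ("110xxxxx", 2),  # 3-bit prefix
    ("xxxxxxxx", 0),  # default route
]

print(tcam_lookup(entries, "11001010"))  # 1
print(tcam_lookup(entries, "11010000"))  # 2
```

The address 11001010 also matches the 3-bit prefix and the default entry; entry order alone decides that the 5-bit prefix wins.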

306 Network Layer4- 306 Types of network switching fabrics
[Figure: four fabric types: Memory, Bus, Multistage interconnection, Crossbar interconnection.]

307 Network Layer4- 307 Types of network switching fabrics
- Issues
  - Switch contention
    - Packets arrive faster than switching fabric can switch
    - Speed of switching fabric versus line card speed determines input queuing vs. output queuing

308 Network Layer4- 308 Switching Via Memory
First generation routers:
- packet copied by system’s (single) CPU
- 2 bus crossings per datagram
- speed limited by memory bandwidth
Second generation routers:
- input port processor performs lookup, copy into memory
- Cisco Catalyst 8500
[Figure: Input Port, Memory and Output Port connected by the System Bus.]

309 Network Layer4- 309 Switching Via Bus
- Datagram from input port memory directly to output port memory via a shared bus
- Issues
  - Bus contention: switching speed limited by bus bandwidth
- Examples
  - 1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)

310 Network Layer4-310 Switching Via An Interconnection Network
- Overcome bus bandwidth limitations
- Crossbar networks
  - Fully connected ($n^2$ elements)
  - All one-to-one, invertible permutations supported
- Issues
  - Crossbar with $n^2$ elements hard to scale

311 Network Layer4-311 Switching Via An Interconnection Network
- Multi-stage interconnection networks (Banyan)
  - Initially developed to connect processors in multiprocessor
  - Typically $O(n \log n)$ elements
  - Datagram fragmented into fixed-length cells, switched through the fabric
- Issues
  - Blocking (not all one-to-one, invertible permutations supported)
- Example
  - Cisco 12000: Gbps through an interconnection network
[Figure: interconnection network with inputs A, B, C, D and outputs W, X, Y, Z.]

312 Network Layer4-312 Output Ports
- Output contention
  - Datagrams arrive from fabric faster than output port’s transmission rate
  - Buffering required
  - Scheduling discipline chooses among queued datagrams for transmission

313 Network Layer4-313 Output port queueing
- buffering when arrival rate via switch exceeds output line speed
- queueing (delay) and loss due to output port buffer overflow!

