Network Layer4-1 Chapter 4 Network Layer A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers).

Presentation transcript:

Network Layer4-1 Chapter 4 Network Layer A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:  If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)  If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material. Thanks and enjoy! JFK/KWR All material copyright J.F Kurose and K.W. Ross, All Rights Reserved Computer Networking: A Top Down Approach 4 th edition. Jim Kurose, Keith Ross Addison-Wesley, July 2007.

Network Layer 4-2: Chapter 4: Network Layer
Chapter goals:
- understand principles behind network layer services:
  - network layer service models
  - forwarding versus routing
  - how a router works
  - routing (path selection)
  - dealing with scale
  - advanced topics: IPv6, mobility
- instantiation, implementation in the Internet

Network Layer 4-3: Chapter 4: Network Layer
- 4.1 Introduction
- 4.2 Virtual circuit and datagram networks
- 4.3 What’s inside a router
- 4.4 IP: Internet Protocol
  - Datagram format
  - IPv4 functions
  - ICMP
  - IPv6
- 4.5 Routing algorithms
  - Link state
  - Distance Vector
  - Hierarchical routing
- 4.6 Routing in the Internet
  - RIP
  - OSPF
  - BGP
- 4.7 Broadcast and multicast routing

Network Layer 4-4: Network layer
- transports segments from the sending to the receiving host
- on the sending side, encapsulates segments into datagrams
- on the receiving side, delivers segments to the transport layer
- network layer protocols run in every host and router
- a router examines header fields in all IP datagrams passing through it
[Figure: the two end hosts run the full stack (application, transport, network, data link, physical); each router on the path runs only network, data link, physical]

Network Layer 4-5: Network layer functions
- Connection setup
  - connection-oriented, host-to-host connection
  - datagram
- Delivery semantics
  - unicast, broadcast, multicast, anycast
  - in-order, any-order
- Security
  - secrecy, integrity, authenticity
- Demux to upper layer
  - next protocol
  - can be either transport or network (tunneling)
- Quality of service
  - provide predictable performance
- Fragmentation
  - break up packets based on data-link layer properties
- Routing
  - path selection and packet forwarding
- Addressing
  - flat vs. hierarchical
  - global vs. local
  - variable vs. fixed length

Network Layer 4-7: Network service model
Combining the functions into a particular network.
Q: What service model for the “channel” transporting datagrams from sender to receiver?
Example services for individual datagrams:
- guaranteed delivery
- guaranteed delivery with less than 40 msec delay
Example services for a flow of datagrams:
- in-order datagram delivery
- guaranteed minimum bandwidth to flow
- restrictions on changes in inter-packet spacing (jitter)

Network Layer 4-8: Network layer connection and connectionless service
- A datagram network provides network-layer connectionless service
- A VC network provides network-layer connection service
  - Analogous to the transport-layer services, but on a host-to-host basis with an in-network implementation

Network Layer 4-9: Connection-oriented virtual circuits
- Circuit abstraction
  - Examples: ATM, frame relay, X.25, phone network
  - Model
    - call setup and signaling for each call before data can flow
    - guaranteed performance during the call
    - call teardown and signaling to remove the call
  - Network support
    - each packet carries a circuit identifier (not the destination host ID)
    - every router on the source-dest path maintains “state” for each passing circuit
    - link and router resources (bandwidth, buffers) are allocated to the VC to guarantee circuit-like performance
[Figure: call setup between two end hosts: 1. Initiate call, 2. Incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data]

Network Layer 4-10: Connectionless datagram service
- Postal service abstraction (Internet)
  - Model
    - no call setup or teardown at the network layer
    - no service guarantees
  - Network support
    - no state within the network on end-to-end connections
    - packets forwarded based on destination host ID
    - packets between the same source-dest pair may take different paths
[Figure: two end hosts exchanging datagrams: 1. Send data, 2. Receive data]

Network Layer 4-11: Datagram or VC network: why?
Internet
- data exchange among computers
  - “elastic” service, no strict timing requirements
- “smart” end systems (computers)
  - can adapt, perform control, error recovery
  - simple inside network, complexity at “edge”
- many link types
  - different characteristics
  - uniform service difficult
ATM
- evolved from telephony
- human conversation:
  - strict timing, reliability requirements
  - need for guaranteed service
- “dumb” end systems
  - telephones
  - complexity inside network
  - only the network provider can deploy new services!

Network Layer 4-12: Network layer service models

Architecture  Service model  Bandwidth           Loss  Order  Timing  Congestion feedback
Internet      best effort    none                no    no     no      no (inferred via loss)
ATM           CBR            constant rate       yes   yes    yes     no congestion
ATM           VBR            guaranteed rate     yes   yes    yes     no congestion
ATM           ABR            guaranteed minimum  no    yes    no      yes
ATM           UBR            none                no    yes    no      no

The Bandwidth, Loss, Order, and Timing columns list the guarantees each service model offers.

Network Layer 4-13: Adding circuits to the Internet
- Intserv, Diffserv, RSVP
  - At the end of the course if time permits
  - Chapter 7 in the book

Network Layer 4-15: The Internet network layer
Host and router network layer functions, sitting between the transport layer (TCP, UDP) and the link and physical layers:
- Routing protocols: path selection (RIP, OSPF, BGP); they feed the forwarding table
- IP protocol: addressing conventions, datagram format, packet handling conventions
- ICMP protocol: error reporting, router “signaling”

Network Layer 4-17: IP datagram format
Header fields in order (each row of the header is 32 bits wide):
- ver: IP protocol version number
- head. len: header length (stored as a count of 32-bit words; 20 bytes when there are no options)
- type of service: “type” of data
- length: total datagram length (bytes)
- 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
- time to live: max number of remaining hops (decremented at each router)
- upper layer: upper layer protocol to deliver the payload to
- Internet checksum (covers the header)
- 32-bit source IP address
- 32-bit destination IP address
- Options (if any): e.g. timestamp, record route taken, specify list of routers to visit
- data (variable length, typically a TCP or UDP segment)
How much overhead with TCP?
- 20 bytes of TCP
- 20 bytes of IP
- = 40 bytes + app layer overhead

Network Layer 4-18: IP header
- Version
  - Currently 4, next version 6
- Header length
  - Length of header (20 bytes plus options)
- Type of Service
  - Typically ignored
  - Replaced by DiffServ and ECN
- Length
  - Total length in bytes of this datagram or fragment (header plus payload)
- Identification
  - To match up with other fragments
- Flags
  - Don’t Fragment flag
  - More Fragments flag
- Fragment offset
  - Where this fragment lies in the entire IP datagram
  - Measured in 8-octet units (13-bit field)

Network Layer 4-19: IP header (cont)
- Time to live
  - Ensures packets eventually exit the network
- Protocol
  - Demultiplexing to higher layer protocols (TCP, UDP, SCTP)
- Header checksum
  - Ensures some degree of header integrity
  - Relatively weak: 16 bits
- Source IP, Destination IP (32-bit addresses)
- Options
  - E.g. source routing, record route, etc.
  - Performance issues
    - Poorly supported
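The header fields above can be decoded in a few lines. The following is a minimal sketch, assuming a 20-byte header with no options; the function name `parse_ipv4_header` and the sample addresses are hypothetical and chosen only for illustration.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,            # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # 13-bit field, 8-byte units
        "ttl": ttl,
        "protocol": proto,                             # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical sample: a middle fragment (More Fragments set, offset 185 units).
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 0x1234, 0x2000 | 185, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
)
fields = parse_ipv4_header(sample)
```

The checksum in the sample is left at zero because the sketch only decodes fields; it does not validate them.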

Network Layer 4-21: Recall network layer functions
- How does IPv4 support...
  - Connection setup
  - Delivery semantics
  - Security
  - Demux to upper layer
  - Quality of service
  - Fragmentation
  - Addressing
  - Routing

Network Layer 4-22: IP connection setup
- Hourglass design
- No support for network layer connections
  - Unreliable datagram service
  - Out-of-order delivery possible
  - Connection semantics only at higher layer
  - Compare to ATM and phone network…

Network Layer 4-23: IP delivery semantics
- No reliability guarantees
  - Loss
- No ordering guarantees
  - Out-of-order delivery possible
- Unicast mostly
  - IP broadcast not forwarded
  - IP multicast supported, but not widely used

Network Layer 4-24: IP security
- Weak support for integrity
  - IP checksum
    - IP has a header checksum and leaves data integrity to TCP/UDP
    - Catches errors within a router or bridge that are not detected by the link layer
    - Incrementally updated as routers change fields
  - No support for secrecy, authenticity
- IPsec
  - Retrofits the IP network layer with encryption and authentication

Network Layer 4-25: Internet checksum (review)
Goal: detect “errors” (e.g., flipped bits) in a transmitted packet (note: a checksum over the segment contents is used at the transport layer only; IP checksums just its own header)
Sender:
- treat segment contents as a sequence of 16-bit integers (see TCP checksum)
- checksum: addition (1’s complement sum) of segment contents
- sender puts the checksum value into the UDP checksum field
Receiver:
- compute checksum of the received segment
- check if the computed checksum equals the checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless?
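The sender and receiver steps above can be sketched as follows. This is an illustrative implementation of the 16-bit one's complement sum, not code from the slides; the helper names `internet_checksum` and `verify` are made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """Receiver check: data plus its checksum field sums to 0xFFFF, so the result is 0."""
    return internet_checksum(data_with_checksum) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
packet = payload + checksum.to_bytes(2, "big")
```

A passing check only means no error was detected. Two flipped bits that cancel in the sum would still pass, which is the "errors nonetheless" case on the slide.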

Network Layer 4-26: IP demux to upper layer
- Protocol type field
  - 1 = ICMP
  - 2 = IGMP
  - 3 = GGP
  - 4 = IP in IP
  - 6 = TCP
  - 8 = EGP
  - 9 = IGP
  - 17 = UDP
  - 29 = ISO-TP4
  - 80 = ISO-IP
  - 88 = IGRP
  - 89 = OSPFIGP
  - 94 = IPIP

Network Layer 4-27: IP quality of service
- IP originally had a “type-of-service” (TOS) field to eventually support quality of service
  - Not used, ignored by most routers
- Mid 90s
  - Integrated services (intserv) and RSVP signalling
  - Per-flow end-to-end QoS support
    - Per-flow signaling
    - Per-flow network resource allocation (*FQ, *RR scheduling algorithms)
    - Set up and match flows on connection ID

Network Layer 4-28: IP quality of service
- RSVP
  - Provides end-to-end signaling to network elements
  - General purpose protocol for signaling information
  - Not used now on a per-flow basis to support intserv, but being reused for diffserv
- intserv
  - Defines service model (guaranteed, controlled-load)
  - Dozens of scheduling algorithms to support these services
    - WFQ, W²FQ, STFQ, Virtual Clock, DRR, etc.

Network Layer 4-29: IP quality of service
- Why did RSVP, intserv fail?
  - Complexity
    - Scheduling
    - Routing (pinning routes)
    - Per-flow signaling overhead
  - Lack of scalability
    - Per-flow state
  - Economics
    - Providers with no incentive to deploy
    - SLA, end-to-end billing issues
  - QoS is a weak-link property
    - Requires every device on the end-to-end path to support the flow

Network Layer 4-30: IP quality of service
- Now it’s diffserv…
  - Use the “type-of-service” bits as a priority marking
  - Core network relatively stateless
  - AF: Assured forwarding (drop precedence)
  - EF: Expedited forwarding (strict priority handling)

Network Layer 4-31: IP Fragmentation & Reassembly
- network links have an MTU (max. transfer unit): the largest possible link-level frame
  - different link types, different MTUs
- a large IP datagram (can be 64KB) is “fragmented” within the network
  - one datagram becomes several datagrams
  - IP header on each fragment
  - IP identifier and offset fields identify and order fragments
[Figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; followed by reassembly]

Network Layer 4-32: IP Fragmentation & Reassembly
- Where to do reassembly?
  - At end nodes: avoids unnecessary work
  - Dangerous to do at intermediate nodes
    - Requires buffer space
    - Must assume a single path through the network
    - The datagram may be fragmented again later on the route

Network Layer 4-33: IP Fragmentation and Reassembly
Example
- 4000-byte datagram
- MTU = 1500 bytes
One large datagram becomes several smaller datagrams:

Datagram    length  ID  fragflag  offset
original    4000    x   0         0
fragment 1  1500    x   1         0
fragment 2  1500    x   1         185
fragment 3  1040    x   0         370

Each full fragment carries 1480 bytes in its data field (1500 minus the 20-byte header), so the second fragment has offset = 1480/8 = 185.
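The numbers in this example can be reproduced with a short calculation. The sketch below assumes a 20-byte header with no options; the function name `fragment` and its return format are hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split a datagram of total_length bytes into fragments that fit the MTU."""
    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = sent + data < payload
        fragments.append({
            "length": data + header_len,   # each fragment repeats the IP header
            "offset": sent // 8,
            "fragflag": 1 if more else 0,
        })
        sent += data
    return fragments


frags = fragment(4000, 1500)
```

For the 4000-byte datagram, the 3980 data bytes split into 1480 + 1480 + 1020, which is why the last fragment is 1040 bytes long.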

Network Layer 4-34: Fragmentation is Harmful
- Uses resources poorly
  - Forwarding costs are per packet
  - Best if we can send large chunks of data
  - Worst case: packet just bigger than the MTU
- Poor end-to-end performance
  - Loss of a fragment makes the other fragments useless
- Reassembly is hard
  - Buffering constraints

Network Layer 4-35: Fragmentation
- Path MTU Discovery
  - Removes fragmentation from the network
  - Mandatory in IPv6
    - Network layer does no fragmentation
  - Hosts dynamically discover the smallest MTU of the path
    - Algorithm:
      - Initialize MTU to the MTU of the first hop
      - Send datagrams with the Don’t Fragment bit set
      - If an ICMP “pkt too big” message arrives, decrease the MTU
    - What happens if the path changes?
      - Periodically (>5 mins, or >1 min after previous increase), increase the MTU
    - Some routers will return the proper MTU
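The discovery loop can be illustrated with a small simulation. This is a sketch under stated assumptions, not a real network probe: the path is modelled as a hypothetical list of per-hop MTUs, and every router is assumed to report its next-hop MTU in the ICMP "packet too big" message (the slide notes that only some routers do).

```python
def discover_path_mtu(link_mtus: list) -> tuple:
    """Simulate a sender probing with the Don't Fragment bit set.

    link_mtus is the MTU of each hop on the (simulated) path, first hop first.
    Returns (path_mtu, number_of_probes_sent).
    """
    mtu = link_mtus[0]                     # start from the first-hop MTU
    probes = 0
    while True:
        probes += 1
        # The first hop that cannot carry the probe drops it and reports its MTU.
        bottleneck = next((m for m in link_mtus if m < mtu), None)
        if bottleneck is None:
            return mtu, probes             # probe reached the destination
        mtu = bottleneck                   # shrink and try again


path_mtu, probes = discover_path_mtu([1500, 1400, 1500, 576])
```

On this simulated path the sender needs three probes: 1500 is rejected at the 1400-byte link, 1400 is rejected at the 576-byte link, and 576 gets through.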

Network Layer 4-36: Fragmentation
- References
  - Colleen Shannon, David Moore, and k claffy (CAIDA, UC San Diego), “Characteristics of Fragmented IP Traffic on Internet Links,” ACM SIGCOMM Internet Measurement Workshop.
  - C. A. Kent and J. C. Mogul, “Fragmentation considered harmful,” in Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology, Aug.

Network Layer 4-37: IP Addressing
- IP address:
  - 32-bit identifier for a host or router interface
- interface: connection between a host or router and a physical link
  - routers typically have multiple interfaces
  - a host may have multiple interfaces
  - IP addresses are associated with an interface, not with the host or router

Network Layer 4-38: IP Addressing
- IP address:
  - network part (high-order bits)
  - host part (low-order bits)
- What’s a network?
  - all interfaces that can physically reach each other without an intervening router
  - each interface shares the same network part of the IP address
[Figure: a network consisting of 3 IP networks (LANs) joined by a router; for IP addresses starting with 223, the first 24 bits are the network address]

Network Layer 4-39: Subnets
How to find the networks (subnets)?
- Detach each interface from its router or host
- This creates “islands” of isolated networks
- Each isolated network is called a subnet
- Notation:
  - Interfaces on a subnet share identical bits as a prefix
  - The shared bits are identified by a mask
  - In the figure, the machine addresses on each subnet all begin with the same 24 bits, written with a /24 suffix (subnet mask: /24)
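The mask test described above can be sketched in code. The addresses used here are hypothetical documentation-range values, since the transcript does not preserve the ones from the figure; `netmask` and `same_subnet` are illustrative helper names.

```python
import ipaddress


def netmask(prefix_len: int) -> int:
    """Return the 32-bit mask with prefix_len leading one bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def same_subnet(a: str, b: str, prefix_len: int) -> bool:
    """Two interfaces are on one subnet if their masked prefix bits are identical."""
    mask = netmask(prefix_len)
    return (int(ipaddress.IPv4Address(a)) & mask) == (int(ipaddress.IPv4Address(b)) & mask)


mask_24 = str(ipaddress.IPv4Address(netmask(24)))
```

For example, `same_subnet("192.0.2.1", "192.0.2.200", 24)` is true because both addresses share their first 24 bits, while `"192.0.2.1"` and `"192.0.3.1"` only match once the prefix is shortened to 23 bits.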

Network Layer4-40 Subnets How many?

Network Layer 4-41: How do networks get IP addresses?
- Total IP address space: 2^32, about 4 billion addresses
- Initially one large class (8-bit network, 24-bit host)
  - ISP given an 8-bit network number to manage
  - Each router keeps track of each network (2^8 = 256 routes)
  - Each network has 16 million hosts
  - Problem: one size does not fit all
- Classful addressing
  - Accommodate smaller networks (LANs)
  - Class A: 128 networks, 16M hosts
  - Class B: 16K networks, 64K hosts
  - Class C: 2M networks, 256 hosts
  - Total routes: potentially up to 2,113,664 (128 + 16,384 + 2,097,152)

Class  High-order bits  Format
A      0                7 bits of net, 24 bits of host (/8)
B      10               14 bits of net, 16 bits of host (/16)
C      110              21 bits of net, 8 bits of host (/24)

Network Layer 4-42: IP address classes

Class  Leading bits  Layout
A      0             8-bit network ID, 24-bit host ID
B      10            16-bit network ID, 16-bit host ID
C      110           24-bit network ID, 8-bit host ID
D      1110          multicast addresses
E      1111          reserved for experiments
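Because the class is encoded in the leading bits, it can be read off the first octet alone. The following sketch shows that lookup and recomputes the route total from the classful addressing slide; `address_class` is a hypothetical helper name.

```python
def address_class(addr: str) -> str:
    """Classify a dotted-quad IPv4 address by its leading bits."""
    first_octet = int(addr.split(".")[0])
    if first_octet < 128:      # 0xxxxxxx
        return "A"
    if first_octet < 192:      # 10xxxxxx
        return "B"
    if first_octet < 224:      # 110xxxxx
        return "C"
    if first_octet < 240:      # 1110xxxx
        return "D"
    return "E"                 # 1111xxxx


# Number of class A, B and C networks: 7, 14 and 21 network bits respectively.
total_classful_routes = 2**7 + 2**14 + 2**21
```

The total evaluates to 2,113,664, matching the figure quoted for classful addressing.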

Network Layer4-43 Special IP Addresses r Private addresses – – Class A: ( /8 prefix) – Class B: ( /12 prefix) – Class C: ( /16 prefix) r : local host (a.k.a. the loopback address) r m IP broadcast to local hardware that must not be forwarded m r m IP address of unassigned host (BOOTP, ARP, DHCP) m Default route advertisement

Network Layer 4-44: IP Addressing Problem #1 (1984)
- Inefficient use of address space
  - Class A (rarely given out, sparse usage)
  - Class B = 64K hosts
    - Very few LANs have close to 64K hosts
    - Electrical/LAN limitations, performance or administrative reasons
    - e.g., a class B net is allocated enough addresses for 64K hosts, even if there are only 2K hosts in that network
  - Need a simple, address-efficient way to get multiple “networks”
    - Reduce the number of addresses that are assigned but not used
- Subnet addressing
  - Split large address ranges into multiple smaller ones (subnets)
  - Dramatically increases the potential number of routes!

Network Layer4-45 Subnetting r Variable length subnet masks m Subnet a class B address space into several chunks Network Host Network HostSubnet Mask

Network Layer4-46 Subnetting Example r Assume an organization was assigned a class B address r Assume it has < 100 hosts per subnet m How many host bits do we need? Seven m What is the network mask? or /25 m How many subnets of this size can be created within this address space? m List them

Network Layer 4-47: Subnetting Example
- Assume an organization was assigned a class B address
- Assume it has < 100 hosts per subnet
  - How many host bits do we need? Seven (2^7 = 128 ≥ 100)
  - What is the network mask? /25
  - How many subnets of this size can be created within this address space? 512 (a /16 holds 2^16 addresses and a /25 holds 2^7, so 2^16 / 2^7 = 2^9 = 512)
  - List them: each subnet is a /25 whose last 7 bits (*******) are the host bits
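The arithmetic in this example can be checked with Python's standard `ipaddress` module. The class B block `172.16.0.0/16` below is a hypothetical stand-in, since the transcript does not preserve the address used on the slide. The sketch also assumes two addresses per subnet are set aside for the network and broadcast addresses.

```python
import ipaddress
import math

hosts_needed = 100
# Assumption: reserve 2 extra addresses (network and broadcast) in each subnet.
host_bits = math.ceil(math.log2(hosts_needed + 2))
prefix_len = 32 - host_bits

block = ipaddress.ip_network("172.16.0.0/16")   # hypothetical class B block
subnets = list(block.subnets(new_prefix=prefix_len))

subnet_count = len(subnets)
mask = str(subnets[0].netmask)
first_two = [str(s) for s in subnets[:2]]
last = str(subnets[-1])
```

Listing `subnets` enumerates all 512 blocks, each 128 addresses after the previous one.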

Network Layer4-48 Subnetting Example r Split the following network into 16 equal subnetworks m /17

Network Layer 4-49: Subnetting Example
- Split the following network into 16 equal subnetworks: a /17
  - Split into 16 parts using the next 4 most significant bits after the prefix (2^4 = 16)
  - Solution: sixteen /21 subnets (17 + 4 = 21)
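To make the "next 4 bits" step concrete, here is a sketch that splits a prefix by hand with integer arithmetic. The network `172.16.128.0/17` is a hypothetical example (the slide's own address is missing from the transcript), and `split_equal` assumes the number of parts is a power of two.

```python
import ipaddress


def split_equal(network: str, parts: int) -> list:
    """Split a network into `parts` equal subnets (parts must be a power of two)."""
    net = ipaddress.ip_network(network)
    extra_bits = (parts - 1).bit_length()      # 16 parts -> 4 extra prefix bits
    new_prefix = net.prefixlen + extra_bits
    step = 1 << (32 - new_prefix)              # addresses per subnet
    base = int(net.network_address)
    return [
        f"{ipaddress.IPv4Address(base + i * step)}/{new_prefix}"
        for i in range(parts)
    ]


parts = split_equal("172.16.128.0/17", 16)
```

Each /21 holds 2048 addresses, so consecutive subnets differ by 8 in the third octet.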

Network Layer 4-50: IP Address Problem #2 (1991)
- Address space depletion
  - In danger of running out of classes A and B
  - Class A
    - very few in number, IANA frugal in giving them out
  - Class B
    - subnetting only applied to new allocations of class B
    - existing class B networks sparsely populated
    - people refuse to give them back
  - Class C
    - plenty available, but too small for most domains
- Supernetting
  - Assign multiple consecutive class C blocks as one block
  - Allows class C usage while limiting the number of routes used

Network Layer4-51 IP Address Problem #2 (1991) r Example m Combine the following class C networks into one larger network / / / / / / / /24 Answer: / * * * * * * * *
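Supernetting of this kind can be tried out with the standard `ipaddress` module. The eight class C networks below are hypothetical, because the transcript does not preserve the addresses from the slide; they are chosen to be consecutive and aligned so that they merge into a single block.

```python
import ipaddress

# Hypothetical: eight consecutive class C (/24) networks, third octet 8 through 15.
class_c_nets = [ipaddress.ip_network(f"200.1.{i}.0/24") for i in range(8, 16)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
supernet = list(ipaddress.collapse_addresses(class_c_nets))
```

The third octets 8 through 15 share their top five bits (00001), so the eight /24 routes collapse into one /21 route.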

Network Layer 4-52: IP Address Problem #3 (1991)
- Explosion of routes
  - Subnetting class B
  - Increasing use of class C explodes the number of routes
- Remove classes
  - Classless Inter-Domain Routing (CIDR)
  - Arbitrary aggregation of contiguous addresses

Network Layer 4-53: IP addressing: CIDR
- Original classful addressing
  - Use class structure (A, B, C) to determine the network ID for route lookup
- CIDR: Classless Inter-Domain Routing
  - Do not use classes to determine the network ID
  - Network portion of the address has arbitrary length
  - Route format: a.b.c.d/x, where x is the number of bits in the network portion of the address
[Figure: an address split into network part and host part, with a /23 prefix]

Network Layer4-54 CIDR r Assign any range of addresses to network m Use common part of address as network number m e.g., addresses * to * have the first 20 bits in common. Thus, we use this as the network number m netmask is /20, /xx is valid for almost any xx m /20 r Enables more efficient usage of address space (and router tables) m More on how this impacts routing later….

Network Layer4-55 CIDR example r Consider the following sets of /24 networks m /24 m /24 m /24 m /24 m /24 m /24 m /24 m /24 r Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?

Network Layer4-56 CIDR example r Consider the following sets of /24 networks m /24 = * m /24 = * /23 m /24 = * m /24 = * m /24 = * m /24 = * /22 m /24 = * m /24 = * /23 r Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?
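An exercise of this kind can be checked mechanically. The sketch below uses a hypothetical range (third octet 2 through 9), since the slide's own addresses are missing from the transcript; the range is chosen so that it is not aligned on a single prefix boundary and therefore needs more than one prefix.

```python
import ipaddress

# Hypothetical range covering eight consecutive /24 networks: 200.1.2.0 to 200.1.9.255.
first = ipaddress.IPv4Address("200.1.2.0")
last = ipaddress.IPv4Address("200.1.9.255")

# summarize_address_range yields the minimal list of prefixes covering the range exactly.
prefixes = [str(net) for net in ipaddress.summarize_address_range(first, last)]
```

Here the range needs three prefixes (a /23, a /22, and a /23), the same shape as the answer annotated on the slide.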

Network Layer4-57 CIDR example r Consider the following sets of /24 networks m /24 m /24 m /24 m /24 m /24 m /24 m /24 m /24 r Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?

Network Layer4-58 CIDR example r Consider the following sets of /24 networks m /24 = * m /24 = * = /24 m /24 = * = m /24 = * = /23 m /24 = * = m /24 = * = m /24 = * = m /24 = * = /22 r Using CIDR, what is the minimum number of prefixes that can be used to represent this range exactly?

Network Layer4-59 CIDR route aggregation “Send me anything with addresses beginning /20” / / /23 Fly-By-Night-ISP Organization 0 Organization 7 Internet Organization 1 ISPs-R-Us “Send me anything with addresses beginning /16” /23 Organization Hierarchical addressing allows efficient advertisement of routing information:

Network Layer4-60 CIDR route aggregation ISP X given 16 class C networks * to * (or /20) /24, / /24, / /24, / /24, /24 Large company / 21 Medium company / / / / /24 Small company / / /24 Tiny company / 24 Adjacent ISP router ISP X Route Interface / Route Interface / / / /

Network Layer4-61 CIDR Shortcomings r Customer selecting a new provider m Renumbering required / / / / /23 Provider 1Provider /16

Network Layer4-62 CIDR shortcomings r Multi-homing ISPs-R-Us has a more specific route to Organization 1 “Send me anything with addresses beginning /20” / / /23 Fly-By-Night-ISP Organization 0 Organization 7 Internet Organization 1 ISPs-R-Us “Send me anything with addresses beginning /16 or /23” /23 Organization

Network Layer4-63 Getting IP addresses Q: How does network get IP addresses? A: organization gets allocated portion of its provider ISP’s address space m ISPs get it from ICANN: Internet Corporation for Assigned Names and Numbers Allocates addresses, manages DNS, resolves disputes m Customers get sub-blocks from ISPs ISP's block /20 Organization /23 Organization /23 Organization /23... ….. …. …. Organization /23

Network Layer4-64 CIDR and IP route lookup (forwarding) r IP routing m Done only based on destination IP address m Lookup route in forwarding table r Classful IP Route Lookup m In the early days, address classes made it easy A: 0 | 7 bit network | 24 bit host (16M each) B: 10 | 14 bit network | 16 bit host (64K) C: 110 | 21 bit network | 8 bit host (255) m Address would specify prefix for forwarding table m Simple lookup
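
As a rough illustration of why classful lookup was simple, the sketch below derives the prefix length from the leading bits of the address alone. The function names and sample addresses are assumptions, not taken from the slides.

```python
import ipaddress

def classful_prefix_len(addr: str) -> int:
    """Return the network-prefix length implied by the address class."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet & 0b10000000 == 0:            # leading bit 0    -> class A
        return 8
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 10  -> class B
        return 16
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 110 -> class C
        return 24
    raise ValueError("class D/E address: not a classful unicast network")

def classful_prefix(addr: str) -> str:
    """Return the classful network prefix that would key the forwarding table."""
    length = classful_prefix_len(addr)
    return str(ipaddress.ip_network(f"{addr}/{length}", strict=False))
```

The prefix is a pure function of the address, so each class can use an exact-match table.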

Network Layer4-65 Classful IP forwarding r address m Class B address – route prefix is m Lookup in class B forwarding table m Prefix – part of address that really matters for routing r Forwarding table contains m List of prefix entries m A few fixed prefix lengths (8/16/24) r Large tables m 2 Million class C networks m Sites with multiple class C networks have multiple route entries at every router

Network Layer4-66 CIDR and IP forwarding r CIDR advantages m Saves space in route tables m Makes more efficient use of address space ISP allocated 8 class C chunks, to – / / / /24 – / / / /24 Combine 8 class C entries with 1 combined entry –First 21 bits are network number –Written as /21 m Routing protocols carry prefix length with destination network address

Network Layer4-67 CIDR and IP forwarding r CIDR disadvantage m Makes route lookup more complex CIDR fundamentally changes route lookup algorithm Before CIDR –Separate class A/B/C route tables each with O(1) lookup –Table lookup based on class (A,B,C) After CIDR –One table containing many prefix lengths –Must find the most specific route that matches the destination IP address in packet –Must match against all routes simultaneously via longest prefix match
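
A minimal sketch of longest prefix match, written as a plain linear scan rather than any of the optimized structures discussed later. The table contents, next-hop names, and function name are hypothetical.

```python
import ipaddress

def longest_prefix_match(table, dest):
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical table: a default route, a /22, and a more specific /31 inside it.
TABLE = [("0.0.0.0/0", "provider"), ("10.1.0.0/22", "R1"), ("10.1.0.4/31", "H1")]
```

An address such as 10.1.0.5 matches all three entries, and the /31 wins because it is the most specific.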

Network Layer4-68 Longest prefix matching Prefix Match Link Interface otherwise 3 DA: Examples DA: Which interface?

Network Layer4-69 CIDR example Provider Routing to the network Packet to arrives Path is R2 – R1 – H1 – H2 H2 H3 H4 R / / / / /24 R H /31

Network Layer4-70 CIDR example Routing table at R2 DestinationNext HopInterface lo0 Default or 0/0provider / / / Subnet Routing Packet to Matches /22 H2 H3 H4 R / / / / /24 R H / /

Network Layer4-71 CIDR example Routing table at R1 DestinationNext HopInterface lo0 Default or 0/ / Subnet Routing Packet to Matches /31 Longest prefix match / / /24 H2 H3 H4 R / / / / /24 R H / matches both routes, use longest prefix match

Network Layer4-72 CIDR example Routing table at H1 DestinationNext HopInterface lo0 Default or 0/ / / Subnet Routing Packet to Direct route Longest prefix match H2 H3 H4 R / / / / /24 R H / matches both routes, use longest prefix match

Network Layer4-73 Longest-prefix matching r Algorithms and data structures for CIDR-based IP forwarding m Ruiz-Sanchez, Biersack, Dabbous, “Survey and Taxonomy of IP address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001 Binary tree Multi-bit tree LC tree Lulea tree Full expansion/compression Binary search on prefix lengths Binary range search Multiway range search Multiway range trees Binary search on hash tables (Waldvogel – SIGCOMM 97)

Network Layer4-74 Binary tree Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A BCDEFGHI r Data structure to support longest-prefix match for forwarding r Bit-wise traversal from left-to-right m Continue as far as possible while keeping track of deepest match Example: Example:
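
The bit-wise traversal can be sketched as follows, using the route prefixes A through I listed on this slide. The class and function names are assumptions, and addresses are given as bit strings to keep the example short.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # "0" or "1" -> TrieNode
        self.route = None    # route label if a prefix ends at this node

def build_trie(prefixes):
    """Insert each prefix (a bit string) into a binary trie."""
    root = TrieNode()
    for bits, route in prefixes.items():
        node = root
        for b in bits:
            node = node.children.setdefault(b, TrieNode())
        node.route = route
    return root

def lookup(root, address_bits):
    """Walk left to right as far as possible, remembering the deepest match."""
    node, best = root, None
    for b in address_bits:
        node = node.children.get(b)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

PREFIXES = {"0": "A", "01000": "B", "011": "C", "1": "D", "100": "E",
            "1100": "F", "1101": "G", "1110": "H", "1111": "I"}
ROOT = build_trie(PREFIXES)
```

For example, an address beginning 01001 follows the path toward B but falls off the tree at the fifth bit, so the deepest match recorded on the way (A) is returned.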

Network Layer4-75 Path-compressed binary tree r Eliminate single branch point nodes m Saves unnecessary memory lookups m Branches labelled by bit to examine m Continue as far as possible while keeping track of deepest match r Variants include PATRICIA and BSD trees Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A BCDEFGHI 0 Bit=3Bit=2 Bit=3 Bit=4 Bit=1 Example: x

Network Layer4-76 Example #2 r Create a binary tree that implements the following forwarding table Route Prefixes A 0* B 00010* C 00011* D *

Network Layer4-77 Example #2: Binary tree Route Prefixes A 0* B 00010* C 00011* D * A B 0 CD

Network Layer4-78 Example #2 r Create a path-compressed binary tree that implements the following forwarding table Route Prefixes A 0* B 00010* C 00011* D *

Network Layer4-79 Example #2: Path-compressed binary tree Route Prefixes A 0* B 00010* C 00011* D * A 0 B 0 C Bit=1 Bit=5 1 D

Network Layer4-80 Multi-bit trees r Problem with all single-bit trees m Still incur too many memory accesses per lookup m Lookup done a single bit at a time m CPUs access 32-bits at a time r Multi-bit trees m Compare multiple bits at a time m Stride = number of bits being examined m Reduces memory accesses m Increases memory required Forces table expansion for prefixes falling in between strides m Two types Variable stride multi-bit trees Fixed stride multi-bit trees r Most route entries are Class C m Optimize “stride” based on this

Network Layer4-81 Variable stride multi-bit tree r Single level has variable stride lengths Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A ADDBCCE GFIH Route for C expanded/duplicated Stride either 1 or 2 bits

Network Layer4-82 Fixed stride multi-bit tree r Single level has equal strides Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111* A AA CEDDDBFFGHGHII
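
A minimal sketch of the table expansion that a fixed stride forces, assuming a first-level stride of 3 bits and the same route prefixes. The function names are hypothetical. Prefixes shorter than the stride are duplicated into every slot they cover, and more specific prefixes overwrite less specific ones.

```python
from itertools import product

def expand_prefix(bits, target_len):
    """List every bit string of target_len that starts with bits."""
    pad = target_len - len(bits)
    return [bits + "".join(p) for p in product("01", repeat=pad)]

def build_level(prefixes, stride):
    """Build a 2**stride slot table from prefixes no longer than stride."""
    table = {}
    # Process shorter prefixes first so longer (more specific) ones overwrite them.
    for bits, route in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
        if len(bits) <= stride:
            for slot in expand_prefix(bits, stride):
                table[slot] = route
    return table

PREFIXES = {"0": "A", "01000": "B", "011": "C", "1": "D", "100": "E",
            "1100": "F", "1101": "G", "1110": "H", "1111": "I"}
LEVEL1 = build_level(PREFIXES, 3)
```

Reading the slots from 000 to 111 gives A, A, A, C, E, D, D, D. Prefixes longer than the stride (B, F, G, H, I) would live in second-level tables, which this sketch omits.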

Network Layer4-83 Issues r Scaling m IPv6? r Stride choice m Tuning stride to route table

Network Layer4-84 IP Address Problem #4 (1994) r Even with CIDR, address space running out m IPv6 still being developed, a long way from being deployed r Network Address Translation (NAT) m Alternate solution to address space depletion problem Kludge (but useful) m Sits between your network and the Internet m Dynamically assigns source address from a pool of available addresses “Statistically multiplex” address usage Each machine gets unique, external IP address out of pool Replaces local, private, network-layer source IP addresses with global IP addresses m Has a pool of global IP addresses (fewer than the number of hosts on your network)

Network Layer4-85 NAT Illustration Global Internet Private Network Pool of global IP addresses Operation: Source (S) wants to talk to Destination (D): Create S g -S p mapping Replace S p with S g for outgoing packets Replace S g with S p for incoming packets P G DgDg SpSp Data NAT DestinationSource DgDg SgSg Data

Network Layer4-86 IP addressing and NAT r What if we only have one IP address? m Add port translation to NAT Sometimes referred to as NAPT (Network Address Port Translator) m Both addresses and ports are translated Translates Paddr + flow info to Gaddr + new flow info Uses TCP/UDP port numbers m Potentially thousands of simultaneous connections with one global IP address 16-bit port-number field: 60,000 simultaneous connections with a single LAN-side address!

Network Layer4-87 NAT with port translation local network (e.g., home network) /24 rest of Internet Datagrams with source or destination in this network have /24 address for source, destination (as usual) All datagrams leaving local network have same single source NAT IP address: , different source port numbers

Network Layer4-88 NAT r Advantages m range of addresses not needed from ISP: just a small set of IP addresses for all devices m can change addresses of devices in local network without notifying outside world m can change ISP without changing addresses of devices in local network m devices inside local net not explicitly addressable, visible by outside world (a security plus).

Network Layer4-89 NAT Implementation: NAT router must: m outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)... remote clients/servers will respond using (NAT IP address, new port #) as destination addr. m remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair m incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
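
The translation table described above can be sketched as a pair of dictionaries. The class name `Napt`, the sequential port allocation starting at 5001, and all addresses are assumptions for illustration. A real NAT also tracks the transport protocol and expires idle entries.

```python
import itertools

class Napt:
    def __init__(self, public_ip, first_port=5001):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # naive allocator (assumption)
        self.out = {}    # (private ip, private port) -> public port
        self.back = {}   # public port -> (private ip, private port)

    def outgoing(self, src_ip, src_port):
        """Rewrite the source of an outgoing datagram, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def incoming(self, dst_port):
        """Return the private (ip, port) for an incoming datagram, or None if unmapped."""
        return self.back.get(dst_port)
```

An incoming datagram for a port with no table entry has nowhere to go, which is the traversal problem discussed on the following slides.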

Network Layer4-90 NAT example S: , 3345 D: , : host sends datagram to , 80 NAT translation table WAN side addr LAN side addr , , 3345 …… S: , 80 D: , S: , 5001 D: , : NAT router changes datagram source addr from , 3345 to , 5001, updates table S: , 80 D: , : Reply arrives dest. address: , : NAT router changes datagram dest addr from , 5001 to , 3345

Network Layer4-91 NAT is controversial r Routers should only process up to layer 3 m violates network transparency key feature that allows one to deploy any application without coordinating with network infrastructure m implicit assumption that network header is unchanged in network m address shortage should instead be solved by IPv6 r Other problems m No inbound connections Must be taken into account by app designers, eg, P2P applications m Some protocols carry addresses e.g., FTP carries addresses in text What is the problem? m Encryption

Network Layer4-92 NAT problem #1: traversal r Incoming connections m client wants to connect to server with address m server address local to LAN (client can’t use it as destination addr) m only one externally visible NATted address: r solution 1: statically configure NAT to forward incoming connection requests at given port to server m e.g., ( , port 2500) always forwarded to port m Or use DMZ host NAT router Client ?

Network Layer4-93 NAT problem #1: traversal r solution 2: Universal Plug and Play (UPnP) Internet Gateway Device (IGD) Protocol. Allows NATted host to:  learn public IP address ( )  enumerate existing port mappings  add/remove port mappings (with lease times) i.e., automate static NAT port map configuration NAT router IGD

Network Layer4-94 NAT problem #1: traversal r solution 3: relaying (used in Skype) m NATted server establishes connection to relay m External client connects to relay m relay bridges packets between two connections NAT router Client 1. connection to relay initiated by NATted host 2. connection to relay initiated by client 3. relaying established

Network Layer4-95 NAT problem #2: loss of transparency r Breaks applications that assume network does not modify packets r Prevents new applications that make the same assumption r Example m ftp, NAT, and PORT command

Network Layer4-96 ftp, NAT and PORT command r Normal FTP mode m Server has port 20, 21 reserved m Client initiates control connection to port 21 on server m Client allocates port X for data connection m Client passes its IP address and the data connection port (X) in a PORT command to server m Server parses PORT command and initiates connection from its own port 20 to the client on port X r What if client is behind a NAT device?

Network Layer4-97 ftp, NAT and PORT command r Problem m ftp server connects to a private IP address! Packet #1 SrcIP= SrcPort=1312 DstIP= DstPort= PORT command “Connect to me at IP= Port=20” NAPT translator ExternalIP= Packet #1 after NAPT SrcIP= SrcPort=2000 DstIP= DstPort= PORT command “Connect to me at IP= Port=20”

Network Layer4-98 ftp, NAT and PORT command r Solution #1 m Modify packets at NAT NAT must capture outgoing connections destined for port 21 Looks for PORT command and translates address/port payload – _port.htm What if NAT doesn’t parse PORT command correctly? What if ftp server is running on a different port than 21?
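
To make the payload rewrite concrete, the sketch below parses and rewrites an FTP PORT command, whose argument is six comma-separated decimal bytes (four for the address, two for the port). The function names and addresses are hypothetical, and a real NAT helper would also have to handle commands split across TCP segments.

```python
import re

PORT_RE = re.compile(r"^PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r\n$")

def parse_port(line):
    """Return (ip, port) carried in a PORT command, or None if it is not one."""
    m = PORT_RE.match(line)
    if not m:
        return None
    n = [int(x) for x in m.groups()]
    return ".".join(map(str, n[:4])), n[4] * 256 + n[5]

def format_port(ip, port):
    """Encode an address and port as a PORT command line."""
    return "PORT {},{},{}\r\n".format(ip.replace(".", ","), port // 256, port % 256)

def rewrite_port(line, public_ip, public_port):
    """Replace the private address/port in a PORT command with the NAT's public ones."""
    if parse_port(line) is None:
        return line            # not a PORT command: pass through unchanged
    return format_port(public_ip, public_port)
```

The rewritten command can differ in length from the original, so the NAT must also adjust TCP sequence and acknowledgment numbers for the rest of the connection.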

Network Layer4-99 ftp, NAT and PORT command r Need to rewrite points to bigger problem! m Loss of network transparency m Network must modify application data in order for application to run correctly! Packet #1 SrcIP= SrcPort=1312 DstIP= DstPort= PORT command “Connect to me at IP= Port=20” NAPT translator ExternalIP= Packet #1 after NAPT SrcIP= SrcPort=2000 DstIP= DstPort= PORT command “Connect to me at IP= Port=2001”

Network Layer4-100 ftp, NAT, and PORT command r Solution #2 m Passive (PASV) mode Client initiates control connection to port 21 on server Client enables “Passive” mode by sending a PASV command Server replies with the IP address and port the client should use for the subsequent data connection (a port chosen by the server instead of the default data port 20) Client initiates data connection by connecting to specified port on server m Most web browsers do PASV-mode ftp

Network Layer4-101 ftp, NAT, and PORT command r PASV mode transfers NAPT translator ExternalIP= After PASV command SrcIP= SrcPort=21 DstIP= DstPort= PORT command “Connect to me at IP= Port=20”

Network Layer4-102 ftp, NAT, and PORT command r Solution #2 m What if server is behind a NAT device? See client issues m What if both client and server are behind NAT devices? Problem Similar to P2P xfers and Skype –See IETF STUN WG

Network Layer4-103 Chapter 4: Network Layer r 4.1 Introduction r 4.2 Virtual circuit and datagram networks r 4.3 What’s inside a router r 4.4 IP: Internet Protocol m Datagram format m IPv4 addressing m ICMP m IPv6 r 4.5 Routing algorithms m Link state m Distance Vector m Hierarchical routing r 4.6 Routing in the Internet m RIP m OSPF m BGP r 4.7 Broadcast and multicast routing

Network Layer4-104 ICMP: Internet Control Message Protocol r Essentially a network-layer protocol for passing control messages r used by hosts & routers to communicate network-level information m error reporting: unreachable host, network, port, protocol m echo request/reply (used by ping) r network-layer “above” IP: m ICMP msgs carried in IP datagrams r ICMP message: type, code plus first 8 bytes of IP datagram causing error r editor.org/rfc/rfc792.txt
Type  Code  Description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
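
As a small illustration of the type/code layout, the sketch below builds an ICMP echo request (type 8, code 0) and computes the 16-bit one's-complement Internet checksum over it. The function names, identifier, and payload are assumptions. Sending the message would additionally require a raw socket, which is omitted here.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type, code, checksum, identifier, sequence number."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

def parse_icmp(message: bytes):
    """Return the (type, code) pair from the first two bytes of an ICMP message."""
    icmp_type, code, _checksum = struct.unpack("!BBH", message[:4])
    return icmp_type, code
```

A receiver can validate a message by running the same checksum over the whole message. A result of zero means the message is intact.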

Network Layer4-105 ICMP and traceroute r What do “real” Internet delay & loss look like?  Traceroute program: provides delay measurement from source to router along end-end Internet path towards destination. For all i: m sends three packets that will reach router i on path towards destination m router i will return packets to sender m sender times interval between transmission and reply. 3 probes

Network Layer4-106 ICMP and traceroute r Source sends series of UDP segments to dest m First has TTL =1 m Second has TTL=2, etc. m Unlikely port number r When nth datagram arrives at nth router: m Router discards datagram m And sends to source an ICMP message (type 11, code 0) m Message includes name of router & IP address r When ICMP message arrives, source calculates RTT r Traceroute does this 3 times Stopping criterion r UDP segment eventually arrives at destination host r Destination returns ICMP “port unreachable” packet (type 3, code 3) r When source gets this ICMP, stops.
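
The probing loop can be sketched as a simulation over a known path, without real sockets. The path names and function names are hypothetical. Each router decrements the TTL and answers with ICMP type 11, code 0 when it reaches zero. The destination answers with type 3, code 3 because nothing listens on the unlikely UDP port.

```python
def forward_probe(path, ttl):
    """Simulate one UDP probe; return (responder, icmp_type, icmp_code)."""
    for hop in path[:-1]:          # every element except the last is a router
        ttl -= 1
        if ttl == 0:
            return hop, 11, 0      # TTL expired in transit
    return path[-1], 3, 3          # probe reached the destination host

def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... and collect who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, icmp_type, code = forward_probe(path, ttl)
        hops.append(responder)
        if (icmp_type, code) == (3, 3):   # stopping criterion
            break
    return hops
```

A real traceroute sends three probes per TTL and times each reply. This sketch sends one and records only the responder.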

Network Layer4-107 Examples traceroute: gaia.cs.umass.edu to
Three delay measurements from gaia.cs.umass.edu to cs-gw.cs.umass.edu
1 cs-gw ( ) 1 ms 1 ms 2 ms
2 border1-rt-fa5-1-0.gw.umass.edu ( ) 1 ms 1 ms 2 ms
3 cht-vbns.gw.umass.edu ( ) 6 ms 5 ms 5 ms
4 jn1-at wor.vbns.net ( ) 16 ms 11 ms 13 ms
5 jn1-so wae.vbns.net ( ) 21 ms 18 ms 18 ms
6 abilene-vbns.abilene.ucaid.edu ( ) 22 ms 18 ms 22 ms
7 nycm-wash.abilene.ucaid.edu ( ) 22 ms 22 ms 22 ms
( ) 104 ms 109 ms 106 ms
9 de2-1.de1.de.geant.net ( ) 109 ms 102 ms 104 ms
10 de.fr1.fr.geant.net ( ) 113 ms 121 ms 114 ms
11 renater-gw.fr1.fr.geant.net ( ) 112 ms 114 ms 112 ms
12 nio-n2.cssi.renater.fr ( ) 111 ms 114 ms 116 ms
13 nice.cssi.renater.fr ( ) 123 ms 125 ms 124 ms
14 r3t2-nice.cssi.renater.fr ( ) 126 ms 126 ms 124 ms
15 eurecom-valbonne.r3t2.ft.net ( ) 135 ms 128 ms 133 ms
( ) 126 ms 128 ms 126 ms
17 * * *
18 * * *
19 fantasia.eurecom.fr ( ) 132 ms 128 ms 136 ms
* means no response (probe lost, router not replying)
Delay jumps from about 22 ms to over 100 ms after hop 7: trans-oceanic link

Network Layer4-108 Try it r Some routers labeled with airport code of city they are located in m traceroute Packets go to SEA, back to PDX, SJC m traceroute Packets go to SMF, SFO, SJC, NYC, EWR. m traceroute Packets go to Pittock block to Eugene m traceroute Packets go to SEA and back to PDX

Network Layer4-110 IPv6 r Redefine functions of IP (version 4) m What changes should be made in…. IP addressing IP delivery semantics IP quality of service IP security IP routing IP fragmentation IP error detection

Network Layer4-111 IPv6 r Initial motivation: 32-bit address space soon to be completely allocated (est. 2008) r Additional motivation: m Remove ancillary functionality Speed processing/forwarding m Add missing, but essential functionality header changes to facilitate QoS new “anycast” address: route to “best” of several replicated servers IPv6 datagram format: m fixed-length 40 byte header m no fragmentation allowed

Network Layer4-112 IPv6 Header (Cont) Priority: identify priority among datagrams in flow Flow Label: identify datagrams in same “flow.” (concept of “flow” not well defined). Next header: identify next protocol for data
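
A minimal sketch of packing and unpacking the fixed 40-byte IPv6 header. It assumes the field widths of the current specification (RFC 8200): a 4-bit version, an 8-bit traffic class (the field this slide calls priority), and a 20-bit flow label. The function names and addresses are hypothetical.

```python
import ipaddress
import struct

def pack_ipv6_header(traffic_class, flow_label, payload_len, next_header,
                     hop_limit, src, dst):
    """Build the 40-byte fixed IPv6 header."""
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    return (struct.pack("!IHBB", first_word, payload_len, next_header, hop_limit)
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed)

def unpack_ipv6_header(header):
    """Split a 40-byte IPv6 header into its fields."""
    first_word, payload_len, next_header, hop_limit = struct.unpack("!IHBB", header[:8])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_len": payload_len,
        "next_header": next_header,   # e.g. 6 = TCP, 17 = UDP
        "hop_limit": hop_limit,
        "src": str(ipaddress.IPv6Address(header[8:24])),
        "dst": str(ipaddress.IPv6Address(header[24:40])),
    }
```

There is no header-length or checksum field to process, which is part of what keeps per-hop handling cheap.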

Network Layer4-113 IPv6 Changes r Scale – addresses are 128 bits m Header size? r Simplification m Removes infrequently used parts of header m 40 byte fixed header vs. 20+ byte variable header r IPv6 removes checksum m IPv4 checksum = provide extra protection on top of data-link layer and below transport layer m End-to-end principle Is this necessary? IPv6 answer =>No m Relies on upper layer protocols to provide integrity m Reduces processing time at each hop

Network Layer4-114 IPv6 Changes r IPv6 eliminates fragmentation m Requires path MTU discovery r ICMPv6: new version of ICMP m additional message types, e.g. “Packet Too Big” r Protocol field replaced by next header field m Unify support for protocol demultiplexing as well as option processing r Option processing m Options allowed, but only outside of header, indicated by “Next Header” field m Options header does not need to be processed by every router Large performance improvement Makes options practical/useful

Network Layer4-115 IPv6 Changes r TOS replaced with traffic class octet m Support QoS via DiffServ r FlowID field m Help soft state systems, accelerate flow classification m Maps well onto TCP connection or stream of UDP packets on host-port pair r Additional requirements m Support for security m Support for mobility m Easy auto-configuration

Network Layer4-116 Transition From IPv4 To IPv6 r Not all routers can be upgraded simultaneously m no “flag days” m How will the network operate with mixed IPv4 and IPv6 routers? r Two proposed approaches: m Dual Stack: some routers with dual stack (v6, v4) can “translate” between formats m Tunneling: IPv6 carried as payload in an IPv4 datagram among IPv4 routers

Network Layer4-117 Tunneling A B E F IPv6 tunnel Logical view: Physical view: A B E F IPv6 IPv4

Network Layer4-118 Tunneling A B E F IPv6 tunnel Logical view: Physical view: A B E F IPv6 C D IPv4 Flow: X Src: A Dest: F data Flow: X Src: A Dest: F data Flow: X Src: A Dest: F data Src:B Dest: E Flow: X Src: A Dest: F data Src:B Dest: E A-to-B: IPv6 E-to-F: IPv6 B-to-C: IPv6 inside IPv4 D-to-E: IPv6 inside IPv4

Network Layer4-119 Dual Stack Approach r Dual-stack router translates b/w v4 and v6 m v4 addresses have special v6 equivalents m Issue: how to translate “FlowField” of v6 ?

Network Layer4-121 Two Key Network-Layer Functions r forwarding: move packets from router’s input to appropriate router output r routing: determine route taken by packets from source to dest. m routing algorithms analogy: r routing: process of planning trip from source to dest r forwarding: process of getting through single interchange

Network Layer4-122 Interplay between routing, forwarding [Figure: the routing algorithm fills the local forwarding table (header value → output link); the value in the arriving packet’s header selects the output link] r Previously: Forward based on forwarding table r Q: How to generate forwarding tables? Routing algorithms and protocols

Network Layer4-123 Who handles IP routing functions? m Source (IP source routing) m Network edge devices m Network routers

Network Layer4-124 Source Routing r IP source route option m Packet carries path to destination Entire path (strict) Partial path (loose) Attach list of IP addresses within header r Router processing m Examine first step in directions m Increment pointer offset in header m Forward to step m Copy entire source route header on fragmentation

Network Layer4-125 Source Routing Example Receiver Packet R1/R2/R3 Sender R2 R3 R1 R2/R3 R3

Network Layer4-126 Source Routing r Advantages m Switches can be very simple and fast r Disadvantages m Variable (unbounded) header size m Sources must know or discover topology (e.g., failures) r Typical use m Ad-hoc networks (DSR) m Machine room networks (Myrinet)

Network Layer4-127 Network edge device routing r Virtual circuits, tag switching r Connection setup phase m Map IP route into appropriate label, wavelength, circuit at the network edge m Switch on label, wavelength, circuit ID in core m ATM, MPLS, lambda switching r In-network processing m Lookup flow ID – simple table lookup m Potentially replace flow ID with outgoing flow ID m Forward to output port

Network Layer4-128 Virtual Circuits Examples Receiver edge Packet 1,5  3,7 Sender edge ,7  4, ,2  3,6 R2 R3 R

Network Layer4-129 Virtual Circuits r Advantages m More efficient lookup (simple table lookup) Easier for hardware implementations m More flexible (different path for each flow) m Can reserve bandwidth at connection setup r Disadvantages m Still need to route connection setup request m More complex failure recovery – must recreate connection state r Typical uses m ATM – combined with fixed-size cells m MPLS – tag switching for IP networks

Network Layer4-130 IP Datagrams on Virtual Circuits r Challenge – when to set up connections m At bootup time – permanent virtual circuits (PVC) Large number of circuits m For every packet transmission Connection setup is expensive m For every connection What is a connection? How to route connectionless traffic? m Based on traffic VC for long-lived flows Normal IP forwarding for all other flows

Network Layer4-131 Network routers (Global IP addresses) r Hop-by-hop forwarding based on destination IP carried by packet m Each packet has destination IP address m Each router has forwarding table of destination IP → next hop IP address m IP route table calculated in network routers r Most prevalent way to route on the Internet m Distributed routing algorithm for calculating forwarding tables

Network Layer4-132 Global Address Example Receiver Packet R Sender R2 R3 R1 R R R  3 R  4 R  3 R

Network Layer4-133 Global Addresses r Advantages m Simple error recovery r Disadvantages m Every router knows about every destination Potentially large tables m All packets to destination take same route

Network Layer4-134 Comparison
                    Source Routing    Global Addresses              Virtual Circuits
Header Size         Worst             OK – Large address            OK (larger than global if IP payload)
Router Table Size   None              Number of hosts (prefixes)    Number of circuits
Forward Overhead    Best              Prefix matching               Good (table index)
Setup Overhead      None              None                          Connection Setup
Error Recovery      Tell all hosts    Tell all routers              Tell all routers, Tear down circuit and re-route

Network Layer4-135 Routing protocols Graph abstraction for routing algorithms: r Graph: G = (N,E) m N=graph nodes (routers) A, B, C, D, E, F m E=graph edges (links) (A,B), (A,D), (A,C), (B,C), (B,D), (C,D), (C,E), (C,F), (D,E), (E,F) Cost associated with edge –Delay, $, congestion r Routing algorithms find minimum cost paths through graph Goal: determine “good” path (sequence of routers) thru network from source to dest. A E D CB F

Network Layer4-136 Routing Algorithm classification Global or decentralized information? Global: r all routers have complete topology, link cost info r “link state” algorithms Decentralized: r router knows physically-connected neighbors, link costs to neighbors r iterative process of computation, exchange of info with neighbors r “distance vector” algorithms

Network Layer4-138 A Link-State Routing Algorithm Dijkstra’s algorithm r net topology, link costs known to all nodes m accomplished via “link state broadcast” m all nodes have same info r computes least cost paths from one node (“source”) to all other nodes m gives forwarding table for that node m iterative: after k iterations, know least cost path to k dest.’s

Network Layer4-139 Dijkstra’s algorithm r Start condition m Each node assumed to know state of links to its neighbors r Step 1: Link state broadcast m Each node broadcasts its local link states to all other nodes m Reliable flooding mechanism r Step 2: Shortest-path tree calculation m Each node locally computes shortest paths to all other nodes from global state m Dijkstra’s shortest path tree (SPT) algorithm

Network Layer4-140 Link state broadcast r Link State Packets (LSPs) to broadcast state to all nodes r Periodically, each node creates a link state packet containing: m Node ID m List of neighbors and link cost m Sequence number m Time to live (TTL) m Node outputs LSP on all its links

Network Layer4-141 Link state broadcast r Reliable Flooding m When node J receives LSP from node K If LSP is the most recent LSP from K that J has seen so far, J saves it in database and forwards a copy on all links except link LSP was received on Otherwise, discard LSP m How to tell more recent Use sequence numbers –Same method as sliding window protocols –Needed to avoid stale information from flood –Problem: sequence number wrap-around »Addressed algorithmically using lollipop sequence numbering
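
The accept-or-discard rule can be sketched as below. The `Node` class and its fields are hypothetical, delivery is modeled as a direct method call, and sequence-number wrap-around and TTL aging are deliberately ignored.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = []    # neighboring Node objects
        self.lsdb = {}     # origin -> (sequence number, neighbor/cost list)

    def receive(self, origin, seq, neighbors, from_node=None):
        """Store and re-flood an LSP only if it is newer than what we hold."""
        known = self.lsdb.get(origin)
        if known is not None and known[0] >= seq:
            return                                  # stale or duplicate: discard
        self.lsdb[origin] = (seq, neighbors)
        for n in self.links:
            if n is not from_node:                  # all links except the arrival link
                n.receive(origin, seq, neighbors, from_node=self)
```

Discarding duplicates is what makes the flood terminate even when the topology contains cycles.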

Network Layer4-142 Shortest-path tree calculation Notation:  c(x,y): link cost from node x to y; = ∞ if not direct neighbors  D(v): current value of cost of path from source to dest. v  p(v): predecessor node along path from source to v  N': set of nodes whose least cost path definitively known

Network Layer4-143 Dijkstra’s Algorithm
Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
      else D(v) = ∞
Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'
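
A runnable sketch of the algorithm above, using a binary heap rather than a scan for the minimum. The edge set follows the six-node graph from the routing-protocols slide, but the link costs are assumptions because the figure's costs did not survive extraction. The function names are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, pred): least path cost and predecessor for each reachable node."""
    dist = {source: 0}
    pred = {}
    done = set()                      # plays the role of N'
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)    # w not in N' with minimum D(w)
        if w in done:
            continue                  # outdated heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost    # D(v) = min(D(v), D(w) + c(w,v))
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

def first_hop(pred, source, dest):
    """Walk predecessors back from dest to find the neighbor of source on the path."""
    node = dest
    while pred[node] != source:
        node = pred[node]
    return node

# Assumed link costs for edges (A,B), (A,D), (A,C), (B,C), (B,D), (C,D), (C,E), (C,F), (D,E), (E,F).
GRAPH = {
    "A": {"B": 2, "D": 1, "C": 5},
    "B": {"A": 2, "C": 3, "D": 2},
    "C": {"A": 5, "B": 3, "D": 3, "E": 1, "F": 5},
    "D": {"A": 1, "B": 2, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1, "F": 2},
    "F": {"C": 5, "E": 2},
}
DIST, PRED = dijkstra(GRAPH, "A")
```

With these costs, the forwarding table at A sends traffic for B over link (A,B) and traffic for every other destination over link (A,D).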

Network Layer4-144 Shortest-path tree calculation (Dijkstra’s algorithm example) [Figure: graph of nodes A, B, C, D, E, F, animated over slides 4-144 to 4-149, applying the update D(v) = min( D(v), D(w) + c(w,v) ) step by step from source A]

Network Layer4-150 Dijkstra’s algorithm example A E D CB F Resulting shortest-path tree from A: B D E C F (A,B) (A,D) destination link Resulting forwarding table in A:

Network Layer4-151 Link state algorithm characteristics r Computation overhead m n nodes m each iteration: need to check all nodes, w, not in N' m n*(n+1)/2 comparisons: O(n**2) m more efficient implementations possible: O(n log(n)) r Space requirements m Size of LSDB r Bandwidth requirements m Reliable flooding O(N*E) r Stability m Consistent LSDBs required for loop-free paths A B C D Packet from C → A may loop around BDC if B knows about failure and C & D do not X

Network Layer4-152 Link-state algorithm issues Oscillations possible: r e.g., link cost = amount of carried traffic r Example: path to A flaps as traffic routed clockwise and counter-clockwise r Common problem in load-based link metrics m A. Khanna and J. Zinky, "The Revised ARPANET Routing Metric," in ACM SIGCOMM, 1989, pp A D C B 1 1+e e 0 e A D C B 2+e e 1 A D C B 0 2+e 1+e A D C B 2+e 0 e 0 1+e 1 initially … recompute routing … recompute

Network Layer4-154 Distance vector routing algorithms r Variants used in m Early ARPAnet m RIP (intra-domain routing protocol) m BGP (inter-domain routing protocol) r Distributed next hop computation m “Gossip with immediate neighbors until you find the best route” m Best route is achieved when there are no more changes r Unit of information exchange m Vector of distances to destinations

Network Layer4-155 Distance Vector Algorithm Bellman-Ford algorithm (1957) Define $D_x(y)$ := cost of least-cost path from x to y Then $D_x(y) = \min_v \{ c(x,v) + D_v(y) \}$ where min is taken over all neighbors v of x

Network Layer4-156 Bellman-Ford example [Figure: graph of nodes u, v, w, x, y, z] Clearly, $D_v(z) = 5$, $D_x(z) = 3$, $D_w(z) = 3$ B-F equation says: $D_u(z) = \min \{ c(u,v) + D_v(z),\ c(u,x) + D_x(z),\ c(u,w) + D_w(z) \} = \min \{ 2 + 5,\ 1 + 3,\ 5 + 3 \} = 4$ Node that achieves minimum is next hop in shortest path ➜ forwarding table
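
The calculation on this slide can be reproduced in a few lines. The link costs and neighbor distances are the values given above. The function name is hypothetical.

```python
def bellman_ford_step(link_cost, neighbor_dist):
    """Apply the B-F equation once: return (least cost, neighbor achieving it)."""
    best = min(link_cost, key=lambda v: link_cost[v] + neighbor_dist[v])
    return link_cost[best] + neighbor_dist[best], best

# c(u,v) = 2, c(u,x) = 1, c(u,w) = 5 and D_v(z) = 5, D_x(z) = 3, D_w(z) = 3
COST, NEXT_HOP = bellman_ford_step({"v": 2, "x": 1, "w": 5},
                                   {"v": 5, "x": 3, "w": 3})
```

The minimum is reached through x, so x becomes the next hop toward z in u's forwarding table.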

Network Layer4-157 Bellman-Ford r Update distance information iteratively m Start with link table (as with Dijkstra), calculate distance table iteratively m Distance table data structure table of known distances and next hops kept per node row for each possible destination column for each directly-attached neighbor to node [Figure: network of nodes A, B, C, D, E and the distance table at node E, $D_E()$: rows are destinations A, B, C, D; columns are next-hop neighbors A, B, D; entries are cost to destination via that neighbor. Via A: 1, 7, 6, 4. Via D: 5, 5, 4, 2.]

Network Layer4-158 Bellman-Ford algorithm r Centralized version [Figure: node i with neighbors j and j’, link costs c(i,j) and c(i,j’), and distances D_j(k,*), D_j’(k,*), D_i(k,*) toward destinations k, k’]
D_X(Y,Z) = distance from X to Y, via Z as next hop = c(X,Z) + min_w {D_Z(Y,w)}
D_X(Y,*) = minimum known distance from X to Y
H_X(Y) = next hop node from X to Y
For node i
while there is a change in D
  for all k not neighbor of i
    for each j neighbor of i
      D_i(k,j) = c(i,j) + D_j(k,*)
      if D_i(k,j) < D_i(k,*) {
        D_i(k,*) = D_i(k,j)
        H_i(k) = j
      }

Network Layer4-159 Distance table example for node E [Figure: network of nodes A, B, C, D, E and distance table $D_E()$, cost to destination (rows A, B, C, D) via neighbor (columns A, B, D). Via A: 1, 7, 6, 4. Via D: 5, 5, 4, 2.]
$D_E(C,D) = c(E,D) + \min_w \{ D_D(C,w) \} = 2 + 2 = 4$
$D_E(A,D) = c(E,D) + \min_w \{ D_D(A,w) \} = 2 + 3 = 5$
$D_E(A,B) = c(E,B) + \min_w \{ D_B(A,w) \} = 8 + 6 = 14$ loop!

Network Layer4-160 Distance table gives forwarding table [Figure: the distance table $D_E()$ (cost to destination A, B, C, D via neighbor A, B, D; via A: 1, 7, 6, 4; via D: 5, 5, 4, 2) is reduced to the routing table $H_E(Y)$, which lists for each destination the outgoing link to use and its cost. Surviving entries: A,1 D,5 D,4]

Network Layer4-161 Distributed Bellman-Ford r Make Bellman algorithm distributed (Ford-Fulkerson 1962) m Each node i has distance vector estimates to other nodes m Iterate Each node sends around and recalculates D[i,*] When a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation: $D_x(y) \leftarrow \min_v \{ c(x,v) + D_v(y) \}$ for each node y ∊ N If estimates change, broadcast entire table to neighbors –continues until no nodes exchange info. –self-terminating: no “signal” to stop m D[i,*] eventually converges to shortest distance

Network Layer4-162 Distributed Bellman-Ford overview
Asynchronous:
- “triggered updates”
  - no need to exchange info/iterate in lock step!
Iterative:
- when local link costs change
- when a neighbor sends a message that its least cost path to some node has changed
Distributed:
- nodes communicate only with directly-attached neighbors
- each node notifies neighbors only when its least cost path to any destination changes
  - neighbors then notify their neighbors if necessary
Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute distance table
  if least cost path to any dest has changed, notify neighbors

Network Layer4-163 Distributed Bellman-Ford algorithm
At all nodes, X:
1 Initialization:
2   for all adjacent nodes v:
3     D_X(*,v) = infinity   /* the * operator means "for all rows" */
4     D_X(v,v) = c(X,v)
5   for all destinations, y
6     send min_w D_X(y,w) to each neighbor   /* w over all X's neighbors */

Network Layer4-164 Distributed Bellman-Ford algorithm
8 loop
9   wait (until I see a link cost change to neighbor V
10        or until I receive update from neighbor V)
12  if (c(X,V) changes by d)
13    /* change cost to all dest's via neighbor V by d */
14    /* note: d could be positive or negative */
15    for all destinations y: D_X(y,V) = D_X(y,V) + d
17  else if (update received from V wrt destination Y)
18    /* shortest path from V to some Y has changed */
19    /* V has sent a new value for its min_w D_V(Y,w) */
20    /* call this received new value "newval" */
21    for the single destination Y: D_X(Y,V) = c(X,V) + newval
23  if we have a new min_w D_X(Y,w) for any destination Y
24    send new value of min_w D_X(Y,w) to all neighbors
26 forever
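The exchange can be imitated in Python with synchronous rounds. This is a simplified sketch, not the asynchronous algorithm itself: every node "sends" its vector once per round, and the function name `dv_simulate` plus the three-node topology (c(X,Y)=2, c(Y,Z)=1, c(X,Z)=7, taken from the X/Y/Z example in this deck) are assumptions for illustration.

```python
import math

def dv_simulate(links):
    """Run distance-vector rounds until no node's vector changes."""
    nodes = sorted({n for pair in links for n in pair})
    cost = {}
    for (u, v), c in links.items():
        cost[(u, v)] = cost[(v, u)] = c
    nbrs = {x: [v for v in nodes if (x, v) in cost] for x in nodes}

    # Initialization: each node knows itself and its direct link costs.
    dv = {x: {y: (0 if x == y else cost.get((x, y), math.inf)) for y in nodes}
          for x in nodes}

    changed = True
    while changed:
        changed = False
        sent = {x: dict(dv[x]) for x in nodes}   # vectors advertised this round
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # B-F equation: D_x(y) = min_v { c(x,v) + D_v(y) }
                best = min(cost[(x, v)] + sent[v][y] for v in nbrs[x])
                if best != dv[x][y]:
                    dv[x][y] = best
                    changed = True
    return dv

dv = dv_simulate({("X", "Y"): 2, ("Y", "Z"): 1, ("X", "Z"): 7})
print(dv["X"])
```

With these costs, X settles on distance 2 to Y and 3 to Z, the same values as the min{2+0, 7+1} and min{2+1, 7+0} computations shown in the extra slides.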

Network Layer4-165 Analyzing Distributed Bellman-Ford
- Continuously send local distance tables of best known routes to all neighbors until your table converges
  - Computation diffuses until all nodes converge
  - Will computation converge quickly and deterministically?
    - Not all the time; pathological cases are possible (count-to-infinity)
    - Several algorithms for minimizing such cases

Network Layer4-166 DBF example
Initial distance vectors (~ = unknown/infinite):
  Info at    Distance to node
  node       A   B   C   D   E
  A          0   7   ~   ~   1
  B          7   0   1   ~   8
  C          ~   1   0   2   ~
  D          ~   ~   2   0   2
  E          1   8   ~   2   0

Network Layer4-167 DBF example
What is the new distance table at E after E receives D’s routes?

Network Layer4-168 DBF example
Cost to C is updated from ~ to 4 (E's distance vector becomes A 1, B 8, C 4, D 2, E 0).

Network Layer4-169 DBF example
What is the new distance table at A after A receives B’s routes?

Network Layer4-170 DBF example
Cost to C is updated from ~ to 8, cost to E unchanged (A's distance vector becomes A 0, B 7, C 8, D ~, E 1).

Network Layer4-171 DBF example
What is the new distance table at A after A receives E’s routes?

Network Layer4-172 DBF example
Cost to C is updated from 8 to 5, cost to D updated from ~ to 3 (A's distance vector becomes A 0, B 7, C 5, D 3, E 1).

Network Layer4-173 DBF example
And so on, until final distances....

Network Layer4-174 DBF example dest A B C D ABD Next hop E’s routing table A B E C D

Network Layer4-175 DBF (another example)
Three-node network X, Y, Z:
$D_X(Y,Z) = c(X,Z) + \min_w \{D_Z(Y,w)\} = 7 + 1 = 8$
$D_X(Z,Y) = c(X,Y) + \min_w \{D_Y(Z,w)\} = 2 + 1 = 3$

Network Layer4-176 DBF (another example) X Z Y See book for explanation of this example

Network Layer4-177 DBF (good news example) Link cost changes: node detects local link cost change updates distance table (line 15) if cost change in least cost path, notify neighbors (lines 23,24) fast convergence X Z Y 1

Network Layer4-178 DBF (good news example)
“Good news travels fast”:
- At time t0, y detects the link-cost change, updates its DV, and informs its neighbors.
- At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV.
- At time t2, y receives z’s update and updates its distance table. y’s least costs do not change, hence y does not send any message to z.
Algorithm terminates.

Network Layer4-179 DBF (count-to-infinity example) Link cost changes: good news travels fast bad news travels slow - “count to infinity” problem! alternate route implicitly used link that changed X Z Y 60 algorithm continues on!

Network Layer4-180 How are loops caused? r Observation 1: m Y’s metric to X increases r Observation 2: m Z picks Y as next hop to X m But, the implicit path from Z to X includes itself!

Network Layer4-181 DBF: (count-to-infinity example) A BC B C2 1 dest cost A C1 1 dest cost A B1 2 dest cost X

Network Layer4-182 DBF: (count-to-infinity example) A 25 1 BC B C2 1 dest cost A C1 ~ dest cost A B1 2 dest cost C Sends Routes to B

Network Layer4-183 DBF: (count-to-infinity example) A 25 1 BC B C2 1 dest cost A C1 3 dest cost A B1 2 dest cost B Updates Distance to A

Network Layer4-184 DBF: (count-to-infinity example) A 25 1 BC B C2 1 dest cost A C1 3 dest cost A B1 4 dest cost B Sends Routes to C

Network Layer4-185 DBF: (count-to-infinity example) A 25 1 BC B C2 1 dest cost A C1 5 dest cost A B1 4 dest cost C Sends Routes to B
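The counting in this example can be replayed in a few lines of Python. This is a sketch under stated assumptions: the A-B link has failed, B-C costs 1, and the "25" in the figure is read as the cost of a direct A-C link; the function name `count_to_infinity` and its parameters are hypothetical.

```python
def count_to_infinity(direct_c_to_a=25, link_bc=1, start_c=2):
    """Replay the B/C exchange after the A-B link fails.

    Returns the successive (B's cost to A, C's cost to A) pairs.
    """
    history = []
    c_cost = start_c                    # C still believes A is 2 away (via B)
    while True:
        b_cost = link_bc + c_cost       # B's only remaining route to A is via C
        new_c = min(direct_c_to_a, link_bc + b_cost)   # C: direct link or via B
        history.append((b_cost, new_c))
        if new_c == c_cost:             # C's estimate stopped changing
            break
        c_cost = new_c
    return history

history = count_to_infinity()
print(history[:3], history[-1])
```

The first pairs reproduce the costs 3, 4, 5 shown on the slides; the loop only ends once the estimates climb past the cost of C's direct link.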

Network Layer4-186 Solutions to looping r Split horizon m Do not advertise route to X to an adjacent neighbor if your route to X goes through that neighbor m If C routes through B to get to A, C does not advertise (C=>A) route to B. r Poisoned reverse m Advertise an infinite distance route to X to an adjacent neighbor if your route to X goes through that neighbor m If C routes through B to get to A, C advertises to B that its distance to A is infinity

Network Layer4-187 Split-horizon with poisoned reverse If Z routes through Y to get to X : Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z) will this completely solve count to infinity problem? X Z Y 60 algorithm terminates new route to X not involving Y can now select and advertise route to X via Z route to X through Y goes thru Z poison it!
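The difference between plain split horizon and poisoned reverse shows up in what a node puts into the vector it sends to one particular neighbor. A minimal sketch, assuming a routing table of the form destination -> (next hop, cost); the function name `advertisement` and the `mode` strings are hypothetical.

```python
import math

def advertisement(routes, neighbor, mode="poisoned_reverse"):
    """Build the distance vector a node sends to `neighbor`.

    routes maps destination -> (next_hop, cost).
    mode is "none", "split_horizon" or "poisoned_reverse".
    """
    vector = {}
    for dest, (next_hop, cost) in routes.items():
        if next_hop == neighbor:
            if mode == "split_horizon":
                continue                 # say nothing about this route
            if mode == "poisoned_reverse":
                cost = math.inf          # advertise the route as unreachable
        vector[dest] = cost
    return vector

# C routes through B to get to A (cost 2), as in the looping example.
routes_c = {"A": ("B", 2)}
print(advertisement(routes_c, "B", "split_horizon"))
print(advertisement(routes_c, "B", "poisoned_reverse"))
```

In both modes B never hears a usable route to A from C, so B cannot pick C as its next hop toward A.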

Network Layer4-188 Solutions to looping r Split horizon with poisoned reverse m Works for two node loops m Does not work for loops with more nodes A X B C D

Network Layer4-189 Other solutions to looping
- Route poisoning
  - Advertise infinite cost on a route to everyone (not just next hop) when lowest cost route increases
  - Gets rid of stale information throughout network
  - Used in conjunction with path holddown
- Path holddown
  - Freeze route for a fixed time
    - Do not switch to an alternate while route poisoning is happening
    - In our example, A and B delay changing and advertising new routes
    - A and B both set route to D to infinity after single step
  - Configuring holddown delay
    - Delay too large: slow convergence
    - Delay too small: count-to-infinity more probable

Network Layer4-190 Other solutions to looping r Path vector m Select loop-free paths m Each route advertisement carries entire path m If a router sees itself in path, it rejects the route m BGP does it this way m Space proportional to diameter of network

Network Layer4-191 Looping r Do solutions completely eliminate loops? m No! Transient loops are still possible m Why? Because implicit path information may be stale m See this in BGP convergence r Only way to fix this m Ensure that you have up-to-date information by explicitly querying

Network Layer4-192 Comparing link-state vs. distance vector r Communication costs r Processing costs r Optimality r Stability m Convergence time m Loop freedom m Oscillation damping

Network Layer4-193 Message complexity, network bandwidth r LS: with n nodes, E links, O(nE) msgs sent m Send info about your neighbors to everyone m Small messages broadcast globally r DV: exchange between neighbors only m Send everything you know to your neighbors m Large messages, but transfers only to neighbors Link State vs. Distance Vector

Network Layer4-194 Link State vs. Distance Vector
Speed of convergence
- LS: $O(n^2)$ algorithm requires O(nE) msgs
  - Faster: can forward LSPs before processing
  - Single SPT calculation
- DV: convergence time varies
  - Fast with triggered updates
  - count-to-infinity problem
  - may be routing loops

Network Layer4-195 Link State vs. Distance Vector Space requirements: r LS: maintains entire topology r DV: maintains only neighbor state m path vector maintains routes proportional to network diameter

Network Layer4-196 Link State vs. Distance Vector Robustness: r LS m Can be made robust since sources are aware of alternate paths within topology r DV m Can advertise incorrect paths to all destinations m Incorrect calculation can spread to entire network

Network Layer4-197 Chapter 4: Network Layer
- 4.1 Introduction
- 4.2 Virtual circuit and datagram networks
- 4.3 What’s inside a router
- 4.4 IP: Internet Protocol
  - Datagram format
  - IPv4 addressing
  - ICMP
  - IPv6
- 4.5 Routing algorithms
  - Link state
  - Distance Vector
  - Hierarchical routing
- 4.6 Routing in the Internet
  - RIP
  - OSPF
  - BGP
- 4.7 Broadcast and multicast routing

Network Layer4-198 Hierarchical Routing scale: with 200 million destinations: r can’t store all dest’s in routing tables! r routing table exchange would swamp links! r Flat routing does not scale administrative autonomy r internet = network of networks r each network admin may want to control routing in its own network Our routing study thus far - idealization r all routers identical r network “flat” … not true in practice

Network Layer4-199 Routing Hierarchies r Key observation m Need less information with increasing distance to destination m Hierarchical routing saves table size reduces update traffic allows routing to scale r Two radically different approaches m The area hierarchy m The landmark hierarchy Covered in advanced topics at end of course...

Network Layer Areas r Divide network into areas m Areas can have nested sub-areas No path between two sub-areas of an area can exit that area m Within area, each node has routes to every other node m Outside area Each node has routes for other top-level areas only (not nodes within those areas) Inter-area packets are routed to nearest appropriate border router

Network Layer4-201 Internet Routing Hierarchy r Internet areas called “autonomous systems” (AS) m administrative autonomy r routers in same AS run same routing protocol m “intra-AS” routing protocol (IGP) m Each AS can run its own intra-AS routing protocol Border routers m Special routers in AS that directly link to another AS m Responsible for routing to destinations outside AS run intra-AS routing protocol with all other routers in AS run inter-AS routing protocol or exterior gateway protocol (EGP) with other gateway routers in other AS’s

Network Layer Internet Routing Hierarchy Border router A.c m Routing protocols Inter-AS externally Intra-AS internally m Forwarding table configured by both network layer link layer physical layer a b b a a C A B d A.a A.c C.b B.a c b c Forwarding Table

Network Layer Why different Intra- and Inter-AS routing ? Policy: r Intra-AS: single administrative policy m No policy decisions needed, performance dominates m Focus on performance r Inter-AS: ISP wants control over how its traffic routed, who routes through its net. m Policy and monetary factors dominate over performance

Network Layer b 1d 3a 1c 2a AS3 AS1 AS2 1a 2c 2b 1b 3c Inter-AS tasks r Suppose router in AS1 receives datagram for destination outside of AS1 m router should forward packet to gateway router, but which one? AS1 must: 1. learn which dests reachable through AS2, which through AS3 2. propagate this reachability info to all routers in AS1 Job of inter-AS routing!

Network Layer Example: Setting forwarding table in router 1d r suppose AS1 learns (via inter-AS protocol) that subnet x reachable via AS3 (gateway 1c) but not via AS2. r inter-AS protocol propagates reachability info to all internal routers. r router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c. m installs forwarding table entry (x,I) 3b 1d 3a 1c 2a AS3 AS1 AS2 1a 2c 2b 1b 3c x …

Network Layer Example: Choosing among multiple ASes r now suppose AS1 learns from inter-AS protocol that subnet x is reachable from AS3 and from AS2. r to configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x. m this is also the job of inter-AS routing protocol! 3b 1d 3a 1c 2a AS3 AS1 AS2 1a 2c 2b 1b 3c x … …

Network Layer Example: Choosing among multiple ASes
- Cost-based selection
  1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways
  2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
  3. Choose the gateway that has the smallest least cost
  4. Determine from forwarding table the interface I that leads to least-cost gateway. Enter (x,I) in forwarding table
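The cost-based selection can be written as a small function. This is a sketch only: `choose_gateway`, the cost values and the interface names are hypothetical; the gateway names 1b and 1c follow the AS1 figure used in these slides.

```python
def choose_gateway(gateways, intra_as_cost, interface_to):
    """Pick the gateway with the smallest intra-AS least cost.

    gateways:       gateways through which subnet x is reachable (inter-AS info)
    intra_as_cost:  gateway -> least cost from this router (intra-AS info)
    interface_to:   gateway -> local interface on the least-cost path
    Returns (gateway, interface); the caller installs (x, interface).
    """
    gateway = min(gateways, key=lambda g: intra_as_cost[g])
    return gateway, interface_to[gateway]

# Router 1d learns that subnet x is reachable via gateways 1c and 1b.
# Costs and interface names below are made up for the example.
forwarding = {}
gateway, interface = choose_gateway(
    gateways=["1c", "1b"],
    intra_as_cost={"1c": 2, "1b": 1},
    interface_to={"1c": "if0", "1b": "if1"},
)
forwarding["x"] = interface
print(gateway, forwarding)
```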

Network Layer AS Categories r Stub: an AS that has only a single connection to one other AS - carries only local traffic. r Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic r Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)

Network Layer AS categories example AS1AS3AS2AS1AS2AS3AS1AS2 Stub Multi-homed Transit

Network Layer4-210 Path Sub-optimality hop red path vs. 2 hop green path start end


Network Layer4-212 Intra-AS Routing r Also known as Interior Gateway Protocols (IGP) r Most common Intra-AS routing protocols: m RIP: Routing Information Protocol Distance-vector m OSPF: Open Shortest Path First Link-state m IGRP: Interior Gateway Routing Protocol (Cisco proprietary) Distance-vector

Network Layer4-213 RIP (Routing Information Protocol)
- Distance vector algorithm
  - Distance metric: # of hops (max = 15 hops)
  - Vectors exchanged every 30 sec and when triggered
  - Static update period leads to synchronization problems
  - Split horizon with poisoned reverse
- Included in BSD-UNIX Distribution in 1982
  - RIP-2 in 1993 adds prefix mask for CIDR
From router A to subnets:
  destination   hops
  u             1
  v             2
  w             2
  x             3
  y             3
  z             2

Network Layer4-214 RIP: Example Destination Network Next Router Num. of hops to dest. wA2 yB2 zB7 x--1 ….…..... w xy z A C D B Routing table in D

Network Layer4-215 RIP: Example Destination Network Next Router Num. of hops to dest. wA2 yB2 zB A7 5 x--1 ….…..... Routing table in D w xy z A C D B Dest Next hops w - 1 x - 1 z C 4 …. …... Advertisement from A to D

Network Layer4-216 RIP: Link Failure and Recovery If no advertisement heard after 180 sec --> neighbor/link declared dead m routes via neighbor invalidated m new advertisements sent to neighbors m neighbors in turn send out new advertisements (if tables changed) m link failure info quickly propagates to entire net m poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

Network Layer4-217 RIP Table processing r RIP routing tables managed by application-level process called routed (route daemon) r advertisements sent in UDP packets, periodically repeated physical link network forwarding (IP) table Transprt (UDP) routed physical link network (IP) Transprt (UDP) routed forwarding table

Network Layer4-218 IGRP (Interior Gateway Routing Protocol)
- Cisco proprietary; successor of RIP (mid 80s)
  - Distance vector, like RIP
  - several cost metrics (delay, bandwidth, reliability, load etc)
  - 90 sec update with triggered updates
  - Split horizon
  - V1: path holddown
  - V2: route poisoning
  - routing updates carried directly in IP datagrams (IP protocol 9) rather than over TCP
  - EIGRP
    - Loop-free routing via DUAL (based on diffusing computation)
    - CIDR support


Network Layer OSPF (Open Shortest Path First)
- “open”: publicly available
- Uses Link State algorithm
  - LS packet dissemination
  - Topology map at each node
  - Route computation using Dijkstra’s algorithm
- Advertisements disseminated to entire AS (via flooding)
  - Carried in OSPF messages directly over IP (rather than TCP or UDP)

Network Layer4-221 OSPF “advanced” features (not in RIP) r Security: all OSPF messages authenticated (to prevent malicious intrusion) r Multiple same-cost paths allowed (only one path in RIP) r Integrated uni- and multicast support: m Multicast OSPF (MOSPF) uses same topology data base as OSPF r Hierarchical OSPF in large domains.

Network Layer Hierarchical OSPF
- two-level hierarchy: local area, backbone.
  - Link-state advertisements only in area
  - each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
- area border routers: “summarize” distances to nets in own area, advertise to other area border routers.
- backbone routers: run OSPF routing limited to backbone.
- boundary routers: connect to other AS’s.

Network Layer Hierarchical OSPF


Network Layer History r Mid-80s: EGP (Exterior Gateway Protocol) m Used in original ARPAnet m Reachability protocol (no shortest path) Single bit for reachability information m Topology restricted to a tree (no cycles allowed) ARPA-managed packet switches at top of tree m Unacceptable once Internet grew to multiple independent backbones r Result: BGP development

Network Layer BGP r BGP (Border Gateway Protocol): the de facto standard r BGP provides each AS a means to: 1. Get subnet reachability information from neighbor ASs. 2. Propagate reachability information to routers within AS. 3. Determine “good” routes to subnets based on reachability information and policy. r Allows a subnet to advertise its existence to rest of the Internet: “I am here” m What if a subnet lies about who it is? m Recent route hijackings

Network Layer Inter-AS routing: BGP
- Link state or distance vector?
  - Problems with distance-vector:
    - Bellman-Ford algorithm may not converge
  - More problems with link state:
    - Everyone sees every link
      - LS database too large: entire Internet
      - Can’t easily control who uses the network (e.g. an ISP may want to hide particular links from being used by others, but link states are broadcast)
    - Metric used by routers not the same: loops
      - No universal routing metric
      - Policy drives routing decisions
- Result: BGP is a path-vector protocol

Network Layer BGP r Path Vector protocol: m BGP advertisements to neighbors (peers) contain entire path (i.e, sequence of ASs) to a destination E.g., Gateway X sends its path to dest. Z: –Path (X,Z) = X,Y1,Y2,Y3,…,Z m When AS gets route check if AS already in path If yes, reject route If no, add self and (possibly) advertise route further m Allows for policy application (different metrics) Metrics are local - AS chooses path, protocol ensures no loops Supports CIDR aggregation (BGP4)
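The loop check on a received path is short enough to sketch directly. This is an illustration of the rule stated on the slide, not BGP's actual message handling; the function name `process_advertisement` and the list-of-names path representation are assumptions.

```python
def process_advertisement(my_as, path):
    """Apply the path-vector loop check to a received route.

    path is the sequence of ASs toward the destination.
    Returns the path to (possibly) advertise further, or None if rejected.
    """
    if my_as in path:
        return None              # we are already on the path: a loop, reject
    return [my_as] + path        # add self in front before re-advertising

received = ["Y1", "Y2", "Y3", "Z"]
print(process_advertisement("X", received))
print(process_advertisement("Y2", received))
```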

Network Layer BGP basics r Pairs of routers (BGP peers) exchange routing info over semi- permanent TCP connections: BGP sessions m Note that BGP sessions do not correspond to physical links. m Two types eBGP and iBGP eBGP between gateways iBGP from gateway to internal routers of an AS r AS2 advertises a prefix to AS1 m AS2 is promising it will forward any datagrams destined to that prefix towards the prefix. m AS2 can aggregate prefixes in its advertisement 3b 1d 3a 1c 2a AS3 AS1 AS2 1a 2c 2b 1b 3c eBGP session iBGP session

Network Layer Distributing reachability info
- With eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1.
- 1c can then use iBGP to distribute this new prefix reach info to all routers in AS1
- 1b can then re-advertise the new reach info to AS2 over the 1b-to-2a eBGP session
- When router learns about a new prefix, it creates an entry for the prefix in its forwarding table.
(Figure: AS1, AS2 and AS3 with routers 1a-1d, 2a-2c, 3a-3c, showing eBGP and iBGP sessions)

Network Layer4-231 Path attributes & BGP routes r advertised prefix includes BGP attributes. m prefix + attributes = “route” r two important attributes: m AS-PATH: contains ASs through which prefix advertisement has passed: e.g, AS 67, AS 17 m NEXT-HOP: indicates specific internal-AS router to next-hop AS. (may be multiple links from current AS to next-hop-AS) r when gateway router receives route advertisement, uses import policy to accept/decline.

Network Layer BGP messages r Exchanged using TCP. m Advantages: Simplifies BGP No need for periodic refresh - routes are valid until withdrawn, or the connection is lost Incremental updates m Disadvantages BGP TCP spoofing attack Congestion control on a routing protocol? Poor interaction during high load (Code Red)

Network Layer BGP messages r Example messages m OPEN: opens TCP connection to peer and authenticates sender m UPDATE: advertises new path (or withdraws old) m KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request m NOTIFICATION: reports errors in previous msg; also used to close connection

Network Layer Policy with BGP r BGP provides capability for enforcing various policies r Policies are not part of BGP: they are provided to BGP as configuration information r BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s

Network Layer Path Selection Criteria r Path attributes + external (policy) information r Examples: m Hop count m Policy considerations Preference for AS Presence or absence of certain AS m Path origin rejecting false routes m Link dynamics m Early-exit Hot-potato routing for transit packets

Network Layer Examples of BGP Policies r A multi-homed AS refuses to act as transit m Limit path advertisement r A multi-homed AS can become transit for some AS’s m Only advertise paths to some AS’s r An AS can favor or disfavor certain AS’s for traffic transit from itself

Network Layer BGP routing policy r A,B,C are provider networks r X,W,Y are customers (of provider networks) r X is dual-homed: attached to two networks m X does not want to route from B via X to C m.. so X will not advertise to B a route to C

Network Layer BGP routing policy (2)
- A advertises to B the path AW
- B advertises to X the path BAW
- Should B advertise to C the path BAW?
  - No! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers
  - B wants to force C to route to W via A
  - B wants to route only to/from its customers!
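B's rule ("route only to/from its customers") can be expressed as an export filter. The sketch below is a simplification for illustration: the function name `should_export` is hypothetical, and the customer set {"X"} is an assumption read from the description of the figure (X is dual-homed to B and C).

```python
def should_export(learned_from, neighbor, customers):
    """Decide whether to advertise a route to `neighbor`.

    A provider carries traffic only if the route was learned from one of
    its customers or the advertisement goes to one of its customers.
    """
    return learned_from in customers or neighbor in customers

b_customers = {"X"}          # assumption: X is B's only customer here
# Route BAW was learned from provider A.
print(should_export("A", "X", b_customers))   # advertise BAW to customer X
print(should_export("A", "C", b_customers))   # do not advertise BAW to C
```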

Network Layer Network Layer summary r Service model r Network-layer functions r Instantiation on the Internet m Delivery model m Addressing m Forwarding m Routing

Network Layer Extra slides

Network Layer4-241 Getting a datagram from source to dest. Classful routing example IP datagram: A B E misc fields source IP addr dest IP addr data datagram remains unchanged, as it travels source to destination addr fields of interest here Dest. Net. next router Nhops routing table in A

Network Layer Getting a datagram from source to dest. Starting at A, given IP datagram addressed to B: r look up net. address of B r find B is on same net. as A r link layer will send datagram directly to B inside link-layer frame m B and A are directly connected A B E Dest. Net. next router Nhops misc fields data

Network Layer Getting a datagram from source to dest. Starting at A, dest. E: m look up network address of E m E on different network A, E not directly attached m routing table: next hop router to E is m link layer sends datagram to router inside link- layer frame m datagram arrives at m continued… A B E Dest. Net. next router Nhops misc fields data

Network Layer Getting a datagram from source to dest A B E misc fields data network router Nhops interface Dest. next Arriving at , destined for m look up network address of E m E on same network as router’s interface router, E directly attached m link layer sends datagram to inside link-layer frame via interface m datagram arrives at !!! (hooray!)

Network Layer Issues in Router Table Size r One entry for every host on the Internet m 100M entries r One entry for every LAN m Every host on LAN shares prefix m Still too many r One entry for every organization m Every host in organization shares prefix m Requires careful address allocation m What constitutes an “organization”?

Network Layer Binary tree Route Prefixes A 0* B 01000* C 011* D 1* E 100* F 1100* G 1101* H 1110* I 1111*
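A binary trie over these prefixes supports longest-prefix matching by walking one bit at a time and remembering the last route seen. This is a minimal sketch; the class and function names (`TrieNode`, `build_trie`, `longest_prefix_match`) and the test addresses are hypothetical, while the prefix table is the one on the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # "0" or "1" -> TrieNode
        self.route = None      # route name if a prefix ends at this node

def build_trie(prefixes):
    """Insert each prefix (a bit string) into a binary trie."""
    root = TrieNode()
    for name, bits in prefixes.items():
        node = root
        for bit in bits:
            node = node.children.setdefault(bit, TrieNode())
        node.route = name
    return root

def longest_prefix_match(root, address_bits):
    """Walk the trie along the address, remembering the last route seen."""
    node, best = root, None
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}
trie = build_trie(PREFIXES)
print(longest_prefix_match(trie, "01000110"))
print(longest_prefix_match(trie, "10100000"))
```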

Network Layer NAT example #2 r Use the source port field (of TCP or UDP) along with pool of IP addresses m Example: single, globally routable external IP address Packet #2 SrcIP= SrcPort=1312 DstIP= DstPort=21 Packet #1 SrcIP= SrcPort=1312 DstIP= DstPort=21 NAPT translator ExternalIP=

Network Layer NAT example # Packet #2 SrcIP= SrcPort=1312 DstIP= DstPort=21 Packet #1 SrcIP= SrcPort=1312 DstIP= DstPort=21 NAPT translator ExternalIP= Packet #1 after NAPT SrcIP= SrcPort=2000 DstIP= DstPort=21 Packet #2 after NAPT SrcIP= SrcPort=2001 DstIP= DstPort=21

Network Layer NAT example # NAPT translator ExternalIP= Reply #1 SrcIP= SrcPort=21 DstIP= DstPort=2000 Reply #2 SrcIP= SrcPort=21 DstIP= DstPort=2001

Network Layer NAT example # Reply #2 after NAPT SrcIP= SrcPort=21 DstIP= DstPort=1312 Reply #1 after NAPT SrcIP= SrcPort=21 DstIP= DstPort=1312 NAPT translator ExternalIP= Reply #1 SrcIP= SrcPort=21 DstIP= DstPort=2000 Reply #2 SrcIP= SrcPort=21 DstIP= DstPort=2001
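The port rewriting in this example can be modelled with a two-way table. This is a sketch under loudly stated assumptions: the IP addresses on the slides did not survive, so the addresses below (10.0.0.1, 10.0.0.2, 203.0.113.5) are placeholders chosen for illustration, and the class name `Napt` and its methods are hypothetical. The port numbers 1312, 2000 and 2001 are the ones on the slides.

```python
class Napt:
    """Toy NAPT table: maps (internal IP, port) to an external port and back."""

    def __init__(self, external_ip, first_port=2000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.out = {}     # (internal_ip, internal_port) -> external_port
        self.back = {}    # external_port -> (internal_ip, internal_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet; allocate a port if new."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.external_ip, self.out[key]

    def inbound(self, dst_port):
        """Find the internal destination for a reply, or None if unknown."""
        return self.back.get(dst_port)

napt = Napt("203.0.113.5")                 # placeholder external address
print(napt.outbound("10.0.0.1", 1312))     # packet #1
print(napt.outbound("10.0.0.2", 1312))     # packet #2, same source port
print(napt.inbound(2001))                  # reply #2 goes back to host 2
```

Both hosts use source port 1312, yet replies stay distinguishable because each flow gets its own external port.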

Network Layer4-251 Link-state broadcasts: Wrapped sequence numbers r Wrapped sequence numbers m 0-N where N is large m If difference between numbers is large, assume a wrap m A is older than B if…. A < B and |A-B| < N/2 or… A > B and |A-B| > N/2 r What about new nodes or rebooted nodes that are out of sync with sequence number space? m Lollipop sequence (Perlman 1983)
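The wrapped comparison translates directly into code. A minimal sketch of the rule as stated on the slide; the function name `is_older` and the example space size N = 16 are assumptions.

```python
def is_older(a, b, n):
    """Wrapped sequence comparison: True if a is older than b.

    a is older than b if a < b and |a-b| < n/2,
    or if a > b and |a-b| > n/2 (b has wrapped around).
    """
    diff = abs(a - b)
    return (a < b and diff < n / 2) or (a > b and diff > n / 2)

N = 16
print(is_older(3, 5, N))    # plain case: 3 was sent before 5
print(is_older(15, 1, N))   # wrap: 1 follows 15
```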

Network Layer Lollipop sequence numbers r Divide sequence number space r Special negative sequence for recovering from reboot m New and rebooted nodes use negative sequence numbers m Upon receipt of negative number, other nodes inform these nodes of current “up-to-date” sequence number r A older than B if m A < 0 and A < B m A > 0, A < B and (B – A) < N/4 m A > 0, A > B and (A – B) > N/4 0 -N/2 N/2 - 1
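The three "A older than B" rules can be coded literally. This sketch follows the slide's conditions as written and nothing more: the function name `lollipop_is_older` and the example size N = 64 are assumptions, and the case A = 0, which the slide does not cover, simply returns False here.

```python
def lollipop_is_older(a, b, n):
    """Lollipop comparison, following the three rules on the slide."""
    if a < 0:
        return a < b                 # negative (post-reboot) numbers: plain order
    if a > 0 and a < b:
        return (b - a) < n / 4       # b is slightly ahead of a
    if a > 0 and a > b:
        return (a - b) > n / 4       # b has wrapped around the circular part
    return False                     # a == 0 or a == b: not covered by the slide

N = 64
print(lollipop_is_older(-5, 3, N))   # rebooted node's number is older
print(lollipop_is_older(30, 3, N))   # 3 is newer: the counter wrapped
```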

Network Layer Distance Vector Algorithm r D x (y) = estimate of least cost from x to y r Distance vector: D x = [D x (y): y є N ] r Node x knows cost to each neighbor v: c(x,v) r Node x maintains D x = [D x (y): y є N ] r Node x also maintains its neighbors’ distance vectors m For each neighbor v, x maintains D v = [D v (y): y є N ]

Network Layer Distance vector example (figure: node x, node y and node z tables over time)
Node x recomputes its distance vector from its neighbors' vectors:
$D_x(y) = \min\{c(x,y) + D_y(y),\ c(x,z) + D_z(y)\} = \min\{2+0,\ 7+1\} = 2$
$D_x(z) = \min\{c(x,y) + D_y(z),\ c(x,z) + D_z(z)\} = \min\{2+1,\ 7+0\} = 3$


Network Layer VC implementation A VC consists of: 1. Path from source to destination 2. VC numbers, one number for each link along path 3. Entries in forwarding tables in routers along path r Packet belonging to VC carries a VC number. r VC number must be changed on each link. m New VC number comes from forwarding table

Network Layer Forwarding table VC number interface number Incoming interface Incoming VC # Outgoing interface Outgoing VC # … … Forwarding table in northwest router: Routers maintain connection state information!

Network Layer Forwarding table Destination Address Range Link Interface through through through otherwise 3 4 billion possible entries

Network Layer RIP Table example (continued) Router: giroflee.eurocom.fr Three attached class C networks (LANs) Router only knows routes to attached LANs Default router used to “go up” Route multicast address: Loopback interface (for debugging) Destination Gateway Flags Ref Use Interface UH lo U 2 13 fa U le U 2 25 qaa U 3 0 le0 default UG

Network Layer4-261 DUAL
- Diffusing Update Algorithm
  - Garcia-Luna-Aceves 1989
  - Goal: avoid transient loops in DV and LS algorithms
    - Similar in flavor to route poisoning and path holddown
  - 2 ideas
    - A path shorter than current path cannot contain a loop
    - Based on diffusing computation (Dijkstra-Scholten 1980)
      - Wait until computation completes before changing routes in response to a new update
      - Similar to path holddown
  - 3 kinds of messages
    - Update, query, reply
  - 2 states for routers
    - Active (queries outstanding), passive

Network Layer DUAL
On update:
  if (lower cost)
    adopt
  else if (higher cost) {
    if (from next hop) {
      if (any path exists < old length from next hop)
        switch path
      else
        freeze route
        send query to all neighbors except next hop
        go into active
        wait for reply from all neighbors
        update route
        return to passive
    }
    send reply to all querying neighbors
  }

Network Layer Hierarchical routing example EGP IGP EGP IGP EGP

Network Layer Inter-AS routing r EGP r BGP

Network Layer BGP route selection
- Router may learn about more than 1 route to some prefix. Router must select route.
- Elimination rules:
  1. Local preference value attribute: policy decision
  2. Shortest AS-PATH
  3. Closest NEXT-HOP router: hot potato routing
  4. Additional criteria

Network Layer Path attributes & BGP routes r When advertising a prefix, advert includes BGP attributes. m prefix + attributes = “route” r Two important attributes: m AS-PATH: contains the ASs through which the advert for the prefix passed: AS 67 AS 17 m NEXT-HOP: Indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hop-AS.) r When gateway router receives route advert, uses import policy to accept/decline.

Network Layer b 1d 3a 1c 2a AS3 AS1 AS2 1a 2c 2b 1b Intra-AS Routing algorithm Inter-AS Routing algorithm Forwarding table 3c Interconnected ASes r Forwarding table is configured by both intra- and inter-AS routing algorithm m Intra-AS sets entries for internal dests m Inter-AS & Intra-As sets entries for external dests

Network Layer Inter-AS tasks r Suppose router in AS1 receives datagram for which dest is outside of AS1 m Router should forward packet towards one of the gateway routers, but which one? r AS1 needs: 1. to learn which dests are reachable through AS2 and which through AS3 2. to propagate this reachability info to all routers in AS1 r Job of inter-AS routing! [Figure: AS1, AS2 and AS3 interconnected through gateway routers]

Network Layer Example: Setting forwarding table in router 1d r Suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 (gateway 1c) but not from AS2. r Inter-AS protocol propagates reachability info to all internal routers. r Router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c. r Puts in forwarding table entry (x,I).

Network Layer Example: Choosing among multiple ASes r Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2. r To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x. r This is also the job of the inter-AS routing protocol! r Hot potato routing: send packet towards closest of two routers. r Steps: 1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways 2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways 3. Hot potato routing: choose the gateway that has the smallest least cost 4. Determine from forwarding table the interface I that leads to least-cost gateway; enter (x,I) in forwarding table
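The hot potato steps on this slide can be sketched in a few lines. The function name and the three input structures are assumptions for illustration: `gateways` stands for what the inter-AS protocol reports, `intra_as_cost` for the least-cost path costs from the intra-AS protocol, and `forwarding_table` for the router's existing entries towards internal routers.

```python
def hot_potato_entry(subnet, gateways, intra_as_cost, forwarding_table):
    """Return the forwarding entry (subnet, interface) chosen by hot potato routing.

    gateways:         gateway routers through which the subnet is reachable
    intra_as_cost:    gateway -> cost of the least-cost intra-AS path to it
    forwarding_table: internal router -> outgoing interface towards it
    """
    # Choose the gateway with the smallest least-cost path.
    best_gateway = min(gateways, key=lambda g: intra_as_cost[g])
    # The interface that already leads to that gateway is reused for the subnet.
    interface = forwarding_table[best_gateway]
    return (subnet, interface)
```

Calling it with gateways `["1b", "1c"]`, costs `{"1b": 3, "1c": 1}` and table `{"1b": "I2", "1c": "I1"}` yields the entry `("x", "I1")` for subnet `"x"`.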

Network Layer Distance Vector in Practice r RIP and RIP2 m Uses split-horizon/poison reverse r BGP m Propagates entire path m Path also used for effecting policies

Network Layer BGP path selection r router may learn about more than 1 route to some prefix. Router must select route. r elimination rules: 1. local preference value attribute: policy decision 2. shortest AS-PATH 3. closest NEXT-HOP router: hot potato routing 4. additional criteria
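The elimination rules can be expressed as a single sort key. This is a sketch under stated assumptions: the `BgpRoute` type and `igp_cost` map are hypothetical, a higher local preference is taken to win (standard BGP behavior, not spelled out on the slide), and the slide's "additional criteria" are stood in for by a tie-break on the next-hop name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    local_pref: int
    as_path: Tuple[int, ...]
    next_hop: str


def select_route(routes, igp_cost):
    """Pick one route per the elimination rules.

    igp_cost: next-hop router -> intra-AS cost to reach it (hypothetical input).
    """
    def key(route):
        return (
            -route.local_pref,         # 1. highest local preference (policy)
            len(route.as_path),        # 2. shortest AS-PATH
            igp_cost[route.next_hop],  # 3. closest NEXT-HOP (hot potato)
            route.next_hop,            # 4. stand-in for additional criteria
        )
    return min(routes, key=key)
```

Because the key is a tuple, a later rule is consulted only when all earlier rules tie, which matches the elimination order.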

Network Layer Chapter 4: Network Layer r 4.1 Introduction r 4.2 Virtual circuit and datagram networks r 4.3 What’s inside a router r 4.4 IP: Internet Protocol m Datagram format m IPv4 addressing m ICMP m IPv6 r 4.5 Routing algorithms m Link state m Distance Vector m Hierarchical routing r 4.6 Routing in the Internet m RIP m OSPF m BGP r 4.7 Broadcast and multicast routing

Network Layer Broadcast Routing r deliver packets from source to all other nodes r source duplication is inefficient: the source creates and transmits every duplicate itself, whereas with in-network duplication copies are created inside the network r source duplication: how does source determine recipient addresses? [Figure: routers R1-R4; source duplication vs. in-network duplication]

Network Layer In-network duplication r flooding: when node receives brdcst pckt, sends copy to all neighbors m Problems: cycles & broadcast storm r controlled flooding: node only brdcsts pkt if it hasn’t brdcst same packet before m Node keeps track of pckt ids already brdcsted m Or reverse path forwarding (RPF): only forward pckt if it arrived on shortest path between node and source r spanning tree m No redundant packets received by any node
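Controlled flooding with packet ids can be simulated as follows. This is a minimal sketch: the graph is an adjacency-list dictionary, and the function name and return values are assumptions for illustration. Each node forwards a packet only the first time it sees that id, which is what stops cycles from producing a broadcast storm.

```python
from collections import deque


def controlled_flood(graph, source, packet_id):
    """Flood one packet; return (nodes reached, number of link transmissions).

    graph: node -> list of neighbor nodes (undirected, listed in both directions).
    """
    seen = {node: set() for node in graph}  # per-node ids already broadcast
    transmissions = 0
    queue = deque([(source, None)])         # (receiving node, node it came from)
    while queue:
        node, came_from = queue.popleft()
        if packet_id in seen[node]:
            continue                        # duplicate: do not broadcast again
        seen[node].add(packet_id)
        for neighbor in graph[node]:
            if neighbor != came_from:       # do not send back on arrival link
                transmissions += 1
                queue.append((neighbor, node))
    reached = {node for node in graph if packet_id in seen[node]}
    return reached, transmissions
```

On a three-node cycle the flood terminates after four transmissions, two of which are redundant copies that the receivers drop; a spanning tree would need only two.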

Network Layer Spanning Tree r First construct a spanning tree r Nodes forward copies only along spanning tree [Figure: nodes A, B, c, D, E, F, G; (a) broadcast initiated at A, (b) broadcast initiated at D]

Network Layer Spanning Tree: Creation r Center node r Each node sends unicast join message to center node m Message forwarded until it arrives at a node already belonging to spanning tree [Figure: nodes A, B, c, D, E, F, G; (a) stepwise construction of spanning tree, (b) constructed spanning tree]

Multicast Routing: Problem Statement r Goal: find a tree (or trees) connecting routers having local mcast group members m tree: not all paths between routers used m source-based: different tree from each sender to rcvrs m shared-tree: same tree used by all group members [Figure: shared tree vs. source-based trees]

Approaches for building mcast trees Approaches: r source-based tree: one tree per source m shortest path trees m reverse path forwarding r group-shared tree: group uses one tree m minimal spanning (Steiner) m center-based trees …we first look at basic approaches, then specific protocols adopting these approaches

Shortest Path Tree r mcast forwarding tree: tree of shortest path routes from source to all receivers m Dijkstra’s algorithm [Figure: source S and routers R1-R6, marking routers with and without attached group members; numbered links are used for forwarding, where number i indicates the order in which the algorithm added the link]
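A shortest path tree can be built by running Dijkstra's algorithm from the source and keeping only the links on paths to receivers. The sketch below assumes a weighted adjacency dictionary and assumes every receiver is reachable; the function name and the edge-set return format are choices made for the example.

```python
import heapq


def shortest_path_tree(graph, source, receivers):
    """Return the set of directed links (parent, child) used to reach receivers.

    graph: node -> {neighbor: link cost}; all receivers assumed reachable.
    """
    dist = {source: 0}
    parent = {}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk back from each receiver to the source, collecting tree links.
    edges = set()
    for receiver in receivers:
        node = receiver
        while node != source:
            edges.add((parent[node], node))
            node = parent[node]
    return edges
```

Routers that lie on no path to a receiver contribute no links, so the result is the forwarding tree rather than the full shortest path tree of the network.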

Reverse Path Forwarding r rely on router’s knowledge of unicast shortest path from it to sender r each router has simple forwarding behavior:
if (mcast datagram received on incoming link on shortest path back to sender)
  then flood datagram onto all outgoing links
  else ignore datagram
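The per-router RPF check can be written as a single function. This is a sketch with assumed names: `unicast_next_hop` stands for the router's existing unicast table (destination to neighbor on the shortest path), and links are identified by the neighbor at the other end.

```python
def rpf_forward(source, arrival_link, unicast_next_hop, links):
    """Return the links on which to forward a multicast datagram.

    source:           original sender of the datagram
    arrival_link:     neighbor the datagram arrived from
    unicast_next_hop: destination -> neighbor on this router's shortest path
    links:            all neighbors of this router
    """
    if unicast_next_hop[source] != arrival_link:
        return []  # not on the reverse shortest path: ignore the datagram
    # Flood onto every outgoing link, i.e. all links except the arrival link.
    return [link for link in links if link != arrival_link]
```

The router needs no multicast-specific state for this decision; it only consults the unicast table it already maintains.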

Reverse Path Forwarding: example r result is a source-specific reverse SPT m may be a bad choice with asymmetric links [Figure: source S and routers R1-R7, marking routers with and without attached group members and the links on which the datagram will or will not be forwarded]

Reverse Path Forwarding: pruning r forwarding tree contains subtrees with no mcast group members m no need to forward datagrams down subtree m “prune” msgs sent upstream by router with no downstream group members [Figure: source S and routers R1-R7, marking routers with and without attached group members, links with multicast forwarding, and prune messages (P)]
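The effect of pruning can be sketched as a recursive pass over the forwarding tree. The tree representation (`children` map) and the function name are assumptions for illustration; real routers reach the same result by exchanging prune messages rather than by a central computation.

```python
def prune_tree(children, members, root):
    """Return the routers that stay on the tree after pruning.

    children: router -> list of downstream routers in the RPF tree
    members:  routers with attached group members
    root:     router attached to the source
    """
    on_tree = set()

    def needed(node):
        # Visit every child first (list, not generator, so none is skipped).
        child_needed = [needed(c) for c in children.get(node, [])]
        if node in members or any(child_needed):
            on_tree.add(node)
            return True
        return False  # no downstream members: this router would send a prune

    needed(root)
    on_tree.add(root)  # the source's router always remains
    return on_tree
```

A router stays on the tree only if it has an attached member or at least one child that stays, which mirrors prune messages propagating upstream.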

Shared-Tree: Steiner Tree r Steiner Tree: minimum cost tree connecting all routers with attached group members r problem is NP-complete r excellent heuristics exist r not used in practice: m computational complexity m information about entire network needed m monolithic: rerun whenever a router needs to join/leave

Center-based trees r single delivery tree shared by all r one router identified as “center” of tree r to join: m edge router sends unicast join-msg addressed to center router m join-msg “processed” by intermediate routers and forwarded towards center m join-msg either hits existing tree branch for this center, or arrives at center m path taken by join-msg becomes new branch of tree for this router
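The join procedure can be sketched as follows. The function name and data structures are assumptions for illustration: `path_to_center` stands for the unicast path the join message would take, and the tree is kept as a set of nodes plus a set of links.

```python
def join_center_tree(tree_nodes, tree_edges, path_to_center):
    """Graft a new branch onto a center-based tree; return the links added.

    tree_nodes:     routers already on the tree (initially just the center)
    tree_edges:     links already on the tree, as (router, next hop) pairs
    path_to_center: unicast path [edge router, ..., center] of the join-msg
    Both sets are updated in place.
    """
    new_branch = []
    for hop, next_hop in zip(path_to_center, path_to_center[1:]):
        if hop in tree_nodes:
            break  # join-msg hit an existing branch: stop forwarding
        tree_nodes.add(hop)
        tree_edges.add((hop, next_hop))
        new_branch.append((hop, next_hop))
    return new_branch
```

The first join from a region builds a branch all the way to the center, while later joins that pass through a router already on the tree add only the missing links.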

Center-based trees: an example Suppose R6 chosen as center: [Figure: routers R1-R7, marking routers with and without attached group members; numbered paths show the order in which join messages are generated]

Internet Multicasting Routing: DVMRP r DVMRP: distance vector multicast routing protocol, RFC1075 r flood and prune: reverse path forwarding, source-based tree m RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers m no assumptions about underlying unicast m initial datagram to mcast group flooded everywhere via RPF m routers not wanting group: send upstream prune msgs

DVMRP: continued… r soft state: DVMRP router periodically (1 min.) “forgets” that branches are pruned: m mcast data again flows down unpruned branch m downstream router: reprune or else continue to receive data r routers can quickly regraft to tree m following IGMP join at leaf r odds and ends m commonly implemented in commercial routers m Mbone routing done using DVMRP

Tunneling Q: How to connect “islands” of multicast routers in a “sea” of unicast routers? r mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram r normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router r receiving mcast router unencapsulates to get mcast datagram [Figure: physical topology vs. logical topology]

PIM: Protocol Independent Multicast r not dependent on any specific underlying unicast routing algorithm (works with all) r two different multicast distribution scenarios: Dense: m group members densely packed, in “close” proximity m bandwidth more plentiful Sparse: m # networks with group members small wrt # interconnected networks m group members “widely dispersed” m bandwidth not plentiful

Consequences of Sparse-Dense Dichotomy: Dense: r group membership by routers assumed until routers explicitly prune r data-driven construction of mcast tree (e.g., RPF) r bandwidth and non-group-router processing profligate Sparse: r no membership until routers explicitly join r receiver-driven construction of mcast tree (e.g., center-based) r bandwidth and non-group-router processing conservative

PIM - Dense Mode flood-and-prune RPF, similar to DVMRP but r underlying unicast protocol provides RPF info for incoming datagram r less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm r has protocol mechanism for router to detect it is a leaf-node router

PIM - Sparse Mode r center-based approach r router sends join msg to rendezvous point (RP) m intermediate routers update state and forward join r after joining via RP, router can switch to source-specific tree m increased performance: less concentration, shorter paths [Figure: routers R1-R7; joins travel to the rendezvous point, and all data is multicast from the rendezvous point]

PIM - Sparse Mode sender(s): r unicast data to RP, which distributes down RP-rooted tree r RP can extend mcast tree upstream to source r RP can send stop msg if no attached receivers m “no one is listening!” [Figure: routers R1-R7; joins travel to the rendezvous point, and all data is multicast from the rendezvous point]

Network Layer Router Architecture Overview Two key router functions: r Routing m Determine route taken by packets from source to destination m Run protocol (RIP, OSPF, BGP) m Generate forwarding table from routing algorithms m Algorithms based on either LS or DV r Forwarding m Process of moving packets from input port to output port m Look up forwarding table given information in packet m Switch/forward datagrams from incoming to outgoing link based on route

Network Layer What Does a Router Look Like? r Routing processor/controller m Handles routing protocols, error conditions r Line cards m Network interface cards r Forwarding engine m Fast path routing (hardware vs. software) r Backplane m Switch or bus interconnect

Network Layer Typical mode of operation r Packet arrives at inbound line card r Header transferred to forwarding engine r Forwarding engine determines output interface given a table initialized by routing processor r Forwarding engine signals result to line card r Packet copied to outbound line card

Network Layer Routing Processor r Runs routing protocol r Uploads forwarding table to forwarding engines m Forwarding engines with two forwarding tables to allow easy switchover (double buffering) r Typically performs “slow-path” processing m ICMP error messages m IP option processing m IP fragmentation m IP multicast packets

Network Layer Input Port Functions Decentralized switching: r given datagram dest., lookup output port using forwarding table in input port memory r goal: complete input port processing at ‘line speed’ r queuing: if datagrams arrive faster than forwarding rate into switch fabric Physical layer: bit-level reception Data link layer: e.g., Ethernet see chapter 5

Network Layer Input Port Queuing r Fabric slower than input ports combined => queuing may occur at input queues r Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward r queuing delay and loss due to input buffer overflow!

Network Layer Input Port Queuing r Possible solution m Virtual output buffering: maintain a per-output buffer at each input m Solves head-of-line blocking problem m Each of the MxN input buffers places a bid for its output
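The difference between a single FIFO per input and virtual output buffering can be seen in a one-slot simulation. This is a sketch under simplifying assumptions: each queued packet is represented only by its output port number, each input sends and each output receives at most one packet per slot, and arbitration is a simple greedy pass in input order rather than any particular scheduler.

```python
def fifo_slot(input_queues):
    """One time slot with a single FIFO per input: only head packets compete.

    input_queues: list (one per input) of lists of destination output ports.
    Returns the (input, output) pairs switched in this slot.
    """
    busy_outputs = set()
    sent = []
    for i, queue in enumerate(input_queues):
        if queue and queue[0] not in busy_outputs:
            busy_outputs.add(queue[0])
            sent.append((i, queue[0]))
        # Otherwise the head is blocked, and so is everything behind it (HOL).
    return sent


def voq_slot(input_queues):
    """One time slot with virtual output queues: any queued packet may bid."""
    busy_outputs = set()
    sent = []
    for i, queue in enumerate(input_queues):
        for output in queue:  # each distinct output corresponds to one VOQ
            if output not in busy_outputs:
                busy_outputs.add(output)
                sent.append((i, output))
                break         # an input sends at most one packet per slot
    return sent
```

With queues `[[0, 1], [0, 2]]`, the FIFO version switches one packet because input 1's head is blocked by contention for output 0, while the virtual output version switches two.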

Network Layer Forwarding Engine r Two major components m Lookup logic/software Data structures and algorithms to lookup route table See previous section on IP route lookup m Caches Small, fast memory storing recent lookups m Alternatives Hardware-support Hints

Network Layer Caches r Leverage temporal locality r Many packets to same destination m Long flows help, short flows do not r Similar to idea behind IP switching (ATM/MPLS) where long-lived flows map into single label r Example m Partridge, et. al. “A 50-Gb/s IP Router”, IEEE Trans. On Networking, Vol 6, No 3, June m 8KB L1 Icache Holds full forwarding code m 96KB L2 cache Forwarding table cache m 16MB L3 cache Full forwarding table x 2 - double buffered for updates
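The idea of caching recent lookups can be illustrated with a small software cache in front of a slower longest-prefix-match lookup. This is only an analogy to the hardware caches listed above: the table contents, port names and cache size are made-up values, and `functools.lru_cache` stands in for the small fast memory.

```python
import ipaddress
from functools import lru_cache

# Hypothetical forwarding table: (prefix, output port).
TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "if0"),
    (ipaddress.ip_network("10.1.0.0/16"), "if1"),
    (ipaddress.ip_network("0.0.0.0/0"), "if2"),
]


def slow_lookup(dest):
    """Full longest-prefix match over the whole table."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, port) for net, port in TABLE if addr in net]
    return max(matches)[1]  # entry with the longest matching prefix


@lru_cache(maxsize=1024)
def cached_lookup(dest):
    """Recent destinations are answered from the cache (temporal locality)."""
    return slow_lookup(dest)
```

A long flow to one destination pays for the full lookup once and then hits the cache, whereas many short flows to distinct destinations mostly miss; `cached_lookup.cache_info()` reports the hit and miss counts.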

Network Layer Alternatives r Lookup via content addressable memory (CAM) m Hardware based route lookup m Input = tag, output = value associated with tag m Requires exact match with tag Multiple cycles (1 per prefix length searched) with single CAM Multiple CAMs (1 per prefix) searched in parallel m Ternary CAM 0,1,don’t care values in tag match Priority (i.e. longest prefix) by order of entries in CAM r “Spatial caching” via protocol acceleration m Add clue (5 bits) to IP header m Indicate where IP lookup ended on previous node (Bremler-Barr SIGCOMM 99)
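A ternary CAM lookup can be modeled in software as a list of (value, mask) entries where mask bits set to 0 mean "don't care". The helper names and the example entries below are assumptions; real hardware compares all entries in parallel, and the loop only mimics the rule that the first matching entry in CAM order wins.

```python
def ip_to_int(dotted):
    """Convert dotted-decimal IPv4 text to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def make_entry(prefix, length, port):
    """Build a (value, mask, port) entry; mask bits of 0 are don't-care bits."""
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    return (ip_to_int(prefix) & mask, mask, port)


def tcam_lookup(entries, dest):
    """Return the port of the first matching entry, or None if nothing matches.

    Entries must be stored longest prefix first so that entry order
    implements longest-prefix priority.
    """
    key = ip_to_int(dest)
    for value, mask, port in entries:
        if (key & mask) == value:
            return port
    return None
```

Storing `10.1.0.0/16` ahead of `10.0.0.0/8` and a `/0` default makes `10.1.2.3` match the `/16` entry even though all three entries match it.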

Network Layer Types of network switching fabrics r Memory r Bus r Multistage interconnection r Crossbar interconnection

Network Layer Types of network switching fabrics r Issues m Switch contention Packets arrive faster than switching fabric can switch Speed of switching fabric versus line card speed determines input queuing vs. output queuing

Network Layer Switching Via Memory First generation routers: r packet copied by system’s (single) CPU r 2 bus crossings per datagram r speed limited by memory bandwidth Second generation routers: r input port processor performs lookup, copy into memory r Cisco Catalyst 8500 [Figure: input port and output port connected to memory over the system bus]

Network Layer Switching Via Bus r Datagram from input port memory directly to output port memory via a shared bus r Issues m Bus contention: switching speed limited by bus bandwidth r Examples m 1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)

Network Layer Switching Via An Interconnection Network r Overcome bus bandwidth limitations r Crossbar networks m Fully connected (n^2 elements) m All one-to-one, invertible permutations supported r Issues m Crossbar with n^2 elements hard to scale

Network Layer Switching Via An Interconnection Network r Multi-stage interconnection networks (Banyan) m Initially developed to connect processors in multiprocessor m Typically O(n log n) elements m Datagram fragmented into fixed-length cells, switched through the fabric r Issues m Blocking (not all one-to-one, invertible permutations supported) r Example m Cisco 12000: Gbps through an interconnection network [Figure: interconnection network between ports A, B, C, D and W, X, Y, Z]

Network Layer Output Ports r Output contention m Datagrams arrive from fabric faster than output port’s transmission rate m Buffering required m Scheduling discipline chooses among queued datagrams for transmission

Network Layer Output port queueing r buffering when arrival rate via switch exceeds output line speed r queueing (delay) and loss due to output port buffer overflow!
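Output port queueing and loss can be illustrated with a slot-by-slot simulation. This is a sketch under simple assumptions that are not from the slides: the output line sends one datagram per time slot, the buffer holds a fixed number of datagrams, and arrivals that find the buffer full are dropped (drop-tail).

```python
from collections import deque


def simulate_output_port(arrivals_per_slot, buffer_size):
    """Return (sent, dropped, still_queued) after processing all slots.

    arrivals_per_slot: number of datagrams arriving from the fabric per slot
    buffer_size:       maximum number of datagrams the output buffer holds
    """
    queue = deque()
    sent = 0
    dropped = 0
    for arrivals in arrivals_per_slot:
        for _ in range(arrivals):
            if len(queue) < buffer_size:
                queue.append(1)   # buffer the datagram
            else:
                dropped += 1      # buffer overflow: datagram is lost
        if queue:
            queue.popleft()       # line transmits one datagram per slot
            sent += 1
    return sent, dropped, len(queue)
```

When three datagrams arrive per slot for two slots and the buffer holds two, half of the six arrivals are lost and the remaining three drain over the following slots.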