Chapter 3 Internetworking.

Chapter 3 Internetworking

Problems In Chapter 2 we saw how to connect one node to another, or to an existing network. How do we build networks of global scale? How do we interconnect different types of networks to build a large global network?

Chapter Outline 3.1 Switching and Bridging 3.2 Basic Internetworking (IP) 3.3 Routing 3.4 Implementation and Performance

Two limitations on the directly connected networks limit on how many hosts can be attached, examples only two hosts can be attached to a point-to-point link the Ethernet specification allows no more than 1,024 hosts

limit on how large of a geographic area a single network can serve, examples an Ethernet can span only 2,500 m wireless networks are limited by the ranges of their radios point-to-point links can be quite long

Goal build networks that can be global in scale Problem how to enable communication between hosts that are not directly connected Solution computer networks use packet switches to enable packets to travel from one host to another, even when no direct connection exists between those hosts

Packet switch a device with several inputs and outputs leading to and from the hosts that the switch interconnects Core job of a switch take packets that arrive on an input and forward (or switch) them to the right output so that they will reach their appropriate destination

A key problem that a switch must deal with is the finite bandwidth of its outputs if packets destined for a certain output arrive at a switch and their arrival rate exceeds the capacity of that output, then we have a problem of contention the switch queues (buffers) packets until the contention subsides, but if it lasts too long, the switch will run out of buffer space and be forced to discard packets when packets are discarded too frequently, the switch is said to be congested

3.1 Switching and Bridging a multi-input, multi-output device, which transfers packets from an input to one or more outputs star topology switched networks are more scalable (i.e., growing to large numbers of nodes) than shared-media networks because of the ability to support many hosts at full speed

A switch provides a star topology

Scalable Networks The figure shows the protocol graph that would run on a switch that is connected to two T3 links and one STS-1 SONET link Example protocol graph running on a switch

A switch forwards packets from input port to output port Port selected based on address in packet header Advantages cover large geographic area (tolerate latency) support large numbers of hosts (scalable bandwidth)

Example switch with three input and output ports

How does the switch decide on which output port to place each packet? general answer it looks at the header of the packet for an identifier that it uses to make the decision three common approaches datagram (or connectionless) approach virtual circuit (or connection-oriented) approach source routing

3.1.1 Datagram Sometimes called connectionless model Analogy: postal system No connection setup phase no round trip delay waiting for connection setup a host can send data as soon as it is ready

Each packet is forwarded independently of previous packets that might have been sent to the same destination two successive packets from host A to host B may follow completely different paths (perhaps because of a change in the forwarding table at some switch in the network)

A switch or link failure might not have any serious effect on communication if it is possible to find an alternate route around the failure and to update the forwarding table accordingly Since every packet must carry the full address of the destination, the overhead per packet is higher than for the connection-oriented model

Source host has no way of knowing if the network is capable of delivering a packet or if the destination host is even up and running Each switch maintains a forwarding (routing) table

Example the hosts have addresses A, B, C, and so on a switch consults a forwarding table (routing table) to decide how to forward a packet

Datagram forwarding: an example network

The table shows the forwarding information that switch 2 needs to forward datagrams

Destination | Port
A | 3
B |
C |
D |
E | 2
F | 1
G |
H |
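The lookup a datagram switch performs can be sketched as a simple table lookup. This is a minimal illustration rather than the textbook's code: it uses only the three entries of switch 2's table that are legible in this transcript (A, E, and F), and the name `forward` is hypothetical.

```python
# Forwarding table for switch 2: destination address -> output port.
# Only the entries legible in the transcript are included.
FORWARDING_TABLE = {"A": 3, "E": 2, "F": 1}

def forward(destination, table=FORWARDING_TABLE):
    """Return the output port for a datagram, or None if no entry exists."""
    return table.get(destination)

print(forward("A"))  # 3
print(forward("F"))  # 1
```

Because every packet carries the full destination address, the switch needs no per-connection state: the same lookup is repeated independently for each packet.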

3.1.2 Virtual Circuit Switching Sometimes called connection-oriented model Analogy: phone call Explicit connection setup (and tear-down) phase it requires that a virtual connection from the source host to the destination host is set up before any data is sent Typically wait full RTT (Round Trip Time) for connection setup before sending first data packet

If a switch or a link in a connection fails the connection is broken and a new one needs to be established Subsequent packets follow the same circuit Each switch maintains a Virtual Circuit (VC) table

Entry in the VC table on a single switch contains a virtual circuit identifier (VCI) uniquely identifies the connection at this switch which will be carried inside the header of the packets that belong to this connection

Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
2 | 5 | 1 | 11

Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
3 | 11 | 2 | 7

Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
 | 7 | 1 | 4

an incoming interface on which packets for this VC arrive at the switch an outgoing interface in which packets for this VC leave the switch a potentially different VCI that will be used for outgoing packets

Two classes of approaches to establish connection state Permanent Virtual Circuit (PVC) Switched Virtual Circuit (SVC)

Permanent Virtual Circuit (PVC) administrator configures the state, in which case the virtual circuit is “permanent” administrator can also delete the state, so a permanent virtual circuit (PVC) might be thought of as a long-lived, or administratively configured VC

Switched Virtual Circuit (SVC) a host may set up and delete a VC by sending messages without the involvement of a network administrator this is referred to as signaling, and the resulting virtual circuits are said to be switched an SVC should more accurately be called a “signaled” VC, since it uses signaling (not switching) to distinguish an SVC from a PVC

Example assume that a network administrator wants to manually create a new virtual connection from host A to host B two-stage process connection setup data transfer

An example of a virtual circuit network

The administrator picks a VCI value that is currently unused on each link for the connection suppose VCI = 5, the link from host A to switch 1 VCI = 11, the link from switch 1 to switch 2 VCI = 7, the link from switch 2 to switch 3 VCI = 4, the link from switch 3 to host B

VC table entry at switch 1
Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
2 | 5 | 1 | 11

VC table entry at switch 2
Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
3 | 11 | 2 | 7

VC table entry at switch 3
Incoming Interface | Incoming VCI | Outgoing Interface | Outgoing VCI
 | 7 | 1 | 4

A packet is sent into a virtual circuit network

A packet makes its way through a virtual circuit network
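The VCI rewriting that happens at each hop can be sketched as follows. This is an illustrative sketch using the VCI and interface values from the example above; the incoming interface at switch 3 is missing from the transcript, so the value 0 used here is an assumption, and the names `VC_TABLES`, `switch_packet`, and `trace` are hypothetical.

```python
# VC tables keyed by (incoming interface, incoming VCI) ->
# (outgoing interface, outgoing VCI), using the values from the example.
VC_TABLES = {
    "switch1": {(2, 5): (1, 11)},
    "switch2": {(3, 11): (2, 7)},
    # Incoming interface 0 at switch 3 is an assumption; the transcript omits it.
    "switch3": {(0, 7): (1, 4)},
}

def switch_packet(switch, in_interface, in_vci):
    """Look up the VC entry and return (out_interface, out_vci)."""
    return VC_TABLES[switch][(in_interface, in_vci)]

def trace(path, first_vci):
    """Follow a packet along `path`, a list of (switch, incoming interface)."""
    vci = first_vci
    seen = [vci]
    for switch, in_interface in path:
        _, vci = switch_packet(switch, in_interface, vci)
        seen.append(vci)
    return seen

vcis = trace([("switch1", 2), ("switch2", 3), ("switch3", 0)], first_vci=5)
print(vcis)  # [5, 11, 7, 4]
```

The printed list matches the per-link VCIs chosen by the administrator: 5, 11, 7, and 4. Each VCI only has to be unique on its own link, which is why it can change at every switch.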

Hop-by-hop flow control each node is ensured of having the buffers it needs to queue the packets that arrive on that circuit example: an X.25 network, a packet-switched network that uses the connection-oriented model

X.25 network employs the following three-part strategy buffers are allocated to each virtual circuit when the circuit is initialized the sliding window protocol is run between each pair of nodes along the virtual circuit, and this protocol is augmented with flow control to keep the sending node from overrunning the buffers allocated at the receiving node

the circuit is rejected by a given node if not enough buffers are available at that node when the connection request message is processed

Examples of virtual circuit technologies Asynchronous Transfer Mode (ATM) Frame Relay, e.g., Virtual Private Network (VPN) Frame Relay operates only at the physical and data link layers

ATM Cell Formats Two different cell formats User-Network Interface (UNI) format host-to-switch format interface between a telephone company and one of its customers Network-Network Interface (NNI) format switch-to-switch format interface between a pair of telephone companies

Architecture of an ATM network

User-Network Interface (UNI) GFC (4 bits): Generic Flow Control VPI (8 bits): Virtual Path Identifier VCI (16 bits): Virtual Circuit Identifier Type (3 bits): management, congestion control, AAL5 CLP (1 bit): Cell Loss Priority HEC (8 bits): Header Error Check (CRC-8) Network-Network Interface (NNI) GFC becomes part of VPI field (no GFC and becomes 12-bit VPI)

ATM cell format at the UNI

ATM Headers

ATM Virtual Path ATM uses a 24-bit identifier for virtual circuits 8-bit virtual path identifier (VPI) 16-bit virtual circuit identifier (VCI)

Example: suppose a corporation has two sites that connect to a public ATM network, and that at each site the corporation has a network of ATM switches. We could establish a virtual path between the two sites, with the public network switching on the VPI field only. Within the corporate sites, however, the full 24-bit space is used for switching.

Example of a virtual path

Advantage of virtual path although there may be thousands or millions of virtual connections across the public network, the switches in the public network behave as if there is only one connection there needs to be much less connection-state information stored in the switches, avoiding the need for big, expensive tables of per-VCI information
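The split of the 24-bit identifier, and the reason a public switch needs so little state, can be sketched as below. This is a hedged illustration: placing the VPI in the high-order bits follows the field order of the cell header, while the table contents and the names `pack_id`, `unpack_id`, `PUBLIC_SWITCH`, and `public_forward` are hypothetical.

```python
def pack_id(vpi, vci):
    """Combine an 8-bit VPI and a 16-bit VCI into one 24-bit identifier."""
    if not 0 <= vpi < 2**8 or not 0 <= vci < 2**16:
        raise ValueError("VPI must fit in 8 bits and VCI in 16 bits")
    return (vpi << 16) | vci

def unpack_id(identifier):
    """Split a 24-bit identifier back into (VPI, VCI)."""
    return identifier >> 16, identifier & 0xFFFF

# Hypothetical public-network switch table: it is keyed on the VPI alone,
# so all virtual circuits inside one virtual path share a single entry.
PUBLIC_SWITCH = {7: "port to site B"}

def public_forward(identifier):
    """Forward on the VPI only, ignoring the VCI."""
    vpi, _vci = unpack_id(identifier)
    return PUBLIC_SWITCH[vpi]

a = pack_id(7, 100)
b = pack_id(7, 65535)
print(public_forward(a) == public_forward(b))  # True
```

Both circuits take the same path through the public switch even though their VCIs differ, so the switch stores one entry per virtual path rather than one per circuit.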

TP, VPs, and VCs

Example of VPs and VCs

Connection Identifiers

Virtual Connection Identifiers in UNIs and NNIs

ATM Cell

Routing with a Switch

3.1.3 Source Routing Neither virtual circuits nor conventional datagrams All the information about network topology that is required to switch a packet across the network is provided by the source host

Various ways to implement source routing method 1 put an ordered list of switch ports in the header and rotate the list so that the next switch in the path is always at the front of the list for each packet that arrives on an input, the switch reads the port number in the header and transmits the packet on that output

Source routing in a switched network (where the switch reads the rightmost number)

method 2 rather than rotate the header, each switch just strips the first element as it uses it method 3 have the header carry a pointer to the current “next port” entry, so that each switch just updates the pointer rather than rotating the header

Three ways to handle headers for source routing: (a) rotation, (b) stripping, and (c) pointer. The labels are read right to left
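The three header-handling methods can be compared side by side in a short sketch. It assumes the header is a Python list whose last element is the rightmost label (the one the switch reads); the function names and the sample header are hypothetical.

```python
def rotate(header):
    """(a) Rotation: use the rightmost port, then move it to the front."""
    port = header[-1]
    return port, [port] + header[:-1]

def strip(header):
    """(b) Stripping: use the rightmost port and drop it from the header."""
    return header[-1], header[:-1]

def advance(header, pointer):
    """(c) Pointer: use the entry the pointer marks, then move the pointer left."""
    return header[pointer], header, pointer - 1

header = ["D", "C", "B", "A"]      # read right to left, so "A" is used first
print(rotate(header))              # ('A', ['A', 'D', 'C', 'B'])
print(strip(header))               # ('A', ['D', 'C', 'B'])
print(advance(header, 3))          # ('A', ['D', 'C', 'B', 'A'], 2)
```

Rotation and the pointer both preserve the full route, which the destination could use to build a return path; stripping shortens the header at each hop but discards that information.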

3.1.4 Bridges and LAN Switches LANs have physical limitations (e.g., 2500m) Bridge connect two or more LANs Extended LAN a collection of LANs connected by one or more bridges accept and forward strategy (accept all frames transmitted on either of the Ethernets, so it could forward them to the other)

Learning Bridges Do not forward when unnecessary whenever a frame from host A that is addressed to host B arrives on port 1, there is no need for the bridge to forward the frame out over port 2

Illustration of a learning bridge

How does a bridge come to learn on which port the various hosts reside? each bridge inspects the source address in all the frames it receives when host A sends a frame to a host on either side of the bridge, the bridge receives this frame and records the fact that a frame from host A was just received on port 1 in this way, the bridge can build a table just like the following table

Host | Port
A | 1
B | 1
C | 1
X | 2
Y | 2
Z | 2
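The learning process can be sketched as a small class. This is a minimal sketch for a two-port bridge, not a full implementation: it assumes that a frame for a destination not yet in the table is forwarded on every other port, and the class and method names are hypothetical.

```python
class LearningBridge:
    def __init__(self):
        self.table = {}  # host address -> port on which it was last seen

    def receive(self, src, dst, in_port, ports=(1, 2)):
        """Learn the source's port; return the list of ports to forward on."""
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: forward on every port except the arrival port.
            return [p for p in ports if p != in_port]
        if out_port == in_port:
            return []          # destination is on the same side; do not forward
        return [out_port]

bridge = LearningBridge()
print(bridge.receive("A", "B", in_port=1))  # [2]  (B not yet learned)
print(bridge.receive("B", "A", in_port=1))  # []   (A known to be on port 1)
print(bridge.receive("X", "A", in_port=2))  # [1]
```

The table is purely an optimization: a bridge with an empty table still delivers every frame, it just forwards more of them than necessary.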

Spanning Tree Algorithm Problem: extended LAN has a loop in it frames potentially loop through the extended LAN forever example bridges B1, B4, and B6 form a loop

Extended LAN with loops

Solution: bridges run a distributed spanning tree algorithm spanning tree is a subgraph of a graph that covers (spans) all the vertices, but contains no cycles

Example of (a) a cyclic graph; (b) a corresponding spanning tree

Spanning tree algorithm (developed by Radia Perlman) each bridge has a unique identifier (e.g., B1, B2, B3) the algorithm first elects the bridge with the smallest ID as the root of the spanning tree the root bridge always forwards frames out over all of its ports

each bridge computes the shortest path to the root and notes which of its ports is on this path this port is selected as the bridge’s preferred path to the root

finally, all the bridges connected to a given LAN elect a single designated bridge that will be responsible for forwarding frames toward the root bridge each LAN’s designated bridge is the one that is closest to the root, and if two or more bridges are equally close to the root, then the bridge with the smallest ID wins

Spanning tree with some ports not selected

Bridges have to exchange configuration messages with each other and then decide whether or not they are the root or a designated bridge based on these messages configuration messages contain the ID for the bridge that is sending the message the ID for what the sending bridge believes to be the root bridge the distance, measured in hops, from the sending bridge to the root bridge

each bridge records the current best configuration message for each port initially, each bridge believes it is the root when a bridge learns it is not the root, it stops generating config messages in steady state, only the root generates configuration messages when a bridge learns it is not the designated bridge, it stops forwarding config messages in steady state, only designated bridges forward config messages

root continues to periodically send config messages if any bridge does not receive a config message after a period of time, it starts generating config messages claiming to be the root upon receiving a config message over a particular port the bridge checks to see if that new message is better than the current best configuration message recorded for that port

the new configuration message is considered “better” than the currently recorded information if it identifies a root with a smaller ID or it identifies a root with an equal ID but with a shorter distance or the root ID and distance are equal, but the sending bridge has a smaller ID
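The three-step "better" rule maps directly onto lexicographic tuple comparison. The sketch below assumes a message is represented as a tuple (root ID, distance, sender ID) and that bridge IDs are plain integers (B1 as 1, B2 as 2, and so on); both are illustrative choices, and `is_better` is a hypothetical name.

```python
def is_better(new, current):
    """Return True if config message `new` beats `current`.

    Each message is a tuple (root_id, distance, sender_id). Python compares
    tuples field by field, which matches the three-step rule: smaller root ID,
    then shorter distance, then smaller sender ID.
    """
    return new < current

# Bridge IDs are plain integers here (B1 -> 1, B2 -> 2, ...), an assumption
# made for illustration.
print(is_better((1, 1, 2), (2, 0, 2)))  # True: root 1 < root 2
print(is_better((1, 1, 5), (1, 2, 3)))  # True: same root, shorter distance
print(is_better((1, 1, 5), (1, 1, 2)))  # False: same root and distance, larger sender ID
```

Ordering the tuple fields by priority means no explicit chain of if-statements is needed.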

Sequence of events assume all the bridges boot at about the same time and all the bridges would start off by claiming to be the root (Y, d, X) denotes a configuration message from node X in which it claims to be distance d from root node Y

Sequence of events on the activity at node B3 B3 receives (B2, 0, B2) since 2 < 3, B3 accepts B2 as root [(B2, 1, B3)] B3 adds one to the distance advertised by B2 (0) and thus sends (B2, 1, B3) toward B5 [(B2, 1, B3), (B2, 2, B5)] meanwhile, B2 accepts B1 as root because it has the lower ID, and it sends (B1, 1, B2) toward B3 [(B1, 1, B2), (B1, 2, B3)]

B5 accepts B1 as root and sends (B1, 1, B5) toward B3 [(B1, 1, B5), (B1, 2, B3)] B3 accepts B1 as root, and it notes that both B2 and B5 are closer to the root than it is [(B1, 2, B3), (B1, 1, B2), (B1, 1, B5)] B3 stops forwarding messages on both its interfaces (this leaves B3 with both ports not selected) [(B1, 1, B2), (B1, 1, B5)]

Spanning tree with some ports not selected

Broadcast and Multicast Since most LANs support both broadcast and multicast, bridges must also support these two features Broadcast each bridge forwards a frame with a destination broadcast address out on each active (selected) port other than the one on which the frame was received Multicast implemented in exactly the same way, with each host deciding for itself whether or not to accept the message

Limitations of Bridges Do not scale Do not accommodate heterogeneity

Do not Scale It is not realistic to connect more than a few (tens of) LANs by means of bridges the spanning tree algorithm scales linearly, i.e., there is no provision for imposing a hierarchy on the extended LAN bridges forward all broadcast frames and broadcast does not scale

Virtual LAN (VLAN) used to increase the scalability of extended LANs allows a single extended LAN to be partitioned into several seemingly separate LANs each virtual LAN is assigned an identifier (sometimes called a color), and packets can only travel from one segment to another if both segments have the same identifier this limits the number of segments in an extended LAN that will receive any given broadcast packet

Example four hosts (W, X, Y, Z) on four different LAN segments in the absence of VLANs, any broadcast packet from any host will reach all the other hosts suppose that we define the segments connected to hosts W and X as being in one VLAN, VLAN 100 also define the segments that connect to hosts Y and Z as being in VLAN 200 to do this, we need to configure a VLAN ID on each port of bridges B1 and B2 the link between B1 and B2 is considered to be in both VLANs

Two virtual LANs share a common backbone

When a packet sent by host X arrives at bridge B2 the bridge observes that it came in a port that was configured as being in VLAN 100 it inserts a VLAN header between the Ethernet header and its payload the bridge applies normal rules for forwarding to the packet, with the extra restriction that the packet may not be sent out an interface that is not part of VLAN 100 thus, even a broadcast packet can’t be sent out the interface to host Z, which is in VLAN 200 Ethernet header VLAN header Payload
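The extra forwarding restriction can be sketched as a filter over a bridge's ports. The port names and the configuration below are hypothetical stand-ins for bridge B2 in the example; only the VLAN IDs (100 and 200) come from the slides.

```python
# Hypothetical port configuration for bridge B2: port name -> set of VLAN IDs.
PORT_VLANS = {
    "to_X": {100},
    "to_Z": {200},
    "to_B1": {100, 200},   # the inter-bridge link carries both VLANs
}

def broadcast_ports(in_port, port_vlans=PORT_VLANS):
    """Ports on which a broadcast arriving on an access port may be sent."""
    (vlan,) = port_vlans[in_port]          # access port: exactly one VLAN
    return sorted(p for p, vlans in port_vlans.items()
                  if p != in_port and vlan in vlans)

print(broadcast_ports("to_X"))  # ['to_B1']
```

A broadcast from host X reaches the link to B1 but never the port to host Z, which is the behaviour described above.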

An attractive feature of VLANs it is possible to change the logical topology without moving any wires or changing any addresses example if we want to make the segment that connects to host Z be part of VLAN 100, and thus enable X, W and Z to be on the same virtual LAN, we would just need to change one piece of configuration on bridge B2

Do not Accommodate Heterogeneity Bridges are fairly limited in the kinds of networks they can interconnect Bridges make use of the network's frame header and so can support only networks that have exactly the same format for addresses Bridges can be used to connect Ethernets to Ethernets, 802.5 (Token Ring) to 802.5, and Ethernets to 802.5 rings, since both networks support the same 48-bit address format Bridges do not readily generalize to other kinds of networks, such as ATM

3.2 Basic Internetworking (IP) 3.2.1 What is an Internetwork? 3.2.2 Service Model 3.2.3 Global Addresses 3.2.4 Datagram Forwarding in IP 3.2.5 Subnetting and Classless Addressing 3.2.6 Address Translation (ARP) 3.2.7 Host Configuration (DHCP) 3.2.8 Error Reporting (ICMP) 3.2.9 Virtual Networks and Tunnels

3.2.1 What is an Internetwork? Concatenation of networks A simple internetwork. Hn = host, Rn = router

An internetwork is a network of networks in the figure, we see Ethernets, an FDDI ring, and a point-to-point link each of these is a single-technology network the nodes that interconnect the networks are called routers (sometimes called gateways) The following figure shows how H1 and H8 are logically connected by the internet, including the protocol graph running on each node

A simple internetwork of protocol stack Protocol layers used to connect H1 to H8. ETH: the protocol that runs over Ethernet.

3.2.2 Service Model Service model for an internetwork define a host-to-host service only if this service can somehow be provided over each of the underlying physical networks IP service model has two parts addressing scheme, which provides a way to identify all hosts in the internetwork datagram (connectionless) model of data delivery This service model is sometimes called best effort although IP makes every effort to deliver datagrams, it makes no guarantees

Datagram a type of packet sent in a connectionless manner over a network every datagram carries enough information to let the network forward the packet to its correct destination no need for any advance setup mechanism to tell the network what to do when the packet arrives

Best-effort delivery (unreliable service) if something goes wrong (packets are lost, delivered out of order, duplicated, or delayed for a long time), the network does not make any attempt to recover from the failure

Datagram format

Datagram format a succession of 32-bit words the top word is transmitted first the leftmost byte of each word is transmitted first

1st word of the header Version: the version of IP the current version of IP is 4 (IPv4) HLen: the length of the header in 32-bit words most of the time, the header is 5 words (20 bytes) long

TOS: the 8-bit type of service allow packets to be treated differently based on application needs example, the TOS value might determine whether or not a packet should be placed in a special queue that receives low delay

Length: 16 bits of the header contain the length of the datagram, including the header the field counts bytes rather than words the maximum size of an IP datagram is 65,535 bytes the physical network over which IP is running may not support such long packets IP supports a fragmentation and reassembly process

2nd word of the header contains information about fragmentation (the Ident, Flags, and Offset fields) Offset: a 13-bit field that counts 8-byte chunks, not bytes the distance (number of chunks) between the start of the original data and the start of the current fragment

3rd word of the header TTL: one-byte time to live originally intended as a number of seconds that the packet would be allowed to live; in practice it is a hop count that each router along the path decrements, and the packet is discarded when the field reaches 0 Protocol: one-byte demultiplexing key identifies the higher-level protocol to which this IP packet should be passed values defined for TCP (6), UDP (17)

Checksum: calculated by considering the entire IP header as a sequence of 16-bit words adding them up using ones complement arithmetic, and taking the ones complement of the result
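The checksum calculation described above can be sketched in a few lines. The function name `ip_checksum` and the sample 20-byte header are illustrative assumptions; the header bytes were chosen only to exercise the arithmetic, with the checksum field (bytes 10 and 11) set to zero before computing.

```python
def ip_checksum(header: bytes) -> int:
    """Ones-complement sum of 16-bit words, then ones complement of the result."""
    if len(header) % 2:
        header += b"\x00"                          # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# Sample header with the checksum field zeroed (illustrative values).
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ip_checksum(sample)
print(hex(checksum))  # 0xb861

# A receiver runs the same sum over the header including the checksum;
# a result of 0 means no error was detected.
filled = sample[:10] + checksum.to_bytes(2, "big") + sample[12:]
print(ip_checksum(filled))  # 0
```

Folding the carry after every addition keeps the running total within 16 bits, which is what ones complement arithmetic requires.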

the fourth word of the header: SourceAddr the fifth word of the header: DestinationAddr there may be a number of options at the end of the header the presence or absence of options may be determined by examining the header length (HLen) field

Fragmentation and Reassembly Each network technology tends to have its own idea of how large a packet can be, example, Ethernet can accept packets up to 1,500 bytes long FDDI packets may be 4,500 bytes long Every network type has a maximum transmission unit (MTU) the largest IP datagram that it can carry in a frame this value is smaller than the largest packet size on that network because the IP datagram needs to fit in the payload of the link-layer frame

Fragmentation typically occurs when necessary (MTU < Datagram) to enable these fragments to be reassembled at the receiving host, they all carry the same identifier in the Ident field this identifier is chosen by the sending host and is intended to be unique among all the datagrams that might arrive at the destination from this source over some reasonable time period

since all fragments of the original datagram contain this identifier, the reassembling host will be able to recognize those fragments that go together should all the fragments not arrive at the receiving host, the host gives up on the reassembly process and discards the fragments that did arrive IP does not attempt to recover from missing fragments

example consider what happens when host H1 sends a datagram to host H8 assuming that the MTU is 1,500 bytes for the two Ethernets, 4,500 bytes for the FDDI network, and 532 bytes for the point-to-point network a 1,420-byte datagram (20-byte IP header plus 1,400 bytes of data) sent from H1 makes it across the first Ethernet and the FDDI network without fragmentation but must be fragmented into three datagrams at router R2 these three fragments are then forwarded by router R3 across the second Ethernet to the destination host

IP datagrams traversing the sequence of physical networks

each fragment is itself a self-contained IP datagram that is transmitted over a sequence of physical networks, independent of the other fragments each IP datagram is reencapsulated for each physical network over which it travels

(a) (b) Header fields used in IP fragmentation: (a) unfragmented packet; (b) fragmented packets.

The unfragmented packet has 1,400 bytes of data and a 20-byte IP header when the packet arrives at router R2, which has an MTU of 532 bytes, it has to be fragmented a 532-byte MTU leaves 512 bytes for data after the 20-byte IP header, so the first fragment contains 512 bytes of data the router sets the M bit in the Flags field, meaning that there are more fragments to follow it sets the Offset to 0, since this fragment contains the first part of the original datagram

the data carried in the second fragment starts with the 513th byte of the original data, so the Offset field in this header is set to 64 (= 512/8) the third fragment contains the last 376 bytes of data, and the offset is now 2 × 512 / 8 = 128 (since this is the last fragment, the M bit is not set)
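The arithmetic of this example can be checked with a short sketch. It assumes a fixed 20-byte header with no options, and the function name `fragment` and its return format are hypothetical.

```python
def fragment(data_len, mtu, header_len=20):
    """Split `data_len` bytes of payload into fragments that fit `mtu`.

    Returns a list of (offset_in_8_byte_units, data_bytes, more_flag).
    """
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < data_len:
        size = min(max_data, data_len - sent)
        more = sent + size < data_len      # M bit: more fragments follow
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments

print(fragment(1400, 532))
# [(0, 512, True), (64, 512, True), (128, 376, False)]
```

The output reproduces the three fragments above: offsets 0, 64, and 128, with the M bit set on all but the last.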

3.2.3 Global Addresses

Ethernet addresses are globally unique, but that alone does not suffice for an addressing scheme in a large internetwork. Ethernet addresses are also flat: they have no structure and provide very few clues to routing protocols.

IP addresses are hierarchical: they are made up of two parts that correspond to some sort of hierarchy in the internetwork. The network part identifies the network to which the host is attached; all hosts attached to the same network have the same network part. The host part identifies each host uniquely on that particular network.

Example 1: the addresses of the hosts on network 1 all have the same network part and different host parts. Example 2: the routers are attached to two networks, so they need an address on each network, one for each interface. Router R1, for instance, has an IP address on the interface to network 2 with the same network part as the hosts on network 2, and an IP address on the interface to network 3 with the same network part as the hosts on network 3. IP addresses therefore belong to interfaces rather than to hosts.

IP addresses are divided into three main classes, each of which defines different-sized network and host parts, as shown in the following figure. There are also class D addresses, which specify a multicast group, and class E addresses, which are currently unused. In all cases, the address is 32 bits long.

IP addresses: (a) class A; (b) class B; (c) class C

Class   Leading bits   Network part   Host part
A       0              7 bits         24 bits
B       10             14 bits        16 bits
C       110            21 bits        8 bits

The class of an IP address is identified by its most significant few bits. If the first bit is 0, it is a class A address. If the first bit is 1 and the second is 0, it is a class B address. If the first two bits are 1 and the third is 0, it is a class C address. Of the approximately 4 billion (= 2^32) possible IP addresses, one-half are class A, one-quarter are class B, and one-eighth are class C.

Class A addresses: 7 bits for the network part and 24 bits for the host part; 126 (= 2^7 - 2) class A networks (0 and 127 are reserved); each network can accommodate up to 2^24 - 2 (about 16 million) hosts (again, two values are reserved). Class B addresses: 14 bits for the network part and 16 bits for the host part; up to 65,534 (= 2^16 - 2) hosts per network.

Class C addresses: 21 bits for the network part and 8 bits for the host part; 2,097,152 (= 2^21) class C networks, each with up to 254 hosts (host identifier 255 is reserved for broadcast, and 0 is not a valid host number).
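As a small illustration of the leading-bit rule, the following sketch classifies a dotted-decimal address by examining its first byte. The function name `address_class` is an assumption made for this example, and the class D test (leading bits 1110) is the standard multicast range rather than something spelled out on the slides.

```python
def address_class(dotted):
    """Classify an IPv4 address by the leading bits of its first byte."""
    first = int(dotted.split(".")[0])
    if first & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0
    if first & 0b11000000 == 0b10000000:
        return "B"  # leading bits 10
    if first & 0b11100000 == 0b11000000:
        return "C"  # leading bits 110
    if first & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1110 (multicast)
    return "E"      # remaining addresses


for addr in ("10.0.0.1", "128.96.34.15", "192.4.16.1"):
    print(addr, "is class", address_class(addr))
```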

IP addresses are written as four decimal integers separated by dots. Each integer represents the decimal value contained in 1 byte (0 to 255) of the address, starting at the most significant byte. Example: 171.69.210.245. Internet domain names (DNS) are also hierarchical; domain names tend to be ASCII strings separated by dots, e.g., cs.nccu.edu.tw.

3.2.4 Datagram Forwarding in IP

Forwarding: the process of taking a packet from an input and sending it out on the appropriate output. Routing: the process of building up the tables that allow the correct output for a packet to be determined.

Strategy: every datagram contains the destination's address. If a node is directly connected to the destination network, it forwards the datagram to the host; if it is not directly connected, it forwards the datagram to some router. The forwarding table maps a network number (NetworkNum) into a next hop (NextHop). Each host has a default router, and each router maintains a forwarding table.

Datagram forwarding algorithm:

    if (NetworkNum of destination = NetworkNum of one of my interfaces) then
        deliver packet to destination over that interface
    else if (NetworkNum of destination is in my forwarding table) then
        deliver packet to NextHop router
    else
        deliver packet to default router

Simplified algorithm (for a host with one interface and only a default router):

    if (NetworkNum of destination = my NetworkNum) then
        deliver packet to destination directly
    else
        deliver packet to default router

Example 1: suppose H1 wants to send a datagram to H2. Since they are on the same physical network, H1 and H2 have the same network number in their IP addresses, so H1 deduces that it can deliver the datagram directly to H2 over the Ethernet. The one issue that remains to be resolved is how H1 finds out the correct Ethernet address for H2.

Example 2: suppose H1 wants to send a datagram to H8. Since they are on different physical networks, H1 deduces that it needs to send the datagram to a router, so H1 sends the datagram over the Ethernet to R1. R1 knows that it cannot deliver the datagram directly to H8, because neither of R1's interfaces is on the same network as H8.

Suppose R1's default router is R2; R1 then sends the datagram to R2 over the token ring network. Assuming R2 has the forwarding table shown below, it looks up H8's network number (network 1) and forwards the datagram to R3.

Forwarding table for router R2

Network Number   Next Hop
1                R3
2                R1
3                Interface 1
4                Interface 0

R3 then forwards the datagram directly to H8. As the table shows, it is possible to include information about directly connected networks in the forwarding table. For example, we could label the network interfaces of router R2 as interface 0 for the point-to-point link (network 4) and interface 1 for the token ring (network 3).
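The table lookup that R2 performs can be sketched in a few lines. This is only a minimal illustration that treats network numbers as small integers, as the slides do; the names `R2_TABLE` and `next_hop` are assumptions for this example.

```python
# Forwarding table for router R2: NetworkNum -> NextHop
R2_TABLE = {
    1: "R3",
    2: "R1",
    3: "Interface 1",  # token ring, directly connected
    4: "Interface 0",  # point-to-point link, directly connected
}


def next_hop(table, network_num, default_router=None):
    """Return the next hop for a network number, or the default router."""
    return table.get(network_num, default_router)


# H8 is on network 1, so R2 hands the datagram to R3.
print(next_hop(R2_TABLE, 1))
```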

3.2.5 Subnetting and Classless Addressing

Subnetting deals with address space utilization. The original intent of IP addresses was that the network part would uniquely identify exactly one physical network. The problem with this form of address assignment is inefficiency: a class C network with 2 hosts uses 2/255 = 0.78% of its addresses, and a class B network with 256 hosts uses 256/65535 = 0.39%.

Subnetting adds another level to the address and routing hierarchy and reduces the total number of network numbers that are assigned. The idea is to take a single IP network number and allocate the IP addresses with that network number to several physical networks. An ideal use of subnetting is a large campus or corporation that has many physical networks.

Subnet mask: defines a variable partition of the host part, so that a single network number can be shared among multiple networks. This involves configuring all the nodes on each subnet with a subnet mask.

The subnet mask introduces a subnet number: hosts may be on different physical networks but share a single network number. For example, to share a single class B address among several physical networks, we could use a subnet mask of 255.255.255.0 (all 1s in the upper 24 bits and 0s in the lower 8 bits). The top 24 bits are then treated as the network number and the lower 8 bits as the host number. Since the top 16 bits identify the network in a class B address, the remaining 8 bits of the 24 identify the subnet.

The address therefore has three parts: network part (16 bits), subnet part (8 bits), and host part (8 bits).

Subnetted Address

Subnet Example (figure)

Subnet number 128.96.34.0, subnet mask 255.255.255.128: H1 (128.96.34.15), R1 (128.96.34.1)
Subnet number 128.96.34.128, subnet mask 255.255.255.128: R1 (128.96.34.130), H2 (128.96.34.139), R2 (128.96.34.129)
Subnet number 128.96.33.0, subnet mask 255.255.255.0: R2 (128.96.33.1), H3 (128.96.33.14)

There is exactly one subnet mask per subnet. For H1: IP address 128.96.34.15, subnet mask 255.255.255.128, subnet number 128.96.34.0. The mask defines the subnet number of the host and of all other hosts on the same subnet: take the bitwise AND of the IP address and the subnet mask. For example, 128.96.34.15 AND 255.255.255.128 equals 128.96.34.0.

When a host wants to send a packet to a certain IP address, it performs a bitwise AND of its own subnet mask and the destination IP address. If the result equals the subnet number of the sending host, the destination host is on the same subnet and the packet can be delivered directly over the subnet.

If the results are not equal, the packet needs to be sent to a router to be forwarded to another subnet. For example, if H1 is sending to H2, then H1 ANDs its subnet mask (255.255.255.128) with the address for H2 (128.96.34.139) to obtain 128.96.34.128. This does not match the subnet number for H1 (128.96.34.0), so H1 and H2 are on different subnets. H1 has to send the packet to its default router, R1, which then forwards it to H2.
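The host-side check described above amounts to two bitwise ANDs and a comparison. Here is a minimal sketch using Python's standard `ipaddress` module; the helper names `subnet_number` and `same_subnet` are assumptions for illustration.

```python
import ipaddress


def subnet_number(addr, mask):
    """Bitwise AND of an IPv4 address and a subnet mask, as dotted decimal."""
    value = int(ipaddress.IPv4Address(addr)) & int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(value))


def same_subnet(own_addr, own_mask, dest_addr):
    """True if the destination can be reached directly on the sender's subnet."""
    return subnet_number(own_addr, own_mask) == subnet_number(dest_addr, own_mask)


H1, H1_MASK, H2 = "128.96.34.15", "255.255.255.128", "128.96.34.139"
print(subnet_number(H1, H1_MASK))  # H1's own subnet number
print(subnet_number(H2, H1_MASK))  # H2's address ANDed with H1's mask
print(same_subnet(H1, H1_MASK, H2))
```

Note that the sender applies its own mask to the destination address; it does not need to know the destination's mask.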

Routers with and without subnetting: in simple IP, entries of the forwarding table are of the form (NetworkNum, NextHop). To support subnetting, entries of the forwarding table are of the form (SubnetNumber, SubnetMask, NextHop).

To find the right entry in the table, the router ANDs the packet's destination address with the SubnetMask for each entry in turn. If the result matches the SubnetNumber of the entry, then this is the right entry to use, and the router forwards the packet to the next hop indicated. Router R1 of the subnet example would have the following entries.

Continuing with the example, consider a datagram from H1 being sent to H2. R1 would AND H2's address (128.96.34.139) with the subnet mask of the first entry (255.255.255.128) and compare the result (128.96.34.128) with the subnet number for that entry (128.96.34.0). Since this is not a match, it proceeds to the next entry. This time a match does occur, so R1 delivers the datagram to H2 using interface 1, which is the interface connected to the same network as H2.

Datagram Forwarding Algorithm

    D = destination IP address
    for each entry (SubnetNum, SubnetMask, NextHop)
        D1 = SubnetMask & D
        if D1 = SubnetNum
            if NextHop is an interface
                deliver datagram directly to D
            else
                deliver datagram to NextHop (router)
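The algorithm above can be sketched as follows. The first two entries of `R1_TABLE` follow the worked example in the text. The third entry (reaching subnet 128.96.33.0 via R2) is an assumption based on the addresses in the subnet example figure, and the function name `forward` is illustrative.

```python
import ipaddress


def to_int(dotted):
    """Convert a dotted-decimal IPv4 address or mask to an integer."""
    return int(ipaddress.IPv4Address(dotted))


# (SubnetNum, SubnetMask, NextHop) entries for router R1
R1_TABLE = [
    ("128.96.34.0", "255.255.255.128", "Interface 0"),
    ("128.96.34.128", "255.255.255.128", "Interface 1"),
    ("128.96.33.0", "255.255.255.0", "R2"),  # assumed from the figure
]


def forward(table, dest, default=None):
    """Return the NextHop of the first entry whose masked address matches."""
    d = to_int(dest)
    for subnet_num, subnet_mask, hop in table:
        if (d & to_int(subnet_mask)) == to_int(subnet_num):
            return hop
    return default


print(forward(R1_TABLE, "128.96.34.139"))  # datagram for H2
print(forward(R1_TABLE, "128.96.33.14"))   # datagram for H3
```

This sketch returns the first matching entry, which is sufficient here because the subnets in the table do not overlap.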

Classless Routing (CIDR)

Classless InterDomain Routing (CIDR, pronounced "cider") addresses two scaling concerns in the Internet: the growth of backbone routing tables as more and more network numbers need to be stored, and the potential for the 32-bit IP address space to be exhausted well before the 4 billionth (= 2^32) host is attached to the Internet. CIDR assigns blocks of contiguous network numbers to nearby networks.

CIDR tries to balance the desire to minimize the number of routes that a router needs to know against the need to hand out addresses efficiently. CIDR helps to aggregate routes: it uses a single entry in a forwarding table to reach many different networks, by breaking the rigid boundaries between address classes.

Example: consider a hypothetical AS (Autonomous System) with 16 class C network numbers. Instead of handing out 16 addresses at random, we can hand out a block of contiguous class C addresses. Suppose we assign the class C network numbers from 192.4.16 through 192.4.31. The top 20 bits of all the addresses in this range are the same (11000000 00000100 0001).

What we have effectively created is a 20-bit network number, something that is between a class B network number and a class C number in length.


CIDR allows prefixes (network numbers) to be of any length. By convention, a /X is placed after the prefix, where X is the prefix length in bits. In the example above, the 20-bit prefix for all the networks 192.4.16 through 192.4.31 is represented as 192.4.16/20. If we want to represent a single class C network number, whose prefix is 24 bits long, we would write it as 192.4.16/24.
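To make the aggregation concrete, the sketch below uses `collapse_addresses` from Python's standard `ipaddress` module to merge the sixteen contiguous /24 networks from the example. The variable names are illustrative; note that the module prints prefixes with all four bytes, so 192.4.16/20 appears as 192.4.16.0/20.

```python
import ipaddress

# The sixteen class C networks 192.4.16 through 192.4.31, as /24 prefixes
networks = [ipaddress.ip_network(f"192.4.{i}.0/24") for i in range(16, 32)]

# Merge contiguous networks into the smallest covering set of prefixes
aggregated = list(ipaddress.collapse_addresses(networks))
print(aggregated)

# The shared top 20 bits of every address in the block
first = int(ipaddress.ip_address("192.4.16.0"))
top20 = format(first, "032b")[:20]
print(top20)
```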

For a routing protocol to use CIDR and deal with "classless" addresses, it must understand that a network number may be of any length. Network numbers are represented by (length, value) pairs, where the length gives the number of bits in the network prefix, e.g., 20 in the above example.

An Internet Service Provider (ISP) network has to provide Internet connectivity to a large number of corporations and campuses (customers). If we assign prefixes to the customers in such a way that many different customer networks connected to the provider network share a common, shorter address prefix, then we can get even greater aggregation of routes.

Example: assume that eight customers served by the provider network have each been assigned adjacent 24-bit network prefixes. Those prefixes all start with the same 21 bits. Since all of the customers are reachable through the same provider network, it can advertise a single route to all of them by just advertising the common 21-bit prefix they share.

Figure: Route aggregation with CIDR

IP Forwarding Revisited

CIDR means that prefixes may be of any length, from 2 to 32 bits. It is therefore possible to have prefixes in the forwarding table that "overlap," in the sense that some addresses may match more than one prefix. Example 1: we might find both 171.69 (a 16-bit prefix) and 171.69.10 (a 24-bit prefix) in the forwarding table of a single router. A packet destined to, say, 171.69.10.5 clearly matches both prefixes; 171.69.10 is the longest match in this case, and the packet is forwarded according to that entry.

Example 2: a packet destined to 171.69.20.5 would match 171.69 and not 171.69.10. In the absence of any other matching entry in the routing table, 171.69 would be the longest match.
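The longest-match rule in the two examples can be sketched with a linear scan over the prefixes. Real routers use specialized data structures for this lookup; the list-based `longest_match` below is only an illustrative assumption.

```python
import ipaddress

# The two overlapping prefixes from the examples, in full CIDR notation
PREFIXES = ["171.69.0.0/16", "171.69.10.0/24"]


def longest_match(prefixes, dest):
    """Return the matching prefix with the greatest length, or None."""
    addr = ipaddress.ip_address(dest)
    candidates = [ipaddress.ip_network(p) for p in prefixes]
    matches = [net for net in candidates if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


print(longest_match(PREFIXES, "171.69.10.5"))  # matches both prefixes
print(longest_match(PREFIXES, "171.69.20.5"))  # matches only the 16-bit prefix
```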

3.2.6 Address Translation (ARP)

Issue: IP datagrams contain IP addresses, but the physical interface hardware on the host or router to which you want to send the datagram only understands the addressing scheme of that particular network.

Resolution: translate the IP address to a link-level address that makes sense on this network (e.g., a 48-bit Ethernet address), encapsulate the IP datagram inside a frame that contains that link-level address, and send it either to the ultimate destination or to a router that promises to forward the datagram toward the ultimate destination.

Figure: Encapsulation (a frame carrying the link-level address and the IP datagram)

Simple way to map an IP address into a physical network address: encode a host's physical address in the host part of its IP address (network part | host part = physical address). For example, a host with physical address 00100001 01001001 (the decimal value 33 in the upper byte and 73 in the lower byte) might be given the IP address 128.96.33.73. This approach is limited in that the network's physical addresses can be no more than 16 bits long in this example.

Alternative solution: Address Resolution Protocol (ARP). A more general solution is for each host to maintain a table of address pairs that map IP addresses into physical addresses. ARP enables each host on a network to build up a table of mappings between IP addresses and link-level addresses. Since these mappings may change over time (e.g., because an Ethernet card in a host breaks and is replaced by a new one with a new address), the entries are timed out periodically and removed.

This happens on the order of every 15 minutes. The set of mappings currently stored in a host is known as the ARP cache or ARP table.

The ARP packet contains:
HardwareType: the type of physical network (e.g., Ethernet)
ProtocolType: the higher-layer protocol (e.g., IP)
HLen ("hardware" address length) and PLen ("protocol" address length): the lengths of the link-layer address and the higher-layer protocol address

Operation: specifies whether this is a request or a response
Addresses: source hardware (Ethernet) address (6 bytes), source protocol (IP) address (4 bytes), target hardware (Ethernet) address (6 bytes), target protocol (IP) address (4 bytes)

ARP Packet Format (each row is one 32-bit word; bit positions 0, 8, 16, 31)

Hardware type = 1                | ProtocolType = 0x0800
HLen = 48       | PLen = 32      | Operation
SourceHardwareAddr (bytes 0-3)
SourceHardwareAddr (bytes 4-5)   | SourceProtocolAddr (bytes 0-1)
SourceProtocolAddr (bytes 2-3)   | TargetHardwareAddr (bytes 0-1)
TargetHardwareAddr (bytes 2-5)
TargetProtocolAddr (bytes 0-3)
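The layout above can be packed with Python's standard `struct` module. This is a minimal sketch, not a complete ARP implementation: the function name and the sample addresses are assumptions. One detail differs from the figure: the figure labels the lengths in bits (48 and 32), whereas the on-the-wire HLen and PLen fields hold byte counts (6 and 4), which is what the sketch writes. Operation code 1 denotes a request.

```python
import socket
import struct

ARP_REQUEST = 1  # Operation value for a request (2 is a reply)


def build_arp_request(src_mac, src_ip, target_ip):
    """Pack a 28-byte ARP request for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # HardwareType: Ethernet
        0x0800,                       # ProtocolType: IP
        6,                            # HLen in bytes (48 bits)
        4,                            # PLen in bytes (32 bits)
        ARP_REQUEST,                  # Operation
        src_mac,                      # SourceHardwareAddr
        socket.inet_aton(src_ip),     # SourceProtocolAddr
        b"\x00" * 6,                  # TargetHardwareAddr: unknown in a request
        socket.inet_aton(target_ip),  # TargetProtocolAddr
    )


# Hypothetical sender MAC address; IP addresses taken from the subnet example
packet = build_arp_request(bytes.fromhex("08002be40b01"), "128.96.34.15", "128.96.34.1")
print(len(packet), packet.hex())
```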

3.2.7 Host Configuration (DHCP)

The Dynamic Host Configuration Protocol (DHCP) relies on the existence of a DHCP server that is responsible for providing configuration information to hosts. There is at least one DHCP server for an administrative domain. At the simplest level, the DHCP server can function just as a centralized repository for host configuration information.

A more sophisticated use of DHCP saves the network administrator from even having to assign addresses to individual hosts. The DHCP server maintains a pool of available addresses that it hands out to hosts on demand. This considerably reduces the amount of configuration an administrator must do, since it is only necessary to allocate a range of IP addresses (all with the same network number) to each network.

DHCP server discovery: to contact a DHCP server, a newly booted or attached host sends a DHCPDISCOVER message to a special IP (broadcast) address, 255.255.255.255. The message is received by all hosts and routers on that network. In the simplest case, one of these nodes is the DHCP server for the network; the server then replies to the host that generated the discovery message, and all the other nodes ignore it.

DHCP uses the concept of a relay agent. There is at least one relay agent on each network, and it is configured with just one piece of information: the IP address of the DHCP server. When a relay agent receives a DHCPDISCOVER message, it unicasts it to the DHCP server and awaits the response, which it then sends back to the requesting client.

A DHCP relay agent receives a broadcast DHCPDISCOVER message from a host and sends a unicast DHCPDISCOVER to a remote DHCP Server.

DHCP packet format

Hardware address length (HLen): 8 bits
Hop count (Hops): 8 bits; used by relay agents
Transaction ID (Xid): 32 bits; a random number chosen by the client, used by the client and server to associate requests with their responses
Number of seconds (Secs): 16 bits; the elapsed time in seconds since the client began an address acquisition or renewal process
Flags: 16 bits; defined in RFC 1542; B (Broadcast): 1 bit

Client IP address (ciaddr): 32 bits
Your IP address (yiaddr): 32 bits
Server IP address (siaddr): 32 bits
Gateway IP address (giaddr): 32 bits
Client hardware address (chaddr): 16 bytes

Server host name (sname): 64 bytes
Boot filename (file): 128 bytes
BOOTP/DHCP options: variable length; the first four bytes contain the (decimal) values 99, 130, 83, and 99, and the remainder of the field consists of a list of tagged parameters that are called options. All of the vendor extensions used by BOOTP are also DHCP options.
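As a quick check on the field sizes listed above, the sketch below describes the fixed part of the message as a `struct` format string and computes its length. The two one-byte fields that precede HLen in the BOOTP/DHCP layout (op and htype) are not listed in the text above and are included here as an assumption about the full header; the constant names are illustrative.

```python
import struct

# op, htype, hlen, hops (1 byte each); xid (4); secs, flags (2 each);
# ciaddr, yiaddr, siaddr, giaddr (4 each); chaddr (16); sname (64); file (128)
DHCP_FIXED_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s"
FIXED_LEN = struct.calcsize(DHCP_FIXED_FORMAT)

# First four bytes of the options field, given in decimal in the text
MAGIC_COOKIE = bytes([99, 130, 83, 99])

print(FIXED_LEN)           # bytes before the options field
print(MAGIC_COOKIE.hex())  # the same four bytes in hexadecimal
```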

3.2.8 Error Reporting (ICMP)

The Internet Control Message Protocol (ICMP) defines a collection of error messages that are sent back to the source host whenever a router is unable to process an IP datagram successfully.

ICMP segment structure

The ICMP header starts at bit 160 of the IP packet, that is, immediately after a 20-byte IP header without options.
Type: the ICMP message type
Code (see the following table): further specification of the ICMP type; e.g., an ICMP Destination Unreachable message might have this field set to a value from 0 through 15, each bearing a different meaning
Checksum: error-checking data calculated over the ICMP header and data, with this field set to 0 during the calculation

ID: contains an ID value, which should be returned in the case of an ECHO REPLY
Sequence: contains a sequence value, which should be returned in the case of an ECHO REPLY
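The checksum rule can be illustrated with the standard Internet checksum (a 16-bit one's-complement sum), applied here to an ICMP echo request. The function names are assumptions for this sketch, which builds the message bytes only and does not send anything.

```python
import struct

ICMP_ECHO_REQUEST = 8  # Type value for an echo request; Code is 0


def internet_checksum(data):
    """16-bit one's-complement of the one's-complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, sequence, payload=b""):
    """Build an ICMP echo request with a valid checksum."""
    # First pass: the checksum field is 0 while the checksum is computed.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, sequence)
    return header + payload


message = build_echo_request(ident=1, sequence=1, payload=b"ping")
# Recomputing the checksum over a message that includes a correct checksum gives 0.
print(internet_checksum(message))
```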

List of permitted control messages (incomplete list)

3.2.9 Virtual Networks and Tunnels

Virtual Private Network (VPN): a more controlled form of connectivity. Corporations with many sites often build private networks by leasing transmission lines from the phone companies and using those lines to interconnect sites. Communication is restricted to take place only among the sites of that corporation, which is often desirable for security reasons. To make a private network "virtual", the leased transmission lines, which are not shared with any other corporations, are replaced by some sort of shared network.

Figure: An example of virtual private networks: (a) two separate private networks; (b) two virtual private networks sharing common switches.

In the above figure, a Frame Relay or ATM network is used to provide the controlled connectivity among sites, so the limited connectivity of a real private network is maintained. IP tunnel: a virtual point-to-point link between a pair of nodes that are actually separated by an arbitrary number of networks.

A tunnel through an internetwork (the change in encapsulation of the packet as it moves across the network)

A tunnel has been configured from R1 to R2 and assigned virtual interface number 0. The forwarding table in R1 might therefore look like the following table. R1 has two physical interfaces: interface 0 connects to network 1, and interface 1 connects to a large internetwork and is thus the default for all traffic that does not match something more specific in the forwarding table.

R1 also has a virtual interface, which is the interface to the tunnel. Suppose R1 receives a packet from network 1 that contains an address in network 2. The forwarding table says this packet should be sent out virtual interface 0. To send a packet out this interface, the router takes the packet, adds an IP header addressed to R2, and then proceeds to forward the packet as if it had just been received. R2's address is 10.0.0.1; since the network number of this address is 10, not 1 or 2, a packet destined for R2 will be forwarded out the default interface into the internetwork.

Forwarding table for router R1

NetworkNum   NextHop
1            Interface 0
2            Virtual interface 0
Default      Interface 1

3.3 Routing
3.3.1 Network as a Graph
3.3.2 Distance Vector (RIP)
3.3.3 Link State (OSPF)
3.3.4 Metrics

Difference between network-layer and link-layer

Route: (noun) a way or course taken in getting from a starting point to a destination; (verb) to send or direct along a specified course. Routing: finding the path along which to forward a packet, according to information contained in the packet (the destination). The network layer and the link layer differ in the format of the forwarding table and in the way the table is updated.

Link layer. Forwarding table: a mapping from destination physical address (MAC address) to the port used for forwarding. Update of the table: manually configured.

IP (network) layer. Forwarding table: a mapping from destination network ID (NetNum) to the next hop (or interface) used for forwarding. Update of the table: manually configured (static route) or dynamically learned from a routing protocol.

Forwarding vs. Routing

Forwarding: taking a packet, looking at its destination address, consulting a table, and sending the packet in a direction determined by that table; it is done locally at a node. Routing: the process by which forwarding tables are built; it depends on a distributed algorithm.

Forwarding Table vs. Routing Table

Forwarding table: used when a packet is being forwarded, and so must contain enough information to accomplish the forwarding function. A row in the forwarding table contains the mapping from a network number to an outgoing interface and some MAC information, such as the Ethernet address of the next hop.

Routing table: the table that is built up by the routing algorithms as a precursor (forerunner) to building the forwarding table. It contains mappings from network numbers to next hops (IP addresses).

Example: in the following tables, the routing table tells us that network number 10 is to be reached by a next-hop router with the IP address 171.69.245.10. The forwarding table contains the information about exactly how to forward a packet to that next hop: send it out interface number 0 with a MAC address of 8:0:2b:e4:b:1:2 (the last piece of information is provided by the Address Resolution Protocol).

Example rows from (a) routing and (b) forwarding tables

(a) Network Number   Next Hop
    10               171.69.245.10

(b) Network Number   Interface   MAC address
    10               if0         8:0:2b:e4:b:1:2

3.3.1 Network as a Graph

Basic problem of routing: find the lowest-cost path between any two nodes, where the cost of a path equals the sum of the costs of all the edges that make up the path.

Solution: routing is achieved in most practical networks by running routing protocols among the nodes. These protocols provide a distributed, dynamic way to solve the problem of finding the lowest-cost path in the presence of node or link failures, the addition of new nodes or links, and changes in link costs. It is difficult to make centralized solutions scalable, so all the widely used routing protocols use distributed algorithms.

Elements of a routing protocol: a local data structure (the routing table) and a format of messages for exchanging routing information. Static vs. dynamic routing: with static routing, the forwarding table is set manually and does not adapt to changes in network topology.

Dynamic routing abstracts the network as a weighted graph: a vertex is a router, an edge is a link, and a weight is a cost. Criterion: the best path from source to destination, where "best" means the path cost is minimum. Metrics for the cost: hops, delay, loss, and monetary charge.

3.3.2 Distance Vector (RIP)

Distance-Vector Algorithm (Bellman-Ford Algorithm): each node constructs a one-dimensional array (a vector) containing the "distances" (costs) to all other nodes and distributes that vector to its immediate neighbors. Response when receiving an announcement from a neighbor: for every entry in the announcement, store it if (1) the announced distance is shorter than what is in the table (a better route has been found), or (2) the announcer is the current next hop in the table (the metric to the destination has changed); otherwise discard it.

Assumption: initially, each node knows the cost of the link to each of its directly connected neighbors. Broken links are assigned an infinite cost, ∞.

Local data structure: the routing table, whose entries hold the destination, the cost to the destination, the corresponding next hop, and the TTL (Time to Live) of the route.

Messages exchanged among vertices: the Distance Vector (DV), where C[n] is the distance (cost) from the current vertex to the destination vertex n. It is periodically announced to all the neighbors. In effect, the DV tells neighbors how far the sender is from all the other nodes.

Distance Vector Algorithm: in this example, the cost of each link is set to 1, so a least-cost path is simply the one with the fewest hops.

Table (initial state, nodes A through G): node X's routing table entry for destination Y, given as cost/next-hop. Each node lists its directly connected neighbors at cost 1 (for example, 1/A) and every other destination at ∞.

A's routing table (also the distance vector sent by A)

Destination      A    B    C    D    E    F    G
Cost/Next-Hop    0/-  1/B  1/C  2/C  1/E  1/F  2/F

Table (after one step): node X's routing table entry for destination Y, given as cost/next-hop. Two-hop routes such as 2/C, 2/F, 2/A, 2/D, and 2/G have been learned; destinations three hops away are still ∞.

Table (after two steps): node X's routing table entry for destination Y, given as cost/next-hop. Three-hop routes such as 3/F, 3/A, and 3/C have been learned. The tables have converged: further announcements cause no more changes.
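The exchange of vectors can be simulated in a few lines. The link list below is an assumption: it is a seven-node topology chosen to be consistent with A's routing table shown earlier (A reaches D via C and G via F), since the original figure is not reproduced here. The sketch runs synchronous rounds and applies only the "shorter distance" rule, which is enough when no link fails.

```python
INF = float("inf")

# Assumed unit-cost topology consistent with A's routing table
LINKS = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"),
         ("B", "C"), ("C", "D"), ("D", "G"), ("F", "G")]


def distance_vector(links):
    """Run synchronous distance-vector rounds until no table changes."""
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    # table[x][y] = (cost, next_hop) for node x's route to destination y
    table = {x: {y: (INF, None) for y in nodes} for x in nodes}
    for x in nodes:
        table[x][x] = (0, None)
        for n in neighbors[x]:
            table[x][n] = (1, n)
    rounds = 0
    while True:
        # Snapshot of the vector each node advertises in this round
        adverts = {x: {y: table[x][y][0] for y in nodes} for x in nodes}
        changed = False
        for x in nodes:
            for n in sorted(neighbors[x]):
                for dest, cost in adverts[n].items():
                    if cost + 1 < table[x][dest][0]:
                        table[x][dest] = (cost + 1, n)
                        changed = True
        if not changed:
            return table, rounds
        rounds += 1


table, rounds = distance_vector(LINKS)
print("rounds until convergence:", rounds)
print({dest: entry for dest, entry in table["A"].items()})
```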

There are two different circumstances under which a node sends a routing update to its neighbors. Periodic update: each node automatically sends an update message every so often, even if nothing has changed. Triggered update: happens whenever a node receives an update from one of its neighbors that causes it to change one of the routes in its routing table; i.e., whenever a node's routing table changes, it sends an update to its neighbors, which may lead to a change in their tables, causing them to send an update to their neighbors.

Link Failures

Example 1 (stable), with routes written as (Dest, Cost, NextHop):
F detects that its link to G has failed.
F sets its distance to G to infinity and sends an update to A. [F: (G, ∞, G)]
A sets its distance to G to infinity, since it uses F to reach G. [A: (G, ∞, F)]
A receives a periodic update from C with a 2-hop path to G.
A sets its distance to G to 3 and sends an update to F. [A: (G, 3, C)]
F decides it can reach G in 4 hops via A. [F: (G, 4, A)]

Example 2 (count to infinity):
The link from A to E fails.
A advertises a distance of infinity to E. [A: (E, ∞, E)]
B and C advertise a distance of 2 to E. [B: (E, 2, A)], [A: (E, 3, B)], [C: (E, 2, A)], [A: (E, 3, C)]
B hears that E can be reached in 2 hops from C.
B decides it can reach E in 3 hops and advertises this to A. [B: (E, 3, C)]
A decides it can reach E in 4 hops and advertises this to C. [A: (E, 4, B)]
C decides that it can reach E in 5 hops, and so on. [C: (E, 5, A)]

Loop-breaking heuristics (partial solutions): set infinity to 16; split horizon; split horizon with poison reverse.

Solution 1 (set infinity to 16): use some relatively small number as an approximation of infinity, which at least bounds the amount of time that it takes to count to infinity. For example, we might decide that the maximum number of hops needed to get across a certain network is never going to be more than 16, and so pick 16 as the value that represents infinity. Drawback: a problem occurs if the network grows to a point where some nodes are separated by more than 16 hops.

Solution 2 (split horizon): when a node sends a routing update to its neighbors, it does not send the routes it learned from each neighbor back to that neighbor. For example, if B has the route (E, 2, A) in its table, then it knows it must have learned this route from A, and so whenever B sends a routing update to A, it does not include the route (E, 2, A) in that update.

Solution 3 (split horizon with poison reverse): B actually sends that route back to A, but it puts negative information in the route to ensure that A will not eventually use B to get to E. Let B be a neighbor of A. If, in the routing table of B, the next-hop entry for destination Z is A, then B informs A that its distance to Z is infinite. [B: (Z, cost, A) → A: (Z, ∞, B)]

Solutions 2 and 3 only work for routing loops that involve two nodes. For larger routing loops, more drastic measures are called for. For example, if B and C had waited for a while after hearing of the link failure from A before advertising routes to E, they would have found that neither of them really had a route to E.
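The difference between the two split-horizon variants comes down to how a node builds the update it sends to one particular neighbor. The sketch below is illustrative only: the function name, the table layout (destination mapped to a cost and next hop), and the extra route to D in B's sample table are assumptions, and 16 stands in for infinity as in Solution 1.

```python
INFINITY = 16  # small stand-in for infinity, as in Solution 1


def advertisement_for(table, neighbor, poison_reverse=True):
    """Build the distance vector a node sends to one specific neighbor.

    table maps destination -> (cost, next_hop).
    """
    advert = {}
    for dest, (cost, next_hop) in table.items():
        if next_hop == neighbor:
            # This route was learned from the neighbor itself.
            if poison_reverse:
                advert[dest] = INFINITY  # poison reverse: advertise "unreachable"
            # Plain split horizon: leave the route out of the update.
        else:
            advert[dest] = cost
    return advert


# B's table holds the route (E, 2, A) from the text plus a hypothetical route to D via C
B_TABLE = {"E": (2, "A"), "D": (2, "C")}
print(advertisement_for(B_TABLE, "A", poison_reverse=False))  # split horizon
print(advertisement_for(B_TABLE, "A", poison_reverse=True))   # with poison reverse
```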

Routing Information Protocol (RIP)

RIP is a DV (Distance Vector) routing protocol. Rather than advertising the cost of reaching other routers, the routers advertise the cost of reaching networks. For example, in the following figure, router C would advertise to router A the fact that it can reach:
networks 2 and 3 at cost 0 [C: (Net2, 0, Net2), C: (Net3, 0, Net3)]
networks 5 and 6 at cost 1 [C: (Net5, 1, Net3), C: (Net6, 1, Net3)]
network 4 at cost 2 [C: (Net4, 2, Net3)]

Example network running RIP

RIP packet format: the majority of the packet is taken up with (network-address, distance) pairs. For example, if router A learns from router B that network X can be reached at a lower cost via B than via the existing next hop in the routing table, then A updates the cost and next-hop information for that network number accordingly.

RIP packet format

RIP a fairly straightforward implementation of distance-vector routing routers running RIP send their advertisements every 30 seconds a router also sends an update message whenever an update from another router causes it to change its routing table

metrics or costs for routing all link costs being equal to 1 always try to find the minimum hop route valid distances are 1 through 15, with 16 representing infinity (this limits RIP to running on fairly small networks-those with no paths longer than 15 hops)

3.3.3 Link State (OSPF) Distance-Vector approach Link-State approach “tell neighbors where I can go, and how far” Link-State approach “tell all which neighbors I have” key reliable dissemination of link-state information calculation of routes from sum of link-state knowledge

Link-state routing the second major class of intradomain routing protocol assumptions each node is assumed to be capable of finding out the state of the link to its neighbors (up or down) and the cost of each link

basic idea every node knows how to reach its directly connected neighbors, and if we make sure that the totality of this knowledge is disseminated to every node, then every node will have enough knowledge of the network to build a complete map of the network link-state routing protocols rely on two mechanisms reliable dissemination of link-state information calculation of routes from the sum of all the accumulated link-state knowledge

Link-State Message Data Structure LSP (Link-State Packet) an update packet created by each node information for route calculation the ID of the node that created the LSP a list of directly connected neighbors of the node, with the cost of the link to each one

information for reliability
  a sequence number
    ensures that a node keeps the most recent copy
    reset to zero when the routing process is restarted
  a time to live (TTL) for this packet
    packets that are too old are discarded

Reliable Flooding
  each node sends its local LSP out on all of its directly connected links
  each node receiving an LSP
    stores the most recent LSP from each node
    forwards the LSP to all of its neighbors except the one that sent it
    uses acknowledgments and retransmissions between neighbors to make delivery reliable

The following figure shows an LSP being flooded in a small network; each node becomes shaded as it stores the new LSP
  (a) the LSP arrives at node X, which sends it to neighbors A and C
  (b) A and C do not send it back to X, but send it on to B
  (c) B receives two identical copies of the LSP; it accepts whichever arrived first and ignores the second as a duplicate
  (d) B passes the LSP on to D, which has no neighbors to flood it to, and the process is complete
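The flooding steps (a) through (d) can be traced with a small simulation. This is a sketch under simplifying assumptions: the `LSP` class, the `flood` function, and the originating node "S" are hypothetical names, delivery is modeled as a FIFO queue, and the acknowledgment, retransmission, and TTL handling of real reliable flooding are left out.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class LSP:
    origin: str        # ID of the node that created the LSP
    seq: int           # sequence number; higher means more recent
    ttl: int           # time to live (not decremented in this sketch)
    neighbors: tuple   # ((neighbor_id, link_cost), ...)

def flood(topology, first_receiver, lsp):
    """Flood one LSP; return (stored, accepted_from).

    topology: dict mapping node -> list of directly connected neighbors
    stored:   node -> the LSP it kept
    accepted_from: node -> neighbor whose copy was accepted (None for the first)
    """
    stored, accepted_from = {}, {}
    queue = deque([(None, first_receiver)])  # (sender, receiver) deliveries
    while queue:
        sender, node = queue.popleft()
        held = stored.get(node)
        if held is not None and held.seq >= lsp.seq:
            continue  # duplicate or older copy: ignore it
        stored[node] = lsp
        accepted_from[node] = sender
        for neighbor in topology[node]:
            if neighbor != sender:  # never send back to the sender
                queue.append((node, neighbor))
    return stored, accepted_from

# Topology matching the description: X-A, X-C, A-B, C-B, B-D.
topology = {"X": ["A", "C"], "A": ["X", "B"], "C": ["X", "B"],
            "B": ["A", "C", "D"], "D": ["B"]}
lsp = LSP(origin="S", seq=1, ttl=60, neighbors=(("X", 1),))
stored, accepted_from = flood(topology, "X", lsp)
```

With this delivery order, B accepts the copy from A and drops the one from C as a duplicate, and D receives the LSP from B.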

New LSP Generation
Two circumstances cause a node to generate a new LSP
  expiry of a periodic timer, with a period of tens of minutes
  change in topology
    a directly connected link goes down: detected by link-layer protocols
    an immediate neighbor goes down: detected by periodic “hello” messages

Calculation of Route
Dijkstra’s Shortest Path Algorithm
Notation
  N: vertex set of the graph
  l: l(i, j) is the (non-negative) cost of the edge (i, j), taken as infinity if no edge connects i and j
  s: the vertex executing the algorithm (the source)
  M: set of vertices incorporated so far
  C(n): cost of the path from s to n

Calculate a minimum-cost tree from s
  M = {s}
  for each n in N-{s}
      C(n) = l(s,n)
  while (N != M)
      M = M union {w} such that C(w) is the minimum for all w in (N-M)
      for each n in (N-M)
          C(n) = MIN(C(n), C(w)+l(w,n))
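A direct transcription of the pseudocode into Python might look like the sketch below. The names N, M, C, l, s, and w follow the notation above. The four-node network and its link costs are assumptions chosen for illustration, and the linear scan for the minimum is kept for clarity rather than speed.

```python
import math

def dijkstra(N, l, s):
    """Return C, the minimum cost from s to every other vertex in N.

    N: set of vertices; l(i, j): edge cost (math.inf if no edge); s: source.
    """
    M = {s}
    C = {n: l(s, n) for n in N - {s}}
    while M != N:
        # Pick the cheapest vertex not yet incorporated.
        w = min(N - M, key=lambda n: C[n])
        M.add(w)
        for n in N - M:
            C[n] = min(C[n], C[w] + l(w, n))
    return C

# Hypothetical example network (undirected links with costs).
links = {("A", "B"): 5, ("A", "C"): 10, ("B", "C"): 3,
         ("B", "D"): 11, ("C", "D"): 2}

def l(i, j):
    return links.get((i, j), links.get((j, i), math.inf))

costs = dijkstra({"A", "B", "C", "D"}, l, "D")
```

For this assumed network, starting from D, the costs come out as C: 2, B: 5 (via C), and A: 10 (via C and B).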

In practice, each switch computes its routing table directly from the LSPs it has collected, using a forward search approach for Dijkstra's algorithm
  each switch maintains two lists, known as Tentative and Confirmed
  each of these lists contains a set of entries of the form (Destination, Cost, NextHop)

Forward Search Approach for Dijkstra's Algorithm
1. Initialize the Confirmed list with an entry for myself; this entry has a cost of 0.
2. For the node just added to the Confirmed list in the previous step, call it node Next, select its LSP.
3. For each neighbor (Neighbor) of Next, calculate the cost (Cost) to reach this Neighbor as the sum of the cost from myself to Next and from Next to Neighbor.
   (a) If Neighbor is currently on neither the Confirmed nor the Tentative list, then add (Neighbor, Cost, NextHop) to the Tentative list, where NextHop is the direction I go to reach Next.
   (b) If Neighbor is currently on the Tentative list, and the Cost is less than the currently listed cost for Neighbor, then replace the current entry with (Neighbor, Cost, NextHop), where NextHop is the direction I go to reach Next.
4. If the Tentative list is empty, stop. Otherwise, pick the entry from the Tentative list with the lowest cost, move it to the Confirmed list, and return to step 2.
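The four steps can be sketched in Python as below. The function name and the dictionary layout of the LSP database are hypothetical, and the link costs are assumed values for a four-node network; for a directly connected neighbor, NextHop is taken to be that neighbor itself.

```python
def forward_search(me, lsps):
    """Build a routing table from collected LSPs.

    lsps: dict mapping node -> {neighbor: link_cost} (one LSP per node)
    Returns Confirmed as a dict: destination -> (cost, next_hop).
    """
    confirmed = {me: (0, None)}   # step 1
    tentative = {}
    nxt = me
    while True:
        cost_to_next, hop_to_next = confirmed[nxt]
        for neighbor, link_cost in lsps[nxt].items():   # steps 2 and 3
            if neighbor in confirmed:
                continue
            cost = cost_to_next + link_cost
            # NextHop is the direction I go to reach Next.
            next_hop = neighbor if nxt == me else hop_to_next
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, next_hop)   # steps 3(a), 3(b)
        if not tentative:                                # step 4
            return confirmed
        nxt = min(tentative, key=lambda d: tentative[d][0])
        confirmed[nxt] = tentative.pop(nxt)

# Assumed link costs: A-B 5, A-C 10, B-C 3, B-D 11, C-D 2.
lsps = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 3, "D": 11},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 11, "C": 2},
}
routes = forward_search("D", lsps)
```

Run from node D on these assumed costs, the Tentative list passes through (B, 11, B), (C, 2, C), (B, 5, C), (A, 12, C), and (A, 10, C) before every destination is confirmed.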

Example Link-state routing: an example network

Entries from the worked example, in the form (Destination, Cost, NextHop): (B, 11, B), (C, 2, C), (B, 5, C), (A, 12, C), (A, 10, C)

Open Shortest Path First Protocol (OSPF)
  one of the most widely used link-state routing protocols
  Open: refers to the fact that it is an open, nonproprietary standard, created under the auspices of the IETF
  SPF: comes from an alternative name for link-state routing
  (auspices: sponsorship, support)

OSPF adds the following features to the basic link-state algorithm
  authentication of routing messages
  additional hierarchy
    OSPF introduces another layer of hierarchy into routing by allowing a domain to be partitioned into areas
    a router within a domain does not necessarily need to know how to reach every network within that domain; it needs to know only how to get to the right area
    this reduces the amount of information that must be transmitted to and stored in each node

  load balancing
    OSPF allows multiple routes to the same place to be assigned the same cost, and distributes traffic evenly over those routes

There are several different types of OSPF messages, but all begin with the same header
OSPF header format
  Version: 2
  Type: 1 through 5
  SourceAddr: identifies the sender of the message
  AreaId: a 32-bit identifier of the area in which the node is located

  Checksum: the entire packet, except the authentication data, is protected by a 16-bit checksum using the same algorithm as the IP header
  Authentication type
    0: no authentication is used
    1: a simple password is used
    2: a cryptographic authentication checksum is used

OSPF header format

Five OSPF message types
  Type 1: "hello" message, which a router sends to its peers to notify them that it is still alive and connected
  Types 2 through 5: used to request, send, and acknowledge the receipt of link-state messages
The basic building block of link-state messages in OSPF is the link-state advertisement (LSA)
  one message may contain many LSAs

OSPF packet format for link-state advertisement (Type 1)

OSPF link-state advertisement (LSA)
  Type 1 LSA: advertises the cost of links between routers
  Type 2 LSA: advertises networks to which the advertising router is connected
  LS Age: the equivalent of a time to live, except that it counts up, and the LSA expires when the age reaches a defined maximum value
  Type: tells us that this is a type 1 LSA

  Link-state ID and Advertising router
    in a type 1 LSA, these two fields are identical
    each carries a 32-bit identifier for the router that created this LSA
  LS sequence number: used to detect old or duplicate LSAs
  LS checksum
    verifies that the data has not been corrupted
    covers all fields in the packet except LS Age

  Length: the length in bytes of the complete LSA
  Link ID, Link Data, and metric: each link in the LSA is represented by a Link ID, some Link Data, and a metric
  TOS: allows OSPF to choose different routes for IP packets based on the value in their TOS field

3.3.4 Metrics
Original ARPANET metric
  measured the number of packets queued on each link
  took neither latency nor bandwidth into consideration
New ARPANET metric
  stamp each incoming packet with its arrival time (AT)
  record its departure time (DT)
  when the link-level ACK arrives, the node computes the packet delay: Delay = (DT - AT) + Transmit + Latency
  if a timeout occurs (the ACK did not arrive), DT is reset to the time the packet was retransmitted
  link cost = average delay over some time period
Note: the ARPANET, developed by DARPA of the United States Department of Defense, was the world's first operational packet-switching network and the predecessor of the global Internet.
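The delay formula of the new ARPANET metric is simple enough to write out directly. The function names and the millisecond values below are illustrative assumptions; on a retransmission, the caller is assumed to pass the time of the last retransmission as the departure time.

```python
def packet_delay(arrival_time, departure_time, transmit, latency):
    """Delay = (DT - AT) + Transmit + Latency, all in the same time unit."""
    return (departure_time - arrival_time) + transmit + latency

def link_cost(delays):
    """Link cost is the average packet delay over the measurement period."""
    return sum(delays) / len(delays)

# Two packets measured on one link (times in milliseconds, made-up values).
d1 = packet_delay(arrival_time=10.0, departure_time=14.0, transmit=1.0, latency=2.0)
d2 = packet_delay(arrival_time=20.0, departure_time=26.0, transmit=1.0, latency=2.0)
cost = link_cost([d1, d2])
```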

3.4 Implementation and Performance
3.4.1 Switch Basics
A very simple way to build a switch
  buy a general-purpose workstation and equip it with a number of network interfaces
  run suitable software to
    receive packets on one of its interfaces
    perform any of the switching functions
    send packets out another of its interfaces

A workstation used as packet switch

The figure shows a workstation with three network interfaces used as a switch, along with a path that a packet might take from the time it arrives on interface 1 until it is output on interface 2

we assume direct memory access (DMA)
  the workstation has a mechanism to move data directly from an interface to its main memory
  once the packet is in memory, the CPU examines its header to determine on which interface the packet should be sent out
  it then uses DMA to move the packet out to the appropriate interface
  the packet body never passes through the CPU, because the CPU inspects only the header of the packet

Main problem with using a workstation as a switch
  its performance is limited by the fact that all packets must pass through a single point of contention
  in the example shown, each packet crosses the I/O bus twice and is written to and read from main memory once
  the upper bound on aggregate throughput of such a device is, thus, either half the main memory bandwidth or half the I/O bus bandwidth, whichever is less (usually it’s the I/O bus bandwidth)

example
  a workstation with a 133-MHz, 64-bit-wide I/O bus can transmit data at a peak rate of a little over 8 Gbps (= 133 × 2^20 × 64)
  since forwarding a packet involves crossing the bus twice, the actual limit is about 4 Gbps
this upper bound also assumes that moving data is the only problem
  a fair approximation for long packets
  a bad one when packets are short: the cost of processing each packet, that is, (1) parsing its header and (2) deciding which output link to transmit it on, is likely to dominate

example: a workstation can perform all the necessary processing to switch 1 million packets each second (its packets-per-second (pps) rate)
  if the average packet is short, say, 64 bytes
    throughput = pps × (bits per packet) = 1 × 10^6 × 64 × 8 bits per second = 512 × 10^6 bits per second
  this 512 Mbps would be shared by all users connected to the switch
  example: a 10-port switch with this aggregate throughput would only be able to cope with an average data rate of 51.2 Mbps on each port
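The two limits in these examples, one set by the bus and one set by per-packet processing, can be reproduced with a few lines of arithmetic. The function names are hypothetical, and the bus calculation assumes 1 MHz means 10^6 cycles per second, which gives a peak of about 8.5 Gbps.

```python
def bus_limited_bps(clock_hz, bus_width_bits, bus_crossings=2):
    """Peak bus rate divided by the number of times each packet crosses it."""
    return clock_hz * bus_width_bits // bus_crossings

def packet_limited_bps(packets_per_second, packet_bytes):
    """Throughput when per-packet processing is the bottleneck."""
    return packets_per_second * packet_bytes * 8

bus_limit = bus_limited_bps(133_000_000, 64)     # about 4.3 Gbps
pps_limit = packet_limited_bps(1_000_000, 64)    # 512 Mbps for 64-byte packets
per_port = pps_limit / 10                        # shared across 10 ports
```

For short packets the processing limit (512 Mbps) is far below the bus limit, which is why the per-packet cost dominates.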

To address this problem
  a large array of switch designs reduce the amount of contention and provide high aggregate throughput
  some contention is unavoidable: if every input has data to send to a single output, then they cannot all send it at once
  if data destined for different outputs is arriving at different inputs, a well-designed switch will be able to move data from inputs to outputs in parallel, thus increasing the aggregate throughput

3.4.2 Ports
[Figure: a 4 × 4 switch, showing input ports, output ports, switch fabric, and control processor]

The 4 × 4 switch in the figure consists of
  ports (input ports and output ports)
    communicate with the outside world
    contain fiber-optic receivers and buffers to hold packets that are waiting to be switched or transmitted, and often a significant amount of other circuitry that enables the switch to function
  switch fabric
    when presented with a packet, delivers it to the right output port
  control processor (at least one)
    in charge of the whole switch

Input port
  the first place to look for performance bottlenecks
  has to receive a steady stream of packets, analyze information in the header of each one to determine which output port (or ports) the packet must be sent to, and pass the packet on to the fabric
Another key function of ports: buffering
  it can happen in either the input or the output port
  it can also happen within the fabric (sometimes called internal buffering)

simple input buffering has some serious limitations
example: an input buffer implemented as a FIFO
  as packets arrive at the switch, they are placed in the input buffer
  the switch then tries to forward the packets at the front of each FIFO to their appropriate output port
  if the packets at the front of several different input ports are destined for the same output port at the same time, then only one of them can be forwarded; the rest must stay in their input buffers

Simple illustration of head-of-line blocking

drawback of input buffering: head-of-line blocking
  packets left at the front of an input buffer prevent packets further back in the buffer from getting a chance to go to their chosen outputs
buffering is needed wherever contention is possible
  input port (contend for fabric)
  internal (contend for output port)
  output port (contend for link)
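Head-of-line blocking is easy to see in a toy model of FIFO input buffers. In this sketch each packet is represented only by its destination output port. The function name and the lowest-numbered-input-wins arbitration are assumptions made for illustration.

```python
from collections import deque

def switch_one_slot(input_queues):
    """Forward at most one head-of-line packet per output port in one time slot.

    input_queues: list of deques; each element is a destination output port.
    Returns the list of (input_port, output_port) pairs forwarded this slot.
    """
    claimed_outputs = set()
    forwarded = []
    for port, queue in enumerate(input_queues):
        # Only the packet at the front of each FIFO is eligible.
        if queue and queue[0] not in claimed_outputs:
            output = queue.popleft()
            claimed_outputs.add(output)
            forwarded.append((port, output))
    return forwarded

# Input 0 holds packets for outputs 2 then 1; input 1 holds packets for 2 then 3.
queues = [deque([2, 1]), deque([2, 3])]
slot1 = switch_one_slot(queues)
slot2 = switch_one_slot(queues)
```

In the first slot both heads want output 2, so input 1 is held back, and its packet for output 3 waits behind the blocked head even though output 3 is idle.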

3.4.3 Fabrics
  should be able to move packets from input ports to output ports with minimal delay and in a way that meets the throughput goals of the switch
  parallelism: a high-performance fabric with n ports can often move one packet from each of its n ports to one of the output ports at the same time

Types of fabric
  shared bus
  shared memory
  crossbar
  self-routing

Shared bus
  found in a conventional workstation used as a switch
  the bus bandwidth determines the throughput of the switch, so high-performance switches usually have specially designed buses rather than the standard buses found in PCs
Shared memory
  packets are written into a memory location by an input port and then read from memory by the output ports
  the memory bandwidth determines switch throughput, so wide and fast memory is typically used in this sort of design
  it usually uses a specially designed, high-speed memory bus

Crossbar
  a matrix of pathways that can be configured to connect any input port to any output port
  in its simplest form, a crossbar requires each output port to be able to accept packets from all inputs at once

A 4 × 4 crossbar switch

Self-routing
  relies on some information in the packet header to direct each packet to its correct output
  usually a special “self-routing header” is appended to the packet by the input port after it has determined which output the packet needs to go to
  this extra header is removed before the packet leaves the switch
  self-routing fabrics are often built from large numbers of very simple 2 × 2 switching elements, as in the banyan switching fabric

A self-routing header is applied to a packet at input to enable the fabric to send the packet to the correct output, where it is removed (a) packet arrives at input port; (b) input port attaches self-routing header to direct packet to correct output (c) self-routing header is removed at output port before packet leaves switch

Banyan Network
  constructed from simple 2 × 2 switching elements
  a self-routing header is attached to each packet
  elements are arranged to route based on this header
    each element looks at 1 bit in the self-routing header
    it routes the packet toward the upper output if the bit is zero, or toward the lower output if it is one

  if two packets arrive at an element at the same time and both have the bit set to the same value, then they want to be routed to the same output and a collision will occur
  the banyan network routes all packets to the correct output without collisions if the packets are presented in ascending order

Routing packets through a banyan network. The 3-bit numbers (000 through 111) represent values in the self-routing headers of four arriving packets.
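The per-element routing rule can be sketched without modeling the full wiring of the network. The function names are hypothetical, and the sketch assumes that the element at stage i examines bit i of the header, counting from the most significant bit. The actual interconnection pattern of a banyan network is not modeled.

```python
def element_outputs(header):
    """For a self-routing header such as "011", list the output each
    successive 2 x 2 element chooses: upper for a 0 bit, lower for a 1 bit."""
    return ["upper" if bit == "0" else "lower" for bit in header]

def collide(header_a, header_b, stage):
    """Two packets arriving together at the same element of a given stage
    collide when the bit examined at that stage is the same in both headers."""
    return header_a[stage] == header_b[stage]

path = element_outputs("011")
same_first_bit = collide("001", "011", stage=0)
different_first_bit = collide("001", "101", stage=0)
```

A header of 011 therefore leaves the first element on its upper output and the next two on their lower outputs; headers 001 and 011 would collide if they met at a first-stage element, while 001 and 101 would not.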