1
High-Speed Networks and Internets
by William Stallings
2
Table of Contents 1. Introduction
2. Protocols and Architecture 3. TCP and IP 4. Frame Relay 5. Asynchronous Transfer Mode (ATM) 6. High-Speed LANs 7. Overview of Probability and Stochastic Processes 8. Queuing Analysis 9. Self-Similar Traffic 10. Congestion Control in Data Networks and Internets
3
11. Link-Level Flow and Error Control
12. TCP Traffic Control 13. Traffic and Congestion Control in ATM Networks 14. Overview of Graph Theory and Least-Cost Paths 15. Interior Routing Protocols 16. Exterior Routing Protocols and Multicast 17. Integrated and Differentiated Services 18. Protocols for QOS Support 19. Overview of Information Theory 20. Lossless Compression 21. Lossy Compression
4
Chapter 1 Introduction
5
Objectives Bring together in perspective various components of the Internet Network Infrastructure Communication Infrastructure Organizations and groups that set standards
6
Network Infrastructure: Hardware and Access Infrastructure
Module Network Infrastructure: Hardware and Access Infrastructure
7
Evolution Started as ARPANET
Grew with the introduction of PCs, LANs and WANs. CCITT (now ITU) was the initial standard-setting organization. The lower-level protocol was X.25. The higher-level protocol was TCP/IP, which followed the initial introduction of the Network Control Protocol (NCP)
8
Current Trend Global network based on high speed fiber lines
IPv4 is being replaced by IPv6. X.25, Frame Relay, etc. are being replaced with ATM
9
Hardware Infrastructure
The hardware infrastructure now is essentially a hierarchy of interconnected networks Local Departmental Campus or Enterprise Wide Area
10
Networking and Internetworking Devices
Hubs Layer 1 devices Switches Mostly Layer 2 devices Routers Layer 3 devices
11
LA Fiber Connection Hierarchy
12
Internet National Connection Example (Cogent Communications)
13
Internet Global Connection Example (MCI)
14
Major Digital Line Types
Lower-speed access points: DSL, ISDN. Higher-speed network connections: T1, T3, etc.; OC3, OC12, etc.
15
Digital Lines and Speeds
DS0: 64 Kbps
ISDN BRI: 128 Kbps (2 DS0 B channels)
T1: 1.544 Mbps (24 DS0)
T3: 44.736 Mbps (28 T1)
OC3: 155 Mbps (100 T1)
OC12: 622 Mbps (4 OC3)
OC48: 2.5 Gbps (4 OC12)
OC192: 9.6 Gbps (4 OC48)
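As a quick sanity check on these figures, each carrier level composes by multiplexing the level below plus framing overhead; a small Python sketch (the overhead constants are the standard T-carrier framing figures, not from the slides):

```python
# Sketch: how the digital-line rates compose from DS0 channels.
DS0 = 64_000                      # one voice channel, bits per second
T1 = 24 * DS0 + 8_000             # 24 DS0 + 8 kbps framing = 1.544 Mbps
T3 = 28 * T1 + 1_504_000          # 28 T1 + multiplex overhead = 44.736 Mbps

print(f"T1 = {T1 / 1e6:.3f} Mbps")   # 1.544
print(f"T3 = {T3 / 1e6:.3f} Mbps")   # 44.736
```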
16
Speed Faster backbones are providing faster access to the Internet
Internet2 is a joint venture project between many universities to develop a high-speed Internet This development, however, is very likely to be spearheaded by the industry given the commercial attractiveness of providing fast Internet access
17
High-Speed Internet (Abilene)
18
Abilene Update
19
Internet Traffic
20
Internet Traffic Status in Asia
Source:
21
Asia Traffic Index
22
Response Time to Asia
23
Packet Loss in Asia
24
Connection Hierarchy
25
Definition of Terms POP (Point of Presence)
NAP (Network Access Point) High-speed backbone network service
26
Communication Infrastructure: The Protocols
27
Protocol of the Internet
TCP/IP
28
IP Addressing 32-bit numbering system
Divided into network ID and host ID Grouped into Classes A, B, C, D and E Classes A, B and C are the ones relevant to commercial use Several IP addresses have been reserved for private and other uses Addresses used in Network Address Translation (NAT) Addresses used for IP multicasting
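As an illustration of the classful scheme described above, a hypothetical helper (not from the slides) that splits an address into class, network ID, and host ID based on the first octet:

```python
import ipaddress

def classful_parts(addr: str):
    """Split a classful IPv4 address into (class, network ID, host ID).
    Historical classes only -- modern networks use CIDR instead."""
    first = int(addr.split(".")[0])
    if first < 128:
        cls, net_bits = "A", 8
    elif first < 192:
        cls, net_bits = "B", 16
    elif first < 224:
        cls, net_bits = "C", 24
    else:
        return ("D (multicast)" if first < 240 else "E (reserved)", None, None)
    n = int(ipaddress.IPv4Address(addr))
    host_bits = 32 - net_bits
    return cls, n >> host_bits, n & ((1 << host_bits) - 1)

print(classful_parts("10.1.2.3"))     # ('A', 10, 66051)
print(classful_parts("192.168.0.1"))  # class C, and a reserved private range
```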
29
Meeting the Demand for IP Addresses
DHCP Network Address Translation (NAT) IPv6 Classless Inter Domain Routing (CIDR)
30
Some Application Layer Protocols
HTTP, HTTPS FTP Telnet POP3 IMAP SMTP DNS DHCP SNMP X.500 LDAP
31
Transport Layer Protocols
TCP UDP ICMP OSPF SPX NetBEUI SMB For more information access:
32
Internet Layer Protocols
IPv4, IPv6 ARP NWLink NetBEUI
33
Network Interface Layer Protocols
Ethernet Token Ring IEEE 802.x PPP X.25 FDDI Frame Relay ISDN ATM T and E carriers OC carriers xDSL Cable Modem
34
Some Popular Ports and Protocols
80 – HTTP Web services 20/21 – FTP
35
Additional Port Information
Extensive list of port numbers at IANA
36
Some Useful TCP/IP Commands
ping ipconfig finger hostname nslookup tracert nbtstat netstat telnet ftp
37
Further Information on TCP/IP Commands
In Windows XP help, search for “TCP/IP Utilities and Services” Access Gary Kessler’s manual at:
38
Domains and DNS Infrastructure
39
Top Level Domain (TLD) Extensions
“There are two types of top-level domains, generic and country code, plus a special top-level domain (.arpa) for Internet infrastructure. Generic domains were created for use by the Internet public, while country code domains were created to be used by individual countries as they deemed necessary.” Source:
40
The Three Top-Level Domains
Country Code Domains (.uk, .de, .jp, .us, etc.) Generic Domains (.aero, .biz, .com, .coop, .edu, .gov, .info, .int, .mil, .museum, .name, .net, .org, and .pro) Infrastructure Domain (.arpa)
41
Country Extensions
42
Domain Extensions Some prominent domain names .com, .edu, .org
Some interesting newer domain names .net, .pro
43
More Information on Domain Extensions
Some useful information on qualifications, contact etc. can be obtained by navigating through the following IANA web link
44
Where to Find Domain Registrant Information?
45
Internet Domain Growth
46
Root Name Server Details
ftp://ftp.internic.net/domain/named.root
47
Accredited Domain Name Registrar Directory
Companies that are accredited by ICANN
48
Internet Agencies
49
Important Internet Groups
Internet Architecture Board (IAB) The Internet Engineering Steering Group (IESG) Internet Society (ISOC) Internet Assigned Numbers Authority (IANA)
50
Internet Engineering Task Force (IETF)
“The Internet Engineering Task Force (IETF) is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. It is open to any interested individual.” - IETF
51
IETF Working Groups “The actual technical work of the IETF is done in its working groups, which are organized by topic into several areas (e.g., routing, transport, security, etc.). The IETF holds meetings three times per year.” – IETF “The IETF working groups are grouped into areas, and managed by Area Directors, or ADs. The ADs are members of the Internet Engineering Steering Group (IESG). Providing architectural oversight is the Internet Architecture Board (IAB). The IAB also adjudicates appeals when someone complains that the IESG has failed. The IAB and IESG are chartered by the Internet Society (ISOC) for these purposes.” – IETF
52
Functional Overview of IETF
53
Internet Society “The Internet Society (ISOC) is a professional membership society with more than 150 organizational and 16,000 individual members in over 180 countries. It provides leadership in addressing issues that confront the future of the Internet, and is the organizational home for the groups responsible for Internet infrastructure standards, including the Internet Engineering Task Force (IETF) and the Internet Architecture Board (IAB).” - ISOC
54
Internet Architecture Board (IAB)
“The IAB is chartered both as a committee of the Internet Engineering Task Force (IETF) and as an advisory body of the Internet Society (ISOC). Its responsibilities include architectural oversight of IETF activities, Internet Standards Process oversight and appeal, and the appointment of the RFC Editor. The IAB is also responsible for the management of the IETF protocol parameter registries.” – IAB
55
IAB Access
56
Internet Assigned Numbers Authority (IANA)
“The central coordinator for the assignment of unique parameter values for Internet protocols.” - IETF “It is chartered by the Internet Society (ISOC) to act as the clearinghouse to assign and coordinate the use of numerous Internet protocol parameters.” - IETF
57
Internet Corporation for Assigned Names and Numbers (ICANN)
“The Internet Corporation for Assigned Names and Numbers (ICANN) is an internationally organized, non-profit corporation that has responsibility for Internet Protocol (IP) address space allocation, protocol identifier assignment, generic (gTLD) and country code (ccTLD) Top-Level Domain name system management, and root server system management functions. These services were originally performed under U.S. Government contract by the Internet Assigned Numbers Authority (IANA) and other entities. ICANN now performs the IANA function.” - ICANN
58
ICANN Home page http://www.icann.org/
Further Information on ICANN’s role
59
American Registry for Internet Numbers (ARIN)
“We at the American Registry for Internet Numbers manage the Internet numbering resources for North America, a portion of the Caribbean, and sub-equatorial Africa. A full list of countries in the ARIN region can be found by clicking here. As a nonprofit corporation with a bottom-up, community-based structure, our focus is completely on serving our members and the Internet community at large.” – ARIN
60
More About ARIN
61
ARIN Equivalent in Asia
APNIC (Asia Pacific Network Information Centre)
62
National Registries For further information on national domain registries for different countries access the following site:
63
Council of Registrars (CORE)
“CORE is an international not-for-profit association of Registrars constituted under Swiss Law. CORE has been active in the Domain Name Registration area since 1997.” -CORE Access at:
64
More on CORE “CORE's members are professional registrars from various areas (Europe, North America, Asia-Pacific) who handle domain name registration on behalf of customers. Currently CORE has members present in 14 countries and manages in total over 400,000 domain names in various TLDs. CORE also acts as Registry Operator for two Sponsored TLDs, .aero and .museum.” - CORE
65
Internet Network Information Center (InterNIC)
Provides the public with information regarding internet domain name registration services
66
All About Registering a Domain
FAQ on domain registration from InterNIC
67
Internet Research Task Force (IRTF)
“To promote research of importance to the evolution of the future Internet by creating focused, long-term and small Research Groups working on topics related to Internet protocols, applications, architecture and technology.” - IRTF
68
World Wide Web (W3) Consortium
“The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding. “ –W3
69
A Sample Work of W3C For more information on W3C’s work on the HTTP protocol
70
National Telecommunication and Information Administration (NTIA)
“The National Telecommunications and Information Administration (NTIA) is the Executive Branch agency principally responsible for domestic and international telecommunications and information policy issues. “ - NTIA
71
NTIA Responsibilities
“NTIA also manages the Federal use of the spectrum; administers infrastructure grants to support the development of a national information infrastructure accessible to all Americans; manages public telecommunications facilities grants designed to maintain and extend the public broadcasting infrastructure; and performs cutting-edge telecommunications research and engineering, including resolving technical telecommunications issues for the Federal government and private sector.” - NTIA
72
Access NTIA
73
VeriSign© Manages the .com and .net domains Access at:
74
Network Solutions One of the largest and earliest domain name registrars
75
Chapter 2 Protocols and Architecture
76
Need For Protocol Architecture
E.g., file transfer: source must activate comms. path or inform network of destination; source must check destination is prepared to receive; file transfer application on source must check destination file management system will accept and store file for this user; may need file format translation. Task broken into subtasks, implemented separately in layers in stack. Functions needed in both systems; peer layers communicate
77
Key Elements of a Protocol
Syntax: data formats, signal levels. Semantics: control information, error handling. Timing: speed matching, sequencing
78
Protocol Architecture
Task of communication broken up into modules For example file transfer could use three modules File transfer application Communication service module Network access module
79
Simplified File Transfer Architecture
80
A Three Layer Model Network Access Layer Transport Layer
Application Layer
81
Network Access Layer Exchange of data between the computer and the network Sending computer provides address of destination May invoke levels of service Dependent on type of network used (LAN, packet switched etc.)
82
Transport Layer Reliable data exchange
Independent of network being used Independent of application
83
Application Layer Support for different user applications
e.g., file transfer
84
Protocol Architectures and Networks
85
Addressing Requirements
Two levels of addressing required Each computer needs unique network address Each application on a (multi-tasking) computer needs a unique address within the computer The service access point or SAP The port on TCP/IP stacks
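For illustration, both address levels are visible in the ordinary sockets API; a minimal Python sketch, where the host name and port are placeholders rather than anything from the slides:

```python
import socket

# "example.org" resolves to the computer-level (network) address; port 80
# names the application-level address (the SAP) within that host.
ip = socket.gethostbyname("example.org")
with socket.create_connection((ip, 80), timeout=5) as s:
    local_ip, local_port = s.getsockname()  # our own (address, port) pair
    print(f"{local_ip}:{local_port} -> {ip}:80")
```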
86
Protocols in Simplified Architecture
87
Protocol Data Units (PDU)
At each layer, protocols are used to communicate Control information is added to user data at each layer Transport layer may fragment user data Each fragment has a transport header added Destination SAP Sequence number Error detection code This gives a transport protocol data unit
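A toy sketch of such a transport PDU, with invented field widths (this is not any real protocol's header layout):

```python
import struct, zlib

def make_transport_pdu(dest_sap: int, seq: int, fragment: bytes) -> bytes:
    """Toy transport PDU: header = destination SAP, sequence number,
    and an error-detection code over the fragment (CRC-32 here)."""
    header = struct.pack("!HII", dest_sap, seq, zlib.crc32(fragment))
    return header + fragment

user_data = b"a user message long enough to need fragmenting"
pdus = [make_transport_pdu(80, i, user_data[i*16:(i+1)*16])
        for i in range((len(user_data) + 15) // 16)]
print(len(pdus), "PDUs")  # 3 PDUs for 47 bytes of data
```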
88
Protocol Data Units
89
Network PDU Adds network header
network address for destination computer Facilities requests
90
Operation of a Protocol Architecture
91
Standardized Protocol Architectures
Required for devices to communicate Vendors have more marketable products Customers can insist on standards based equipment Two standards: OSI Reference model Never lived up to early promises TCP/IP protocol suite Most widely used Also: IBM Systems Network Architecture (SNA)
92
OSI Open Systems Interconnection
Developed by the International Organization for Standardization (ISO) Seven layers A theoretical system delivered too late! TCP/IP is the de facto standard
93
OSI - The Model A layer model
Each layer performs a subset of the required communication functions Each layer relies on the next lower layer to perform more primitive functions Each layer provides services to the next higher layer Changes in one layer should not require changes in other layers
94
OSI Layers
95
The OSI Environment
96
OSI as Framework for Standardization
97
Layer Specific Standards
98
Elements of Standardization
Protocol specification Operates between the same layer on two systems May involve different operating systems Protocol specification must be precise Format of data units Semantics of all fields Allowable sequence of PDUs Service definition Functional description of what is provided Addressing Referenced by SAPs
99
Service Primitives and Parameters
Services between adjacent layers expressed in terms of primitives and parameters Primitives specify function to be performed Parameters pass data and control info
100
Primitive Types REQUEST
A primitive issued by a service user to invoke some service and to pass the parameters needed to specify fully the requested service INDICATION A primitive issued by a service provider either to: indicate that a procedure has been invoked by the peer service user on the connection and to provide the associated parameters, or notify the service user of a provider-initiated action RESPONSE A primitive issued by a service user to acknowledge or complete some procedure previously invoked by an indication to that user CONFIRM A primitive issued by a service provider to acknowledge or complete some procedure previously invoked by a request by the service user
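These four primitive types and their usual ordering for a confirmed service can be summarized in a small sketch:

```python
from enum import Enum

class Primitive(Enum):
    """The four OSI service-primitive types (sketch)."""
    REQUEST = 1     # service user -> provider: invoke a service
    INDICATION = 2  # provider -> peer service user: report the invocation
    RESPONSE = 3    # peer service user -> provider: acknowledge/complete
    CONFIRM = 4     # provider -> original user: acknowledge/complete

# A confirmed service (e.g., connection setup) uses all four, in order:
confirmed_exchange = [Primitive.REQUEST, Primitive.INDICATION,
                      Primitive.RESPONSE, Primitive.CONFIRM]
print(" -> ".join(p.name for p in confirmed_exchange))
```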
101
Timing Sequence for Service Primitives
102
OSI Layers (1) Physical Physical interface between devices Mechanical
Electrical Functional Procedural Data Link Means of activating, maintaining and deactivating a reliable link Error detection and control Higher layers may assume error free transmission
103
OSI Layers (2) Network Transport of information
Higher layers do not need to know about underlying technology Not needed on direct links Transport Exchange of data between end systems Error free In sequence No losses No duplicates Quality of service
104
OSI Layers (3) Session Control of dialogues between applications
Dialogue discipline Grouping Recovery Presentation Data formats and coding Data compression Encryption Application Means for applications to access OSI environment
105
Use of a Relay
106
TCP/IP Protocol Architecture
Developed by the US Defense Advanced Research Project Agency (DARPA) for its packet switched network (ARPANET) Used by the global Internet No official model but a working one. Application layer Host to host or transport layer Internet layer Network access layer Physical layer
107
Physical Layer Physical interface between data transmission device (e.g. computer) and transmission medium or network Characteristics of transmission medium Signal levels Data rates etc.
108
Network Access Layer Exchange of data between end system and network
Destination address provision Invoking services like priority
109
Internet Layer (IP) Systems may be attached to different networks
Routing functions across multiple networks Implemented in end systems and routers
110
Transport Layer (TCP) Reliable delivery of data Ordering of delivery
111
Application Layer Support for user applications, e.g., HTTP, SMTP
112
OSI v TCP/IP
113
TCP Usual transport layer is Transmission Control Protocol
Reliable connection Connection Temporary logical association between entities in different systems TCP PDU Called TCP segment Includes source and destination port (c.f. SAP) Identify respective users (applications) Connection refers to pair of ports TCP tracks segments between entities on each connection
114
UDP Alternative to TCP is User Datagram Protocol
Not guaranteed delivery No preservation of sequence No protection against duplication Minimum overhead Adds port addressing to IP
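A minimal sketch of that service in Python; the destination address is a documentation placeholder (TEST-NET-1), and none of the omissions above are compensated for:

```python
import socket

# UDP's minimal service: port addressing on top of IP and nothing more.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"hello", ("192.0.2.1", 9999))  # fire and forget: the datagram
sock.close()                                # may be lost, duplicated, or reordered
```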
115
TCP/IP Concepts
116
Addressing level Level in architecture at which entity is named
Unique address for each end system (computer) and router Network level address IP or internet address (TCP/IP) Network service access point or NSAP (OSI) Process within the system Port number (TCP/IP) Service access point or SAP (OSI)
117
Trace of Simple Operation
Process associated with port 1 in host A sends message to port 2 in host B Process at A hands down message to TCP to send to port 2 TCP hands down to IP to send to host B IP hands down to network layer (e.g. Ethernet) to send to router J Generates a set of encapsulated PDUs
118
PDUs in TCP/IP
119
Example Header Information
Destination port Sequence number Checksum
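For example, the checksum carried in TCP and IP headers is the one's-complement Internet checksum; a compact Python version of the RFC 1071 algorithm:

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement checksum used in TCP/IP headers (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # sum 16-bit words
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF

print(hex(internet_checksum(b"\x45\x00\x00\x1c")))
```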
120
Some Protocols in TCP/IP Suite
121
Chapter 3 TCP and IP
122
TCP/IP Protocol Suite The TCP/IP Model, or Internet Protocol Suite, describes a set of general design guidelines and implementations of specific networking protocols to enable computers to communicate over a network. TCP/IP provides end-to-end connectivity specifying how data should be formatted, addressed, transmitted, routed and received at the destination. Protocols exist for a variety of different types of communication services between computers.
123
TCP/IP Protocol Suite History The Internet Protocol Suite (commonly known as TCP/IP Suite or the TCP/IP Model) is the set of computer network communications protocols and a description framework used for the Internet and other similar networks. The TCP/IP Model was created in the 1970s by DARPA, an agency of the United States Department of Defense (DOD). It evolved from ARPANET, which was the world's first wide area network and a predecessor of the Internet.
124
TCP/IP Protocol Suite Target goals
The DOD wanted to build a network to connect a number of military sites. The key requirements for the network were as follows: It must continue to function during nuclear war. The 7/8th rule required that the network should continue to function even when 7/8th of the network was not operational. It must be completely decentralized with no key central installation that could be destroyed and bring down the whole network. It must be fully redundant and able to continue communication between A and B even though intermediate sites and links might stop functioning during the conversation. The architecture must be flexible as the envisaged range of applications for the network was wide: from file transfer to time-sensitive data such as voice.
125
TCP/IP Protocol Suite TCP/IP has evolved. The protocols within the TCP/IP Suite have been tested, modified, and improved over time. The original TCP/IP protocol suite targeted the management of a large, evolving internetwork. Some TCP/IP goals included: Hardware independence - A protocol suite that could be used on a Mac, PC, mainframe, or any other computer. Software independence - A protocol suite that could be used by different software vendors and applications. This would enable a host on one site to communicate with a host on another site, without having the same software configuration: heterogeneous networks. Failure recovery and the ability to handle high error rates - A protocol suite that featured automatic recovery from any dropped or lost data. This protocol must be able to recover from an outage of any host on any part of the network and at any point in a data transfer.
126
TCP/IP Protocol Suite Efficient protocol with low overhead - A protocol suite that had a minimal amount of “extra” data moving with the data being transferred. This extra data, called overhead, functions as packaging for the data being transferred and enables the data transmission. Overhead is similar to an envelope used to send a letter, or a box used to send a bigger item—having too much overhead is as inefficient as using a large crate to send someone a necklace. Ability to add new networks to the internetwork without service disruption - A protocol suite that enabled new, independent networks to join this network of networks without bringing down the larger internetwork.
127
Routable Data - A protocol suite on which data could make its way through an internetwork of computers to any possible destination. For this to be possible, a single and meaningful addressing scheme must be used so that every computer that is moving the data can compute the best path for every piece of data as it moves through the network.
128
TCP/IP Protocol Suite Protocol Layers The TCP/IP protocol suite was developed before the OSI model was published. As a result, it does not use the OSI model as a reference. TCP/IP was developed using the Department of Defense (DoD) reference model. It’s important to be familiar with the OSI model, though, because OSI is used to compare the TCP/IP Suite with other protocol suites. Unlike the OSI model, the TCP/IP model has four layers. Still, the DoD model answers the same questions about network communications as the OSI model. An early architectural document, RFC 1122, emphasizes architectural principles over layering.
130
TCP/IP Protocol Suite Protocol Layers
End-to-End Principle: This principle has evolved over time. Its original expression put the maintenance of state and overall intelligence at the edges, and assumed the Internet that connected the edges retained no state and concentrated on speed and simplicity. Real-world needs for firewalls, network address translators, web content caches and the like have forced changes in this principle. Robustness Principle: "In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior. That is, it must be careful to send well-formed datagrams, but must accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear)" (RFC 791).
131
TCP/IP Protocol Suite Protocol Layers
RFC 1122 defines a four-layer model, with the layers having names, not numbers, as follows: Application (process-to-process) Layer: This is the scope within which applications create user data and communicate this data to other processes or applications on another or the same host. The communications partners are often called peers. This is where the "higher level" protocols such as SMTP, FTP, SSH, HTTP, etc. operate. Transport (host-to-host) Layer: The Transport Layer constitutes the networking regime between two network hosts, either on the local network or on remote networks separated by routers. The Transport Layer provides a uniform networking interface that hides the actual topology (layout) of the underlying network connections. This is where flow-control, error-correction, and connection protocols exist, such as TCP. This layer deals with opening and maintaining connections between Internet hosts.
132
TCP/IP Protocol Suite Protocol Layers (RFC 1122, continued): Internet (internetworking) Layer: The Internet Layer has the task of exchanging datagrams across network boundaries. It is therefore also referred to as the layer that establishes internetworking; indeed, it defines and establishes the Internet. This layer defines the addressing and routing structures used for the TCP/IP protocol suite. The primary protocol in this scope is the Internet Protocol, which defines IP addresses. Its function in routing is to transport datagrams to the next IP router that has the connectivity to a network closer to the final data destination. Link Layer: This layer defines the networking methods with the scope of the local network link on which hosts communicate without intervening routers. This layer describes the protocols used to describe the local network topology and the interfaces needed to effect transmission of Internet Layer datagrams to next-neighbor hosts.
133
TCP/IP Protocol Suite Protocol Layers
Encapsulation sequence of application data from UDP to a Link protocol frame: Data encapsulation - During a transmission, data crosses each one of the layers at the source machine. At each layer, a piece of information is added to the data packet: this is the header, a collection of information which guarantees transmission. At the destination machine, when passing through each layer, the header is read, and then deleted. So, upon its receipt, the message is in its original state. At each level, the data packet changes aspect, because a header is added to it, so the designations change according to the layers: The data packet is called a message at the Application Layer The message is then encapsulated in the form of a segment in the Transport Layer Once the segment is encapsulated in the Internet Layer it takes the name of datagram Finally, we talk about a frame at the Link Layer
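A toy sketch of this naming sequence as nested encapsulation; the header and trailer contents are fake placeholders, only the nesting order matters:

```python
def encapsulate(app_data: bytes) -> bytes:
    """Walk the four PDU names down the stack (headers are fake)."""
    message  = app_data                            # Application Layer
    segment  = b"<tcp-hdr>" + message              # Transport Layer
    datagram = b"<ip-hdr>" + segment               # Internet Layer
    frame    = b"<eth-hdr>" + datagram + b"<fcs>"  # Link Layer
    return frame

print(encapsulate(b"GET / HTTP/1.1"))
```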
134
Comparison with TCP/IP
Pretty similar to OSI. TCP/IP has fewer layers (four). Main difference in layers is after layer 4
135
Protocol Reference Model of OSI
OSI Overview 4. Data Encapsulation a) PDU conception – each protocol on the different layers has its own format. b) Headers are added while a packet is going down the stack at each layer. c) Trailers are usually added at the second layer.
136
TCP/IP becomes a standard
A standard for software A standard for hardware Four-layer architecture Each layer independent of the others
137
Chapter 4 Frame Relay
138
What is Frame Relay? high-performance WAN protocol
operates at the physical and data link layers Originally designed for use across ISDN interfaces An example of packet-switched technology described as a streamlined version of X.25
139
Frame Relay vs. X.25 Frame Relay is a Layer 2 protocol suite, whereas X.25 provides services at Layer 3 Frame Relay offers higher performance and greater transmission efficiency than X.25
140
Frame Relay Devices data terminal equipment (DTE)
terminating equipment for a specific network typically are located on the premises of a customer Examples: terminals, personal computers, routers, and bridges
141
Frame Relay Devices data circuit-terminating equipment (DCE)
carrier-owned internetworking devices to provide clocking and switching services in a network actually transmit data through the WAN
142
Figure 1 Frame Relay Devices
143
Frame Relay Virtual Circuits
provides connection-oriented data link layer communication: a logical connection between two DTE devices across a Frame Relay packet-switched network that provides a bi-directional communications path from one DTE device to another
144
Frame Relay Virtual Circuits
Switched virtual circuits (SVCs) temporary connections, used when only sporadic data transfer is required between DTE devices across the Frame Relay network Four operational states: Call Setup, Data Transfer, Idle, Call Termination
145
Frame Relay Virtual Circuits
Permanent Virtual Circuits (PVCs) used for frequent and consistent data transfers between DTE devices across the Frame Relay network Two operational states: Data Transfer and Idle
146
Congestion Control Mechanism
Forward-explicit congestion notification (FECN) Backward-explicit congestion notification (BECN)
147
Forward-explicit congestion notification (FECN)
initiated when a DTE device sends Frame Relay frames into the network; DCE devices set the FECN bit to 1 on congested paths When the frames reach the destination DTE device, the set FECN bit indicates that the frame experienced congestion in the path from source to destination flow-control may be initiated, or the indication may be ignored
148
Backward-explicit congestion notification (BECN)
DCE devices set the value of the BECN bit to 1 in frames traveling in the opposite direction; this informs the receiving DTE device that a particular path through the network is congested flow-control may be initiated, or the indication may be ignored
149
Frame Relay Discard Eligibility (DE)
The DE bit is used to indicate that a frame has lower importance than other frames When the network becomes congested, DCE devices will discard frames with the DE bit set before discarding those without it
150
Frame Relay Error Checking
Uses a common error-checking mechanism known as the cyclic redundancy check (CRC) The CRC compares two calculated values to determine whether errors occurred during the transmission
151
Frame Relay Network Implementation
consists of a number of DTE devices connected to remote ports on multiplexer equipment via traditional point-to-point services
152
Frame Relay Network Implementation
Figure 2 A simple Frame Relay network connects various devices to different services over a WAN.
153
Public Carrier-Provided Networks
Frame Relay switching equipment is located in the central offices of a telecommunications carrier The DCE equipment also is owned by the telecommunications provider The majority of today’s Frame Relay networks are public carrier-provided networks
154
Private Enterprise Networks
the administration and maintenance of the network are the responsibilities of the enterprise All the equipment, including the switching equipment, is owned by the customer
155
Frame Relay Frames
Figure 3 Frame Relay Frame
156
Frame Relay Frames Flags indicate the beginning and end of the frame
Three primary components make up the Frame Relay frame the header and address area the user-data portion the frame-check sequence (FCS)
157
Frame Relay Frames The address area (2 bytes)
10 bits represent the actual circuit identifier (the DLCI) 6 bits are fields related to congestion management
158
Frame Relay Frame Formats
Standard Frame Relay Frame LMI Frame Format
159
Standard Frame Relay Frame
Flags Delimits the beginning and end of the frame The value of this field is always the same (7E hex, or 01111110 binary)
160
Standard Frame Relay Frame
Address – contains the following information DLCI: The 10-bit DLCI is the essence of the Frame Relay header, values have local significance only, devices at opposite ends can use different DLCI values for the same virtual connection
161
Standard Frame Relay Frame
Address Extended Address (EA): used to indicate whether the byte in which the EA value is 1 is the last addressing field, the eighth bit of each byte of the Address field is used to indicate the EA
162
Standard Frame Relay Frame
Address Congestion Control: consists of the three bits; FECN, BECN, and DE bits
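Pulling the address-field pieces together, a sketch of parsing the default 2-byte address, assuming the usual Q.922 layout (DLCI high bits, C/R, EA in the first byte; DLCI low bits, FECN, BECN, DE, EA in the second); not a full parser:

```python
def parse_fr_address(b1: int, b2: int) -> dict:
    """Parse a 2-byte Frame Relay address field (assumed layout)."""
    dlci = ((b1 >> 2) << 4) | (b2 >> 4)   # 6 high bits + 4 low bits
    return {
        "dlci": dlci,
        "fecn": bool(b2 & 0x08),
        "becn": bool(b2 & 0x04),
        "de":   bool(b2 & 0x02),
        "ea":   bool(b2 & 0x01),          # 1 marks the last address byte
    }

print(parse_fr_address(0x18, 0x41))  # DLCI 100, no congestion bits set
```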
163
Standard Frame Relay Frame
Data – Contains encapsulated upper-layer data; serves to transport the higher-layer protocol packet (PDU) through a Frame Relay network
164
Standard Frame Relay Frame
Frame Check Sequence Ensures the integrity of transmitted data
165
LMI Frame Format Figure 4 Nine fields comprise the Frame Relay frame that conforms to the LMI format
166
LMI Frame Format Flag - Delimits the beginning and end of the frame
LMI DLCI - Identifies the frame as an LMI frame instead of a basic Frame Relay frame Unnumbered Information Indicator - Sets the poll/final bit to zero
167
LMI Frame Format Protocol Discriminator - Always contains a value indicating that the frame is an LMI frame Call Reference - Always contains zeros. This field currently is not used for any purpose Message Type Status-inquiry message: Allows a user device to inquire about the status of the network Status message: Responds to status-inquiry messages. Status messages include keep-alives and PVC status messages
168
LMI Frame Format Information Elements—Contains a variable number of individual information elements (IEs) IE Identifier: Uniquely identifies the IE IE Length: Indicates the length of the IE Data: Consists of one or more bytes containing encapsulated upper-layer data Frame Check Sequence (FCS) - Ensures the integrity of transmitted data
169
Chapter 5 Asynchronous Transfer Mode (ATM)
170
ATM By the mid 1980s, three types of communication networks had evolved. The telephone network carries voice calls, the television network carries video transmissions, and the newly emerging computer network carries data. Telephone companies realized that voice communication was becoming a commodity service and that the profit margin would decrease over time. They realized that data communication was increasing. The telecommunication industry decided to expand its business by developing networks to carry traffic other than voice.
171
Goal of ATM (extremely ambitious)
Universal Service Support for all users Single, unified infrastructure Service guarantees Support for low-cost devices
172
ATM The phone companies created the Integrated Services Digital Network (ISDN) and Asynchronous Transfer Mode (ATM). ATM is intended as a universal networking technology that handles voice, video, and data transmission. ATM uses a connection-oriented paradigm in which an application first creates a virtual channel (VC), uses the channel for communication, and then terminates it. The communication is implemented by one or more ATM switches, each of which places an entry for the VC in its forwarding table.
173
ATM There are two types of ATM VCs: a PVC is created manually and survives power failures, and an SVC is created on demand. When creating a VC, a computer must specify quality of service (QoS) requirements. The ATM hardware either reserves the requested resources or denies the request.
174
Development of ATM ATM designers faced a difficult challenge because the three intended uses (voice, video, and data) have different sets of requirements. For example, both voice and video require low delay and low jitter (i.e., low variance in delay), which make it possible to deliver audio and video smoothly, without gaps or delays in the output. Video requires a substantially higher data rate than audio. Most data networks introduce jitter as they handle packets.
175
Development of ATM To allow packet switches to operate at high speeds and to achieve low delay, low jitter, and echo cancellation, ATM technology divides all data into small, fixed-size packets called cells. Each ATM cell contains exactly 53 octets. 5 octets for header 48 octets for data
176
ATM Cell Structure (5-octet header followed by 48 data octets):
Octet 1: Flow control (4 bits), VPI (first 4 bits)
Octet 2: VPI (last 4 bits), VCI (first 4 bits)
Octet 3: VCI (middle 8 bits)
Octet 4: VCI (last 4 bits), payload type (3 bits), PRIO (1 bit)
Octet 5: Cyclic redundancy check
48 data octets start here
177
ATM design and cells ATM was designed to be completely general. We would like a large cell for data and a small cell for voice. In ATM, cell size is chosen as a compromise between large cells and small cells. The header is about 10% of the payload area. In Ethernet: data => 1500 octets header => 14 octets cell tax => 1% In ATM: data => 48 octets header => 5 octets cell tax => 10%
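The cell-tax comparison is just header size as a fraction of payload:

```python
# The slide's cell-tax arithmetic: header bytes over payload bytes.
atm_tax = 5 / 48        # ATM: 5-octet header per 48-octet payload
eth_tax = 14 / 1500     # Ethernet: 14-octet header per 1500-octet payload
print(f"ATM cell tax: {atm_tax:.1%}")       # ~10.4%
print(f"Ethernet cell tax: {eth_tax:.1%}")  # ~0.9%
```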
178
ATM: Connection oriented
After the establishment of a connection between sender and receiver, the network hardware returns a connection identifier (a binary value) to each of the two computers. When the sender sends cells, it places the connection identifier in each cell header. When it receives a cell, an ATM switch extracts the connection identifier and consults a table to determine how to forward the cell.
179
VPI/VCI Formally, an ATM connection is known as a virtual channel (VC). ATM assigns each VC a 24-bit identifier that is divided into 2 parts to produce a hierarchy. The first part, a virtual path identifier (VPI), specifies the path the VC follows through the network. A VPI is 8 bits long. The second part, a Virtual Channel Identifier (VCI), specifies a single VC within the path. A VCI is 16 bits long.
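A small sketch of that 24-bit split (the example value is illustrative):

```python
def split_vc_identifier(vc_id: int):
    """Split a 24-bit ATM VC identifier into (VPI, VCI): the top 8 bits
    are the virtual path identifier, the low 16 bits the VCI."""
    vpi = (vc_id >> 16) & 0xFF
    vci = vc_id & 0xFFFF
    return vpi, vci

print(split_vc_identifier(0x050064))  # -> (5, 100): path 5, channel 100
```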
180
ATM Protocol Layers (figure): an ATM endpoint runs the ATM Adaptation Layer, ATM Layer, and Physical Layer; an intermediate ATM switch runs only the ATM Layer and Physical Layer; endpoints and switch are joined by the physical medium.
181
ATM Protocol Layer Physical Layer: The lowest layer in the ATM protocol. It describes the physical transmission media. We can use shielded and unshielded twisted pair, coaxial cable, and fiber-optic cable. ATM Layer: It performs all functions relating to the routing and multiplexing of cells over VCs. It adds a header to the segment streams generated by the AAL. Similarly, on receipt of a cell stream, it removes the header from each cell and passes the cell contents to the AAL protocol. To perform all these functions, the ATM layer maintains a table which contains a list of VCIs.
182
ATM Protocol Layer ATM Adaptation Layer: Top layer in the ATM protocol model. It converts the submitted information into streams of 48-octet segments and transports these in the payload field of multiple ATM cells. Similarly, on receipt of the stream of cells relating to the same call, it converts the 48-octet information field into the required form for delivery to the particular higher protocol layer. Currently five service types have been defined. They are referred to as AAL1-5. AAL1 and AAL2 are connection oriented. AAL1 provides a constant bit rate (CBR) service, whereas AAL2 provides a variable bit rate (VBR) service. Initially, AAL3 was defined to provide connection-oriented and VBR service. Later, this service type was dropped and it is now merged with AAL4. Both AAL 3/4 and AAL5 provide a similar connectionless VBR service.
183
Disadvantages ATM has not been widely accepted, although some phone companies still use it in their backbone networks. The expense, complexity and lack of interoperability with other technologies have prevented ATM from becoming more prevalent.
184
Disadvantages Expense: Although ATM technology provides a comprehensive list of services, even a moderate ATM switch costs much more than inexpensive LAN hardware. In addition, the network interface card needed to connect a computer to an ATM network is significantly more expensive than a corresponding Ethernet NIC. Connection Setup Latency: ATM’s connection-oriented paradigm introduces significant delay for distant communication. The time required to set up and tear down the ATM VC for distant communication is significantly larger than the time required to use it.
185
Disadvantages Cell Tax: ATM cell headers impose a 10% tax on all data transfer. In the case of Ethernet, the cell tax is 1%. Lack of Efficient Broadcast: Connection-oriented networks like ATM are sometimes called Non Broadcast Multiple Access (NBMA) networks because the hardware does not support broadcast or multicast. On an ATM network, broadcast to a set of computers is ‘simulated’ by arranging for an application program to pass a copy of the data to each computer in the set. As a result, broadcast is inefficient.
186
Disadvantages Complexity of QoS: The complexity of the specification makes implementation cumbersome and difficult. Many implementations do not support the full standard. Assumption of Homogeneity: ATM is designed to be a single, universal networking system. There is minimal provision for interoperating with other technologies
187
Chapter 6 High-Speed LANs
188
Introduction range of technologies Fast and Gigabit Ethernet
Fibre Channel High Speed Wireless LANs The most important of these are • Fast Ethernet and Gigabit Ethernet: The extension of 10-Mbps CSMA/CD (carrier sense multiple access with collision detection) to higher speeds is a logical strategy, because it tends to preserve the investment in existing systems. • Fibre Channel: This standard provides a low-cost, easily scalable approach to achieving very high data rates in local areas. • High-speed wireless LANs: Wireless LAN technology and standards have at last come of age, and high-speed standards and products are being introduced (see next chapter)
189
Why High Speed LANs? speed and power of PCs has risen
graphics-intensive applications and GUIs see LANs as essential to organizations for client/server computing now have requirements for centralized server farms power workgroups high-speed local backbone In recent years, two significant trends have altered the role of the personal computer, increased the volume of data to be handled over LANs, and therefore the requirements on the LAN: • The speed and computing power of personal computers has continued to enjoy explosive growth • MIS organizations have recognized the LAN as a viable and indeed essential computing platform, resulting in the focus on network computing. The following are examples of requirements that call for higher-speed LANs: • Centralized server farms: In many applications, there is a need for user, or client, systems to be able to draw huge amounts of data from multiple centralized servers, called server farms.. As the performance of the servers themselves has increased, the bottleneck has shifted to the network. • Power workgroups: These groups typically consist of a small number of cooperating users who need to draw massive data files across the network. In such cases, large amounts of data are distributed to several workstations, processed, and updated at very high speed for multiple iterations. • High-speed local backbone: As processing demand grows, LANs proliferate at a site, and high-speed interconnection is necessary.
190
Ethernet (CSMA/CD) most widely used LAN standard developed by
Xerox - original Ethernet IEEE 802.3 Carrier Sense Multiple Access with Collision Detection (CSMA/CD) random / contention access to media The most widely used high-speed LANs today are based on Ethernet and were developed by the IEEE standards committee. As with other LAN standards, there is both a medium access control layer and a physical layer. The media access uses CSMA/CD. This and its precursors can be termed random access, or contention, techniques. They are random access in the sense that there is no predictable or scheduled time for any station to transmit; station transmissions are ordered randomly. They exhibit contention in the sense that stations contend for time on the shared medium.
191
ALOHA developed for packet radio nets when station has frame, it sends
then listens for a bit over max round trip time if receive ACK then fine if not, retransmit if no ACK after repeated transmissions, give up uses a frame check sequence (as in HDLC) frame may be damaged by noise or by another station transmitting at the same time (collision) any overlap of frames causes collision max utilization 18% The earliest of these techniques, known as ALOHA, was developed for packet radio networks. However, it is applicable to any shared transmission medium. ALOHA, or pure ALOHA as it is sometimes called, specifies that a station may transmit a frame at any time. The station then listens for an amount of time equal to the maximum possible round-trip propagation delay on the network (twice the time it takes to send a frame between the two most widely separated stations) plus a small fixed time increment. If the station hears an acknowledgment during that time, fine; otherwise, it resends the frame. If the station fails to receive an acknowledgment after repeated transmissions, it gives up. A receiving station determines the correctness of an incoming frame by examining a frame check sequence field, as in HDLC. If the frame is valid and if the destination address in the frame header matches the receiver's address, the station immediately sends an acknowledgment. The frame may be invalid due to noise on the channel or because another station transmitted a frame at about the same time. In the latter case, the two frames may interfere with each other at the receiver so that neither gets through; this is known as a collision. If a received frame is determined to be invalid, the receiving station simply ignores the frame. ALOHA is as simple as can be, and pays a penalty for it. Because the number of collisions rises rapidly with increased load, the maximum utilization of the channel is only about 18%.
192
Slotted ALOHA time on channel based on uniform slots equal to frame transmission time need central clock (or other sync mechanism) transmission begins at slot boundary frames either miss or overlap totally max utilization 37% both have poor utilization fail to use fact that propagation time is much less than frame transmission time To improve efficiency, a modification of ALOHA, known as slotted ALOHA, was developed. In this scheme, time on the channel is organized into uniform slots whose size equals the frame transmission time. Some central clock or other technique is needed to synchronize all stations. Transmission is permitted to begin only at a slot boundary. Thus, frames that do overlap will do so totally. This increases the maximum utilization of the system to about 37%. Both ALOHA and slotted ALOHA exhibit poor utilization. Both fail to take advantage of one of the key properties of both packet radio networks and LANs, which is that propagation delay between stations may be very small compared to frame transmission time.
193
CSMA stations soon know transmission has started
so first listen for clear medium (carrier sense) if medium idle, transmit if two stations start at the same instant, collision wait reasonable time if no ACK then retransmit collisions occur at leading edge of frame max utilization depends on propagation time (medium length) and frame length The foregoing observations led to the development of carrier sense multiple access (CSMA). With CSMA, a station wishing to transmit first listens to the medium to determine if another transmission is in progress (carrier sense). If the medium is in use, the station must wait. If the medium is idle, the station may transmit. It may happen that two or more stations attempt to transmit at about the same time. If this happens, there will be a collision; the data from both transmissions will be garbled and not received successfully. To account for this, a station waits a reasonable amount of time after transmitting for an acknowledgment, taking into account the maximum round-trip propagation delay and the fact that the acknowledging station must also contend for the channel to respond. If there is no acknowledgment, the station assumes that a collision has occurred and retransmits. This strategy is effective for networks in which the average frame transmission time is much longer than the propagation time. Collisions can occur only when more than one user begins transmitting within a short time interval (the period of the propagation delay). If a station begins to transmit a frame, and there are no collisions during the time it takes for the leading edge of the packet to propagate to the farthest station, then there will be no collision for this frame because all other stations are now aware of the transmission. The maximum utilization achievable using CSMA can far exceed that of ALOHA or slotted ALOHA. The maximum utilization depends on the length of the frame and on the propagation time; the longer the frames or the shorter the propagation time, the higher the utilization.
194
Nonpersistent CSMA Nonpersistent CSMA rules: if medium idle, transmit
if medium busy, wait amount of time drawn from probability distribution (retransmission delay) & retry random delays reduces probability of collisions capacity is wasted because medium will remain idle following end of transmission nonpersistent stations are deferential With CSMA, an algorithm is needed to specify what a station should do if the medium is found busy. One algorithm is nonpersistent CSMA. A station wishing to transmit listens to the medium and obeys the following rules: 1. If the medium is idle, transmit; otherwise, go to step 2. 2. If the medium is busy, wait an amount of time drawn from a probability distribution (the retransmission delay) and repeat step 1. The use of random delays reduces the probability of collisions. To see this, consider that two stations become ready to transmit at about the same time while another transmission is in progress; if both stations delay the same amount of time before trying again, they will both attempt to transmit at about the same time. A problem with nonpersistent CSMA is that capacity is wasted because the medium will generally remain idle following the end of a transmission even if there are one or more stations waiting to transmit.
195
1-persistent CSMA 1-persistent CSMA avoids idle channel time
1-persistent CSMA rules: if medium idle, transmit; if medium busy, listen until idle; then transmit immediately 1-persistent stations are selfish if two or more stations waiting, a collision is guaranteed To avoid idle channel time, the 1-persistent protocol can be used. A station wishing to transmit listens to the medium and obeys the following rules: 1. If the medium is idle, transmit; otherwise, go to step 2. 2. If the medium is busy, continue to listen until the channel is sensed idle; then transmit immediately. Whereas nonpersistent stations are deferential, 1-persistent stations are selfish. If two or more stations are waiting to transmit, a collision is guaranteed. Things get sorted out only after the collision.
196
P-persistent CSMA a compromise to try and reduce collisions and idle time p-persistent CSMA rules: if medium idle, transmit with probability p, and delay one time unit with probability (1–p) if medium busy, listen until idle and repeat step 1 if transmission is delayed one time unit, repeat step 1 issue of choosing effective value of p to avoid instability under heavy load A compromise that attempts to reduce collisions, like nonpersistent, and reduce idle time, like 1-persistent, is p-persistent. The rules are: 1. If the medium is idle, transmit with probability p, and delay one time unit with probability (1 – p). The time unit is typically equal to the maximum propagation delay. 2. If the medium is busy, continue to listen until the channel is idle and repeat step 1. 3. If transmission is delayed one time unit, repeat step 1. The question arises as to what is an effective value of p. The main problem to avoid is one of instability under heavy load.
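One time unit of these rules could be sketched as follows (simplified; a real station would also track the busy-to-idle transition):

```python
import random

def p_persistent_step(medium_idle: bool, p: float = 0.1) -> str:
    """One time unit of p-persistent CSMA (simplified sketch)."""
    if not medium_idle:
        return "listen until idle"     # step 2: then repeat step 1
    if random.random() < p:
        return "transmit"              # step 1: with probability p
    return "delay one time unit"       # step 1: with probability 1 - p

# At low load a lone station waits on average 1/p time units before
# sending, so a small p (safe under heavy load) means long idle delays.
```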
197
Value of p? have n stations waiting to send
at end of tx, expected no of stations is np if np>1 on average there will be a collision repeated tx attempts mean collisions likely eventually when all stations trying to send have continuous collisions hence zero throughput thus want np<1 for expected peaks of n if heavy load expected, p small but smaller p means stations wait longer Consider the case in which n stations have frames to send while a transmission is taking place. At the end of the transmission, the expected number of stations that will attempt to transmit is equal to the number of stations ready to transmit times the probability of transmitting, or np. If np is greater than 1, on average multiple stations will attempt to transmit and there will be a collision. What is more, as soon as all these stations realize that their transmission suffered a collision, they will be back again, almost guaranteeing more collisions. Worse yet, these retries will compete with new transmissions from other stations, further increasing the probability of collision. Eventually, all stations will be trying to send, causing continuous collisions, with throughput dropping to zero. To avoid this catastrophe, np must be less than one for the expected peaks of n; therefore, if a heavy load is expected to occur with some regularity, p must be small. However, as p is made smaller, stations must wait longer to attempt transmission. At low loads, this can result in very long delays. For example, if only a single station desires to transmit, the expected number of iterations of step 1 is 1/p. Thus, if p = 0.1, at low load, a station will wait an average of 9 time units before transmitting on an idle line.
198
CSMA/CD Description with CSMA, collision occupies medium for duration of transmission better if stations listen whilst transmitting CSMA/CD rules: if medium idle, transmit if busy, listen for idle, then transmit if collision detected, jam and then cease transmission after jam, wait random time then retry CSMA, although more efficient than ALOHA or slotted ALOHA, still has one glaring inefficiency. When two frames collide, the medium remains unusable for the duration of transmission of both damaged frames. For long frames, compared to propagation time, the amount of wasted capacity can be considerable. This waste can be reduced if a station continues to listen to the medium while transmitting. This leads to the following rules for CSMA/CD: 1. If the medium is idle, transmit; otherwise, go to step 2. 2. If the medium is busy, continue to listen until the channel is idle, then transmit immediately. 3. If a collision is detected during transmission, transmit a brief jamming signal to assure that all stations know that there has been a collision and then cease transmission. 4. After transmitting the jamming signal, wait a random amount of time, referred to as the backoff, then attempt to transmit again (repeat from step 1).
199
CSMA/CD Operation Stallings DCC8e Figure 16.2 illustrates the technique for a baseband bus. The upper part of the figure shows a bus LAN layout. At time t0, station A begins transmitting a packet addressed to D. At t1, both B and C are ready to transmit. B senses a transmission and so defers. C, however, is still unaware of A's transmission (because the leading edge of A's transmission has not yet arrived at C) and begins its own transmission. When A's transmission reaches C, at t2, C detects the collision and ceases transmission. The effect of the collision propagates back to A, where it is detected some time later, t3, at which time A ceases transmission. With CSMA/CD, the amount of wasted capacity is reduced to the time it takes to detect a collision. Question: How long does that take? Let us consider the case of a baseband bus and consider two stations as far apart as possible. For example, in Figure 16.2, suppose that station A begins a transmission and that just before that transmission reaches D, D is ready to transmit. Because D is not yet aware of A's transmission, it begins to transmit. A collision occurs almost immediately and is recognized by D. However, the collision must propagate all the way back to A before A is aware of the collision. By this line of reasoning, we conclude that the amount of time that it takes to detect a collision is no greater than twice the end-to-end propagation delay.
200
Which Persistence Algorithm?
IEEE uses 1-persistent both nonpersistent and p-persistent have performance problems 1-persistent seems more unstable than p-persistent because of greed of the stations but wasted time due to collisions is short with random backoff unlikely to collide on next attempt to send An important rule followed in most CSMA/CD systems, including the IEEE standard, is that frames should be long enough to allow collision detection prior to the end of transmission. If shorter frames are used, then collision detection does not occur, and CSMA/CD exhibits the same performance as the less efficient CSMA protocol. For a CSMA/CD LAN, the question arises as to which persistence algorithm to use. You may be surprised to learn that the algorithm used in the IEEE standard is 1-persistent. Recall that both nonpersistent and p-persistent have performance problems. In the nonpersistent case, capacity is wasted because the medium will generally remain idle following the end of a transmission even if there are stations waiting to send. In the p-persistent case, p must be set low enough to avoid instability, with the result of sometimes atrocious delays under light load. The 1-persistent algorithm, which means, after all, that p = 1, would seem to be even more unstable than p-persistent due to the greed of the stations. What saves the day is that the wasted time due to collisions is mercifully short (if the frames are long relative to propagation delay), and with random backoff, the two stations involved in a collision are unlikely to collide on their next tries.
201
Binary Exponential Backoff
for backoff stability, IEEE and Ethernet both use binary exponential backoff stations repeatedly resend when collide on first 10 attempts, mean random delay doubled value then remains same for 6 further attempts after 16 unsuccessful attempts, station gives up and reports error 1-persistent algorithm with binary exponential backoff efficient over wide range of loads but backoff algorithm has last-in, first-out effect To ensure that backoff maintains stability, IEEE and Ethernet use a technique known as binary exponential backoff. A station will attempt to transmit repeatedly in the face of repeated collisions. For the first 10 retransmission attempts, the mean value of the random delay is doubled. This mean value then remains the same for 6 additional attempts. After 16 unsuccessful attempts, the station gives up and reports an error. Thus, as congestion increases, stations back off by larger and larger amounts to reduce the probability of collision. The beauty of the 1-persistent algorithm with binary exponential backoff is that it is efficient over a wide range of loads. At low loads, 1-persistence guarantees that a station can seize the channel as soon as it goes idle, in contrast to the non- and p-persistent schemes. At high loads, it is at least as stable as the other techniques. However, one unfortunate effect of the backoff algorithm is that it has a last-in first-out effect; stations with no or few collisions will have a chance to transmit before stations that have waited longer.
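A minimal sketch of the truncated binary exponential backoff rule just described, assuming the 512-bit (51.2 µs) slot time of 10-Mbps Ethernet:

import random

SLOT_TIME = 51.2e-6  # 512 bit-times at 10 Mbps

def backoff_delay(collisions):
    # Truncated binary exponential backoff: after the n-th collision,
    # wait r slot times with r uniform on 0 .. 2^k - 1, k = min(n, 10).
    if collisions > 16:
        raise RuntimeError("16 unsuccessful attempts: give up, report error")
    k = min(collisions, 10)          # mean doubles for the first 10 attempts,
    r = random.randint(0, 2**k - 1)  # then stays the same for attempts 11-16
    return r * SLOT_TIME

for n in (1, 2, 10, 16):
    print(n, backoff_delay(n))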
202
Collision Detection on baseband bus
collision produces higher signal voltage collision detected if cable signal greater than single station signal signal is attenuated over distance limit to 500m (10Base5) or 200m (10Base2) on twisted pair (star-topology) activity on more than one port is collision use special collision presence signal For baseband bus, a collision should produce substantially higher voltage swings than those produced by a single transmitter. Accordingly, the IEEE standard dictates that the transmitter will detect a collision if the signal on the cable at the transmitter tap point exceeds the maximum that could be produced by the transmitter alone. Because a transmitted signal attenuates as it propagates, there is a potential problem: If two stations far apart are transmitting, each station will receive a greatly attenuated signal from the other. The signal strength could be so small that when it is added to the transmitted signal at the transmitter tap point, the combined signal does not exceed the CD threshold. For this reason, among others, the IEEE standard restricts the maximum length of coaxial cable to 500 m for 10BASE5 and 200 m for 10BASE2. A much simpler collision detection scheme is possible with the twisted-pair star-topology approach. In this case, collision detection is based on logic rather than sensing voltage magnitudes. For any hub, if there is activity (signal) on more than one input, a collision is assumed. A special signal called the collision presence signal is generated. This signal is generated and sent out as long as activity is sensed on any of the input lines. This signal is interpreted by every node as an occurrence of a collision.
203
IEEE Frame Format Stallings DCC8e Figure 16.3 depicts the frame format for the protocol. It consists of the following fields: • Preamble: A 7-octet pattern of alternating 0s and 1s used by the receiver to establish bit synchronization. • Start Frame Delimiter (SFD): The sequence 10101011, which indicates the actual start of the frame and enables the receiver to locate the first bit of the rest of the frame. • Destination Address (DA): Specifies the station(s) for which the frame is intended. It may be a unique physical address, a group address, or a global address. • Source Address (SA): Specifies the station that sent the frame. • Length/Type: Length of LLC data field in octets, or Ethernet Type field, depending on whether the frame conforms to the IEEE standard or the earlier Ethernet specification. In either case, the maximum frame size, excluding the Preamble and SFD, is 1518 octets. • LLC Data: Data unit supplied by LLC. • Pad: Octets added to ensure that the frame is long enough for proper CD operation. • Frame Check Sequence (FCS): A 32-bit cyclic redundancy check, based on all fields except preamble, SFD, and FCS.
204
10Mbps Specification (Ethernet)
The IEEE committee has defined a number of alternative physical configurations. This is both good and bad. On the good side, the standard has been responsive to evolving technology. On the bad side, the customer, not to mention the potential vendor, is faced with a bewildering array of options. However, the committee has been at pains to ensure that the various options can be easily integrated into a configuration that satisfies a variety of needs. Thus, the user that has a complex set of requirements may find the flexibility and variety of the standard to be an asset. To distinguish the various implementations that are available, the committee has developed a concise notation: <data rate in Mbps> <signaling method><max segment length in hundreds of meters> The defined alternatives for 10-Mbps are: • 10BASE5: Specifies the use of 50-ohm coaxial cable and Manchester digital signaling. The maximum length of a cable segment is set at 500 meters. Can extend using up to 4 repeaters. • 10BASE2: lower-cost alternative to 10BASE5 using a thinner cable, with fewer taps over a shorter distance than the 10BASE5 cable. • 10BASE-T: Uses unshielded twisted pair in a star-shaped topology, with the length of a link limited to 100 meters. As an alternative, an optical fiber link may be used out to 500 m. • 10BASE-F: Contains three specifications using optical fibre
205
100Mbps Fast Ethernet Fast Ethernet refers to a set of specifications developed by the IEEE committee to provide a low-cost, Ethernet-compatible LAN operating at 100 Mbps. The blanket designation for these standards is 100BASE-T. The committee defined a number of alternatives to be used with different transmission media. Stallings DCC8e Table 16.3 summarizes key characteristics of the 100BASE-T options. All of the 100BASE-T options use the IEEE MAC protocol and frame format. 100BASE-X refers to a set of options that use two physical links between nodes; one for transmission and one for reception. 100BASE-TX makes use of shielded twisted pair (STP) or high-quality (Category 5) unshielded twisted pair (UTP). 100BASE-FX uses optical fiber. For all of the 100BASE-T options, the topology is similar to that of 10BASE-T, namely a star-wire topology. In many buildings, any of the 100BASE-X options requires the installation of new cable. For such cases, 100BASE-T4 defines a lower-cost alternative that can use Category 3, voice-grade UTP in addition to the higher-quality Category 5 UTP. To achieve the 100-Mbps data rate over lower-quality cable, 100BASE-T4 dictates the use of four twisted-pair lines between nodes, with the data transmission making use of three pairs in one direction at a time.
206
100BASE-X uses a unidirectional data rate 100 Mbps over single twisted pair or optical fiber link encoding scheme same as FDDI 4B/5B-NRZI two physical medium specifications 100BASE-TX uses two pairs of twisted-pair cable for tx & rx STP and Category 5 UTP allowed MLT-3 signaling scheme is used 100BASE-FX uses two optical fiber cables for tx & rx convert 4B/5B-NRZI code group into optical signals For all of the transmission media specified under 100BASE-X, a unidirectional data rate of 100 Mbps is achieved transmitting over a single link (single twisted pair, single optical fiber). For all of these media, an efficient and effective signal encoding scheme is required. The one chosen is referred to as 4B/5B-NRZI. This scheme is further modified for each option. The 100BASE-X designation includes two physical medium specifications, one for twisted pair, known as 100BASE-TX, and one for optical fiber, known as 100BASE-FX. 100BASE-TX makes use of two pairs of twisted-pair cable, one pair used for transmission and one for reception. Both STP and Category 5 UTP are allowed. The MLT-3 signaling scheme is used. 100BASE-FX makes use of two optical fiber cables, one for transmission and one for reception. With 100BASE-FX, a means is needed to convert the 4B/5B-NRZI code group stream into optical signals. The technique used is known as intensity modulation. A binary 1 is represented by a burst or pulse of light; a binary 0 is represented by either the absence of a light pulse or a light pulse at very low intensity.
207
100BASE-T4 100-Mbps over lower-quality Cat 3 UTP
takes advantage of large installed base does not transmit continuous signal between packets useful in battery-powered applications can not get 100 Mbps on single twisted pair so data stream split into three separate streams four twisted pairs used data transmitted and received using three pairs two pairs configured for bidirectional transmission use ternary signaling scheme (8B6T) 100BASE-T4 is designed to produce a 100-Mbps data rate over lower-quality Category 3 cable, thus taking advantage of the large installed base of Category 3 cable in office buildings. The specification also indicates that the use of Category 5 cable is optional. 100BASE-T4 does not transmit a continuous signal between packets, which makes it useful in battery-powered applications. For 100BASE-T4 using voice-grade Category 3 cable, it is not reasonable to expect to achieve 100 Mbps on a single twisted pair. Instead, 100BASE-T4 specifies that the data stream to be transmitted is split up into three separate data streams, each with an effective data rate of 33⅓ Mbps. Four twisted pairs are used. Data are transmitted using three pairs and received using three pairs. Thus, two of the pairs must be configured for bidirectional transmission. As with 100BASE-X, a simple NRZ encoding scheme is not used for 100BASE-T4. This would require a signaling rate of 33 Mbps on each twisted pair and does not provide synchronization. Instead, a ternary signaling scheme known as 8B6T is used.
208
100BASE-T Options Can summarize 100Base-T options as shown.
209
Full Duplex Operation traditional Ethernet half duplex
using full-duplex, station can transmit and receive simultaneously 100-Mbps Ethernet in full-duplex mode, giving a theoretical transfer rate of 200 Mbps stations must have full-duplex adapter cards and must use switching hub each station constitutes separate collision domain CSMA/CD algorithm no longer needed 802.3 MAC frame format used A traditional Ethernet is half duplex: a station can either transmit or receive a frame, but it cannot do both simultaneously. With full-duplex operation, a station can transmit and receive simultaneously. If a 100-Mbps Ethernet ran in full-duplex mode, the theoretical transfer rate becomes 200 Mbps. Several changes are needed to operate in full-duplex mode. The attached stations must have full-duplex rather than half-duplex adapter cards. The central point in the star wire cannot be a simple multiport repeater but rather must be a switching hub. In this case each station constitutes a separate collision domain. In fact, there are no collisions and the CSMA/CD algorithm is no longer needed. However, the same MAC frame format is used and the attached stations can continue to execute the CSMA/CD algorithm, even though no collisions can ever be detected.
210
Mixed Configurations Fast Ethernet supports mixture of existing 10-Mbps LANs and newer 100-Mbps LANs supporting older and newer technologies e.g. 100-Mbps backbone LAN supports 10-Mbps hubs stations attach to 10-Mbps hubs using 10BASE-T hubs connected to switching hubs using 100BASE-T high-capacity workstations and servers attach directly to 10/100 switches switches connected to 100-Mbps hubs use 100-Mbps links 100-Mbps hubs provide building backbone connected to router providing connection to WAN One of the strengths of the Fast Ethernet approach is that it readily supports a mixture of existing 10-Mbps LANs and newer 100-Mbps LANs. For example, the 100-Mbps technology can be used as a backbone LAN to support a number of 10-Mbps hubs. Many of the stations attach to 10-Mbps hubs using the 10BASE-T standard. These hubs are in turn connected to switching hubs that conform to 100BASE-T and that can support both 10-Mbps and 100-Mbps links. Additional high-capacity workstations and servers attach directly to these 10/100 switches. These mixed-capacity switches are in turn connected to 100-Mbps hubs using 100-Mbps links. The 100-Mbps hubs provide a building backbone and are also connected to a router that provides connection to an outside WAN.
211
Gigabit Ethernet Configuration
In late 1995, the IEEE committee formed a High-Speed Study Group to investigate means for conveying packets in Ethernet format at speeds in the gigabits per second range. The strategy for Gigabit Ethernet is the same as that for Fast Ethernet. While defining a new medium and transmission specification, Gigabit Ethernet retains the CSMA/CD protocol and Ethernet format of its 10-Mbps and 100-Mbps predecessors. It is compatible with 100BASE-T and 10BASE-T, preserving a smooth migration path. As more organizations move to 100BASE-T, putting huge traffic loads on backbone networks, demand for Gigabit Ethernet has intensified. Stallings DCC8e Figure 16.4 shows a typical application of Gigabit Ethernet. A 1-Gbps switching hub provides backbone connectivity for central servers and high-speed workgroup hubs. Each workgroup LAN switch supports both 1-Gbps links, to connect to the backbone LAN switch and to support high-performance workgroup servers, and 100-Mbps links, to support high-performance workstations, servers, and 100-Mbps LAN switches.
212
Gigabit Ethernet - Differences
carrier extension at least 4096 bit-times long (512 for 10/100) frame bursting not needed if using a switched hub to provide dedicated media access The 1000-Mbps specification calls for the same CSMA/CD frame format and MAC protocol as used in the 10-Mbps and 100-Mbps versions of IEEE 802.3. For shared-medium hub operation, there are two enhancements to the basic CSMA/CD scheme: • Carrier extension: Carrier extension appends a set of special symbols to the end of short MAC frames so that the resulting block is at least 4096 bit-times in duration, up from the minimum 512 bit-times imposed at 10 and 100 Mbps. This is so that the frame length of a transmission is longer than the propagation time at 1 Gbps. • Frame bursting: This feature allows for multiple short frames to be transmitted consecutively, up to a limit, without relinquishing control for CSMA/CD between frames. Frame bursting avoids the overhead of carrier extension when a single station has a number of small frames ready to send. With a switching hub which provides dedicated access to the medium, the carrier extension and frame bursting features are not needed. This is because data transmission and reception at a station can occur simultaneously without interference and with no contention for a shared medium.
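A quick calculation of why the slot had to grow to 4096 bit-times at 1 Gbps. This is a sketch; the 2e8 m/s propagation velocity is an assumed value, and station/repeater delays are ignored:

RATE_1G = 1e9      # bps
SLOT_BITS = 4096   # Gigabit Ethernet slot after carrier extension
PROP_SPEED = 2e8   # assumed m/s

slot_time = SLOT_BITS / RATE_1G              # 4.096 us
max_diameter = slot_time * PROP_SPEED / 2    # collision must be seen in-slot
print(f"{slot_time*1e6:.3f} us -> diameter up to ~{max_diameter:.0f} m")
# Keeping the old 512-bit slot at 1 Gbps would shrink the diameter to ~51 m,
# too small for a building LAN; hence carrier extension to 4096 bit-times.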
213
Gigabit Ethernet – Physical
The current 1-Gbps specification for IEEE 802.3 includes the following physical layer alternatives, shown above in Stallings DCC8e Figure 16.5: • 1000BASE-SX: This short-wavelength option supports duplex links of up to 275 m using 62.5-µm multimode or up to 550 m using 50-µm multimode fiber. Wavelengths are in the range of 770 to 860 nm. • 1000BASE-LX: This long-wavelength option supports duplex links of up to 550 m of 62.5-µm or 50-µm multimode fiber or 5 km of 10-µm single-mode fiber. Wavelengths are in the range of 1270 to 1355 nm. • 1000BASE-CX: This option supports 1-Gbps links among devices located within a single room or equipment rack, using copper jumpers (specialized shielded twisted-pair cable that spans no more than 25 m). Each link is composed of a separate shielded twisted pair running in each direction. • 1000BASE-T: This option makes use of four pairs of Category 5 unshielded twisted pair to support devices over a range of up to 100 m. The signal encoding scheme used for the first three Gigabit Ethernet options just listed is 8B/10B. The signal-encoding scheme used for 1000BASE-T is 4D-PAM5, a complex scheme whose description is beyond our scope.
214
10Gbps Ethernet growing interest in 10Gbps Ethernet
for high-speed backbone use, with future wider deployment; alternative to ATM and other WAN technologies; uniform technology for LAN, MAN, or WAN. Advantages of 10Gbps Ethernet: no expensive, bandwidth-consuming conversion between Ethernet packets and ATM cells; IP and Ethernet together offer QoS and traffic policing that approach ATM; a variety of standard optical interfaces. With gigabit products still fairly new, attention has turned in the past several years to a 10-Gbps Ethernet capability. The principal driving requirement for 10 Gigabit Ethernet is the increase in Internet and intranet traffic. Initially network managers will use 10-Gbps Ethernet to provide high-speed, local backbone interconnection between large-capacity switches. As the demand for bandwidth increases, 10-Gbps Ethernet will be deployed throughout the entire network and will include server farm, backbone, and campuswide connectivity. Ethernet can compete with ATM and other wide area transmission and networking technologies. In most cases where the customer requirement is data and TCP/IP transport, 10-Gbps Ethernet provides substantial value over ATM transport for both network end users and service providers: • No expensive, bandwidth-consuming conversion between Ethernet packets and ATM cells is required; the network is Ethernet, end to end. • The combination of IP and Ethernet offers quality of service and traffic policing capabilities that approach those provided by ATM, so that advanced traffic engineering technologies are available to users and providers. • A wide variety of standard optical interfaces (wavelengths and link distances) have been specified for 10-Gbps Ethernet, optimizing its operation and cost for LAN, MAN, or WAN applications.
215
10Gbps Ethernet Configurations
Stallings DCC8e Figure 16.6 illustrates potential uses of 10-Gbps Ethernet. Higher-capacity backbone pipes will help relieve congestion for workgroup switches, where Gigabit Ethernet uplinks can easily become overloaded, and for server farms, where 1-Gbps network interface cards are already in widespread use. The goals for maximum link distance cover a range of applications: from 300 m to 40 km. The links operate in full-duplex mode only, using a variety of optical fiber physical media. The success of Fast Ethernet, Gigabit Ethernet, and 10-Gbps Ethernet highlights the importance of network management concerns in choosing a network technology. Both ATM and Fibre Channel, explored later, may be technically superior choices for a high-speed backbone, because of their flexibility and scalability. However, the Ethernet alternatives offer compatibility with existing installed LANs, network management software, and applications. This compatibility has accounted for the survival of a nearly 30-year-old technology (CSMA/CD) in today's fast-evolving network environment.
216
10Gbps Ethernet Options Four physical layer options are defined for 10-Gbps Ethernet, as shown in the corresponding Stallings DCC8e figure. The first three of these have two suboptions: an "R" suboption and a "W" suboption. The R designation refers to a family of physical layer implementations that use a signal encoding technique known as 64B/66B. The R implementations are designed for use over dark fiber, meaning a fiber optic cable that is not in use and that is not connected to any other equipment. The W designation refers to a family of physical layer implementations that also use 64B/66B signaling but that are then encapsulated to connect to SONET equipment. The four physical layer options are: • 10GBASE-S (short): Designed for 850-nm transmission on multimode fiber. This medium can achieve distances up to 300 m. There are 10GBASE-SR and 10GBASE-SW versions. • 10GBASE-L (long): Designed for 1310-nm transmission on single-mode fiber. This medium can achieve distances up to 10 km. There are 10GBASE-LR and 10GBASE-LW versions. • 10GBASE-E (extended): Designed for 1550-nm transmission on single-mode fiber. This medium can achieve distances up to 40 km. There are 10GBASE-ER and 10GBASE-EW versions. • 10GBASE-LX4: Designed for 1310-nm transmission on single-mode or multimode fiber. This medium can achieve distances up to 10 km. This medium uses wavelength-division multiplexing (WDM) to multiplex the bit stream across four light waves.
217
Fibre Channel - Background
I/O channel direct point to point or multipoint comms link hardware based, high speed, very short distances network connection based on interconnected access points software based protocol with flow control, error detection & recovery for end systems connections As the speed and memory capacity of personal computers, workstations, and servers have grown, and as applications have become ever more complex with greater reliance on graphics and video, the requirement for greater speed in delivering data to the processor has grown. This requirement affects two methods of data communications with the processor: I/O channel and network communications. An I/O channel is a direct point-to-point or multipoint communications link, predominantly hardware based and designed for high speed over very short distances. The I/O channel transfers data between a buffer at the source device and a buffer at the destination device, moving only the user contents from one device to another, without regard to the format or meaning of the data. The logic associated with the channel typically provides the minimum control necessary to manage the transfer plus hardware error detection. I/O channels typically manage transfers between processors and peripheral devices, such as disks, graphics equipment, CD-ROMs, and video I/O devices. A network is a collection of interconnected access points with a software protocol structure that enables communication. The network typically allows many different types of data transfer, using software to implement the networking protocols and to provide flow control, error detection, and error recovery. As we have discussed in this book, networks typically manage transfers between end systems over local, metropolitan, or wide area distances.
218
Fibre Channel combines best of both technologies channel oriented
data type qualifiers for routing frame payload link level constructs associated with I/O ops protocol interface specifications to support existing I/O architectures network oriented full multiplexing between multiple destinations peer to peer connectivity internetworking to other connection technologies Fibre Channel is designed to combine the best features of both technologies—the simplicity and speed of channel communications with the flexibility and interconnectivity that characterize protocol-based network communications. This fusion of approaches allows system designers to combine traditional peripheral connection, host-to-host internetworking, loosely coupled processor clustering, and multimedia applications in a single multiprotocol interface. The types of channel-oriented facilities incorporated into the Fibre Channel protocol architecture include: • Data-type qualifiers for routing frame payload into particular interface buffers • Link-level constructs associated with individual I/O operations • Protocol interface specifications to allow support of existing I/O channel architectures, such as the Small Computer System Interface (SCSI) The types of network-oriented facilities incorporated into the Fibre Channel protocol architecture include • Full multiplexing of traffic between multiple destinations • Peer-to-peer connectivity between any pair of ports on a Fibre Channel network • Capabilities for internetworking to other connection technologies
219
Fibre Channel Requirements
full duplex links with two fibers per link 100 Mbps to 800 Mbps on single line support distances up to 10 km small connectors high-capacity utilization, distance insensitivity greater connectivity than existing multidrop channels broad availability multiple cost/performance levels carry multiple existing interface command sets for existing channel and network protocols Depending on the needs of the application, either channel or networking approaches can be used for any data transfer. The Fibre Channel Industry Association, which is the industry consortium promoting Fibre Channel, lists the following ambitious requirements that Fibre Channel is intended to satisfy [FCIA01]: • Full duplex links with two fibers per link • Performance from 100 Mbps to 800 Mbps on a single line (full-duplex 200 Mbps to 1600 Mbps per link) • Support for distances up to 10 km • Small connectors • High-capacity utilization with distance insensitivity • Greater connectivity than existing multidrop channels • Broad availability (i.e., standard components) • Support for multiple cost/performance levels, from small systems to supercomputers • Ability to carry multiple existing interface command sets for existing channel and network protocols
220
Fibre Channel Network The solution was to develop a simple generic transport mechanism based on point-to-point links and a switching network. This underlying infrastructure supports a simple encoding and framing scheme that in turn supports a variety of channel and network protocols. The key elements of a Fibre Channel network are the end systems, called nodes, and the network itself, which consists of one or more switching elements. The collection of switching elements is referred to as a fabric. These elements are interconnected by point-to-point links between ports on the individual nodes and switches. Communication consists of the transmission of frames across the point-to-point links. Each node includes one or more ports, called N_ports, for interconnection. Similarly, each fabric-switching element includes multiple ports, called F_ports. Interconnection is by means of bidirectional links between ports. Any node can communicate with any other node connected to the same fabric using the services of the fabric. All routing of frames between N_ports is done by the fabric. Frames may be buffered within the fabric, making it possible for different nodes to connect to the fabric at different data rates. A fabric can be implemented as a single fabric element with attached nodes (a simple star arrangement) or as a more general network of fabric elements, as shown in the corresponding Stallings DCC8e figure. In either case, the fabric is responsible for buffering and for routing frames between source and destination nodes.
221
Fibre Channel Protocol Architecture
FC-0 Physical Media FC-1 Transmission Protocol FC-2 Framing Protocol FC-3 Common Services FC-4 Mapping The Fibre Channel standard is organized into five levels. Each level defines a function or set of related functions. The standard does not dictate a correspondence between levels and actual implementations, with a specific interface between adjacent levels. Rather, the standard refers to the level as a “document artifice” used to group related functions. The layers are: • FC-0 Physical Media: Includes optical fiber for long-distance applications, coaxial cable for high speeds over short distances, and shielded twisted pair for lower speeds over short distances • FC-1 Transmission Protocol: Defines the signal encoding scheme • FC-2 Framing Protocol: Deals with defining topologies, frame format, flow and error control, and grouping of frames into logical entities called sequences and exchanges • FC-3 Common Services: Includes multicasting • FC-4 Mapping: Defines the mapping of various channel and network protocols to Fibre Channel, including IEEE 802, ATM, IP, and the Small Computer System Interface (SCSI)
222
Fibre Channel Physical Media
One of the major strengths of the Fibre Channel standard is that it provides a range of options for the physical medium, the data rate on that medium, and the topology of the network, as shown in the corresponding Stallings DCC8e table. The transmission media options that are available under Fibre Channel include shielded twisted pair, video coaxial cable, and optical fiber. Standardized data rates range from 100 Mbps to 3.2 Gbps. Point-to-point link distances range from 33 m to 10 km.
223
Fibre Channel Fabric most general supported topology is fabric or switched topology arbitrary topology with at least one switch to interconnect number of end systems may also consist of switched network routing transparent to nodes when data transmitted into fabric, edge switch uses destination port address to determine location either deliver frame to node attached to same switch or transfers frame to adjacent switch The most general topology supported by Fibre Channel is referred to as a fabric or switched topology. This is an arbitrary topology that includes at least one switch to interconnect a number of end systems. The fabric topology may also consist of a number of switches forming a switched network, with some or all of these switches also supporting end nodes. Routing in the fabric topology is transparent to the nodes. Each port in the configuration has a unique address. When data from a node are transmitted into the fabric, the edge switch to which the node is attached uses the destination port address in the incoming data frame to determine the destination port location. The switch then either delivers the frame to another node attached to the same switch or transfers the frame to an adjacent switch to begin routing the frame to a remote destination.
224
Fabric Advantages scalability of capacity protocol independent
distance insensitive switch and transmission link technologies may change without affecting overall configuration burden on nodes minimized The fabric topology provides scalability of capacity: as additional ports are added, the aggregate capacity of the network increases, thus minimizing congestion and contention and increasing throughput. The fabric is protocol independent and largely distance insensitive. The technology of the switch itself and of the transmission links connecting the switch to nodes may be changed without affecting the overall configuration. Another advantage of the fabric topology is that the burden on nodes is minimized. An individual Fibre Channel node (end system) is only responsible for managing a simple point-to-point connection between itself and the fabric; the fabric is responsible for routing between ports and error detection.
225
Alternative Topologies
Point-to-point topology only two ports directly connected, so no routing needed Arbitrated loop topology simple, low-cost topology up to 126 nodes in loop operates roughly equivalent to token ring topologies, transmission media, and data rates may be combined In addition to the fabric topology, the Fibre Channel standard defines two other topologies. With the point-to-point topology there are only two ports, and these are directly connected, with no intervening fabric switches. In this case there is no routing. The arbitrated loop topology is a simple, low-cost topology for connecting up to 126 nodes in a loop. The arbitrated loop operates in a manner roughly equivalent to the token ring protocols that we have seen. Topologies, transmission media, and data rates may be combined to provide an optimized configuration for a given site.
226
Fibre Channel Applications
Topologies, transmission media, and data rates may be combined to provide an optimized configuration for a given site. Stallings DCC8e Figure 16.9 is an example that illustrates the principal applications of Fibre Channel.
227
Fibre Channel Prospects
backed by Fibre Channel Association various interface cards available widely accepted as peripheral device interconnect technically attractive to general high-speed LAN requirements must compete with Ethernet and ATM LANs cost and performance issues will dominate consideration of competing technologies Fibre Channel is backed by an industry interest group known as the Fibre Channel Association and a variety of interface cards for different applications are available. Fibre Channel has been most widely accepted as an improved peripheral device interconnect, providing services that can eventually replace such schemes as SCSI. It is a technically attractive solution to general high-speed LAN requirements but must compete with Ethernet and ATM LANs. Cost and performance issues should dominate the manager's consideration of these competing technologies.
228
High speed LANs emergence Ethernet technologies
CSMA & CSMA/CD media access 10Mbps ethernet 100Mbps ethernet 1Gbps ethernet 10Gbps ethernet Fibre Channel Chapter16 summary.
229
Chapter 7 Overview of Probability and Stochastic Processes
230
Basic Probability Envision an experiment for which the result is unknown. The collection of all possible outcomes is called the sample space. A set of outcomes, or subset of the sample space, is called an event. A probability space is a three-tuple (Ω, F, Pr) where Ω is a sample space, F is a collection of events from the sample space and Pr is a probability law that assigns a number to each event in F. For any events A and B, Pr must satisfy: Pr(Ω) = 1; Pr(A) ≥ 0; Pr(A^c) = 1 – Pr(A); Pr(A ∪ B) = Pr(A) + Pr(B), if A ∩ B = ∅. If A and B are events in F with Pr(B) > 0, the conditional probability of A given B is Pr(A | B) = Pr(A ∩ B) / Pr(B).
231
Random Variables Discrete vs. Continuous
Cumulative distribution function Density function Probability distribution (mass) function Joint distributions Conditional distributions Functions of random variables Moments of random variables Transforms and generating functions
232
Functions of Random Variables
Often we’re interested in some combination of r.v.’s Sum of the first k interarrival times = time of the kth arrival Minimum of service times for parallel servers = time until next departure If X = min(Y, Z) then Pr(X > t) = Pr(Y > t and Z > t); therefore, if Y and Z are independent, Pr(X > t) = Pr(Y > t) Pr(Z > t). If X = max(Y, Z) then Pr(X ≤ t) = Pr(Y ≤ t and Z ≤ t), which for independent Y and Z equals Pr(Y ≤ t) Pr(Z ≤ t). If X = Y + Z, its distribution is the convolution of the distributions of Y and Z. Find it by conditioning.
233
Conditioning (Wolff) Frequently, the conditional distribution of Y given X is easier to find than the distribution of Y alone. If so, evaluate probabilities about Y using the conditional distribution along with the marginal distribution of X: Pr(Y ∈ A) = Σx Pr(Y ∈ A | X = x) Pr(X = x). Example: Draw 2 balls simultaneously from urn containing four balls numbered 1, 2, 3 and 4. X = number on the first ball, Y = number on the second ball, Z = XY. What is Pr(Z > 5)? Key: maybe easier to evaluate Z if X is known.
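The urn example can be verified by brute-force enumeration in Python; the second computation conditions on X, as the slide suggests:

from itertools import permutations

# Ordered draws without replacement from {1,2,3,4}; all 12 equally likely
outcomes = list(permutations([1, 2, 3, 4], 2))
print(sum(1 for x, y in outcomes if x * y > 5) / len(outcomes))  # 0.5

# Same answer by conditioning on the first ball X:
pr = sum((1/4) * sum(1 for y in {1, 2, 3, 4} - {x} if x * y > 5) / 3
         for x in [1, 2, 3, 4])
print(pr)  # 0.5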
234
Convolution Let X = Y + Z. If Y and Z are independent, FX(x) = ∫ FY(x − z) fZ(z) dz; for integer-valued variables, Pr(X = n) = Σk Pr(Y = k) Pr(Z = n − k). Example: Poisson — the convolution of Poisson(λ1) and Poisson(λ2) is Poisson(λ1 + λ2).
Note: the first expression above is a cdf. To get the density, differentiate: fX(x) = ∫ fY(x − z) fZ(z) dz.
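A numeric spot-check of the Poisson example: convolving the Poisson(2) and Poisson(3) pmfs reproduces the Poisson(5) pmf. The rates and the evaluation point are arbitrary choices:

from math import exp, factorial

def poisson_pmf(lam, n):
    return exp(-lam) * lam**n / factorial(n)

lam1, lam2, n = 2.0, 3.0, 4
conv = sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k)
           for k in range(n + 1))
print(conv, poisson_pmf(lam1 + lam2, n))  # both ~0.17547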
235
Moments of Random Variables
Expectation = “average” Variance = “volatility” Standard Deviation Coefficient of Variation
236
Linear Functions of Random Variables
Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])] Correlation: ρ(X, Y) = Cov(X, Y) / (σX σY) For linear combinations, E[aX + bY] = aE[X] + bE[Y] and Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y). If X and Y are independent then Cov(X, Y) = 0 (the converse does not hold in general).
237
Transforms and Generating Functions
Moment-generating function: M(t) = E[e^(tX)] Laplace transform (nonneg. r.v.): E[e^(−sX)] Generating function (z-transform): Let N be a nonnegative integer random variable; G(z) = E[z^N] = Σn z^n Pr(N = n).
238
Special Distributions
Discrete Bernoulli Binomial Geometric Poisson Continuous Uniform Exponential Gamma Normal
239
Bernoulli Distribution
“Single coin flip” p = Pr(success) N = 1 if success, 0 otherwise; Pr(N = 1) = p, Pr(N = 0) = 1 − p; E[N] = p, Var(N) = p(1 − p)
240
Binomial Distribution
“n independent coin flips” p = Pr(success) N = # of successes; Pr(N = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, …, n; E[N] = np, Var(N) = np(1 − p)
241
Geometric Distribution
“independent coin flips” p = Pr(success) N = # of flips until (including) first success; Pr(N = n) = (1 − p)^(n−1) p, n = 1, 2, …; E[N] = 1/p Memoryless property: Have flipped k times without success; Pr(N = k + n | N > k) = Pr(N = n): the failures so far say nothing about the remaining wait.
242
z-Transform for Geometric Distribution
Given Pn = (1 − p)^(n−1) p, n = 1, 2, …, find G(z) = E[z^N] = Σn z^n Pn = pz / (1 − (1 − p)z). Then E[N] = G′(1) = 1/p.
243
Poisson Distribution “Occurrence of rare events” λ = average rate of occurrence per period; N = # of events in an arbitrary period; Pr(N = n) = e^(−λ) λ^n / n!, n = 0, 1, 2, …; E[N] = Var(N) = λ
244
Uniform Distribution X is equally likely to fall anywhere within the interval (a, b): f(x) = 1/(b − a), a < x < b; E[X] = (a + b)/2
245
Exponential Distribution
X is nonnegative and it is most likely to fall near 0: f(x) = λe^(−λx), x ≥ 0; E[X] = 1/λ Also memoryless; more on this later…
246
Gamma Distribution X is nonnegative; by varying the parameter b a variety of shapes is obtained. When b is an integer, k, this is called the Erlang-k distribution, and Erlang-1 is the same as exponential.
247
Normal Distribution X follows a “bell-shaped” density function
From the central limit theorem, the distribution of the sum of independent and identically distributed random variables approaches a normal distribution as the number of summed random variables goes to infinity.
248
m.g.f.’s of Exponential and Erlang
If X is exponential with rate λ and Y is Erlang-k with rate λ, then MX(t) = λ/(λ − t) and MY(t) = [λ/(λ − t)]^k. Fact: The mgf of a sum of independent r.v.’s equals the product of the individual mgf’s. Therefore, the sum of k independent exponential r.v.’s (with the same rate λ) follows an Erlang-k distribution.
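A small simulation of this fact; the rate λ = 1.5 and k = 3 are arbitrary choices:

import random, statistics

lam, k, n = 1.5, 3, 100_000
random.seed(0)
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(n)]
print(statistics.mean(samples), k / lam)         # Erlang-k mean = k/lambda
print(statistics.variance(samples), k / lam**2)  # variance = k/lambda^2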
249
Stochastic Processes A stochastic process is a random variable that changes over time, or a sequence of numbers that you don’t know yet. Poisson process Continuous time Markov chains
250
Stochastic Processes Set of random variables, or observations of the same random variable over time: Xt may be either discrete-valued or continuous-valued. A counting process is a discrete-valued, continuous-parameter stochastic process that increases by one each time some event occurs. The value of the process at time t is the number of events that have occurred up to (and including) time t.
251
Poisson Process Let {X(t), t ≥ 0} be a stochastic process where X(t) is the number of events (arrivals) up to time t. Assume X(0) = 0 and (i) Pr(exactly one arrival occurs between t and t + Δt) = λΔt + o(Δt), where o(Δt) is some quantity such that lim(Δt→0) o(Δt)/Δt = 0; (ii) Pr(more than one arrival between t and t + Δt) = o(Δt); (iii) If t < u < v < w, then X(w) – X(v) is independent of X(u) – X(t). Let pn(t) = Pr(n arrivals occur during the interval (0, t)). Then pn(t) = e^(−λt) (λt)^n / n!, the Poisson distribution with mean λt.
252
Poisson Process and Exponential Dist’n
Let T be the time between arrivals. Pr(T > t) = Pr(there are no arrivals in (0, t)) = p0(t) = e^(−λt). Therefore, the time between arrivals follows an exponential distribution with parameter λ = the arrival rate. The converse is also true; if interarrival times are exponential, then the number of arrivals up to time t follows a Poisson distribution with mean and variance equal to λt.
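A sketch illustrating the equivalence: generating exponential interarrival gaps and counting arrivals recovers the rate λ (the parameter values are arbitrary):

import random

random.seed(1)
lam, horizon = 4.0, 10_000.0
count, t = 0, 0.0
while True:
    t += random.expovariate(lam)   # exponential interarrival times
    if t > horizon:
        break
    count += 1
print(count / horizon)  # ~4.0: exponential gaps yield Poisson counts at rate lam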
253
When are Poisson arrivals reasonable?
The Poisson distribution can be seen as a limit of the binomial distribution, as n → ∞ and p → 0 with λ = np held constant: many potential customers deciding independently about arriving (arrival = “success”), each with a small probability of arriving in any particular time interval. Conditions given above: probability of arrival in a small interval is approximately proportional to the length of the interval – no bulk arrivals. The amount of time since the last arrival gives no indication of the amount of time until the next arrival (exponential – memoryless).
254
More Exponential Distribution Facts
Suppose T1 and T2 are independent with Ti ~ exponential(λi). Then Pr(T1 < T2) = λ1/(λ1 + λ2). Suppose (T1, T2, …, Tn) are independent with Ti ~ exponential(λi). Let Y = min(T1, T2, …, Tn). Then Y ~ exponential(λ1 + λ2 + … + λn). Suppose (T1, T2, …, Tk) are independent, each exponential(λ). Let W = T1 + T2 + … + Tk. Then W has an Erlang-k distribution with density function f(w) = λe^(−λw) (λw)^(k−1) / (k − 1)!, w ≥ 0.
255
Continuous Time Markov Chains
A stochastic process with possible values (state space) S = {0, 1, 2, …} is a CTMC if Pr(X(t + s) = j | X(s) = i and the history before s) = Pr(X(t + s) = j | X(s) = i): “The future is independent of the past given the present” Define pij(t) = Pr(X(t + s) = j | X(s) = i), independent of s for a time-homogeneous chain. Then the transition functions pij(t) determine the evolution of the chain.
256
CTMC Another Way Each time X(t) enters state j, the sojourn time is exponentially distributed with mean 1/qj. When the process leaves state i, it goes to state j ≠ i with probability pij, where Σ(j≠i) pij = 1. Let qij = qi pij. Then the sojourn rates qi and jump probabilities pij completely specify the chain.
257
CTMC Infinitesimal Generator
The time to leave state i is exponential with rate qi, and qij = qi pij. Then qij is the rate of transition from state i to state j, and the infinitesimal generator is the matrix Q with off-diagonal entries qij and diagonal entries qii = −qi = −Σ(j≠i) qij.
258
Long Run (Steady State) Probabilities
Let πj = lim(t→∞) Pr(X(t) = j). Under certain conditions these limiting probabilities can be shown to exist and are independent of the starting state; they represent the long-run proportions of time that the process spends in each state, and also the steady-state probabilities that the process will be found in each state. Then πQ = 0 with Σj πj = 1 or, equivalently, the balance equations πj qj = Σ(i≠j) πi qij for all j (rate out of each state equals rate in).
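A small worked instance of these balance equations, using a three-state birth-death chain (an M/M/1 queue with room for at most two items); λ and μ are assumed values:

# M/M/1/2 CTMC: states 0,1,2; arrival rate lam, service rate mu (assumed)
lam, mu = 1.0, 2.0
Q = [[-lam,         lam,  0.0],
     [  mu, -(lam + mu),  lam],
     [ 0.0,          mu,  -mu]]

rho = lam / mu
w = [rho**j for j in range(3)]      # birth-death closed form: pi_j ∝ rho^j
pi = [x / sum(w) for x in w]

# verify the global balance equations: pi Q = 0
balance = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]
print(pi)       # [4/7, 2/7, 1/7]
print(balance)  # all ~0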
259
Phase-Type Distributions
Erlang distribution Hyperexponential distribution Coxian (mixture of generalized Erlang) distributions
260
Chapter 8 Queuing Analysis
261
Queuing Model and Analysis
Queuing theory deals with modeling and analyzing systems with queues of items and servers that process the items. [Figure: several queues (Queue 1, Queue 2, Queue 3) feeding a server.]
262
Goals of Queuing Analysis
Typically used in the analysis of networking systems; examples: increase in disk access time, increase in process load, increase in the rate of arrival of packets or processes. Especially useful for analyzing performance when either the load on a system is expected to increase or a design change is contemplated. While it is a popular method in network analysis, it has also gained popularity within single systems, especially with the advent of multi-core processors.
263
Analysis methods After-the-fact analysis: let the system run some n number of times, collect the “real” data and analyze – problems? Predict some simple trends/projections based on experience – problems? Develop an analytical model based on queuing theory – problems? Run simulations (not real systems) and collect data to analyze – problems?
264
Single server queue [Figure: arrivals enter a queue; a single server processes items, which then depart.]
λ = arrival rate; dispatching discipline; w = mean # items waiting; Tw = mean waiting time; Ts = mean service time; ρ = utilization; r = mean # items residing in the system; Tr = mean residence time
265
Multi-server /single queue
[Figure: arrivals (λ = arrival rate) join a single queue with a dispatching discipline, feeding N servers: server 0, server 1, …, server N−1.]
266
Multi-server /Multiple queues
[Figure: arrivals are distributed among multiple queues, each with its own server: server 1, …, server N−1.]
267
Parameters Items arrive at the facility at some average rate (items arriving per second) λ. At any given time, a certain number of items will be waiting in the queue (zero or more); the average number waiting is w, and the mean time that an item must wait is Tw. The server handles incoming items with an average service time Ts.
268
More parameters Utilization, ρ, is the fraction of time that the server is busy, measured over some interval of time. Finally, two parameters apply to the system as a whole. The average number of items resident in the system, including the item being served (if any) and the items waiting (if any), is r; and the average time that an item spends in the system, waiting and being served, is Tr; we refer to this as the mean residence time
269
Analysis As the arrival rate, which is the rate of traffic passing through the system, increases, the utilization increases and with it, congestion. The queue becomes longer, increasing waiting time. At ρ = 1, the server becomes saturated, working 100% of the time. Thus, the theoretical maximum input rate that can be handled by the system is: λmax = 1/Ts However, queues become very large near system saturation, growing without bound when ρ = 1. Practical considerations, such as response time requirements or buffer sizes, usually limit the input rate for a single server to 70-90% of the theoretical maximum. For a multi-server queue with N servers: λmax = N/Ts
270
Specific Metrics The fundamental task of a queuing analysis is as follows: Given the following information as input: · Arrival rate · Service time Provide as output information concerning: · Items waiting · Waiting time · Items in residence · Residence time. We would like to know their average values (w, Tw, r, Tr) and their variability (the σ’s). We are also interested in certain probabilities: e.g., for what M is Pr(# items waiting in line < M) = 0.99?
271
Important Assumptions
The arrival rate obeys the Poisson distribution, which is equivalent to saying that the inter-arrival times are exponential; in other words, the arrivals occur randomly and independently of one another. A convenient notation has been developed for summarizing the principal assumptions that are made in developing a queuing model. The notation is X/Y/N, where X refers to the distribution of the inter-arrival times, Y refers to the distribution of service times, and N refers to the number of servers. M/M/1 refers to a single-server queuing model with Poisson arrivals and exponential service times; other common models are M/G/1 (general service times) and M/D/1 (deterministic service times).
272
Little’s Law A very simple and general law, due to Dr. John Little (a Case Western Reserve University professor): average number of customers in a system = average arrival rate × average time spent in the system. r = λ · Tr; w = λ · Tw; Tr = Tw + Ts. Extend it to the M/M/1 queuing model.
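A minimal sketch combining Little's law with the standard M/M/1 result Tr = Ts/(1 − ρ); the input values are illustrative:

def mm1_metrics(lam, ts):
    """M/M/1: Poisson arrivals at rate lam, exponential service with mean ts."""
    rho = lam * ts                 # utilization
    assert rho < 1, "queue is unstable"
    tr = ts / (1 - rho)            # mean residence time (M/M/1 result)
    r = lam * tr                   # Little's law: r = lam * Tr
    tw = tr - ts                   # Tr = Tw + Ts
    w = lam * tw                   # Little's law again: w = lam * Tw
    return dict(rho=rho, Tr=tr, r=r, Tw=tw, w=w)

print(mm1_metrics(lam=0.8, ts=1.0))  # rho=0.8 gives r = 4 items, Tr = 5 units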
273
Chapter 9 Self-Similar Traffic
274
What is Self-Similarity?
Self-similarity describes the phenomenon where a certain property of an object is preserved with respect to scaling in space and/or time. If an object is self-similar, its parts, when magnified, resemble the shape of the whole.
275
Pictorial View of Self-Similarity
276
The Famous Data Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100µs. Data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.
278
Why is Self-Similarity Important?
Recently, network packet traffic has been identified as being self-similar. Current network traffic modeling using Poisson distributions (etc.) does not take into account the self-similar nature of traffic. This leads to inaccurate modeling which, when applied to a huge network like the Internet, can lead to huge financial losses.
279
Problems with Current Models
Current modeling shows that as the number of sources (Ethernet users) increases, the traffic becomes smoother and smoother. Analysis of measured traffic, however, shows that it tends to become less smooth and more bursty as the number of active sources increases.
280
Problems with Current Models
Were traffic to follow a Poisson or Markovian arrival process, it would have a characteristic burst length which would tend to be smoothed by averaging over a long enough time scale. Rather, measurements of real traffic indicate that significant traffic variance (burstiness) is present on a wide range of time scales
281
Pictorial View of Current Modeling
282
Side-by-side View
283
Definitions and Properties
Long-range Dependence covariance decays slowly Hurst Parameter Developed by Harold Hurst (1965) H is a measure of “burstiness” also considered a measure of self-similarity 0 < H < 1 H increases as traffic increases
284
Definitions and Properties
low, medium, and high traffic hours: as traffic increases, the Hurst parameter increases; i.e., traffic becomes more self-similar
285
Self-Similar Measures
Background: Let the time series X = (Xt : t = 0, 1, 2, …) be a covariance stationary stochastic process with autocorrelation function r(k), k ≥ 0. Assume r(k) ~ k^(−β) L(k) as k → ∞, where 0 < β < 1 and L is slowly varying: lim(t→∞) L(tx)/L(t) = 1, for all x > 0.
286
Second-order Self-Similar
Exactly: A process X is called (exactly) self-similar with self-similarity parameter H = 1 – β/2 if for all m = 1, 2, …: var(X^(m)) = σ² m^(−β) and r^(m)(k) = r(k), k ≥ 0. Asymptotically: r^(m)(k) → r(k) as m → ∞; the aggregated processes are the same. Current models instead show aggregated processes tending to pure noise.
287
Measuring Self-Similarity
time-domain analysis based on R/S statistic analysis of the variance of the aggregated processes X(m) periodogram-based analysis in the frequency domain
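A sketch of the second method (variance of the aggregated process): since var(X^(m)) ≈ σ²m^(−β), the slope of log var(X^(m)) versus log m estimates −β, and H = 1 − β/2. For the iid noise generated here the estimate should come out near H = 0.5; genuinely self-similar traffic would give H closer to 1.

import math, random, statistics

def hurst_variance_time(x, block_sizes):
    # slope of log var(block means) vs log m estimates -beta; H = 1 - beta/2
    pts = []
    for m in block_sizes:
        means = [sum(x[i*m:(i+1)*m]) / m for i in range(len(x) // m)]
        pts.append((math.log(m), math.log(statistics.pvariance(means))))
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    slope = (sum((a - mx) * (b - my) for a, b in pts)
             / sum((a - mx)**2 for a, _ in pts))
    return 1 + slope / 2   # H = 1 - beta/2 with beta = -slope

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(100_000)]
print(hurst_variance_time(noise, [10, 20, 50, 100, 200, 500]))  # ~0.5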
288
Methods of Modeling Self-Similar Traffic
Two formal mathematical models that yield elegant representations of self-similarity: fractional Gaussian noise; fractional autoregressive integrated moving-average (FARIMA) processes
289
Results Ethernet traffic is self-similar irrespective of time
Ethernet traffic is self-similar irrespective of where it is collected. The degree of self-similarity, measured in terms of the Hurst parameter H, is typically a function of the overall utilization of the Ethernet and can be used for measuring the “burstiness” of the traffic. Current traffic models are not capable of capturing the self-similarity property.
290
Results There are concentrated periods of congestion at a wide range of time scales. This implies the existence of concentrated periods of light network load. These two features cannot be easily controlled by traffic control; i.e., burstiness cannot be smoothed.
291
Results These two implications make it difficult to allocate resources so that QoS and network utilization are both maximized. Self-similar burstiness can lead to the amplification of packet loss.
292
Problems with Packet Loss
Effects in TCP TCP guarantees that packets will be delivered and will be delivered in order When packets are lost in TCP, the lost packets must be retransmitted This wastes valuable resources Effects in UDP UDP sends packets as quickly as possible with no promise of delivery When packets are lost, they are not retransmitted Repercussions for packet loss in UDP include “jitter” in streaming audio/video etc.
293
Possible Methods for Dealing with the Self-Similar Property of Traffic
Dynamic Control of Traffic Flow Structural resource allocation
294
Dynamic Control of Traffic Flow
Predictive feedback control identify the on-set of concentrated periods of either high or low traffic activity adjust the mode of congestion control appropriately from conservative to aggressive
295
Dynamic Control of Traffic Flow
Adaptive forward error correction retransmission of lost information is not viable because of time-constraints (real-time) adjust the degree of redundancy based on the network state increase level of redundancy when traffic is high; could backfire, as too much of an increase will only further aggravate congestion decrease level of redundancy when traffic is low
296
Structural Resource Allocation
Two types: bandwidth; buffer size. Bandwidth: increase bandwidth to accommodate periods of “burstiness”; could be wasteful in times of low traffic intensity
297
Structural Resource Allocation
Buffer size: increase the buffer size in routers (and other devices) such that they can absorb periods of “burstiness”; still possible to fill a given router’s buffer and create a bottleneck. Tradeoff: increase both until they complement each other and begin curtailing the effects of self-similarity
298
Chapter 10 Congestion Control in Data Networks and Internets
299
Introduction Congestion occurs when number of packets transmitted approaches network capacity Objective of congestion control: keep number of packets below level at which performance drops off dramatically
300
Queuing Theory Data network is a network of queues
If arrival rate > transmission rate then queue size grows without bound and packet delay goes to infinity
301
Figure 10.1
302
At Saturation Point, 2 Strategies
Discard any incoming packet if no buffer available Saturated node exercises flow control over neighbors May cause congestion to propagate throughout network
303
Figure 10.2
304
Ideal Performance I.e., infinite buffers, no overhead for packet transmission or congestion control Throughput increases with offered load until full capacity Packet delay increases with offered load approaching infinity at full capacity Power = throughput / delay Higher throughput results in higher delay
305
Figure 10.3
306
Practical Performance
I.e., finite buffers, non-zero packet processing overhead With no congestion control, increased load eventually causes moderate congestion: throughput increases at slower rate than load Further increased load causes packet delays to increase and eventually throughput to drop to zero
307
Figure 10.4
308
Congestion Control Backpressure
Request from destination to source to reduce rate Choke packet: ICMP Source Quench Implicit congestion signaling Source detects congestion from transmission delays and discarded packets and reduces flow
309
Explicit congestion signaling
Direction: backward, forward. Categories: binary, credit-based, rate-based
310
Traffic Management Fairness Last-in-first-discarded may not be fair
Quality of Service Voice, video: delay sensitive, loss insensitive File transfer, mail: delay insensitive, loss sensitive Interactive computing: delay and loss sensitive Reservations Policing: excess traffic discarded or handled on best-effort basis
311
Figure 10.5
312
Frame Relay Congestion Control
Minimize frame discards Maintain QoS Minimize monopolization of network Simple to implement, little overhead Minimal additional network traffic Resources distributed fairly Limit spread of congestion Operate effectively regardless of flow Have minimum impact on other systems in network Minimize variance in QoS
313
Table 10.1
314
Traffic Rate Management
Committed Information Rate (CIR) Rate that network agrees to support Aggregate of CIRs < capacity For node and user-network interface (access) Committed Burst Size Maximum data over one interval agreed to by network Excess Burst Size Maximum data over one interval that network will attempt to deliver
315
Figure 10.6
316
Figure 10.7
317
Congestion Avoidance with Explicit Signaling
2 strategies: congestion that develops slowly, almost always at egress nodes, is handled by forward explicit congestion avoidance; congestion that grows very quickly in internal nodes and requires quick action is handled by backward explicit congestion avoidance
318
2 Bits for Explicit Signaling
Forward Explicit Congestion Notification For traffic in same direction as received frame This frame has encountered congestion Backward Explicit Congestion Notification For traffic in opposite direction of received frame Frames transmitted may encounter congestion
319
Chapter 11 Link-Level Flow and Error Control
320
Introduction The need for flow and error control
Link control mechanisms Performance of ARQ (Automatic Repeat Request)
321
Flow Control and Error Control
Fundamental mechanisms that determine performance Can be implemented at different levels: link, network, or application Difficult to model performance Simplest case: point-to-point link Constant propagation delay Constant data rate Probabilistic error rate Traffic characteristics
322
Flow Control Limits the amount or rate of data that is sent Reasons:
Source may send PDUs faster than destination can process headers Higher-level protocol user at destination may be slow in retrieving data Destination may need to limit incoming flow to match outgoing flow for retransmission
323
Flow Control at Multiple Protocol Layers
X.25 virtual circuits (level 3) multiplexed over a data link using LAPB (X.25 level 2) Multiple TCP connections over HDLC link Flow control at higher level applied to each logical connection independently Flow control at lower level applied to total traffic
324
Figure 11.1
325
Flow Control Scope Hop Scope
Between intermediate systems that are directly connected Network interface Between end system and network Entry-to-exit Between entry to network and exit from network End-to-end Between end user systems
326
Figure 11.2
327
Error Control Used to recover lost or damaged PDUs
Involves error detection and PDU retransmission Implemented together with flow control in a single mechanism Performed at various protocol levels
328
Link Control Mechanisms
3 techniques at link level: Stop-and-wait Go-back-N Selective-reject Latter 2 are special cases of sliding-window Assume 2 end systems connected by direct link
329
Sequence of Frames Source breaks up message into sequence of frames
Buffer size of receiver may be limited Longer transmissions are more likely to have an error On a shared medium, avoids one station monopolizing medium
330
Stop and Wait Source transmits frame
After reception, destination indicates willingness to accept another frame in acknowledgement Source must wait for acknowledgement before sending another frame 2 kinds of errors: Damaged frame at destination Damaged acknowledgement at source
331
ARQ Automatic Repeat Request Uses: Error detection Timers
Acknowledgements Retransmissions
332
Figure 11.3
333
Figure 11.4
334
Stop-and-Wait Link Utilization
If Tprop large relative to Tframe then throughput reduced If propagation delay is long relative to transmission time, line is mostly idle Problem is only one frame in transit at a time Stop-and-Wait rarely used because of inefficiency
335
Sliding Window Techniques
Allow multiple frames to be in transit at the same time Source can send n frames without waiting for acknowledgements Destination can accept n frames Destination acknowledges a frame by sending acknowledgement with sequence number of next frame expected (and implicitly ready for next n frames)
336
Figure 11.5
337
Figure 11.6
338
Go-back-N ARQ Most common form of error control based on sliding window Number of un-acknowledged frames determined by window size Upon receiving a frame in error, destination discards that frame and all subsequent frames until damaged frame received correctly Sender resends frame (and all subsequent frames) either when it receives a Reject message or timer expires
339
Figure 11.7
340
Figure 11.8
341
Error-Free Stop and Wait
T = Tframe + Tprop + Tproc + Tack + Tprop + Tproc Tframe = time to transmit frame Tprop = propagation time Tproc = processing time at station Tack = time to transmit ack Assume Tproc and Tack relatively small
342
T ≈ Tframe + 2Tprop Throughput = 1/T = 1/(Tframe + 2Tprop) frames/sec Normalize by the link data rate of 1/Tframe frames/sec: S = [1/(Tframe + 2Tprop)] / [1/Tframe] = Tframe/(Tframe + 2Tprop) = 1/(1 + 2a), where a = Tprop/Tframe
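As a quick check of this formula, a minimal Python sketch (parameter names and example values are ours, not the book's):

```python
def stop_and_wait_utilization(d_m, v_mps, frame_bits, rate_bps):
    """Normalized stop-and-wait utilization S = 1/(1 + 2a), a = Tprop/Tframe."""
    t_prop = d_m / v_mps              # propagation time (s)
    t_frame = frame_bits / rate_bps   # transmission time (s)
    a = t_prop / t_frame
    return 1.0 / (1.0 + 2.0 * a)

# 1 km link vs. 5000 km link, 1000-bit frames at 10 Mbps
for d in (1e3, 5e6):
    print(d, stop_and_wait_utilization(d, 2e8, 1000, 10e6))
```

The long link gives S near zero, illustrating why stop-and-wait is rarely used when a is large.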
343
Stop-and-Wait ARQ with Errors
P = probability a single frame is in error Nx = 1/(1 - P) = average number of times each frame must be transmitted due to errors S = 1/[Nx(1 + 2a)] = (1 - P)/(1 + 2a)
344
The Parameter a a = (propagation time)/(transmission time) = (d/V)/(L/R) = Rd/(VL)
where d = distance between stations V = velocity of signal propagation L = length of frame in bits R = data rate on link in bits per sec
345
Table 11.1
346
Figure 11.9
347
Error-Free Sliding Window ARQ
Case 1: W ≥ 2a + 1 Ack for frame 1 reaches A before A has exhausted its window Case 2: W < 2a +1 A exhausts its window at t = W and cannot send additional frames until t = 2a + 1
348
Figure 11.10
349
Normalized Throughput
S = 1 for W ≥ 2a + 1; S = W/(2a + 1) for W < 2a + 1
350
Selective Reject ARQ S = 1 - P for W ≥ 2a + 1; S = W(1 - P)/(2a + 1) for W < 2a + 1
351
Go-Back-N ARQ S = (1 - P)/(1 + 2aP) for W ≥ 2a + 1; S = W(1 - P)/[(2a + 1)(1 - P + WP)] for W < 2a + 1
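The three throughput expressions can be compared numerically. A small Python sketch, assuming the reconstructed go-back-N denominator (2a + 1)(1 - P + WP) for W < 2a + 1:

```python
def stop_and_wait(a, P):
    return (1 - P) / (1 + 2 * a)

def selective_reject(a, P, W):
    return 1 - P if W >= 2 * a + 1 else W * (1 - P) / (2 * a + 1)

def go_back_n(a, P, W):
    if W >= 2 * a + 1:
        return (1 - P) / (1 + 2 * a * P)
    return W * (1 - P) / ((2 * a + 1) * (1 - P + W * P))

# Long link (a = 10), light errors, window of 7 vs. 127
for W in (7, 127):
    print(W, go_back_n(10, 1e-3, W), selective_reject(10, 1e-3, W))
```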
352
Figure 11.11
353
Figure 11.12
354
Figure 11.13
355
High-Level Data Link Control
HDLC is the most important data link control protocol Widely used which forms basis of other data link control protocols
356
Figure 11.15
357
HDLC Operation Initialization Data transfer Disconnect
358
Figure 11.16
359
Chapter 12 TCP Traffic Control
360
Figure 1 - Timing of TCP Flow Control
361
Effect of Window Size W = TCP window size (octets)
R = Data rate (bps) at TCP source D = End to end delay (except the transmission delay at source) (seconds) The delay between starting the first bit at source and reception at the destination After TCP source begins transmitting, it takes D seconds for first octet to arrive, and D seconds for acknowledgement to return TCP source could transmit at most 2RD bits, or RD/4 octets, if W permits
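A rough sanity check of the window limit: throughput can never exceed W octets per round trip, approximated here as 2D. A hedged sketch, names ours:

```python
def max_window_limited_rate(w_octets, d_seconds):
    """Upper bound on throughput imposed by window size:
    W octets per round trip, with round trip approximated as 2D."""
    return 8 * w_octets / (2 * d_seconds)   # bits per second

# 65535-octet window over a 50 ms one-way delay path
print(max_window_limited_rate(65535, 0.05) / 1e6, "Mbps")
```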
362
TCP Utilization (Very Simplistic)
363
Complicating Factors Multiple TCP connections multiplexed over same network interface Reducing R For multi-hop connections, D is sum of delays across each network plus delays at each router If source data rate R exceeds data rate on a hop, that hop will be a bottleneck and will increase D Lost segments retransmitted, reducing throughput Impact depends on retransmission strategy (will see next)
364
Retransmission Strategy
TCP relies on positive acknowledgements Retransmission on timeout Timer associated with each segment as it is sent If timer expires before acknowledgement, sender must retransmit Value of retransmission timer is a key factor Too small: many unnecessary retransmissions, wasting network bandwidth Too large: delay in handling lost segments Timer should be longer than round-trip delay, but this delay is variable
365
Two Strategies Fixed timer
Unable to respond to changing network conditions Adaptive Timer value changes as network conditions change TCP uses adaptive timer
366
Problems with Adaptive Scheme
Peer TCP entity may accumulate acknowledgements and may not acknowledge immediately For retransmitted segments, can’t tell whether acknowledgement is response to original transmission or to retransmission The problem is the same: difficulty in calculating the round-trip time and timeout value No perfect solution exists, but there are standard approaches, as detailed next
367
Adaptive Retransmission Timer Management
2 sub-problems Estimate the next round trip time (RTT) by observing the pattern of delay Determine the timeout value by setting it a bit greater than the estimate Simple average: average the observed RTTs over a number of segments Exponential average: later segments have more weight
368
RFC 793 Exponential Averaging
Smoothed Round-Trip Time (SRTT) – estimate of the RTT, where each observed RTT is the time between sending a segment and receiving its acknowledgment SRTT(K+1) = α·SRTT(K) + (1–α)·RTT(K+1) SRTT(K+1) is the estimate for the (K+2)nd round-trip time Gives greater weight to more recent values, as shown by expansion of the above: SRTT(K+1) = (1–α)RTT(K+1) + α(1–α)RTT(K) + α²(1–α)RTT(K–1) + … + αᴷ(1–α)RTT(1) α and 1–α < 1, so successive terms get smaller E.g. α = 0.8: SRTT(K+1) = 0.2 RTT(K+1) + 0.16 RTT(K) + 0.128 RTT(K–1) + ... Smaller values of α give greater weight to recent RTT values
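A minimal sketch of this exponential average (seeding SRTT with the first observation is a common convention, not specified on the slide):

```python
def srtt_updates(rtts, alpha=0.8):
    """RFC 793 smoothed RTT: SRTT(K+1) = a*SRTT(K) + (1-a)*RTT(K+1)."""
    srtt = rtts[0]          # seed with first sample (our assumption)
    out = [srtt]
    for r in rtts[1:]:
        srtt = alpha * srtt + (1 - alpha) * r
        out.append(srtt)
    return out

print(srtt_updates([100, 100, 300, 300, 300]))  # ms; estimate lags the jump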
369
Use of Exponential Averaging – Increasing observed RTT
The legends in Figure 4 of the book are wrong! The figure here is correct
370
Use of Exponential Averaging – Decreasing observed RTT
371
How to determine RTO RTO means Retransmission TimeOut
Also known as Retransmission Timer Two basic approaches: Add a fixed amount Δ to the estimated RTT: RTO(K+1) = SRTT(K+1) + Δ Or multiply the estimated SRTT by a fixed factor greater than 1 Both are poor if the observed RTT has variation It is better if the RTO depends on both the estimated SRTT and its deviation Jacobson’s method
372
RTT Variance Estimation (Jacobson’s Algorithm)
Standard method RTT may show high variance. Possible reasons: Variance in packet size may cause variance in transmission delays Network traffic load may change abruptly due to other sources Peer may not acknowledge segments immediately
373
Jacobson’s Algorithm SRTT(K + 1) = (1 – g) × SRTT(K) + g × RTT(K + 1)
SERR(K + 1) = RTT(K + 1) – SRTT(K) SDEV(K + 1) = (1 – h) × SDEV(K) + h ×|SERR(K + 1)| RTO(K + 1) = SRTT(K + 1) + f × SDEV(K + 1) Based on experiments g = 0.125 h = 0.25 f = 2 or f = 4 (most current implementations use f = 4)
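The same update rules in runnable form; initialization of SRTT and SDEV is our assumption, since implementations differ:

```python
def jacobson_rto(rtts, g=0.125, h=0.25, f=4):
    """Jacobson's algorithm as given above; returns successive RTO values.
    Note SERR uses the *previous* SRTT, so it is computed before the update."""
    srtt, sdev = rtts[0], rtts[0] / 2    # seeds are illustrative
    rtos = []
    for r in rtts[1:]:
        serr = r - srtt
        srtt = (1 - g) * srtt + g * r
        sdev = (1 - h) * sdev + h * abs(serr)
        rtos.append(srtt + f * sdev)
    return rtos

print(jacobson_rto([100, 120, 80, 300, 100]))
```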
374
Jacobson’s RTO Calculation
RTO is quite conservative while RTT is changing Then begins to converge
375
Two Other Factors Jacobson’s algorithm can significantly improve TCP performance, but: What RTO to use for retransmitted segments? ANSWER: exponential RTO backoff algorithm Which round-trip samples to use as input to Jacobson’s algorithm if a segment is retransmitted? ANSWER: Karn’s algorithm
376
Exponential RTO Backoff
Since timeout is probably due to congestion (dropped packet or long round trip delay), maintaining the same RTO is not a good idea RTO increases each time a segment is retransmitted – backoff process RTO(i+1) = q × RTO(i): exponential backoff Most commonly q = 2: binary exponential backoff
377
Which Round-trip Samples?
If a segment is retransmitted, the arriving ACK may be: For the first copy of the segment? For the second copy? TCP source cannot distinguish between these two cases, and wrong assumptions may yield wrong estimates Karn’s rules: Do not measure RTT for retransmitted segments to update SRTT and SDEV Calculate backoff RTO when a retransmission occurs Use backoff RTO until an ACK arrives for a segment that has not been retransmitted When an ACK is received for an un-retransmitted segment (i.e. a segment sent and acknowledged without retransmission), the Jacobson algorithm resumes calculating future RTO values
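A toy event-loop sketch combining Karn's rules with binary exponential backoff; the event encoding and seed values are illustrative assumptions:

```python
def rto_with_karn(events, g=0.125, h=0.25, f=4, q=2):
    """events: ('timeout',) or ('ack', rtt_sample, was_retransmitted)."""
    srtt, sdev, rto = 1.0, 0.5, 3.0       # seeds are illustrative
    for ev in events:
        if ev[0] == 'timeout':
            rto *= q                      # backoff: RTO(i+1) = q * RTO(i)
        else:
            _, rtt, retransmitted = ev
            if retransmitted:
                continue                  # Karn: ambiguous sample, ignore
            serr = rtt - srtt             # resume Jacobson estimation
            srtt = (1 - g) * srtt + g * rtt
            sdev = (1 - h) * sdev + h * abs(serr)
            rto = srtt + f * sdev
    return rto

print(rto_with_karn([('timeout',), ('ack', 1.2, True), ('ack', 0.9, False)]))
```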
378
Window Management Remember that in TCP, source is given some credits to send segments (called the window) There are some TCP window management mechanisms to avoid congestion Slow start Dynamic window sizing on congestion Fast retransmit Fast recovery
379
Slow start It is not a good idea to start with a large window since the network situation is not known Start connection with a small window, called congestion window (cwnd) initially one segment only Enlarge congestion window at each ACK Add one segment to congestion window for each ack received Up to a certain max value, which is the available credit (window) Actually not a slow procedure Congestion window growth is exponential
380
Effect of Slow Start
381
Dynamic window sizing on congestion
When a timeout occurs: Run slow start up to a threshold, where threshold = half of the congestion window at which the timeout occurred, increasing the congestion window by 1 segment for every ACK Beyond the threshold, increase the congestion window by one segment per RTT: linear increase in window size
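A per-RTT trace of the combined slow start / congestion avoidance behaviour, under the simplifying assumption that cwnd doubles each RTT in slow start (one segment per ACK) and grows by one segment per RTT afterwards:

```python
def cwnd_trace(rtts=12, timeout_at=6, awnd=64):
    """Toy per-RTT trace of cwnd; counts segments, ignores pacing."""
    cwnd, ssthresh, trace = 1, awnd, []
    for rtt in range(rtts):
        if rtt == timeout_at:                 # timeout: restart slow start
            ssthresh, cwnd = max(cwnd // 2, 1), 1
        trace.append(cwnd)
        cwnd = min(cwnd * 2, awnd) if cwnd < ssthresh else min(cwnd + 1, awnd)
    return trace

print(cwnd_trace())   # [1, 2, 4, 8, 16, 32, 1, 2, 4, 8, 16, 32]
```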
382
Fast Retransmit RTO is generally noticeably longer than actual RTT
If a segment is lost, TCP may be slow to retransmit TCP rule: if a segment is received out of order, an ACK must be issued immediately for the last in-order segment TCP continues to send the same ACK for each incoming segment until the missing one arrives; after that, all incoming segments are ACKed Fast Retransmit rule: if 4 ACKs are received for the same segment, it is highly likely a segment was lost, so retransmit immediately rather than waiting for timeout
383
Fast Retransmit Example
Segment length is 200 octets
384
Fast Recovery When TCP retransmits a segment using Fast Retransmit, a segment is assumed lost Congestion avoidance measures would be appropriate at this point, e.g. slow start from cwnd=1 This may be unnecessarily conservative, since the duplicate ACKs indicate segments are getting through So Fast Recovery rules are applied: retransmit the lost segment, cut cwnd in half, then proceed by incrementing the congestion window by 1 segment for each ACK received This avoids the initial exponential slow start
385
TCP Congestion Control
Dynamic routing can reduce congestion by spreading load more evenly But only effective for unbalanced loads and brief surges in traffic Congestion can ultimately be controlled only by limiting the total amount of data entering the network IP is connectionless, with little provision for detecting or controlling congestion ICMP Source Quench message is crude and not very effective RSVP may help but is not widely implemented or used for all users/applications
386
TCP Flow and Congestion Control
The rate at which a TCP entity can transmit is determined by rate of incoming ACKs to previous segments with new credit Rate of ACK/credit arrival determined by the bottleneck in the round-trip path between source and destination Bottleneck may be destination or Internet
387
TCP Segment Pacing Heights of the pipes represent capacity
Pb = Pr = Ar = Ab = As Steady state: sender’s segment rate is equal to the slowest line on the round trip path TCP’s self-clocking behavior TCP automatically senses the network bottleneck However cannot say whether the bottleneck is at destination or at network
388
Moral of the story If the bottleneck is at the physical layer and consistent, then TCP finds its optimal capacity in the steady state However, if the delay is due to fluctuating queuing effects, then the system may not achieve steady state without intervention There will be delays and queues; no way out! TCP flow should be arranged accordingly If too slow, the system is underutilized; if too fast, congestion The TCP sliding window mechanism should react to congestion effectively That is why we have RTT & RTO estimation mechanisms, slow start, dynamic window sizing and other window management mechanisms
389
Explicit Congestion Notification (ECN)
Defined in RFC 3168 (not native in the original TCP and IP protocols) Routers alert end systems about growing congestion End systems take precautions to reduce load ECN prevents packet drops: end systems are alerted before congestion causes packet drops, so retransmissions are avoided Changes needed to use ECN: TCP and IP protocol implementations must provide support for ECN Two new flag bits are added to the TCP header Two bits of the IP header are redefined as the ECN field TCP entities enable ECN by negotiation at connection establishment time
390
IP Header Originally, the IPv4 header includes an 8-bit Type of Service field and the IPv6 header an 8-bit Traffic Class field Later this field was reallocated: the leftmost 6 bits became the DS (differentiated services) field, and the rightmost 2 bits were unused RFC 3260 names these unused bits the ECN field Interpretations of the ECN field:
Value Label Meaning
00 Not-ECT Packet is not using ECN
01 ECT(1) Set by the sender to indicate ECN-capable transport
10 ECT(0) Set by the sender to indicate ECN-capable transport
11 CE Congestion experienced
391
TCP Header To support ECN, two new flag bits added ECN-Echo (ECE) flag
Used by receiver to inform sender when CE packet has been received Congestion Window Reduced (CWR) flag Used by sender to inform receiver that sender's congestion window has been reduced
392
TCP Initialization TCP header bits used in connection establishment to enable end points to agree to use ECN A sends SYN segment to B with ECE and CWR set Meaning that A is ECN-capable and prepared to use ECN as both sender and receiver If B is prepared to use ECN, returns SYN-ACK segment with ECE set CWR not set If B is not prepared to use ECN, returns SYN-ACK segment with ECE and CWR not set
393
Basic ECN Operation
394
Chapter 13 Traffic and Congestion Control in ATM Networks
395
Introduction Control needed to prevent switch buffer overflow
High speed and small cell size give different problems from other networks Limited number of overhead bits ITU-T specified a restricted initial set (I.371) ATM Forum Traffic Management Specification 4.1
396
Overview Congestion problem Framework adopted by ITU-T and ATM forum
Control schemes for delay sensitive traffic Voice & video Not suited to bursty traffic Traffic control Congestion control Bursty traffic Available Bit Rate (ABR) Guaranteed Frame Rate (GFR)
397
Requirements for ATM Traffic and Congestion Control
Most packet switched and frame relay networks carry non-real-time bursty data No need to replicate timing at exit node Simple statistical multiplexing User Network Interface capacity slightly greater than average of channels Congestion control tools from these technologies do not work in ATM
398
Problems with ATM Congestion Control
Most traffic not amenable to flow control Voice & video cannot stop generating Feedback slow Small cell transmission time vs. propagation delay Wide range of applications From a few kbps to hundreds of Mbps Different traffic patterns Different network services High speed switching and transmission make congestion and traffic control volatile
399
Key Performance Issues- Latency/Speed Effects
E.g. data rate 150 Mbps Takes (53 × 8 bits)/(150 × 10⁶ bps) = 2.8 × 10⁻⁶ seconds to insert a cell Transfer time depends on number of intermediate switches, switching time and propagation delay Assuming no switching delay and speed-of-light propagation, round trip delay is about 48 × 10⁻³ sec across the USA A dropped cell notified by return message will arrive after the source has transmitted N further cells N = (48 × 10⁻³ seconds)/(2.8 × 10⁻⁶ seconds per cell) = 1.7 × 10⁴ cells = 7.2 × 10⁶ bits, i.e. over 7 Mbits
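The same arithmetic as executable Python, for checking the figures:

```python
cell_bits = 53 * 8                 # ATM cell size in bits
rate = 150e6                       # link rate, bps
t_insert = cell_bits / rate        # ~2.8e-6 s to insert one cell
rtt = 48e-3                        # coast-to-coast round trip, s
cells_in_flight = rtt / t_insert   # ~1.7e4 cells
print(cells_in_flight, cells_in_flight * cell_bits / 1e6, "Mbits")
```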
400
Key Performance Issues- Cell Delay Variation
For digitized voice delay across network must be small Rate of delivery must be constant Variations will occur Dealt with by Time Reassembly of CBR cells (see next slide) Results in cells delivered at CBR with occasional gaps due to dropped cells Subscriber requests minimum cell delay variation from network provider Increase data rate at UNI relative to load Increase resources within network
401
Time Reassembly of CBR Cells
402
Network Contribution to Cell Delay Variation
In packet switched network Queuing effects at each intermediate switch Processing time for header and routing Less for ATM networks Minimal processing overhead at switches Fixed cell size, header format No flow control or error control processing ATM switches have extremely high throughput Congestion can cause cell delay variation Build up of queuing effects at switches Total load accepted by network must be controlled
403
Cell Delay Variation at UNI
Caused by processing in three layers of ATM model See next slide for details None of these delays can be predicted None follow repetitive pattern So, random element exists in time interval between reception by ATM stack and transmission
404
Origins of Cell Delay Variation
405
ATM Traffic-Related Attributes
Six service categories (see chapter 5) Constant bit rate (CBR) Real time variable bit rate (rt-VBR) Non-real-time variable bit rate (nrt-VBR) Unspecified bit rate (UBR) Available bit rate (ABR) Guaranteed frame rate (GFR) Characterized by ATM attributes in four categories Traffic descriptors QoS parameters Congestion Other
406
ATM Service Category Attributes
407
Traffic Parameters Traffic pattern of flow of cells
Intrinsic nature of traffic Source traffic descriptor Modified inside network Connection traffic descriptor
408
Source Traffic Descriptor (1)
Peak cell rate Upper bound on traffic that can be submitted Defined in terms of minimum spacing T between cells: PCR = 1/T Mandatory for CBR and VBR services Sustainable cell rate Upper bound on average rate Calculated over a time scale large relative to T Required for VBR Enables efficient allocation of network resources between VBR sources Only useful if SCR < PCR
409
Source Traffic Descriptor (2)
Maximum burst size Max number of cells that can be sent at PCR If bursts are at MBS, idle gaps must be enough to keep overall rate below SCR Required for VBR Minimum cell rate Min commitment requested of network Can be zero Used with ABR and GFR ABR & GFR provide rapid access to spare network capacity up to PCR PCR – MCR represents elastic component of data flow Shared among ABR and GFR flows
410
Source Traffic Descriptor (3)
Maximum frame size Max number of cells in frame that can be carried over GFR connection Only relevant in GFR
411
Connection Traffic Descriptor
Includes source traffic descriptor plus: Cell delay variation tolerance Amount of variation in cell delay introduced by network interface and UNI Bound on delay variability due to slotted nature of ATM, physical layer overhead and layer functions (e.g. cell multiplexing) Represented by time variable τ Conformance definition Specifies conforming cells of connection at UNI Enforced by dropping or marking cells that exceed the definition
412
Quality of Service Parameters- maxCTD
Cell transfer delay (CTD) Time between transmission of first bit of cell at source and reception of last bit at destination Typically has a probability density function (see next slide): fixed delay due to propagation etc. plus cell delay variation due to buffering and scheduling Maximum cell transfer delay (maxCTD) is the max requested delay for the connection A fraction α of cells exceed this threshold and are discarded or delivered late
413
Quality of Service Parameters- Peak-to-peak CDV & CLR
Peak-to-peak Cell Delay Variation Remaining (1-α) cells within QoS Delay experienced by these cells is between fixed delay and maxCTD This is peak-to-peak CDV CDVT is an upper bound on CDV Cell loss ratio Ratio of cells lost to cells transmitted
414
Cell Transfer Delay PDF
415
Congestion Control Attributes
Only feedback is defined ABR and GFR Actions taken by network and end systems to regulate traffic submitted ABR flow control Adaptively share available bandwidth
416
Other Attributes Behaviour class selector (BCS)
Support for IP differentiated services (chapter 16) Provides different service levels among UBR connections Associate each connection with a behaviour class May include queuing and scheduling Minimum desired cell rate
417
Traffic Management Framework
Objectives of ATM layer traffic and congestion control Support QoS for all foreseeable services Not rely on network specific AAL protocols nor higher layer application specific protocols Minimize network and end system complexity Maximize network utilization
418
Timing Levels Cell insertion time Round trip propagation time
Connection duration Long term
419
Traffic Control and Congestion Functions
420
Traffic Control Strategy
Determine whether new ATM connection can be accommodated Agree performance parameters with subscriber Traffic contract between subscriber and network This is congestion avoidance If it fails congestion may occur Invoke congestion control
421
Traffic Control Resource management using virtual paths
Connection admission control Usage parameter control Selective cell discard Traffic shaping Explicit forward congestion indication
422
Resource Management Using Virtual Paths
Allocate resources so that traffic is separated according to service characteristics Virtual path connections (VPCs) are groupings of virtual channel connections (VCCs)
423
Applications User-to-user applications VPC between UNI pair
No knowledge of QoS for individual VCC User checks that VPC can take VCCs’ demands User-to-network applications VPC between UNI and network node Network aware of and accommodates QoS of VCCs Network-to-network applications VPC between two network nodes
424
Resource Management Concerns
Cell loss ratio Max cell transfer delay Peak to peak cell delay variation All affected by resources devoted to VPC If VCC goes through multiple VPCs, performance depends on consecutive VPCs and on node performance VPC performance depends on capacity of VPC and traffic characteristics of VCCs VCC related function depends on switching/processing speed and priority
425
VCCs and VPCs Configuration
426
Allocation of Capacity to VPC
Aggregate peak demand May set VPC capacity (data rate) to total of VCC peak rates Each VCC can give QoS to accommodate peak demand VPC capacity may not be fully used Statistical multiplexing VPC capacity >= average data rate of VCCs but < aggregate peak demand Greater CDV and CTD May have greater CLR More efficient use of capacity For VCCs requiring lower QoS Group VCCs of similar traffic together
427
Connection Admission Control
User must specify service required in both directions Category Connection traffic descriptor Source traffic descriptor CDVT Requested conformance definition QoS parameter requested and acceptable value Network accepts connection only if it can commit resources to support requests
428
Procedures to Set Traffic Control Parameters
429
Cell Loss Priority Two levels requested by user
Priority for individual cell indicated by CLP bit in header If two levels are used, traffic parameters for both flows specified High priority CLP = 0 All traffic CLP = 0 + 1 May improve network resource allocation
430
Usage Parameter Control
UPC Monitors connection for conformity to traffic contract Protect network resources from overload on one connection Done at VPC or VCC level VPC level more important Network resources allocated at this level
431
Location of UPC Function
432
Peak Cell Rate Algorithm
How UPC determines whether user is complying with contract Control of peak cell rate and CDVT Complies if peak does not exceed agreed peak Subject to CDV within agreed bounds Generic cell rate algorithm Leaky bucket algorithm
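A sketch of the virtual-scheduling form of GCRA(T, τ), the algorithm shown in the following figures; variable names are ours:

```python
def gcra(arrivals, T, tau):
    """GCRA(T, tau) virtual scheduling: T = 1/PCR, tau = CDVT.
    Returns the conformance verdict for each cell arrival time."""
    tat, verdicts = arrivals[0], []    # theoretical arrival time
    for ta in arrivals:
        if ta >= tat - tau:            # not too early: conforming
            verdicts.append(True)
            tat = max(tat, ta) + T     # schedule next theoretical arrival
        else:                          # more than tau early: non-conforming
            verdicts.append(False)
    return verdicts

# Cells at T=1.0 spacing with one early cell, tau=0.5
print(gcra([0.0, 1.0, 1.2, 3.0], T=1.0, tau=0.5))   # [True, True, False, True]
```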
433
Generic Cell Rate Algorithm
434
Virtual Scheduling Algorithm
435
Cell Arrival at UNI (T=4.5δ)
436
Leaky Bucket Algorithm
437
Continuous Leaky Bucket Algorithm
438
Sustainable Cell Rate Algorithm
Operational definition of relationship between sustainable cell rate and burst tolerance Used by UPC to monitor compliance Same algorithm as peak cell rate
439
UPC Actions Compliant cells pass, non-compliant cells are discarded
If no additional resources are allocated to CLP=1 traffic, this is the only action for CLP=0 cells If two-level cell loss priority is used, a cell with: CLP=0 that conforms passes CLP=0 non-compliant for CLP=0 traffic but compliant for CLP=0+1 is tagged and passes CLP=0 non-compliant for both CLP=0 and CLP=0+1 traffic is discarded CLP=1 compliant for CLP=0+1 passes CLP=1 non-compliant for CLP=0+1 is discarded
440
Possible Actions of UPC
441
Selective Cell Discard
Starts when network, at point beyond UPC, discards CLP=1 cells Discard low priority cells to protect high priority cells No distinction between cells labelled low priority by source and those tagged by UPC
442
Traffic Shaping GCRA is a form of traffic policing
Flow of cells regulated Cells exceeding performance level tagged or discarded Traffic shaping used to smooth traffic flow Reduce cell clumping Fairer allocation of resources Reduced average delay
443
Token Bucket for Traffic Shaping
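A minimal token-bucket shaper sketch (rate in tokens/sec, depth in cells; the interface is illustrative, not from the text):

```python
def token_bucket_shape(arrivals, rate, depth):
    """Tokens accrue at `rate`/s up to `depth`; each cell consumes one
    token, waiting if none is available. Returns smoothed departure times."""
    tokens, clock, departures = depth, 0.0, []
    for t in arrivals:
        now = max(t, clock)                  # advance to arrival or last departure
        tokens = min(depth, tokens + (now - clock) * rate)
        clock = now
        if tokens < 1:                       # wait for the next token
            clock += (1 - tokens) / rate
            tokens = 1.0
        tokens -= 1
        departures.append(clock)
    return departures

# A burst of 4 simultaneous cells: 2 pass at once, rest paced at 2/s
print(token_bucket_shape([0, 0, 0, 0], rate=2.0, depth=2))  # [0, 0, 0.5, 1.0]
```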
444
Explicit Forward Congestion Indication
Essentially same as frame relay If a node is experiencing congestion, it sets the explicit forward congestion indication in cell headers Tells users that congestion avoidance should be initiated in this direction User may take action at higher layer
445
ABR Traffic Management
QoS for CBR, VBR based on traffic contract and UPC described previously No congestion feedback to source Open-loop control Not suited to non-real-time applications File transfer, web access, RPC, distributed file systems No well defined traffic characteristics except PCR PCR not enough to allocate resources Use best efforts or closed-loop control
446
Best Efforts Share unused capacity between applications
As congestion goes up: Cells are lost Sources back off and reduce rate Fits well with TCP techniques (chapter 12) Inefficient Cells dropped causing re-transmission
447
Closed-Loop Control Sources share capacity not used by CBR and VBR
Provide feedback to sources to adjust load Avoid cell loss Share capacity fairly Used for ABR
448
Characteristics of ABR
ABR connections share available capacity Access instantaneous capacity unused by CBR/VBR Increases utilization without affecting CBR/VBR QoS Share used by single ABR connection is dynamic Varies between agreed MCR and PCR Network gives feedback to ABR sources ABR flow limited to available capacity Buffers absorb excess traffic prior to arrival of feedback Low cell loss Major distinction from UBR
449
Feedback Mechanisms (1)
Cell transmission rate characterized by: Allowable cell rate (ACR) Current permitted rate Minimum cell rate (MCR) Minimum for ACR May be zero Peak cell rate (PCR) Maximum for ACR Initial cell rate (ICR)
450
Feedback Mechanisms (2)
Start with ACR=ICR Adjust ACR based on feedback Feedback in resource management (RM) cells Cell contains three fields for feedback Congestion indicator bit (CI) No increase bit (NI) Explicit cell rate field (ER)
451
Source Reaction to Feedback
If CI=1: reduce ACR by an amount proportional to current ACR, but not below MCR Else if NI=0: increase ACR by an amount proportional to PCR, but not above PCR If ACR > ER, set ACR ← max[ER, MCR]
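These rules in runnable form; RIF and RDF are the usual rate increase/decrease factors, and the default values shown are typical assumptions rather than values from the slide:

```python
def adjust_acr(acr, ci, ni, er, mcr, pcr, rif=1/16, rdf=1/16):
    """Source reaction to a returned RM cell, per the rules above."""
    if ci:                                   # congestion: multiplicative decrease
        acr = max(acr - acr * rdf, mcr)
    elif not ni:                             # no congestion, increase allowed
        acr = min(acr + pcr * rif, pcr)
    if acr > er:                             # obey explicit rate, but not below MCR
        acr = max(er, mcr)
    return acr

print(adjust_acr(acr=100.0, ci=True, ni=False, er=80.0, mcr=10.0, pcr=150.0))
```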
452
Variations in ACR
453
Cell Flow on ABR Two types of cell Data & resource management (RM)
Source receives regular RM cells as feedback Bulk of RM cells initiated by source One forward RM cell (FRM) per (Nrm - 1) data cells Nrm preset – usually 32 Each FRM is returned by destination as a backward RM (BRM) cell FRM typically has CI=0, NI=0 or 1 ER set to desired transmission rate in range ICR ≤ ER ≤ PCR Any field may be changed by a switch or the destination before return
454
ATM Switch Rate Control Feedback
EFCI marking Explicit forward congestion indication Causes destination to set CI bit in ERM Relative rate marking Switch directly sets CI or NI bit of RM If set in FRM, remains set in BRM Faster response by setting bit in passing BRM Fastest by generating new BRM with bit set Explicit rate marking Switch reduces value of ER in FRM or BRM
455
Flow of Data and RM Cells
456
ABR Feedback vs TCP ACK ABR feedback controls rate of transmission
Rate control TCP feedback controls window size Credit control ABR feedback comes from switches or destination TCP feedback from destination only
457
RM Cell Format
458
RM Cell Format Notes ATM header has PT=110 to indicate RM cell
On a virtual channel, VPI and VCI are the same as data cells on the connection On a virtual path, VPI is the same, VCI=6 Protocol id identifies service using RM (ABR=1) Message type Direction FRM=0, BRM=1 BECN cell: source (BN=0) or switch/destination (BN=1) CI (=1 for congestion) NI (=1 for no increase) Request/Acknowledge (not used in ATM Forum spec)
459
Initial Values of RM Cell Fields
460
ABR Parameters
461
ABR Capacity Allocation
ATM switch must perform: Congestion control Monitor queue length Fair capacity allocation Throttle back connections using more than fair share ATM rate control signals are explicit; TCP's are implicit (increasing delay and cell loss)
462
Congestion Control Algorithms- Binary Feedback
Use only EFCI, CI and NI bits Switch monitors buffer utilization When congestion approaches, binary notification Set EFCI on forward data cells or CI or NI on FRM or BRM Three approaches to which to notify Single FIFO queue Multiple queues Fair share notification
463
Single FIFO Queue When buffer use exceeds threshold (e.g. 80%)
Switch starts issuing binary notifications Continues until buffer use falls below threshold Can have two thresholds One for start and one for stop Stops continuous on/off switching Biased against connections passing through more switches
464
Multiple Queues Separate queue for each VC or group of VCs
Separate threshold on each queue Only connections with long queues get binary notifications Fair Badly behaved source does not affect other VCs Delay and loss behaviour of individual VCs separated Can have different QoS on different VCs
465
Fair Share Selective feedback or intelligent marking
Try to allocate capacity dynamically E.g. fairshare =(target rate)/(number of connections) Mark any cells where CCR>fairshare
466
Explicit Rate Feedback Schemes
Compute fair share of capacity for each VC Determine current load or congestion Compute explicit rate (ER) for each connection and send to source Three algorithms Enhanced proportional rate control algorithm EPRCA Explicit rate indication for congestion avoidance ERICA Congestion avoidance using proportional control CAPC
467
Enhanced Proportional Rate Control Algorithm(EPRCA)
Switch tracks average value of current load on each connection Mean allowed cell rate (MACR) MACR(i) = (1 - α)·MACR(i-1) + α·CCR(i) CCR(i) is the CCR field in the ith FRM Typically α = 1/16 Bias to past values of CCR over current Gives estimated average load passing through switch If congestion, switch reduces each VC to no more than DPF·MACR DPF = down pressure factor, typically 7/8 ER ← min[ER, DPF·MACR]
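A sketch of the EPRCA bookkeeping described above; function names are ours:

```python
def eprca_update(macr, ccr, alpha=1/16):
    """Running average of load: MACR(i) = (1-a)*MACR(i-1) + a*CCR(i)."""
    return (1 - alpha) * macr + alpha * ccr

def eprca_er(er, macr, congested, dpf=7/8):
    """Under congestion, cap the explicit rate at DPF * MACR."""
    return min(er, dpf * macr) if congested else er

macr = 100.0
for ccr in (120, 120, 120):          # sustained higher load raises the average
    macr = eprca_update(macr, ccr)
print(macr, eprca_er(er=150.0, macr=macr, congested=True))
```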
468
Load Factor Adjustments based on load factor LF=Input rate/target rate
Input rate measured over fixed averaging interval Target rate slightly below link bandwidth (85 to 90%) LF>1 congestion threatened VCs will have to reduce rate
469
Explicit Rate Indication for Congestion Avoidance (ERICA)
Attempt to keep LF close to 1 Define: fairshare = (target rate)/(number of connections) VCshare = CCR/LF = (CCR/(Input Rate)) *(Target Rate) ERICA selectively adjusts VC rates Total ER allocated to connections matches target rate Allocation is fair ER = max[fairshare, VCshare] VCs whose VCshare is less than their fairshare get greater increase
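The ERICA computation is short enough to transcribe directly; a sketch with illustrative numbers:

```python
def erica_er(ccr, input_rate, target_rate, n_connections):
    """ERICA explicit rate as defined above:
    LF = input/target, fairshare = target/N, VCshare = CCR/LF."""
    lf = input_rate / target_rate
    fairshare = target_rate / n_connections
    vcshare = ccr / lf
    return max(fairshare, vcshare)

# 10 VCs, link loaded at 120% of target: a small VC is pulled up to fairshare
print(erica_er(ccr=5.0, input_rate=120.0, target_rate=100.0, n_connections=10))
```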
470
Congestion Avoidance Using Proportional Control (CAPC)
If LF < 1: fairshare ← fairshare × min[ERU, 1 + (1 - LF)·Rup] If LF > 1: fairshare ← fairshare × max[ERF, 1 + (1 - LF)·Rdn] ERU > 1, determines max increase Rup, slope parameter, at most 0.1 Rdn, slope parameter, between 0.2 and 0.8 ERF, typically 0.5, max decrease in allotment of fair share If fairshare < ER value in RM cells, ER ← fairshare Simpler than ERICA Can show large rate oscillations if RIF (rate increase factor) too high Can lead to unfairness
471
GFR Overview As simple as UBR from the end-system view
End system does no policing or traffic shaping May transmit at line rate of ATM adaptor Modest requirements on ATM network No guarantee of frame delivery Higher layer (e.g. TCP) reacts to congestion causing dropped frames User can reserve cell rate capacity for each VC Application can send at min rate without loss Network must recognise frames as well as cells If congested, network discards entire frame All cells of a frame have same CLP setting CLP=0 guaranteed delivery, CLP=1 best efforts
472
GFR Traffic Contract Peak cell rate PCR Minimum cell rate MCR
Maximum burst size MBS Maximum frame size MFS Cell delay variation tolerance CDVT
473
Mechanisms for supporting Rate Guarantees
Tagging and policing Buffer management Scheduling
474
Tagging and Policing Tagging identifies frames that conform to contract and those that don’t: CLP=1 for those that don’t Set by the network element doing the conformance check, or by the source to mark less important frames Tagged frames get lower QoS in buffer management and scheduling Tagged cells can be discarded at ingress to the ATM network or at a subsequent switch Discarding is a policing function
475
Buffer Management Treatment of cells in buffers or when arriving and requiring buffering If congested (high buffer occupancy) tagged cells discarded in preference to untagged Discard tagged cell to make room for untagged cell May buffer per-VC Discards may be based on per queue thresholds
476
Scheduling Give preferential treatment to untagged cells
Separate queues for each VC Per VC scheduling decisions E.g. FIFO modified to give CLP=0 cells higher priority Scheduling between queues controls outgoing rate of VCs Individual cells get fair allocation while meeting traffic contract
477
Components of GFR Mechanism
478
GFR Conformance Definition
UPC function UPC monitors VC for traffic conformance Tag or discard non-conforming cells Frame conforms if all cells in frame conform Rate of cells within contract Generic cell rate algorithm PCR and CDVT specified for connection All cells have same CLP Within maximum frame size (MFS)
479
QoS Eligibility Test Test for contract conformance
Discard or tag non-conforming cells Looking at upper bound on traffic Determine frames eligible for QoS guarantee Under GFR contract for VC Looking at lower bound for traffic Frames are one of: Nonconforming: cells tagged or discarded Conforming ineligible: best efforts Conforming eligible: guaranteed delivery
480
Simplified Frame Based GCRA
481
Chapter 14 Overview of Graph Theory and Least-Cost Paths
482
Introduction Comms networks can be represented by graphs
Switches & routers are vertices Comms lines are edges Routing protocols use shortest path algorithms This chapter is background to chapters on routing
483
Elementary Concepts Graph G(V,E) is two sets of objects
Vertices (or nodes), set V Edges, set E Defined as an unordered pair of vertices Shown as dots or circles (vertices) joined by lines (edges) Vertex i is adjacent to vertex j if (i,j) ∈ E Magnitude of graph G characterised by number of vertices |V| (called the order of G) and number of edges |E| (the size of G) Running time of algorithm measured in terms of order and size
484
Example Graph
485
Adjacency Matrix Used to represent a graph Vertices numbered arbitrarily
The |V| × |V| adjacency matrix A = (ai,j) is defined by: ai,j = 1 if (i,j) ∈ E, 0 otherwise Matrix is symmetrical about the upper-left to lower-right diagonal Because an edge is defined as an unordered pair
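A minimal builder for such a matrix (0-based vertex numbering is our choice; the text numbers vertices arbitrarily):

```python
def adjacency_matrix(n_vertices, edges):
    """Adjacency matrix of an undirected simple graph."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1       # unordered pair: symmetric entries
    return a

for row in adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)]):
    print(row)
```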
486
Adjacency Matrix Example
487
Terminology Two edges incident on same pair of vertices are parallel
Edge incident on single vertex is a loop Graph with neither parallel edges nor loos is simple Path from vertex i to vertex j is: Alternating sequence of vertices and edge Starting at i and ending at j Each edge joins vertices immediately before and after it Simple path – no vertex nor edge appears more than once In simple graph, simple path may be defined by sequence of vertices Each vertex adjacent to preceding and following vertices No vertex repeated
488
Simple Paths (1) From V1 to V6 (incomplete list) V1,V2,V3,V4,V5,V6
Total of 14 paths (Work out the rest yourself)
489
Simple Paths (2) V1,V3,V6 is shortest
Distance between vertices is the minimum number of edges over all paths A cycle is a path starting and ending on the same vertex E.g. V1,V3,V4,V1
490
Digraphs Directed graph
G(V,E) with each edge defined by an ordered pair of vertices Lines, representing edges, have arrow heads to indicate direction Parallel edges allowed if in opposite directions Good for representing comms networks Each directed edge represents data flow in one direction Still use adjacency matrix Not symmetrical unless each pair of adjacent vertices connected by parallel edges
491
Weighted Graph Or weighted digraph Number associated with each edge
Used to illustrate routing algorithms Adjacency matrix defined as: ai,j = wi,j if (i,j) ∈ E, 0 otherwise, where wi,j is the weight associated with edge (i,j) Length of path is sum of weights Shortest-distance path not necessarily shortest-length (see next two slides)
492
Weighted Graph and Adjacency Matrix
493
Path Distances and Lengths V1 to V6
Path (distance = number of edges; length = sum of the edge weights in the preceding figure):
V1,V2,V3,V4,V5,V6 distance 5
V1,V2,V3,V5,V6 distance 4
V1,V2,V3,V6 distance 3
V1,V2,V4,V3,V5,V6 distance 5
V1,V2,V4,V5,V6 distance 4
V1,V3,V2,V4,V5,V6 distance 5
V1,V3,V6 distance 2
V1,V4,V5,V6 distance 3
494
Trees Subset of graphs Equivalent definitions:
Simple graph such that if i and j vertices in T, there is a unique simple path from i to j Simple graph of N vertices is tree if it has N-1 edges and no cycles Simple graph of N vertices is tree if it has N-1 edges and is connected One vertex may be designated root Root drawn at top Vertices adjacent to root drawn at next level Can reach root on path distance 1
495
Family Tree Each vertex (except root) has one parent vertex
Adjacent vertex closer to root Each vertex has zero or more child vertices Adjacent vertices further from root A vertex without children is called a leaf Root assigned level 0 Vertices immediately under root are on level 1 Children of vertices on level 1 are on level 2
496
E.g. Tree
497
Subgraph Subgraph of graph G obtained by selecting a number of edges and vertices from G For each selected edge, the two vertices incident on that edge must be selected Given graph G(V,E), graph G′(V′,E′) is a subgraph of G iff V′ ⊆ V, E′ ⊆ E, and for every e′ ∈ E′, if e′ is incident on v′ and w′ then v′, w′ ∈ V′
498
Spanning Tree Subgraph T of graph G is a spanning tree if T is a tree
T includes all vertices of G In other words remove edges from G such that: Remove all cycles Maintain connectivity Not usually unique
499
E.g. Spanning Trees For Previous Graph
Also previous tree example (slide 16)
500
Breadth First Search (BFS) for Spanning Tree
Partition vertices of graph into sets at various levels Process all vertices on a given level before proceeding to the next level Start at any vertex x; assign it level 0 All adjacent vertices are at level 1 Let Vi1, Vi2, Vi3, … Vij be the vertices at level i Consider all vertices adjacent to Vi1 not at levels 1, 2, …, i Assign these level (i+1) Consider all vertices adjacent to Vi2 not at levels 1, 2, 3, …, i, (i+1) Assign these also level (i+1) Continue until all vertices processed
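A compact BFS spanning-tree sketch over an adjacency-list dict; the six-vertex example graph is arbitrary, not the figure's:

```python
from collections import deque

def bfs_spanning_tree(adj, root):
    """Level-by-level BFS; returns tree edges and each vertex's level
    (which is also its shortest path distance from the root)."""
    level, tree = {root: 0}, []
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level:               # unvisited: adopt into tree
                level[v] = level[u] + 1
                tree.append((u, v))
                q.append(v)
    return tree, level

adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 6], 4: [1, 5], 5: [4, 6], 6: [3, 5]}
print(bfs_spanning_tree(adj, 1))
```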
501
E.g. Using Previous Graph
Choose order Obvious one is V1,V2,V3,V4,V5,V6 Select root Again, obvious one is V1 Let tree T consist of single vertex V1 with no edges Add to T each edge (V1,x) and vertex x such that no cycle is produced Gives edges (V1,V2), (V1,V3), (V1,V4) and vertices V2, V3, V4 This is the first level Repeat for all level 1 vertices to give level 2 All vertices now added If not, repeat for level 2 to give level 3 …
502
BFS of Previous Graph
503
Shortest Path Distance
BFS finds shortest path distance from given source vertex to all other vertices Minimum number of edges in any path from s to v, δ(s,v)
504
Estimated Running Time
After initialization each vertex is used exactly once as a starting point for adding the next layer Time taken is order of |V| Each edge already in the tree is rejected if examined again Each edge not in the tree is checked to see if it produces a cycle; if not it is included Bulk of edge processing is once per edge Time taken is order of |E| Total time taken is linear in |V| and |E|
505
Shortest Path Length Determination
Packet switching, frame relay or ATM network can be viewed as digraph Each node is a vertex Each link is a pair of parallel edges For an internet (Internet or intranet) Each router is vertex If routers directly connected (e.g. LAN or WAN) two way connection corresponds to pair of parallel edges If more than two routers, network represented by multiple pairs of parallel edges One pair connecting each pair of routers In both cases, routing decision needed to pass packet from source to destination Equivalent to finding path through a graph
506
Routing Decisions Based on least cost Minimum number of hops
Each edge (hop) has weight 1 Corresponds to minimum path distance Or, cost associated with each hop (next slide) Cost of path is sum of costs of links in path Want least cost path Corresponds to minimum path length in weighted digraph
507
Cost of a Hop Inversely proportional to path capacity
Proportional to current load Monetary cost of link etc. Combination May be different in different directions
508
Dijkstra’s Algorithm (1) – Definitions
N = set of vertices in network s = source vertex (starting point) T = set of vertices so far incorporated Tree = spanning tree for vertices in T including edges on least-cost path from s to each vertex in T w(i,j) = link cost from vertex i to vertex j w(i,i) = 0 w(i,j) = ∞ if i, j not directly connected by a single edge w(i,j) ≥ 0 if i, j directly connected by a single edge L(n) = cost of least-cost path from s to n currently known At termination, this is the least-cost path from s to n
509
Dijkstra’s Algorithm (2) – Steps
Initialization T = Tree = {s} - only source is so far incorporated L(n) = w(s,n) for n ≠ s - initial path costs to neighbors are link costs Get next vertex Find x ∉ T such that L(x) = min L(j) over j ∉ T Add x to T and Tree Add to Tree the least-cost edge incident on x Last hop in path Update least-cost paths L(n) = min[L(n), L(x) + w(x,n)] for all n ∉ T If the latter term is the minimum, the path from s to n is now the path from s to x concatenated with the edge from x to n
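A runnable sketch of these steps; it uses a heap rather than the text's linear scan for the minimum, and the example graph is ours:

```python
import heapq

def dijkstra(w, s):
    """Least-cost paths from s. `w` maps vertex -> {neighbor: link cost};
    returns L(n) and each vertex's predecessor on its least-cost path."""
    L, pred = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > L.get(x, float('inf')):
            continue                         # stale heap entry
        for n, cost in w[x].items():
            if d + cost < L.get(n, float('inf')):
                L[n], pred[n] = d + cost, x
                heapq.heappush(heap, (d + cost, n))
    return L, pred

w = {'s': {'a': 2, 'b': 5}, 'a': {'b': 1, 't': 7}, 'b': {'t': 3}, 't': {}}
print(dijkstra(w, 's'))   # L['t'] = 6 via s -> a -> b -> t
```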
510
Dijkstra’s Algorithm (3) – Notes
Terminate when all vertices added to T Requires |V| iterations At termination L(x) associated with each vertex is cost of least-cost path from s to x Tree is a spanning tree Defines least-cost path from s to each other vertex One step adds one vertex to T and defines least-cost path from s to that vertex Running time order of |V|²
511
Dijkstra’s Algorithm on Example Graph
512
Bellman-Ford Algorithm (1) – Definitions
s = source vertex (starting point) w(i,j) = link cost from vertex i to vertex j w(i,i) = 0 w(i,j) = ∞ if i, j not directly connected by a single edge w(i,j) ≥ 0 if i, j directly connected by a single edge h = max number of links in path at current stage Lh(n) = cost of least-cost path from s to n with no more than h links
513
Bellman-Ford Algorithm (2) – Steps
Initialization L0(n) = ∞ for all n ≠ s Lh(s) = 0 for all h Update For each successive h ≥ 0 For each n ≠ s, compute: Lh+1(n) = min over j of [Lh(j) + w(j,n)] Connect n with the predecessor vertex j that achieves the minimum Eliminate any connection of n with a different predecessor vertex from a previous iteration Path from s to n terminates with the link from j to n
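The same iteration in Python; the directed edge weights in the example are illustrative:

```python
def bellman_ford(vertices, w, s):
    """L_h(n) iteration as in step 2: h = max links allowed so far.
    `w` is a dict of (i, j) -> cost for directed edges."""
    INF = float('inf')
    L = {n: INF for n in vertices}
    L[s] = 0
    for _ in range(len(vertices) - 1):       # h = 1 .. |V|-1
        L_next = dict(L)
        for (j, n), cost in w.items():
            if L[j] + cost < L_next[n]:
                L_next[n] = L[j] + cost      # predecessor of n becomes j
        L = L_next
    return L

w = {('s', 'a'): 2, ('s', 'b'): 5, ('a', 'b'): 1, ('a', 't'): 7, ('b', 't'): 3}
print(bellman_ford(['s', 'a', 'b', 't'], w, 's'))   # {'t': 6, ...}
```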
514
Bellman-Ford Algorithm (3) – Notes
Results agree with Dijkstra Running time order of |V| × |E|
515
Bellman-Ford Algorithm on Example Graph
516
Results of Dijkstra and Bellman-Ford
517
Comparison of Information Needed – Bellman-Ford
Calculation for vertex n involves knowledge of link cost to all neighbors of n plus total path cost to each neighbor from the source Each vertex can keep a set of costs and paths for every other vertex in the network Exchange information with direct neighbors Each vertex can use Bellman-Ford step 2, based on information from neighbors and knowledge of its link costs, to update its costs and paths
518
Comparison of Information Needed – Dijkstra
Step 3 requires each vertex must have complete topology Must know link costs of all links in network Information must be exchanged between all other vertices Evaluation must also consider calculation time
519
Other Notes Both algorithms converge under static conditions of topology and link cost, and give the same solution If link costs change, the algorithms will attempt to catch up If link costs depend on traffic, which depends on the routes chosen: a feedback condition exists Instability may result
520
Chapter 15 Interior Routing Protocols
521
Introduction Routing protocols essential to operation of an internet
Routers forward IP datagrams from one router to another on path from source to destination Router must have idea of topology of internet Routing protocols provide this information
522
Internet Routing Principles
Routers receive and forward datagrams Make routing decisions based on knowledge of topology and conditions on internet Decisions based on some least cost criterion
523
Fixed Routing Single permanent route configured for each source-destination pair Routes fixed May change when topology changes Link cost not based on dynamic data Based on estimated traffic volumes or capacity of link
524
Figure 1 A Configuration of Routers and Networks
525
Discussion of Example 5 networks, 8 routers
Link cost for output side of each router for each network Next slide shows how fixed cost routing may be implemented Each router has routing table
526
Routing Table One required for each router Entry for each network
Not for each destination Routing only needs network portion Once datagram reaches router attached to destination network, that router can deliver to host IP address typically has network and host portion Each entry shows next node on route Not whole route
527
Routing Tables in Hosts
May also exist in hosts If attached to single network with single router then not needed All traffic must go through that router (called the gateway) If multiple routers attached to network, host needs table saying which to use
528
Figure 2 Example Routing Tables
529
Interior Routing Protocols
Adaptive Routing As conditions on the internet change, routes may change Failure Can route around problems Congestion Can route around congestion Avoid, or at least not add to, further congestion
530
Drawbacks of Adaptive Routing
More complex routing decisions Router processing increases Depends on information collected in one place but used in another More information exchanged improves routing decisions but increases overhead May react too fast, causing congestion through oscillation May react too slowly, making the information irrelevant Can produce pathologies Fluttering Looping
531
Fluttering Rapid oscillation in routing
Due to router attempting load balancing or splitting Splitting traffic among a number of routes May result in successive packets bound for same destination taking very different routes (see next slide)
532
Figure 3 Example of Fluttering
533
Problems with Fluttering
If in one direction only, route characteristics may differ in the two directions Including timing and error characteristics Confuses management and troubleshooting applications that measure these Difficulty estimating round trip times TCP packets arrive out of order Spurious retransmission Duplicate acknowledgements
534
Looping Packet forwarded by router eventually returns to that router
Algorithms designed to prevent looping May occur when changes in connectivity not propagated fast enough to all other routers
535
Adaptive Routing Advantages
Improve performance as seen by user Can aid congestion control Benefits depend on soundness of design Adaptive routing very complex Continual evolution of protocols
536
Classification of Adaptive Routing Strategies
Based on information sources Local E.g. route each datagram to network with shortest queue Balance loads on networks May not be heading in correct direction Include preferred direction Rarely used Adjacent nodes Distance vector algorithms All nodes Link-state algorithms Both need routing protocol to exchange information
537
Autonomous Systems (AS)
Group of routers exchanging information via common routing protocol Set of routers and networks managed by single organization Connected Except in time of failure
538
Interior Routing Protocol (IRP)
Passes routing information between routers within AS Does not need to be implemented outside AS Allows IRP to be tailored May be different algorithms and routing information in different connected AS Need minimum information from other connected AS At least one router in each AS must talk Use Exterior Routing Protocol (ERP)
539
Exterior Routing Protocol (ERP)
Pass less information than IRP Router in first system determines route to target AS Routers in target AS then co-operate to deliver datagram ERP does not deal with details within target AS
540
Figure 4 Application of Exterior and Interior Routing Protocols
541
Approaches to Routing – Distance-vector
Each node (router or host) exchange information with neighboring nodes Neighbors are both directly connected to same network First generation routing algorithm for ARPANET Node maintains vector of link costs for each directly attached network and distance and next-hop vectors for each destination Used by Routing Information Protocol (RIP) Requires transmission of lots of information by each router Distance vector to all neighbors Contains estimated path cost to all networks in configuration Changes take long time to propagate
542
Approaches to Routing – Link-state
Designed to overcome drawbacks of distance-vector When router initialized, it determines link cost on each interface Advertises set of link costs to all other routers in topology Not just neighboring routers From then on, monitor link costs If significant change, router advertises new set of link costs Each router can construct topology of entire configuration Can calculate shortest path to each destination network Router constructs routing table, listing first hop to each destination
543
Router does not use distributed routing algorithm
Use any routing algorithm to determine shortest paths In practice, Dijkstra's algorithm Open shortest path first (OSPF) protocol uses link-state routing. Also second generation routing algorithm for ARPANET
544
Exterior Router Protocols – Path-vector
Dispense with routing metrics Provide information about which networks can be reached by a given router and the ASs crossed to get there Does not include distance or cost estimate Each block of information lists all ASs visited on this route Enables router to perform policy routing E.g. avoid transiting a particular AS E.g. consider link speed, capacity, tendency to become congested, overall quality of operation, security E.g. minimizing number of transit ASs
545
Least Cost Algorithms Least-cost criterion
If minimizing number of hops, each link has value 1 Link value may be inversely proportional to capacity, proportional to current load, or some combination May differ in the two directions E.g. if cost equals length of queue Cost of path between two nodes is sum of costs of links traversed For each pair of nodes, find the least-cost path Dijkstra's algorithm Bellman-Ford algorithm
546
Figure 5 Dijkstra’s Algorithm Applied to Figure 1
547
Figure 6 Bellman-Ford Algorithm Applied to Figure 1
548
Comparison of Algorithms
Bellman-Ford Link cost to all neighboring nodes to node n [i.e., w(j, n)] plus total path cost to those neighboring nodes from a particular source node s [i.e., Lh(j)] Each node can maintain set of costs and associated paths for every other node and exchange information with direct neighbors Each node can use Bellman-Ford based only on information from neighbors and knowledge of its link costs Dijkstra Each node must know link costs of all links Information must be exchanged with all other nodes Both converge under static conditions to same solution If costs change algorithm will attempt to catch up If cost depends on traffic Depends on routes chosen then feedback condition exists Instabilities may result
549
Distance Vector Routing
Each node exchanges information with its neighbors Directly connected by same network Each node maintains three vectors Link cost Distance vector Next-hop vector Every 30 seconds, exchange distance vector with neighbors Use this to update distance and next-hop vectors
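One distance-vector exchange as a sketch; the dictionaries and the override rule for routes already using this neighbor follow RIP-style behaviour, but the API is ours:

```python
def dv_update(my_dist, my_next, link_cost, neighbor, nbr_dist):
    """Update distance and next-hop vectors from one neighbor's
    distance vector. Dicts are keyed by destination network."""
    for dest, d in nbr_dist.items():
        cand = link_cost[neighbor] + d
        # adopt if strictly better, or if our current route already uses
        # this neighbor (its news overrides our stale entry)
        if cand < my_dist.get(dest, float('inf')) or my_next.get(dest) == neighbor:
            my_dist[dest], my_next[dest] = cand, neighbor
    return my_dist, my_next

dist, nxt = {'N1': 1}, {'N1': 'direct'}
print(dv_update(dist, nxt, {'B': 1}, 'B', {'N1': 3, 'N5': 2}))
```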
550
Figure 7 Distance Vector Algorithm Applied to Figure 1
551
Distributed Bellman-Ford
RIP is a distributed version of Bellman-Ford Original routing algorithm in ARPANET Each simultaneous exchange of vectors between routers is equivalent to one iteration of step 2 In fact, asynchronous exchange used At start-up, get vectors from neighbors Gives initial routing By own timer, update every 30 seconds Changes are propagated across network Routing converges within finite time Proportional to number of routers
552
RIP Details – Incremental Update
Updates from neighbors do not arrive within a small time window RIP packets use UDP Tables updated after receipt of each individual distance vector Add any new destination network Replace existing routes if new ones have smaller delay If update came from router R, update all routes using R as next hop
553
RIP Details – Topology Change
If no updates received from a router within 180 seconds, mark route invalid Assumes router crash or network connection unstable Set distance value to infinity Actually 16
554
Counting to Infinity Problem (1)
Slow convergence may cause: All link costs 1 B has distance to network 5 as 2, next hop D A & C have distance 3 and next hop B
555
Counting to Infinity Problem (2)
Suppose router D fails: B determines network 5 no longer reachable via D Sets distance to 4 based on report from A or C At next update, B tells A and C this A and C receive this and increment their network 5 distance to 5 4 from B plus 1 to reach B B receives distance count 5 and assumes network 5 is 6 away Repeat until reach infinity (16) Takes 8 to 16 minutes to resolve
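A toy replay of this exchange (no split horizon), counting update rounds until both sides hit 16; purely illustrative:

```python
def count_to_infinity(infinity=16):
    """B starts at distance 2 via D, A/C at 3 via B. D fails, and B
    falls back on A/C's stale route; distances then bounce upward."""
    b, ac, rounds = 2, 3, 0
    b = min(ac + 1, infinity)        # B adopts A/C's stale report
    while b < infinity or ac < infinity:
        ac = min(b + 1, infinity)    # A and C re-learn from B
        b = min(ac + 1, infinity)    # B re-learns from A or C
        rounds += 1
    return rounds

print(count_to_infinity())           # exchanges needed to reach "infinity"
```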
556
Figure 8 Counting to Infinity Problem
557
Split Horizon Counting to infinity problem caused by misunderstanding between B and A, and B and C Each thinks it can reach network 5 via the other Split Horizon rule says do not send information about a route back in the direction it came from Router sending information is nearer destination than you Erroneous route now eliminated within time out period (180 seconds)
558
Poisoned Reverse Send updates with hop count of 16 to neighbors for routes learned from those neighbors If two routers have routes pointing at each other, advertising the reverse route with metric 16 breaks the loop immediately
559
Figure 9 RIP Packet Format
560
RIP Packet Format Notes
Command: 1=request 2=reply Updates are replies whether asked for or not Initializing node broadcasts request Requests are replied to immediately Version: 1 or 2 Address family: 2 for IP IP address: non-zero network portion, zero host portion Identifies particular network Metric Path distance from this router to network Typically 1, so metric is hop count
561
RIP Limitations Destinations with metric more than 15 are unreachable
If larger metric allowed, convergence becomes lengthy Simple metric leads to sub-optimal routing tables Packets sent over slower links Accept RIP updates from any device Misconfigured device can disrupt entire configuration
562
Open Shortest Path First (OSPF)
RIP limited in large internets OSPF preferred interior routing protocol for TCP/IP based internets Link state routing used
563
Link State Routing When initialized, router determines link cost on each interface Router advertises these costs to all other routers in topology Router monitors its costs When changes occurs, costs are re-advertised Each router constructs topology and calculates shortest path to each destination network Not distributed version of routing algorithm Can use any algorithm Dijkstra
564
Flooding Packet sent by source router to every neighbor
Incoming packet resent to all outgoing links except source link Duplicate packets already transmitted are discarded Prevent incessant retransmission All possible routes tried so packet will get through if route exists Highly robust At least one packet follows minimum delay route Reach all routers quickly All nodes connected to source are visited All routers get information to build routing table High traffic load
565
Figure 10 Flooding Example
566
OSPF Overview Router maintains descriptions of state of local links
Transmits updated state information to all routers it knows about Router receiving update must acknowledge Lots of traffic generated Each router maintains database Directed graph
567
Router Database Graph Vertices: routers, and networks (transit or stub) Edges: connecting two routers, or connecting a router to a network
Built using link state information from other routers
568
Figure 11 Sample Autonomous System
569
Figure 12 Directed Graph of Autonomous System of Figure 11
570
Link Costs Cost of each hop in each direction is called routing metric
OSPF provides flexible metric scheme based on type of service (TOS) Normal (TOS) 0 Minimize monetary cost (TOS 2) Maximize reliability (TOS 4) Maximize throughput (TOS 8) Minimize delay (TOS 16) Each router generates 5 spanning trees (and 5 routing tables)
571
Figure 13 The SPF Tree for Router R6
572
Areas Make large internets more manageable
Configure as backbone and multiple areas Area – Collection of contiguous networks and hosts plus routers connected to any included network Backbone – contiguous collection of networks not contained in any area, their attached routers and routers belonging to multiple areas
573
Operation of Areas Each area runs a separate copy of the link state algorithm Topological database and graph of just that area Link state information broadcast only to other routers in the same area Reduces traffic Intra-area routing relies solely on local link state information
574
Inter-Area Routing Path consists of three legs Within source area
Intra-area Through backbone Has properties of an area Uses link state routing algorithm for inter-area routing Within destination area
575
Figure 14 OSPF Packet Header
576
Packet Format Notes Version number: 2 is current
Type: one of 5, see next slide Packet length: in octets including header Router id: this packet’s source, 32 bit Area id: Area to which source router belongs Authentication type: null, simple password or encryption Authentication data: used by authentication procedure
577
OSPF Packet Types Hello: used in neighbour discovery
Database description: Defines set of link state information present in each router’s database Link state request Link state update Link state acknowledgement
578
Chapter 16 Exterior Routing Protocols and Multicast
579
Border Gateway Protocol (BGP)
Allows routers (gateways) in different ASs to exchange routing information Messages sent over TCP See next slide Three functional procedures Neighbor acquisition Neighbor reachability Network reachability
580
Table 1 BGP-4 Messages
581
Neighbor Acquisition Neighbors attach to same subnetwork
If in different ASs routers may wish to exchange information Neighbor acquisition is when two neighboring routers agree to exchange routing information regularly Needed because one router may not wish to take part One router sends request, the other acknowledges Knowledge of existence of other routers and need to exchange information established at configuration time or by active intervention
582
Neighbor Reachability
Periodic issue of keepalive messages Between all routers that are neighbors
583
Network Reachability Each router keeps database of subnetworks it can reach and preferred route When change made, router issues update message All BGP routers build up and maintain routing information
584
Figure 1 BGP Message Formats
585
Neighbor Acquisition Detail
Router opens TCP connection with neighbor Sends open message Identifies sender's AS and gives IP address Includes Hold Time As proposed by sender If recipient prepared to open neighbor relationship Calculate hold time = min[own hold time, received hold time] Max time between keepalive/update messages Reply with keepalive
586
Keepalive Detail Header only
Often enough to prevent hold time expiring
587
Update Detail Information about single route through internet
Information to be added to database of any recipient router Network layer reachability information (NLRI) List of network portions of IP addresses of subnets reached by this route Total path attributes length field Path attributes field (next slide) List of previously advertised routes being withdrawn May contain both
588
Path Attributes Field Origin
Interior (e.g. OSPF) or exterior (BGP) protocol AS_Path ASs traversed for this route Next_Hop IP address of border router for next hop Multi_Exit_Disc Information about routers internal to AS Local_Pref Tell other routers within AS degree of preference Atomic_Aggregate, Aggregator Uses subnet addresses in tree view of network to reduce information needed in NLRI
589
Withdrawal of Route(s)
Route identified by IP address of destination subnetwork(s)
590
Notification Message Error notification Message header error
Includes authentication and syntax errors Open message error Syntax errors and option not recognised Proposed hold time unacceptable Update message error Syntax and validity errors Hold time expired Finite state machine error Cease Close connection in absence of any other error
591
BGP Routing Information Exchange
R1 constructs routing table for AS1 using OSPF R1 issues update message to R5 (in AS2) AS_Path: identity of AS1 Next_Hop: IP address of R1 NLRI: List of all subnets in AS1 Suppose R5 has neighbor relationship with R9 in AS3 R5 forwards information from R1 to R9 in update message AS_Path: list of ids {AS2, AS1} Next_Hop: IP address of R5 NLRI: All subnets in AS1 R9 decides if this is preferred route and forwards to neighbors
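A rough sketch of the forwarding step just described: a router prepends its own AS to AS_Path, rewrites Next_Hop to its own address, and (as real BGP does) declines to re-advertise a route whose AS_Path already contains its AS. The update is modeled as a plain dictionary, not real BGP encoding, and the addresses are hypothetical:

```python
def forward_update(update, my_as, my_ip):
    """Propagate a BGP-style update to a neighbor in another AS."""
    if my_as in update["as_path"]:
        return None  # our AS already appears in the path: loop, drop it
    return {"as_path": [my_as] + update["as_path"],  # prepend own AS
            "next_hop": my_ip,                       # we are the next hop
            "nlri": update["nlri"]}                  # reachability unchanged

# R5 in AS2 forwards to R9 what it learned from R1 in AS1
from_r1 = {"as_path": ["AS1"], "next_hop": "192.0.2.1",
           "nlri": ["10.1.0.0/16", "10.2.0.0/16"]}
print(forward_update(from_r1, "AS2", "192.0.2.5"))
# {'as_path': ['AS2', 'AS1'], 'next_hop': '192.0.2.5', 'nlri': [...]}
```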
592
Inter-Domain Routing Protocol (IDRP)
Exterior routing protocol for IPv6 ISO-OSI standard Path-vector routing Superset of BGP Operates over any internet protocol (not just TCP) Own handshaking for guaranteed delivery Variable length AS identifiers Handles multiple internet protocols and address schemes Aggregates path information using routing domain confederations
593
Routing Domain Confederations
Set of connected AS Appear to outside world as single AS Recursive Effective scaling
594
Multicasting Addresses that refer to group of hosts on one or more networks Uses Multimedia “broadcast” Teleconferencing Database Distributed computing Real time workgroups
595
Figure 2 Example Configuration
596
Broadcast and Multiple Unicast
Broadcast a copy of packet to each network Requires 13 copies of packet Multiple Unicast Send packet only to networks that have hosts in group 11 packets
597
True Multicast Determine least cost path to each network that has host in group Gives spanning tree configuration containing networks with group members Transmit single packet along spanning tree Routers replicate packets at branch points of spanning tree 8 packets required
598
Figure 3 Multicast Transmission Example
599
Table 2 Traffic Generated by Various Multicasting Strategies
600
Requirements for Multicasting (1)
Router may have to forward more than one copy of packet Convention needed to identify multicast addresses IPv4 - Class D - start 1110 IPv6 - 8 bit prefix, all 1, 4 bit flags field, 4 bit scope field, 112 bit group identifier Nodes must translate between IP multicast addresses and list of networks containing group members Router must translate between IP multicast address and network multicast address
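The IPv4 convention is easy to check in code: class D means the top four address bits are 1110, i.e. the range 224.0.0.0 through 239.255.255.255. A small sketch:

```python
import ipaddress

def is_ipv4_multicast(addr):
    """True if the top 4 bits of the address are 1110 (class D)."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    return (first_octet >> 4) == 0b1110

print(is_ipv4_multicast("224.0.0.1"))  # True  (all-hosts group)
print(is_ipv4_multicast("192.0.2.1"))  # False (ordinary unicast)
```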
601
Requirements for Multicasting (2)
Mechanism required for hosts to join and leave multicast group Routers must exchange info Which networks include members of given group Sufficient info to work out shortest path to each network Routing algorithm to work out shortest path Routers must determine routing paths based on source and destination addresses
602
Figure 4 Spanning Tree from Router C to Multicast Group
603
Internet Group Management Protocol (IGMP)
RFC 3376 Host and router exchange of multicast group info Use broadcast LAN to transfer info among multiple hosts and routers
604
Principal Operations Hosts send messages to routers to subscribe to and unsubscribe from multicast group Group defined by multicast address Routers check which multicast groups of interest to which hosts IGMP currently version 3 IGMPv1 Hosts could join group Routers used timer to unsubscribe members
605
Operation of IGMPv1 & v2 Receivers have to subscribe to groups
Sources do not have to subscribe to groups Any host can send traffic to any multicast group Problems: Spamming of multicast groups Even if application level filters drop unwanted packets, they consume valuable resources Establishment of distribution trees is problematic Location of sources is not known Finding globally unique multicast addresses difficult
606
IGMP v3 Allows hosts to specify the list of sources from which they want to receive traffic Traffic from other sources blocked at routers Allows hosts to block packets from sources that send unwanted traffic
607
Figure 5a IGMP Message Formats Membership Query
608
Membership Query Sent by multicast router General query
Which groups have members on attached network Group-specific query Does group have members on an attached network Group-and-source-specific query Does any attached device want packets sent to specified multicast address from any of a specified list of sources
609
Membership Query Fields (1)
Type Max Response Time Max time before sending report in units of 1/10 second Checksum Same algorithm as IPv4 Group Address Zero for general query message Multicast group address for group-specific or group-and-source S Flag 1 indicates that receiving routers should suppress normal timer updates done on hearing query
610
Membership Query Fields (2)
QRV (querier's robustness variable) RV value used by sender of query Routers adopt value from most recently received query Unless RV was zero, when default or statically configured value used RV dictates number of retransmissions to assure report not missed QQIC (querier's querier interval code) QI value used by querier Timer for sending multiple queries Routers not current querier adopt most recently received QI Unless QI was zero, when default QI value used Number of Sources Source addresses One 32 bit unicast address for each source
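To make the field layout concrete, here is a sketch that packs a membership query using the RFC 3376 layout (type 0x11; one octet carries 4 reserved bits, the S flag, and a 3-bit QRV). This is illustrative only, not a tested protocol implementation, and the parameter values are made up:

```python
import socket
import struct

def inet_checksum(data):
    """Ones'-complement checksum (same algorithm as IPv4)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_igmpv3_query(group, s_flag, qrv, max_resp, qqic, sources=()):
    """Pack an IGMPv3 membership query; group '0.0.0.0' = general query."""
    resv_s_qrv = ((s_flag & 1) << 3) | (qrv & 0x7)  # 4 reserved bits zero
    msg = struct.pack("!BBH4sBBH", 0x11, max_resp, 0,
                      socket.inet_aton(group), resv_s_qrv, qqic, len(sources))
    msg += b"".join(socket.inet_aton(s) for s in sources)
    return msg[:2] + struct.pack("!H", inet_checksum(msg)) + msg[4:]

pkt = build_igmpv3_query("0.0.0.0", s_flag=0, qrv=2, max_resp=100, qqic=125)
print(len(pkt), pkt.hex())  # 12-byte general query
```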
611
Figure 5b IGMP Message Formats Membership Report
612
Membership Reports Type Checksum Number of Group Records Group Records (each group record carries one 32-bit unicast address per source)
613
Figure 5c IGMP Message Formats Group Record
614
Group Record Record Type See later Aux Data Length In 32-bit words
Number of Sources Multicast Address Source Addresses One 32-bit unicast address per source Auxiliary Data Currently, no auxiliary data values defined
615
IGMP Operation - Joining
Host using IGMP wants to make itself known as group member to other hosts and routers on LAN IGMPv3 can signal group membership with filtering capabilities with respect to sources EXCLUDE mode – all group members except those listed INCLUDE mode – only from group members listed To join group, host sends IGMP membership report message Address field is multicast address of group Sent in IP datagram whose Destination Address equals the Group Address field of the IGMP message Current members of group will learn of new member Routers listen to all IP multicast addresses to hear all reports
616
IGMP Operation – Keeping Lists Valid
Routers periodically issue IGMP general query message In datagram with all-hosts multicast address Hosts that wish to remain in groups must read datagrams with this all-hosts address Hosts respond with report message for each group to which they claim membership Router does not need to know every host in a group Needs to know at least one group member is still active Each host in group sets timer with random delay Host that hears another claim membership cancels own report If timer expires, host sends report Only one member of each group reports to router
617
IGMP Operation - Leaving
Host leaves group by sending leave group message to all-routers static multicast address Send membership report message with EXCLUDE option and null list of source addresses Router determines if there are any remaining group members using group-specific query message
618
Group Membership with IPv6
IGMP defined for IPv4 Uses 32-bit addresses IPv6 internets need the same functionality IGMP functions incorporated into Internet Control Message Protocol version 6 (ICMPv6) ICMPv6 includes all of the functionality of ICMPv4 and IGMP ICMPv6 includes group-membership query and group-membership report messages Used in the same fashion as in IGMP
619
Multicast Extension to OSPF (MOSPF)
Enables routing of IP multicast datagrams within single AS Each router uses MOSPF to maintain local group membership information Each router periodically floods this to all routers in area Routers build shortest path spanning tree from a source network to all networks containing members of group (Dijkstra) Takes time, so on demand only
620
Forwarding Multicast Packets
If multicast address not recognised, discard If router attaches to a network containing a member of group, transmit copy to that network Consult spanning tree for this source-destination pair and forward to other routers if required
621
Equal Cost Multipath Ambiguities
Dijkstra's algorithm will include one of multiple equal cost paths Which one depends on order of processing nodes For multicast, all routers must have same spanning tree for given source node MOSPF has tiebreaker rule
622
Interarea Multicasting
Multicast groups may contain members from more than one area Routers only know about multicast groups with members in their area Subset of area's border routers forward group membership information and multicast datagrams between areas Interarea multicast forwarders
623
Inter-AS Multicasting
Certain boundary routers act as inter-AS multicast forwarders Run an inter-AS multicast routing protocol as well as MOSPF and OSPF MOSPF makes sure they receive all multicast datagrams from within AS Each such router forwards if required Use reverse path routing to determine source Assume datagram from X enters AS at point advertising shortest route back to X Use this to determine path of datagram through MOSPF AS
624
Figure 6 Illustrations of MOSPF Routing
625
Multicast Routing Protocol Characteristics
Extension to existing protocol MOSPF v OSPF Designed to be efficient for high concentration of group members Appropriate within single AS Not for large internet
626
Protocol Independent Multicast (PIM)
Independent of unicast routing protocols Extract required routing information from any unicast routing protocol Work across multiple AS with different unicast routing protocols
627
PIM Strategy Flooding is inefficient over large sparse internet
Little opportunity for shared spanning trees Focus on providing multiple shortest path unicast routes Two operation modes Dense mode For intra-AS Alternative to MOSPF Sparse mode Inter-AS multicast routing
628
Sparse Mode PIM A sparse group:
Number of networks/domains with group members present is significantly smaller than number of networks/domains in internet Internet spanned by group not sufficiently resource rich to ignore overhead of current multicast schemes
629
Group Destination Router, Group Source Router
Group destination router: has local group members A router becomes destination router for given group when at least one host joins group Using IGMP or similar
Group source router: attaches to network with at least one host transmitting on multicast address via that router
630
PIM Approach For a group, one router designated rendezvous point (RP)
Group destination router sends join message towards RP requesting its members be added to group Use unicast shortest path route to send Reverse path becomes part of distribution tree for this RP to listeners in this group Node sending to group sends towards RP using shortest path unicast route Destination router may replace group-shared tree with shortest path tree to any source By sending a join back to source router along unicast shortest path Selection of RP dynamic Not critical
631
Figure 7 Example of PIM Operation
632
Chapter 17 Integrated and Differentiated Services
633
Introduction New additions to Internet increasing traffic
High volume client/server applications Web Graphics Real time voice and video Need to manage traffic and control congestion IETF standards Integrated services (IntServ) Collective service to set of traffic demands in domain Limit demand & reserve resources Differentiated services (DiffServ) Classify traffic in groups Different group traffic handled differently
634
Integrated Services Architecture (ISA)
IPv4 header fields for precedence and type of service usually ignored Need to support Quality of Service (QoS) within TCP/IP Add functionality to routers Means of requesting QoS
635
Internet Traffic – Elastic
Can adjust to changes in delay and throughput E.g. common TCP and UDP applications – insensitive to delay changes FTP – Users expect delay proportional to file size Sensitive to changes in throughput SNMP – delay not a problem, except when caused by congestion Web (HTTP), TELNET – sensitive to delay Not per-packet delay – total elapsed time E.g. web page loading time For small items, delay across internet dominates For large items it is throughput over connection Need some QoS control to match to demand
636
Internet Traffic – Inelastic
Does not easily adapt to changes in delay and throughput Real time traffic Requirements: Throughput Minimum may be required Delay E.g. stock trading Jitter - Delay variation More jitter requires a bigger buffer E.g. teleconferencing requires reasonable upper bound Packet loss
637
Inelastic Traffic Problems
Difficult to meet requirements on network with variable queuing delays and congestion Need preferential treatment Applications need to state requirements Ahead of time (preferably) or on the fly Using fields in IP header Resource reservation protocol Must still support elastic traffic Deny service requests that leave too few resources to handle elastic traffic demands
638
ISA Approach Provision of QoS over IP
Sharing available capacity when congested Router mechanisms Routing Algorithms Select to minimize delay Packet discard Causes TCP sender to back off and reduce load
639
Flow IP packet can be associated with a flow
RFC 1633 defines a flow as a distinguishable stream of related IP packets that results from a single user activity and requires same QoS. E.g. one transport connection or one video stream Unidirectional Can be more than one recipient Multicast Membership of flow identified by source and destination IP address, port numbers, protocol type IPv6 header flow identifier can be used but is not necessarily equivalent to ISA flow
640
ISA Functions Admission control
For QoS, reservation required for new flow RSVP used Routing algorithm Routing decision based on QoS parameters Queuing discipline Take account of different flow requirements Discard policy The choice and timing of packet discards Manage congestion and meet QoS
641
Figure 1 ISA Implemented in Router
Background Forwarding
643
ISA Components – Background Functions
Reservation Protocol RSVP (Resource ReSerVation Protocol) Admission control Management agent Can use agent to modify traffic control database and direct admission control Routing protocol Maintaining a routing database
644
ISA Components – Forwarding
Classifier and route selection Incoming packets mapped to classes Single flow or set of flows with same QoS E.g. all video flows Based on IP header fields Determines next hop Packet scheduler Manages one or more queues for each output Order queued packets sent Based on class, traffic control database, current and past activity on outgoing port Policing Determine whether the packet traffic in a flow exceeds the requested capacity. Decide how to treat the excess packets.
645
ISA Services ISA service for a flow is defined on two levels.
General categories of service Guaranteed Controlled load Best effort (default) The service for a particular flow is specified by values of certain parameters Tspec Rspec
646
Token Bucket Traffic Specification
A way of characterizing traffic Three advantages: Many traffic sources can be defined by token bucket scheme Provides concise description of load imposed by flow. Easy to determine resource requirements Provides input parameters to policing function Consists of two parameters R: token replenishment rate B: bucket size During any time period T, the amount of data sent cannot exceed RT + B
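A minimal token bucket sketch enforcing the R·T + B bound; what to do with non-conforming packets (drop, mark, or delay) is left to the caller, and the parameter values below are illustrative only:

```python
import time

class TokenBucket:
    """Token bucket with replenishment rate R (bytes/sec) and size B (bytes).
    Over any interval T, accepted traffic cannot exceed R*T + B."""
    def __init__(self, rate, bucket_size):
        self.rate = rate
        self.capacity = bucket_size
        self.tokens = bucket_size          # bucket starts full
        self.last = time.monotonic()

    def conforms(self, nbytes):
        now = time.monotonic()
        # Replenish for the elapsed interval, never beyond bucket size B
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True                    # within profile
        return False                       # excess traffic: police it

tb = TokenBucket(rate=125_000, bucket_size=10_000)  # ~1 Mbps, 10 kB burst
print(tb.conforms(1_500), tb.conforms(20_000))      # True False
```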
647
Figure 2 Token Bucket Scheme
648
ISA Services – Guaranteed Service
Key elements of guaranteed service Assured capacity level or data rate Specific upper bound on queuing delay through network Must be added to propagation delay to get total delay No queuing losses I.e. no packets are lost due to buffer overflow E.g. Real time play back of incoming signal can use delay buffer for incoming signal but will not tolerate packet loss
649
ISA Services – Controlled Load
Key elements of controlled load service Tightly approximates to best efforts under unloaded conditions No upper bound on queuing delay. High percentage of packets do not experience delay over minimum transit delay Very high percentage delivered. Almost no queuing loss Useful for adaptive real time applications Receiver measures jitter and sets playback point Video can drop a frame or delay output slightly Voice can adjust silence periods
650
Differentiated Services (DS)
ISA and RSVP complex to deploy May not scale well for large volumes of traffic Amount of control signals required Maintenance of state information at routers DS architecture (RFC 2475) is designed to provide simple, easy to implement, low overhead tool Support range of network services differentiated on basis of performance
651
Characteristics of DS Use IPv4 header Type of Service or IPv6 Traffic Class field No change to IP Service level agreement (SLA) established between provider (internet domain) and customer prior to use DS mechanisms not needed in applications Built-in aggregation All traffic with same DS field treated same E.g. multiple voice connections DS implemented in individual routers by queuing and forwarding based on DS field State information on flows not saved by routers
652
DS Terminology (1)
Behavior Aggregate – A set of packets with the same DS codepoint crossing a link in a particular direction.
Classifier – Selects packets based on the DS field (BA classifier) or on multiple fields within the packet header (MF classifier).
DS Boundary Node – A DS node that connects one DS domain to a node in another domain.
DS Codepoint – A specified value of the 6-bit DSCP portion of the 8-bit DS field in the IP header.
DS Domain – A contiguous (connected) set of nodes, capable of implementing differentiated services, that operate with a common set of service provisioning policies and per-hop behavior definitions.
DS Interior Node – A DS node that is not a DS boundary node.
DS Node – A node that supports differentiated services. Typically, a DS node is a router. A host system that provides differentiated services for applications in the host is also a DS node.
Dropping – The process of discarding packets based on specified rules; also called policing.
653
Table 1 DS Terminology (2)
Marking – The process of setting the DS codepoint in a packet. Packets may be marked on initiation and may be re-marked by an en route DS node.
Metering – The process of measuring the temporal properties (e.g., rate) of a packet stream selected by a classifier. The instantaneous state of that process may affect marking, shaping, and dropping functions.
Per-Hop Behavior (PHB) – The externally observable forwarding behavior applied at a node to a behavior aggregate.
Service Level Agreement (SLA) – A service contract between a customer and a service provider that specifies the forwarding service a customer should receive.
Shaping – The process of delaying packets within a packet stream to cause it to conform to some defined traffic profile.
Traffic Conditioning – Control functions performed to enforce rules specified in a TCA, including metering, marking, shaping, and dropping.
Traffic Conditioning Agreement (TCA) – An agreement specifying classifying rules and traffic conditioning rules that are to apply to packets selected by the classifier.
654
Services Provided within DS domain
Contiguous portion of Internet over which consistent set of DS policies administered. Typically under control of one administrative entity Defined in SLA (Service Level Agreement) SLA: Service contract between customer and service provider Specify packet classes, marked in DS field Service provider configures forwarding policies at routers Must measure performance provided for each class DS domain is expected to provide agreed service. If destination in another domain, DS domain attempts to forward packets through other domains, requesting appropriate service to match the requested service.
655
SLA Parameters Detailed service performance parameters
Throughput, drop probability, latency Constraints on ingress and egress points Indicate scope of service Traffic profiles to be adhered to Token bucket Disposition of traffic in excess of profile
656
Example Services
Qualitative:
Service Level A: low latency
Service Level B: low loss
Quantitative:
Service Level C: 90% of in-profile traffic delivered with no more than 50 ms latency
Service Level D: 95% of in-profile traffic delivered
Mixed:
Service Level E: twice the bandwidth of service level F
Service Level F: traffic with drop precedence X has higher delivery probability than traffic with drop precedence Y
657
Figure 11 DS Field – DS Codepoint
658
DS/ECN
659
DS Field Detail Leftmost 6 bits are DS codepoint
64 different classes available 3 pools: xxxxx0 – reserved for standards (codepoint 000000 is the default packet class, and xxx000 codepoints provide backwards compatibility with IPv4 TOS precedence) xxxx11 – reserved for experimental or local use xxxx01 – reserved for experimental or local use but may be allocated for future standards if needed Rightmost 2 bits unused
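The pool membership of a codepoint, and the extraction of the codepoint from the old TOS octet, are simple bit tests. A sketch (EF = 46 is used as the sample value):

```python
def dscp_from_tos(tos_byte):
    """The DS codepoint is the leftmost 6 bits of the 8-bit field."""
    return tos_byte >> 2

def dscp_pool(dscp):
    """Classify a 6-bit codepoint into the three allocation pools."""
    if dscp & 0b000001 == 0:
        return "pool 1 (xxxxx0): standards"
    if dscp & 0b000011 == 0b11:
        return "pool 2 (xxxx11): experimental/local use"
    return "pool 3 (xxxx01): experimental/local, future standards"

print(dscp_from_tos(0b10111000))  # 46
print(dscp_pool(46))              # pool 1 (xxxxx0): standards
```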
660
Precedence Field Indicates degree of urgency or priority
If router supports precedence, three approaches: Route selection Particular route may be selected if it has a smaller queue or the next-hop network supports precedence or priority, e.g. token ring supports priority Network service If network on next hop supports precedence, the service is invoked Queuing discipline Use precedence to affect how queues are handled E.g. preferential treatment in queues to datagrams with higher precedence
661
Type of Service
662
Router Queuing Disciplines – 1. Queue Service
RFC 1812 Queue service SHOULD implement precedence-ordered queue service (strict ordering) Highest precedence packet queued for link is sent MAY implement policy-based throughput management procedures other than strict ordering MUST be configurable to suppress them
663
Router Queuing Disciplines – 2. Congestion Control
Router receives packet beyond storage capacity Discard it or other packet(s) MAY discard packet just received Simplest but not best policy Should select packet from session most heavily abusing link, given that QoS permits this. FIFO queues: discard packet randomly selected Fair queues: discard from longest queue If precedence-ordered implemented and enabled MUST NOT discard packet with precedence higher than packet not discarded MAY protect packets that request maximize reliability TOS MAY protect fragmented IP packets MAY protect packets used for control or management
664
DS Configuration and Operation
665
Configuration – Interior Routers
Domain consists of set of contiguous routers Interpretation of DS codepoints within domain is consistent Interior nodes (routers) have simple mechanisms to handle packets based on codepoints Queuing discipline Gives preferential treatment depending on codepoint, i.e. Per Hop behaviour (PHB) in DS specification. PHB must be available to all routers Typically the only part implemented in interior routers Packet dropping rule Dictate which to drop when buffer saturated
666
Configuration – Boundary Routers
Include PHB and traffic conditioning Five elements of traffic conditioning function: Classifier (BA or MF) Behavior aggregate classifier (DS codepoint), Multi-field classifier Separate packets into different classes Meter Measure traffic for conformance to profile (within or exceed) Marker Policing by remarking codepoints if required E.g. Remark packets that exceed the profile Shaper Delay packets so that packet stream does not exceed traffic rate specified in the profile Dropper Drop packets when packet rate exceeds the profile
667
DS Functions (Fig. 13, page 334)
668
Per-Hop Behavior 1. Expedited Forwarding (EF)
EF PHB is to provide premium service Low loss, delay, jitter; assured bandwidth end-to-end service through domains Looks like point to point or leased line Difficult to achieve Configure nodes so traffic aggregate has well defined minimum departure rate (EF PHB) Condition aggregate so arrival rate at any node is always less than minimum departure rate (boundary conditioners)
669
Per Hop Behaviour – Explicit Allocation
Superior to best efforts Does not require reservation of resources Does not require detailed discrimination among flows Key elements of explicit allocation scheme: Users are offered choice of a number of classes User traffic is monitored at boundary node Marked in or out, depending on matching profile or not Inside network all traffic treated as single pool of packets, distinguished only as in or out When congestion occurs, drop out packets before in packets if necessary Different levels of service because different number of in packets for each user
670
PHB – 2. Assured Forwarding (AF)
Four classes defined A user may select one or more to meet requirements Within class, packets marked by customer or provider with one of three drop precedence values Used to determine relative importance when dropping packets as result of congestion
671
Codepoints for AF PHB Figure 11(b)
672
Chapter 18 Protocols for QOS Support
673
Increased Demands Need to incorporate bursty and stream traffic in TCP/IP architecture Increase capacity Faster links, switches, routers Intelligent routing policies End-to-end flow control Multicasting Quality of Service (QoS) capability Transport protocol for streaming
674
Resource Reservation - Unicast
Prevention as well as reaction to congestion required Can do this by resource reservation Unicast End users agree on QoS for task and request from network May reserve resources Routers pre-allocate resources If QoS not available, may wait or try at reduced QoS
675
Resource Reservation – Multicast
Generate vast traffic High volume application like video Lots of destinations Can reduce load Some members of group may not want current transmission “Channels” of video Some members may only be able to handle part of transmission Basic and enhanced video components of video stream Routers can decide if they can meet demand
676
Resource Reservation Problems on an Internet
Must interact with dynamic routing Reservations must follow changes in route Soft state – a set of state information at a router that expires unless refreshed End users periodically renew resource requests
677
Resource ReSerVation Protocol (RSVP) Design Goals
Enable receivers to make reservations Different reservations among members of same multicast group allowed Deal gracefully with changes in group membership Dynamic reservations, separate for each member of group Aggregate for group should reflect resources needed Take into account common path to different members of group Receivers can select one of multiple sources (channel selection) Deal gracefully with changes in routes Re-establish reservations Control protocol overhead Independent of routing protocol
678
RSVP Characteristics Unicast and Multicast Simplex Receiver initiated
Unidirectional data flow Separate reservations in two directions Receiver initiated Receiver knows which subset of source transmissions it wants Maintain soft state in internet Responsibility of end users Providing different reservation styles Users specify how reservations for groups are aggregated Transparent operation through non-RSVP routers Support IPv4 (ToS field) and IPv6 (Flow label field)
679
Data Flows - Session Data flow identified by destination
Resources allocated by router for duration of session Defined by Destination IP address Unicast or multicast IP protocol identifier TCP, UDP etc. Destination port May not be used in multicast
680
Flow Descriptor Reservation Request Flow spec Desired QoS
Used to set parameters in node's packet scheduler Service class, Rspec (reserve), Tspec (traffic) Filter spec Set of packets for this reservation Source address, source port
681
Treatment of Packets of One Session at One Router
682
RSVP Operation Diagram
683
RSVP Operation G1, G2, G3 members of multicast group
S1, S2 sources transmitting to that group Heavy black line is routing tree for S1, heavy grey line for S2 Arrowed lines are packet transmission from S1 (black) and S2 (grey) All four routers need to know reservations for each multicast address Resource requests must propagate back through routing tree
684
Filtering G3 has reservation filter spec including S1 and S2
G1, G2 from S1 only R3 delivers from S2 to G3 but does not forward to R4 G1, G2 send RSVP request with filter excluding S2 G1, G2 only members of group reached through R4 R4 doesn’t need to forward packets from this session R4 merges filter spec requests and sends to R3 R3 no longer forwards this session’s packets to R4 Handling of filtered packets not specified Here they are dropped but could be best efforts delivery R3 needs to forward to G3 Stores filter spec but doesn’t propagate it
685
Reservation Styles Determines manner in which resource requirements from members of group are aggregated Reservation attribute Reservation shared among senders (shared) Characterizing entire flow received on multicast address Allocated to each sender (distinct) Simultaneously capable of receiving data flow from each sender Sender selection List of sources (explicit) All sources, no filter spec (wild card)
686
Reservation Attributes and Styles
Distinct Sender selection explicit = Fixed filter (FF) Sender selection wild card = none Shared Sender selection explicit= Shared-explicit (SE) Sender selection wild card = Wild card filter (WF)
687
Wild Card Filter Style Single resource reservation shared by all senders to this address If used by all receivers: shared pipe whose capacity is largest of resource requests from receivers downstream from any point on tree Independent of number of senders using it Propagated upstream to all senders WF(*{Q}) * = wild card sender Q = flowspec Audio teleconferencing with multiple sites
688
Fixed Filter Style Distinct reservation for each sender
Explicit list of senders FF(S1{Q1}, S2{Q2}, …) Video distribution
689
Shared Explicit Style Single reservation shared among specific list of senders SE(S1, S2, S3, …{Q}) Multicast applications with multiple data sources but unlikely to transmit simultaneously
690
Reservation Style Examples
691
RSVP Protocol Mechanisms
Two message types Resv Originate at multicast group receivers Propagate upstream Merged when appropriate Create soft states Reach sender Allow host to set up traffic control for first hop Path Provide upstream routing information Issued by sending hosts Transmitted through distribution tree to all destinations
692
RSVP Host Model
693
Multiprotocol Label Switching (MPLS)
Routing algorithms provide support for performance goals Distributed and dynamic React to congestion Load balance across network Based on metrics Develop information that can be used in handling different service needs Enhancements provide direct support IS, DS, RSVP Nothing directly improves throughput or delay MPLS tries to match ATM QoS support
694
Background Efforts to marry IP and ATM IP switching (Ipsilon)
Tag switching (Cisco) Aggregate route based IP switching (IBM) Cascade (IP navigator) All use standard routing protocols to define paths between end points Assign packets to path as they enter network Use ATM switches to move packets along paths ATM switching (was) much faster than IP routers Use faster technology
695
Developments IETF working group in 1997, proposed standard 2001
Routers developed to be as fast as ATM switches Remove the need to provide both technologies in same network MPLS does provide new capabilities QoS support Traffic engineering Virtual private networks Multiprotocol support
696
Connection Oriented QoS Support
Guarantee fixed capacity for specific applications Control latency/jitter Ensure capacity for voice Provide specific, guaranteed quantifiable SLAs Configure varying degrees of QoS for multiple customers MPLS imposes connection oriented framework on IP based internets
697
Traffic Engineering Ability to dynamically define routes, plan resource commitments based on known demands and optimize network utilization Basic IP allows primitive traffic engineering E.g. dynamic routing MPLS makes network resource commitment easy Able to balance load in face of demand Able to commit to different levels of support to meet user traffic requirements Aware of traffic flows with QoS requirements and predicted demand Intelligent re-routing when congested
698
VPN Support Traffic from a given enterprise or group passes transparently through an internet Segregated from other traffic on internet Performance guarantees Security
699
Multiprotocol Support
MPLS can be used on different network technologies IP Requires router upgrades Coexist with ordinary routers ATM Enables MPLS-enabled and ordinary switches to co-exist Frame relay Mixed network
700
MPLS Terminology
701
MPLS Operation Label switched routers capable of switching and routing packets based on label appended to packet Labels define a flow of packets between end points or multicast destinations Each distinct flow (forward equivalence class – FEC) has specific path through LSRs defined Connection oriented Each FEC has QoS requirements IP header not examined Forward based on label value
702
MPLS Operation Diagram
703
Explanation - Setup Labelled switched path established prior to routing and delivery of packets QoS parameters established along path Resource commitment Queuing and discard policy at LSR Interior routing protocol e.g. OSPF used Labels assigned Local significance only Manually or using Label distribution protocol (LDP) or enhanced version of RSVP
704
Explanation – Packet Handling
Packet enters domain through edge LSR Processed to determine QoS LSR assigns packet to FEC and hence LSP May need co-operation to set up new LSP Append label Forward packet Within domain LSR receives packet Remove incoming label, attach outgoing label and forward Egress edge strips label, reads IP header and forwards
705
Notes MPLS domain is contiguous set of MPLS enabled routers
Traffic may enter or exit via direct connection to MPLS router or from non-MPLS router FEC determined by parameters, e.g. Source/destination IP address or network IP address Port numbers IP protocol id Differentiated services codepoint IPv6 flow label Forwarding is simple lookup in predefined table Map label to next hop Can define PHB at an LSR for given FEC Packets between same end points may belong to different FEC
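The "simple lookup in predefined table" amounts to a dictionary keyed by incoming label. A sketch with a hypothetical label forwarding table (LFIB); the labels and router names are made up:

```python
# Hypothetical per-LSR table: incoming label -> (outgoing label, next hop)
LFIB = {17: (23, "LSR-B"),
        23: (40, "LSR-C")}

def forward(packet):
    """Label swap: one table lookup, the IP header is never examined."""
    out_label, next_hop = LFIB[packet["label"]]
    packet["label"] = out_label
    return next_hop

pkt = {"label": 17, "payload": b"..."}
print(forward(pkt), pkt["label"])  # LSR-B 23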
706
MPLS Packet Forwarding
707
Label Stacking Packet may carry number of labels LIFO (stack)
Processing based on top label Any LSR may push or pop label Unlimited levels Allows aggregation of LSPs into single LSP for part of route C.f. ATM virtual channels inside virtual paths E.g. aggregate all enterprise traffic into one LSP for access provider to handle Reduces size of tables
708
Label Format Diagram Label value: Locally significant 20 bit
Exp: 3 bit reserved for experimental use E.g. DS information or PHB guidance S: 1 for oldest entry in stack, zero otherwise Time to live (TTL): hop count or TTL value
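The four fields pack into one 32-bit word (label 20 bits, Exp 3 bits, S 1 bit, TTL 8 bits, leftmost field first). A small encode/decode sketch with sample values:

```python
def pack_label_entry(label, exp, s, ttl):
    """label(20) | exp(3) | S(1) | TTL(8) into one 32-bit word."""
    return ((label & 0xFFFFF) << 12) | ((exp & 0x7) << 9) \
           | ((s & 0x1) << 8) | (ttl & 0xFF)

def unpack_label_entry(word):
    return {"label": word >> 12, "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

entry = pack_label_entry(label=23, exp=0, s=1, ttl=64)
print(hex(entry))                 # 0x17140
print(unpack_label_entry(entry))  # {'label': 23, 'exp': 0, 's': 1, 'ttl': 64}
```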
709
Time to Live Processing
Needed to support TTL since IP header not read First label TTL set to IP header TTL on entry to MPLS domain TTL of top entry on stack decremented at internal LSR If zero, packet dropped or passed to ordinary error processing (e.g. ICMP) If positive, value placed in TTL of top label on stack and packet forwarded At exit from domain (single stack entry), TTL decremented If zero, as above If positive, placed in TTL field of IP header and forwarded
710
Label Stack Appear after data link layer header, before network layer header Top of stack is earliest (closest to network layer header) Network layer packet follows label stack entry with S=1 Over connection oriented services Topmost label value in ATM header VPI/VCI field Facilitates ATM switching Top label inserted between cell header and IP header In DLCI field of Frame Relay Note: TTL problem
711
Position of MPLS Label Stack
712
FECs, LSPs, and Labels Traffic grouped into FECs
Traffic in a FEC transits an MPLS domain along an LSP Packets identified by locally significant label At each LSR, labelled packets forwarded on basis of label LSR replaces incoming label with outgoing label Each flow must be assigned to a FEC Routing protocol must determine topology and current conditions so LSP can be assigned to FEC Must be able to gather and use information to support QoS LSRs must be aware of LSP for given FEC, assign incoming label to LSP, communicate label to other LSRs
713
Topology of LSPs Unique ingress and egress LSR
Single path through domain Unique egress, multiple ingress LSRs Multiple paths, possibly sharing final few hops Multiple egress LSRs for unicast traffic Multicast
714
Route Selection Selection of LSP for particular FEC Hop-by-hop
LSR independently chooses next hop Ordinary routing protocols e.g. OSPF Doesn't support traffic engineering or policy routing Explicit LSR (usually ingress or egress) specifies some or all LSRs in LSP for given FEC Selected by configuration, or dynamically
715
Constraint Based Routing Algorithm
Take into account traffic requirements of flows and resources available along hops Current utilization, existing capacity, committed services Additional metrics over and above traditional routing protocols (OSPF) Max link data rate Current capacity reservation Packet loss ratio Link propagation delay
716
Label Distribution Setting up LSP Assign label to LSP
Inform all potential upstream nodes of label assigned by LSR to FEC Allows proper packet labelling Learn next hop for LSP and label that downstream node has assigned to FEC Allow LSR to map incoming to outgoing label
717
Real Time Transport Protocol
TCP not suited to real time distributed applications Point to point, so not suitable for multicast Retransmitted segments arrive out of order No way to associate timing with segments UDP does not include timing information nor any support for real time applications Solution is the Real-time Transport Protocol (RTP)
718
RTP Architecture Close coupling between protocol and application layer functionality Framework for application to implement single protocol Application level framing Integrated layer processing
719
Application Level Framing
Recovery of lost data done by application rather than transport layer Application may accept less than perfect delivery Real time audio and video Inform source about quality of delivery rather than retransmit Source can switch to lower quality Application may provide data for retransmission Sending application may recompute lost values rather than storing them Sending application can provide revised values Can send new data to “fix” consequences of loss Lower layers deal with data in units provided by application Application data units (ADU)
720
Integrated Layer Processing
Adjacent layers in protocol stack tightly coupled Allows out of order or parallel functions from different layers
721
RTP Architecture Diagram
722
RTP Data Transfer Protocol
Transport of real time data among number of participants in a session, defined by: RTP Port number UDP destination port number if using UDP RTP Control Protocol (RTCP) port number Destination port address used by all participants for RTCP transfer IP addresses Multicast or set of unicast
723
Multicast Support Each RTP data unit includes: Source identifier
Timestamp Payload format
724
Relays Intermediate system acting as receiver and transmitter for given protocol layer Mixers Receives streams of RTP packets from one or more sources Combines streams Forwards new stream Translators Produce one or more outgoing RTP packets for each incoming packet E.g. convert video to lower quality
725
RTP Header
726
RTP Control Protocol (RTCP)
RTP is for user data RTCP is multicast provision of feedback to sources and session participants Uses same underlying transport protocol (usually UDP) and different port number RTCP packet issued periodically by each participant to other session members
727
RTCP Functions QoS and congestion control Identification
Session size estimation and scaling Session control
728
RTCP Transmission Number of separate RTCP packets bundled in single UDP datagram Sender report Receiver report Source description Goodbye Application specific
729
RTCP Packet Formats
730
Packet Fields (All Packets)
Version (2 bit) currently version 2 Padding (1 bit) indicates padding bits at end of control information, with number of octets as last octet of padding Count (5 bit) of reception report blocks in SR or RR, or source items in SDES or BYE Packet type (8 bit) Length (16 bit) in 32 bit words minus 1 In addition Sender and receiver reports have: Synchronization Source Identifier
731
Packet Fields (Sender Report) Sender Information Block
NTP timestamp: absolute wall clock time when report sent RTP Timestamp: Relative time used to create timestamps in RTP packets Sender’s packet count (for this session) Sender’s octet count (for this session)
732
Packet Fields (Sender Report) Reception Report Block
SSRC_n (32 bit) identifies source referred to by this report block Fraction lost (8 bits) since previous SR or RR Cumulative number of packets lost (24 bit) during this session Extended highest sequence number received (32 bit) Least significant 16 bits is highest RTP data sequence number received from SSRC_n Most significant 16 bits is number of times sequence number has wrapped to zero Interarrival jitter (32 bit) Last SR timestamp (32 bit) Delay since last SR (32 bit)
733
Receiver Report Same as sender report except:
Packet type field has different value No sender information block
734
Source Description Packet
Used by source to give more information 32 bit header followed by zero or more additional information chunks E.g.: 0 END End of SDES list 1 CNAME Canonical name 2 NAME Real user name of source 3 EMAIL Email address
735
Goodbye (BYE) Indicates one or more sources no longer active
Confirms departure rather than failure of network
736
Application Defined Packet
Experimental use For functions & features that are application specific
737
Chapter 19 Overview of Information Theory
738
Introduction to Information Theory
Information theory is a branch of science that deals with the analysis of a communications system We will study digital communications – using a file (or network protocol) as the channel Claude Shannon published a landmark paper in 1948 that was the beginning of the branch of information theory We are interested in communicating information from a source to a destination: Source of Message → Encoder → Channel (NOISE) → Decoder → Destination of Message
739
Introduction to Information Theory
In our case, the messages will be a sequence of binary digits Does anyone know the term for a binary digit? One detail that makes communicating difficult is noise noise introduces uncertainty Suppose I wish to transmit one bit of information what are all of the possibilities? tx 0, rx 0 - good tx 0, rx 1 - error tx 1, rx 0 - error tx 1, rx 1 - good Two of the cases above have errors – this is where probability fits into the picture In the case of steganography, the “noise” may be due to attacks on the hiding algorithm
740
Introduction to Information Theory
Claude Shannon introduced the idea of self-information Suppose we have an event X, where Xi represents a particular outcome of the event Consider flipping a fair coin, there are two equiprobable outcomes: say X0 = heads, P0 = 1/2, X1 = tails, P1 = 1/2 The amount of self-information for any single result is 1 bit In other words, the number of bits required to communicate the result of the event is 1 bit
741
Introduction to Information Theory
When outcomes are equally likely, there is a lot of information in the result The higher the likelihood of a particular outcome, the less information that outcome conveys However, if the coin is biased such that it lands with heads up 99% of the time, there is not much information conveyed when we flip the coin and it lands on heads
742
Introduction to Information Theory
Suppose we have an event X, where Xi represents a particular outcome of the event Consider flipping a coin, however, let's say there are 3 possible outcomes: heads (P = 0.49), tails (P = 0.49), lands on its side (P = 0.02) – (likely MUCH higher than in reality) Note: the total probability MUST ALWAYS add up to one The amount of self-information for either a head or a tail is about 1.03 bits For landing on its side: about 5.64 bits
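These values drop straight out of I(x) = log2(1/p). A quick check in Python:

```python
from math import log2

def self_information(p):
    """I(x) = log2(1/p) bits: the rarer the outcome, the more information."""
    return log2(1 / p)

print(round(self_information(0.50), 2))  # 1.0  -- fair coin
print(round(self_information(0.49), 2))  # 1.03 -- head or tail above
print(round(self_information(0.02), 2))  # 5.64 -- lands on its side
```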
743
Introduction to Information Theory
Entropy is the measurement of the average uncertainty of information We will skip the proofs and background that lead us to the formula for entropy, but it was derived from required properties: H(X) = Σi P(Xi) log2(1/P(Xi)) Also, keep in mind that this is a simplified explanation H – entropy P – probability X – random variable with a discrete set of possible outcomes (X0, X1, X2, … Xn-1) where n is the total number of possibilities
744
Introduction to Information Theory
Entropy is greatest when the probabilities of the outcomes are equal Let's consider our fair coin experiment again The entropy H = ½ lg 2 + ½ lg 2 = 1 Since each outcome has self-information of 1, the average of 2 outcomes is (1+1)/2 = 1 Consider a biased coin, P(H) = 0.98, P(T) = 0.02 H = 0.98 × lg(1/0.98) + 0.02 × lg(1/0.02) = 0.98 × 0.029 + 0.02 × 5.644 ≈ 0.028 + 0.113 = 0.141
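The same arithmetic as a reusable function, verifying both coins:

```python
from math import log2

def entropy(probs):
    """H = sum of p * log2(1/p) over outcomes with nonzero probability."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0   -- fair coin
print(round(entropy([0.98, 0.02]), 3))  # 0.141 -- biased coin
```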
745
Introduction to Information Theory
In general, we must estimate the entropy The estimate depends on our assumptions about the structure (read: pattern) of the source of information Consider a sequence of 16 digits in which 1, 6, 7 and 10 each appear once and the rest appear twice Obtaining the probabilities from the sequence, the entropy H = 3.25 bits Since there are 16 symbols, we theoretically would need 16 × 3.25 = 52 bits to transmit the information
746
Introduction to Information Theory
Consider a sequence of 22 symbols in which 1 appears four times (4/22), 2 appears four times (4/22), and 4 appears fourteen times (14/22) The entropy H = 2 × (4/22) lg(22/4) + (14/22) lg(22/14) ≈ 1.31 bits Since there are 22 symbols, we theoretically would need 22 × 1.31 ≈ 29 bits to transmit the information However, check the symbol pairs 12 and 44: 12 appears 4/11 of the time and 44 appears 7/11 H = (4/11) lg(11/4) + (7/11) lg(11/7) ≈ 0.95 bits, so 11 × 0.95 ≈ 11 bits to tx the info (only about 38% of the 29 bits!) We might possibly be able to find patterns with less entropy
747
Chapter 20 Lossless Compression
748
Compression Why Compression?
All media, be it text, audio, graphics or video, has “redundancy”. Compression attempts to eliminate this redundancy. What is Redundancy? If one representation of a media content, M, takes X bytes and another takes Y bytes (Y < X), then X is a redundant representation relative to Y. Other forms of Redundancy If the representation of a media captures content that is not perceivable by humans, then removing such content will not affect the quality of the content. Compression makes possible efficient storage and communication of media. Compression is essentially the elimination of redundancy, which is inherent in most representations of text, audio or video. Redundancy arises from the fact that most representations of media are not optimized in terms of space. For example, using 8-bit ASCII codes to represent characters is convenient but clearly not optimal: it would make sense to give smaller codes to more frequently occurring characters, like vowels and the letters r, t, s, n, etc., and longer codes to less frequently occurring characters like z, x, v, k, etc. Redundancy can also occur due to representation of content that is beyond the perceptive abilities of the listener (viewer). An important question is: “Is there an optimal representation of a source?” This question is tackled by information theory.
749
For example, capturing audio frequencies outside the human hearing range can be avoided without any harm to the audio’s quality. Is there a representation with an optimal size Z that cannot be improved upon? This question is tackled by information theory.
750
Compression: Lossless Compression vs. Compression with Loss
[Figure: M → compress → m → uncompress → M (lossless); M → compress with loss → m → uncompress → M′ (lossy)]
Before we look at information theory, let's make a clear distinction between lossless compression and compression with loss, sometimes referred to as lossy compression. Notice from the two figures: when lossless compression is used, the compressed media does not lose any content, therefore uncompressing it restores the media to its native state, M. On the other hand, with lossy compression the compressed media, when uncompressed, is different from the original media. Depending on the mechanism used, the difference might be too subtle to notice.
751
Information Theory According to Shannon, the entropy of an information source S is defined as: H(S) = Σi (pi log2(1/pi)) log2(1/pi) indicates the amount of information contained in symbol Si, i.e., the number of bits needed to code symbol Si. For example, in an image with uniform distribution of gray-level intensity, i.e. pi = 1/256, the number of bits needed to code each gray level is 8, and the entropy of the image is 8. Q: What is the entropy of a source with M symbols where each symbol is equally likely? Entropy, H(S) = log2 M Q: How about an image in which half of the pixels are white and half are black? Entropy, H(S) = 1 Information theory deals with the question of the “information content” of a data source, also referred to as the entropy of the source. The entropy H of a source S is given by the formula H(S) above, where pi is the probability of occurrence of symbol Si. The formula simply states that the information content of a source is the sum of the information content of each symbol weighted by its probability. The information content of a given symbol Si is log2(1/pi); the units are bits. The term log2(1/pi) is sometimes referred to as the uncertainty of the symbol Si. The greater the probability of occurrence of a symbol, the lower its uncertainty and hence the fewer bits needed to represent it. Similarly, the lower the probability of occurrence of a symbol, the higher its uncertainty and hence the more bits needed to represent it. Observe that the entropy of a source is highest when all symbols are equally likely. Work the two questions out while you are on this slide; they are simple applications of the entropy formula.
752
Information Theory Discussion:
Entropy is a measure of how much information is encoded in a message. The higher the entropy, the higher the information content. We could also say entropy is a measure of uncertainty in a message. Information and Uncertainty are equivalent concepts. The units (in coding theory) of entropy are bits per symbol. It is determined by the base of the logarithm: 2: binary (bit); 10: decimal (digit). Entropy gives the actual number of bits of information contained in a message source. Let's try to get some more insight into the purport of the entropy equation Entropy is a measure of information content or uncertainty. Example: 4 bits, because the information content of the letter e is log2(1/(1/16)), which is log2 16, which is 4. Refer to the Information theory primer for a quick brush-up on logarithms and their properties.
753
Example: If the probability of the character ‘e’ appearing in this slide is 1/16, then the information content of this character is 4 bits. So, the character string “eeeee” has a total content of 20 bits (contrast this to using an 8-bit ASCII coding that could result in 40 bits to represent “eeeee”).
754
Data Compression = Modeling + Coding
Data Compression consists of taking a stream of symbols and transforming them into codes. The model is a collection of data and rules used to process input symbols and determine their probabilities. A coder uses a model (probabilities) to spit out codes when it's given input symbols. Let's take Huffman coding to demonstrate the distinction: Information theory has motivated the idea that a data source has an entropy that gives us the inherent information content it possesses. If we can find codes to assign to symbols that will result in the data source being encoded in the fewest number of bits as indicated by their entropy, then we can achieve maximum compression. That is, if the encoding of the data source reflects its entropy, then we have reached the optimal (minimal) coding for the source. There are two components to compression: modeling and coding. [Figure: Input Stream (symbols) → Model (probabilities) → Encoder → Output Stream (codes)]
755
The output of the Huffman encoder is determined by the Model (probabilities). The higher the probability, the shorter the code. Model A could determine raw probabilities of each symbol occurring anywhere in the input stream. (pi = # of occurrences of Si / Total number of Symbols) Model B could determine probabilities based on the last 10 symbols in the input stream. (continuously re-computes the probabilities)
756
The Shannon-Fano Encoding Algorithm
Example: five symbols with counts A = 15, B = 7, C = 6, D = 6, E = 5 (39 symbols in all).
Symbol | Count | Info. -log2(pi) | Code | Subtotal # of Bits
A | 15 | 1.38 | 00 | 30
B | 7 | 2.48 | 01 | 14
C | 6 | 2.70 | 10 | 12
D | 6 | 2.70 | 110 | 18
E | 5 | 2.96 | 111 | 15
Totals: 85.25 bits of information, 89 encoded bits
Screen 1: Say we are given five symbols (A thru E) that occur in a source with frequencies 15, 7, 6, 6 and 5. First sort the symbols in decreasing order of frequency. Screen 2: Divide the list into two halves so that the counts of both halves are as close as possible to each other. In this case we split the list between B and C. The upper half of the list is assigned a code 0 and the lower part is assigned a 1. Screen 3: We recursively repeat the steps of splitting and assigning codes till each symbol becomes a code leaf on the tree. That is, treat each split as a list and apply splitting and code assigning till you are left with lists of single elements. Screen 4: Note that we split the list containing C, D and E between C and D because the difference between the split lists is 11 minus 6, which is 5; if we had divided between D and E we would get a difference of 12 - 5, which is 7. Screen 5: We complete the algorithm and as a result have codes assigned to the symbols. Screens 6-8: A little analysis. The table shows the information content of each symbol along with the codes assigned by Shannon-Fano coding, and the contribution in bits of each symbol. It takes a total of 89 bits using the Shannon-Fano algorithm to encode 85.25 bits of information (pretty good, huh!). I encourage you to compute what it would take if we used ASCII.
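A compact sketch of the splitting procedure just described. It reproduces the codes in the table above; the tie-breaking (take the first best split) is an implementation choice, not something the slides specify:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) sorted by count, descending.
    Recursively split where the two halves' counts are closest, then
    prefix the upper half with 0 and the lower half with 1."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total, running = sum(c for _, c in symbols), 0
    best_diff, split = float("inf"), 1
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)      # |upper half - lower half|
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

print(shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```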
757
The Huffman Algorithm: Example

The slide builds the Huffman code tree for the same source: D (6) and E (5) merge into a node of weight 11; B (7) and C (6) merge into a node of weight 13; nodes 13 and 11 merge into a node of weight 24; finally node 24 and A (15) merge into the root of weight 39. Each branch is labeled 0 or 1, giving the codes below.

Symbol  Count  Info. -log2(pi)  Code  Subtotal (bits)
A       15     1.38             1     15
B       7      2.48             000   21
C       6      2.70             001   18
D       6      2.70             010   18
E       5      2.96             011   15
Total   39     85.25 bits of information     87 bits

Screen 1: Let's take the same example. We compute the frequencies and sort the symbols into a list in decreasing order of frequency. We maintain this list in sorted order at all times, and each symbol is represented by a node. Screen 2: Take the two lowest-frequency nodes, add up their frequencies, and create a parent node whose frequency is their sum. Insert this node into the list; the insertion is in sorted order, so the list will now contain A, the new node, B and C. Assign codes 0 and 1 to the left and right branches of the subtree, and delete the two children (D and E) from the list. Screens 3-5: Repeat the process, always picking the two lowest-frequency nodes from the list, until you are left with only one node in the list. Screens 6-7: A little analysis. The table again shows the information content of each symbol (same as before) along with the codes assigned by Huffman coding, and the contribution in bits of each symbol. We note that it takes a total of 87 bits using the Huffman algorithm to encode 85.25 bits of information. This might not seem like a significant improvement over Shannon-Fano coding, but it turns out to be optimal: there is NO code assignment that can reduce the number of bits needed below 87.
758
Initialization: put all nodes in an OPEN list L and keep it sorted at all times (e.g., A B C D E).
Repeat the following steps until the list L has only one node left:
1. From L, pick the two nodes having the lowest frequencies and create a parent node for them.
2. Assign the sum of the children's frequencies to the parent node and insert it into L.
3. Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.
A runnable sketch of this loop follows below.
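Here is a compact Python sketch of the procedure (my own illustration; it uses a min-heap in place of the sorted OPEN list, which is equivalent). Tie-breaking among equal frequencies may yield a different tree than the slides, but any such tree is equally optimal, at 87 total bits for this source:

import heapq

def huffman(freqs):
    # freqs: dict {symbol: count}. Returns {symbol: code}.
    heap = [(c, i, s) for i, (s, c) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # the two lowest-frequency nodes
        c2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (t1, t2)))  # their parent node
        tie += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):       # internal node: branches 0 and 1
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"
    walk(heap[0][2], "")
    return codes

print(huffman({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))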
759
Huffman Alg.: Discussion
Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. There is an overhead for sending this table, but it is negligible if the data file is big.

Unique prefix property: no code is a prefix of any other code (all symbols sit at leaf nodes of the code tree), which is great for the decoder because decoding is unambiguous. If prior statistics are available and accurate, Huffman coding is very good.

Number of bits per symbol needed for Huffman coding: 87 / 39 = 2.23. Number of bits per symbol needed for Shannon-Fano coding: 89 / 39 = 2.28.

So far we have only looked at the coding part. Decoding is rather trivial for both mechanisms as long as the coding table containing the symbols and their codes is sent before the data. Both mechanisms owe their simple decoding to the unique prefix property their codes exhibit. Stated simply, this property requires that no code is a prefix of any other code. Refer back to the codes assigned by both algorithms: the Shannon-Fano code for B is 01, and no other symbol's code begins with 01; similarly, the Huffman code for A is 1, and no other symbol's code begins with 1. Note that the Shannon-Fano codes 110 and 111 for D and E do not violate the unique prefix property, because 11 is not itself a valid Shannon-Fano code here: no symbol has the code 11 assigned to it. Think about it!

What is the significance of this property? Both mechanisms assign codes of unequal length, so without it we would have no way of determining where one code ends and the next begins. Decoding becomes a simple task of traversing the code tree: when you hit a leaf, you have a complete code and the symbol it represents (see the sketch below). Try it and you will see.

Lastly, we note that the entropy of the message source is 85.25 / 39 = 2.19 bits per symbol, while Huffman coding achieves 2.23 bits per symbol and Shannon-Fano 2.28.
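To see the unique prefix property at work, here is a tiny decoder sketch (illustrative names): it consumes bits until the buffer matches a complete code, a match the prefix property guarantees is unambiguous:

def decode(bits, codes):
    # codes: {symbol: code} for a prefix-free code table.
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a leaf of the code tree is reached
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

sf_codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
print(decode("0001110111", sf_codes))   # "ABDE"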
760
Chapter 21 Lossy Compression
761
Lossy Compression Based on spatial redundancy
Measure of spatial redundancy: the 2D covariance, modeled as CovX(i,j) = σ² · e^(−α·√(i² + j²))
Vertical correlation: r1 = E[X(i,j) · X(i−1,j)] / E[X²(i,j)]
Horizontal correlation: r2 = E[X(i,j) · X(i,j−1)] / E[X²(i,j)]
For images we assume equal correlations; typically e^(−α) = r1 = r2 = 0.95
Measure of loss (or distortion): the MSE between the encoded-then-decoded image and the original
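As an illustration, the one-step correlations r1 and r2 defined above can be estimated directly from a pixel array (a numpy sketch of my own, following the slide's formulas, which do not subtract the mean):

import numpy as np

def correlations(img):
    # r1 = E[X(i,j)X(i-1,j)] / E[X^2];  r2 = E[X(i,j)X(i,j-1)] / E[X^2]
    x = np.asarray(img, dtype=float)
    power = np.mean(x * x)
    r1 = np.mean(x[1:, :] * x[:-1, :]) / power   # vertical neighbours
    r2 = np.mean(x[:, 1:] * x[:, :-1]) / power   # horizontal neighbours
    return r1, r2

# Natural images typically give r1 and r2 close to 0.95, as noted above.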
762
Rate-Distortion Function
Tradeoff between the bit rate (R) of the compressed image and the distortion (D)
R is measured in bits per encoder output symbol; compression ratio = encoder input bits / R
D is normalized by the variance of the encoder input; a possible SNR definition is 10 · log10(1/D)
For images that can be modeled as an uncorrelated Gaussian source, R(D) = 0.5 · log2(1/D)
More realistic images: see the graph. How do you make these graphs? A sketch follows below.
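A minimal sketch of the Gaussian-source formulas above (illustrative only; realistic images deviate from these curves, which is what the slide's graph shows):

import math

def rate_gaussian(d):
    # R(D) = 0.5 * log2(1/D), with D normalized by the input variance.
    return 0.5 * math.log2(1.0 / d)

def snr_db(d):
    # SNR = 10 * log10(1/D), in decibels.
    return 10.0 * math.log10(1.0 / d)

for d in (1.0, 0.25, 0.01):
    print(d, rate_gaussian(d), snr_db(d))
# D=1.0 -> 0 bits, 0 dB (send nothing); D=0.01 -> 3.32 bits, 20 dB.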
763
Sample vs. Block-based Coding
Sample-based (see the sketch below)
In the spatial or frequency domain, as in JPEG-LS
Build a predictor function (often a weighted sum of neighboring samples)
Compute and quantize the residual
Encode the residual
Block-based
Spatial: group pixels into blocks and compress the blocks
Transform: group pixels into blocks, transform each block, encode the coefficients
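A toy sample-based predictive coder (a sketch under simplifying assumptions: a one-tap previous-sample predictor and uniform quantization; JPEG-LS itself uses a richer neighborhood predictor):

def predictive_encode(samples, q_step=4):
    residuals, prev = [], 0
    for s in samples:
        r = round((s - prev) / q_step)   # quantized prediction residual
        residuals.append(r)              # this is what gets entropy coded
        prev = prev + r * q_step         # track the decoder's reconstruction
    return residuals

print(predictive_encode([100, 102, 105, 104, 110]))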
764
Which Transformation? Considerations
Packing the most energy into the fewest elements
Minimizing the total entropy of the sequence
Maximally decorrelating the elements of the input blocks
Coding complexity
DFT, KLT, DCT or DHT? Compare their effects
DCT-based coding:
High compaction efficiency for correlated data
Orthogonal and separable
Fast, approximate DCT algorithms available
A small DCT demonstration follows below.
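To make the DCT's energy compaction concrete, here is an 8x8 DCT-II built from its basis matrix (a numpy sketch; with orthonormal scaling the matrix is orthogonal, so the inverse is just the transpose):

import numpy as np

N = 8
n, k = np.meshgrid(np.arange(N), np.arange(N))
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] /= np.sqrt(2.0)            # orthonormal scaling of the DC row

def dct2(block):
    # Separability: apply the 1D transform along rows, then columns.
    return C @ block @ C.T

def idct2(coeffs):
    return C.T @ coeffs @ C        # C is orthogonal: inverse = transpose

flat = np.full((8, 8), 128.0)      # a perfectly correlated (flat) block
print(np.round(dct2(flat), 2))     # all energy packs into the DC coefficient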
765
The JPEG standard. Operating modes:
Lossless
Sequential DCT-based
Progressive DCT-based
Hierarchical
766
JPEG: The Process Preprocess colors
Divide the image into non-overlapping 8 pixel x 8 pixel blocks (why 8?)
Apply a 2D DCT to each block
Encode the coefficients in two stages:
Stage 1: a predictive coder for the DC coefficient and a run-length coder for the AC coefficients
Stage 2: Huffman or arithmetic coding
767
JPEG: Color Processing
Maximum number of color components = 255
Each sample may be 8 or 12 bits in precision
Conversion to a decorrelated color space: YCbCr, YUV, CIELAB
Subsample the chrominance components
Interleave the components
Maximum number of components per MCU = 4
Maximum number of data units within an MCU = 10
768
JPEG: Quantization Tables
One 8 x 8 quantization table for each image component
Q(i,j): quantization step for the corresponding DCT element, with 1 <= Q(i,j) <= 255
Step sizes are chosen from psycho-visual experiments and for bit-rate control
For bit-rate control (shown on the slide as a formula), with B the total number of blocks and yk[i,j] the (i,j) DCT output of the k-th block, the bits spent per DCT coefficient are tuned to meet a target average bit rate for 8-bit images
A quantization sketch follows below.
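Here is a quantization sketch using the well-known example luminance table from Annex K of the JPEG standard (the rounding step is where the loss happens):

import numpy as np

Q_LUM = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block):
    # Divide each DCT coefficient by its step Q(i,j) and round.
    return np.round(dct_block / Q_LUM).astype(int)

def dequantize(q_block):
    return q_block * Q_LUM   # the rounding error is never recovered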
769
JPEG: Entropy Coding Baseline processing
Total of 4 code tables allowed; different code tables for luminance and chrominance
DC coefficients:
DC differentials are computed and have the range [-2047, 2047]
The range is divided into 12 size categories; category i needs i bits to represent the amplitude
DC residuals are represented as (size, amplitude) pairs, and the size is Huffman encoded (see the sketch below)
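The size category of a DC differential is simply the bit length of its magnitude, as this illustrative sketch shows:

def dc_category(diff):
    # Category (SSSS) 0..11 covers the range [-2047, 2047]; category i
    # means i amplitude bits follow the Huffman-coded category.
    return 0 if diff == 0 else abs(diff).bit_length()

for d in (0, 1, -3, 7, 255, -2047):
    print(d, dc_category(d))   # categories 0, 1, 2, 3, 8, 11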
770
JPEG: Entropy Coding AC coefficients
AC coefficients take values in the range [-1023, 1023], divided into 10 size categories
Only non-zero coefficients need to be encoded
Coefficients are processed in zig-zag order, which makes run-length encoding of the AC coefficients more efficient
Each non-zero coefficient is represented as a (run/size, amplitude) pair
If a run of zeros exceeds 15, one or more (15/0) symbols are used (see the sketch below)
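A simplified sketch of the zig-zag scan and the run-length pass (real JPEG pairs the run with the amplitude's size category and Huffman-codes the pair; this captures the idea):

def run_length_ac(block):
    # Zig-zag order: walk the anti-diagonals i+j, alternating direction.
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    ac = [block[i][j] for i, j in order][1:]   # skip the DC coefficient
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
            if run == 16:             # a (15, 0) symbol stands for 16 zeros
                pairs.append((15, 0))
                run = 0
        else:
            pairs.append((run, v))    # (zero-run length, non-zero value)
            run = 0
    if run:
        pairs.append((0, 0))          # end-of-block: the rest is all zero
    return pairs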
771
JPEG: Progressive Coding
Coding is performed sequentially but in multiple scans
The first scan produces the full image without all details; details are added in successive scans
Spectral selection: each block is divided into frequency bands, and each band is transmitted during a different scan
Successive approximation: within a given frequency band, DCT coefficients are divided by 2^k, and the most significant bits are transmitted first
772
Subband Coding Pass an image through an n-band filter bank
Possibly subsample each filtered output
Encode each subband separately
Compression may be achieved by discarding unimportant bands
Advantages:
Fewer artifacts than block-coded compression
More robust under transmission errors
Selective encoding/decoding possible
Disadvantage: more expensive
A one-level filter-bank sketch follows below.
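A one-level Haar filter bank makes the idea concrete (a sketch of mine, assuming even image dimensions): the low band is pairwise averages, the high band pairwise differences, each downsampled by two:

import numpy as np

def haar_subbands(img):
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s    # low-pass + downsample along rows
    hi = (x[:, 0::2] - x[:, 1::2]) / s    # high-pass + downsample along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / s  # then the same along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh                 # keep LL; HH can often be discarded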
773
Wavelet Compression Special case of subband compression
Space- and frequency-limited mother function ψ(t)
Function family: ψmn(t) = a0^(−m/2) · ψ(a0^(−m)·t − n·b0)
If a0 = 2 and b0 = 1, the ψmn(t) form an orthonormal basis, and a signal can be expanded as f(t) = Σm Σn cmn · ψmn(t)
The a0^(−m) term scales the signal
Scaling function: φmn(t) = 2^(−m/2) · φ(2^(−m)·t − n)
In filter-bank terms, the wavelet ψ plays the role of the (unscaled) high-pass filter and the scaling function φ that of the downscaled low-pass filter
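The Haar function is the simplest concrete mother wavelet and shows how the ψmn family above is generated (a sketch):

import numpy as np

def haar_mother(t):
    # psi(t) = +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere.
    t = np.asarray(t, dtype=float)
    return np.where((0 <= t) & (t < 0.5), 1.0,
                    np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

def psi_mn(t, m, n, a0=2.0, b0=1.0):
    # psi_mn(t) = a0^(-m/2) * psi(a0^(-m) * t - n * b0)
    return a0 ** (-m / 2) * haar_mother(a0 ** (-m) * t - n * b0)

t = np.linspace(0, 4, 9)
print(psi_mn(t, m=1, n=0))   # a wider, lower-amplitude copy of psi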