11 CS716 Advanced Computer Networks By Dr. Amir Qayyum.


1 11 CS716 Advanced Computer Networks By Dr. Amir Qayyum

2 2 Lecture No. 25

3 Review Lecture

4 4 Switched Networks –Two or more nodes connected by a link –Circular nodes (switches) implement the network –Squared nodes (hosts) use the network A network can be defined recursively as...

5 5 Switched Networks –Two or more networks connected by one or more nodes: internetworks –Circular nodes (router or gateway) interconnects the networks –A cloud denotes “any type of independent network” A network can be defined recursively as...

6 6 Switching Strategies Circuit switching: Carry bit streams a.establishes a dedicated circuit b.links reserved for use by communication channel c.send/receive bit stream at constant rate d.example: original telephone network Packet switching: Store-and-forward messages a.operates on discrete blocks of data b.utilizes resources dynamically according to traffic demand c.send/receive messages at variable rate d.example: Internet

7 7 Multiplexing Physical links/switches must be shared among users –(Synchronous) Time-Division Multiplexing (TDM) –Frequency-Division Multiplexing (FDM) L1 L2 L3 R1 R2 R3 Switch 1Switch 2 Multiple flows on a single link Do you see any problem with TDM / FDM ?

8 8 Statistical Multiplexing On-demand time-division, possibly synchronous (ATM) Schedule link on a per-packet basis Buffer packets in switches that are contending for the link Packets from different sources interleaved on link … Do you see any problem ?

9 9 Inter-Process Communication Turn host-to-host connectivity into process-to-process communication, making the communication meaningful. Fill gap between what applications expect and what the underlying technology provides. Abstraction for application-level communication Host Application Host Application Host Channel

10 10 Abstract Channel Functionality What functionality does a channel provide ? –Smallest set of abstract channel types adequate for largest number of applications Where the functionality is implemented ? –Network as a simple bit-pipe with all high-level communication semantics at the hosts –More intelligent switches allowing hosts to be “dumb” devices (telephone network)

11 11 Performance Metrics … and to do so while delivering “good” performance Bandwidth (throughput) –Data transmitted per unit time, e.g. 10 Mbps –Link bandwidth versus end-to-end bandwidth –Notation: KB = 2^10 bytes, Kbps = 10^3 bits per second

12 12 Performance Metrics Latency / Delay –Time to send message from point A to point B –One-way versus Round-Trip Time (RTT) –Components Latency = Propagation + Transmit + Queue Propagation = Distance / c Transmit = Size / Bandwidth Note: No queuing delay in direct (point-to-point) link Bandwidth irrelevant if size = 1 bit Process-to-process latency includes software processing overhead (dominates over shorter distances)
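The latency components on this slide can be sketched as a small C function. This is only an illustration: the function name, the parameter names, and the use of the vacuum speed of light for every medium are assumptions, not part of the lecture.

```c
#include <assert.h>

/* Propagation speed used on the slide: c, the speed of light in vacuum (m/s).
   Signals in copper or fiber travel slower; substitute a smaller value as needed. */
#define SPEED_OF_LIGHT_MPS 3.0e8

/* One-way latency in seconds = propagation + transmit + queue. */
double latency_seconds(double distance_m, double size_bits,
                       double bandwidth_bps, double queue_s)
{
    assert(bandwidth_bps > 0.0);
    double propagation = distance_m / SPEED_OF_LIGHT_MPS; /* Distance / c */
    double transmit = size_bits / bandwidth_bps;          /* Size / Bandwidth */
    return propagation + transmit + queue_s;
}
```

For a 1-bit message the transmit term is tiny compared with propagation, which is the sense in which bandwidth becomes irrelevant.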

13 13 Delay x Bandwidth Product Amount of data “in flight” or “in the pipe” Example: 100ms RTT x 45Mbps BW = 560KB This much data must be buffered before the sender responds to a request to slow down (Figure axes: Delay, Bandwidth)
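The example arithmetic can be checked with a short sketch. The helper names are hypothetical. Note that 0.1 s x 45 Mbps is 4.5 million bits, or 562,500 bytes, which the slide rounds to 560KB; with KB = 2^10 bytes the figure is closer to 549 KB.

```c
#include <assert.h>

/* Bits "in the pipe" = delay (seconds) x bandwidth (bits per second). */
double delay_bandwidth_bits(double delay_s, double bandwidth_bps)
{
    assert(delay_s >= 0.0 && bandwidth_bps >= 0.0);
    return delay_s * bandwidth_bps;
}

/* Convert bits to kilobytes, using KB = 2^10 bytes. */
double bits_to_kb(double bits)
{
    return bits / 8.0 / 1024.0;
}
```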

14 14 Network Architecture The challenge is to fill the gap between hardware capabilities and application expectations, and to do so while delivering “good” performance Designers cope with this complex task by developing a network architecture as a guideline –Layering, protocols, standards

15 15 Layering Alternative abstractions at each layer Manageable network components Modify layers independently Hardware Host-to-host connectivity Application programs Request/reply channel Message stream channel

16 16 Protocols Building blocks of a network architecture Each protocol object has two different interfaces –service interface: operations on this protocol –peer-to-peer interface: messages exchanged with peer Term “protocol” is overloaded –Specification of peer-to-peer interface –Module that implements this interface –Peer modules are interoperable if both accurately follow the specifications

17 17 Host 1Host 2 Service interface Peer-to-peer interface Protocol Interfaces High-level object High-level object Protocol

18 18 Protocol Graph – Network Architecture Collection of protocols and their dependencies –Most peer-to-peer communication is indirect –Peer-to-Peer is direct only at hardware level Host 1 Host 2 File application Digital library application Video application File application Digital library application Video application RRP MSP HHP RRP: Request Reply Protocol MSP: Message Stream Protocol HHP: Host-to- Host Protocol

19 19 Protocol Machinery Multiplexing and Demultiplexing (demux key) Encapsulation (header/body) in peer-to-peer interfaces –Indirect communication (except at hardware level) –Each protocol adds a header –Part of header includes demultiplexing field (e.g., pass up to request/reply or to message stream?)

20 20 Encapsulation Host 1Host 2 Application program Application program Data RRP Data HHP RRPData HHP RRP Data

21 21 Standard Architectures Open System Interconnect (OSI) Architecture –International Standards Organization (ISO) –International Telecommunications Union (ITU), formerly CCITT –“X dot” series: X.25, X.400, X.500 –Primarily a reference model

22 22 OSI Architecture End hosts implement all seven layers; the one or more nodes within the network implement only Network, Data link and Physical. Layers, top to bottom: Application; Presentation (data formatting); Session (connection management); Transport (process-to-process communication channel); Network (host-to-host packet delivery); Data link (framing of data bits); Physical (transmission of raw bits). The figure also marks which layers run at user level and which in the OS kernel.

23 23 Internet Architecture TCP/IP Architecture –Developed with ARPANET and NSFNET –Internet Engineering Task Force (IETF) Culture: implement, then standardize OSI culture: standardize, then implement –Became popular with release of Berkeley Software Distribution (BSD) Unix; i.e. free software –Standard suggestions traditionally debated publicly through “Request For Comments” (RFCs)

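24 24 Internet Architecture Implementation and design done together Hourglass Design (bottleneck is IP) Application vs Application Protocol (FTP, HTTP) Protocol graph (figure): FTP, HTTP, NV and TFTP at the top; TCP and UDP below them; IP at the waist; NET 1, NET 2, …, NET n at the bottom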

25 25 Internet Architecture Layering is not very strict Application TCP UDP IP Network

26 26 Networking in the Internet Age

27 27 Network Application Programming Interface (API) Interface that the OS provides to its networking subsystem –Most network protocols are implemented in software –All systems implement network protocols as part of the OS –Each OS is free to define its own network API –Applications can be ported from one OS to another if APIs are similar *IF* application program does not interact with other parts of the OS other than the network (file system, fork processes, display …)

28 28 Protocols and API Protocols provide a certain set of services API provides a syntax by which those services can be invoked Implementation is responsible for mapping API syntax onto protocol services

29 29 Socket API Use sockets as “abstract endpoints” of communication Issues –Creating & identifying sockets –Sending & receiving data Mechanisms –UNIX system calls and library routines socket process

30 30 Protocol-to-Protocol Interface A protocol interacts with a lower level protocol like an application interacts with underlying network Why not use available network APIs for PPI ? –Inefficiencies built into the socket interface Application programmers tolerate them to simplify their task –inefficiency at one level Protocol implementers do not tolerate them –inefficiencies at several layers of protocols

31 31 Protocol-to-Protocol Interface Issues Configure Multiple Layers –Static vs Extensible Process Model –Avoid context switches Buffer Model –Avoid data copies

32 32 Process Model (a)(b) Process-per-ProtocolProcess-per-Message inter-process communication procedure call

33 33 Buffer Model Buffer Copy Application Process Topmost Protocol send()deliver()

34 34 Network Programming Things to Learn –Internet protocols (IP, TCP, UDP, …) –Sockets API (Application Programming Interface) Why IP and Sockets Allows a common name space across most of Internet –IP (Internet Protocol) is standard Reduces number of translations, which incur overhead –Sockets: reasonably simple and elegant Unix interface (most servers run Unix)

35 35 Socket Programming Reading: Stevens 2nd edition, Chapters 1-6 Sockets API: A transport layer service interface –Introduced in 1981 by BSD 4.1 –Implemented as library and/or system calls –Similar interfaces to TCP and UDP –Can also serve as interface to IP (for superuser) known as “raw sockets” –Linux also provides interface to MAC layer (for superuser) known as “data-link sockets”

36 36 Client-Server Model Asymmetric relationship Server/Daemon –Well-known name –Waits for contact –Process requests, sends replies Client –Initiates contact –Waits for response Server Client

37 37 Client-Server Model Bidirectional communication channel Service models –Sequential: server processes only one client’s requests at a time –Concurrent: server processes multiple clients’ requests simultaneously –Hybrid: server maintains multiple connections, but processes requests sequentially Server and client categories not disjoint –Server can be client of another server –Server as client of its own client (peer-to-peer architecture)

38 38 TCP Connections TCP connection setup via 3-way handshake –J and K are sequence numbers for messages Exchange (figure): 1. Client → Server: SYN J 2. Server → Client: SYN K, ACK J+1 3. Client → Server: ACK K+1 Hmmm … RTT is important!

39 39 TCP Connections TCP connection teardown (4 steps) (either client or server can initiate connection teardown) Exchange (figure): 1. Active closer sends FIN J 2. Passive closer replies ACK J+1 3. Passive closer sends FIN K when it closes the connection 4. Active closer replies ACK K+1 Hmmm … Latency matters!

40 40 UDP - Aspects of Services Unit of transfer is a datagram (variable length packet) Unreliable, drops packets silently No ordering guarantees No flow control 16-bit port space (distinct from TCP ports) allows multiple recipients on a single host

41 41 Addresses and Data Internet domain names: human readable –Mnemonic –Variable Length e.g. www.case.edu.pk, www.carepvtltd.com (FQDN) IP addresses: easily handled by routers/computers –Fixed Length –Tied (loosely) to geography e.g. 131.126.143.82 or 212.0.0.1

42 42 Endianness Machines on Internet have different endianness Little-endian (Intel, DEC): least significant byte of word stored in lowest memory address Big-endian (Sun, SGI, HP): most significant byte...
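A minimal sketch of how a program can cope with differing endianness: detect the host's byte order and write multi-byte values in network (big-endian) order. The helper names are hypothetical; `htonl()` from `<arpa/inet.h>` is the standard call, and the hand-rolled version is shown only to make the byte layout visible.

```c
#include <arpa/inet.h>   /* htonl: the standard host-to-network conversion */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Write value into out[0..3], most significant byte first (network order). */
void put_be32(unsigned char *out, uint32_t value)
{
    assert(out != NULL);
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Hand-rolled equivalent of htonl(), for illustration only. */
uint32_t to_network_order32(uint32_t host_value)
{
    unsigned char bytes[4];
    uint32_t result;
    put_be32(bytes, host_value);
    memcpy(&result, bytes, sizeof result);
    return result;
}
```

Because `put_be32` works with shifts rather than memory layout, it produces the same bytes on little-endian and big-endian hosts.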

43 43 Socket Address Structures Socket address structures (all fields in network byte order except sin_family) IP address struct in_addr { in_addr_t s_addr; /* 32-bit IP address */ }; TCP or UDP address struct sockaddr_in { short sin_family; /* e.g., AF_INET */ ushort sin_port; /* TCP / UDP port */ struct in_addr sin_addr; /* IP address */ };

44 44 Address Conversion All binary values used and returned by these functions are network byte ordered struct hostent* gethostbyname (const char* hostname); translates English host name to IP address (uses DNS) struct hostent* gethostbyaddr (const char* addr, size_t len, int family); translates IP address to English host name (not secure) int gethostname (char* name, size_t namelen); reads host’s name (use with gethostbyname to find local IP)

45 45 Address Conversion in_addr_t inet_addr (const char* strptr); translate dotted-decimal notation to IP address; returns -1 on failure, thus cannot handle broadcast value “255.255.255.255” int inet_aton (const char* strptr, struct in_addr* inaddr); translate dotted-decimal notation to IP address; returns 1 on success, 0 on failure char* inet_ntoa (struct in_addr inaddr); translate IP address to ASCII dotted-decimal notation (e.g., “128.32.36.37”); not thread-safe
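A hedged sketch of the conversions on this slide. It swaps in the POSIX functions `inet_pton` and `inet_ntop`, which are not on the slide, because they report failure separately from the address and write into a caller-supplied buffer. The wrapper names are hypothetical. The last function shows the `inet_addr` ambiguity the slide mentions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Parse dotted-decimal text; returns 1 on success, 0 on failure. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);
    return inet_pton(AF_INET, text, out) == 1;
}

/* Format into the caller's buffer (at least INET_ADDRSTRLEN bytes).
   Returns buf on success, NULL on failure. */
const char *format_ipv4(struct in_addr addr, char *buf, size_t buflen)
{
    assert(buf != NULL);
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen);
}

/* inet_addr() returns INADDR_NONE on failure, which is the same 32-bit
   value as the broadcast address, so the two cases cannot be told apart. */
int inet_addr_broadcast_looks_like_error(void)
{
    return inet_addr("255.255.255.255") == INADDR_NONE;
}
```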

46 46 Socket API Creating a socket int socket(int domain, int type, int protocol) domain (family) = AF_INET, PF_UNIX, AF_OSI type = SOCK_STREAM, SOCK_DGRAM protocol = TCP, UDP, UNSPEC return value is a handle for the newly created socket

47 47 Sockets (cont) Passive Open (on server) int bind(int socket, struct sockaddr *addr, int addr_len) int listen(int socket, int backlog) int accept(int socket, struct sockaddr *addr, int *addr_len) Active Open (on client) int connect(int socket, struct sockaddr *addr, int addr_len)

48 48 Sockets (cont) Sending Messages int send(int socket, char *msg, int mlen, int flags) Receiving Messages int recv(int socket, char *buf, int blen, int flags)
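The calls from the last three slides can be put together in one sketch. To keep it self-contained, both the passive and the active side live in a single process and talk over the loopback address; a real client and server would be separate programs. The function name is hypothetical, and the example assumes a POSIX system with the loopback interface available.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends msg from a client socket to a server socket over 127.0.0.1.
   Returns the number of bytes received into buf, or -1 on any error. */
int loopback_send_recv(const char *msg, char *buf, int buflen)
{
    int result = -1, listener = -1, client = -1, conn = -1, got = 0;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    int mlen;

    assert(msg != NULL && buf != NULL);
    mlen = (int)strlen(msg);
    if (mlen > buflen) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* 0: let the OS pick a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Passive open (server side): socket, bind, listen. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;
    if (getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0) goto done;

    /* Active open (client side): socket, connect to the chosen port. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    /* The kernel has queued the connection; accept() hands it over. */
    addr_len = sizeof addr;
    conn = accept(listener, (struct sockaddr *)&addr, &addr_len);
    if (conn < 0) goto done;

    if (send(client, msg, (size_t)mlen, 0) != mlen) goto done;

    /* A stream socket may deliver fewer bytes per recv(); loop until done. */
    while (got < mlen) {
        ssize_t n = recv(conn, buf + got, (size_t)(mlen - got), 0);
        if (n <= 0) goto done;
        got += (int)n;
    }
    result = got;

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return result;
}
```

Note that `accept` takes a pointer to the address length, since it writes back the size of the peer address it filled in.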

49 49 Point-to-Point Links Reading: Peterson and Davie, Ch. 2 Outline Hardware building blocks Encoding Framing Error Detection Reliable transmission Sliding Window Algorithm

50 50 Direct Link Issues in the OSI and Hardware/Software Contexts Transport Network Data Link Physical Session Presentation Application user-level software kernel software (device drivers) reliability framing, error detection, MAC encoding hardware (network adapter)

51 51 Hardware Building Blocks Nodes –Hosts: general-purpose computers –Switches: typically special-purpose hardware –Routers (connecting networks): varies Links –Copper wire with electronic signaling –Glass fiber with optical signaling –Wireless with electromagnetic (radio, infrared, microwave) signaling

52 52 Links Physical Media –Twisted pair cable –Coaxial cable –Optical fiber –Space Media is used to propagate signals Signals are electromagnetic waves of certain frequency, traveling at speed of light

53 53 Signals Over a Link Signal is modulated for transmission –Varying frequency/amplitude/phase to receive distinguishable signals Binary data (0s and 1s) is encoded in a signal –Make it understandable by the receiving host

54 54 Bits Over a Link Bit streams may be transmitted both ways at a time on a point-to-point link –Full Duplex Sometimes two nodes must alternate link usage –Half Duplex

55 55 Encoding Signals propagate over a physical medium –Modulate electromagnetic waves –e.g. vary voltage Encode binary data onto signals that propagate Signalling component Signal Bits Node Adaptor

56 56 Encoding Problems with signal transmission –Attenuation: signal power absorbed by medium –Dispersion: a discrete signal spreads in space –Noise: random background “signals” modulator demodulator a string of signals Digital data (a string of symbols)

57 57 RS-232(-C) Communication between computer and modem Uses two voltage levels (+15V, -15V), a binary voltage encoding Data rate limited to 19.2 kbps (RS-232-C) raised in later standards

58 58 Binary Voltage Encoding NRZ (Non-Return to Zero) NRZI (NRZ Inverted) Manchester (used by IEEE 802.3, 10 Mbps Ethernet) 4B/5B (8B/10B) in Fast Ethernet

59 59 Non-Return to Zero (NRZ) Encode binary data onto signals –e.g. 0 as low signal and 1 as high signal –Voltage does not return to zero between bits Known as Non-Return to Zero (NRZ) Bits NRZ 0010111101000010

60 60 Problem: Consecutive 1s or 0s Low signal (0) may be interpreted as no signal High signal (1) leads to baseline wander Unable to recover clock –Sender’s and receiver’s clock have to be precisely synchronized –Receiver resynchronizes on each signal transition –Clock Drift in long periods without transition Sender’s clock Receiver’s clock

61 61 Alternative Encodings Non-Return to Zero Inverted (NRZI) Make a transition from current signal (switch voltage level) to encode/transmit a “one” Stay at current signal (maintain voltage level) to encode/transmit a “zero” Solves the problem of consecutive ones (shifts to 0s)
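A minimal sketch of NRZI as described here: a 1 toggles the signal level, a 0 leaves it alone. Levels are modeled as 0 (low) and 1 (high); the function names and the explicit initial level are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n bits (each 0 or 1) into n signal levels, starting from initial_level. */
void nrzi_encode(const int *bits, size_t n, int initial_level, int *levels)
{
    assert(bits != NULL && levels != NULL);
    int level = initial_level ? 1 : 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i])
            level = !level;   /* a 1 is sent as a transition */
        levels[i] = level;    /* a 0 keeps the current level */
    }
}

/* Decode: a change of level relative to the previous one means 1. */
void nrzi_decode(const int *levels, size_t n, int initial_level, int *bits)
{
    assert(levels != NULL && bits != NULL);
    int previous = initial_level ? 1 : 0;
    for (size_t i = 0; i < n; i++) {
        bits[i] = (levels[i] != previous);
        previous = levels[i];
    }
}
```

A long run of 1s now produces a transition per bit, while a long run of 0s still produces a flat signal, which is why the problem "shifts to 0s".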

62 62 Alternative Encodings Manchester (in IEEE 802.3 – 10 Mbps Ethernet) Split cycle into two parts –Send high--low for “1”, low--high for “0” –Transmit XOR of NRZ encoded data and the clock Only 50% efficient (1/2 bit per transition)
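Manchester encoding as stated on this slide can be sketched by XORing each NRZ bit with a clock that is low in the first half of the bit time and high in the second. Each bit becomes two half-bit levels (0 = low, 1 = high). Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n bits into 2n half-bit levels: 1 -> high,low and 0 -> low,high. */
void manchester_encode(const int *bits, size_t n, int *halves)
{
    assert(bits != NULL && halves != NULL);
    for (size_t i = 0; i < n; i++) {
        int nrz = bits[i] ? 1 : 0;
        halves[2 * i] = nrz ^ 0;       /* clock is low in the first half  */
        halves[2 * i + 1] = nrz ^ 1;   /* clock is high in the second half */
    }
}

/* Decode 2n half-bit levels. Returns 0 on success, or -1 if some bit
   lacks its mid-bit transition (both halves equal), which is invalid. */
int manchester_decode(const int *halves, size_t n, int *bits)
{
    assert(halves != NULL && bits != NULL);
    for (size_t i = 0; i < n; i++) {
        int first = halves[2 * i];
        int second = halves[2 * i + 1];
        if (first == second)
            return -1;
        bits[i] = first;
    }
    return 0;
}
```

The output is twice as long as the input, which is the 50% efficiency noted above.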

63 63 4B/5B Encoding Every 4 consecutive bits of data encoded in a 5-bit code (symbol) –4-bit pattern is “translated” to a 5-bit pattern (not addition) 5-bit codes selected to have no more than one leading 0 and no more than two trailing 0s –00xxx (8 symbols) and xx000 (4 symbols) are illegal –5 free symbols (non-data) Thus, never gets more than three consecutive 0s Resulting 5-bit codes are transmitted using NRZI Achieves 80% efficiency
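The translation step can be sketched as a table lookup. The slide does not list the code table; the one below is the commonly published 4B/5B data-symbol table and should be checked against the relevant standard before use. The function names are hypothetical. The last function tests the rule stated above: at most one leading 0 and at most two trailing 0s.

```c
#include <assert.h>

/* 5-bit codes for data nibbles 0x0..0xF (binary shown in comments). */
static const unsigned char CODE_4B5B[16] = {
    0x1E, /* 0000 -> 11110 */ 0x09, /* 0001 -> 01001 */
    0x14, /* 0010 -> 10100 */ 0x15, /* 0011 -> 10101 */
    0x0A, /* 0100 -> 01010 */ 0x0B, /* 0101 -> 01011 */
    0x0E, /* 0110 -> 01110 */ 0x0F, /* 0111 -> 01111 */
    0x12, /* 1000 -> 10010 */ 0x13, /* 1001 -> 10011 */
    0x16, /* 1010 -> 10110 */ 0x17, /* 1011 -> 10111 */
    0x1A, /* 1100 -> 11010 */ 0x1B, /* 1101 -> 11011 */
    0x1C, /* 1110 -> 11100 */ 0x1D  /* 1111 -> 11101 */
};

/* Translate one 4-bit value into its 5-bit code. */
unsigned encode_4b5b(unsigned nibble)
{
    assert(nibble < 16);
    return CODE_4B5B[nibble];
}

/* Encode one byte as a 10-bit value, high nibble first. */
unsigned encode_byte_4b5b(unsigned char byte)
{
    return (encode_4b5b((unsigned)byte >> 4) << 5) | encode_4b5b(byte & 0x0Fu);
}

/* 1 if every data code has at most one leading 0 and at most two trailing 0s. */
int table_obeys_zero_rule(void)
{
    for (unsigned i = 0; i < 16; i++) {
        unsigned code = CODE_4B5B[i];
        if ((code & 0x18u) == 0)   /* top two bits both 0: two leading 0s  */
            return 0;
        if ((code & 0x07u) == 0)   /* low three bits all 0: three trailing 0s */
            return 0;
    }
    return 1;
}
```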

64 64 Binary Voltage Encoding Problem: wide frequency range required, implying –Significant dispersion –Uneven attenuation Prefer to use narrow frequency band (carrier frequency) Types of modulation –Amplitude Modulation (AM) –Frequency Modulation (FM) –Phase/Phase Shift –Combination of these (e.g. QAM)

65 65 Phase Modulation Algorithm Send carrier frequency for one period Perform phase shift Shift value encodes symbol –Value in range [0°, 360°) –Multiple values for multiple symbols –Represent as circle 8-symbol example: 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°

66 66 Constellation Pattern for V.32 QAM For a given symbol: 1. Perform phase shift 2. Change to new amplitude (figure marks angles of 45° and 15°) Points in constellation diagram –Chosen to maximize error detection –Process called trellis coding

67 67 Bit Rate and Baud Rate Bit rate is bits per second Baud rate is “symbols” per second If each symbol contains 4 bits then data rate is 4 times the baud rate

68 68 What Limits Baud Rate ? Baud rates are typically limited by electrical signaling properties No matter how small the voltage or how short the wire, changing voltages takes time Electronics are slow as compared to optics

69 69 Summary of Encoding Problems: attenuation, dispersion, noise Digital transmission allows periodic regeneration Variety of binary voltage encodings –High frequency components limit to short range –More voltage levels provide higher data rate Carrier frequency and modulation –Amplitude, frequency, phase, and combination (QAM) Nyquist (noiseless) and Shannon (noisy) limits on data rates

70 70 Framing Breaks continuous stream/sequence of bits into a frame and demarcates units of transfer Typically implemented by network adaptor –Adaptor fetches/deposits frames out of or into host memory Frames Bits Adaptor Node BNode A

71 71 Advantages of Framing Synchronization recovery –Consider continuous stream of unframed bytes –Recall RS-232 start and stop bits Multiplexing of link –Multiple hosts on shared medium –Simplifies multiplexing of logical channels Efficient error detection –Frame serves as unit of detection (valid or invalid) –Error detection overhead scales as log N

72 72 Approaches Organized by end of frame detection method Approaches to framing –Sentinel (marker, like C strings) –Length-based (like Pascal strings) –Clock-based

73 73 Approaches Other aspects of a particular approach –Bit-oriented or byte-oriented –Fixed or variable length –Data-dependent or data-independent length

74 74 Framing with Sentinels End of frame: special byte or bit pattern Choice of end of frame marker –Valid data byte or bit sequence e.g. 01111110 –Physical signal not used by valid data symbol 8 16 8 Beginning sequence Header Body CRC Ending sequence

75 75 Sentinel Based Approach Problem: equal size frames are not possible –Frame length is data-dependent Sentinel based framing examples –High-Level Data Link Control (HDLC) protocol –Point-to-Point Protocol (PPP) –ARPANET IMP-IMP protocol –IEEE 802.4 (token bus)

76 76 Length-based Framing Include payload length in header e.g., DDCMP (byte-oriented, variable-length) e.g. RS-232 (bit-oriented, implicit fixed length) Problem: count field corrupted Solution: catch when CRC fails 8 14 8 SYN Class Length 8 42 Header 16 Body CRC

77 77 Clock-based Framing Continuous stream of fixed-length frames –Each frame is 125µs long (all STS formats) (why?) Clocks must remain synchronized e.g. SONET: Synchronous Optical NETwork –Dominant standard for long distance transmission –Multiplexing of low-speed links onto one high-speed link –Byte-interleaved multiplexing –Payload bytes are scrambled (data XOR 127-bit pattern) –STS-n (STS-1 = 51.84 Mbps)

78 78 SONET Frame Format (STS-1)

79 79 Clock-based Framing Problem: how to recover frame synchronization –2-byte synchronization pattern starts each frame (unlikely to occur in data) –Wait until pattern appears in same place repeatedly

80 80 Clock-based Framing Problem: how to maintain clock synchronization –NRZ encoding, data scrambled (XOR’d) with 127-bit pattern –Creates transitions –Also reduces chance of finding false sync pattern

81 81 Error Detection Validates correctness of each frame Errors checked at many levels Demodulation of signals into symbols (analog) Bit error detection/correction (digital)— our main focus –Within network adapter (CRC check) –Within IP layer (IP checksum) –Possibly within application as well

82 82 Error Detection and Correction Possible binary voltage encoding symbol Neighborhoods and erasure region +15 -15 voltage 0 1 ? (erasure) Possible QAM symbol Neighborhoods in green All other space results in erasure Input to digital level: valid symbols or erasures

83 83 Error Detection: How ? How to detect error ? –Add redundant information to a frame to determine errors Transmit two complete copies of data –n redundant bits for n-bit message –Errors at the same position in both copies go undetected

84 84 Error Detection: How ? We want only k redundant bits for an n-bit message, where k << n –In Ethernet, 32-bit CRC for 12,000 bits (1500 bytes) k bits are derived from the original message Both the sender and receiver know the algorithm

85 85 Hamming Distance (1950 Paper) Minimum number of bit flips between code words –2 flips for parity –3 flips for voting n-bit error detection –No code word changed into another code word –Requires Hamming distance of n+1

86 86 Hamming Distance (1950 Paper) n-bit error correction –N-bit neighborhood: all code words within n bit flips –No overlap between n-bit neighborhoods –Requires Hamming distance of 2n+1

87 87 Digital Error Detection Techniques Two-dimensional parity –Detects up to 3-bit errors –Good for burst errors Internet checksum (used as backup to CRC) –Simple addition –Simple in software Cyclic redundancy check (CRC) –Powerful mathematics –Tricky in software, simple in hardware –Used in network adapter

88 88 Two-Dimensional Parity Add one extra bit to each 7-bit code to balance the 1s (even parity), plus an extra parity byte for the entire frame Catches all 1, 2 and 3 bit errors and most 4 bit errors 14 redundant bits for a 42-bit message, in the example Data rows (7 data bits, then parity bit): 0101001 1, 1101001 0, 1011110 1, 0001110 1, 0110100 1, 1011111 0 Parity byte: 1111011 0
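A sketch of two-dimensional parity for frames made of 7-bit codes, assuming even parity in both directions. The function names are hypothetical. Each row is the 7 data bits followed by its parity bit; the parity byte is the XOR of all rows, which gives even parity down every column.

```c
#include <assert.h>
#include <stddef.h>

/* Even parity bit for a value: 1 if the number of 1 bits is odd, else 0. */
static unsigned char even_parity_bit(unsigned char value)
{
    unsigned char parity = 0;
    while (value) {
        parity ^= (unsigned char)(value & 1u);
        value >>= 1;
    }
    return parity;
}

/* data7: n values holding 7 data bits each.
   rows8: receives n bytes, each = 7 data bits followed by the row parity bit.
   Returns the parity byte (column parities, including the parity column). */
unsigned char two_d_parity(const unsigned char *data7, size_t n,
                           unsigned char *rows8)
{
    assert(data7 != NULL && rows8 != NULL);
    unsigned char parity_byte = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char d = data7[i] & 0x7F;
        rows8[i] = (unsigned char)((d << 1) | even_parity_bit(d));
        parity_byte ^= rows8[i];   /* XOR accumulates even column parity */
    }
    return parity_byte;
}
```

For the six 7-bit rows 0101001, 1101001, 1011110, 0001110, 0110100, 1011111 this yields the parity byte 11110110, using 6 + 8 = 14 redundant bits for 42 data bits.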

89 89 Internet Checksum Algorithm Not used at the link level but provides same sort of functionality as CRC and parity Idea: –Add up all words (16-bit integers) that are transmitted –Transmit the result (checksum) of that sum –Receiver performs the same calculation on received data and compares the result with the received checksum –If the results do not match, an error is detected 16 redundant bits for a message of any length Weak protection, accepted as a last line of defense
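A sketch of the checksum computation. The slide simplifies two details that the actual Internet checksum (RFC 1071) specifies: the 16-bit words are added with ones' complement arithmetic (carries wrap around into the low bits), and the value transmitted is the complement of the sum. The function name and the padding of an odd trailing byte with zero are as in that RFC; treat the code as an illustration rather than a reference implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes, read as big-endian 16-bit words. */
uint16_t internet_checksum(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold the carry back in */
        data += 2;
        len -= 2;
    }
    if (len == 1) {                            /* odd byte: pad with zero */
        sum += (uint32_t)data[0] << 8;
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

A receiver can run the same function over the data with the checksum appended; a result of 0 means no error was detected.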

90 90 Cyclic Redundancy Check Theory Based on finite-field (binary-valued) arithmetic Bit string represented as polynomial Coefficients are binary-valued Divide bit string polynomial by generator polynomial to generate CRC Practice Bitwise XOR’s

91 91 Cyclic Redundancy Check Add k bits of redundant data to an n-bit message –Want k << n –e.g. k = 32 and n = 12,000 (1500 bytes) Represent n-bit message as n-1 degree polynomial –e.g. MSG=10011010 as M(x) = x^7 + x^4 + x^3 + x^1 –Sender and receiver exchange polynomials Let k be the degree of some agreed-upon divisor/generator polynomial –e.g. C(x) = x^3 + x^2 + 1

92 92 Cyclic Redundancy Check Transmit polynomial P(x) that is evenly divisible by C(x) –Shift left k bits, i.e. M(x)·x^k –Add remainder of M(x)·x^k / C(x) into M(x)·x^k Receiver receives polynomial P(x) + E(x) –E(x) = 0 implies no errors Receiver divides (P(x) + E(x)) by C(x); remainder will be zero ONLY if: –E(x) was zero (no error), or –E(x) is exactly divisible by C(x)
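The division can be sketched with bitwise XORs on small messages held in one machine word. This is an illustration only (function names are hypothetical, and messages are limited to 32 bits including the CRC); real adapters use shift-register hardware or table-driven code.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of value (value_bits wide) divided by gen over GF(2).
   gen has degree k, i.e. it is k+1 bits wide with its top bit set. */
uint32_t gf2_mod(uint32_t value, int value_bits, uint32_t gen, int k)
{
    assert(k >= 1 && value_bits >= 1 && value_bits <= 32);
    for (int bit = value_bits - 1; bit >= k; bit--) {
        if (value & ((uint32_t)1 << bit))
            value ^= gen << (bit - k);   /* subtract (XOR) the aligned generator */
    }
    return value;                        /* only the low k bits can remain set */
}

/* Build the transmitted word: message shifted left k bits plus the remainder. */
uint32_t crc_codeword(uint32_t msg, int msg_bits, uint32_t gen, int k)
{
    assert(msg_bits >= 1 && k >= 1 && msg_bits + k <= 32);
    uint32_t shifted = msg << k;
    return shifted ^ gf2_mod(shifted, msg_bits + k, gen, k);
}
```

With the message 10011010 and the generator 1101 (x^3 + x^2 + 1), the remainder is 101, so the transmitted word is 10011010101. Dividing that word by the generator leaves remainder zero, and flipping any single bit makes the remainder nonzero.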

93 93 Reliable Transmission Error-correcting codes are not advanced enough to handle the range of bit and burst errors –Corrupt frames generally must be discarded –A reliable link-level protocol must recover from discarded frames Goals for reliable transmission –Make channel appear reliable –Maintain packet order (usually) –Impose low overhead / allow full use of link

94 94 Reliable Transmission Reliability accomplished using acknowledgments and timeouts –ACK is a small control frame confirming reception of an earlier frame –Having no ACK, sender retransmits after a timeout

95 95 Reliable Transmission Automatic Repeat reQuest (ARQ) algorithms –Stop-and-wait –Concurrent logical channels –Sliding window Go-back-n, or selective repeat Alternative: Forward Error Correction (FEC)

96 96 Automatic Repeat reQuest Acknowledgement (ACK) –Receiver tells sender when frame received –Cumulative ACK (used by TCP): have received specified frame and all previous –Selective ACK (SACK): specifies set of frames received –Negative ACK (NACK or NAK): receiver refuses to accept frame now, e.g. when out of buffer space

97 97 Automatic Repeat reQuest Timeout: sender decides that frame was lost and tries again ARQ also called Positive Acknowledgement with Retransmission (PAR)

98 98 Stop-and-Wait Send a single frame Wait for ACK or timeout –If ACK received, continue with next frame –If timeout occurred, send again (and wait) Frame lost in transit; or corrupted and discarded Sender Receiver Frame 0 Frame1 ACK0 ACK1

99 99 Stop-and-Wait Frames delivered reliably and in order Is that enough ? –No, we need performance, too. Problem: keeping the pipe full … ? Example –1.5Mbps link x 45ms RTT = 67.5Kb (~8KB) –1KB frames implies 182 Kbps (1/8th link utilization) –Want the sender to transmit 8 frames before waiting for ACK –Throughput remains 182 Kbps regardless of the link bandwidth !!
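The numbers in this example can be reproduced with a short sketch. Like the slide, it ignores the time needed to transmit the frame itself and counts one frame per RTT; the function names are hypothetical.

```c
#include <assert.h>

/* Stop-and-wait sends one frame per round trip. */
double stop_and_wait_bps(double frame_bits, double rtt_s)
{
    assert(rtt_s > 0.0);
    return frame_bits / rtt_s;
}

/* Frames needed in flight to keep the pipe full (delay x bandwidth / frame size). */
double frames_to_fill_pipe(double bandwidth_bps, double rtt_s, double frame_bits)
{
    assert(frame_bits > 0.0);
    return bandwidth_bps * rtt_s / frame_bits;
}
```

A 1KB frame is 8192 bits, so 8192 / 0.045 s is about 182 Kbps, and 1.5 Mbps x 45 ms / 8192 bits is about 8 frames. The first result does not depend on the link bandwidth at all.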

100 100 Concurrent Logical Channels Multiplex several logical channels over a single p-to-p physical link (include channel ID in header) Use stop-and-wait for each logical channel Maintain three bits of state for each logical channel: –Boolean saying whether channel is currently busy –Sequence number for frames sent on this channel –Next sequence number to expect on this channel ARPANET IMP-IMP supported 8 logical channels over each ground link (16 over each satellite link)

101 101 Concurrent Logical Channels Header for each frame includes a 3-bit channel number and a 1-bit sequence number –Same number of bits (4) as the sliding window requires to support up to 8 outstanding frames on the link

102 102 Sliding Window Allow sender to transmit multiple frames before receiving an ACK, thereby keeping the pipe full Upper bound on outstanding un-ACKed frames Also used at the transport layer (by TCP) (Figure: sender and receiver timelines)

103 103 Sliding Window Concepts Consider ordered stream of data –Broken into frames –Stop-and-Wait Window of one frame Slides along stream over time Sliding window algorithms generalize this notion –Multiple-frame send window –Multiple-frame receive window time

104 104 Sliding Window - Sender Assign sequence number to each frame ( SeqNum ) Maintain three state variables: –Send Window Size ( SWS ) –Last Acknowledgment Received ( LAR ) –Last Frame Sent ( LFS ) Maintain invariant: LFS – LAR ≤ SWS Advance LAR when ACK arrives Buffer up to SWS frames and associate timeouts Example (figure): frames 11 to 20 on a timeline, LAR=13, LFS=18, so LFS – LAR = 5 ≤ SWS

105 105 Sliding Window - Receiver Maintain three state variables –Receive Window Size ( RWS ) –Largest Frame Acceptable ( LFA ) –Next Frame Expected ( NFE ) Maintain invariant: LFA – NFE+1 ≤ RWS Frame SeqNum arrives: –If NFE ≤ SeqNum ≤ LFA → accept –If SeqNum < NFE or SeqNum > LFA → discard Send cumulative ACKs Example (figure): frames 11 to 20 on a timeline, NFE=13, LFA=17, so LFA – NFE+1 = 5 ≤ RWS
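The receiver rules can be sketched as follows. To stay short, the sketch assumes sequence numbers that never wrap around (a simplification), tracks only which frames are held rather than their contents, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WINDOW 8

typedef struct {
    unsigned nfe;               /* Next Frame Expected */
    unsigned rws;               /* Receive Window Size, 1..MAX_WINDOW */
    int buffered[MAX_WINDOW];   /* buffered[s % MAX_WINDOW] is 1 if frame s is held */
} sw_receiver;

/* Handle an arriving frame. Returns the cumulative ACK: the sequence number
   of the last frame received in order (NFE - 1). */
long sw_receive(sw_receiver *r, unsigned seq)
{
    assert(r != NULL && r->rws >= 1 && r->rws <= MAX_WINDOW);
    unsigned lfa = r->nfe + r->rws - 1;          /* Largest Frame Acceptable */
    if (seq >= r->nfe && seq <= lfa) {           /* inside the window: accept */
        r->buffered[seq % MAX_WINDOW] = 1;
        while (r->buffered[r->nfe % MAX_WINDOW]) {   /* slide past in-order frames */
            r->buffered[r->nfe % MAX_WINDOW] = 0;
            r->nfe++;
        }
    }                                            /* outside the window: discard */
    return (long)r->nfe - 1;
}
```

With NFE=13 and RWS=5, frame 14 arriving first is buffered and the ACK stays at 12; when frame 13 arrives, the window slides past both and the ACK becomes 14.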

106 106 Sliding Window Issues When a timeout occurs, data in transit decreases –Pipe is no longer full when packet losses occur –Problem worsens with delay in packet loss detection Early detection of packet losses improves performance: –Negative Acknowledgements (NACKs) –Duplicate Acknowledgements –Selective Acknowledgements (SACKs) Adds complexity but helps keep the pipe full

107 107 Sliding Window Classification Stop-and-wait:SWS=1, RWS=1 Go-back-N:SWS=N, RWS=1 Selective repeat:SWS=N, RWS=M (usually M = N) Selective Repeat Go-back-N Stop-and-Wait

108 108 Sequence Number Space SeqNum field is finite; sequence numbers wrap around Sequence number space must be larger than number of outstanding frames ( SWS ) SWS <= MaxSeqNum-1 is not sufficient –Suppose 3-bit SeqNum field (0..7); SWS=RWS=7 –Sender transmits frames 0..6; which arrive successfully (receiver window advances) –ACKs are lost; sender retransmits 0..6 –Receiver expecting 7, 0..5, but receives second incarnation of 0..5, assuming them to be the 8th to 13th frames

109 109 Required Sequence Number Space ? Assume SWS=RWS (simplest, and typical) –Sender transmits full SWS –Two extreme cases at receiver None received (waiting for 0…SWS – 1) All received (waiting for SWS…2 × SWS – 1) All possible packets must have unique SeqNum SWS < (MaxSeqNum+1)/2 or SWS+RWS < MaxSeqNum+1 is the correct rule Intuitively, SeqNum “slides” between two halves of sequence number space
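The rule can be written as a one-line check. The sketch assumes that MaxSeqNum denotes the number of available sequence numbers (8 for a 3-bit field), which is what the 3-bit example with SWS=RWS=7 implies; the function name is hypothetical.

```c
#include <assert.h>

/* 1 if the windows fit the sequence space: SWS + RWS < MaxSeqNum + 1. */
int window_sizes_are_safe(unsigned sws, unsigned rws, unsigned max_seq_num)
{
    assert(max_seq_num >= 1);
    return sws + rws < max_seq_num + 1;
}
```

For a 3-bit field, SWS=RWS=7 fails the check, while SWS=RWS=4 passes.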

110 110 Shared Media: Problems Problem: demands can conflict, e.g. two hosts send simultaneously –STDM does not address this problem - centralized –Solution is a medium access control (MAC) algorithm

111 111 Shared Media: Solutions Three solutions (out of many) –Carrier Sense Multiple Access with Collision Detection (CSMA / CD) Send only if medium is idle Stop sending immediately if collision detected –Token Ring/FDDI pass a token around a ring; only token holder sends –Radio / wireless (IEEE 802.11)

112 112 History of Ethernet Developed by Xerox PARC in mid-1970s Roots in Aloha packet-radio network Standardized by Xerox/DEC/Intel in 1978 Similar to IEEE 802.3 standard IEEE 802.3u standard defines Fast Ethernet (100 Mbps) New switched Ethernet now popular

113 113 Ethernet – Alternative Technologies Can be constructed from a thinner cable (10Base2) rather than 50-ohm coax cable (10Base5) Newer technology uses 10BaseT (twisted pair) –Several point-to-point segments coming out of a multiway repeater called “hub” Hub

114 114 Ethernet – Multiple Segments Repeaters forward the broadcast signal on all outgoing segments (10Base5) Maximum of 4 repeaters (2500m), 1024 hosts Repeater Host … … …

115 115 Ethernet Packet Frame Preamble allows the receiver to synchronize with signal Frame must contain at least 46 bytes of data to detect collision 802.3 standard replaces the type field with a length field –Type field (demux key) is the first thing in data portion –A device can accept both frames: type > 1500 Frame format (field widths in bits): Preamble (64), Dest addr (48), Src addr (48), Type (16), Body, CRC (32)

116 116 Ethernet MAC – CSMA/CD Multiple access –Nodes send and receive frames over a shared link Carrier sense –Nodes can distinguish between an idle and busy link Collision detection –A node listens as it transmits to detect collision

117 117 CSMA/CD MAC Algorithm If line is idle (no carrier sensed) –Send immediately –Upper bound message size of ~1500 bytes –Must wait 9.6µs between back-to-back frames

118 118 CSMA/CD MAC Algorithm If line is busy (carrier sensed) … –Wait until the line becomes idle and then transmit immediately –Called 1-persistent (special case of p-persistent) If collision detected –Stop sending data and send jam signal –Try again later

119 119 Constraints on Collision Detection In our example, consider –my-machine’s message reaches your-machine at T –your-machine’s message reaches my-machine at 2T Thus, my-machine must still be transmitting at 2T

120 120 Ethernet Min. Frame Size RTT on a maximally configured Ethernet of 2500m, with 4 repeaters is about 51.2 µs –2500m / (2 × 10^8 m/s) = 12.5 µs –2 × 12.5 = 25 µs + repeater delays 51.2 µs on 10 Mbps corresponds to 512 bits (64 bytes) Therefore, the minimum frame length for Ethernet is 64 bytes (header + 46 bytes data)

121 121 Retry After the Collision How long should a host wait to retry after a collision? –Binary exponential backoff Maximum backoff doubles with each failure (exponential) After N failures, pick an N-bit number: $2^N$ discrete possibilities from 0 to maximum
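
A minimal sketch of binary exponential backoff, assuming the delay is counted in slots of 51.2 µs (the worst-case RTT from the previous slide) and that the window stops growing after 10 failures, as in classic Ethernet. The function names and the explicit random generator argument are illustrative choices.

```python
import random

SLOT_TIME_US = 51.2   # assumed slot length: one worst-case round trip
MAX_EXPONENT = 10     # assumption: the window is capped at 2**10 slots

def backoff_slots(failures: int, rng: random.Random) -> int:
    """After N failures, pick an N-bit number: uniform in [0, 2**N - 1]."""
    exponent = min(failures, MAX_EXPONENT)
    return rng.randrange(2 ** exponent)

def backoff_delay_us(failures: int, rng: random.Random) -> float:
    """Convert the chosen slot count into a waiting time in microseconds."""
    return backoff_slots(failures, rng) * SLOT_TIME_US

rng = random.Random(1)
for n in range(1, 5):
    print(n, backoff_delay_us(n, rng))
```

Passing the generator in explicitly keeps the sketch reproducible; a real adaptor would draw from its own random source.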

122 122 Ethernet Frame Reception Sender handles all access control Receiver simply pulls frames from network Ethernet controller/card –Sees all frames –Selectively passes frames to host processor

123 123 Experience With Ethernet Number of hosts limited to 200 in practice, standard allows 1024 Range much shorter than 2.5 km limit in standard Round-trip time is typically 5 or 10 μs, not 50μs

124 124 Token Ring Overview Token Ring network “was” a candidate to replace Ethernet; used in some MAN backbones –16Mbps IEEE 802.5 (based on earlier 4Mbps IBM ring) –100Mbps Fiber Distributed Data Interface (FDDI)

125 125 IBM Token Ring – IEEE 802.5 Ring is viewed as a single shared medium –Each node is allowed to transmit according to some distributed algorithm for medium access –All nodes see all frames; destination saves a copy of frame as it flows past The term “token” indicates the way the access to shared channel is managed

126 126 Token in a Token Ring Token is a special bit pattern that rotates around the ring –A node must capture token before transmitting –A node releases token after done transmitting Immediate release – token follows last frame (FDDI) Delayed release – after last frame returns to sender

127 127 Token in a Token Ring Remove your frame when it comes back around –Transmit another frame or re-insert the token Stations get round-robin service as the token circulates around the ring

128 128 Physical Properties Data rate can be 4 Mbps or 16 Mbps Encoding of bits uses differential Manchester Ring may have up to 250 (802.5) or 260 (IBM) nodes Physical medium is twisted pair (IBM Token Ring)

129 129 Token Ring MAC Network adaptor contains receiver, transmitter and some storage of bits between them Token circulates if no station has anything to send –Ring must have enough capacity to store entire token –At least 24 stations with 1-bit storage for 24-bit long token (if propagation delay is negligible) –This situation is avoided by designating a monitor

130 130 Token Ring MAC Any station that has data to send can seize the token In 802.5, simply 1 bit in the second byte of the token is modified First two bytes of the modified token become the preamble for the next frame

131 131 Frame Format “Illegal” Manchester codes in the start and end delimiters Frame priority and reservation bits in access control byte Demux key in frame control byte A and C bits for reliable delivery, in status byte Frame fields (bits): Start delimiter (8) | Access control (8) | Frame control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End delimiter (8) | Frame status (8)

132 132 Timed Token Algorithm Token Holding Time (THT) –Upper limit on how long a station can hold the token –Before putting each frame on the ring, a node checks that transmitting it would not cause it to exceed THT –Long THT achieves better utilization with few senders –Short THT helps when multiple nodes have data to send

133 133 Reliable Delivery The A and C bit in the packet trailer for reliability Both bits are initially set to 0 Destination sets A bit if it sees the frame and sets C bit if it copies the frame into its adaptor

134 134 Token Ring Packet Priorities A station wanting to send a priority n packet can set the reservation bits to n, provided this does not lower their current value –It captures the token when the current sender releases it with priority set to n Strict priority scheme: no lower-priority packets get sent when higher-priority packets are waiting

135 135 Token Maintenance Token rings have a designated monitor node Any station can become the monitor according to a well defined procedure Monitor is elected when the ring is first connected, or when the current monitor fails

136 136 Token Maintenance Monitor periodically announces its presence Claim token sent by a station seeing no monitor –If the sender receives back the claim token, it becomes monitor –If another station is also contending for monitor, some rule defines the monitor

137 137 Fiber Distributed Data Interface Similar to 802.5/IBM token rings but runs on fiber Consists of a dual ring: two independent rings that transmit data in opposite directions at 100 Mbps Tolerates a single link break or node failure (self-healing ring)

138 138 FDDI – Physical Properties Variable size buffer (9 – 80 bits) between input and output interfaces (10ns bit time) –Not required to fill buffer before starting transmission Maximum 500 stations, maximum 2 km distance between any pair of stations

139 139 FDDI – Physical Properties Total 200 km fiber: dual nature implies 100 km cable connecting all stations Physical media can be coax or twisted pair cable Uses 4B/5B encoding

140 140 Timed Token Algorithm Token Holding Time (THT) –Upper limit on how long a station can hold the token –Configured to some suitable value Token Rotation Time (TRT) –How long it takes the token to traverse the ring (time since a host released the token) –TRT <= ActiveNodes x THT + RingLatency

141 141 Timed Token Algorithm Target Token Rotation Time (TTRT) –“agreed-upon” or negotiated upper bound on TRT

142 142 MAC Algorithm Each node measures TRT between successive token arrivals If measured-TRT > TTRT – Token is late – Can not send data

143 143 FDDI Traffic Classes Synchronous traffic –Latency sensitive –Gets higher priority –Can always send data

144 144 Bounded Priority Traffic If a node has large amount of synchronous data –It will send regardless of measured TRT –TTRT will become meaningless !!! Therefore, total synchronous data during one token rotation is bounded by TTRT
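
The timed-token decision from the preceding slides can be sketched as follows. The slides only state the late-token case; the early-token rule used here (an early token may be held for TTRT minus the measured TRT to send asynchronous data) is an assumption based on the usual description of the algorithm. The function names are illustrative, and the bound on total synchronous data is not modelled.

```python
def may_send(measured_trt: float, ttrt: float, synchronous: bool) -> bool:
    """Decide whether a node may transmit when the token arrives."""
    if synchronous:
        return True                # synchronous traffic can always be sent
    return measured_trt < ttrt     # asynchronous data only if the token is early

def async_budget(measured_trt: float, ttrt: float) -> float:
    """Time an early token may be held for asynchronous data (0 if late)."""
    return max(0.0, ttrt - measured_trt)

# Example with hypothetical times in milliseconds.
print(may_send(6.0, 8.0, synchronous=False), async_budget(6.0, 8.0))
print(may_send(9.0, 8.0, synchronous=False), async_budget(9.0, 8.0))
```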

145 145 Token Maintenance The procedure when a node –Joins the ring (startup) –Suspects a failure Claim frame is used in order to –Generate a new Token –Agree on TTRT (so that an application can meet its timing constraints) A node can send a claim frame without holding the token

146 146 Frame Format 4B/5B control symbols for start and end of frame Control Field –1st bit: asynchronous (0) versus synchronous (1) data –2nd bit: 16-bit (0) versus 48-bit (1) addresses –Last 6 bits: demux key (includes reserved patterns for token and claim frame) Status Field –From receiver back to sender; error in frame –Recognized address; accepted frame (flow control) Frame fields (bits): Start of frame (8) | Control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End of frame (8) | Status (24)

147 147 Wireless LANs IEEE 802.11 standard –Designed for use in a small area (offices, campuses) Bandwidth: 1, 2 or 11 Mbps –Up to 54 Mbps in newer 802.11a standard Targets three physical media –Two spread spectrum radio (2.4 GHz freq) –One diffused infrared (10 m range, 850 nm band)

148 148 802.11 MAC: CSMA/CA Similar to Ethernet … –Defer the transmission until the link becomes idle –Take back off if collision occurs Is it sufficient ? All nodes are not always within reach of (to hear) each other

149 149 Hidden and Exposed Nodes Hidden nodes –Sender thinks it's OK to send when it's not (false +ve) –A-C and B-D are hidden nodes in the figure below Exposed nodes –Sender does not send when it's OK to send (false –ve) –B and C are exposed nodes in the figure below

150 150 Multiple Access with Collision Avoidance (MACA) Sender transmits RequestToSend (RTS) frame –Contains intended time to hold the medium Receiver replies with ClearToSend (CTS) frame

151 151 MACA for Wireless (MACAW) Collision detection –No active collision detection –Known only if CTS or ACK is not received –Binary exponential back off (BEB) is used in case of collision, like in Ethernet

152 152 802.11 - Distribution System Nodes roam freely but operate within a structure –Tethered by wired network infrastructure (Ethernet ?) –Each Access Point (AP) services nodes in some region –Each mobile node associates itself with an AP

153 153 Managing Connectivity/Roaming How do wireless nodes select an Access Point? Scanning (active search for an AP) –Node sends Probe frame –All APs within reach reply with Probe Response frame –Node selects one AP; sends it Association Request frame –AP replies with Association Response –New AP informs old AP via wired backbone

154 154 Managing Connectivity Active scanning: when a node joins or moves Passive scanning: AP periodically sends Beacon frame, advertising its capabilities

155 155 Frame Format Control field contains three subfields: –6-bit Type field (data, RTS, CTS, scanning); –1-bit ToDS; and –1-bit FromDS A single frame contains up to 2312 bytes of data [Figure: ToDS=0, FromDS=0 for a frame from A to C; ToDS=1, FromDS=1 for a frame from A to E via AP-1 and AP-3]

156 156 Overview Also called network interface card (NIC) Components (high-level overview) Options for use –Data motion –Event notification Potential performance bottlenecks Programming device drivers

157 157 Typical Workstation Architecture CPU Cache $ Memory I/O bus Network Adaptor memory bus Communication ? To Network Typically where data link functionality is implemented

158 158 Components of a Network Adaptor Bus interface communicates with a specific host –Bus defines protocol for CPU-adaptor communication Link interface speaks correct protocol on network –Implemented by a chip set, in software or on FPGA Buffering between different speed bus and link Host I/O bus Network Adaptor Bus Interface Link Interface network

159 159 Host Perspective Adaptor is ultimately programmed by CPU Adaptor exports a Control Status Register (CSR) CSR is readable and writable from CPU at some memory address

160 160 Data Motion Options for Network Adaptor Use Transfer frames between adaptor and host memory Programmed input/output (PIO) –Processor itself manages each access (loads/stores) –Faster than DMA for small amounts of data

161 161 Data Motion Options for Network Adaptor Use Direct memory access (DMA) –Adaptor gets buffer descriptor lists from the host for read/write –Processor is not involved: free to do other things –Can be faster than memory copy through CPU –Start-up cost

162 162 Data Motion CPU Cache $ Memory I/O bus Network Adaptor memory bus To Network Data movement path using PIO Data movement path using DMA

163 163 Network Adaptor: Event Notification Hardware interrupts – Processor free to do other things – Events delivered “immediately” – State (register) save/restore expensive – Context switches more expensive

164 164 Network Adaptor: Event Notification Event polling – Processor must periodically check – Events wait until next check – No extra state changes

165 165 Device Drivers Operating system routines anchoring protocol stack to network hardware Initialize device, transmit frames, field interrupts Code contains device specific details –Difficult to read but simple in logic

166 166 Performance Bottlenecks Link capacity Processor computing power I/O bus bandwidth – Overhead involved in each bus transfer

167 167 Performance Bottlenecks Memory bus bandwidth – Memory hierarchy with cache levels – Memory accesses result in multiple memory copies in different buffers

168 168 Packet Switches A multi-input multi-output device Local star topology Performance independent of connectivity –(e.g. adding new host) if switch is designed with enough aggregate capacity Maximum degree < physical network limit

169 169 Forwarding Packets arrive at one of the several inputs and have to be forwarded/switched to one of the available outputs –Connectionless and connection-oriented approach to determine the correct output Which way should it go ? First challenge: forwarding

170 170 Routing Forwarding requires information Second challenge: routing How to maintain forwarding information ?

171 171 Contention and Congestion If arrival rate for a certain output is greater than the output capacity, then contention occurs If arrival rate of packets is high enough to cause buffer overflow, then congestion occurs Who goes first? Is anyone dropped?

172 172 Network Layers and Switches One or more nodes within the network User level OS kernel host switch switch between different physical layers Transport Network Data Link Physical Session Presentation Application Network Data Link Physical

173 173 Packet Switching / Forwarding Three approaches –Datagram or connectionless approach –Virtual circuit or connection-oriented approach –Source routing Important notion: unique global address per host

174 174 Datagram Switching / Forwarding Every packet contains enough information –Enables switch to decide how to forward it Switch translates global address to output port –Maintains forwarding table for translations Each packet forwarded and travels independently

175 175 Datagram Switching Managing tables in large, complex networks with dynamically changing topologies is a real challenge for the routing protocol [Figure: Switches 1, 2 and 3, each with ports 0–3, connecting Hosts A–H] Forwarding table at switch 1 (Dest: Port#/Interface): A: 2, B: 1, C: 3, D: 0, E: 1, …

176 176 Datagram Model No round trip time delay waiting for connection setup –Host can send data anywhere, anytime as soon as it is ready –Source has no way of knowing if the network is capable of delivering a packet or if the destination host is even up Packets are treated independently –Possible to route around link and node failures dynamically

177 177 Virtual Circuit Switching Explicit connection setup (and tear-down) phase from source to destination: connection-oriented model –Subsequent packets follow established circuit Supporting “connections” in network layer may be useful for service notions

178 178 VC Tables in VC Switching Setup message in signaling process (to create VC table) is forwarded like a datagram Acknowledgment of connection setup is sent back to upstream neighbors to complete signaling –Data transfer phase can start after ACK is received

179 179 Signaling in VC Switching Setup message is forwarded from Host A to Host B On connection request, each switch creates an entry in VC table with a VCI for the connection [Figure: Switches 1, 2 and 3 between Host A and Host B, with the setup message for B passing through each] VC table entries shown (I/F in, VCI in, I/F out; VCI out not yet filled in): 2, 5, 1; 2, 7, 3; 3, 9, 0

180 180 Virtual Circuit Model Typically wait full RTT for connection setup before sending first data packet –Can not avoid failures dynamically, must re-establish connection (old one is torn down to free storage space)

181 181 Source Routing Packet header contains sequence of address/ports on path from source to destination –One direction per switch: port, next switch (absolute) –Switches read, use, and then discard directions

182 182 Data Transfer in Source Routing Analogous to following directions [Figure: a data packet travels from Host A through Switches 1, 2 and 3 to Host B, carrying the list of output ports in its header]

183 183 Source Routing Model Source host needs to know the correct and complete topology of the network –Changes must propagate to all hosts Packet headers may be large and variable in size: the length is unpredictable

184 184 Implementation and Performance Packet arriving at interface 1 has to go on interface 2 Point of contention for packets: I/O and memory bus

185 185 Building Extended LANs Traditional LAN –Shared medium (e.g. Ethernet) –Cheap, easy to administer –Supports broadcast traffic Problem –Want to scale LAN concept Larger geographic area (Greater than O(1 km)) More hosts (Greater than O(100)) –But retain LAN-like functionality Solution: bridges

186 186 Bridges Connect two or more LANs with a bridge –Transparently extends a LAN over multiple networks –Accept & forward strategy (in promiscuous mode) –Level 2 connection (does not add packet header) [Figure: a bridge with hosts A, B, C on Port 1 and hosts X, Y, Z on Port 2]

187 187 Learning Bridges Learn table entries based on source address –Timeout entries to allow movement of hosts Table is an optimization; it need not be complete Always forward broadcast frames Uses datagram or connectionless forwarding [Figure: a bridge with hosts A, B, C on Port 1 and hosts X, Y, Z on Port 2] Bridge table (Host: Port): A: 1, B: 1, C: 1, X: 2, Y: 2, Z: 2
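
The learning behaviour described on this slide can be sketched in a few lines. This is an illustration only: the class name, the `BROADCAST` constant, the 300-second timeout and the explicit `now` argument are assumptions made for the example, not values from the lecture.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"   # assumed broadcast address for the sketch

class LearningBridge:
    """Toy learning bridge: learns source addresses, floods unknown ones."""

    def __init__(self, ports, timeout=300.0):
        self.ports = set(ports)
        self.timeout = timeout     # seconds before a learned entry expires
        self.table = {}            # host address -> (port, time last seen)

    def receive(self, src, dst, in_port, now):
        """Return the list of ports the frame should be forwarded on."""
        self.table[src] = (in_port, now)           # learn from source address
        entry = self.table.get(dst)
        expired = entry is not None and now - entry[1] > self.timeout
        if dst == BROADCAST or entry is None or expired:
            return sorted(self.ports - {in_port})  # flood on all other ports
        out_port = entry[0]
        # Destination is on the arrival segment: no forwarding needed.
        return [] if out_port == in_port else [out_port]

bridge = LearningBridge([1, 2])
print(bridge.receive("A", "X", 1, now=0.0))   # X unknown, so the frame is flooded
print(bridge.receive("X", "A", 2, now=1.0))   # A was learned on port 1
```

Because the table is only an optimization, a missing or expired entry costs extra traffic (flooding) but never correctness.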

188 188 Learning Bridges Problem –Redundancy (desirable to handle failures, but …) –Makes extended LAN structure cyclic –Frames may cycle forever Solution: spanning tree [Figure: extended LAN with LANs A–K interconnected by bridges B1–B7, containing loops]

189 189 Spanning Tree Subset of forwarding possibilities All LANs reachable, but acyclic Bridges run a distributed algorithm to calculate the spanning tree –Select which bridges actively forward –Developed by Radia Perlman of DEC –Now IEEE 802.1 specification –Reconfigurable algorithm

190 190 Spanning Tree Algorithm All designated bridges forward frames –On all designated ports –On preferred port (path leading to root) [Figure: extended LAN with LANs A–K and bridges B1–B7, marking designated ports, preferred ports and designated bridges]

191 191 Distributed Spanning Tree Algorithm Bridges exchange configuration messages –ID for bridge sending the message –ID for what the sending bridge believes to be root bridge –Distance (hops) from sending bridge to root bridge
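
A sketch of how a bridge might compare two configuration messages carrying the three fields listed above. The ordering rule (smaller root ID wins, then shorter distance, then smaller sender ID) is the commonly described one and is assumed here rather than taken from the slide; `Config` and `is_better` are illustrative names.

```python
from typing import NamedTuple

class Config(NamedTuple):
    root: int       # ID of the bridge the sender believes to be root
    distance: int   # hops from the sending bridge to that root
    sender: int     # ID of the bridge sending the message

def is_better(new: Config, current: Config) -> bool:
    """True if `new` beats `current`.

    Tuples compare field by field, so this checks root ID first,
    then distance, then sender ID, preferring smaller values.
    """
    return new < current

best = Config(root=4, distance=0, sender=4)    # a bridge starts by claiming root
heard = Config(root=1, distance=2, sender=3)
if is_better(heard, best):
    best = heard
print(best)
```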

192 192 Limitations of Bridges Do not scale –Spanning tree algorithm does not scale –Broadcast does not scale Do not accommodate heterogeneity –Only supports networks with same address formats

193 193 ATM (Asynchronous Transfer Mode) Common in WANs, can also be used in LANs –Competing technology with Ethernet, but areas of application only partially overlap Connection-oriented packet- switched network –Virtual-circuit routing Typically implemented on SONET (other physical layers possible)

194 194 ATM Signaling Connection setup called signaling (standard Q.2931) Route discovery, resource resv, QoS,... Send through network –Request setup circuit –Send setup frame on setup circuit Establish locally –No intermediate switch involvement –Requires pre-established virtual path

195 195 Cell Switching (ATM) Fixed length (53 bytes) frames are called cells –5-byte header (including 1-byte CRC-8) + 48-byte payload Standard defines 3 layers (5 sublayers) –Layers interface to physical media and to higher layers (e.g. encapsulating variable-length frames)

196 196 Cell Switching (ATM) 2-level connection hierarchy –Virtual circuits –Virtual paths Bundles of virtual circuits Travel along common route Reduces forwarding information

197 197 ATM Cell Format User-Network Interface (UNI) –Host-to-switch format –GFC: Generic Flow Control (still being defined) –VCI/VPI: Virtual Circuit/Path Identifier –Type: management, congestion control, AAL5 (later) –CLP: Cell Loss Priority –HEC: Header Error Check (CRC-8) Network-Network Interface (NNI) –Switch-to-switch format –GFC becomes part of VPI field Cell fields (bits): GFC (4) | VPI (8) | VCI (16) | Type (3) | CLP (1) | HEC (CRC-8) (8) | Payload (384, i.e. 48 bytes)
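
As an illustration of the UNI layout (GFC 4 bits, VPI 8, VCI 16, Type 3, CLP 1, HEC 8), the 5-byte header can be unpacked with shifts and masks. This is a sketch: `parse_uni_header` is a hypothetical helper, and it does not verify the HEC.

```python
def parse_uni_header(header: bytes) -> dict:
    """Split a 5-byte ATM UNI cell header into its fields (HEC not checked)."""
    if len(header) != 5:
        raise ValueError("ATM cell header must be exactly 5 bytes")
    word = int.from_bytes(header, "big")   # treat the header as a 40-bit integer
    return {
        "gfc": (word >> 36) & 0xF,      # bits 39-36
        "vpi": (word >> 28) & 0xFF,     # bits 35-28
        "vci": (word >> 12) & 0xFFFF,   # bits 27-12
        "type": (word >> 9) & 0x7,      # bits 11-9
        "clp": (word >> 8) & 0x1,       # bit 8
        "hec": word & 0xFF,             # bits 7-0
    }

# Build an example header with VPI=5, VCI=33, Type=1, CLP=1 and an arbitrary HEC.
example = ((5 << 28) | (33 << 12) | (1 << 9) | (1 << 8) | 0xAB).to_bytes(5, "big")
print(parse_uni_header(example))
```

For the NNI format the same code would need the top 12 bits read as VPI, since GFC becomes part of that field.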

198 198 Segmentation and Reassembly ATM Adaptation Layer (AAL) –Application to ATM cell mapping –AAL header contains information for reassembly –AAL1, AAL2 for applications needing guaranteed rate –AAL3/4 designed for variable-length packet data –AAL5 is an alternative standard for packet data AAL ATM AAL ATM ……

199 199 ATM Layers ATM Adaptation Layer (AAL) –Convergence Sublayer (CS) supports different application service models –Segmentation and Reassembly (SAR) supports variable-length frames ATM Layer –Handles virtual circuits, cell header generation, flow control Physical layer –Transmission Convergence (TC) handles error detection, framing –Physical medium dependent (PMD) sublayer handles encoding ATM AAL CS SAR PHY TC PMD

200 200 AAL 3/4 Provides information to allow variable size packets to be sent in fixed-size ATM cells Convergence Sublayer Protocol Data Unit (CS-PDU) –CPI: Common Part Indicator (version field) –Btag/Etag: beginning and ending tags (same) –BAsize: hint on reassembly buffer space to allocate –Length: size of whole PDU Segmented into cells: header/trailer + 44-byte data CS-PDU fields (bits): CPI (8) | Btag (8) | BAsize (16) | Payload (< 64 KB) | Pad (0–24) | 0 (8) | Etag (8) | Length (16)

201 201 ATM Cell Format for AAL 3/4 Type (is-start? and is-end? bits) –BOM (10): Beginning Of Message –COM (00): Continuation Of Message –EOM (01): End Of Message –SSM (11): Single-Segment Message SEQ: Sequence Number (for cell loss/reordering) MID: multiplexing ID (mux onto virtual circuits) Length: number of bytes of PDU in this cell Cell fields (bits): ATM header (40) | Type (2) | SEQ (4) | MID (10) | Payload (352, i.e. 44 bytes) | Length (6) | CRC-10 (10)

202 202 Encapsulation and Segmentation for AAL3/4 44 bytes <44 bytes ATM header AAL header Cell payload AAL trailer Padding CS-PDU header User data CS-PDU trailer < 64 KB4-7 bytes 4 bytes

203 203 AAL 5 CS-PDU CS-PDU Format –Pad so trailer always falls at the end of ATM cell –Length: size of PDU (data only) –CRC-32 (detects missing or misordered cells) Cell Format –End-of-PDU bit in Type field of ATM header CS-PDU fields: Data (< 64 KB) | Pad (0–47 bytes) | Reserved (2 bytes) | Length (2 bytes) | CRC-32 (4 bytes)

204 204 Encapsulation and Segmentation for AAL 5 User data 48 bytes ATM headerCell payload Padding CS-PDU trailer

205 205 Virtual Paths with ATM Two level hierarchy of virtual connection: 8-bit VPI and 16-bit VCI –Switches in the public network use 8-bit VPI –Corporate sites use full 24-bit address (VPI + VCI) –Much less connection-state info in switches –Virtual path: fat pipe with bundle of virtual circuits

206 206 ATM as a LAN Backbone Different from traditional LANs, no native support for broadcast or multicast

207 207 Shared Ethernet Emulation with LANE All hosts think they are on the same Ethernet [Figure: hosts (H) attached to Ethernet switches, each connected through a LANE / Ethernet adaptor card to ATM switches]

208 208 ATM / LANE Protocol Layers Higher-layer protocols (IP, ARP,...) Signalling + LANE AAL5 ATM PHY ATM PHY Higher-layer protocols (IP, ARP,...) Signalling + LANE AAL5 ATM HostSwitchHost PHY Ethernet-like interface

209 209 Clients and Servers in LANE LAN Emulation Client (LEC) –Host, bridge, router or switch LAN Emulation Server (LES) –Maintains client’s MAC and ATM addresses –Maintains ATM address of BUS

210 210 Clients and Servers in LANE LAN Emulation Configuration Server (LECS) –High-level network management when LEC starts up –Reachable by preset VC (recall known server port#) –Maintains mapping of ATM address to LANE type

211 211 Clients and Servers in LANE Broadcast and Unknown Server (BUS) –Emulates broadcast and multicast; critical to LANE –Uses point-to-multipoint VC with all clients Servers physically located in one or more devices LECS

212 212 LANE Registration 1. Client contacts LECS on predefined VC, and sends ATM address to it 2. LECS returns LAN type, MTU and ATM address of LES 3. Client signals connection to LES, and registers MAC and ATM addresses with LES 4. LES returns ATM address of BUS 5. Client signals connection to BUS 6. BUS adds client to point-to-multipoint VC [Figure: ATM network with LECS, LES, BUS and hosts H1, H2, H3]

213 213 LANE Circuit Setup 1. Client (H1) knows destination MAC address of receiver (H2) 2. Client (H1) sends 1st packet to BUS 3. BUS sends address resolution request to LES 4. LES returns ATM address to client (H1) 5. Client (H1) signals connection to H2 for subsequent packets [Figure: ATM network with LECS, LES, BUS and hosts H1, H2, H3]

214 214 Contention in Switches Some packets destined for same output –One goes first –Others delayed or dropped Delaying packets requires buffering –Finite capacity, some packets must still drop –At inputs Increases/adds false contention Sometimes necessary –At outputs –Can also exert “backpressure”

215 215 Output Buffering 1x6 Switch x a Standard check-in lines Customer service trying to check-in you Mr. X writing complaint letter Mr. A waiting to claim refund of Rs.100

216 216 Input Buffering: Head-of-line Blocking 1x6 Switch x a Standard check-in lines Customer service trying to check-in you Mr. X writing complaint letter Mr. A waiting to claim refund of Rs.100 agents are standing by !

217 217 Backpressure Propagation delay requires that switch 2 exert backpressure at high-water mark rather than when buffer completely full It is thus typically only used in networks with small propagation delays (e.g. switch fabrics) Switch 1 Switch 2 “no more, please”

218 218 Switching Fabric Special-purpose (switching) hardware General problem –Connect N inputs to M outputs (NxM switch) –Often N=M (bidirectional links) Design goals –High throughput: want aggregate close to MIN (sum of inputs, sum of outputs) –Avoid contention (fabric faster than ports) –Good scalability: linear size/cost growth in N/M

219 219 Switch: Fabric and Ports Fabric has a job to deliver packets to the right output Input Port Input Port Input Port Input Port Output Port Output Port Output Port Output Port Fabric Switch fabric (with small internal buffering)

220 220 Ports and Fabric Ports deal with the complexity of the real world –Virtual circuit management is handled in ports –Determine output port using forwarding tables Input port is the first of the performance bottlenecks –Header processing and handing packet to fabric

221 221 Design Goals - Throughput An n x m switch can provide max ideal throughput of: $S = S_1 + S_2 + \dots + S_n$ –Only possible if traffic at inputs is evenly distributed across all outputs –Sustained throughput higher than link speed of output is not possible

222 222 Design Goals - Scalability Cost of hardware rises fast with increasing the number of ports n –Adding ports increases hardware & design complexity –Scalability in terms of rate of increase in cost Design complexity determines maximum switch size –Switch designs run into problems at some maximum number of inputs and outputs

223 223 Switch Performance Avoid contention with buffering –Use output buffering when possible –Apply backpressure through fabric –Input buffering with “peeking” (non-FIFO semantics) to reduce head-of-line blocking problems –Drop packets if input buffer overflows Good scalability –O(N) ports –Port design complexity O(N) gives $O(N^2)$ for switch –Port design complexity O(1) gives O(N) for switch

224 224 Crossbar (“Perfect”) Switch Problem: hardware scales as $O(N^2)$

225 225 Knockout Switch: Pick L from N Problem: what if more than L arrive? [Figure: 8-to-4 concentrator built from 2 × 2 random selectors and delay units (D), with inputs at the top and outputs 1–4]

226 226 Shared Memory Switch [Figure: inputs feed a mux into buffer memory (write control); a demux (read control) delivers to the outputs]

227 227 Self-Routing Fabrics Use source routing on “network” within switch Input port attaches output port number as header Fabric routes packet based on output port Types –Banyan Network –Batcher-Banyan Network –Sunshine Switch

228 228 Banyan Network Sends 0 bit up, 1 bit down MSB LSB
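
The self-routing rule on this slide (0 bit goes up, 1 bit goes down) can be illustrated by listing the decision made at each stage. This sketch assumes one stage per address bit, examined most significant bit first; `banyan_path` is a hypothetical helper and it ignores collisions between packets.

```python
def banyan_path(output_port: int, stages: int) -> list[str]:
    """Per-stage routing decisions for a packet headed to `output_port`."""
    if not 0 <= output_port < 2 ** stages:
        raise ValueError("output port does not fit in the given number of stages")
    bits = format(output_port, f"0{stages}b")    # MSB first, zero-padded
    return ["up" if bit == "0" else "down" for bit in bits]

# An 8 x 8 network has 3 stages; output 5 is 101 in binary.
print(banyan_path(5, 3))
```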

229 229 Batcher (Merge Sort) Network Routing packets through a Batcher network Batcher-Banyan Network –Attach the two back-to-back –Arbitrary unique permutations routed without contention

230 230 Batcher-Banyan Network Sends 1 bit up Sends 0 bit down Sends 0 bit up Sends 1 bit down

231 231 Sunshine Switch Like a Knockout switch Re-circulates overflow packets i.e. when more than L arrive in one cycle (marks overflow packets)

232 232 What we understand … Concepts of networking and network programming –Elements of networks: nodes and links –Building a packet abstraction on a link Transmission, and units of communication data –How to detect transmission errors in a frame after encoding and framing it –How to simulate a reliable channel (sliding window) –How to arbitrate access to shared media in any network Design issues of direct link networks –Functionality of network adaptors

233 233 We also understand … How switches may provide indirect connectivity –Different ways to move through a network (forwarding) –Bridge approach to extending LAN concept –Example of a real virtual circuit network (ATM) –How switches are built and contention within switches Next: let different networks “work together”

234 234 Internetworking Reading: Peterson and Davie, Ch. 4 Basics of Internetworking – Heterogeneity –The IP protocol, address resolution, control messages Dealing with simple heterogeneity issues –Defining a service model –Defining a global namespace –Structuring the namespace to simplify forwarding –Hiding variations in frame size limits

235 235 Internetworking Routing – moving forward with IP –Building forwarding information Dealing with global Internet scale –Virtual geography and addresses –Hierarchical routing –Name translation and lookup: translating between global and local (physical) names –Multicast traffic Future internetworking: IPv6

236 236 Internet Protocol (IP) Network protocol for the Internet Operates on all hosts and routers (routers connect distinct networks into the Internet) [Figure: protocol graph with FTP, HTTP, NV, TFTP, … above TCP and UDP, IP in the middle, and Ethernet, FDDI, ATM, … below]

237 237 IP Service Model Provided to transport layer (TCP, UDP) –Global name space –Host-to-host connectivity (connectionless) –“Best effort” packet delivery (datagram-based) No delivery guarantees on bandwidth, delay, etc. –Packet delayed for very long time –Packet lost –Packet delivered more than once –Packets delivered out of order Simplest model: ability of IP to “run over anything”

238 238 Internetwork Concatenation of networks [Figure: Network 1 (Ethernet), Network 2 (point-to-point), Network 3 (FDDI) and Network 4 (Ethernet) joined by routers R1, R2, R3, with hosts H1–H8] Protocol stack: H1 (TCP, IP, ETH) | R1 (IP over ETH and PPP) | R2 (IP over PPP and FDDI) | R3 (IP over FDDI and ETH) | H8 (TCP, IP, ETH)

239 239 IP Addresses –18.10.5.22: host in class A network (MIT) –130.126.143.254: host in class B network (UIUC) –192.12.70.111: host in class C network More recent classes –Multicast (class D): starts with 1110 –Future expansions (class E): starts with 1111
Class A: 0 | Network: 7 bits (126 nets) | Host: 24 bits (16 million hosts)
Class B: 10 | Network: 14 bits (16K nets) | Host: 16 bits (64K hosts)
Class C: 110 | Network: 21 bits (2 million nets) | Host: 8 bits (256)
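
Because each class is identified by its leading bits, the class of a dotted-decimal address can be read off its first byte. The helper below is an illustrative sketch; `address_class` is not a standard library function.

```python
def address_class(dotted: str) -> str:
    """Return the classful address class (A-E) from the leading bits."""
    first_byte = int(dotted.split(".")[0])
    if first_byte < 128:     # leading bit 0
        return "A"
    if first_byte < 192:     # leading bits 10
        return "B"
    if first_byte < 224:     # leading bits 110
        return "C"
    if first_byte < 240:     # leading bits 1110 (multicast)
        return "D"
    return "E"               # leading bits 1111 (reserved)

for addr in ("18.10.5.22", "130.126.143.254", "192.12.70.111"):
    print(addr, address_class(addr))
```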

240 240 Datagram Format 4-bit version (4 for IPv4, 6 for IPv6) 4-bit header length (in words, minimum of 5) 8-bit type of service (TOS) more or less unused 16-bit datagram length (in bytes) 8-bit protocol (e.g. TCP=6 or UDP=17)
Header layout (32-bit rows; field boundaries at bits 0, 4, 8, 16, 19, 31):
Version | HLen | TOS | Length
Ident | Flags | Offset
TTL | Protocol | Checksum
SourceAddr
DestinationAddr
Options (variable) | Pad (variable)
Data
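
A minimal sketch of decoding the fixed 20-byte part of this header with Python's `struct` module. Options are ignored and the checksum is not verified; `parse_ipv4_header` and the dictionary keys are names chosen for the example.

```python
import struct

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the first 20 bytes of an IPv4 header (network byte order)."""
    (ver_hlen, tos, length, ident, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_hlen >> 4,             # upper 4 bits of the first byte
        "hlen_words": ver_hlen & 0xF,         # header length in 32-bit words
        "tos": tos,
        "length": length,                     # whole datagram, in bytes
        "ident": ident,
        "flags": flags_offset >> 13,          # top 3 bits
        "offset": flags_offset & 0x1FFF,      # lower 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": protocol,                 # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hand-built example header (checksum left as 0 for simplicity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 7, 0x2000 | 64, 64, 6, 0,
                     bytes([18, 10, 5, 22]), bytes([192, 12, 70, 111]))
print(parse_ipv4_header(sample))
```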

241 241 Internet Protocol (IP) Service model: global addressing, host-to-host connectivity, best effort Overview of message transmission Host addressing and address translation Datagram forwarding Fragmentation and reassembly Error reporting/control messages Dynamic configuration Protocol extensions through tunneling Note: congestion control not handled by IP

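242 242 Fragmentation and Reassembly Example
Unfragmented packet: Start of header | Ident = x | more-fragments flag = 0 | Offset = 0 | Rest of header | 1400 data bytes
Fragment 1: Start of header | Ident = x | more-fragments flag = 1 | Offset = 0 | Rest of header | 512 data bytes
Fragment 2: Start of header | Ident = x | more-fragments flag = 1 | Offset = 64 | Rest of header | 512 data bytes
Fragment 3: Start of header | Ident = x | more-fragments flag = 0 | Offset = 128 | Rest of header | 376 data bytes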
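
The offsets in this example follow from counting fragment data in 8-byte units. The sketch below reproduces them, assuming a 20-byte header and an MTU of 532 bytes (the MTU is not stated on the slide; it is inferred from the 512-byte fragments). `fragment` is a hypothetical helper.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20) -> list[tuple]:
    """Return (offset in 8-byte units, more-fragments flag, data bytes) per fragment."""
    max_data = (mtu - header_len) // 8 * 8   # data per fragment, multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = 1 if sent + size < payload_len else 0   # 0 only on the last fragment
        fragments.append((sent // 8, more, size))
        sent += size
    return fragments

for offset, more, size in fragment(1400, 532):
    print(f"Offset = {offset}, M = {more}, {size} data bytes")
```

Every fragment keeps the same Ident value, which is what lets the receiver group them for reassembly.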

243 243 Datagram Forwarding
Network # | Netmask | Next hop / port
18.0.0.0 | 255.0.0.0 | 1
128.32.0.0 | 255.255.0.0 | 2
0.0.0.0 | 0.0.0.0 | 3
dest: 18.26.10.0: mask with 255.0.0.0 matched! send to port 1
dest: 128.16.14.0: mask with 255.0.0.0 not matched; mask with 255.255.0.0 not matched; mask with 0.0.0.0 matched! send to port 3
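
The lookup traced on this slide can be sketched as a scan over the table, masking the destination with each entry's netmask and comparing it with the network number. The table contents come from the slide; the first-match scan order and the names `TABLE`, `to_int` and `lookup` are assumptions for the example.

```python
import ipaddress

# (network number, netmask, next hop / port), in the order shown on the slide.
TABLE = [
    ("18.0.0.0", "255.0.0.0", 1),
    ("128.32.0.0", "255.255.0.0", 2),
    ("0.0.0.0", "0.0.0.0", 3),       # default route: matches everything
]

def to_int(dotted: str) -> int:
    """Convert a dotted-decimal IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def lookup(dest: str, table=TABLE):
    """Return the port of the first entry whose masked network matches."""
    dest_int = to_int(dest)
    for network, mask, port in table:
        if dest_int & to_int(mask) == to_int(network):
            return port
    return None

print(lookup("18.26.10.0"))    # matches the first entry
print(lookup("128.16.14.0"))   # falls through to the default route
```

Scanning in order works here because the entries are listed from most to least specific; a production router would instead pick the longest matching prefix.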

244 244 ARP Packet Format
Layout (32-bit rows; field boundaries at bits 0, 8, 16, 31):
Hardware type = 1 | ProtocolType = 0x0800
HLen = 48 | PLen = 32 | Operation
SourceHardwareAddr (bytes 0–3)
SourceHardwareAddr (bytes 4–5) | SourceProtocolAddr (bytes 0–1)
SourceProtocolAddr (bytes 2–3) | TargetHardwareAddr (bytes 0–1)
TargetHardwareAddr (bytes 2–5)
TargetProtocolAddr (bytes 0–3)

245 245 Internet Control Message Protocol (ICMP) IP companion protocol (not necessary) Handles error and control messages [Figure: protocol graph with FTP, HTTP, NV, TFTP, … above TCP and UDP, IP with ICMP alongside it, and Ethernet, FDDI, ATM, … below]

246 246 ICMP Message Sent to the source when a node is unable to process IP datagram successfully Error messages –Destination unreachable (protocol, port, or host) –Reassembly failed –IP Checksum failed; or invalid header –TTL exceeded (so datagrams don’t cycle forever) –Cannot fragment Control messages –Echo (ping) request and reply –Redirect (from router to source host, to change route)

247 247 Dynamic Host Configuration Protocol - DHCP DHCP server is required to provide configuration information to each host –Each host retrieves this information on bootup DHCP server can be configured manually, or it may allocate addresses on-demand –Addresses are “leased” for some period of time A host is not preconfigured with the DHCP server's address; it performs DHCP server discovery –A broadcast discovery message is sent by the host and a unicast reply is sent by the server

248 248 Virtual Private Networks - VPN Controlled connectivity –Restrict forwarding to authorized hosts Controlled capacity –Change router drop and priority policies –Provide guarantees on bandwidth, delay, etc. Virtual net replaces leased line with shared net Unwanted connectivity is prevented on this logical link using IP tunnel

249 249 IP Tunnel in VPNs Virtual point-to-point link between a pair of nodes separated by many networks
Network 1 → R1: [IP header, Destination = 2.x | IP payload]
R1 → Internetwork → R2 (10.0.0.1): [IP header, Destination = 10.0.0.1 | IP header, Destination = 2.x | IP payload]
R2 → Network 2: [IP header, Destination = 2.x | IP payload]
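
The encapsulation step can be modeled as wrapping one packet inside another. This is a toy model: `Packet`, `encapsulate`, and `decapsulate` are hypothetical names, and the destination "2.x" is kept as the literal label from the slide rather than a real address.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str        # destination address in the IP header
    payload: object  # application data, or a whole inner Packet


def encapsulate(packet, exit_addr):
    """Tunnel entry (R1): add an outer header addressed to the tunnel exit."""
    return Packet(dest=exit_addr, payload=packet)


def decapsulate(packet):
    """Tunnel exit (R2): strip the outer header and recover the inner packet."""
    return packet.payload


inner = Packet(dest="2.x", payload=b"data")
outer = encapsulate(inner, "10.0.0.1")
print(outer.dest, decapsulate(outer).dest)
```

Routers inside the internetwork see only the outer destination, 10.0.0.1; the inner header is opaque payload to them.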

250 250 IP Tunneling for Multicast Set up a tunnel between each pair of universities Multicast packets –Received by tunnel entry node –Encapsulated (another IP header added for tunnel exit) –Travel through the Internet (the tunnel) –Received by tunnel exit node –Unwrapped and delivered to another multicast-capable university campus

251 251 What is Routing ? Definition: task of constructing and maintaining forwarding information (in hosts or in switches) Goals for routing –Capture notion of “best” routes –Propagate changes effectively –Require limited information exchange –Admit efficient implementation Important notion: graph representation of network

252 252 Routing Overview Hierarchical routing infrastructure defines routing domains –Where all routers are under same administrative control Network as a Graph –Nodes are routers –Edges are links –Each link has a cost Problem: Find lowest cost path between two nodes –Maintain information about each link –Static: topology changes are not incorporated –Dynamic (or distributed): complex algorithms

253 253 Routing Outline Algorithms –Static shortest path algorithms Bellman-Ford: all pairs shortest paths to destination Dijkstra’s algorithm: single source shortest path –Distributed, dynamic routing algorithms Distance Vector routing (based on Bellman-Ford) Link State routing (Dijkstra’s algorithm at each node) Metrics (from ArpaNet, with informative names) –Original –New –Revised

254 254 Bellman-Ford Algorithm Static, centralized algorithm, (local iterations/destination) Requires: directed graph with edge weights (cost) Calculates: shortest paths for all directed pairs Check use of each node as successor in all paths For every node N –for each directed pair (B,C) is the path B → N → … → C better than B → C ? is cost B → N → destination smaller than previously known? For N nodes –Uses an NxN matrix of (distance, successor) values
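
A minimal sketch of the relaxation test "is cost B → N → destination smaller than previously known?" follows, for a single destination. Running it once per destination would fill the NxN matrix of (distance, successor) values. The function name, the sample graph `EDGES`, and the assumption of non-negative edge costs are all illustrative and not taken from the slide.

```python
import math


def bellman_ford(nodes, edges, destination):
    """Shortest-path cost and successor from every node to one destination.

    edges maps a directed pair (b, n) to the cost of the link b -> n.
    """
    dist = {node: math.inf for node in nodes}
    succ = {node: None for node in nodes}
    dist[destination] = 0
    # At most len(nodes) - 1 rounds are needed when no cycle has negative cost.
    for _ in range(len(nodes) - 1):
        changed = False
        for (b, n), cost in edges.items():
            # Is b -> n -> ... -> destination cheaper than b's known path?
            if cost + dist[n] < dist[b]:
                dist[b] = cost + dist[n]
                succ[b] = n
                changed = True
        if not changed:
            break
    return dist, succ


# Hypothetical four-node graph
EDGES = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5, ("C", "D"): 1}
dist, succ = bellman_ford(["A", "B", "C", "D"], EDGES, "D")
print(dist, succ)
```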

255 255 Dijkstra’s Algorithm Static, centralized algorithm, build tree from source Requires directed graph with edge weights (distance) Calculates: shortest paths from 1 node to all other Greedily grow set S of known minimum paths From node N –Start with S = {N} and one-hop paths from N –Loop n-1 times add closest outside node M to S for each node P not in S –is the path N → … → M → P better than N → P ?
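
The loop on this slide translates almost line for line into code. The sketch below keeps the simple "scan for the closest outside node" form rather than a priority queue; the function name and the sample graph `GRAPH` are hypothetical.

```python
import math


def dijkstra(graph, source):
    """Shortest distances from source; graph[u][v] is the cost of edge u -> v."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    # Start with S = {source} and the one-hop paths from the source.
    for neighbor, cost in graph[source].items():
        dist[neighbor] = cost
    known = {source}
    for _ in range(len(graph) - 1):
        # Add the closest outside node M to S.
        outside = [node for node in graph if node not in known]
        m = min(outside, key=lambda node: dist[node])
        known.add(m)
        # For each P not in S: is the path through M better than the known one?
        for p, cost in graph[m].items():
            if p not in known and dist[m] + cost < dist[p]:
                dist[p] = dist[m] + cost
    return dist


# Hypothetical directed graph
GRAPH = {
    "N": {"M": 1, "P": 5},
    "M": {"P": 2, "Q": 6},
    "P": {"Q": 1},
    "Q": {},
}
print(dijkstra(GRAPH, "N"))
```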

256 256 Distance Vector Routing Distributed, dynamic version of Bellman-Ford Each node maintains distance vector: set of triples –(Destination, Cost, NextHop) –Edge weights starting at a node assumed known by that node Exchange updates of distance vector ( Destination, Cost ) with directly connected neighbors (known as advertising the routes) –Periodically (on the order of several seconds to minutes) –Whenever vector changes (called triggered update)
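
One way to picture how a node processes a neighbor's advertisement is the merge step below. It is a sketch under assumptions: the table is held as a dict from destination to (cost, next hop), the name `merge_update` is hypothetical, and the rule that news from the current next hop is always accepted (even when the cost rises) is a common convention that the slide does not state.

```python
def merge_update(table, neighbor, link_cost, advertised):
    """Merge a neighbor's (Destination, Cost) advertisement into the table.

    table maps destination -> (cost, next_hop). Returns True if any entry
    changed, in which case the node would send a triggered update.
    """
    changed = False
    for dest, cost in advertised.items():
        new_entry = (link_cost + cost, neighbor)
        current = table.get(dest)
        better = current is None or new_entry[0] < current[0]
        # Assumed rule: always believe the neighbor we already route through.
        from_next_hop = current is not None and current[1] == neighbor
        if (better or from_next_hop) and current != new_entry:
            table[dest] = new_entry
            changed = True
    return changed


# Hypothetical table at node F before hearing from neighbor A (link cost 1)
table_f = {"F": (0, "F"), "A": (1, "A"), "G": (1, "G")}
changed = merge_update(table_f, "A", 1, {"B": 1, "C": 1, "G": 2})
print(changed, table_f)
```

Here F learns two-hop routes to B and C through A, while its direct one-hop route to G is kept because A's offer would cost 3.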

257 257 Distance Vector Routing Example Information in routing table of each node (Iteration 3): distance to reach node
At node   A  B  C  D  E  F  G
A         0  1  1  2  1  1  2
B         1  0  1  2  2  2  3
C         1  1  0  1  2  2  2
D         2  2  1  0  3  2  1
E         1  2  2  3  0  2  3
F         1  2  2  2  2  0  1
G         2  3  2  1  3  1  0

258 258 Distance Vector Routing: Link Failure F detects that link to G has failed F sets distance to G to infinity and sends update to A A sets distance to G to infinity since it uses F to reach G A receives periodic update from C with 2-hop path to G A sets distance to G to 3 and sends update to F F decides it can reach G in 4 hops via A

259 259 Count to Infinity Problem Link from A to E fails A advertises distance of infinity to E, but B and C advertise a distance of 2 to E ! B decides it can reach E in 3 hops; advertises this to all A decides it can reach E in 4 hops; advertises this to all C decides that it can reach E in 5 hops… We are counting to infinity …

260 260 Split Horizon Avoid counting to infinity by solving “mutual deception” problem When sending an update to node X, do not include destinations that you would route through X –If X thinks route is not through you, no effect –If X thinks route is through you, X will time out the route Figure: nodes A, B, C, D, with route entries for destination C labeled C : 1 : C, C : 2 : B, C : ∞ : -, C : 2 : B Loop of > 2 nodes fails split horizon !!!

261 261 Split Horizon with Poison Reverse When sending update to node X, include destinations that you would route through X with distance set to infinity Don’t need to wait for X to timeout
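
Both variants differ only in what a node puts into the update it builds for a particular neighbor X. The sketch below assumes the routing table is a dict from destination to (cost, next hop); the name `build_advertisement` and the sample table are hypothetical.

```python
import math


def build_advertisement(table, neighbor, poison_reverse=True):
    """Build the (destination -> cost) update to send to one neighbor.

    Routes whose next hop is that neighbor are advertised with infinite
    cost (poison reverse) or left out entirely (plain split horizon).
    """
    advert = {}
    for dest, (cost, next_hop) in table.items():
        if next_hop == neighbor:
            if poison_reverse:
                advert[dest] = math.inf
            # Plain split horizon: omit the destination.
        else:
            advert[dest] = cost
    return advert


# Hypothetical table at node B, which reaches E through A
table_b = {"A": (1, "A"), "C": (1, "C"), "E": (2, "A")}
print(build_advertisement(table_b, "A"))
print(build_advertisement(table_b, "A", poison_reverse=False))
```

With poison reverse, A immediately learns that B has no independent route to E, instead of waiting for a stale entry to time out.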

262 262 Link State Routing Distributed, dynamic form of Dijkstra’s algorithm Strategy –Send to all nodes (not just neighbors) information about directly connected nodes (not entire route table) in LSP Basic data structure: Link State Packet (LSP) –ID of the node that created the LSP –Cost of link to each directly connected neighbor: vector of (distance, successor) values –Sequence number (SEQNO) –Time-to-live (TTL) for this packet

263 263 Link State Routing Each node maintains a list of (ideally all) LSP’s –Runs Dijkstra’s algorithm on list –May discover its neighbors by “Hello” messages Information acquisition via reliable flooding –Create new LSP periodically; send to 1-hop neighbors Increment SEQNO (start SEQNO at 0 when reboot) –Store most recent (higher SEQNO) LSP from each node –Forward new LSP to all nodes but the one that sent it Decrement TTL of each LSP; discard when TTL=0 –Try to minimize routing traffic “overhead”
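
The flooding rules above (keep only the newest LSP per node, forward to every neighbor except the sender, decrement TTL and discard at zero) can be sketched as a single receive handler. The `LSP` class, its field names, and `receive_lsp` are hypothetical, and acknowledgements and retransmission, which make the flooding reliable, are left out.

```python
from dataclasses import dataclass


@dataclass
class LSP:
    node_id: str  # ID of the node that created the LSP
    seqno: int    # sequence number
    ttl: int      # time-to-live
    links: dict   # neighbor -> cost of the directly connected link


def receive_lsp(database, lsp, from_neighbor, neighbors):
    """Process an incoming LSP; return the neighbors it should be forwarded to."""
    ttl = lsp.ttl - 1
    if ttl <= 0:
        return []  # expired: discard
    stored = database.get(lsp.node_id)
    if stored is not None and stored.seqno >= lsp.seqno:
        return []  # duplicate or older copy: discard
    database[lsp.node_id] = LSP(lsp.node_id, lsp.seqno, ttl, dict(lsp.links))
    # Forward to all neighbors but the one that sent it.
    return [n for n in neighbors if n != from_neighbor]


db = {}
first = receive_lsp(db, LSP("A", 1, 8, {"B": 5}), "B", ["B", "C", "D"])
repeat = receive_lsp(db, LSP("A", 1, 8, {"B": 5}), "C", ["B", "C", "D"])
newer = receive_lsp(db, LSP("A", 2, 8, {"B": 7}), "C", ["B", "C", "D"])
print(first, repeat, newer)
```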

264 264 Route Calculation At node D (example graph: nodes D, A, B, C; link costs 5, 3, 2, 11, 10)
Step  Confirmed list                              Tentative list
1     (D,0,-)
2     (D,0,-)                                     (C,2,C), (B,11,B)
3     (D,0,-), (C,2,C)                            (B,11,B)
4     (D,0,-), (C,2,C)                            (B,5,C), (A,12,C)
5     (D,0,-), (C,2,C), (B,5,C)                   (A,12,C)
6     (D,0,-), (C,2,C), (B,5,C)                   (A,10,C)
7     (D,0,-), (C,2,C), (B,5,C), (A,10,C)
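
The confirmed/tentative bookkeeping in this trace can be sketched as follows. The slide lists the link costs but not which link carries which cost, so the assignment in `LSPS` (D–C 2, D–B 11, C–B 3, C–A 10, B–A 5) is an assumption inferred from the costs in the trace; the function name `forward_search` is also hypothetical. Entries are (cost, next hop), keyed by destination.

```python
def forward_search(lsps, me):
    """Compute (cost, next_hop) for every reachable node from the stored LSPs."""
    confirmed = {me: (0, None)}
    tentative = {}
    current = me
    while True:
        base_cost, base_hop = confirmed[current]
        # Examine the LSP of the node just confirmed.
        for neighbor, link_cost in lsps[current].items():
            if neighbor in confirmed:
                continue
            cost = base_cost + link_cost
            # The next hop is the first node on the path leaving `me`.
            hop = neighbor if current == me else base_hop
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, hop)
        if not tentative:
            return confirmed
        # Move the lowest-cost tentative entry to the confirmed list.
        current = min(tentative, key=lambda node: tentative[node][0])
        confirmed[current] = tentative.pop(current)


# Assumed topology, inferred from the trace (see lead-in)
LSPS = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 3, "D": 11},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 11, "C": 2},
}
routes = forward_search(LSPS, "D")
print(routes)
```

Under that assumed topology the result agrees with step 7 of the trace: C at cost 2, B at cost 5, and A at cost 10, all via next hop C.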

