
1 CSE 124 Networked Services, Fall 2009
B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa09/cse124
Some of these slides are adapted from various sources/individuals, including but not limited to the slides accompanying the textbook by Kurose and Ross, digital libraries such as the IEEE/ACM digital libraries, and slides from Prof. Vahdat. Use of these slides for anything other than pedagogical purposes in CSE 124 may require explicit permission from the respective sources.

2 Announcements
- Programming Assignment 1: submission window 23-26 October
- Week-3 homework: due 26 October
- First paper discussion: discussion on 29 October; write-up due 28 October
- Midterm: November 5

3 TCP Round Trip Time and Timeout
EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
- exponential weighted moving average
- influence of a past sample decreases exponentially fast
- typical value: alpha = 0.125

4 TCP Round Trip Time and Timeout
Setting the timeout: EstimatedRTT plus a "safety margin"
- larger variation in EstimatedRTT -> larger safety margin
First, estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|   (typically beta = 0.25)
Then set the timeout interval:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
TimeoutInterval is exponentially increased (doubled) with every retransmission.
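A minimal sketch of the estimator from slides 3-4 in Python. The update rules are exactly the slides' EWMA formulas; the class name, the initial values, and the sample trace are illustrative assumptions.

```python
class RttEstimator:
    """EWMA RTT estimator per slides 3-4 (alpha = 0.125, beta = 0.25)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha          # weight of the newest SampleRTT
        self.beta = beta            # weight of the newest deviation sample
        self.estimated_rtt = None   # EWMA of SampleRTT
        self.dev_rtt = 0.0          # EWMA of |SampleRTT - EstimatedRTT|

    def update(self, sample_rtt):
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt   # first sample seeds the EWMA
            return
        # Deviation uses the previous EstimatedRTT, then the mean is updated.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # EstimatedRTT plus a "safety margin" of four deviations
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
for sample in [0.100, 0.120, 0.095, 0.250, 0.110]:   # seconds (made-up trace)
    est.update(sample)
print(f"EstimatedRTT={est.estimated_rtt:.3f}s "
      f"TimeoutInterval={est.timeout_interval():.3f}s")
```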

5 Fast Retransmit
The timeout period is often relatively long:
- long delay before resending a lost packet
Detect lost segments via duplicate ACKs:
- a sender often sends many segments back-to-back
- if a segment is lost, there will likely be many duplicate ACKs for that segment
If the sender receives 3 duplicate ACKs for the same data, it assumes the segment after the ACKed data was lost:
- fast retransmit: resend the segment before the timer expires

6 (figure: timeline between Host A and Host B; segments x1-x5 are sent back-to-back, segment x2 is lost, repeated ACKs for x1 arrive, and after triple duplicate ACKs the sender resends segment x2 before the timeout expires)
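A sketch of the duplicate-ACK rule from slides 5-6: three duplicate ACKs for the same byte trigger a retransmission without waiting for the timer. The function names and the ACK trace are hypothetical.

```python
def make_dup_ack_detector(retransmit):
    """Return an ACK handler that fires fast retransmit on 3 duplicate ACKs."""
    last_ack = None
    dup_count = 0

    def on_ack(ack_num):
        nonlocal last_ack, dup_count
        if ack_num == last_ack:
            dup_count += 1
            if dup_count == 3:                # triple duplicate ACK
                retransmit(ack_num)           # resend segment starting here
        else:
            last_ack, dup_count = ack_num, 0  # new data ACKed; reset counter
    return on_ack


on_ack = make_dup_ack_detector(lambda seq: print(f"fast retransmit from seq {seq}"))
for ack in [1000, 2000, 2000, 2000, 2000]:    # segment starting at 2000 was lost
    on_ack(ack)
```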

7 TCP congestion control
- A TCP sender should transmit as fast as possible, but without congesting the network
  - Q: how to find the rate just below the congestion level?
- Decentralized: each TCP sender sets its own rate, based on implicit feedback:
  - ACK: segment received (a good thing!), network not congested, so increase the sending rate
  - lost segment: assume the loss is due to a congested network, so decrease the sending rate

8 TCP congestion control: bandwidth probing
- "Probing for bandwidth": increase the transmission rate on receipt of ACKs until loss eventually occurs, then decrease the transmission rate
  - continue to increase on ACK, decrease on loss (since available bandwidth changes with the other connections in the network)
- Q: how fast to increase/decrease? Details to follow.
(figure: sending rate over time; the rate climbs while ACKs are received and drops at each loss X, producing TCP's "sawtooth" behavior)

9 TCP Congestion Control: details
The sender limits its rate by limiting the number of unACKed bytes "in the pipeline":
  LastByteSent - LastByteAcked <= cwnd
- cwnd differs from rwnd (how, why?)
- the sender is limited by min(cwnd, rwnd)
Roughly, cwnd is dynamic, a function of perceived network congestion:
  rate ~ cwnd / RTT bytes/sec
(figure: cwnd bytes in flight over one RTT between the sender and the returning ACKs)
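A small sketch of slide 9's sender-side limit. The formula min(cwnd, rwnd) and rate ~ cwnd/RTT are from the slide; all of the concrete numbers are illustrative assumptions.

```python
cwnd = 8 * 1460          # congestion window (bytes), set by congestion control
rwnd = 64 * 1024         # receive window advertised by the receiver
rtt = 0.1                # round-trip time in seconds

last_byte_sent, last_byte_acked = 120_000, 110_000
in_flight = last_byte_sent - last_byte_acked      # unACKed bytes in pipeline

effective_window = min(cwnd, rwnd)                # sender limited by the min
can_send = max(0, effective_window - in_flight)   # bytes we may still inject
rate = cwnd / rtt                                 # approximate send rate, B/s

print(f"may send {can_send} more bytes; approx rate {rate:.0f} B/s")
```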

10 TCP Congestion Control: more details
Segment loss event: reduce cwnd
- timeout: no response from the receiver; cut cwnd to 1
- 3 duplicate ACKs: at least some segments are getting through (recall fast retransmit); cut cwnd in half, less aggressive than on timeout
ACK received: increase cwnd
- slow-start phase: increase exponentially fast (despite the name) at connection start, or following a timeout
- congestion avoidance: increase linearly

11 TCP Slow Start
When a connection begins, cwnd = 1 MSS
- example: MSS = 500 bytes and RTT = 200 msec gives an initial rate of 20 kbps
- available bandwidth may be >> MSS/RTT, so it is desirable to quickly ramp up to a respectable rate
Increase the rate exponentially until the first loss event or until the threshold is reached:
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
(figure: Host A sends one segment, then two, then four, doubling each RTT)

12 (figure: TCP slow (exponential) start; cwnd doubles each RTT)
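The doubling in slides 11-12 falls directly out of the per-ACK rule. A minimal sketch, assuming lossless delivery and that ssthresh has not yet been reached; the round count is arbitrary.

```python
MSS = 500                      # bytes, as in slide 11's example
cwnd = 1 * MSS                 # slow start begins at one MSS

for rtt_round in range(5):
    segments = cwnd // MSS
    print(f"RTT {rtt_round}: cwnd = {segments} segments")
    for _ in range(segments):  # one ACK returns for each segment in the window
        cwnd += MSS            # +1 MSS per ACK => cwnd doubles every RTT
# prints 1, 2, 4, 8, 16 segments
```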

13 Transitioning into/out of slow start
ssthresh: a cwnd threshold maintained by TCP
- on a loss event, set ssthresh to cwnd/2: remember (half of) the TCP rate when congestion last occurred
- when cwnd >= ssthresh: transition from slow start to the congestion avoidance phase
(figure: the slow-start state, with initial values cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; on timeout, set ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, and retransmit the missing segment; on a new ACK, cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed; on a duplicate ACK, dupACKcount++; when cwnd > ssthresh, move to congestion avoidance; the full FSM appears on slide 15)

14 TCP: congestion avoidance
When cwnd > ssthresh, grow cwnd linearly:
- increase cwnd by 1 MSS per RTT; approach possible congestion more slowly than in slow start
- implementation: cwnd = cwnd + MSS * (MSS/cwnd) for each ACK received
AIMD: Additive Increase, Multiplicative Decrease
- ACKs: increase cwnd by 1 MSS per RTT: additive increase
- loss: cut cwnd in half (for non-timeout-detected loss): multiplicative decrease
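A compact sketch of slide 14's AIMD rules. The per-ACK increment and the halving are from the slide; the handler names, the starting window, and the demo loop are assumptions.

```python
MSS = 1460.0
cwnd = 10 * MSS                  # starting window, for illustration

def on_ack():
    """Additive increase: ~1 MSS total per window's worth of ACKs."""
    global cwnd
    cwnd += MSS * (MSS / cwnd)

def on_triple_dup_ack():
    """Multiplicative decrease: halve cwnd, with a floor of 1 MSS."""
    global cwnd
    cwnd = max(MSS, cwnd / 2)

for _ in range(10):              # roughly one RTT's worth of ACKs
    on_ack()
on_triple_dup_ack()
print(f"cwnd = {cwnd / MSS:.2f} MSS")
```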

15 TCP congestion control FSM: details
States: slow start, congestion avoidance, fast recovery. Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
From slow start or congestion avoidance:
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
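The transitions above can be condensed into a small state machine. A sketch following slide 15's actions; retransmission itself is omitted, and the class shape and integer units are assumptions.

```python
MSS = 1460

class RenoFsm:
    """Condensed Reno FSM: slow start / congestion avoidance / fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate, resume avoidance
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:                                      # congestion avoidance
            self.cwnd += MSS * MSS // self.cwnd    # ~1 MSS per RTT

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                     # fast retransmit trigger
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"
```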

16 Popular "flavors" of TCP
(figure: window size (cwnd, in segments) vs. transmission round for TCP Tahoe and TCP Reno, with ssthresh marked)

17 Summary: TCP Congestion Control
- when cwnd < ssthresh, the sender is in the slow-start phase; the window grows exponentially
- when cwnd >= ssthresh, the sender is in the congestion-avoidance phase; the window grows linearly
- on a triple duplicate ACK, ssthresh is set to cwnd/2 and cwnd is set to ~ssthresh
- on a timeout, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS

18 Simplified TCP throughput
What is the average throughput of TCP as a function of window size and RTT (ignoring slow start)?
Let W be the window size when loss occurs.
- when the window is W, throughput is W/RTT
- just after a loss, the window drops to W/2 and throughput to W/(2 RTT)
- average throughput: 0.75 W/RTT

19 TCP throughput as a function of loss rate
Assume 1 packet is lost per cycle, a cycle being the window's growth from W/2 back up to W. The number of packets sent per cycle is roughly (3/8) W^2, so the loss rate L is
  L = 1 / ((3/8) W^2) = 8 / (3 W^2)
Since W = sqrt(8 / (3 L)), substituting into the average throughput 0.75 W/RTT gives
  Throughput = 0.75 W / RTT = 1.22 MSS / (RTT sqrt(L))

20 TCP Futures: TCP over "long, fat pipes"
- example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- requires a window size of W = 83,333 in-flight segments
- throughput in terms of loss rate: Throughput = 1.22 MSS / (RTT sqrt(L))
- required packet loss rate: L = 2x10^-10
- existing TCP may not scale well to future networks; new versions of TCP are needed for high speed
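Slide 20's numbers follow from slide 19's formula. A quick check in Python; only the variable names are assumptions.

```python
mss_bits = 1500 * 8              # 1500-byte segments, in bits
rtt = 0.100                      # 100 ms round-trip time, in seconds
target = 10e9                    # desired throughput: 10 Gbps

w = target * rtt / mss_bits      # in-flight segments needed to fill the pipe
# Invert Throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L:
loss = (1.22 * mss_bits / (rtt * target)) ** 2

print(f"window W ~= {w:,.0f} segments")       # ~83,333
print(f"required loss rate L ~= {loss:.1e}")  # ~2e-10
```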

21 TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.
(figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R)

22 Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease reduces throughput proportionally
(figure: connection 2 throughput vs. connection 1 throughput, with capacity R on each axis; alternating additive-increase segments and halving on loss move the operating point toward the equal-bandwidth-share line)
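Slide 22's geometric argument can be reproduced numerically: halving shrinks the gap between two flows while equal additive steps preserve it, so the flows converge toward R/2 each. A simulation sketch; the capacity, starting rates, step size, and iteration count are all illustrative assumptions.

```python
R = 100.0                      # link capacity (arbitrary units)
x1, x2 = 70.0, 10.0            # deliberately unequal starting throughputs

for step in range(200):
    if x1 + x2 > R:            # loss: both halve (multiplicative decrease)
        x1, x2 = x1 / 2, x2 / 2
    else:                      # no loss: both add equally (additive increase)
        x1, x2 = x1 + 1, x2 + 1

print(f"flow 1 ~ {x1:.1f}, flow 2 ~ {x2:.1f}")   # both approach R/2
```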

23 Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- instead they use UDP: pump audio/video at a constant rate and tolerate packet loss
Fairness and parallel TCP connections:
- nothing prevents an app from opening parallel connections between 2 hosts; web browsers do this
- example: a link of rate R supporting 9 connections
  - a new app asking for 1 TCP connection gets rate R/10
  - a new app asking for 11 TCP connections gets R/2!

24 (figures: bandwidth sharing with TCP; two TCP flows sharing a link, and TCP and UDP flows sharing a link)

25 Networks vs. Processors
Network speeds: 100 Mbps to 1 Gbps to 10 Gbps
Network protocol stack throughput:
- is good for only 100 Mbps
- with fine-tuning, OK for 1 Gbps
- what about 10 Gbps?
Example: payload size 1460 B, 2-3 GHz processor
- receive throughput achieved: 750 Mbps
- transmit throughput achieved: 1 Gbps
Radical solutions are needed to support 10 Gbps and beyond.

26 Where is the overhead?
TCP was suspected of being too complex
- in 1989, Clark, Jacobson, and others proved otherwise
The complexity (overhead) lies in the computing environment where TCP operates:
- interrupts
- OS scheduling
- buffering
- data movement
Simple solutions that improve performance:
- Interrupt moderation: the NIC waits for multiple packets and notifies the processor once, amortizing the high cost of interrupts
- Checksum offload: checksum calculation on the processor is costly; offload it to the NIC (in hardware)
- Large segment offload: segmenting large chunks of data into smaller segments is expensive; offload segmentation and TCP/IP header preparation to the NIC; useful for sender-side TCP
These techniques can support up to ~1 Gbps PHYs.
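A back-of-the-envelope sketch of why interrupt moderation helps: one interrupt amortized across a batch of packets cuts per-packet overhead. The cycle costs below are invented for illustration, not measured values from the lecture.

```python
interrupt_cost = 20_000      # CPU cycles per interrupt (assumed)
per_packet_cost = 3_000      # cycles of protocol processing per packet (assumed)

for batch in (1, 4, 16, 64):
    # Each interrupt's fixed cost is shared by `batch` coalesced packets.
    total = interrupt_cost / batch + per_packet_cost
    print(f"coalescing {batch:3d} packets/interrupt: {total:,.0f} cycles/packet")
```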

27 Challenges in detail
OS issues: interrupts
- interrupt moderation, polling, hybrid interrupts
Memory
- latency: memory is slower than the processor
- poor cache locality: new data entering from the NIC or the application means cache misses and CPU stalls are common
Buffering and copying: usually two copies are required (application to TCP, and TCP to NIC)
- receive side: the copy can be reduced to one if posted buffers are provided by the application, but mostly two copies are required
- transmit side: zero-copy transmit (DMA from application to NIC) can help; implemented on selected systems

28 TCP/IP Acceleration Methods
Three main strategies: TCP Offload Engine (TOE), TCP onloading, and stack and NIC enhancements.
TCP Offload Engine:
- offloads TCP/IP processing to devices attached to the server's I/O subsystem
- uses separate processing and memory resources
- Pros:
  - improves throughput and utilization
  - useful for bulk data transfer such as IP storage
  - good for a few connections over high-bandwidth links
- Cons:
  - may not scale well to a large number of connections
  - needs special processors (expensive)
  - needs a lot of memory on the NIC (expensive)
  - store-and-forward in a TOE is suitable only for large transfers, since latency between the I/O subsystem and main memory is high
  - expensive TOEs or NICs are required
(figure: processor and cache memory, with the TCP Offload Engine on the NIC device)

29 TCP onloading
Dedicate TCP/IP processing to one or more general-purpose cores:
- high performance and cheap; main-memory-to-CPU latency is small
- extensible: programming tools and implementations exist; good for long-term performance
- scalable: good for a large number of flows
(figure: cores 0-2 run applications while core 3 is dedicated to TCP/IP processing (onloading), sharing cache memory and the NIC device)

30 Stack and NIC enhancements
Asynchronous I/O
- asynchronous callbacks on data arrival
- buffers pre-posted by the application to avoid copying
Header splitting
- split headers and data, enabling better data pre-fetching
- the NIC can place the header separately
Receive-side scaling (sketched below)
- use multiple cores to achieve connection-level parallelism
- have multiple queues in the NIC
- map each queue to a different processor core
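A minimal sketch of the receive-side scaling idea: hash each flow's 4-tuple to pick a receive queue, so all packets of one flow land on the same core. Real NICs use a Toeplitz hash over the tuple; Python's built-in hash() merely stands in here, and the queue count and addresses are assumptions.

```python
NUM_QUEUES = 4   # one NIC receive queue per core (assumed)

def rx_queue(src_ip, src_port, dst_ip, dst_port):
    """Map a flow's 4-tuple to a receive queue index (stand-in hash)."""
    return hash((src_ip, src_port, dst_ip, dst_port)) % NUM_QUEUES

q = rx_queue("10.0.0.5", 51320, "10.0.0.1", 80)
print(f"flow mapped to queue/core {q}")
```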

31 Summary
Reading assignment:
- TCP from Chapter 3 in Kurose and Ross
- TCP from Chapter 5 in Peterson and Davie
Homework:
- Problems P37 and P43 (pages 306-308) from Kurose and Ross
- deadline: 30 October 2009

