
1 CS551: Queue Management
Christos Papadopoulos (http://netweb.usc.edu/cs551/)

2 Congestion Control vs. Resource Allocation
The network's key role is to allocate its transmission resources to users and applications. Two sides of the same coin:
– let the network do resource allocation (e.g., VCs)
  - allocation of distributed resources is difficult and can be wasteful of resources
– let sources send as much data as they want and recover from congestion when it occurs
  - easier to implement, but may lose packets

3 Connectionless Flows
How can a connectionless network allocate anything to a user? It doesn't know about users or applications.
Flow:
– a sequence of packets between the same source–destination pair, following the same route
A flow is visible to routers; it is not a channel, which is an end-to-end abstraction.
Routers may maintain soft state for a flow.
A flow can be implicitly defined or explicitly established (similar to a VC)
– different from a VC in that the route is not fixed

4 Taxonomy
Router-centric vs. host-centric
– router-centric: address the problem from inside the network; routers decide what to forward and what to drop
  (a variant not captured in the taxonomy: adaptive routing!)
– host-centric: address the problem at the edges; hosts observe network conditions and adjust behavior
– not always a clear separation: hosts and routers may collaborate, e.g., routers advise hosts

5 ..Taxonomy..
Reservation-based vs. feedback-based
– reservations: hosts ask for resources, the network responds yes/no; implies router-centric allocation
– feedback: hosts send with no reservation and adjust according to feedback; either router- or host-centric: explicit (e.g., ICMP source quench) or implicit (e.g., loss) feedback

6 ..Taxonomy
Window-based vs. rate-based
Both tell the sender how much data to transmit.
Window: TCP flow/congestion control
– flow control: advertised window
– congestion control: cwnd
Rate: still an open area of research
– may be the logical choice for a reservation-based system

7 Service Models
In practice, fewer than the eight (2 × 2 × 2) combinations from the taxonomy occur.
Best-effort networks
– mostly host-centric, feedback-based, window-based
– TCP as an example
Networks with flexible Quality of Service
– router-centric, reservation-based, rate-based

8 Queuing Disciplines
Each router MUST implement some queuing discipline, regardless of the resource allocation mechanism.
Queuing allocates bandwidth, buffer space, and promptness:
– bandwidth: which packets get transmitted
– buffer space: which packets get dropped
– promptness: when packets get transmitted

9 FIFO Queuing
FIFO: first-in-first-out (or FCFS: first-come-first-served)
Arriving packets are dropped when the queue is full, regardless of flow or importance; this implies drop-tail.
Important distinction:
– FIFO: scheduling discipline (which packet to serve next)
– drop-tail: drop policy (which packet to drop next)

10 Dimensions
[Diagram: design dimensions of queuing disciplines — scheduling (per-connection state vs. single class; class-based queuing vs. FIFO) and drop policy (drop position: head, tail, or random location; drop time: early drop vs. overflow drop)]

11 ..FIFO
FIFO + drop-tail is the simplest queuing algorithm
– used widely in the Internet
Leaves the responsibility for congestion control to the edges (e.g., TCP).
FIFO lets a large user get more data through, while sharing the resulting congestion with others
– does not provide isolation between different flows
– no policing

12 Fair Queuing
Demers90

13 Fair Queuing
Main idea:
– maintain a separate queue for each flow currently flowing through the router
– the router services the queues in round-robin fashion
Changes the interaction between packets from different flows
– provides isolation between flows
– ill-behaved flows cannot starve well-behaved flows
– allocates buffer space and bandwidth fairly
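
The per-flow round-robin idea above can be sketched as a minimal simulation (the flow-id keys and packet values are illustrative assumptions, not from the slides):

```python
from collections import OrderedDict, deque

class RoundRobinFQ:
    """Packet-by-packet round-robin over per-flow queues (naive FQ)."""

    def __init__(self):
        self.queues = OrderedDict()  # flow id -> deque of packets

    def enqueue(self, flow_id, packet):
        # One queue per flow; created on first packet (soft state).
        self.queues.setdefault(flow_id, deque()).append(packet)

    def dequeue(self):
        # Serve flows in round-robin order: take one packet from the
        # first non-empty queue, then rotate that flow to the back.
        for flow_id in list(self.queues):
            q = self.queues[flow_id]
            if q:
                pkt = q.popleft()
                self.queues.move_to_end(flow_id)  # rotate to the back
                return flow_id, pkt
        return None
```

Even if one flow enqueues many packets, each round gives every backlogged flow one turn, so a heavy flow cannot starve the others.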

14 FQ Illustration
[Diagram: flows 1 through n, each with its own input queue, served round-robin onto a single output port]
Variation: Weighted Fair Queuing (WFQ)

15 Some Issues
What constitutes a user?
– there are several granularities at which one can define flows
– for now, assume the granularity of a source–destination pair; this assumption is not critical
Packets are of different lengths
– a source sending longer packets can still grab more than its share of resources
– we really need bit-by-bit round-robin
– Fair Queuing simulates bit-by-bit RR: it is not feasible to interleave bits!

16 Bit-by-bit RR
The router maintains a local clock.
Single flow: suppose the clock ticks each time a bit is transmitted. For packet i:
– P_i: length, A_i: arrival time, S_i: begin-transmit time, F_i: finish-transmit time
– F_i = S_i + P_i
– F_i = max(F_{i-1}, A_i) + P_i
Multiple flows: the clock ticks each time one bit from every active flow has been transmitted.

17 Fair Queuing
While we cannot actually perform bit-by-bit interleaving, we can compute F_i for each packet, then use F_i to schedule packets
– transmit the packet with the earliest F_i first
Still not completely fair
– but the difference is now bounded by the size of the largest packet
– compare with the previous approach
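
The finish-time scheduling idea can be sketched as a small simulation (variable names follow the slides; the packet tuple format is an assumption, and arrival times are used directly as the virtual clock rather than the full bit-by-bit round-robin clock of the paper):

```python
import heapq

def fair_queue_order(flows):
    """flows: dict flow_id -> list of (arrival_time, length) packets,
    each list sorted by arrival time.

    Computes the virtual finish time F_i = max(F_{i-1}, A_i) + P_i
    per flow, then returns (flow_id, F_i) pairs in transmit order:
    earliest finish time first.
    """
    heap = []
    for flow_id, packets in flows.items():
        f_prev = 0.0
        for arrival, length in packets:
            f_i = max(f_prev, arrival) + length
            heapq.heappush(heap, (f_i, flow_id))
            f_prev = f_i
    order = []
    while heap:
        f_i, flow_id = heapq.heappop(heap)
        order.append((flow_id, f_i))
    return order
```

Note that a real router must also respect non-preemption: if a packet with a smaller F_i arrives while another packet is on the wire, it waits until the current transmission finishes.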

18 Fair Queuing Example
[Diagram: a packet with F = 10 arrives on Flow 1 while Flow 2 is transmitting packets with F = 2, F = 5, F = 8; the output serves packets in finish-time order]
A packet currently being transmitted cannot be preempted.

19 Delay Allocation
Aim: give less delay to those using less than their fair share.
Advance the finish times for sources whose queues drain temporarily:
– B_i = P_i + max(F_{i-1}, A_i − δ)
Schedule the packet with the earliest B_i first.

20 Allocate Promptness
B_i = P_i + max(F_{i-1}, A_i − δ)
δ gives added promptness:
– if A_i < F_{i-1}, the conversation is active and δ does not affect it: B_i = P_i + F_{i-1}
– if A_i > F_{i-1}, the conversation has been inactive and δ determines how much history to take into account
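
A tiny numeric sketch of the δ term (all values below are invented for illustration):

```python
def bid(p_i, f_prev, a_i, delta):
    """Bid number B_i = P_i + max(F_{i-1}, A_i - delta)."""
    return p_i + max(f_prev, a_i - delta)

# Active conversation: A_i < F_{i-1}, so delta is irrelevant.
active = bid(p_i=100, f_prev=500, a_i=400, delta=50)   # 100 + 500 = 600

# Inactive conversation: A_i > F_{i-1}; delta pulls the bid earlier,
# rewarding a flow that recently used less than its share.
idle = bid(p_i=100, f_prev=500, a_i=700, delta=50)     # 100 + 650 = 750
```

With δ = 0 the idle flow's bid would be 800; a larger δ gives a returning flow more credit for its idle period, up to the point where F_{i-1} dominates again.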

21 Notes on FQ
FQ is a scheduling policy, not a drop policy.
Still achieves statistical multiplexing: one flow can fill the entire pipe if there are no contenders
– FQ is work-conserving
WFQ is a possible variation
– need to learn the weights offline; the default is one bit per flow per round, but sending more bits is possible

22 More Notes on FQ
The router does not send explicit feedback to the source; end-to-end congestion control is still needed
– FQ isolates ill-behaved users by forcing them to share overload with themselves
– "user": a flow, a transport protocol, etc.
The optimal behavior for a source is to keep one packet in the queue.
But maintaining per-flow state can be expensive
– flow aggregation is a possibility

23 Congestion Avoidance
TCP's approach is reactive:
– detect congestion after it happens
– increase load, trying to maximize utilization, until loss occurs
– TCP has a congestion avoidance phase, but that's different from what we're talking about here
Alternatively, we can be proactive:
– try to predict congestion and reduce the rate before loss occurs
– this is called congestion avoidance

24 Router Congestion Notification
Routers are well-positioned to detect congestion
– a router has a unified view of queuing behavior
– routers can distinguish between propagation delays and persistent queuing delays
– routers can decide on transient congestion based on workload
Hosts are limited in their ability to infer these from observed behavior.

25 Router Mechanisms
Congestion notification
– the DEC-bit scheme: explicit congestion feedback to the source
– Random Early Detection (RED): implicit congestion feedback to the source; well suited for TCP

26 Design Choices for Feedback
What kind of feedback?
– separate packets (source quench)
– mark packets; the receiver propagates the marks in ACKs
When to generate feedback?
– based on router utilization: but you can be near 100% utilization without seeing throughput degradation
– based on queue lengths: but which queue lengths (instantaneous, average)?

27 A Binary Feedback Scheme for Congestion Control in Computer Networks (DEC-bit) Ramakrishnan90

28 The DEC-bit Scheme

29 Basic Ideas
– on congestion, the router sets a congestion indication (CI) bit on the packet
– the receiver relays the bit to the sender in acknowledgements
– the sender uses the feedback to adjust its sending rate
Key design questions:
– router: feedback policy (how and when does a router generate feedback?)
– source: signal filtering (how does the sender respond?)

30 Why Queue Lengths?
It is desirable to implement FIFO
– fast implementations are possible
– shares delay among connections
– gives low delay during bursts
FIFO queue length is then a natural choice for detecting the onset of congestion.

31 The Use of Hysteresis
If we use queue lengths, at what queue length should we generate feedback?
– a simple threshold, or hysteresis?
Surprisingly, simulations showed that to maximize power you should
– use no hysteresis
– use an average queue length threshold of 1
Power = throughput / delay

32 Computing Average Queue Lengths
Possibilities:
– instantaneous: premature, unfair
– averaged over a fixed time window, or an exponential average: can be unfair if the time window differs from the round-trip time
Solution:
– adaptive queue length estimation over busy/idle cycles
– but need to account for a long current busy period

33 Sender Behavior
How often should the source change its window?
In response to what received information should it change its window?
By how much should the source change its window?
– we already know the answer to this: AIMD
– the DEC-bit scheme uses a multiplicative decrease factor of 0.875

34 How Often to Change the Window?
Not on every ACK received
– the window size would oscillate dramatically, because it takes time for a window change's effects to be felt
– if the window changes to W, it takes (W + 1) packets for feedback about that window to be received
Correct policy: wait for (W + W') ACKs
– where W is the window size before the update and W' is the size after the update

35 Using Received Information
Use the CI bits from the last W' ACKs to decide whether congestion still persists.
Clearly, if some fraction of the bits are set, then congestion exists. What fraction?
– depends on the policy used to set the threshold
– when the queue size threshold is 1, the cutoff fraction should be 0.5
– this has the nice property that the resulting power is relatively insensitive to the choice

36 Changing the Sender's Window
Sender policy:
– monitor the packets within a window
– if less than 50% of them had the CI bit set, increase the window by 1
– otherwise, new window = window × 0.875
– additive increase, multiplicative decrease for stability
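
The sender policy above can be sketched directly (a minimal model; the bookkeeping for waiting W + W' ACKs between updates is omitted):

```python
def decbit_update(window, ci_bits):
    """DEC-bit sender window update.

    window:  current window size (packets)
    ci_bits: booleans, the CI bit from each ACK in the monitored window

    Additive increase by 1 if fewer than half the packets were marked,
    else multiplicative decrease by a factor of 0.875.
    """
    marked_fraction = sum(ci_bits) / len(ci_bits)
    if marked_fraction < 0.5:
        return window + 1       # additive increase
    return window * 0.875       # multiplicative decrease
```

Note that a marked fraction of exactly 0.5 counts as congestion, matching the 0.5 cutoff on slide 35.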

37 DEC-bit Evaluation
Relatively easy to implement; no per-connection state; stable.
Assumes cooperative sources.
Conservative window increase policy.
Some analytical intuition to guide the design
– but most design parameters were determined by extensive simulation

38 Random Early Detection (RED) Floyd93

39 Random Early Detection (RED)
Motivation:
– high bandwidth-delay flows need large queues to accommodate transient congestion
– TCP detects congestion from loss, i.e., after queues have already built up and increased delay
Aim:
– keep throughput high and delay low
– accommodate bursts

40 Why Active Queue Management? (RFC 2309)
Lock-out problem:
– drop-tail allows a few flows to monopolize the queue space, locking out other flows (due to synchronization)
Full-queues problem:
– drop-tail maintains full or nearly-full queues during congestion; but queue limits should reflect the size of the bursts we want to absorb, not steady-state queuing

41 Other Options
Random drop:
– a packet arriving at a full queue causes some random packet to be dropped
Drop front:
– on a full queue, drop the packet at the head of the queue
Random drop and drop front solve the lock-out problem but not the full-queues problem.

42 Solving the Full Queues Problem
Drop packets before the queue becomes full (early drop).
Intuition: notify senders of incipient congestion
– example: early random drop (ERD): if qlen > drop_level, drop each new packet with fixed probability p
– does not control misbehaving users

43 Differences With DEC-bit
Random marking/dropping of packets; exponentially weighted queue lengths; senders react to a single packet.
Rationale:
– exponential weighting is better for high-bandwidth connections
– no bias when the weighting interval differs from the round-trip time, since packets are marked randomly
– random marking avoids bias against bursty traffic

44 RED Goals
Detect incipient congestion; allow bursts.
Keep power (throughput/delay) high
– keep the average queue size low
– assume hosts respond to lost packets
Avoid window synchronization
– randomly mark packets
Avoid bias against bursty traffic.
Some protection against ill-behaved users.

45 RED Operation
[Diagram: drop probability vs. average queue length — P(drop) is 0 below min_thresh, rises linearly to max_p at max_thresh, then jumps to 1.0]

46 Queue Estimation
Standard EWMA: avg ← (1 − w_q) · avg + w_q · qlen
Upper bound on w_q depends on min_th
– want to set w_q to allow a certain burst size
Lower bound on w_q so that congestion is detected relatively quickly
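
The EWMA update is one line of code; the weight below is only an illustration of why the bounds matter (the value 0.002 is an assumption, not from the slides):

```python
def ewma(avg, qlen, w_q):
    """Exponentially weighted moving average of the queue length:
    avg <- (1 - w_q) * avg + w_q * qlen."""
    return (1 - w_q) * avg + w_q * qlen

# With a very small weight, even a sustained queue of 50 packets
# barely moves the average after 10 samples -- which is why w_q
# needs a lower bound if congestion is to be detected quickly.
avg = 0.0
for qlen in [50] * 10:
    avg = ewma(avg, qlen, w_q=0.002)
```

Conversely, a large w_q tracks instantaneous bursts too closely and defeats the purpose of averaging, which is where the min_th-dependent upper bound comes from.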

47 Thresholds
min_th is determined by the utilization requirement
– needs to be high for fairly bursty traffic
max_th is set to twice min_th
– a rule of thumb
– the difference must be larger than the queue size increase in one RTT, so it depends on the bandwidth

48 Packet Marking
Marking probability based on queue length:
– P_b = max_p · (avg − min_th) / (max_th − min_th)
Marking based on P_b alone can lead to clustered marking → global synchronization.
Better to bias P_b by the count of packets since the last mark:
– P_a = P_b / (1 − count · P_b)
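
A sketch of the two-step marking computation (the threshold and max_p values in the tests are invented for illustration):

```python
import random

def red_mark_prob(avg, min_th, max_th, max_p):
    """Base marking probability P_b: a linear ramp from 0 at min_th
    to max_p at max_th, then 1.0 above max_th."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)

def should_mark(p_b, count):
    """Bias P_b by the count of packets since the last mark:
    P_a = P_b / (1 - count * P_b).  This spaces marks more uniformly
    over time, avoiding clusters and hence global synchronization."""
    p_a = p_b / (1 - count * p_b) if count * p_b < 1 else 1.0
    return random.random() < p_a
```

As count grows, P_a climbs toward 1, so a mark becomes certain after roughly 1/P_b unmarked packets.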

49 RED Algorithm

50 RED Variants
FRED: Fair Random Early Drop (SIGCOMM 1997)
– maintains per-flow state only for active flows (ones having packets in the buffer)
CHOKe: CHOose and Keep/Kill (INFOCOM 2000)
– compare each new packet with a random packet in the queue
– if they are from the same flow, drop both
– if not, use RED to decide the fate of the new packet

51 Extending RED for Flow Isolation
Problem: what to do with non-cooperative flows?
Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers.
Pricing can have a similar effect
– but requires much infrastructure to be developed
How can we isolate unresponsive flows without per-flow state?

52 RED Penalty Box
With RED, monitor the packet-drop history and identify flows that use a disproportionate share of the bandwidth.
Isolate and punish those flows.

53 Flows That Must Be Regulated
Unresponsive:
– fail to reduce load in response to increased loss
Not TCP-friendly:
– long-term usage exceeds that of TCP under the same conditions
Using disproportionate bandwidth:
– use disproportionately more bandwidth than other flows during congestion
Assumption: we can monitor a flow's arrival rate.

54 Identifying Flows to Regulate
Not TCP-friendly: use a TCP model
– TCP throughput ≤ (1.5 · sqrt(2/3) · B) / (RTT · sqrt(p))
– B: packet size in bytes, p: packet drop rate
– a better approximation appears in the Padhye et al. paper
– problem: needs bounds on packet sizes and RTTs
Unresponsive:
– if the drop rate increases by a factor of x, the arrival rate should decrease by a factor of sqrt(x)
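
A quick sketch of using this bound as a TCP-friendliness test (the example numbers in the tests are invented for illustration):

```python
from math import sqrt

def tcp_friendly_rate(B, rtt, p):
    """Upper bound on TCP throughput (bytes/sec) from the simple
    TCP model: 1.5 * sqrt(2/3) * B / (RTT * sqrt(p)).
    B: packet size in bytes, rtt: round-trip time in seconds,
    p: packet drop rate."""
    return 1.5 * sqrt(2.0 / 3.0) * B / (rtt * sqrt(p))

def is_tcp_friendly(arrival_rate, B, rtt, p):
    """A flow is TCP-friendly if its measured arrival rate does not
    exceed what a conformant TCP would achieve under the same
    packet size, RTT, and drop rate."""
    return arrival_rate <= tcp_friendly_rate(B, rtt, p)
```

For 1500-byte packets, a 100 ms RTT, and a 1% drop rate, the bound works out to roughly 184 KB/s; any flow sustaining much more than that during congestion is a candidate for regulation.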

55 ..Flows to Regulate
Flows using disproportionate bandwidth
– assume additive-increase, multiplicative-decrease flows only
– assume cwnd = W at loss
– it can be shown that: loss prob p ≤ 8 / (3W²)
– for segment size B: tput < 0.75 · W · B / RTT
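
These two bounds combine into the TCP model of the previous slide: solving p ≤ 8/(3W²) for W gives W ≥ sqrt(8/(3p)), and substituting into tput < 0.75 · W · B / RTT yields tput < 0.75 · sqrt(8/3) · B / (RTT · sqrt(p)) = 1.5 · sqrt(2/3) · B / (RTT · sqrt(p)). A numeric check of the constants:

```python
from math import sqrt, isclose

# Substituting W = sqrt(8 / (3p)) into 0.75 * W gives the constant
# 0.75 * sqrt(8/3), which equals the 1.5 * sqrt(2/3) constant of the
# simple TCP model (both are about 1.22).
assert isclose(0.75 * sqrt(8.0 / 3.0), 1.5 * sqrt(2.0 / 3.0))
```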

