Presentation is loading. Please wait.

Presentation is loading. Please wait.

HP Labs 1 IEEE Infocom 2003 End-to-End Congestion Control for InfiniBand Jose Renato Santos, Yoshio Turner, John Janakiraman HP Labs.

Similar presentations


Presentation on theme: "HP Labs 1 IEEE Infocom 2003 End-to-End Congestion Control for InfiniBand Jose Renato Santos, Yoshio Turner, John Janakiraman HP Labs."— Presentation transcript:

1 HP Labs 1 IEEE Infocom 2003 End-to-End Congestion Control for InfiniBand Jose Renato Santos, Yoshio Turner, John Janakiraman HP Labs

2 2 IEEE Infocom 2003 Outline Motivation: Unique System Area Network (SAN) characteristics require new congestion control approach Proposed approach appropriate for SANs: –ECN packet marking –Source response: rate control with window limit Focus: Design of source response functions –New convergence conditions, design methodology –New functions: LIPD and FIMD Performance Evaluation: LIPD, FIMD, AIMD Conclusions

3 HP Labs 3 IEEE Infocom 2003 System Area Networks Characteristics InfiniBand example: Industry standard server interconnect – 2Gb/s(1x) to 24Gb/s(12x) links Characteristics: congestion control implications –No packet dropping  Need network support for detecting congestion – Low network latency (tens of ns cut-through switching)  Simple logic for hardware implementation – Low buffer capacity at switches (e.g., 2KB input buffer stores only four 512-byte packets)  TCP window mechanism inadequate (narrow operational range) – Input-buffered switches  Alternative congestion detection mechanisms

4 HP Labs 4 IEEE Infocom 2003 Problem: Congestion Spreading Link BW: 8 Gb/s (4x link) Packet Size: 2 KB Buffer Size: 4 packets/port (8 KB) Buffer Org.: Input port Flow not using congested link suffers performance degradation (victim flow) Simulation (R=L=10) Remote flows use only 30% of inter- switch link bandwidth Contention for root link  full buffer  prevents victim flow from using remaining inter- switch link bandwidth non-congested link local flows (L) SWITCH A SWITCH B congested link remote flows (R) victim flow Inter-switch Link

5 HP Labs 5 IEEE Infocom 2003 Our Congestion Control Approach Explicit Congestion Notification (ECN) for input-buffered switches Source adjusts packet injection according to network feedback encoded in ECN returned via ACK –Combines window and rate control –New source response functions more efficient than AIMD

6 HP Labs 6 IEEE Infocom 2003 Source Response: Rate Control with Window Limit Window Control + Self-clocked, bounds switch buffer utilization – Narrow operational range (window=2 uses all bandwidth in idle network) – Window=1 is too large if # flows > # buffer slots Rate Control + Low buffer util. possible (< 1 packet per flow) + Wide operational range – Not self-clocked Proposed Approach: Rate control with a fixed window limit (w=1)

7 HP Labs 7 IEEE Infocom 2003 Designing Rate Control Functions Definition: When source receives ACK Decrease rate on marked ACK: r new = f dec (r) Increase rate on unmarked ACK: r new = f inc (r) f dec (r) and f inc (r) should provide : –Congestion avoidance –High network bandwidth utilization –Fair allocation of bandwidth among flows Develop new sufficient conditions for f dec (r) & f inc (r) –Exploit differences in packet marking rates across flows to relax conditions Requires novel time-based formulation

8 HP Labs 8 IEEE Infocom 2003 Avoiding Congested State Steady state: flow rate oscillates around optimal value in alternating phases of rate decrease and increase Want to avoid time in congested state Magnitude of response to marked ACK is larger or equal to magnitude of response to unmarked ACK Congestion Avoidance Condition: f inc (f dec (r))  r

9 HP Labs 9 IEEE Infocom 2003 Fairness Convergence [Chiu/Jain 1989][Bansal/Balakrishnan 2001] developed convergence conditions assuming all flows receive feedback and adjust rates synchronously –Each increase/decrease cycle must improve fairness Observation: In congested state, the mean number of marked packets for a flow is proportional to the flow rate. –bias promotes flow rate fairness  Enables weaker fairness convergence condition  Benefit: fairness with faster rate recovery

10 HP Labs 10 IEEE Infocom 2003 Fairness Convergence Relax condition: rate decrease-increase cycles need only maintain fairness in the synchronous case –If two flows receive marks, lower rate flow should recover earlier than or in the same time as higher rate flow t F inc (t) time rate.. r f dec (r) T rec (r) decrease step Rate as function of time (in absence of marks) Fairness Convergence Condition: T rec (r1)  T rec (r2) for r1 < r2

11 HP Labs 11 IEEE Infocom 2003 Maximizing Bandwidth Utilization Goal: as flows depart, remaining flows should recover rate quickly to maximize utilization Fastest recovery: use limiting cases of conditions –Congestion Avoidance Condition f inc (f dec (r))  r Use f inc (f dec (r)) = r for minimum rate R min –Fairness Convergence Condition T rec (r1)  T rec (r2) Use T rec (r1) = T rec (r2) for higher rates Maximum Bandwidth Utilization Condition: T rec (r) = 1/ R min for all r

12 HP Labs 12 IEEE Infocom 2003 Design Methodology: Choose f dec (r), find f inc (r) satisfying conditions Use f dec (r) to derive F inc (t): F inc (t) = f dec (F inc (t + T rec )), T rec =1/R min Use F inc (t) to find f inc (r): f inc (r ) = F inc (t r +1/r) where F inc (t r ) = r t F inc (t) time rate r f dec (r) T rec =1/R min decrease step t F inc (t) time rate r f inc (r) 1/r increase step trtr

13 HP Labs 13 IEEE Infocom 2003 New Response Functions Fast Increase Multiplicative Decrease (FIMD): –Decrease function: f dec fimd (r) = r/m, constant m>1 (same as AIMD) –Increase function: f inc fimd (r) = r · m Rmin/r –Much faster rate recovery than AIMD Linear Inter-Packet Delay (LIPD): –Decrease function: increases inter-packet delay (ipd) by 1 packet transmission time r = R max /(ipd+1) –Increase function: f inc lipd (r) = r/(1- R min /R max ) –Large decreases at high rate, small decreases at low rate Simple Implementation: e.g., table lookup

14 HP Labs 14 IEEE Infocom 2003 Increase Behavior Over Time : FIMD, AIMD, LIPD 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2K10K20K30K40K50K60K 65K normalized rate time (units of packet transmission time) FIMD (m=2) AIMD (m=2) LIPD F inc (t)

15 HP Labs 15 IEEE Infocom 2003 Performance: Source Response Functions root link (RL) inter-switch link (IL) local flows (LF) remote flows (RF) RLIL 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 4567891011 normalized rate buffer size (packets) LIPD 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 4567891011 normalized rate buffer size (packets) FIMD 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 4567891011 normalized rate buffer size (packets) AIMD

16 HP Labs 16 IEEE Infocom 2003 Conclusions Proposed/Evaluated congestion control approach appropriate for unique characteristics of SANs such as InfiniBand –ECN applicable to modern input-queued switches –Source response: rate control w/ window limit Derived new relaxed conditions for source response function convergence  functions with fast bandwidth reclamation –Based on observation of packet marking bias –Two examples: FIMD/LIPD outperform AIMD Future extensions: –Hybrid window-rate control (allow w > 1) –Evaluation with richer traffic patterns/topologies

17 HP Labs 17 IEEE Infocom 2003


Download ppt "HP Labs 1 IEEE Infocom 2003 End-to-End Congestion Control for InfiniBand Jose Renato Santos, Yoshio Turner, John Janakiraman HP Labs."

Similar presentations


Ads by Google