On the Correct Sizing on Meshes Through an Effective Congestion Management Strategy P. J. García 1, J. Flich 2, J. Duato 2, I. Johnson 3, F. J. Quiles.

1 On the Correct Sizing on Meshes Through an Effective Congestion Management Strategy
P. J. García 1, J. Flich 2, J. Duato 2, I. Johnson 3, F. J. Quiles 1, F. Naven 3
1 University of Castilla-La Mancha, Albacete, Spain
2 Technical University of Valencia, Valencia, Spain
3 Xyratex, Havant, UK
Euro-Par 2005, 30 August - 2 September, Lisboa, Portugal

2 Outline
- Introduction
- Congestion and HOL blocking
- Why does HOL blocking affect network sizing?
- HOL blocking elimination techniques
- RECN
- Performance evaluation
- Conclusions

3 Introduction
- PC clusters: an alternative to massively parallel computers
- Current use:
  - High-performance computing (HPC) systems
  - Internet servers
  - Storage area networks (SANs)
- Usually based on high-speed interconnection networks
- High-speed interconnection networks: Myrinet, InfiniBand, Quadrics, Advanced Switching…
- Main features: high bandwidth, low latency
- Additional features: lossless networks, flexible topology
- Network performance may be affected by congestion

4 Congestion and HOL Blocking
- Network contention:
  - Several packets request the same output port
  - One makes progress, the others wait
- Network congestion:
  - Persistent network contention
  - It is quickly propagated by flow control (lossless networks)
  - Network performance degrades dramatically
- Head-of-line (HOL) blocking:
  - When the first packet in a queue is blocked, every other packet in the same queue is also blocked, even if it requests available resources
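
The HOL blocking definition above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not in the slides: a single FIFO input queue, packets represented just by the output port they request, and at most one packet forwarded per cycle. The function name `drain_fifo` is hypothetical.

```python
from collections import deque

def drain_fifo(queue, busy_ports, cycles):
    """Forward at most one packet per cycle from a single FIFO input queue.

    Each packet is represented only by the output port it requests.
    The head packet may leave only if its output port is not busy;
    packets behind it cannot overtake it.
    """
    queue = deque(queue)
    delivered = []
    for _ in range(cycles):
        if queue and queue[0] not in busy_ports:
            delivered.append(queue.popleft())
    return delivered, list(queue)

# Output port "A" is congested, output port "B" is idle.
# The head packet wants "A", so the two packets for "B" are HOL-blocked.
blocked_delivered, blocked_left = drain_fifo(["A", "B", "B"], {"A"}, cycles=3)
print(blocked_delivered, blocked_left)   # [] ['A', 'B', 'B']

# Same packets, but the blocked one is at the tail: the "B" packets get through.
free_delivered, free_left = drain_fifo(["B", "B", "A"], {"A"}, cycles=3)
print(free_delivered, free_left)         # ['B', 'B'] ['A']
```

In the first call nothing is delivered although port "B" is free the whole time, which is the effect the slide describes.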

5 Congestion and HOL Blocking: network contention

6 Congestion and HOL Blocking: persistent network contention

7 Congestion and HOL Blocking: persistent network contention, flow control

8 Congestion and HOL Blocking: persistent network contention, congestion propagates

9 Congestion and HOL Blocking
- Congestion introduces HOL blocking, and this may degrade network performance dramatically
(Figure labels: 33%, 100%, HOL)

10 Why does HOL blocking affect network sizing?
- Network size is restricted by:
  - Required system bandwidth: the bandwidth offered by the network must meet the system traffic conditions
  - Component cost: recent interconnects (Myrinet, InfiniBand, ASI) are expensive compared to processors
  - Power consumption: as network size increases, so do power consumption and heat dissipation
  - Other constraints: topology, links per switch, etc.
- Even if the network is correctly sized, HOL blocking may prevent it from reaching the expected performance

11 Why does HOL blocking affect network sizing?
Example 1: Reducing cost and consumption
- A solution is to reduce the number of network components
- Link utilization increases
- Larger network: low link utilization, high cost and consumption
- Smaller network: high link utilization, high probability of congestion and HOL blocking

12 Why does HOL blocking affect network sizing?
Example 2: Increasing network bandwidth
- A solution is to add as many network components as necessary
- Cost, consumption and route length increase
- Larger network: longer routes, greater HOL blocking probability when congested
- Smaller network: shorter routes, low offered bandwidth
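
To give a feel for how route length grows with mesh size, the sketch below computes the average switch-to-switch distance by brute force. It assumes minimal (Manhattan) routes and uniformly distributed source/destination pairs, including pairs where both are the same switch; neither assumption comes from the slides, and `average_hops` is a hypothetical helper name.

```python
from itertools import product

def average_hops(k):
    """Average Manhattan distance between two switches of a k x k mesh,
    over all ordered (source, destination) pairs, source == destination included."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(sx - dx) + abs(sy - dy)
                for (sx, sy) in nodes
                for (dx, dy) in nodes)
    return total / len(nodes) ** 2

for side in (4, 8, 16):
    print(side, average_hops(side))
# 4 2.5
# 8 5.25
# 16 10.625
```

Under these assumptions the average route roughly doubles each time the mesh side doubles, so a packet in the larger network crosses more queues where it can be HOL-blocked.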

13 HOL blocking elimination/reduction techniques
- DAMQs and virtual channels: not efficient for multihop networks
- VOQ (Virtual Output Queueing):
  - VOQ at switch level scales, but does not eliminate HOL blocking
  - VOQ at network level: a separate queue at every input port for every destination
  - The number of required resources scales at least quadratically with network size
- Credit Flow Controlled ATM:
  - References congestion to network output only
  - Consumes a large number of buffers: a separate queue at every output port for every destination
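
The quadratic growth of VOQ at network level can be checked with a rough count. This is a sketch, not the authors' accounting: it assumes every switch has four mesh links plus one port per attached endnode (edge switches in a real mesh have fewer links), and `voqnet_queues` is a hypothetical helper.

```python
def voqnet_queues(mesh_side, endnodes_per_switch):
    """Return (queues per input port, total queues in the network) for
    VOQ at network level: one queue per destination at every input port."""
    switches = mesh_side ** 2
    endnodes = switches * endnodes_per_switch
    # Assumption: 4 mesh links plus one port per attached endnode.
    ports_per_switch = 4 + endnodes_per_switch
    queues_per_port = endnodes
    total_queues = switches * ports_per_switch * queues_per_port
    return queues_per_port, total_queues

for side in (4, 8, 16):
    print(side, voqnet_queues(side, endnodes_per_switch=1))
# 4 (16, 1280)
# 8 (64, 20480)
# 16 (256, 327680)
```

With one endnode per switch, quadrupling the number of switches multiplies the total queue count by sixteen, which matches the "at least quadratically" remark on the slide.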

14 RECN: Regional Explicit Congestion Notification
- RECN is a new, efficient and scalable congestion management technique
- Basic ideas:
  - The real problem is not congestion itself, but its negative effect (HOL blocking)
  - By eliminating HOL blocking, congestion becomes harmless
  - Non-congested flows do not introduce significant HOL blocking
- HOL blocking elimination:
  - Packets belonging to congested flows are stored in specific Set Aside Queues (SAQs)
  - Packets belonging to non-congested flows are stored in a common queue
- Implementation requirements:
  - Deterministic source routing
  - A reduced number of SAQs per port, controlled by a CAM

15 How RECN works
RECN basic procedure:
- Congested points are detected at any egress or ingress switch port of the network
- The routes to detected congested points are progressively notified to the ingress and egress ports crossed by congested flows
- After receiving a notification, a port allocates a SAQ for the detected congested point
- A packet arriving at a port is stored in a SAQ if it will pass through the congested point associated with that SAQ
- A packet arriving at a port is stored in the common queue if its route does not match any SAQ
- SAQs can be deallocated and later allocated for other congested points
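
The queue-selection part of the procedure above might be sketched as follows. Everything beyond the slide text is an assumption made for illustration: a congested point is identified by the source route (a sequence of turns) leading from this port to it, a packet matches when its remaining route starts with that sequence, the longest match wins, and a SAQ is only freed when empty. The class `Port` and its methods are hypothetical names, not the authors' design.

```python
from collections import deque

class Port:
    """Toy model of one switch port with a common queue and a few SAQs."""

    def __init__(self, max_saqs=8):
        self.max_saqs = max_saqs
        self.common_queue = deque()
        # Stand-in for the CAM: route to a congested point -> its SAQ.
        self.cam = {}

    def notify(self, route_to_congested_point):
        """Allocate a SAQ for a notified congested point, if one is free."""
        key = tuple(route_to_congested_point)
        if key in self.cam:
            return True
        if len(self.cam) >= self.max_saqs:
            return False
        self.cam[key] = deque()
        return True

    def enqueue(self, packet_route):
        """Store a packet; return the matched congested point or None."""
        route = tuple(packet_route)
        matches = [p for p in self.cam if route[:len(p)] == p]
        if matches:
            best = max(matches, key=len)  # assumption: longest match wins
            self.cam[best].append(route)
            return best
        self.common_queue.append(route)
        return None

    def deallocate(self, route_to_congested_point):
        """Free a SAQ so it can be reused; only empty SAQs are freed here."""
        key = tuple(route_to_congested_point)
        if key in self.cam and not self.cam[key]:
            del self.cam[key]
            return True
        return False

port = Port(max_saqs=2)
port.notify(("E", "E", "N"))
hit = port.enqueue(("E", "E", "N", "N"))   # crosses the congested point
miss = port.enqueue(("E", "S"))            # unrelated flow
print(hit, miss, len(port.common_queue))   # ('E', 'E', 'N') None 1
```

The point of the sketch is the separation: the unrelated packet lands in the common queue and is never queued behind packets headed for the congested point.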

16 How RECN Works: a congestion point forms

17 How RECN Works: cold queue fills over a threshold

18 How RECN Works

19 How RECN Works: internal notification to each input port sending packets to the output port

20 How RECN Works

21 How RECN Works: input ports allocate a new SAQ for packets addressed to the congested output port

22 How RECN Works

23 How RECN Works: notification sent when the SAQ fills over a threshold

24 How RECN Works

25 How RECN Works: a new SAQ is allocated for the congested port at each output port

26 How RECN Works: internal notification when the SAQ fills over a threshold

27 How RECN Works: the input port allocates a new SAQ

28 How RECN Works: at the end, the congestion tree builds and is mapped entirely onto SAQs

29 Performance Evaluation
- Two network-sizing scenarios considered:
  - Network cost and consumption reduction: the network is downsized, keeping the total number of system endnodes constant
  - Network bandwidth increase: network size is increased, keeping the number of endnodes per switch constant
- Evaluation based on simulation results
- Evaluation metric: network relative throughput when using:
  - RECN
  - VOQ at network level (VOQnet)
  - VOQ at switch level (VOQsw)

30 Simulation Model
Simulation assumptions:
- Mesh topologies
- Deterministic routing (X-Y)
- 128 KB memories at ingress/egress ports
- Multiplexed crossbar (BW = 12 Gbps)
- Serial full-duplex pipelined links (BW = 8 Gbps)
- 64-byte packets
- Credit-based and Xon-Xoff (for SAQs) flow control
- Maximum of 8 SAQs at ingress/egress ports (RECN)
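
A few quantities follow directly from these parameters. The arithmetic below is a back-of-the-envelope sketch that assumes 1 KB = 1024 bytes and ignores headers, encoding overhead and credit traffic, none of which the slide specifies.

```python
PORT_MEMORY_BYTES = 128 * 1024   # assumption: 1 KB = 1024 bytes
PACKET_BYTES = 64
LINK_GBPS = 8
CROSSBAR_GBPS = 12

# How many 64-byte packets fit in one port memory.
packets_per_port_memory = PORT_MEMORY_BYTES // PACKET_BYTES

# Time to serialize one packet on a link: bits / (Gbit/s) gives nanoseconds.
packet_time_ns = PACKET_BYTES * 8 / LINK_GBPS

# Crossbar bandwidth relative to link bandwidth.
crossbar_speedup = CROSSBAR_GBPS / LINK_GBPS

print(packets_per_port_memory)  # 2048
print(packet_time_ns)           # 64.0
print(crossbar_speedup)         # 1.5
```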

31 Network Configurations
Five different mesh-based network configurations:
Network configuration   Mesh topology   Switches   Endnodes per switch
#1                      16 x 16         256        1
#2                      8 x 8           64         4
#3                      4 x 4           16         16
#4                      8 x 8           64         1
#5                      4 x 4           16         1

32 Traffic Load
Three different synthetic traffic patterns:
               Normal traffic            Congestion tree
Traffic case   #Sources   Destination    #Sources   Destination
#1             100%       Random         -          -
#2             87.5%      Random         12.5%      hot-spot
#3             75%        Random         25%        hot-spot
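
One way to generate such a pattern is sketched below. It is not the authors' traffic generator: the choice of endnode 0 as the hot-spot, the exclusion of the hot-spot itself from the hot sources, the per-packet uniform choice of random destinations, and the name `make_traffic` are all assumptions made for illustration.

```python
import random

def make_traffic(num_endnodes, hot_fraction, hot_spot=0, seed=0):
    """Pick the sources that feed the congestion tree and return a
    function giving the next destination for any source."""
    rng = random.Random(seed)
    others = [n for n in range(num_endnodes) if n != hot_spot]
    hot_sources = set(rng.sample(others, round(num_endnodes * hot_fraction)))

    def next_destination(src):
        if src in hot_sources:
            return hot_spot
        # Uniform choice among all endnodes except the source itself.
        dst = rng.randrange(num_endnodes - 1)
        return dst if dst < src else dst + 1

    return hot_sources, next_destination

# Traffic case #2 on a 256-endnode network: 12.5% of sources hit the hot-spot.
hot_sources, next_destination = make_traffic(256, 0.125)
print(len(hot_sources))  # 32
```

Passing `hot_fraction=0.0` gives traffic case #1 (all sources random), and `0.25` gives traffic case #3.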

33 Network Cost and Consumption Reduction
Relative throughput, traffic case 1, 256-endnode networks (network configurations 1, 2, 3)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 2), 4 x 4 switches network (Conf. 3)

34 Network Cost and Consumption Reduction
Relative throughput, traffic case 2, 256-endnode networks (network configurations 1, 2, 3)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 2), 4 x 4 switches network (Conf. 3)

35 Network Cost and Consumption Reduction
Relative throughput, traffic case 3, 256-endnode networks (network configurations 1, 2, 3)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 2), 4 x 4 switches network (Conf. 3)

36 Network Bandwidth Increase
Relative throughput, traffic case 1, networks with 1 endnode per switch (network configurations 1, 4, 5)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 4), 4 x 4 switches network (Conf. 5)

37 Network Bandwidth Increase
Relative throughput, traffic case 2, networks with 1 endnode per switch (network configurations 1, 4, 5)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 4), 4 x 4 switches network (Conf. 5)

38 Network Bandwidth Increase
Relative throughput, traffic case 3, networks with 1 endnode per switch (network configurations 1, 4, 5)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 4), 4 x 4 switches network (Conf. 5)

39 Network Bandwidth Increase
Maximum number of SAQs used by RECN, traffic case 3, networks with 1 endnode per switch (network configurations 1, 4, 5)
Plots: 16 x 16 switches network (Conf. 1), 8 x 8 switches network (Conf. 4), 4 x 4 switches network (Conf. 5)

40 Conclusions
- HOL blocking may affect the performance of networks dimensioned under different restrictions
- We have analyzed the importance of using an efficient HOL blocking elimination strategy
- We have shown that RECN allows the network to be sized in any way while keeping network performance at the expected maximum
- RECN requires only a small number of SAQs for a wide range of network sizes, so it is a scalable strategy


