Resilient Datacenter Load Balancing in the Wild

Presentation transcript:

Resilient Datacenter Load Balancing in the Wild. Hong Zhang¹, Junxue Zhang¹, Wei Bai¹, Kai Chen¹, Mosharaf Chowdhury²

Background: Datacenter networks are multi-rooted trees (e.g., Fat-tree, Leaf-spine) with multiple paths between each end-host pair. Precise load balancing is required, but it is difficult because datacenters are filled with uncertainties. (Switch & server icon source: CONGA [SIGCOMM’14].)

Uncertainties in Datacenter Networks: Traffic dynamics. Congestion can quickly arise anywhere.

Uncertainties in Datacenter Networks: Asymmetries. Causes: link cuts; heterogeneous devices. [Figure: leaf-spine topology with a cut link and a mix of 40G and 10G links.]

Uncertainties in Datacenter Networks: Switch failures. Packet blackholes: drop packets with certain patterns deterministically. Silent random packet drops: drop packets randomly at a high rate. [Figure: leaf-spine topology with a ‘gray failure’ at a switch.]

Uncertainties in Datacenter Networks: traffic dynamics, asymmetries (e.g., link cuts), and switch failures (e.g., ‘gray failures’). How to effectively and appropriately load balance traffic?

Sensing uncertainties: efficiently sense congestion & failures. Reacting to uncertainties: appropriately split traffic among parallel paths in reaction to uncertainties. Prior art has important drawbacks in both.

Sensing Uncertainties --- Current Practice: Sensing congestion: (1) Congestion-oblivious: ECMP, RPS [INFOCOM’09], DRB [CoNEXT’13], Presto [SIGCOMM’15]; poor under asymmetry. (2) Congestion-aware, switch-based: CONGA [SIGCOMM’14], HULA [SOSR’16], DRILL* [SIGCOMM’17]; require advanced hardware. (3) Congestion-aware, end host-based: CLOVE-ECN [HotNets’16]; limited visibility. Sensing failures: most current solutions do not sense failures.
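
As a point of reference for the congestion-oblivious baseline, ECMP-style selection can be sketched as hashing a flow's five-tuple onto one of the equal-cost paths. This is only an illustration: the function name `ecmp_path` and the use of CRC32 are assumptions for the example, not the hash any particular switch uses.

```python
import zlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Pick a path index by hashing the flow's five-tuple (illustrative only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    # CRC32 stands in for the vendor-specific hash a real switch would apply.
    return zlib.crc32(key) % num_paths
```

Because the choice depends only on header fields, every packet of a flow stays on the same path no matter how congested or faulty that path becomes.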

Problem of Being Failure-ignorant: [Figure, three animation steps: leaf switches L0 and L1, spine switches S0 and S1, a random-drop failure on one path, and a table of per-path values (paths via S0 and S1) for destination leaf L1 that changes between steps.] Even worse than ECMP under failures.

Reacting to Uncertainties --- Current Practice: Problem of flowlet switching (CONGA [SIGCOMM’14], CLOVE [HotNets’16], LetFlow [NSDI’17], …): passive and conservative. A flow is rerouted only at a flowlet gap in order to preserve packet order.
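
The flowlet rule can be sketched as follows. This is a minimal sketch of the general technique, not the implementation of any of the cited schemes; the class name, the `gap_timeout` value, and the `pick_path` callback are hypothetical.

```python
class FlowletSwitcher:
    """Reroute a flow only when the idle gap since its last packet exceeds a timeout."""

    def __init__(self, gap_timeout, pick_path):
        self.gap_timeout = gap_timeout  # hypothetical flowlet gap threshold, in seconds
        self.pick_path = pick_path      # callable that returns a path id for a new flowlet
        self.last_seen = {}             # flow id -> timestamp of the last packet
        self.path = {}                  # flow id -> current path id

    def on_packet(self, flow_id, now):
        last = self.last_seen.get(flow_id)
        if last is None or now - last > self.gap_timeout:
            # New flow, or a gap long enough that in-flight packets have drained:
            # a new flowlet starts, so switching paths should not reorder packets.
            self.path[flow_id] = self.pick_path()
        self.last_seen[flow_id] = now
        return self.path[flow_id]
```

A flow that keeps sending without pauses never exceeds the gap, so `pick_path` is never consulted again for it. That is the conservatism the slide refers to.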

Reacting to Uncertainties --- Current Practice: Problem of flowlet switching. [Figure: flows A, B, C, D between L0 and L1 over paths P1 and P2, shown as timelines. Ideal: when flows A and B finish, flow C reroutes from P2 to P1, and flows C and D finish sooner. CONGA (flowlet) + DCTCP: flow C cannot find a flowlet gap, so flows C and D finish later.] Flowlet switching cannot always react to uncertainties in time.

Reacting to Uncertainties --- Current Practice: Problem of vigorous rerouting: packet reordering and congestion mismatch. What is congestion mismatch? Congestion control adjusts rates based on the congestion of the current path. With vigorous rerouting, the congestion states of different paths are mixed together, so congestion on one path may be mistakenly used to adjust the rate on another path.

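Reacting to Uncertainties --- Current Practice: Example of congestion mismatch. [Figure: Flow A (DCTCP) from L0 to L1 is split into flowcells, i.e., fixed-size data units (Presto [SIGCOMM’15]): 10 flowcells go over the 10G path via S0 and 1 flowcell over the 1G path via S1.] On the 10G path the sending rate keeps increasing, so the flow starts on the 1G path with a high sending rate.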

Reacting to Uncertainties --- Current Practice: Example of congestion mismatch. [Figure: same topology; 10 flowcells over the 10G path via S0, 1 flowcell over the 1G path via S1.] On the 1G path the rate is reduced greatly, so the flow starts on the 10G path with a low sending rate.

Reacting to Uncertainties --- Current Practice: Example of congestion mismatch. [Figure: same topology; 10 flowcells over the 10G path via S0, 1 flowcell over the 1G path via S1.] The flow cannot fully utilize the 10 Gbps path, and there is severe queue build-up on the 1G path. Congestion mismatch leads to performance loss.

Q: Can we design a resilient load balancing scheme that can gracefully handle all these uncertainties? Comprehensiveness: effectively detect congestion and failures. Timeliness: quickly react to uncertainties. Transport-friendliness: limited impact of reordering and congestion mismatch. Deployability: implementable with commodity hardware. A: Hermes.

Hermes in One Slide: End host-based, with no hardware/kernel modification. Comprehensive sensing: leveraging transport-layer signals & events; active probing with small costs. Timely yet cautious rerouting: explicitly consider both the cost and gain of rerouting. [Figure: inside the hypervisor, a Sensing Module (sensing congestion, sensing failures, active probing) probes the network and feeds/triggers a (Re)Routing Module, which decides when and where to reroute network traffic.]

Comprehensive Sensing: Idea 1: Leveraging transport-level signals & events. Sensing congestion: ECN and RTT, which are widely used in congestion control and directly observable. Sensing failures (failed paths): packet blackhole, indicated by frequent timeouts; random packet drop, indicated by frequent retransmissions.
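
One way to picture how these signals could be combined per path is the sketch below. It is not the Hermes algorithm as published: every threshold, the label names, and the order of the checks are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class PathStats:
    ecn_fraction: float         # fraction of ACKs carrying an ECN echo
    rtt_us: float               # smoothed RTT in microseconds
    timeouts: int               # recent retransmission timeouts on this path
    retransmit_fraction: float  # fraction of packets retransmitted

# All thresholds below are hypothetical placeholders.
ECN_HIGH, ECN_LOW = 0.4, 0.1
RTT_HIGH_US, RTT_LOW_US = 300.0, 120.0
TIMEOUT_LIMIT = 3
RETX_HIGH = 0.01

def classify_path(s: PathStats) -> str:
    if s.timeouts >= TIMEOUT_LIMIT:
        return "blackhole_suspected"    # frequent timeouts
    if s.ecn_fraction > ECN_HIGH and s.rtt_us > RTT_HIGH_US:
        return "congested"              # both ECN and RTT are high
    if s.retransmit_fraction > RETX_HIGH:
        # Assumption: losses on a path that is not congested point to random drops.
        return "random_drop_suspected"  # frequent retransmissions
    if s.ecn_fraction < ECN_LOW and s.rtt_us < RTT_LOW_US:
        return "good"
    return "moderate"
```

Checking congestion before retransmissions is a design choice in this sketch, so that losses on an already congested path are attributed to congestion rather than to a failure.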

Comprehensive Sensing: Idea 1: Leveraging transport-level signals & events. Sensing congestion: ECN and RTT, which are widely used in congestion control and directly observable. Sensing failures (failed paths): packet blackhole, indicated by frequent timeouts; random packet drop, indicated by frequent retransmissions. Idea 2: Improving visibility via active probing. Baseline: probe all paths for all end-host pairs. Power of 2 choices: probe 2 random paths + 1 previous best path. This sacrifices some visibility for much smaller probing overhead.
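
The probe selection rule (2 random paths + 1 previous best path) can be sketched as below. The function name and signature are hypothetical, and excluding the previous best path from the random draw is an assumption made so that the three probes are distinct.

```python
import random

def choose_probe_paths(all_paths, previous_best=None, rng=random):
    """Return the paths to probe: two random choices plus the previous best path."""
    # Draw the random picks from the other paths so no probe is duplicated.
    candidates = [p for p in all_paths if p != previous_best]
    picks = rng.sample(candidates, min(2, len(candidates)))
    if previous_best is not None:
        picks.append(previous_best)
    return picks
```

With 8 parallel paths, for example, this probes at most 3 paths per round instead of all 8.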

Timely yet Cautious Rerouting: When to reroute? Flowlet switching: too conservative for timely reaction. Vigorous switching: too aggressive to be transport-friendly. Can we achieve a better trade-off by explicitly considering both the cost and gain of rerouting? A new angle: utility-based rerouting, i.e., reroute when it is likely to be beneficial (final performance vs. intermediate consequences). The utility is estimated from both path conditions and flow status obtained from comprehensive sensing.

Timely yet Cautious Rerouting: A simplified cost-benefit assessment for rerouting. [Figure: rate vs. time. Do not reroute: the flow sends at rate R1 and finishes at T1; remaining size = R1 × T1. Other labels on the axes: R2, 0.5R1, T2.] Motivation for timely rerouting.

Timely yet Cautious Rerouting: A simplified cost-benefit assessment for rerouting. [Figure: rate vs. time. Do not reroute: the flow sends at rate R1 and finishes at T1; remaining size = R1 × T1. Reroute: the rate first drops to 0.5R1, then rises to R2, and the flow finishes at T2.] Motivation for timely rerouting: rerouting can be beneficial even with packet reordering; reroute immediately as long as it is likely to reduce flow completion time. Quick reaction to uncertainties.
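
The comparison in this figure can be worked through numerically. The sketch below assumes a simple two-phase model that the slide does not spell out: after rerouting, the flow sends at 0.5R1 for a hypothetical recovery period `t_recover` and then at R2. All function and parameter names are illustrative.

```python
def completion_time_if_stay(remaining, r1):
    """T1: time to finish on the current path at rate R1."""
    return remaining / r1

def completion_time_if_reroute(remaining, r1, r2, t_recover):
    """T2 under the assumed model: 0.5*R1 for t_recover, then R2."""
    reduced = 0.5 * r1
    sent_during_recovery = reduced * t_recover
    if remaining <= sent_during_recovery:
        # The flow finishes before it ever reaches rate R2.
        return remaining / reduced
    return t_recover + (remaining - sent_during_recovery) / r2

def rerouting_gain(remaining, r1, r2, t_recover):
    """Positive when rerouting is expected to shorten flow completion time."""
    return (completion_time_if_stay(remaining, r1)
            - completion_time_if_reroute(remaining, r1, r2, t_recover))
```

Under this model, a flow with a large remaining size and a much faster new path gains from rerouting despite the temporary rate cut, while a flow that is about to finish only pays the cost.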

Timely yet Cautious Rerouting: A simplified cost-benefit assessment for rerouting. [Figure: rate vs. time. Do not reroute: rate R1 until T1; remaining size = R1 × T1. Reroute: rate 0.5R1, then R2, finishing at T2; a second reroute curve shows the effect of an estimation error in R2.] Heuristics for cautious rerouting: reroute only if the new path is notably better (in terms of ECN & RTT).

Timely yet Cautious Rerouting: A simplified cost-benefit assessment for rerouting. [Figure: rate vs. time for a flow with a small remaining size. Do not reroute: rate R1 until T1; remaining size = R1 × T1. Reroute: rate 0.5R1, then R2, finishing at T2.] Heuristics for cautious rerouting: reroute only if the new path is notably better (in terms of ECN & RTT); avoid rerouting flows with a small remaining size.

Timely yet Cautious Rerouting: A simplified cost-benefit assessment for rerouting. [Figure: rate vs. time for a flow with a high sending rate R1. Do not reroute: rate R1 until T1; remaining size = R1 × T1. Rates R2 and R’2 are marked for the new path.] Heuristics for cautious rerouting: reroute only if the new path is notably better (in terms of ECN & RTT); avoid rerouting flows with a small remaining size; avoid rerouting flows with a high sending rate R1. Result: limited impact of congestion mismatch and packet reordering.
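
Taken together, the three heuristics amount to a guard in front of every rerouting decision. The sketch below is one possible reading. The margins and limits are hypothetical defaults, and requiring an improvement in both ECN and RTT (rather than either one) is an assumption.

```python
def should_reroute(cur, new, remaining_bytes, send_rate_bps,
                   ecn_margin=0.2, rtt_margin_us=50.0,
                   min_remaining_bytes=100_000, max_rate_bps=3e9):
    """Apply the three cautious-rerouting heuristics (all thresholds hypothetical).

    cur and new are dicts with keys "ecn" (marked fraction) and "rtt_us".
    """
    # 1. The new path must be notably better in terms of ECN and RTT.
    notably_better = (cur["ecn"] - new["ecn"] > ecn_margin
                      and cur["rtt_us"] - new["rtt_us"] > rtt_margin_us)
    # 2. Avoid rerouting flows with a small remaining size.
    enough_left = remaining_bytes >= min_remaining_bytes
    # 3. Avoid rerouting flows that already send at a high rate.
    not_too_fast = send_rate_bps <= max_rate_bps
    return notably_better and enough_left and not_too_fast
```

For example, a flow with 1 MB remaining at 1 Gbps on a heavily marked path would be moved to a clearly better path, while a flow with only 10 KB left would stay where it is.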

Evaluation Settings: Workloads: web-search, data-mining. Transport protocol: DCTCP. Large-scale simulations: 128 servers, 16 switches; 8×8 leaf-spine with 2:1 oversubscription ratio. Testbed evaluations: 12 servers, 4 switches; 2×2 leaf-spine with 3:2 oversubscription ratio.
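
The oversubscription ratios can be checked with a little arithmetic. The sketch assumes that host links and uplinks run at the same speed and that servers are spread evenly across leaves; the number of uplinks per leaf is an input here, since the slides do not state the wiring.

```python
from fractions import Fraction

def oversubscription(servers, leaves, uplinks_per_leaf, host_gbps=10, uplink_gbps=10):
    """Ratio of server-facing capacity to spine-facing capacity at one leaf."""
    downlink = Fraction(servers, leaves) * host_gbps
    uplink = uplinks_per_leaf * uplink_gbps
    return downlink / uplink
```

In the simulated 8×8 leaf-spine, 128 servers over 8 leaves gives 16 servers per leaf, and one uplink to each of the 8 spines yields 16:8 = 2:1. For the testbed, 6 servers per leaf would need 4 uplinks per leaf to give 3:2 under the same equal-speed assumption.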

Evaluation Results: Hermes under the baseline topology (8×8 leaf-spine). Web-search workload: outperforms ECMP by up to 55%; within 17% of CONGA (the switch-based solution has better visibility into congestion). Data-mining workload: 29% better than ECMP at high load; slightly outperforms CONGA (by up to 4%), because this workload is more heavy-tailed and less bursty, which makes flowlet gaps more difficult to create.

Evaluation Results: Hermes under the asymmetric case (data-mining workload). Setup: the capacity of 20% of randomly selected leaf-to-spine links is reduced from 10 Gbps to 2 Gbps. (Weighted) Presto*: congestion-oblivious, thus not efficient against asymmetry. LetFlow & CLOVE-ECN: Hermes has better visibility and reacts in a more timely way. CONGA: Hermes resolves collisions of large flows on 2 Gbps links more quickly.
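
An asymmetric topology of this kind could be generated as in the sketch below. The function name, the link representation, and the rounding of the 20% share are assumptions, not details of the authors' simulator.

```python
import random

def degrade_links(num_leaves, num_spines, fraction=0.2,
                  normal_gbps=10, degraded_gbps=2, seed=None):
    """Return {(leaf, spine): capacity_gbps} with a random subset of links degraded."""
    rng = random.Random(seed)
    links = [(leaf, spine) for leaf in range(num_leaves) for spine in range(num_spines)]
    # Round the requested share to a whole number of links.
    degraded = set(rng.sample(links, round(fraction * len(links))))
    return {link: (degraded_gbps if link in degraded else normal_gbps)
            for link in links}
```

For an 8×8 leaf-spine, this marks 13 of the 64 leaf-to-spine links as 2 Gbps and leaves the rest at 10 Gbps.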

Evaluation Results: Hermes under switch failures (silent random packet drops; packet blackhole). Hermes outperforms other schemes by over 32%.

Conclusion: Datacenters are filled with uncertainties. Hermes: a resilient load balancing scheme that gracefully handles uncertainties and is readily deployable at end hosts. Sensing: congestion- & failure-aware; improved visibility. Reacting: timely & cautious rerouting.

Thank You!