CQRD: A Switch-based Approach to Flow Interference in Data Center Networks
Guo Chen, Dan Pei, Youjian Zhao
Tsinghua University, Beijing, China


The Problem
- Flow interference dramatically increases the flow completion time (FCT) of short delay-sensitive flows in data center networks (DCNs).

Flow Interference
- Short delay-sensitive flows (the majority in DCNs) have to wait a long time at switches for buffer and bandwidth resources occupied by a few long bandwidth-greedy flows (e.g., backup, replication).
- Caused by the coarse queue management scheme of Output Queue (OQ) switches.
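The waiting described above can be illustrated with a minimal sketch. It assumes the OQ output port is a single shared FIFO with tail drop; the slides do not state the drop policy, and the class and flow labels are hypothetical names used only for illustration.

```python
from collections import deque


class SharedFifoOutput:
    """Hypothetical OQ output port: one FIFO shared by all flows (assumption)."""

    def __init__(self, capacity_pkts):
        self.queue = deque()
        self.capacity = capacity_pkts

    def enqueue(self, flow_id):
        """Return True if the packet is buffered, False if it is tail-dropped."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(flow_id)
        return True

    def dequeue(self):
        """Serve packets strictly in arrival order."""
        return self.queue.popleft() if self.queue else None


port = SharedFifoOutput(capacity_pkts=4)
for _ in range(3):
    port.enqueue("long")               # long flow fills most of the buffer
accepted = port.enqueue("short")       # buffered, but behind three long packets
rejected = port.enqueue("short")       # buffer is full, so this packet is dropped
service_order = [port.dequeue() for _ in range(4)]
```

In this sketch the short flow's packet is served only after every long-flow packet ahead of it, and its next packet is dropped because the long flow already holds the buffer.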

Prior Solutions
- Transport-layer rate control: DCTCP [SIGCOMM'10], HULL [NSDI'12], D²TCP [SIGCOMM'12], D³ [SIGCOMM'11]. Requires modification to end hosts and/or switch hardware.
- Preemptive flow scheduling: PDQ [SIGCOMM'12], pFabric [SIGCOMM'13]. Requires a new protocol stack and switch hardware.

Intuition of CQRD
- Tackling the root cause of flow interference requires a more fine-grained queue management scheme.

The Goal
- Goal:
  - Alleviate flow interference
  - Reduce the FCT of short delay-sensitive flows
  - Maintain high goodput for long bandwidth-greedy flows
- Objectives:
  - Transparent to end hosts
  - No modification to the protocol stack
  - Based on underlying techniques available in commodity products

Our Solution
- CQRD: a fine-grained switch queue management scheme that addresses flow interference.

Agenda

Toy Example: Flow Interference in OQ Switch
- NS2 simulation parameters: link capacity = 10 Gbps, link delay = 4 µs, total buffer size = 288 KB, TCP initial window size = 4, TCP initial RTO = 200 µs.
- An 8x8 switch is connected to hosts 1-8. Hosts 1-5 each send a 10 KB TCP flow to host 8; hosts 6-7 each send a 100 MB TCP flow to host 8.
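For a sense of scale, the parameters above allow a back-of-the-envelope calculation. The sketch below is not part of the simulation. It assumes 1 KB = 1000 bytes and ignores headers, TCP slow start, and queueing, so the numbers are lower bounds under those assumptions.

```python
LINK_CAPACITY_BPS = 10e9      # 10 Gbps, from the slide
SHORT_FLOW_BYTES = 10_000     # 10 KB short flow (assumes 1 KB = 1000 bytes)
BUFFER_BYTES = 288_000        # 288 KB total buffer (same assumption)


def serialization_time(size_bytes, capacity_bps):
    """Time in seconds to put size_bytes on a link of capacity_bps."""
    return size_bytes * 8 / capacity_bps


# Link time needed by one short flow if nothing else is in its way.
short_tx = serialization_time(SHORT_FLOW_BYTES, LINK_CAPACITY_BPS)

# Time for the output link to drain a completely full buffer.
buffer_drain = serialization_time(BUFFER_BYTES, LINK_CAPACITY_BPS)

print(f"short flow serialization: {short_tx * 1e6:.1f} us")
print(f"full buffer drain time:   {buffer_drain * 1e6:.1f} us")
```

Under these assumptions a 10 KB flow needs about 8 µs of link time, while a full 288 KB buffer takes about 230 µs to drain, which is longer than the 200 µs initial RTO listed above.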

Toy Example: Flow Interference in OQ Switch
[Figure: FCT and goodput panels]
- Short flows complete in ~100 ms.
- Goodput of short flows collapses.
- Interfered with by the 2 long flows.
- Unfairly served.


CQRD Design
- Crosspoint-Queue: eliminates interference between flows on different switch paths (Output-Contending but not Path-Contending, OC-PC) through separate buffers and fair scheduling.
- Random-Drop: alleviates flow interference within the same switch path (Path-Contending, PC). Flows that occupy more buffer are more likely to have their packets dropped.
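The two mechanisms above can be sketched for a single output port. This is an illustrative sketch rather than the authors' implementation. It assumes that "fair scheduling" means round-robin across the per-input queues, and that Random-Drop evicts a uniformly random buffered packet when a crosspoint queue is full. All class, method, and parameter names are hypothetical.

```python
import random
from collections import deque


class CrosspointOutput:
    """One output port with a separate queue per input port (crosspoint)."""

    def __init__(self, num_inputs, capacity_pkts, rng=None):
        self.queues = [deque() for _ in range(num_inputs)]
        self.capacity = capacity_pkts          # per-crosspoint capacity
        self.rng = rng or random.Random()
        self.next_input = 0                    # round-robin pointer

    def enqueue(self, input_port, flow_id):
        """Buffer a packet; return the flow id of an evicted packet, or None."""
        queue = self.queues[input_port]
        dropped = None
        if len(queue) >= self.capacity:
            # Random-Drop (assumed form): evict a uniformly random buffered
            # packet, so a flow holding more of the buffer is hit more often.
            victim = self.rng.randrange(len(queue))
            dropped = queue[victim]
            del queue[victim]
        queue.append(flow_id)
        return dropped

    def dequeue(self):
        """Serve the next non-empty crosspoint queue in round-robin order."""
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_input + offset) % n
            if self.queues[i]:
                self.next_input = (i + 1) % n
                return i, self.queues[i].popleft()
        return None


port = CrosspointOutput(num_inputs=2, capacity_pkts=4, rng=random.Random(0))
for _ in range(4):
    port.enqueue(0, "long")                # long flow fills crosspoint 0
dropped = port.enqueue(0, "long")          # full: a buffered packet is evicted
port.enqueue(1, "short")                   # short flow has its own crosspoint
first, second = port.dequeue(), port.dequeue()
```

Here the short flow arriving on input 1 is unaffected by the full queue on input 0 and is served on the second scheduling turn, while the eviction falls on the flow that fills the contended queue.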

Toy Example: Flow Interference
[Figure: FCT and goodput panels]
- 3 orders of magnitude shorter FCT.
- 3 orders of magnitude higher goodput.
- Fairly served.
- Almost no cost in goodput.


Evaluation
1. By how much does CQRD reduce the FCT of short delay-sensitive flows?
2. How much goodput of long bandwidth-greedy flows does CQRD sacrifice?

Experiment 1
- Single aggregation/core switch (ns2 simulations)
- Simulation parameters: link capacity = 10 Gbps, link delay = 4 µs, total buffer size = 5 MB, TCP initial window size = 4, TCP initial RTO = 200 µs.
- Traffic: 1200 TCP flows; flow sizes and inter-arrival times drawn from realistic distributions; random source and destination ports.

Single aggregation/core switch
FCT of all short flows (< 100 KB) interfered with by the giant flows (> 1 MB, a subset of the large flows) at moderate load (0.1).
[Figure annotations: ~36% lower, ~7% lower, ~28% lower, ~4% lower]

Experiment 2
- Multi-stage DCN switching fabric (ns2 simulations)
- Simulation parameters: link delay = 2 µs, Agg switch buffer size = 5 MB, ToR switch buffer size = 4 MB, TCP initial window size = 4, TCP initial RTO = 200 µs.
- Traffic: 2000 TCP flows drawn from realistic distributions; ECMP load-balancing schemes.

Single aggregation/core switch
FCT of all short flows (< 100 KB) interfered with by the giant flows (> 1 MB, a subset of the large flows) at moderate load (0.1).
[Figure annotations: ~14% lower, ~30% lower, ~2.5% lower, ~same]


Conclusion
- Tackling the root cause of flow interference requires a more fine-grained queue management scheme.
- Simple solution: CQRD, a switch queue management scheme.
  - Transparent to end hosts
  - No modification to the protocol stack
  - Based on underlying techniques available in commodity products
- Reduces the FCT of short flows by 20-44% in a single switch and by 8-30% in a multi-stage data center switch network.
- At the cost of a minor goodput decrease for large flows.

Thank You