Balajee Vamanan et al. Deadline-Aware Datacenter TCP (D 2 TCP) Balajee Vamanan, Jahangir Hasan, and T. N. Vijaykumar.

Slides:



Advertisements
Similar presentations
Martin Suchara, Ryan Witt, Bartek Wydrowski California Institute of Technology Pasadena, U.S.A. TCP MaxNet Implementation and Experiments on the WAN in.
Advertisements

Finishing Flows Quickly with Preemptive Scheduling
B 黃冠智.
Balajee Vamanan, Gwendolyn Voskuilen, and T. N. Vijaykumar School of Electrical & Computer Engineering SIGCOMM 2010.
Deconstructing Datacenter Packet Transport Mohammad Alizadeh, Shuang Yang, Sachin Katti, Nick McKeown, Balaji Prabhakar, Scott Shenker Stanford University.
Lecture 18: Congestion Control in Data Center Networks 1.
Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Presented by Shaddi.
CSIT560 Internet Infrastructure: Switches and Routers Active Queue Management Presented By: Gary Po, Henry Hui and Kenny Chong.
Improving Datacenter Performance and Robustness with Multipath TCP Costin Raiciu, Sebastien Barre, Christopher Pluntke, Adam Greenhalgh, Damon Wischik,
PFabric: Minimal Near-Optimal Datacenter Transport Mohammad Alizadeh Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, Scott Shenker.
Congestion Control: TCP & DC-TCP Swarun Kumar With Slides From: Prof. Katabi, Alizadeh et al.
Mohammad Alizadeh Adel Javanmard and Balaji Prabhakar Stanford University Analysis of DCTCP:Analysis of DCTCP: Stability, Convergence, and FairnessStability,
Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat David Andersen, Greg Ganger, Garth Gibson, Brian Mueller* Carnegie Mellon University, *Panasas.
Bertha & M Sadeeq.  Easy to manage the problems  Scalability  Real time and real environment  Free data collection  Cost efficient  DCTCP only covers.
Congestion control in data centers
1 Modeling and Emulation of Internet Paths Pramod Sanaga, Jonathon Duerig, Robert Ricci, Jay Lepreau University of Utah.
Defense: Christopher Francis, Rumou duan Data Center TCP (DCTCP) 1.
RAP: An End-to-End Rate-Based Congestion Control Mechanism for Realtime Streams in the Internet Reza Rejai, Mark Handley, Deborah Estrin U of Southern.
1 TCP-LP: A Distributed Algorithm for Low Priority Data Transfer Aleksandar Kuzmanovic, Edward W. Knightly Department of Electrical and Computer Engineering.
A Switch-Based Approach to Starvation in Data Centers Alex Shpiner Joint work with Isaac Keslassy Faculty of Electrical Engineering Faculty of Electrical.
TCP Congestion Control
Jennifer Rexford Fall 2014 (TTh 3:00-4:20 in CS 105) COS 561: Advanced Computer Networks TCP.
Information-Agnostic Flow Scheduling for Commodity Data Centers
Mohammad Alizadeh, Abdul Kabbani, Tom Edsall,
Practical TDMA for Datacenter Ethernet
Mohammad Alizadeh Stanford University Joint with: Abdul Kabbani, Tom Edsall, Balaji Prabhakar, Amin Vahdat, Masato Yasuda HULL: High bandwidth, Ultra Low-Latency.
Lecture 22 Page 1 Advanced Network Security Other Types of DDoS Attacks Advanced Network Security Peter Reiher August, 2014.
IA-TCP A Rate Based Incast- Avoidance Algorithm for TCP in Data Center Networks Communications (ICC), 2012 IEEE International Conference on 曾奕勳.
Curbing Delays in Datacenters: Need Time to Save Time? Mohammad Alizadeh Sachin Katti, Balaji Prabhakar Insieme Networks Stanford University 1.
Detail: Reducing the Flow Completion Time Tail in Datacenter Networks SIGCOMM PIGGY.
15-744: Computer Networking L-14 Data Center Networking II.
TCP Throughput Collapse in Cluster-based Storage Systems
TFRC: TCP Friendly Rate Control using TCP Equation Based Congestion Model CS 218 W 2003 Oct 29, 2003.
B 李奕德.  Abstract  Intro  ECN in DCTCP  TDCTCP  Performance evaluation  conclusion.
MaxNet NetLab Presentation Hailey Lam Outline MaxNet as an alternative to TCP Linux implementation of MaxNet Demonstration of fairness, quick.
Balajee Vamanan and T. N. Vijaykumar School of Electrical & Computer Engineering CoNEXT 2011.
27th, Nov 2001 GLOBECOM /16 Analysis of Dynamic Behaviors of Many TCP Connections Sharing Tail-Drop / RED Routers Go Hasegawa Osaka University, Japan.
DCTCP & CoDel the Best is the Friend of the Good Bob Briscoe, BT Murari Sridharan, Microsoft IETF-84 TSVAREA Jul 2012 Le mieux est l'ennemi du bien Voltaire.
Requirements for Simulation and Modeling Tools Sally Floyd NSF Workshop August 2005.
Wei Bai with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang SING HKUST Information-Agnostic Flow Scheduling for Commodity Data Centers 1 SJTU,
Deadline-based Resource Management for Information- Centric Networks Somaya Arianfar, Pasi Sarolahti, Jörg Ott Aalto University, Department of Communications.
TimeThief: Leveraging Network Variability to Save Datacenter Energy in On-line Data- Intensive Applications Balajee Vamanan (Purdue UIC) Hamza Bin Sohail.
XCP: eXplicit Control Protocol Dina Katabi MIT Lab for Computer Science
MMPTCP: A Multipath Transport Protocol for Data Centres 1 Morteza Kheirkhah University of Edinburgh, UK Ian Wakeman and George Parisis University of Sussex,
Revisiting Transport Congestion Control Jian He UT Austin 1.
15-744: Computer Networking L-14 Data Center Networking III.
Scalable Congestion Control Protocol based on SDN in Data Center Networks Speaker : Bo-Han Hua Professor : Dr. Kai-Wei Ke Date : 2016/04/08 1.
OverQos: An Overlay based Architecture for Enhancing Internet Qos L Subramanian*, I Stoica*, H Balakrishnan +, R Katz* *UC Berkeley, MIT + USENIX NSDI’04,
1 ICCCN 2003 Modelling TCP Reno with Spurious Timeouts in Wireless Mobile Environments Shaojian Fu School of Computer Science University of Oklahoma.
Data Center TCP (DCTCP)
Accelerating Peer-to-Peer Networks for Video Streaming
Incast-Aware Switch-Assisted TCP Congestion Control for Data Centers
Resilient Datacenter Load Balancing in the Wild
6.888 Lecture 5: Flow Scheduling
Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao.
HyGenICC: Hypervisor-based Generic IP Congestion Control for Virtualized Data Centers Conference Paper in Proceedings of ICC16 By Ahmed M. Abdelmoniem,
Router-Assisted Congestion Control
15-744: Computer Networking
Improving Datacenter Performance and Robustness with Multipath TCP
Hamed Rezaei, Mojtaba Malekpourshahraki, Balajee Vamanan
Lecture 19 – TCP Performance
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for Online Search Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, and T. N. Vijaykumar.
AMP: A Better Multipath TCP for Data Center Networks
Cross-Layer Optimizations between Network and Compute in Online Services Balajee Vamanan.
RAP: Rate Adaptation Protocol
Fast Congestion Control in RDMA-Based Datacenter Networks
Iroko A Data Center Emulator for Reinforcement Learning
Lecture 16, Computer Networks (198:552)
Lecture 17, Computer Networks (198:552)
Modeling and Evaluating Variable Bit rate Video Steaming for ax
Presentation transcript:

Balajee Vamanan et al. Deadline-Aware Datacenter TCP (D 2 TCP) Balajee Vamanan, Jahangir Hasan, and T. N. Vijaykumar

Balajee Vamanan et al. Datacenters and OLDIs  OLDI = O n L ine D ata I ntensive applications  e.g., Web search, retail, advertisements  An important class of datacenter applications  Vital to many Internet companies OLDIs are critical datacenter applications

Balajee Vamanan et al. Challenges Posed by OLDIs Two important properties: 1)Deadline bound (e.g., 300 ms)  Missed deadlines affect revenue 2)Fan-in bursts  Large data, 1000s of servers  Tree-like structure (high fan-in)  Fan-in bursts  long “tail latency”  Network shared with many apps (OLDI and non-OLDI) Network must meet deadlines & handle fan-in bursts

Balajee Vamanan et al. Current Approaches TCP: deadline agnostic, long tail latency  Congestion  timeouts (slow), ECN (coarse) Datacenter TCP (DCTCP) [SIGCOMM '10]  first to comprehensively address tail latency  Finely vary sending rate based on extent of congestion  shortens tail latency, but is not deadline aware  ~25% missed deadlines at high fan-in & tight deadlines DCTCP handles fan-in bursts, but is not deadline-aware

Balajee Vamanan et al. Current Approaches Deadline Delivery Protocol (D 3 ) [SIGCOMM '11]:  first deadline-aware flow scheduling  Proactive & centralized  No per-flow state  FCFS  Many deadline priority inversions at fan-in bursts  Other practical shortcomings  Cannot coexist with TCP, requires custom silicon D 3 is deadline-aware, but does not handle fan-in bursts well; suffers from other practical shortcomings

Balajee Vamanan et al. D 2 TCP’s Contributions 1)Deadline-aware and handles fan-in bursts  Elegant gamma-correction for congestion avoidance  far-deadline  back off more near-deadline  back off less  Reactive, decentralized, state (end hosts) 2)Does not hinder long-lived (non-deadline) flows 3)Coexists with TCP  incrementally deployable 4)No change to switch hardware  deployable today D 2 TCP achieves 75% and 50% fewer missed deadlines than DCTCP and D 3

Balajee Vamanan et al. Outline  Introduction  OLDIs  D 2 TCP  Results: Small Scale Real Implementation  Results: At-Scale Simulation  Conclusion

Balajee Vamanan et al. OLDIs OLDI = O n L ine D ata I ntensive applications  Deadline bound, handle large data  Partition-aggregate  Tree-like structure  Root node sends query  Leaf nodes respond with data  Deadline budget split among nodes and network  E.g., total = 300 ms, parents-leaf RPC = 50 ms  Missed deadlines  incomplete responses  affect user experience & revenue

Balajee Vamanan et al. Long Tail Latency in OLDIs  Large data  High Fan-in degree  Fan-in bursts  Children respond around same time  Packet drops: Increase tail latency  Hard to absorb in buffers  Cause many missed deadlines  Current solutions either  Over-provision the network  high cost  Increase network budget  less compute time Current solutions are insufficient

Balajee Vamanan et al. Outline  Introduction  OLDIs  D 2 TCP  Results: Small Scale Real Implementation  Results: At-Scale Simulation  Conclusion

Balajee Vamanan et al. D 2 TCP Deadline-aware and handles fan-in bursts Key Idea: Vary sending rate based on both deadline and extent of congestion  Built on top of DCTCP  Distributed: uses per-flow state at end hosts  Reactive: senders react to congestion  no knowledge of other flows

Balajee Vamanan et al. D 2 TCP: Congestion Avoidance A D 2 TCP sender varies sending window (W) based on both extent of congestion and deadline Note: Larger p ⇒ smaller window. p = 1 ⇒ W/2. p = 0 ⇒ W/2 W := W * ( 1 – p / 2 ) P is our gamma correction function

Balajee Vamanan et al. D 2 TCP: Gamma Correction Function Gamma Correction (p) is a function of congestion & deadlines  α: extent of congestion, same as DCTCP’s α (0 ≤ α ≤ 1)  d: deadline imminence factor  “completion time with window (W)” ÷ “deadline remaining”  d 1 for near-deadline flows p = α d

Balajee Vamanan et al. Gamma Correction Function (cont.) Key insight: Near-deadline flows back off less while far-deadline flows back off more   d < 1 for far-deadline flows  p large  shrink window  d > 1 for near-deadline flows  p small  retain window  Long lived flows  d = 1   DCTCP behavior p 1.0 d = 1 d < 1 (far deadline) d > 1 (near deadline) α W := W * ( 1 – p / 2 ) Gamma correction elegantly combines congestion and deadlines far near p = α d d = 1

Balajee Vamanan et al. Gamma Correction Function (cont.)  α is calculated by aggregating ECN (like DCTCP)  Switches mark packets if queue_length > threshold  ECN enabled switches common  Sender computes the fraction of marked packets averaged over time Threshold

Balajee Vamanan et al. Gamma Correction Function (cont.)  The deadline imminence factor (d): “completion time with window (W)” ÷ “deadline remaining” (d = T c / D)  B  Data remaining, W  Current Window Size Avg. window size ~= 3⁄4 * W ⇒ T c ~= B ⁄ (3⁄4 * W) A more precise analysis in the paper!

Balajee Vamanan et al. D 2 TCP: Stability and Convergence  D 2 TCP’s control loop is stable  Poor estimate of d corrected in subsequent RTTs  When flows have tight deadlines (d >> 1) 1.d is capped at 2.0  flows not over aggressive 2.As α (and hence p) approach 1, D 2 TCP defaults to TCP  D 2 TCP avoids congestive collapse p = α d W := W * ( 1 – p / 2 )

Balajee Vamanan et al. D 2 TCP: Practicality  Does not hinder background, long-lived flows  Coexists with TCP  Incrementally deployable  Needs no hardware changes  ECN support is commonly available D 2 TCP is deadline-aware, handles fan-in bursts, and is deployable today

Balajee Vamanan et al. Outline  Introduction  OLDIs  D 2 TCP  Results: Real Implementation  Results: Simulation  Conclusion

Balajee Vamanan et al. Methodology 1)Real Implementation  Small scale runs 2)Simulation  Evaluate production-like workloads  At-scale runs  Validated against real implementation

Balajee Vamanan et al. Real Implementation  16 machines connected to ToR  24x 10Gbps ports  4 MB shared packet buffer  Publicly available DCTCP code  D 2 TCP  ~100 lines of code over DCTCP All parameters match DCTCP paper D 3 requires custom hardware  comparison with D 3 only in simulation ToR Switch Servers Rack

Balajee Vamanan et al. D 2 TCP: Deadline-aware Scheduling  DCTCP  All flows get same b/w irrespective of deadline  D 2 TCP  Near-deadline flows get more bandwidth

Balajee Vamanan et al. At-Scale Simulation  1000 machines  25 Racks x 40 machines-per-rack  Fabric switch is non-blocking  simulates fat-tree Fabric Switch Racks

Balajee Vamanan et al. At-Scale Simulation (cont.)  ns-3  Calibrated to unloaded RTT of ~200 μs  Matches real datacenters  DCTCP, D 3 implementation matches specs in paper

Balajee Vamanan et al. Workloads  5 synthetic OLDI applications  Message size distribution from DCTCP/D 3 paper  Message sizes: {2,6,10,14,18} KB  Deadlines calibrated to match DCTCP/D 3 paper results  Deadlines: {20,30,35,40,45} ms  Use random assignment of threads to nodes  Long-lived flows sent to root(s)  Network utilization at 10-20%  typical of datacenters

Balajee Vamanan et al. Missed Deadlines  At fan-in of 40, both DCTCP and D 3 miss ~25% deadlines  At fan-in of 40, D 2 TCP misses ~7% deadlines D 2 TCP

Balajee Vamanan et al. Performance of Long-lived Flows  Long-lived flows achieve similar b/w under D 2 TCP (within 5% of TCP) D 2 TCP

Balajee Vamanan et al. The next two talks …  Address similar problems  Allow them to present their work  Happy to take comparison questions offline

Balajee Vamanan et al. Conclusion  D 2 TCP is deadline-aware and handles fan-in bursts  50% fewer missed deadlines than D 3  Does not hinder background, long-lived flows  Coexists with TCP  Incrementally deployable  Needs no hardware changes D 2 TCP is an elegant and practical solution to the challenges posed by OLDIs